With an array, we can search for an element with time complexity O(1). However, it has limitations, such as it stores similar data types only, each cell of an array occupies the same amount of space, and to find an element, we require its index value.
To make things worse, finding the index value of an element in an array can have a time complexity of O(n) or O(log n). By using the concept of hashing, we can build a data structure that can search elements with constant time complexity.
What is Hashing?
Hashing is a technique in which we store data in an array at specific indices using some methods, rather than storing the data in an ascending or descending order or randomly.
Suppose if we want to store 4 in an array, we perform some methods or operations on 4 and calculate the perfect index value for it, and if we want to retrieve 4 from the array, we just reverse that method or operation and get 4 with constant time complexity.
A Hashing Table or Hash Table is a collection of elements that are stored in a data structure using a hashing method, which makes it easy to find the elements later. The hash table consists of keys and indexes (or slots).
Here, a key represents the value that is stored in the table, and an index represents the index (location) of that key. Each position of the hash table, i.e., slots, can hold an item and is named by an integer value starting at 0. We can use an array to implement a hash table, and initially, all the elements of the array would be None.
Hashing Example: Suppose we want to insert 1, 3, 5, 7, 8, 10, and 11 in a hash table using hashing. Now, create an array of arbitrary size as shown below:
|Key Value||Hashing (key value % size of array)||Array Index|
|1||1 % 7 =||1|
|3||3 % 7 =||3|
|5||5 % 7 =||5|
|7||7 % 7 =||0|
|9||9 % 7 =||2|
|10||10 % 7 =||3 ( collision ) 3 is already occupied Collision resolution = 3+1= 4|
|11||11 % 7 =||4 (Collison) = 4+1=5(collision)= 5+1 = 6|
Array will be arr = [7, 1, 9, 3, 10, 5, 11, None, None, None, None, None, None]
Hash functions or hashing methods are used to map each element or key to a unique slot or index. A hash function is also used to minimize the number of collisions, and using some easy methods, it computes and evenly distributes the items in the hash table.
There are various hashing methods we can use to map a key to its slot. These are:
- Remainder Method
- Folding Method
- Mid Square Method
1) Remainder method
In the reminder method, we divide the key value with the total size of the table or array and use its remainder to specify the index or slot value of that key. For example, if we want to insert 9 in an array of size 20, so it would be placed at 9%20 = 9th index if the 9th index is free.
2) Folding Method
The folding method for constructing hash functions begins by dividing the item into equal-sized pieces (the last piece may not be of equal size, however). These pieces are then added together to give the resulting hash value. This hashing method applies to large numbers.
For example, if we want a hash table in which key elements are the mobile numbers of the customers, then the reminder method would not be efficient. We have to use the folding method here.
If our key was the phone number 436-555-4601. We would take the digits and divide them into groups of 2 (43,65,55,46,01). After the addition, 43+65+55+46+01, we get 210. If we assume our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder. 210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1 .
3. Mid-Square Method
In the mid-square method , we compute the key slot number or index. We first square the item and then extract some portion of the resulting digits.
For example: If the item were 44, we would first compute 442=1,936. By extracting the middle two digits, 93, and performing the remainder step, we get 93%11 = 5
Implementation of the Hash Table
Python Code for Implementing a Hash Table
#load factor = number of items / table size class Hash_Table(): def __init__(self,size): self.size=size self.slots= [None]*self.size self.data =[None]*self.size def put(self,key,data): hashvalue = self.hashfunction(key,len(self.slots)) #if slot is empty if self.slots[hashvalue]==None: self.slots[hashvalue]=key self.data[hashvalue]=data else: if self.slots[hashvalue]==key: #if slot has the similar key as old update it with new data self.data[hashvalue]=data else: nextslots =self.rehash(hashvalue,len(self.slots)) #if key is different while self.slots[nextslots]!=None and self.slots[nextslots]!=key: nextslots=self.rehash(nextslots,len(self.slots)) if self.slots[nextslots]==None: self.slots[nextslots]=key self.data[nextslots]=data else: self.data[nextslots]=data def hashfunction(self,key,size): return key%size def rehash(self,oldhash,size): return (oldhash+1)%size def get(self,key): startslot =self.hashfunction(key,len(self.slots)) data= None stop =False found= False position =startslot while self.slots[position]!=None and not found and not stop: if self.slots[position]==key: found=True data= self.data[position] else: position=self.rehash(position,len(self.slots)) if position==startslot: stop=True return data def __getitem__(self,key): return self.get(key) def __setitem__(self,key,data): self.put(key,data) h = Hash_Table(5) h='go' h='python' print(h) print(h)
Hashing is an effective way of searching stored elements. Choosing the best hash method for a given problem depends on the problem structure. Therefore, pay good attention to the problem first and then write a program to solve the same using the best hash function.
People are also reading: