Though using an Array, we can search an element with time complexity O(1), but the array has its limitation such as it stores similar data types, each cell of array occupies the same amount of space and to find an element we require its index value. To find the Index value of an element itself can take a time complexity of O(n) or O(log n).
Using the concept of Hashing we can build a data structure that can search elements with constant time complexity.
What is Hashing?
Hashing is a Technique in which we store data, in an array, at specific indices using some methods, rather than then storing it in ascending order, descending order or randomly. Suppose if we want to store 4 in an array, we perform some methods or operations on 4 and calculate the perfect index value for it, and if we want to retrieve 4 from the array, we just reverse that method or operation and get 4 with constant time complexity.
Hashing Table
Hashing Table or Hash Table is a collection of elements which are stored in a data structure using a Hashing method, which makes it easy to find them later. The Hash table consists of key and index or slot, here key represents the value which will store in the table and index or slot represent the index location of that key.
Each position of the hash table, slots, can hold an item and is named by an integer value starting at 0. We can use an Array to implement a hash table and initially all the elements of the array would be None.
For Example:
Insert 1, 3 ,5 , 7 , 8, 10, 11 in a hash table using hashing.
Create an array arbitrary size
m=13 arr[m]
Key-Value | Hashing (key value % size of array) | Array Index |
1 | 1 % 7 = | 1 |
3 | 3 % 7 = | 3 |
5 | 5 % 7 = | 5 |
7 | 7 % 7 = | 0 |
9 | 9 % 7 = | 2 |
10 | 10 % 7 = | 3 (collision) 3 is already occupied
Collision resolution = 3+1= 4 |
11 | 11 % 7 = | 4 (Collison) = 4+1=5(collision)= 5+1 = 6 |
Array will be arr = [7, 1, 9, 3, 10, 5, 11, None, None, None, None, None, None]
Hashing Function
Hash function, are also known as Hashing methods and they are used to map each element or key to a unique slot or index. Hash is also used to minimize the number of collisions and using some easy methods it computes and evenly distributes the items in the hash table.
There are various hashing methods we can use to map a key to its slot:
- Remainder Method
- Folding Method
- Mid Square Method
1. Reminder method
In the reminder method, we use the divide the key value with the total size of the table or array and use it remainder to specify the index or slot value of that key.
For example, if we want to insert 9 in an array of size 20, so it would be placed at 9%20 = 9^{th} index if 9^{th} index is free.
2. Folding Method
The folding method for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size).
These pieces are then added together to give the resulting hash value.
This hashing method applies to large digit numbers, for example, if we want a hash table in which key elements are the mobile numbers of the customers, this reminder method would not be efficient.
For instance:
If our key was the phone number 436-555-4601
We would take the digits and divide them into groups of 2 (43,65,55,46,01).
After the addition, 43+65+55+46+01, we get 210.
If we assume our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder.
210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1.
3. Mid Square Method
In the mid-square method, we to compute the key slot number or index location we first square the item and then extract some portion of the resulting digits.
For example:
if the item were 44, we would first compute 442=1,936.
By extracting the middle two digits, 93, and performing the remainder step, we get 93%11 = 5
Implementation of Hash table
Python
#load factor = number of items / table size class Hash_Table(): def __init__(self,size): self.size=size self.slots= [None]*self.size self.data =[None]*self.size def put(self,key,data): hashvalue = self.hashfunction(key,len(self.slots)) #if slot is empty if self.slots[hashvalue]==None: self.slots[hashvalue]=key self.data[hashvalue]=data else: if self.slots[hashvalue]==key: #if slot has the similar key as old update it with new data self.data[hashvalue]=data else: nextslots =self.rehash(hashvalue,len(self.slots)) #if key is different while self.slots[nextslots]!=None and self.slots[nextslots]!=key: nextslots=self.rehash(nextslots,len(self.slots)) if self.slots[nextslots]==None: self.slots[nextslots]=key self.data[nextslots]=data else: self.data[nextslots]=data def hashfunction(self,key,size): return key%size def rehash(self,oldhash,size): return (oldhash+1)%size def get(self,key): startslot =self.hashfunction(key,len(self.slots)) data= None stop =False found= False position =startslot while self.slots[position]!=None and not found and not stop: if self.slots[position]==key: found=True data= self.data[position] else: position=self.rehash(position,len(self.slots)) if position==startslot: stop=True return data def __getitem__(self,key): return self.get(key) def __setitem__(self,key,data): self.put(key,data) h = Hash_Table(5) h[1]='go' h[2]='python' print(h[1]) print(h[2])
Output:
go python