Data Structure & Algorithms: Hashing, Hash Table, and Hash Function

With an array, we can search for an element with time complexity O(1). However, it has limitations, such as it stores similar data types only, each cell of an array occupies the same amount of space, and to find an element, we require its index value.

To make things worse, finding the index value of an element in an array can have a time complexity of O(n) or O(log n). By using the concept of hashing, we can build a data structure that can search elements with constant time complexity.

Let's learn more!

What is Hashing?

Hashing is a technique in which we store data in an array at specific indices using some methods rather than storing the data in an ascending or descending order or randomly.

Suppose if we want to store 4 in an array, we perform some methods or operations on 4 and calculate the perfect index value for it, and if we want to retrieve 4 from the array, we just reverse that method or operation and get 4 with constant time complexity.

Hashing Table

A hashing table or hash table is a collection of elements that are stored in a data structure using a hashing method, which makes it easy to find the elements later. The hash table consists of keys and indexes (or slots).

Here, a key represents the value that is stored in the table, and an index represents the index (location) of that key. Each position of the hash table, i.e., slots, can hold an item and is named by an integer value starting at 0. We can use an array to implement a hash table, and initially, all the elements of the array would be None.

Hashing Example: Suppose we want to insert 1, 3, 5, 7, 8, 10, and 11 in a hash table using hashing. Now, create an array of arbitrary size as shown below:

m=13
arr[m]

Key Value	Hashing (key value % size of array)	Array Index
1	1 % 7 =	1
3	3 % 7 =	3
5	5 % 7 =	5
7	7 % 7 =	0
9	9 % 7 =	2
10	10 % 7 =	3 ( collision ) 3 is already occupied Collision resolution = 3+1= 4
11	11 % 7 =	4 (Collison) = 4+1=5(collision)= 5+1 = 6

The array will be arr = [7, 1, 9, 3, 10, 5, 11, None, None, None, None, None, None]

Hashing Function

Hash functions or hashing methods are used to map each element or key to a unique slot or index. A hash function is also used to minimize the number of collisions, and using some easy methods, it computes and evenly distributes the items in the hash table.

There are various hashing methods we can use to map a key to its slot. These are:

Remainder Method
Folding Method
Mid Square Method

1) Remainder method

In the reminder method, we divide the key value by the total size of the table or array and use its remainder to specify the index or slot value of that key. For example, if we want to insert 9 in an array of size 20, it would be placed at 9%20 = 9th index if the 9th index is free.

2) Folding Method

The folding method for constructing hash functions begins by dividing the item into equal-sized pieces (the last piece may not be of equal size, however). These pieces are then added together to give the resulting hash value. This hashing method applies to large numbers.

For example, if we want a hash table in which key elements are the mobile numbers of the customers, then the reminder method would not be efficient. We have to use the folding method here.

If our key was the phone number 436-555-4601. We would take the digits and divide them into groups of 2 (43,65,55,46,01). After the addition of 43+65+55+46+01, we get 210. If we assume our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder. 210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1 .

3. Mid-Square Method

In the mid-square method, we compute the key slot number or index. We first square the item and then extract some portion of the resulting digits.

For example, If the item were 44, we would first compute 442=1,936. By extracting the middle two digits, 93, and performing the remainder step, we get 93%11 = 5

Implementation of the Hash Table

Python Code for Implementing a Hash Table:

#load factor = number of items / table size
class Hash_Table():
    def __init__(self,size):
        self.size=size
        self.slots= [None]*self.size
        self.data =[None]*self.size

    def put(self,key,data):
        hashvalue = self.hashfunction(key,len(self.slots))

        #if slot is empty
        if self.slots[hashvalue]==None:
            self.slots[hashvalue]=key
            self.data[hashvalue]=data
        else:
            if self.slots[hashvalue]==key:          #if slot has the similar key as old update it with new data
                self.data[hashvalue]=data

            else:
                nextslots =self.rehash(hashvalue,len(self.slots))    #if key is different

                while self.slots[nextslots]!=None and self.slots[nextslots]!=key:
                    nextslots=self.rehash(nextslots,len(self.slots))
                    if self.slots[nextslots]==None:
                        self.slots[nextslots]=key
                        self.data[nextslots]=data
                    else:
                        self.data[nextslots]=data  
    def hashfunction(self,key,size):
        return key%size
 
    def rehash(self,oldhash,size):
        return (oldhash+1)%size

    def get(self,key):
        startslot =self.hashfunction(key,len(self.slots))
        data= None
        stop =False
        found= False
        position =startslot

        while self.slots[position]!=None and not found and not stop:
             if self.slots[position]==key:
                 found=True
                 data= self.data[position]
             else:
                 position=self.rehash(position,len(self.slots))
                 if position==startslot:
                     stop=True
        return data

    def __getitem__(self,key):
        return self.get(key)

    def __setitem__(self,key,data):
        self.put(key,data)

h = Hash_Table(5)
h[1]='go'
h[2]='python'
print(h[1])
print(h[2])

Output:

go
python

Conclusion

Hashing is an effective way of searching stored elements. Choosing the best hash method for a given problem depends on the problem structure. Therefore, pay good attention to the problem first and then write a program to solve the same using the best hash function.

We hope that this tutorial helps you understand the topic of hashing in detail.

Keep learning!

People are also reading: