DSA: Hashing

By | January 12, 2020
DSA_ Hashing

Though using an Array, we can search an element with time complexity O(1), but the array has its limitation such as it stores similar data types, each cell of array occupies the same amount of space and to find an element we require its index value. To find the Index value of an element itself can take a time complexity of O(n) or O(log n).

Using the concept of Hashing we can build a data structure that can search elements with constant time complexity.

Vamware

What is Hashing?

Hashing is a Technique in which we store data, in an array, at specific indices using some methods, rather than then storing it in ascending order, descending order or randomly. Suppose if we want to store 4 in an array, we perform some methods or operations on 4 and calculate the perfect index value for it, and if we want to retrieve 4 from the array, we just reverse that method or operation and get 4 with constant time complexity.

Hashing Table

Hashing Table or Hash Table is a collection of elements which are stored in a data structure using a Hashing method, which makes it easy to find them later. The Hash table consists of key and index or slot, here key represents the value which will store in the table and index or slot represent the index location of that key.

Each position of the hash table, slots, can hold an item and is named by an integer value starting at 0. We can use an Array to implement a hash table and initially all the elements of the array would be None.

For Example:

Insert 1, 3 ,5 , 7 , 8, 10, 11 in a hash table using hashing.

Create an array arbitrary size

m=13
arr[m]
Key-Value Hashing (key value % size of array) Array Index
1 1 % 7 = 1
3 3 % 7 = 3
5 5 % 7 = 5
7 7 % 7 = 0
9 9 % 7 = 2
10 10 % 7 = 3 (collision) 3 is already occupied  

Collision resolution = 3+1= 4

11 11 % 7 = 4 (Collison) = 4+1=5(collision)= 5+1 = 6

Array will be arr = [7, 1,  9, 3, 10, 5, 11, None, None, None, None, None, None]

Hashing Function

Hash function, are also known as Hashing methods and they are used to map each element or key to a unique slot or index. Hash is also used to minimize the number of collisions and using some easy methods it computes and evenly distributes the items in the hash table.

There are various hashing methods we can use to map a key to its slot:

  • Remainder Method
  • Folding Method
  • Mid Square Method

1. Reminder method

In the reminder method, we use the divide the key value with the total size of the table or array and use it remainder to specify the index or slot value of that key.

For example, if we want to insert 9 in an array of size 20, so it would be placed at 9%20 = 9th index if 9th index is free.

2. Folding Method

The folding method for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size).

These pieces are then added together to give the resulting hash value.

This hashing method applies to large digit numbers, for example, if we want a hash table in which key elements are the mobile numbers of the customers, this reminder method would not be efficient.

For instance:

If our key was the phone number 436-555-4601

We would take the digits and divide them into groups of 2 (43,65,55,46,01).

After the addition, 43+65+55+46+01, we get 210.

If we assume our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder.

210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1.

3. Mid Square Method

In the mid-square method, we to compute the key slot number or index location we first square the item and then extract some portion of the resulting digits.

For example:

if the item were 44, we would first compute 442=1,936.

By extracting the middle two digits, 93, and performing the remainder step, we get 93%11 = 5

Implementation of Hash table

Python

#load factor = number of items / table size
class Hash_Table():
    def __init__(self,size):
        self.size=size
        self.slots= [None]*self.size
        self.data =[None]*self.size

    def put(self,key,data):
        hashvalue = self.hashfunction(key,len(self.slots))

        #if slot is empty
        if self.slots[hashvalue]==None:
            self.slots[hashvalue]=key
            self.data[hashvalue]=data
        else:
            if self.slots[hashvalue]==key:          #if slot has the similar key as old update it with new data
                self.data[hashvalue]=data

            else:
                nextslots =self.rehash(hashvalue,len(self.slots))    #if key is different

                while self.slots[nextslots]!=None and self.slots[nextslots]!=key:
                    nextslots=self.rehash(nextslots,len(self.slots))
                    if self.slots[nextslots]==None:
                        self.slots[nextslots]=key
                        self.data[nextslots]=data
                    else:
                        self.data[nextslots]=data  
    def hashfunction(self,key,size):
        return key%size
 
    def rehash(self,oldhash,size):
        return (oldhash+1)%size

    def get(self,key):
        startslot =self.hashfunction(key,len(self.slots))
        data= None
        stop =False
        found= False
        position =startslot

        while self.slots[position]!=None and not found and not stop:
             if self.slots[position]==key:
                 found=True
                 data= self.data[position]
             else:
                 position=self.rehash(position,len(self.slots))
                 if position==startslot:
                     stop=True
        return data

    def __getitem__(self,key):
        return self.get(key)

    def __setitem__(self,key,data):
        self.put(key,data)

h = Hash_Table(5)
h[1]='go'
h[2]='python'
print(h[1])
print(h[2])

Output: 

go
python

Leave a Reply

Your email address will not be published. Required fields are marked *