Hash Table In C++: Programs to Implement Hash Table and Hash Maps

This Tutorial Explains C++ Hash Tables And Hash Maps. You Will Also Learn About Hash Table Applications And Implementation in C++:

Hashing is a technique using which we can map a large amount of data to a smaller table using a “hash function”.

Using the hashing technique, we can search the data more quickly and efficiently when compared to other searching techniques like linear and binary search.

Let us understand the hashing technique with an example in this tutorial.

=> Read Through The Easy C++ Training Series.

Hashing in C++

Hashing In C++

Let us take an example of a college library which houses thousands of books. The books are arranged according to subjects, departments, etc. But still, each section will have numerous books which thereby make searching for books highly difficult.

Thus, to overcome this difficulty we assign a unique number or key to each book so that we instantly know the location of the book. This indeed is achieved through hashing.

Continuing with our library example, instead of identifying each book based on its department, subject, section, etc. that can result in a very long string, we compute a unique integer value or key for each book in the library using a unique function and store these keys in a separate table.

The unique function mentioned above is called the “Hash function” and the separate table is called “Hash Table”. A hash function is used to map the given value to a particular unique key in the Hash Table. This results in faster access to elements. The more efficient the hashing function is, the more efficient will be the mapping of each element to the unique key.

Let us consider a hash function h(x) that maps the value “x” at “x%10” in the array. For the given data, we can construct a hash table containing keys or Hash codes or Hashes as shown in the below diagram.

Hash codes

In the above diagram, we can see that the entries in the array are mapped to their positions in the hash table using a hash function.

Thus we can say that hashing is implemented using two steps as mentioned below:

#1) The value is converted into a unique integer key or hash by using a hash function. It is used as an index to store the original element, which falls into the hash table.

In the above diagram, value 1 in the hash table is the unique key to store element 1 from the data array given on the LHS of the diagram.

#2) The element from the data array is stored in the hash table where it can be quickly retrieved using the hashed key. In the above diagram, we saw that we have stored all the elements in the hash table after computing their respective locations using a hash function. We can use the following expressions to retrieve hash values and index.

hash = hash_func(key)
index = hash % array_size

Hash Function

We already mentioned that the efficiency of mapping depends on the efficiency of the hash function that we use.

A hash function basically should fulfill the following requirements:

  • Easy to Compute: A hash function, should be easy to compute the unique keys.
  • Less Collision: When elements equate to the same key values, there occurs a collision. There should be minimum collisions as far as possible in the hash function that is used. As collisions are bound to occur, we have to use appropriate collision resolution techniques to take care of the collisions.
  • Uniform Distribution: Hash function should result in a uniform distribution of data across the hash table and thereby prevent clustering.

Hash Table C++

Hash table or a hash map is a data structure that stores pointers to the elements of the original data array.

In our library example, the hash table for the library will contain pointers to each of the books in the library.

Having entries in the hash table makes it easier to search for a particular element in the array.

As already seen, the hash table uses a hash function to compute the index into the array of buckets or slots using which the desired value can be found.

Consider another example with following data array:

Example Hash Table

Assume that we have a hash table of size 10 as shown below:

hash table of size 10

Now let’s use the hash function given below.

Hash_code = Key_value % size_of_hash_table

This will equate to Hash_code = Key_value%10

Using the above function, we map the key values to the hash table locations as shown below.

Data itemHash functionHash_code
2525%10 = 55
2727%10 = 77
4646%10 = 66
7070%10 = 00
8989%10 = 99
3131%10 = 11
2222%10 = 22

Using the above table, we can represent the hash table as follows.

hash table example

Thus when we need to access an element from the hash table, it will just take O (1) time to do the search.

Collision

We usually compute the hash code using the hash function so that we can map the key value to the hash code in the hash table. In the above example of the data array, let us insert a value 12. In that case, the hash_code for key value 12 will be 2. (12%10 = 2).

But in the hash table, we already have a mapping to key-value 22 for hash_code 2 as shown below:

Collision

As shown above, we have the same hash code for two values, 12 and 22 i.e. 2. When one or more key values equate to the same location, it results in a collision. Thus the hash code location is already occupied by one key value and there is another key value that needs to be placed in the same location.

In the case of hashing, even if we have a hash table of very large size then a collision is bound to be there. This is because we find a small unique value for a large key in general, hence it is completely possible for one or more value to have the same hash code at any given time.

Given that a collision is inevitable in hashing, we should always look for ways to prevent or resolve the collision. There are various collision resolution techniques that we can employ to resolve the collision occurring during hashing.

Collision Resolution Techniques

The following are the techniques that we can employ to resolve collision in the hash table.

Separate Chaining (Open Hashing)

This is the most common collision resolution technique. This is also known as open hashing and is implemented using a linked list.

In separate chaining technique, each entry in the hash table is a linked list. When the key matches the hash code, it is entered into a list corresponding to that particular hash code. Thus when two keys have the same hash code, then both the entries are entered into the linked list.

For the above example, Separate Chaining is represented below.

separate chaining

The above diagram represents chaining. Here we use the mod (%) function. We see that when two key values equate to the same hash code, then we link these elements to that hash code using a linked list.

If the keys are uniformly distributed across the hash table then the average cost of looking up for the particular key depends on the average number of keys in that linked list. Thus separate chaining remains effective even when there is an increase in the number of entries than the slots.

The worst-case for separate chaining is when all the keys equate to the same hash code and thus are inserted in one linked list only. Hence, we need to look up for all the entries in the hash table and the cost which are proportional to the number of keys in the table.

Linear Probing (Open Addressing/Closed Hashing)

In open addressing or linear probing technique, all the entry records are stored in the hash table itself. When key-value maps to a hash code and the position pointed to by hash code is unoccupied, then the key value is inserted at that location.

If the position is already occupied, then using a probing sequence the key value is inserted in the next position which is unoccupied in the hash table.

For linear probing, the hash function may change as shown below:

hash = hash % hashTableSize
hash = (hash + 1) % hashTableSize
hash = (hash + 2) % hashTableSize
hash = (hash + 3) % hashTableSize

We see that in case of linear probing the interval between slots or successive probes is constant i.e. 1.

linear probing

In the above diagram, we see that in the 0th location we enter 10 using the hash function “hash = hash%hash_tableSize”.

Now the element 70 also equates to location 0 in the hash table. But that location is already occupied. Hence using linear probing we will find the next location which is 1. As this location is unoccupied, we place the key 70 at this location as shown using an arrow.

The resultant Hash Table is shown below.

Resultant Hash Table

Linear probing may suffer from the problem of “Primary Clustering” in which there is a chance that the continuous cells may get occupied and the probability of inserting a new element gets reduced.

Also if two elements get the same value at the first hash function, then both these elements will follow the same probe sequence.

Quadratic Probing

Quadratic probing is the same as linear probing with the only difference being the interval used for probing. As the name suggests, this technique uses non-linear or quadratic distance to occupy slots when a collision occurs instead of linear distance.

In quadratic probing, the interval between the slots is computed by adding an arbitrary polynomial value to the already hashed index. This technique reduces primary clustering to a significant extent but does not improve upon secondary clustering.

Double Hashing

The double hashing technique is similar to linear probing. The only difference between double hashing and linear probing is that in double hashing technique the interval used for probing is computed using two hash functions. Since we apply the hash function to the key one after the other, it eliminates primary clustering as well as secondary clustering.

Difference Between Chaining (Open Hashing) and Linear Probing (Open Addressing)

Chaining (Open Hashing)Linear Probing (Open Addressing)
Key values can be stored outside of the table using a separate linked list.Key values should be stored inside the table only.
The number of elements in the hash table may exceed the size of the hash table.The number of elements present in the hash table will not exceed the number of indices in the hash table.
Deletion is efficient in chaining technique.Deletion can be cumbersome. Can be avoided if not required.
Since a separate linked list is maintained for each location, the space taken is large.Since all entries are accommodated in the same table, space taken is lesser.

C++ Hash Table Implementation

We can implement hashing by using arrays or linked lists to program the hash tables. In C++ we also have a feature called “hash map” which is a structure similar to a hash table but each entry is a key-value pair. In C++ its called hash map or simply a map. Hash map in C++ is usually unordered.

There is a <map> header defined in Standard Template Library (STL) of C++ which implements the functionality of maps. We have covered STL Maps in detail in our tutorial on STL.

The following implementation is for hashing using the linked lists as a data structure for the hash table. We also use “Chaining” as a collision resolution technique in this implementation.

#include&amp;lt;iostream&amp;gt; 
#include &amp;lt;list&amp;gt; 
using namespace std; 
  
class Hashing 
{ 
int hash_bucket;    // No. of buckets 
  
    // Pointer to an array containing buckets 
list&amp;lt;int&amp;gt; *hashtable; 
public: 
Hashing(int V);  // Constructor 
  
    // inserts a key into hash table 
void insert_key(int val); 
  
    // deletes a key from hash table 
void delete_key(int key); 
  
    // hash function to map values to key 
int hashFunction(int x) { 
return (x % hash_bucket); 
    } 
  
void displayHash(); 
}; 
 
Hashing::Hashing(int b) 
{ 
this-&amp;gt;hash_bucket = b; 
hashtable = new list&amp;lt;int&amp;gt;[hash_bucket]; 
} 
 
//insert to hash table
void Hashing::insert_key(int key) 
{ 
    int index = hashFunction(key); 
    hashtable[index].push_back(key);  
} 
  
void Hashing::delete_key(int key) 
{ 
  // get the hash index for key 
  int index = hashFunction(key); 
  
  // find the key in (inex)th list 
  list &amp;lt;int&amp;gt; :: iterator i; 
  for (i = hashtable[index].begin(); 
  i != hashtable[index].end(); i++) { 
 if (*i == key) 
 break; 
  } 
  // if key is found in hash table, remove it 
 if (i != hashtable[index].end()) 
 hashtable[index].erase(i); 
} 
  
// display the hash table 
void Hashing::displayHash() { 
for (int i = 0; i &amp;lt; hash_bucket; i++) { 
 cout &amp;lt;&amp;lt; i; 
for (auto x : hashtable[i]) 
 cout &amp;lt;&amp;lt; " --&amp;gt; " &amp;lt;&amp;lt; x; 
cout &amp;lt;&amp;lt; endl; 
  } 
} 
// main program  
int main() { 
  // array that contains keys to be mapped 
int hash_array[] = {11,12,21, 14, 15}; 
int n = sizeof(hash_array)/sizeof(hash_array[0]); 
  
  Hashing h(7);   		// Number of buckets = 7
                
  //insert the keys into the hash table
for (int i = 0; i &amp;lt; n; i++)  
h.insert_key(hash_array[i]);   
    // display the Hash table 
 cout&amp;lt;&amp;lt;"Hash table created:"&amp;lt;&amp;lt;endl;
 h.displayHash(); 
  
  // delete 12 from hash table 
  h.delete_key(12); 
  
  // display the Hash table
  cout&amp;lt;&amp;lt;"Hash table after deletion of key 12:"&amp;lt;&amp;lt;endl;
  h.displayHash(); 
  
  return 0; 
}

Output:

Hash table created:

0 –> 21 –> 14
1 –> 15
2
3
4 –> 11
5 –> 12
6

Hash table after deletion of key 12:

0 –> 21 –> 14
1 –> 15
2
3
4 –> 11
5
6

The output shows a hash table which is created of size 7. We use chaining to resolve collision. We display the hash table after deleting one of the keys.

Applications Of Hashing

#1) Verification Of Passwords: Verification of passwords is usually done by using cryptographic hash functions. When the password is entered, the system calculates the hash of the password and is then sent to the server for verification. On the server, the hash values of the original passwords are stored.

#2) Data Structures: Different data structures like unordered_set and unordered_map in C++, dictionaries in python or C#, HashSet and hash map in Java all use key-value pair wherein keys are unique values. The values can be the same for different keys. Hashing is used to implement these data structures.

#3) Message Digest: This is yet another application that uses a cryptographic hash. In message digests, we compute a hash for data being sent and received or even files and compare them with the stored values to ensure that the data files are not tampered with. The most common algorithm here is “SHA 256”.

#4) Compiler Operation: When the compiler compiles a program, the keywords for programming language are stored differently from the other identifies. The compiler uses a hash table for storing these keywords.

#5) Database Indexing: Hash tables are used for database indexing and disk-based data structures.

#6) Associative Arrays: Associative arrays are arrays whose indices are of data type other than integer-like strings or other object types. Hash tables can be used for implementing associative arrays.

Conclusion

Hashing is the most widely used data structure as it takes constant time O (1) for insert, delete, and search operations. Hashing is mostly implemented by using a hash function that computes a unique smaller key value for large data entries. We can implement hashing using arrays and linked lists.

Whenever one or more data entries equate to the same values of keys, it results in a collision. We have seen various collision resolution techniques including linear probing, chaining, etc. We have also seen the implementation of hashing in C++.

To conclude, we can say that hashing is by far the most efficient data structure in the programming world.

=> Look For The Entire C++ Training Series Here.