- Published on
Handwriting a HashTable: Building a Strong Foundation for Algorithms from Scratch
- Authors
- Name
- light-city
Handwriting a HashTable to Lay a Solid Foundation for Algorithms from 0 to 1
Table of Contents
0. Preface
Aim: To handwrite an implementation of a hash table using separate chaining, where each hash(key) corresponds to a Red-Black Tree.
It may seem simple, but there's a lot to learn from it. Implementation language: C++.
To solidify the algorithmic foundation, a GitHub repository has been opened to complete the subsequent applications and implementations of algorithms:
1. Simplified Hash Table
We encapsulate the hash table in a class, defining and declaring traversal operations as well as implementing construction and destruction:
template<typename Key, typename Value>
class HashTable {
private:
const static int upperTol = 3;
const static int lowerTol = 1;
const static int initCapacity = 1;
map<Key, Value> **hashtable;
int M;
int size;
public:
/**
* Parameterized constructor
* @param M
*/
HashTable(int M) : M(M), size(0) {
this->hashtable = new map<Key, Value> *[M];
for (int i = 0; i < M; i++) {
this->hashtable[i] = new map<Key, Value>;
}
}
/**
* Default constructor
*/
HashTable() {
HashTable(initCapacity);
}
/**
* Destructor, release memory
*/
~HashTable() {
free(M);
}
private:
/**
* Release memory
* @param M
*/
void free(int M) {
for (int i = 0; i < M; i++) {
if (hashtable[i])
delete hashtable[i];
}
delete[]hashtable;
};
For the hash table implementation, there is a crucial hash function. Here we define one:
/**
* Hash function
* @param key
* @return
*/
int hashFunc(Key key) {
std::hash<Key> h;
return (h(key) & 0x7fffffff) % M;
}
In this function, after obtaining the value with std::hash
, we perform a bitwise AND operation with 0x7fffffff
to remove the negative sign bit and then take the modulo by M.
Now that we have these, let's implement the operations for adding, removing, updating, and querying.
Add operation
The underlying structure uses a Red-Black Tree. Insertion is done using the insert method by constructing a pair. When a key exists, simply update the value. For updates, direct insertion via insert won't work as intended. Thus, we use []
for modification or first deletion followed by insertion:
/**
* Add new element
* @param key
* @param value
*/
void add(Key key, Value value) {
// If the map from chaining is empty, dynamically allocate it and then insert
// If the key doesn't exist, check if memory exists, if not, allocate, else insert
if (hashtable[hashFunc(key)] == NULL || hashtable[hashFunc(key)]->count(key) == 0) {
if (hashtable[hashFunc(key)] == NULL)
hashtable[hashFunc(key)] = new map<Key, Value>;
hashtable[hashFunc(key)]->insert(make_pair(key, value));
size++;
if (size >= maxCapacity())
resize(2 * M);
} else {
// Otherwise, update the value
hashtable[hashFunc(key)]->erase(key);
hashtable[hashFunc(key)]->insert(make_pair(key, value));
}
}
Remove operation
If the key exists, delete it and decrement size, else return a failure flag:
/**
* Remove Key
* @param key
* @return 0 success -1 fail
*/
Value remove(Key key) {
Value ret = -1;
// If the key is present, directly delete it
if (contains(key)) {
hashtable[hashFunc(key)]->erase(key);
size--;
ret = 0;
// Ensure not to exceed initCapacity
if (size < minCapacity() && M / 2 >= initCapacity) resize(M / 2);
}
return ret;
}
Update operation
As mentioned earlier, here's the code for it:
/**
* Reset value
* @param key
* @param value
*/
void set(Key key, Value value) {
// If key doesn't exist
if (!contains(key))
throw "key not exists!";
// Update value
hashtable[hashFunc(key)]->erase(key);
hashtable[hashFunc(key)]->insert(make_pair(key, value));
}
Query operation
Get the value corresponding to the key:
/**
* Get the value corresponding to the key
* @param key
* @return
*/
Value get(Key key) {
if (contains(key))
return hashtable[hashFunc(key)]->at(key);
return 0;
}
Lastly, there are functions like contains
, resize
, etc., which are not explained above.
Check key existence
The contains
function checks if the key exists:
/**
* Check if key exists
* @param key
* @return
*/
bool contains(Key key) {
return hashtable[hashFunc(key)] == NULL || this->hashtable[hashFunc(key)]->count(key) == 0 ? false : true;
}
Get size
/**
* Get the number of elements in the hash table
* @return
*/
int getSize() {
return size;
}
Maximum and minimum capacity
/**
* Maximum capacity
* @return
*/
Value maxCapacity() {
return M * upperTol;
}
/**
* Minimum capacity
* @return
*/
Value minCapacity() {
return M * lowerTol;
}
Resize function
This function dynamically adjusts memory, copying content from the original space to the newly allocated one and releasing the original space:
/**
* Dynamically adjust memory to ensure O(1) time complexity for lookup
* Spread out the post-resize operations to every previous operation, making it O(2), thus O(1)
* @param newM
*/
void resize(int newM) {
cout << "resize " << newM << endl;
map<Key, Value> **newHashTable = new map<Key, Value> *[newM];
for (int i = 0; i < newM; i++) {
newHashTable[i] = new map<Key, Value>;
}
int oldM = M;
this->M = newM;
for (int i = 0; i < oldM; i++) {
map<Key, Value> m = *(hashtable[i]);
for (auto p:m)
newHashTable[hashFunc(p.first)]->insert(make_pair(p.first, p.second));
}
free(oldM);
this->hashtable = newHashTable;
}
Overload
<<
operator
Declaration:
private:
template<typename K, typename V>
// Overload << operator
friend ostream &operator<<(ostream &out, HashTable<K, V> &hashTable);
Definition:
template<typename K, typename V>
ostream &operator<<(ostream &out, HashTable<K, V> &hashTable) {
hashTable.print();
return out;
}
With this, the hash table implementation is complete. Now, let's test it:
#include "hash.h"
#include <vector>
int main() {
vector<string> words{"java", "c++", "c", "c++", "c#", "python", "ruby", "python",
"c", "c", "c++", "java", "c++", "rust", "python"};
HashTable<string, int> ht(1);
for (string word : words) {
if (ht.contains(word)) {
ht.set(word, ht.get(word) + 1);
} else {
ht.add(word, 1);
}
}
cout<<ht;
cout<<"size="<<ht.getSize()<<",maxCapacity="<<ht.maxCapacity()<<",minCapacity="<<ht.minCapacity()<<endl;
string w="c++";
ht.remove(w);
if (ht.contains(w))
cout << "" << w << ": " << ht.get(w) << endl;
else
cout << "No word " << w << " in words" << endl;
cout<<ht;
ht.remove("c#");
ht.remove("java");
ht.remove("c");
cout<<"size="<<ht.getSize()<<",maxCapacity="<<ht.maxCapacity()<<",minCapacity="<<ht.minCapacity()<<endl;
cout<<ht;
return 0;
}
Output:
resize 2
resize 4
{c#:1,java:2,ruby:1,c:3,rust:1,python:3,c++:4}
size=7,maxCapacity=12,minCapacity=4
No word c++ in words
{c#:1,java:2,ruby:1,c:3,rust:1,python:3}
resize 2
size=3,maxCapacity=6,minCapacity=2
{python:3,ruby:1,rust:1}
With this, a simple hash table implementation is completed.
1. Optimizing the Hash Table
In the gcc2.9 version, the underlying hash table modified its capacity dynamically using prime numbers. Hence, the optimization starts from there:
At the beginning of the class internally, add the following array:
// Prime number array
const vector<int> capacity = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};
Remove the parameterized constructor and modify the default constructor as follows:
/**
* Default constructor
* @param M
*/
HashTable() {
M = capacity[capacityIndex], size = 0;
this->hashtable = new map<Key, Value> *[M];
for (int i = 0; i < M; i++) {
this->hashtable[i] = new map<Key, Value>;
}
}
Modify the add
function:
After size++
, add the following code:
if (size >= maxCapacity() && capacityIndex + 1 < capacity.size()) {
capacityIndex++;
resize(capacity[M]);
}
Each resize operation now takes values from the capacity array.
Modify the
remove
function
After size--
, modify:
if (size < minCapacity() && capacityIndex - 1 >= 0) {
capacityIndex--;
resize(capacityIndex);
}
With that, the hash table is complete! The testing code remains the same.