Published on

Handwriting a HashTable: Building a Strong Foundation for Algorithms from Scratch

Authors
  • avatar
    Name
    light-city

Handwriting a HashTable to Lay a Solid Foundation for Algorithms from 0 to 1

Table of Contents

0. Preface

Aim: To handwrite an implementation of a hash table using separate chaining, where each hash(key) corresponds to a Red-Black Tree.

It may seem simple, but there's a lot to learn from it. Implementation language: C++.

To solidify the algorithmic foundation, a GitHub repository has been opened to complete the subsequent applications and implementations of algorithms:

https://github.com/Light-City/algPratice

1. Simplified Hash Table

We encapsulate the hash table in a class, defining and declaring traversal operations as well as implementing construction and destruction:

template<typename Key, typename Value>
class HashTable {
private:
    const static int upperTol = 3;
    const static int lowerTol = 1;
    const static int initCapacity = 1;
    map<Key, Value> **hashtable;
    int M;
    int size;
public:
    /**
     * Parameterized constructor
     * @param M
     */
    HashTable(int M) : M(M), size(0) {
        this->hashtable = new map<Key, Value> *[M];
        for (int i = 0; i < M; i++) {
            this->hashtable[i] = new map<Key, Value>;
        }
    }

    /**
     * Default constructor
     */
    HashTable() {
        HashTable(initCapacity);
    }

    /**
     * Destructor, release memory
     */
    ~HashTable() {
        free(M);
    }
private:
    /**
     * Release memory
     * @param M
     */
    void free(int M) {
        for (int i = 0; i < M; i++) {
            if (hashtable[i])
                delete hashtable[i];
        }
        delete[]hashtable;
    
};

For the hash table implementation, there is a crucial hash function. Here we define one:

/**
* Hash function
* @param key
* @return
*/
int hashFunc(Key key) {
    std::hash<Key> h;
    return (h(key) & 0x7fffffff) % M;
}

In this function, after obtaining the value with std::hash, we perform a bitwise AND operation with 0x7fffffff to remove the negative sign bit and then take the modulo by M.

Now that we have these, let's implement the operations for adding, removing, updating, and querying.

Add operation

The underlying structure uses a Red-Black Tree. Insertion is done using the insert method by constructing a pair. When a key exists, simply update the value. For updates, direct insertion via insert won't work as intended. Thus, we use [] for modification or first deletion followed by insertion:

/**
* Add new element
* @param key
* @param value
*/
void add(Key key, Value value) {
    // If the map from chaining is empty, dynamically allocate it and then insert
    // If the key doesn't exist, check if memory exists, if not, allocate, else insert
    if (hashtable[hashFunc(key)] == NULL || hashtable[hashFunc(key)]->count(key) == 0) {
        if (hashtable[hashFunc(key)] == NULL)
            hashtable[hashFunc(key)] = new map<Key, Value>;
        hashtable[hashFunc(key)]->insert(make_pair(key, value));
        size++;
        if (size >= maxCapacity())
            resize(2 * M);
    } else {
        // Otherwise, update the value
        hashtable[hashFunc(key)]->erase(key);
        hashtable[hashFunc(key)]->insert(make_pair(key, value));
    }
}

Remove operation

If the key exists, delete it and decrement size, else return a failure flag:

/**
* Remove Key
* @param key
* @return 0 success -1 fail
*/
Value remove(Key key) {
    Value ret = -1;
    // If the key is present, directly delete it
    if (contains(key)) {
        hashtable[hashFunc(key)]->erase(key);
        size--;
        ret = 0;
        // Ensure not to exceed initCapacity
        if (size < minCapacity() && M / 2 >= initCapacity) resize(M / 2);
    }
    return ret;
}

Update operation

As mentioned earlier, here's the code for it:

/**
* Reset value
* @param key
* @param value
*/
void set(Key key, Value value) {
    // If key doesn't exist
    if (!contains(key))
        throw "key not exists!";
    // Update value
    hashtable[hashFunc(key)]->erase(key);
    hashtable[hashFunc(key)]->insert(make_pair(key, value));
}

Query operation

Get the value corresponding to the key:

/**
* Get the value corresponding to the key
* @param key
* @return
*/
Value get(Key key) {
    if (contains(key))
        return hashtable[hashFunc(key)]->at(key);
    return 0;
}

Lastly, there are functions like contains, resize, etc., which are not explained above.

Check key existence

The contains function checks if the key exists:

/**
* Check if key exists
* @param key
* @return
*/
bool contains(Key key) {
    return hashtable[hashFunc(key)] == NULL || this->hashtable[hashFunc(key)]->count(key) == 0 ? false : true;
}

Get size

/**
* Get the number of elements in the hash table
* @return
*/
int getSize() {
    return size;
}

Maximum and minimum capacity

/**
* Maximum capacity
* @return
*/
Value maxCapacity() {
    return M * upperTol;
}

/**
* Minimum capacity
* @return
*/
Value minCapacity() {
    return M * lowerTol;
}

Resize function

This function dynamically adjusts memory, copying content from the original space to the newly allocated one and releasing the original space:

/**
* Dynamically adjust memory to ensure O(1) time complexity for lookup
* Spread out the post-resize operations to every previous operation, making it O(2), thus O(1)
* @param newM
*/
void resize(int newM) {
    cout << "resize " << newM << endl;
    map<Key, Value> **newHashTable = new map<Key, Value> *[newM];
    for (int i = 0; i < newM; i++) {
        newHashTable[i] = new map<Key, Value>;
    }
    int oldM = M;
    this->M = newM;
    for (int i = 0; i < oldM; i++) {
        map<Key, Value> m = *(hashtable[i]);
        for (auto p:m)
            newHashTable[hashFunc(p.first)]->insert(make_pair(p.first, p.second));
    }

    free(oldM);
    this->hashtable = newHashTable;
}

Overload << operator

Declaration:

private:
    template<typename K, typename V>
    // Overload << operator
    friend ostream &operator<<(ostream &out, HashTable<K, V> &hashTable);

Definition:

template<typename K, typename V>
ostream &operator<<(ostream &out, HashTable<K, V> &hashTable) {
    hashTable.print();
    return out;
}

With this, the hash table implementation is complete. Now, let's test it:

#include "hash.h"
#include <vector>
int main() {

    vector<string> words{"java", "c++", "c", "c++", "c#", "python", "ruby", "python",
                         "c", "c", "c++", "java", "c++", "rust", "python"};
    HashTable<string, int> ht(1);
    for (string word : words) {
        if (ht.contains(word)) {
            ht.set(word, ht.get(word) + 1);
        } else {
            ht.add(word, 1);
        }
    }
    cout<<ht;
    cout<<"size="<<ht.getSize()<<",maxCapacity="<<ht.maxCapacity()<<",minCapacity="<<ht.minCapacity()<<endl;
    string w="c++";
    ht.remove(w);
    if (ht.contains(w))
        cout << "" << w << ": " << ht.get(w) << endl;
    else
        cout << "No word " << w << " in words" << endl;
    cout<<ht;
    ht.remove("c#");
    ht.remove("java");
    ht.remove("c");
    cout<<"size="<<ht.getSize()<<",maxCapacity="<<ht.maxCapacity()<<",minCapacity="<<ht.minCapacity()<<endl;
    cout<<ht;

    return 0;
}

Output:

resize 2
resize 4
{c#:1,java:2,ruby:1,c:3,rust:1,python:3,c++:4}
size=7,maxCapacity=12,minCapacity=4
No word c++ in words
{c#:1,java:2,ruby:1,c:3,rust:1,python:3}
resize 2
size=3,maxCapacity=6,minCapacity=2
{python:3,ruby:1,rust:1}

With this, a simple hash table implementation is completed.

1. Optimizing the Hash Table

In the gcc2.9 version, the underlying hash table modified its capacity dynamically using prime numbers. Hence, the optimization starts from there:

At the beginning of the class internally, add the following array:

// Prime number array
const vector<int> capacity = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
                                196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
                                50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};

Remove the parameterized constructor and modify the default constructor as follows:

/**
* Default constructor
* @param M
*/
HashTable()  {
    M = capacity[capacityIndex], size = 0;
    this->hashtable = new map<Key, Value> *[M];
    for (int i = 0; i < M; i++) {
        this->hashtable[i] = new map<Key, Value>;
    }
}

Modify the add function:

After size++, add the following code:

if (size >= maxCapacity() && capacityIndex + 1 < capacity.size()) {
    capacityIndex++;
    resize(capacity[M]);
}

Each resize operation now takes values from the capacity array.

Modify the remove function

After size--, modify:

if (size < minCapacity() && capacityIndex - 1 >= 0) {
    capacityIndex--;
    resize(capacityIndex);
}

With that, the hash table is complete! The testing code remains the same.