Published on

Exploring the STL Source Code: Unordered Containers

Authors
  • avatar
    Name
    light-city

Exploring the STL Source Code: Unordered Containers

Table of Contents

Unordered Containers: unordered_map, unordered_multimap, unordered_set, unordered_multiset

0. Introduction

Previously, we learned about hashtable, and this section deals with its container adapters: unordered_map.

So, the underlying container for unordered_map is hashtable.

The source code for unordered_map and unordered_multimap can be found in the unordered_map.h file.

1. Fundamental Differences between unordered_map and unordered_multimap

Let's first examine the source code for unordered_map:

template<class _Key, class _Tp,
class _Hash = hash<_Key>,
class _Pred = std::equal_to<_Key>,
class _Alloc = std::allocator<std::pair<const _Key, _Tp> > >
class unordered_map
{
    typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc>  _Hashtable;
    _Hashtable _M_h;
};

Now, let's inspect the declaration of the underlying container __umap_hashtable:

template<bool _Cache>
using __umap_traits = __detail::_Hashtable_traits<_Cache, false, true>;

template<typename _Key,
    typename _Tp,
    typename _Hash = hash<_Key>,
    typename _Pred = std::equal_to<_Key>,
    typename _Alloc = std::allocator<std::pair<const _Key, _Tp> >,
    typename _Tr = __umap_traits<__cache_default<_Key, _Hash>::value>>
using __umap_hashtable = _Hashtable<_Key, std::pair<const _Key, _Tp>,
_Alloc, __detail::_Select1st,
_Pred, _Hash,
__detail::_Mod_range_hashing,
__detail::_Default_ranged_hash,
__detail::_Prime_rehash_policy, _Tr>;

From this, we can conclude that the hashtable's template parameters are as follows:

template<typename _Key, typename _Value, typename _Alloc,
typename _ExtractKey, typename _Equal,
typename _H1, typename _H2, typename _Hash,
typename _RehashPolicy, typename _Traits>

By default, unordered_map uses:

  • H1 as hash<key>
  • H2 as _Mod_range_hashing
  • _Hash as _Default_ranged_hash
  • _RehashPolicy as _Prime_rehash_policy
  • _Traits as _Tr

The last parameter _Tr is crucial because it's what enables the existence of unordered_multimap.

Similarly, let's analyze _Tr:

typename _Tr = __umap_traits<__cache_default<_Key, _Hash>::value>>

_Tr utilizes __umap_traits, so let's delve deeper:

template<bool _Cache>
using __umap_traits = __detail::_Hashtable_traits<_Cache, false, true>;

Comparing the above two, we notice that __umap_traits includes a series of __cache_default, and looking into the template parameter of type bool, we can determine whether it's false or true. After testing, it's found to be false, indicating no hash code caching.

Further analyzing __umap_traits, which essentially invokes _Hashtable_traits:

template<bool _Cache_hash_code, bool _Constant_iterators, bool _Unique_keys>
struct _Hashtable_traits
{
    using __hash_cached = __bool_constant<_Cache_hash_code>;
    using __constant_iterators = __bool_constant<_Constant_iterators>;
    using __unique_keys = __bool_constant<_Unique_keys>;
};

Here, we see three using declarations, which can be understood as three typedefs, representing whether hash code caching is enabled, whether iterators are constant, and whether keys are unique, respectively. Looking back, the three template parameters passed in are false, false, true, confirming that unordered_map has unique keys. Accordingly, unordered_multimap has non-unique keys, with the last parameter being false. This is evident from the corresponding code:

template<bool _Cache>
using __ummap_traits = __detail::_Hashtable_traits<_Cache, false, false>;

In summary, we've identified the fundamental difference between unordered_map and unordered_multimap, and also observed how to implement two container adapters using a single container through underlying source code analysis!

2. Fundamental Differences between unordered_set and unordered_multiset

Similarly, let's analyze unordered_set:

template<class _Value,
    class _Hash = hash<_Value>,
    class _Pred = std::equal_to<_Value>,
    class _Alloc = std::allocator<_Value> >
class unordered_set
{
    typedef __uset_hashtable<_Value, _Hash, _Pred, _Alloc>  _Hashtable;
    _Hashtable _M_h;
}

template<bool _Cache>
using __uset_traits = __detail::_Hashtable_traits<_Cache, true, true>;

template<typename _Value,
    typename _Hash = hash<_Value>,
    typename _Pred = std::equal_to<_Value>,
    typename _Alloc = std::allocator<_Value>,
    typename _Tr = __uset_traits<__cache_default<_Value, _Hash>::value>>
using __uset_hashtable = _Hashtable<_Value, _Value, _Alloc,
                __detail::_Identity, _Pred, _Hash,
                __detail::_Mod_range_hashing,
                __detail::_Default_ranged_hash,
                __detail::_Prime_rehash_policy, _Tr>;

It's evident that the parameters passed to _Hashtable_traits are false, true, true. For unordered_set, it uses constant iterators and unique keys. Now, let's examine unordered_multiset:

template<bool _Cache>
using __umset_traits = __detail::_Hashtable_traits<_Cache, true, false>;

Comparing the two, the essence is that unordered_set does not allow key duplication, whereas unordered_multiset does.

3. Three Main Conclusions

Now that we've laid the groundwork, let's list the four container adapters discussed earlier:

(1) unordered_map

template<bool _Cache>
using __umap_traits = __detail::_Hashtable_traits<_Cache, false, true>;

(2) unordered_multimap

template<bool _Cache>
using __umap_traits = __detail::_Hashtable_traits<_Cache, false, false>;

(3) unordered_set

template<bool _Cache>
using __uset_traits = __detail::_Hashtable_traits<_Cache, true, true>;

(4) unordered_multiset

template<bool _Cache>
using __uset_traits = __detail::_Hashtable_traits<_Cache, true, false>;

Comparing them, we can derive:

  • Conclusion 1: unordered_map and unordered_set do not allow key duplication, whereas their 'multi' counterparts do.
  • Conclusion 2: unordered_map and unordered_multimap use iterators, while unordered_set and unordered_multiset use const_iterators.
  • Conclusion

3: In unordered_map and unordered_multimap, the key is the key, and the value is the key + value, whereas in unordered_set and unordered_multiset, both key and value are the same.

The last conclusion is further evidenced by examining the parameters passed to the hashtable:

unordered_map and unordered_multimap:

using __umap_hashtable = _Hashtable<_Key, 
std::pair<const _Key, _Tp>,
_Alloc, __detail::_Select1st,
_Pred, _Hash,
__detail::_Mod_range_hashing,
__detail::_Default_ranged_hash,
__detail::_Prime_rehash_policy, _Tr>;

unordered_set and unordered_multiset:

template<typename _Value,
typename _Hash = hash<_Value>,
typename _Pred = std::equal_to<_Value>,
typename _Alloc = std::allocator<_Value>,
typename _Tr = __uset_traits<__cache_default<_Value, _Hash>::value>>
using __uset_hashtable = _Hashtable<_Value, _Value, _Alloc,
__detail::_Identity, _Pred, _Hash,
__detail::_Mod_range_hashing,
__detail::_Default_ranged_hash,
__detail::_Prime_rehash_policy, _Tr>;

4. Important Functions of unordered_map

Initialization

In the constructor below, we can see that the default number of buckets for unordered_map is 10.

By default, unordered_map uses hasher(), i.e., H1, i.e., std::hash.

unordered_map(size_type __n = 10,
    const hasher& __hf = hasher(),
    const key_equal& __eql = key_equal(),
    const allocator_type& __a = allocator_type())
: _M_h(__n, __hf, __eql, __a)
{ }

Let's test if the default hash is used:

unordered_map<string,int> um;
hash<string> h;
cout<<um.hash_function()("hhhhhawq")<<endl;
cout<<h("hhhhhawq")<<endl;

Output:

9074142923776869151
9074142923776869151

This further verifies the usage of the default hash.

Empty, Size, Max Size

bool
empty() const noexcept
{ return _M_h.empty(); }

size_type
size() const noexcept
{ return _M_h.size(); }

size_type
max_size() const noexcept
{ return _M_h.max_size(); }

Begin and End

iterator
begin() noexcept
{ return _M_h.begin(); }

iterator
end() noexcept
{ return _M_h.end(); }

Insert Five insertion methods

// value
std::pair<iterator, bool>
insert(const value_type& __x)
{ return _M_h.insert(__x); }

// pair 
std::pair<iterator, bool>
insert(_Pair&& __x)
{ return _M_h.insert(std::forward<_Pair>(__x)); }

// iterator+value
iterator
insert(const_iterator __hint, const value_type& __x)
{ return _M_h.insert(__hint, __x); }

// Range insert from first to last
template<typename _InputIterator>
void
insert(_InputIterator __first, _InputIterator __last)
{ _M_h.insert(__first, __last); }

// Insert from initializer list
void
insert(initializer_list<value_type> __l)
{ _M_h.insert(__l); }

Erase

Three deletion methods

// iterator
iterator
erase(iterator __position)
{ return _M_h.erase(__position); }

// key
size_type
erase(const key_type& __x)
{ return _M_h.erase(__x); }

// Range from first to last
iterator
erase(const_iterator __first, const_iterator __last)
{ return _M_h.erase(__first, __last); 

Clear

void
clear() noexcept
{ _M_h.clear(); }

Hash Function

Retrieve the hash_function of this unordered_map

hasher
hash_function() const
{ return _M_h.hash_function(); }

Usage:

unordered_map<string,int> um;
cout<<um.hash_function()("hhhhhawq")<<endl; // The content passed here should match the key type defined above.

Key_eq

key_eq returns std::equal_to

Usage:

unordered_map<string,int> um;
cout<<um.key_eq()("1","2")<<endl;

Lookup and Retrieve Value

iterator
find(const key_type& __x)
{ return _M_h.find(__x); }

mapped_type&
operator[](const key_type& __k)
{ return _M_h[__k]; }

mapped_type&
at(const key_type& __k)
{ return _M_h.at(__k); }

In addition to these functions, there are functions to get buckets, maximum bucket count, load factor, rehash, etc. However, sorting is not provided as hashtable does not support sorting. Hashtable provides constant time for finding, deleting, and inserting nodes, which is superior to RB-Tree (Red-Black Tree).

Similarly, unordered_set, unordered_multiset, and unordered_multimap have the same functions as unordered_map, so they are not elaborated here.