Published on

In-depth Analysis of C++ STL Source Code: Traits Programming Technique

Authors
  • avatar
    Name
    light-city

Analysis of C++ STL Source Code: Traits Programming Technique

Table of Contents

0. Introduction

In STL programming, containers and algorithms are designed independently, meaning that data structures and algorithms are designed separately. The bridge that connects containers and algorithms is the iterator, which makes independent design possible. The following diagram illustrates this:

The diagram shows that the goal of STL is to separate data and algorithms, design them separately, and then combine them using something called an iterator.

According to the design pattern, an iterator is described as: a method that can sequentially access each element in a container, without exposing the internal representation of the container. Traits programming technique is used to solve problems related to iterators.

It enables the use of generic algorithms (such as find, count, find_if) on a container. The most important thing is to provide the algorithm with a tool to access container elements, and the iterator plays this important role.

In algorithms, we may define simple intermediate variables or specify the return variable type of the algorithm. At this time, we need to know what type the iterator points to. However, since there is no typeof function for determining types, we cannot directly obtain it. What should we do? This article will explain it in detail.

For an iterator, it is a smart pointer, so it has all the characteristics of a regular pointer - it can be dereferenced (*) and accessed using the arrow operator (->). However, when traversing a container, it is inevitable to have some knowledge of the container's internals. Therefore, it is better to leave the development of iterators to the container designers. This way, all implementation details can be encapsulated and hidden from users. That's why each STL container provides its own iterator.

In bits/stl_iterator_base_types.h, traits are defined as follows:

template<class _Tp>
struct iterator_traits<_Tp*>
{
    typedef ptrdiff_t difference_type;
    typedef typename _Tp::value_type value_type;
    typedef typename _Tp::pointer pointer;
    typedef typename _Tp::reference reference;
    typedef typename _Tp::iterator_category iterator_category;
};

It may seem confusing, but don't worry, after reading this section, you will have a good understanding of STL. Ha ha~

1. Template Argument Deduction

Firstly, when using iterators in algorithms, it is likely that the corresponding type (associated type) of the iterator, i.e., the type of the object pointed to by the iterator, will be needed. What should be done if it is necessary to declare a variable with the same type as the object pointed to by the iterator?

The solution is to use the template argument deduction mechanism of function templates.

For example:

If T is a pointer to a specific object, and we need the type of the object pointed to by the pointer in func, it can be easily done using the template argument deduction mechanism:

template <class I>
inline
void func(I iter) {
    func_impl(iter, *iter); // pass iter and the value it points to, class deduces automatically
}

Through template deduction, we can easily obtain the type of the object pointed to by the pointer.

template <class I, class T>
void func_impl(I iter, T t) {
        T tmp; // this is the type of the object pointed to by the iterator
        // ... implementation
}

int main() {
    int i;
    func(&i);
}

However, the template argument deduction mechanism of functions can only deduce the parameters, and cannot deduce the return value type of the function. If it is necessary to deduce the return value of the function, there is no way to do it. Therefore, we introduce the following method.

2. Declaration of Nested Types

The type of the object pointed to by the iterator is called the value type of the iterator.

Although we can use T as the return value of func_impl, the problem is that users need to call func.

template <class T>
struct MyIter {
    typedef T value_type; // declaration of nested type
    T* ptr;
    MyIter(T* p = 0) : ptr(p) {}
    T& operator*() const { return *ptr; }
};

template <class I>
typename I::value_type
func(I ite) {
	std::cout << "class version" << std::endl;
    return *ite;
}
int main() {
    // ...
    MyIter<int> ite(new int(8));
    cout << func(ite);	// outputs 8
}

Very elegant solution, everything looks perfect. However, in reality, there is still a problem because if func is a generic algorithm, it absolutely needs to accept a raw pointer as an iterator. But obviously, you cannot make the following code compile:

int *p = new int(5);
cout<<func(p)<<endl; // error

Our func cannot support raw pointers, which is clearly unacceptable. In this case, template partial specialization comes into play.

3. The Savior: Traits

As mentioned earlier, if we use typename I::value_type directly, the algorithm cannot accept raw pointers because raw pointers simply do not have the nested type value_type.

Therefore, we need to add an intermediate layer to determine if it is a raw pointer. Note that this is the beauty of the traits technique.

If we only use the above approach, which is embedding value_type, then for pointers without value_type, we can only partially specialize it. This kind of partial specialization is for the specialization of the callable function func. If func has 1 million lines of code, it will cause a great visual pollution.

(1) Function Partial Specialization

Function partial specialization:

template <class T>
struct MyIter {
    typedef T value_type; // nested type declaration
    T* ptr;
    MyIter(T* p = 0) : ptr(p) {}
    T& operator*() const { return *ptr; }
};

template <class I>
typename I::value_type
func(I ite) {
    std::cout << "class version" << std::endl;
    return *ite;
}
template <class I>
I
func(I* ite) {
    std::cout << "pointer version" << std::endl;
    return *ite;
}
template <class I>
I func(const I* ite) {
    std::cout << "const pointer version" << std::endl;
    return *ite;
}
int main() {
    // ...
    MyIter<int> ite(new int(8));
    cout << func(ite)<<endl;
    int *p = new int(52);
    cout<<func(p)<<endl;
    const int k = 3;
    cout<<func(&k)<<endl;
}

Output:

class version
8
pointer version
52
const pointer version
3

(2) Adding an Intermediate Layer

In the STL, what is Traits? See the diagram below:

By using an intermediate layer iterator_traits, we fix the form of func and greatly reduce the duplicated code. The only thing we need to do is to slightly specialize iterator_traits to support pointers and const pointers:)

#include <iostream>

template <class T>
struct MyIter {
    typedef T value_type; // nested type declaration
    T* ptr;
    MyIter(T* p = 0) : ptr(p) {}
    T& operator*() const { return *ptr; }
};
// class type
template <class T>
struct iterator_traits {
    typedef typename T::value_type value_type;
};
// Partial specialization 1
template <class T>
struct iterator_traits<T*> {
    typedef T value_type;
};
// Partial specialization 2
template <class T>
struct iterator_traits<const T*> {
    typedef T value_type;
};

template <class I>
typename iterator_traits<I>::value_type
// First, ask iterator_traits<I>::value_type. If I is a pointer, enter the specialized version, and iterator_traits directly answers; if I is a class type, ask T::value_type.
func(I ite) {
    std::cout << "normal version" << std::endl;
    return *ite;
}
int main() {
    // ...
    MyIter<int> ite(new int(8));
    std::cout << func(ite)<<std::endl;
    int *p = new int(52);
    std::cout<<func(p)<<std::endl;
    const int k = 3;
    std::cout<<func(&k)<<std::endl;
}

The above process first asks iterator_traits<I>::value_type. If I is a pointer, it enters the specialized version, and iterator_traits directly answers T; if I is a class type, it continues to ask T::value_type.

In simpler terms, the algorithm (func) asks the iterator_traits (me), but when iterator_traits (me) realizes that it has a pointer in hand, it answers on its behalf. If it is a class type, iterator_traits (me) continues to ask (him—T::value_type).

Summary: By defining nested types, we obtain a method to know the element type pointed by an iterator. Through the traits technique, we unify the definition of function templates for raw pointers and custom iterators. We mainly use the traits technique to solve the code redundancy caused by the differences between raw pointers and custom iterators. That's the beauty of traits technique.

Recommended Book:

"STL源码剖析" by 侯捷

Reference Articles:

https://juejin.im/post/5b1a43fb51882513bf1795c6

https://www.cnblogs.com/mangoyuan/p/6446046.html

http://www.cppblog.com/nacci/archive/2005/11/03/911.aspx