Introduction

Dictionary is a type of data structure that supports fast retrieval of key-value pairs. Hashtable, B-tree and binary search trees are among the most frequently used dictionary data structures.

This page benchmarks various C/C++ dictionary libraries. You may see a similar page elsewhere, but I can assure you that this page is the origin and the latest.


Design of the Benchmark

Data generation

To benchmark dictionary with integer keys, a random integer array of 5 million elements is generated with about 1.25 million distinct keys. Each element is then tested whether it is present in the dictionary. If the element is in the dictionary, it will be removed; otherwise, it will be inserted. 625,792 distinct keys will be left in the dictionary after this process. To benchmark dictionary with string keys, the same integer array is converted to char* array using sprintf().

Configuration and compilation

All programs are compiled on Intel-Mac (OS X 10.5, 32-bit gcc-4.2) and Intel-Linux (Debian etch, 64-bit gcc-4.1) with the command-line option: `-g -Wall -O2 -fomit-frame-pointer'.

Source code

Benchmark programs are available as udb-latest.tar.bz2. This tar-ball also includes a small program that checks the peak memory usage and a table in Numbers from which the plots below were generated. When you extract the tar-ball, directories with name beginning with underscore contain programs which require pre-installation of libraries; other directories can be compiled straight out of box.

Included libraries

Hash table


B-tree


Red-black tree


AVL tree


Other binary search tree


Results

The following figures show the results which are also available as this PDF file where the data table is also included.

Discussions

C pointers vs. C macros vs. C++ templates

For generic programming in C, people usually use `void*' pointers to work with type-independent data. However, this always comes with the cost of a pointer for each key. For small keys like integers, the cost is considerable. In C, a better way is to use macros. This way also avoids the overhead on retrieving the data pointed by `void*' pointers. In most cases, implementations with macros achieve the same speed as fully optimized type-specific implementations. To some extend, C++ templates extend the ability of C macros. They are usually more convenient, but on efficiency they do not do better than C macros.

In this benchmark, libavl_rb_cpp is a literal translation of libavl_rb by replacing `void*' pointers to keys with actual keys. This simple change makes libavl_rb_cpp both faster and more light-weight.