Skip to content

MultiDictionary

Kal Ahmed edited this page Sep 26, 2016 · 1 revision

Home > Collections > MultiDictionary

MultiDictionary

The MultiDictionary is a fast and efficient implementation of the standard .Net IDictionary interface designed primarily for use as an indexing structure. It provides essentially the same behaviour as the standard .Net dictionary but alters how collisions are handled compared to the standard implementation.

It's performance is comparable (if marginally slower) to the standard implementation if your data is well distributed and has minimal hash code collisions. If your data has a lot of collisions then you may see significant performance improvements by using the MultiDictionary (in the pathological case where every key has the same hash code this can be orders of magnitude faster).

Note that regardless of how good your hash function is .Net only uses a 32 bit hash code so as the number of data items you are handling increases the chance of a hash collision starts to get very high. This article by Jeff Preshing covers the math behind this but the basic takeaway is that with a 32 bit hash code once you have more than 77,163 items in your collection the chance of a collision is 50%. Therefore even if you are not interested in the indexing use case MultiDictionary it may still be valuable if you have large data sets.

Use for Indexing

It's performance characteristics (see Complexity section for further details) make it particularly well suited to use as an indexing structure. This is because in indexing scenarios we are typically indexing objects by partial characteristics so it is far more likely (if not guaranteed) that we will see plenty of hash code collisions. The standard Dictionary implementation handles these poorly and so for indexing scenarios we would strongly recommend our MultiDictionary over the standard .Net version.

Complexity

In complexity terms the performance for various operations is shown in the subsequent table, we include comparisons for the standard .Net implementation. Best case performance assumes a perfect key distribution with no collisions, worst case performance assumes a pathological key distribution where all keys collide.

Operation MultiDictionary Best Case MultiDictionary Worst Case Dictionary Best Case Dictionary Worst Case
Add() O(1) O(log n) O(1) O(n)
ContainsKey() and this[] O(1) O(log n) O(1) O(n)
Remove() O(1) O(log n) O(1) O(n)

As can be seen the MultiDictionary has equivalent best case performance and significantly better worst case performance.

Usage

A MultiDictionary may be used exactly as a normal dictionary would since it implements the same IDictionary interface. The main thing to consider is the options you provide when constructing a MultiDictionary since these can dramatically change it's performance under different usage scenarios.

We'll look at the full constructor for completeness, though there are overloads that allow only the pertinent options to be set.

public MultiDictionary(Func<TKey, int> hashFunction, IComparer<TKey> comparer, MultiDictionaryMode mode)

The constructor takes a function that generates the hash codes for the keys, a comparer which determines whether keys are equivalent and a mode which controls internal data structures.

The choice of hash code and comparer can be very important to performance. What you choose will depend on your intended usage and the type of the keys you are using. When using constructors that don't specify these parameters the default is to simply use the types GetHashCode() implementation and the IComparer.Default() for the type.

The choice of mode should be carefully considered depending on how you intend to use your MultiDictionary. In most cases using MultiDictionaryMode.AVL will be the best option and give the best balance of performance. MultiDictionaryMode.Unbalanced will improve ingest performance at the cost of search and delete performance. Finally MultiDictionaryMode.Scapegoat will give roughly equivalent performance to MultiDictionaryMode.AVL but will suffer occasional drops in performance when periodic re-balances are necessary since these are costly O(n) operations.

When using a constructor that does not specify a mode MultiDictionaryMode.AVL is used.

Clone this wiki locally