Extendible hashing

Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation (done one bucket at a time, as needed). This means that time-sensitive applications are less affected by table growth than they would be by standard full-table rehashes.

Extendible hashing was described by Ronald Fagin in 1979. Practically all modern filesystems use either extendible hashing or B-trees. In particular, the Global File System, ZFS, and the SpadFS filesystem use extendible hashing.

Assume that the hash function $h(k)$ returns a string of bits. The first i bits of each string are used as an index that determines where the key goes in the "directory" (hash table). Additionally, i is the smallest number such that the index of every item in the table is unique.
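The prefix rule can be illustrated with a minimal Python sketch. It assumes hashes are given as equal-length strings of '0' and '1' and that the bucket size is 1, so every key needs its own index. The function names and the sample bit strings are hypothetical and chosen only for illustration.

```python
def directory_index(hash_bits: str, depth: int) -> str:
    """Return the first `depth` bits of a hash given as a bit string."""
    return hash_bits[:depth]


def smallest_distinguishing_depth(hashes: list[str]) -> int:
    """Return the smallest i such that the first i bits of every hash differ."""
    width = min(len(h) for h in hashes)
    for depth in range(width + 1):
        prefixes = {directory_index(h, depth) for h in hashes}
        if len(prefixes) == len(hashes):
            return depth
    raise ValueError("hashes cannot be distinguished by any prefix")


# Hypothetical hash values: two keys that differ in their first bit,
# then a third key that shares its first bit with one of them.
two_keys = ["100100", "010110"]
three_keys = two_keys + ["110110"]

depth_for_two = smallest_distinguishing_depth(two_keys)
depth_for_three = smallest_distinguishing_depth(three_keys)
```

With these sample values one bit is enough for the first two keys, while the third key forces the use of two bits.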

Keys to be used:

Let's assume that for this particular example, the bucket size is 1. The first two keys to be inserted, k₁ and k₂, can be distinguished by the most significant bit, and would be inserted into the table as follows:

Now, if k₃ were to be hashed to the table, one bit would no longer be enough to distinguish all three keys (because both k₃ and k₁ have 1 as their leftmost bit). Also, because the bucket size is one, the bucket would overflow. Because comparing the first two most significant bits would give each key a unique location, the directory size is doubled as follows:

Now k₁ and k₃ each have a unique location, being distinguished by their first two bits. Because k₂ is in the top half of the table and no other key begins with 0, both 00 and 01 point to it.

The above example is from Fagin et al. (1979).
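The directory doubling in the walk-through can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class and method names are hypothetical, the hashes are passed in directly as bit strings, and the concrete hash values are assumptions chosen only to match the bit patterns described above (k₁ and k₃ begin with 1, k₂ begins with 0, and k₁ and k₃ differ in their second bit).

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth  # number of leading bits shared by all items
        self.items = {}                 # hash bit string -> key name


class ExtendibleHashTable:
    def __init__(self, bucket_size: int = 1, hash_width: int = 6):
        self.bucket_size = bucket_size
        self.hash_width = hash_width
        self.global_depth = 0           # number of leading bits used by the directory
        self.directory = [Bucket(0)]    # 2**global_depth slots

    def _slot(self, bits: str) -> int:
        # Interpret the first global_depth bits as a directory slot number.
        return int(bits[:self.global_depth], 2) if self.global_depth else 0

    def lookup(self, bits: str):
        return self.directory[self._slot(bits)].items.get(bits)

    def insert(self, bits: str, key: str) -> None:
        while True:
            bucket = self.directory[self._slot(bits)]
            if bits in bucket.items or len(bucket.items) < self.bucket_size:
                bucket.items[bits] = key
                return
            if bucket.local_depth == self.hash_width:
                raise ValueError("bucket cannot be split any further")
            if bucket.local_depth == self.global_depth:
                # Double the directory: old slot j becomes slots 2j and 2j+1,
                # both still pointing at the same bucket.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.global_depth += 1
            self._split(bucket)

    def _split(self, bucket: Bucket) -> None:
        # Only the overflowing bucket is rehashed; all others are untouched.
        depth = bucket.local_depth + 1
        zero, one = Bucket(depth), Bucket(depth)
        for bits, key in bucket.items.items():
            (one if bits[depth - 1] == "1" else zero).items[bits] = key
        shift = self.global_depth - depth
        for slot, b in enumerate(self.directory):
            if b is bucket:
                self.directory[slot] = one if (slot >> shift) & 1 else zero


# Assumed hash values, consistent with the walk-through but not taken from it.
table = ExtendibleHashTable(bucket_size=1)
table.insert("100100", "k1")
table.insert("010110", "k2")
table.insert("110110", "k3")
```

After the three insertions the directory uses two bits. Slots 00 and 01 share the bucket holding k₂, while slots 10 and 11 hold k₁ and k₃ respectively. Only the bucket that overflowed was split, which is the incremental behaviour described in the introduction.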

...
Wikipedia