Quick Index
Efficiency Problem


Let us first discuss the purpose for rehashing via problem solution.

Consider a hash table with no rehash:


🐞 This is a problem. The hash table promises O(1) efficiency, but now it will need to scan/iterate through a million elements in a bucket to find a key. Efficiency has degraded to O(n).

Solution -- Rehash


The efficiency of a hash tables is dependent on small bucket sizes.

That is the reason we rehash. We rebuild the hash table with more buckets and then re-insert all the elements (key-values) into the new buckets.

We do this so the "problem case" we discussed does not occur. Small bucket sizes (one or two elements) is crucial -- to avoid any chance of scanning searches through long lists.

Note we discussed rehash here in terms of "growth" (this is most typical). However, optionally, a hash table may rehash to "shrink" if it is very sparse (has a lot of wasted/empty space).

Max Load Ratio (Load Factor)


The max load ratio helps ensure high performance of the hash table.

In various source literature we will also see the max load ratio referred to as the load factor.

When the actual load ratio goes above the max, we rehash (grow) the hash table.

A common default maxLoadRatio is 0.75.

Load Ratio (Rehash) Examples


Legend
size = number of keys in structure
length = length of table (i.e. number of buckets)
loadRatio = size / length
maxLoadRatio = ratio that triggers a rehash - when exceeded
Example 1
  • We're using maxLoadRatio = 0.75
  • Here we have a table of length 16 buckets
  • Keys are added and the loadRatio increases (recall that loadRatio mean the average number of keys per bucket)
  • When the 13th key is added, then loadRatio > maxLoadRatio, which triggers a rehash
sizelengthloadRatiomaxLoadRatio
1160.060.75
2160.130.75
10160.630.75
11160.690.75
12160.750.75
13160.810.75
Example 2
  • We're using maxLoadRatio = 0.75
  • Here we have a table of length 512 buckets
  • Keys are added and the loadRatio increases (recall that loadRatio mean the average number of keys per bucket)
  • When the 385th key is added, then loadRatio > maxLoadRatio, which triggers a rehash
sizelengthloadRatiomaxLoadRatio
3805120.7420.75
3815120.7440.75
3825120.7460.75
3835120.7480.75
3845120.7500.75
3855120.7520.75


Min Load Ratio


Optionally we may also choose to rehash if the load ratio goes below a given minimum load ratio.

We would rehash in this case to save space. This rehash would not impact search performance.

A common default minLoadRatio is 0.1.

Rehashing Steps


Rehashing is a two-step process:


The purpose is to improve the load ratio.

Auto-Rehashing


Hash table auto-rehashing does the following:


Note that "grow" is fundamental to maintain performance and "shrink" is optional to reduce wasted space.

Generally when we grow the table we double the size. If we shrink the table (for min load ratio) we would half the size.