Data Structures 101 - Hash Table implementation - part 2


In the first installment, we highlighted the main concepts and features of a Hash Table.

Using that previous knowledge as our starting point, today we are going to show an implementation in Ruby as part of the DataStructures101 gem.

The gem has two user-facing classes for hash table operations:

  • ChainedHashTable, which uses separate chaining for collision resolution.
  • ProbeHashTable, which uses linear probing and its variants as its collision-handling scheme.

Both classes rely on a common base class named BaseHashTable, which implements the public interface for interacting with a hash table object plus some necessary machinery such as resize. In turn, BaseHashTable depends on its child implementations to decide how to do the actual search, insertion and deletion of keys.
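To make that contract concrete, here is a minimal skeleton (a sketch, not code from the gem) of what a concrete table has to supply on top of BaseHashTable; the bucket_* hook names are the ones used throughout the implementations below.

class MyHashTable < BaseHashTable
  private

  # Return the value stored for key at the position for hash_code, or nil.
  def bucket_find(hash_code, key); end

  # Store (key, value); return the previous value for key, or nil if it is new.
  def bucket_insert(hash_code, key, value); end

  # Remove key; return the value that was stored, or nil if the key is absent.
  def bucket_delete(hash_code, key); end

  # Yield every (key, value) pair currently stored.
  def bucket_each; end
end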

BaseHashTable

Our base hash table class handles the creation of the table array and the resize logic, and it stores a lambda function that performs the compression step on the keys' hash codes.

Why a lambda function?

For a production-ready hash table implementation, we wouldn’t use a lambda function, since it adds a small performance overhead and can sometimes reduce the readability of the code. Instead, we would use the MAD (multiply-add-and-divide) method directly within the code, without even making the external interface aware of it. But for our purposes it works just great, because it allows us to swap in different compression functions and thus makes it easier to analyze and compare them.
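For instance, here is a minimal sketch (assuming the ChainedHashTable class presented below is loaded; namespaces omitted) of swapping in the classic division method instead of MAD:

# Division method: a simpler (and weaker) alternative compression function.
# It receives the key and the current capacity, just like the default lambda.
division = ->(key, cap) { key.hash.abs % cap }

table = ChainedHashTable.new(compression_lambda: division)
table[:ruby] = 'gem'
table[:ruby] # => "gem"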

Implementation Details

The BaseHashTable constructor receives the initial capacity for the bucket table, a prime number and a compression lambda. If the compression lambda is nil, the prime number is used to generate the default compression function via the MAD method.

def initialize(capacity:, prime:, compression_lambda:)
  @capacity = capacity
  @size = 0
  @table = Array.new(@capacity)

  @compression_lambda = compression_lambda

  return unless @compression_lambda.nil?

  random = Random.new
  scale = random.rand(prime - 1) + 1
  shift = random.rand(prime)
  @compression_lambda = lambda do |key, cap|
    return (((key.hash * scale + shift) % prime) % cap).abs
  end
end

There are three main things worth pointing out:

  • Our implementation relies on the objects used as keys defining a proper hash code (see the sketch right after this list).
  • Since the lambda acts as a closure, we need to pass the capacity of the array as a parameter; otherwise, it would get stuck with whatever value the capacity had at the moment the lambda was created, and we would end up with extra slots in the array that would never be used.
  • If we ever resized beyond the prime number, we would run into a similar problem: we would waste space in the array that would never be filled.
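Regarding the first point, here is a minimal sketch of a custom key class (a hypothetical Point, not part of the gem) that plays nicely with the table: it defines hash for the compression step and ==/eql? for the key comparisons done inside the buckets.

class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Equal points must produce equal hash codes.
  def hash
    [x, y].hash
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias eql? ==
end

Point.new(1, 2).hash == Point.new(1, 2).hash # => true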

The insert method delegates the actual insertion to the child implementation (bucket_insert) and then resizes the bucket table if the size grows beyond half of the current capacity (this guarantees that probing strategies can always insert successfully). Finally, it returns the previous value, if any.

def insert(key, value)
  hash_code = compression_lambda.call(key, @capacity)
  old_value = bucket_insert(hash_code, key, value)

  # keep load factor <= 0.5
  resize(new_capacity) if @size > @capacity / 2

  old_value
end
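To make the resize numbers concrete, here is a small sketch (assuming the ChainedHashTable shown later, with its default capacity of 31):

table = ChainedHashTable.new # default capacity: 31
16.times { |i| table.insert(i, i.to_s) }

# The 16th insertion pushes the size past 31 / 2 = 15 (integer division),
# so the table grows to new_capacity = 2 * 31 - 1 = 61.
table.size     # => 16
table.capacity # => 61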

The other two user-facing methods, [] and delete, are simply wrappers around the child implementations.

def [](key)
  bucket_find(compression_lambda.call(key, @capacity), key)
end

def delete(key)
  bucket_delete(compression_lambda.call(key, @capacity), key)
end

To increase the usability of the class, we have included the Enumerable module and implemented the each method as follows.

def each
  return enum_for(:each) unless block_given?

  bucket_each do |key, value|
    yield(key, value)
  end
end
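With each in place, every Enumerable method works out of the box. A quick sketch (again assuming one of the concrete classes below):

table = ChainedHashTable.new
table[:a] = 1
table[:b] = 2

table.to_a                       # => [[:a, 1], [:b, 2]] (order may vary)
table.map { |k, v| [k, v * 10] } # => [[:a, 10], [:b, 20]] (order may vary)
table.count                      # => 2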

The final BaseHashTable implementation is shown below, including the private methods new_capacity and resize.

class BaseHashTable
  include Enumerable

  attr_reader :size, :compression_lambda, :capacity

  def initialize(capacity:, prime:, compression_lambda:)
    @capacity = capacity
    @size = 0
    @table = Array.new(@capacity)

    @compression_lambda = compression_lambda

    return unless @compression_lambda.nil?

    random = Random.new
    scale = random.rand(prime - 1) + 1
    shift = random.rand(prime)
    @compression_lambda = lambda do |key, cap|
      return (((key.hash * scale + shift) % prime) % cap).abs
    end
  end

  def []=(key, value)
    insert(key, value)
  end

  def insert(key, value)
    hash_code = compression_lambda.call(key, @capacity)
    old_value = bucket_insert(hash_code, key, value)

    # Keep load factor <= 0.5.
    resize(new_capacity) if @size > @capacity / 2

    old_value
  end

  def [](key)
    bucket_find(compression_lambda.call(key, @capacity), key)
  end

  def delete(key)
    bucket_delete(compression_lambda.call(key, @capacity), key)
  end

  def each
    return enum_for(:each) unless block_given?

    bucket_each do |key, value|
      yield(key, value)
    end
  end

  private

  def new_capacity
    2 * @capacity - 1
  end

  def resize(new_cap)
    @capacity = new_cap

    buffer = map { |key, value| [key, value] }

    @table = Array.new(@capacity)
    @size = 0

    buffer.each { |key, value| self[key] = value }
  end
end

ChainedHashTable

The ChainedHashTable class implements separate chaining as its collision-handling scheme. To achieve this, it uses a support class named Bucket, which handles the storage and retrieval of (key, value) pairs that map to the same hash code.

Bucket Utility Class

The Bucket class uses an array as internal storage, keeping at each position a two-element array [key, value]. Next, we highlight some details about the main methods.

The insert method, shown below, appends a new pair at the end of the array and returns nil if the key is not present in the bucket; otherwise, it swaps the new value into the existing pair and returns the old value.

def insert(key, value)
  idx = @table.find_index { |table_key, _| table_key == key }

  if idx.nil?
    @table << [key, value]
    return nil
  else
    value, @table[idx][1] = @table[idx][1], value
    return value
  end
end

The delete method removes an existing key from the bucket and returns the associated value if the key is found; otherwise, it returns nil. One subtlety worth mentioning is that it moves the last entry into the position of the entry being removed. This way it can call pop (which runs in constant time) instead of calling delete_at with the idx variable, avoiding a left shift of the remaining values (which costs \(O(n)\)).

def delete(key)
  idx = @table.find_index { |table_key, _| table_key == key }
  return nil if idx.nil?

  value = @table[idx].last
  @table[idx] = @table.last if idx != @table.size - 1
  @table.pop
  value
end

To finish the Bucket class analysis, the complete code is shown below:

class Bucket
  attr_reader :table

  def initialize
    @table = []
  end

  def [](key)
    find(key)
  end

  def []=(key, value)
    insert(key, value)
  end

  def insert(key, value)
    idx = @table.find_index { |table_key, _| table_key == key }

    if idx.nil?
      @table << [key, value]
      return nil
    else
      value, @table[idx][1] = @table[idx][1], value
      return value
    end
  end

  def size
    @table.size
  end

  def find(key)
    pair = @table.find { |table_key, _| table_key == key }
    pair.nil? ? nil : pair.last
  end

  def delete(key)
    idx = @table.find_index { |table_key, _| table_key == key }
    return nil if idx.nil?

    value = @table[idx].last
    @table[idx] = @table.last if idx != @table.size - 1
    @table.pop
    value
  end

  def each
    return enum_for(:each) unless block_given?

    @table.each do |key, value|
      yield(key, value)
    end
  end
end
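A quick sketch of the Bucket semantics described above (assuming the class is loaded; namespace omitted):

bucket = Bucket.new

bucket.insert(:a, 1) # => nil, new key appended
bucket.insert(:a, 2) # => 1, previous value returned and replaced
bucket[:b] = 3

bucket.delete(:a)    # => 2; [:b, 3] is swapped into slot 0, then pop removes the tail
bucket.find(:b)      # => 3
bucket.size          # => 1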

bucket_* Methods

The ChainedHashTable class relies on the Bucket class discussed above for the actual implementation of its main operations, namely find, insert and delete.

The bucket_find logic is rather simple: it checks whether the table has a valid bucket at the given position and, if it does, returns the result of calling find on the bucket; otherwise, it returns nil.

def bucket_find(hash_code, key)
  bucket = @table[hash_code]
  return nil if bucket.nil?

  bucket.find(key)
end

bucket_insert checks whether there is already a valid bucket at the target position (creating one if needed) and then performs the insertion within the bucket. It updates the size by adding the difference between the bucket's new size and its previous size (either 0 or 1) and returns the result of the insert call on the bucket object.

def bucket_insert(hash_code, key, value)
  bucket = @table[hash_code]
  bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?

  old_size = bucket.size
  old_value = bucket.insert(key, value)
  @size += (bucket.size - old_size)

  old_value
end

And finally, bucket_delete verifies that there is a bucket at the hash_code position and, if so, deletes the key from the bucket, adjusts the size, and returns the result of that action.

def bucket_delete(hash_code, key)
  bucket = @table[hash_code]
  return nil if bucket.nil?

  old_size = bucket.size
  value = bucket.delete(key)
  @size -= (old_size - bucket.size)

  value
end

To end the ChainedHashTable section, the complete code is shown below:

class ChainedHashTable < Hash::BaseHashTable
  def initialize(capacity: 31, prime: 109_345_121, compression_lambda: nil)
    super
  end

  private

  def bucket_find(hash_code, key)
    bucket = @table[hash_code]
    return nil if bucket.nil?

    bucket.find(key)
  end

  def bucket_insert(hash_code, key, value)
    bucket = @table[hash_code]
    bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?

    old_size = bucket.size
    old_value = bucket.insert(key, value)
    @size += (bucket.size - old_size)

    old_value
  end

  def bucket_delete(hash_code, key)
    bucket = @table[hash_code]
    return nil if bucket.nil?

    old_size = bucket.size
    value = bucket.delete(key)
    @size -= (old_size - bucket.size)

    value
  end

  def bucket_each
    @table.each do |bucket|
      next if bucket.nil?

      bucket.each do |key, value|
        yield(key, value)
      end
    end
  end
end
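As a quick usage sketch (assuming the class above is loaded; namespace omitted):

table = ChainedHashTable.new

table[:language] = 'Ruby'
table.insert(:language, 'Crystal') # => "Ruby" (previous value returned)
table[:year] = 2018

table[:language]    # => "Crystal"
table.delete(:year) # => 2018
table[:year]        # => nil
table.size          # => 1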

ProbeHashTable

The ProbeHashTable class implements linear probing by default as its collision-resolution strategy, but it has a probe_lambda parameter in the constructor that allows us to pass in another probing scheme as a lambda function. In turn, this makes it easy to try out different alternatives.

A point worth highlighting is the use of a private singleton class, shown below, to differentiate positions where pairs have been deleted from nil (empty) positions. This is necessary in order to properly use probing strategies to handle collisions.

require 'singleton' # the Singleton module comes from Ruby's standard library

Sentinel = Class.new do
  include Singleton
end
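To see why the marker matters, here is a small sketch (assuming the ProbeHashTable class is loaded; a constant compression lambda is passed in only to force every key into the same slot):

same_slot = ->(_key, _cap) { 0 }
table = ProbeHashTable.new(compression_lambda: same_slot)

table[:a] = 1    # lands in slot 0
table[:b] = 2    # collides, linear probing places it in slot 1
table.delete(:a) # => 1; slot 0 now holds the Sentinel, not nil

table[:b]        # => 2; the probe walks past the Sentinel instead of stopping at slot 0

If slot 0 were simply reset to nil, the search for :b would stop there and wrongly report the key as missing.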

bucket_* Methods

The main method implementations rely on the private method find_slot, which finds the corresponding index location for a given hash_code.

The bucket_find logic is easily understood: it calls find_slot to get the index location and returns nil if that position is available, or the corresponding value for the key otherwise.

def bucket_find(hash_code, key)
  idx = find_slot(hash_code, key)

  slot_available?(idx) ? nil : @table[idx].last
end

Similarly, the bucket_insert method finds the slot to be used and either replaces or inserts the new (key, value) pair, depending on the availability of the index position returned by the find_slot call.

def bucket_insert(hash_code, key, value)
  idx = find_slot(hash_code, key)

  unless slot_available?(idx)
    old_value = @table[idx].last
    @table[idx] = [key, value]
    return old_value
  end

  @table[idx] = [key, value]
  @size += 1

  nil
end

Last, but not least, bucket_delete returns nil if find_slot returns an available slot; otherwise, it places the special sentinel mark at the index position and returns the value that was stored there.

def bucket_delete(hash_code, key)
  idx = find_slot(hash_code, key)
  return nil if slot_available?(idx)

  value = @table[idx].last
  @table[idx] = Sentinel.instance
  @size -= 1

  value
end

To end the discussion of the ProbeHashTable class, the complete implementation is shown below, including the private helpers find_slot and slot_available? mentioned above:

class ProbeHashTable < Hash::BaseHashTable
  attr_reader :probe_lambda

  def initialize(capacity: 31, prime: 109_345_121,
                 compression_lambda: nil, probe_lambda: nil)
    super(capacity: capacity, prime: prime,
          compression_lambda: compression_lambda)

    @probe_lambda = probe_lambda

    return unless @probe_lambda.nil?

    @probe_lambda = ->(h, i, cap) { return (h + i) % cap }
  end

  private

  Sentinel = Class.new do
    include Singleton
  end

  def bucket_find(hash_code, key)
    idx = find_slot(hash_code, key)

    slot_available?(idx) ? nil : @table[idx].last
  end

  def bucket_insert(hash_code, key, value)
    idx = find_slot(hash_code, key)

    unless slot_available?(idx)
      old_value = @table[idx].last
      @table[idx] = [key, value]
      return old_value
    end

    @table[idx] = [key, value]
    @size += 1

    nil
  end

  def bucket_delete(hash_code, key)
    idx = find_slot(hash_code, key)
    return nil if slot_available?(idx)

    value = @table[idx].last
    @table[idx] = Sentinel.instance
    @size -= 1

    value
  end

  def bucket_each
    @table.each do |elem|
      next if elem.nil? || elem == Sentinel.instance

      yield(elem.first, elem.last)
    end
  end

  def find_slot(h, key)
    idx = -1

    j = 0
    loop do
      i = @probe_lambda.call(h, j, @capacity)
      if slot_available?(i)
        idx = i if idx == -1
        break if @table[i].nil?
      elsif @table[i].first == key
        return i
      end
      j += 1
    end

    idx
  end

  def slot_available?(i)
    @table[i].nil? || @table[i] == Sentinel.instance
  end
end
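To close, here is a sketch of swapping in a different probing scheme (assuming the class above is loaded; namespace omitted). The default is linear probing, ->(h, i, cap) { (h + i) % cap }; here we pass quadratic probing instead:

# Quadratic probing: on the i-th retry, jump by i^2 instead of i.
quadratic = ->(h, i, cap) { (h + i * i) % cap }

table = ProbeHashTable.new(probe_lambda: quadratic)
table[:ruby]  = 'gem'
table[:probe] = 'quadratic'

table[:ruby]         # => "gem"
table.delete(:probe) # => "quadratic"
table.size           # => 1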

Hey!! You made it this far!! Get yourself a cookie :). This post was full of chunks of code showing how the DataStructures101 gem implements a hash table data structure. Our next post will close this series about hash tables, providing several analyses and drawing some conclusions from them.

Thanks for reading till the very end. I hope it was as worthwhile for you as it was for me to create. See you soon and stay tuned for more!!

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.