In the first installment, we highlighted the main concepts and features of a Hash Table.

Using that previous knowledge as our starting point, today we are going to walk through an implementation in Ruby as part of the DataStructures101 gem.

The gem has two (2) user-facing classes for hash table related operations:

• ChainedHashTable, which uses separate chaining for collision resolution.
• ProbeHashTable, which uses linear probing (and its variants) as its collision-handling scheme.

Both classes rely on a common base class named BaseHashTable, which implements the public interface for interacting with a hash table object along with some shared machinery such as resize. In turn, BaseHashTable depends on its child implementations to decide how to perform the actual search, insertion and deletion of keys.

## BaseHashTable

Our base HashTable class handles the table array creation and the resize logic, and it stores a lambda function that performs the compression step on the keys' hash codes.

#### Why a lambda function?

For a production-ready HashTable implementation, we wouldn't use a lambda function, since it adds a small performance overhead and can sometimes reduce the readability of the code. Instead, we would use the MAD method directly within the code, without even making the external interface aware of it. But for our purposes it works just great, because it allows us to swap in different compression function implementations and thus makes it easier to analyze and compare them.

#### Implementation Details

The BaseHashTable class receives as constructor parameters the initial capacity for the bucket table, a prime number and a compression lambda. If the compression lambda is nil, then the prime number is used to generate the default compression function via the MAD (Multiply-Add-and-Divide) method.

```ruby
def initialize(capacity:, prime:, compression_lambda:)
  @capacity = capacity
  @size = 0
  @table = Array.new(@capacity)
  @compression_lambda = compression_lambda
  return unless @compression_lambda.nil?

  random = Random.new
  scale = random.rand(prime - 1) + 1
  shift = random.rand(prime)
  @compression_lambda = lambda do |key, cap|
    return (((key.hash * scale + shift) % prime) % cap).abs
  end
end
```
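Since the compression lambda is swappable, we can sketch how two different implementations plug into the same call shape. The names below (mad, naive) and the fixed scale/shift values are ours for illustration, not part of the gem:

```ruby
# Two interchangeable compression lambdas following the gem's
# ->(key, cap) call shape. Names and constants here are illustrative.
prime = 109_345_121
scale = 7
shift = 3

# MAD-style compression, mirroring the default built in the constructor.
mad = lambda do |key, cap|
  (((key.hash * scale + shift) % prime) % cap).abs
end

# A naive alternative: plain modulo on the hash code.
naive = ->(key, cap) { key.hash % cap }

capacity = 31
[mad, naive].each do |compress|
  idx = compress.call(:some_key, capacity)
  # Every compressed index must land inside the table bounds.
  raise 'index out of range' unless (0...capacity).cover?(idx)
end
```

Either lambda could be handed to the constructor through compression_lambda, which is exactly what makes side-by-side comparisons easy.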

There are three main things worth pointing out:

• Our implementation relies on the objects used as keys to define a proper hash code.
• Since the lambda acts as a closure, we need to pass the capacity of the array as a parameter; otherwise, it would get stuck with whatever value was in scope at the moment of the lambda's creation, and after a resize we would end up with extra slots in the array that would never be used.
• If we ever resized beyond the prime number, we would hit a similar problem: we would waste space in the array that would never be filled.
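To make the second point concrete, here is a minimal sketch (not gem code) showing that a lambda taking the capacity as a parameter keeps producing valid indices as the capacity grows, with no stale value baked in:

```ruby
# Because the table passes its current capacity on every call,
# the same lambda remains correct after a resize.
compress = ->(key, cap) { key.hash % cap }

before = compress.call(:user_42, 8)   # valid index for a table of 8 slots
after  = compress.call(:user_42, 16)  # still valid after "resizing" to 16

raise 'out of range' unless (0...8).cover?(before)
raise 'out of range' unless (0...16).cover?(after)
```

Note that in Ruby the % operator with a positive modulus yields a non-negative result even for a negative receiver, so both indices stay in range regardless of the sign of the hash code.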

The insert method delegates the insertion to the child implementation (bucket_insert) and then resizes the bucket table if the size is larger than half of the current capacity (this guarantees that probing strategies can always insert successfully). Finally, it returns the previous value, if any.

```ruby
def insert(key, value)
  hash_code = compression_lambda.call(key, @capacity)
  old_value = bucket_insert(hash_code, key, value)
  # Keep load factor <= 0.5.
  resize(new_capacity) if @size > @capacity / 2
  old_value
end
```

The other two user-facing methods, [] and delete, are simply wrappers around the child implementations.

```ruby
def [](key)
  bucket_find(compression_lambda.call(key, @capacity), key)
end

def delete(key)
  bucket_delete(compression_lambda.call(key, @capacity), key)
end
```

To increase the usability of the class, we have included the Enumerable module and implemented the each method as follows.

```ruby
def each
  return enum_for(:each) unless block_given?

  bucket_each do |key, value|
    yield(key, value)
  end
end
```
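To see what Enumerable buys us once each is in place, here is a minimal stand-in class (ours, not the gem's) that honors the same each contract:

```ruby
# Minimal stand-in showing the Enumerable payoff: implement each
# (yielding key/value pairs) and the rest comes for free.
class TinyTable
  include Enumerable

  def initialize(pairs)
    @pairs = pairs
  end

  def each
    return enum_for(:each) unless block_given?

    @pairs.each { |key, value| yield(key, value) }
  end
end

table = TinyTable.new([[:a, 1], [:b, 2]])
table.count                   # => 2, courtesy of Enumerable
table.map { |_k, v| v * 10 }  # => [10, 20]
table.to_a                    # => [[:a, 1], [:b, 2]]
```

The enum_for guard also means calling each without a block returns an Enumerator, so lazy chains like table.each.first work as expected.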

The final BaseHashTable implementation is shown below, including the private methods new_capacity and resize.

```ruby
class BaseHashTable
  include Enumerable

  attr_reader :size, :compression_lambda, :capacity

  def initialize(capacity:, prime:, compression_lambda:)
    @capacity = capacity
    @size = 0
    @table = Array.new(@capacity)
    @compression_lambda = compression_lambda
    return unless @compression_lambda.nil?

    random = Random.new
    scale = random.rand(prime - 1) + 1
    shift = random.rand(prime)
    @compression_lambda = lambda do |key, cap|
      return (((key.hash * scale + shift) % prime) % cap).abs
    end
  end

  def []=(key, value)
    insert(key, value)
  end

  def insert(key, value)
    hash_code = compression_lambda.call(key, @capacity)
    old_value = bucket_insert(hash_code, key, value)
    # Keep load factor below 0.5.
    resize(new_capacity) if @size > @capacity / 2
    old_value
  end

  def [](key)
    bucket_find(compression_lambda.call(key, @capacity), key)
  end

  def delete(key)
    bucket_delete(compression_lambda.call(key, @capacity), key)
  end

  def each
    return enum_for(:each) unless block_given?

    bucket_each do |key, value|
      yield(key, value)
    end
  end

  private

  def new_capacity
    2 * @capacity - 1
  end

  def resize(new_cap)
    @capacity = new_cap
    buffer = map { |key, value| [key, value] }
    @table = Array.new(@capacity)
    @size = 0
    buffer.each { |key, value| self[key] = value }
  end
end
```
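One detail worth tracing in the private helpers: new_capacity roughly doubles the table while keeping the capacity odd. Starting from a capacity of 31 (the default used by the concrete classes later in this post), repeated resizes produce:

```ruby
# Tracing the growth rule from new_capacity: 2 * capacity - 1.
new_capacity = ->(cap) { 2 * cap - 1 }

caps = [31]
3.times { caps << new_capacity.call(caps.last) }

caps               # => [31, 61, 121, 241]
caps.all?(&:odd?)  # => true; the capacity stays odd after every resize
```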

## ChainedHashTable

The ChainedHashTable class implements separate chaining as its collision-handling scheme. To achieve this, it uses a support class named Bucket, which handles the storage and retrieval of (key, value) pairs that map to the same hash code.

#### Bucket Utility Class

The Bucket class uses an array as internal storage, holding at each position a two-element array [key, value]. Next, we are going to highlight some details about the main methods.

The insert method, shown below, appends a new pair at the end of the array and returns nil if the key is not present in the bucket; otherwise, it swaps the previous value with the new one in the existing pair and returns the old value.

```ruby
def insert(key, value)
  idx = @table.find_index { |table_key, _| table_key == key }
  if idx.nil?
    @table << [key, value]
    return nil
  else
    value, @table[idx][1] = @table[idx][1], value
    return value
  end
end
```

The delete method removes an existing key from the bucket and returns the associated value if the key is found; otherwise, it returns nil. One subtlety to mention is that it swaps the last entry into the position of the entry being removed. This way it avoids shifting the remaining values to the left (which has $$O(n)$$ performance): it calls pop (which runs in constant time) instead of calling delete_at with the idx variable directly.

```ruby
def delete(key)
  idx = @table.find_index { |table_key, _| table_key == key }
  return nil if idx.nil?

  value = @table[idx].last
  @table[idx] = @table.last if idx != @table.size - 1
  @table.pop
  value
end
```
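The swap-with-last trick is not specific to Bucket; it works for removal from any array whose order we do not care about. A standalone sketch:

```ruby
# O(1) unordered removal: overwrite the doomed slot with the last
# element and pop, instead of shifting every later element left.
def remove_at(array, idx)
  value = array[idx]
  array[idx] = array.last if idx != array.size - 1
  array.pop
  value
end

items = [:a, :b, :c, :d]
remove_at(items, 1)  # => :b
items                # => [:a, :d, :c]; order changed, but contents are intact
```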

To finish with the Bucket class analysis, the complete code is shown below:

```ruby
class Bucket
  attr_reader :table

  def initialize
    @table = []
  end

  def [](key)
    find(key)
  end

  def []=(key, value)
    insert(key, value)
  end

  def insert(key, value)
    idx = @table.find_index { |table_key, _| table_key == key }
    if idx.nil?
      @table << [key, value]
      return nil
    else
      value, @table[idx][1] = @table[idx][1], value
      return value
    end
  end

  def size
    @table.size
  end

  def find(key)
    pair = @table.find { |table_key, _| table_key == key }
    pair.nil? ? nil : pair.last
  end

  def delete(key)
    idx = @table.find_index { |table_key, _| table_key == key }
    return nil if idx.nil?

    value = @table[idx].last
    @table[idx] = @table.last if idx != @table.size - 1
    @table.pop
    value
  end

  def each
    return enum_for(:each) unless block_given?

    @table.each do |key, value|
      yield(key, value)
    end
  end
end
```

#### bucket_* Methods

The ChainedHashTable class relies on the Bucket class discussed above to implement its main operations, namely find, insert and delete.

The bucket_find logic is rather simple: it checks whether the table has a valid bucket at the given position and, if it does, returns the result of calling find on the bucket; otherwise, it returns nil.

```ruby
def bucket_find(hash_code, key)
  bucket = @table[hash_code]
  return nil if bucket.nil?

  bucket.find(key)
end
```

bucket_insert checks whether there is already a valid bucket in the target position (creating one if not) and then performs the insertion within the bucket. It updates the size by the difference between the bucket's new size and its previous size (which is either 0 or 1) and returns the result of the insert call on the bucket object.

```ruby
def bucket_insert(hash_code, key, value)
  bucket = @table[hash_code]
  bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?
  old_size = bucket.size
  old_value = bucket.insert(key, value)
  @size += (bucket.size - old_size)
  old_value
end
```

And finally, bucket_delete verifies whether there is a bucket in the hash_code position and, if there is a valid bucket, proceeds to delete the key from it and returns the result of this action.

```ruby
def bucket_delete(hash_code, key)
  bucket = @table[hash_code]
  return nil if bucket.nil?

  old_size = bucket.size
  value = bucket.delete(key)
  @size -= (old_size - bucket.size)
  value
end
```

To end the ChainedHashTable section, the complete code is shown below:

```ruby
class ChainedHashTable < Hash::BaseHashTable
  def initialize(capacity: 31, prime: 109_345_121, compression_lambda: nil)
    super
  end

  private

  def bucket_find(hash_code, key)
    bucket = @table[hash_code]
    return nil if bucket.nil?

    bucket.find(key)
  end

  def bucket_insert(hash_code, key, value)
    bucket = @table[hash_code]
    bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?
    old_size = bucket.size
    old_value = bucket.insert(key, value)
    @size += (bucket.size - old_size)
    old_value
  end

  def bucket_delete(hash_code, key)
    bucket = @table[hash_code]
    return nil if bucket.nil?

    old_size = bucket.size
    value = bucket.delete(key)
    @size -= (old_size - bucket.size)
    value
  end

  def bucket_each
    @table.each do |bucket|
      next if bucket.nil?

      bucket.each do |key, value|
        yield(key, value)
      end
    end
  end
end
```

## ProbeHashTable

The ProbeHashTable class implements linear probing by default as its collision-resolution strategy, but it has a probe_lambda parameter in the constructor that allows us to pass another resolution scheme as a lambda function. In turn, this facilitates trying out different alternatives.
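For instance, a quadratic probe can be supplied with the same ->(h, i, cap) signature the gem uses for its default linear probe; the quadratic variant below is our own illustration, not part of the gem:

```ruby
# Two probe lambdas sharing the ->(h, i, cap) signature.
linear    = ->(h, i, cap) { (h + i) % cap }      # the default linear probe
quadratic = ->(h, i, cap) { (h + i * i) % cap }  # illustrative alternative

h   = 7   # compressed hash code of some colliding key
cap = 11  # table capacity

(0..3).map { |i| linear.call(h, i, cap) }     # => [7, 8, 9, 10]
(0..3).map { |i| quadratic.call(h, i, cap) }  # => [7, 8, 0, 5]
```

Quadratic probing spreads colliding keys further apart, which mitigates primary clustering; with a prime capacity and the load factor kept at or below 0.5 (which the resize policy in BaseHashTable enforces), it is still guaranteed to find a free slot.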

A point to highlight is the use of a private singleton class, shown below, to differentiate positions where pairs have been deleted from nil (empty) positions. This is necessary in order to properly use probing strategies to handle collisions.

```ruby
Sentinel = Class.new do
  include Singleton
end
```
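A short sketch (ours, not the gem's internals) of why the sentinel matters: a probe sequence must keep walking past deleted slots but may stop at truly empty ones, so the two states cannot share the value nil.

```ruby
require 'singleton'

# The sentinel singleton marks "deleted; keep probing", while nil means
# "never used; probing can stop here".
Sentinel = Class.new do
  include Singleton
end

table = Array.new(5)          # five empty (nil) slots
table[2] = [:key, 42]         # occupy slot 2
table[2] = Sentinel.instance  # delete the pair, leaving a trace

table[2].nil?                                # => false; a probe walks past it
table[2] == Sentinel.instance                # => true; yet the slot is reusable
Sentinel.instance.equal?(Sentinel.instance)  # => true; one shared instance
```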

#### bucket_* Methods

The main methods' implementations rely on the private method find_slot, which handles getting the corresponding index location for a given hash_code.

The bucket_find logic is easily understood: it calls find_slot to get the index location, then returns nil if that position is available, or the corresponding value for the key otherwise.

```ruby
def bucket_find(hash_code, key)
  idx = find_slot(hash_code, key)
  slot_available?(idx) ? nil : @table[idx].last
end
```

Similarly, the bucket_insert method finds the slot to be used and either replaces the existing pair or inserts the new (key, value) pair, based on the availability of the index position returned by the find_slot call.

```ruby
def bucket_insert(hash_code, key, value)
  idx = find_slot(hash_code, key)
  unless slot_available?(idx)
    old_value = @table[idx].last
    @table[idx] = [key, value]
    return old_value
  end

  @table[idx] = [key, value]
  @size += 1
  nil
end
```

Last, but not least, bucket_delete returns nil if find_slot returns an available slot; otherwise, it sets the index position to the special sentinel mark and returns the value previously stored there.

```ruby
def bucket_delete(hash_code, key)
  idx = find_slot(hash_code, key)
  return nil if slot_available?(idx)

  value = @table[idx].last
  @table[idx] = Sentinel.instance
  @size -= 1
  value
end
```

To end the discussion of the ProbeHashTable class, below we share the class implementation, including the private methods seen before:

```ruby
require 'singleton'

class ProbeHashTable < Hash::BaseHashTable
  attr_reader :probe_lambda

  def initialize(capacity: 31, prime: 109_345_121,
                 compression_lambda: nil, probe_lambda: nil)
    super(capacity: capacity, prime: prime,
          compression_lambda: compression_lambda)
    @probe_lambda = probe_lambda
    return unless @probe_lambda.nil?

    @probe_lambda = ->(h, i, cap) { return (h + i) % cap }
  end

  private

  Sentinel = Class.new do
    include Singleton
  end

  def bucket_find(hash_code, key)
    idx = find_slot(hash_code, key)
    slot_available?(idx) ? nil : @table[idx].last
  end

  def bucket_insert(hash_code, key, value)
    idx = find_slot(hash_code, key)
    unless slot_available?(idx)
      old_value = @table[idx].last
      @table[idx] = [key, value]
      return old_value
    end

    @table[idx] = [key, value]
    @size += 1
    nil
  end

  def bucket_delete(hash_code, key)
    idx = find_slot(hash_code, key)
    return nil if slot_available?(idx)

    value = @table[idx].last
    @table[idx] = Sentinel.instance
    @size -= 1
    value
  end

  def bucket_each
    @table.each do |elem|
      next if elem.nil? || elem == Sentinel.instance

      yield(elem.first, elem.last)
    end
  end

  def find_slot(h, key)
    idx = -1
    j = 0
    loop do
      i = @probe_lambda.call(h, j, @capacity)
      if slot_available?(i)
        idx = i if idx == -1
        break if @table[i].nil?
      elsif @table[i].first == key
        return i
      end
      j += 1
    end
    idx
  end

  def slot_available?(i)
    @table[i].nil? || @table[i] == Sentinel.instance
  end
end
```

Hey!! You made it this far!! Get yourself a cookie :). This post was full of chunks of code showing how the DataStructures101 gem implements a HashTable data structure. Our next post will close this series about Hash Tables and will provide several analyses and draw some conclusions from them.

Thanks for reading to the very end. I hope it was as worthwhile for you to read as it was for me to create. See you soon and stay tuned for more!!

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.