# Data Structures 101 - Hash Table implementation - part 2

In the first installment, we highlighted the main concepts and features of a Hash Table.

Using tha previous knowledge as our starting point, today we are going to show an implementation in Ruby as part of the DataStructures101 gem.

The gem has two (2) user-facing classes for hash table related operations:

• ChainedHashTable which uses separate chaining for collisions resolutions.
• ProbeHashTable which uses linear probing and its variants as collisions-handling scheme.

Both classes rely on a common base class named BaseHashTable which implements the public interface to interact with a hash table object and some necessary methods such as resize. On the other hand, BaseHashTable depends on its children implementations to decide how to do the actual search, insertion and deletion of keys.

## BaseHashTable

Our base HashTable class handles the table array creation, the resize logic and it stores a lambda function to perform the compression step of the keys' hash codes.

#### Why a lambda function?

For a production-ready HashTable implementation, we wouldn’t use a lambda function, since it adds a small overhead to performance and it can sometimes reduce readability of the code. Instead, we would use directly the MAD method within the code without even making the external interface aware of it. But for our purposes, it works just great, because it allow us to swap different compression function implementations and thus make easier to analyze and compare them.

#### Implementation Details

BaseHashTable class receives as constructor parameters the initial capacity for the bucket table, a prime number and compression lambda. If the compression lambda is nil, then the prime number is required to generate the default compression function using the MAD method.

def initialize(capacity:, prime:, compression_lambda:)
@capacity = capacity
@size = 0
@table = Array.new(@capacity)

@compression_lambda = compression_lambda

return unless @compression_lambda.nil?

random = Random.new
scale = random.rand(prime - 1) + 1
shift = random.rand(prime)
@compression_lambda = lambda do |key, cap|
return (((key.hash * scale + shift) % prime) % cap).abs
end
end


There are three main things worth to point out:

• Our implementation relies on the objects being used as key to define a proper hash code.
• Since the lambda acts as a closure, we need to pass as a parameter the capacity of the array, otherwise, it would get stuck with whatever value was used at the moment of the lambda creation and we would end up with extra slots in the array that wouldn’t be used.
• If we ever were to resize larger than prime number, we would end with a similar problem as above, we would waste space in the array that would never be filled.

The insert method makes the call to insert to the child method implementation (bucket_insert) and then resizes the bucket table if the size is larger than half of the current capacity (this is to guarantee that probing strategies can always insert successfully). Finally it returns the previous value, if any.

def insert(key, value)
hash_code = compression_lambda.call(key, @capacity)
old_value = bucket_insert(hash_code, key, value)

# keep load factor <= 0.5
resize(new_capacity) if @size > @capacity / 2

old_value
end


The other two user-facing methods: [] and delete are simply a wrapper around the child method implementations.

def [](key)
bucket_find(compression_lambda.call(key, @capacity), key)
end

def delete(key)
bucket_delete(compression_lambda.call(key, @capacity), key)
end


To increase the usability of the class, we have included Enumerable module and implemented the each method as follow.

def each
return enum_for(:each) unless block_given?

bucket_each do |key, value|
yield(key, value)
end
end


The final BaseHashTable implementation is shown below, including the private methods new_capacity and resize.

class BaseHashTable
include Enumerable

def initialize(capacity:, prime:, compression_lambda:)
@capacity = capacity
@size = 0
@table = Array.new(@capacity)

@compression_lambda = compression_lambda

return unless @compression_lambda.nil?

random = Random.new
scale = random.rand(prime - 1) + 1
shift = random.rand(prime)
@compression_lambda = lambda do |key, cap|
return (((key.hash * scale + shift) % prime) % cap).abs
end
end

def []=(key, value)
insert(key, value)
end

def insert(key, value)
hash_code = compression_lambda.call(key, @capacity)
old_value = bucket_insert(hash_code, key, value)

# Keep load factor below 0.5.
resize(new_capacity) if @size > @capacity / 2

old_value
end

def [](key)
bucket_find(compression_lambda.call(key, @capacity), key)
end

def delete(key)
bucket_delete(compression_lambda.call(key, @capacity), key)
end

def each
return enum_for(:each) unless block_given?

bucket_each do |key, value|
yield(key, value)
end
end

private

def new_capacity
2 * @capacity - 1
end

def resize(new_cap)
@capacity = new_cap

buffer = map { |key, value| [key, value] }

@table = Array.new(@capacity)
@size = 0

buffer.each { |key, value| self[key] = value }
end
end


## ChainedHashTable

ChainedHashTable class implements separate chaining as collision-handling scheme. To achieve it, it uses a support class named Bucket which handles the storage and retrieval of (key, value) pairs that match to the same hashcode.

#### Bucket Utility Class

The Bucket class uses an array as internal storage and stores at each position a two-values array [key, value]. Next we are going to highlight some details about the main methods.

The insert method, shown below, appends a new pair at the end of array if the key is not present in the bucket and returns nil, otherwise it swaps the previous value with the new one in the existing pair and returns the old value.

def insert(key, value)
idx = @table.find_index { |table_key, _| table_key == key }

if idx.nil?
@table << [key, value]
return nil
else
value, @table[idx][1] = @table[idx][1], value
return value
end
end


The delete method removes an existing key from the bucket and returns the associated value if the key is found, otherwise it returns nil. One subtlety to mention is that it swaps the last entry to the position of the entry being removed. This way it can avoid a shift of the values to the left (which has $$O(n)$$ performance) by calling pop (which performs in constant amount of time) instead of directly calling delete_at using the idx variable.

def delete(key)
idx = @table.find_index { |table_key, _| table_key == key }
return nil if idx.nil?

value = @table[idx].last
@table[idx] = @table.last if idx != @table.size - 1
@table.pop
value
end


To finish with the Bucket class analysis, the complete code is shown below:

class Bucket

def initialize
@table = []
end

def [](key)
find(key)
end

def []=(key, value)
insert(key, value)
end

def insert(key, value)
idx = @table.find_index { |table_key, _| table_key == key }

if idx.nil?
@table << [key, value]
return nil
else
value, @table[idx][1] = @table[idx][1], value
return value
end
end

def size
@table.size
end

def find(key)
pair = @table.find { |table_key, _| table_key == key }
pair.nil? ? nil : pair.last
end

def delete(key)
idx = @table.find_index { |table_key, _| table_key == key }
return nil if idx.nil?

value = @table[idx].last
@table[idx] = @table.last if idx != @table.size - 1
@table.pop
value
end

def each
return enum_for(:each) unless block_given?

@table.each do |key, value|
yield(key, value)
end
end
end


#### bucket_* Methods

ChainedHashTable class relies in the Bucket class discussed above to do the actual implementation of its main operations, namely find, insert and delete.

bucket_find logic is rather simple. It checks if the table has a valid bucket and if it does it returns the result of calling find in the bucket, otherwise returns nil.

def bucket_find(hash_code, key)
bucket = @table[hash_code]
return nil if bucket.nil?

bucket.find(key)
end


bucket_insert checks if there is already a valid bucket in the position to insert and then proceeds to produce the insertion within the bucket. It changes the size by adding the difference between the previous size and the current size (which could be either 0 or 1) and returns the result of the insert call in the bucket object.

def bucket_insert(hash_code, key, value)
bucket = @table[hash_code]
bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?

old_size = bucket.size
old_value = bucket.insert(key, value)
@size += (bucket.size - old_size)

old_value
end


And finally, bucket_delete verifies if there is a bucket in the hash_code position and in case there is valid bucket, it proceeds to delete the key in the bucket and returns the result from this action.

def bucket_delete(hash_code, key)
bucket = @table[hash_code]
return nil if bucket.nil?

old_size = bucket.size
value = bucket.delete(key)
@size -= (old_size - bucket.size)

value
end


To end the ChainedHashTable section, below it is shown the code:

class ChainedHashTable < Hash::BaseHashTable
def initialize(capacity: 31, prime: 109_345_121, compression_lambda: nil)
super
end

private

def bucket_find(hash_code, key)
bucket = @table[hash_code]
return nil if bucket.nil?

bucket.find(key)
end

def bucket_insert(hash_code, key, value)
bucket = @table[hash_code]
bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?

old_size = bucket.size
old_value = bucket.insert(key, value)
@size += (bucket.size - old_size)

old_value
end

def bucket_delete(hash_code, key)
bucket = @table[hash_code]
return nil if bucket.nil?

old_size = bucket.size
value = bucket.delete(key)
@size -= (old_size - bucket.size)

value
end

def bucket_each
@table.each do |bucket|
next if bucket.nil?

bucket.each do |key, value|
yield(key, value)
end
end
end
end


## ProbeHashTable

ProbeHashTable class implements by default linear probing as collision-resolution strategy, but it has a probe_lambda parameter in the constructor that allows us to passs another resolution scheme as a lambda function. In turn, this facilitates the utilization of different alternatives.

A point to highlight is the utilization of a private singleton class, shown below, to differentiate positions where pairs have been deleted, from nil or empty positions. This is necessary in order to proper use probing strategies to handle collissions.

Sentinel = Class.new do
include Singleton
end


#### bucket_* Methods

The main methods implementation rely on the usage of private method find_slot which handles getting the corresponding index location for a given hash_code.

The bucket_find logic is easily understood. It calls find_slot to get the index location and returns nil if the position is available or the corresponding value for the key.

def bucket_find(hash_code, key)
idx = find_slot(hash_code, key)

slot_available?(idx) ? nil : @table[idx].last
end


Similarly, the bucket_insert method finds the slot to be used and either replaces or insert the new (key, value) pair based on the availability of the index position returned by the find_slot call.

def bucket_insert(hash_code, key, value)
idx = find_slot(hash_code, key)

unless slot_available?(idx)
old_value = @table[idx].last
@table[idx] = [key, value]
return old_value
end

@table[idx] = [key, value]
@size += 1

nil
end


Last, but no the least, bucket_delete returns nil if find_slot returns an available slot, otherwise it sets the index position with the special sentinel mark and returns the previous value there.

def bucket_delete(hash_code, key)
idx = find_slot(hash_code, key)
return nil if slot_available?(idx)

value = @table[idx].last
@table[idx] = Sentinel.instance
@size -= 1

value
end


To end the discussion of the ProbeHashTable class, below we share the class implementation, including the private methods seen before:

class ProbeHashTable < Hash::BaseHashTable

def initialize(capacity: 31, prime: 109_345_121,
compression_lambda: nil, probe_lambda: nil)
super(capacity: capacity, prime: prime,
compression_lambda: compression_lambda)

@probe_lambda = probe_lambda

return unless @probe_lambda.nil?

@probe_lambda = ->(h, i, cap) { return (h + i) % cap }
end

private

Sentinel = Class.new do
include Singleton
end

def bucket_find(hash_code, key)
idx = find_slot(hash_code, key)

slot_available?(idx) ? nil : @table[idx].last
end

def bucket_insert(hash_code, key, value)
idx = find_slot(hash_code, key)

unless slot_available?(idx)
old_value = @table[idx].last
@table[idx] = [key, value]
return old_value
end

@table[idx] = [key, value]
@size += 1

nil
end

def bucket_delete(hash_code, key)
idx = find_slot(hash_code, key)
return nil if slot_available?(idx)

value = @table[idx].last
@table[idx] = Sentinel.instance
@size -= 1

value
end

def bucket_each
@table.each do |elem|
next if elem.nil? || elem == Sentinel.instance

yield(elem.first, elem.last)
end
end

def find_slot(h, key)
idx = -1

j = 0
loop do
i = @probe_lambda.call(h, j, @capacity)
if slot_available?(i)
idx = i if idx == -1
break if @table[i].nil?
elsif @table[i].first == key
return i
end
j += 1
end

idx
end

def slot_available?(i)
@table[i].nil? || @table[i] == Sentinel.instance
end
end


Hey!! You made it this far!! Get yourself a cookie :). This post was full of chunks of code to show how DataStructures101 gem implements a HashTable data structure. Our next post will close this serie about HashTable and it will provide several analyses and draw some conclusions from them.

Thanks to read till the very end. Hope it was worth for you as it was creating it for me. See you soon and stay tuned for more!!

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.