In the first installment, we highlighted the main concepts and features of a Hash Table.
Using that previous knowledge as our starting point, today we are going to show an implementation in Ruby as part of the DataStructures101 gem.
The gem has two user-facing classes for hash-table-related operations:
- ChainedHashTable, which uses separate chaining for collision resolution.
- ProbeHashTable, which uses linear probing and its variants as its collision-handling scheme.
Both classes rely on a common base class named BaseHashTable, which implements the public interface to interact with a hash table object as well as some necessary methods such as resize. In turn, BaseHashTable depends on its child implementations to decide how to do the actual search, insertion, and deletion of keys.
BaseHashTable
Our base hash table class handles the table array creation, the resize logic, and stores a lambda function to perform the compression step on the keys' hash codes.
Why a lambda function?
For a production-ready hash table implementation, we wouldn't use a lambda function, since it adds a small performance overhead and can sometimes reduce the readability of the code. Instead, we would use the MAD method directly within the code, without even making the external interface aware of it. But for our purposes it works just great, because it allows us to swap in different compression function implementations and thus makes it easier to analyze and compare them.
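As a quick illustration of that flexibility, here is a hypothetical snippet that swaps the default MAD compression for a plain division-method lambda, using the ChainedHashTable class presented later in this post (assuming the gem's classes are already required and in scope):
# Hypothetical: replace the default MAD compression with the division method.
division = ->(key, cap) { (key.hash % cap).abs }

table = ChainedHashTable.new(compression_lambda: division)
table[:language] = 'Ruby'
table[:language] # => 'Ruby'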
Implementation Details
The BaseHashTable class receives as constructor parameters the initial capacity for the bucket table, a prime number, and a compression lambda. If the compression lambda is nil, the prime number is required to generate the default compression function using the MAD method.
def initialize(capacity:, prime:, compression_lambda:)
  @capacity = capacity
  @size = 0
  @table = Array.new(@capacity)
  @compression_lambda = compression_lambda
  return unless @compression_lambda.nil?

  # Default compression: MAD (Multiply-Add-and-Divide) method.
  random = Random.new
  scale = random.rand(prime - 1) + 1
  shift = random.rand(prime)
  @compression_lambda = lambda do |key, cap|
    return (((key.hash * scale + shift) % prime) % cap).abs
  end
end
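To see the compression step in isolation, here is a small standalone sketch (not part of the gem, and with fixed scale and shift values instead of random ones) of how the MAD method maps arbitrary hash codes into the range [0, capacity):
# Standalone sketch of the MAD compression step.
prime    = 109_345_121 # same default prime the gem uses
capacity = 31
scale    = 7           # fixed instead of random, to keep the example reproducible
shift    = 3
mad = ->(key, cap) { (((key.hash * scale + shift) % prime) % cap).abs }

%w[foo bar baz].each do |key|
  puts "#{key} -> #{mad.call(key, capacity)}" # always between 0 and 30
end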
There are three main things worth pointing out:
- Our implementation relies on the objects being used as keys to define a proper hash code (see the sketch after this list).
- Since the lambda acts as a closure, we need to pass the capacity of the array as a parameter; otherwise, it would get stuck with whatever value was set at the moment of the lambda's creation, and we would end up with extra slots in the array that would never be used.
- If we ever resized the table beyond the prime number, we would run into a similar problem as above: we would waste space in the array that could never be filled.
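Regarding the first point, here is a minimal sketch (not part of the gem) of a custom key class; since the compression lambda calls key.hash, any object used as a key should define hash consistently with ==/eql?:
# Minimal sketch of a custom key type suitable for the hash table.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias_method :eql?, :==

  # Equal points must produce equal hash codes.
  def hash
    [x, y].hash
  end
end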
The insert method delegates the insertion to the child method implementation (bucket_insert) and then resizes the bucket table if the size is larger than half of the current capacity (this guarantees that probing strategies can always insert successfully). Finally, it returns the previous value, if any.
def insert(key, value)
  hash_code = compression_lambda.call(key, @capacity)
  old_value = bucket_insert(hash_code, key, value)
  # keep load factor <= 0.5
  resize(new_capacity) if @size > @capacity / 2
  old_value
end
The other two user-facing methods, [] and delete, are simply wrappers around the child method implementations.
def [](key)
  bucket_find(compression_lambda.call(key, @capacity), key)
end

def delete(key)
  bucket_delete(compression_lambda.call(key, @capacity), key)
end
To increase the usability of the class, we have included the Enumerable module and implemented the each method as follows.
def each
  return enum_for(:each) unless block_given?

  bucket_each do |key, value|
    yield(key, value)
  end
end
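Since Enumerable builds on top of each, this also gives our tables methods such as map, select and to_a for free. A hypothetical usage sketch, using the ChainedHashTable subclass presented later:
# Hypothetical usage: any Enumerable method now works on the table.
table = ChainedHashTable.new
table[:one] = 1
table[:two] = 2

table.each { |key, value| puts "#{key} => #{value}" }
table.map { |key, value| [key, value * 10] }
# => [[:one, 10], [:two, 20]] (order depends on the hash codes)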
The final BaseHashTable implementation is shown below, including the private methods new_capacity and resize.
class BaseHashTable
  include Enumerable

  attr_reader :size, :compression_lambda, :capacity

  def initialize(capacity:, prime:, compression_lambda:)
    @capacity = capacity
    @size = 0
    @table = Array.new(@capacity)
    @compression_lambda = compression_lambda
    return unless @compression_lambda.nil?

    # Default compression: MAD (Multiply-Add-and-Divide) method.
    random = Random.new
    scale = random.rand(prime - 1) + 1
    shift = random.rand(prime)
    @compression_lambda = lambda do |key, cap|
      return (((key.hash * scale + shift) % prime) % cap).abs
    end
  end

  def []=(key, value)
    insert(key, value)
  end

  def insert(key, value)
    hash_code = compression_lambda.call(key, @capacity)
    old_value = bucket_insert(hash_code, key, value)
    # Keep load factor below 0.5.
    resize(new_capacity) if @size > @capacity / 2
    old_value
  end

  def [](key)
    bucket_find(compression_lambda.call(key, @capacity), key)
  end

  def delete(key)
    bucket_delete(compression_lambda.call(key, @capacity), key)
  end

  def each
    return enum_for(:each) unless block_given?

    bucket_each do |key, value|
      yield(key, value)
    end
  end

  private

  def new_capacity
    2 * @capacity - 1
  end

  def resize(new_cap)
    @capacity = new_cap
    buffer = map { |key, value| [key, value] }
    @table = Array.new(@capacity)
    @size = 0
    buffer.each { |key, value| self[key] = value }
  end
end
ChainedHashTable
The ChainedHashTable class implements separate chaining as its collision-handling scheme. To achieve this, it uses a support class named Bucket, which handles the storage and retrieval of the (key, value) pairs that map to the same hash code.
Bucket Utility Class
The Bucket class uses an array as internal storage and stores at each position a two-element array [key, value]. Next, we highlight some details about its main methods.
The insert method, shown below, appends a new pair at the end of the array if the key is not present in the bucket and returns nil; otherwise, it swaps the previous value with the new one in the existing pair and returns the old value.
def insert(key, value)
  idx = @table.find_index { |table_key, _| table_key == key }
  if idx.nil?
    @table << [key, value]
    return nil
  else
    value, @table[idx][1] = @table[idx][1], value
    return value
  end
end
The delete method removes an existing key from the bucket and returns the associated value if the key is found; otherwise, it returns nil. One subtlety to mention is that it moves the last entry into the position of the entry being removed. This way it avoids shifting values to the left (which has \(O(n)\) performance) by calling pop (which runs in constant time) instead of calling delete_at directly with the idx variable.
def delete(key)
  idx = @table.find_index { |table_key, _| table_key == key }
  return nil if idx.nil?

  value = @table[idx].last
  # Move the last pair into the removed slot so we can pop in O(1).
  @table[idx] = @table.last if idx != @table.size - 1
  @table.pop
  value
end
To finish with the Bucket class analysis, the complete code is shown below:
class Bucket
  attr_reader :table

  def initialize
    @table = []
  end

  def [](key)
    find(key)
  end

  def []=(key, value)
    insert(key, value)
  end

  def insert(key, value)
    idx = @table.find_index { |table_key, _| table_key == key }
    if idx.nil?
      @table << [key, value]
      return nil
    else
      value, @table[idx][1] = @table[idx][1], value
      return value
    end
  end

  def size
    @table.size
  end

  def find(key)
    pair = @table.find { |table_key, _| table_key == key }
    pair.nil? ? nil : pair.last
  end

  def delete(key)
    idx = @table.find_index { |table_key, _| table_key == key }
    return nil if idx.nil?

    value = @table[idx].last
    # Move the last pair into the removed slot so we can pop in O(1).
    @table[idx] = @table.last if idx != @table.size - 1
    @table.pop
    value
  end

  def each
    return enum_for(:each) unless block_given?

    @table.each do |key, value|
      yield(key, value)
    end
  end
end
bucket_* Methods
The ChainedHashTable class relies on the Bucket class discussed above for the actual implementation of its main operations, namely find, insert, and delete.
The bucket_find logic is rather simple. It checks whether the table has a valid bucket at the given position; if it does, it returns the result of calling find on the bucket, otherwise it returns nil.
def bucket_find(hash_code, key)
  bucket = @table[hash_code]
  return nil if bucket.nil?

  bucket.find(key)
end
bucket_insert checks whether there is already a valid bucket at the insertion position (creating one if necessary) and then performs the insertion within the bucket. It updates the size by adding the difference between the bucket's current size and its previous size (which is either 0 or 1) and returns the result of the insert call on the bucket object.
def bucket_insert(hash_code, key, value)
  bucket = @table[hash_code]
  bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?
  old_size = bucket.size
  old_value = bucket.insert(key, value)
  @size += (bucket.size - old_size)
  old_value
end
Finally, bucket_delete verifies whether there is a bucket at the hash_code position; if there is a valid bucket, it deletes the key from the bucket and returns the result of this action.
def bucket_delete(hash_code, key)
  bucket = @table[hash_code]
  return nil if bucket.nil?

  old_size = bucket.size
  value = bucket.delete(key)
  @size -= (old_size - bucket.size)
  value
end
To end the ChainedHashTable section, the complete code is shown below:
class ChainedHashTable < Hash::BaseHashTable
  def initialize(capacity: 31, prime: 109_345_121, compression_lambda: nil)
    super
  end

  private

  def bucket_find(hash_code, key)
    bucket = @table[hash_code]
    return nil if bucket.nil?

    bucket.find(key)
  end

  def bucket_insert(hash_code, key, value)
    bucket = @table[hash_code]
    bucket = @table[hash_code] = Hash::Bucket.new if bucket.nil?
    old_size = bucket.size
    old_value = bucket.insert(key, value)
    @size += (bucket.size - old_size)
    old_value
  end

  def bucket_delete(hash_code, key)
    bucket = @table[hash_code]
    return nil if bucket.nil?

    old_size = bucket.size
    value = bucket.delete(key)
    @size -= (old_size - bucket.size)
    value
  end

  def bucket_each
    @table.each do |bucket|
      next if bucket.nil?

      bucket.each do |key, value|
        yield(key, value)
      end
    end
  end
end
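Before moving on, here is a brief hypothetical usage sketch of the class (namespacing omitted, assuming the gem's classes are in scope):
# Hypothetical usage sketch of ChainedHashTable.
table = ChainedHashTable.new        # defaults: capacity 31, MAD compression
table[:ruby] = 'dynamic'
table[:c]    = 'static'
table.insert(:ruby, 'very dynamic') # => 'dynamic' (the previous value)

table[:ruby]                        # => 'very dynamic'
table.delete(:c)                    # => 'static'
table.size                          # => 1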
ProbeHashTable
The ProbeHashTable class implements linear probing by default as its collision-resolution strategy, but it has a probe_lambda parameter in the constructor that allows us to pass another resolution scheme as a lambda function. In turn, this makes it easy to try out different alternatives.
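For instance, a quadratic-probing scheme could be passed in as in the sketch below; the (h, i, cap) signature matches the default linear-probing lambda shown in the full listing later. Keep in mind this is just an illustration: quadratic probing is only guaranteed to find a free slot when the capacity and load factor are chosen carefully.
# Sketch of an alternative probe lambda (not shipped with the gem):
# quadratic probing with the same (hash_code, attempt, capacity) signature.
quadratic = ->(h, i, cap) { (h + i * i) % cap }

# table = ProbeHashTable.new(probe_lambda: quadratic)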
A point to highlight is the use of a private singleton class, shown below, to differentiate positions where pairs have been deleted from nil (empty) positions. This is necessary in order to use probing strategies properly when handling collisions.
# Singleton comes from Ruby's standard library ('singleton').
Sentinel = Class.new do
  include Singleton
end
bucket_* Methods
The implementation of the main methods relies on the private method find_slot, which handles finding the corresponding index location for a given hash_code.
The bucket_find logic is easy to follow. It calls find_slot to get the index location and returns nil if the position is available, or the corresponding value for the key otherwise.
def bucket_find(hash_code, key)
  idx = find_slot(hash_code, key)
  slot_available?(idx) ? nil : @table[idx].last
end
Similarly, the bucket_insert method finds the slot to be used and either replaces or inserts the new (key, value) pair based on the availability of the index position returned by the find_slot call.
def bucket_insert(hash_code, key, value)
  idx = find_slot(hash_code, key)
  unless slot_available?(idx)
    old_value = @table[idx].last
    @table[idx] = [key, value]
    return old_value
  end

  @table[idx] = [key, value]
  @size += 1
  nil
end
Last but not least, bucket_delete returns nil if find_slot returns an available slot; otherwise, it sets the index position to the special sentinel mark and returns the value previously stored there.
def bucket_delete(hash_code, key)
  idx = find_slot(hash_code, key)
  return nil if slot_available?(idx)

  value = @table[idx].last
  @table[idx] = Sentinel.instance
  @size -= 1
  value
end
To end the discussion of the ProbeHashTable class, below we share the complete class implementation, including the private methods seen before:
class ProbeHashTable < Hash::BaseHashTable
  attr_reader :probe_lambda

  def initialize(capacity: 31, prime: 109_345_121,
                 compression_lambda: nil, probe_lambda: nil)
    super(capacity: capacity, prime: prime,
          compression_lambda: compression_lambda)
    @probe_lambda = probe_lambda
    return unless @probe_lambda.nil?

    # Default probing strategy: linear probing.
    @probe_lambda = ->(h, i, cap) { return (h + i) % cap }
  end

  private

  Sentinel = Class.new do
    include Singleton
  end

  def bucket_find(hash_code, key)
    idx = find_slot(hash_code, key)
    slot_available?(idx) ? nil : @table[idx].last
  end

  def bucket_insert(hash_code, key, value)
    idx = find_slot(hash_code, key)
    unless slot_available?(idx)
      old_value = @table[idx].last
      @table[idx] = [key, value]
      return old_value
    end

    @table[idx] = [key, value]
    @size += 1
    nil
  end

  def bucket_delete(hash_code, key)
    idx = find_slot(hash_code, key)
    return nil if slot_available?(idx)

    value = @table[idx].last
    @table[idx] = Sentinel.instance
    @size -= 1
    value
  end

  def bucket_each
    @table.each do |elem|
      next if elem.nil? || elem == Sentinel.instance

      yield(elem.first, elem.last)
    end
  end

  # Returns the index of the slot holding `key`, or (if the key is absent)
  # the first available slot found along the probe sequence.
  def find_slot(h, key)
    idx = -1
    j = 0
    loop do
      i = @probe_lambda.call(h, j, @capacity)
      if slot_available?(i)
        # Remember the first available slot; stop at a truly empty one.
        idx = i if idx == -1
        break if @table[i].nil?
      elsif @table[i].first == key
        return i
      end
      j += 1
    end
    idx
  end

  def slot_available?(i)
    @table[i].nil? || @table[i] == Sentinel.instance
  end
end
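As with the chained version, here is a brief hypothetical usage sketch:
# Hypothetical usage sketch of ProbeHashTable (namespacing omitted).
table = ProbeHashTable.new   # linear probing by default
table[:name] = 'Ada'
table[:name] = 'Grace'       # replaces the previous value ('Ada')
table[:name]                 # => 'Grace'
table.delete(:name)          # => 'Grace'; the slot now holds the Sentinel
table[:name]                 # => nil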
Hey!! You made it this far!! Get yourself a cookie :). This post was full of chunks of code showing how the DataStructures101 gem implements a HashTable data structure. Our next post will close this series about HashTables, providing several analyses and drawing some conclusions from them.
Thanks for reading to the very end. I hope it was as worthwhile for you as it was for me to create. See you soon and stay tuned for more!!