As part of the development of the DataStructures101 gem, I recently added a binary Heap implementation to the codebase.
A heap is a specialized tree-based structure that holds the following property true: the key of a parent node is always, either less than or equal (in a min heap) or greater than or equal (in a max heap), than any of its children.
The heap structure is very well known and has a lot of applications, among the most commons being the underlying implementation of a Priority-Queue and part of the HeapSort algorithm.
Heap
A binary Heap can be viewed both as a nearly complete binary tree or as an array, as shown in the figures below.
Each node of the three corresponds with an element of the array. The lines in the array show the parent-child relationship with the parent always at the left side.
Using an array-based implementation, we can express the heap property in the following way:
$$ A[Parent(i)] \le A[i] \quad for \ a \ \textbf{min-heap} $$ $$ A[Parent(i)] \ge A[i] \quad for \ a \ \textbf{max-heap} $$
Operations
The most important operations involving heaps are:
- Build-Heap: Operation to create a heap out of a given array of values. O(n)
- Insert (Push): Adds a new key to the heap. As part of the execution of this operation, it could cause the heap to violate the heap property and thus, the data structure must be rebalanced. O(log n)
- Extract-Max, Extract-Min (Pop): Removes the root of the heap, which is either the min (or max) element in the heap. Requires as well a rebalance of the heap. O(log n)
- Merge: Combines to heap to form a valid new heap containing all the elements of the original two. O(n), where n is the size of larger heap
Implementation
Our Heap
class will support both max_heap and min_heap. It will do so by keeping the heap property as a lambda defined (@heap_check
) in the constructor based on the value min_heap flag:
class Heap
attr_reader :min_heap
def initialize(*args, min_heap: false)
@data = args
@min_heap = min_heap
@heap_check = ->(i, j) { return @data[i] >= @data[j] }
if min_heap
@heap_check = ->(i, j) { return @data[i] <= @data[j] }
end
build_heap
end
As it is shown in the initialize
method, the Heap construction finishes with the build_heap
operation, which will construct the heap in-place by applying heapify
recursively, as it can be seen in the code below. Notice that build_heap
avoids calling heapify on the second half of the data
array because a single node by itself holds valid the heap property.
def build_heap
start = @data.length / 2
start.downto(0) { |i| heapify(i) }
end
def heapify(i)
l = left(i)
r = right(i)
head = i
head = l if l < @data.size && @heap_check.call(l, head)
head = r if r < @data.size && @heap_check.call(r, head)
if head != i
@data[i], @data[head] = @data[head], @data[i]
heapify(head)
end
end
The left
and right
methods provide the children position of node in the array using 0-based indexing. Associated with them, parent
method allows to obtain the parent node in the array. These methods are implemented as follows:
def left(i)
2 * i + 1
end
def right(i)
2 * (i + 1)
end
def parent(i)
(i - 1) / 2
end
The following operations let the heap behave as a priority queue. The push
operation allows to insert new values into the heap, and will guarantee that the heap maintains its property after each insertion, by applying the upheap
method to the corresponding new inserted value. This will move the value up in the heap to its right location. On the other hand, the pop
operation removes the top element of the heap (either min or max) and will force a heapify
operation to re-establish the heap invariant, by buckle down to the right location the temporary last value that was set to replace the top of the heap.
def push(*values)
values.each do |val|
@data.push(val)
upheap(@data.size - 1)
end
self
end
def pop
result = @data.first
@data[0] = @data.pop
heapify(0)
result
end
Complete version
Combining all the code explained above, the final version of the heap implementations is as follow:
class Heap
attr_reader :min_heap
def initialize(*args, min_heap: false)
@data = args
@min_heap = min_heap
@heap_check = ->(i, j) { return @data[i] >= @data[j] }
if min_heap
@heap_check = ->(i, j) { return @data[i] <= @data[j] }
end
build_heap
end
def size
@data.size
end
def [](i)
@data[i]
end
def push(*values)
values.each do |val|
@data.push(val)
upheap(@data.size - 1)
end
self
end
def pop
result = @data.first
@data[0] = @data.pop
heapify(0)
result
end
def merge(heap)
new_array = @data + heap.instance_variable_get(:@data)
Heap.new(new_array, min_heap: self.min_heap)
end
def left(i)
2 * i + 1
end
def right(i)
2 * (i + 1)
end
def parent(i)
(i - 1) / 2
end
private
def build_heap
start = @data.length / 2
start.downto(0) { |i| heapify(i) }
end
def heapify(i)
l = left(i)
r = right(i)
head = i
head = l if l < @data.size && @heap_check.call(l, head)
head = r if r < @data.size && @heap_check.call(r, head)
if head != i
@data[i], @data[head] = @data[head], @data[i]
heapify(head)
end
end
end
Conclusions
With this post, we went over a simple Heap implementation in Ruby for the DataStructures101 gem. As part of the discussion, we mentioned what the heap property is and the most common operations a heap should support.
Then we analyzed a simple implementation of Heap in Ruby and provided some comments about the methods behavior. Finally we concluded with an overview of the final implementation.
Thank you so much for reading this post. Hope you liked reading it as much as I did writing it. Stay tuned for more!!