Data Structures 101 - Heap Implementation


As part of the development of the DataStructures101 gem, I recently added a binary Heap implementation to the codebase.

A heap is a specialized tree-based structure that holds the following property true: the key of a parent node is always, either less than or equal (in a min heap) or greater than or equal (in a max heap), than any of its children.

The heap structure is very well known and has a lot of applications, among the most commons being the underlying implementation of a Priority-Queue and part of the HeapSort algorithm.

Heap

A binary Heap can be viewed both as a nearly complete binary tree or as an array, as shown in the figures below.

heap_tree heap_array

Each node of the three corresponds with an element of the array. The lines in the array show the parent-child relationship with the parent always at the left side.

Using an array-based implementation, we can express the heap property in the following way:

$$ A[Parent(i)] \le A[i] \quad for \ a \ \textbf{min-heap} $$ $$ A[Parent(i)] \ge A[i] \quad for \ a \ \textbf{max-heap} $$

Operations

The most important operations involving heaps are:

  • Build-Heap: Operation to create a heap out of a given array of values. O(n)
  • Insert (Push): Adds a new key to the heap. As part of the execution of this operation, it could cause the heap to violate the heap property and thus, the data structure must be rebalanced. O(log n)
  • Extract-Max, Extract-Min (Pop): Removes the root of the heap, which is either the min (or max) element in the heap. Requires as well a rebalance of the heap. O(log n)
  • Merge: Combines to heap to form a valid new heap containing all the elements of the original two. O(n), where n is the size of larger heap

Implementation

Our Heap class will support both max_heap and min_heap. It will do so by keeping the heap property as a lambda defined (@heap_check) in the constructor based on the value min_heap flag:

class Heap
  attr_reader :min_heap

  def initialize(*args, min_heap: false)
    @data = args
    @min_heap = min_heap
    @heap_check = ->(i, j) { return @data[i] >= @data[j] }

    if min_heap
      @heap_check = ->(i, j) { return @data[i] <= @data[j] }
    end

    build_heap
  end

As it is shown in the initialize method, the Heap construction finishes with the build_heap operation, which will construct the heap in-place by applying heapify recursively, as it can be seen in the code below. Notice that build_heap avoids calling heapify on the second half of the data array because a single node by itself holds valid the heap property.

  def build_heap
    start = @data.length / 2

    start.downto(0) { |i| heapify(i) }
  end

  def heapify(i)
    l = left(i)
    r = right(i)

    head = i
    head = l if l < @data.size && @heap_check.call(l, head)
    head = r if r < @data.size && @heap_check.call(r, head)

    if head != i
      @data[i], @data[head] = @data[head], @data[i]
        heapify(head)
    end
  end

The left and right methods provide the children position of node in the array using 0-based indexing. Associated with them, parent method allows to obtain the parent node in the array. These methods are implemented as follows:

  def left(i)
    2 * i + 1
  end

  def right(i)
    2 * (i + 1)
  end

  def parent(i)
    (i - 1) / 2
  end

The following operations let the heap behave as a priority queue. The push operation allows to insert new values into the heap, and will guarantee that the heap maintains its property after each insertion, by applying the upheap method to the corresponding new inserted value. This will move the value up in the heap to its right location. On the other hand, the pop operation removes the top element of the heap (either min or max) and will force a heapify operation to re-establish the heap invariant, by buckle down to the right location the temporary last value that was set to replace the top of the heap.

 def push(*values)
  values.each do |val|
    @data.push(val)
    upheap(@data.size - 1)
  end
  self
end

def pop
  result = @data.first
  @data[0] = @data.pop
  heapify(0)
  result
end

Complete version

Combining all the code explained above, the final version of the heap implementations is as follow:

class Heap
  attr_reader :min_heap

  def initialize(*args, min_heap: false)
    @data = args
    @min_heap = min_heap
    @heap_check = ->(i, j) { return @data[i] >= @data[j] }

    if min_heap
      @heap_check = ->(i, j) { return @data[i] <= @data[j] }
    end

    build_heap
  end

  def size
    @data.size
  end

  def [](i)
    @data[i]
  end

  def push(*values)
    values.each do |val|
      @data.push(val)
      upheap(@data.size - 1)
    end
    self
  end

  def pop
    result = @data.first
    @data[0] = @data.pop
    heapify(0)
    result
  end

  def merge(heap)
    new_array = @data + heap.instance_variable_get(:@data)

    Heap.new(new_array, min_heap: self.min_heap)
  end

  def left(i)
    2 * i + 1
  end

  def right(i)
    2 * (i + 1)
  end

  def parent(i)
    (i - 1) / 2
  end

  private

  def build_heap
    start = @data.length / 2

    start.downto(0) { |i| heapify(i) }
  end

  def heapify(i)
    l = left(i)
    r = right(i)

    head = i
    head = l if l < @data.size && @heap_check.call(l, head)
    head = r if r < @data.size && @heap_check.call(r, head)

    if head != i
      @data[i], @data[head] = @data[head], @data[i]
        heapify(head)
    end
  end
end

Conclusions

With this post, we went over a simple Heap implementation in Ruby for the DataStructures101 gem. As part of the discussion, we mentioned what the heap property is and the most common operations a heap should support.

Then we analyzed a simple implementation of Heap in Ruby and provided some comments about the methods behavior. Finally we concluded with an overview of the final implementation.

Thank you so much for reading this post. Hope you liked reading it as much as I did writing it. Stay tuned for more!!

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.