It’s common to have a repository dedicated to storing the `Dockerfile` definitions and associated files for the base Docker images used across different projects in an organization. At work, we have such a repo to store a variety of these images, e.g. `ruby`, `node` and `golang` images. Over time, the number of images has grown significantly and, as a result, so has the size of the GitLab CI configuration (`.gitlab-ci.yml`), with very similar job definitions (definitely not DRY).
Problem at a glance
To better illustrate the situation and its issues, let’s look at the structure of the repo and a simplified version of the `.gitlab-ci.yml`:
Repo structure
The repo was structured with multiple directories, each one potentially containing multiple subdirectories, with `Dockerfile`s potentially present at different directory levels.
...
├── docker_1
│ ├── Dockerfile
│ └── entrypoint.sh
├── docker_2
│ ├── current.Dockerfile
│ ├── next.Dockerfile
│ └── site
│ └── index.html
├── docker_3
│ ├── api
│ │ ├── default.conf
│ │ └── Dockerfile
│ └── site
│ ├── Dockerfile
│ └── index.html
...
GitLab CI
For each of the `Dockerfile`s, the pipeline defined an associated job that builds the image and pushes it to our internal container registry.
# .gitlab-ci.yml
.build: &build
  before_script:
    - docker login -u $DOCKERHUB_USERNAME -p $DOCKERHUB_PASSWORD
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN <registry_url>
  tags:
    - pool
  stage: build
  only:
    variables:
      - $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH

docker_1:
  <<: *build
  script:
    - docker build -t <registry_url>/docker_1:0.1.0 -f docker_1/Dockerfile docker_1/
    - docker push <registry_url>/docker_1:0.1.0

docker_2:
  <<: *build
  script:
    - docker build -t <registry_url>/docker_2:current -f docker_2/current.Dockerfile docker_2/
    - docker push <registry_url>/docker_2:current
    - docker build -t <registry_url>/docker_2:next -f docker_2/next.Dockerfile docker_2/
    - docker push <registry_url>/docker_2:next

docker_3_api:
  <<: *build
  script:
    - docker build -t <registry_url>/docker_3/api:0.0.1 -f docker_3/api/Dockerfile docker_3/api
    - docker push <registry_url>/docker_3/api:0.0.1

docker_3_site:
  <<: *build
  script:
    - docker build -t <registry_url>/docker_3/site:0.0.1 -f docker_3/site/Dockerfile docker_3/site
    - docker push <registry_url>/docker_3/site:0.0.1
...
Issues
Let’s discuss the main issues stemming from the current setup:
- All images are built on every default branch pipeline, even when they haven’t changed, consuming extra build resources unnecessarily, increasing the pipeline execution time, and potentially failing the pipeline for reasons unrelated to the changes.
- Images aren’t built on branches, so developers and DevOps engineers cannot get automated feedback about their proposed changes until they have merged them into the default branch.
- For the majority of the Docker image definitions, the corresponding `.gitlab-ci.yml` job definition follows the same structure:
  - Build the image
  - Push the image to the internal GitLab registry
Solution
After exploring some alternatives, I decided to leverage built-in GitLab capabilities to automate and simplify the procedure of adding or modifying the Docker images.
- Dynamic child pipelines: GitLab offers the capability of dynamically triggering child pipelines from a running pipeline, using a YAML file generated by a job and stored as an artifact. See the docs for more details.
- Container Registry API: Provides API access to query and modify the GitLab Container Registry, including finding image names, tags, etc. See the docs for more details.
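To make this concrete, these are the kinds of Registry API queries the generator script below relies on, sketched with curl; `<GITLAB_API_URL>`, `<repository_id>` and `<image_tag>` are placeholders:
# List the image repositories defined in a project
curl --header "PRIVATE-TOKEN: $CI_TOKEN" "<GITLAB_API_URL>/projects/$CI_PROJECT_ID/registry/repositories"
# Check whether a specific tag exists for a repository (a 404 means it hasn't been published yet)
curl --header "PRIVATE-TOKEN: $CI_TOKEN" "<GITLAB_API_URL>/projects/$CI_PROJECT_ID/registry/repositories/<repository_id>/tags/<image_tag>"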
Implementation
The implementation consists of several components:
- Child pipeline generator script: Accesses the Registry API to verify whether there is a new Docker image or tag defined in the repository that is not present in the Registry yet, and creates the child pipeline from the collected Docker image data.
- Child pipeline template: Used by the above script to define the child pipeline that will be run by GitLab. It contains the job definitions to run in the default branch, as well as in non-default ones.
- Dockerfile & Gemfile: Specify the dependencies and packaging needed to run the script as part of the repo pipeline.
- Repository pipeline: Uses the above components to build the Docker image that runs the script, and triggers the child pipeline.
If you want to run the code below as part of a GitLab repo, you will need to make the following changes:
- `<registry_url>`: replace with the URL of your registry (either internal or DockerHub).
- `<GITLAB_API_URL>`: replace with the URL of the GitLab API.
- `CI_TOKEN`: when I originally implemented this, I had to use a personal token to be able to interact with the Registry API. In more recent versions, the CI/CD job token may have enough access to fulfill these requirements.
- Define at least one `Dockerfile`.
- Define at least one `build.yml` at the same folder level as the above `Dockerfile`.
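For example, reusing `docker_1` from the repo structure above, a minimal setup would add a `build.yml` next to its `Dockerfile` (its contents are covered in the next section):
...
├── docker_1
│   ├── Dockerfile
│   ├── build.yml
│   └── entrypoint.sh
...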
Child pipeline generator script
The script is the main building block. It works as follows:
- Gets the list of all image repositories defined in the GitLab project.
- Parses all existing `build.yml` and `*.build.yml` files in the repository and prepares the metadata to generate the jobs in the child pipeline. A `build.yml` file provides the following information:
name: <image_name>
tag: <image_tag>
- Checks if any of the Docker images found in the metadata is not already present in the Registry (either the image name, if it is new, or the tag, if updating an existing image).
- Applies the data to the template to generate the final child pipeline artifact.
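As a hypothetical example, a `docker_1/build.yml` matching the first job of the original pipeline would contain:
# docker_1/build.yml
name: docker_1
tag: 0.1.0
A prefixed variant such as `next.build.yml` would instead pair with `next.Dockerfile` in the same directory, mirroring the `docker_2` layout.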
#! /usr/bin/env ruby
# frozen_string_literal: true

# generate_custom_images_pipeline.rb

require 'erb'
require 'yaml'
require 'ostruct'
require 'pathname'
require 'gitlab'

# Unique, job-friendly identifier for an image/tag pair
def sanitize_name(internal_name, tag)
  "#{internal_name}_#{tag}"
end

def full_internal_name(name)
  "<registry_url>/#{name}"
end

# Derive the Dockerfile name from the build file name:
# build.yml -> Dockerfile, <name>.build.yml -> <name>.Dockerfile
def prepare_dockerfile_name(build_filename)
  prefix = build_filename.to_s.delete_suffix('build.yml')
  dockerfile_name = 'Dockerfile'
  if prefix != '' # this means that we have a prefixed build.yml, e.g. <name>.build.yml
    # prefix includes the last dot
    dockerfile_name = "#{prefix}Dockerfile"
  end
  dockerfile_name
end

Gitlab.configure do |config|
  config.endpoint = '<GITLAB_API_URL>'
  config.private_token = ENV.fetch('CI_TOKEN')
end

REPO_DIR = Pathname(ENV.fetch('CI_PROJECT_DIR'))
BASE_DIR = Pathname(File.expand_path(File.dirname(__FILE__)))
CHILD_PIPELINE_TEMPLATE_PATH = BASE_DIR.join('pipeline_custom_images.yaml.erb')
CHILD_PIPELINE_OUTPUT_PATH = REPO_DIR.join('pipeline_custom_images.yaml')
PROJECT_ID = ENV.fetch('CI_PROJECT_ID')

puts 'Getting list of existing images'
repositories = Gitlab.registry_repositories(PROJECT_ID)
                     .auto_paginate
                     .to_h { |repo| [repo.name, repo] }

custom_images = []
puts 'Detect custom images with automated build process'
REPO_DIR.glob('**/*build.yml').each do |build_path|
  puts "Processing #{build_path}"
  dockerfile_name = prepare_dockerfile_name(build_path.basename)
  context_path = build_path.parent
  dockerfile_path = context_path.join(dockerfile_name)
  raise "Expected Dockerfile at #{dockerfile_path} not found" unless dockerfile_path.exist?

  build_config = YAML.safe_load(build_path.read)
  unless build_config.key?('name') && build_config.key?('tag')
    raise "Invalid build config found at #{context_path}. It needs both name and tag"
  end

  full_internal_name = full_internal_name(build_config['name'])
  tag = build_config['tag']
  custom_images << OpenStruct.new(
    internal_name: build_config['name'],
    full_internal_name: full_internal_name,
    tag: tag,
    sanitized_name: sanitize_name(build_config['name'], tag),
    dockerfile_path: dockerfile_path.relative_path_from(REPO_DIR).to_s,
    context: context_path.relative_path_from(REPO_DIR).to_s
  )
end

non_published_images = []
custom_images.each do |image|
  unless repositories.key?(image.internal_name)
    puts "#{image.internal_name} image | Not found in container registry. Adding tag"
    non_published_images << image
    next
  end

  puts "#{image.internal_name} image | Getting tags from container registry"
  repo = repositories[image.internal_name]
  begin
    Gitlab.registry_repository_tag(repo.project_id, repo.id, image.tag)
  rescue Gitlab::Error::NotFound
    puts "#{image.internal_name} image | Tag #{image.tag} not found. Adding tag"
    non_published_images << image
    next
  end
end

if non_published_images.empty?
  puts 'No new custom images or tags to build'
else
  puts "Custom images to build: #{non_published_images.map(&:sanitized_name)}"
end

puts "Loading pipeline template from #{CHILD_PIPELINE_TEMPLATE_PATH}"
template = ERB.new(CHILD_PIPELINE_TEMPLATE_PATH.read, trim_mode: '<>')

puts "Generate child pipeline at #{CHILD_PIPELINE_OUTPUT_PATH}"
# Evaluate the pipeline_custom_images.yaml.erb template
# with the processed data from the different build.yml
CHILD_PIPELINE_OUTPUT_PATH.open('w') do |f|
  f << template.result_with_hash(
    non_published_images: non_published_images
  )
end
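If you want to dry-run the script outside of CI, you can export the variables that GitLab would otherwise inject; the values below are hypothetical:
export CI_PROJECT_DIR="$(pwd)"
export CI_PROJECT_ID="12345"
export CI_TOKEN="<your_token>"
bundle exec ruby generate_custom_images_pipeline.rb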
Child pipeline template
The child pipeline declares the jobs to run for new images or for images that have changed. It has three types of jobs:
- `new_<%= image.sanitized_name %>_branch`: Templatized job to run when the child pipeline is triggered on non-default branches. It builds the Dockerfile, but doesn’t push the image to the registry. We use it to validate that the Dockerfile can be built successfully before merging to the default branch.
- `new_<%= image.sanitized_name %>_default`: Templatized job to run when the child pipeline is triggered on the default branch. It builds the Dockerfile, tags the image with the tag value extracted from the `build.yml`, and pushes it to the registry.
- `no_changes`: Fallback job to be used if there are no other jobs to execute in the pipeline (i.e. no `build.yml` was changed and no new Dockerfile definition was added). This is required because a valid pipeline needs at least one job to run.
# pipeline_custom_images.yaml.erb
stages:
  - build

.build_keys: &build_keys
  before_script:
    - docker login -u $DOCKERHUB_USERNAME -p $DOCKERHUB_PASSWORD
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN <registry_url>
  stage: build
  tags:
    - pool

.image_in_branch:
  extends: .build_keys
  only:
    refs:
      - branches
  except:
    variables:
      - $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH

.image_in_default:
  extends: .build_keys
  only:
    variables:
      - $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH

<% non_published_images.each do |image| %>
new_<%= image.sanitized_name %>_branch:
  extends: .image_in_branch
  script:
    - docker build -t <%= "#{image.full_internal_name}:#{image.tag}" %> -f <%= image.dockerfile_path %> <%= image.context %>
<% end %>

<% non_published_images.each do |image| %>
new_<%= image.sanitized_name %>_default:
  extends: .image_in_default
  script:
    - docker build -t <%= "#{image.full_internal_name}:#{image.tag}" %> -f <%= image.dockerfile_path %> <%= image.context %>
    - docker tag <%= "#{image.full_internal_name}:#{image.tag}" %> <%= image.full_internal_name %>:$CI_COMMIT_SHA
    - docker push <%= "#{image.full_internal_name}:#{image.tag}" %>
    - docker push <%= image.full_internal_name %>:$CI_COMMIT_SHA
<% end %>

<% if non_published_images.length == 0 %>
no_changes:
  stage: build
  tags:
    - pool
  script:
    - echo "No new custom images or tags are being added/changed"
<% end %>
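For illustration, if the hypothetical `docker_1/build.yml` from earlier were the only pending image, the rendered `pipeline_custom_images.yaml` would contain jobs along these lines (the hidden base jobs are omitted here):
new_docker_1_0.1.0_branch:
  extends: .image_in_branch
  script:
    - docker build -t <registry_url>/docker_1:0.1.0 -f docker_1/Dockerfile docker_1

new_docker_1_0.1.0_default:
  extends: .image_in_default
  script:
    - docker build -t <registry_url>/docker_1:0.1.0 -f docker_1/Dockerfile docker_1
    - docker tag <registry_url>/docker_1:0.1.0 <registry_url>/docker_1:$CI_COMMIT_SHA
    - docker push <registry_url>/docker_1:0.1.0
    - docker push <registry_url>/docker_1:$CI_COMMIT_SHA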
Dockerfile & Gemfile
This 2 files define how the software is going to be packaged (Dockerfile
) and its dependencies (Gemfile
). For this automation, the only library used is the gitlab gem.
# child_images_pipeline/Dockerfile
FROM <registry_url>/ruby:2.7.2-alpine
WORKDIR /pipelines
COPY . .
RUN bundle check || bundle install
# frozen_string_literal: true
source 'https://rubygems.org'
# BUNDLE_GEMFILE=Gemfile bundle install
gem 'gitlab'
Repository pipeline
Finally, the repo pipeline ties everything together as follows:
- `build_child_pipelines_image`: Builds the Docker image from the `Dockerfile` under the `child_images_pipeline` folder and tags it as `latest` (only if there are changes; otherwise the job is skipped).
- `ci_yaml_for_custom_images`: Builds the `gitlab-ci.yml` definition for the dynamic child pipeline by running the script against the `.erb` template. The generated pipeline file (`pipeline_custom_images.yaml`) is then saved as an artifact to be consumed by the subsequent job.
- `trigger_pipeline_for_custom_images`: Triggers the child pipeline using `pipeline_custom_images.yaml` as its definition.
# .gitlab-ci.yml
stages:
  - build_pipeline_image
  - build_child_pipelines
  - build_downstream

build_child_pipelines_image:
  before_script:
    - docker login -u $DOCKERHUB_USERNAME -p $DOCKERHUB_PASSWORD
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN <registry_url>
  tags:
    - pool
  stage: build_pipeline_image
  only:
    variables:
      - $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH
    changes:
      - child_images_pipeline/*
  script:
    - docker build -t <registry_url>/child-images-pipeline:latest -f child_images_pipeline/Dockerfile child_images_pipeline/
    - docker push <registry_url>/child-images-pipeline:latest

ci_yaml_for_custom_images:
  tags:
    - pool
  stage: build_child_pipelines
  artifacts:
    paths:
      - pipeline_custom_images.yaml
  script:
    - cd /pipelines && bundle exec ruby ./generate_custom_images_pipeline.rb
  image: <registry_url>/child-images-pipeline:latest

trigger_pipeline_for_custom_images:
  stage: build_downstream
  needs:
    - ci_yaml_for_custom_images
  trigger:
    include:
      - artifact: pipeline_custom_images.yaml
        job: ci_yaml_for_custom_images
    strategy: depend
Next steps
With this solution in place, let’s briefly mention some areas where it could be improved in the future:
- Automatic cleanup of images from the registry if the directory or the `Dockerfile` is removed from the repository. This would require instrumentation to evaluate the usage of the images in the registry; a sketch of the relevant API call is shown below.
- Break down the Ruby logic in the script and add tests to verify behavior and prevent regressions. This would require changes to the repo pipeline to include at least a new testing job.
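As a starting point for the cleanup idea, the Container Registry API also exposes deletion endpoints; a hypothetical call, using the same placeholders as before, could look like:
# Delete an image repository that no longer has a matching definition in the repo
curl --request DELETE --header "PRIVATE-TOKEN: $CI_TOKEN" "<GITLAB_API_URL>/projects/$CI_PROJECT_ID/registry/repositories/<repository_id>"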