Writing good Dockerfiles in 2023

Introduction

Recently I got an arm VM at Hetzner for my Kubernetes cluster. While these are cheap, I quickly discovered how neglected arm images actually are.

To give you a quick rundown:

- Many projects don't offer arm images
- People hardcode their registries in the build scripts, making out-of-tree building harder than needed
- People write way too complex Dockerfiles (please trust your Docker. It sets a lot of useful things by default!)
- People don't sign things
- Docker and Podman are both kinda odd with their multiarch images
  - Docker generates 2 dead references in the manifest, pointing at things which don't exist
  - Podman needs you to manage the manifest somewhat manually

You may now rightfully ask how to actually do this properly.

Writing a good Dockerfile

A good Dockerfile is actually simple. There are more specific examples if you search for them, but here are some good guidelines to start from:

- Make use of multistage Dockerfiles.
  - Docker images will be smaller
  - Caching can be more efficient in some cases
  - Your CI may be faster since there is less to upload
- Don't require the .git folder to exist. People may run your build in all kinds of scenarios, for example with your repo checked out as a submodule
- Don't hide the build process behind a HUGE Makefile
  - This decreases both readability and maintainability
  - It also reduces flexibility
- Only copy the files you need into the Docker image
  - Use a .dockerignore file
- Set a UID and GID for a rootless Docker image
- Try to be minimal
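
To make these guidelines a bit more concrete, here is a minimal sketch. It is a small shell script that writes an example Dockerfile and .dockerignore into a demo folder. Everything specific in it is an assumption for illustration: the Go project layout (cmd/app, go.mod, go.sum), the folder name dockerfile-demo and the UID/GID 10001 are made up, so adapt them to your project.

```shell
#!/bin/sh
set -eu

# Hypothetical demo folder; in a real project both files live in the repo root.
demo_dir="dockerfile-demo"
mkdir -p "$demo_dir"

# Keep the build context small: nothing listed here is sent to the builder.
cat > "$demo_dir/.dockerignore" <<'EOF'
.git
.github
*.md
EOF

cat > "$demo_dir/Dockerfile" <<'EOF'
# Stage 1: build. The toolchain stays in this stage and never ships.
FROM golang:1.21 AS builder
WORKDIR /src
# Copy the dependency manifests first so this layer stays cached until they change.
COPY go.mod go.sum ./
RUN go mod download
# Copy only the sources the build needs, not the whole repository.
COPY cmd/ ./cmd/
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Stage 2: runtime. Contains nothing but the static binary.
FROM scratch
COPY --from=builder /out/app /app
# Numeric UID:GID so the container does not run as root.
USER 10001:10001
ENTRYPOINT ["/app"]
EOF

echo "wrote $demo_dir/Dockerfile and $demo_dir/.dockerignore"
```

Note that nothing in this Dockerfile needs the .git folder or a Makefile, and that the final image only contains what was explicitly copied into it.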

Writing a good Docker CI on GitHub Actions

For most people this is the easier part, and yet people often seem to explicitly disable arm for no reason.

An example of a good GitHub Actions workflow would be:

name: Publish Synapse Docker image

on:
  push:

jobs:
  push_to_registries:
    permissions:
      contents: read
      packages: write
      # Lets the job request a GitHub OIDC token, which cosign uses for signing
      id-token: write
    name: Push synapse image to repo
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
        uses: actions/checkout@v3
        with:
          submodules: "recursive"

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Install Cosign
        uses: sigstore/cosign-installer@v3
      # QEMU emulates arm64 on the amd64 runner, Buildx makes use of it
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Docker meta
        id: docker_meta
        uses: docker/metadata-action@v4
        with:
          tags: type=sha,format=long

      - name: Build and push
        id: build_and_push
        uses: docker/build-push-action@v4
        with:
          push: true
          context: .
          sbom: true
          provenance: true
          # Listing both platforms makes the result a multiarch image
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.docker_meta.outputs.tags }}

      - name: Sign the images with GitHub OIDC Token
        env:
          DIGEST: ${{ steps.build_and_push.outputs.digest }}
          TAGS: ${{ steps.docker_meta.outputs.tags }}
        # Sign by digest so the signature belongs to exactly the pushed image
        run: cosign sign --yes "${TAGS}@${DIGEST}"

But what does this actually do, you may ask. It's actually quite simple: First it clones the repo with all the submodules, then it logs in to Docker Hub. After these two basic steps, we install all the required dependencies.

First we install Cosign^[1], which we need later to be able to sign our image.

Then we install the 2 components required to do a multiarch build. Since we are usually using amd64 runners, QEMU is required to emulate arm64. The Buildx setup is then needed to make use of QEMU in the build.

After we have installed our dependencies, we can generate the metadata. You can look at its GitHub repo^[2] for more details on how to actually use it. The important part here is just the tags section, since this format is needed for cosign.

The metadata can then be used to build the image itself and push it to the server. The most notable thing about this step is that we define both arm64 and amd64 as platforms. This makes the resulting image a multiarch image.
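
If you want to try the QEMU, Buildx and build steps outside of CI, a rough local equivalent could look like the sketch below. The image name docker.io/example/app:latest, the builder name multiarch and the RUN_BUILD switch are made up for this example. The script only defines functions unless you explicitly set RUN_BUILD=1, so nothing touches Docker by accident.

```shell
#!/bin/sh
set -eu

# Hypothetical image reference; replace with your own registry and repository.
IMAGE="${IMAGE:-docker.io/example/app:latest}"

setup_multiarch() {
  # Register QEMU binfmt handlers so an amd64 host can run arm64 build steps.
  docker run --privileged --rm tonistiigi/binfmt --install arm64
  # Create and select a builder that can produce multi-platform images.
  docker buildx create --name multiarch --use
}

build_multiarch() {
  # One build, two platforms: the pushed tag ends up as a multiarch image.
  docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --tag "$IMAGE" \
    --push \
    .
}

# Only run the Docker commands when explicitly asked to.
if [ "${RUN_BUILD:-0}" = "1" ]; then
  setup_multiarch
  build_multiarch
fi
```

Run it as RUN_BUILD=1 IMAGE=registry.example/you/app:tag sh build.sh from the folder that contains your Dockerfile (build.sh being whatever you named the script).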

As a last step, we sign the image with cosign, using the GitHub OIDC token of the workflow run as the identity. That way, people can ensure they are actually using the image from the right person.
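
Signing only helps if somebody checks the signature, so here is a sketch of how a user could verify such an image with cosign. The repository example/app and the image reference are placeholders. I am also assuming a cosign 2.x client, where the expected identity and OIDC issuer have to be passed explicitly.

```shell
#!/bin/sh
set -eu

# Hypothetical values; use the GitHub repo that built the image and the real image reference.
REPO="${REPO:-example/app}"
IMAGE_REF="${IMAGE_REF:-docker.io/example/app:latest}"

identity_regexp() {
  # Workflow identities look like https://github.com/OWNER/REPO/.github/workflows/...
  printf '%s%s/' '^https://github\.com/' "$1"
}

verify_image() {
  # Succeeds only if a signature exists that was issued to a workflow of $REPO.
  cosign verify \
    --certificate-identity-regexp "$(identity_regexp "$REPO")" \
    --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
    "$IMAGE_REF"
}

# Only call cosign when explicitly asked to.
if [ "${RUN_VERIFY:-0}" = "1" ]; then
  verify_image
fi
```

Matching the identity against the repository instead of accepting any valid signature is the point here: it is what ties the image to the right person.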

Bonus: Building multiarch on modern Podman

I also found "Multi-arch build with Podman · Rust stuff" while digging through the internet on multiarch. It is fairly complete, but there are two minor improvements you can make:

First, you should add a podman manifest rm ${MANIFEST_NAME} before the create, to make sure that the manifest can actually be created.
Secondly, you need to use podman manifest push instead of podman push, otherwise only one of the images is pushed instead of the manifest.
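
Put together, the two improvements could look roughly like the sketch below. This is not the script from the linked article, just a minimal version under my own assumptions: the manifest name, the image reference and the RUN_BUILD switch are made up, and it expects a reasonably modern Podman with QEMU emulation already set up on the host.

```shell
#!/bin/sh
set -eu

# Hypothetical names; replace with your own.
MANIFEST_NAME="${MANIFEST_NAME:-app-multiarch}"
IMAGE="${IMAGE:-docker.io/example/app:latest}"

build_and_push_manifest() {
  # Improvement 1: remove a stale manifest first; ignore the error if there is none.
  podman manifest rm "$MANIFEST_NAME" 2>/dev/null || true
  podman manifest create "$MANIFEST_NAME"

  # Build each architecture and add the result to the manifest.
  for arch in amd64 arm64; do
    podman build --platform "linux/$arch" --manifest "$MANIFEST_NAME" .
  done

  # Improvement 2: push the manifest with all its images, not a single image.
  podman manifest push --all "$MANIFEST_NAME" "docker://$IMAGE"
}

# Only call Podman when explicitly asked to.
if [ "${RUN_BUILD:-0}" = "1" ]; then
  build_and_push_manifest
fi
```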
