New Dockerfile Features in v1.7.0



New versions of the BuildKit builder toolkit, the Docker Buildx CLI, and the Dockerfile frontend for BuildKit (v1.7.0) were recently released. In this blog post, I’ll walk through some of the new Dockerfile capabilities and show how to use them in your projects.

Dockerfile Versioning

Let’s begin with a quick reminder of Dockerfile versioning and how to transition to v1.7.0.

Although Dockerfiles are the most common way to build images, BuildKit is not tied to this format. It supports multiple frontends for defining build steps. Anyone can create a frontend, package it as a container image, and have it fetched from a registry when a build is invoked.

The latest release publishes two frontend images on Docker Hub: docker/dockerfile:1.7.0 and docker/dockerfile:1.7.0-labs. The labs image additionally includes experimental features.

To use these frontends, specify a #syntax directive at the start of the file. This directive tells BuildKit which frontend image to use for the build. The example below uses the latest release in the 1.7.x series:

#syntax=docker/dockerfile:1.7
FROM alpine
...

This decouples the Dockerfile syntax from BuildKit itself: you can adopt new Dockerfile features immediately, regardless of your BuildKit version. The examples in this post work with any Docker version that supports BuildKit (the default builder as of Docker Engine v23), as long as you define the correct #syntax directive in your Dockerfile.

For more information on Dockerfile frontend versions, refer to the Docker docs.
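As a quick reference, the syntax directive accepts image tags at different levels of precision. The sketch below uses the 1.7 tag from the example above and lists a few alternatives as comments; the tag meanings follow the versioning scheme described in the Docker documentation, and the RUN step is only a placeholder.

```dockerfile
#syntax=docker/dockerfile:1.7
# Alternatives for the line above (a Dockerfile may contain only one syntax directive):
#   docker/dockerfile:1          latest stable 1.x release
#   docker/dockerfile:1.7.0      exactly version 1.7.0
#   docker/dockerfile:1.7-labs   latest 1.7.x release, including labs features
FROM alpine
# Placeholder step so the file builds
RUN echo "parsed by the docker/dockerfile:1.7 frontend"
```

Pinning an exact version such as 1.7.0 gives reproducible parsing, while the shorter tags pick up bug fixes automatically.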

Variable Expansions

When writing Dockerfiles, build steps can contain variables defined with the ARG (build argument) and ENV (environment variable) instructions. The difference between the two is that environment variables are kept in the resulting image and persist when a container is created from it.

You usually reference such variables as ${NAME} or, more simply, $NAME in COPY, RUN, and other instructions.

You might not know that Dockerfile also supports two forms of bash-like variable expansion:

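  • ${variable:-word} evaluates to word if the variable is unset or empty; otherwise, it evaluates to the variable’s value
  • ${variable:+word} evaluates to word if the variable is set and not empty; otherwise, it evaluates to an empty string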

Until now, these special forms were of limited use, because the default value of an ARG instruction can also be set directly:

FROM alpine
ARG foo="default value"

If you are familiar with shell scripting, you know that Bash and other shells offer many additional forms of variable expansion to simplify your scripts.

Dockerfile v1.7 adds:

  • ${variable#pattern} and ${variable##pattern} to remove the shortest or longest prefix matching pattern from the start of the variable’s value
  • ${variable%pattern} and ${variable%%pattern} to remove the shortest or longest suffix matching pattern from the end of the variable’s value
  • ${variable/pattern/replacement} to replace the first occurrence of pattern
  • ${variable//pattern/replacement} to replace all occurrences of pattern
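The new expansion forms can be tried side by side in a throwaway Dockerfile. This is a sketch: FILE, NAME, and the derived argument names are hypothetical, and the values noted in the comments assume the frontend follows the usual shell semantics for these forms. The expansions are done in ARG instructions so that the Dockerfile frontend, not the shell inside RUN, evaluates them.

```dockerfile
#syntax=docker/dockerfile:1.7
FROM alpine
# Hypothetical input values, chosen only to show each expansion form
ARG FILE=dir/sub/archive.tar.gz
ARG NAME=my-app-name

# "#" removes the shortest matching prefix: sub/archive.tar.gz
ARG NO_FIRST_DIR=${FILE#*/}
# "##" removes the longest matching prefix: archive.tar.gz
ARG BASENAME=${FILE##*/}
# "%" removes the shortest matching suffix: dir/sub/archive.tar
ARG NO_LAST_EXT=${FILE%.*}
# "%%" removes the longest matching suffix: dir/sub/archive
ARG NO_EXTS=${FILE%%.*}
# "/" replaces the first match: my_app-name
ARG FIRST_DASH=${NAME/-/_}
# "//" replaces every match: my_app_name
ARG ALL_DASHES=${NAME//-/_}

# The values were computed by the Dockerfile frontend; RUN only prints them
RUN echo "$NO_FIRST_DIR $BASENAME $NO_LAST_EXT $NO_EXTS $FIRST_DASH $ALL_DASHES"
```

Building with docker buildx build --progress=plain . shows the echoed values in the build output.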

Let’s start with some simple examples.

Projects still can’t agree on whether version numbers in download URLs should have a “v” prefix. The following strips the prefix so you get the format you need:

# example VERSION=v1.2.3
ARG VERSION=${VERSION#v}
# VERSION is now '1.2.3'
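A value passed with --build-arg replaces an ARG default as-is, so redefining VERSION in place only strips the prefix when VERSION was set earlier in the Dockerfile rather than on the command line. One way to strip it in both cases, sketched here with a hypothetical VERSION_NUMBER argument, is to derive a second build argument from the first:

```dockerfile
#syntax=docker/dockerfile:1.7
FROM alpine
# Overridable with --build-arg VERSION=v2.0.0; the default is only an example
ARG VERSION=v1.2.3
# VERSION_NUMBER is a hypothetical helper name. It is not passed by the user,
# so its default is always computed from VERSION with the leading "v" removed
ARG VERSION_NUMBER=${VERSION#v}
RUN echo "tag=${VERSION} number=${VERSION_NUMBER}"
```

With this layout, VERSION keeps whatever form the user passes, and VERSION_NUMBER is where the stripped form lives.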

In the next case, the same project uses both variants:

ARG VERSION=v1.7.15
ADD https://github.com/containerd/containerd/releases/download/${VERSION}/containerd-${VERSION#v}-linux-amd64.tar.gz /

To configure different command behaviors for multi-platform builds, BuildKit provides useful built-in variables such as TARGETOS and TARGETARCH. Unfortunately, not all projects use the same values. For example, containers and the Go ecosystem refer to the 64-bit ARM architecture as arm64, but sometimes you need aarch64 instead.

ADD https://.../download/bun-v1.0.30/bun-linux-${TARGETARCH/arm64/aarch64}.zip /

In this case, the URL also uses a custom name, x64, for the AMD64 architecture. To pass a variable through multiple expansions, add another ARG definition that expands the previous value. You could also write all the definitions on a single line, as ARG accepts multiple parameters, but that may hurt readability.

ARG ARCH=${TARGETARCH/arm64/aarch64}
ARG ARCH=${ARCH/amd64/x64}
ADD https://.../download/bun-v1.0.30/bun-linux-${ARCH}.zip /
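The snippets above leave out the surrounding stage. A self-contained sketch, which only prints the mapped name instead of downloading anything, shows where ARG TARGETARCH has to be declared. Building from $BUILDPLATFORM is a choice made here so that the RUN step executes natively while TARGETARCH still reflects the requested target platform.

```dockerfile
#syntax=docker/dockerfile:1.7
FROM --platform=$BUILDPLATFORM alpine
# TARGETARCH is provided by BuildKit, but a stage must declare it to use it
ARG TARGETARCH
# Rename arm64 -> aarch64, then amd64 -> x64; other values pass through unchanged
ARG ARCH=${TARGETARCH/arm64/aarch64}
ARG ARCH=${ARCH/amd64/x64}
# Prints the value that would be substituted into the download URL
RUN echo "TARGETARCH=${TARGETARCH} ARCH=${ARCH}"
```

Running docker buildx build --progress=plain --platform linux/arm64,linux/amd64 . should print ARCH=aarch64 for the arm64 build and ARCH=x64 for the amd64 build.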

Note that with this approach, if a user passes their own --build-arg ARCH=value, that value is used as-is and none of the expansions above are applied.

Now, let’s look at how new expansions can be useful in multi-stage builds.

One of the techniques described in “Advanced multi-stage build patterns” shows how build arguments can be used so that different Dockerfile commands run depending on the build-arg value. For example, you can use that pattern if you build a multi-platform image and wish to run additional COPY or RUN commands only for specific platforms.

If this method is new to you, you can learn more about it in the original post. In short, you define a global build argument, define one build stage per possible value with that value embedded in the stage name, and then reference the build argument in a FROM instruction to select which of those stages your target stage is based on.

Old example:

ARG BUILD_VERSION=1

FROM alpine AS base
RUN …

FROM base AS branch-version-1
RUN touch version1

FROM base AS branch-version-2
RUN touch version2

FROM branch-version-${BUILD_VERSION} AS after-condition

FROM after-condition
RUN …

When using this pattern for multi-platform builds, one limitation is that your Dockerfile must define every possible value of the build argument. This is problematic because we usually want a Dockerfile to build on any platform rather than on a fixed set. In practice, some Dockerfiles define dummy stage aliases for every architecture and cannot be built for any architecture that is not listed. What we would rather express is that one architecture has special behavior while all others share a common behavior.

With the new expansions, this becomes possible. The following example runs special commands only on RISC-V, which is still relatively new and may need custom behavior:

#syntax=docker/dockerfile:1.7

ARG ARCH=${TARGETARCH#riscv64}
ARG ARCH=${ARCH:+"common"}
ARG ARCH=${ARCH:-$TARGETARCH}

FROM --platform=$BUILDPLATFORM alpine AS base-common
ARG TARGETARCH
RUN echo "Common build, I am $TARGETARCH" > /out

FROM --platform=$BUILDPLATFORM alpine AS base-riscv64
ARG TARGETARCH
RUN echo "Riscv only special build, I am $TARGETARCH" > /out

FROM base-${ARCH} AS base

Let’s go over these ARCH definitions.

  • The first sets ARCH to TARGETARCH but removes “riscv64” from the value, so on RISC-V it becomes empty.
  • Next, as described earlier, we don’t want the other architectures to use their own values; we want them all to share a common value. So we set ARCH to “common” unless it was emptied by the previous riscv64 rule.
  • Finally, if the value is still empty, we default it back to $TARGETARCH. This last definition is optional, as the two cases already have distinct values, but it makes the final stage name base-riscv64 nicer to read.

The same technique extends to multiple conditions that share a branch, and to conditions based on architecture variants.

As with the initial example of conditions between stages, the new pattern is not limited to controlling platform differences; it works with any build argument. If you have used this pattern before, you can now effectively define an “else” clause, whereas previously you were limited to “if” clauses.
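Because the pattern works with any build argument, the same three-step ARG sequence can give a non-platform argument an if/else. In this sketch, BUILD_MODE and BRANCH are hypothetical names: the value debug selects its own stage, and every other value shares the default stage.

```dockerfile
#syntax=docker/dockerfile:1.7
# BUILD_MODE is a hypothetical build argument: "debug" gets special handling,
# every other value falls through to the shared "default" branch
ARG BUILD_MODE=release
# Empty only when BUILD_MODE is "debug" (or empty)
ARG BRANCH=${BUILD_MODE#debug}
# Any remaining non-empty value becomes "default"
ARG BRANCH=${BRANCH:+default}
# The empty case is renamed to "debug" for a readable stage name
ARG BRANCH=${BRANCH:-debug}

FROM alpine AS branch-debug
RUN echo "debug branch" > /mode

FROM alpine AS branch-default
RUN echo "default branch" > /mode

FROM branch-${BRANCH} AS after-condition
RUN cat /mode
```

With --build-arg BUILD_MODE=debug, after-condition starts from branch-debug; the default release value, or any other non-empty value, selects branch-default. An empty BUILD_MODE also ends up in the debug branch, because it stays empty through the first two definitions.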

Copy while keeping parent directories

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature.

#syntax=docker/dockerfile:1.7-labs

When you are copying files in your Dockerfile, for example:

COPY app/file /to/dest/dir/

This copies the source file directly into the destination directory. If the source path is a directory, the files inside it are copied directly into the destination path, without recreating the source directory itself.

But what if you have a file structure such as:


├── app1
│   ├── docs
│   │   └── manual.md
│   └── src
│       └── server.go
└── app2
    └── src
        └── client.go

Suppose you want to copy only the files in app1/src, but have them land at /to/dest/dir/app1/src/server.go rather than just /to/dest/dir/server.go.

With the new COPY --parents flag, you can write:

COPY --parents /app1/src/ /to/dest/dir/

This will copy the files inside the src directory and recreate the app1/src directory structure for these files.

Things get more powerful when you start to use wildcard paths. To copy the src directories for both apps into their respective locations, you can write:

COPY --parents */src/ /to/dest/dir/

This creates both /to/dest/dir/app1 and /to/dest/dir/app2, but does not copy the “docs” directory. Previously, this kind of copy was not possible with a single instruction: you would have needed a separate COPY for each directory (as in this case) or a workaround using the RUN --mount instruction.

You can also use double-star wildcard (**) to match files under any directory structure. For example, to copy only the Go source code files anywhere in your build context, you can write:

COPY --parents **/*.go /to/dest/dir/

If you wonder why you would copy specific files instead of simply using COPY ./ to copy everything, remember that your build cache is invalidated when the files you copy change. If you copy all files, the cache is invalidated whenever any file is added or changed; if you copy only Go files, only changes to those files affect the cache.
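To see the cache effect in practice, here is a sketch of a Go build stage that combines the --parents wildcard copy with a separate dependency step. It assumes a Go module with go.mod and go.sum at the root of the build context, a main package at the root, and no non-Go files needed at compile time; the golang:1.22-alpine tag and the /out/app output path are arbitrary choices.

```dockerfile
#syntax=docker/dockerfile:1.7-labs
FROM golang:1.22-alpine AS build
WORKDIR /src
# Assumes go.mod and go.sum exist at the context root
COPY go.mod go.sum ./
# Cached until go.mod or go.sum change
RUN go mod download
# Only .go files, kept at their relative paths, can invalidate the cache from here on
COPY --parents **/*.go ./
# Assumes the main package lives at the context root
RUN go build -o /out/app .
```

Editing a Markdown file or any other non-Go file should then leave both RUN steps cached, while editing a .go file reruns only the go build step.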

The new --parents flag is not limited to copying from your build context; you can also use it in multi-stage builds when copying files between stages with COPY --from. Note that with COPY --from, all source paths are expected to be absolute, so if you use --parents with such paths, they are replicated in full as they were in the source stage. That may not always be desirable: you may want to keep some of the parent directories but discard and replace others. In that case, you can put a special /./ pivot point in your source path to mark which parents should be copied and which should be ignored. This special path component works much like it does in rsync with the --relative flag.

#syntax=docker/dockerfile:1.7-labs

FROM … AS base
RUN ./generate-lot-of-files -o /out/
# /out/usr/bin/foo
# /out/usr/lib/bar.so
# /out/usr/local/bin/baz

FROM scratch
COPY --from=base --parents /out/./**/bin/ /
# /usr/bin/foo
# /usr/local/bin/baz

The above example shows how only “bin” directories are copied from the collection of files that the intermediate stage generated, but all the directories will keep their paths relative to the “out” directory.
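To try the pivot behavior without a real generator, one option is to replace the hypothetical ./generate-lot-of-files step with a RUN instruction that creates the same three files. This sketch uses alpine as the source stage purely for convenience.

```dockerfile
#syntax=docker/dockerfile:1.7-labs
FROM alpine AS base
# Stand-in for the generator: create the same three files as in the example above
RUN mkdir -p /out/usr/bin /out/usr/lib /out/usr/local/bin \
 && touch /out/usr/bin/foo /out/usr/lib/bar.so /out/usr/local/bin/baz

FROM scratch
# Path components before /./ (here /out) are dropped; those after it are recreated under /
COPY --from=base --parents /out/./**/bin/ /
```

Exporting the result with docker buildx build --output type=local,dest=rootfs . should leave rootfs/usr/bin/foo and rootfs/usr/local/bin/baz, with no usr/lib directory.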

Exclusion Filters

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature.

#syntax=docker/dockerfile:1.7-labs

Another related case when copying files with the COPY and ADD instructions is when you want to copy a group of files but exclude a specific subset. Previously, your only options were to use RUN --mount or to list the excluded files in a .dockerignore file.

However, .dockerignore files are not a good solution for this problem: they only exclude files from the client-side build context, not from builds from remote Git/HTTP URLs, and you are limited to one per Dockerfile. Use them the way you use .gitignore, to mark files that are never part of your project, not to define application-specific build logic.

With the new --exclude=[pattern] flag, you can now define such exclusion filters for your COPY and ADD instructions directly in the Dockerfile. The pattern uses the same format as .dockerignore files.

The following example copies all the files in a directory except Markdown files:

COPY --exclude=*.md app /dest/

You can use the flag multiple times to add multiple filters. The next example excludes Markdown files and also a file called README.

COPY --exclude=*.md --exclude=README app /dest/

With a double-star wildcard, Markdown files are excluded not only in the copied directory but also in any of its subdirectories:

COPY --exclude=**/*.md app /dest/

As in .dockerignore files, you can also define exceptions to the exclusions with the ! prefix. The following excludes all Markdown files in any copied directory, except files named “important.md”, which are still copied.

COPY --exclude=**/*.md --exclude=!**/important.md app /dest/

This double negative may be confusing at first, but note that a ! pattern only reverses a previous exclude rule; the set of files to include is still defined by the source parameter of the COPY instruction.
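Because --exclude is set per instruction, different stages can filter the same build context differently, which a single .dockerignore file cannot express. A sketch, assuming a Go module with its main package at the context root; the stage names test and build are arbitrary.

```dockerfile
#syntax=docker/dockerfile:1.7-labs
FROM golang:1.22-alpine AS test
WORKDIR /src
# The test stage needs the whole project, including *_test.go files
COPY . .
RUN go test ./...

FROM golang:1.22-alpine AS build
WORKDIR /src
# The release build drops tests and docs; .dockerignore stays untouched
COPY --exclude=**/*_test.go --exclude=**/*.md . .
RUN go build -o /out/app .
```

Building with --target test runs the tests against the full source tree, while the default target (the last stage, build) copies the sources without test files or Markdown documentation. Since BuildKit skips stages the target does not depend on, the test stage does not run in a default build.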

When using --exclude together with the --parents copy mode described earlier, note that the exclude patterns are relative to the copied parent directories, or to the /./ pivot point if one is defined.

For example, with a directory structure like:

assets
├── app1
│   ├── icons32x32
│   ├── icons64x64
│   ├── notes
│   └── backup
├── app2
│   └── icons32x32
└── testapp
    └── icons32x32

COPY --parents --exclude=testapp assets/./**/icons* /dest/

This creates the directory structure below. Only directories with the icons prefix are copied; the root parent directory “assets” is skipped because it comes before the relative pivot point; and “testapp” is not copied because it matches an exclusion filter.

dest
├── app1
│   ├── icons32x32
│   └── icons64x64
└── app2
    └── icons32x32

Conclusion

In summary, this post introduced several new features and patterns for enhancing Dockerfiles. By leveraging these capabilities, you can describe your builds more efficiently and effectively. Remember, you can start using these features in your Dockerfiles today by simply adding the #syntax line at the top, even if you haven’t updated to the latest Docker version yet.

Author

  • Mohamed BEN HASSINE

    Mohamed BEN HASSINE is a hands-on Cloud Solution Architect based in France. He has been working with Java, web, API, and cloud technologies for over 12 years and is still eager to learn new things. He currently works as a Cloud/Application Architect in Paris, designing cloud-native solutions and APIs (REST, gRPC) using cutting-edge technologies (GCP, Kubernetes, APIGEE, Java, Python).
