How To Extract Container Image Filesystem Using Docker

Even though container images are technically represented as layers of cumulative filesystem changes, from a developer’s standpoint they are just simple holders of future container files. And developers often want to explore the contents of container images accordingly – with familiar tools like cat, ls, or find. In this tutorial, we’ll see how to extract the filesystem of a container image using nothing but the standard Docker means.

The not so helpful docker save command

The docker help output has just a few entries that look relevant for our task. The first one in the list is the docker save command:

docker save --help
Usage:  docker save [OPTIONS] IMAGE [IMAGE...]

Save one or more images to a tar archive (streamed to STDOUT by default)

Trying it out quickly shows that it’s not what we need.

The docker save command, also known as docker image save, dumps the content of the image in its storage (i.e. layered) representation while we’re interested in seeing the final filesystem the image would produce when the container is about to start.
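To see the layered representation for yourself, the archive can be unpacked into a scratch folder. The following is only a sketch: the helper name unpack_saved_image is made up for this example, and the exact set of files in the archive depends on the Docker version.

```shell
# unpack_saved_image is a hypothetical helper name used only for illustration.
unpack_saved_image() {
  local image="$1" dest="$2"
  mkdir -p "$dest"
  # docker save streams a tar archive to STDOUT when -o is not given.
  docker save "$image" | tar -xf - -C "$dest"
  # Expect image metadata (such as manifest.json) and per-layer data here,
  # not a ready-to-browse root filesystem.
  ls -1 "$dest"
}
```

For example, unpack_saved_image nginx:latest saved-image would list the top-level entries of the saved archive.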

The almost working docker export command

The second command that looks relevant is docker export. Let’s try our luck with it:

docker export --help
Usage:  docker export [OPTIONS] CONTAINER

Export a container's filesystem as a tar archive

Seems like a good candidate. However, an attempt to export the filesystem of the nginx:latest image fails:

docker export nginx:latest -o nginx.tar.gz
Error response from daemon: No such container: nginx:latest

The problem with the docker export command is that it works with containers and not with their images. An obvious workaround would be to start an nginx:latest container and repeat the export attempt:

CONT_ID=$(docker run -d nginx:latest)

docker export ${CONT_ID} -o nginx.tar.gz

What’s inside?

mkdir rootfs

tar -xf nginx.tar.gz -C rootfs

ls -l rootfs
total 68
lrwxrwxrwx  1 root root    7 Mar 11 00:00 bin -> usr/bin
drwxr-xr-x  2 root root 4096 Jan 28 21:20 boot
drwxr-xr-x  4 root root 4096 Apr  9 09:43 dev
drwxr-xr-x  2 root root 4096 Mar 12 01:55 docker-entrypoint.d
...
drwxrwxrwt  2 root root 4096 Mar 11 00:00 tmp
drwxr-xr-x 12 root root 4096 Mar 11 00:00 usr
drwxr-xr-x 11 root root 4096 Mar 11 00:00 var

💡 Pro Tip: By default, extracting files from a tar archive sets the file ownership to the current user. If the original file ownership needs to be preserved, you can use the --same-owner flag while extracting the archive. Beware that you’ll have to be sufficiently privileged for that.

Example: sudo tar --same-owner -xf nginx.tar.gz -C rootfs
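If the goal is only to check who owns what, the archive can also be inspected without extracting it and without extra privileges, because each entry in the tar file records its owner. A minimal sketch, assuming GNU tar or bsdtar; the helper name list_archive_owners is hypothetical:

```shell
# list_archive_owners is a hypothetical helper name used only for illustration.
list_archive_owners() {
  local archive="$1"
  # -t lists entries, -v adds mode, owner and size, and --numeric-owner prints
  # UID/GID numbers instead of resolving them against the local user database.
  tar --numeric-owner -tvf "$archive"
}
```

For example, list_archive_owners nginx.tar.gz prints one line per archived file, including its numeric owner and group.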

Well, the output does look like what we need – just a regular folder with a bunch of files inside that we can explore as any other filesystem. However, running a container just to see its image contents has significant downsides:

  • The technique might be unnecessarily slow (e.g., heavy container startup logic).
  • Running arbitrary containers is potentially insecure.
  • Some files can be modified upon startup, spoiling the export results.
  • Sometimes, running a container is simply impossible (e.g., a broken image).

The working docker create + docker export combo

Containers are stateful creatures – they are as much about files as about processes. In particular, it means that when a containerized process dies, its execution environment, including the filesystem, is preserved on disk (unless you ran the container with the --rm flag, of course). Thus, using docker export for a stopped container should be possible, too. However, this approach suffers from pretty much the same set of drawbacks as exporting a filesystem of a running container – to get a stopped container, you need to run it first…

But wait a second! There is another type of not-running containers – the ones that were created but haven’t been started yet.

The well-known docker run command is actually a shortcut for two less frequently used commands – docker create <IMAGE> and docker start <CONTAINER>. And since containers aren’t (only) processes, the docker create command, in particular, prepares the root filesystem for the future container.

So, here is the trick:

CONT_ID=$(docker create nginx:latest)

docker export ${CONT_ID} -o nginx.tar.gz

And a handy one-liner (assuming the target folder has already been created):

docker export $(docker create nginx:latest) | tar -xC <dest>

Don’t forget to docker rm the temporary container after the export is done 😉
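The create, export, and remove steps can be bundled into one small function so that the temporary container never lingers. This is a sketch under stated assumptions rather than a canonical recipe: export_image_rootfs is a hypothetical name, and the image is assumed to define a default command so that docker create accepts it.

```shell
# export_image_rootfs is a hypothetical helper name used only for illustration.
export_image_rootfs() {
  local image="$1" dest="$2" cont_id status=0
  mkdir -p "$dest"
  # Create (but never start) a container; this prepares its root filesystem.
  cont_id=$(docker create "$image") || return 1
  # Stream the filesystem as a tar archive and unpack it into dest.
  docker export "$cont_id" | tar -xf - -C "$dest" || status=$?
  # Remove the temporary container whether or not the extraction succeeded.
  docker rm "$cont_id" > /dev/null
  return "$status"
}
```

For example, export_image_rootfs nginx:latest rootfs leaves the extracted files in rootfs and no leftover container behind.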

The more accurate docker build -o alternative

Most of the time, the docker create + docker export combo produces satisfactory results. However, you may still notice some tiny artifacts in the resulting filesystem. For instance, the exported filesystem may have the /etc/hosts file even when the original image would not have one. This is because the docker create command actually performs some additional modifications on top of the extracted original filesystem.

But what if we want to get the original filesystem, without any modifications?

It turns out that starting with Docker 19.03 (released in mid-2019), it’s possible to specify a custom output location for the docker build command using the --output|-o flag. So, here is the trick:

echo 'FROM nginx:latest' > Dockerfile

# DOCKER_BUILDKIT=1 if you're running Docker < 23.0
docker build -o rootfs .

ls -l rootfs
total 84
drwxr-xr-x  2 vagrant vagrant 4096 Aug 22 00:00 bin
drwxr-xr-x  2 vagrant vagrant 4096 Jun 30 21:35 boot
drwxr-xr-x  4 vagrant vagrant 4096 Sep 12 14:07 dev
drwxr-xr-x  2 vagrant vagrant 4096 Aug 23 03:59 docker-entrypoint.d
...
drwxr-xr-x  2 vagrant vagrant 4096 Aug 23 03:59 tmp
drwxr-xr-x 11 vagrant vagrant 4096 Aug 22 00:00 usr
drwxr-xr-x 11 vagrant vagrant 4096 Aug 22 00:00 var

Generally speaking, building container images involves running intermediate containers, but if the Dockerfile has no RUN instructions, as the one above, no containers will be spun up. So, the docker build command will just copy the FROM image contents into the anonymous image, and then save the result to the specified output location.

The --output flag works only if BuildKit is used as a builder engine, so if you’re still on Docker < 23.0 (released ~early 2023), you’ll either need to use docker buildx build or set the DOCKER_BUILDKIT=1 environment variable.
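The same build can be wrapped in a function that uses a throwaway build context and spells out the long form of the output flag. Treat it as a sketch: export_via_build is a hypothetical name, and setting DOCKER_BUILDKIT=1 is only needed on older Docker versions.

```shell
# export_via_build is a hypothetical helper name used only for illustration.
export_via_build() {
  local image="$1" dest="$2" ctx status
  # Use an empty temporary directory as the build context so that nothing
  # from the current directory is sent to the builder.
  ctx=$(mktemp -d) || return 1
  printf 'FROM %s\n' "$image" > "$ctx/Dockerfile"
  # --output type=local,dest=<dir> is the long form of "-o <dir>";
  # type=tar,dest=<file> would write a single tar archive instead.
  DOCKER_BUILDKIT=1 docker build --output "type=local,dest=$dest" "$ctx"
  status=$?
  rm -rf "$ctx"
  return "$status"
}
```

For example, export_via_build nginx:latest rootfs is equivalent to the two commands shown earlier, minus the Dockerfile left in the working directory.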

⚠️ Caveat: It might be impossible to preserve the file ownership information using the docker build -o approach.
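The artifacts left by docker create can be made visible by comparing the file listings of two extracted trees, one produced by docker create + docker export and one produced by docker build -o. A minimal sketch; rootfs_diff is a hypothetical helper, and it compares paths only, not file contents or ownership:

```shell
# rootfs_diff is a hypothetical helper name used only for illustration.
rootfs_diff() {
  local a="$1" b="$2" list_a list_b status
  list_a=$(mktemp) && list_b=$(mktemp) || return 1
  # Record the relative path of every entry in each tree, in sorted order.
  (cd "$a" && find . | sort) > "$list_a"
  (cd "$b" && find . | sort) > "$list_b"
  # Lines starting with "<" exist only in the first tree, lines starting
  # with ">" only in the second; diff returns 1 when the listings differ.
  diff "$list_a" "$list_b"
  status=$?
  rm -f "$list_a" "$list_b"
  return "$status"
}
```

For example, with the two trees extracted into folders named rootfs-export and rootfs-build (placeholder names), rootfs_diff rootfs-export rootfs-build prints the paths that only one of the methods produced.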

The bonus ctr image mount method

As you probably know, Docker delegates more and more of its container management tasks to a lower-level daemon called containerd. It means that if you have a dockerd daemon running on a machine, most likely there is a containerd daemon somewhere nearby as well. And containerd often comes with its own command-line client, ctr, that can be used, in particular, to inspect images.

The cool part about containerd is that it provides a much more fine-grained control over the typical container management tasks than Docker does. For instance, you can use ctr to mount a container image to a local folder, without even mentioning any containers:

ctr image pull docker.io/library/nginx:latest

mkdir rootfs

ctr image mount docker.io/library/nginx:latest rootfs

In the above example, the resulting rootfs folder will contain the extracted filesystem of the nginx:latest image, without any docker create-like artifacts and without potentially confusing docker build tricks.
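Since this is a real mount rather than a copy, it should be released once the inspection is over. The sketch below mounts an image, lists its root folder, and unmounts it again; peek_image_with_ctr is a hypothetical name, and the code assumes that the image has already been pulled and that the current user can talk to containerd (which usually requires root).

```shell
# peek_image_with_ctr is a hypothetical helper name used only for illustration.
peek_image_with_ctr() {
  local ref="$1" target status=0
  target=$(mktemp -d) || return 1
  # Mount the image filesystem at the temporary target folder.
  ctr image mount "$ref" "$target" > /dev/null || { rmdir "$target"; return 1; }
  ls -l "$target" || status=$?
  # Unmount and drop the temporary mount point once we are done.
  ctr image unmount "$target" > /dev/null
  rmdir "$target"
  return "$status"
}
```

For example, peek_image_with_ctr docker.io/library/nginx:latest prints the top-level folders of the image and leaves nothing mounted afterwards.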

The downside of this approach is that you may need to pull the image explicitly before mounting it, even if it has already been pulled by Docker. Historically, dockerd and containerd used different image storage backends, and it was not possible to use ctr to access images owned by dockerd. A quick check (ctr --namespace moby image ls) shows that at least with Docker Engine 26.0 (~Q1 2024), this is still the case. However, things may have already improved in Docker Desktop, thanks to the ongoing effort to offload more and more lower-level tasks from Docker to containerd.

Conclusion

In this article, we’ve learned how to extract the filesystem of a container image using standard Docker commands. As usual, there are multiple ways to achieve the same goal, and it’s important to understand the trade-offs of each one. Here is a quick summary of the methods we’ve covered:

  • docker save is unlikely to be the command you’re looking for.
  • docker export works but requires a container in addition to the image.
  • docker create + docker export is a way to export the filesystem without starting the container.
  • docker build -o is a potentially surprising but more accurate way to export the filesystem.
  • ctr image mount is a clever alternative method that also produces artifact-free results.

Author

  • Mohamed BEN HASSINE

    Mohamed BEN HASSINE is a hands-on Cloud Solution Architect based in France. He has been working with Java, Web, API, and Cloud technologies for over 12 years and is still keen on learning new things. He currently works as a Cloud / Application Architect in Paris, designing cloud-native solutions and APIs (REST, gRPC) using cutting-edge technologies (GCP / Kubernetes / APIGEE / Java / Python).
