Getting Started with Docker From Scratch

Docker originated as an internal project started by Solomon Hykes, the founder of dotCloud, during his time in France. It drew on dotCloud’s extensive experience in cloud service technology and was released as open source under the Apache 2.0 license in March 2013.

Docker is an open source application container engine written in Go. Unlike hardware virtualization technologies such as KVM and Xen, Docker relies on Linux kernel features such as cgroups, namespaces, and union file systems like AUFS. It encapsulates and isolates processes, which makes it a form of operating-system-level virtualization.

Within a Docker container, the application processes execute directly on the host’s kernel. The container doesn’t possess its own kernel, eliminating the need for hardware virtualization and resulting in lightweight resource consumption. Docker proves highly efficient in addressing challenges like dependency management and standardization of development environments in software development workflows.

Docker Installation

Docker offers two editions: Docker CE (Community Edition) and Docker EE (Enterprise Edition). Docker CE is free and comes with a 7-month support period, while Docker EE requires a paid subscription and offers a 24-month support period.

To install Docker, you can follow the official documentation on the Docker website. Docker Engine is available for Linux, Windows, and macOS through Docker Desktop. For Linux, Docker provides binaries for manual installation, which are statically linked and can be used on any Linux distribution. Docker does not test or verify installations on distro derivatives, so it’s recommended to follow the installation instructions for your specific Linux distribution.

For Docker CE, you can use the convenience script provided by Docker to install the latest release on supported Linux distributions:

curl -sSL https://get.docker.com/ | sh

This script automatically detects your Linux distribution and installs Docker CE accordingly. After installation, you can verify it by running:

sudo docker run hello-world

For Docker EE, installation requires a trial or purchased version. Since Mirantis acquired Docker Enterprise, you need to contact their sales team for a trial account. The Mirantis Launchpad CLI tool is recommended for evaluating Docker Enterprise.

For detailed installation instructions, including using the Docker repository for a more manual approach, refer to the Docker documentation and the provided sources.

Docker Concepts

In Docker, there are three main concepts: Image, Container, and Registry.

Docker Image

A Docker image is a special file system. It includes all the necessary programs, libraries, resources, configurations, and other files needed for running a container. Additionally, it contains certain configuration parameters tailored for runtime, such as anonymous volumes, environment variables, and user settings.

Docker Container

Think of the image as a class and the container as an instance.

The core of a container is a process, but unlike processes executed directly on the host, container processes operate within their own isolated namespace.

If you want to ensure that data persists beyond the lifespan of a container, you can utilize a data volume or bind a directory from the host. Operations such as reading and writing in these locations bypass the container storage layer, directly accessing and modifying data on the host (or network storage).

Docker Registry

Once an image is built, it can be published to a remote repository known as a “registry.” This registry serves as a centralized location for storing and distributing images, allowing others to download and utilize them. Docker Registry is an example of such a service.

A Docker Registry can host multiple repositories, with each repository containing multiple tags. Each tag corresponds to a specific version of an image.

Images are named using the format namespace/repository:tag, such as jwilder/nginx-proxy:latest. If the tag is omitted (e.g., mohamedbenhassine/nginx-proxy), Docker assumes the latest tag by default.

Official images typically do not include the namespace section. For example, centos can be used directly as the image name.
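
To make the naming rules concrete, here is a minimal sketch of a shell helper that splits an image name into its parts. The function name parse_image is hypothetical and not part of Docker, and the sketch assumes the name carries no registry host or digest.

```shell
# Hypothetical helper (not a Docker command): split [namespace/]repository[:tag].
# Assumes the reference has no registry host (e.g. localhost:5000/...) or digest.
# The body runs in a subshell so its variables do not leak into the caller.
parse_image() (
  ref=$1
  last=${ref##*/}                  # text after the final slash, e.g. nginx-proxy:latest
  case $last in
    *:*) tag=${last##*:}; repo=${last%:*} ;;
    *)   tag=latest;      repo=$last ;;   # a missing tag defaults to latest
  esac
  case $ref in
    */*) namespace=${ref%/*} ;;
    *)   namespace= ;;             # official images have no namespace part
  esac
  printf 'namespace=%s repository=%s tag=%s\n' "${namespace:--}" "$repo" "$tag"
)

parse_image jwilder/nginx-proxy:latest   # namespace=jwilder repository=nginx-proxy tag=latest
parse_image centos                       # namespace=- repository=centos tag=latest
```

The dash in the second result simply marks the absent namespace of an official image.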

The most commonly used public Registry service is Docker Hub, which serves as the default Registry. Users can explore image descriptions, comments, and other information on Docker Hub.

Docker Architecture

The Docker installation consists of two components: Docker Client and Docker Server. Commands are transmitted from the Client to the Server, where tasks such as creating images and running containers are executed.

Docker relies on the cgroup and namespace features of the Linux kernel.

Linux Namespace is a kernel-level mechanism that provides environment isolation for processes or groups of processes. It allows for the segregation of resources such as disk and network.
Control groups (cgroups) are used to manage and limit the resource utilization of individual processes, including CPU, memory, and more.

Since Windows and macOS lack native support for these technologies, Docker on these platforms actually operates within a Linux virtual machine.

Docker Layered Storage

Docker employs a layered storage approach, leveraging Union File System (UnionFS) technology, to efficiently manage the extensive data associated with container images. This method involves constructing layers incrementally during the image creation phase, with each layer adding to the previous one.

When a file from an earlier layer is removed in a later layer, it is not physically erased from the earlier layer. Instead, it is only marked as deleted in the later layer. This means that even though the deleted file is not visible in the final container, it still exists within the underlying image and still counts toward its size.
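
The effect of layered deletion is easiest to see by comparing two Dockerfiles. The sketch below writes both into a scratch directory; the file names, the 50 MB test file, and the alpine base image are illustrative assumptions.

```shell
# Scratch directory for the two illustrative Dockerfiles.
layers_dir=$(mktemp -d)

# Variant 1: the file is created in one layer and removed in the next.
# The removal only hides the file; the first layer still stores its bytes.
cat > "$layers_dir/Dockerfile.two-layers" <<'EOF'
FROM alpine
RUN dd if=/dev/zero of=/big.bin bs=1048576 count=50
RUN rm /big.bin
EOF

# Variant 2: the file is created and removed within a single RUN instruction,
# so no layer ever stores it.
cat > "$layers_dir/Dockerfile.one-layer" <<'EOF'
FROM alpine
RUN dd if=/dev/zero of=/big.bin bs=1048576 count=50 && rm /big.bin
EOF

echo "Dockerfiles written to $layers_dir"
```

Building each file with docker build -f <file> -t <name> "$layers_dir" and comparing the results with docker images or docker history <name> should show the first image to be roughly 50 MB larger, even though /big.bin is visible in neither.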

Docker Registry Mirroring

Docker Registry Mirroring is a technique to optimize image pulls by caching them locally. When you run a container, Docker checks the local cache for the image. If not found, it downloads the image from the Docker Hub and stores it locally before running the container. This process involves Docker acting as a client that requests images from a registry mirror, which fetches the image from the Docker Hub if not already cached locally.

To configure a Docker Registry mirror, you add a proxy section to the registry’s config file, specifying the Docker Hub URL and optionally, credentials for accessing private images. It’s crucial to secure the mirror by implementing authentication to keep private resources private.

Docker daemon can be configured to use the registry mirror by passing the --registry-mirror option or editing the /etc/docker/daemon.json file to include the registry-mirrors key with the mirror URL. This setup is beneficial for environments with multiple Docker instances, reducing internet traffic and improving container startup times.

However, it’s currently not possible to mirror another private registry; only the Docker Hub can be mirrored. Also, mirrors of Docker Hub are subject to Docker’s fair usage policy.
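
As a rough sketch of the two configuration pieces described above, the following writes sample files to a scratch directory. The mirror hostname mirror.example.internal is a placeholder, and the registry fragment shows only the proxy section, not a complete registry configuration.

```shell
# Scratch directory for the sample configuration files.
mirror_dir=$(mktemp -d)

# Fragment for the registry's own config file: the proxy section makes the
# registry act as a pull-through cache of Docker Hub.
cat > "$mirror_dir/registry-config-fragment.yml" <<'EOF'
proxy:
  remoteurl: https://registry-1.docker.io
EOF

# Daemon side: point the Docker daemon at the mirror.
# The URL is a placeholder for wherever your mirror is reachable.
cat > "$mirror_dir/daemon.json" <<'EOF'
{
  "registry-mirrors": ["https://mirror.example.internal"]
}
EOF

echo "Sample files written to $mirror_dir"
```

The daemon.json content would go into /etc/docker/daemon.json, merged with any settings already present there, and the daemon needs a restart afterwards (for example with sudo systemctl restart docker on systemd-based hosts).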

Using Docker

After setting up Docker, you can utilize various commands to manage your containers and images efficiently:

Check Docker Version: docker version (view Docker version information).
Get Docker System Information: docker info (display system-wide information, including configuration and current status).
Access Help: docker --help (view help information).

Docker commands come in two forms:

docker <command> (options) # Old command format
docker <command> <sub-command> (options) # New command format

For instance, docker ps is equivalent to docker container ls.

Image Management:

Search for Images: docker search <keyword> (search for images using keywords).
Pull Images: docker pull <image> (pull remote images to your local machine).
List Local Images: docker images (list local images). Note that the sum of the sizes displayed is not the actual disk consumption, since images share layers under Docker’s layered storage structure.
Check System Disk Usage: docker system df (view disk usage for images, containers, and volumes).

Image Cleanup:

Dangling Images: You might encounter images labeled with <none>, typically because a newer image has taken over the name of an older one. These images have no associated repository name or tag.
Intermediate Layer Images: docker images -a (display all images, including intermediate-layer images). Intermediate images are layers that top-level images depend on.
Delete Images: docker rmi <image> (delete images). If a container is based on the specified image, deletion will fail unless you use the -f flag to force deletion.

When you delete an image, it is either ‘Untagged’ or ‘Deleted’, depending on whether it has multiple tags. Deletion ultimately occurs when all tags associated with an image are removed.
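
For dangling images in particular, a small sketch of two wrapper functions may help. The function names are hypothetical; the docker commands inside them are standard CLI calls.

```shell
# Hypothetical helper names; only the docker commands inside are real CLI calls.

# Print the IDs of dangling (<none>) images.
list_dangling_images() {
  docker images --filter dangling=true --quiet
}

# Remove all dangling images without the interactive confirmation prompt.
prune_dangling_images() {
  docker image prune --force
}
```

Calling list_dangling_images first shows what prune_dangling_images would remove.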

Manage Containers

To manage Docker containers effectively, you can follow these steps:

Start Container(s): Use docker start <container> [...] to run one or more containers.
Create a Container from an Image: Use docker create <image> to create a container from an image. You can specify a name using --name to avoid a random name assignment.
Run a Container from an Image: Use docker run <image> [COMMAND] to create and start a container from an image. If the image doesn’t exist locally, Docker will automatically pull it from the remote repository.

Common parameters include:

--name <name>: Set the container name.
-d: Run as a daemon in the background.
--rm: Automatically delete the container when it’s stopped.
-p <host-port>:<container-port>: Map host port to container port.
-e <variable_name>=<variable_value>: Set environment variables.
-i: Keep STDIN open, allowing interaction with the container.
-t: Allocate a pseudo-terminal.

Examples:

Start an nginx server: docker run --name webserver --rm -d -p 80:80 nginx
Start an nginx container and enter its shell: docker run -it nginx sh
Start a container and execute a command: docker run -it --name hi busybox echo hi
List containers: docker ps
Show real-time resource usage of containers: docker stats
Display running processes inside a container: docker top <container>
View container logs: docker logs <container>
Stop a container: docker stop <container>
Kill a container forcefully: docker kill <container>
Restart a container: docker restart <container>
Execute a command inside a running container: docker exec -it <container> sh
Remove a container: docker rm <container>
Prune stopped containers: docker container prune
Clean up Docker system: docker system prune
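
Several of the commands above can be chained into one short lifecycle. The following is a sketch only: the function name is hypothetical, and host port 8080 is an arbitrary choice to avoid needing a free port 80.

```shell
# Hypothetical helper: walk one nginx container through its lifecycle.
webserver_lifecycle() {
  # Create and start the container in the background; stop here if that fails.
  docker run --name webserver --rm -d -p 8080:80 nginx || return 1
  docker ps --filter name=webserver   # confirm that it is running
  docker logs webserver               # show its startup output
  docker stop webserver               # with --rm, stopping also removes it
}
```

Running webserver_lifecycle leaves nothing behind, because the --rm flag deletes the container as soon as it stops.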

Container Image Creation

Creating a Docker image from a container involves several steps. Here’s a guide on how to do it:

Start the Container: First, you need to start the container from which you want to create the image. You can do this by running the container with the appropriate command and options. For example: docker run -d --name my-container my-image:tag
Execute Commands in the Container (Optional): If needed, you can execute commands inside the running container to set up its environment or configuration. You can access the container’s shell using the docker exec command: docker exec -it my-container /bin/bash
Make Changes (Optional): If you need to make any changes to the container’s filesystem, such as installing packages or modifying files, you can do so now.
Stop the Container: Once you’re done making changes or if you don’t need to make any changes, you can stop the container: docker stop my-container
Commit the Container to an Image: After stopping the container, you can commit it to create a new Docker image. Use the docker commit command and specify the container ID or name along with the desired image name and tag: docker commit my-container my-new-image:tag
Verify the New Image: You can verify that the new image has been created successfully by listing Docker images: docker images
Cleanup (Optional): If you no longer need the original container, you can remove it: docker rm my-container
Run Containers from the New Image: You can now use the newly created Docker image to run containers: docker run -d --name my-new-container my-new-image:tag

Build Docker Image

To build an image using Docker, you can follow these steps:

Save the Dockerfile: Save a Dockerfile in the directory where your project files are located.
Execute Build Command: Run the following command in the terminal under the current folder where the Dockerfile is located: docker build -t nginx:v3 . This command creates an image and tags it with nginx:v3. The -f parameter is used to specify the path to the Dockerfile. Since we are using the default Dockerfile name (Dockerfile), we can omit this parameter. The . at the end represents the current path, which specifies the context for building the image.
Understand Build Process: When the command is executed, Docker starts building the image. Each command in the Dockerfile corresponds to a step in the build process. Docker creates a temporary container to execute each instruction. After the execution of each instruction is completed, Docker removes the temporary container and uses the resulting layer as the base layer for the next instruction.
Using Cache: Docker caches layers to speed up the build process. If a Dockerfile instruction hasn’t changed since the last build, Docker will use the cached layer instead of rebuilding it. This is indicated by the message “Using cache” in the build output.
Inspect the Image: After the build process completes successfully, you can inspect the newly built image: docker run -it nginx:v3 bash This command runs a bash shell in a container based on the newly built image, allowing you to explore its contents.
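
The steps above assume a Dockerfile already exists. As a minimal sketch, a Dockerfile for the nginx:v3 example could look like the one written below; its contents are an assumption for illustration, not the only way to build such an image.

```shell
# Scratch directory that serves as the build context.
build_dir=$(mktemp -d)

cat > "$build_dir/Dockerfile" <<'EOF'
# Start from the official nginx image.
FROM nginx
# Replace the default welcome page; this instruction becomes one new layer.
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
EOF

echo "Build with: docker build -t nginx:v3 $build_dir"
```

Running the build a second time without changing the Dockerfile should show the cached layer being reused for the RUN instruction.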

In addition to building images from Dockerfiles, you can also build images from URLs, such as Git repositories:

docker build https://gitrepo.com/git.git#:app

This command uses a Git repository as the build source. Because no branch is named before the colon, Docker uses the repository’s default branch, and the app subdirectory serves as the build context.

Multi Stage Build Docker

In a multi-stage build, we can streamline the Dockerfile to perform multiple build processes within a single file. Let’s walk through an example using a front-end project where we need to package files with Node.js and then serve them using Nginx:

First, let’s set up our project and Dockerfile:

To initialize a React app:

npx create-react-app myapp

To navigate into the project directory:

cd myapp

To create a .dockerignore file to exclude unnecessary files from the Docker build:

printf '%s\n' .idea/ .vscode/ .git/ build/ node_modules/ 'npm-debug.log*' > .dockerignore

Create a Dockerfile in the project directory:

FROM node:alpine AS builder
WORKDIR /app
COPY ./package.json .
RUN npm install --registry=https://registry.npmmirror.com
COPY . .
RUN npm run build

FROM nginx AS prod
COPY --from=builder /app/build /usr/share/nginx/html/

Let’s break down the Dockerfile:

Builder Stage (node:alpine):
- We use the Node.js Alpine image to set up our build environment.
- We copy package.json, install dependencies, copy the project files, and run the build command.
Production Stage (nginx):
- We use the Nginx image to create our production image.
- We copy the built files from the builder stage to the appropriate directory in the Nginx image.

Now, let’s build and run our Docker image:

Build Image: docker build -t myapp:v1 .
Check Image: docker images
Run Container: docker run -d --rm --name myapp -p 80:80 myapp:v1

Now you can open your browser and navigate to http://127.0.0.1 to see your application running.

This approach keeps the Dockerfile concise and efficient by leveraging multi-stage builds to perform different tasks within a single build context. It also ensures that the final image is lightweight and only contains the necessary components for production deployment.

Data Management

Let’s delve into data management in Docker, particularly focusing on mounting host directories as data volumes into containers.

Creating a Development Image

We start by creating a Dockerfile for our development environment, saved as Dockerfile.dev:

FROM node:alpine

WORKDIR /app

COPY package.json .
RUN npm install --registry=https://registry.npmmirror.com
COPY . .

EXPOSE 3000

CMD ["npm", "run", "start"]

Explanation:

We use the Node.js Alpine image for setting up our development environment.
We copy the package.json file, install dependencies, and copy the project files.
We declare port 3000 as the port our application listens on.
We set npm run start as the default command.

Building the Development Image

Build the image using the Dockerfile.dev:

docker build -t myapp:dev -f Dockerfile.dev .

Running the Container

Run a container from the built image:

docker run --rm --name myappdev -p 8000:3000 myapp:dev

Now you can access the page by navigating to http://localhost:8000.

Mounting Host Directory

To reflect changes made in the host directory into the container, we can mount the host directory into the container:

echo 'CHOKIDAR_USEPOLLING=true' > .env
docker run --rm --name myappdev -v /app/node_modules -v "$(pwd)":/app -p 8000:3000 myapp:dev

Explanation:

We create a .env file with the content CHOKIDAR_USEPOLLING=true to ensure proper file watching.
We mount the current directory ($(pwd)) to the /app directory in the container.
We mount /app/node_modules as an anonymous data volume so that the bind mount of the host directory does not hide the node_modules installed in the image.

Data Volume Management

We can create, inspect, use, and clean up data volumes:

docker volume create vol1  # Create a data volume
docker volume ls  # List data volumes
docker volume inspect vol1  # Inspect details of a data volume
docker run -v vol1:/webapp imagename:tag  # Mount a data volume into a container
docker volume rm vol1  # Remove a data volume
docker volume prune  # Clean up unused data volumes

Using `VOLUME` Directive in Dockerfile

We can specify volumes in the Dockerfile using the VOLUME directive:

VOLUME ["/path1", "/path2"]

This ensures that, if nothing is explicitly mounted at these paths when a container starts, Docker mounts anonymous volumes there by default, so data written to them bypasses the container storage layer.

By leveraging these techniques, we can effectively manage data in Docker containers, enabling seamless development and deployment workflows.

Network Management

Let’s explore network management in Docker, including port mapping, container interconnection, custom network creation, DNS resolution, and load balancing.

Port Mapping

To make network applications accessible from the outside, we use port mapping with the -p or -P parameter:

docker run -d -p 127.0.0.1:80:80 nginx  # Map port 80 of the container to port 80 on localhost
docker run -d -p 127.0.0.1::80 nginx  # Map container port 80 to a randomly assigned port on localhost
docker run -d -p 80:80/udp nginx  # Map UDP port
docker run -d -p 80:80 -p 81:81 nginx  # Map multiple ports

Container Interconnection

Containers can be connected to each other using Docker’s virtual networks:

docker network ls  # List Docker networks
docker network inspect bridge  # Inspect details of the default bridge network
docker run -d --rm --name web1 nginx  # Start a container
docker run --rm alpine ping 172.17.0.2  # Ping another container

Custom Network Creation

We can create our own virtual networks and connect containers to them:

docker network create net1  # Create a network
docker run --network net1 image  # Run a container in the created network
docker network connect net1 containerName  # Connect a container to the network
docker network disconnect net1 containerName  # Disconnect a container from the network

DNS Resolution

Custom networks provide DNS resolution, allowing containers to communicate with each other using container names:

docker network create net1  # Create a network
docker run -d --network net1 --rm --name server nginx  # Run a container with a name
docker run -it --rm --network net1 alpine ping server  # Ping the container by name

Load Balancing

Load balancing can be achieved by specifying network aliases for containers:

docker network create net2
docker run -d --network net2 --network-alias search elasticsearch:2  # Create containers with the same alias
docker run --rm --net net2 alpine nslookup search  # Check IP addresses
docker run --rm --net net2 centos curl -s search:9200  # Use the alias for load balancing
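
The commands above start only one container behind the alias. For the alias to resolve to several addresses, at least two containers need to share it. Here is a sketch, assuming the net2 network already exists; the function name and the container names search1 and search2 are hypothetical.

```shell
# Hypothetical helper: start two containers that share the network alias "search".
# Assumes the net2 network has already been created.
start_search_pair() {
  docker run -d --network net2 --network-alias search --name search1 elasticsearch:2 &&
  docker run -d --network net2 --network-alias search --name search2 elasticsearch:2
}
```

After calling start_search_pair, the nslookup command shown earlier should list two IP addresses for search.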

By leveraging these network management techniques, we can efficiently manage container communication and connectivity in Docker environments.

Publish Docker Images

To make our Docker image available for others to use, we can publish it to Docker Hub or another Docker registry. Here’s how you can do it:

Publish to Docker Hub

Follow these steps to share your Docker images with others through Docker Hub:

Log in to Docker Hub: docker login -u username -p password
Push the image to Docker Hub: docker push username/imageName:Tag
Others can pull the image from Docker Hub: docker pull username/imageName:Tag
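
A locally built image usually has to be tagged with your Docker Hub username before it can be pushed. The sketch below wraps that step, plus a login variant that reads the password from standard input instead of the command line. The function names and the DOCKER_USER and DOCKER_TOKEN variables are hypothetical.

```shell
# Hypothetical helper: log in without putting the password on the command line.
# DOCKER_USER and DOCKER_TOKEN are assumed to be set in the environment.
hub_login() {
  printf '%s' "$DOCKER_TOKEN" | docker login -u "$DOCKER_USER" --password-stdin
}

# Hypothetical helper: give a local image its remote name, then push it.
publish_image() {
  local_image=$1     # e.g. myapp:v1
  remote_image=$2    # e.g. username/myapp:v1
  docker tag "$local_image" "$remote_image" &&
  docker push "$remote_image"
}
```

For example, publish_image myapp:v1 username/myapp:v1 would tag the image built earlier and push it under that name.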

If you require more detailed knowledge about Docker, it’s recommended to refer directly to the official Docker documentation.

Managing multiple containers and specifying many parameters every time you start one can become cumbersome. In such cases, Docker Compose can be a valuable tool.

Docker Compose is a tool for defining and running multi-container Docker applications. With Docker Compose, you can use a YAML file to configure your application’s services, networks, and volumes. This allows you to define all the necessary configurations in one place and easily manage multiple containers at once.
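
To give a feel for the YAML file, here is a minimal sketch that writes a two-service Compose file to a scratch directory. The service names web and cache, the images, and the port mapping are illustrative assumptions.

```shell
# Scratch directory for the sample Compose file.
compose_dir=$(mktemp -d)

cat > "$compose_dir/compose.yaml" <<'EOF'
services:
  web:
    image: nginx
    ports:
      - "8080:80"    # host port 8080 -> container port 80
  cache:
    image: redis
EOF

echo "Compose file written to $compose_dir/compose.yaml"
```

With this file, docker compose -f "$compose_dir/compose.yaml" up -d would start both containers on a shared network, and docker compose -f "$compose_dir/compose.yaml" down would stop and remove them.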

You can learn more about Docker Compose and how to use it effectively by exploring its documentation and tutorials.

Conclusion

This post served as a comprehensive guide to understanding and utilizing Docker effectively.

Author

Mohamed BEN HASSINE

Mohamed BEN HASSINE is a hands-on Cloud Solution Architect based in France. He has been working with Java, web, API, and cloud technologies for over 12 years and remains eager to learn new things. He currently works as a Cloud / Application Architect in Paris, designing cloud-native solutions and APIs (REST, gRPC) using cutting-edge technologies (GCP, Kubernetes, APIGEE, Java, Python).