What is cgroup?
📚️ Reference:
Control groups, often called cgroups, are a feature of the Linux kernel. They allow processes to be organized into hierarchical groups whose usage of various resources can then be limited and monitored. The kernel's cgroup interface is provided through a pseudo file system called cgroupfs. Grouping is implemented in the core cgroup kernel code, while resource tracking and throttling are implemented in a set of subsystems, one per resource type (memory, CPU, etc.).
cgroup is a foundational technology for containers and cloud-native systems. Both the kubelet and the CRI container runtime need to interface with cgroups to enforce resource management for Pods and containers, namely CPU and memory requests and limits.
There are two cgroup versions in Linux: cgroup v1 and cgroup v2. cgroup v2 is the new generation of the cgroup API.
cgroup v2 support has been officially stable in Kubernetes since v1.25.
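One way to see the two versions side by side is to look at how a process's cgroup membership is reported in /proc/<pid>/cgroup. The sketch below assumes the line format described in the cgroups(7) manual page (hierarchy-ID:controller-list:path), where a cgroup v2 entry has hierarchy ID 0 and an empty controller list. The name describe_cgroup_line is a hypothetical helper for illustration, not part of any tool.

```shell
# describe_cgroup_line is a hypothetical helper: it classifies one line of
# /proc/<pid>/cgroup, whose fields are hierarchy-ID:controller-list:path.
describe_cgroup_line() {
    line=$1
    id=${line%%:*}           # text before the first colon
    rest=${line#*:}          # text after the first colon
    controllers=${rest%%:*}  # text between the first and second colon
    path=${rest#*:}          # text after the second colon
    if [ "$id" = "0" ] && [ -z "$controllers" ]; then
        echo "v2 $path"
    else
        echo "v1 $controllers $path"
    fi
}

# Example (Linux only): classify every entry of the current shell.
# while read -r l; do describe_cgroup_line "$l"; done < /proc/self/cgroup
```

On a node that runs purely on cgroup v2, a process is expected to have a single 0::/… entry, while a cgroup v1 node lists one entry per mounted hierarchy.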
What are the advantages of cgroup v2?
📚️ Reference:
cgroup v2 provides a unified control system with enhanced resource management capabilities.
cgroup v2 has many improvements over cgroup v1, such as:
- Single unified hierarchical design across APIs
- Safer subtree delegation to containers
- Updated features such as Pressure Stall Information (PSI)
- Enhanced resource allocation management and isolation across multiple resources
- Unified accounting for different types of memory allocation (network memory, kernel memory, etc.)
- Accounting for non-immediate resource changes, such as page cache writebacks
Some Kubernetes features rely exclusively on cgroup v2 for enhanced resource management and isolation. For example, the MemoryQoS feature improves memory QoS and relies on cgroup v2 primitives.
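As a small illustration of the PSI item above: the kernel reports pressure as lines such as "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" in /proc/pressure/memory, and, when PSI is enabled, in per-cgroup files such as memory.pressure under cgroup v2. The following is a minimal sketch that pulls one field out of such a line; psi_field is a hypothetical helper name, and the sample values in the comments are made up.

```shell
# psi_field is a hypothetical helper: print the value of one key
# (avg10, avg60, avg300 or total) from a single PSI line.
psi_field() {
    line=$1
    key=$2
    # Intentionally unquoted so the line is split into whitespace-separated tokens.
    for token in $line; do
        case $token in
            "$key"=*) echo "${token#*=}"; return 0 ;;
        esac
    done
    return 1  # key not present on this line
}

# Example with a made-up sample line:
# psi_field "some avg10=1.23 avg60=0.50 avg300=0.10 total=4567" avg10
# Example (Linux with PSI enabled):
# psi_field "$(grep '^some' /proc/pressure/memory)" avg10
```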
Prerequisites for using cgroup v2
📚️ Reference:
cgroup v2 has the following requirements:
- Operating system distribution enables cgroup v2
  - Ubuntu (starting from 21.10, 22.04+ recommended)
  - Debian GNU/Linux (starting from Debian 11 Bullseye)
  - Fedora (starting in 31)
  - RHEL and RHEL-like distributions (starting from 9)
  - …
- Linux kernel is 5.8 or higher
- The container runtime supports cgroup v2. For example:
  - containerd v1.4 and higher
  - cri-o v1.20 and later
- The kubelet and container runtime are configured to use the systemd cgroup driver
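The kernel requirement in the list above can be checked with a short script. This is a minimal sketch, assuming the release string has the usual MAJOR.MINOR.PATCH-suffix shape printed by uname -r; kernel_at_least is a hypothetical helper name.

```shell
# kernel_at_least is a hypothetical helper.
# $1 = required version as MAJOR.MINOR, $2 = kernel release (e.g. from uname -r).
# Returns 0 when the release is at least the required version.
kernel_at_least() {
    req_major=${1%%.*}
    req_minor=${1#*.}
    rel=${2%%-*}                 # drop a suffix such as "-21-amd64"
    major=${rel%%.*}
    rest=${rel#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}      # drop trailing non-digits such as "+"
    [ "$major" -gt "$req_major" ] ||
        { [ "$major" -eq "$req_major" ] && [ "$minor" -ge "$req_minor" ]; }
}

# Example:
# if kernel_at_least 5.8 "$(uname -r)"; then echo "kernel OK"; else echo "kernel too old"; fi
```

Only the major and minor numbers are compared, which is enough for the 5.8 threshold mentioned here.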
Using cgroup v2
📝 Notes:
Here we take Debian 11 Bullseye + containerd v1.4 as an example.
Enable and check cgroup v2 for Linux nodes
Debian 11 Bullseye has cgroup v2 enabled by default.
This can be verified by the following command:
stat -fc %T /sys/fs/cgroup/
- For cgroup v2, the output is cgroup2fs.
- For cgroup v1, the output is tmpfs.
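For use in a node bootstrap or health-check script, the same check can be wrapped in a small function. This is a sketch only; cgroup_version_from_fstype is a hypothetical name, and it maps exactly the two outputs described above, treating anything else as unknown.

```shell
# cgroup_version_from_fstype is a hypothetical helper: map the file system
# type reported for /sys/fs/cgroup/ to a cgroup version label.
cgroup_version_from_fstype() {
    case $1 in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Example (Linux only):
# cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup/)"
```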
If it is not enabled, you can add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX in /etc/default/grub and then execute sudo update-grub.
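The GRUB edit can also be scripted. The sketch below assumes the file contains a line of the form GRUB_CMDLINE_LINUX="…" with double quotes, which is the common layout but not guaranteed on every distribution; add_grub_param is a hypothetical helper name. Try it on a copy of the file first.

```shell
# add_grub_param is a hypothetical helper.
# $1 = path to a GRUB defaults file, $2 = kernel parameter to append.
# Assumes a line of the form GRUB_CMDLINE_LINUX="..." exists in the file.
add_grub_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already on the GRUB_CMDLINE_LINUX line.
    if grep -q "^GRUB_CMDLINE_LINUX=.*$param" "$file"; then
        return 0
    fi
    tmp=$(mktemp)
    # Append the parameter inside the closing quote; other lines are untouched.
    sed "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 $param\"|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root, then regenerate the GRUB config and reboot):
# add_grub_param /etc/default/grub systemd.unified_cgroup_hierarchy=1
# update-grub
```

The kernel command line is only read at boot, so the node has to be rebooted before the stat check reports cgroup2fs.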
📝 Notes: On a Raspberry Pi, the standard Raspberry Pi OS installation does not enable cgroups. cgroups are required to start the systemd service. They can be enabled by appending cgroup_memory=1 cgroup_enable=memory systemd.unified_cgroup_hierarchy=1 to /boot/cmdline.txt, followed by a restart for the change to take effect.
kubelet uses the systemd cgroup driver
kubeadm supports passing a KubeletConfiguration structure when executing kubeadm init. KubeletConfiguration contains a cgroupDriver field that can be used to control the cgroup driver of the kubelet.
Note: In version 1.22 and later, if the user does not set the cgroupDriver field in KubeletConfiguration, kubeadm init sets it to the default value systemd.
Here is a minimal example where this field is explicitly configured:
# kubeadm-config.yaml
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.21.0
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
Such a configuration file can be passed to the kubeadm command:
kubeadm init --config kubeadm-config.yaml
Note:
Kubeadm uses the same KubeletConfiguration for all nodes in the cluster. The KubeletConfiguration is stored in a ConfigMap object under the kube-system namespace.
Executing subcommands such as init, join, and upgrade will cause kubeadm to write the KubeletConfiguration to the file /var/lib/kubelet/config.yaml, which will then be passed to the kubelet on the local node.
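Since the effective configuration ends up in /var/lib/kubelet/config.yaml, the driver in use on a node can be read back from that file. This is a minimal sketch that assumes cgroupDriver appears as an unquoted top-level key, as in the example above; kubelet_cgroup_driver is a hypothetical helper name.

```shell
# kubelet_cgroup_driver is a hypothetical helper: print the value of the
# top-level cgroupDriver key from a kubelet configuration file.
# Prints nothing if the key is absent; a quoted value is printed with its quotes.
kubelet_cgroup_driver() {
    sed -n 's/^cgroupDriver:[[:space:]]*//p' "$1"
}

# Example (on a node set up by kubeadm):
# kubelet_cgroup_driver /var/lib/kubelet/config.yaml
```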
containerd uses the systemd cgroup driver
Edit /etc/containerd/config.toml:
[plugins.cri.containerd.runtimes.runc.options]
SystemdCgroup = true
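The table name shown above matches the older (version 1) containerd config layout. In files that declare version = 2, the same option lives under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options], so it is worth checking which layout your file uses. The sketch below only verifies that some SystemdCgroup = true line is present, regardless of table; containerd_uses_systemd_cgroup is a hypothetical helper name.

```shell
# containerd_uses_systemd_cgroup is a hypothetical helper: succeed if the
# given containerd config file contains an uncommented "SystemdCgroup = true".
# It does not check which table the setting belongs to.
containerd_uses_systemd_cgroup() {
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1"
}

# Example (containerd must be restarted after the file is changed):
# containerd_uses_systemd_cgroup /etc/containerd/config.toml && echo "systemd driver set"
# sudo systemctl restart containerd
```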
Upgrade monitoring components to support cgroup v2 monitoring
📚️ Reference:
cgroup v2 uses a different API than cgroup v1, so if any applications directly access the cgroup file system, these applications need to be updated to support cgroup v2. For example:
- Some third-party monitoring and security agents may rely on the cgroup file system. You will need to update these agents to versions that support cgroup v2.
- If you run cAdvisor as a standalone DaemonSet to monitor Pods and containers, you need to update it to v0.43.0 or higher.
- If you run Java applications, it is recommended to use JDK 11.0.16 or later, or JDK 15 or later, which fully support cgroup v2.
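To make the API difference concrete: the memory controller exposes its limit and current usage under different file names in the two versions (memory.limit_in_bytes and memory.usage_in_bytes in v1, memory.max and memory.current in v2). An agent that reads the cgroup file system directly needs a mapping along these lines. This is an illustrative sketch covering only these two metrics; cgroup_memory_file is a hypothetical helper name.

```shell
# cgroup_memory_file is a hypothetical helper.
# $1 = cgroup version (v1 or v2), $2 = metric (limit or usage).
# Prints the memory controller file name that holds that metric.
cgroup_memory_file() {
    case "$1:$2" in
        v1:limit) echo "memory.limit_in_bytes" ;;
        v1:usage) echo "memory.usage_in_bytes" ;;
        v2:limit) echo "memory.max" ;;
        v2:usage) echo "memory.current" ;;
        *)        return 1 ;;
    esac
}

# Example: build the path for a cgroup directory held in the (assumed) variable $cg:
# cat "$cg/$(cgroup_memory_file v2 usage)"
```

Values can differ as well as names: an unlimited cgroup reports the literal string max in memory.max on v2, whereas v1 reports a very large number.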
Complete 🎉🎉🎉
Conclusion
cgroup v2 support in Kubernetes has been officially stable since v1.25. Compared with cgroup v1, cgroup v2 has the following advantages:
- Single unified hierarchical design across APIs
- Safer subtree delegation to containers
- Updated features such as Pressure Stall Information (PSI)
- Enhanced resource allocation management and isolation across multiple resources
- Unified accounting for different types of memory allocation (network memory, kernel memory, etc.)
- Accounting for non-immediate resource changes, such as page cache writebacks
When using Kubernetes v1.25 and above, it is recommended to use a Linux distribution and a container runtime that support cgroup v2, and to enable cgroup v2 on the nodes.