Overview
Kubernetes users inevitably face cluster issues requiring debugging and resolution to maintain smooth operation of pods and services. Cloud-native DevOps, in particular, involves managing containers, microservices, and auto-scaling, which can be complex. GenAI can aid in troubleshooting and executing operational tasks related to platforms like Kubernetes. This may entail utilizing natural language prompts to initiate, revert, or gain insights into the cluster.
For instance, at KubeCon + CloudNativeCon 2023, Kubiya introduced a generative AI workflow engine capable of interpreting such commands within Slack. Enhancing natural language processing capabilities can empower platform teams to devise new workflows that abstract the intricacies of cloud-native platform management.
K8sGPT
The K8sGPT project is a well-known CLI tool that is already used in production by two organizations. It has applied to become a CNCF sandbox project. Its primary functions include:
- Offering detailed contextual explanations of Kubernetes error messages.
- Providing cluster insights.
- Supporting multiple installation options.
- Supporting different AI backends.
One of its key commands is “k8sgpt analyze”, designed to identify issues within Kubernetes clusters. This functionality is facilitated through analyzers, which define the logic of each Kubernetes object and potential issues. For instance, Kubernetes Services analyzers verify the existence of a specific Service, its endpoint, and its readiness.
An even more powerful feature is available by running “k8sgpt analyze --explain”, which asks the AI to provide instructions tailored to the user’s specific scenario. These instructions include troubleshooting actions and precise kubectl commands that can be copy-pasted directly, because the Kubernetes resource names are already filled in.
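The analyzer idea can be illustrated with a minimal sketch. The Python below is not K8sGPT's actual code (the project itself is written in Go); the `Finding` type, the `analyze_service` function, and the error messages are assumptions for illustration. The dict shapes loosely mirror the Service and Endpoints objects returned by the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One analyzer result: the resource name and the problems found."""
    name: str
    errors: list = field(default_factory=list)


def analyze_service(service, endpoints):
    """Hypothetical Service analyzer.

    `service` is a Service-like dict; `endpoints` is the matching
    Endpoints-like dict, or None if no Endpoints object exists.
    """
    namespace = service["metadata"]["namespace"]
    name = service["metadata"]["name"]
    finding = Finding(name=f"{namespace}/{name}")

    # Check 1: the Endpoints object must exist at all.
    if endpoints is None:
        finding.errors.append("Service has no Endpoints object")
        return finding

    # Check 2: at least one subset must be present, otherwise the
    # selector matches no pods.
    subsets = endpoints.get("subsets") or []
    if not subsets:
        selector = service.get("spec", {}).get("selector", {})
        finding.errors.append(
            f"Service has no endpoints, expected pods with labels {selector}"
        )
        return finding

    # Check 3: report every address that is registered but not ready.
    for subset in subsets:
        for address in subset.get("notReadyAddresses") or []:
            target = (address.get("targetRef") or {}).get(
                "name", address.get("ip", "unknown")
            )
            finding.errors.append(f"Service has not ready endpoint: {target}")
    return finding
```

Each check corresponds to one of the conditions mentioned above (existence, endpoint, readiness). A real analyzer would fetch these objects from the cluster rather than receive them as arguments.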
Install
There are several installation options available, depending on your preference and operating system. You can explore these options in the installation section of the documentation.
Before installing K8sGPT, ensure you have Homebrew installed on a Mac or WSL on a Windows machine.
Next, execute the following commands:
For Mac:
brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt
For Windows:
# WSL Installation Steps
To view all available commands provided by K8sGPT, use the “--help” flag:
k8sgpt --help
Prerequisites
The prerequisites for the following steps include having an OpenAI account and a running Kubernetes cluster. Any cluster, such as microk8s or minikube, will suffice.
Once you have an OpenAI account, you need to visit its website to generate a new API key. Alternatively, you can run the following command, and K8sGPT will open the same website in your default browser:
k8sgpt generate
This key is essential for K8sGPT to interact with OpenAI. Authorize K8sGPT using the newly created API key/token:
k8sgpt auth add openai
Enter openai Key: openai added to the AI backend provider list
You can list your backends using:
k8sgpt auth list
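Next, we’ll deploy a deliberately misconfigured Deployment to the Kubernetes cluster. Its pods will enter the CrashLoopBackOff state, because readOnlyRootFilesystem: true prevents nginx from writing the cache and PID files it creates at startup. Here’s the YAML configuration: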
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        securityContext:
          readOnlyRootFilesystem: true
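To make the problematic setting in this manifest concrete, here is a small hedged sketch in Python that flags it. The function name and the rule itself (a read-only root filesystem combined with no volume mounts at all) are illustrative assumptions, not a check taken from K8sGPT.

```python
def find_read_only_root_containers(deployment):
    """Return names of containers that set readOnlyRootFilesystem: true
    but declare no volumeMounts, so they have nowhere to write."""
    pod_spec = deployment["spec"]["template"]["spec"]
    flagged = []
    for container in pod_spec.get("containers", []):
        security = container.get("securityContext") or {}
        read_only = security.get("readOnlyRootFilesystem", False)
        has_mounts = bool(container.get("volumeMounts"))
        if read_only and not has_mounts:
            flagged.append(container["name"])
    return flagged


# The relevant part of the demo Deployment, expressed as a dict.
demo_deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "nginx",
                        "image": "nginx:1.14.2",
                        "securityContext": {"readOnlyRootFilesystem": True},
                    }
                ]
            }
        }
    }
}

print(find_read_only_root_containers(demo_deployment))  # prints ['nginx']
```

A container flagged this way is not necessarily broken; the heuristic only highlights workloads that, like nginx here, may need a writable directory.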
Next, we’ll create a private namespace for the sample application and install the Deployment:
kubectl create ns demo
namespace/demo created
kubectl apply -f ./deployment.yaml -n demo
deployment.apps/nginx-deployment created
If we want to view the events of one of the pods, it will display:
Warning BackOff 3s (x8 over 87s) kubelet Back-off restarting failed container
Therefore, we can run the K8sGPT command to get more details about why these pods are failing:
k8sgpt analyse
This will show the issues K8sGPT found in the cluster:
AI Provider: openai
For more information and suggestions on how to solve the problem, we can use the --explain flag:
k8sgpt analyse --explain
0 demo/nginx-deployment-5f4c7db77b-hq74n(Deployment/nginx-deployment)
- Error: back-off 1m20s restarting failed container=nginx pod=nginx-deployment-5f4c7db77b-hq74n_demo(7854b793-21b7-4f81-86e5-dbb4113f64f4)
1 demo/nginx-deployment-5f4c7db77b-phbq8(Deployment/nginx-deployment)
- Error: back-off 1m20s restarting failed container=nginx pod=nginx-deployment-5f4c7db77b-phbq8_demo(74038531-e362-45a6-a436-cf1a6ea46d8a)
2 demo/nginx-deployment-5f4c7db77b-shkw6(Deployment/nginx-deployment)
- Error: back-off 1m20s restarting failed container=nginx pod=nginx-deployment-5f4c7db77b-shkw6_demo(2603f332-3e1c-45da-8080-e34dd6d956ad)
kubectl-ai
kubectl-ai is a kubectl plugin that uses OpenAI GPT to generate and apply Kubernetes manifests from natural language prompts. Here’s how to install it:
Install via Homebrew:
brew tap sozercan/kubectl-ai https://github.com/sozercan/kubectl-ai
brew install kubectl-ai
Install via Krew:
kubectl krew index add kubectl-ai https://github.com/sozercan/kubectl-ai
kubectl krew install kubectl-ai/kubectl-ai
To use kubectl-ai, you need a valid Kubernetes configuration and one of the following:
- OpenAI API key
- Azure OpenAI service API keys and endpoints
- LocalAI
For these tools, the following environment variables are available:
export OPENAI_API_KEY=<your OpenAI key>
export OPENAI_DEPLOYMENT_NAME=<your OpenAI deployment/model name. defaults to "gpt-3.5-turbo-0301">
export OPENAI_ENDPOINT=<your OpenAI endpoint, like "https://my-aoi-endpoint.openai.azure.com" or "http://localhost:8080/v1">
If the variable OPENAI_ENDPOINT is set, it will be used. Otherwise, the OpenAI API will be used.
The Azure OpenAI service does not allow certain characters in deployment names, such as “.”. Therefore, for Azure, kubectl-ai automatically replaces “gpt-3.5-turbo” with “gpt-35-turbo”. However, if you are using an Azure OpenAI deployment name that is completely different from the model name, you can set the AZURE_OPENAI_MAP environment variable to map the model name to the Azure OpenAI deployment name. For example:
export AZURE_OPENAI_MAP="gpt-3.5-turbo=my-deployment"
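The resolution rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not kubectl-ai's implementation (the plugin is written in Go). The function names, the Azure detection by hostname, the comma-separated parsing of AZURE_OPENAI_MAP, and the default public endpoint URL are all assumptions.

```python
def parse_model_map(raw):
    """Parse 'model=deployment' pairs.

    Accepting several comma-separated pairs is an assumption made for
    this sketch.
    """
    mapping = {}
    for pair in raw.split(","):
        if "=" in pair:
            model, deployment = pair.split("=", 1)
            mapping[model.strip()] = deployment.strip()
    return mapping


def resolve_backend(env, default_model="gpt-3.5-turbo-0301"):
    """Return the endpoint and deployment name implied by the environment."""
    model = env.get("OPENAI_DEPLOYMENT_NAME", default_model)
    endpoint = env.get("OPENAI_ENDPOINT")

    # No custom endpoint: fall back to the public OpenAI API.
    if not endpoint:
        return {"endpoint": "https://api.openai.com/v1", "deployment": model}

    # Custom but non-Azure endpoint (for example LocalAI): keep the name.
    if "openai.azure.com" not in endpoint:
        return {"endpoint": endpoint, "deployment": model}

    # Azure: prefer an explicit mapping, otherwise strip the dots that
    # Azure deployment names do not allow.
    mapping = parse_model_map(env.get("AZURE_OPENAI_MAP", ""))
    deployment = mapping.get(model, model.replace(".", ""))
    return {"endpoint": endpoint, "deployment": deployment}
```

In practice `env` would be `os.environ`; passing a plain dict keeps the sketch easy to try out without touching the real environment.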
Demo
Nginx Pod commands :
kubectl ai "create an nginx pod"
✨ Attempting to apply the following manifest:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
Use the arrow keys to navigate: ↓ ↑ → ←
? Would you like to apply this? [Reprompt/Apply/Don't Apply]:
+ Reprompt
▸ Apply
Don't Apply
Deployment : Select “Reprompt” and enter “make this into deployment”
Reprompt: make this into deployment
✨ Attempting to apply the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
ReplicaSet :
Reprompt: Scale to 3 replicas
✨ Attempting to apply the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
KoPylot
KoPylot is an open source Kubernetes assistant powered by AI. Its goal is to help developers and DevOps engineers easily manage and monitor Kubernetes clusters.
Function
In its current version, KoPylot has four main features, each exposed as a subcommand of the kopylot CLI. These subcommands are:
- Audit
- Chat
- Diagnose
- Ctl
Let’s delve deeper into each of these commands.
Audit: Audit resources such as pods, deployments, and services. KoPylot looks for vulnerabilities based on the manifest files of individual resources.
Chat: Ask KoPylot in plain English to generate kubectl commands. You can review each command before running it.
Diagnose: Use the diagnose tool to help debug the different components of your application. The diagnose command lists possible fixes for broken resources.
Ctl: A wrapper for kubectl. All arguments passed to the ctl subcommand are interpreted by kubectl.
Operating principle
Currently, KoPylot works by extracting information from a Kubernetes resource description (via kubectl describe ...) or manifest and feeding it into OpenAI’s Davinci model along with a prompt. The prompt tells the model how to interpret the Kubernetes resource and how to construct its output.
The prompt also instructs the model on the format of its output. For instance, the prompt for the Audit command instructs the model to generate output as a two-column JSON containing the vulnerability and its severity.
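The prompt-plus-structured-output pattern described above might look roughly like the following Python sketch. The prompt wording, the function names, and the exact JSON shape (a list of objects with "vulnerability" and "severity" keys) are assumptions for illustration; KoPylot's real prompt and schema may differ. The call to the model itself is left out.

```python
import json

# Hypothetical prompt template; only {resource} is substituted.
AUDIT_PROMPT_TEMPLATE = """You are a Kubernetes security auditor.
Review the resource below and list its vulnerabilities.
Answer only with a JSON list of objects, each with the keys
"vulnerability" and "severity".

Resource:
{resource}
"""


def build_audit_prompt(resource_text):
    """Embed a manifest or `kubectl describe` output in the audit prompt."""
    return AUDIT_PROMPT_TEMPLATE.format(resource=resource_text)


def parse_audit_response(response_text):
    """Turn the model's JSON answer into (vulnerability, severity) rows."""
    rows = json.loads(response_text)
    return [(row["vulnerability"], row["severity"]) for row in rows]
```

Asking for a fixed JSON shape is what makes the answer machine-readable: the CLI can render the rows as a table instead of printing free-form text.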
One of the objectives outlined in the roadmap is to substitute the OpenAI model with an internally hosted model. This approach aims to address the concern of transmitting potentially sensitive data to the OpenAI server.
You can use KoPylot by following these steps:
- Apply for an API key from OpenAI.
- Export the key using the following command:
export KOPYLOT_AUTH_TOKEN=
- Install KoPylot using pip:
pip install kopylot
- Run KoPylot:
kopylot --help
Overall, KoPylot is a useful tool for diagnosing and troubleshooting Kubernetes workloads. Its web-based chat interface and CLI make it easy to use and suitable for all levels of users.
Kopilot
Kopilot is a project written in Go. It includes two functions: troubleshooting and auditing.
Install
macOS :
brew install knight42/tap/kopilot
Krew:
kubectl krew install kopilot
Currently, you need to set two environment variables to run Kopilot:
- Set KOPILOT_TOKEN to specify the token.
- Set KOPILOT_LANG to specify the language, which defaults to English. Valid options include Chinese, French, Spain, etc.
Imagine that your Pod is stuck in the Pending or CrashLoopBackOff state. This is where the kopilot diagnose command comes in handy: it asks the AI for help and displays its conclusions, including explanations of possible causes.
The kopilot audit command takes a similar approach, checking resources against well-known good practices and possible security misconfigurations. The tool answers using your OpenAI API token and the language of your choice.
Kubectl-GPT
Kubectl-GPT is a kubectl plugin that uses the GPT model to generate kubectl commands from natural language input. This plugin introduces the kubectl gpt command, whose sole mission is to carry out your requests in the Kubernetes cluster.
Install
Homebrew :
# Install Homebrew: https://brew.sh/
brew tap devinjeon/kubectl-gpt https://github.com/devinjeon/kubectl-gpt
brew install kubectl-gpt
Krew :
# Install Krew: https://krew.sigs.k8s.io/docs/user-guide/setup/install/
kubectl krew index add devinjeon https://github.com/devinjeon/kubectl-gpt
kubectl krew install devinjeon/gpt
Run the command-line tool with natural language input to generate kubectl commands:
kubectl gpt "<WHAT-YOU-WANT-TO-DO>"
Prerequisites
Before you begin, make sure your OpenAI API key is set as an environment variable named OPENAI_API_KEY.
To do so, you can add the following line to your .zshrc or .bashrc file:
export OPENAI_API_KEY=<your-key>
The input can be written in any language supported by the OpenAI GPT API, for example:
# English
kubectl gpt "Print the creation time and pod name of all pods in all namespaces."
kubectl gpt "Print the memory limit and request of all pods"
kubectl gpt "Increase the replica count of the coredns deployment to 2"
kubectl gpt "Switch context to the kube-system namespace"
Kube-Copilot
Kube-Copilot is a Kubernetes copilot powered by OpenAI. Its main functions are:
- Automate Kubernetes cluster operations using ChatGPT (GPT-4 or GPT-3.5).
- Diagnose and analyze potential issues with Kubernetes workloads.
- Generate Kubernetes manifests from the provided prompt instructions.
- Use local kubectl and trivy commands for Kubernetes cluster access and security vulnerability scanning.
- Access the web and perform Google searches without leaving your terminal.
Install
When running in Kubernetes :
Option 1: Use Web UI with Helm (recommended)
# Option 1: OpenAI
export OPENAI_API_KEY="<replace-this>"
helm install kube-copilot kube-copilot \
--repo https://feisky.xyz/kube-copilot \
--set openai.apiModel=gpt-4 \
--set openai.apiKey=$OPENAI_API_KEY
# Option 2: Azure OpenAI Service
export OPENAI_API_KEY="<replace-this>"
export OPENAI_API_BASE="<replace-this>"
helm install kube-copilot kube-copilot \
--repo https://feisky.xyz/kube-copilot \
--set openai.apiModel=gpt-4 \
--set openai.apiKey=$OPENAI_API_KEY \
--set openai.apiBase=$OPENAI_API_BASE
# Forwarding requests to the service
kubectl port-forward service/kube-copilot 8080:80
echo "Visit http://127.0.0.1:8080 to use the copilot"
Option 2: Use kubectl with CLI
kubectl run -it --rm copilot \
--env="OPENAI_API_KEY=$OPENAI_API_KEY" \
--restart=Never \
--image=ghcr.io/feiskyer/kube-copilot \
-- execute --verbose 'What Pods are using max memory in the cluster'
Install locally :
Install copilot using the following pip command:
pip install kube-copilot
Setup
- Make sure kubectl is installed on your local machine and the kubeconfig file is configured for Kubernetes cluster access.
- Install trivy to evaluate container image security issues (used by the audit command).
- Set the OpenAI API key as the OPENAI_API_KEY environment variable to enable ChatGPT functionality.
- For the Azure OpenAI service, also set OPENAI_API_TYPE=azure and OPENAI_API_BASE=https://<replace-this>.openai.azure.com/.
- Google search is disabled by default. To enable it, set GOOGLE_API_KEY and GOOGLE_CSE_ID.
Using the CLI: Run kube-copilot directly in the terminal.
Usage: kube-copilot [OPTIONS] COMMAND [ARGS]...
Kubernetes Copilot powered by OpenAI
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
analyze analyze issues for a given resource
audit audit security issues for a Pod
diagnose diagnose problems for a Pod
execute execute operations based on prompt instructions
generate generate Kubernetes manifests
Audit a Pod’s security issues: Use kube-copilot audit POD [NAMESPACE] to audit a Pod’s security issues.
Usage: kube-copilot audit [OPTIONS] POD [NAMESPACE]
audit security issues for a Pod
Options:
--verbose Enable verbose information of copilot execution steps
--model MODEL OpenAI model to use for copilot execution, default is gpt-4
--help Show this message and exit.
Diagnose Pod problems: Use kube-copilot diagnose POD [NAMESPACE] to diagnose Pod problems.
Usage: kube-copilot diagnose [OPTIONS] POD [NAMESPACE]
diagnose problems for a Pod
Options:
--verbose Enable verbose information of copilot execution steps
--model MODEL OpenAI model to use for copilot execution, default is gpt-4
--help Show this message and exit.
Analyze a K8s object for potential issues: Running kube-copilot analyze RESOURCE NAME [NAMESPACE] will analyze the given resource object for potential issues.
Usage: kube-copilot analyze [OPTIONS] RESOURCE NAME [NAMESPACE]
analyze issues for a given resource
Options:
--verbose Enable verbose information of copilot execution steps
--model TEXT OpenAI model to use for copilot execution, default is gpt-4
--help Show this message and exit.
Perform operations according to prompt instructions: kube-copilot execute INSTRUCTIONS performs operations according to the prompt instructions. It can also be used to ask any question.
Usage: kube-copilot execute [OPTIONS] INSTRUCTIONS
execute operations based on prompt instructions
Options:
--verbose Enable verbose information of copilot execution steps
--model MODEL OpenAI model to use for copilot execution, default is gpt-4
--help Show this message and exit.
Generate a Kubernetes manifest: Use the kube-copilot generate command and follow the prompts to create a Kubernetes manifest. After the manifests are generated, you will be prompted to confirm whether you want to apply them.
Usage: kube-copilot generate [OPTIONS] INSTRUCTIONS
generate Kubernetes manifests
Options:
--verbose Enable verbose information of copilot execution steps
--model TEXT OpenAI model to use for copilot execution, default is gpt-4
--help Show this message and exit.
Kubernetes ChatGPT bot
This is a ChatGPT bot for Kubernetes issues. It asks the AI how to resolve Prometheus alerts and returns a concise response.
Prometheus will forward the alert to the bot via the webhook receiver. The bot will then send a query to OpenAI asking how to fix the alert, and you just have to wait patiently for the results.
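The alert-to-question step can be sketched as follows. This is not the bot's actual code: the function names and the wording of the question are hypothetical, and the call to OpenAI is omitted. The payload shape (an "alerts" list whose items carry "status", "labels", and "annotations") follows the standard Alertmanager webhook format.

```python
def build_alert_question(alert):
    """Turn one alert from a webhook payload into a question for the model."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    name = labels.get("alertname", "unknown alert")

    question = f"How do I resolve the Prometheus alert {name} on Kubernetes?"

    # Add human-readable context when the alert provides it.
    details = annotations.get("description") or annotations.get("summary")
    if details:
        question += f" Alert details: {details}"
    return question


def questions_from_webhook(payload):
    """Build one question per firing alert in the webhook payload."""
    return [
        build_alert_question(alert)
        for alert in payload.get("alerts", [])
        if alert.get("status") == "firing"
    ]
```

Resolved alerts are skipped in this sketch, on the assumption that only firing alerts need a suggested fix.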
The bot is implemented using Robusta.dev, an open source platform for responding to Kubernetes alerts. Robusta also offers a SaaS platform for multi-cluster Kubernetes observability.
A Slack workspace is a prerequisite for setting it up.
Then you just:
- Install Robusta using Helm
- Load the ChatGPT playbook. Add the following to generated_values.yaml:
playbookRepos:
  chatgpt_robusta_actions:
    url: "https://github.com/robusta-dev/kubernetes-chatgpt-bot.git"
customPlaybooks:
# Add the 'Ask ChatGPT' button to all Prometheus alerts
- triggers:
  - on_prometheus_alert: {}
  actions:
  - chat_gpt_enricher: {}
- Add the OpenAI API key in generated_values.yaml. Make sure to edit the existing globalConfig section and do not add a duplicate section.
globalConfig:
  chat_gpt_token: YOUR KEY GOES HERE
- Do a Helm upgrade to apply the new values:
helm upgrade robusta robusta/robusta \
  --values=generated_values.yaml \
  --set clusterName=<YOUR_CLUSTER_NAME>
- Send Prometheus alerts to Robusta. Or, directly use the Prometheus stack bundled with Robusta.
Demo
Deploy the broken pod first so that the pod will stay in the pending state:
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/pending_pods/pending_pod_node_selector.yaml
Then trigger a Prometheus alert immediately, skipping the normal delay:
robusta playbooks trigger prometheus_alert alert_name=KubePodCrashLooping namespace=default pod_name=example-pod
An alert with a button will appear in Slack. Just click the button to ask ChatGPT about the alert.
Appilot
Appilot is an open-source AI assistant designed for DevOps scenarios. It leverages the capabilities of large language models to enable users to input natural language directly, thus simplifying the Kubernetes management experience.
Using inference based on large language models, Appilot can run locally on a PC. Users have the flexibility to integrate Appilot into any platform according to their needs and usage habits. This integration enables users to interact with the backend platform by inputting natural language, facilitating tasks such as application management, environment management, and Kubernetes debugging.
Appilot project address: https://github.com/seal-io/appilot