From 1daaf01adcd8d077bd62affc76877aa33d98b776 Mon Sep 17 00:00:00 2001
From: Ezra Citron
Date: Tue, 11 Jan 2022 17:43:14 -0800
Subject: [PATCH 1/2] add kubernetes cheat sheet under admin section

---
 _pages/administration/troubleshooting/kubectl-cheat-sheet | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 _pages/administration/troubleshooting/kubectl-cheat-sheet

diff --git a/_pages/administration/troubleshooting/kubectl-cheat-sheet b/_pages/administration/troubleshooting/kubectl-cheat-sheet
new file mode 100644
index 000000000..d7026e652
--- /dev/null
+++ b/_pages/administration/troubleshooting/kubectl-cheat-sheet
@@ -0,0 +1,6 @@
+---
+categories: troubleshooting
+layout: article
+title: "kubectl cheat sheet"
+---
+
From 5a6140ed7e8c999dfd47d8d0e778e687fd6a7cf7 Mon Sep 17 00:00:00 2001
From: Ezra Citron
Date: Thu, 13 Jan 2022 09:19:54 -0800
Subject: [PATCH 2/2] revamp cheat sheet

---
 .../troubleshooting/kubectl-cheat-sheet | 6 -
 .../troubleshooting/kubectl-cheat-sheet.md | 138 ++++++++++++++++++
 2 files changed, 138 insertions(+), 6 deletions(-)
 delete mode 100644 _pages/administration/troubleshooting/kubectl-cheat-sheet
 create mode 100644 _pages/administration/troubleshooting/kubectl-cheat-sheet.md

diff --git a/_pages/administration/troubleshooting/kubectl-cheat-sheet b/_pages/administration/troubleshooting/kubectl-cheat-sheet
deleted file mode 100644
index d7026e652..000000000
--- a/_pages/administration/troubleshooting/kubectl-cheat-sheet
+++ /dev/null
@@ -1,6 +0,0 @@
----
-categories: troubleshooting
-layout: article
-title: "kubectl cheat sheet"
----
-
diff --git a/_pages/administration/troubleshooting/kubectl-cheat-sheet.md b/_pages/administration/troubleshooting/kubectl-cheat-sheet.md
new file mode 100644
index 000000000..272b16482
--- /dev/null
+++ b/_pages/administration/troubleshooting/kubectl-cheat-sheet.md
@@ -0,0 +1,138 @@
+---
+categories: troubleshooting
+layout: article
+title: "kubectl cheat sheet"
+---
+
+This guide serves as a cheat
sheet for the Kubernetes and related commands that are most often needed for managing an Algorithmia (DataRobot) cluster. See the [Official kubectl Cheat Sheet](https://kubernetes.io/docs/reference/kubectl/cheatsheet/) for a comprehensive list of Kubernetes commands, or for a more concise listing, see [Linux Academy's Kubernetes Cheat Sheet](https://linuxacademy.com/site-content/uploads/2019/04/Kubernetes-Cheat-Sheet_07182019.pdf).
+
+## Algorithmia cluster access
+
+To run `kubectl` commands to manage a DataRobot (Algorithmia) cluster, you'll first need to access it. All Algorithmia clusters have a bastion host that serves as a trusted access point for viewing the status of cluster resources and taking actions on them. Before attempting to access a cluster, please ensure that you have the appropriate permissions and credentials.
+
+To access a cluster:
+
+- SSH into the bastion host using your credentials
+- Run `docker ps` to identify the correct Docker container (`DOCKER_CONTAINER_ID`)
+- Run `docker exec -it <DOCKER_CONTAINER_ID> /bin/bash`
+- Run `export KUBECONFIG=/home/algo/`
+- Run `kubectl get pods` to ensure that you can run `kubectl` commands
+
+If you have any issues with the steps above, first confirm that you can access the cluster with your credentials. Then verify in your cloud provider's control panel that the VM you are trying to access is active and healthy. Contact DataRobot support if you are still unable to run `kubectl` commands.
+
+!!! Note Specifying resource types with `kubectl`
+    For most resource types, `kubectl` doesn't distinguish between the plural and singular forms. For example, `kubectl get pod` is equivalent to `kubectl get pods`. For consistency, this guide mainly uses the plural forms.
+
+## Common commands
+
+Below are commonly used Kubernetes and related commands for managing your Algorithmia cluster. Please keep this reference guide handy when working with our support team.
Note that these commands are specific to a cloud environment and may differ for on-premises deployments.
+
+```bash
+# @Neely: what does this do? delete persistent data? I'd like to move it to a more specific section below
+kubectl exec -it unilog-es-master-0 -- rm -rf /usr/share/elasticsearch/data/nodes
+kubectl exec -it unilog-es-master-1 -- rm -rf /usr/share/elasticsearch/data/nodes
+kubectl exec -it unilog-es-master-2 -- rm -rf /usr/share/elasticsearch/data/nodes
+
+# List running Docker containers (include stopped containers with `-a/--all`)
+sudo docker ps [-a]
+
+# Get a bash shell in a running container (<DOCKER_CONTAINER_ID> is a placeholder for the ID shown by `docker ps`)
+sudo docker exec -it <DOCKER_CONTAINER_ID> bash
+
+# Identify the Kubernetes configuration file that kubectl should use
+export KUBECONFIG=/home/algo/deployment/current/xxx.config
+
+# @Neely: was there a specific endpoint example you want to include?
+curl -v ...
+```
+
+## Nodes
+
+The typical cloud cluster VM configuration has the following:
+
+- 1 bastion host (you can access this host by following the [above steps](#algorithmia-cluster-access))
+- 3 Kubernetes control plane nodes
+- 3 Kubernetes general purpose nodes
+- 1 Legit (Git server) node
+- a variable number of CPU and GPU nodes depending on the cluster’s workload and configuration
+
+```bash
+# Get verbose node configuration details (<NODE_NAME> is an optional placeholder; omit it to describe all nodes)
+kubectl describe nodes [<NODE_NAME>]
+
+# Get node name, status, role, and IP address
+kubectl get nodes [<NODE_NAME>] --show-labels -o wide
+
+# Get node CPU and memory usage
+kubectl top nodes
+```
+
+## Pods
+
+```bash
+# Get pod name, status, age, IP address, as well as number of ready pods and the node they are on
+# (<POD_NAME> is an optional placeholder; omit it to describe all pods)
+kubectl describe pods [<POD_NAME>]
+
+# Get pod names, number ready, status, number of restarts, age, and labels
+kubectl get pods --show-labels -o wide
+
+# Get pods with specific label
+kubectl get pods -l app=