Privileged workloads in Google Kubernetes Engine (GKE) Autopilot clusters must be configured correctly to avoid problems. Misconfigurations can lead to synchronization failures with allowlists or cause the workload to be rejected. These problems can prevent essential agents or services from running with the necessary permissions.
Use this document to troubleshoot issues with deploying privileged workloads on Autopilot. Find guidance on resolving allowlist synchronization errors and diagnosing why a privileged workload might be rejected.
This information is important for platform admins, operators, and security teams who deploy workloads with elevated permissions on Autopilot clusters. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.
Allowlist synchronization issues
When you deploy an AllowlistSynchronizer, GKE attempts to
install and synchronize the allowlist files that you specify. If this
synchronization fails, the status field of the AllowlistSynchronizer
reports the error.
Get the status of the AllowlistSynchronizer object:
kubectl get allowlistsynchronizer ALLOWLIST_SYNCHRONIZER_NAME -o yaml
The output is similar to the following:
...
status:
  conditions:
  - type: Ready
    status: "False"
    reason: "SyncError"
    message: "some allowlists failed to sync: example-allowlist-1.yaml"
    lastTransitionTime: "2024-10-12T10:00:00Z"
    observedGeneration: 2
  managedAllowlistStatus:
  - filePath: "gs://path/to/allowlist1.yaml"
    generation: 1
    phase: Installed
    lastSuccessfulSync: "2024-10-10T10:00:00Z"
  - filePath: "gs://path/to/allowlist2.yaml"
    phase: Failed
    lastError: "Initial install failed: invalid contents"
    lastSuccessfulSync: "2024-10-08T10:00:00Z"
The conditions.message field and the managedAllowlistStatus.lastError field
provide detailed information about the error. Use this information to resolve
the issue.
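When a synchronizer manages many allowlists, it can help to print only the entries that failed. The following is a minimal sketch, assuming the status layout from the example output (managedAllowlistStatus, phase, filePath, lastError). The function name failed_allowlists is hypothetical.

```shell
# Print the file path and last error of every allowlist whose phase is "Failed".
# Assumes the status fields shown in the example output above.
# Argument 1: the name of the AllowlistSynchronizer.
failed_allowlists() {
  kubectl get allowlistsynchronizer "$1" \
    -o jsonpath='{range .status.managedAllowlistStatus[?(@.phase=="Failed")]}{.filePath}{"\t"}{.lastError}{"\n"}{end}'
}
```

To use it, run failed_allowlists ALLOWLIST_SYNCHRONIZER_NAME in a shell that has access to the cluster.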
Multiple AllowlistSynchronizers
In GKE clusters on versions earlier than 1.33.4-gke.1035000,
WorkloadAllowlists might fail to install if more than one AllowlistSynchronizer
is present.
To resolve the issue, use only a single AllowlistSynchronizer that contains
multiple allowlistPaths.
Alternatively, you can upgrade your cluster to a newer version.
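To check whether a cluster is affected, you could count the AllowlistSynchronizer objects that it contains. This sketch assumes that kubectl get allowlistsynchronizer lists every synchronizer without a namespace flag. The function names are hypothetical.

```shell
# Count the AllowlistSynchronizer objects in the cluster.
count_synchronizers() {
  kubectl get allowlistsynchronizer -o name | wc -l | tr -d ' '
}

# Report the count and flag the case that this section describes.
check_single_synchronizer() {
  count=$(count_synchronizers)
  if [ "$count" -gt 1 ]; then
    echo "Found $count AllowlistSynchronizers; consider merging them into one."
  else
    echo "Found $count AllowlistSynchronizer(s)."
  fi
}
```

Run check_single_synchronizer before and after you consolidate the allowlistPaths into one object.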
Workload container sorting
In GKE clusters on versions earlier than 1.34.0-gke.0000000, if one or more workload container images match a container image that's specified in an in-cluster WorkloadAllowlist, then the workload containers might be created and sorted in reverse-alphabetical order.
To resolve this issue, try the following options:
- Upgrade your cluster to version 1.34.0-gke.0000000 or later.
- Rename your workload's containers so that they are sorted in the correct order.
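If you choose to rename containers, a quick preview of the reverse-alphabetical order might help you pick names. The following sketch only sorts names locally. It assumes a byte-wise comparison (LC_ALL=C), which might not match how GKE compares names, and the function name reverse_alpha_order is hypothetical.

```shell
# Preview container names in reverse-alphabetical order.
# Reads one container name per line on standard input.
# LC_ALL=C forces byte-wise ordering; this is an assumption, not documented GKE behavior.
reverse_alpha_order() {
  LC_ALL=C sort -r
}
```

For example, you could pipe in the container names from your manifest, one per line, and compare the result with the order that your workload expects.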
Privileged workload deployment issues
After successfully installing an allowlist, you deploy the corresponding privileged workload in your cluster. In some cases, GKE might reject the workload.
Try the following resolution options:
- Ensure that the GKE version of your cluster meets the version requirement of the workload.
- Ensure that the workload that you're deploying is the workload to which the allowlist file applies.
To see why a privileged workload was rejected, request detailed information from GKE about allowlist violations:
Get a list of the installed allowlists in the cluster:
kubectl get workloadallowlist
Find the name of the allowlist that should apply to the privileged workload.
Open the YAML manifest of the privileged workload in a text editor. If you can't access the YAML manifests, for example if the workload deployment process uses other tooling, contact the workload provider to open an issue. Skip the remaining steps.
Add the following label to the metadata.labels section of the privileged workload's Pod specification. For a workload controller such as a DaemonSet or Deployment, this section is spec.template.metadata.labels:
labels:
  cloud.google.com/matching-allowlist: ALLOWLIST_NAME
Replace ALLOWLIST_NAME with the name of the allowlist that you obtained in the previous step. Use the name from the output of the kubectl get workloadallowlist command, not the path to the allowlist file.
Save the manifest and apply the workload to the cluster:
kubectl apply -f WORKLOAD_MANIFEST_FILE
Replace WORKLOAD_MANIFEST_FILE with the path to the manifest file.
The output provides detailed information about which fields in the workload didn't match the specified allowlist, like in the following example:
Error from server (GKE Warden constraints violations): error when creating "STDIN": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request:
===========================================================================
Workload Mismatches Found for Allowlist (example-allowlist-1):
===========================================================================
HostNetwork Mismatch: Workload=true, Allowlist=false
HostPID Mismatch: Workload=true, Allowlist=false
Volume[0]: data
  - data not found in allowlist. Verify volume with matching name exists in allowlist.
Container[0]:
  - Envs Mismatch:
    - env[0]: 'ENV_VAR1' has no matching string or regex pattern in allowlist.
    - env[1]: 'ENV_VAR2' has no matching string or regex pattern in allowlist.
  - Image Mismatch: Workload=k8s.gcr.io/diff/image, Allowlist=k8s.gcr.io/pause2. Verify that image string or regex match.
  - SecurityContext:
    - Capabilities.Add Mismatch: the following added capabilities are not permitted by the allowlist: [SYS_ADMIN SYS_PTRACE]
  - VolumeMount[0]: data
    - data not found in allowlist. Verify volumeMount with matching name exists in allowlist.
In this example, the following violations occur:
- The workload specifies hostNetwork: true, but the allowlist doesn't specify hostNetwork: true.
- The workload specifies hostPID: true, but the allowlist doesn't specify hostPID: true.
- The workload specifies a volume named data, but the allowlist doesn't specify a volume named data.
- The container specifies environment variables named ENV_VAR1 and ENV_VAR2, but the allowlist doesn't specify these environment variables.
- The container specifies the image k8s.gcr.io/diff/image, but the allowlist specifies k8s.gcr.io/pause2.
- The container adds the SYS_ADMIN and SYS_PTRACE capabilities, but the allowlist doesn't allow adding these capabilities.
- The container specifies a volume mount named data, but the allowlist doesn't specify a volume mount named data.
If you're deploying a workload that's provided by a third-party provider, open an issue with that provider to resolve the violations. Provide the output from the previous step in the issue.
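Rejection messages can be long, so you might want to reduce one to the lines that name a violation before you attach it to an issue. The following sketch filters on phrases taken from the example output above. Real messages might use other wording, and the function name warden_mismatches is hypothetical.

```shell
# Keep only the lines of a rejection message that describe a mismatch.
# Reads the output of `kubectl apply` on standard input.
# The header line also matches, which keeps the allowlist name in view.
warden_mismatches() {
  grep -E 'Mismatch|not found in allowlist|no matching string or regex|not permitted by the allowlist'
}
```

For example, run kubectl apply -f WORKLOAD_MANIFEST_FILE 2>&1 | warden_mismatches. The 2>&1 redirection is needed because kubectl writes the error to standard error.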
Webhook interference with workloads on an allowlist
In some cases, even if a workload is correctly configured to match an allowlist, it might still be rejected by GKE. This situation can happen if another admission controller (webhook) in your cluster modifies the Pods created by the workload controller after they have been allowed by the allowlist. These modifications can cause the Pod specification to no longer match the allowlist, leading to rejection by the GKE Warden admission webhook.
This issue is common with third-party monitoring and security agents that inject sidecar containers or environment variables into Pods.
Symptom
The most common symptom is that your workload controller (such as a DaemonSet or Deployment) is created successfully, but it fails to create any Pods. When you inspect the controller's events, you see messages indicating that the admission webhook denied the Pods.
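One way to inspect the controller's events is to filter the namespace's events by the object that they involve. This is a minimal sketch; the function name controller_events and its argument order are hypothetical.

```shell
# Show the events recorded for a workload controller.
# Argument 1: the kind, for example DaemonSet or Deployment.
# Argument 2: the name of the controller.
# Argument 3: the namespace of the controller.
controller_events() {
  kubectl get events -n "$3" \
    --field-selector "involvedObject.kind=$1,involvedObject.name=$2"
}
```

For example, controller_events DaemonSet WORKLOAD_NAME NAMESPACE lists the events for a DaemonSet, including any Pod creation failures that the controller reported.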
Diagnosis
- Follow the steps in the Privileged workload deployment issues section to add the cloud.google.com/matching-allowlist label to your workload.
- Copy the spec.template.spec section from your workload's YAML manifest.
- Create a new Pod manifest and paste the copied spec into the spec field.
- Set the apiVersion, kind, and metadata.name fields in the Pod manifest:
  apiVersion: v1
  kind: Pod
  metadata:
    name: POD_NAME
    labels:
      cloud.google.com/matching-allowlist: ALLOWLIST_NAME
  spec:
    # Paste the content of spec.template.spec here
  Replace the following:
  - POD_NAME: The name for your test Pod.
  - ALLOWLIST_NAME: The name of the allowlist.
- Apply the Pod manifest:
  kubectl apply -f YOUR_POD_MANIFEST_FILE
  Replace YOUR_POD_MANIFEST_FILE with the path to your Pod manifest file.
- Inspect the output from the previous step. If you see unexpected fields in the "Workload Mismatches" section, such as extra environment variables (for example, DD_AGENT_HOST), containers, or volumes, it is a strong indication that another webhook is modifying your Pods.
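If the output points to a mutating webhook, listing the mutating webhook configurations in the cluster might help you identify which one is responsible. The following sketch prints only their names; the function name list_mutating_webhooks is hypothetical.

```shell
# Print the name of every MutatingWebhookConfiguration in the cluster, one per line.
list_mutating_webhooks() {
  kubectl get mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}
```

Names that belong to a monitoring or security agent are likely candidates when the mismatches mention fields that the same agent injects.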
Resolution
To resolve this issue, configure the conflicting webhook so that it doesn't
modify the Pods of your allowlisted workload. This is typically done by adding
a label or annotation to the workload or its namespace that signals the
webhook to skip mutation. For example, with Datadog, you would add the
admission.datadoghq.com/enabled: "false" label to your workload's namespace.
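For the Datadog example, the label could be applied with kubectl label. This is a sketch under the assumption that the label key and value above are the ones that your Datadog version honors; the function name exclude_namespace_from_datadog is hypothetical.

```shell
# Add the Datadog opt-out label from the example above to a namespace.
# Argument 1: the namespace that contains the allowlisted workload.
# --overwrite replaces an existing value of the same label.
exclude_namespace_from_datadog() {
  kubectl label namespace "$1" admission.datadoghq.com/enabled=false --overwrite
}
```

After you label the namespace, delete and re-create the workload's Pods so that they pass through admission again.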
Consult the documentation for the specific third-party software you are using to learn how to exclude workloads from its admission controller.
By preventing the other webhook from modifying the Pods, you can help to ensure that they continue to match the allowlist and are successfully deployed on your Autopilot cluster.
Bugs and feature requests for privileged workloads and allowlists
Partners are responsible for creating, developing, and maintaining their privileged workloads and allowlists. If you encounter a bug or have a feature request for a privileged workload or allowlist, contact the corresponding partner.
What's next
If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:
- Opening a support case by contacting Cloud Customer Care.
- Getting support from the community by asking questions on StackOverflow and using the google-kubernetes-engine tag to search for similar issues. You can also join the #kubernetes-engine Slack channel for more community support.
- Opening bugs or feature requests by using the public issue tracker.