Description
Bug Report
What did you do?
Install a HiveMQ Platform and HiveMQ Platform Operator Helm chart in a GKE Autopilot cluster.
What did you expect to see?
I expect a smooth reconciliation.
What did you see instead? Under which circumstances?
We see a constant mismatch on our `StatefulSet` resource, so it is updated on every reconciliation:
```
15:04:28.712 [INFO] c.h.p.o.d.StatefulSetResourceMatcher - Detected changes in StatefulSet specification:
Path: /spec/template/spec/containers/0/resources/limits/cpu
Actual value: "1"
Desired value: "1000m"
Path: /spec/template/spec/containers/0/resources/requests/cpu
Actual value: "1"
Desired value: "1000m"
```
(`StatefulSetResourceMatcher` extends `SSABasedGenericKubernetesResourceMatcher` and uses the internal, pruned actual and desired maps for the diff logging.)

This mismatch should be prevented by the `PodTemplateSpecSanitizer`. The actual root cause of the mismatch is hidden due to an unlucky configuration of resource requests/limits and the interference of GKE Autopilot:
- The HiveMQ Platform Helm chart configures `cpu` requests/limits of `1000m`, which K8s serializes as `1` (see the sketch after this list). So we rely on the `PodTemplateSpecSanitizer` in JOSDK to sanitize the `actualMap` and prevent false positive mismatches on our `StatefulSet` resource.
- The HiveMQ Platform Helm chart doesn't configure `ephemeral-storage` requests/limits by default, but GKE Autopilot enforces them and updates our `StatefulSet` accordingly on the fly.
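The `cpu` normalization itself is easy to reproduce with the Fabric8 `Quantity` model. A minimal sketch, assuming the Fabric8 client from JOSDK is on the classpath (class and variable names are illustrative):

```java
import io.fabric8.kubernetes.api.model.Quantity;
import java.math.BigDecimal;

public class CpuQuantityDemo {
    public static void main(String[] args) {
        Quantity desired = new Quantity("1000m"); // as configured by the Helm chart
        Quantity actual = new Quantity("1");      // as serialized back by the API server

        // The serialized string forms differ, which is what the SSA diff ends up comparing:
        System.out.println(desired + " vs " + actual); // 1000m vs 1

        // ...but the numerical amounts are identical: 1000 millicores == 1 core.
        BigDecimal desiredAmount = desired.getNumericalAmount();
        BigDecimal actualAmount = actual.getNumericalAmount();
        System.out.println(desiredAmount.compareTo(actualAmount) == 0); // true
    }
}
```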
Under the hood we end up with these values in the matcher:
```yaml
desired:
  resources:
    limits:
      cpu: 1000m
      memory: 2048M
    requests:
      cpu: 1000m
      memory: 2048M
actual:
  resources:
    limits:
      cpu: 1                   # changed by K8s
      ephemeral-storage: 1Gi   # added by GKE Autopilot
      memory: 2048M
    requests:
      cpu: 1                   # changed by K8s
      ephemeral-storage: 1Gi   # added by GKE Autopilot
      memory: 2048M
```
The size mismatch between the actual and desired maps triggers this early return in the `PodTemplateSpecSanitizer`. So the `cpu` values are not sanitized and we end up with a false positive mismatch on the `StatefulSet`.
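To make the skipped sanitization concrete, here is a minimal sketch using the values from the maps above; only the `.filter(m -> m.size() == desiredResource.size())` condition is taken from the sanitizer, the rest is illustrative:

```java
import java.util.Map;
import java.util.Optional;

public class SizeGuardDemo {
    public static void main(String[] args) {
        // limits taken from the desired/actual example above
        Map<String, String> desiredLimits = Map.of("cpu", "1000m", "memory", "2048M");
        Map<String, String> actualLimits =
                Map.of("cpu", "1", "ephemeral-storage", "1Gi", "memory", "2048M");

        // The early return: 3 keys != 2 keys, so the Optional is emptied and the
        // sanitization step never runs -- cpu stays "1" in the actual map.
        Optional.of(actualLimits)
                .filter(m -> m.size() == desiredLimits.size())
                .ifPresentOrElse(
                        limits -> System.out.println("sanitizing " + limits),
                        () -> System.out.println("sanitization skipped")); // printed
    }
}
```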
Since the desired state doesn't contain `ephemeral-storage`, there are no managed fields for this key in the requests/limits resources of our container. The `SSABasedGenericKubernetesResourceMatcher` then correctly prunes `ephemeral-storage` from the actual map, but thereby also hides the actual root cause of the wrong `cpu` mismatch. For example, even with debug logging, `ephemeral-storage` won't show up in the diff, because the diff uses the pruned actual map: `var diff = getDiff(prunedActual, desiredMap, objectMapper);`. The same applies to our custom logging, which also uses the pruned actual map.
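For illustration, both the debug diff and our logging therefore only ever see a pruned actual map roughly like the following (derived from the values above; `ephemeral-storage` is already gone, while `cpu` is still the unsanitized `1`):

```yaml
# pruned actual map, as passed to getDiff() and to our custom logging
resources:
  limits:
    cpu: 1          # still "1": sanitization was skipped
    memory: 2048M
  requests:
    cpu: 1
    memory: 2048M
```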
Environment
Kubernetes cluster type: K8s 1.33.5 on GKE with Autopilot
java-operator-sdk version (from pom.xml): 5.1.4
```
$ java -version
openjdk version "21.0.8" 2025-07-15
OpenJDK Runtime Environment (build 21.0.8+9-Ubuntu-0ubuntu124.04.1)
OpenJDK 64-Bit Server VM (build 21.0.8+9-Ubuntu-0ubuntu124.04.1, mixed mode, sharing)

$ kubectl version
Client Version: v1.34.1
Kustomize Version: v5.7.1
Server Version: v1.33.5-gke.1080000
```
Possible Solution
The easiest solution would be to remove the early return: `.filter(m -> m.size() == desiredResource.size())`. This shouldn't cost much performance, since there are still two more early returns before the `equals()` check that invokes the expensive `getNumericalAmount()`.
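As a rough sketch of the idea (simplified, not the actual `PodTemplateSpecSanitizer` source; only the quoted filter line comes from the existing code), the sanitization could iterate over the desired keys only, so that extra keys injected by the platform no longer disable it:

```java
import io.fabric8.kubernetes.api.model.Quantity;
import java.util.Map;

class SanitizerWithoutSizeGuardSketch {

    // Illustrative replacement for the guarded sanitization step: no size check,
    // just walk the desired keys. Keys that exist only in the actual map (such as
    // the ephemeral-storage added by GKE Autopilot) are ignored here and pruned by
    // the SSA matcher later anyway.
    static void sanitizeResourceMap(Map<String, Object> actualResource,
                                    Map<String, Object> desiredResource) {
        desiredResource.forEach((key, desiredValue) -> {
            Object actualValue = actualResource.get(key);
            if (actualValue != null
                    && new Quantity(actualValue.toString())
                            .equals(new Quantity(desiredValue.toString()))) {
                // Numerically equal (e.g. "1" vs "1000m"): align the serialized form
                // so the SSA diff no longer reports a false positive.
                actualResource.put(key, desiredValue);
            }
        });
    }
}
```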