Add CoderdUnprovisionedPrebuiltWorkspaces alert #35

Merged
merged 3 commits into from
Apr 23, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -244,7 +244,7 @@ values which are defined [here](https://github.com/grafana/helm-charts/tree/main

| Key | Type | Default | Description |
|-----|------|---------|-------------|
-| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"IneligiblePrebuilds":{"delay":"10m","enabled":true,"thresholds":{"notify":1}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
+| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"IneligiblePrebuilds":{"delay":"10m","enabled":true,"thresholds":{"notify":1}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"UnprovisionedPrebuiltWorkspaces":{"delay":"10m","enabled":true,"thresholds":{"warn":1}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
| global.coder.coderdSelector | string | `"pod=~`coder.*`, pod!~`.*provisioner.*`"` | series selector for Prometheus/Loki to locate provisioner pods. ensure this uses backticks for quotes! |
| global.coder.controlPlaneNamespace | string | `"coder"` | the namespace into which the control plane has been deployed. |
| global.coder.externalProvisionersNamespace | string | `"coder"` | the namespace into which any external provisioners have been deployed. |
52 changes: 51 additions & 1 deletion coder-observability/runbooks/coderd.md
@@ -82,4 +82,54 @@ Please contact your Coder sales contact, or visit https://coder.com/contact/sale
Prebuilds only become eligible to be claimed by users once the workspace's agent is a) running and b) all of its startup
scripts have completed.

If a prebuilt workspace is not eligible, view its agent logs to diagnose the problem.

## CoderdUnprovisionedPrebuiltWorkspaces

The number of running prebuilt workspaces is lower than the desired number of instances. This could be for several
reasons, listed in order of likelihood:

### Experiment/License

The prebuilds feature is currently gated behind an experiment *and* a premium license.

Ensure that the prebuilds experiment is enabled with `CODER_EXPERIMENTS=workspace-prebuilds`, and that you have a premium
license added.
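
As a minimal sketch, assuming Coder is deployed with its Helm chart and that the chart's values accept a `coder.env`
list (an assumption about your deployment, not something this runbook specifies), the experiment could be enabled
like so:

```yaml
coder:
  env:
    # Enables the prebuilds experiment. If other experiments are already
    # listed, append this one to the existing comma-separated value.
    - name: CODER_EXPERIMENTS
      value: "workspace-prebuilds"
```

The premium license still has to be added separately; this fragment only covers the experiment flag.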

### Preset Validation Issue

Templates that have prebuilds configured require a preset to be defined, with ALL of the required parameters
set in the preset. If any of these are missing, or any of the parameters (as defined) fail validation, the prebuilds
subsystem will refuse to attempt a workspace build.

Consult the coderd logs for more information; look out for errors or warnings from the prebuilds subsystem.

### Template Misconfiguration or Error

Prebuilt workspaces cannot be provisioned due to some issue at `terraform apply`-time. This could be due to misconfigured
cloud resources, improper authorization, or any number of other issues.

Visit the Workspaces page, change the search term to `owner:prebuilds`, and view the previously failed builds. The
error will likely be quite obvious.

### Provisioner Latency

If your provisioners are overloaded and cannot process provisioner jobs quickly enough, prebuilt workspaces may be affected.
There is no prioritization at present for prebuilt workspace jobs.

Ensure your provisioners are appropriately resourced (i.e. you have enough instances) to handle the concurrent build demand.

### Use of Workspace Tags

If you are using `coder_workspace_tags` ([docs](https://coder.com/docs/admin/templates/extending-templates/workspace-tags))
in your template, chances are you do not have any provisioners running with matching tags, or they are under-resourced
(see **Provisioner Latency**).

Ensure your running provisioners are configured with your desired tags.

### Reconciliation Loop Issue

The prebuilds subsystem runs a _reconciliation loop_ which monitors the state of prebuilt workspaces to ensure the desired
number of instances is present at all times. Workspace Prebuilds is currently a BETA feature, so this loop could
contain a bug; if you suspect one, report it to Coder.

Examine your coderd logs for any errors or warnings relating to prebuilds.
20 changes: 20 additions & 0 deletions coder-observability/templates/configmap-prometheus-alerts.yaml
@@ -125,6 +125,26 @@ data:
{{- end }}
{{- end }}

{{- with .groups.UnprovisionedPrebuiltWorkspaces }}
{{- $group := . }}
{{- if .enabled }}
  - name: Coderd Unprovisioned Prebuilt Workspaces
    rules:
    {{ $alert := "CoderdUnprovisionedPrebuiltWorkspaces" }}
    {{- range $severity, $threshold := .thresholds }}
      - alert: {{ $alert }}
        expr: max by (template_name, preset_name) (coderd_prebuilds_desired - coderd_prebuilds_running) > 0
        for: {{ $group.delay }}
        annotations:
          summary: >
            {{ `{{ $value }}` }} prebuilt workspace(s) have not yet been provisioned for the "{{ `{{ $labels.template_name }}` }}" template and "{{ `{{ $labels.preset_name }}` }}" preset.
        labels:
          severity: {{ $severity }}
          runbook_url: {{ template "runbook-url" (deepCopy $ | merge (dict "alert" $alert) $service) }}
    {{- end }}
{{- end }}
{{- end }}

{{- end }} {{/* end-section */}}
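
For orientation, here is a sketch of roughly what this block might render to with the chart defaults (`delay: 10m`, `thresholds: {warn: 1}`). The annotations and the `runbook_url` label are left out because their rendered values depend on helpers and settings not shown in this diff. Note that, as written in the template, the expression compares against `0` and does not reference `$threshold`, so only the threshold key (which becomes the severity label) affects the output.

```yaml
- name: Coderd Unprovisioned Prebuilt Workspaces
  rules:
    # One rule is emitted per key under `thresholds`; the default has a single key, `warn`.
    - alert: CoderdUnprovisionedPrebuiltWorkspaces
      # Fires when any template/preset pair has fewer running prebuilds than desired.
      expr: max by (template_name, preset_name) (coderd_prebuilds_desired - coderd_prebuilds_running) > 0
      # Taken from the group's `delay`; the condition must hold this long before the alert fires.
      for: 10m
      labels:
        severity: warn
        # runbook_url omitted: rendered by the chart's "runbook-url" helper.
```

If a second key were added under `thresholds`, the `range` loop would emit a second rule with the same expression and a different `severity` label.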


@@ -111,9 +111,9 @@
},
"editorMode": "code",
"expr": "min(coderd_experiments{experiment=\"workspace-prebuilds\"})",
-"instant": false,
+"instant": true,
"legendFormat": "__auto",
-"range": true,
+"range": false,
"refId": "A"
}
],
@@ -645,7 +645,7 @@
"refId": "E"
}
],
-"title": "Change over range: $preset",
+"title": "Pool Capacity: $preset",
"type": "timeseries"
},
{
@@ -871,7 +871,7 @@
"refId": "F"
}
],
-"title": "Change over range: $preset",
+"title": "Pool Operations: $preset",
"type": "timeseries"
},
{
5 changes: 5 additions & 0 deletions coder-observability/values.yaml
@@ -81,6 +81,11 @@ global:
            delay: 10m
            thresholds:
              notify: 1
          UnprovisionedPrebuiltWorkspaces:
            enabled: true
            delay: 10m
            thresholds:
              warn: 1
      provisionerd:
        groups:
          Replicas:
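
As a hedged illustration of how an operator might tune this new group, the fragment below is a sketch of a user-supplied values override. It assumes standard Helm map merging, with the key path taken from the `global.coder.alerts` default documented in the README. The `30m` delay is an arbitrary example value, not a recommendation from this PR.

```yaml
global:
  coder:
    alerts:
      coderd:
        groups:
          UnprovisionedPrebuiltWorkspaces:
            # Set to false to drop the alert group from the rendered rules.
            enabled: true
            # Wait longer than the default 10m before firing, e.g. for slow-building templates.
            delay: 30m
```

Because Helm merges maps, the default `thresholds: {warn: 1}` would still apply unless it is overridden explicitly.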