Document GitOps-based approach for managing User Cluster MLA alerting rules #2001

@mpavlovicbb

Problem Statement

The current User Cluster MLA Admin Guide primarily documents the API-based approach for managing alert rules via REST endpoints. It does not document the CRD-based approach, which is better suited to GitOps workflows.

Through cluster investigation, I discovered that Kubermatic provides native Kubernetes CRDs for managing alerting rules:

  • rulegroups.kubermatic.k8c.io - For defining Prometheus alert/recording rules
  • alertmanagers.kubermatic.k8c.io - For configuring Alertmanager routing

Custom resources of these kinds can be created directly in a user cluster's namespace (e.g., cluster-xxxxx) in the seed cluster, enabling GitOps-based alert management.

Proposed Documentation Addition

Add a new section: "Managing User Cluster Alerting via GitOps" to the User Cluster MLA documentation.

Content to Include:

1. RuleGroup CRD Overview

  • Explain that RuleGroups can be created as Kubernetes resources in user cluster namespaces
  • Document the CRD structure and fields

Example:

apiVersion: kubermatic.k8c.io/v1
kind: RuleGroup
metadata:
  name: haproxy-alerts
  namespace: cluster-xxxxx  # User cluster namespace in seed
spec:
  cluster:
    name: xxxxx
  ruleGroupType: Metrics  # or "Logs"
  isDefault: false
  data: |
    groups:
      - name: haproxy-service-specific-alerts
        rules:
          - alert: HighErrorRate
            expr: rate(http_requests_total{code=~"5.."}[5m]) > 0.05
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: High 5xx error rate detected
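
For the "Logs" rule group type mentioned in the comment above, a minimal sketch could look like the following. It mirrors the structure of the Metrics example and assumes the rule expression is LogQL; the resource name, the app="haproxy" label selector, and the threshold are hypothetical and not taken from a real cluster.

```yaml
apiVersion: kubermatic.k8c.io/v1
kind: RuleGroup
metadata:
  name: haproxy-log-alerts      # hypothetical name
  namespace: cluster-xxxxx      # User cluster namespace in seed
spec:
  cluster:
    name: xxxxx
  ruleGroupType: Logs
  isDefault: false
  data: |
    groups:
      - name: haproxy-log-alerts
        rules:
          - alert: HighLogErrorRate
            # Assumption: log streams carry an app="haproxy" label
            expr: sum(rate({app="haproxy"} |= "error" [5m])) > 10
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: Elevated rate of error log lines
```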

2. Alertmanager Configuration via Secret

  • Document how to configure Alertmanager by updating the alertmanager secret
  • Show the secret structure and key names

Example:

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager
  namespace: cluster-xxxxx
type: Opaque
stringData:
  alertmanager.yaml: |
    template_files: {}
    alertmanager_config: |
      route:
        receiver: 'default'
        group_by: ['alertname', 'cluster', 'service']
        routes:
          - receiver: 'slack-critical'
            match:
              severity: critical
      receivers:
        - name: 'default'
          slack_configs:
            - api_url: 'https://hooks.slack.com/services/XXX'
              channel: '#alerts'
        - name: 'slack-critical'
          slack_configs:
            - api_url: 'https://hooks.slack.com/services/XXX'
              channel: '#critical-alerts'
              channel: '#critical-alerts'

3. Alertmanager CRD Reference

  • Document the alertmanagers.kubermatic.k8c.io CRD
  • Explain its relationship with the secret

Example:

apiVersion: kubermatic.k8c.io/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: cluster-xxxxx
spec:
  configSecret:
    name: alertmanager  # References the secret above

4. GitOps Workflow Examples

  • Show how to structure alert rules in a Git repository
  • Provide ArgoCD/Flux application examples
  • Best practices for organizing rules by service/component
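
As a starting point for the ArgoCD example, a sketch of an Application that syncs the manifests for one user cluster might look like this. The repository URL, the path, and the Application name are hypothetical, and it assumes Argo CD runs in the seed cluster in the argocd namespace.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-cluster-alerting   # hypothetical name
  namespace: argocd             # assumes the default Argo CD namespace
spec:
  project: default
  source:
    # Hypothetical repository holding the RuleGroup and Secret manifests
    repoURL: https://git.example.com/platform/kkp-alerting.git
    targetRevision: main
    path: clusters/xxxxx        # hypothetical directory, one per user cluster
  destination:
    # Assumes Argo CD is deployed in the seed cluster itself
    server: https://kubernetes.default.svc
    namespace: cluster-xxxxx
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from Git
      selfHeal: true  # revert manual changes made in the cluster
```

With one directory per user cluster, rules could then be split into one RuleGroup file per service or component inside that directory.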

5. API vs CRD Comparison Table

| Aspect          | API Approach               | CRD Approach                |
|-----------------|----------------------------|-----------------------------|
| GitOps Support  | Requires CI/CD integration | Native Kubernetes resources |
| Version Control | Manual API calls           | Git history                 |
| Declarative     | No                         | Yes                         |
| Access Control  | KKP API permissions        | Kubernetes RBAC             |
| Tooling         | curl, API clients          | kubectl, ArgoCD, Flux       |
| Use Case        | Programmatic management    | Infrastructure as Code      |
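
To illustrate the "Kubernetes RBAC" access-control row, a namespaced Role and RoleBinding along these lines could restrict a GitOps tool to the alerting resources of a single user cluster. This is a sketch only: the Role name and the gitops-sync service account (and its namespace) are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: alerting-gitops         # hypothetical name
  namespace: cluster-xxxxx
rules:
  # Manage the KKP alerting custom resources in this namespace
  - apiGroups: ["kubermatic.k8c.io"]
    resources: ["rulegroups", "alertmanagers"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # Allow changes to the existing alertmanager config secret only
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["alertmanager"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: alerting-gitops
  namespace: cluster-xxxxx
subjects:
  - kind: ServiceAccount
    name: gitops-sync           # hypothetical service account used by the GitOps tool
    namespace: gitops           # hypothetical namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: alerting-gitops
```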

Benefits

This documentation would:

  1. Enable GitOps workflows for alert management
  2. Provide a more Kubernetes-native approach
  3. Help teams already using ArgoCD/Flux for infrastructure
  4. Reduce the learning curve for Kubernetes users
  5. Fill a gap in current documentation

Additional Context

Current documentation focuses on:

  • API endpoints: GET/POST/PUT/DELETE /api/v2/projects/{project_id}/clusters/{cluster_id}/rulegroups
  • UI-based management via KKP dashboard

Missing:

  • CRD-based declarative approach
  • GitOps integration patterns
  • Complete CRD specification examples

Related Documentation
