Resize a Postgres Cluster
You did it -- the application is a success! Traffic is booming, so much so that you need to add more resources to your Postgres cluster. However, you're worried that any resize operation may cause downtime and create a poor experience for your end users.
This is where PGO comes in: PGO helps orchestrate the rollout of any potentially disruptive changes to your cluster to minimize or eliminate any downtime for your application. For this tutorial, we will assume that you have deployed a high availability Postgres cluster as described in the Day Two Tasks tutorial.
Let's dive in.
Resize Memory and CPU
Memory and CPU resources are an important component for vertically scaling your Postgres cluster. Coupled with tweaks to your Postgres configuration file, allocating more memory and CPU to your cluster can help it to perform better under load.
It's important for instances in the same high availability set to have the same resources. PGO lets you adjust CPU and memory within the resources sections of the postgresclusters.postgres-operator.crunchydata.com custom resource. These include:
- spec.instances.resources section, which sets the resource values for the PostgreSQL container, as well as any init containers in the associated pod and containers created by the pgDataVolume and pgWALVolume data migration jobs.
- spec.instances.sidecars.replicaCertCopy.resources section, which sets the resources for the replica-cert-copy sidecar container.
- spec.monitoring.pgmonitor.exporter.resources section, which sets the resources for the exporter sidecar container.
- spec.backups.pgbackrest.repoHost.resources section, which sets the resources for the pgBackRest repo host container, as well as any init containers in the associated pod and containers created by the pgBackRestVolume data migration job.
- spec.backups.pgbackrest.sidecars.pgbackrest.resources section, which sets the resources for the pgbackrest sidecar container.
- spec.backups.pgbackrest.sidecars.pgbackrestConfig.resources section, which sets the resources for the pgbackrest-config sidecar container.
- spec.backups.pgbackrest.jobs.resources section, which sets the resources for any pgBackRest backup job.
- spec.backups.pgbackrest.restore.resources section, which sets the resources for manual pgBackRest restore jobs.
- spec.dataSource.postgresCluster.resources section, which sets the resources for pgBackRest restore jobs created during the cloning process.
- spec.proxy.pgBouncer.resources section, which sets the resources for the pgbouncer container.
- spec.proxy.pgBouncer.sidecars.pgbouncerConfig.resources section, which sets the resources for the pgbouncer-config sidecar container.
The layout of these resources sections should be familiar: they follow the same pattern as the standard Kubernetes structure for setting container resources. Note that these settings also allow for the configuration of QoS classes.
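To see which QoS class Kubernetes actually assigned to each Pod, one option is to read the status.qosClass field. The following is a minimal sketch, assuming the postgres-operator namespace and the hippo cluster used throughout this tutorial; the function name show_qos is hypothetical and not part of PGO.

```shell
# Hypothetical helper: print each hippo Pod alongside its assigned QoS class.
# Assumes the postgres-operator namespace and the hippo cluster from this tutorial.
show_qos() {
  kubectl -n postgres-operator get pods \
    --selector=postgres-operator.crunchydata.com/cluster=hippo \
    -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.qosClass}{"\n"}{end}'
}
```

Calling show_qos lists one Pod per line. Kubernetes only reports a Pod as Guaranteed when every container in it has CPU and memory requests equal to its limits, so a Pod may still show as Burstable if its sidecar containers have no resources set.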
For example, using the spec.instances.resources section, let's say we want to update our hippo Postgres cluster so that each instance has a limit of 2.0 CPUs and 4Gi of memory. We can make the following changes to the manifest:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 17
  instances:
    - name: instance1
      replicas: 2
      resources:
        limits:
          cpu: 2.0
          memory: 4Gi
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
In particular, we added the following to spec.instances:
resources:
  limits:
    cpu: 2.0
    memory: 4Gi
Apply these updates to your Postgres cluster with the following command:
kubectl apply -k kustomize/postgres
Now, let's watch how the rollout happens. In Bash, you can use a command like the following:
watch "kubectl -n postgres-operator get pods --selector=postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/instance -o=jsonpath='{range .items[*]}{.metadata.name}{\"\t\"}{.metadata.labels.postgres-operator\.crunchydata\.com/role}{\"\t\"}{.status.phase}{\"\t\"}{.spec.containers[].resources.limits}{\"\n\"}{end}'"
In PowerShell, you can use a command like:
kubectl -n postgres-operator get pods --watch --selector=postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/instance -o=jsonpath="{range .items[*]}{.metadata.name}{'\t'}{.metadata.labels.postgres-operator\.crunchydata\.com/role}{'\t'}{.status.phase}{'\t'}{.spec.containers[].resources.limits}{'\n'}"
Observe how the Pods are terminated one at a time. This is part of a "rolling update". Because updating the resources of a Pod is a destructive action, PGO first applies the CPU and memory changes to the replicas. PGO ensures that the changes are successfully applied to a replica instance before moving on to the next replica.
Once the changes are applied to all of the replicas, PGO will perform a "controlled switchover": it will promote a replica to become the new primary, and then apply the changes to the final Postgres instance (the former primary).
By rolling out the changes in this way, PGO ensures there is minimal to zero disruption to your application: you are able to successfully roll out updates and your users may not even notice!
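After the switchover, you may want to confirm which Pod now holds the primary role. The sketch below filters on the same role label that the watch commands above print. It assumes the primary is labeled with the value master; that value is an assumption and may differ in your PGO version, so check the role column from the watch output first. The function name show_primary is hypothetical.

```shell
# Hypothetical helper: print the name of the Pod currently labeled as primary.
# Assumption: the primary carries role=master; adjust the value if your
# cluster's role label shows something else (for example, primary).
show_primary() {
  kubectl -n postgres-operator get pods \
    --selector=postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master \
    -o name
}
```

Running show_primary before and after the rollout should show a different Pod name if a switchover took place.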
Resize PVC
Your application is a success! Your data continues to grow, and it's becoming apparent that you need more disk. That's great: you can resize your PVC directly on your postgresclusters.postgres-operator.crunchydata.com custom resource with minimal to zero downtime.
PVC resizing, also known as volume expansion, is a function of your storage class: it must support volume resizing. Additionally, PVCs can only be sized up: you cannot shrink the size of a PVC.
You can adjust PVC sizes on all of the managed volumes in a Postgres cluster that use Kubernetes storage. These include:
- spec.instances.dataVolumeClaimSpec.resources.requests.storage: The Postgres data directory (aka your database).
- spec.backups.pgbackrest.repos.volume.volumeClaimSpec.resources.requests.storage: The pgBackRest repository when using "volume" storage.
The above should be familiar: it follows the same pattern as the standard Kubernetes PVC structure.
For example, let's say we want to update our hippo Postgres cluster so that each instance now uses a 10Gi PVC and our backup repository uses a 20Gi PVC. We can do so with the following manifest:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 17
  instances:
    - name: instance1
      replicas: 2
      resources:
        limits:
          cpu: 2.0
          memory: 4Gi
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 10Gi
  backups:
    pgbackrest:
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 20Gi
In particular, we updated the following in spec.instances:
dataVolumeClaimSpec:
  resources:
    requests:
      storage: 10Gi
and updated the following in spec.backups.pgbackrest.repos.volume:
volumeClaimSpec:
  accessModes:
  - "ReadWriteOnce"
  resources:
    requests:
      storage: 20Gi
Apply these updates to your Postgres cluster with the following command:
kubectl apply -k kustomize/postgres
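To check whether the expansion has taken effect, you can compare the size each PVC requests with the capacity it currently reports. This is a minimal sketch, assuming the postgres-operator namespace and that the cluster's PVCs carry the same cluster label used in the Pod selectors above; the function name show_pvcs is hypothetical.

```shell
# Hypothetical helper: list each hippo PVC with its requested and actual size.
# Assumes PVCs are labeled with postgres-operator.crunchydata.com/cluster=hippo.
show_pvcs() {
  kubectl -n postgres-operator get pvc \
    --selector=postgres-operator.crunchydata.com/cluster=hippo \
    -o=custom-columns='NAME:.metadata.name,REQUESTED:.spec.resources.requests.storage,CAPACITY:.status.capacity.storage'
}
```

The resize is complete for a PVC once its CAPACITY column matches REQUESTED. Depending on the storage driver, the reported capacity may lag behind the request for a while.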
Resize PVCs With StorageClass That Does Not Allow Expansion
Not all Kubernetes Storage Classes allow for volume expansion. However, with PGO, you can still resize your Postgres cluster data volumes even if your storage class does not allow it!
Let's go back to the previous example, this time starting with the data volume still at 1Gi:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 17
  instances:
    - name: instance1
      replicas: 2
      resources:
        limits:
          cpu: 2.0
          memory: 4Gi
      dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 20Gi
First, create a new instance that has the larger volume size. Call this instance instance2. The manifest would look like this:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 17
  instances:
    - name: instance1
      replicas: 2
      resources:
        limits:
          cpu: 2.0
          memory: 4Gi
      dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
    - name: instance2
      replicas: 2
      resources:
        limits:
          cpu: 2.0
          memory: 4Gi
      dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 10Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 20Gi
Take note of the block that contains instance2:
- name: instance2
  replicas: 2
  resources:
    limits:
      cpu: 2.0
      memory: 4Gi
  dataVolumeClaimSpec:
    accessModes:
      - 'ReadWriteOnce'
    resources:
      requests:
        storage: 10Gi
This creates a second set of two Postgres instances, each with a larger PVC. Both of them come up as replicas.
Once this new instance set is available and its instances have caught up to the primary, you can then apply the following manifest:
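Before removing the old instance set, it helps to confirm that the new replicas are running and are not lagging behind the primary. One way to do this is to ask Patroni for its view of the cluster. The sketch below is an assumption-laden example: it assumes the Postgres container in each instance Pod is named database and that patronictl is available inside it. The function name show_members is hypothetical, and you need to pass the name of one of your own instance Pods.

```shell
# Hypothetical helper: show Patroni's member list (role, state, lag)
# from inside an instance Pod.
# $1: name of any instance Pod in the hippo cluster.
# Assumption: the Postgres container is named "database".
show_members() {
  kubectl -n postgres-operator exec "$1" -c database -- patronictl list
}
```

For example, show_members followed by a Pod name taken from kubectl get pods prints one row per member, so you can see whether the instance2 replicas are listed and how far behind they are.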
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 17
  instances:
    - name: instance2
      replicas: 2
      resources:
        limits:
          cpu: 2.0
          memory: 4Gi
      dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 10Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 20Gi
This will promote one of the instances with the larger PVC to be the new primary and remove the instances with the smaller PVCs!
This method can also be used to shrink PVCs to a smaller size.
Troubleshooting
Postgres Pod Can't Be Scheduled
There are many reasons why a PostgreSQL Pod may not be scheduled:
- Resources are unavailable. Ensure that you have a Kubernetes Node with enough resources to satisfy your memory and CPU requests.
- PVC cannot be provisioned. Ensure that you request a PVC size that is available, or that your PVC storage class is set up correctly.
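To find out which of these reasons applies, you can list the Pods that are stuck in the Pending phase and then read the events Kubernetes recorded for one of them. This is a minimal sketch, assuming the postgres-operator namespace and the hippo cluster from this tutorial; the function names pending_pods and pod_events are hypothetical.

```shell
# Hypothetical helper: list hippo Pods that have not been scheduled or started.
pending_pods() {
  kubectl -n postgres-operator get pods \
    --selector=postgres-operator.crunchydata.com/cluster=hippo \
    --field-selector=status.phase=Pending
}

# Hypothetical helper: show details and recent events for one Pod.
# $1: name of the Pod to inspect.
pod_events() {
  kubectl -n postgres-operator describe pod "$1"
}
```

The Events section at the end of the describe output usually states why scheduling failed, such as insufficient CPU or memory on the Nodes or an unbound PVC.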
PVCs Do Not Resize
Ensure that your storage class supports PVC resizing. You can check that by inspecting the allowVolumeExpansion attribute, which appears in the ALLOWVOLUMEEXPANSION column of the following command's output:
kubectl get sc
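If you have many storage classes, a narrower view can make the check easier to read. The sketch below prints only the name, provisioner, and expansion setting of each storage class; the function name show_expansion is hypothetical.

```shell
# Hypothetical helper: show whether each storage class allows volume expansion.
show_expansion() {
  kubectl get storageclass \
    -o=custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,EXPANSION:.allowVolumeExpansion'
}
```

A storage class that does not set allowVolumeExpansion shows no value in the EXPANSION column, which means its PVCs cannot be resized in place.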
If the storage class does not support PVC resizing, you can use the technique described above to resize PVCs using a second instance set.
Next Steps
Now that we know how to resize our Postgres clusters, let's look at how PGO handles software updates!