In a Kubernetes cluster, I'd like to be able to schedule a Job (with a CronJob) that will mount the same Volumes as 1 Pod of a given StatefulSet. Which Pod that is is a runtime decision, depending on the labels set on the Pods at the time the Job is scheduled.
I guess many people will wonder why, so here is a description of what we're doing and what we're trying to do:
Current setup
We have a StatefulSet which serves a PostgreSQL database (one primary, multiple replicas). We want to be able to create a backup from one of the Pods of the StatefulSet.
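For context, a simplified sketch of what such a StatefulSet looks like (the names postgres and data-volume are placeholders, not our actual manifests); the relevant part is that volumeClaimTemplates gives every Pod its own PVC, named <template-name>-<pod-name>:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      cluster-name: postgres
  template:
    metadata:
      labels:
        cluster-name: postgres
    spec:
      containers:
      - name: postgres
        image: postgres
        volumeMounts:
        - name: data-volume
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data-volume
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Ti

With that, Pod postgres-0 owns PVC data-volume-postgres-0, postgres-1 owns data-volume-postgres-1, and so on.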
For PostgreSQL we can already do backups over the network with pg_basebackup; however, we are running multi-TB PostgreSQL databases, which means full streaming backups (with pg_basebackup) are not feasible.
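For reference, such a streaming backup is essentially a single pg_basebackup invocation against the primary (host, user and target directory below are placeholders), which copies the full cluster every time:

pg_basebackup -h postgres-primary -U replication -D /backups/base -X stream -P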
We currently use pgBackRest to back up the databases, as it allows for incremental backups.
As the incremental backups of pgBackRest require access to the data volume and the WAL volume, we need to run the backup container on the same Kubernetes Node as the PostgreSQL instance; we currently even run it inside the same Pod, in a separate container.
Inside that container, a small API wraps around pgBackRest and can be triggered by sending POST requests to it; this triggering is currently done using CronJobs.
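To illustrate, the current trigger CronJob is essentially just a POST against that wrapper API; the service name, port and path below are made up for illustration:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: trigger-backup
spec:
  schedule: "13 03 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: trigger
            image: curlimages/curl
            # POST to the small API wrapping pgBackRest inside the database Pod
            args: ["-X", "POST", "http://postgres-backup-api:8081/backups"]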
Downsides
- Every PostgreSQL instance has multiple containers in its Pod: one to serve Postgres, one to serve a tiny wrapper around pgBackRest
- Job logs only show successful backup triggers; the actual backup logs are part of the backup container
- The Pod that will run the backup may be running with a relatively old configuration; changing the backup configuration requires rescheduling the Pod, which may mean a failover of the PostgreSQL primary.
Proposed setup
Have a CronJob schedule a Pod that has the same Volumes as one of the Pods of the StatefulSet. This will allow the backup to use these Volumes.
However, which Volumes it needs is a runtime decision: we may want to run the backup against the Volumes connected to the primary, or we may want to back up using the Volumes of a replica. The primary/replica roles may change at any moment, as auto-failover of the PostgreSQL primary is part of the solution.
Currently, this is not possible, as I cannot find any way in the CronJob spec to use information from the k8s API.
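To make the limitation concrete: the closest a plain CronJob spec gets is hard-coding one specific Pod's PVC, something like the sketch below (image and claim names are illustrative):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: backup
spec:
  schedule: "13 03 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: pgbackrest
            image: our-pgbackrest-image   # illustrative image name
            volumeMounts:
            - name: storage-volume
              mountPath: /var/lib/postgresql/data
          volumes:
          - name: storage-volume
            persistentVolumeClaim:
              # Must be a literal claim name; there is no way to express
              # "the PVC of whichever Pod currently has role=master".
              claimName: data-volume-postgres-0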
What does work, but is not very nice:
- Use a CronJob that schedules a Job
- This Job queries the k8s API and schedules another Job
For example, this is what we can do to have a Job create another Job using this runtime information:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: schedule-backup
spec:
  schedule: "13 03 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          # Needs a ServiceAccount that is allowed to read Pods and create Jobs
          containers:
          - name: backup-trigger
            image: bitnami/kubectl
            command:
            - sh
            - -c
            - |
              # Find the Pod that currently carries the primary role by its labels
              PRIMARYPOD=$(kubectl get pods -l cluster-name=<NAME>,role=master -o custom-columns=":metadata.name" --no-headers)
              # Create a Job that mounts that Pod's PVC
              kubectl apply -f - <<__JOB__
              apiVersion: batch/v1
              kind: Job
              metadata:
                name: test
              spec:
                template:
                  spec:
                    volumes:
                    - name: storage-volume
                      persistentVolumeClaim:
                        claimName: data-volume-${PRIMARYPOD}
                    [...]
              __JOB__
The above may be best served by an Operator instead of just a CronJob, but I'm wondering if anyone has a solution to this.
Downsides
- Job logs only show successful backup triggers; the actual backup logs are part of another Job
- The Job requires permissions to schedule Pods, requiring yet another Role/RoleBinding (a sketch of that RBAC follows this list)
- Using heredocs in Bash makes things harder to read/parse/understand
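For completeness, the extra RBAC for the trigger Job would look roughly like this (ServiceAccount and object names are placeholders):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: backup-scheduler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backup-scheduler
rules:
# to look up the current primary Pod by label
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
# to create the actual backup Job
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backup-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: backup-scheduler
subjects:
- kind: ServiceAccount
  name: backup-scheduler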
Summary
Long story, but these are the constraints we want to satisfy:
- Run a backup of a PostgreSQL database
- These are multi-TB databases
- Therefore, incremental backups are required
- Therefore, we need to mount the same PV as an already running Pod
- Therefore, we need to run a Pod (or container) on the same K8s Node as the PV
- We want to be able to express this in a CronJob spec, instead of having to do runtime Kubernetes API calls