
I am running a Django application in a Kubernetes cluster on gcloud. I implemented the database migration as a Helm pre-install hook that launches my app container and runs the database migration. I use cloud-sql-proxy in a sidecar pattern as recommended in the official tutorial: https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine

Basically this launches my app container and a cloud-sql-proxy container within the pod described by the job. The problem is that cloud-sql-proxy never terminates after my app has completed the migration, causing the pre-install job to time out and cancel my deployment. How do I gracefully exit the cloud-sql-proxy container after my app container completes, so that the job can complete?

Here is my Helm pre-install hook template definition:

apiVersion: batch/v1
kind: Job
metadata:
  name: database-migration-job
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  activeDeadlineSeconds: 230
  template:
    metadata:
      name: "{{ .Release.Name }}"
      labels:
        app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
        helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    spec:
      restartPolicy: Never
      containers:
      - name: db-migrate
        image: {{ .Values.my-project.docker_repo }}{{ .Values.backend.image }}:{{ .Values.my-project.image.tag}}
        imagePullPolicy: {{ .Values.my-project.image.pullPolicy }}
        env:
        - name: DJANGO_SETTINGS_MODULE
          value: "{{ .Values.backend.django_settings_module }}"
        - name: SENDGRID_API_KEY
          valueFrom:
            secretKeyRef:
              name: sendgrid-api-key
              key: sendgrid-api-key
        - name: DJANGO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: django-secret-key
              key: django-secret-key
        - name: DB_USER
          value: {{ .Values.postgresql.postgresqlUsername }}
        - name: DB_PASSWORD
          {{- if .Values.postgresql.enabled }}
          value: {{ .Values.postgresql.postgresqlPassword }}
          {{- else }}
          valueFrom:
            secretKeyRef:
              name: database-password
              key: database-pwd
          {{- end }}
        - name: DB_NAME
          value: {{ .Values.postgresql.postgresqlDatabase }}
        - name: DB_HOST
          {{- if .Values.postgresql.enabled }}
          value: "postgresql"
          {{- else }}
          value: "127.0.0.1"
          {{- end }}
        workingDir: /app-root
        command: ["/bin/sh"]
        args: ["-c", "python manage.py migrate --no-input"]
      {{- if eq .Values.postgresql.enabled false }}
      - name: cloud-sql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command:
          - "/cloud_sql_proxy"
          - "-instances=<INSTANCE_CONNECTION_NAME>=tcp:<DB_PORT>"
          - "-credential_file=/secrets/service_account.json"
        securityContext:
          #fsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65532
        volumeMounts:
        - name: db-con-mnt
          mountPath: /secrets/
          readOnly: true
      volumes:
      - name: db-con-mnt
        secret:
          secretName: db-service-account-credentials
      {{- end }}

Funnily enough, if I kill the job with "kubectl delete jobs database-migration-job" after the migration is done, the helm upgrade completes and my new app version gets installed.

Vess Perfanov

2 Answers


Well, I have a solution which will work but might be hacky. First of all, this is a feature Kubernetes is still lacking; it is under discussion in this issue.

With Kubernetes v1.17, containers in the same Pod can share a process namespace. This lets you kill the proxy process from the app container. Since this is a Kubernetes Job, there shouldn't be any problem with using a preStop handler on the app container to do the killing.

With this solution, when your app finishes and exits normally (or abnormally), Kubernetes runs one last command from your dying container, which in this case kills the other process. The job then completes with success or failure depending on how you kill the process: the process exit code becomes the container exit code, which in turn becomes the job's exit code.
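
For illustration, here is a minimal sketch of the relevant parts of the job's pod spec under this approach (assuming the app image provides pkill and the app container runs as a user that is allowed to signal the proxy process, e.g. root; only the changed parts are shown):

spec:
  template:
    spec:
      # Share one PID namespace across all containers in the pod so the
      # app container can see and signal the proxy process.
      shareProcessNamespace: true
      restartPolicy: Never
      containers:
      - name: db-migrate
        # ...image, env and workingDir as in the question...
        command: ["/bin/sh"]
        # Run the migration, then stop the proxy so the pod (and the job)
        # can complete; keep the migration's exit code as the container's
        # exit code.
        args:
          - "-c"
          - "python manage.py migrate --no-input; status=$?; pkill cloud_sql_proxy; exit $status"
      - name: cloud-sql-proxy
        # ...unchanged from the question...

pkill sends SIGTERM by default, which lets the proxy shut down instead of being killed outright.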

Akin Ozer
  • Great info. "Pods can share process namespaces" was especially useful. I used process sharing to terminate the cloud-sql-proxy executable by simply running "killall" after the migration in my command, and that did the trick. However, when I tried the "preStop" handler I think I hit this bug: https://github.com/kubernetes/kubernetes/issues/55807. Basically my preStop never executed. Maybe there is something I am missing. Thanks. – Vess Perfanov Jun 22 '20 at 19:18
  • Yeah, I wasn't aware of how it would behave differently in a Job. You can convert your Dockerfile command into a script.sh whose first line runs your job and whose next line kills the remaining processes; that way the kill is guaranteed to run (see the sketch below). – Akin Ozer Jun 23 '20 at 07:10
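
A rough sketch of that wrapper script (hypothetical file name migrate.sh, assuming the shared process namespace from the answer and that pkill is available in the app image):

#!/bin/sh
# Run the migration first and remember how it went.
python manage.py migrate --no-input
status=$?
# Always stop the proxy afterwards so the pod can finish, then report the
# migration's result as the script's (and container's) exit code.
pkill cloud_sql_proxy
exit $status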

With cloud-sql-proxy v2, the quitquitquit endpoint should be used.

Run your cloud-sql-proxy with the --quitquitquit flag.
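
For example, the sidecar from the question could look roughly like this with the v2 proxy (the image tag is illustrative; v2 takes the instance connection name as a positional argument and renames -credential_file to --credentials-file):

- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
  args:
    # Expose POST /quitquitquit on the proxy's admin server
    # (localhost:9091 by default).
    - "--quitquitquit"
    - "--credentials-file=/secrets/service_account.json"
    - "--port=<DB_PORT>"
    - "<INSTANCE_CONNECTION_NAME>"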

Then, run your migrate command like:

command: [ "/bin/bash" ]
args: [ "-c", "python manage.py migrate --noinput && curl -X POST localhost:9091/quitquitquit" ]


Guillaume Cisco