The problem: some build steps create entities in Kubernetes (GKE) that must eventually be removed, regardless of whether any other step in the build succeeds.
Here is an example cloudbuild.yaml:
steps:
# TEST NAMESPACE
#
# some previous steps...
# package jar
# build container & etc.
# Kubernetes RUN DB
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: deploy-db
  waitFor: ['-']
  args: ['apply',
         '--filename', './k8b/db/',
         '--location', 'somewhere',
         '--cluster', 'my-trololo-cluster']
# Run something else in Kubernetes
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: deploy-other-things
  waitFor:
    - 'deploy-db'
  args: ['apply',
         '--filename', './k8b/other-things/',
         '--location', 'somewhere',
         '--cluster', 'my-trololo-cluster']
# Test DB in pod
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: test-db
  waitFor:
    - 'deploy-other-things'
  entrypoint: 'bash'
  args: ['./scripts/test_db.sh']
# Run REST-API
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: deploy-rest
  waitFor:
    - 'deploy-other-things'
  args: ['run',
         '--filename', './k8b/rest/',
         '--location', 'somewhere',
         '--cluster', 'my-trololo-cluster' ]
# Test REST-API
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: test-REST-API
  waitFor:
    - 'deploy-rest'
    - 'test-db'
  entrypoint: 'bash'
  args: ['./scripts/test_rest_api.sh']
# Cleanup steps
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: cleanup
  waitFor:
    - 'test-REST-API'
  entrypoint: 'kubectl'
  args: [ 'delete', '--filename', './k8b', '--recursive' ]
# Delete PERSISTENT VOLUME
- name: 'gcr.io/cloud-builders/gke-deploy'
  id: delete-persistent-volume
  waitFor:
    - 'test-REST-API'
  entrypoint: 'bash'
  args:
    - '-c'
    - |
      pvc_name=$(kubectl get pvc --selector=$sel -o jsonpath={.items..metadata.name})
      kubectl delete pvc ${pvc_name}
So if any step before the "cleanup steps" fails, the entities in GKE are never deleted, and the next Cloud Build runs do not start on a clean cluster.
I can't find a solution for this case in the docs.
At the moment, the only way I can see to solve this is to wrap every step in a bash script, so that if a step fails I:
- catch it in a bash script;
- give a command to clear the cluster inside the bash script;
- then exit the script with a non-zero code.
The same would have to be repeated in every step.
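A minimal sketch of what I mean, assuming a hypothetical helper named `run_with_cleanup` that each step's script would have to source (the helper name and the `cleanup` function are my own illustration, not anything provided by Cloud Build; the `kubectl delete` command is the same one used in the cleanup step above):

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Cloud Build or gke-deploy.

# Assumed cleanup action: the same command the "cleanup" step runs.
cleanup() {
  kubectl delete --filename ./k8b --recursive
}

# Run the given command; if it fails, clean up the cluster and
# then return the command's original non-zero exit code.
run_with_cleanup() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "step failed with exit code ${status}, cleaning up" >&2
    # A failing cleanup must not hide the original failure.
    cleanup || echo "cleanup itself failed" >&2
  fi
  return "$status"
}
```

A step script would then call, for example, `run_with_cleanup ./scripts/test_db.sh` and exit with the returned code, so the build is still marked as failed after the cluster has been cleared.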
But this is not a very good solution in my opinion. Is there a better one?