Spring Batch Restartability on Kubernetes for File Operations

Question

What is the proper way to access already-processed files when restarting a Spring Batch application on Kubernetes? In particular, when the job writes to a file, that file is deleted together with the pod after the job fails.

We are considering using a persistent volume, or backing up the created file somewhere (such as a database or an SFTP server) by implementing a listener.

Does anyone have experience using persistent volumes (NFS or other solutions) for file operations? We are concerned about performance and unexpected problems. Do you have any suggestions?

Thank you.

score 2 · Answer 1 · answered Jun 06 '21 at 19:09

If you want data persistence, you can start with hostPath volumes. Because the data lives on one node's local filesystem, this restricts which nodes your pods can be scheduled on, but it is the simplest option and gives the best performance.

https://kubernetes.io/docs/concepts/storage/volumes/#hostpath

If you want dynamic allocation, you will need to configure a storage solution such as GlusterFS, NFS, or Ceph.

score 2 · Accepted Answer · answered Jun 07 '21 at 09:00

You should not rely on the ephemeral file system of a Pod for files that should persist and survive a Job (Pod) failure.

You need to use a persistent volume for that, so that Spring Batch can find the (incomplete) output file in a restart scenario and resume writing where it left off.
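
To illustrate why the output file has to survive the pod, here is a minimal sketch in plain `java.nio` of the "resume where it left off" idea. It is not Spring Batch code: the class `RestartableFileWriter`, the method `writeChunk`, and the temp directory standing in for a persistent volume mount are all hypothetical. As far as I understand, Spring Batch's file writers do something similar by saving the write position in the execution context, but treat that as an assumption and check the documentation for your version.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical illustration only; not part of Spring Batch.
class RestartableFileWriter {

    /**
     * Writes one chunk of lines starting at the last committed byte offset
     * and returns the new offset, which the caller would persist as a checkpoint.
     */
    public static long writeChunk(Path file, long committedOffset, List<String> lines)
            throws IOException {
        long size = Files.exists(file) ? Files.size(file) : 0L;
        if (size < committedOffset) {
            // This is what happens when the file lived on the pod's ephemeral
            // file system: the checkpoint survived, but the file did not.
            throw new IllegalStateException(
                    "Output file is missing or shorter than the checkpoint: " + file);
        }
        try (FileChannel ch = FileChannel.open(
                file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Discard anything written after the last checkpoint,
            // for example a half-written chunk from the failed run.
            ch.truncate(committedOffset);
            ch.position(committedOffset);
            for (String line : lines) {
                ByteBuffer buf = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
            ch.force(true); // flush to storage before reporting the new offset
            return ch.position();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in a real deployment this directory would be the PV mount path.
        Path dir = Files.createTempDirectory("pv-demo");
        Path out = dir.resolve("output.csv");

        // First run: one chunk is committed, then the job "fails" mid-chunk.
        long checkpoint = writeChunk(out, 0L, List.of("1,a", "2,b"));
        Files.writeString(out, "3,c\n4,", StandardOpenOption.APPEND);

        // Restart: the partial chunk is truncated away and rewritten in full.
        checkpoint = writeChunk(out, checkpoint, List.of("3,c", "4,d"));
        System.out.println(Files.readAllLines(out));
    }
}
```

If the file were on the pod's own file system, the restart would hit the `IllegalStateException` branch instead, because the checkpoint would point past the end of a file that no longer exists.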

Mahmoud Ben Hassine

Thanks for the reply. Yes, as you said, we want to use a PV for restart scenarios. Have you used one for very large file operations? We are wondering about its performance and any side effects. – Yasar Jun 13 '21 at 11:20
Yes, I have some experience, which I shared here: https://stackoverflow.com/a/60930025/5019386. I never noticed any performance issues when doing file operations on a PV. – Mahmoud Ben Hassine Jun 13 '21 at 20:32