I have an Indexed k8s Job with thousands of indexes. When the Job fails with BackoffLimitExceeded, it is hard to identify the exact index or set of indexes that caused the failure. Is there an easier way to identify the failing indexes from the Job APIs?
Currently, I have to go through the logs of more than 1,000 pods to find the failing pod(s).
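
For context, the only partial workaround I have found so far is to look at the Job's `.status.completedIndexes` field, which lists the succeeded indexes in a compressed form such as `0-2,5`, and subtract it from the full range. Below is a minimal sketch of that idea. The helper names `parse_index_ranges` and `incomplete_indexes` are my own, not part of any Kubernetes client, and the result only tells me which indexes did not succeed: it cannot distinguish indexes that failed from ones that never ran.

```python
def parse_index_ranges(spec: str) -> set[int]:
    """Expand a compressed index string such as "0-2,5" into a set of ints."""
    indexes: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            # An empty string means no index has completed yet.
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            indexes.update(range(int(start), int(end) + 1))
        else:
            indexes.add(int(part))
    return indexes


def incomplete_indexes(completions: int, completed_indexes: str) -> list[int]:
    """Return the indexes in [0, completions) that are not listed as completed.

    `completions` is assumed to be the Job's .spec.completions and
    `completed_indexes` the raw value of .status.completedIndexes.
    """
    done = parse_index_ranges(completed_indexes)
    return [i for i in range(completions) if i not in done]


if __name__ == "__main__":
    # Hypothetical values; in practice they would come from the Job object.
    print(incomplete_indexes(6, "0-2,5"))
```

The input string can be fetched with `kubectl get job <job-name> -o jsonpath='{.status.completedIndexes}'`. Each pod also carries the `batch.kubernetes.io/job-completion-index` annotation, so failed pods can be mapped back to an index, but that still means listing pods rather than asking the Job itself. I understand newer clusters may also report `.status.failedIndexes` when `backoffLimitPerIndex` is set, though I have not verified that on my version.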