I am using client-go to continuously pull log streams from Kubernetes pods. Most of the time everything works as expected, until the job has been running for a couple of hours.
The code looks like this:
podLogOpts := corev1.PodLogOptions{Follow: true}
kubeJob, err := l.k8sclient.GetKubeJob(l.job.GetNamespace(), l.job.GetJobId())
...
podName := l.k8sclient.GetKubeJobPodNameByJobId(l.job.GetNamespace(), l.job.GetJobId())
req := l.k8sclient.GetKubeClient().CoreV1().Pods(l.job.GetNamespace()).GetLogs(podName, &podLogOpts)
podLogStream, err := req.Stream(context.TODO())
...
for {
    copied, err := podLogStream.Read(buf)
    if err == io.EOF {
        // This is where the problem shows up: usually after many hours,
        // podLogStream.Read returns EOF even though the pod is still
        // running and keeps printing data to stdout. Why does this happen?
        break
    }
    ...
}
The podLogStream returns EOF about 3-4 hours later, but when I check the pod it is still running and the service inside keeps printing data to stdout. So why does this happen, and how can I fix it?
UPDATE: I found that roughly every 4 hours the pod log stream's Read returns EOF, so I have the goroutine sleep and retry a second later by recreating the podLogStream and reading from the new stream object. That works, but why does this happen?
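For reference, this is roughly the retry loop I ended up with. It is only a sketch: followPodLogs and the handle callback are placeholder names of mine, and I added SinceTime so the re-opened stream does not replay the whole log from the start (SinceTime is only second-granular, so a small overlap or gap around the reconnect is still possible).

package main

import (
    "context"
    "io"
    "time"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// followPodLogs keeps re-opening the log stream whenever the apiserver
// closes it (EOF), so tailing survives the periodic connection resets.
// handle is a placeholder callback that consumes each chunk of log data.
func followPodLogs(ctx context.Context, client kubernetes.Interface, namespace, podName string, handle func([]byte)) error {
    // Remember when we last read so the re-opened stream does not
    // start from the beginning of the log again.
    var since *metav1.Time

    buf := make([]byte, 4096)
    for {
        opts := &corev1.PodLogOptions{Follow: true, SinceTime: since}
        stream, err := client.CoreV1().Pods(namespace).GetLogs(podName, opts).Stream(ctx)
        if err != nil {
            return err
        }

        for {
            n, err := stream.Read(buf)
            if n > 0 {
                now := metav1.Now()
                since = &now
                handle(buf[:n])
            }
            if err == io.EOF {
                break // server closed the stream; reconnect below
            }
            if err != nil {
                stream.Close()
                return err
            }
        }
        stream.Close()

        // Back off briefly before re-opening, as described in the update above.
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(time.Second):
        }
    }
}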