
I'm looking to capture logs from a pod in kubernetes for two use cases:

  1. Realtime -> for which I'm using kubectl logs right now
  2. Not realtime -> using stackdriver to pipe to bigquery

For both use cases, everything is working, however, when a container exits early due to an error, I lose the logs (i.e. stackdriver doesn't pick them up fast enough).

Is this latency documented somewhere? And assuming stackdriver isn't fast enough, is there another logging solution that would prove more effective? I'm considering having a sidecar container that captures logs, but I'm not sure if this is the best approach.

Jay K.
  • I assume your Pods have `restartPolicy: Always`. You could try setting it to `Never` so that you have time to get the logs. The ReplicaSet controller will allocate a new Pod, so this shouldn't be a problem for you. Give it a try and let me know how you go – Serge Jul 27 '20 at 21:16
  • @Serge can you elaborate a little more here? What do you mean by "so you have time to get the logs"? – Jay K. Jul 29 '20 at 17:46
  • With `restartPolicy: Always`, your containers will always be restarted when they fail (with an exponential backoff). If you disable this, your container will not be restarted in the pods. This would leave a period of time which allows your theory to be tested - that logs are being lost due to a timing/speed issue. – Serge Jul 29 '20 at 18:48
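Serge's suggestion can be sketched as a minimal Pod spec (the pod name and image below are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app              # placeholder name
spec:
  restartPolicy: Never      # default is Always; Never keeps the failed container around
  containers:
    - name: app
      image: my-app:latest  # placeholder image
```

Note that a Deployment's pod template only accepts `restartPolicy: Always`, so this experiment is easiest with a bare Pod or a Job.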

1 Answer


The logging stack on GKE uses fluentd to pick up the logs from stdout and stderr, which the container runtime writes to log files on the nodes, as shown in the node logging agent approach.

This isn't much different from what you do when you use kubectl logs:

When you run kubectl logs as in the basic logging example, the kubelet on the node handles the request and reads directly from the log file, returning the contents in the response.
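Because the kubelet keeps those log files on the node, it can also return the log file of the previous, crashed instance of a container, which is directly useful for your "early exit" case (the pod name below is a placeholder):

```shell
# Follow logs of the running container in realtime
kubectl logs -f my-pod

# After a crash and restart, read the previous container's log file,
# which the kubelet still retains on the node
kubectl logs my-pod --previous
```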

Your issue doesn't sound like Stackdriver not being fast enough; rather, it sounds like your container runtime is, for some reason, not writing the logs to the aforementioned log file, where fluentd picks them up before exporting them.

Before changing the logging architecture, you might want to determine the reasons for pod failure and even customize the termination message path in order to later retrieve it with a custom fluentd log collector.
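For reference, a container can point its termination message at a custom path, or fall back to the last chunk of its log output on error (a sketch following the pattern in the Kubernetes docs; the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest                          # placeholder image
      terminationMessagePath: /tmp/termination-log  # custom path the app writes to
      # If the file above is empty, use the tail of the container log instead:
      terminationMessagePolicy: FallbackToLogsOnError
```

The message then shows up in the Pod's status, e.g. `kubectl get pod my-app -o go-template="{{range .status.containerStatuses}}{{.lastState.terminated.message}}{{end}}"`.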

If this doesn't suit your needs, you can try Elasticsearch instead.

As for the sidecar approach, while it's completely feasible, the official documentation warns of some drawbacks to this approach:

Using a logging agent in a sidecar container can lead to significant resource consumption. Moreover, you won't be able to access those logs using kubectl logs command, because they are not controlled by the kubelet.
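For completeness, the streaming-sidecar variant from the docs avoids that second drawback by having the sidecar tail the app's log file onto its own stdout, so the kubelet (and Stackdriver) still sees it. The file path and images below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
    - name: app
      image: my-app:latest   # placeholder; assumed to write /var/log/app.log
      volumeMounts:
        - name: varlog
          mountPath: /var/log
    - name: log-streamer
      image: busybox
      # Re-emit the file on stdout so the node logging agent picks it up
      args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app.log']
      volumeMounts:
        - name: varlog
          mountPath: /var/log
  volumes:
    - name: varlog
      emptyDir: {}
```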

Finally, you should also consider that all the previous information relies on the container reaching the created phase and being able to write to the log file. If your containers are having "early exits", meaning they aren't even created, then the logs might not be there in the first place, and Stackdriver will never pick them up.
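If you suspect this, the failure reason usually surfaces in the Pod's events rather than in its logs (the pod name is a placeholder):

```shell
# Shows container state, exit code, and recent events (e.g. ImagePullBackOff)
kubectl describe pod my-pod

# Cluster-wide events, sorted oldest to newest
kubectl get events --sort-by=.metadata.creationTimestamp
```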

Edit:

You should also consider that a failed container actually needs to write to its outputs, stdout and stderr. If it's failing "silently", the failure also won't be reflected in Stackdriver.
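As a trivial illustration, a command that exits non-zero without printing anything leaves nothing for fluentd to ship, while one that writes its error to stderr does:

```shell
# Fails silently: exit code 1, but no log line is produced
sh -c 'exit 1' || true

# Fails loudly: the error line lands on stderr, which the runtime captures
sh -c 'echo "fatal: config missing" >&2; exit 1' || true
```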

yyyyahir