Suppose you embed the Stackdriver client library in your application and the Google Stackdriver API has downtime (Google's documentation indicates a 99.95% SLA, which allows about 21.92 minutes of downtime per month).

My question is: What will happen to my application during the downtime? Will logging info build up in memory? Will it cause application errors or will it discard the log data and continue on?

Drew

1 Answer

Logging API downtime can have different root causes and consequences. Google's system engineers have mechanisms in place to track outages and take mitigating action so that the downtime and its consequences are minimal, but Google cannot guarantee that no logging data will be lost in every Logging API outage.

Hopefully your application and pipeline can withstand the expected downtime of up to roughly 22 minutes a month (99.95% SLA), as per the internal SLOs and SLAs of GCP.
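The exact monthly figure implied by a 99.95% SLA depends on the month length you assume. As a rough illustration (the function name and the chosen month lengths are only for this sketch and are not taken from Google's SLA text), the downtime budget can be worked out like this:

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, days_in_month: float) -> float:
    """Minutes per month a service may be down while still meeting the SLA."""
    unavailable_fraction = 1 - availability_pct / 100
    return days_in_month * MINUTES_PER_DAY * unavailable_fraction


# 30.44 is the average month length in days (365.25 / 12).
for days in (30, 30.44, 31):
    print(f"{days} days: {downtime_budget_minutes(99.95, days):.2f} min")
```

With an average month of 30.44 days this gives about 21.92 minutes, which matches the figure quoted in the question; a 30-day month gives 21.60 minutes.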

The three scenarios you listed are all plausible. During an outage, the application sending the logs may receive HTTP 500 responses from the API, so it has to be able to deal with this kind of failure.
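To make the trade-off between "builds up in memory", "raises errors" and "discards data" concrete, here is a minimal sketch of one way an application could handle 5xx responses itself. It is not how the Stackdriver client libraries behave internally (that is not documented in this thread). The `BoundedLogBuffer` class, the `send` callable, and `fake_send` are hypothetical names invented for this example.

```python
from collections import deque
from typing import Callable, Deque, List


class BoundedLogBuffer:
    """Hold log entries in memory while the backend is down, up to a fixed cap."""

    def __init__(self, send: Callable[[List[str]], int], max_entries: int = 1000):
        self._send = send  # hypothetical transport; returns an HTTP status code
        # A deque with maxlen silently evicts the oldest entry when full.
        self._pending: Deque[str] = deque(maxlen=max_entries)
        self.dropped = 0

    def log(self, entry: str) -> None:
        if len(self._pending) == self._pending.maxlen:
            self.dropped += 1  # the append below will evict the oldest entry
        self._pending.append(entry)
        self.flush()

    def flush(self) -> bool:
        """Try to send everything pending; never raise into the caller."""
        if not self._pending:
            return True
        try:
            status = self._send(list(self._pending))
        except Exception:
            return False  # network error: keep entries for the next attempt
        if status >= 500:
            return False  # server-side failure: keep entries and retry later
        self._pending.clear()  # any other status is treated as final in this sketch
        return True


# Simulate an outage: the fake transport answers 500 until the outage ends.
outage = {"active": True}
delivered: List[str] = []


def fake_send(batch: List[str]) -> int:
    if outage["active"]:
        return 500
    delivered.extend(batch)
    return 200


buf = BoundedLogBuffer(fake_send, max_entries=3)
for i in range(5):
    buf.log(f"entry {i}")

outage["active"] = False
buf.flush()
print(delivered, buf.dropped)  # ['entry 2', 'entry 3', 'entry 4'] 2
```

The cap bounds memory use, the oldest entries are discarded once it is reached, and the caller never sees an exception. A real setup might prefer spilling to disk or backing off between retries, depending on how much log loss is acceptable.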

If the logging data manages to reach Google's platform but an outage prevents the data from being accessible, then Google's team will try their best to release backlogs, repopulate data, and so on. They will post a general notice on https://status.cloud.google.com/

If the issue is caused by the logging agent not sending data to our platform, then the logging data may not be retrievable. The cause could still be an infrastructure outage in one of the GCP products, but it could also be something other than an outage, such as your application or its underlying host running out of resources, or the logging agent being corrupted, which is not covered by the GCP Stackdriver SLA [1].

If the pipeline that ingests data from the Logging API is backlogged, it could cause an outage, but the GCP team will try their best to make the data accessible after the outage ends.

If you suspect that the Logging API is malfunctioning, please contact support, file an issue in the issue tracker [2], or inspect the open issues [3], where Google's product team provides live updates. Links below:

[1] https://cloud.google.com/stackdriver/sla#sla_exclusions

[2] create new incident: https://issuetracker.google.com/issues/new?component=187203&template=0

[3] open issues: https://issuetracker.google.com/savedsearches/559764

Ashik Mahbub
  • Thanks! I'll ask this again on the GitHub client page, because I need specific answers about what happens during downtime in https://github.com/googleapis/nodejs-logging-bunyan (a Google-maintained Stackdriver logging client). – Drew Aug 13 '19 at 22:37
  • Ashik - If the Stackdriver logging agent (say running on GCE) receives an error posting logs, will the agent retry? If yes, how large are the buffers/cache that Stackdriver logging uses? – John Hanley Aug 15 '19 at 02:36
  • For the google-fluentd retry mechanisms, you can inspect the GitHub project linked here: https://cloud.google.com/logging/docs/agent/#source. The google-fluentd agent is based on the open source Fluentd project. There are some design docs on the buffer design that could help: https://docs.fluentd.org/buffer#how-buffer-works. I believe this can also help: https://github.com/GoogleCloudPlatform/google-fluentd/blob/master/templates/etc/td-agent/td-agent.conf#L11 – Ashik Mahbub Aug 15 '19 at 14:07
  • Ashik - Thank you. – John Hanley Aug 15 '19 at 19:03