
We deployed Dataflow jobs in Google Cloud. The pipelines are developed using Apache Beam.

Dataflow logging doesn't include the transaction id, which is needed to trace a transaction through the pipeline.

Any logging pattern configured in Logback is ignored by Google Cloud.

How do we capture the trace id in Google Cloud Logging?

logback.xml:

<configuration>
<property name="projectId" value="${projectId:-${GOOGLE_CLOUD_PROJECT}}"/>
<appender name="CONSOLE_JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
        <layout class="org.springframework.cloud.gcp.logging.StackdriverJsonLayout">
            <projectId>${projectId}</projectId>
            <includeTraceId>true</includeTraceId>
            <includeSpanId>true</includeSpanId>
            <includeLevel>true</includeLevel>
            <includeThreadName>true</includeThreadName>
            <includeMDC>true</includeMDC>
            <includeLoggerName>true</includeLoggerName>
            <includeFormattedMessage>true</includeFormattedMessage>
            <includeExceptionInMessage>true</includeExceptionInMessage>
            <includeContextName>true</includeContextName>
            <includeMessage>false</includeMessage>
            <includeException>false</includeException>
        </layout>
    </encoder>
</appender>

<root level="INFO">
    <appender-ref ref="CONSOLE_JSON"/>
</root>
</configuration>

Java:
     MDC.put("traceId", "12345");
     log.info("Logging from test class");


Google Cloud:

jsonPayload: {
  job: "2022-09-08_19_05_07-12432432432"
  logger: "TestLogger"
  message: "Logging from test class"
  stage: "A1"
  step: "Test Step"
  thread: "49"
  work: "3243243"
  worker: "test-worker"
}

1 Answer


Dataflow relies on java.util.logging (aka JUL) as the logging backend for SLF4J and adds various bridges to ensure that logs from other libraries are output as well. With this kind of setup, we are limited to adding additional details to the log message itself.

This also applies to any runner executing a portable job, since the container with the SDK harness has a similar logging configuration, for example Dataflow Runner V2.

To do this, we want to create a custom formatter and apply it to the root JUL logger. For example:

import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

public class CustomFormatter extends SimpleFormatter {
  @Override
  public String formatMessage(LogRecord record) {
    // Implement whatever logic is needed to add details to the message portion of the log statement.
    return super.formatMessage(record);
  }
}
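
A minimal concrete sketch of such a formatter, prefixing the trace id onto every message. The ThreadLocal here is a stand-in for wherever the trace id is stored; in a real pipeline you would likely read it from org.slf4j.MDC (e.g. MDC.get("traceId")), since the SLF4J-to-JUL bridge keeps MDC values thread-local as well. The class and key names are illustrative:

```java
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

public class TraceIdFormatter extends SimpleFormatter {
  // Stand-in for MDC: holds the current trace id per thread.
  static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

  @Override
  public String formatMessage(LogRecord record) {
    String message = super.formatMessage(record);
    String traceId = TRACE_ID.get();
    // Prepend the trace id to the message portion of the log statement, if one is set.
    return traceId == null ? message : "[traceId=" + traceId + "] " + message;
  }
}
```

The resulting message (e.g. "[traceId=12345] Logging from test class") then shows up in the jsonPayload.message field in Cloud Logging, so it can be searched with a plain text filter.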

Then, during start-up of the worker, we need to update the root logger to use this formatter. We can achieve this with a JvmInitializer, implementing the beforeProcessing method like so:

import com.google.auto.service.AutoService;
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.Logger;
import org.apache.beam.sdk.harness.JvmInitializer;
import org.apache.beam.sdk.options.PipelineOptions;

@AutoService(JvmInitializer.class)
public class LoggerInitializer implements JvmInitializer {
  @Override
  public void beforeProcessing(PipelineOptions options) {
    LogManager logManager = LogManager.getLogManager();
    Logger rootLogger = logManager.getLogger("");
    // Swap the formatter on every handler attached to the root JUL logger.
    for (Handler handler : rootLogger.getHandlers()) {
      handler.setFormatter(new CustomFormatter());
    }
  }
}
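
The @AutoService annotation assumes Google's AutoService annotation processor is on the compile classpath so that the initializer is registered for ServiceLoader discovery. A possible Maven dependency (the version shown is an assumption, check for the current release):

```xml
<dependency>
  <groupId>com.google.auto.service</groupId>
  <artifactId>auto-service</artifactId>
  <version>1.1.1</version>
  <scope>provided</scope>
</dependency>
```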
Lukasz Cwik