I'm building a Dataflow pipeline to process Stackdriver logs: the data is read from Pub/Sub and the results are written to BigQuery. When I read from Pub/Sub I get JSON strings of LogEntry objects, but what I'm really interested in is the protoPayload.line records, which contain the user log messages. To get those I need to parse the LogEntry JSON object, and I found a two-year-old Google example of how to do it:

import com.google.api.client.json.JsonParser;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.logging.model.LogEntry;

try {
    JsonParser parser = new JacksonFactory().createJsonParser(entry);
    LogEntry logEntry = parser.parse(LogEntry.class);
    logString = logEntry.getTextPayload();
}
catch (IOException e) {
    LOG.error("IOException parsing entry: " + e.getMessage());
}
catch(NullPointerException e) {
    LOG.error("NullPointerException parsing entry: " + e.getMessage());
}

Unfortunately this doesn't work for me: logEntry.getTextPayload() returns null. I'm not even sure it's supposed to work, since the com.google.api.services.logging library isn't mentioned anywhere in the Google Cloud docs; the current logging library seems to be google-cloud-logging.

Could anyone suggest the right, or simplest, way to parse LogEntry objects?

dmitryb

1 Answer

I ended up parsing the LogEntry JSON manually with the gson library, using its tree-traversal approach. Here is a small snippet:

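import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.beam.sdk.transforms.DoFn;

// Emits one output element per protoPayload.line[].logMessage in the input LogEntry JSON.
static class ProcessLogMessages extends DoFn<String, String> {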
    @ProcessElement
    public void processElement(ProcessContext c) {
        String entry = c.element();

        JsonParser parser = new JsonParser();
        JsonElement element = parser.parse(entry);
        if (element.isJsonNull()) {
            return;
        }
        JsonObject root = element.getAsJsonObject();
        JsonArray lines = root.get("protoPayload").getAsJsonObject().get("line").getAsJsonArray();
        for (int i = 0; i < lines.size(); i++) {
            JsonObject line = lines.get(i).getAsJsonObject();
            String logMessage = line.get("logMessage").getAsString();

            // Do what you need with the logMessage here
            c.output(logMessage);
        }
    }
}

This is simple enough and works fine for me, since I'm only interested in the protoPayload.line.logMessage values. But I suspect it's not an ideal way to parse LogEntry objects if you need to work with many attributes.
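One caveat with the traversal above: it assumes every entry has protoPayload.line, so an entry carrying only a textPayload or jsonPayload would hit a NullPointerException at root.get("protoPayload"). Below is a minimal defensive sketch of the same walk. To keep it free of library dependencies it works on plain java.util Map/List values, which is an assumption on my part about how you parse (for instance, Gson's fromJson(entry, Map.class) produces that shape). The LogLines class and extractLogMessages method are hypothetical names, not part of any Google library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a null-safe version of the protoPayload.line walk.
final class LogLines {
    private LogLines() {}

    /**
     * Returns every protoPayload.line[].logMessage string found in the entry.
     * Returns an empty list if any level of the path is missing or has an
     * unexpected type, instead of throwing.
     */
    static List<String> extractLogMessages(Map<String, Object> entry) {
        List<String> messages = new ArrayList<>();
        if (entry == null) {
            return messages;
        }
        Object payload = entry.get("protoPayload");
        if (!(payload instanceof Map)) {
            return messages; // e.g. a textPayload or jsonPayload entry
        }
        Object lines = ((Map<?, ?>) payload).get("line");
        if (!(lines instanceof List)) {
            return messages;
        }
        for (Object line : (List<?>) lines) {
            if (line instanceof Map) {
                Object message = ((Map<?, ?>) line).get("logMessage");
                if (message instanceof String) {
                    messages.add((String) message);
                }
            }
        }
        return messages;
    }
}
```

Inside processElement you would then loop over the returned list and call c.output(...) for each message; entries without protoPayload.line simply produce no output.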

dmitryb