
I'm using a Dataflow job template to stream data from a Pub/Sub subscription to BigQuery. For each JSON message I need to transform the values and output multiple table rows at once to a BQ table. A simplified version of the JSON message arriving in Pub/Sub is as follows:

{"a":{"k1":v1, "k2":v2}, "b":{"k1":v1, "k2":v2}...}

The transformed JSON should instead look like:

[{"k1":v1, "k2":v2}, {"k1":v1, "k2":v2}...]

This is a simplification of the UDF I've created:

function transformToTableRows(inJson) {
  var input = JSON.parse(inJson);
  var output = [];
  for (var elem in input) {
    output.push({"k1": input[elem].k1, "k2": input[elem].k2})
  }
  return JSON.stringify(output);
}

Unfortunately this doesn't work: the job logs the error "Failed to serialize json to table row". Any suggestions on how I can fix this?

Jos
  • Are you using the Google-provided template from the Dataflow UI? Where does the error come from? Do you have a detailed stack trace? – ningk Jul 26 '21 at 20:27
  • @大ドア東, yes, I'm using the template from the Dataflow UI. The stack trace is as follows: java.lang.RuntimeException: Failed to serialize json to table row: [{"k1":v1, "k2":v2}, {"k1":v1, "k2":v2}...] Could it be that the template is meant to output only a single table row per message instead of multiple? – Jos Jul 28 '21 at 13:07
  • You can check the template source code [here](https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#cloudpubsubsubscriptiontobigquery). It seems that `the template is meant to output only a single table row per message`. – ningk Jul 28 '21 at 18:42

1 Answer


As per the documentation, the template is meant to output only a single table row per message, so a UDF that returns a JSON array cannot be serialized to a table row. Thanks – 大ドア東
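One possible workaround that stays within the single-row constraint is to have the UDF return one JSON object whose field holds all the entries, instead of a top-level array. This is only a sketch under assumptions that are not in the original post: the function name `transformToSingleRow` and the field name `items` are hypothetical, and the destination table would need a matching REPEATED RECORD column with `k1` and `k2` subfields. If you need genuinely separate rows, a custom pipeline is probably the way to go.

```javascript
/**
 * Sketch: collect every entry of the incoming message into ONE JSON object,
 * so the template sees a single table row per message.
 *
 * Assumptions (hypothetical, not from the original post):
 *  - the BigQuery table has a REPEATED RECORD column named "items"
 *    with subfields k1 and k2;
 *  - the function name is illustrative and must match the UDF name
 *    configured for the job.
 */
function transformToSingleRow(inJson) {
  var input = JSON.parse(inJson);
  var items = [];
  for (var elem in input) {
    // Skip inherited properties; only copy the message's own keys.
    if (Object.prototype.hasOwnProperty.call(input, elem)) {
      items.push({ k1: input[elem].k1, k2: input[elem].k2 });
    }
  }
  // Return an object, not an array: the top level must be a single row.
  return JSON.stringify({ items: items });
}
```

The individual entries can then be flattened at query time with `UNNEST(items)`, assuming the column layout described above.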

Jos