Short of writing my own function to do it, what is the easiest way to convert a TableRow object, inside a dataflow 2.x pipeline, to a JSON-formatted String?

I thought the code below would work, but it isn't correctly inserting quotes in between key/values, especially where there are nested fields.

public static class TableRowToString extends DoFn<TableRow, String> {
  private static final long serialVersionUID = 1L;

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element().toString());
  }
}
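The missing quotes are what a Map-style toString() would produce. Below is a minimal JDK-only sketch of that behavior. It uses a plain LinkedHashMap as a stand-in for TableRow, which is an assumption made so the example runs without Beam on the classpath, and the class name MapToStringDemo is hypothetical. The real TableRow output may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: a LinkedHashMap stands in for a TableRow here.
class MapToStringDemo {

  // Builds a small row with one nested record, similar in shape to a TableRow.
  static Map<String, Object> buildRow() {
    Map<String, Object> address = new LinkedHashMap<>();
    address.put("city", "Paris");

    Map<String, Object> row = new LinkedHashMap<>();
    row.put("name", "Ann");
    row.put("address", address);
    return row;
  }

  public static void main(String[] args) {
    // Map.toString() uses key=value pairs with no quoting, so this is not JSON.
    System.out.println(buildRow().toString());
    // prints {name=Ann, address={city=Paris}}
  }
}
```

Neither the keys nor the string values are quoted, and the nested record is rendered the same way, so a JSON parser would reject this output.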

2 Answers

Use Gson and call gson.toJson(yourTableRow).
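For intuition about what a JSON serializer adds over toString(), here is a rough JDK-only sketch. It is not Gson's implementation and is not meant to replace a library. The class TinyJson and its toJson method are hypothetical names, and the sketch assumes the row contains only maps, iterables, strings, numbers, booleans, and nulls.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a tiny recursive JSON writer for Map-like rows.
class TinyJson {

  static String toJson(Object value) {
    StringBuilder sb = new StringBuilder();
    write(value, sb);
    return sb.toString();
  }

  private static void write(Object value, StringBuilder sb) {
    if (value == null) {
      sb.append("null");
    } else if (value instanceof Number || value instanceof Boolean) {
      // Note: NaN and Infinity are not valid JSON; this sketch does not guard against them.
      sb.append(value);
    } else if (value instanceof Map) {
      sb.append('{');
      boolean first = true;
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        if (!first) sb.append(',');
        first = false;
        quote(String.valueOf(e.getKey()), sb); // keys are always quoted
        sb.append(':');
        write(e.getValue(), sb);               // nested records recurse
      }
      sb.append('}');
    } else if (value instanceof Iterable) {
      sb.append('[');
      boolean first = true;
      for (Object item : (Iterable<?>) value) {
        if (!first) sb.append(',');
        first = false;
        write(item, sb);
      }
      sb.append(']');
    } else {
      quote(value.toString(), sb); // everything else is emitted as a string
    }
  }

  // Wraps a string in double quotes and escapes characters JSON requires.
  private static void quote(String s, StringBuilder sb) {
    sb.append('"');
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      switch (ch) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n");  break;
        case '\r': sb.append("\\r");  break;
        case '\t': sb.append("\\t");  break;
        default:
          if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
          else sb.append(ch);
      }
    }
    sb.append('"');
  }

  public static void main(String[] args) {
    Map<String, Object> address = new LinkedHashMap<>();
    address.put("city", "Paris");
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("name", "Ann");
    row.put("address", address);
    row.put("tags", Arrays.asList("a", "b"));
    System.out.println(toJson(row));
    // prints {"name":"Ann","address":{"city":"Paris"},"tags":["a","b"]}
  }
}
```

In a real pipeline a maintained library handles these cases, plus the ones this sketch ignores, such as dates, byte arrays, and non-finite numbers.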


I ran into the same problem and solved it by using org.apache.beam.sdk.extensions.jackson.AsJsons.

You do not need to create a new transform to use it; you can apply it directly in the pipeline.

import org.apache.beam.sdk.extensions.jackson.AsJsons;

Pipeline p = Pipeline.create(options);

p.apply("The transform that returns a PCollection of TableRow")
.apply("JSon Transform", AsJsons.of(TableRow.class));

If you manage your project with Maven, add this to the <dependencies> section of the pom.xml file:

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-extensions-json-jackson</artifactId>
  <version>2.5.0</version>
  <scope>compile</scope>
</dependency>
  • This is interesting. I wonder how efficient AsJsons is? By comparison, we have found that Gson is very expensive to use; at this point we try to use it exclusively in unit tests, to transform JSON strings into objects for use as inputs. – Max Oct 17 '18 at 20:37
  • I am running samples with 100 TableRows and about 20 columns, so I can't say much about the efficiency right now. – Rafael Alves Oct 18 '18 at 15:42
  • Update on the efficiency: I ran the JSON transform on 142,515,724 TableRows and it took 14 min 22 sec on Dataflow. – Rafael Alves Oct 24 '18 at 19:17
  • What's the return value of the `AsJsons` transform? is it like `PCollection`? – bigbounty Jan 23 '20 at 06:53
  • @bigbounty looking at the documentation, `AsJsons.of` returns a `PCollection ` _Creates a AsJsons PTransform that will transform a PCollection into a PCollection of JSON Strings representing those objects using a Jackson ObjectMapper._ https://beam.apache.org/releases/javadoc/2.0.0/org/apache/beam/sdk/extensions/jackson/AsJsons.html#of-java.lang.Class- – Rafael Alves Jan 23 '20 at 18:52
  • It should be `PCollection<String>`, right? Because the docs say JSON ***Strings***. – bigbounty Jan 24 '20 at 03:40
  • You are probably right, but I do not have the environment to test this anymore. – Rafael Alves Jan 29 '20 at 22:16