I have JSON data like this:

[{"ProjectId":1476401625,"ProjectName":"This is project name","ProjectPostcode":4178},{"ProjectId":2343,"ProjectName":"This is project 2 name","ProjectPostcode":5323}]

I need to deserialize it into Java objects, and I use this code:

PCollection<Project> deserialisedProjectObject = projectFile.apply("Deserialize Projects", ParseJsons.of(Project.class))
        .setCoder(SerializableCoder.of(Project.class));

but I always get this error:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to parse a com.lendlease.dp.entity.Project from JSON value: [{"ProjectId":1476401625,"ProjectName":"This is project name","ProjectPostcode":4178},{"ProjectId":2343,"ProjectName":"This is project 2 name","ProjectPostcode":5323}]

If I change the code to:

PCollection<Project[]> deserialisedProjectObject = projectFile.apply("Deserialize Projects", ParseJsons.of(Project[].class))
        .setCoder(SerializableCoder.of(Project[].class));

The runner is able to deserialize it, but I need this step to return a PCollection of Project, not a PCollection of Project arrays.

Alexey Romanenko
  • It is really hard to read without proper formatting. You can use ``` ``` for the JSON. That way your question would get much more attention. – user2932688 Nov 12 '20 at 22:56

1 Answer

Your input is a JSON array, so parsing it as Project[] is correct. To extract the individual Project objects, apply a FlatMapElements transform after ParseJsons that outputs the elements of each array.
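
A minimal sketch of that flattening step, under some assumptions: the Project class below is a hypothetical stand-in (the real entity is not shown in the question), and ProjectArrays.toList is a helper name made up for illustration. The function is plain Java so it can be checked on its own; the Beam wiring is shown as a comment and assumes FlatMapElements, TypeDescriptor and SerializableCoder from the Beam Java SDK.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the asker's Project entity (the real class is not shown).
class Project implements Serializable {
    long projectId;
    String projectName;
    int projectPostcode;

    Project(long projectId, String projectName, int projectPostcode) {
        this.projectId = projectId;
        this.projectName = projectName;
        this.projectPostcode = projectPostcode;
    }
}

// Hypothetical helper holding the function that the flat-map step calls once per parsed array.
class ProjectArrays {
    // Turns one parsed Project[] into an Iterable of its elements.
    static List<Project> toList(Project[] projects) {
        return Arrays.asList(projects);
    }
}

// Assumed Beam wiring (not compiled here; needs the Beam SDK on the classpath):
//
//   PCollection<Project> projects = projectFile
//       .apply("Deserialize Projects", ParseJsons.of(Project[].class))
//       .setCoder(SerializableCoder.of(Project[].class))
//       .apply("Flatten Arrays", FlatMapElements
//           .into(TypeDescriptor.of(Project.class))
//           .via((Project[] array) -> ProjectArrays.toList(array)))
//       .setCoder(SerializableCoder.of(Project.class));
```

The parse still produces one Project[] per input line; the extra step emits each array element as its own output, so downstream transforms see a PCollection of Project.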

As well as ParseJsons, you may want to look at:

JsonToRow

Its output is a PCollection of Row objects with an attached schema, which gives you a lot of nice functionality (see the Beam documentation on using schemas). If you need an actual POJO in the pipeline as well as the Row, you can use Convert.fromRows to turn each Row into a POJO.

Reza Rokni