Problem Statement:

I have an input PCollection with the following fields:

{
   firstname_1,
   lastname_1,
   dob,
   firstname_2,
   lastname_2, 
   firstname_3,
   lastname_3,
}

Then I run a Beam SQL query whose resulting PCollection should look like this:

 ----------------------------------------------
  name.firstname | name.lastname | dob
 ----------------------------------------------
  firstname_1    | lastname_1    | 202009
  firstname_2    | lastname_2    |
  firstname_3    | lastname_3    |
 ----------------------------------------------

To be precise:

array[
    (firstname_1,lastname_1,dob),
    (firstname_2,lastname_2,dob),
    (firstname_3,lastname_3,dob)
]

Here is the code snippet where I execute Beam SQL:

PCollectionTuple tuple =
    PCollectionTuple.of(new TupleTag<>("testPcollection"), testPcollection);

PCollection<Row> result = tuple
    .apply(SqlTransform.query(
        "SELECT array[(firstname_1,lastname_1,dob), (firstname_2,lastname_2,dob), (firstname_3,lastname_3,dob)]"));

I am not getting the expected results.

Can someone explain how to query an array of repeated fields in Beam SQL?

Kenn Knowles

2 Answers

You can take a look at this example of how to access arrays in Beam SQL: https://github.com/apache/beam/blob/d110f6b7610b26edc1eb9a4b698840b21c151847/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslNestedRowsTest.java#L234

Jayadeep Jayaraman
  • Hi, I just looked at the link you shared. In that example the input schema itself already has an array type. What I am trying to do is different: the input schema is a flat set of fields, and the output schema should have an array type under which the input fields are grouped. Can you please help me with this? :) – Spiriter_rider May 30 '20 at 06:20
Your SQL query has a few errors.

  1. You have named the input to the SQL query testPcollection, but your SQL query does not select FROM testPcollection. Let us assume you meant to include FROM testPcollection.
  2. You use the syntax (firstname_1, lastname_1, dob) in both your expected output and your query. This is not a valid SQL expression.
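
As a rough sketch of how the two points above might be addressed, the query could name its input with FROM and spell each tuple as an explicit ROW(...) constructor inside ARRAY[...]. Whether the Beam SQL planner accepts ARRAY[ROW(...)] in a given Beam version is an assumption to verify, not something confirmed here. The class name ArrayOfRowsQuery, the build method, and the alias names are hypothetical and only used for illustration; the helper merely assembles the query string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the SQL text only, it does not run Beam.
public class ArrayOfRowsQuery {

    /**
     * Builds "SELECT ARRAY[ROW(...), ROW(...)] AS alias FROM table".
     * Each inner array lists the columns that form one ROW element.
     * Assumption: the SQL dialect in use accepts ARRAY[...] and ROW(...).
     */
    public static String build(String table, String alias, String[][] groups) {
        List<String> rows = new ArrayList<>();
        for (String[] columns : groups) {
            rows.add("ROW(" + String.join(", ", columns) + ")");
        }
        return "SELECT ARRAY[" + String.join(", ", rows) + "] AS " + alias
            + " FROM " + table;
    }

    public static void main(String[] args) {
        // Column and table names are taken from the question.
        String query = build("testPcollection", "names", new String[][] {
            {"firstname_1", "lastname_1", "dob"},
            {"firstname_2", "lastname_2", "dob"},
            {"firstname_3", "lastname_3", "dob"},
        });
        System.out.println(query);
    }
}
```

The resulting string would then be passed to SqlTransform.query(...) exactly as in the question's snippet, with the table name matching the TupleTag id testPcollection.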
Kenn Knowles