0

Try to use BeamSQL for unnest the nested type of PCollection. Lets assume the PCollection which have the Employees and its details. Here details are in nested collection. So if we use the BeamSQL like "SELECT PCOLLECTION.details FROM PCOLLECTION" then getting nested type of details as array collection in the separate PCollection. However when I want to get specific column from the nested type collection as details, then getting error like unable to find the column name. Tried the BeamSQL like (similar like BigQuery SQL) "SELECT X.address FROM PCOLLECTION, Unnest(details) as X" then getting nullpointer exception. Used 2.12.0 apache beam version.

Appreciate some one please help on this.

Below is the sample data of details nested Value (details has email, phone columns. so per row, 'n' no of list of details. Here it has two list of details):

WARNING: printValue:Row:[[Row:[lourdurajan@gmail.com, 9840618047], Row:[lourdurajan@sanmina.com, 9840618047]]]

Here is the Java stacktrace for second select statement:

SELECT `X`.`email`
FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`,
UNNEST(`PCOLLECTION`.`details`) AS `X`
May 08, 2019 11:23:30 AM org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner convertToBeamRel
INFO: SQLPlan>
LogicalProject(email=[$3])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{2}])
    BeamIOSourceRel(table=[[beam, PCOLLECTION]])
    Uncollect
      LogicalProject(details=[$cor0.details_2])
        LogicalValues(tuples=[[{ 0 }]])

May 08, 2019 11:23:30 AM org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner convertToBeamRel
INFO: BEAMPlan>
BeamCalcRel(expr#0..4=[{inputs}], email=[$t3])
  BeamUnnestRel(unnestIndex=[2])
    BeamIOSourceRel(table=[[beam, PCOLLECTION]])

[WARNING] 
java.lang.NullPointerException
    at org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toSchema(CalciteUtils.java:171)
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamUnnestRel$Transform.expand(BeamUnnestRel.java:93)
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamUnnestRel$Transform.expand(BeamUnnestRel.java:87)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
    at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36)
    at org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:111)
    at org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:79)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:370)
    at com.sanmina.BeamSQLUnnest.main(BeamSQLUnnest.java:217)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:748)
lourdu rajan
  • 329
  • 1
  • 5
  • 24
  • Could you also share your java stacktrace? – Rui Wang May 07 '19 at 18:13
  • @Rui Wang, I have modified the question with Java stackrace and example of the details nested collection. Please have a look at it and let me know your feedback. – lourdu rajan May 08 '19 at 06:04
  • I cannot find examples in Calcite how only UNEST is used to flatten array like BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening-arrays. The last approach you tried is UNNEST with JOIN, which passes Calcite, but BeamSQL has bugs in this approach. I have filed a JIRA for it: https://jira.apache.org/jira/browse/BEAM-7255 – Rui Wang May 09 '19 at 03:41
  • Thanks for your update. Let me know once its get resolved. – lourdu rajan May 09 '19 at 13:03

1 Answers1

0

You can achieve this using BigQueryIO.

String Query ="SELECT `X`.`email`
FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`,
UNNEST(`PCOLLECTION`.`details`) AS `X`"

BigQueryIO.readTableRows().fromQuery(query).usingStandardSql()
Pirama
  • 11
  • 5