0

I am trying to execute the below code in which I am using Named Tuple for PCollection and SQL transform for doing a simple select.

As per the video link (4:06) : https://www.youtube.com/watch?v=zx4p-UNSmrA.

Instead of using PCOLLECTION in SQLTransform query, named PCollections can also be provided as below.

Code Block

class EmployeeType(typing.NamedTuple):
    name:str
    age:int

beam.coders.registry.register_coder(EmployeeType, beam.coders.RowCoder)

pcol = p | "Create" >> beam.Create([EmployeeType(name="ABC", age=10)]).with_output_types(EmployeeType)

(
{'a':pcol} | SqlTransform(
    """ SELECT age FROM a """) 
    | "Map" >> beam.Map(lambda row: row.age)
    | "Print" >> beam.Map(print) 
)

p.run()

However the below code block errors out with error

Caused by: org.apache.beam.vendor.calcite.v1_28_0.org.apache.calcite.sql.validate.SqlValidatorException: Object 'a' not found

Apache Beam SDK used is 2.35.0, are there any known limitation in using named PCollection

  • Try removing single quotes from 'a'. Use this ````{a:pcol}```` – Mr.Batra Feb 03 '22 at 08:13
  • @Mr.Batra - Tried your suggestion, does not work as if the quotes are removed, Python will treat it as variable and throw Name Error error NameError: name 'a' is not defined – Murli Krishnan Feb 04 '22 at 09:26

0 Answers0