How can I read nested structures using Apache Beam Python SDK?

lines = p | io.Read(io.BigQuerySource('project:test.beam_in'))

results in

"reason": "invalidQuery",
"message": "Cannot output multiple independently repeated fields at the same time. Found classification_item_distribution and category_cat_name"

Is it possible to read nested structures?

Evgeny Minkevich

2 Answers

This is a limitation of BigQuery rather than of Beam. There are two ways to run such a query: disable result flattening in BigQuery, or explicitly flatten the repeated fields in your query.

With the current Python SDK only the latter is available - see "Flattening Google Analytics data (with repeated fields) not working anymore" for a guide on where and how to invoke the FLATTEN function.
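As a rough sketch of the explicit-flatten route, the snippet below builds a legacy SQL query that flattens one of the two repeated fields named in the error and passes it to the source through its query argument. The helper names build_flatten_query and read_flattened are made up for illustration, and flattening only classification_item_distribution is an assumption about what is enough for this particular table.

```python
def build_flatten_query(table, repeated_field, columns="*"):
    """Return a legacy SQL query that flattens one repeated field of `table`.

    `table` uses the legacy 'project:dataset.table' form, as in the question.
    This helper is hypothetical; it only formats a string.
    """
    return "SELECT {cols} FROM FLATTEN([{table}], {field})".format(
        cols=columns, table=table, field=repeated_field)


def read_flattened(pipeline, table, repeated_field):
    """Read `table` through a FLATTEN query (assumes the io.BigQuerySource API
    used in the question is available in the installed SDK version)."""
    # Imported lazily so the query helper above works without Beam installed.
    import apache_beam as beam
    query = build_flatten_query(table, repeated_field)
    return pipeline | beam.io.Read(beam.io.BigQuerySource(query=query))


# Example query string for the table from the question:
query = build_flatten_query('project:test.beam_in',
                            'classification_item_distribution')
```

With a pipeline p in hand, something like lines = read_flattened(p, 'project:test.beam_in', 'classification_item_distribution') would take the place of the original read.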

The feature to disable flattening is filed as BEAM-877 if you care to subscribe to updates or discuss.

Kenn Knowles

You can now read nested results directly in Beam Python by adding flatten_results=False when creating your source:

lines = p | io.Read(io.BigQuerySource('project:test.beam_in', flatten_results=False))

See source here.
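Once rows arrive unflattened, each repeated field should show up as a list inside the row dictionary. If you still want one output element per repeated value, a small generator can do the expansion inside the pipeline. The sketch below is only illustrative: explode is a hypothetical helper, and the sample row shape is an assumption, not the real schema of beam_in.

```python
def explode(row, field):
    """Yield one shallow copy of `row` per element of the repeated `field`.

    A row whose repeated field is missing or empty is kept once, with the
    field set to None, so that no input rows are silently dropped.
    """
    values = row.get(field) or [None]
    for value in values:
        flat = dict(row)      # shallow copy; the other fields are shared
        flat[field] = value
        yield flat


# Assumed row shape, for illustration only.
sample = {"id": 1, "classification_item_distribution": ["a", "b"]}
expanded = list(explode(sample, "classification_item_distribution"))

# Inside a pipeline this could be applied as (requires apache_beam as beam):
#   lines | beam.FlatMap(explode, 'classification_item_distribution')
```

Doing the expansion in the pipeline rather than in SQL keeps the nested structure available for any step that needs it.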

Kat