What is the use of specifying Beam schemas in our code when reading from a source? How does it make the pipeline more efficient?
1 Answer
Schemas are not so much for facilitating code reading, but for making it easier to do type conversions inside the pipeline (better performance, less need to specify coders, etc.), and for being able to apply higher-level transforms, such as Beam SQL, joins in Java, or DataFrames in Python.
The pipeline is more efficient because of those type conversions: schema types map directly to the programming language's types, and they also have efficient coders for serialization. But I would say the main purpose and advantage of schemas lies in the possibilities I mentioned in the previous paragraph.

OneCricketeer

Israel Herraiz