Effect of using Apache Beam schemas

Question

What is the use of specifying Beam schemas in our code when we are reading from a source? How do they make our pipeline more efficient?

score 3 · Accepted Answer · edited Sep 01 '22 at 18:17

Schemas are not so much for facilitating reading from the source as for making it easier to do type conversions inside the pipeline (better performance, less need to specify coders, etc.) and for being able to apply higher-level transforms, such as Beam SQL, joins in Java, or DataFrames in Python.

Any efficiency gain comes from the type conversions: schema types map directly to the programming language's types, and they come with efficient coders for serialization. But I would say the main purpose and advantage of schemas lies in the possibilities mentioned in the previous paragraph.
