I am trying to set up a Beam pipeline to read from Kafka using the Python API. I am able to set up the consumer config and topic(s). How do I update the pipeline to use the Confluent Schema Registry and to define the Avro message value deserializer?
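For reference, a minimal sketch of the kind of pipeline described above, assuming the cross-language `ReadFromKafka` transform with its default byte deserializers; the broker address and topic name are placeholders:

```python
import logging

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | 'ReadFromKafka' >> ReadFromKafka(
                consumer_config={'bootstrap.servers': 'localhost:9092'},
                topics=['my_topic'])
            # With the default deserializers each element is a
            # (key, value) tuple of raw bytes.
            | 'LogRecord' >> beam.Map(lambda kv: logging.info('record: %s', kv))
        )


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
```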
- Does this answer your question? [beam dynamically decode avro record using schema registry](https://stackoverflow.com/questions/61943251/beam-dynamically-decode-avro-record-using-schema-registry) – Sakshi Gatyan Jun 15 '21 at 10:41
- I tried that approach, but I get an AvroException due to the Enum usage in the schema. However, I would like to receive the value as a SpecificRecord type, not as a GenericRecord. – Vim Jun 16 '21 at 13:20
1 Answer
You can provide key and value deserializers using the Python API, but the types this interface can return are limited (currently only byte and integer values). Deserializers that return Java types (for example, GenericRecord) do not make much sense for Python. Would it be possible to use the byte deserializer (the default) and process the returned bytes using a Beam Python transform? Also, Beam Kafka transforms do not provide built-in support for connecting to the Confluent Schema Registry, but you could do so from a Beam ParDo transform (using a Kafka client library).
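A sketch of that approach: keep the default byte deserializer in `ReadFromKafka` and decode the Confluent-framed Avro bytes in a downstream DoFn using the confluent-kafka client library (this assumes a recent confluent-kafka release where `AvroDeserializer` takes the registry client as its first argument). The registry URL, topic, and broker address here are placeholders, not values from the question:

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka

from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField


class DecodeAvro(beam.DoFn):
    """Decodes Kafka message values using schemas fetched from the registry."""

    def __init__(self, registry_url, topic):
        self.registry_url = registry_url
        self.topic = topic

    def setup(self):
        # Create the registry client and deserializer once per DoFn instance.
        client = SchemaRegistryClient({'url': self.registry_url})
        self.deserializer = AvroDeserializer(client)

    def process(self, kv):
        key_bytes, value_bytes = kv
        # Returns a dict built from the writer schema registered for the topic.
        record = self.deserializer(
            value_bytes, SerializationContext(self.topic, MessageField.VALUE))
        yield record


def build_pipeline(p):
    return (
        p
        | ReadFromKafka(
            consumer_config={'bootstrap.servers': 'localhost:9092'},
            topics=['my_topic'])
        | beam.ParDo(DecodeAvro('http://localhost:8081', 'my_topic'))
    )
```

Note that this yields plain Python dicts rather than SpecificRecord objects; SpecificRecord is a Java concept, so on the Python side you would map the decoded dicts into your own classes if you need typed objects.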

chamikara
- I tried to set up a simple pipeline to read the Kafka records and log them, as described in [link](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py). I don't see any log output, and I am using the DirectRunner. The expansion service is started automatically and also runs the apache/beam_java11_sdk:2.29.0 image. Is the DirectRunner not compatible with KafkaIO, or is only the PortableRunner preferred? – Vim Jul 08 '21 at 11:49