I am trying to run the following code snippet using the Apache Beam SDK for Python, and I get a java.lang.RuntimeException:
import apache_beam as beam
from apache_beam.io.external.kafka import ReadFromKafka
from apache_beam.io.external.kafka import WriteToKafka
from apache_beam.options.pipeline_options import PipelineOptions

# some variables definition

conf = {
    'auto.offset.reset': 'latest',
    'basic.auth.credentials.source': 'SASL_INHERIT',
    'bootstrap.servers': '{}:{}'.format(host, sasl_port),
    'client.id': 'demo-py-client',
    'enable.auto.commit': 'true',
    'group.id': 'group_id',
    'isolation.level': 'read_uncommitted',
    'sasl.mechanism': 'PLAIN',
    'sasl.jaas.config': "org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format(username, password),
    'sasl.password': password,
    'sasl.username': username,
    'schema.registry.client.cache.capacity': '1000',
    'schema.registry.url': 'https://{}:{}@{}:{}'.format(username, password, host, 29650),
    'security.protocol': 'SASL_SSL',
    'enable.ssl.certificate.verification': 'false',
    'specific.avro.reader': 'false'
}

with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | 'Read' >> ReadFromKafka(
            consumer_config=conf,
            topics=[topic],
            with_metadata=False,
            key_deserializer='io.confluent.kafka.serializers.KafkaAvroDeserializer',
            value_deserializer='io.confluent.kafka.serializers.KafkaAvroDeserializer',
        )
        | 'Print' >> beam.Map(print)
    )
I get the following error stack:
{
"name": "RuntimeError",
"message": "java.lang.RuntimeException: Failed to build transform beam:transform:org.apache.beam:kafka_read_without_metadata:v1 from spec urn: "beam:transform:org.apache.beam:kafka_read_without_metadata:v1
org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$1.getTransform(ExpansionService.java:151)
org.apache.beam.sdk.expansion.service.ExpansionService$TransformProvider.apply(ExpansionService.java:400)
org.apache.beam.sdk.expansion.service.ExpansionService.expand(ExpansionService.java:526)
org.apache.beam.sdk.expansion.service.ExpansionService.expand(ExpansionService.java:606)
org.apache.beam.model.expansion.v1.ExpansionServiceGrpc$MethodHandlers.invoke(ExpansionServiceGrpc.java:305)
org.apache.beam.vendor.grpc.v1p48p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:354)
org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Couldn't resolve coder for Deserializer: class io.confluent.kafka.serializers.KafkaAvroDeserializer
org.apache.beam.sdk.io.kafka.KafkaIO$Read$Builder.resolveCoder(KafkaIO.java:819)
org.apache.beam.sdk.io.kafka.KafkaIO$Read$Builder.setupExternalBuilder(KafkaIO.java:750)
org.apache.beam.sdk.io.kafka.KafkaIO$TypedWithoutMetadata$Builder.buildExternal(KafkaIO.java:1686)
org.apache.beam.sdk.io.kafka.KafkaIO$TypedWithoutMetadata$Builder.buildExternal(KafkaIO.java:1677)
org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$1.getTransform(ExpansionService.java:145)
... 12 more
}
I run the code on an Apple M1 machine using Miniconda 22.11.1 and Python 3.9.16 with these requirements:
Package Version
------------------------------- ----------
apache-beam 2.44.0
appnope 0.1.2
asttokens 2.0.5
avro 1.11.1
backcall 0.2.0
cachetools 4.2.4
certifi 2022.12.7
charset-normalizer 3.0.1
cloudpickle 2.2.1
comm 0.1.2
confluent-kafka 2.0.2
crcmod 1.7
debugpy 1.5.1
decorator 5.1.1
dill 0.3.1.1
docopt 0.6.2
entrypoints 0.4
executing 0.8.3
facets-overview 1.0.0
fastavro 1.7.0
fasteners 0.18
google-api-core 2.11.0
google-apitools 0.5.31
google-auth 2.16.0
google-auth-httplib2 0.1.0
google-cloud-bigquery 3.4.2
google-cloud-bigquery-storage 2.13.2
google-cloud-bigtable 1.7.3
google-cloud-core 2.3.2
google-cloud-dataproc 3.1.1
google-cloud-datastore 1.15.5
google-cloud-dlp 3.11.1
google-cloud-language 1.3.2
google-cloud-pubsub 2.14.0
google-cloud-pubsublite 1.6.0
google-cloud-recommendations-ai 0.7.1
google-cloud-spanner 3.27.0
google-cloud-videointelligence 1.16.3
google-cloud-vision 3.3.1
google-crc32c 1.5.0
google-resumable-media 2.4.1
googleapis-common-protos 1.58.0
grpc-google-iam-v1 0.12.6
grpcio 1.51.1
grpcio-status 1.48.2
hdfs 2.7.0
httplib2 0.20.4
idna 3.4
ipykernel 6.19.2
ipython 8.7.0
ipywidgets 8.0.4
jedi 0.18.1
jupyter-client 6.1.12
jupyter_core 5.1.1
jupyterlab-widgets 3.0.5
matplotlib-inline 0.1.6
nest-asyncio 1.5.6
numpy 1.22.4
oauth2client 4.1.3
objsize 0.6.1
orjson 3.8.5
overrides 6.5.0
packaging 23.0
pandas 1.5.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pip 22.3.1
platformdirs 2.5.2
prompt-toolkit 3.0.36
proto-plus 1.22.2
protobuf 3.20.3
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 9.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pydot 1.4.2
Pygments 2.11.2
pymongo 3.13.0
pyparsing 3.0.9
python-dateutil 2.8.2
pytz 2022.7.1
pyzmq 23.2.0
regex 2022.10.31
requests 2.28.2
rsa 4.9
setuptools 65.6.3
six 1.16.0
sqlparse 0.4.3
stack-data 0.2.0
timeloop 1.0.2
tornado 6.2
traitlets 5.7.1
typing_extensions 4.4.0
urllib3 1.26.14
wcwidth 0.2.5
wheel 0.37.1
widgetsnbextension 4.0.5
zstandard 0.19.0
Could someone suggest what the problem might be?