
I'm trying to run the following code snippet using the Apache Beam SDK for Python, and I get a java.lang.RuntimeException:

import apache_beam as beam

from apache_beam.io.external.kafka import ReadFromKafka
from apache_beam.io.external.kafka import WriteToKafka
from apache_beam.options.pipeline_options import PipelineOptions

# some variables definition

conf = {
    'auto.offset.reset': 'latest',
    'basic.auth.credentials.source': 'SASL_INHERIT',
    'bootstrap.servers': '{}:{}'.format(host, sasl_port),
    'client.id': 'demo-py-client',
    'enable.auto.commit': 'true',
    'group.id': 'group_id',
    'isolation.level': 'read_uncommitted',
    'sasl.mechanism': 'PLAIN',
    'sasl.jaas.config': "org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format(username, password),
    'sasl.password': password,
    'sasl.username': username,
    'schema.registry.client.cache.capacity': '1000',
    'schema.registry.url': 'https://{}:{}@{}:{}'.format(username, password, host, 29650),
    'security.protocol': 'SASL_SSL',
    'enable.ssl.certificate.verification': 'false',
    'specific.avro.reader': 'false'
}

with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | 'Read' >> ReadFromKafka(
                consumer_config=conf,
                topics=[topic],
                with_metadata=False,
                key_deserializer='io.confluent.kafka.serializers.KafkaAvroDeserializer',
                value_deserializer='io.confluent.kafka.serializers.KafkaAvroDeserializer',
            )
        | 'Print' >> beam.Map(print)
    )

I get the following error stack:

RuntimeError: java.lang.RuntimeException: Failed to build transform beam:transform:org.apache.beam:kafka_read_without_metadata:v1 from spec urn: "beam:transform:org.apache.beam:kafka_read_without_metadata:v1"
	at org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$1.getTransform(ExpansionService.java:151)
	at org.apache.beam.sdk.expansion.service.ExpansionService$TransformProvider.apply(ExpansionService.java:400)
	at org.apache.beam.sdk.expansion.service.ExpansionService.expand(ExpansionService.java:526)
	at org.apache.beam.sdk.expansion.service.ExpansionService.expand(ExpansionService.java:606)
	at org.apache.beam.model.expansion.v1.ExpansionServiceGrpc$MethodHandlers.invoke(ExpansionServiceGrpc.java:305)
	at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:354)
	at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
	at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.apache.beam.vendor.grpc.v1p48p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Couldn't resolve coder for Deserializer: class io.confluent.kafka.serializers.KafkaAvroDeserializer
	at org.apache.beam.sdk.io.kafka.KafkaIO$Read$Builder.resolveCoder(KafkaIO.java:819)
	at org.apache.beam.sdk.io.kafka.KafkaIO$Read$Builder.setupExternalBuilder(KafkaIO.java:750)
	at org.apache.beam.sdk.io.kafka.KafkaIO$TypedWithoutMetadata$Builder.buildExternal(KafkaIO.java:1686)
	at org.apache.beam.sdk.io.kafka.KafkaIO$TypedWithoutMetadata$Builder.buildExternal(KafkaIO.java:1677)
	at org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader$1.getTransform(ExpansionService.java:145)
	... 12 more

I run the code on an Apple M1 machine using Miniconda 22.11.1 and Python 3.9.16, with the following requirements:

Package                         Version
------------------------------- ----------
apache-beam                     2.44.0
appnope                         0.1.2
asttokens                       2.0.5
avro                            1.11.1
backcall                        0.2.0
cachetools                      4.2.4
certifi                         2022.12.7
charset-normalizer              3.0.1
cloudpickle                     2.2.1
comm                            0.1.2
confluent-kafka                 2.0.2
crcmod                          1.7
debugpy                         1.5.1
decorator                       5.1.1
dill                            0.3.1.1
docopt                          0.6.2
entrypoints                     0.4
executing                       0.8.3
facets-overview                 1.0.0
fastavro                        1.7.0
fasteners                       0.18
google-api-core                 2.11.0
google-apitools                 0.5.31
google-auth                     2.16.0
google-auth-httplib2            0.1.0
google-cloud-bigquery           3.4.2
google-cloud-bigquery-storage   2.13.2
google-cloud-bigtable           1.7.3
google-cloud-core               2.3.2
google-cloud-dataproc           3.1.1
google-cloud-datastore          1.15.5
google-cloud-dlp                3.11.1
google-cloud-language           1.3.2
google-cloud-pubsub             2.14.0
google-cloud-pubsublite         1.6.0
google-cloud-recommendations-ai 0.7.1
google-cloud-spanner            3.27.0
google-cloud-videointelligence  1.16.3
google-cloud-vision             3.3.1
google-crc32c                   1.5.0
google-resumable-media          2.4.1
googleapis-common-protos        1.58.0
grpc-google-iam-v1              0.12.6
grpcio                          1.51.1
grpcio-status                   1.48.2
hdfs                            2.7.0
httplib2                        0.20.4
idna                            3.4
ipykernel                       6.19.2
ipython                         8.7.0
ipywidgets                      8.0.4
jedi                            0.18.1
jupyter-client                  6.1.12
jupyter_core                    5.1.1
jupyterlab-widgets              3.0.5
matplotlib-inline               0.1.6
nest-asyncio                    1.5.6
numpy                           1.22.4
oauth2client                    4.1.3
objsize                         0.6.1
orjson                          3.8.5
overrides                       6.5.0
packaging                       23.0
pandas                          1.5.3
parso                           0.8.3
pexpect                         4.8.0
pickleshare                     0.7.5
pip                             22.3.1
platformdirs                    2.5.2
prompt-toolkit                  3.0.36
proto-plus                      1.22.2
protobuf                        3.20.3
psutil                          5.9.0
ptyprocess                      0.7.0
pure-eval                       0.2.2
pyarrow                         9.0.0
pyasn1                          0.4.8
pyasn1-modules                  0.2.8
pydot                           1.4.2
Pygments                        2.11.2
pymongo                         3.13.0
pyparsing                       3.0.9
python-dateutil                 2.8.2
pytz                            2022.7.1
pyzmq                           23.2.0
regex                           2022.10.31
requests                        2.28.2
rsa                             4.9
setuptools                      65.6.3
six                             1.16.0
sqlparse                        0.4.3
stack-data                      0.2.0
timeloop                        1.0.2
tornado                         6.2
traitlets                       5.7.1
typing_extensions               4.4.0
urllib3                         1.26.14
wcwidth                         0.2.5
wheel                           0.37.1
widgetsnbextension              4.0.5
zstandard                       0.19.0

Could someone suggest what the problem might be?

OneCricketeer
  • I don't have an explanation, but you could write your own Avro serializer/deserializer, e.g. using `AvroDeserializer` from the `confluent_kafka.schema_registry.avro` module, if that helps – Hej Ja Apr 05 '23 at 09:23
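For reference, a hand-rolled deserializer along those lines first has to strip Confluent's wire-format header: each message produced by a Confluent serializer is a magic byte (0x00), a 4-byte big-endian schema registry ID, and then the Avro-encoded body. A minimal sketch of that header parsing (the Avro decoding itself would then use `fastavro` or `avro` against the schema fetched from the registry; the function name here is illustrative):

```python
import struct

def split_confluent_message(raw: bytes):
    """Split a Confluent wire-format message into (schema_id, avro_payload).

    Confluent serializers prepend a 5-byte header: one magic byte (0x00)
    followed by a big-endian 4-byte schema registry ID; the Avro-encoded
    body follows.
    """
    if len(raw) < 5 or raw[0] != 0:
        raise ValueError("Not a Confluent wire-format message")
    schema_id = struct.unpack(">I", raw[1:5])[0]
    return schema_id, raw[5:]

# Example: header for schema ID 42, followed by the Avro body b"\x02hi"
schema_id, payload = split_confluent_message(b"\x00\x00\x00\x00\x2a\x02hi")
```

A function like this could sit inside a `beam.Map` applied to raw bytes read with the default byte-array deserializers, sidestepping the coder resolution that fails for `KafkaAvroDeserializer`.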

1 Answer


I had a similar issue today, with the following output logs:

RuntimeError: java.lang.RuntimeException: Failed to get dependencies of beam:transform:org.apache.beam:kafka_read_with_metadata:v1 from spec urn: "beam:transform:org.apache.beam:kafka_read_with_metadata:v1"
...
Caused by: java.lang.UnsupportedOperationException: Cannot define class using reflection: Unable to make protected java.lang.Package java.lang.ClassLoader.getPackage(java.lang.String) accessible: module java.base does not "opens java.lang" to unnamed module @34a3d150
...

Root cause:
Apache Beam is not yet compatible with OpenJDK 20.0.1 (jre-openjdk).

Solution:
Downgrade jre-openjdk and jre-openjdk-headless to version 19.0.2. Command on an Arch-like OS:

sudo pacman -U https://archive.archlinux.org/packages/j/jre-openjdk/jre-openjdk-19.0.2.u7-2-x86_64.pkg.tar.zst https://archive.archlinux.org/packages/j/jre-openjdk-headless/jre-openjdk-headless-19.0.2.u7-2-x86_64.pkg.tar.zst

For `pacman -U` usage, see: https://wiki.archlinux.org/title/Arch_Linux_Archive#How_to_downgrade_one_package

Note:
I also have to consume messages from a Confluent platform. I plan to build a custom Dataflow Docker image that adds the requirements for the Confluent Python SDK, and then deserialize in a Map transform using that SDK. I discussed this with a Google expert on the subject, who confirmed the approach. I hope it helps someone.
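The custom-image idea might look like the following (a hypothetical Dockerfile sketch, assuming Beam 2.44.0 and the official Beam Python 3.9 SDK base image; the pinned versions are illustrative):

```dockerfile
# Start from the Beam Python SDK image matching the pipeline's Beam version
FROM apache/beam_python3.9_sdk:2.44.0

# Add the Confluent Python SDK (with Avro/Schema Registry support)
# so a Map transform can deserialize Confluent-encoded messages
RUN pip install --no-cache-dir "confluent-kafka[avro]==2.0.2"
```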

Antoine Pointeau