Update: I spun up an EC2 instance and was able to get the example below to work, which confirms that this is a connectivity issue with Docker on Mac.
Update: I still face this error even when I bring down the Flink job server container and Kafka, which leads me to believe this is a connectivity issue.
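If Docker on Mac is indeed the culprit, one thing I may try (untested sketch) is dropping --net=host, which Docker Desktop for Mac does not support as true host networking, and publishing the job server ports from the logs below explicitly:

docker run -p 8099:8099 -p 8098:8098 -p 8097:8097 apache/beam_flink1.13_job_server:latest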
I recently tried processing a Kafka stream with Python, Apache Beam, and Apache Flink using this tutorial. Based on the tutorial, I set up Flink with the following command:
docker run --net=host apache/beam_flink1.13_job_server:latest
Doing so results in the following:
Jul 14, 2021 8:40:47 PM org.apache.beam.runners.jobsubmission.JobServerDriver createArtifactStagingService
INFO: ArtifactStagingService started on localhost:8098
Jul 14, 2021 8:40:47 PM org.apache.beam.runners.jobsubmission.JobServerDriver createExpansionService
INFO: Java ExpansionService started on localhost:8097
Jul 14, 2021 8:40:47 PM org.apache.beam.runners.jobsubmission.JobServerDriver createJobServer
INFO: JobService started on localhost:8099
Jul 14, 2021 8:40:47 PM org.apache.beam.runners.jobsubmission.JobServerDriver run
INFO: Job server now running, terminate with Ctrl+C
When running my script with python main.py (shown below), I get the following error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1626301362.091496000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3009,"referenced_errors":[{"created":"@1626301362.091494000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":398,"grpc_status":14}]}"
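For what it's worth, the failure can be reproduced outside Beam with a bare gRPC probe (a minimal sketch using the grpcio package that the Beam SDK already depends on; the port is the job endpoint from main.py):

import grpc

# Probe the job endpoint directly; if this times out, the failure is
# network-level rather than anything in the Beam pipeline itself.
channel = grpc.insecure_channel('localhost:8099')
try:
    grpc.channel_ready_future(channel).result(timeout=5)
    print('job endpoint is reachable')
except grpc.FutureTimeoutError:
    print('job endpoint is NOT reachable')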
Does anyone know of a quick workaround for this? I should note I found this
main.py
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

if __name__ == '__main__':
    # Point the portable runner at the Flink job server started above.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",
        "--environment_type=LOOPBACK",
    ])
    pipeline = beam.Pipeline(options=options)
    result = (
        pipeline
        | "Read from kafka" >> ReadFromKafka(
            consumer_config={
                "bootstrap.servers": 'localhost:9092',
            },
            topics=['demo'],
            # Java expansion service started by the job server container.
            expansion_service='localhost:8097',
        )
        | beam.Map(print)
    )
    pipeline.run()
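One more hedge worth noting: 'localhost' in consumer_config is resolved wherever the Kafka read actually executes, which in this containerized setup is generally not the Mac itself. A hypothetical variant, assuming Kafka is listening on the macOS host and Docker Desktop's host.docker.internal alias is available:

consumer_config={
    # host.docker.internal resolves to the macOS host from inside
    # Docker Desktop containers; 'localhost' would be the container itself.
    "bootstrap.servers": 'host.docker.internal:9092',
},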