
TL;DR

Is it possible to connect two pods in Kubernetes as if they were on the same local network, with all ports open between them?

Motivation

Currently, we have Airflow deployed in a Kubernetes cluster, and in order to use TensorFlow Extended we need Apache Beam. For our use case, Spark is the appropriate runner, and since Airflow and TensorFlow are coded in Python, we need to use Apache Beam's Portable Runner (https://beam.apache.org/documentation/runners/spark/#portability).

The problem

The communication between the Airflow pod and the job server pod fails with transmission errors (probably because of some random ports used by the job server).

Setup

To follow good isolation practice and to imitate the common Spark-on-Kubernetes setup (with the driver running in a pod inside the cluster), the job server was deployed as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: beam-spark-job-server
  labels:
    app: airflow-k8s
spec:
  selector:
    matchLabels:
      app: beam-spark-job-server
  replicas: 1
  template:
    metadata:
      labels:
        app: beam-spark-job-server
    spec:
      restartPolicy: Always
      containers:
        - name: beam-spark-job-server
          image: apache/beam_spark_job_server:2.27.0
          args: ["--spark-master-url=spark://spark-master:7077"]
          resources:
            limits:
              memory: "1Gi"
              cpu: "0.7"
          env:
            - name: SPARK_PUBLIC_DNS
              value: spark-client
          ports:
          - containerPort: 8099
            protocol: TCP
            name: job-server
          - containerPort: 7077
            protocol: TCP
            name: spark-master
          - containerPort: 8098
            protocol: TCP
            name: artifact
          - containerPort: 8097
            protocol: TCP
            name: java-expansion
---
apiVersion: v1
kind: Service
metadata:
  name: beam-spark-job-server
  labels:
    app: airflow-k8s
spec:
  type: ClusterIP
  selector:
    app: beam-spark-job-server
  ports:
    - port: 8099
      protocol: TCP
      targetPort: 8099
      name: job-server
    - port: 7077
      protocol: TCP
      targetPort: 7077
      name: spark-master
    - port: 8098
      protocol: TCP
      targetPort: 8098
      name: artifact
    - port: 8097
      protocol: TCP
      targetPort: 8097
      name: java-expansion
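
To rule out basic networking issues, I can first check the Service from the Airflow pod. This is just a sanity check; it assumes nc (netcat) is available in the Airflow image, and airflow-pod-name is a placeholder for the actual pod name:

# probe each Service port from inside the Airflow pod
kubectl exec -it airflow-pod-name -- nc -zv beam-spark-job-server 8099
kubectl exec -it airflow-pod-name -- nc -zv beam-spark-job-server 8098
kubectl exec -it airflow-pod-name -- nc -zv beam-spark-job-server 8097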

Development/Errors

If I execute the command python -m apache_beam.examples.wordcount --output ./data_test/ --runner=PortableRunner --job_endpoint=beam-spark-job-server:8099 --environment_type=LOOPBACK from the Airflow pod, I get no logs on the job server and the following error in the terminal:

INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.client:Timeout attempting to reach GCE metadata service.
WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to use: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
Connecting anonymously.
INFO:apache_beam.runners.worker.worker_pool_main:Listening for workers at localhost:46569
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:root:Default Python SDK image for environment is apache/beam_python3.7_sdk:2.27.0
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/examples/wordcount.py", line 99, in <module>
    run()
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/examples/wordcount.py", line 94, in run
ERROR:grpc._channel:Exception iterating requests!
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 195, in consume_request_iterator
    request = next(request_iterator)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/artifact_service.py", line 355, in __next__
    raise self._queue.get()
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 561, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 421, in run_pipeline
    job_service_handle.submit(proto_pipeline)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 115, in submit
    prepare_response.staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 214, in stage
    staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/artifact_service.py", line 241, in offer_artifacts
    for request in requests:
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 803, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b"
    debug_error_string = "{"created":"@1613765341.075846957","description":"Error received from peer ipv4:127.0.0.1:8098","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b","grpc_status":3}"
>
    output | 'Write' >> WriteToText(known_args.output)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 582, in __exit__
    self.result = self.run()
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 561, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 421, in run_pipeline
    job_service_handle.submit(proto_pipeline)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 115, in submit
    prepare_response.staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 214, in stage
    staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/artifact_service.py", line 241, in offer_artifacts
    for request in requests:
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 803, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b"
    debug_error_string = "{"created":"@1613765341.075846957","description":"Error received from peer ipv4:127.0.0.1:8098","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b","grpc_status":3}"

This indicates an error transmitting the job. If I run the job server in the same pod as Airflow, the two containers communicate without any problem; I would like to have the same behavior with them in different pods.
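
Note the debug_error_string above: the client ends up talking to the artifact staging service at 127.0.0.1:8098, i.e. on its own loopback interface, rather than through the Service. A variant worth trying (assuming the Python SDK's --artifact_endpoint pipeline option, which, as I understand it, overrides the artifact endpoint announced by the job server) would be:

# same wordcount run, but pinning the artifact endpoint to the Service
# instead of the address announced by the job server
python -m apache_beam.examples.wordcount \
  --output ./data_test/ \
  --runner=PortableRunner \
  --job_endpoint=beam-spark-job-server:8099 \
  --artifact_endpoint=beam-spark-job-server:8098 \
  --environment_type=LOOPBACK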

1 Answer


You need to deploy the two containers in one pod.

Dipen
  • "If I implement the Job Server in the same pod as airflow I get a full working communication between these two containers, I would like to have the same behavior but with them in different pods. " – Giovani Merlin Feb 24 '21 at 12:24
  • If you want the same environment as that, you would have to use a StatefulSet – Dipen Feb 26 '21 at 11:35
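
For reference, a minimal sketch of the single-pod layout suggested above (images, tags, and names are illustrative; this is not a tested manifest). Containers in the same pod share one network namespace, so Airflow reaches the job server on localhost:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-with-job-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-with-job-server
  template:
    metadata:
      labels:
        app: airflow-with-job-server
    spec:
      containers:
        # both containers share the pod's network namespace,
        # so the Airflow container can use localhost:8099
        - name: airflow
          image: apache/airflow:2.0.1   # illustrative image/tag
        - name: beam-spark-job-server
          image: apache/beam_spark_job_server:2.27.0
          args: ["--spark-master-url=spark://spark-master:7077"]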