
I am trying out local Kubernetes (Docker Desktop for Mac) and trying to submit a Spark job. The Spark job connects to a PostgreSQL database and does some calculations.

PostgreSQL is running in my Kubernetes cluster, and since I have published it, I can access it from the host via localhost:5432. However, when the Spark application tries to connect to PostgreSQL, it throws

Exception in thread "main" org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
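The connection is made over JDBC (the PSQLException comes from the PostgreSQL JDBC driver). A minimal sketch of the read, where the database name, table, and credentials are placeholders and only the localhost:5432 URL matches the error above:

import org.apache.spark.sql.SparkSession

object StoreJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-store").getOrCreate()

    // Fails in-cluster with the exception above
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/mydb") // "mydb" is a placeholder
      .option("dbtable", "my_table")                          // placeholder table
      .option("user", "postgres")
      .option("password", sys.env("PGPASSWORD"))
      .load()

    println(df.count())
    spark.stop()
  }
}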

kubectl cluster-info

Kubernetes master is running at https://kubernetes.docker.internal:6443
KubeDNS is running at https://kubernetes.docker.internal:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

kubectl get service postgresql-published

NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP   PORTS
postgresql-published  LoadBalancer   10.106.15.112   localhost     5432:31277

kubectl describe service spark-store-1588217023181-driver-svc

Name:              spark-store-1588217023181-driver-svc
Namespace:         default
Labels:            <none>
Annotations:       <none>
Selector:          spark-app-selector=spark-533ecb8556b6439eb938d487cc77c330,spark-role=driver
Type:              ClusterIP
IP:                None
Port:              driver-rpc-port  7078/TCP
TargetPort:        7078/TCP
Endpoints:         <none>
Port:              blockmanager  7079/TCP
TargetPort:        7079/TCP
Endpoints:         <none>
Session Affinity:  None

How can I make my Spark job have access to the PostgreSQL service?

JDev
  • Try to connect with the Postgres host as `postgresql-published`, because that's your service. – Oli Apr 30 '20 at 06:13

2 Answers


localhost is there in EXTERNAL-IP, but the Kubernetes cluster DNS system (CoreDNS) does not know how to resolve it to an IP address. EXTERNAL-IP is supposed to be resolved by an external DNS server; it is meant for connecting to Postgres from outside the Kubernetes cluster (i.e. from another system, or from the Kubernetes nodes), not from inside the cluster (i.e. from another pod).

Postgres should be accessible from the Spark pod via 10.106.15.112:5432 or postgresql-published:5432, because the cluster DNS system knows how to resolve those.
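In the Spark job, that means pointing the JDBC URL at the service name instead of localhost. A minimal sketch (database name, table, and credentials are placeholders):

// The service name is resolved by CoreDNS from any pod in the cluster
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://postgresql-published:5432/mydb")
  .option("dbtable", "my_table")
  .option("user", "postgres")
  .option("password", sys.env("PGPASSWORD"))
  .load()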

Test the Postgres connectivity

kubectl run postgresql-postgresql-client --rm --tty -i --restart='Never' --namespace default --image bitnami/postgresql --env="PGPASSWORD=<HERE_YOUR_PASSWORD>" --command -- psql --host <HERE_HOSTNAME=SVC_OR_IP> -U <HERE_USERNAME> 
Arghya Sadhu
  • Thanks for the help!! PostgreSQL has an external IP and I am able to connect to it through my terminal. However, when I submit the Spark job, the job is not able to connect to the database. – JDev Apr 30 '20 at 05:59
  • Are you able to access it via 10.106.15.112:5432 from the pod? – Arghya Sadhu Apr 30 '20 at 06:00
  • Using 10.106.15.112:5432 did work. Please help me understand why it is not working with localhost. – JDev Apr 30 '20 at 06:22
  • EXTERNAL-IP is only externally resolvable. postgresql-published:5432 should work from the Spark pod. – Arghya Sadhu Apr 30 '20 at 06:36
NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP   PORTS
postgresql-published  LoadBalancer   10.106.15.112   localhost     5432:31277

This means that the service is accessible within the cluster at 10.106.15.112:5432 and postgresql-published:5432, and externally at localhost:31277.

Note that from a Pod's point of view, localhost is the Pod itself, so in this case localhost looks ambiguous. However, that is how exposing the service works.
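If the Spark driver pod runs in a namespace other than the one holding the service, the short name alone will not resolve; the fully qualified in-cluster DNS name will (a sketch, assuming the service lives in the default namespace and "mydb" is a placeholder):

// <service>.<namespace>.svc.cluster.local resolves from any namespace
val url = "jdbc:postgresql://postgresql-published.default.svc.cluster.local:5432/mydb"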

Nick