I have an AWS EMR cluster with Spark. I can connect to it (the Spark Thrift Server):
- from the master node after SSHing into it
- from another AWS EMR cluster

But I am NOT able to connect to it:
- from my local machine (macOS Mojave)
- from Metabase
- from Redash
I have read the answers to this question. I have checked that folder permissions and disk space are fine on all the nodes. My assumption is that I'm facing a problem similar to the one James Wierzba is asking about in the comments. However, I do not have enough reputation to add a comment there. Also, this might be a different problem, considering it is specific to AWS EMR.
Connection works fine after SSHing to the master node.
# SSHed to master node
$ ssh -i ~/identityfile hadoop@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
# on master node
$ /usr/lib/spark/bin/beeline -u 'jdbc:hive2://localhost:10001/default'
# it connects fine and I can run commands, e.g., 'show databases;'
# Beeline version 1.2.1-spark2-amzn-0 by Apache Hive
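While SSHed in, this is roughly how one can check which interface the Thrift server is bound to (a sketch; the exact tooling may differ by EMR release):

# on master node: see whether port 10001 listens on 0.0.0.0 or only on 127.0.0.1
$ sudo netstat -tlnp | grep 10001
# or, equivalently
$ sudo lsof -iTCP:10001 -sTCP:LISTEN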
Connection to this node also works fine from the master node of another EMR cluster.
However, the connection does not work from my local machine (macOS Mojave), Metabase, or Redash.
My local machine:
# installed hive (for beeline)
$ brew install hive
# Beeline version 3.1.1 by Apache Hive
# connect directly
# I have checked that all ports are open for my IP
$ beeline -u 'jdbc:hive2://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:10001/default'
# ERROR: ConnectException: Operation timed out
#
# this connection timeout probably has something to do with Spark accepting only localhost connections,
# even though I have allowed all the ports in the AWS security group for my IP
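To verify the security group rules from my side, something like this can be used (a sketch; sg-xxxxxxxx stands in for the master node's security group ID):

# list the inbound rules of the master node's security group
$ aws ec2 describe-security-groups --group-ids sg-xxxxxxxx \
    --query 'SecurityGroups[0].IpPermissions'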
# connect via port forwarding
# open an SSH tunnel forwarding local port 10001 to the master node
$ ssh -i ~/identityfile -Nf -L 10001:localhost:10001 hadoop@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
$ beeline -u 'jdbc:hive2://localhost:10001/default'
# Failed to connect to localhost:10001
# Required field 'client_protocol' is unset!
$ beeline -u 'jdbc:hive2://localhost:10001/;transportMode=http'
# org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response
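The 'client_protocol' error looks like a client/server version mismatch (my local Beeline 3.1.1 against the server's 1.2.1-spark2-amzn-0). As a sanity check, one can run the cluster's own, matching Beeline non-interactively over SSH (a sketch):

# run the Beeline shipped with the cluster instead of the local 3.1.1 one
$ ssh -i ~/identityfile hadoop@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com \
    "/usr/lib/spark/bin/beeline -u 'jdbc:hive2://localhost:10001/default' -e 'show databases;'"

This only confirms the server side works; it does not help Metabase or Redash, which use their own drivers.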
I have set up Metabase and Redash on EC2.
Metabase → connect using data source Spark SQL → results in
java.sql.SQLException: org.apache.spark.SparkException: java.io.IOException: Failed to create local dir in /mnt/tmp/blockmgr*
Redash → connect using data source Hive → results in the same error.
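For reference, the permission and disk-space checks mentioned above were along these lines on each node (a sketch; the service user may be hive, spark, or hadoop depending on the setup):

# confirm /mnt/tmp exists, is writable, and the volume has free space
$ ls -ld /mnt/tmp
$ sudo -u hive touch /mnt/tmp/write-test && sudo rm -f /mnt/tmp/write-test
$ df -h /mnt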