I have a single-node Hadoop cluster on EC2. I have tried every possible combination of entries in the slaves file. The DFSClient debug log shows the write pipeline using the internal IP:

May 01 2020 08:16:25.227 DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.31.45.114:9866 
May 01 2020 08:16:25.227 DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.31.45.114:9866 
May 01 2020 08:16:25.228 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 172.31.45.114:9866 
May 01 2020 08:16:25.228 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 172.31.45.114:9866 
May 01 2020 08:16:35.167 DEBUG org.apache.hadoop.ipc.Client - IPC Client (2007716372) connection to ec-x.x.x.x/x.x.x.x:54310 from vgs: closed 

I have tried to bind the DataNode to the external IP, but it does not bind; by default it binds to the machine's internal IP.
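
On EC2 the public IP is provided via NAT and never appears on the instance's network interface, which is presumably why the bind fails. For reference, the usual workaround is a wildcard bind via the standard dfs.datanode.address property in hdfs-site.xml; a minimal sketch (port 9866 taken from the log above):

    <!-- Sketch: make the DataNode listen on all interfaces
         (the EC2 public IP cannot be bound to directly) -->
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:9866</value>
    </property>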

I have also set dfs.client.use.datanode.hostname to true, but the client still receives the internal IP, not the external one.
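
For reference, a sketch of the hostname-related settings in hdfs-site.xml (both are standard HDFS properties; dfs.datanode.use.datanode.hostname is the server-side counterpart, and the switch only helps if the DataNode's hostname resolves to the external IP from the client's network, which is an assumption about this setup):

    <!-- Sketch: client-side and server-side hostname switches;
         requires the hostname to resolve to the external IP
         on the client's side -->
    <property>
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.datanode.use.datanode.hostname</name>
      <value>true</value>
    </property>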

user3190018

1 Answer

In order to run Spark on EMR you need at least two nodes (I managed to run it on a minimum of three, but from what I'm reading, I assume two should also be enough); one MASTER node alone is not enough. You need a MASTER and a CORE node. Here is a more comprehensive guide on how to do it: https://medium.com/big-data-on-amazon-elastic-mapreduce/run-a-spark-job-within-amazon-emr-in-15-minutes-68b02af1ae16
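
As a sketch, a minimal MASTER + CORE cluster can be created with the AWS CLI roughly like this (cluster name, key pair, instance type and release label are placeholders, not values from this question):

    # Sketch: one MASTER and one CORE node running Spark on EMR;
    # name, key pair, instance type and release label are placeholders
    aws emr create-cluster \
      --name "spark-test" \
      --release-label emr-5.30.0 \
      --applications Name=Spark \
      --instance-groups \
          InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge \
          InstanceGroupType=CORE,InstanceCount=1,InstanceType=m5.xlarge \
      --ec2-attributes KeyName=my-key \
      --use-default-roles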

Grzegorz Skibinski