
How do I properly reach my HDFS in the cloud from my laptop with the hdfs dfs command?

I set up HDFS on EC2 following this guide, and I've configured my AWS security group to allow all inbound and outbound traffic on the network interface (so I'm ruling out networking issues). I'm able to open http://18.216.33.186:8088/cluster/nodes from my desktop. 18.216.33.186 is the public IP of my EC2 instance.

However, when I try to access HDFS from the command line using the Ubuntu host's public IP, it doesn't work. Can you help me, please?

Jennys-MacBook-Pro:~ jennylian$ hdfs dfs -ls -h -R hdfs://18.216.33.186:9000
Jenny Modified JAVA /Library/Java/JavaVirtualMachines/applejdk-11.0.17.8.3.jdk/Contents/Home
2023-03-21 12:35:36,404 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From Jennys-MacBook-Pro.local/127.0.0.1 to ec2-18-216-33-186.us-east-2.compute.amazonaws.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

This is how I've configured HDFS on Ubuntu. If I don't use localhost there, HDFS won't work.

ubuntu@ip-172-31-11-139:~$ cat /opt/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

From the Ubuntu machine that actually runs HDFS (working great):

ubuntu@ip-172-31-11-139:~$ hdfs dfs -ls -h -R hdfs://localhost:9000/
drwxr-xr-x   - ubuntu supergroup          0 2023-03-21 18:44 hdfs://localhost:9000/jenny
drwxr-xr-x   - ubuntu supergroup          0 2023-03-21 18:42 hdfs://localhost:9000/user
-rw-r--r--   1 ubuntu supergroup         20 2023-03-21 18:42 hdfs://localhost:9000/user/helloworld.txt
drwxr-xr-x   - ubuntu supergroup          0 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu
drwxr-xr-x   - ubuntu supergroup          0 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop
-rw-r--r--   1 ubuntu supergroup      8.6 K 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop/capacity-scheduler.xml
-rw-r--r--   1 ubuntu supergroup      1.3 K 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop/configuration.xsl
-rw-r--r--   1 ubuntu supergroup      1.2 K 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop/container-executor

This is me trying to access HDFS from my laptop, and it doesn't work:

Jennys-MacBook-Pro:aws jennylian$ hdfs dfs -ls -h -R hdfs://18.216.33.18:9000/
Jenny Modified JAVA /Library/Java/JavaVirtualMachines/applejdk-11.0.17.8.3.jdk/Contents/Home
2023-03-21 12:48:13,435 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-03-21 12:48:36,978 INFO ipc.Client: Retrying connect to server: ec2-18-216-33-18.us-east-2.compute.amazonaws.com/18.216.33.18:9000. Already tried 0 time(s); maxRetries=45
2023-03-21 12:48:56,986 INFO ipc.Client: Retrying connect to server: ec2-18-216-33-18.us-east-2.compute.amazonaws.com/18.216.33.18:9000. Already tried 1 time(s); maxRetries=45

But this works: http://18.216.33.186:8088/cluster/nodes

Jenny Lian

1 Answer


Here is one likely reason: in core-site.xml the NameNode address is localhost:9000, so the NameNode process is listening on 127.0.0.1:9000 (you can verify this with the netstat command). But when you run hdfs dfs from your laptop, the client connects to 18.216.33.18:9000, and nothing is listening on that interface.
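One way to verify this on the EC2 host (a sketch, assuming the iproute2 `ss` tool or the classic `netstat` is installed on the instance):

```shell
# Run on the EC2 host: show which local address port 9000 is bound to.
# "127.0.0.1:9000" means loopback only -- remote clients will get
# "Connection refused". "0.0.0.0:9000" means all interfaces.
(ss -tln 2>/dev/null || netstat -tln 2>/dev/null) | grep ':9000' \
    || echo "nothing is listening on port 9000"
```

With the core-site.xml shown in the question, the listening address printed here would be the loopback one, which is consistent with the connection refused error seen from the laptop.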

The logs confirm this. In 2023-03-21 12:48:36,978 INFO ipc.Client: Retrying connect to server: ec2-18-216-33-18.us-east-2.compute.amazonaws.com/18.216.33.18:9000. Already tried 0 time(s); maxRetries=45 we can see the client is connecting to 18.216.33.18:9000.

You can change fs.defaultFS in core-site.xml to hdfs://18.216.33.18:9000 and try again.
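That change would look something like this (a sketch; note that on EC2 the public IP is NAT-mapped and not configured on the instance's own interface, so if the NameNode fails to bind to it, the public DNS name or the bind-host setting discussed below is the usual workaround):

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://18.216.33.18:9000</value>
    </property>
</configuration>
```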

Ultimately, you need to make sure that clients connect to the same interface or IP on which port 9000 is listening.

Since you want to access it from an outside network, you need to make sure the NameNode binds to an interface that is reachable as 18.216.33.18:9000.
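One way to do that is the `dfs.namenode.rpc-bind-host` property in hdfs-site.xml, which makes the NameNode RPC server listen on the given address regardless of the hostname in fs.defaultFS (a sketch; binding to the wildcard address exposes the port on every interface, so the security group rules become the only access control):

```xml
<!-- hdfs-site.xml: have the NameNode RPC server listen on all interfaces,
     independent of the hostname configured in fs.defaultFS -->
<property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
</property>
```

With this in place, fs.defaultFS can keep a name that resolves correctly for each client (the public DNS name resolves to the public IP from outside and to the private IP from inside the VPC), while the server still accepts connections on the EC2 instance's actual network interface.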