How do I properly reach my HDFS in the cloud from my laptop with the hdfs dfs command?
I set up HDFS on EC2 following this guide, and I've configured my (AWS) security group to allow all inbound and outbound traffic on the network interface (so I'm ruling out networking issues). I'm able to open http://18.216.33.186:8088/cluster/nodes from my desktop; 18.216.33.186 is the public IP of my EC2 instance.
However, when I try to access HDFS from the command line using my Ubuntu host's public IP, it doesn't work. Can you please help me?
Jennys-MacBook-Pro:~ jennylian$ hdfs dfs -ls -h -R hdfs://18.216.33.186:9000
Jenny Modified JAVA /Library/Java/JavaVirtualMachines/applejdk-11.0.17.8.3.jdk/Contents/Home
2023-03-21 12:35:36,404 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From Jennys-MacBook-Pro.local/127.0.0.1 to ec2-18-216-33-186.us-east-2.compute.amazonaws.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
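To double-check that this isn't a plain networking problem, I believe a raw TCP check like the following from the laptop would show whether anything answers on port 9000 at all, independent of Hadoop (just a sketch, assuming nc is installed; I'm not pasting its output here):
# TCP-level check: is anything listening on port 9000 of the instance?
nc -vz 18.216.33.186 9000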
This is how I've configured HDFS on the Ubuntu machine. If I don't use localhost there, HDFS won't work (for example, pointing fs.defaultFS at a non-localhost address; see the sketch after the config below).
ubuntu@ip-172-31-11-139:~$ cat /opt/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
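For reference, this is roughly the kind of non-localhost value I mean (using the instance's private IP purely as an example; the exact value I tried may have differed), and with a value like this HDFS doesn't come up for me:
<!-- example only: replacing localhost with a non-loopback address such as the private IP -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://172.31.11.139:9000</value>
</property>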
From the Ubuntu machine that actually runs HDFS (working great):
ubuntu@ip-172-31-11-139:~$ hdfs dfs -ls -h -R hdfs://localhost:9000/
drwxr-xr-x - ubuntu supergroup 0 2023-03-21 18:44 hdfs://localhost:9000/jenny
drwxr-xr-x - ubuntu supergroup 0 2023-03-21 18:42 hdfs://localhost:9000/user
-rw-r--r-- 1 ubuntu supergroup 20 2023-03-21 18:42 hdfs://localhost:9000/user/helloworld.txt
drwxr-xr-x - ubuntu supergroup 0 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu
drwxr-xr-x - ubuntu supergroup 0 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop
-rw-r--r-- 1 ubuntu supergroup 8.6 K 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop/capacity-scheduler.xml
-rw-r--r-- 1 ubuntu supergroup 1.3 K 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop/configuration.xsl
-rw-r--r-- 1 ubuntu supergroup 1.2 K 2023-03-21 18:41 hdfs://localhost:9000/user/ubuntu/hadoop/container-executor
This is what happens when I try to access HDFS from my laptop, and it doesn't work:
Jennys-MacBook-Pro:aws jennylian$ hdfs dfs -ls -h -R hdfs://18.216.33.18:9000/
Jenny Modified JAVA /Library/Java/JavaVirtualMachines/applejdk-11.0.17.8.3.jdk/Contents/Home
2023-03-21 12:48:13,435 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-03-21 12:48:36,978 INFO ipc.Client: Retrying connect to server: ec2-18-216-33-18.us-east-2.compute.amazonaws.com/18.216.33.18:9000. Already tried 0 time(s); maxRetries=45
2023-03-21 12:48:56,986 INFO ipc.Client: Retrying connect to server: ec2-18-216-33-18.us-east-2.compute.amazonaws.com/18.216.33.18:9000. Already tried 1 time(s); maxRetries=45
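My current guess is that the NameNode is only bound to the loopback interface because of the localhost value in core-site.xml. I believe something like this on the EC2 instance would show which address port 9000 is actually listening on (assuming ss is available; I haven't included its output here):
# on the EC2 instance: which address is the NameNode RPC port bound to?
sudo ss -tlnp | grep 9000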