I installed the Cloudera CDH4 distribution on a single machine in pseudo-distributed mode and verified that it was working correctly (e.g. I can run MapReduce programs, insert data into the Hive server, etc.). However, if I change the core-site.xml file so that fs.default.name is set to the machine name rather than localhost and restart the NameNode service, HDFS enters safe mode.
Before changing fs.default.name, I ran the following to check the state of HDFS:
$ hadoop dfsadmin -report
...
Configured Capacity: 18503614464 (17.23 GB)
Present Capacity: 13794557952 (12.85 GB)
DFS Remaining: 13790785536 (12.84 GB)
DFS Used: 3772416 (3.60 MB)
DFS Used%: 0.03%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
Then I made the modification to core-site.xml (with the machine name being hadoop):
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop:8020</value>
</property>
I restarted the service and reran the report.
$ sudo service hadoop-hdfs-namenode restart
$ hadoop dfsadmin -report
...
Safe mode is ON
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
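If I am reading the zeroed figures correctly, they would mean that no DataNodes are currently reporting in to the NameNode, which would also explain why it stays in safe mode. Assuming the standard CDH4 service names, the DataNode service can be checked and restarted with:
$ sudo service hadoop-hdfs-datanode status
$ sudo service hadoop-hdfs-datanode restart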
An interesting note is that I can still perform some HDFS commands. For example, I can run
$ hadoop fs -ls /tmp
However, if I try to read a file using hadoop fs -cat, or try to place a file into HDFS, I am told the NameNode is in safe mode.
$ hadoop fs -put somefile .
put: Cannot create file/user/hadinstall/somefile._COPYING_. Name node is in safe mode.
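The safe-mode state can also be queried directly with the standard dfsadmin subcommand, which in this state reports:
$ hadoop dfsadmin -safemode get
Safe mode is ON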
The reason I need fs.default.name to be set to the machine name is that I need to communicate with this machine on port 8020 (the default NameNode port). If fs.default.name is left as localhost, the NameNode service will not listen for external connection requests.
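As a sanity check on the binding, netstat can show which address the NameNode is actually listening on (with fs.default.name set to localhost it binds only to the loopback interface 127.0.0.1; with the machine name it should bind to the machine's address):
$ sudo netstat -tlnp | grep 8020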
I am at a loss as to why this is happening and would appreciate any help.