
I found the question "Where does HDFS store files locally by default?".

My HDFS stores data in the /tmp/ folder, which gets deleted by the system.

I want to change where HDFS stores files locally.

I am looking in hdfs-default.xml but cannot find dfs.data.dir.

Running bin/hadoop version gives:

Hadoop 2.8.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 66c47f2a01ad9637879e95f80c41f798373828fb
Compiled by jdu on 2017-10-19T20:39Z
Compiled with protoc 2.5.0
From source with checksum dce55e5afe30c210816b39b631a53b1d
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.8.2.jar

Edit:
Specifically, which file should I edit, and how, to change where HDFS stores files locally?

Haha TTpro

3 Answers


Thanks to @ultimoTG for the hint.

So, my solution is to find the file hdfs-default.xml in my Hadoop directory (this file is for reference only; changing the config there does NOT work):

$HADOOP_HOME/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
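
If it is not at that exact path in your installation, a quick search can locate it (assuming $HADOOP_HOME is set):

find $HADOOP_HOME -name hdfs-default.xml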

Then copy the properties you want to change from hdfs-default.xml into $HADOOP_HOME/etc/hadoop/hdfs-site.xml and modify their values there.

This is my $HADOOP_HOME/etc/hadoop/hdfs-site.xml, which changes the directory where HDFS stores files locally to a folder under Downloads.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/my_name/Downloads/hadoop_data/dfs/name</value>
        <description>Determines where on the local filesystem the DFS name node
            should store the name table (fsimage). If this is a comma-delimited list
            of directories then the name table is replicated in all of the
            directories, for redundancy.</description>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/my_name/Downloads/hadoop_data/dfs/data</value>
        <description>Determines where on the local filesystem a DFS data node
            should store its blocks. If this is a comma-delimited
            list of directories, then data will be stored in all named
            directories, typically on different devices. The directories should be tagged
            with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
            storage policies. The default storage type will be DISK if the directory does
            not have a storage type tagged explicitly. Directories that do not exist will
            be created if local filesystem permission allows.</description>
    </property>
</configuration>
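
HDFS will not pick up the new directories until it is restarted, and a fresh dfs.namenode.name.dir must be formatted before the NameNode can start. A minimal sketch for a single-node setup (assuming the stock sbin scripts, and that you have no HDFS data to keep, since formatting erases the namespace):

# stop HDFS before changing the storage directories
$HADOOP_HOME/sbin/stop-dfs.sh

# create the new directories
mkdir -p /home/my_name/Downloads/hadoop_data/dfs/name
mkdir -p /home/my_name/Downloads/hadoop_data/dfs/data

# format the new NameNode directory; this wipes the HDFS namespace,
# so only do it on a fresh setup or after backing up your data
$HADOOP_HOME/bin/hdfs namenode -format

# start HDFS again; data now lands under Downloads/hadoop_data
$HADOOP_HOME/sbin/start-dfs.sh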
Haha TTpro

Look for dfs.datanode.data.dir. Docs here - http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
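
To verify which value is actually in effect after editing hdfs-site.xml, hdfs getconf can print it; a quick check, assuming the hdfs command is on your PATH:

hdfs getconf -confKey dfs.datanode.data.dir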

ultimoTG

When you first extract Hadoop, hdfs-site.xml is present in $HADOOP_HOME/etc/hadoop and is empty by default. You can add the following configuration to your hdfs-site.xml to change the local store location:

<property>
    <name>dfs.data.dir</name>
    <value>path_to_dir</value>
</property>

<property>
    <name>dfs.name.dir</name>
    <value>path_to_dir</value>
</property>

Note that dfs.data.dir and dfs.name.dir are the deprecated Hadoop 1.x names; on Hadoop 2.x they still work but map to dfs.datanode.data.dir and dfs.namenode.name.dir.
Geetika