
I'm trying to access HDFS from Java code, but I can't get it working... after two days of struggling I think it's time to ask for help.

This is my code:

Configuration conf = new Configuration();           
conf.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
conf.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));
FileSystem hdfs = FileSystem.get(conf);

boolean success = hdfs.mkdirs(new Path("/user/cloudera/testdirectory"));
System.out.println(success);
        

I got this code from here and here. Unfortunately the hdfs object is just a LocalFileSystem object, so something must be wrong. It looks like this is exactly what Rejeev wrote on his website:

[...] If you do not assign the configurations to conf object (using hadoop xml file) your HDFS operation will be performed on the local file system and not on the HDFS. [...]

With absolute paths I get the same result.

conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))

This is the library I'm currently using:

hadoop-core-2.0.0-mr1-cdh4.4.0.jar

I heard that hadoop-core was split into multiple libs so I also tried the following libs:

hadoop-common-2.0.0-alpha.jar

hadoop-mapreduce-client-core-2.0.2-alpha.jar

I'm using Cloudera CDH4.4.0, so Hadoop is already installed. From the console everything works fine. For example:

hadoop fs -mkdir testdirectory

So everything should be set up correctly by default.

I hope that you guys can help me... this stuff is driving me nuts! It's extremely frustrating to fail with such a simple task.

Many thanks in advance for any help.

Tim

3 Answers


Try this:

conf.set("fs.defaultFS", "file:///"); conf.set("mapreduce.framework.name", "local");

Hajmola
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post - you can always comment on your own posts, and once you have sufficient [reputation](http://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](http://stackoverflow.com/help/privileges/comment). – Ben Jan 06 '15 at 21:45
  • @Ben Why is that not an answer? – Hajmola Jan 06 '15 at 23:40
  • The question clearly specifies that he wants to access the HDFS filesystem, yet your suggestion is to set the default implementation to be local. Do you see the problem? – Matt Fortier Mar 23 '15 at 08:19

1) You don't need to call conf.addResource unless you are overriding configuration variables.

2) Make sure you are creating a jar file and running it from the command line, not from Eclipse. If you execute it in Eclipse, it will run against the local file system.

3) I ran the code below and it worked.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Hmkdirs {
    public static void main(String[] args) throws IOException {
        // Uses whatever configuration the hadoop command puts on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        boolean success = fs.mkdirs(new Path("/user/cloudera/testdirectory1"));
        System.out.println(success);
    }
}

4) To execute it, you need to create a jar file. You can do that either from Eclipse or from the command prompt, and then run the jar file.

Sample of building a jar file at the command prompt:

javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar -d classes WordCount.java && jar -cvf WordCount.jar -C classes/ .

Running the jar file via hadoop at the command prompt:

hadoop jar hadoopfile.jar hadoop.sample.fileaccess.Hmkdirs

hadoop.sample.fileaccess is the package in which my class Hmkdirs exists. If your class is in the default package, you don't have to specify it; just the class name is fine.


Update: You can execute from Eclipse and still access HDFS; check the code below.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HmkdirsFromEclipse {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Pass a Path so the files are read from the local filesystem;
        // a plain String would be looked up as a classpath resource instead.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020/");
        conf.set("hadoop.job.ugi", "cloudera");
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        FileSystem fs = FileSystem.get(conf);
        boolean success = fs.mkdirs(new Path("/user/cloudera/testdirectory9"));
        System.out.println(success);
    }
}

user1652210
  • Thank you so much for your replies! :) I followed your steps 1-4 and executed the application via console with "hadoop jar hadoopfile.jar hadoop.sample.fileaccess.Hmkdirs", then the console said "True" and the new folder in HDFS was created. I guess the problem was that I didn't use "hadoop jar". However, when running the code in eclipse I get the following error: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory. How did you get this working? What libraries are you using? Thanks again so much :) – Tim Jan 08 '15 at 20:48
  • Add the following imports: import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; and check that your build path includes these jar files: hadoop-hdfs and hadoop-common. – user1652210 Feb 19 '15 at 16:21

This is indeed a tricky bit of configuration, but this is essentially what you need to do:

    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    conf.set("fs.defaultFS", "hdfs://[your namenode]");
    conf.set("hadoop.job.ugi", "[your user]");
    conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());

Make sure you have hadoop-hdfs on your classpath, too.
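
If you are building with Maven, dependencies along these lines should pull in the client jars and their transitive dependencies (a sketch only: the 2.0.0-cdh4.4.0 version string is an assumption based on the CDH 4.4.0 mentioned in the question, and CDH artifacts are served from Cloudera's Maven repository, which has to be declared in your POM):

    <!-- Versions are assumptions; align them with your CDH 4.4.0 cluster. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.0.0-cdh4.4.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.0.0-cdh4.4.0</version>
    </dependency>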

Erik Schmiegelow
  • Erik, thanks for your reply. The code works for me when I run it from the console via "hadoop jar" as user1652210 has described. Can you run the code from Eclipse? Is the hadoop-hdfs library the only one you are using? I get several NoClassDefFoundErrors when running the code from Eclipse. – Tim Jan 08 '15 at 21:07
  • You'll need hadoop-hdfs and hadoop-common at least. Be aware that library names have changed between CDH4 and CDH5. You should also use Maven or Gradle to assemble your classpath so you grab transitive dependencies too. – Erik Schmiegelow Jan 09 '15 at 09:05