
My Hadoop version is 2.6.0-cdh5.10.0, and I am using the Cloudera QuickStart VM.

I am trying to access the HDFS file system from my code so that I can read files and add them as job input or as a cache file.

When I access HDFS from the command line, I am able to list the files.

Command:

[cloudera@quickstart java]$ hadoop fs -ls hdfs://localhost:8020/user/cloudera 
Found 5 items
-rw-r--r--   1 cloudera cloudera        106 2017-02-19 15:48 hdfs://localhost:8020/user/cloudera/test
drwxr-xr-x   - cloudera cloudera          0 2017-02-19 15:42 hdfs://localhost:8020/user/cloudera/test_op
drwxr-xr-x   - cloudera cloudera          0 2017-02-19 15:49 hdfs://localhost:8020/user/cloudera/test_op1
drwxr-xr-x   - cloudera cloudera          0 2017-02-19 15:12 hdfs://localhost:8020/user/cloudera/wc_output
drwxr-xr-x   - cloudera cloudera          0 2017-02-19 15:16 hdfs://localhost:8020/user/cloudera/wc_output1

When I try to access the same path from my MapReduce program, I receive a FileNotFoundException. My sample MapReduce driver configuration code is:

public int run(String[] args) throws Exception {
    Configuration conf = getConf();

    if (args.length != 2) {
        System.err.println("Usage: test <in> <out>");
        System.exit(2);
    }

    ConfigurationUtil.dumpConfigurations(conf, System.out);

    LOG.info("input: " + args[0] + " output: " + args[1]);

    Job job = Job.getInstance(conf);
    job.setJobName("test");

    job.setJarByClass(Driver.class);
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);

    job.addCacheFile(new Path("hdfs://localhost:8020/user/cloudera/test/test.tsv").toUri());

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean result = job.waitForCompletion(true);
    return (result) ? 0 : 1;
}

The job.addCacheFile line in the above snippet results in the FileNotFoundException.

2) My second question:

The entry in my core-site.xml points the default HDFS file system URI at localhost:9000, but from the command prompt I am only able to access the default HDFS file system on port 8020, not on 9000. When I tried port 9000, I ended up with a ConnectionRefused exception. I am not sure where the configuration is actually being read from.
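
For reference, a minimal standalone sketch like the one below (plain Configuration/FileSystem API, nothing specific to my job) should print which default file-system URI the client side actually resolves. Note that new Configuration() loads core-site.xml from whatever configuration directory is on the classpath (e.g. /etc/hadoop/conf on the Cloudera VM), which is not necessarily the file shown further down:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShowDefaultFs {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml/hdfs-site.xml from the configuration directory on the classpath
        Configuration conf = new Configuration();

        // Value of the default file-system key (fs.default.name is the deprecated alias)
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));

        // URI of the file system the client will actually talk to
        URI uri = FileSystem.get(conf).getUri();
        System.out.println("FileSystem URI = " + uri);
    }
}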

My core-site.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!--  
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/student/tmp/hadoop-local/tmp</value>
   <description>A base for other temporary directories.</description>
  </property>
-->
  
 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>Default file system URI.  URI:scheme://authority/path scheme:method of access authority:host,port etc.</description>
</property>
 
</configuration>

My hdfs-site.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

 <property>
  <name>dfs.name.dir</name>
  <value>/tmp/hdfs/name</value>
  <description>Determines where on the local filesystem the DFS name
   node should store the name table(fsimage).</description>
 </property>

 <property>
  <name>dfs.data.dir</name>
  <value>/tmp/hdfs/data</value>
  <description>Determines where on the local filesystem an DFS data node should store its blocks.</description>
 </property>
 
 <property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. Usually 3; 1 in our case.
  </description>
 </property>
</configuration>

I am receiving the following exception:

java.io.FileNotFoundException: hdfs:/localhost:8020/user/cloudera/test/ (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at java.io.FileReader.<init>(FileReader.java:58)
    at hadoop.TestDriver$ActorWeightReducer.setup(TestDriver.java:104)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Any help will be useful!

  • Can you share the argument you are passing when you try to access the file through MapReduce? – siddhartha jain Mar 06 '17 at 01:42
  • @siddhartha jain: hadoop test.jar path-to-driverclass hdfs-path-to-input output – user1477232 Mar 06 '17 at 05:01
  • Can you post the exception the program is throwing? – Hari Singh Mar 06 '17 at 05:07
  • @HariSingh: I have updated the post with the exception I am receiving. – user1477232 Mar 06 '17 at 05:23
  • @user1477232 If you look at the logs, it is trying to read from hdfs:/localhost:8020/user/cloudera/test/, but it should be hdfs://localhost:8020/user/cloudera/test/. So either give three slashes (hdfs:///localhost:8020/), or don't give the full path at all and just write /user/cloudera/test; by default it will resolve against the HDFS path. – Hari Singh Mar 06 '17 at 05:30
  • You have `hdfs://localhost:9000` in the XML, so why are you using `hdfs://localhost:8020`? – OneCricketeer Mar 06 '17 at 07:37
  • @HariSingh: I tried giving /user/cloudera/test but it didn't work; I received a FileNotFoundException. – user1477232 Mar 06 '17 at 08:46
  • @cricket_007: That is the only port (8020) through which I am able to access the HDFS file system. As I said in my post, I initially tried localhost:9000 but ended up with a ConnectionRefused exception. – user1477232 Mar 06 '17 at 08:48
  • Sounds like you didn't restart Hadoop after changing the `core-site.xml`... Though, really, if you are using the Cloudera VM, you shouldn't need to edit any XML files at all. – OneCricketeer Mar 06 '17 at 08:56

1 Answer


You are not required to give the full path as an argument when accessing a file in HDFS. The NameNode URI (taken from core-site.xml) is prefixed as hdfs://host:port on its own, so you only need to give the path of the file within HDFS, which in your case should be /user/cloudera/test.
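
A rough sketch of what that could look like on both the driver and the task side (the CacheFileSketch and SketchReducer classes and the #lookup symlink name are only illustrative, not taken from your job):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class CacheFileSketch {

    // Driver side: pass only the HDFS path; the scheme and authority are taken
    // from the default file system configured in core-site.xml. The "#lookup"
    // fragment names the symlink created in each task's working directory.
    public static void addLookupFile(Job job) throws Exception {
        job.addCacheFile(new URI("/user/cloudera/test#lookup"));
    }

    // Task side: on a cluster the cached file is localized and symlinked into
    // the task's working directory, so it can be opened with a plain FileReader.
    public static class SketchReducer extends Reducer<Text, Text, Text, DoubleWritable> {
        @Override
        protected void setup(Context context) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // parse the lookup data here
                }
            }
        }
    }
}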

Coming to your second question: port 8020 is the default port for HDFS. That is why you are able to access HDFS on port 8020 even though you did not configure it anywhere. The ConnectionRefused exception occurs because HDFS is started on 8020, so nothing is listening for requests on port 9000 and the connection is refused.
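
If you want to confirm that quickly, a throwaway probe like the one below (plain java.net.Socket, nothing Hadoop specific) shows which of the two ports actually has a listener; connecting to 9000 should fail with the same connection-refused error:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) {
        for (int port : new int[] {8020, 9000}) {
            try (Socket socket = new Socket()) {
                // Short timeout so a dead port fails fast
                socket.connect(new InetSocketAddress("localhost", port), 2000);
                System.out.println("localhost:" + port + " -> something is listening");
            } catch (IOException e) {
                System.out.println("localhost:" + port + " -> " + e.getMessage());
            }
        }
    }
}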

Refer here for more details about the default ports.

  • I tried giving /user/cloudera/test but it didn't work; I received a FileNotFoundException. Regarding "The reason for the ConnectionRefused exception is that HDFS is started on 8020, so port 9000 is not expecting any request and refuses the connection": how should I fix this? – user1477232 Mar 06 '17 at 08:52
  • Change the port in core-site.xml to 8020. – siddhartha jain Mar 06 '17 at 08:55