
I have a two-node cluster (Hadoop v1.0.4), a master and a slave. On the master, in Tool.run(), we add two files to the DistributedCache using addCacheFile(). The files do exist in HDFS. In Mapper.setup() we want to retrieve those files from the cache using

FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path );
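
For reference, the adding side in run() is roughly the following sketch (the HDFS paths are illustrative, not the real ones, and the job setup is abbreviated):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

// Inside Tool.run(), before job submission (run() declares throws Exception):
Configuration conf = getConf();
Job job = new Job(conf, "analytics");
// Register the two HDFS files with the DistributedCache.
// addCacheFile() must be called on the job's own Configuration.
DistributedCache.addCacheFile(new URI("/tmp/analytics/1.csv"), job.getConfiguration());
DistributedCache.addCacheFile(new URI("/tmp/analytics/2.csv"), job.getConfiguration());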

The problem is that for one file a FileNotFoundException is thrown, although the file exists on the slave node:

attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException: File does not exist: /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv

ls -l on the slave:

[hduser@slave ~]$ ll /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/ analytics/1.csv                        
-rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/ analytics/1.csv

My questions are:

  1. Shouldn't all files exist on all nodes?
  2. What should be done to fix that?

Thanks.

  • The `ll` output shows a space before *analytics*. The space is missing in the path you provide to hadoop. – Joe23 Nov 22 '12 at 09:19
  • I edited the output to hide some sensitive info, probably entered a space by mistake. On my environment, there is no difference between the path provided to hadoop and the path to file on the node. Thanks for paying attention... – Seffy Nov 22 '12 at 10:28

1 Answer


Solved - I should have used:

FileSystem.getLocal( conf ) 

Thanks to Harsh J from the Hadoop mailing list.
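
For completeness: FileSystem.get(conf) returns the default filesystem (HDFS in this setup), but the cached files are materialized on each task tracker's local disk, which is why the localized path could not be found. They have to be opened through the local filesystem instead. A rough sketch of the corrected setup(), assuming the localized paths come from DistributedCache.getLocalCacheFiles():

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Inside the Mapper subclass:
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    // The cache files live on this node's local disk, so use the local FS,
    // not the default (HDFS) one returned by FileSystem.get(conf).
    FileSystem localFs = FileSystem.getLocal(conf);
    // Localized, node-local paths of the files added with addCacheFile().
    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
    for (Path p : cacheFiles) {
        FSDataInputStream in = localFs.open(p);
        try {
            // ... read the file ...
        } finally {
            in.close();
        }
    }
}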
