
My program uses DistributedCache to cache files:

    JobConf conf = new JobConf(new Configuration(), ItemMining.class);
    DistributedCache.addCacheFile(new URI("output1/FList.txt"), conf);
    DistributedCache.addCacheFile(new URI("output1/GList.txt"), conf);

I read the files back in configure():

    configure() {
        ..
        localFiles = DistributedCache.getLocalCacheFiles(job);
        FileSystem fs = FileSystem.get(job);
        FSDataInputStream inF = fs.open(localFiles[0]);
        ..
    }

The whole program runs and gives the right result in Eclipse, but when I run it on the Hadoop cluster, I find that this part is never called! Why does this happen? Do I need to set something in the configuration?

1 Answer


Problem solved. It turns out that I made two mistakes:

1) I added a System.out.println() at the beginning of configure(), but its output never showed up. It turns out that you cannot see System.out.println() output from the map/reduce phases on the console; if we want to see it, we need to check the task logs. For details, thanks to Where does hadoop mapreduce framework send my System.out.print() statements ? (stdout)
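
For illustration, here is a minimal sketch of logging from a task instead of printing (the class name LoggingMapper and the placeholder map logic are made up for this example); these messages end up in the task's syslog, which you can browse through the web UI:

    import java.io.IOException;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LoggingMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        // commons-logging is what Hadoop itself uses; these messages go to the task logs
        private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

        @Override
        public void configure(JobConf job) {
            // Visible in the task's syslog, not on the console where the job was submitted
            LOG.info("configure() was called");
        }

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(value, new Text("seen")); // placeholder map logic
        }
    }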

2) My real error is related to DistributedCache. I added a file and want to read it into memory; since getLocalCacheFiles() returns paths on the task node's local disk, we need to open them with FileSystem.getLocal(), as follows:

    localFiles = DistributedCache.getLocalCacheFiles(job);
    FileSystem fs = FileSystem.getLocal(job);
    FSDataInputStream inF = fs.open(localFiles[0]); 

Thanks to Hadoop: FileNotFoundExcepion when getting file from DistributedCache
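
For reference, here is a sketch of how the whole pattern fits together in the mapper (the class name FListMapper, the field fList, and the assumption of one entry per line in the cached file are just for illustration):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class FListMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private Path[] localFiles;
        private final List<String> fList = new ArrayList<String>();

        @Override
        public void configure(JobConf job) {
            try {
                // Paths of the DistributedCache files, already copied to this node's local disk
                localFiles = DistributedCache.getLocalCacheFiles(job);

                // The paths are local, so open them with the local file system, not HDFS
                FileSystem fs = FileSystem.getLocal(job);
                FSDataInputStream inF = fs.open(localFiles[0]);

                BufferedReader reader = new BufferedReader(new InputStreamReader(inF));
                String line;
                while ((line = reader.readLine()) != null) {
                    fList.add(line); // keep the cached list in memory for map()
                }
                reader.close();
            } catch (IOException e) {
                throw new RuntimeException("Failed to read DistributedCache file", e);
            }
        }

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // ... use fList here ...
        }
    }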
