
I need to use MapReduce to read some contents from ORC files, but at the same time I would like my program, when it launches, to automatically load a file that is located in the same jar package as my MapReduce program.

My tree folders are listed below:

MRProj/
├── bin
│   ├── com
│   │   ├── folder
├── **FILE_I_WANT_TO_READIN.dat**
├── lib
│   ├── jar01
│   ├── jar02
│   ├── ....
└── src
    ├── com
    │   ├── **MY_MAPREDUCE_FOLDER**
    │   │   ├── **MR.java**

There is no problem for me to read ORC files on HDFS, but it seems that when the MapReduce job runs, my program cannot locate my file: "FILE_I_WANT_TO_READIN.dat".

The code to read this file is listed below:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;

public static HashMap<String, String> ReadBinaryFile(String inputDir) {
    HashMap<String, String> opt = new HashMap<String, String>();
    String k = "";
    // try-with-resources closes the stream even if an exception is thrown
    try (DataInputStream dis = new DataInputStream(
            new BufferedInputStream(new FileInputStream(new File(inputDir))))) {
        while (dis.available() > 0) {
            k = dis.readUTF();
            // ... other code ...
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return opt;
}
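
For what it's worth: if FILE_I_WANT_TO_READIN.dat really is packaged inside the jar, a FileInputStream cannot open it at all, because a jar entry is not a file on the local file system; it has to be read as a classpath resource. Below is a minimal sketch of that variant. The openFromJar helper is my own name, and the resource path assumes the file sits at the jar root next to the compiled classes:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public static DataInputStream openFromJar(String resourcePath) throws IOException {
    // Resolves the path on the classpath (i.e. inside the jar),
    // not on the local or HDFS file system.
    InputStream in = MR.class.getResourceAsStream(resourcePath);
    if (in == null) {
        throw new IOException("Resource not found on classpath: " + resourcePath);
    }
    return new DataInputStream(new BufferedInputStream(in));
}

ReadBinaryFile could then read from openFromJar("/FILE_I_WANT_TO_READIN.dat") instead of constructing a FileInputStream.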
Comment from Ben Watson (Jul 28 '17 at 13:02): The file would need to reside on the HDFS - this is because Mappers and Reducers will be running on nodes that don't have your file locally. See https://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api for solutions to loading files into memory within each Mapper/Reducer.
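
A minimal sketch of the approach that comment points at, using the Hadoop 2.x Job.addCacheFile API. The HDFS path, the lookup.dat link name, and the MRMapper class below are illustrative, not from the question:

import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

// Driver side: ship the file from HDFS to every task node at submit time.
// The "#lookup.dat" fragment is the symlink name each task sees locally.
Job job = Job.getInstance();
job.addCacheFile(new URI("hdfs:///user/me/FILE_I_WANT_TO_READIN.dat#lookup.dat"));

// Mapper side: the cached file is symlinked into the task's working
// directory, so ReadBinaryFile can open it by the link name alone.
public static class MRMapper extends Mapper<LongWritable, Text, Text, Text> {
    private HashMap<String, String> lookup;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        lookup = ReadBinaryFile("lookup.dat");
    }
}

With the symlink in place, the existing local-file read path keeps working unchanged; only the driver gains the addCacheFile call.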
