
I want to store some values from a map task to local disk on each data node. For example,

public void map (...) {
   //Process
   List<Object> cache = new ArrayList<Object>();
   //Add value to cache
   //Serialize cache to local file in this data node
}

How can I store this cache object to local disk on each data node? If I serialize the cache inside the map function as above, performance will be terrible because of the repeated I/O.

What I mean is: is there a way to wait until the map task on a data node has finished completely, and only then store the cache to local disk? Or does Hadoop have a built-in function to solve this?

nd07

1 Answer


Please see the example below. The created file will end up somewhere under the directories used by the NodeManager for containers. This location is controlled by the configuration property yarn.nodemanager.local-dirs in yarn-site.xml, or by the default inherited from yarn-default.xml, which is under /tmp.
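For reference, the relevant property in yarn-site.xml looks like this (the value shown is the default from yarn-default.xml; ${hadoop.tmp.dir} itself defaults to /tmp/hadoop-${user.name}, which is why the container directories end up under /tmp):

```xml
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
```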

Also see @Chris Nauroth's answer, which says this is just for debugging purposes and is not recommended as a permanent production configuration; it clearly describes why it is not recommended.

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // do some hadoop stuff, like counting words
    String path = "newFile.txt";
    try {
        File f = new File(path);
        f.createNewFile();
    } catch (IOException e) {
        System.out.println("Message easy to look up in the logs.");
        System.err.println("Error easy to look up in the logs.");
        e.printStackTrace();
        throw e;
    }
}
Ram Ghadiyaram
  • Thank you for pointing out how to create a local file on a data node. But what about serializing this object on a data node, as I described in my question? If we serialize it in the map() function, then, for example, if the input split has 1000 records, the program will call the serialize function 1000 times. Is there any way to serialize an object only once the task function has finished completely on each node? – nd07 May 17 '16 at 10:52
  • 1
    As I understood that you want to serialize 1000 records or the number of records you process through map method. I think you can open file handle in setup and close in cleanup methods. in map method you can write all your records in append mode. Would that be okay for your kind of requirement ? again! points mentioned in Chris Nauroth answer are applicable. you can give a try for this. Thx – Ram Ghadiyaram May 17 '16 at 11:54
  • Thanks for your support! – nd07 May 17 '16 at 12:43
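The open-once/append/close-once pattern suggested in the comments can be sketched as follows. In a real Hadoop job, the three methods below would be the Mapper's setup(), map(), and cleanup() lifecycle hooks, which the framework calls once before, once per record, and once after the task; here they are mimicked with plain java.io so the idea runs standalone. The class name and the file name "cache.txt" are illustrative, not from the original answer.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CacheWriterSketch {
    private BufferedWriter writer;

    // Corresponds to Mapper.setup(): open the file handle once per task.
    public void setup(String path) throws IOException {
        writer = new BufferedWriter(new FileWriter(path));
    }

    // Corresponds to Mapper.map(): append one record per call,
    // without reopening the file, so the per-record I/O cost stays low.
    public void map(String record) throws IOException {
        writer.write(record);
        writer.newLine();
    }

    // Corresponds to Mapper.cleanup(): flush and close once,
    // after all records have been processed.
    public void cleanup() throws IOException {
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        CacheWriterSketch task = new CacheWriterSketch();
        task.setup("cache.txt");
        for (String record : List.of("record-1", "record-2", "record-3")) {
            task.map(record);
        }
        task.cleanup();
        System.out.println(Files.readAllLines(Paths.get("cache.txt")).size());
    }
}
```

This way the serialization cost is paid once per task rather than once per record, which addresses the 1000-calls concern from the question.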