
I have code that reads files from an FTP server and writes them into HDFS. I have implemented a customised InputFormatReader that sets the isSplitable property of the input to false. However, this gives me the following error:

INFO mapred.MapTask: Record too large for in-memory buffer

The code I use to read the data is:

    Path file = fileSplit.getPath();
    FileSystem fs = file.getFileSystem(conf);
    FSDataInputStream in = null;
    try {
        in = fs.open(file);
        // contents is a byte array sized to the whole file, so this
        // pulls the entire file into memory in a single call
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
    } finally {
        IOUtils.closeStream(in);
    }

Any ideas how to avoid the Java heap space error without splitting the input file? Or, if I make isSplitable true, how do I go about reading the file?

RadAl

2 Answers


If I understand you correctly, you load the whole file into memory. This is unrelated to Hadoop: you cannot do that in Java and be sure you have enough memory. I would suggest defining some reasonable chunk size and making each chunk "a record".
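
For illustration, here is a minimal sketch of that idea, assuming the old `org.apache.hadoop.mapred` API and a `BytesWritable` value type; the class name `ChunkRecordReader` and the 1 MB `CHUNK_SIZE` are placeholders to adapt. The key point is that the input stream stays open across `next()` calls and is closed only once, in `close()`:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.RecordReader;

    // Emits an unsplittable file as a sequence of fixed-size chunk records
    // instead of one huge record that must fit in memory.
    public class ChunkRecordReader implements RecordReader<LongWritable, BytesWritable> {

        private static final int CHUNK_SIZE = 1024 * 1024; // 1 MB per record; tune as needed

        private final FSDataInputStream in;
        private final long length; // total bytes in this split
        private long pos = 0;      // bytes consumed so far

        public ChunkRecordReader(FileSplit split, Configuration conf) throws IOException {
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(conf);
            in = fs.open(file);
            length = split.getLength();
        }

        public boolean next(LongWritable key, BytesWritable value) throws IOException {
            if (pos >= length) {
                return false; // whole file consumed
            }
            int toRead = (int) Math.min(CHUNK_SIZE, length - pos);
            byte[] buf = new byte[toRead];
            IOUtils.readFully(in, buf, 0, toRead); // read exactly one chunk
            key.set(pos);                          // key = byte offset of this chunk
            value.set(buf, 0, toRead);             // the chunk becomes one record
            pos += toRead;
            return true;
        }

        public LongWritable createKey() { return new LongWritable(); }

        public BytesWritable createValue() { return new BytesWritable(); }

        public long getPos() { return pos; }

        public float getProgress() { return length == 0 ? 1.0f : (float) pos / length; }

        public void close() throws IOException {
            in.close(); // close the stream once, after the last chunk
        }
    }

Your custom InputFormat's `getRecordReader()` would then return one of these per file split.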

David Gruzman
  • I get what you are talking about. I even tried coding it, but I ran into issues with closing the input stream that reads from the source. Every time a chunk of the input was read and passed to the mapper as a record, going back to read the remainder wasn't possible. I would read a chunk of 1024 bytes and set it as the value for the record: `while (totalBytes < len) { bytesRead = in.read(buf); totalBytes += 1024; } value.set(buf);` – RadAl Jan 02 '13 at 04:18
  • OK, got this sorted to some extent. However, I have run into a new issue; please find it here: [link](http://stackoverflow.com/questions/14117719/downloading-files-from-ftp-to-local-using-java-makes-the-file-unreadable-encod) – RadAl Jan 02 '13 at 06:35

While a map function is running, Hadoop collects output records in an in-memory buffer called MapOutputBuffer.

The total size of this in-memory buffer is set by the io.sort.mb property and defaults to 100 MB.

Try increasing this property's value in mapred-site.xml.
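
A minimal example of what that could look like, assuming a Hadoop 1.x-style configuration; the 256 MB value is illustrative and should be tuned to your job:

    <!-- mapred-site.xml -->
    <property>
      <name>io.sort.mb</name>
      <!-- size of the in-memory map output buffer, in MB (default 100) -->
      <value>256</value>
    </property>

Note that this buffer is allocated out of the map task's heap, so the heap set via mapred.child.java.opts must be large enough to accommodate it.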

  • Doesn't work. I tried setting it via my code using `conf.set`; it did set the value to the one I specified, but it still runs into the heap space error. – RadAl Jan 01 '13 at 06:32