
The Hadoop Writable interface relies on the "public void write(DataOutput out)" method. It looks like, behind the DataOutput interface, Hadoop uses a DataOutputStream, which uses a simple byte array under the covers.
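For reference, a minimal Writable looks roughly like this (BigBlobWritable and its payload field are just illustrative stand-ins, not my actual class); the point is that the whole object goes through a single write(DataOutput out) call:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Illustrative Writable: the entire payload is serialized through one DataOutput.
    public class BigBlobWritable implements Writable {
        private byte[] payload = new byte[0];

        public void setPayload(byte[] payload) {
            this.payload = payload;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(payload.length); // length prefix
            out.write(payload);           // one big write into the framework's buffer
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            int len = in.readInt();
            payload = new byte[len];
            in.readFully(payload);
        }
    }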

When I try to write a lot of data to the DataOutput in my reducer, I get:

Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3230)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:97)

It looks like the system is unable to allocate a contiguous array of the requested size. Apparently, increasing the heap size available to the reducer does not help; it is already at 84 GB (-Xmx84G).
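The same behaviour seems reproducible outside Hadoop, which would be consistent with the trace above: a DataOutputStream backed by a ByteArrayOutputStream fails once the backing array would have to grow past the maximum Java array size (roughly Integer.MAX_VALUE bytes), no matter how much heap is available. The class name and the 64 MB chunk size below are arbitrary:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Standalone sketch: even with a large heap (e.g. -Xmx8G) this throws
    // OutOfMemoryError once the buffer tries to grow beyond ~2 GB, because a
    // single byte[] cannot hold more than about Integer.MAX_VALUE bytes.
    public class ArrayLimitDemo {
        public static void main(String[] args) throws IOException {
            DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
            byte[] chunk = new byte[64 * 1024 * 1024]; // 64 MB per write
            for (int i = 0; i < 40; i++) {             // about 2.5 GB in total
                out.write(chunk);
            }
        }
    }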

If I cannot reduce the size of the object that I need to serialize (the reducer constructs this object by combining data from the incoming objects), what should I try in order to work around this problem?

user1234883
  • Can you give more details about your key/value type, serialization and the output format? – Clément MATHIEU Aug 22 '14 at 15:05
  • That's the file (I am trying to get this OOS to work for my data) https://github.com/thinkaurelius/faunus/blob/master/src/main/java/com/thinkaurelius/faunus/FaunusVertex.java – user1234883 Aug 23 '14 at 03:58

1 Answer


I think you should use -Xms, e.g. -Xms40G, rather than -Xmx84G.
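In case it helps, this is roughly where such flags would go for the reducer JVM (a sketch assuming Hadoop 2.x property names; the class name and the exact sizes are only examples):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobSetup {
        public static Job configure() throws IOException {
            Configuration conf = new Configuration();
            // Set the reducer JVM's initial heap (-Xms) alongside the maximum (-Xmx).
            conf.set("mapreduce.reduce.java.opts", "-Xms40G -Xmx84G");
            // The YARN container must be large enough for the heap plus overhead;
            // 90112 MB (88 GB) is only an illustrative value.
            conf.setInt("mapreduce.reduce.memory.mb", 90112);
            return Job.getInstance(conf, "big-serialization-job");
        }
    }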

Mosab Shaheen