
I'm trying to allocate a large memory-mapped 2D array that's larger than RAM, and it keeps failing with an out-of-memory error. I'm using Java 8, linux-amd64 and nd4j 1.0.0-beta4. My understanding from the docs (https://deeplearning4j.org/docs/latest/deeplearning4j-config-memory) is that I should be able to allocate an array that is much larger than RAM, because it will be backed by a temporary file and the OS will page it in as required (e.g. using mmap).

Update: I'm getting some intermittent successes after a reboot, so I'm wondering whether there's some base amount of RAM needed for housekeeping by the large-array allocation routine? Maybe the zeroing? I'll report back...

I've tried some different policy options, made sure there's enough free disk space where the temp file goes, and done a wee bit of debugging to see what's going on in the bowels of the memory allocation code, but to no avail. It always seems to fail complaining that there's not enough physical memory, which is sort of correct: there isn't enough RAM to do this, and that's the point.

long cols = 3000;
long rows = 1000000;

// 4 bytes per FLOAT element, so roughly 11.18gb in total
long expectedSize = 4 * cols * rows;

// Memory-mapped workspace backed by a temp file, so the array can exceed physical RAM
this.nd4jWorkspaceManager = Nd4j.getWorkspaceManager();
this.workspaceConfig = WorkspaceConfiguration.builder()
    .initialSize(expectedSize)
    .policyLocation(LocationPolicy.MMAP)
    .policyAllocation(AllocationPolicy.OVERALLOCATE)
    .policySpill(SpillPolicy.EXTERNAL)
    .tempFilePath(System.getProperty("user.home") + "/.nd4jtmp")
    .build();

System.out.format("Attempting to create workspace of size %s%n", formatBytes(expectedSize));
this.memoryWorkspace = Nd4j.getWorkspaceManager().getAndActivateWorkspace(workspaceConfig, "M2");
System.out.println("... Done");

System.out.format("Attempting to create array of size %s%n", formatBytes(expectedSize));
INDArray matrix = Nd4j.create(DataType.FLOAT, rows, cols);
System.out.println("... Done");

System.out.format("Populating array with random numbers...%n");

// Fill the whole array with random values
for (int i = 0; i < rows; i++) {
  for (int j = 0; j < cols; j++) {
    matrix.put(i, j, (float) Math.random());
  }
}

System.out.println("... Done");

Here's the output from free:

$ free
              total        used        free      shared  buff/cache   available
Mem:        7852420     2950656      120860      311200     4780904     4067768
Swap:       7811068      190632     7620436

I run the main method and it fails to allocate the array with:

09:14:36,476 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
09:14:36,477 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
09:14:36,477 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/home/nickg/src/riskscape/riskscape/cli/bin/main/logback.xml]
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/home/nickg/src/riskscape/riskscape/test-shared/bin/main/logback.xml]
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/home/nickg/src/riskscape/riskscape/cli/bin/main/logback.xml]
09:14:36,786 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
09:14:36,790 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
09:14:36,797 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDERR]
09:14:36,909 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [nz.org.riskscape] to WARN
09:14:36,909 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDERR] to Logger[nz.org.riskscape]
09:14:36,909 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to ERROR
09:14:36,910 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDERR] to Logger[ROOT]
09:14:36,910 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
09:14:36,912 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6950e31 - Registering current configuration as safe fallback point

Attempting to create workspace of size 11.18gb
... Done
Attempting to create array of size 11.18gb
Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new LongPointer(8): totalBytes = 1, physicalBytes = 3779M
    at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:76)
    at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:41)
    at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:407)
    at org.nd4j.linalg.api.buffer.LongBuffer.<init>(LongBuffer.java:81)
    at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createLong(DefaultDataBufferFactory.java:478)
    at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createLong(DefaultDataBufferFactory.java:473)
    at org.nd4j.linalg.factory.Nd4j.createBufferDetached(Nd4j.java:1449)
    at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3241)
    at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:76)
    at org.nd4j.linalg.cpu.nativecpu.DirectShapeInfoProvider.createShapeInformation(DirectShapeInfoProvider.java:65)
    at org.nd4j.linalg.cpu.nativecpu.DirectShapeInfoProvider.createShapeInformation(DirectShapeInfoProvider.java:49)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:232)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:343)
    at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:185)
    at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:189)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4651)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4129)
    at NDArrayAllocationTest.run(NDArrayAllocationTest.java:40)
    at NDArrayAllocationTest.main(NDArrayAllocationTest.java:14)
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (3779M) > maxPhysicalBytes (3410M)
    at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:585)
    at org.bytedeco.javacpp.Pointer.init(Pointer.java:125)
    at org.bytedeco.javacpp.LongPointer.allocateArray(Native Method)
    at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:68)
    ... 18 more

  • How much RAM and how much swap does the machine have? What does the `free` command print out? Temp space is not relevant here AFAIK. – Nate Eldredge Jul 14 '19 at 21:51
  • I'm trying to go outside of RAM by using memory mapped files (https://deeplearning4j.org/docs/latest/deeplearning4j-config-memory), so my understanding is that disk space is relevant. I'll edit the question to put those details in as well and clarify. – Nick Griffiths Jul 15 '19 at 22:08

1 Answer


It looks like it was just a housekeeping thing. I confirmed that the mmapped file was being used for my NDArray; the OOM was actually happening while allocating a small buffer for the array's shape information. After setting org.bytedeco.javacpp.maxphysicalbytes high enough (sketch below), the NDArray builds successfully.
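
For reference, this is roughly how I'm setting it now. Treat it as a sketch rather than my exact code: the 12G value is just an arbitrary limit comfortably bigger than what the process was reported to be using, and as far as I can tell the property has to be set before the first Nd4j/JavaCPP call or it's ignored. Passing -Dorg.bytedeco.javacpp.maxphysicalbytes=12G on the JVM command line works just as well.

// Raise JavaCPP's physical-memory ceiling before anything touches Nd4j, so the
// check in org.bytedeco.javacpp.Pointer doesn't trip while the array is built.
// (12G is an example value, not something from my original code.)
System.setProperty("org.bytedeco.javacpp.maxphysicalbytes", "12G");

// ... then create the MMAP workspace and the NDArray exactly as in the question ...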

I'm not really sure why this works or why it's necessary, but there we go. The long buffer it's failing to allocate is only about 8 longs long... perhaps the mmap'd file is skewing the process's reported memory size?

If anyone who knows more about ND4J's memory management can shed some light on this, please do comment.