I'm trying to allocate a large memory-mapped 2D array that's larger than RAM, and it keeps failing with an out-of-memory error. I'm using Java 8, linux-amd64 and nd4j 1.0.0-beta4. My understanding from the docs (https://deeplearning4j.org/docs/latest/deeplearning4j-config-memory) is that I should be able to allocate an array that is much larger than RAM, as it will be backed by a temporary file and the OS will page it in as required (e.g. using mmap).
Update - I'm getting some intermittent successes after a reboot - I'm wondering if there's some base amount of RAM needed for housekeeping for the large array allocation routine? Maybe the zeroing? I'll report back...
I've tried some different policy options, made sure there's enough free disk space where the temp file goes, and done a wee bit of debugging to see what's going on in the bowels of the memory allocation code, but to no avail. It always seems to fail complaining that there's not enough physical memory, which is sort of correct: there isn't enough RAM to do this, and that's the point.
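import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
import org.nd4j.linalg.api.memory.enums.LocationPolicy;
import org.nd4j.linalg.api.memory.enums.SpillPolicy;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// 3000 x 1,000,000 floats at 4 bytes each: larger than physical RAM
long cols = 3000;
long rows = 1000000;
long expectedSize = 4 * cols * rows;

// Workspace backed by a memory-mapped temp file
this.nd4jWorkspaceManager = Nd4j.getWorkspaceManager();
this.workspaceConfig = WorkspaceConfiguration.builder()
        .initialSize(expectedSize)
        .policyLocation(LocationPolicy.MMAP)
        .policyAllocation(AllocationPolicy.OVERALLOCATE)
        .policySpill(SpillPolicy.EXTERNAL)
        .tempFilePath(System.getProperty("user.home") + "/.nd4jtmp")
        .build();

System.out.format("Attempting to create workspace of size %s%n", formatBytes(expectedSize));
this.memoryWorkspace = Nd4j.getWorkspaceManager().getAndActivateWorkspace(workspaceConfig, "M2");
System.out.println("... Done");

// This is the call that throws the OutOfMemoryError
System.out.format("Attempting to create array of size %s%n", formatBytes(expectedSize));
INDArray matrix = Nd4j.create(DataType.FLOAT, rows, cols);
System.out.println("... Done");

System.out.format("Populating array with random numbers...%n");
for (int i = 0; i < rows; i++) {
    for (int j = 0; j < cols; j++) {
        matrix.put(i, j, (float) Math.random());
    }
}
System.out.println("... Done");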
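For reference, the formatBytes helper isn't shown in the snippet above. A minimal sketch of such a helper, assuming binary (1024-based) units and two decimal places to match the "11.18gb" in the output further down; the class name and unit labels here are placeholders, not the actual code:

```java
import java.util.Locale;

// Hypothetical stand-in for the formatBytes helper used in the snippet above.
class FormatBytesSketch {
    // Assumed unit labels, chosen to match the "gb" suffix seen in the output.
    private static final String[] UNITS = {"b", "kb", "mb", "gb", "tb"};

    // Repeatedly divide by 1024 until the value fits the current unit.
    static String formatBytes(long bytes) {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.2f%s", value, UNITS[unit]);
    }

    public static void main(String[] args) {
        long cols = 3000;
        long rows = 1000000;
        long expectedSize = 4 * cols * rows; // 4 bytes per float
        System.out.println(formatBytes(expectedSize)); // prints "11.18gb"
    }
}
```

So the 12,000,000,000 bytes requested work out to roughly 11.18 GiB, against about 7.5 GiB of physical RAM reported by free.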
Here's the output of free:
$ free
              total        used        free      shared  buff/cache   available
Mem:        7852420     2950656      120860      311200     4780904     4067768
Swap:       7811068      190632     7620436
I run the main method and it fails to allocate the array with:
09:14:36,476 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
09:14:36,477 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
09:14:36,477 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/home/nickg/src/riskscape/riskscape/cli/bin/main/logback.xml]
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/home/nickg/src/riskscape/riskscape/test-shared/bin/main/logback.xml]
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/home/nickg/src/riskscape/riskscape/cli/bin/main/logback.xml]
09:14:36,786 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
09:14:36,790 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
09:14:36,797 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDERR]
09:14:36,909 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [nz.org.riskscape] to WARN
09:14:36,909 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDERR] to Logger[nz.org.riskscape]
09:14:36,909 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to ERROR
09:14:36,910 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDERR] to Logger[ROOT]
09:14:36,910 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
09:14:36,912 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6950e31 - Registering current configuration as safe fallback point
Attempting to create workspace of size 11.18gb
... Done
Attempting to create array of size 11.18gb
Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new LongPointer(8): totalBytes = 1, physicalBytes = 3779M
at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:76)
at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:41)
at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:407)
at org.nd4j.linalg.api.buffer.LongBuffer.<init>(LongBuffer.java:81)
at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createLong(DefaultDataBufferFactory.java:478)
at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createLong(DefaultDataBufferFactory.java:473)
at org.nd4j.linalg.factory.Nd4j.createBufferDetached(Nd4j.java:1449)
at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3241)
at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:76)
at org.nd4j.linalg.cpu.nativecpu.DirectShapeInfoProvider.createShapeInformation(DirectShapeInfoProvider.java:65)
at org.nd4j.linalg.cpu.nativecpu.DirectShapeInfoProvider.createShapeInformation(DirectShapeInfoProvider.java:49)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:232)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:343)
at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:185)
at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:189)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4651)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4129)
at NDArrayAllocationTest.run(NDArrayAllocationTest.java:40)
at NDArrayAllocationTest.main(NDArrayAllocationTest.java:14)
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (3779M) > maxPhysicalBytes (3410M)
at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:585)
at org.bytedeco.javacpp.Pointer.init(Pointer.java:125)
at org.bytedeco.javacpp.LongPointer.allocateArray(Native Method)
at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:68)
... 18 more