I'm from the Deeplearning4j project. Memory-mapped workspaces are indeed designed for embeddings, and they should be considered a separate concept from our off-heap memory. Off-heap memory is a conceptual rabbit hole I won't cover here: it requires an understanding of the JVM, and the topic isn't relevant to this question.
To use memory-mapped workspaces, you load the word2vec model inside a memory-mapped scope.
The first component is the configuration:
import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.LocationPolicy;
import org.nd4j.linalg.factory.Nd4j;

// initialSize is the workspace size in bytes; make it large enough
// to hold your embedding matrix.
WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
        .initialSize(initialSize)
        .policyLocation(LocationPolicy.MMAP)
        .build();

// The second argument is an arbitrary workspace id.
try (MemoryWorkspace ws =
         Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "W2V_MMAP")) {
    // load your word2vec here
}
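As a rough sketch of what "load your word2vec here" might look like, here is the scope filled in with DL4J's standard loader. The file path and workspace id are assumptions; adapt them to your setup:

```java
import java.io.File;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.LocationPolicy;
import org.nd4j.linalg.factory.Nd4j;

public class MmapWord2VecExample {
    public static void main(String[] args) {
        WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
                .initialSize(2L * 1024 * 1024 * 1024) // 2 GB backing file; size to your model (assumption)
                .policyLocation(LocationPolicy.MMAP)
                .build();

        try (MemoryWorkspace ws =
                 Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "W2V_MMAP")) {
            // Hypothetical path to a word2vec binary model; buffers allocated
            // while this scope is active live in the memory-mapped file.
            Word2Vec vec = WordVectorSerializer.readWord2VecModel(new File("/path/to/vectors.bin"));
            System.out.println(vec.getVocab().numWords());
        }
    }
}
```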
A note on how memory-mapped workspaces should be used: mmap is intended only for holding a large array and pulling subsets of it out of RAM. You should use it only to extract the subset of word vectors you need for training.
When using word2vec (or any other embedding technique), the typical pattern is to look up only the word vectors you need and merge them into a minibatch.
That minibatch (and the associated training) should happen in a separate workspace, or be unattached, which is the default. It can be unattached because ComputationGraph and MultiLayerNetwork already apply workspaces and the other associated optimizations for you. Just make sure to pass in whatever you need to fit.
From there, use the INDArray get(..) and put(..) methods to copy the rows you need into another array, which is the one you should use for training.
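A minimal sketch of that row-copy pattern, using a small random matrix as a stand-in for the memory-mapped embedding matrix (the array names and indices are assumptions):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class MinibatchGatherExample {
    public static void main(String[] args) {
        // Stand-in for the big memory-mapped embedding matrix:
        // 10 words, 3 dimensions (real models are far larger).
        INDArray allVectors = Nd4j.rand(10, 3);

        // Vocabulary indices of the words in the current minibatch (assumption).
        int[] batchWordIndices = {2, 5, 7};

        // getRows(..) copies just those rows out; detach() ensures the result
        // is not tied to the (memory-mapped) workspace, so it can be passed
        // safely to fit(..) on a MultiLayerNetwork or ComputationGraph.
        INDArray minibatch = allVectors.getRows(batchWordIndices).detach();

        System.out.println(java.util.Arrays.toString(minibatch.shape())); // [3, 3]
    }
}
```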
For more on that see: https://deeplearning4j.org/docs/latest/nd4j-overview
For more information, look at leverage(), leverageTo(), detach(), etc. in the INDArray Javadoc:
https://deeplearning4j.org/api/latest/org/nd4j/linalg/api/ndarray/INDArray.html