I am performing a data analysis that entails loading a large data matrix of ~112 GB into a memory-mapped file using the R programming language, specifically the bigmemory package (see https://cran.r-project.org/web/packages/bigmemory/index.html). The matrix has 80664 columns and 356751 rows.
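For context, the load is done roughly along these lines. This is only a minimal sketch, assuming read.big.matrix() is used to create a file-backed big.matrix; the input file, backing file names, backing path, and element type below are placeholders, not the actual call:

library(bigmemory)

# Placeholder sketch: read the source data into a file-backed big.matrix
# whose backing (memory-mapped) file lives on the NFS-mounted XFS filesystem.
x <- read.big.matrix("input.csv", sep = ",",
                     type = "double",                  # type is an assumption
                     backingfile = "matrix.bin",
                     descriptorfile = "matrix.desc",
                     backingpath = "/path/to/nfs/mount")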
Data storage consists of an NFS-mounted XFS filesystem.
XFS mount options are:
xfs noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k
The filesystem is exported over NFS with the following options:
rw,async,no_subtree_check,no_root_squash
The NFS client mounts the filesystem with these options:
defaults,async,_netdev
After some time spent loading the file, the compute node becomes unresponsive (as do other nodes in the cluster), and the file server logs report the following error:
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
I can resolve this by dropping caches like so:
echo 3 > /proc/sys/vm/drop_caches
The file server has 16 GB of memory.
I have already read through the following blog post:
https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/
However, the issue is not due to fragmentation: the reported fragmentation of the filesystem I am writing to is below 2%.
So, given the XFS error above, I assume the file server is running out of memory because it cannot keep up with the number of I/O requests issued by the task at hand.
Apart from dropping caches periodically (e.g. via cron), is there a more permanent solution to this?
Thanks in advance for the help.
Edit: CentOS 7.2 on client and server.
Edit #2: Kernel 3.10.0-229.14.1.el7.x86_64 on client and server.