My code runs on CentOS 6.6 on a cluster node with 100GB of memory. However, that still isn't enough, because the code needs to read 1000+ hickle files, each about 200MB, roughly 240GB in total. While the code runs, the system memory cache keeps growing until it is full, and performance becomes very slow when allocating new objects and doing NumPy array calculations.
I tried del and gc.collect() to rule out a memory leak, but memory usage is still increasing, so I suspect this is due to file caching. Is there a function in Python's sys or os library that can disable file system caching when reading a large number (1000+) of large hickle files (200MB each), or a single lmdb file (240GB)? I don't actually need those files cached once they have been read.
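For reference, a simplified version of my loop looks like this (the file list and the `process` function are placeholders for my real code):

```python
import gc

import hickle

for path in file_list:          # 1000+ paths, each file ~200MB
    data = hickle.load(path)    # load numpy arrays from the hickle file
    result = process(data)      # numpy-heavy computation
    del data, result            # drop references right away
    gc.collect()                # force a collection; cache still grows
```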
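The closest thing I've found so far is os.posix_fadvise with POSIX_FADV_DONTNEED (Python 3.3+, Linux only), which hints to the kernel that a file's cached pages can be dropped, but I'm not sure it's the right tool here. A sketch of how I imagine using it (`load_and_drop_cache` is just a name I made up):

```python
import os

def load_and_drop_cache(path, loader):
    """Load a file, then advise the kernel that its page-cache
    pages can be evicted (requires os.posix_fadvise, Python 3.3+)."""
    result = loader(path)  # e.g. hickle.load
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, length=0 covers the whole file; DONTNEED asks the
        # kernel to drop these pages from the page cache.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return result
```

Would something like this actually keep the cache from filling up, or is there a better way?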