I am new to working with numpy.core.memmap objects and am having trouble figuring out how to edit an existing .npy file that has been read into Python using numpy.memmap(). For example, following the example from Scipy.org, I can create an object and write to it, but once it has been created, I cannot modify the contents on disk.

import os.path as path
from tempfile import mkdtemp

import numpy as np

data = np.arange(12, dtype='float32')
data.resize((3, 4))

filename = path.join(mkdtemp(), 'newfile.dat')
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 4))
fp[:] = data[:]  # write data to fp array

del fp  # remove fp object

fpc = np.memmap(filename, dtype='float32', mode='c', shape=(3, 4))  # this is writeable in memory

fpc[0, :] = 0

del fpc  # close object

This simply deletes the object from memory; the file at filename is not modified. I have also tried numpy.memmap.flush(fpc), but that does not seem to work either.

I understand from reading other posts that one can simply copy the edited .npy file to another location, but this seems like it could become problematic in terms of disk space. Is it correct that you cannot modify an existing .npy file?

user44796
  • `mode='c'` is copy on write mode, which only writes changes to ram, never to disk. try using `mode='r+'`. ('r+' is actually the default according to the [docs](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.memmap.html)) – Aaron Feb 09 '18 at 21:40
  • Well, now I feel stupid. If you write that as the answer, I will accept it. – user44796 Feb 09 '18 at 22:26

1 Answer

NumPy interprets "copy on write" as "write changes to RAM, but don't save them to disk" (docs). This is a fairly standard meaning of the term when referring to data that could be shared between threads or processes. It sounds like you confused copy on write with snapshots (which sometimes use similar terminology, but refer to disk writes rather than RAM).

If you change mode="c" to mode="r+" (or drop the mode keyword, since "r+" is the default anyway), this should solve your problem.
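As a minimal sketch of the difference, the snippet below writes the same edit through both modes and then re-reads the raw file each time. The temporary path and the variable names (after_c, after_r_plus) are only illustrative, and np.fromfile is used here purely as an independent way of checking what is on disk.

```python
import os
import tempfile

import numpy as np

# Illustrative temporary file; any writable path would do.
filename = os.path.join(tempfile.mkdtemp(), "newfile.dat")
data = np.arange(12, dtype="float32").reshape(3, 4)

# Create the file and write the initial contents.
fp = np.memmap(filename, dtype="float32", mode="w+", shape=(3, 4))
fp[:] = data
fp.flush()
del fp

# mode="c" (copy-on-write): the edit lives only in memory.
fpc = np.memmap(filename, dtype="float32", mode="c", shape=(3, 4))
fpc[0, :] = 0
del fpc
after_c = np.fromfile(filename, dtype="float32").reshape(3, 4)

# mode="r+": the edit is written through to the existing file.
fpr = np.memmap(filename, dtype="float32", mode="r+", shape=(3, 4))
fpr[0, :] = 0
fpr.flush()  # push pending changes to disk before dropping the object
del fpr
after_r_plus = np.fromfile(filename, dtype="float32").reshape(3, 4)

print(np.array_equal(after_c, data))   # file untouched by the mode="c" edit
print(after_r_plus[0])                 # first row as stored on disk after mode="r+"
```

The explicit flush() call is a precaution in this sketch; the only change that matters relative to the question's code is the mode.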

Additionally, I would like to point out that in most cases it is simpler and more pythonic to use np.save and np.load, and simply pass the mmap_mode keyword with the appropriate mode when loading the file. While this technically limits flexibility, the .npy file stores the dtype and shape itself, so you no longer need to specify those keywords, which makes things a bit more concise.
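A short sketch of that approach follows, assuming a hypothetical temporary file named newfile.npy. The array is saved once with np.save, reopened with mmap_mode="r+", edited in place, and then loaded normally to check the result.

```python
import os
import tempfile

import numpy as np

# Illustrative temporary .npy file.
npy_path = os.path.join(tempfile.mkdtemp(), "newfile.npy")
data = np.arange(12, dtype="float32").reshape(3, 4)
np.save(npy_path, data)  # dtype and shape are recorded in the .npy header

# Reopen as a writable memory map; no dtype or shape arguments are needed.
arr = np.load(npy_path, mmap_mode="r+")
arr[0, :] = 0
arr.flush()  # write the change back to the file
del arr

# Load the file normally to confirm the edit persisted.
reloaded = np.load(npy_path)
print(reloaded)
```

Passing mmap_mode="c" here instead would behave like mode="c" in np.memmap: the edit would stay in memory and the file would be left as it was.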

Aaron