Questions tagged [numpy-memmap]

An advanced numpy.memmap() utility for avoiding RAM-size limits and reducing the final RAM footprint, at the reasonable cost of O/S-cached file I/O mediated through a small in-RAM proxy-view window into the whole array data.

Creates and handles a memory-map to an array stored in a binary file on disk.

Memory-mapped files are used to access large, non-in-RAM arrays through small proxy segments of an O/S-cached area of otherwise unmanageably large data files.

Leaving most of the data on disk and working with it through a smart, moving, O/S-cached window view into the big non-in-RAM file, instead of reading the entire file into RAM, escapes both O/S RAM limits and an adverse side effect of Python's memory management: its painful reluctance to release once-allocated memory blocks before the Python program terminates.

numpy's memmaps are array-like objects.

This differs from Python's mmap module, which uses file-like objects.
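A minimal usage sketch, with an illustrative file name, dtype and shape:

import numpy as np

# Create a 10000 x 1000 float32 array backed by a file on disk;
# only the pages actually touched ever become resident in RAM.
arr = np.memmap("big_array.dat", dtype=np.float32, mode="w+", shape=(10000, 1000))

arr[0, :] = 1.0                    # writes go to the O/S page cache
arr.flush()                        # push dirty pages to disk
del arr                            # drop the mapping

# Re-open read-only; rows are paged in lazily on access.
arr = np.memmap("big_array.dat", dtype=np.float32, mode="r", shape=(10000, 1000))
print(arr[0, :5])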

101 questions
3 votes · 1 answer

Numpy load a memory-mapped array (mmap_mode) from google cloud storage

I want to load a .npy from google storage (gs://project/file.npy) into my google ml-job as training data. Since the file is +10GB big, I want to use the mmap_mode option of numpy.load() to not run out of memory. Background: I use Keras with…
DΦC__WTF • 105 • 1 • 9
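For context, np.load(..., mmap_mode='r') needs a local, seekable file, so a gs:// object cannot be memory-mapped directly; a common workaround is to copy it to local disk first. A hedged sketch, assuming the google-cloud-storage client library and illustrative bucket/paths:

import numpy as np
from google.cloud import storage

# Download the .npy object once; memory-mapping requires a real local file.
client = storage.Client()
client.bucket("project").blob("file.npy").download_to_filename("/tmp/file.npy")

# Open lazily; only the slices that are accessed are read into RAM.
data = np.load("/tmp/file.npy", mmap_mode="r")
batch = data[0:128]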
3 votes · 0 answers

numpy memmap read error memory mapped size must be positive

I am reading a large binary file in partitions. Each partition is mapped using numpy.memmap. The file consists of 1M rows, where a row is 198 2-byte integers. A partition is 1000 rows long. Below is the code snippet: mdata = np.memmap(fn,…
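The "memory mapped size must be positive" error typically appears when the computed shape of the last partition becomes zero or negative near the end of the file. A hedged sketch of partitioned reading that clamps the final partition (the row layout is taken from the question, the rest is illustrative):

import os
import numpy as np

fn = "data.bin"                     # illustrative path
ROW_INTS = 198                      # 198 two-byte integers per row
ROW_BYTES = ROW_INTS * 2
ROWS_PER_PART = 1000
total_rows = os.path.getsize(fn) // ROW_BYTES

for start in range(0, total_rows, ROWS_PER_PART):
    nrows = min(ROWS_PER_PART, total_rows - start)   # clamp the last partition
    if nrows <= 0:
        break                                        # never map a non-positive size
    mdata = np.memmap(fn, dtype=np.int16, mode="r",
                      offset=start * ROW_BYTES, shape=(nrows, ROW_INTS))
    # ... process mdata ...
    del mdata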
3 votes · 1 answer

Python: passing memmap array through function?

Suppose that I am working with a very large array (e.g., ~45GB) and am trying to pass it through a function which accepts numpy arrays. What is the best way to: Store this for limited memory? Pass this stored array into a function that takes…
Andy • 175 • 1 • 7
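Since numpy.memmap is a subclass of numpy.ndarray, it can usually be passed straight into functions that accept numpy arrays; feeding bounded slices keeps the resident set small. A hedged sketch with an illustrative shape and a stand-in function:

import numpy as np

def process(block: np.ndarray) -> np.ndarray:
    # Stand-in for the real function that accepts numpy arrays.
    return block.mean(axis=1)

# Open the ~45 GB array lazily instead of loading it into RAM.
arr = np.memmap("huge.dat", dtype=np.float64, mode="r", shape=(3_000_000, 2000))

# Process in row blocks of roughly 160 MB each.
results = [process(arr[i:i + 10_000]) for i in range(0, arr.shape[0], 10_000)]
out = np.concatenate(results)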
3 votes · 0 answers

When updating a numpy.memmap'd file in parallel, is there a way to only "flush" a slice and not the whole file?

I have to do a lot of nasty i/o and I have elected to use memory mapped files with numpy...after a lot of headache I realized that when a process "flushes" to disk it often overwrites what other processes are attempting to write with old data...I…
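numpy's memmap.flush() syncs the whole mapping, but one hedged workaround is to give each writer its own np.memmap covering only the byte range it owns (via offset and shape), so its flush touches just that region. Names and layout below are illustrative:

import numpy as np

DTYPE = np.float64
N_COLS = 1000
ROW_BYTES = N_COLS * np.dtype(DTYPE).itemsize

def write_rows(fn, row_start, rows):
    # Map only the slice this process owns; numpy aligns the underlying
    # mmap offset internally, and flush() then syncs only this range.
    part = np.memmap(fn, dtype=DTYPE, mode="r+",
                     offset=row_start * ROW_BYTES, shape=(len(rows), N_COLS))
    part[:] = rows
    part.flush()
    del part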
3 votes · 1 answer

Why am I getting an OverflowError and WindowsError with numpy memmap and how to solve it?

In relation to my other question here, this code works if I use a small chunk of my dataset with dtype='int32'; using a float64 produces a TypeError on my main process after this portion because of safe rules, so I'll stick to working with int32, but…
ZeferiniX • 500 • 5 • 18
3 votes · 1 answer

Memory Error when using float32 in dask array

I am trying to import a 1.25 GB dataset into python using dask.array. The file is a 1312*2500*196 array of uint16's. I need to convert this to a float32 array for later processing. I have managed to stitch together this Dask array in uint16, however…
Amdixer • 91 • 4
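Casting lazily with dask keeps peak memory near one chunk rather than the full float32 result. A hedged sketch, with a memmap standing in for the stitched uint16 array and an illustrative chunking and reduction:

import numpy as np
import dask.array as da

# Wrap the on-disk uint16 data without reading it.
raw = np.memmap("stack.dat", dtype=np.uint16, mode="r", shape=(1312, 2500, 196))
x = da.from_array(raw, chunks=(164, 2500, 196))

# The cast is lazy too; each chunk is converted to float32 only when computed.
xf = x.astype(np.float32)
result = xf.mean(axis=0).compute()   # illustrative reduction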
2 votes · 1 answer

Numpy's memmap acting strangely?

I am dealing with large numpy arrays and I am trying out memmap as it could help. big_matrix = np.memmap(parameters.big_matrix_path, dtype=np.float16, mode='w+', shape=(1000000, 1000000)) The above works fine and it creates a file on my hard drive…
als7 • 35 • 3
2 votes · 1 answer

Memory Error using np.unique on large array to get unique rows

I have a large 2D Numpy array like arr = np.random.randint(0,255,(243327132, 3), dtype=np.uint8). I'm trying to get the unique rows of the array. Using np.unique I get the following memory error: unique_arr =…
Naru1243 • 33 • 4
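For (N, 3) uint8 rows, one memory-lighter alternative to np.unique(arr, axis=0) is to pack each row into a single 32-bit integer and take the unique of that 1-D array. A hedged sketch on a smaller stand-in array:

import numpy as np

arr = np.random.randint(0, 255, (1_000_000, 3), dtype=np.uint8)  # smaller stand-in

# Pack each row into one uint32 so np.unique sorts a 1-D array
# instead of comparing whole rows.
packed = (arr[:, 0].astype(np.uint32) << 16) | \
         (arr[:, 1].astype(np.uint32) << 8) | arr[:, 2].astype(np.uint32)
uniq = np.unique(packed)

# Unpack back into rows.
unique_rows = np.stack([(uniq >> 16) & 0xFF,
                        (uniq >> 8) & 0xFF,
                        uniq & 0xFF], axis=1).astype(np.uint8)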
2 votes · 1 answer

Looping through Dask array made of npy memmap files increases RAM without ever freeing it

Context I am trying to load multiple .npy files containing 2D arrays into one big 2D array to process it by chunk later. All of this data is bigger than my RAM so I am using the memmap storage/loading system here: pattern = os.path.join(FROM_DIR,…
Tom Moritz • 33 • 4
2 votes · 2 answers

Random access in a saved-on-disk numpy array

I have one big numpy array A of shape (2_000_000, 2000) of dtype float64, which takes 32 GB. (or alternatively the same data split into 10 arrays of shape (200_000, 2000), it may be easier for serialization?). How can we serialize it to disk such…
Basj • 41,386 • 99 • 383 • 673
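One hedged option for the question above is to write the array as a standard .npy with np.lib.format.open_memmap and later reopen it with np.load(..., mmap_mode='r'), so arbitrary rows can be read without touching the other 32 GB. The fill loop below is illustrative:

import numpy as np

# Write the array incrementally into a .npy file on disk.
out = np.lib.format.open_memmap("A.npy", mode="w+",
                                dtype=np.float64, shape=(2_000_000, 2000))
for start in range(0, 2_000_000, 200_000):
    out[start:start + 200_000] = np.random.rand(200_000, 2000)  # illustrative fill
out.flush()
del out

# Later: random access without loading the whole array.
A = np.load("A.npy", mmap_mode="r")
rows = A[[3, 17, 1_999_999]]   # only these rows are read from disk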
2 votes · 1 answer

Is there a way to know how much memory a numpy.memmap is currently using?

I want to investigate the memory usage of a python program that uses numpy.memmap to access data from large files. Is there a way to check the size in memory that a memmap is currently using? I tried sys.getsizeof on the numpy object and the _mmap…
Colin • 10,447 • 11 • 46 • 54
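sys.getsizeof only reports the small ndarray wrapper, not the resident pages. A hedged, Linux-only sketch that sums the Rss reported in /proc/self/smaps for mappings backed by the file (path and shape illustrative):

import numpy as np

def resident_kb(mapped_path):
    # Sum the Rss (kB) of every mapping of mapped_path in this process (Linux only).
    total = 0
    in_target = False
    with open("/proc/self/smaps") as f:
        for line in f:
            first = line.split()[0]
            if first.endswith(":"):
                if in_target and first == "Rss:":
                    total += int(line.split()[1])   # value is reported in kB
            else:
                # Header line of a new mapping; check whether it is backed by our file.
                in_target = line.rstrip().endswith(mapped_path)
    return total

arr = np.memmap("big_array.dat", dtype=np.float32, mode="w+", shape=(10000, 1000))
arr[:1000] = 1.0                                    # touch roughly 4 MB of pages
print(resident_kb("big_array.dat"), "kB of the mapping are resident")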
2 votes · 1 answer

How to use numpy memmap inside keras generator to not exceed RAM memory?

I'm trying to implement the numpy.memmap method inside a generator for training a neural network using keras, in order to not exceed the RAM limit. I'm using this post as reference, however unsuccessfully. Here is my attempt: def…
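A hedged sketch of such a generator: the .npy files are opened once as memmaps and only the current batch is copied into RAM; paths, shapes and batch size are illustrative:

import numpy as np

def batch_generator(x_path, y_path, batch_size=32):
    # Open lazily; nothing is read until a slice is accessed.
    X = np.load(x_path, mmap_mode="r")
    y = np.load(y_path, mmap_mode="r")
    n = X.shape[0]
    while True:                                   # Keras-style endless generator
        for start in range(0, n, batch_size):
            stop = min(start + batch_size, n)
            # np.asarray copies just this batch into RAM as a regular ndarray.
            yield np.asarray(X[start:stop]), np.asarray(y[start:stop])

# Illustrative use with a compiled Keras model:
# model.fit(batch_generator("X_train.npy", "y_train.npy", 64),
#           steps_per_epoch=n_samples // 64, epochs=10)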
2 votes · 0 answers

NumPy memmap slow loading small chunk from large file on first read only

I am using NumPy memmap to load a small amount of data from various locations throughout a large binary file (memmap'd, reshaped, flipped around, and then around 2000x1000 points loaded from around a 2 GB binary file). There are five 2 GB files each…
lesthaeghet • 116 • 6
2 votes · 1 answer

Why is concurrent.futures holding onto memory when returning np.memmap?

The problem My application is extracting a list of zip files in memory and writing the data to a temporary file. I then memory map the data in the temp file for use in another function. When I do this in a single process, it works fine, reading the…
2 votes · 1 answer

Is it possible to close a memmap'd temporary file without flushing its contents?

Use Case: Enormous image processing. I employ mem-mapped temporary files when the intermediate dataset exceeds physical memory. I have no need to store intermediate results to disk after I'm done with them. When I delete them, numpy seems to flush…
Jesse Meyer • 315 • 1 • 3 • 12