I am trying to create an array with dtype='U' and save it using `numpy.save()`. However, when I try to load the saved file into a `numpy.memmap`, I get an error about the size not being a multiple of the 'U3' data-type size.

I am working with Python 3.5.2. In the following code, I create an empty array and a second array with three entries, each three characters long, concatenate them, and then save the result to file1.npy.

import numpy as np
arr = np.empty((1, 0), dtype='U')
arr2 = np.array(['111', '222', '333'], dtype='U')
arr = np.concatenate((arr, arr2), axis = None)
print(arr)
np.save('file1', arr)

rArr = np.memmap('file1.npy', dtype='U3', mode='r')

However, when I try to load the file into a `numpy.memmap`, I get the following error: ValueError: Size of available data is not a multiple of the data-type size.

Is there a way to load string data into a `numpy.memmap`? I feel I am missing something simple.

Kour
  • Possible duplicate of [NumPy mmap: "ValueError: Size of available data is not a multiple of data-type size."](https://stackoverflow.com/questions/15303087/numpy-mmap-valueerror-size-of-available-data-is-not-a-multiple-of-data-type-s) – iAmTryingOK May 15 '19 at 07:03
  • I saw that question, but I am not sure how it is a duplicate. The saved file is a binary file. I believe there is another reason I am getting the error, such as some extra data in the file that I am not aware of. – Kour May 15 '19 at 07:15
  • Have you tried `np.load` with `mmap_mode`? – hpaulj May 15 '19 at 07:17
  • @hpaulj, this actually worked. I can't believe I missed it. I was trying different things and forgot to test `np.load` with `mmap_mode`. As for my original question, I believe I found the answer as well. When using `numpy.save`, the resulting file seems to have a header. If I delete that header (the first line in the file) and use `numpy.memmap`, I am able to load the data properly. So I am guessing `memmap` is intended for raw files saved manually, without `numpy.save`. – Kour May 15 '19 at 07:29

2 Answers

The files used by numpy.memmap are raw binary files, not NPY-format files. If you want to read a memory-mapped NPY file, use numpy.load with the argument mmap_mode='r' (or whatever other value is appropriate).

After creating 'file1.npy' like you did, here's how it can be memory-mapped with numpy.load:

In [16]: a = np.load('file1.npy', mmap_mode='r')

In [17]: a
Out[17]: memmap(['111', '222', '333'], dtype='<U3')
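
To see why the plain `numpy.memmap` call fails, it may help to look at the header that `numpy.save` writes in front of the data. The following is only a rough sketch: it assumes the file was written in the NPY version 1.0 or 2.0 layout, it writes to a temporary directory (my choice, not part of the question), and it uses the helpers in `numpy.lib.format` to find where the data starts so that `numpy.memmap` can skip the header with `offset`. In practice `np.load(..., mmap_mode='r')` does this for you.

```python
import os
import tempfile

import numpy as np

arr = np.array(['111', '222', '333'], dtype='U3')

# Save into a temporary directory (an assumption made for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'file1.npy')
np.save(path, arr)

# The file is larger than the raw array data because of the NPY header.
file_size = os.path.getsize(path)
header_size = file_size - arr.nbytes
print(file_size, arr.nbytes, header_size)

# Parse the header with numpy's own helpers to find where the data begins.
with open(path, 'rb') as fp:
    version = np.lib.format.read_magic(fp)
    if version == (1, 0):
        shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(fp)
    else:
        # Assumes the 2.0 header layout for anything that is not 1.0.
        shape, fortran_order, dtype = np.lib.format.read_array_header_2_0(fp)
    offset = fp.tell()

# Map only the data region by skipping the header bytes.
mapped = np.memmap(path, dtype=dtype, mode='r', offset=offset, shape=shape)
print(mapped)
```

Without `offset`, `memmap` treats the header bytes as array data, so the total file size is generally not a multiple of the 12-byte 'U3' item size, which is what the `ValueError` reports.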
Warren Weckesser
  • Thanks, this confirmed my suspicion about `memmap`. So I will have to manually save the binary data to be loaded later. `np.load()` works fine. – Kour May 15 '19 at 07:34
  • Alternatively, you can save a raw binary file with `array.tofile()` and then load it with `memmap`. – Denziloe May 30 '20 at 14:59
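
The raw-file route mentioned in the last comment could look roughly like this. It is a minimal sketch, not a recommendation over `np.load`: the file name `file1.raw` and the temporary directory are made up for illustration, and because a raw file stores no metadata, the reader has to supply the dtype (and the shape, for anything other than a 1-D array).

```python
import os
import tempfile

import numpy as np

arr = np.array(['111', '222', '333'], dtype='U3')

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'file1.raw')

# tofile() writes only the raw array bytes, with no NPY header.
arr.tofile(path)

# The dtype must match what was written; nothing in the file records it.
raw = np.memmap(path, dtype='U3', mode='r')
print(raw)
```

Here the file size equals `arr.nbytes`, so it is an exact multiple of the item size and `memmap` accepts it.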

Looks like np.load is your friend here.

Doc

Issue

The following snippet works for me:

rArr = np.load('file1.npy', mmap_mode='r')
iAmTryingOK