
I use a web service to train some of my deep learning models through a Jupyter Notebook on AWS. For cost reasons I would like to store my data as .npz files on my own server and load them straight into the memory of my remote machine.

The np.load() function doesn't work with http links, and I wasn't able to make it work with urlretrieve either. I only got it working by downloading the data with wget and then loading the file from a local path. However, this doesn't fully solve my problem.

Any recommendations?

pietz

1 Answer


The thing is that if the first argument of np.load() is a file-like object, it has to be seekable:

file : file-like object, string, or pathlib.Path The file to read. File-like objects must support the seek() and read() methods. Pickled files require that the file-like object support the readline() method as well.

If you are going to serve those files over HTTP and your server supports Range requests, you could use the (Python 2) implementation presented in this answer, for example:

F = HttpFile('http://localhost:8000/meta.data.npz')
with np.load(F) as data:
    a = data['arr_0']
    print(a)
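The same idea can be ported to Python 3 with only the standard library. Below is a minimal sketch of a seekable, read-only file-like object built on HTTP Range requests; the HttpRangeFile name and the one-request-per-read strategy are my own choices, and it assumes the server sends a Content-Length header and honors Range:

```python
import io
from urllib.request import Request, urlopen

class HttpRangeFile(io.RawIOBase):
    """Seekable, read-only file-like object backed by HTTP Range requests.
    Sketch only: assumes the server reports Content-Length and honors Range."""

    def __init__(self, url):
        self.url = url
        self.pos = 0
        # HEAD request to learn the total size without downloading the body
        head = urlopen(Request(url, method='HEAD'))
        self.length = int(head.headers['Content-Length'])

    def seekable(self):
        return True

    def readable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        elif whence == io.SEEK_END:
            self.pos = self.length + offset
        return self.pos

    def read(self, size=-1):
        if size < 0:
            size = self.length - self.pos
        if size <= 0 or self.pos >= self.length:
            return b''
        end = min(self.pos + size, self.length) - 1
        data = self._get_range(self.pos, end)
        self.pos += len(data)
        return data

    def _get_range(self, start, end):
        # One HTTP request per read() call -- simple, but chatty if the
        # consumer does many small reads
        req = Request(self.url, headers={'Range': 'bytes=%d-%d' % (start, end)})
        return urlopen(req).read()
```

With that in place, np.load(HttpRangeFile('http://localhost:8000/meta.data.npz')) should behave like the HttpFile snippet above, since the zip reader inside np.load only needs read(), seek() and tell().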

Alternatively, you could fetch the entire file, create an in-memory file-like object and pass it to np.load:

from io import BytesIO
import numpy as np
import requests

r = requests.get('http://localhost:8000/meta.data.npz')
# r.content is the decoded response body as bytes (unlike r.raw.read(),
# which can return a still-compressed payload)
data = np.load(BytesIO(r.content))
print(data['arr_0'])
ewcz