
I want to use a large image dataset from Kaggle to train a network I've created, but I have limited storage space on the machine I'm working on. Is there any way to fetch a Kaggle dataset from a URL and load/read its images directly in a Python script and start training on them, without downloading the 5+ GB of data to my machine? I don't have access to that much space.

One of the datasets I want to use is, for example, the CASIA dataset: https://www.kaggle.com/datasets/sophatvathana/casia-dataset

I want something like:

import requests
import numpy as np
import cv2

url_casia = "https://www.kaggle.com/datasets/sophatvathana/casia-dataset/download?datasetVersionNumber=1"

response = requests.get(url_casia, stream=True)
# or something like: response = urllib.request.urlopen(url_casia)

img_list = np.array([cv2.imread(image) for image in response])

I know this doesn't work because the response's content type is text/html; charset=utf-8, but I was wondering if there is any way to get the images, either as a zipfile or in any other form readable from Python, without actually downloading the archive to disk.

seeckhout
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Сергей Кох Mar 16 '23 at 08:05
  • @СергейКох I have edited my question; I hope it is clearer now. – seeckhout Mar 16 '23 at 11:13

1 Answer


When you request data from a web page, the response is loaded into your local or virtual machine's memory, not onto disk. To read the images this way, though, you would need the URL of each individual image, and then run something like this for each one:

import requests

resp = requests.get(
    url,  # the direct URL of a single image
    stream=True,
)
for chunk in resp.raw:
    print("Do something with each chunk...")

This is basically web scraping, and I imagine it doesn't suit your use case.
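If you do manage to get a direct, authenticated link to the dataset archive itself (note that the public page URL from your question returns HTML, not the zip), you can hold the zip in memory and read image bytes straight out of it without ever writing to disk. A minimal sketch of that pattern, using a tiny zip built in memory as a stand-in for the real download:

```python
import io
import zipfile

def images_from_zip_bytes(zip_bytes, suffixes=(".jpg", ".jpeg", ".png", ".tif")):
    # Yield (name, raw_bytes) for every image file inside an in-memory zip.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(suffixes):
                yield name, zf.read(name)

# Self-contained demo: build a small zip in memory instead of fetching one.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("casia/a.jpg", b"\xff\xd8fake-jpeg-bytes")
    zf.writestr("casia/readme.txt", b"not an image")

found = dict(images_from_zip_bytes(buf.getvalue()))
print(sorted(found))  # only the .jpg entry passes the suffix filter
```

With a real archive URL you would obtain zip_bytes via requests.get(direct_url).content and decode each raw_bytes entry with something like cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR). Keep in mind the whole zip sits in RAM, so this only avoids disk space, not memory.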

The Kaggle Python API lets you download the entire dataset locally, which is probably the better option.
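A sketch of that route, using the official kaggle package (pip install kaggle). It assumes you have API credentials in ~/.kaggle/kaggle.json; the dataset slug comes from the URL in your question, and the target directory here is an arbitrary choice:

```python
import os

def download_casia(target_dir):
    # Download and unpack the dataset via the official Kaggle API client.
    from kaggle.api.kaggle_api_extended import KaggleApi
    api = KaggleApi()
    api.authenticate()  # reads credentials from ~/.kaggle/kaggle.json
    api.dataset_download_files(
        "sophatvathana/casia-dataset",  # slug from the question's URL
        path=target_dir,
        unzip=True,  # unpack the archive after downloading
    )

target = os.path.join("/tmp", "casia-dataset")
if os.path.exists(os.path.expanduser("~/.kaggle/kaggle.json")):
    os.makedirs(target, exist_ok=True)
    download_casia(target)  # needs network access and ~5 GB of free space
else:
    print("No Kaggle credentials found; skipping download.")
```

Note this still downloads the full 5+ GB, so it works around the HTML-response problem but not the storage constraint.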