I want to use a large image dataset from Kaggle to train a network I've created; however, I have limited storage space on the machine I'm working on. I was wondering if there is any way to fetch the Kaggle dataset from a URL and load/read its images directly in a Python script and start training on them, without having to download the 5+ GB of data onto my machine, since I don't have access to that much space.
One of the datasets I want to use is, for example, this CASIA dataset: https://www.kaggle.com/datasets/sophatvathana/casia-dataset
I want something like:
import cv2
import numpy as np
import requests

url_casia = "https://www.kaggle.com/datasets/sophatvathana/casia-dataset/download?datasetVersionNumber=1"
response = requests.get(url_casia, stream=True)
# or something like: response = urllib.request.urlopen(url_casia)
img_list = np.array([cv2.imread(image) for image in response])
I know this doesn't work, because the response content type is text/html; charset=utf-8, but I was wondering if there is any way to get the images as a zip file, or in any other form readable in Python, without actually downloading the zip file to disk.
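For reference, if I did have a direct, authenticated link to the zip archive (zip_url below is only a placeholder, not a real download link), I imagine something along these lines could decode the images straight from the archive in memory, without ever extracting it to disk:

import io
import zipfile

import cv2
import numpy as np
import requests

# Placeholder URL: the Kaggle page URL above only returns HTML, so an
# authenticated direct download link (e.g. via the Kaggle API) would be needed.
zip_url = "https://example.com/casia-dataset.zip"

response = requests.get(zip_url)
response.raise_for_status()

# Open the archive entirely in memory instead of writing it to disk.
archive = zipfile.ZipFile(io.BytesIO(response.content))

img_list = []
for name in archive.namelist():
    if name.lower().endswith((".jpg", ".jpeg", ".png")):
        # Decode each image straight from the raw bytes stored in the zip.
        data = np.frombuffer(archive.read(name), dtype=np.uint8)
        img = cv2.imdecode(data, cv2.IMREAD_COLOR)
        if img is not None:
            img_list.append(img)

That avoids writing anything to disk, but it still buffers the entire archive in memory, so I'm not sure it really solves the space problem either.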