I have a dataset that is 1.2 TB. It is a directory of several folders hosted at a URL. The directory structure looks like the image below (Li et al. 2021, a synthetic building operation dataset).
No matter how I request the data with urllib, Google Colab crashes. I changed the runtime type as well, and it didn't help. Are there any methods I can use to read this directory without purchasing Google Colab Pro?
I have tried two methods to get the data.
import urllib.request
url = 'https://oedi-data-lake.s3-us-west-2.amazonaws.com/building_synthetic_dataset/A_Synthetic_Building_Operation_Dataset.h5'
with urllib.request.urlopen(url) as response:
    data = response.read()  # reads the entire response body into memory
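As a sanity check before attempting a full read, a HEAD request can report the remote file size without downloading the body. This is only a sketch: `remote_size_bytes` is my own helper name, and it assumes the server returns a `Content-Length` header.

```python
import urllib.request

def remote_size_bytes(url):
    # A HEAD request asks the server for headers only,
    # so no part of the (possibly huge) body is transferred.
    req = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers['Content-Length'])

# Example (not run here):
# remote_size_bytes('https://oedi-data-lake.s3-us-west-2.amazonaws.com/'
#                   'building_synthetic_dataset/A_Synthetic_Building_Operation_Dataset.h5')
```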
And
import urllib.request
import xarray as xr
import io
url = 'https://oedi-data-lake.s3-us-west-2.amazonaws.com/building_synthetic_dataset/A_Synthetic_Building_Operation_Dataset.h5'
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as resp:
    ds = xr.open_dataset(io.BytesIO(resp.read()))  # buffers the whole file in RAM first
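For reference, I suspect `resp.read()` is what crashes the runtime in both attempts, since it buffers the whole 1.2 TB response in RAM. A chunked copy to disk avoids that particular failure mode. This is only a sketch (`stream_to_file` is a made-up helper name), and note that the free-tier Colab disk still cannot hold 1.2 TB, so it only helps for smaller files or partial downloads.

```python
import shutil
import urllib.request

def stream_to_file(url, path, chunk_bytes=1024 * 1024):
    # Copy the response to disk in fixed-size chunks so peak memory
    # use stays around chunk_bytes instead of the full file size.
    with urllib.request.urlopen(url) as resp, open(path, 'wb') as out:
        shutil.copyfileobj(resp, out, length=chunk_bytes)
```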