I'm trying to read a large CSV file (25 GB) into pandas on a Google Cloud instance using the following method:
import pandas as pd
from google.cloud import storage
from io import StringIO

client = storage.Client()
bucket = client.get_bucket('bucket')
blob = bucket.get_blob("full_dataset.csv")

# download the entire object into memory as bytes
bt = blob.download_as_string()
# decode the bytes into a second in-memory copy as str
s = str(bt, "utf-8")
s = StringIO(s)
df = pd.read_csv(s)
which gives me the following error:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-18-e919b9b86de2> in <module>
2
3 s = str(bt,"utf-8")
----> 4 s = StringIO(s)
MemoryError:
Is there another method that I could use to efficiently read this CSV file without a memory error?
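
For context, a streaming, chunked read like the sketch below is the kind of thing I had in mind (a minimal sketch, assuming blob.open() is available in my version of google-cloud-storage; process() is a hypothetical stand-in for whatever is done with each chunk):

import pandas as pd
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucket')
blob = bucket.get_blob("full_dataset.csv")

# blob.open() returns a file-like object that streams the download,
# so the full 25 GB is never held in memory at once
with blob.open("rt", encoding="utf-8") as f:
    # chunksize makes read_csv yield DataFrames of up to 1M rows each
    for chunk in pd.read_csv(f, chunksize=1_000_000):
        process(chunk)  # hypothetical per-chunk handler

I've also seen that pandas can read a gs:// URL directly when gcsfs is installed (e.g. pd.read_csv('gs://bucket/full_dataset.csv', chunksize=...)), but I haven't confirmed its memory behavior.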