
I'm trying to read a large CSV file (25 GB) into a pandas DataFrame on a Google Cloud instance using the following method:

import pandas as pd
from google.cloud import storage
from io import StringIO

client = storage.Client()
bucket = client.get_bucket('bucket')
blob = bucket.get_blob("full_dataset.csv")
bt = blob.download_as_string()  # pulls the entire 25 GB object into memory

s = str(bt,"utf-8")
s = StringIO(s)
df = pd.read_csv(s)

which gives me the following error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-18-e919b9b86de2> in <module>
      2 
      3 s = str(bt,"utf-8")
----> 4 s = StringIO(s)

MemoryError: 

Is there another method I could use to read this CSV file efficiently without running into a memory error?

Alex Martin

1 Answer


The object is too big to fit into a single in-memory string. Your snippet actually holds several full copies of it at once: the downloaded bytes, the decoded str, and the StringIO buffer, which is why the error surfaces at StringIO(s). You can instead read it chunk by chunk, for example by using google.resumable_media.
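A minimal sketch of that approach using ChunkedDownload from the google-resumable-media package. The bucket and object names are the ones from your question, it assumes Application Default Credentials are available on the instance, and the chunk size is just an illustrative value:

import google.auth
from google.auth.transport.requests import AuthorizedSession
from google.resumable_media.requests import ChunkedDownload

# Build an authorized HTTP session from the instance's default credentials.
credentials, _ = google.auth.default()
transport = AuthorizedSession(credentials)

# JSON API media-download URL for gs://bucket/full_dataset.csv.
url = ("https://storage.googleapis.com/download/storage/v1/"
       "b/bucket/o/full_dataset.csv?alt=media")

chunk_size = 256 * 1024 * 1024  # fetch 256 MB per request

# Stream the object to local disk one chunk at a time instead of
# materializing all 25 GB in memory.
with open("full_dataset.csv", "wb") as stream:
    download = ChunkedDownload(url, chunk_size, stream)
    while not download.finished:
        download.consume_next_chunk(transport)

Once the file is on local disk you can keep the memory footprint small during parsing as well, e.g. pd.read_csv("full_dataset.csv", chunksize=1_000_000) returns an iterator of DataFrame chunks that you can process one at a time instead of building a single 25 GB DataFrame.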

Mike Schwartz