I'm working with large data files stored in Google Cloud Storage. My Python script first downloads a blob containing JSON lines, then opens the local file and analyzes the data line by line. This is very slow, and I'd like to know if there is a faster way to do it. From the command line I can use gsutil cat to stream the data to stdout; is there a similar way to do this in Python?
This is what I currently do to read the data:
from google.cloud import storage

myClient = storage.Client()
bucket = myClient.get_bucket(bucketname)
blob = storage.blob.Blob(blobname, bucket)
blob.download_to_filename("filename.txt")  # download the whole blob first
f = open("filename.txt", "r")
data = f.readlines()
for line in data:
    # Do stuff
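For what it's worth, I noticed that even the local read can be made lazier: readlines() loads the entire file into memory, while iterating the open file object yields one line at a time. A small self-contained sketch (using a temporary file to stand in for the downloaded blob) of what I mean:

```python
import tempfile

# Write a small JSON-lines file to stand in for the downloaded blob.
with tempfile.NamedTemporaryFile("w+", suffix=".txt", delete=False) as tmp:
    tmp.write('{"a": 1}\n{"a": 2}\n')
    path = tmp.name

lines = []
with open(path, "r") as f:
    for line in f:  # lazy: reads one line at a time, no readlines()
        lines.append(line.rstrip("\n"))

print(lines)  # → ['{"a": 1}', '{"a": 2}']
```

This helps with memory, but it still waits for the full download, which is the part I want to avoid.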
I want to read the blob line by line, without waiting for the whole download to finish.

Edit: I found this answer, but the function isn't clear to me. I don't know how to read the streaming lines.
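To clarify what I'm after, here is a sketch of the shape I'd like the code to have. The process_lines helper name is mine; the commented part assumes Blob.open() (added in newer google-cloud-storage releases) returns a streaming text file-like object, which I haven't been able to verify against a real bucket:

```python
import io
import json

def process_lines(fobj):
    """Parse one JSON object per line from any text file-like object, lazily."""
    for line in fobj:
        line = line.strip()
        if line:
            yield json.loads(line)

# The helper works on any file-like source; a StringIO stands in for the
# blob stream here so the example is runnable without GCS credentials.
demo = io.StringIO('{"id": 1}\n{"id": 2}\n')
records = list(process_lines(demo))
print(records)  # → [{'id': 1}, {'id': 2}]

# What I hope would work against the bucket (assumption, untested):
#   from google.cloud import storage
#   client = storage.Client()
#   blob = client.get_bucket("my-bucket").blob("my-data.jsonl")  # placeholder names
#   with blob.open("rt") as f:  # stream lines without a local temp file
#       for record in process_lines(f):
#           pass  # Do stuff with each parsed record
```

Is this the right way to consume the stream, or does the answer I linked intend something different?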