I'm trying to read a random line out of a large file stored in a public cloud storage bucket.
My understanding is that I can't do this with gsutil. I have looked into FUSE, but I am not sure it will fit my use case: https://cloud.google.com/storage/docs/gcs-fuse
There are many files, each about 50 GB, for a total of several terabytes. If possible, I would like to avoid downloading them. They are all plain text files; you can see them here: https://console.cloud.google.com/storage/browser/genomics-public-data/linkage-disequilibrium/1000-genomes-phase-3/ldCutoff0.4_window1MB
It would be great if I could simply get a filesystem handle using FUSE so that I could feed the data directly into my other scripts, but I am okay with rewriting them to read line by line if that is what is necessary. The key requirement is that under no circumstances should the interface download the entire file.
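
To make the requirement concrete, here is a rough sketch of the access pattern I have in mind: jump to a random byte offset, skip the rest of the partial line, and return the next full line, fetching only small byte ranges. It assumes the objects can be read over plain HTTP with a Range header at a URL of the form https://storage.googleapis.com/BUCKET/OBJECT, which I have not confirmed for these files. The names `http_size`, `http_range_reader`, `read_until_newline` and `random_line` are my own placeholders, not part of any library. Also note that this does not sample lines uniformly: a line is picked with probability proportional to the length of the line before it.

```python
import random
import urllib.request


def http_size(url):
    """Return the object size in bytes from a HEAD request (assumes Content-Length is sent)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])


def http_range_reader(url):
    """Return read_range(start, end) that fetches bytes start..end inclusive via a Range header."""
    def read_range(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return read_range


def read_until_newline(read_range, size, pos, chunk):
    """Read from pos up to and including the next newline, or to end of file."""
    buf = b""
    while pos < size:
        end = min(pos + chunk, size) - 1
        data = read_range(pos, end)
        if not data:  # guard against a reader that returns nothing
            break
        nl = data.find(b"\n")
        if nl != -1:
            return buf + data[:nl + 1]
        buf += data
        pos += len(data)
    return buf


def random_line(read_range, size, rng=random, chunk=64 * 1024):
    """Return one line chosen by random byte offset, reading only small ranges."""
    offset = rng.randrange(size)
    # Discard the remainder of whatever line the offset landed in.
    partial = read_until_newline(read_range, size, offset, chunk)
    start = offset + len(partial)
    if start >= size:
        start = 0  # offset landed in the last line, so wrap around to the first line
    return read_until_newline(read_range, size, start, chunk).rstrip(b"\r\n")
```

Usage would be something like `random_line(http_range_reader(url), http_size(url))`, where `url` points at one of the objects in the bucket. Each call should transfer at most a few chunks, never the whole 50 GB file, as long as individual lines are short.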