
Is there a way to expose data in Google Cloud Storage to a VM without downloading all of it onto the VM at once?

Amazon SageMaker Fast File Mode (FFM) exposes data in S3 to machine learning applications so that it appears to be on a local file system. This provides the convenience of accessing the data as if it were stored locally, without the overhead and cost of downloading it all before training.

Is something similar also available in GCP? Can GCP's Filestore be used for this purpose?

  • Try GCSFuse. https://medium.com/analytics-vidhya/improve-workflow-with-cloud-storage-fuse-89b8d76d0886 – Ricco D Oct 11 '22 at 05:36
  • Is "reading" data different from "downloading" data? – guillaume blaquiere Oct 11 '22 at 07:16
  • By reading I meant streaming the data, which would allow model training to start without waiting for the complete dataset to be downloaded first. This is what Amazon SageMaker's Fast File Mode (FFM) provides. Is there something similar in GCP? – Tarun Gupta Oct 12 '22 at 18:03

1 Answer


Google Cloud Storage is an object storage service accessed through an API. As a result, it isn't really designed to be "mounted" within a VM; instead, it is designed to be highly durable and to scale to extraordinarily large objects.
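Because Cloud Storage is read through an API, a training script can also consume an object incrementally without mounting anything. The sketch below is an alternative the answer does not mention; it assumes the `google-cloud-storage` client library (a version that provides `Blob.open`) and configured application default credentials. The chunking helper `iter_chunks` works on any binary file-like object, and both function names are hypothetical.

```python
from typing import BinaryIO, Iterator


def iter_chunks(fileobj: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield successive chunks from a binary file-like object until EOF."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        yield chunk


def stream_gcs_object(
    bucket_name: str, blob_name: str, chunk_size: int = 1 << 20
) -> Iterator[bytes]:
    """Stream one Cloud Storage object chunk by chunk (sketch).

    Assumes google-cloud-storage is installed and credentials are set up.
    Blob.open("rb") returns a reader that downloads the object in pieces
    (its chunk_size sets the size of each download request), so the caller
    can start consuming data before the whole object has arrived.
    """
    from google.cloud import storage  # imported lazily: only needed here

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    with blob.open("rb", chunk_size=chunk_size) as reader:
        yield from iter_chunks(reader, chunk_size)
```

Usage would look like `for chunk in stream_gcs_object("my-training-bucket", "data/train.bin"): ...`, where both names are hypothetical placeholders. Keeping `iter_chunks` separate means the parsing side can be tested against an in-memory buffer without touching Cloud Storage.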

However, as @Ricco D mentioned, you can use gcsfuse to mount a bucket as a filesystem, although that approach has some significant drawbacks. For example, operations that are simple on a normal filesystem can require many Cloud Storage operations, which can make them expensive.
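If you do use gcsfuse, the bucket shows up as an ordinary directory (for example after running something like `gcsfuse my-training-bucket /mnt/gcs-data`, where the bucket name and mount point are hypothetical), and training code can read files lazily instead of copying the dataset first. The sketch below assumes the data files sit under the mount point; it opens one file at a time and reads fairly large sequential chunks, on the assumption that fewer, larger reads tend to translate into fewer Cloud Storage requests than many tiny ones. The function name `iter_mounted_files` is hypothetical.

```python
import os
from typing import Iterator, Tuple


def iter_mounted_files(
    root: str, chunk_size: int = 8 << 20
) -> Iterator[Tuple[str, bytes]]:
    """Yield (relative_path, chunk) pairs for every file under root.

    root is assumed to be a gcsfuse mount point (e.g. the hypothetical
    /mnt/gcs-data), but any local directory works the same way. Files are
    opened one at a time and read in chunk_size pieces, so consumers can
    start working before the whole dataset has been read.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place gives a deterministic walk order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:  # end of this file
                        break
                    yield rel, chunk
```

Because nothing is read until the generator is iterated, a data loader can begin processing the first file while later files are still untouched on Cloud Storage.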

Chanpols