I am creating a web application that prompts the user to upload a file, which is stored in a Google Cloud Storage bucket. I use a parsing function (from a package on pip) to extract the data; it takes a filepath and loads the specified file.

The path for the file in the bucket is gs://my_bucket/myfile.ged, but the file can't be found when I pass this path to the parsing function. When run locally, with the file deployed in the project folder alongside the script, it parses the file as expected. But when run on App Engine on Google Cloud Platform, it cannot find the file.

The problem is similar to the one described here. This is how I would expect it to work.

f = request.files['fileToUpload']
blob = bucket.blob("myfile.ged")
blob.upload_from_file(f)
gs_path = 'gs://my_bucket/myfile.ged'
parsing_function(gs_path)

And I guess I shouldn't be too surprised that the following testing function always returns 'empty':

def testing():
    var = 'empty'

    filename = 'gs://my_bucket/myfile.ged'

    if(os.path.exists(filename)):
        var = 'filename'

    else:
        blobs = bucket.list_blobs()
        for blob in blobs:
            if(os.path.exists(blob.path)):
                var = blob.path

    return var

I've tried using the temp_file method, which raises TypeError: expected str, bytes or os.PathLike object, not _TemporaryFileWrapper

 with TemporaryFile() as temp_file:
     blob.download_to_file(temp_file)
     temp_file.seek(0)
     parsing_function(temp_file)

I've also tried:

  • Obtaining the blob.path, but it comes in the format /b/my_bucket/o/myfile.ged, which also can't be found.
  • The I/O method described here:

    filepath = BytesIO()
    blob.download_to_file(filepath)
    parsing_function(filepath)  
    

But this also returns the TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO

So after an exhaustive search, I've come here for help. Any suggestions or alternatives would be greatly appreciated.

  • I think you are starting to realize that an object contained in a bucket owned by Google Cloud Storage is not the same as a POSIX (regular) file that can be read/written using ordinary APIs. The solution is to use the GCS specific APIs to access the content of the object and work with it there. For example, you should be able to stream/read the GCS object content as data and process it. – Kolban Dec 19 '19 at 05:30
  • As I understand it, the issue is that the given path doesn't return the file as expected. In that case I would suggest taking a look at [this post](https://stackoverflow.com/a/48917978/11928130). If this is not the case, could you explain in detail what your exact issue is? Thank you. – tzovourn Dec 19 '19 at 16:16
  • The function `os.path.exists()` does not test whether a file exists in Google Cloud Storage. Use a Google Cloud SDK function such as `bucket.blob(filepath).exists()` – John Hanley Dec 19 '19 at 18:32

1 Answer

As already stated, you should use the Google Cloud Storage client library.
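For instance, the existence check from the question could be sketched against the client library instead of `os.path`. This is only a minimal sketch: `blob_exists` is a hypothetical helper name, and the `bucket` argument is assumed to behave like a `google.cloud.storage.Bucket`, whose `blob(name).exists()` call is the one mentioned in the comments.

```python
def blob_exists(bucket, blob_name):
    """Return True if an object called `blob_name` exists in `bucket`.

    `bucket` is assumed to behave like google.cloud.storage.Bucket.
    `blob_name` is the object name inside the bucket ("myfile.ged"),
    not a gs:// URL and not a local filesystem path.
    """
    return bucket.blob(blob_name).exists()


# Assumed usage with the real client (requires credentials):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("my_bucket")
#   blob_exists(bucket, "myfile.ged")
```

Note that the object is addressed by its name within the bucket, so neither the `gs://` prefix nor the `/b/my_bucket/o/...` form of `blob.path` is passed in.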

One thing to keep in mind is that Cloud Storage has no real file paths or directories: objects are addressed by name, and a prefix plus a delimiter only emulates a directory listing. What you can do instead is use something like:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket(bucket_name)  # bucket_name: e.g. "my_bucket"
iterator = bucket.list_blobs(
    versions=True,
    prefix='dir/subdir1/subdir2/',
    delimiter='/'
)
# Consume the iterator first: prefixes are filled in as pages are fetched.
objects = list(iterator)
subdirectories = iterator.prefixes

Another thing to keep in mind is where to store files in App Engine. As stated here, you can use the /tmp directory to store temporary files that you want to process in App Engine.
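Putting this together for a parser that only accepts a filepath, a minimal sketch could download the object to a local file under /tmp and pass that path on. The helper name `parse_blob_via_tmp` and its `parse` argument are hypothetical; `blob` is assumed to behave like a `google.cloud.storage.Blob`, i.e. to provide `name` and `download_to_filename()`.

```python
import os
import tempfile


def parse_blob_via_tmp(blob, parse, tmp_dir="/tmp"):
    """Download `blob` to a local file under `tmp_dir`, then call `parse(path)`.

    `parse` stands in for any function that only accepts a filesystem
    path, such as the parsing function in the question.
    """
    # Keep the original extension in case the parser inspects it.
    suffix = os.path.splitext(blob.name)[1]
    fd, local_path = tempfile.mkstemp(suffix=suffix, dir=tmp_dir)
    os.close(fd)  # download_to_filename opens the path itself
    try:
        blob.download_to_filename(local_path)
        return parse(local_path)
    finally:
        # Clean up so temporary files do not accumulate on the instance.
        os.remove(local_path)


# Assumed usage with the real client (requires credentials):
#   from google.cloud import storage
#   blob = storage.Client().bucket("my_bucket").blob("myfile.ged")
#   result = parse_blob_via_tmp(blob, parsing_function)
```

Using `tempfile.mkstemp` rather than a fixed name like /tmp/myfile.ged is a design choice here, so that concurrent requests do not overwrite each other's files; a fixed absolute path works as well for a single request.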

Waelmas
  • Thank you so much, using the /tmp directory did it. While my previous temp file approaches didn't work, this did: `temp_path = "tmp/myfile.ged"; blob.download_to_filename(temp_path); my_function(temp_path)` – Tyler Sloan Jan 07 '20 at 11:30
  • I should clarify that the path that actually works for writing a file to the temporary `tmp` folder is: `temp_path = "/tmp/myfile.ged"` The leading forward slash is essential on App Engine, as described [here](https://cloud.google.com/appengine/docs/standard/python3/using-temp-files). – Tyler Sloan Jan 08 '20 at 18:54