I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:

import urllib

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        # str() coerces header values to byte strings (webapp rejects unicode headers)
        self.response.headers['Content-Encoding'] = str('gzip')
        # self.response.headers['Content-Type'] = str('application/x-gzip')
        self.response.headers['Content-Type'] = str(blob_info.content_type)
        self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'attachment; filename=%s' % blob_info.filename
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str('public')
        self.send_blob(blob_info)

When this runs, the file is downloaded without the .gz extension. However, the downloaded file is still gzipped: its size matches the .gz file size on the server, and I can confirm it by manually gunzipping the downloaded file. I am trying to avoid that manual gunzip step.

I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?

By the way, the gzip archive contains only a single file. On my self-hosted (non-Google) server, I can accomplish the automatic gunzip by setting the same response headers, albeit with code written in PHP.

UPDATE:

I rewrote the handler to serve the data from the bucket. However, this generates an HTTP 500 error; the file is partially downloaded before the failure. The rewritten handler is as follows:

class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        file = '/gs/mydatabucket/%s' % blob_info.filename
        print file
        self.response.headers['Content-Encoding'] = str('gzip')
        self.response.headers['Content-Type'] = str('application/x-gzip')
        # self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'filename=%s' % (file)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(file)

This downloads 540,672 bytes of the 6,094,848-byte file to the client before the server terminates and issues a 500 error. When I run 'file' on the partially downloaded data from the command line, Mac OS correctly identifies the format as a 'SQLite 3.x database' file. Any idea why the server throws the 500 error? How can I fix the problem?

Sunny
  • Can you give more details on the types of blobs you are trying to transport in this manner and how the blobs were created? – someone1 Nov 12 '14 at 23:49
  • Hi there! Thanks for your response. I am trying to transport .sqlite files. The files were manually gzipped from the command line on a server. Also, we tried uploading Mac OS X Yosemite gzipped version files. The files were uploaded with a webapp2.RequestHandler "upload_url = blobstore.create_upload_url('/upload', gs_bucket_name='mydatabucket', )" and blobstore_handlers.BlobstoreUploadHandler "blob_info = self.get_uploads('file')[0]" is saved off to database for use in future downloads. – Sunny Nov 13 '14 at 04:03
  • If you uploaded a SQL file, your content-type should be `plain/text` and you need to make sure that the file uploaded was gzip compressed before being stored and had the `content-encoding: gzip` meta data attached to it. You will not get this with a direct upload using blobstore, you will need to transform it after upload or upload via CLI – someone1 Nov 17 '14 at 18:11
  • I have tried uploading the gzipped file to the GCS with gsutil/gsutil -h "Content-Encoding:gzip" -h "Content-Type:plain/text" cp ../my_db_file.sqlite.gz gs://mydatabucket and changing the Content-Type in download code above. The download still fails with 500 error. How do you upload to the blobstore from the command line or transform it after upload? – Sunny Nov 17 '14 at 19:14
  • It looks like you are using blobstore to upload to GCS. For a basic example, I suggest you download the SQL file without compression, use the "-z" CLI option so it will automatically compress and add the content-encoding headers for you, set the ACL to public, and download using the `storage.googleapis.com//` URL format. Again, you need to set the object to "public" in order for this basic example to work. Please refer to the SO question I linked in my answer – someone1 Nov 17 '14 at 19:29
  • I have done gsutil/gsutil cp -z sqlite -a public-read ../my_db_file.sqlite gs://mydatabucket. Now, I am not finding code showing how to serve storage.googleapis.com/mydatabucket/my_db_file.sqlite anywhere. Can you tell me how to change the above download code to user the storage.googleapis.com url? Doing self.send_blob(url) fails with 500. – Sunny Nov 17 '14 at 20:46
  • You can just use the URL, no need to "serve" it with the blobstore API, this defeats the purpose of having GCS handle the gzip detection for you. FYI, content-encoding and content-disposition is taken care of for you in the `send_blob` function: https://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/webapp/blobstore_handlers.py#226 – someone1 Nov 17 '14 at 21:49
  • Hmmm. Please show the code to 'use' the url. I am new to Google Cloud Platform and learning on the go. I tried the self.send_blob(url) and still got the 500 error. If this is not the correct way, what do I do with the url I constructed? – Sunny Nov 17 '14 at 22:00
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/65105/discussion-between-someone1-and-sunny). – someone1 Nov 17 '14 at 22:16

2 Answers

You should first check whether your requesting client supports gzipped content. If it does support gzip content encoding, then you may pass the gzipped blob through as-is with the proper content-encoding and content-type headers; otherwise you need to decompress the blob for the client. You should also verify that your blob's content_type isn't gzip (this depends on how you created your blob to begin with!)
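
As a rough illustration, here is a minimal sketch of that branching, assuming the blob holds a single gzip-compressed SQLite file (the handler name and the application/x-sqlite3 content type are assumptions, not from the question):

import gzip
import urllib

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class gz_download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        blob_info = blobstore.BlobInfo.get(str(urllib.unquote(resource)))
        if 'gzip' in self.request.headers.get('Accept-Encoding', ''):
            # Client will inflate on arrival: pass the stored gzip bytes through.
            self.response.headers['Content-Encoding'] = str('gzip')
            self.response.headers['Content-Type'] = str('application/x-sqlite3')  # assumed type
            self.send_blob(blob_info)
        else:
            # Client cannot inflate: decompress server-side before sending
            # (mind the response size limit for very large blobs).
            reader = blobstore.BlobReader(blob_info.key())
            self.response.headers['Content-Type'] = str('application/x-sqlite3')  # assumed type
            self.response.out.write(gzip.GzipFile(fileobj=reader).read())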

You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.

See this SO question: Google cloud storage console Content-Encoding to gzip

Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding

You may use GCS as easily (if not more easily) as you use the blobstore in AppEngine, and it seems to be the preferred storage layer to use going forward. I say this because the Files API, which made blobstore interaction easier, has been deprecated, and great effort has gone into the GCS libraries, making their API similar to Python's built-in file interface.
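
For example, a minimal sketch of writing a pre-gzipped object with the GAE cloudstorage client library (the bucket, object name, and content type are placeholders; check your library version's docs for exactly which option keys its options dict accepts):

import cloudstorage as gcs  # the GoogleAppEngineCloudStorageClient package

def store_gzipped(gzipped_bytes):
    # '/mydatabucket/...' is a placeholder '/<bucket>/<object>' path.
    with gcs.open('/mydatabucket/my_db_file.sqlite.gz', 'w',
                  content_type='application/x-sqlite3',
                  options={'x-goog-acl': 'public-read',
                           'content-encoding': 'gzip'}) as gcs_file:
        gcs_file.write(gzipped_bytes)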

UPDATE:

Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This sidesteps any unknown behavior of the Blobstore API and ensures GAE delivers your stored objects with the content-type and content-encoding you intended to use. For objects with a public-read ACL, you may simply direct users to either storage.googleapis.com/<bucket>/<object> or <bucket>.storage.googleapis.com/<object>. Alternatively, if you'd like application logic to dictate access, you should keep the objects' ACL private and use GCS Signed URLs to create short-lived URLs for the 302 redirect.
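
For the private-ACL case, a rough sketch of building a V2 signed URL with the App Engine app_identity API (the function name, bucket, object, and lifetime are placeholders):

import base64
import time
import urllib

from google.appengine.api import app_identity

def signed_gcs_url(bucket, obj, lifetime=300):
    expires = int(time.time()) + lifetime
    # V2 string-to-sign: verb, content-md5, content-type, expiry, resource.
    to_sign = '\n'.join(['GET', '', '', str(expires), '/%s/%s' % (bucket, obj)])
    _, signature = app_identity.sign_blob(to_sign)
    return ('https://storage.googleapis.com/%s/%s?GoogleAccessId=%s'
            '&Expires=%d&Signature=%s') % (
                bucket, obj, app_identity.get_service_account_name(),
                expires, urllib.quote_plus(base64.b64encode(signature)))

A handler would then self.redirect(signed_gcs_url(...)) rather than redirecting to the public URL.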

It's worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after upload to ensure proper gzip compression is applied and the content-encoding metadata is set.

import urllib
from google.appengine.ext.webapp import blobstore_handlers

class legacy_download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        filename = str(urllib.unquote(resource))
        # 302 redirect: let GCS serve the object and negotiate gzip itself.
        url = 'https://storage.googleapis.com/mybucket/' + filename
        self.redirect(url)
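
To wire this up, a route mapping such as the following would work (the /download/<name> pattern is illustrative, matching the URL scheme mentioned in the comments):

import webapp2

app = webapp2.WSGIApplication([('/download/([^/]+)', legacy_download)])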

someone1
  • From your suggestion, I understand that I should manually upload the files using gsutil: gsutil -h "Content-Encoding:gzip" -h "Content-Type:text/plain" cp my_data_file.sqlite gs://bucket/mydatabucket. I am not sure how to serve this file to users from code. I haven't seen any examples of serving data from gsutil buckets. E.g., user apps make requests such as: http://myserver.com/download/dataX--where dataX identifies the data needed. – Sunny Nov 13 '14 at 04:19
  • I tried serving from GCS. However, this does not seem to work (see the updated question above). Any ideas what might be happening? – Sunny Nov 14 '14 at 20:13

GAE already serves everything using gzip if the client supports it. So I think what's happening after your update is that the browser expects more of the file, but GAE thinks it's already at the end of the file, since the content is already gzipped. That's why you get the 500 (if that makes sense).

Anyway, since GAE already handles compression for you, the easiest approach is probably to put uncompressed files in GCS and let the Google infrastructure handle the compression automatically when you serve them.
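
If you take that route, a minimal sketch of serving the uncompressed GCS object through the Blobstore API (the bucket name is a placeholder); with no Content-Encoding header set by hand, GAE is free to gzip the response itself for clients that accept it:

import urllib
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class gcs_download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        filename = str(urllib.unquote(resource))
        # Map the GCS object into the Blobstore API, then serve it as-is.
        gs_key = blobstore.create_gs_key('/gs/mydatabucket/' + filename)
        self.send_blob(gs_key)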

Christiaan