
Some months ago we activated Cloud CDN for our storage buckets. Our storage data is changed regularly via a backend, so to invalidate the cached version we added a query parameter containing the changedDate to the URL that is served to the client.
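The scheme looked roughly like this (a minimal sketch; the helper name and the epoch-seconds encoding are illustrative, not our actual backend code):

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

def cache_busted_url(base_url: str, changed_date: datetime) -> str:
    # Encode the object's last-changed time as a query parameter so that
    # every change produces a distinct URL (illustrative helper).
    return f"{base_url}?{urlencode({'changedDate': int(changed_date.timestamp())})}"

url = cache_busted_url(
    "https://example.com/images/cat.jpg",
    datetime(2019, 3, 1, tzinfo=timezone.utc),
)
# e.g. https://example.com/images/cat.jpg?changedDate=1551398400
```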

Back then this worked well.

Sometime in the last few months (probably weeks), Google seems to have changed this and now ignores the query string when caching content from storage buckets.

  • First part: does anyone know why this was changed, and why no one was notified about it?
  • Second part: how can you invalidate the cache for a particular object in a storage bucket without sending a cache-invalidation request (which you shouldn't) every time?

I don't like the idea of deleting the old file and uploading a new file with a changed filename every time something is uploaded...

EDIT, for clarification: the official documentation ( cloud.google.com/cdn/docs/caching ) already states that they now ignore query strings for storage buckets:

For backend buckets, the cache key consists of the URI without the query string. Thus https://example.com/images/cat.jpg, https://example.com/images/cat.jpg?user=user1, and https://example.com/images/cat.jpg?user=user2 are equivalent.
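The documented behaviour amounts to dropping the query string before the URL is used as a cache key; a rough sketch of that equivalence:

```python
from urllib.parse import urlsplit, urlunsplit

def backend_bucket_cache_key(url: str) -> str:
    # Approximate the documented behaviour: drop the query string
    # (and fragment) so all variants collapse to the same cache key.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

assert (
    backend_bucket_cache_key("https://example.com/images/cat.jpg?user=user1")
    == backend_bucket_cache_key("https://example.com/images/cat.jpg?user=user2")
    == "https://example.com/images/cat.jpg"
)
```

Which is exactly why the changedDate cache-buster no longer has any effect.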

  • What do you have set for the CDN `cache key`? Edit your question with the CDN configuration. This document might help you: https://cloud.google.com/cdn/docs/caching – John Hanley Mar 21 '19 at 19:33
  • That's exactly the point: they changed it so that you can't set it for storage buckets. It says so in the document if you scroll down. – Markus Zancolò Mar 23 '19 at 20:09

3 Answers


We were affected by this also. After contacting Google Support, they confirmed this is a permanent change. The recommended workaround is to either use versioning in the object name or use cache invalidation. The latter sounds a bit odd, as the cache invalidation documentation states:

Invalidation is intended for use in exceptional circumstances, not as part of your normal workflow.
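For what it's worth, the object-name versioning workaround can be sketched like this (the content-hash naming scheme below is only an illustrative convention, not something Google prescribes):

```python
import hashlib

def versioned_object_name(name: str, content: bytes) -> str:
    # Embed a short content hash in the object name so every change
    # produces a new name, and therefore a new CDN cache key.
    digest = hashlib.md5(content).hexdigest()[:8]
    stem, dot, ext = name.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"

versioned_object_name("cat.jpg", b"hello")  # -> "cat.5d41402a.jpg"
```

Clients are then given the versioned name, so stale cached copies are simply never requested again.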

jkoskela
  • If you change your storage data infrequently, it's completely fine to use invalidation. The downside to invalidation is just that you can only initiate one invalidation per minute. – Todd Greer Apr 01 '19 at 21:04
  • Yes, the original poster changes data regularly, as do we, in a multi-tenant system with thousands of customers and a backend updating their data often. Invalidation is out of the question in such a case. We had to disable Cloud CDN for now, as using versioning in the object name requires too many changes, and isn't a very nice solution anyway. – jkoskela Apr 02 '19 at 06:12

For backend buckets, the cache key consists of the URI without the query string, as the official documentation states. The bucket itself does not evaluate the query string, but the CDN should still do so. I reproduced this same scenario, and in my tests it is currently still possible to use a query string as a cache buster.

It seems the reason for the change is that the old behavior resulted in lost caching opportunities, higher costs, and higher latency. The only recommended workarounds for now are to create new objects that incorporate the version into the object's name (which seems not to be a valid option in your case), or to use cache invalidation.

Invalidating the cache for a particular object requires specifying that object's path in the invalidation request. Alternatively, a Cache-Control header that allows such objects to be cached only for a limited time may work around the problem: a Cloud CDN cache entry's expiration is defined by the "Cache-Control: s-maxage", "Cache-Control: max-age", and/or "Expires" headers.
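A minimal sketch of that header-based approach (the helper and the example values are illustrative; the header directives themselves are standard HTTP):

```python
from typing import Optional

def cache_control_header(max_age_s: int, s_maxage_s: Optional[int] = None) -> str:
    # Build a Cache-Control value bounding how long browsers (max-age)
    # and shared caches like Cloud CDN (s-maxage) may serve the object
    # without going back to the origin.
    value = f"public, max-age={max_age_s}"
    if s_maxage_s is not None:
        value += f", s-maxage={s_maxage_s}"
    return value

# Set as the object's Cache-Control metadata at upload time, e.g.:
cache_control_header(900, 900)  # -> "public, max-age=900, s-maxage=900"
```

The trade-off, of course, is that updates only propagate once the TTL expires.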


Lozano
  • Invalidating the cache is not an option: we have a few thousand objects, most of them unchanged for weeks or months, but every day some are changed, so triggering a custom cache invalidation every time is not an option. Saving the object as a new file results in a massive increase in storage, since some clients might have cached the URLs for the old files, so we can't delete them. Reducing the cache time to an acceptable delay (15 min) basically removes the benefit of a CDN, which could otherwise cache them for days... – Markus Zancolò Mar 28 '19 at 19:41
  • I understand that the old behaviour caused problems for some use cases, but those problems were already solvable by using custom cache keys and removing the query string from the cache key. – Markus Zancolò Mar 28 '19 at 19:46
  • I created a feature request to give this some attention. You can follow up here: https://issuetracker.google.com/129539674 – Lozano Mar 29 '19 at 15:52

According to the docs, when using a backend bucket as the origin for Cloud CDN, query strings in the request URL are not included in the cache key:

For backend buckets, the cache key consists of the URI without the protocol, host, or query string.

Using the query string to identify different versions of cached content may not be the best practice promoted by GCP, but for legacy reasons it sometimes has to be done.

So, one way to work around this is to serve the backend bucket as a static website (do NOT enable CDN here), then use a custom origin (a Cloud CDN-enabled backend service backed by an internet network endpoint group) that points to that static website.

For a backend service, the query string IS part of the cache key:

For backend services, Cloud CDN defaults to using the complete request URI as the cache key

That's it. Yes, it is tedious, but it works!
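A rough outline of the setup with gcloud (all resource names and the example hostname are placeholders, and the exact flags may vary by gcloud version; treat this as a sketch of the steps, not a tested recipe):

```shell
# 1. Serve the bucket as a static website (do NOT enable CDN on the bucket).
gsutil web set -m index.html gs://www.example.com

# 2. Create a global internet NEG whose endpoint is the website's hostname.
gcloud compute network-endpoint-groups create static-site-neg \
    --global \
    --network-endpoint-type=internet-fqdn-port \
    --default-port=443
gcloud compute network-endpoint-groups update static-site-neg \
    --global \
    --add-endpoint="fqdn=www.example.com,port=443"

# 3. Put a CDN-enabled backend service in front of the NEG; because this is
#    a backend *service*, the query string is part of the cache key again.
gcloud compute backend-services create static-site-backend \
    --global \
    --protocol=HTTPS \
    --enable-cdn
gcloud compute backend-services add-backend static-site-backend \
    --global \
    --network-endpoint-group=static-site-neg \
    --global-network-endpoint-group
```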

Browny Lin
  • I need more information. Those links do explain what each of the terms means, but don't describe this specific scenario. Does this mean I have to write a server to proxy to the bucket, or is there a way to configure this using Google's Cloud Console? I currently have my Load Balancer pointing directly to the bucket for the entire domain. – Megamind Nov 10 '20 at 00:46