
I am looking for a pattern that would allow me to improve the UX for my users. I have a REST server running behind CloudFront, consumed by a plain React application on the frontend.

I'll simplify my example to illustrate my issue.

I have an endpoint called GET /posts/<id>. When the browser requests it, the response comes with Cache-Control: max-age=180, which means it gets stored in the browser's cache, and any subsequent call to GET /posts/<id> will be served from the browser's cache for those 180 seconds, after which the browser hits the CDN again to obtain a fresh copy.

That is okay for most users. I don't mind if updates to a post take up to 3 minutes to propagate to all users. But there is one user who is the author of the post: that user can make changes to it using PATCH /posts/<id>. Let's call that user The Editor.

Here's a scenario I have right now:

  • The Editor loads up the post page which then calls GET /posts/5
  • The CDN serves the latest copy to the front end.
  • The Editor then makes a change to the post and submits it to the back end via PATCH /posts/5.
  • The Editor then refreshes his browser tab using Command-R (or Ctrl-R).
  • As a result, the front end requests GET /posts/5 again -- but gets the stale copy from before the changes, because 180 seconds haven't passed between the original GET and the GET issued after the PATCH.

What I'd like the experience to be is:

  • The Editor loads up the post page which then calls GET /posts/5
  • The CDN serves the latest copy to the front end.
  • The Editor then makes a change to the post and submits it to the back end via PATCH /posts/5.
  • After a Command-R browser tab refresh, GET /posts/5 brings back the data with the changes The Editor made via PATCH right away, regardless of the 180-second TTL before a fresh copy can otherwise be obtained.
  • As for the rest of the users, it's perfectly okay for them to wait up to 180 seconds before the change to the post propagates to them when they GET /posts/5.

I am using Axios, but I do know that SWR and React Query support mutations. To my understanding, this would allow The Editor to declare a mutation for the object he just PATCHed on the server, so that any subsequent calls he makes to GET /posts/5 will be served from there, until a fresher version can be obtained from the backend.
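Roughly what I have in mind, based on my reading of the SWR docs (this is only a sketch; usePost and savePost are made-up names for illustration):

```javascript
import useSWR, { mutate } from "swr";
import axios from "axios";

const fetcher = (url) => axios.get(url).then((res) => res.data);

// Read a post; SWR keeps the result in its cache, keyed by the URL.
function usePost(id) {
  return useSWR(`/posts/${id}`, fetcher);
}

// The Editor saves a change, then overwrites the cached copy for that key
// so any component reading `/posts/${id}` sees the fresh data immediately.
async function savePost(id, changes) {
  const { data } = await axios.patch(`/posts/${id}`, changes);
  // Third argument `false` = don't revalidate against the (possibly stale) CDN copy.
  await mutate(`/posts/${id}`, data, false);
}
```

As I understand it, though, that cache lives in memory, which is exactly why I'm asking whether it survives a hard refresh.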

My questions are:

  • Can SWR with "mutations" serve the mutated object via the GET /posts/5 transparently?
  • Will the mutation survive a hard browser tab refresh? Or a browser closure, re-opening, and a subsequent GET /posts/5?
  • Is there another pattern/best practice to solve that?
JasonGenX
    The solution presented by @hackape is one potential way forward. However, you can also bust the CloudFront cache upon write operations. You can try something like this: https://stackoverflow.com/questions/22021651/amazon-s3-and-cloudfront-cache-how-to-clear-cache-or-synchronize-their-cache/27241309#27241309 – Mukul Bansal Apr 01 '21 at 09:14
  • @MukulBansal Good info! But I think that'll only bust the cache on CloudFront CDN node, not the cache in browser. – hackape Apr 01 '21 at 12:54
  • @MukulBansal that's correct... if the browser GETs an object with a TTL of 5 minutes from the CDN, it will not even bother to talk to the CDN for the next 5 minutes. It will just serve the object from memory/disk -- and invalidating the object on the CDN won't change that. – JasonGenX Apr 01 '21 at 13:16
  • @JasonGenX OK. So there are 3 things here: 1. Busting the CDN cache -- this can be done as discussed above. 2. Busting the browser cache for the post owner -- this is in your hands: once The Editor PATCHes a post, you can clear his browser cache upon a successful response from the server. 3. Busting the browser cache for non-owners -- as per your scenario, you are fine if this is not busted, although if you want to bust it you can via sockets. Maintaining a list of all active users is a heavy thing to do, though. Facebook and Bitbucket do this kind of thing. – Mukul Bansal Apr 02 '21 at 09:40

2 Answers


TL;DR: Just append a harmless, gibberish querystring to the end of the request GET /posts/<id>?version=whatever


Good question. I must admit I don't know the full answer to this problem, but I want to share one well-known technique among frontend devs.

The technique is called cache busting. I'm not sure if it's the best practice, but I'm pretty sure it's widely practiced, since it's so straightforward to understand.

The idea is simple: when you append a changing querystring, you effectively change the URL, so no cache is hit and you sidestep the whole cache problem.

So the detailed steps to a solution for your particular use case would go like this (a rough sketch in code follows the list):

  1. Normally you'll just request GET /posts/<id> for all users
  2. When a user logs in, generate a hash key with whatever algorithm you like. For simplicity, let's just use an increasing integer and call it version. Store this version in localStorage so it survives page refreshes.
  3. Now you need to distinguish whether the user is viewing his own post or someone else's. When he is viewing his own, always use GET /posts/<id>?version=n
  4. Whenever the user edits his post and hits the save button, you bump version from n to n+1
  5. Next time he goes to the post view page, the app requests GET /posts/<id>?version=n+1, which is not cached, and so retrieves the up-to-date content.
  6. One last thing, make sure your server safely ignores that ?version=n querystring.
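
Here is a minimal sketch of those steps with Axios (which the question already uses); the getVersion / bumpVersion helpers and the localStorage key name are made up for illustration:

```javascript
import axios from "axios";

// Made-up localStorage key; any stable name works.
const VERSION_KEY = "postVersion";

function getVersion() {
  return Number(localStorage.getItem(VERSION_KEY) || "0");
}

// Call this right after a successful PATCH so the next GET uses a new URL.
function bumpVersion() {
  localStorage.setItem(VERSION_KEY, String(getVersion() + 1));
}

// Editor viewing his own post: append ?version=n so the URL changes whenever
// the post is saved, and neither the browser cache nor the CDN has an entry for it.
async function fetchOwnPost(id) {
  const { data } = await axios.get(`/posts/${id}`, {
    params: { version: getVersion() },
  });
  return data;
}

async function saveOwnPost(id, changes) {
  await axios.patch(`/posts/${id}`, changes);
  bumpVersion(); // step 4: n -> n+1
}

// Everyone else just requests the plain URL and happily gets the cached copy.
async function fetchPost(id) {
  const { data } = await axios.get(`/posts/${id}`);
  return data;
}
```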

I'm sure there are other solutions to this problem. I'm no expert on server config and HTTP headers, so I won't get into that topic, but there must be something to look for there.

As for a pure frontend solution, there's the Service Worker API for you to consider. The main point of this API is to let devs programmatically control cache strategies.

With this API, you could leave your current app code as-is and just install a service worker. Then you could use the same cache-busting technique in the background to fetch new content, or just delete the cache (using the Cache API) when the user edits, or even fake a response for the GET /posts/<id> from the PATCH /posts/<id> the user just sent.
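
A rough sketch of that last idea, assuming a registered service worker and assuming the PATCH response echoes back the updated post; the cache name, the 180-second window, and the URL matching are all assumptions:

```javascript
// sw.js -- a sketch only, not a drop-in solution.
const EDIT_CACHE = "recent-edits";
const MAX_AGE_MS = 180 * 1000; // mirror the CDN's max-age=180

const isPostUrl = (url) => /^\/posts\/\d+$/.test(new URL(url).pathname);

self.addEventListener("fetch", (event) => {
  const { request } = event;
  if (!isPostUrl(request.url)) return;

  if (request.method === "PATCH") {
    // Store the server's fresh copy under the GET URL so later GETs can use it.
    event.respondWith(
      fetch(request).then(async (response) => {
        const cache = await caches.open(EDIT_CACHE);
        const copy = new Response(await response.clone().text(), {
          headers: {
            "content-type": "application/json",
            "sw-cached-at": Date.now().toString(),
          },
        });
        await cache.put(new Request(request.url), copy); // keyed as a GET
        return response;
      })
    );
  }

  if (request.method === "GET") {
    event.respondWith(
      caches.open(EDIT_CACHE).then(async (cache) => {
        const cached = await cache.match(request);
        const cachedAt = Number(cached?.headers.get("sw-cached-at") || 0);
        if (cached && Date.now() - cachedAt < MAX_AGE_MS) {
          return cached; // the editor's own fresh copy, even after a hard refresh
        }
        if (cached) await cache.delete(request); // older than the CDN TTL; drop it
        return fetch(request); // everyone else: normal request, normal caching
      })
    );
  }
});
```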

hackape

Depending on what CDN you use, you can invalidate the cache manually when publishing updates to a post. For example, CloudFront lets you specify which paths you want fetched fresh on the next request.
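
With CloudFront, the invalidation could be issued from the backend right after an update is published; here's a rough sketch using the AWS SDK for JavaScript v3 (the distribution ID and region are placeholders):

```javascript
// Node.js sketch: invalidate the CDN copy of a post after it is updated.
import {
  CloudFrontClient,
  CreateInvalidationCommand,
} from "@aws-sdk/client-cloudfront";

const cloudfront = new CloudFrontClient({ region: "us-east-1" });

export async function invalidatePost(postId) {
  await cloudfront.send(
    new CreateInvalidationCommand({
      DistributionId: "EXAMPLE_DISTRIBUTION_ID", // placeholder
      InvalidationBatch: {
        CallerReference: `post-${postId}-${Date.now()}`, // must be unique per request
        Paths: { Quantity: 1, Items: [`/posts/${postId}`] },
      },
    })
  );
}
```

As the comments on the question point out, this only clears the CDN's copy; whatever the editor's browser has already cached is unaffected.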

For sites with lots of traffic but few updates this works pretty well, and is quite simple to implement. For sites with a lot of authors and frequently changing content you would need to get more creative though.

One strategy I've used in the past is a technique called object versioning, where instead of invalidating the cache for an object you publish a new version of it with a timestamp. This also means publishing a manifest file that your frontend loads on startup. The manifest contains the latest timestamps of all the content the page needs, and is served with a much shorter TTL than the rest of the content. When you publish a new version of a post, you update its timestamp in the manifest, and the frontend pulls the latest version of the post the next time the page loads.
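
A rough sketch of how the frontend side of that could look; the manifest URL and its shape are made up for illustration:

```javascript
// Object-versioning sketch. Assumed manifest shape:
// { "posts": { "5": 1712000000 } }  -- postId -> last-updated timestamp.
import axios from "axios";

// The manifest itself is served with a short TTL (or no-cache), so it is
// cheap to refresh; the posts keep their long 180-second TTL.
async function loadManifest() {
  const { data } = await axios.get("/content-manifest.json");
  return data;
}

async function fetchPost(id) {
  const manifest = await loadManifest();
  const version = manifest.posts?.[id] ?? 0;
  // The timestamp in the querystring changes whenever the post is republished,
  // so browser and CDN treat it as a brand-new URL and fetch a fresh copy.
  const { data } = await axios.get(`/posts/${id}`, { params: { v: version } });
  return data;
}
```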

Ben Balentine