Cache invalidation: a quick look for React developers
Worth a read if you host your React app with AWS CloudFront + S3
TL;DR: skip it if you already know how a content delivery network (CDN) works under the hood
In the last post, Infrastructure as Code made easy for React engineers, we built a CI/CD pipeline solution that lets a solo React engineer deliver production-grade serving infrastructure, and it can scale to serve a reasonably large front-end team as well.
But that post did not go deep enough into why our GitHub Actions script needed this operation:
aws cloudfront create-invalidation …
It is a very important step in building production-grade serving infrastructure for a single page application (SPA), and it is called cache invalidation.
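If your pipeline runs Python instead of the AWS CLI, the same operation can be sketched with boto3's `create_invalidation` call on the CloudFront client. This is a minimal sketch under stated assumptions: the distribution ID is a hypothetical placeholder you would supply, invalidating `/*` (every path) is an assumed choice for a full SPA release, and the `release-` prefix on the caller reference is a hypothetical naming convention. The exact flags of the CLI command above are elided in this post and are not reconstructed here.

```python
import time


def build_invalidation_batch(paths: list[str]) -> dict:
    """Build the InvalidationBatch argument for boto3's CloudFront
    create_invalidation call."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference should be unique for each new request;
        # a nanosecond timestamp is one simple way to get that.
        # The "release-" prefix is a hypothetical naming choice.
        "CallerReference": f"release-{time.time_ns()}",
    }


def invalidate(distribution_id: str, paths: list[str]) -> str:
    """Ask CloudFront to drop cached copies of `paths` at its edges.
    Needs boto3 installed and AWS credentials; returns the invalidation ID."""
    import boto3  # imported lazily so the batch builder works without boto3

    client = boto3.client("cloudfront")
    response = client.create_invalidation(
        DistributionId=distribution_id,  # hypothetical: your distribution's ID
        InvalidationBatch=build_invalidation_batch(paths),
    )
    return response["Invalidation"]["Id"]


if __name__ == "__main__":
    # Only builds and prints the request payload; no AWS call is made here.
    print(build_invalidation_batch(["/*"]))
```

Keeping the payload builder separate from the network call makes it easy to inspect what would be sent before pointing it at a real distribution.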
Why do we need CloudFront?
CloudFront is a content delivery network (CDN). In short, a CDN speeds up content delivery to end users. A picture is worth a thousand words, so let’s talk about it with a diagram:
Our React app’s assets can be stored in only one place (an S3 bucket in a single AWS region); let’s say it’s in Virginia. But our potential customers come not only from the US East Coast but also from the West Coast, plus some from Europe…
Without a CDN in our hosting solution, customers in California bear the long latency (over the public Internet, US West Coast to East Coast latency is ~180 ms; even over a dedicated network such as AWS’s internal network, it’s still > 60 ms). We are competing against the laws of physics here.
But if the content is cached on a server somewhere in California, we can significantly bring down the latency users experience. This should be easy to understand because:
the packets travel a shorter distance from the server to the user;
the packets go through fewer routers along the way.
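To put rough numbers on the “laws of physics” point, here is a back-of-the-envelope sketch. It assumes light in optical fiber travels at about 200,000 km/s (roughly two thirds of its speed in vacuum), an assumed ~4,000 km Virginia-to-California distance, and a hypothetical edge server 50 km from the user. Real network paths are longer than straight lines and add router hops, so these are floors, not predictions.

```python
# Light in optical fiber covers roughly 200,000 km/s, i.e. ~200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time: the signal goes there and back,
    ignoring routing detours, router queuing, and processing delays."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Assumed straight-line distances, for illustration only.
origin_in_virginia = min_rtt_ms(4000)  # California user -> Virginia origin
edge_in_california = min_rtt_ms(50)    # California user -> nearby edge server

print(f"origin: >= {origin_in_virginia:.1f} ms per round trip")  # 40.0
print(f"edge:   >= {edge_in_california:.1f} ms per round trip")  # 0.5
```

Even the theoretical floor for the cross-country trip is about 40 ms per round trip, and loading an app takes several round trips (connection setup, the TLS handshake, then the HTML and its assets), so the gap compounds.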
What does CloudFront do for us?
When the first customer requests our React app, the local CloudFront server (called an edge server in CDN terms) does not hold a copy yet, so it goes back to S3 to retrieve the content; once it has a copy from S3, it serves the customer.
The acceleration happens from the second customer on: when later customers request our React app, the local edge server serves it immediately from its cache. This saves a long round trip to the origin, so users experience a much quicker response.
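The miss-then-hit behavior just described can be sketched as a toy edge cache. This is an illustration, not how CloudFront is implemented: the `EdgeServer` class, the in-memory dictionary standing in for the S3 bucket, and the `origin_fetches` counter are all hypothetical names chosen to make the two customers’ paths visible.

```python
from typing import Callable


class EdgeServer:
    """Toy CDN edge server: serve from the local cache when possible,
    otherwise fetch from the origin and keep a copy."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.cache: dict[str, str] = {}
        self.origin_fetches = 0  # how many times we paid the long round trip

    def get(self, path: str) -> str:
        if path in self.cache:  # cache hit: no trip to the origin
            return self.cache[path]
        # Cache miss: go back to the origin (S3 in our setup) and store a copy.
        self.origin_fetches += 1
        body = self.fetch_from_origin(path)
        self.cache[path] = body
        return body


# Hypothetical origin standing in for the S3 bucket.
s3_bucket = {"/index.html": "<html>app v1</html>"}
edge = EdgeServer(lambda path: s3_bucket[path])

edge.get("/index.html")     # first customer: miss, fetched from the origin
edge.get("/index.html")     # second customer: hit, served from the edge cache
print(edge.origin_fetches)  # 1
```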
PS: Another reason we need CloudFront is that, even if we could tolerate the long round trip, the S3 static website endpoint does not support HTTPS (it serves HTTP only), and a hosted React app needs an HTTPS endpoint. Feel free to ask any questions if this short note is not clear enough.
With a CDN cache comes a stale cache problem, and that’s why we need cache invalidation
Let’s first read what the AWS CloudFront documentation says on this specific topic:
You can control how long your files stay in a CloudFront cache before CloudFront forwards another request to your origin…
Typically, CloudFront serves a file from an edge location until the cache duration that you specified passes…
By default, each file automatically expires after 24 hours.
When we release a new version of our React app, we run into a problem: depending on luck (that is, when each edge server last fetched its cached copy), our CloudFront servers may keep serving the older version. For static pages, serving stale content is generally OK; but for an app, a stale version may break the whole user experience (e.g., our backend APIs may have already changed as well). So, is there any way to solve this problem?
The answer is yes, and the mechanism is called cache invalidation:
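To see why a release can leave users on the old version, here is a toy edge cache that keeps each copy until its time-to-live (TTL) passes, using the 24-hour default quoted above. It is a simplified sketch under stated assumptions: it ignores `Cache-Control` headers and every other CloudFront cache setting, and the class, clock values, and file contents are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the documented 24-hour default quoted above


@dataclass
class CachedCopy:
    body: str
    fetched_at: float  # seconds, on the edge server's clock


class TtlEdgeServer:
    """Toy edge cache that keeps each copy until its TTL passes."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl: float = DEFAULT_TTL_SECONDS) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl
        self.cache: dict[str, CachedCopy] = {}

    def get(self, path: str, now: float) -> str:
        copy = self.cache.get(path)
        if copy is not None and now - copy.fetched_at < self.ttl:
            return copy.body  # still "fresh", even if the origin has changed
        body = self.fetch_from_origin(path)  # missing or expired: refetch
        self.cache[path] = CachedCopy(body, now)
        return body


s3_bucket = {"/index.html": "app v1"}
edge = TtlEdgeServer(lambda path: s3_bucket[path])

edge.get("/index.html", now=0)       # cached at t = 0
s3_bucket["/index.html"] = "app v2"  # we deploy a new release
print(edge.get("/index.html", now=3600))                 # app v1 (stale!)
print(edge.get("/index.html", now=DEFAULT_TTL_SECONDS))  # app v2 (TTL expired)
```

The edge has no idea the origin changed; it only refetches once its own timer runs out, which is exactly the window in which users can get the old app.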
When we run aws cloudfront create-invalidation, we are telling the CDN servers: “hey, your current copy is invalid, please grab a new copy from the origin.” That is exactly why, whenever we release a new version of our React app, we run a cache invalidation: it is the critical step shown in the previous post. Please read that post together with this “under the hood” explanation.
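Here is a toy model of what an invalidation does to an edge cache: matching copies are dropped, so the next request has to go back to the origin and picks up the new release. This is a sketch, not CloudFront’s implementation: real invalidations are asynchronous and take time to reach every edge location, and the path matching here is a simplified version of CloudFront’s wildcard paths in which a trailing `*` matches any suffix. The class name and the sample files are hypothetical.

```python
from typing import Callable


class InvalidatingEdgeServer:
    """Toy edge cache with a CloudFront-style invalidate() operation."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.cache: dict[str, str] = {}

    def get(self, path: str) -> str:
        if path not in self.cache:  # miss: fetch from the origin and keep a copy
            self.cache[path] = self.fetch_from_origin(path)
        return self.cache[path]

    def invalidate(self, patterns: list[str]) -> None:
        """Drop cached copies matching any pattern. A trailing '*' matches
        any suffix, so '/*' clears everything."""
        def matches(path: str) -> bool:
            return any(
                path.startswith(pattern[:-1]) if pattern.endswith("*")
                else path == pattern
                for pattern in patterns
            )

        for cached_path in [p for p in self.cache if matches(p)]:
            del self.cache[cached_path]


s3_bucket = {"/index.html": "app v1", "/static/main.js": "js v1"}
edge = InvalidatingEdgeServer(lambda path: s3_bucket[path])
edge.get("/index.html")
edge.get("/static/main.js")

s3_bucket.update({"/index.html": "app v2", "/static/main.js": "js v2"})  # new release
print(edge.get("/index.html"))  # app v1: the edge still holds the old copy
edge.invalidate(["/*"])         # analogous to invalidating every path
print(edge.get("/index.html"))  # app v2: fetched fresh from the origin
```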
Summary:
In this post,
we talked about why we need a CDN for our React application;
why we need to force cache invalidation whenever we release a new version of our React app.
To see how to achieve the above with real code, please go back and read our previous post.
PS: if you read this far, please click the like button to encourage me to keep writing 😀