On June 8th, 2021, a massive internet outage caused by CDN provider Fastly left hundreds of popular websites in the dark. Millions of web users around the world reported problems accessing websites and applications such as Spotify, Netflix, Amazon, and the BBC, among many others hit by the issue.
What exactly happened?
Fastly, a popular US-based Content Delivery Network (CDN) provider, is dedicated to delivering website content to users around the globe as fast as possible.
Like other CDN vendors, it does this with a wide network of web servers grouped into points of presence (PoPs), serving each client from the PoP geographically closest to it to speed up transfers. On June 8th, however, the service suffered a failure that left many of the companies relying on it unable to serve their sites at all:
- Early morning on June 8th (around 5:00 AM CST): Fastly announced that it was investigating a potential impact on the performance of its CDN services.
- Update at 5:57 AM CST, less than an hour after the initial notice: The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return.
- Resolved at 6:41 AM CST: Fastly has observed recovery of all services and has resolved this incident. Customers could continue to experience a period of increased origin load and lower Cache Hit Ratio (CHR).
Multi CDN as a solution for CDN outages
As this incident shows, even a major CDN provider like Fastly can go down, and when it does, everyone relying solely on its services goes down with it.
Because users expect websites to be available without interruption, minimizing, or better still eliminating, the risk of outages is essential. This is where Multi CDN solutions come into play.
A Multi CDN solution, as the name implies, uses multiple CDNs from different providers at the same time, routing traffic among them to speed up content delivery and to avoid latency problems and outages.
How we helped our users avoid this
Mlytics uses real user monitoring (RUM) and synthetic monitoring to collect network performance data, including CDN latency and availability.
This data feeds the Mlytics decision engine, which automatically identifies and selects the best-performing CDN for our users. Together, these features make up what we call Smart Load Balancing.
The same data is also displayed on the ‘Pulse’ chart, which gives a comprehensive view of each CDN’s performance at any given time.
Fastly is one of the CDNs offered through Power-Ups, our Multi CDN marketplace on the platform. When Fastly’s outage hit, Mlytics Smart Load Balancing replaced Fastly with the next best-performing CDN, which varied by region.
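This post does not describe the internals of the decision engine, so the following is only a rough illustration of the idea: a minimal Python sketch of per-region CDN selection, assuming each RUM or synthetic measurement records a region, a CDN, whether the request succeeded, and its latency. The `Sample` type, the `pick_cdns` function, the region and CDN labels, and the 95% availability threshold are all hypothetical, and ranking by median latency among sufficiently available CDNs is an assumed scoring rule, not Mlytics’ actual algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    """One hypothetical RUM or synthetic measurement."""
    region: str
    cdn: str
    ok: bool           # True if the request succeeded
    latency_ms: float  # response time; ignored when ok is False


def pick_cdns(samples, min_availability=0.95):
    """Return {region: cdn}: per region, the CDN with the lowest median latency
    among those whose availability meets min_availability (hypothetical rule)."""
    # Count requests and collect successful latencies per (region, cdn)
    stats = defaultdict(lambda: {"total": 0, "latencies": []})
    for s in samples:
        entry = stats[(s.region, s.cdn)]
        entry["total"] += 1
        if s.ok:
            entry["latencies"].append(s.latency_ms)

    # Build per-region candidates as (cdn, availability, median latency)
    candidates = defaultdict(list)
    for (region, cdn), entry in stats.items():
        lats = entry["latencies"]
        availability = len(lats) / entry["total"]
        latency = median(lats) if lats else float("inf")
        candidates[region].append((cdn, availability, latency))

    choice = {}
    for region, cands in candidates.items():
        healthy = [c for c in cands if c[1] >= min_availability]
        if healthy:
            # Fastest CDN among those that are available enough
            choice[region] = min(healthy, key=lambda c: c[2])[0]
        else:
            # No CDN meets the bar: fall back to the most available one
            choice[region] = max(cands, key=lambda c: c[1])[0]
    return choice


if __name__ == "__main__":
    samples = [
        Sample("US", "fastly", True, 30.0),
        Sample("US", "fastly", False, 0.0),   # failed request during an outage
        Sample("US", "akamai", True, 45.0),
        Sample("US", "akamai", True, 50.0),
        Sample("EU", "fastly", True, 25.0),
        Sample("EU", "cloudflare", True, 35.0),
    ]
    print(pick_cdns(samples))  # {'US': 'akamai', 'EU': 'fastly'}
```

Requiring a minimum availability before comparing latency means a CDN that is failing outright drops out of the running in that region even if its few successful responses were fast, while the same CDN can remain the first choice in a region where it is still healthy.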
The chart below shows the Smart Load Balancing optimization decisions, that is, which CDNs traffic was switched to during a given timeframe. Akamai shows several query spikes in the United States as Fastly’s performance dropped, reflecting the system actively shifting traffic over to Akamai.
This helped many of our customers ride out Fastly’s outage, and we received no complaints during the incident.
The Pulse chart also shows an availability drop for Fastly in the same time frame; lined up with the chart above, it illustrates what happened.
Of course, every CDN provider goes the extra mile to deliver the best performance possible, but unfortunately, things do happen. At Mlytics we strive to help our customers minimize the risks associated with outages to ensure maximum uptime at all times.
Again, outages happen, so stay prepared…
The outage Fastly experienced should be considered a wake-up call, reminding us that any service can go down, no matter its size or scale. Nobody is immune to misconfigurations, cyber attacks, or vendor outages.
Therefore, having a solid cloud redundancy and disaster recovery plan in place to keep any single event from taking down your service is imperative. As this event taught us, putting all your ‘CDN eggs in a single provider basket’ is not the best way to provide continuous, optimal service to your users.
Opting for a Multi CDN solution is a proactive decision that will eliminate the need for reactive ones later.