Last week, on Thursday, July 22nd, 2021, a sweeping internet outage caused by CDN provider Akamai took thousands of websites offline. Netizens around the world reported issues accessing the websites and applications of HSBC, Delta Air Lines, FedEx, and McDonald’s, among others hit by the outage.
What exactly happened?
Akamai, a leading Content Delivery Network (CDN) provider, strives to help web users around the world access and display website content as seamlessly as possible.
Akamai achieves this with a wide network of edge servers (points of presence or ‘PoPs’) configured to deliver content to each user from the geographically closest server, which boosts transfer speed. However, on July 22nd, its service experienced a disruption, causing the websites of companies that rely solely on Akamai’s services to go down:
- On the morning of July 22nd (around 11:30 AM CT), Akamai said in a tweet that it was investigating a service disruption.
- Around 1:00 PM CT, Akamai stated that a software configuration update had triggered a bug in the DNS system, the system that directs browsers to websites. This caused a disruption that affected the availability of some customer websites.
- Shortly afterwards, Akamai tweeted that the disruption had lasted up to an hour. Once the software configuration update was rolled back, services resumed normal operations. Akamai confirmed that this was not a cyberattack against its platform.
Yet another outage, time for change?
This is the third large-scale internet outage in two months, and already the second one caused by Akamai.
Earlier this year, in June, hundreds of websites including Spotify, Netflix, and Amazon went down for about an hour because of a glitch in the content delivery network Fastly. Ten days later, banks, airlines, stock exchanges and trading platforms experienced short outages, which according to Akamai was the result of a bug in their service that helps mitigate distributed denial-of-service (or DDoS) attacks.
Large-scale website and web app outages happen occasionally and usually don’t last very long. Content delivery networks and other hosting services rely on a global network of backup servers designed to limit the impact of disruptions. However, when services do go down, and they do, the consequences for a website’s brand and revenue streams can be devastating.
Outages that took place recently have caused experts to warn of the risks of the internet’s reliance on a relatively small number of core infrastructure providers. As stated by Nick Merrill, research fellow at UC Berkeley’s Center for Long-Term Cybersecurity: “CDNs are the biggest centralized point on the internet, making them a potential target for cybercriminals or government actors. If one of them goes down huge swaths of the internet could go with it.”
A solution for CDN outages: Multi CDN
As can be seen from this incident, CDN providers like Akamai are vulnerable to outages, causing downtime for all website owners who rely solely on their services whenever these services experience a glitch.
Websites should be available to users without lag or downtime. Limiting, or better yet eliminating, the risks associated with outages is therefore extremely important. This is where Multi CDN solutions come into play.
A Multi CDN setup, as the name implies, uses multiple CDNs from different providers simultaneously to speed up content delivery and to help avoid outages and latency issues.
How Mlytics helped users to eliminate downtime
Mlytics uses real user monitoring (RUM) and synthetic monitoring to collect CDN performance data, including CDN latency and availability.
This data goes through the Mlytics decision engine, which helps our users identify and choose the best-performing CDN automatically. Together, these features are what we call Smart Load Balancing.
These data are also displayed on the ‘Pulse’ (performance analytics) chart, which gives a holistic overview of each CDN’s performance at a certain time.
Akamai is one of the multiple CDNs accessible via the Multi CDN marketplace (Power-Ups) on the Mlytics platform. When the Akamai outage happened, Mlytics Smart Load Balancing replaced Akamai with the next best-performing CDN (different for each region) for user requests.
This chart shows the optimization decisions made by our Smart Load Balancing solution. It displays which CDNs were selected during a certain timeframe. As shown, Stackpath and Cloudfront had several query spikes due to the Akamai performance drop, as the system automatically replaced Akamai with better-performing CDNs.
This helped our customers mitigate Akamai’s outage, and we received no complaints during the incident.
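To make the idea concrete, here is a minimal sketch of per-region CDN selection driven by latency and availability measurements. It is not the Mlytics decision engine: the `Sample` type, the function names, and the 99% availability threshold are all assumptions made for illustration. The sketch only shows the general technique of dropping CDNs whose availability falls below a threshold and then picking the lowest-latency candidate in each region.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional

# Hypothetical threshold: a CDN must answer at least 99% of probes
# in a region to be considered healthy there.
MIN_AVAILABILITY = 0.99


@dataclass(frozen=True)
class Sample:
    """One monitoring probe (hypothetical shape, for illustration only)."""
    region: str
    cdn: str
    latency_ms: Optional[float]  # None means the probe failed


def summarize(samples):
    """Return {(region, cdn): (availability, median_latency_ms)}."""
    grouped = defaultdict(list)
    for s in samples:
        grouped[(s.region, s.cdn)].append(s.latency_ms)

    stats = {}
    for key, latencies in grouped.items():
        ok = [v for v in latencies if v is not None]
        availability = len(ok) / len(latencies)
        # With no successful probe there is no latency to report.
        latency = median(ok) if ok else float("inf")
        stats[key] = (availability, latency)
    return stats


def pick_cdn_per_region(samples, min_availability=MIN_AVAILABILITY):
    """Pick one CDN per region from the monitoring samples."""
    by_region = defaultdict(list)
    for (region, cdn), (availability, latency) in summarize(samples).items():
        by_region[region].append((cdn, availability, latency))

    choice = {}
    for region, candidates in by_region.items():
        healthy = [c for c in candidates if c[1] >= min_availability]
        if healthy:
            # Among healthy CDNs, prefer the lowest median latency.
            best = min(healthy, key=lambda c: c[2])
        else:
            # Nothing meets the threshold: fall back to the most available CDN.
            best = max(candidates, key=lambda c: c[1])
        choice[region] = best[0]
    return choice
```

Fed with samples in which one CDN starts failing probes in a region, `pick_cdn_per_region` would route that region to the next-fastest healthy CDN while leaving unaffected regions unchanged, which mirrors the per-region switch described above.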
On the Pulse chart, we also see a drop in availability for Akamai in the same time frame. Read alongside the chart above, it helps illustrate what happened.
Every CDN provider aims to deliver the most seamless user experience possible, but as seen from recent outages, things tend to go haywire. At Mlytics we help our customers to eliminate the risks associated with outages and latency to ensure maximum uptime at all times.
Lesson learned: prepare for downtime
We should treat the outage caused by Akamai as a wake-up call: CDN services do experience downtime. Regardless of size or scale, no provider is immune to misconfigurations, cyberattacks, or other glitches.
Hence, it is important to have a solid cloud redundancy and disaster recovery plan in place so that no single event takes your service down. As Nick Merrill pointed out, when a CDN goes down, parts of the internet go with it, and we should be aware of the risk of relying solely on a relatively small number of core infrastructure providers.
Opting for a Multi CDN setup is a proactive approach that eliminates the need for reactive ones later.