The slew of Internet outages and disruptions in 2021 shows how interconnected and relatively fragile the Internet ecosystem is. Case in point: December’s trifecta of Amazon Web Services (AWS) outages, which brought home the fact that no service is too big to fail. The question is not whether the next outage will happen, but when, where, and for how long. Pretending outages don’t exist or won’t happen is not only pointless but harmful to your business. Looking back at the three December outages, we see four key takeaways:
1. Early detection is key to handling outages like the AWS incidents.
2. Comprehensive observability helps your team react at speed to outages.
3. Ensuring your company’s availability and business continuity is not a solo endeavor.
4. Relying solely on a monitoring solution hosted within the environment it monitors is not enough.
Lack of observability is never a good thing, but during an outage it is significantly worse. Interestingly, AWS’s Adrian Cockcroft pointed out the issue in a post on Medium, where he noted, “The first thing that would be useful is to have a monitoring system that has failure modes which are uncorrelated with the infrastructure it is monitoring.”
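
To make the idea of uncorrelated failure modes concrete, here is a minimal sketch of an external health probe, meant to run from a network or provider other than the one being monitored. It is an illustration under stated assumptions, not a description of any particular monitoring product: the names `ProbeResult`, `probe`, `should_alert`, the three-failure threshold, and any health-check URL you pass in are all hypothetical.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    """Outcome of one external check (hypothetical structure)."""
    url: str
    ok: bool
    status: Optional[int]
    latency_s: float
    error: Optional[str] = None


def http_get_status(url: str, timeout: float) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; the code is still a valid answer.
        return exc.code


def probe(
    url: str,
    timeout: float = 5.0,
    fetch: Callable[[str, float], int] = http_get_status,
) -> ProbeResult:
    """Check `url` once and record status, latency, and any error."""
    start = time.monotonic()
    try:
        status = fetch(url, timeout)
    except Exception as exc:  # DNS failure, timeout, refused connection, ...
        return ProbeResult(url, False, None, time.monotonic() - start, repr(exc))
    ok = 200 <= status < 400
    return ProbeResult(url, ok, status, time.monotonic() - start)


def should_alert(history: List[ProbeResult], consecutive: int = 3) -> bool:
    """Alert only after `consecutive` failed probes in a row (assumed policy)."""
    if len(history) < consecutive:
        return False
    return all(not r.ok for r in history[-consecutive:])
```

In practice, `probe` would be called on a schedule against a public endpoint of your service, with the results appended to a list that `should_alert` inspects. The point of the design is where the script runs: if it lives in the same cloud region as the service, an outage there can silence the monitor along with the thing it watches.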