deathanatos parent
"Configuration errors" (I'm going to include "bugs" in that), IME, tend to be global outages more often than regional. If I recount the outages larger than an AZ that I've seen, I think the most recent ones were:

  GCP, IAM (global; just like a week and a half ago!)
  GCP, VMs etc. (regional!¹)
  Azure, application GW (global)
  Cloudflare (global)
  Azure, IAM (global)
  Azure, IAM (global)
You can tell IAM is a point of weakness. (As it kinda must be.)

¹though I wasn't affected by this one, as it was in Europe.


deepsun
I mostly remember AWS S3 outages, usually limited to a region, but the one in 2017, which was supposed to be a regional update (US-EAST-1 region), brought down like half of AWS, because so many other services depended on S3 in US-EAST-1 [1]

Note that even the intended configuration change was designed to be Regional, not just limited to one AZ.

[1] https://aws.amazon.com/message/41926/

everfrustrated
Notably, you don't have AWS on that list.

AWS's definitions for AZ & Regions are by far the strongest in the industry.

GCP has AZs in the same physical complex. Azure Regions would be AZs under AWS's definition.

timewizard
AWS had a console login issue a while back due to the default region being us-east-1. There are a handful of other services that are exclusively available in that region as well.

deathanatos OP
I haven't worked with them in quite some time. (That's changing, so uh … looking forward to my next AWS outage?) This was more to show regional vs. global than any specific cloud provider. AWS is skating by here on account of not being sampled¹.

If I go waaaaay back (like mid 2010s), we did have an S3 outage. It was regional, even!

> GCP has AZ in the same physical complex.

I can't say if that's correct or not; GCP says,

> Zones should be considered a single failure domain within a region. To deploy fault-tolerant applications with high availability and help protect against unexpected failures, deploy your applications across multiple zones in a region.

That's an AZ, to me. (Or, alternatively & synonymously, a failure domain.)

¹IME over my career, though, AWS is fairly stable. GCP is too. AWS has its foibles, though. When last I worked with RDS (circa 2019), there were bugs.
