
I don’t work for AWS but for a different cloud provider, so this isn’t a description of this incident, just an example of the kind of thing that can happen.

One particular “DNS” issue that caused an outage was actually a bug in the software that monitors health checks.

It actively monitored all the servers for a given service (keeping its server list up to date based on what was deployed) and updated DNS based on the results of those checks.

So when a server failed its health checks, it would be removed from DNS within a few milliseconds.

Then a bug gets deployed to the health check service. All of a sudden users can’t resolve DNS names, because every server is marked as unhealthy and removed from DNS.
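Roughly, the setup works like the toy Python sketch below. This is not the provider’s actual code: `HealthDrivenDns`, `reconcile`, `resolve`, the hostname and the addresses are all made up for illustration, and a real system would push updates to actual DNS servers instead of a dict. It shows why a checker bug looks like a DNS outage: the servers are fine, but the record set ends up empty.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HealthDrivenDns:
    """Toy model: DNS answers are rebuilt from health checks (all names hypothetical)."""
    check: Callable[[str], bool]  # returns True if the server at this address is healthy
    records: dict[str, list[str]] = field(default_factory=dict)

    def reconcile(self, name: str, deployed: list[str]) -> list[str]:
        # Check every server currently deployed for the service and publish
        # only the ones that pass; failing servers drop out right away.
        healthy = [addr for addr in deployed if self.check(addr)]
        self.records[name] = healthy
        return healthy

    def resolve(self, name: str) -> list[str]:
        # An empty record set looks like a DNS failure to the client.
        answers = self.records.get(name, [])
        if not answers:
            raise LookupError(f"cannot resolve {name}")
        return answers


if __name__ == "__main__":
    name = "api.example.internal"
    deployed = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    # Normal operation: one genuinely bad server is removed, the rest still resolve.
    dns = HealthDrivenDns(check=lambda addr: addr != "10.0.0.2")
    dns.reconcile(name, deployed)
    print(dns.resolve(name))  # ['10.0.0.1', '10.0.0.3']

    # A buggy health checker marks everything unhealthy: every server is removed
    # and clients can no longer resolve the name, even though the servers are fine.
    dns.check = lambda addr: False
    dns.reconcile(name, deployed)
    try:
        dns.resolve(name)
    except LookupError as err:
        print(err)  # cannot resolve api.example.internal
```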

So it’s not really a “DNS” issue, but it looks like one to users.
