That doesn’t mean people aren’t accountable for their actions in “blameless” cultures, but problems are first assumed to be the result of systemic issues, deserving to be fixed, rather than individual issues, deserving to be punished.
Two of them relevant to this story were:
webservers.regenerate.all.cache.files
webservers.release-prep.stop.all.services
The first one would refresh all the cached information after a marketing database update. The second would stop all the webservers.
Guy's first day; I'm showing him the ropes; we push the marketing data update and set about regenerating all the cache files by manually picking the correct file from the folder of all possible files. I'm sure we can all guess what happened to make this a story remotely worth telling...
Complete site outage. Completely unnecessary. Completely human error.
Should we blame the guy who clicked on the file that was directly adjacent to the one he intended? Should we blame me as the guy overseeing the training? Or should we change the system so that files that we use multiple times every day and are safe/innocuous aren't right next to an E-stop/EPO button? Or maybe we should change the system so that pushing marketing data refreshes the cache files automatically?
Blameless culture favors the latter actions over the former and tends to make your operation stronger and more resilient over time. The experts (and the novices) who made the mistake can speak freely about what happened and how we might prevent it, without fearing reprisal.
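For what it's worth, both system-level fixes are cheap to sketch. What follows is only an illustration in Python, not what we actually ran: the two action names come from the story, but `run_action`, `push_marketing_data`, the `DESTRUCTIVE` set, and the retype-the-name confirmation are all assumptions made up for the example.

```python
from typing import Callable, Dict, List, Optional

REGENERATE = "webservers.regenerate.all.cache.files"
STOP_ALL = "webservers.release-prep.stop.all.services"

# Hypothetical: actions that must never run from a single stray click.
DESTRUCTIVE = {STOP_ALL}


def run_action(name: str,
               actions: Dict[str, Callable[[], None]],
               confirm: Optional[str] = None) -> None:
    """Run a named action; destructive ones need the name retyped in `confirm`."""
    if name in DESTRUCTIVE and confirm != name:
        raise PermissionError(f"{name} is destructive; retype its full name to confirm")
    actions[name]()


def push_marketing_data(actions: Dict[str, Callable[[], None]],
                        log: List[str]) -> None:
    """Push the data, then refresh the caches with no manual file picking."""
    log.append("marketing data pushed")  # stand-in for the real push
    run_action(REGENERATE, actions)


def demo() -> List[str]:
    log: List[str] = []
    actions = {
        REGENERATE: lambda: log.append("caches regenerated"),
        STOP_ALL: lambda: log.append("all webservers stopped"),
    }
    push_marketing_data(actions, log)
    try:
        run_action(STOP_ALL, actions)  # the first-day mis-click, no confirmation
    except PermissionError:
        log.append("stop refused")
    return log
```

In this sketch the mis-click gets refused instead of taking the site down, and the cache refresh never requires anyone to pick a file by hand in the first place.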
If someone kills the site by mistake time after time, despite reasonable safeguards being in place, they should face disciplinary action. But when they make an honest mistake because we left an idling chainsaw lying around on the workbench, it makes no sense to blame them for grabbing it.
As stated earlier, blameless postmortems are for RCA of a particular incident. If you shoot every engineer who causes an incident, you will succeed in having no incidents, because no one will bring them up or make any changes for fear of getting shot.
If you promise people that there will be no blame, no punishment, no nothing, then they may speak more forthrightly. Do you want that honesty or do you prefer to retain the option of punishment?