2014-05-23

What a radioactive leak can teach us about avoiding blame culture

The Verge published a notable article today titled "Radioactive kitty litter may have ruined our best hope to store nuclear waste". It's a well-reported story by Matt Stroud (@ssttrroouudd) about the New Mexico Waste Isolation Pilot Plant (WIPP) and how a seemingly banal procedural human error has resulted in the shutdown of the site and jeopardized the future of radioactive waste disposal at the facility.

More interesting is the teachable moment in all of this about avoiding blame culture. It's easy to react quickly and suggest that the person who made the error should be punished, perhaps heavily fined, fired, or worse (some of the comments on the article suggest just that, and even Jim Conca, a PhD and former geologist at WIPP whom Matt interviewed, suggested the same but later backtracked). Matt jumped in to reply to a comment suggesting the offender be jailed, offering an insightful rebuttal from Per Peterson, a professor in UC Berkeley's Department of Nuclear Engineering with whom Matt exchanged emails for the article. In his reply Matt quotes what Peterson had to say about this:

"The natural tendency in events and accidents is to focus on assigning blame and punishing human errors. This approach is generally ineffective because human error happens. The critical issue for safety is to design systems which are tolerant of human error and which encourage reporting of problems and errors and effective corrective action."

He's absolutely right about this. And it's applicable to so many other industries, like health care, construction, and banking, but it's especially relevant to my line of work: IT. Major mistakes are not uncommon in software development: a small coding error brings down a system, or an incorrect infrastructure config causes downtime in the middle of the night. It's hard not to react with blame top of mind when these moments happen.

Peterson suggests we eschew the natural reaction and instead design processes that account for the possibility of human error AND promote feedback loops that allow for process improvement. In IT, the former can manifest as infrastructure automation (Chef, Puppet, etc.) and continuous integration with good test coverage, while the latter shows up as things like DevOps culture and blameless post-mortems. Making this the default mindset in a company really comes down to culture and how this type of behavior is rewarded and encouraged. I know I don't always practice building a blameless culture myself, but stories like this, and advice like Peterson's, remind me of the importance of doing it.
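
To make the first half of that a little more concrete, here's a rough sketch (mine, not from the article or from any particular tool) of the kind of automated check that accounts for human error: a small script that validates a deployment config as part of CI, so a typo fails the build instead of paging someone at 2 a.m. The file format, keys, and limits are made up for illustration.

# Hypothetical pre-deployment check run in CI: a bad config fails the
# build rather than taking down production. Keys and limits are invented.
import json
import sys

REQUIRED_KEYS = {"service_name", "port", "replicas"}

def validate_config(config):
    """Return a list of human-readable problems; empty means the config looks sane."""
    problems = []
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        problems.append("missing keys: %s" % ", ".join(sorted(missing)))
    port = config.get("port")
    if not isinstance(port, int) or not (1024 <= port <= 65535):
        problems.append("port must be an integer between 1024 and 65535, got %r" % port)
    replicas = config.get("replicas")
    if not isinstance(replicas, int) or replicas < 2:
        problems.append("replicas must be an integer >= 2 so one bad node can't cause an outage")
    return problems

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        config = json.load(f)
    problems = validate_config(config)
    for p in problems:
        print("config error: %s" % p)
    sys.exit(1 if problems else 0)

The specific checks don't matter; the point is that the process assumes someone will eventually fat-finger the config and catches it before it can cause an outage.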
