I used to run a fluid ops group managing a complicated and (relatively) unstable system.
I always approached this as the difference between incident management and problem management, the latter being the “what actually happened” phase, with lots of bureaucracy and post-mortems.
I always taught people in my group to manage out during incidents if they understood what was happening. In the vast majority of failure modes, you don’t need the most technical people at the keyboard performing, for example, a failover; most of those processes are well documented and well understood. Very technical, operationally minded people tend to want to solve the problem as quickly as possible, but I always found them far more valuable discussing the issue with stakeholders and running interference for the more junior guys/gals on the keyboards. This also gives the juniors the experience they need to eventually help develop future staff.