A common and counter-productive mistake among developers is trying to eradicate every minor bug in complex systems. With this approach, they ultimately lose the ability to detect what lies hidden within seemingly error-free systems.

According to Sidney Dekker, a professor at Griffith University speaking at the DevOps Enterprise Summit 2017 in San Francisco, the worst incidents occur in companies whose safety record was supposedly perfect.

Software bugs: a healthy element

Developing flawless software is certainly a noble goal. But ultimately it can also discourage a company from communicating honestly. "Minimizing this ability to disclose is bad for a company and its business," says Sidney Dekker. If teams openly share their most complicated bugs, they can ultimately identify bigger flaws.


There is a middle ground when it comes to standards and quality-control methods. Both can create good conditions for building reliable software. However, if you use the wrong measures, you encourage bad behavior.

"This fascination with counting negative events and reporting them is an illusion," he continues. "To understand how complex systems crash, you have to act differently. "

Companies often calculate the number of days without incident. In software development, managers regularly count the number of days without reported errors. For Sidney Dekker, neither approach is very effective.

Just because a piece of software shows no visible errors does not mean it is free of defects. Companies should not give too much weight to small incidents, especially when the system may be hiding more critical flaws.

Encourage a culture of responsibility

The next key point is that developers should also consider another element: recurrent, or hidden, errors that arise even when results are positive. However, this requires certain conditions:
being able to say "stop";
recognizing that what has gone well in the past is no guarantee of success in the future;
a culture that takes the diversity of opinions into account; and
a culture where risk is ubiquitous in discussions.

Learn from what went well

Establishing adequate feedback conditions for the entire company can reduce potential incidents in complex systems. "If we want to understand what's going to go wrong and the reasons for the problem, we should not focus on small bugs and error counting," says Sidney Dekker. "Understanding why the system works is the right thing to do."

But the reality is that there are far more systems that work than systems that fail, says Erik Hollnagel, a university professor in Denmark and an expert in resilience. Yet businesses tend to run post-mortems most often when things go wrong.

"To find out why things are really bad, we need to understand why they are doing well," says Dekker. In other words, post-mortems must also identify what is behind the positive aspects.

Abraham Wald, whom many see as one of the founders of operations research during the Second World War, laid the foundations for a method of learning from positive elements in a negative context. For example, he was asked to determine how to optimize the placement of armor on fighter planes. Armor is dead weight on an aircraft, and the pilots wanted the bare minimum needed to keep these aircraft in the air.

After measuring and counting the bullet impacts on returning aircraft, the obvious move was to put armor where the holes were, which is essentially what developers do today when they focus their attention on where bugs appear. Instead, Abraham Wald realized that the armor should be placed where there were no holes: the planes hit in those spots never made it back to be counted. An example to follow.
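
That survivorship effect can be illustrated with a short simulation. The sketch below is not taken from Dekker's talk or Wald's papers; it is a hypothetical Python example with invented region names and hit probabilities, showing how counting holes only on the planes that return understates damage in exactly the most critical areas.

```python
# Hypothetical sketch of survivorship bias, the effect behind Wald's reasoning.
# Region names and lethality probabilities are invented for illustration.
import random

random.seed(42)

REGIONS = ["fuselage", "wings", "engine", "cockpit"]
# Assumed probability that a single hit in a region brings the plane down.
LETHALITY = {"fuselage": 0.05, "wings": 0.05, "engine": 0.6, "cockpit": 0.6}

def fly_mission(hits_per_plane=4):
    """Simulate one plane: random hits; return the hit locations only if it survives."""
    hits = [random.choice(REGIONS) for _ in range(hits_per_plane)]
    survives = all(random.random() > LETHALITY[region] for region in hits)
    return hits if survives else None

observed = {region: 0 for region in REGIONS}  # holes counted on returning planes
for _ in range(10_000):
    hits = fly_mission()
    if hits is not None:  # we can only inspect the survivors
        for region in hits:
            observed[region] += 1

# Engines and cockpits take just as many hits as other regions, but planes hit
# there rarely return, so the surviving fleet shows few holes in exactly the
# places that most need armor.
for region, count in sorted(observed.items(), key=lambda kv: -kv[1]):
    print(f"{region:10s} holes observed on survivors: {count}")
```

Run as written, the survivors show far more holes in the fuselage and wings than in the engine or cockpit, even though hits are uniformly distributed: the missing holes mark the fatal spots, just as the absence of reported errors can mark where a system's real weaknesses hide.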