Fail Fast
This is one of the first design principles I learned. “Fail fast” means that we should not try to hide errors in a system; it is better to make a lot of noise when something fails, so that the failure is evident and can be fixed quickly.
Some people recommend making your software robust by working around problems automatically.
This results in the software “failing slowly.” The program continues working right after an error but fails in strange ways later on.
A system that fails fast does exactly the opposite: when a problem occurs, it fails immediately and visibly. Failing fast is a nonintuitive technique: “failing immediately and visibly” sounds like it would make your software more fragile, but it actually makes it more robust.
Bugs are easier to find and fix, so fewer go into production.
https://martinfowler.com/ieeeSoftware/failFast.pdf
Last week we had an error in a preproduction environment (remember, only production is production). People were not reading the code; they were just looking at the logs.
The logs said that the problem was a failure to connect to a third party to create a user.
On a closer look at the code, that log line was written in a generic try-catch block in the method that creates the user.
But the actual error was a null pointer exception: the method used to do the POST, a kind of utility method, returned null.
Utility methods are usually a signal of poor object-oriented design.
The code then tried to deserialize the null object, which threw the null pointer exception that ultimately produced the log line.
The real problem was that the utility method tried to hide the error by returning null, leaving its client to handle that null. The client did not handle it because no one expected this to happen: getting a null back when the system responsible for creating the user fails to do so.
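The chain of events described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the real code: the class names, the URL, the simulated rejection, and the log text are all invented for illustration, and the "deserialization" is reduced to a single dereference of the response.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the anti-pattern; every name here is invented.
class HidingHttpUtil {
    // Simulates a POST that the remote system rejects.
    // Instead of reporting the rejection, the utility swallows it and returns null.
    static String post(String url, String body) {
        try {
            throw new IllegalStateException("token is not allowed to perform this operation");
        } catch (RuntimeException e) {
            return null; // the real cause is lost here
        }
    }
}

class SlowFailingUserService {
    // In-memory stand-in for the application log.
    final List<String> log = new ArrayList<>();

    boolean createUser(String name) {
        try {
            String response = HidingHttpUtil.post("https://example.invalid/users", name);
            // Stand-in for deserialization: dereferencing null throws NullPointerException.
            String userId = response.trim();
            return !userId.isEmpty();
        } catch (Exception e) {
            // Generic catch: every failure produces the same misleading log line.
            log.add("Could not connect to the third party to create the user");
            return false;
        }
    }
}
```

Calling `new SlowFailingUserService().createUser("alice")` returns false and leaves a log entry about a connection problem, while the authorization message that would have explained the failure never reaches anyone.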
This is also related to the least astonishment principle:
In user interface design and software design, the principle of least astonishment (POLA), also known as principle of least surprise, proposes that a component of a system should behave in a way that most users will expect it to behave, and therefore not astonish or surprise users. The following is a corollary of the principle: “If a necessary feature has a high astonishment factor, it may be necessary to redesign the feature.”
https://en.wikipedia.org/wiki/Principle_of_least_astonishment
In this case, instead of using that try-catch, it would have been better to fail with a loud exception and return a 500 with a message stating what the system in charge of creating the user said.
That message said that the authentication token was not allowed to perform this operation.
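A minimal sketch of that fail-fast alternative might look like the code below. It rests on assumptions: `UpstreamException`, `FailFastHttpUtil`, `FailFastUserService`, and `ErrorMapper` are hypothetical names, the remote call is simulated, and the 403 status is only a plausible guess at how the third party reported the rejected token.

```java
// Hypothetical fail-fast variant; names and the 403 status are illustrative assumptions.
class UpstreamException extends RuntimeException {
    final int upstreamStatus;

    UpstreamException(int upstreamStatus, String upstreamMessage) {
        super(upstreamMessage);
        this.upstreamStatus = upstreamStatus;
    }
}

class FailFastHttpUtil {
    // Simulates the same rejected POST, but surfaces the failure immediately.
    static String post(String url, String body) {
        int status = 403; // simulated answer from the user-creation system
        String responseBody = "token is not allowed to perform this operation";
        if (status >= 400) {
            throw new UpstreamException(status, responseBody);
        }
        return responseBody;
    }
}

class FailFastUserService {
    // No generic try-catch: any failure propagates to the caller untouched.
    static String createUser(String name) {
        return FailFastHttpUtil.post("https://example.invalid/users", name);
    }
}

class ErrorMapper {
    // Translates the exception into a 500 response that keeps
    // what the upstream system actually said.
    static String toResponse(UpstreamException e) {
        return "500 Internal Server Error: user creation system answered "
                + e.upstreamStatus + " - " + e.getMessage();
    }
}
```

With this shape, the first person to look at the response or the logs sees the token message directly, rather than a generic connection error.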
The cost of hiding the real problem behind a generic error was, in this case, a two-week wait for a fix that amounted to changing one wrong property.
Two things to learn from this: the only truth is in the code, so read the code; and be honest with yourself, don't hide problems. Apply the Fail Fast principle.


