Software development has two general phases, the building phase and the verification phase. In the building phase, the attitude towards change defaults to "allow". New features can be added, code may be significantly reworked, and designs might be altered, as the goal is to add new capabilities to the system at the expense of increasing the risk of introducing new flaws and problems. Once the building phase has substantially completed, the verification phase commences, where the attitude towards change flips to default "deny". The goal is to achieve a stable system that performs the desired features. Since every change has the potential to introduce errors, changes are only made to correct known flaws.
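The two change policies can be written down as a small decision rule. The following is only a minimal sketch: the `Phase` enum, the `change_allowed` function, and the `fixes_known_flaw` flag are hypothetical names introduced for illustration, not part of any real tool.

```python
from enum import Enum


class Phase(Enum):
    BUILDING = "building"
    VERIFICATION = "verification"


def change_allowed(phase: Phase, fixes_known_flaw: bool) -> bool:
    """Return True if a proposed change should be accepted in the given phase."""
    if phase is Phase.BUILDING:
        # Default "allow": new features, rework, and design changes are all accepted.
        return True
    # Default "deny": only changes that correct a known flaw get through.
    return fixes_known_flaw


# A new feature is fine while building but is rejected during verification.
print(change_allowed(Phase.BUILDING, fixes_known_flaw=False))
print(change_allowed(Phase.VERIFICATION, fixes_known_flaw=False))
```

In practice the rule lives in team habits and review practice rather than in code, but writing it out makes the asymmetry between the two phases explicit.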
Obviously, a significant amount of testing is done during the verification phase to ensure that the features are implemented as specified, and that no new problems have emerged. However, some amount of testing must be done in the building phase to convince the developer and the team that things are near enough to stable that it is worth switching the change policy to default "deny". The big question is, how much testing is enough? There are significant dangers to insufficient testing, but too much testing also has consequences.
Dangers of Insufficient Testing
- Wasted time - Blocker bugs found during verification often mean that testers and developers sit around waiting for an issue to get resolved, and that wastes everyone's time.
- False sense of security - There is a general feeling of excitement and progress when things move to verification, a sense that the application is moving towards completion. If it turns out that major bugs exist and features are not implemented correctly, this can throw the whole schedule into disarray.
- Blurring of phase line/Vacillation - This is the most dangerous result of all: a general blurring of the line between building and verification, or constant switching back and forth between phases. The end result is an "anything goes" policy towards change, and a general march towards testing hell where no features can be verified because developers feel free to change anything at any time. It starts simply enough: a couple of big bugs make it into the verification phase, so things are moved back into building mode. Then it's back to verification after fixes are put in place, but testing still wasn't done sufficiently, and new bugs are found or the old bugs are rediscovered. By now time is getting tight and the team is encouraged to just get it working, so changes start getting made wherever needed, all at once. At this point, no more guarantees can be made about functionality. Essentially, this is permanent building mode. It is the sad truth that many applications ship directly from the building phase, without ever having a verification phase, and appear to do so successfully. IT WON'T WORK LONG TERM. Code can ship a few times under these conditions, but eventually this attitude will cause the codebase to crumble under the weight of a million changes with no external verification. It is accruing technical debt, and just because no one is watching the balance go up doesn't mean it won't eventually come due.
Given all these dangers, why not do as much testing as possible during the building phase?
Dangers of Excessive Build-Phase Testing
- False sense of quality - Developers are notoriously bad at testing their own code. It's too easy to fall into the trap of checking the few code paths you've spent the most time on and ignoring other paths that just happen to be the ones most frequently experienced by users. Developers also have a built-in disincentive: testing finds bugs, and bugs mean more work for them, so they are more likely to turn a blind eye to a real problem or dismiss it as "unlikely". Like under-testing, over-testing can create a false sense of confidence in the code. Because so much testing was done, the code seems solid, when in fact only a small portion of it was heavily tested and many new issues are likely to be found during verification.
- Skill mismatch - Testing is a separate skill from development, and one that requires a very different mindset than development. Some developers are also good testers because they are good at thinking like users, or are good at thinking up diabolical ways to break their own code. Most are not. They can produce quality code but haven't developed that additional skill set. In short, developers are generally not very efficient testers, and asking them to do testing beyond what is necessary to move to verification is inefficient.
- Perfectionism - If management and the team come down too harshly on developers for letting even small or unlikely bugs through, developers will tend to respond by becoming perfectionists, never letting their code be called "done" until they have gone over every line, every usage scenario, and every potential input. Developers become loath to make changes because it means their code might have a bug and they will get an earful from the team. This result is nearly as bad as the "Blurring of phase line" case, but in a different way. Rather than living in the build phase forever, the team lets the verification mindset take over the entire process. Instead of shipping code of questionable stability because there was never a verification phase, code never gets shipped at all, as every developer is too worried about protecting their reputation.
Unfortunately, requiring too much testing can be just as bad as requiring too little. To return to the question of how much testing is enough, suppose that during the verification phase, two different bugs are found:
- If more than 10000 transactions are rolled back simultaneously, a system crash results.
- The application will not launch.
Which of these ought to have been found and fixed in the building phase? Most observers would likely agree that the first bug is probably not the kind of thing that could or should be found in the building phase, and the second is most definitely the kind of thing that should have been found. Somewhere in the middle is a great gray area of bugs and issues that some believe should be rooted out during building, and others might consider a verification phase find. Even if the opinion on a particular issue may vary from developer to developer, it is useful to establish some criteria for deciding whether a given bug indicates that the building phase code was not sufficiently tested. If a bug is found during verification, consider the following:
- Recency - Might this bug have existed for weeks, months, or even years, with the new work somehow uncovering it, or was this bug clearly introduced by the latest work? Developers can easily break parts of the code that they do not use frequently, or that have been around for a long time, because they assume that such code has gone through many rounds of testing and is hardened. It can be very hard, and it is probably inefficient, to test enough of the system in the build phase to find bugs that have stubbornly hidden through many rounds of testing. On the other hand, if a developer adds a feature and it is found to be immediately broken, that's a red flag.
- Obviousness - How likely is it that a developer performing a reasonable testing regimen would have come across this bug? In the examples given, the application failing to start is horribly obvious, while a failure that occurs after a large number of concurrently rolled back transactions is much less so (unless this is a known common condition in this application). Obviousness is a value judgment though; what is obvious to a developer with rigorous testing practices might not be obvious to another. Still, a developer whose code frequently has obvious bugs on the main code paths is generally not doing enough testing in the building phase.
- Frequency - How often does this bug occur? This goes along with obviousness in that an infrequent bug might not show up in the kind of basic testing done in the build phase, while frequently occurring bugs should be found and rooted out. There are frequently occurring bugs that might not show up in any of the obvious code paths though, for instance, a failure that occurs after the user clicks "Back" on their browser and resubmits a form. This might be something that happens all the time with a web application, but which a developer might not think to test.
- Severity - Less important than the other three, but still something to consider. A developer can easily overlook something that is important to users but doesn't result in errors or crashes, for example, the tab order of a set of fields, because it's simply harder to test for that kind of thing. On the flip side, a very severe issue, such as a system crash that only happens under very infrequent or even apparently non-replicable conditions, can go equally unnoticed. However, serious errors that are easily detectable just by walking through the application should be caught during the building phase.
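The four criteria can be combined into a rough triage score for a bug found during verification. The sketch below is illustrative only: the `BugReport` fields, the function name, and the weights are assumptions made for this example. The only guidance taken from the criteria themselves is that severity counts for less than the other three.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    introduced_by_latest_work: bool  # recency
    on_main_code_path: bool          # obviousness (a value judgment)
    occurs_frequently: bool          # frequency
    severe: bool                     # severity


def build_phase_escape_score(bug: BugReport) -> int:
    """Score from 0 to 7; higher suggests build-phase testing should have caught it."""
    score = 0
    # Hypothetical weights: the first three criteria count double.
    score += 2 if bug.introduced_by_latest_work else 0
    score += 2 if bug.on_main_code_path else 0
    score += 2 if bug.occurs_frequently else 0
    # Severity is weighted lower, since it matters less than the other three.
    score += 1 if bug.severe else 0
    return score


# The two example bugs, with field values assumed for illustration.
wont_launch = BugReport(True, True, True, True)
mass_rollback_crash = BugReport(False, False, False, True)
print(build_phase_escape_score(wont_launch))
print(build_phase_escape_score(mass_rollback_crash))
```

A score like this is a conversation starter rather than a verdict. Tracking it over several verification cycles could help show whether build-phase testing is consistently too light.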
To summarize, bugs that were introduced by the most recent set of changes, are obvious (according to one's own definition of obvious), occur frequently, and are severe should all be stamped out during the build phase. Beyond that, it's up to the manager and the team to decide what level of testing to do before moving to verification, and that will depend on both the experience and skillset of the team and the type of project being implemented. By choosing this level wisely and frequently adjusting it through feedback, both developers and testers can work efficiently to create a stable, feature-rich product.
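The clearest case discussed above, an application that will not launch, is cheap to rule out before handing code off to verification. Here is a minimal smoke-check sketch. It assumes the application can be started with a command that exits on its own with status 0 (for example a hypothetical quick-start or version mode), and `launches_cleanly` is a name invented for this example.

```python
import subprocess
import sys
from typing import Sequence


def launches_cleanly(command: Sequence[str], timeout_seconds: float = 10.0) -> bool:
    """Return True if the command starts and exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable, a permission problem, or a hang all count as failures.
        return False
    return result.returncode == 0


# Stand-in commands: a real project would pass its own launch command here.
healthy = [sys.executable, "-c", "pass"]
broken = [sys.executable, "-c", "raise SystemExit(1)"]
print(launches_cleanly(healthy), launches_cleanly(broken))
```

A check like this covers only the most obvious failure. It does not replace the walk through the main code paths described above.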
