So you have a bug reported, some network of contacts that you have developed over time, and maybe a theory that developed when you "Blinked" at the report. How do you actually start to confirm or deny that theory, or even develop a theory if you came up with nothing? The first thing to do is open a "case file" for the bug, and that will require a new kind of bug tracking.
Traditional bug reporting systems such as Bugzilla and TestTrack, while offering a wide range of capabilities, have fundamentally the same goal: allow a manager or team to see what is currently wrong, who is responsible for fixing it, and possibly when it will be fixed. Strangely, they offer very little support for the person actually doing the fixing in terms of tracking and augmenting their work. Imagine if development environments had more support for keeping track of who was responsible for writing each piece of code than for actually writing the code itself. That is the state of bug tracking.
Current bug tracking systems have other problems that stem from their emphasis on workflow and assignment of responsibility. They encourage finger-pointing rather than fixing. We should be creating an atmosphere that says, "every bug is everyone's problem." Often bugs take forever to get fixed simply because they are constantly reassigned, in tiny steps, from one person to another, rather than being attacked quickly by the team. More subtly, these systems orient us toward the surface features of a bug rather than toward theories of the problem. This is most pronounced when a bug is "reopened" because the same symptom reappears from a totally different root cause. That makes it look as though the original fixer was lax or incorrect, when in fact their fix may have been entirely correct for the problem it addressed.
What should a better bug tracking system provide?
- It should provide a way of entering basic information about the bug, as current systems do, including the problem as originally reported, the initial reporter, and the severity.
- It should, at a glance, allow a developer to determine the current state of a bug. Do we already know what's wrong but haven't had time to fix it? Do we not even have any theories of the problem? Have we not even begun collecting any data about the problem? There should be more information than a state like OPEN, ASSIGNED, or CLOSED.
- It should provide a way to track and quickly review all the observables for a bug, i.e., direct data collection, user observations, etc., along with an estimate of the certainty of that information, so that we can discount information the team did not obtain directly if necessary.
- It should provide a way to track and quickly review any theories that we have developed. It should allow the assignment of likelihoods to theories so that we can see what our most probable theories are. We should also be able to quickly see which theories were rejected and whether any theories remain that have not been disproved.
- It should provide a place to list any tests that have been tried, along with the data they produced, and the ability to link those results to the theories according to whether they support, refute, or do not affect them.
- It should provide a place to record any additional assumptions being made about the bug or the system in question, whether they remain to be tested, are untestable, or are usually tacit but might be called into question in the current investigation.
- It should provide a clear place to list possible fixes based on the current theory, with a description of how it addresses the problem, and any associated information about the fix.
- It should provide a clear statement of the final resolution and any follow-up caveats or assumptions built into the chosen fix, such as side-effects or reasons the fix might need to be undone or redone if circumstances change in the future.
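The requirements above map fairly naturally onto a data model. What follows is a minimal sketch, assuming Python 3.9+, of one possible shape for such a "case file." It is not a description of any real tracker, and every name in it (`CaseFile`, `Observation`, `Theory`, `Experiment`, `Effect`, the `CONFIDENT` threshold, and the example bug in the demo) is hypothetical. It shows observations carrying a certainty, theories carrying a likelihood, experiments linked to theories by their effect, and a status derived from what we actually know rather than a bare OPEN/ASSIGNED/CLOSED.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical threshold: a live theory at or above this likelihood is
# treated as "we probably know what's wrong".
CONFIDENT = 0.9


class Effect(Enum):
    """How an experiment's result bears on a theory."""
    SUPPORTS = "supports"
    REFUTES = "refutes"
    NEUTRAL = "does not affect"


@dataclass
class Observation:
    description: str
    source: str        # e.g. "direct data collection" or "user report"
    certainty: float   # 0.0 (hearsay) to 1.0 (seen directly by the team)


@dataclass
class Theory:
    name: str          # short handle used to link experiments to theories
    statement: str
    likelihood: float  # subjective estimate, 0.0 to 1.0
    rejected: bool = False


@dataclass
class Experiment:
    description: str
    data: str
    # Maps theory name -> how this result affects that theory.
    effects: dict[str, Effect] = field(default_factory=dict)


@dataclass
class CaseFile:
    title: str
    reporter: str
    severity: str
    original_description: str
    observations: list[Observation] = field(default_factory=list)
    theories: list[Theory] = field(default_factory=list)
    experiments: list[Experiment] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    possible_fixes: list[str] = field(default_factory=list)
    resolution: Optional[str] = None

    def trusted_observations(self, min_certainty: float) -> list[Observation]:
        """Observations we are at least this sure of; the rest can be discounted."""
        return [o for o in self.observations if o.certainty >= min_certainty]

    def record(self, experiment: Experiment) -> None:
        """Log an experiment and mark every theory it refutes as rejected."""
        self.experiments.append(experiment)
        for theory in self.theories:
            if experiment.effects.get(theory.name) is Effect.REFUTES:
                theory.rejected = True

    def live_theories(self) -> list[Theory]:
        """Theories not yet disproved, most likely first."""
        alive = [t for t in self.theories if not t.rejected]
        return sorted(alive, key=lambda t: t.likelihood, reverse=True)

    def stage(self) -> str:
        """A richer at-a-glance status than OPEN/ASSIGNED/CLOSED."""
        if self.resolution is not None:
            return "resolved"
        if not self.observations:
            return "no data collected yet"
        if not self.theories:
            return "data collected, no theories yet"
        live = self.live_theories()
        if not live:
            return "all theories refuted"
        if live[0].likelihood >= CONFIDENT:
            return "cause likely known, not yet fixed"
        return "theories under investigation"


if __name__ == "__main__":
    # Hypothetical example bug, for illustration only.
    case = CaseFile("Report total wrong", "customer support", "major",
                    "Monthly report total differs from the sum of its rows.")
    print(case.stage())  # no data collected yet
    case.observations.append(Observation("Total off by one row", "user report", 0.5))
    case.theories += [Theory("off-by-one", "loop skips last row", 0.6),
                      Theory("rounding", "per-row rounding error", 0.3)]
    case.record(Experiment("Sum rows by hand", "difference equals last row",
                           {"off-by-one": Effect.SUPPORTS, "rounding": Effect.REFUTES}))
    print([t.name for t in case.live_theories()])  # ['off-by-one']
    print(case.stage())  # theories under investigation
```

Two design choices are worth calling out. First, `stage()` is computed from the contents of the case file instead of being set by hand, so the status can only say what the evidence says. Second, `record()` rejects a theory on a single refuting result, which is a simplification; a flawed test can refute a correct theory, so in practice you may want a person to confirm each rejection.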
The overarching goal of a system like this is to act as a persistent debugging memory, where you can relate new issues to previous ones, figure out which underlying causes recur in your system to point out weak links, and gain other insight into how and why your system tends to fail. In my copious free time, I am actually working on a bug tracking system that meets the above requirements, so look for that on this site sometime in the (not-so) near future. However, I've discovered that this type of information can be tracked in a document without too much trouble. I will refer to this kind of tracking as I cover the investigation process in the next few posts.
Tomorrow: The Distance Bug Investigation, Part II
