Continuing with the discussion of what to do when you have no theory, the next likely cause is underconstraint: the problem suggests a large number of possible causes. In this case, your best course of action is to pick a few data collections that are both distinguishing (i.e. they would support some theories and not others) and cheap to carry out.

Let's say that you are beta-testing a new version of your software, a standalone desktop application. It runs fine in your test rig, but for every user that installs it, it crashes on startup. Since the crash could have many causes, and dozens if not hundreds of things have changed in the newest version, you are totally underconstrained. You might try the following set of data collections to try to narrow things down:

  • Go through the list of dependent libraries for the application and compare the version on the customer machine to the version on the test machine.
  • Attempt an installation on several additional machines in your local environment in the hope that one of them has the same crash, thereby giving you a local machine to use for comparison.
  • Perform an installation of the software yourself on a customer computer, with a user present. Perhaps in your testing you are making a different set of choices or are otherwise installing it in a fashion that differs from their actions.

These three data collections are likely fast, in that they could be performed the same day, and fairly accurate. The first would tell you if the environment might be to blame, the second would give you a leg up on replication, and the third would tell you if installation procedures were to blame. In each case, a meaningful result would give you a significantly more constrained problem, even though you might still not have a theory at that point.
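The first of these collections, the library version comparison, is mechanical enough to script. Here is a minimal sketch in Python, assuming you have already gathered each machine's library versions into a simple name-to-version mapping. The function name `diff_versions`, the library names, and the version numbers are all made up for illustration.

```python
def diff_versions(test_rig, customer):
    """Return {library: (test_version, customer_version)} for every mismatch.

    A library missing on one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(set(test_rig) | set(customer)):
        test_version = test_rig.get(name)
        customer_version = customer.get(name)
        if test_version != customer_version:
            mismatches[name] = (test_version, customer_version)
    return mismatches


# Hypothetical inventories, invented for this example.
test_rig = {"libfoo": "2.1.0", "libbar": "1.4.2", "libbaz": "0.9.0"}
customer = {"libfoo": "2.1.0", "libbar": "1.3.9"}

for name, (ours, theirs) in diff_versions(test_rig, customer).items():
    print(f"{name}: test rig={ours}, customer={theirs}")
```

Any line this prints is a candidate environmental difference. An empty result is also informative, since it makes the environment theory less likely and shifts attention to the other two collections.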

The final common cause of having no theory is the improbable problem. It can be hard to come up with a theory when we believe that the situation being described simply can't happen. Imagine that you get a bug report that a user is receiving two identical email notifications every time a report is generated by your system. You know, however, that you just put in a piece of code that explicitly looks for duplicate notifications and throws them out because of a known previous issue. You will convince yourself that the problem is everywhere but in your code. You will blame the outgoing mail server, the incoming mail server, user error, etc. before coming up with a theory. As described in the post from a couple of weeks ago on this same topic, the best first step when you encounter a problem like this is to replicate it. The replication will force you to take it seriously. Another good tactic is to get another opinion from someone on your team. They might say, "oh yeah, I can totally see how that would happen even with your check in there", and then you will have something to go on.
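To make the replication idea concrete, here is a purely hypothetical sketch in Python of how such a duplicate check could still let two "identical" emails through. The post does not say how the real check works. The function `drop_duplicates`, the field names, and the sample values are all assumptions made for illustration.

```python
def drop_duplicates(notifications):
    """Keep the first notification seen for each (recipient, report_id, created_at)."""
    seen = set()
    kept = []
    for note in notifications:
        # Assumed key: the timestamp is part of what counts as "the same".
        key = (note["recipient"], note["report_id"], note["created_at"])
        if key not in seen:
            seen.add(key)
            kept.append(note)
    return kept


# Replication attempt with invented data: the same report is queued twice
# for the same user, one second apart.
queued = [
    {"recipient": "user@example.com", "report_id": 17, "created_at": "09:00:00"},
    {"recipient": "user@example.com", "report_id": 17, "created_at": "09:00:01"},
]

sent = drop_duplicates(queued)
print(len(sent))  # prints 2: both notifications survive the check
```

In this sketch the check is correct by its own definition of "duplicate", yet the user still sees two emails that look identical. A small replication like this is often what turns "that can't happen" into a theory you can test.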

For more general suggestions about trying to generate a theory, see Day 14: When the Blink Fails. Now that we have covered theorizing, the next few posts will cover data collection in more detail.

Tomorrow: The Distance Bug Investigation, Part V