When you sit down to debug a problem, some information is always missing. Many point to this as the essence of debugging: filling in those gaps until you have a clear enough picture of what is wrong to identify the cause and fix the problem. My problem with this description is that it assumes you can simply keep gathering information until you have enough. In my experience, some information is simply unavailable, for reasons such as:
- The people who know about the system or code no longer work there, or were never directly accessible in the first place, as with commercial software.
- It is unclear who actually knows, especially in the case of large third-party libraries or open-source software.
- The information is closely guarded to protect intellectual property or to prevent reverse engineering.
- The cost of acquiring the necessary information is simply too high.
So while debugging is partly about figuring out what you don't know and filling in gaps, it is just as much about reasoning well with partial information, and about using what you do know to constrain the missing information to a set of possibilities small enough to test exhaustively. Sometimes debugging even means deciding whether it is faster to research and acquire the information directly (with some estimated probability of success), or to guess at the missing information in the hope that, with only a few possibilities, one will quickly prove correct. The goal is to find the gap that is easiest, cheapest, or fastest to close, depending on the situation.
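The research-versus-guess decision can be made concrete with a back-of-the-envelope expected-cost comparison. The sketch below is a hypothetical illustration, not a method from this series: the function names, the numbers, and the simplifying assumptions (exactly one candidate cause is correct, all candidates are equally likely, research costs its full time whether or not it succeeds, and failed research leaves you guessing anyway) are assumptions made for the example.

```python
def expected_guess_cost(num_candidates: int, cost_per_test: float) -> float:
    """Expected time to find the right candidate by testing them one at a time.

    Hypothetical model: exactly one candidate is correct, all are equally
    likely, and testing stops as soon as the correct one is found.
    """
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    # On average the correct candidate turns up after (n + 1) / 2 tests.
    return (num_candidates + 1) / 2 * cost_per_test


def expected_research_cost(research_cost: float, p_success: float,
                           fallback_cost: float) -> float:
    """Expected time if you research first and fall back to guessing on failure.

    Hypothetical model: the research time is spent either way, and a
    successful search leaves no further diagnosis to do.
    """
    return research_cost + (1 - p_success) * fallback_cost


def choose_strategy(num_candidates: int, cost_per_test: float,
                    research_cost: float, p_success: float) -> str:
    """Return whichever strategy has the lower expected cost under this model."""
    guess = expected_guess_cost(num_candidates, cost_per_test)
    research = expected_research_cost(research_cost, p_success, guess)
    return "guess" if guess <= research else "research"


# Hypothetical numbers: 4 plausible causes at 30 minutes each to test,
# versus 3 hours of digging with a 60% chance of finding the answer.
print(choose_strategy(num_candidates=4, cost_per_test=0.5,
                      research_cost=3.0, p_success=0.6))
```

With these made-up numbers, guessing has an expected cost of 1.25 hours against 3.5 hours for research, so it prints `guess`. With a few dozen candidates instead of four, the comparison could flip in favor of research.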
There is a second kind of mental distance: not knowing about changes that affect your system. Since change tracking should drive a debugging investigation (more on this in a few days), this can be very troublesome. Some examples of "hidden" changes that result in baffling bugs:
- A piece of hardware is added to or removed from a machine.
- A user decides to try data inputs that are outside the original specifications of the system.
- A user decides to try a sequence of operations that they have never tried before (and maybe no one has tried before).
- A piece of conflicting software is installed.
- A piece of dependent software or the OS is upgraded.
When a bug is reported on a running system and one of these changes has taken place without your knowledge, it can be nearly impossible to determine why the system failed. Being aware of this gap, and keeping in mind that machines change configuration and people change their behavior, can be critical to quickly asking the right questions in a debugging situation.
If you were hoping for help right away, I'm not going to cover how to overcome mental distance in this post. Instead, I will lay out the techniques, tools, and strategies in a separate series of posts once all the types of distance have been described.
Tomorrow: Physical Distance
