Distance Debugging Logo

I enabled and configured FastCGI on this server, so I'm hoping that will make things a little more zippy for you. It seems to be working so far, and things do seem to be faster. Apparently it's not a perfect solution, so please post a comment, or drop me a line using the contact link at the top if you notice any weirdness.

So you have a bug reported, some network of contacts that you have developed over time, and maybe a theory that developed when you "Blinked" at the report. How do you actually start to confirm or deny that theory, or even develop a theory if you came up with nothing? The first thing to do is open a "case file" for the bug, and that will require a new kind of bug tracking.
Traditional bug reporting systems such as Bugzilla and TestTrack, while offering a wide range of capabilities, have fundamentally the same goal: allow a manager or team to see what is currently wrong, who is responsible for fixing it, and possibly when it will be fixed. Strangely, they offer very little support for the person actually doing the fixing in terms of tracking and augmenting their work. Imagine if development environments had more support for keeping track of who was responsible for writing each piece of code than for actually writing the code itself.  That is the state of bug tracking.

There are other problems with current bug tracking systems and their emphasis on workflow and assignment of responsibility. They encourage finger-pointing rather than fixing. We should be creating an atmosphere that says, "every bug is everyone's problem". Often bugs take forever to get fixed simply because they are constantly being reassigned in tiny steps to different users rather than being attacked quickly by a team. In a more subtle way, these systems orient us towards the surface features of a bug rather than theories of the problem. This is most pronounced when a bug is "reopened" because the same symptom reappears with a totally different root cause. That makes it appear that the original fixer was lax or incorrect in their solution, which is totally false.

What should a better bug tracking system provide?

  • It should provide a way of entering basic information about the bug as current systems do, including a description of the problem as first described, the initial reporter, and the severity.
  • It should, at a glance, allow a developer to determine the current state of a bug. Do we already know what's wrong but haven't had time to fix it? Do we not even have any theories of the problem? Have we not even begun collecting any data about the problem? There should be more information than a state like OPEN, ASSIGNED, or CLOSED.
  • It should provide a way to track and quickly review all the observables for a bug, i.e. direct data collections, user observations, etc. along with an estimate of the certainty of that information so that we can discount information that the team did not obtain directly if necessary.
  • It should provide a way to track and quickly review any theories that we have developed. It should allow the assignment of likelihoods to theories so that we can see what our most probable theories are. We should also be able to quickly see what theories were rejected and whether any theories remain that have not been disproved.
  • It should provide a place to list any tests that have been tried and the data they produced, and the ability to link that information to the theories in terms of whether the results support, refute or do not affect them.
  • It should provide a place to record any additional assumptions being made about the bug or the system in question: those that remain to be tested, those that are untestable, and those that are usually tacit but might be called into question in the current bug investigation.
  • It should provide a clear place to list possible fixes based on the current theory, with a description of how it addresses the problem, and any associated information about the fix.
  • It should provide a clear statement of the final resolution and any follow-up caveats or assumptions built in to the fix that was chosen, such as side-effects or reasons that the fix might be undone or redone if circumstances change in the future.
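
As a rough sketch of how the requirements above might map onto a data model, here is one possible "case file" in Python. Every name in it (CaseFile, Observable, Theory, TestResult, Verdict, and the state labels) is invented for illustration and does not come from any existing tracker; the numeric certainty and likelihood fields are assumed to be rough subjective estimates between 0 and 1.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Verdict(Enum):
    SUPPORTS = "supports"
    REFUTES = "refutes"
    NO_EFFECT = "no effect"


@dataclass
class Observable:
    description: str
    certainty: float  # assumed scale: 0.0 (hearsay) to 1.0 (collected directly)


@dataclass
class Theory:
    name: str
    likelihood: float  # assumed scale: rough subjective estimate, 0.0 to 1.0
    rejected: bool = False


@dataclass
class TestResult:
    description: str
    data: str
    # Maps a theory name to how this result bears on that theory.
    verdicts: Dict[str, Verdict] = field(default_factory=dict)


@dataclass
class CaseFile:
    description: str
    reporter: str
    severity: str
    observables: List[Observable] = field(default_factory=list)
    theories: List[Theory] = field(default_factory=list)
    tests: List[TestResult] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    possible_fixes: List[str] = field(default_factory=list)
    resolution: Optional[str] = None

    def record_test(self, test: TestResult) -> None:
        """Store a test and mark any theory it refutes as rejected."""
        self.tests.append(test)
        for theory in self.theories:
            if test.verdicts.get(theory.name) is Verdict.REFUTES:
                theory.rejected = True

    def open_theories(self) -> List[Theory]:
        """Theories not yet disproved, most likely first."""
        live = [t for t in self.theories if not t.rejected]
        return sorted(live, key=lambda t: t.likelihood, reverse=True)

    def state(self) -> str:
        """A coarse at-a-glance state, richer than OPEN/ASSIGNED/CLOSED."""
        if self.resolution:
            return "resolved"
        if not self.observables:
            return "no data collected"
        if not self.open_theories():
            return "data but no live theory"
        if self.possible_fixes:
            return "cause suspected, fix pending"
        return "theories under test"
```

The point of the state() method is only to show that "what do we know so far?" can be derived from the contents of the case file rather than set by hand; the particular labels are a guess at what would be useful.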

The overarching goal of a system like this is to act as a persistent debugging memory, where you can relate new issues to previous ones, figure out which underlying causes recur in your system to point out weak links, and gain other insight into how and why your system tends to fail. In my copious free time, I am actually working on developing a bug tracking system that meets the above requirements, so look for that on this site sometime in the (not-so) near future. However, I've discovered that this type of information can be tracked in a document without too much trouble.  I will reference this type of tracking as the investigation process is covered in the next few posts.
Tomorrow: The Distance Bug Investigation, Part II

Now that you have a contact who has delivered results on at least one task, you should figure out their commitment level and domain of expertise. Here are some general levels from least to most commitment:

Level 0: The user will field the occasional question, but will not perform any actions on your behalf.

Level 1: The user will field questions, and is willing to test patches, beta releases, and other code changes to verify fixes and produce extra debugging information.

Level 2: Everything from Level 1, plus is willing to spend regular time working through issues with you. They will set aside time (maybe averaging an hour every 1-2 weeks) to actually work with you over the phone or IM (or even email if necessary) to collect information in real time.

Level 3: Everything from Level 2, plus will continue to assist you with some tasks autonomously. They will accept some level of tasking from you to perform during their daily work and will report back results to you. There are very few users that are willing to do this out of the goodness of their heart, or for love of the system. Generally you will get a Level 3 user when some higher-up compels an employee to assist you for the good of the project.

In some cases, it will be immediately clear what level a particular user is willing to work at just based on what they offer to do. In other cases, a user will progress to higher levels over time simply because they feel rewarded for the work they do.

Beyond determining the level, you will want to get a mix of contacts from various departments or teams to get a wider set of perspectives and testers. For example, if you are making a billing system, you will want a contact from the sales team, the accountants, and any other group that might have its own perspective. In general, you shouldn't turn away an additional contact from a group from which you already have a contact, but it makes sense to actively look for users from groups for which you don't currently have a good contact.
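
If you keep your contact list in any structured form, spotting the gaps is easy to automate. The following is a minimal sketch, assuming each contact is recorded with a group and one of the four commitment levels above; the Commitment, Contact, and uncovered_groups names are hypothetical, and treating Level 1 as the bar for a "good contact" is an assumption you may want to change.

```python
from dataclasses import dataclass
from enum import IntEnum


class Commitment(IntEnum):
    ANSWERS_QUESTIONS = 0  # Level 0
    TESTS_PATCHES = 1      # Level 1
    REGULAR_SESSIONS = 2   # Level 2
    AUTONOMOUS_TASKS = 3   # Level 3


@dataclass
class Contact:
    name: str
    group: str
    level: Commitment


def uncovered_groups(contacts, groups, minimum=Commitment.TESTS_PATCHES):
    """Return the groups that have no contact at or above the minimum level."""
    covered = {c.group for c in contacts if c.level >= minimum}
    return sorted(set(groups) - covered)


# Example with made-up contacts for a billing system.
contacts = [
    Contact("Dana", "sales", Commitment.REGULAR_SESSIONS),
    Contact("Lee", "accounting", Commitment.ANSWERS_QUESTIONS),
]
print(uncovered_groups(contacts, ["sales", "accounting", "support"]))
```

Here accounting still counts as uncovered because its only contact is at Level 0, which matches the advice to keep looking for a good contact in that group.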

Tomorrow: The Distance Bug Investigation, Part I

In a Distance Debugging situation, especially one that involves a large physical distance, nothing is more valuable than a trusted contact. A trusted contact ideally is another member of your team that is working with you to solve problems, but more likely it is someone from your customer, either a user or local administrator, that will act as your eyes and fingers.

The qualities that you need in a trusted contact include (in approximate order of importance):

  1. Attention to Detail - This can't be stressed enough. I would much prefer a totally naive user with no technical skill that will faithfully report error messages and follow directions perfectly than the opposite. Sometimes a highly technical contact will try to out-think you to the detriment of the debugging effort.
  2. Enthusiasm/Passion - I am a firm believer that we ultimately do not commit to things that we do not enjoy. A tepid contact who doesn't care about the fate of the system will not make the extra effort to make sure that you solve the tough problem.
  3. A Thick Skin - This is needed for two reasons. First, when time is short and tempers flare, you need to know that your contact can take a little sarcasm. Second, you will need them to check and recheck something, and then check it again just for your peace of mind. They need to understand that you are being thorough, and not that you are questioning their competence.
  4. Domain Knowledge - They should understand the world of the users of the system, in order to answer certain questions, although this is less important.
  5. Technical Knowledge - Some technical knowledge can be helpful, especially simple utility things like how to open the Windows Control Panel. However, this is generally not very important.

The first thing to do when looking for a trusted contact is look for someone with these qualities. Often, I will look at a very detailed bug report and think, "This person looks like they would be a good contact to make." I will immediately try to get in touch to let them know that I appreciate their report and to gauge their interest in working with me on a regular basis. One of the easiest ways to determine this is by simply giving them a small task such as getting the answer to a question for you. For example, I might ask, "Can you check around with other users in your area and see if any of them use the X feature? We're trying to decide if we want to keep it around." If I check in with them the next week and they've polled 5 or 10 users, then I know I can count on them in the future. On their end, they usually realize that this gives them more of an opportunity to influence and guide future development, and it gives them buy-in.

Tomorrow: Developing a Trusted Contact, Part II

I'm going to veer off topic a bit again (sorry for those looking for debugging stuff, back tomorrow) and talk about a different kind of debugging, trying to figure out why some couples don't work out. I have a theory called "BS Filter Compatibility" that I think explains a lot. Have you ever met a couple that goes to a movie together and one of them thinks it was really deep and powerful and the other one thinks it was fluffy nonsense? To me, those couples never last, or if they do, they aren't ever that happy. The problem is their BS filters are calibrated differently. This ultimately means that one person starts to lose respect for the other, which is death for a relationship.

The origin of the BS filter theory came from stories that I've heard many long-term couples tell about their meeting and initial bonding. For example, my wife and I originally found a connection because we both felt that a student organization that we had both joined independently was totally pointless and overblown (and we quit soon after). We felt like we were the only ones who could see it, and it gave us mutual respect for each other right from the start. Oddly, the only "romantic" movie I've seen play up this angle is Wedding Crashers, specifically the initial wedding scene where the sister-of-the-bride (on the altar as a bridesmaid) can't stop laughing as the bride and groom read their ultra-cheesy wedding vows laden with horrible sailing puns. I loved the fact that Owen Wilson's character basically chooses her at that moment because he realizes that she finds it as ridiculous as he does.

The problem is, how close do people's BS filters really need to be for compatibility? Based on the incredibly unscientific survey of myself and my wife of seven years, I would say you have to see the value in about 75% of things that the other person is interested in, and you can feel smug about the other 25% without things disintegrating. For instance, if I look at our musical tastes, we certainly don't overlap 100%, and very few people do. Instead, we both find common ground in classic soul and rock, 70s disco and funk, hip-hop, and some modern pop music. I enjoy a lot of electronic and indie rock (think Stereolab and Geggy Tah) and modern folk/singer-songwriter stuff that she can't stand, and lots of jazz to which she is mostly indifferent. She enjoys 80s and 90s punk and other hard rock, and some contemporary rock that I can't stand. However, the 75% of music that we do like in common is enough to always find a mutually agreeable song on the radio, or to put together a mix CD that both can enjoy.

I don't pretend to be an expert, but if you are in a relationship where you just don't feel compatible, it might help to sit down and make a short list of music, movies, books, etc. that you think are good and meaningful, and think about the same set for your partner. If you find yourself sneering at his or her selections and at the end you only have a few things in common, it might be time to reevaluate.

To briefly recap Part I, the idea is to try to establish a rapport while getting more information, as well as learning about what kind of user is behind the report. By now, you have heard "the story of the bug".

  1. After hearing the basic story, decide whether you have a clear enough understanding of the problem. If so, restate it to the user to see if they agree. At that point it mostly becomes a negotiation to make sure that any remaining disfluencies are ironed out.
  2. If you feel that there are still gaps or confusing elements, it might be because of one of a few things:
    • There is some word that you both think is clearly defined but which you are actually using differently, and that is preventing good communication. The key here is to focus on the phrase that is leading you astray. The user might say "and then the mouse stops working", and they really mean that they can't click on something that they think should be clickable, but in your head you are imagining some kind of driver or hardware failure. Try to get them to state the problem in different words or, better yet, see if they can demonstrate the problem for you live (assuming you don't have a major physical distance issue).
    • The user keeps restating the problem in terms of the way that the system is currently operating (which they think of as wrong) but is not stating what they actually want. The user may not really know what they want; they only know what they don't want. In that case, it is often helpful to start offering up your own countersuggestions to play a little game of "warmer/colder". If they say, "the reports are generated biweekly", you might counter with, "We could easily generate them every week". If the problem is that they want more frequent reporting, they might continue along that line and say, "How about twice a week?", but if the problem is that their inbox is choked with stuff already and they don't need that much information, they might say, "no, no, that's even worse". At least you can start to see the direction in which the change should be made.
  3. If you reach a state where the problem is not clear, but at least the steps to reproduce are, then it is often useful to end your conversation with the user and move on to replication, as that can give you a more direct understanding of the problem. Once you can see it firsthand, you can always call back the user to gather more information.
  4. If you determine that the user is making a feature request and not a bug report, be upfront about it. "I agree that what you are describing would be very nice to have and would save you time. It would probably take a fair amount of time to build, but I'm sure that we could get it out in the next release if you can get it added to the list of features." Encourage them to discuss it with other users to create a critical mass, and point them to the customer contact for your project.
  5. If all else fails, and you have already developed another trusted user contact (see item 7), end your conversation with the current user and talk it over with your trusted contact. "I got this bug report from another user that I just can't make heads or tails of. Does this make any sense to you?" Since they likely speak the same language as the reporter, they may pick up on language that you did not, or they might even say, "yeah, I have that problem all the time, but I didn't really know how to explain it". In any case, they will likely give you additional assistance in deciphering it.
  6. At the end of the conversation, make sure you are clear on the severity and timetable for this problem. State it to the user: "I know that this is really keeping you from getting any work done. I will get back to you in an hour or so with a status report. Hopefully I'll have a fix ready to go by then.", or "Thanks for the info. I'll check in next week and let you know if I've thought of anything." Also, make sure to be clear if you think you will need more information: "I may have a few more questions for you as I look into this. Would you mind if I gave you a call tomorrow or the next day?"
  7. Finally, if after discussing the report, it becomes clear that the reporting user is committed to/passionate about your system, has a natural rapport with you over the phone, and is somewhat technically savvy, you will want to think about escalating that relationship into a trusted contact or intermediary. That will be the subject of tomorrow's post.

Tomorrow: Developing a Trusted Contact

I thought I would get back to some off-topic posting for a day, so stay tuned for more Distance Debugging tomorrow.

When we first moved in to our house, we discovered that the previous owners had left behind a lot of stuff, and I mean everything from major pieces of furniture to random personal effects. I think they did it partly out of a misguided sense of altruism, and partly out of simple failure to look thoroughly since much of it was stuffed in the back of high shelves, etc.

Anyway, after my wife used a left-behind brown crayon stub to write down an important phone number, we decided that the house was actually filled with items placed there by our future selves to help solve problems we didn't yet know existed, a la the movie Paycheck. For those who haven't seen the movie, it's yet another Philip K. Dick short story stretched into a feature-length film. It revolves around a software engineer who works on a project so secret that he agrees to have his memory erased afterwards. When he "wakes up" after the memory erasure, he discovers that he has left himself an envelope of miscellaneous unfamiliar items. Spoiler Alert!!! It turns out that what he had been working on was a device that could look into the future. He used the device while building it and saw that he was going to be killed by his employer, and that the device would be used for very bad things, so he then used it to see how he might avoid the problems he would encounter by using innocent-looking items. The majority of the movie is him finding funny and clever uses for the items to escape from various situations.

Here are a few of the items that were left behind:

Item | Theorized Use | Actual Use
Twin Bunkbed | Sudden pregnancy/Surprise custody | Unknown, gave away to neighbor
Old pink bathtowel | Put out flames of fire that would have destroyed house | Cleaned up something gross, threw out
Small crescent wrench | Escape from kidnappers' makeshift prison | Installed new shower head (good enough!)
Odd collection of municipal parking tickets mounted in frame | Clues to location of treasure? | TBD

What's funny is that it got my wife and me thinking about the fact that often in our lives we have a similar kind of experience, where we learn something that seems tangential at the time, but shortly after we find a very important use for the information. It's as if our future selves have planted the information for us knowing that we would need it later. Do other people feel this way?

When you receive a bug report that is hard to understand from a person with whom you have never had contact, it can be difficult to get a complete picture of what is wrong. Assuming that you can make some contact with them, here is a good process for clarifying a report:

  1. Start by communicating your desire to be helpful and fix things. This will start the relationship out with the right tone. Users are often so used to being treated like nuisances that they will be very reticent to discuss things at first and will almost be apologetic for things they notice. Once they figure out that your top priority is fixing things, communication will become easier.
  2. Try to gauge the priority of this bug for this user in terms of their overall workload. This might be something very small to them and they are swamped with work, or it might be preventing them from completing something important. Say something like, "I saw your bug report and I wanted to get a little more information, if you have time. I couldn't tell from your report whether it was actually blocking your work, or just an annoyance."
  3. If the user has time and interest, start with the "Tell Me a Story" bit described a few days ago, but in addition to focusing on the details of the bug, try to gauge the user's level of technical skill, and their overall feeling about the system. Your goal is not only to fix this bug, but also to get them to feel good about the system. It sounds touchy-feely, but perception is everything, and if they want to vent about lots of things they think are wrong, it can help if they know you are listening, and you can get invaluable information about how things are actually used.
  4. As you are talking with the user, it can sometimes help to categorize them for future reference, because it can help you understand where they are coming from:
    • The Power User - The most valuable kind of user because they push the limits of the system. They will report a lot of bugs, and will care if they are fixed.
    • The Squeaky Wheel - Reports a lot of bugs but often of the preference type. Reports need to be read with more skepticism than usual.
    • The Boss - While someone in charge might report bugs that you otherwise would push off, fixing their bugs might have a stronger effect on the overall use of the system because of their influence. Plus, they might be the ones controlling your budget.
    • The Geek - A tech-savvy user can be invaluable not only as a source of bug reports, but also as a kind of translator between the way that you think and the way that the users think. However, watch out for geeks trying to be overly helpful and offering endless "free advice".

Tomorrow: Clarifying the Bug Report, Part II

One of the hardest things to do when you receive a bug report is outright rejecting it. As keepers of the system, we want everything to be perfect under every circumstance. However, wanting to please users to this degree can actually hurt your reputation rather than help it.

There are a few good reasons to reject a bug report on its face:

  • The bug report involves running the system in a totally untested and out-of-scope configuration. For example, if a user reported a bug where the server wouldn't run inside the Windows emulator on a Mac, I might briefly investigate as an intellectual exercise, but I'm not going to treat this as a bug.
  • The bug report expresses a preference, and other users might have different preferences. If every user expresses the same sentiment, then I might accept it as a bug. If users differ, then it starts to fall into the realm of customizability, which generally involves new feature development.
  • The bug cannot be replicated and is not severe. In this case, it's just not worth spending time on. If it can be replicated, it's probably relatively easy to fix. If it's severe (i.e. results in data loss or an unusable system), then it's probably worth spending some time looking into.
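
The three reasons above amount to a small decision procedure, which can be sketched in a few lines of Python. This is only an illustration: the Report fields and the triage function are hypothetical names, and each field is assumed to be something you have already judged by hand (the code does not decide whether a bug is severe or replicable for you).

```python
from dataclasses import dataclass


@dataclass
class Report:
    supported_configuration: bool  # False for untested, out-of-scope setups
    is_preference: bool            # the report expresses a preference
    users_agree: bool              # only meaningful when is_preference is True
    replicable: bool
    severe: bool                   # e.g. data loss or an unusable system


def triage(report: Report) -> str:
    """Apply the three rejection reasons in order; accept anything else."""
    if not report.supported_configuration:
        return "reject: out-of-scope configuration"
    if report.is_preference:
        if report.users_agree:
            return "accept"
        return "reject: preference, treat as a feature request"
    if not report.replicable and not report.severe:
        return "reject: cannot replicate and not severe"
    return "accept"


# A one-time glitch that nobody can reproduce and that lost no data.
print(triage(Report(True, False, False, replicable=False, severe=False)))
```

Note that a severe bug is accepted even when it cannot be replicated, and a preference is accepted only when every user expresses the same sentiment.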

Problems that can arise from accepting any bug report as a command to fix include:

  • Your reputation can suffer because it appears that you can't fix some of these "unfixable" bugs, especially the unreplicable ones.
  • Your reputation can suffer because you are spending too much time developing new features under the guise of bug fixing, and so you are perceived as having a buggy system rather than a full-featured one.
  • Your bug tracking system gets cluttered with lots of unreplicable bugs, which never get fixed since they can't be replicated.

Saying No is not that hard if you start doing it from the outset, and you communicate clearly with the reporter your reasons for rejecting the bug. Most users will accept an explanation that says, "I'm sure that was very annoying when it happened, but it hasn't happened again, and no one seems to be able to make it happen again. If it happens again, let me know", or "We've never tried running it that way before, so I'm not surprised that it didn't work. I'll make a note of it, and if you want to do this in the future, make sure it gets into the list of requirements". You can't just start rejecting bugs 3 years into a project, though; it will just make users frustrated. You must implement this strategy from the beginning.

Tomorrow: Clarifying the Bug Report

What if you look at that initial bug report and no possibilities jump to mind? It happens for many reasons. Sometimes, the nature of the bug being reported is so bizarre and unlikely that you can't even imagine why it might happen. Other times, the bug report itself is impenetrable and you can't even tell what might be wrong. How do you start the process of formulating a theory? Here are a few techniques for getting the investigation rolling:

  • Tell me a Story - Contact the person reporting the bug (if available) and ask them to tell you the story of what happened leading up to and immediately after the bug occurred in an informal way. Often it will reveal details that the reporter did not think were relevant originally but which turn out to be crucial.
  • Give it a Shot - If the bug report lays out a set of steps to reproduce and you have the ability to attempt them in some fashion, give it a shot. Often there is an initial psychological barrier of doubt where a bug is hard to take seriously because it seems implausible. The effect of making it happen can catalyze your thinking when it forces you to accept the reality of the problem.
  • Ask for Corroboration - This is useful in the case where the bug appears to be in an apparently heavily-used piece of code, in which case it would be surprising that no one else is encountering it. It can help to send out a message to other users asking if they have seen this problem, or if they would be willing to try the steps. The results can tell you one of a few things: the reported activity is actually uncommon (in the case that it is easily replicable, but no one else has hit it); it is very common, and users just haven't been reporting it (happens more than you think); the reporting user's installation might be corrupted or broken (when no one else can replicate except the reporter); or the steps to replicate are slightly or majorly incomplete or incorrect, which happens because people can't quite remember what they did.
  • Ask for Replication - Often a user will report something the first time it happens, which is good. The problem is, often it's an odd one-time occurrence that never happens again. You will tear your hair out trying to replicate and track down something that very, very rarely occurs. If no one can replicate after a few attempts, make a note of it and move on (see tomorrow's post, Saying No).
  • Check the History - Bug tracking will be covered in another post, but assuming you are keeping good records, check for similar bugs. Perhaps the combination of the newly found problem and a previously unsolved bug will give you enough evidence to suddenly find a solution for both.
  • Look for Non-obvious Changes - If all else fails and you suddenly have a repeatable bug and there is no obvious cause, start to look into non-obvious changes. Was some unexpected system maintenance performed? Was a piece of hardware upgraded or swapped out? This can be a somewhat open-ended investigation, but start by looking at the software and hardware that would have the most obvious effect on the failure.
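
To make the "Ask for Corroboration" outcomes concrete, here is a small Python sketch that maps what you hear back from other users onto the four conclusions listed in that item. The function name, its three yes/no inputs, and the order in which they are checked are all assumptions made for illustration; real responses are rarely this clean.

```python
def interpret_corroboration(reporter_can_replicate: bool,
                            others_can_replicate: bool,
                            others_had_seen_it: bool) -> str:
    """Suggest a reading of corroboration results (a heuristic, not a rule)."""
    if others_had_seen_it:
        # Other users already hit it on their own but never filed a report.
        return "common problem that users have not been reporting"
    if others_can_replicate:
        # Easy to trigger on request, yet nobody had run into it before.
        return "reported activity is actually uncommon"
    if reporter_can_replicate:
        # Only the reporter's machine shows the problem.
        return "reporter's installation may be corrupted or broken"
    # Nobody, including the reporter, can make it happen from the written steps.
    return "steps to replicate are probably incomplete or incorrect"


print(interpret_corroboration(reporter_can_replicate=True,
                              others_can_replicate=False,
                              others_had_seen_it=False))
```

The last branch is also the point where "Ask for Replication" takes over: if no one can make it happen after a few attempts, make a note and move on.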

Tomorrow: Saying No to a Bug Report