Google has recently announced a new open source client-server crash reporting library called Airbag. As described in this article, the initial goal is to replace the closed-source TalkBack crash reporting system currently used in Firefox, and then to extend it to other applications. The idea of client-server crash reporting is not new. On Windows, the Dr. Watson "Application X has quit unexpectedly. Send report to Microsoft?" box is nearly ubiquitous, and other operating systems have equivalents, such as Bug Buddy on Linux.
While Google will probably make a few friends with this new reporting system, I have to wonder whether blind crash reporting tools of this kind are really all that valuable. Despite their advantages, they suffer from some serious problems:
- The staggering amount of data that is collected
- The lack of context information
- User annoyance and perception
One of the links in the sidebar points to the Cooperative Bug Isolation Project, which aims to address the first issue by applying statistical techniques to a cluster of reports to help isolate the faulty regions of a program. This summary doesn't really do the project justice, so I recommend checking it out.
As for the second issue, I'm not really sure what can be done. I've seen some debate on the mailing lists of bug reporting tools about whether to ask users what they were doing at the time of the crash. The advantage is that it provides more information; the disadvantage is that it forces an already disgruntled user to stop what they were doing for even longer, which probably makes it more likely that they will send no report at all. The same trade-off applies to pretty much any non-automated data collection technique.
The third issue is that the reporting process can make users even more upset about something that annoyed them in the first place. For example, every so often my Windows computer reboots for no apparent reason, always when I'm not using it (say, in the middle of the night). I only know because when I come in the next day, it is sitting at the login screen instead of where I left it. When I log in, the little "Windows has encountered a serious error!" box pops up and asks if I want to send a report. The thing is, I don't honestly believe they care, and I suspect a lot of people who submit these reports come to the same conclusion. The machine crashes all the time and I keep sending reports, yet nothing ever seems to be done about it. That reinforces the notion that developers have no desire to fix problems.
So what can be done to improve centralized crash reporting?
- Something I've been thinking about for a while is introducing the notion of a "semantic" stack within a program. Essentially, this means adding code markup that gives a series of method calls a semantic tag such as "Opening an Existing Document". Then, in addition to the raw stack trace, the system could dump the semantic stack, giving a better sense of what the user was doing at the time. It might look like: "Clicked Existing Document Menu Item -> Selected 'Foo.txt' from File Chooser -> Crash".
- Using some of the statistical techniques from cooperative bug isolation, let users see how common their particular problem is and, from that, get a sense of its priority. If each crash were associated with a key that could be used to query a database, the user could submit that key to a website showing how their report clusters with others. If it is an outlier, perhaps their own system is to blame; if it is among the most frequently reported, it will likely be fixed.
- Perhaps tie the bug reporting system into the automatic update checking that many systems do (like Windows Automatic Updates or pup on Fedora), and link incoming updates to specific bug report clusters/keys. That way, users know that a particular update will fix a particular issue, which helps them see that developers are paying attention.
Crash reporting as it stands seems to be somewhat useful for developers, but mostly a pain for users. I think that a few small changes could go a long way towards improving its utility for both sides.
