A recent publication by Edward A. Lee at UC Berkeley called "The Problem with Threads" is an interesting look at why multithreaded programming is hard, and more specifically, why the Thread abstraction makes it harder. While I'm not going to disagree with these sentiments in general, I am shocked by the common paralyzing fear of multithreaded programming among otherwise competent, confident programmers.

There seems to be a disconnect between the actual difficulty level of writing a multithreaded system and the perceived difficulty. The reasoning goes like this:

  1. Single-threaded programs are deterministic.
  2. It is possible to exhaustively test only deterministic programs.
  3. Multi-threaded programs are non-deterministic.
  4. Therefore, it is not possible to exhaustively test multi-threaded programs.
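To make step 3 concrete, here is a minimal sketch in Python. It is a toy model written for this illustration, not anything from Lee's paper: two hypothetical threads, "A" and "B", each increment a shared counter as a separate read step and write step, and the code enumerates every possible interleaving of those steps instead of relying on a real scheduler.

```python
from itertools import combinations

def run_schedule(schedule):
    """Replay one interleaving of two threads' steps; return the final counter."""
    counter = 0
    local = {}                   # each thread's private copy of the counter
    step = {"A": 0, "B": 0}      # how many steps each thread has taken so far
    for thread in schedule:
        if step[thread] == 0:    # first step: read the shared counter
            local[thread] = counter
        else:                    # second step: write back the incremented copy
            counter = local[thread] + 1
        step[thread] += 1
    return counter

def all_schedules():
    """Yield every ordering of A's two steps interleaved with B's two steps."""
    for a_slots in combinations(range(4), 2):
        yield tuple("A" if i in a_slots else "B" for i in range(4))

outcomes = {schedule: run_schedule(schedule) for schedule in all_schedules()}

for schedule, result in sorted(outcomes.items()):
    print("".join(schedule), "->", result)
# Two of the six schedules end with counter == 2; the other four lose an update.
```

Even this tiny program has six schedules and two possible results. The number of interleavings grows very quickly with more threads and more steps, which is where the "cannot be exhaustively tested" intuition comes from.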

Take, for example, this passage from Lee's paper:

A part of the Ptolemy Project experiment was to see whether effective software engineering practices could be developed for an academic research setting. We developed a process that included a code maturity rating system (with four levels, red, yellow, green, and blue), design reviews, code reviews, nightly builds, regression tests, and automated code coverage metrics... The reviewers included concurrency experts, not just inexperienced graduate students...We wrote regression tests that achieved 100 percent code coverage. The nightly build and regression tests ran on a two processor SMP machine, which exhibited different thread behavior than the development machines, which all had a single processor. The Ptolemy II system itself began to be widely used, and every use of the system exercised this code. No problems were observed until the code deadlocked on April 26, 2004, four years later.

It is certainly true that our relatively rigorous software engineering practice identified and fixed many concurrency bugs. But the fact that a problem as serious as a deadlock that locked up the system could go undetected for four years despite this practice is alarming. How many more such problems remain? How long do we need to test before we can be sure to have discovered all such problems? Regrettably, I have to conclude that testing may never reveal all the problems in nontrivial multithreaded code.

There are a few elements that bother me here. My primary complaint is the final sentence: testing may never reveal all the problems in nontrivial multithreaded code. It implies that testing may reveal all the problems in nontrivial single-threaded code, which I believe is totally false. Testing will never reveal all the problems in any nontrivial system. Multithreaded programs are no different, but that's no reason to assign any particular menace to them.

My second complaint is this statement: "No problems were observed until the code deadlocked on April 26, 2004, four years later." Seriously? If your system had no observable defects whatsoever during a 4-year active usage period by a large and diverse group of users, then my hat is off to you. I assume what he meant is "No problems were observed that appeared to be related to threading until the code deadlocked". It would be shocking if none of these users found bugs in the UI, or errors related to pointer logic, or any of the dozen other problems that commonly occur in complex systems. I believe that the Ptolemy system was well-written and well-designed, so I am not trying to claim that it is buggy or problematic; I am simply claiming that singling out the 4-year gap before the first thread-related bug was found is misleading. I could argue that multithreading is in fact the least of their worries if the first bug was found 4 years after the release of the system. How many UI bugs were found in that period? 50? 500?

My final complaint is that papers like this are what stoke the fear burning in ordinary programmers: that they will somehow be exposed by the complexity of building a multithreaded system. Perhaps it is that programmers are ultimately control freaks, and somehow, the nondeterminism of multithreading seems more out of control than the ordinary nondeterminism of the ridiculous things human users do to every application. Whatever the reason, I recommend to all programmers that they become familiar with the tools of multithreaded coding in the same way they might learn graphics or databases, and just start writing lots and lots of multithreaded programs. This type of exposure is the only way to beat this phobia that plagues the industry.

I have always wanted to invent a new, commonly used expression or cliche. Most of the things I come up with refer to events or states that don't happen that much, or are just not catchy enough to endure. Here are some of my creations. I apologize if I actually stole them from somewhere else, but I've seen no evidence on the web:

  • Throwing rocks down the well - There is a fable by Aesop called The Crow and the Pitcher. In short, a crow uses stones to slowly raise the level of water in a pitcher until he can drink it. It's supposed to be about ingenuity and perseverance. I always took it to be about recognizing when brute force is your only option. I use this more dramatic sounding variant to describe situations where you have been doing things in a slow, grinding way because there is simply no (known) alternative. For example, if you are trying to build a new line of business for your company, you can't just create it through a flash of insight. You have to find new customers, and convince them of your worth, etc. In short, you can only build a new business by throwing rocks down the well.
  • All '5's and 'yes's - I've purchased two cars from two different car companies. In both cases, the sales and service staff has admonished me, "<car company> will be calling you for a follow up survey. Please, please, please do not give us a rating other than a 5 or a yes; PLEASE, ALL '5's AND 'YES'S!!!", with the idea being that they need a 5 on a scale from 1 to 5 for the numerical questions, and yes on the yes or no questions. On a side note, this notion of customer satisfaction where anything besides perfection is failure is patently ridiculous, leading to the situation described; the company actually gets no feedback at all, but that's a topic for another post. Anyway, I've adapted this expression to describe a situation where you want an honest critique, but you just get superlatives or gladhanding. "I wanted her feedback on the latest design document, but she was all '5's and 'yes's."

So that brings me to my latest creation: the Check Engine Light. My old car had a little orange light that many cars have, which is generally referred to as the Check Engine light. In my limited understanding of cars, I have only a vague notion of what this light is supposed to tell you, especially with a title like "Check Engine". Basically, I was told, it is supposed to come on when one of the multitudes of sensors that track the efficiency and emission level of the engine and exhaust detects an out-of-bounds condition. So when that happens, you should dutifully take it over to the nearest official service shop and have them look at it because, at least on my car, it stores a code that indicates what was wrong.

Here's the problem: if you had the car up at highway speeds, that little light would come on for about 10 minutes and then it would turn off for 10 minutes, over and over. We'd take it to the dealer and every time it would read some random code and they couldn't find anything wrong with that code. One out of the dozens of times we had it in, they found a hose that had a leak and replaced it. Eventually, the theory was that the engine had a timing problem causing it to misfire occasionally at higher speeds, and this would trigger the light. Overall though, this car had very few problems and was an excellent vehicle, so it was just annoying. It got to the point that I began referring to it as the "Everything is Fine" light because it would come on when we were happily cruising along. I would have been worried if it didn't start its off-on cycle.

Ultimately, I understood this light though, because I have built "Check Engine" lights into software many times, and I see it in software that I use. It happens like this: you have a piece of software that is doing a fair number of complex things, and there are hundreds of possible things that can go wrong, so in general, you are better off monitoring a limited set of outputs for problems than you are putting lots of checks in the code itself. The trouble is, you are just not sure what constitutes "normal" values for outputs in every case.

A good example is something like a stuck-thread timer on a server. It's good practice to put in a timer that waits a certain period of time before declaring a thread "stuck". Since threads can become stuck for many reasons, this is much easier than trying to detect every cause: you can just kill the thread and let someone know. The problem is, if you set the threshold too high, then threads will be stuck for a long period of time before being noticed, and if load is heavy, the system might lock up in a cascading effect. If it's too low, then jobs that take a long time might be misclassified as "stuck". So it has to be calibrated to the application, but there will always be cases of this "Check Engine" light coming on for no apparent reason.
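A minimal sketch of such a timer, in Python, might look like the following. The class name, the worker names, and the 30-second threshold are all made up for illustration, and the clock is injectable so the example runs deterministically. Python cannot forcibly kill a thread, so this sketch only flags suspects; what to do about them is left to the caller.

```python
import time

class StuckThreadMonitor:
    """Flags workers whose current job has run longer than a threshold."""

    def __init__(self, threshold_seconds, clock=time.monotonic):
        self.threshold = threshold_seconds
        self.clock = clock       # injectable so the demo below can fake time
        self.started = {}        # worker name -> time its current job began

    def job_started(self, worker):
        self.started[worker] = self.clock()

    def job_finished(self, worker):
        self.started.pop(worker, None)

    def stuck_workers(self):
        """Return the workers that have exceeded the threshold, sorted by name."""
        now = self.clock()
        return sorted(worker for worker, t0 in self.started.items()
                      if now - t0 > self.threshold)

# Demo with a fake clock (hypothetical workers and timings).
fake_now = [0.0]
monitor = StuckThreadMonitor(threshold_seconds=30, clock=lambda: fake_now[0])

monitor.job_started("worker-1")    # a quick request
monitor.job_started("worker-2")    # a legitimately long-running report
fake_now[0] = 10.0
monitor.job_finished("worker-1")
fake_now[0] = 45.0

flagged = monitor.stuck_workers()  # worker-2 is flagged even though it may be healthy
print(flagged)
```

The monitor has no way to tell a long job from a hung one. Raising the threshold reduces false alarms but delays the detection of real hangs, which is exactly the calibration problem described above.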

So into your arsenal of debugging terms,  I hope you will add "Check Engine Light", defined as errors which indicate non-specific, recurring fault conditions, and which may have a reasonable cause, or may be false positives.  Or at least the next time you are poring over a log file with someone and they dismiss a stack trace with a wave of their hand and a reference to "The Check Engine Light", you'll know what they mean.

Now that I've covered the basics, I'd like to talk about using Linux with the Dash. It turned out to be a learning experience in many ways. First of all, Windows Mobile wants you to use Windows, and has a lot of elements that make it especially painful to use with any other operating system. Take, for example, application installation. In my previous dealings with Palm and Symbian OS, it was generally a matter of copying the application onto the memory card from the computer (which could be done from anywhere) and then either installing the application from the card, or copying the application from the card into the proper location on the phone or PDA file system to get it to be recognized. Windows Mobile applications appear to be distributed as a Windows executable (.exe) that you execute on your Windows machine, and it installs the actual application on your PDA when you sync. So essentially, unless you can run Windows software and have ActiveSync (the synchronization app) installed, you are SOL. More on that in a second. Here are the highlights of what I've been able to do thus far:

  • I've gotten hooked up with the excellent SynCE project, which aims to provide tools for working with Windows Mobile devices on Linux. They had been supporting lots of devices, but after a long drought where there were few developers, it looks like attention has been focused on WM5. It also appears that the community is once again picking up steam. With a very active mailing list where I got a very quick response to a question, and new stuff being added every day, it looks like there may be some serious momentum for getting these devices fully supported. I'd like to offer my technical assistance too, so I'm going to dig into the code and see what's happening.
  • Using the info and tools on the SynCE site, I was able to set up my machine so that I get desktop notifications when it's plugged and unplugged, I can list the contents of the device and copy files to and from it, and it appears that synchronization of contacts and calendar from Evolution really wants to work. Unfortunately, while both the computer and the phone think that they are exchanging information, my Evolution contacts just seem to get a bunch of blank entries. I'll post more when I get that sorted out. Word on the mailing list is that Task support is very close to being ready as well, so that would be the big three PIM applications (I don't need email sync since I've just got both the phone and my desktop using the same IMAP account).
  • In terms of the Windows dependency for application installation described earlier, there is a tool called "Orange" on the SynCE site that can extract the .cab files, which can be installed on the phone directly, from some Windows installers, but I believe it is limited to self-extracting installers that were used in the past. It doesn't seem to handle this new breed of "Windows application as Windows Mobile Installer", which is quite frustrating. I've tried a few things such as running the installers under WINE, which works, except that they all want ActiveSync to be installed, and I've tried installing ActiveSync under WINE, which fails at the moment for reasons unknown. I think I'm going to have to temporarily resort to installing applications with my Windows machine (shudder) until I come up with something better.

So, Linux support is moving along.  I'll post again when I start getting things synchronized or if I find a solution to the Windows installer problem.

Besides big projects eating up all my time, I did have one fun side pastime: a brand new T-Mobile Dash smartphone. The device, which is the same as the HTC Excalibur, runs Windows Mobile 5 (or 2005 as it's also called), has a full QWERTY keyboard, and WiFi, so it's a pretty serious little thing. I spent a lot of time fiddling with it, and with associated tools. Here is a quick summary:

Pros:

  • This is well covered in other places, but it just looks cool. It's got a soft textured rubber exterior on the back and it's easy to grip, and a brushed metal look on the front.
  • Lots of useful built-in applications: stuff for viewing Word docs, Windows Media Player (more on that in a second), an IM client, and mobile Outlook.
  • I bought it as a replacement music player after the untimely demise of my iPod, so I went out and bought a big (2GB) microSD card which, quite frankly, I could easily inhale if I weren't careful. It's about the size of a quarter of a postage stamp. Anyway, it comes with earphones that plug into the "micro USB" port on the bottom, and I have been totally shocked (in a good way) at the quality of the audio. I'm not exactly an audiophile, but compared to my old iPod, the bass is much better, and it just has a nice clear, rich sound. That really surprised me. I have been using the built-in music player, but I'm going to try out some of the other players and compare. Overall, I would highly recommend it as an iPod replacement thus far.
  • It's a little slow switching between applications, but the applications themselves run without a hitch.

Cons:

  • There don't seem to be that many applications available for the Windows Smartphone platform. For anyone who has tried to produce an application for the mobile world, you know that each platform has its own set of capabilities and quirks, and so it's time-consuming and often not worth the trouble to develop for multiple platforms. Windows Mobile itself is actually divided into two branches: the smartphone branch and the PocketPC branch. The main philosophical difference is that the smartphone branch does not use a touchscreen, but there are other subtle differences. Therefore, you can't just grab from the huge slate of existing PocketPC applications; there has to be a smartphone version.
  • Windows Mobile, like its big brother on the PC, is kind of uptight. I've spent a lot of time trying to figure out where it wants me to put things and how to get things installed. Part of this is probably because I refuse to use the standard Windows tools and want to make everything work on Linux (more on this tomorrow), so Windows users may have less trouble with this. There is talk that some people have gotten Linux to run on these phones, but I'm not quite that brave yet.
  • I got my email set up on it, which is very cool, but I have it check every 15 minutes or so for new mail and it insists on using the EDGE connection rather than the WiFi, and I have absolutely no idea how to change that. Overall the whole connectivity thing works, but mostly I just end up having the data service up all the time, and I only turn on the WiFi when I am browsing the web.

Tomorrow: Linux Tools for Windows Mobile

My apologies to those looking for new content here.  I am currently embroiled in a set of big projects that is occupying all my time.  Look for a bunch of new posts starting on Thursday:

- These (2.5) weeks in Debugging

- Is Multithreaded Debugging Really that Hard?

- Fun with the Java Media Framework

I apologize to those that have come here recently to look at an individual post and found it unreadable. A helpful comment noted that while the main site looked fine, the individual post page was screwed up. I had forgotten to apply the same trick to the Single Post template (page.php) that I had applied to the Main Template (index.php). It should be fixed now. If this problem shows up anywhere else, please let me know.

For those who are interested, getting a scrollable DIV is fairly straightforward. The key is to set a fixed height, and then set overflow to "auto". Like this:

.scrolldiv {
height: 500px;
overflow: auto;
}

Then, if the content exceeds that height, it will give you a scroll bar automatically. Look for a belated "this week in debugging" with more info later today...

I've tweaked the layout a bit so that you can scroll the posts while keeping the sidebars and header in place. My goal is to ultimately make it so that all the sidebar content fits onto a typical screen so that you can see all the archives, search, etc. without having to scroll the page, and you only have to scroll the post content. Let me know if this is good/bad for you. I'm probably going to keep playing around with this for a few more iterations, so please excuse me in advance if you stop by and things are a little out of whack.

I have not worked at that many different companies, although I've been in the industry for a while and I've seen the day-to-day workings of many workplaces. A recurring theme in many of these workplaces, which always surprises me, is the failure of the information technology to meet the needs of the users on a day-to-day basis. I've often asked why this is such a problem. Is it because it's hard to know what users want? Do the solutions not exist? Are they too expensive to implement? Is it not a priority?

I've concluded that businesses are not serious about IT. By not serious, I mean that they treat it as a nuisance, or that, in general, the least possible effort is expended on providing employees with the tools and technologies that would help them do their jobs. What would taking IT seriously mean?

  • Problems with the computer infrastructure would be treated like an emergency. When users go without email for 4 hours, this is often shrugged off like it was inevitable. What if the heat went off in the middle of the day for 4 hours, or the water stopped running for 4 hours? It would be just about as disruptive to business, but it would be treated like a crisis.
  • Solutions would not just be, "Don't let anyone do anything". Most big businesses seem to have an IT policy which is unconcerned with stomping on users' legitimate needs in the interest of preventing possible problems. For example, consider the prevalence of ridiculously aggressive email filters that don't let any zip file in or out. Yes, it stops a certain class of viruses from spreading, but it gets in the way of transmitting legitimate files probably 10 times for every virus stopped. Why not just not let people use computers at all? Then you'd have no computer problems.
  • Intermittent problems would be dealt with instead of glossed over. Have you ever worked at a company where a critical server went down once a week and instead of just fixing the problem, it was rebooted? That's what I'm talking about.
  • Users' requests would be treated as requirements instead of burdens. Most IT policies are a top-down affair, with some sort of group deciding which capabilities will be offered to which users, and how those services will be delivered. Unfortunately, since the people who make the policy are generally the people who have to implement it, the decisions often lean towards "easy-to-administer" instead of "good-for-users". When users complain about policies that interfere with their work, or offer alternatives for software or hardware to help them do their jobs more effectively, the message is "but that will mean more work for us!". The result is that large gains in user efficiency are sacrificed for much smaller gains in IT efficiency.

That last point is the most important. The IT staff often has a conflict of interest because they want to reduce their burden (which honestly is hard to blame them for, given that IT departments are often dramatically understaffed) but by doing so they create a suboptimal environment for the day-to-day users.  Businesses need to recognize this and provide a better mandate for their IT staff. Besides raising staffing levels and constantly trying to improve the staff, management needs to judge the IT department on user satisfaction. Those two steps would be a serious commitment to  IT, and ultimately, they can help improve morale, and make the business more competitive.