A recent publication by Edward A. Lee at UC Berkeley called "The Problem with Threads" is an interesting look at why multithreaded programming is hard, and more specifically, why the Thread abstraction makes it harder. While I'm not going to disagree with these sentiments in general, I am shocked by the common, paralyzing fear of multithreaded programming among otherwise competent, confident programmers.
There seems to be a disconnect between the actual difficulty level of writing a multithreaded system and the perceived difficulty. The reasoning goes like this:
- Single-threaded programs are deterministic.
- It is possible to exhaustively test only deterministic programs.
- Multi-threaded programs are non-deterministic.
- Therefore, it is not possible to exhaustively test multi-threaded programs.
Take, for example, this passage from Lee's paper:
A part of the Ptolemy Project experiment was to see whether effective software engineering practices could be developed for an academic research setting. We developed a process that included a code maturity rating system (with four levels, red, yellow, green, and blue), design reviews, code reviews, nightly builds, regression tests, and automated code coverage metrics... The reviewers included concurrency experts, not just inexperienced graduate students...We wrote regression tests that achieved 100 percent code coverage. The nightly build and regression tests ran on a two processor SMP machine, which exhibited different thread behavior than the development machines, which all had a single processor. The Ptolemy II system itself began to be widely used, and every use of the system exercised this code. No problems were observed until the code deadlocked on April 26, 2004, four years later.
It is certainly true that our relatively rigorous software engineering practice identified and fixed many concurrency bugs. But the fact that a problem as serious as a deadlock that locked up the system could go undetected for four years despite this practice is alarming. How many more such problems remain? How long do we need test before we can be sure to have discovered all such problems? Regrettably, I have to conclude that testing may never reveal all the problems in nontrivial multithreaded code.
There are a few elements that bother me here. My primary complaint is the final sentence: "testing may never reveal all the problems in nontrivial multithreaded code." It implies that testing may reveal all the problems in nontrivial single-threaded code, which I believe is totally false. Testing will never reveal all the problems in any nontrivial system. Multithreaded programs are no different in that respect, so this limitation is no reason to assign any particular menace to them.
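To make the single-threaded half of that claim concrete, here is a minimal sketch in Python. The function `average` and its test are hypothetical, invented purely for illustration. The one test executes every line of the function, so a coverage tool would report 100 percent line coverage, yet an untested input still crashes it.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers (hypothetical example)."""
    total = 0
    for v in values:
        total += v
    # Latent bug: this divides by zero when the list is empty.
    return total / len(values)


def test_average():
    # This single test runs every line of average(), so line coverage
    # for the function is complete even though the bug is never hit.
    assert average([2, 4, 6]) == 4


test_average()

# The input nobody tested still fails, and there is not a thread in sight.
try:
    average([])
    crashed = False
except ZeroDivisionError:
    crashed = True
```

Coverage measures which lines ran, not which inputs or states were exercised, so it says as little about completeness here as it does for a threaded program.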
My second complaint is this statement: "No problems were observed until the code deadlocked on April 26, 2004, four years later." Seriously? If your system had no observable defects whatsoever during a 4-year active usage period by a large and diverse group of users, then my hat is off to you. I assume what he meant is "No problems were observed that appeared to be related to threading until the code deadlocked". It would be shocking if none of these users found bugs in the UI, or errors related to pointer logic, or any of the dozen other problems that commonly occur in complex systems. I believe that the Ptolemy system was well-written and well-designed, so I am not trying to claim that it is buggy or problematic. I am simply claiming that singling out the 4-year gap before the first thread-related bug was found is misleading. I could argue that multithreading is in fact the least of their worries if the first threading bug was found 4 years after the release of the system. How many UI bugs were found in that period? 50? 500?
My final complaint is that papers like this are what stoke the fear burning in ordinary programmers: that they will somehow be exposed by the complexity of building a multithreaded system. Perhaps it is that programmers are ultimately control freaks, and somehow the nondeterminism of multithreading seems more out of control than the ordinary nondeterminism of the ridiculous things human users do to every application. Whatever the reason, I recommend that all programmers become familiar with the tools of multithreaded coding in the same way they might learn graphics or databases, and just start writing lots and lots of multithreaded programs. This type of exposure is the only way to beat this phobia that plagues the industry.
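As one possible first exercise, here is a small sketch using Python's standard `threading` and `concurrent.futures` modules. The `Counter` class and the `run` function are hypothetical names chosen for this example, not something taken from Lee's paper or from Ptolemy. It shows the most basic tool of the trade: a lock guarding shared state that several threads update.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """A shared counter whose updates are guarded by a lock (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # The read-modify-write below is only safe while the lock is held.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run(workers=8, increments_per_worker=10_000):
    """Increment one shared Counter from several threads; return the total."""
    counter = Counter()

    def work():
        for _ in range(increments_per_worker):
            counter.increment()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return counter.value


result = run()
```

A useful follow-up experiment is to remove the lock from `increment` and rerun the program a number of times. Depending on the interpreter and the timing, the total may or may not come up short, which is a gentle, hands-on introduction to the nondeterminism discussed above.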
