
Software development has two general phases, the building phase and the verification phase. In the building phase, the attitude towards change defaults to "allow". New features can be added, code may be significantly reworked, and designs might be altered, as the goal is to add new capabilities to the system at the expense of increasing the risk of introducing new flaws and problems. Once the building phase has substantially completed, the verification phase commences, where the attitude towards change flips to default "deny". The goal is to achieve a stable system that performs the desired features. Since every change has the potential to introduce errors, changes are only made to correct known flaws.

Obviously, a significant amount of testing is done during the verification phase to assure that the features are implemented as specified, and that no new problems have emerged. However, some amount of testing must be done in the building phase to convince the developer and the team that things are near enough to stable that it is worth switching the change policy to default "deny". The big question is, how much testing is enough? There are significant dangers to insufficient testing, but too much testing also has consequences.

Dangers of Insufficient Testing

  • Wasted time - Blocker bugs found during verification often mean that testers and developers sit around waiting for an issue to get resolved, and that wastes everyone's time.
  • False sense of security - There is a general feeling of excitement and progress when things move to verification, a sense that the application is moving towards completion. If it turns out that major bugs exist and features are not implemented correctly, this can throw the whole schedule into disarray.
  • Blurring of phase line/Vacillation - This is the most dangerous result of all: a general blurring of the line between building and verification, or constant switching back and forth between phases. The end result is an "anything goes" policy towards change, and a general march towards testing hell where no features can be verified because developers feel free to change anything at any time. It starts simply enough: a couple of big bugs make it into the verification phase, so things are moved back into building mode. Then it's back to verification after fixes are put in place, but testing still wasn't done sufficiently, and new bugs are found or the old bugs are rediscovered. Now time is getting tight and the team is encouraged to just get it working, so changes start getting made wherever needed, all at once. At this point, no more guarantees can be made about functionality. Essentially, this is permanent building mode. It is the sad truth that many applications ship directly from the building phase, without ever having a verification phase, and appear to do so successfully. IT WON'T WORK LONG TERM. Code can ship a few times under these conditions, but eventually this attitude will cause the codebase to crumble under the weight of a million changes with no external verification. It is accruing technical debt, and just because no one is watching the balance go up doesn't mean it won't eventually come due.

Given all these dangers, why not do as much testing as possible during the building phase?

Dangers of Excessive Build-Phase Testing

  • False sense of quality - Developers are notoriously bad at testing their own code. It's too easy to fall into the trap of checking the few code paths you've spent the most time on and ignoring some of the other paths that just happen to be the ones most frequently experienced by users. Developers also have a built-in disincentive, which is that testing finds bugs and that means more work for them, so they are more likely to turn a blind eye to a real problem or dismiss it as "unlikely". Like under-testing, over-testing can create a false sense of confidence in the code because so much testing was done, when in fact only a small portion of the code was heavily tested and many new issues are likely to be found during verification.
  • Skill mismatch - Testing is a separate skill from development, and one that requires a very different mindset than development. Some developers are also good testers because they are good at thinking like users, or are good at thinking up diabolical ways to break their own code. Most are not. They can produce quality code but haven't developed that additional skill set. In short, developers are generally not very efficient testers, and asking them to do testing beyond what is necessary to move to verification is inefficient.
  • Perfectionism - If the management and team come down too harshly on developers for letting even small or unlikely bugs through, developers will tend to respond by becoming perfectionists, never letting their code be called "done" until they have gone over every line, every usage scenario, and every potential input. Developers become loath to make changes because it means their code might have a bug and they will get an earful from the team. This result is nearly as bad as the "Blurring of Phase Lines" case, but in a different way. Rather than live in the build phase forever, the verification mindset takes over the entire process. Instead of shipping code of questionable stability because there was never a verification phase, code never gets shipped at all as every developer is too worried about protecting their reputation.

Unfortunately, requiring too much testing can be just as bad as requiring too little. So, returning to the question of how much testing is enough, suppose that during the verification phase, two different bugs are found:

  1. If more than 10000 transactions are rolled back simultaneously, a system crash results.
  2. The application will not launch.

Which of these ought to have been found and fixed in the building phase? Most observers would likely agree that item 1) is probably not the kind of thing that could or should be found in the building phase, and item 2) is most definitely the kind of thing that should have been found. Somewhere in the middle is a great gray area of bugs and issues that some believe should be rooted out during building, and others might consider a verification phase find. Even if the opinion on a particular issue may vary from developer to developer, it is useful to establish some criteria to determine what bugs might indicate that the building phase code has or has not been sufficiently tested. If a bug is found during verification, consider the following:

  1. Recency - Might this bug have existed for weeks, months, or even years and the new work has somehow uncovered it, or was this bug clearly introduced by the latest work? Developers can easily break parts of the code that they do not use frequently, or that have been around for a long time, because they assume that code has gone through many rounds of testing and is hardened. It can be very hard, and it is probably inefficient, in the build phase to thoroughly test enough of the system to find bugs that have stubbornly hidden through many rounds of testing. On the other hand, if a developer adds a feature and it is found to be immediately broken, that's a red flag.
  2. Obviousness - How likely is it that a developer performing a reasonable testing regimen would have come across this bug? In the examples given, the application failing to start is horribly obvious, while a failure that occurs after a large number of concurrently rolled back transactions is much less so (unless this is a known common condition in this application). Obviousness is a value judgment though; what is obvious to one developer might not have been to another, especially those with more rigorous testing practices. Still, a developer whose code frequently has obvious bugs on the main code paths is generally not doing enough testing in the building phase.
  3. Frequency - How often does this bug occur? This goes along with obviousness in that an infrequent bug might not show up in the kind of basic testing done in the build phase, while frequently occurring bugs should be found and rooted out. There are frequently occurring bugs that might not show up in any of the obvious code paths though, for instance, a failure that occurs after the user clicks "Back" on their browser and resubmits a form. This might be something that happens all the time with a web application, but which a developer might not think to test.
  4. Severity - Less important than the other three, but still something to consider. A developer can easily overlook something important to users but which doesn't result in errors or crashes, for example, the tab order of a set of fields, because it's simply harder to test for that kind of thing. On the flip side, a very severe issue such as a system crash that only happens under very infrequent or even apparently non-replicable conditions can go equally unnoticed. However, serious errors that are easily detectable just by walking through the application should be caught during the building phase.

To summarize, bugs that were introduced by the most recent set of changes, are obvious (according to one's own definition of obvious), occur frequently, and are severe should all be stamped out during the build phase. After that, it's up to the manager and the team to decide what level of testing to do before moving to verification, and that will depend on both the experience and skillset of the team, and the type of project being implemented. By choosing this level wisely and frequently adjusting it through feedback, both developers and testers can work efficiently to create a stable, feature-rich product.

I've noticed myself becoming biased against developers who claim to not use any IDE as part of their normal course of development. Am I right to think less of their abilities and productivity because they choose to develop in a text editor, even a powerful one meant for coding?  This bias isn't uniform, as there are cases where an IDE makes more or less sense. I would say that:

  1. My bias is much stronger in the case that the development community for a particular language or technology tends to develop in an IDE, and there are several very powerful and well-thought-of IDEs, see for instance, the Java development community.
  2. My bias is stronger in the case that the developer is/has been doing Enterprise-level development. I have a forthcoming post discussing that concept in detail, but in a nutshell, I am biased against non-IDE developers who are building something where users are paying for the service and expect a high level of robustness and stability over a long period of time.
  3. My bias is much weaker in the case that a user is a recent grad and may not have been exposed to IDE-based work, or is doing research-y type work that might be hindered by an IDE.

The development world seems split on the issue of whether IDEs are a Good Thing or a Bad Thing for developers. For every "10 reasons to use an IDE" article out there, there is a corresponding rebuttal.  This article from a few years back goes into great detail explaining the core difference between Language Mavens and Tool Mavens, where the former tend not to use IDEs since they lack support for the latest and greatest languages and features, while Tool Mavens favor the power of tools over language capabilities to magnify their productivity.  However, in my experience, being a Language Maven, i.e. using cutting-edge languages and features, is something of a luxury in The Real World, where you are interfacing with legacy systems and often have to work with what is already installed somewhere, so your choices are limited.  In that sense, I discount the Language Maven perspective.

In a broader sense, I don't actually understand how developers create systems of any significant scale without using an IDE.   It just seems like you lack so much context that performing a task of any complexity becomes unwieldy.  Speaking from my own experience, I have built systems that would have been basically impossible to create if I had attempted to write them outside of an IDE.  My experience may be unique, since I do primarily Java development, but here is a smattering of things that Eclipse handles for me that emacs (my old "IDE") could not:

  • Build Automatically - Every time I save a file, the entire source tree is rebuilt.  I see any errors or warnings I've created instantaneously.  I actually can't remember what development was like before Build Automatically.  Oh wait, yes I can: I also use Visual Studio regularly.
  • Refactor Change Method/Class/Variable name - Without this capability, names have inertia.  It's a pain to find every reference and update it, so things have names that are out-of-date.  They don't accurately describe what something does because the nature of a class or method has changed over time but no one wants to fix it.  Misspellings are never corrected.  It leads to subtle, but painful, bit rot.  In Eclipse, I can change the name of a class or method or variable as fast as I can type it, and all references are automatically corrected, so I have no barrier to keeping names up-to-date.
  • Run Tests - I can run my JUnit test suite with one click, so I do it all the time.  You can do this without an IDE, but it's more difficult.  When I used emacs, I would keep a shell open and rerun the tests, but it was much harder to process and understand the results, and to remember to keep running the tests, because they were out of sight and out of mind when I was writing code.
  • Ctrl-Click to Jump - I'm guessing emacs could be retrofitted with this one, but I don't think I've ever seen it.  In Eclipse I can hold Ctrl and click a method to jump to the method definition.  When I'm done, I can use the "back" button to go back to where I was editing.  In the newest versions of Eclipse, I can hover on a method to see some or all of the source code in a tooltip without leaving the current file, which frequently eliminates the need to ctrl-click at all, and helps me avoid breaking my current train of thought.
  • Find References -  It's nice to be able to get a caller graph of a method to see exactly where it is being called, and who is calling the calling methods, and so on.  Again, this could be retrofitted, but it involves a deep parse of the code base, something that non-IDE editors tend not to do.

So if I talk to a Java developer who doesn't use an IDE, they are basically telling me that they:

  1. Do a lot of development without recompiling their code, so there can be a big delay between when they introduce a compile error and when it is resolved.
  2. Generally (unless they are extremely disciplined) avoid renaming classes, methods, and variables to match changes in behavior or implementation.
  3. Are unlikely to have a large, frequently run test suite.
  4. Probably spend a lot of time scrolling through files looking for methods instead of writing code.

When I write it all out like that, my bias certainly feels justified.  Those are broad pronouncements about someone's capabilities just because they don't use an IDE, though, so I'd love to hear the counterarguments.  Are there developers out there who shun an IDE and feel like they are gaining other advantages that outweigh the above, or that they have similar capabilities that I'm just not aware of?

I'm always looking for ways to improve the interview process, and I hit upon another interesting technique when reflecting on the qualities I am looking for in a good hire. To quickly review, these generally include:

  • Enthusiasm
  • Flexibility and Open-mindedness in regards to technical approaches and solutions
  • General technical skill
  • An intuitive sense of how things are built

Enthusiasm and Open-mindedness are easy to figure out in the interview process. I have my own thoughts on how to learn about general technical skill, but on the last one, I think the easiest thing is just to ask someone to do what I tend to do on a daily basis: look at things and try to guess how they were probably built. I do this sort of reflexively; whenever I see an application, website, or any other piece of technology that does something clever or unusual, I immediately start trying to deconstruct it in the same way a kid might try to figure out how a magic trick works. Here are two quick examples that came to mind:

jacksonpollock.org - The technology is simple enough. It's clearly a flash application (if you could do this with javascript and CSS alone, that would be 100 times the hack). However, what is the algorithm that generates the paint strokes? It seems to be some combination of mouse speed and motion, as if you have a paint can from which paint is leaking at a constant rate: if you move it quickly you produce a small amount of paint per square inch, but if you leave it somewhere the paint will pile up. There's slightly more to it though, as it must be doing some simple physics modeling or just randomization to produce splatter.
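To make that guess concrete, here is a minimal sketch of the leaking paint can idea in Python. It is only my guess at a plausible model, not the site's actual algorithm: the function name `stroke_widths`, the `flow_rate` and `max_width` parameters, and the sample data are all invented for illustration, and splatter is left out entirely.

```python
import math

def stroke_widths(samples, flow_rate=10.0, max_width=20.0):
    """Guess a stroke width for each segment between mouse samples.

    samples is a list of (time, x, y) tuples. Paint leaks at a constant
    flow_rate, so the paint available for a segment is flow_rate * elapsed
    time, spread along the segment's length. Fast movement gives thin
    strokes; a stationary mouse piles paint up to max_width.
    """
    widths = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        paint = flow_rate * (t1 - t0)
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            # The can is sitting still: paint pools, capped at max_width.
            width = max_width
        else:
            width = min(max_width, paint / length)
        widths.append(width)
    return widths

# Hypothetical samples: a slow move, a pause, then a fast move.
samples = [(0, 0, 0), (1, 10, 0), (2, 10, 0), (3, 110, 0)]
widths = stroke_widths(samples)
print(widths)
```

A candidate who arrives at something like this, and then points out what it fails to explain (the splatter), is showing exactly the kind of reasoning I am looking for.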

DHTML Lemmings - Speaking of 100 times the hack, how about a full arcade game using only DHTML? I haven't really dug down into the code at all, but I can guess at how it is implemented although there are probably dozens of good ways. I would assume that the screen is composed of images rendered into divs or other containers, just like any tile-based game would be. The sprites are probably just images that are moved or altered by javascript by the game logic, and animation is a well-known javascript technique.

These are both small, fun applications that don't have any serious tricks that need to be unraveled, but the exercise is still useful. A lot of what I and others like me do really boils down to reverse engineering. Reverse engineering not in the DMCA violating sense, but in the sense of being given an application and a huge mass of code and needing to quickly understand the code by guessing at how the application is probably built and then looking for evidence that you were right.

Asking a potential hire to demonstrate this skill, with examples geared toward their area of expertise, seems very valuable. A good web designer should be able to tell you how a webpage was probably constructed without looking at very much of the HTML and CSS. A good web application developer should be able to guess at the database structure, the types of objects in the system, and the types of actions being taken without looking at very much of the code. If the candidate makes wildly improbable guesses, or simply refuses to play the game, I would look elsewhere.

On more than one occasion, I've been asked by a client to dig into an issue only to discover that either the feature has already been implemented (or the bug already fixed), or that 99% of what would be necessary to make it happen is already in place.  I pride myself on my honesty in these kinds of dealings, as in general, I find that making someone happy by saying "I only spent an hour on this, because it turns out someone has already fixed it" is worth 100 times what I might have gained by pretending to fix it or refixing it in some alternate way just to bill a bunch of hours.  Not only is the customer pleased to get something more or less for free, but I gain their trust that I am not going to jerk them around in future dealings. This is critically important when you need to deliver bad news, such as when something that appeared to be straightforward will actually take much longer than expected.

This situation comes up in any circumstance where there is significant information disparity between customer and expert.  I belong to Angie's List, and the comments on many of the top-rated providers are of the form "I took my {furnace, car, stove, etc} to 10 repair people who all said that it would need to be totally replaced for $2000.  Then I heard about Super Repair, who came out to my house, took one look at it, hit it with a hammer, and now everything has been working perfectly! A+!"  The recurring theme is that true experts don't feel the need to earn their living by overcharging for full replacements and extra work; they would much rather solve the problem and move on to their steady stream of referrals.  This got me to thinking, if you are in the market for a high-quality contract developer, in the absence of a direct referral, could you plant such a bug or feature among your list of things to do in an effort to check the quality of the contractor?

The idea is fairly simple.  Just add a bug or feature to the list that you know has recently been resolved.  Make sure to point the contractor to the approximate section of the code where you know this fix to reside (since it's very possible in a large code base that even a very competent contractor could look elsewhere for a resolution and miss the fix).  There are a few possible outcomes:

  1. They come back to you and say "it looks like this is already there".  You know you've got a good contractor on your hands. 
  2. They end up billing a couple of days to it and saying nothing about it.  They are either lying or incompetent.  Steer clear.
  3. They bill some time to it, discover that it is already there, and say as much.  This can easily happen when someone starts with a different approach to solving the problem but then eventually finds themselves in the code that already has the fix.  This is the cost of this trick: billing on something that has already been fixed.  You probably still have a decent contractor.

It's also possible that they sniff out the set-up, but if they, like me, have seen this happen often due to simple communication problems on a team, that seems unlikely.  Overall, I would bet that it would be an effective and relatively low-cost way to quickly vet a prospective long-term contractor.  Has anyone out there actually tried this?

Once every few months, I get a case of what I call "The Digital 'Dropsies'", named after the problem that can temporarily affect a previously sure-handed wide receiver. During this affliction, which usually lasts about 24 hours, every technical thing I try to accomplish seems to encounter unexpected hurdles, bizarre failures, simple mistakes that get magnified into serious issues, or all of the above. Nothing that happens is critical. It's not massive data loss or huge setbacks, just a series of breakdowns that make me want to get away from technology for the rest of the day.

I had one of those days yesterday, and I'm just now recovering from the effects. Here are some of the highlights:

  • I was working on an AWS EC2 instance (a virtual server instance hosted with the Amazon Elastic Compute Cloud) and needed a non-standard port open. Instead of just looking up the right way to do it, I figured I would use the usual tools on the Fedora Core 4 image I was connecting with. So I ran the firewall config tool, set the port to open and saved it. Somehow I managed to accidentally wind up with a machine with all ports filtered, so I lost my ability to connect with ssh, which apparently with an EC2 instance is equivalent to "bricking" it. There was not yet any critical data on that machine, but I was forced to get a new instance created and reupload and reconfigure a bunch of stuff.
  • I wanted to try out Ubuntu Gutsy, so I torrented and burned a copy of the live CD. I ran the live version, and then set it to install. Halfway through, I wound up with an I/O error, and so I ran a consistency check on the CD, which turned out to be invalid. I don't think I've ever had that happen before.
  • Instead I decided to install Fedora Core 8, so I kicked off a torrent download. About 10 minutes in it crashed my network. I restarted my router and it completed without issue.
  • I had the aforementioned problem with Windows Tablet edition getting locked up with some kind of DVD driver conflict. For those interested, I did resolve it, but only by running msconfig and disabling all startup items. I'm still not sure which of those processes is to blame, and I don't have time for a binary search right now.

Fortunately, it vanished as quickly as it came. Today I was able to fix the DVD problem, along with resolving a host of other nagging issues. Most importantly, Fedora Core 8 is excellent, even allowing my tablet to suspend/resume without issue out of the box, something no previous release could do. Combined with having my machine set up to effectively develop in two different environments (Linux + Eclipse/Windows + Visual Studio), it was actually worth all the trouble I went through yesterday.

API Taxes

Taxes serve two purposes: the first is to collect revenue, and the second is to help incentivize or disincentivize certain activities. For instance, we (supposedly) keep raising cigarette taxes to encourage people to quit, and we offer mortgage deductions to encourage home ownership. I got to thinking about taxes when tackling some recurring problems in the development of an API intended for broad consumption including:

  • You have methods that perform low-level operations that have to be made available for certain uses, but should generally be avoided.
  • You have methods that have serious performance implications, or serious performance implications depending on the arguments passed in.
  • You have methods that you want to discourage use of for whatever reason; maybe you are thinking about removing it but haven't yet gone to deprecated; maybe you think it is buggy and unreliable for many inputs; maybe there's a better way of doing it that people often overlook.

All these problems boil down to one thing: there's no consistent way to indicate to API users what methods they ought to be calling under what circumstances. You can add comments all day long and people are unlikely to change their behavior. Once they find something that works, they will stick with it, and then complain when performance is terrible or they corrupt a data structure. So I thought of a simple solution: API taxes. It would be easy enough to implement in Java with annotations and/or aspects. Basically, every method can be assigned a tax annotation (the default tax being zero). Methods that you want to encourage the use of have negative taxes, and methods that you want to discourage have positive taxes. The annotation could specify a base cost to call, or different costs for different values of inputs, or even different costs for calling in different contexts (for instance, calling a low-level method from a wrapper API is zero-cost, but calling it directly is high). Then you have a "Tax Collector" aspect that is called for every method annotated with a tax, to keep track of what you owe. As the API designer, I can then give out guidelines for what a reasonable cost is for different usage patterns. If you are exceeding that cost, you need to revisit your calling patterns to see where you are incurring that unnecessary cost. Perhaps you are sending in incorrect inputs, or just failing to cache results that should be cached. Overall, besides the added burden of coming up with tax values during API creation, it seems like an easy way to catch problems early on, when you only have small data sets and test cases to work with.
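Here is a minimal sketch of the idea. Note that it is in Python rather than Java: a decorator stands in for both the tax annotation and the "Tax Collector" aspect. Everything named here (`tax`, `TaxCollector`, `find_by_id`, `scan_all`) and every tax value is hypothetical, and the sketch covers only base and input-dependent taxes, not context-dependent ones.

```python
import functools
from collections import Counter

class TaxCollector:
    """Keeps a running total of the tax owed across all taxed calls."""

    def __init__(self):
        self.total = 0
        self.by_method = Counter()

    def charge(self, name, amount):
        self.total += amount
        self.by_method[name] += amount

collector = TaxCollector()

def tax(base=0, per_call=None):
    """Attach a tax to a function.

    base is the flat cost of every call (negative to encourage use).
    per_call, if given, receives the call's arguments and returns an
    extra, input-dependent cost.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            extra = per_call(*args, **kwargs) if per_call else 0
            collector.charge(func.__name__, base + extra)
            return func(*args, **kwargs)
        return wrapper
    return decorate

@tax(base=-1)  # encouraged: cheap, indexed lookup
def find_by_id(record_id):
    return {"id": record_id}

@tax(base=5, per_call=lambda limit: limit // 100)  # discouraged: full scan
def scan_all(limit):
    return [{"id": i} for i in range(limit)]

find_by_id(7)
scan_all(1000)
print(collector.total, dict(collector.by_method))
```

A test suite could then assert that `collector.total` stays under whatever budget the API designer publishes for that usage pattern, which is where the early warning would come from.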

A well-designed piece of software includes specific places where the designer anticipated the need for alternatives, despite the fact that they may not have had a need for more than one at the time. These are often referred to as change points. This might be allowing for pluggable search or indexing algorithms to be added, alternate representations of a core datatype, or even an entire replacement for some core piece of functionality such as the backend of a system. A system without any change points is certainly bad: it is brittle and inflexible, and the alternatives must be grafted on with lots of if-else statements or #ifdefs. However, too much flexibility can be just as bad.  Furthermore, while many designers are good at putting them in, they tend to be very bad at taking them back out again. We convince ourselves that we Just Might Someday Need That Flexibility, and It Doesn't Hurt To Leave It In. That's just plain wrong. It does hurt, in the form of the cost to maintain the flexibility throughout the design. There is cost in complexity, maintainability, and in many cases, a runtime performance cost.
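As a small illustration of what a change point looks like in code, here is a sketch in Python of a pluggable search algorithm. The class names (`SearchStrategy`, `LinearSearch`, `BinarySearch`, `Catalog`) are invented for this example and do not come from any particular system.

```python
import bisect
from typing import Protocol, Sequence

class SearchStrategy(Protocol):
    """The change point: anything with a matching find() can be plugged in."""

    def find(self, items: Sequence[int], target: int) -> int: ...

class LinearSearch:
    def find(self, items, target):
        for i, item in enumerate(items):
            if item == target:
                return i
        return -1

class BinarySearch:
    """Assumes items are sorted."""

    def find(self, items, target):
        i = bisect.bisect_left(items, target)
        return i if i < len(items) and items[i] == target else -1

class Catalog:
    def __init__(self, items, strategy: SearchStrategy):
        self.items = sorted(items)
        self.strategy = strategy  # callers choose the algorithm here

    def index_of(self, target):
        return self.strategy.find(self.items, target)

print(Catalog([5, 3, 9], LinearSearch()).index_of(9))
print(Catalog([5, 3, 9], BinarySearch()).index_of(9))
```

Even in a toy like this, the flexibility has a price: every new `Catalog` feature has to work through the `SearchStrategy` interface, whether or not a second implementation is ever actually used.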

If you make use of a change point and have implemented several alternatives, then this overhead is usually justifiable. For instance, architecting a system to allow for different databases to be plugged in, when you actually need to be able to run the system on several different types of database, is a big win. Architecting the system to allow for database pluggability when you only ever need to run on MySQL is just a huge waste. Every time you add a new feature that affects that backend, you'll need to consider how that affects your generic database interface instead of just implementing it for the one. I've seen too many projects bog down in this kind of unnecessary flexibility, where everyone spends their time worrying about how to keep the change points tidy instead of writing code that does anything. In describing the emphasis in Extreme Programming on meeting current needs rather than making things unnecessarily general, Kent Beck uses an economic analogy to show that since feature value tends to be highly unknown, and the value of many features holds essentially constant over time, by waiting on all the features that you don't absolutely need, you eliminate the risk of implementing a lot of worthless features. It's not that easy in practice since it is still hard to guess the value of features, especially points of code flexibility, but the point is well-taken.

While you can't recover the costs of implementing a change point in the first place, you can help reduce their ongoing expense through a flexibility audit. The idea is simple: as you develop a piece of software, keep track of where you made an explicit decision to allow for alternatives, and a note about what you expected those alternatives to be. Every few iterations, go over the list with the team and ask:

  • What is this change point costing us?
  • Do we still need this level of flexibility?
  • Could we get the same kind of flexibility in a less costly way?
  • Has one alternative proven superior to others such that that alternative could become the only implementation?
  • Where do we really need flexibility that we don't currently have?
  • Have any change points become significantly more or less costly than last time we audited?

The outcome should be a kind of score for each change point, which is the cost to keep it versus the value to the application. Then purge the ones that aren't worth the trouble. It can be surprisingly cathartic to eliminate code flexibility, as we spend so much time worrying about how to make things more general, and rarely get to make them concrete again. You can make your team happy by eliminating troublesome and never useful bits of flexibility, and your code will be better off for it.
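One way the audit list and score might be recorded is sketched below in Python. The scoring rule (value minus cost on a 1 to 5 scale), the `ChangePoint` fields, and the sample entries are all assumptions made for illustration; a real team would pick whatever scale it can agree on.

```python
from dataclasses import dataclass

@dataclass
class ChangePoint:
    name: str
    expected_alternatives: str  # the note written when the change point was added
    cost: int   # team estimate, 1 (cheap to keep) to 5 (expensive)
    value: int  # team estimate, 1 (rarely useful) to 5 (essential)

    @property
    def score(self) -> int:
        # Positive means the flexibility is paying for itself.
        return self.value - self.cost

def purge_candidates(change_points):
    """Return the change points that cost more than they are worth, worst first."""
    losers = [cp for cp in change_points if cp.score < 0]
    return sorted(losers, key=lambda cp: cp.score)

# Hypothetical audit entries.
audit = [
    ChangePoint("database backend", "MySQL plus two other databases", cost=5, value=1),
    ChangePoint("search strategy", "linear, binary", cost=1, value=4),
    ChangePoint("export format", "CSV, XML", cost=3, value=2),
]

for cp in purge_candidates(audit):
    print(cp.name, cp.score)
```

Keeping the `expected_alternatives` note next to the score makes it easy to see, a few iterations later, whether the alternatives you planned for ever actually showed up.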

Apple recently released an upgrade to OS X, version 10.5 ("Leopard"), to much acclaim. It certainly appears to have some excellent new features, in particular the Time Machine capability that gives you a version control-esque view of your file system, allowing you to see what your system looked like at any previous point in time. As with any major upgrade, there have been a wide variety of reported problems, particularly the "Blue Screen of Death" upgrade issue where the machine refuses to restart until manually rebooted. Whether these problems are more or less than a typical upgrade is for others to decide, but it got me thinking about what happens when I upgrade to a new version of Linux, or try out really any new piece of open-source software: what my expectations are for how smoothly it will go, and what the outcome will be.

To put it bluntly, as a long-time user of Linux on the desktop, I have lost any concept of technical support. I never have a phone number, email address, or in some cases, even a forum to which I can go complain or get assistance with my problems. When I install the latest version of Fedora, or a new version of the kernel, or get a new video driver, I can expect a wireless card not to work, or the system to hang on boot, or my laptop to fail to suspend and resume properly, or worse. While Linux has made huge strides from the early days of endless handholding to even bring up a graphical environment, a few glitches are still par for the course. This isn't a complaint at all; to me, the process of fixing these things is half the fun. I have argued before that users have developed a kind of learned helplessness from Windows (and Mac) hiding information from them, and needing to learn something to get my system fully functional is a great opportunity to acquire new skills.

With Linux, it's not just that there isn't a help line or a dedicated user support forum (and certainly there are many places to get help; learning how to research a technical question is a valuable skill in and of itself), it's that there isn't a company that you can ultimately force to do anything about your problem. If my laptop isn't booting, my choices are to try to fix it myself, try to get a more knowledgeable person interested (which usually works!), or give up and switch to something else. I am owed nothing by the Linux community in the way that Microsoft owes me a working machine because I dropped some amount of money on the license.

I think that notion is what scares most people, and businesses, off of Linux. The arguments that it doesn't run needed applications, or doesn't have the driver support, or is not user-friendly are all slowly eroding, but I think there is still a deep-rooted psychological fear of removing that safety net. Some companies, like Red Hat or Novell, have stepped into that void with support contracts aimed at businesses, but I look at friends and family who all are very smart and capable people, but still pay the "Microsoft tax", and I can understand not wanting to have to give up that feeling of having someone to complain to, or at least blame when something goes wrong. When I take a cross-country car trip, I have a gnawing fear that something strange is going to go wrong and I will be stuck in another state with an inoperable vehicle, and no way to judge the validity of what a mechanic is telling me, nor any faith that their solutions will be the right ones. So I compensate in the same way as people in the computer world: I buy a comprehensive warranty package that makes sure that I can always have my car towed to a dealer who will pick up the tab for the vast majority of problems.

As a "computer mechanic", I feel sorry for computer users who have so little knowledge of what goes on in their system and who are beholden to a faceless software company, in the way that car mechanics probably feel a little sorry for me, but I can still understand it. For developers, though, I have more trouble accepting it. The large upside of living without technical support is that it has given me a level of confidence that I will be able to resolve any problem, and this has spilled over into my work as a developer on a variety of projects. This benefit is yet another reason that I encourage young software engineers to begin using and learning an "unsupported" system like Linux as early in their training as they can. The sooner you start living without technical support, the better you will be able to tackle the problems you encounter in your professional career.

Fantasy football is an incredibly popular pastime for many fans this time of year. I caught the bug a few years back, and over my time as a fantasy owner, I've noticed a lot of similarities between the lessons I've learned managing my fantasy team and those I've learned managing software projects. Here are a few big ones:

  • The mythical man-month - Sure, we've all read the book and pay lip service to the concept that people and time are not fungible, but nothing will make this hit home like a disastrous 1-for-2 player trade that looks fair, but actually cripples your team. Each week, you can only start a limited number of players, and so it is in your best interest to concentrate the talent in as few players as possible. It's easy to get suckered into a trade where you trade a 16 point/week player for two 10 point/week players. Don't do it. Even though the total output is higher, you are getting a terrible deal. It doesn't matter if your bench players score 80 points every week if your starters do just the same.

    This is true of your software team as well: 8 great members are infinitely better than 16 mediocre ones, yet software teams tend to "staff up" to solve hard problems instead of just trying to concentrate talent in a smaller number of team members. It's not as easy to address this problem in real life as in fantasy football, but it's still a worthy goal.

  • Look out for bye weeks - It's all too easy to salivate over the prospect of a great player dropping into your lap at some later round of the draft, only to discover that you will wind up with both of your starters on a bye the same week, which cripples your chances of a win that week. What's the equivalent in software terms? Choosing multiple outstanding team members who have the same holes in their game. Three great coders are useless if they are all terrible at design. You have to learn to put together the right mix of skills.
  • Dance with the one that brung ya - You can easily outsmart yourself if you start worrying about weekly matchups, such as a running back facing a tough run defense, because you will pull good players in favor of mediocre players who look like they have a better chance to succeed. Except in a few rare cases, just start the same set of good players as much as possible, and you will be better off in the long run. In software projects we have a tendency to call on a "specialist" to look at a problem, such as calling in a DBA to try to help us address database performance problems. While a true expert can occasionally offer some insight (much like a bench running back can occasionally put up big numbers on a bad defense), generally you just wind up angering the team that worked hard on the application by discounting their ability to work through it themselves, and then getting generic advice from the expert who knows very, very little about your actual application. "Playing the matchup" by calling in the specialist is a risky play when your own staff already understands the problem, and is hopefully highly skilled themselves.
  • Balance your risk - It's easy to draft a team that is all guys with big potential upside: rookie running backs, breakout stars from the previous year, an up-and-coming defense, a no-name "sleeper" tight end. It's fine to take some of these guys, especially if you have a really solid top of the draft, but you have to balance your risk and take a bunch of established performers that will give you a base 50-60 points every week, and then swap in the gambles that pay off. In the same way, your software team needs to consist of some sobering, get-the-job-done influences so that you meet your deadlines, while also bringing on some lateral thinking, risk-taking staff that will solve problems in novel ways. They'll sometimes take a part of the system off a cliff and have to be reined in, but that's what the established low-risk team members are for. The right balance is critical.
  • Draft a team you like - I can't remember where I picked up this piece of advice, but the idea is simple: rather than (or more likely, in addition to) running elaborate analyses of who the best players are or mock drafts to try to pin down who you might get, simply make a list of players you'd be happy to have on your team, and go after those players. Nothing is worse than following a strict value-based draft spreadsheet to get the "best" player you can at each place in the draft, only to realize that you have a team that you aren't really that excited about.

    A software team is just the same, and the effect is exacerbated because, unlike with your fantasy team, you actually have to interact with the people on it. When interviewing or selecting a team member for a project, ask yourself: "setting aside what I see in the resume, the quality of their sample code, and their demeanor, would I actually be happy with them on my team?" You will be surprised how often with an apparently great candidate the answer to that question is no, and how often with a marginal candidate the answer is yes.
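The arithmetic behind the 1-for-2 trade in the first bullet above is easy to check. This is a minimal sketch: the roster projections and the three starting slots are made-up numbers, and only the 16-for-two-10s trade comes from the text.

```python
def weekly_score(roster: list[int], starting_slots: int) -> int:
    """Only the top `starting_slots` players count toward the weekly score."""
    return sum(sorted(roster, reverse=True)[:starting_slots])


# Hypothetical points-per-week projections, with 3 starting slots.
before = [16, 12, 11, 9, 8]
# Trade away the 16-point player for two 10-point players.
after = [12, 11, 9, 8, 10, 10]

print(sum(before), weekly_score(before, 3))  # 56 39
print(sum(after), weekly_score(after, 3))    # 60 33
```

Total roster output rises from 56 to 60, but the points that actually count drop from 39 to 33, because the extra production lands on the bench.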

When you think of generating a list of tasks for a particular timebox, you generally think of things that are construction-oriented. For instance:

  • Implement Feature X
  • Fix Bug Y
  • Refactor API section Z

which all involve actually constructing or changing a piece of code. However, there are a bunch of other tasks that are process-oriented, which are usually verbally stated in the minutes of meetings or implicitly executed by the management team, but which show up on no formal task list.

For some reason, software managers (and maybe managers in general?) have an aversion to comingling development and process tasks into a single planning system. While some management tasks don't make sense in this way, like ongoing activities or recurring reporting, I've noticed that by keeping everything in one place and by using some simple metatask designations (tasks that involve working with other tasks instead of with code), the schedule becomes much more transparent and manageable.

Here are a few metatask designations that I commonly use:

  • Task Scope Analysis - This is the general heading for tasks whose outcome is simply more knowledge about the scope of another task. There are a few subtypes of this:
    • Increase Estimate Quality - The most generic form of analysis attempts to take a task that has a wildly varying or low-confidence estimate and either narrow the estimate range, or increase your confidence level.
    • Cap Scope - This is a very specific kind of task that involves taking a broadly-defined task (such as a typical Chop) and breaking out a more limited set of well-defined, time-capped tasks (e.g. "this must take no more than 2 hours"). This metatask is useful when you have a strongly fixed amount of time, but nebulous tasks that need to be tightly managed.
    • Go/No-Go - Many development tasks are not required for the success of a project, and as any software manager will tell you, you spend much of your time figuring out what you don't need to do. This metatask makes the work that goes into those decisions explicit.
  • Triage - This idea should be well-known to most managers, but it's rarely explicitly stated. In short, going through some set of tasks and organizing them by priority.
  • Schedule - We generate schedules all the time, but we rarely put "generate the next schedule" as an item on our current schedule. It is implied that by the time one timebox finishes, the next one will be ready to go.
  • Balance/Rebalance - During a timebox, do two things: 1) look at each developer's task list, determine whether they have too much or not enough, and redistribute tasks as necessary; 2) if you are overscheduled, knock some tasks out of the timebox, and if you are underscheduled, bring in some tasks from the on-deck circle.
  • Purge - I don't know if I've ever seen this as an explicit task, but the idea is to go over your task list and just get rid of tasks. This can be for many reasons: it was a dupe of something that's already done, the task is OBE but was never discarded, or the task is so poorly described or understood that it will never get scheduled in its current form.
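To make the "one planning system" idea concrete, here is a minimal sketch of a task list where construction tasks and metatasks share one structure, plus the second half of Balance/Rebalance. The `Task` fields, the priority convention, the hour figures, and the fit-to-capacity rule are all assumptions for illustration, not a prescribed tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    CONSTRUCTION = "construction"
    SCOPE_ANALYSIS = "task scope analysis"
    TRIAGE = "triage"
    SCHEDULE = "schedule"
    REBALANCE = "balance/rebalance"
    PURGE = "purge"


@dataclass
class Task:
    name: str
    hours: float
    priority: int  # assumption: lower number = more important
    kind: Kind = Kind.CONSTRUCTION
    targets: list[str] = field(default_factory=list)  # tasks a metatask works on


def rebalance(timebox: list[Task], on_deck: list[Task], capacity: float) -> None:
    """Fit the timebox to the available hours, in place."""
    timebox.sort(key=lambda t: t.priority)
    # Overscheduled: knock the least important tasks back to the on-deck circle.
    while timebox and sum(t.hours for t in timebox) > capacity:
        on_deck.append(timebox.pop())
    # Underscheduled: bring in the most important on-deck tasks that still fit.
    on_deck.sort(key=lambda t: t.priority)
    for task in list(on_deck):
        if sum(t.hours for t in timebox) + task.hours <= capacity:
            timebox.append(task)
            on_deck.remove(task)


timebox = [
    Task("Implement Feature X", 20, 1),
    Task("Fix Bug Y", 8, 2),
    Task("Refactor API section Z", 16, 3),
    Task("Go/No-Go on Feature W", 2, 2, Kind.SCOPE_ANALYSIS, ["Implement Feature W"]),
]
on_deck = [Task("Purge stale tasks", 1, 4, Kind.PURGE)]

rebalance(timebox, on_deck, capacity=32)
print([t.name for t in timebox])
print([t.name for t in on_deck])
```

With 32 hours available, the refactoring task is bumped to the on-deck circle and the one-hour purge metatask is pulled in, so both kinds of work compete for the same schedule.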

Using metatasks has many benefits. I've already mentioned the transparency aspect. To me, the biggest benefit is the transformation of a timebox outcome from simply "we added these features, and fixed these bugs" to "we added these features, and fixed these bugs, and acquired this knowledge". In many cases, gaining the knowledge of the scope of a task is as valuable as the task itself. Without a metatask, we are forced to schedule the task directly, and treat the analysis of its scope as part of the task. Another benefit is the ability to schedule when information will become available. A great example is the Go/No-Go metatask. In timebox N, you schedule a series of Go/No-Go metatasks on a handful of tasks that you might schedule in timebox N + 1. This guarantees that by the time you need to add the task or tasks to the schedule, the information about which one or ones you plan to do is already available to you.
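The Go/No-Go example can be sketched as a simple gate between timeboxes. The task names and the `plan_next_timebox` helper are hypothetical; the point is only that planning timebox N + 1 consumes decisions that timebox N was scheduled to produce.

```python
# Timebox N: each Go/No-Go metatask records a decision about a candidate task.
go_no_go_results = {
    "Implement Feature W": True,
    "Port to platform Q": False,
    "Rewrite importer": True,
}


def plan_next_timebox(candidates: list[str], decisions: dict[str, bool]) -> list[str]:
    """Keep only candidates with a recorded 'go'; refuse to plan around gaps."""
    undecided = [c for c in candidates if c not in decisions]
    if undecided:
        raise ValueError(f"no Go/No-Go decision yet for: {undecided}")
    return [c for c in candidates if decisions[c]]


# Timebox N + 1: the information is already there when scheduling starts.
print(plan_next_timebox(list(go_no_go_results), go_no_go_results))
```

A candidate with no recorded decision raises an error instead of slipping into the schedule, which is the situation the metatask is meant to prevent.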

With or without metatasks, the question of how much time to spend analyzing a task or set of tasks versus doing them is a constant struggle, and is the topic of the next post.

Next: Task Management III: Task Management as Gambling
