Distance Debugging

One of the hobby horses of this blog is that debugging is hard because of human problems, and not necessarily because of computers. A recent story linked on Slashdot regarding problems with multiple computer failures aboard the International Space Station is a great illustration of that problem, in a very high-stakes debugging situation.
From Space Station: Internal NASA Reports Explain Origins of June Computer Crisis:

During the first days of the computer failure in June, the station's atmosphere control system seized up. The failure also knocked out the autopilot's ability to fire maneuvering thrusters to hold the station steady during the undocking of the space shuttle, which had arrived on 10 June. The terse description in the NASA internal technical report on the crisis, obtained by IEEE Spectrum, put it this way: On 13 June, a complete shutdown of secondary power to all [three] central computer and terminal computer channels occurred, resulting in the loss of capability to control ISS Russian segment systems.

That's really, really bad. But what's worse is the response to the problem:

Russian officials were quick to blame NASA for 'zapping their computers' with 'dirty' 28-volt power from a newly installed solar power wing. Another Russian explanation was that the expanded station structure (the main purpose of the shuttle visit) might be excessively charging up due to its orbital speed through Earth's magnetic field. These were the first of many bad guesses by top Russian program managers that would distract engineers trying to get at the real problem. [My italics added for emphasis]

Now I'm not willing to hold the US team as blameless as this author (a NASA team member) does, despite the clear faults with the Russian systems that were eventually uncovered. However, I agree with his characterization of these explanations as "guesses" and that they were "distracting". Interestingly, in this case, the crew took an action that fixed the problem, although no one could explain why:

The initial assumption was that some external interference, such as noise on the power supply, was responsible for generating false commands inside the computer system. On the assumption that the bad commands were coming from inside a power-monitoring device, the crew bypassed it on two of the three downed computers, using jumper cables. By the time the shuttle undocked on 19 June, the computers began to function normally -- or so it seemed. Replacement parts were quickly manifested on a robot supply ship, while ground engineers wrestled with the fundamental question of cause and effect. Analysis teams still had to determine why the computers failed, and why the jumper cables seemed to fix the problem. More important, they needed to know whether the problem really was fixed, or whether something could again trigger the systemwide crash of the supposedly triply redundant architecture.

In the end, it turns out that they had guessed right about the source of the problem being bad commands from the power-monitoring device, but the why took some explaining. Essentially, a short circuit resulting from corrosion, itself the result of poor design, was causing the power-monitoring device to send a "shut down" message to the computers, which was a "misfeature" designed to protect them from voltage spikes that would damage them. Instead of there being triple redundancy, all three were brought down by the same underlying cause.
This situation first brings to mind one of the rules from Debugging Rules: Stop Thinking and Look! In this case, it turns out that simply opening the case made it immediately clear that things weren't right: the electronics were wet.

In the weeks that followed the crisis and apparent recovery, station commander Fyodor Yurchikhin and his fellow cosmonaut Oleg Kotov disassembled the boxes and cabling and inspected every angle of the hardware, occasionally assisted by their American crewmate, Clayton Anderson. Multiple scopes and probes had failed to find the flaw, but their eyes and fingers eventually did.
The connection pins from the power-monitoring device they'd bypassed earlier, they found, were wet -- and corroded. The final report described the change in appearance of fasteners on one box's connectors and noted the presence of deposits and residue on the housings, and residue and spots on the contact surfaces.

If they had just stopped theorizing and looked inside, it would have been immediately clear that something was very, very wrong. From a Distance Debugging standpoint, the issue was "the danger of the unfamiliar", more technically known as Ingroup Bias. Here the issue was directly attributable to general mistrust between US and Russian astronauts, and the social boundaries that exist between them. However, in any system, we will tend to blame "other" systems, the ones that we didn't build or those of which we have less knowledge. I would guess that in this case, even if the two groups were both US but had developed their technologies independently, the finger-pointing would have been the same.
In any case, thankfully they were able to work around and eventually resolve the problem, but it shows how pervasive debugging problems are. Even the most highly trained astronauts and scientists resorted to the same heuristics and biases that tend to guide the rest of us when we go about fixing problems.

I am constantly admonishing companies to take IT seriously, in the way that they would take accounting or marketing seriously. In fact, I would argue that technology management should be a core part of a modern business curriculum beyond generic "Management of Business Technology" courses. Most businesses I know take only a passing interest in trying to keep tabs on where they are in terms of the applications they are running, their infrastructure, and in particular, having any idea what their employees could use to do their job better. So when I saw this post on TreeHugger about major power savings being found simply by turning totally unused machines off, it seemed like a nice metaphor for corporate IT problems in general. From the article:

In some companies it may be the case that there are many servers that are left on for no good reason, simply to serve legacy applications. Mark Monroe, Sun Microsystem's director of sustainable computing, gave a talk where he explained that they were able to turn off 10% of their servers in this way. He called the phenomenon "data center drift".

He went on to explain that a survey had found two companies had 504 "mystery machines" out of 4,300 servers. When they were turned off they had no actual impact on the companies operations. This is something that should be simple to implement, but can have a dramatic impact on energy bills.

I particularly love the characterization of the phenomenon as 'data center drift', as it transforms incompetence into something that sounds almost natural. In the above example, nearly an eighth of their servers were simply doing nothing that affected a single person if they were turned off. Imagine if the accounting staff did an audit and determined that an eighth of the corporate budget was just being thrown into a giant pit and buried every day, but that they didn't notice because it used to be that the money was funding things, and it slowly shifted to the pit. They would probably be fired on the spot. I somehow doubt any of the IT staff were taken to task for this gross oversight.

When choosing an avenue of attack during the Isolation stage, it's important to keep in mind two different dimensions: probability and testability. Probability is your informed estimate of how likely you believe a particular problem is the cause. Testability is how much time and effort you suspect it will take to rule that particular cause in or out. Ideally, your most probable causes would be the most testable, but it rarely works out so nicely. Ultimately, you have a simple 2x2 matrix of possibilities, and you can place each theory in one of the sectors:

[Figure: Probability vs. Testability matrix]

The first theories to try are of course the Highly Probable-Easily Testable ones, labeled "Ideal" in the matrix. Next is a judgement call. If you have some very Easily Testable theories that are fairly Improbable, it might make sense to take an hour to knock them all out. These are labeled "Why Not?". On the other hand, if you have a Highly Probable theory that might take some effort to test, it could be much more valuable. These are labeled "Necessary Evil". Finally, if you've exhausted all other possibilities, it's time for the Low Probability and Hard to Test theories, labeled "Last Resort". Before you start trying to follow up on these ones, take another long look at what has already been tried, the data you've already collected, and any other information that might help you see a glimmer of another possibility before wasting a lot of effort on an unlikely theory. However, sometimes there's no other choice.
Rather than just assigning a label to each theory, it can also help to simply draw out the matrix above and plot your theories on an X/Y axis, where upper-left is best and lower-right is worst. This can help you easily see both where you ought to start, and how much work you are in for before starting in on an extended isolation exercise.
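The labeling step can be sketched in a few lines of Python. This is only an illustration: the `Theory` class, the 0-to-1 scores, the 0.5 cutoff, and the example theories are all assumptions made up for the sketch, and the choice to put "Why Not?" ahead of "Necessary Evil" is just one way to settle the judgement call described above.

```python
from dataclasses import dataclass


@dataclass
class Theory:
    name: str
    probability: float  # 0.0 (very unlikely) to 1.0 (almost certain), your own estimate
    testability: float  # 0.0 (very hard to test) to 1.0 (trivial to test)


# One possible order of attack; swapping the middle two is equally defensible.
ORDER = ["Ideal", "Why Not?", "Necessary Evil", "Last Resort"]


def quadrant(theory, cutoff=0.5):
    """Place a theory in one of the four sectors of the matrix."""
    likely = theory.probability >= cutoff
    easy = theory.testability >= cutoff
    if likely and easy:
        return "Ideal"
    if easy:
        return "Why Not?"
    if likely:
        return "Necessary Evil"
    return "Last Resort"


def plan(theories, cutoff=0.5):
    """Sort theories by sector, then by probability within a sector."""
    return sorted(
        theories,
        key=lambda t: (ORDER.index(quadrant(t, cutoff)), -t.probability),
    )


if __name__ == "__main__":
    theories = [
        Theory("race condition in cache", 0.8, 0.2),
        Theory("stale config file", 0.7, 0.9),
        Theory("bad network cable", 0.1, 0.9),
        Theory("compiler bug", 0.05, 0.1),
    ]
    for t in plan(theories):
        print(f"{quadrant(t):15} {t.name}")
```

Even with rough scores, printing the sorted list gives a quick sense of how many cheap checks remain before the expensive ones begin.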

Recently, Steve Ballmer made some comments regarding social networking that were widely ridiculed (and probably more appropriately, labeled as self-serving since Microsoft has been looking to acquire a stake in Facebook and would be happy to drive down the price):

"I think these things [social networks] are going to have some legs, and yet there's a faddishness, a faddish nature about anything that basically appeals to younger people," Mr. Ballmer told Times Online yesterday.

On his blog, Marc Andreessen wrote a response making use of a common conceit: applying comments about a modern phenomenon to historical phenomena in kind of a reductio ad absurdum argument. A brief excerpt:

"I think these things [televisions] are going to have some legs, and yet there's a faddishness, a faddish nature about anything that basically appeals to younger people."

"I think these things [hip hop music] are going to have some legs, and yet there's a faddishness, a faddish nature about anything that basically appeals to younger people."

"I think these things [mobile phones] are going to have some legs, and yet there's a faddishness, a faddish nature about anything that basically appeals to younger people."

Now, I assume his point is that so-called disruptive technologies are often dismissed at the time as a fad and, quite frankly, it can be very hard to tell a fad from something truly transformative. This brings up a larger question though: is social networking more like television, hip-hop, and mobile phones, or is it more like video arcades, pocket bikes, and the "Rachel" haircut? What features of a trend might we use to determine this?

I started considering this question recently because I have a guilty secret: I don't really get mainstream social networking sites even though I make heavy use of technology in general. I certainly understand why teenagers and college students use them, and I take part in lots of implicit social networks via listservs and other online communities, and I've even made use of some sites like last.fm, but I don't see why I would care to get seriously involved in Facebook or MySpace (Andreessen's company, Ning, makes more sense to me for reasons that will become clear in a moment).

My issue is fundamentally about what one might call personal power. I remember reading one of the Carlos Castaneda books when I was younger, and Don Juan at one point counsels the narrator to cut off his ties with friends and colleagues back home in an effort to erase his personal history and thereby increase his personal power, which is diluted by his past and relationships (I'm strongly paraphrasing; I read this book probably 15 years ago, but this part stuck with me for some reason). However you feel about the mystical mumbo-jumbo in these books, it's hard to not see the kernel of truth in this idea: you can increase your perceived status simply by limiting others' access to you.

This brings me back to the dilemma of social networking sites, and a rule that I've just made up that I'll call the inverse social power rule. Simply put, the likelihood of finding a contact on one of these sites is inversely proportional to the quality of the contact. The problem is that people who have power have no need of additional access paths to themselves, while those who are trying to rise in the ranks are much more willing to be "promiscuous" in allowing social access in the hopes of making a connection with someone of higher status. Blogging uses the same logic: I divulge information about what I'm doing and thinking in the hopes that I might attract smarter, more interesting people to say or think nice things about me, and maybe give me some money to do some work for them (hint, hint).

So if I am a high-school or college student, and therefore I am generally on the weak side of the power equation in most relationships, social networking makes sense. If I'm a CEO or a celebrity, I want to limit my access as much as possible and avoid social networking like the plague, since that's just giving the milk away for free. If I'm somewhere in the middle, I want to be more like the CEO, not more like the college student, so I want to make extremely judicious use of these types of sites lest I give the appearance of a weaker social status.

You'll note that Marc Andreessen does have a page on MySpace, but he hasn't logged in in nearly 2 years, and has 0 friends, revealing basically nothing about himself. Now that's a MySpace page for a CEO. As far as I can tell, there is no real Steve Ballmer listed there, although there are at least 2 parody profiles. It's pretty much the same story on Facebook, although Marc does list his companies. I somehow doubt he would respond to a poke though.

And therein lies my problem with social networking sites, and why I tend to agree a bit more with Ballmer than with Andreessen on this one, although I think Marc has a very different perspective because Ning is for building sites that allow for topical rather than status-oriented social connections, which breaks my power rule completely. I honestly believe that at this point in my life and career, I am better served by avoiding them than by joining them, and I wonder how many upwardly mobile 20-somethings are going to be frantically deleting their profiles from these sites when they realize that they have moved to the strong side of the power equation.

In conclusion, I have no doubt that entrepreneurs can profit from social networking sites since they do have benefit to the ones that need them. However, I ultimately believe that they will not have transformative power because unlike a technology such as a cellphone, which has become essential as a tool for increasing one's social status and only becomes more vital to the owner over time, social networking sites will continue to lose members just as they are becoming truly valuable, draining their ability to make a significant cultural impact.

Fantasy football is an incredibly popular pastime for many fans this time of year. I caught the bug a few years back, and over my time as a fantasy owner, I've noticed a lot of similarities between the lessons I've learned managing my fantasy team, and those I've learned managing software projects. Here are a few big ones:

  • The mythical man-month - Sure, we've all read the book and pay lip service to the concept that people and time are not fungible, but nothing will make this hit home like a disastrous 1-for-2 player trade that looks fair, but actually cripples your team. Each week, you can only start a limited number of players, and so it is in your best interest to concentrate the talent in as few players as possible. It's easy to get suckered into a trade where you trade a 16-point/week player for two 10-point/week players. Don't do it. Even though the total output is higher, you are getting a terrible deal. It doesn't matter if your bench players score 80 points every week if your starters do just the same.

    This is true of your software team as well: 8 great members are infinitely better than 16 mediocre ones, yet software teams tend to "staff up" to solve hard problems instead of just trying to concentrate talent in a smaller number of team members. It's not as easy to address this problem in real life as in fantasy football, but it's still a worthy goal.

  • Look out for bye weeks - It's all too easy to salivate over the prospect of a great player dropping into your lap at some later round of the draft, only to discover that you will wind up with both of your starters on a bye the same week, which cripples your chances of a win that week. What's the equivalent in software terms? Choosing multiple outstanding team members who have the same holes in their game. Three great coders are useless if they are all terrible at design. You have to learn to put together the right mix of skills.
  • Dance with the one that brung ya - You can easily outsmart yourself if you start worrying about weekly matchups, such as a running back facing a tough run defense, because you will pull good players in favor of mediocre players who look like they have a better chance to succeed. Except in a few rare cases, just start the same set of good players as much as possible, and you will be better off in the long run. In software projects we have a tendency to call on a "specialist" to look at a problem, such as calling in a DBA to try to help us address database performance problems. While a true expert can occasionally offer some insight (much like a bench running back can occasionally put up big numbers on a bad defense), generally you just wind up angering the team that worked hard on the application by discounting their ability to work through it themselves, and then getting generic advice from the expert who knows very, very little about your actual application. "Playing the matchup" by calling in the specialist is a risky play when your own staff already understands the problem, and is hopefully highly skilled themselves.
  • Balance your risk - It's easy to draft a team that is all guys with big potential upside: rookie running backs, breakout stars from the previous year, an up-and-coming defense, a no-name "sleeper" tight end. It's fine to take some of these guys, especially if you have a really solid top of the draft, but you have to balance your risk and take a bunch of established performers that will give you a base 50-60 points every week, and then swap in the gambles that pay off. In the same way, your software team needs to consist of some sobering, get-the-job-done influences so that you meet your deadlines, while also bringing on some lateral thinking, risk-taking staff that will solve problems in novel ways. They'll sometimes take a part of the system off a cliff and have to be reined in, but that's what the established low-risk team members are for. The right balance is critical.
  • Draft a team you like - I can't remember where I picked up this piece of advice, but the idea is simple: rather than (or more likely, in addition to) running elaborate analyses of who the best players are or mock drafts to try to pin down who you might get, simply make a list of players you'd be happy to have on your team, and go after those players. Nothing is worse than following a strict value-based draft spreadsheet to get the "best" player you can at each place in the draft, only to realize that you have a team that you aren't really that excited about.

    A software team is just the same, and the effect is exacerbated because you don't have to actually interact with the people on your fantasy team. When interviewing or selecting team members for a project, ask yourself: "setting aside what I see in the resume, the quality of their sample code, and their demeanor, would I actually be happy with them on my team?" You will be surprised how often with an apparently great candidate the answer to that question is no, and how often with a marginal candidate the answer is yes.

When people discuss the creation of software, they make a clean separation between the tools that one can use, such as an IDE, and the techniques one can use, such as Object-Oriented Programming. In general, good tools can offer efficiencies, analysis, or insight that would be missed without them, and they might even help to reinforce the use of certain techniques, such as Eclipse's support for refactoring as a core piece of functionality.

For whatever reason, people have trouble making this same distinction with debugging. When people talk about it, they tend to refer to the tools they use for the job rather than their theoretical approach to problem solving. Case in point: a post about debugging on the Meebo Blog. It didn't contain any misinformation, but it only mentioned the tools that they use: gdb, core dumps, strace, etc. It left me wondering: so you open up the core dump with gdb, then what? How do you search through the massive dump to find the information you need? Do you keep a detailed history of common errors? Do you generally have an idea of what is wrong before you start?

The notion that the availability or knowledge of certain tools shapes or empowers your thinking has been called "The Fingertip Effect" by Dr. David Perkins, referring to the concept of having resources "at your fingertips". He used it to formulate an argument in the context of educational technologies, where the fingertip effect is generally taken as a given. Parents frequently demand that a certain number of computers, laptops, PDAs, technology X be present in every classroom or within the school in order to maximize student opportunities. The implicit premise is that by making these resources available "at students' fingertips" they will suddenly be able to use them in an appropriate way, and they will naturally benefit.

Most research seems to suggest that this is not true (and Dr. Perkins coined the term for the purposes of investigating, and eventually discrediting, the idea). The problem is that students widely vary in their ability to make use of new technologies without corresponding instruction in their affordances (now I'm really pouring on the ed jargon). So the fingertip effect isn't necessarily wrong, it's just not a good educational strategy; technology has to be combined with an effective instructional program so that students actually understand what it can do for them, how it does it, and why.

The same can be said for debugging. It isn't that a focus on tools such as gdb, strace, or even Delta Debugging is wrong, it's that if you don't have a framework for understanding how to isolate and fix bugs, tools won't help you. In the same way, having Microsoft Word will not make a poor writer into a good one, even though it will allow a good writer the opportunity to become better since they can spend less time on copyediting and more time on style and content. In retrospect, the idea that tools cannot be substituted for techniques seems obvious, yet parents continue to call for more computers without calling for more computer-savvy instructors, and the debugging field continues to offer a wider variety of tools without regard for instructing people in their use. My hope is that the material on this blog helps people make better use of the tools that are already at their fingertips.

From Engadget yesterday:

Merely three days after hearing of one user's run-in with Apple over his unlocked iPhone, the company has released an official statement warning users that "unauthorized iPhone unlocking programs" could cause "irreparable damage to the iPhone's software." Furthermore, the firm stated that these apps could result in the handset becoming "permanently inoperable when a future Apple-supplied iPhone software update is installed" -- you know, like the one coming "later this week" that includes the iTunes WiFi Music Store.

The team that developed the unlocking software offered a response today:

Based on download numbers, the iPhone Dev Team believes that, worldwide, several hundred thousand people have unlocked their iPhones. That number continues growing every day. The removal of the lock, a bug, was a major step forward in the iPhone development. It made the iPhone free and useful to anyone, not only to those in certain countries.

Apple now announces that the next firmware update, expected later this week, will possibly break the handset of all of us free users in the World. It speaks of "damage" done to the firmware and "unauthorized access" to our own property, The removal of those firmware problems, which were built in in favor for AT&T, does not cause "damage" as they want to make us believe.

We will provide you with a tool in the next week which will be able to recover your nck counter and seczones and even enables you to restore your phone to a Factory-like state.

Apple has taken a lot of flak for this statement, with people arguing that they are fearmongering with their claim that the unlock will break their phone when the firmware update comes out, and by making such an update in the first place. My question is, why would Apple go to the trouble of telling people about the problem ahead of time? Why not just release the update and break all the hacked phones, thereby making people wary of unlocking things at all?

To me, this whole exchange seems like some tricky subtextual communication between Apple and the iPhone hackers to keep customers, who want unlocked iPhones and give Apple $, and vendors, who want locked iPhones and give Apple $, happy while still allowing everyone to go about their business:

Apple: I'm going to give you until the count of 3 to put your iPhone back the way you found it! I'd hate to have to take away your privileges altogether (wink, wink)!

iPhone Dev Team: Okay, okay, you win. I'll put the iPhone back the way it was. You can roll out that upgrade whenever you'd like (wink, wink).

I guess we'll see what happens the next time around...

When called in to fix a computer problem at someone else's location, what should you bring with you to increase the chances of success? Certainly, it helps to know something about the problem before walking in the door, but having a generic set of tools that will assist in a wide variety of situations is critical. Here is a list of what I put in my toolbox.

Hardware:

  • Laptop - You need your own computer since there may not be a working one present. Dual boot Linux/Windows, although you can stay in Linux, generally speaking. Better yet, run Linux and have a few virtual machines with different OSes ready to go.
  • Wireless (cellular) modem card - This is not necessary, but I like to have a high-speed, dedicated internet connection that keeps me from having to rely on the customer environment, especially if the network is the problem.
  • Wireless (802.11x) card - For working with/sniffing wireless networks
  • Hub - You can use a switch that has a sniffer port that allows you to see all the traffic, but a 4-port hub makes it easy to insert yourself into the network quickly and unobtrusively
  • Ethernet cables - Probably at least two standard and one crossover, just in case.
  • Computer screwdrivers - in case you need to open the case and look for evidence of physical problems, or pop out flaky hardware.
  • Thumb drive with all the software described below for as many OSes as they are supported on - Since a 2GB thumbdrive costs so little, this is easy, and allows you to quickly copy analysis and dev tools onto other machines.

Software

  • Standard network and communication utilities - ping, traceroute, ssh, etc - helpful for checking the status of machines and for answering questions about networks
  • Standard network service daemons - dhcpd, named, etc - helpful for allowing your laptop to pose as various services.
  • Advanced network utilities - nmap, wireshark - for really looking at what is coming over the network, and analyzing hosts
  • Standard gcc toolchain - don't leave home without it
  • A serious IDE such as Eclipse - I like Eclipse because I can use it to quickly examine Java, Ruby, C++, PHP, etc with all the plugins I've installed over the years. If you have a Windows partition/VM and some extra cash, Visual Studio can help too.
  • Linux rescue CD from your favorite flavor - That way you can boot into an OS where you can do whatever you want, including inspect partition tables, mount various drives to access content without needing passwords, and generally, take the machine's possibly broken configuration out of the equation to separate out hardware and software issues.
  • Password crackers and recovery tools - This one can be ethically questionable, but when you need some files that some developer left in their account and they've left the country for a 1 month vacation, a customer will be begging you to break them out. I recommend something to recover BIOS passwords, a zip password file cracker, and if you want to lug a big drive around, a generic password cracker that uses rainbow tables to break systems protected by weak hash-based encryption.

That's a good starting list. I may post again if I add other tools that may be of interest, and feel free to add your own suggestions in the comments.

Technically speaking, blogs are supposed to be places where you talk about things you've seen or read and comment on them, at least in one connotation of the word, which I rarely do here. However, I came across this article discussing the origins and popularity of the "brain teaser" interview/recruiting format, and it seemed like a good way to return to my blogging roots.

I have previously railed against brainteasers as I feel that they have little or nothing to do with what software engineers actually do. This article, though, in addition to the classic brain teasers, also discusses the use of estimation problems such as "How much would you charge to wash all the windows in Seattle?". The article failed to mention the historical context for these questions as Fermi Problems, but it offers a similar justification, via a former Google employee:

Such questions are more relevant to a high-tech job interview than you might think. "Employers want to see if you can make an estimate in the ballpark, within an order of magnitude," says Mark Jen, a former Google employee who is now a program manager at Tagged.

and it goes on to posit:

Coders are constantly making educated guesses rather than calculating exact answers, so a good interview should probe how well a candidate handles such estimates. That's why Amazon.com interviewers, for example, have been known to ask job candidates to guess how many gas stations there are in the United States or to ballpark that bill for washing all of Seattle's windows.

I agree that we often need to make educated guesses instead of direct calculations, so it certainly is a more authentic assessment than a brainteaser (in my opinion). The bigger question is: is this a useful skill to test in an interview setting? I believe that if the original intent of the Fermi Problem is actually observed, then the answer is yes. To me, that means two things:

  1. Ignoring the actual estimate, and the values plugged into it, in favor of looking at which inputs were chosen - In the classic Fermi Problem "How many piano tuners are there in Chicago?" the fact that someone wildly under or overestimates the population of the city is less important than the fact that they chose that as an input.
  2. Paying attention to where they feel they need more information, and probing about how they might obtain it.

In addition, I would suggest choosing more domain-specific estimation problems. Instead of the cost to wash the windows of Seattle, ask them to estimate the number of bytes that are actually transmitted across the network to load a 100K HTML file in the browser, or the amount of power needed to keep a 1000 piece server farm at 90 degrees for an hour. In addition to understanding their estimation process, you will be able to see their mental model for the realization of these operations in the real world, which is the part I personally feel is the most critical.
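To make the first of those domain-specific questions concrete, here is one way a candidate's reasoning might look if written down as Python. Every number below is an assumption picked for the sketch (a single plain HTTP request over TCP/IPv4, no TLS, no compression, no images or scripts, Ethernet-sized packets), and the function name `estimate_wire_bytes` is hypothetical. In the spirit of a Fermi Problem, the interesting part is the list of inputs, not the final figure.

```python
import math


def estimate_wire_bytes(
    payload_bytes=100 * 1024,    # the "100K" HTML file itself
    http_header_bytes=500,       # assumed size of the request, and of the response headers
    mss=1460,                    # assumed TCP payload per packet on Ethernet
    header_bytes_per_packet=40,  # IPv4 + TCP headers without options
    control_packets=7,           # assumed connection setup and teardown packets
    segments_per_ack=2,          # assumed one ACK for every two data segments
):
    """Ballpark the bytes crossing the network to fetch one HTML file."""
    response_bytes = payload_bytes + http_header_bytes
    data_segments = math.ceil(response_bytes / mss)
    ack_packets = math.ceil(data_segments / segments_per_ack)
    # One packet for the request, plus data, ACKs, and setup/teardown.
    total_packets = 1 + data_segments + ack_packets + control_packets
    total_bytes = (
        http_header_bytes
        + response_bytes
        + total_packets * header_bytes_per_packet
    )
    return {
        "data_segments": data_segments,
        "total_packets": total_packets,
        "total_bytes": total_bytes,
        "overhead_ratio": total_bytes / payload_bytes,
    }


if __name__ == "__main__":
    for name, value in estimate_wire_bytes().items():
        print(f"{name}: {value}")
```

A candidate who names inputs like these, and then asks which of them matter (compression? TLS? retransmits?), is showing exactly the mental model the question is meant to expose.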

Isolation of a problem can be a tricky thing. If you have the luxury of being able to actually change the system and observe the effects of your changes, an easy way to approach the problem is to start from the known bad system run, turn that into a test case that you can repeatedly run, and then start lopping away stuff that you either assume is not related to the problem, or that you have already verified is not related. You can tell how well you understand the problem by how good your choices are in regard to what you slice off. If you repeatedly make cuts that either break the application or that make the problem unexpectedly go away, then you should probably do some more digging before you do this kind of isolation.

Cutting away can take two forms: either you can simply skip a step that seems irrelevant, or you can replace something with a mock object (a topic that is too broad for this post) that gives you back some necessary data. Eventually, the goal is to cut away everything that is not relevant to illustrating the problem. This is usually some setup to get the right data structures, some actions to make them change in the correct way, and a comparison with the correct result (which will fail).
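The "skip a step" form of cutting can be sketched as a small loop. This is a naive one-step-at-a-time reduction, not the full Delta Debugging algorithm, and it assumes you can express the failing run as a list of steps plus a function (here the hypothetical `still_fails`) that replays a subset and reports whether the problem is still visible.

```python
def minimize(steps, still_fails):
    """Drop steps one at a time, keeping a cut only if the problem remains."""
    kept = list(steps)
    index = 0
    while index < len(kept):
        candidate = kept[:index] + kept[index + 1:]
        if still_fails(candidate):
            # The step was irrelevant to the problem: keep it cut away.
            kept = candidate
        else:
            # The problem vanished (or the run broke): the step is needed.
            index += 1
    return kept


if __name__ == "__main__":
    # Stand-in for re-running the real test case on a subset of steps.
    def still_fails(subset):
        return "query db" in subset and "render body" in subset

    steps = ["load header", "query db", "render body", "load footer"]
    print(minimize(steps, still_fails))
```

In practice `still_fails` is the expensive part, since it means re-running the test case, which is why good guesses about what to cut first save so much time.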

For instance, let's say that you have a PHP script that is generating bad output. The page you get back lays out in a completely unexpected way, and eyeballing the HTML source, it's clear that you are not generating the expected HTML. First of all, debugging the problem is hindered by the fact that if you are omitting tags or adding strange attributes to things, the browser is covering up a lot of that in its attempt to be fault-tolerant. So you can cut away the browser and replace it with an HTML DOM parser that verifies the output is what you expect.
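As a rough illustration of replacing the browser with something less forgiving, the sketch below uses Python's standard `html.parser` to flag tags that are never closed or that close out of order. It is deliberately stricter than real HTML, which allows some end tags to be omitted, and the class and function names are made up for this example. A real test would more likely compare the parsed structure against the structure you expect.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Records start tags on a stack and notes any mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_markup(html):
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]


if __name__ == "__main__":
    print(check_markup("<div><p>fine</p></div>"))
    print(check_markup("<div><p>missing close</div>"))
```

Unlike the browser, this check does not quietly repair the page, so a missing tag shows up as a failed test instead of a mysterious layout.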

Next, all the pages have headers and footers and sidebars, which all lay out fine, as they do on every other page. So you cut the includes and such that create those parts. The DOM is still not what you would expect. Finally, you cut away some of the calls that insert database data into the layout and replace them with hardcoded text that approximates what you think is coming back from the database. Suddenly, the DOM is fine. You hadn't anticipated that the database data was the issue, but clearly it's relevant and you've cut too much.

So you then cut HTML layout generation from the testing completely, and just write tests to look at the values coming back from the database. It becomes clear that HTML has crept into certain fields unexpectedly and that is throwing off the layout. At this point, you have isolated the problem, and you have a simple, quick test that only does the thing necessary to illustrate the problem. This will greatly aid your Repair process, since it will be easy to verify any attempted fixes, and you can easily run the whole real world test case again when you get to the Validation stage.
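The final, isolated test might look something like the following sketch. The rows are hardcoded stand-ins for whatever your real query returns, the field names are invented, and the regular expression is only a rough heuristic for "something that looks like a tag", not a real HTML detector.

```python
import re

# Rough heuristic: "<" or "</" followed immediately by a letter, up to ">".
TAG_PATTERN = re.compile(r"</?[a-zA-Z][^>]*>")


def fields_with_markup(rows):
    """Return (row number, field name) pairs whose text contains a tag."""
    offenders = []
    for row_number, row in enumerate(rows):
        for field, value in row.items():
            if isinstance(value, str) and TAG_PATTERN.search(value):
                offenders.append((row_number, field))
    return offenders


if __name__ == "__main__":
    # Hypothetical rows standing in for the database results.
    rows = [
        {"title": "Welcome", "summary": "Prices where a < b apply"},
        {"title": "<b>Sale</b>", "summary": "Everything must go"},
    ]
    print(fields_with_markup(rows))
```

Because the check takes plain rows, the same function can be pointed at the live query during Repair and at the full page's data again during Validation.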
