My grandmother has an old laptop (going on 5 years) that has started to have a lot of odd problems. Of course, she bought the long warranty, so it's the manufacturer's problem for the most part. I haven't been able to convince her that all the time she spends fretting over it, sitting on the phone with them, trying to fix the problem herself, calling me for advice, etc. is far more costly than running out to Best Buy or wherever and picking up a new laptop, which can now easily be had for less than $500. It's been another interesting exercise in Distance Debugging, both with her on the phone and in listening to the results of her latest conversation with the manufacturer. Some highlights:

  • The initial problem, as she described it, was that she saw a flash of light and then smoke came out of the side of it. I don't know what really happened, because there is very little inside a modern computer that dies in such a dramatic fashion.
  • As it was under warranty, she sent it off to the manufacturer after I backed up the hard drive. They claimed that the screen was cracked (which it most certainly was not when it left here) and so they were sending it back.
  • When she got it back, they had replaced the screen. So I guess they invented/discovered and then fixed a problem that, as far as we know, was unrelated to the original one. It was some of the worst distance debugging I've ever seen.
  • When she got it back, she started to have a weird problem: clicking links in applications no longer opened a web browser properly. While this had to be unrelated, she spent hours on the phone with their tech support, who were completely stumped. Fortunately, I knew that she had tried to install Firefox, and I guessed (correctly) that it had screwed up her "Applications to Use" settings for HTML MIME types and http:// URLs. I fixed it in about 30 seconds, much to her amazement.
  • A few days ago, the screen went black while she was working on it. The lights come on, it gets hot, etc. So something is happening, but it's impossible to tell what is going on. I assume that the brand-new screen they quietly installed failed, but my grandmother insisted that it was getting hotter than it used to. However, there is no proof either way.

To me, the basic principle being illustrated again and again here is to look at what has changed. In the case of the broken links, I first ascertained what, if anything, was different, and only after I learned that a new browser had been installed was the likely problem clear. On the second issue, I assume that the new screen is to blame, both because it was the thing that changed and because of the famous "bathtub" failure curve that electronic components tend to follow: failures cluster early in a part's life and again at the end of it. While the heat issue might actually have something to do with the problem, it has to be set aside because we have no hard data about whether the computer is actually hotter than it used to be. I've wasted so much time in my debugging life because I decided that some error or condition I had noticed was responsible for a problem when, in fact, it had been there all along and so probably had little or nothing to do with it. In the long run, this whole process showed me again why I don't get warranties on anything. Companies are so bad at debugging problems that I might as well put the saved warranty money towards a replacement and save all the time and hassle. Warranties also convince you to keep something alive long after it should have been discarded. Keep that in mind the next time they try to upsell you.
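For what it's worth, the "look at what changed first" habit can be written down as a tiny snapshot diff. This is only a sketch of the idea, not a tool I actually used: the function name `diff_snapshots` and the snapshot contents are made up for illustration, loosely modeled on the laptop episode.

```python
def diff_snapshots(before, after):
    """Return what was added, removed, or changed between two snapshots.

    Each snapshot is a plain dict mapping a setting or component name
    to the value observed at that time.
    """
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots: the values here are invented for the example.
before = {"screen": "original", "link_handler": "working"}
after = {"screen": "replaced", "link_handler": "broken", "firefox": "installed"}

for kind, items in diff_snapshots(before, after).items():
    print(kind, items)
```

Anything that does not show up in the diff (like a machine that "feels hotter" but was never measured before) is not evidence yet, which is the whole point.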

My new Abit board seems to work great. I was able to boot it with all peripherals hooked up the first time I powered it on, so I'm going to assume that something was genuinely wrong with the old one (now on its way back to Abit).

Just received my replacement board from ABIT, so I'll be swapping it in and hoping for the best. For those interested, the RMA process is totally electronic and zero hassle, so despite my annoyance at the general lack of support, it looks like it may work out. I guess it's cheaper for them to handle RMAs than to staff up a larger support team.

I have two WRT54G routers (well, one WRT54G v4.0 and one WRT54GL, since they messed with the WRT54G line after that version), both "upgraded" with the OpenWRT firmware. One of the things that I use them for is to bridge them together over WDS, so that I can have a bunch of computers talking to each other over a wired link upstairs in my office and then have that router bounce traffic destined for the outside, or for other computers downstairs, over the WDS link. I had it seemingly set up okay, but I started to notice some weird effects, which I've since learned came from my faulty understanding of what was actually happening.

  • All wireless clients would only connect to the downstairs AP. I figured they would talk to whichever one was closer, and this defeated the secondary purpose of this setup, which was to provide better coverage for wireless clients. Since the WRT54G has a much better set of antennas, it seemed like it would be a win for clients upstairs, not to mention that they would "think" they were connected more often, which is just as good.
  • Occasionally I would have weird wireless dropouts. I figured that this would no longer happen since there were multiple access points. Instead I expected to see the wireless signal staying strong, but the packets being dropped. That's how I discovered the above.
  • I would get duplicate packets when I pinged things from a wireless client. I attributed this to just an odd side-effect of my setup, and it didn't seem to affect anything major.
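In hindsight, the duplicate replies deserved a closer look rather than a shrug. Here is a rough sketch for counting them in saved ping output. It assumes a ping that tags duplicate replies with "(DUP!)", as the common Linux ping does; the function name `summarize_ping` and the sample output are made up for illustration.

```python
import re

# A reply line looks roughly like:
#   64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=2.3 ms (DUP!)
REPLY = re.compile(r"icmp_seq=\d+")


def summarize_ping(output):
    """Return (normal_replies, duplicate_replies) found in ping output."""
    normal = 0
    dups = 0
    for line in output.splitlines():
        if not REPLY.search(line):
            continue  # header, blank, or statistics line
        if "(DUP!)" in line:
            dups += 1
        else:
            normal += 1
    return normal, dups


# Hypothetical output: addresses and timings are invented.
sample = """PING 192.168.1.1 (192.168.1.1): 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=2.1 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=2.0 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=2.3 ms (DUP!)
"""

normal, dups = summarize_ping(sample)
print(f"{normal} normal replies, {dups} duplicates")
```

Any nonzero duplicate count on a simple home network is worth chasing down, since it usually means the same packet is reaching the destination by more than one path.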

Well, it turns out that what I was doing was a little bit screwy, and I'm still not quite sure why it worked. First of all, I set it up in client-bridged mode, so the upstairs router was not acting as an access point. In my mental model of the situation, I assumed that being an access point and being a bridged client were incompatible and that the latter subsumed the former. Even weirder, I was using the wrong MAC addresses for the WDS connection, so I think I was actually entering the network at the wrong level, and I think that was what caused the duplicate packets, although I'm surprised WDS worked at all.

My real problem is that I failed to understand that using WDS and being a client are orthogonal properties. You can turn on WDS in all your routers and still let them all be access points, or let them be clients. As access points, they all accept wireless clients and route traffic to each other accordingly, which is what I wanted. When I first switched the upstairs box to be an access point, it suddenly stopped routing packets to the downstairs box, and that's where I got the idea that WDS and client mode were linked together. Somehow, when I set the box up as a client/repeater, it worked, even with bad MAC addresses. Once I fixed the MAC addresses to correctly use the wireless devices, it worked as expected in access point mode.

Now if I scan for access points, I see the one with the stronger signal, and my wired boxes upstairs can all get out through the WDS connection. For anyone who has played around with this, make sure you use the MAC addresses that you see when you run 'iwconfig eth1' on your routers, and not the one that shows up on the status page. That was my main mistake. This was also a great example of how you can fit observed symptoms to an assumption of functionality that proves to be totally wrong.
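The MAC mix-up is the kind of thing that is easy to check mechanically. Below is a minimal sketch that pulls the MAC addresses out of pasted 'iwconfig eth1' output and checks whether the address you configured as a WDS peer is among them. The helper names `find_macs` and `peer_matches`, the sample output, and every address in it are made up for illustration; real output will differ by firmware and driver.

```python
import re

# Six colon-separated hex pairs, e.g. 00:11:22:33:44:56
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def find_macs(text):
    """Return the distinct MAC addresses in text, upper-cased, in order seen."""
    seen = []
    for match in MAC.findall(text):
        mac = match.upper()
        if mac not in seen:
            seen.append(mac)
    return seen


def peer_matches(configured_mac, iwconfig_output):
    """True if the configured WDS peer MAC appears in the iwconfig output."""
    return configured_mac.upper() in find_macs(iwconfig_output)


# Hypothetical output from the *other* router; the values are invented.
iwconfig_output = """eth1      IEEE 802.11-DS  ESSID:"home"
          Mode:Master  Frequency:2.437 GHz  Access Point: 00:11:22:33:44:56
"""

status_page_mac = "00:11:22:33:44:55"  # hypothetical, what I had entered
print(peer_matches(status_page_mac, iwconfig_output))
print(find_macs(iwconfig_output))
```

If the first line printed is False, the address entered for the WDS peer is not the wireless interface's address, which is exactly the mistake I made.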

Fixing my new computer

I moved recently, and my Linux server started having trouble when I set it up at my new place. Specifically, yum and Firefox kept crashing oddly where I had never seen any errors before. I assumed it was a memory problem, so I ran memtest86 on it for a while, and I got a whole slew of errors on Test 7, where it reads and writes random values. I also tried memtest86+, which would just hang when it got to a certain set of operations. I tried removing each memory stick separately and rerunning, but the errors persisted. At that point, I had to assume that something had gone wrong either on the CPU or somewhere on the motherboard.

Long story short, the machine was 2 years old, so I figured I was due for an upgrade anyway and purchased the components for a new Pentium D-based system, which I figured was the best-performing thing in my price range. I could assemble an ABIT AW8D mobo-based system including 1GB RAM for about $400, which is nice. I threw in a new video card since the mobo was PCI Express. It all arrived last week, but I got this funny feeling when I assembled it that it wasn't going to work, and my "this-ain't-right" detector is usually pretty accurate. I don't know what it was, but sure enough, I applied power and the thing gave me a series of beeps (1 long, 3 short), which apparently indicates a keyboard error. Two PS/2 keyboards and a USB one later, still no change.

Oddly, the POST code readout shows 8.7, which could indicate a CPU voltage problem, so I don't know which to believe. I thought maybe my old 350W power supply couldn't handle the load, despite the manual's claim that 300W was sufficient, so I swapped in a 500W Seasonic S12, which is rated one of the best by Tom's Hardware. Still no change. I've appealed to the distance debuggers out there on the Abit forums but haven't gotten any responses. Might just be RMA for me...

If you have any suggestions, please leave them in the comments.
