Sunday, January 30, 2011

Update-grub, and Mysterious Error Messages

I ran the Squeeze upgrades on my desktop machine Saturday morning, and noticed that a grub update was on the list. I've had enough trouble with grub updates lately to know that it was time to back up my data files before I went any further. After the backups, I ran aptitude dist-upgrade, and restarted the machine. (Why wait for trouble to come to me? I'll just go meet trouble.)

Although grub was there and looked OK, it sent me straight into a kernel panic, even in recovery mode. I pulled out my net install disk and booted into "Rescue Mode," asked for a terminal, and ran update-grub. A reboot sent me back to a kernel panic, so I re-rebooted into recovery mode, which worked this time. I ran update-grub again, and rebooted. This time, the system booted, albeit very slowly, and with lots of error messages concerning ata3.00. I googled these error messages, and found some potentially useful links.

There's one message I always get at startup, and whenever I "wake up" my desktop computer: "ata3.00: softreset failed (device not ready)". This is apparently common among AMD64 Linux kernels, and I've been ignoring it successfully. I found out some interesting things about it.
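For anyone who wants to see how often messages like this turn up, here is a minimal sketch in Python 3 that tallies ATA messages by port. The function name count_ata_messages and the sample lines are made up for illustration, and the pattern only assumes the "ataN.NN: message" shape quoted above; real kernel logs may vary.

```python
import re
from collections import Counter

# Matches kernel log text such as "ata3.00: softreset failed (device not ready)"
# or "ata3: ...". Group 1 is the port number, group 2 is the message.
ATA_LINE = re.compile(r"\bata(\d+)(?:\.\d+)?: (.+)")


def count_ata_messages(log_lines):
    """Count (port, message) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = ATA_LINE.search(line)
        if match:
            port = "ata" + match.group(1)
            counts[(port, match.group(2).strip())] += 1
    return counts


# Hypothetical sample lines, not copied from a real log.
sample = [
    "[    2.10] ata3.00: softreset failed (device not ready)",
    "[    7.35] ata3: softreset failed (device not ready)",
    "[    8.00] usb 1-1: new high-speed USB device",
]

for (port, message), n in count_ata_messages(sample).most_common():
    print(n, port, message)
```

To try it on a real machine, the sample list could be replaced with the lines of saved dmesg output. A message that repeats many times on a single port at least narrows down which drive or cable to look at.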

Some of my error messages might have been indicating a problem with the physical hard drive, or its connectors. I suspected this wasn't my problem, but I wasn't having much luck looking for software solutions. "Monitoring Hard Drive Health" clued me in to S.M.A.R.T., which can let you know if your hard drive is in trouble before it gives up altogether. I installed the Debian smartmontools package, and it told me the hard drive was apparently OK.
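The health check that smartmontools offers can also be scripted. What follows is only a sketch, assuming Python 3.7 or later, an ATA/SATA drive, and that the drive lives at /dev/sda (a guess, so adjust it). It runs smartctl -H and picks out the overall-health line. The helper names smart_health and check_drive are invented for this example.

```python
import subprocess


def smart_health(smartctl_output):
    """Return the overall-health verdict (e.g. 'PASSED') from smartctl -H
    output, or None if no such line is present."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result:" in line:
            return line.split(":", 1)[1].strip()
    return None


def check_drive(device="/dev/sda"):
    """Run 'smartctl -H' on the device (assumed path) and parse the verdict.
    smartctl usually needs root, and its exit status is a bit mask, so the
    return code is deliberately not treated as a plain pass/fail here."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
    )
    return smart_health(result.stdout)
```

Calling check_drive() as root should return the verdict string for a drive that supports S.M.A.R.T., and None if smartctl printed nothing recognizable. A passing verdict only says the drive itself is not reporting trouble; it says nothing about cables or connectors.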

In some forum somewhere, I read that SATA cables were more likely to cause error messages than hard drives. I was running out of things to try, so I shut my computer down, let it stand for half an hour, and opened it up. All I did was blow the dust out, and check to see if the SATA cable was plugged in good and tight on the motherboard and the hard drive. It seemed fine. The first computer I ever owned always went belly-up every time I moved it, or even bumped it. The fix was to open the case and jiggle all the cables a little bit. I did that this time, too. Call it superstition if you like.

After I put things back together and powered up, the little darlin' booted right up with nary an error message, and has been running like a top ever since. I really hate it when I don't know what was wrong, or why it's running now. Did something reset while it was powered down? Did jiggling the cables do the trick? Was it dust? And what's with all the update-grub/reboot iterations? Why do I always have to do that over and over again? And why does none of this ever happen with my Linux laptop?

While I don't have answers to these questions, here are some links where I learned something.
