Wednesday, January 18, 2012

Just Don't Turn It Off....

My Linux reboot problems continue. If I restart with shutdown -r now, I get a variety of boot failures. I've determined there's nothing wrong with my grub2 and config files; the whole /boot directory and everything in it is fine. If I leave the computer off for at least half an hour, I can turn it on and boot successfully from /dev/sda1. Any less time, and whatever I try (every version of grub, SuperGrub, rescue disks, live Linux CDs) reports that it cannot read any files. Hardware problem, or software, or a combination of the two? I really don't know. For the moment, I'm just leaving it turned on all the time, because the sleep modes cause other problems. If I need to turn it off, I'll try to leave it off overnight.

This is really unsatisfactory. Here's some not entirely satisfying but perhaps useful information I've found about similar Linux troubles.

  • The Analysis of Drive Issues: This page has been designed to help with the analysis of drive problems, and often to recommend what steps to take.
  • SATA disks resets in a md setup:
    The system is amd64bit running Debian unstable stock with kernel 2.6.29
    (Debian package). full dmesg is attached
    I have 2 250GB disks (/dev/sda, /dev/sdb) that I used to assemble a
    md array (/dev/md0)
    Please note that the two disk are tested via smart long selftest
    and via $dd bs=256M if=/dev/sd? of=/dev/null without any problem.
    I researched in web and followed advice:
    I have checked / exchanged cables
    I disabled smartd.
    The actual Problem:
    Then I start the following stress test. From the other disks of the machine
    /dev/hda, /dev/hdb, /dev/sdc I start copying (via rsync) to /dev/md0 to a
    newly formatted ext3 filesystem.
    Everything goes fine for a while and then the system freezes
    [Error messages deleted--they look much like my errors]
    and my filesystem is dead. /dev/sdb is deleted from /dev. I have to reboot
    and even then Linux can't find the ata2 /dev/sdb.
    I have to remove power for 1-2 min for the disk to become accessible again.
    Do you think the disk is bad or something?

    The answer he eventually gets is that it's a kernel bug, which has since been fixed.

  • SATA functionality and performance with OMAPL-138: In this case the writer is working with a board that seems to have problems accessing SATA drives. The answers he gets suggest looking at different kernels and checking SATA cables.

  • ata3: COMRESET failed (errno=-16):
    At bootup I end up in busybox and I see the following message on the top of the screen "Gave up waiting for root device"
    Actually I think that my hard drive "falls asleep" just after leaving grub. When I'm in the busybox I need to unplug my hard drive (serial ata) and to plug it again so that I can hear that it's restarting. After doing that I type exit in the busybox and the boot process restarts normally.

    This turns out to be a kernel bug which has been fixed.

  • Fails to find boot device in Intel D945Gnt: These boot failures resemble mine; also a kernel bug which has been fixed.
  • Can someone explain how to perform this workaround? I've tried this fix with no success:
    sudo gedit /boot/grub/menu.lst
    you will see some lines like : 
    title  Debian GNU/Linux, kernel 2.6.26-1-686
    root  (hd0,0)
    kernel  /boot/vmlinuz-2.6.26-1-686 root=/dev/sda1 ro quiet
    initrd  /boot/initrd.img-2.6.26-1-686
    add rootdelay=90 after quiet

    This person had success with the rootdelay=90 solution: Cold boots fine, but get 'Gave up waiting for root device' on reboot.

  • Debian Bug report logs - #649563 linux-image-3.1.0-1-amd64 can't load initial ramdisk anymore:
    after upgrading wheezy to linux-image-3.1.0-1-amd64, grub complains:
    Loading initial ramdisk ...
    error: couldn't read file.
    Later the kernel panics. 

    This one is also solved by a kernel upgrade.
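
For anyone puzzling over the rootdelay=90 workaround quoted above, here is a minimal sketch of the edit, assuming a GRUB legacy menu.lst like the one in the quote. The add_rootdelay function and the sample file are made up for illustration; the script works on a throwaway copy rather than the real /boot/grub/menu.lst.

```shell
#!/bin/sh
# Sketch only: append a rootdelay option to the kernel lines of a
# GRUB legacy menu.lst-style file. add_rootdelay is a hypothetical helper.

add_rootdelay() {
    # $1 = path to a menu.lst-style file, $2 = delay in seconds
    # Touch only lines that begin with "kernel" and have no rootdelay= yet.
    sed -e "/^[[:space:]]*kernel[[:space:]]/{/rootdelay=/!s/[[:space:]]*\$/ rootdelay=$2/;}" \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Build a sample file from the stanza quoted in the forum post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
title  Debian GNU/Linux, kernel 2.6.26-1-686
root  (hd0,0)
kernel  /boot/vmlinuz-2.6.26-1-686 root=/dev/sda1 ro quiet
initrd  /boot/initrd.img-2.6.26-1-686
EOF

add_rootdelay "$sample" 90
cat "$sample"
```

The title line also contains the word "kernel", so the pattern is anchored to the start of the line. On a grub2 system like mine, menu.lst is not used; as far as I know the equivalent change goes in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, followed by running update-grub.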

Here are some more rescue disks I've burned and tried.
