Sunday, November 01, 2009

Crazy weekend

So, what I was going to do this weekend was re-work how our git branches are organized, creating a testing branch called "too", and work on getting the new Funtoo/OpenRC network scripts finalized.

What I *actually* did was try to reload one of my servers -- and ran into lots of problems. This process actually started in the middle of last week, sucking up my after-work hours. I wanted to reload one of my Nehalem servers, and for some reason the thing just wouldn't boot. I tried rolling back firmware. I tried using known-good kernels. I tried using an exact replica of my *working* identical Nehalem server, so it was set up identically to my other box, and the thing still wouldn't boot. No matter what I tried, the kernel complained about an unknown block device and said it was unable to mount the root filesystem. But the strange thing was -- sysrescuecd could access the disk just fine. And when I created a minimal initrd with "bash" and "mount" on it, I was able to mount the root filesystem just fine with no extra modules required.

I found the solution around 2:30 AM last night - for some reason, the kernel's auto-detection of the filesystem on the partition was consistently failing. If I pass a "rootfstype=ext3" kernel boot option, everything works. But if I leave my "root=/dev/sda3" boot option to fend for itself, the thing won't boot. I've never run into this behavior before.

But thinking about it, the "rootfstype" option makes a lot of sense. I don't want to rely on my kernel auto-detecting the type of my filesystem by itself - especially now that I've seen it fail so badly. Apparently some funky data on your block devices can cause this auto-detection to bite it pretty hard.
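
If you want to check whether a given boot line pins the filesystem type, here's a rough sketch in Python. It only parses the command line text; it doesn't touch the bootloader or the disk. The function names (parse_cmdline, root_is_pinned) are ones I made up for illustration, and the example strings are just the options mentioned above.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of option -> value.

    Bare flags with no "=" (for example "quiet") map to None.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def root_is_pinned(cmdline):
    """Return True if both root= and rootfstype= are given with values."""
    opts = parse_cmdline(cmdline)
    return bool(opts.get("root")) and bool(opts.get("rootfstype"))


# Example boot lines, taken from the options discussed in this post.
for line in ("root=/dev/sda3", "root=/dev/sda3 rootfstype=ext3"):
    status = "pinned" if root_is_pinned(line) else "relies on auto-detection"
    print(line, "->", status)
```

On a running box you could feed it the contents of /proc/cmdline instead of the example strings.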

All that to say -- I had a crazy weekend. 20 hours of troubleshooting.

But there are some good things that are going to come out of it - I am going to fork the Gentoo GRUB package, which needs quite a bit of love. It's time to get GPT partitioning documented properly and grub-1.97 supported officially, as it is in Ubuntu. So expect to see some things related to that soon.