How I lost my data and found it again
OR How to (possibly) recover your Master Boot Record (MBR) if you overwrite it
This past week was an unusually anxious one for me. My Windows installation was giving me grief (random blackout screens, etc.) so I went for the obvious solution: reinstall Windows. Aside from the obvious problems, I had a poor setup anyway. I was running my OS off one hard drive, with most programs, data, and a Linux partition on my second hard drive. I wanted to put everything on one drive.
Problem 1: I could not get the Windows installation CD to install Windows. At the point where it *should* bring up a text-based GUI with a blue background, it would just go black. I was aware that this happens sometimes with a non-Windows bootloader (I was using Grub). Confident that I knew what I was doing, I proceeded to overwrite my master boot record with something Windows liked. First, I used my Linux installation to make a backup copy of my MBR. This basically consists of using the dd command to copy the first 512 bytes of the hard drive into a file. The command looks like this:
dd if=/dev/hda of=~/Desktop/hdaimg bs=512 count=1
which translates into "copy 1 block (size 512 bytes) from the beginning of hard disk A into a file on the desktop called hdaimg." These 512 bytes are the MBR itself.
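A backup like this is only worth something if the copy is intact. Here is a minimal sketch of how the copy could be checked, run against a scratch image file instead of a real disk. The file names disk.img and mbr.bak and the temporary directory are made up for the example; the 0x55 0xAA signature in the last two bytes is what a valid MBR ends with.

```shell
# Sketch only: a scratch image stands in for /dev/hda (hypothetical names).
workdir=$(mktemp -d)
disk="$workdir/disk.img"
backup="$workdir/mbr.bak"

# Build a 1 MiB fake disk whose first sector ends with the bytes 0x55 0xAA.
dd if=/dev/zero of="$disk" bs=512 count=2048 2>/dev/null
printf '\125\252' | dd of="$disk" bs=1 seek=510 conv=notrunc 2>/dev/null

# Same dd command as above, pointed at the image: copy the first 512-byte sector.
dd if="$disk" of="$backup" bs=512 count=1 2>/dev/null

# Check 1: the backup is exactly 512 bytes long.
backup_size=$(wc -c < "$backup" | tr -d ' ')
# Check 2: its last two bytes are the boot signature 55 aa.
signature=$(tail -c 2 "$backup" | od -An -tx1 | tr -d ' \n')
# Check 3: the backup is identical to the first 512 bytes of the disk.
if head -c 512 "$disk" | cmp -s - "$backup"; then matches=yes; else matches=no; fi

echo "size=$backup_size signature=$signature matches=$matches"
```

It is also worth keeping the copy somewhere other than the disk being modified, such as a memory stick.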
With a backup copy made, I copied over the MBR of my second drive, which had Windows stuff in it. The command looked a little something like this:
dd if=/dev/hdb of=/dev/hda bs=512 count=1
Reboot, reinstall Windows, copy back over the original MBR, go about my merry way. That should have been it; I'd be through with my headaches. Little did I know, they were just beginning.
Reboot, reinstall Windows....
Wait a sec...how come the Windows installer shows only one partition on my hard drive? Where are my data and Linux partitions? Time to do some research.
After reading up about MBRs and all that jazz, I come to find out that near the end of the MBR is a set of 64 bytes called the partition table. There are 16 bytes per partition and a max of four (primary) partitions. If you want more than four partitions, you have to create an extended partition and put logical partitions inside of it.
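To make the layout concrete: in a classic MBR the boot code takes up the first 446 bytes, the 64-byte partition table sits at byte offsets 446 through 509, and the last two bytes are the 0x55 0xAA signature. Within each 16-byte entry, byte 0 is the active flag, byte 4 is the partition type, bytes 8-11 are the starting sector, and bytes 12-15 are the sector count (both little-endian). The sketch below decodes the four entries from a hand-built test image; the file name mbr.img, the helper le32, and the sample entry are made up for illustration.

```shell
# Sketch: decode the four 16-byte partition entries of a 512-byte MBR image.
# mbr.img is a hypothetical scratch file built here, not a real disk.
workdir=$(mktemp -d)
mbr="$workdir/mbr.img"

# Fake MBR: 512 zero bytes, then one hand-written entry and the signature.
dd if=/dev/zero of="$mbr" bs=512 count=1 2>/dev/null
# Entry 1 at offset 446: active (0x80), type 0x0b, start sector 63, 1000 sectors.
printf '\200\000\000\000\013\000\000\000\077\000\000\000\350\003\000\000' |
    dd of="$mbr" bs=1 seek=446 conv=notrunc 2>/dev/null
printf '\125\252' | dd of="$mbr" bs=1 seek=510 conv=notrunc 2>/dev/null

# le32 FILE OFFSET: print the little-endian 32-bit number stored at OFFSET.
le32() {
    # od prints the four bytes as decimals; reuse $1..$4 to hold them.
    set -- $(od -An -tu1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

i=0
while [ "$i" -lt 4 ]; do
    base=$((446 + 16 * i))
    status=$(od -An -tx1 -j "$base" -N 1 "$mbr" | tr -d ' \n')
    ptype=$(od -An -tx1 -j "$((base + 4))" -N 1 "$mbr" | tr -d ' \n')
    start=$(le32 "$mbr" "$((base + 8))")
    count=$(le32 "$mbr" "$((base + 12))")
    echo "entry $((i + 1)): status=0x$status type=0x$ptype start=$start sectors=$count"
    i=$((i + 1))
done
```

An entry of all zeros means the slot is unused, which is what the last three entries of the test image show.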
The root of the problem is that when I overwrote the MBR, the partition table was lost too. Without the partition table, my hard drive is just a jumbled collection of sectors. There's no organizational structure, no way to pick out files or *gasp* get to the backup copy of my original MBR.
Background information: I had no active backups. For whatever reason, all my data was on this drive and nowhere else, except a few shreds of data on a memory stick here and there. All the emails for the last 7 years, all the contact information, all the work that I'm putting together for my Master's thesis...all gone.
Needless to say, I really wanted my data back.
So, I take out an old 6 or 7 year old hard drive with 13 gigs of storage, and install Linux on it. I would use it as a base to work from and see if I could possibly get my backup MBR off the drive. Fortunately, I had a plan: I looked at the Grub bootloader and noticed a short error message in ASCII text. If I grep the hard drive for that text, I might be able to find it. First, however, I give my brother Richard a call, as he's significantly more familiar with Linux than I am. By the time I hang up, I have an example Perl script that should work. As it's my first Perl program, I give it a try, and for some reason it doesn't work out. I didn't like the Perl script all that much anyway.
I change back to my original plan of using grep on the hard drive. The whole 160 gigabyte hard drive. I familiarize myself with the grep manpages, run a quick test on a copy of the MBR of the 13 gigabyte hard drive, and decide that it works just like I want. Here's what I had:
grep "GRUB .Geom.Hard Disk.Read. Error" /dev/hdb --binary-files=text >> ~/Desktop/hdbreal
I check for typos, and press enter. I wait.
I wait some more.
I get something to eat.
I give up on waiting, and decide to check back every few hours. Before bed, it's still running. I hexdump the output file occasionally to see if anything turns up, but see nothing. I wake up, same situation. If I didn't hear my hard drive running, I would wonder if it was actually doing anything. Finally, that evening, after 33 grueling hours, the command finishes, and I see the following output (look at the rear terminal screen):
Memory exhausted. Wow. I exhausted my memory with a single grep. I check the hexdump, and there's something, but it's not what I need or want: it looks like the MBR error message but there's no partition table (which I could recognize pretty easily at this point).
Undaunted, I decide that the approach was right but my technique was lacking. I was in the dark the whole time it was processing, I ran out of memory, and if grep didn't return enough bytes, I had no way to go back to the source and grab some more. I knew about scripting in Linux, but I had never tried it and never knew how it worked. So, I looked online, checked out Linux in a Nutshell from the library, and set to work. Before long, I had a tested script that did what I needed. It looked like this:
for ((skipamount=46280257 ; skipamount<156279257 ; ))
do
    # Copy the next 1000 blocks of 1024 bytes (about 1 MB) into a scratch file
    dd if=/dev/hdb of=~/Desktop/testhex bs=1024 count=1000 skip=$((skipamount+=1000))
    # Show progress on the terminal, and label this chunk in the output file
    echo "Sector" $skipamount of 156280257
    echo "S" $skipamount >> ~/Desktop/hdbfound
    echo >> ~/Desktop/hdbfound
    grep "GRUB .Geom.Hard Disk.Read. Error" ~/Desktop/testhex --binary-files=text >> ~/Desktop/hdbfound
done
Long story short, this script copies one thousand blocks of 1024 bytes each (i.e. about one megabyte at a time) into a scratch file, greps that chunk for the error message, and writes the block number (which the script labels a "sector") and any results from grep into an output file. It also writes the current position to the terminal so I can see the progress while it runs.
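For comparison, here is a variation on the same idea, not the script I actually ran. It pipes each chunk straight into grep instead of going through a scratch file, and prints the absolute byte offset of every match so it's easy to go back to the source and grab more bytes. It assumes GNU grep (for the -a, -b and -o options) and runs against a small test image; the file name disk.img, the planted marker, and the variable names are all made up for the example.

```shell
# Sketch: scan an image chunk by chunk and report where the pattern occurs.
# Assumes GNU grep; disk.img is a hypothetical scratch file, not a real disk.
workdir=$(mktemp -d)
image="$workdir/disk.img"
pattern="GRUB .Geom.Hard Disk.Read. Error"

# 4 MiB fake disk with marker text planted 380 bytes into the sector at 3 MiB.
# The '-' separators are printable stand-ins; '.' in the pattern matches them.
dd if=/dev/zero of="$image" bs=1024 count=4096 2>/dev/null
printf 'GRUB -Geom-Hard Disk-Read- Error' |
    dd of="$image" bs=1 seek=$((3 * 1048576 + 380)) conv=notrunc 2>/dev/null

chunk_kib=1024      # size of each chunk, in 1024-byte blocks
total_kib=4096      # size of the whole image, in 1024-byte blocks
found=""
offset_kib=0
while [ "$offset_kib" -lt "$total_kib" ]; do
    # grep -b -o prints "offset:match"; keep only the offset within the chunk.
    for rel in $(dd if="$image" bs=1024 skip="$offset_kib" count="$chunk_kib" 2>/dev/null |
                 LC_ALL=C grep -a -b -o "$pattern" | cut -d: -f1); do
        abs=$((offset_kib * 1024 + rel))
        echo "match at byte $abs (512-byte sector $((abs / 512)))"
        found="$found $abs"
    done
    offset_kib=$((offset_kib + chunk_kib))
done
```

Once a sector number is known, a single dd with bs=512, skip set to that sector, and count=1 pulls out the whole 512-byte candidate for inspection. One caveat: a match that straddles two chunks would be missed, so overlapping the chunks slightly would be safer for text that isn't sector-aligned.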
Note that I didn't need to start at sector 0 because I knew the backup MBR was in the last half of the drive. I did a few test runs and found out that it took about 0.25 seconds per megabyte. At that rate, it should finish in about 8 hours, which is good because it was bedtime.
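The estimate is easy to reproduce with shell arithmetic from the numbers in the script and the measured rate. The variable names here are just for illustration.

```shell
# Back-of-the-envelope run time, using the figures quoted above.
start_block=46280257      # first 1024-byte block in the loop
end_block=156279257       # the loop stops once it reaches this block
blocks_per_chunk=1000     # each dd call copies 1000 blocks (about 1 MB)
ms_per_chunk=250          # measured: about 0.25 seconds per chunk

chunks=$(( (end_block - start_block) / blocks_per_chunk ))
total_seconds=$(( chunks * ms_per_chunk / 1000 ))
hours=$(( total_seconds / 3600 ))
minutes=$(( (total_seconds % 3600) / 60 ))
echo "$chunks chunks, about ${hours}h ${minutes}m"
```

That works out to 109999 chunks and roughly 7 hours 38 minutes, which is where the "about 8 hours" comes from.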
I sleep, wake up, and there's still about an hour left of processing. When it's done, I grep the output file. Finally, I see the partition table data I had lost. I apparently had another MBR backed up, so I did some figuring to determine which partition table was the most recent one based on the partition sizes. I made copies of the results, and handwrote them too just to be safe. Then, I use dd to copy the newfound backup MBR (and its partition table) to sector 0 of the hard drive.
Reboot. Run Knoppix. The hard drives are there, on the desktop. I click to open one, and breathe a big sigh of relief when I see my data.
Reboot back into Linux (on the 13 gig hard drive). Copy all my important data onto that hard drive--I wanted a backup ASAP.
With the backup securely in place, I'm now back at square one. I have my data, but I can't reinstall Windows because there's a Grub bootloader. With my newfound MBR knowledge, I use dd to piece together a new MBR that is Windows-based yet has the right partition table.
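The exact commands aren't shown here, but a minimal sketch of the idea looks like the following: take the first 446 bytes (the boot code) from one MBR image and the last 66 bytes (the 64-byte partition table plus the 2-byte signature) from another. The file names are hypothetical, and the two inputs are dummy images filled with 'W' and 'R' bytes so the result is easy to check.

```shell
# Sketch: splice a new MBR together from two 512-byte images.
# All file names are hypothetical; nothing here touches a real disk.
workdir=$(mktemp -d)
windows_mbr="$workdir/windows_mbr.img"      # supplies the boot code
recovered_mbr="$workdir/recovered_mbr.img"  # supplies the partition table
new_mbr="$workdir/new_mbr.img"

# Stand-in inputs: one image of 512 'W' bytes, one of 512 'R' bytes.
head -c 512 /dev/zero | tr '\000' 'W' > "$windows_mbr"
head -c 512 /dev/zero | tr '\000' 'R' > "$recovered_mbr"

# Bytes 0-445: boot code from the Windows MBR.
dd if="$windows_mbr" of="$new_mbr" bs=446 count=1 2>/dev/null
# Bytes 446-511: partition table and signature from the recovered MBR.
dd if="$recovered_mbr" of="$new_mbr" bs=1 skip=446 seek=446 count=66 conv=notrunc 2>/dev/null

# Verify: 512 bytes total, only 'W' in the first 446, only 'R' in the last 66.
new_size=$(wc -c < "$new_mbr" | tr -d ' ')
stray_boot=$(head -c 446 "$new_mbr" | tr -d 'W' | wc -c | tr -d ' ')
stray_table=$(tail -c 66 "$new_mbr" | tr -d 'R' | wc -c | tr -d ' ')
echo "size=$new_size stray_boot_bytes=$stray_boot stray_table_bytes=$stray_table"
```

Writing the spliced image to a disk would be one more dd with of= pointed at the device, which is worth doing only after checking the image with a hexdump.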
Still won't run the Windows install CD. I take out the entry for the Linux extended partition, and it finally works--Windows just didn't know what to do with that one partition (note: the other partitions were FAT based). I install Windows, add the Linux partition back in, and put the Grub bootloader back in place. For the future, I kept a copy of the Windows MBR in case I need to reinstall again.
I tell you what, I need excuses like these to learn the details of Linux. I learned all about the MBR, partition tables, grep, Linux scripting, and a little bit of Perl just from this one panicked situation involving all my data. I think most importantly, though, I learned that an ounce of prevention (aka backups) is worth a pound of cure. I'm just glad that the cure didn't involve amputation of data.