After trick-or-treating with the kids Saturday night, I turned on the television and attempted to stream some Halloween specials for the kids to watch (Simpsons’ Treehouse of Horror specials, Scooby-Doo and the Headless Horseman of Halloween, Disney’s Halloween Treat — you know, the classics). But my server wasn’t responding, so I walked upstairs to investigate and found the machine in an unresponsive state. It had most certainly been tricked; there would be no treat for me.
I rebooted the server and was greeted by a screen asking me what default language I preferred. Any time you turn on your computer and are asked whether you would prefer English or Swahili, it’s never good. Bypassing that screen led me to a few choices, ranging from “Would you like me to automatically try and fix the problem?” (Who would say no to that?) to “Here’s a DOS prompt — good luck, adventurer.” I think six or seven menu options in all were presented to me; before the night was through I had clicked on them all. None of them fixed anything.
There are three keys to recovering from catastrophic computer failure: back up your data, have a plan, and don’t freak out. Two of those three things happen before your computer crashes. To paraphrase a crude saying, put wishes in one hand and a dead hard drive in the other and see which one coughs up your data first. Typically, it’s a tie.
The good news is, I’m the king of backups. I transferred my virtual web server off the dead machine and to another USB enclosure, where it was remounted on my workstation and back online within an hour. With that (the most critical task) finished I went to bed around midnight and dreamed about how I was going to fix things on Sunday.
Sunday started with four or five hours of CHKDSK, an archaic and somewhat barbaric tool that scans hard drives, attempts to repair damaged files, and discards the rest. After it finished I rebooted the server and was greeted with a new error, one that let me know that the damage to the operating system had been fatal. The next step was installing a new hard drive and reloading the server. From an old, beat-up external USB DVD drive, that takes a while. As mealtimes came and went I sat hunched over the keyboard, occasionally punching buttons when prompted while killing time by scrolling through Halloween pictures on Facebook on my other machine.
The only thing I *didn’t* have backed up was the C: drive of my server. The good news was/is, I was still able to see the drive (just not boot from it) and was able to recover all my backup scripts and such. The bad news is, I lost a few programs in the process. By yesterday evening I had most everything reinstalled and back up and running. I still need to recreate a couple of service accounts and scheduled tasks, but I’m 99% done. And, all my virtual machines (hosted on a different drive) escaped unscathed.
Not entirely how I wanted to spend the weekend, but I’ll take a long day’s worth of recovery over a long night’s worth of tears about lost data every time.
I’m a Debian Linux guy and I completely abandoned the Windows ecosystem years ago, but I went through this very problem in 2009 and recovered in about four hours without any data loss. Maybe my approach can be of help to you.
I set up an automated backup task that runs FROM a remote computer and backs everything up ONTO that remote computer. I know you must be saying “yeah, right, no rocket science so far”, and you are right. Just notice a few things here:
1) The backup runs on a remote computer. That is, the server does not make the backup; instead it is “backed up from outside”. This will work even if the server has a serious problem, in about 50% of the cases or more. In my case, the server had a bad RAM module and was frozen, BUT because the remote access opens an SSH shell and runs as a new user logged into the server, and the base daemons (services) were still up, it happened not to touch the faulty memory space… and it worked even in such conditions.
2) Make separate backups for files and for databases. If all you need is a database recovery, this comes in very handy. In my case, after the filesystem backup, I lock all databases, back them all up (it does not take more than 5 seconds in my case), and then unlock them.
3) Make your scripts send you an email whenever the backup has not been done or has finished with an error (in which case, include the log files in the email). This is a time saver.
4) Do not keep a computer on just for that task. Simply dust off an old computer and set the BIOS to wake the box up sometime during the night. Then have a script run on startup that, if the current time is between X and Y, runs the backup script, which finishes with a “shutdown” command.
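Points 3 and 4 can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual script described above: the 02:00 to 04:00 window, the example.invalid addresses, and every function name are made up for the example, and the backup command itself is left for the caller to supply. It covers the "finished with an error" case only, since a script that never started cannot report on itself.

```python
import subprocess
from datetime import datetime, time
from email.message import EmailMessage

# Hypothetical backup window ("between X and Y"); pick your own values.
WINDOW_START = time(2, 0)
WINDOW_END = time(4, 0)


def in_window(now, start=WINDOW_START, end=WINDOW_END):
    """True if `now` is inside the window, including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run_backup(command):
    """Run the backup command and return (exit code, combined stdout/stderr)."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.returncode, result.stdout


def build_report(returncode, log_text, sender, recipient):
    """Return an EmailMessage carrying the log for a failed run, or None on success."""
    if returncode == 0:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Backup FAILED (exit code {returncode})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(log_text or "(no log output)")
    return msg


def nightly(now, command, send):
    """Run the backup only inside the window; hand any failure report to `send`."""
    if not in_window(now.time()):
        return "skipped"  # machine was switched on for some other reason
    code, log = run_backup(command)
    # Placeholder addresses, assumed for illustration only.
    report = build_report(code, log, "backup@example.invalid", "admin@example.invalid")
    if report is not None:
        send(report)
        return "failed"
    return "ok"
    # A real startup script would power the box off after this returns.
```

A startup script could call `nightly(datetime.now(), [...], send)`, where `send` might wrap `smtplib.SMTP("localhost").send_message` if a local mail relay is available (also an assumption). Passing `send` in as a parameter keeps the time-window and reporting logic easy to try out without actually sending mail.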
If you are wondering, I use an ancient piece of backup software called Dirvish, which does the trick very well. Of course, it is free software. My backup box is a Pentium II with 64 MB of RAM, running a base, console-only (no graphical environment) Debian system that never uses more than 16 MB of RAM, so I have turned a Pentium II (= junk) into something useful.
Cheers!
I don’t understand a third of what he just said but…do you know why it crashed in the first place?