Monday evening after hours I got a call informing me that one of our most important servers at work was offline. I’m not officially “on call” like I used to be years ago, but when something like this happens you throw your shoes back on and go see what’s up.
What was “up” (or technically, what wasn’t) was the server’s RAID card. No RAID card meant no hard drives. The server was relatively old, so I didn’t have any spares of the same make or model at my disposal, and the RAID card was attached to the motherboard, so I couldn’t simply swap it out. With this server down, none of our local users could log in to the network (or their own computers), and none of our external (public-facing) applications were working correctly. These semi-serious issues at 10pm would turn into really serious issues the following morning when users began showing up to work.
I spent a total of 11 hours working on the server and brainstorming solutions, starting at 9pm and finishing up just before 8am. Since it’s a work machine I can’t go into too many technical details, but suffice it to say that few simple solutions presented themselves that night.
When Clint (our branch manager) heard that there was an outage he began texting me, wanting to know how long I was going to be at work. “Until everything’s working,” was my response. Around 1am Clint showed up on site with a sack full of energy drinks and a hot coffee from 7-11. I had already pounded one Starbucks skinny caramel macchiato on the way in, so the reserve caffeine delivery was much appreciated.
Apparently I don’t bounce back like I used to. After pulling an all-nighter Monday night, I was fairly worthless Tuesday. Tuesday night I crashed when I got home, which messed my sleep schedule up even further. Wednesday after work we replaced the barely-hobbling original server with brand new hardware. That took another three hours. This weekend, we’re having a power outage at work. By the time Saturday comes to a close, I expect to have accrued somewhere around 20 hours of comp time this week. My bones feel it.
So anyway … if you happen to walk past my desk and see a mess of papers, or a pile of food, or a pyramid of energy drinks and coffee cups, now you know why sometimes it looks that way.
Hence me using *ONLY* software RAID. I can recall RAID cards creating proprietary RAID formats that could not be read unless you had another of those same cards. If a disk failed you could clone it, but if the card failed you were doomed.
I have not used a Windows server for years now; my last RAID on a Windows server was on NT4. But with Linux it’s really easy and straightforward. I just plug the failed disk and the replacement disk into a desktop Linux box, usually running an Ubuntu live image off a USB stick, and clone the old disk onto the new one. If the clone fails due to problems on the old disk, no problem. All my RAIDs are RAID1 (mirroring). The cloning process begins by declaring the whole disk a member of a Linux multi-disk (md) unit, that is, a RAID. When the new disk is put to work, any missing information is copied onto it as part of normal RAID1 operation, keeping it all in sync. Et voilà.
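For anyone curious what the replace-and-resync half of that workflow looks like in practice, here’s a minimal sketch using mdadm. The array and device names (/dev/md0, /dev/sdb1, /dev/sdc1) are hypothetical placeholders, and it assumes mdadm is installed and the script is run as root.

```python
#!/usr/bin/env python3
"""Sketch: replace a failed member of a Linux md RAID1 array with mdadm.

All device names below are hypothetical placeholders; adjust them to
your own setup. Assumes mdadm is installed and this runs as root.
"""
import subprocess

ARRAY = "/dev/md0"          # hypothetical RAID1 array
FAILED = "/dev/sdb1"        # hypothetical failed member
REPLACEMENT = "/dev/sdc1"   # hypothetical replacement disk/partition

def run(*args):
    """Echo and execute a command, raising if it fails."""
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# Mark the bad member as failed (if the kernel hasn't already) and pull it.
run("mdadm", "--manage", ARRAY, "--fail", FAILED)
run("mdadm", "--manage", ARRAY, "--remove", FAILED)

# Add the replacement; md starts resyncing the mirror onto it automatically,
# which is the "any missing information is copied onto it" step above.
run("mdadm", "--manage", ARRAY, "--add", REPLACEMENT)

# Check rebuild progress.
with open("/proc/mdstat") as f:
    print(f.read())
```

The up-front clone the comment describes could be done beforehand on the live system with something like dd or ddrescue; the normal RAID1 resync then fills in whatever the clone missed.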