All Hands on Deck

Through what can only be called a comedy of errors, “we” (not me) managed to brick nearly 1,000 laptops at work. The details aren’t particularly important (nor am I in a position to discuss them), but let’s just say that a unique combination of outdated operating systems, older third-party disk encryption solutions, and Microsoft’s latest patches created the perfect storm for these specific machines, leaving them in a non-bootable state and effectively turning them into “bricks.”

A quick response was needed, so management requested that those machines be mailed to Oklahoma City, where an impromptu team of approximately thirty computer specialists gathered to remediate the issue. Roughly one third of the team members, myself included, live here in Oklahoma; the rest have come from everywhere from Texas to Washington, DC, to lend a hand. Over the past several days I’ve had the opportunity to catch up with some old friends and make some new ones.

Because of the large scope and quick turnaround of this effort, many kinds of volunteers were needed. We have people receiving and inventorying the machines. We have people obtaining disk decryption keys. We have people physically swapping hard drives. We have people reimaging machines, performing data recovery, and making sure all the required software has been reinstalled before the laptops are repackaged, tracked, and shipped out. My contribution to the project has largely been automating some of the processes. I wrote the scripts that automate the backup process, update the BIOS, and perform other tasks, and when we found that PowerShell couldn’t operate in the environment, I reverted to ol’ DOS batch files (yes, in 2020). If I’d had more time I could have written more graceful code, but this is a quick-and-dirty operation, and I wrote some quick-and-dirty scripts to move things along. There have been hiccups along the way, and the scripts are still being tweaked as I get feedback from the technicians about more parts of the process that can be automated.
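
To give a flavor of those quick-and-dirty batch files, here’s a rough sketch of what the backup step looks like. The drive letters, folder names, and log path below are placeholders for illustration (and robocopy may or may not be what the real environment allowed); this is not the actual production script:

    @echo off
    REM Quick-and-dirty backup sketch: copy user data off the bricked drive
    REM (assumed to be mounted as D:) to an external drive (assumed E:).
    REM All drive letters and paths here are placeholders for illustration.
    set SOURCE=D:\Users
    set DEST=E:\Backup\%COMPUTERNAME%

    REM /E copies all subfolders, /R:1 /W:1 keeps retries short so a bad disk
    REM doesn't stall the run, and /LOG+ appends a record for the technicians.
    robocopy "%SOURCE%" "%DEST%" /E /R:1 /W:1 /LOG+:E:\Backup\backup-log.txt

    REM Robocopy exit codes of 8 or higher indicate failed copies.
    if %ERRORLEVEL% GEQ 8 (
        echo Backup reported failures - flag this machine for manual recovery.
    ) else (
        echo Backup complete for %COMPUTERNAME%.
    )

Robocopy handles its own retries and logging, which is about all the error handling a script like this needs to keep machines moving through the line.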

Since last Wednesday, I’ve been working twelve-hour days (as have most of the volunteers). Wednesday morning we had volunteers but no tools, no new hard drives, no USB sticks (needed for imaging), and, most important of all, no donor machines. Looking at the glass as half full, that gave us time to create a skeleton process (which, to be sure, has changed several times as bottlenecks were identified). By Friday, machines were moving through the process as smoothly as could be expected.

Today, Saturday, we’ll be working from nine a.m. until we run out of machines. If we can get the stockpile down to zero today, we’ll get Sunday off and begin again on Monday. Sometime next week, the first wave of traveling volunteers will return to their duty locations and a new wave will arrive. Hopefully there will be enough overlap for the outgoing group to pass the process along to the newcomers. The official ETA for completion is “when all the machines are done,” so we’re hoping that within another week or two the initial mass of dead machines will have been processed and any future machines can be handled by one of our normal imaging teams. Until then, it’s all hands on deck. I’m tired and sore, but also excited and determined.

3 thoughts on “All Hands on Deck”

  1. Been there, done that, got the t-shirt.

    A few years ago, an undetermined application issue caused 9,000 laptops to boot-loop… boot, run for 30 seconds, BSoD, boot again, repeat…

    The laptops were all over the country, and more than 1,000 of them were remote, meaning they were not near a company location.

    I was in charge of the physical response while in parallel the engineers worked through the night to figure out how to fix this remotely, if at all possible.

    While calling in favors from many IT groups (ramping up temporary admin rights for IT people who don’t know how to spell P-C so they could image systems, etc.), the engineering team, with the help of three vendors, figured out a very creative way to fix the issue.

    They created a GPO at the top of the domain that could stop a certain process from running, thus allowing the workstation to boot fully. The concern was whether the workstation would have enough time to check in before it started boot-looping again. At 6 a.m. it worked, and it was deployed as fast as it could be. We had techs swarming the floors of all the major office buildings (there were over 30 locations) to help people through.

    As the morning progressed, people were able to receive the GPO and start work normally. The remaining issue was the 500+ fully remote people (some decided to drive 2-3+ hours to get to a company location). They could not establish a VPN connection with the laptop constantly rebooting.

    Yet another very creative solution:

    1. A tech would call the end user and give them the admin password for the workstation.
    2. Have them boot into Safe Mode with Networking (this was still Windows 7).
    3. Have the end user establish a LogMeIn session with the tech.
    4. The tech would run a script to do what the GPO did.
    5. The tech would force a reboot of the workstation, and the end user was good to go.
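
    (A rough sketch, for illustration only, of what the step 4 script plus the step 5 reboot might have looked like; the comment doesn’t name the offending component, so “BadService” below is a purely hypothetical placeholder, assuming it ran as a Windows service:)

        @echo off
        REM Hypothetical sketch only: stop and disable the offending component so
        REM the machine can complete a normal boot. "BadService" is a placeholder
        REM name; the comment never identifies the actual process or GPO setting.
        sc stop BadService
        sc config BadService start= disabled

        REM Force an immediate reboot so the user comes back up clean (step 5).
        shutdown /r /t 0 /f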

    We had a few systems where the hard drive was on its last legs and, with all the boot looping, finally crashed. So there was some minor cleanup after the fact.

    Amazing efforts by all involved.

  2. Sounds like a lot of work, but challenges of this sort (especially on a grand scale) are where you have always shined. You do impressive things under pressure! Remember to drink plenty of fluids (whiskey, etc.) every evening!!
