Harinezumi (harinezumi) wrote,

Over the course of my sysadmin career I have developed two all-encompassing fears. The first is my network getting hacked so bad it makes the news (see Tepper) and it being my fault. The second is one of the (un)Holy Trinity (main fileserver, mail server, and web server) dying without a spare. Today I came damn close to experiencing the second as reality.

Today was when Dave and I finally got around to installing the new power backup units. Technically, the new UPSes and their batteries (all 300 pounds of them) arrived more than two months ago. First, though, Dell mis-shipped them to the SEI building across the street, and we weren't about to lug all of that back here ourselves. Then, when we finally got Dell to deliver the units to our building, it turned out that the IBSC Beowulf cluster (for whose sake the whole thing was started in the first place) was running a series of long-term neural net simulations that could not be gracefully interrupted until May 16. So we scheduled the UPS shuffle for today.

I say shuffle, because we had the (un)Holy Trinity plugged into a 3000VA UPS unit, which was way overkill for the wattage involved. The Beowulf cluster, on the other hand, with about 15 processors between its 6 machines, was a huge power hog whose lust for current could just barely be satisfied by a 3000VA unit. So the process involved shutting down 10 machines, unplugging them all from the wall and/or UPS, moving 3 of them out of their rack, plopping down the UPSes at the bottom of that rack (trust me, you do not want 400 pounds of steel, lead, and battery acid sitting on top of your blade servers), stacking the three displaced machines on top of the UPSes (one of these days I'll figure out the infernal mechanics of mounting rails, one of these days...), and then plugging all the machines into their appropriate UPSes. A simple process, aside from the heavy lifting. Simple, that is, until you factor in my characteristic clumsiness.

Specifically, I decided to move the three machines (the web server, the mail server, and the quad-xeon node of the beowulf cluster) out of the rack while they were still connected, so as to take care of the heavy lifting before the scheduled maintenance window. Everything went fine until I decided to push them a bit further out of the way, and to my horror felt the give of a power switch. The sad thing is that I knew that the servers had power switches in the back. I noticed them and carefully avoided placing my hands anywhere near them while pulling them out of the rack and placing them on the floor. And then I decided to give the machines a little nudge. Oops.

For people who haven't worked with Linux much: all but the most modern Linux boxes really don't like being turned off without a proper shutdown. In most cases you get away with 10 minutes of fscking and a couple of reboots before the machine forgives you and goes on its merry way. This time, though, the mail server (and of course it had to be the mail server that I accidentally turned off) decided that it had had just about enough of me (I have spent the past couple of weeks poking around in its software innards, fine-tuning the spam and virus filters), and instead of a slightly annoyed LILO boot sequence it started spitting out strings of letters and numbers that looked like a processor's register states. Cue adrenaline, increased heartbeat, and hyperventilation.

It didn't help that the maintenance window was upon us, so I had to put the server aside and get on with the heavy lifting and the power plug reshuffling. I finally managed to placate it almost four hours later (did I mention I'm horrible at estimating how long things are going to take? I had the maintenance period slated for half an hour. It took three hours to finish everything ^^;). The solution involved taking off the server's top and wiggling the IDE cables repeatedly. I was about at the point of considering sacrificing goats to it next.

Will have to poke around it some more tomorrow, make a full backup of all the configuration files (and the precious, precious Bayesian filter databases), and push the priority of putting together the hot spare server I've been eyeballing for the past couple of months up a couple of notches. For now, though, I think I need to go cattle-prod some peons. Either that or watch loli-maids get chased around by an SD alligator. Something brain-rotting, at any rate.