11
bahua
5y

!coding

I used to be a sysadmin, which meant I was in charge of quarterly server patching. My team managed about 2500 servers, running various flavors of linux and legacy unix. The vast majority(95% or more) ran Linux(SLES). Our maintenance window was always in the overnight-- 10pm to 6am --so the stroke of 10pm would be a massive cascade of patching commands sent to hundreds of servers.

Before I was brought into the process, it made use of the automation product we were tasked by mgmt to use: Bigfix. It's a real piece of shit. Though we had 2500 or so servers, this environment was dominated by windows. All our vcenter servers ran it, and more importantly, our bigfix nodes were all windows machines. That meant that while we're trying to patch, the bigfix servers would get patched by the windows team. This would cause lots of failed and timed out patching, because the windows admins never quite understood that taking down the automation infrastructure would cause problems.

As such, I got tired of depending on a bunch of button-pushing checkbox-clickers who didn't know shit about shit, so I started writing an ssh-wrapped patching system. By the time I left for my current job, patching had been reduced to a single command to initiate each group's patching and reboots, and an easy check to see when servers come back up. So usually, the way it worked out was that I would send patching orders to 750 machines or so, and within about 5 minutes, they would all be done patching, and within another 20 minutes all the ones that required rebooting but about 5 would be done rebooting.

The "all-nighter" which happened every time was waiting for oracle servers to run timed fscks against a dozen or so large filesystems per server, because they were all on ext3/4, which eats complete shit. Then, several hours later, as they finished, I would have to call the DBAs to tell them to validate their shitty servers.

Comments
  • 4
    Ahh, that brings my night shift memories :) although we used mssh for widespread commands

    hats off to a fellow sysadmin! Noone recognizes our struggles to keep show on the road
  • 4
    @netikras

    Yeah, I used a perl control script that actually made the connections with pssh. It worked really well, but even so, we SHOULD have been using a capable automation product.
  • 0
    KernelCare?
Add Comment