Story of pulling a coding all-nighter?

Week 146 Group Rant

bahua

6y

!coding

I used to be a sysadmin, which meant I was in charge of quarterly server patching. My team managed about 2500 servers running various flavors of Linux and legacy Unix; the vast majority (95% or more) ran Linux (SLES). Our maintenance window was always overnight, 10pm to 6am, so at the stroke of 10pm a massive cascade of patching commands would go out to hundreds of servers.

Before I was brought into the process, it relied on the automation product mgmt made us use: BigFix. It's a real piece of shit. Even though our team had 2500 or so servers, the wider environment was dominated by Windows. All our vCenter servers ran Windows, and, more importantly, so did all our BigFix nodes. That meant that while we were trying to patch, the BigFix servers would be getting patched by the Windows team at the same time. This caused lots of failed and timed-out patch runs, because the Windows admins never quite understood that taking down the automation infrastructure would cause problems.

I got tired of depending on a bunch of button-pushing checkbox-clickers who didn't know shit about shit, so I started writing an SSH-wrapped patching system. By the time I left for my current job, patching had been reduced to a single command to kick off each group's patching and reboots, plus an easy check to see when servers came back up. Usually I would send patching orders to 750 or so machines; within about 5 minutes they would all be done patching, and within another 20 minutes all but about 5 of the ones that needed a reboot would be back up.
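For readers curious what an SSH-wrapped patching run can look like, here is a minimal Python sketch of the same idea. It is an illustration, not the tool described here (that was a Perl control script driving pssh). It fans one command out to many hosts in parallel, collects exit codes, picks out hosts that report needing a reboot, and then polls port 22 to see which hosts are still down. Assumptions, all to be checked against your own setup: key-based SSH that works non-interactively (`BatchMode=yes`), an SSH user allowed to run the patch command, a hypothetical `PATCH_CMD` using `zypper --non-interactive patch` on SLES, and zypper's exit code 102 meaning "patch installed, reboot needed".

```python
import concurrent.futures
import socket
import subprocess
from typing import Callable, Dict, List, Tuple

# Hypothetical patch command; the post doesn't say what was actually run.
# Assumes the ssh user is root or otherwise allowed to patch without prompts.
PATCH_CMD = "zypper --non-interactive patch"

# Assumption: zypper's documented exit code for "installed, reboot needed".
ZYPPER_REBOOT_NEEDED = 102

Result = Tuple[int, str]  # (exit status, combined stdout/stderr)


def ssh_run(host: str, command: str, timeout: int = 900) -> Result:
    """Run one command on one host over ssh and return its result."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return proc.returncode, proc.stdout + proc.stderr


def fan_out(
    hosts: List[str],
    command: str,
    runner: Callable[[str, str], Result] = ssh_run,
    max_workers: int = 100,
) -> Dict[str, Result]:
    """Send `command` to every host in parallel and collect one result per host."""
    results: Dict[str, Result] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(runner, h, command): h for h in hosts}
        for fut in concurrent.futures.as_completed(futures):
            host = futures[fut]
            try:
                results[host] = fut.result()
            except Exception as exc:  # timeout, missing ssh binary, etc.
                results[host] = (-1, repr(exc))
    return results


def needs_reboot(results: Dict[str, Result]) -> List[str]:
    """Hosts whose patch run exited with the reboot-needed code, sorted."""
    return sorted(h for h, (rc, _) in results.items() if rc == ZYPPER_REBOOT_NEEDED)


def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Cheap 'is it back yet?' check: can we open a TCP connection to sshd?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def still_down(
    hosts: List[str], probe: Callable[[str], bool] = ssh_port_open
) -> List[str]:
    """Hosts not answering yet, e.g. still rebooting or stuck in fsck."""
    return [h for h in hosts if not probe(h)]
```

In use, the flow would be `results = fan_out(hosts, PATCH_CMD)`, then reboot the hosts in `needs_reboot(results)` (for example with the same `fan_out` helper), then poll `still_down(...)` every minute or so until only the stragglers are left. A thread pool is enough here because each worker spends its time waiting on an `ssh` process rather than doing CPU work.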

The "all-nighter", which happened every single time, was waiting for the Oracle servers to run their forced periodic fscks against a dozen or so large filesystems per server, because they were all on ext3/4, which eats complete shit. Then, several hours later, as they finished, I would have to call the DBAs to tell them to validate their shitty servers.
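The long wait comes from ext3/ext4's forced checks: when a filesystem's mount count reaches its maximum mount count, or its check interval has elapsed since the last check, the boot-time fsck checks it fully instead of skipping it as clean. Below is a minimal Python sketch that reads `tune2fs -l` style output and predicts whether the next boot will force a check. It only approximates e2fsck's rules (it ignores, for example, filesystems already flagged with errors), and the function names are hypothetical; the field names are the ones `tune2fs -l` prints.

```python
from datetime import datetime
from typing import Dict

# Format of the "Last checked" value, e.g. "Mon Jan  1 00:00:00 2018".
LAST_CHECKED_FMT = "%a %b %d %H:%M:%S %Y"


def parse_superblock(text: str) -> Dict[str, str]:
    """Turn 'Key:   value' lines from `tune2fs -l` into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fsck_forced_next_boot(text: str, now: datetime) -> bool:
    """Approximate whether a full fsck will be forced at the next boot."""
    f = parse_superblock(text)

    # Trigger 1: mounted too many times since the last check
    # (a maximum of -1 or 0 means this trigger is disabled).
    max_count = int(f.get("Maximum mount count", "-1"))
    count = int(f.get("Mount count", "0"))
    if max_count > 0 and count >= max_count:
        return True

    # Trigger 2: too long since the last check. The value looks like
    # "15552000 (6 months)"; the first token is seconds (0 = disabled).
    interval = int(f.get("Check interval", "0").split()[0])
    if interval > 0 and "Last checked" in f:
        last = datetime.strptime(f["Last checked"], LAST_CHECKED_FMT)
        if (now - last).total_seconds() >= interval:
            return True

    return False
```

On a live box the input would come from running `tune2fs -l` against each data filesystem's device; checking that before the maintenance window would at least show which servers are going to sit in fsck for hours.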

rant

wk146

Comments

netikras

6y

Ahh, that brings back my night-shift memories :) Although we used mssh for widespread commands.

Hats off to a fellow sysadmin! No one recognizes our struggles to keep the show on the road.

bahua

6y

@netikras

Yeah, I used a Perl control script that drove pssh to make the actual connections. It worked really well, but even so, we SHOULD have been using a capable automation product.

xalys

6y

KernelCare?

devRant © 2021 Hexical Labs LLC