Do all the things like ++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatarSign Up
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple APILearn More
Search - "clearly still failing"
My first job: The Mystery of The Powered-Down Server
I paid my way through college by working every-other-semester in the Cooperative-Education Program my school provided. My first job was with a small company (now defunct) which made some of the very first optical-storage robotic storage systems. I honestly forgot what I was "officially" hired for at first, but I quickly moved up into the kernel device-driver team and was quite happy there.
It was primarily a Solaris shop, with a smattering of IBM AIX RS/6000. It was one of these ill-fated RS/6000 machines which (by no fault of its own) plays a major role in this story.
One day, I came to work to find my team-leader in quite a tizzy -- cursing and ranting about our VAR selling us bad equipment; about how IBM just doesn't make good hardware like they did in the good old days; about how back when _he_ was in charge of buying equipment this wouldn't happen, and on and on and on.
Our primary AIX dev server was powered off when he arrived. He booted it up, checked logs and was running self-diagnostics, but absolutely nothing so far indicated why the machine had shut down. We blew a couple of hours trying to figure out what happened, to no avail. Eventually, with other deadlines looming, we just chalked it up be something we'll look into more later.
Several days went by, with the usual day-to-day comings and goings; no surprises.
Then, next week, it happened again.
My team-leader was LIVID. The same server was hard-down again when he came in; no explanation. He opened a ticket with IBM and put in a call to our VAR rep, demanding answers -- how could they sell us bad equipment -- why isn't there any indication of what's failing -- someone must come out here and fix this NOW, and on and on and on.
(As a quick aside, in case it's not clearly coming through between-the-lines, our team leader was always a little bit "over to top" for me. He was the kind of person who "got things done," and as long as you stayed on his good side, you could just watch the fireworks most days - but it became pretty exhausting sometimes).
Back our story -
An IBM CE comes out and does a full on-site hardware diagnostic -- tears the whole server down, runs through everything one part a time. Absolutely. Nothing. Wrong.
I recall, at some point of all this, making the comment "It's almost like someone just pulls the plug on it -- like the power just, poof, goes away."
My team-leader demands the CE replace the power supply, even though it appeared to be operating normally. He does, at our cost, of course.
Another weeks goes by and all is forgotten in the swamp of work we have to do.
Until one day, the next week... Yes, you guessed it... It happens again. The server is down. Heads are exploding (will at least one head we all know by now). With all the screaming going on, the entire office staff should have comped some Advil.
My team-leader demands the facilities team do a full diagnostic on the UPS system and assure we aren't getting drop-outs on the power system. They do the diagnostic. They also review the logs for the power/load distribution to the entire lab and office spaces. Nothing is amiss.
This would also be a good time draw the picture of where this server is -- this particular server is not in the actual server room, it's out in the office area. That's on purpose, since it is connected to a demo robotics cabinet we use for testing and POC work. And customer demos. This will date me, but these were the days when robotic storage was new and VERY exciting to watch...
So, this is basically a couple of big boxes out on the office floor, with power cables running into a special power-drop near the middle of the room. That information might seem superfluous now, but will come into play shortly in our story.
So, we still have no answer to what's causing the server problems, but we all have work to do, so we keep plugging away, hoping for the best.
The team leader is insisting the VAR swap in a new server.
One night, we (the device-driver team) are working late, burning the midnight oil, right there in the office, and we bear witness to something I will never forget.
The cleaning staff came in.
Anxious for a brief distraction from our marathon of debugging, we stopped to watch them set up and start cleaning the office for a bit.
Then, friends, I Am Not Making This Up(tm)... I watched one of the cleaning staff walk right over to that beautiful RS/6000 dev server, dwarfed in shadow beside that huge robotic disc enclosure... and yank the server power cable right out of the dedicated power drop. And plug in their vacuum cleaner. And vacuum the floor.
We each looked at one-another, slowly, in bewilderment... and then went home, after a brief discussion on the way out the door.
You see, our team-leader wasn't with us that night; so before we left, we all agreed to come in late the next day. Very late indeed.9
That feeling when you’re working on writing a program to take differently based numbers, parse the input character by character, and return the value in different bases (dec, binary, hex, etc), get stuck, ask your partner to take a look at it and their response is along the lines of “What are you even DOING?! That doesn’t even make sense.”3
I have a co-worker who won’t stop “refactoring” our codebase. He will go on a long tangent — under the guise of working on a proper story — and then reveal proudly after a few days that he now introduced a new middle-layer into the code which will help us such and such.
I have never seen any benefit from this. I think sometimes cleaning up variable names is nice, but a lot of the things just add noise and complexity. He’s a junior dev, I’m a senior dev. My progressional opinion is that he is doing a bad job. Management doesn’t know the full extent and the lead programmer scolds him every now and then but in the end let’s the code changes pass code review. “It has already been implemented so what’s the harm”.
Then the rest of us are stuck with horrible merge conflicts. I recently noticed that some new business-important unit tests that I wrote were mysteriously gone. Oops — lost in some misguided refactoring I guess. I’m assuming they were failing after the refactor, so clearly they had to go... Fortunately the underlying logic still works I think.
His main tactic in all of this seems to be to just use argumentative stamina. He will lose discussion after discussion but doesn’t seem to care. He’ll just talk and talk. And the in the end the lead tech gives in. And/or doesn’t have the energy to catch the error introduced.
I swear, the company would be better off without him. Maybe even better if we keep paying him but he just cleans the toilets instead. Sometimes I almost believes he gets up in the morning to come to work and just fuck with people all day.2