Do all the things like ++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatarSign Up
Get a devDuck
Rubber duck debugging has never been so cute! Get your favorite coding language devDuckBuy Now
Search - "production failure"
Worst thing you've seen another dev do? Long one, but has a happy ending.
Classic 'Dev deploys to production at 5:00PM on a Friday, and goes home.' story.
The web department was managed under the the Marketing department, so they were not required to adhere to any type of coding standards and for months we fought with them on logging. Pre-Splunk, we rolled our own logging/alerting solution and they hated being the #1 reason for phone calls/texts/emails every night.
Wanting to "get it done", 'Tony' decided to bypass the default logging and send himself an email if an exception occurred in his code.
At 5:00PM on a Friday, deploys, goes home.
Around 11:00AM on Sunday (a lot folks are still in church at this time), the VP of IS gets a call from the CEO (who does not go to church) about unable to log into his email. VP has to leave church..drive home and find out he cannot remote access the exchange server. He starts making other phone calls..forcing the entire networking department to drive in and get email back up (you can imagine not a group of happy people)
After some network-admin voodoo, by 12:00, they discover/fix the issue (know it was Tony's email that was the problem)
We find out Monday that not only did Tony deploy at 5:00 on a Friday, the deployment wasn't approved, had features no one asked for, wasn't checked into version control, and the exception during checkout cost the company over $50,000 in lost sales.
Was Tony fired? Noooo. The web is our cash cow and Tony was considered a top web developer (and he knew that), Tony decided to blame logging. While in the discovery meeting, Tony told the bosses that it wasn't his fault logging was so buggy and caused so many phone calls/texts/emails every night, if he had been trained properly, this problem could have been avoided.
Well, since I was responsible for logging, I was next in the hot seat.
For almost 30 minutes I listened to every terrible thing I had done to Tony ever since he started. I was a terrible mentor, I was mean, I was degrading, etc..etc.
Me: "Where is this coming from? I barely know Tony. We're not even in the same building. I met him once when he started, maybe saw him a couple of times in meetings."
Andrew: "Aren't you responsible for this logging fiasco?"
Me: "Good Lord no, why am I here?"
Andrew: "I'll rephrase so you'll understand, aren't you are responsible for the proper training of how developers log errors in their code? This disaster is clearly a consequence of your failure. What do you have to say for yourself?"
Me: "Nothing. Developers are responsible for their own choices. Tony made the choice to bypass our logging and send errors to himself, causing Exchange to lockup and losing sales."
Andrew: "A choice he made because he was not properly informed of the consequences? Again, that is a failure in the proper use of logging, and why you are here."
Me: "I'm done with this. Does John know I'm in here? How about you get John and you talk to him like that."
'John' was the department head at the time.
Andrew:"John, have you spoken to Tony?"
John: "Yes, and I'm very sorry and very disappointed. This won't happen again."
John: "You know what. Did you even fucking talk to Tony? You just sit in your ivory tower and think your actions don't matter?"
Me: "Whoa!! What are you talking about!? My responsibility for logging stops with the work instructions. After that if Tony decides to do something else, that is on him."
John: "That is not how Tony tells it. He said he's been struggling with your logging system everyday since he's started and you've done nothing to help. This behavior ends today. We're a fucking team. Get off your damn high horse and help the little guy every once in a while."
Me: "I don't know what Tony has been telling you, but I barely know the guy. If he has been having trouble with the one line of code to log, this is the first I've heard of it."
John: "Like I said, this ends today. You are going to come up with a proper training class and learn to get out and talk to other people."
Over the next couple of weeks I become a powerpoint wizard and 'train' anyone/everyone on the proper use of logging. The one line of code to log. One line of code.
A friend 'Scott' sits close to Tony (I mean I do get out and know people) told me that Tony poured out the crocodile tears. Like cried and cried, apologizing, calling me everything but a kitchen sink,...etc. It was so bad, his manager 'Sally' was crying, her boss 'Andrew', was red in the face, when 'John' heard 'Sally' was crying, you can imagine the high levels of alpha-male 'gotta look like I'm protecting the females' hormones flowing.
Took almost another year, Tony released a change on a Friday, went home, web site crashed (losses were in the thousands of $ per minute this time), and Tony was not let back into the building on Monday (one of the best days of my life).10
Rather than singling out one person, I wanna present what I see as incompetent/stupid/ignorant:
- no will to learn
- failure to follow the very specific instructions & later asking for help when they FUBR sth & not even knowing what they did to fuck up in the first place
- asking how to solve stuff, then ignoring the suggestions & doing sth totally against recommendations
- failure to remember most basic stuff, especially if not writing it down to look at later when needed
- failure to check logs & 'google' stuff before asking why something isn't working the way they want it
- after two weeks, asking me how feature xy works, mind you they coded it, not me
- asking me why they did something in a specific way - WTF, am I a mind reader?! Who designed that crap?! Me or you?!!
- being passive/aggressive & snarky when told to do something or being asked why isn't it done already
- not testing their shit properly
- not making backups when upgrading (production) servers
- not checking the input value, no validation.. even after many many debacles on production with null ref exceptions
- failure to admit they fucked up
- not learning from (their) mistakes8
Finding a bug as a developer: "Fuck..." *start working on an undocumented hotfix that breaks other parts of the application"
Finding a bug as a tester: "Yeah, right.." *start writing a comprehensive report including all possible failure scenarios and how world famine will increase and men develop boobs if this bug is shipped into production"
Finding a bug as a PM: "Well, the other parts work, right ?" *click randomly on nearby buttons and input fields to "check" if everything is all right. Ditch said report from tester*3
Holy fuck my first software built at this job (alone, in 6 months) has gone live.
And it has a shit load of bugs and failure points. There are some mistakes in the code level too (I work at a small vendor and had nobody to review code)
I'm so tensed right now. Identified some problems and solved them, but the issues are countless and the project can't be taken down because people are using it as I type!
I'm afraid that I'll be getting fired soon.19
We started a project in January for which I was the sole developer, to automate tedious interaction with a vendor's ticketing system. We have a storage environment with about 400,000 commodity disks attached(for this vendor-- there are other vendors too), in sites around the US and Canada. With a weekly failure rate of about 0.0005%, that means about 200 disks a week need to be replaced.
This work-- hardware investigation through storage appliance frontends, internal ticket creation, external ticket creation, watching the external ticket for updates to include in our internal ticket --was all manual, and for around 200 issues a week, it was done by one guy for two years. He was hopelessly behind. This is all automated now, and this morning, I pushed this automation from dev/test to production.
It feels great to see your work helping people around you.8
Doing exams at the moment. Finished phase one out of four successfully at Monday but now stuff is going bad again as usual. Seriously, with me, everything goes perfectly fine until stuff gets official, then code starts failing, self doubt comes up and fair of failure and low self esteem hit me like a bomb.
I'm using my own framework which I actually also use in production and it works fine! But then it has to start to fucking fail at the moment I need it to work the fucking most.
I've worked towards this for five years now, I don't want to fail this! I don't want to disappoint either myself or my friends or my parents.
We had a major core router hardware failure in our LA datacenter today and every one of our services has been down since 6am, including all production servers. We have about 15,000 sites down across our entire platform. Our manager came over and told us to just go home because we need to replace the hardware and the process is expected to take all day, and we can't do any work until then because all the production servers are down. So you could say that it's been a pretty easy Friday so far! I'm headed home to play Spider-Man2
This is the craziest shit... MY FUCKING SERVER JUST SET ON FIRE!!!
Like seriously its hot news (can't resist the puns), it's actually really bad news and I'm just in shock (it's not everyday you find out your running the hottest stack in the country :-P)... I thought it slow as fuck this morning but the office internet was also on the fritz so I carried on with my life until EVERYTHING went down (completely down - poof gone) and within 2 minutes I had a technician from the data centre telling me that something to do with fans had failed and they caught fire, melted and have become one with the hardware. WTF? The last time I went to the data centre it was so cold I pissed sitting down for 2 days because my dick vanished.
I'm just so fucking torn right now because initially I was absolutely fucking ecstatic - 1 week ago after a year of doomsday bitching about having a single point of failure and me not being a sysadmin only to have them look at me like I'm some kind of techie flat earther I finally got approval to spend around 5x more per month and migrate all our software to containerized micro services.
I'll admit this is a bit worse than I expected but thanks to last week at least I have recent off site images of the drives - because big surprise I have to set this monolithic beast back up (No small feat - its gonna be a long night) on a fresh VPS, I also have to do it on premises or the data will only finish uploading sometime next week.
Pro Tip: If your also pleading for more resources/better production environment only to be stone walled the second you mention there's a cost attached be like me - I gave them an ultimatum, either I deploy the software on a stack that's manageable or they man the fuck up and pay a sys admin (This idea got them really amped up until they checked how much decent sys admins cost).
Now I have very flexible pockets because even if I go rambo the max server costs would only be 15-20% of a sys admins paycheck even though that is 13 x more than our current costs.3
Have you ever been interrupted because a marketing workmate had a friend on the phone who needed advice on a WordPress hosting, and wanted your advice right now?
Because I have.
When we had a massive server failure and our production environment was down.
Seriously, what the fuck is wrong with people nowadays.6
If they followed my suggestion and went straight to debugging the server issues they would have been solved it from week 1 and everyone would have thought the migration had a minor performance hiccup. In fact, we have already done such at least twice before and nobody batted an eye.
Instead they self-labelled the migration a failure on first error, setting the stage for apologizing to the client, and put themselves on the spot for a whole staging / production signoff, replication / backup worfklow, almost a blue-green "seamless" deployment reminiscent of DigitalOcean.
Well they're not DigitalOcean, and anyone who has spent any time understanding users knows they will not participate in "new system" tests long enough to find or report issues.
So of course the migration stretched out to almost three months up until the whole reason for the migration - the rapidly escalating risk of the old provider disappearing - hit like a freight train and now they have to go through the problem of debugging the server like I told them to on week 1. Only this time they've set the client mindset against it, lost any chance of reverting, have had grave risk for data loss, and are under pressure to debug other people's code in real-time.
This is why I don't trust devs to do ops. A dev's first solution to any problem is to throw tech at it.
Of course the shouting episodes all happened during the era I was doing WordPress dev.
So we were a team of consultants working on this elephant-traffic website. There were a couple of systems for managing content on a more modular level, the "best" being one dubbed MF, a spaghettified monstrosity that the 2 people who joined before me had developed.
We were about to launch that shit into production, so I was watching their AWS account, being the only dev who had operational experience (and not afraid to wipe out that macos piece of shit and dev on a real os).
Anyhow, we enable the thing, and the average number of queries per page load instantly jumps from ~30 (even vanilla WP is horrible) to 1000+. Instances are overloaded and the ASG group goes up from 4 to 22. That just moves the problem elsewhere as now the database server is overwhelmed.
Me: we have to enable database caching for this thing *NOW*
Shitty authors of the monstrosity (SAM): no, our code cannot be responsible for that, it's the platform that can't handle the transition.
Me: we literally flipped a single switch here and look at the jump in all these graphs.
SAM: nono, it's fine, just add more instances
Me: ARE YOU FUCKIN SERIOUS?
Me: - goes and enables database caching without any approvals to do so, explaining to mgmt. that failure to do so would impair business revenue due to huge loading times, so they have to live with some data staleness -
SAM: Noooo, we'll show you it's not our code.
SAM: - pushes a new release of the monstrosity that makes DB queries go above 2k / page load -
Tho on the bright side, from that point on I focused exclusively on performance, was building a nice fragment caching framework which made the site fly regardless of what shitty code was powering it, tuned the stack to no end and learned a ton of stuff in the process which allowed me to graduate from the tar pit of WP development.5
So, here at this place ... last person to touch a project is the sole person responsible for it being a success or failure. Wtf?!?!?!
If a client sends me the wrong picture, and agrees that is the correct one, and then it goes to production. Later on to find that it was never correct to begin with ... guess who's fault that is?
Is everybody taking crazy pills?
Don't answer that, I already know the answer.4
Did you ever had to integrate a fucking "API" that is done via mail bodies?
Fuck this shit! Who need responses about success or failure?! Guess this will take a long time to test this fucking piece of garbage... We don't get a test system, we need to test this with the production system of the other company. I hope their retarded application crashes when receiving malicious mails.
Not speaking about security, I bet everyone can send a mail to their stupid mail address and modify their data 🙈
And inside of this crap mail you also have to send the name, street and email of their company. Why do you fucking need this information?!1