Search - "bug in bus"
-
Worst dev team failure I've experienced?
One of several.
Around 2012, a team of devs was tasked with converting an ASPX service to WCF; the service had one responsibility, returning product data (description, price, availability, etc...simple stuff).
No complex searching, just pass the ID, you get the response.
I was the original developer of the ASPX service, whose API took an XML request and returned an XML response. The 'powers-that-be' decided anything XML was evil and had to be purged from the planet. If this thought bubble popped up over your head, "Wait a sec...doesn't WCF transmit everything via SOAP, which is XML?", yes, but in their minds SOAP wasn't XML. That's not the worst WTF of this story.
The team (3 developers, 2 DBAs, network administrators, several web developers) worked on the conversion for about 9 months using the Waterfall method (3-5 months of which were mostly meetings and very basic prototyping) and a test-first approach (their own flavor of TDD). The 'go live' was to occur at 3:00AM, and it was mandatory that nearly the entire department be on-site (including the department VP) and available to help troubleshoot any system issues.
3:00AM - Teams start their deployments
3:05AM - Thousands and thousands of errors from all kinds of sources (web exceptions, database exceptions, server exceptions, etc), site goes down, teams roll everything back.
3:30AM - The primary developer remembered he had made a last-minute change to a stored procedure parameter that hadn't been pushed to production, which caused a side effect across several layers of their stack.
4:00AM - The developer found his bug, but the manager decided it would be better if everyone went home and took a fresh look at the problem at 8:00AM (yes, he expected everyone to be back in the office at 8:00AM).
About a month later, the team scheduled another 3:00AM deployment (VP was present again), confident that introducing mocking into their testing pipeline would fix any database related errors.
3:00AM - Team starts their deployments.
3:30AM - No major errors, things seem to be going well. High fives, cheers..manager tells everyone to head home.
3:35AM - Site crashes, like white page, no response from the servers kind of crash. Resetting IIS on the servers works, but only for around 10 minutes or so.
4:00AM - Team rolls back, manager is clearly pissed at this point, "Nobody is going fucking home until we figure this out!!"
6:00AM - Diagnostics found the WCF client was causing the server to run out of resources: a mix of clogging up server bandwidth and a sprinkle of N+1 scaling problems (roughly the anti-pattern sketched below). Manager lets everyone go home, but everyone had to be back in the office at 8:00AM to develop a plan so this *never* happens again.
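(For the curious, here's a minimal sketch of the kind of N+1 / never-closed-channel pattern being described. The contract, proxy and endpoint names are my own hypothetical stand-ins, not the team's actual code.)

using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

// Hypothetical contract: the real service returned description, price,
// availability, etc. for a product ID.
[ServiceContract]
public interface IProductService
{
    [OperationContract]
    Product GetProduct(int id);
}

[DataContract]
public class Product
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Description { get; set; }
}

public static class ProductLoader
{
    // Anti-pattern: one brand-new channel and one SOAP round trip per product
    // ID (N+1), and nothing is ever closed, so connections and server
    // resources pile up until the site keels over.
    public static List<Product> LoadProducts(IEnumerable<int> ids, string endpointUrl)
    {
        var results = new List<Product>();
        foreach (var id in ids)
        {
            var factory = new ChannelFactory<IProductService>(
                new BasicHttpBinding(), endpointUrl);
            IProductService client = factory.CreateChannel();
            results.Add(client.GetProduct(id)); // N separate calls
            // factory/channel never closed -> leaked connections
        }
        return results;
    }
}

Batching the IDs into one call and reusing (and properly closing) a single ChannelFactory is the usual cure; I can't say that's exactly what bit them, but the symptoms line up.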
About 2 months later, a 'real' development+integration environment was finally stood up (previously, any and all integration tests ran on the developers' machines), and the team scheduled a 6:00AM deployment, but at a much, much smaller scale, with just the 3 development team members.
Why? Because the manager had 'frozen' changes to the ASPX service while the web team still needed various enhancements, so they bypassed the service (not using the ASPX service at all), wrote their own SQL scripts that hit the database directly, and utilized AppFabric/Velocity caching to allow the site to scale. There were only a couple of client applications using the ASPX service that needed to be converted, so deploying at 6:00AM gave everyone a couple of hours before users got into the office. Service deployed, worked like a champ.
A week later the VP schedules a celebration for the successful migration to WCF. Pizza, cake, the works. The 3 team members received awards (and an envelope, which probably contained some $$$) and the entire team received a custom Benchmade pocket knife to remember this project's success. Several others and I just stared at each other, not knowing what to say.
Later, my manager pulls several of us into a conference room
Me: "What the hell? This is one of the biggest failures I've been apart of. We got rewarded for thousands and thousands of dollars of wasted time."
<others expressed the same, expletive-laden sentiments>
Mgr: "I know..I know...but that's the story we have to stick with. If the company realizes what a fucking mess this is, we could all be fired."
Me: "What?!! All of us?!"
Mgr: "Well, shit rolls downhill. Dept-Mgr-John is ready to fire anyone he felt could make him look bad, which is why I pulled you guys in here. The other sheep out there will go along with anything he says and more than happy to throw you under the bus. Keep your head down until this blows over. Say nothing."11 -
Our web department was deploying a fairly large sales campaign (equivalent to a ‘Black Friday’ for us), and the day before, at 4:00PM, one of the devs emails us and asks “Hey, just a heads up, the main sales page takes almost 30 seconds to load. Any chance you could find out why? Thanks!”
We click the URL they sent, and sure enough, 30 seconds on the dot.
Our department manager almost fell out of his chair (a few ‘F’ bombs were thrown).
DBAs sit next door, so he shouts…
Mgr: ”Hey, did you know the new sales page is taking 30 seconds to open!?”
DBA: “Yea, but it’s not the database. Are you just now hearing about this? They have had performance problems for over a week now. Our traces show it’s something on their end.”
Mgr: “-bleep- no!”
Mgr tries to get a hold of anyone …no one is answering the phone..so he leaves to find someone…anyone with authority.
4:15 he comes back..
Mgr: “-bleep- All the web managers were in a meeting. I had to interrupt and ask if they knew about the performance problem.”
Me: “Oh crap. I assume they didn’t know or they wouldn’t be in a meeting.”
Mgr: “-bleep- no! No one knew. Apparently the only ones who knew were the 3 developers and the DBA!”
Me: “Uh…what exactly do they want us to do?”
Mgr: “The –bleep- if I know!”
Me: “Are there any load tests we could use for the staging servers? Maybe it’s only the developer servers.”
DBA: “No, just those 3 developers testing. They could reproduce the slowness on staging, so no need for the load tests.”
Mgr: “Oh my –bleep-ing God!”
4:30 ..one of the vice presidents comes into our area…
VP: “So, do we know what the problem is? John tells me you guys are fixing the problem.”
Mgr: “No, we just heard about the problem half hour ago. DBAs said the database side is fine and the traces look like the bottleneck is on web side of things.”
VP: “Hmm, no, John said the problem is the caching. Aren’t you responsible for that?”
Mgr: “Uh…um…yea, but I don’t think anyone knows what the problem is yet.”
VP: “Well, get the caching problem fixed as soon as possible. Our sales numbers this year hinge on the deployment tomorrow.”
- VP leaves -
Me: “I looked at the cache, it’s fine. Their traffic is barely a blip. How much do you want to bet they have a bug or a mistyped url in their javascript? A consistent 30 second load time is suspiciously indicative of a timeout somewhere.”
Mgr: “I was thinking the same thing. I’ll have networking run a trace.”
4:45 Networking ran their trace, and sure enough, there was some relative path of ‘something’ pointing to a local resource not on development; it was waiting/timing out after 30 seconds. Fixed the path and the page loaded instantaneously. Network admin walks over..
NetworkAdmin: “We had no idea they were having problems. If they told us last week, we could have identified the issue. Did anyone else think 30 second load time was a bit suspicious?”
4:50 VP walks in (“John” is the web team manager)..
VP: “John said the caching issue is fixed. Great job everyone.”
Mgr: “It wasn’t the caching, it was a mistyped resource or something in a javascript file.”
VP: “But the caching is fixed? Right? John said it was caching. Anyway, great job everyone. We’re going to have a great day tomorrow!”
VP leaves
NetworkAdmin: “Ouch…you feel that?”
Me: “Feel what?”
NetworkAdmin: “That bus John just threw us under.”
Mgr: “Yea, but I think John just saved 3 jobs. Remember that.”
-
My company has two offices in separate cities, but they treat the devs at each location very, very differently.
In one office the devs get full power to experiment with whatever tech they want. They just stomp their feet and management gives 'em whatever they ask for: freedom of choice regarding anything they are working on, permission to do greenfield or experimental work.
But in my office we are forced to do ONLY bug fixing and refactoring of shitty code from over a decade ago. Our tech is ancient and we are not allowed to do shit; anything we ask for is denied, and any improvement to our process is shut down with the reasoning that whatever we've got works, so why meddle??
For us, management is solely focused on making sure we respond to support calls, deployments, configurations, and little bug fixes. Basically, they only care that we manage to finish in time for our next delivery.
No new work whatsoever!
If there is any hint of something new to be implemented, the golden boys from the other office just stomp their feet till they get it, or just go off and start working on it and seek permission afterwards. With their much larger team, they obviously get further than we do by the time management hears about it, so they end up taking over the work since they already have more done.
My manager decided to push us to attend a company devCon to share ideas with the devs from our other location. This rapidly turned into a sour experience.
Basically we do all the shitty, boring work that puts money on the table, and that money goes straight to those idiots to play with...
They had the guts to laugh when we mentioned that we never get anything interesting to work on.
Never seen so many of our devs looking up job sites on the bus back...
This is gonna blow up in management's face...
-
Most painful code error you've made?
More than I probably care to count.
One in particular: I was asked to integrate our code and converted the wrong value. Ex:
The correct code was supposed to be ...
var serviceBusMessage = new Message() {ID = dto.InvoiceId ...}
but I wrote ..
var serviceBusMessage = new Message() {ID = dto.OrderId ...}
At the time of the message bus event, the dto.OrderId is zero (it's set after a successful credit card transaction in another process)
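(In hindsight, a simple guard plus a tiny test would have flagged the zero key before it ever hit the bus. Here's a minimal sketch, using simplified stand-in types rather than the real DTO/message classes:)

using System;

// Simplified stand-ins for the real types in this story.
public class InvoiceDto
{
    public int InvoiceId { get; set; }
    public int OrderId { get; set; } // still 0 until the card transaction completes
}

public class Message
{
    public int ID { get; set; }
}

public static class InvoiceMessages
{
    public static Message Build(InvoiceDto dto)
    {
        // Refuse to publish a zero key -- exactly the symptom that sat
        // unnoticed for weeks until the reprocessing run got stuck.
        if (dto.InvoiceId == 0)
            throw new InvalidOperationException("InvoiceId not set; refusing to publish a zero message key.");

        return new Message { ID = dto.InvoiceId }; // InvoiceId, not OrderId
    }

    // The "easy addition to the tests": an invoice whose OrderId is still 0
    // must produce a message keyed by its InvoiceId.
    public static void MessageIsKeyedByInvoiceId()
    {
        var dto = new InvoiceDto { InvoiceId = 42, OrderId = 0 };
        var msg = Build(dto);
        if (msg.ID != 42) throw new Exception("Message key must be the InvoiceId.");
    }
}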
Because of a 'true up' job that occurs at EOD, the issue went unnoticed for weeks. One day the credit card system went down and thousands of invoices needed to be re-processed, but seemed to be 'stuck', and 'John' was tasked to investigate, found the issue, and traced back to the code changes.
John: "There is a bug in the event bus, looks like you used the wrong key and all the keys are zero."
Me: "Oh crap, I made that change weeks ago. No one noticed?"
John: "Nah, its not a big deal. The true-up job cleans up anything we missed and in the rare event the credit card system goes down, like now. No worries, I can fix the data and the code."
<about an hour later I'm called into a meeting>
Mgr1: "We're following up on the credit card outage earlier. You made the code changes that prevented the cards from reprocessing?"
Me: "Yes, it was my screw up."
Mgr1: "Why wasn't there a code review? It should have caught this mistake."
Mgr2: "All code that is deployed is reviewed. 'Tom' performed the review."
Mgr1: "Tom, why didn't you catch that mistake."
Tom: "I don't know, that code is over 5 years old written by someone else. I assumed it was correct."
Mgr1: "Aren't there unit tests? Integration tests?"
Tom: "Oh yea, and passed them all. In the scenario, the original developers probably never thought the wrong ID would be passed."
Mgr1: "What are you going to do so this never happens again?"
Tom: "Its an easy addition to the tests. Should only take 5 minutes."
Mgr1: "No, what are *you* going to do so this never happens again?"
Me: "It was my mistake, I need to do a better job in paying attention. I knew what value was supposed to passed, but I screwed up."
Mgr2: "No harm no foul. We didn't lose any money and no customer was negativity affected. Credit card system may go down once, or twice a year? Nothing to lose sleep over. Thanks guys."
A week later Mgr1 fires Tom.
I feel/felt like a total d-bag.
Talking to 'John' later about it, it turns out Tom's attention to detail and 'passion' were lacking in other areas. Understandable, since he has 2 kids + one with special needs, and is in the middle of a divorce, taking most/all of his vacation + sick time ('Mgr1' dislikes people taking more than a few days off, but that's another story), and 'Mgr1' didn't like Tom's lack of work ethic (felt he needed to leave his problems at home). The outage and the 'lack of due diligence' were the last straw.
-
Do you know guys why a programming bug is called so? It's because the very first time a piece of software crashed, it was because of a bug (a real bug stuck in a bus on the machine!) that caused it 😂 Imagine if something else was stuck instead! Like someone's finger: hey, I fixed that finger, but still got 2 critical fingers and 4 small ones.
-
I just don't get the WordPress hate or CMS hate in general. Using these is not perfect, but neither is _anyone's_ code. Get over that and be more productive for your client. Unless you're the best coder the world has ever seen, and you're _always_ available to push content for an organization of 90 or 900 or 9,000 people, nobody CARES about your "coding purity". They want a website that they can still operate if your ass gets hit by a bus. Don't like WP? Find a CMS that ticks most of the boxes for your client's needs. If you have the time, budget, and long-term inclination to provide bug fixes for it, write your own Awesomesauce Custom CMS(TM) and release it to the open source community so we can finally replace WordPress with the next best thing.
Otherwise, launch site, get check. Repeat until you can retire.
-
After three months of development, my first contribution to the client is going live on their servers in less than 12 hours. And let me say, I shall never again be doing that much programming in one go, because the last week and a half has been a nightmare... Where to begin...
So last Monday, my code was passed to our testing servers for QA to review and give its seal of approval. But the server was acting up and wouldn't let us do much, giving us tons of timeouts and other errors, so we reported it to the sysadmin and had to put off the testing.
Now that's all fine and dandy, but last Wednesday we had to prepare the release for 4 days of regression testing on our staging servers, which meant that by Wednesday night the code had to be greenlight by QA. Tuesday the sysadmin was unable to check the problem on our testing servers, so we had to wait to Wednesday.
Wednesday comes along. I'm patching a couple of things I saw, and around lunch time we deploy to the testing servers. I launch our fancy new Postman tests, which pass in local, and I get a bunch of errors. Partially my code's fault, partially the testing env manipulating server responses and systems failing.
Fifteen minutes before I leave work on the day we have to leave everything ready to pass to staging, I find another bug, which is not really something I can ignore. My typing skills go to work as I'm hammering line after line of code out, trying to get it finished so we can deploy and test when I get home. Done just in time to catch the bus home...
So I get home. Run the tests. Still a couple failures due to the bug I tried to resolve. We ask for an extension till the following morning, thus delaying our deployment to staging. Eight hours later, at 1AM, after working a full 8 hours before, I push my code and leave it ready for deployment the following morning. Finally, everything works and we can get our code up to staging. Tests had to be modified to accommodate the shitty testing environment, but I'm happy that we're finally done there.
Staging server shits itself for half a day, so we end up doing regression tests a full day late, without a change in date for our upload to production (yay...).
We get to staging, I run my tests, all green, all working, so happy. I keep working on other stuff, and on the day we were slated to upload to production, my coworkers find that throughout the development (which included a huge migration), code was removed which should not have been. Team panics. Everyone is reviewing my commits (over a hundred commits), trying to see what we're missing that is required (especially legal requirements). Upload to production is delayed one day because of this. It ended up being one missing class and a couple lines of code, which is my bad (but seriously, not bad considering I'm a Junior who was handed this project as his first task at his first job).
I swear to God, from here on out, one feature per branch and merge request. Never again shall I let this happen. I don't even know why it was allowed to happen; it breaks our branch policies. But oh hell... I will now personally oppose crap like this too...
Now if you'll excuse me... I'm going to be highly unproductive and rest, because I might start balding otherwise after these weeks...
-
Found a bug today that made me groan in frustration.
It appears that the official elasticsearch debian package checks whether the system's init daemon is systemd by... checking whether the systemctl binary is available.
Issue is... systems might contain that binary while using a different init, as the binary is part of the "systemd" package.
To actually switch to systemd, however, the package systemd-sysv has to be installed, which creates a link from /sbin/init to systemd's main executable.
What happens when your system doesn't use systemd, then? The postinstall/preremove scripts fail as systemctl fails to talk to the system bus, and thus the installation is marked as failed!
Oversights like this are exactly the reason behind my systemd dislike. We never wanted the systemd package, but another key package suddenly added it as a dependency one day...
Now to see if this is reported as a bug already and, if not, to report it myself...
(Also, who checks for init by looking for the init's management utility?! It's like checking whether sysvinit is installed by checking whether update-rc.d is installed!
And it's not like figuring out the system's init daemon is hard anyway! Just check /sbin/init, or, better yet, check which process is running as PID 1!)