Search - "testing in production"
-
Hey, Root? How do you test your slow query ticket, again? I didn't bother reading the giant green "Testing notes:" box on the ticket. Yeah, could you explain it while I don't bother to listen and talk over you? Thanks.
And later:
Hey Root. I'm the DBA. Could you explain exactly what you're doing in this ticket, because I can't understand it. What are these new columns? Where is the new query? What are you doing? And why? Oh, the ticket? Yeah, I didn't bother to read it. There was too much text filled with things like implementation details, query optimization findings, overall benchmarking results, the purpose of the new columns, and I just couldn't care enough to read any of that. Yeah, I also don't know how to find the query it's running now. Yep, I have complete access to the console and DB and query log. Still can't figure it out.
And later:
Hey Root. We pulled your urgent fix ticket from the release. You know, the one that SysOps and Data and even execs have been demanding? The one you finished three months ago? Yep, the problem is still taking down production every week or so, but we just can't verify that your fix is good enough. Even though the changes are pretty minimal, you've said it's 8x faster, and you provided benchmark findings, we just ... don't know how to get the query it's running out of the code. Or how to check the query logs to find it. So, we just don't know if it's good enough.
Also, we goofed up when deploying and the testing database is gone, so now we can't test it since there are no records. Nevermind that you provided snippets to remedy exactly this scenario in the ticket description you wrote three months ago.
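For the record, digging the running query out of the logs is not arcana. A minimal sketch, assuming MySQL with the slow query log available (the paths, threshold, and example query are made up for illustration):

    # log anything slower than half a second (normally set in my.cnf)
    mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 0.5;"
    # summarize the worst offenders in the log, sorted by query time
    mysqldumpslow -s t -t 10 /var/lib/mysql/db-host-slow.log
    # then confirm the fix's query plan actually uses the new columns/indexes
    mysql -e "EXPLAIN SELECT id FROM return_requests WHERE status = 'open';"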
And later:
Hey Root: Why did you take so long on this ticket? It has sat for so long now that someone else filed a ticket for it, with investigation findings. You know it's bringing down production, and it's kind of urgent. Maybe you should have prioritized it more, or written up better notes. You really need to communicate better. This is why we can't trust you to get things out.
*twitchy smile*
-
So my department is "integrating CI/CD"
Right now, there's a very anti-automation culture in the deployment process, and out of our many applications, almost none have automated testing. And my group is the only one that uses feature branching - one of the few groups that uses branching at all beyond "master, dev".
So yeah... You could see how this is already ENTIRELY fucked from the very beginning.
First thing they want to do is add better support for a process... Which goes directly against CI/CD.
The process is that to deploy to production (even after it is manually approved by manager), someone in another department needs to press a button to manually deploy. This, as far as I can tell, is for business rule reasons rather than technical ones.
They want us to improve that (the system will stay exactly the same, with some streamlined options for said button pressers).
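For what it's worth, the button doesn't have to fight CI/CD: most CI systems let you keep a human approval gate inside the pipeline itself. A sketch in GitLab CI syntax (the job name and deploy script are made up), assuming an in-pipeline gate would satisfy the business rule:

    deploy_production:
      stage: deploy
      script:
        - ./deploy.sh production   # hypothetical deploy script
      when: manual                 # the "button press", kept inside the pipeline
      environment: production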
I'm absolutely astounded at the way our management wants to do something but goes in exactly the opposite direction. It's like they found an article on what CI/CD was and then took notes on exactly what not to do.
-
If you're going to request CRITICAL changes to thousands of records in the database, approve them through testing done on an exact replica of production, and then tell me it was done incorrectly after it has been implemented, when you didn't actually review the changes to the data or business logic that you requested, then you are an idiot. Our staging environment is there to ensure all the changes are accurate, you useless human. It's the data you provided; I didn't just magically pull it from thin air to make your job and mine a pain in the ass.
-
Worst collaboration experience story?
I was not directly involved, it was a Delphi -> C# conversion of our customer returns application.
The dev manager was out to prove waterfall was the only development methodology that could convert the monolith app into a lean, multi-tier, enterprise-worthy application.
Starting out with a team of 7 (3 devs, 2 DBAs, the team mgr, and the dev department mgr), they spent around 3 months on design, meetings, and more meetings. Armed with a 50+ page specification Word document (not counting the countless Visio workflow diagrams and Microsoft Project timelines/Gantt charts), the team was ready to start coding.
The database design, workflow, and UI design (using Visio) were well done/thought out, but problems started on day one.
- The team mgr and dev mgr split up the 3 devs: one dev wrote the database access library tier, one wrote the service tier, and the other dev wrote the UI (I'll add this was that dev's first experience with WPF).
- Per the specification, none of the layers would be integrated until all of them met the standards (unit tested, free of errors from VS's code analyzer, etc.)
- By the time the devs were ready to code, the DBAs were already tasked with other projects, so the Returns app was prioritized to "when we get around to it"
Fast forward 6 months: all the devs were 'done' coding, having had very little/no communication with one another, and then came the integration. The service and database layers assumed different design patterns and different database relationships, and the UI layer required functionality neither layer anticipated (e.g. multiple users and the service maintaining some sort of state between them).
Those issues took about a month to work out, then the app began beta testing with real end users. The app didn't make it 10 minutes before users gave up. Numerous UI logic errors, runtime errors, and overall app instability. Because the UI was so bad, the dev mgr brought in one of the web developers (she was pretty good at UI design). You might guess how useful someone is when dropped into a complex project months after the fact and told "Fix it!".
After a couple of months of UI re-design and many other changes, the app was ready for beta testing again.
In the meantime, the company hired a new customer service manager. When he saw the application, he rejected it because he had re-designed the entire returns process to be more efficient. The application UI was written to the exact step-by-step old returns process with little/no deviation.
With a tremendous amount of push-back (TL;DR), the dev mgr promised to change the app, but only after it was deployed into production (using the "we can fix it later" excuse).
Still plagued with numerous bugs, the app was finally deployed. In attempts to save face, there was a company-wide party to celebrate the 'death' of the "old Delphi returns app" and the birth of the new. Cake, drinks, certificates of achievements for the devs, etc.
By the end of the project, the devs hated each other. Finger pointing, petty squabbles, outright "FU!"s across the cube walls, etc. All the team members were re-assigned to other teams to separate them, leaving a single new hire to fix all the issues.
-
I was asked to look into a site I haven't actively developed in about 3-4 years. It should be a simple side-gig.
I was told this site has been actively developed by the person who came after me, and this person had a few other people help out as well.
The most daunting task in my head was to go through their changes and see why stuff is broken (I was told functionality had been removed, things were changed for the worse, etc etc).
I ssh into the machine and it works. For SOME reason I still have access, which is a good thing since there's literally nobody to ask for access at the moment.
I cd into the project, do a git remote get-url origin to see if they've changed the repo location. Doesn't work. There is no origin. It's "upstream" now. Ok, no biggie. git remote get-url upstream. Repo is still there. Good.
Just to check, see if there's anything untracked with git status. Nothing. Good.
What was the last thing that was worked on? git log --all --decorate --oneline --graph. Wait... Something about the commit message seems familiar. git log. .... This is *my* last commit message. The hell?
I open the repo in the browser, login with some credentials my browser had saved (again, good because I have no clue about the password). Repo hasn't gotten a commit since mine. That can't be right.
Check branches. Oh....Like a dozen new branches. Lots of commits with text that is really not helpful at all. Looks like they were trying to set up a pipeline and testing it out over and over again.
A lot of other changes including the deletion of a database config and schema changes. 0 tests. Doesn't seem like these changes were ever in production.
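For anyone inheriting a repo like this, the whole triage fits in a handful of commands (a sketch; the remote and branch names here are placeholders):

    git remote -v                                  # where does this repo actually live now?
    git status                                     # anything dirty or untracked?
    git log --all --decorate --oneline --graph     # full history across every branch
    git branch -r --sort=-committerdate            # remote branches, newest first
    git log master..upstream/pipeline-test --oneline   # hypothetical branch: what's ahead of master?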
...
At least I don't have to rack my brain trying to understand someone else's code, but... I might just have to throw everything that was done into the garbage. I'm not gonna be the one to push all these changes I don't know about to prod and see what breaks and what doesn't.
I feel bad for whoever worked on the codebase after me, because all their changes are now just a waste of time and space that will never be used.
-
Startup-ing 101, from Fitbit:
- spy on users
- sell data
- cut production costs
- mutilate people's bodies, leaving burn scars that will never heal
- announce the recall, get PR, and make the refund process impossibly convoluted
- never give actual refunds
- claim that yes, fitbit catches fire, but only the old discontinued device, just to mess with search results and make the actual info (that all devices catch fire) hard to find
- try hard to obtain the devices in question, so people who suffered have no evidence
- give bogus word salad replies to the press
This is what one of the people burned has to say:
"I do not have feeling in parts of my wrist due to nerve damage and I will have a large scar that will be with me the rest of my life. This was a traumatic experience and I hope no one else has to go through it. So, if you own a Fitbit, please reconsider using it."
Ladies and gentlemen, the cringefest starts. One of Fitbit's replies:
"Fitbit products are designed and produced in accordance with strict standards and undergo extensive internal and external testing to ensure the safety of our users. Based on our internal and independent third party testing and analysis, we do not believe this type of injury could occur from normal use. We are committed to conducting a full investigation. With Google's resources and global platform, Fitbit will be able to accelerate innovation in the wearables category, scale faster, and make health even more accessible to everyone. I could not be more excited for what lies ahead".
In the future, corporate speech will be autogenerated.
(If you wear a Fitbit, just be aware of this.)
-
Continuing on the PR ranting. That's six PRs in the queue now. The most important one took over a week to sort through because someone had a grand idea to generalize the work, and that required a ton of changes. Great to receive that kind of feedback AFTER I've already done the work... Then we waited on translations because can't miss those! Now the tests are failing. Not caused by me! Someone messed up the E2E tests. That's two weeks in the queue now.
Because one test is failing, that's a block across all the PR's. All of them have now been waiting multiple days in the queue. After someone manages to fix the broken test, all PR's go one by one into the QA testing queue and that takes 1-2 hours per branch to test and deploy to production.
Really love having a cycle time of four days for coloring a button differently! And the constant context switching.
-
Posted in DevOps discussion board (teams channel):
“Program x isn’t behaving the same way that it does on production. Can you please take a look?”
...a little background: we have a deployment scheduled for today and this issue was found during regression testing.
The issue found is that when a file is clicked on, it disappears from the screen and then isn't opened…
The file is not on prem, and doesn’t get uploaded to a server that our DevOps team owns…
So why on earth would this development team be asking DevOps to look into a bug that is most likely a code related issue? 😆
Is this a common occurrence for anyone else?
A bug is found, and the first thought is that the code isn't the issue?
-
This is the first project where I've worked with a separate QA department, and it is pain. It's just handing out work and waiting for hours for them to finish testing a very small thing, and you just have to sit there, waiting, because context switching will end up messing something up. Then you get a failure because you didn't replace an icon that was in the design spec just as a placeholder, and you can't move forward because you need another icon too that was not included in the spec. So here we are, sitting and waiting for the designer to return from their holiday so we can have their feedback.
When I asked if we could just skip the icon and push to production, then wait for feedback and do another iteration if needed, the answer was a firm NO. It must be completed in this one test cycle because they're too lazy to do another test run.
Fuck you and your waterfall working methods!
-
Best:
Seeing ALL the members of my team finally coming into their own. One person tackled our entire not-at-all-simple CI/CD setup from scratch knowing nothing about any of it and, while not without bumps in the road, did an excellent job overall (and then did the same for some other projects since he found himself being the SME). Two of my more junior people took on some difficult tasks that required them to design and build some tricky features from the ground-up, rather than me giving them a ton of guidance, design and even a start on the basic code early on (I just gave them some general descriptions of what I was looking for and then let them run with it). Again, not without some hiccups, but they ultimately delivered and learned a lot in the process and, I think, gained a new sense of self-confidence, which to me is the real win. And my other person handled some tricky high-level stuff that got him deep in the weeds of all the corporate procedures I'd normally shield them all from and did very well with it (and like the other person, wound up being an SME and doing it for some other projects after that). It took a while to get here, but I finally feel like I don't need to do all the really difficult stuff myself, I can count on them now, and they, I think, no longer feel like they're in over their heads if I throw something difficult at them.
Worst:
A few critical bugs slipped into production this year, with a few requiring some after-hours heroics to deal with (and, unfortunately, due to the timing, it all fell on me). Of course, that just tells us that next year we really need to focus on more robust automated testing (though, in reality, at least one of the issues almost certainly would not - COULD NOT - have been caught beforehand anyway, and that's probably true for more than just one of them). We had avoided major issues the previous three years we've been live, so this was unusual. Then again, it's in a way a symptom of success, because with more users and more usage, both of which exploded this year, typically come more discovered issues, so I guess that tempers the bad just a little bit.
-
Spent a couple of weeks writing a cronjob that updates a certain value in the application config, and spent the last few months testing it in different environments to make sure it doesn't fail in production. Ran the deployment script, and the damn cronjob fails because of the SSL certificate on production. Fuck me.
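Moral of the story: if the job talks TLS, make staging's cert chain match production's, or at least smoke-test against production's CA before shipping. A sketch of the kind of job involved (the endpoint, CA path, and config key are all made up):

    #!/bin/sh
    # nightly job: fetch the current value and patch the app config
    # --cacert pins the CA bundle; this is exactly the part that blew up in prod
    VALUE=$(curl --fail --silent --cacert /etc/ssl/certs/internal-ca.pem \
      https://config.internal.example.com/v1/rate-limit) || exit 1
    sed -i "s/^rate_limit=.*/rate_limit=${VALUE}/" /etc/myapp/app.conf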
-
Lately, as I've been reading about hacks at companies like Uber, Rockstar, etc., I've come to remember my time at a company that worked like a feature factory, where developers constantly groaned about not having enough time to do things right. This had led to a few dozen people (on both the business and technology side) having direct access to the production database, just so they could do their jobs, and that database was replicated across various environments just so they could develop on top of it and do "proper testing".
The bad side is, this database contained the personal information of millions of people (names, addresses, SSNs...). I saw this security problem and always spoke out about it, but there was never enough time to do anything about it, like building the features the users need to do their jobs without direct access to the database. The app itself was a monolithic nightmare with poor development standards and holes here and there.
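The stopgap doesn't even require building all those features first; most databases can scope access down to specific columns. A minimal sketch of what "enough access to do your job" could look like instead (PostgreSQL syntax; the role, table, and column names are made up):

    # create a support role that can look up records without ever seeing PII
    psql -d app <<'SQL'
    CREATE ROLE support_ro LOGIN PASSWORD 'changeme' NOINHERIT;
    GRANT CONNECT ON DATABASE app TO support_ro;
    GRANT USAGE ON SCHEMA public TO support_ro;
    -- column-level grant: no names, addresses, or SSNs
    GRANT SELECT (id, status, created_at) ON orders TO support_ro;
    SQL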
It only takes one person whose credentials get compromised to bring that entire business down, because all the data resides in one single database. However, they are lucky that there were at least smart people in the Ops department, so they do have some good security measures in place on that side, because the development side was complete shit. Don't go slacking on security!
-
Quick question for you all: how do you deal with a problem in production that you cannot fix, even over an extended period of time (say 2 months)?
For context, I feel like I'm losing my sanity here. We've had this problem on our production API since the beginning of March this year. I've done so much testing and gotten in contact with various teams at my company to try to figure out any potential candidate that would explain the bug, but none worked out. Needless to say, I've spent a considerable amount of time searching the internet for others with the same problem or similar… We've even opened a ticket with the cloud host to see if they would have more details about the problem, without success. So how do you deal with that?
For context I feel like I’m losing my sanity here, we’ve had this problem on our production API since the beginning of March this year. I’ve done so much testing, got in contact with various teams of my company to try to figure out any potential candidate that would explain the bug, but none worked out. No need to say I’ve spent a considerable amount of time searching on the internet for others with the same problem or similar… We’ve even opened a ticket with the cloud host to see if they would have more details about the problem without success. So how do you deal with that ?5