Search - "testing on prod"
-
I worked with a good dev at one of my previous jobs, but one of his faults was that he was a bit scattered and would sometimes forget things.
The story goes that one day we had this massive bug on our web app and we had a large portion of our dev team trying to figure it out. We thought we had narrowed down the issue to a very specific part of the code, but something weird happened. No matter how often we looked at the piece of code where we all knew the problem had to be, no one could see any problem with it. And there wasn't anything close to explaining how we could be seeing the issue we were seeing in production.
We spent hours going through this. It was driving everyone crazy. All of a sudden, my co-worker (the one referenced above) gasps “oh shit.” And we’re all like, what’s up? He proceeds to tell us that he thinks he might have been testing a line of code on one of our prod servers and left it in there by accident, never committing it into the actual codebase. Just to explain this - we had a great deploy process at this company, but every so often a dev would need to test something quickly on a prod machine, so we’d allow it as long as they did it and removed it quickly. It was meant for a select few tasks that required a prod server and was only ever supposed to be a single line to test something. Bad practice, but it was fine because everyone had been extremely careful with it.
Until this guy came along. After he said he thought he might have left a line change in the code on a prod server, we had to manually go in to 12 web servers and check. Eventually, we found the one that had the change and finally, the issue at hand made sense. We never thought for a second that the committed code in the git repo that we were looking at would be inaccurate.
Needless to say, he was never allowed to touch code on a prod server ever again.
-
Doot doot.
My day: Eight lines of refactoring around a 10-character fix for a minor production issue. Some tests. Lots of bloody phone calls and conference calls filled with me laughing and getting talked over. Why? Read on.
My boss's day: Trying very very hard to pin random shit on me (and failing because I'm awesome and fuck him). Six hours of drama and freaking out and chewing and yelling that the whole system is broken because of that minor issue. No reading, lots of misunderstanding, lots of panic. Three-way called me specifically to bitch out another coworker in front of me. (Coworker wasn't really in the wrong.) Called a contractor to his house for testing. Finally learned that everything works perfectly in QA (duh, I fixed it hours ago). Desperately waited for me to push to prod. Didn't care enough to do production tests afterwards.
My day afterwards: hey, this Cloudinary transform feature sounds fun! Oh look, I'm done already. Boo. Ask boss for update. Tests still aren't finished. Okay, whatever. Time for bed.
what a joke.
Oh, I talked to the accountant after all of this bullshit happened. Apparently everyone that has quit in the last six years has done so specifically because of the boss. Every. single. person.
I told him it was going to happen again.
I also told him the boss is a druggie with a taste for psychedelics. (It came up in conversation. Absolutely true, too.) It's hilarious because the company lawyer is the accountant's brother.
So stupid.
-
Welcome back to practiseSafeHex's new life as a manager.
Episode 2: Why automate when you can spend all day doing it by hand
This is a particularly special episode for me, as these problems are taking up so much of my time with nonsensical bullshit that I'm delayed with everything else. Some badly require tooling or new products. Some are just unnecessary processes or annoyances that should not need to be handled by another human. So let's jump right in, in no particular order:
- Jira ... nuff said? not quite because somehow some blue moon, planets aligning, act of god style set of circumstances lined up to allow this team to somehow make Jira worse. On one hand we have a gigantic Jira project containing 7 separate sub teams, a million different labels / epics and 4.2 million possible assignees, all making sure the loading page takes as long as possible to open. But the new country we've added support for in the app gets a separate project. So we have product, backend, mobile, design, management etc on one, and mobile-country2 on another. This delightfully means a lot of duplication and copy pasting from one to the other, for literally no reason what so ever.
- Everything on Jira is found through a label. Every time something happens, a new one is created. So I need to check for "iOS", "Android", "iOS-country2", "Android-country2", "mobile-<feature>", "mobile-<feature>-issues", "mobile-<feature>-prod-issues", "mobile-<feature>-existing-issues" and "<project>-July31" ... why July31? Because some fucking moron decided to do a round of testing and tag all the issues with the current date (despite the fact Jira does that anyway), which somehow still gets used from time to time because nobody pays attention to what they are doing. This means creating and modifying filters on a daily basis ... after spending time trying to figure out why something isn't in the first one.
- One of my favourite morning rituals I like to call "Jira dumpster diving". This involves me removing all the filters and reading all the tickets. Why would I do such a thing? oh remember the 9000 labels I mentioned earlier? right well its very likely that they actually won't use any of them ... or the wrong ones ... or assign to the wrong person, so I have to go find them and fix them. If I don't, i'll get yelled at, because clearly it's my fault.
- Moving on from Jira. As some of you might have seen in your companies, if you use things like TestFlight, HockeyApp, AppCenter, BuddyBuild etc. that when you release a new app version for testing, each version comes with an automated change-log, listing ticket numbers addressed ...... yeah we don't do that. No we use this shitty service, which is effectively an FTP server and a webpage, that only allows you to host the new versions. Sending out those emails is all manual ... distribution groups?? ... whats that?
- Moving back to Jira. Can't even automate the changelog with a script, because I can't even make sense of the tickets, in order to translate that to a script.
- Moving on from Jira. Me and one of the remote testers play this great game I like to call "tag team ticketing". It's so much fun. Right, here's how to play: you'll need a QA and a PM.
*QA creates a ticket, and puts nothing of any use inside it, and assigns to the PM.
*PM fires it back asking for clarification.
*QA adds in what he feels is clarification (hes wrong) and assigns it back to the PM.
*PM sends detailed instructions, with examples as to what is needed and assigns it back.
*QA adds 1 of the 3 things required and assigns it back.
*PM assigns it back saying the one thing added is from the wrong day, and reminds him about the other 2 items.
*QA adds some random piece of unrelated info to the ticket instead, forgetting about the 3 things and assigns it back.
and you just continue doing this for the whole dev / release cycle hahaha. Oh you guys have no idea how much fun it is, seriously give it a go, you'll thank me later ... or kill yourselves, each to their own.
- Moving back to Jira. I decided to take an action of creating a new project for my team (the mobile team) and set it up the way we want and just ignore everything going on around us. Use proper automation, and a kanban board. Maybe only give product a slack bot interface that won't allow them to create a ticket without what we need etc. Spent 25 minutes looking for the "create new project" button before finding the link which says I need to open a ticket with support and wait ... 5 ... fucking ... long ... painful ... unnecessary ... business days.
... Here's hoping my head continues to not have a bullet hole in it by then.
Id love to talk more, but those filters ain't gonna fix themselves. So we'll have to leave it here for today. Tune in again for another episode soon.
And remember to always practiseSafeHex.
-
Friend of mine killed his MacBook with some soft drink.
Just poured it all over his poor a1502.
He let it dry for a few days, it starts to work again.
Except the battery.
Goes on Amazon and buys a new battery.
New battery doesn't work either and so he tells me about it, and I, as stupid as I am, couldn't resist the temptation to finally work on a MacBook like my "hero" Louis Rossmann does.
So turns out the board is good.
Cleaned it up and basically nothing happened to it.
So what's the deal with "los batlerias"?
The first got hit by liquid, the second had a broken connection to a cell.
That could have happened on my friend's end, installing it without testing it first, or at the seller's, it being a DOA battery.
Now away from the stupidity of my friend and the situation to the actual source for this rant.
Once something happens to a modern Managed battery, the Battery Management System (BMS) disconnects the voltage from the system and goes into an error state, staying there and not powering anything ever again.
For noobs, it's dead. Buy a new one.
But it can be reset, provided you know how to, and which passwords were set at the factory.
Yes, the common Texas Instruments BQ20Zxx chips have default passwords, and Apple seems to leave them at default.
The USB to SMBus adaptors arrived a few days ago and I went to prod the BMS.
There is a very nice tool available for Windows called BE2works; I used the demo to go in and figure out stuff. The full version supports password cracking, the demo does not.
After some time figuring out how the Smart Battery System (SBS) "API" works, I got to actually enter the passwords into the battery to try to get into manufacturer and full access mode.
Just to realise, they don't unlock the BMS.
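If you want to poke the same registers yourself without BE2works: the gauge answers the standard Smart Battery commands over SMBus, so on a Linux box with the adaptor exposed as an I2C bus you can read them with plain i2c-tools. A rough sketch (bus number 1 is an assumption, 0x0b is the usual smart battery address):

```
# pack voltage in mV (SBS command 0x09)
i2cget -y 1 0x0b 0x09 w

# full charge capacity in mAh (0x10) and cycle count (0x17) -
# exactly the values that expose a "new" battery reporting half its design capacity
i2cget -y 1 0x0b 0x10 w
i2cget -y 1 0x0b 0x17 w

# ManufacturerAccess (0x00) is where the seal/unseal password dance happens,
# which is the part BE2works' full version automates
```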
So, to conclude, my friend bought a "new" battery that was most likely cut out of a used / dead MacBook, which reports 3000 mAh as fully charged instead of the 6xxx mAh that it should have, with 0 cycles and 0 hours used.
And non-default access.
This screams of those motherfuckers scamming the shit out of people on Amazon, with refurb, reset, and locked fucken batteries.
I could kill those people right now.
Last but not least,
My friend theoretically can't send it back because I opened the battery to fix the broken connection.
Though maybe it'll get sent back anyway, with some surprise in the package.
-
Didn't expect a reputable bank to do testing in this manner. Damn you Axis Bank, your system is supposed to be the best. Smh. 😩
-
The riskiest dev choice...
How about "The riskiest thing you've done as a dev"? I have a great entry for that. and I suppose it was my choice to build the feature afterall.
I was working on an instance of a small MMO at a game company I worked for. The MMO boasted multiple servers, each of them a vastly different take on the base game. We could use, extend, or outright replace anything we wanted to, leading to everything from Zelda to pokemon to an RP haven to a top-down futuristic counterstrike. The server in this particular instance was a fantasy RPG, and I was building it a new leveling and experience system with most of the trimmings. (Talents, feats/perks, etc. were in a future update.)
A bit of background, first: the game's dev setup did not have the now-standard dev/staging/prod servers; everything ran on prod, devs worked on prod, players connected and played on prod, etc. Worse yet, there was no backup system implemented -- or not really. The CTO was really the only person with sufficient access. The techy CEO did as well, but he rarely dealt with anything technical except server hardware, occasionally. And usually just to troll/punish us devs (as in "Oops ! I pulled the cat5 ! ;)"). Neither of them were the most reliable of people, either. The CTO would occasionally remote in and make backups of each server -- we assumed whenever he happened to think of it -- and would also occasionally do it when asked, but it could take him a week, sometimes even up to a month to get around to it. So the backups were only really useful for retreiving lost code and assets, not so much for player data.
The lack of reliable backups and the lack of proper testing grounds (among the plethora of other issues at the company) made for an absolutely terrible dev setup, but that's just how it was, and that's what we dealt with. We were game devs, afterall. Terrible or not, we got to make games! What more could you ask for!? It was amazing and terrible and wonderful and the worst thing ever, all at the same time. (and no, I'm not sharing the company name, but it isn't EA or Nexon, surprisingly 😅)
Anyway, back to the story! My new leveling system also needed to migrate players' existing data, so... you can see where this is going.
I did as much testing and inspection of my code as I could, copied it from a personal dev script to the server's xp system, ... and debated if I really wanted to click [Apply]. Every time I considered it, I went back to check another part or do yet more testing. I ended up taking like 40 minutes to finally click it.
And when I did... that was the scariest button press of my life. And the scariest three seconds' wait afterwards. That one click could have ruined every single player's account, permanently lost us players ...
After applying it, I immediately checked my character to see if she was broken, checked the account data for corruption or botched flags, checked for broken interactions with the other systems....
Everything ended up working out perfectly, and the players loved all of the new features. They had no idea what went into building them, and certainly had no idea of what went into applying them, or what could have gone wrong -- which is probably a good thing.
Looking back, that entire environment was so fragile, it's a wonder things didn't go horribly wrong all the time. Really, they almost never did. Apocalypses did happen, but were exceedingly rare, and were usually fixed quickly. I guess we were all super careful simply because everything was so fragile? Or the decent devs were, at least. We never trusted the lessers with access 😅 at least on the main servers where it mattered. Some of the smaller servers... well, we never really cared about those.
But I'm honestly more surprised to realize I've never had nightmares of that button click. It was certainly terrifying enough.
But yay! Complete system overhaul and migration of stored and realtime player data! on prod! With no issues! And lots of happy players! Woooooo!
Thinking back on it makes me happy 😊
-
Be me, new dev on a team. Taking a look through source code to get up to speed.
Dev: **thinking to self** why is there no package lock.. let me bring this up to boss man
Dev: hey boss man, you’ve got no package lock, did we forget to commit it?
Manager: no I don’t like package locks.
Dev: ...why?
Manager: they fuck up computer. The project never ran with a package lock.
Dev: ..how will you make sure that every dev has the same packages while developing?
Manager: don’t worry, I’ve done this before, we haven’t had any issues.
**couple weeks goes by**
Dev: pushes code
Manager: hey your feature is not working on my machine
Dev: it’s working on mine, and the dev servers. Let’s take a look and see
**finds out he deletes his package lock every time he does npm install, so he literally has the latest of like 50 packages with no testing**
Dev: well you see you have some packages here that updates, and have broken some of the features.
Manager: >=|, fix it.
Dev: commit a working package lock so we’re all on the same.
Manager: just set the package version to whatever works.
Dev: okay
**more weeks go by**
Manager: why are we having so many issues between devs, why are things working on some computers and not others??? We can’t be having this it’s wasting time.
Dev: **takes a look at everyone’s packages** we all have different packages.
Manager: that’s it, no one can use Mac computers. You must use these windows computers, and you must install npm v6.0 and node v15.11. Everyone must have the same system and software install to guarantee we’re all on the same page
Dev: so can we also commit package lock so we’re all having the same packages as well?
Manager: No, package locks don’t work.
**few days go by**
Manager: GUYS WHY IS THE CODE DEPLOYING TO PRODUCTION NOT WORKING. IT WAS WORKING IN DEV
DEV: **looks at packages**, when the project was built on dev on 9/1 package x was on version 1.1, when it was approved and moved to prod on 9/3 package x was now on version 1.2 which was a change that broke our code.
Manager: CHANGE THE DEPLOYMENT SCRIPTS THEN. MAKE PROD RSYNC NODE_MODULES WITH DEV
Dev: okay
Manager: just trust me, I’ve been doing this for years
Who the fuck put this man in charge?
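For anyone stuck under a similar manager, the boring fix is the well-known one: commit the lockfile and make every machine install from it instead of re-resolving versions. A minimal sketch, assuming a reasonably recent npm:

```
# commit the lockfile so every dev, CI and prod box resolves the exact same tree
git add package-lock.json
git commit -m "pin dependency tree"

# everywhere else: install exactly what the lockfile says, and fail if it has drifted
npm ci
```
-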
On the first day of Christmas, the bossman gave to me: The fact that my new computer purchase order needs to be OKed by the CEO and I need to continue working on a 2014 Mac Mini (i5-4260U, 8 Gig RAM, GPU shot by an ESD on the case long ago) for the next year.
On the second day of Christmas, my family gave to me... a good reason to get shitfaced
On the third day of Christmas, getting shitfaced gave to me: A hangover and some urgent plastic welding job that had to be done with a soldering iron. FML, I've had a headache before breathing in pure hydro-cyano-whatthefuckyougetwhenyoumeltplastics
On the fourth day of Christmas, my team gave to me: A legacy, age-old Rails 2 project that was written by an intern and never reviewed, went to prod in 2014 and can't be changed anymore, but now needs to be changed, despite having zero test coverage and needing 100% now to prevent issues and costly manual testing.
On the fifth day of Christmas, devrant gave to me: The Idea that making fun of Christmas songs to get over the sheer amount of dicks that working over the twelve days of Christmas sucks.
To be continued...
-
When will I fuckin learn that
a) customers lie
b) customers are sloppy
c) customers are wrong
d) customers do not do their work (properly)
e) customers want us to do their (dirty) work
f) possibly all of the freakinly above?! + khm....
They will fuckin aaaalwaaaays say sth is not working after the update..
And I will alwaaaays assume I fucked up something..even if I didn't touch that part of the code/data..
And almost aaaaalways it turns out that the bug they complain about is how the system worked (or didn't work) before the update and/or some fuckup from their side..
Anyhow, I rushed over, grabbed the files went testing in dev..wtf, output is different, mine is ok, theirs is..wtf is that shit?!
Transfer newly built dll to test..same shit as on prod..wtf?! How?!
I assumed they have thing A correctly linked to thing B.. ofc thing A was linked to thing C in their case and in another case (our test) to correct thing B..
I got chills when grabbing the files, that I should have triple checked that they didn't fuck up something on the link part, but I just assumed they knew what they were doing & that they checked they linked correct files with correct content already, before being pissy that the update fucked up things.. riiiight!! :/
I wanted to find solutions to this fuckup asap so I disregarded my gut feeling..yet again!! Fuuuck!
I've spent too much time trying to find ways to fix a bug that wasn't even a real bug to begin with.. :/
Fuuuuuck!!
So yeah, always treat the customers like they are 3yrs old & have no clue what they are doing & check exactly wtf they were indeed trying to do..it will save you time & nerves..
And note to self: reread this shit daily!! And imprint it in your brain that everything is not always your fault!!
-
I've seen many devs doing crazy things in my entire career till now. But this one dude stands out.
He used to:
- Push binaries to git repo
- Use some old libraries which were used during Indus Valley civilization
- Had no sense of database and used to delete random data from it and call it as TESTING (Thank God! I never gave him prod access)
- And on top of this, he had an ass full of attitude!
-
I was asked to look into a site I haven't actively developed for about 3-4 years. It should be a simple side-gig.
I was told this site has been actively developed by the person who came after me, and this person had a few other people help out as well.
The most daunting task in my head was to go through their changes and see why stuff is broken (I was told functionality had been removed, things were changed for the worse, etc etc).
I ssh into the machine and it works. For SOME reason I still have access, which is a good thing since there's literally nobody to ask for access at the moment.
I cd into the project, do a git remote get-url origin to see if they've changed the repo location. Doesn't work. There is no origin. It's "upstream" now. Ok, no biggie. git remote get-url upstream. Repo is still there. Good.
Just to check, see if there's anything untracked with git status. Nothing. Good.
What was the last thing that was worked on? git log --all --decorate --oneline --graph. Wait... Something about the commit message seems familiar. git log. .... This is *my* last commit message. The hell?
I open the repo in the browser, login with some credentials my browser had saved (again, good because I have no clue about the password). Repo hasn't gotten a commit since mine. That can't be right.
Check branches. Oh....Like a dozen new branches. Lots of commits with text that is really not helpful at all. Looks like they were trying to set up a pipeline and testing it out over and over again.
A lot of other changes including the deletion of a database config and schema changes. 0 tests. Doesn't seem like these changes were ever in production.
...
At least I don't have to rack my head trying to understand someone else's code but.... I might just have to throw everything that was done into the garbage. I'm not gonna be the one to push all these changes I don't know about to prod and see what breaks and what doesn't break.
I feel bad for whoever worked on the codebase after me, because all their changes are now just a waste of time and space that will never be used.
-
I'm having an existential crisis with this client.
We are spending millions of $s every year to make sure the product's performance is perfect. We are testing various scenarios, fine-tuning PLABs: the environment, application, middleware, infra,... And then we provide our recommendations to the client: "To handle load of XX parallel users focusing on YY, yy and Zy APIs, use <THIS> configuration".
And what the client does?
- take our recommendations and measure the wind speed outside
- if speed is <20m/s and milk hasn't gone bad yet, add 2x more instances of API X
- otherwise add 3xX, 1xY and give more CPUs to Z
- split the setup in half and deploy in 2 completely separate load-balanced prod environments.
- <do other "tweaking">
- bomb our team with questions "why do we have slow RTs?", "why did the env crash?", "why do we have all those errors?", "why has this been overlooked in PLABs?!?"
If you're improvising despite our recommendations, wtf are we doing here???
One day I will crack. Hopefully, not sometime soon.
-
I've been staffed on an old ongoing project, first day.
0. Compatibility has to be guaranteed down till IE9... ppf.
1. Front end made in XHTML+JS(jQuery)... bah, ok.
2. XHTML+JS is actually generated by PHP5.4, not a line is actually statically served... beh, funny, ok.
3. PHP files are the output of an XSLT transform of a bunch of XMLs... meh, seriously? Oooook.
4. XMLs are the product of the serialisation of a truck of stateful JavaEE6 DTOs populated magically (undocumented) with data coming from a SQL DB... WTF mode!!!
5. Session logics lives within PHP-land at point 2, front end makes ajax calls here that propagates to another WS out of our control that triggers -somehow- (undocumented) our Java backend at point 4 to generate new XMLs and then reach front end again. Kill me now.
Boss: look... it's too slow for the client, it's too heavy on our servers: fix it. Ah, and we sold 85% test coverage by October. You're the man for the job. (I'm a Node.js fullstacker and right now there's not even a testing scaffold, ofc).
Me: prod is on Linux or Windows?
Boss: RHEL7.
Me: rm -rf / as root. Done.
Boss: I know I know...
Me: ...
I think the time has come...
-
Ever had a day that felt like you're shoveling snow from the driveway? In a blizzard? With thunderstorms & falling unicorns? Like you shovel away one m² & turn around and no footprints visible anymore? And snow built up to your neck?
Today my work day was like that.. xcept shit..shit instead of pretty & puffy snow!!
Working on things a & b, trying to not mess either one up, then comes shit x, coworker was updating production.. ofc something went wrong.. again not testing after the update..then me 'to da rescue'.. :/ hardly patch things up, so it works..in a way.. feature c still missing due to needed workarounds.. going back to a and b.. got disrupted by the same coworker who is nver listening, but always asking too much..
And when I think I finally have the b thing figured out a f-ing blocker from one of our biggest clients.. The whole system is unresponsive.. Needles to say, same guy in support for two companies (their end), so they filed the jira blocker with the wrong customer that doesn't have a SLA so no urgent emails..and then the phone calls.. and then the hell broke loose.. checking what is happening.. After frantic calls from our dba to anyone who even knows that our customer exists if they were doing sth on the db.. noup, not a single one was fucking with the prod db.. The hell! Materialised view created 10 mins ago that blocked everything..set to recreate every 10 minutes..with a query that I am guessing couldn't even select all that data in under 15.. dafaaaq?! Then we kill it..and again it is there.. We found out that customers dbas were testing something on live environment, oblivious that they mamaged to block the entire db..
FML, I'm going pokemon hunting.. :/ codename for ingress n beer..
-
TL;DR Dear boss, firstly, you always get someone to review anything important done by a fucking intern.
Secondly, you do not give access to your fucking client's production server to an intern.
Thirdly, you don't ask your fucking intern to test the intern's work that has not been reviewed by anyone directly on your client's fucking production server.
Last week, the boss and one of the lead devs (the only guy with some serious knowledge about systems and networking) decided to give me (an intern who barely has any work experience) the task of fixing or finding an alternate solution to allowing their support team access to their client machines. Currently they used a reverse SSH tunnel and an intermediary VH but for some reason, that was very unreliable in terms of availability. I suggested using OpenVPN and explained how it would work. Seemed to be a far better idea and they accepted. After several days of working through documentations and guides and everything, I figured out how OpenVPN works and managed to deploy a TEST server and successfully test remote access using two VMs. On seeing my tests, the boss told me that he wanted to test it on the client network. I agreed. Today he comes to me and he tells me to prepare testing for tomorrow and that the client technician is going to give me access to one of their boxes. And then he adds, "It's a working prod server. We'll see if we can make it work on that" and left. I gaped at him for a while and asked another dev guy in the room if what I heard was right. He confirmed. Turns out, the lead dev and the boss's son (who also works here) had had a huge argument since morning on the same issue and finally the dev guy had washed it off his hands and declared that if anything goes wrong from testing it on production, it's entirely the boss's own fault. That's when the boss stepped in and approached me. I ran back to his office and began to explain why prod servers don't top the list of things you can fuck around with. But he simply silenced me saying, "What can go wrong?" and added, "You shouldn't stay still. You should keep moving". Okay, like firstly what the fuck and secondly, what the fuck?.
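For context, the "reverse SSH tunnel and an intermediary VH" setup being replaced works roughly like this (hostnames, users and ports are invented for illustration):

```
# on the client machine: keep a tunnel open back to the intermediary host
ssh -N -R 2222:localhost:22 tunnel@relay.example.com

# on the intermediary host: support staff hop through that tunnel into the client box
ssh -p 2222 admin@localhost
```

OpenVPN swaps those ad-hoc per-box tunnels for one persistent VPN interface per client, which is what the two-VM test exercised.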
Even though the OpenVPN client is not the scariest thing to install, tomorrow's going to be fun.
-
Departure: 13.00 Train: test Destination: A1 Delay 18888min
Now it is 15:43 and it is still on...
Finally Poland joins the testing-in-prod club!
Station is Wroclaw if anybody is interested.
-
We had one Android app to be developed for a charity org, for data collection in a groundwater-level-increase competition among villages.
Initial scope was very small & feasible. Around 10 forms with 3-4 fields in each to be developed in 2 months (1 for dev, 1 for testing). There was a prod version which had similar forms with no validations etc.
We had received prod source, which was total junk. No KT was given.
In existing source, spelling mistakes were there in the era of spell/grammar checking tools.
There were rural names of classes, variables in regional language in English letters & that regional language is somewhat known to some developers but even they don't know those rural names' meanings. This costed us at great length in visualizing data flow between entities. Even Google translate wasn't reliable for this language due to low Internet penetration in that language region.
OOP wasn't followed, so at 10 places exact same code exists. If error or bug needed to be fixed it had to be fixed at all those 10 places.
No foreign key relationships were there in the database, while actually there were logical relations among different entities.
No created, updated timestamps in records at app side to have audit trail.
Small part of that existing source was quite good with Fragments, MVP etc. while other part was ancient Activities with business logic.
We have to support Android 4.0 to 9.0 of many screen sizes & resolutions without any target devices issued to us by the client.
Then Corona lockdown happened & during that suddenly client side professionals became over efficient.
Client started adding requirements like very complex validation which has inter-entity dependencies. Then they started filing bugs from prod version on us.
Let's come to the developers' expertise,
2 developers with 8+ years of experience & they're not knowing how to resolve conflicts in git merge which were created by them only due to not following git best practice for coding like only appending new implementation in existing classes for easy auto merge etc.
They are thinking like handling click events is called development.
They don't want to think about OOP, well structured code. They don't want to re-use code mostly & when they copy paste, they think it's called re-use.
They wanted to follow old school Java development in memory scarce Android app life cycle in end user phone. They don't understand memory leaks, even though it's pin pointed by memory leak detection tools (Leak canary etc.).
Now 3.5 months are over, that competition was called off for this year due to Corona & development is still ongoing.
We are nowhere close to completion even for initial internal QA round.
On top of this, nothing is billable so it's like financial suicide.
Remember whatever said here is only 10% of what is faced.
- An Engineering lead in a half-billion-dollar company.
-
TL;DR: When picking vendors to outsource work to, vet them really well.
Backstory:
Got a large redesign project that involves rebuilding a website's main navigation (accessibility reasons).
Project is too big just for our dev team to handle with our workload so we got to bring a 3rd party vendor to help us. We do this often so no big deal.
But, this time the twist was Senior Management already had retained hours with a dev shop so they want us to use them for project. Okay...
It begins:
Have our scope / discovery meeting about the changes and our expected DevOps workflow.
Devs work Local and push changes to our Github, that kicks off the build and we test on Dev, then it goes to Staging for more testing & PM review. Once ready we can push to prod, or whenever needed. All is agreed, everyone was happy.
Emailed the vendors' project manager to ask for their devs Github accounts so we can add them to the project. Got no reply for 3 days.
4th day, I get back "Who sets up the Github accounts?"
fuck me. they've never used Github before but in our scope meeting 4 days ago you said Github was fine...??
Whatever, fuck it. I'll make the accounts and add them.
Added 4 devs to the repo and setup new branch. 40min later get an email that they can't setup dev environment now, the dev doesn't know how to setup our CMS locally, "not working for some reason."
So, they ask for permission to develop on our STAGING server.. "because it's already setup"... they want to actively dev on our staging where we get PM/Senior Management approvals?
We have dev, staging, production instances and you want to dev in staging, not dev?... nay nay good sir.
This is whom senior management wants us to use, already paid for via retainer no less. They are a major dev shop and they're useless...
😢😭
Can't wait for today's progress checkup meeting. 😐😐
/rant
-
A couple of years ago I was working on a fairly large system with a complex (by necessity) access control architecture.
As is usually the case with those projects, it's awkward for developers to repro bugs that have to do with a user's accesses in production when we are not allowed to replicate production data in test, let alone locally.
We had a bug where I ended up making myself a new row in the production database for a thing I could have access to without affecting real data to repro it safely. I identified the bug so I could repro it in dev/test and removed the row and ensured everything worked normally, whew scary.
Have you ever walked into the office one day, and everyone is hunched over in a semicircle around one person's workstation, before one turns around to look at you and says - after a pause - "... ltlian?.."
Turns out I had basically "poisoned the well" with my dummy entity in a way where production now threw 500 for everyone BUT me who had transitive access to this post-non-entity. Due to the scope of the system, it had taken about a day for this to gradually propagate in terms of caching and eventual consistencies; new entities coming in was expected, but not that they disappear.
Luckily I had a decent track record for this to be a one-off. I sometimes think about how I would explain testing in prod and making it faceplant before going home for the day, other than "I assumed it would be fine". I would fire me.
-
ideal sprint fallacy.
total days: 10, total hours (excluding breaks): 8 hrs per day => 80 hrs per dev
code freeze day: day 8; testing + fixing days: 8, 9, 10; release day: day 10
so ideal dev time = 7 days / 56 hrs
meetings: -1 hr per day => 49 hrs per dev
-1 day for planning, i.e. d1, so dev time left: 6 days / 42 hrs
-----------
all good planning. now here comes the messups
1. last release took some time, so planning could not happen on d1. all devs are waiting. devtime = 5 days / 35 hrs.
2. during planning:
mgr: hey devx what's the status on task 1?
d: i integrated mock apis. if server has made the apis, i will test them .
mgr : server says the apis are done. whats your guestimate for the task completion?
d : max 1-2 hrs?
m : cool. i assign you 4 hrs for this. now what about task 2?
d : task told to me is done and working . however sub mgr mentioned that a new screen will be added. so that will take time
m : no we probably won't be taking the screen. what's your giestimate?
d : a few more testing on existing features. maybe 1-2 hrs ?
m: cool
another 4 hrs for u. what about task 3?
d : <same story>
m : cool. another 4 hrs for u. so a total of 12 hrs out of 35 hrs? you must be relaxed this sprint.
d : yeah i guess.
m cool.
-------
timelines.
d1: wasted i previous sprint
d2 : sprint planning
d3 : 3+ hrs of meetings, apis for task 1 weren't available sub manager randomly decided that yes we can add another screen but didn't discussed. updates on all 3 tasks : no change in status
d4 : same story. dev apis starts failing so testing comes to halt.
d5 : apis for task1 available . task 3 got additional improvement points from mgr out of random. some prod issue happens which takes 4+ hrs. update on tasks : some more work done on task 3, task 1 and 2 remains same.
d6 : task1 apis are different from mocks. additionally 2 apis start breaking and its come to know thatgrs did not explain the task properly. finally after another 3+ hrs of discussion , we come to some conclusions and resolutions
d7 : prod issue comes again, 4+ hrs goes into it. tasks 2 and 3 are discussed for new screen additions that can easily take 2+ days to be created. we agree to take 1 and drop the 2nd task's changes. i finish task 2's new screens in 6 hrs, hoping that finally everything will be fine.
d8 : prod issue again comes, and changes are requested in task 2 and 3
day 9 build finally goes to tester
day 10 first few bugs come with approval for some tasks
day 11(day 1 of new sprint) final build with fixes is shared. new bugs (unrelated to tasks. basically new features disguised as bugs) are raised . we reject and release the build.
day 2 sprint planning
mgr : hey dev x, u had only 12 hrs of work in your plate. why did the build got delayed?
🥲🫡
-
New normal. New app to build.
- Still have to maintain older systems in parallel
- That leaves 1 week for developing the new app
- Slept 2 hours a day, coding coding coding
- Tested the shit out of the app because.. hey, its to help the customers' safety and health... I don't mind staying up late
- Finished the app in 5 days, code is now on prod
- Could barely look myself at the mirror because I look like shit
- Btw the app requires an external device as an input, the existing device works flawlessly based on my testing
- We need more devices
- Clueless manager bought new model instead. He assumed everything is fine, no testing is required
- Tested the app with new device model, doesn't work
- Deadline closing in
- Thanks, there goes my sleep
- THANK YOU
-
A loooong time ago...
I started my first serious job as a developer. I was young yet enthusiastic, as well as a bit of a greenhorn. First time working in a business, working with a team full of experienced, fully-fledged ultra-seniors who were waiting to teach me everything about software engineering.
Kind of.
Beside one senior which was the team lead as well there were two other devs. One of them was very experienced and a pretty nice guy, I could ask him anytime and he would sit down with me a give me advice. I've learned a lot of him.
Fast forward three months (yes, three months).
I was not that full kind of greenhorn anymore and people started to give me serious tasks. I had some experience in doing deployments and stuff from my other job as a sysadmin before so I was soon known as the "deployment guy", setting up deployments for our projects the right way and monitoring as well as executing them. But as it should be in every good team we had to share our knowledge so one can be on vacation or something and another colleague was able to do the task as well.
So now we come to the other teammate. The one I was not talking about till now. And that for a reason.
He was very nice too and had a couple of years as a dev on his CV, but...yeah...like...
When I switched some production systems to Linux he had to learn something about Linux. Everytime he encountered an error message he turned around and asked me how to fix it. Even. For. The. Simplest. Error. He. Could. Google. Up.
I mean okay, when one's new to a system it's not that easy, but when you have an error message which prints out THE SOLUTION FOR THE ERROR and he asks me how to fix it...excuse me?
This happened over 30 times.
A. Week.
Later on I had to introduce him to the deployment workflow for a project, so he could eventually deploy the staging environment and the production environment by hisself.
I introduced him. Not for 10 minutes. I explained the whole workflow and the main techniques and tools to him for like two hours. Every now and then I stopped and asked him if he had any questions. He hadn't! Wonderful!
Haha. Oh no.
So he had to do his first production deployment. I sat by his side to monitor everything. He did well. One or two questions but he did well.
The same when he did his second prod deploy. Everythings fine.
And then. It. Frikkin. Begins.
I was working on the project, did some changes to the code. Okay, deploy it to dev, time for testing.
Hm.
Error checking out git. Okay, awkward. Got to investigate...
Some files on the dev server had been changed. Strange. The repo was all up to date. But these changes seemed newer because they were fixing at least one bug I was working on.
This doubles the strangeness.
I want over to my colleague's desk.
I asked him about any recent changes to the codebase.
"Yeah, there was a bug you were working on right? But the ticket was open like two days so I thought I'll fix it"
What the Heck dude, this bug was not critical at all and I had other tasks which were more important. Okay, but what about the changed files?
"Oh yeah, I could not remember the exact deployment steps (hint from the author: I wrote them down into our internal Wiki, he wrote them done by hisself when introducing him and after all it's two frikkin commands), so I uploaded them via FTP"
"Uhm... that's not how we do it buddy. We have to follow the procedure to avoid..."
"The boss said it was fine so I uploaded the changes directly to the production servers. It's so much easier via FTP and not this deployment crap, sorry to say that"
You. Did. What?
I could not resist and asked the boss about this. But this had not Effect at all, was the long-time best-buddy-schmuddy-friend of the boss colleague's father.
So in the end I sat there reverting, committing and deploying.
Yep
It's soooo much harder this deployment crap.
Years later, a long time after I quit the job and moved to another company, I get to know that the colleague now is responsible for technical project management.
Hm.
Project Management.
Karma's a bitch, right? -
Worst prod scenario experienced - on site in a small African country, working on a CRM/billing system, my colleague was testing some new SQL and after finishing decided to drop and recreate the DB. She thinks the process is very slow and suddenly realizes she is dropping the prod DB. In a panic she shuts down the system and starts doing a restore from tape, but is so stressed out she writes "tar cv" instead of "tar xv" and overwrites the backup with the broken DB. Took a while to clean that one up...
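(For anyone who hasn't made that exact slip: the two commands are one letter apart. A minimal illustration, tape device and paths assumed:)

```
# intended: extract the backup from the tape
tar xvf /dev/st0

# typed: create, i.e. start writing a fresh archive of the broken DB over the tape
tar cvf /dev/st0 /var/lib/db
```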
-
TL;DR: I'm one of the few in my area who sees that sftping as the prod service account shouldn't be a deployment process. And the ONLY ONE THAT CARES THAT THIS IS GONNA BREAK A BUNCH OF SHIT AT SOME POINT.
The non tl;dr:
For a whole year I've been trying to convince my area that sshing as the production service account is not the proper way to deploy and/or develop batch code. My area (my team and 3 sister teams) have no concept of using version control for our various Unix components (shell scripts and configuration files) that our CRITICAL for our teams ongoing success. Most develop in a "prodqa like" system and the remainder straight in production. Those that develop straight in prodqa have no "test" deployment so when they ssh files straight to actual production. Our area has no concept of continuous integration and automated build checking. There is no "test cases", no "systems testing" or "regression testing". No gate checks for changing production are enforced. There is a standing "approved" deployment process by the enterprise (my company is Whyyyyyyyyyy bigger than my area ) but no one uses it. In fact idk anyone in my area who knows HOW to deploy using the official deployment method. Yes, there is privileged access management on the service account. Yes the managers gets notified everytime someone accesses the privileged production account. The managers don't see fixing this as a priority. In fact I think I've only talk to ONE other person in my area who truly understands how terrible it is that we have full production change access on a daily basis. Ive brought this up so many times and so many times nothing has been done and I've tried to get it changed yet nothing has happened and I'm just SO FUCKING SICK that no one sees how big of a deal this. I mean, overall I live the area I work in, I love the people, yet this one glaring deficiency causes me so much fucking stress cause it's so fucking simple to fix.
We even have an newer enterprise deployment. Method leveraging a product called "urban code deploy" (ucd) to deploy a git repository. JUST FUCKING GIT WITH THE PROGRAM!!!!..... IT WAS RELEASED FUCKING 12 YEARS AGO......
Please..... Please..... I just want my otherwise normally awesome team to understand the importance and benefits of version control and approved/revertable deployments.
-
So we have this project that we are hosting on our testing server for presentation purposes ( already provisioning prod server ).
Our boss was presenting it to investors and my superior committed a bug there and was asking me help to figure out how to fix it (yeah.. he doesn't know how to checkout last commits in git... fml), and I realised the presentation might still be going on... so I asked: isn't boss showing it to investors?
superior: lol, idk maybe.
me: right... ( I proceed to roll back changes ) bye, have a good lunch.
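(The roll-back itself was nothing exotic, roughly this, with a made-up commit hash:)

```
# find the offending commit...
git log --oneline -5

# ...revert it without rewriting shared history, then redeploy
git revert abc1234
git push origin master
```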
And here I am having lunch considering my life choices. -
In a previous job, I was trying to organize a common repository with our shitty business partner so we could both be able to contribute our part so our work would not overlap. Not like they cared anyways.
One thing I quickly noticed is those fuckers would just straight up commit untested changes on master and cripples our whole testing and prod deployment at times because we were depending on a shitty IoT service they provided us onto which we had no control whatsoever.
I told my boss, who was often complaining about them being unreliable in the first place, I would simply restrict them from merging and commiting to develop or to master without my approval. We cannot keep working like this.
He told me that we could not impose on them our work practices and that I should not try to piss them off. To be diplomatic.
I politely and professionally refused to do it, but he did change his mind in the end. He and I left not too long after. I guess he felt obliged to respond that having his job at stake but you cannot condone voluntarily shitty work. -
Out of the frying pan, into the fire:
So in my first job, I thought it was just us operating so crazy: meddling with arcane C/C++ code from the 80's, shooting our code to production without testing, fixing hundreds of customers' database entries by hand, letting an intern alter some core component (to have more logging) and directly push it to prod...
Silly me.
I mean I suspected, that maybe it's not only this tiny little company acting wild, that also the bigger companies with all their ISO certified processes, agile blabla, professional tooling whatsoever - will also have their skeleton in the closet,.. like some obscure assembler part buried in the heart of your code base nobody dares to touch...
How Pieter Hintjens asked about the state of the industry and all the fads so bluntly put it:
"It's all bullshit."
But we are humans, so we better jump on the bandwagon if we want to keep our jobs... and somehow try to keep that trashy house of cards from crashing down. -
So I discovered WAMP this weekend, it's so nice to not have to work on my prod server when making changes and testing out new projects :)
-
If a team uses multiple languages and stacks (Have, JS, Python) do you think it's better to have everyone use/constantly switch between them or have dedicated developers for each language (ie. 80% main, 20% others)?
--END QUESTION, ANSWER NOW BEFOREHAND CONTINUING---
---BEGIN RANT---
My boss likes keeping the team "will rounded" so everyone does everything. One month in working in Java, the next with Node web apps. When I switch to node, it takes like a week of "wtf doesn't it work.... what changed, is it a big?" And usually end it"oh right I remember I need to ..."
And also always... "How the fuck do I write tests in {some reading framework} again?"
So feels like everyone is just a generalist and no one is a master/has time to develop mastery. I don't know if it's just me (1/3 Senior developers on the team that has to do everything) or if I'm the only one that complains... Not that it makes a difference... (Only option to really be heard is to resign but I need to somewhere else to work and finding one is hard for personal reasons)
And well this is the biggest reason I would leave the team. No time for mastery, no standardization/shared knowledge (everyone does their own thing but probably not well and no time for testing or documentation; how the fuck does whatever you wrote work, how do we use it, what the fuck did you put in prod that does ... And where the fuck did you put it cuz it's not in ANY of our repos).
I always feel one day soon it will come crashing down and I can say "I told you so", but by then it's too late and I'll be the one cleaning it up... Again.
-
Week ago, leader of the artifacts/packages storage and mirroring with animal in Logo, fucked up our testing enabling new feature on theirs SaaS.
We created a ticket, they managed to fix that, although it took them long time to do so, probably due to timezones as fix was simple click on their Admin area.
Today they forwarded us email that there will be some changes that can impact prod ...
Great timing, great .. -
Should I even be testing if no one cares?
I keep asking devs about the functionalities of their system for testing and then realize it's already on Prod even before I test in beta, or am even able to test in beta.
Do I exist!!!
Now, for the nth time, I have started to test when things are almost there or already there in Prod.
I should keep myself in adrenaline mode always on from now on.
Need to do something worthy.
The worst way to start a new job.
-
This is an actual transcript...
Since it's way too long for the normal 5000 characters, hence splitting it up...
Infra Guy: mr Dev, could you please give some rational for update of jjb?
Dev: sparse checkout support is missing
Infra Guy: is this support mandatory to achive whatever you trying to do?
Dev: yes
Infra Guy: u trying to get set of specific folder for set of specific components?
Dev: yes
Infra Guy: bash script with cp or mv will not work for you?
Dev: no
Infra Guy: ?
Dev: when you have already present functionality why reinvent the wheel
Dev: jenkins has support for it
Dev: the jjb is the bottle neck
Infra Guy: getting this functionality onto our infra would have some implications
Dev: why should I write bash script if jenkins allows me to do that
Dev: what implications ??
Infra Guy: will you commit to solve all the issues caused by new jjb?
Dev: you show me the implications first
Infra Guy: like a year ago i have tried to get new jjb <commit_url>
Infra Guy: no, the implications is a grey area
Infra Guy: i cant show all of them and they may hit like in week or eve month
Dev: then why was it not tackled
Dev: and why was it kept like that
Infra Guy: few jobs got broken on something
Dev: it will crop up some time later
Dev: if jobs get broken because of syntax
Dev: then jobs can be fixed
Dev: is it not ???
Infra Guy: ofc
Infra Guy: its just a question who will fix them
Dev: follow the syntax and follow the guidelines
Dev: put up a test server and try and lets see
Dev: you have a dev server
Dev: why not try on that one and see what all jobs fails
Dev: and why they fail
Dev: rather than saying it will fail and who will fix
Dev: let them fail and then lets find why
Dev: I manually define a job
Dev: I get it done
Infra Guy: i dont think we have test server which have the same workload and same attention as our prod
Dev: unless you test how would you know ??
Dev: and just saying that it broke one with a version hence I wont do it
Infra Guy: and im not sure if thats fair for us to deal with implication of upgrading of the major components just cause bash script is not good enough for u
Dev: its pretty bad
Infra Guy: i do agree
Infra TL Guy: Dev, what Infra Guy is saying is that its not possible to upgrade without downtime
Infra Guy: no
Dev: how long a downtime are we looking at ??
Infra Guy: im saying that after this upgrade we will have deal with consequences for long time
Infra Guy-2: No this is not testing the upgrade is the huge effort as we dont have dev resources to handle each job to run
Dev: if your jjb compiles all the yaml without error
Dev: I am not sure what consequences are we talking of
Infra Guy: so you think there will be no consequences, right?
Dev: unless you take the plunge will you know ??
Dev: you have a dev server running at port 9000
Infra Guy: this servers runs nothing
Dev: that is good
Dev: there you can take the risk
Infra Guy: and the fack we have managed to put something onto api doesnt mean it works
Dev: what API ?
Infra Guy: jenkins api
Infra Guy: hmmm
Dev: what have you put on Jenkins API ??
Infra Guy: (
Dev: jjb is a CLI
Infra Guy: ((
Dev: is what I understand
Dev: not a Jenkins API
Infra Guy: (((
Dev: (((((
Infra Guy: jjb build xmls and push them onto api
Infra Guy: and its doent matter
Dev: so you mean to say upgrading a CLI is goig to upgrade your core jenkisn API
Dev: give me a break
Infra Guy: the matter is that even if have managed to build something and put it onto api
Infra Guy: doesnt mean it will work
Dev: the API consumes the xml file and creates a job
Infra Guy: right
Dev: if it confirms to the options which it understands
Dev: then everything will work
Dev: I am actually not getting your point Infra Guy
Infra Guy: i do agree mr Dev
Dev: we are beating around the bush
Infra Guy: just want to be sure that if this upgrade will break something
Infra Guy: we will have a person who will fix it
Dev: that is what CICD is supposed to let me know with valid reasons
Dev: why can't that upgrade be done
Infra Guy: it can be done
Infra Guy: i even have commit in place
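For the record, the sparse checkout being argued over is a stock git feature, not exotica. On a reasonably modern git it is roughly this (paths made up):

```
# only materialise the folders a given component actually needs
git sparse-checkout init --cone
git sparse-checkout set services/foo config/foo
```

The whole argument is just about letting Jenkins (via jjb) drive that, instead of wrapping it in yet another bash script.
-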
Not exactly related to the topic but the exact thing is chilling the fuck out .
I always was anxious and was completely paranoid about minor bugs in my application during prod deployments(that is when I didn't know about testing utils and so on) , till the point that I couldn't fix a minor bug in the CSS and I puked 5 times over.
It was rough times but then I got over it and it really helped me alot.
I know bugs are like really not the kind of things you'd want to see in any application but it will arise in every application :3 -
One nightmarish project that was doomed from the beginning, had me as the sole developer. I could hardly sleep when we began testing on a separate test system, but with (nearly) all the config stored in shared memory and copied from the production system, I dreaded, half awake, that the production server data base connection was still configured in the test system and that it was shooting all it's test data repeatedly to prod.
Finally drove to company in middle of the night at 4 o'clock. Checked everything was OK, tried to sleep 3 hours before the start of the work day.
This system also had the most hideous memory corruption in some shared memory that was used across several processes and should have been thoroughly protected by a mutex, but somehow, sometimes this crucial map, that was used to speed up the access to all the customer data just contained garbage.
Still haunts me to that day. (Like xkcd's unresolved tension of a non-matching parenthesis - an unresolved bug. -
In my initial days as a web developer, I was assigned a task: to implement cart share functionality at an e-commerce company.
I made the functionality and tested on my system.
Result: working good.
Pushed it to beta testing environment.
Resilt: working good.
Pushed to pre production environment.
Result: working good.
Pushed to live site.
Result: 😀 Error in live site..
So a call comes to me from my team lead..
Asks what was the issue...
Me: i dont know either.
....
After 3-4 hrs:
I found the reason.
My system, beta test env, and pre-prod env all have the latest PHP version (5.6 I guess)
But the live server had an old version of PHP.
Me: laughed like anything.
I didn't know that these things would matter to such a great extent.
Moral of the story:
Be one with the force (the server in this case).
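The boring way to catch that mismatch early, these days, is to make the runtime version part of the project itself. A small sketch, versions and messages illustrative:

```
# composer.json: composer refuses to install if the box runs an older PHP
#   "require": { "php": ">=5.6" }

# or fail fast at deploy time, before traffic hits the new code
php -r 'exit(version_compare(PHP_VERSION, "5.6.0", ">=") ? 0 : 1);' || echo "server PHP is too old"
```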