Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "failing-software-systems"
-
--- GitHub 24-hour outage post mortem ---
As many of you will remember; Github fell over earlier this month and cracked its head on the counter top on the way down. For more or less a full 24 hours the repo-wrangling behemoth had inconsistent data being presented to users, slow response times and failing requests during common user actions such as reporting issues and questioning your career choice in code reviews.
It's been revealed in a post-mortem of the incident (link at the end of the article) that DB replication was the root cause of the chaos after a failing 100G network link was being replaced during routine maintenance. I don't pretend to be a rockstar-ninja-wizard DBA but after speaking with colleagues who went a shade whiter when the term "replication" was used - It's hard to predict where a design decision will bite back and leave you untanging the web of lies and misinformation reported by the databases for weeks if not months after everything's gone a tad sideways.
When the link was yanked out of the east coast DC undergoing maintenance - Github's "Orchestrator" software did exactly what it was meant to do; It hit the "ohshi" button and failed over to another DC that wasn't reporting any issues. The hitch in the master plan was that when connectivity came back up at the east coast DC, Orchestrator was unable to (un)fail-over back to the east coast DC due to each cluster containing data the other didn't have.
At this point it's reasonable to assume that pants were turning funny colours - Monitoring systems across the board started squealing, firing off messages to engineers demanding they rouse from the land of nod and snap back to reality, that was a bit more "on-fire" than usual. A quick call to Orchestrator's API returned a result set that only contained database servers from the west coast - none of the east coast servers had responded.
Come 11pm UTC (about 10 minutes after the initial pant re-colouring) engineers realised they were well and truly backed into a corner, the site was flipped into "Yellow" status and internal mechanisms for deployments were locked out. 5 minutes later an Incident Co-ordinator was dragged from their lair by the status change and almost immediately flipped the site into "Red" status, a move i can only hope was accompanied by all the lights going red and klaxons sounding.
Even more engineers were roused from their slumber to help with the recovery effort, By this point hair was turning grey in real time - The fail-over DB cluster had been processing user data for nearly 40 minutes, every second that passed made the inevitable untangling process exponentially more difficult. Not long after this Github made the call to pause webhooks and Github Pages builds in an attempt to prevent further data loss, causing disruption to those of us using Github as a way of kicking off our deployment processes (myself included, I had to SSH in and run a git pull myself like some kind of savage).
Glossing over several more "And then things were still broken" sections of the post mortem; Clever engineers with their heads screwed on the right way successfully executed what i can only imagine was a large, complex and risky plan to untangle the mess and restore functionality. Github was picked up off the kitchen floor and promptly placed in a comfy chair with a sweet tea to recover. The enormous backlog of webhooks and Pages builds was caught up with and everything was more or less back to normal.
It goes to show that even the best laid plan rarely survives first contact with the enemy, In this case a failing 100G network link somewhere inside an east coast data center.
Link to the post mortem: https://blog.github.com/2018-10-30-...6 -
Been reading devrant posts for a month or so, this is my first actual post. I'm hoping it will be therapeutic. ☺️ I need something to keep me from killing my boss when I see him again tomorrow..
Some backstory: Currently working in HR for the last 7 or so years with complete shit for brains boss, even worse when it comes to anything related to technology. For almost two years I've been working to get another bachelor's degree. This time in computer sciences, to make a career switch to systems and software engineer. Last week I roughly had the following wonderful conversation:
Boss: we've needed new Recruitment software for a while now. Can't you make us one as a school project?
Me: 'Make us one?' It's not really that simple.. I'm barely halfway through my education, maybe I could do it, but it would take me quite a long time even if I could work on it fulltime.. Combining a halftime job with a fulltime education is taking up enough of my time as it is and I have more than enough school projects btw..
Boss: it would be a win-win. Work a little harder in your spare time and when you graduate you have a real-life project on your resume.
Me: I'm sorry, i'm failing to see the 'win' for me here.. I work 10 hours a day, 7 days a week on average, trying to combine work and studies. I'm pretty much maxed out..
Boss: Your coworker(also extreme dumbass) told me you wrote some quick code the other day that helped him out. Don't underestimate yourself, I'm sure you can do this.
Me(in complete disbelief by now): I wrote him an Excel-macro! They don't even teach me that at school. It's a very very very long way from actual software development! I'm sorry, it just can't be done.
Boss: Thats too bad. I expected you to welcome an opportunity like this and be more motivated towards this company..
Me: ***more disbelief and silence, just staring at him***
I'm sorry you feel that way.
***walked away***
WTF, I work my ass off for 7 years for this fucking shithead.. Even before I started this bachelors degree I had at least some understanding of the work developers put in their software. It blows my mind, no, it fucking angers me how people think making software is so simple.. Why do you think it's a 3-year education you fucking cunt?
Please, someone tell me how I can keep myself from ramming his fucking head through a wall tomorrow...6 -
Hi everyone, long time no see.
Today I want to tell you a story about Linux, and its acceptance on the desktop.
Long ago I found myself a girlfriend, a wonderful woman who is an engineer too but who couldn't be further from CS. For those in the know, she absolutely despises architects. She doesn't know the size units of computers, i.e. the multiples of the byte. Breaks cables on the regular, and so on. For all intents and purposes, she's a user. She has written some code for a college project before, but she is by no means a developer.
She has seen me using Linux quite passionately for the last year or so, and a few weeks ago she got so fed up with how Windows refused to work on both her computers (on one of them literally failing to run exe's, go figure), that she allowed me to reinstall both systems, with one of them being dualbooted Windows 10 + Linux.
The computer that runs Linux is not one she uses very often, but for gaming (The Sims) it's her platform to go. On it I installed Debian KDE, for the following reasons:
- It had to be stable as I didn't want another box to maintain.
- It had to be pretty OOTB, as first impressions are crucial.
- It had to be easy to use, given her skill level.
- It had to have a GUI abstraction to apt, the KDE team built Discover which looks gorgeous.
She had the following things to say about Linux, when she went to download The Sims from a torrent (I installed qBittorrent for her iirc).
"Linux is better, there's no need to download anything"
"Still figuring things out, but I'm liking it"
"I'm scared of using Windows again, it's so laggy"
"Linux works fine, I'm becoming a Linux user"
Which you can imagine, it filled me with pride. We've done it boys. We've built a superior system that even regular users can use, if the system is set up to be user-friendly.
There are a few gripes I still have, and pitfalls I want to address. There's still too many options, users can drown in the sheer amount of distro's to choose from. For us that's extremely important but they need to have a guide there. However, don't do remote administration for them! That's even worse than Microsoft's tracking! Whenever you install Linux on someone else's computer, don't be all about efficiency, they are coming from Windows and just want it to be easy to use. I use Mate myself, but it is not the thing I would recommend to others. In other words, put your own preferences aside in favor of objective usability. You're trying to sell people on a product, not to impose your own point of view. Dualboot with Windows is fine, gaming still sucks on Linux for the most part. Lots of people don't have their games on Steam. CAD software and such is still nonexistent (OpenSCAD is very interesting but don't tell me it's user-friendly). People are familiar with Windows. If you were to be swimming for the first time in the deep water, would you go without aids? I don't think so.
So, Linux can be shown and be actually usable by regular people. Just pitch it in the right way.11 -
Why do certain software systems fail on a specific day (Friday/Sunday)? For example, getting error 500 "registration module issues :(" on their part.
Do they have some kind of deployment day then? Even so, this isn't acceptable for today's standards.
Oh, let's cut off our customers from purchasing something on those days. What?
Something leads me to think they're not hosted in the cloud then. Outdated architecture..3 -
I'll have to make some tough choices over the next 6 months. With my tech career beginning and my college education ramping up, time is of the essence, and the skills I develop now will be at the forefront of my future. So what does this have to do with Microsoft?
Well, the story begins in the Spring of 2016. Social Forums was about to turn a year old, Trump's campaign was ramping up, and I had just found my love for technology. With all my friends having phones, I had to get a phone and get working on development. The year before, Windows 10 was launched, and I was psyched. I found Microsoft's products to be underrated with potential. That day, I purchased a Lumia 640, upgraded it to Windows 10, and immediately began working. After another year-and-a-half gone by, I went from loving Microsoft, to defending Microsoft, to tolerating Microsoft. I could go on and on about the lousy structure, the privacy issues, the forced upgrades, the redundant developer platform, and other such issues that is leading me away from them. But if there is one thing they have proven over the years, is that the they are completely out of touch with its developers and its customers. They spent years ramping up their phones. They failed. They spend years ramping up their phones. They failed. They spend years ramping up their semi-annual OS updates. They failed. So why did they fail? It's not that they made the wrong prediction out of chance. They legitimately don't care about feedback. It's their way or the highway. This sounds vaguely familiar. They have been spending a decade ignoring feedback from the community because they want to become just like Apple. Right now, Apple LIVES off of brand loyalty and its stable, useful ecosystem. This cannot work for Microsoft as they don't have a lot of brand loyalty. But most of all, they don't have a working ecosystem. They have Windows Insiders, which provides them with hundreds of feedback messages per day. These include suggestions, bug reports, and constructive criticism. The feedback is public. You can have several pages of the same complaint, and they still won't do anything about it. They say they have a good relationship with their community, and that this Beta program helps Windows become better for all. But in the end, we are nothing more than a glorified unpaid labor force. They fired hundreds of professional debuggers just before the Insider Program took off. We are only here to provide bug reports for free. Now that their phones, AR headsets, browser, online services, and VR headsets are failing for all these reasons, I see little reason to develop for Windows anymore. I don't just mean their UWP and App Store platforms, I mean Windows as a whole. I'm definitely not a Mac guy either. I never see myself going to Mac either, as they are really no different in terms of how they treat their Developers and PC users. If things continue down this route, I will leave the platform all together. I've always wanted to be a Systems Programmer, so I don't really need an established paid platform to be successful. Even now, I'm not certain about leaving Windows altogether but as a developer, I need to find my place. Time is of the essence in my life, and I need to find out my place in the software world. Now I think it isn't on the Windows platform like I had dreamed it would be. But where do I go?10