Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "service status"
-
Happened a few weeks ago but still awesome.
Me and a good friend have a website together but we don't monitor it too much.
He studied with me in the same class but went towards frontend/apps where I chose backend/servers/security. He knows how to do basic Linux stuff but that's about it.
We were at a party when he noticed that our site was offline. Walked over to me (because I manage the server) to notify me so I could look into it said I'd look into it (phone):
*visits site: nothing*
*online dig tool: got the server ip*
*remembered this one didn't have pubkey authentication - after three passwords attempts I'm in*
"service apache2 status"
*service doesn't exist*
*right, migrated this one from Apache to nginx....*
"history"
*ah, an nginx restart probably suffices...*
"service nginx restart"
BAM, site is reachable again.
*god damnit, lets encrypt cert expired...*
"history"
*sees command with certbot and our domain both in one*
"!892"
*20 seconds later: success message*
*service nginx reload*
BAM, site works securely again.
"Yo mate, check the site again"
Mate: 😶 w-w-what? *checks site and his watch* you started less than two minutes ago...?
Me: yeah..?
Mate: 😶 now this is why YOU manage our server and I don't 😐
His face was fucking gold. It wasn't that difficult for me (I do this daily) but to him, I was a God at that moment.
Awesome moment 😊24 -
Dear nerds from all over the world,
We get it. 404 pics are funny.
But did you know there other status codes too?
Like...
204 - No Response
301 - Moved
302 - Found
400 - Bad request
401 - Unauthorized
402 - Payment Required
403 - Forbidden
501 - Not Implemented
502 - Service Temporarily Overloaded
I'm sure you'll also find funny situations with these.
Thanks. We're the best!26 -
Navy story time, and this one is lengthy.
As a Lieutenant Jr. I served for a year on a large (>100m) ship, with the duties of assistant navigation officer, and of course, unofficial computer guy. When I first entered the ship (carrying my trusty laptop), I had to wait for 2 hours at the officer's wardroom... where I noticed an ethernet plug. After 15 minutes of waiting, I got bored. Like, really bored. What on TCP/IP could possibly go wrong?
So, scanning the network it is. Besides the usual security holes I came to expect in ""military secure networks"" (Windows XP SP2 unpatched and Windows 2003 Servers, also unpatched) I came along a variety of interesting computers with interesting things... that I cannot name. The aggressive scan also crashed the SMB service on the server causing no end of cute reactions, until I restarted it remotely.
But me and my big mouth... I actually talked about it with the ship's CO and the electronics officer, and promptly got the unofficial duty of computer guy, aka helldesk, technical support and I-try-to-explain-you-that-it-is-impossible-given-my-resources guy. I seriously think that this was their punishment for me messing around. At one time I received a call, that a certain PC was disconnected. I repeatedly told them to look if the ethernet cable was on. "Yes, of course it's on, I am not an idiot." (yea, right)
So I went to that room, 4 decks down and 3 sections aft. Just to push in the half-popped out ethernet jack. I would swear it was on purpose, but reality showed me I was wrong, oh so dead wrong.
For the full year of my commission, I kept pestering the CO to assign me with an assistant to teach them, and to give approval for some serious upgrades, patching and documenting. No good.
I set up some little things to get them interested, like some NMEA relays and installed navigation software on certain computers, re-enabled the server's webmail and patched the server itself, tried to clean the malware (aka. Sisyphus' rock), and tried to enforce a security policy. I also tried to convince the CO to install a document management system, to his utter horror and refusal (he was the hard copy type, as were most officers in the ship). I gave up on almost all besides the assistant thing, because I knew that once I left, everything would go to the high-entropy status of carrying papers around, but the CO kept telling me that would be unnecessary.
"You'll always be our man, you'll fix it (sic)".
What could go wrong?
I got my transfer with 1 week's notice. Panic struck. The CO was... well, he was less shocked than I expected, but still shocked (I learned later that he knew beforehand, but decided not to tell anybody anything). So came the most rediculous request of all:
To put down, within 1 A4 sheet, and in simple instructions, the things one had to do in order to fulfil the duties of the computer guy.
I. SHIT. YOU. NOT.
My answer:
"What I can do is write: 'Please read the following:', followed by the list of books one must read in order to get some introductory understanding of network and server management, with most accompanying skills."
I was so glad I got out of that hellhole.6 -
One week, and it turned out to be worse than that.
I was put on a project for a COVID-19 program in America (The CARES Act). The financial team came to us on Monday morning and said they need to give away a couple thousand dollars.
No big deal. All they wanted was a single form that people could submit with some critical info. Didn't need a login/ registration flow or anything. You could have basically used Google Forms for this project.
The project landed in my lap just before lunch on Monday morning. I was a junior in a team with a senior and another junior on standby. It was going to go live the next Monday.
The scope of the project made it seem like the one week deadline wasn't too awful. We just had to send some high priority emails to get some prod servers and app keys and we were fine.
Now is the time where I pause the rant to express to you just how fine we were decidedly **not**: we were not fine.
Tuesday rolls around and what a bad Tuesday it was. It was the first of many requirement changes. There was going to need to be a review process. Instead of the team just reading submissions from the site, they needed accept and reject buttons. They needed a way to deny people for specific reasons. Meaning the employee dashboard just got a little more complicated.
Wednesday came around and yeah, we need a registration and login flow. Yikes.
Thursday came and the couple-thousand dollars turned into a tens of millions. The amount of users we expected just blew up.
Friday, and they needed a way for users to edit their submissions and re-submit if they were rejected. And we needed to send out emails for the status of their applications.
Every day, a new meeting. Every meeting, new requirements that were devastating given our timeframe.
We put in overtime. Came in on the weekend. And by Monday, we had a form that users could submit and a registration/ login flow. No reviewer dashboard. We figured we could take in user input on time and then finish the dashboard later.
Well, financial team has some qualms. They wanted a more complicated review process. They wanted roles; managers assign to assistants. Assistants review assigned items.
The deadline that we worked so hard on whizzed by without so much as a thought, much less the funeral it deserved.
Then, they wanted multiple people to review an application before it was final. Then, they needed different landing pages for a few more departments to be able to review different steps of the applications.
Ended up going live on Friday, close to a month after that faithful Monday which disrupted everything else I was working on, effective immediately.
I don't know why, but we always go live on a Friday for some reason. It must be some sort of conspiracy to force overtime out of our managers. I'm baffled.
But I worked support after the launch.
And there's a funny story about support too: we were asked to create a "submit an issue" form. Me and the other junior worked on it on a wednesday three weeks into the project. Finished it. And the next day it was scrapped and moved to another service we already had running. Poor management like that plagued the project and worked in tandem with the dynamic and ridiculous requirements to make this project hell.
Back to support.
Phone calls give me bad anxiety. But Friday, just before lunch, I was put on the support team. Sure, we have a department that makes calls and deal with users. But they can't be trained on this program: it didn't exist just a month ago, and three days ago it worked differently (the slippery requirements never stopped).
So all of Friday and then all of Saturday and all of Monday (...) I had extended panic attacks calling hundreds of people. And the team that was calling people was only two people. We had over 400 tickets in the first two days.
And fuck me, stupid me, for doing a good job. Because I was put on the call team for **another** COVID project afterwards. I knew nothing about this project. I have hated my job recently. But I'm a junior. What am I gonna say, no?7 -
It's maddening how few people working with the internet don't know anything about the protocols that make it work. Web work, especially, I spend far too much time explaining how status codes, methods, content-types etc work, how they're used and basic fundamental shit about how to do the job of someone building internet applications and consumable services.
The following has played out at more than one company:
App: "Hey api, I need some data"
API: "200 (plain text response message, content-type application/json, 'internal server error')"
App: *blows the fuck up
*msg service team*
Me: "Getting a 200 with a plaintext response containing an internal server exception"
Team: "Yeah, what's the problem?"
Me: "...200 means success, the message suggests 500. Either way, it should be one of the error codes. We use the status code to determine how the application processes the request. What do the logs say?"
Team: "Log says that the user wasn't signed in. Can you not read the response message and make a decision?"
Me: "That status for that is 401. And no, that would require us to know every message you have verbatim, in this case, it doesn't even deserialize and causes an exception because it's not actually json."
Team: "Why 401?"
Me: "It's the code for unauthorized. It tells us to redirect the user to the sign in experience"
Team: "We can't authorize until the user signs in"
Me: *angermatopoeia* "Just, trust me. If a user isn't logged in, return 401, if they don't have permissions you send 403"
Team: *googles SO* "Internet says we can use 500"
Me: "That's server error, it says something blew up with an unhandled exception on your end. You've already established it was an auth issue in the logs."
Team: "But there's an error, why doesn't that work?"
Me: "It's generic. It's like me messaging you and saying, "your service is broken". It doesn't give us any insight into what went wrong or *how* we should attempt to troubleshoot the error or where it occurred. You already know what's wrong, so just tell me with the status code."
Team: "But it's ok, right, 500? It's an error?"
Me: "It puts all the troubleshooting responsibility on your consumer to investigate the error at every level. A precise error code could potentially prevent us from bothering you at all."
Team: "How so?"
Me: "Send 401, we know that it's a login issue, 403, something is wrong with the request, 404 we're hitting an endpoint that doesn't exist, 503 we know that the service can't be reached for some reason, 504 means the service exists, but timed out at the gateway or service. In the worst case we're able to triage who needs to be involved to solve the issue, make sense?"
Team: "Oh, sounds cool, so how do we do that?"
Me: "That's down to your technology, your team will need to implement it. Most frameworks handle it out of the box for many cases."
Team: "Ah, ok. We'll send a 500, that sound easiest"
Me: *..l.. -__- ..l..* "Ok, let's get into the other 5 problems with this situation..."
Moral of the story: If this is you: learn the protocol you're utilizing, provide metadata, and stop treating your customers like shit.22 -
This just in everyone...
Android Dev: *sent and email to network admin* can you please unblock github for a few mins.
Network admin: *Replied* Can you take a screenshot whats the error your getting.
Android Dev: *Replied with screenshot* "Failed to load resource: the server responded with a status of 503 (Service Unavailable)"
Network admin: that is a known issue. *Replied with Wordpress Links.
Android Dev: why is github working outside our network then?
Network admin: there must be a problem with your code that needs to be tweaked.
Team: *FACE PALM*5 -
"Thank you for choosing Microsoft!"
No Microsoft, I really didn't choose you. This crappy hardware made you the inevitable, not a choice.
And like hell do I want to run your crappy shit OS. I tried to reset my PC, got all my programs removed (because that's obviously where the errors are, not the OS, right? Certified motherfuckers). Yet the shit still didn't get resolved even after a reset. Installing Windows freshly again, because "I chose this".
Give me a break, Microshaft. If it wasn't for your crappy OS, I would've gone to sleep hours ago. Yet me disabling your shitty telemetry brought this shit upon me, by disabling me to get Insider updates just because I added a registry key and disabled a service. Just how much are you going to force data collection out of your "nothing to hide, nothing to fear" users, Microsoft?
Honestly, at this point I think that Microsoft under Ballmer might've been better. Because while Linux was apparently cancer back then, at least this shitty data collection for "a free OS" wasn't yet a thing back then.
My mother still runs Vista, an OS that has since a few months ago reached EOL. Last time she visited me I recommended her to switch to Windows 7, because it looks the same but is better in terms of performance and is still supported. She refused, because it might damage her configurations. Granted, that's probably full of malware but at this point I'm glad she did.
Even Windows 7 has telemetry forcibly enabled at this point. Vista may be unsupported, but at least it didn't fall victim to the current status quo - data mining on every Microshaft OS that's still supported.
Microsoft may have been shady ever since they pursued manufacturers into defaulting to their OS, and GPU manufacturers will probably also have been lobbied into supporting Windows exclusively. But this data mining shit? Not even the Ballmer era was as horrible as this. My mother may not realize it, but she unknowingly avoided it.6 -
*tries to SSH into my laptop to see how that third kernel compilation attempt went*
… From my Windows box.
Windows: aah nope.
"Oh God maybe the bloody HP thing overheated again"
*takes laptop from beneath the desk indent*
… Logs in perfectly. What the hell... Maybe it's SSH service went down?
$ systemctl status sshd
> active (running)
Well.. okay. Can I log in from my phone?
*fires up Termux*
*logs in just fine*
What the fuck... Literally just now I added the laptop's ECDSA key into the WSL known_hosts by trying to log into it, so it can't be blocked by that shitty firewall (come to think of it, did I disable that featureful piece of junk yet? A NAT router * takes care of that shit just fine Redmond certified mofos).. so what is it again.. yet another one of those fucking WanBLowS features?!!
condor@desktop $ nc -vz 192.168.10.30 22
Connection to 192.168.10.30 22 port [tcp/ssh] succeeded!
ARE YOU FUCKING FOR REAL?!
Fucking Heisen-feature-infested piece of garbage!!! Good for gaming and that's fucking it!
Edit: (*) this assumes that your internal network doesn't have any untrusted hosts. Public networks or home networks from regular users that don't audit their hosts all the time might very well need a firewall to be present on the host itself as well.17 -
Apologies to everyone.
I got sick.
Hence, resulted in service unavailability.
Current health status: 100% OK
Please continue to enjoy me aka github7 -
I helped a homeless man with a really bad foot condition down the stairs to get to the subway platform along with bringing down his wheelchair. Just his luck that the elevator was out of service. The man cried and said that he wished more people like me existed in the world but the world he knew was long gone from his grasp.
My problem with that situation was that people just ignored him and look disgusted at him....since when did we as humans lose faith in each other to the point that we ultimately care about differences, status, and appearance before anything else?
Such a sad fucking world we live in today and I had just learned that he used to be a programmer and worked at IBM back in the days (he had a really old decrepit badge too and we shared some programming woes before we got on separate trains).
Bless him and fuck everyone else who stood by and watched the poor man suffer. Those people are worse than shit, they are the scum of the Earth in my eyes.4 -
Highlights from my week:
Prod access: Needed it for my last four tickets; just got it approved this week. No longer need it (urgently, anyway). During setup, sysops didn’t sync accounts, and didn’t know how. Left me to figure out the urls on my own. MFA not working.
Work phone: Discovered its MFA is tied to another coworker’s prod credentials. Security just made it work for both instead of fixing it.
My merchant communication ticket: I discovered sysops typo’d my cronjob so my feature hasn’t run since its release, and therefore never alerted merchants. They didn’t want to fix it outside of a standard release. Some yelling convinced them to do it anyway.
AWS ticket: wow I seriously don’t give a crap. Most boring ticket I have ever worked on. Also, the AWS guy said the project might not even be possible, so. Weee, great use of my time.
“Tiny, easy-peasy ticket”: Sounds easy (change a link based on record type). Impossible to test locally, or even view; requires environments I can’t access or deploy to. Specs don’t cover the record type, nor support creating them. Found and patched it anyway.
Completed work: Four of my tickets (two high-priority) have been sitting in code review for over a month now.
Prod release: Release team #2 didn’t release and didn’t bother telling anyone; Release team #1 tried releasing tickets that relied upon it. Good times were had.
QA: Begs for service status page; VP of engineering scoffs at it and says its practically impossible to build. I volunteered. QA cheered; VP ignored me.
Retro: Oops! Scrum master didn’t show up.
Coworker demo: dogshit code that works 1 out of 15 times; didn’t consider UX or user preferences. Today is code-freeze too, so it’s getting released like this. (Feature is using an AI service to rearrange menu options by usage and time of day…)
Micromanager response: “The UX doesn’t matter; our consumers want AI-driven models, and we can say we have delivered on that. It works, and that’s what matters. Good job on delivering!”
Yep.
So, how’s your week going?2 -
Disclaimer: Long tale of a tech support job. Also the wk29 story is at the bottom.
One time I was working tech support for a website and email hosting firm that was in town. I was hired and worked as the only tech support person there, so all calls came in through me. This also meant that if I was on a call, and another one came through, they would go straight to voice mail. But I couldn't hang up calls either, so, sometimes someone would take up tons of time and I'd have to help them. I was also the "SEO" and "Social Media Marketing" person, as well; managed peoples' social media campaigns. I have tons of stories from this place but a few in particular stick out to me. No particular order to these, I'm just reminiscing as I write this.
I once had to help a man who couldn't find the start button on his computer. When I eventually guided him to allowing me to remote into his computer via Team Viewer, I found he was using Windows XP. I'm not kidding.
I once had to sit on the phone with a man selling Plexus Easy Weight Loss (snake oil, pyramid scheme, but he was a client) and have him yell at me about not getting him more business, simply because we'd built his website. No, I'D not built his website, but his website was fine and it wasn't our job to get him more business. Oh yeah, this is the same guy who said that he didn't want the social media marketing package because he "had people to hide from." Christ.
We had another client who was a conspiracy theorist and wanted the social media marketing package for his blog, all about United States conspiracies. Real nut case. But the best client I've ever had because sometimes he'd come into the office and take up my time talking at me about how Fukushima was the next 911 and that soon it'll spill into the US water supply and everybody was going to die. Hell, better than being on the phone! Doing his social media was great because he wanted me to post clearly fake news stories to his twitter and facebook for him, and I got to look at and manage all the comments calling him out on his bullshit. It was kinda fun. After all, it wasn't _me_ that believed all this. It felt like I was trolling.
[wk29] I was the social media and support techie, not a salesperson. But sometimes I was put in charge _alone_ in front of clients for status meetings about their social media. This one time we had a client who was a custom fashion-type person. I don't really remember. But I was told directly to make them a _new_ facebook page and post to it every day with their hot new deals and stuff. MONTHS pass since I do that and they come in for a face-to-face meeting. Boss is out doing... boss things and that means I have to sit in with her, and for some fucking reason she brought her boyfriend AND HER DAD. Who were both clearly very very angry with me, the company, and probably life. They didn't ever say anything at first, they didn't greet me, they were both just there like British royal guards. It was weird as fuck. I start showing them the page, the progress on their likes goals, etc etc. Marketing shit. They say, "huh, we didn't see any of these posts at home." Turns out they already had a Facebook page, I was working on a completely seperate one, and then the boyfriend finally chimes in with the biggest fucking scowl, "what are you going to do about this?" He was sort of justified, considering this was a payed and semi-expensive service we offered, but holy shit the amount of fire in all three of them. Anyway, it came down to me figuring out how to merge facebook pages, but they eventually left as clients. Is this my fuck up? Is it my company's? Is it theirs? I don't know but that was probably the most awkward meeting ever. Don't know if it comes across through text but the anxiety was pretty real. Fuck.
tl;dr Tech support jobs are a really fun and exciting entry level position I recommend everybody apply for if they're starting out in the tech world! You'll meet tons of cool people and every day is like a new adventure.2 -
It's my second rant about Windows here in two days, but here we go:
Windows used to be a cool OS (and in part it still is). Yes, it's made for the end user, not power users, yes it has many flaws. But it was my gateway to computers and programming. I have fond memories of my first PC, playing around with the old win98 themes (my favorite was the baseball one!).
However, I am very disappointed now. I just had to basically force Windows 10 to stop hogging my bandwidth. It was an actual battle, with the OS simply (I kid you not) running update and other services EVEN AFTER I SPECIFICALLY DISABLED THEM. I just saw the Windows update service running, while its status was disabled. It's absurd.
Sorry Windows, but that's not what I want. I want to choose what happens on my own OS. Linux gives me exactly that, why can't you?11 -
Hmm internet connection is down. Check isp status page...no issues. Wait 50mins on phone to get to support, where they tell me there is a known issue, reported 4 hours ago. After call check isp status page...no issues
Is AWS selling status pages as a service now?5 -
In today's episode of kidding on SystemD, we have a surprise guest star appearance - Apache Foundation HTTPD server, or as we in the Debian ecosystem call it, the Apache webserver!
So, imagine a situation like this - Its friday afternoon, you have just migrated a bunch of web domains under a new, up to date, system. Everything works just fine, until... You try to generate SSL certificates from Lets Encrypt.
Such a mundane task, done more than a thousand times already... Yet... No matter what you do, nothing works. Apache just returns a HTTP status code 403 - Forbidden.
Of course, what many folk would think of first when it came to a 403 error is - Ooooh, a permission issue somewhere in the directory structure!
So you check it... And re-check it to make sure... And even switch over to the user the webserver runs under, yet... You can access the challenge just fine, what the hell!
So you go deeper... And enable the most verbose level of logging apache is capable of - Trace8. That tells you... Not a whole lot more... Apparently, the webserver was unable to find file specified? But... Its right there, you can see it!
So you go another step deeper and start tracing the process' system calls to see exactly where it calls stat/lstat on the file, and you see that it... Calls lstat and... It... Returns -1? What the hell#2!
So, you compile a custom binary that calls lstat on the first argument given and prints out everything it returns... And... It works fine!
Until now, I chose to omit one important detail that might have given away the issue to the more knowledgeable right away. Our webservers have the URL /.well-known/acme-challenge/, used for ACME challenges, aliased somewhere else on the filesystem - To /tmp/challenges.
See the issue already?
Some *bleep* over at the Debian Package Maintainer group decided that Apache could save very sensitive data into /tmp, so, it would be for the best if they changed something that worked for decades, and enabled a SystemD service unit option "PrivateTmp" for the webserver, by default.
What it does is that, anytime a process started with this option enabled writes to /tmp/*, the call gets hijacked or something, and actually makes the write to a private /tmp/something/tmp/ directory, where something... Appeared as a completely random name, with the "apache2.service" glued at the end.
That was also the only reason why I managed fix this issue - On the umpteenth time of checking the directory structure, I noticed a "systemd-private-foobarbas-apache2.service-cookie42" directory there... That contained nothing but a "tmp" directory with 777 as its permission, owned by the process' user and group.
Overriding that unit file option finally fixed the issue completely.
I have just one question - Why? Why change something that worked for decades? I understand that, in case you save something into /tmp, it may be read by 3rd parties or programs, but I am of the opinion that, if you did that, its only and only your fault if you wrote sensitive data into the temporary directory.
And as far as I am aware, by default, Apache does not actually write anything even remotely sensitive into /tmp, so...
Why. WHY!
I wasted 4 hours of my life debugging this! Only to find out its just another SystemD-enabled "feature" now!
And as much as I love kidding on SystemD, this time, I see it more as a fault of the package maintainers, because... I found no default apache2/httpd service file in the apache repo mirror... So...8 -
In my previous company we developed a CRM web app for the company to use internally and it was in my humble opinion really easy to make sense of, but for some freaking we kept getting calls whenever someone got an error, and our default response was always to send us an email, then we will get back to you, as it was mostly stupid things they called about, for example, a customer might have to be status terminated, before you can click button A, button A would then be disabled and employees would call asking why. Apparently, people got annoyed by our response and went to the management, to get some guidelines as to when they could call the "development apartment" for help, so the management sends out some guidelines as to when they could call, write or whatever... The following was done without consulting us in any way ANY WAY AT ALL!... Because we all know management knows fucking best, and why bother asking the people that sit with it every day, and the way it was done was by saying:
If the background color on your error is red, it means the error is fatal and you can call the developers immediately, if its orange send an email and they will answer within 48 hours LIKE WTF... Seriously???. That was basically it, and honestly we had just been using colors, without much thought to it ofc red, was an error etc. But they we're not "OMG EVERYTHING IS BREAKING" alert, so we decided to use a couple of hours refactoring the color of the flash errors, and after that, we did not have many red alerts(None, yes none what so ever) We changed all the red ones to orange, and introduced some new colors. That worked for some time around 6 months or so, but then people obviously started calling again like, why even bother... So we created a simple service desk, blocked all incoming calls to our phones that were from regular employees, heard a lot of complaints about this from the employees, management was mad, we had so many meetings with those top paid management fuckers that know everything (way better than you and me), about how to handle this. As it took way too much of our time, that people couldn't bother trying simple things, or make some sense as to why a button is disabled etc. We ended up "winning", was allowed to block calls for some time, till the employees had learned to use a freaking simple service desk, it's not fucking rocket science Okay, stop being a pain in the ass... And it actually fucking worked! Most relaxing time after people got a hang of using the service desk instead of calling life was good after that... <3 -
The whois service for the legacy top-level domain for Germany (.de) is one of the most fucked up things on the internet.
For years now they've restricted the whois service to notice you about their website information service (https://denic.de/en, you run a search and get information about the domain) which already cost you an unnecessary amount of time if you simply want to lookup something.
A while back they changed it so that you need to state whether you want to look it up fotr informative purposes or business purposes, then they changed it so that you need to supply a reason in a text box.
The new (GDPR) way is that you only get the connectivity status ("connect", "free") via whois and the nameservers on the website (without supplying a reason, which actually is an improvement). Everything this either is for executive authorities or the domain owner (by entering their mail address or zip code).
Germany - the land of "We can opt out of any standard because we can and since theaws changed we can also behave like dickbutts".
Adding the GDPR now only fed the trolls even more.7 -
That moment when returning correct HTTP status codes from an API become a feature request 😒
For the meantime I will need to deal with endpoints returning status 200 for everything, and status 500 when the service crash. 🤦🏼♂️4 -
That awkward moment when you phone IKEA Costumer Service to ask for the buCode that is used in their API to see product status in stores.
First I spoke to a salesperson and almost had to explain what an API is. The next person understood (after a while) that it is a 3-digit code that identifies the store.
And yes, I am from Sweden. 😂2 -
Sometimes you keep being the product even when paying for the service:
https://twitter.com/LucaBongiorni/...4 -
Aren't you, software engineer, ashamed of being employed by Apple? How can you work for a company that lives and shit on the heads of millions of fellow developers like a giant tech leech?
Assuming you can find a sounding excuse for yourself, pretending its market's fault and not your shitty greed that lets you work for a company with incredibly malicious product, sales, marketing and support policies, how can you not feel your coders-pride being melted under BILLIONS of complains for whatever shitty product you have delivered for them?
Be it a web service that runs on 1980 servers with still the same stack (cough cough itunesconnect, membercenter, bug tracker, etc etc etc etc) incompatible with vast majority of modern browsers around (google at least sticks a "beta" close to it for a few years, it could work for a few decades for you);
be it your historical incapacity to build web UI;
be it the complete lack of any resemblance of valid documentation and lets not even mention manuals (oh you say that the "status" variable is "the status of the object"? no shit sherlock, thank you and no, a wwdc video is not a manual, i don't wanna hear 3 hours of bullshit to know that stupid workaround to a stupid uikit api you designed) for any API you have developed;
be it the predatory tactics on smaller companies (yeah its capitalism baby, whatever) and bending 90 degrees with giants like Amazon;
be it the closeness (christ, even your bugtracker is closed and we had to come up with openradar to share problems that you would anyway ignore for decades);
be it a desktop ui api that is so old and unmaintained and so shitty, but so shitty, that you made that cancer of electron a de facto standard for mainstream software on macos;
be it a IDE that i am disgusted to even name, xcrap, that has literally millions of complains for the same millions of issues you dont even care to answer to or even less try to justify;
be it that you dont disclose your long term plans and then pretend us to production-test and workaround-fix your shitty non-production ready useless new OS features;
be it that a nervous breakdown on a stupid little guy on the other side of the planet that happens to have paid to you dozens of thousands of euros (in mandatory licences and hardware) to actually let you take an indecent cut out of his revenues cos there is no other choice in a monopoly regime, matter zero to you;
Assuming all of these and much more:
How can you sleep at night with all the screams of the devs you are exploiting whispering in you mind? Are all the money your earn worth?
** As someone already told you elsewhere, HAVE SOME FUCKING PRIDE, shitty people AND WRITE THE FUCKING DOCS AND FIX THE FUCKING BUGS you lazy motherfuckers, your are paid more than 99.99% of people on earth, move your fucking greasy little fingers on that fucking keyboard. **
PT2: why the fuck did you remove the ESC key from your shitty keyboards you fuckshits? is it cos autocomplete is slower than me searching the correct name of a function on stackoverflow and hence ESC key is useless? at least your hardware colleagues had the decency of admitting their error and rolling back some of the uncountable "questionable "hardware design choices (cough cough ...magic mouse... cough golden charging cables not compatible with your own devices.. cough )?12 -
I hate having to deal with our IT service desk. Every time it takes enormous energy to get to the right people and make them understand that no, you are not an idiot, but you actually have a technical issue.
Sure thing they do have a few competent nice folks there too I've gotten to know over time and they indeed have to deal with a ton of dumb non-tech savvy idiots on a daily basis. However, if my job title mentions "software" and "engineer" they should at least assume I'm an idiot in tech. Or something. Every single time I need to open a ticket, even for the simplest "add x to env y", I need to quadruple check that the subject line is moron-friendly because otherwise they would take every chance to respond "nah we can't do that", "that's not us", or "sry that's not allowed". And then I would need to respond, "yes you do:) your slightly more competent colleague just did this for us 2 weeks ago".
Now you might imagine this is on even another level when the problem is complex.
One of our internal apps has been failing because one of the internal APIs managed by a service desk team responds a 500 status code randomly but only when called with a specific internal account managed by another service desk team.
(when I say "managed by", that doesn't mean they maintain it, it just mean they are the only ones who would have access to change something)
Yesterday I spent over a fucking hour writing a super precise essay detailing the issue, proving a million times it's not on our end and that they need to fix it. Now here is an insight to what beautiful "IT service" our service desk provides:
1) ticket gets assigned to a "Connectivity Engineer" lady
2) few hours later she responds and asks me to give her the app and environment IDs and grant her access to those
(naturally everything in my email was ignored including these two IDs)
3) since the app needs to be in prod for the issue, I make a copy isolating the failing part and grant her access to the original "for reference" and the copy to play with
4) few hours later I get an email from the env that some guy called P made changes to the actual app, no changes to the copy
(maybe they immediately fixed the app even though I asked them to only touch the copy)
I also check the env and the live app had been shared with another 2 people giving them editing rights:)
5) another few hours pass and the lady responds that she had been chatting with P (no mention of who tf that guy is) and that P has a suggestion that might work and I should test it, "please see screen shot" for details:
These motherfuckers sent me a fucking screenshot of the env config file where "P has edited a few parameters" that might help. The screenshot had a 16 line part of the config json with a bunch of IDs and Base64 params which HE EDITED LOCALLY.
Again, because I needed a few iterations to realise what I've just witnessed:
These idiots modified some things in the main app (not the copy) for hours. Then came to the conclusion that the config needs some IDs and params updated. They downloaded the config json. Edited it locally. Did not fucking upload it back to the main or test app. Did not test it live. Did not CC in or direct the guy with changes to me. Did not send me the modified config file. Did not even paste the new IDs into the email. But TOOK A FUCKING SCREENSHOT OF THE MODIFIED FILE AND SENT THAT SHIT TO ME. And then had the audacity to ask me to test it when they had access to it and that's literally their fucking job.
I had to compare the fucking screenshot to the live config file and manually type in the changes.
And no, it still doesn't work. And Now I have to get back to them showing it still fails the same way but I just can't deal with these people. Fuck. Was hoping by the time I write it all down it'd be better, and it does feel a bit better, but I still need to get this app fixed. And I can only do it through these... monkeys. I just can't. Talking to these people drains my life energy... I'm just sad. -
SO MAD. Hands are shaking after dealing with this awful API for too long. I just sent this to a contact at JP Morgan Chase.
-------------------
Hello [X],
1. I'm having absolutely no luck logging in to this account to check the Order Abstraction service settings. I was able to log in once earlier this morning, but ever since I've received this frustratingly vague "We are currently unable to complete your request" error message (attached). I even switched IP's via a VPN, and was able to get as far as entering the below Identification Code until I got the same message. Has this account been blocked? Password incorrect? What's the issue?
2. I've been researching the Order Abstraction API for hours as well, attempting to defuddle this gem of an API call response:
error=1&message=Authentication+failure....processing+stopped
NOWHERE in the documentation (last updated 14 months ago) is there any reference to this^^ error or any sort of standardized error-handling description whatsoever - unless you count the detailed error codes outlined for the Hosted Payment responses, which this Order Abstraction service completely ignores. Finally, the HTTP response status code from the Abstraction API is "200 OK", signaling that everything is fine and dandy, which is incorrect. The error message indicates there should be a 400-level status code response, such as 401 Unauthorized, 403 Forbidden or at least 400 Bad Request.
Frankly, I am extremely frustrated and tired of working with poorly documented, poorly designed and poorly maintained developer services which fail to follow basic methodology standardized decades ago. Error messages should be clear and descriptive, including HTTP status codes and a parseable response - preferably JSON or XML.
-----
This whole piece of garbage is junk. If you're big enough to own a bank, you're big enough to provide useful error messages to the developers kind enough to attempt to work with you.2 -
Everytime you tell yourself "This time I'm going to make them stop putting the cart before the horse again!!! No more forced shit implementations!!! NO MORE ! I'm strong!!"
The last hour in the next week:
- Selinux: off
- Firewall: Any-Any
- Application data: Everything installed on OS disc.
- Documentation: At best, someone remembers the server supposed-to-be dns record
- Service Accounts: Your domain admin account and sysadmin for databases.
- Patching: DON'T EVER THINK ABOUT IT..AND NO REBOOTING! I have set very important runtime variables.
- Backup: Maybe someone else will set this up.
- Monitoring: Not needed since clients will create tickets if system fails.
- Production Status: vague at best. Sort of silently transitioned to production.
- Handover status: Probably, but I quit before the project closed.
! -
sudo pacman -Syu --force
sudo reboot
Openfire and owncloud no longer work.
sudo systemstl status openfire
Java exception relating to SQL.
sudo systemstl status mariadb
No such service mariadb
WTF why would that get uninstalled, how the hell.
sudo pacman -S mariadb
Everything now works again
Arch can be a confusing place
😺
Maybe that force was a bad idea.5 -
One of the projects started a year ago was to replace our tech/status dashboard. Two of our new people were even dedicated to developing it full time for 6 ~ 8 months. I found out today the entire board is gone, and it's been replaced to a link to Service Now.
Nearly a year of work, completely flushed down the tubes. Glad I wasn't on that project.2 -
Stakeholder: Users are unable to buy tickets on the website. IT says Azure’s health check is showing an unhealthy status.
[It’s Sunday. Web Engineering is not on call so no one sees this right away.]
Stakeholder: IT restarted the Azure website twice, but users still can’t place orders.
Me: There was never an issue with the Azure site. That health check is inaccurate. There is a rewrite rule that sends the Azure supplied domain to our custom domain. The Azure health check doesn’t like that so it returns an unhealthy status. The problem is the ticketing server that the website has to communicate with. The ticketing server is overwhelmed and can’t handle more requests. IT should have checked the ticketing server’s logs. This has happened before and it’s never been an Azure issue. It’s a ticketing server issue.
Stakeholder and IT: Oops 😅
—-
JFC. Stop trying to make this web engineering’s problem. Stop trying to make it look like engineering dropped the ball. The ticketing server has experienced this issue multiple times. The ticketing server is maintained by a different team. The website’s symptoms are always the same and there are steps you need to take before you make the decision to restart the website, which will cause the website to show a blue screen of death that says 503 service unavailable for a few minutes. And we have a switch to shut off all transactions. Why do you not want to use it when it’s clear the website can’t process transactions???3 -
When the CTO/CEO of your "startup" is always AFK and it takes weeks to get anything approved by them (or even secure a meeting with them) and they have almost-exclusive access to production and the admin account for all third party services.
Want to create a new messaging channel? Too bad! What about a new repository for that cool idea you had, or that new microservice you're expected to build. Expect to be blocked for at least a week.
When they also hold themselves solely responsible for security and operations, they've built their own proprietary framework that handles all the authentication, database models and microservice communications.
Speaking of which, there's more than six microservices per developer!
Oh there's a bug or limitation in the framework? Too bad. It's a black box that nobody else in the company can touch. Good luck with the two week lead time on getting anything changed there. Oh and there's no dedicated issue tracker. Have you heard of email?
When the systems and processes in place were designed for "consistency" and "scalability" in mind you can be certain that everything is consistently broken at scale. Each microservice offers:
1. Anemic & non-idempotent CRUD APIs (Can't believe it's not a Database Table™) because the consumer should do all the work.
2. Race Conditions, because transactions are "not portable" (but not to worry, all the code is written as if it were running single threaded on a single machine).
3. Fault Intolerance, just a single failure in a chain of layered microservice calls will leave the requested operation in a partially applied and corrupted state. Ger ready for manual intervention.
4. Completely Redundant Documentation, our web documentation is automatically generated and is always of the form //[FieldName] of the [ObjectName].
5. Happy Path Support, only the intended use cases and fields work, we added a bunch of others because YouAreGoingToNeedIt™ but it won't work when you do need it. The only record of this happy path is the code itself.
Consider this, you're been building a new microservice, you've carefully followed all the unwritten highly specific technical implementation standards enforced by the CTO/CEO (that your aware of). You've decided to write some unit tests, well um.. didn't you know? There's nothing scalable and consistent about running the system locally! That's not built-in to the framework. So just use curl to test your service whilst it is deployed or connected to the development environment. Then you can open a PR and once it has been approved it will be included in the next full deployment (at least a week later).
Most new 'services' feel like the are about one to five days of writing straightforward code followed by weeks to months of integration hell, testing and blocked dependencies.
When confronted/advised about these issues the response from the CTO/CEO
varies:
(A) "yes but it's an edge case, the cloud is highly available and reliable, our software doesn't crash frequently".
(B) "yes, that's why I'm thinking about adding [idempotency] to the framework to address that when I'm not so busy" two weeks go by...
(C) "yes, but we are still doing better than all of our competitors".
(D) "oh, but you can just [highly specific sequence of undocumented steps, that probably won't work when you try it].
(E) "yes, let's setup a meeting to go through this in more detail" *doesn't show up to the meeting*.
(F) "oh, but our customers are really happy with our level of [Documentation]".
Sometimes it can feel like a bit of a cult, as all of the project managers (and some of the developers) see the CTO/CEO as a sort of 'programming god' because they are never blocked on anything they work on, they're able to bypass all the limitations and obstacles they've placed in front of the 'ordinary' developers.
There's been several instances where the CTO/CEO will suddenly make widespread changes to the codebase (to enforce some 'standard') without having to go through the same review process as everybody else, these changes will usually break something like the automatic build process or something in the dev environment and its up to the developers to pick up the pieces. I think developers find it intimidating to identify issues in the CTO/CEO's code because it's implicitly defined due to their status as the "gold standard".
It's certainly frustrating but I hope this story serves as a bit of a foil to those who wish they had a more technical CTO/CEO in their organisation. Does anybody else have a similar experience or is this situation an absolute one of a kind?2 -
Service status pages that poorly reflect actual service status are so annoying. Ex. GitHub is having a lot of latency issues with processing updates and like 5 people in my office noticed it while their status page still says everything is fine.
This isn't to explicitly call out GitHub since many service status pages behave like this, but it definitely shows a general weakness in these health checks. I've seen similar issues with tons of services, web hosts, etc. Monitoring is definitely hard but will hopefully keep getting better.1 -
Story time;
Major project, multi million budget, huge business and IT coordination, board level status updates, meeting started back in March 2018 for a Go Live of Aug 2019.
Based on draft requirements (and experience) I request the test environment be built for half of the work. Turns out that no one told Server Eng and they are out of space in both dev and prod until Q2 of 2019. We went from Green to Red because a Service Request.5 -
Tldr: no router, almost not work.
Ok I recently moved into a new house, and I signed a contract for an Internet line.
Problem is that the router has been sent at the ISP shop, where I was supposed to get it personally. But guess what? Covid emergency happened two days after, and the shop closed.
So, after spending two days calling customer service of both ISP and Postal office without being able to speak to anybody, I received a Sms saying that the pack was not delivered because the receiver was closed.
After some more unsuccessful calls to the same two entities I managed to find the actual shop's phone number, that was actually thw owner's house (he's working from home). I spoke to him, told the problem, and he changed the router destination to my house.
Today I checked the package status on the postal website and I saw that it seems that they tried every day, at 7:02 am, to deliver the bloody package again at the shop! I truly hope this was a bug on their tracking system. It's weird that the hours were always 7:02am, because the package delivery office opens at 8:30 am, so again I'm praying any existent and non-existent god that that's just a bug. I'm kinda tired of being stuck with my phone hotspot with limited GB and with ISP public routers with about 5Mbps.
I wish I had @netikras skills with router building.4 -
This project is just a complete clusterfuck... But nvm. We had to integrate a third party service pushing data into our system. Btw the service wasnt even working correctly. But that is just the tip of the iceberg. Its friday around lunch time. Message appears "what is the status of the integration?" Yeah havent started working on it. Last info was service is not stable. I doubt that this will be done this week. Next message from PO: "We will all push hard to get this done today and deploy to prod." Why? Because this dumbasses said to the customer this will be deployed eod. And by we you mean the devs once again doing overtime. Has this shit stopped? No. Like for the last two weeks its like we promised the customer xyz to be deployed tomorrow. Not a single dev was asked how long it takes to add this3
-
I think UPS' Api documentation and service must be the worst documented and build API I have ever seen from a corporate.
1. The developer website is a mess. A total mess. You can barely find the API type you are looking for.
2. When you get the API and download the documentation, the files, .pdf etc is still a mess. Pages long that most are craps.
3. Each request returns Status Code 200. Even if it is an error. This blew my mind.
4. Each request, based on error type or based on tracking activity returns different JSON schema.
For example, the JSON Schema for a shipment in transit is different from JSON schema for a shipment that has been delivered. A shipment that has been returned, a shipment that required signature etc. They are different from each other.
5. And the worst. They do not provide with test tracking codes. I have found some on internet, but they do not work in development and production environment.4 -
What does my current job hate descriptive names? Like to the point of active antagonism? We're a subscription based service. Why is SubscriptionExpiration the date you log in? Why does SubscriptionStatus only contain the name and id of product you're subscribed to and nothing to do with an actual status? All I need is the date a user subscribed. Why is this so hard!?!?7
-
In last episode of "How SystemD screwed me over", we talked about Systemd's PrivateTMP and how it stopped me from generating SSL certificates.
In today's episode - SystemD vs CGroups!
Mister Pottering and his team apparently felt that CGroups are underused (As they can be quite difficult to set up), and so decided to integrate them into SystemD by default. As well as to provide a friendlier interface to control their values.
One can read about these interactions in the manual page "systemd.resource-control"
All is cool so far. So what happened to me today?
Imagine you did a major system release upgrade of a production server, previously tested on a standalone server. This upgrade doesn't only upgrade the distribution however, it also includes the switch from SysVInit to SystemD. Still, everything went smooth before, nothing to worry now then, right? Wrong.
The test server was never properly stress-tested. This would prove to be an issue.
When the upgrade finishes, it is 4 AM. I am happy to go to bed at last. At 6 AM, however, I am woken up again as the server's webservices are unavailable, and the machine is under 100% CPU load. Weird, I check htop and see that Apache now eats up all 32 virtual cores. So I restart it, casting it off to some weird bug or something as the load returns to normal.
2 hours later, however, the same situation occurs. This time, I scour all the logs I can, and find something weird - Many mentions that Apache couldn't create a worker thread? That's weird.
Several hours of research and tinkering later, I found out the following:
1 - By default, all processes of a system that runs SystemD are part of several CGroups. One of these CGroups is the PID CGroup, meant to stop a runaway process from exhausting all PIDs/TIDs of a system.
This limit is, by default, set to a certain amount of the total available PIDs. If a process exhausts this limit, it can no longer perform operations like fork().
So now, I know the how and why, but how should I solve this? The sanest option would be to get a rough estimate of just how many threads the Apache webserver might need. This option, though, is harder, than apparent. I cannot just take the MaxRequestsWorkers number... The instance has roughly double the amount of threads already. The cause being, as I found out, the HTTP/2 module, which spawns additional threads that do not count towards this limit. So I have no idea what limit to set.
Or I could... Disable the limit for just the webserver via the TasksAccounting switch. I thought this would work. And it did seem to... Until I ran out of TIDs again - Although systemctl status apache2.service no longer reported the number of tasks or a task limit of the process, the PID CGroup stayed set to the previous limit. Later I found out that I can only really disable the Task Accounting for all the units of a given slice and its parents.
This, though, systemctl somewhat didn't make apparent (And I skimmed the manual, that part was my fault)
So... The only remaining option I had was to... Just set the limit to infinite. And that worked, at last.
It took me several hours to debug this issue. And I once again feel like uninstalling systemd again, in favor of sysvinit.
What did I learn? RTFM, carefully, everything is important, it is not enough to read *half* the paragraph of a given configuration option...
Oh, and apache + http/2 = huge TID sink. -
Betty: Opens slack chat with Bob, Tony and me to ask me to fix some data for a client who messed the setup. (Don’t worry just building a script that takes 3 hours to complete and that I must supervise)
Betty: Opens slack chat with Ron, Tim and me to ask me to force the system I made to ignore protocol because someone else’s fuck up made it so she didn’t get the output she expected.
Betty: proceeds to ask for status updates constantly on both chats. She also disguises them as her asking what she can do to “get it across faster” knowing there’s jack shit I or anyone can do to make it go “faster”.
Also Betty, vomits BS about my micro service being unstable in front of managers even though it is it’s correctness what brought to light a bug fucking up thousands of records silently.
Go fuck yourself Betty ☺️ and fuck the client5 -
Status: Got off hour+ long call with provider teir2 tech support because their "sync service" isn't syncing. "It's all cloud controlled" they tell me. Whatever.
It does have the ability to install a Windows service to do the needful! 🎉
However the program that does the actual syncing is the "launcher" application, and the service's only job is to tell the launcher to run. 🤦♂️
Their assumption is that there will be a user that gets smacked in the face with a UAC prompt when they first log in and just shrug it away. Which is the Launcher application.
The sync service is not capable of running the sync application without a desktop session I guess?
MOTHERTRUCKERS do you understand what the point of a Windows Service is?!?
I tried relating this situation to how Windows Update works: It will update whenever the fuck it wants without the user doing anything because of the Service, and you only configure the service with the Control Panel/Settings App. You don't need the Control Panel/Settings App running in order for Windows Update to work, but it's there for status info and configuration.
Anyways, this software does not do that. It apparently *requires* both the service AND the launcher program running in order to work. Not work properly, to work *at all*.
Anyways, It's installed on a computer that's not normally logged into, but is always on (where other "always needs to be running" programs live). Normally the hackaround would be to launch the program via Scheduled Task.
This program apparently does not want to run as a scheduled task, or the Task Scheduler is being stupid and can't figure out "Hey, it's time to run this program. Do it!". Naturally it runs if told manually.
The fact that I'm even doing this at all is stupid, but even more infuriating is that it's just not working unattended. You know, what the service should be doing. But no, the service runs happily all alone, doing nothing of note, while Task Scheduler sucks its stick running OneDrive installer but not the launcher program.
Pluckin' donuts...2 -
So I was looking into phone app development again (as you do) and I'm working on a simple QoL app for me and my SO that will help us automate some home management and finances stuff. Naturally I delved down the rabbit hole deep and wanted to have push notifications so we don't have to check the app periodically to know when certain things happen... Oh boy... Why is mobile development so convoluted, especially if you don't want to rely on Google Services...
It seems that the most accepted way of doing this is Firebase (FCM). Well me being me, I refuse to use google services for this and I prefer self hosted solutions (for data privacy reasons) which eliminates most products out there.
It also didn't help that my framework of choice is Flutter/Dart, because fuck Android Studio and the insane buggy XML stuff and fuck Android and it's constantly changing APIs...
Well In the end I decided on a rather simple solution and self hosted an AMQP service (RabbitMQ in my case, as I have some experience with it already) and implemented a foreground service in android platform specific code on top of my flutter project to kickstart it and made my phone a queue listener... This now means I can push notifications from my server to the Messaging Queue and it will be pushed into my App automatically!
One thing I found out on this journey was that Android now kills most background services and enforces foreground services to have a visible notification in the status drawer... which I actually approve of. It's a bit annoying that you can start a reliable background service, but I'm absolutely on-board with long running processes started by my apps are constantly visible...
Long story short, I love reinventing all the wheels, especially if it's for free and private... And I also went to sleep at 2AM again because this took longer that I'd like to tune... but it works, and it's google free...
I'm thinking of trying to package this up as a flutter module later, but first I want to do testing on battery life and the general life cycle of the service. RabbitMQ says they have the client library optimized for long-lasting connections and it should be just using a tcp socket, which should pretty much be what all the push notification services are doing anyway. I'm also not completely satisfied with how the permanent notification looks.. it isn't collapsible like some of the other ones from other apps and it's about 2 lines high instead of single line... which is something quite annoying and I'm struggling to find any relevant docs on how this is done other than possible making a custom Notification Style... but I just can't believe that everyone is doing that.. there must be a built-in somewhere -_-... Ugh Android is hell...
Anyway, if any android devs here have some hints, tips and tricks on how to handle this type of background/foreground process stuff and I'm doing something wrong let me know, cause googling this shit is a nightmare too!6 -
My selenium script cronjob is not running on the VPS.
I am not sure what's wrong with it
Even the crond service status just shows
the session was opened for the user root
shows the command
the session closed for user root
I tried the tons of solution available but nothing worked
I am using Ubuntu 16.04 with LXDE13 -
Another part of messy network gone.
Caching fucked me hard....
Isn't it just lovely that nowadays you need to nearly wipe a machine to get it from claiming stale data....
And thanks to DNS, HAProxy -/ service names / ... I think I know now why the curse of babel is so powerful.
When you have to think for 2 mins to make sure you've set the zone's right, cause otherwise you need to ProxyJump with SSH through more tunnels than imaginable (VPN/HO) to fix possible caching on several DNS servers.... You'll realize that it's russian roulette with too much bullets. :(
And If a monitoring service asks another monitoring service for status information which asks the first monitoring service which then asks the second monitoring cause you were too late...
You'll get very funky monitoring statistics.
Too slow, had to nuke it (mismatched a DNS name, the second monitoring service should have been a service node).
I think I've had more near death scenarios in the last 2 weeks than I like.
Hopefully I'll never have to do that again.
(Splitting and reordering a few dozen VLANs, assigning proper DNS names, loadbalancer migration....)