Skills: Linux administration, Bash, basics of PHP, JS and other languages
Joined devRant on 5/13/2016
>Finds a URL that causes some sort of internal bug in a client's webapp
>Subsequent requests fill up the server's PHP-FPM slots, waiting for a session exclusive lock that never comes
>Effectively DoS's the server
>Sends it to a colleague to discuss the possible causes
>Forgets Slack happily indexes any link it's given
>Slack almost DoS's the service
That awkward feeling when you try to make an easy to pick up and use UI and fail so horrendously, that even a person otherwise skilled in computer management fails to grasp it...
I'm looking at you Synology and your fancy DSM bullshit that I just spent 2 hours trying to make available on WAN.
I almost gave up... Then realised I can log onto there through SSH, sudo su onto root and check out the webserver configuration (nginx) manually to make heads or tails of how to use it!
God... It's only Tuesday, and I already feel like I need a shot of something strong...
Anytime I operate with hardware RAIDs on prod servers, I still sweat from the nerves of sending a wrong command that would wipe the RAID metadata clean and make all the data disappear.
Doesn't help that the CLI tools (MegaCli / StorCli) are both kinda terrible. The former has terrible documentation and switch design, and the latter cannot do everything the former can...
In the last episode of "How SystemD screwed me over", we talked about SystemD's PrivateTmp and how it stopped me from generating SSL certificates.
In today's episode - SystemD vs CGroups!
Mister Poettering and his team apparently felt that CGroups are underused (as they can be quite difficult to set up), and so decided to integrate them into SystemD by default, as well as to provide a friendlier interface to control their values.
One can read about these interactions in the manual page "systemd.resource-control"
All is cool so far. So what happened to me today?
Imagine you did a major system release upgrade of a production server, previously tested on a standalone server. This upgrade doesn't only upgrade the distribution, however; it also includes the switch from SysVInit to SystemD. Still, everything went smoothly before, so nothing to worry about now, right? Wrong.
The test server was never properly stress-tested. This would prove to be an issue.
When the upgrade finishes, it is 4 AM. I am happy to go to bed at last. At 6 AM, however, I am woken up again as the server's webservices are unavailable, and the machine is under 100% CPU load. Weird, I check htop and see that Apache now eats up all 32 virtual cores. So I restart it, chalking it up to some weird bug or something as the load returns to normal.
2 hours later, however, the same situation occurs. This time, I scour all the logs I can, and find something weird - Many mentions that Apache couldn't create a worker thread? That's weird.
Several hours of research and tinkering later, I found out the following:
1 - By default, all processes of a system that runs SystemD are part of several CGroups. One of these CGroups is the PID CGroup, meant to stop a runaway process from exhausting all PIDs/TIDs of a system.
This limit is, by default, set to a certain amount of the total available PIDs. If a process exhausts this limit, it can no longer perform operations like fork().
So now, I know the how and why, but how should I solve this? The sanest option would be to get a rough estimate of just how many threads the Apache webserver might need. This option, though, is harder than it looks. I cannot just take the MaxRequestWorkers number... The instance has roughly double that number of threads already. The cause being, as I found out, the HTTP/2 module, which spawns additional threads that do not count towards this limit. So I have no idea what limit to set.
Or I could... Disable the limit for just the webserver via the TasksAccounting switch. I thought this would work. And it did seem to... Until I ran out of TIDs again - Although systemctl status apache2.service no longer reported the number of tasks or a task limit of the process, the PID CGroup stayed set to the previous limit. Later I found out that I can only really disable the Task Accounting for all the units of a given slice and its parents.
This, though, systemctl didn't really make apparent (and I only skimmed the manual, that part was my fault)
So... The only remaining option I had was to... Just set the limit to infinite. And that worked, at last.
It took me several hours to debug this issue. And I once again feel like uninstalling systemd in favor of sysvinit.
What did I learn? RTFM, carefully, everything is important, it is not enough to read *half* the paragraph of a given configuration option...
Oh, and apache + http/2 = huge TID sink.
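For anyone chasing the same symptom, here is a minimal Python sketch of how one might check how close a unit is to its task limit by reading the pids controller files directly. It assumes a cgroup v2 unified hierarchy and a unit named apache2.service under system.slice; both the path and the 80% warning threshold are assumptions for illustration, not something taken from the server in the rant. The limit itself is what systemd exposes as TasksMax= in systemd.resource-control.

```python
from pathlib import Path


def read_pids_usage(cgroup_dir):
    """Return (current, limit) for a cgroup directory; limit is None when it reads 'max'."""
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    raw_limit = (base / "pids.max").read_text().strip()
    limit = None if raw_limit == "max" else int(raw_limit)
    return current, limit


def describe(current, limit, warn_ratio=0.8):
    """Summarise how close a unit is to its task limit (warn_ratio is an arbitrary choice)."""
    if limit is None:
        return f"{current} tasks, no limit"
    state = "WARNING" if current >= limit * warn_ratio else "ok"
    return f"{current}/{limit} tasks ({state})"


if __name__ == "__main__":
    # Assumed path: cgroup v2 layout and a unit called apache2.service.
    unit_dir = "/sys/fs/cgroup/system.slice/apache2.service"
    try:
        print(describe(*read_pids_usage(unit_dir)))
    except OSError as exc:
        print(f"could not read {unit_dir}: {exc}")
```

Running something like this from cron or a monitoring agent would probably have flagged the thread exhaustion before Apache started failing to spawn workers.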
please die a painful and horrible death already, you living corpse of the times long gone. You're taking way too long.
((Seriously, MyISAM is so bad, yet so many people still use it because they don't know better))
It's not every day that I give Microsoft praise, but damn, the new Windows Terminal is... Surprisingly decent.
Together with WSL2, it allowed me to switch from working in a VM to working fully from Windows.
And with a little tweaking of the settings file, it acts exactly the way I like.
Good job creating something modern, almost universal and usable, Microsoft!
mod-php is weird and should never have existed.
I hate having to deal with it, even if it's only still in use in years old legacy systems. FPM is so much nicer.
You know a server is having a jolly'ol time when, while logging through the serial console, it lags... Then, a few seconds later, you get a message
[time.seconds] Out of memory: Kill process PID (login) score 0 or sacrifice child
[time.seconds] Killed process PID (login) total-vm:65400kB, anon-rss:488kB, file-rss:0kB
10/10, only way to bring the server back to life was by a hard-reset :|
WHAT. THE. FUCK.
Fucking UCEPROTECT blacklist, who the hell blacklists a whole fucking ASN when they detect even a large amount of spam coming from it? For all they know, it could be just a couple of IPs. But nooooo, instead of blacklisting IPs, they blacklist the whole ASN, so now, even some of our machines are on the list, without us ever doing anything. Just because the IP is from the DigitalOcean prefix. UGH.
Am I the only one who hates when I enter a simple question like "PHP memory limit" and the first link *isn't* the official PHP documentation? Who gives a flip about some fancy third-party webpage where they write a whole flippin' article about a simple directive?
Ugh... The priority Google...
You know your cmdline utility sucks when you have to publish a cheat sheet yourself, too, alongside the manual.
I'm looking at you, Broadcom, and that horrible MegaCLI raid management utility. Storcli is superior.
A client asked us today to disable TLS 1.0 and 1.1 across their servers.
It's not often that I say this. But this makes me proud. It's a good client. Going with the changing times. I wish all clients were like this one.
RIP TLS 1.0/1.1, took you long enough.
"Hello, the drive of your XYZ server is getting full, would it be possible to prune some of the unused and/or old docker images and layers there please? Alternatively, we can offer to replace the drives with higher-capacity models for FOO extra per month"
"Hello, the disk use keeps growing and has reached the 95% mark, please prune some of your images to make space for new ones. If you wanted to choose the alternate option of disk capacity increase, we would have to do that as soon as possible, otherwise you may run out of space before the RAID array rebuilds"
"Hello, your server XYZ has completely run out of disk space. Any changes that would require data being saved on disk may and probably will fail. Please free some space as soon as possible"
Ugh, I hate clients that just don't cooperate until shit hits the fan...
And no, we could not prune the space ourselves, it's not our data to delete whenever we think it necessary.
We merely manage the machine's operation, keeping it online and its services running.
>Asks client if the proxy can use self-signed cert
>Client agrees, no problem
>Client complains about "an error they're getting"
>The error: "Error in connection establishment: net::ERR_CERT_AUTHORITY_INVALID"
Am I a joke to you? Or am I just talking to a brick wall over there?
Any Windows Sysadmins here? I have a question for you - How do you do it?
I only very rarely have to do something that would fall under "Windows System Administration", but when I do... I usually find something either completely baffling, or something that makes me want to tear out my hair.
This time, I had a simple issue - Sis brought me her tablet laptop (You know, the kind of tablets that come with a bluetooth keyboard and so can "technically" be called a laptop) and an SD card stating that it doesn't work.
Plugging it in, it did work, only issue was that the card contained files from a different machine, and so all the ACLs were wrong.
I... Dealt with Windows ACLs before, so I went right to the usual combination of takeown and icacls to give the new system's user rights to work with the files already present. Takeown worked fine... But icacls? It got stuck on the first error it encountered and didn't go any further - very annoying.
The issue was a found.000 folder (Something like lost+found folder from linux?) that was hidden by default, so I didn't spot it in the explorer.
Trying to take ownership of that folder... Worked for all the files in there, save for one - found.000\dir0000.chk$Txf; no idea what it is, and frankly I don't really care either.
Now... Me, coming from the Linux ecosystem, bang my head hard against the table whenever I get "Permission denied" as an administrator on the machine.
Most of the times... While doing something not very typical like... Rooting around (Hah... rooting... Get it?! I... Carry on) the Windows folder or system folders elsewhere. I can so-so understand why even administrators don't have access to those files.
But here, it was what I would consider a "common" situation, yet I was still told that my permissions were not high enough.
Seeing that it was my sister's PC, I didn't want to install anything that would let me gain system level permissions... So I got to writing a little for loop to skip the one hidden folder altogether... That solved the problem.
My question is - Wtf? Why? How do you guys do this sort of stuff daily? I am so used to working as root and seeing no permission denied that situations like these make me lose my cool too fast too often...
Also - What would be the "optimal" way to go about this issue, aside from the for loop method?
The exact two commands I used and expected to work were:
takeown /F * /U user /S machine-name /R
icacls * /grant machine-name\user:F /T
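The for loop workaround described above can be sketched roughly like this in Python. It only builds the icacls command lines rather than running them, the `machine-name\user` account and the `found.000` folder name mirror the placeholders from the rant, and the `build_icacls_commands` helper is a hypothetical name made up for this example. As far as the icacls documentation goes, the `/C` switch tells it to continue past file errors, which may be the more direct answer to the "stuck on the first error" problem, though that was not tested on the machine in question.

```python
import os


def build_icacls_commands(root, account, skip=("found.000",)):
    """Build one icacls command per top-level entry, skipping the named ones."""
    skipped = {name.lower() for name in skip}
    commands = []
    for entry in sorted(os.listdir(root)):
        if entry.lower() in skipped:
            continue
        target = os.path.join(root, entry)
        # /T recurses into subfolders; /C keeps going after per-file errors.
        commands.append(["icacls", target, "/grant", f"{account}:F", "/T", "/C"])
    return commands


if __name__ == "__main__":
    # "machine-name\\user" is the placeholder from the rant; "." is an assumed root.
    for cmd in build_icacls_commands(".", "machine-name\\user"):
        print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run` on the Windows machine once the printed commands look right.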
So, today, I wanted to try setting up a wireguard VPN server on my little raspberry pi at home. I... expected /some/ issues, but what I found dumbfounded me.
1 - I already had the wireguard package from the unstable branch of the main raspbian repo installed... Huh, okay.
2 - Setting up config was extremely easy... Wow, so the rumors were true. Wireguard really is almost dumb-simple.
3 - Failed to create a network interface? Oh, trouble, here it is! So let's see... modprobe wireguard... Nope. Don't have the module? What?
4 - Reconfigure package to rebuild the module - missing kernel headers? Huh... weird
This was the simple stuff... Then I went down the rabbit hole of the Raspberry Pi ecosystem:
1 - There is the Raspberry Pi Bootloader, that is apparently separate from the Kernel itself. And I didn't seem to have any of the standard linux-image-* installed... What? Weird, yet there I was, running a 4.19.42-v7+ kernel...
2 - No kernel and no headers... What... The... Fuck
3 - Okay, so... Let's just... try to install the latest kernel image then? One apt-get install... It downloaded the image, but during package configuration, it failed because... I didn't have... its headers? What? What for? And if it needs them (for whatever reason), why isn't the headers package a dependency? Ugh, whatever...
4 - Another apt-get install and... Okay, building the initrd image aaaaand...
WHAT. What is it this time!?
Oh... Ran... No more space on device? What? Is /boot independent? Of course it is, it has to be, it's a bloody different filesystem
Okay, so, let's che-OH MY GOD WTF.
It's just a bloody 45 MB! The entire /boot is just 45 MB large. WHY. THE. FUCK.
This was a default raspbian install from I have no idea when. But... Why. Oh WHY would ANYONE pre-configure /boot to be this incredibly tiny!?
No wonder the new init ramdisk couldn't fit in there! It's already 64% used up!
Thanks, Raspbian Devs, now I gotta reinstall the whole system because, yes, /boot starts, of course, at sector 8192. Just far enough from 2048 that there are *some* sectors free - about 3 MB.
So what did I try? Remove the partition and recreate it from the very beginning. Only... I never tried it in the past, and okay, the kernel doesn't like having the partition where its image resides deleted on the fly; it will not give up FDs pointing there or something.
So now, I have a system I cannot reboot, or it will never boot back up :|
I need to get a cheap 1U somewhere or something T.T
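The "about 3 MB" figure for the gap in front of /boot can be checked with a couple of lines of Python. This is only a sketch of the arithmetic, and it assumes 512-byte logical sectors, which the rant does not state explicitly.

```python
SECTOR_SIZE = 512  # bytes; an assumption, the usual logical sector size on SD cards


def gap_bytes(first_usable_sector, partition_start_sector, sector_size=SECTOR_SIZE):
    """Bytes of unallocated space between two sector offsets."""
    return (partition_start_sector - first_usable_sector) * sector_size


if __name__ == "__main__":
    free = gap_bytes(2048, 8192)
    print(f"{free} bytes = {free / 1024 ** 2:.0f} MiB")
```

So moving the start of the partition down to sector 2048 would gain 6144 sectors, nowhere near enough to fit a new init ramdisk next to what is already on a 45 MB /boot.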
I hate when I have to debug an issue and find out it's somewhere else entirely than where I was looking.
>Installs a virtual server on the Proxmox VE platform
>Reboots and grub be like "No such device *UUID*"
Okay, so... reinstall grub, maybe a bug in the automated install?
>Nop.exe, still an issue
>Partition tables all good, drives all visible when booted from a live environment, grub is up to date
>Finally gives up and goes to mess in the (terrible) grub rescue environment
>Grub only sees (hd0) while root is on (hd2), what?
>A whole lot of cursing ensues, wtf?
Turns out it was a bug, but not in grub... Rather, in the QEMU-KVM agent daemon, wtf!
I never had to deal with a bug in the virtualization agent itself.
Downgrading from pve-qemu-kvm 5.0.x to 4.0.x solved the issue.
Now, maybe, I can finally go have my lunch...
I. Hate. Windows. Apps. UGH.
I may never be able to play FS2020 from the Xbox Game Pass again as... It's unable to install, gives a helpful 0x1 error code, and the help page link goes to a 404.
Now, I caused this myself... Partially... Er, no, fully, but I had a good reason!
I wanted to install something larger again and didn't have enough disk space. Fired up WinDirStat and there was a huge, like... 45 GB file in C:\Program Files\WindowsApps\Somedir\
Googling around, I found some people saying it's a temp file so that the Windows Store can reserve enough space for the app installation... Okay, so... It got stuck, and I had no way to remove it?
Of course I didn't want to remove all apps of the windows market... So, I did something any *sane* person would never do - Took ownership of the whole WindowsApps and gave myself full control. Then I removed the file and... FS2020 never launched again.
I couldn't even uninstall it! It would give me no error either. It just lagged and then did nothing.
I tried resetting all the ACLs, tried giving ownership back to TrustedInstaller, nothing worked. Failed on some of the files, wtf?
Launching the game only ever told me there was an update in progress.
Tried booting a Windows ISO image and fixing the ACLs from there, nope, also failed for the same bunch of files of FS2020. (Permission Denied while on a live image? Wow)
Last resort, I booted up Linux and tried removing the offending folders from there, only to find out that... Huh. The NTFS module labelled the offending folders as... broken links leading to an "unsupported reparse point". But hey, it let me remove it at least.
Since then, it no longer appeared as installed, but... Now, anytime I want to install it, it just throws an error 0x00000001 with no further details.
So yeah, I know I caused this myself, but after fiddling with the permissions and ACLs and NTFS dark magic, I feel justified in saying - Fuck you WindowsApps DRM.
>Tries to uninstall old version of Visual Studio
>Uninstaller asks to update Visual Studio to continue
... Oh... Uh... That... Doesn't... Make sense? Gee... Thanks?
In today's episode of "Am I paranoid already?" - Caching Bind resolver forwarding queries to a DoH client connecting to Cloudflare
A fun little thing to configure, and now, anytime I am on my VPN, all my DNS traffic should be completely untrackable.
Does that make me paranoid? Maybe a little... But, the knowledge that no one - not even my ISP, can see what I am doing on the internet, is kinda... Heartwarming.
Now, all that's left, is for eSNI to roll out and get implemented by all major web browsers, and most snooping will be completely done for...
Am I the only one who's getting more and more aggravated about how the large YouTube channels misinform and make out VPN providers (I am looking at you, Nord VPN, mostly) as the messiahs of the internet? How they protect our data that would otherwise be in incredible "danger"?
I understand they need clients, and I know most of the YT channels probably do not know better, but... This is misinformation at best, and downright false advertising at the worst...
"But HTTP-only websites still exist!" - yes, but unlike the era before Let's Encrypt, they are a minority. Most of the important webpages are encrypted.
"Someone could MITM their connection and present a fake certificate!" - And have a huge, red warning about the connection being dangerous. If at that point, the user ignores it, I say it's their fault.
Seriously... I don't know if Nord gives their partners a script or not... But... I am getting super sick of them. And they are the main reason why I made my own VPN at home...
Don't you guys love it when your ISP suddenly decides to change your public IP? The one you were using for months?
I know we're technically not getting a static address guarantee (that is for some reason something only companies can buy), but a heads up would be nice...
I spent like an hour debugging why my VPN suddenly couldn't connect...
Client be like:
could you please restore our database from today's backup?
At a first glance - nothing out of the ordinary. Daily backups are standard...
Until we get the backed up snapshot running.
MySQLDump is somehow... Stuck. It... Doesn't seem to be doing... Like, anything. For ages. Wtf.
So we check the database. Connect, change schema and... The commandline tool gets stuck, too. Weird.
So a layer lower, we check the datadir and... ls... After also getting stuck for a bit, lists about 500k files O_o
Yea, dumping a database with roughly ~250k tables is not fun. No wonder it takes ages.
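When even ls chokes on a directory like that, a streaming count is usually less painful. Below is a small Python sketch that walks one schema directory with os.scandir and tallies files per extension plus distinct base names. It assumes each table shows up as a handful of files sharing a base name (roughly two files per table would match the 500k files / 250k tables ratio above), which is an assumption about the storage layout, not something confirmed in the rant. The `summarize_datadir` name is made up for this example.

```python
import os
from collections import Counter


def summarize_datadir(schema_dir):
    """Count files per extension and distinct base names in one schema directory."""
    extensions = Counter()
    tables = set()
    # scandir yields entries lazily, so nothing has to be sorted or held in one big list.
    with os.scandir(schema_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            stem, ext = os.path.splitext(entry.name)
            extensions[ext] += 1
            tables.add(stem)
    return len(tables), dict(extensions)
```

Calling it on the schema directory inside the datadir gives a quick sanity check of the table count before committing to a dump that will run for hours.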
>Gets a new CPU for desktop (yay, went from R5 1600 to R5 3600X)
>Spends half a day flashing new MB BIOS (Needed to flash individual major versions in order, couldn't just go 1.10 to 6.40)
>Finally finishes preparations and goes to replace the CPU
>Cleans the old one and packages it to give it to a friend
>Has issues inserting the new one as the orientation arrow on the motherboard was very hard to make out
>Spends 30 minutes applying thermal paste, worrying about optimal spread
>Forgets which side the CPU fan goes on
>Finally boots back up... CPU fan is suddenly loud AF under load, but eh, temps under stress are sub-60, so, good
>Loud CPU fan is too annoying, opens the case again
>CPU fan is on backwards
>Takes the fan off, turns it around and fastens again, puts PC back together and boots
>Is quiet again, nice
>Goes to work on the PC
>2 hours later randomly checks temps because no fan noise is weird
>CPU at 75°C, crap
>Opens the (live) PC, CPU fan is not spinning
>Had plugged the header in one pin off to the side
>Unplugs and replugs it correctly
>Fan suddenly starts spinning very fast and cuts my finger
>Finally closes the case once more. All issues resolved
...It's situations like these that make me wonder... What would happen if I had to work with servers in person, physically lol
Reasons to dislike our work VPN when working remotely: Forces all traffic to be routed through it, not only the important internal stuff.
For some reason, I just dislike having everything I do potentially recorded and/or at least slowed down.
>Sets up a personal VPN
>Works on Android
>Works on Windows
>Works on Linux
>Doesn't work on MacOS
...Thanks, Apple... I guess someone always has to be the weird one out.
The longer I work in my department, the more I grow to appreciate clients that actually know what they are doing. Or clients who have been communicating with us for so long that the emails got a little less strict and formal.
Having a client write something like "I know this mail looks scary long, but trust me, it's just a few domain edits, nothing horrible" (freely-translated from my native language) just kinda... Sets me at ease and makes me chuckle.
In hindsight, sending WoL to an untested machine while 30 kilometers away was not a very smart idea.
The machine is up, but does not respond to pings and is unreachable.
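For context on why this can go wrong so quietly: a WoL "magic packet" is just 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. It only powers the machine on; nothing in it confirms that the OS or network stack came up afterwards. A minimal Python sketch follows, where the broadcast address and port 9 are common conventions assumed here and the function names are made up for this example.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte Wake-on-LAN payload for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + raw * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the packet over UDP; port 9 is a convention, not a requirement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Since the packet is fire-and-forget, testing the whole wake-and-reach path while physically next to the machine is the part that cannot be skipped.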
So... I've been thinking, I tend to default to LVM when trying to create easy-to-manage disk partitions, or when I want to back up a database without long locks during a dump... Though, now... I got thinking.
What do you guys think, which is better in terms of functionality: BtrFS or LVM?
I know BtrFS offers things like full snapshots that make it easy to transfer just the increment over the snapshot origin to a remote server for archival, but I never fully grew to trust btrfs as a server filesystem... It's...
Younger, and not as widespread, not to mention I don't know of any performance statistics to recommend its use for this or that case (Like... Would a high-load database engine stutter flushing all those changes to disk while reading / writing temp tables and such)
>Have the COMPAL modem with the DOCSIS OS
>Change my bedroom router's IP to static after doing factory reset on the modem
>As expected, I get booted from the modem settings page
>Cannot log in now, because "another user is already signed into the modem settings page"
Stupid piece of silicon waste, whyyyy. I hate that thing, ugh!