Search - "worker threads"
Toilets and race conditions!
A co-worker asked me what issues multi-threading and shared memory can have. So I explained to him that stuff with the lock. He wasn't quite sure whether he got it.
Me: imagine you go to the toilet. You check whether there's enough toilet paper in the stall, and there is. BUT now someone else comes in, does their business and uses up all the paper. CPUs can do shit very fast, can't they? Yeah, and now you're sitting on the bowl, and BAMM, out of paper. This wouldn't have happened if you had locked the stall, right?
Him: yeah. And with a single thread?
Me: well if you're alone at home in your apartment, there's no reason to lock the door because there's nobody to interfere.
Him: ah, I see. And if I have two threads but no shared memory, it's as if my wife and I are at home, each with a toilet of our own, so we don't need to lock either.
Me: exactly!
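In code, the stall lock is just a mutex around the check-then-use sequence. Here's a minimal Java sketch of the same idea (the class and names are made up for illustration): without the lock, both threads can pass the "is there paper" check before either takes the last sheet.

```java
import java.util.concurrent.locks.ReentrantLock;

public class ToiletStall {
    // Shared memory: one sheet of paper left in the stall.
    private static int sheetsLeft = 1;
    private static final ReentrantLock stallDoor = new ReentrantLock();

    // Check-then-act WITHOUT locking the stall: both threads can see
    // "there's paper", then both take it, and the counter ends at -1,
    // i.e. somebody is sitting on the bowl with no paper.
    static void useToiletUnsafe() {
        if (sheetsLeft > 0) {          // check: paper is there
            pause(10);                 // the co-worker sneaks in during this gap
            sheetsLeft--;              // act: use the last sheet
        }
    }

    // Same check-then-act, but the whole thing happens behind a locked door.
    static void useToiletSafe() {
        stallDoor.lock();
        try {
            if (sheetsLeft > 0) {
                sheetsLeft--;
            }
        } finally {
            stallDoor.unlock();        // always unlock the stall when you leave
        }
    }

    static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread me = new Thread(ToiletStall::useToiletUnsafe);       // swap in ::useToiletSafe and it never goes below 0
        Thread coworker = new Thread(ToiletStall::useToiletUnsafe);
        me.start();
        coworker.start();
        me.join();
        coworker.join();
        System.out.println("sheets left: " + sheetsLeft);           // -1 means someone got stranded
    }
}
```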
Worst fight I've had with a co-worker?
Had my share of 'disagreements', but one that seemed like it could have gone to blows was with a developer, 'T', who tried to mansplain to me how ADO.Net worked with SQLServer.
<T walks into our work area>
T: "Your solution is going to cause a lot of problems in SQLServer"
Me: "No, its not, your solution is worse. For performance, its better to use ADO.Net connection pooling."
T: "NO! Every single transaction is atomic! SQLServer will prioritize the operation thread, making the whole transaction faster than what you're trying to do."
<T goes on and on about threads, made up nonsense about priority queues, on and on>
Me: "No it won't, unless you change something in the connection string, ADO.Net will utilize connection pooling and use the same SPID, even if you explicitly call Close() on the connection. You are just wasting code thinking that works."
T walks over, stands over me (he's about 6'5", 300+ pounds), maybe 6 inches away
T: "I've been doing .net development for over 10 years. I know what I'm doing!"
I turn my chair to face him, look up, cross my arms.
Me: "I know I'm kinda new to this, but let me show you something ..."
<I threw together a C# console app, simple connect, get some data, close the connection>
Me: "I'll fire up SQLProfiler and we can see the actual connection SPID and when sql server closes the SPID....see....the connection to SQLServer is still has an active SPID after I called Close. When I exit the application, SQLServer will drop the SPD....tada...see?"
T: "Wha...what is that...SQLProfiler? Is that some kind of hacking tool? DBAs should know about that!"
Me: "It's part of the SQLServer client tools, its on everyone's machine, including yours."
T: "Doesn't prove a damn thing! I'm going to do my own experiment and prove my solution works."
Me: "Look forward to seeing what you come up with ... and you haven't been doing .net for 10 years. I was part of the team that reviewed your resume when you were hired. You're going to have to try that on someone else."
About 10 seconds later I hear him from across the room slam his keyboard on his desk.
100% sure he would have kicked my ass, but that day I let him know his bully tactics worked on some, but wouldn't work on me.
First rant, but I'm so triggered, and everyone needs a break from all the EU and PC rants.
It's time to defend JavaScript. That's right, the best frikin language in the universe.
Features:
incredible async code (await/async)
universal support on almost everything connected to the internet
runs on almost all platforms including natively
dynamically interpreted but also internally compiled (like Perl)
gave birth to JSON (you're welcome ppl who remember that the X in AJAX stood for XML)
All these people ranting about JS don't understand that JS isn't frikin magic. It does what it needs to do well.
If you're using it for compute-heavy machine learning, or to maintain a 100k LOC project without Typescript, then why'd you shoot yourself in the foot?
As a proud JS developer I gotta scroll through all these posts gushing over the other languages. Why does nobody rant about using Python for bitcoin mining or Erlang to create a media player?
Cuz if you use the wrong tool for the right job, it's of course gonna blow up in your face.
For example, there was a post claiming JS developers were "scared" of multithreading and only stick to their comfort zone. Like WTF, when NodeJS came out everything was multithreaded; it took some brave developers to step out of their comfort zone and embrace the event loop.
For a web app, things like PHP and Node should only be doing light transforms between the database information and HTML anyway. You get one thread to handle the server because you're keeping other threads open to interface with databases and the filesystem. The Nexus.js dev ranting on all us JS devs doesn't realize that nobody's actual web server is CPU bound because of writing HTML bodies; that's why we only use 1 thread. We use other worker threads to do the heavy lifting (yes, there is a C++ bridge, look it up).
Anyways TL;DR plz respect JS developers we're people too. ES7 is magic and please don't shit on ES3 or we'll start shitting on the Python 2-3 conversion (need to maintain an outdated binary just cuz people leave out ()'s in their print statements)
Or at least agree that VB.NET is an abomination and an insult to the beauty that is TI-84 BASIC.
I'm astonished again. Linux isn't designed as a GUI OS, whereas Windows has dynamic thread priorities for freshly woken-up threads so as to increase GUI snappiness.
Now, my CPU has four physical and eight logical cores for SMT. I'm running eight worker threads of some parallel testing stuff, and I'm glad that I chose the AMD 3400G over the 3200G. The CPU load is 100%. On top of that, MP3 audio, the browser, and I'm dd'ing an external USB3 HDD.
Holy shit, the browser is just as smooth as if the CPU were idle. No perceivable lag. I hadn't expected desktop Linux to be that great.
I'm also surprised that the CPU temperature doesn't exceed 44°C despite full load at 21°C ambient, and the cooling is inaudible. Sure, my cooler is massively over-dimensioned to achieve exactly that, but it's still amazing.
It's what I would have wanted ten years ago and could only approach somewhat, but now the tech is actually there.
So I recently finished a rewrite of a website that processes donations for nonprofits. Once it was complete, I would migrate all the data from the old system to the new system. This involved iterating through every transaction in the database and making a cURL request to the new system's API. A rough calculation yielded 16 hours of migration time.
The first hour or two of the migration (where it was creating users) was fine, no issues. But once it got to the transaction part, the API server would start using more and more RAM. Eventually (30 minutes), it would start doing OOMs and such. For a while, I just assumed the issue was a lack of RAM, so I upgraded the server to 16 GB of RAM.
Running the script again, it would approach the 7 GiB mark and be maxing out all 8 CPUs. At this point, I assumed there was a memory leak somewhere and the garbage collector was doing its best to free up anything it could find. I scanned my code time and time again, but there was no place I was storing any strong references to anything!
At this point, I just sort of gave up. Every 30 minutes, I would restart the server to fix the RAM and CPU issue. And all was fine. But then there was this one time where I tried to kill it, but I got the error: "fork failed: resource temporarily unavailable". Up until this point, I believed this was simply a lack of memory...but none of my swap was in use! And I had 4 GiB of cached stuff!
Now this made me really confused. So I did one search on the Internet and apparently this can be caused by many things: a lack of file descriptors or even too many threads. So I did some digging, and apparently my app was using over 31 thousand threads!!!!! WTF!
I did some more digging, and as it turns out, I never called close() on my network objects, thus leaving ~30 new "worker" threads per iteration of the migration script. Thanks Java, if only finalize() were utilized properly.
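The rant doesn't say which HTTP client library was involved, so this is only a hedged sketch of the general fix using Apache HttpClient, where both the client and each response are Closeable; the endpoint URL and the method name are hypothetical. The point is to create the client once, reuse it for every transaction, and close each response, instead of leaking resources (and their worker threads) on every iteration.

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.util.List;

public class Migration {

    public static void migrate(List<String> transactionPayloads) throws Exception {
        // One client for the whole migration, not one per transaction:
        // its connection pool and any background resources exist exactly once.
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            for (String payload : transactionPayloads) {
                HttpPost post = new HttpPost("https://new-system.example/api/transactions"); // hypothetical endpoint
                post.setEntity(new StringEntity(payload));

                // Each response is closed as soon as we're done with it,
                // returning its connection instead of leaking it.
                try (CloseableHttpResponse response = client.execute(post)) {
                    EntityUtils.consume(response.getEntity());
                }
            }
        } // client.close() happens here, even if an exception is thrown
    }
}
```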
In the last episode of "How SystemD screwed me over", we talked about SystemD's PrivateTmp and how it stopped me from generating SSL certificates.
In today's episode - SystemD vs CGroups!
Mister Pottering and his team apparently felt that CGroups are underused (as they can be quite difficult to set up), and so decided to integrate them into SystemD by default, as well as to provide a friendlier interface to control their values.
One can read about these interactions in the manual page "systemd.resource-control"
All is cool so far. So what happened to me today?
Imagine you did a major system release upgrade of a production server, previously tested on a standalone server. This upgrade doesn't only upgrade the distribution, however; it also includes the switch from SysVInit to SystemD. Still, everything went smoothly before, nothing to worry about now then, right? Wrong.
The test server was never properly stress-tested. This would prove to be an issue.
When the upgrade finishes, it is 4 AM. I am happy to go to bed at last. At 6 AM, however, I am woken up again as the server's webservices are unavailable, and the machine is under 100% CPU load. Weird. I check htop and see that Apache now eats up all 32 virtual cores. So I restart it, chalking it up to some weird bug or something as the load returns to normal.
2 hours later, however, the same situation occurs. This time, I scour all the logs I can and find something odd - many mentions that Apache couldn't create a worker thread? That's weird.
Several hours of research and tinkering later, I found out the following:
1 - By default, all processes of a system that runs SystemD are part of several CGroups. One of these CGroups is the PID CGroup, meant to stop a runaway process from exhausting all PIDs/TIDs of a system.
This limit is, by default, set to a certain fraction of the total available PIDs. If a process exhausts this limit, it can no longer perform operations like fork().
So now I know the how and why, but how should I solve this? The sanest option would be to get a rough estimate of just how many threads the Apache webserver might need. This option, though, is harder than it appears. I cannot just take the MaxRequestWorkers number... The instance already has roughly double that number of threads. The cause, as I found out, is the HTTP/2 module, which spawns additional threads that do not count towards this limit. So I have no idea what limit to set.
Or I could... Disable the limit for just the webserver via the TasksAccounting switch. I thought this would work. And it did seem to... Until I ran out of TIDs again - Although systemctl status apache2.service no longer reported the number of tasks or a task limit of the process, the PID CGroup stayed set to the previous limit. Later I found out that I can only really disable the Task Accounting for all the units of a given slice and its parents.
This, though, systemctl somewhat didn't make apparent (And I skimmed the manual, that part was my fault)
So... The only remaining option I had was to... Just set the limit to infinite. And that worked, at last.
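For reference, a drop-in override along these lines is roughly what that fix looks like (reconstructed for illustration, not the actual unit file from that server):

```ini
# /etc/systemd/system/apache2.service.d/override.conf
# (created with: systemctl edit apache2.service)
[Service]
# Lift the per-unit task/TID limit enforced via the pids cgroup controller.
TasksMax=infinity
```

Followed by a systemctl daemon-reload and a restart of the service.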
It took me several hours to debug this issue. And I once again feel like uninstalling systemd in favor of sysvinit.
What did I learn? RTFM, carefully, everything is important, it is not enough to read *half* the paragraph of a given configuration option...
Oh, and apache + http/2 = huge TID sink.
Without a doubt it has to be the internal company search engine/file finding tool @thewamz and I wrote.
The company has a wide UNC network with files scattered all over the place, and they need a way to keep track of where the files get moved to (they can and do get moved). The original tool was written in Java/Tomcat and didn't use any frameworks or utilities beyond custom-written ones, no ORMs, and the SQL was just raw strings. The program didn't take into account that files might be moved or deleted, so it never removed anything from the database; it just kept adding files and never removing them.
It however never stores files itself, just links to files elsewhere on the UNC network.
It took six months to get it into what might be a stable beta or release-candidate state. The user interface is good, very simple and intuitive. The whole thing was rewritten in Python/Django. There were issues with UTF-8 (and MySQL not fully supporting UTF-8 in its own utf8 mode), we added a regex search mode (which was sorely lacking), and the search used to take up to fifteen minutes, but we sped it up to less than a minute (worst case when a user simply puts "^$" as the regex search). It has a multi-threaded design which does some checks to ensure it doesn't spawn too many threads and get stuck in constant GIL switching. Still some bugs to fix, like moving the processing of results returned by the server into a web worker so that the content widget doesn't lock up processing millions of search results, and moving the back end to use asynchronous Python might gain a performance boost. But on the whole I think the system is ready to replace the older system that all the users are frustrated with and constantly complain about.
However, the annoying bit is... how to actually get the new system online. While I am responsible for the development of tools and their maintenance, I am not responsible for their initial deployment, which means I have no idea when (or even if) my new tool will ever be released :/
This is a bit abnormal for devRant, but I'm looking for someone familiar with Azure. The project involves converting separate app processes into singular roles in order to save my client money in the long run, so I'm looking for anyone with current expertise. I'll pay you in whiskey and devRant swag.