Python muses me sometimes.

Gunicorn has a preload mode. It enables forking...

So Gunicorn starts, when Gunicorn loaded it forks the workers (Uvicorn / FastAPI in my case).


So if we add a function that creates the app... this function will be executed before forking, thus the memory at the state of creating the app will be duplicated.

You can thus spawn 40 workers, they would all have the same ML models.

Or in my case a client who does some things that should only be run by a single thread (with locking).

So the client has a cache, as long as I load the cache during the create_app phase, the cache will be shared between all instances and not created per instance.

It's ... Such a small detail. So simple.

Yet completely fucks my brain.

It's logical, yes. I understand what it does, yes.

But it still makes my brain fart.

Add Comment