6

So I recently finished a rewrite of a website that processes donations for nonprofits. Once it was complete, I would migrate all the data from the old system to the new system. This involved iterating through every transaction in the database and making a cURL request to the new system's API. A rough calculation yielded 16 hours of migration time.

The first hour or two of the migration (where it was creating users) was fine, no issues. But once it got to the transaction part, the API server would start using more and more RAM. Eventually (30 minutes), it would start doing OOMs and the such. For a while, I just assumed the issue was a lack of RAM so I upgraded the server to 16 GB of RAM.

Running the script again, it would approach the 7 GiB mark and be maxing out all 8 CPUs. At this point, I assumed there was a memory leak somewhere and the garbage collector was doing it's best to free up anything it could find. I scanned my code time and time again, but there was no place I was storing any strong references to anything!

At this point, I just sort of gave up. Every 30 minutes, I would restart the server to fix the RAM and CPU issue. And all was fine. But then there was this one time where I tried to kill it, but I go the error: "fork failed: resource temporarily unavailable". Up until this point, I believed this was simply a lack of memory...but none of my SWAP was in use! And I had 4 GiB of cached stuff!

Now this made me really confused. So I did one search on the Internet and apparently this can be caused by many things: a lack of file descriptors or even too many threads. So I did some digging, and apparently my app was using over 31 thousands threads!!!!! WTF!

I did some more digging, and as it turns out, I never called close() on my network objects. Thus leaving ~30 new "worker" threads per iteration of the migration script. Thanks Java, if only finalize() was utilized properly.

Comments
  • 1
    At least you found the error in the end! :)
    Welcome to Devrant buddy 😁
Add Comment