On Monday morning we found out our main event queue hadn't processed anything since late Wednesday afternoon. Shit was hitting the fan and we were stumped. What had changed?! Why wasn't the queue processor running?!

Turns out a server restart had killed the job (no worries there, surely?!). But the job checks for a system flag on disk to stop it running multiple instances, and because the flag was still present after the restart, it wouldn't run any instance at all. Got to love the little things that really screw you over.

Comments
  • 2
    I keep thinking there is a better way to flag that a process is running. Maybe a temp file that gets cleaned up on restart?

    How did you fix this?!
  • 1
    @Demolishun Had to calculate the name of the temp file by generating a SHA1 of the process name and its arguments, then locate the file and remove it.

    I think a script at shutdown / reboot that cleans the directory would be a great addition.
  • 2
    @StyxOfDynamite Just do it in a sane way. Use a file lock, mutex, whatever. The filesystem isn't even atomic; it's a terrible choice for locking.
  • 0
    Check the lock file's creation time and allow a time limit for the lock, based on the maximum execution time of your process.
    If the lock time is exceeded, recreate the lock, start the process, and notify someone.
    It might work for you. I use it all the time.
  • 1
    @DreamWave Or, you know, just use a reliable, foolproof method built into your OS instead of relying on homebrew hacks that impose artificial limitations such as lock time.
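
For anyone wondering what the "built into your OS" option from the comments might look like, here is a minimal sketch using an advisory `flock` lock from Python. It assumes a Unix-like system (the `fcntl` module is not available on Windows), and the lock path and function name are made up for illustration, since the thread doesn't say where the real flag lives. The kernel drops the lock when the process exits or is killed, so a file left on disk after a restart should not block the next run.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path, chosen only for this example.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "event-queue-processor.lock")


def acquire_single_instance_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    # Append mode creates the file if needed and never truncates it.
    handle = open(path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # The caller must keep this handle open for as long as the job runs;
    # closing it (or the process dying) releases the lock.
    return handle
```

A job would call `acquire_single_instance_lock()` at startup and exit if it gets `None` back. The existence of the file means nothing here, only whether some live process holds the lock, so no cleanup script or time limit is needed.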