15
Maxeh
42d

DON'T do production stuff on Friday afternoon. This Friday evening we had an issue on production and just wanted to do a quick fix. The fix turned into a DDoS attack that we accidentally launched against our own servers in an IoT project: we contacted all customers' devices and asked them all to respond at the same time. Funny thing is that the devices are programmed to retry a failed request until it succeeds. We ended up with 4 hours of downtime on production; the servers were running again at 11pm.

Comments
  • 4
    Planning to add a random delay before the retry?
  • 1
    @electrineer we will probably introduce a maximum number of retries, and we will not contact all devices at the same time again.
  • 1
    If you can't trust your deployment pipeline on Friday, you can't trust it any other day. "No deploy Friday" is an antipattern; people should just fix their shit.
  • 0
    @PAKA well, if anything goes wrong (and that can always happen, for whatever reason), then I would rather fix it during my normal working hours than on Friday evening or on Saturday/Sunday.
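
For anyone wondering what the fixes suggested in the comments might look like, here is a rough sketch in Python. It is not the actual firmware or server code from this project. The names `backoff_delays`, `send_with_retries` and `staggered_schedule`, and all the default values, are made up for illustration. It assumes the device-side request can be wrapped in a `send()` callable that returns True on success, and it uses exponential backoff with a random delay on top of the retry cap, which goes a bit beyond the plain random delay asked about above.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=60.0, rng=random):
    """Yield one randomized delay (seconds) per retry.

    The upper bound doubles on every attempt and is limited by `cap`;
    the actual delay is drawn uniformly between 0 and that bound.
    """
    for attempt in range(max_retries):
        yield rng.uniform(0.0, min(cap, base * 2 ** attempt))


def send_with_retries(send, max_retries=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random):
    """Call send() once, then retry at most `max_retries` times.

    `send` is a hypothetical callable returning True on success.
    Returns False instead of retrying forever.
    """
    if send():
        return True
    for delay in backoff_delays(max_retries, base, cap, rng):
        sleep(delay)  # random wait so devices do not retry in lockstep
        if send():
            return True
    return False


def staggered_schedule(device_ids, window_seconds, rng=random):
    """Server side: give each device a random offset within the window,
    so the whole fleet is not contacted at the same moment."""
    return {d: rng.uniform(0.0, window_seconds) for d in device_ids}
```

The retry cap stops a dead server from being hammered indefinitely, the random delay spreads out whatever retries do happen, and the staggered schedule avoids triggering the simultaneous responses in the first place. Sensible values for the cap and the window depend on the fleet size and on what the servers can handle.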