14
noder
3y

I was developing a notification API that sends emails to subscribers. The email API can take only 100 IDs at once, so I partitioned the email list and sent the mails in blocks of 100.

Forgot to reset the list after every block, so each new partition got appended to the existing list and the list kept growing.

Ran it against a test DB, which had recently been refreshed with near-prod data!!! Thousands of emails went out of the app server in one shot, and everybody received numerous duplicate emails, especially the ones in the very first partition.

Got an incident raised by the CEO himself regarding the flurry of emails. But things were out of our hands, quite literally: all the emails were queued up in the Exchange server.

Called up the Exchange server team and purged the queued emails. No other emails were sent or received during this whole episode.

Thankfully, there's Iterables.partition these days.
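
For anyone curious, here is a rough sketch of what that kind of bug and its fix might look like. This is a guess at the shape of the code, not the original: the class name, method names, and the plain `subList` loop are all assumptions, and the "send" is just recording each block. Guava's `Iterables.partition` (or `Lists.partition`) does the same splitting as the fixed version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; only the block size of 100 comes from the post.
class BatchSender {
    static final int BATCH_SIZE = 100;

    // Buggy shape: one list is reused across blocks and never cleared,
    // so every send also contains all the earlier blocks.
    static List<List<String>> buggyBatches(List<String> ids) {
        List<List<String>> sent = new ArrayList<>();
        List<String> block = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += BATCH_SIZE) {
            block.addAll(ids.subList(i, Math.min(i + BATCH_SIZE, ids.size())));
            sent.add(new ArrayList<>(block)); // stand-in for the real send call
        }
        return sent;
    }

    // Fixed shape: every block is its own fresh list of at most BATCH_SIZE ids.
    static List<List<String>> fixedBatches(List<String> ids) {
        List<List<String>> sent = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += BATCH_SIZE) {
            int end = Math.min(i + BATCH_SIZE, ids.size());
            sent.add(new ArrayList<>(ids.subList(i, end))); // stand-in for the real send call
        }
        return sent;
    }
}
```

With 250 IDs, the buggy version hands over blocks of 100, 200, and 250 IDs, so the first partition gets mailed three times; the fixed version hands over 100, 100, and 50.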

Comments
  • 3
    That's close to worst case :/

    Should real emails even be in the test db?

    If they are, implement some trial switch that hard-wires every email to a test address.

    I've had similar problems: to make a realistic test I need to run against customer data, since customers add their own HTML and we need to see that it doesn't break.

    So I use an interface and a proxy for all DB operations, and the trial mode hard-codes a test email in the returned data to prevent such accidents. But I always feel it in my stomach when triggering a test: what if ...
  • 2
  • 1
    @Voxera Yeah, it's close to the worst mistake of my dev career, kind of created a legend - the one who crashed the Exchange server.
    This happened probably 6 years back, so I have come a long way since then. If it were today, I'd be using mock SMTP servers to test it out!!
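
The trial switch described in the first comment could look roughly like the sketch below. The interface, class names, and addresses are all made up for illustration; the idea is only that a proxy sits in front of the real data source and overwrites every recipient with one fixed test mailbox.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical interface: stands in for whatever layer returns subscriber data.
interface SubscriberSource {
    List<String> emails();
}

// Trial-mode proxy: wraps the real source and replaces every address
// with a single test mailbox, so no real recipient can be reached.
class TrialSubscriberSource implements SubscriberSource {
    private final SubscriberSource real;
    private final String testAddress;

    TrialSubscriberSource(SubscriberSource real, String testAddress) {
        this.real = real;
        this.testAddress = testAddress;
    }

    @Override
    public List<String> emails() {
        // Keep the same number of rows as the real data, but swap each address.
        return real.emails().stream()
                .map(address -> testAddress)
                .collect(Collectors.toList());
    }
}
```

Keeping the row count the same means a test run still exercises the batching logic, while every message ends up in the test mailbox.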