TL;DR I have to bump a Redis cluster from t3.medium to m6g.large just to get enough network bandwidth even though I have no need of the extra memory.

Debugged an interesting issue today.
I am adding Elasticache to a project to reduce strain on the single node postgres DB.
Deployed a Redis replication group with 2 shards, with multi-AZ replication for resilience.

Everything was going well. We arent caching that much atm so was barely using 100Mb of memory.

Suddenly, when our US region comes online, latency skyrockets and the logs are full of Jedis timeout errors.
Still no issue with memory or node CPU.

The cause? Arbitrary network bandwidth throttling by AWS. The app currently processes about 3,000 requests per second so we were exceeding Amazons random ass allowances which arent documented anywhere.

  • 0
    My favourite kind of limits undocumented aws ones lol
Add Comment