Why did Discourse start running their own Redis instead of using Elasticache?

Continuing the discussion from Meta is moving to the Cloud :cloud_with_lightning::

This sounds like a really interesting story. I’d love to hear it.

Have you told it already anywhere?


We had two main pain points.

1. Instance types

ElastiCache didn’t allow you to run HA Redis without Cluster Mode on instance types smaller than an m4.large.

Support for cluster mode in Ruby Redis libraries is still being ironed out. MessageBus also performs atomic operations on keys using Lua scripts, and since those keys would span multiple servers, that isn’t allowed in cluster mode. We do host some BIG instances, so we may revisit using distributed writes for databases (Rails 6 is coming with that for PostgreSQL), but we don’t need it yet.
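To illustrate the cluster-mode restriction: Redis Cluster shards keys across nodes by hash slot, and a Lua script may only touch keys that live in a single slot — otherwise the server rejects the call with a CROSSSLOT error. Here’s a rough pure-Ruby sketch of the slot mapping (simplified from the Redis Cluster spec; the key names are made up for illustration):

```ruby
# Sketch of Redis Cluster's key -> hash-slot mapping (CRC16-XMODEM mod 16384),
# to show why a multi-key Lua script can be rejected: keys hashing to slots
# on different nodes can't share one EVAL call (CROSSSLOT error).

# CRC16-XMODEM, the CRC variant the Redis Cluster spec uses for key hashing.
def crc16(str)
  crc = 0
  str.each_byte do |b|
    crc ^= b << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# If the key contains a non-empty {...} "hash tag", only the tag is hashed,
# which is how you force related keys into the same slot. (Simplified: the
# real spec only considers the first '{' in the key.)
def hash_slot(key)
  if (m = key.match(/\{([^}]+)\}/))
    key = m[1]
  end
  crc16(key) % 16_384
end

puts hash_slot("mb:backlog")  # some slot in 0..16383
puts hash_slot("mb:global")   # very likely a different slot -> CROSSSLOT
puts hash_slot("{mb}:backlog") == hash_slot("{mb}:global")  # true: shared tag
```

Hash tags are the usual workaround, but they require rethinking your key layout, which is exactly the kind of churn we wanted to avoid.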

We do host sites that need instance types way larger than that, but we also host plenty where a t3.small is more than enough.

Moving to running our own Redis allows us to pick any instance type available in the target region.

2. Read-only mode

Discourse can keep connections open to read-only replica nodes for both PostgreSQL and Redis.

That allows people to keep reading the site when the masters go down.
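The fallback idea can be sketched roughly like this — a hypothetical toy, not Discourse’s actual implementation; the class names and in-memory fakes are invented for illustration:

```ruby
# Hypothetical sketch of read-only fallback: writes go to the primary;
# if the primary is unreachable, reads are served from a replica and
# the app flips into read-only mode.
class FailoverReads
  def initialize(primary, replica)
    @primary = primary
    @replica = replica
    @readonly = false
  end

  def readonly?
    @readonly
  end

  def get(key)
    value = @primary.get(key)
    @readonly = false
    value
  rescue StandardError
    @readonly = true  # primary unreachable: degrade to read-only mode
    @replica.get(key)
  end

  def set(key, value)
    raise "site is in read-only mode" if @readonly
    @primary.set(key, value)
  end
end

# Tiny in-memory stand-ins for Redis clients, just for the demo.
class FakeRedis
  attr_writer :up

  def initialize(data = {})
    @data = data
    @up = true
  end

  def get(key)
    raise "connection refused" unless @up
    @data[key]
  end

  def set(key, value)
    raise "connection refused" unless @up
    @data[key] = value
  end
end

primary = FakeRedis.new("title" => "Meta is moving to the Cloud")
replica = FakeRedis.new("title" => "Meta is moving to the Cloud")
redis   = FailoverReads.new(primary, replica)

puts redis.get("title")  # served from the primary
primary.up = false       # simulate a primary outage
puts redis.get("title")  # still readable, now served from the replica
puts redis.readonly?     # true
```

The point is that the failover logic needs a stable way to find the current replicas — which is where ElastiCache fell short for us, as described below.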

ElastiCache wasn’t very straightforward about providing replica endpoints for the main cluster that kept tracking the current replicas across failover events.


Also … not to forget: this offers us significant cost savings.

When you use ElastiCache you pay for the instances plus a premium for the ElastiCache service itself. Additionally, we get much better control over utilisation, since we can run multiple Redis processes on a single instance if we wish.