Docker image update: Redis 6 and 25% smaller image size

We just shipped a brand-new container image that will be used on your next ./launcher rebuild app. As always, there is no need to change any configuration, provided you followed our official Discourse Standard Installation. That said, there are new features that will help some installations out there.

Redis 6

We make heavy use of Redis in many places in Discourse, be it for caching, Sidekiq, MessageBus, distributed locks, or rate limits. All in all, it’s been a rock-solid choice for us.

However, under some very specific workloads, Redis could become a bottleneck. And because of Redis’s single-threaded nature, coupled with our inability to use multiple instances due to our Lua scripts, this was a hard bottleneck to work around.

Thankfully, Redis 6 comes with support for using a thread pool for I/O operations, and during our tests it worked very well with Discourse clusters bottlenecked by Redis.

So, if you are running on a machine with lots of CPU cores, and your metrics show Redis struggling to handle the load, you can now opt into using threads for write operations via the app.yml params section:

params:
  redis_io_threads: "4" # 1 disables it, n>1 uses n-1 extra threads for IO writes
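
For the curious: this param presumably maps straight onto the io-threads directive that Redis 6 introduced (the comment above mirrors its semantics), so the generated /etc/redis/redis.conf should end up with something along the lines of:

io-threads 4
# io-threads-do-reads no   <- Redis default; reads stay on the main thread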

Smaller image

We opted to ship a large container image early on in the project so we could make it easier for non-technical people to run Discourse, and handle all the necessary dependencies, versioning, upgrades, etc.

That said, we recently went over 1GB for the compressed image, and that was a bit too much.

So, to mitigate the ever-increasing size of the image, we changed the Discourse source code shipped inside the image from a complete copy of the repository to a “shallow clone” containing only the most recent version of the code.
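
(For context, a “shallow clone” is what git produces with the --depth option; the image now effectively carries the equivalent of:

git clone --depth 1 https://github.com/discourse/discourse.git

i.e. the latest code without the full commit history.)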

This change makes the compressed image 25% smaller, which results in less server space needed and faster rebuilds when a new image is released. It should also keep image growth under control over time.

We tested it on tests-passed/beta/stable, with both rebuilds and web updates, and it doesn’t break any standard paths. However, users doing more exotic git operations in their app.yml hooks may have to adapt their customizations.
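
If one of your hooks does need the full history, git can deepen a shallow clone back into a full one; something like this inside the container should do it (using the standard install path):

cd /var/www/discourse
git fetch --unshallow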

42 Likes

What happens, if anything, to the browser experience after such a Redis upgrade? Any impact on cached assets? Do they get emptied as a result of the upgrade?

3 Likes

Nothing.

Assets are saved to local disk or object storage, and cached in the CDN. Redis doesn’t impact them.

The Redis data is kept during the upgrade.

10 Likes

What is the default value? 1?

5 Likes

Yes. It comes from Redis’s own config file, where 1 means a single thread, like the old version.
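
If you want to confirm what your running instance is using, redis-cli can read it back from inside the container:

./launcher enter app
redis-cli config get io-threads   # returns "io-threads" / "1" on the default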

8 Likes

I’ve got an instance where I have added:

after_redis:
  - replace:
      filename: "/etc/redis/redis.conf"
      from: /^databases.*/
      to: "databases 50"

And it’s failing to rebuild because:

25:M 01 Dec 2020 20:21:08.830 # FATAL: Data file was created with a Redis server configured to handle more than 16 databases. Exiting

Is there some other hook that I can snag to update the databases count before it tries to migrate or whatever it does?

Hmm. And now after the apparently failed rebuild, I see this in docker logs:

chgrp: invalid group: 'syslog'

2 Likes

Why do you need more than 1 database?

3 Likes

Multisite. Multiple instances using a single redis. I probably should have used a more generic redis container, but thought I’d stick with yours.

2 Likes

:face_with_raised_eyebrow:

Doesn’t multisite use a single database and the standard Redis key namespaces? AFAIK Redis databases are a thin layer, and we have commands that go across their boundaries, so you should not rely on them.

6 Likes

Yeah multisite uses 1 database

3 Likes

Oh. Then not multisite, just multiple instances running on one machine that each need a separate redis. I only upped the default 16 to 50 because I was too lazy to keep tight control over which redis databases were in use.

So I should run a separate redis container for each instance, I guess?

2 Likes

Yes, otherwise you may run into stuff cross-talking.
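
A rough sketch of what a per-site Redis container could look like, assuming the stock templates/redis.template.yml from discourse_docker (file name and host port are illustrative):

# containers/redis-site-a.yml (hypothetical)
templates:
  - "templates/redis.template.yml"
expose:
  - "6381:6379"   # give each site its own host port

Each web container would then point at its own instance via DISCOURSE_REDIS_HOST / DISCOURSE_REDIS_PORT.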

5 Likes

OH. Darn.

Thankfully, I learned this on a server that’s in use only for testing.

For the other sites, should I just give them a fresh redis and throw away what was scheduled? Do a backup/restore?

FWIW, I’ve not noticed any issues with crosstalk in the past year or so. :man_shrugging: And there is a way to set the DB.

EDIT: Well, the good news is that I can enter the container, edit redis.conf and restart it and it starts working again.
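
For anyone else who hits this, the recovery was roughly:

./launcher enter app
vi /etc/redis/redis.conf    # raise "databases 16" back to 50
sv restart redis            # assuming the runit service is named redis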

If you’ve got a hint on how to move a site from DISCOURSE_REDIS_DB: 12 on one redis container to another redis container, I’d love to hear it. Or maybe I should just not care about the scheduled jobs?

3 Likes

This. Discourse should reasonably survive a Redis flush. Some stuff is lost, but nothing critical.

7 Likes

That’s what I’d thought, as I don’t know of any way that backups attempt to restore it (but there’s a lot I don’t know). It looks like I could do it like this: Copy all keys from one db to another in redis - Stack Overflow, and it appears to have worked in a test I just did, but it’ll be much easier to adjust my playbook to just create a new redis container and use it.
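
For the record, that answer boils down to Redis’s MIGRATE command with the COPY option; a rough sketch, assuming the new container is reachable at new-redis:6379 (hypothetical host):

# copy every key from db 12 on the old instance to db 0 on the new one
redis-cli -n 12 --scan | \
  xargs -I{} redis-cli -n 12 migrate new-redis 6379 {} 0 5000 COPY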

Thanks.

Now to figure out whether to run those redises on the database server or the web server...

3 Likes

Then what does the db_id: 2 in the multisite config refer to?

2 Likes

A deprecated setting:

https://github.com/discourse/rails_multisite/commit/2bb4d5170cbf462708eb32bb85c035fb1700f7b4

5 Likes

LOL. Yes, it is indeed confusing!

Thanks. I’m working on a ‘multisite config with Let’s Encrypt and no external reverse proxy’ topic, and when I do that, I’ll clean up the other one as well.

4 Likes

Just make sure to restart unicorn right after that, so it will recreate the scheduled tasks.
You will lose anything that is queued, so you need to find a good moment to do this.
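
With the standard install that’s something like:

cd /var/discourse
./launcher enter app
sv restart unicorn   # Sidekiq is forked from unicorn, so this recreates the scheduled jobs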

6 Likes

Is this still working as it should be? Is there an easy one-liner to discover how large the compressed image is?
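
(The closest thing I’ve found is summing the compressed layer sizes from the registry manifest, something like the line below, assuming a single-arch image and jq installed; tag illustrative:

docker manifest inspect discourse/base:<tag> | jq '[.layers[].size] | add'

but maybe there’s something simpler.)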

1 Like