How to Perform Major Discourse Maintenance with Minimal Downtime?

emonunix · September 5, 2025, 8:27pm

I’d like to open a discussion on the best practices for performing core maintenance tasks on a Discourse instance while minimizing or eliminating downtime.

Tasks like changing critical resource settings (e.g., UNICORN_WORKERS, DISCOURSE_SIDEKIQ_WORKERS, DISCOURSE_DB_POOL) or applying major updates typically require running ./launcher rebuild app, which can take a significant amount of time, sometimes 30 minutes or more.

My question is:
What are the recommended strategies for system administrators to perform these essential updates and configuration changes with the least amount of user-facing downtime?

Are there any advanced techniques, like blue/green deployments or other zero-downtime deployment strategies, that are supported or recommended for Discourse? Or is the standard rebuild process the only supported method, and the focus should be on optimizing the rebuild time itself?

I’m interested in hearing from anyone who has experience managing large or high-traffic instances and what their workflow looks like for maintenance.

Thanks for any insights!

pfaffman · September 5, 2025, 8:39pm

If you have a two-container install, the new container builds while the old one runs. Downtime is just the time it takes to launch the new container. The only issue is that you need enough RAM to build a container while the other one runs.
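
As a rough sketch of that sequence, assuming the default /var/discourse checkout and a web container defined in containers/web_only.yml (both are assumptions about your setup; the DRY_RUN switch and the function names exist only in this example), the upgrade could look something like this:

```shell
#!/bin/sh
# Sketch of a two-container upgrade. Paths and container name are assumptions.
DISCOURSE_DIR="${DISCOURSE_DIR:-/var/discourse}"
WEB="${WEB:-web_only}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default in this sketch)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_web() {
  run cd "$DISCOURSE_DIR" &&
  run git pull &&
  run ./launcher bootstrap "$WEB" &&   # builds the new image; old container keeps serving
  run ./launcher destroy "$WEB" &&     # downtime starts here
  run ./launcher start "$WEB"          # downtime ends once the new container is up
}
```

Calling upgrade_web with DRY_RUN=1 just prints the five commands so you can check them; set DRY_RUN=0 to actually run them. Because each step is chained with &&, a failed bootstrap stops the sequence before the running container is destroyed.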

See "Move from standalone container to separate web and data containers", but I usually move to a new VM instead.

If you want zero downtime, then you need a load balancer that keeps the old container running until the new one has fully started. Then you shut down the old container and run the post-update migrations.
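
The "until the new one has fully started" part can be checked with a small polling loop before you switch the load balancer. This is only a sketch: wait_until_healthy is a made-up helper, and the URL you pass in (for example the new container's /srv/status health endpoint on whatever port you exposed it on) is an assumption about your setup.

```shell
#!/bin/sh
# wait_until_healthy URL [TRIES] [DELAY_SECONDS]
# Polls URL until it answers successfully or the attempts run out.
# Returns 0 when the URL responded, 1 otherwise.
wait_until_healthy() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failure, -s: quiet, --max-time: do not hang
    if curl -fs --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_until_healthy http://127.0.0.1:8081/srv/status && echo "safe to switch"` (port 8081 is hypothetical) would only print the message once the new container answers. How you then repoint the load balancer depends entirely on which one you use.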

Ethsim2 · September 5, 2025, 9:05pm

Can you have two data containers on failover?

Do you usually have a separate VM for data?

merefield · September 5, 2025, 9:53pm

Discourse is so stable this is pretty unnecessary for most installs (but I guess you might consider it for very high availability requirements or if you are hosting others?!)

I don’t think I’ve had a single outage in 7 years due to a production “glitch” …

The riskiest moment in a Discourse instance’s life is always the rebuild.

The two-container setup gives you the ability to bootstrap a new build before committing to it, though that won’t catch some runtime errors, of course.

The issue is that once your migrations have run, you might need to commit to the new build, so you would usually try to track down and fix the source of those errors rather than roll back.

Generally people do not try to roll back …

pfaffman · September 5, 2025, 10:47pm

I move to a new VM when doing a big reconfiguration.

It’s possible to run a PostgreSQL mirror, but it’s a lot of work.

itsbhanusharma · September 5, 2025, 11:42pm

A read replica would be better, no?

pfaffman · September 6, 2025, 12:41am

Yeah! Replica! That’s the word they use. And then if the other one dies you can hot swap to the replica.
