Sidekiq runit script too fragile

hel_Sinki · November 23, 2025, 2:30pm

Hi — let me restate this strictly based on runtime facts from the official Docker container.

What I’m seeing in the running container (facts)

This is an official Docker install with runit (standard /var/discourse launcher workflow; no rebuild right before the incident). Inside the container:

A runit Sidekiq service exists and is the one being supervised

ls -l /etc/service/sidekiq/run
sv status sidekiq

Output during the incident:

down: sidekiq: 1s, normally up, want up
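For anyone who wants to check for the same state, here is a minimal sketch that classifies a single line of sv status output. The function name is_crash_looping is my own, not something shipped in the image, and one sample only shows "down but wanted up"; sampling it a few times in a row is what suggests an actual loop.

```shell
#!/bin/bash
# Hypothetical helper (not part of the image): classify one line of
# `sv status` output. Returns 0 when the service is down while runit
# still wants it up, which is the pattern seen during the incident.
is_crash_looping() {
  local status_line="$1"
  case "$status_line" in
    down:*"want up"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage inside the container (assumes the service is named "sidekiq"):
#   is_crash_looping "$(sv status sidekiq)" && echo "sidekiq is down but wanted up"
```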

Manual Sidekiq start works

cd /var/www/discourse
sudo -u discourse bundle exec sidekiq -C config/sidekiq.yml

This stays up, connects to Redis, and processes jobs.

Patching only /etc/service/sidekiq/run (no rebuild) fixes the crash loop immediately

I replaced /etc/service/sidekiq/run with:

#!/bin/bash
exec 2>&1
cd /var/www/discourse
mkdir -p tmp/pids
chown discourse:discourse tmp/pids || true
exec chpst -u discourse:discourse \
  bash -lc 'cd /var/www/discourse && rm -f tmp/pids/sidekiq*.pid; exec bundle exec sidekiq -C config/sidekiq.yml'

After that:

sv status sidekiq
run: sidekiq: (pid <PID>) <SECONDS>s

So Sidekiq is not being launched via Unicorn master in this image; it’s a runit service whose runtime script can crash-loop.
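For reference, an in-place replacement like the one described above can be done with a backup so it is easy to revert. This is only a sketch: install_run_script is a hypothetical helper name, and the path of the new script in the usage comment is a placeholder. I would assume a change made this way does not survive a rebuild.

```shell
#!/bin/bash
# Hypothetical helper: replace a runit run script while keeping a backup.
#   $1 = path to the new script
#   $2 = target run file (e.g. /etc/service/sidekiq/run)
install_run_script() {
  local new_script="$1" target="$2"
  # Keep the original so the change can be reverted by hand.
  if [ -e "$target" ]; then
    cp -p "$target" "$target.bak" || return 1
  fi
  # The run file has to be executable for runit to start the service.
  install -m 0755 "$new_script" "$target"
}

# Usage inside the container (the source path is a placeholder):
#   install_run_script /tmp/sidekiq-run.new /etc/service/sidekiq/run
#   sv restart sidekiq
#   sv status sidekiq
```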

Why you may not see the exact code in discourse_docker

I agree the literal text may not be in the repo because /etc/service/sidekiq/run is a runtime artifact generated/injected during image build/boot, not necessarily a verbatim file in discourse_docker. But it is the active supervised service in this official image, as shown above.

What triggered the fragility (facts + minimal inference)

We also observed daily logrotate failures caused by the standard Debian permissions on /var/log (root:adm, 0775): logrotate refused to rotate until we added a global su root adm directive.
While logrotate was failing, it recreated files under /shared/log/rails/, including sidekiq.log.
The default runit script in this image used discourse:www-data and forced -L log/sidekiq.log into /shared/log, which makes Sidekiq very sensitive to permission drift on the shared volume and can cause an immediate exit before any useful logs are written.
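To make this kind of permission drift visible, a small check like the following can be run inside the container. It is a sketch, not something the image provides: show_perms is a hypothetical name, and it relies on GNU stat (the -c option), which I am assuming is available in the container.

```shell
#!/bin/bash
# Hypothetical helper: print "owner:group mode path" for each argument,
# or "missing <path>" when the path does not exist.
# Assumes GNU stat (-c format option).
show_perms() {
  local path
  for path in "$@"; do
    if [ -e "$path" ]; then
      stat -c '%U:%G %a %n' "$path"
    else
      echo "missing $path"
    fi
  done
}

# Usage with the paths mentioned above:
#   show_perms /var/log /shared/log/rails /shared/log/rails/sidekiq.log
```

Comparing the output before and after a logrotate run should show whether ownership or mode of sidekiq.log changed.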

Request / proposal

Given the above, could we consider hardening the default Docker/runit Sidekiq service?

Suggested defaults:

run as discourse:discourse (matches typical ownership inside container),
start via bundle exec sidekiq -C config/sidekiq.yml,
avoid forcing a shared -L log/sidekiq.log (or make it resilient).

This would prevent the silent down: 1s crash loop that stops all background/AI jobs.
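As one possible reading of "make it resilient", the run script could test whether the log file is usable before redirecting to it, and otherwise start Sidekiq with its inherited output. This is only a sketch under my own assumptions: log_is_usable is a hypothetical helper, and the check has to run as the service user, since root can write almost anywhere.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the current user can append to the
# given log file, or create it when it does not exist yet.
log_is_usable() {
  local log_file="$1" log_dir
  log_dir=$(dirname -- "$log_file")
  if [ -e "$log_file" ]; then
    # Existing path: must be a regular file we can write to.
    [ -f "$log_file" ] && [ -w "$log_file" ]
  else
    # Missing file: the parent directory must exist and be writable.
    [ -d "$log_dir" ] && [ -w "$log_dir" ]
  fi
}

# Usage sketch, run as the service user from /var/www/discourse:
#   if log_is_usable log/sidekiq.log; then
#     exec bundle exec sidekiq -C config/sidekiq.yml >>log/sidekiq.log 2>&1
#   else
#     # Fall back to the inherited stdout/stderr instead of exiting.
#     exec bundle exec sidekiq -C config/sidekiq.yml
#   fi
```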

Happy to test any branch/commit you point me at.
