Sidekiq runit script too fragile

hel_Sinki · November 21, 2025, 11:56am

Hi team,

I'm reporting a failure mode in the official Docker/runit setup that can silently kill Sidekiq (and with it AI and other background jobs) without any rebuild or upgrade.

Environment

Official Discourse Docker install (standard container + runit services).
No rebuild/upgrade right before the issue started.
Discourse AI plugin enabled, but AI stopped replying.

Symptoms

AI looks enabled in admin UI, but no AI replies appear.
Background jobs (AI/embeddings/auto-reply) appear stuck.
sv status sidekiq shows Sidekiq repeatedly dying right after start:

down: sidekiq: 1s, normally up, want up

Manually starting Sidekiq works fine, so the app itself is OK:

bundle exec sidekiq -C config/sidekiq.yml
# stays up, connects to Redis, processes jobs

What we found

The default runit script was:

exec chpst -u discourse:www-data \
  bash -lc 'cd /var/www/discourse && ... bundle exec sidekiq -e production -L log/sidekiq.log'

Two fragility points:

Primary group www-data. In my container, typical writable paths are owned by discourse:discourse. Any ownership drift in tmp/pids or shared paths can make Sidekiq exit during boot when run under www-data, even though manual start as discourse works.
Forced -L log/sidekiq.log writing to shared logs. The log path is a symlink into /shared/log/rails/sidekiq.log. If that file/dir gets recreated with different ownership/permissions, Sidekiq can exit immediately before producing useful logs.
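
For anyone who wants to check for this kind of drift on their own install, here is a rough sketch rather than a tested tool. The function name check_owner is my own, it relies on GNU stat (which I assume is what the container ships), and the paths in the example are the ones mentioned in this post:

```shell
#!/bin/bash
# Report paths whose owner:group differs from what is expected.
# Usage: check_owner EXPECTED_OWNER:GROUP PATH...
# check_owner is a hypothetical helper name, not part of Discourse.
check_owner() {
  local expected="$1"; shift
  local path actual status=0
  for path in "$@"; do
    # -L follows symlinks, so a symlinked log resolves to its real target.
    if ! actual=$(stat -L -c '%U:%G' "$path" 2>/dev/null); then
      echo "MISSING  $path"
      status=1
    elif [ "$actual" != "$expected" ]; then
      echo "MISMATCH $path is $actual, expected $expected"
      status=1
    else
      echo "OK       $path"
    fi
  done
  return "$status"
}

# Example, inside the container:
# cd /var/www/discourse && check_owner discourse:discourse tmp/pids log/sidekiq.log
```

A non-zero exit status means at least one path is missing or owned by someone else, which is worth comparing against the user and group the runit script passes to chpst.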

Related trigger: logrotate failing daily

Separately, logrotate was failing every day with:

error: skipping "...log" because parent directory has insecure permissions
Set "su" directive in config file ...

Cause was standard Debian/Ubuntu perms:

/var/log is root:adm with 0775 (group writable).
logrotate refuses rotation unless a global su directive is set. This is expected upstream behavior.

When the daily logrotate job failed, it also recreated files under /shared/log/rails/ (including sidekiq.log), which likely interacted with the forced -L logging and contributed to the Sidekiq “1s crash” loop.

Fix (no rebuild needed)

Fix logrotate so it stops touching shared logs in a failed state. Add a global su directive:

# /etc/logrotate.conf (top)
su root adm

After that, logrotate -v exits 0 and no longer reports insecure parent perms.
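
To check the condition before and after the change, something like the sketch below may help. It rests on an assumption about logrotate's rule (a parent directory counts as insecure when it is world-writable, or group-writable by a group other than root), and both function names are made up for illustration. It also assumes GNU stat and an awk on the PATH:

```shell
#!/bin/bash
# Assumed rule: a directory is "insecure" if it is world-writable, or
# group-writable while its group is not root (gid 0).
# dir_is_insecure and has_global_su are hypothetical helper names.
dir_is_insecure() {
  local mode gid
  mode=$(stat -c '%a' "$1") || return 2
  gid=$(stat -c '%g' "$1") || return 2
  (( (8#$mode & 8#002) || ((8#$mode & 8#020) && gid != 0) ))
}

# Succeeds if the config has a top-level "su <user> <group>" line, meaning
# one that appears before the first "{" block opens.
has_global_su() {
  awk 'index($0, "{") { exit }
       $1 == "su" && NF >= 3 { found = 1; exit }
       END { exit !found }' "$1"
}

# Example, inside the container:
# dir_is_insecure /var/log && echo "/var/log would trip the check"
# has_global_su /etc/logrotate.conf || echo "no global su directive"
```

This only looks at the main config file and does not follow include directives, so treat a negative result as a hint rather than proof.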

Replace the Sidekiq runit script with a more robust default. Switching to discourse:discourse and the standard sidekiq.yml, and not forcing -L log/sidekiq.log, makes Sidekiq stable:

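#!/bin/bash
# Send stderr to stdout so runit's logger captures both streams.
exec 2>&1
cd /var/www/discourse || exit 1

# Make sure the pid directory exists and belongs to the service user.
mkdir -p tmp/pids
chown discourse:discourse tmp/pids || true

# Drop privileges, clear stale pid files, then replace the shell with Sidekiq.
exec chpst -u discourse:discourse \
  bash -lc 'cd /var/www/discourse && rm -f tmp/pids/sidekiq*.pid; exec bundle exec sidekiq -C config/sidekiq.yml'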

After this:

sv status sidekiq stays in the run: state.
AI/background jobs resume.
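
If you want to script that check instead of eyeballing it, here is a minimal sketch. The helper name sv_is_stable and the 60-second default are my own choices, and I'm assuming the healthy status line looks like "run: sidekiq: (pid N) Ns", mirroring the "down: sidekiq: 1s, ..." line shown earlier:

```shell
#!/bin/bash
# Succeeds if a `sv status` line reports state "run" and has been in that
# state for at least MIN seconds. Usage: sv_is_stable "LINE" [MIN]
# sv_is_stable is a hypothetical helper name; MIN defaults to 60.
sv_is_stable() {
  local line="$1" min="${2:-60}" secs
  [[ $line == run:* ]] || return 1
  # The first "<digits>s" token is taken as the time in the current state.
  [[ $line =~ ([0-9]+)s ]] || return 1
  secs="${BASH_REMATCH[1]}"
  (( secs >= min ))
}

# Example, inside the container:
# sv_is_stable "$(sv status sidekiq)" 60 && echo "sidekiq looks stable"
```

A service stuck in the crash loop either reports down: or shows run: with only a second or two of uptime, so both cases fail the check.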

Request / suggestion

Could we consider making the official Docker/runit Sidekiq service more robust by default?

For example:

Run Sidekiq under discourse:discourse (matching typical ownership inside container).
Prefer bundle exec sidekiq -C config/sidekiq.yml.
Avoid forcing a shared log file via -L log/sidekiq.log, or make it resilient to logrotate/shared-volume perms drift.

Even a doc note (“if Sidekiq shows down: 1s but manual start works, check /etc/service/sidekiq/run and avoid forced shared logging”) would help self-hosters a lot.

Happy to provide more logs if needed. Thanks!
