Hi team,
I'm reporting a failure mode in the official Docker/runit setup that can silently kill Sidekiq (and with it AI and other background jobs) without any rebuild or upgrade.
Environment
- Official Discourse Docker install (standard container + runit services).
- No rebuild/upgrade right before the issue started.
- Discourse AI plugin enabled, but AI stopped replying.
Symptoms
- AI looks enabled in admin UI, but no AI replies appear.
- Background jobs (AI/embeddings/auto-reply) appear stuck.
- sv status sidekiq shows Sidekiq repeatedly dying right after start:
down: sidekiq: 1s, normally up, want up
- Manually starting Sidekiq works fine, so the app itself is OK:
bundle exec sidekiq -C config/sidekiq.yml
# stays up, connects to Redis, processes jobs
What we found
The default runit script was:
exec chpst -u discourse:www-data \
bash -lc 'cd /var/www/discourse && ... bundle exec sidekiq -e production -L log/sidekiq.log'
Two fragility points:
- Primary group www-data: in my container, the typical writable paths are owned by discourse:discourse. Any ownership or permission drift in tmp/pids or shared paths can make Sidekiq exit during boot when it runs with group www-data, even though a manual start as discourse works.
- Forced -L log/sidekiq.log writing to shared logs: the log path is a symlink into /shared/log/rails/sidekiq.log. If that file or directory gets recreated with different ownership or permissions, Sidekiq can exit immediately, before producing any useful logs.
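To check for the kind of ownership drift described in these two points, something like the following can help. It is only a sketch: report_owner_drift is a helper name I made up (not part of Discourse), and it relies on GNU stat as shipped in Debian-based images.

```shell
#!/bin/bash
# Sketch: list paths whose owner:group differs from what the service user expects.
# report_owner_drift is a hypothetical helper, not part of Discourse.
report_owner_drift() {
  local expected="$1"; shift
  local path actual drift=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      drift=1
      continue
    fi
    # -L follows symlinks, so log/sidekiq.log is checked at its /shared target.
    actual="$(stat -L -c '%U:%G' "$path")"
    if [ "$actual" != "$expected" ]; then
      echo "drift: $path is $actual, expected $expected"
      drift=1
    fi
  done
  # Exit status 0 means every path exists and matches the expected owner:group.
  return "$drift"
}

# Example call, using the paths mentioned in this report:
# report_owner_drift discourse:discourse \
#   /var/www/discourse/tmp/pids /shared/log/rails /shared/log/rails/sidekiq.log
```

Any "drift:" or "missing:" line it prints is a candidate explanation for Sidekiq exiting under runit while a manual start still works.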
Related trigger: logrotate failing daily
Separately, logrotate was failing every day with:
error: skipping "...log" because parent directory has insecure permissions
Set "su" directive in config file ...
Cause was standard Debian/Ubuntu perms:
- /var/log is root:adm with 0775 (group writable).
- logrotate refuses rotation unless a global su directive is set. This is expected upstream behavior.
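For anyone who wants to see whether a given log directory would trip this check, here is a rough approximation. It assumes the rule is "world-writable, or writable by a group other than root", which is my reading of the error message and may not match logrotate's exact logic. parent_is_insecure is a made-up name, and GNU stat is assumed.

```shell
#!/bin/bash
# Hypothetical helper: approximates logrotate's "insecure permissions" test
# for a parent directory. Returns 0 if the directory looks insecure.
parent_is_insecure() {
  local dir="$1" mode group
  mode="$(stat -c '%a' "$dir")"   # octal mode, e.g. 775 or 1777
  group="$(stat -c '%G' "$dir")"  # owning group name
  # World-writable: the write bit for "others" (octal 002) is set.
  if (( (8#$mode & 8#002) != 0 )); then
    return 0
  fi
  # Group-writable (octal 020) by a group other than root.
  if (( (8#$mode & 8#020) != 0 )) && [ "$group" != "root" ]; then
    return 0
  fi
  return 1
}

# Example:
# parent_is_insecure /var/log && echo "/var/log would need a su directive"
```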
When the daily logrotate job failed, it also recreated files under /shared/log/rails/ (including sidekiq.log), which likely interacted with the forced -L logging and contributed to the Sidekiq “1s crash” loop.
Fix (no rebuild needed)
- Fix logrotate so it stops touching shared logs in a failed state. Add a global su directive:
# /etc/logrotate.conf (top)
su root adm
After that, logrotate -v exits 0 and no longer reports insecure parent perms.
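If you want to apply that change from a script rather than by hand, a minimal sketch could look like this. ensure_su_directive is a hypothetical helper name. It treats any existing su line as "already configured", including one inside a per-log block, which is a simplification.

```shell
#!/bin/bash
# Hypothetical helper: prepend "su root adm" to a logrotate config
# unless the file already contains some "su <user> <group>" line.
ensure_su_directive() {
  local conf="$1" tmp
  # Simplification: this also matches a su line inside a per-log block.
  if grep -Eq '^[[:space:]]*su[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+' "$conf"; then
    return 0
  fi
  tmp="$(mktemp)"
  { echo 'su root adm'; cat "$conf"; } > "$tmp"
  # Overwrite in place so the original file keeps its owner and mode.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# Example (run as root inside the container):
# ensure_su_directive /etc/logrotate.conf
# logrotate -d /etc/logrotate.conf   # -d is logrotate's debug/dry-run mode
```

Running it a second time is a no-op, so it should be safe to leave in a provisioning step.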
- Replace the Sidekiq runit script with a more robust default. Switching to discourse:discourse and the standard sidekiq.yml, and not forcing -L log/sidekiq.log, makes Sidekiq stable:
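#!/bin/bash
# Merge stderr into stdout so both streams end up in the same place
exec 2>&1
cd /var/www/discourse
# Make sure the pid directory exists and belongs to the service user
mkdir -p tmp/pids
chown discourse:discourse tmp/pids || true
# Drop privileges, clear stale pid files, then replace the shell with Sidekiq
exec chpst -u discourse:discourse \
  bash -lc 'cd /var/www/discourse && rm -f tmp/pids/sidekiq*.pid; exec bundle exec sidekiq -C config/sidekiq.yml'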
After this:
- sv status sidekiq consistently reports run: instead of down:.
- AI/background jobs resume.
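To tell "up for good" from "just restarted again", a small check on the sv status line can be scripted. This is a sketch that assumes the usual runit output shape "run: sidekiq: (pid N) Ns"; status_is_stable is a made-up helper name and the 60 second default is arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if an "sv status" line shows the
# service running for at least a minimum number of seconds.
status_is_stable() {
  local line="$1" min="${2:-60}"
  # Assumed output shape: "run: sidekiq: (pid 123) 75s"
  local re='^run: [^:]+: \(pid [0-9]+\) ([0-9]+)s'
  if [[ "$line" =~ $re ]]; then
    # The captured group is the uptime in seconds.
    [ "${BASH_REMATCH[1]}" -ge "$min" ]
    return
  fi
  # Anything else (for example "down: sidekiq: 1s, ...") counts as not stable.
  return 1
}

# Example:
# status_is_stable "$(sv status sidekiq)" 60 && echo "sidekiq looks stable"
```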
Request / suggestion
Could we consider making the official Docker/runit Sidekiq service more robust by default?
For example:
- Run Sidekiq under discourse:discourse (matching typical ownership inside container).
- Prefer bundle exec sidekiq -C config/sidekiq.yml.
- Avoid forcing a shared log file via -L log/sidekiq.log, or make it resilient to logrotate/shared-volume perms drift.
Even a doc note (“if Sidekiq shows down: 1s but manual start works, check /etc/service/sidekiq/run and avoid forced shared logging”) would help self-hosters a lot.
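As a starting point for such a note, the check could be as simple as grepping the run script for the two patterns from the original script quoted earlier in this post. run_script_findings is a hypothetical helper name, and the strings it looks for are only the ones I saw in my own container.

```shell
#!/bin/bash
# Hypothetical helper: flag the two fragile settings discussed in this report.
run_script_findings() {
  local script="$1"
  # -F matches the strings literally; "--" stops grep from parsing "-L" as an option.
  grep -qF -- '-L log/sidekiq.log' "$script" && echo "forces -L log/sidekiq.log"
  grep -qF -- 'chpst -u discourse:www-data' "$script" && echo "runs as discourse:www-data"
  return 0
}

# Example:
# run_script_findings /etc/service/sidekiq/run
```

No output means neither pattern was found, so the cause is probably elsewhere.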
Happy to provide more logs if needed. Thanks!