Olá,
Esta manhã, alguns usuários me avisaram que aparentemente nossa instalação do Discourse estava fora do ar, mas quando verifiquei, ela já estava no ar novamente. Houve vários comportamentos “offline” que pude ver em minha pilha de monitoramento, mas quando fui verificar os logs via launcher logs app, parece que eles são exibidos sem data/hora (
) e apenas os mais recentes, presumo, sem distinção real da origem dos logs, pois existem vários componentes que formam uma instalação do Discourse (nginx, redis, psql, etc.).
Existe alguma maneira de ter logs mais precisos ou, melhor ainda, expô-los de alguma forma para que eu possa capturá-los com loki / promtail?
Todos os gráficos são de “últimos 5 dias”, pois adicionei este monitoramento apenas naquela época. Isso é o que posso ver em meu monitoramento:
Uptime do Container:
A partir dos plugins do discourse-prometheus, muitos “buracos”, que suponho serem de quando o Discourse não está respondendo, pois correspondem a quando fui notificado pelos usuários esta manhã:
Zoom nos últimos 6 horas para esclarecer que esta não é uma janela “pequena” e definitivamente não é aceitável pelos usuários, é claro:
Logs obtidos com ./launcher logs app
root@vmi1229594:/var/discourse# ./launcher logs app
x86_64 arch detected.
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
Started runsvdir, PID is 41
ok: run: redis: (pid 53) 0s
ok: run: postgres: (pid 55) 0s
supervisor pid: 51 unicorn pid: 87
Shutting Down
run-parts: executing /etc/runit/3.d/01-nginx
ok: down: nginx: 0s, normally up
run-parts: executing /etc/runit/3.d/02-unicorn
(51) exiting
ok: down: unicorn: 1s, normally up
run-parts: executing /etc/runit/3.d/10-redis
ok: down: redis: 0s, normally up
run-parts: executing /etc/runit/3.d/99-postgres
ok: down: postgres: 0s, normally up
ok: down: nginx: 5s, normally up
ok: down: postgres: 0s, normally up
ok: down: redis: 3s, normally up
ok: down: unicorn: 5s, normally up
ok: down: cron: 1s, normally up
ok: down: rsyslog: 1s, normally up
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
Started runsvdir, PID is 35
ok: run: redis: (pid 49) 0s
ok: run: postgres: (pid 48) 0s
supervisor pid: 43 unicorn pid: 81
(43) Reopening logs
(43) Reopening logs
Shutting Down
run-parts: executing /etc/runit/3.d/01-nginx
ok: down: nginx: 0s, normally up
run-parts: executing /etc/runit/3.d/02-unicorn
(43) exiting
ok: down: unicorn: 0s, normally up
run-parts: executing /etc/runit/3.d/10-redis
ok: down: redis: 0s, normally up
run-parts: executing /etc/runit/3.d/99-postgres
timeout: run: postgres: (pid 48) 34983s, want down, got TERM
run-parts: /etc/runit/3.d/99-postgres exited with return code 1
ok: down: nginx: 10s, normally up
ok: down: redis: 8s, normally up
ok: down: unicorn: 10s, normally up
ok: down: cron: 0s, normally up
ok: down: rsyslog: 0s, normally up
kill: run: postgres: (pid 48) 34991s, want down, got TERM
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
Started runsvdir, PID is 35
fail: redis: runsv not running
ok: run: redis: (pid 53) 1s
ok: run: postgres: (pid 48) 1s
supervisor pid: 79 unicorn pid: 83
(79) Reopening logs
Shutting Down
run-parts: executing /etc/runit/3.d/01-nginx
ok: down: nginx: 0s, normally up
run-parts: executing /etc/runit/3.d/02-unicorn
(79) exiting
ok: down: unicorn: 0s, normally up
run-parts: executing /etc/runit/3.d/10-redis
ok: down: redis: 0s, normally up
run-parts: executing /etc/runit/3.d/99-postgres
ok: down: postgres: 0s, normally up
ok: down: nginx: 5s, normally up
ok: down: postgres: 0s, normally up
ok: down: redis: 3s, normally up
ok: down: unicorn: 5s, normally up
ok: down: cron: 1s, normally up
ok: down: rsyslog: 1s, normally up
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
run-parts: executing /etc/runit/1.d/cleanup-pids
Cleaning stale PID files
run-parts: executing /etc/runit/1.d/copy-env
Started runsvdir, PID is 34
ok: run: redis: (pid 49) 0s
ok: run: postgres: (pid 44) 0s
supervisor pid: 41 unicorn pid: 80
(41) Reopening logs
(41) Reopening logs
(41) Reopening logs
(41) Reopening logs
(41) Reopening logs








