I’m trying to diagnose a major issue with AUFS affecting my forums, and I need to understand one detail of Docker startup to do this properly.
Question
Does stopping or starting up Discourse modify /var/www/discourse/config/discourse.conf, or is it changed only when rebuilding a container? I think I’ve confirmed that rebuilding a container will change it, through code in /var/discourse/templates/web.template.yml (from @sam’s discourse_docker repository).
I think I just need the answer to that question; if you disagree and want to double-check, there’s more info below.
TL;DR. Background
Basically, changes to smtp_address in /var/www/discourse/config/discourse.conf are not autoreloaded. docker stop/restart <Discourse_Container> hangs reproducibly, with processes inside the container using 100% CPU in kernel space and producing kernel stack traces that suggest an “infinite loop”. After rebooting the host, the changes to /var/www/discourse/config/discourse.conf are gone, even though I never rebuilt the container. So either Discourse is overwriting the file, or AUFS is losing the changes.
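One way to tell those two explanations apart would be to record a checksum of the file before and after each step (stop, start, host reboot) and see which step changes it. The sketch below only illustrates that idea: file_sha256 and describe_change are hypothetical helper names, and how you reach the file from the host (for example by copying it out of the container first) is left open.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def describe_change(before, after):
    """Compare two digests taken around a single step (stop, start, reboot)."""
    return "unchanged" if before == after else "changed"
```

Taking one digest right after editing smtp_address and another after each later step would show whether the file changes on stop/start or only across the reboot.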
Why??
My mail server sysadmin asked me to urgently fix my Discourse configuration, and rebuilding the container right away would have been too much downtime (though of course I planned to do that eventually).
Evidence for kernel bug
Since I suspect a kernel bug, and you shouldn’t just take my word for it, here’s the smoking gun: a process stuck in state D (uninterruptible sleep) for more than two minutes.
Jan 12 16:16:07 kamino kernel: [1254339.769725] INFO: task runsv:26554 blocked for more than 120 seconds.
Jan 12 16:16:07 kamino kernel: [1254339.769927] Tainted: G OX 3.13.0-74-generic #118-Ubuntu
Jan 12 16:16:07 kamino kernel: [1254339.770135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 12 16:16:07 kamino kernel: [1254339.770383] runsv D ffff88080f033180 0 26554 26526 0x00000004
Jan 12 16:16:07 kamino kernel: [1254339.770386] ffff8807744dba90 0000000000000082 ffff8806653f8000 ffff8807744dbfd8
Jan 12 16:16:07 kamino kernel: [1254339.770389] 0000000000013180 0000000000013180 ffff8806653f8000 ffff8808003db408
Jan 12 16:16:07 kamino kernel: [1254339.770392] ffff8808003db40c ffff8806653f8000 00000000ffffffff ffff8808003db410
Jan 12 16:16:07 kamino kernel: [1254339.770394] Call Trace:
Jan 12 16:16:07 kamino kernel: [1254339.770401] [<ffffffff81729499>] schedule_preempt_disabled+0x29/0x70
Jan 12 16:16:07 kamino kernel: [1254339.770404] [<ffffffff8172b305>] __mutex_lock_slowpath+0x135/0x1b0
Jan 12 16:16:07 kamino kernel: [1254339.770406] [<ffffffff8172b39f>] mutex_lock+0x1f/0x2f
Jan 12 16:16:07 kamino kernel: [1254339.770419] [<ffffffffa02ab276>] au_new_inode+0xa6/0x700 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770423] [<ffffffff811ca589>] ? vfs_create+0x109/0x130
Jan 12 16:16:07 kamino kernel: [1254339.770429] [<ffffffffa02ad8b8>] epilog+0x78/0x160 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770435] [<ffffffffa02ae1a1>] add_simple+0x1e1/0x2e0 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770440] [<ffffffffa02ac020>] ? aufs_permission+0x190/0x310 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770445] [<ffffffffa02ae334>] aufs_create+0x34/0x40 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770447] [<ffffffff811ca54d>] vfs_create+0xcd/0x130
Jan 12 16:16:07 kamino kernel: [1254339.770450] [<ffffffff811cb71e>] do_last+0x100e/0x1200
Jan 12 16:16:07 kamino kernel: [1254339.770453] [<ffffffff8131666b>] ? apparmor_file_alloc_security+0x5b/0x180
Jan 12 16:16:07 kamino kernel: [1254339.770457] [<ffffffff812d8c86>] ? security_file_alloc+0x16/0x20
Jan 12 16:16:07 kamino kernel: [1254339.770459] [<ffffffff811cde8b>] path_openat+0xbb/0x640
Jan 12 16:16:07 kamino kernel: [1254339.770462] [<ffffffff811cec9b>] ? SYSC_renameat+0xeb/0x420
Jan 12 16:16:07 kamino kernel: [1254339.770465] [<ffffffff811cf27a>] do_filp_open+0x3a/0x90
Jan 12 16:16:07 kamino kernel: [1254339.770467] [<ffffffff811dc0d7>] ? __alloc_fd+0xa7/0x130
Jan 12 16:16:07 kamino kernel: [1254339.770470] [<ffffffff811bd839>] do_sys_open+0x129/0x280
Jan 12 16:16:07 kamino kernel: [1254339.770472] [<ffffffff811bd9ae>] SyS_open+0x1e/0x20
Jan 12 16:16:07 kamino kernel: [1254339.770475] [<ffffffff8173575d>] system_call_fastpath+0x1a/0x1f
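For anyone who wants to check a longer log for the same pattern, here is a rough sketch that pulls the blocked task out of the "blocked for more than" line and lists the stack frames tagged with a given module such as aufs. The function names parse_hung_task and module_frames are made up for this post, and the regular expressions are only fitted to the log format quoted above. Frames the kernel prefixes with "?" are skipped, since that marker flags addresses that are not a verified part of the call chain.

```python
import re

# Matches e.g. "INFO: task runsv:26554 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Matches e.g. "[<ffffffffa02ab276>] au_new_inode+0xa6/0x700 [aufs]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<uncertain>\? )?(?P<symbol>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def parse_hung_task(line):
    """Return (command, pid, seconds) for a hung-task line, else None."""
    m = HUNG_TASK.search(line)
    if m is None:
        return None
    return (m.group("comm"), int(m.group("pid")), int(m.group("secs")))


def module_frames(lines, module):
    """Return symbols of frames in `module` that are not marked with '?'."""
    symbols = []
    for line in lines:
        m = FRAME.search(line)
        if m and m.group("module") == module and not m.group("uncertain"):
            symbols.append(m.group("symbol"))
    return symbols
```

Run over the trace above, module_frames(lines, "aufs") would pick out the aufs frames between vfs_create and mutex_lock, which is the part of the stack that points at AUFS.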
The blocked runsv process (PID 26554) shouldn’t be using 100% CPU, but three others (presumably from the same container) are:
26555 root 20 0 168 4 R 99.9 0.0 4:00.38 runsv rsyslog
26553 root 20 0 168 4 D 98.6 0.0 3:55.57 runsv postgres
26556 root 20 0 168 4 R 97.2 0.0 3:52.82 runsv cron
This info isn’t conclusive, of course (as I said, I’m investigating), but it is rather suggestive evidence.
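If it helps with collecting more samples, the three process rows above can be turned into structured records with a few lines of Python. This is only a sketch: TopRow and parse_top_row are hypothetical names, and the column order (PID, USER, PR, NI, VIRT, RES, S, %CPU, %MEM, TIME+, COMMAND) is my assumption based on how the rows look, not something the output itself states.

```python
from typing import NamedTuple


class TopRow(NamedTuple):
    pid: int
    state: str
    cpu: float
    command: str


def parse_top_row(line):
    """Parse one process row, assuming the column order described above."""
    # Split into at most 11 fields so the command keeps its internal spaces.
    fields = line.split(None, 10)
    return TopRow(
        pid=int(fields[0]),
        state=fields[6],
        cpu=float(fields[7]),
        command=fields[10].strip(),
    )
```

Filtering the parsed rows on state "D" or on a high cpu value would make it easy to compare several snapshots taken while the container is hanging.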