From DISCOURSE_SMTP_ADDRESS in discourse_docker to /var/www/discourse/config/discourse.conf

I’m trying to diagnose a major AUFS issue affecting my forums, and to do that properly I need to understand one detail of Discourse’s Docker startup.

Question

Does stopping or starting Discourse modify /var/www/discourse/config/discourse.conf, or is the file changed only when a container is rebuilt? I think I’ve confirmed that rebuilding a container changes it, via code in /var/discourse/templates/web.template.yml (from @sam’s discourse_docker repository).

I think the answer to that question is all I need; if you disagree or want to double-check, there’s more background below.

TL;DR. Background

Basically: changes to smtp_address in /var/www/discourse/config/discourse.conf are not autoreloaded; docker stop/restart <Discourse_Container> hangs repeatably, with processes inside the container using 100% CPU in kernel space and the kernel logging stack traces that suggest an infinite loop; and after rebooting the host, my changes to /var/www/discourse/config/discourse.conf are gone, even though I never rebuilt the container. So either Discourse is overwriting the file, or AUFS is losing the changes.
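One way to tell those two explanations apart is to fingerprint discourse.conf before stopping the container and again after it starts. Below is a minimal sketch, assuming a Python interpreter is available wherever it runs (inside the container via docker exec, or on the host after copying the file out with docker cp); the helper names are hypothetical. Roughly: a fresh mtime after the restart suggests something rewrote the file during boot, while the pre-edit content returning with its old mtime points more towards the edit having been lost from the container's writable layer.

```python
import hashlib
import os
from dataclasses import dataclass

# Path inside the container discussed in this thread.
CONF_PATH = "/var/www/discourse/config/discourse.conf"


@dataclass(frozen=True)
class Fingerprint:
    """Content hash plus modification time of a file at one moment."""
    sha256: str
    mtime_ns: int


def fingerprint(path: str) -> Fingerprint:
    """Hash the file's bytes in chunks and record its mtime in nanoseconds."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return Fingerprint(digest.hexdigest(), os.stat(path).st_mtime_ns)


def describe_change(before: Fingerprint, after: Fingerprint) -> str:
    """Classify what happened to the file between two snapshots."""
    if before == after:
        return "unchanged"
    if before.sha256 == after.sha256:
        # Same bytes but a new mtime: something wrote the file again.
        return "rewritten with identical content"
    return "content changed"
```

Usage: take `before = fingerprint(CONF_PATH)` just before `docker stop`, `after = fingerprint(CONF_PATH)` once the container is back up, then compare with `describe_change(before, after)` and look at `after.mtime_ns`.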

Why not just rebuild?

My mail server sysadmin asked me to urgently fix my Discourse configuration, and rebuilding the container right away would have been too much downtime (though of course I planned to do that eventually).

Evidence for kernel bug

Since I suspect a kernel bug, and you shouldn’t just take my word for it, here’s the smoking gun: a process stuck in state D (uninterruptible sleep) for more than two minutes.

Jan 12 16:16:07 kamino kernel: [1254339.769725] INFO: task runsv:26554 blocked for more than 120 seconds.
Jan 12 16:16:07 kamino kernel: [1254339.769927]       Tainted: G           OX 3.13.0-74-generic #118-Ubuntu
Jan 12 16:16:07 kamino kernel: [1254339.770135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 12 16:16:07 kamino kernel: [1254339.770383] runsv           D ffff88080f033180     0 26554  26526 0x00000004
Jan 12 16:16:07 kamino kernel: [1254339.770386]  ffff8807744dba90 0000000000000082 ffff8806653f8000 ffff8807744dbfd8
Jan 12 16:16:07 kamino kernel: [1254339.770389]  0000000000013180 0000000000013180 ffff8806653f8000 ffff8808003db408
Jan 12 16:16:07 kamino kernel: [1254339.770392]  ffff8808003db40c ffff8806653f8000 00000000ffffffff ffff8808003db410
Jan 12 16:16:07 kamino kernel: [1254339.770394] Call Trace:
Jan 12 16:16:07 kamino kernel: [1254339.770401]  [<ffffffff81729499>] schedule_preempt_disabled+0x29/0x70
Jan 12 16:16:07 kamino kernel: [1254339.770404]  [<ffffffff8172b305>] __mutex_lock_slowpath+0x135/0x1b0
Jan 12 16:16:07 kamino kernel: [1254339.770406]  [<ffffffff8172b39f>] mutex_lock+0x1f/0x2f
Jan 12 16:16:07 kamino kernel: [1254339.770419]  [<ffffffffa02ab276>] au_new_inode+0xa6/0x700 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770423]  [<ffffffff811ca589>] ? vfs_create+0x109/0x130
Jan 12 16:16:07 kamino kernel: [1254339.770429]  [<ffffffffa02ad8b8>] epilog+0x78/0x160 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770435]  [<ffffffffa02ae1a1>] add_simple+0x1e1/0x2e0 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770440]  [<ffffffffa02ac020>] ? aufs_permission+0x190/0x310 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770445]  [<ffffffffa02ae334>] aufs_create+0x34/0x40 [aufs]
Jan 12 16:16:07 kamino kernel: [1254339.770447]  [<ffffffff811ca54d>] vfs_create+0xcd/0x130
Jan 12 16:16:07 kamino kernel: [1254339.770450]  [<ffffffff811cb71e>] do_last+0x100e/0x1200
Jan 12 16:16:07 kamino kernel: [1254339.770453]  [<ffffffff8131666b>] ? apparmor_file_alloc_security+0x5b/0x180
Jan 12 16:16:07 kamino kernel: [1254339.770457]  [<ffffffff812d8c86>] ? security_file_alloc+0x16/0x20
Jan 12 16:16:07 kamino kernel: [1254339.770459]  [<ffffffff811cde8b>] path_openat+0xbb/0x640
Jan 12 16:16:07 kamino kernel: [1254339.770462]  [<ffffffff811cec9b>] ? SYSC_renameat+0xeb/0x420
Jan 12 16:16:07 kamino kernel: [1254339.770465]  [<ffffffff811cf27a>] do_filp_open+0x3a/0x90
Jan 12 16:16:07 kamino kernel: [1254339.770467]  [<ffffffff811dc0d7>] ? __alloc_fd+0xa7/0x130
Jan 12 16:16:07 kamino kernel: [1254339.770470]  [<ffffffff811bd839>] do_sys_open+0x129/0x280
Jan 12 16:16:07 kamino kernel: [1254339.770472]  [<ffffffff811bd9ae>] SyS_open+0x1e/0x20
Jan 12 16:16:07 kamino kernel: [1254339.770475]  [<ffffffff8173575d>] system_call_fastpath+0x1a/0x1f
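When collecting this kind of evidence from a longer kern.log or dmesg capture, a small parser can pull out the blocked task and the AUFS frames in its call trace instead of scanning by eye. This is a sketch: the regular expressions are fitted to the message format in the log above (Ubuntu's 3.13 kernel) and may need adjusting for other kernels, and the helper names are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Fitted to the hung-task warning above, e.g.
# "INFO: task runsv:26554 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Call-trace frames from the aufs module, with or without the "?" marker
# the kernel prints for entries it considers unreliable.
AUFS_FRAME_RE = re.compile(r"\]\s+(?:\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[aufs\]")


class HungTask(NamedTuple):
    comm: str
    pid: int
    blocked_secs: int


def parse_hung_task(line: str) -> Optional[HungTask]:
    """Return the blocked task named in a hung-task warning, or None for other lines."""
    m = HUNG_TASK_RE.search(line)
    if m is None:
        return None
    return HungTask(m["comm"], int(m["pid"]), int(m["secs"]))


def aufs_frames(lines: list[str]) -> list[str]:
    """Return the function names of all [aufs] frames, in trace order."""
    return [m.group(1) for m in map(AUFS_FRAME_RE.search, lines) if m]
```

Run over the trace above, `aufs_frames` would return `au_new_inode`, `epilog`, `add_simple`, `aufs_permission` and `aufs_create`, i.e. the open() path is blocked on a mutex inside AUFS's inode creation.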

The blocked runsv process (PID 26554) shouldn’t be using 100% CPU, but three others (presumably from the same container) are:

26555 root      20   0     168      4        R  99.9  0.0   4:00.38 runsv rsyslog
26553 root      20   0     168      4        D  98.6  0.0   3:55.57 runsv postgres
26556 root      20   0     168      4        R  97.2  0.0   3:52.82 runsv cron

This isn’t conclusive, of course (as I said, I’m still investigating), but it is rather suggestive evidence.
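To list which processes are stuck in state D (as runsv postgres is in the top output above) without relying on top, a Linux-only sketch can read the state field straight from /proc/<pid>/stat, whose format is described in proc(5). Because Docker containers share the host kernel, processes from the Discourse container show up in the host's /proc too. The function names are hypothetical.

```python
import os
from typing import Iterator, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def processes_in_state(wanted: str = "D", proc: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for every process currently in the given state."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state == wanted:
            yield pid, comm
```

Usage: `list(processes_in_state("D"))` on the host; note that comm is the short process name (`runsv`), not the full command line top shows (`runsv postgres`).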

See:

https://meta.discourse.org/t/runsv-hanging-on-docker-container-shutdown/36844/27?u=sam

Please try that out ASAP and let me know if it works; I want to roll it into the image.

Yes, we do autogenerate that file on boot; see:

https://github.com/discourse/discourse_docker/blob/master/templates/web.template.yml#L26-L35
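Since the file is regenerated on every boot, a hand edit to discourse.conf is simply replaced the next time the container starts; the value has to change where it comes from (such as the DISCOURSE_SMTP_ADDRESS variable in the thread title). The Python below only illustrates that idea and is not the template code linked above, which remains the authoritative version. The `key = value` line format and the mapping rule (strip the DISCOURSE_ prefix, lowercase the rest, so DISCOURSE_SMTP_ADDRESS becomes smtp_address) are assumptions for this sketch, and `render_discourse_conf` is a hypothetical name.

```python
from typing import Mapping

PREFIX = "DISCOURSE_"


def render_discourse_conf(env: Mapping[str, str]) -> str:
    """Illustrative only: turn DISCOURSE_* variables into 'key = value' lines.

    Assumed mapping: DISCOURSE_SMTP_ADDRESS -> smtp_address.
    Variables without the prefix are ignored.
    """
    lines = []
    for name in sorted(env):
        if name.startswith(PREFIX) and len(name) > len(PREFIX):
            key = name[len(PREFIX):].lower()
            lines.append(f"{key} = {env[name]}")
    return "\n".join(lines) + "\n" if lines else ""
```

Writing `render_discourse_conf(os.environ)` over the config file at each boot would discard any manual change, which matches the behaviour described in the question.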
