Can't bootstrap discourse_docker on DigitalOcean/Arch Linux


(Martin Törnqvist) #1

I am trying to set up discourse_docker so I can migrate my old, non-docker installation. The server is on DigitalOcean and runs Arch Linux x64, kernel 3.8.4-1. I’m following this guide and can’t get past the step Bootstrap Discourse.

I have installed docker and lxc (standard packages). Not sure about aufs support since I can’t update the kernel normally on a droplet, and DigitalOcean’s instructions are only for apt-based systems…

As the discourse user, I clone discourse_docker to /var/discourse, edit containers/app.yml, then run ./launcher bootstrap app, where rake fails:

...
I, [2014-09-27T15:44:09.852087 #39]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'
rake aborted!
NameError: uninitialized constant Onebox
/var/www/discourse/plugins/lazyYT/plugin.rb:14:in `activate!'
...

If I try this as root, it gets a bit further, then fails with a different error:

...
I, [2014-09-27T16:34:02.876802 #38]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake assets:precompile'
...
I, [2014-09-27T16:36:44.987772 #5653]  INFO -- : Writing /var/www/discourse/public/assets/preload_store-48dc8094f8aedcfa1c243aae5eff1400.js
I, [2014-09-27T16:37:42.763873 #5653]  INFO -- : Writing /var/www/discourse/public/assets/vendor-b4a7d12c6e9c7418690f53abd46456a1.js
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/sass-3.2.16/lib/sass/script/number.rb:53: [BUG] Segmentation fault
ruby 2.0.0p576 (2014-09-19 revision 47628) [x86_64-linux]
...
-- C level backtrace information -------------------------------------------
   Segmentation fault
I, [2014-09-27T16:37:45.658779 #38]  INFO -- : 2014-09-27T16:34:07Z 5653 TID-osqtuq1zg INFO: Sidekiq client with redis options {:url=>"redis://localhost:6379/0", :namespace=>"sidekiq"}
Purging temp files
...

There’s 8 GB of free space and a few hundred MB of free memory. I have tried rebooting the droplet a few times.
What’s going on here? I don’t know if it’s obvious but I don’t know Ruby, Rails or these ecosystems…


(Jeff Atwood) #2

Probably some incompatibility between Arch and Docker. Look up guides on getting Docker installed on Arch.


(Martin Törnqvist) #3

I did check both the ArchWiki and Docker’s documentation for Arch Linux. There’s nothing special there beyond what I already did, so it seems like it’s expected to work.

I tried again and got new errors, this time from redis, which crashed for unknown reasons when trying to write. I found some related information here but still haven’t got it to work. Redis prints a stack trace several times, and when I try ./launcher start app it says the database is corrupted. Giving up on docker for now!


(Sam Saffron) #4

What does docker info return?


(Martin Törnqvist) #5

Before running launcher bootstrap app, having reset docker by stopping the daemon and deleting /var/lib/docker/, then starting it again:

Containers: 0
Images: 2
Storage Driver: devicemapper
 Pool Name: docker-254:0-439805-pool
 Pool Blocksize: 64 Kb
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 1267.4 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.2 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.8.4-1-ARCH
Operating System: Arch Linux
WARNING: No swap limit support

After the latter of the errors mentioned above - sass-3.2.16/lib/sass/script/number.rb:53: [BUG] Segmentation fault:

Containers: 1
Images: 6
Storage Driver: devicemapper
 Pool Name: docker-254:0-439805-pool
 Pool Blocksize: 64 Kb
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 451.5 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.6 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.8.4-1-ARCH
Operating System: Arch Linux
WARNING: No swap limit support

(Sam Saffron) #6

Device mapper is dodgy; we warn against it.

Try btrfs (see my notes here) if you cannot get aufs going.
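
For anyone who wants to script this check, here is a rough sketch (not part of discourse_docker) that reads the Storage Driver line from docker info output like the one posted above and flags devicemapper. The function names and the shortened sample text are illustrative assumptions; the only rule encoded is the advice in this thread that devicemapper is warned against.

```python
def storage_driver(docker_info_text):
    """Return the value of the 'Storage Driver:' line, or None if absent."""
    for line in docker_info_text.splitlines():
        # Split on the first colon only; values such as pool names contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Storage Driver":
            return value.strip()
    return None


def is_discouraged(driver):
    # Assumption taken from this thread: devicemapper is warned against,
    # aufs or btrfs are the suggested alternatives.
    return driver == "devicemapper"


# Shortened copy of the `docker info` output posted earlier in the thread.
sample = """Containers: 0
Images: 2
Storage Driver: devicemapper
 Pool Name: docker-254:0-439805-pool
Execution Driver: native-0.2
Kernel Version: 3.8.4-1-ARCH
"""

driver = storage_driver(sample)
print(driver, is_discouraged(driver))
```

In practice the text would come from running docker info and capturing its output, rather than from a pasted sample.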


(Martin Törnqvist) #7

I guess you mean this? Those are very useful notes, thanks! Wish I’d seen this earlier.

With the btrfs volume it has proceeded up to this point, where it has been stuck for 10 minutes or so:

I, [2014-09-28T11:14:06.901981 #39]  INFO -- : Authorized SSH keys for this container:
<user>@<host>

[154] 28 Sep 11:16:28.079 * 10 changes in 300 seconds. Saving...
[154] 28 Sep 11:16:28.080 * Background saving started by pid 5690
[5690] 28 Sep 11:16:28.141 * DB saved on disk
[5690] 28 Sep 11:16:28.142 * RDB: 1 MB of memory used by copy-on-write
[154] 28 Sep 11:16:28.180 * Background saving terminated with success

Another shell shows that there’s both free memory and free space on the virtual btrfs volume, so I’m not sure what this could mean. Should I interrupt it? Will give it some time first.


(Kane York) #8

The “Authorized SSH keys” line is supposed to be the last step in the bootstrapping process, so it seems like it completed, but fails to cleanly exit?


(Jeff Atwood) #9

We have seen this a few times now on DigitalOcean: the process just hangs there. The only solution is to reboot and run launcher rebuild.


(axil) #10

I have faced this issue twice now. See this post. First it was Ubuntu 14.04, now I tried with Arch because it has the latest version of docker packaged.

If you want to change to aufs you have to try a different kernel. Fortunately, there are 2 kernels in the AUR that support it. I went with linux-pf, as the folks who maintain it also have an unofficial repository with precompiled binaries.

Containers: 1
Images: 6
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 8
Execution Driver: native-0.2
Kernel Version: 3.16.2-pf
Operating System: Arch Linux
WARNING: No swap limit support

I just did that, and it got stuck here (this is not on DigitalOcean):

I, [2014-10-07T06:50:51.031409 #39]  INFO -- : > su postgres -c 'psql template1 -c "create extension hstore;"'
2014-10-07 06:50:51 UTC ERROR:  extension "hstore" already exists
2014-10-07 06:50:51 UTC STATEMENT:  create extension hstore;
ERROR:  extension "hstore" already exists
I, [2014-10-07T06:50:51.220642 #39]  INFO -- : 


FAILED
--------------------
RuntimeError: su postgres -c 'psql template1 -c "create extension hstore;"' failed with return #<Process::Status: pid 112 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:85:in `spawn'
exec failed with the params "su postgres -c 'psql template1 -c \"create extension hstore;\"'"
b0f87cf4deca56ee0d2928e0ff490fa0809a60d51b744386d2d46e8d068847a1
FAILED TO BOOTSTRAP

(Sam Saffron) #11

That command needs fixing; it should only create the extension if it is missing.

@nx2zdk can you send through a PR to fix ASAP
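
As an illustration of what "only create if missing" could look like, here is a minimal sketch that builds the same su/psql invocation seen in the log above, but with PostgreSQL's CREATE EXTENSION IF NOT EXISTS form (available since PostgreSQL 9.1) so that a second run does not fail. The helper name create_extension_command is hypothetical, and this is not necessarily how the actual fix was written.

```python
import shlex


def create_extension_command(extension, database="template1"):
    """Build an argv list for an idempotent 'create extension' call.

    Hypothetical helper for illustration; it mirrors the failing command
    from the bootstrap log but adds 'if not exists'.
    """
    # Only accept plain identifiers so the name can be placed in SQL unquoted.
    # Names such as uuid-ossp would need SQL quoting, which is out of scope here.
    if not extension.isidentifier():
        raise ValueError("unsupported extension name: %r" % extension)
    sql = "create extension if not exists %s;" % extension
    psql = "psql %s -c %s" % (shlex.quote(database), shlex.quote(sql))
    # su runs the psql command line as the postgres user.
    return ["su", "postgres", "-c", psql]


print(create_extension_command("hstore"))
```

The returned list can be handed to subprocess.run as-is. Running it twice should succeed both times, since the second call only produces a notice that the extension already exists.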


(nx2zdk) #12

Oh. Ok, I’ll fix that.


(nx2zdk) #13

Done! Please check it out.


(Sam Saffron) #14

Thanks heaps! :smiley:


(axil) #15

So, I’m still stuck at this point:

I, [2014-10-07T07:48:36.013881 #38]  INFO -- : Authorized SSH keys for this container:
root@discourse

[153] 07 Oct 07:52:15.038 * 10 changes in 300 seconds. Saving...
[153] 07 Oct 07:52:15.042 * Background saving started by pid 675
[675] 07 Oct 07:52:16.166 * DB saved on disk
[675] 07 Oct 07:52:16.168 * RDB: 1 MB of memory used by copy-on-write
[153] 07 Oct 07:52:16.246 * Background saving terminated with success

I have tried to rebuild several times. Using latest discourse_docker with pg fix. Do you have any other pointers to debug this?

While the bootstrap is stuck at that point, running docker ps and docker images gives:

[root@discourse discourse]# docker ps
CONTAINER ID        IMAGE                        COMMAND                CREATED             STATUS              PORTS               NAMES
1c36654bf92d        samsaffron/discourse:1.0.4   "/bin/bash -c 'cd /p   About an hour ago   Up About an hour                        stoic_leakey        
[root@discourse discourse]# docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
samsaffron/discourse   1.0.4               2e3b522b59c5        11 days ago         1.034 GB

docker version

Client version: 1.2.0
Client API version: 1.14
Go version (client): go1.3.1
Git commit (client): fa7b24f
OS/Arch (client): linux/amd64
Server version: 1.2.0
Server API version: 1.14
Go version (server): go1.3.1
Git commit (server): fa7b24f

Space is not an issue:

[root@discourse discourse]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        28G  4.1G   22G  16% /
dev             496M     0  496M   0% /dev
run             499M  656K  498M   1% /run
tmpfs           499M  780K  498M   1% /dev/shm
tmpfs           499M     0  499M   0% /sys/fs/cgroup
tmpfs           499M     0  499M   0% /tmp
tmpfs           100M     0  100M   0% /run/user/1000
none             28G  4.1G   22G  16% /var/lib/docker/aufs/mnt/1c36654bf92d2016b4c9e31902613fbbb727eac6360d5eed2ab3bd168fec8834

Also, if I run ./launcher stop app in another terminal session, it says No cid found, even though cids/app_boostrap.cid exists and contains the same container ID that docker ps shows.

What more info do you need to get to the bottom of this? Thanks.


(axil) #16

Ok, after about 2h waiting I got this on my screen:

2014-10-07 09:28:49 UTC WARNING:  pgstat wait timeout
2014-10-07 09:28:59 UTC WARNING:  pgstat wait timeout
2014-10-07 09:42:55 UTC WARNING:  pgstat wait timeout
2014-10-07 09:43:22 UTC WARNING:  pgstat wait timeout
2014-10-07 09:43:32 UTC WARNING:  pgstat wait timeout
2014-10-07 09:43:58 UTC WARNING:  pgstat wait timeout
2014-10-07 09:48:54 UTC WARNING:  pgstat wait timeout
2014-10-07 09:49:19 UTC WARNING:  pgstat wait timeout
2014-10-07 09:49:29 UTC WARNING:  pgstat wait timeout
2014-10-07 09:49:49 UTC WARNING:  pgstat wait timeout
2014-10-07 09:50:19 UTC WARNING:  pgstat wait timeout
2014-10-07 10:39:59 UTC WARNING:  pgstat wait timeout
2014-10-07 10:40:20 UTC WARNING:  pgstat wait timeout
2014-10-07 10:40:31 UTC WARNING:  pgstat wait timeout
2014-10-07 10:40:50 UTC WARNING:  pgstat wait timeout

Could this be of help? Is something blocking postgres from finalizing whatever it’s been doing?
Related answer in SO?

RESOLVED
I gave the VM 2GB of RAM and raised the db_shared_buffers to 512MB.


(Sam Saffron) #17

Interesting. I actually suspect the issue was our termination code in pups (the orchestrator of the bootstrap).

I made this fix

https://github.com/discourse/discourse_docker/commit/c244475cb52f8125250e56917f6af1f69a5a0cd8

We now use fast termination for pg (using SIGINT), and in the worst case will just SIGKILL it.

Additionally, we were using su to launch it, which meant signals had to travel through multiple processes. I amended it to use chpst, which is much safer.

On top of this pups now logs properly during its termination chain, so we can tell what is going on.
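
The "SIGINT first, SIGKILL in the worst case" pattern described above can be sketched roughly as follows. This is a generic Python illustration, not the pups code (pups is written in Ruby); the function name stop_process and the 10 second grace period are assumptions, and it relies on POSIX signals.

```python
import signal
import subprocess


def stop_process(proc, grace_seconds=10.0):
    """Ask a child process to stop with SIGINT, then SIGKILL it if it lingers.

    Returns a short label for what happened, so the caller can log it.
    """
    if proc.poll() is not None:
        return "already exited"
    # For PostgreSQL, SIGINT requests a "fast" shutdown.
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGINT"
    except subprocess.TimeoutExpired:
        # Worst case: the process did not exit in time, so force it.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the child so it does not stay a zombie
        return "SIGKILL"
```

Signalling the process directly, rather than a wrapper that has to pass the signal on, is the same concern raised above about su. Returning a label for the signal used makes it easy to log what the termination chain actually did.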