Discourse Docker container on AWS: Disk full, can't start service or free space


(Dom) #1

Continuing the discussion from 503 Service Unavailable after error about space:

Hi

I have a Discourse Docker container running on the Amazon AWS free tier. I installed the webserver as per these instructions and everything ran fine for a few days. During this time the public weren’t using the site; it was just my testing / admin work.

I then started to get the error messages about being out of space for download to local (as in the linked post). I then got the 503 error and the site has been offline since.

It looks like the local disk is full (see investigations below); does anyone know what I need to delete / configure to free up some space and stop things filling up so fast in the future?


Investigations

Doing a df -ah gives this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.8G  7.8G     0 100% /
proc               0     0     0    - /proc
sysfs              0     0     0    - /sys
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none               0     0     0    - /sys/fs/fuse/connections
none               0     0     0    - /sys/kernel/debug
none               0     0     0    - /sys/kernel/security
udev            484M  144K  483M   1% /dev
devpts             0     0     0    - /dev/pts
tmpfs           100M  332K   99M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            497M     0  497M   0% /run/shm
none            100M     0  100M   0% /run/user
none               0     0     0    - /sys/fs/pstore
overflow        1.0M     0  1.0M   0% /tmp

Doing a du -x / | sort -n | tail -40 gives this:

344568	/var/cache
347772	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var/www/discourse/vendor/bundle/ruby/2.0.0
347772	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var/www/discourse/vendor/bundle/ruby/2.0.0
347772	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www/discourse/vendor/bundle/ruby/2.0.0
347776	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var/www/discourse/vendor/bundle/ruby
347776	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var/www/discourse/vendor/bundle/ruby
347776	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www/discourse/vendor/bundle/ruby
347780	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var/www/discourse/vendor/bundle
347780	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var/www/discourse/vendor/bundle
347780	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www/discourse/vendor/bundle
349996	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var/www/discourse/vendor
349996	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var/www/discourse/vendor
349996	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www/discourse/vendor
448404	/var/lib/docker/aufs/diff/01ec0f64026758b7eb863768ee5fa705cef4ae05a9cd558efcc9116fa1327f01/var/www/discourse
448408	/var/lib/docker/aufs/diff/01ec0f64026758b7eb863768ee5fa705cef4ae05a9cd558efcc9116fa1327f01/var/www
448736	/var/lib/docker/aufs/diff/01ec0f64026758b7eb863768ee5fa705cef4ae05a9cd558efcc9116fa1327f01/var
448808	/var/lib/docker/aufs/diff/01ec0f64026758b7eb863768ee5fa705cef4ae05a9cd558efcc9116fa1327f01
548000	/var/lib/docker/aufs/diff/56fe93484da04e5594627f82e6cddc95fca240ca7f98f719518b5d000d9fd5fb/usr
564912	/lib/modules
612216	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var/www/discourse
612220	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var/www
612412	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var/www/discourse
612416	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var/www
612460	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www/discourse
612464	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www
648048	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var
648244	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var
648292	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var
658844	/var/lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83
659044	/var/lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f
659092	/var/lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047
673428	/var/lib/docker/aufs/diff/56fe93484da04e5594627f82e6cddc95fca240ca7f98f719518b5d000d9fd5fb
683052	/lib
789800	/usr
3575556	/var/lib/docker/aufs/diff
3575680	/var/lib/docker/aufs
3778048	/var/lib/docker
3935676	/var/lib
4374800	/var
8080304	/

So it looks like the largest share of the disk (about 3.6 GB of the 7.7 GB in use) is being used up by /var/lib/docker/*
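A one-level summary can be easier to read than the full recursive listing. The following is a minimal sketch, not part of the original investigation; top_dirs is a helper name made up for this example, and it assumes a du that accepts -d (GNU coreutils does).

```shell
# Print the N largest entries one level below DIR, sizes in KiB, biggest first.
# -x stays on one filesystem, as in the du call above; -d 1 limits the depth.
# The first line of output is always the total for DIR itself.
top_dirs() {
    du -x -k -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example, from a root shell (Docker's directories are not world-readable):
#   top_dirs /var/lib/docker/aufs/diff 6
```

Each entry under aufs/diff is one image or container layer, so several near-identical sizes suggest several copies of the same Discourse image are being kept.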

In my /var directory doing a sudo du -Sx | sort -rn | head -n 10 gives:

220476	./cache/apt/archives
193392	./lib/docker/containers/6b8dd9956e8a2de814e8db7e2bd75be1b73fe5592ddd900c82c3e09360d00f9f
111444	./lib/apt/lists
106320	./lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www/discourse/.git/objects/pack
106308	./lib/docker/aufs/diff/3cfc25c9f6568b30f5839ed2e4aaa6badee68b4261b68f336eb0682fe399812f/var/www/discourse/.git/objects/pack
106144	./lib/docker/aufs/diff/d43527fd72fd35b5a34a4fff3e0cb08fba8c477e116a0de949a068dc73829e83/var/www/discourse/.git/objects/pack
92812	./lib/docker/aufs/diff/01ec0f64026758b7eb863768ee5fa705cef4ae05a9cd558efcc9116fa1327f01/var/www/discourse/.git/objects/pack
86676	./cache/apt-xapian-index/index.1
62400	./lib/docker/aufs/diff/6b8dd9956e8a2de814e8db7e2bd75be1b73fe5592ddd900c82c3e09360d00f9f/var/www/discourse/tmp/cache/assets/production/sprockets
61952	./lib/docker/aufs/diff/e0b4c310111297e6fbe396b256582d2a27be21b7d991bc67e7c8e441f0533047/var/www/discourse/tmp/cache/assets/production/sprockets

To try and fix the problem I have:

  • Rebooted, stopped/started the AWS instance to release any ephemeral storage, but this has had no effect.

  • Tried sudo ./launcher cleanup, but that gives the following:

Cannot connect to the docker daemon - verify it is running and you have access

  • Tried sudo docker images --no-trunc | grep none | awk '{print $3}' | xargs -r docker rmi, but that gives the following:

Cannot connect to the Docker daemon. Is 'docker -d' running on this host?
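Both errors say the same thing: the Docker daemon is not running, presumably because it cannot start with no free disk space, so no docker subcommand can work until some space is freed. Separately, grep none matches any row containing "none", and in the pipeline above sudo only applies to the first command. A slightly stricter filter might look like the sketch below; dangling_ids is a hypothetical helper name, not something Docker or the launcher provides.

```shell
# Read `docker images --no-trunc` output on stdin and print the IMAGE ID
# (third column) of every row whose REPOSITORY column is exactly "<none>".
# The header row is skipped automatically because its first field is "REPOSITORY".
dangling_ids() {
    awk '$1 == "<none>" { print $3 }'
}

# Example (only works once the Docker daemon is running again):
#   sudo docker images --no-trunc | dangling_ids | xargs -r sudo docker rmi
```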


#2

I’m on digitalocean, not AWS, but here’s what I did:

  1. Switched images and backups to S3
  2. cd /var/discourse/
  3. git pull
  4. ./launcher cleanup

Now I’m at:

root@discourse:/var/discourse# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 40G 5.2G 33G 14% /

This seems like the max space required for a new instance. We have around 40k posts in our forum.


(Dom) #3

Thanks for your amazingly fast reply.

I don’t think I can move to S3 (yet) as I can’t get to the Discourse admin page as the Docker container won’t start with 0% free space.

Doing sudo git pull on /var/discourse gives:

remote: Counting objects: 85, done.
remote: Total 85 (delta 44), reused 44 (delta 44), pack-reused 41
error: file write error (No space left on device)
fatal: unable to write sha1 file
fatal: unpack-objects failed

Do you know what I can delete from /var/lib/docker/* to free up some space (so that I can then move to S3), or any other way to proceed?


(Dom) #4

UPDATE:

I tried moving the /tmp mount point to the AWS ephemeral storage as per these instructions:

For security purposes, the mkdir command should create the directory with the sticky bit set in the mode:

mkdir -m 1777 /mnt/tmp
… I’d recommend trying this in /etc/rc.local:

test -d /mnt/tmp || mkdir -m 1777 /mnt/tmp
mount --bind /mnt/tmp /tmp

Unfortunately I get:

[ Error writing /etc/rc.local: No space left on device ]

:frowning: So not even enough space on the disk to write 68 chars to a text file.


(Jeff Atwood) #5

If this is a development machine on the AWS free tier why not just blow it away and start over and allocate more disk?


(Dom) #6

I had linked it to my Wordpress site using WP Discourse, so there are a few cross-linked comments. I was trying to retain these (and all the setup work I’ve already done to get the two systems talking to each other).

Is there a way to safely free up a few MBs, so that I can get the Discourse instance back on its feet? I’ll then move the /tmp location off the local disk, move all content to S3, and potentially increase the size of the root disk as per here and/or here.


(Dom) #7

…or is there a way to back up the config/content from my current instance and apply it to a fresh instance (bearing in mind that I can’t access things through the admin console)?


(Dean Taylor) #8

You can try recovering some space from any Linux updates that might be hanging around

Note - use these at your own risk - each can cause problems if application dependencies are broken or missing. Sometimes “auto” functions will guess wrongly that the package is no longer used.

Having said that - I run both of these commands every month on my dedicated Discourse boxes.

autoclean clears out the local repository of retrieved package files. It only removes package files that can no longer be downloaded, and are largely useless:
sudo apt-get autoclean

Removing unused dependency packages:
sudo apt-get autoremove
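
The du listing earlier in the thread shows roughly 220,000 KB under /var/cache/apt/archives. Since autoclean only removes obsolete package files, apt-get clean, which empties that cache completely, may recover more. Here is a minimal sketch for measuring the effect; dir_kb is a helper name invented for this example.

```shell
# Size in KiB of a directory tree; prints 0 if the directory does not exist.
dir_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" | awk '{ print $1 }'
    else
        echo 0
    fi
}

# Example: measure the apt package cache, clear it, and measure again.
#   before=$(dir_kb /var/cache/apt/archives)
#   sudo apt-get clean
#   after=$(dir_kb /var/cache/apt/archives)
#   echo "freed $((before - after)) KiB"
```

The downloaded .deb files are only needed again if a package has to be reinstalled without network access, so clearing them is usually low risk.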


(Dom) #9

Thanks. I have just done both and it’s freed 1% of a 10GB disk.

The instance was a fresh install, so not much hanging around it seems. But I’ll keep running those two commands as every little helps.


(Dean Taylor) #10

Also you could try tracking down the backups folder - copy those files off the box and delete the local copies.

Usually there are 7 days of backups kept:
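
As a rough sketch of how to find them from the host: this assumes the standard Docker install keeps backups under /var/discourse/shared/standalone/backups/default, which is an assumption to verify against the volumes in your own container config. list_files_by_size is a made-up helper name, and -maxdepth assumes GNU or BSD find.

```shell
# List the regular files directly inside DIR with sizes in KiB, largest first.
list_files_by_size() {
    find "$1" -maxdepth 1 -type f -exec du -k {} + 2>/dev/null | sort -rn
}

# Example (path is an assumption, see above):
#   sudo sh -c 'ls -lhS /var/discourse/shared/standalone/backups/default'
# or, from a root shell with the function defined:
#   list_files_by_size /var/discourse/shared/standalone/backups/default
# Copy the archives to another machine before deleting the local copies.
```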


(Dom) #11

SORTED.

I increased the partition size from 8GB to 10GB using Amazon’s instructions here and here.

This gave me enough headroom to run:
/var/discourse$ sudo ./launcher cleanup

This failed complaining about an invalid date, so I ran:
/var/discourse$ sudo ./launcher rebuild

This failed as I ran out of space again :frowning:. So I ran the cleanup command again and this time it worked. My disk space usage was then reduced to 51% of 10GB, so I ran the rebuild command again and it worked - ish.

The output of the rebuild complained that:

Error response from daemon: Cannot start container 531e85273d8049c8b3a8ea58b7605d6d82871d4cfb0c08cb32040d6f78164e59: Cannot find child for /app

I ran sudo ./launcher start app which seemed to sort things out. I checked online and my Discourse instance is back up and running. :smile:

Disk space usage now at 73% of 10GB so I ran cleanup yet again to see if I could free up any more space. This time though it looks like everything is tidy as no more space was reclaimed.
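
For anyone repeating this, a small guard can avoid starting a rebuild that will run out of space halfway through. This is a minimal sketch rather than anything the launcher provides: has_free_kb is a hypothetical helper, and the 2 GiB threshold is a guess based on the numbers in this thread, not a documented requirement.

```shell
# Succeed only if the filesystem holding PATH has at least MIN_KB KiB available.
# df -P prints one line per filesystem, so field 4 of line 2 is "Available".
has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# Example: clean up first, then rebuild only if about 2 GiB is free.
#   cd /var/discourse
#   sudo ./launcher cleanup
#   has_free_kb / 2097152 && sudo ./launcher rebuild app
```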


(Dom) #12

Will do. Thanks.

Next jobs:

  • Move /tmp files to ephemeral storage

  • Move content to S3