A 10-year self-hosted Discourse admin asks: why isn't launcher cleanup part of rebuild?

Hey all. My name is Lee, and I’ve been self-hosting Discourse on and off since 2013. I remember having to screw around with rbenv to even get started. I remember having to compile nginx with Phusion Passenger to make things run. I remember arguing with @sam probably ten damn years ago that switching to Docker was capitulating to developer it-works-with-my-home-directory-and-my-nightmare-of-dotfiles weakness (and being dead-ass wrong!). I remember the first time I heard the phrase “bike-shedding”. To quote the man, I remember everything.

After being away for several years, I’ve had occasion to come back to self-hosting Discourse as a replacement for native WordPress comments on a Houston-area weather site that typically does ~10k PV/day, but during hurricanes, might do ~2 million PV/day to ~1 million unique visitors. We’ve struggled with WordPress’s native comments for years, but as of last Wednesday, we’re live on self-hosted Discourse. (And on Graviton3, no less! Seriously, it just works and it’s great.)

Here’s the point I’m getting round to: it’s 2025, and as a self-hoster I’m still dealing with manually managing my docker image space. I present a story about /dev/root, told in code snippets, after less than a week in production:

[11:49:56] 0 ✓ (1.8ms)
root@discourse:/var/discourse # df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         30G   21G  9.6G  69% /
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  1.1M  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
efivarfs         128K  3.6K  125K   3% /sys/firmware/efi/efivars
/dev/nvme1n1p16  891M  109M  720M  14% /boot
/dev/nvme1n1p15   98M  6.4M   92M   7% /boot/efi
/dev/nvme0n1      32G  346M   30G   2% /var/discourse
tmpfs            1.6G   12K  1.6G   1% /run/user/1001
overlay           30G   21G  9.6G  69% /var/lib/docker/overlay2/5a649418bbfc064f488e895572eec1ace487a3eaa324fe1d8e3b395e6c5e3645/merged

[11:49:59] 0 ✓ (4.8ms)
root@discourse:/var/discourse # ./launcher cleanup
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y 
Total reclaimed space: 0B
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Deleted Images:
untagged: discourse/base@sha256:3696bdf18652b5455bd33795ec3b8e0f201c17a04f0e0126fc0317ed821373cd
....

[a whoooooooooooooooole lot of lines redacted]

....
Total reclaimed space: 12.43GB

[11:50:34] 0 ✓ (27.8s)
root@discourse:/var/discourse # df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         30G  6.9G   24G  23% /
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  1.1M  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
efivarfs         128K  3.6K  125K   3% /sys/firmware/efi/efivars
/dev/nvme1n1p16  891M  109M  720M  14% /boot
/dev/nvme1n1p15   98M  6.4M   92M   7% /boot/efi
/dev/nvme0n1      32G  346M   30G   2% /var/discourse
tmpfs            1.6G   12K  1.6G   1% /run/user/1001
overlay           30G  6.9G   24G  23% /var/lib/docker/overlay2/5a649418bbfc064f488e895572eec1ace487a3eaa324fe1d8e3b395e6c5e3645/merged

[11:55:28] 0 ✓ (3.3ms)
root@discourse:/var/discourse #

I love you guys. I love discourse. I am wedded to the product and I intend to keep using it more or less forever.

But, like…just, why. Why is it 2025 and I am personally by my own-ass self still screwing around with launcher cleanup? Why is image management not an inherent function of launcher?

Again, I love you guys. I chose Discourse for SCW because I believe in what you guys have built and I love using it. But like… that’s half my poor AMI’s boot volume tied up with useless crap that could—at least if I understand the technical side of things—be automatically managed.

Not meaning to complain—just checking in again after a few years away from the admin’s chair. I love the AI spam detection and the AI triage, especially in a weather forum where politically charged posts re: climate change (either for or against) are a regular feature. Thanks for everything <3

7 Likes

Great to see you back, Lee! :sunflower:

I had the same thing happen on my self-hosted site just this week. Backups were failing and I let that go for a week or so because I was away and didn’t have access to my laptop. As soon as I got back I ran cleanup, recovered lots of disk space, and the backups were able to run again.

4 Likes

Hello, glad to see you back around here!

Part of it is it’s been “good enough” – we don’t use it internally on our hosting since we frequently rotate containers + images so our cadence is much different than what a self-hosting site would look like.

The other explanation is that, between launcher and Docker, neither system wants to take full responsibility for the data removal schedule. The schedule for deleting user data should be fully under the user’s control.

I’ve run into some issues on self-hosted sites where cleanup also cleans up the new discourse base I needed to build, leading to a terrible chicken/egg problem. If that went unnoticed because cleanup ran automatically, it would probably be a real snag to figure out.

A simple suggestion here could be to run docker system prune or launcher cleanup from a cron job, at your own risk. Could work?

6 Likes

Because sometimes it can delete the only working container that you have.

You could just run it every time before you run a rebuild, while all your working containers are still running.
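A rough sketch of that "clean up right before the rebuild" idea, in case it helps. It assumes the default /var/discourse install and a container named app (both assumptions on my part), and it calls docker image prune directly instead of launcher cleanup so that nothing waits on a prompt. The run helper and the DRY_RUN variable are made up for this example so you can see what would happen before letting it loose.

```shell
# Sketch: prune unused Docker images, then rebuild.
# Assumes the default /var/discourse install and a container named "app";
# both are assumptions, adjust them for your site.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup_then_rebuild() {
  container="${1:-app}"
  discourse_dir="${DISCOURSE_DIR:-/var/discourse}"

  # Remove every image that no container refers to. The working
  # containers still exist at this point, so their images are kept.
  run docker image prune --all --force || return 1

  # Rebuild with launcher from the install directory.
  run "$discourse_dir/launcher" rebuild "$container"
}
```

Try DRY_RUN=1 cleanup_then_rebuild app first; it only prints the two commands. Without DRY_RUN it runs them for real, so read up on what docker image prune --all removes before trusting it.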

1 Like

Good call—sometimes the simple answers are the best ones. Thank you, and I shall make it so!

How can I/we answer yes when running ./launcher cleanup via cron? For me, containers aren’t that big an issue, but orphaned images are.

1 Like

There’s no reason to do it with cron; new images are only created when you build one with launcher. You only need to do it before or after you build an image.

If you want to avoid launcher’s prompts, you can do it with the docker commands as suggested above. Here’s one (but rtfm about what it does to make sure it’s what you want):

/usr/bin/docker image prune -a -f

1 Like

I’ll have to check that out. Thanks.

All I know is that today, again, a rebuild failed because I had under 5 GB of free space. Sure, cleanup did the job, and it was nothing more than a little annoying. Still, I’d like to avoid such situations.

And this shows how little I understand how Docker works :joy: If I understood right, those images that were destroyed because they weren’t used by any container weren’t images in the sense of pictures at all, as I’d reckoned all along :face_with_peeking_eye: :rofl:

2 Likes

The direct answer is you could echo y | launcher cleanup to send “y” early.
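One thing worth checking if you go the piping route: the transcript at the top of the thread shows two confirmation prompts, and echo y only supplies one line of input. Here's a small stand-in to show the difference. confirm_twice is a hypothetical function that just reads two answers from standard input and deletes nothing; whether launcher's prompts behave the same way when fed from a pipe is an assumption.

```shell
# Stand-in for a command that asks two y/N questions on standard input,
# like the two prompts in the launcher cleanup transcript earlier in the
# thread. confirm_twice is hypothetical and deletes nothing.
confirm_twice() {
  for step in containers images; do
    answer=""
    # read fails at end of input; treat that as "no answer given".
    read -r answer || true
    case "$answer" in
      y|Y) echo "$step: confirmed" ;;
      *)   echo "$step: skipped" ;;
    esac
  done
}
```

With echo y | confirm_twice the first question is confirmed and the second is skipped, because the input runs out after one line. With yes | confirm_twice both are confirmed, since yes keeps printing y for as long as something is reading it.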

The indirect answer is that the actual launcher cleanup is equivalent to these two commands:

docker container prune --force --filter until=24h
docker image prune --all --force --filter until=24h

and the prompt I think you’re referring to is for removing old postgres data directories:

rm -rf /var/discourse/shared/standalone/postgres_data_old*

You could drop the dependency on launcher and use those commands directly.

2 Likes

Actually, I’m referring to the questions I got when running ./launcher cleanup. First it removes all stopped containers. Then it offers to delete all images that aren’t used by at least one container, and that part is the one that frees space for me: close to 40 GB last time.

That’s why I was quite confused: I couldn’t understand why I had so many orphaned images (jpg, png, etc.). But we’re talking about a totally different kind of image here, right?

Yes, I rebuild at least twice a week. And quite recently, when I was hunting down a badly behaving plugin, I did at least a dozen rebuilds.

Whether it creates a new image every time, I don’t know.

Every rebuild is a new image - so they would accumulate if not cleaned up.

Launcher currently only prompts to clean when running other commands once disk space is low.

1 Like

Which can be a bummer if you’re running it with a script; the script will just hang waiting for a response (I guess that’s why to pipe yes into it). I just do a cleanup if the disk has less than 10GB free.
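For anyone who wants to copy the "only clean up when free space is low" approach, here's a minimal sketch. The 10 GB threshold comes from the post above; the /var/lib/docker default path and the function names are my own assumptions. It uses docker image prune rather than launcher cleanup so a script never hangs on a prompt.

```shell
# Free space, in whole gigabytes, on the filesystem holding the given path.
# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 of the
# second line is the available space.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeeds (exit status 0) when free space is below the threshold.
below_threshold() {
  [ "$1" -lt "$2" ]
}

# Prune unused images only when the disk is getting full.
# The default path and the 10 GB default are assumptions to adjust.
cleanup_if_low() {
  path="${1:-/var/lib/docker}"
  min_gb="${2:-10}"
  free="$(free_gb "$path")"
  if below_threshold "$free" "$min_gb"; then
    echo "Only ${free} GB free on $path, pruning unused images"
    docker image prune --all --force
  else
    echo "${free} GB free on $path, nothing to do"
  fi
}
```

Going by the df output at the top of the thread, the 9.6G available before cleanup truncates to 9 and would trigger a prune, while the 24G afterwards would not. Call cleanup_if_low with no arguments for the defaults, or pass a path and a threshold in GB.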

1 Like

Here’s a potential workaround that might work for me. Bringing it up here in case it’s helpful to others.

I’m contemplating adding a data-root setting to /etc/docker/daemon.json, to see if that forces docker to place its images—Discourse’s images, in this case, since nothing else is hosted on the box—in a less critical location that won’t blow up my boot volume.

Searching on meta for past threads about this gets me a couple of results which don’t really tell me much, and before I cause my production Discourse instance to collapse in a flaming smoldering heap, I wanted to ask to see if this is viable :slight_smile:
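For reference, this is roughly what that setting looks like. data-root is the dockerd option for where images and containers are stored; the target directory in the sketch is a hypothetical example, and print_daemon_json is just a helper name invented here to generate the file contents.

```shell
# Print a minimal daemon.json that points Docker's storage at another path.
# "data-root" is the dockerd setting for this; the default directory below
# is a hypothetical example, not something from the thread.
print_daemon_json() {
  data_root="${1:-/var/discourse/docker-data}"
  printf '{\n  "data-root": "%s"\n}\n' "$data_root"
}
```

The output would go in /etc/docker/daemon.json (merged by hand with any settings already there), followed by a restart of the Docker service. As far as I know Docker does not move existing images to the new location on its own, so with Docker stopped you'd either copy the old /var/lib/docker contents across or plan on a fresh rebuild.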

I took a different approach, and mounted a separate filesystem on /var/lib/docker

In my case, for very site-specific reasons I chose separate filesystems for each of /var/discourse/shared, /var/discourse/shared/data, /var/discourse/shared/app/uploads/default/original, and /var/lib/docker — but if you want to just have /var/discourse as a separate filesystem, you could probably create the directory /var/discourse/share/docker and bind-mount it onto /var/lib/docker (obviously doing this with a quiescent system and moving files as necessary).
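A hedged sketch of what that bind-mount move could look like, for anyone following along. The source directory is the one named in the post; the step order, the systemd unit names, and the run helper with its DRY_RUN switch are assumptions of mine. It needs root and a quiescent system, and it is meant to be read and adapted, not pasted.

```shell
# Sketch: move /var/lib/docker onto the /var/discourse filesystem with a
# bind mount. Assumes systemd with the usual docker.socket and
# docker.service units. Run as root on a quiescent system.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

bind_mount_docker() {
  src="${1:-/var/discourse/share/docker}"
  dst="/var/lib/docker"

  run systemctl stop docker.socket docker.service || return 1
  run mkdir -p "$src" || return 1
  # Copy existing images and containers, preserving ownership and links.
  run cp -a "$dst/." "$src/" || return 1
  # Keep the old copy aside so it does not sit hidden under the mount.
  run mv "$dst" "$dst.old" || return 1
  run mkdir "$dst" || return 1
  run mount --bind "$src" "$dst" || return 1
  run systemctl start docker.service
}

# Matching /etc/fstab line so the bind mount survives a reboot.
fstab_line() {
  printf '%s /var/lib/docker none bind 0 0\n' "${1:-/var/discourse/share/docker}"
}
```

DRY_RUN=1 bind_mount_docker prints the seven steps without touching anything. Once the site is confirmed working from the new location, /var/lib/docker.old can be deleted to give the space back to the boot volume.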

4 Likes

That’s an even better idea than screwing with docker’s guts. Thank you!!

1 Like