Upgrade failed: low disk space -- excess overlay files?

Well, sigh – I’m stuck on a failed upgrade.

I’m on a 25G VPS, using the supported Docker installation.

Upgrading docker_manager via the admin panel went fine.

Upgrading Discourse from v3.2.0.beta1 +125 to v3.2.0.beta3 +325 via the admin panel failed, so I tried a command-line rebuild:

cd /var/discourse
git pull
./launcher rebuild app

…which also failed:

You have less than 5GB of free space on the disk where /var/lib/docker is located. You will need more space to continue
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda2        23G   22G  640M  98% /
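The same kind of check can be run by hand before starting a rebuild. This is a rough sketch, not the launcher's own code: the helper names `free_kib` and `has_room` are made up for illustration, and the 5 GB threshold is simply the figure quoted in the message above.

```shell
# Hypothetical helper: free space, in KiB, on the filesystem that holds a path.
# -P forces the portable one-line-per-filesystem layout; -k reports 1024-byte blocks.
free_kib() {
  df -Pk "${1:-/var/lib/docker}" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed only when at least MIN_GIB (default 5) is free.
has_room() {
  path="${1:-/var/lib/docker}"
  min_gib="${2:-5}"
  [ "$(free_kib "$path")" -ge $((min_gib * 1024 * 1024)) ]
}

# Usage (not run here):
#   has_room /var/lib/docker 5 || echo "Free up space before ./launcher rebuild app"
```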

Apparently because of two 18G “overlay” files:

root@forum:/var/discourse# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            95M  1.4M   94M   2% /run
/dev/vda2        23G   18G  4.1G  82% /
tmpfs           474M     0  474M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/vda1       511M  6.1M  505M   2% /boot/efi
tmpfs            95M  4.0K   95M   1% /run/user/0
overlay          23G   18G  4.1G  82% /var/lib/docker/overlay2/8a331589d7fa9046a6ef73489cc830c2583cb76c9174125c8bfe1064d58cd503/merged
overlay          23G   18G  4.1G  82% /var/lib/docker/overlay2/d56574358c8edbc9bc1fb50022585b854462a8ce56daa636b07f3a3771949251/merged

(Three 18G files on a 25G server? That’s 54G…)

It seems something is reclaimable:

root@forum:/var/discourse# docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          2         2         4.3GB     3.334GB (77%)
Containers      2         2         1.849GB   0B (0%)
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B

…but I’m not sure what or how.

The contents of /var/discourse/shared/standalone/backups/default amount to only 67 MB.

I stopped docker with systemctl stop docker and tried these to no effect:

docker system prune -a
docker buildx prune --all
docker builder prune --all

…all reported 0B freed.

I have two Docker images, one for Discourse and one for… “none?”

root@forum:/var/discourse/image# docker images
REPOSITORY            TAG       IMAGE ID       CREATED        SIZE
local_discourse/app   latest    5ff1dcfe050c   2 months ago   4.09GB
<none>                <none>    bbaceb5f4a80   2 months ago   214MB

Apparently “none” indicates a dangling or intermediate image (see “Why the ‘none’ image appears in Docker and how can we avoid it” on Stack Overflow), but it’s so small that I don’t think it’s my first priority.

When the advice in “Is it safe to clean docker/overlay2/” (Stack Overflow) gets into grepping overlays against images, I lose steam. There are 60 hashed folders in my docker/overlay2… please don’t make me grep 120 times…
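The lookup does not need one grep per folder. A minimal sketch, assuming the overlay2 storage driver, where `docker inspect` reports each container's merged directory as `.GraphDriver.Data.MergedDir`; the helper names `overlay_id` and `containers_by_overlay` are hypothetical:

```shell
# Hypothetical helper: pull the hashed directory name out of an overlay2 path.
overlay_id() {
  printf '%s\n' "$1" | sed -n 's#.*/overlay2/\([^/]*\).*#\1#p'
}

# Hypothetical helper: print "<overlay dir><TAB><container name>" for every container.
# Assumes the overlay2 storage driver and a running Docker daemon.
containers_by_overlay() {
  ids=$(docker ps -aq)
  [ -n "$ids" ] || return 0
  # $ids is left unquoted on purpose so each container ID becomes its own argument.
  docker inspect --format '{{.Name}} {{.GraphDriver.Data.MergedDir}}' $ids |
    while read -r name merged; do
      printf '%s\t%s\n' "$(overlay_id "$merged")" "$name"
    done
}

# Usage (not run here):
#   containers_by_overlay
```

Folders that do not show up in this list may still be image layers in use, so the output is a map for orientation rather than a list of what is safe to delete by hand.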

I imagine my options at this point are:
A. Get some help figuring out if either of these overlays can be deleted.
B. Restore from a snapshot, upgrade for more disk space and try again. Will I always have these huge overlays?

(And how do I even have 3 x 18G files on a 25G instance…?)

If anyone’s up at this hour and has any input, I’d appreciate it.

Just to tick off the basics: have you tried ./launcher cleanup, and is this what’s left afterwards?


Yes - cleanup had no effect.


You don’t have two 18GB overlay files, that’s a red herring. Docker uses overlayfs and those are just how your existing disk is presented to the container. /dev/vda2 is your disk and is mounted at / - that’s where you should be directing your efforts.
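To see only the real disks, the overlay and tmpfs rows can be filtered out of the `df` listing. A small sketch with a hypothetical helper name, `real_filesystems`; it keeps the header row and drops any row whose first column is `overlay` or `tmpfs`:

```shell
# Hypothetical helper: drop overlay and tmpfs rows from df output read on stdin.
real_filesystems() {
  awk 'NR == 1 || ($1 != "overlay" && $1 != "tmpfs")'
}

# Usage (not run here):
#   df -h | real_filesystems
# With GNU df, `df -h -x overlay -x tmpfs` gives the same view without a helper.
```

Applied to the listing in the first post, this leaves just /dev/vda2 and /dev/vda1, which makes it clearer that the 18G is counted once, not three times.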

If ./launcher cleanup did nothing then I’m assuming docker image prune (removes dangling images) won’t either. If you’re only using this server for Discourse, you may just need to check that there are no large files or folders in your home directory.
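One way to hunt for large folders without an interactive tool is to rank the top-level directories by size. A minimal sketch, assuming a `du` that supports `-d` (GNU coreutils does); the helper name `largest_dirs` is hypothetical:

```shell
# Hypothetical helper: list the biggest entries one level below a directory.
# -x stays on one filesystem, -k reports KiB, -d 1 limits the depth to one level.
largest_dirs() {
  dir="${1:-/}"
  count="${2:-10}"
  du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Usage (not run here):
#   largest_dirs / 10
#   largest_dirs /var 10
```

The first row is the total for the directory itself. Repeating the call on whichever subdirectory tops the list narrows things down quickly.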


Ohh – well, that’s tricky of Docker…
No, prune operations didn’t recover anything.
I’m poking through /usr now with ncdu… nothing looks like it obviously doesn’t belong, though I’m not sure what to make of /usr/lib/modules:

  547.2 MiB [###########################] /6.2.0-37-generic
  547.2 MiB [########################## ] /6.2.0-34-generic
    1.2 MiB [                           ] /6.2.0-33-generic
    1.2 MiB [                           ] /6.2.0-32-generic
    1.2 MiB [                           ] /6.2.0-35-generic
    1.2 MiB [                           ] /6.2.0-36-generic

By far the most usage is reported under /var, where the overlays live:

   16.0 GiB [###########################] /var
    4.3 GiB [#######                    ] /usr
    2.3 GiB [###                        ]  swapfile
    1.7 GiB [##                         ] /snap
  289.5 MiB [                           ] /boot

There’s nothing in /snap but what it came with:

root@forum:/# snap list
Name    Version       Rev    Tracking         Publisher   Notes
core22  20230801      864    latest/stable    canonical✓  base
lxd     5.19-8635f82  26200  latest/stable/…  canonical✓  -
snapd   2.60.4        20290  latest/stable    canonical✓  snapd

Whoa – /var/log/journal got big!

    1.8 GiB [###########################] /7341e5ac94ae440bbd06f743e242da89
   16.0 MiB [                           ] /7025a9ae870140c1bef8e55211d339dc

Looks like it’s been tons of bots trying to log in over just a couple of months.
Seems prudent to retain logs for a while, but this forum is still beta.
Maybe vacuuming that will be enough to get me going again.
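Vacuuming can be done with `journalctl`, which has `--disk-usage` and `--vacuum-size` options. A hedged sketch: the helper name `vacuum_journal`, the `DRY_RUN` switch, and the 200M default are all illustrative choices, not recommendations from this thread.

```shell
# Hypothetical helper: shrink the systemd journal down to a size cap.
# Set DRY_RUN=1 to print the vacuum command instead of running it.
vacuum_journal() {
  limit="${1:-200M}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "journalctl --vacuum-size=$limit"
    return 0
  fi
  journalctl --disk-usage            # size before
  journalctl --vacuum-size="$limit"  # delete archived journal files until under the cap
  journalctl --disk-usage            # size after
}

# Usage (not run here, needs root):
#   vacuum_journal 200M
```

To keep the journal from growing back, `SystemMaxUse=` in /etc/systemd/journald.conf sets a persistent cap.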


Well, that didn’t quite do it, so I upgraded the server to 55G. If those big overlay files are inevitable, I guess there wasn’t really a choice.

A Discourse upgrade just completed, the site appears to be working fine on 3.2.0.beta4-dev. :sweat_smile:

Thank you @JammyDodger and @Stephen for your attention and input!
