Discourse not cleaning up local tmp backups after uploading to S3

Running 3.2.0.beta4-dev ( 86da47f58d ) but we’ve had this problem for a while now.

We have backups configured to go straight to S3. Understandably, the application writes each backup to local storage first and then uploads it to S3, which is fine. The problem is that it doesn’t delete the local copy after uploading, which leads to a lot of disk usage even without thumbnails saved inside the backups.

root@forum:/var/discourse/shared/standalone/tmp/backups/default# du -sh
57G     .
root@forum:/var/discourse/shared/standalone/tmp/backups/default# du -k
7073520 ./2023-12-28-063845
8040176 ./2023-12-29-063923
8521220 ./2024-01-08-063857
4909616 ./2023-12-24-064434
4918056 ./2024-01-07-064325
7079136 ./2024-01-03-064430
7077984 ./2024-01-02-063855
2949660 ./2024-01-09-063708
59088404        .
root@forum:/var/discourse/shared/standalone/tmp/backups/default# rm -Rf *

Could this be a permissions issue on the directory, possibly? I certainly haven’t changed it.

root@forum:/var/discourse/shared/standalone/tmp/backups# ls -la
total 12
drwxr-xr-x 3 mas www-data 4096 Nov 23 06:44 .
drwxr-xr-x 4 mas www-data 4096 Nov 22 04:57 ..
drwxr-xr-x 2 mas www-data 4096 Jan  9 15:35 default

What’s weird is that the tmp listing shows Jan 2, 3, 7, 8, and 9 consuming space, while the backup listing in the Discourse admin UI shows only Jan 4th. So maybe Discourse is taking those backups but not properly uploading them to S3? The problem with that theory is that “backup frequency” is set to 3 in the admin configuration, so it shouldn’t be trying to back up every day anyway. Note that the backup logs in the admin UI are empty.

My best explanation is that sometimes the server reboots before it can delete the local backup file.

The backup listing shows what’s on S3, not on your local drive.

Is someone manually running a backup?

The host has 90d uptime and the docker container has 6 weeks of uptime, so no actual reboots, unless you’re talking about something inside the application.

No manual backups from me, certainly not one every single day. Nothing in cron etc either.

root@forum:/# uptime
 17:20:56 up 90 days,  1:52,  4 users,  load average: 0.81, 1.71, 1.81
root@forum:/# docker ps
CONTAINER ID   IMAGE                 COMMAND        CREATED       STATUS       PORTS                                                                      NAMES
d8bc34250454   local_discourse/app   "/sbin/boot"   6 weeks ago   Up 6 weeks>80/tcp, :::80->80/tcp,>443/tcp, :::443->443/tcp   app

Still happening, sigh. I guess I’ll cron a find -mtime +2 -delete. Good times.

root@forum:/var/discourse/shared/standalone/tmp/backups/default# du -sh
14G     .
root@forum:/var/discourse/shared/standalone/tmp/backups/default# ls -la
total 16
drwxr-xr-x 4 mas www-data 4096 Jan 16 06:56 .
drwxr-xr-x 3 mas www-data 4096 Nov 23 06:44 ..
drwxr-xr-x 2 mas www-data 4096 Jan 14 06:38 2024-01-14-063807
drwxr-xr-x 2 mas www-data 4096 Jan 15 06:43 2024-01-15-064337

Darn. That was my best guess.

Yeah. That might be what to do.

Done. Not the most elegant or satisfying solution, but I guess problem solved.
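
For anyone landing here with the same problem, here is a rough sketch of what such a cleanup could look like. It is not an official Discourse fix, and it removes whole timestamped directories with rm -rf instead of using find's -delete, since -delete can leave a directory behind when something newer is still inside it. The function name prune_old_backups, the script path in the cron comment, and the 2-day default are all made up for illustration; -mindepth and -maxdepth assume GNU or BSD find.

```shell
#!/bin/sh
# Sketch only: prune leftover backup work directories older than N days.
# prune_old_backups is a hypothetical helper, not part of Discourse.

prune_old_backups() {
    dir="$1"
    max_age_days="${2:-2}"

    # Refuse to run against an empty or missing path.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1

    # Look only at the timestamped directories directly under $dir.
    # -mtime +N matches entries whose age in whole 24-hour periods exceeds N.
    find "$dir" -mindepth 1 -maxdepth 1 -type d \
        -mtime +"$max_age_days" -exec rm -rf {} +
}

# Example call (path taken from the listings in this thread):
#   prune_old_backups /var/discourse/shared/standalone/tmp/backups/default 2
#
# Example crontab entry, assuming the call above is saved in a script at
# the hypothetical path /usr/local/bin/prune-discourse-backups.sh:
#   0 8 * * * /usr/local/bin/prune-discourse-backups.sh
```

Before enabling it, it is worth running the find command without the -exec part to see what it would match. Note that this only treats the symptom; it does not explain why the local copies are left behind in the first place.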


Yeah. I think that’s what I’ll do the next time I have this problem.