Yes, the “orphans” are only uploads from deleted posts or uploads that do not appear in any post.
I think it is mostly an admin judgement call. Old topics are not necessarily irrelevant. The same goes for closed, archived and unlisted topics.
It has always seemed a bit strange that deleted topics retain uploads. For what purpose? Nobody except admins can view deleted topics, and I doubt they have any need for the uploads.
For my use case, the best solution would be to delete just the uploads and retain the topic, so that the text is not lost and there are no broken topic links.
I’d be curious to see if we could filter images for clean up within a certain category? It would be easy for me as an admin to justify removing uploads from “more than 1 year ago, under ## Category” - any discussions with uploads that I’d want to keep could just be moved into a more appropriate category.
Hmm my suggestion here is to actually use S3 for storage of uploads in this case since the per GB cost is cheaper than what it is costing you on your 1GB DO droplet.
Here are the current (2017) prices for Amazon S3:
So 10 GB = $1.25 per month, 50 GB = $6.25 per month, and so on.
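For anyone who wants to plug in their own upload size, here is a rough Ruby sketch of that arithmetic. The per-GB rate is only what the two figures above imply ($1.25 / 10 GB); it is an assumption derived from this post, not a number taken from Amazon's price list, and `monthly_storage_cost` is a made-up helper name.

```ruby
# Rate implied by the figures quoted above: $1.25 for 10 GB per month.
# Assumption: derived from the post, not from Amazon's price list.
IMPLIED_RATE_PER_GB = 1.25 / 10

# Monthly storage cost in dollars for a given number of gigabytes,
# assuming the cost scales linearly with size.
def monthly_storage_cost(gigabytes, rate_per_gb = IMPLIED_RATE_PER_GB)
  (gigabytes * rate_per_gb).round(2)
end

[10, 50, 100].each do |gb|
  puts format("%d GB -> $%.2f per month", gb, monthly_storage_cost(gb))
end
```

Pass a different `rate_per_gb` if your region or storage class is priced differently.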
I have already migrated to S3. My upload folder had grown so big that I could not even restore a backup (with uploads) on a fresh 1GB DO droplet. Now I just need to figure out how to clean up my local upload folder. I noticed that avatars are still kept locally, so I can’t just delete all of my local uploads.
Regarding what I wrote above: my main concern is that I am maintaining and paying for uploads that nobody needs anymore, and there isn’t an efficient way to free up the disk space. Even uploads from deleted topics are still kept on disk. All the occasional pornographic pictures and selfies (which users asked to have removed) from more than a year ago are still kept on disk. I remember one case where one of my users liked the Discourse upload feature so much that he attempted to upload a couple of chapters of comic book scans to the server. Luckily I was able to stop him, but there could still be around 100MB of comic book uploads lurking on my server. These kinds of little things will add up in disk space as the community gets older and bigger.
I understand that this is not really a real-world problem. Storage costs almost nothing these days, and there is no need to cry about a couple of GB of dead-weight uploads. I just saw an inefficiency and wanted to share my view of it. Discourse is a really powerful community-building tool, and the disk usage is nothing compared to the benefits of the platform.
I can report that uploads can’t be deleted this way.
@tgxworld or @zogstrip can follow up but images unreferenced by posts should be automatically removed over time.
The CleanUpUploads job runs every hour and deletes every upload that is not used in a post, by a user in their profile, or as a category background/icon.
If uploads aren’t deleted, then they are referenced somewhere.
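To make that rule concrete, here is a minimal, self-contained Ruby sketch of the idea. It is not the actual CleanUpUploads code: the `Upload` struct, the `orphaned_uploads` helper and the id lists are all hypothetical stand-ins used only to illustrate "keep anything that is referenced somewhere".

```ruby
require "set"

# Hypothetical in-memory stand-in for an upload record; not Discourse's model.
Upload = Struct.new(:id, :filename)

# Returns the uploads that nothing references, following the rule described above:
# an upload is kept if a post, a user profile, or a category background/icon uses it.
def orphaned_uploads(uploads, post_upload_ids:, profile_upload_ids:, category_upload_ids:)
  referenced = Set.new(post_upload_ids) | profile_upload_ids | category_upload_ids
  uploads.reject { |upload| referenced.include?(upload.id) }
end

uploads = [
  Upload.new(1, "screenshot.png"),    # used in a post
  Upload.new(2, "avatar.jpg"),        # used in a user profile
  Upload.new(3, "category-bg.png"),   # used as a category background
  Upload.new(4, "comic-page-01.jpg")  # referenced nowhere
]

orphans = orphaned_uploads(
  uploads,
  post_upload_ids: [1],
  profile_upload_ids: [2],
  category_upload_ids: [3]
)

puts orphans.map(&:filename).inspect # prints ["comic-page-01.jpg"]
```

Only the upload that appears in none of the three reference lists comes back as an orphan, which matches the point above: if a file survives, something still references it.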
It will also keep uploads used in a draft post/topic, correct?
That’s a good question. I don’t think we keep uploads only used in drafts. I will check tomorrow to be sure.
EDIT: Yeah, @jomaxro reminded me of my own work. We do keep uploads that are only used in drafts and queued posts.
FYI, looks like you fixed this last year - not sure if your new changes made this irrelevant.
Forgot the link…
https://meta.discourse.org/t/images-in-drafts-and-queued-posts-are-cleared-to-aggressively/48048/2?u=jomaxro
How did you handle the migration? There is a rake task that will migrate your uploads to S3.
https://github.com/discourse/discourse/blob/master/lib/tasks/uploads.rake#L172
Enabling the following site setting will clean up uploads from deleted topics.
Thank you for the feedback. By inefficiency, do you mean old topics with huge uploads that no one ever reads anymore?
I have even tried to destroy posts via the Ruby console. Even after that, the uploads are still kept on the server.
I used this exact rake task. When it finished, all of my old uploads were still kept on the server. After that, I rebaked all of the posts with uploads to update the picture links.
This setting has been enabled on my site for as long as I can remember.
Not exactly. I am more concerned with uploads that users don’t even have access to. For example, a user creates a post with a picture in it and the site administrator deletes the post. The picture in that post will be kept on the server, and I don’t see any timeout on that. I can go and look at posts that were deleted more than a year ago and all the uploads are still there; they have even been migrated to S3.
If you’ve successfully migrated to S3, then uploads stored locally won’t be publicly available anymore.
Well, my problem did not disappear by itself. I had to manually delete my pre-S3-migration uploads to free up some disk space. That did the trick, but in the process some of the oldest user avatar pictures were lost.
Right now all of my file uploads are stored in S3. I have not checked whether avatar pictures are deleted properly from my instance. At least I have not seen any noticeable buildup in local upload size.
When you moved to S3, was rake uploads:migrate_to_s3 run? Were all pre-S3 uploads lost?
No, they were copied to S3 and links in posts pointed to S3. My problem was that afterwards these files were not deleted from the local hard drive.
I’ve done one migration of uploads to S3 today with a small forum to test the waters. Unfortunately, the old local uploads don’t seem to clear out automatically with ‘clean up uploads’ enabled (assuming the CleanUpUploads job runs once per hour). I’d guess that this is because it checks for orphaned uploads exclusively in the S3 bucket once the migration is complete. I can see the old local files and subfolders when I navigate to:
/var/discourse/shared/standalone/uploads/default/optimized/
/var/discourse/shared/standalone/uploads/default/original/
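If it helps anyone check how much space those leftover local files take up, here is a small Ruby sketch that adds up file sizes under the two directories quoted above. It only reads sizes and deletes nothing. `directory_size` is a helper made up for this example, and the paths are simply the ones from this post, so adjust them if your install differs.

```ruby
require "find"

# Sums the size in bytes of every regular file under `root`.
# Returns 0 if the directory does not exist.
def directory_size(root)
  return 0 unless File.directory?(root)

  total = 0
  Find.find(root) do |path|
    total += File.size(path) if File.file?(path)
  end
  total
end

# The two local upload directories quoted above.
%w[
  /var/discourse/shared/standalone/uploads/default/optimized
  /var/discourse/shared/standalone/uploads/default/original
].each do |dir|
  megabytes = directory_size(dir) / (1024.0 * 1024.0)
  puts format("%-65s %10.1f MB", dir, megabytes)
end
```

Running it once before and once after the hourly job would show whether anything local is actually being cleaned up.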