Is there a way to delete old/irrelevant uploads?

(Daniel) #1

Hello Discourse community!

I’m maintaining a small foreign community that has been using Discourse for more than a year. This platform has really done wonders for us. We are small and most likely will never outgrow a 1GB DO droplet. Down the road our only concern is disk space, which will run out in about a year or two. Of course one solution is to turn off uploads, but I would really like to avoid that. We are a media-heavy community and the upload feature is important to us.

Is there a way to delete uploaded files? For example, all uploads from a specific topic.

We already have a couple of irrelevant, archived topics which for the most part contain spam. There is no need to keep uploads in these kinds of topics. A way to clean up uploaded junk would be very helpful.

(Jeff Atwood) #2

Any suggestions here @zogstrip or @tgxworld?

(OG) #3

I have noticed when uploading through the API that unused uploads (not attached to a post) are removed automatically by some job.

(Mittineague) #4

Under Admin -> Settings -> Files there are these

clean up uploads [default: enabled]
Remove orphan unreferenced uploads to prevent illegal hosting. WARNING: you may want to back up of your /uploads directory before enabling this setting.

clean orphan uploads grace period hours [default: 48]
Grace period (in hours) before an orphan upload is removed.

purge deleted uploads grace period days [default: 30]
Grace period (in days) before a deleted upload is erased.

It might be that making an upload an orphan would result in it being deleted. I haven’t tested, but AFAIK these settings do not clean up older posts retroactively.
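To make the two grace periods above concrete, here is a minimal Python sketch of how such checks might work. It is not Discourse's actual code: the function names are hypothetical, and the reading that an orphan is first removed after 48 hours and a deleted upload is then erased after 30 days is an assumption based only on the setting descriptions quoted above.

```python
from datetime import datetime, timedelta

# Default values taken from the settings quoted above.
ORPHAN_GRACE = timedelta(hours=48)
PURGE_GRACE = timedelta(days=30)


def orphan_grace_elapsed(created_at: datetime, now: datetime) -> bool:
    """Hypothetical check: has an unreferenced upload outlived its grace period?"""
    return now - created_at > ORPHAN_GRACE


def purge_grace_elapsed(deleted_at: datetime, now: datetime) -> bool:
    """Hypothetical check: has a deleted upload waited long enough to be erased?"""
    return now - deleted_at > PURGE_GRACE
```

With these defaults, an orphan created 49 hours ago would qualify for removal, while one created 47 hours ago would not.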

One problem I see here is defining “old/irrelevant”

  • Admin judgement call?
  • last post date?
  • last viewed date?
  • topic status (eg. Closed/Archived/Unlisted/Deleted)?

(Kane York) #5

Yes, “orphans” are only uploads that appear solely in deleted posts or do not appear in any post.

(Daniel) #6

I think it is mostly an admin judgement call. Old topics are not necessarily irrelevant. The same goes for closed, archived and unlisted topics.
It has always seemed a bit strange to me that deleted topics retain uploads. For what purpose? Nobody except admins will be able to view deleted topics, and I doubt that they will have any need for the uploads.

For my use case the best solution would be to delete just the uploads and retain the topic, so that no text would be lost and there would be no broken topic links.

(Keith) #7

I’d be curious to see if we could filter images for clean up within a certain category? It would be easy for me as an admin to justify removing uploads from “more than 1 year ago, under ## Category” - any discussions with uploads that I’d want to keep could just be moved into a more appropriate category.
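A rough Python sketch of the kind of filter I have in mind, purely as an illustration. Nothing here is an existing Discourse feature or API; `UploadRecord`, its fields, and `cleanup_candidates` are made-up names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UploadRecord:
    # Hypothetical fields for illustration only, not Discourse's schema.
    path: str
    category: str
    uploaded_at: datetime


def cleanup_candidates(uploads, category, now, max_age=timedelta(days=365)):
    """Return uploads in `category` that were uploaded more than `max_age` ago."""
    cutoff = now - max_age
    return [u for u in uploads if u.category == category and u.uploaded_at < cutoff]
```

Moving a topic into another category would take its uploads out of the candidate list, which is the escape hatch described above.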

(Alan Tan) #8

Hmm my suggestion here is to actually use S3 for storage of uploads in this case since the per GB cost is cheaper than what it is costing you on your 1GB DO droplet.

(Jeff Atwood) #9

Here are the current (2017) prices for Amazon S3:

So 10gb = $1.25 month, 50gb = $6.25 per month, and so on.
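The same arithmetic as a small Python sketch, for other sizes. The per-GB rate is simply the one implied by the figures above ($1.25 for 10 GB), not a quoted Amazon price, and `monthly_cost` is a made-up helper name.

```python
# Rate implied by the figures above: $1.25 per month for 10 GB.
PRICE_PER_GB_MONTH = 1.25 / 10


def monthly_cost(gigabytes: float) -> float:
    """Estimated monthly storage cost in dollars at the implied rate."""
    return gigabytes * PRICE_PER_GB_MONTH
```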

(Daniel) #10

I already migrated to S3. My upload folder had grown so big that I could not even restore a backup (with uploads) on a fresh 1GB DO droplet. Now I just need to figure out how to clean my local upload folder. I noticed that avatars are still kept locally, so I can’t just delete all of my local uploads.

Regarding what I wrote above: my main concern is that I am maintaining and paying for uploads which nobody needs anymore, and there isn’t an efficient way to clean up disk space. Even uploads from deleted topics are still kept on disk. All the occasional pornographic pictures and selfies (which users asked to have removed) from more than a year ago are still kept on disk. I remember one case where one of my users liked the Discourse upload feature so much that he attempted to upload a couple of chapters of comic book scans to the server. Luckily I was able to stop him, but there could still be around 100MB of comic book uploads lurking on my server. These kinds of little things will add up as the community gets older and bigger.

I understand that this is not really a real-world problem. Storage costs almost nothing these days and there is no need to cry about a couple of GB of dead-weight uploads. I just saw an inefficiency and wanted to share my view of it. Discourse is a really powerful community-building tool and the disk usage is nothing compared to the benefits of this platform.

I can report that uploads can’t be deleted this way.

(Jeff Atwood) #11

@tgxworld or @zogstrip can follow up but images unreferenced by posts should be automatically removed over time.

(Régis Hanol) #12

The CleanUpUploads job runs every hour and will delete every upload that is not used in a post, by a user in their profile, or as a category background/icon.

If uploads aren’t deleted, then they are referenced somewhere.
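Conceptually, that rule is a set difference between all uploads and everything that is referenced somewhere. The Python sketch below only illustrates the idea; it is not the actual job, and `find_orphans` and its arguments are hypothetical names.

```python
def find_orphans(all_uploads, *reference_sources):
    """Return the uploads that no reference source mentions.

    Each reference source is an iterable of upload identifiers, for
    example those used in posts, user profiles, or category images.
    """
    referenced = set()
    for source in reference_sources:
        referenced.update(source)
    return set(all_uploads) - referenced


# Example with made-up identifiers: only "d.png" is unreferenced.
orphans = find_orphans(
    ["a.png", "b.png", "c.png", "d.png"],
    ["a.png"],   # used in posts
    ["b.png"],   # used in user profiles
    ["c.png"],   # used as category background/icon
)
```

An upload that shows up in any one of the sources is kept, which is why a file that never gets deleted must still be referenced somewhere.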

(Joshua Rosenfeld) #13

It will also keep uploads used in a draft post/topic, correct?

(Régis Hanol) #14

That’s a good question. I don’t think we keep uploads only used in drafts. I will check tomorrow to be sure.

EDIT: Yeah, @jomaxro reminded me of my own work :wink: We do keep uploads only used in drafts and queued posts.

(Joshua Rosenfeld) #15

FYI, looks like you fixed this last year - not sure if your new changes made this irrelevant.

Forgot the link…

(Alan Tan) #16

How did you handle the migration? There is a rake task that will migrate your uploads to S3.

Enabling the following site setting will clean up uploads from deleted topics.

Thank you for the feedback :slight_smile: By inefficiency, do you mean old topics with huge uploads that no one ever reads anymore?

(Daniel) #17

I have even tried to destroy posts via the Ruby console. Even after that, the uploads are still kept on the server.

I used this exact rake task. When it finished all of my old uploads were still kept on the server. After that I rebaked all of the posts with uploads to update picture links.

This setting has been enabled on my site for as long as I can remember.

Not exactly. I am more concerned with uploads that users don’t even have access to. For example, a user creates a post with a picture in it and the site administrator deletes this post. The picture in this post will be kept on the server, and I don’t see any timeout on that. I can go and look at posts which were deleted more than a year ago and all the uploads are still there; they have even been migrated to S3.

(Régis Hanol) #18

If you’ve successfully migrated to S3, then uploads stored locally won’t be publicly available anymore.