I was investigating a case where an upload is not deleted while it belongs to a post that has been deleted over 48 hours ago.
But … neither the CleanUpUploads job nor the soft deletion code for a post seem to handle this case ?
The job checks for pu.upload_id IS NULL but while an Upload has has_many :post_uploads, dependent: :destroy , a Post does not have this cascading delete?
What am I overlooking? Or is this really not handled?
db6033=# select * from site_settings ss where NULLIF(ss.value, '')::integer = 25772 and ss.data_type = 18;
id | name | data_type | value | created_at | updated_at
----+------+-----------+-------+------------+------------
(0 rows)
When I look at the code for CleanUpUploads and run it manually, the upload immediately gets excluded from the result set because of the .where("pu.upload_id IS NULL") clause. So I decided to check where the post_uploads record is being removed and the only thing I could find is in case the upload record itself is being removed. But when the post is (soft)deleted that post_uploads entry is still there, and the code doesn’t check if that relates to a post record which has been (soft)deleted.
I am traumatised from losing of uploads, my blog just lost a pile of images on early posts due to a bug we had.
I think we should simply start with a rake task people can run to reclaim the space taken up by images on deleted posts (and posts on deleted topics) we can then decide what to do longer term but at least have an outlet here.
I’m going to play the GDPR violin here. It’s not just about reclaiming space - it’s also about being able to remove personal data. Right now many users will think they removed the post including the uploads, while they didn’t. Even soft-deleting the post and hard-deleting the images would not always suffice.
I do understand @sam’s concerns though (although we keep 30 days of incremental backups I fully understand that not every Discourse forum has that luxury) so I was thinking about soft-deleting the images as well:
Would it be hard to deny the download of an image in case all posts it belongs to have been soft-deleted ? Restoring the post would make the image accessible again.
I object to the GDPR concern, cause if you are asking for ultra strong post removal then posts need to be hard deleted, in that case uploads go as well, so what you are asking for here is a UI hard nuke.
If you must ensure image removal another simple workaround is to edit the image link out of the post
Yeah those were two different solutions, the GDPR concern would indeed need a hard deletion, but if there are objections against that then a soft delete would be a second best.
The image link editing is a cool workaround, thanks for the tip!
I am in a situation where I can’t ignore it. On my instance, our users constantly exchange sensitive data via PM’s such as IDs and documents. They were under the impression that once deleted, they’re gone. I gave them that impression via our rules and terms, as I was mislead by the options within Discourse that state that deleted uploads are removed after X days. It turns out, they’re not. And now this all sits in our database, which will be a disaster if we ever get breached.
Would “hard nuking” as you’ve said would be something that could be implemented via a plugin? It’s not about nuking the post itself, just the attachments/uploads.
First we need the rake task, that is definitely slotted. After we have that we can also add a site setting, default off for this behavior. purge_images_from_deleted_posts_days something like that. Maybe we can set it to something like 1 year by default. not sure, @codinghorror can decide when we have the setting.
sam already gave a solution: if you edit the image link out of the post before deleting it then the upload will be removed after the set amount of hours.
Well, given the fact that the proposed implementation is a manual task that would only work on posts that have been removed over 1 year ago, it wouldn’t suit your purpose anyway.
So editing the image link out of the post is the second best solution (actually: the best, since it already exists and works), especially since in your use case those topics are deleted by staff.
I think we would need to diffentiate between different use cases for this:
regular cleanup / making sure there are no illegal uploads / space issues (the rake task will suffice)
removal for regulatory (privacy or DMCA) reasons which require uploads to be removed ASAP (editing the image links out of the posts will work, albeit cumbersome)
Were these ever actioned? I can’t see a Rake task or a setting that will remove images from deleted posts.
When I look in at the old deleted posts (> 1 year old) in our instance, the images are still there.
This matters at the moment as we are moving part of an instance which will involves a heap of topic deletions and historic uploads that need to be removed.
Another way to solve this would be to purge deleted posts and topics from the database, but I can’t see how to do that either.
Same on our instance – would be great to have this. The expectation of a user is that the upload would get purged after 48 hours if their post is deleted, without first needing to edit the link out of the post.