Because message revisions are kept, and older removed messages are also kept, and even removed topics are still there in database, how do I make an image orphaned so it gets removed automatically by a sidekiq job?
I believe this is already how it works, the files are already removed unless the file is referenced in the current post revision. @zogstrip can clarify.
All the details are available in this job
https://github.com/discourse/discourse/blob/master/app/jobs/scheduled/clean_up_uploads.rb
TL;DR: an image is orphaned if and only if it’s not referenced
- in the latest version of a post
- in a draft
- in a queued post
- in a logo site setting
- in a custom emoji
- in a theme
- in a user avatar/background/card image
- in a category logo/background image
I was investigating a case where an upload is not deleted while it belongs to a post that has been deleted over 48 hours ago.
But … neither the CleanUpUploads
job nor the soft deletion code for a post seem to handle this case ?
The job checks for pu.upload_id IS NULL
but while an Upload
has has_many :post_uploads, dependent: :destroy
, a Post
does not have this cascading delete?
What am I overlooking? Or is this really not handled?
Are you sure that upload isn’t being used elsewhere? Like in a site setting? A category image? A user avatar?
I’m 100% sure.
db6033=# select * from uploads where sha1 like 'ff91%';
id | user_id | original_filename | filesize | width | height | url | created_at | updated_at | sha1 | origin | retain_hours | extension | thumbnail_width | thumbnail_height | etag
-------+---------+---------------------+----------+-------+--------+-------------------------------------------------------------------------------+----------------------------+----------------------------+------------------------------------------+--------+--------------+-----------+-----------------+------------------+------
25772 | 4112 | 20190205_144913.jpg | 3233257 | 4032 | 3024 | /uploads/db6033/original/3X/f/f/ff91b693131a925a730f7db0088bc23fd002cc77.jpeg | 2019-02-13 18:44:51.075662 | 2019-02-13 18:44:51.089795 | ff91b693131a925a730f7db0088bc23fd002cc77 | | | jpeg | 666 | 500 |
db6033=# select * from post_uploads where upload_id=25772;
id | post_id | upload_id
--------+---------+-----------
174381 | 279379 | 25772
db6033=# select deleted_at from posts where id=279379;
deleted_at
----------------------------
2019-02-16 06:53:06.366242
db6033=# select * from site_settings ss where NULLIF(ss.value, '')::integer = 25772 and ss.data_type = 18;
id | name | data_type | value | created_at | updated_at
----+------+-----------+-------+------------+------------
(0 rows)
When I look at the code for CleanUpUploads and run it manually, the upload immediately gets excluded from the result set because of the .where("pu.upload_id IS NULL")
clause. So I decided to check where the post_uploads
record is being removed and the only thing I could find is in case the upload record itself is being removed. But when the post is (soft)deleted that post_uploads
entry is still there, and the code doesn’t check if that relates to a post record which has been (soft)deleted.
On one hand, keeping the upload helps when restoring the post.
On the other hand, this can easily be abused.
I think we should update our CleanUpUploads
job to remove uploads only referenced in a soft-deleted post older than 1 week.
@codinghorror / @sam what do you think?
I am traumatised from losing of uploads, my blog just lost a pile of images on early posts due to a bug we had.
I think we should simply start with a rake task people can run to reclaim the space taken up by images on deleted posts (and posts on deleted topics) we can then decide what to do longer term but at least have an outlet here.
I’m going to play the GDPR violin here. It’s not just about reclaiming space - it’s also about being able to remove personal data. Right now many users will think they removed the post including the uploads, while they didn’t. Even soft-deleting the post and hard-deleting the images would not always suffice.
I do understand @sam’s concerns though (although we keep 30 days of incremental backups I fully understand that not every Discourse forum has that luxury) so I was thinking about soft-deleting the images as well:
Would it be hard to deny the download of an image in case all posts it belongs to have been soft-deleted ? Restoring the post would make the image accessible again.
@sam wants removal of uploads from deleted posts to be a manual rake task for now. I don’t have any particular feelings about it.
I object to the GDPR concern, cause if you are asking for ultra strong post removal then posts need to be hard deleted, in that case uploads go as well, so what you are asking for here is a UI hard nuke.
If you must ensure image removal another simple workaround is to edit the image link out of the post
Yeah those were two different solutions, the GDPR concern would indeed need a hard deletion, but if there are objections against that then a soft delete would be a second best.
The image link editing is a cool workaround, thanks for the tip!
I am in a situation where I can’t ignore it. On my instance, our users constantly exchange sensitive data via PM’s such as IDs and documents. They were under the impression that once deleted, they’re gone. I gave them that impression via our rules and terms, as I was mislead by the options within Discourse that state that deleted uploads are removed after X days. It turns out, they’re not. And now this all sits in our database, which will be a disaster if we ever get breached.
Would “hard nuking” as you’ve said would be something that could be implemented via a plugin? It’s not about nuking the post itself, just the attachments/uploads.
First we need the rake task, that is definitely slotted. After we have that we can also add a site setting, default off for this behavior. purge_images_from_deleted_posts_days
something like that. Maybe we can set it to something like 1 year by default. not sure, @codinghorror can decide when we have the setting.
So I can expect this to be implemented or do I need to find my own way?
sam already gave a solution: if you edit the image link out of the post before deleting it then the upload will be removed after the set amount of hours.
Oh sorry, I thought that was only a temporary workaround. OK, thank you.
A post was split to a new topic: How to truly delete topics
Well, given the fact that the proposed implementation is a manual task that would only work on posts that have been removed over 1 year ago, it wouldn’t suit your purpose anyway.
So editing the image link out of the post is the second best solution (actually: the best, since it already exists and works), especially since in your use case those topics are deleted by staff.
I think we would need to diffentiate between different use cases for this:
- regular cleanup / making sure there are no illegal uploads / space issues (the rake task will suffice)
- removal for regulatory (privacy or DMCA) reasons which require uploads to be removed ASAP (editing the image links out of the posts will work, albeit cumbersome)
I think what you want is a plugin that destroys deleted PMs. That’s like the PMs and the uploads