Then make sure a hidden staff post links to the files. (There is a private staff category in every install.) And you should be fine.
I have just one new question in mind that is troubling me.
How does Discourse handle illegal hosting done by users through the PM system? What if a user uploads large files via PM and then links to them from an external source? Will the cleanup task delete them or not?
Good question! Illegal activity in Messages is another reason admins need visibility into them, cc: @downey.
There is not a way at the moment to browse the largest file uploads but that would be a good thing to have eventually.
I am glad you agree here. I would be more than pleased to have this option added in a future update.
“Illegal” is going to depend on the jurisdiction. For example, in the United States, the Online Copyright Infringement Liability Limitation Act provides safe harbor for service providers (such as someone running a Discourse site) against actions of their users. As long as the service provider responds to notice of infringing activity, and can produce good records of what that user did or did not do (emphasis hint for @codinghorror), and how they responded to that user and that user’s infringing content, then they are generally not liable for those actions.
In other words, it may not (or may, depending on where you are) be the job of the service provider to police and snoop on its users’ activities.
Sure, Nginx or Apache server logs would exist in all cases for all websites. So you are covered. This is real basic web stuff.
Whether the uploads are illegal or not, the site owner might not appreciate a user who signs up, achieves trust level 1 (necessary to message at all) and then posts 1 GB of files per day in PMs.
Still no such feature? How can I remove an uploaded image?
Uploaded files that aren’t used anywhere are automatically removed.
Are you talking about an image that is used somewhere?
This is good news, thank you. Can I delete them manually?
For the avoidance of doubt, when a post is edited and the only instance anywhere of an image/dropbox linked image is removed, is that image classed as not being used anywhere? Currently it is viewable when reviewing the edit history, something I wasn’t expecting.
Those are two different things.
An image that has been uploaded has an Upload record in the database.
A “dropbox linked image” is a link to an image that is “served” to the clients (browsers) by dropbox.
The automated cleanup is only about uploaded images. Not linked ones.
So an image with an Upload record that was never actually posted, or was edited out of a comment, would be deleted automatically. Is that correct?
The png above is coming from a dropbox link. My previous experience with a .gif was that upon deleting the underlying link the image persisted in the comment. If I edited the post and removed the link the image persisted in the original iteration of the post. This post will be edited so I hope it demonstrates what I mean!
This doesn’t seem like it’s being “served”, it seems like it’s being uploaded. Is that not the case?
I deleted the link at the dropbox end and immediately it stopped being viewable in the post.
I re-enabled the link, and instead edited the post to remove the link but in another browser I wasn’t able to view past edits of the post - there is no pencil - to test this:
If I edited the post and removed the link the image persisted in the original iteration of the post.
The behavior in point one runs counter to the post here, if you look through the first post edit history you can see a .gif. The underlying link does not function and the post doesn’t feature it anymore so I’m confused by the differences here and there.
That is correct.
Can’t tell since you ninja edited your post which did not create a new revision. The first version of this post only has “edited.png”.
Which makes me think that you linked the image and did not upload it.
Yes, because you edited your post too quickly. We have a grace period window of 300 seconds here in which if you make an edit, it won’t create a new version of the post.
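To make the grace period concrete, here is a minimal Python sketch of the rule as described above. The function and parameter names are hypothetical and this is not Discourse's actual implementation. It also assumes the window is measured from the time the current version of the post was saved.

```python
from datetime import datetime, timedelta

# Hypothetical model of the edit grace period; not Discourse's real code.
# 300 seconds is the value quoted for this site in the post above.
GRACE_PERIOD = timedelta(seconds=300)


def creates_new_revision(last_version_at: datetime, edited_at: datetime,
                         grace: timedelta = GRACE_PERIOD) -> bool:
    """Return True if an edit at `edited_at` would be stored as a new version."""
    # Assumption: edits inside the window are folded into the existing version.
    return edited_at - last_version_at > grace


posted = datetime(2020, 1, 1, 12, 0, 0)
quick_edit = creates_new_revision(posted, posted + timedelta(seconds=120))
late_edit = creates_new_revision(posted, posted + timedelta(seconds=600))
print(quick_edit, late_edit)
```

Under these assumptions, the edit two minutes after posting leaves no trace in the edit history, while the one ten minutes after posting shows up as a new revision.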
If you look at the raw, you see that the images are merely links to dropbox
Those were not uploaded. Just linked.
The underlying links have long been deleted - shouldn’t that mean these .gifs should cease appearing? Clicking “View Image” from the context menu takes me to a community.signalusers address. Is that expected behavior?
Testing, I’ll edit this out in ~300 seconds and shortly after delete the link.
The link is deleted but the image persists in the edit history. Perhaps it doesn’t have an Upload record, as it isn’t removed by the automatic cleanup.
It’s hosted at https://d11a6trkgmumsb.cloudfront.net/original/3X/1/0/101f03af29f12ea30e1226eb96a02c3ed2f6d2ef.png. Not dropbox.
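One quick way to tell a linked image from one the site holds itself is to look at the host in the image URL. The following is only a sketch: the set of hosts is an assumption taken from the URL quoted above, the Dropbox URL is a made-up placeholder, and `is_site_hosted` is a hypothetical helper, not part of Discourse.

```python
from urllib.parse import urlparse

# Assumption: hosts that serve this forum's own uploads (taken from the URL above).
SITE_UPLOAD_HOSTS = {"d11a6trkgmumsb.cloudfront.net"}


def is_site_hosted(image_url: str) -> bool:
    """Return True if the image URL points at one of the site's own upload hosts."""
    return urlparse(image_url).hostname in SITE_UPLOAD_HOSTS


cdn_url = ("https://d11a6trkgmumsb.cloudfront.net/original/3X/1/0/"
           "101f03af29f12ea30e1226eb96a02c3ed2f6d2ef.png")
# Placeholder link for illustration only; not a real file.
linked_url = "https://www.dropbox.com/s/example/edited.png"

print(is_site_hosted(cdn_url))
print(is_site_hosted(linked_url))
```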
I suppose, looking about, that the image being held locally is expected behavior when download_remote_images_to_local is enabled. I think that’s the relevant setting.
The automatic cleanup isn’t functioning for this type of upload, as demonstrated in my previous post. Correct me if I’m wrong.
The upload will be deleted if the site setting clean up uploads is enabled, once the number of hours set in clean orphan uploads grace period hours has passed.
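The rule above can be sketched roughly as follows. This is a simplified model, not Discourse's cleanup job: the function name and arguments are hypothetical, the 24-hour value is an arbitrary example rather than a known default, and it assumes the grace period is counted from when the upload was created.

```python
from datetime import datetime, timedelta


def eligible_for_cleanup(created_at: datetime, is_referenced: bool, now: datetime,
                         cleanup_enabled: bool, grace_hours: int) -> bool:
    """Return True if an upload would be removed under the described rule."""
    # Nothing is removed when the setting is off or the upload is still in use.
    if not cleanup_enabled or is_referenced:
        return False
    # Assumption: the grace period runs from the upload's creation time.
    return now - created_at >= timedelta(hours=grace_hours)


now = datetime(2020, 1, 3, 12, 0, 0)
old_orphan = eligible_for_cleanup(datetime(2020, 1, 1), False, now, True, 24)
fresh_orphan = eligible_for_cleanup(datetime(2020, 1, 3, 11), False, now, True, 24)
still_used = eligible_for_cleanup(datetime(2020, 1, 1), True, now, True, 24)
print(old_orphan, fresh_orphan, still_used)
```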
Thanks for the quick response!
clean up uploads sounds like a general setting that would capture all images with an Upload record, is that correct? Not just those present due to download_remote_images_to_local. If true, I should be able to find examples on the site of regular image uploads that aren’t being removed by the automatic cleanup.
Would you mind me asking what clean orphan uploads grace period hours is set to here, so I can offer it as a solution? Or does it come with a default?
If they decide to enable that setting, will they need to do anything to apply it to past posts?
Just for the sake of being explicit, the thinking here is that this isn’t an issue but that a setting needs to be switched on. I just don’t want to go back and say “You need to enable this!” and they say “It is enabled!” I’ll look silly.
I also caught myself frantically looking for a place to browse uploads (I’m familiar with it from MediaWiki), because I just know stuff gets double, triple, and quadruple uploaded. Sometimes I also wonder where a file is that I uploaded a while ago but maybe lost or deleted, so that I can link to it instead of re-uploading it yet again… I guess there is something to be said for a file browser…
I also had to somehow delete an uploaded file. We don’t have the cleanup task enabled as some files come from an import from a different forum software and have not yet been referenced in imported posts correctly. So I needed to find a manual way. The following works but is not pretty …
Make sure the relevant upload is no longer in the current version of any post. That way, Discourse will consider it orphaned and not make trouble when you delete it.
Use the Data Explorer plugin or a different way to query the Discourse database to list orphaned uploads, find the relevant one, and note down its upload_id and filename. Relevant query:
SELECT uploads.id, uploads.user_id, uploads.created_at, uploads.url, uploads.filesize
FROM uploads
LEFT OUTER JOIN post_uploads ON uploads.id = post_uploads.upload_id
WHERE post_uploads.post_id IS NULL
ORDER BY created_at DESC
LIMIT 100
In the database or with the Rails console for Discourse, delete the relevant record from the uploads table by its upload ID. Here I use the Rails console:
Delete the associated file, including all optimized versions (if any; this applies to images), from the file system via SSH. Note the wildcard added before the file extension to also catch optimized versions, which have a suffix here. Of course, adjust the path and file name to your own installation:
cd /path/to/discourse/shared/public/
find . -name '43adade7a4cc64426adb8232a56cb2c3b49fb7c9*.pdf' -type f -delete
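For anyone who wants to see how the orphan query and the wildcard behave before touching a live site, here is a small self-contained Python sketch. It runs against an in-memory SQLite database with a heavily simplified, made-up version of the two tables, so the schema, rows, and file names are all assumptions for illustration and not real Discourse data.

```python
import sqlite3
from fnmatch import fnmatchcase

# Simplified stand-ins for the uploads and post_uploads tables (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uploads (id INTEGER PRIMARY KEY, user_id INTEGER,
                          created_at TEXT, url TEXT, filesize INTEGER);
    CREATE TABLE post_uploads (post_id INTEGER, upload_id INTEGER);
    INSERT INTO uploads VALUES
        (1, 10, '2020-01-01', '/uploads/aaa.png', 1000),
        (2, 11, '2020-01-02', '/uploads/bbb.pdf', 2000);
    -- Only upload 1 is referenced by a post, so upload 2 is an orphan.
    INSERT INTO post_uploads VALUES (100, 1);
""")

# Same shape as the query above: keep uploads with no matching post_uploads row.
orphans = conn.execute("""
    SELECT uploads.id, uploads.url
    FROM uploads
    LEFT OUTER JOIN post_uploads ON uploads.id = post_uploads.upload_id
    WHERE post_uploads.post_id IS NULL
    ORDER BY uploads.created_at DESC
    LIMIT 100
""").fetchall()
print(orphans)

# The wildcard before the extension also matches names that carry a suffix.
files = ["bbb.pdf", "bbb_1_100x100.pdf", "aaa.png"]
matches = [name for name in files if fnmatchcase(name, "bbb*.pdf")]
print(matches)
```

The LEFT OUTER JOIN keeps every upload and fills the post columns with NULL where no post references it, which is why filtering on post_id IS NULL leaves only the orphans.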
Huh! Looks like the image referred to in this post is not captured by these settings:
Why has it not been deleted?
May I also wonder why Discourse “uploads” a linked file such as the Dropbox link here? The point of linking a specific file will often be retaining control over content.