Pasted images garbage collection

(Marco) #1

I wonder what happens to an image pasted (or uploaded) in the editor when its <img ...> code is later deleted manually. Is the referenced file deleted too?

And what happens if I paste the same image into two different posts and then delete the <img...> code from one of them? Do both posts reference the same file? Is the image deleted only when the last reference to it is deleted?

(Jens Maier) #2

Discourse keeps track of references to uploaded files. When no remaining post includes a link or reference to an upload, an automated background job moves the file to a tombstone area. Another background job periodically cleans out the tombstone area, permanently deleting any file that has been tombstoned for at least one month.
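The two-stage scheme described above (reference tracking, then tombstoning, then purging after a grace period) can be sketched as a toy in-memory model. This is not Discourse's code: the class, its method names and the fixed 30-day grace period are assumptions chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed grace period standing in for "at least one month".
TOMBSTONE_GRACE = timedelta(days=30)


@dataclass
class UploadStore:
    # Hypothetical reverse index: upload id -> ids of posts that reference it.
    references: dict[str, set[int]] = field(default_factory=dict)
    live: set[str] = field(default_factory=set)
    # Upload id -> time the first job moved it to the tombstone area.
    tombstone: dict[str, datetime] = field(default_factory=dict)

    def add_upload(self, upload_id: str) -> None:
        self.live.add(upload_id)
        self.references.setdefault(upload_id, set())

    def link(self, upload_id: str, post_id: int) -> None:
        self.references.setdefault(upload_id, set()).add(post_id)

    def unlink(self, upload_id: str, post_id: int) -> None:
        self.references.get(upload_id, set()).discard(post_id)

    def tombstone_unreferenced(self, now: datetime) -> None:
        """First job: move live uploads that no post references to the tombstone area."""
        for upload_id in list(self.live):
            if not self.references.get(upload_id):
                self.live.remove(upload_id)
                self.tombstone[upload_id] = now

    def purge_tombstone(self, now: datetime) -> list[str]:
        """Second job: permanently delete uploads tombstoned for the full grace period."""
        expired = [u for u, t in self.tombstone.items() if now - t >= TOMBSTONE_GRACE]
        for upload_id in expired:
            del self.tombstone[upload_id]
            self.references.pop(upload_id, None)
        return expired
```

An upload stays live as long as any post still links to it; only after the last `unlink` does the first job tombstone it, and only a later run of the second job, past the grace period, removes it for good.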

The URL you see in the <img> tag contains the uploaded file’s SHA1 hash. If two uploaded files have the same hash (which, for all intents and purposes, means they are identical), they get the same URL and are stored as a single file on the server’s hard disk. That file is deleted only once all references to it have been removed from posts, regardless of whether an <img> tag was copied or the same file was uploaded a second time.
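Content addressing by hash, as described above, can be illustrated with a minimal sketch: identical bytes produce the same SHA1 digest, so they collapse into one stored file, and that file goes away only when its last reference does. The class and method names are hypothetical, the real URL layout is not modeled, and the tombstone step is skipped for brevity.

```python
import hashlib


class ContentAddressedStore:
    """Keeps one copy of each distinct file, keyed by the SHA1 hex digest of its bytes."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        # digest -> ids of posts whose <img> tag points at that file
        self.refs: dict[str, set[int]] = {}

    def put(self, data: bytes, post_id: int) -> str:
        digest = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical bytes are stored only once
        self.refs.setdefault(digest, set()).add(post_id)
        return digest  # a real URL would embed this digest; the path layout is not modeled

    def drop_reference(self, digest: str, post_id: int) -> bool:
        """Remove one post's reference; delete the file only when none remain."""
        posts = self.refs.get(digest, set())
        posts.discard(post_id)
        if posts:
            return False
        self.refs.pop(digest, None)
        self.blobs.pop(digest, None)  # tombstone step skipped for brevity
        return True
```

Uploading the same image into two posts returns the same digest both times and stores one blob; removing the tag from the first post returns `False` and keeps the file, and removing it from the second returns `True` and deletes it. Copying an existing <img> tag would correspond to adding another post id under the same digest.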

(Dave McClure) #3

Though deleted posts now remain visible to staff, so I don’t think deleting a post actually removes the reference.

I think the only case where images are truly orphaned now is when a draft post is abandoned before being published.

(Jens Maier) #4

Hm, you’re right. I’ve only taken a quick glance at the code, but it seems that an upload will never be deleted at all once a post referencing it has been entered into the database. Even when a post is eventually trash!'ed, the PostUpload reverse index entries remain, which prevents the CleanUpUploads job from deleting anything.

Or did I miss something?
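The effect described in the last post can be sketched with a toy model: a soft delete only marks the post, so its reverse-index rows (standing in for the PostUpload entries) survive, and a cleanup that counts every row never sees the upload as unreferenced. Everything here is hypothetical and in-memory; the second query is shown only for contrast and would delete images that staff can still see in deleted posts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Post:
    id: int
    deleted_at: Optional[str] = None  # set by a soft delete; the row itself is kept


@dataclass
class Forum:
    posts: dict[int, Post] = field(default_factory=dict)
    # Hypothetical reverse index of (post_id, upload_id) rows.
    post_uploads: set[tuple[int, str]] = field(default_factory=set)

    def trash(self, post_id: int, when: str) -> None:
        # Soft delete: mark the post but leave its reverse-index rows untouched.
        self.posts[post_id].deleted_at = when

    def unreferenced_uploads(self, uploads: set[str]) -> set[str]:
        # Any reverse-index row counts as a reference, trashed post or not.
        referenced = {u for _, u in self.post_uploads}
        return uploads - referenced

    def unreferenced_ignoring_trashed(self, uploads: set[str]) -> set[str]:
        # Contrast: only rows belonging to non-deleted posts count as references.
        live_refs = {u for p, u in self.post_uploads if self.posts[p].deleted_at is None}
        return uploads - live_refs
```

After trashing the only post that links to an image, `unreferenced_uploads` still returns an empty set, which matches the observation that nothing gets cleaned up; `unreferenced_ignoring_trashed` would report the image as collectable.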