Adding a stable way of adding whitelisted "orphaned" uploads in a plugin

There are some situations in which people want to have uploads kept in Discourse that are not strictly associated with existing Discourse models and properties that support uploads, i.e. “whitelisted” orphaned uploads.

For example some people want to use the Custom Wizard plugin to upload content associated with a topic (not a post per se).

The issue is that the clean_up_uploads job removes all orphaned uploads. It’s exceptions are limited to urls in specific site settings.

You could of course turn off clean_up_uploads, or increase the clean_orphan_uploads_grace_period_hours, which is what I’m sometimes forced to do. But this is not ideal.

Maybe there’s another way of “whitelisting” orphaned uploads from a plugin that I’m not seeing, but if not, it’d be great to add some mechanism of doing that into the upload cleanup process.

I’d be happy to prepare a PR along these lines, but I’m wondering if this is something others see as an issue.

11 Likes

This is so clunky

https://github.com/discourse/discourse/blob/32b8a2ccffd50d78b8f7be6f0b524506cd2dbd7e/app/jobs/scheduled/clean_up_uploads.rb#L53-L66

For reviewables @eviltrout introduced a general mechanism where arbitrary objects could be linked to a reviewable.

Moving to a similar system for uploads would be like moving multiple mountains. It would be a nice mountain to move though cause this current system is pretty fragile.

Easiest workaround you have for now is to create a hidden site setting with a data_type for :upload. Then put it in the site settings table.

8 Likes

Yeah, I was thinking this. I didn’t love it as a solution as I’d need to create different a setting for each upload, which feels like an abuse of the site_settings table.

As an interim solution, before the mountain is moved, would you mind if made a PR to add an extra exception to the big “exceptions” query that uses plugin_store_row with a plugin_name like whitelisted_orphan_upload_id?

3 Likes

This query is already SOOOO slow, I worry that another join here is going to make it very painful.

I am not sure but maybe we start by preparing the new table in core for UploadRefrence(object_id, object_type, upload_id) and joining into that, at least it is cheap and then we can move stuff to it. There are quite a few easy moves, only post uploads would be tricky…er.

I think this is work I would prefer the team here to take, it is very fiddly.

Curious to know what @zogstrip and @eviltrout think

4 Likes

Are you talking about Rails’ single table inheritance? It’s a really good fit for reviewables because there is core logic to a reviewable that they all share, but then extra fields.

In this case though, it seems like the problem is we’re putting upload ids all over the place, and not deleting the uploads when they’re cleared out?

Is it unsafe to delete an upload if a user removes it from their profile, because the same upload sha1 could be used elsewhere in the app? If that’s the case I think @sam’s suggestion of an UploadReference table is a good solution.

If uploads are never shared this way, I think a better solution would be to delete when those fields are NULLd out.

2 Likes

That’s exactly why we need the UploadReference table :+1:

I actually think that’s a very good task to learn more about Discourse internals. Might be a good task for @CvX, starting early October. We’ll do some pair programming to get things started and properly split into several “small” tasks.

9 Likes

Hello!

I’m very interested in an UploadReference table. We are in a situation where we have a lot of uploads in plugins that would be considered to be orphan uploads. Our solution is simply to turn off the clean_up_uploads. I checked the other day and we now have 10GB of orphan uploads that would need to be removed manually with some script that also figures out if we reference any of them in our plugins. A solution like the one discussed here would help out a lot for us.

David

Also, on the topic. We reference some uploads purely with html img tags in some cases. This seems to interrupt the baking of these posts and other thing like hyperlinks don’t get cooked. I see that uploads are generally referenced in a upload://.png type manner. My question is, where can this string be found? It’s not stored on the Upload from what I can see.

1 Like

It’s called the short url and you can find how it’s computed in the Upload model

https://github.com/discourse/discourse/blob/7fc99f5e7b4fb4b60c778c5458e32853e00e8063/app/models/upload.rb#L385-L387

2 Likes

Oh thanks! I guess I didn’t look close enough :slight_smile:

1 Like

@angus
We recently merged this PR which adds a upload_references table to address the concerns raised in the OP.

7 Likes

Just a note that this was implemented in the Custom Wizard plugin. Thank you for adding in this system!

2 Likes