Cleaning up uploads and purging uploads from S3

Enabling “clean up uploads” sounds scary with the warning message. When converting an existing forum to discourse this setting will be disabled. Not all import scripts will properly register all the uploads in posts, so if you enable it, you might lose a lot of attachments.

With the following query you can check if uploads are properly referenced by the posts:

select p.post_id, u.id as upload_id
from (select id post_id, (regexp_matches(cooked, 'data-download-href=[^\s]+/default/([a-z0-9]+)', 'g'))[1] upload_sha from posts where raw like '%upload://%' order by created_at) as p 
join uploads u on u.sha1 = p.upload_sha
where not exists(select * from upload_references r where r.upload_id = u.id)

That should not return any rows if everything is correct. If you use this query in the Data Explorer plugin it will also neatly link to the posts which have unreferenced attachments.

If the above query does return results you can fix the missing upload references with the following query:

insert into upload_references(upload_id, target_type, target_id, created_at, updated_at)
select u.id, 'Post', p.post_id, u.created_at, u.updated_at 
from (select id post_id, (regexp_matches(cooked, 'data-download-href=[^\s]+/default/([a-z0-9]+)', 'g'))[1] upload_sha from posts where raw like '%upload://%' order by created_at) as p 
join uploads u on u.sha1 = p.upload_sha
on conflict do nothing;

You will need direct database access in order to make the corrective change.