Depending on your configuration, uploads being a shared resource (referenced by posts, user profile, badges, etc…) it’s somewhat risky to delete them without proper due diligence.
Plus they’re automatically cleaned up after a little while if they’re not being referenced anywhere.
The use case is for time critical removal of uploads as part of human in the loop automation workflows using activepieces, data explorer and the API.
The goal is to enable regular moderators to be able to have access to a simple way to confidently fully remove uploads immediately without SSH access and comprehensively cover all known cases for breaking upload references as well as automatically CDN purge the specific urls (and for the case of proxied avatar, the URL prefix based on the username).
I was relieved when testing, to see that avatar, profile background and card background references are automatically handled when an upload is destroyed.
The two main scenarios would be ‘scorched earth’ and ‘surgical removal’. This is the work in progress for scorched earth:
Converting to an upload hash list:
avatar urls (extract username using regex) → get avatar hash via data explorer query (the hash is not in the url)
topic/post urls → collect all upload hashes used in that post using data explorer
direct original/optimised upload urls (incl. profile background and card background) → extract hashes using regex
Then query each hash to find all occurrences (one data explorer query to cover all cases, single hash at a time):
username/id list, for users who used it for an avatar
username/id list, for users who used it for a profile background
username/id list, for users who used it for a card background
list of all posts (raw) that use this upload
Actions:
suspend all users who used the upload for an avatar, profile background or card background
suspend all users who have posts referencing the target upload, but exclude if their reference is nested inside a quote
delete all topics
delete all posts
destroy upload (no API endpoint)
purge all CDN urls (optimised/unoptimised)
purge the standard prefix for proxied avatar urls for each associated username (to cover all sizes)
The surgical removal scenario would essentially be the same, but not suspend associated users and need some changes regarding collecting all the upload hashes from post/topic urls.
Likely still delete the posts/topics themselves to avoid broken references, but stripping the upload markdown for that specific upload (without touching other uploads) would be superior if possible. Like this automation if it didn’t strip all markdown upload references.
Ideally I’d like to be able to block hashes also, so after going through the above, someone can’t just simply create a new account and re-upload.
I don’t think it’s possible currently for a regular mod, like by using watched words. So perhaps doing a periodic scan like the above for a list of hashes would be a way to handle that.
Yes! I pretty much cried tears of joy when seeing your plugin earlier this month haha. Huge thank you for making and sharing it.
Some scenarios I didn’t find a way to cover with it:
Using the upload: search prefix, it didn’t seem possible to find profile backgrounds or card backgrounds (unless someone direct linked those images in a post also).
Hard delete images in that scenario (image with no association to posts).
Similar for Avatar e.g. get the hash of the avatar (it isn’t usually in the URL), then find any other accounts that possibly also used it, if a suspension is needed + delete the upload with no post association.
Nice to haves:
Trigger a webhook on deletion, in order to trigger CDN purges after deletion for all variants of the uploads.
Bulk/automatic delete all posts/topics that have the upload reference when the upload is deleted.
In Discourse (web ui), it seems not possible to delete (unlink) a user avatar, either by an admin/mod or the user themselves (besides uploading a replacement). It is possible for the profile background and banner. But both of these do not destroy the upload immediately and if another unknown user exists that uses the same upload for a part of their profile (avatar, profile background, profile card) it won’t purge later either.
I ended up thinking that it could be worthwhile to try make some automated workflows that can handle most or all of the process (with human review/approval and/or simplified human input). To make it easier to be applied consistently, cover edge cases and ideally minimise the possibility of having non-deleted posts that have dead upload references. Also auto suspend if needed (unless the target upload is inside a quote) and CDN purge.
This is what I had so far for the data explorer queries (not at the ‘glue them together’ part yet), with the goal of covering:
topic urls
post urls
direct image urls (posts, profile background, card background)
avatar urls (proxied)
Prepare Upload Hashes List
username (from proxied avatar url) to upload hash
-- [params]
-- user_id :username
SELECT
up.sha1 AS upload_hash
FROM
users u
JOIN uploads up ON up.id = u.uploaded_avatar_id
WHERE
u.id = :username
post/topic url to list of upload hashes
-- [params]
-- post_id :post_url
SELECT
COALESCE(json_agg(up.sha1), '[]'::json) AS upload_hashes
FROM
upload_references ur
JOIN uploads up ON up.id = ur.upload_id
WHERE
ur.target_id = :post_url
AND ur.target_type = 'Post'
The other upload URL types that I am aware of, have the hash in the URL itself (don’t need to use data explorer to get those hashes).
Gather and prepare all actionable info about an upload hash
-- [params]
-- string :upload_hash
-- string :upload_schemeless_prefix
-- string :cdn_url_prefix
-- string :app_hostname
WITH target_uploads AS (
SELECT
id,
url,
user_id
FROM
uploads
WHERE
sha1 = :upload_hash
),
all_urls AS (
SELECT
url
FROM
target_uploads
UNION
SELECT
url
FROM
optimized_images
WHERE
upload_id IN (
SELECT
id
FROM
target_uploads)
),
post_data AS (
SELECT
p.id AS post_id,
p.topic_id,
u.id AS user_id,
u.username AS author,
p.created_at,
CASE WHEN p.post_number = 1 THEN
TRUE
ELSE
FALSE
END AS is_topic_starter,
CASE WHEN t.archetype = 'private_message' THEN
TRUE
ELSE
FALSE
END AS is_dm,
p.raw
FROM
upload_references ur
JOIN posts p ON p.id = ur.target_id
AND ur.target_type = 'Post'
JOIN topics t ON t.id = p.topic_id
JOIN users u ON u.id = p.user_id
WHERE
ur.upload_id IN (
SELECT
id
FROM
target_uploads)
AND p.deleted_at IS NULL
)
SELECT
(
SELECT
COALESCE(json_agg(id), '[]'::json)
FROM
target_uploads) AS upload_ids,
(
SELECT
COALESCE(json_agg(url), '[]'::json)
FROM
all_urls) AS upload_s3_schemeless_urls,
(
SELECT
COALESCE(json_agg(
CASE WHEN url LIKE '//%' THEN
REPLACE(url, :upload_schemeless_prefix, :cdn_url_prefix)
ELSE
:cdn_url_prefix || url
END), '[]'::json)
FROM
all_urls) AS upload_cdn_urls,
(
SELECT
COALESCE(json_agg(json_build_object('user_id', id, 'username', username)), '[]'::json)
FROM
users
WHERE
uploaded_avatar_id IN (
SELECT
id
FROM
target_uploads)) AS avatar_users,
(
SELECT
COALESCE(json_agg('https://' || :app_hostname || '/user_avatar/' || :app_hostname || '/' || username || '/'), '[]'::json)
FROM
users
WHERE
uploaded_avatar_id IN (
SELECT
id
FROM
target_uploads)) AS avatar_proxied_url_prefixes,
(
SELECT
COALESCE(json_agg(json_build_object('user_id', u.id, 'username', u.username)), '[]'::json)
FROM
user_profiles up
JOIN users u ON u.id = up.user_id
WHERE
up.profile_background_upload_id IN (
SELECT
id
FROM
target_uploads)) AS profile_background_users,
(
SELECT
COALESCE(json_agg(json_build_object('user_id', u.id, 'username', u.username)), '[]'::json)
FROM
user_profiles up
JOIN users u ON u.id = up.user_id
WHERE
up.card_background_upload_id IN (
SELECT
id
FROM
target_uploads)) AS card_background_users,
(
SELECT
COALESCE(json_agg(pd.topic_id), '[]'::json)
FROM
post_data pd
WHERE
pd.is_topic_starter = TRUE) AS topic_ids,
(
SELECT
COALESCE(json_agg(pd.post_id), '[]'::json)
FROM
post_data pd
WHERE
pd.is_topic_starter = FALSE) AS post_ids,
(
SELECT
COALESCE(json_agg(json_build_object('post_id', pd.post_id, 'topic_id', pd.topic_id, 'user_id', pd.user_id, 'author', pd.author, 'created_at', pd.created_at, 'is_topic_starter', pd.is_topic_starter, 'is_dm', pd.is_dm, 'raw', pd.raw)), '[]'::json)
FROM
post_data pd) AS post_details
Does not cover: pending review queue topics/posts or draft posts
Outputs:
upload_ids (should only be one)
upload_s3_schemeless_urls
upload_cdn_urls
avatar_users
avatar_proxied_url_prefixes
profile_background_users
card_background_users
topic_ids
post_ids
post_details
That will provide the required info to use these APIs selectively: