Upload Deletion via API

I reached for it, but it appears that an API endpoint for deleting an upload doesn’t exist.

Muppet Show: Kermit's Trapeze Fall

Posting it here, because I was not able to find any mentions or suggestions for this, the closest I found was: API: a scope with access to /uploads

In the API key granular scope settings for uploads, I see ‘create’ is available, but ‘delete’ is not.

It seems that a potential work around could be to create a custom automation and trigger it via API.

4 Likes

What’s your use case?

Depending on your configuration, uploads being a shared resource (referenced by posts, user profile, badges, etc…) it’s somewhat risky to delete them without proper due diligence.

Plus they’re automatically cleaned up after a little while if they’re not being referenced anywhere.

3 Likes

The use case is for time critical removal of uploads as part of human in the loop automation workflows using activepieces, data explorer and the API.

The goal is to enable regular moderators to be able to have access to a simple way to confidently fully remove uploads immediately without SSH access and comprehensively cover all known cases for breaking upload references as well as automatically CDN purge the specific urls (and for the case of proxied avatar, the URL prefix based on the username).


I was relieved when testing, to see that avatar, profile background and card background references are automatically handled when an upload is destroyed.

https://github.com/discourse/discourse/commit/e1975e293f2625259e925b4a3c93d88d5acfcaa8

https://github.com/discourse/discourse/commit/38e7b1a0492dd4282c3cd3b1ddb2b3343661d31f


The two main scenarios would be ‘scorched earth’ and ‘surgical removal’. This is the work in progress for scorched earth:


Converting to an upload hash list:

  • avatar urls (extract username using regex) → get avatar hash via data explorer query (the hash is not in the url)

  • topic/post urls → collect all upload hashes used in that post using data explorer

  • direct original/optimised upload urls (incl. profile background and card background) → extract hashes using regex


Then query each hash to find all occurrences (one data explorer query to cover all cases, single hash at a time):

  • username/id list, for users who used it for an avatar

  • username/id list, for users who used it for a profile background

  • username/id list, for users who used it for a card background

  • list of all posts (raw) that use this upload


Actions:

  • suspend all users who used the upload for an avatar, profile background or card background

  • suspend all users who have posts referencing the target upload, but exclude if their reference is nested inside a quote

  • delete all topics

  • delete all posts

  • destroy upload (no API endpoint)

  • purge all CDN urls (optimised/unoptimised)

  • purge the standard prefix for proxied avatar urls for each associated username (to cover all sizes)


The surgical removal scenario would essentially be the same, but not suspend associated users and need some changes regarding collecting all the upload hashes from post/topic urls.

Likely still delete the posts/topics themselves to avoid broken references, but stripping the upload markdown for that specific upload (without touching other uploads) would be superior if possible. Like this automation if it didn’t strip all markdown upload references.


Ideally I’d like to be able to block hashes also, so after going through the above, someone can’t just simply create a new account and re-upload. :melting_face: :coffin:

I don’t think it’s possible currently for a regular mod, like by using watched words. So perhaps doing a periodic scan like the above for a list of hashes would be a way to handle that.

1 Like

Have you checked out our Legal Compliance Plugin ?

It has endpoints to get the uploads for a post and to remove an upload given the upload and post id’s.

It also extends the search functionality to get all posts that contain a specific upload, given the hash.

2 Likes

I’m just linking this for users that’ll read and agree with your post, they might be interested into voting for this (I know you already read it):

2 Likes

Yes! I pretty much cried tears of joy when seeing your plugin earlier this month haha. Huge thank you for making and sharing it. :heart:

Some scenarios I didn’t find a way to cover with it:

  • Using the upload: search prefix, it didn’t seem possible to find profile backgrounds or card backgrounds (unless someone direct linked those images in a post also).
  • Hard delete images in that scenario (image with no association to posts).
  • Similar for Avatar e.g. get the hash of the avatar (it isn’t usually in the URL), then find any other accounts that possibly also used it, if a suspension is needed + delete the upload with no post association.

Nice to haves:

  • Trigger a webhook on deletion, in order to trigger CDN purges after deletion for all variants of the uploads.
  • Bulk/automatic delete all posts/topics that have the upload reference when the upload is deleted.

In Discourse (web ui), it seems not possible to delete (unlink) a user avatar, either by an admin/mod or the user themselves (besides uploading a replacement). It is possible for the profile background and banner. But both of these do not destroy the upload immediately and if another unknown user exists that uses the same upload for a part of their profile (avatar, profile background, profile card) it won’t purge later either.


I ended up thinking that it could be worthwhile to try make some automated workflows that can handle most or all of the process (with human review/approval and/or simplified human input). To make it easier to be applied consistently, cover edge cases and ideally minimise the possibility of having non-deleted posts that have dead upload references. Also auto suspend if needed (unless the target upload is inside a quote) and CDN purge.


This is what I had so far for the data explorer queries (not at the ‘glue them together’ part yet), with the goal of covering:

  • topic urls
  • post urls
  • direct image urls (posts, profile background, card background)
  • avatar urls (proxied)

Prepare Upload Hashes List

username (from proxied avatar url) to upload hash

-- [params]
-- user_id :username
SELECT
    up.sha1 AS upload_hash
FROM
    users u
    JOIN uploads up ON up.id = u.uploaded_avatar_id
WHERE
    u.id = :username

post/topic url to list of upload hashes

-- [params]
-- post_id :post_url
SELECT
    COALESCE(json_agg(up.sha1), '[]'::json) AS upload_hashes
FROM
    upload_references ur
    JOIN uploads up ON up.id = ur.upload_id
WHERE
    ur.target_id = :post_url
    AND ur.target_type = 'Post'

The other upload URL types that I am aware of, have the hash in the URL itself (don’t need to use data explorer to get those hashes).


Gather and prepare all actionable info about an upload hash

example params:

upload_schemeless_prefix:
//my-s3-bucket.somehost.example.com

cdn_url_prefix:
https://cdn.example.com

app_hostname:
www.example.com

upload_hash:
0000000000000000000000000000000000000000

hash search & summary

-- [params]
-- string :upload_hash
-- string :upload_schemeless_prefix
-- string :cdn_url_prefix
-- string :app_hostname
WITH target_uploads AS (
    SELECT
        id,
        url,
        user_id
    FROM
        uploads
    WHERE
        sha1 = :upload_hash
),
all_urls AS (
    SELECT
        url
    FROM
        target_uploads
    UNION
    SELECT
        url
    FROM
        optimized_images
    WHERE
        upload_id IN (
            SELECT
                id
            FROM
                target_uploads)
),
post_data AS (
    SELECT
        p.id AS post_id,
        p.topic_id,
        u.id AS user_id,
        u.username AS author,
        p.created_at,
        CASE WHEN p.post_number = 1 THEN
            TRUE
        ELSE
            FALSE
        END AS is_topic_starter,
        CASE WHEN t.archetype = 'private_message' THEN
            TRUE
        ELSE
            FALSE
        END AS is_dm,
        p.raw
    FROM
        upload_references ur
        JOIN posts p ON p.id = ur.target_id
            AND ur.target_type = 'Post'
        JOIN topics t ON t.id = p.topic_id
        JOIN users u ON u.id = p.user_id
    WHERE
        ur.upload_id IN (
            SELECT
                id
            FROM
                target_uploads)
            AND p.deleted_at IS NULL
)
SELECT
    (
        SELECT
            COALESCE(json_agg(id), '[]'::json)
        FROM
            target_uploads) AS upload_ids,
    (
        SELECT
            COALESCE(json_agg(url), '[]'::json)
        FROM
            all_urls) AS upload_s3_schemeless_urls,
    (
        SELECT
            COALESCE(json_agg(
                    CASE WHEN url LIKE '//%' THEN
                        REPLACE(url, :upload_schemeless_prefix, :cdn_url_prefix)
                    ELSE
                        :cdn_url_prefix || url
                    END), '[]'::json)
        FROM
            all_urls) AS upload_cdn_urls,
    (
        SELECT
            COALESCE(json_agg(json_build_object('user_id', id, 'username', username)), '[]'::json)
        FROM
            users
        WHERE
            uploaded_avatar_id IN (
                SELECT
                    id
                FROM
                    target_uploads)) AS avatar_users,
    (
        SELECT
            COALESCE(json_agg('https://' || :app_hostname || '/user_avatar/' || :app_hostname || '/' || username || '/'), '[]'::json)
        FROM
            users
        WHERE
            uploaded_avatar_id IN (
                SELECT
                    id
                FROM
                    target_uploads)) AS avatar_proxied_url_prefixes,
    (
        SELECT
            COALESCE(json_agg(json_build_object('user_id', u.id, 'username', u.username)), '[]'::json)
        FROM
            user_profiles up
            JOIN users u ON u.id = up.user_id
        WHERE
            up.profile_background_upload_id IN (
                SELECT
                    id
                FROM
                    target_uploads)) AS profile_background_users,
    (
        SELECT
            COALESCE(json_agg(json_build_object('user_id', u.id, 'username', u.username)), '[]'::json)
        FROM
            user_profiles up
            JOIN users u ON u.id = up.user_id
        WHERE
            up.card_background_upload_id IN (
                SELECT
                    id
                FROM
                    target_uploads)) AS card_background_users,
    (
        SELECT
            COALESCE(json_agg(pd.topic_id), '[]'::json)
        FROM
            post_data pd
        WHERE
            pd.is_topic_starter = TRUE) AS topic_ids,
    (
        SELECT
            COALESCE(json_agg(pd.post_id), '[]'::json)
        FROM
            post_data pd
        WHERE
            pd.is_topic_starter = FALSE) AS post_ids,
    (
        SELECT
            COALESCE(json_agg(json_build_object('post_id', pd.post_id, 'topic_id', pd.topic_id, 'user_id', pd.user_id, 'author', pd.author, 'created_at', pd.created_at, 'is_topic_starter', pd.is_topic_starter, 'is_dm', pd.is_dm, 'raw', pd.raw)), '[]'::json)
        FROM
            post_data pd) AS post_details

Does not cover: pending review queue topics/posts or draft posts

Outputs:

upload_ids (should only be one)
upload_s3_schemeless_urls
upload_cdn_urls
avatar_users
avatar_proxied_url_prefixes
profile_background_users
card_background_users
topic_ids
post_ids
post_details

That will provide the required info to use these APIs selectively:

And for further actions:

  • Purge all CDN urls (all variants, optimised and original etc.)
  • Purge prefix urls of proxied avatars (to cover all sizes)

Automatically handled by Discourse:

  • Remove upload reference from all user profiles (avatar, profile background and card background) immediately when an upload is destroyed/deleted.
1 Like