Human-driven copy-paste spam

codinghorror · June 8, 2019, 8:02pm

Let us know if you’re still seeing this today.

jsha · June 9, 2019, 6:51pm

Yes, still seeing it.

For a while it died down, but then we recently increased the post edit time setting from 60 minutes to 24 hours, and within the first day we saw another instance of this edit spam.

mbauman · August 23, 2019, 2:48pm

We’ve been seeing this on our site, too, or at least the precursor to it. Our users and staff have thus far been pretty quick to notice the out-of-place plagiarized material and flag the posts as suspicious before the spammer could edit spam links into them.

Is this something that could be automated for first posts, though? It’d be great to flag new threads that contain exact copies of existing posts.

codinghorror · August 23, 2019, 11:35pm

Not at the moment; checking one post against the text of literally every other post ever made is … quite expensive.

Any weird non-sequitur posts from new users should be looked at quite skeptically as a rule. This catches it for me most of the time.

jsha · August 30, 2019, 1:33am

This works well for replies, but not as well for new threads. Since our forum offers support, most new users are asking questions that look fairly similar to other questions that have been asked (and sometimes the questions are copied from other sites, like reddit, though a lot are also copied from our own site).

codinghorror · August 30, 2019, 1:59am

Yeah those are viciously hard to pin down. Have seen a few of them myself on a different Discourse.

Are they exact copies, or modified in any way?

mbauman · August 30, 2019, 5:29am

In our case, the body has been near-exact copies. Frequently formatting is lost, but not always. Sometimes it’s just been a portion of the post. They’ve also always been new threads. The title, for some reason, has sometimes been the same and sometimes it’s inexplicably been the original title with the word “name” appended to the end. I agree that this seems human driven — I think someone is actually control-c, control-v’ing it in manually, and this process is lossy. So no, a simple hash check isn’t gonna work in nearly all cases.
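To illustrate why an exact hash misses these copies while a fuzzier comparison might not, here is a minimal Python sketch. It is not anything Discourse implements: the function names, the sample posts, and the choice of 3-word shingles are all assumptions made for illustration. It strips formatting, then measures what fraction of the suspect post's word triples also appear in an existing post, so lost formatting and partial copies still score high.

```python
import hashlib
import re


def normalize(text):
    """Lowercase, replace anything that is not a letter, digit, or space, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in the normalized text."""
    words = normalize(text).split()
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(candidate, original, k=3):
    """Fraction of the candidate's shingles that also occur in the original.

    Containment (rather than symmetric Jaccard similarity) is used so that
    a post copying only a portion of the original still scores close to 1.
    """
    cand = shingles(candidate, k)
    if not cand:
        return 0.0
    return len(cand & shingles(original, k)) / len(cand)


# Hypothetical sample posts: the copy lost its formatting and its last clause.
original = (
    "**Help:** my certificate renewal fails with a `timeout` error.\n\n"
    "I am using nginx on Ubuntu, and port 80 is open."
)
copied = "my certificate renewal fails with a timeout error. I am using nginx on Ubuntu"
unrelated = "Does anyone know a good recipe for sourdough bread at high altitude?"

same_hash = (
    hashlib.sha256(original.encode()).hexdigest()
    == hashlib.sha256(copied.encode()).hexdigest()
)
copy_score = containment(copied, original)
unrelated_score = containment(unrelated, original)

print(same_hash, copy_score, unrelated_score)
```

This only scores one pair of posts. Running it against every existing post would still hit the cost problem mentioned earlier in the thread, so a real check would need some kind of index over the shingles.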

We have yet to see this scheme actually come to fruition, whatever that is. In one case we missed a post for 14 days and it never got followed up on. I found it through a shared IP with another account doing the same thing. We also have our spam settings dialed way down because, even with these odd posts, we almost never have any actual spam. TL0s can post outbound links and images immediately. They can edit posts. So why the charade? It’s all very strange.

Edit: ah, the charade is because they not only dodge the automatic spam filter, but they also dodge eyeballs (and flags) from the active community since edits don’t bump the topic unless it’s also the most recent post. Thus to be effective, they not only need to look innocuous, they need to garner a reply.

jsha · August 30, 2019, 5:52pm

Same as Matt, ours are mostly exact but commonly with some lost formatting or a subset of the post. We see this type of spam mostly on new threads, but also some off-topic replies on existing threads.

mnordhoff · August 30, 2019, 5:58pm

I can’t remember the details, but I believe they sometimes modified the links or URLs that were contained in the original posts, too. I assume it was to work around the new poster link limits, and because nobody wants to spam someone else’s website.

(They’d modify the original links even more when editing the spam in later.)

(Edit: Three Matts!)

codinghorror · September 7, 2019, 10:04am

Good news! We added a feature to help with this – lower trust levels have less time to edit posts. That should help mitigate the editing-related shenanigans.

As you can see, TL0 and TL1 users are limited to 1 day of edit time by default now @jsha @mnordhoff and @mbauman – you may want to ratchet that down even further.

mbauman · September 7, 2019, 5:27pm

Thank you! I’ve also modified a Data Explorer query to provide a nice table of “stale edits” that admins can occasionally review:

-- "Stale edits": posts whose author edited them more than a day after posting
SELECT
    p.id AS post_id,
    p.updated_at,
    (p.updated_at - p.created_at) AS staleness,
    u.trust_level,
    (p.cooked LIKE '%href%') AS has_links  -- rendered HTML contains a link
FROM posts p
    JOIN users u
        ON u.id = p.user_id
    JOIN topics t
        ON t.id = p.topic_id
WHERE p.last_editor_id = p.user_id  -- last edit was made by the author
    AND p.self_edits > 0
    AND (p.updated_at - p.created_at) > INTERVAL '1 Day'
    AND p.deleted_at IS NULL
    AND t.deleted_at IS NULL
    AND t.archetype = 'regular'  -- skip private messages
ORDER BY p.updated_at DESC
LIMIT 500

jsha · November 6, 2019, 12:13am

That’s terrific! Thanks so much. We’d moved up our “post edit time limit” to 1 day (1440 minutes) and will keep it there for now. But I’ve just increased the “tl2 post edit time limit” to 30 days (43200 minutes), which should make things a lot nicer for our active users. Really appreciate the work and thought the Discourse team has put into this.

codinghorror · November 6, 2019, 12:57am

Fantastic! Let us know how it goes. I suggest you could probably reduce the default edit limit to even less than 1 day, now that TL2 and above have a separate setting.
