Human-driven copy-paste spam

Let us know if you’re still seeing this today.

Yes, still seeing it.

For a while it died down, but then we recently increased the post edit time setting from 60 minutes to 24 hours, and within the first day we saw another instance of this edit spam.

We’ve been seeing this on our site, too, or at least the precursor to it. Our users and staff have thus far been pretty quick to notice the out-of-place plagiarized material and flag the posts as suspicious before the spammer could edit in the spam links.

Is this something that could be automated for first posts, though? It’d be great to flag new threads that contain exact copies of existing posts.

Not at the moment. Checking one post against the text of literally every other post ever made is … quite expensive.

Any weird non-sequitur posts from new users should be looked at quite skeptically as a rule. This catches it for me most of the time.

This works well for replies, but not as well for new threads. Since our forum offers support, most new users ask questions that look fairly similar to questions that have already been asked. Sometimes the questions are copied from other sites, like Reddit, though a lot are also copied from our own site.

Yeah those are viciously hard to pin down. Have seen a few of them myself on a different Discourse.

Are they exact copies, or modified in any way?

In our case, the bodies have been near-exact copies. Formatting is frequently lost, but not always. Sometimes it’s just been a portion of the post. They’ve also always been new threads. The title, for some reason, has sometimes been the same and sometimes, inexplicably, the original title with the word “name” appended to the end. I agree that this seems human-driven: I think someone is actually pressing Ctrl-C and Ctrl-V to paste it in manually, and this process is lossy. So no, a simple hash check would miss nearly all of these cases.
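To illustrate why an exact hash misses these copies while a fuzzier comparison still catches them, here is a minimal Python sketch. It is not anything Discourse ships: the function names, the 3-word shingle size, and the sample posts are all assumptions made up for illustration. It normalizes away formatting, breaks each post into overlapping word triples, and measures how much of the suspect post is contained in an older one, so a partial copy with lost formatting still scores high.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and replace every run of non-alphanumeric characters with one space."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def exact_hash(text: str) -> str:
    """SHA-256 of the raw text; any change in formatting changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word tuples taken from the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(candidate: str, original: str, k: int = 3) -> float:
    """Fraction of the candidate's shingles that also occur in the original."""
    cand = shingles(candidate, k)
    if not cand:
        return 0.0
    return len(cand & shingles(original, k)) / len(cand)


# Hypothetical posts: the copy lost its bold markup and its second paragraph.
ORIGINAL = (
    "**Help!** My router keeps dropping the connection every few minutes.\n\n"
    "I have already tried rebooting it twice."
)
COPIED = "Help! My router keeps dropping the connection every few minutes."

print(exact_hash(ORIGINAL) == exact_hash(COPIED))  # hashes differ
print(containment(COPIED, ORIGINAL))               # share of the copy found in the original
```

Containment is used rather than a symmetric similarity because the copies are often only a portion of the source post. This sketch compares one pair of posts; it does nothing about the cost of comparing against every post ever made, which would need some kind of index over the shingles.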

We have yet to see this scheme actually come to fruition, whatever the end goal is. In one case we missed a post for 14 days and it never got followed up on. I found it through a shared IP with another account doing the same thing. We also have our spam settings dialed way down because, even with these odd posts, we almost never have any actual spam. TL0s can post outbound links and images immediately. They can edit posts. So why the charade? It’s all very strange.

Edit: ah, the charade is because they not only dodge the automatic spam filter, but they also dodge eyeballs (and flags) from the active community since edits don’t bump the topic unless it’s also the most recent post. Thus to be effective, they not only need to look innocuous, they need to garner a reply.

Same as Matt, ours are mostly exact but commonly with some lost formatting or a subset of the post. We see this type of spam mostly on new threads, but also some off-topic replies on existing threads.

I can’t remember the details, but I believe they sometimes modified the links or URLs that were contained in the original posts, too. I assume it was to work around the new poster link limits, and because nobody wants to spam someone else’s website.

(They’d modify the original links even more when editing the spam in later.)

(Edit: Three Matts!)

Good news! We added a feature to help with this – lower trust levels have less time to edit posts. That should help mitigate the editing-related shenanigans.

TL0 and TL1 users are now limited to 1 day of edit time by default, @jsha @mnordhoff and @mbauman. You may want to ratchet that down even further.

Thank you! I’ve also modified a Data Explorer query to provide a nice table of “stale edits” that admins can occasionally review:

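-- Posts whose own author edited them more than a day after posting
SELECT
    p.id AS post_id,
    p.updated_at,
    (p.updated_at - p.created_at) AS staleness,  -- gap between creation and last update
    u.trust_level,
    (p.cooked LIKE '%href%') AS has_links        -- rendered HTML contains a link
FROM posts p
    JOIN users u
        ON u.id = p.user_id
    JOIN topics t
        ON t.id = p.topic_id
WHERE p.last_editor_id = p.user_id               -- last edit was made by the author
    AND p.self_edits > 0
    AND (p.updated_at - p.created_at) > INTERVAL '1 Day'
    AND p.deleted_at IS NULL                     -- skip deleted posts and topics
    AND t.deleted_at IS NULL
    AND t.archetype = 'regular'                  -- skip private messages
ORDER BY p.updated_at DESC
LIMIT 500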

That’s terrific! Thanks so much. We’d moved up our “post edit time limit” to 1 day (1440 minutes) and will keep it there for now. But I’ve just increased the “tl2 post edit time limit” to 30 days (43200 minutes), which should make things a lot nicer for our active users. Really appreciate the work and thought the Discourse team has put into this.
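For anyone checking the arithmetic on those two settings, here is a rough Python sketch of how the split plays out. It is only an illustration under stated assumptions, not Discourse’s actual implementation: the constant and function names are made up, the values are the ones from this post, and special cases such as staff are ignored.

```python
from datetime import datetime, timedelta

# Values from the post above, in minutes (the constant names are hypothetical).
POST_EDIT_TIME_LIMIT = 1440        # 1 day, applies to TL0 and TL1
TL2_POST_EDIT_TIME_LIMIT = 43200   # 30 days, applies to TL2 and above


def edit_limit_minutes(trust_level: int) -> int:
    """Pick the edit window for a user based on trust level."""
    if trust_level >= 2:
        return TL2_POST_EDIT_TIME_LIMIT
    return POST_EDIT_TIME_LIMIT


def can_still_edit(trust_level: int, created_at: datetime, now: datetime) -> bool:
    """True while the post is still inside its author's edit window."""
    return now - created_at <= timedelta(minutes=edit_limit_minutes(trust_level))


posted = datetime(2020, 1, 1, 12, 0)
two_days_later = posted + timedelta(days=2)
print(can_still_edit(0, posted, two_days_later))  # new user, 2 days after posting
print(can_still_edit(2, posted, two_days_later))  # TL2 user, 2 days after posting
```

With these values, a new account that plants an innocuous post loses the ability to swap in spam after a day, while regulars keep a month to fix typos.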

Fantastic! Let us know how it goes. You could probably reduce the default edit limit to even less than 1 day, now that TL2 and above have a separate setting.
