The Akismet spam filter is quite good at finding duplicate posts between the Docker forum and, say, Stack Exchange sites (mostly Stack Overflow), GitHub and Reddit. These posts end up in the review queue, but the review doesn’t reveal where Akismet may have found the matching post.
I guess I wanted to ask if the Akismet plugin could be configured to show URLs of other occurrences. But actually, I want more…
Often, just copy/pasting part of the text into Google reveals the source after all. On the Docker forums I then tend to reject the flag (approving the duplicate post) and add a staff notice for the volunteers who are answering questions.
So, wondering: did anyone ever try to automate something similar?
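To make the idea a bit more concrete, here is a minimal sketch of just the comparison step, assuming the candidate texts (for example, search results from Stack Overflow or Reddit) have already been fetched by some other means. The function names, the 0.6 threshold and the notice wording are all made up for illustration; nothing here calls Akismet or Discourse, and it says nothing about how Akismet itself works.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_cross_post(post_text, candidates, threshold=0.6):
    """Return (url, score) of the most similar candidate, or None.

    candidates is a dict mapping URL -> text. How those texts are
    obtained is out of scope here (hypothetical input).
    The threshold is an arbitrary assumption, not a tuned value.
    """
    post = shingles(post_text)
    best_url, best_score = None, 0.0
    for url, text in candidates.items():
        score = jaccard(post, shingles(text))
        if score > best_score:
            best_url, best_score = url, score
    if best_url is not None and best_score >= threshold:
        return best_url, best_score
    return None


def staff_notice(url):
    """Build a hypothetical staff notice text pointing at the other copy."""
    return (
        f"This question was also posted at {url}. "
        "Please check for answers there before replying."
    )
```

A reviewer (or a script) could then pass the flagged post and a handful of candidate pages to `find_cross_post` and, on a match, paste the output of `staff_notice` into the notice by hand. Posting the notice automatically would need the forum's own API, which this sketch deliberately leaves out.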
Asides:
I also tend to post a link back to the forum on Stack Overflow; that’s a manual action anyway. So even if this were automated, one would probably still want to be notified.
I quite often use the same approach for “New user typed their first post suspiciously fast, suspected bot or spammer behavior.” which is not detected (or marked) as a duplicate by Akismet (yet).
I didn’t know Akismet filtered for duplicate copies online; I’m guessing it was the inclusion of certain markup used in those examples that triggered Akismet.
I can’t find any mention of Akismet providing that service. Could you provide guidance? If that information is available via their service, maybe we can tap into it.
Hmmm, you may be right. I boldly assumed that Stack Exchange was also using Akismet (which I do not actually know). I think, but will need to find examples, that I also saw the review being triggered for existing posts after they were duplicated to Stack Exchange. Most often, though, it seems the Stack Exchange post was older, which also explains why the copy-paste triggered the “typed their first post suspiciously fast” review.
Also, for some time, we definitely saw many false positives after posts were edited. This made me assume the filter was confused by its own duplicate detection, not understanding that the duplicate in some online database was the very same post on the very same forum. When searching for the cause of this, I did not find any references in Akismet’s services.
So, many assumptions. I’ll try to find some examples, but maybe even more posts are duplicated between the forum and other places, and maybe I’ve only found a few of them after all.
Of course, Akismet could still subscribe to the public feed of Stack Exchange posts, but it’s not their goal to find duplicates. (Or maybe the Stack Exchange duplicates that Akismet flagged were also duplicated elsewhere. Oh well.)
@maiki I’ve not run into any examples that confirm this actually happened. Akismet definitely flagged existing posts as spam after some time had passed, but I’ve no clue about its internals, so I can’t determine why.