I’d love to hear the experiences of staff of forums who don’t allow AI/LLM generated content. How have you been able to communicate this to users? How do you detect it? How do you approach users who post it anyway?
All thoughts are welcome.
Note: Personally, I’m only interested in the human side of things, that is, the front-end interactions on the site. I’m assuming that blocking crawlers is a lost cause.
Our forum is a spiritual/religious discussion forum. We ban any and all AI generated content.
Most, if not all, AI text can be easily detected just by reading it. Google’s SynthID is cool tech for detecting AI images, and it claims to be able to detect text as well, probably only text written by Gemini, but OpenAI also supports the standard. Being able to detect the text myself is probably an acquired skill, but I appreciate the work being done to respond to the current crisis of not being able to detect AI imagery or text.
Muting/suspensions are still the right way to go for this in my opinion, especially if the account is new. If a random new account joins your site and instantly posts an AI-generated topic, I see no reason why you shouldn’t just suspend the account and block it.
As for the entire scraping dilemma: My site is for internal communication & documentation within a small company at the moment, and I’m planning on using it as a backend for blogging eventually. It was not hard to set up a honeypot to deter the crawlers that opt to ignore the robots.txt files on my domains.
Whenever an AI crawler visits said site, it is led into an infinite maze of spam using the lovely iocaine project, self-hosted with a dataset of roughly 7000 made-up words, some gibberish HTML, random words, and fake news made by an 8B Llama model.
Obviously this is a nuclear “go away” tactic and is not for everybody, but it has been great for me in my goal of stopping LLMs from taking my code or text content. I remember reading a case study Anthropic did about LLM poisoning, but I can’t find the article anymore, so it won’t be attached here. Surely at some point they need to block my domain when they realize the bot has recently sent a cool 5 million requests to it.
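For anyone curious how the "catch the crawlers that ignore robots.txt" part can work in principle, here is a minimal sketch using only Python's standard library. It is not iocaine's configuration or my actual setup: the `/trap/` path, the `route` function, and the `"tarpit"`/`"site"` labels are all made up for illustration. The idea is simply that a path disallowed in robots.txt and never linked for humans should only ever be requested by a bot that ignores the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /trap/ is disallowed for every agent and is
# assumed to be linked nowhere that a human visitor would click.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def route(parser: RobotFileParser, user_agent: str, path: str) -> str:
    """Return "tarpit" when the request hit a path that robots.txt
    told this agent to stay away from, otherwise "site".

    What "tarpit" actually does (a maze generator, a block, a log
    entry) is left to whatever sits behind this check.
    """
    if parser.can_fetch(user_agent, path):
        return "site"
    return "tarpit"


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for requested in ("/blog/hello", "/trap/page1"):
        print(requested, "->", route(rules, "ExampleBot/1.0", requested))
```

In practice this kind of check would more likely live in the reverse proxy than in application code, but the logic is the same.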
(I notice we’re setting aside the question of crawler load, crawlers taking content for training, and the social and economic consequences of the current rapid developments. That’s good.)
For myself, on a low-volume hobby site:
we’re trying to agree and formulate a written policy
we deal with things as they come up
the most egregious examples are essentially spam, so we delete and ban
otherwise, we remonstrate, perhaps in public and perhaps in private, and we may delete posts
A suggested form of guidance might look like this:
‘Owning’ the content of messages that you post (i.e. reading & understanding and not blindly copying & pasting content, regardless of where this comes from).
Trying to answer your own questions to the best of your ability first (e.g. by searching the forum) before starting new threads.
Communicate specifics in a succinct manner so that other users can read & understand in order to help, i.e. avoid long walls of repetitive or irrelevant text, or overly broad statements without sufficient information.
Keep discussions on topic, avoid meta discussions (particularly around use of AI - be that ‘best practice’ or ‘ethics thereof’).
Keep conversations respectful and remember that we have users with different backgrounds, views and opinions.
Have fun! This is meant to be a hobby.
(In our hobby environment, there’s an extra angle, which is use of LLMs within the hobby, which covers a spectrum of possibilities and has both its enthusiasts and its detractors.)