How are we all feeling about ChatGPT and other LLMs and how they'll impact forums?

In a quasi-related issue, one of the WSJ columnists put Hardee’s chatbot drive-through ordering system through 30 tests, and it apparently did a pretty good job: only 3 had to be referred to humans for a response.

Can you link to the announcement?
It would give those of us out of the (hyper fast) loop a bit of context :slight_smile:

Perfect, thank you @RGJ :slight_smile:

It seems it’s specifically about this commitment:

[Image: Screenshot 2023-07-27 at 13.00.12]

So I think this really is up to the companies to facilitate. But watermarking text is pretty much impossible, like @merefield mentioned above.

What would you expect Discourse to do in this case, @MikeNolan? If a user simply copy-pastes AI-generated text, there is no way for Discourse to know about it (apart from running spam and AI detectors), so I don’t really see how this specific agreement changes anything for now.

User-pasted AI-generated content is probably not something Discourse can do much with, as it is likely indistinguishable from human-generated content (aside from possibly being better-written), but if you use an official Discourse AI plugin, perhaps Discourse can do something about watermarking or otherwise denoting what it generates?

Ah in that way, yes, I can see how that makes sense :slight_smile:

We’ve started work on this. For example, this very topic’s summary is watermarked:

The summarization UI has received the most attention so far, so it’s where we are already close to the final form and have this set up. Others will follow.

Maybe a bit of a semantic point, but two properties of digital watermarks are that they are hidden from the casual viewer and hard to remove.

I would think that OPEN acknowledgement of AI-generated content is important, both for text and for images.

Hidden digital signatures are more useful for things like image copyright enforcement.

I’m active on the Ugly Hedgehog photography forum; whether or not AI-generated or AI-modified images qualify as photographs is a hotly discussed topic there. (Some AI-generated images have won photography contests.)

The problem we’re discussing right now is that people with malicious intent will use AI to generate things and then remove the acknowledgement and try to stage it as human-generated content. That implies the requirement of an origin “tag” that’s hard to remove.

The intent isn’t necessarily malicious, but it is less than honest.

Good luck finding a way to ‘tag’ AI-generated text that can’t be overcome with something possibly as rudimentary as cut-and-paste.

Could zero-width characters be used for that?

No, those can easily be removed by passing the content through a filter that only keeps normal alphabetical characters. Watermarking text is very, very hard. You basically cannot do it at the character representation level.
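
To make that concrete, here is a minimal Python sketch of a zero-width "tag" and how little it takes to strip it. The bit encoding and the function names (`embed_tag`, `extract_tag`, `strip_invisible`) are made up for illustration, and the filter here is gentler than "keep only alphabetical characters": it only drops Unicode format characters, which is already enough.

```python
import unicodedata

ZW0 = "\u200b"  # zero-width space, used here to encode bit 0 (illustrative choice)
ZW1 = "\u200c"  # zero-width non-joiner, used here to encode bit 1 (illustrative choice)


def embed_tag(text: str, bits: str) -> str:
    """Hide a bit string right after the first word as zero-width characters."""
    hidden = "".join(ZW1 if b == "1" else ZW0 for b in bits)
    head, sep, tail = text.partition(" ")
    return head + hidden + sep + tail


def extract_tag(text: str) -> str:
    """Read the hidden bits back, if any survive."""
    return "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))


def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (category Cf), which include zero-width ones."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


tagged = embed_tag("This reply was machine generated.", "1011")
cleaned = strip_invisible(tagged)
print(repr(extract_tag(tagged)))   # the hidden tag is readable before filtering
print(repr(extract_tag(cleaned)))  # nothing is left after filtering
```

The visible text is unchanged by the filter, so anyone who wants to drop the tag loses nothing by running it.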

This blog post from Scott Aaronson explains a bit about how it could work. Scroll down to the “My Projects at OpenAI” section. The method outlined there is copy/paste proof, @MikeNolan.

Thanks, that’s interesting:

My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT. Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT. We want it to be much harder to take a GPT output and pass it off as if it came from a human. This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda… Or impersonating someone’s writing style in order to incriminate them. These are all things one might want to make harder, right?
…
So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI. That won’t make any detectable difference to the end user, assuming the end user can’t distinguish the pseudorandom numbers from truly random ones.
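
A toy Python sketch of that idea, under loud assumptions: `toy_probs` is a hypothetical stand-in for a real model, the vocabulary and window size are invented, and the selection rule (pick the token that maximizes r^(1/p)) and the detection score are assumptions for illustration, not taken from the quoted passage. It is not OpenAI's implementation. The point is only that the signal lives in the word choices, so it survives copy/paste, and that checking for it needs the key but not the model.

```python
import hashlib
import hmac
import math
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "slowly"]
WINDOW = 4  # number of preceding tokens fed to the keyed function (an assumption)


def toy_probs(step: int) -> list[float]:
    """Hypothetical stand-in for a language model's next-token distribution."""
    weights = [1 + ((i + step) % 3) for i in range(len(VOCAB))]
    total = sum(weights)
    return [w / total for w in weights]


def prf(key: bytes, context: tuple, token: str) -> float:
    """Keyed pseudorandom number strictly between 0 and 1 for (context, token)."""
    msg = "\x1f".join(context + (token,)).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    k = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (k + 0.5) / 2**52


def generate(key: bytes, length: int, watermark: bool = True, seed: int = 0) -> list[str]:
    """Generate tokens, either watermarked (keyed choice) or plainly sampled."""
    rng = random.Random(seed)
    tokens: list[str] = []
    for step in range(length):
        context = tuple(tokens[-WINDOW:])
        probs = toy_probs(step)
        if watermark:
            # Assumed rule: choose the token i that maximizes r_i ** (1 / p_i).
            best = max(
                range(len(VOCAB)),
                key=lambda i: prf(key, context, VOCAB[i]) ** (1.0 / probs[i]),
            )
        else:
            best = rng.choices(range(len(VOCAB)), weights=probs)[0]
        tokens.append(VOCAB[best])
    return tokens


def score(key: bytes, tokens: list[str]) -> float:
    """Average of -ln(1 - r) over the tokens; about 1 for text without the signal."""
    total = 0.0
    for t, tok in enumerate(tokens):
        context = tuple(tokens[max(0, t - WINDOW):t])
        total += -math.log(1.0 - prf(key, context, tok))
    return total / len(tokens)


key = b"secret known only to the model provider"
marked = generate(key, 200)
plain = generate(key, 200, watermark=False)
print("watermarked:", score(key, marked))
print("plain:", score(key, plain))
print("pasted excerpt:", score(key, marked[50:150]))
```

With the right key, the watermarked text and an excerpt pasted out of it should both score well above 1, while plainly sampled text, or watermarked text checked with the wrong key, should score close to 1.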

One of my concerns about trying to identify AI-generated writing is that it will accidentally target well-written, human-generated text.

Well-written, human-generated text seems to be the exception on many forums. :sigh:

I just go back to motivation.

If you identify bad intent, ban or suspend.

If it’s well-written, well-intentioned text with facts that bear out, leave it?

What if the user’s first language is not English and they’ve used ChatGPT to refine their grammar?

btw, here’s how I preface AI Topic Summaries:

[image]

(eek CSS tweak needed!)

OK, I’m concerned it could target my posts :slight_smile:

I think so. I don’t see a problem with people using AI to help compose posts, assuming there’s an actual human being making the decision as to whether or not the AI-generated text is worthy of posting.

There are a host of tools that can help improve grammar; I don’t know if ChatGPT is better than the rest of the bunch.

Improving grammar is a somewhat different issue than generating ‘original’ content, though. The AI engines are starting to be targeted by content owners who want to be compensated for the use of their material to train those engines.
