I’m using Discourse AI on my site, which runs on a subdomain (community.website.com), and I’d like to better understand what kind of user information might be shared with the language model (LLM) during interactions. Specifically, I’m curious about:
What types of user data (e.g., personal information, IP addresses) could potentially be exposed to the LLM?
Are there any safeguards in place within Discourse AI to limit or anonymize what gets sent?
For some additional context, my setup uses Caddy as a reverse proxy and Sucuri for DNS and firewall. If anyone has insights on how this configuration might affect what is exposed—or just general knowledge about how Discourse AI handles user data—I’d really appreciate the input!
Looking forward to hearing from those who have a better understanding of this.
I believe you’ve used my AI plugins, Chatbot and AI Topic Summary, at some point, since you’ve posted in those Topics, so I’ll answer for those; if you want more information, please post in those Topics.
Both of my plugins send usernames and raw Post content (i.e. the markdown). Note that if someone mentions a person’s name or an address in a Post, that will of course be sent as part of the markdown; otherwise, Users are identified only by their Usernames.
No other metadata is sent (e.g. IP addresses, User Profiles).
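To make the shape of that data concrete, here is a purely hypothetical sketch in Python — not the plugins’ actual request format or field names — of a payload built along those lines: only the username and raw markdown are copied over, while metadata fields like IP address and email are left behind.

```python
# Hypothetical illustration only -- not the actual request format used by
# the Chatbot or AI Topic Summary plugins.

def build_llm_payload(posts):
    """Copy only username and raw markdown from each post into the payload."""
    return {
        "messages": [
            {"username": p["username"], "content": p["raw_markdown"]}
            for p in posts
        ]
    }

posts = [
    {
        "username": "alice",
        # Anything typed inside the post body still travels in the markdown:
        "raw_markdown": "Thanks @bob, see you at 12 Main St.",
        "ip_address": "203.0.113.7",   # metadata: never copied into the payload
        "email": "alice@example.com",  # metadata: never copied into the payload
    }
]

payload = build_llm_payload(posts)
```

The point the sketch makes is the same caveat as above: names or addresses written in the post body are sent because they are part of the markdown, but account metadata never enters the payload.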
You can see the queries being sent in the logs if you enable the verbose logging option and set the logs to Warn (a separate setting) so they appear in /logs.
Thank you, Robert. Yes, I do use those plugins, which are excellent — I appreciate the feedback. After reading some of the LLM privacy policies, transferring sensitive user data would be concerning. Obviously, whatever context is in the chat will be sent, and the username by itself isn’t really concerning. Some of the LLMs’ terms are quite invasive, which is what spurred my inquiry. Thanks again.