Discourse AI

Summary: Integration between AI features and Discourse
Repository Link: GitHub - discourse/discourse-ai
Install Guide: How to install plugins in Discourse

Please check our blog post about this plugin: Introducing Discourse AI

We are happy to announce a brand new Discourse plugin that we have been working on: Discourse AI.

Discourse AI is our one-stop solution for integrating Artificial Intelligence and Discourse, enabling both new features and enhancing existing ones. With this first release, we are shipping 5 different Discourse AI modules.

Discourse AI Modules

For Discourse AI, we have opted to keep all of its features in a single plugin, separated into modules that you can enable independently and customize for your community's needs.

We’ve also made it a priority not to lock you into a single company's API, so every community can pick the provider that makes sense for them, balancing data privacy, performance, feature sets, and vendor lock-in.

Community Sentiment

With the sentiment module, we will automatically classify every post in your community across sentiment (positive or negative) and/or emotion (joy, surprise, anger, disgust, fear, sadness, or neutral). This will allow your staff team to have insights into the community’s health and will help you to diagnose the sentiment across axes like category, topic, and user level.
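To illustrate what "diagnosing sentiment across axes like category" could look like in practice, here is a minimal Python sketch that averages per-post sentiment scores by category. The input shape (a list of dicts with `category` and `classification` keys) and the function name are assumptions made for this example, not the plugin's actual data model or API.

```python
from collections import defaultdict


def average_sentiment_by_category(posts):
    """Average each sentiment label per category.

    `posts` is a hypothetical structure: an iterable of dicts, each with a
    'category' name and a 'classification' dict mapping label -> score.
    A label missing from a post is treated as contributing 0 to the average.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)

    for post in posts:
        category = post["category"]
        counts[category] += 1
        for label, score in post["classification"].items():
            totals[category][label] += score

    return {
        category: {
            label: total / counts[category] for label, total in labels.items()
        }
        for category, labels in totals.items()
    }
```

The same grouping could be keyed on topic or user instead of category; only the key used for `totals` and `counts` would change.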

Composer AI Helper

After composing your post, click on the :sparkles: icon and select any of the following options:

  • Suggest titles
  • Translate to English
  • Proofread

And after a couple of seconds, you will get some help from the AI.

This is enabled here on Meta for TL3+

Toxicity Detection

The toxicity module can scan both new posts and chat messages and score them for toxicity across a variety of labels. Those toxicity scores are all available in reports, where community moderators can identify content that may not be appropriate for your instance.

And, if you want to go one step further, you can enable automatic flagging of content that crosses a customizable toxicity threshold. This puts the potentially problematic content into the Discourse Review Queue, where it can be manually analyzed by your mod team.
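The threshold-based flagging described above can be sketched roughly as follows. This is an illustrative Python example, not the plugin's implementation: the function names, the score scale, and the rule "flag if any label meets the threshold" are all assumptions.

```python
def labels_over_threshold(scores, threshold):
    """Return the labels whose score meets or exceeds the threshold, sorted by name.

    `scores` is a hypothetical dict mapping a toxicity label to a numeric score.
    """
    return sorted(label for label, score in scores.items() if score >= threshold)


def should_flag(scores, threshold):
    """Assumed rule: flag the content if at least one label crosses the threshold."""
    return bool(labels_over_threshold(scores, threshold))
```

For example, with `scores = {"toxicity": 85, "insult": 60}` and a threshold of 80, only the `toxicity` label crosses the line, so the content would be sent for review. Raising the threshold makes flagging stricter and sends less content to the queue.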

NSFW Image Detection

The NSFW module will automatically scan every new upload in user posts and classify each image found for what’s usually considered NSFW content. The classification results are available via reports to your moderator team and, optionally, you can enable automatic flagging of content that crosses a certainty threshold.

Embeddings

This powers two modules at the moment:

Semantic Related Topics

When you get to the end of a topic, Discourse presents you with 5 suggestions of topics to read next. Currently, we pick 5 random topics for anonymous users and use unread topics for logged-in users to populate that list, which makes it quick to generate but not very useful when you are researching a specific subject.

With the new Semantic Related Topics feature, we will use Semantic Textual Similarity between the current topic and all the other topics in your instance to suggest topics that are potentially more relevant to what a person is looking for.

This is enabled here on Meta for all, including anon
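As a rough illustration of the idea behind Semantic Related Topics, the sketch below ranks topics by cosine similarity between embedding vectors. It is a simplified assumption of how such a ranking can work, not the plugin's code: the embeddings are plain Python lists supplied by the caller, and the function names are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_topics(current_id, embeddings, limit=5):
    """Return up to `limit` topic ids most similar to `current_id`.

    `embeddings` is a hypothetical dict mapping topic id -> embedding vector.
    """
    current = embeddings[current_id]
    scored = [
        (cosine_similarity(current, vector), topic_id)
        for topic_id, vector in embeddings.items()
        if topic_id != current_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [topic_id for _, topic_id in scored[:limit]]
```

Comparing against every topic like this is linear in the number of topics; a production system would more likely use a vector index, which is outside the scope of this sketch.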

Semantic Search

This uses the same logic we use for Semantic Related Topics, but to power search results.

More on this soon.

Summarization

This module can summarize topics and chat channels, for times when you need a quick way to figure out what is going on.

Configuration

Check each module documentation topic:

Modules Providers

As we said above, we are committed to offering new AI features without compromising your privacy. See below the current providers and models for each module. CDCK handles hosting for open-source models in our infrastructure and API keys for SaaS providers like OpenAI.

Disclaimer

We are being very mindful with our experimentation around AI. The algorithms we are leaning on are only as good as the data they were trained on. Bias, inaccuracies and hallucinations are all possibilities we need to allow for. We regularly revisit, test and refine our AI modules.

Self Hosting

Check the docs on self-hosting the API services at Discourse AI - Self-Hosted Guide

FAQ

Will this be available on Discourse hosting? Which plans?

This is available in preview for Enterprise customers. Please contact our support team to get it installed and configured on your instance.

Rollout for select modules for other tiers will follow later.

Will CDCK offer a SaaS version of the AI services API for self-hosted communities?

Not at the moment, but this is something we may consider given the feedback from our community.

31 Likes

It’s a great development. <3

I think it could be even more effective with a few added features, @Falco:

  • Suggest tags and categories for the topic (by looking at the title and description)
  • If a tag is created in the description, artificial intelligence can add an answer to this field: [sorucevap_ai=Are you from the world?] I am currently using this in my own project: Profil - SoruCevap_AI - Soru Cevap
  • And I would like to do this: we could add a button below the answers for staff, so that a staff member can press it and ask the artificial intelligence to respond to the user. I’m doing this manually with the code above, but wouldn’t it be great as a plugin :slight_smile:
4 Likes

How is the toxicity detection different from Disorder or the Google Perspective API?

3 Likes

It’s merely a port of Disorder, merged into Discourse AI. We plan on adding more providers to it in Discourse AI in the future. Disorder is now deprecated.

10 Likes

Am I right to assume that the model was trained & tested on English language data only?

1 Like

There are around 20 different models involved in Discourse AI so far, but yes, most models are English-only. The exceptions are the Toxicity module, which ships with a multilingual model, and the Composer Helper module, which is powered by OpenAI/Anthropic models that are multilingual AFAIK.

It's also worth saying that I did a case study and found quite a few models with potential for the French language, and I’m keen on creating language-specific versions of each module, provided there are good open source models available.

5 Likes

I can confirm the AI helper works like a charm on a Spanish self-hosted instance.

7 Likes

6 posts were split to a new topic: AI plugin failing installation due to incompatibility

I’d like this too — but not sure how hard mapping to a site’s specific categories and existing tags would be without per-site specific training.

2 Likes

Actually quite easy, we have enough tokens there (4000 or so for GPT-4)

You would feed, say, the top 500 tags and all the categories into the prompt. Then you would still have enough tokens for the title and most post bodies (you can truncate at 4000 tokens).
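A rough Python sketch of that prompt-building approach follows. Everything in it is illustrative: the function names and prompt wording are made up, the tag list is assumed to be pre-sorted by popularity, and tokens are approximated by whitespace-separated words, which a real model tokenizer would count differently.

```python
def count_tokens(text):
    """Crude token estimate: whitespace-separated words (an assumption, not a real tokenizer)."""
    return len(text.split())


def build_tagging_prompt(title, body, categories, tags, max_tags=500, max_tokens=4000):
    """Build a prompt listing all categories and the top tags, then fit the body into the budget.

    `tags` is assumed to be sorted from most to least used.
    """
    top_tags = tags[:max_tags]
    header = "\n".join(
        [
            "Suggest a category and tags for the post below.",
            "Categories: " + ", ".join(categories),
            "Tags: " + ", ".join(top_tags),
            "Title: " + title,
            "Body:",
        ]
    )
    # Whatever budget the header leaves over is spent on the post body.
    budget = max(max_tokens - count_tokens(header), 0)
    truncated_body = " ".join(body.split()[:budget])
    return header + "\n" + truncated_body
```

The categories and tags are placed first so that they always survive truncation; only the post body is cut when the budget runs out.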

5 Likes

One cannot fork the repo

1 Like

Short reply/suggestion: Use AI to migrate data from other sites.


As we know many Discourse sites transitioned their existing knowledge from Google Groups or other such forums.

Every few months I find a very old posting in such a site that would be nice to have in the Discourse forum that is now the active repository of such information, e.g. this post

https://swi-prolog.iai.uni-bonn.narkive.com/cOnL0aGn/push-back-lists-on-dcg-rule-heads

from this site

would be nice to have in

As many of us know, trying to transform old sites into Discourse using deterministic software is really hard: as one gets into the corner cases it becomes harder and harder, and you really don’t know how many corner cases exist. But with transformers, the T in GPT, it should be possible.

If an AI tool is created for this, the one caveat that should not be overlooked is to include a link in the translated post to the original, and/or capture the original for display if needed.

Thanks for considering.


Side note:

Before this topic existed, AI ideas for Discourse were primarily posted in the following topic:

Integrating GPT3-like bots?

AI ideas for Discourse can now be posted here.

1 Like

For those wondering how to view the classification results database for the Community Sentiment and Toxicity modules, this can be done using the Data Explorer plugin, and the classification_results table.

This is useful for seeing how the AI plugin is functioning on your site and classifying posts.

AI Sentiment:

SELECT target_id AS post_id,
       model_used,
       classification->'negative' AS negative,
       classification->'neutral' AS neutral,
       classification->'positive' AS positive
FROM classification_results
WHERE model_used = 'sentiment'
ORDER BY id DESC


AI Emotion:

SELECT target_id AS post_id,
       model_used,
       classification->'neutral' AS neutral,
       classification->'sadness' AS sadness,
       classification->'surprise' AS surprise,
       classification->'fear' AS fear,
       classification->'anger' AS anger,
       classification->'joy' AS joy,
       classification->'disgust' AS disgust
FROM classification_results
WHERE model_used = 'emotion'
ORDER BY id DESC


AI Toxicity:

SELECT target_id AS post_id,
       classification->'toxicity' AS toxicity,
       classification->'severe_toxicity' AS severe_toxicity,
       classification->'obscene' AS obscene,
       classification->'identity_attack' AS identity_attack,
       classification->'insult' AS insult,
       classification->'threat' AS threat,
       classification->'sexual_explicit' AS sexual_explicit
FROM classification_results
WHERE classification_type = 'toxicity'
ORDER BY id DESC


6 Likes

This is fantastic, I can’t wait to give it a try!

Are there any fees to pay, or limits to community sizes before they send too many requests and fees need to be paid to keep it operational?

I realise it's a touchy subject, but are the AI service providers recording the answers, and if so, is there any information on what they are doing with the data?

1 Like

LangChain for LLM Application Development - A free (for now) DeepLearning.AI course by Harrison Chase (Creator of LangChain) and Andrew Ng (DeepLearning.AI)

2 Likes

Does this work with all locales, or Finnish to be specific? Our sports/ice hockey community has wild mood swings, which sometimes leads to toxicity. Team loses → everything is shit and vice versa.

Would be an intriguing pilot.

1 Like

There is something you can check in askym, but you probably want to self-host or try to use their APIs to keep your data secured yourself :slight_smile:

1 Like