We integrated [Detoxify](https://github.com/unitaryai/detoxify) models, trained to predict toxic comments on all three Jigsaw Toxic Comment Challenges, to classify post toxicity and automatically flag posts that score above a configurable threshold.
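For reference, here is roughly what that classification step looks like using Detoxify's public Python API. This is a minimal sketch, not the plugin's actual code: the threshold value and the `should_flag` helper are illustrative.

```python
from detoxify import Detoxify

# "original" is the checkpoint trained on the first Jigsaw challenge;
# "unbiased" and "multilingual" cover the later two.
model = Detoxify("original")

FLAG_THRESHOLD = 0.8  # configurable per instance (illustrative default)

def should_flag(post_text: str) -> bool:
    # predict() returns a dict of per-label probabilities in [0, 1],
    # e.g. toxicity, severe_toxicity, obscene, threat, insult, ...
    scores = model.predict(post_text)
    return max(scores.values()) >= FLAG_THRESHOLD

print(should_flag("You are all wonderful people!"))  # False
```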
What we found is that while this works great for instances with zero tolerance for typical toxicity, such as brand-owned instances, the models were too strict for more community-oriented Discourse instances, generating too many flags in those more lenient communities.
Because of that, our current plan is to deprecate Toxicity and move this feature into our AI Triage plugin, which gives admins a customizable prompt so they can adapt automatic toxicity detection to the levels of what is allowed in their instance.
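To sketch the idea (this is not the plugin's implementation), an admin-editable prompt can drive the moderation decision directly. The example below assumes an OpenAI-compatible chat API and uses a hypothetical prompt for a community where heated language is tolerated:

```python
from openai import OpenAI

client = OpenAI()

# Admins tune this text to their community's norms instead of relying on
# a fixed classifier threshold (hypothetical example wording).
TRIAGE_PROMPT = """You are moderating a gaming community where trash talk
between players is acceptable, but personal attacks, slurs, and threats
are not. Reply with exactly one word: "flag" or "ok"."""

def triage(post_text: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": TRIAGE_PROMPT},
            {"role": "user", "content": post_text},
        ],
    )
    return response.choices[0].message.content.strip().lower() == "flag"
```

A stricter, brand-owned instance would simply swap in a less permissive prompt, which is the flexibility the fixed Detoxify threshold could not offer.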
We also plan on offering our customers a hosted moderation LLM, along the lines of [ShieldGemma](https://ai.google.dev/gemma/docs/shieldgemma) or [Llama Guard](https://arxiv.org/abs/2312.06674), both of which performed very well in our internal evals against the same dataset used in the original Jigsaw Kaggle competition that spawned Detoxify.
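For the curious, this is roughly how a Llama Guard-style safeguard model is queried through 🤗 Transformers, following its public model card; the interface of our hosted offering may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # gated model; requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def moderate(post_text: str) -> str:
    # Llama Guard's chat template wraps the conversation in its safety
    # taxonomy prompt; the model replies "safe", or "unsafe" followed by
    # the violated category.
    chat = [{"role": "user", "content": post_text}]
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(moderate("How do I make a great community?"))  # "safe"
```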