Discourse AI - Toxicity

Falco · August 26, 2024, 7:21 PM

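We found that the toxicity model works well for instances with zero tolerance for typical toxicity (for example, "brand"-owned instances). For more community-oriented Discourse instances, though, it is too strict and generates too many flags on more lenient instances.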

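Because of that, our current plan is to deprecate Toxicity and move this feature to our AI Triage plugin. There, admins get a customizable prompt so they can adapt automatic toxicity detection to the level allowed on their instance.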
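To make the idea of a per-instance prompt concrete, here is a rough sketch in Python. It is not the AI Triage plugin's actual code or API: `TriageRule`, `PROMPT_TEMPLATE`, and `fake_llm` are hypothetical names made up for illustration, and `fake_llm` is a keyword stub standing in for a real model call. It only shows how the same post can be flagged or allowed depending on the policy text an admin writes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt template; an admin would only edit the policy text.
PROMPT_TEMPLATE = (
    "You are a moderation assistant for an online community.\n"
    "Community policy: {policy}\n"
    "Reply with exactly one word, FLAG or ALLOW, for the post below.\n"
    "Post: {post}"
)


@dataclass
class TriageRule:
    """Hypothetical per-instance rule: a policy text plus a flag decision."""
    policy: str

    def build_prompt(self, post: str) -> str:
        return PROMPT_TEMPLATE.format(policy=self.policy, post=post)

    def should_flag(self, post: str, llm: Callable[[str], str]) -> bool:
        # Treat any reply that starts with FLAG as a flag decision.
        reply = llm(self.build_prompt(post))
        return reply.strip().upper().startswith("FLAG")


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call (assumption, keyword-based)."""
    head, post = prompt.rsplit("Post: ", 1)
    strict = "zero tolerance" in head.lower()
    text = post.lower()
    if "i will hurt you" in text:
        return "FLAG"   # flagged under every policy in this toy stub
    if strict and "idiot" in text:
        return "FLAG"   # flagged only under the strict policy
    return "ALLOW"


strict = TriageRule(policy="Zero tolerance: flag any insult or rude language.")
lenient = TriageRule(
    policy="Heated debate and mild insults are fine; flag only threats or harassment."
)

post = "That take is wrong, you idiot."
print(strict.should_flag(post, fake_llm))   # strict, brand-style instance
print(lenient.should_flag(post, fake_llm))  # lenient, community-style instance
```

With a real model behind `llm`, the only thing that changes between a "brand" instance and a more lenient community is the policy string, which is the kind of adjustment a fixed toxicity model cannot offer.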

We also plan to offer our customers a hosted moderation LLM, along the lines of https://ai.google.dev/gemma/docs/shieldgemma or [2312.06674] Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. These performed very well in our internal evals against the same dataset used in the original Jigsaw Kaggle competition that spawned Detoxify.
