What LLM to use for Discourse AI?

When choosing a Large Language Model (LLM) to power Discourse AI features, it’s important to understand your own needs as the community admin as well as those of your members.

Several factors may influence your decisions:

  1. Performance for use-case: Are you looking for the best-performing model? Performance can vary depending on the task (e.g., summarization, search, complex reasoning, spam detection). Assessment is based on the model’s ability to generate correct, relevant, and coherent responses.
  2. Context length: The context window is the amount of text a model can “see” and consider at one time. Larger context windows allow for processing more information (e.g., longer topics for summarization) and maintaining coherence over longer interactions.
  3. Compatibility: Is the model supported out of the box by the Discourse AI plugin? Will it require specific API endpoints or configuration? Check the plugin documentation for supported providers and models.
  4. Language support: While many top LLMs handle multiple languages well, performance can vary. If your community primarily uses a language other than English, testing specific models for that language is recommended.
  5. Multimodal capabilities: Some features, like AI Triage (NSFW detection), require models that can process images (vision). Ensure the chosen model supports the required modalities.
  6. Speed & Modes: Larger, more powerful models can be slower. For real-time features like AI Helper or Search, faster models might provide a better user experience. Some models (like Claude 3.7 Sonnet) offer different modes, allowing a trade-off between speed and deeper reasoning.
  7. Cost: Budget is often a key factor. Model costs vary significantly based on the provider and the model tier. Costs are typically measured per token (input and output). Faster/smaller models are generally cheaper than large/high-performance models. Open source models can often be run more cost-effectively depending on hosting.
  8. Privacy concerns: Different LLM providers have varying data usage and privacy policies. Review the terms of service, especially regarding whether your data might be used for training purposes. Some providers offer zero data retention options.
  9. Open vs. Closed Source: Open-source models offer transparency and the potential for self-hosting or fine-tuning, though they may require more technical effort. Closed-source models are typically easier to use via APIs but offer less control and transparency.
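To make the cost factor above concrete, here is a minimal sketch of turning per-token pricing into a monthly estimate. The prices and usage volumes are illustrative placeholders, not current rates for any provider, and the 4-characters-per-token rule is only a rough heuristic for English text:

```python
# Rough sketch of how per-token pricing translates into a monthly budget.
# The rates below are illustrative placeholders, NOT real provider pricing;
# always check your provider's pricing page. Token counts use the common
# ~4 characters/token heuristic, which is approximate for English text.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English."""
    return max(1, len(text) // 4)

def monthly_cost(input_chars: int, output_chars: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimated monthly USD cost from character volumes and per-million-token rates."""
    input_tokens = input_chars // 4
    output_tokens = output_chars // 4
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# Example: 50k requests/month averaging 2,000 characters in and 400 out,
# at hypothetical rates of $3 per million input tokens and $15 per million output.
cost = monthly_cost(50_000 * 2_000, 50_000 * 400, 3.0, 15.0)
print(f"~${cost:.2f}/month")  # ~$150.00/month
```

Even a back-of-the-envelope estimate like this makes it easier to compare a "Top Performance" model against a "Cost-Effective" one for your actual traffic.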

Choosing an LLM for Discourse AI Features

The LLM landscape evolves rapidly. The table below provides a general overview of currently popular and capable models suitable for various Discourse AI features, categorized by their typical strengths and cost profiles. Models within each category are listed alphabetically.

:warning: These are general guidelines. Always check the official Discourse AI plugin documentation for the most up-to-date list of supported models and required configurations. Performance and cost change frequently; consult the LLM provider’s documentation for the latest details. Open Source model availability and performance can depend on the specific provider or hosting setup.

An alternative option for hosted customers is using the pre-configured open-weight LLMs hosted by Discourse. These can often be enabled via Admin → Settings → AI → ai_llm_enabled_models or specific feature settings.

| Category | Model | Provider | Key Strengths / Use Cases | Notes |
|---|---|---|---|---|
| Top Performance/Reasoning | Claude 3.7 Sonnet (Thinking) | Anthropic | Maximum reasoning capability, complex tasks, analysis, generation | Uses more resources/time than regular mode, excellent vision |
| Top Performance/Reasoning | DeepSeek-R1 | DeepSeek | Strong reasoning, competitive with top tiers, coding, math | Open-source option, potentially lower cost than proprietary equivalents |
| Top Performance/Reasoning | Gemini 2.5 Pro | Google | High performance, very large context window, strong multimodal | Excellent all-rounder, integrates well with Google ecosystem |
| Top Performance/Reasoning | OpenAI o1 / o1-pro | OpenAI | State-of-the-art reasoning, complex tasks, generation | Highest cost; o1-pro likely needed for maximum capability via API |
| Balanced (Multi-Purpose) | Claude 3.7 Sonnet (Regular) | Anthropic | High performance, good reasoning, large context, vision, faster mode | Excellent default choice, balances speed and capability |
| Balanced (Multi-Purpose) | DeepSeek-V3 | DeepSeek | Strong general performance, good value | Open-source option, cost-effective for broad use |
| Balanced (Multi-Purpose) | GPT-4o | OpenAI | Very strong all-rounder, good multimodal, widely compatible | Great balance of performance, speed, and cost |
| Balanced (Multi-Purpose) | OpenAI o3-mini | OpenAI | Good performance and reasoning for the cost | A flexible, intelligent reasoning model suitable for many tasks |
| Cost-Effective/Speed | Claude 3.5 Haiku | Anthropic | Extremely fast and low cost, suitable for simpler tasks | Best for high-volume, low-latency needs like search and basic summaries |
| Cost-Effective/Speed | Gemini 2.0 Flash | Google | Very fast and cost-effective, good general capabilities | Good for summarization, search, helper tasks |
| Cost-Effective/Speed | GPT-4o mini | OpenAI | Fast, affordable version of GPT-4o, good for many tasks | Good balance of cost/performance for simpler features |
| Cost-Effective/Speed | Llama 3.3 (e.g., 70B) | Meta | Strong open-source model, often a cost-effective multi-purpose option | Performance varies by provider/hosting; check compatibility |
| Vision Capable | Claude 3.7 Sonnet | Anthropic | Strong vision capabilities (both modes) | Required for AI Triage/NSFW Detection |
| Vision Capable | Gemini 2.5 Pro / 2.0 Flash | Google | Strong vision capabilities | Required for AI Triage/NSFW Detection |
| Vision Capable | GPT-4o / GPT-4o mini | OpenAI | Integrated text and vision | Required for AI Triage/NSFW Detection |
| Vision Capable | Llama 3.2 | Meta | Open-source vision capabilities | Check compatibility/hosting/provider support |
| Vision Capable | Discourse Hosted LLM | Discourse | Pre-configured vision model for hosted sites | Check specific feature settings (e.g., NSFW Detection) |
| Vision Capable | Qwen-VL / others | Various | Check the Discourse AI plugin for specific supported vision models | Configuration may vary |

General Recommendations Mapping (Simplified):

  • AI Bot (Complex Q&A, Persona): Top Performance/Reasoning models (Claude 3.7 Sonnet - Thinking, R1, Gemini 2.5 Pro, o1-pro) or strong Balanced models (GPT-4o, Claude 3.7 Sonnet - Regular, o3-mini).
  • AI Search: Cost-Effective/Speed models (Haiku 3.5, Gemini 2.0 Flash, GPT-4o mini, Llama 3.3) or Balanced models for slightly better understanding (GPT-4o, DeepSeek-V3).
  • AI Helper (Title Suggestions, Proofreading): Cost-Effective/Speed models or Balanced models. Speed is often preferred. Claude 3.7 Sonnet (Regular) or GPT-4o mini are good candidates. Llama 3.3 can also work well here.
  • Summarize: Balanced models (Claude 3.7 Sonnet - Regular, GPT-4o, o3-mini, DeepSeek-V3) or Cost-Effective models (Gemini 2.0 Flash, Llama 3.3). Longer context windows (Gemini 2.5 Pro, Sonnet 3.7) are beneficial for long topics if budget allows.
  • Spam Detection / AI Triage (Text): Cost-Effective/Speed models are usually sufficient and cost-efficient (Haiku 3.5, Gemini 2.0 Flash, GPT-4o mini, Llama 3.3).
  • AI Triage (NSFW Image Detection): Requires a Vision Capable model (GPT-4o/mini, Sonnet 3.7, Gemini 2.5 Pro/2.0 Flash, Llama 3.2, specific Discourse hosted/supported models).
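Since Discourse AI lets you assign different LLMs to different features, it can help to keep your feature-to-model choices explicit in one place. A minimal sketch of encoding the recommendations above; the keys and model names are descriptive labels for illustration, not the exact identifiers used in Discourse admin settings:

```python
# Illustrative feature-to-model mapping, following the recommendations above.
# The keys and values are descriptive labels, NOT the exact identifiers used
# in Discourse admin settings; adapt them to your provider's model names.
FEATURE_MODEL_MAP = {
    "ai_bot": "claude-3-7-sonnet-thinking",  # complex Q&A, personas
    "ai_search": "claude-3-5-haiku",         # high volume, low latency
    "ai_helper": "gpt-4o-mini",              # title suggestions, proofreading
    "summarize": "gpt-4o",                   # balances quality and cost
    "spam_triage": "gemini-2.0-flash",       # cheap text classification
    "nsfw_triage": "gpt-4o",                 # must be vision capable
}

def model_for(feature: str, default: str = "gpt-4o") -> str:
    """Return the chosen model for a feature, falling back to a balanced default."""
    return FEATURE_MODEL_MAP.get(feature, default)

print(model_for("ai_search"))       # claude-3-5-haiku
print(model_for("new_feature"))     # gpt-4o
```

Keeping the mapping in one place makes it easy to review costs feature by feature and to swap in a cheaper or stronger model as the landscape changes.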

Remember to configure the selected LLM(s) in your Discourse Admin settings under the relevant AI features.

Last edited by @sam 2025-03-31T02:00:15Z


I’m sure you are going to support Gemini 2.0. Can you estimate when?


It is already supported.
