This topic covers the configuration of the Embeddings module of the Discourse AI plugin. It explains what embeddings are, how they’re used, and how to set them up.
Required user level: Administrator
Embeddings are a crucial component of the Discourse AI plugin, enabling features like Related topics and AI search. This guide will walk you through the setup and use of embeddings in your Discourse instance.
What are Embeddings?
Embeddings are numerical representations of text that capture semantic meaning. In Discourse, they’re used to:
- Generate related topics at the bottom of topic pages
- Enable semantic search functionality
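To make "numerical representations that capture semantic meaning" concrete, here is a minimal sketch of how two embedding vectors can be compared. The three-dimensional vectors and topic titles are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, and this is not the plugin's own code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only).
topic_vectors = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Forgot login credentials": [0.8, 0.2, 0.1],
    "Best pizza toppings": [0.0, 0.1, 0.9],
}


def most_similar(query_vector, vectors, limit=2):
    """Return the titles whose vectors point in the direction closest to query_vector."""
    ranked = sorted(
        vectors,
        key=lambda title: cosine_similarity(query_vector, vectors[title]),
        reverse=True,
    )
    return ranked[:limit]


print(most_similar([1.0, 0.0, 0.0], topic_vectors))
```

Texts about similar subjects end up with vectors that point in similar directions, which is what lets related topics and semantic search match on meaning rather than on exact keywords.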
Setting up Embeddings
For hosted customers
If you’re a hosted customer, embeddings are pre-configured. You can simply enable the AI features that depend on them.
For self-hosted instances
If you’re self-hosting, refer to the Discourse AI self-hosted guide for detailed setup instructions.
Configuring Embedding Definitions
Embedding models are now configured as Embedding Definitions in the admin UI. Navigate to Admin → AI plugin → Embeddings tab. When adding a new embedding definition, you can choose from pre-configured presets or configure one manually.
Available presets include:
- text-embedding-3-large (OpenAI)
- text-embedding-3-small (OpenAI)
- text-embedding-ada-002 (OpenAI)
- gemini-embedding-001 (Google)
- bge-large-en (Hugging Face)
- bge-m3 (Hugging Face)
- multilingual-e5-large (Hugging Face)
Each embedding definition includes: display name, provider, URL, API key (or AI Secret), tokenizer, dimensions, distance function, max sequence length, and optional embed/search prompts.
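As a rough mental model, the fields listed above can be pictured as a single record. The sketch below is illustrative only: the Python field names, the tokenizer label, and the validation rules are assumptions, not the plugin's actual schema. The sample values use OpenAI's public embeddings endpoint and the 1536-dimension default of text-embedding-3-small.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmbeddingDefinition:
    """Hypothetical mirror of the fields shown in the admin UI."""
    display_name: str
    provider: str
    url: str
    api_key: str
    tokenizer: str
    dimensions: int
    distance_function: str
    max_sequence_length: int
    embed_prompt: Optional[str] = None   # optional prompt used when embedding content
    search_prompt: Optional[str] = None  # optional prompt used when embedding queries


def validation_errors(definition: EmbeddingDefinition) -> List[str]:
    """Basic sanity checks (assumed rules, for illustration only)."""
    errors = []
    if not definition.url:
        errors.append("url is required")
    if not definition.api_key:
        errors.append("api_key is required")
    if definition.dimensions <= 0:
        errors.append("dimensions must be positive")
    if definition.max_sequence_length <= 0:
        errors.append("max_sequence_length must be positive")
    return errors


example = EmbeddingDefinition(
    display_name="text-embedding-3-small",
    provider="OpenAI",
    url="https://api.openai.com/v1/embeddings",
    api_key="sk-placeholder",      # placeholder, not a real key
    tokenizer="OpenAI tokenizer",  # assumed label
    dimensions=1536,
    distance_function="cosine",
    max_sequence_length=8191,
)

print(validation_errors(example))
```

The dimensions value has to match what the model actually returns, which is one reason starting from a preset is usually the safer route.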
Configuring embeddings
Navigate to Admin → Plugins → Discourse AI and make sure the following settings are configured:
- ai embeddings enabled: Turn the embeddings module on or off
- ai embeddings selected model: Select which embedding definition to use for generating embeddings
Optional settings that can be tweaked:
- AI embeddings generate for pms: Decide whether to generate embeddings for personal messages
- AI embeddings semantic related topics enabled: Enable or disable the “Related topics” feature
- AI embeddings semantic related topics: The maximum number of related topics to be shown
- AI embeddings semantic related include closed topics: Include closed topics in related topic results
- AI embeddings semantic related age penalty: Apply an exponential age penalty to topics in related results (0.0 disables, higher values penalize older topics more)
- AI embeddings semantic related age time scale: Time scale in days for age penalty calculation (default: 365)
- AI embeddings semantic search enabled: Enable full-page AI search
- AI embeddings semantic quick search enabled: Enable semantic search option in the search menu popup
- AI embeddings semantic search use hyde: Enable HyDE (Hypothetical Document Embedding) for semantic search
- AI embeddings semantic search hyde agent: The AI agent used to expand search terms when HyDE is enabled
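The age penalty and time scale settings above interact, so a small worked sketch may help. The exact formula the plugin uses is not given here; the function below assumes a plain exponential decay of the similarity score, with hypothetical parameter names, purely to show how a penalty value and a time scale in days could combine.

```python
import math


def age_penalized_score(similarity, age_days, penalty=0.0, time_scale_days=365.0):
    """Scale a similarity score down as a topic gets older.

    Assumed formula: similarity * exp(-penalty * age_days / time_scale_days).
    A penalty of 0.0 leaves the score untouched, matching "0.0 disables".
    """
    if penalty <= 0.0:
        return similarity
    return similarity * math.exp(-penalty * age_days / time_scale_days)


# With the penalty disabled, age has no effect.
print(age_penalized_score(0.8, age_days=730, penalty=0.0))

# With a penalty, a two-year-old topic scores lower than a one-month-old one.
print(age_penalized_score(0.8, age_days=30, penalty=0.5))
print(age_penalized_score(0.8, age_days=730, penalty=0.5))
```

Under this assumption, raising the penalty or shortening the time scale both push older topics further down the related topics list.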
Providers
Discourse AI supports multiple embedding providers:
- OpenAI
- Hugging Face (for open source/open weights models)
- Cloudflare Workers AI
For hosted customers, Discourse provides pre-configured (seeded) embedding definitions that work out of the box.
Features
Related Topics
When enabled, a “Related Topics” section appears at the bottom of topic pages, linking to semantically similar discussions.
AI Search
Embeddings power the semantic search option on the full-page search interface.
Semantic search can optionally use HyDE (Hypothetical Document Embedding). When enabled via ai embeddings semantic search use hyde, the search term is expanded using the AI agent configured in ai embeddings semantic search hyde agent. The expanded search is then converted to a vector and used to find similar topics. This technique adds some latency to search but can improve results.
When selecting an agent for HyDE, choose one backed by a fast model such as Gemini Flash, Claude Haiku, or GPT-4o Mini, or a newer equivalent.
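The HyDE flow described above can be sketched in a few lines. Everything here is a stand-in: `toy_embed` is a bag-of-words counter rather than a real embedding model, `fake_agent` returns a canned reply instead of calling an AI agent, and the topic titles are invented. The point is only to show the order of operations: expand the search term, embed the expansion, then rank topics by similarity.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def fake_agent(search_term):
    """Stand-in for the HyDE agent: writes a hypothetical post for the search term."""
    canned = {
        "login broken": (
            "I cannot sign in to my account. The password reset email "
            "never arrives and login fails."
        )
    }
    return canned.get(search_term, search_term)


def hyde_search(search_term, topics, expand=fake_agent, limit=1):
    """Expand the term into a hypothetical post, embed it, and rank topics."""
    hypothetical_post = expand(search_term)
    query_vector = toy_embed(hypothetical_post)
    ranked = sorted(
        topics, key=lambda t: cosine(query_vector, toy_embed(t)), reverse=True
    )
    return ranked[:limit]


topics = [
    "Password reset email never arrives",
    "How to change the site logo",
]

print(hyde_search("login broken", topics))
```

The raw term "login broken" shares no words with either title, while the expanded text overlaps heavily with the first one. That is the intuition behind HyDE, and the extra call to the agent is where the added latency comes from.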
Generating embeddings
Embeddings are generated automatically for new posts. Existing content needs no manual action:
- Discourse will automatically backfill embeddings for older topics via a scheduled job that runs every 5 minutes
- The backfill processes topics in order of recent activity first
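The ordering described above can be illustrated with a short sketch. The topic records, the `has_embedding` flag, and the batch size are all hypothetical; the real scheduled job's internals are not documented here. The sketch only shows what "recent activity first" means for a single run.

```python
from datetime import datetime


def next_backfill_batch(topics, batch_size):
    """Pick topics that still lack embeddings, most recently active first."""
    pending = [t for t in topics if not t["has_embedding"]]
    pending.sort(key=lambda t: t["last_activity"], reverse=True)
    return pending[:batch_size]


# Illustrative data only.
topics = [
    {"id": 1, "last_activity": datetime(2023, 1, 10), "has_embedding": False},
    {"id": 2, "last_activity": datetime(2025, 6, 1), "has_embedding": False},
    {"id": 3, "last_activity": datetime(2024, 3, 5), "has_embedding": True},
    {"id": 4, "last_activity": datetime(2024, 9, 20), "has_embedding": False},
]

batch = next_backfill_batch(topics, batch_size=2)
print([t["id"] for t in batch])
```

On a large forum this means recently active topics get working related topics and semantic search first, while the long tail of old topics fills in over subsequent runs.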
FAQs
Q: How are related topics determined?
A: Related topics are based solely on embeddings, which include the title, category, tags, and post content.
Q: Can I exclude certain topics from related topics?
A: Yes, the ai embeddings semantic related include closed topics site setting controls whether closed topics appear in the results.
Q: Do embeddings work for historical posts?
A: Yes, the system will automatically backfill embeddings for all your content.
Last edited by @tobiaseigen 2025-09-25T15:06:15Z
Last checked by @hugh 2024-08-06T04:16:01Z