This topic covers the configuration of the Embeddings module of the Discourse AI plugin. It explains what embeddings are, how they’re used, and how to set them up.
Required user level: Administrator
Embeddings are a crucial component of the Discourse AI plugin, enabling features like Related topics and AI search. This guide will walk you through the setup and use of embeddings in your Discourse instance.
What are Embeddings?
Embeddings are numerical representations of text that capture semantic meaning. In Discourse, they’re used to:
- Generate related topics at the bottom of topic pages
- Enable semantic search functionality
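The Python sketch below makes "numerical representations that capture semantic meaning" concrete. It treats embeddings as plain lists of floats and compares them with cosine similarity. Cosine similarity is a common way to compare embeddings, but treat it as an assumption here, because this guide does not say which distance measure Discourse uses. The three-dimensional vectors are made-up toy values. Real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, chosen by hand for illustration.
embeddings = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "I forgot my login credentials": [0.8, 0.2, 0.1],
    "Best pizza toppings": [0.0, 0.1, 0.9],
}

query = embeddings["How do I reset my password?"]
for text, vec in embeddings.items():
    print(f"{cosine_similarity(query, vec):.3f}  {text}")
```

In this toy data, the two login questions point in nearly the same direction and score close to 1.0. The pizza question scores near 0.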
Setting up Embeddings
For hosted customers
If you’re a hosted customer, embeddings are pre-configured. You can simply enable the AI features that depend on them.
For self-hosted instances
If you’re self-hosting, refer to the Discourse AI self-hosted guide for detailed setup instructions.
Configuring embeddings
Navigate to Admin → Settings → Discourse AI and make sure the following settings are configured:
- ai embeddings enabled: Turn the embeddings module on or off
- ai embeddings models: Select which models to use for generating embeddings
Optional settings that can be tweaked:
- ai embeddings generate for pms: Whether to generate embeddings for private messages
- ai embeddings semantic related topics enabled: Enable or disable the “Related topics” feature
- ai embeddings semantic related topics: Maximum number of related topics to show
- ai embeddings semantic related include closed topics: Whether closed topics can appear in the “Related topics” results
- ai embeddings semantic search enabled: Enable full-page AI search
- ai embeddings semantic search hyde model: Language model used to expand the search query (HyDE) for better semantic search results
Providers
Within the admin settings, navigate to the AI plugin → Embeddings tab to configure provider-related settings such as API keys.
Discourse AI supports multiple embedding providers:
- Discourse-hosted embeddings (recommended and default)
- OpenAI
- Open source models via Hugging Face
- Custom options
Features
Related Topics
When enabled, a “Related Topics” section appears at the bottom of topic pages, linking to semantically similar discussions.
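Conceptually, finding related topics is a nearest-neighbor search over topic embeddings. The sketch below is a hypothetical brute-force version in Python. `Topic`, `related_topics` and their fields are illustrative names, not Discourse's API. The `max_results` and `include_closed` parameters loosely mirror the ai embeddings semantic related topics and ai embeddings semantic related include closed topics settings. It uses cosine similarity, which is an assumption. A production system would typically query a vector index instead of scanning every topic.

```python
import math
from dataclasses import dataclass


@dataclass
class Topic:
    # Hypothetical minimal topic record, not Discourse's data model.
    id: int
    title: str
    closed: bool
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    # Assumes non-zero vectors.
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def related_topics(current: Topic, candidates: list[Topic],
                   max_results: int, include_closed: bool) -> list[Topic]:
    """Up to max_results topics most similar to `current`, best match first."""
    pool = [
        t for t in candidates
        if t.id != current.id and (include_closed or not t.closed)  # skip self and, optionally, closed topics
    ]
    pool.sort(key=lambda t: cosine(current.embedding, t.embedding), reverse=True)
    return pool[:max_results]


topics = [
    Topic(1, "Reset my password", False, [0.9, 0.1, 0.0]),
    Topic(2, "Forgot login credentials", False, [0.8, 0.2, 0.1]),
    Topic(3, "Password reset email never arrives", True, [0.85, 0.15, 0.05]),
    Topic(4, "Best pizza toppings", False, [0.0, 0.1, 0.9]),
]

for t in related_topics(topics[0], topics, max_results=2, include_closed=False):
    print(t.id, t.title)
```

With `include_closed=True`, the closed topic 3 is the closest match and moves to the top of the list.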
AI Search
Embeddings power the semantic search option on the full-page search interface.
Semantic search relies on HyDE (Hypothetical Document Embeddings). We expand the search term using a large language model you supply. Once expanded, we convert the expanded text to a vector and look for similar topics. This technique adds some latency to search but improves results.
When selecting a model for HyDE via ai embeddings semantic search hyde model, choose a fast model such as Gemini Flash, Claude Haiku, GPT-4o mini, or a newer equivalent.
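The HyDE flow described above can be sketched in three steps: expand the query with an LLM, embed the expanded text, and rank topics by similarity. In this hypothetical Python sketch, `fake_llm` and `fake_embed` are toy stand-ins for the HyDE model and the configured embedding model. They are not Discourse functions, and the keyword-count embedder is far cruder than a real model. Ranking by cosine similarity is also an assumption.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def hyde_search(query: str, expand: Callable[[str], str], embed: Callable[[str], Vector],
                index: dict[int, Vector], limit: int = 5) -> list[int]:
    hypothetical_doc = expand(query)       # 1. LLM turns the short query into a fuller passage
    query_vec = embed(hypothetical_doc)    # 2. embed the expanded text, not the raw query
    ranked = sorted(index, key=lambda tid: cosine(query_vec, index[tid]), reverse=True)
    return ranked[:limit]                  # 3. most similar topic ids first


# Toy stand-ins (hypothetical): a canned "LLM" and a keyword-count "embedder".
VOCAB = ["password", "reset", "email", "pizza"]


def fake_llm(query: str) -> str:
    return f"{query}. To reset your password, open the reset email and follow the link."


def fake_embed(text: str) -> Vector:
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return [float(words.count(term)) for term in VOCAB]


topic_index = {
    1: [1.0, 1.0, 0.0, 0.0],   # "password reset" topic
    2: [0.0, 0.0, 0.0, 1.0],   # "pizza" topic
    3: [0.0, 0.0, 1.0, 0.0],   # "email" topic
}

print(hyde_search("forgot login", fake_llm, fake_embed, topic_index, limit=2))
```

With this toy embedder, the raw query "forgot login" maps to an all-zero vector and matches nothing. The expanded text mentions password, reset and email, so topic 1 ranks first. The extra LLM call in step 1 is where the added search latency comes from, which is why a fast HyDE model matters.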
Generating embeddings
Embeddings are generated automatically for new posts. To generate embeddings for existing content:
- Missing embeddings are created when a page is viewed
- Self-hosters can run the rake task ai:embeddings:backfill to generate embeddings for all topics
The rake task should only be used by experienced operators who can install required gems manually.
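Existing content can get embeddings in two ways: lazily when a page is viewed, or in bulk through a backfill. A small in-memory store can illustrate both paths. This is a hypothetical Python sketch. `EmbeddingStore`, `get_or_create` and `backfill` are illustrative names. They do not describe how Discourse or the ai:embeddings:backfill rake task work internally.

```python
from typing import Callable, Iterable

Vector = list[float]


class EmbeddingStore:
    """Hypothetical in-memory store: lazy creation on view plus bulk backfill."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed                      # stands in for the configured embedding model
        self._vectors: dict[int, Vector] = {}    # topic id -> embedding

    def get_or_create(self, topic_id: int, text: str) -> Vector:
        """Page-view path: generate the embedding only if it is missing."""
        if topic_id not in self._vectors:
            self._vectors[topic_id] = self._embed(text)
        return self._vectors[topic_id]

    def backfill(self, topics: Iterable[tuple[int, str]]) -> int:
        """Bulk path: embed every topic that lacks a vector; return how many were created."""
        created = 0
        for topic_id, text in topics:
            if topic_id not in self._vectors:
                self._vectors[topic_id] = self._embed(text)
                created += 1
        return created


# Toy embedder that records each call so we can see when work happens.
calls: list[str] = []


def counting_embed(text: str) -> Vector:
    calls.append(text)
    return [float(len(text))]


store = EmbeddingStore(counting_embed)
store.get_or_create(1, "first topic")   # missing, so the embedder is called
store.get_or_create(1, "first topic")   # already stored, no new call
created = store.backfill([(1, "first topic"), (2, "second topic"), (3, "third topic")])
print(created, len(calls))              # prints: 2 3
```

In this sketch, both paths skip topics that already have an embedding. A backfill that runs after some lazy generation only embeds what is still missing.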
FAQs
Q: How are related topics determined?
A: Related topics are based solely on embeddings, which include the title, category, tags, and post content.
Q: Can I exclude certain topics from related topics?
A: Yes, there’s a site setting to remove closed topics from the results.
Q: Do embeddings work for historical posts?
A: Yes, the system will automatically backfill embeddings for all your content.
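According to the FAQ, a topic's embedding covers its title, category, tags, and post content. One hypothetical way to flatten those fields into a single input string for an embedding model is sketched below in Python. The field names and the layout (labels and blank-line separators) are assumptions, not Discourse's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class TopicForEmbedding:
    # Hypothetical container for the fields the FAQ lists.
    title: str
    category: str
    tags: list[str] = field(default_factory=list)
    posts: list[str] = field(default_factory=list)


def build_embedding_input(topic: TopicForEmbedding) -> str:
    """Join title, category, tags, and post bodies into one text (hypothetical layout)."""
    parts = [
        topic.title,
        f"Category: {topic.category}",
        f"Tags: {', '.join(topic.tags)}" if topic.tags else "",   # omit the line when untagged
        *topic.posts,
    ]
    return "\n\n".join(p for p in parts if p)


example = TopicForEmbedding(
    title="Reset password",
    category="Support",
    tags=["account", "login"],
    posts=["How do I reset my password?", "Use the link in the reset email."],
)
print(build_embedding_input(example))
```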
Additional resources
Last edited by @Saif 2025-03-26T14:54:31Z
Last checked by @hugh 2024-08-06T04:16:01Z