The Discourse AI plugin has many features that require embeddings to work, such as Related Topics, AI Search, AI Helper Category and Tag suggestions, etc. While you can use a third-party API, like Configure API Keys for OpenAI, Configure API Keys for Cloudflare Workers AI, or Configure API Keys for Google Gemini, we built Discourse AI from day one so it would not be locked into those.
## Running with HuggingFace TEI
HuggingFace provides an awesome container image that can get you running quickly.
For example:

```shell
mkdir -p /opt/tei-cache
docker run --rm --gpus all --shm-size 1g -p 8081:80 \
  -v /opt/tei-cache:/data \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5
```
This should get you up and running with a local instance of BAAI/bge-large-en-v1.5, a well-performing open-source model.
You can check if it's working with:
```shell
curl -X POST \
  'http://localhost:8081/embed' \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": "Testing string for embeddings" }'
```
This should return an array of floats under normal operation.
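Those floats are the embedding vector that features like Related Topics compare by similarity. As a minimal sketch of the idea (a toy illustration, not Discourse's actual implementation, which works on the full-size vectors in the database), cosine similarity between two embeddings can be computed like this:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 1024-dimensional
# bge-large-en-v1.5 embeddings.
v1 = [0.1, 0.3, -0.2]
v2 = [0.1, 0.25, -0.15]
print(cosine_similarity(v1, v2))  # close to 1.0 for similar texts
```

Topics whose vectors score closest to a given topic's vector are the ones surfaced as related.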
## Making it available for your Discourse instance
Most of the time, you will be running this on a dedicated server because of the GPU speed-up. When doing so, I recommend running a reverse proxy, doing TLS termination, and securing the endpoint so it can only be accessed by your Discourse instance.
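As one possible sketch of that setup, here is a minimal nginx server block. The hostname `tei.example.com`, the certificate paths, and the allowed IP `203.0.113.10` are all placeholders; adjust them to your environment.

```nginx
server {
    listen 443 ssl;
    server_name tei.example.com;

    ssl_certificate     /etc/letsencrypt/live/tei.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/tei.example.com/privkey.pem;

    # Only allow your Discourse server to reach the endpoint
    allow 203.0.113.10;
    deny all;

    location / {
        # TEI container published on localhost:8081 as in the docker run above
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host $host;
    }
}
```

IP allow-listing is the simplest option; alternatives like mutual TLS or an auth header enforced at the proxy work equally well.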
## Configuring Discourse AI
Discourse AI now uses a fully configurable embedding definition system, similar to how LLMs are configured. To set up your self-hosted endpoint:
- Navigate to Admin → Plugins → Discourse AI → Embeddings.
- Click New to create a new embedding definition.
- Select a preset that matches your model (e.g. `bge-large-en`, `bge-m3`, or `multilingual-e5-large`), or choose Configure manually for any other model.
- Set the URL to point to your self-hosted TEI server (e.g. `https://your-tei-server:8081`).
- Use the Test button to verify connectivity before saving.
- After saving, set `ai_embeddings_selected_model` to your new embedding definition.
Once configured, Discourse will automatically backfill embeddings for existing topics via a scheduled background job. If you have a large backlog, you can increase the hidden setting `ai_embeddings_backfill_batch_size` (default: 250) to process topics faster.
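Since hidden settings do not appear in the admin UI, one way to change it (assuming a standard Docker-based install under `/var/discourse`) is via the Rails console:

```shell
cd /var/discourse
./launcher enter app
rails c
# then, inside the Rails console:
SiteSetting.ai_embeddings_backfill_batch_size = 1000
```

The backfill job will pick up the new batch size on its next run.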
Last edited by @Falco 2025-03-21T17:34:12Z