The Discourse AI plugin has many features that require embeddings to work, such as Related Topics, AI Search, AI Helper Category and Tag suggestions, etc. While you can use a third-party API, like Configure API Keys for OpenAI, Configure API Keys for Cloudflare Workers AI, or Configure API Keys for Google Gemini, we built Discourse AI from day one so it would not be locked into those.
## Running with HuggingFace TEI
HuggingFace provides an awesome container image that can get you running quickly.
For example:

```shell
mkdir -p /opt/tei-cache
docker run --rm --gpus all --shm-size 1g -p 8081:80 \
  -v /opt/tei-cache:/data \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5
```
This should get you up and running with a local instance of BAAI/bge-large-en-v1.5, a well-performing open-source model.
You can check if it's working with:
```shell
curl -X POST \
  'http://localhost:8081/embed' \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": "Testing string for embeddings" }'
```
This should return an array of floats under normal operation.
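Those floats are the embedding vector that features like Related Topics compare by similarity. As a minimal sketch of the idea (a toy illustration, not Discourse's actual implementation, which works on the full-size vectors in the database), cosine similarity between two embeddings can be computed like this:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 1024-dimensional
# bge-large-en-v1.5 embeddings.
v1 = [0.1, 0.3, -0.2]
v2 = [0.1, 0.25, -0.15]
print(cosine_similarity(v1, v2))  # close to 1.0 for similar texts
```

Topics whose vectors score closest to a given topic's vector are the ones surfaced as related.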
## Making it available for your Discourse instance
Most of the time, you will be running this on a dedicated server because of the GPU speed-up. When doing so, I recommend running a reverse proxy, doing TLS termination, and securing the endpoint so it can only be accessed by your Discourse instance.
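As one possible sketch of that setup, here is a minimal nginx server block. The hostname `tei.example.com`, the certificate paths, and the allowed IP `203.0.113.10` are all placeholders; adjust them to your environment.

```nginx
server {
    listen 443 ssl;
    server_name tei.example.com;

    ssl_certificate     /etc/letsencrypt/live/tei.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/tei.example.com/privkey.pem;

    # Only allow your Discourse server to reach the endpoint
    allow 203.0.113.10;
    deny all;

    location / {
        # TEI container published on localhost:8081 as in the docker run above
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host $host;
    }
}
```

IP allow-listing is the simplest option; alternatives like mutual TLS or an auth header enforced at the proxy work equally well.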
## Configuring Discourse AI
Discourse AI now uses a fully configurable embedding definition system, similar to how LLMs are configured. To set up your self-hosted endpoint:
- Navigate to Admin → Plugins → Discourse AI → Embeddings.
- Click New to create a new embedding definition.
- Select a preset that matches your model (e.g. `bge-large-en`, `bge-m3`, or `multilingual-e5-large`), or choose Configure manually for any other model.
- Set the URL to point to your self-hosted TEI server (e.g. `https://your-tei-server:8081`).
- Use the Test button to verify connectivity before saving.
- After saving, set `ai_embeddings_selected_model` to your new embedding definition.
Once configured, Discourse will automatically backfill embeddings for existing topics via a scheduled background job. If you have a large backlog, you can increase the hidden setting `ai_embeddings_backfill_batch_size` (default: 250) to process topics faster.
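Since hidden settings do not appear in the admin UI, one way to change it (assuming a standard Docker-based install under `/var/discourse`) is via the Rails console:

```shell
cd /var/discourse
./launcher enter app
rails c
# then, inside the Rails console:
SiteSetting.ai_embeddings_backfill_batch_size = 1000
```

The backfill job will pick up the new batch size on its next run.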
Last edited by @Falco 2025-03-21T17:34:12Z