Self-Hosting Embeddings for DiscourseAI

A friend just DM’ed me this thread.

Some pros and cons of Infinity vs. TEI:

Pros:

  • Infinity supports multi-modal embeddings (i.e. you can send images/audio to the same endpoint).
  • AMD GPU support.
  • Multiple models can be served from the same container (select one per request via the `model` parameter; see the sketch after this list).
  • More dtypes, e.g. int8 quantization of the weights (mostly irrelevant in practice, since activation memory is the larger share).
  • New models often ship as “custom modeling code” in the Hugging Face repo; Infinity can load that PyTorch code when needed. This helps avoid ongoing “can you support xyz model?” requests.
  • More model architectures supported (e.g. DebertaV2 for the mixedbread models).
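
As a concrete example of the `model` parameter, here is a minimal sketch against Infinity’s OpenAI-compatible `/embeddings` route. The base URL, port (7997 is Infinity’s usual default), and both model names are assumptions; swap in whatever you actually deployed:

```python
import requests

BASE_URL = "http://localhost:7997"  # assumption: Infinity's default port

def embed(texts, model):
    # Infinity serves an OpenAI-compatible /embeddings route; the `model`
    # field picks between the models loaded into the same container.
    resp = requests.post(
        f"{BASE_URL}/embeddings",
        json={"model": model, "input": texts},
        timeout=30,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

# Same container, two different models, selected per request:
vecs_a = embed(["hello world"], model="BAAI/bge-small-en-v1.5")
vecs_b = embed(["hello world"], model="mixedbread-ai/mxbai-embed-large-v1")
print(len(vecs_a[0]), len(vecs_b[0]))  # dimensions differ per model
```

Because both models live in one container, DiscourseAI (or anything else) can switch models per request without redeploying.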

Cons:

  • TEI has a better cold-start time.
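
If you want to poke at the int8 point locally without a container, Infinity also exposes a Python API. A rough sketch, assuming the `infinity_emb` package and the `EngineArgs` fields shown in the project README; the model name and `dtype="int8"` are placeholders illustrating the weight-quantization option:

```python
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

# Placeholder model; dtype="int8" illustrates the weight-quantization option.
engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(
        model_name_or_path="BAAI/bge-small-en-v1.5",
        engine="torch",
        dtype="int8",
    )
)

async def main() -> None:
    async with engine:  # starts and stops the internal batching loop
        embeddings, usage = await engine.embed(sentences=["Embed me."])
        print(len(embeddings[0]), "dims,", usage, "tokens")

asyncio.run(main())
```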