Self-Hosting Embeddings for DiscourseAI

Discourse · 08 January 2024 20:49:12

The Discourse AI plugin has many features that require embeddings to work, such as Related Topics, AI Search, AI Helper, and category and tag suggestions. While you can use a third-party API, for example by configuring API keys for OpenAI, Cloudflare Workers AI, or Google Gemini, we built Discourse AI from day one so that it is not locked into those services.

Running with HuggingFace TEI

HuggingFace provides an excellent container image that can get you up and running quickly.

For example:

mkdir -p /opt/tei-cache
docker run --rm --gpus all --shm-size 1g -p 8081:80 \
  -v /opt/tei-cache:/data \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5

This will get you a local instance of BAAI/bge-large-en-v1.5, a very performant open-source model.

You can check that it is working with:

curl -X POST \
  'http://localhost:8081/embed' \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": "Testing string for embeddings" }'

If it is working normally, it should return an array of floats.
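You may want the check to confirm more than "some numbers came back." The small helper below inspects the response body and prints how many dimensions the first embedding has. `embedding_dims` is a hypothetical name and is not part of TEI. It assumes TEI wraps the vector in an outer array with one entry per input string, which matches its /embed responses as far as I know. An error object makes it fail instead. For BAAI/bge-large-en-v1.5 you would expect 1024.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TEI or Discourse).
# Reads a TEI /embed JSON response on stdin and prints the number of values
# in the first embedding. Returns 1 if the body does not start like a nested
# array of numbers (for example, an error object).
embedding_dims() {
  local body first
  body=$(cat)
  # Expect something like [[0.01,-0.02,...]]; anything else is treated as an error.
  if [[ ! $body =~ ^\[\[-?[0-9] ]]; then
    echo "unexpected response: ${body:0:80}" >&2
    return 1
  fi
  first=${body#\[\[}    # drop the leading "[["
  first=${first%%\]*}   # keep everything before the first "]"
  # One comma-separated field per dimension.
  awk -F',' '{ print NF }' <<< "$first"
}

# Usage against the container started above (assumes port 8081):
#   curl -s -X POST 'http://localhost:8081/embed' \
#     -H 'Content-Type: application/json' \
#     -d '{ "inputs": "Testing string for embeddings" }' | embedding_dims
```

If jq is installed, it would be a more robust JSON check. The pure-bash version only avoids adding a dependency on the server.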

Making it available to your Discourse instance

Most of the time, you will run this on a dedicated server to benefit from GPU acceleration. When doing so, I recommend running a reverse proxy, doing TLS termination, and securing the endpoint so that only your Discourse instance can connect to it.

Configuring DiscourseAI

Discourse AI now uses a fully configurable embedding definition system, similar to the way LLMs are configured. To set up your self-hosted endpoint:

Go to Admin → Plugins → Discourse AI → Embeddings.
Click New to create a new embedding definition.
Choose the preset that matches your model (e.g. bge-large-en, bge-m3, or multilingual-e5-large), or choose Manual configuration for any other model.
Enter the URL of your self-hosted TEI server (e.g. https://your-tei-server:8081).
Use the Test button to verify the connection before saving.
After saving, set the ai_embeddings_selected_model setting to your new embedding definition.

Once configured, Discourse will automatically backfill embeddings for existing topics using a scheduled background job. If you have a large backlog, you can increase the hidden setting ai_embeddings_backfill_batch_size (default: 250) to process topics faster.
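Here is a back-of-the-envelope sketch of what the batch size changes. It assumes that each scheduled backfill run embeds at most ai_embeddings_backfill_batch_size topics. The backlog of 10000 topics and the `runs_needed` helper are made up for illustration. The interval between runs is not covered here.

```shell
#!/usr/bin/env bash
# Rough estimate of how many backfill runs it takes to drain a backlog,
# ASSUMING each scheduled run embeds at most ai_embeddings_backfill_batch_size
# topics. runs_needed and the backlog size are hypothetical examples.
runs_needed() {
  local backlog=$1 batch=$2
  # Ceiling division: a partial final batch still needs its own run.
  echo $(( (backlog + batch - 1) / batch ))
}

topics_without_embeddings=10000
echo "default (250):  $(runs_needed "$topics_without_embeddings" 250) runs"
echo "raised (1000):  $(runs_needed "$topics_without_embeddings" 1000) runs"
```

Under that assumption, raising the batch size fourfold cuts the number of runs fourfold, from 40 to 10 here. Each run will then take longer and put more load on the embeddings server.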

satonotdead · 14 February 2024 01:51:18

Should the bge-m3 model work for multilingual (or non-English) sites?

Falco · 14 February 2024 04:14:47

Yes, I played with it the week it was quietly shared on GitHub, and it works well. I’m still waiting to see where it lands on the MTEB leaderboards, as it wasn’t there last time I looked.

That said, we have large hosted Discourse instances using the multilingual model the plugin ships with, e5, and it performs very well.

satonotdead · 14 February 2024 14:24:38

Thanks. Do you have plans to support custom endpoints for open-source embedding models? I’m trying to use these models on HuggingFace.

Falco · 15 February 2024 22:48:07

Sorry, I don’t understand what you are trying to convey here. This topic is a guide on how to run open-source models for Discourse AI embeddings.

satonotdead · 16 February 2024 14:37:55

Oh, sorry about that. I’m trying to use an open-source model through a HuggingFace custom endpoint, and I wonder whether that’s possible or planned for the near future.

fokx · 28 April 2024 03:40:37

To check if it’s working, the following command works for me (with BAAI/bge-m3 model):

curl -X 'POST' \
  'http://localhost:8081/embed' \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": "Testing string for embeddings"}'

BTW, you can also use the Swagger web interface at http://localhost:8081/docs/.

Isambard · 16 May 2024 20:19:05

This is also a nice embeddings server:

Isambard · 29 November 2024 13:06:41

To save space, is it possible to use quantized embeddings? I’d like to use binary quantized embeddings to really cut down the storage size. Having done some tests, I get >90% performance with 32x less storage!
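One common form of binary quantization keeps only the sign of each dimension. Each 32-bit float becomes a single bit, which is where a 32x saving comes from. Vectors are then compared by Hamming distance, the number of differing bits, rather than by cosine similarity. The sketch below illustrates that generic technique with hypothetical `binarize` and `hamming` helpers. It is not Discourse's or pgvector's implementation.

```shell
#!/usr/bin/env bash
# Generic sign-based binary quantization, for illustration only.
# binarize and hamming are hypothetical helper names.

# binarize: comma-separated floats on stdin -> one bit per dimension
# (1 if the value is > 0, else 0).
binarize() {
  tr ',' '\n' | awk 'NF { printf "%d", ($1 > 0) ? 1 : 0 } END { print "" }'
}

# hamming: number of positions where two equal-length bit strings differ.
# A smaller distance means the original vectors point in more similar directions.
hamming() {
  local a=$1 b=$2 i d=0
  if (( ${#a} != ${#b} )); then
    echo "length mismatch" >&2
    return 1
  fi
  for (( i = 0; i < ${#a}; i++ )); do
    if [[ ${a:i:1} != "${b:i:1}" ]]; then
      d=$(( d + 1 ))
    fi
  done
  echo "$d"
}

# Usage:
#   a=$(printf '0.12,-0.5,0.3,-0.01' | binarize)   # 1010
#   b=$(printf '0.2,-0.1,-0.4,-0.3' | binarize)    # 1000
#   hamming "$a" "$b"                              # 1
```

Because only the sign survives, some ranking quality is lost. That is why a quality figure like the ">90%" mentioned in the question depends on the model and the data.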

Falco · 29 November 2024 13:49:54

As of a few weeks ago, we store embeddings at half precision by default, which halves the storage space. We also use binary quantization for indexes, which makes them 32x smaller. Simply updating your site to the latest version should therefore give you a large reduction in disk usage.
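Here is the per-vector arithmetic for a 1024-dimension model such as BAAI/bge-large-en-v1.5. It ignores row, page, and index overhead in PostgreSQL, so treat it as a rough lower bound rather than what you will see on disk.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope storage per embedding, ASSUMING a 1024-dimension model
# and ignoring any per-row or per-index overhead in the database.
dims=1024
float32_bytes=$(( dims * 4 ))   # 4 bytes per value at full precision
float16_bytes=$(( dims * 2 ))   # 2 bytes per value at half precision
binary_bytes=$(( dims / 8 ))    # 1 bit per value after binary quantization

echo "float32 vector:   ${float32_bytes} bytes"
echo "float16 vector:   ${float16_bytes} bytes"
echo "binary index key: ${binary_bytes} bytes"
echo "float32 / binary: $(( float32_bytes / binary_bytes ))x"
```

This prints 4096, 2048, and 128 bytes, with a 32x ratio between the full-precision vector and its binary form.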

Isambard · 29 November 2024 22:27:29

Could you please also add:

to the supported embedding models?

Falco · 29 November 2024 22:53:02

We plan on making embeddings configurable the same way we did with LLMs, so any model will be compatible soon.

Isambard · 30 November 2024 00:00:30

If anyone else has problems with endpoints on the local network (e.g. 192.168.x.x): it seems Discourse blocks these, presumably for security reasons, and the block needs to be bypassed. I lost some hours figuring that one out!

Isambard · 30 November 2024 08:19:44

@Falco that would be great. In the interim, if I wanted to have a stab at adding in a new embedding model, do I just need to add:

 lib/embeddings/vector_representations/mxbai-embed-xsmall-v1.rb
 lib/tokenizer/mxbai-embed-xsmall-v1.rb
 tokenizers/mxbai-embed-xsmall-v1.json

and modify lib/embeddings/vector_representations/base.rb to include the new model, or is there something else I need to change too?

Isambard · 30 November 2024 14:11:07

@Falco I tried my hand at adding the model and sent a pull request. Apologies if I did something wrong, as I’m not really a software developer. I hoped you could look it over and see whether it is OK for inclusion.

Unfortunately, I was not able to get it working with TEI. I could get the all-mpnet working with TEI, but I think there’s something wrong with what I have done to get mxbai working.

BTW, any chance of supporting https://github.com/michaelfeil/infinity as an embedding server?

EDIT: I see this is going to be messy. The HNSW indexes in the database seem to be hardcoded, so new models need to be appended at the end to avoid disrupting the ordering, and each new model needs to add its own index.

Falco · 30 November 2024 22:51:29

I really recommend waiting a couple of weeks until we ship support for configurable embeddings.

This should work fine when we ship configurable embeddings. Out of curiosity, though, what would that bring over huggingface/text-embeddings-inference on GitHub?

Isambard · 03 December 2024 23:55:21

I haven’t kept up with TEI, so I won’t mention advantages I haven’t tested recently. Of the things I saw recently:

Hardware support: infinity has better GPU support than TEI
infinity server can host multiple embedding models in a single server (unless I missed this in TEI)

It’s very nice. If you haven’t tried it, you should take a look!

michaelfeil · 31 December 2024 14:45:07

A friend just DM’ed me this thread.

Some pros and cons.

Pros:

infinity supports multi-modal embeddings (aka send images/audio) to the
AMD GPU support
multiple models supported in the same container (control the model via model param).
more dtypes e.g. int8 quantization of the weights (mostly this is irrelevant, activation memory is larger)
new models often come out via “custom modeling code” shipped in the HuggingFace repo. Infinity reads this PyTorch code if needed, which will help you avoid “can you support xyz models” requests on an ongoing basis
more models supported (e.g. debertav2 for mixedbread)
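To illustrate the "model param" point: infinity exposes an OpenAI-style /embeddings route. With several models loaded in one container, you pick one per request through the `model` field. The sketch below makes several assumptions:

- infinity listens on its default port, 7997.
- The image is michaelf34/infinity.
- Its v2 command accepts repeated `--model-id` flags.

Please check infinity's README or `infinity_emb v2 --help` before relying on these. `infinity_payload` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the JSON body for infinity's /embeddings route.
# It does NO JSON escaping, so keep the text free of double quotes and backslashes.
infinity_payload() {
  local model=$1 text=$2
  printf '{"model": "%s", "input": ["%s"]}' "$model" "$text"
}

# Assumed usage (verify flags, image name, and port against infinity's docs):
#   docker run --gpus all -p 7997:7997 michaelf34/infinity:latest \
#     v2 --model-id BAAI/bge-m3 --model-id BAAI/bge-large-en-v1.5 --port 7997
#
#   # Same container, different model per request:
#   curl -s -X POST 'http://localhost:7997/embeddings' \
#     -H 'Content-Type: application/json' \
#     -d "$(infinity_payload BAAI/bge-m3 'Testing string for embeddings')"
```

The request shape differs from TEI's /embed (`input` plus `model` instead of `inputs`). Any Discourse embedding definition pointing at infinity would therefore need a provider that speaks the OpenAI-style format.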

Cons:

cold start time of TEI is better

sam · 15 January 2025 23:23:19

Hi Michael

@roman has been busy restructuring our embedding config at:

FEATURE: configurable embeddings (github.com/discourse/discourse-ai, main ← data_driven_embeddings, by romanrizzi, +2124 -1001, merged 21 Jan 2025 15:23 UTC)

Adds a way to configure embeddings similar to what we already have for other concepts like LLMs, tools, personas, etc. It hides many old settings and adds a new one called "ai_embeddings_selected_model". We include a data migration to seed the model using these old settings. It also removes the `DiscourseClassifier` service.

We should be done very, very soon. Once that is done, adding support for infinity should be trivial.

I still think a lot about multi-modal embeddings. They give you a shortcut when doing RAG on PDFs, because you just render each page as an image and embed each image. That avoids the need for OCR or expensive LLM-powered image-to-text.

Once we get this PR done, we will be more than happy to add infinity support (and multi-modal support) to the embedding config.

Thanks for popping in

Isambard · 23 January 2025 11:45:12

I wonder whether building litellm support might offer a shortcut, since you would then benefit from all the models supported via litellm. Other projects seem to embed it.
