Is there some cost benchmark, yardstick, or guesstimate formula that will help me understand the one-time (mass embedding) and ongoing (embedding and search) costs of enabling Discourse AI using a cloud-based LLM?
For a self-hosted LLM, what is a typical server config/cost that would be required?
It is better with a GPU if you want to self-host. Check out things like Ollama.
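For a sense of what self-hosting with Ollama looks like in practice: it serves models over a local HTTP API on port 11434. A minimal sketch, assuming you have already installed Ollama and pulled a model (the model name here is an illustrative assumption):

```python
# Minimal sketch: querying a locally running Ollama server.
# Assumes a model was pulled first, e.g. `ollama pull llama3`
# (the model choice is an assumption, not a Discourse requirement).

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarize this forum topic in one sentence: ...",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```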
Also see:
In order to use certain Discourse AI features, users are required to use a Large Language Model (LLM) provider. Please see each AI feature to determine which LLMs are compatible.
If cost is a significant worry, one way to combat that is to set usage limits directly with the vendor and work from a monthly budget. Another option is to let only select users and groups access the AI features.
There are several variable factors to consider when calculating the costs of using…
Falco, October 28, 2025, 1:11pm
Related Topics and AI search don't use an LLM; they only need an embeddings model.
It’s one request per topic for mass embeddings, so most sites should be able to do it using something like the Gemini Free tier.
Search is one request per search, which most likely can also fit in the free tier.
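If you want a rough number rather than a rule of thumb, the arithmetic is just tokens embedded × price per token. A minimal sketch, where every figure (topic count, average tokens per topic, per-token price, daily traffic) is an assumption you should replace with your own site's numbers and your vendor's pricing:

```python
# Back-of-the-envelope estimate for embedding costs on a paid cloud
# embeddings API. All numbers below are illustrative assumptions,
# not Discourse defaults or real vendor prices.

topics = 50_000                  # assumed forum size
avg_tokens_per_topic = 1_000     # assumed tokens embedded per topic
price_per_million_tokens = 0.02  # assumed USD price; check your vendor

# One-time: one embedding request per existing topic.
one_time = topics * avg_tokens_per_topic / 1_000_000 * price_per_million_tokens
print(f"One-time mass embedding: ~${one_time:.2f}")

# Ongoing: one request per new/edited topic plus one per search.
searches_per_day = 2_000   # assumed
new_topics_per_day = 100   # assumed
daily_tokens = (searches_per_day + new_topics_per_day) * avg_tokens_per_topic
monthly = daily_tokens * 30 / 1_000_000 * price_per_million_tokens
print(f"Ongoing: ~${monthly:.2f}/month")
```

Even with generous assumptions the totals land in the low single dollars, which is why a free tier usually covers it.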
SubStrider: "For self hosted"
Since this is just an embeddings model, you should be able to self-host Qwen/Qwen3-Embedding-0.6B (Hugging Face) using huggingface/text-embeddings-inference (GitHub), a blazing-fast inference server for text embeddings models, on a basic 2 vCPU / 4 GB RAM machine easily.
It is faster on a server with a GPU, of course, but runs just fine on one without.
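As a sketch of what that looks like once the container is up: TEI exposes a simple HTTP /embed endpoint. The port mapping and request shape below follow the TEI README; the exact image tag is an assumption to check against the current releases:

```python
# Minimal sketch: querying a self-hosted text-embeddings-inference
# (TEI) server. Assumes the CPU image was started roughly like:
#
#   docker run -p 8080:80 -v $PWD/data:/data \
#     ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
#     --model-id Qwen/Qwen3-Embedding-0.6B
#
# (verify the image tag against the TEI repo before relying on it)

import requests

resp = requests.post(
    "http://127.0.0.1:8080/embed",
    json={"inputs": "How do I estimate Discourse AI embedding costs?"},
    timeout=30,
)
resp.raise_for_status()

embedding = resp.json()[0]  # TEI returns one vector per input string
print(f"Vector dimension: {len(embedding)}")
```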