Is there a cost benchmark, yardstick, or guesstimate formula that will help me understand the one-time (mass embedding) and ongoing (embedding and search) costs of enabling Discourse AI with a cloud-based LLM?
For a self-hosted LLM, what is a typical server configuration, and roughly what would it cost?
It is better with a GPU if you want to self-host. Check out tools like Ollama.
Also see:
In order to use certain Discourse AI features, users are required to use a Large Language Model (LLM) provider. Please see each AI feature to determine which LLMs are compatible.
If cost is a significant worry, Discourse AI has several built-in tools to help manage spending:
AI Usage dashboard — track token consumption per feature, model, and user with estimated costs
Usage quotas — set per-model, per-group limits on tokens or request counts within configura…
Falco
October 28, 2025, 1:11pm
Related Topics and AI search don’t use an LLM; they use an embeddings model.
It’s one request per topic for mass embeddings, so most sites should be able to do it using something like the Gemini Free tier.
Search is one request per search, which will most likely also fit within the free tier.
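A rough guesstimate formula follows from the "one request per topic, one request per search" rule: spend is about requests × average tokens per request × price per token, and whether a free tier is enough is mostly a question of request counts. The sketch below uses hypothetical inputs throughout (topic count, average tokens per topic, a per-million-token price, a daily request cap); none of them are real Gemini prices or limits, so substitute your provider's current numbers. It also assumes each new or edited topic costs one more embedding request, like the backfill.

```python
import math

# All numbers passed in are hypothetical placeholders: use your own site stats
# and your embedding provider's current pricing and rate limits.


def embedding_cost_usd(requests: int, avg_tokens_per_request: float,
                       usd_per_million_tokens: float) -> float:
    """Estimated spend = total tokens sent x price per token."""
    total_tokens = requests * avg_tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens


def days_to_backfill(topics: int, requests_per_day_cap: int) -> int:
    """Mass embedding is one request per topic, so a daily request cap
    (e.g. a free-tier limit) bounds how many days the backfill takes."""
    return math.ceil(topics / requests_per_day_cap)


def monthly_ongoing_cost_usd(new_topics_per_day: int, avg_topic_tokens: float,
                             searches_per_day: int, avg_query_tokens: float,
                             usd_per_million_tokens: float, days: int = 30) -> float:
    """Ongoing cost: one request per new/edited topic (assumption) plus one per search."""
    topic_cost = embedding_cost_usd(new_topics_per_day * days, avg_topic_tokens,
                                    usd_per_million_tokens)
    search_cost = embedding_cost_usd(searches_per_day * days, avg_query_tokens,
                                     usd_per_million_tokens)
    return topic_cost + search_cost


if __name__ == "__main__":
    # Illustrative inputs only, not real prices or quotas.
    price = 0.10  # USD per 1M input tokens (hypothetical)
    print(f"one-time backfill: ${embedding_cost_usd(50_000, 1_000, price):.2f}")
    print(f"backfill days under a 1,000 requests/day cap: {days_to_backfill(50_000, 1_000)}")
    print(f"monthly ongoing: ${monthly_ongoing_cost_usd(50, 1_000, 2_000, 20, price):.2f}")
```

Search queries are short, so with inputs like these the ongoing cost is usually dominated by topic embeddings, not searches; `days_to_backfill` works from request counts rather than tokens because that is typically what a free-tier cap limits.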
SubStrider wrote: "For self hosted"
Since this is just an embeddings model, you should easily be able to self-host Qwen/Qwen3-Embedding-0.6B (Hugging Face) with huggingface/text-embeddings-inference (GitHub) on a basic 2 vCPU / 4 GB RAM server.
It’s faster on a server with a GPU, of course, but it runs just fine on one without.
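To check that a self-hosted embeddings server responds, you can call the text-embeddings-inference `/embed` route directly, which accepts a JSON body like `{"inputs": [...]}` and returns one vector per input. This is a minimal stdlib-only Python sketch; it assumes TEI is already running with Qwen/Qwen3-Embedding-0.6B and that its HTTP port is published as `localhost:8080`, which is an assumption about your setup. The helper names `build_embed_request` and `embed` are hypothetical.

```python
import json
import urllib.request

# Assumption: a text-embeddings-inference (TEI) server is reachable here.
# Adjust BASE_URL to wherever you published the container's HTTP port.
BASE_URL = "http://localhost:8080"


def build_embed_request(texts: list[str], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /embed request whose JSON body is {"inputs": [...]}."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], base_url: str = BASE_URL, timeout: float = 30.0) -> list[list[float]]:
    """Send texts to TEI and return one embedding vector per input text."""
    req = build_embed_request(texts, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `embed(["How do I enable Discourse AI?"])` should return a list containing one vector. Timing a few such calls on a 2 vCPU / 4 GB box gives a rough idea of whether CPU-only inference keeps up with your search volume.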