Estimating cost of enabling Discourse AI for related content and search

Is there some cost benchmark, yardstick, or guesstimate formula that will help me understand the one-time (mass embedding) and ongoing (embedding and search) cost of enabling Discourse AI using a cloud-based LLM?

For a self-hosted LLM, what is a typical server config/cost that would be required?

I believe it is better with a GPU if you want to self-host. Check out things like Ollama.

Related Topics and AI search don’t use an LLM.

It’s one request per topic for mass embeddings, so most sites should be able to do it using something like the Gemini Free tier.

Search is one request per search, which most likely can also fit in the free tier.
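
If it helps, here is a rough back-of-envelope sketch of how that cost model works out. Every number in it (topic count, average tokens, search volume, per-token price) is a placeholder assumption, so plug in your own figures and your provider's current pricing:

```python
# Back-of-envelope estimate of embedding costs for related topics + AI search.
# All numbers below are placeholder assumptions -- substitute your own topic
# count, average length, search volume, and your provider's per-token price.

TOPICS = 50_000                    # assumed number of topics to mass-embed once
AVG_TOKENS_PER_TOPIC = 1_000       # assumed tokens sent per topic for embedding
SEARCHES_PER_DAY = 500             # assumed daily AI search volume
AVG_TOKENS_PER_QUERY = 20          # assumed tokens per search query
PRICE_PER_MILLION_TOKENS = 0.02    # placeholder USD price; a free tier may cover this

# One-time cost: one embedding request per topic.
one_time_tokens = TOPICS * AVG_TOKENS_PER_TOPIC
one_time_cost = one_time_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# Ongoing cost: one embedding request per search query.
monthly_search_tokens = SEARCHES_PER_DAY * 30 * AVG_TOKENS_PER_QUERY
monthly_search_cost = monthly_search_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(f"One-time mass embedding: {one_time_tokens:,} tokens, about ${one_time_cost:.2f}")
print(f"Search embeddings: {monthly_search_tokens:,} tokens/month, about ${monthly_search_cost:.2f}/month")
```

Even with generous assumptions the totals stay tiny, which is why the free tiers usually suffice.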

Since this is just an embeddings model, you should be able to self-host Qwen/Qwen3-Embedding-0.6B (from Hugging Face) with huggingface/text-embeddings-inference on a basic 2 vCPU / 4 GB RAM server easily.

It is faster on a server with a GPU, of course, but runs just fine on one without it.
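
If you go the self-hosted route, a quick way to sanity-check the server is to hit its embed endpoint. The sketch below assumes text-embeddings-inference is running the Qwen model with its port mapped to localhost:8080; adjust the URL and port for your own deployment:

```python
# Minimal smoke test for a self-hosted text-embeddings-inference (TEI) server.
# Assumes the container was started with Qwen3-Embedding-0.6B and its port
# mapped to localhost:8080 -- change TEI_URL to match your deployment.
import requests

TEI_URL = "http://localhost:8080/embed"  # TEI's native embed route

resp = requests.post(
    TEI_URL,
    json={"inputs": ["How do I estimate Discourse AI embedding costs?"]},
    timeout=30,
)
resp.raise_for_status()

embeddings = resp.json()  # one embedding vector per input string
print(f"Got {len(embeddings)} embedding(s) of dimension {len(embeddings[0])}")
```

If that returns a vector in a reasonable time on CPU, the instance should be plenty for a typical forum's embedding and search traffic.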