Estimating cost of enabling Discourse AI for related content and search

Related Topics and AI search don’t use an LLM; they only need an embeddings model.

Backfilling embeddings is one request per topic, so most sites should be able to do it with something like the Gemini free tier.

Search is one request per search, which most likely also fits in the free tier.
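
To sanity-check the free-tier claim, here is a back-of-envelope sketch. The topic count, search volume, and rate limits below are hypothetical placeholders, not actual Gemini quotas; plug in your own site's numbers and your provider's published limits:

```python
# Back-of-envelope check: does an embeddings backfill fit in a free tier?
# All numbers are hypothetical placeholders -- substitute your own.

topics = 20_000          # total topics to embed (one request each)
searches_per_day = 500   # expected daily search volume (one request each)

rpm_limit = 100          # hypothetical requests-per-minute cap
rpd_limit = 1_000        # hypothetical requests-per-day cap

# The one-off backfill is bounded by whichever cap bites first.
backfill_minutes = topics / rpm_limit
backfill_days = topics / rpd_limit
print(f"Backfill: ~{backfill_minutes:.0f} min of requests, "
      f"spread over ~{backfill_days:.0f} days at {rpd_limit} req/day")

# After the backfill, steady-state load is just searches plus a handful
# of new or edited topics -- far below the daily cap for most sites.
print(f"Steady state: ~{searches_per_day} req/day vs cap of {rpd_limit}")
```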

Since this is just an embeddings model, you should be able to self-host [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) using [huggingface/text-embeddings-inference](https://github.com/huggingface/text-embeddings-inference) on a basic 2 vCPU / 4 GB RAM server easily.

It is faster on a server with a GPU, of course, but runs just fine on one without.
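
If you go the self-hosted route, TEI exposes a plain HTTP API. A minimal client sketch, assuming a TEI server is already running on localhost port 8080 (the Docker invocation in the comment follows the TEI README's pattern; the exact image tag may differ for your setup):

```python
# Minimal client for a self-hosted text-embeddings-inference (TEI) server.
# Assumes TEI is already running on localhost:8080, started e.g. with:
#   docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
#     --model-id Qwen/Qwen3-Embedding-0.6B
import requests

def embed(texts: list[str]) -> list[list[float]]:
    """POST to TEI's /embed endpoint; returns one vector per input text."""
    resp = requests.post(
        "http://localhost:8080/embed",
        json={"inputs": texts},
    )
    resp.raise_for_status()
    return resp.json()

vectors = embed(["How do I enable Related Topics?", "Configuring AI search"])
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```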
