I’m using Discourse AI and have it connected to an external LLM API, along with some extensions (which come with additional costs).
Because of that, I’d like to configure it so that it does not use those extensions, and instead relies only on the LLM itself as a forum helper—for things like internal search, summarization, or other features that work purely within the forum.
The main reason is to reduce the cost of paid add-ons (e.g., external web search), so I’m looking for guidance on how to set things up this way.
Thanks!
Edit:
I got a reply from the provider saying that this request was charged for web search because the AI cited sources like BBC, Reuters, and others, which automatically triggered the model’s evidence/search mode.
So does this mean this isn’t related to Discourse settings, and there’s no way to disable this behavior from the provider side?
Is there any workaround for this?
The provider suggested switching to a model that does less “thinking” and avoiding flash or instruct models, but that also means reduced reasoning and computation capability.
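One workaround I’m considering: if the provider exposes an OpenAI-compatible endpoint, the request itself can ask the model not to call any tools. I can’t confirm whether MiMo honors `tool_choice`, and the endpoint URL below is a placeholder, so treat this as a sketch only:

```python
import requests

# Hypothetical OpenAI-compatible endpoint; the real MiMo URL may differ.
API_URL = "https://api.example-provider.com/v1/chat/completions"

resp = requests.post(
    API_URL,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "xiaomi/mimo-v2-flash",
        "messages": [
            {"role": "user", "content": "Summarize the latest forum topics."},
        ],
        # On OpenAI-style APIs, tool_choice="none" tells the model not to
        # call any tool; a provider-side search layer may still ignore it.
        "tool_choice": "none",
    },
    timeout=60,
)
print(resp.json())
```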
**This message was translated from Thai using a translation tool, so I apologize in advance if anything is unclear or slightly incorrect.**
What do you mean by extensions? I assume you mean web searching?
I have 2 self-hosted sites running all my Discourse AI features with Gemini on Google Cloud, and I am using the Google Custom Search Engine API for the web researcher (100 free queries/day). I use Gemini 2.5 Flash Lite for as much as possible, like summarizing and gists, 2.5 Flash for translation, and the various other Gemini models for more specific, thinking-heavy tasks (Gemini Flash Image, for example).
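If it helps, the Custom Search JSON API is just a GET request, so it’s easy to test quota usage directly. A minimal sketch in Python, with placeholder credentials:

```python
import requests

# Google Custom Search JSON API: the free tier is 100 queries/day.
# "key" and "cx" below are placeholders for your own credentials.
params = {
    "key": "YOUR_GOOGLE_API_KEY",
    "cx": "YOUR_SEARCH_ENGINE_ID",
    "q": "discourse ai web search",
}
resp = requests.get(
    "https://www.googleapis.com/customsearch/v1",
    params=params,
    timeout=30,
)
for item in resp.json().get("items", []):
    print(item["title"], item["link"])
```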
Ah, got it — thanks for clarifying! Yeah, I was thinking “extensions” in the sense of web searching or extra AI features.
For my setup, I’m using the MiMo API by Xiaomi, which gives me 1000 requests per month. Any use of additional extensions is billed extra based on usage, and unfortunately I can’t disable that. The provider mentioned it depends on the length and complexity of the prompt: for example, if I or my users enter something like “search the latest news about…”, the model will do a web search in parallel, whether or not the answer already exists on my forum. I really don’t have control over those extra costs.
I haven’t filled in any Google Custom Search Engine API key; I just leave that field empty and use the default settings for Forum Helper.
Is there a smart way to handle this? If I try to limit credits at the provider level, it ends up restricting all the models I’m running.
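The only idea I’ve had so far is to put a small pre-filter in front of the provider that catches search-style prompts and prepends a “no web search” instruction. This is entirely hypothetical, and it only helps if the model actually obeys the instruction:

```python
# Hypothetical pre-filter: flag prompts likely to trigger the provider's
# web search and prepend a "no external search" instruction.
SEARCH_TRIGGERS = ("search", "latest news", "look up", "find online")

def guard_prompt(user_prompt: str) -> list[dict]:
    messages = [{"role": "user", "content": user_prompt}]
    if any(t in user_prompt.lower() for t in SEARCH_TRIGGERS):
        messages.insert(0, {
            "role": "system",
            "content": "Use only the forum content provided. "
                       "Do not perform any web search.",
        })
    return messages

print(guard_prompt("Search the latest news about AI"))
```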
Also, apologies if my English is a bit hard to follow; I’m using a translator to communicate.
Summary of the problem I am facing (explained simply):

- I am using Discourse AI on a self-hosted website.
- The LLM I am using is the MiMo API by Xiaomi, which provides a quota of 1000 requests per month.
- Certain extensions (like web search) incur additional charges based on usage, and this cannot be disabled from the provider’s side.

The provider explained that:

- Costs depend on the length and nature of the prompt.
- For example, if I or a user types something like “Search for the latest news about…”, the model may automatically search the web as well, regardless of whether the information is already in my forum.

This makes it difficult for me to control costs, because users type the prompts themselves.

I did not enter a Google Custom Search Engine API key; I left that field blank and use the default settings for Forum Helper.

If I try to limit the credit from the provider’s side:

- It limits all models currently in use.
- It cannot limit specific models or specific features.
Here is an example of the log I can check:
```
Generation details
Model: MiMo-V2-Flash
Model ID: xiaomi/mimo-v2-flash
Provider: Xiaomi
First token latency: 12.77 seconds
Throughput: 1.5 tokens/second
Finish reason: stop
Data policy: No data training | Policy
Tokens:
- Prompt: 38065
- Completion: 20
Web search:
- Results: 5
Costs:
- Subtotal: 0
- Web search cost: 0.02
- Final cost: 0.02
Creator: hidden
Generation ID: hidden
```
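To put that hidden cost in perspective, here is a rough worst-case estimate based on this log, assuming every one of the 1000 monthly requests triggers a $0.02 web search like this one did:

```python
# Rough worst-case estimate of the hidden web-search cost, assuming every
# one of the 1000 monthly requests triggers a $0.02 search as in the log.
requests_per_month = 1000
web_search_cost_per_request = 0.02  # from the "Final cost" line above

worst_case = requests_per_month * web_search_cost_per_request
print(f"Worst-case hidden cost: ${worst_case:.2f}/month")  # $20.00/month
```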
If you mean using a local LLM, I don’t plan to take on that expense yet. Serving more than 20 concurrent users would require a lot of processing power on the server, so that plan is not being implemented. I would rather focus on external APIs such as Groq or OpenRouter, which are more cost-effective, and try to control costs there.
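For what it’s worth, my understanding is that OpenRouter treats web search as opt-in (for example, via a model slug ending in `:online`), so a plain OpenAI-compatible request like the sketch below should not add search fees. That is my reading of their docs, not a guarantee, and the model slug is just an example:

```python
import requests

# OpenRouter is OpenAI-compatible. As I understand it, web search only
# runs if you opt in (e.g. a model slug ending in ":online"), so a plain
# request like this should not incur search fees.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},
    json={
        "model": "meta-llama/llama-3.1-70b-instruct",  # example slug
        "messages": [{"role": "user", "content": "Summarize this topic."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```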
I’ve found the answer. From my testing and observations, web search was being triggered on every model I tried, even after switching models. This appears to be an issue on the provider’s side.
The problem is that web search becomes an unwanted, hidden cost that I can’t properly control or fully disable, even when it’s not needed.
I’ve already cleared my account, cancelled the service with this provider, and I’m now looking for a different provider.