I’m using Discourse AI and have it connected to an external LLM API, along with some extensions (which come with additional costs).
Because of that, I’d like to configure it so that it does not use those extensions, and instead relies only on the LLM itself as a forum helper—for things like internal search, summarization, or other features that work purely within the forum.
The main reason is to reduce the cost of paid add-ons (e.g., external web search), so I’m looking for guidance on how to set things up this way.
Thanks!
Edit:
I got a reply from the provider saying that this request was charged for web search because the AI cited sources like BBC, Reuters, and others, which automatically triggered the model’s evidence/search mode.
So does this mean this isn’t related to Discourse settings, and there’s no way to disable this behavior from the provider side?
Is there any workaround for this?
The provider suggested switching to a model that does less “thinking” and avoiding flash or instruct models, but that also means reduced reasoning and computation capability.
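One workaround I’m considering: if the provider exposes an OpenAI-compatible endpoint, the request itself can ask the model not to call any tools. I can’t confirm whether MiMo honors `tool_choice`, and the endpoint URL below is a placeholder, so treat this as a sketch only:

```python
import requests

# Hypothetical OpenAI-compatible endpoint; the real MiMo URL may differ.
API_URL = "https://api.example-provider.com/v1/chat/completions"

resp = requests.post(
    API_URL,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "xiaomi/mimo-v2-flash",
        "messages": [
            {"role": "user", "content": "Summarize the latest forum topics."},
        ],
        # On OpenAI-style APIs, tool_choice="none" tells the model not to
        # call any tool; a provider-side search layer may still ignore it.
        "tool_choice": "none",
    },
    timeout=60,
)
print(resp.json())
```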
**This message was translated from Thai using a translation tool, so I apologize in advance if anything is unclear or slightly incorrect.**
What do you mean by extensions? I assume you mean web searching?
I have 2 self-hosted sites running all my Discourse AI features with Gemini on Google Cloud, and I am using the Google Custom Search Engine API for the web researcher (100 free queries/day). I use Gemini 2.5 Flash Lite for as much as possible, like summarizing and gists, 2.5 Flash for translation, and the various other Gemini models for more specific, thinking-heavy tasks (Gemini Flash Image, for example).
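If it helps, the Custom Search JSON API is just a GET request, so it’s easy to test quota usage directly. A minimal sketch in Python, with placeholder credentials:

```python
import requests

# Google Custom Search JSON API: the free tier is 100 queries/day.
# "key" and "cx" below are placeholders for your own credentials.
params = {
    "key": "YOUR_GOOGLE_API_KEY",
    "cx": "YOUR_SEARCH_ENGINE_ID",
    "q": "discourse ai web search",
}
resp = requests.get(
    "https://www.googleapis.com/customsearch/v1",
    params=params,
    timeout=30,
)
for item in resp.json().get("items", []):
    print(item["title"], item["link"])
```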
Ah, got it — thanks for clarifying! Yeah, I was thinking “extensions” in the sense of web searching or extra AI features.
For my setup, I’m using the MiMo API by Xiaomi, which gives me 1000 requests per month. Any use of additional extensions is billed extra based on usage, and unfortunately I can’t disable that. The provider mentioned it depends on the length and complexity of the prompt: for example, if I or my users enter something like “search the latest news about…”, the model will do a web search in parallel, whether or not the answer already exists on my forum. I really don’t have control over those extra costs.
I haven’t filled in any Google Custom Search Engine API key; I just leave that field empty and use the default settings for Forum Helper.
Is there a smart way to handle this? If I try to limit credits at the provider level, it ends up restricting all the models I’m running.
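The only idea I’ve had so far is to put a small pre-filter in front of the provider that catches search-style prompts and prepends a “no web search” instruction. This is entirely hypothetical, and it only helps if the model actually obeys the instruction:

```python
# Hypothetical pre-filter: flag prompts likely to trigger the provider's
# web search and prepend a "no external search" instruction.
SEARCH_TRIGGERS = ("search", "latest news", "look up", "find online")

def guard_prompt(user_prompt: str) -> list[dict]:
    messages = [{"role": "user", "content": user_prompt}]
    if any(t in user_prompt.lower() for t in SEARCH_TRIGGERS):
        messages.insert(0, {
            "role": "system",
            "content": "Use only the forum content provided. "
                       "Do not perform any web search.",
        })
    return messages

print(guard_prompt("Search the latest news about AI"))
```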
Also, apologies if my English is a bit hard to follow; I’m using a translator to communicate.
Summary of the problem I am facing (explained simply):

- I am using Discourse AI on a self-hosted website.
- The LLM I am using is the MiMo API by Xiaomi, which provides a quota of 1000 requests per month.
- Certain extensions (like web search) incur additional charges based on usage, and this cannot be disabled from the provider’s side.

The provider explained that:

- Costs depend on the length and nature of the prompt.
- For example, if I or a user types something like “Search for the latest news about…”, the model may automatically search the web as well, regardless of whether the information is already in my forum.

This makes it difficult for me to control costs, because users type the prompts themselves.

I did not enter a Google Custom Search Engine API key; I left that field blank and use the default settings for Forum Helper.

If I try to limit the credit from the provider’s side:

- It limits all models currently in use.
- It cannot limit specific models or specific features.
Here is an example of the log I can check:
```
Generation details
Model: MiMo-V2-Flash
Model ID: xiaomi/mimo-v2-flash
Provider: Xiaomi
First token latency: 12.77 seconds
Throughput: 1.5 tokens/second
Finish reason: stop
Data policy: No data training | Policy
Tokens:
- Prompt: 38065
- Completion: 20
Web search:
- Results: 5
Costs:
- Subtotal: 0
- Web search cost: 0.02
- Final cost: 0.02
Creator: hidden
Generation ID: hidden
```
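To put that hidden cost in perspective, here is a rough worst-case estimate based on this log, assuming every one of the 1000 monthly requests triggers a $0.02 web search like this one did:

```python
# Rough worst-case estimate of the hidden web-search cost, assuming every
# one of the 1000 monthly requests triggers a $0.02 search as in the log.
requests_per_month = 1000
web_search_cost_per_request = 0.02  # from the "Final cost" line above

worst_case = requests_per_month * web_search_cost_per_request
print(f"Worst-case hidden cost: ${worst_case:.2f}/month")  # $20.00/month
```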
If you mean using a local LLM, I don’t plan to take on that expense yet. Serving more than 20 concurrent users would require a lot of processing power on the server, so that plan is not being implemented. I would rather focus on external APIs such as Groq or OpenRouter, which are more cost-effective, and try to control costs there.
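For what it’s worth, my understanding is that OpenRouter treats web search as opt-in (for example, via a model slug ending in `:online`), so a plain OpenAI-compatible request like the sketch below should not add search fees. That is my reading of their docs, not a guarantee, and the model slug is just an example:

```python
import requests

# OpenRouter is OpenAI-compatible. As I understand it, web search only
# runs if you opt in (e.g. a model slug ending in ":online"), so a plain
# request like this should not incur search fees.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},
    json={
        "model": "meta-llama/llama-3.1-70b-instruct",  # example slug
        "messages": [{"role": "user", "content": "Summarize this topic."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```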
I’ve found the answer. From my testing and observations, web search was being triggered on every model I tried, even after switching models. This appears to be an issue on the provider’s side.
The problem is that web search becomes an unwanted, hidden cost that I can’t properly control or fully disable, even when it’s not needed.
I’ve already cleared my account, cancelled the service with this provider, and I’m now looking for a different provider.