Making the case for a hard cap feature on user group AI usage for AI bots and AI Helper

MachineScholar · July 24, 2024, 10:43am

I’ve done some calculations and came to the conclusion that without the ability to limit a user’s monthly or daily token input and output, you can quickly get into some trouble. Currently, the only way to limit a user’s interaction with an AI bot is to allow AI bots in PM only (disabling chat for each persona) and to set a limit on allowed daily PMs, but of course this is unrealistic. Here is an example of a “worst case scenario” cost breakdown which justifies the need for this feature, using the approach that OpenAI uses for its ChatGPT members:

GPT-4o mini with 32k context (P.S. context length is set using the “Number of token for the prompt” setting on the LLMs settings page)

Current cost: $0.15 per 1M input tokens / $0.60 per 1M output tokens

Let’s say that the user inputs 32k and outputs 16k each day for 30 days (one billing cycle for a typical subscription):

Monthly input = 960,000 tokens, costing ~$0.14

Monthly output = 480,000 tokens, costing ~$0.29

Total = ~$0.43

Okay so that’s actually not bad, right? Less than half a buck. However, that is actually rather low usage, especially since GPT-4o mini can generate up to 16.4k tokens in a single shot (although yes of course you can engineer the prompt and LLM settings to prevent that). You can start to multiply those costs for however much you think your users would use the AI bot. The worst part is that this is an incredibly cheap model; the costs are 20 or more times higher for Claude 3.5 Sonnet ($3 per 1M input / $15 per 1M output) and GPT-4o ($5 per 1M input / $15 per 1M output), and let’s not even talk about GPT-4 Turbo lol. Here’s that same breakdown for Claude 3.5 Sonnet:

Claude 3.5 Sonnet with 32k context

Cost of monthly input = ~$2.88

Cost of monthly output = ~$7.20

Total = ~$10.08

But again, this is low usage. So it becomes clear how costly it can get to have unconstrained LLM use in AI bots. If you multiply this by 2 (~$20.16 per user), then you would need to charge a $25 subscription to pull a profit of just under $5.
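As a rough cross-check of the arithmetic above, here is a minimal Python sketch. The prices are the ones quoted in this post (July 2024); the function name and the model keys are just illustrative labels, not anything from the plugin.

```python
# Prices in USD per 1M tokens (input, output), as quoted in this post.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
}


def monthly_cost(model, input_per_day=32_000, output_per_day=16_000, days=30):
    """Return (input_cost, output_cost, total) in USD for one user."""
    in_price, out_price = PRICES[model]
    input_cost = input_per_day * days / 1_000_000 * in_price
    output_cost = output_per_day * days / 1_000_000 * out_price
    return input_cost, output_cost, input_cost + output_cost


for model in PRICES:
    i, o, t = monthly_cost(model)
    print(f"{model}: input ${i:.3f}, output ${o:.3f}, total ${t:.3f}")
```

Changing `input_per_day`, `output_per_day`, or `days` shows how quickly the total scales with heavier usage.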

Here’s what I formally propose:

1. A setting which allows a specific amount of token input and output for a specified user group each month or day for AI bots.
   - This token usage would NOT include the system prompt for the personas.
   - Token limits can be either per LLM, per persona, or universal/altogether.
2. Alternatively to point 1, a simple integer limit for using AI bots in DMs and PMs could be used. Example: limit of 100 DMs to any persona per day.
3. A setting which allows a specific amount of token output for a specified user group each month or day for the AI helper.
   - Token input can be uncounted since it’d be impractical to expect the user to guess how many tokens a long topic is when, for example, generating a summary.
   - It might also be wise to put a hard integer limit on the length (in words so that TikToken doesn’t have to be used here) for custom prompts so users don’t attempt to circumvent their monthly/daily limits by using the Composer as an unmetered chatbot.
4. A token counter in the user’s profile and perhaps even in their PMs and DMs. It would be cool if there was a tiny text next to each user and AI message which displays the number of tokens it is (we don’t necessarily want to allow everyone the debug feature, and that only works in PMs anyway).
5. A separate token counter for AI helper (to help keep these two features separate) which shares a count between explain, proofread, custom prompt, etc.
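To make the proposal a bit more concrete, here is a minimal Python sketch of the counting rules described above. Every name in it (`TokenCounter`, `UserUsage`, `MAX_CUSTOM_PROMPT_WORDS`) is hypothetical and invented for illustration; it is not how the Discourse AI plugin works, and it assumes a single combined input-plus-output limit for the AI bot to keep things short.

```python
from dataclasses import dataclass


@dataclass
class TokenCounter:
    """Hypothetical per-user counter for one feature and one period."""
    limit: int      # tokens allowed per period (day or month)
    used: int = 0   # tokens counted so far in the current period

    def exhausted(self) -> bool:
        # Checked before a request is sent to the LLM.
        return self.used >= self.limit

    def record(self, tokens: int) -> None:
        # Called after a response, once the token counts are known.
        self.used += tokens

    def reset(self) -> None:
        """Call at the start of each new period."""
        self.used = 0


@dataclass
class UserUsage:
    bot: TokenCounter      # AI bot: user input + model output
    helper: TokenCounter   # AI helper: model output only

    def record_bot_turn(self, user_tokens: int, output_tokens: int) -> None:
        # The persona's system prompt is intentionally not counted.
        self.bot.record(user_tokens + output_tokens)

    def record_helper_call(self, output_tokens: int) -> None:
        # Input (e.g. a long topic being summarized) is not counted.
        self.helper.record(output_tokens)


MAX_CUSTOM_PROMPT_WORDS = 150  # hypothetical cap, counted in words


def custom_prompt_allowed(prompt: str) -> bool:
    # Word count avoids running a tokenizer on every custom prompt.
    return len(prompt.split()) <= MAX_CUSTOM_PROMPT_WORDS


usage = UserUsage(bot=TokenCounter(limit=50_000), helper=TokenCounter(limit=10_000))
usage.record_bot_turn(user_tokens=1_200, output_tokens=800)
usage.record_helper_call(output_tokens=300)
print(usage.bot.used, usage.helper.used)  # 2000 300
```

The `used` values are also what a profile-level token counter could display, one for the AI bot and one for the AI helper.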

Side-note: I am not at all knocking this feature nor the devs in any way and I apologize if any part of this comes off that way. Honestly the Discourse AI plugin is one of my all-time favorite bits of technology. Actually, it is allowing me to build my dream business as an AI researcher and educator without having to hire extra engineers and pay for additional infrastructure; I can set it all up by myself. I merely think that this feature is the last piece of the puzzle not only for me, but for numerous other Discoursers who want to let their users enjoy this wonderful technology within reason.

merefield · July 24, 2024, 12:56pm

This was implemented in Discourse Chatbot as a weekly quota system in March '23 and has since been expanded so you can define quotas for three different collections of user groups, e.g. paying members get a higher quota.

Users who breach their weekly quota are shown a polite message (which costs you nothing).

Moreover, the admins can be alerted when a quota is breached.

I did PM you about it in response to one of your prior posts on this topic, but you didn’t respond. Perhaps you are hosted and don’t have access to my plugin?

tbh, with the advent of GPT-4o mini the costs for decent bot conversations have plummeted.

btw, Discourse Chatbot is now used by at least one business for front line customer support, so you can be sure it is stable and effective.

MachineScholar · July 25, 2024, 2:55pm

Apologies for the lack of a reply! I remember reading it now, but I have no idea why I didn’t write back. And I’m on a droplet so that isn’t the issue.

I have no doubt about its stability and quality; in fact, I quite like the plugin and I respect you and the effort you put into it. However, a chatbot is only a partial need for my business venture. The AI Helper is a core necessity, as is the ability to immediately change between models. In the near future I will be deploying my own fine-tuned model and manually setting up my LLMs, and this is mission-critical.

I’m only explaining all of this so that you don’t think I have something against your work! Rather, the problem is from my side; I’m trying to do something quite niche.

merefield · July 25, 2024, 2:58pm

Yep, I fully respect that the scope of the request is broader.

Just offering a partial (if significant) solution.

sam · January 13, 2025, 2:50am

This is expected to land this week:

github.com/discourse/discourse-ai

FEATURE: llm quotas

discourse:main ← discourse:quotas2

opened 06:20AM - 02 Jan 25 UTC

SamSaffron

+1508 -5

Adds a comprehensive quota management system for LLM models that allows:

- Setting per-group token and usage limits with configurable durations
- Tracking and enforcing token/usage limits across user groups
- Quota reset periods (hourly, daily, weekly, or custom)
- Admin UI for managing quotas with real-time updates
- Full test coverage for quota models and controllers

This system provides granular control over LLM API usage by allowing admins to define limits on both total tokens and number of requests per group. Supports multiple concurrent quotas per model and automatically handles quota resets.

![image](https://github.com/user-attachments/assets/76375c76-889d-438b-b464-e65c7f7a41ed)
![image](https://github.com/user-attachments/assets/21752366-2b33-4fb7-8b3f-faee74c45413)
![image](https://github.com/user-attachments/assets/c7248930-0aa7-434e-805e-56adb7cbfb2f)

MachineScholar · January 13, 2025, 8:04am

This is AWESOME!

In the details below, does this imply that the total tokens and requests are shared between all users in the group, or rather that each user in the group can utilize the set amounts individually?

This system provides granular control over LLM API usage by allowing admins
to define limits on both total tokens and number of requests per group.

sam · January 13, 2025, 8:15am

Oh I need to clarify this in the UI… all limits are per user and never shared between group members. Shared group quota limit is an interesting concept but I am not sure it makes sense in practice? Can you think of any time this would be useful?

For now my implementation is:

Pick the most “relaxed” quota the user has depending on groups the user is a member of
Enforce per user.

(this allows admins immunity even if TL2 has a strict quota)
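The two rules above can be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual code: the `Quota` class and both function names are hypothetical, “most relaxed” is simplified to “highest token limit” (the real quotas also cover number of requests and reset periods), and what happens when no quota matches is left open.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Quota:
    group: str
    max_tokens: int  # per user, per reset period


def effective_quota(user_groups: Iterable[str], quotas: Iterable[Quota]) -> Optional[Quota]:
    """Pick the most relaxed quota among the groups the user belongs to."""
    groups = set(user_groups)
    applicable = [q for q in quotas if q.group in groups]
    if not applicable:
        return None  # no quota configured for any of the user's groups
    return max(applicable, key=lambda q: q.max_tokens)


def within_quota(tokens_used_by_user: int, quota: Quota) -> bool:
    """Enforce per user: only this user's own usage is compared to the limit."""
    return tokens_used_by_user < quota.max_tokens


quotas = [Quota("trust_level_2", 20_000), Quota("admins", 10_000_000)]
admin = effective_quota(["trust_level_2", "admins"], quotas)
member = effective_quota(["trust_level_2"], quotas)
print(admin.group, member.group)  # admins trust_level_2
```

An admin who is also in TL2 gets the admins quota, so giving admins a very large limit effectively exempts them from the strict TL2 one.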

MachineScholar · January 13, 2025, 8:27am

I was asking because it indeed wouldn’t make sense in practice. My two cents is that your implementation here is ideal. My community and I truly appreciate the work being done here.

sam · January 13, 2025, 8:29am

The one argument for “absolute quota” is:

I want TL1 to be able to play with AI but … as a safeguard limit my spend at $N a day. TL1 has an unknown number of members.

But I guess if that is what people are after they could put the absolute quotas directly in the Anthropic / OpenAI etc. dashboards.

I am not against adding absolute quotas later on, but probably will skip on this iteration.
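For illustration only, an “absolute quota” of that kind could look something like the Python sketch below. It is not part of the implementation described in this topic; the class name and the idea of tracking spend in USD per calendar day are assumptions made for the example.

```python
from datetime import date


class SharedDailyBudget:
    """Hypothetical spend cap shared by every member of one group."""

    def __init__(self, max_usd_per_day: float) -> None:
        self.max_usd_per_day = max_usd_per_day
        self.day = None        # calendar day the running total belongs to
        self.spent_usd = 0.0   # total spent by the whole group on that day

    def try_spend(self, usd: float, today: date) -> bool:
        # Start a fresh total when the calendar day changes.
        if today != self.day:
            self.day = today
            self.spent_usd = 0.0
        # Refuse any request that would push the group over its cap.
        if self.spent_usd + usd > self.max_usd_per_day:
            return False
        self.spent_usd += usd
        return True


tl1 = SharedDailyBudget(max_usd_per_day=5.0)
monday = date(2025, 1, 13)
print(tl1.try_spend(3.0, monday))              # True
print(tl1.try_spend(3.0, monday))              # False, would exceed $5
print(tl1.try_spend(3.0, date(2025, 1, 14)))   # True, new day
```

Unlike the per-user quotas, the total here does not depend on how many members the group has.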

BrianC · January 14, 2025, 4:50am

@sam This is a fantastic update! Will token limits be tied in to subscriptions? It would be awesome if we can control usage and allow more expensive models to be used for a fee.

sam · January 14, 2025, 5:00am

Yes, this can work with the system: you can set up different quotas for different groups of users.

sam · January 21, 2025, 6:10am

This is now implemented and documented:

sam · January 26, 2025, 9:00pm

This topic was automatically closed after 5 days. New replies are no longer allowed.
