I did some calculations and concluded that without the ability to limit a user’s monthly or daily token input and output, you can quickly get into trouble. Currently, the only way to limit a user’s interaction with an AI bot is to allow AI bots in PMs only (disabling chat for each persona) and to set a limit on allowed daily PMs, but of course this is unrealistic. Here is a “worst case scenario” cost breakdown that justifies the need for this feature, following the approach that OpenAI uses for its ChatGPT members:
GPT-4o mini with 32k context (P.S. context length is set using the “Number of token for the prompt” setting on the LLMs settings page)
Current cost: $0.15 per 1M input tokens / $0.60 per 1M output tokens
Let’s say that the user inputs 32k tokens and outputs 16k tokens each day for 30 days (one billing cycle for a typical subscription):
Cost of monthly input = 960,000 tokens = ~$0.14
Cost of monthly output = 480,000 tokens = ~$0.29
Okay, so that’s actually not bad, right? Less than half a buck. However, that is rather low usage, especially since GPT-4o mini can generate up to 16.4k tokens in a single shot (although yes, of course you can engineer the prompt and LLM settings to prevent that). You can multiply those costs by however much you think your users would use the AI bot. The worst part is that this is an incredibly cheap model; the costs are far higher for Claude 3.5 Sonnet ($3 per 1M input / $15 per 1M output, 20 to 25 times as much) and GPT-4o ($5 per 1M input / $15 per 1M output), and let’s not even talk about GPT-4 Turbo lol. Here’s that same breakdown for Claude 3.5 Sonnet:
Claude 3.5 Sonnet with 32k context
Cost of monthly input = ~$2.88
Cost of monthly output = ~$7.20
Total = ~$10.08
But again, this is low usage, so it becomes clear how costly it can get to have unconstrained LLM use in AI bots. If you multiply this by 2 (~$20.16), you would need to charge a $25 subscription to pull a profit of just under $5.
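To make the arithmetic above easy to re-run with your own numbers, here is a minimal Python sketch. The prices are the ones quoted in this post (they may have changed since), and the `PRICES` table and `monthly_cost` function are just names I made up for illustration, not anything from the plugin.

```python
# USD per 1M tokens (input, output), as quoted in this post.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
}


def monthly_cost(model, input_per_day, output_per_day, days=30):
    """Return (input cost, output cost, total) in USD for one user."""
    in_price, out_price = PRICES[model]
    input_cost = input_per_day * days / 1_000_000 * in_price
    output_cost = output_per_day * days / 1_000_000 * out_price
    return input_cost, output_cost, input_cost + output_cost


for model in PRICES:
    i, o, total = monthly_cost(model, 32_000, 16_000)
    print(f"{model}: input ${i:.2f}, output ${o:.2f}, total ${total:.2f}")

# Margin on a $25 subscription if a Claude 3.5 Sonnet user doubles that usage.
_, _, sonnet_total = monthly_cost("claude-3.5-sonnet", 64_000, 32_000)
print(f"profit on $25: ${25 - sonnet_total:.2f}")
```

Swap in your own daily token estimates to see where your break-even subscription price lands.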
Here’s what I formally propose:
- A setting which allows a specific amount of token input and output for a specified user group each month or day for AI bots.
- This token usage would NOT include the system prompt for the personas.
- Token limits can be either per LLM, per persona, or universal/altogether.
- As an alternative to the token-based limit in the first point, a simple integer limit on AI bot use in DMs and PMs could be used. Example: a limit of 100 DMs to any persona per day.
- A setting which allows a specific amount of token output for a specified user group each month or day for the AI helper.
- Token input can be uncounted since it’d be impractical to expect the user to guess how many tokens a long topic is when, for example, generating a summary.
- It might also be wise to put a hard integer limit on the length of custom prompts (in words, so that tiktoken doesn’t have to be used here) so users don’t attempt to circumvent their monthly/daily limits by using the Composer as an unmetered chatbot.
- A token counter in the user’s profile and perhaps even in their PMs and DMs. It would be cool if there was a small label next to each user and AI message showing its token count (we don’t necessarily want to give everyone the debug feature, and that only works in PMs anyway).
- A separate token counter for the AI helper (to help keep these two features separate) which shares a count between explain, proofread, custom prompt, etc.
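To illustrate how the limits proposed above could fit together, here is a rough sketch in Python. This is not how the Discourse AI plugin works (it is written in Ruby, and I have not looked at how it would store usage); `TokenQuota`, `UsageTracker`, and `within_word_limit` are hypothetical names, and the sketch assumes the caller already knows the token counts and leaves the persona’s system prompt out of them.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenQuota:
    # Limits per period for one user group; None means "not counted".
    max_input: Optional[int]
    max_output: Optional[int]


@dataclass
class UsageTracker:
    quotas: dict  # group name -> TokenQuota
    # (user, period) -> [input tokens used, output tokens used]
    used: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def remaining(self, user, group, period):
        """Tokens left this period as (input, output); None if uncounted."""
        quota = self.quotas[group]
        used_in, used_out = self.used[(user, period)]
        rem_in = None if quota.max_input is None else max(quota.max_input - used_in, 0)
        rem_out = None if quota.max_output is None else max(quota.max_output - used_out, 0)
        return rem_in, rem_out

    def allowed(self, user, group, period):
        """False once either counted limit is used up."""
        rem_in, rem_out = self.remaining(user, group, period)
        return rem_in != 0 and rem_out != 0

    def record(self, user, period, input_tokens, output_tokens):
        # Pass user-visible tokens only, not the persona's system prompt.
        usage = self.used[(user, period)]
        usage[0] += input_tokens
        usage[1] += output_tokens


def within_word_limit(prompt, max_words):
    """Word-based cap for custom prompts, so no tokenizer is needed."""
    return len(prompt.split()) <= max_words


# Separate trackers keep the AI bot and AI helper counts apart.
bot = UsageTracker({"subscribers": TokenQuota(960_000, 480_000)})
helper = UsageTracker({"subscribers": TokenQuota(None, 100_000)})  # input uncounted

bot.record("alice", "2024-08", 32_000, 16_000)
print(bot.remaining("alice", "subscribers", "2024-08"))
print(within_word_limit("Summarize this topic in three bullet points", 50))
```

The period key is just a string here (`"2024-08"` for monthly, or a full date for daily), so the same tracker covers both cases. The remaining counts returned by `remaining` are also what a counter in the user’s profile would display.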
Side-note: I am not at all knocking this feature or the devs in any way, and I apologize if any part of this comes off that way. Honestly, the Discourse AI plugin is one of my all-time favorite bits of technology. It is allowing me to build my dream business as an AI researcher and educator without having to hire extra engineers or pay for additional infrastructure; I can set it all up by myself. I merely think that this feature is the last piece of the puzzle, not only for me but for numerous other Discoursers who want to let their users enjoy this wonderful technology within reason.