AI exceeds LLM token thresholds randomly and unpredictably

I configured the LLM to a maximum of 8000 output tokens, discourse AI regularly sends requests that exceeds that threshold.

Over time with trial and error and by looks at logs (which in itself a problem since there is nothing indicated on the dashboard), I reduced the numbers to 7800, 7500, 7200 and finally 7000 after days of trail an error.

Suddenly after months of working it stops working and after debugging it I realized in some edge cases again Discourse as requesting > 8000 output tokens even through the LLM is configured to set a maximum of 7000 tokens.

Message (3 copies reported)

DiscourseAi::Completions::Endpoints::OpenAi: status: 413 - body: {"error":{"message":"Request too large for model `openai/gpt-oss-120b` in organization `org_01kccx1be8f` service tier `on_demand` on tokens per minute (TPM): Limit 8000, Requested 8102, please reduce your message size and try again. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing","type":"tokens","code":"rate_limit_exceeded"}}


Backtrace

activesupport-8.0.5/lib/active_support/broadcast_logger.rb:218:in 'block in ActiveSupport::BroadcastLogger#dispatch'
activesupport-8.0.5/lib/active_support/broadcast_logger.rb:217:in 'Array#map'
activesupport-8.0.5/lib/active_support/broadcast_logger.rb:217:in 'ActiveSupport::BroadcastLogger#dispatch'
activesupport-8.0.5/lib/active_support/broadcast_logger.rb:129:in 'ActiveSupport::BroadcastLogger#error'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:202:in 'block (2 levels) in DiscourseAi::Completions::Endpoints::Base#perform_completion!'
net-http-0.9.1/lib/net/http.rb:2461:in 'block in Net::HTTP#transport_request'
net-http-0.9.1/lib/net/http/response.rb:321:in 'Net::HTTPResponse#reading_body'
net-http-0.9.1/lib/net/http.rb:2458:in 'Net::HTTP#transport_request'
net-http-0.9.1/lib/net/http.rb:2410:in 'Net::HTTP#request'
rack-mini-profiler-4.0.1/lib/patches/net_patches.rb:19:in 'block in Net::HTTP#request_with_mini_profiler'
rack-mini-profiler-4.0.1/lib/mini_profiler/profiling_methods.rb:51:in 'Rack::MiniProfiler::ProfilingMethods#step'
rack-mini-profiler-4.0.1/lib/patches/net_patches.rb:18:in 'Net::HTTP#request_with_mini_profiler'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:198:in 'block in DiscourseAi::Completions::Endpoints::Base#perform_completion!'
net-http-0.9.1/lib/net/http.rb:1630:in 'Net::HTTP#start'
net-http-0.9.1/lib/net/http.rb:1064:in 'Net::HTTP.start'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:146:in 'DiscourseAi::Completions::Endpoints::Base#perform_completion!'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/open_ai_shared.rb:28:in 'DiscourseAi::Completions::Endpoints::OpenAiShared#perform_completion!'
/var/www/discourse/plugins/discourse-ai/lib/completions/llm.rb:214:in 'DiscourseAi::Completions::Llm#generate'
/var/www/discourse/plugins/discourse-ai/lib/agents/bot.rb:144:in 'DiscourseAi::Agents::Bot#reply'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:55:in 'DiscourseAi::Translation::BaseTranslator#get_translation'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:31:in 'block in DiscourseAi::Translation::BaseTranslator#translate'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:31:in 'Array#map'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:31:in 'DiscourseAi::Translation::BaseTranslator#translate'
/var/www/discourse/plugins/discourse-ai/lib/translation/post_localizer.rb:17:in 'DiscourseAi::Translation::PostLocalizer.localize'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/localize_posts.rb:39:in 'block in Jobs::LocalizePosts#execute'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/localize_posts.rb:29:in 'Array#each'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/localize_posts.rb:29:in 'Jobs::LocalizePosts#execute'
/var/www/discourse/app/jobs/base.rb:318:in 'block (2 levels) in Jobs::Base#perform'
rails_multisite-7.0.0/lib/rails_multisite/connection_management/null_instance.rb:49:in 'RailsMultisite::ConnectionManagement::NullInstance#with_connection'
rails_multisite-7.0.0/lib/rails_multisite/connection_management.rb:17:in 'RailsMultisite::ConnectionManagement.with_connection'
/var/www/discourse/app/jobs/base.rb:305:in 'block in Jobs::Base#perform'
/var/www/discourse/app/jobs/base.rb:301:in 'Array#each'
/var/www/discourse/app/jobs/base.rb:301:in 'Jobs::Base#perform'
sidekiq-7.3.10/lib/sidekiq/processor.rb:220:in 'Sidekiq::Processor#execute_job'
sidekiq-7.3.10/lib/sidekiq/processor.rb:185:in 'block (4 levels) in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:180:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
/var/www/discourse/lib/sidekiq/suppress_user_email_errors.rb:6:in 'Sidekiq::SuppressUserEmailErrors#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
/var/www/discourse/lib/sidekiq/discourse_event.rb:6:in 'Sidekiq::DiscourseEvent#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
/var/www/discourse/lib/sidekiq/pausable.rb:131:in 'Sidekiq::Pausable#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/job/interrupt_handler.rb:9:in 'Sidekiq::Job::InterruptHandler#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/metrics/tracking.rb:26:in 'Sidekiq::Metrics::ExecutionTracker#track'
sidekiq-7.3.10/lib/sidekiq/metrics/tracking.rb:134:in 'Sidekiq::Metrics::Middleware#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:173:in 'Sidekiq::Middleware::Chain#invoke'
sidekiq-7.3.10/lib/sidekiq/processor.rb:184:in 'block (3 levels) in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:145:in 'block (6 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_retry.rb:118:in 'Sidekiq::JobRetry#local'
sidekiq-7.3.10/lib/sidekiq/processor.rb:144:in 'block (5 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/config.rb:39:in 'block in <class:Config>'
sidekiq-7.3.10/lib/sidekiq/processor.rb:139:in 'block (4 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/processor.rb:281:in 'Sidekiq::Processor#stats'
sidekiq-7.3.10/lib/sidekiq/processor.rb:134:in 'block (3 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_logger.rb:15:in 'Sidekiq::JobLogger#call'
sidekiq-7.3.10/lib/sidekiq/processor.rb:133:in 'block (2 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_retry.rb:85:in 'Sidekiq::JobRetry#global'
sidekiq-7.3.10/lib/sidekiq/processor.rb:132:in 'block in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_logger.rb:40:in 'Sidekiq::JobLogger#prepare'
sidekiq-7.3.10/lib/sidekiq/processor.rb:131:in 'Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/processor.rb:183:in 'block (2 levels) in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:182:in 'Thread.handle_interrupt'
sidekiq-7.3.10/lib/sidekiq/processor.rb:182:in 'block in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:181:in 'Thread.handle_interrupt'
sidekiq-7.3.10/lib/sidekiq/processor.rb:181:in 'Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:86:in 'Sidekiq::Processor#process_one'
sidekiq-7.3.10/lib/sidekiq/processor.rb:76:in 'Sidekiq::Processor#run'
sidekiq-7.3.10/lib/sidekiq/component.rb:10:in 'Sidekiq::Component#watchdog'
sidekiq-7.3.10/lib/sidekiq/component.rb:19:in 'block in Sidekiq::Component#safe_thread'

This is very problematic and makes for a poor experience using AI with discourse. Issue with LLM configuration don’t show up on the dashboard and when discourse ignore the LLM configuration it leads to unpredictable issues and frustrations.

Discourse needs to find way to limit itself to the LLM configuration parameters and show problems on the admin dashboard.

Are you confusing request tokens with response tokens?

413 means that your request was too large, not your requested response.

To handle that you want to tweak the Context window LLM configuration, but I’d warn that 8k tokens is way too small nowadays. It will work for some features, but it’s not exactly something we exercize much nowadays when LLMs are handling 1M token long context windows. I can run a 256k context window on my desktop PC using a model that is much better than the one you are using.

The context windows is set to 130k

But that brings me back to the same issue. The model limit on groq is 131,072; I’ve already made it 130,000. I shouldn’t have to experiment with the limits to figure out how much discourse is sending. Discourse should be able to operate within the limits provided by the LLM configuration

What I don’t understand is why reducing the max output tokens seems to fix the issue. I haven’t made any changes to the context window, just reduced the max output token further and it’s started working and picking up from where it left off.

Just FYI the problem first started with the translation service getting stuck and running out of tokens:

DiscourseAi::Completions::Endpoints::OpenAi: status: 429 - body: {“error”:{“message”:“Rate limit reached for model openai/gpt-oss-120b in organization org_01kccx1be8fffaz5sbe17 service tier on_demand on tokens per day (TPD): Limit 200000, Used 193487, Requested 7464. Please try again in 6m50.832s. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing",“type”:“tokens”,“code”:"rate_limit_exceeded”}}

Then I paused the service for 24 hours for the daily rate limits to reset. After restarting it then I noticed this error:

DiscourseAi::Completions::Endpoints::OpenAi: status: 413 - body: {“error”:{“message”:“Request too large for model openai/gpt-oss-120b in organization org_01kccx1be8fffaz5sbe17 service tier on_demand on tokens per minute (TPM): Limit 8000, Requested 8102, please reduce your message size and try again. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing",“type”:“tokens”,“code”:"rate_limit_exceeded”}}

Then I reduced max output tokens from 7000 to 6800 in the LLM configuration and it started working again.

What am I missing here? Are you suggesting it’s related to context window and nothing to do with max output tokens? Just trying to figure out how to match configuration numbers from groq / model limits to discourse LLM configurations.