AI randomly and unpredictably exceeds the LLM token threshold

I configured my LLM's output token limit to 8000, but Discourse AI regularly sends requests that exceed that threshold.

After a long stretch of trial and error, and after digging through the logs (a problem in itself, since the dashboard gives no hint that anything is wrong), I stepped the value down from 8000 to 7800, 7500, 7200, and finally settled on 7000 after days of debugging.

Then, after months of running fine, the system suddenly stopped working. Debugging showed that in certain edge cases Discourse still requests more than 8000 output tokens, even though the LLM is configured with a maximum of 7000.

Message (3 copies reported)

DiscourseAi::Completions::Endpoints::OpenAi: status: 413 - body: {"error":{"message":"Request too large for model `openai/gpt-oss-120b` in organization `org_01kccx1be8f` service tier `on_demand` on tokens per minute (TPM): Limit 8000, Requested 8102, please reduce your message size and try again. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing","type":"tokens","code":"rate_limit_exceeded"}}


Backtrace

activesupport-8.0.5/lib/active_support/broadcast_logger.rb:218:in 'block in ActiveSupport::BroadcastLogger#dispatch'
activesupport-8.0.5/lib/active_support/broadcast_logger.rb:217:in 'Array#map'
activesupport-8.0.5/lib/active_support/broadcast_logger.rb:217:in 'ActiveSupport::BroadcastLogger#dispatch'
activesupport-8.0.5/lib/active_support/broadcast_logger.rb:129:in 'ActiveSupport::BroadcastLogger#error'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:202:in 'block (2 levels) in DiscourseAi::Completions::Endpoints::Base#perform_completion!'
net-http-0.9.1/lib/net/http.rb:2461:in 'block in Net::HTTP#transport_request'
net-http-0.9.1/lib/net/http/response.rb:321:in 'Net::HTTPResponse#reading_body'
net-http-0.9.1/lib/net/http.rb:2458:in 'Net::HTTP#transport_request'
net-http-0.9.1/lib/net/http.rb:2410:in 'Net::HTTP#request'
rack-mini-profiler-4.0.1/lib/patches/net_patches.rb:19:in 'block in Net::HTTP#request_with_mini_profiler'
rack-mini-profiler-4.0.1/lib/mini_profiler/profiling_methods.rb:51:in 'Rack::MiniProfiler::ProfilingMethods#step'
rack-mini-profiler-4.0.1/lib/patches/net_patches.rb:18:in 'Net::HTTP#request_with_mini_profiler'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:198:in 'block in DiscourseAi::Completions::Endpoints::Base#perform_completion!'
net-http-0.9.1/lib/net/http.rb:1630:in 'Net::HTTP#start'
net-http-0.9.1/lib/net/http.rb:1064:in 'Net::HTTP.start'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:146:in 'DiscourseAi::Completions::Endpoints::Base#perform_completion!'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/open_ai_shared.rb:28:in 'DiscourseAi::Completions::Endpoints::OpenAiShared#perform_completion!'
/var/www/discourse/plugins/discourse-ai/lib/completions/llm.rb:214:in 'DiscourseAi::Completions::Llm#generate'
/var/www/discourse/plugins/discourse-ai/lib/agents/bot.rb:144:in 'DiscourseAi::Agents::Bot#reply'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:55:in 'DiscourseAi::Translation::BaseTranslator#get_translation'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:31:in 'block in DiscourseAi::Translation::BaseTranslator#translate'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:31:in 'Array#map'
/var/www/discourse/plugins/discourse-ai/lib/translation/base_translator.rb:31:in 'DiscourseAi::Translation::BaseTranslator#translate'
/var/www/discourse/plugins/discourse-ai/lib/translation/post_localizer.rb:17:in 'DiscourseAi::Translation::PostLocalizer.localize'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/localize_posts.rb:39:in 'block in Jobs::LocalizePosts#execute'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/localize_posts.rb:29:in 'Array#each'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/localize_posts.rb:29:in 'Jobs::LocalizePosts#execute'
/var/www/discourse/app/jobs/base.rb:318:in 'block (2 levels) in Jobs::Base#perform'
rails_multisite-7.0.0/lib/rails_multisite/connection_management/null_instance.rb:49:in 'RailsMultisite::ConnectionManagement::NullInstance#with_connection'
rails_multisite-7.0.0/lib/rails_multisite/connection_management.rb:17:in 'RailsMultisite::ConnectionManagement.with_connection'
/var/www/discourse/app/jobs/base.rb:305:in 'block in Jobs::Base#perform'
/var/www/discourse/app/jobs/base.rb:301:in 'Array#each'
/var/www/discourse/app/jobs/base.rb:301:in 'Jobs::Base#perform'
sidekiq-7.3.10/lib/sidekiq/processor.rb:220:in 'Sidekiq::Processor#execute_job'
sidekiq-7.3.10/lib/sidekiq/processor.rb:185:in 'block (4 levels) in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:180:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
/var/www/discourse/lib/sidekiq/suppress_user_email_errors.rb:6:in 'Sidekiq::SuppressUserEmailErrors#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
/var/www/discourse/lib/sidekiq/discourse_event.rb:6:in 'Sidekiq::DiscourseEvent#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
/var/www/discourse/lib/sidekiq/pausable.rb:131:in 'Sidekiq::Pausable#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/job/interrupt_handler.rb:9:in 'Sidekiq::Job::InterruptHandler#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:183:in 'block in Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/metrics/tracking.rb:26:in 'Sidekiq::Metrics::ExecutionTracker#track'
sidekiq-7.3.10/lib/sidekiq/metrics/tracking.rb:134:in 'Sidekiq::Metrics::Middleware#call'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:182:in 'Sidekiq::Middleware::Chain#traverse'
sidekiq-7.3.10/lib/sidekiq/middleware/chain.rb:173:in 'Sidekiq::Middleware::Chain#invoke'
sidekiq-7.3.10/lib/sidekiq/processor.rb:184:in 'block (3 levels) in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:145:in 'block (6 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_retry.rb:118:in 'Sidekiq::JobRetry#local'
sidekiq-7.3.10/lib/sidekiq/processor.rb:144:in 'block (5 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/config.rb:39:in 'block in <class:Config>'
sidekiq-7.3.10/lib/sidekiq/processor.rb:139:in 'block (4 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/processor.rb:281:in 'Sidekiq::Processor#stats'
sidekiq-7.3.10/lib/sidekiq/processor.rb:134:in 'block (3 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_logger.rb:15:in 'Sidekiq::JobLogger#call'
sidekiq-7.3.10/lib/sidekiq/processor.rb:133:in 'block (2 levels) in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_retry.rb:85:in 'Sidekiq::JobRetry#global'
sidekiq-7.3.10/lib/sidekiq/processor.rb:132:in 'block in Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/job_logger.rb:40:in 'Sidekiq::JobLogger#prepare'
sidekiq-7.3.10/lib/sidekiq/processor.rb:131:in 'Sidekiq::Processor#dispatch'
sidekiq-7.3.10/lib/sidekiq/processor.rb:183:in 'block (2 levels) in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:182:in 'Thread.handle_interrupt'
sidekiq-7.3.10/lib/sidekiq/processor.rb:182:in 'block in Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:181:in 'Thread.handle_interrupt'
sidekiq-7.3.10/lib/sidekiq/processor.rb:181:in 'Sidekiq::Processor#process'
sidekiq-7.3.10/lib/sidekiq/processor.rb:86:in 'Sidekiq::Processor#process_one'
sidekiq-7.3.10/lib/sidekiq/processor.rb:76:in 'Sidekiq::Processor#run'
sidekiq-7.3.10/lib/sidekiq/component.rb:10:in 'Sidekiq::Component#watchdog'
sidekiq-7.3.10/lib/sidekiq/component.rb:19:in 'block in Sidekiq::Component#safe_thread'

This is a serious problem and makes for a poor experience with Discourse AI. LLM configuration problems never surface on the dashboard, and when Discourse ignores the LLM configuration it leads to unpredictable failures and frustration.

Discourse needs a way to guarantee that it honors the LLM configuration parameters, and to surface these problems in the admin dashboard.

Are you mixing up request tokens and response tokens?

A 413 error means your request body is too large, not that the response you asked for is too large.

To fix this, you need to adjust the "Context window" in the LLM configuration. Fair warning, though: an 8k token budget is really small these days. It may work for some features, but with mainstream models now offering million-token context windows, it no longer matches typical needs. I can run a 256k context window on my own desktop with a better model than the one you are using.
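For context on why a provider can reject a request as "too large" before generating anything: in OpenAI-compatible APIs the output budget is declared inside the request itself. A minimal sketch of such a payload (the message content here is invented for illustration):

```python
import json

# Sketch of an OpenAI-compatible chat-completions payload. The output
# budget ("max_tokens") travels with the request, so a rate limiter can
# count it against a per-minute token quota before any generation happens.
request = {
    "model": "openai/gpt-oss-120b",  # model name from the error above
    "messages": [
        # illustrative content; Discourse builds the real prompt internally
        {"role": "user", "content": "Translate this post into French."},
    ],
    "max_tokens": 7000,  # the "max output tokens" configured in Discourse
}

print(json.dumps(request, indent=2))
```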

The context window is already set to 130k

But that brings us back to the same problem. The model's limit on Groq is 131,072; I set it to 130,000. I should not have to discover by experiment how much content Discourse actually sends. Discourse should be able to operate within the limits provided in the LLM configuration.

What I don't understand is why reducing the max output tokens appears to fix it. I did not change the context window; I only lowered the max output tokens further, and it started working again and picked up where it had left off.

FYI, the trouble originally started when the translation service got stuck and exhausted the tokens:

DiscourseAi::Completions::Endpoints::OpenAi: status: 429 - body: {"error":{"message":"Rate limit reached for model `openai/gpt-oss-120b` in organization `org_01kccx1be8fffaz5sbe17` service tier `on_demand` on tokens per day (TPD): Limit 200000, Used 193487, Requested 7464. Please try again in 6m50.832s. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing","type":"tokens","code":"rate_limit_exceeded"}}

I then paused the service for 24 hours to wait for the daily limit to reset. After restarting it, I noticed the following error:

DiscourseAi::Completions::Endpoints::OpenAi: status: 413 - body: {"error":{"message":"Request too large for model `openai/gpt-oss-120b` in organization `org_01kccx1be8fffaz5sbe17` service tier `on_demand` on tokens per minute (TPM): Limit 8000, Requested 8102, please reduce your message size and try again. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing","type":"tokens","code":"rate_limit_exceeded"}}

After that, I lowered the max output tokens in the LLM configuration from 7000 to 6800, and the service went back to normal.
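If I had to guess why that helped (and this is only a guess, not something the logs confirm): Groq's per-minute limit seems to count the prompt tokens plus the requested max output tokens, since it rejects the request before generating anything. The numbers in the 413 error would fit that reading:

```python
# Numbers taken from the 413 error body; prompt size is inferred, not logged.
TPM_LIMIT = 8000   # Groq on_demand tokens-per-minute limit
REQUESTED = 8102   # "Requested 8102" from the error

# If the counted size is prompt_tokens + max_output_tokens, the prompt
# must have been about 8102 - 7000 = 1102 tokens.
prompt_tokens = REQUESTED - 7000

# With max output tokens at 7000 the request is over the limit;
# lowered to 6800, the same prompt fits.
assert prompt_tokens + 7000 > TPM_LIMIT    # 8102 > 8000  -> 413 rejected
assert prompt_tokens + 6800 <= TPM_LIMIT   # 7902 <= 8000 -> accepted
```

That would also explain why the failures look random: they depend on the size of whatever post is being translated, not on any setting I control directly.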

Am I missing something? Do you think this is about the context window rather than the max output tokens? I'm just trying to figure out how to map the numbers from the Groq/model limits onto the Discourse LLM configuration.