Discourse AI 引发新的 SSL 和 Connection Reset by Peer 错误

优先级/严重性
最近的仓库变更导致 Discourse AI 在当前 OpenAI API 下基本无法使用。

平台

  • 自托管,使用标准独立构建
  • Ubuntu 24.04 主机虚拟机,Docker 容器
  • OpenAI API
  • Anthropic API

描述

Discourse AI 一直调用外部 API(OpenAI),使用相关模型,截至 2 月 15 日(上次容器重建)运行良好。今天(2 月 21 日)我重建了容器,但功能无法正常工作。

以下是我所知的情况:

截至 2 月 15 日
已配置且运行良好的 OpenAI 模型:

  • LLM/Persona
    • GPT4 Omni
    • GPT4 Omni Mini
  • 嵌入(Embeddings)
    • text-embedding-ada-002

截至 2 月 21 日

所有 OpenAI 模型在 LLM 调用中的错误率约为 70-80%,错误信息为“连接被对端重置(Connection Reset by Peer)”。部分聊天能成功,部分在中间失败。嵌入调用因 Faraday::ConnectionFailed SSL 错误而失败。

其他 OpenAI 模型也失败:

  • o1-mini 和 o1-preview 在测试/保存 LLM 时因代码错误(‘developer’ 不是有效角色)而失败,因为 ‘developer’ 角色仅对 o1o3 模型有效,不适用于它们的 -mini 版本。源代码 github.com/discourse/discourse-ai/…/chat_gpt.rb:61 需要更新,以进行精确的模型名称匹配,而不是 starts_with 匹配。对于第 73 行的 else 情况,已不再存在 system 用户,需要更新为简单的 user。截至今天,o1-mini 无法使用工具。

已尝试:

  • 检查 OpenAI 平台限制,我们远低于速率限制,且 OpenAI 账户资金充足。
  • 重建容器
  • 删除并重新创建 LLM 角色和用户
  • 删除并重新创建 LLM 模型
  • 创建新的 API 密钥
  • 确保容器内的 SSL 和证书已更新
  • 登录容器并使用 bash & curl 调用 API(成功)
  • 登录 rails 控制台 RAILS_ENV=production bundle exec rails console 并使用 http 对象调用 OpenAI API(成功)
  • Anthropic API 调用 claude-3.5-sonnet(成功)

可复现步骤

使用最新 Discourse 创建新容器构建,并将 Discourse AI 插件添加到插件中:

  ...
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/discourse-ai.git

使用以下配置 OpenAI LLM 和嵌入模型:

  • GPT4 Omni, GPT4 Omni Mini
    • 所有默认值,输入您的 API 密钥
    • Tokens: 64000
    • 点击“运行测试”,等待响应,有时成功,经常显示“内部服务器错误”。成功时,尝试与角色聊天会显示“推理 LLM 模型”堆栈跟踪
  • text-embedding-ada-002, text-embedding-3-large
    • 保存成功,但生成错误日志,每 5 分钟重复多次

内部服务器错误堆栈跟踪

内部服务器错误堆栈跟踪
消息(报告了 2 份副本)
Errno::ECONNRESET (连接被对端重置)
app/controllers/application_controller.rb:427:in `block in with_resolved_locale'
app/controllers/application_controller.rb:427:in `with_resolved_locale'
lib/middleware/omniauth_bypass_middleware.rb:35:in `call'
lib/content_security_policy/middleware.rb:12:in `call'
lib/middleware/anonymous_cache.rb:409:in `call'
lib/middleware/csp_script_nonce_injector.rb:12:in `call'
config/initializers/008-rack-cors.rb:26:in `call'
config/initializers/100-quiet_logger.rb:20:in `call'
config/initializers/100-silence_logger.rb:29:in `call'
lib/middleware/enforce_hostname.rb:24:in `call'
lib/middleware/processing_request.rb:12:in `call'
lib/middleware/request_tracker.rb:385:in `call'
回溯
openssl (3.3.0) lib/openssl/buffering.rb:217:in `sysread_nonblock'
openssl (3.3.0) lib/openssl/buffering.rb:217:in `read_nonblock'
net-protocol (0.2.2) lib/net/protocol.rb:218:in `rbuf_fill'
net-protocol (0.2.2) lib/net/protocol.rb:199:in `readuntil'
net-protocol (0.2.2) lib/net/protocol.rb:209:in `readline'
net-http (0.6.0) lib/net/http/response.rb:625:in `read_chunked'
net-http (0.6.0) lib/net/http/response.rb:595:in `block in read_body_0'
net-http (0.6.0) lib/net/http/response.rb:570:in `inflater'
net-http (0.6.0) lib/net/http/response.rb:593:in `read_body_0'
net-http (0.6.0) lib/net/http/response.rb:363:in `read_body'
plugins/discourse-ai/lib/completions/endpoints/base.rb:374:in `non_streaming_response'
plugins/discourse-ai/lib/completions/endpoints/base.rb:160:in `block (2 levels) in perform_completion!'
net-http (0.6.0) lib/net/http.rb:2433:in `block in transport_request'
net-http (0.6.0) lib/net/http/response.rb:320:in `reading_body'
net-http (0.6.0) lib/net/http.rb:2430:in `transport_request'
net-http (0.6.0) lib/net/http.rb:2384:in `request'
rack-mini-profiler (3.3.1) lib/patches/net_patches.rb:19:in `block in request_with_mini_profiler' 
rack-mini-profiler (3.3.1) lib/mini_profiler/profiling_methods.rb:44:in `step' 
rack-mini-profiler (3.3.1) lib/patches/net_patches.rb:18:in `request_with_mini_profiler' 
(eval at /var/www/discourse/lib/method_profiler.rb:38):12:in `request'
plugins/discourse-ai/lib/completions/endpoints/base.rb:122:in `block in perform_completion!'
net-http (0.6.0) lib/net/http.rb:1632:in `start'
net-http (0.6.0) lib/net/http.rb:1070:in `start'
plugins/discourse-ai/lib/completions/endpoints/base.rb:105:in `perform_completion!'
plugins/discourse-ai/lib/completions/endpoints/open_ai.rb:44:in `perform_completion!'
plugins/discourse-ai/lib/completions/llm.rb:281:in `generate'
plugins/discourse-ai/lib/configuration/llm_validator.rb:36:in `run_test'
plugins/discourse-ai/app/controllers/discourse_ai/admin/ai_llms_controller.rb:128:in `test'
actionpack (7.2.2.1) lib/action_controller/metal/basic_implicit_render.rb:8:in `send_action'
actionpack (7.2.2.1) lib/abstract_controller/base.rb:226:in `process_action'
actionpack (7.2.2.1) lib/action_controller/metal/rendering.rb:193:in `process_action'
actionpack (7.2.2.1) lib/abstract_controller/callbacks.rb:261:in `block in process_action'
activesupport (7.2.2.1) lib/active_support/callbacks.rb:121:in `block in run_callbacks'
app/controllers/application_controller.rb:427:in `block in with_resolved_locale'
i18n (1.14.7) lib/i18n.rb:353:in `with_locale'
app/controllers/application_controller.rb:427:in `with_resolved_locale'
activesupport (7.2.2.1) lib/active_support/callbacks.rb:130:in `block in run_callbacks'
activesupport (7.2.2.1) lib/active_support/callbacks.rb:141:in `run_callbacks'
actionpack (7.2.2.1) lib/abstract_controller/callbacks.rb:260:in `process_action'
actionpack (7.2.2.1) lib/action_controller/metal/rescue.rb:27:in `process_action'
actionpack (7.2.2.1) lib/action_controller/metal/instrumentation.rb:77:in `block in process_action'
activesupport (7.2.2.1) lib/active_support/notifications.rb:210:in `block in instrument'
activesupport (7.2.2.1) lib/active_support/notifications/instrumenter.rb:58:in `instrument'
activesupport (7.2.2.1) lib/active_support/notifications.rb:210:in `instrument'
actionpack (7.2.2.1) lib/action_controller/metal/instrumentation.rb:76:in `process_action'
actionpack (7.2.2.1) lib/action_controller/metal/params_wrapper.rb:259:in `process_action'
activerecord (7.2.2.1) lib/active_record/railties/controller_runtime.rb:39:in `process_action'
actionpack (7.2.2.1) lib/abstract_controller/base.rb:163:in `process'
actionview (7.2.2.1) lib/action_view/rendering.rb:40:in `process'
rack-mini-profiler (3.3.1) lib/mini_profiler/profiling_methods.rb:115:in `block in profile_method' 
actionpack (7.2.2.1) lib/action_controller/metal.rb:252:in `dispatch'
actionpack (7.2.2.1) lib/action_controller/metal.rb:335:in `dispatch'
actionpack (7.2.2.1) lib/action_dispatch/routing/route_set.rb:67:in `dispatch'
actionpack (7.2.2.1) lib/action_dispatch/routing/route_set.rb:50:in `serve'
actionpack (7.2.2.1) lib/action_dispatch/routing/mapper.rb:32:in `block in <class:Constraints>'
actionpack (7.2.2.1) lib/action_dispatch/routing/mapper.rb:62:in `serve'
actionpack (7.2.2.1) lib/action_dispatch/journey/router.rb:53:in `block in serve'
actionpack (7.2.2.1) lib/action_dispatch/journey/router.rb:133:in `block in find_routes'
actionpack (7.2.2.1) lib/action_dispatch/journey/router.rb:126:in `each'
actionpack (7.2.2.1) lib/action_dispatch/journey/router.rb:126:in `find_routes'
actionpack (7.2.2.1) lib/action_dispatch/journey/router.rb:34:in `serve'
actionpack (7.2.2.1) lib/action_dispatch/routing/route_set.rb:896:in `call'
lib/middleware/omniauth_bypass_middleware.rb:35:in `call'
rack (2.2.11) lib/rack/tempfile_reaper.rb:15:in `call'
rack (2.2.11) lib/rack/conditional_get.rb:27:in `call'
rack (2.2.11) lib/rack/head.rb:12:in `call'
actionpack (7.2.2.1) lib/action_dispatch/http/permissions_policy.rb:38:in `call'
lib/content_security_policy/middleware.rb:12:in `call'
lib/middleware/anonymous_cache.rb:409:in `call'
lib/middleware/csp_script_nonce_injector.rb:12:in `call'
config/initializers/008-rack-cors.rb:26:in `call'
rack (2.2.11) lib/rack/session/abstract/id.rb:266:in `context'
rack (2.2.11) lib/rack/session/abstract/id.rb:260:in `call'
actionpack (7.2.2.1) lib/action_dispatch/middleware/cookies.rb:704:in `call'
actionpack (7.2.2.1) lib/action_dispatch/middleware/callbacks.rb:31:in `block in call'
activesupport (7.2.2.1) lib/active_support/callbacks.rb:101:in `run_callbacks'
actionpack (7.2.2.1) lib/action_dispatch/middleware/callbacks.rb:30:in `call'
actionpack (7.2.2.1) lib/action_dispatch/middleware/debug_exceptions.rb:31:in `call'
actionpack (7.2.2.1) lib/action_dispatch/middleware/show_exceptions.rb:32:in `call'
logster (2.20.1) lib/logster/middleware/reporter.rb:40:in `call'
railties (7.2.2.1) lib/rails/rack/logger.rb:41:in `call_app'
railties (7.2.2.1) lib/rails/rack/logger.rb:29:in `call'
config/initializers/100-quiet_logger.rb:20:in `call'
config/initializers/100-silence_logger.rb:29:in `call'
actionpack (7.2.2.1) lib/action_dispatch/middleware/request_id.rb:33:in `call'
lib/middleware/enforce_hostname.rb:24:in `call'
rack (2.2.11) lib/rack/method_override.rb:24:in `call'
actionpack (7.2.2.1) lib/action_dispatch/middleware/executor.rb:16:in `call'
rack (2.2.11) lib/rack/sendfile.rb:110:in `call'
plugins/discourse-prometheus/lib/middleware/metrics.rb:14:in `call'
rack-mini-profiler (3.3.1) lib/mini_profiler.rb:334:in `call'
lib/middleware/processing_request.rb:12:in `call'
message_bus (4.3.9) lib/message_bus/rack/middleware.rb:60:in `call'
lib/middleware/request_tracker.rb:385:in `call'
actionpack (7.2.2.1) lib/action_dispatch/middleware/remote_ip.rb:96:in `call'
railties (7.2.2.1) lib/rails/engine.rb:535:in `call'
railties (7.2.2.1) lib/rails/railtie.rb:226:in `public_send'
railties (7.2.2.1) lib/rails/railtie.rb:226:in `method_missing'
rack (2.2.11) lib/rack/urlmap.rb:74:in `block in call'
rack (2.2.11) lib/rack/urlmap.rb:58:in `each'
rack (2.2.11) lib/rack/urlmap.rb:58:in `call'
unicorn (6.1.0) lib/unicorn/http_server.rb:634:in `process_client'
unicorn (6.1.0) lib/unicorn/http_server.rb:739:in `worker_loop'
unicorn (6.1.0) lib/unicorn/http_server.rb:547:in `spawn_missing_workers'
unicorn (6.1.0) lib/unicorn/http_server.rb:143:in `start'
unicorn (6.1.0) bin/unicorn:128:in `<top (required)>'
vendor/bundle/ruby/3.3.0/bin/unicorn:25:in `load'
vendor/bundle/ruby/3.3.0/bin/unicorn:25:in `<main>'

查看日志错误如下:

嵌入模型

日志中的错误消息:(每 5 分钟) 连接被对端重置 (Faraday::ConnectionFailed)

application_version: 00907363d4b290df1c755df1a2494b95265e40b4

job: Jobs::EmbeddingsBackfill

嵌入模型错误堆栈跟踪

嵌入模型错误堆栈跟踪
作业异常:5 个错误
连接被对端重置 (Faraday::ConnectionFailed)
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/openssl-3.3.0/lib/openssl/buffering.rb:217:in `sysread_nonblock'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/openssl-3.3.0/lib/openssl/buffering.rb:217:in `read_nonblock'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:218:in `rbuf_fill'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:199:in `readuntil'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:209:in `readline'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http/response.rb:625:in `read_chunked'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http/response.rb:595:in `block in read_body_0'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http/response.rb:570:in `inflater'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http/response.rb:593:in `read_body_0'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http/response.rb:363:in `read_body'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http/response.rb:401:in `body'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http/response.rb:321:in `reading_body'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http.rb:2430:in `transport_request'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/net-http-0.6.0/lib/net/http.rb:2384:in `request'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rack-mini-profiler-3.3.1/lib/patches/net_patches.rb:19:in `block in request_with_mini_profiler'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rack-mini-profiler-3.3.1/lib/mini_profiler/profiling_methods.rb:50:in `step'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rack-mini-profiler-3.3.1/lib/patches/net_patches.rb:18:in `request_with_mini_profil...
回溯
concurrent-ruby-1.3.5/lib/concurrent-ruby/concurrent/promises.rb:1268:in `raise' 
concurrent-ruby-1.3.5/lib/concurrent-ruby/concurrent/promises.rb:1268:in `wait_until_resolved!' 
concurrent-ruby-1.3.5/lib/concurrent-ruby/concurrent/promises.rb:998:in `value!' 
/var/www/discourse/plugins/discourse-ai/lib/embeddings/vector.rb:50:in `gen_bulk_reprensentations' 
/var/www/discourse/plugins/discourse-ai/app/jobs/scheduled/embeddings_backfill.rb:134:in `block in populate_topic_embeddings' 
/var/www/discourse/plugins/discourse-ai/app/jobs/scheduled/embeddings_backfill.rb:133:in `each' 
/var/www/discourse/plugins/discourse-ai/app/jobs/scheduled/embeddings_backfill.rb:133:in `each_slice' 
/var/www/discourse/plugins/discourse-ai/app/jobs/scheduled/embeddings_backfill.rb:133:in `populate_topic_embeddings' 
/var/www/discourse/plugins/discourse-ai/app/jobs/scheduled/embeddings_backfill.rb:36:in `execute' 
/var/www/discourse/app/jobs/base.rb:316:in `block (2 levels) in perform' 
rails_multisite-6.1.0/lib/rails_multisite/connection_management/null_instance.rb:49:in `with_connection'
rails_multisite-6.1.0/lib/rails_multisite/connection_management.rb:21:in `with_connection'
/var/www/discourse/app/jobs/base.rb:303:in `block in perform' 
/var/www/discourse/app/jobs/base.rb:299:in `each' 
/var/www/discourse/app/jobs/base.rb:299:in `perform' 
/var/www/discourse/app/jobs/base.rb:379:in `perform' 
mini_scheduler-0.18.0/lib/mini_scheduler/manager.rb:137:in `process_queue' 
mini_scheduler-0.18.0/lib/mini_scheduler/manager.rb:77:in `worker_loop' 
mini_scheduler-0.18.0/lib/mini_scheduler/manager.rb:63:in `block (2 levels) in ensure_worker_threads' 

推理 LLM 模型

日志中的错误消息:作业异常:连接被对端重置

application_version: 00907363d4b290df1c755df1a2494b95265e40b4

job: Jobs::CreateAiReply

LLM 模型错误堆栈跟踪

LLM 模型错误堆栈跟踪
消息
作业异常:连接被对端重置
回溯
openssl-3.3.0/lib/openssl/buffering.rb:217:in `sysread_nonblock' 
openssl-3.3.0/lib/openssl/buffering.rb:217:in `read_nonblock' 
net-protocol-0.2.2/lib/net/protocol.rb:218:in `rbuf_fill' 
net-protocol-0.2.2/lib/net/protocol.rb:199:in `readuntil' 
net-protocol-0.2.2/lib/net/protocol.rb:209:in `readline' 
net-http-0.6.0/lib/net/http/response.rb:625:in `read_chunked' 
net-http-0.6.0/lib/net/http/response.rb:595:in `block in read_body_0' 
net-http-0.6.0/lib/net/http/response.rb:570:in `inflater' 
net-http-0.6.0/lib/net/http/response.rb:593:in `read_body_0' 
net-http-0.6.0/lib/net/http/response.rb:363:in `read_body' 
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:374:in `non_streaming_response' 
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:160:in `block (2 levels) in perform_completion!' 
net-http-0.6.0/lib/net/http.rb:2433:in `block in transport_request' 
net-http-0.6.0/lib/net/http/response.rb:320:in `reading_body' 
net-http-0.6.0/lib/net/http.rb:2430:in `transport_request' 
net-http-0.6.0/lib/net/http.rb:2384:in `request' 
rack-mini-profiler-3.3.1/lib/patches/net_patches.rb:19:in `block in request_with_mini_profiler' 
rack-mini-profiler-3.3.1/lib/mini_profiler/profiling_methods.rb:50:in `step' 
rack-mini-profiler-3.3.1/lib/patches/net_patches.rb:18:in `request_with_mini_profiler' 
(eval at /var/www/discourse/lib/method_profiler.rb:38):5:in `request'
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:122:in `block in perform_completion!' 
net-http-0.6.0/lib/net/http.rb:1632:in `start' 
net-http-0.6.0/lib/net/http.rb:1070:in `start' 
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/base.rb:105:in `perform_completion!' 
/var/www/discourse/plugins/discourse-ai/lib/completions/endpoints/open_ai.rb:44:in `perform_completion!' 
/var/www/discourse/plugins/discourse-ai/lib/completions/llm.rb:281:in `generate' 
/var/www/discourse/plugins/discourse-ai/lib/ai_bot/bot.rb:65:in `get_updated_title' 
/var/www/discourse/plugins/discourse-ai/lib/ai_bot/playground.rb:252:in `title_playground' 
/var/www/discourse/plugins/discourse-ai/lib/ai_bot/playground.rb:561:in `ensure in reply_to' 
/var/www/discourse/plugins/discourse-ai/lib/ai_bot/playground.rb:561:in `reply_to' 
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/create_ai_reply.rb:18:in `execute' 
/var/www/discourse/app/jobs/base.rb:316:in `block (2 levels) in perform' 
rails_multisite-6.1.0/lib/rails_multisite/connection_management/null_instance.rb:49:in `with_connection'
rails_multisite-6.1.0/lib/rails_multisite/connection_management.rb:21:in `with_connection'
/var/www/discourse/app/jobs/base.rb:303:in `block in perform' 
/var/www/discourse/app/jobs/base.rb:299:in `each' 
/var/www/discourse/app/jobs/base.rb:299:in `perform' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:202:in `execute_job' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:170:in `block (2 levels) in process' 
sidekiq-6.5.12/lib/sidekiq/middleware/chain.rb:177:in `block in invoke' 
/var/www/discourse/lib/sidekiq/pausable.rb:132:in `call' 
sidekiq-6.5.12/lib/sidekiq/middleware/chain.rb:179:in `block in invoke' 
sidekiq-6.5.12/lib/sidekiq/middleware/chain.rb:182:in `invoke' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:169:in `block in process' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch' 
sidekiq-6.5.12/lib/sidekiq/job_retry.rb:113:in `local' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch' 
sidekiq-6.5.12/lib/sidekiq.rb:44:in `block in <module:Sidekiq>' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:263:in `stats' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch' 
sidekiq-6.5.12/lib/sidekiq/job_logger.rb:13:in `call' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch' 
sidekiq-6.5.12/lib/sidekiq/job_retry.rb:80:in `global' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:124:in `block in dispatch' 
sidekiq-6.5.12/lib/sidekiq/job_logger.rb:39:in `prepare' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:123:in `dispatch' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:168:in `process' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:78:in `process_one' 
sidekiq-6.5.12/lib/sidekiq/processor.rb:68:in `run' 
sidekiq-6.5.12/lib/sidekiq/component.rb:8:in `watchdog' 
sidekiq-6.5.12/lib/sidekiq/component.rb:17:in `block in safe_thread' 

任何想法、建议等都非常欢迎。

1 个赞

我们在上个月在这个网站上对 OpenAI API 发送了超过 2000 次请求,我们的日志中没有此类错误。

我查看了我们托管的另一个大量使用 OpenAI 的网站,上个月有 80,000 次请求,他们没有 Faraday 错误,在此期间只有两个“Connection Reset by Peer”错误。

您是否可能在服务器的网络方面存在问题?我曾经因为有故障的 NIC 驱动程序而遇到过 Errno::ECONNRESET (Connection reset by peer)

我会看看这个问题。

1 个赞

正如您所写,我最初认为这是一个网络堆栈问题,但从同一个容器中重复调用 OpenAI API 却可以正常工作。

这也是在 2 月 21 日的最新提交中完成的。

为了证明这一点(不惜消耗 token),我编写了一个快速脚本来检查 OpenAI 网络堆栈。

  • 运行 600 秒(10 分钟)
  • 每秒进行一次聊天补全调用
  • 更改提示以避免缓存

在容器中执行 ./launcher enter app,保存以下脚本,使用 chmod +x test_openai.sh 使其可执行,然后使用 OPENAI_API_KEY=.... ./test_openai.sh 调用它。

test_openai.sh
#!/bin/bash

# 运行持续时间
DURATION_SECS=600

# 初始化计数器
successful=0
unsuccessful=0
declare -A error_messages

# 计算百分比的函数
calc_percentage() {
    local total=$(($1 + $2))
    if [ $total -eq 0 ]; then
        echo "0.00"
    else
        echo "scale=2; ($2 * 100) / $total" | bc
    fi
}

# 打印统计信息的函数
print_stats() {
    local percent=$(calc_percentage $successful $unsuccessful)
    echo "-------------------"
    echo "成功调用次数: $successful"
    echo "失败调用次数: $unsuccessful"
    echo "失败率: ${percent}%"
    echo "错误消息:"
    for error in "${!error_messages[@]}"; do
        echo "  - $error (${error_messages[$error]} 次)"
    done
}

end_time=$((SECONDS + DURATION_SECS))

counter=1
while [ $SECONDS -lt $end_time ]; do
    # 带超时进行 API 调用
    response=$(curl -s -w "\n%{http_code}" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -d "{
            \"model\": \"gpt-4o-mini\",
            \"messages\": [{\"role\": \"user\", \"content\": \"Use this number to choose a one word response: $counter\"}]
        }" \
        --connect-timeout 5 \
        --max-time 10 \
        https://api.openai.com/v1/chat/completions 2>&1)

    # 获取最后一行(状态码)和响应体
    http_code=$(echo "$response" | tail -n1)
    body=$(echo "$response" | sed '$d')

    # 检查调用是否成功
    if [ "$http_code" = "200" ]; then
        ((successful++))
    else
        ((unsuccessful++))
        # 提取错误消息
        error_msg=$(echo "$body" | grep -o '"message":"[^"]*"' | cut -d'"' -f4)
        if [ -z "$error_msg" ]; then
            error_msg="Connection error: $body"
        fi
        # 增加错误消息计数器
        ((error_messages["$error_msg"]++))
    fi

    # 打印当前统计信息
    print_stats

    ((counter++))
    
    # 等待 1 秒后进行下一次调用
    sleep 1
done

对于测试脚本,我的失败率低于 0.5%,在这个规模下是可以接受的。

这表明问题出在 Discourse 软件本身,而不是为其供电的容器或网络堆栈。

如果最近的提交没有修复它,我将进一步研究。

2 个赞

我在这里修复了 o1-mini 和 o1-preview 的回归问题:

不过我对 SSL 问题感到困惑,我们并没有更改底层库。

也许这与流式传输有关,尝试禁用你的 OpenAI LLM 上的流式传输,看看问题是否会自行解决?你那里的测试使用的是 gpt-4o-mini,但没有使用流式传输。

3 个赞

太棒了!干得好!

在诊断过程中,我发现了另一个错误——在 LLM 配置页面(/admin/plugins/discourse-ai/ai-llms/%/edit),选择“禁用原生工具支持(使用基于 XML 的工具)(可选)”或“禁用流式完成(将流式请求转换为非流式请求)”并点击保存会显示一个临时的“成功!”提示,但重新加载页面后,这两个选项或其中一个选项会处于未选中状态。

连接重置问题仍然存在,我还在深入研究,但看起来是 Ruby 代码(FinalDestination / DNS 解析 / Faraday)套接字处理与 Debian 12 容器在 Ubuntu 24.04 VM 上的组合问题。

我启动了一个测试用的 Ubuntu 22.04 VM,没有出现任何问题,所有的嵌入和推理都工作得很完美。从未见过一次重置。

我会继续努力,也许这与 Ubuntu 24.04 管理 netplan 的 TCP 堆栈的新方式有关。

2 个赞

谢谢,今天已经解决了持久性问题,您可以升级并重试。

3 个赞

好的,稍微更新一下——我们无法在公司 IP 地址范围内成功建立直接的 OpenAI API 连接。Cloudflare 会在 TLS 建立后大约 1 毫秒发送 RST 数据包。

因此,我们设置了一个 Cloudflare AI Gateway 作为 OpenAI API 端点的 URL 替换方案,它与 LLM 配置配合使用效果完美。

看起来 Cloudflare 对未知 IP 地址范围(例如非 Azure、AWS、GCP 等)有一个未公开的速率限制策略,该策略会被触发。Embeddings 的 100 个连接池会触发该限制。

另外,Cloudflare 有一个 Authenticated Gateway 功能,可以添加一个特殊的标头令牌。

来自他们的文档:

curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header 'cf-aig-authorization: Bearer {CF_AIG_TOKEN}' \
  --header 'Authorization: Bearer OPENAI_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{"model": "gpt-4o" .......

如果在 LLM 配置屏幕中为每个 LLM 添加标头,那就太棒了。

这样我们就可以为每次调用 LLM 添加 cf-aig-authorization 键和值了。

这是一个棘手的问题,为了处理边缘情况,需要考虑很多用户界面。

您有没有可能尝试 openrouter.ai,这或许也能解决此问题?

我并非绝对反对允许任意标头,但这属于非常高级的配置。也许如果将其隐藏在站点设置后面会比较好(站点设置用于启用高级用户界面)。

您公司是否可以为这个开源插件提供帮助?

1 个赞

我还没有获得对贡献的批准,但我们会继续努力。谢谢你到目前为止的帮助!