I’ve found ai_api_audit_logs and I think I’ve found the issue.
When translation is sent, there’s a get_max_tokens function which assigns max tokens based on text length.
Problem is, it’s mostly used up by reasoning. See this audit log, the limit was set to 1000, and reasoning took the whole 1000 before it even started generating output.
The limit for reasoning models should be much higher.
data: {"id":"chatcmpl-CQ7XU4Ep16RClb7OZQAxOXN9JWgIG","object":"chat.completion.chunk","created":1760341544,"model":"gpt-5-2025-08-07","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"finish_reason":null}],"usage":null,"obfuscation":"dPNNK7ojEf"}
data: {"id":"chatcmpl-CQ7XU4Ep16RClb7OZQAxOXN9JWgIG","object":"chat.completion.chunk","created":1760341544,"model":"gpt-5-2025-08-07","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{},"finish_reason":"length"}],"usage":null,"obfuscation":"dM2r"}
data: {"id":"chatcmpl-CQ7XU4Ep16RClb7OZQAxOXN9JWgIG","object":"chat.completion.chunk","created":1760341544,"model":"gpt-5-2025-08-07","service_tier":"default","system_fingerprint":null,"choices":[],"usage":{"prompt_tokens":1075,"completion_tokens":1000,"total_tokens":2075,"prompt_tokens_details":{"cached_tokens":0,"audio_tokens":0},"completion_tokens_details":{"reasoning_tokens":1000,"audio_tokens":0,"accepted_prediction_tokens":0,"rejected_prediction_tokens":0}},"obfuscation":"j4"}
data: [DONE]