Discourse AI failing to translate a large number of posts and topics

Hello there,

I’ve enabled Discourse AI on our community forum and started backfilling into multiple languages. Around 60% of posts and topics are translated, but in the process I’m getting a LOT of errors in the console (ai_translation_verbose_logs is enabled), and now backfilling has mostly stalled:

DiscourseAi::Translation: Failed to translate topic 563 to de: Validation failed: Title can't be blank, Fancy title can't be blank /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.3/

DiscourseAi::Translation: Failed to translate post 582 to pl_PL: Validation failed: Raw can't be blank, Cooked can't be blank /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.3/lib/a

The strange thing about it is:

  • The posts and topics mentioned in the errors look pretty normal: they have different kinds of titles and bodies, both simple and complex, and similar ones were translated successfully
  • On a second or third attempt they often get translated successfully
  • I’m using a custom persona for Posts, but this happens with the built-in Post Translator persona as well as with the built-in topic title translation one
  • It happens on all models I tested: Gemini 2.5 Flash (non-thinking), Gemini 2.5 Flash (thinking), GPT-5, and GPT-5-mini
  • It happens on all locales equally (en, es, pt, de, pl_PL, fr, nl)

Is it possible to log the full prompts and model responses to debug this further?

I’m testing the same prompts manually on all of these models and they always respond successfully.

I’ve found ai_api_audit_logs, and I think I’ve identified the issue.

When a translation is sent, there’s a get_max_tokens function that assigns max_tokens based on the text length.

The problem is that this budget is mostly used up by reasoning. See this audit log: the limit was set to 1000, and reasoning consumed the whole 1000 tokens before the model even started generating output.

The limit for reasoning models should be much higher.

data: {"id":"chatcmpl-CQ7XU4Ep16RClb7OZQAxOXN9JWgIG","object":"chat.completion.chunk","created":1760341544,"model":"gpt-5-2025-08-07","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"finish_reason":null}],"usage":null,"obfuscation":"dPNNK7ojEf"}

data: {"id":"chatcmpl-CQ7XU4Ep16RClb7OZQAxOXN9JWgIG","object":"chat.completion.chunk","created":1760341544,"model":"gpt-5-2025-08-07","service_tier":"default","system_fingerprint":null,"choices":[{"index":0,"delta":{},"finish_reason":"length"}],"usage":null,"obfuscation":"dM2r"}

data: {"id":"chatcmpl-CQ7XU4Ep16RClb7OZQAxOXN9JWgIG","object":"chat.completion.chunk","created":1760341544,"model":"gpt-5-2025-08-07","service_tier":"default","system_fingerprint":null,"choices":[],"usage":{"prompt_tokens":1075,"completion_tokens":1000,"total_tokens":2075,"prompt_tokens_details":{"cached_tokens":0,"audio_tokens":0},"completion_tokens_details":{"reasoning_tokens":1000,"audio_tokens":0,"accepted_prediction_tokens":0,"rejected_prediction_tokens":0}},"obfuscation":"j4"}

data: [DONE]
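
To make the failure mode concrete, here’s a small Ruby sketch (not the plugin’s code, just my reading of the final usage chunk above, which finished with finish_reason "length"):

require "json"

# Final usage chunk from the audit log above, trimmed to the relevant fields.
chunk = JSON.parse(<<~JSON)
  {"usage": {"prompt_tokens": 1075, "completion_tokens": 1000,
             "completion_tokens_details": {"reasoning_tokens": 1000}}}
JSON

usage = chunk["usage"]
visible_tokens =
  usage["completion_tokens"] - usage.dig("completion_tokens_details", "reasoning_tokens")

puts visible_tokens # => 0: the entire 1000-token budget went to reasoning,
                    # so the translation text is empty and saving it then fails
                    # with "Raw can't be blank" / "Title can't be blank".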

I simply do not recommend using any type of thinking model for translation tasks.

https://www.reddit.com/r/LocalLLaMA/comments/1g7y74t/adding_a_thinking_turn_to_extend_llms_reasoning/

My experience is quite the opposite. I have a set of instructions that I want followed and that require understanding the context; non-thinking models either ignore them or apply them in the wrong situations. I’ve just translated a whole application this way (>3000 strings), and reasoning models gave much better results.

Based on my findings I reduced the thinking effort to low, and all the translations came through. But I believe limiting the output tokens like that is counterproductive: thinking models aren’t restricted from being used for translations, and users have no clue why it’s failing.

The solution could be as simple as multiplying the limit by a further factor of 2 when the LLM has thinking enabled, or exposing the multiplier as a config option.
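
A rough sketch of what I mean (the method name, reasoning_enabled?, and the numbers are made up for illustration, not the plugin’s actual code):

# Hypothetical adjustment: keep the existing length-based budget, but give
# thinking models extra headroom, either a fixed factor or a configurable multiplier.
def max_tokens_for(text, llm)
  base = [text.length * 2, 1000].max            # length-based budget (illustrative numbers)
  multiplier = llm.reasoning_enabled? ? 2 : 1   # or a site-setting-driven multiplier
  base * multiplier
end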


We had to limit max_tokens because our use of structured outputs meant that many smaller models could easily spiral into infinite loops during translations.

I believe the newer version of the OpenAI Responses API applies max_tokens without counting the thinking tokens, which would solve this issue.

I’m trying the latest GPT-5, and I saw exactly the same issue with Gemini 2.5 Pro and 2.5 Flash. Why not just increase the limit a bit?

I’ve spent quite a bit on failed tries, which I wouldn’t even have known about if I hadn’t enabled the debug logging and then poked around in the Data Explorer to find the logs. All of this while using a pre-defined model creator.