AI Translation skips the Portuguese (pt) locale: posts are translated into every language except Portuguese

Sorry, I relied on the information provided in the first post.

Actually, it’s not true that there’s no retry:

Backfill always processes the most recent posts and topics first, so items that failed will be retried within the hour, often within a few minutes. So I’m not sure what “fix” you’re looking for here.

Being on v2026.4.0-latest is good. Some AI agent updates may simply not be backported, because AI APIs are evolving so quickly.
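To illustrate why newest-first backfill acts as an implicit retry, here is a minimal Python sketch. It is not Discourse’s implementation (which is Ruby); `Post`, `pick_backfill_batch` and the `limit` parameter are hypothetical names chosen for the example. Each run walks posts from newest to oldest and picks up any post still missing a target locale, so a recent post whose translation failed is naturally at the front of the next batch, while the per-run limit caps how much work (and how many tokens) one run can spend.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Post:
    # Hypothetical, simplified post record for illustration only.
    id: int
    created_at: datetime
    translated_locales: set = field(default_factory=set)


def pick_backfill_batch(posts, target_locales, limit):
    """Return up to `limit` (post_id, missing_locales) pairs, newest posts first."""
    batch = []
    for post in sorted(posts, key=lambda p: p.created_at, reverse=True):
        if len(batch) >= limit:
            break  # per-run cap keeps token spend bounded
        missing = sorted(set(target_locales) - post.translated_locales)
        if missing:
            # A post whose earlier translation failed is still "missing" and gets picked again.
            batch.append((post.id, missing))
    return batch
```

Posts that are already fully translated are skipped, so the same loop serves both first-time translation and retrying earlier failures.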

I understood that; I was just saying that we don’t aggressively retry translations, in order to protect sites from runaway token bills.

But as @nat notes, we do retry in the background. With a well-functioning LLM, your content should be getting translated over time.

After extensive investigation, I’ve identified that what appeared to be one translation issue is actually three separate problems occurring simultaneously, which created significant confusion.

Special thanks to Richard from Communiteq for his communication and competence, and especially for suggesting the Data Explorer approach: it was SQL queries that finally let me pinpoint all three issues. Big respect.


Problem 1: Incorrect locale detection by the LLM

The LLM used for locale detection is incorrectly classifying posts that are written in English but contain Portuguese place names.

Example: The post titled “Hanamaro Chaki’s WA exhibition opens at Fortress of São João do Pico” is written entirely in English. However, the locale detector classified it as pt-BR — likely because of the Portuguese place names in the text (“Fortress of São João do Pico”, “Casa da Cultura de Santa Cruz”).

The consequence: because the system believed the post was already in Portuguese, it never translated it into Portuguese. Instead, it translated the post into English, treating English as the “missing” language.

This is particularly problematic in multilingual communities where posts in one language frequently reference place names or proper nouns in another language.

Proposed fix: Use a more capable model for locale detection (e.g. Mistral Large), which should better understand context and distinguish the language of the body text from proper nouns embedded in it.


Problem 2: Mistral API returning 503 errors causing mid-batch job crashes

Mistral intermittently returns 503 unreachable_backend errors. Backfill does eventually retry these, but the Jobs::LocalizeTopics job crashes mid-execution when it encounters a 503, leaving the remaining topics in the batch untranslated until the next scheduled run.

This creates an unpredictable pattern of missing translations for random locales across random topics.

Log evidence:

DiscourseAi::Translation: Translated 13 topics to de
[crash in localize_topics.rb:57]

The job translated 13 topics into German, then crashed. The remaining topics received no German translation until the next backfill cycle.
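For contrast with the crash above, a batch can isolate failures per topic so that one 503 does not abort the rest. The Python sketch below is only an illustration of that idea, not Discourse’s actual Ruby job: `localize_batch`, `TransientProviderError` and the injected `translate` callable are hypothetical names. Each topic gets a small, bounded number of attempts (keeping token spend predictable), and topics that still fail are returned so they can be left for the next backfill cycle.

```python
import time


class TransientProviderError(Exception):
    """Hypothetical stand-in for an upstream 503 such as unreachable_backend."""


def localize_batch(topic_ids, locale, translate, max_attempts=2, backoff_s=0.0):
    """Translate each topic independently; one failure never stops the batch.

    Returns (translated_ids, failed_ids). Failed topics are left for a later run.
    """
    translated, failed = [], []
    for topic_id in topic_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                translate(topic_id, locale)
                translated.append(topic_id)
                break
            except TransientProviderError:
                if attempt == max_attempts:
                    failed.append(topic_id)  # give up for now; backfill will retry
                else:
                    time.sleep(backoff_s * attempt)  # simple linear backoff
    return translated, failed
```

With this shape, a single unavailable backend costs at most `max_attempts` calls per topic, and the other topics in the batch still get their German translation in the same run.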


Problem 3: AI translation target categories — inconsistent auto-population of subcategories

In my case, I never manually added any categories to the AI translation target categories setting; they appear to have been added automatically. However, two subcategories (Viewpoints and Beaches) were not added, even though they existed and contained content.

My hypothesis: the system automatically adds a subcategory to the target list only when a new post is created in it after translation is enabled. Since Viewpoints and Beaches were populated before translation was turned on, they were never auto-added, and therefore never translated.

This is confusing behavior. If the auto-population logic exists, it should be consistent and retroactive, or the UI should make it much clearer that subcategories need to be manually added.
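Until the behavior is clarified, one way to catch this gap is to compare categories that have content against the target list. The Python sketch below assumes you have already exported per-category post counts (for example with a Data Explorer query) and the current target list; `categories_missing_from_targets` is a hypothetical helper, not part of Discourse.

```python
def categories_missing_from_targets(category_post_counts, target_categories):
    """Return categories that contain posts but are absent from the target list.

    category_post_counts: mapping of category name (or id) -> number of posts.
    target_categories: collection of categories currently selected for translation.
    """
    targets = set(target_categories)
    return sorted(
        category
        for category, count in category_post_counts.items()
        if count > 0 and category not in targets
    )
```

Any category this returns is one whose posts will not be translated until it is added to the setting manually.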


Summary

All three issues occurred simultaneously, which made diagnosis extremely difficult. A post could be untranslated because of locale misdetection, a 503 crash, or simply because its category was missing from the target list, and there was no way to tell these cases apart without deep log analysis and SQL queries.
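Once the relevant facts are pulled out with SQL, telling the three cases apart can be mechanical. The Python sketch below is a hypothetical triage helper, not anything Discourse provides: `PostStatus` and `likely_causes` are illustrative names, and it assumes you have, per post, the stored (detected) locale, whether its category is in the target list, and which locales already have translations. Locales are compared by base language so that `pt`, `pt-BR` and `pt_BR` match.

```python
from dataclasses import dataclass, field


def base_language(locale):
    """Normalize 'pt_BR' / 'pt-BR' / 'pt' to 'pt'."""
    return locale.replace("_", "-").split("-")[0].lower()


@dataclass
class PostStatus:
    detected_locale: str      # locale stored for the original post
    category_targeted: bool   # is the post's category in the target list?
    translated_locales: set = field(default_factory=set)


def likely_causes(status, target_locale):
    """Return a list of plausible reasons a post lacks a translation into target_locale."""
    target = base_language(target_locale)
    if any(base_language(loc) == target for loc in status.translated_locales):
        return []  # already translated, nothing to explain
    causes = []
    if not status.category_targeted:
        causes.append("category not in target list")
    if base_language(status.detected_locale) == target:
        # Correct if the post really is in this language; otherwise a misdetection.
        causes.append("source locale recorded as target language; check detection")
    if not causes:
        causes.append("not yet processed (e.g. batch aborted by a 503)")
    return causes
```

The last branch is a fallback: if neither the category nor the detected locale explains the gap, the post most likely just hasn’t been reached yet, which matches the mid-batch crash pattern.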

The Data Explorer query suggested by Richard was the key that unlocked the investigation. I hope this detailed breakdown is useful for the team. Happy to provide additional logs or examples if needed.

Thanks to the team for their activity in this topic!

I have another problem: when the locale detector identifies a post’s language incorrectly, I can’t change the locale to the one I think is correct.
How can I fix this?

Thanks for the detailed update @Denis_Kovalenko, appreciate it!

This is indeed something we can improve on our end. We recently made a change to category support, but as you found out, it is not working properly. We will look into it.

You should be able to change the locale via a button in the composer.

Make sure you update it in the original post (not in any of the translations).