Locale detector return value issues

We’ve encountered two different issues on two different forums, where the content localization started to translate posts into their original language (and worse, weirdly messing up the content or adding “this post is already in English!”).

Digging into this, it turned out that the locale detector was not receiving a pure language code from the LLM.

Instead, it was wrapped in markdown ``` (copying the relevant parts from the log only for readability

"delta":{"content":"```"}
"delta":{"content":"en```"},

or it had surrounding quotes, probably confused by the prompt which says Output: "en"

"delta":{"content":"\""}
"delta":{"content":"en\""}

Changing the last line of the prompt to Your response must be a language code, and nothing else. Do not wrap your response in markdown. helped but I guess LanguageDetector.detect should clean up the answer a bit (maybe allowing only AZaz and - ?) before using it.

4 Likes

thanks for reporting @nat will have a look

2 Likes

@RGJ we have a PR open for this, but can you share what LLM you’re using?

1 Like

We’ve decommissioned that instance, but as far as I remember it was Ministral 3B.

I merged a fix here that included updating the prompt and moving examples away from the system prompt and into proper interaction.

Our team is also currently working on evals to improve reliability across various LLMs.

2 Likes

This topic was automatically closed after 2 days. New replies are no longer allowed.