Discourse AI bug?

I’m encountering some bugs while using Discourse AI. Here’s the situation:
First, let’s look at a correct post to demonstrate that my configuration is not the issue and everything is working as I intended.
My site’s base language is Simplified Chinese, and there are 9 other languages available.
This “New post test 0 translation” post is normal.

However, some individual posts are experiencing issues. For example:
This “About ‘Image Library’ category” post. The original post was published in Simplified Chinese (it was automatically published when the category was created).
But in the translation results, the original Simplified Chinese has been translated into “zh_CN” again, instead of being translated into English.

When I tried to manually add the English translation using the red globe icon, something strange happened:

In the dropdown list, I am unable to add a separate translation for English.
However, in my global settings, English (US) has already been added correctly.

I have disabled the CDN cache and repeatedly cleared the local browser cache.
This rules out caching as the problem.
My basic configuration can also be ruled out, as the first image shows the correct behavior.
I don’t understand why this problem is happening. How can I fix it?
My Discourse version is 3.6.0.beta1-dev(b8e86ceb23)
The server OS is Debian 12.12
I have tested this locally using the latest versions of both Chrome and Opera browsers.
URL for the correct case: https://openttc.com/t/topic/61
URL for the incorrect case: https://openttc.com/t/topic/58

This is not a bug. The default language for all system-created category descriptions is en. Whatever you enter, the post is treated as an en post by default, so the translation is not triggered.

Every post has a language detected by AI.

If a post was originally written in Chinese, the LLM should detect it as “zh_CN”. You can view the detected language in the regular composer (not the translation composer) next to the :globe_with_meridians: icon.

The detected language is important so that the translator does not unnecessarily translate it to the detected language itself, and translates the post to every other language.
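To make the rule above concrete, here is a minimal sketch in Python. It is not Discourse AI's actual implementation; the function name `target_locales` and the locale list are hypothetical and only illustrate the idea that the detected locale is skipped and every other supported locale gets a translation.

```python
def target_locales(detected_locale, supported_locales):
    """Return the locales a post still needs translations for.

    Hypothetical helper: the detected locale is excluded, every other
    supported locale is kept in its original order.
    """
    return [loc for loc in supported_locales if loc != detected_locale]


# Illustrative locale list, not the site's real configuration.
supported = ["zh_CN", "en", "ja"]

# Correct detection: the Chinese post is translated into en and ja.
print(target_locales("zh_CN", supported))

# Wrong detection: a Chinese post detected as "en" gets a zh_CN
# "translation" and never gets an English one.
print(target_locales("en", supported))
```

Under this assumption, a wrong detection would produce exactly the symptom described in this topic: a Chinese post "translated" into zh_CN with no English translation offered.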

I suspect what is happening with https://openttc.com/t/topic/58 is that the detected language is incorrect. Can you check?


Yes. It was incorrectly detected as en instead of zh_CN. I checked and found that all system-created category description posts were detected as en, even when they are written in Chinese.

New question: does the FAQ page support multiple languages? Where should the translations be added?


I believe the FAQ page uses the typical I18n keys. You can go to your site’s admin page for site texts, e.g. https://openttc.com/admin/customize/site_texts?q=guidelines_topic.body, to replace it.


OK, thank you very much for your help.

I think this is a real issue. I don’t know whether it is widespread or caused by a mistake in my configuration.

It may also be related to timing. If the categories were created before the site language was changed, the languages of the category topics may all be English.

In that case, you can go to the first post of each topic, open the composer, and switch it to Chinese.

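Hello. I tried this out.
I created a new category 6 minutes ago: https://openttc.com/t/topic/66
Then I did nothing and waited for the translation to take effect.
I ended up with a post whose content is in Chinese but which was detected as en.

When this category was created, my site language was 100% Simplified Chinese.
My AI model is Llama 3.1 8B via OpenRouter.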

Interesting, can you run this data explorer query?

-- [params]
-- integer :topic_id = 66

SELECT 
  a.id,
  a.created_at,
  a.response_tokens,
  (REGEXP_MATCH(a.raw_response_payload, '"text": "([^"\\]+)"'))[1] AS llm_detected_locale,
  a.raw_response_payload
FROM ai_api_audit_logs a
JOIN posts p ON p.id = a.post_id AND p.deleted_at IS NULL
LEFT JOIN topics t ON t.id = a.topic_id AND t.deleted_at IS NULL
WHERE t.id = :topic_id
AND p.post_number = 1
AND a.feature_name = 'translation'
AND a.response_tokens < 5

This should tell us what the LLM is actually returning.

If we’re seeing an “en” in the llm_detected_locale column, I suspect your site’s prompts for locale detection may need a change, or you could use a more suitable LLM (maybe Qwen?).
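For readers unsure what the llm_detected_locale column would contain, here is a small Python sketch that applies the same regular expression as the query's REGEXP_MATCH call. The sample payloads are made up for illustration; the real shape of raw_response_payload depends on the LLM provider and is an assumption here, as is the helper name `extract_detected_locale`.

```python
import re

# Same pattern as in the query: capture the value of the first "text" key.
PATTERN = re.compile(r'"text": "([^"\\]+)"')


def extract_detected_locale(raw_response_payload):
    """Return the first captured "text" value, or None if nothing matches."""
    match = PATTERN.search(raw_response_payload)
    return match.group(1) if match else None


# Hypothetical payloads, not taken from a real audit log.
print(extract_detected_locale('{"text": "en"}'))
print(extract_detected_locale('{"text": "zh_CN"}'))
print(extract_detected_locale('{"content": []}'))
```

If the first call's result ("en") is what shows up for a post written in Chinese, the model itself is returning the wrong locale, rather than something going wrong later in the translation step.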

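Hello. In the end I did not solve this problem, but the impact is small.
The most accurate description of the problem is this: when I create a new category, the automatically generated “About this category” post has its default language detected as en. Its text really is in Simplified Chinese, but when I open the post it is detected as English, and I have to change it manually.
I’m not sure whether this is related to the settings in my app.yml file.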