AI-generated tag translations do not work perfectly

When scrolling through the German tag translations, I noticed a range of issues that seem to stem from the AI lacking context - it treats tags as isolated words rather than references to specific Discourse features, plugins, or components.

Note: German nouns are always capitalised, but tags on Meta are lowercase. The translations in this post are therefore inconsistently capitalised - I kept slipping into correct German capitalisation out of habit.

The fun part first

Before getting into the practical problems, some translations are just entertaining:

  • composer → “Komponist” - This is the person writing music
  • auto-bump → “automatische-erhöhung” - “automatic increase”
  • fully-theme → “vollständig-thematisiert” - “fully addressed”
  • raspberry-pi → “Himbeere-pi” (“raspberry” as in the fruit)
  • post-voting → “nach-der-Abstimmung” - “after the vote” (“post” read as the Latin prefix, not as a forum post)
  • tablet → “Tablette” - “pill” (the medication, not the device)

Same translation for different tags

This is the most impactful problem in practice. When two tags get the same translation, they lose their ability to distinguish topics from each other.

  • year-in-review & yearly-review → “Jahresrückblick” - The plugin name currently doesn’t seem to be translatable (I see the English name in the admin sidebar and in the list of installed plugins), so the English term would likely be used to refer to the plugin. Still, I hope that one day all plugins have translated names, so I would add “Metas” to the tag grouping the yearly review topics here to tell the two apart: “Metas-Jahresrückblick” (Meta’s year in review)
  • surveys & polls → “Umfragen” - I think the German names of both plugins are the same too, and so far no one has noticed. I need to think more about a good solution for this one, because it can also easily conflict with “voting” :exploding_head:
  • docs & documentation → “Dokumentation” - Just like the year-in-review plugin, docs hasn’t been translated to German, so I would not translate that tag (in this case a future translation seems very unlikely)
  • how-to & tutorial → “Anleitung” - This one has already been fixed: I found a German translation of https://diataxis.fr/ and suggested the term used there.[1]
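Collisions like the ones above could be caught mechanically before translated tags go live. A minimal sketch, assuming the generated translations are available as a plain mapping from English tag to translation (`find_collisions` is a hypothetical helper, not part of Discourse):

```python
from __future__ import annotations

from collections import defaultdict


def find_collisions(translations: dict[str, str]) -> dict[str, list[str]]:
    """Return every translation that more than one source tag maps to.

    `translations` maps an English tag name to its generated translation.
    The comparison ignores case, since tags on Meta are lowercase anyway.
    """
    by_target: dict[str, list[str]] = defaultdict(list)
    for source, target in translations.items():
        by_target[target.casefold()].append(source)
    # Keep only the targets that two or more source tags collapsed into.
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


examples = {
    "year-in-review": "Jahresrückblick",
    "yearly-review": "Jahresrückblick",
    "surveys": "Umfragen",
    "polls": "Umfragen",
    "docs": "Dokumentation",
    "documentation": "Dokumentation",
    "backups": "Sicherungen",
}

for target, sources in find_collisions(examples).items():
    print(f"{target}: {', '.join(sources)}")
```

A flagged pair does not say which tag needs a different wording, but it makes the loss of distinction visible so a human can decide, as done above for “Metas-Jahresrückblick”.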

Proper nouns and product names that shouldn’t be translated

Some tags refer to specific tools, frameworks, or products. Translating them makes the feature unrecognisable.

  • raspberry-pi → “Himbeere-pi” (“raspberry” as in the fruit)
  • mermaid → “Meerjungfrau” (“mermaid” as in the mythological creature, not the diagramming tool)
  • ember → “Glut” (glowing embers from a fire)
  • vanilla → “Vanille” (the flavour)
  • onebox → “einzige-box” - “only box”
  • intercom → “Gegensprechanlage” (an intercom as in a door buzzer - though intercom-widget was translated fine)
  • passkey → “Passwort” - “password” (a passkey is specifically not a password)
  • perspective-api → “Perspektiven-api”
  • backups → “Sicherungen”
  • design-experiment → “Experimententwurf” - This can mean “design experiment” but also “draft experiment”. I would read it as the latter, because for the former I’d have kept “design”, and drafts come up a lot in Discourse.

Translations of “Discourse”

Most tags referring to “Discourse” were translated in a way that drops the name of the software. One exception is discourse-hub.

“Theme” consistently mistranslated as “Thema” (topic)

This is a systematic problem across all theme-related tags. In German, both “theme” and “topic” translate to Thema, but in a Discourse context these are very different things. This makes theme tags read as if they’re about specific discussion topics.

  • theme-welcome → “Willkommens-Thema” (reads like “welcome topic”, as in the default pinned welcome thread)
  • theme-creator → “Themenersteller” - “topic creator”
  • horizon-theme → “Horizont-Thema”
  • meta-theme-feedback → “Meta-Themen-Feedback”
  • foundation-theme → same pattern
  • fully-theme → “vollständig-thematisiert” - “fully addressed”

This affects all tags in the Official Themes group.

Translations where context was missing

  • composer → “Komponist” - This is the person writing music, compared to the input field which we usually call “Editor” in German.
  • tablet → “Tablette” - “pill” or “tablet”.
  • copy-post → “kopierbeitrag” - “copying fee” (The problem is the combination of the words. “Beitrag” is correct for post, but because copy wasn’t translated as a verb, it reads as if Beitrag were used in its meaning of fee.)

Noun or verb

Some features were translated as verbs instead of nouns:

  • chat → “plaudern” - “to chat”
  • search → “suchen” - “to search”

“post” read as the Latin prefix, not as a forum post

  • post-voting → “nach-der-Abstimmung” - “after the vote”
  • post-badges → “nach-Abzeichen” - “after-badges”

Results from ambiguous English tags

  • hosted-support → “gehosteter-support” (This reads like support being hosted instead of support for hosted customers)

Abbreviation

  • pm-dropdown → unchanged in German. Without context, the m (message) was not replaced with n (Nachricht).

Translations that don’t match Discourse’s own interface terminology

These translations are technically correct German, but Discourse’s own UI uses different terms. This makes tags harder to find intuitively, especially for users who navigate by the interface language.

  • impersonate → “nachahmen” - “imitate” (but the interface uses Nutzersicht or Nutzerrolle)
  • staged-users → “Staging-Benutzer” (but the interface says vorbereitete Benutzer)
  • advertising → “Werbung” (but the interface refers to Anzeigen)
  • assign → “zuweisen” (but the plugin translation uses zuordnen)
  • hot-topics → “Top-Themen” (this was translated as “top topics”, which is actually a different list in Discourse)
  • read-only → “nur lesbar”
  • bootstrap-mode → “Bootstrap-Modus” (but translators originally chose Starthilfemodus)
  • post-notices → “Nachrichten” - “messages/news” (can be misleading because messages are a different feature, “official notice” uses Mitteilung in the interface)
  • about-page → “über-Seite” (This is a literal translation, but the usual German term is something like “about us page”. Also, über does not only mean about but also above.)
  • auto-bump → “automatische-erhöhung” - “automatic increase”
  • tags → “Etiketten” (but tag-groups and most tags containing tag use “tag”, the term used on Crowdin is Schlagwort)

Truncated translations

This is a different kind of problem - not a translation error, but a consequence of German compound nouns being significantly longer than their English equivalents, combined with the tag character limit.

  • content-security-policy → “inhalts-sicherheitsrichtl” (cut off, should be inhalts-sicherheitsrichtlinie)
  • ai-custom-prompt → “ai-benutzerdefinierte-auf” (cut off mid-word, should be ai-benutzerdefinierte-aufforderung)
  • custom-category-boxes → “benutzerdefinierte-katego” (cut off mid-word, should be benutzerdefinierte-kategorie-boxen, in this case box is missing entirely from the translation)

Tags containing “custom” easily get too long because “benutzerdefiniert” is quite a long word.

More examples (the cut-off remainder in parentheses):
  • pause-notifications → “benachrichtigungen-anhalt” (en)
  • theme-site-settings → “thema-website-einstellung” (en)
  • staff-action-log → “mitarbeiter-aktionsprotok” (le)
  • lazy-load-categories → “kategorien-verzögert-lade” (n)
  • unsupported-install → “nicht-unterstützte-instal” (lation)
  • categories-navbar → “kategorien-navigationslei” (ste)
  • remove-name-suppression → “namenunterdrückung-entfer” (nen)
  • right-sidebar-blocks → “rechte-seitenleiste-blöck” (e)
  • user-field-prompt → “benutzerfeld-eingabeauffo” (rderung)
  • top-contributors-sidebar → “seitenleiste-der-top-beit” (ragenden)
  • hide-users-column → “benutzer-spalte-ausblende” (n)
  • topic-footer-buttons → “thema-fußzeilen-schaltflä” (chen)
  • scrollable-post-content → “scrollbarer-beitrag-inhal” (t)
  • custom-inline-codeblocks → “benutzerdefinierte-inline” (-codeblöcke)
  • hide-muted-categories → “stummgeschaltete-kategori” (en-verstecken)
  • custom-header-icons → “benutzerdefinierte-kopfze” (ilen-symbole)
  • custom-header-links → “benutzerdefinierte-kopfze” (ilen-links) (NOTE: This is the same as the one above because both were cut)
  • new-topic-header-button → “neuer-themen-header-butto” (n) (Though we usually use “Schaltfläche” for button)
  • sidebar-theme-toggle → “seitenleisten-themenumsch” (alter) (of course this one should also use “theme” instead of “topic” so the “n” is not needed)
  • custom-profile-link → “benutzerdefiniertes-profi” (l-link). Judging by the grammar, “link” seems to have been lost quite early, because the ending of custom agrees with profile, not with link. I think it should be “benutzerdefinierter-profil-link”.
  • easy-responsive-footer → “einfacher-responsiver-fuß”. Similar to the one above, easy and responsive seem to agree with fuß (foot), which is where the tag was cut, instead of footer. It should be “einfache-responsive-fußzeile”.
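Every cut-off example above is exactly 25 characters long, which suggests a 25-character limit on translated tag names (an inference from these examples, not a documented value). A minimal sketch that flags translations reaching that assumed limit, so they can be reviewed or reworded instead of being silently cut mid-word; `TAG_LIMIT` and `flag_truncated` are hypothetical names:

```python
from __future__ import annotations

# Assumption: inferred from the cut-off examples, not a documented setting.
TAG_LIMIT = 25


def flag_truncated(translations: dict[str, str], limit: int = TAG_LIMIT) -> list[str]:
    """Return the source tags whose translation reaches the assumed length limit.

    A translation that is exactly `limit` characters long was most likely cut off,
    so it should be reviewed (or given a shorter wording) rather than saved as is.
    """
    return [source for source, target in translations.items() if len(target) >= limit]


examples = {
    "content-security-policy": "inhalts-sicherheitsrichtl",
    "custom-header-icons": "benutzerdefinierte-kopfze",
    "custom-header-links": "benutzerdefinierte-kopfze",
    "backups": "sicherungen",
}

print(flag_truncated(examples))
```

Python's `len()` counts Unicode code points, so ß and ä each count as one character, which matches the lengths of the examples above (for instance “thema-fußzeilen-schaltflä”).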

These examples suggest the translation process needs more context - ideally knowing which plugin or feature a tag belongs to, and having access to existing Discourse interface translations as a reference. Happy to hear if others have noticed similar patterns in other languages.


@nat (upon personal request)


  1. Lernunterlagen


Thanks @Moin, I’ll be looking into this and improving our prompts :smiling_face:

Also LOL

Thanks for the laughs :hugs:


@nat what if we gave the agent access to the read tool, so it could gather context by itself?

It would be too costly for posts, but rather cheap for one-offs like tags and categories, and it would increase quality across all models.


Hmm that’s a good idea @falco.

Another way I had considered was passing the tag’s description as additional context when translating the tag name. Perhaps this way is more predictable, wdyt?
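A minimal sketch of what passing the description as additional context could look like. The prompt wording, the sample description, and `build_tag_prompt` are all hypothetical and not how Discourse AI currently builds its prompts:

```python
from typing import Optional


def build_tag_prompt(tag: str, description: Optional[str], target_language: str) -> str:
    """Build a translation prompt that tells the model what the tag refers to."""
    lines = [
        f"Translate the forum tag '{tag}' into {target_language}.",
        "Tags refer to features, plugins and components of the Discourse forum software.",
        "Keep product and project names unchanged.",
    ]
    # Only add the description when there is one; many tags have none.
    if description:
        lines.append(f"Tag description: {description.strip()}")
    lines.append("Answer with the translated tag only, in lowercase, words joined by hyphens.")
    return "\n".join(lines)


# Hypothetical description, made up for illustration.
print(build_tag_prompt("mermaid", "Render diagrams from text using Mermaid.", "German"))
```

This only helps when the description actually says something about the feature; a description that is just a link adds little.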


Having access to the glossary on Crowdin could be very helpful for the bot doing the translation (not for all sites, but especially for Meta). If it’s noted there that we translate “composer” as “Editor,” and AI knows it, it could use this in tags, but also in topic titles and posts.
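A glossary could be used in the prompt and also as a check afterwards. A minimal sketch of such a check, assuming the glossary has been exported into a simple mapping from English term to the German interface term (the two entries come from this thread; `glossary_violations` is a hypothetical helper, and no Crowdin export format is modelled):

```python
from __future__ import annotations

# Assumed export: English term -> German term used in the Discourse interface.
GLOSSARY = {
    "composer": "editor",
    "assign": "zuordnen",
}


def glossary_violations(
    translations: dict[str, str], glossary: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (source tag, translation, expected term) for every glossary miss."""
    problems = []
    for source, target in translations.items():
        for word in source.split("-"):
            expected = glossary.get(word)
            # A glossary word in the source tag should appear as the agreed term.
            if expected and expected not in target.casefold():
                problems.append((source, target, expected))
    return problems


examples = {
    "composer": "Komponist",
    "assign": "zuweisen",
    "backups": "Sicherungen",
}

for source, target, expected in glossary_violations(examples, GLOSSARY):
    print(f"{source} -> {target}: expected '{expected}'")
```

The substring test is deliberately simple and has limits: with “theme” in the glossary, it would accept “Themenersteller” for theme-creator, because “theme” is a prefix of “Themen”. A real check would need word-aware matching.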

I once fixed “composer” in Introducing our new composer, making writing on Discourse easier than ever, which resulted in my feedback on editing translations here: Feedback on the composer when translating a post to German. However, the topic was edited after I did that, and the prior translation doesn’t seem to be used as context, so the post says “composer” again. (The author of music usually doesn’t show up in posts, only in shorter texts like topic titles and tags.)

On Meta, the description often doesn’t add much context. All the ones for theme components, for example, simply contain a link to the component’s topic, not the short description from the beginning of the topic.

Good idea, let’s do both!

The idea is to use Meta as a test bed and a proxy for what our customers could encounter in the wild, and to make the feature better for everyone.

Getting a perfect translation on Meta would be super easy by simply using the most expensive LLM and giving it access to tools like source code access and web search.

I don’t think any model would choose the same translations for Meta as German translators did for the Discourse interface. “Mitarbeiter” is a perfect translation for “staff.” The fact that some translators decided years ago that it doesn’t fit small hobby forums - where “staff” implies paid employees - and therefore chose “Team” is something no AI will guess, because it’s simply not the correct translation. This is exactly where the Crowdin glossary would help: without it, AI-generated terms will never match what admins actually see in the interface - not because AI can’t translate, but because it doesn’t make the same localization decisions human translators made. It’s the difference between translation and localization.
The same goes for other terms like “bootstrap mode” or “impersonation”.

This does not only affect tags, but everything localized here. Whether it’s a guide or a tag, matching the terms users actually see in their interface is more useful than a linguistically perfect translation that doesn’t align with what the software says.

It would, since it would have access to those exact same choices in the config/locales/**/*.yml files for reference.

Definitely, and for small enumerable groups like categories and tags, giving the agent access to the existing translations, which are part of the source code, would help ground them.

We can’t do that for the posts, as the cost would be too great, but it is an option for smaller sites or customers with larger translation budgets.

Then maybe you should disable AI-translation for Documentation and News and Events > Announcements :wink: I don’t think it’s possible to ensure those translations are helpful, especially since suggested edits don’t bump the topic, so there is no easy way to notice that a topic was updated.

In general, cost is why I suggested using the glossary instead of the files containing all translations: I would expect it to contain the most relevant choices once, without including every string.

That is not how it works; the agent can search the code for chunks with matches, and it doesn’t ever load the entire thing into the context.
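A minimal sketch of that kind of search, assuming a checkout with locale files under config/locales/ (`search_locales` is a hypothetical helper, not the actual agent tool, and the sample file below is made up). It returns only short snippets around matching lines, capped at a maximum number of hits, so the full files never enter the context:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def search_locales(root: Path, term: str, context: int = 1, max_hits: int = 20) -> list[str]:
    """Return small snippets around lines that mention `term` (case-insensitive)."""
    needle = term.casefold()
    hits: list[str] = []
    for path in sorted(root.glob("config/locales/**/*.yml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if needle in line.casefold():
                # Keep only a few surrounding lines instead of the whole file.
                start, end = max(0, i - context), min(len(lines), i + context + 1)
                snippet = "\n".join(lines[start:end])
                hits.append(f"{path.relative_to(root)}:{i + 1}\n{snippet}")
                if len(hits) >= max_hits:
                    return hits
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    locale_dir = root / "config" / "locales"
    locale_dir.mkdir(parents=True)
    # Made-up sample file: the keys are invented, the values come from this thread.
    (locale_dir / "sample.de.yml").write_text(
        'de:\n  sample:\n    bootstrap_mode: "Starthilfemodus"\n'
        '    staged_users: "vorbereitete Benutzer"\n',
        encoding="utf-8",
    )
    hits = search_locales(root, "starthilfe")

print("\n---\n".join(hits))
```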

That’s a bit like throwing the baby out with the bathwater, ain’t it?

I just checked Calendar subscription URLs for external calendar apps in PT-BR, and it’s looking like a great translation, much better than having nothing.

There will always be improvements to be done on an unsupervised machine translation workflow, and @nat has already made it better today thanks to your feedback!

No one expects it to be perfect, and Meta is a place where we can early adopt features and showcase what is possible in Discourse for our users and customers.