Content Localization - Manual and Automatic with Discourse AI

In this topic, we walk you through the Content Localization features and how to enable them. The features come in two parts: what is available by default in Discourse, and automatic translation with Discourse AI.

:warning: For quick access to the relevant sections, use the wiki headings :backhand_index_pointing_right:t2:

Localizing Your Community’s Content

An updated version of Discourse (3.5.0.beta7-dev) gives you access to several localization features available for configuration at:

  • <your-site-url>/admin/site_settings/category/content_localization
New Content Localization in Site Settings 📸

Getting information on your users

Firstly, it is good to get some information on your community. The following data explorer query can give you an idea of how many users may have set their locale in /my/preferences/interface.

SELECT locale, count(*) as count
FROM users
WHERE (locale IS NOT null AND locale <> '')
GROUP BY locale
ORDER BY count DESC
Sample results from Data Explorer

Setting locales that your community supports

With the information above, you have a better idea of which locales your community should support.
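As a rough illustration of how the query output could be turned into a shortlist, the Python sketch below ranks locales by their share of users and keeps those above a cutoff. The sample rows, the `shortlist_locales` helper, and the 5% threshold are all hypothetical; none of them are part of Discourse or Data Explorer.

```python
from typing import Iterable


def shortlist_locales(rows: Iterable[tuple[str, int]], min_share: float = 0.05) -> list[tuple[str, float]]:
    """Return (locale, share) pairs whose share of locale-setting users is >= min_share."""
    rows = list(rows)
    total = sum(count for _, count in rows)
    if total == 0:
        return []
    shares = [(locale, count / total) for locale, count in rows]
    # Largest locales first, mirroring the ORDER BY in the query.
    shares.sort(key=lambda pair: pair[1], reverse=True)
    return [(locale, share) for locale, share in shares if share >= min_share]


# Hypothetical (locale, count) rows, shaped like the Data Explorer result.
sample_rows = [("en", 700), ("ja", 200), ("es", 80), ("eo", 20)]
shortlist = shortlist_locales(sample_rows)
for locale, share in shortlist:
    print(f"{locale}: {share:.1%}")
```

The cutoff is a judgment call. A small but active language community may still deserve support even if it falls below it.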

In <your-site-url>/admin/site_settings/category/content_localization, you can select locales to support.

  • Content localization enabled - turns on the feature that replaces original written user content with localized content. Read on for auto and manual modes of localizing.
  • Content localization supported locales - the list of languages your site supports
  • Content localization language switcher - covered just below
List of locales in Site Settings 📸

Enabling the next setting, Content localization language switcher, also makes your community more accessible to non-logged-in users by showing them the languages you chose in the list of supported locales:


Language switcher at the top right of the page

Viewing localized content


Localized welcome topic on meta.discourse.org

Viewers of localized content (all site visitors) can hover over the indicator next to the post’s date to see the post’s original language. The indicator only appears if the post is not in their language.

If a user wishes to only see original content, they may use the toggle above the topic timeline to disable localizations for the whole site.

Automatic translations with Discourse AI :sparkles:

Discourse AI is the vitamin boost for the localization feature: it removes the need for manual translation.

As an admin, head to the Translation section of the new AI features page.

Discourse AI Features in Admin Settings 📸

Scroll down in /admin/plugins/discourse-ai/ai-features

To cover some important settings and recommendations:

  • AI translation backfill hourly rate - this setting defaults to 0. :warning: Automatic translation will not begin if this value is 0. Assuming the rate is 50, your site will translate 50 posts, 50 topics, and 50 categories per hour, to the locales you have set in Content localization supported locales. Keep this to a low number when starting out.
  • AI translation backfill max age days - defaults to 5. This means topics and posts older than 5 days will not be translated. You may increase this to a large number to translate all topics and posts.
  • AI translation backfill limit to public content - defaults to true. This prevents PMs and content in private categories from being sent to the LLM. When set to false, group PMs and private categories will be included in translations. PMs between individuals will not be translated.
  • AI translation max post length - defaults to 10000. This is a safeguard and prevents posts above a certain length from being translated.
  • AI translation max tokens multiplier - defaults to 1.0. We generally cap the number of tokens used per translation. When using Thinking / Reasoning LLMs like Gemini Pro 3 to translate, you will need to increase this multiplier to about 3.0.
  • AI translation post raw translator persona (and other personas) - In more formal communities, admins may choose to create their own persona. This allows you to set a prompt that is more fine-tuned to the language or vocabulary you prefer.

You can refer to AI bot - Personas on how to configure suitable personas and fine-tune prompts for each function.
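To get a feel for how the hourly rate plays out, here is a rough Python sketch that estimates how long a backfill would take. It assumes the rate counts posts per hour regardless of how many locales are configured, which is a simplification. The `backfill_hours` helper and the post count are hypothetical and not part of Discourse AI.

```python
import math


def backfill_hours(eligible_posts: int, hourly_rate: int) -> float:
    """Estimate the hours needed to translate eligible_posts at hourly_rate posts per hour."""
    if hourly_rate <= 0:
        # A rate of 0 (the default) means automatic translation never starts.
        return math.inf
    return float(math.ceil(eligible_posts / hourly_rate))


# Hypothetical site: 12,000 posts fall within the max age window.
hours = backfill_hours(12_000, 50)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

Raising the max age days setting increases the number of eligible posts, so the two settings are best tuned together.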

Translation Progress

You may find more information about how automatic translations are progressing in the Translation Progress chart on /admin/plugins/discourse-ai/ai-translations

This chart will show up if all of the following are true:

  • all translator personas have a valid LLM
  • discourse ai enabled :check_mark:
  • ai translation enabled :check_mark:
  • content localization supported locales is filled
  • ai translation backfill max age days is more than 0
  • ai translation backfill hourly rate is more than 0
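The conditions above can be sketched as a small checklist in Python. This is only an illustration of the logic: the `settings` dictionary, its snake_case keys, and the `unmet_chart_conditions` helper are assumptions for the example, not an actual Discourse API.

```python
def unmet_chart_conditions(settings: dict, personas_have_llm: bool) -> list[str]:
    """Return the conditions from the list above that are not yet satisfied."""
    checks = {
        "all translator personas have a valid LLM": personas_have_llm,
        "discourse ai enabled": bool(settings.get("discourse_ai_enabled")),
        "ai translation enabled": bool(settings.get("ai_translation_enabled")),
        "content localization supported locales is filled": bool(
            settings.get("content_localization_supported_locales")
        ),
        "ai translation backfill max age days is more than 0": settings.get(
            "ai_translation_backfill_max_age_days", 0
        ) > 0,
        "ai translation backfill hourly rate is more than 0": settings.get(
            "ai_translation_backfill_hourly_rate", 0
        ) > 0,
    }
    return [label for label, ok in checks.items() if not ok]


# Hypothetical site where everything is configured except the hourly rate (default 0).
example_settings = {
    "discourse_ai_enabled": True,
    "ai_translation_enabled": True,
    "content_localization_supported_locales": ["en", "ja"],
    "ai_translation_backfill_max_age_days": 5,
    "ai_translation_backfill_hourly_rate": 0,
}
missing = unmet_chart_conditions(example_settings, personas_have_llm=True)
print(missing)
```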

Manual localization

As localization is a core feature in Discourse, you can fill in and edit localizations manually in the event automatic translations with Discourse AI are not available.

By default, admins and moderators are set up to edit localizations.

Localization allowed groups in Site Settings 📸


Admin Site Setting for Content Localization

Currently, post content, topic titles, category names, and category descriptions can be localized. Tags are not supported yet, but will be in the near future. The sections below show how each of these works.

Category localization

Localized categories are visible in the following areas, with both category name and description localized:

Places where categories are localized 📸
  1. Homepage, sidebar, and category dropdown
  2. Categories page
  3. A specific category with subcategories

As an admin, you should be able to access category settings as usual, and find the new “Localizations” nav item on the left.

Editing category localizations in Category Settings 📸

Topic and Post localization

From the screenshots above in Category localization, you may have noticed topic titles and excerpts being localized.

There are some prerequisite settings:

  • Ensure your user is in the Content localization allowed groups
  • Add addTranslation to the post menu site setting. This makes the :globe_with_meridians: appear in the post menu for users in the Content localization allowed groups
  • Content localization allow author localization is enabled by default, and allows post authors to localize their own content using the same post menu as above.
3 Site Settings 📸



Once again, the list of localizable languages is in the Content localization supported locales setting mentioned above.

Editing a localized post

If a user viewing a localized post wants to edit it, a dialog will ask which version they would prefer to edit.

The appropriate composer appears once they choose.

Deleting a post’s translation for a certain locale

If you followed the post menu instructions above and you are in the Content localization allowed groups, you should be able to do the following:

FAQ

I’ve set things up, but automatic translation is still not working for me
Confirm that the following are set up:

  • Content localization supported locales has at least one language
  • Content localization enabled is :check_mark:
    • Allow user locale is :check_mark:
    • Set locale from cookie is :check_mark:
  • AI translation enabled is :check_mark:
  • AI translation backfill max age days is not 0
  • AI translation backfill hourly rate is more than 12
  • You must have a working LLM set for each translation persona

If all else fails, you can enable SiteSetting.ai_translation_verbose_logs.

Is every post getting translated?
If AI translation backfill limit to public content is :check_mark:, all posts in public categories will be translated, except posts by bots (user id < 0).

Are the automatic translations saved, or are they sent to the LLM each time someone views a topic?
The translations are saved. Each post is sent only once per language, and the saved translation is reused.

If my forum supports English and Japanese (via Content localization supported locales), and someone writes in Spanish, will their post be translated?
Yes. All topics and posts will be translated to English and Japanese, regardless of the written language.

If the original post is edited, is it re-translated?
Yes, up to a maximum of 2 times per day. When a post is edited, it is queued for re-translation 5 minutes later (or after the SiteSetting.editing_grace_period) to account for ninja edits. Authorized users in Content localization allowed groups have the option to send a post for re-translation immediately.

Will translations be deleted if I change the Persona or LLM?
No, translations will typically persist across settings changes unless explicitly deleted using the post menu item or the translation composer.



Are there any recommendations for doing this in bulk for existing categories? Worst case, perhaps via API?


Hmm great question. I’ll see to it that API docs get updated for the category update endpoint. :memo:


Will there be support for a per-language moderator (translator)? I’m thinking of meta, where I might volunteer to check posts in a specific language and update them manually. Especially documentation that could use some human attention. But you say only moderators can do it, which I will probably never be.


Hmm good suggestion. I think that can be done but we’ll need to think about the details on how it can be set up.

We currently have the following, but let me check if it can be extended to a group called “localization moderators”.


How to access it? Could you provide a command please?

Does Sidekiq have any job linked? Is it possible to trigger it manually?


To add on to Moin’s post above, it’s just SiteSetting.ai_translation_backfill_hourly_rate once you get to the console. The job runs every five minutes and rate-limits accordingly.


I see localization is now available in the docs. Thanks @nat!


That’s great, kudos to the team! I’m testing it now and will share my thoughts and overall experience.

*We are missing Esperanto, since we moved over from the Discourse Translator plugin.

Can this ‘simply’ be added, or does it need to be built into discourse-languages first?*

Wow you’re on the ball – I was just about to report here. :laughing:

Yes, kind of. We want a full localized experience where the controls (buttons, labels, etc) are translated properly and sufficiently (70% would be really good) via Crowdin (see Translations - Discourse Meta), and with that we can provide support to the language.


Does content localization work with documentation categories? It seems the sidebar content is not translated, even when I localize the index topic.

I also noticed some odd behaviour. When I view a localized topic in the original language and refresh, it switches to the localized version. I have to manually switch again to the original version.


Oh, great catch. Yes, it doesn’t work yet, but @nat will follow up!

I wonder if this is a catalyst for coming up with a better abstraction / data model for the sidebar docs links.


Yes that’s right – there are many places in Discourse that will need explicit translation, so I’m logging them as and when. Most recently, we localized notifications for topic titles as well. This is an example of a feature topic I created - Show translated user bios.

I’ll create a new topic and @ you so we make sure we cover all the bases in sidebar.

EDIT: @tvavrda covered here - Translate sidebar documentation links. Please have a look and see if it makes sense.

What do you mean by “switch again”?

Do you mind sharing a video recording (include the address bar) next time it happens? :folded_hands:t2: Feel free to DM me for this case if the content is not suitable in public. Also, were you logged in? Technically speaking these things are tracked by cookies so it’s a bit puzzling for me.


I sent you a video.

Another observation: I can’t see diffs of the translated content, right? That could be useful when there are updates to it. Not super important, but it would make sense, I think.

And another: the backlinks under the topic don’t show the localized topic title.

And a question: what is the point of localizing category descriptions in the category settings? Shouldn’t the category description come from the localized version of the “About” topic? The localized version doesn’t support markdown, so I can’t use a link, which I would like to do.


Well… the old GitHub - discourse/discourse-docs-sidebar component actually respects the localization :slight_smile: I temporarily switched to that one.

Yeah, this is currently not supported as well and would be quite an endeavour.

We have a little special-coloured indicator (similar to post edits indicator next to it) when a translation may be outdated as the post version has changed.


I also see untranslated content in the excerpts of pinned topics. So I see a topic list in the translated language, but the pinned topic’s excerpt shows the original.


We can do or fix translations manually, but can we manually start the building of translations? Some kind of on-demand job.

What I’m thinking is that I have allowed translations for topics up to a year old. But if that year is counted from the current date, the limit will keep moving toward already-translated content. The bigger question, though, is older valuable content that I want to reach quickly, without rather slow bulk actions.

I’m curious, does anyone have cost indications after enabling translations? Our site has been around for a while, and while I would like to translate the whole site if possible, cost is definitely a concern. So if anyone has a rough cost estimate from experience, for example 1000 posts resulting in $1 of cost, that would help enormously in getting an idea of the total cost.

Is content localization done once and stored somewhere, so not on demand? If so, is there anything stopping me from running Ollama and an open-source LLM on my desktop, such as Llama 3 or Deepseek 3, and just letting the job run until it is done?

Edit: I think it could work to reduce the initial translation cost, but it won’t work for newer posts unless one decides to keep the local LLM running permanently.