Configuring the AI summarizer to work with non-English languages

Ivan_Rapekas · 02 July 2025 15:04:08

Hello, I run `google/gemma-3-4b` locally with the latest Discourse. The model handles some languages well: when I test it through the API or LM Studio, it returns the summary in the language I ask for.

At the moment, Discourse always summarizes in English. The steps below describe how to hardcode a non-English summarization language.

Important! Your changes will be lost on the next rebuild.

The hardcoded prompts live in the two files below. The database values from the ai_personas table are not used (as of July 2025). If you are experimenting in a non-production environment, you can hardcode your native language:

SSH to your server.

Copy the hardcoded file `summarize.rb` from the container to the host filesystem:

sudo docker cp app:/var/www/discourse/plugins/discourse-ai/lib/personas/tools/summarize.rb ./summarize.rb

Now edit the file and replace the English system prompt with one in the desired language:

Original:

       system_prompt = <<~TEXT
       You are a summarization bot.
       You effectively summarise any text.
       You condense it into a shorter version.
       You understand and generate Discourse forum markdown.
       Try generating links as well the format is #{topic.url}/POST_NUMBER. eg: [ref](#{topic.url}/77)
       TEXT

       user_prompt = <<~TEXT
         Guidance: #{guidance}
         You are summarizing the topic: #{topic.title}
         Summarize the following in 400 words:

         #{text}
       TEXT

Result, for example:

       system_prompt = <<~TEXT
       Вы — бот, выполняющий суммаризацию текста.
       Вы умеете эффективно сокращать текст до ключевых мыслей.
       Вы понимаете и умеете генерировать разметку Markdown в Discourse.
       При необходимости добавляйте ссылки в формате: #{topic.url}/POST_NUMBER, например: [ссылка](#{topic.url}/77)
       TEXT

       user_prompt = <<~TEXT
         Руководство: #{guidance}
         Вы суммаризуете топик: #{topic.title}
         Пожалуйста, предоставь ответ на русском языке.
         В ответе используй 400 слов:

         #{text}
       TEXT
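As a lighter alternative to translating the whole prompt, the sketch below keeps the English prompt from `summarize.rb` and only appends an explicit output-language instruction. `Topic`, `build_prompts` and the `language` parameter are hypothetical names used for illustration; they are not part of discourse-ai, and whether a small model such as `gemma-3-4b` follows the instruction reliably is something you would have to test.

```ruby
# Hypothetical stand-in for the Discourse topic object; only the two
# attributes used by the prompts are modelled here.
Topic = Struct.new(:title, :url)

# Hypothetical helper (not part of discourse-ai): builds both prompts and
# takes the desired output language as a parameter.
def build_prompts(topic:, text:, guidance:, language:)
  system_prompt = <<~TEXT
    You are a summarization bot.
    You effectively summarise any text.
    You condense it into a shorter version.
    You understand and generate Discourse forum markdown.
    Try generating links as well the format is #{topic.url}/POST_NUMBER. eg: [ref](#{topic.url}/77)
    Always answer in #{language}, regardless of the language of the source text.
  TEXT

  user_prompt = <<~TEXT
    Guidance: #{guidance}
    You are summarizing the topic: #{topic.title}
    Summarize the following in 400 words, in #{language}:

    #{text}
  TEXT

  [system_prompt, user_prompt]
end

# Example values below are made up for the demonstration.
topic = Topic.new("Weekly sync", "https://forum.example.com/t/weekly-sync/12")
system_prompt, user_prompt = build_prompts(
  topic: topic,
  text: "1) alice said: hello",
  guidance: "be brief",
  language: "Russian",
)
puts system_prompt
puts user_prompt
```

Keeping the language in one variable means the same edit can be reused for another language by changing a single string.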

Next, do the same for the second file:

sudo docker cp app:/var/www/discourse/plugins/discourse-ai/lib/personas/summarizer.rb ./summarizer.rb

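Edit it. Note: you can override the language of the original text by replacing the "Maintain the original language" line with, for example:

- Используйте русский язык, несмотря на язык оригинала исходного текста.

(In English: "Use Russian regardless of the language of the source text.")

Original: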

     <<~PROMPT.strip
       You are an advanced summarization bot that generates concise, coherent summaries of provided text.
       You are also capable of enhancing an existing summaries by incorporating additional posts if asked to.

       - Only include the summary, without any additional commentary.
       - You understand and generate Discourse forum Markdown; including links, _italics_, **bold**.
       - Maintain the original language of the text being summarized.
       - Aim for summaries to be 400 words or less.
       - Each post is formatted as "<POST_NUMBER>) <USERNAME> <MESSAGE>"
       - Cite specific noteworthy posts using the format [DESCRIPTION]({resource_url}/POST_NUMBER)
       - Example: links to the 3rd and 6th posts by sam: sam ([#3]({resource_url}/3), [#6]({resource_url}/6))
       - Example: link to the 6th post by jane: [agreed with]({resource_url}/6)
       - Example: link to the 13th post by joe: [joe]({resource_url}/13)
       - When formatting usernames use [USERNAME]({resource_url}/POST_NUMBER)

       Format your response as a JSON object with a single key named "summary", which has the summary as the value.
       Your output should be in the following format:
         <output>
           {"summary": "xx"}
         </output>

       Where "xx" is replaced by the summary.
     PROMPT
   end

...
       [
         "Here are the posts inside <input></input> XML tags:\n\n<input>1) user1 said: I love Mondays 2) user2 said: I hate Mondays</input>\n\nGenerate a concise, coherent summary of the text above maintaining the original language.",
         {
           summary:
             "Two users are sharing their feelings toward Mondays. [user1]({resource_url}/1) hates them, while [user2]({resource_url}/2) loves them.",
         }.to_json,
       ],

Result:

     <<~PROMPT.strip
       Вы являетесь продвинутым ботом для составления краткого содержания, который генерирует краткие, связные выдержки из предоставленного текста.
       Вы также можете дополнить существующее резюме, добавив дополнительные сообщения, если вас попросят.

       - Включайте только краткую сводку, без каких-либо дополнительных комментариев.
       - Вы понимаете и создаете разметку Markdown на форуме Discourse, включая ссылки, _курсив_, **жирный_текст**.
       - Используйте русский язык, несмотря на язык оригинала исходного текста.
       - Старайтесь, чтобы объем резюме не превышал 400 слов.
       - Каждая запись оформляется как "<POST_NUMBER>) <USERNAME> <MESSAGE>"
       - Цитируйте конкретные заслуживающие внимания публикации, используя формат [DESCRIPTION]({resource_url}/POST_NUMBER)
       - Пример: ссылки на 3-й и 6-й посты пользователя sam: sam ([#3]({resource_url}/3), [#6]({resource_url}/6))
       - Пример: ссылка на 6-е сообщение пользователя jane: [согласовано с]({resource_url}/6)
       - Пример: ссылка на 13-е сообщение Джо: [Джо]({resource_url}/13)
       - При форматировании имен пользователей используйте [USERNAME]({resource_url}/POST_NUMBER)

       Отформатируйте свой ответ в виде объекта JSON с единственным ключом "summary", значением которого является текст краткой сводки.
       Ваши выходные данные должны быть в следующем формате:
         <output>
           {"summary": "xx"}
         </output>

       Где "xx" заменяется на текст краткой сводки.
     PROMPT
   end

   def response_format
     [{ "key" => "summary", "type" => "string" }]
   end

   def examples
     [
       [
         "Вот записи внутри XML-тегов <input></input>:\n\n<input>1) user1 сказал: Я люблю понедельники 2) user2 сказал: А я ненавижу понедельники</input>\n\nСформулируйте краткое, связное изложение текста выше, сохранив язык оригинала.",
         {
           summary:
             "Два пользователя делятся своими чувствами к понедельникам. [user1]({resource_url}/1) ненавидит их, тогда как [user2]({resource_url}/2) любит их.",
         }.to_json,
       ],
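When translating this prompt it is easy to translate the machine-readable parts by accident. The `response_format` shown above uses the key `summary`, so the JSON key in the prompt presumably has to stay in English, as do the `{resource_url}` placeholder and the `<output>` tags. The sketch below is a hypothetical pre-flight check (`prompt_problems` and `REQUIRED_FRAGMENTS` are names invented here, not part of discourse-ai) that lists which of these fragments are missing from a translated prompt.

```ruby
# Fragments that should survive translation unchanged, with a short label
# for the error message. This list is an assumption based on the prompt
# quoted in this topic.
REQUIRED_FRAGMENTS = {
  "{resource_url}" => "link placeholder",
  '{"summary": "xx"}' => "output example with the \"summary\" key",
  "<output>" => "opening output tag",
  "</output>" => "closing output tag",
}.freeze

# Returns an array of human-readable problems; empty when nothing is missing.
def prompt_problems(prompt)
  REQUIRED_FRAGMENTS
    .reject { |fragment, _label| prompt.include?(fragment) }
    .map { |fragment, label| "missing #{label}: #{fragment}" }
end

# A deliberately broken sample: the key and the tags were translated.
broken_prompt = <<~PROMPT
  - Zitieren Sie Beiträge im Format [DESCRIPTION]({resource_url}/POST_NUMBER)
  <Ausgabe>
    {"zusammenfassung": "xx"}
  </Ausgabe>
PROMPT

prompt_problems(broken_prompt).each { |problem| puts problem }
```

Running it against the text of your edited `summarizer.rb` before copying the file back gives a quick signal that a translation tool has not touched the key or the tags.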

Copy the modified files into the container:

sudo docker cp summarize.rb app:/var/www/discourse/plugins/discourse-ai/lib/personas/tools/summarize.rb
sudo docker cp summarizer.rb app:/var/www/discourse/plugins/discourse-ai/lib/personas/summarizer.rb

Then commit and restart the container:

sudo docker commit app
sudo /var/discourse/launcher restart app
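A syntax error in an edited file (for example an unterminated heredoc) would only show up after the restart. The sketch below is one way to catch that on the host first, assuming a Ruby interpreter (MRI) is installed there. `syntax_error_in` is a hypothetical helper; it relies on `RubyVM::InstructionSequence.compile`, which parses and compiles the source without executing it.

```ruby
# Returns nil when the source compiles, or the parser's message when it
# does not. The code is compiled only, never executed.
def syntax_error_in(source, filename = "(edited)")
  RubyVM::InstructionSequence.compile(source, filename)
  nil
rescue SyntaxError => e
  e.message
end

# Check every file named on the command line.
ARGV.each do |path|
  error = syntax_error_in(File.read(path), path)
  puts(error ? "#{path}: #{error}" : "#{path}: syntax OK")
end
```

If saved as, say, `check_syntax.rb` (a made-up file name), it could be run as `ruby check_syntax.rb summarize.rb summarizer.rb` before the `docker cp` commands. It only checks syntax; it does not prove that the prompt works.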

Check the result (for new topics):

[Screenshot from 2025-07-02 18-47-57]

Falco · 02 July 2025 15:11:09

There is no need to do all this; you can now change the Persona that does the summarization in the admin settings.

Create a new Persona based on the settings of the pre-existing one, change the system prompt as you like, and set the summarization feature to use it at /admin/plugins/discourse-ai/ai-features/1/edit.

Ivan_Rapekas · 02 July 2025 15:41:30

Well… The most recent discussion of language support that I could find was in this topic. Thanks for the reply.

My first attempt to create a proper summarization bot as a clone of the existing one has failed. It still produces English. I am probably doing something wrong.

sam · 02 July 2025 21:18:32

I am not sure how well you will do with this model; it is not that powerful.

jrgong · 16 February 2026 11:09:18

What’s everyone’s workaround or approach for non-English AI summarization?

E.g. for chat summaries with the locale set to German, I still get English summaries, and the Markdown links to individual chat messages come out as plain text instead of being rendered as links.

Tested with Gemini 2.5 Lite

Ivan_Rapekas · 16 February 2026 11:43:02

Hello, I still use the workaround described above. I tried tricks with custom Personas, but they didn’t work. I am probably doing something wrong, but for me this way is less painful.

In short: you prepare templates downloaded from GitHub, modify them for your needs, and apply them each time after a rebuild. Do not forget to check for new versions of these files once every 2-3 months.

Create an executable script do_it_after_rebuild.sh (in the $HOME directory):

#!/bin/bash
#
# Re-applies the hardcoded prompt files after a rebuild.
# Upstream sources:
# https://github.com/discourse/discourse/tree/main/plugins/discourse-ai/lib/personas

# Save the freshly rebuilt originals so they can be diffed against the modified copies.
docker cp app:/var/www/discourse/plugins/discourse-ai/lib/personas/tools/summarize.rb orig_summarize.rb
docker cp app:/var/www/discourse/plugins/discourse-ai/lib/personas/summarizer.rb orig_summarizer.rb
docker cp app:/var/www/discourse/plugins/discourse-ai/lib/personas/short_summarizer.rb orig_short_summarizer.rb
docker cp app:/var/www/discourse/plugins/discourse-ai/lib/personas/discover.rb orig_discover.rb

# Refresh the GeoLite2 databases (-f: do not fail when no old copies exist).
rm -f GeoLite2*
wget https://raw.githubusercontent.com/8bitsaver/maxmind-geoip/release/GeoLite2-City.mmdb
wget https://raw.githubusercontent.com/8bitsaver/maxmind-geoip/release/GeoLite2-ASN.mmdb

# Copy the databases and the modified prompt files into the container.
docker cp GeoLite2-City.mmdb    app:/var/www/discourse/vendor/data/
docker cp GeoLite2-ASN.mmdb     app:/var/www/discourse/vendor/data/
docker cp summarize.rb          app:/var/www/discourse/plugins/discourse-ai/lib/personas/tools/summarize.rb
docker cp summarizer.rb         app:/var/www/discourse/plugins/discourse-ai/lib/personas/summarizer.rb
docker cp short_summarizer.rb   app:/var/www/discourse/plugins/discourse-ai/lib/personas/short_summarizer.rb
docker cp discover.rb           app:/var/www/discourse/plugins/discourse-ai/lib/personas/discover.rb

# Commit the container and restart it.
docker commit app
sudo /var/discourse/launcher restart app

Then run it after each rebuild:

./do_it_after_rebuild.sh
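The script saves the pristine upstream files as `orig_*.rb` on every run. To notice when upstream has changed a file (and the modified copy needs to be re-merged), those can be compared against the copies saved after the previous rebuild. The sketch below assumes a hypothetical `baseline/` directory holding the previous originals; `upstream_changes` and `accept_baseline` are illustrative names, not part of Discourse.

```ruby
require "digest"
require "fileutils"

# Returns the names whose current file differs from the baseline copy,
# or that have no baseline copy yet. Raises Errno::ENOENT if a current
# file is missing.
def upstream_changes(names, current_dir:, baseline_dir:)
  names.select do |name|
    current = File.join(current_dir, name)
    baseline = File.join(baseline_dir, name)
    next true unless File.exist?(baseline)

    Digest::SHA256.file(current).hexdigest != Digest::SHA256.file(baseline).hexdigest
  end
end

# Stores the current originals as the new baseline once the changes
# have been merged into the modified copies by hand.
def accept_baseline(names, current_dir:, baseline_dir:)
  FileUtils.mkdir_p(baseline_dir)
  names.each do |name|
    FileUtils.cp(File.join(current_dir, name), File.join(baseline_dir, name))
  end
end

# Possible usage after running do_it_after_rebuild.sh:
#   names = %w[orig_summarize.rb orig_summarizer.rb orig_short_summarizer.rb orig_discover.rb]
#   changed = upstream_changes(names, current_dir: ".", baseline_dir: "baseline")
#   puts "Upstream changed: #{changed.join(', ')}" unless changed.empty?
```

A hash comparison only says that something changed; the actual differences still have to be reviewed with `diff`, as in the listings in this post.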

Files here

Make the following changes to the files, which you need to re-download periodically from here (I show only the diffs; you have to add these lines to the files manually):

diff discover.rb orig_discover.rb
35d34
<         * Use always German language.
80d78
<

diff short_summarizer.rb orig_short_summarizer.rb
12c12,13
< Du bist ein fortgeschrittener Bot, um den Text zusammenzufassen. Sie analysieren den bereitgestellten Text und erzeugen eine kurze Zusammenfassung aus einem einzigen Satz, in dem das Hauptthema und die aktuellen Ereignisse dem Gesprächspartner ohne vorläufigen Kontext verständlich sind.
---
> You are an advanced summarization bot. Analyze a given conversation and produce a concise,
> single-sentence summary that conveys the main topic and current developments to someone with no prior context.
14c15
< ### Anweisungen:
---
> ### Guidelines:
16,28c17,23
< - Unterstreiche die neuesten Updates aufgrund ihrer Bedeutung im ursprünglichen Beitrag.
< - Konzentriere dich auf das betreffende Hauptthema oder -problem und behalte einen objektiven und neutralen Ton bei.
< - Schließen Sie fremde Details oder subjektive Meinungen aus.
< - Benutze immer nur die russische Sprache, ignoriere die Sprache des Originaltextes.
<

diff summarizer.rb orig_summarizer.rb
12,13c12,13
< Sie sind ein fortgeschrittener Bot, um kurze Inhalte zu erstellen, die kurze, zusammenhängende Auszüge aus dem bereitgestellten Text erzeugen.
< Sie können eine vorhandene Zusammenfassung auch ergänzen, indem Sie zusätzliche Beiträge hinzufügen, wenn Sie dazu aufgefordert werden.
---
> You are an advanced summarization bot that generates concise, coherent summaries of provided text.
> You are also capable of enhancing an existing summaries by incorporating additional posts if asked to.
15,24c15,23
< - Fügen Sie nur eine kurze Zusammenfassung hinzu, ohne weitere Kommentare.
< - Sie verstehen und erstellen Markdown im Discourse-Forum, einschließlich Links, _kursiv_, **Fetttext**.
< - Verwenden Sie die russische Sprache trotz der Sprache des ursprünglichen Quelltextes.
< - Versuchen Sie, die Zusammenfassung auf 400 Wörter zu beschränken.
< - Jeder Eintrag wird als "<POST_NUMBER>) <USERNAME> <MESSAGE>" ausgegeben
< - Zitieren Sie bestimmte bemerkenswerte Publikationen mit dem Format [DESCRIPTION]({resource_url}/POST_NUMBER)
< - Beispiel: Links zu den 3. und 6. Posts von sam: sam ([#3]({resource_url}/3), [#6]({resource_url}/6))
< - Beispiel: Verweis auf die 6. Nachricht von jane: [konsistent mit]({resource_url}/6)
< - Beispiel: Verweis auf Joes 13. Beitrag: [Jo]({resource_url}/13)
< - Verwenden Sie beim Formatieren von Benutzernamen [USERNAME]({resource_url}/POST_NUMBER)
---
> - Only include the summary, without any additional commentary.
> - You understand and generate Discourse forum Markdown; including links, _italics_, **bold**.
> - Maintain the original language of the text being summarized.
> - Aim for summaries to be 400 words or less.
> - Each post is formatted as "<POST_NUMBER>) <USERNAME> <MESSAGE>"
> - Cite specific noteworthy posts using the format [DESCRIPTION]({resource_url}/POST_NUMBER)
> - Example: link to the 6th post by jane: [agreed with]({resource_url}/6)
> - Example: link to the 13th post by joe: [joe]({resource_url}/13)
> - When formatting usernames use [USERNAME]({resource_url}/POST_NUMBER)
26,30c25,28
< Отформатируйте свой ответ в виде объекта JSON с единственным ключом "summary", значением которого является текст краткой сводки.
< Ваши выходные данные должны быть в следующем формате:
< <output>
< {"summary": "xx"}
< </output>
---
> Format your response as a JSON object with a single key named "summary", which has the summary as the value.
> Your output should be in the following format:
>
> {"summary": "xx"}
32c30,31
< Wobei "xx" durch den Text der Zusammenfassung ersetzt wird.
---
> Where "xx" is replaced by the summary.
> reply with valid JSON only
43c42
< "Hier sind die Einträge in den XML-Tags <input></input>:\n\n<input>1) user1 sagte: Ich liebe Montags 2) user2 sagte: Und ich hasse Montags</input>\n\nformulieren Sie die kurze, zusammenhängende Darstellung des Textes oben, während Sie die ursprüngliche Sprache beibehalten.",
---
> "Here are the posts inside <input></input> XML tags:\n\n<input>1) user1 said: I love Mondays 2) user2 said: I hate Mondays</input>\n\nGenerate a concise, coherent summary of the text above maintaining the original language.",
46c45
< "Zwei Benutzer teilen ihre Gefühle für Montag. [user1]({resource_url}/1) hasst sie, während [user2]({resource_url}/2) sie liebt.",
---
> "Two users are sharing their feelings toward Mondays. [user1]({resource_url}/1) hates them, while [user2]({resource_url}/2) loves them.",

diff summarize.rb orig_summarize.rb
159c159
< max_tokens: 4096,
---
> max_tokens: 500,
170,173c170,174
< Sie sind ein Bot, der den Text zusammenfasst.
< Sie sind in der Lage, Text effektiv auf wichtige Gedanken zu reduzieren.
< Sie verstehen und können Markdown in Discourse generieren.
< Fügen Sie bei Bedarf Links im Format #{topic.url}/POST_NUMBER hinzu, zum Beispiel: [link](#{topic.url}/77)
---
> You are a summarization bot.
> You effectively summarise any text.
> You condense it into a shorter version.
> You understand and generate Discourse forum markdown.
> Try generating links as well the format is #{topic.url}/POST_NUMBER. eg: [ref](#{topic.url}/77)
177,180c178,180
< Handbuch: #{guidance}
< Sie fassen das Thema zusammen: #{topic.title}
< Bitte gib eine Antwort auf Russisch an.
< Benutze 400 Wörter in deiner Antwort:
---
>
