This guide explains how to adjust Discourse settings to better accommodate Chinese, Japanese, and Korean (CJK) languages in site search.
Required user level: Administrator
Discourse’s default settings may not be optimal for CJK languages. However, Discourse now automatically adjusts many of these settings when your site’s default locale is set to Japanese, Simplified Chinese, or Traditional Chinese. This guide explains what is configured automatically and what you may still need to adjust manually.
Automatic locale defaults
When your site’s default locale is set to `ja`, `zh_CN`, or `zh_TW`, the following settings are automatically adjusted:
| Setting | Default | CJK locale default |
|---|---|---|
| `min_search_term_length` | 3 | 1 (also applies to `ko`) |
| `min_post_length` | 20 | 8 |
| `min_first_post_length` | 20 | 8 |
| `min_personal_message_post_length` | 10 | 3 |
| `body_min_entropy` | 7 | 3 |
| `min_topic_title_length` | 15 | 6 |
| `title_min_entropy` | 10 | 3 |
| `min_title_similar_length` | 10 | 4 |
| `allow_uppercase_posts` | false | true (`ja` only) |
| `title_prettify` | true | false |
If your site uses one of these locales, you generally don’t need to change these settings — they’ll already be optimized for CJK.
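As a rough illustration of how the table reads per locale, the values can be written out as a lookup. This is only a sketch: the constant names and the `effective_defaults` helper are hypothetical, and the structure is not how Discourse stores its locale defaults internally.

```ruby
# Hypothetical lookup transcribed from the table above.
# Not Discourse's internal representation of locale defaults.
GLOBAL_DEFAULTS = {
  min_search_term_length: 3,
  min_post_length: 20,
  min_first_post_length: 20,
  min_personal_message_post_length: 10,
  body_min_entropy: 7,
  min_topic_title_length: 15,
  title_min_entropy: 10,
  min_title_similar_length: 10,
  allow_uppercase_posts: false,
  title_prettify: true,
}.freeze

# Values from the "CJK locale default" column, excluding the ja-only entry.
CJK_OVERRIDES = {
  min_search_term_length: 1,
  min_post_length: 8,
  min_first_post_length: 8,
  min_personal_message_post_length: 3,
  body_min_entropy: 3,
  min_topic_title_length: 6,
  title_min_entropy: 3,
  min_title_similar_length: 4,
  title_prettify: false,
}.freeze

# Returns the defaults a site would start with for the given locale,
# according to the table.
def effective_defaults(locale)
  case locale
  when "ja"
    GLOBAL_DEFAULTS.merge(CJK_OVERRIDES).merge(allow_uppercase_posts: true)
  when "zh_CN", "zh_TW"
    GLOBAL_DEFAULTS.merge(CJK_OVERRIDES)
  when "ko"
    # Korean only receives the search term length default.
    GLOBAL_DEFAULTS.merge(min_search_term_length: 1)
  else
    GLOBAL_DEFAULTS.dup
  end
end

puts effective_defaults("ko")[:min_post_length]
```

Calling `effective_defaults("ko")` makes the gap for Korean visible: only `min_search_term_length` differs from the global defaults.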
Manual adjustments
Korean locale
Korean (`ko`) receives an automatic locale default only for `min_search_term_length`. If your site uses the Korean locale, you should manually set the other settings listed above to values similar to the CJK locale defaults.
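The admin settings page is the usual place to make these changes. For administrators who prefer a Rails console, the sketch below generates one assignment line per setting. It assumes the `SiteSetting.<name> = <value>` form is available in your console session; the `KOREAN_MANUAL_SETTINGS` constant and `console_lines` helper are hypothetical names used only for illustration.

```ruby
# Values copied from the "CJK locale default" column of the table above.
# min_search_term_length is omitted because Korean already receives it,
# and allow_uppercase_posts is omitted because the table lists it as ja only.
KOREAN_MANUAL_SETTINGS = {
  min_post_length: 8,
  min_first_post_length: 8,
  min_personal_message_post_length: 3,
  body_min_entropy: 3,
  min_topic_title_length: 6,
  title_min_entropy: 3,
  min_title_similar_length: 4,
  title_prettify: false,
}.freeze

# Builds console assignment lines; this does not touch a Discourse site itself.
def console_lines(settings)
  settings.map { |name, value| "SiteSetting.#{name} = #{value.inspect}" }
end

puts console_lines(KOREAN_MANUAL_SETTINGS)
```

Review the generated lines before running them, and adjust the values if your community's posting style calls for different limits.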
Multilingual or non-CJK locale sites with CJK content
If your site’s default locale is not a CJK language but a significant share of your users write in CJK languages, you’ll need to adjust these settings manually:
- Set `min_search_term_length` to 1 or 2: CJK keywords can be as short as one or two characters
- Set `min_post_length` to approximately 8
- Set `body_min_entropy` to about 3: reduplication is common and meaningful in CJK languages, so setting this too high may cause “not meaningful post” errors
- Set `min_topic_title_length` to approximately 6
- Set `title_min_entropy` to about 3
- Set `min_title_similar_length` to approximately 4
- Enable `allow_uppercase_posts`: Discourse may not recognize CJK characters when analyzing topic titles for case, causing errors
- Disable `title_prettify`: title prettification rules are designed for Latin scripts and may not work well with CJK text
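To see why a high entropy threshold rejects short CJK posts, consider the simplified sketch below. It assumes entropy can be approximated by the number of distinct characters in the text; Discourse's actual calculation may count differently, and both helper names are hypothetical.

```ruby
# Simplified stand-in for an entropy check: counts distinct characters.
# This is an assumption for illustration, not Discourse's exact calculation.
def distinct_characters(text)
  text.strip.chars.uniq.size
end

# True when the text has at least min_entropy distinct characters.
def passes_entropy?(text, min_entropy)
  distinct_characters(text) >= min_entropy
end

reduplicated = "谢谢谢谢，好好好好" # nine characters, only three distinct

puts passes_entropy?(reduplicated, 7) # threshold matching the global default
puts passes_entropy?(reduplicated, 3) # threshold matching the CJK default
```

Under this approximation the reduplicated reply fails a threshold of 7 but passes a threshold of 3, which matches the motivation given for lowering `body_min_entropy`.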
Search tokenization
For improved search accuracy, Discourse offers optional CJK-specific tokenization settings:
- `search_tokenize_chinese`: enables segmentation of Chinese text for better search results
- `search_tokenize_japanese`: enables segmentation of Japanese text for better search results
These are disabled by default and can be enabled in the admin search settings.
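The reason segmentation matters is that Chinese and Japanese text has no spaces between words, so splitting on whitespace yields one long token. The sketch below contrasts whitespace splitting with overlapping character bigrams. Bigrams are used here only as a crude stand-in to show the effect; they are not the segmentation method Discourse uses, and both function names are hypothetical.

```ruby
# Whitespace splitting, which works for space-delimited languages.
def whitespace_tokens(text)
  text.split
end

# Overlapping character bigrams: a crude stand-in for real word segmentation.
# Not the algorithm Discourse uses.
def bigram_tokens(text)
  chars = text.gsub(/\s+/, "").chars
  return chars if chars.size < 2

  chars.each_cons(2).map(&:join)
end

sentence = "我喜欢搜索引擎"

p whitespace_tokens(sentence) # one token covering the whole sentence
p bigram_tokens(sentence)     # shorter tokens, including "搜索" (search)
```

With whitespace splitting, a query for "搜索" has no matching token to hit, while any form of segmentation exposes it as a searchable unit.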
Troubleshooting search issues
If you encounter problems with search functionality after making these changes, you may need to rebuild the search index. Here’s how to do it:
- Enter your Discourse Docker installation directory.
- Run the following command to access the app container: `./launcher enter app`
- Once inside the container, run the reindexing command: `rake search:reindex`
After reindexing, you should be able to search content effectively.
Last edited by @hugh 2024-07-26T01:02:02Z
Last checked by @sam 2026-03-18T04:23:22Z