Optimizing Discourse search for CJK languages

:bookmark: This guide explains how to adjust Discourse settings to better accommodate Chinese, Japanese, and Korean (CJK) languages in site search.

:person_raising_hand: Required user level: Administrator

Discourse’s default settings may not be optimal for CJK languages. However, Discourse now adjusts many of these settings automatically when your site’s default locale is Japanese, Simplified Chinese, or Traditional Chinese; Korean receives only one of these adjustments. This guide explains what is configured automatically and what you may still need to adjust manually.

Automatic locale defaults

When your site’s default locale is set to ja, zh_CN, or zh_TW, the following settings are automatically adjusted:

| Setting | Default | CJK locale default |
|---|---|---|
| min_search_term_length | 3 | 1 (also applies to ko) |
| min_post_length | 20 | 8 |
| min_first_post_length | 20 | 8 |
| min_personal_message_post_length | 10 | 3 |
| body_min_entropy | 7 | 3 |
| min_topic_title_length | 15 | 6 |
| title_min_entropy | 10 | 3 |
| min_title_similar_length | 10 | 4 |
| allow_uppercase_posts | false | true (ja only) |
| title_prettify | true | false |

If your site uses one of these locales, you generally don’t need to change these settings — they’ll already be optimized for CJK.

Manual adjustments

Korean locale

Korean (ko) only receives an automatic locale default for min_search_term_length. If your site uses the Korean locale, you should manually set the other settings in the table above to values close to the CJK locale defaults.

Multilingual or non-CJK locale sites with CJK content

If your site’s default locale is not a CJK language but a significant share of your users write in CJK languages, you’ll need to adjust these settings manually:

  • Set min_search_term_length to 1 or 2 — CJK keywords can be as short as one or two characters
  • Set min_post_length to approximately 8
  • Set body_min_entropy to about 3 — reduplication is common and meaningful in CJK languages, so setting this too high may cause “not meaningful post” errors
  • Set min_topic_title_length to approximately 6
  • Set title_min_entropy to about 3
  • Set min_title_similar_length to approximately 4
  • Enable allow_uppercase_posts — Discourse may not recognize CJK characters when analyzing topic titles for case, causing errors
  • Disable title_prettify — title prettification rules are designed for Latin scripts and may not work well with CJK text
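
The admin settings page is the usual place to make these changes. If you would rather apply them in one pass from a console, the sketch below prints one Ruby assignment per setting in the list above. It assumes the printed lines are pasted into a Rails console (`rails c`) inside the app container and that each setting can be assigned through `SiteSetting`. The `print_assignments` helper is a hypothetical name used only for this example, and the script changes nothing on its own.

```shell
#!/bin/sh
# Values taken from the manual-adjustment list above.
# min_search_term_length is set to 1 here; the guide allows 1 or 2.
SETTINGS='min_search_term_length=1
min_post_length=8
body_min_entropy=3
min_topic_title_length=6
title_min_entropy=3
min_title_similar_length=4
allow_uppercase_posts=true
title_prettify=false'

# Print one Ruby assignment per setting.
# Assumption: the output is pasted into `rails c` inside the app container.
# This function only prints text; it does not modify any site setting.
print_assignments() {
  printf '%s\n' "$SETTINGS" | while IFS='=' read -r name value; do
    printf 'SiteSetting.%s = %s\n' "$name" "$value"
  done
}

print_assignments
```

Review the printed lines before pasting them, and adjust any value that does not fit your community.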

Search tokenization

For improved search accuracy, Discourse offers optional CJK-specific tokenization settings:

  • search_tokenize_chinese — enables segmentation of Chinese text for better search results
  • search_tokenize_japanese — enables segmentation of Japanese text for better search results

These are disabled by default and can be enabled in the admin search settings.

Troubleshooting search issues

If search still misbehaves after making these changes, you may need to rebuild the search index. Here’s how to do it:

  1. Change to your Discourse Docker installation directory (typically /var/discourse on a standard install).
  2. Run the following command to access the app container:
    ./launcher enter app
    
  3. Once inside the container, run the reindexing command:
    rake search:reindex
    

After reindexing, you should be able to search content effectively.

Last edited by @hugh 2024-07-26T01:02:02Z

Last checked by @sam 2026-03-18T04:23:22Z
