Adjust Discourse search to work with CJK languages

The default site settings don’t fit CJK languages well. Below are some tweaks. You can find each setting by searching for its name in the site settings panel.

Tweaks

  1. Set min_search_term_length to 1 or 2

    CJK keywords are often as short as 2 characters, so set this value lower accordingly.

  2. Check allow_uppercase_posts

    Discourse doesn’t recognize CJK characters when analyzing topics, so users will sometimes find their post’s title rejected as invalid.

  3. Set min_post_length around 8

    A rule of thumb for the length of a reasonable sentence.

  4. Set body_min_entropy around half of min_post_length

    Reduplication is common in CJK languages, and the repeated characters are meaningful. If this value is too high, some users will run into errors saying their post is not meaningful.

  5. Set min_topic_title_length and title_min_entropy in a similar fashion.

  6. Set min_title_similar_length and min_body_similar_length according to the values assigned above.
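To see why tweak 4 matters, here is a minimal Python sketch. It assumes entropy can be approximated by the number of distinct characters in a post, which is an illustration only and not necessarily how Discourse computes it. The helper names `distinct_chars` and `suggested_body_min_entropy` are hypothetical and do not exist in Discourse.

```python
def distinct_chars(text: str) -> int:
    """Rough stand-in for entropy (assumption): count distinct characters."""
    return len(set(text.strip()))


def suggested_body_min_entropy(min_post_length: int) -> int:
    """Tweak 4: about half of min_post_length, never below 1."""
    return max(1, min_post_length // 2)


min_post_length = 8  # value from tweak 3
body_min_entropy = suggested_body_min_entropy(min_post_length)

# One post with meaningful reduplication, one made of a single repeated character.
for post in ["谢谢谢谢大家的帮忙", "哈哈哈哈哈哈哈哈"]:
    long_enough = len(post) >= min_post_length
    varied_enough = distinct_chars(post) >= body_min_entropy
    verdict = "accepted" if long_enough and varied_enough else "rejected"
    print(post, len(post), distinct_chars(post), verdict)
```

Under this approximation, the first post passes because it still contains enough distinct characters despite the repeated 谢, while the second is rejected. Raising the entropy threshold toward min_post_length would start rejecting posts like the first one as well.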

Troubleshooting

Reindex the database for searching

1. Enter the Discourse Docker install directory and run:

        ./launcher enter app

2. Then run the command below to reindex:

        rake search:reindex

3. Now you should be able to search the content.

Thanks to Audrey Tang, who gave me the support to finish this article.

We already do this (by default) now as per:

https://github.com/discourse/discourse_docker/blob/762d9bbf6827d25295923b3ff0145d80008f0d41/templates/postgres.9.5.template.yml#L151

@sam is this topic still relevant?

Encoding for PostgreSQL is perfectly fine now. Tweaks 1 and 2 are still essential settings for CJK users, as are the other settings that adjust the post length and title length restrictions.

Reindexing is still a useful troubleshooting trick, but it is rarely needed.

I wiki’d it, so feel free to remove the stuff that’s redundant now.
