Adjust Discourse search to work with CJK languages

translation

#1

The default site settings don’t suit CJK languages well. Below are some tweaks. You can find each setting by searching for its name in the site settings panel.

Tweaks

  1. Set min_search_term_length to 1 or 2

    Keywords in CJK languages are often only 2 characters long, so lower this value accordingly.

  2. Check allow_uppercase_posts

    Discourse doesn’t recognize CJK characters when it analyzes topics, so users will sometimes find their post titles rejected as invalid.

  3. Set min_post_length around 8

    A rule of thumb for the length of a reasonable sentence.

  4. Set body_min_entropy around half of min_post_length

    Reduplication is common in CJK languages, and the repeated characters are meaningful. If this value is too high, some users will get an error saying their post is not meaningful.

  5. Set min_topic_title_length and title_min_entropy in a similar fashion.

  6. Set min_title_similar_length and min_body_similar_length according to the values assigned above.
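
To see why the length and entropy thresholds interact badly with reduplication, here is a minimal sketch in Python. It assumes that the entropy check roughly amounts to counting the distinct characters in the text; that is an approximation for illustration, not Discourse's actual Ruby implementation. The helper names `entropy` and `passes` are hypothetical.

```python
# Hypothetical helpers for illustration only; Discourse's real validation
# lives in its Ruby code base and may differ in detail.

def entropy(text: str) -> int:
    """Count distinct characters after trimming whitespace (assumed approximation)."""
    return len(set(text.strip()))


def passes(text: str, min_length: int, min_entropy: int) -> bool:
    """Return True when the text meets both the length and the entropy threshold."""
    stripped = text.strip()
    return len(stripped) >= min_length and entropy(stripped) >= min_entropy


# An 8-character sentence with two reduplicated characters (好好, 天天).
post = "好好学习天天向上"

print(len(post), entropy(post))   # length and distinct-character count
print(passes(post, 8, 4))         # entropy threshold at half the length
print(passes(post, 8, 7))         # a stricter entropy threshold

# A typical two-character search keyword, shorter than many English terms.
print(len("搜索"))
```

With a minimum length of 8 and an entropy threshold of 4, the sample sentence is accepted. Raising the entropy threshold to 7 rejects the same sentence, because reduplication leaves it with only 6 distinct characters.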

Troubleshooting

Reindex DBs for searching

1. Enter the Discourse Docker install directory and run:

        ./launcher enter app

2. Then run the command below to reindex:

        rake search:reindex

3. Now you should be able to search the content.

Thanks to Audrey Tang, who supported me in finishing this article.


(Arpit Jalan) #2

We already do this (by default) now as per:

@sam is this topic still relevant?


(Erick Guan) #3

Encoding for PostgreSQL is perfectly fine now. Tweaks 1 and 2 are still fairly essential settings for CJK users, as are the other settings that adjust the post length and title length restrictions.

Reindexing is still a useful troubleshooting trick, but it is rarely needed.


(Erlend Sogge Heggen) #4

I wiki’d it, so feel free to remove the stuff that’s redundant now.