Korean words can't be searched

English words can be searched properly, but the forum doesn’t recognize Korean or Chinese search terms at all.

Did you install using the correct database locale / language?
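If you’re not sure, something like this from the rails console should show what the database is currently using for full-text search (a rough sketch; the exact output format is an assumption):

```ruby
# Inside the container: ./launcher enter app, then `rails c`.
# Shows which text-search configuration Postgres applies in to_tsvector/to_tsquery.
result = ActiveRecord::Base.connection.execute("SHOW default_text_search_config")
puts result.first  # e.g. {"default_text_search_config" => "pg_catalog.english"}
```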


Yes, I did. I installed with the Korean language.

Should I turn on the ‘search tokenize chinese japanese korean’ setting?

I had someone contact me about this recently. I wasn’t sure which locale to use for Chinese, as I don’t know how the Postgres locales map to the various Chinese dialects. Is there some way to know which to use?

Should I turn on ‘search tokenize chinese japanese korean’???

Please help me.

Probably, give it a shot and see what happens.


I turned it on, but I still can’t search for any Korean or Chinese words…

I don’t know for certain, but I have a feeling the tokenizing happens between raw -> cooked.

If you rebake a post with the Korean or Chinese words, does that post then show up in Search?
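A minimal rails-console sketch for rebaking one post (the post id below is just a placeholder):

```ruby
# Rebake a single post so its cooked HTML and search data are regenerated.
post = Post.find(123)  # placeholder id: pick a post containing Korean text
post.rebake!

# Or rebake everything (slow on a large site) from the shell:
#   ./launcher enter app
#   rake posts:rebake
```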

I posted a new topic and searched for the Korean words, but still nothing shows up in the search results :confused:

Not sure, maybe @fantasticfears would know?


You don’t have to change the Postgres locale, but your default_locale should be zh_CN or zh_TW. @k11 you could try search tokenize chinese japanese korean, but the dictionaries are limited for Korean.
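From the rails console that would look roughly like this (or use the equivalent fields in the admin settings UI); the setting name for the tokenizer is my assumption of how the UI label maps to code:

```ruby
SiteSetting.default_locale = "zh_CN"  # or "zh_TW"; a Korean site would use "ko"
SiteSetting.search_tokenize_chinese_japanese_korean = true
```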


I already turned on ‘search tokenize chinese japanese korean’ about 12 hours ago, but the site still can’t search any Korean words…

Should I reboot my server or do something else?

Don’t you have to recook your posts to update the full-text index?

If I remember correctly, getting Chinese working is nontrivial. There are a whole bunch of settings required for CJK support. For example, you’ll need to turn down the text entropy limits and turn off prettify.
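From memory it’s roughly the settings below; the exact names are assumptions from older versions, so check your admin panel for the current ones:

```ruby
# Loosen checks that assume space-separated Latin text.
SiteSetting.title_min_entropy = 1     # entropy limits tend to reject short CJK titles
SiteSetting.body_min_entropy  = 1     # same idea for post bodies
SiteSetting.title_prettify    = false # title "prettify" rewriting can mangle CJK titles
```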


Can you help me set up Korean as well?

I still couldn’t solve this issue :confused:
I can’t search for any Korean words…
Can anyone help?

Try creating a new post. Does it show up in search?


A new post also doesn’t show up in the search results…
Only English ones do…

https://github.com/discourse/discourse/blob/master/lib/search.rb#L51

Specific tokenizers are needed for Japanese and Korean to work. (Maybe Korean doesn’t need one?)

It seems that Discourse relies on spaces between words to perform a successful search. Chinese and Japanese don’t put spaces between words, so a tokenizer is required to split a sentence into words. Though I don’t understand Korean (or Japanese), Korean Wikipedia does appear to use inter-word spaces.

CppJiebaRb, a tokenizer designed only for Chinese, is used for Chinese, Japanese, and Korean in Discourse. It segments a Japanese or Korean sentence into an array of individual characters rather than the words that are expected.
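Roughly, the relevant part of lib/search.rb does something like the sketch below when that setting is on (paraphrased from the linked file; method and argument names may differ between versions):

```ruby
require 'cppjieba_rb'

# Paraphrase of the tokenizing step: segment the text, then rejoin with spaces
# so Postgres full-text search sees word boundaries it can index.
def tokenize_for_search(text)
  if SiteSetting.search_tokenize_chinese_japanese_korean
    # Jieba's dictionary is Chinese; for Japanese or Korean input it mostly
    # falls back to splitting on single characters instead of real words.
    CppjiebaRb.segment(text, mode: :mix).to_a.join(' ')
  else
    text
  end
end
```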


Thanks for the comment.
But what should I do to get Korean search working?
Sorry, I’m very new to SSH commands and I’m not sure what you meant for me to do…

Nice detective work, looks like it is a bit complicated:

http://www.koreanwikiproject.com/wiki/Word_spacing

@k11 the problem is that search relies on having separate words in the index.

So, for example, if we do not try to inject spaces, then when you type 공을 치다 you will not be able to find 을.
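A quick way to see the effect, using Postgres directly rather than Discourse’s actual query (an illustration only):

```ruby
# Without segmentation '공을' is indexed as a single token, so a query for the
# particle '을' on its own never matches.
sql = "SELECT to_tsvector('simple', '공을 치다') @@ plainto_tsquery('simple', '을') AS hit"
ActiveRecord::Base.connection.execute(sql).first  # => {"hit" => "f"}
```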

Curious: when you use search on this site, does search work correctly for this topic?
