Korean words can't be searched

k11 · May 26, 2018, 2:04am

I don’t think it’s about spaces.
No matter whether I search for 공, 공을, or 공을 치다, search never works for Korean.

sam · May 26, 2018, 2:06am

Can you try changing min search term length in admin/site_settings to 2 or even 1?

schungx · May 26, 2018, 3:25am

Discourse does have reasonable Chinese support. Can you first try with a Chinese post to see whether you can search for Chinese?

Possible result #1: Cannot search for Chinese – there is something wrong with the settings, because Chinese should work.

Possible result #2: Can search for Chinese – it may be the issue of not having a tokenizer for Korean.

k11 · May 26, 2018, 7:59am

WTF it works with min search term length 1 LOL
rofl thanks Sam LOL

k11 · May 26, 2018, 8:00am

Tested with Chinese too.
Chinese worked even before I set min search term length to 1.

sam · May 26, 2018, 8:12am

I do not understand Korean. Can you clarify: do we need word splitting in Korean or not?

k11 · May 26, 2018, 8:14am

I think I know why it didn’t work even with min search term length 2.

I never put a space between syllables when I search for Korean words,
but the search terms shown in the dashboard have a space after every syllable…

So what I’m saying is:

I searched for 말랑, but the search query that appeared in the dashboard is 말 랑
I searched for 중국, but the search query that appeared in the dashboard is 중 국
I searched for 차트, but the search query that appeared in the dashboard is 차 트

I think this is why Discourse couldn’t show any search results.
How can I fix this lovely bug?
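
To illustrate the mismatch described above, here is a minimal Python sketch. It assumes, purely for illustration, that posts are indexed by whitespace-separated words while the query is split into one term per syllable, as seen in the dashboard. `index_post`, `split_per_syllable`, and `matches` are made-up names, not Discourse functions.

```python
def index_post(text):
    """Index a post as the set of its whitespace-separated words (assumption)."""
    return set(text.split())


def split_per_syllable(query):
    """Mimic what the dashboard showed: one search term per syllable."""
    return [ch for ch in query if not ch.isspace()]


def matches(index, terms):
    """A post matches only if every query term is an indexed word."""
    return all(term in index for term in terms)


index = index_post("중국 차트 말랑")

print(matches(index, ["중국"]))                    # whole word as one term
print(matches(index, split_per_syllable("중국")))  # split into 중 and 국
```

Under these assumptions the first lookup succeeds and the second fails, because neither 중 nor 국 exists in the index as a word of its own.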

k11 · May 26, 2018, 8:17am

I hope my reply above helps a bit to sort out the issue.

One more thing I want to say:
I think someone said earlier that Korean has no spaces, but it does.
We use spaces just like English.

I am a handsome boy.
나는 잘생긴 소년입니다.

나는 = I
잘생긴 = handsome
소년 = boy
입니다 = am (sort of)

We still use spaces between words.

sam · May 26, 2018, 8:19am

So we should remove all the splitting and just rely on space? Just confirming

k11 · May 26, 2018, 8:20am

I don’t 100% understand what you mean but if you meant what I mean then yes.
WTF am I saying LOL

k11 · May 26, 2018, 8:26am

It works almost perfectly when I set min search term length to 1.
The only reason it’s almost perfect rather than perfect is this:

it literally catches everything. I mean,
if I search for 안녕하세요,
it brings up not only 안녕하세요 but also 안, which is totally different.

It’s like when you search for ‘interesting’
and the results include not only ‘interesting’ but also ‘i’, which is not relevant at all.

The good part is that search results are still shown in order of relevance, so the bad results are far down the list.

schungx · May 26, 2018, 9:58am

In this case, it seems that you can just split by punctuation or white space in Korean (unless the Korean texts contain Chinese characters, which is rare in contemporary writing).

Japanese, unfortunately, is purely no-space though…
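
The idea of splitting Korean on punctuation or white space can be sketched in Python as below. This is only an illustration of the idea, not how Discourse tokenizes text; `tokenize` is a hypothetical helper.

```python
import re


def tokenize(text):
    """Return runs of word characters, dropping whitespace and punctuation."""
    # In Python 3, \w matches Unicode letters, including Hangul syllables.
    return re.findall(r"\w+", text)


print(tokenize("나는 잘생긴 소년입니다."))
# ['나는', '잘생긴', '소년입니다']
```

Note that 소년입니다 stays a single token, so with exact-token matching a search for 소년 alone would still need something extra, such as prefix matching.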

fantasticfears · May 26, 2018, 6:23pm

If word spacing is used, it’s much easier to build the index, since the algorithm can treat the text like English. To help me understand better: does Korean use some really common words to compose a sentence even though they don’t carry any significant meaning? These are what we call stop words. We leave them out because they are extremely common and it’s not useful to index them at all.
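
As a rough sketch of what leaving out stop words means, here is a small Python example using English. The `STOP_WORDS` set is a tiny illustrative list made up for this example, not a real stop-word dictionary, and `index_terms` is a hypothetical helper.

```python
import re

# Illustrative only: a real stop-word list is much longer and language-specific.
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "of", "to"}


def index_terms(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


print(index_terms("I am a handsome boy."))
# ['i', 'handsome', 'boy']
```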

sam · May 27, 2018, 11:41pm

@k11 try this commit out:

https://github.com/discourse/discourse/commit/c677877e4fe5381f613279901f36ae255c909573

(upgrade once the tests have passed)

It removes word segmentation from Korean and changes min search length to 2 for Korean and Japanese
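
A hedged Python sketch of the behaviour described above, not the code from the commit: Korean queries are split on spaces only, and terms shorter than a per-locale minimum are dropped. The names `MIN_LENGTH_BY_LOCALE` and `prepare_terms`, the default of 3, and the choice to apply the length check to each term are all assumptions made for illustration.

```python
# Assumption: per-locale minimum term length; other locales fall back to a default.
MIN_LENGTH_BY_LOCALE = {"ko": 2, "ja": 2}


def prepare_terms(query, locale, default_min_length=3):
    """Split the query on whitespace and drop terms that are too short."""
    min_length = MIN_LENGTH_BY_LOCALE.get(locale, default_min_length)
    return [term for term in query.split() if len(term) >= min_length]


print(prepare_terms("중국", "ko"))       # kept whole: ['중국']
print(prepare_terms("중국 차트", "ko"))  # split on the space: ['중국', '차트']
print(prepare_terms("공", "ko"))         # single syllable is too short: []
```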

k11 · May 27, 2018, 11:59pm

thank you very much Sam I will go try now
