Japanese search only works partially


(German Viscuso) #1

Hi.

I’m aware of this thread about issues with Chinese search on a Discourse instance localized for English.

Similarly, I’m facing a problem when searching for Japanese words on a Discourse instance localized for English.

We created a post that has exactly this content:


This is a topic for testing internally. I expect this will be removed when we make this public.

Trying some Japanese text
近いうちに日本語のコミュニティサイトもできるらしいので、その時はよろしくお願いします。

If you search for the first Japanese characters on the left (近いうちに日本語), you get a proper result. But if you search for the middle part of the Japanese sentence, or the last part on the right, you don’t get a result. (BTW, you can see this behavior right here on meta by searching for the Japanese characters above.)

Is this a bug? I can’t just localize the instance to Japanese because it’s mainly an English site with Japanese and Chinese categories.

Best. Thx


(Konrad Borowski) #2

I believe this happens because the search avoids matching inside words and only matches at the beginning of a word. That works for English (I believe Unicode support in Discourse is limited to recognizing the length of strings), but it’s completely broken for languages that don’t use spaces, such as Japanese or Chinese.
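
To illustrate what I mean, here is a minimal sketch in Python. It is not Discourse’s actual search code, only my assumption about the general behavior: text is split on whitespace and a query matches only at the start of a token. The function names are made up for the example.

```python
def whitespace_tokens(text):
    """Split on whitespace only, the way a tokenizer tuned for English might."""
    return text.split()


def prefix_search(query, text):
    """Return True if any whitespace-delimited token starts with the query."""
    return any(token.startswith(query) for token in whitespace_tokens(text))


# The test post from #1. The Japanese sentence contains no whitespace,
# so it ends up as one single, very long token.
post = (
    "Trying some Japanese text "
    "近いうちに日本語のコミュニティサイトもできるらしいので、その時はよろしくお願いします。"
)

print(prefix_search("Japan", post))             # True: prefix of the token "Japanese"
print(prefix_search("近いうちに日本語", post))    # True: prefix of the one Japanese token
print(prefix_search("コミュニティ", post))        # False: sits in the middle of that token
```

Under that assumption, the whole Japanese sentence counts as one "word", so only a query that starts at its first character can match.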


(German Viscuso) #3

That makes sense. Still a bug, I wish this could be fixed!!


(Eduardo Gonzalez) #4

This is a pretty tough bug to fix because you have to create an index per language. If you wanted to be able to search in Japanese, Chinese and English you’d have to create three (fairly large) indexes. Since most forums have a single dominant language it’s mostly a waste of space (and time) to maintain these indexes.

I had a quick look through the source, and it looks like Discourse is hardcoded to ‘english’ at the moment, but some recent commits make it look like you will be able to customize the language mode of the index.


(German Viscuso) #5

Thx a lot for your feedback. Do you have an idea of whether I can solve this by running 3 instances localized differently over the same database:

  • Instance 1: localized to English
  • Instance 2: localized to Chinese
  • Instance 3: localized to Japanese

In this case would search behave differently? I know most of the posts will be in English (for all instances) but some will be in Japanese and Chinese.

Best!


(Sam Saffron) #6

This is essentially the same as:


(Aj Koft ModifyWordpressCourse) #7

I also have a problem when searching in Thai. This is a big barrier to my switching to Discourse completely.

Hope it will be fixed soon,

In other words, in Thai and other Eastern languages, words are not separated by white space as they are in English, and that’s the problem.
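
For what it’s worth, one generic way to search text that has no spaces is to index overlapping character pairs (bigrams) instead of whitespace-separated words. The Python sketch below only illustrates that idea. I am not claiming Discourse works this way, and the function names are made up. I use the Japanese sentence from #1 as sample text, on the assumption that the same idea carries over to Thai.

```python
def char_bigrams(text):
    """Return the set of overlapping two-character windows in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_match(query, text):
    """True if every bigram of the query also occurs somewhere in the text."""
    query_grams = char_bigrams(query)
    return bool(query_grams) and query_grams <= char_bigrams(text)


sentence = "近いうちに日本語のコミュニティサイトもできるらしいので、その時はよろしくお願いします。"

print(bigram_match("コミュニティ", sentence))    # True: middle of the sentence
print(bigram_match("よろしくお願い", sentence))   # True: near the end
print(bigram_match("中国語", sentence))          # False: not in the sentence
```

This can give false positives, because the bigrams of a query may all appear in a text without being next to each other, so a real implementation would need a verification step or a proper word segmenter for the language.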


(Sam Saffron) #8

We shipped a stemmer a while back for Chinese/Japanese.

Closing this … @ajkoft, if you need something for Thai, this is the location that needs to be patched. Please raise a separate issue.


(Sam Saffron) #9