Thai language support for searching

Hi,

It seems that Thai is one of the languages that search does not handle well.

If the search term is at the very start of the text, or comes right after a space, it can be found. But when the search term is in the middle of a sentence, it won’t be found.

Sample.

ตรงนี้คือภาษาไทยนะ เว้นวรรคก็ยังเจอ

If I search with the starting keyword “ตรง”, it returns the correct position. If I search for the word that begins after the space, “เว้น”, it is also found.

But if I type a word from the middle, “ไทย”, the problem happens: it returns not found.

I would like to bring Discourse to my country as a new forum technology, and I am looking for a patch that fixes this.
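The behaviour above is what you would expect if the text is split into tokens on whitespace only and search terms are matched against the start of each token. Here is a minimal sketch of that model. It is an assumption for illustration, not Discourse's actual search code, and `prefix_search` is a hypothetical helper.

```python
# Hypothetical model: whitespace tokenization plus prefix matching.
# This is NOT Discourse's real implementation, only an illustration.
text = "ตรงนี้คือภาษาไทยนะ เว้นวรรคก็ยังเจอ"


def prefix_search(text, term):
    """Return True if any whitespace-delimited token starts with term."""
    return any(token.startswith(term) for token in text.split())


for term in ["ตรง", "เว้น", "ไทย"]:
    print(term, prefix_search(text, term))
```

Under this model the whole run of Thai text before the space is a single token, so “ไทย” can never match as a prefix even though it is a separate word.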

Sincerely

Did you try with the “search tokenize chinese japanese korean” site setting enabled?

I tried that first, but unfortunately it did not work.

With CJK languages, each character is (more or less) a word. But in Thai several characters combine to produce a syllable, where the vowel may be before, after, above, or below the consonant, and various tone markers may also be added. To make it even more fun, words are usually (but not always) one syllable and there is usually no space between words. There must be standard libraries to tokenize these for search, I’m guessing.
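To make the idea concrete, here is a rough sketch of one common approach, dictionary-based longest matching. The tiny `DICTIONARY` and the `tokenize` function are made up for this example. A real tokenizer would need a far larger word list and better handling of ambiguous or unknown words.

```python
# Toy word list, assumed for illustration only.
DICTIONARY = {
    "ตรงนี้", "คือ", "ภาษา", "ไทย", "นะ",
    "เว้นวรรค", "ก็", "ยัง", "เจอ",
}


def tokenize(text, dictionary):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # spaces separate phrases and are not tokens themselves
            continue
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = tokenize("ตรงนี้คือภาษาไทยนะ เว้นวรรคก็ยังเจอ", DICTIONARY)
print(tokens)
print("ไทย" in tokens)
```

Once the text is indexed as separate words like this, a term from the middle of a sentence becomes a token of its own and can be matched.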

Incidentally, most of the south and south-east Asian languages use related scripts, so if Thai has these problems we may find other languages do too.

Any progress on this issue? In my case, searching by tag in Thai gives strange results: some tags return nothing (e.g. tags:มาตรฐานและใบรับรอง), but when I go to /tags/มาตรฐานและใบรับรอง I can see the posts.

Discourse version: 2.6.0.beta1