Thai language support for searching

Hi,

It seems that Thai is one of the languages that search does not handle well.

If the search term is at the very start of the text, or comes right after a space, it can be found. But when the search term is in the middle of a sentence, it won’t be found.

Sample:

ตรงนี้คือภาษาไทยนะ เว้นวรรคก็ยังเจอ

If I search with the starting keyword “ตรง”, it returns the correct position. If I search for a word that begins after a space, “เว้น”, it is still found.

But if I type a word from the middle, “ไทย”, the problem appears: the search returns no results.
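The behaviour described above is what one would expect if the index splits text on whitespace and matches search terms against the start of each token. That is an assumption about the cause, not a confirmed description of Discourse internals. A minimal sketch, where `whitespace_tokens` and `prefix_search` are hypothetical helper names:

```python
def whitespace_tokens(text):
    """Split on whitespace only, as a tokenizer for space-delimited languages would."""
    return text.split()


def prefix_search(text, term):
    """Return True if any whitespace-delimited token starts with the search term."""
    return any(token.startswith(term) for token in whitespace_tokens(text))


sample = "ตรงนี้คือภาษาไทยนะ เว้นวรรคก็ยังเจอ"

print(prefix_search(sample, "ตรง"))   # True: start of the first token
print(prefix_search(sample, "เว้น"))  # True: start of the token after the space
print(prefix_search(sample, "ไทย"))   # False: buried in the middle of a token
```

Under this assumption the whole run of Thai text before the space is indexed as a single "word", so only its beginning is reachable.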

I would like to adopt Discourse as a new forum technology in my country, and I am looking for a patch that fixes this.

Sincerely

Did you try with the “search tokenize chinese japanese korean” site setting enabled?


That was the first thing I tried, but unfortunately it did not work.

With CJK languages, each character is (more or less) a word. But in Thai, several characters combine to produce a syllable, where the vowel may be written before, after, above, or below the consonant, and various tone markers may also be added. To make it even funner, words are usually (but not always) one syllable, and there is usually no space between words. There must be standard libraries to tokenize these for search, I’m guessing.
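One common approach to this kind of tokenizing is dictionary-based longest matching: at each position, take the longest dictionary word that fits. The sketch below is a toy illustration only. `TOY_DICTIONARY` is a hypothetical hand-picked word list covering the sample sentence from the first post, and `segment` is a hypothetical helper, not anything Discourse ships. A real setup would rely on a maintained dictionary or an existing segmenter (ICU and PyThaiNLP both provide Thai word segmentation).

```python
# Hypothetical toy word list; a real segmenter needs a full Thai dictionary.
TOY_DICTIONARY = {
    "ตรงนี้", "คือ", "ภาษา", "ไทย", "นะ",
    "เว้นวรรค", "ก็", "ยัง", "เจอ",
}


def segment(text, dictionary=TOY_DICTIONARY):
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # whitespace separates tokens but is not a token itself
            continue
        # Try the longest candidate first, shrinking until a dictionary word fits.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                break
        else:
            candidate = text[i]  # no dictionary word starts here
        tokens.append(candidate)
        i += len(candidate)
    return tokens


sample = "ตรงนี้คือภาษาไทยนะ เว้นวรรคก็ยังเจอ"
tokens = segment(sample)
print(tokens)
print("ไทย" in tokens)  # True: the mid-sentence word is now its own token
```

Greedy longest matching is simple but can pick the wrong split when words overlap, which is one reason production segmenters add statistical or rule-based disambiguation on top.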

Incidentally, most of the south and south-east Asian languages use related scripts, so if Thai has these problems we may find other languages do too.


Is there any progress on this? In my case, searching by Thai tags gives odd results. Some tags return no results (e.g. tag: 標準と認証), but when I visit /tags/標準と認証 the posts are displayed.

Discourse version: 2.6.0.beta1