Chinese search doesn't work for some words


#1

Hi all,
I've found that some Chinese words return no search results, and I don't know why. Any ideas?

The screenshot below is from this site. You can view the topic at https://meta.discourse.org/t/discourse/6780?source_topic_id=13287


(Sam Saffron) #2

You have to enable the search tokenize chinese japanese korean setting in site settings and then rebuild the search index, for example by editing the post.

This should be enabled by default for you in site settings if you have Chinese locale selected.


#3

Thanks, Sam. I'm not sure how to rebuild the search index, so I'll do some research on it.


(Stephen) #4

If memory serves:

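cd /var/discourse
./launcher enter app     # open a shell inside the running app container
rake search:reindex      # then, inside the container, rebuild the search index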

#5

Thanks, Stephen. Do you know how to do it in a development environment?


(Stephen) #6

You can run rake from ./bin/docker

I can’t be more specific without details of your dev environment install.
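
For a Docker-based dev setup, something along these lines might work. This is only a sketch: it assumes you run it from the root of your Discourse checkout and that the checkout has a bin/docker/rake wrapper script. The function name reindex_dev_search is just made up for illustration.

```shell
# Sketch only: rebuild the search index from a Docker-based dev checkout.
# Assumptions: the current directory is the root of the Discourse checkout,
# and it provides a bin/docker/rake wrapper script.
# reindex_dev_search is a hypothetical helper name, not part of Discourse.
reindex_dev_search() {
  wrapper="./bin/docker/rake"
  if [ ! -x "$wrapper" ]; then
    echo "no executable $wrapper here; is this a Docker dev checkout?" >&2
    return 1
  fi
  # Same rake task as in the production steps earlier in this topic.
  "$wrapper" search:reindex
}
```

Define the function in your shell, then call reindex_dev_search from the checkout root. If your dev install is not Docker-based, the wrapper will not be there and the function stops with an error instead of running anything.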


#7

It still doesn't work on my Discourse.
I found an example on this site:
https://meta.discourse.org/t/discourse/6780?source_topic_id=13287

The topic above contains some Chinese words, but if you search for some of them, no results are shown.


(Mittineague) #8

Going by the text as it appears in the screen captures, it looks like missed


(Sam Saffron) #9

@fantasticfears any tips here? Is the segmenter broken here?


#10

This screen capture may be clearer.


#11


(Erick Guan) #12

Did you enable the search tokenize chinese japanese korean setting? Or did you set the default locale to zh_CN? Meta probably didn't enable those settings. Mind trying something here? https://master1.discoursecn.org/

@sam Segmenter works for me. It should be something else.


#13

I tried some words on your site. It still doesn't work.

As the screenshot below shows, the results do not include the topic I was viewing, although they should. Search is not returning all of the correct results.


(Erick Guan) #14

I'm not really sure which part is broken, since both texts come back with the expected segmentation.

Can you update cppjieba_rb to 0.3.3? @sam


#16

I'm pasting some of the missed words here. If you copy them and search for them in Discourse, you'll see that the expected results are missing.

For this site (the results should include this page: Discourse 中文完全安装指南正式发布):
指南发布

For Erick Guan's site (https://master1.discoursecn.org/):
重启服务器 (the search results should include: 关于重启服务器需要重建容器以及vps搬家的疑问 - 支持 - Discourse中文论坛)


What about setting db_default_text_search_config: "pg_catalog.simple"?