Search a term in Japanese

yashi · February 2, 2022, 10:58am

I’ve tried tiny_segmenter from RubyGems, and it does at least produce the words I listed in my previous comment.

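```ruby
# coding: utf-8
require 'tiny_segmenter'
require 'pp'

# Read the topic text explicitly as UTF-8 so the segmenter sees Japanese characters
s = File.read('topic27.txt', encoding: 'UTF-8')

# Split the text into words, dropping punctuation tokens
ts = TinySegmenter.new
sg = ts.segment(s, ignore_punctuation: true)
pp(sg)
```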

```
bundle exec ruby test.rb | grep -e 北側 -e 真上 -e 一般
 "北側",
 "真上",
 "一般",
 "一般",
 "一般",
 "北側",
 "一般",
```
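Note that `grep` matches any output line that merely *contains* a term, so a longer token such as a hypothetical 一般的 would also show up as a hit for 一般. A minimal sketch that instead checks for exact tokens and counts them, assuming `sg` is the array returned by `ts.segment`. Here it is stubbed with the tokens from the output above so the snippet runs without the gem:

```ruby
# Stand-in for the array returned by ts.segment(...); these are the
# tokens shown in the grep output above.
sg = %w[北側 真上 一般 一般 一般 北側 一般]

# Terms we expect the search to find
terms = %w[北側 真上 一般]

# Count exact token matches only (unlike grep, 一般的 would not count as 一般)
counts = sg.tally.slice(*terms)

# Terms the segmenter never emitted as a whole token
missing = terms.reject { |t| counts.key?(t) }

pp counts
puts "not segmented as a whole token: #{missing.join(', ')}" unless missing.empty?
```

`Array#tally` needs Ruby 2.7 or later. A term listed in `missing` would not be found by a word-based search, even if it appears inside a longer token.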

A quick search about TinySegmenter suggests that its bundled model isn’t very good. There is a model generator for it, but I haven’t tried it.
