Search a term in Japanese

yashi · February 2, 2022, 10:58am

I tried tiny_segmenter from RubyGems, and it at least generates the words I listed in my previous comment.

# coding: utf-8
require 'tiny_segmenter'
require 'pp'

# Read the text of the topic to segment
s = File.read('topic27.txt')

# Split the text into words, leaving out punctuation
ts = TinySegmenter.new
sg = ts.segment(s, ignore_punctuation: true)
pp(sg)

bundle exec ruby test.rb | grep -e 北側 -e 真上 -e 一般
 "北側",
 "真上",
 "一般",
 "一般",
 "一般",
 "北側",
 "一般",
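
The same check can be done in Ruby instead of piping to grep. This is only a minimal sketch: `count_targets` is a hypothetical helper name, it assumes the segmenter output is a plain array of strings like `sg` above, and the sample tokens are made up for illustration rather than taken from topic27.txt. It needs Ruby 2.7 or later for `Array#tally`.

```ruby
# Count how often each word of interest appears in a list of tokens.
# `tokens` stands in for the array printed by pp(sg) above.
def count_targets(tokens, targets)
  counts = tokens.tally                            # token => number of occurrences
  targets.to_h { |w| [w, counts.fetch(w, 0)] }     # 0 for words that never appear
end

# Hypothetical sample tokens, not the real segmenter output
sample_tokens = %w[北側 の 真上 に 一般 的 な 一般 北側]
targets = %w[北側 真上 一般]

pp count_targets(sample_tokens, targets)
```

A word that comes back with a count of 0 was not produced as a separate token, which is the case that would matter for search.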

A quick search on TinySegmenter told me that the model it uses is not that good. There is a model generator for it.

I haven't tried it yet.
