Censor words should support sentence level censoring for Chinese

fantasticfears · October 3, 2017, 5:41am

CJKV languages don’t have word boundaries, so for them it’s more reliable to apply this feature at the sentence level. In short, please support this feature without word boundaries.

Suggested: (Chinese) It would be great if suggested topics and censored words supported Chinese - Support - Discourse Chinese Forum (Discourse中文论坛)

schungx · October 3, 2017, 6:13am

There is a discussion here:

schungx · October 3, 2017, 6:35am

If you can do a custom build of Discourse, it is a simple matter to change that one line of code to remove the wrapping \b's.

In the long term, I suggest removing them by default, or at least adding a site setting for those of us running non-English forums.
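To illustrate why the wrapping \b's get in the way, here is a minimal sketch in Python. It is not Discourse's actual code (which is Ruby/JavaScript and may behave differently in details); the function name `censor`, the `word_boundaries` flag, and the replacement character are all made up for this example.

```python
import re

def censor(text, words, word_boundaries=True):
    """Replace each listed word with a run of block characters.

    Hypothetical helper for illustration only; `word_boundaries`
    stands in for the kind of site setting suggested above.
    """
    for word in words:
        pattern = re.escape(word)
        if word_boundaries:
            # This is the hard-coded wrapping being discussed.
            pattern = r"\b" + pattern + r"\b"
        text = re.sub(
            pattern,
            lambda m: "■" * len(m.group()),
            text,
            flags=re.IGNORECASE,
        )
    return text

# English: the word is surrounded by spaces, so \b matches.
english = censor("this is spam here", ["spam"])

# Chinese: neighbouring characters are also "word" characters,
# so there is no \b before or after the term and nothing matches.
chinese_wrapped = censor("这是敏感词内容", ["敏感词"])

# Without the wrapping \b's the same term is found.
chinese_plain = censor("这是敏感词内容", ["敏感词"], word_boundaries=False)
```

In Python's `re`, Chinese characters count as word characters, so a term embedded in running Chinese text never sits on a \b boundary. That is the same kind of failure described in this topic.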

pfaffman · October 3, 2017, 1:47pm

You could create a plugin to do that and/or submit a PR.

schungx · October 3, 2017, 1:50pm

Unfortunately, a plugin requires quite a bit of Ruby knowledge. I can debug, but I am probably not even close to being able to write plugins.

A PR would require that I fork the entire repo, which is ok except I have no way to test it. It is bad form to submit a PR without testing…

Stranik · October 3, 2017, 2:51pm

Removing one line is really not enough. The logic in that file needs to be completely rewritten. I posted a working version of the file there (using loops).

schungx · October 4, 2017, 4:17pm

Well, not to remove the line, but to remove the \b's in the line.

A regexp with word breaks will never work for all languages. The best you can do is to let the user decide which words require word breaks and which do not.

With the \b wrapper hard-coded in right now, there is no choice.

schungx · January 11, 2018, 5:50am

This issue is now solved by:

To match Chinese patterns, turn on Settings > Posting > watched words regular expressions.

Beware, your Watched Words will now be raw regular expressions, so if your list includes English words, you’ll need to put in your own word break \b where necessary.
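As a rough sketch of what that means for a mixed list, the Python below treats every entry as a raw regular expression. The function `censor_raw` and the sample entries are hypothetical and only mimic the idea; they are not how Discourse implements the setting.

```python
import re

def censor_raw(text, patterns):
    """Treat each watched word as a raw regular expression.

    Hypothetical helper for illustration; no \\b is added automatically,
    so each entry must carry its own word breaks if it needs them.
    """
    combined = re.compile(
        "|".join(f"(?:{p})" for p in patterns),
        re.IGNORECASE,
    )
    return combined.sub(lambda m: "■" * len(m.group()), text)

# The English entry supplies its own \b; the Chinese entry has none.
mixed = censor_raw("a classic ass 这是敏感词内容", [r"\bass\b", "敏感词"])

# Forgetting the \b on an English entry censors inside other words.
overmatch = censor_raw("a classic", ["ass"])
```

The second call shows the pitfall mentioned above: an English entry without its own \b will also match inside longer words.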

jomaxro · January 12, 2018, 11:00pm

This topic was automatically closed after 40 hours. New replies are no longer allowed.
