Word censor - butt vs button


(Dean Taylor) #1

Is this expected?

Adding the word butt to the word censor marks the word button as ■■■■on


(Robert Lee Louviere) #2

Then don’t mention this article.

New laws mean ****ing out in more public areas


(Mr.Burns avatar therefor TDWTF) #3

It probably is expected behavior. You should try adding a basic letter to the word filter. (Note that instead of blocks, other implementations censor by substituting a less offensive word, which produces the clbuttic mistake when combined with this style of filtering.)


(Jeff Atwood) #4

Hmm, I thought we did this using word boundary checks @eviltrout, e.g. \b on either side?
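For anyone following along, a minimal sketch of what a \b check on either side could look like. This is not the actual Discourse code; `censorWholeWords` and `escapeRegExp` are hypothetical names, and the ■ masking is assumed from the first post. Note that \b in JavaScript is defined in terms of ASCII word characters, so this sketch would behave differently for non-English word lists.

```javascript
// Escape regex metacharacters so a banned word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: mask banned words only when they stand alone,
// i.e. with a word boundary (\b) on both sides.
function censorWholeWords(text, bannedWords) {
  if (bannedWords.length === 0) return text;
  const alternatives = bannedWords.map(escapeRegExp).join("|");
  const pattern = new RegExp("\\b(" + alternatives + ")\\b", "gi");
  return text.replace(pattern, (match) => "■".repeat(match.length));
}

console.log(censorWholeWords("Press the button, you butt", ["butt"]));
// "Press the button, you ■■■■"
```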


(Robin Ward) #5

No, in this case it’s simpler than that, just a straight replace. Otherwise a user could say “bananalord” if the word “banana” was banned.

I think the “butt” case is kind of unusual isn’t it?
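To illustrate the trade-off, here is a rough sketch of a straight replace. It is an approximation for discussion rather than the real implementation, and `censorSubstrings` is a made-up name.

```javascript
// Hypothetical helper: mask every occurrence of each banned word,
// wherever it appears, including inside longer words.
function censorSubstrings(text, bannedWords) {
  let result = text;
  for (const word of bannedWords) {
    // split/join replaces all literal occurrences without needing a regex.
    result = result.split(word).join("■".repeat(word.length));
  }
  return result;
}

console.log(censorSubstrings("bananalord", ["banana"])); // "■■■■■■lord"
console.log(censorSubstrings("button", ["butt"]));       // "■■■■on"
```

The first call is the case the straight replace is meant to catch; the second is the false positive reported at the top of this topic.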


(Mr.Burns avatar therefor TDWTF) #6

Do you want to at least check for a word break at either end of a banned word then? Otherwise you are intentionally making a clbuttic mistake with your banned word list (which is fine but should be pointed out to users when they are making their list of banned words).


(mott555) #7

How about this? Censor the word, unless it’s contained within a larger, known, and uncensored dictionary word. “Bananalord” isn’t a real word (at least not in my dictionary) and would be censored then.
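Roughly what I have in mind, as a sketch only. The three-word `dictionary` is a stand-in for a real word list, and `censorUnlessKnown` is a name I made up for the example.

```javascript
// Escape regex metacharacters so a banned word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical, tiny stand-in for a real dictionary of acceptable words.
const dictionary = new Set(["button", "class", "assume"]);

// Mask banned substrings, except inside known dictionary words.
function censorUnlessKnown(text, bannedWords, knownWords) {
  const banned = new Set(bannedWords.map((w) => w.toLowerCase()));
  // Treat each run of ASCII letters as one candidate word.
  return text.replace(/[A-Za-z]+/g, (word) => {
    const lower = word.toLowerCase();
    // A known word is exempt, unless it is itself on the banned list.
    if (knownWords.has(lower) && !banned.has(lower)) return word;
    let result = word;
    for (const b of banned) {
      result = result.replace(new RegExp(escapeRegExp(b), "gi"), (m) =>
        "■".repeat(m.length)
      );
    }
    return result;
  });
}

console.log(censorUnlessKnown("button bananalord butt", ["butt", "banana"], dictionary));
// "button ■■■■■■lord ■■■■"
```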


(Mr.Burns avatar therefor TDWTF) #8

That could lead to some confusion if your dictionary of OK words isn’t big enough, which could be problematic depending on what kinds of things are getting added to a banned word list. Getting the forum admins to find a proper dictionary could make this work, but you would need instructions about getting dictionaries and examples of what the problem is.

If you don’t allow the admins to set the dictionary, then you are leaving gaping holes for those running non-English versions.


(Robin Ward) #9

It would mean sending an entire dictionary to the client app when this happens, which is not ideal either.

Maybe I should change it to word boundaries after all? The question is whether there would be more intra-word banned words than clbuttic mistakes.


(mott555) #10

How difficult would it be to do the censoring on the server end? Then there is no dictionary to send to the client: keep it local to the server and do all the work there when loading posts from the database. That way, if the censor list changes, old posts are automagically updated as needed.


(Mr.Burns avatar therefor TDWTF) #11

Wait, the censoring is done client side?


(Robin Ward) #12

The text rendering code path is entirely JavaScript, and the same code runs on the client and server. So yes, it’s client side for the preview, then re-done on the server side using the same code. I think you guys would have figured it out pretty quickly otherwise :smile:

It would be possible, but I like the fact that users see their words being censored as they type them. It communicates to them immediately that the word is not going to go through.

I am leaning towards the boundary check.


(Sam Saffron) #16

Simplest thing is to add a trivial DSL @eviltrout

*boat*,ship would mean: everything containing the word boat, and ship on a word boundary.

This also allows for *ass (ending in ass), ass* (leading with ass), and ass (plain old ass).
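A quick sketch of how such a DSL might compile down to a regex. Nothing here exists in Discourse; `patternToSource` and `compileCensor` are hypothetical names, and masking the whole matching word (rather than only the banned part) is an assumption on my part.

```javascript
// Escape regex metacharacters so the literal part is matched as-is.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn one pattern ("*boat*", "*ass", "ass*" or "ship") into a regex source.
// A leading or trailing * allows extra word characters on that side.
function patternToSource(pattern) {
  const leading = pattern.startsWith("*");
  const trailing = pattern.length > 1 && pattern.endsWith("*");
  const core = pattern.slice(leading ? 1 : 0, trailing ? -1 : undefined);
  if (core === "") return null; // ignore empty or wildcard-only patterns
  return (
    "\\b" + (leading ? "\\w*" : "") + escapeRegExp(core) +
    (trailing ? "\\w*" : "") + "\\b"
  );
}

// Compile a comma-separated spec into a function that masks matching words.
function compileCensor(spec) {
  const sources = spec
    .split(",")
    .map((p) => patternToSource(p.trim()))
    .filter((s) => s !== null);
  if (sources.length === 0) return (text) => text;
  const regex = new RegExp(sources.join("|"), "gi");
  return (text) => text.replace(regex, (m) => "■".repeat(m.length));
}

const censor = compileCensor("*boat*,ship");
console.log(censor("sailboats ship and shipping"));
// "■■■■■■■■■ ■■■■ and shipping"
```

With this reading, *ass would mask both "class" and "smartass" while leaving "assume" alone, and ass* would do the reverse.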


(Mr.Burns avatar therefor TDWTF) #17

Plus if you are censoring “ass” checking for a single word boundary would still catch things like class, assume, and so on. You would avoid the clbuttic mistake, but still hit far too many to say it is a clean fix.


(Robin Ward) #18

I think it’s easier to default to word boundaries for now. Even with the DSL it’s not clear when to allow “smartass” for example.

The goal was never to get 100% of the censored words, just to stop the easiest ones then allow moderation for the rest. In this case “ass” was preventing “class” which is really no good.

https://github.com/discourse/discourse/commit/3b38667274fe96977e8852d1415eb58537d1d7cd


Word censoring does not respect word boundaries in topic titles
(Ilya Kuchaev) #19

http://ruby.bastardsbook.com/chapters/regexes/

There is a great exercise there about the struggle with “ass” :slight_smile:


(Dean Taylor) #20

So this one doesn’t get missed, I created a separate bug:
https://meta.discourse.org/t/word-censor-requires-restart-to-clear-word-list/20989?u=deanmarktaylor&source_topic_id=20968


(Dean Taylor) #21

@eviltrout I would love to try this. FYI, I can’t, as it seems that the tests-passed branch has not been updated for an hour.


(Robin Ward) #22

Oops, sorry about that. It should build again soon.


(Jeff Atwood) #23