Censored words out of a URL

markdown-it-review

(Jesse Perry) #1

Not sure if this should be classified as a bug or as intended behavior, but I noticed that Discourse will censor censored words inside a URL, even when the URL itself isn’t shown. So if someone links to something that contains a censored word, but they link it through the hyperlink button, Discourse will still censor the words and break the link.

For that matter, perhaps the censored word feature shouldn’t mess with URLs at all, even if they do show? Or if a URL contains a censored word, hide the URL altogether, but just force a hyperlink to it?

All minutiae I’m not sure of the answer to…


(Mittineague) #2

How big of a list of censored words do you have?
And are they interfering with legitimate URLs?

If so, perhaps you can come up with something more specific to differentiate them.

Example?


(Jesse Perry) #3

The URL had Moby-Dick in the title. Which, as I type this, I realize is a poorly thought-out censored word.


(Mittineague) #4

Well, Dick can certainly be used legitimately.

It’s all about context.

For example,
“She lifted his big package”
could be more offensive than
“My friend Dick is a Ruby developer”.

lol I was just going to post something along that line.

I think in this case it would be better to moderate it on an individual context basis.

I was thinking of the classic example “sex” as in expertsexchange.com
You wouldn’t want that to come out as a link like expert■■■change.com
as nice as it might be to foil attempts to link drop online■■■tapes.com

And as has been posted in other topics here, the censored word list is not foolproof. Easy enough to post
Hey D-I-C-K
There are times when moderation needs to be done.
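
To make the expertsexchange.com and Moby-Dick cases concrete, here is a minimal sketch comparing substring matching with whole-word matching. It assumes the censor behaves like a case-insensitive substring replace, as the example above implies. The helper names (`mask`, `censorSubstring`, `censorWholeWord`) are made up for illustration and are not Discourse functions, and the words are assumed to be plain letters, so no regex escaping is done.

```javascript
// Replace each matched run with the same number of block characters.
const mask = (m) => "■".repeat(m.length);

// Substring matching: hits the word anywhere, even inside other words.
function censorSubstring(text, word) {
  return text.replace(new RegExp(word, "gi"), mask);
}

// Whole-word matching: \b requires a word/non-word transition on both sides.
function censorWholeWord(text, word) {
  return text.replace(new RegExp(`\\b${word}\\b`, "gi"), mask);
}

console.log(censorSubstring("expertsexchange.com", "sex")); // expert■■■change.com
console.log(censorWholeWord("expertsexchange.com", "sex")); // expertsexchange.com
console.log(censorWholeWord("Moby-Dick", "dick"));          // Moby-■■■■
```

Whole-word matching spares expertsexchange.com, but a hyphen counts as a word boundary, so Moby-Dick is still hit. That is the part that needs context and a human moderator.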


(Jesse Perry) #5

Thanks for the thoughtful response and clarification! I should tone down the automatic censorship a bit :slight_smile:


(ljpp) #6

I just encountered this as well. I have blocked about two dozen most offensive curse words of our language, mostly related to sexuality and human anatomy. In our culture these are not considered civilized discussion, and we don’t have an age limit for members.


(cpradio) #7

Also, I sort of want to keep censored words working on URLs… We actually use it as a quick way to combat some of the link dropping we get.

We’ve added things like adf.ly to our censor word list as users were trying to use it in a non-community-oriented way, so this way their links would break and it would be obvious to us mods that the link they had was not appropriate (so we should take action on their post).

I’m not saying that is a right solution, but so far, nothing else exists for blocking certain domains. And we wanted something semi-automated, so this fit in nicely.


(Sam Saffron) #8

Certainly nothing should break due to a censored word; the words in the URL are slugs anyway, so it should not matter.

What is the exact repro here?


(Mittineague) #9

Agreed 100+++%
Until there is a way to censor URLs I hope this approach remains as an alternative.

True, without regex support it can be tricky to come up with something that doesn’t have false positives.
But it is about the only thing that works so far short of inspecting raw / cooked on a post by post basis to moderate.

… off to look at some more files … :eye:


(Kane York) #10

repro:

  1. add adf.ly to censored words list
  2. post with http://adf.ly/SpONsoREDliNK
  3. post renders with <a href="http://■■■■■■/SpONsoREDliNK">http://■■■■■■/SpONsoREDliNK</a>
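
The repro above can be imitated with a small sketch. This is not Discourse's code. It assumes a naive censor that runs a case-insensitive replace over the whole cooked HTML string, which is enough to produce the same broken `href`. The names `escapeRegExp` and `naiveCensor` are hypothetical.

```javascript
// Escape regex metacharacters so "adf.ly" matches a literal dot.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical naive censor: replaces every occurrence of a censored
// word anywhere in the cooked HTML, including inside href attributes.
function naiveCensor(html, censoredWords) {
  return censoredWords.reduce((out, word) => {
    const pattern = new RegExp(escapeRegExp(word), "gi");
    return out.replace(pattern, (m) => "■".repeat(m.length));
  }, html);
}

const cooked =
  '<a href="http://adf.ly/SpONsoREDliNK">http://adf.ly/SpONsoREDliNK</a>';

console.log(naiveCensor(cooked, ["adf.ly"]));
// <a href="http://■■■■■■/SpONsoREDliNK">http://■■■■■■/SpONsoREDliNK</a>
```

Because the replace does not distinguish attribute values from visible text, the link target is rewritten along with the label.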

(cpradio) #11

If that changes, can we implement something similar with Screen URLs? We really need this to function, as it was and remains a problem for us, and censored words was our “workaround”.


(Sam Saffron) #12

Sorry, just to expand, you are looking for a function that allows you to ban certain domains in hyperlinks… like "you shall never link to bit.ly" ? (fair warning, even if implemented there are :mount_fuji:s of workaround)


(Mittineague) #13

I’ve tried as many ways as I could think of to get around the current effect of censored words blocking unwanted URLs, and could not find any way that resulted in working links.


(Sam Saffron) #14

You simply link to a site that 302s to the URL you are after, like bit.ly or family, there are 100s of these sites around.
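
As a rough illustration of both the requested feature and this workaround, here is a sketch of a hostname blocklist check. It is not an existing Discourse setting or API. `blockedDomains` and `isBlockedLink` are hypothetical names, the entries are just the domains mentioned in this thread, and `redirector.example` stands in for any site that 302s elsewhere.

```javascript
// Hypothetical blocklist; the entries are taken from examples in this thread.
const blockedDomains = ["adf.ly", "bit.ly"];

// True when the link's hostname is a blocked domain or a subdomain of one.
function isBlockedLink(href, blocked) {
  let host;
  try {
    host = new URL(href).hostname.toLowerCase();
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return blocked.some((d) => host === d || host.endsWith("." + d));
}

console.log(isBlockedLink("http://adf.ly/SpONsoREDliNK", blockedDomains));   // true
console.log(isBlockedLink("http://redirector.example/abc", blockedDomains)); // false
```

The second call shows the limit: the check only sees the hostname in the post, so a redirecting site that is not on the list passes, wherever it ends up sending the visitor.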


(cpradio) #15

We ban most of those too. But yes, we have fairly strict policies on links on our site. Shortened URLs are not permitted, which rules out many of them. Plus we’ve had issues in the past with some sites generating 10-50 users over the course of months to find/dredge up topics to offer their services that are in our list.

And yes, we do eventually ban these members, but by stopping their instant gratification of seeing their link in a working order, they do quickly stop trying. Many also use their real URL because they magically believe we’re a “do-follow” site (even though we do utilize the no-follow setting in our install). So for that reason, they typically avoid URL shorteners.


(Sam Saffron) #16

The censor only applies to text now per:

https://github.com/discourse/discourse/blob/master/app/assets/javascripts/pretty-text/engines/markdown-it/censored.js.es6#L18-L22
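
For readers who don't want to open the file, the idea of censoring only text can be sketched as follows. This is not the code at the link above. It is a simplified stand-in that assumes markdown-it style tokens with `type`, `content` and `attrs` fields, and `censorTextTokens` is a hypothetical name.

```javascript
// Censor only tokens of type "text"; link attributes are left untouched.
function censorTextTokens(tokens, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(escaped, "gi");
  return tokens.map((t) =>
    t.type === "text"
      ? { ...t, content: t.content.replace(pattern, (m) => "■".repeat(m.length)) }
      : t
  );
}

// Simplified token stream for a bare link to the URL from the repro.
const tokens = [
  { type: "link_open", attrs: [["href", "http://adf.ly/SpONsoREDliNK"]] },
  { type: "text", content: "http://adf.ly/SpONsoREDliNK" },
  { type: "link_close" },
];

const result = censorTextTokens(tokens, "adf.ly");
console.log(result[0].attrs[0][1]); // http://adf.ly/SpONsoREDliNK
console.log(result[1].content);     // http://■■■■■■/SpONsoREDliNK
```

Under these assumptions the visible label is still masked, but the `href` keeps working, so a censored word no longer breaks the link.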

fixed in:


(Sam Saffron) #17