Uppercase letter detection appears to be ignoring accented letters


Just one is enough to get past the ‘is this content’ filter, as the post below shows.

Not a biggie, just seems a little inconsistent, s’all.



(Robin Ward) #3

The Uppercase detection is one of those features where we just handle the simplest and most basic version of the issue and leave it up to moderators to enforce it otherwise.

Why? Well, as you noticed, there are hundreds of thousands of Unicode code points that would break it. It is just not practical to get them all when it’s easier to tell a user, “hey, stop doing that!”


True, I guess, but IIRC the .NET Framework has an API for asking whether a letter is upper or lower case; does whatever Discourse runs on (Ruby on Rails?) not have an equivalent? Or would that slow things down too much?

(Robin Ward) #5

Ruby provides an API to do this, and we use it; however, it is only effective for ASCII. So when non-ASCII characters are present, we skip the check. We were bitten by this previously with foreign languages.
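
To make the skip concrete, here is a minimal sketch of that behaviour. The method name `shouting?` is hypothetical and this is not Discourse's actual code; it only assumes the check is bypassed whenever `String#ascii_only?` returns false.

```ruby
# Hypothetical illustration of an ASCII-only all-caps check.
# Not taken from the Discourse source.
def shouting?(text)
  # Assumption: any non-ASCII character causes the check to be skipped.
  return false unless text.ascii_only?

  # "All caps" here means: at least one A-Z and no a-z.
  text.match?(/[A-Z]/) && !text.match?(/[a-z]/)
end

shouting?("HELLO WORLD")  # flagged
shouting?("HELLO WÖRLD")  # skipped because of the single "Ö"
```

Under that assumption, a single accented letter is enough to bypass the filter, which matches what the original post describes.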


Well, that’s a bit… huh.

Eh, may as well close this now I guess.

(Sam Saffron) #8

You can actually do this in Ruby; it just means you need to be a tad more fancy:

# Matches any Unicode lowercase letter, not just ASCII a-z
utf_pattern = Regexp.new("\\p{Lower}".force_encoding("UTF-8"))

a = "Go234"
a.match(utf_pattern) # => #<MatchData "o">

b = "GO234"
b.match(utf_pattern) # => nil

b = "ÜÖ234"
b.match(utf_pattern) # => nil

b = "Über234"
b.match(utf_pattern) # => #<MatchData "b">

(Kane York) #9

Don’t forget Chinese/Japanese/Korean!

(Robin Ward) #10

@neil is there a reason you didn’t use this approach? Looks like you were the one who did the ASCII change.

(Neil Lalonde) #11

I have no memory of this… It should use that approach. Also, can you do ALL CAPS in Chinese/Japanese/Korean??

(Kane York) #12

I was saying to make sure that the behavior was correct, as it looks like that regex checks for “any lowercase”.
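
A rough sketch of the concern, with hypothetical method names: scripts without letter case have no lowercase letters at all, so a test that only looks for "any lowercase" would treat every Chinese, Japanese, or Korean post as shouting. One possible guard, shown here as an assumption and not as what Discourse does, is to also require at least one uppercase letter.

```ruby
# Hypothetical helpers; names and approach are illustrative only.
LOWER = /\p{Lower}/
UPPER = /\p{Upper}/

# Naive version: "no lowercase letters" is treated as all caps.
def naive_all_caps?(text)
  !text.match?(LOWER)
end

# Guarded version: also require at least one uppercase letter,
# so text in caseless scripts is never flagged.
def all_caps?(text)
  text.match?(UPPER) && !text.match?(LOWER)
end

naive_all_caps?("你好世界")  # flags caseless text
all_caps?("你好世界")        # does not flag it
all_caps?("ÜÖ234")           # still catches accented capitals
```

The guarded version also leaves digit-only or punctuation-only text alone, since it contains no uppercase letters either.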

(Robin Ward) #13

Huh, I found this commit, but maybe you took the approach from someone else?