Uppercase letter detection appears to be ignoring accented letters

RaceProUK · 25 March 2015 11:35

Just one accented capital letter is enough to get past the ‘is this content’ filter, as the post below shows.

Not a biggie, just seems a little inconsistent, s’all.

RaceProUK · 25 March 2015 11:35

ÁN EXAMPLE REPRODUCTION HERE TOO

eviltrout · 25 March 2015 15:02

The uppercase detection is one of those features where we handle only the simplest, most basic version of the problem and leave it to moderators to enforce the rule otherwise.

Why? Well, as you noticed, there are hundreds of thousands of Unicode code points that would break it. It is just not practical to handle them all when it’s easier to tell a user, “hey, stop doing that!”

RaceProUK · 25 March 2015 15:15

True, I guess, but IIRC the .NET Framework has an API for asking whether a letter is upper or lower case; does whatever Discourse runs on (Ruby on Rails?) not have an equivalent? Or would that slow things down too much?
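For comparison with .NET's `Char.IsUpper` / `Char.IsLower`, here is a minimal Ruby sketch of per-character case predicates. It assumes Ruby 2.4 or later, where `String#upcase` and `String#downcase` apply Unicode case mapping; in earlier versions, which were current when this thread was written, those methods only affected ASCII letters. `upper_char?` and `lower_char?` are hypothetical helper names, not part of Ruby or Discourse.

```ruby
# Per-character case predicates, roughly analogous to .NET's Char.IsUpper /
# Char.IsLower. Assumes Ruby 2.4+, where upcase/downcase are Unicode-aware;
# on older Rubies only 'a'..'z' and 'A'..'Z' would be recognized.
# upper_char? / lower_char? are hypothetical names used for illustration.

# Uppercase: unchanged by upcase, but changed by downcase.
def upper_char?(ch)
  ch.upcase == ch && ch.downcase != ch
end

# Lowercase: unchanged by downcase, but changed by upcase.
def lower_char?(ch)
  ch.downcase == ch && ch.upcase != ch
end

puts upper_char?("Á")  # => true
puts lower_char?("á")  # => true
puts upper_char?("字") # => false (ideographs have no case)
puts lower_char?("字") # => false
```

The design leans on case mapping rather than a character table: a character counts as uppercase only if lowering it would change it, so caseless characters such as digits and CJK ideographs fall into neither category.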

eviltrout · 25 March 2015 15:28

Ruby provides an API for this, and we use it, but it is only effective for ASCII, so when non-ASCII characters are present we skip the check. We were bitten by this previously with foreign languages.
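To illustrate why a single accented capital slips through, here is a minimal sketch of an ASCII-only check that, like the behavior described above, skips the test entirely when any non-ASCII character is present. This is an assumption-based illustration, not Discourse's actual code; `ascii_only_all_caps?` is a hypothetical name, and `String#match?` requires Ruby 2.4 or later.

```ruby
# Hypothetical sketch of an ASCII-only "all caps" check, NOT Discourse's code.
# Any non-ASCII character makes the check bail out, which is why one
# accented capital (e.g. "Á") lets a shouted title through.
def ascii_only_all_caps?(text)
  # Skip the check as soon as non-ASCII text is present.
  return false unless text.ascii_only?

  # Shouting = at least one A-Z letter and no a-z letters.
  text.match?(/[A-Z]/) && !text.match?(/[a-z]/)
end

puts ascii_only_all_caps?("AN EXAMPLE REPRODUCTION HERE TOO") # => true
puts ascii_only_all_caps?("ÁN EXAMPLE REPRODUCTION HERE TOO") # => false
puts ascii_only_all_caps?("An example reproduction here too") # => false
```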

RaceProUK · 25 March 2015 15:50

Well, that’s a bit… huh.

Eh, may as well close this now I guess.

sam · 25 March 2015 22:59

You can actually do this in Ruby; it just means you need to be a tad more fancy:

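# \p{Lower} matches any Unicode lowercase letter, not just a-z
utf_pattern = Regexp.new("\\p{Lower}".force_encoding("UTF-8"))

# Mixed case: the first lowercase letter is found
a = "Go234"
a.match(utf_pattern) # => #<MatchData "o">

# ASCII all caps: no lowercase letters
b = "GO234"
b.match(utf_pattern) # => nil

# Accented all caps: still no lowercase letters
b = "ÜÖ234"
b.match(utf_pattern) # => nil

# Accented capital followed by lowercase letters
b = "Über234"
b.match(utf_pattern) # => #<MatchData "b">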

riking · 26 March 2015 06:47

Don’t forget Chinese/Japanese/Korean!

eviltrout · 26 March 2015 15:28

@neil, is there a reason you didn’t use this approach? Looks like you were the one who made the ASCII change.

neil · 26 March 2015 15:59

I have no memory of this… It should use that approach. Also, can you do ALL CAPS in Chinese/Japanese/Korean??

riking · 26 March 2015 18:06

I was saying to make sure the behavior is correct for those languages, since that regex only checks for “any lowercase”, and Chinese, Japanese, and Korean text has no lowercase letters at all.
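One way to address that concern, sketched under assumptions (this is not Discourse's implementation, and `all_caps?`, `UPPERCASE`, and `LOWERCASE` are hypothetical names): pair the lowercase check with an uppercase check, and treat text as all caps only when it has at least one uppercase letter and no lowercase ones. Caseless scripts such as Chinese, Japanese, and Korean contain neither, so they are never flagged. `String#match?` requires Ruby 2.4 or later.

```ruby
# Hypothetical sketch, not Discourse's implementation.
# Rule: text is "all caps" only if it contains at least one uppercase
# letter AND no lowercase letters. Both checks use Unicode properties,
# so accented capitals count as uppercase, while caseless scripts
# (Chinese, Japanese, Korean) match neither and are never flagged.
UPPERCASE = /\p{Upper}/u
LOWERCASE = /\p{Lower}/u

def all_caps?(text)
  text.match?(UPPERCASE) && !text.match?(LOWERCASE)
end

puts all_caps?("ÁN EXAMPLE REPRODUCTION HERE TOO") # => true
puts all_caps?("Über234")                          # => false
puts all_caps?("日本語のタイトル")                 # => false (no cased letters)
puts all_caps?("1234")                             # => false (no letters at all)
```

Requiring an uppercase match is what keeps CJK titles safe: checking only for the absence of lowercase letters, as in the regex above, would wrongly treat any caseless text as shouting.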

eviltrout · 26 March 2015 18:11

Huh, I found this commit, but maybe you took the approach from someone else?

https://github.com/discourse/discourse/commit/876a570e3a2e227528d135a0cc67cccf442baaf1

Related topics

Topic	Category	Replies	Views	Activity
Force Lowercase slug URLs when set to "encoded"	Support	22	4540	20 June 2016
Username completition broken for names with accents like Régis	Bug	8	1687	22 August 2018
When watched words regular expressions is true, watched words does not allow uppercase regex	Bug	1	1173	9 January 2018
Unicode username with Σ as the final char leads to an error loading profile page	Bug	36	2419	23 February 2021
Add tags with capital letters	Feature (completed, tags)	4	1652	7 September 2025
