Unicode username with Σ as the final character leads to an error loading the profile page

Can this also affect user-related slugs (those containing the username)?
We have a few users with UTF-8 usernames, and some of them can’t access their profiles…

It should not affect users at all, as the routes are completely different.

Can you share a link where a profile fails to load? Or at least an example of a username that triggers the bug?

This is one case: https://rembetiko.gr/u/σπυρος

The username is ΣΠΥΡΟΣ (the uppercase form of σπυρος).


Sorry for the Greek :sweat_smile:

Is this page behind a Cloudflare proxy? Can you test with it turned off?

Also, what are the values of these settings:

  • allowed unicode username characters

  • unicode usernames

Here are the values (pretty standard :sweat_smile:)

And yes, the site is behind a Cloudflare proxy.

I just disabled the proxy and tested again. Unfortunately, the issue persists. I will keep the proxy disabled for a bit so that you can test it yourself if you want :slight_smile:


Thank you very much for your help! :smiley:

Hmmm.

If I load the uppercase URL, the page (titled “Προφίλ - ΣΠΥΡΟΣ - Ρεμπέτικο Φόρουμ”, i.e. “Profile - ΣΠΥΡΟΣ - Rembetiko Forum”) loads at first, but then a subsequent JSON fetch for the lowercase version fails.

Looks like a mismatch in our uppercase/lowercase handling.

Strangely this one works:

https://rembetiko.gr/u/αγγελικη_ντοτη

The username is ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ (again, the uppercase form of αγγελικη_ντοτη).

Could it be because in Greek there are two ways to lowercase the letter ‘Σ’?

  • ‘ς’ when it is used at the end of a word
  • ‘σ’ anywhere else
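That rule lines up with the two examples above: ΣΠΥΡΟΣ ends in a capital Σ, while ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ contains no Σ at all. Here is a minimal Ruby sketch of a check for affected usernames. The helper name `final_capital_sigma?` is made up for illustration, and it implements only a simplified form of the Unicode Final_Sigma condition (a cased letter right before the Σ, no cased letter right after it). The full rule also skips over “case-ignorable” characters such as apostrophes and combining marks, which this sketch does not do.

```ruby
# Hypothetical helper: flags usernames with a word-final capital Σ, i.e. the
# ones whose lowercase form should end in "ς" rather than "σ".
# Simplified Final_Sigma condition: a cased letter (Lu, Ll or Lt) right
# before the Σ and no cased letter right after it. Unicode "case-ignorable"
# characters (apostrophes, combining marks, ...) are NOT skipped here.
CASED = '[\p{Lu}\p{Ll}\p{Lt}]'
FINAL_CAPITAL_SIGMA = /(?<=#{CASED})Σ(?!#{CASED})/

def final_capital_sigma?(username)
  FINAL_CAPITAL_SIGMA.match?(username)
end

%w[ΣΠΥΡΟΣ ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ ΣΟΦΙΑ].each do |name|
  puts "#{name}: #{final_capital_sigma?(name)}"
end
# ΣΠΥΡΟΣ: true          (trailing Σ)
# ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ: false (no Σ at all)
# ΣΟΦΙΑ: false          (Σ only at the start of the word)
```

A Σ at the start of a word is unaffected because it has no cased letter before it, so both lowercase forms agree on “σ” there.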

So is this wrong?

```
[1] pry(main)> "ΣΠΥΡΟΣ".downcase
=> "σπυροσ"
```

Yes, grammatically it is wrong. The correct form is “σπυρος”.

Oh, so I’m afraid that is a Ruby bug:

```
➜  ruby --version
ruby 3.0.0dev (2020-12-16T18:46:44Z master 93ba3ac036) [x86_64-linux]
➜  irb
irb(main):001:0> "ΣΠΥΡΟΣ".downcase
=> "σπυροσ"
```

But the link itself is generated correctly… so there must be a way to make it work (?)

https://rembetiko.gr/u/σπυρος

It shouldn’t matter whether Ruby converts the username into the grammatically correct lowercase form, as long as it always looks up users by the normalized username (User.normalize_username in Ruby) against the username_lower column in the database.
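To illustrate the point: a lookup works for every spelling as long as both the stored key and every incoming name go through the same normalization. In this sketch, an in-memory hash stands in for the username_lower column, and `normalize` is a hypothetical stand-in for User.normalize_username, not its actual implementation. It uses Ruby’s case folding (`downcase(:fold)`), which maps Σ, σ and ς all to σ, so “σπυρος” from the URL and the stored “ΣΠΥΡΟΣ” end up with the same key.

```ruby
# Hypothetical stand-in for User.normalize_username (not its real code).
# Case folding maps Σ, σ and ς all to σ, so every spelling gets one key.
def normalize(username)
  username.downcase(:fold)
end

# In-memory stand-in for the users table, indexed like username_lower.
USERS_BY_KEY = {}

def add_user(username)
  USERS_BY_KEY[normalize(username)] = { username: username }
end

# Every lookup goes through the same normalize function as the insert did.
def find_user(name_from_url)
  USERS_BY_KEY[normalize(name_from_url)]
end

add_user("ΣΠΥΡΟΣ")

puts find_user("σπυρος")[:username] # ΣΠΥΡΟΣ (URL form with final ς)
puts find_user("σπυροσ")[:username] # ΣΠΥΡΟΣ
puts find_user("ΣΠΥΡΟΣ")[:username] # ΣΠΥΡΟΣ

# A key computed some other way and used directly skips normalize and misses.
p USERS_BY_KEY["σπυρος"] # nil
```

The last lookup is the failure mode in miniature: once any code path builds the key with a different lowercasing rule, the stored key and the requested key stop matching.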

Which JSON fetch fails? It’s quite likely that there’s a route that uses a different mechanism for comparing usernames.

Maybe it’s because Ruby and JS have different implementations?

```
➜  ruby --version
ruby 3.0.0dev (2020-12-16T18:46:44Z master 93ba3ac036) [x86_64-linux]
➜  irb
irb(main):001:0> "ΣΠΥΡΟΣ".downcase
=> "σπυροσ"
➜  node
Welcome to Node.js v12.11.1.
Type ".help" for more information.
> "ΣΠΥΡΟΣ".toLowerCase()
'σπυρος'
```

Firefox does the same as NodeJS in my tests.

The endpoint /u/#{username}.json only returns the username column, not username_lower, so maybe we are relying on the browser to lowercase it here? Digging into it now…

Oh, that’s bad. So, the problem is probably this:

https://github.com/discourse/discourse/blob/9870a0b6a1ba86e1e6192d32145507acbd53d43a/app/assets/javascripts/discourse/app/models/user.js#L264-L267

I was going to suggest adding username_lower to the UserSerializer on the server instead of computing it on the client, but that would still leave us with a couple of other occurrences of username.toLowerCase.
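A rough sketch of the serializer idea in plain Ruby, using a hypothetical `UserRecord` struct and `serialize_user` method rather than Discourse’s actual UserSerializer: the server sends the username_lower value it already stores, so the client can build URLs from it instead of recomputing it with toLowerCase.

```ruby
require "json"

# Hypothetical stand-in for a user row; username_lower is whatever the server
# stored when the user was created (here simply Ruby's downcase).
UserRecord = Struct.new(:id, :username, :username_lower)

# Hypothetical serializer: expose the stored key instead of letting the
# client derive it from username.
def serialize_user(user)
  { id: user.id, username: user.username, username_lower: user.username_lower }
end

user = UserRecord.new(1, "ΣΠΥΡΟΣ", "ΣΠΥΡΟΣ".downcase)
json = JSON.generate(serialize_user(user))
puts json

# What the client would do with the payload: build the profile URL from the
# server's key, never from its own lowercasing of username.
payload = JSON.parse(json)
puts "/u/#{payload['username_lower']}.json"
```

As noted above, this only covers the places that receive a serialized user; any other client code that lowercases usernames on its own would still disagree with the server.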

I wonder if a better solution would be to use mini_racer to calculate username_lower on the server when the username contains non-ASCII characters. :thinking:

Well, regardless of which workaround we pursue, I will report this to Ruby.

Just for reference, PHP does it the same way as Ruby… which makes me think it is an intentional design (?)

You can test the code here:

Interestingly, Postgres also fails here:

```
[2] pry(main)> DB.query_single('select lower(?)', 'ΣΠΥΡΟΣ')
=> ["σπυροσ"]
```

Perhaps we should simply special-case this quirk in our internal Discourse method that calculates username_lower?

Find all the methods that compute username_lower, pipe them through a central function, and handle this special case there (I guess we can use a mini_racer call here if we wish, or simply lowercase and fix it up afterwards with a sub call).
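A minimal sketch of such a central function, assuming the goal is to match what JavaScript’s toLowerCase produces. The helper name `username_lower` and the simplified Final_Sigma rule are assumptions (case-ignorable characters are not handled). One deliberate deviation from the suggestion: the sub runs before downcase rather than after, and only on capital Σ. JavaScript applies the final-sigma rule only when lowercasing a capital Σ and leaves an already-lowercase σ alone, so fixing things up afterwards would also rewrite a σ the user typed on purpose.

```ruby
# Hypothetical central helper for computing username_lower.
# Simplified Final_Sigma rule: a capital Σ with a cased letter right before it
# and no cased letter right after it lowercases to "ς"; every other Σ to "σ".
CASED = '[\p{Lu}\p{Ll}\p{Lt}]'
WORD_FINAL_CAPITAL_SIGMA = /(?<=#{CASED})Σ(?!#{CASED})/

def username_lower(username)
  # Fast path: plain ASCII names have no sigma to worry about.
  return username.downcase if username.ascii_only?

  # Map word-final capital Σ to ς first (ς is already lowercase, so downcase
  # keeps it), then let downcase handle everything else.
  username.gsub(WORD_FINAL_CAPITAL_SIGMA, "ς").downcase
end

%w[ΣΠΥΡΟΣ ΣΟΦΙΑ σπυροσ Alice].each do |name|
  puts "#{name} -> #{username_lower(name)}"
end
# ΣΠΥΡΟΣ -> σπυρος
# ΣΟΦΙΑ -> σοφια
# σπυροσ -> σπυροσ
# Alice -> alice
```

If something like this were adopted, username_lower values already stored under the old mapping would presumably need to be recomputed so that existing users keep matching.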

Updating the OP title here to make it clearer.

Given:

```
[4] pry(main)> "σπυρος".downcase
=> "σπυρος"
```

@chrispanag, a trivial workaround for you is simply to change the username to σπυρος: username and username_lower will then be exactly the same, and this will just work.

I have mixed feelings about adding workarounds to core just for this specific case, especially when a totally trivial workaround exists.

Additionally, you could ban the use of Σ in usernames with our allowed unicode username characters setting, which would ensure this issue never comes up.
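The effect of banning Σ can be sketched with a standalone Ruby validator. This is not Discourse’s validation code, and it does not show the exact format the setting expects. The allowed set here (Greek letters except capital Σ, plus ASCII letters, digits, `_`, `.` and `-`) is an assumption for illustration. The pattern uses Ruby’s character-class intersection (`&&`), which not every regex engine supports.

```ruby
# Hypothetical allowed-character rule: Greek letters (including σ and ς) plus
# ASCII letters, digits, "_", "." and "-", intersected with "anything but Σ".
ALLOWED_USERNAME = /\A[\p{Greek}a-zA-Z0-9_.\-&&[^Σ]]+\z/

def allowed_username?(name)
  ALLOWED_USERNAME.match?(name)
end

%w[σπυρος ΣΠΥΡΟΣ ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ].each do |name|
  puts "#{name}: #{allowed_username?(name)}"
end
# σπυρος: true
# ΣΠΥΡΟΣ: false
# ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ: true
```

Banning only the capital letter is enough here, because a lowercase σ or ς lowercases to itself in both Ruby and JavaScript.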

I’m all for fixing Ruby and Postgres here, but getting these things fixed upstream is one long, multi-year battle.

I fully agree: we report the upstream bugs, and Discourse users can use the existing tools to work around the issue in the meantime.
