Profile page fails to load for Unicode usernames ending in Σ

Can this also affect user-related slugs (those containing the username)?
We have a few users that use UTF-8 usernames and some of them can’t access their profiles…

2 Likes

It should not affect users at all, as the routes are completely different.

Can you share a link where a profile fails to load? Or at least an example of a username that triggers the bug?

4 Likes

This is one case: https://rembetiko.gr/u/σπυρος

The username is ΣΠΥΡΟΣ (which is the capital form of σπυρος)


Sorry for the Greek :sweat_smile:

2 Likes

Is this page behind a Cloudflare proxy? Can you test with that off?

Also, what are the values of these settings:

  • allowed unicode username characters

  • unicode usernames

3 Likes

Here are the values (pretty standard :sweat_smile:)

Yes

I just disabled the proxy, and tested again. The issue unfortunately persists. I will keep the proxy disabled for a bit so that you can test it yourself if you want :slight_smile:


Thank you very much for your help! :smiley:

2 Likes

Hmmm.

If I try to load the uppercase URL, it loads at first (Προφίλ - ΣΠΥΡΟΣ - Ρεμπέτικο Φόρουμ) and then fails on a subsequent JSON fetch for the lowercase version.

Looks like a mismatch in our upper/lowercase handling.

6 Likes

Strangely this one works:

https://rembetiko.gr/u/αγγελικη_ντοτη

The username is ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ (again capital form of αγγελικη_ντοτη)

2 Likes

Could it be because in Greek there are two ways to lowercase the letter ‘Σ’?

  • ‘ς’ when it is used at the end of a word
  • ‘σ’ anywhere else
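
To make the two forms concrete, here is a minimal Ruby sketch. The constant names are made up for illustration; it only shows that the two lowercase sigmas are distinct code points that share a single uppercase form, which is why lowercasing Σ needs context to pick the right one.

```ruby
# Illustrative constants (hypothetical names, not from any library).
FINAL_SIGMA  = "ς" # U+03C2, used at the end of a word
MEDIAL_SIGMA = "σ" # U+03C3, used anywhere else

puts format("U+%04X", FINAL_SIGMA.ord)   # U+03C2
puts format("U+%04X", MEDIAL_SIGMA.ord)  # U+03C3

# Both lowercase forms map to the same capital letter ...
puts FINAL_SIGMA.upcase == MEDIAL_SIGMA.upcase  # true

# ... so uppercasing a name discards which sigma was used.
puts "σπυρος".upcase  # ΣΠΥΡΟΣ
```
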
2 Likes

So is this wrong?

[1] pry(main)> "ΣΠΥΡΟΣ".downcase
=> "σπυροσ"
2 Likes

Yes, grammatically it is wrong. The correct form should be “σπυρος”

2 Likes

Oh, so I’m afraid that is a Ruby bug:

➜  ruby --version           
ruby 3.0.0dev (2020-12-16T18:46:44Z master 93ba3ac036) [x86_64-linux]
➜  irb           
irb(main):001:0> "ΣΠΥΡΟΣ".downcase
=> "σπυροσ"
3 Likes

But it works nicely when the link is created… So there must be a way for it to work… (?)

https://rembetiko.gr/u/σπυρος

2 Likes

It shouldn’t matter whether Ruby converts the username into the grammatically correct lowercase form, as long as we always look up users by the normalized username (User.normalize_username in Ruby) and the username_lower column in the database.

Which JSON fetch fails? It’s quite likely that there’s a route that uses a different mechanism for comparing usernames.
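
A toy model of how such a mismatch could play out, written as a hedged Ruby sketch. Nothing here is Discourse code: `USERS_BY_LOWER`, `normalize`, `register` and `find_user` are hypothetical stand-ins, and `normalize` is assumed to be a plain `downcase`. The point is only that a lookup key lowercased elsewhere with a final sigma never matches a key stored with Ruby's `downcase`.

```ruby
# Toy in-memory index keyed by the lowercased username (hypothetical).
USERS_BY_LOWER = {}

# Assumed normalizer: plain Ruby downcase, as seen in the thread's pry output.
def normalize(username)
  username.downcase
end

def register(username)
  USERS_BY_LOWER[normalize(username)] = username
end

def find_user(requested)
  USERS_BY_LOWER[normalize(requested)]
end

register("ΣΠΥΡΟΣ") # stored under "σπυροσ" (medial sigma at the end)

p find_user("ΣΠΥΡΟΣ") # "ΣΠΥΡΟΣ": normalized the same way it was stored
p find_user("σπυροσ") # "ΣΠΥΡΟΣ": already matches the stored key
p find_user("σπυρος") # nil: a final sigma stays a final sigma under downcase
```

If the client derives the lowercase name with a context-aware lowercasing, it would send the third form, which is the case that misses in this model.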

4 Likes

Maybe it’s because Ruby and JS have different implementations?

➜  ruby --version           
ruby 3.0.0dev (2020-12-16T18:46:44Z master 93ba3ac036) [x86_64-linux]
➜  irb           
irb(main):001:0> "ΣΠΥΡΟΣ".downcase
=> "σπυροσ"
➜  node
Welcome to Node.js v12.11.1.
Type ".help" for more information.
> "ΣΠΥΡΟΣ".toLowerCase()
'σπυρος'

Firefox does the same as Node.js in my tests.

The endpoint /u/#{username}.json only brings the username column and not the username_lower column, so maybe we are relying on the browser here? Digging into it now…

6 Likes

Oh, that’s bad. So, the problem is probably this:

https://github.com/discourse/discourse/blob/9870a0b6a1ba86e1e6192d32145507acbd53d43a/app/assets/javascripts/discourse/app/models/user.js#L264-L267

I was going to suggest adding username_lower to the UserSerializer on the server instead of computing it on the client, but this would still leave us with a couple of other occurrences of username.toLowerCase.

I wonder if the better solution would be to use mini_racer for calculating username_lower on the server when it contains non-ASCII characters. :thinking:

6 Likes

Well, regardless of which workaround we pursue, I will report this to Ruby.

8 Likes

Just for reference, PHP does it the same way as Ruby… which makes me think that it is an intentional design (?)

You can test the code here:

1 Like

Interestingly Postgres also fails here:

[2] pry(main)> DB.query_single('select lower(?)', 'ΣΠΥΡΟΣ')
=> ["σπυροσ"]

Perhaps we should simply special-case this quirk in our internal Discourse method that handles calculating username_lower?

Find all the methods that call username_lower, pipe them through a central function, and handle this special case there. (I guess we can use a mini_racer call here if we wish, or simply call .lower and fix it up afterwards with a sub call.)
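
The "lowercase, then fix it up with a substitution" idea could look roughly like the Ruby sketch below. This is only an illustration under assumptions: `username_lower` and `FINAL_SIGMA_FIXUP` are hypothetical names, and the regex is a simplified approximation of the Unicode final-sigma rule (a σ preceded by a letter and not followed by one), not a full implementation of it.

```ruby
# Approximate final-sigma rule: a sigma that follows a letter and is not
# followed by another letter is treated as word-final.
FINAL_SIGMA_FIXUP = /(?<=\p{L})σ(?!\p{L})/

# Hypothetical central helper: Ruby's downcase plus the sigma fix-up.
def username_lower(username)
  username.downcase.gsub(FINAL_SIGMA_FIXUP, "ς")
end

puts username_lower("ΣΠΥΡΟΣ")          # σπυρος
puts username_lower("ΑΓΓΕΛΙΚΗ_ΝΤΟΤΗ")  # αγγελικη_ντοτη
puts username_lower("ΣΟΦΙΑ_ΣΟΣ")       # σοφια_σος
```

Keeping the fix-up in one function means every caller gets the same result, which matters more here than which lowercase form is chosen.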

Updating the OP title here to make it clearer.

3 Likes

Given:

[4] pry(main)> "σπυρος".downcase
=> "σπυρος"

@chrispanag, a trivial workaround for you is simply to change the username to σπυρος; username and username_lower will then be exactly the same and this will simply work.

I have mixed feelings about adding workarounds to core just for this specific case, especially when a totally trivial workaround exists.

Additionally, you could ban the use of Σ in usernames using our allowed unicode username characters setting, which would ensure this issue never pops up.

I am all for fixing Ruby and Postgres here, but getting these things fixed is one long, multi-year battle.

5 Likes

I fully agree here: we report the upstream bugs, and Discourse users can use existing tools to work around this in the meantime.

5 Likes