Unicode username with Σ as the final char leads to an error loading profile page

Firstly thank you very much for your help! It is an interesting bug nevertheless.

Thing is, I might need to change the username through the db, cause the route is not available :sweat_smile:

Can you provide me with a query?

I think this is not possible because then I would ban the many names that start with Σ in greece (and I will essentially break Facebook logins for those users). Can I ban a specific regex pattern? so that I ensure that the Σ is at least at the end?

1 Like

Another member registered with the exact same issue. My point here is that for us, this is not an edge-case. There are literally thousands of names in Greek that use the Σ (or ς) at the end of the name.

And unfortunately, given our primary age group (> 40), there are a lot of members that have their name written in all-caps on Facebook (:sweat_smile:) , so when they use Facebook login, their name is copied to the username field…

4 Likes

I wouldn’t worry about Postgres. We should always compare with username_lower in SQL and not rely on LOWER(), because username_lower isn’t just the lowercase version of the username. We apply Unicode normalization as well.

I agree. Adding a workaround to User.normalize_username should be enough for now. There’s already some talk in the Ruby bug and it looks like there’s no easy solution. However, we are lucky, because all we need to do is check the last character in a username. That’s a lot easier than doing it in a complete sentence.

4 Likes

Well… how would this username name be handled?

Caps: ΧΡΗΣΤΟΣ_ΠΑΝΑΓΙΩΤΑΚΟΠΟΥΛΟΣ
Small: χρηστος_παναγιωτακοπουλος or χρηστοσ_παναγιωτακοπουλος

NodeJs Handles this correctly, but if you only check the last character of the username, then the problem will persist.


(Almost all male names in Greek -both first and last- end with Σ)

3 Likes

Right. Still, it should be a lot easier than a full implementation, as we only have to worry about certain symbols like underscore and dash and maybe numbers? That should be doable.

2 Likes

I know that I’m moving away from this specific’s bug discussion, but when I think of this issue, I can’t dismiss the fact that this can be avoided.

I discovered that there are some API routes in discourse that refer to a user by userId and others that refer to a user by username. Shouldn’t this be more consistent (in favor of userId)?

Maybe implementing something like what happens with categories/tags now? Having both the username and the userId on the url eg: https://meta.discourse.org/u/chrispanag/4387

Just a thought :sweat_smile:

2 Likes

This will require a cleanup effort, I’ve definitely seen some LOWER() calls.

4 Likes

This has come up over the years: Update quotes and mentions when username is changed - #10 by sam

Back in 2016 @eviltrout was against it, not sure where he is sitting on it these days.

Anyway, I have a workaround in Discourse in this PR:

It will handle the facebook new signup issue by downcasing any username with sigma upfront. This means all you need to do is fix Spiros username and any other users with final sigma by downcasing and the issue should be gone long term.

(PR merge pending)

4 Likes

Thanks a lot @sam! Your help was immense.

Is there a way to do that from the rails console? (Because the user is broken, there is no way to do it through the admin panel…)

1 Like

Yes, this should do the trick

./launcher enter app
rails c
u = User.find_by(username: 'ΣΠΥΡΟΣ')
u.username = 'σπυρος'
u.username_lower = 'σπυρος'
u.save!
4 Likes

I still don’t love using ids everywhere but I understand there are a lot of cases when it makes sense.

In those I would prefer something like id-username where username is whatever we can put in a url. It could be ignored even by the router. But at least when sharing the link you’d have an idea what you are linking to.

6 Likes

Regarding PostgreSQL looks like setting up proper collations fixes the issue:

➜  discoursesmall git:(master) psql -d discourse_development
psql (13.1 (Ubuntu 13.1-1.pgdg20.10+1))
Type "help" for help.

discourse_development=# SELECT lower('ΣΠΥΡΟΣ');
 lower  
--------
 σπυροσ
(1 row)

discourse_development=# SELECT lower('ΣΠΥΡΟΣ' COLLATE "und-x-icu"); 
 lower  
--------
 σπυρος
(1 row)

Source: PostgreSQL: Re: BUG #15805: Problem with lower function for greek sigma (Σ) letter

7 Likes

Apparently there are other unicode characters that have similar transformations eg: ‘ß’ -> ‘SS’ (German)

It’s interesting to investigate how they are handled by Discourse…

Also, you could check out this resource too:

Personally I am a big fan of the Stack Overflow style routing for users:

https://stackoverflow.com/users/17174/sam-saffron

In Discourse land this would be:

https://meta.discourse.org/u/17174/sam-saffron

This kind of gives you the cake and let’s you eat it. Buy yeah I totally get the objection to “I don’t like 17174 anywhere in the URL, usernames are stable”

That said, we have survived for this long on our existing routes, its just that once every few years some edge cases pop up.

5 Likes

I think it would be helpful to use the id instead of the username (or both, but depending only on the id, as explained above) at least in the user profile page so that admin can change the username using the Discourse UI without having to run the command in the rails console if something goes wrong with the username.

We only have to worry about characters where the JavaScript and Ruby implementation of the conversion into lowercase differs. Not the other way around.

There is no rule how the German lowercase letter “ß” is converted into uppercase. It could be “SS”, “SZ” or even the new uppercase letter “ẞ” (yeah, there’s a subtle difference). Reversing that process is only possible for “ẞ” and that works correctly in Ruby and JavaScript.


I think we should do the following:

  1. Merge @sam’s workaround to fix the immediate issue
  2. Remove LOWER(username) from SQL queries, because it’s just a bad™ thing to do (e.g. missing Unicode normalization)
  3. Hope that Ruby fixes the underlying issue
  4. Long term: Think about adding user IDs to routes. I guess the hardest part will be figuring out how to handle quotes and mentions.
3 Likes

not sure if it’s related but we’ve got a few users with unicode names and these are scattered through the /log

ActionView::Template::Error (No route matches {:action=>"show", :controller=>"users", :username=>"ζηεδψ"}, possible unmatched constraints: [:username])