Unicode usernames and group names

@marguerite You should also remove mentions.js.es6 from the plugin. There’s no need to patch anything related to usernames anymore. Only your customizations for categories and tags might still be needed, but we will fix that as well.

The +? at the end of the regex isn’t needed.
Out of curiosity: Can this whitelist be used for both zh_CN and zh_TW or is there a difference?

6 Likes

@gerhard

I removed mentions.js.es6, do I need to remove override-username-match.js.es6 as well?

\p{Han} covers traditional and simplified Chinese. My whitelist will allow CJK usernames.
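As a quick check of that claim, here is a minimal Ruby sketch. The helper name and sample strings are made up for illustration. It shows that \p{Han} matches both simplified and traditional characters, but not kana or Hangul:

```ruby
# Matches strings made up entirely of characters with the Han script property.
HAN_ONLY = /\A\p{Han}+\z/

# Hypothetical helper, used only for this illustration.
def han_only?(name)
  HAN_ONLY.match?(name)
end

puts han_only?("汉语")  # simplified Chinese  => true
puts han_only?("漢語")  # traditional Chinese => true
puts han_only?("かな")  # Hiragana            => false
puts han_only?("한글")  # Hangul              => false
```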

2 Likes

For EU languages it might be easiest to just allow all extended Latin if possible, rather than hand-picking specific letters for every language. :slight_smile:

EDIT: Although, after reading a bit more about homograph attacks, that might not be the best idea after all. :blush:
Here are the characters for Czech:

ěščřžýáíéóůúďťň
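For illustration only, the Czech letters above can be wrapped in a character class together with ASCII letters, digits, and the underscore. The constant and helper names are made up, and this is a plain Ruby sketch, not a description of how Discourse applies its setting internally:

```ruby
# Only the lowercase letters listed in the post are included here.
CZECH_EXTRA = "ěščřžýáíéóůúďťň"

# ASCII letters, digits, underscore, plus the Czech letters above.
CZECH_NAME = /\A[A-Za-z0-9_#{CZECH_EXTRA}]+\z/

# Hypothetical helper: true when every character is in the class.
def czech_name?(name)
  CZECH_NAME.match?(name)
end

puts czech_name?("jiří")    # every letter is in the class
puts czech_name?("müller")  # "ü" is not in the Czech list
```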

8 Likes

Yes, you can remove that as well. Looks like that part of the plugin is broken anyway. User cards were refactored about a year ago.

4 Likes

Katakana and Hiragana are both Japanese. Hangul is Korean.

I love this work. And I think a default setting below should work:

  • zh_CN, zh_TW: \p{Han}. This covers Chinese characters. Some communities may want to allow additional characters, but those probably shouldn’t be the default.
  • ko: \p{Hangul}. Koreans don’t normally write Chinese characters. (I’ve heard some Chinese characters are still in use in Korean, though?)
  • ja: [\p{Han}\p{Katakana}\p{Hiragana}]. Japanese uses all of them.
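The suggested defaults can be sketched as a small lookup table. Everything here is hypothetical: the mapping, the helper name, and the assumption that ASCII letters, digits, and underscores are always allowed. Japanese is keyed by its ISO 639-1 code, ja.

```ruby
# Hypothetical mapping from locale to the extra characters suggested above.
ALLOWLISTS = {
  "zh_CN" => /\p{Han}/,
  "zh_TW" => /\p{Han}/,
  "ko"    => /\p{Hangul}/,
  "ja"    => /[\p{Han}\p{Katakana}\p{Hiragana}]/
}.freeze

# True when every character is ASCII alphanumeric/underscore
# or matches the allowlist for the given locale.
def allowed_name?(name, locale)
  extra = ALLOWLISTS.fetch(locale)
  name.each_char.all? { |ch| ch.match?(/[A-Za-z0-9_]/) || ch.match?(extra) }
end

puts allowed_name?("さくらサクラ桜", "ja")  # Hiragana + Katakana + Han
puts allowed_name?("さくら", "zh_CN")       # kana is outside \p{Han}
puts allowed_name?("한글", "ko")            # Hangul
```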

And maybe it’s good to mention reserved_usernames :sweat_smile: Unicode usernames do make it possible to create more names that pass for admin/moderator.
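The concern can be sketched in plain Ruby. The reserved list and helper names below are invented for illustration, and the mixed-script check is just one possible heuristic, not something this thread says Discourse does:

```ruby
# Hypothetical reserved list; the real reserved_usernames setting is configured per site.
RESERVED = %w[admin moderator].freeze

LATIN_NAME = "admin"
LOOKALIKE  = "\u0430dmin"  # first letter is CYRILLIC SMALL LETTER A (U+0430)

# Plain string comparison against the reserved list.
def reserved?(name)
  RESERVED.include?(name.downcase)
end

# Heuristic: flag names that mix Latin and Cyrillic letters.
def mixed_script?(name)
  name.match?(/\p{Latin}/) && name.match?(/\p{Cyrillic}/)
end

puts reserved?(LATIN_NAME)     # => true
puts reserved?(LOOKALIKE)      # => false, although it looks the same on screen
puts mixed_script?(LOOKALIKE)  # => true
```

A lookalike name slips past an exact-match reserved list, which is why the reserved names are worth reviewing once Unicode usernames are enabled.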

9 Likes

Thanks for the regular expressions and also for the tip regarding reserved usernames. I added a note in the first post.

5 Likes

How does this option affect migrations? I am migrating from Kunena with a script based on the “official” kunena3.rb script.

I have a user called abc-def (for example). It gets imported as abcdef.

Then I turned on this option for the unicode usernames and deleted that user, and re-ran the script. It was again imported as abcdef :frowning:

How can I ensure my user names with dashes don’t get changed during import?

Thanks!

The removal of the dash has nothing to do with Unicode usernames. That’s happening because the import script manipulates the username during the import.

https://github.com/discourse/discourse/blob/7f8cdea9244760b7f27bcebb86de0121006f0ce3/script/import_scripts/kunena3.rb#L89-L93

I don’t think there’s any need for that. Try replacing those lines with the following code snippet. It should work.

@users[u['id'].to_i] = { id: u['id'].to_i, username: u['username'], email: u['email'], created_at: u['registerDate'] }

7 Likes
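To see why a sanitizing step drops the dash while the suggested replacement keeps it, here is a small sketch. The `sanitize` helper is an assumption about the kind of manipulation an import script might perform, not a copy of the linked lines, and the sample row is made up:

```ruby
# Assumed manipulation: strip everything except ASCII letters, digits, and underscores.
def sanitize(username)
  username.gsub(/[^A-Za-z0-9_]/, "")
end

# Builds the user hash as in the suggested replacement,
# passing the username through unchanged.
def user_entry(u)
  { id: u['id'].to_i, username: u['username'], email: u['email'], created_at: u['registerDate'] }
end

row = { 'id' => '7', 'username' => 'abc-def', 'email' => 'abc@example.com', 'registerDate' => '2019-01-01' }

puts sanitize(row['username'])   # => abcdef
puts user_entry(row)[:username]  # => abc-def
```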

I’m happy to see support for Unicode usernames and group names :+1:.

With the introduction of support for Unicode usernames, however, there’s now a bit of an odd situation where Discourse can support something like 中国 or ไทย as a username, but not -dashing-, as it still requires the first and last character to be a letter, number or underscore (but not a dash).

I tried using the Unicode support setting to add support for the dash character but that didn’t seem to work for me, although I may have missed something.

Would it make sense to revise this rule about the dash in the first and last positions now that Unicode is supported? Is there a reason to keep disallowing the dash in the first and last positions while allowing any non-ASCII letter (and the underscore)? The dash doesn’t seem to require special encoding in URLs, but maybe there’s another reason for this?

I know this is a bit of a tangent to the topic, so let me know if I should open a separate one.

@gerhard Can a user name be like this?

discource__
or
discource_name
?

Because I can’t seem to make it work!

TIA

See reserved_usernames site setting.

1 Like

@pfaffman Thanks. That’s not what I meant, I meant:

username__ or username_ or user_name

User_Name and _Username should work, but it won’t allow Username_ (“Username must end with a letter or a number”)
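The behaviour described here can be written down as a short check. This sketch encodes only the rules as reported in this thread (first character: letter, digit, or underscore; last character: letter or digit). The helper name is made up and it is not taken from Discourse’s validator:

```ruby
# Rules as reported in this thread, not taken from Discourse's source:
#   first character: letter, digit, or underscore
#   last character:  letter or digit
def edge_chars_ok?(username)
  username.match?(/\A[\p{L}\p{N}_]/) && username.match?(/[\p{L}\p{N}]\z/)
end

puts edge_chars_ok?("User_Name")  # => true
puts edge_chars_ok?("_Username")  # => true
puts edge_chars_ok?("Username_")  # => false, ends with an underscore
puts edge_chars_ok?("-dashing-")  # => false, starts with a dash
```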

2 Likes

Thanks. Is there a way to change it?

There is not an easy way.

1 Like

Thank you Jay, I appreciate it

1 Like

I tried to allow Japanese characters as username but failed. However, allowing Chinese only with \p{Han} takes effect. Is this method not usable anymore?

It should still work. Did you forget the square brackets at the beginning and end? It should be [\p{Han}\p{Katakana}\p{Hiragana}]. Otherwise, could you please provide example usernames that didn’t work?
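The difference the square brackets make can be shown in plain Ruby. This sketch assumes the pattern is tested against one character at a time; that assumption and the helper name are illustrative, not confirmed by this thread:

```ruby
SEQUENCE   = /\p{Han}\p{Katakana}\p{Hiragana}/    # three characters in a row, in that order
CHAR_CLASS = /[\p{Han}\p{Katakana}\p{Hiragana}]/  # any single character from the three scripts

# Hypothetical per-character check.
def all_chars_allowed?(name, pattern)
  name.each_char.all? { |ch| ch.match?(pattern) }
end

puts all_chars_allowed?("さくら", SEQUENCE)    # => false, one character can never match three
puts all_chars_allowed?("さくら", CHAR_CLASS)  # => true
puts "桜サさ".match?(SEQUENCE)                 # => true, but only for Han, Katakana, Hiragana in that order
```

With a single property such as \p{Han} there is nothing to group, which is why that value works without brackets.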

2 Likes

Thank you!! It works now. The default value of this field was \p{Han}, so I thought the square brackets were not necessary. :flushed:

2 Likes

3 posts were split to a new topic: Issue with renaming user with unicode characters