How to support Unicode usernames?

(唐明星) #1

I have noticed a few discussions on this topic: whether usernames should support more letters, e.g. Chinese characters. The conclusion is usually NO, and I can understand that coming from you English speakers, who are lucky to have only 26 letters. I really, really envy you. But we Chinese have tens of thousands of characters, and I really need usernames to be more open.

Why? You may say there is already a display name. But sorry, it’s useless:

  1. We can’t log in with it
  2. We can’t see it, except on the profile page
  3. We can’t mention by it; we only see it while mentioning

I hope it can be more open. If the core team thinks that is bad for most people, can you at least give me some pointers for doing this myself? For example, what makes this difficult for you to do?

Thank you.

(Juffin) #2

I support this. Unicode usernames are my communities’ pain -_-

(Jeff Atwood) #3

You generally log in via email address. Are email addresses Unicode?

Enable the option in site settings and the full name will appear on every post next to the username. Click on the avatars in the left column and you will see the display name.

@name mentions match on both username and display name. To mention me you can type @jeff … (as in Jeff Atwood, my display name) or @cod… (as in codinghorror, my username).
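As a rough illustration of matching on both fields, here is a minimal Python sketch. It is not Discourse’s actual code: the `User` type, the `mention_candidates` helper, and the sample users other than codinghorror are hypothetical, and real autocomplete may rank or match differently.

```python
from typing import List, NamedTuple


class User(NamedTuple):
    username: str  # short ASCII handle
    name: str      # free-form display name, any script


def mention_candidates(prefix: str, users: List[User]) -> List[User]:
    """Return users whose username OR display name starts with the typed prefix."""
    needle = prefix.casefold()
    return [
        u for u in users
        if u.username.casefold().startswith(needle)
        or u.name.casefold().startswith(needle)
    ]


# Hypothetical sample data; only codinghorror / Jeff Atwood comes from the thread.
USERS = [
    User("codinghorror", "Jeff Atwood"),
    User("sam", "Sam Saffron"),
    User("tms", "唐明星"),
]
```

With this sketch, both `mention_candidates("jeff", USERS)` and `mention_candidates("cod", USERS)` find the same user, and a prefix typed in Chinese can match a Chinese display name.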

What had been discussed was a mode where both username and display name show in the left gutter.

(唐明星) #4

As I said, I can understand you, because you use English and don’t know Chinese at all. But let me tell you this: for us Chinese, almost every site allows usernames to contain Chinese characters, as well as English letters of course, and most people use Chinese usernames. Do you know why?

Because for most of us, codinghorror means nothing. We cannot understand it, we cannot remember it, and we cannot even pronounce, read, or spell it. It’s just a series of strange letters, and using that as a username is a nightmare for us.

Can you recognize 唐明星, which has been my name since birth? And if you could only use Chinese characters in your username, would you go crazy too???

So that’s my feeling. Maybe it’s not entirely fair, because a lot of Chinese people know English, but not all of them do.

Let me tell you, it’s definitely a nightmare for us to use only English letters in usernames, and I must change it. I don’t expect you to do this just for people like us, but please provide some information. I would appreciate it very much.

(Sam Saffron) #5

I appreciate this issue. I see no reason why people running Chinese Discourse sites should not be allowed to have Chinese usernames. Besides being familiar, they are significantly more concise and efficient. Imagine having to switch keyboards constantly just because you want to @mention someone. This heavily discourages a flagship feature we have. Sina Weibo allows it, and it’s the 36th most popular site in the world.

I personally support someone adding a mode/flag/setting that enables Unicode usernames, though I would like this to be off by default. If anyone is working on this, I strongly recommend thinking through homoglyph attacks. Perhaps a mode where usernames are constrained to another language.

There is an open question about our central registry, I have not thought this part through.

(Sam Saffron) #6

It works, but not in the expected way: as a Chinese user, when you hit the @ sign you would see a pile of names in unfamiliar English.

(唐明星) #7

To be frank, I didn’t expect you to treat this as an issue, or even to understand how I feel. I just wanted some advice. But what you say is so…

I am totally moved, my friend.

(唐明星) #8

I have some ideas now, and I will try to do this job.

(Mait) #9

As a Korean, I really upvote this.

(Akshar Prabhudesai) #10

I too upvote this feature, as an Indian. I am starting to build a community for multiple Indian languages, and supporting Unicode usernames will be important. This is primarily useful when mentioning a user.

(Jeff Atwood) #11

There is now support for this. The setting display_name_on_posts enables displaying the full name (spaces + Unicode + much longer length) along with the username (short + ASCII).

Enable it and you get both in the left gutter instead of just the username as before.

And the longer name is FULL UNICODE with spaces and everything else.

(Erick Guan) #12

I’m reviving this topic as Sam relaxed the username rules recently.

I have received many requests from the Chinese community this year, so I’m considering Unicode usernames.

As for homoglyph attacks, they are hard to deal with. Major browsers have to display such names, so they have built various algorithms. For Discourse, avoiding the problematic characters seems a better approach, though that is not easy to achieve.

  1. Cyrillic is tricky since many of its letters look like English letters.
  2. There are also many Chinese characters that look the same (identical, actually).
  3. Discuz simply has a regex site setting that can censor any words you like. Several years ago, it had a short Unicode list, which caused all kinds of illegal-username problems.
  4. The list of confusables is long.

I don’t see a quick way to avoid this problem. A site setting that allows Unicode usernames is doable. Admins can ban abusers. But it won’t help with the central registry.
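To make the homoglyph concern concrete, here is a rough Python sketch of one common partial defense: rejecting usernames that mix scripts. It is only a heuristic under stated assumptions. Python’s standard library has no Unicode script property, so this guesses the script from the first word of each character’s Unicode name. The `scripts_used` and `is_mixed_script` helpers are hypothetical names, not Discourse functions. It does not catch look-alikes within a single script (point 2 above), and it would wrongly flag legitimate mixed-script names such as Japanese kana plus kanji.

```python
import unicodedata


def scripts_used(username: str) -> set:
    """Approximate the set of scripts among the letters of a username.

    Heuristic: the first word of a character's Unicode name, e.g.
    'LATIN SMALL LETTER A' -> 'LATIN', 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'.
    Digits, dashes, and underscores are ignored.
    """
    scripts = set()
    for ch in username:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(username: str) -> bool:
    """True if the letters appear to come from more than one script."""
    return len(scripts_used(username)) > 1


# "p\u0430ypal" contains a Cyrillic 'a' (U+0430) that renders like Latin 'a'.
print(is_mixed_script("p\u0430ypal"))  # True
print(is_mixed_script("paypal"))       # False
```

A fuller treatment would rely on the Unicode confusables data rather than this name-based guess.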

(无限星辰) #13

So, how is this feature going?
I want to import my Discuz forum into Discourse, and there are Chinese usernames all over it.
How can I do that?

(Sawood Alam) #14

Keyboard switching is a really big pain, but the problem doesn’t end there. Although an option to show full names looks helpful, it is not. When composing a message, one cannot expect the suggestions to show up if the full name is not ASCII, so one must switch keyboards anyway.

Then come the RTL languages, where the text flows from right to left while the mentioned ASCII IDs run left to right. Since user mentions show IDs and there is no option to show full names instead, the mixed text looks ugly.

Our Urdu community was using vBulletin, then we switched to XenForo a few years ago. Both had Unicode name support with spaces (there is no concept of a full name; the same name serves as the ID as well). XenForo even allows mentions with full names containing spaces, and so does Facebook. These days we are evaluating other options, including Discourse, and we find this issue a deal breaker, because our members would not be happy if their Urdu IDs were not preserved.

Short IDs along with full names make sense on Twitter, where the text is bound by a 140-character limit and people don’t write paragraphs, so keyboard switching is OK. Short ASCII IDs are also fine for technical sites such as GitHub, where repo URLs use them for namespacing; Unicode is not necessary there because developers must know English for coding anyway. For forum software, however, this is an unnecessary limitation that keeps people with ZERO knowledge of ASCII letters out of the conversation, as they won’t be able to register in the first place.

Restricting IDs to ASCII only is too limiting. Even domain names and email addresses are allowed in Unicode. The Indian government recently asked big email providers to support email addresses in local languages, starting with Hindi (India has 50+ commonly used languages).

(dfpoon) #15

I wish to allow Chinese usernames on my forum too. Is it possible in the current version? If not, is it on the roadmap? Thanks.

(Eli the Bearded) #16

I believe the current recommendation is to keep ASCII usernames, but to make the “Full Name” preferred. “prioritize username in ux” is the name of the config option (in English) on admin/site_settings/category/users . You may also want to check “full name required” on the same page. Users can then @ mention by full name.

(Erick Guan) #17

@elijah gave the working solution for now. There is still some work to be done, but my studies are keeping me busy at the moment. I still want to improve it, though I don’t expect it to come any time soon.

(rizka) #18

So currently usernames are built of 26 letters, 10 digits and dashes and underscores. That is 38 characters allowed in total. An important thing to notice is that lowercase and uppercase letters are equivalent: @rizka and @rIZkA are the same user.
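The character set and the case equivalence described above can be sketched in a few lines of Python. This only illustrates the 38-character rule as stated in this post. Discourse’s real validation has further rules that are not modeled here, and the helper names are hypothetical.

```python
import re
import string

# 26 letters + 10 digits + dash + underscore, counting case-insensitively.
ALLOWED = set(string.ascii_lowercase + string.digits + "-_")

# Character-set check only; length and position rules are deliberately omitted.
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_username(name: str) -> bool:
    """True if the name is non-empty and uses only the allowed ASCII characters."""
    return USERNAME_RE.fullmatch(name) is not None


def same_user(a: str, b: str) -> bool:
    """Usernames are compared without regard to case."""
    return a.lower() == b.lower()


print(len(ALLOWED))                 # 38
print(same_user("rizka", "rIZkA"))  # True
```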

I know the reasoning for this and know that this is the basis of the software. But I had an idea which would respect this while also allowing some Unicode characters in usernames. The idea builds on the equivalence of lowercase and uppercase letters. The suggestion: let a site admin map any Unicode character to any of the 38 currently allowed characters. A user could then use the Unicode character and the character from the real set interchangeably.

Mapping the Unicode space character to anything would be a terrible idea, of course. You would break any mention which is not at the end of a paragraph. But with a reasonable mapping this could work well. To illustrate: the Finnish alphabet shares the 26 letters with English, but we also have three Scandinavian letters in uppercase and lowercase forms (Å/å, Ä/ä and Ö/ö). If possible, I might map

  • Ä, ä, Å and å --> a.
  • Ö and ö --> o.

Of course, the Unicode characters would in essence be an illusion. But for any Finnish speaker that would make sense, because that is how they replace letters of Finnish names in international contexts all the time. It would be natural that @rizkä is the same person as @rizka. And even if you didn’t know that, you would still not fail at anything! Thus this solution would satisfy the needs of any Finnish community.
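A minimal Python sketch of this idea, assuming the admin’s mapping is just a dictionary from a Unicode character to one of the allowed ASCII characters. The names `FINNISH_MAP` and `canonical_username` are made up for illustration and are not part of Discourse.

```python
import unicodedata

# Hypothetical admin-defined mapping for a Finnish site.
# Keys are lowercase; uppercase forms are covered by lowercasing first.
FINNISH_MAP = {"ä": "a", "å": "a", "ö": "o"}


def canonical_username(name: str, mapping: dict = FINNISH_MAP) -> str:
    """Fold a typed username to the ASCII form that is actually stored.

    NFC first, so that 'a' followed by a combining diaeresis is treated
    like the single precomposed character 'ä'.
    """
    composed = unicodedata.normalize("NFC", name).lower()
    return "".join(mapping.get(ch, ch) for ch in composed)


print(canonical_username("rizkä"))  # rizka
print(canonical_username("RIZKA"))  # rizka
```

Both spellings fold to the same stored name, so @rizkä and @rizka would resolve to one user, while characters the admin has not mapped pass through unchanged and would still be rejected by the usual username check.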

However, you wouldn’t bother to develop this for Finnish speakers. Not just because there are so few of us, but also because this is a cosmetic issue for Finnish communities. It annoys some new users, but they can just replace ä with a, etc. I’m more interested in how this idea would serve non-Latin languages. I understand little about Russian and other Cyrillic languages, but as far as I can tell, this could work quite nicely. What do you think, @meglio? I won’t even guess what a Chinese or Japanese speaker would think of this. I’ll let them speak for themselves.

PS. Take this as a novel idea, not as a request! :wink:

(Régis Hanol) #19

Wouldn’t removing diacritics in usernames be enough?
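For reference, generic diacritics removal can be sketched in Python with the standard `unicodedata` module: decompose each character, drop the combining marks, and recompose. This is a sketch of the general technique, not a proposal for how Discourse should implement it, and `strip_diacritics` is a hypothetical helper name.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks such as accents and diaereses.

    NFD splits 'ä' into 'a' plus a combining diaeresis; the combining
    characters are dropped, and NFC recomposes whatever remains.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("rizkä"))  # rizka
print(strip_diacritics("Régis"))  # Regis
```

Note the limits: letters such as ø have no decomposition and pass through unchanged, and non-Latin scripts such as Chinese are not transliterated at all.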

(rizka) #20

If we find a resource for diacritics removal that Discourse can rely on, then maybe. I’m somewhat worried that there may be all kinds of wild characters on the lists. There is even a slight risk that two languages share a character but their speakers would prefer to represent it with different letters of the English alphabet. That is why I suggested relying on the admin’s judgment.

This wouldn’t help those who suffer most at all, though. I mean the speakers of languages with non-Latin scripts. I also don’t think that this is the top request of any of your Latin-script clients, and, as I said, this is a minor issue for me as well. But if this is not too difficult to implement, then I believe it could be an idea for a nice plugin.