Unicode usernames and group names

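Loosening the restrictions on which characters are allowed in usernames is one of the oldest feature requests. Starting with Discourse 2.3.0.beta9, it is finally possible to use Unicode characters in usernames and group names.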

New site settings

There are two new site settings: allowed unicode username characters and unicode usernames.

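allowed unicode username characters lets you allow only specific Unicode characters (e.g. [äöüßÄÖÜẞ]\p{Greek}). By default, Discourse allows letters (Ll, Lm, Lo, Lt, Lu), marks (Mc, Me, Mn) and numbers (Nd, Nl), but _not_ No. This setting can restrict those characters, but it cannot add additional ones. It also cannot disallow ASCII letters and digits.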
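The rules above (a default set of categories, a setting that can only narrow that set, and ASCII letters and digits that always pass) can be sketched in Ruby. This is only an illustration under stated assumptions: `char_allowed?`, `DEFAULT_UNICODE`, `ASCII_ALNUM` and `SAMPLE_WHITELIST` are hypothetical names, not Discourse's actual validator, and rules for punctuation such as `_`, `-` and `.` are not modelled.

```ruby
# Illustrative sketch only; hypothetical names, not Discourse's real code.

# Default categories named above: letters (L*), marks (M*), numbers Nd and Nl.
# Category No (for example "²") is deliberately left out.
DEFAULT_UNICODE = /[\p{L}\p{M}\p{Nd}\p{Nl}]/

# ASCII letters and digits cannot be disallowed by the setting.
ASCII_ALNUM = /[A-Za-z0-9]/

# Sample whitelist (an assumption): German umlauts/eszett plus the Greek script.
SAMPLE_WHITELIST = /[äöüßÄÖÜẞ\p{Greek}]/

# Decides whether a single character passes. The whitelist can only narrow
# the default set; it can never add characters outside of it.
def char_allowed?(char, whitelist = nil)
  return true if char.match?(ASCII_ALNUM)
  return false unless char.match?(DEFAULT_UNICODE)
  whitelist.nil? || char.match?(whitelist)
end

puts char_allowed?("z", SAMPLE_WHITELIST) # true: ASCII always passes
puts char_allowed?("α", SAMPLE_WHITELIST) # true: Greek is whitelisted
puts char_allowed?("я", SAMPLE_WHITELIST) # false: Cyrillic is not whitelisted
puts char_allowed?("я")                   # true: no whitelist, defaults apply
puts char_allowed?("²")                   # false: category No is never allowed
```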

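You should tailor it to your community's needs and allow only the characters and scripts required by the languages your community uses.

If you'd like to learn more about character classes and character properties in regular expressions, see the Ruby documentation.

unicode usernames is disabled by default, and we strongly recommend that you configure the allowed unicode username characters setting before enabling it, in order to prevent homograph username spoofing.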
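To make the spoofing risk concrete, here is a small Ruby sketch of two usernames that render alike but are different strings. `scripts_in` and `SCRIPTS` are hypothetical helpers written for this illustration; Discourse is not claimed to perform this check.

```ruby
# Illustrative sketch; `scripts_in` is a hypothetical helper, not part of Discourse.
SCRIPTS = {
  latin:    /\p{Latin}/,
  cyrillic: /\p{Cyrillic}/,
  greek:    /\p{Greek}/,
}.freeze

# Lists the scripts (from the small table above) that occur in a name.
def scripts_in(name)
  SCRIPTS.select { |_script, pattern| name.match?(pattern) }.keys
end

genuine = "admin"
spoofed = "\u0430dmin" # starts with CYRILLIC SMALL LETTER A (U+0430)

puts genuine == spoofed # false, although both render as "admin"
p scripts_in(genuine)   # [:latin]
p scripts_in(spoofed)   # [:latin, :cyrillic]
```

A whitelist restricted to the scripts your community actually writes would reject the second name outright.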

Example values for the allowed characters:

  • zh_CN Chinese: [\p{Han}]
  • zh_TW Chinese: [\p{Han}]
  • ko Korean only: [\p{Hangul}]
  • jp Japanese: [\p{Han}\p{Katakana}\p{Hiragana}]
  • jp Japanese (Katakana only): [\p{Katakana}]
  • fi Finnish: [åäöÅÄÖ]
  • cs Czech: [ěščřžýáíéóůúďťň]
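Before saving one of these values, it can help to try it against a few sample usernames in a Ruby console. The sketch below is only an approximation: `fits_whitelist?` and `EXAMPLE_WHITELISTS` are hypothetical names, and ASCII characters are simply skipped rather than validated.

```ruby
# Illustrative sketch; `fits_whitelist?` is a hypothetical helper.
EXAMPLE_WHITELISTS = {
  "zh_CN" => /[\p{Han}]/,
  "ko"    => /[\p{Hangul}]/,
  "jp"    => /[\p{Han}\p{Katakana}\p{Hiragana}]/,
}.freeze

# True when every non-ASCII character of the username matches the whitelist.
# ASCII characters are skipped here as a simplification.
def fits_whitelist?(username, whitelist)
  username.each_char.all? { |char| char.ascii_only? || char.match?(whitelist) }
end

puts fits_whitelist?("田中さくら", EXAMPLE_WHITELISTS["jp"])    # true: Han + Hiragana
puts fits_whitelist?("田中さくら", EXAMPLE_WHITELISTS["zh_CN"]) # false: Hiragana is not Han
puts fits_whitelist?("김민준", EXAMPLE_WHITELISTS["ko"])        # true: Hangul
puts fits_whitelist?("user_王", EXAMPLE_WHITELISTS["zh_CN"])    # true: ASCII skipped, 王 is Han
```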

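Letter avatar service

The letter avatar service has been updated, and we added support for generating avatars for the most commonly used scripts. If you find that avatars are missing for your language, feel free to submit a pull request on GitHub that adds fonts from the Google Noto Fonts family.

unicode usernames can only be enabled when the external system avatars enabled site setting is enabled, because the internal avatar generator doesn't support Unicode. If you can't or don't want to rely on an external service, you can run your own instance of the letter avatar service.

We even support the brand-new Reiwa glyph that was added to Unicode in May.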

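Good to know…

When validating username length (the min username length and max username length site settings), Discourse counts grapheme clusters ("user-perceived characters") rather than Unicode code points. The letter avatar service also uses the first grapheme cluster of the username to generate the avatar.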
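The difference between code points and grapheme clusters is easy to see in plain Ruby. This is a sketch using Ruby's built-in `String#grapheme_clusters`; the helper names `username_length` and `avatar_seed` are hypothetical and do not reflect how Discourse or the letter avatar service name things internally.

```ruby
# Illustrative sketch; helper names are hypothetical.

# Length as a user perceives it: one unit per grapheme cluster.
def username_length(name)
  name.grapheme_clusters.length
end

# First user-perceived character, e.g. as the basis for a letter avatar.
def avatar_seed(name)
  name.each_grapheme_cluster.first
end

name = "e\u0301ric" # "éric" written as "e" + COMBINING ACUTE ACCENT (U+0301)

puts name.codepoints.length          # 5 code points
puts username_length(name)           # 4 grapheme clusters
puts avatar_seed(name) == "e\u0301"  # true: base letter and accent stay together
```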

You should also review the reserved usernames site setting. Now that your forum supports Unicode in usernames, you may need to add additional reserved usernames.

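Feedback

Have you enabled Unicode usernames for your community? We'd love to hear your feedback.
Also, we would like to set sensible defaults for unicode username character whitelist for every language Discourse supports. Feel free to suggest regular expressions in a reply.

41 Likes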

Thanks for the new feature!

I do have a discourse instance running for Chinese users, and I would like to test it.

But we have installed another plugin discourse-username-localization because previously unicode usernames were not supported officially.

So I would like to know how I can disable that plugin and switch to the official solution. Will it break anything? Are there any recommended steps to follow?

If this can be done, I think every CJK instance will switch to the official solution and contribute whitelists immediately :grinning:

5 Likes

It looks like the plugin also changes the behavior for linking to CJK tags and categories. This will probably break, but we should fix it in Discourse core. That should be easy to fix.

Other than that, disabling the plugin and enabling the official Unicode support should work without problems. Letter avatars will look different afterwards, because the plugin currently converts Chinese usernames into Latin characters. But I guess that’s a good thing. :slight_smile:

9 Likes

Thanks.

I’ll create a branch that keeps only the parts not yet implemented in core and try the official solution, so the two don’t conflict.

Tags and categories use the same set of regexes, but in JavaScript, which doesn’t support \p{Katakana}-style properties. I raised an issue to unify the regexes in that plugin, but the attempt failed. Is it possible to use the same whitelist in the official implementation, e.g. with a converter from the Ruby whitelist to JavaScript?

And the unicode avatar is just excellent!

6 Likes

I just switched my forum to unicode username.

I updated discourse-username-localization to remove all the ruby stuff. (can’t wait to see you guys fix hashtags and mentions in the core, so I can abandon it completely)

And use this whitelist:

[\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}]

And update letter_avatar service to v4.

Now it works

5 Likes

I think mentions are already supported :thinking:

3 Likes

For Finnish, it should be [åäöÅÄÖ].

4 Likes

Isn’t this your real Finnish name @rizka :wink:

7 Likes

Not quite, I have just one of those in my surname. :slight_smile:

Å/å is actually not a pure Finnish language letter. It never appears anywhere except the Finnish alphabet, computer keyboards and names of Swedish people and places. Ö/ö is somewhat rare. Ä/ä is by far the most common, but for a reason unknown to me, very uncommon in Finnish first names. Appears in many surnames, though, like mine. :slight_smile:

7 Likes

@marguerite You should also remove mentions.js.es6 from the plugin. There’s no need to patch anything related to usernames anymore. Only your customizations for categories and tags might still be needed, but we will fix that as well.

The +? at the end of the regex isn’t needed.
Out of curiosity: Can this whitelist be used for both zh_CN and zh_TW or is there a difference?

6 Likes

@gerhard

I removed mentions.js.es6, do I need to remove override-username-match.js.es6 as well?

\p{Han} covers both traditional and simplified Chinese. My whitelist will allow CJK usernames.

2 Likes

For EU langs it might be easiest to just allow all extended latin if possible, rather than hand-picking specific letters for every language. :slight_smile:

EDIT: Although, after reading a bit more about homograph attacks, that might not be the best idea after all. :blush:
Here are the chars for Czech:

ěščřžýáíéóůúďťň

8 Likes

Yes, you can remove that as well. Looks like that part of the plugin is broken anyway. User cards were refactored about a year ago.

4 Likes

Katakana and Hiragana are both Japanese. Hangul is Korean.

I love this work. And I think the default settings below should work:

  • zh_CN, zh_TW: \p{Han}. This covers Chinese characters. Some communities may need additional characters, but maybe not by default.
  • ko: \p{Hangul}. Koreans don’t write Chinese at all. (I heard there are some Chinese characters in use in Korean?)
  • jp: [\p{Han}\p{Katakana}\p{Hiragana}] Japanese uses all of them.

And maybe it’s good to mention reserved_usernames :sweat_smile: Unicode usernames do make it easier to create names that pose as admin/moderator.

9 Likes

Thanks for the regular expressions and also for the tip regarding reserved usernames. I added a note in the first post.

5 Likes

How does this option affect migrations? I am migrating from Kunena with a script based on the “official” kunena3.rb script.

I have a user called abd-def (for example). It gets imported as abddef.

Then I turned on this option for the unicode usernames and deleted that user, and re-ran the script. It was again imported as abddef :frowning:

How can I ensure my user names with dashes don’t get changed during import?

Thanks!

The removal of the dash has nothing to do with Unicode usernames. That’s happening because the import script manipulates the username during the import.

I don’t think there’s any need for that. Try replacing those lines with the following code snippet. It should work.

@users[u['id'].to_i] = { id: u['id'].to_i, username: u['username'], email: u['email'], created_at: u['registerDate'] }
7 Likes

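I'm glad to see support for Unicode usernames and group names :+1:

However, now that Unicode usernames are supported, there is a slightly odd situation: Discourse can support a username like 中国ไทย but not -dashing-, because it still requires the first and last characters to be a letter, a digit, or an underscore (but not a hyphen).

I tried to add support for hyphens through the Unicode support settings, but it didn't seem to take effect, though I may have missed a step.

Now that Unicode is supported, is it worth revisiting the rule that disallows hyphens as the first and last characters? Is there a reason to keep disallowing a hyphen at the start or end while allowing any non-ASCII letter, as well as the underscore? Hyphens don't seem to need special encoding in URLs, but maybe there are other reasons?

I know this is a bit off-topic; please let me know if I should start a separate topic for it.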

@gerhard Can usernames look like this?

discource__
or
discource_name

Because I can't seem to get it to work!

Thanks in advance

See the reserved_usernames site setting.

1 Like