No search result if filtered by unicode username

I found a bug: searching for a user’s posts with @username in the search box returns no results if the username contains Unicode characters (e.g. Chinese, \p{Han}).

The cause is that the regex used in the following code (in lib/search.rb) only matches ASCII characters:

advanced_filter(/^\@([a-zA-Z0-9_\-.]+)$/i) do |posts, match|

I tested adding the characters from allowed_unicode_username_characters to the regex (i.e. adding a rule advanced_filter(/^\@([a-zA-Z0-9_\-.\p{Han}]+)$/i)), and it works fine.
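
For illustration, here is a minimal standalone Ruby sketch of the difference between the two patterns. It runs outside Discourse; the constant names and the captured_username helper are hypothetical and exist only for this example, while the two regexes are the ones quoted above.

```ruby
# ASCII-only pattern quoted from lib/search.rb above.
ASCII_FILTER = /^\@([a-zA-Z0-9_\-.]+)$/i
# Same pattern with the Han script class added, as tested above.
HAN_FILTER = /^\@([a-zA-Z0-9_\-.\p{Han}]+)$/i

# Hypothetical helper: returns the captured username, or nil if the
# search term does not match the pattern at all.
def captured_username(pattern, term)
  m = pattern.match(term)
  m && m[1]
end

p captured_username(ASCII_FILTER, "@alice") # ASCII username is captured
p captured_username(ASCII_FILTER, "@张三")   # nil: Han characters are rejected
p captured_username(HAN_FILTER, "@张三")     # Han username is captured
```

Because the ASCII-only pattern never matches, the filter block is never invoked for such a term, which would be consistent with the missing results described above.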

Nice one. Yes, we certainly want to fix this, since we allow site admins to opt in to Unicode usernames.

Would you care to try a PR? We should only amend this behavior if the site admin opts in to Unicode usernames.

I use a plugin to add the advanced_filter with the following code, and it satisfies my needs.

if SiteSetting.unicode_usernames?
  # Extend the ASCII username pattern with the characters the site admin allows.
  regexp = Regexp.new("(?i-mx:^\\@([a-zA-Z0-9_\\-.#{SiteSetting.allowed_unicode_username_characters}]+)$)")
  Search.advanced_filter(regexp) do |posts, match|
    username = match.downcase

    user_id = User.where(staged: false).where(username_lower: username).pluck_first(:id)

    # Fall back to the current user for "@me" when no user has that name.
    if !user_id && username == "me"
      user_id = @guardian.user&.id
    end

    if user_id
      posts.where("posts.user_id = #{user_id}")
    else
      # Unknown username: return no posts.
      posts.where("1 = 0")
    end
  end
end

But I don’t know how to modify core properly, especially since allowed_unicode_username_characters is a variable. Maybe a simple but probably wrong solution is to change advanced_filter(/^\@([a-zA-Z0-9_\-.]+)$/i) to advanced_filter(/^\@(.+)$/i), just like you did in advanced_filter(/^user:(.+)$/i).
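
To make the trade-off concrete, here is a small standalone Ruby sketch comparing a pattern built from a variable character set with the permissive (.+) variant. The build_username_filter helper and its extra_chars argument are hypothetical stand-ins for reading SiteSetting.allowed_unicode_username_characters; this is not the code that ended up in core.

```ruby
# Hypothetical helper: `extra_chars` stands in for the value of
# SiteSetting.allowed_unicode_username_characters.
def build_username_filter(extra_chars)
  # Interpolate the extra characters into the character class at runtime.
  Regexp.new("^@([a-zA-Z0-9_\\-.#{extra_chars}]+)$", Regexp::IGNORECASE)
end

strict = build_username_filter("\\p{Han}")
permissive = /^\@(.+)$/i

p strict.match?("@张三")      # true: every character is in the allowed set
p strict.match?("@张三!")     # false: "!" is not an allowed character
p permissive.match?("@张三!") # true: (.+) accepts any characters
```

The strict pattern keeps the filter limited to strings that could be valid usernames, at the cost of rebuilding the regex from the setting. The permissive pattern is simpler but hands any @-prefixed term to the filter block.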

I think a PR to core is the way to go here, since this is a bug in core. You would need to add a test in the PR, though.

OK, I just made a PR. Please take a look.

Merged, thank you for your contribution!
