User mention full name search is not working for Unicode names


(Sawood Alam) #1

I can type @Atw and see @codinghorror suggested for mention. This shows that the auto-suggest system works not only for user ID lookup but also for name lookup, which is great and very much desired.

However, if the name contains Unicode characters beyond ASCII, name lookup does not work. For example, if I type @علی (an Arabic, Persian, or Urdu name), I would expect @ALINAWAZJOYIA to show up in the suggestions, but it doesn’t.
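
To illustrate the kind of matching I would expect, here is a minimal Ruby sketch. It is only an assumption about how a Unicode-aware name comparison could look, not how the lookup is actually implemented; normalize_name and name_matches? are hypothetical helpers and the sample names are made up.

```ruby
# Hypothetical helper: reduce a name to a canonical form for comparison.
def normalize_name(text)
  # NFKC unifies composed/decomposed characters; :fold applies Unicode case folding.
  text.unicode_normalize(:nfkc).downcase(:fold).strip
end

# True if the typed text is a prefix of the full name or of any word in it.
def name_matches?(full_name, typed)
  query = normalize_name(typed)
  return false if query.empty?

  name = normalize_name(full_name)
  name.start_with?(query) || name.split.any? { |word| word.start_with?(query) }
end

# Made-up sample data: a non-ASCII name, and a decomposed vs. composed "é".
puts name_matches?("علی نواز", "علی")
puts name_matches?("Jose\u0301 Silva", "JOS\u00C9")
```

Normalizing both sides before comparing means the match does not depend on the keyboard layout or on how accented characters happen to be encoded.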

For users of other languages, it is very inconvenient to switch the keyboard to English just to mention someone and then switch back to the original language to continue writing. It also forces people to remember IDs rather than names, even though in non-technical communities people more commonly remember and address each other by name.

It would also be great if name lookup continued after @ is typed, even when spaces are added to the text, until the server runs out of suggestions. This would be useful when the first name is very common and finding a specific user requires typing part of the middle or last name until the desired name shows up in the auto-complete list.
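
As a rough sketch of what "continuing after spaces" could mean on the client, the Ruby snippet below pulls the mention query out of the text before the cursor. The names mention_query, MENTION_QUERY, and MAX_EXTRA_WORDS are hypothetical, and capping the number of extra words is my own simplification; the real stopping condition would be the server running out of suggestions.

```ruby
# Assumption: allow at most this many additional space-separated words.
MAX_EXTRA_WORDS = 3

# "@" followed by a word of letters/marks/digits (any script), then up to
# MAX_EXTRA_WORDS further words separated by single spaces, ending at the cursor.
MENTION_QUERY = /@([\p{L}\p{M}\p{N}_.\-]+(?: [\p{L}\p{M}\p{N}_.\-]*){0,#{MAX_EXTRA_WORDS}})\z/

# Returns the text typed after the active "@", or nil if no mention is in progress.
def mention_query(text_before_cursor)
  text_before_cursor[MENTION_QUERY, 1]
end

p mention_query("Thanks @Susan Su")                 # => "Susan Su"
p mention_query("Thanks @Susan for the help today") # => nil
```

Because the pattern uses Unicode character properties instead of [A-Za-z], the same rule applies to names in any script.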

The following old thread is somewhat relevant, so I am adding it here for reference.


How to support unicode-style username?
(Rafael dos Santos Silva) #2

That being said, after using a board with 5,000 people with the same first name, I would like an option to allow spaces.


(Sawood Alam) #3

I wouldn’t be too certain about saying never when it comes to software features. Software is meant to help people, not to punish them. There are good examples out there to learn from on how to implement something that caters to the needs of a broader audience. I understand that architectural design decisions made earlier may sometimes limit the ability to do specific things in specific ways, but user needs and a good experience should always be the priority.

I am not very familiar with the codebase, but here is one approach that could work efficiently while allowing full name searches with spaces in many languages. I am assuming that the server looks for matches in both the user ID and full name fields for mention queries. Here is some pseudo-code to illustrate how the client could behave (expressed in Ruby-ish syntax, though it would eventually be JavaScript, if implemented):

require "set"

MAX_SUGGESTIONS = 6

# response_from_server, find_in_cache, render_auto_suggest, current_line,
# cursor_pos and position_of_at_sign_in_the_current_line are placeholders.
class MentionAutocomplete
  def initialize
    @result_cache = Set.new # a set deduplicates; do not clear often to maximize reuse
    reset_mention
  end

  def reset_mention
    @prefix_match_exhausted = false # whether the server returned fewer results than MAX_SUGGESTIONS
    @mention_mode = false
    @query_prefix = ""
  end

  # Queries the server for the current prefix, caches and returns the results.
  def request_server
    results = response_from_server(@query_prefix)
    @result_cache.merge(results)
    @prefix_match_exhausted = true if results.length < MAX_SUGGESTIONS
    results
  end

  def populate_suggestions
    # cancel previous trigger if the timer is not up yet
    suggestions = find_in_cache(@query_prefix)
    # ask the server only if the cache cannot fill the list and more matches may exist
    unless @prefix_match_exhausted || suggestions.length >= MAX_SUGGESTIONS
      suggestions = request_server
    end
    if suggestions.empty?
      reset_mention
    else
      render_auto_suggest(suggestions)
    end
  end

  def handle_key(typed_char)
    case typed_char
    when "@"
      reset_mention
      @mention_mode = true
    when "\r", "\n" # or any character that is allowed neither in user IDs nor in full names
      reset_mention
    when "\b"
      at_pos = position_of_at_sign_in_the_current_line # nil if there is no "@"
      if at_pos
        @prefix_match_exhausted = false
        @mention_mode = true
        @query_prefix = current_line[(at_pos + 1)...cursor_pos] # text after the "@"
        populate_suggestions
      else
        reset_mention
      end
    else
      if @mention_mode
        @query_prefix += typed_char
        populate_suggestions
      end
    end
  end
end

This is a rough illustration. There are other ways to improve performance and reduce server queries, such as:

  • asking the server to return more results per query than are needed to populate the suggestions, so that successive queries can often be answered from the cache (spatial locality)
  • requiring a minimum and/or maximum number of characters to trigger mention_mode
  • storing result_cache in a trie, with flags on the nodes whose prefixes were actually queried from the server, to support the prefix_match_exhausted state
  • having the server return results prioritized by recent activity and/or reputation
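
The trie idea from the list above could be sketched like this. It is a simplified illustration under my own assumptions: PrefixCache and its methods are hypothetical, names are assumed to be already normalized (for example lowercased), and only matches from the start of the name are handled.

```ruby
require "set"

# Trie-backed cache: every node keeps the names that pass through it and
# a flag that records the outcome of a server query for exactly that prefix.
class PrefixCache
  # exhausted: nil = never queried, false = queried and the list was full,
  # true = queried and the server returned fewer results than the limit.
  Node = Struct.new(:children, :names, :exhausted)

  def initialize
    @root = new_node
  end

  # Record the server's answer for a prefix.
  def store(prefix, results, limit)
    results.each { |name| path(name, create: true).each { |node| node.names << name } }
    path(prefix, create: true).last.exhausted = results.length < limit
  end

  # All cached names that start with the prefix.
  def lookup(prefix)
    nodes = path(prefix)
    nodes.length == prefix.length + 1 ? nodes.last.names.to_a : []
  end

  # True if a query for this prefix, or for a shorter prefix of it, was
  # exhausted, in which case the cache already holds every possible match.
  def complete?(prefix)
    path(prefix).any?(&:exhausted)
  end

  private

  def new_node
    Node.new({}, Set.new, nil)
  end

  # Nodes visited while following key from the root, as far as the trie goes.
  def path(key, create: false)
    nodes = [@root]
    key.each_char do |ch|
      children = nodes.last.children
      child = create ? (children[ch] ||= new_node) : children[ch]
      break unless child
      nodes << child
    end
    nodes
  end
end

cache = PrefixCache.new
cache.store("su", ["susan summers", "susan smith", "sunil kumar"], 6)
p cache.lookup("susan")      # => ["susan summers", "susan smith"]
p cache.complete?("susan s") # => true, no server round trip needed
```

Storing each name on every node along its path trades memory for a very cheap lookup; collecting names from the subtree instead would be the more compact alternative.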

(Jeff Atwood) #4

Did all 5000 of those users have the same username though? I agree that once you have a million users, the namespace gets pretty… crowded.


(Rafael dos Santos Silva) #5

Hahahaha no. We had 100k users/employees, and they were all pre-created in the forum.

So we used the employee_id (one letter + 7 digits, like X1234567) as the username. People can remember the employee_id of close colleagues (like their boss and team), but sometimes you just want to mention someone:

Oh this topic is gold. Gonna mark Susan from the HR here.

You know her full name is Susan Summers, but there are 15+ Susans and 15+ Summers on the board.
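
A small Ruby sketch of the lookup that would help in this case: every word typed after @ has to be a prefix of a different word in the full name. The helper name_matches_tokens? and the sample users are made up for illustration, and this says nothing about how the actual user search works.

```ruby
User = Struct.new(:username, :name)

# True if each typed word is a prefix of a distinct word in the full name.
def name_matches_tokens?(name, query)
  remaining = name.downcase.split
  # Longest typed words first, so a short word cannot use up the only
  # name word that a longer, more specific word needs.
  query.downcase.split.sort_by { |word| -word.length }.all? do |word|
    index = remaining.index { |part| part.start_with?(word) }
    index && remaining.delete_at(index)
  end
end

# Made-up sample data using the employee_id format described above.
users = [
  User.new("X1234567", "Susan Summers"),
  User.new("X7654321", "Susan Miller"),
  User.new("X1111111", "Mark Summers")
]

matches = users.select { |user| name_matches_tokens?(user.name, "susan su") }
p matches.map(&:username) # => ["X1234567"]
```

Typing the start of the last name is enough to narrow 15+ Susans down to one, without anyone having to remember the employee_id.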


(Sawood Alam) #6

I am not an Arab, but I have not seen a group of 5+ Arab people among whom there is no one named Mohamed or Ahmed. Similarly, in India, Kumar is a very common part of men’s names.