Regular expression (non-English)

Continuation of: Tag search in tags dropdown box does not work for non-English characters

file: discourse_tagging.rb

-       term.gsub!(/[^a-z0-9\.\-\_]*/, '')
+      term.gsub!(/[^a-z0-9а-я\.\-\_]*/, '')

All checks have failed

-       term.gsub!(/[^a-z0-9\.\-\_]*/, '')
+      term.gsub!(/[^a-z0-9\p{Cyrillic}\.\-\_]*/, '')

https://github.com/discourse/discourse/pull/4886
All checks have passed

  • The а-я range is a special case (it covers only the Russian alphabet).
  • \p{Cyrillic} is the general case (it covers the whole Cyrillic script).
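The difference between the two variants can be sketched in plain Ruby. This is only an illustration: the constant and helper names are made up, and the sample words are arbitrary. It relies on the fact that ё (U+0451) and the Ukrainian ї (U+0457) lie outside the а-я range (U+0430 to U+044F) but belong to the Cyrillic script.

```ruby
# Hypothetical names; only the character classes come from the thread.
RUSSIAN_RANGE  = /[^a-z0-9а-я.\-_]/
CYRILLIC_CLASS = /[^a-z0-9\p{Cyrillic}.\-_]/

# Returns a copy of term with every character outside the allowed set removed.
def strip_with(pattern, term)
  term.gsub(pattern, '')
end

puts strip_with(RUSSIAN_RANGE,  'ёлка')  # ё is outside а-я, so it is stripped
puts strip_with(CYRILLIC_CLASS, 'ёлка')  # kept: ё is in the Cyrillic script
puts strip_with(RUSSIAN_RANGE,  'їжак')  # Ukrainian ї is stripped
puts strip_with(CYRILLIC_CLASS, 'їжак')  # kept
```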

Both variants work on localhost.

If \p{Cyrillic} is the right approach (it works fine on my site), then the same change should be made in search.rb (line 448), in the tag search filter:

advanced_filter(/tags?:([a-zA-Z0-9,\-_]+)/) do |posts, match|

What do you think about it?

Is it possible to do this in a way that also covers other alphabets, such as Persian and Arabic?

I think so, if you replace \p{Cyrillic} in the code.
You would need to find a class that fits.

I think \p{Alnum} would be the right choice.
https://idiosyncratic-ruby.com/30-regex-with-class.html
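As a rough check of that suggestion, the sketch below runs a \p{Alnum}-based pattern over sample terms in several scripts. The helper name and the sample words are assumptions for illustration, not code from Discourse; the Arabic and Persian words are written as \u escapes to avoid right-to-left rendering issues.

```ruby
# Hypothetical helper; the character class is the one proposed above.
ALNUM_STRIP = /[^\p{Alnum}.\-_]/

# Returns a copy of term with everything except letters, digits, '.', '-', '_' removed.
def clean_term(term)
  term.gsub(ALNUM_STRIP, '')
end

cyrillic = 'новости'
arabic   = "\u062E\u0628\u0631"        # Arabic sample word
persian  = "\u06A9\u062A\u0627\u0628"  # Persian sample word (contains U+06A9)

[cyrillic, arabic, persian, 'tag-1'].each do |term|
  puts "#{term.inspect} unchanged: #{clean_term(term) == term}"
end
```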

I checked on localhost and it works (as do а-я and \p{Cyrillic}).
But the tests showed:

term.gsub!(/[^a-z0-9\p{Alnum}\.\-\_]*/, '')
All checks have failed

That is strange, since \p{Alnum} already includes both letters and digits.
So the a-z0-9 part can be deleted.

Could you make the correction yourselves?
This bug has been there from the start, and working with tags is an important feature.

P.S. term.gsub!(/[^\p{Alnum}\.\-\_]*/, '') works (All checks have passed).
The file search.rb does not need correction.
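For reference, here is a small sketch of what the pattern from the P.S. does to a few sample terms. The wrapper method and the inputs are hypothetical; only the regular expression is taken from the post. The method works on a copy so the caller's string is not modified.

```ruby
# Hypothetical wrapper around the gsub! call quoted in the P.S.
def sanitize_term(term)
  cleaned = term.dup
  # Remove every run of characters that are not letters, digits, '.', '-' or '_'.
  cleaned.gsub!(/[^\p{Alnum}\.\-\_]*/, '')
  cleaned
end

puts sanitize_term('тег 1!')     # space and '!' are removed
puts sanitize_term('my_tag.v2')  # nothing to remove
puts sanitize_term('a b; c')     # spaces and ';' are removed
```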
