Arabic Search Normalization: Missing Support for Hamza Variants, Ya/Kaf Forms, and Orthographic Equivalence

I think this is a reasonable request, since it would greatly improve the search experience for Arabic and Persian sites. We’d love to review a PR that implements this feature, so I’m going to put a pr-welcome tag on it.

For anyone who decides to work on this feature: all the normalization logic should be gated behind a site setting that is enabled by default for Arabic and Persian locales (see locale_default in site_settings.yml) and disabled by default for every other locale. Core already has similar normalization logic for accented characters (see lib/search.rb), so that would be a useful reference when implementing this feature.
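To make the request a bit more concrete, here is a rough sketch of what the normalization step might look like. Everything in it is an assumption for illustration: the module name ArabicSearchNormalizer, the enabled: keyword (standing in for the site setting), and the exact character table are hypothetical and not taken from core. The mapping of hamza carriers to their base letters in particular is a judgment call that a real PR would need to settle.

```ruby
# Hypothetical helper for illustration only; not part of Discourse core.
module ArabicSearchNormalizer
  # Illustrative table: fold orthographic variants into one canonical letter.
  CHAR_MAP = {
    "\u0623" => "\u0627", # alef with hamza above -> bare alef
    "\u0625" => "\u0627", # alef with hamza below -> bare alef
    "\u0622" => "\u0627", # alef with madda       -> bare alef
    "\u0671" => "\u0627", # alef wasla            -> bare alef
    "\u0649" => "\u064A", # alef maksura          -> Arabic ya
    "\u06CC" => "\u064A", # Farsi ya              -> Arabic ya
    "\u0626" => "\u064A", # ya with hamza above   -> Arabic ya
    "\u0624" => "\u0648", # waw with hamza above  -> waw
    "\u06A9" => "\u0643", # Persian keheh         -> Arabic kaf
  }.freeze

  VARIANTS = Regexp.union(CHAR_MAP.keys)

  # Combining marks U+064B..U+065F, superscript alef U+0670, tatweel U+0640.
  STRIP = /[\u064B-\u065F\u0670\u0640]/

  # `enabled` stands in for the (hypothetical) site setting.
  def self.normalize(text, enabled: true)
    return text unless enabled
    text.gsub(STRIP, "").gsub(VARIANTS, CHAR_MAP)
  end
end
```

The same function would need to run on both the indexed text and the search term, otherwise normalized queries would stop matching unnormalized content.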
