Search should match fancy characters with their "regular" equivalents

I copy-pasted a topic title (as it is displayed, with fancy entities) into the search field:
New Lowe’s commercial with UniGeezer

No result:

I replace the fancy apostrophe with the “regular” one in the search field:
New Lowe's commercial with UniGeezer

Now the topic appears.

My suggestion is that the search should treat every fancy character as equivalent to its original one.

8 Likes

Good point, how should we handle this @sam?

3 Likes

What about diacritics?

We have some normalization for diacritics already so maybe we can also correct this in a similar path.
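For illustration, diacritic normalization of the kind mentioned above can be sketched in a few lines. This is not Discourse's actual implementation; `strip_diacritics` is a hypothetical helper, and the sketch assumes Unicode NFD decomposition followed by dropping combining marks. Note that this alone leaves the typographic apostrophe (U+2019) untouched, since it has no decomposition.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: remove combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks (accents, umlauts, ...) have a non-zero combining class.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Crème brûlée"))
# The typographic apostrophe is not a diacritic, so it survives unchanged.
print(strip_diacritics("Lowe\u2019s") == "Lowe\u2019s")
```

So handling the apostrophe "in a similar path" would need an extra, explicit character mapping on top of this step.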

@tgxworld can have a think about it.

3 Likes

@Canapin Are you still able to reproduce this? I tried to reproduce this locally but couldn’t. The apostrophe is stripped from the search data so it should not have any effect on search.

discourse_development=# SELECT TO_TSVECTOR('english', 'New Lowe’s commercial with UniGeezer') @@ PLAINTO_TSQUERY('english', 'New Lowe’s commercial with UniGeezer');
 ?column? 
----------
 t
(1 row)

Are you able to point me to the site where you’re facing this problem so that I can get a repro? Thank you!

1 Like

I still have the issue, and it happens when I search for the exact string (wrapped in "):

https://unicyclist.com/search?q=%22New%20Lowe%E2%80%99s%20commercial%20with%20UniGeezer%22

vs

https://unicyclist.com/search?q=%22New%20Lowe%27s%20commercial%20with%20UniGeezer%22
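To make the difference between the two URLs above explicit, here is a small sketch that decodes the `q` parameter of each and lists the characters that differ. `search_term` and `differing_chars` are hypothetical helper names used only for this illustration.

```python
from urllib.parse import urlparse, parse_qs

fancy_url = "https://unicyclist.com/search?q=%22New%20Lowe%E2%80%99s%20commercial%20with%20UniGeezer%22"
plain_url = "https://unicyclist.com/search?q=%22New%20Lowe%27s%20commercial%20with%20UniGeezer%22"


def search_term(url: str) -> str:
    """Return the percent-decoded value of the `q` query parameter."""
    return parse_qs(urlparse(url).query)["q"][0]


def differing_chars(a: str, b: str) -> list:
    """Pair up the characters that differ at the same position."""
    return [(x, y) for x, y in zip(a, b) if x != y]


fancy, plain = search_term(fancy_url), search_term(plain_url)
# %E2%80%99 decodes to U+2019 (right single quotation mark), %27 to U+0027.
for x, y in differing_chars(fancy, plain):
    print(f"U+{ord(x):04X} vs U+{ord(y):04X}")
```

The two queries differ in exactly one character: the typographic apostrophe versus the ASCII one.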

1 Like

Thank you for the repro. This basically affects search for exact terms when the search terms are wrapped in ". The problem here is that the real title of the topic is actually New Lowe's commercial with UniGeezer but the fancy title is New Lowe’s commercial with UniGeezer. When we do a search for exact terms, we’re only matching the given terms to the topic’s title and not the fancy title.

The difficulty here is that we can’t just replace ’ with ' unconditionally, because a topic title that was typed with ’ in it would then stop matching. I’m kind of unsure what we can do here, because the client displays different characters from the stored ones when rendering the topic title.
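A toy model of the behaviour described above may help. It is only a sketch: the `Topic` class, both function names, and the case-insensitive substring comparison are assumptions made for illustration, not Discourse's actual code. It contrasts matching the phrase against the stored title only with a hypothetical variant that also checks the fancy title.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    title: str        # what the author typed (stored title)
    fancy_title: str  # what the client displays (typographic characters)


def exact_match_title_only(topic: Topic, phrase: str) -> bool:
    """Described behaviour: the phrase is compared with the stored title only."""
    return phrase.lower() in topic.title.lower()


def exact_match_either(topic: Topic, phrase: str) -> bool:
    """Hypothetical variant: accept a hit in the stored or the displayed title."""
    needle = phrase.lower()
    return needle in topic.title.lower() or needle in topic.fancy_title.lower()


topic = Topic(
    title="New Lowe's commercial with UniGeezer",
    fancy_title="New Lowe\u2019s commercial with UniGeezer",
)
pasted = "New Lowe\u2019s commercial with UniGeezer"  # copied from the page
print(exact_match_title_only(topic, pasted))  # the reported miss
print(exact_match_either(topic, pasted))
```

Checking both titles sidesteps the unconditional replacement, though it would not help for exact matches inside post bodies.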

@gerhard @sam It seems like you have tackled this issue around quoting before, any ideas what we can do here? To be honest though, it is an edge case that will affect a very small portion of search queries. I’m inclined to just pun on this.

1 Like

This is no laughing matter! :stuck_out_tongue_winking_eye:

I guess we could normalize to ' in the index and search term. But I am honestly not sure it is worth a giant effort fixing this.
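As a rough sketch of the "normalize both sides" idea, the snippet below maps typographic quotes to their ASCII counterparts in both the stored text and the search phrase before comparing. The helper names and the exact set of mapped characters are assumptions for illustration only.

```python
# Assumed mapping: curly single and double quotes to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (typographic apostrophe)
    "\u201C": '"',  # left double quotation mark
    "\u201D": '"',  # right double quotation mark
})


def normalize_quotes(text: str) -> str:
    """Replace typographic quotes with ASCII ones."""
    return text.translate(QUOTE_MAP)


def phrase_matches(stored: str, phrase: str) -> bool:
    """Case-insensitive phrase match after normalizing BOTH sides."""
    return normalize_quotes(phrase).lower() in normalize_quotes(stored).lower()


# Stored with an ASCII apostrophe, searched with the typographic one.
print(phrase_matches("New Lowe's commercial with UniGeezer", "New Lowe\u2019s commercial"))
# Stored with the typographic apostrophe, searched with the ASCII one.
print(phrase_matches("Lowe\u2019s", "lowe's"))
```

Because the same mapping is applied to the stored text and the phrase, a title that was genuinely typed with ’ still matches, whichever apostrophe the searcher uses.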

1 Like

This is not related to the search index. For exact matches, we match it against Post#raw and Topic#title:

1 Like

I see, yeah … no easy solution here at all, I think this is just a nit we have to live with.

2 Likes