Search function returning approximate results?


I was searching for this topic on my forum: les passants - Autour d'une roue

The topic’s title is “les passants”

Previously, searching for “passants in:title” would return this topic among the first results.

But now, it displays approximate results:

Instead of “passants”, the results match “passe”, “pass”, “passage”, and “passion”, and some of these are ranked above the exact string “passants”…

I suppose this is a bug?

I’m afraid I don’t know enough about Search to say whether that’s intentional, but you can put double quotes around a word to make it an exact search, if that helps:

E.g. "passants"

Search results for « "passants" in:title »
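A toy Python sketch of why a quoted search can behave differently from the default one: the default search compares normalized word forms (lexemes), so every word that normalizes to the same lexeme matches, while an exact search requires the literal word. The `TOY_LEXEMES` table and the helper names are hypothetical stand-ins for a real stemmer; this models the idea only, not how Discourse actually implements quoted searches.

```python
import re

# Hypothetical lexeme table standing in for a real stemmer: several
# word forms all map to the same lexeme "pass".
TOY_LEXEMES = {"passants": "pass", "passant": "pass", "passe": "pass", "passage": "pass"}


def words(text: str) -> list[str]:
    """Split text into lowercase words (letters only, accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stemmed_match(query: str, title: str) -> bool:
    """Default-style search: compare lexemes, so any form of the word matches."""
    q = TOY_LEXEMES.get(query.lower(), query.lower())
    return any(TOY_LEXEMES.get(w, w) == q for w in words(title))


def exact_match(query: str, title: str) -> bool:
    """Quoted-style search (toy model): the literal word must appear."""
    return query.lower() in words(title)


if __name__ == "__main__":
    titles = ["les passants", "le passage du col", "une roue de vélo"]
    for t in titles:
        print(f"{t!r}: stemmed={stemmed_match('passants', t)}, exact={exact_match('passants', t)}")
```

In this toy model, “le passage du col” matches the default search for “passants” (both words share the lexeme “pass”) but not the exact search.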


That sure helps, but the default behavior is puzzling.

For some reason, when I search for the exact string in double quotes, it also returns a result that has no “passants” in its title (although the word does appear in the first post): "passants" in:title


In French, “passants” is reduced to the lexeme “pass” before we search against the contents of posts.

A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word become identical.
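To make the idea concrete, here is a toy Python sketch of how normalization collapses several word forms into one lexeme. The suffix list `TOY_SUFFIXES` and the `min_stem` cutoff are illustrative assumptions; this is not the Snowball French stemmer that Postgres uses, only a simplified stand-in showing how “passants”, “passe”, and “passage” can all end up as “pass”.

```python
# Toy normalizer: strips a few hypothetical French-like suffixes.
# NOT the Snowball French stemmer used by Postgres; the suffix list
# below is an illustrative assumption.
TOY_SUFFIXES = ("ants", "ant", "age", "es", "e", "s")  # longer suffixes first


def to_lexeme(word: str, min_stem: int = 4) -> str:
    """Lowercase the word and strip the first matching suffix,
    keeping at least `min_stem` characters of stem."""
    w = word.lower()
    for suffix in TOY_SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w  # no suffix applies: the word is its own lexeme


if __name__ == "__main__":
    for token in ["passants", "passant", "passe", "passage", "pass"]:
        print(f"{token!r:12} -> {to_lexeme(token)!r}")
```

Every form in the demo maps to 'pass'; the `min_stem` guard keeps a short word like “pass” from being cut further to “pas”.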


I understand.

Are lexemes language-specific?

I don’t think it makes much sense in French to reduce such a word to “pass”. :thinking: This is more confusing than helpful, considering how many words begin with “pass” without being related (directly or not) to the verb “passer”.

But if we can target the exact string with double quotes instead, that will do :slight_smile:.

Also, why does the second result in my last screenshot not have “passants” in its title, even though I used double quotes?

This is configurable in Postgres, but it is Postgres’s own French text search configuration that reduces the word all the way down:

```
discourse_development=# select to_tsvector('french', 'passants');
 to_tsvector
-------------
 'pass':1
(1 row)

discourse_development=# select to_tsvector('english', 'passants');
 to_tsvector
-------------
 'passant':1
(1 row)
```

More sophisticated dictionaries can be used, but they are extremely complicated to configure.