Setting a similarity threshold for semantic search

I am using Discourse v3.1.5 in my application. When I use semantic search for topics, it ranks them from most to least similar, but it shows all topics in the results even when the search query is completely unrelated. I want a topic to appear as a search result only if it meets a minimum similarity. Is there a way to do that?

By default, it limits the results to the 50 closest matches. There is no way to pass an arbitrary similarity threshold at the moment, but it seems like something we could add.

How would you envision it? A single setting with the max distance?

Ideally, yes. A max distance would help eliminate irrelevant search results.

I’m curious, how would you determine an appropriate value for max distance? For a single instance it is easy, but what pushed me away from this is that calculating a proper value for every instance out there and setting it as a default is non-trivial.

I guess we could ship it set to null, and therefore disabled, by default.
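
To make the idea concrete, here is a minimal sketch of an optional max distance on top of the existing top-50 limit. It is not the plugin's code: `filter_results`, `max_distance`, and the `(title, distance)` result shape are hypothetical names chosen for illustration, and it assumes a lower distance means a closer match.

```python
from typing import Optional

DEFAULT_LIMIT = 50  # the thread states the 50 closest results are returned by default


def filter_results(
    ranked: list[tuple[str, float]],
    max_distance: Optional[float] = None,  # hypothetical setting; None means disabled
    limit: int = DEFAULT_LIMIT,
) -> list[tuple[str, float]]:
    """Keep the closest `limit` results, dropping any beyond `max_distance`.

    `ranked` holds (topic_title, distance) pairs sorted by ascending distance.
    """
    top = ranked[:limit]
    if max_distance is None:
        # Setting left at null: behave exactly as before.
        return top
    return [(title, dist) for title, dist in top if dist <= max_distance]
```

With the setting left at `None`, the behavior is unchanged, so only instances that opt in would need to pick a value.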

The feature would be useful when there is not much data to search. An alternative way to handle less relevant results, without limiting the max distance, is to display a message such as “close matches were not found”.
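
A rough sketch of that alternative, assuming results arrive as `(title, distance)` pairs sorted by ascending distance. The function name `render_results` and the `close_enough` cutoff are hypothetical, not part of Discourse. Nothing is hidden here; a notice is added only when even the best match is far away.

```python
def render_results(ranked, close_enough=0.5):
    """Return the lines to display for a list of (title, distance) pairs.

    `close_enough` is a hypothetical cutoff: if even the best match is
    farther than this, a notice is shown above the (unfiltered) results.
    """
    lines = []
    if not ranked or ranked[0][1] > close_enough:
        lines.append("Close matches were not found.")
    # All results are still listed; only the notice changes.
    lines.extend(title for title, _ in ranked)
    return lines
```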

What about setting the threshold to, say, one third of the maximum? Or showing only the top-n results?
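
One possible reading of "one third of the maximum", sketched under the assumption that "maximum" means the largest distance among the returned candidates. `relative_cutoff` and `fraction` are hypothetical names, and results are again assumed to be `(title, distance)` pairs.

```python
def relative_cutoff(ranked, fraction=1 / 3):
    """Keep results whose distance is at most `fraction` of the largest distance.

    `ranked` holds (title, distance) pairs; the cutoff is computed from the
    candidate set itself, so no per-instance absolute value is needed.
    """
    if not ranked:
        return []
    largest = max(dist for _, dist in ranked)
    cutoff = largest * fraction
    return [(title, dist) for title, dist in ranked if dist <= cutoff]
```

Because the cutoff depends on the spread of the candidates, the same query can keep a different number of results depending on what else happens to be in the list.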

Showing only the top-n results is the current behavior: it shows the top 50.
