More specific "similar topics" when composing

Questions have frequently come up here on meta about tweaking or turning off the education notice (is that what it’s called in this case?) that is supposed to alert users to “similar topics” while they’re composing a new topic. My interpretation (and my own experience) is that the suggested topics are often too dissimilar and hence irrelevant. My suspicion is that in the long term this leads to users not paying attention to the notices at all and routinely dismissing them as a nuisance.

If that’s the case, I think that would be a pity, because the idea behind the feature is obviously very useful. So I wonder if it would make sense to introduce a couple of settings that allow admins to tweak the selection of relevant topics so as to change the UX from “if there are similar topics suggested while composing, there is a slight chance that this list includes one that is relevant for me” to (ideally) “if a similar topic is suggested, it’s probably the one I’m looking for”.

I’m not sure what might be the best settings to tweak, but the ones available at the moment (min title similar length, max similar results, and minimum topics similar) don’t seem to be enough.

One idea could be to limit the search for similar topics to certain trigger terms that staff would enter manually based on their experience of recurring questions.

Another would be to limit similar topics to topics in specific categories or with specific tags.

The obvious, but possibly most difficult to implement, option would be simply to change the similarity threshold for topics to be considered similar. For example, one could require x similar words in the title (unless the title is still empty) AND y identical words in the body.
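To make the AND rule concrete, here is a minimal sketch. The function name, the overlap thresholds, and the naive whitespace tokenization are all my own assumptions, not anything Discourse actually does internally:

```python
def is_similar(draft_title, draft_body, topic_title, topic_body,
               min_title_overlap=2, min_body_overlap=3):
    """Return True only if the candidate topic clears BOTH thresholds.

    min_title_overlap (the "x" above) and min_body_overlap (the "y")
    are hypothetical settings, not real Discourse site settings.
    """
    def words(text):
        # Naive tokenizer: lowercase, split on whitespace, drop very
        # short words. A stand-in for whatever Discourse really uses.
        return {w.lower() for w in text.split() if len(w) > 2}

    title_hits = len(words(draft_title) & words(topic_title))
    body_hits = len(words(draft_body) & words(topic_body))

    # Skip the title check while the draft title is still empty.
    title_ok = not draft_title.strip() or title_hits >= min_title_overlap
    return title_ok and body_hits >= min_body_overlap
```

The point of requiring both overlaps is that a candidate matching only on a couple of common title words would no longer be shown, which is exactly the false-positive case described above.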


None of this is relevant unless you have tested your search ideas / theories using Advanced Search, with specific real-world examples.


Sorry to pile on here, but getting topic similarity search to work efficiently is an extremely hard computer science problem. It is the reason complex recommendation systems exist, built on deep-learning neural nets.

Trying to jimmy in something here that is not super duper AI :nerd_face:, yet still performs well and produces good results, is not going to be easy, to say the least.


I understand the benefits of AI, and I know that Google search has raised expectations of what any search engine should be able to do. It goes without saying that we cannot expect Discourse to use neural networks or even “just” topic modelling. That’s why my tentative solution tries to address the problem of false positives from the other side, as it were: not by improving the search algorithm itself, but by adding a blunt exclusion mechanism:

a) if it doesn’t include one of these terms, don’t show it.
b) if it doesn’t belong to this category/tag, don’t show it.
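A rough sketch of how such an exclusion pass could sit on top of the existing suggestion results. The `trigger_terms` and `allowed_categories` settings are hypothetical admin-entered values; neither exists in Discourse today:

```python
def filter_suggestions(candidates, trigger_terms=None, allowed_categories=None):
    """Drop suggested topics that fail either exclusion rule.

    candidates: list of dicts with 'title' and 'category' keys
    (a simplified stand-in for real topic records).
    trigger_terms / allowed_categories: hypothetical admin settings;
    when left as None, the corresponding rule is disabled.
    """
    kept = []
    for topic in candidates:
        title = topic["title"].lower()
        # (a) if it doesn't include one of these terms, don't show it
        if trigger_terms and not any(t.lower() in title for t in trigger_terms):
            continue
        # (b) if it doesn't belong to these categories, don't show it
        if allowed_categories and topic["category"] not in allowed_categories:
            continue
        kept.append(topic)
    return kept
```

Because this only ever removes candidates from a list the existing algorithm already produced, it cannot surface worse matches; at worst it shows nothing, which is arguably better than showing irrelevant suggestions.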