Yeah, I’ve been pondering this one a bit myself. Improving the relevance check is definitely one part of it, but I also frequently encounter new duplicate topics on Meta that I know had good suggestions upon topic creation (I’ve tested this by creating a new topic and inserting their title & body).
I think an equally important side to this is getting the user to actually look at the suggestions. Personally I’ve become practically blind to it at this point. I don’t know how prevalent this “blindness” is, but in my case I frequent such a large portion of Meta that I feel pretty confident any new topic I make will not be a duplicate (also I tend to search before posting, but that can’t be expected of the average user IMO, that’s what the automatic topic search is there for after all). I fear this habit might have transferred with me to other forums as well, where I’m nowhere near as well versed in the forum’s content.
One idea I’m playing around with is to move the JIT notification into the preview field:
You start typing, and at first all you see is your preview.
Two paragraphs or so in, the suggestions pop up.
If you just continue typing however, the pop-up will be folded down to an expandable “Your topic is similar to…”. This has the added effect of maybe making the user curious: “hmmm, I wonder if the suggestions are different now”.
I don’t know exactly how the relevance algorithm works, but it would also be very nice if there were a distinction between very relevant topics and only moderately relevant ones. Then the relevance search could further pique the user’s interest with the occasional “Your topic is very similar to X other topics”, referring to topics with a >80% relevance score (not a real thing afaik). Suddenly there’s an exact count for you to look at.
Not sure I’d even want this, but, further building on the hypothetical relevance score, we could also put an actual stop gap in front of “Create topic” if your topic has other topics >95% relevant to it. You’d click “Create topic” and you’d be hit with a warning like: “These other topics seem very closely related to yours. Could you please take a look and see if your topic isn’t already being discussed?”
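To make the two thresholds concrete, here is a rough sketch of how the composer could turn scores into hints. Everything in it is hypothetical: the `Match` type, the `composer_hints` function, and the assumption that each candidate topic comes back with a relevance score between 0 and 1 (which, as said, is not a real thing afaik).

```python
from dataclasses import dataclass

# Hypothetical cutoffs, taken from the figures floated above.
VERY_SIMILAR = 0.80
NEAR_DUPLICATE = 0.95


@dataclass
class Match:
    title: str
    score: float  # assumed relevance score in [0, 1]


def composer_hints(matches):
    """Decide what the composer shows for a list of scored matches."""
    very_similar = [m for m in matches if m.score > VERY_SIMILAR]
    near_dupes = [m for m in matches if m.score > NEAR_DUPLICATE]

    banner = None
    if very_similar:
        n = len(very_similar)
        noun = "topic" if n == 1 else "topics"
        banner = f"Your topic is very similar to {n} other {noun}"

    return {
        "banner": banner,
        # Only interrupt "Create topic" for near duplicates.
        "confirm_before_create": bool(near_dupes),
        "near_duplicates": [m.title for m in near_dupes],
    }
```

With this shape, moderately relevant topics would never trigger the banner or the warning; only the count of strong matches is surfaced, and the extra click is reserved for the >95% case.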
You are talking here about changing a feature that I use 0% of the time to be even more annoying.
Our relevance check is super crude at the moment. I guess if you had a 90% match on the title with another title… Maybe… But so many people will just type “discourse is not working”. Getting a half-decent algorithm going for forums is just super duper hard.
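For what a “90% match on the title” could look like, here is a minimal sketch using Python’s standard `difflib`. It is only an illustration, not how the current relevance check works; the function names and the 0.9 cutoff are assumptions.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two titles, ignoring case and extra spaces."""
    def norm(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def near_identical_titles(new_title, existing_titles, cutoff=0.9):
    """Return the existing titles that are at least `cutoff` similar to the new one."""
    return [t for t in existing_titles if title_similarity(new_title, t) >= cutoff]
```

It also shows the problem: two topics both titled “discourse is not working” would match perfectly on the title even if the underlying issues have nothing in common.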
I think we should do some testing here to see what we are returning in “real life”, how often it is helpful, etc.
First thing is that the results need to be much more useful. Also, I wish I could opt out of the feature at TL2 or TL3.
It is tricky because, on the one hand, you have to do something to ensure that people search before creating a new, duplicate topic. And unfortunately, relevance through search is a super hard problem.
Contrary to @sam’s observations, I have had positive results when creating a new topic and getting a duplicate topic search hit from the “your topic is similar to” panel. Others have reported positive results on Twitter as well, if you browse the @discourse Twitter account.
My problem is I rarely create new topics in the first place, and on the rare occasion that I do, I know they are guaranteed to be new topics, not duplicates. Probably the same for Sam. To be honest this is a feature for your, uh, “less experienced” users.
Consider a full-screen topic composer for TL0 and TL1; that way we have lots of room on the screen to teach people and can follow the same pattern used at Stack for similar topics.
Improve results a lot; currently they are just too random.
Allow TL2 and above to opt out of this feature.
As it stands I feel the feature needs lots of love.
The way we improved it at Stack was to manually drop the top 10k English words from the query terms, so you are only ever querying on really unique, rarer words rather than “how do I” matches, which are almost always bullshit.
We also heavily prioritized title matches, whereas the current code uses both title and body, provided they are long enough.
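Roughly, the two ideas (drop the common words, weight the title) could be sketched like this. It is a toy illustration, not the Stack or Discourse implementation: `COMMON_WORDS` is a tiny stand-in for a real top-10k word list, and the `title_weight` of 3 is an arbitrary assumption.

```python
import re

# Tiny stand-in for a real "top 10k English words" list.
COMMON_WORDS = {
    "how", "do", "i", "the", "a", "to", "is", "not", "my", "in",
    "on", "with", "for", "and", "of", "it", "working",
}


def rare_terms(text, common=COMMON_WORDS):
    """Lowercase the text and keep only words that are not in the common list."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in common}


def score(query_title, query_body, cand_title, cand_body, title_weight=3.0):
    """Score a candidate topic in [0, 1]; a title hit counts title_weight times a body hit."""
    terms = rare_terms(query_title) | rare_terms(query_body)
    if not terms:
        return 0.0  # nothing distinctive to search on, e.g. "how do I"
    title_hits = terms & rare_terms(cand_title)
    body_hits = (terms & rare_terms(cand_body)) - title_hits
    return (title_weight * len(title_hits) + len(body_hits)) / (title_weight * len(terms))
```

A draft that consists only of common words ends up with no query terms at all, so it scores 0 against everything instead of matching every other “how do I” topic.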
As a very important followup to such an improvement: Don’t pop up any suggestions unless the algorithm has found something that is indeed quite similar. This is probably the primary reason for my blindness. Whatever I’m writing, however unique, some suggestions will pop up no matter how much of a long shot they are.
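In code terms the rule is tiny. A sketch, assuming the search hands back `(title, score)` pairs with scores between 0 and 1; the function name, the 0.6 cutoff, and the limit of 5 are all made up for illustration:

```python
def suggestions_to_show(scored, min_score=0.6, limit=5):
    """Return the titles worth showing, best first, or [] to keep the panel hidden."""
    strong = [(title, s) for title, s in scored if s >= min_score]
    strong.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in strong[:limit]]
```

An empty list would mean the panel never opens at all, so that when it does open it actually signals something.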
Well, it’s the same sort of stuff that Google’s trillion dollar empire is based on. Search queries, expressed in SQL, probably against full text indexes. It is not black magic from another galaxy.
You might say the same thing to Google: just show me the correct matches for my search.