A possible approach @Falco could be the reverse of what our current one does
For each topic try to extract/create 20 or so keywords and layer them on top of the existing keywords
I wonder if that helps
Our relevance search does not take views or PageRank into account. And to complicate things, all-time view counts can get really high and skew results, so we would probably need views by year or something similar to correct for that.
But… with pagerank / accounting for view counts / likes it is possible we can come up with a far better relevance algorithm.
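To make the idea concrete, here is a minimal sketch of blending a text-match rank with popularity signals. Everything in it is an assumption for illustration: the function name `blended_score`, the weights, and the choice of log dampening are hypothetical and not how Discourse search currently works.

```python
import math

def blended_score(text_rank, views_last_year, likes, inbound_links,
                  w_views=0.3, w_likes=0.2, w_links=0.5):
    """Boost a text-match rank with log-dampened popularity signals.

    All names and weights are hypothetical. Views are limited to the last
    year so that old topics with huge all-time counts do not dominate.
    """
    # log1p keeps very large counts from swamping the text match.
    popularity = (w_views * math.log1p(views_last_year)
                  + w_likes * math.log1p(likes)
                  + w_links * math.log1p(inbound_links))
    # Multiplying means a topic that does not match the query stays at zero.
    return text_rank * (1.0 + popularity)
```

Because the boost is multiplicative, popularity can reorder matching topics but can never surface a topic that does not match the query at all.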
This is complex stuff: a multi-trillion-dollar company was built on these ideas, and another multi-trillion-dollar company has no easy way of catching up.
There I fixed it … at #1 now.
I discussed this issue with @tgxworld and @JammyDodger in the past, we baked ourselves a very bad cake here.
The simple workaround is
Going through every single plugin topic and appending “Plugin” at the end.
Discourse Advertising Plugin
Discourse Chat Plugin
and so on…
Title matches “win”, so for example “Advertising” in category plugin will lose to “Discourse Advertising Plugin question” in category random.
We could “bloat” our title index by appending category and tags. I think this is what Google does anyway.
So instead of indexing:
first priority “Discourse Advertising”
second “plugin”
third priority “content”
We could index
first priority “Discourse Advertising - plugin tag1 tag2”
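As a rough sketch of that “bloated” title, assuming a hypothetical helper `title_index_text` (not an existing Discourse function) that builds the text to be indexed at the highest priority:

```python
def title_index_text(title, category, tags):
    """Return the title with category and tags appended for indexing.

    Hypothetical helper: words the title already contains are not repeated.
    """
    seen = set(title.lower().split())
    extra = []
    for part in [category, *tags]:
        if part and part.lower() not in seen:
            extra.append(part)
            seen.add(part.lower())
    return title if not extra else f"{title} - {' '.join(extra)}"


print(title_index_text("Discourse Advertising", "plugin", ["tag1", "tag2"]))
# prints: Discourse Advertising - plugin tag1 tag2
```

Skipping words already in the title means a topic that was manually renamed to “Discourse Chat Plugin” would not get “plugin” indexed twice.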
Of course a workaround is searching for:
#plugin chat
…
vs
FWIW … might as well go and fix up all the official plugins now, will only take me a few mins.
How about taking into account the number of links to the topic?
Yes, that is page rank, I mentioned that
So many trade offs though, should an exact title match lose to high page rank?
No. Exact titles are what I most often look for, but I’m pretty special. When I’m looking for a “why didn’t you do a search” link I’m mostly looking for things I know exist (a step away from standard install; for many months I was stumped that “straightforward” would no longer find the Configure direct-delivery incoming email for self-hosted sites with Mail-Receiver topic, but I recently got it renamed so “mail receiver” works).
Ah. Now I see that you said that.
For the things I actually search for that I don’t know that I’m looking for, the most-recent usually does best.
FWIW, on my own (largely just for me) sites, with relatively few topics and posts, I think search works pretty well!
This is the way. There are many search tools to test before wasting too much effort on the internal one. I don’t know any site with an internal search that doesn’t get this complaint. Even Reddit, which is one of the largest sites around, gets criticized for its search.
By correlating user behavior during searches and reading (and possibly through inquiries, as Google Maps does, for example), Discourse could internally generate knowledge about anticipated outcomes of queries.
I also wonder if AI could help steer a conversation towards the desired results. Such a dialog could start with a button that says: “I am dissatisfied with the results”. The role of the AI would then be to ask questions whose answers either narrow down the range of outcomes or prioritize them appropriately.
A typesense plugin sounds amazing.
Good topic! Search in forums is a really tricky thing, and the solution of using Google tends to come up a bit too often for my tastes.
Would agree here. You don’t want old topics to dominate your search results.
Judging from my own search expectations, I would want the best results to be threads that are both recent and active, and which are a good match in terms of title and category. And even after that I would prefer recency to have a notable impact, because I often search for things that I vaguely remember.
Unfortunately also true. Personally, I’m not even sure how much links would really contribute to relevance (though they probably would be a factor), because in the forums I’m active in, but which are not support or technical forums of some kind, linking is relatively rare.
So I tend to consider recency and activity, i.e. number of views, likes/reactions, and replies within the not-too-distant past, more important (not sure if this is also factored into the current search implementation or not).
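A minimal sketch of what “activity within the not-too-distant past” could mean in practice. The event kinds, the weights, and the 90-day window are all assumptions picked for illustration:

```python
from datetime import datetime, timedelta, timezone

def recent_activity(events, now, window_days=90):
    """Sum weighted activity events that fall inside the recent window.

    `events` is a list of (kind, timestamp) pairs; kinds and weights are
    hypothetical and would need tuning per forum.
    """
    weights = {"view": 1, "like": 5, "reply": 10}
    cutoff = now - timedelta(days=window_days)
    return sum(weights.get(kind, 0) for kind, when in events if when >= cutoff)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
events = [
    ("view", now - timedelta(days=1)),
    ("like", now - timedelta(days=10)),
    ("reply", now - timedelta(days=200)),  # too old, ignored
]
print(recent_activity(events, now))  # prints: 6
```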
I think it’s worth looking at the algorithm Reddit uses for its “hot” score:
That is something like
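For reference, the “hot” ranking from Reddit’s formerly open-source codebase is usually given as below. Treat this as a sketch of that published version, not necessarily what Reddit runs today; for a forum, likes would have to stand in for upvotes, which is an assumption on my part.

```python
import math
from datetime import datetime, timezone

REDDIT_EPOCH = 1134028003  # offset in seconds used in the published code

def hot(ups, downs, created):
    """Log-scaled vote score plus a steadily growing time bonus.

    `created` should be a timezone-aware datetime.
    """
    score = ups - downs
    # The first 10 votes count as much as the next 90, and so on.
    order = math.log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    seconds = created.timestamp() - REDDIT_EPOCH
    # Every 45000 seconds (12.5 hours) of age is worth one order of magnitude.
    return round(sign * order + seconds / 45000, 7)
```

The notable design choice is that the score never decays: newer posts simply start with a higher time bonus, so an older post needs ten times the net votes to keep up with one posted 12.5 hours later.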