Search results should prioritize first post in topic when title matches search term

Recently I’ve noticed that when I search for a specific topic, the search results give me a post in the middle of the topic instead of the first post. This happens when a post in the topic contains the same search term that the title of the topic includes. Here are two recent examples from http://community.wanikani.com:

Searching for “Self Study Quiz” or “Self-Study Quiz” gives me [Userscript] Self-Study Quiz - #731 by prouleau - API And Third-Party Apps - WaniKani Community instead of the first post.

Searching for “ご注文はうさぎですか” gives me ご注文はうさぎですか miscellaneous discussion - #93 by seanblue - Reading - WaniKani Community instead of the first post.

I put this in #feature because I’m not 100% sure this has ever worked the way I’m describing. However, I search for the second term above several times a year, and I’m pretty sure the search results used to return the first post in the topic. If this was changed unintentionally at some point, you could consider this a #bug.

I know there’s an advanced search option to only search topic titles, but that seems unnecessary in this case. Since the search term matches many posts in the topic as well as the title itself, I think it should automatically prioritize the title match and return the first post.

6 Likes

This is the result of @tgxworld and @sam changes to search recently. Perhaps they can respond.

4 Likes

Yeah, we had a long discussion about this. Overall, my preference is for search to be very dumb and always prioritize the first match in the topic regardless of ranking. It makes it far easier to explain to people; ranking feels very arbitrary and magical.

6 Likes

I understand the request in the OT as letting search reference the topic itself (i.e., the first post) if the topic title matches the search criteria. If not, it should reference the concrete post where the match occurs.

2 Likes

The problem is that we may have 15 matches in one topic, and one of the 15 has the highest “rank”.

4 Likes

Frankly, as long as the first post (via title match) is prioritized above all else I don’t care much about the rest, but what you’re saying sounds reasonable to me. When a dozen posts in the topic match the exact phrase you’re searching for, the one actually being returned in the search results certainly feels arbitrary right now. Always returning the earliest matching post in the topic feels like a good solution to me.

That said, the search algorithm should probably at least still prioritize exact matches to make sure an earlier post that only partly matches the search term isn’t prioritized over a later post that matches it exactly. I don’t know if that’s relevant since I’m not that familiar with the algorithm. Either way, as I said before I think all these details are far less important than prioritizing title match over everything else.

5 Likes

I keep getting tagged from people linking to a post of mine from the middle of a topic when they meant to share the topic itself. The current approach sure is causing confusion.

1 Like

Yeah @tgxworld and @sam this really has to be changed. We’re so far beyond the rule of 3 here. Title match should have massive supernova blackhole weight in the ranking.

9 Likes

I don’t think things can be as simple as always prioritising the first post of a topic, because changing one specific search scenario will affect another. Consider the following scenario:

  1. Topic X and Topic Y both contain posts which match a given search term “discourse”.
  2. Post number 1 and post number 100 in Topic X match the search term “discourse”.
  3. Post number 100 ranks higher since its contents are a much better match for the search term “discourse”.
  4. Post number 1 of Topic Y matches the search term “discourse”.
  5. Post number 1 of Topic Y ranks higher than Post number 1 of Topic X since its contents are more relevant to the search term “discourse”.

With the previous approach, we were taking the ranking of post number 100 of Topic X to rank against post number 1 in Topic Y. In this case, post number 1 of Topic X appears higher in search than post number 1 of Topic Y even though post number 1 of Topic Y clearly matches the search term better than post number 1 of Topic X. This is equivalent to ranking topics by taking the highest ranking of the post in the topic.
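To make the scenario concrete, here is a minimal Python sketch of the two behaviours as I read them from the description above. The `Match` type, the function names, and the rank values are all made up for illustration; this is not Discourse's actual code, and real ranks come from Postgres.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    topic: str
    post_number: int
    rank: float  # hypothetical relevance score; higher is better


# Invented scores that mirror the scenario above.
matches = [
    Match("X", 1, 0.2),
    Match("X", 100, 0.9),
    Match("Y", 1, 0.5),
]


def previous_approach(matches):
    """Order topics by their best-ranked post, but show the earliest matching post."""
    by_topic = {}
    for m in matches:
        by_topic.setdefault(m.topic, []).append(m)
    ordered = sorted(
        by_topic.values(),
        key=lambda ms: max(m.rank for m in ms),
        reverse=True,
    )
    return [min(ms, key=lambda m: m.post_number) for ms in ordered]


def best_post_per_topic(matches):
    """Show the best-ranked post of each topic, ordered by that same rank."""
    best = {}
    for m in matches:
        if m.topic not in best or m.rank > best[m.topic].rank:
            best[m.topic] = m
    return sorted(best.values(), key=lambda m: m.rank, reverse=True)


def label(results):
    return [(m.topic, m.post_number) for m in results]


print(label(previous_approach(matches)))    # [('X', 1), ('Y', 1)]
print(label(best_post_per_topic(matches)))  # [('X', 100), ('Y', 1)]
```

In the first output, post 1 of Topic X sits above post 1 of Topic Y even though its own rank (0.2) is lower, because it borrows the rank of post 100.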

If we were to just take the first post in a topic which matches the search term, we would end up excluding posts which are a better match but have a higher post number in the topic, which invites questions about why those posts weren’t included in the search results.

I think my main contention here with what is being proposed is that search by default searches for posts so the body of a post is actually what a user is searching for, not the title. The title helps us to filter through all the topics for the right posts to search for.

We actually do give title match a higher weight. The only problem is that title is part of the search data for all posts in a given topic so all posts are actually benefiting from the same weight for title matches.

2 Likes

I don’t think anyone is suggesting this though. We’re saying to rank title matches higher, which is not the scenario you just described. When the title matches the search term, it should specifically boost the rank of the first post, not all posts in the topic.

I disagree with this assessment. Users don’t care about finding posts specifically, but rather about finding content. When topics are properly named, the title is often the best indicator of what the user wants, and most of the time that means starting with the first post.

At least in the community I use, there are topics in certain subcategories that are often searched for for the purpose of sharing the topics when other users ask for help. Getting a random post in the topic from the search results instead of the first post has led to the wrong posts being shared, which has resulted in users not getting help as quickly as they could.

Whether for sharing or for personal use, I also think you’re underestimating how often users just want to search for the same topic over and over again, and I think in those cases, the vast majority of the time the users either want the first post or to continue from where they left off.

3 Likes

The scenario I described was how search in Discourse used to work and is an incorrect behaviour.

I think this was the wrong way to phrase it: search by default searches for a combination of both the topic’s title and the post’s body.

While this may be true, it isn’t always the case, as this assumes that the search term will always match the topic’s title. On coding forums, sometimes I’m searching for a snippet of code, which is not something that will appear in the title.

What is being proposed isn’t impossible although there are some technical decisions and tradeoffs we will need to make. For instance, we need to remove the topic’s title from PostSearchData for posts which are not the first post. This ensures that topic title matches will rank the first post higher than the other posts in the topic but it still means that we’re at the mercy of the Postgres ranking algorithm.
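A rough Python sketch of that idea, under heavy assumptions: the weights, the `build_search_fields` and `score` helpers, and the post bodies are invented for illustration, and the toy scoring only stands in for whatever Postgres actually computes from PostSearchData.

```python
# Hypothetical field weights; the real ranking is done by Postgres.
WEIGHTS = {"title": 1.0, "body": 0.1}


def build_search_fields(title, body, post_number, title_on_replies=True):
    """Return the fields indexed for one post (toy stand-in for PostSearchData)."""
    fields = {"body": body}
    if post_number == 1 or title_on_replies:
        fields["title"] = title
    return fields


def score(fields, term):
    """Toy score: sum the weight of every indexed field that contains the term."""
    term = term.lower()
    return sum(WEIGHTS[name] for name, text in fields.items() if term in text.lower())


def best_post(title, posts, term, title_on_replies):
    """Return the post number with the highest toy score."""
    scores = {
        number: score(build_search_fields(title, body, number, title_on_replies), term)
        for number, body in posts.items()
    }
    return max(scores, key=scores.get)


title = "[Userscript] Self-Study Quiz"
posts = {  # invented bodies
    1: "A userscript for reviewing items outside of lessons.",
    731: "The self-study quiz stopped loading for me.",
}

print(best_post(title, posts, "self-study quiz", title_on_replies=True))   # 731
print(best_post(title, posts, "self-study quiz", title_on_replies=False))  # 1
```

With the title indexed on every post, the reply gets the title boost plus its own body match and wins. With the title indexed only on post 1, the first post wins on a title match alone, though the final order would still depend on the real ranking function.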

4 Likes

Here’s another example to demonstrate my use case. I searched for “special kanji” on https://community.wanikani.com with the goal of finding this topic: Special kanji words derived from other words - Kanji - WaniKani Community. I wanted to look at the table of data in the first post, which is a wiki. As you can see, my search term exactly matches the start of title of the topic. However, the first post doesn’t actually contain the word “special”, so another post got prioritized instead, just because both the words “special” and “kanji” were in the post somewhere.

Hopefully giving additional examples like this is helpful.

3 Likes

@sam I wonder what your thoughts on this are. Search is ranking correctly because the “post somewhere” contains the search terms in the title and the body, while the first post only contains the search terms in the title. It seems counterintuitive that we would want to show the first post when another post is a way better match. From what I see, there are a couple of options we can consider:

  1. Revert to old behaviour. When multiple posts in a topic match a given search term, always pick the post with the smallest post number. The downside here is that the smallest post number may not always be the first post and our search result may end up being poor when that is the case. There are also cases where a post’s body is obviously a much better match but ends up being excluded just because another post with a lower post number in the topic also matches the search terms.

  2. I wonder if we can solve the problem from the UX side of things. Instead of just having a single link that goes to the post, perhaps the title of the search result will always go to the first post in the topic while the search excerpt will link to the post that ranked in the search result.

  3. Exclude topic title information from the PostSearchData of posts that are not the first post. We will need to experiment with things a little but I expect this to heavily skew search results to return the first posts.

I’m kind of inclined to try out option 2 because it keeps our search results accurate while providing users, that know what they are searching for, a way to go to the first post from the search results.
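As a sketch of what option 2 could look like for a single search result, assuming topic URLs of the form `/t/<slug>/<topic_id>/<post_number>`: the `SearchResult` class, its property names, and the topic id below are hypothetical and only illustrate the two link targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    slug: str
    topic_id: int
    matched_post_number: int  # the post that ranked in the search

    def _post_url(self, post_number: int) -> str:
        return f"/t/{self.slug}/{self.topic_id}/{post_number}"

    @property
    def title_url(self) -> str:
        """Clicking the result title always opens the first post."""
        return self._post_url(1)

    @property
    def excerpt_url(self) -> str:
        """Clicking the excerpt opens the post that actually matched."""
        return self._post_url(self.matched_post_number)


result = SearchResult(slug="self-study-quiz", topic_id=12345, matched_post_number=731)
print(result.title_url)    # /t/self-study-quiz/12345/1
print(result.excerpt_url)  # /t/self-study-quiz/12345/731
```

The ranking is untouched in this variant; only the link targets change, so the UI would need to make it clear that the two links go to different places.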

4 Likes

Feels really icky to me. This is non-obvious, unless the user is in the habit of reading (and interpreting) hovered link-to data.


Perhaps consider a two-stage search?

  • Search Topic Titles, matching only on EXACT_MATCH and CONTAINS_ALL
    • These get placed at the top of the results, with EXACT_MATCH getting priority
  • Then, fill in the rest of the results using the current weighted search, excluding any topic-starter posts that were caught by the first stage.

3 Likes
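The two-stage idea can be sketched in Python as follows. This is only an illustration of the merging logic: `two_stage_search`, the topic dicts, and `fake_weighted_search` are hypothetical, titles are matched on whitespace-separated words, and nothing here addresses how to run the title stage efficiently in the database.

```python
def two_stage_search(term, topics, weighted_search):
    """Return (topic_id, post_number) hits: title matches first, then the rest.

    topics: list of dicts with 'id' and 'title'.
    weighted_search: callable returning ranked (topic_id, post_number) pairs.
    """
    needle = term.lower()
    words = needle.split()

    exact, contains_all = [], []
    for topic in topics:
        title = topic["title"].lower()
        if title == needle:
            exact.append((topic["id"], 1))          # EXACT_MATCH
        elif all(w in title.split() for w in words):
            contains_all.append((topic["id"], 1))   # CONTAINS_ALL

    # Stage one: title matches, exact matches first.
    stage_one = exact + contains_all

    # Stage two: the existing ranking, minus first posts already listed.
    caught = set(stage_one)
    stage_two = [hit for hit in weighted_search(term) if hit not in caught]
    return stage_one + stage_two


topics = [
    {"id": 1, "title": "Special kanji"},
    {"id": 2, "title": "Special kanji words derived from other words"},
    {"id": 3, "title": "Kanji stroke order"},
]


def fake_weighted_search(term):
    # Invented ranking standing in for the current weighted search.
    return [(2, 40), (2, 1), (3, 5), (1, 1)]


print(two_stage_search("special kanji", topics, fake_weighted_search))
# [(1, 1), (2, 1), (2, 40), (3, 5)]
```

Only the topic-starter posts caught by the first stage are removed from the second stage, so other posts from the same topic, such as post 40 of topic 2, can still appear further down.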

The idea is that we will make UX changes to make this obvious.

The idea is sound, but I don’t think we can execute the Postgres search query efficiently since there is no index support when trying to match with the like and ilike operators. Another factor to consider is that there is no way to rank exact matches or contains matches. This is the reason why we rely on the full text search capabilities that Postgres provides.

1 Like

I think you’d probably need a more obvious way of distinguishing the first post and match post links. I don’t think what you’re proposing would be intuitive. (But I get it was just an example.)

One possibility I’ve thought of is to show all matching posts from a topic perhaps using some kind of expand/collapse mechanism. So you’d show the best matched post from all matching topics similar to the current view, but then there’d be an arrow or something next to each matched topic to expand the list to show all matching posts in that topic. It may be overkill though, I don’t know.

I think this would be worth trying out for sure. It would solve the issue I’ve been complaining about by prioritizing the title matches. But it would also solve the scenario you brought up about coding forums @tgxworld, since as you said, the code snippets wouldn’t be in the title.

Of course, if it’s impractical I guess that’s another story. Maybe someone else knows of a way to make it performant enough.

Relying on existing technology is great and all, but only when it properly solves your use case. It sounds like Postgres’s full text search is insufficient on its own given the issues brought up in this topic. It sounds like a hybrid approach like what @Sailsman63 proposed would be ideal, if there’s a way to make it feasible.

Since it sounds like there’s no “easy answer” right now, I honestly think this is the best option while a more complete solution is being worked out. Any UX change or new algorithm requiring database optimizations or other performance considerations could take a while to get right, so I think reverting to the old behavior would be reasonable as a stopgap.

Unfortunately, it isn’t as simple as saying we’ll switch to something better for us. We have trade-offs from both the business and technical perspectives to consider before we arrive at our own decision.

I’m hesitant to do so because it doesn’t move us forward to a better solution. In fact, I strongly feel that the old behaviour is incorrect. The cases that have been described in this topic are all based on a single scenario, where it seems intuitive for the search result to link to the first post only because the user who is searching knows exactly which topic they want to find. I would go so far as to argue that users who already know the topic title they are searching for should use the in:title advanced search filter. For most searches, the content of the post matters a lot, and a partial match on the topic title should not mean that the first post of the topic is shown.

1 Like

Yes, that’s fair I guess.

Perhaps that would be sufficient if the advanced search was simpler. Personally I find it incredibly difficult to use because there are simply too many options, making it hard to find the option I need. Maybe it’s just me :man_shrugging:

Perhaps it would be appropriate to promote the in:title checkbox to always be visible. Or better yet, maybe a dropdown to the left of the input box with options like All, Titles, and Posts would make sense, to let the user specify what scope to search. This type of search filter is fairly common I think, so it would (hopefully) be intuitive for users to use or ignore as they see fit. As a comparable example, IMDb lets you search for All, Titles, Actors, etc. (I don’t remember the exact options offhand.) You’d have to figure out how to differentiate All and Posts since they’re basically one and the same right now, but maybe this approach is worth considering.

2 Likes

This is what we should be doing – it makes absolutely no sense to me that all posts in the topic have the topic title associated with them?

2 Likes

Will need to clarify with @sam on this because this is how our search index has always been built. In fact, the search index for each post includes the title, category name as well as tag names even if the post is not the first post.

1 Like