Search results should prioritize first post in topic when title matches search term

So there is absolutely no way to tell in the search index if the post is the first post?

What about this kind of tweak:

When multiple posts in a topic match a given search term, AND ONE OF THE POSTS IS THE FIRST POST, give that specific post (the first post) a huge bump in search ranking.

Then you avoid a mindless “order by post number” behavior, while properly giving the first post priority?
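To make the proposal concrete, here is a minimal sketch of the bump. It is not Discourse code: the `Hit` record, the `boost_first_posts` helper, and the `FIRST_POST_BOOST` multiplier are all hypothetical names chosen for illustration, and it assumes the search index can tell which hit is post 1.

```python
from dataclasses import dataclass

# Hypothetical multiplier for illustration; not an actual Discourse setting.
FIRST_POST_BOOST = 10.0


@dataclass
class Hit:
    topic_id: int
    post_number: int
    rank: float


def boost_first_posts(hits):
    """Boost a first post only when other posts in the same topic also match."""
    matches_per_topic = {}
    for hit in hits:
        matches_per_topic[hit.topic_id] = matches_per_topic.get(hit.topic_id, 0) + 1

    boosted = []
    for hit in hits:
        rank = hit.rank
        if hit.post_number == 1 and matches_per_topic[hit.topic_id] > 1:
            rank *= FIRST_POST_BOOST
        boosted.append(Hit(hit.topic_id, hit.post_number, rank))

    # Highest rank first; ordering is by rank, never by post number.
    return sorted(boosted, key=lambda h: h.rank, reverse=True)
```

The ordering is still purely rank based, so a first post that is the only match in its topic gets no special treatment.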

2 Likes

Currently no, but it wouldn’t be hard to add, since it is similar to an optimisation I did last year to allow us to filter the PostSearchIndex by PM or not PM.

2 Likes

My call here would be that the theoretical benefit of accurate in-topic ranking of duplicates has way too many downsides.

  • People could be using search like a bookmark and get confused when a search that always took them to #77 now takes them to #892

  • Adding fuel to the fire, we have no mechanism for “going to first unread in a topic”, so prioritizing later posts comes with a big downside: the odds are higher that you would create reading gaps (e.g. you read 1, 2, 3 but did not read 4-88 … now you hit 89)

I much prefer to just unconditionally prioritize the first hit in a topic; it is simpler to explain and much more stable.

5 Likes

I agree and that solution works for me :+1:

4 Likes

Do we use the ranking of the first hit or the ranking of the best post in the topic? We used to do the latter, which seems incorrect: we were taking the ranking of a post that will not appear in the search results and using it to rank against posts of other topics.

3 Likes

I would say we link to MIN(post_number) and we rank on MAX(rank) when doing the aggregate stuff.
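The aggregate could be sketched roughly like this. It is plain Python rather than the actual search query, and the `aggregate_by_topic` name and the `(topic_id, post_number, rank)` tuple layout are assumptions made for the example.

```python
def aggregate_by_topic(hits):
    """Collapse hits to one result per topic.

    hits: iterable of (topic_id, post_number, rank) tuples.
    Each topic links to its lowest matching post number (MIN(post_number))
    and is ordered by its best matching rank (MAX(rank)).
    """
    topics = {}
    for topic_id, post_number, rank in hits:
        linked_post, best_rank = topics.get(topic_id, (post_number, rank))
        topics[topic_id] = (min(linked_post, post_number), max(best_rank, rank))

    results = [(topic_id, post, rank) for topic_id, (post, rank) in topics.items()]
    return sorted(results, key=lambda r: r[2], reverse=True)
```

With this approach the link target stays stable (the earliest hit) even when a later post in the same topic is the one that ranks best.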

3 Likes

The new behaviour has been reverted in

https://github.com/discourse/discourse/pull/11914

5 Likes

This topic was automatically closed after 6 days. New replies are no longer allowed.

Thanks for reopening the topic. I think there is still an issue related to this. On https://community.wanikani.com, there is a topic titled “General Anime Thread”. If I search for “general anime” I get this topic as the top result, but it goes to post 511. What’s particularly odd is that this post doesn’t even contain the words “general” or “anime”. So somehow this seems even worse than the originally reported issue. https://community.wanikani.com should be on a version after this fix was merged, as far as I can tell.

To give a second example, if I search for “japanese book club”, I get post 925 of that topic, even though the search term is in the title of the topic.

I’ve tried to reproduce on meta, but I haven’t been able to. It could just be the specific terms I’ve tried searching for though.

4 Likes

Did you check the version by viewing source? If you cannot repro here, it is likely a version mismatch.

1 Like

This is from the page source:

Discourse 2.7.0.beta4 - https://github.com/discourse/discourse version 47835ade9a3dcebb14bdd744e92d93b9c9199b90

That commit is from two days ago, and I can still reproduce this issue with the examples in my last post.

3 Likes

Thanks for the very detailed report, @tgxworld will have a quick look!

4 Likes

@sam I know what this is. We have a search performance optimisation on large sites where we only search through a partial index. Since the first post is really old, it is left out of the partial index, which is why we’re not linking to the first post. I’ll need to think about the fix for this because the solutions I have in mind currently either trade off performance for accuracy or accuracy for performance.
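A toy model of the effect, under two loudly stated assumptions: the partial index is simply the N most recently created posts, and each indexed post also carries its topic title (which would explain a hit on a post whose body never mentions the search term). The `search_partial` function and the dict fields are hypothetical, not Discourse internals.

```python
def search_partial(posts, term, recent_size):
    """Return {topic_id: linked post_number} using only the newest posts.

    posts: list of dicts with id, topic_id, post_number, title, body.
    Assumption: a post matches when the term appears in its topic title
    or in its body.
    """
    ordered = sorted(posts, key=lambda p: p["id"])
    recent = ordered[max(0, len(ordered) - recent_size):]

    linked = {}
    for post in recent:
        text = (post["title"] + " " + post["body"]).lower()
        if term.lower() in text:
            current = linked.get(post["topic_id"])
            if current is None or post["post_number"] < current:
                linked[post["topic_id"]] = post["post_number"]
    return linked
```

In this model, once post 1 falls outside the recent window, the earliest post still inside the window becomes the link target, and that target drifts upward as the topic keeps growing.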

3 Likes

Would it be reasonable (in performance trade off) if every topic’s first post was included in the index? Or maybe just the first post of every topic if the topic has a post within the optimized time range? Assuming that last one is even feasible from a database perspective.
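The second variant could look something like the sketch below: keep the recent window, then add the first post of any topic that has at least one post inside it. The `build_index` name, the `include_first_posts` flag, and the dict fields are made up for illustration, and this says nothing about what it would cost at the database level.

```python
def build_index(posts, recent_size, include_first_posts=True):
    """Return the posts to index: the newest `recent_size` posts, plus
    (optionally) the first post of every topic active in that window.

    posts: list of dicts with id, topic_id, post_number.
    """
    ordered = sorted(posts, key=lambda p: p["id"])
    recent = ordered[max(0, len(ordered) - recent_size):]
    if not include_first_posts:
        return recent

    active_topics = {p["topic_id"] for p in recent}
    recent_ids = {p["id"] for p in recent}
    extras = [
        p for p in ordered
        if p["post_number"] == 1
        and p["topic_id"] in active_topics
        and p["id"] not in recent_ids
    ]
    return extras + recent
```

The extra entries are bounded by the number of distinct topics in the window, so the worst case is a forum made up of many short, recently active topics.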

3 Likes

This is technically feasible but a monster of a problem.

I am afraid that the trade-off of spending, say, 1 week of careful index shuffling may not be worth it when it comes to correcting this.

I can also think of plenty of outlier situations that complicate this (like a forum with mountains of short topics).

Let’s wait on this for a bit and see how often it pops up

One interim change we can probably afford on your forum is doubling the size of your recent index; this is configurable (@tgxworld, maybe make it so).

4 Likes

Thanks! If you think it’s safe to do, that would be great!

3 Likes

Was this ever done? Searching for the previously mentioned topic now goes to post 523 instead of 511, which seems to indicate more and more posts going out of range of the index.

2 Likes

Hi Sean, sorry it was not raised.

I just changed it (SiteSetting.search_recent_posts_size) to 250k; you only have 163k topics. It will take a couple of days for the change to kick in, since a scheduled job needs to run.

4 Likes

I believe the changes have already kicked in. I changed it to 1 million previously but forgot to post an update here.

@sam Do we plan to tackle this problem at some point? Including every topic’s first post in the partial index doesn’t sound like a bad trade-off. I know from previous discussions that @codinghorror believes that search should heavily prioritise topics first before allowing posts within a topic to show up.

3 Likes