Search results should prioritize first post in topic when title matches search term

Do we use the ranking of the first hit or the ranking of the best post in the topic? We used to do the latter, which seems incorrect: we were taking the ranking of a post that will not appear in the search results and ranking it against posts of other topics.

3 Likes

I would say we link to MIN(post_number) and we rank on MAX(rank) when doing the aggregation.
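
To make the suggestion concrete, here is a minimal Python sketch of that aggregation, not Discourse's actual query. The `Hit` type, the `aggregate_by_topic` helper and the sample ranks are all hypothetical; it only assumes that each matching post carries a topic id, a post number and a rank.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One matching post (hypothetical shape, for illustration only)."""
    topic_id: int
    post_number: int
    rank: float


def aggregate_by_topic(hits):
    """Collapse post-level hits into one result per topic.

    Link target: the lowest matching post_number in the topic.
    Ordering:    the highest rank among the topic's matching posts.
    """
    by_topic = defaultdict(list)
    for hit in hits:
        by_topic[hit.topic_id].append(hit)

    results = []
    for topic_id, topic_hits in by_topic.items():
        results.append({
            "topic_id": topic_id,
            "link_post_number": min(h.post_number for h in topic_hits),
            "rank": max(h.rank for h in topic_hits),
        })
    # Best-ranked topic first.
    results.sort(key=lambda r: r["rank"], reverse=True)
    return results


# Made-up data: topic 1 matches weakly in post 1 and strongly in post 40.
hits = [Hit(1, 1, 0.2), Hit(1, 40, 0.9), Hit(2, 3, 0.5)]
results = aggregate_by_topic(hits)
```

With this data, topic 1 sorts first on the strength of post 40 but still links to post 1.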

3 Likes

The new behaviour has been reverted in

https://github.com/discourse/discourse/pull/11914

5 Likes

This topic was automatically closed after 6 days. New replies are no longer allowed.

Thanks for reopening the topic. I think there is still an issue related to this. On https://community.wanikani.com, there is a topic titled “General Anime Thread”. If I search for “general anime” I get this topic as the top result, but it goes to post 511. What’s particularly odd is that this post doesn’t even contain the words “general” or “anime”. So somehow this seems even worse than the originally reported issue. https://community.wanikani.com should be on a version after this fix was merged, as far as I can tell.

To give a second example, if I search for “japanese book club”, I get post 925 of that topic, even though the search term is in the title of the topic.

I’ve tried to reproduce on meta, but I haven’t been able to. It could just be the specific terms I’ve tried searching for though.

4 Likes

Did you check the version by viewing source? If you cannot repro here, it is likely a version mismatch.

1 Like

This is from the page source:

Discourse 2.7.0.beta4 - https://github.com/discourse/discourse version 47835ade9a3dcebb14bdd744e92d93b9c9199b90

That commit is from two days ago, and I can still reproduce this issue with the examples in my last post.

3 Likes

Thanks for the very detailed report, @tgxworld will have a quick look!

4 Likes

@sam I know what this is. We have a search performance optimisation on large sites where we only search through a partial index. Since the first post is really old, it is left out of the partial index, which is why we’re not linking to the first post. I’ll need to think about the fix for this because the solutions I have in mind currently trade off either performance for accuracy or accuracy for performance.
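
A toy Python model of that effect, under stated assumptions: the `Post` type, `recent_index`, `linked_post_number` and the index size of 90 are hypothetical, the cutoff is assumed to be "most recent posts by id", and every post in the topic is assumed to match the query (for example because the title is indexed alongside each post). This is not how Discourse builds its index, only a sketch of why the link target moves.

```python
from typing import NamedTuple, Optional


class Post(NamedTuple):
    id: int
    topic_id: int
    post_number: int


def recent_index(posts, size):
    """Keep only the `size` most recent posts by id (assumed cutoff rule)."""
    return sorted(posts, key=lambda p: p.id)[-size:]


def linked_post_number(indexed_posts, topic_id) -> Optional[int]:
    """Lowest post_number of the topic that is still visible in the index."""
    numbers = [p.post_number for p in indexed_posts if p.topic_id == topic_id]
    return min(numbers) if numbers else None


# A single long-running topic with 600 posts.
posts = [Post(id=i, topic_id=7, post_number=i) for i in range(1, 601)]

full = linked_post_number(posts, 7)                       # whole table searched
partial = linked_post_number(recent_index(posts, 90), 7)  # recent posts only
```

Here `full` is 1 and `partial` is 511: once the first post falls outside the recent window, the earliest post the search can see is whichever one sits at the edge of the window, and that edge keeps moving forward as new posts arrive.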

3 Likes

Would it be a reasonable performance trade-off if every topic’s first post were included in the index? Or maybe just the first post of every topic that has a post within the optimized time range? Assuming that last one is even feasible from a database perspective.
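
To illustrate the second variant, here is a small Python sketch. Everything in it is hypothetical (`Post`, `index_with_first_posts`, the ids and sizes), and it ignores the database-level cost entirely; it only shows which posts would end up in the index.

```python
from typing import NamedTuple


class Post(NamedTuple):
    id: int
    topic_id: int
    post_number: int


def index_with_first_posts(posts, size):
    """Recent posts, plus the first post of every topic that has a recent post."""
    recent = sorted(posts, key=lambda p: p.id)[-size:]
    active_topics = {p.topic_id for p in recent}
    first_posts = [
        p for p in posts
        if p.post_number == 1 and p.topic_id in active_topics
    ]
    # De-duplicate by id in case a first post is itself recent.
    merged = {p.id: p for p in first_posts + recent}
    return sorted(merged.values(), key=lambda p: p.id)


# Topic 8: one old post with no recent activity.
# Topic 7: 600 posts, still active.
posts = [Post(id=1, topic_id=8, post_number=1)]
posts += [Post(id=n + 1, topic_id=7, post_number=n) for n in range(1, 601)]

index = index_with_first_posts(posts, 90)
topic_7_link = min(p.post_number for p in index if p.topic_id == 7)
```

In this toy data the index grows by one entry (91 instead of 90), topic 7 links to post 1 again, and the inactive topic 8 stays out of the index.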

3 Likes

This is technically feasible but a monster of a problem

I am afraid that spending, say, 1 week of careful index shuffling may not be worth it when it comes to correcting this

I can also think of plenty of outlier situations that complicate this (like a forum with mountains of short topics)

Let’s wait on this for a bit and see how often it pops up

One interim change we can probably afford on your forum is doubling the size of your recent index; this is configurable (@tgxworld maybe make it so)

4 Likes

Thanks! If you think it’s safe to do, that would be great!

3 Likes

Was this ever done? Searching for the previously mentioned topic now goes to post 523 instead of 511, which seems to indicate more and more posts going out of range of the index.

2 Likes

Hi Sean, sorry it was not raised.

I just changed it (SiteSetting.search_recent_posts_size) to 250k; you only have 163k topics. It will take a couple of days for the change to kick in, since a scheduled job needs to run.

4 Likes

I believe the changes have already kicked in. I changed it to 1 million previously but forgot to post an update here.

@sam Do we plan to tackle this problem at some point? Including all first posts in the partial index doesn’t sound like a bad trade-off. I know from previous discussions that @codinghorror believes that search should heavily prioritise topics before allowing posts within a topic to show up.

3 Likes

Maybe… yeah, at a later point we can consider including the first post as well. It would be a hugely complex change though.

4 Likes

Here is one more example, if it helps in any way. Ideally, this link should show the 🗓 Discourse Event topic as the first matching entry because it exactly matches the topic title, but it’s the 10th entry in the results.

https://meta.discourse.org/search?q=Discourse%20Event

4 Likes

I made a support topic a few days ago because I was again unable to find topics via search, but it looks like it was deleted/unlisted for some reason. This time around, the search term 本好きの下克上 on community.wanikani.com doesn’t return all the topics with that title. The two results it does return each link to a seemingly random post that happens to include the text, not to the first post, even though the search term is in the topic title.

Have we hit this new 250k limit already? Does it need to be increased again?

I’d also like to know, even if we do have 250k topics now (or did you mean posts?), how are the topics/posts to keep in the search index determined? If it’s by age, I don’t understand why the topic I’m expecting to find isn’t returned since it’s not even a year old. Or did something change in the algorithm again that is causing this?

It’s been a while so I don’t remember all the details here, but would it be possible/simpler to include the topic title directly in the search index regardless of topic age instead of forcing the first post itself to always be included? Since you said it would be complex to force the first post itself to always be included I just wanted to throw out an alternate idea in case it was useful.

2 Likes

wooops Sean, I converted it to a PM but removed you from the list :man_facepalming:

Check your PMs, there is a long discussion there.

2 Likes