So there is absolutely no way to tell in the search index if the post is the first post?
What about this kind of tweak:
When multiple posts in a topic match a given search term, AND ONE OF THOSE POSTS IS THE FIRST POST, give the first post a huge bump in search ranking.
That way you avoid a mindless “order by post number” behavior while still giving the first post priority?
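A minimal sketch of that proposal, assuming each search hit carries a topic id, a post number and a raw relevance score normalised to [0, 1]. The names `Hit`, `boosted_score`, `pick_post_per_topic` and the `FIRST_POST_BOOST` value are hypothetical illustrations, not anything in the Discourse codebase.

```python
from dataclasses import dataclass

# Hypothetical additive bonus; the proposal only asks for "a huge bump".
# Assumes raw relevance scores are normalised to the range [0, 1].
FIRST_POST_BOOST = 10.0


@dataclass(frozen=True)
class Hit:
    topic_id: int
    post_number: int  # 1 means the topic's first post
    score: float      # raw relevance score for the search term


def boosted_score(hit: Hit) -> float:
    """Add the bonus only when the matching post is the first post."""
    return hit.score + FIRST_POST_BOOST if hit.post_number == 1 else hit.score


def pick_post_per_topic(hits: list[Hit]) -> dict[int, Hit]:
    """For each topic, keep the matching post with the highest boosted score."""
    best: dict[int, Hit] = {}
    for hit in hits:
        current = best.get(hit.topic_id)
        if current is None or boosted_score(hit) > boosted_score(current):
            best[hit.topic_id] = hit
    return best
```

Because the bonus is larger than any raw score, a matching first post always wins; topics whose first post does not match fall back to their best-scoring post rather than to the lowest post number.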
Currently no, but it wouldn’t be hard to add, since it is similar to an optimisation I did last year that lets us filter the PostSearchIndex by PM vs. non-PM.
My call here is that the theoretical benefit of accurately ranking duplicate hits within a topic comes with way too many downsides.
People may be using search as a bookmark and would get confused when a search that always took them to #77 now takes them to #892.
To pour oil on the fire, we have no mechanism for “going to the first unread post in a topic”, so prioritizing later posts has a big downside: the odds are higher that you create reading gaps (e.g. you read 1, 2, 3 but not 4-88 … and now you land on 89).
I much prefer to just unconditionally prioritize the first hit in a topic; it is simpler to explain and much more stable.
Do we use the ranking of the first hit, or the ranking of the best post in the topic? We used to do the latter, which seems incorrect: we were ranking a topic against other topics using the score of a post that does not appear in the search results.
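To make the difference between the two options concrete, here is a small sketch assuming hits arrive as `(topic_id, post_number, score)` tuples. The function `rank_topics` and the strategy names are hypothetical. In both strategies the linked post is the first hit (lowest matching post number); only the score used to order topics differs.

```python
from collections import defaultdict

# Each hit is (topic_id, post_number, score); this shape is an assumption.
Hit = tuple[int, int, float]


def rank_topics(hits: list[Hit], strategy: str) -> list[tuple[int, int, float]]:
    """Return (topic_id, linked_post_number, topic_score), best topic first.

    The linked post is always the first hit (lowest matching post number).
    strategy="first_hit" ranks the topic by that post's own score;
    strategy="best_post" ranks it by the best score of any matching post.
    """
    if strategy not in ("first_hit", "best_post"):
        raise ValueError(f"unknown strategy: {strategy}")
    by_topic: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for topic_id, post_number, score in hits:
        by_topic[topic_id].append((post_number, score))

    results = []
    for topic_id, posts in by_topic.items():
        linked_number, linked_score = min(posts)  # lowest post number wins
        if strategy == "first_hit":
            topic_score = linked_score
        else:
            topic_score = max(score for _, score in posts)
        results.append((topic_id, linked_number, topic_score))
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

With hits `[(1, 3, 0.1), (1, 40, 0.95), (2, 2, 0.5)]`, "best_post" puts topic 1 on top because of post 40, yet the result still links to post 3; "first_hit" ranks topic 1 by the post the user will actually land on.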
Thanks for reopening the topic. I think there is still an issue related to this. On https://community.wanikani.com, there is a topic titled “General Anime Thread”. If I search for “general anime”, I get this topic as the top result, but the link goes to post 511. What’s particularly odd is that this post doesn’t even contain the words “general” or “anime”, so this seems even worse than the originally reported issue. As far as I can tell, https://community.wanikani.com should be running a version that includes this fix.
To give a second example, if I search for “japanese book club”, I get post 925 of that topic, even though the search term is in the title of the topic.
I’ve tried to reproduce this on meta, but haven’t been able to. It could just be the specific terms I’ve tried, though.
@sam I know what this is. We have a search performance optimisation on large sites where we only search through a partial index. Since the first post is really old, it is left out of the partial index, which is why we’re not linking to it. I’ll need to think about how to fix this, because the solutions I currently have in mind either trade off performance for accuracy or accuracy for performance.
Would it be a reasonable performance trade-off to include every topic’s first post in the partial index? Or maybe just the first post of each topic that has a post within the optimized time range, assuming that is even feasible from a database perspective?
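A rough sketch of the two options, assuming the partial index simply keeps the N most recent posts (the SiteSetting.search_recent_posts_size name mentioned later in the thread suggests a count, but the actual Discourse selection logic is not shown here). `Post`, `partial_index_ids` and the mode names are hypothetical, and post ids are assumed to increase with creation time.

```python
from typing import NamedTuple


class Post(NamedTuple):
    id: int           # assumed to increase with creation time
    topic_id: int
    post_number: int  # 1 is the topic's first post


def partial_index_ids(posts: list[Post], recent_size: int,
                      first_posts: str = "none") -> set[int]:
    """Choose which post ids go into a hypothetical partial search index.

    first_posts="none": only the `recent_size` newest posts.
    first_posts="all": also every topic's first post.
    first_posts="active_topics": also the first post of each topic that
    has at least one post among the newest `recent_size`.
    """
    recent = sorted(posts, key=lambda p: p.id, reverse=True)[:recent_size]
    selected = {p.id for p in recent}
    if first_posts == "all":
        selected |= {p.id for p in posts if p.post_number == 1}
    elif first_posts == "active_topics":
        active = {p.topic_id for p in recent}
        selected |= {p.id for p in posts
                     if p.post_number == 1 and p.topic_id in active}
    elif first_posts != "none":
        raise ValueError(f"unknown first_posts mode: {first_posts}")
    return selected
```

The "all" mode grows the index by up to one row per topic, which is the trade-off being asked about; "active_topics" bounds that growth to topics with recent activity, at the cost of computing the active set from the recent window.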
Was this ever done? Searching for the previously mentioned topic now goes to post 523 instead of 511, which suggests that more and more posts are falling out of the index’s range.
I just changed it (SiteSetting.search_recent_posts_size) to 250k; you only have 163k topics. It will take a couple of days for the change to kick in, since a scheduled job needs to run.
I believe the changes have already kicked in. I changed it to 1 million previously but forgot to post an update here.
@sam Do we plan to tackle this problem at some point? Including all first posts in the partial index doesn’t sound like a bad trade-off. I know from previous discussions that @codinghorror believes search should heavily prioritise topics before allowing individual posts within a topic to show up.