Search Improvements in 2.3

For the upcoming 2.3 release, we’ve added a new feature and made several fixes to the way we index posts for search that will make search results better.

1. Search Priorities for Categories


Each category can be configured with a search priority, which you can find under the Settings tab when creating or editing a category. Five levels of prioritization have been added: ignore, very low, low, high and very high. Each level works by multiplying the search ranking of every result in that category by a pre-configured weight. The weights can be changed via hidden site settings in your console.

As an example, setting a category’s search priority to very_high will boost its search ranking by 40%, while a search priority of very_low will reduce its search ranking by 40%. Setting a category’s search priority to ignore will remove it from the search results. However, you can still search for posts within an ignored category by scoping the search to that category via advanced search. Note that search priority is not inherited, which means that a sub-category will still be searchable even if its parent category has been set to ignore.
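
To make the weighting concrete, here is a minimal sketch of how a per-category multiplier could be applied to ranked results. It is not Discourse's actual implementation. The names `PRIORITY_WEIGHTS`, `Result` and `apply_category_priority` are hypothetical, the `low`/`high` weights and the `normal` default are assumed values, and only the 40% figures for very_low and very_high come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# very_low / very_high mirror the 40% figures quoted in the post.
# "low", "high" and the "normal" default are assumptions for illustration.
PRIORITY_WEIGHTS: Dict[str, float] = {
    "very_low": 0.6,
    "low": 0.8,       # assumed value
    "normal": 1.0,    # assumed default for unconfigured categories
    "high": 1.2,      # assumed value
    "very_high": 1.4,
}


@dataclass
class Result:
    title: str
    category: str
    rank: float


def apply_category_priority(
    results: List[Result],
    priorities: Dict[str, str],
    scoped_category: Optional[str] = None,
) -> List[Result]:
    """Multiply each result's rank by its category weight and re-sort.

    Results in an "ignore" category are dropped unless the search is
    explicitly scoped to that category.
    """
    weighted = []
    for r in results:
        priority = priorities.get(r.category, "normal")
        if priority == "ignore":
            if r.category != scoped_category:
                continue
            weight = 1.0  # assumption: scoped searches use the unweighted rank
        else:
            weight = PRIORITY_WEIGHTS[priority]
        weighted.append(Result(r.title, r.category, r.rank * weight))
    return sorted(weighted, key=lambda r: r.rank, reverse=True)


results = [
    Result("Install guide", "docs", 1.0),
    Result("Random chatter", "chat", 1.2),
    Result("Old promo", "spam", 5.0),
]
priorities = {"docs": "very_high", "chat": "very_low", "spam": "ignore"}
ranked = apply_category_priority(results, priorities)
```

In this sketch the "docs" result overtakes the "chat" result even though its unweighted rank was lower, and the "spam" result only appears when `scoped_category="spam"` is passed.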

2. Improvements and Fixes to Search Results

  1. Search relevance now takes document length into account when ranking. Previously, results were ranked by the number of matches for the given search term. This was problematic because longer posts tend to produce more matches and were therefore more likely to be ranked higher. Ranking now considers document length to offset that bias.
    FIX: Relevance search will now consider document length in ranking. · discourse/discourse@e87ca59 · GitHub

  2. Improved the quality of the raw data that we use to generate the search index. PERF: Improve quality of `PostSearchData#raw_data`. (#7275) · discourse/discourse@cfd5078 · GitHub

    • URLs in posts were sometimes incorrectly tokenized and indexed twice, which gave terms within links a higher ranking.
    • Lightbox content in posts was polluting search results with the image’s metadata.

  3. Empty posts such as moderator actions or small post actions (closing, assigning) are no longer part of the search index. This change leads to a smaller index and less noise in search results. FIX: Don't index posts with empty `Post#raw` for search. (#7263) · discourse/discourse@daeda80 · GitHub

  4. The search index is smaller because search data for posts in trashed topics is now deleted periodically. PERF: Delete search data of posts from trashed topics periodically. (… · discourse/discourse@d151425 · GitHub

  5. Previously, when a post was moved to a different topic, its search data was not updated, so the post could incorrectly appear in searches that matched the category of the old topic. Posts are now reindexed when they are moved. FIX: Reindex post for search when post is moved to a different topic. · discourse/discourse@d808f36 · GitHub

  6. The payload for search results is smaller because the cooked version of each post is now excluded. PERF: Reduce number of queries and size of payload when searching. · discourse/discourse@03c6b22 · GitHub

  7. The blurb for posts in search results was broken when searching for an exact phrase, which led to missing search term highlights on the client side. FIX: Post blurb incorrect when search contains a phrase match. · discourse/discourse@dae0bb4 · GitHub
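
To illustrate the first item in the list above, here is a minimal sketch of why counting matches alone favors long posts and how dividing by a function of document length offsets that. This is an illustration under simplifying assumptions (whitespace tokenization, exact term matching), not the ranking function Discourse uses. The function names are hypothetical, and the `1 + log(length)` divisor is just one common normalization choice.

```python
import math
from typing import List


def count_matches(tokens: List[str], term: str) -> int:
    """Count exact occurrences of term in a token list."""
    return sum(1 for t in tokens if t == term)


def rank_by_matches(text: str, term: str) -> float:
    """Old-style ranking: more matches always means a higher rank."""
    tokens = text.lower().split()
    return float(count_matches(tokens, term.lower()))


def rank_length_normalized(text: str, term: str) -> float:
    """Length-aware ranking: divide the match count by 1 + log(length)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return count_matches(tokens, term.lower()) / (1.0 + math.log(len(tokens)))


short_post = "search ranking explained"
long_post = "search " + "filler " * 200 + "search"

# By raw match count the long post wins (2 matches vs 1) ...
raw_long_wins = rank_by_matches(long_post, "search") > rank_by_matches(short_post, "search")
# ... but once length is considered, the short, focused post ranks higher.
normalized_short_wins = (
    rank_length_normalized(short_post, "search")
    > rank_length_normalized(long_post, "search")
)
```

The logarithm keeps the penalty gentle, so a long post that really is dense with matches can still rank well.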

Do let us know if your search results have improved after these changes. We would also like to know if you think your search results have become worse, so that we can continue to refine them. Thank you all!

43 likes

Is the age of a post part of the weight in ranking search results? Information gets stale fairly quickly in our forum, so it would be nice to have a way to reduce the relevance of older postings without actually eliminating them.

11 likes

Not yet, but it is an interesting idea, even in the weaker form of simply factoring in the date the topic was last touched.

7 likes

I would love to see the ability to “pin” topics in search results so that they appear first. That way, popular queries could lead straight to our guides.

That is done by changing the search priority of the category those topics are in.

My “problem” is that these topics are spread across all of the forum’s categories based on context, so I can’t use the category-based approach.

Then you’re stuck, because there is no arbitrary way to do this per topic and there never will be.

Maybe in the future it will be possible to configure search priorities by tag? Although that won’t be as simple, since topics can have multiple tags…

In any case, being able to weight categories is a great feature! I already have a few spammy special-purpose categories on my forum that will be nice to push down a bit in search results.

6 likes

Yes, I think that will happen at some point.

10 likes

That would be great! Then blog posts could be tagged accordingly and prioritized in search…

1 like

I would really like to see certain tags be able to affect search priority as well.

4 likes

Is this planned for 2.4.X or 2.5?

Does it work this way now?

I don’t know enough to make sense of this resource: Search Controller Need help with understanding how discourse search works - #3 by neounix

2 likes

Not yet. For now, search priority can only be configured for categories.

3 likes