Database full text search capabilities/limits?

Let me clarify this statement. For exact matches, there is no need to rank the results since there is either a match or there isn’t. For contain matches, how will we decide which should rank higher?

  1. This is my search term
  2. Search term is interesting
  3. Search term is more interesting here since the title is longer

All the above titles contains the search term so the question is how do rank the results? Currently we rely on Postgres’s ranking function to determine this for us.

Ranking attempts to measure how relevant documents are to a particular query, so that when there are many matches the most relevant ones can be shown first. PostgreSQL provides two predefined ranking functions, which take into account lexical, proximity, and structural information; that is, they consider how often the query terms appear in the document, how close together the terms are in the document, and how important is the part of the document where they occur. However, the concept of relevancy is vague and very application-specific. Different applications might require additional information for ranking, e.g., document modification time. The built-in ranking functions are only examples. You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs.

There are a bunch of factors that Postgres takes into account but ultimately ranking is still somewhat vague and really depends from site to site depending on how the content of the forum is structured. Our aim here at Discourse is to provide a good default that would work for most of the forums out there.

5 Likes