No search results are shown


(Jiaqi Li) #1

Hi friends, I discovered a very strange problem on my website.

My website is an online dictionary that contains many words, so every topic is in the format of “WORD meaning-Pronunciation-Example”.

However, when I search for some words on my website, many of them show no results. For example: “off”, “few”, “able”, “add”, “age”, “but”, “gain”, “she”, “here”, “him”, “top”, “very”, “why”, etc.

For some other words, searching for the whole word returns no results, but searching for part of the word does. For example:

when I search “city”, no result; but when I search “cit”, the result appears;
when I search “policy”, no result; but when I search “poli”, the result appears;
when I search “industry”, no result; but when I search “indu”, the result appears;
when I search “should”, no result; but when I search “shou”, the result appears;
when I search “story”, no result; but when I search “sto”, the result appears;
when I search “memory”, no result; but when I search “memo”, the result appears;
when I search “pretty”, no result; but when I search “prett”, the result appears;
when I search “happy”, no result; but when I search “happ”, the result appears.

In total, about 200 words have this issue, and I believe that if I don’t do something, I will see more and more of them in the future.

So please, my friends, I NEED YOUR HELP :scream_cat:


(Sam Saffron) #2

Are you using our official Docker install?


(Jiaqi Li) #3

@sam Hi Sam, I mean that the problem happens when I search for these words on my OWN website.

For example:

But these topics DO exist. So I’m wondering why :worried:


(Mittineague) #4

Those are “stop” words

https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS

Stop words are words that are very common, appear in almost every document, and have no discrimination value. Therefore, they can be ignored in the context of full text searching. For example, every English text contains words like a and the, so it is useless to store them in an index. However, stop words do affect the positions in tsvector, which in turn affect ranking:
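As a rough illustration of the filtering part of that passage (not the ranking example the PostgreSQL page goes on to give), here is a minimal Python sketch. The function name `searchable_terms` is made up for this example, and `STOP_WORDS` is only a small hand-picked subset of the English stop list, so treat it as an assumption rather than what PostgreSQL or Discourse actually runs.

```python
# Assumption: a tiny illustrative subset of an English stop list,
# not the full list PostgreSQL ships with.
STOP_WORDS = {"a", "the", "but", "she", "here", "him", "very", "why", "off", "few"}


def searchable_terms(query: str) -> list[str]:
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


# A query made only of stop words leaves nothing to look up in the index.
print(searchable_terms("why"))
# A mixed query keeps only the non-stop words.
print(searchable_terms("The dictionary"))
```

If every term in a query is filtered out this way, there is nothing left to match against the index, which would be consistent with searches for words such as “she” or “why” coming back empty.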


(Jiaqi Li) #5

@Mittineague Thank you for your kind reply :slight_smile:

Is it possible to solve the problem? Since my website is an online dictionary, it would be a serious problem if my users cannot search for the English words they want to study.


(Sam Saffron) #6

No, it is impossible. There is no setting to disable stemming; if we added one, it would cause severe performance issues for search.


(Jiaqi Li) #7

@sam :sob: It couldn’t be worse, then… but anyway, thank you for your help :+1:


(Matt Palmer) #8

Stemming and stop words are unrelated, though. @Jiaqi, I’m pretty sure you can do this, if you’re willing to rummage around enough, but it’ll involve learning a lot more about full-text search and PostgreSQL internals than you ever cared to know. It’s certainly not something that would ever be a “standard” feature, because it’s so incredibly niche. You’d be better off creating some sort of custom index plugin that kept a mapping of all the words and their associated topic IDs, and offered a custom search box to find the relevant topic.
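To make the custom index idea a little more concrete, here is a minimal Python sketch of a word-to-topic-ID mapping with exact-match lookup. It is not Discourse plugin code; the function names, the topic IDs, and the assumption that the headword is the first whitespace-separated token of each title (following the “WORD meaning-Pronunciation-Example” format from the first post) are all hypothetical.

```python
from collections import defaultdict


def build_word_index(topics: dict[int, str]) -> dict[str, set[int]]:
    """Map each headword (assumed to be the first token of the title) to topic IDs."""
    index: dict[str, set[int]] = defaultdict(set)
    for topic_id, title in topics.items():
        parts = title.split()
        if parts:  # skip empty titles
            index[parts[0].lower()].add(topic_id)
    return dict(index)


def lookup(index: dict[str, set[int]], word: str) -> list[int]:
    """Exact, case-insensitive lookup with no stemming and no stop-word filtering."""
    return sorted(index.get(word.strip().lower(), set()))


# Hypothetical topic IDs and titles, for illustration only.
topics = {
    101: "city meaning-Pronunciation-Example",
    102: "she meaning-Pronunciation-Example",
    103: "policy meaning-Pronunciation-Example",
}
index = build_word_index(topics)
print(lookup(index, "she"))
print(lookup(index, "City"))
```

Because the lookup is a plain dictionary access, words like “she” or “city” are found regardless of how the full-text search treats them.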


(Mittineague) #9

There sure is a lot more to it than I ever imagined.

The custom index could be doable. AFAIK there are 127 stop words. That is quite a few, but not overwhelmingly many.

https://apt-browse.org/browse/ubuntu/trusty/main/i386/postgresql-9.3/9.3.4-1/file/usr/share/postgresql/9.3/tsearch_data/english.stop


(Matt Palmer) #10

The trick, if you want to use a different stop word list (or not have any at all) is to define a custom dictionary and language profile (or whatever PostgreSQL calls it) that doesn’t have the stop words in it, and then reconfigure your full-text indexing to use that instead of the standard English one. I’ve done it once, a long long time ago, and I have no interest in repeating the experience.
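For anyone curious what that might look like, here is a hedged sketch: a small Python helper that only builds the PostgreSQL statements for a Snowball dictionary with no `StopWords` option and a text search configuration that uses it. The helper name `nostop_config_sql` and the object names `english_nostop` and `english_nostop_stem` are made up for this example. It does not connect to a database, and how to make Discourse's search index actually use such a configuration is not covered here.

```python
def nostop_config_sql(name: str = "english_nostop") -> list[str]:
    """Build SQL for an English text search configuration without stop words.

    Assumption: `name` is a trusted identifier chosen by the admin,
    so it is interpolated directly into the statements.
    """
    # Word-like token types that the built-in english configuration
    # maps to its stemmer dictionary.
    token_types = "asciiword, asciihword, hword_asciipart, word, hword, hword_part"
    return [
        # Omitting the StopWords option means no words are discarded.
        f"CREATE TEXT SEARCH DICTIONARY {name}_stem "
        f"(TEMPLATE = snowball, Language = english);",
        # Start from a copy of the built-in english configuration.
        f"CREATE TEXT SEARCH CONFIGURATION {name} (COPY = pg_catalog.english);",
        # Route word tokens through the stop-word-free dictionary.
        f"ALTER TEXT SEARCH CONFIGURATION {name} "
        f"ALTER MAPPING FOR {token_types} WITH {name}_stem;",
    ]


for statement in nostop_config_sql():
    print(statement)
```

Note that this dictionary still stems words; it only stops discarding stop words. Any existing index built with the standard English configuration would also need to be rebuilt with the new one before searches reflect the change.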