Are specific terms ignored from searches?

I’m having a few difficulties with the search functionality today.

We have an automated process which creates topics based on a lecture from a course. An example might be:

About ‘Implementing Health & Damage’

I found myself wanting to reduce the maximum number of characters allowed for a topic and decided I’d do a quick search to see how many of our automated topics are quite long, to get a feel for a suitable max length.

If I search using the above, I’ll get a result, but I need the search to be a little more general than that, so I’ve tried searching for:

About ’

…knowing that is the start of the format we always use. When I do, I don’t get any results; if I add another word after the apostrophe, I do get results. I assumed at that point it was related to the apostrophe, but nope, if I search for just:

About

I get no results, yet if I search for;

Test

I do get results! This indicates there isn’t a limitation on the number of words as a minimum for a search, and we have a 3 character limit set for the term, so it isn’t that. The only thing I can think of is that there is a list of “words” that are being ignored from a search, yet I don’t think we’ve set these up.

Any information would be appreciated. :slight_smile:


Update

I have also tested this with “because”, “and” and “the”, and these all produce no results, so I suspect there is a list of “common words” which are being ignored. Because I don’t have access to that list via the settings, I cannot alter it and cannot perform the search I want to perform.

Yes, I believe these are “stop” words — see:

I’m not sure if things have changed much since two years ago, but there doesn’t seem to be an easy way to change those.
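
For anyone wondering why “About” on its own finds nothing, here is a minimal sketch of the idea in Python. It is not how Postgres or Discourse implement it: the `STOP_WORDS` set is a tiny hand-picked subset (only words this topic has confirmed are ignored), `searchable_terms` is a made-up helper, and real full text search also stems words, which this skips.

```python
# Tiny illustrative subset of an English stop-word list.
# Assumption: the real list lives in Postgres and is much longer.
STOP_WORDS = {"about", "and", "because", "the"}


def searchable_terms(query):
    """Lowercase the query, strip surrounding punctuation, drop stop words."""
    words = (w.strip("'\"‘’.,;:!?&") for w in query.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


print(searchable_terms("About"))  # [] -> nothing left to search for
print(searchable_terms("Test"))   # ['test']
print(searchable_terms("About ‘Implementing Health & Damage’"))
```

Once every word has been dropped there is nothing left to match against, which is why the search comes back empty instead of matching every topic.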


Hi Kris,

Thanks for the reply.

Yeah, I thought as much. It’s a pain on this occasion as in this specific case the word does have value.

I guess using DataExplorer wouldn’t get around this either?


Update

Actually, via DataExplorer it works. I guess a full text search isn’t carried out when querying the topic title field.
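
To illustrate why this works: a plain pattern match on the title never goes through stop-word filtering. The sketch below uses Python with an in-memory SQLite database purely as a stand-in, so it is not a Data Explorer query; the `topics` table with a `title` column and the `titles_starting_with` helper are assumptions made up for the example.

```python
import sqlite3

# In-memory stand-in for the real database (assumption: a simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO topics (title) VALUES (?)",
    [("About ‘Implementing Health & Damage’",), ("Test topic",)],
)


def titles_starting_with(prefix):
    """Return titles that begin with `prefix`, longest first.

    LIKE compares raw text, so common words such as "About" are not dropped.
    """
    rows = conn.execute(
        "SELECT title FROM topics WHERE title LIKE ? "
        "ORDER BY length(title) DESC",
        (prefix + "%",),
    )
    return [title for (title,) in rows]


for title in titles_starting_with("About "):
    print(len(title), title)
```

Sorting by title length puts the longest automated topics first, which is what I was after for judging a sensible maximum.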

Looks like a work-around :slight_smile:


I just ran into this myself. It’s… extremely annoying. Any update? Is there a way to change this now? “About” really should not be a filtered word for search! Here’s how I ran into it here on Meta:

I was in the Plugins category and noticed that some plugin topics auto-delete replies, and others don’t. So I went to see if there was an explanation for this in the About topic in that category. There wasn’t, so I was going to ask about it, but wanted to make a feature request to a plugin first (in an auto-deleting-reply topic :grinning_face_with_smiling_eyes:). So I went back to the main category list and made my reply. Then I went back again to the category list and tried to find the “About the Plugin category” topic. Since it was an older but pinned topic, and I had already read it… it was no longer visible in the list (it had been moved down to its date-appropriate position now). :roll_eyes: OK, no problem, I’ll just Search for it… But of course, no dice.

The bigger issue is that there is absolutely no in-UI warning or message about this. The search just silently fails. So if this is unfixable due to Postgres limitations, I’d like to strongly suggest you add a feature into Discourse that notifies people of these words being filtered from search! Otherwise it’s quite confusing.


I’m also surprised “about” is on the current Postgres stopword list, but here it is:

One workaround is to search for pinned topics in a category:

in:pinned #support


Thanks Jeff, I didn’t know you could search for pinned topics. Always something cool and new to discover in Discourse features. :smiley:

But… any thoughts on warning the poor user who is searching and finding nothing even when they know there should be results? Now I see the full list, it really is rather long…


It is a good idea to issue a warning if your search is all stopwords. How hard would that be to do @tgxworld ?

We don’t want to hardcode the list, so we need a way to query Postgres and have it tell us “are all these words stopwords?”


We can return an extra column in our search query that tells us whether the term consisted of only stop words, which isn’t very hard to do. I’ll have to dig deeper to figure out where to inject that extra column, though. The end goal for us here should be to avoid an extra DB query just to figure out whether a term is all stop words.


Maybe if there are no search results, we run that check to see if the query is all stopwords? Then we only run an extra check when the results are poor anyway. Per the stopword list, that would be the case if you search for “doing should now”, for example.

Worst case we could mirror the list and run the check server side, it’s only 127 strings, but it would be hacky. Perhaps at boot-up time we could query the stopword list and cache it?
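
Roughly what I have in mind, as a Python sketch rather than actual Discourse code. Everything here is hypothetical: `load_stop_words` stands in for whatever would fetch the list once at boot-up (it just returns a small hardcoded subset), and `run_search` is a placeholder for the real search call.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_stop_words():
    """Load the stop-word list once and cache it.

    Hypothetical stand-in: a real version would read the list from
    Postgres at boot-up instead of hardcoding a few entries.
    """
    return frozenset({"about", "and", "because", "doing", "now", "should", "the"})


def only_stop_words(query):
    """True when the query has at least one word and all are stop words."""
    words = query.lower().split()
    return bool(words) and all(w in load_stop_words() for w in words)


def search_with_warning(query, run_search):
    """Run the search; do the stop-word check only when nothing was found."""
    results = run_search(query)
    warning = None
    if not results and only_stop_words(query):
        warning = "Your search only contains common words, which are ignored."
    return results, warning


# Usage with a fake search function that never finds anything.
print(search_with_warning("doing should now", lambda q: []))
```

The check costs nothing on searches that return results, and the cached list means no extra DB query per search.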


Yup, kind of similar to what I meant with the extra column in the search results. If the search is empty, we look up the column to figure out whether it is because all the terms are stop words or because we got 0 matches.


How do these stop words work in other languages? Are translated ones stopped too? Or just the English ones, if those are tied to some technical need?

There are separate stop word files for different languages.


I see. So it is just about all personal pronouns in almost every form, along with the verb “be”, plus a few fillers like “or”, “only”, “when”, etc. Almost all of those can be left out of searches; for example, we mostly don’t need the pronouns in a sentence. But sure, it keeps the database more manageable.

But good to know it is built per language too. Thanks.
