Are specific terms ignored when searching?

I’m having a few difficulties with the search functionality today.

We have an automated process which creates topics based on a lecture from a course, an example may be;

About ‘Implementing Health & Damage’

I found myself wanting to reduce the maximum number of characters for a topic title and decided I’d do a quick search to see how many of our automated topics are quite long, to get a feel for a suitable max length.

If I search using the above, I’ll get a result, but I need the search to be a little more general than that, so I’ve tried searching for;

About ’

…knowing that is the start of the format we always use. When I do, I don’t get any results; if I add another word after the apostrophe, I do get results. I assumed at that point it was related to the apostrophe, but nope, if I search for just;

About

I get no results, yet if I search for;

Test

I do get results! This indicates there isn’t a limitation on the number of words as a minimum for a search, and we have a 3 character limit set for the term, so it isn’t that. The only thing I can think of is that there is a list of “words” that are being ignored from a search, yet I don’t think we’ve set these up.

Any information would be appreciated. :slight_smile:


Update

I have also tested this with “because”, “and”, and “the”, and these all produce no results, so I suspect there is a list of “common words” which are being ignored. Because I don’t have access to these via the settings, I cannot alter them and cannot perform the search I want to perform.

Yes, I believe these are “stop” words — see:

I’m not sure if things have changed much since two years ago, but there doesn’t seem to be an easy way to change those.


Hi Kris,

Thanks for the reply.

Yeah, I thought as much. It’s a pain on this occasion as in this specific case the word does have value.

I guess using DataExplorer wouldn’t get around this either?


Update

Actually, via DataExplorer it works. I guess a full text search isn’t carried out when querying the topic title field.

Looks like a work-around :slight_smile:
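For anyone wanting to try the same kind of check, here is a minimal sketch of the idea, not the actual Data Explorer query. It uses Python with an in-memory SQLite database as a stand-in, and the table name `topics`, the column `title`, and the sample rows are assumptions made up for illustration. The point is only that a plain pattern match on the title does not go through full text search, so a word like “About” is not dropped.

```python
import sqlite3

# Hypothetical stand-in for the forum's topics table; the names and rows
# below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO topics (title) VALUES (?)",
    [
        ("About 'Implementing Health & Damage'",),
        ("About 'Moving the Player'",),
        ("Test topic",),
    ],
)

# Plain prefix match on the title column: no full text search is involved,
# so no stop word filtering happens. Longest titles are listed first.
rows = conn.execute(
    "SELECT title, LENGTH(title) AS title_length "
    "FROM topics WHERE title LIKE ? "
    "ORDER BY title_length DESC",
    ("About '%",),
).fetchall()

for title, title_length in rows:
    print(title_length, title)
```

Sorting by title length also covers the original goal of seeing how long the automated titles get.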


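I just ran into this myself. This is... very annoying. Any updates? Can it be changed now? “About” really should not be a filtered search term! I ran into it right here on Meta:

I was in the “plugin” category and noticed that some plugin topics automatically delete replies while others don’t. So I went to look at the category’s “About” topic to see whether that was explained. It wasn’t, and I was going to ask about it, but decided to post a feature request for a plugin first (in one of the topics that auto-deletes replies :grinning_face_with_smiling_eyes:). So I went back to the main category list and posted my reply. Then I went back to the category list again to find the About the Plugin category topic. Since it’s an older, pinned topic that I had already read... it was no longer shown at the top of the list (it had moved to its place in date order). :roll_eyes: OK, fine, I’ll just search for it... but of course, no results.

The bigger problem is that there is no warning or notice anywhere in the UI. The search just fails silently. If this can’t be fixed because of PostgreSQL limitations, I’d strongly suggest adding a feature to Discourse that notifies the user when words are filtered out of a search! Otherwise it’s very confusing.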


I’m also surprised that “about” is in the current Postgres stop word list, but here it is:

One workaround is to search for pinned topics in a specific category:

in:pinned #support


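Thanks Jeff, I didn’t know you could search pinned topics. There’s always something new and cool waiting to be discovered in Discourse. :smiley:

But... is there any way to alert the poor user who is searching and getting nothing when they know there should be results? Now that I’ve seen the full list, it really is rather long...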


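Showing a warning when your search terms are all stop words is a good idea. @tgxworld, how hard would that be?

We don’t want to hard-code the list, so we’d need a way to query PostgreSQL and have it tell us “are all of these words stop words?”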
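One possible way to ask PostgreSQL that question, sketched in Python under stated assumptions: `plainto_tsquery` drops stop words, and `numnode` counts what is left in the resulting query, so a count of 0 means nothing searchable survived. The helper name `is_all_stop_words`, the hard-coded `'english'` configuration, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration, not how Discourse does it.

```python
# Assumption: `cursor` is a DB-API cursor connected to PostgreSQL (for
# example from psycopg), and 'english' is the text search configuration.
ALL_STOP_WORDS_SQL = "SELECT numnode(plainto_tsquery('english', %s)) = 0"


def is_all_stop_words(cursor, term):
    """Return True when PostgreSQL reduces `term` to an empty tsquery."""
    cursor.execute(ALL_STOP_WORDS_SQL, (term,))
    return bool(cursor.fetchone()[0])
```

Note that a term made only of punctuation or whitespace also produces an empty query, so strictly this answers “is there anything left to search for?” rather than “were these stop words?”.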


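We could return an extra column in the search query that tells us whether the term consists entirely of stop words; that isn’t hard to do. I’d need to look into where to inject that extra column, though. Our end goal should be to avoid running an additional database query just to find out whether a term is all stop words.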


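If there are no search results, could we run a check to see whether the query consists entirely of stop words? That way the extra check only runs when the results are poor. Going by the stop word list, that would be a search like “doing should now”, for example.

Worst case, we could mirror the list and run the check server-side; it’s only 127 strings, though that would be a bit of a hack. Or maybe we could query the stop word list at boot and cache it?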
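A rough sketch of that “check only when the results are empty” idea, in Python. The set below is a small illustrative subset of the English stop word list, not the full 127 entries, and the function names and hint messages are made up; a real version would load the list once at boot rather than hard-code it.

```python
import re

# Illustrative subset only. In the boot-and-cache idea, the real list
# would be loaded once at startup instead of being written out here.
SAMPLE_STOP_WORDS = frozenset(
    {"about", "and", "because", "doing", "now", "should", "the"}
)


def only_stop_words(query, stop_words=SAMPLE_STOP_WORDS):
    """True when the query has words and every one of them is a stop word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return bool(words) and all(word in stop_words for word in words)


def empty_result_hint(query, results):
    """Return a hint for the user, or None when there are results."""
    if results:
        return None  # the extra check is skipped when the search worked
    if only_stop_words(query):
        return "Your search only contains common words that are ignored."
    return "No results found."
```

With this, `empty_result_hint("doing should now", [])` returns the “common words” hint, while a search for “Test” that comes back empty gets the ordinary message.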


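Yes, that’s similar to my earlier idea of adding a column to the search results. If the search results are empty, we can check that column to tell whether it’s because all the words were stop words or because nothing actually matched.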


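How do these stop words work in other languages? Are the translated words stopped as well? Or is it only the English ones, where there is some technical need for it?

There are files for different languages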


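I see. Almost all forms of the personal pronouns, plus the verb be, plus some filler words like or, only, when. Nearly all of these can be left out of a search; for example, most of the time we don’t need the person in a sentence. But of course it makes the database easier to manage.

But it’s good to know it’s built per language too. Thanks.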
