Apologies for the bad title, but I’m trying to make a point.
Every single word in my title (until someone changes it, and then for the record it was "Now should I only do myself? No. We should both do ourselves.") is on the Postgres stop words list. This topic title is invisible to Postgres. I don’t mind the stop word list being applied when indexing topic bodies, but it irks me a lot to have it applied to topic titles. Stop words are sometimes the key words that distinguish one topic from another, so I’d like my searches to be for the terms I’ve put in.
This is not a bug in Discourse; it is just using the defaults from Postgres. But it does cause user confusion when words get ignored.
Please, please, find a way around the stop list for topic titles.
I have seen forum game topics titled “This or That”, which is another title made up entirely of stop words.
And recently I found a book in a book store which has a title that is on Amazon’s stop word list. That issue made me think of this problem again. (The book is titled The Book and is a history of books and related technologies, like paper making.)
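To make the complaint concrete, here is a minimal sketch of what stop-word removal does to those two titles. It is plain Python, not the actual Postgres or Discourse code: `STOP_WORDS` is a hand-picked subset that I am assuming matches the English list, and `index_terms` is a hypothetical helper name.

```python
import re

# Hypothetical, hand-picked subset of an English stop-word list.
# The real Postgres list is longer; every word here is assumed to be on it.
STOP_WORDS = {
    "now", "should", "i", "only", "do", "myself", "no", "we", "both",
    "ourselves", "this", "or", "that",
}


def index_terms(title, stop_words=STOP_WORDS):
    """Return the lowercased words of a title that survive stop-word removal."""
    words = re.findall(r"[a-z']+", title.lower())
    return [w for w in words if w not in stop_words]


titles = [
    "Now should I only do myself? No. We should both do ourselves.",
    "This or That",
]

for title in titles:
    kept = index_terms(title)                      # stop words removed
    everything = index_terms(title, stop_words=set())  # nothing removed
    print(f"{title!r}: {len(everything)} words, {len(kept)} left to search on")
```

With the stop list applied, neither title leaves anything to match against. Passing an empty stop list keeps all twelve words of the first title, which is the behaviour I am asking for on titles.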
I suspect that is not true. For full text, maybe. For titles? That’s a much smaller working space. This forum has around 50k titles, if I’m correct in thinking that …ourselves/49417 shows me a sequential number. Most of those will have under a dozen words (mine is exactly a dozen). Indexing at most 600,000 words should not be an issue for a modern system. How many posts here will use the word “Discourse”? The results cap at 50, but I bet it is nearer to 50k than 50. (Google tells me “about 17,800” for “site:meta.discourse.org discourse” and I suspect it is an undercount for “posts”.) And yet Postgres has no problems searching for a word with that many results.
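For the record, the back-of-the-envelope estimate above works out like this. Both inputs are rough guesses taken from the post, not measured values.

```python
# Rough figures from the post above; both are estimates, not measurements.
titles = 50_000        # approximate number of topics on the forum
words_per_title = 12   # generous upper bound on words per title

total_title_words = titles * words_per_title
print(total_title_words)
```

Since most titles are shorter than a dozen words, the real count should come in under that total.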
I guess your testing has proven that the Postgres documentation is in need of correction, i.e.:
Aside from improving search quality, normalization and removal of stop words reduce the size of the tsvector representation of a document, thereby improving performance.
There is a link at the bottom of the page to a form where you should submit your findings so others can benefit from the improved knowledge.
It’s unlikely this is anything we would tackle super soon. The answer is “use Google in these cases”. Have you tried that? Our 404 page includes a Google search box, as does the search help.
It’s not on the zero results search page. And the first words of the search help are “Title matches are prioritized – when in doubt, search for titles”.
I’m asking for title matches to get real priority.
I want to revisit this and add it to someone’s list because I like it a lot. @neil can you take? Try to normalize the search code that’s on the “topic not found” page so we aren’t duplicating stuff everywhere, and bear in mind the public vs. private site caveats, we don’t want to show this on a private site…
It would be nice if the domains to be searched by Google could be customized in site settings. That way, one could include a WordPress site, for example. Or was “search this site” already meant to include the entire domain, not just the forum’s sub-domain?
Also minor point but @neil I think this reads better as
Can’t find what you’re looking for? Start a new topic, or search with Google instead:
edit: I see the problem. Searching with Google may not be available (private site) and creating a topic may not be available (you don’t have permissions), so these have to be two complete, independent sentences. OK, fine as is then @neil!