Can anyone hazard a guess why searching for “wb.camra.org.uk” doesn’t find any posts when that text is most certainly in a post, whereas a search for “wb.camra” does work?
Kind of proving the text does exist as I asked the same question in our own support category:
My guess is “camra” is the only string long enough to get past the minimum search limits. I suspect “camra” is all you are really searching for there.
So why does adding more break the search? Now that this post has been made, it’s reproducible here on Meta.
Remember that in normal human being talk, periods end sentences.
So you definitely wouldn’t want to search for just “org” or “wb” or “uk”.
Long unique strings is where it’s at, from a searching perspective. Searching for “to” or “by” ain’t not never no good no how.
But a search for “wb org uk” does find the first post.
I would say searching for website addresses is a pretty common requirement as it’s a unique identifier. And yes, it is now repeatable on here. I think people would complain if searching for www.bbc.co.uk didn’t work on Google.
Maybe somebody who knows how the search engine works could comment on what Discourse is doing with the search string? Is it getting processed so that it’s not searching on what one types?
Later… and yes, having included www.bbc.co.uk in this message, Discourse is unable to find it.
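To make the question concrete, here is a toy Ruby sketch of one way a search could find “wb org uk” but miss the full address. It is purely an illustration under assumed behaviour, not Discourse's or Postgres's actual code: the helpers `index_tokens`, `query_tokens` and `match?` are hypothetical, and the assumption is that the indexed text gets split on punctuation while the search term is only split on whitespace. It does not try to reproduce every observation in this topic (for instance, “wb.camra” working on the real site).

```ruby
require "set"

# Hypothetical index step (assumption): split post text on anything
# that is not a letter or digit, so a hostname becomes several tokens.
def index_tokens(text)
  text.downcase.split(/[^a-z0-9]+/).reject(&:empty?).to_set
end

# Hypothetical query step (assumption): split on whitespace only,
# so a hostname typed into the search box stays a single token.
def query_tokens(term)
  term.downcase.split(/\s+/).reject(&:empty?)
end

# A post matches when every query token is present in its index.
def match?(post, term)
  index = index_tokens(post)
  query_tokens(term).all? { |token| index.include?(token) }
end

post = "See wb.camra.org.uk for details"

puts match?(post, "wb org uk")        # separate words, each one is in the index
puts match?(post, "wb.camra.org.uk")  # one long token that was never indexed
```

If the two steps disagree like this, the text is in the post but the search term is compared against tokens that were never stored in that form.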
camra.org seems to work OK. As does
As to what bits of Postgres innards don’t like wb.camra.org.uk, I can’t say. There is a reason there are tens of thousands of people working at Google, and literally nobody uses Bing even though it air-quotes “works”.
Although this is not exactly the same issue - something that might be relevant is this:
@sam did put a fix in for something similar to this sort of thing back in July 2016:
Where the last word wasn’t searchable.
Although the file app/models/search_observer.rb doesn’t seem to exist in the current version of the code, I assume those fixes are still in there somewhere…
Because the tests still exist for it:
But as I said before - perhaps a slightly different issue.
EDIT: found the current relevant code:
I don’t really buy the “Google is awesome at search, therefore no point improving search cause Google is awesome at search” argument
There is absolutely a point in improving, but we may need to hire 10,000 folks to reach parity with Google’s efforts…
Sure, but… I can fix this specific bug
Is this a bug in your code, then? Is that what you’re saying?
The pg tokenizer is dumb. I already have workarounds for some edge cases, and this is another example.
Maybe there will be some improvements when we upgrade to PG 10 later this year.
Have not done root cause analysis here, this could very much be a bug in my code for all I know
It looks like it tokenizes right, at least on my local install and meta.
 pry(main)> Post.exec_sql("SELECT * FROM ts_debug('english', 'wb.camra.org.uk');").to_a
So this is likely an implementation bug on our side.
Thanks @DeanMarkTaylor for finding the exact reason for the bug!
Note that you will need to re-index any posts affected by this issue, which means you have to edit them (I am not 100% sure whether a rebake will catch this).
search for: wb.camra.org.uk