We are implementing a search engine for our main site using Swiftype. Out of the box, Swiftype's indexing of our Discourse site includes the page navigation in the search description, which is not desirable (see here).
Swiftype offers two ways to customize the search description:
Content exclusion/inclusion: add an attribute to the HTML markup to designate what should be excluded from or included in the search index (e.g. <p data-swiftype-index='false'> or <p data-swiftype-index='true'>)
Meta tags: add custom meta tags to indicate what should be included in the search index (e.g. <meta class="swiftype" name="body" data-type="text" content="this is the body content" />)
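For illustration, both markup forms can be produced by small helpers like the ones below. This is a minimal Python sketch: the names swiftype_body_meta and exclude_from_index are hypothetical and are not part of Discourse or Swiftype, and the sketch does not address where in a Discourse page such markup could be injected without editing core files.

```python
from html import escape


def swiftype_body_meta(text: str) -> str:
    """Build a Swiftype body meta tag in the format quoted in the thread.

    html.escape(quote=True) escapes &, <, >, " and ', so arbitrary post
    text cannot break out of the content attribute.
    """
    return (
        '<meta class="swiftype" name="body" data-type="text" '
        f'content="{escape(text, quote=True)}" />'
    )


def exclude_from_index(inner_html: str, tag: str = "div") -> str:
    """Wrap markup (e.g. navigation) in an element Swiftype is told to skip."""
    return f"<{tag} data-swiftype-index='false'>{inner_html}</{tag}>"


if __name__ == "__main__":
    print(swiftype_body_meta('Topic text with "quotes" & <tags>'))
    print(exclude_from_index("<nav>Log In | Categories</nav>"))
```

Escaping matters mainly for the meta tag, because forum post text routinely contains quotes and angle brackets that would otherwise corrupt the attribute.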
I'm not finding a way to do this without modifying Discourse's files directly, and it's pretty clear to me that such changes would break once we update Discourse.
I would really welcome ideas on the best approach to resolving this.
I'm not seeing the problem in the link you provided.
What web crawler user-agent does this software use? Are we correctly identifying it as a crawler? What happens when you set your browser's user-agent to the one that crawler uses and browse your site?
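One way to run that check without changing browser settings is to request a page with the crawler's user-agent and inspect the HTML that comes back. This is a minimal Python sketch using only the standard library; the bare "Swiftbot" string and the example forum URL are assumptions, and the real crawler may send a longer user-agent.

```python
import urllib.request

# Assumption: the crawler identifies itself with just "Swiftbot"; the real
# header may also carry a version number or contact URL.
SWIFTBOT_UA = "Swiftbot"


def crawler_request(url: str, user_agent: str = SWIFTBOT_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as_crawler(url: str, user_agent: str = SWIFTBOT_UA) -> str:
    """Fetch the page as that user-agent would receive it and return the HTML."""
    with urllib.request.urlopen(crawler_request(url, user_agent), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, fetch_as_crawler("https://forum.example.com/t/some-topic/123") (a hypothetical URL) returns the page as served to that user-agent, which can then be compared with the same URL fetched using an ordinary browser user-agent.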
The user-agent is called Swiftbot (it's a commercial search tool). I'm not sure how you would identify it as a crawler.
The problem is that Swiftbot is picking up extra content (login, navigation, etc.) in the search description. I think it's a limitation of the Swiftype crawler, but their solution requires either 1) adding markup to exclude the navigation or 2) adding a meta tag with the desired description. Neither option appears to be trivial in Discourse.
OK, I added Swiftbot to the web crawler detection regex, so if it is using that user-agent, we will serve it the crawler version of the page. You'll need to be on the absolute latest version of Discourse for this change to take effect.
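The idea behind this fix is a user-agent match: if the incoming user-agent matches the crawler pattern, the server returns the simplified crawler page instead of the full application. The sketch below illustrates that in Python only; Discourse's actual detection is Ruby code with a much longer list of crawlers, and CRAWLER_UA_PATTERN, is_crawler and page_variant are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; not Discourse's real crawler list.
CRAWLER_UA_PATTERN = re.compile(r"googlebot|bingbot|swiftbot", re.IGNORECASE)


def is_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the user-agent contains a known crawler token."""
    if not user_agent:
        return False
    return CRAWLER_UA_PATTERN.search(user_agent) is not None


def page_variant(user_agent: Optional[str]) -> str:
    """Choose which version of the page to render for this request."""
    return "crawler" if is_crawler(user_agent) else "full"
```

A case-insensitive substring search tolerates version numbers and extra tokens in the header. Because the user-agent is set by the client, anyone can request the crawler page this way, which is presumably acceptable when that page only contains public content.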