Search of external embedded content

search

(omfg) #1

I’m not sure this is a bug (I don’t really think it is), but I searched the forum and couldn’t find a post about it, so I’ll put it here just in case.

So:

  • If you search the site (this site) using Google, you can find stuff in embedded content (that’s automatically pulled by Discourse from external sites such as GitHub).
  • If you try the same using this forum’s search engine, it won’t work - it seems to ignore external embedded content.

Since external content can change, I think the current behavior is okay, as it lessens the load on the system (which I like); otherwise, regular checking and potentially re-scanning of external URLs would have to happen to keep the search index up to date.
IMO the only thing that should probably be highlighted is that, by default, Discourse doesn’t seem to index those pages, so people who hope to find such content need to use Google or another search engine.
(If Discourse were to regularly check external content, I’d like to be able to disable that feature, to save resources.)
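
To make the trade-off concrete, here is a minimal sketch of what such a periodic re-check with an off switch might look like. This is not how Discourse works; `needs_rescan`, the `enabled` flag, and the one-day interval are all hypothetical names and values chosen for illustration.

```python
from datetime import datetime, timedelta


def needs_rescan(last_fetched, now, interval, enabled=True):
    """Return True when an embedded URL is due to be fetched and re-indexed.

    `enabled` is a hypothetical admin switch; when it is off, nothing is
    ever re-fetched, which is what saves the resources.
    """
    if not enabled:
        return False
    return now - last_fetched >= interval


# Example values (made up): the link was last fetched two days ago and the
# assumed refresh interval is one day.
now = datetime(2015, 12, 1, 12, 0)
last_fetched = now - timedelta(days=2)
interval = timedelta(days=1)

due_when_enabled = needs_rescan(last_fetched, now, interval, enabled=True)
due_when_disabled = needs_rescan(last_fetched, now, interval, enabled=False)
print(due_when_enabled, due_when_disabled)
```

With the switch off, no URL is ever considered due, so the index simply keeps whatever it had at posting time.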


(Jeff Atwood) #2

Sorry, I have no idea what you are talking about here. Can you provide a specific example?


(omfg) #3

Because soylent.com appears below, you should get at least one result from the 2nd link, right?
Of course you will find this post because of the URLs and this very sentence, but if soylent.com wasn’t mentioned except in the linked document below, you’d get 0 results. Google, on the other hand, will give you a ton of search results.


(Mittineague) #4

Hmm, odd.

When I did a site search here for a different domain that appears in the onebox, the post didn’t come up in the results, but the string was highlighted in the post.

My guess is that the site search works with the Raw text, while Google indexes the Cooked version.

https://meta.discourse.org/raw/36236/3

* [https://www.google.com/?gfe_rd=cr&q=%22soylent.com%22+site:meta.discourse.org&gws_rd=cr#](https://www.google.com/?gfe_rd=cr&q=%22soylent.com%22+site:meta.discourse.org&gws_rd=cr#)
* [https://meta.discourse.org/search?q=soylent.com](https://meta.discourse.org/search?q=soylent.com)

https://github.com/discourse/discourse/blob/master/README.md
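
The Raw versus Cooked guess can be illustrated with a small sketch. It assumes, without confirmation from the Discourse source, that the site search indexes only the raw post text while an external crawler sees the rendered page including the onebox preview. The post text, the preview text, and the helper names `tokenize` and `matches` are made up for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, indexed_text):
    """Return True when every token of the query occurs in the indexed text."""
    return tokenize(query) <= tokenize(indexed_text)


# Made-up post: the raw text holds only the link, while the cooked text also
# carries the preview that the onebox pulled from the linked page.
raw = "Have a look at https://example.org/readme"
cooked = raw + " Onebox preview: sponsored by widgets.example"

query = "widgets.example"
found_in_raw = matches(query, raw)        # search over the raw text
found_in_cooked = matches(query, cooked)  # search over the rendered text
print(found_in_raw, found_in_cooked)
```

Under these assumptions the query misses the raw text and hits the cooked text, which would match what the two searches above return.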

(omfg) #5

Yes, that too.
I had noticed that as well, and I had thought the highlight was coming from the browser, but I didn’t know how to determine whether that’s true.