Whitelisting internal hosts for crawling

(Robin Ward) #1

In the latest release of Discourse we added extra protection to prevent SSRF attacks. This new code ensures that links are only crawled if they are not on private networks, so if your server replies with an internal address for any host it won’t be crawled.
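To illustrate the idea (this is only a rough sketch, not the actual Discourse code), a check like this could be written with Python's standard `ipaddress` module. The function names `is_internal_address` and `should_crawl` are made up for the example, and it assumes the hostname has already been resolved to a list of IP addresses.

```python
import ipaddress
from typing import Iterable


def is_internal_address(ip: str) -> bool:
    """Return True if the address is private, loopback, or link-local."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def should_crawl(resolved_ips: Iterable[str]) -> bool:
    """Hypothetical rule: crawl only if every resolved address is public.

    An empty result (the host did not resolve) is treated as "do not crawl".
    """
    ips = list(resolved_ips)
    return bool(ips) and not any(is_internal_address(ip) for ip in ips)
```

Checking every resolved address, rather than just the first, matters because a single hostname can resolve to a mix of public and internal addresses.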

However, this is not always ideal. For example, if you are running a couple of Discourses on a private network, they wouldn’t be able to onebox each other or crawl links to fetch topic titles and such.

To fix this, I’ve added a new site setting to whitelist internal hosts for link crawling and oneboxing.

Simply add your hosts to the whitelist internal hosts site setting and they will be crawled even if they are internal.

You should be absolutely sure these hosts are safe to crawl before you do this: we won’t crawl on any ports except 443 and 80, but if you are running other web services on the same host it’s possible an attacker could create a onebox or link crawling request that would hit those services and change data.
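Put together, the rules described above could be sketched roughly as follows. This is an illustration under assumptions, not the real implementation: `crawl_allowed`, `ALLOWED_PORTS`, and the `internal_allowlist` argument are hypothetical names, and the resolved IP is passed in rather than looked up.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_PORTS = {80, 443}                   # the only ports that get crawled
DEFAULT_PORTS = {"http": 80, "https": 443}  # implied port when the URL has none


def crawl_allowed(url, resolved_ip, internal_allowlist):
    """Hypothetical check: ports 80/443 only; internal hosts need an allowlist entry."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or parts.hostname is None:
        return False
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    if port not in ALLOWED_PORTS:
        return False
    addr = ipaddress.ip_address(resolved_ip)
    internal = addr.is_private or addr.is_loopback or addr.is_link_local
    # urlsplit lowercases the hostname, so allowlist entries should be lowercase too.
    return (not internal) or parts.hostname in internal_allowlist
```

Note that even an allowlisted host is still limited to ports 80 and 443 in this sketch, which is why other web services on those ports of the same host remain the thing to worry about.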

(Jeff Atwood) #2

Note that this is only possible if the HTTP onebox request triggers some kind of action on GET or HEAD. That is, you have a page on your intranet like this …


… where simply visiting that URL (issuing an HTTP GET to it) would… delete all your data. Correct @eviltrout?

(Felix Freiberger) #3

There is also a (theoretical) information disclosure risk: If https://internalsite.example.com/show-super-secret-data returns a page that will onebox and where the onebox contains sensitive data, this data will be leaked. (This is pretty unlikely, because most sensitive internal sites probably won’t onebox at all, or without sensitive data in the onebox itself.)

(Jeff Atwood) #4

It’s exceedingly unlikely that an internal page would onebox, don’t you think?

(Felix Freiberger) #5

Yeah, it is. But not impossible, and I refuse to not celebrate that Discourse has taken precautions against a possible issue just because it’s not very likely :slight_smile:
And albeit unlikely, I think it’s fair to warn sysadmins about the risk before they whitelist a host.

(Robin Ward) #6

Yes, onebox will never POST or PUT data, so if your internal app is built properly and does CSRF protection and all that, you are good. It’s more of a danger for those legacy PHP apps where GETs are mutating data.

I hope those are rare, but I could also understand a company saying “hey we know this 15-year-old app sucks but it only runs on our internal network so who cares”.


This is a nice security precaution, but it would be great if the Rails production log had some more debug text about why oneboxes are failing… something like “host X is on a private network but not whitelisted” or “opengraph meta tags missing” and so on.

I’ve been scratching my head all day about why internal oneboxes didn’t work, until I found this explanation of the whitelisting setting. It wasn’t immediately obvious to me that the internal host whitelist requires just the hostnames, without any http:// prefix or URL paths around them.
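For anyone else who pasted full URLs into the setting, here is a small sketch of how to boil an entry down to the bare hostname the whitelist expects. It uses Python's standard `urllib.parse`; the helper name `bare_hostname` and the example host are hypothetical, and this is just a convenience script, not something Discourse itself provides.

```python
from urllib.parse import urlsplit


def bare_hostname(entry: str) -> str:
    """Strip scheme, port, and path from an entry, leaving the lowercase hostname."""
    entry = entry.strip()
    # urlsplit only treats the text as a host when a "//" separator is present.
    if "://" not in entry:
        entry = "//" + entry
    return urlsplit(entry).hostname or ""


print(bare_hostname("http://wiki.internal.example/some/path"))
```

With the example input above, the value to enter in the setting would be just `wiki.internal.example`.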

At least I learned something new about HTTP teapots after searching for that generic 418 code in the production.log :laughing: