Any implementation of this will likely result in your Discourse instance making spurious calls to non-existent websites while someone is typing, which is a bit wasteful.
The client already makes a lot of calls to /onebox; maybe they should be debounced.
I just hit this exact issue again today; this is something I would also like to build.
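For anyone unfamiliar with debouncing, here is a minimal sketch of the idea. It is in Python purely for illustration and is not the Discourse client code; `Debouncer`, `lookup`, the `requested` list, and the 50 ms wait are all made-up names and values. Each new call cancels the pending one, so only the last URL typed within the quiet period would actually be sent to /onebox.

```python
import threading


class Debouncer:
    """Delay a call until `wait` seconds pass with no newer call (hypothetical helper)."""

    def __init__(self, func, wait):
        self.func = func
        self.wait = wait
        self._timer = None
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the pending, now-stale call
            self._timer = threading.Timer(self.wait, self.func, args, kwargs)
            self._timer.daemon = True
            self._timer.start()


requested = []  # stands in for real requests to /onebox
lookup = Debouncer(requested.append, wait=0.05)

# Simulate someone typing a URL a few characters at a time.
for partial in ["https://exa", "https://example", "https://example.com"]:
    lookup(partial)
```

Once the quiet period has elapsed, `requested` should hold only the final URL; the two partial URLs never trigger a lookup.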
In particular for internal Discourse instances there is zero reason to blacklist any sites. In general, I am not really sure what the value is of the whitelist approach. Facebook and Twitter seem to be coping just fine with a blacklist approach.
I guess the big question is:
What kind of abuse does the whitelist-only onebox approach prevent? Honestly I am struggling really hard to think of any.
The idea was to be safe by default but relax it over time, as opengraph and oembed become more common on the web. I would still prefer to see mini oneboxer implemented before we do that though.
That is what I am struggling with: what is “insecure” about opening the OpenGraph floodgates? I just can’t think of anything really. We already download images from arbitrary sources in our pipeline, which is far more risky.
Not at all against mini-onebox, in fact this would tie in to mini-onebox work. Just struggling real hard to figure out what we are protecting against here.
For the record I always thought it should be blacklist, not whitelist, but I also had no problem erring on the side of security. In early discussions I believe @codinghorror insisted it should be whitelist.
Please change this from whitelist to blacklist. This is hurting the decentralized web. With the rise of Mastodon as an alternative federated social media platform I believe this becomes more important than ever. You can’t whitelist every potential Mastodon URL out there - there are over 2000 different servers. All support oEmbed via discovery, since it’s an open standard specifically for this use case. But Discourse won’t work with any of them because of this choice.
Last time I checked, Discourse sanitizes oEmbed HTML and strips out any scripts, allowing only pure iframes and other safe elements. So I do not see this as a security issue.
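To illustrate what "via discovery" means: the oEmbed spec lets a page advertise its endpoint with a `<link rel="alternate">` tag of type `application/json+oembed` or `text/xml+oembed`, so a consumer does not need a per-site list. Below is a rough sketch of that lookup, not Discourse's onebox implementation. The `OEmbedLinkFinder` and `discover_oembed` names, the `mastodon.example` host, and the endpoint path are made up for the example.

```python
from html.parser import HTMLParser

# The two discovery MIME types defined by the oEmbed spec.
OEMBED_TYPES = {"application/json+oembed", "text/xml+oembed"}


class OEmbedLinkFinder(HTMLParser):
    """Collect href values of oEmbed discovery <link> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.endpoints = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        link_type = (attr.get("type") or "").lower()
        if "alternate" in rels and link_type in OEMBED_TYPES and attr.get("href"):
            self.endpoints.append(attr["href"])


def discover_oembed(html):
    """Return every oEmbed endpoint URL advertised in the given HTML."""
    finder = OEmbedLinkFinder()
    finder.feed(html)
    return finder.endpoints


# Made-up page from an imaginary Mastodon server.
sample_page = """
<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/json+oembed" href="https://mastodon.example/oembed?url=https%3A%2F%2Fmastodon.example%2F%40alice%2F1">
</head><body></body></html>
"""

endpoints = discover_oembed(sample_page)
```

Here `endpoints` contains just the one advertised URL, and the stylesheet link is ignored. Fetching that URL and sanitizing the returned HTML are separate steps not shown here.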
Is it behind a setting? I just tried it on my hosted Discourse and it didn’t work.
Excuse me while I try it right here:
Hm nope, doesn’t work - displays the OpenGraph tags instead. Before you ask - I did test the OEmbed discovery implementation on Mastodon’s side using other OEmbed tools.
They’re saying that the oEmbed format would be preferred, and oembed is still on an (extremely strict) whitelist (because a lot of oembed stuff is broken, so Discourse preferring it by default would result in a lot of broken stuff).
How are the entries expected in the blacklist? If I wanted to blacklist the website https://www.example.com, can I just enter “example” (seemingly not) or do I need the entire base URL (i.e. https://www.example.com)?
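I don't know how Discourse actually matches these entries, so the following is only a sketch of one plausible scheme, hostname-based matching, and an assumption on my part rather than the real setting's behavior. `host_of` and `is_blocked` are hypothetical names. Under this scheme a bare “example” would not match, while either the domain or the full base URL would.

```python
from urllib.parse import urlsplit


def host_of(value):
    """Extract a lowercase hostname from a bare host or a full URL."""
    if "//" not in value:
        value = "//" + value  # lets urlsplit treat "example.com" as a host
    return (urlsplit(value).hostname or "").lower()


def is_blocked(url, blocked_entries):
    """True if the URL's host equals a blocked host or is a subdomain of one.

    Assumed matching rule for illustration only.
    """
    host = host_of(url)
    for entry in blocked_entries:
        blocked = host_of(entry)
        if blocked and (host == blocked or host.endswith("." + blocked)):
            return True
    return False


target = "https://www.example.com/some/page"
by_domain = is_blocked(target, ["example.com"])
by_base_url = is_blocked(target, ["https://www.example.com"])
by_bare_word = is_blocked(target, ["example"])
```

With this rule, `by_domain` and `by_base_url` are true and `by_bare_word` is false, which would be consistent with “example” on its own seemingly not working.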