Blacklist vs whitelist onebox

Continuing the discussion from Auto-discoverable oneboxer-able links based on whitelist configurable by admin:

I would like to enable onebox for every site that supports oEmbed or Open Graph, and be able to blacklist specific sites in case of misuse.

This currently isn’t possible. You’d need to make changes to the onebox gem.

Any implementation of this will likely result in your Discourse instance making spurious calls to non-existent websites while someone is typing, which is a bit wasteful.

The client already makes a lot of calls to /onebox; maybe they should be debounced.
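A minimal sketch of trailing-edge debouncing, written in Python for illustration only. The real client is JavaScript, and the `Debouncer` class, its `poll` method, and the 0.5 s wait are hypothetical. Each keystroke replaces the pending URL, and a request to /onebox would only be made once input has been quiet for the whole wait period. Time is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Optional


class Debouncer:
    """Trailing-edge debounce: only the last value survives a quiet period.

    Hypothetical helper; time is passed in (seconds) instead of using a timer.
    """

    def __init__(self, wait: float) -> None:
        self.wait = wait
        self._pending: Optional[str] = None
        self._last_call = 0.0

    def submit(self, url: str, now: float) -> None:
        # Each keystroke replaces the pending URL and restarts the quiet period.
        self._pending = url
        self._last_call = now

    def poll(self, now: float) -> Optional[str]:
        # Release the pending URL only if no new input arrived for `wait` seconds.
        if self._pending is not None and now - self._last_call >= self.wait:
            url, self._pending = self._pending, None
            return url
        return None


# Simulated typing: three partial URLs 100 ms apart, then the user pauses.
d = Debouncer(wait=0.5)
fetched = []
for t, text in [(0.0, "https://exa"), (0.1, "https://example.c"), (0.2, "https://example.com")]:
    d.submit(text, t)
    if (url := d.poll(t)) is not None:
        fetched.append(url)  # would be the /onebox request
if (url := d.poll(1.0)) is not None:
    fetched.append(url)
print(fetched)  # ['https://example.com']
```

Only the completed URL triggers a lookup; the two partial URLs typed along the way never do.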

And would it be possible to create a plugin to enable oEmbed discovery?

Something like this WordPress plugin: Enable oEmbed Discovery (WordPress.org)

See http://oembed.com/#section4
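For reference, the discovery mechanism in that section of the spec works like this: a page advertises its oEmbed endpoint with a `<link rel="alternate" type="application/json+oembed" href="...">` tag (or `text/xml+oembed`). Here is a minimal sketch of finding those tags with Python's standard `html.parser`. The function and class names and the `example.social` URLs are hypothetical, and this is not how the onebox gem does it.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Link types the oEmbed spec defines for discovery.
OEMBED_TYPES = {"application/json+oembed", "text/xml+oembed"}


class OEmbedLinkFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []  # (type, href) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        rels = a.get("rel", "").lower().split()
        link_type = a.get("type", "").lower()
        if "alternate" in rels and link_type in OEMBED_TYPES and a.get("href"):
            self.links.append((link_type, a["href"]))


def discover_oembed(markup: str) -> List[Tuple[str, str]]:
    """Return every advertised oEmbed endpoint found in an HTML page."""
    finder = OEmbedLinkFinder()
    finder.feed(markup)
    finder.close()
    return finder.links


page = (
    "<html><head>"
    '<link rel="alternate" type="application/json+oembed" '
    'href="https://example.social/api/oembed?url=https://example.social/@alice/1">'
    "</head><body></body></html>"
)
print(discover_oembed(page))
```

The consumer would then fetch the returned `href` to get the oEmbed JSON (or XML) for the page.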

I just hit this exact issue again today; this is something I would also like to build.

In particular, for internal Discourse instances there is zero reason to blacklist any sites. In general, I am not really sure what the value of the whitelist approach is. Facebook and Twitter seem to be coping just fine with a blacklist approach.

I guess the big question is:

What kind of abuse does the whitelist-only onebox approach prevent? Honestly, I am struggling really hard to think of any.

@eviltrout? @codinghorror?

The idea was to be safe by default but relax it over time, as Open Graph and oEmbed become more common on the web. I would still prefer to see the mini oneboxer implemented before we do that, though.

That is what I am struggling with: what is “insecure” about opening the Open Graph floodgates? I just can’t think of anything, really. We already download images from arbitrary sources in our pipeline, which is far riskier.

I’m not at all against mini-onebox; in fact, this would tie in to the mini-onebox work. I’m just struggling really hard to figure out what we are protecting against here.

For the record I always thought it should be blacklist, not whitelist, but I also had no problem erring on the side of security. In early discussions I believe @codinghorror insisted it should be whitelist.

Yeah, I think only oEmbed needs whitelisting, and even then most oEmbed responses are unusable due to <script>s.

Please change this from whitelist to blacklist. This is hurting the decentralized web. With the rise of Mastodon as an alternative federated social media platform, I believe this is more important than ever. You can’t whitelist every potential Mastodon URL out there; there are over 2000 different servers. All of them support oEmbed via discovery, since it’s an open standard designed specifically for this use case. But Discourse won’t work with any of them because of this choice.

Last time I checked, Discourse sanitizes oEmbed HTML and strips out any scripts, allowing only plain iframes and other safe elements. So I don’t see this as a security issue.
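To illustrate the kind of sanitizing described above, here is a minimal allowlist-based sketch using Python’s standard `html.parser`. It is not Discourse’s sanitizer, and the tag and attribute allowlists are hypothetical. A real deployment should rely on a maintained sanitizer library rather than something like this.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for this sketch; not Discourse's configuration.
ALLOWED_TAGS = {"iframe", "blockquote", "p", "a", "img", "br"}
ALLOWED_ATTRS = {"src", "href", "width", "height", "title", "alt"}
DROP_CONTENT = {"script", "style"}  # drop these elements and their contents
VOID_TAGS = {"img", "br"}           # never emit closing tags for these


class OEmbedSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS:
                continue  # drops on* event handlers, style, class, ...
            if name in ("src", "href") and not value.strip().lower().startswith("https://"):
                continue  # keeps only absolute https:// URLs (no javascript:)
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_oembed_html(markup: str) -> str:
    parser = OEmbedSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize_oembed_html(
    '<div class="x"><script>alert(1)</script>'
    '<iframe src="https://example.com/embed" onload="steal()" width="400"></iframe></div>'
))
```

The script and the `onload` handler are removed, while the iframe itself survives.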

@codinghorror @eviltrout

This was changed some time ago and is already how it works today. Did you try it yourself?

Is it behind a setting? I just tried it on my hosted Discourse and it didn’t work.

Excuse me while I try it right here:

Hm, nope, doesn’t work: it displays the Open Graph tags instead. Before you ask, I did test the oEmbed discovery implementation on Mastodon’s side using other oEmbed tools.

Looks correct to me. I see zero issues.

They’re saying that the oEmbed format would be preferred, and oEmbed is still on an (extremely strict) whitelist, because a lot of oEmbed stuff is broken, so Discourse preferring it by default would result in a lot of broken previews.
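The behaviour described here can be sketched as a simple preference order: use oEmbed only for hosts on the (strict) oEmbed whitelist, otherwise fall back to Open Graph. This is a hypothetical illustration, not Discourse’s code; the `PageMetadata` type, the `choose_preview` function, and the host names are made up.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical strict oEmbed whitelist for this sketch.
OEMBED_WHITELIST = {"video.example.com"}


@dataclass
class PageMetadata:
    host: str
    oembed_endpoint: Optional[str]  # discovered via <link rel="alternate" ...>, if any
    has_open_graph: bool            # page carries og:title / og:image etc.


def choose_preview(page: PageMetadata) -> str:
    """Prefer oEmbed only for whitelisted hosts; otherwise fall back to Open Graph."""
    if page.oembed_endpoint and page.host in OEMBED_WHITELIST:
        return "oembed"
    if page.has_open_graph:
        return "opengraph"
    return "plain_link"


# A Mastodon-style server advertises oEmbed but is not whitelisted.
toot = PageMetadata("example.social", "https://example.social/api/oembed", True)
print(choose_preview(toot))  # opengraph
```

Under this rule a server that supports oEmbed discovery but is not whitelisted still gets an Open Graph preview, which matches what was observed above.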

How are entries in the blacklist expected to be formatted? If I wanted to blacklist the website https://www.example.com, can I just enter “example” (seemingly not), or do I need the entire base URL (i.e. https://www.example.com)?
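The thread doesn’t answer how Discourse actually interprets these entries. As a hypothetical illustration of one common convention, the sketch below reduces each entry (bare domain or full URL) to a hostname and treats it as matching that host and its subdomains. Under this convention, a bare label like “example” matches nothing. The `normalize_entry` and `is_blocked` helpers are made up for this example.

```python
from urllib.parse import urlsplit


def normalize_entry(entry: str) -> str:
    """Reduce a blacklist entry (bare domain or full URL) to a lowercase hostname."""
    entry = entry.strip().lower()
    if "//" not in entry:
        entry = "//" + entry  # lets urlsplit treat a bare domain as the host part
    host = urlsplit(entry).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_blocked(url: str, blacklist: list[str]) -> bool:
    """True if the URL's host equals a blacklisted domain or is a subdomain of one."""
    host = normalize_entry(url)
    for entry in blacklist:
        domain = normalize_entry(entry)
        # Match on a dot boundary so "evilexample.com" does not match "example.com".
        if domain and (host == domain or host.endswith("." + domain)):
            return True
    return False


print(is_blocked("https://blog.example.com/post", ["https://www.example.com"]))  # True
print(is_blocked("https://www.example.com/", ["example"]))  # False
```

Stripping a leading `www.` is a design choice in this sketch: it makes “https://www.example.com” and “example.com” equivalent entries, at the cost of blocking every subdomain of example.com.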