"Onebox Assistant", a plugin to help onebox do its job

What it does

Turns this kind of result:

(where your server has failed to bring back the page source, so onebox cannot extract the tags it needs to build the box)

Into this:

It simply provides an alternative path for onebox to get its page source with which to look for meta-data when the target server refuses your connection.

It changes nothing about how onebox then processes the page source to find the meta-data and render the box.

It’s meant to let you enter the details and credentials of a third-party API that fetches the page for you, instead of making a normal HTTP call directly to the target page.

Why

I found my servers were being refused access by a number of commercial sites, so oneboxes would fail to render. The plugin essentially leverages the trustworthiness of the 3rd party API, a bit like a mail service.

Why it’s cost effective

It is intended to use the API sparingly: under most circumstances it brings back the page source in the normal way, and it only calls the API when the direct request is refused.
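As a rough illustration of that "direct first, API only on refusal" idea, here is a minimal Python sketch. It is not the plugin's code (the plugin is a Discourse plugin and patches onebox itself). The function name `fetch_page_source`, the two fetcher callables, and the set of status codes treated as a refusal are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple

# Status codes treated as "refused" in this sketch.
# This list is an assumption, not the plugin's actual behaviour.
REFUSED_STATUSES = {401, 403, 429}

# A fetcher takes a URL and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, str]]


def fetch_page_source(url: str, direct_fetch: Fetcher, api_fetch: Fetcher) -> Tuple[str, str]:
    """Return (body, route), where route is "direct" or "api"."""
    status: Optional[int]
    try:
        status, body = direct_fetch(url)
    except OSError:
        # Connection-level refusal (reset, timeout, refused connection).
        status, body = None, ""

    if status is not None and status not in REFUSED_STATUSES and body:
        # Normal case: no API call is spent.
        return body, "direct"

    # Only now fall back to the (paid) third-party API.
    _, body = api_fetch(url)
    return body, "api"
```

Passing the fetchers in as callables keeps the sketch independent of any particular HTTP client or provider; in practice `api_fetch` would wrap whatever request the chosen service expects.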

What this means is you can use a relatively cheap VPS but still get reliable one-boxing functionality, even if your IP or user agent is somehow ‘blacklisted’.

You don’t need it if

You are already oneboxing all your target content fine with the vanilla install and all your users are happy.

Pre-requisites

You need an account with a suitable 3rd party API.

Settings

See example below

[screenshot of the plugin settings]

Known Limitations

  • It’s only been tested with one provider so far: https://embed.rocks (with whom I have no affiliation). I’m happy to consider supporting more services if the work is sponsored.

There’s a switch to return page source instead of a ‘card’.

  • The monkeypatching is a little brash and there’s rather too much Discourse code in the current version. Advice on how to minimise the amount of original source-code most welcome!

Repo here

All feedback welcome.


Can I ask which 3rd party API you are using?


The plugin currently supports https://embed.rocks. I’ve updated the first post with that information too.

There’s a switch to return page source instead of a ‘card’ (shown in the screenshot).

embed.rocks looks like it’s down right now.

I’m trying to use proxycrawl.com but I can’t get it to work. Can you take a look?

I’ve escalated the website issue. Have mailed the owner.

Thanks for the alternate link. Very useful suggestion. Unfortunately I don’t have time to look at this at the moment as I have a couple of live projects. I suggest using embed.rocks for now.

If you’d like me to take a look at supporting proxycrawl more urgently you can hire me or submit a PR :).

Hi @merefield, I’ve been stuck for about an hour over here. Have tried every combination of what should work for settings. No matter what I do I can’t get this URL to work in Discourse… even though embed.rocks doesn’t have a problem getting the data under “Try It” on their site.

https://oilandgaslawdigest.com/caselaw/scotx-applies-discovery-rule-to-breach-of-pref-right-despite-disclosure-in-deed-records/

Please help.

Hi James,

Yes this stuff is super frustrating at times.

On this occasion it’s neither Discourse’s fault nor your server’s (though it would be nice if Discourse provided more info about why it failed).

If you check with Facebook’s Open Graph debugger, you see this:

https://developers.facebook.com/tools/debug/sharing/?q=https%3A%2F%2Foilandgaslawdigest.com%2Fcaselaw%2Fscotx-applies-discovery-rule-to-breach-of-pref-right-despite-disclosure-in-deed-records%2F

So it looks like the target website doesn’t have a very well-formed metadata section. If you know them, you could raise it with them - they should be checking with the Facebook tool in any case.

It always amazes me how many sites still don’t get their metadata right. I face this weekly on one of my sites that focusses on financial markets.

The other systems, like embed.rocks and iframely, are possibly using alternative tags and tricks to put their previews together.

Remember my plugin is not using embed.rocks previews, it’s merely using that service to scrape, so onebox is processing the original page source.

As I said, Discourse/Onebox could arguably help more by being more transparent about why it fails when it is unable to render these things instead of just rendering the original URL. Letting the poster know which tag(s) were missing or if there was a bad response from the scrape attempt that prevented the onebox from rendering correctly would be a real improvement.
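To make the "which tag(s) were missing" idea concrete, here is a minimal Python sketch of such a check run against a page's source. It is only an illustration: it is not how onebox works internally, and both the helper name `missing_open_graph_tags` and the list of tags it looks for are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Tags checked in this sketch. The set onebox actually needs may differ (assumption).
EXPECTED_TAGS = ("og:title", "og:description", "og:image")


class OpenGraphCollector(HTMLParser):
    """Collects non-empty og:* <meta> tags from a page's source."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Some sites wrongly use name= instead of property=, so accept both.
        prop = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if prop and prop.startswith("og:") and content:
            # Keep the first occurrence of each tag.
            self.tags.setdefault(prop, content)


def missing_open_graph_tags(page_source):
    """Return the expected Open Graph tags that are absent or empty."""
    collector = OpenGraphCollector()
    collector.feed(page_source)
    return [name for name in EXPECTED_TAGS if name not in collector.tags]
```

A message built from the returned list (for example "og:image is missing") would already tell the poster more than a bare URL does.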

It might be good to enhance the plugin or build another plugin to support one of the third party preview builders to provide an alternative to oneboxing. That’s currently beyond the scope and I’m busy with other projects at the moment. I might consider the work at some stage if it were suitably sponsored. Nonetheless, retaining onebox functionality for previews makes the plugin more resilient and less likely to fail due to changes in core.
