"Onebox Assistant", crawl for those previews reliably!

What it does

Turns this kind of result:

(where your server has failed to bring back the page source so cannot extract the required tags to build the onebox)

Into this!:

It simply provides an alternative path for onebox to get its page source with which to look for meta-data when the target server refuses your connection.

It changes nothing about how onebox then processes the page source to find the meta-data and render the box.

It’s meant to allow you to enter the details and credentials of a third party API to bring back the page instead of doing a normal http call directly to the target page.

Why

I found my servers were being forbidden access to a number of commercial sites so oneboxes would fail to be rendered. It essentiallly helps leverage the trustworthiness of the 3rd party API, a bit like a mail service.

Why it’s cost effective

It is intended to use the API sparingly and will bring back the page source in the normal way under most circumstances. It only uses the API when it’s refused a response.. On deeper investigation and with experience I’ve noticed that using the API every time may be a requirement now as the redirects stage can fail to bring back the correct page for the very same reasons as a total denial to respond. The plugin can now use the API on every occasion.

What this means is you can use a relatively cheap VPS but still get reliable one-boxing functionality, even if your IP or user agent is somehow ‘blacklisted’.

You don’t need it if

You are oneboxing all your target content ok with the vanilla install and all users are happy

Pre-requisites

You need an account with a suitable 3rd part API.

Settings

onebox assistant api base address:  https://api.embed.rocks/api/

Above example uses embed.rocks, but in the future support for other API’s might be added, however, embed.rocks is relatively good value atm.

onebox assistant api base query:   ?url=

onebox assistant api options:   &skip=article,description,oembed,imextra&include=source

onebox assistant api page source field:   source

You will also need to enter your API key provided by embed.rocks

See example below

image

This setting allows you to ignore the prefetch (to check if the direct crawl returns a result) and use the API from the get-go.


default OFF

I recommend setting this to TRUE.

This is more expensive of course but often yields better results as there are some cases where the pre-fetch gets redirected to the wrong page because you are not trusted.

Support Information

Remember, if you’ve previously attempted to onebox a link, Discourse core will cache the result.

You can add a random querystring on the end to overcome the cache: https://mylink.com/todaynews?random=random

You can also check the API is responding with, e.g.:

curl -X GET "https://api.embed.rocks/api/?url=https%3A%2F%2Fnews.bbc.co.uk%0A&skip=article,description,oembed,imextra&include=source" -H "x-api-key: %%%your-api-key%%%"

You need to url encode the site you are calling (the url parameter value) using some site like this (not vouched for!)

Known Limitations

  • It’s only been tested with one provider at the moment and not tested on others. That provider is https://embed.rocks (with whom I have no affiliation). I’m happy to consider supporting more services if the work is sponsored.

  • The monkey patching is done at method level. This overrides more code than it needs to which leads to a greater risk of the plugin breaking after a core update. However I don’t think there’s a way to minimise this further?

How to install plugins

See the guide here: Install Plugins in Discourse

This repo is: https://github.com/merefield/discourse-onebox-assistant

https://github.com/merefield/discourse-onebox-assistant

All feedback welcome. Please :star: it on GitHub if you find it useful.

38 Likes