"Onebox Assistant", crawl for those previews reliably!

What it does

Turns this kind of result:

(where your server has failed to retrieve the page source and so cannot extract the tags required to build the onebox)

Into this!:

It simply provides an alternative path for onebox to get its page source with which to look for meta-data when the target server refuses your connection.

It changes nothing about how onebox then processes the page source to find the meta-data and render the box.

You enter the details and credentials of a third-party API, and that API brings back the page in place of a normal HTTP call made directly to the target page.

Why

I found my servers were being forbidden access to a number of commercial sites, so oneboxes would fail to render. The plugin essentially leverages the trustworthiness of the 3rd party API, a bit like a mail service.

Why it’s cost effective

The plugin is intended to use the API sparingly: under most circumstances it brings back the page source in the normal way, and it only uses the API when the direct request is refused. On deeper investigation and with experience, however, I’ve noticed that using the API every time may now be a requirement, because the redirects stage can fail to bring back the correct page for the very same reasons that cause a total denial to respond. The plugin can therefore now be set to use the API on every occasion.

What this means is you can use a relatively cheap VPS but still get reliable one-boxing functionality, even if your IP or user agent is somehow ‘blacklisted’.

You don’t need it if

You are oneboxing all your target content OK with the vanilla install and all your users are happy.

Pre-requisites

You need an account with a suitable 3rd party API.

Settings

onebox assistant api base address:  https://api.embed.rocks/api/

The example above uses embed.rocks. Support for other APIs might be added in the future, but embed.rocks is relatively good value at the moment.

onebox assistant api base query:   ?url=

onebox assistant api options:   &skip=article,description,oembed,imextra&include=source

onebox assistant api page source field:   source

You will also need to enter your API key provided by embed.rocks
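To make it clearer how these settings fit together, here is a rough Python sketch (not the plugin's own code, which is Ruby). It assumes the four values are simply concatenated around the URL-encoded target, which matches the curl example in this post, and that the API replies with JSON whose `source` field holds the page HTML. The function names `build_api_url` and `extract_page_source` are made up for illustration.

```python
import json
from urllib.parse import quote

# Values copied from the settings listed in this post.
API_BASE_ADDRESS = "https://api.embed.rocks/api/"
API_BASE_QUERY = "?url="
API_OPTIONS = "&skip=article,description,oembed,imextra&include=source"
API_PAGE_SOURCE_FIELD = "source"


def build_api_url(target_url: str) -> str:
    """Join the settings into one request URL (assumed concatenation order)."""
    encoded_target = quote(target_url, safe="")  # percent-encode ':' and '/' too
    return f"{API_BASE_ADDRESS}{API_BASE_QUERY}{encoded_target}{API_OPTIONS}"


def extract_page_source(response_body: str) -> str:
    """Read the page source from a JSON body (assumed top-level field)."""
    return json.loads(response_body)[API_PAGE_SOURCE_FIELD]
```

The API key is not part of the URL here; in the curl example it is sent separately in the `x-api-key` header.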

See example below

image

The “Always use the proxy crawl regardless of direct response” setting allows you to skip the prefetch (which checks whether the direct crawl returns a result) and use the API from the get-go.


default OFF

I recommend setting this to TRUE.

This is more expensive of course but often yields better results as there are some cases where the pre-fetch gets redirected to the wrong page because you are not trusted.
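The decision this setting controls can be sketched roughly as below. This is only an illustration in Python, not the plugin's Ruby code: `fetch_page_source`, `direct_fetch`, `api_fetch` and `always_use_api` are hypothetical names, and a refused direct crawl is modelled as returning `None`.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[str]]


def fetch_page_source(
    url: str,
    direct_fetch: Fetcher,
    api_fetch: Fetcher,
    always_use_api: bool = False,
) -> Optional[str]:
    """Return page source, preferring the direct crawl unless told otherwise."""
    if not always_use_api:
        # Prefetch: try the normal direct crawl first (costs no API credit).
        source = direct_fetch(url)
        if source is not None:
            return source
    # Either the direct crawl was refused or the setting skips it entirely.
    return api_fetch(url)
```

Note that this sketch cannot detect the case where the direct crawl succeeds but was redirected to the wrong page, which is why switching the setting on can give better results.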

Support Information

Remember, if you’ve previously attempted to onebox a link, Discourse core will cache the result.

You can add a random querystring on the end to overcome the cache: https://mylink.com/todaynews?random=random
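If you test a lot of links, a small helper can add the throwaway parameter for you. This is just a sketch using the Python standard library; `add_cache_buster` is a made-up name and the parameter name `random` is taken from the example above.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url: str, value: Optional[str] = None) -> str:
    """Append a 'random' query parameter, keeping any existing parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Use the supplied value, or generate a short random token.
    query.append(("random", value or secrets.token_hex(4)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `add_cache_buster("https://mylink.com/todaynews", "random")` gives the link shown above.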

You can also check the API is responding with, e.g.:

curl -X GET "https://api.embed.rocks/api/?url=https%3A%2F%2Fnews.bbc.co.uk&skip=article,description,oembed,imextra&include=source" -H "x-api-key: %%%your-api-key%%%"

You need to URL-encode the site you are calling (the url parameter value), for example using some site like this (not vouched for!)
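If you would rather not paste links into a third-party site, the encoding can be done locally. Here is a minimal Python sketch using only the standard library; `encode_target` is a hypothetical helper name.

```python
from urllib.parse import quote


def encode_target(url: str) -> str:
    """Percent-encode a URL for use as the value of the url parameter."""
    # strip() drops stray whitespace first: a trailing newline would
    # otherwise be encoded as %0A and sent as part of the target address.
    return quote(url.strip(), safe="")
```

Passing `safe=""` makes `quote` encode `/` as well, which it leaves untouched by default.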

Known Limitations

  • It has only been tested with one provider so far: https://embed.rocks (with whom I have no affiliation). I’m happy to consider supporting more services if the work is sponsored.

  • The monkey patching is done at method level. This overrides more code than it needs to which leads to a greater risk of the plugin breaking after a core update. However I don’t think there’s a way to minimise this further?

How to install plugins

See the guide here: Install Plugins in Discourse

This repo is: https://github.com/merefield/discourse-onebox-assistant


All feedback welcome. Please :star: it on GitHub if you find it useful.

Sorry I haven’t had a chance to rebuild our site with this addon enabled, I’ll do so tonight.

@WaitroseCarpark I didn’t fill out anything like that. The steps I followed are:

Step 1:

Step 2:

Step 3:

Step 4:
do the robot check

Step 5:
add oEmbed api


sign your life away
image
green tick for oEmbed
image

Step 6:
get creds from Settings > Basic

Step 7:
get an app token

with the creds from above, run

curl -X GET "https://graph.facebook.com/oauth/access_token?client_id={your-app-id}&client_secret={your-app-secret}&grant_type=client_credentials"

which returns

{"access_token":"378384926723309|xxxxxx","token_type":"bearer"}

test your auth token

curl -X GET "https://graph.facebook.com/v9.0/instagram_oembed?url=https://www.instagram.com/p/fA9uwTtkSN/&access_token=xxxx..."

which returns

{"version":"1.0","author_name":"diegoquinteiro","provider_name":"Instagram","provider_url":"https:\/\/www.instagram.com\/","type":"rich","width":658,"html":"\u003Cblockquote class=\"instagram-media\" data-instgrm-captioned data-instgrm-permalink=\"https:\/\/www.instagram.com\/p\/fA9uwTtkSN\/?utm_source=ig_embed&utm_campaign=loading\" ....
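For anyone scripting Step 7 rather than using curl, the two request URLs and the token parsing could be sketched in Python as below. The endpoints and parameters are the ones from the curl commands above; the function names are made up, and nothing here actually sends a request.

```python
import json
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def token_url(app_id: str, app_secret: str) -> str:
    """URL that returns an app access token for the given creds."""
    query = urlencode({
        "client_id": app_id,
        "client_secret": app_secret,
        "grant_type": "client_credentials",
    })
    return f"{GRAPH_BASE}/oauth/access_token?{query}"


def read_access_token(response_body: str) -> str:
    """Pull access_token out of the JSON returned by token_url()."""
    return json.loads(response_body)["access_token"]


def oembed_url(post_url: str, access_token: str) -> str:
    """URL for testing the token against the Instagram oEmbed endpoint."""
    query = urlencode({"url": post_url, "access_token": access_token})
    return f"{GRAPH_BASE}/v9.0/instagram_oembed?{query}"
```

Unlike the raw curl line, `urlencode` percent-encodes the post URL and the `|` in the token, which is the safer form for a query string.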

Add to Discourse and you’re done!

Wow…

Ok, that worked…

Thanks man :bowing_man:

With the plugin installed? @Richie

@znedw Thank you so much for taking the time to explain step by step (with pictures! :heart_eyes:) how to do this! Works a charm as far as I can see!

with the plugin installed?

I only actually noticed you’ve made this plugin just now…I was not using it, sorry for hijacking your thread.

BUT, I will try it out later today if I can find some spare time…

I spent half an hour or so this evening trying all manner of possible permutations, Robert @merefield

Following the steps provided by @znedw (thanks again my man!) it would appear that this method works both with the Onebox Assistant enabled, and with it disabled too (via admin panel, plugins, checkbox).

I’ve not tried rebuilding my Discourse with the plugin completely removed.

Perhaps worth a note, I do not have the option enabled for “Always use the proxy crawl regardless of direct response”.

Off topic, it would appear amazon.co.uk are blocking all manner of requests again, both directly (plugin disabled) and also via embed.rocks “try it” page, which also times out :roll_eyes:

Thanks. Yes I expected it to work at least without the “always use the proxy”

On the subject of Amazon I always use their affiliate links anyway, manually, which are not blocked.

Oneboxing breaches their T&Cs if you are an affiliate.

Highly recommend you consider the affiliate membership. It doesn’t net a fortune but may pay for the server and mail fees.

Is it possible to use this plugin only for YouTube links? I can’t work out which settings I should use for this.
And also where can I find more info about possible settings?
Thank you!

No. You’d have to fork and develop the code further.

Just copy those in the OP exactly. You will have your own key.

How can I check if the plugin is working and the queries are going through https://embed.rocks?
I activated it, but the error hasn’t gone.

I test with these videos:

https://www.youtube.com/watch?v=_2wfBNUnOVY
https://www.youtube.com/watch?v=qLNhVC296YI

The dashboard on embed.rocks should start reflecting your calls (a local counter would be a nice-to-have, but I have not yet implemented one).

Both of those links work for me via the plugin.

If it’s not working, try checking this option:

image

Are all your settings the same as those in the picture you gave? Maybe I made a mistake when typing them. Is there any way to copy them?

I’ve added plaintext to the OP for you.

Thank you. I checked and everything was correct. I can’t see any calls on the Dashboard page in the Usage section. Does that mean something is wrong? Should the number of calls always be visible?

You’d need to ask embed.rocks that question. At some point I may add additional logging to the plugin which might help. You might be able to set your site to log level ‘info’ to get more information however, as there are existing calls to Rails.logger.info

Yeah, that would be great. I have activated the plugin and configured the settings, but YouTube is not working and there is no activity on the Dashboard. I sent an email to embed.rocks support.

Will this plugin work correctly if I store files at Amazon S3?
I’m still struggling with it and can’t make it work. Videos are not uploaded - only links. And there is no information on the Usage panel of embed.rocks.

I have a testing server. Files are stored locally on it. But the plugin also doesn’t work and there is no activity on the Usage dashboard. But videos are saved on the site.

So I can’t understand how and what to check to solve the problem.

Please, help with your ideas.
