"Onebox Assistant", crawl for those previews reliably!

merefield · January 24, 2019, 12:16pm

What it does

Turns this kind of result:

(where your server has failed to bring back the page source so cannot extract the required tags to build the onebox)

Into this!:

It simply provides an alternative path for onebox to get its page source with which to look for meta-data when the target server refuses your connection.

It changes nothing about how onebox then processes the page source to find the meta-data and render the box.

It’s meant to allow you to enter the details and credentials of a third party API to bring back the page instead of doing a normal http call directly to the target page.

Why

I found my servers were being forbidden access to a number of commercial sites so oneboxes would fail to be rendered. It essentiallly helps leverage the trustworthiness of the 3rd party API, a bit like a mail service.

Why it’s cost effective

You can use a relatively cheap VPS but still get reliable one-boxing functionality, even if your IP or user agent is somehow ‘blacklisted’.

You don’t need it if

You are oneboxing all your target content ok with the vanilla install and all users are happy

Pre-requisites

You need an account with a suitable 3rd part API.

Settings

onebox assistant api base address:  https://api.embed.rocks/api/

Above example uses embed.rocks, but in the future support for other API’s might be added, however, embed.rocks is relatively good value atm.

onebox assistant api base query:   ?url=

onebox assistant api options:   &skip=article,description,oembed,imextra&include=source

onebox assistant api page source field:   source

You will also need to enter your API key provided by embed.rocks

See example below

This setting allows you to ignore the prefetch (to check if the direct crawl returns a result) and use the API from the get-go.

default OFF

I recommend setting this to TRUE.

This is more expensive of course but often yields better results as there are some cases where the pre-fetch gets redirected to the wrong page because you are not trusted.

Support Information

Remember, if you’ve previously attempted to onebox a link, Discourse core will cache the result.

You can add a random querystring on the end to overcome the cache: https://mylink.com/todaynews?random=random

You can also check the API is responding with, e.g.:

curl -X GET "https://api.embed.rocks/api/?url=https%3A%2F%2Fnews.bbc.co.uk%0A&skip=article,description,oembed,imextra&include=source" -H "x-api-key: %%%your-api-key%%%"

You need to url encode the site you are calling (the url parameter value) using some site like this (not vouched for!)

Known Limitations

It’s only been tested with one provider at the moment and not tested on others. That provider is https://embed.rocks (with whom I have no affiliation). I’m happy to consider supporting more services if the work is sponsored.
The monkey patching is done at method level. This overrides more code than it needs to which leads to a greater risk of the plugin breaking after a core update. However I don’t think there’s a way to minimise this further?

How to install plugins

See the guide here: Install plugins on a self-hosted site

This repo is: https://github.com/merefield/discourse-onebox-assistant

All feedback welcome. Please it on GitHub if you find it useful.

znedw · November 25, 2020, 9:52pm

Sorry I haven’t had a chance to rebuild our site with this addon enabled, I’ll do so tonight.

@WaitroseCarpark I didn’t fill out anything like that, steps I followed are:

Step 1:

Step 2:

Step 3:

Step 4:
do the robot check

Step 5:
add oEmbed api

sign your life away

green tick for oEmbed

Step 6:
get creds from Settings > Basic

Step 7:
get an app token

with the creds from above, run

curl -X GET "https://graph.facebook.com/oauth/access_token?client_id={your-app-id}&client_secret={your-app-secret}&grant_type=client_credentials"

which returns

{"access_token":"378384926723309|xxxxxx","token_type":"bearer"}

test your auth token

curl -X GET \ "https://graph.facebook.com/v9.0/instagram_oembed?url=https://www.instagram.com/p/fA9uwTtkSN/&access_token=xxxx..."

which returns

{"version":"1.0","author_name":"diegoquinteiro","provider_name":"Instagram","provider_url":"https:\/\/www.instagram.com\/","type":"rich","width":658,"html":"\u003Cblockquote class=\"instagram-media\" data-instgrm-captioned data-instgrm-permalink=\"https:\/\/www.instagram.com\/p\/fA9uwTtkSN\/?utm_source=ig_embed&amp;utm_campaign=loading\" ....

Add to discourse and you’re done!

Richie · November 25, 2020, 10:42pm

Wow…

Ok, that worked…

Thanks man

merefield · November 25, 2020, 11:24pm

With the plugin installed? @Richie

WaitroseCarpark · November 26, 2020, 9:50am

@znedw Thank you so much for taking the time to explain step by step (with pictures! ) how to do this! Works a charm as far as I can see!

merefield · November 26, 2020, 9:51am

with the plugin installed?

WaitroseCarpark · November 26, 2020, 10:36am

I only actually noticed you’ve made this plugin just now…I was not using it, sorry for hijacking your thread.

BUT, I will try it out later today if I can find some spare time…

Richie · November 26, 2020, 9:55pm

I spent half an hour or so this evening trying all manner of possible permutations, Robert @merefield

Following the steps provided by @znedw (thanks again my man!) it would appear that this method works both with the Onebox Assistant enabled, and with it disabled too (via admin panel, plugins, checkbox).

I’ve not tried rebuilding my Discourse with the plugin completely removed.

Perhaps worth a note, I do not have the option enabled for “Always use the proxy crawl regardless of direct response”.

Off topic, it would appear amazon.co.uk are blocking all manner of requests again, both directly (plugin disabled) and also via embed.rocks “try it” page, which also times out

merefield · November 26, 2020, 10:06pm

Thanks. Yes I expected it to work at least without the “always use the proxy”

merefield · November 29, 2020, 10:53am

On the subject of Amazon I always use their affiliate links anyway, manually, which are not blocked.

Oneboxing breaches their T&Cs if you are an affiliate.

Highly recommend you consider the affiliate membership. It doesn’t net a fortune but may pay for the server and mail fees.

Ed_Bobkov · January 15, 2021, 4:18pm

Is it possible to use this plugin only for YouTube links? Can’t understand what settings should I use for this.
And also where can I find more info about possible settings?
Thank you!

merefield · January 15, 2021, 4:58pm

No. You’d have to fork and develop the code further.

Just copy those in the OP exactly. You will have your own key.

Ed_Bobkov · January 15, 2021, 5:21pm

How can I check if the plugin working and the queries are going through https://embed.rocks?
I activated it, but the error hasn’t gone.

I test with these videos:

https://www.youtube.com/watch?v=_2wfBNUnOVY
https://www.youtube.com/watch?v=qLNhVC296YI

merefield · January 15, 2021, 9:31pm

The dashboard on embed.rocks should start reflecting your calls (a local counter is a nice-to-have but have not yet implemented one)

Both of those links work for me via the plugin.

If it’s not working, try checking this option:

Ed_Bobkov · January 16, 2021, 7:58am

Are all settings you have equal to those in the picture you gave? Maybe I have a mistake when typing them. Any option to copy them?

merefield · January 16, 2021, 12:43pm

I’ve added plaintext to the OP for you.

Ed_Bobkov · January 16, 2021, 12:52pm

Thank you. I checked and everything was correct. I can’t see any call on the Dashboard page in the Usage section. Does it mean smth is wrong? Should the number of calls always be visible?

merefield · January 16, 2021, 1:11pm

You’d need to ask embed.rocks that question. At some point I may add additional logging to the plugin which might help. You might be able to set your site to log level ‘info’ to get more information however, as there are existing calls to Rails.logger.info

Ed_Bobkov · January 16, 2021, 2:02pm

Yeah, that would be great. Because now I activated the plugin and configured the settings, but youtube is not working and there is no activity on the Dashboard. I sent an email to the embed.rocks support.

Ed_Bobkov · January 27, 2021, 5:29pm

Will this plugin work correctly if I store files at Amazon S3?
I still struggle with it and can’t make it working. Videos are not uploaded - only links. And there is no information on the Usage Panel of the Embed.rocks

I have a testing server. Files are stored locally on it. But the plugin also doesn’t work and there is no activity on the Usage dashboard. But videos are saved on the site.

So I can’t understand how and what to check to solve the problem.

Please, help with your ideas.

Topic		Replies	Views
Youtube videos onebox embedding stopped working Support	49	6252	April 5, 2021
Onebox issue with a specific site Support onebox	14	1595	March 2, 2019
Vimeo Embed not working on my site due to Vimeo server IP blacklisting Support	51	12354	August 31, 2023
Youtube embeds missing Support	51	3926	June 8, 2024
Onebox requests being incorrectly redirected due to user-agent Support	20	1690	November 28, 2022