"Onebox Assistant": get those previews, reliably!

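What it does

Turns results like this:

(when your server cannot fetch the page source and therefore cannot extract the tags needed to build the onebox)

into this:

It only gives the onebox feature an alternative path for fetching the page source and finding the metadata when the target server refuses the connection.

It does not change how onebox processes the page source to find the metadata and render the onebox.

Its purpose is to let you enter the details and credentials of a third-party API that fetches the page content, instead of making a normal HTTP call directly to the target page.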

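Why

I found that my server had been banned from several commercial sites, so oneboxes for them failed to render. The plugin essentially leverages the trusted standing of a third-party API, a bit like using a mail service.

Why it is cost effective

You can use a relatively cheap VPS and still get reliable oneboxing, even if your IP or user agent has been "blacklisted".

When you don't need it

If a default install already oneboxes all of your target content and all of your users are happy.

Prerequisites

You need an account with a suitable third-party API.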

Setup

onebox assistant api base address:  https://api.embed.rocks/api/

The example above uses embed.rocks, but support for other APIs may be added in future. At present, embed.rocks is good value for money.

onebox assistant api base query:   ?url=

onebox assistant api options:   &skip=article,description,oembed,imextra&include=source

onebox assistant api page source field:   source

You also need to enter the API key provided by embed.rocks.
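As a rough sketch of how these settings appear to fit together, the shell snippet below joins them into one request URL. It assumes plain concatenation in the order base address, base query, encoded target URL, options, which is what the curl check in the support section suggests. `build_request_url` is a hypothetical helper, not part of the plugin.

```shell
# The three URL-related settings, copied from the values above.
api_base_address='https://api.embed.rocks/api/'
api_base_query='?url='
api_options='&skip=article,description,oembed,imextra&include=source'

# build_request_url (hypothetical helper): $1 must already be URL-encoded.
# Assumption: the settings are simply concatenated in this order.
build_request_url() {
  printf '%s%s%s%s\n' "$api_base_address" "$api_base_query" "$1" "$api_options"
}

build_request_url 'https%3A%2F%2Fnews.bbc.co.uk'
```

The API key is not part of the URL; in the curl check it is sent in the `x-api-key` header.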

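See the example below:

This setting lets you skip the pre-fetch (a check of whether a direct crawl returns a result) and go straight to the API.

(screenshot of the setting)
The default is OFF.

I recommend setting it to TRUE.

Of course this costs more, but it usually gives better results, because in some cases the untrusted pre-fetch gets redirected to the wrong page.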

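Support

Remember that if you have previously tried to onebox a link, Discourse core will have cached the result.

You can append a random query string to the link to bypass the cache: https://mylink.com/todaynews?random=random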
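The cache-busting trick can be scripted when you are retesting many links. This is only a sketch: `add_cache_buster` is a hypothetical helper, the parameter name `random` is taken from the example above, and links containing a `#fragment` are not handled.

```shell
# add_cache_buster (hypothetical helper): append a throwaway query parameter
# so the link differs from the one whose result was cached.
add_cache_buster() {
  url=$1
  token=${2:-$(date +%s)}   # default value: the current Unix time
  case $url in
    *\?*) printf '%s&random=%s\n' "$url" "$token" ;;   # link already has a query string
    *)    printf '%s?random=%s\n' "$url" "$token" ;;
  esac
}

add_cache_buster 'https://mylink.com/todaynews' 'random'
# prints https://mylink.com/todaynews?random=random
```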

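You can also check that the API is responding, for example:

curl -X GET "https://api.embed.rocks/api/?url=https%3A%2F%2Fnews.bbc.co.uk&skip=article,description,oembed,imextra&include=source" -H "x-api-key: %%%your-api-key%%%"

You will need to URL-encode the site you want to call (the value of the url parameter), for example with an online URL-encoding tool (no warranty!).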
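If you would rather not paste URLs into a third-party site, the encoding can be done locally. The following is a minimal POSIX shell sketch. `urlencode` is a hypothetical helper that percent-encodes every byte outside the RFC 3986 unreserved set, and it relies on the standard `od` and `tr` utilities.

```shell
# urlencode (hypothetical helper): percent-encode every byte of $1 except
# the RFC 3986 unreserved characters A-Z a-z 0-9 - . _ ~
urlencode() {
  # od prints each byte as two hex digits; tr puts one byte on each line.
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' \t\n' '\n\n\n' |
  while read -r hex; do
    [ -n "$hex" ] || continue
    # Rebuild the character from its hex value via an octal %b escape.
    char=$(printf '%b' "\\0$(printf '%o' "0x$hex")")
    case $char in
      [a-zA-Z0-9.~_-]) printf '%s' "$char" ;;
      *) printf '%%%s' "$(printf '%s' "$hex" | tr 'abcdef' 'ABCDEF')" ;;
    esac
  done
  printf '\n'
}

urlencode 'https://news.bbc.co.uk'
# prints https%3A%2F%2Fnews.bbc.co.uk
```

The result is the value to place after `url=` in the curl check.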

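Known limitations

  • Only one provider has been tested so far; no others have been tried. That provider is https://embed.rocks (I have no affiliation with it). If the work were sponsored, I would be happy to consider supporting more services.

  • The monkey patch is applied at method level. This overrides more code than is strictly needed, which increases the risk of the plugin breaking after a core update. However, I don't think there is a way to reduce this risk any further?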

How to install the plugin

See the guide here: Install plugins on a self-hosted site

The repository is at: https://github.com/merefield/discourse-onebox-assistant


Feedback is welcome. If you find it useful, please give it a star on GitHub.

41 Likes

Sorry I haven’t had a chance to rebuild our site with this addon enabled, I’ll do so tonight.

@WaitroseCarpark I didn’t fill out anything like that, steps I followed are:

Step 1:

Step 2:

Step 3:

Step 4:
do the robot check

Step 5:
add oEmbed api


sign your life away
(screenshot)
green tick for oEmbed
(screenshot)

Step 6:
get creds from Settings > Basic

Step 7:
get an app token

with the creds from above, run

curl -X GET "https://graph.facebook.com/oauth/access_token?client_id={your-app-id}&client_secret={your-app-secret}&grant_type=client_credentials"

which returns

{"access_token":"378384926723309|xxxxxx","token_type":"bearer"}
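To reuse the token in later commands without copying it by hand, it can be pulled out of that response. The sketch below assumes the response arrives as a single line of JSON shaped like the sample above. `extract_access_token` is a hypothetical helper, and a real JSON tool such as jq would be more robust if you have one installed.

```shell
# extract_access_token (hypothetical helper): read one-line JSON on stdin
# and print the value of its "access_token" field.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response copied from the post above (token redacted there).
response='{"access_token":"378384926723309|xxxxxx","token_type":"bearer"}'
printf '%s\n' "$response" | extract_access_token
# prints 378384926723309|xxxxxx
```

Because the token contains a `|`, it is safest to let curl encode it when you use it, for example with `curl -G` and `--data-urlencode "access_token=$token"`.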

test your auth token

curl -X GET "https://graph.facebook.com/v9.0/instagram_oembed?url=https://www.instagram.com/p/fA9uwTtkSN/&access_token=xxxx..."

which returns

{"version":"1.0","author_name":"diegoquinteiro","provider_name":"Instagram","provider_url":"https:\/\/www.instagram.com\/","type":"rich","width":658,"html":"\u003Cblockquote class=\"instagram-media\" data-instgrm-captioned data-instgrm-permalink=\"https:\/\/www.instagram.com\/p\/fA9uwTtkSN\/?utm_source=ig_embed&utm_campaign=loading\" ....

Add to discourse and you’re done!

6 Likes

Wow…

Ok, that worked…

Thanks man :bowing_man:

4 Likes

With the plugin installed? @Richie

1 Like

@znedw Thank you so much for taking the time to explain step by step (with pictures! :heart_eyes:) how to do this! Works a charm as far as I can see!

3 Likes

with the plugin installed?

1 Like

I only actually noticed you’ve made this plugin just now…I was not using it, sorry for hijacking your thread.

BUT, I will try it out later today if I can find some spare time…

2 Likes

I spent half an hour or so this evening trying all manner of possible permutations, Robert @merefield

Following the steps provided by @znedw (thanks again my man!) it would appear that this method works both with the Onebox Assistant enabled, and with it disabled too (via admin panel, plugins, checkbox).

I’ve not tried rebuilding my Discourse with the plugin completely removed.

Perhaps worth a note, I do not have the option enabled for “Always use the proxy crawl regardless of direct response”.

Off topic, it would appear amazon.co.uk are blocking all manner of requests again, both directly (plugin disabled) and also via embed.rocks “try it” page, which also times out :roll_eyes:

3 Likes

Thanks. Yes I expected it to work at least without the “always use the proxy”

2 Likes

On the subject of Amazon I always use their affiliate links anyway, manually, which are not blocked.

Oneboxing breaches their T&Cs if you are an affiliate.

Highly recommend you consider the affiliate membership. It doesn’t net a fortune but may pay for the server and mail fees.

1 Like

Is it possible to use this plugin only for YouTube links? I can’t work out which settings I should use for this.
Also, where can I find more information about the available settings?
Thank you!

No. You’d have to fork and develop the code further.

Just copy those in the OP exactly. You will have your own key.

2 Likes

How can I check whether the plugin is working and the queries are going through https://embed.rocks?
I activated it, but the error hasn’t gone away.

I test with these videos:

https://www.youtube.com/watch?v=_2wfBNUnOVY
https://www.youtube.com/watch?v=qLNhVC296YI
1 Like

The dashboard on embed.rocks should start reflecting your calls (a local counter would be a nice-to-have, but I have not yet implemented one).

Both of those links work for me via the plugin.

If it’s not working, try checking this option:

(screenshot of the option)
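One way to check this independently of the plugin is to call the API by hand, as in the curl check in the first post, and look for the page source in the reply. The sketch below assumes the reply is JSON with a string field named `source`, as the `onebox assistant api page source field` setting suggests. `has_page_source` is a hypothetical helper, and the sample replies are made up for illustration rather than real embed.rocks output.

```shell
# has_page_source (hypothetical helper): succeed when the JSON text on stdin
# contains a non-empty "source" string field.
has_page_source() {
  grep -q '"source"[[:space:]]*:[[:space:]]*"[^"]'
}

# Made-up sample replies, for illustration only.
for reply in '{"url":"https://example.com","source":"<html>...</html>"}' \
             '{"url":"https://example.com","source":""}'; do
  if printf '%s\n' "$reply" | has_page_source; then
    echo 'page source returned'
  else
    echo 'no page source in reply'
  fi
done
```

In practice you would pipe the output of the curl check into `has_page_source` instead of the sample strings.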

2 Likes

Are all your settings the same as those in the picture you posted? Maybe I made a mistake when typing them. Is there any way to copy them?

1 Like

I’ve added plaintext to the OP for you.

3 Likes

Thank you. I checked and everything was correct. I can’t see any calls on the Dashboard page in the Usage section. Does that mean something is wrong? Should the number of calls always be visible?

1 Like

You’d need to ask embed.rocks that question. At some point I may add additional logging to the plugin, which might help. However, you might be able to set your site’s log level to ‘info’ to get more information, as there are existing calls to Rails.logger.info.

1 Like

Yeah, that would be great. Because now I activated the plugin and configured the settings, but youtube is not working and there is no activity on the Dashboard. I sent an email to the embed.rocks support.

1 Like

Will this plugin work correctly if I store files at Amazon S3?
I am still struggling with it and can’t get it working. Videos are not uploaded - only links. And there is no information on the embed.rocks Usage panel.

I have a testing server. Files are stored locally on it. But the plugin also doesn’t work and there is no activity on the Usage dashboard. But videos are saved on the site.

So I can’t understand how and what to check to solve the problem.

Please, help with your ideas.

2 Likes