Why are oneboxes cached into posts?

One thing that bothers me a bit about Onebox is that its content is cached in the post and does not change when the source fields change, until the post is edited.

For example, we create a Onebox from an account, and a week later the user changes their biography …

Re-downloading the Onebox every time the page was viewed would slow things down quite a bit. It would be silly to re-download a onebox that never changed. There’s no way to know that it’s changed, so forcing a post-rebuild (with the wrench or by editing) is the way to go. If you knew that some category was likely to have oneboxes you want to update regularly, you could create a plugin that would, say, rebake all the topics in the category on a daily basis.

In general, Oneboxes are cached because you want to rate-limit your own scraping of external sites so that you don't get banned! (But sure, caching has a performance benefit too!) To get around the cache on the odd occasion you need to, just add a fake query string like:

?myname=isbill

or something at the end of the link.

That will cause a refresh.
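
If you do this often, a small helper can build the cache-busting URL for you. This is only a sketch using Python's standard library; the function name `with_cache_buster` and the parameter name `n` are made up for illustration, and it assumes the target site ignores unknown query parameters:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_cache_buster(url: str, key: str = "n", value: int = 1) -> str:
    """Return url with key=value set in the query string, keeping other parameters."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier copy of the cache-busting key.
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != key
    ]
    params.append((key, str(value)))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_cache_buster("https://example.com/page"))
# https://example.com/page?n=1
print(with_cache_buster("https://example.com/page?id=7&n=1", value=2))
# https://example.com/page?id=7&n=2
```

Each new value gives a URL the cache has not seen, so the page is fetched again; the plain URL keeps whatever was cached for it.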

Onebox caching is absolutely necessary for external content, and there is no expectation that it will be refreshed. I meant internal Oneboxes. As in the example above, a Onebox created from a user account is expected to update everywhere on the site when that person's biography changes.

Oneboxes should be cached for internal content too, which is necessary to reduce load, but it might be worth indexing internal Oneboxes somewhere so that, if the source record changes, an update task is scheduled for them.
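
To make the idea concrete, here is a rough sketch of such an index. It is not how Discourse works today; `OneboxIndex`, `record`, `source_changed`, and `pending_rebakes` are all hypothetical names, and a real version would live in the database and a background job rather than in memory:

```python
from collections import defaultdict


class OneboxIndex:
    """Hypothetical reverse index: source URL -> posts that embed its Onebox."""

    def __init__(self):
        self._posts_by_source = defaultdict(set)
        self.pending_rebakes = set()  # post ids waiting to be re-rendered

    def record(self, post_id, source_url):
        # Called when a post is cooked and found to contain an internal Onebox.
        self._posts_by_source[source_url].add(post_id)

    def source_changed(self, source_url):
        # Called when the source record (user bio, topic, ...) is edited.
        affected = self._posts_by_source.get(source_url, set())
        self.pending_rebakes |= affected
        return sorted(affected)


index = OneboxIndex()
index.record(10, "/u/alice")
index.record(11, "/u/alice")
index.record(12, "/t/42")
print(index.source_changed("/u/alice"))  # [10, 11]
```

Only the posts that actually embed the changed record get queued, so the cache still absorbs normal page views.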

As things stand, if Discourse changes the formatting of Oneboxes (for internal cases), old posts will still display the old format. This problem would be solved if we cached the content of the Onebox as JSON in the post and let the client format it.
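
A minimal sketch of that split between cached data and presentation, written in Python only for illustration. The field names and the `cache_onebox` and `render_onebox` helpers are assumptions, not anything Discourse provides:

```python
import json
from html import escape


def cache_onebox(title, description, url):
    # What would be stored with the post: data only, no markup.
    return json.dumps({"title": title, "description": description, "url": url})


def render_onebox(cached):
    # Formatting happens at display time, so a template change
    # applies to old posts without rebaking them.
    data = json.loads(cached)
    return '<aside class="onebox"><a href="{url}">{title}</a><p>{description}</p></aside>'.format(
        url=escape(data["url"], quote=True),
        title=escape(data["title"]),
        description=escape(data["description"]),
    )


cached = cache_onebox("alice", "Likes <b>bold</b> claims", "https://example.com/u/alice")
print(render_onebox(cached))
```

Note that this only addresses formatting changes; the cached data itself would still go stale when the source record changes.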

I’m building Onebox support into my site and editing a post is not refreshing the onebox cache. I also don’t see an option to “post-rebuild” under my wrench menu.

However, adding a fake querystring does work, although if I go back to the initial URL the cached version comes back up.

Is there a setting or trick I’m missing?

My version is: Discourse 2.5.5

Be careful: using a CDN as a proxy in front of the site with the proxy's cache layer enabled can sometimes cause problems. If you are using a CDN as a proxy (like Cloudflare), turn it off temporarily and check the problem again.

I’m not familiar with CDNs and am not using Cloudflare. I am only adding og: metadata to my site so as to implement OpenGraph and thus Onebox.

As a result I’m making a lot of changes in my own site and want to see how they appear when referenced from Discourse.
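
While iterating, it can help to check locally which og: tags a page actually exposes before pasting the link anywhere. Below is a minimal sketch using only Python's standard library; `OpenGraphParser`, `extract_open_graph`, and the sample HTML are made up for illustration, and it says nothing about how Discourse itself parses or renders the page:

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first value if a property is repeated.
            self.tags.setdefault(prop, attr["content"])


def extract_open_graph(html_text):
    parser = OpenGraphParser()
    parser.feed(html_text)
    return parser.tags


sample = (
    '<html><head>'
    '<meta property="og:title" content="My page">'
    '<meta property="og:description" content="Short summary" />'
    '<meta name="viewport" content="width=device-width">'
    '</head></html>'
)
print(extract_open_graph(sample))
# {'og:title': 'My page', 'og:description': 'Short summary'}
```

Running this against the saved HTML of a page shows whether a bad-looking onebox comes from missing tags or from a stale cache.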

I notice that editing a post does not cause the onebox to refresh the link. Adding a ?x=1 to the URL does.

Should the onebox re-scan the target URL each time the post is edited?

I’m not seeing much uptake on this… it seems like a bug to me. Unless there is a checkbox control to turn Onebox refresh on and off?

As an example of an alternative approach, this page lets you test your meta data to see what you will get, but it is also a front end for the Facebook cache.

Funny that their OpenGraph tag appears to be broken! :slight_smile:

If you try a URL in this tool, it fetches the cached results and then gives you a button you can use to re-scrape the site and update their cache.

That might be a nice option for this tool… perhaps a button that comes up when you hover over a URL which is in the cache?

As it is, my first tests while implementing this feature will always be what shows for certain links.

We are also looking for a way to refresh oneboxes, in our case oneboxes of other topics in our forum.

Our use case is to build documentation across the forum, which may remix other wiki posts. But, since these are wiki posts, it’s obviously not ideal that the onebox won’t stay up to date with the many edits that are likely to be made over time.

It would be great if we could have a setting to automatically refresh oneboxes if they match the domain of our forum and/or belong to the category that this use case will occur in (wiki/docs).

As I build out OpenGraph support on my main site (linked to Discourse through SSO), I’m running into this issue more and more.

One trick that really helps is that I no longer drop in the actual URL right away, if ever. Instead I use the trick suggested by @merefield above:

I add ?n=1 to the end of each URL the first time I reference it. If the onebox looks bad, I can go update the page, then increment n and iterate until it looks good.

Once I’m done, anyone who drops in just the URL will get the final scrape of the page rather than the first one.

No matter what I try, I cannot get a GitHub repo onebox to refresh. I changed the README.md on the repo to be more up to date, added a fake query string, and also tried a rebuild, but it stubbornly stays the same.

Any other suggestions? (This is a hosted instance so I have limited/no access to the backend)

Edit: I think I found the problem - my fault - as I didn’t realise the repo description is not in sync with my README.md.