Incorrect or failing oneboxes for links to other discourse instances

Something seems to have been broken in a recent upgrade of the oneboxing engine. For example, this URL

https://meta.discourse.org/t/new-users-arent-starting-the-discobot-tutorial/62995/9?u=tophee

used to be rendered like this on a discourse instance that is not meta:

Now it is rendered like this:

Never mind the different colour scheme. It’s the text that is different. In the older version, the text of the post being referred to is shown, now it’s the text of the OP (although I also wonder why the user avatar is missing in both cases, but that would be an older bug).

To be more precise: the old image is from my forum which is currently still on v1.9.0.beta4 +61, the new one is from feverbee (as of today) but I have also tried it on another freshly installed instance.

Next one is https://meta.discourse.org/t/change-to-dd-mmm-mmm-yy-for-post-dates/13625/35?u=tophee which used to render like this:

but now comes out like this

Like in the first example, the text is now the OP’s text but - strangely, now also the image is a different one (first image in post instead of user avatar).

Finally, this URL https://meta.discourse.org/t/discourse-narrative-bot-beta-feedback/58621/254?u=tophee used to render like this:

but now it fails entirely to be oneboxed:

image

hmm…

https://try.discourse.org/t/how-can-i-make-a-post-wiki/325

and How can I make a post Wiki? - discourse - Demo

and https://try.discourse.org/t/how-can-i-make-a-post-wiki/325

Seem to work here…

https://try.discourse.org/t/a-bear-however-hard-he-tries-grows-tubby-without-exercise/48/3?u=samsaffron

:confused: can you repro whatever issue you have with try.

Yes. Here you go: https://try.discourse.org/t/testing-oneboxing-bug/791

And for future generations:

But just to be clear: what exactly is the expected behaviour when it comes to the images in the onebox? When should I expect the user avatar and when the first image in the post? (Personally, I think I’d always like to see the user avatar…)

3 Likes

Oh wait – you’re right, this is wrong. @techapj this is quite bad, if it is off by one, or affected by deletions. Please make this a priority.

255 is a deleted post so it may be somehow related @techAPJ can you have look to see if there is an off by one error or something

The only weird I see is that 254 is not one boxing @codinghorror

1 Like
https://meta.discourse.org/t/discourse-narrative-bot-beta-feedback/58621/254?u=tophee

has a canonical link:

https://meta.discourse.org/t/discourse-narrative-bot-beta-feedback/58621?page=13

which renders whitepage… looking more into it.

That does not look correct, maybe for crawler only. In which case you need to ignore that for Discourse instances.

1 Like

How about the text in the oneboxes not being from the referenced posts?

And how about the inconsistent rendering of images?

This is because the open graph data is now coming from canonical links (read more about this change here). For example:

https://meta.discourse.org/t/change-to-dd-mmm-mmm-yy-for-post-dates/13625/35?u=tophee

has a canonical link

https://meta.discourse.org/t/change-to-dd-mmm-mmm-yy-for-post-dates/13625?page=2

Now when you will see og:{xyz} data for the canonical link above you will understand why the text and image is different from that of linked post.

That said, I agree that the text and image shown in onebox should be of the linked post (instead of the page specific post as seen in above example). I am working on a fix for it right now.

2 Likes

I didn’t even know that discourse still does pagination in the background, but as I think about it, I see that, unless you want to load each post individually, pagination is needed.

The pagination shown here is only for non-Javascript user-agents (like web crawlers). The javascript interface works differently, which is how/why posts can update live on screen.

But the oneboxer is a non-Javascript user-agent.

2 Likes

Fixed via:

https://github.com/discourse/onebox/commit/70aa8d5961b6e742c424d25fed6d4615a0d7e42b

Okay, let me try to explain.

Topic body (first post)

If the first post contains an image, onebox will show that image.

If the first post does not have any image it will fall back to default_opengraph_image_url setting (more detail here).

Topic reply (random post in a topic)

If the post contains image, onebox will show that image.

If the post does not contain any image, onebox will show user (poster) avatar.

I hope this clarifies the onebox behaviour.

Also note that this behaviour will be consistent across all the sites that uses OpenGraph tags to fetch data.


The above oneboxes also verifies that fix is working… :blush:

3 Likes

Thanks for the explanation (and the quick fix, of course!). That’s almost worth a faq-material tag, but we don’t want to overdo things…

Perhaps it would be a good idea to have onebox ignore small images like this one from becoming the featured image?

https://d11a6trkgmumsb.cloudfront.net/images/font-awesome-ellipsis.png

Hm, as I paste this, I realize that the actual image file is not as tiny as it seems when it’s shown like this: , but nonetheless, do you see what I mean?

Yes, I see what you mean, but fixing this will not be a trivial change.

3 Likes

WhitelistedGenericOnebox.probable_discourse(URI(url))

I am not a huge fan of this, what I would recommend is adding some sort of custom open graph tag that tells us not to follow canonical. That way any site owner can opt for this behavior not only Discourse instances. Hitting the URL with a regex is fragile and will eventually break, I can guarantee that 100%

https://github.com/discourse/onebox/blob/a45bfa115c9656d63240492f44b13a8f75a51785/lib/onebox/engine/whitelisted_generic_onebox.rb#L171-L173

4 Likes

Optimized via:

https://github.com/discourse/discourse/commit/b3540992523a0356b6e110c6760ef2e05707957e

and

https://github.com/discourse/onebox/commit/e0103d3b6b31e850f3ae65b38bb7d1736b373c91

6 Likes

Okay, changes are deployed.

Demo:

4 Likes