Topic embedding needs some love

I was reminded of this today after clicking the “Show Full Post” button for Introducing Discourse AI. The full post that is displayed on Discourse is missing all images and many headings. Adding to the confusion, image captions are displayed, but without their associated images.

It might be possible to fix the issue on Meta for its (Ghost?) blog by adjusting Meta’s allowed embed selectors site setting: Configure the Allowed Embed Selectors Setting. From past experience, I know that getting this setting right can be a tricky process. If you try adjusting it, pay close attention to the results.

Discourse has a lot of potential to function as a comment system for external posts, but to do a good job of this, clicking the “Show Full Post” button needs to reliably pull in all elements of the external post. I think the issue is that the Ruby Readability gem that’s used for parsing external posts isn’t intended for the job that Discourse is using it for. It’s also not being actively maintained: GitHub - cantino/ruby-readability: Port of arc90's readability project to Ruby.


Yes, at this point we either move to something else that does a slightly better job, or change the embedding strategy so that Show Full Post becomes a Read Full Post button that simply links to the original post. It may be pointless to fight all the possible embed problems on every website after all.


@sam just fixed this, take a look.


We’re getting ready to release our blog on Ghost and make use of the Ghost > Discourse integrations. Really happy to see this change!


The images are now getting pulled in. I’m not great at “spot the difference” types of puzzles, but I’m still seeing some differences:

  • Semantic Related Topics title missing
  • Community Sentiment title missing
  • missing unordered list in the Modules Providers section
  • Installing Discourse AI on your community title missing

Ideally, the “Sign up for our newsletter” prompt would be excluded from the embedded post.

Having the ability to easily quote the embedded post seems important. Thinking about that now, I’m not sure what the expected behaviour is when the “expand/collapse” and “go to post” buttons are clicked for an embedded post’s quotes.

It’s a tricky problem. It should be as simple as sanitizing the HTML that’s contained in a post’s article or main element, but I suspect there would still be issues with that approach. For example, it would require some special handling to prevent duplication of a blog post’s h1 element if the header exists inside of the article.
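To make the idea concrete, here is a rough sketch in Python (standard library only) of the extraction step. It is not how Discourse does it: the class and function names are made up for illustration, it assumes the page has a single top-level article or main element, and it only extracts markup. Real sanitizing (removing scripts, unsafe attributes, and so on) would still have to happen afterwards.

```python
import html
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Hypothetical helper: collects the markup inside the first <article> or <main>."""

    def __init__(self, title):
        super().__init__(convert_charrefs=True)
        self.title = " ".join(title.split()).lower()
        self.container = None  # "article" or "main" once the container is found
        self.depth = 0         # nesting depth of the container tag; 0 means outside
        self.out = []          # collected output fragments
        self.h1_buf = None     # buffers an <h1> until we know if it repeats the title
        self.h1_text = []

    def _emit(self, fragment):
        target = self.out if self.h1_buf is None else self.h1_buf
        target.append(fragment)

    def handle_starttag(self, tag, attrs):
        if self.container is None and tag in ("article", "main"):
            self.container, self.depth = tag, 1
            return
        if self.depth == 0:
            return
        if tag == self.container:
            self.depth += 1
        if tag == "h1" and self.h1_buf is None:
            self.h1_buf, self.h1_text = [], []
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img /> are copied through unchanged.
        if self.depth > 0:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == self.container:
            self.depth -= 1
            if self.depth == 0:
                return  # the container itself is not part of the output
        self._emit(f"</{tag}>")
        if tag == "h1" and self.h1_buf is not None:
            buffered, self.h1_buf = self.h1_buf, None
            heading = " ".join("".join(self.h1_text).split()).lower()
            if heading != self.title:
                self.out.extend(buffered)  # keep headings that differ from the title

    def handle_data(self, data):
        if self.depth > 0:
            if self.h1_buf is not None:
                self.h1_text.append(data)
            self._emit(html.escape(data, quote=False))


def extract_article(page_html, title):
    parser = ArticleExtractor(title)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.out).strip()
```

Calling extract_article(page_html, topic_title) returns the inner markup of the article with any h1 that matches the topic title dropped, while h2 headings, lists and images are passed through untouched. Comments are discarded, and an h1 that differs from the title is kept.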


I think this is all happening even in Readability.js. This is Firefox reader view:

<h2 id="installing-discourse-ai-on-your-community">
      <strong>Installing Discourse AI on your community</strong>
</h2>

Will see if there is an easy way to fix this…

Not sure about this… but if we really really want to do that we can add .discourse-newsletter-signup to blocked_embed_selectors
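For anyone wondering what blocking a selector amounts to, here is a minimal Python sketch (standard library only). It is an illustration, not the Discourse implementation: it only understands plain class selectors like .discourse-newsletter-signup, and the names BlockedClassFilter and strip_blocked are invented for the example.

```python
import html
from html.parser import HTMLParser

# Void elements never have a closing tag, so a blocked one is dropped on its own.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlockedClassFilter(HTMLParser):
    """Hypothetical helper: drops every element carrying one of the blocked classes."""

    def __init__(self, blocked_classes):
        super().__init__(convert_charrefs=True)
        self.blocked = set(blocked_classes)
        self.skip_tag = None   # tag name of the blocked element being skipped
        self.skip_depth = 0    # nesting depth of that tag name inside the skipped subtree
        self.out = []

    def _is_blocked(self, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return any(name in self.blocked for name in classes)

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if self._is_blocked(attrs):
            if tag not in VOID_TAGS:
                self.skip_tag, self.skip_depth = tag, 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.skip_tag is None and not self._is_blocked(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None  # end of the blocked subtree
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_tag is None:
            self.out.append(html.escape(data, quote=False))


def strip_blocked(fragment, blocked_classes):
    parser = BlockedClassFilter(blocked_classes)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

With strip_blocked(post_html, ["discourse-newsletter-signup"]) the whole signup block disappears, including any nested markup, and the rest of the post is left alone.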


Yeah, Readability.js is based on the same code as GitHub - cantino/ruby-readability: Port of arc90's readability project to Ruby, so probably the same logic is being used to remove those elements. Readability.js generally does a better job than Ruby Readability though.

The email CTA is confusing because the email input gets stripped from the embedded post. Technically, I’m not sure the CTA belongs inside the article.
