How to customize the text in an embedded post?

I have a site where I publish various tutorials and blogs, and I use Discourse as both a forum and as comments using the embedding feature.

This mostly works great, except that when I create a new page on the main site, all of the content is included in the Discourse post. Some of my users don’t even know about the main site, because they always read the full post on the forum! This is a problem because features like embedded code editors don’t work on Discourse, so it comes off as a buggy experience.

In a perfect world, the Discourse post would just be a short, very obvious link back to the original post on the main page. Maybe something like this:

View the original post here:

https://example.com

Replies to this thread will show as comments on the original post!

I’ve tried disabling the embed truncate setting as described in this thread, but that only hides the “show full post” button; the post still shows all of the content.

I’ve also tried editing the embed.imported_from message, but that just changes the tiny text at the bottom that folks seem to already be missing.

I’ve also tried just editing the post manually after Discourse creates it, but the markdown is not rendered into HTML and shows up as plain text instead. This sounds similar to this issue: Customizing the "Embedding" Behavior by Disabling Show Full Post?

Is there a setting I’m missing, or some other trick I can use to customize the text in an auto-generated Discourse post? Maybe something I can include in the HTML of the main site to trick Discourse into showing the right thing? Or I’m not above just manually editing it, if there’s a way to fix the markdown rendering issue.

Thanks for any help y’all can offer!

Sorry for the bump, but I’d be really grateful if anybody had any ideas for me to try!

Hey Kevin, can I just confirm whether you’re using the WP Discourse plugin or a javascript embed?

Thanks for the reply! I’m using the JavaScript embed. For example, I have this page:

Which contains this embed code:

<script type="text/javascript">
  // Tell embed.js which forum to use and which page this comment thread belongs to.
  DiscourseEmbed = {
    discourseUrl: location.protocol + '//forum.happycoding.io/',
    discourseEmbedUrl: location.protocol + '//happycoding.io/tutorials/javascript/react-css'
  };

  // Load the forum's embed script asynchronously.
  (function() {
    var d = document.createElement('script'); d.type = 'text/javascript'; d.async = true;
    d.src = DiscourseEmbed.discourseUrl + 'javascripts/embed.js';
    (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(d);
  })();
</script>

That creates this post on my Discourse:

And if you click that, you’ll see that the post contains the full text of the original page.

Thanks for the clarification.

Why have you got the embed truncate site setting enabled in Discourse? I’m just a bit confused as you mention you’ve disabled it, but you also say your problem is

The embed truncate setting is there partly for this very reason. It means the user will only see a partial excerpt of the post on Discourse itself.

Could you just explain a bit more what specific user behavior you’re trying to avoid, and what specific user behavior you’re trying to encourage?

I’ve gone back and forth with the embed truncate setting. Looking at it again now, I guess enabling it is marginally better, but I’m still hoping for a way to avoid showing the full text of the original article in Discourse at all. In other words, I don’t want to hide the full text behind a button click; I never want to show the full text at all, just a link to the original article.

The behavior I’m trying to avoid is users coming to my Discourse and reading the full article there, rather than on the original page. This is a problem because the full text on Discourse often contains bugs (with interactive JS, embedded code, etc.), and then I get bug reports where the solution is to stop reading on Discourse and go to the “real” site instead.

In other words, the user behavior I’m trying to encourage is reading the full article on the original page, rather than in the Discourse post.

This might seem minor (and in the grand scheme of things it is) but my fear is that users are coming to my Discourse, thinking that the page’s behavior is buggy, and bouncing off before they realize there’s a page on my “real” site that they should be reading instead of the Discourse.

Some possible options I’ve considered:

  • Is there a setting that tells Discourse to include just a link, and not include any of the original post at all?
  • Is there a CSS class or other attribute I can add to my original HTML to indicate which part of the article should be included (or excluded) in the Discourse post?
  • Maybe I could add custom CSS to the Discourse to hide the Show Full Post... button?

Thanks for explaining, Kevin. There are no settings specifically directed at this issue, but there are two ways you could approach this.

Customize what HTML is pulled from your site

The way embeds work is that Discourse scrapes the content from your site using the Readability gem. The following options filter which HTML is kept:

opts[:whitelist] = SiteSetting.allowed_embed_selectors if SiteSetting.allowed_embed_selectors.present?
opts[:blacklist] = SiteSetting.blocked_embed_selectors if SiteSetting.blocked_embed_selectors.present?
allowed_embed_classnames = SiteSetting.allowed_embed_classnames if SiteSetting.allowed_embed_classnames.present?

So you could set the site settings allowed_embed_selectors, blocked_embed_selectors, or allowed_embed_classnames to restrict which content is pulled from your HTML and shown in the Discourse post. For example, you could restrict it to a non-existent class so that no content is pulled.

The content scraped from the site then has this HTML appended to it:

"\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"

So you’d just need to customize the embed.imported_from text in the admin panel to tell the user to read the content on the blog. Note that you can interpolate the link to the content in that text; for example, the English version of the locale text is:

This is a companion discussion topic for the original entry at %{link}
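
To make the composition concrete, here is a minimal sketch in JavaScript of how the scraped HTML and the "imported from" footer end up in one post body. It only mirrors the Ruby string quoted above and is not Discourse's actual code; `interpolate` and `buildEmbedBody` are hypothetical names, and the URL and custom text are placeholders.

```javascript
// Illustration only: mirrors the Ruby string-building quoted above.
// `interpolate` and `buildEmbedBody` are hypothetical helper names.

// Replace %{name} placeholders, the way the locale text uses %{link}.
function interpolate(template, values) {
  return template.replace(/%\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

// Append the "imported from" footer to whatever HTML was scraped.
function buildEmbedBody(scrapedHtml, url, importedFromText) {
  const link = `<a href='${url}'>${url}</a>`;
  const footer = interpolate(importedFromText, { link });
  return `${scrapedHtml}\n<hr>\n<small>${footer}</small>\n`;
}

const body = buildEmbedBody(
  "<p>Short teaser.</p>", // whatever the embed selectors let through
  "https://example.com/my-post",
  "Read the full article at %{link}. Replies here appear as comments there."
);
console.log(body);
```

Whatever text you choose, the footer stays inside the `<hr>` and `<small>` wrapper, so only the wording is customizable this way.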

Hide the Show Full Post button

As you suggested, hiding the show full post button with CSS should also work.

I struggle to understand why there is not an option to customize the full embedded text. I do not want to scrape any actual content from the embedded URL, instead just have a link to it with a short description (e.g. just the meta summary).

Right now I do this with an automated API call, but want to switch to the native embedding feature.

I tried creating a hidden element on the scraped site specifically for Discourse to scrape just that single element, but the downside is that the onebox will not be displayed for the link.

Customizing the embed.imported_from also has its limitations, since it’s always forced into a <small> tag allowing no actual customization.

It sounds like you don’t want an embed, which is, by its nature, the “embedding” of content from another place.

Why do you want to switch?

True, I just want an automatic thread creation whenever a new blog article is published.

However, I also want to use the native JS embed feature to show comments below the blog post on the external website, which comes with the embedding behaviour in the forum as well.

My current automation comes with a bit of a delay (not real time), and implementing an automatic thread creation in our CMS whenever a new article is published is a bit more difficult, since it’s not just a blog CMS and there is not even a distinct “publishing” event.

In any case, there will be collisions between the JS embed trying to create the thread and my automation, the former probably being faster most of the time. This is why I want to “switch” to just using the JS embed feature, with the downside of the threads having to be manually edited each time.

Happy to hear suggestions! :smile:

Thanks for explaining.

Ok, if I’m understanding you correctly:

  1. you want the topic creation and comment linking functionality of JS embeds; and
  2. you want just a link with a short description in the first post of the linked topic in discourse.

Is that right? For 2 have you tried the embed truncate site setting? If you have, what about that did you not like? I understand you have touched on that a bit in your first reply, however could you explain specifically what you’re struggling with? Maybe give an example as to what is stopping you achieving your desired outcome (and what exactly that desired outcome is).

Yes, you are.

The issue comes down to the link onebox, which is not shown because the embed content always gets wrapped in HTML tags. :smiley:

I know this sounds like a small nit (which it is), but the quality of life downside of having to edit this manually for each article is significant and something I wanted to fix for a long time.

What I want it to look like (using an example Discourse blog post):

Currently, I would have to mess with hidden elements on the website to be able to specifically scrape the URL and summary, and even then, the problem is the onebox not being displayed. The only thing I can more or less fully customize is the “Read the full blog post…” part at the bottom.

I guess what I’m asking for is the ability to add something to the JS snippet like this:

DiscourseEmbed = {
    discourseUrl: 'https://forum.example.com/',
    discourseEmbedUrl: 'https://blog.discourse.org/2024/03/a-warm-welcome-to-spiceworks',
    // Proposed option; it does not exist in embed.js today.
    discourseEmbedRaw: 'https://blog.discourse.org/2024/03/a-warm-welcome-to-spiceworks\n\nWe are thrilled to share the move of the Spiceworks community to Discourse! The Spiceworks team has worked closely with our migration team\n\n<small>Read the full blog post on <a href="https://blog.discourse.org/2024/03/a-warm-welcome-to-spiceworks/">discourse.org</a>. This post has been created automatically and replies will be shown on the website.</small>'
};

Here, discourseEmbedRaw would be equivalent to the raw value in a regular API request to /posts.json.

But I understand this might be an edge case requirement and is not relevant for most users. I guess I will try to solve this by creating the topics via API before the JS snippet attempts to do so.
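
For reference, a rough sketch of what that API route could look like. It only builds the request and does not send it. `buildCreateTopicRequest` is a hypothetical helper, and the forum URL, API key, username, title and article URL are all placeholders. The `title`, `raw` and `embed_url` fields and the `Api-Key` / `Api-Username` headers are the ones used by a regular request to /posts.json. The collision with the JS embed described earlier still applies.

```javascript
// Sketch only: builds (but does not send) a request that creates a topic
// linked to a blog URL. `buildCreateTopicRequest` and every value passed to
// it below are placeholders.
function buildCreateTopicRequest({ forumUrl, apiKey, apiUsername, title, raw, embedUrl }) {
  return {
    url: new URL("/posts.json", forumUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Api-Key": apiKey,
        "Api-Username": apiUsername,
      },
      // embed_url associates the new topic with the external page.
      body: JSON.stringify({ title, raw, embed_url: embedUrl }),
    },
  };
}

const articleUrl = "https://blog.example.com/2024/03/some-article";
const request = buildCreateTopicRequest({
  forumUrl: "https://forum.example.com/",
  apiKey: "YOUR_API_KEY",
  apiUsername: "system",
  title: "Some article",
  // The bare URL sits on its own line, with no HTML wrapped around it.
  raw: `${articleUrl}\n\nShort summary of the article.`,
  embedUrl: articleUrl,
});

// To actually send it (Node 18+ or a browser):
//   fetch(request.url, request.options)
console.log(request.url);
```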

I wouldn’t recommend that.

This would cause various issues. Let’s just leave that to one side for now.

I appreciate that ideally you’d want complete control over everything, however bear with me as I attempt to translate your needs into what might be feasible improvements to the current system. Keep in mind these are just suggestions and I don’t have control over what gets accepted by the Discourse team.

An embedded post in Discourse is essentially made up of two things

  1. “imported from” HTML (i.e. the link)
  2. HTML content from linked page, either full or truncated.

1. Control over the “imported from” html

Currently, this html is hardcoded as

 "\n<hr>\n<small>#{I18n.t("embed.imported_from", link: "<a href='#{url}'>#{url}</a>")}</small>\n"

You’d like to customise this to be, for example, just the url so that it would onebox. I think a feasible improvement there would be a site setting that simply switches this to “url only”, so that you wouldn’t have to allow admins to enter html somewhere.

2. Control over the truncated HTML content

You can do this already. Just set the site setting allowed embed classnames to a classname of an element you’ve used to wrap the excerpt you want on your site, e.g.

On discourse

Set these site settings:

  • embed truncate to false
  • allowed embed classnames to “discourse-excerpt”

On your blog page

<div class="discourse-excerpt">
We are thrilled to share the move of the Spiceworks community to Discourse! The Spiceworks team has worked closely with our migration team
</div>

3. Control over the order of the “import from” HTML, and the HTML Content

If I’m reading you correctly, you want the “imported from” part (e.g. just the URL) to come before the HTML content (or truncated content). Again, the simplest way to do this would be a boolean site setting, something like embed imported from above content.

So, in short, if I’m reading you correctly, you could achieve this with the addition of two new boolean settings and some small tweaks to the TopicEmbed class. You’ll note that all of these changes are to discourse/discourse itself as the processing has to happen there.
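
As a rough illustration of that proposal, here is a sketch in JavaScript of how the two switches would change the assembled post. Neither option exists in Discourse today; `composeEmbedPost`, `importedFromUrlOnly` and `importedFromAboveContent` are hypothetical names, and the real change would be written in Ruby inside TopicEmbed.

```javascript
// Hypothetical sketch of the proposal. Neither option exists in Discourse
// today; the real change would be made in Ruby in the TopicEmbed class.
function composeEmbedPost(contentHtml, url, opts = {}) {
  const {
    importedFromUrlOnly = false,      // hypothetical "url only" switch
    importedFromAboveContent = false, // hypothetical ordering switch
    importedFromText = "This is a companion discussion topic for the original entry at %{link}",
  } = opts;

  const link = `<a href='${url}'>${url}</a>`;
  const importedFrom = importedFromUrlOnly
    ? `\n${url}\n` // bare URL on its own line, the precondition for a onebox
    : `\n<hr>\n<small>${importedFromText.replace("%{link}", () => link)}</small>\n`;

  return importedFromAboveContent
    ? importedFrom + contentHtml
    : contentHtml + importedFrom;
}

// Current behavior: content first, then the <small> footer.
console.log(composeEmbedPost("<p>Teaser</p>", "https://example.com/a"));

// Proposed behavior: bare URL first, then the excerpt.
console.log(
  composeEmbedPost("<p>Teaser</p>", "https://example.com/a", {
    importedFromUrlOnly: true,
    importedFromAboveContent: true,
  })
);
```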

As I mentioned above, these are just suggestions as to how I would achieve what you want to do. To get these, or something similar, actioned, there would need to be buy-in from the Discourse team.

Thanks for writing this down! :+1:

Yes, that’s exactly what I played around with, the issue being the content will be wrapped in several HTML tags, which is why the onebox will not trigger. I tried separating the URL with <br> tags (to trigger the onebox), but stuff like that appears to get trimmed automatically.

Hmm, okay, why? :slight_smile:

I would set the embed_url value, of course.

Getting your embed URL to onebox is a separate issue. Use allowed embed classnames just to set the text excerpt as in my example.

Because you’ll be effectively re-inventing the wheel to try to circumvent what is really a TopicEmbed parsing issue. It’ll also open up a new set of issues, like what if your code does not execute in the order you expect, e.g. there’s a race condition or some other intervening exception. These kinds of issues happen relatively often when custom code on an external site is combined with the WP Discourse plugin. In short, it’s not worth it.

You seem to know your way around a codebase :slight_smile: . You effectively need to make two simple changes to this class.

  1. Insert a conditional controlled by a site setting here
  2. Insert another conditional controlled by a site setting here

You wouldn’t even need to build the discourse app. Just write two rspec tests first, then make the changes, then once you get them working make a PR :slight_smile:

For what it’s worth, here’s what I ended up doing:

  1. On my blog, I have a <div> with an id of forum-excerpt, which is hidden with display:none but contains the HTML that I’d like to show in the Discourse post. (I do this using some Jekyll / Liquid logic, but that shouldn’t really matter.)

  2. On my Discourse, I set the CSS selector for elements that are allowed in embeds to #forum-excerpt. Although the div is hidden on my actual page, the content shows up on the forum.

  3. I also uncheck Truncate the embedded posts.

  4. In the Embedded CSS section, I give .button a larger font. This is a small change but it makes the button to add a comment bigger.

  5. I have also customized the embed.continue, embed.start_discussion, and embed.imported_from text, which changes what shows up in the comments section on my website.

This means that I have full control over the HTML that shows up in the forum post. The HTML I give it is basically the equivalent of a OneBox: it’s a big thumbnail and a link to the main post.
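
For anyone not using Jekyll, here is a minimal sketch in JavaScript of generating that hidden block. The setup above uses Jekyll / Liquid, so this is an equivalent under assumptions rather than the actual template: `escapeHtml`, `renderForumExcerpt` and the `url`, `title` and `thumbnailUrl` fields are hypothetical names. The sketch assumes the blog's stylesheet hides the block with a rule such as `#forum-excerpt { display: none; }`.

```javascript
// Sketch only: a JavaScript stand-in for the Jekyll/Liquid template.
// Function and field names are hypothetical. The blog's own stylesheet is
// assumed to hide the block with: #forum-excerpt { display: none; }

// Escape text before placing it inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the block that the #forum-excerpt selector lets Discourse pick up.
function renderForumExcerpt({ url, title, thumbnailUrl }) {
  const href = escapeHtml(url);
  const safeTitle = escapeHtml(title);
  return [
    `<div id="forum-excerpt">`,
    `  <a href="${href}"><img src="${escapeHtml(thumbnailUrl)}" alt="${safeTitle}"></a>`,
    `  <p>View the original post here: <a href="${href}">${safeTitle}</a></p>`,
    `  <p>Replies to this thread will show as comments on the original post!</p>`,
    `</div>`,
  ].join("\n");
}

console.log(
  renderForumExcerpt({
    url: "https://example.com/tutorials/some-page",
    title: "Some page",
    thumbnailUrl: "https://example.com/images/some-page.png",
  })
);
```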

This works pretty much perfectly for me. A belated thank you for the help!
