Understanding how embedding works on a remote site

I am setting up embedding of a Discourse forum and I am having trouble understanding how to make it work.

I’m not able to create topics by RSS at this stage so I need to do the creation manually.

My question is how does Discourse tie a topic to the page on the remote site?

(For my use case it would be simpler to put the required topic URL to embed in the JS snippet rather than ‘self’. Is that likely to be possible?)

Here’s the page you want to look at:


OK, I have reread that post and it says this:

If your site does not have an XML feed, Discourse will create topics based on your page contents using a readability algorithm. It’s remarkably accurate for those cases when your site doesn’t have a feed.

I cannot see how this works from the post end of things. How do you tell a post what page it should be using?

The other issue I have is that the URLs change as stories are updated. It would seem to me that this will break the integration if I rely on the current URL; it may not be the same as the created URL.

@eviltrout - are you able to comment on this?

The code you embed tells Discourse there’s a new post to embed. For example, on eviltrout.com:

var discourseUrl = "http://fishtank.eviltrout.com/",
    discourseEmbedUrl = "http://eviltrout.com/2014/12/22/watch-ember-tv.html";
(function() {
  // Load the embed script from the forum asynchronously
  var d = document.createElement('script'); d.type = 'text/javascript'; d.async = true;
  d.src = discourseUrl + 'javascripts/embed.js';
  (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(d);
})();

The first time that script is executed it contacts Discourse and says "give me the contents for discourseEmbedUrl". Discourse will then say, hey, there are no comments, but in the background it enqueues a job to crawl that page for its contents. So shortly afterwards, the page will be crawled and run through readability.
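To make that flow concrete, here is a toy in-memory model of it. This is only a sketch of the behaviour described above, not Discourse's actual code; the names topicsByEmbedUrl, crawlQueue, requestComments and runCrawlJobs are made up for illustration, and fetchPage stands in for the crawl plus readability step.

```javascript
// Toy model only: every name here is hypothetical, not a Discourse API.
const topicsByEmbedUrl = new Map(); // embed URL -> { firstPost, comments }
const crawlQueue = [];              // embed URLs waiting to be crawled

// What the embed script effectively asks for: "the comments for this URL".
function requestComments(embedUrl) {
  const topic = topicsByEmbedUrl.get(embedUrl);
  if (topic) return topic.comments;
  // Unknown URL: answer "no comments" now, crawl the page later.
  if (!crawlQueue.includes(embedUrl)) crawlQueue.push(embedUrl);
  return [];
}

// Background job: create one topic per queued URL from the page contents.
function runCrawlJobs(fetchPage) {
  while (crawlQueue.length > 0) {
    const url = crawlQueue.shift();
    topicsByEmbedUrl.set(url, { firstPost: fetchPage(url), comments: [] });
  }
}
```

Because the lookup is keyed on the embed URL, asking again with the same URL finds the existing topic, while a changed URL would look like a brand new page.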

The key is discourseEmbedUrl.

Your site will indeed break if discourseEmbedUrl changes. However, I assume the original URL still works even when it’s updated? If you leave discourseEmbedUrl as the original value it will not import the article twice and the comments will continue to work.


Ah right, the penny has dropped now.

Also, I did not have an ‘embed by username’ set, so I presume the system could not create the post.

I can easily control the URL used - I have a field in the database called initial_url which is the url that existed when the item was created (this is for Google Analytics).
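Roughly what I have in mind, as a sketch: the story object and its initialUrl / currentUrl properties are hypothetical stand-ins for my CMS record (initialUrl being the initial_url field), and embedSettings is just an illustrative helper.

```javascript
// Hypothetical helper: picks the values to print into the embed snippet.
function embedSettings(story, forumUrl) {
  return {
    discourseUrl: forumUrl,
    // Always prefer the URL the story had when it was created, so the
    // embed keeps pointing at the same topic after the story URL changes.
    discourseEmbedUrl: story.initialUrl || story.currentUrl,
  };
}
```

The fallback to currentUrl only matters for records where no initial URL was stored.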

One thing I’d be interested in your thoughts around is embedding the same topic in more than one story.

A good example of this use-case is this radio programme series. It is on sourdough and ran over many weeks. We might embed the forum in the first story, and then embed it in the second (and so on) so people can enter the discussion on any of the pages at any time.

It would be quite cool if there was a ‘non-create or update’ mode where you can embed any topic on any page. This is something we’d use a fair amount as it allows you to seed a new story on the same topic with the existing discussion and avoids having many separate topics about essentially the same conversation.

I am also going to add an RSS feed of stories we want topics for to our CMS so I can be explicit about the content that appears in the first post.

RSS + arbitrary embed looks like the golden path for us, but I’d be interested in hearing if there are other ways to do this.

I’d be interested in the details of the “readability algorithm”. It looks like the title being pulled for the topic is from the meta title field; in our case it’s better to pull from the h1 tag. Also, I’m not sure how it’s pulling the content for the topic… as of now it’s not pulling the details we want in there. For example, we have a description on our page, followed by a bunch of other stuff that shouldn’t be in the Discourse description… it’s pulling all of the other stuff and omitting the description.

I also noticed that it’s not possible to change the title and content of the topic after it’s auto-created: I receive a generic error message.

It’s a bit of a hacked-up Ruby gem, not 100% the same as that.

Does it have any logic to apply weights to an H1 tag so that it would win out and become the topic title (like a specific ID or class)? From the original Readability code here http://arc90labs-readability.googlecode.com/svn/trunk/js/readability.js I saw that if the meta title is < 10 or > 150 characters the h1 tag would be used. Right now our meta title is just “Home” so that must be something that was customized.
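To be clear about the rule I mean, here is a sketch of it as I read it. The function name pickTitle is mine, and the 10 / 150 thresholds are just the numbers quoted above, which I have not double-checked against the original source; this says nothing about what the gem Discourse uses actually does.

```javascript
// Sketch of the fallback rule described above (thresholds are assumptions).
function pickTitle(metaTitle, h1Text, minLen = 10, maxLen = 150) {
  const title = (metaTitle || "").trim();
  const h1 = (h1Text || "").trim();
  // A very short or very long meta title is treated as unreliable.
  const unreliable = title.length < minLen || title.length > maxLen;
  return unreliable && h1 ? h1 : title;
}
```

With that rule, a meta title of “Home” would lose to whatever is in the h1.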

Here’s our specific use case. We’re planning on launching a new website in a couple days. It’s a daily deal website, and the new deal is shown on the homepage similar to meh.com. Changing the meta title every day on the homepage isn’t something we necessarily want to do. We definitely want to foster community so embedding the discourse topic posts for each deal is ideal.

With the JS code on our site to map the page to the forum topic, I’m just appending a query string that would be distinct to the deal of the day.

A suggestion for a future Discourse update would be to allow us to enter a selector in the admin (when not using an xml / atom feed) to find the title and description for the topic creation.

We have settings for selectors. I even made commits to the gem to support that.


Thanks, I missed that. I applied the blacklist selector to the title element, just as a test to see if I could get it to pull the H1 as the title. Didn’t work - didn’t even create the topic (not surprised… I wouldn’t normally add a class to a title).

My workaround for the title is this:

discourseEmbedUrl = 'http://[my site].com/?eid=[event ID]&ref=cpdiscourse&title=[url encoded event title]';

Then if ref and title are present in the querystring, I change the meta title just for Discourse… adding a noindex so this variation doesn’t get picked up by search engines.
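In case it helps anyone, the logic looks roughly like the sketch below. The function names buildEmbedUrl and titleOverride are only illustrative, example hosts stand in for my site, and on my real site the second half runs server-side in a different language; this is the idea, not the exact code.

```javascript
// Build the embed URL with properly encoded query parameters.
function buildEmbedUrl(siteBase, eventId, eventTitle) {
  const url = new URL(siteBase);
  url.searchParams.set("eid", String(eventId));
  url.searchParams.set("ref", "cpdiscourse");
  url.searchParams.set("title", eventTitle); // encoded automatically
  return url.toString();
}

// Decide whether a request came from the Discourse crawl and, if so,
// which title and robots directive the page should be rendered with.
function titleOverride(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const title = params.get("title");
  if (params.get("ref") !== "cpdiscourse" || !title) return null;
  return { title: title, robots: "noindex" };
}
```

A null result means the page renders as normal; otherwise the returned title goes into the title tag and the robots value into a robots meta tag.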

The blacklist selector worked perfectly for allowing me to prevent certain things from showing up in the description.