Issues with embedding from RSS

Hi Simon

I don’t think this has been merged yet.
I’ve been trying again for hours … it just won’t work.
I have the forum set to not allow uncategorised topics, yet the RSS feed topics all go into Uncategorised no matter what I do.

Could you add the category on the RSS Polling page?
Feed URL - Author - Class - Category, with the embedding part done behind the scenes?

I don’t understand

The domain of the feed’s link attributes?
I thought I was selecting the category from the dropdown?

But I put the full URL of the feed in the RSS Polling page?
It seems I’m entering the same information into two different places in different formats and they are not matching.
I’m not seeing the purpose of the allowed hosts / whitelist path.

I just had a look and see that the PR hasn’t been merged. I’ll get someone to take a quick look at my changes and get them merged into the core code. Based on your questions, I’m not sure that the explanation I added to the plugin of how to set the feed topic’s categories will be clear though. I’ll try explaining it again here.

The category that the RSS feed topics get published to is based on the domain of the feed’s link attributes, not on the domain of the feed itself. For example, if your feed is at https://example.com/feed, but the link attributes in your feed are for posts at https://www.example.com/, the domain that you need to add to your Admin / Customize / Embedding hosts section is www.example.com, not example.com. The example below would cause all posts from the feed to be published to the “fun” category:

To find the value of your feed’s link attributes, you will need to look at the markup of your feed. You can do that by loading the feed URL in your browser.
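If reading the raw markup is awkward, the same check can be scripted. The following is only a minimal sketch, assuming an RSS 2.0 feed whose items carry a plain `<link>` element; the sample markup and the function name `item_link_hosts` are made up for illustration and are not part of Discourse or the plugin. Atom feeds store the link in an `href` attribute and would need different handling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical feed markup, for illustration only. A real feed will differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://www.example.com/</link>
    <item>
      <title>First post</title>
      <link>https://www.example.com/fun/first-post</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://www.example.com/support/second-post</link>
    </item>
  </channel>
</rss>
"""


def item_link_hosts(feed_xml):
    """Return the sorted hosts found in the <link> of each <item>."""
    hosts = set()
    for item in ET.fromstring(feed_xml).iter("item"):
        link = item.findtext("link")
        if not link:
            continue
        host = urlsplit(link.strip()).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)


# The hosts printed here are the candidates for the Allowed Hosts field.
print(item_link_hosts(SAMPLE_FEED))
```

For the sample above, the only host reported is www.example.com, which is the value to enter as the allowed host, regardless of where the feed file itself is served from.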

The path whitelist allows you to publish posts from a specific path on your blog to a Discourse category. For example:

This would publish all posts in the www.example.com/fun path to my “fun” category and all posts from the www.example.com/support path to the “Customer support” category.
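The routing described above can be pictured with a small model. This is a sketch of the idea only, not Discourse's actual matching code: the `RULES` list, the patterns `/fun/.*` and `/support/.*`, the fallback name, and the function `category_for` are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules: (allowed host, path whitelist pattern, category).
RULES = [
    ("www.example.com", r"/fun/.*", "fun"),
    ("www.example.com", r"/support/.*", "Customer support"),
]


def category_for(link, rules=RULES, default="Uncategorized"):
    """Pick a category from the host and path of a feed item's link."""
    parts = urlsplit(link)
    for host, path_pattern, category in rules:
        # Both the host and the start of the path have to match.
        if parts.hostname == host and re.match(path_pattern, parts.path):
            return category
    # No rule matched, so the topic falls back to the default category.
    return default


print(category_for("https://www.example.com/fun/first-post"))
print(category_for("https://www.example.com/support/second-post"))
print(category_for("https://example.com/fun/first-post"))
```

In this model the third link falls through to the default because its host is example.com rather than www.example.com, which matches the behaviour described earlier in the thread.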

I don’t understand why I’m on the embed page at all.
The complete feed URL is already input.
Can’t the plugin slice and dice the URL into Allowed Hosts and Path Whitelist without my inputting it all twice?

The example input on the RSS Polling plugin and the Embed page don’t correlate.
‘feeds’ is a subdomain on one … then later a path?

I’ve been trying with a few RSS feeds.
This one … BBC Health - BBC Health - Admin user

allowed hosts - feeds.bbci.co.uk
path whitelist - /health/.*
into a Health category

I think the above should work, but it doesn’t.
I’ve tried every possible combination for hours now.

I agree that configuring an RSS feed is fairly difficult. Some of the issues are related to our having moved the RSS feed code out of the core Discourse code into a plugin. There are not many sites that I know of that are using Discourse’s RSS feed functionality.

To test things out, I configured the feed at https://feeds.bbci.co.uk/health/rss.xml on my site. Here is how that setup looks:

When I first set that up, all feed topics were automatically published to my Uncategorized category. To fix that, I visited a couple of the topics that had been created to look at this section of the post:

That tells me the URL of the post is http://www.bbc.co.uk/news/uk-politics-21668349#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa. The domain of the post is www.bbc.co.uk, not the feed’s domain feeds.bbci.co.uk. I then added this domain as an Allowed Host on my Embedding page. I set the path /news/.* to publish to my “fun” category:
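To make the split explicit, the post URL above can be taken apart with Python's standard library. This is just a sketch for checking values by hand; the variable names are arbitrary, and the regular expression check only approximates how a path whitelist such as /news/.* might be applied.

```python
import re
from urllib.parse import urlsplit

feed_url = "https://feeds.bbci.co.uk/health/rss.xml"
post_url = ("http://www.bbc.co.uk/news/uk-politics-21668349"
            "#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa")

feed_host = urlsplit(feed_url).hostname
post = urlsplit(post_url)

# The feed host and the post host differ; the post host goes in Allowed Hosts.
print("feed host:", feed_host)
print("post host:", post.hostname)

# The path excludes the #fragment, so the whitelist only has to cover this part.
print("post path:", post.path)
print("matches /news/.*:", bool(re.match(r"/news/.*", post.path)))
print("matches /health/.*:", bool(re.match(r"/health/.*", post.path)))
```

The last line shows why the earlier attempt with feeds.bbci.co.uk and /health/.* did not take effect: both values described the feed URL rather than the links inside the feed.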

I then deleted the first batch of topics that were created by the feed. Discourse is pulling them in again. They are now being published to the correct category.

Hi I’ve finally worked it out … sorry.
I think the BBC feed was a bad place to start.

I understand your terminology now.
I didn’t realise you were referring to links from within the RSS feed.
I thought the feed URL was enough.

Yes, that’s probably the most difficult one that I have seen to work out. The links are redirected by the BBC servers and the feed cannot be viewed directly in the browser. The only way I could find to get the correct embed domain was to first publish the posts to my site. Generally it is a lot easier than this to configure the plugin.

Unfortunately the BBC health feed pulls from lots of different paths.
One just now on tropical medicine came from the history section, so I need /history/.* for my Health feed too.
But this will work if I only add the one BBC feed.
So thank you very much for your time and patience. :clap: :clap:

Revisiting a quite old topic to thank you for the good explanation. Unfortunately, it made me understand the plugin is not likely to work for my need. I’m trying to embed a feed generated by an instance of Shaarli, but the link attribute in each entry points to a whole different domain (as Shaarli is a bookmark archive tool). I assume there is no way to use a wildcard in the path (thus allowing any incoming feed item to be directed to a particular category), correct?

Oops, never mind. I found a way to generate a different feed from the source in Shaarli. Solved for my needs.

As I wrote here a couple of weeks back, using a different feed solved the permalink problem. But now the plugin is not fetching the contents of each item.

The topic is created with the correct title, from the right user, and filed under the appropriate category as configured. However, the body of the topic says something (sorry for the imprecision, I am using the Portuguese locale) like “this is a discussion related to this original message” and points to the right URL.

Then there is a button labeled “show full message”. When I click it, it stays “loading” indefinitely. My understanding was that if I came back to the topic a second time, the cache would have been created, but that does not seem to be the case.

Example:

The feed source is this:
https://links.efeefe.me/?do=atom&searchtags=tropixel&permalinks&nb=all

Any tips are welcome.

I think the problem is that Discourse is not finding the content that is on the page at https://links.efeefe.me/?xZVQww. There is very little text on the page. When the “Show full post” button is clicked, Discourse attempts to scrape the page to get its main content. If pages have very little content, you can help identify the main page content by configuring the Discourse embed whitelist selector site setting. There are details about how to do that here: How to configure the embed whitelist selector setting.

Thanks, I tried that (in my case whitelisting “linklist-item-description”), and saw no effect, even waiting for the cache to refresh and adding new entries to the rss source.

Try .linklist-item (Note the . that’s at the start of the class name. It needs to be included.)

You could also try .linklist-item .linklist-item-title, .linklist-item .linklist-item-description

You’ll need to wait up to 10 minutes to see the changes. If you have access to your Discourse site’s Rails console, you can clear the cache by running Rails.cache.clear. That way you can see the changes right away.
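The leading dot matters because, in CSS selector syntax, `.name` selects by class while a bare `name` selects by tag. The sketch below illustrates that difference with a deliberately tiny matcher built on Python's standard library. It is not how Discourse scrapes pages; the sample markup and the names `SimpleSelectorCounter` and `count_matches` are invented for the example, and only the class names come from this thread.

```python
from html.parser import HTMLParser

# Hypothetical page markup using the class names mentioned in this thread.
PAGE = """
<div class="linklist-item">
  <h2 class="linklist-item-title">A bookmarked link</h2>
  <div class="linklist-item-description">Short note about the link.</div>
</div>
"""


class SimpleSelectorCounter(HTMLParser):
    """Counts elements matching one simple selector ('.cls' or 'tag')."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.selector.startswith("."):
            # '.name' is a class selector.
            matched = self.selector[1:] in classes
        else:
            # A bare name is a type (tag) selector.
            matched = tag == self.selector
        if matched:
            self.count += 1


def count_matches(selector, html=PAGE):
    parser = SimpleSelectorCounter(selector)
    parser.feed(html)
    return parser.count


print(count_matches(".linklist-item-description"))
print(count_matches("linklist-item-description"))
```

With the dot, the description element is found; without it, the name is treated as a tag and nothing on the page matches, which is consistent with the setting appearing to have no effect.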
