RSS feeder auto-discovery can miss topic-specific feeds

Hi! At NLnet Labs, we’ve been setting up Discourse for our products (community.nlnetlabs.nl). A user asked about getting the RSS feed for a particular topic (e.g. https://community.nlnetlabs.nl/c/cascade/10), as their RSS reader couldn’t find it.

I tried using that topic-specific page with my RSS feed reader of choice, and it found two feeds: “NLnet Labs Community - Latest Posts” (/posts.rss) and “NLnet Labs Community - Latest topics” (/latest.rss). I know that /c/cascade/10.rss is a valid RSS feed, but my feeder couldn’t find it automatically. This is a bit frustrating, as we will need to start communicating these URLs ourselves.

I’ve investigated automatic RSS feed discovery for my personal website, so I have some experience with this. I checked the <head> of the web page; I noticed the following links:

<link rel="alternate" type="application/rss+xml" title="Latest posts" href="https://community.nlnetlabs.nl/posts.rss">
<link rel="alternate" type="application/rss+xml" title="Latest topics" href="https://community.nlnetlabs.nl/latest.rss">
<link rel="alternate nofollow" type="application/rss+xml" title="RSS feed of topics in the 'Cascade' category" href="https://community.nlnetlabs.nl/c/cascade/10.rss">

So the <head> does include a third link for the topic-specific RSS feed, but it appears that some RSS feed readers don’t like the rel="nofollow" attribute.

Of course, I checked MDN (HTML attribute: rel - HTML | MDN); nofollow is documented as:

Indicates that the current document’s original author or publisher does not endorse the referenced document.

But also:

Relevant to <form>, <a>, and <area>, the nofollow keyword tells search engine spiders to ignore the link relationship. The nofollow relationship may indicate the current document’s owner does not endorse the referenced document. It is often included by Search Engine Optimizers pretending their link farms are not spam pages.

I looked through the Discourse source code on GitHub, and with some searches and Git blame was able to find "FEATURE: add nofollow to RSS alternate link in topics and categories" by rr-it (Pull Request #16013, discourse/discourse on GitHub). So I guess the second meaning of rel="nofollow" was intended here. Following the background discussion, it seems to be helpful for guiding prioritization in site crawlers. There was some additional follow-up in "Search engines now blocked from indexing non-canonical pages" (#4 by rrit), but I couldn’t figure out whether rel="nofollow" is still important.

I couldn’t find any discussion on Discourse Meta about this issue, even though the PR was merged back in 2022. Clearly, there’s a misunderstanding in the conventions around <link>s for RSS feeds, between some RSS feed readers and Discourse. So I ask:

  1. Does rel="nofollow" still serve its original intention for improving site crawler prioritization, or has it been superseded by other techniques?
  2. Does this behavior (i.e. ignoring rel="nofollow" links) in RSS feed reader autodiscovery appear to be common? Can others replicate it? I’m not aware of an authoritative standard on RSS feed auto-discovery.
  3. Is there willingness to support this use case, for RSS feed readers to automatically discover the right posts? The existence of those topic-specific <link>s, even if they’re not getting used by my feeder, makes me think so; perhaps the loss of functionality was simply overlooked when rel="nofollow" was added.

To the Discourse devs: thanks for building this!


Hi Arya,

Yes, this is indeed a result of how Discourse currently handles topic-specific RSS feeds, not a problem with the feed itself. The root cause is that Discourse adds nofollow to the rel attribute of the <link> element for topic/category RSS feeds. Some feed readers ignore links whose rel contains nofollow, which prevents automatic discovery, even though the feed itself is valid and works if accessed directly.

A practical workaround is to use a Theme Component to add topic-specific RSS links without nofollow. Here’s a simple example:

<!-- Add topic-specific RSS links without nofollow -->
<script type="text/discourse-plugin" version="0.8">
  api.onPageChange((url, title) => {
    // Remove links injected on a previous page so they don't accumulate.
    document.querySelectorAll('link.custom-rss').forEach(e => e.remove());
    // Copy every topic/category feed link into a plain rel="alternate" link.
    document.querySelectorAll('link[title^="RSS feed of"]').forEach(link => {
      const newLink = document.createElement('link');
      newLink.rel = "alternate";
      newLink.type = "application/rss+xml";
      newLink.href = link.href;
      newLink.title = link.title;
      // Mark the copy so it can be found and removed on the next page change.
      newLink.classList.add('custom-rss');
      document.head.appendChild(newLink);
    });
  });
</script>

This scans for all topic/category RSS links and injects new <link> elements without nofollow into the <head>.

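Feed readers that execute the page’s JavaScript should then detect topic-specific feeds automatically; readers that only parse the HTML sent by the server will still see only the original links.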

Alternatively, for a simpler approach, you can just share the feed URL directly with users, e.g. the Cascade feed at https://community.nlnetlabs.nl/c/cascade/10.rss.

This method avoids modifying the core of Discourse and works across updates. Hopefully this helps feed autodiscovery work as expected!

Cheers!


May I ask which RSS feed reader that is?

Hi Ayke! I’m using Feeder (spacecowboy/Feeder on GitHub), an Android feed reader app available on F-Droid and the Play Store. I don’t know which feeders our users have tried.

Edit: I peeked into the source code: https://github.com/spacecowboy/Feeder/blob/bd98548f7a900b92c2fab9e7d5046827e12e2dbf/app/src/main/java/com/nononsenseapps/feeder/model/FeedParser.kt#L122 seems to search for exact matches of rel="alternate", which is why it’s missing rel="alternate nofollow". I’d call this a bug on their end iff other feed readers are more careful.

Turns out, there is a standard on RSS feed autodiscovery: https://www.rssboard.org/rss-autodiscovery#element-link-rel. It explicitly disallows anything in the rel attribute except alternate. So Discourse’s generated HTML is breaking the standard. That doesn’t mean it should definitely change, but it’s important to note. Maybe adding rel="nofollow" worked on those site crawlers because they were conforming to the autodiscovery standard, indistinguishably from users’ RSS feed readers, and the change broke both of them.
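
To make the difference concrete, here is a minimal sketch of the two matching strategies. The function names are hypothetical and the code is not taken from Feeder or Discourse. It only contrasts an exact string comparison on rel with a comparison on the space-separated tokens that HTML defines for that attribute.

```javascript
// Hypothetical helpers for illustration; neither is copied from Feeder or Discourse.

const RSS_TYPE = "application/rss+xml";

// Exact string comparison: accepts only rel="alternate" and nothing else.
function isFeedLinkExact(link) {
  return link.rel === "alternate" && link.type === RSS_TYPE;
}

// Token comparison: HTML treats rel as a space-separated set of
// case-insensitive keywords, so "alternate nofollow" contains "alternate".
function isFeedLinkByToken(link) {
  const tokens = link.rel.toLowerCase().split(/\s+/).filter(Boolean);
  return tokens.includes("alternate") && link.type === RSS_TYPE;
}

// The two kinds of <link> elements quoted at the top of this thread.
const links = [
  { rel: "alternate", type: RSS_TYPE, href: "https://community.nlnetlabs.nl/latest.rss" },
  { rel: "alternate nofollow", type: RSS_TYPE, href: "https://community.nlnetlabs.nl/c/cascade/10.rss" },
];

const exactMatches = links.filter(isFeedLinkExact).map((l) => l.href);
const tokenMatches = links.filter(isFeedLinkByToken).map((l) => l.href);

console.log(exactMatches.length); // 1: the category feed is skipped
console.log(tokenMatches.length); // 2: both feeds are found
```

A reader using the first strategy behaves the way the autodiscovery spec describes, so whether this counts as a reader bug or a Discourse bug depends on which document you treat as authoritative.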


Good find!

Then my bug report might be null and void:

The next best alternative to rel="nofollow" is to send the HTTP header Link: <…>; rel="canonical" on all RSS feed URLs.
This would result in all RSS URLs being crawled once by Google and then ultimately being dropped.

See How to Specify a Canonical with rel="canonical" and Other Methods | Google Search Central | Documentation | Google for Developers

E.g. for requests to the URL
https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890.rss
add this HTTP header:
Link: <https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890>; rel="canonical"
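
As a rough sketch of how such a header value could be derived, the helper below strips the .rss suffix from a feed URL. The function name canonicalLinkHeader is hypothetical, and it assumes the HTML page always lives at the feed URL without the .rss suffix, which matches the example above but is not confirmed for every Discourse route.

```javascript
// Hypothetical helper: build the value of a canonical Link header for a feed URL.
// Assumption: the canonical HTML page is the feed URL minus the ".rss" suffix.
function canonicalLinkHeader(feedUrl) {
  const url = new URL(feedUrl);
  if (!url.pathname.endsWith(".rss")) {
    throw new Error("not an RSS feed URL: " + feedUrl);
  }
  // Drop the trailing ".rss" from the path, leaving host and query untouched.
  url.pathname = url.pathname.slice(0, -".rss".length);
  return `<${url.href}>; rel="canonical"`;
}

const header = canonicalLinkHeader(
  "https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890.rss"
);
console.log("Link: " + header);
```

With this approach the <link> element in the HTML could keep a plain rel="alternate", since the crawler hint moves to the response headers of the feed itself.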

Also see an implementation of the very same idea for Joomla: Canonical HTTP Headers for RSS Feeds

To be clear: do you think there’s a suitable alternative behavior Discourse can switch to, that would let it conform to the spec? That would be awesome. The RSS users will rejoice :slight_smile:

Edit: also, thank you for filing a bug report in Feeder and mentioning the spec update. It’s nice to have a clear, engaged discussion where even minor issues like this can be taken seriously.
