RSS feeder auto-discovery can miss topic-specific feeds

arya-nlnl · January 7, 2026, 15:45

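Hi! At NLnet Labs we've been setting up Discourse for our products (community.nlnetlabs.nl). A user asked how to get the RSS feed for a specific topic (e.g. https://community.nlnetlabs.nl/c/cascade/10), because their RSS reader couldn't find it.

I tried opening that page in my RSS reader of choice, and it found two feeds: "NLnet Labs Community - Latest Posts" (/posts.rss) and "NLnet Labs Community - Latest topics" (/latest.rss). I know /c/cascade/10.rss is a valid RSS feed, but my reader can't find it automatically. That's a bit frustrating, because it means we'd have to start handing out these URLs ourselves.

I've done some research on RSS feed autodiscovery for my personal website, so I have some experience with this. I checked the page's <head> and noticed the following links: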

<link rel="alternate" type="application/rss+xml" title="Latest posts" href="https://community.nlnetlabs.nl/posts.rss">
<link rel="alternate" type="application/rss+xml" title="Latest topics" href="https://community.nlnetlabs.nl/latest.rss">
<link rel="alternate nofollow" type="application/rss+xml" title="RSS feed of topics in the 'Cascade' category" href="https://community.nlnetlabs.nl/c/cascade/10.rss">

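So the <head> does contain a third link, for the topic-specific RSS feed; but it seems some RSS readers don't like the rel="nofollow" attribute.

Of course, I checked MDN (https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Attributes/rel); the documentation for `nofollow` says:

Indicates that the current document's original author or publisher does not endorse the referenced document.

But also:

Relevant to <form>, <a>, and <area>, the nofollow keyword tells search engine spiders to ignore the link relationship. The nofollow relationship may indicate the current document's owner does not endorse the referenced document. It is often included by Search Engine Optimizers pretending their link farms are not spam pages.

I looked at the Discourse source code on GitHub and, with some searching and git blame, found https://github.com/discourse/discourse/pull/16013. So I assume the second meaning of rel="nofollow" is the intended one here. Judging by the background discussion, it is meant to help steer how web crawlers prioritize pages. There's a further follow-up in Search engines now blocked from indexing non-canonical pages - #4 by rrit, but I couldn't work out whether rel="nofollow" still matters.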

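I didn't find any discussion of this on Discourse Meta, even though the PR was merged back in 2022. Evidently some RSS readers and Discourse disagree about the conventions for RSS feed <link>s. So I'd like to ask:

Does rel="nofollow" still serve its original purpose of improving crawler prioritization, or has it been superseded by other techniques?
Is this behavior (ignoring rel="nofollow" links) common in RSS readers' autodiscovery? Can others reproduce it? I don't know of an authoritative standard for RSS feed autodiscovery.
Would you be willing to support this use case, so that RSS readers can auto-discover the right feeds? The fact that those topic-specific <link>s exist (even though my reader doesn't use them) makes me think so; perhaps the loss of functionality was simply overlooked when rel="nofollow" was added?

To the Discourse developers: thanks for building this!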

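Thefacto · January 7, 2026, 15:54

Hi Arya,

Yes, this is a consequence of how Discourse currently handles topic-specific RSS feeds, not a bug in your feed reader. The root cause is that Discourse adds rel="nofollow" to the <link> elements for topic/category RSS feeds. Many feed readers ignore links marked nofollow, which prevents autodiscovery even though the feed itself is valid and works fine when accessed directly.

A practical workaround is to use a Theme Component that adds the topic-specific RSS links without nofollow. Here's a simple example: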

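<!-- Add topic-specific RSS links without nofollow -->
<script type="text/discourse-plugin" version="0.8">
  api.onPageChange((url, title) => {
    // Drop links injected for the previous page so they don't accumulate
    document.querySelectorAll('link.custom-rss').forEach(e => e.remove());
    // Match Discourse's feed links by rel and type rather than by title,
    // because the title is localized (e.g. "RSS-Feed von ..." in German)
    document.querySelectorAll('link[rel~="nofollow"][type="application/rss+xml"]').forEach(link => {
      // Re-add the same feed with a plain rel="alternate"
      const newLink = document.createElement('link');
      newLink.rel = "alternate";
      newLink.type = "application/rss+xml";
      newLink.href = link.href;
      newLink.title = link.title;
      newLink.classList.add('custom-rss');
      document.head.appendChild(newLink);
    });
  });
</script>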

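This scans the topic/category RSS links and injects copies of them without nofollow into the <head> as new <link> elements.

Feed readers that run the page's JavaScript should now be able to auto-detect the topic-specific feeds. Readers that only parse the HTML returned by the server, without running scripts, will not see links added this way.

Alternatively, for a simpler approach, you can share the feed URL with your users directly, e.g. https://community.nlnetlabs.nl/c/cascade/10.rss.

This approach avoids modifying Discourse core and keeps working across updates. Hope this gets feed autodiscovery working as expected!

Cheers!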

rrit · January 7, 2026, 16:01

Which RSS feed reader are you using?

arya-nlnl · January 7, 2026, 16:03

Hi Ayke! I'm using https://github.com/spacecowboy/Feeder (available on F-Droid and the Play Store). I don't know which feed readers our users have tried.

Edit: I looked at the source code: Feeder/app/src/main/java/com/nononsenseapps/feeder/model/FeedParser.kt at bd98548f7a900b92c2fab9e7d5046827e12e2dbf · spacecowboy/Feeder · GitHub. It seems to look for an exact match of rel="alternate", which is why it misses rel="alternate nofollow". If other feed readers handle this more carefully, I'd call this a bug on Feeder's side.
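To illustrate the difference, here is a minimal JavaScript sketch (not Feeder's actual Kotlin code) comparing the two ways of matching rel on the three <link> elements quoted at the top of this thread. It assumes the attributes have already been parsed into plain objects, and the helper names (relTokens, isAlternateExact, isAlternateToken, findRssFeeds) are hypothetical. HTML treats rel as a set of space-separated, ASCII case-insensitive keywords, so a token-based check still counts "alternate nofollow" as an alternate link.

```javascript
// Hypothetical helpers for illustration only; this is not Feeder's code.

// Split a rel value into lowercase keywords (HTML rel is a set of
// space-separated, ASCII case-insensitive tokens).
function relTokens(rel) {
  return (rel ?? "")
    .split(/[ \t\n\f\r]+/) // HTML's ASCII whitespace characters
    .filter((t) => t !== "")
    .map((t) => t.toLowerCase());
}

// Exact comparison of the whole attribute value.
function isAlternateExact(rel) {
  return rel === "alternate";
}

// Token comparison: "alternate nofollow" still counts as an alternate link.
function isAlternateToken(rel) {
  return relTokens(rel).includes("alternate");
}

// Return the hrefs of RSS feed links, given already-parsed <link> attributes.
function findRssFeeds(links, matchRel) {
  return links
    .filter((l) => l.type === "application/rss+xml" && l.href && matchRel(l.rel))
    .map((l) => l.href);
}

// The three <link> elements from the Cascade category page quoted above.
const links = [
  { rel: "alternate", type: "application/rss+xml", href: "https://community.nlnetlabs.nl/posts.rss" },
  { rel: "alternate", type: "application/rss+xml", href: "https://community.nlnetlabs.nl/latest.rss" },
  { rel: "alternate nofollow", type: "application/rss+xml", href: "https://community.nlnetlabs.nl/c/cascade/10.rss" },
];

console.log(findRssFeeds(links, isAlternateExact)); // only posts.rss and latest.rss
console.log(findRssFeeds(links, isAlternateToken)); // also c/cascade/10.rss
```

With exact matching only the two site-wide feeds are found, which matches what the reader showed earlier in this thread; token matching also picks up /c/cascade/10.rss.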

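arya-nlnl · January 7, 2026, 16:19

It turns out there is a standard for RSS feed autodiscovery after all: https://www.rssboard.org/rss-autodiscovery#element-link-rel. It explicitly forbids anything other than alternate in the rel attribute, so the HTML Discourse generates violates the standard. That doesn't necessarily mean it must change, but it's worth noting. Perhaps adding rel="nofollow" worked on those web crawlers precisely because they follow the autodiscovery standard, which makes them indistinguishable from users' RSS readers, so the change broke both.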

rrit · January 7, 2026, 16:27

Nice find!

Then my bug report is probably invalid:

github.com/spacecowboy/Feeder

Parsing for "alternate" links does not find "alternate nofollow"

opened 04:20PM - 07 Jan 26 UTC

rr-it

Type: Possible bug

### Checklist
- [x] I have used the search function for [**OPEN**](https://gith…ub.com/spacecowboy/feeder/issues) issues to see if someone else has already submitted the same bug report.
- [x] I have **also** used the search function for [**CLOSED**](https://github.com/spacecowboy/feeder/issues?q=is%3Aissue+is%3Aclosed) issues to see if the problem is already solved and just waiting to be released.
- [x] I will describe the problem with as much detail as possible.
- [x] If the bug only to occurs with a certain feed, I will include the URL of that feed.

### App version
current

### Where did you get the app from
Other

### Are you using the "Parse full article" feature?
No

### Android version
–

### Device model
_No response_

### First occurred
_No response_

### Steps to reproduce
Check an URL e.g. https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890 with RSS feed linked like:
```html
<link rel="alternate nofollow" type="application/rss+xml" title="RSS-Feed von „RSS feeder auto-discovery can miss topic-specific feeds“" href="https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890.rss" />
```

### Expected behaviour
Finds RSS feed "https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890.rss".

### Current behaviour
Does not find the RSS feed.

----
https://github.com/spacecowboy/Feeder/blob/bd98548f7a900b92c2fab9e7d5046827e12e2dbf/app/src/main/java/com/nononsenseapps/feeder/model/FeedParser.kt#L120-L122
The feed parser checks for full matching attribute `rel="alternate"`. It should also take attribute `rel` into account if it _just contains_ the value `alternate` as one of multiple values, like in `rel="alternate nofollow"`.

### Logs
_No response_

rrit · January 7, 2026, 17:08

The proper next step after rel="nofollow" would be to send the HTTP header Link: <...> ; rel="canonical" on all RSS feed URLs.
This would make Google crawl each RSS URL only once and then eventually drop it.

See How to Specify a Canonical with rel="canonical" and Other Methods | Google Search Central | Documentation | Google for Developers

For example, for a request to the URL
https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890.rss
add this HTTP header:
Link: <https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890> ; rel="canonical"
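As a rough illustration, the header value in that example can be derived mechanically from the feed URL. The JavaScript sketch below is hypothetical (canonicalLinkHeader is not part of Discourse) and assumes that the HTML page for each feed lives at the same URL minus the .rss suffix, which holds for both feed URLs mentioned in this thread; dropping any query string and fragment is a further assumption.

```javascript
// Hypothetical helper, not Discourse code: build a Link header value that
// points a feed URL at its HTML page as the canonical version.
// Assumption: the page URL is the feed URL without the ".rss" suffix.
function canonicalLinkHeader(feedUrl) {
  const url = new URL(feedUrl);
  if (!url.pathname.endsWith(".rss")) {
    throw new Error(`not an .rss URL: ${feedUrl}`);
  }
  url.pathname = url.pathname.slice(0, -".rss".length);
  url.search = ""; // assumption: query parameters are not part of the page URL
  url.hash = "";
  return `<${url.href}>; rel="canonical"`;
}

// The topic feed from the example above.
console.log(
  "Link: " +
    canonicalLinkHeader(
      "https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890.rss"
    )
);
// Link: <https://meta.discourse.org/t/rss-feeder-auto-discovery-can-miss-topic-specific-feeds/392890>; rel="canonical"
```

The same helper would map the category feed https://community.nlnetlabs.nl/c/cascade/10.rss to https://community.nlnetlabs.nl/c/cascade/10; how a server attaches the header to its responses is left out.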

See also Joomla's implementation of exactly the same idea: Canonical HTTP Headers for RSS Feeds

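arya-nlnl · January 7, 2026, 19:15

To be clear: do you think that is a suitable alternative behavior Discourse could switch to, so that it complies with the spec? That would be great. RSS users would rejoice!

Edit: also, thanks for filing the bug report with Feeder and for the update mentioning the spec. It's nice to have a clear, in-depth discussion where even a small issue like this is taken seriously.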
