🤖 Discourse llms.txt Generator Plugin

:information_source: Summary: Automatically generates llms.txt files that make your forum content discoverable by Large Language Models such as ChatGPT, Claude, and other AI systems
:hammer_and_wrench: Repository: GitHub - kaktaknet/discourse-llms-txt-generator
:open_book: Install Guide: How to install plugins in Discourse

Features

This plugin implements the llms.txt standard - a proposed convention for providing LLM-friendly content from websites. Think of it as “robots.txt for AI” that helps your forum content appear in AI-generated responses.
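For reference, the format proposed at llmstxt.org is plain Markdown: an H1 title, a blockquote summary, and H2 sections containing link lists. A minimal, hypothetical example for a forum (all names and URLs invented) might look like:

```markdown
# Example Forum

> A community forum about example topics, for beginners and experts alike.

## Categories

- [Support](https://forum.example.com/c/support/5): Help with installation and configuration
- [General](https://forum.example.com/c/general/4): Open discussion

## Latest Topics

- [How to get started](https://forum.example.com/t/how-to-get-started/123)
```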

For detailed feature documentation, see README.md

For technical documentation and development, see CONTRIBUTING.md

While the video embedded in the original topic isn’t specifically about Discourse or its plugins, it does a good job of explaining why llms.txt files matter for your forum.

Thousands of sites—including many of the world’s largest and most respected tech companies—have already implemented the llms.txt standard on their own domains. Examples include:

  • :white_check_mark: Amazon AWS — https://docs.aws.amazon.com/llms.txt
  • :white_check_mark: Cloudflare — https://developers.cloudflare.com/llms.txt
  • :white_check_mark: Stripe — https://stripe.com/llms.txt
  • :white_check_mark: Angular — https://angular.dev/llms.txt
  • :white_check_mark: Redis — https://redis.io/llms.txt
  • :white_check_mark: Docker — https://docs.docker.com/llms.txt
  • :white_check_mark: Model Context Protocol — https://modelcontextprotocol.io/llms-full.txt

When companies of this scale adopt a convention before it becomes an official standard, it is a strong signal that llms.txt addresses a real problem: sitewide rollouts are rarely undertaken without a clear strategic reason. The speed and breadth of adoption across the tech industry show how important structured, AI-readable content has become, and that the industry itself is driving this forward, even faster than formal standards bodies.

Key capabilities:

  1. Main Navigation File (/llms.txt)

    • Structured overview of your forum with categories, subcategories, and latest topics
    • Helps AI systems understand your forum organization instantly
    • Automatically updates as your forum grows
  2. Full Content Index (/llms-full.txt)

    • Complete forum index with all topics categorized and ready for AI consumption
    • Configurable filtering by views and topic count
    • Optional post excerpts for faster AI indexing
  3. Dynamic Per-Resource Files

    • Generate llms.txt for any category: /c/category-name/123/llms.txt
    • Generate llms.txt for any topic: /t/topic-slug/456/llms.txt
    • Generate llms.txt for any tag: /tag/tutorial/llms.txt
    • Created on-demand without physical storage
  4. Sitemap Index (/sitemaps.txt)

    • Complete list of all llms.txt URLs for efficient AI crawler discovery
    • Automatically integrates with robots.txt and sitemap.xml
  5. SEO Protection

    • Canonical URLs in HTTP headers (RFC 6596 compliant)
    • Prevents duplicate content penalties from search engines
    • Proper attribution to original forum URLs
  6. Smart Caching

    • Hourly background checks for new content
    • Only regenerates when necessary
    • Fast response times (under 50ms)
  7. Bot Control

    • Block specific AI crawlers while allowing forum access
    • Comma-separated list of user agents to block
    • Automatic robots.txt integration via view connector
  8. Privacy & Security

    • Private categories automatically excluded
    • Guardian permission checks for dynamic files
    • SQL-level security filtering
    • No personal data exposed
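The per-resource URL patterns from capability 3 can be sketched as a small helper. This is hypothetical illustration code, not part of the plugin; it simply mirrors the route patterns documented above:

```python
def llms_txt_url(base, kind, slug, res_id=None):
    """Build the dynamic llms.txt URL for a category, topic, or tag.

    Mirrors the documented route patterns:
      /c/<slug>/<id>/llms.txt, /t/<slug>/<id>/llms.txt, /tag/<name>/llms.txt
    """
    if kind == "category":
        return f"{base}/c/{slug}/{res_id}/llms.txt"
    if kind == "topic":
        return f"{base}/t/{slug}/{res_id}/llms.txt"
    if kind == "tag":
        return f"{base}/tag/{slug}/llms.txt"
    raise ValueError(f"unknown resource kind: {kind}")
```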

Benefits for your forum:

  • Increased Visibility: Your content appears in ChatGPT, Claude, and other AI responses
  • Direct Attribution: AI systems cite and link back to your forum
  • More Traffic: Increased discovery through AI-powered search
  • Better GEO: Generative Engine Optimization for AI systems
  • No Maintenance: Automatic generation and updates

Configuration

After installing the plugin, navigate to Admin → Settings → Plugins → discourse-llms-txt-generator

Step 1: Enable the plugin

Set llms_txt_enabled to true (enabled by default)

Step 2: Configure indexing

Set llms_txt_allow_indexing to true to allow AI crawlers access (enabled by default)

This setting controls whether llms.txt files appear in your robots.txt as allowed or disallowed.

Step 3: Set content filters

Configure these settings based on your forum size:

  • llms_txt_min_views: Minimum topic views to include (default: 50)
  • llms_txt_posts_limit: How many topics to include
    • small: 500 topics (recommended for large forums)
    • medium: 2,500 topics (recommended for most forums)
    • large: 5,000 topics
    • all: All topics (use cautiously on large forums)
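If you prefer the console over the admin UI, the same settings can be changed from the Rails console using Discourse's standard SiteSetting interface (a sketch; the values shown are examples, not recommendations):

```ruby
# Inside the container: ./launcher enter app, then: rails c
SiteSetting.llms_txt_enabled = true
SiteSetting.llms_txt_min_views = 100      # raise the bar on a large forum
SiteSetting.llms_txt_posts_limit = "medium"
```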

Step 4: Optional - Add custom description

Fill in llms_txt_full_description with 2-4 sentences describing your forum’s purpose and community. This helps AI systems provide more accurate information about your forum.

Example:

This forum is dedicated to discussing Python programming, with focus on web development,
data science, and machine learning. Our community includes beginners and experienced
developers sharing practical solutions and best practices.

Step 5: Optional - Block specific bots

If you want to block certain AI crawlers, enter their user agent names in llms_txt_blocked_user_agents as a comma-separated list:

Example: Omgilibot, ChatGPT-User, CCBot
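With the example list above, the entries the plugin adds to robots.txt might look roughly like the following. This is an assumption about the connector's output shape, not its exact text:

```text
User-agent: Omgilibot
Disallow: /llms.txt
Disallow: /llms-full.txt

User-agent: ChatGPT-User
Disallow: /llms.txt
Disallow: /llms-full.txt
```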

Step 6: Verify installation

Visit your forum at:

  • /llms.txt - Main navigation file
  • /llms-full.txt - Full content index
  • /sitemaps.txt - Complete sitemap

Check your /robots.txt to verify the integration.
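Beyond loading the URLs in a browser, you can sanity-check a downloaded file against the general llms.txt layout. A minimal sketch, assuming the file follows the llmstxt.org conventions (H1 title, blockquote summary, markdown links):

```python
import re

def check_llms_txt(text):
    """Return a list of problems found in an llms.txt body (empty = looks OK)."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title on the first line")
    if not any(line.startswith("> ") for line in lines[:5]):
        problems.append("no blockquote summary near the top")
    if not re.search(r"\[[^\]]+\]\(https?://[^)]+\)", text):
        problems.append("no markdown links found")
    return problems
```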

Settings

| Name | Default | Description |
|------|---------|-------------|
| llms_txt_enabled | true | Enable or disable the plugin |
| llms_txt_allow_indexing | true | Allow AI crawlers to access llms.txt files (affects robots.txt) |
| llms_txt_blocked_user_agents | "" | Comma-separated list of bot user agents to block from llms.txt files |
| llms_txt_intro_text | (custom text) | Introduction text that appears in the main llms.txt file |
| llms_txt_full_description | "" | Custom description for llms-full.txt to help AI understand your forum context |
| llms_txt_min_views | 50 | Minimum topic views required for inclusion in llms-full.txt |
| llms_txt_posts_limit | medium | Topic count limit: small (500), medium (2,500), large (5,000), or all |
| llms_txt_include_excerpts | false | Include post excerpts in llms-full.txt (increases file size significantly) |
| llms_txt_post_excerpt_length | 500 | Maximum excerpt length in characters (100-5000) if excerpts are enabled |
| llms_txt_latest_topics_count | 50 | Number of latest topics to show in the main llms.txt file |
| llms_txt_cache_minutes | 60 | Cache duration in minutes for the navigation file |

Important notes:

  • Enabling llms_txt_include_excerpts with llms_txt_posts_limit set to all may cause extremely large file sizes (potentially 10-100+ MB) and high server load on large forums
  • Private categories are automatically excluded from all llms.txt files
  • Dynamic per-resource files (categories, topics, tags) are generated on demand and are not cached
  • The plugin uses view connectors for robots.txt integration - no need to manually edit robots.txt

Technical Details

Architecture:

  • On-demand generation without pre-generated files
  • Smart caching with hourly background checks
  • Permission-aware using Discourse Guardian
  • SQL-level security filtering for private content
  • Canonical URLs in HTTP headers to prevent SEO penalties

Performance:

  • Navigation file cached for 60 minutes (configurable)
  • Full content file generated on-demand
  • Smart cache only regenerates when new content exists
  • Response time under 50ms for cached content
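The "smart cache" idea above can be illustrated with a sketch (this is illustrative code, not the plugin's actual implementation): cache the generated body, re-check a cheap content fingerprint, such as the latest post id, at most once per TTL window, and regenerate only when the fingerprint changes.

```python
import time

class SmartCache:
    """Illustrative sketch: cache a generated file, regenerating only on new content."""

    def __init__(self, generate, fingerprint, ttl_seconds=3600):
        self.generate = generate        # builds the llms.txt body (expensive)
        self.fingerprint = fingerprint  # cheap content check, e.g. latest post id
        self.ttl = ttl_seconds
        self._body = None
        self._fp = None
        self._checked_at = 0.0

    def get(self):
        now = time.time()
        # Re-check the fingerprint at most once per TTL window.
        if self._body is None or now - self._checked_at >= self.ttl:
            self._checked_at = now
            fp = self.fingerprint()
            if fp != self._fp:          # regenerate only when content changed
                self._fp = fp
                self._body = self.generate()
        return self._body
```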

Compatibility:

  • Discourse 2.7.0+
  • Ruby 2.7+
  • Tested on Discourse 3.6.0.beta3

Standards compliance:

  • llms.txt standard (https://llmstxt.org/)
  • RFC 3986 (URL encoding for international characters)
  • RFC 6596 (Canonical Link headers)
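The RFC 6596 canonical link relation mentioned above is delivered as an HTTP Link header. A small hypothetical helper (not part of the plugin) for extracting it from a response header value:

```python
import re

def parse_canonical(link_header):
    """Extract the canonical URL from an HTTP Link header value.

    Example input: '<https://forum.example.com/t/slug/456>; rel="canonical"'
    """
    m = re.search(r'<([^>]+)>\s*;\s*rel="?canonical"?', link_header)
    return m.group(1) if m else None
```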

Use Cases

Community Forums:
Your discussions and solutions appear when users ask AI assistants relevant questions, driving qualified traffic back to your forum.

Documentation Sites:
AI systems can reference your documentation and tutorials, providing accurate information with proper attribution.

Support Forums:
Users get direct answers from your knowledge base through AI assistants, with links back to full discussions.

Technical Communities:
Developers discover your forum content through AI-powered coding assistants, increasing community engagement.

Maintenance

The plugin requires minimal maintenance:

  • Cache automatically refreshes every hour
  • Content updates happen automatically on post creation/editing
  • No manual file generation needed
  • Optional rake tasks available: llms_txt:refresh and llms_txt:check

Troubleshooting

Files not accessible:

  • Verify llms_txt_enabled is true
  • Check that llms_txt_allow_indexing is true
  • Confirm plugin is installed in Admin → Plugins

Empty content:

  • Ensure you have public topics with sufficient views (check llms_txt_min_views setting)
  • Verify categories are public (not read-restricted)
  • Check llms_txt_posts_limit setting

robots.txt integration not working:

  • Clear robots.txt cache: Rails.cache.delete('robots_txt')
  • Use rake task: bundle exec rake llms_txt:refresh
  • Restart Discourse: ./launcher restart app

License

MIT License - Free and open-source software


I installed this plugin a few days ago. Thanks to the author's documentation, the installation went smoothly.

What I see now: a lot of txt files have been generated on my site, for example https://discuss.rabkesov.ru/llms.txt. I know it is too early to expect an AI chatbot (Perplexity) to show my content as an answer in chat.

So, to check whether it is LLM-friendly, I uploaded it to Qwen (unfortunately, it doesn't support the link above):

Can you analyze the llms.txt page and tell me what you understood from reading it? Build a tree as a mind map (Mermaid)

P.S. If I ask it to translate everything into English, the tree breaks.


This is the structure of my site (public part only). An LLM can build it from llms.txt! And I think this is only a small fraction of what's possible.


Text output without Mermaid:
Can you analyze the llms.txt page and tell me what you understood from reading it?

Yes. Based on the contents of llms.txt, here is my understanding:

The file describes "discuss | Комната ломки", roughly "discussion forum | Breaking Room": a space metaphorically aimed at breaking down illusions in the IT field, not people. Its tagline and focus suggest a reflective, critical, community-driven environment where IT professionals can examine assumptions, share experience, and collaborate on practical challenges.

Core themes and purpose:

  • Critical reflection on IT practice, career paths, organizational dynamics, and personal growth
  • An emphasis on tools and methodologies that simplify work: documentation (Markdown/GitLab/Docsify), visualization (Mermaid/Drawio), feedback mechanisms, and collaborative drafting, particularly within Discourse-based corporate platforms.
  • A mix of technical, philosophical, and personal topics that often intersect: for example, "Why we choose easy tasks", "The price of peace of mind", "Procrastination and decision avoidance".

Structure:

The site is organized into topical categories, including:

  • AI: discussions of models (e.g., Gemma, LM Studio), deployment (Runpod, GPU calculators), and tooling.
  • Discourse: technical backend topics (site configuration, rake tasks, authentication methods, hidden parameters).
  • Основная (Main): core reflections on work culture, leadership, documentation standards, and responsibility.
  • Блог (Blog): personal narratives (e.g., career moves, relocating to Samara, workplace incidents).
  • Школа (School): resources for students, including an ethics policy (parental consent for users aged 14+) and digital literacy (UTM parameters).
  • Анонимная (Anonymous): a safe space for sensitive thoughts.
  • Книги (Books), Мысль дня (Thought of the Day), and others, supporting intellectual and everyday reflection.

Notable technical and cultural signals:

  • A multilingual, international audience: e.g., parallel Russian/English authentication guides.
  • Attention to inclusive authentication: e.g., Google/GitHub login kept for non-Russian users alongside domestic providers (VK, Mail.ru, OK).
  • A strong interest in automation, standardization, and DevOps: GitLab CI/CD, artifact handling, Runpod integration, SSH-based deployment.
  • A recurring theme of professional disillusionment, especially senior engineers reassigned to support roles, and strategies for coping or planning an exit.

In summary, llms.txt outlines a mature, carefully designed community platform that combines technical depth with psychological and organizational introspection, aimed at experienced IT practitioners navigating complexity, burnout, and the search for meaning at work.

Let me know if you would like a visual map, topic clustering, or next-step suggestions (e.g., improving discoverability, engagement, or onboarding).


For me personally, this is a great plugin, useful for SEO and beyond.

A few points worth exploring:

  1. How does it handle non-English content? Does English-language text in llms.txt matter to LLMs?
  2. Find evidence that the final root llms.txt never includes private pages (an llms.txt is currently created for every single page of the site)
  3. Check the permission settings at /admin/site_settings/category/security?filter=crawler to make sure the necessary AI bots aren't blocked :slight_smile:
  4. A note on how site structure occasionally gets mixed with an LLM's previously saved memory (this is not a plugin bug but a chatbot quirk; see the example below)

On point 4: on its first pass, Qwen decided that:

  • I was planning to quit my job
  • my relationship was in trouble :slight_smile:
    and added these points to the site description!

Qwen was probably mixing its memory with the current request to analyze the sitemap (provided as llms.txt). Be aware :slight_smile:

1 个赞