🤖 Discourse llms.txt Generator Plugin

kaktak · November 11, 2025, 6:45pm

Discourse llms.txt Generator


	Summary	Automatically generates llms.txt files that make your forum content discoverable by Large Language Models like ChatGPT, Claude, and other AI systems
	Repository Link	GitHub: kaktaknet/discourse-llms-txt-generator
	Install Guide	How to install plugins in Discourse

Features

This plugin implements the llms.txt standard - a proposed convention for providing LLM-friendly content from websites. Think of it as “robots.txt for AI” that helps your forum content appear in AI-generated responses.

For detailed feature documentation, see README.md

For technical documentation and development, see CONTRIBUTING.md

While this video isn’t specifically about Discourse or its plugins, it does a great job explaining why llms.txt files are important for your forum.

Thousands of sites—including many of the world’s largest and most respected tech companies—have already implemented the llms.txt standard on their own domains. Examples include:

Amazon AWS — https://docs.aws.amazon.com/llms.txt
Cloudflare — https://developers.cloudflare.com/llms.txt
Stripe — https://stripe.com/llms.txt
Angular — https://angular.dev/llms.txt
Redis — https://redis.io/llms.txt
Docker — https://docs.docker.com/llms.txt
Model Context Protocol — https://modelcontextprotocol.io/llms-full.txt

When industry giants adopt a convention at scale, long before it becomes “official”, it is a strong signal that llms.txt addresses a real problem. Companies of this size rarely roll out sitewide initiatives without a strategic reason. The rapid adoption of llms.txt across the tech industry shows how important structured content for AI has become, and that the industry itself is driving this adoption faster than formal standards bodies.

Key capabilities:

Main Navigation File (/llms.txt)
- Structured overview of your forum with categories, subcategories, and latest topics
- Helps AI systems understand your forum organization instantly
- Automatically updates as your forum grows
Full Content Index (/llms-full.txt)
- Complete forum index with all topics categorized and ready for AI consumption
- Configurable filtering by views and topic count
- Optional post excerpts for faster AI indexing
Dynamic Per-Resource Files
- Generate llms.txt for any category: /c/category-name/123/llms.txt
- Generate llms.txt for any topic: /t/topic-slug/456/llms.txt
- Generate llms.txt for any tag: /tag/tutorial/llms.txt
- Created on-demand without physical storage
Sitemap Index (/sitemaps.txt)
- Complete list of all llms.txt URLs for efficient AI crawler discovery
- Automatically integrates with robots.txt and sitemap.xml
SEO Protection
- Canonical URLs in HTTP headers (RFC 6596 compliant)
- Prevents duplicate content penalties from search engines
- Proper attribution to original forum URLs
Smart Caching
- Hourly background checks for new content
- Only regenerates when necessary
- Fast response times (under 50ms)
Bot Control
- Block specific AI crawlers while allowing forum access
- Comma-separated list of user agents to block
- Automatic robots.txt integration via view connector
Privacy & Security
- Private categories automatically excluded
- Guardian permission checks for dynamic files
- SQL-level security filtering
- No personal data exposed
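
As a small illustration of the Dynamic Per-Resource Files item above, the three URL shapes can be assembled with a helper like the one below. This is only a sketch: the function name and the `kind` values are made up for the example, and only the URL patterns themselves come from the list above.

```python
from typing import Optional

# Path prefixes for resources that carry a numeric id in the URL.
# The URL shapes follow the list above; this helper is hypothetical.
ID_PREFIXES = {"category": "c", "topic": "t"}


def resource_llms_url(base_url: str, kind: str, slug: str,
                      resource_id: Optional[int] = None) -> str:
    """Build the llms.txt URL for a category, topic, or tag."""
    base = base_url.rstrip("/")
    if kind == "tag":
        # Tags have no numeric id: /tag/<name>/llms.txt
        return f"{base}/tag/{slug}/llms.txt"
    if kind in ID_PREFIXES:
        if resource_id is None:
            raise ValueError(f"{kind} URLs need a numeric id")
        # Categories and topics: /c/<slug>/<id>/llms.txt or /t/<slug>/<id>/llms.txt
        return f"{base}/{ID_PREFIXES[kind]}/{slug}/{resource_id}/llms.txt"
    raise ValueError(f"unknown resource kind: {kind!r}")
```

For example, `resource_llms_url("https://forum.example.com", "topic", "topic-slug", 456)` gives the topic form shown in the list (the host name is a placeholder).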

Benefits for your forum:

Increased Visibility: Your content can appear in ChatGPT, Claude, and other AI responses
Direct Attribution: AI systems can cite and link back to your forum
More Traffic: Increased discovery through AI-powered search
Better GEO: Generative Engine Optimization for AI systems
No Maintenance: Automatic generation and updates

Configuration

After installing the plugin, navigate to Admin → Settings → Plugins → discourse-llms-txt-generator

Step 1: Enable the plugin

Set llms_txt_enabled to true (enabled by default)

Step 2: Configure indexing

Set llms_txt_allow_indexing to true to allow AI crawlers access (enabled by default)

This setting controls whether llms.txt files appear in your robots.txt as allowed or disallowed.

Step 3: Set content filters

Configure these settings based on your forum size:

llms_txt_min_views: Minimum topic views to include (default: 50)
llms_txt_posts_limit: How many topics to include
- small: 500 topics (recommended for large forums)
- medium: 2,500 topics (recommended for most forums)
- large: 5,000 topics
- all: All topics (use cautiously on large forums)
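
To get a feel for how these two settings interact, here is a minimal sketch of the filtering they describe. It is not the plugin's code: the `Topic` class and `select_topics` function are hypothetical, and keeping the input order is an assumption, since the post does not say how the plugin orders topics before applying the limit.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits as documented for llms_txt_posts_limit; None means no cap ("all").
LIMITS = {"small": 500, "medium": 2500, "large": 5000, "all": None}


@dataclass
class Topic:
    title: str
    views: int


def select_topics(topics: List[Topic], min_views: int = 50,
                  posts_limit: str = "medium") -> List[Topic]:
    """Keep topics with enough views, then cap the count (assumed order: as given)."""
    eligible = [t for t in topics if t.views >= min_views]
    cap: Optional[int] = LIMITS[posts_limit]
    return eligible if cap is None else eligible[:cap]
```

With the defaults, a topic with 10 views is dropped and one with exactly 50 views is kept, assuming the threshold is inclusive.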

Step 4: Optional - Add custom description

Fill in llms_txt_full_description with 2-4 sentences describing your forum’s purpose and community. This helps AI systems provide more accurate information about your forum.

Example:

This forum is dedicated to discussing Python programming, with focus on web development,
data science, and machine learning. Our community includes beginners and experienced
developers sharing practical solutions and best practices.

Step 5: Optional - Block specific bots

If you want to block certain AI crawlers, enter their user agent names in llms_txt_blocked_user_agents as a comma-separated list:

Example: Omgilibot, ChatGPT-User, CCBot
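
If you want to sanity-check a value before saving it, the sketch below splits a comma-separated list and tests a User-Agent header against it. The case-insensitive substring match is an assumption made for this example; the post does not document how the plugin actually compares user agents.

```python
from typing import List


def parse_blocked_agents(setting_value: str) -> List[str]:
    """Split the comma-separated setting into clean names, skipping empty items."""
    return [name.strip() for name in setting_value.split(",") if name.strip()]


def is_blocked(user_agent: str, blocked: List[str]) -> bool:
    """Assumed rule: block when any listed name occurs in the User-Agent header."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in blocked)


blocked = parse_blocked_agents("Omgilibot, ChatGPT-User, CCBot")
```

Under that assumption, a header such as `Mozilla/5.0 (compatible; CCBot/2.0)` would be blocked, while a regular browser header would not.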

Step 6: Verify installation

Visit your forum at:

/llms.txt - Main navigation file
/llms-full.txt - Full content index
/sitemaps.txt - Complete sitemap

Check your /robots.txt to verify the integration.
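
The same check can be scripted. The sketch below requests each path and reports the HTTP status; the function names and the base URL are placeholders, and a 200 status only shows that the file is served, not that its content is what you expect.

```python
from typing import Callable, Dict, Optional
from urllib.error import HTTPError
from urllib.request import urlopen

# Paths from the verification step above, plus robots.txt.
PATHS = ["/llms.txt", "/llms-full.txt", "/sitemaps.txt", "/robots.txt"]


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code, or None if the request could not complete."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # e.g. 403 or 404
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def check_forum(base_url: str,
                fetch: Callable[[str], Optional[int]] = fetch_status
                ) -> Dict[str, Optional[int]]:
    """Map each expected path to the status returned for it."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) for path in PATHS}
```

Calling `check_forum("https://forum.example.com")` with your own forum address returns a dictionary of path to status code. The `fetch` parameter exists so the logic can be exercised without network access.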

Settings

Name	Default	Description
`llms_txt_enabled`	`true`	Enable or disable the plugin
`llms_txt_allow_indexing`	`true`	Allow AI crawlers to access llms.txt files (affects robots.txt)
`llms_txt_blocked_user_agents`	`""`	Comma-separated list of bot user agents to block from llms.txt files
`llms_txt_intro_text`	Custom text	Introduction text that appears in the main llms.txt file
`llms_txt_full_description`	`""`	Custom description for llms-full.txt to help AI understand your forum context
`llms_txt_min_views`	`50`	Minimum topic views required for inclusion in llms-full.txt
`llms_txt_posts_limit`	`medium`	Topic count limit: `small` (500), `medium` (2,500), `large` (5,000), or `all`
`llms_txt_include_excerpts`	`false`	Include post excerpts in llms-full.txt (increases file size significantly)
`llms_txt_post_excerpt_length`	`500`	Maximum excerpt length in characters (100-5000) if excerpts are enabled
`llms_txt_latest_topics_count`	`50`	Number of latest topics to show in main llms.txt file
`llms_txt_cache_minutes`	`60`	Cache duration in minutes for the navigation file

Important notes:

Enabling llms_txt_include_excerpts with llms_txt_posts_limit set to all may cause extremely large file sizes (potentially 10-100+ MB) and high server load on large forums
Private categories are automatically excluded from all llms.txt files
Dynamic per-resource files (categories, topics, tags) are not cached; they are generated on demand
The plugin uses view connectors for robots.txt integration - no need to manually edit robots.txt
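
For a rough idea of where the file sizes in the first note come from, here is a back-of-the-envelope estimate. The per-topic overhead (title, URL, markup) and the bytes-per-character figure are assumptions, not measurements; the result is an upper bound because the excerpt length is a maximum.

```python
def estimate_full_file_mb(topic_count: int, excerpt_length: int = 500,
                          overhead_per_topic: int = 150,
                          bytes_per_char: int = 1) -> float:
    """Upper-bound size of llms-full.txt in megabytes (10**6 bytes).

    overhead_per_topic and bytes_per_char are illustrative guesses;
    non-Latin text in UTF-8 usually needs 2 or more bytes per character.
    """
    per_topic = excerpt_length * bytes_per_char + overhead_per_topic
    return topic_count * per_topic / 1_000_000


small_forum = estimate_full_file_mb(5_000)    # "large" limit, default excerpts
big_forum = estimate_full_file_mb(100_000)    # "all" on a 100k-topic forum
```

Under these assumptions, 5,000 topics come to about 3 MB, while 100,000 topics with default excerpts come to about 65 MB, which is consistent with the range quoted in the note.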

Technical Details

Architecture:

On-demand generation without pre-generated files
Smart caching with hourly background checks
Permission-aware using Discourse Guardian
SQL-level security filtering for private content
Canonical URLs in HTTP headers to prevent SEO penalties

Performance:

Navigation file cached for 60 minutes (configurable)
Full content file generated on-demand
Smart cache only regenerates when new content exists
Response time under 50ms for cached content
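
The "only regenerates when new content exists" behaviour can be pictured with the sketch below. It is a simplified model, not the plugin's implementation: the class and callback names are hypothetical, and the check runs lazily on read here, whereas the plugin is described as using an hourly background check.

```python
import time
from typing import Callable, Optional


class NavigationCache:
    """Cache a generated file; rebuild only if content changed since the last build."""

    def __init__(self, generate: Callable[[], str],
                 latest_content_time: Callable[[], float],
                 check_interval: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._generate = generate
        self._latest_content_time = latest_content_time
        self._check_interval = check_interval
        self._clock = clock
        self._body: Optional[str] = None
        self._built_at = 0.0
        self._checked_at = 0.0

    def _rebuild(self, now: float) -> None:
        self._body = self._generate()
        self._built_at = now
        self._checked_at = now

    def get(self) -> str:
        now = self._clock()
        if self._body is None:
            self._rebuild(now)  # first request: nothing cached yet
        elif now - self._checked_at >= self._check_interval:
            # Periodic check: regenerate only if something newer than the build exists.
            if self._latest_content_time() > self._built_at:
                self._rebuild(now)
            else:
                self._checked_at = now
        return self._body
```

The point of the design is that an idle forum never pays the generation cost again: the periodic check is a cheap timestamp comparison, and the expensive `generate` call happens only after new content appears.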

Compatibility:

Discourse 2.7.0+
Ruby 2.7+
Tested on Discourse 3.6.0.beta3

Standards compliance:

llms.txt standard (https://llmstxt.org/)
RFC 3986 (URL encoding for international characters)
RFC 6596 (Canonical Link headers)
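
To make the last two items concrete, here is a sketch of percent-encoding a path with international characters and wrapping it in a canonical `Link` header value. The exact header the plugin sends is not shown in this post, so treat the format below as the general shape of such a header; the host and the example path are placeholders.

```python
from urllib.parse import quote


def canonical_link_header(base_url: str, path: str) -> str:
    """Return a Link header value pointing at the canonical forum URL.

    Non-ASCII characters in the path are percent-encoded as UTF-8 (RFC 3986),
    while "/" is left as a path separator.
    """
    encoded_path = quote(path, safe="/")
    return f'<{base_url.rstrip("/")}{encoded_path}>; rel="canonical"'


# Hypothetical category path with a Cyrillic slug.
header_value = canonical_link_header("https://forum.example.com", "/c/Блог/7")
```

A crawler that receives this value in a `Link` response header can treat the llms.txt response as a derivative of the canonical forum page.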

Use Cases

Community Forums:
Your discussions and solutions appear when users ask AI assistants relevant questions, driving qualified traffic back to your forum.

Documentation Sites:
AI systems can reference your documentation and tutorials, providing accurate information with proper attribution.

Support Forums:
Users get direct answers from your knowledge base through AI assistants, with links back to full discussions.

Technical Communities:
Developers discover your forum content through AI-powered coding assistants, increasing community engagement.

Maintenance

The plugin requires minimal maintenance:

Cache automatically refreshes every hour
Content updates happen automatically on post creation/editing
No manual file generation needed
Optional rake tasks available: llms_txt:refresh and llms_txt:check

Troubleshooting

Files not accessible:

Verify llms_txt_enabled is true
Check that llms_txt_allow_indexing is true
Confirm plugin is installed in Admin → Plugins

Empty content:

Ensure you have public topics with sufficient views (check llms_txt_min_views setting)
Verify categories are public (not read-restricted)
Check llms_txt_posts_limit setting

robots.txt integration not working:

Clear robots.txt cache: Rails.cache.delete('robots_txt')
Use rake task: bundle exec rake llms_txt:refresh
Restart Discourse: ./launcher restart app
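
After refreshing, you can check how a given crawler would interpret your robots.txt with Python's standard `urllib.robotparser`. The sample rules below are invented for the example and are not what the plugin necessarily writes; paste the real contents of your /robots.txt instead.

```python
from urllib.robotparser import RobotFileParser

# Invented sample; replace with the text served at /robots.txt on your forum.
SAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /llms.txt

User-agent: *
Allow: /llms.txt
"""


def can_fetch_llms(robots_txt: str, user_agent: str,
                   url: str = "https://forum.example.com/llms.txt") -> bool:
    """Return True if robots_txt lets user_agent fetch the given llms.txt URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample rules, `can_fetch_llms(SAMPLE_ROBOTS_TXT, "CCBot")` is refused while other agents fall through to the wildcard group and are allowed.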

Support

GitHub Issues: kaktaknet/discourse-llms-txt-generator (issue tracker on GitHub)

License

MIT License - Free and open-source software

Credits

Standard Answer.AI: llms.txt by Jeremy Howard (Answer.AI)
GitHub Answer.AI: AnswerDotAI/llms-txt (the /llms.txt file, helping language models use your website)
Platform: Discourse

Ivan_Rapekas · November 20, 2025, 3:25pm

I installed the plugin a couple of days ago. The installation went smoothly, thanks to the author’s documentation.

Here is what I see now. A lot of txt files were generated on my site, for example https://discuss.rabkesov.ru/llms.txt. I know it is too early to expect an AI chatbot (such as Perplexity) to show my content as an answer in chat.

So, to check whether the file is suitable for an LLM, I uploaded it to Qwen (unfortunately, it does not support the link above) and asked:

Could you analyze the page llms.txt and tell me what you understood from what you have read? Make a tree in the form of a mind map (Mermaid)

P.S. If I ask it to translate everything into English, the tree breaks.

Here is the structure of my site (the public part only). The LLM was able to create it from llms.txt! I think this is only a small part of the possible use cases.

Text output without Mermaid

Could you analyze the page llms.txt and tell me what you understood from what you have read?

Yes. Based on the content of llms.txt, here’s what I understood:

The file describes “discuss | Комната ломки”, which translates roughly to “Discussion Forum | The Breaking Room” — a space metaphorically intended for breaking illusions in IT, not people. Its tagline and focus suggest a reflective, critical, and community-driven environment where IT professionals can examine assumptions, share experiences, and collaborate on practical challenges.

Core Themes & Purpose:

Critical reflection on IT practices, career paths, organizational dynamics, and personal growth.
Emphasis on tools and methodologies that simplify work: documentation (Markdown/GitLab/Docsify), visualization (Mermaid/Drawio), feedback mechanisms, and collaborative drafting — particularly within a Discourse-based corporate platform.
A mix of technical, philosophical, and personal topics, often intersecting: e.g., “Why we choose easy tasks”, “Cost of peace of mind”, “Procrastination and decision avoidance”.

Structure:

The site is organized into thematic categories, including:

AI: Discussions on models (e.g., Gemma, LM Studio), deployment (Runpod, GPU calculators), and tooling.
Discourse: Technical backend topics (site configuration, rake tasks, auth methods, hidden parameters).
Основная (Main): Core reflections on work culture, leadership, documentation standards, and responsibility.
Блог (Blog): Personal narratives (e.g., career shifts, relocation to Samara, workplace incidents).
Школа (School): Resources for students, including ethics (parental consent for 14+ users) and digital literacy (UTM parameters).
Анонимная (Anonymous): Safe space for sensitive ideas.
Книги (Books), Мысль дня (Thought of the Day), etc. — supporting intellectual and daily reflection.

Notable Technical & Cultural Signals:

Multilingual & international audience: e.g., parallel Russian/English authentication guides.
Concern for inclusive authentication: e.g., preserving Google/GitHub login for non-Russian users, while also supporting domestic providers (VK, Mail.ru, OK).
Strong interest in automation, standardization, and DevOps: GitLab CI/CD, artifact handling, Runpod integration, SSH-based deployment.
Recurring theme of professional disillusionment, especially among senior engineers moved into support roles — and strategies to cope or plan exits.

In summary, llms.txt outlines a mature, thoughtfully designed community platform that blends technical depth with psychological and organizational introspection — targeting experienced IT practitioners navigating complexity, burnout, and the search for meaning in their work.

Let me know if you’d like a visual map, thematic clustering, or suggestions for next steps (e.g., improving discoverability, engagement, or onboarding).

As for me, it is a great plugin for SEO and related purposes.

Some points to explore:

How to handle non-English content? Does it matter to an LLM whether the text in llms.txt is in English?
Find proof that the final root llms.txt never contains private pages (at the moment, llms.txt is created for every page of the site)
Check the permissions at /admin/site_settings/category/security?filter=crawler to make sure the necessary AI bots are not blocked
Think about the LLM occasionally mixing the site structure with memories it saved previously (this is not a fault of the plugin but a chatbot feature; see the example below)
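
For the second point, one way to spot-check a generated file is to extract its Markdown links and flag any that fall under paths you know to be private. This is only a sketch: the function name, the sample text, and the private prefixes are hypothetical, and a clean result is a spot check rather than proof.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

# llms.txt files list resources as Markdown links: [title](url)
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def find_private_links(llms_text: str,
                       private_prefixes: List[str]) -> List[Tuple[str, str]]:
    """Return (title, url) pairs whose URL path starts with a private prefix."""
    hits = []
    for title, url in LINK_RE.findall(llms_text):
        path = urlparse(url).path
        if any(path.startswith(prefix) for prefix in private_prefixes):
            hits.append((title, url))
    return hits


# Invented sample content and prefixes, for illustration only.
SAMPLE = """# Forum

## Categories
- [Blog](https://forum.example.com/c/blog/5): public posts
- [Staff](https://forum.example.com/c/staff/3)
"""
leaks = find_private_links(SAMPLE, ["/c/staff/"])
```

An empty result for your real private category paths is what you want to see.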

About point #4: the first time, Qwen decided that:

I planned to leave my job
I had relationship troubles
and added these points to the site description!

Qwen may mix its memories into the current request to analyze the sitemap provided as llms.txt. Be aware of this.
