PDF support in Discourse AI (RAG)

sam · February 18, 2025, 4:32am

This guide explains how to implement and use PDF processing capabilities within discourse-ai, including both basic text extraction and enhanced processing with LLM assistance.

Required user level: Administrator

Summary

The discourse-ai plugin supports PDF processing for RAG (Retrieval-Augmented Generation) in two distinct modes:

Basic text extraction
Enhanced processing with LLM analysis

Basic text extraction

This mode provides fundamental PDF processing capabilities:

Extracts text content using the pdf-reader gem
Supports files up to 100MB
Works immediately after plugin installation
Processes text-only content (ignores visual elements)
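A minimal sketch of what basic-mode extraction looks like with the pdf-reader gem, assuming the gem is installed (gem "pdf-reader"). PDF::Reader.new, #pages and Page#text are the gem's public API. The extract_pdf_text method and its reader_class parameter are hypothetical, and reader_class exists only so the logic can be exercised without a real PDF. This is an illustration, not the plugin's actual code.

```ruby
# Hypothetical sketch of basic-mode extraction; not the plugin's actual code.
# reader_class defaults to PDF::Reader from the pdf-reader gem and is
# injectable only so the logic can be tested without a real PDF.
def extract_pdf_text(path, reader_class: nil)
  if reader_class.nil?
    require "pdf/reader"
    reader_class = PDF::Reader
  end

  reader = reader_class.new(path)
  # Only the text layer is read; images, charts and diagrams are ignored.
  reader.pages
        .map { |page| page.text.to_s.strip }
        .reject(&:empty?)
        .join("\n\n")
end
```

With the gem installed, extract_pdf_text("manual.pdf") returns the text of all pages separated by blank lines. A scanned PDF with no text layer would typically produce an empty string, which is the gap the enhanced mode is meant to fill.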

Enhanced processing with LLM analysis

This mode requires specific configuration and provides more advanced capabilities.

Requirements:

Enterprise plan subscription or self-hosted Discourse
ImageMagick with Ghostscript support installed in the container
ai_rag_images_enabled site setting enabled (hidden — must be set via Rails console)
A RAG LLM model configured on the AI agent or tool
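For reference, enabling the hidden setting from the Rails console might look like this, assuming a standard Docker-based install with the default container name app. It is a console fragment, not a standalone script.

```ruby
# Open the Rails console first (standard Docker install, container "app"):
#   cd /var/discourse && ./launcher enter app
#   rails c
SiteSetting.ai_rag_images_enabled = true

# Read it back to confirm the change was saved
SiteSetting.ai_rag_images_enabled # => true
```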

Capabilities:

Interprets images, charts, and diagrams
Provides context from visual elements
Processes PDFs page by page
Maintains the 100MB file size limit
Enables image file uploads (png, jpg, jpeg) for RAG indexing via LLM-based text extraction
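The two modes can be summarized as a small decision function. This is a hypothetical sketch: the method name, the returned symbols and the llm_configured flag are illustrative, and images_enabled stands in for the ai_rag_images_enabled site setting. It only restates the requirements and capabilities listed above; it is not how the plugin is structured internally.

```ruby
# Hypothetical sketch of which extraction path applies to an uploaded file.
# images_enabled mirrors the ai_rag_images_enabled site setting;
# llm_configured means a RAG LLM is set on the agent or tool.
IMAGE_EXTENSIONS = %w[png jpg jpeg].freeze

def rag_processing_mode(filename, images_enabled:, llm_configured:)
  ext = File.extname(filename).delete_prefix(".").downcase
  enhanced = images_enabled && llm_configured

  if ext == "pdf"
    enhanced ? :enhanced_pdf : :basic_pdf # enhanced: page images read by the LLM
  elsif IMAGE_EXTENSIONS.include?(ext)
    enhanced ? :llm_image_text : :not_indexed # image uploads need the enhanced mode
  else
    :other_file_type # outside the scope of this guide
  end
end
```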

Implementation details

Processing specifications

Page processing resolution: 300 DPI
Per-page image conversion timeout: 30 seconds
Automatic cleanup of temporary files
Full integration with RAG document embeddings
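A sketch of the per-page conversion step under stated assumptions: ImageMagick's convert command (ImageMagick 6 syntax) with a Ghostscript delegate is on the PATH, pages are rendered to PNG at 300 DPI, and a page conversion that runs longer than 30 seconds is killed. All method names are hypothetical. Dir.mktmpdir deletes the temporary directory and its images when the block returns, which is one way to get the automatic cleanup described above.

```ruby
require "tmpdir"

# Hypothetical sketch; assumes ImageMagick `convert` with Ghostscript for PDFs.
DPI = 300
PAGE_TIMEOUT_S = 30

def page_conversion_command(pdf_path, page_index, out_path, dpi: DPI)
  # "file.pdf[N]" selects a single zero-based page in ImageMagick
  ["convert", "-density", dpi.to_s, "#{pdf_path}[#{page_index}]", out_path]
end

# Runs cmd and kills it if it exceeds timeout_s. Returns true on success.
def run_with_timeout(cmd, timeout_s)
  pid = Process.spawn(*cmd, out: File::NULL, err: File::NULL)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_s
  loop do
    _, status = Process.wait2(pid, Process::WNOHANG) # nil while still running
    return status.success? if status

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      Process.kill("KILL", pid)
      Process.wait(pid) # reap the killed child
      return false
    end
    sleep 0.1
  end
end

# Converts each page inside a temporary directory that is removed afterwards.
# Yields the page index and the image path (nil if conversion failed).
def convert_pages(pdf_path, page_count)
  Dir.mktmpdir("rag-pdf") do |dir|
    page_count.times do |i|
      out = File.join(dir, "page-#{i}.png")
      ok = run_with_timeout(page_conversion_command(pdf_path, i, out), PAGE_TIMEOUT_S)
      yield i, (ok ? out : nil) # path is only valid inside this block
    end
  end
end
```

Spawning the process and polling with WNOHANG lets the sketch kill a stuck conversion outright, which a plain Timeout.timeout around a blocking call would not guarantee.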

Processing workflow

PDF upload and validation
Content extraction (basic or enhanced mode)
Text chunking with configurable overlap
Chunk embedding and storage
Progress tracking via MessageBus
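Steps 3 to 5 can be sketched as follows. This is an illustration, not the plugin's implementation: whitespace-separated words stand in for tokens, and embed and publish are hypothetical callables standing in for the embedding model and the MessageBus progress update.

```ruby
# Hypothetical sketch of workflow steps 3-5 (chunk, embed, report progress).
# Whitespace-separated words stand in for tokens.
def chunk_with_overlap(text, size:, overlap:)
  unless overlap >= 0 && overlap < size
    raise ArgumentError, "overlap must satisfy 0 <= overlap < size"
  end

  words = text.to_s.split # nil text (nothing extracted) yields no chunks
  return [] if words.empty?

  step = size - overlap
  chunks = []
  start = 0
  loop do
    chunks << words[start, size].join(" ")
    break if start + size >= words.length # last window reached the end
    start += step
  end
  chunks
end

# embed: callable returning a vector for a chunk (stand-in for the embedding model)
# publish: callable receiving a progress hash (stand-in for a MessageBus update)
def index_chunks(chunks, embed:, publish:)
  total = chunks.length
  chunks.each_with_index.map do |chunk, i|
    vector = embed.call(chunk)
    publish.call({ indexed: i + 1, total: total })
    { text: chunk, embedding: vector }
  end
end
```

For example, chunk_with_overlap("a b c d e f g", size: 3, overlap: 1) returns ["a b c", "c d e", "e f g"]: each chunk repeats the last word of the previous one, so text near a boundary keeps some surrounding context in both chunks.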

Limitations

Be aware of these constraints when implementing PDF processing:

File size restrictions:
- 100MB for existing PDF processing
- 20MB for new admin interface uploads
Enhanced mode requires additional system resources
Complex PDF layouts may not be perfectly interpreted
Enhanced processing increases processing time significantly
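A hypothetical pre-upload check against the two size limits listed above. The limit values come from this guide; treating 1MB as 1024 * 1024 bytes, the method name and the context labels are assumptions.

```ruby
# Hypothetical pre-upload size check; limits are the ones listed in this guide.
MB = 1024 * 1024 # assumption: binary megabytes

SIZE_LIMITS = {
  pdf_processing: 100 * MB, # existing PDF processing
  admin_upload: 20 * MB     # new admin interface uploads
}.freeze

# Returns nil when the file fits, otherwise a human-readable error.
def upload_size_error(bytes, context)
  limit = SIZE_LIMITS.fetch(context)
  return nil if bytes <= limit

  format("%.1f MB exceeds the %d MB limit for %s", bytes.to_f / MB, limit / MB, context)
end
```

For example, a 34 MB PDF is within the 100MB processing limit but over the 20MB limit for admin interface uploads.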

Ministerial Decision No. (120) of 2023 regarding adjustments under the transitional provisions...

Accounting, Taxes and Legislation - UAE / Corporate Tax - UAE

In the name of God, the Most Gracious, the Most Merciful. Greetings. I am honored to present a detailed analysis of the text you kindly shared, linking it to the International Financial Reporting Standards (IFRS) and the International Standards on Auditing (ISA), as well as the latest research, professional practices, and accounting rules...

sam · February 25, 2025, 1:00am

Sorry, but this is not how it works: you need to define a persona or tool and then add the upload there.

There has been some discussion about supporting “upload and ask” here: Upload and discuss pdfs in composer, but it is not supported yet.

hameedacpa · February 25, 2025, 6:45am

First of all, thank you very much for your great work. I really like it.

After playing around with the settings and changing the AI model to Gemini-Flash-2.0, it worked great for me. Here’s our situation:

We are a community of auditors, accountants, and tax consultants, and we needed a tool to share relevant laws and start discussions about them. These discussions should be very useful for visitors, since we are professionals in our field. We want the AI model to check and analyze legislation and answer our questions. Our experiment showed that we can genuinely discuss the context added in the first post, and if the AI model is capable enough, it answers our questions with very high-quality output.
Thank you again. I’m looking forward to PDF support, as it will make Discourse the best forum software.

sam · February 28, 2025, 12:04am

The latest Discourse image supports the advanced mode, if anyone feels like testing it.

MachineScholar · February 28, 2025, 12:01pm

Does it have to be enabled via the console? I don’t see any advanced mode options in the UI.

Furthermore, I am getting an error when trying to upload this PDF. It is 34 MB, but I have my max attachment size set to 100 MB (in both the admin settings and app.yml). What’s strange is that a compressed 16 MB version uploads just fine. But perhaps the larger PDF is simply too complex for now? It has lots of images, equations, etc.

Falco · February 28, 2025, 3:17pm

Yes, you need to run SiteSetting.ai_rag_images_enabled = true in the Rails console to enable it.

sam · February 28, 2025, 11:48pm

My guess is that some nginx configuration in the container needs to change as well, so that nginx does not reject the upload.

Michael_Liu · April 17, 2025, 12:17am

Hi @sam
I’m currently having trouble uploading and indexing PDFs because of this error: Job exception: undefined method `length' for nil.

I was wondering whether the error is related to the settings discussed above.
The interface gets stuck at indexing 0% and does not progress.
The exception details are below:

/var/www/discourse/plugins/discourse-ai/app/jobs/regular/digest_rag_upload.rb:81:in `chunk_document'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/digest_rag_upload.rb:40:in `block in execute'
activerecord-7.2.2.1/lib/active_record/connection_adapters/abstract/transaction.rb:616:in `block in within_new_transaction'
activesupport-7.2.2.1/lib/active_support/concurrency/null_lock.rb:9:in `synchronize'
activerecord-7.2.2.1/lib/active_record/connection_adapters/abstract/transaction.rb:613:in `within_new_transaction'
activerecord-7.2.2.1/lib/active_record/connection_adapters/abstract/database_statements.rb:361:in `transaction'
activerecord-7.2.2.1/lib/active_record/transactions.rb:234:in `block in transaction'
activerecord-7.2.2.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:415:in `with_connection'
activerecord-7.2.2.1/lib/active_record/connection_handling.rb:296:in `with_connection'
activerecord-7.2.2.1/lib/active_record/transactions.rb:233:in `transaction'
/var/www/discourse/plugins/discourse-ai/app/jobs/regular/digest_rag_upload.rb:39:in `execute'
/var/www/discourse/app/jobs/base.rb:316:in `block (2 levels) in perform'
rails_multisite-6.1.0/lib/rails_multisite/connection_management/null_instance.rb:49:in `with_connection'
rails_multisite-6.1.0/lib/rails_multisite/connection_management.rb:21:in `with_connection'
/var/www/discourse/app/jobs/base.rb:303:in `block in perform'
/var/www/discourse/app/jobs/base.rb:299:in `each'
/var/www/discourse/app/jobs/base.rb:299:in `perform'
sidekiq-7.3.9/lib/sidekiq/processor.rb:220:in `execute_job'
sidekiq-7.3.9/lib/sidekiq/processor.rb:185:in `block (4 levels) in process'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:180:in `traverse'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:183:in `block in traverse'
/var/www/discourse/lib/sidekiq/pausable.rb:132:in `call'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:182:in `traverse'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:183:in `block in traverse'
sidekiq-7.3.9/lib/sidekiq/job/interrupt_handler.rb:9:in `call'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:182:in `traverse'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:183:in `block in traverse'
sidekiq-7.3.9/lib/sidekiq/metrics/tracking.rb:26:in `track'
sidekiq-7.3.9/lib/sidekiq/metrics/tracking.rb:134:in `call'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:182:in `traverse'
sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:173:in `invoke'
sidekiq-7.3.9/lib/sidekiq/processor.rb:184:in `block (3 levels) in process'
sidekiq-7.3.9/lib/sidekiq/processor.rb:145:in `block (6 levels) in dispatch'
sidekiq-7.3.9/lib/sidekiq/job_retry.rb:118:in `local'
sidekiq-7.3.9/lib/sidekiq/processor.rb:144:in `block (5 levels) in dispatch'
sidekiq-7.3.9/lib/sidekiq/config.rb:39:in `block in <class:Config>'
sidekiq-7.3.9/lib/sidekiq/processor.rb:139:in `block (4 levels) in dispatch'
sidekiq-7.3.9/lib/sidekiq/processor.rb:281:in `stats'
sidekiq-7.3.9/lib/sidekiq/processor.rb:134:in `block (3 levels) in dispatch'
sidekiq-7.3.9/lib/sidekiq/job_logger.rb:15:in `call'
sidekiq-7.3.9/lib/sidekiq/processor.rb:133:in `block (2 levels) in dispatch'
sidekiq-7.3.9/lib/sidekiq/job_retry.rb:85:in `global'
sidekiq-7.3.9/lib/sidekiq/processor.rb:132:in `block in dispatch'
sidekiq-7.3.9/lib/sidekiq/job_logger.rb:40:in `prepare'
sidekiq-7.3.9/lib/sidekiq/processor.rb:131:in `dispatch'
sidekiq-7.3.9/lib/sidekiq/processor.rb:183:in `block (2 levels) in process'
sidekiq-7.3.9/lib/sidekiq/processor.rb:182:in `handle_interrupt'
sidekiq-7.3.9/lib/sidekiq/processor.rb:182:in `block in process'
sidekiq-7.3.9/lib/sidekiq/processor.rb:181:in `handle_interrupt'
sidekiq-7.3.9/lib/sidekiq/processor.rb:181:in `process'
sidekiq-7.3.9/lib/sidekiq/processor.rb:86:in `process_one'
sidekiq-7.3.9/lib/sidekiq/processor.rb:76:in `run'
sidekiq-7.3.9/lib/sidekiq/component.rb:10:in `watchdog'
sidekiq-7.3.9/lib/sidekiq/component.rb:19:in `block in safe_thread'

Michael_Liu · April 17, 2025, 1:45am

Thanks for this amazing update.
I just have one concern: is the 100MB limit per AI persona bot or shared across all personas?

pacharanero · April 30, 2025, 9:52pm

I’m new to Discourse AI but an old hand with Discourse in general.

Really keen to try this out for a specific use case in demo form at this stage.

I’ve enabled the hidden site setting.

Nothing shows up in Sidekiq that I can see. How can I tell whether it is working at all?

I’m aware this is a pre-release feature and not ready for prime time yet, but it would be great to be able to try it out.

Really keen for any hints, tips, screenshots, or recipes from people that are trying this out.

Neil_Evans2 · July 15, 2025, 6:18pm

I get this error when asking the bot to summarize the contents of some PDFs on my site. I haven’t enabled enhanced processing, and I’m using GPT-4.1. Any ideas what I’m doing wrong?

Sorry, it looks like our system encountered an unexpected issue while trying to reply.

Error details

{
  "error": {
    "message": "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_nrDCba5mt83oavbXfPq2BtEV",
    "type": "invalid_request_error",
    "param": "messages.[2].role",
    "code": null
  }
}
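The error text states the rule the OpenAI Chat Completions API enforces: every assistant message that carries tool_calls must be followed by tool messages whose tool_call_id answers each call. Below is a hypothetical diagnostic sketch that lists unanswered call ids in a request payload, assuming messages are string-keyed hashes in that API's format. It only detects the condition and does not change anything in Discourse AI.

```ruby
# Hypothetical diagnostic: returns tool_call ids that are not answered by the
# run of "tool" messages directly following their assistant message.
def unanswered_tool_calls(messages)
  missing = []
  messages.each_with_index do |msg, i|
    calls = msg["tool_calls"]
    next unless msg["role"] == "assistant" && calls && !calls.empty?

    answered = messages[(i + 1)..]
               .take_while { |m| m["role"] == "tool" }
               .map { |m| m["tool_call_id"] }
    missing.concat(calls.map { |c| c["id"] } - answered)
  end
  missing
end
```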

MachineScholar · August 20, 2025, 7:12am

May I inquire into the current status of PDF support?

MachineScholar · August 20, 2025, 7:24am

When you configure upload sizes in app.yml, the setting is site-wide, so the same limit applies to each persona.

kuaza · November 22, 2025, 5:40pm

Are there any updates on this? I’m attaching a PDF when starting a conversation with the AI, but it still doesn’t seem to recognize it. I am currently using GPT. Should I consider a different model specifically designed for PDF processing?
