Allow ChatBot to read PDFs so it can join in a group discussion

For those who have access to tools that let you chat with one or more PDFs, it would be nice if the Discourse AI - AI Bot could also read PDFs and join in the discussion.

Right now academics are eating this up like candy, but I don’t know of a way for a group of users to talk about the paper(s) together with the bot. AFAIK one can only chat alone with the bot that read the paper. I am sure group chats with paper(s) exist, but Discourse should have it too.

Think of it like a book club with a bot invited and the discussion being about one or more papers (pdfs).

If someone gets the bright idea :star2: that Discourse + AI model plugins (ref) = :moneybag:, hopefully this is the first place you read it.

As more and more different plugins and bots are created, one could eventually form a garage band :guitar:, have a virtual programmer meetup :desktop_computer:, etc.


As far as Discourse Chatbot 🤖 (Now smarter than ChatGPT!*) is concerned, PR welcome.

Anyone is free to contact me if they’d like to sponsor that work.

The framework I’ve created is easily extended and reading pdfs would be a great addition. :+1:


This is going to need dedicated personas. I do think it is doable: you chunk and embed the document, and then you can discuss it with the bot. But I am not sure I would mix this with “Forum Helper” … maybe a “Document Explorer” persona.
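To make the "chunk" step concrete, here is a minimal sketch of splitting extracted text into overlapping word-based chunks before embedding. It is not how Discourse AI does it; the function name `chunk_text` and the `size` and `overlap` defaults are made up for illustration, and it assumes the PDF text has already been extracted into a plain string.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `size` words, each sharing `overlap` words with the previous one.

    Hypothetical helper for illustration only; real systems often chunk by tokens or sentences.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("a b c d e f g h i j", size=4, overlap=1):
        print(chunk)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.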

Very interesting use case, and given we have so much of the infra to upload documents, etc., it is not too much of a stretch to build.


Is this extracting text from the file and injecting it in the prompt? Sounds like an interesting feature if so.

First off, I have not created any of these so can only speculate.


The few ChatGPT plugins that I have tried read the entire PDF; however, many only read the text, as extracting data from the math expressions and graphs is beyond their capability. This is due to the fact that a PDF is designed for layout and presentation, not for context extraction or for passing along knowledge as a data interchange format.

Not sure exactly what you mean by that, but from what I understand they embed the knowledge into a vector database and then use the prompt to pick out the relevant parts and compose a reply.
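As a rough illustration of that "pick out the relevant parts" step, here is a toy sketch. Everything in it is an assumption for the sake of the example: a word-count vector stands in for a real embedding model, a plain list stands in for the vector database, and the names `embed`, `cosine`, `retrieve` and `build_prompt` are made up. The real plugins presumably use learned embeddings and a proper vector store.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Inject only the retrieved excerpts into the prompt, not the whole document."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    return f"Answer using only these excerpts from the paper:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    paper_chunks = [
        "The brake pads wear out after heavy use.",
        "Transformers use attention layers.",
        "Bake the bread for forty minutes.",
    ]
    print(build_prompt("How does attention work in transformers?", paper_chunks, k=1))
```

So the text does end up in the prompt, but only the few chunks that score as relevant to the question rather than the entire PDF.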

The analogy I use to explain the concept to others: instead of focusing on the idea of a PDF, think about the ideas the author(s) of the paper are trying to pass along, and imagine that you are conversing with them.

If you can run plugins with ChatGPT then at this site

search for PDF or paper and try some. The main difference I find among them is that many will read a single PDF (Plugin - ChatWithPDF), while this one (Plugin - Science) will pick the relevant papers out of 250M scientific papers.

LangChain has this

and there are similar repos on GitHub (ref), YMMV.

Here is a specific use case for such technology, for those who think it would be limited to just academics.

Leveraging LLMs with Vast Mechanic Datasets and Guides


How strange to put a model number in a repo name! Why wouldn’t it work with 3.5?


Others are also jumping in on similar ideas.
