Advice on a support bot for a technical support forum (Discourse AI vs Discourse Chatbot)

Who is forcing you to upload in ancient PDF?

Should imagine it would be better to convert once and be able to edit locally.

Anyway, watch this space: there is more to come for Chatbot …


Reality maybe?
Truth is that most domain knowledge is not readily available in markdown.

Again, take the example given above. They have nice PDF user manuals which are created by an entirely different division. Convert once and edit locally won’t work.

:fire: :fire:


A PDF reader PR is welcome, Richard!

But I suspect markdown capability is more than good enough for the time being.

back to Eric

We’ll just wait until assistants become a tad less expensive. And yes, in the meanwhile markdown will work fine!


Glad others are joining on this and hope that this is of benefit to Ryan.

Perhaps @37Rb (Ryan) can provide some more info:

  1. Do you foresee using manuals for use with RAG for a Discourse AI bot on the Discourse site?
    a. What is the current format of such manuals? Hard copy, PDF, website, other?
    b. How often will the knowledge base be updated with new or updated versions of the manuals?
    c. Will the manuals be made public on the site?
  2. What other information besides the Discourse forum could be of value to an AI responding on the support site?
  3. When creating a Discourse AI bot, do you foresee other needs for the bot? I ask because if you read through the history of Discourse creating its AI bot, you will see it has gone through many changes: it was not the perfect bot in its first versions, and it is still changing a lot.
  4. Can you create some examples of how you see the bot or agents being used? For example, show a few different cases of a user chatting with the bot.
    Note: I use the word agent in the sense of a LangChain agent, not to be confused with an OpenAI assistant.

What does that mean?

Non-public information that only the bot sees directly (but that ultimately affects the way it communicates with the user).


I had an idea now that I have seen an OpenAI GPT being created in a YouTube demo.

You could create an OpenAI GPT similar to what you seek, to learn about the pros and cons of the technology that can be leveraged for creating a Discourse AI bot. This requires a ChatGPT Plus subscription and probably access after being on a waitlist. I am not suggesting you build the bot this way, but you would gain a better understanding of what the AI can do and where it will fail miserably.



PDF or HTML (hosted online) but I’m fine with converting them to text/markdown and maintaining them as Discourse posts. We can write a script to make it easier.
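For illustration, a script of that kind might split a manual that has already been converted to Markdown into one section per heading and turn each section into a new-topic request for the Discourse API (`POST /posts.json` with `title`, `raw` and `category`, authenticated with the `Api-Key` and `Api-Username` headers). This is only a sketch: the heading convention, the function names, the forum URL and the category id are hypothetical, and the request is built but not sent.

```python
import json
import re
import urllib.request

# Assumption: each section of the converted manual starts with a "# " or "## " heading.
# Deeper headings ("### ...") are kept inside the section body.
HEADING = re.compile(r"^#{1,2} +(.+?)\s*$")


def split_sections(markdown_text):
    """Split Markdown text into (title, body) pairs; text before the first heading is dropped."""
    sections = []
    title, lines = None, []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            if title is not None:
                sections.append((title, "\n".join(lines).strip()))
            title, lines = match.group(1), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        sections.append((title, "\n".join(lines).strip()))
    return sections


def build_topic_request(base_url, api_key, api_username, title, raw, category_id):
    """Build (but do not send) a request that creates one topic via POST /posts.json."""
    payload = json.dumps({"title": title, "raw": raw, "category": category_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/posts.json",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Api-Key": api_key,
            "Api-Username": api_username,
        },
        method="POST",
    )
```

Passing each request to `urllib.request.urlopen(req)` would send it. Discourse's site settings enforce minimum title and post lengths by default, so very short sections may need to be merged before posting.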

Product manuals won’t be updated frequently but other content in the knowledge base will. Things like FAQs, frequent/trending issues, etc… would be updated more often. We post as much as possible publicly but some would make sense to keep private - not because it’s sensitive information but because it might look weird and confusing as a public post.

Yes, official product manuals are usually PDFs. For example…

For now, probably nothing. There’s too much risk letting it search the internet and getting results that aren’t true for us.

In the future, I could see value in giving the bot additional context about the customer’s own security system but that might never be possible due to privacy issues.

It seems to me that since it’s all based on RAG, the most important thing is giving us fine-grained control over how the bot searches the forum and prompts the AI. From what I’ve read, it seems like merefield’s Discourse Chatbot does that better today. I’m hoping to test that theory this week.

For us it’s always providing user/customer support by answering their questions or solving their problems. For example:

  1. Helping them figure out what to buy to secure their specific home, sometimes taking into consideration home security equipment they already have.
  2. Helping them install and use their system.
  3. Helping them troubleshoot when they have problems.
  4. Answering curiosity questions they have about how things work, what might be available in the future, etc…

The best examples are the real ones that get posted every day and our support team (or sometimes community members) answer.


Thank you!

I know this is just a simple link for you to share, but for the many others following in your footsteps, getting a support site to have an AI bot respond while letting others see the exchange of ideas in a public forum is cutting edge. I know companies are doing this in private, but you are doing it publicly. :slightly_smiling_face:

I was also happy to see the diagrams as that reaffirmed what I was thinking was correct.

I would not do this.

The bot will not actually search the forum the way text is searched for in a post. The keywords and related technologies you need to learn for this are vector databases and nearest neighbor search.
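As a rough illustration of nearest neighbor search: each post is turned into an embedding vector (for example by an embeddings API), and a question is answered by finding the posts whose vectors are closest to the question's vector. The sketch below does exact, brute-force search with cosine similarity over hand-made three-dimensional vectors. The post ids and vectors are invented for the example; a real vector database stores model-generated embeddings with hundreds or thousands of dimensions and typically uses an approximate index to stay fast at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vec, index, k=2):
    """Exact k-nearest-neighbor search; `index` maps a post id to its embedding."""
    ranked = sorted(index, key=lambda post_id: cosine_similarity(query_vec, index[post_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings; real ones come from an embedding model.
index = {
    "battery-replacement": [0.9, 0.1, 0.0],
    "wifi-setup": [0.1, 0.9, 0.2],
    "billing": [0.0, 0.2, 0.9],
}
```

With these toy vectors, `nearest_neighbors([0.8, 0.2, 0.1], index, k=2)` returns `["battery-replacement", "wifi-setup"]`: the posts are matched by closeness in vector space, not by shared words.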

To see code examples of such being used check out

Wishing you the best with that!


Normally I would not recommend a product one has to pay for; however, years ago I learned of

when answering this StackExchange question.

I have used the free online version a few times and, as I noted in my answer, I have not seen anything as good, and I have looked a lot over the years for ways to get text and other info out of a PDF.

I want to add some more info on this, as some may think that the reference (name of the manual, date of publication, section and paragraph) is part of the information used with RAG. I did not envision it that way, and I did not give some of the needed details. I did envision using metadata; for me, data and metadata have very different meanings and uses.
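One way to picture the data/metadata split: only the paragraph text is embedded, searched and placed in the prompt, tagged with a short source number, while the manual title, publication date, page and paragraph travel alongside as metadata and are used afterwards to turn those source numbers into references. A minimal sketch, with the field names, tag format and citation style chosen purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str        # data: what is embedded, searched and shown to the LLM
    title: str       # metadata below: used only to build the reference
    published: str
    page: int
    paragraph: int


def build_context(chunks):
    """Give the model only the text, each piece tagged with a source number."""
    return "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))


def format_references(chunks):
    """After generation, resolve each source number back to its metadata."""
    return [
        f"[{i}] {c.title} ({c.published}), p. {c.page}, para. {c.paragraph}"
        for i, c in enumerate(chunks, start=1)
    ]
```

For a retrieved chunk `Chunk("Press and hold the pairing button for 5 seconds.", "Hub User Manual", "2023-06", 12, 3)`, the prompt sees only `[1] Press and hold ...`, and the reference line is `[1] Hub User Manual (2023-06), p. 12, para. 3`.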

When I asked ChatGPT about this, it gave a reply along the lines I envisioned, but with a few more specifics such as Elasticsearch.


How can one use RAG (Retrieval Augmented Generation) with a PDF manual for a product and get the title, publication date, page and paragraph number from the manual as a reference with the noted information?


Using Retrieval Augmented Generation (RAG) with a PDF manual to extract specific information like the title, publication date, page, and paragraph number involves several steps. RAG, developed by Facebook AI, combines the power of retrieval (fetching relevant documents or data) with language generation, enabling the model to reference and incorporate external knowledge. Here’s a general approach to use RAG with a PDF manual:

  1. Convert the PDF Manual to a Text Format:

    • First, you need to convert the PDF manual into a text format that can be processed. Tools like Adobe Acrobat, pdftotext (part of the Xpdf project), or Python libraries such as PyPDF2 or PDFMiner can be used for this purpose.
  2. Preprocess the Text:

    • Once you have the text, you may need to preprocess it. This includes removing unnecessary formatting, handling special characters, and possibly segmenting the text into sections, pages, and paragraphs with identifiable markers.
  3. Indexing the Manual:

    • For RAG to work effectively, the content of the manual needs to be indexed. This means creating a database or a structured format where each piece of information (like a paragraph or a section) is stored with its corresponding metadata (page number, paragraph number, etc.). Elasticsearch or a similar search engine can be used for this purpose.
  4. Integrating RAG:

    • With your manual indexed, you can now integrate RAG. RAG typically combines a transformer-based language model (like BERT or RoBERTa) for retrieval and a sequence-to-sequence model (like BART or T5) for generation. You’ll need to set up RAG to query your indexed manual. This might involve customizing the retrieval component to work with your search engine.
  5. Querying the System:

    • When you query the RAG system, the retrieval component first finds the most relevant sections of the manual based on your query. This information is then passed to the generation component, which formulates a response.
  6. Extracting Reference Information:

    • To get the title, publication date, page, and paragraph number, you’ll need to ensure that these details are either included in the retrieved segments or that the system can reference back to the indexed data to fetch this metadata. This might require additional logic in your system.
  7. Post-Processing for References:

    • After the response is generated, you may need a post-processing step to format the references correctly, ensuring that the title, publication date, page, and paragraph number are accurately included.
  8. Fine-Tuning and Evaluation:

    • Depending on your specific requirements and the complexity of the manual, you might need to fine-tune the RAG model on similar types of documents or queries. Evaluate the system’s performance and adjust your approach as needed.
  9. Integration with User Interface:

    • Finally, if this system is meant for end-users, integrate it with a user-friendly interface where users can input queries and receive responses with the referenced information.

Remember that the effectiveness of this approach heavily depends on the quality of the indexed data, the capabilities of the retrieval model, and how well the generation model has been trained or fine-tuned for your specific use case.
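Steps 2 and 3 (segmenting and indexing with metadata) can be sketched in a few lines, assuming the PDF has already been extracted to one text string per page (for example with pypdf's `PdfReader` and `page.extract_text()`) and that paragraphs are separated by blank lines, which will not hold for every PDF. A small in-memory inverted index stands in for Elasticsearch here; the record layout and function names are hypothetical.

```python
import re
from collections import defaultdict


def segment_pages(pages, title):
    """Turn per-page text into paragraph records carrying page/paragraph metadata."""
    records = []
    for page_no, page_text in enumerate(pages, start=1):
        # Assumption: a blank line separates paragraphs in the extracted text.
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
        for para_no, para in enumerate(paragraphs, start=1):
            records.append({
                "title": title,
                "page": page_no,
                "paragraph": para_no,
                "text": " ".join(para.split()),  # collapse line breaks inside a paragraph
            })
    return records


def build_keyword_index(records):
    """Inverted index: lowercase word -> positions of the records containing it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for word in re.findall(r"[a-z0-9]+", rec["text"].lower()):
            index[word].add(pos)
    return index


def search(records, index, query):
    """Return records containing every query word, in document order."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [records[pos] for pos in sorted(hits)]
```

Because every record keeps its page and paragraph numbers, whatever the retriever returns can later be cited precisely (steps 6 and 7) without that metadata ever being part of the searched text.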

Note: While I do use ChatGPT to generate information, I do not post information from ChatGPT if I think it is wrong. In other words, ChatGPT can be useful, but you have to know whether the information given is correct. While I have not implemented exactly what ChatGPT describes here, I have enough experience with this that I would use this approach or a variation of it.


I set up a test forum for each plugin, Discourse AI and Discourse Chatbot.

The Discourse Chatbot plugin has a step where you create the embeddings, which uses the embeddings API from OpenAI. The Discourse AI plugin doesn’t seem to do this (it showed no embeddings activity on the OpenAI dashboard), but it still finds posts on our forum. If it doesn’t generate embeddings to search semantically, how does the Discourse AI plugin search the forum? Does it use regular text search? Does it compute its own embeddings?


Our AIBot will use hybrid search, combining both keyword and semantic search. If you didn’t enable embeddings properly, it will only use keyword search. To enable embeddings, follow Enable Related Topics. After that, it will slowly backfill in the background using the chosen model / provider.
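Hybrid search needs some way to merge the keyword ranking and the semantic ranking into a single list. One common technique is reciprocal rank fusion (RRF), sketched below. This illustrates the general idea only and is not a description of how the Discourse AI plugin actually combines its results; the topic ids are made up, and `k=60` is simply the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids: each id scores sum(1 / (k + rank)), rank starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for the same question from the two searches.
keyword_results = ["topic-12", "topic-7", "topic-3"]
semantic_results = ["topic-7", "topic-9", "topic-12"]
```

Here `reciprocal_rank_fusion([keyword_results, semantic_results])` gives `["topic-7", "topic-12", "topic-9", "topic-3"]`: topics found by both searches rise to the top. With only the keyword list (the no-embeddings case), the fused order is just the keyword order.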


Any news/updates on how it is working?

We decided to go with Discourse Chatbot because it has some features we need and Robert is helping us build a few additional features to get us all the way there. I’ll post a link when it’s up and running in case you want to try it but it will probably take at least a month.


Great to hear about that for your forum!

Thanks for the update.


Maybe RLHF is on the way out.



Not suggesting this be added but useful for ideas in the future.

“DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue” by Lang Cao (pdf)


For those who recognize the info, it was also posted here.


“RAG vs Fine-tuning: pipelines, tradeoffs, and a case study on agriculture” (pdf) by Aman Gupta, Anup Shirgaonkar, Angels Balaguer, Bruno Silva, Daniel Holstein, Dawei Li, Jennifer Marsman, Leonardo Nunes, Mahsa Rouzbahman, Morris Sharp, Nick Mecklenburg, Rafael Padilha, Ranveer Chandra, Renato Cunha, Roberto Estevão, Ryan Tsang, Sara Malvar, Swati Sharma, Todd Hendry, Vijay Aski, Vijetha Vijayendran, Vinamra Benara