Why is my AI forum helper struggling to answer questions?

A common misconception many of us have about AI is expecting it to work like this:

```mermaid
graph TD
    Question -->|"Expensive AI Magic"| Answer
    Answer["The perfect answer"]
```

It can feel extremely frustrating to spend $$$ on a flagship model and still end up with sub-par results.

In this topic I would like to spend some time adjusting expectations and pointing at various features Discourse AI ships to help get better answers.

How Discourse AI Actually Works

The reality is more nuanced. When you ask a question, this is what happens:

```mermaid
graph TD
    Question --> Consolidation[Question Consolidation]
    Consolidation --> RAG[RAG System]
    RAG --> ToolSelection[Tool Selection]
    ToolSelection --> Search[Search]
    ToolSelection --> Categories[Categories]
    ToolSelection --> Tags[Tags]
    Search --> Context[Gather Context]
    Categories --> Context
    Tags --> Context
    Context --> Answer[Generate Answer]
```

  1. Your question is first consolidated and understood
  2. Our RAG system searches through available knowledge (optional)
  3. Based on what it finds, it decides which tools to use (optional)
  4. The tools gather specific information
  5. Everything is combined to generate an answer
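
To make the flow concrete, here is a minimal runnable sketch of that pipeline in Python. Every function in it is an illustrative stand-in for the step it is named after, not the actual Discourse AI implementation:

```python
# A runnable sketch of the pipeline above; every function is an
# illustrative stand-in, not the actual Discourse AI implementation.

def consolidate_question(question: str, history: list[str]) -> str:
    # Step 1: real systems use an LLM to rewrite the question so it is
    # self-contained; here we just fold in recent conversation history.
    return " ".join(history[-2:] + [question])

def rag_search(uploads: list[str], query: str) -> list[str]:
    # Step 2 (optional): stand-in for semantic search over uploaded
    # document fragments.
    words = set(query.lower().split())
    return [f for f in uploads if words & set(f.lower().split())]

def select_tools(fragments: list[str]) -> list[str]:
    # Step 3 (optional): stand-in for the LLM choosing tools based on
    # what the RAG step found.
    return ["search"] if fragments else ["categories", "tags"]

def generate_answer(query: str, context: list[str]) -> str:
    # Step 5: stand-in for the final LLM call with all gathered context.
    return f"Answer to {query!r} built from {len(context)} context items"

def answer_question(question: str, history: list[str], uploads: list[str]) -> str:
    consolidated = consolidate_question(question, history)
    fragments = rag_search(uploads, consolidated)
    tools = select_tools(fragments)
    tool_results = [f"{tool}: ..." for tool in tools]  # step 4: run the tools
    return generate_answer(consolidated, fragments + tool_results)

print(answer_question("how are tags enabled", [],
                      ["Admin guide: tags are enabled in site settings"]))
```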

Context is Everything

When using AI systems to answer questions, context is key.

Large language models (LLMs) such as GPT-4 or Claude 3.5 Sonnet are trained on a huge amount of public data. However, there are two caveats that make them less than ideal at answering domain-specific questions:

  1. LLMs are trained on public data
  2. LLMs have a cut-off training date, meaning public data that is only a few months old is likely missing from the dataset

In the case of a closed forum, nothing from it will be in the dataset, making context even more key.

Context is information we feed an LLM before it answers a question, to guide it toward answering correctly.
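
In practice, "feeding context" often amounts to prepending the retrieved text to the prompt. Here is a minimal sketch of that pattern; the message layout is a common chat-prompt convention, not the exact prompt Discourse AI builds:

```python
def build_prompt(context_fragments: list[str], question: str) -> list[dict]:
    # Put retrieved forum content in the system message and keep the
    # user's question separate, a common chat-prompt convention.
    context = "\n\n".join(context_fragments)
    return [
        {"role": "system",
         "content": f"Answer using only the forum content below.\n\n{context}"},
        {"role": "user", "content": question},
    ]

messages = build_prompt(
    ["Topic 123: tagging is enabled under Admin > Settings."],
    "How do I turn on tags?",
)
print(messages)
```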

How do you get context?

Context can be supplied in a few ways:

  • Automatically using RAG (Retrieval-Augmented Generation) - When you upload documents to a persona, we can search them for answers prior to replying. We first consolidate your question and then search the content using semantic search (see the sketch after this list). This helps guide which tools to use.
  • Automatically using tools - LLMs can use tools (such as search or reading topics) to find additional information based on RAG guidance.
  • Manually - you may paste in a large amount of text and then ask an LLM questions about it.
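
Here is a minimal sketch of the semantic search step from the first bullet: embed the fragments and the consolidated question, then rank by cosine similarity. The `embed` function below is a toy bag-of-words stand-in so the example is self-contained; a real deployment calls an embedding model instead:

```python
import math

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words "embedding"; a real system calls an embedding
    # model here.
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_fragments(fragments: list[str], question: str, k: int = 2) -> list[str]:
    # Rank all fragments against the consolidated question and keep
    # only the closest k to supply as context.
    q = embed(question)
    return sorted(fragments, key=lambda f: cosine(embed(f), q), reverse=True)[:k]

print(top_fragments(
    ["Enable tags under admin settings",
     "Backups run nightly",
     "Tags group related topics"],
    "how do i enable tags",
))
```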

Tool based context

The Discourse Forum Helper uses tools for searching, reading topics and posts, and listing categories or tags.

When you ask Forum Helper a question, the model first uses RAG to understand what might be relevant, then decides which tools will help get better answers.

In this example, I asked the forum helper what I was up to; it in turn decided to search for content by Sam posted in the past week and got 21 results.
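
Under the hood this is a "function calling" exchange: the model is shown tool schemas and replies with the tool it wants plus arguments, which the framework executes and feeds back as context. A generic illustration; the schema and values below are made up and are not Forum Helper's exact tool definition:

```python
import json

# Illustrative tool schema in the common function-calling style; not
# Forum Helper's exact definition.
search_tool = {
    "name": "search",
    "description": "Search the forum for topics and posts",
    "parameters": {
        "type": "object",
        "properties": {
            "search_query": {"type": "string"},
            "user": {"type": "string", "description": "filter by author"},
            "after": {"type": "string", "description": "YYYY-MM-DD"},
        },
    },
}

# For a question like the one above, the model might reply with a call
# like this; the framework runs it and returns the 21 results as context.
tool_call = json.loads(
    '{"name": "search", "arguments": {"user": "sam", "after": "2024-11-25"}}'
)
print(tool_call["arguments"])
```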

To understand what new context the model got from this search, you can enable the setting: AI bot debugging allowed groups

This adds a debugging button at the bottom of every post; when you click on it, it shows the exact context that was supplied to the LLM.

The Discourse search tool is very rich in options:

This is both a blessing and a curse. LLMs often make sub-optimal choices if there are too many options on the table; we attempted to keep the number of options not… too high… but depending on your needs you may prefer to use a custom tool for search.

What do I do if I do not get the results I expect?

Given that context is everything, the first thing to confirm is:

  1. Did the RAG system find relevant content to guide the tools?
  2. Did the LLM use the right tools based on that guidance?
  3. Did the tools find the expected results?

For example, take this failure:

This is a classic failure of both RAG guidance and tool usage.

The LLM searched for:

```
bug critical urgent broken order:latest status:open status:public
```

The keyword component of this search is pretty poor, yielding a single result; semantic search then has a very hard time finding the right results because any time someone yells loudly it treats the post as urgent.

What would probably yield a better result is:

Find all open bugs ordered by op_likes that were reported in the past 2 weeks. However, this particular subset of information is not available to the search function as is; it would require a custom tool.
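
If you go the custom tool route, the idea is that the tool runs exactly the query you want instead of letting the model compose one from loose keywords. Here is a rough sketch against the public /search.json endpoint; the endpoint is real, but the response field names used below ("topics", "like_count") are assumptions to verify against your instance, and the client-side sort only approximates ordering by OP likes:

```python
from datetime import date, timedelta

import requests  # third-party: pip install requests

def recent_open_bugs(base_url: str) -> list[dict]:
    # Run a fixed, known-good query instead of letting the model
    # compose one from loose keywords.
    since = (date.today() - timedelta(days=14)).isoformat()
    query = f"#bug status:open after:{since}"
    resp = requests.get(f"{base_url}/search.json",
                        params={"q": query}, timeout=10)
    resp.raise_for_status()
    topics = resp.json().get("topics", [])
    # The search endpoint cannot order by OP likes, so sort client-side;
    # "like_count" is an assumed field name, verify it on your instance.
    return sorted(topics, key=lambda t: t.get("like_count", 0), reverse=True)

bugs = recent_open_bugs("https://meta.discourse.org")
print(len(bugs), "open bugs from the past two weeks")
```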

Given the large variance and nuance here, the first thing to do is to monitor existing interactions users have with your bots and collect as many failures as you can.

What do I do with failed cases?

When you have a failed interaction you have a few options:

  1. You can improve the system prompt to give the bot better hints
  2. You can document information on the forum to give the bot better search results - you can prioritize your documentation category so it is more likely to find it
  3. You can upload a document with more context; we will split the document into fragments and supply the fragments closest to the consolidated question to help ground the bot (see the splitting sketch after this list).
  4. If your persona is predominantly about search, force usage of the search tool.
  5. You can develop custom tools to better meet your needs
  6. You can increase the amount of context you supply to the LLM (more search results, etc.)
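
For option 3, grounding quality depends a lot on how the document gets split. A toy sketch of fragment splitting with overlap follows; the chunk size and overlap are made-up numbers, not Discourse AI's actual settings:

```python
def split_into_fragments(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    # Naive character-based chunking with overlap so sentences that
    # straddle a boundary still appear whole in one fragment. Real
    # systems usually split on tokens; these numbers are illustrative.
    step = size - overlap
    return [text[start:start + size]
            for start in range(0, max(len(text) - overlap, 1), step)]

doc = "Our refund policy: refunds are issued within 14 days of purchase. " * 40
fragments = split_into_fragments(doc)
print(len(fragments), "fragments; only the closest few are sent to the LLM")
```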

AI can be fuzzy: solving one problem can create another one

Be mindful that getting the “perfect” AI question answering bot is a journey. Creating monitored feedback loops and evaluating how well it does regularly will allow you to improve it over time.

Be mindful of how the tech is built: if you need the context of 10,000 posts to answer a question, it may be infeasible to feed the entire list of posts into the context with current context windows.
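
A quick back-of-the-envelope check shows why; the 200 tokens per post figure is a rough assumption:

```python
posts = 10_000
avg_tokens_per_post = 200          # rough assumption for forum posts
total_tokens = posts * avg_tokens_per_post
print(f"{total_tokens:,} tokens")  # 2,000,000 tokens
# Flagship context windows today are roughly 128k to 200k tokens, an
# order of magnitude too small, so retrieval has to narrow the set first.
```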

The key is understanding that getting better results is an iterative process.
