Building a technical support chatbot

Adding an AI chatbot to Discourse is easy (thanks to 2 great plugins). Adding a chatbot that performs technical support is a lot harder! This post shares our experience setting up a technical support chatbot for support.suretyhome.com - what we wanted, problems we had, how we solved them, and where we’ll go from here.

Our support team is only available during regular business hours, but customers want help around the clock. We aren’t trying to replace the support team. Our goal is to augment it with a bot that:

  • Is available 24/7, including nights and weekends, just like our forum
  • Responds immediately, while our human support team takes a little longer
  • May be able to answer questions that users couldn’t answer with a forum search

Here is our experience.

Picking A Plugin

There are two really good plugins that provide an AI Chatbot.

  1. Discourse AI
  2. Discourse Chatbot

The Discourse AI plugin is the official AI plugin from the Discourse development team. It includes a chatbot as well as other AI features. The Discourse Chatbot plugin is only a chatbot. It was created before Discourse AI and focuses on doing that one thing well.

Initially, we had no idea which one to use, so I asked the question here to get some advice.

We received a lot of great help. We ended up going with Discourse Chatbot because it’s more flexible as a chatbot, with more bells and whistles to customize. Our use case had some specific demands that didn’t yet seem doable with Discourse AI. Either can be a great choice; which one is right for you depends on your forum’s specific needs.

Initial Setup

The initial setup for Discourse Chatbot can be a bit of a project because there are a lot of options to choose from and customizations you can make. Follow the setup instructions carefully and make sure to look at all the settings.

Our goal was to provide a chat-like experience, so we only wanted the bot to work in Discourse Chat, not in public topics or PMs. The first steps we had to take were:

  • Set up Discourse Chat (Discourse Chatbot depends on it)
  • In Discourse Chatbot settings, chatbot permitted in chat: enabled

Prompt Engineering

Discourse Chatbot is incredibly customizable. Anything that isn’t a setting is customized under Discourse’s Customize > Text. This is where you do all your prompt engineering. In Customize > Text, search for chatbot.prompt to filter down to all the customizable prompt text.

To make the bot behave the way we want, we need to edit the system prompt. But there are two of them: one for public and one for private discussions. Since we’re only using the bot in private chat channels, we needed to edit chatbot.prompt.system.rag.private.

Because it’s a technical support bot, we need it to be more conservative and accurate than LLMs tend to be out of the box. Our system prompt needed to be relatively long to accomplish this. In the system prompt, give the LLM instructions and context that answer questions such as the following (an illustrative prompt sketch follows the list):

  • Who is your bot? What role should it play?
  • What background info or context does it need to know?
  • What topics is it supposed to discuss? What topics should it never discuss?
  • What writing style or tone should it use?
  • What should it do when the user is frustrated?
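
For example, here’s a condensed sketch of the kind of system prompt we mean. The bot name is hypothetical and this isn’t our production prompt:

```
You are SuretyBot, a technical support assistant for the Surety forum.
Only discuss our products and home security topics; politely decline
anything else. Always search the forum before answering. If you aren't
confident in an answer, say so rather than guessing. Use a friendly,
concise tone. If the user seems frustrated, offer to escalate the chat
to our human support team.
```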

In addition to this general prompt engineering, the system prompt is also a place where you can try to solve problems you discover during testing. If you find your bot makes an egregious error, you might be able to fix it by adding instructions to the system prompt. But beware: prompting is merely a suggestion to the LLM. You’re not programming it; you’re only asking it to behave a certain way. It might not listen.

Temperature and Top P

Another tool for making the bot more conservative and less likely to make things up is the temperature setting. By default, the temperature setting is 100, which is 50% of the maximum temperature. You can reduce it to make the bot more conservative and deterministic, and less likely to make mistakes. But when the temperature is set very low (like 0), the LLM doesn’t sound nearly as impressive. It’s a tradeoff decision you’ll have to make.

In addition to temperature, there is the Top P setting. You probably won’t need this but it’s there if you do. Refer to the OpenAI docs for more information.

The Discourse Chatbot settings for this section are:

  • chatbot request temperature
  • chatbot request top p
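
To make this concrete, here’s a minimal sketch, using the OpenAI Python SDK, of the request these settings shape. Assuming the plugin’s 0-200 scale maps onto the API’s 0-2.0 range, a setting of 100 corresponds to temperature=1.0:

```python
# Illustrative only - roughly the request that Discourse Chatbot makes
# on your behalf, not the plugin's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a cautious technical support bot."},
        {"role": "user", "content": "How do I silence my panel's low-battery beep?"},
    ],
    temperature=0.5,  # lower = more conservative and deterministic
    top_p=1.0,        # usually adjust temperature or top_p, not both
)
print(response.choices[0].message.content)
```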

Problem: LLM Inaccurate And Outdated

The LLM was trained on a vast amount of general data, some time ago. We need it to have the most up-to-date and specific information about our forum and be as accurate as possible. The solution is Retrieval-Augmented Generation (RAG).

RAG will search the forum for additional information before replying to the user. Since this is a technical support bot, we can’t just rely on the LLM’s trained knowledge; we need the bot to search our forum for technical information before replying.

In order to do RAG, Discourse Chatbot needs to create a database of “embeddings” that represent each of our forum posts as a vector of semantic “features”. This needs to be enabled, but I recommend waiting to enable it until after you’ve set up your embeddings strategy, which we cover in the next section.

The Discourse Chatbot settings for this section are:

  • chatbot bot type high trust: RAG
  • chatbot bot type medium trust: RAG
  • chatbot bot type low trust: RAG
  • chatbot embeddings enabled: enabled (after setting up embeddings strategy)
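
As a rough illustration of what an embedding is, here’s how a single post might be turned into a vector using the OpenAI embeddings API (the model name is just an example):

```python
# Each KB post becomes a vector of floats capturing its semantic "features".
from openai import OpenAI

client = OpenAI()

post_text = "To stop the low-battery beep, replace the sensor battery and re-arm the panel."
result = client.embeddings.create(
    model="text-embedding-ada-002",  # example model; yours may differ
    input=post_text,
)
vector = result.data[0].embedding
print(len(vector))  # e.g. 1536 dimensions for this model
```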

Problem: Many Forum Posts Are Not Useful

Having the bot search your forum before replying (RAG) is great but it introduces a new problem. Many of the posts on a forum aren’t very useful. Some posts are useful, and those are the ones you want the bot to find, but many are just conversational, confusing, or flat out wrong. Our solution is to curate a knowledge base (KB) of only posts that we want the bot to find.

To do this in Discourse Chatbot, we use the categories embeddings strategy. The embeddings strategy determines which posts are available to the chatbot when it searches the forum. Our approach is to have a single, non-public category serve as our chatbot knowledge base, so we choose categories as the embeddings strategy. The knowledge base category needs to be identified in the embeddings categories setting as well.

We use a non-public category (only visible to staff) because we’re duplicating a lot of public topics in this category and we don’t want users to see the duplicates. Duplicating topics into a private category introduces a couple of other problems.

  1. It’s a lot of effort to copy them and even worse to maintain the copies when topics get updated or replied to.
  2. The topics that the bot finds when it searches the forum aren’t public topics that users should be given links to as a reference. We need to send the LLM links to the public topics.

To solve these two problems, we created minimal tools to help us copy topics from public categories into our private chatbot KB category and keep the KB copies updated when public topics are updated. They’re available here if you’d like to use the same approach we’re using.

Our KB tools import an entire public topic into the private KB category as a new topic with only one post, concatenating all the public topic’s posts together and adding a link to each public post before its content. This way, when the bot finds the private KB post in a RAG search, it gets the content of the whole public topic along with URLs to the public posts, which it can include in the reply as references.
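
Here’s a rough sketch of that flattening step. The function and field names are hypothetical, for illustration only - this isn’t the actual KB tools code:

```python
# Flatten a public topic into a single KB post body, prefixing each
# public post's content with a link back to it.
def build_kb_post(posts: list[dict]) -> str:
    sections = [f"{post['url']}\n{post['content']}" for post in posts]
    return "\n\n".join(sections)

kb_body = build_kb_post([
    {"url": "https://support.example.com/t/beeping/42/1", "content": "My panel beeps."},
    {"url": "https://support.example.com/t/beeping/42/2", "content": "Replace the battery."},
])
print(kb_body)
```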

The Discourse Chatbot settings for this section are:

  • chatbot embeddings strategy: categories
  • chatbot embeddings categories: (your private KB category)
  • chatbot forum search function include topic titles: disabled (default)

Additionally, you need to remove some interpolation keys from the forum search prompt because they’re not relevant when the posts are in a private category.

  • chatbot.prompt.function.forum_search.answer.topic.each.post: Remove %{username} and %{date}
  • chatbot.prompt.function.forum_search.answer.topic.each.topic: Remove %{title} and %{url}

Problem: LLM Won’t RAG Search Enough

Now that we have a private category of curated posts, filled with only good information, to serve as our chatbot knowledge base, everything should be fine, right? Wrong. The next problem we ran into was that the LLM wasn’t using the RAG search function nearly enough.

LLMs such as GPT-4 are just smart enough to be dangerous. They often “think” they know the answer already and don’t need to ask for help when in reality they should do a RAG search and look for the answer in the knowledge base. To solve this, we use the tool choice option and force the LLM to call a function.

We could force a local forum search but we found that just forcing a function call is enough, and we want to give the LLM the freedom to sometimes call another function instead of doing a forum search.

The Discourse Chatbot settings for this section are:

  • chatbot tool choice first iteration: force_a_function
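
At the OpenAI API level, forcing a function call looks roughly like this. The tool schemas are illustrative assumptions, not the plugin’s actual definitions, and tool_choice="required" assumes an API version that supports it:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative tool definitions, loosely mirroring the plugin's functions.
tools = [
    {
        "type": "function",
        "function": {
            "name": "local_forum_search",
            "description": "Search the forum knowledge base for relevant posts.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "escalate_to_staff",
            "description": "Hand the conversation over to human support staff.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "My panel keeps beeping at night."}],
    tools=tools,
    tool_choice="required",  # must call some function, but may choose which one
)
print(response.choices[0].message.tool_calls[0].function.name)
```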

Problem: Forum Search Performance

With function calls forced, the LLM reliably does a RAG search for almost every reply. But we were still getting bad results: the bot wasn’t finding the right posts in the knowledge base. With so much information in our posts, there was a lot of “noise” interfering with the search function’s ability to find the best post.

For example, imagine the user is asking the bot how to stop their HP LaserJet printer from beeping. The LLM might do a forum search with the query “HP LaserJet stop beeping”. There might be a post in the knowledge base that perfectly addresses that problem, but the “question” portion of it (which would closely match the query) is only 2% of the text in the post. The remaining 98% is the troubleshooting steps and answers.

Discourse Chatbot’s local forum search is a semantic search that uses the vector embeddings to find the post (or few posts) most similar to the query. The question portion of the best post is very similar to the query, but it’s only 2% of the overall text. The remaining 98% makes the post not-so-similar to the query, so the post doesn’t rank high in a search for that query.

Our solution to this problem is to add “bait posts” to our knowledge base topics, which contain only text similar to the search query. In our example above, we might add a bait post with just “HP LaserJet beeping”. The local forum search function will easily find the bait post because it’s very similar to the query. Then we have Discourse Chatbot send the real post content, which includes the answer, back to the LLM instead of the bait post.

Because our KB topics have all their content in the first post, we’re able to use replies in KB topics as bait posts. After we’ve used our KB tools to import topics into the knowledge base, we can just reply to KB topics to create bait posts.

The Discourse Chatbot settings for this section are:

  • chatbot forum search function results content type: topic
  • chatbot forum search function results topic max posts count strategy: just_enough
  • chatbot forum search function results topic max posts count: 1

Bait posts provide a powerful mechanism for optimizing the knowledge base for search, but you’re stuck doing it blindly. You can’t see the searches as they happen, so it’s hard to tell exactly how your bait posts are impacting search rankings. To help solve this problem, we created a small program that performs the same semantic search that Discourse Chatbot uses but does it locally on your computer and shows all the details, such as similarity scores and rankings. This makes it much easier to create bait posts that actually improve search performance and optimize the knowledge base. A conceptual sketch of that search is below.
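
Conceptually, the search embeds the query and ranks each KB post by cosine similarity. Here’s a minimal sketch of that idea - it mimics what the search does, not the plugin’s actual implementation:

```python
# Rank KB posts by cosine similarity to the query embedding. A bait post
# scores high because nearly all of its text resembles the query.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_posts(query_vec: np.ndarray, post_vecs: dict[int, np.ndarray]):
    """Return (post_id, score) pairs, best match first."""
    scores = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in post_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```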

The combination of RAG, a curated knowledge base, forcing the LLM to call a function, and bait posts finally resulted in a pretty good technical support chatbot! :boom: :tada: :partying_face:

Problem: LLM Hallucinates URLs

Although the bot was doing well answering technical questions, it still had an annoying hallucination problem. It would frequently hallucinate URLs in replies to the user. This is a known problem with LLMs and the general consensus is you just have to deal with it. We didn’t want our users to deal with it.

Because our bot is providing technical support, we rely heavily on RAG to provide the LLM with accurate, up-to-date information. We force it to do a RAG search before every reply. We rely on the LLM to “understand” and communicate with the user, but we rely almost entirely on our knowledge base for the technical information used to answer their questions. We can exploit this to stop the bot from hallucinating URLs.

Our solution is to add a constraint to the bot so it can only include a URL in its reply if that URL came from a forum search result. If the LLM tries to include a URL that wasn’t already in a forum search result then Discourse Chatbot will tell the LLM about the problem and ask it to try again. This simple workaround has effectively eliminated URL hallucinations.

The Discourse Chatbot settings for this section are:

  • chatbot url integrity check: enabled
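
Conceptually, the check collects the URLs that appeared in the search results and rejects any reply containing a URL outside that set. A sketch of the idea (not the plugin’s actual code):

```python
import re

URL_RE = re.compile(r"https?://\S+")

def reply_urls_ok(reply: str, search_results: str) -> bool:
    """True if every URL in the reply already appeared in the search results."""
    allowed = set(URL_RE.findall(search_results))
    return all(url in allowed for url in URL_RE.findall(reply))

# When this kind of check fails, the plugin tells the LLM about the
# problem and asks it to try again.
```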

Problem: Chatbot Can’t Handle Everything

Some support questions can’t possibly be handled by the chatbot because they require an action to be taken. For example, if an account change needs to be made or a product needs to be returned (which requires a refund and/or an authorization) then the bot can’t do it.

It also depends on complexity. Many questions are straightforward - if the user had just searched the forum with the right query they would have found the answer. In that case, the bot is pretty good at searching the forum, finding the answer, and presenting it to them. But when the problem requires investigation or complex analysis, the bot often gives a generic answer that doesn’t add much value.

In either of these situations where the bot can’t handle the problem, we need it to escalate the chat to our human support team. Discourse Chatbot has a function for the LLM to do exactly that, called escalate to staff.

The Discourse Chatbot settings for this section are:

  • chatbot escalate to staff function: enabled
  • chatbot escalate to staff groups: (the group you want escalated to)

We’re already forcing the LLM to call a function before replying but now there is a question of which function to call. Should the LLM call the local forum search function (RAG) or the escalate to staff function? It has to make this decision every single time a user sends it a message. Most of the time we want it to call local forum search. We only want it to escalate to staff when it can’t handle the problem, it notices the user is frustrated, or the user explicitly requests it.

We use prompting to guide the LLM on how to decide which function to call. The prompt texts you can edit for this under Customize > Text are:

  • chatbot.prompt.system.rag.private
  • chatbot.prompt.function.forum_search.description
  • chatbot.prompt.function.escalate_to_staff.description
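
For example, the escalate to staff function description might read something like this (illustrative wording, not the plugin’s default text):

```
Escalate the conversation to human support staff. Call this only when the
user explicitly asks for a human, is clearly frustrated, or has a problem
that requires an action you cannot take, such as an account change or a
return. Otherwise, prefer searching the forum.
```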

Problem: Can’t Monitor Chats

When users are actually using the chatbot, it’s helpful to see the conversations and watch for bad answers or opportunities to improve. But Discourse doesn’t provide a way for admins to read chats, which makes sense because chats are typically private conversations between people. Chats with the support bot, however, are not private conversations, and if we can’t review them then we can’t continually improve the bot.

The good news is Discourse offers admins a way to export the forum’s entire chat history as a CSV file.

To solve this problem we created a small program that converts the chat history CSV file into a bunch of HTML files, one for each user who chatted with the bot. We can’t watch bot usage in real time but with this solution we can periodically export the chat history, convert it to HTML files, and review them to work on improving our bot.
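
Here’s a condensed sketch of what such a converter can look like. The CSV column names (“username” and “message”) are assumptions - check your export’s actual headers:

```python
# Group exported chat messages by user and write one HTML file per user.
import csv
import html
from collections import defaultdict
from pathlib import Path

chats: dict[str, list[str]] = defaultdict(list)
with open("chat_messages.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        chats[row["username"]].append(row["message"])

out_dir = Path("chat_html")
out_dir.mkdir(exist_ok=True)
for user, messages in chats.items():
    body = "\n".join(f"<p>{html.escape(m)}</p>" for m in messages)
    (out_dir / f"{user}.html").write_text(
        f"<html><body><h1>{html.escape(user)}</h1>{body}</body></html>",
        encoding="utf-8",
    )
```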

Wrapping It Up

After dealing with each of these problems and curating the knowledge base sufficiently, we were finally able to let users start using the chatbot.

So far the results have been mixed. It’s exciting when we see people using the chatbot during nights and weekends, getting their questions answered and problems solved. It’s eye-opening when we see people trying to use the chatbot and not having a great experience. It usually means something is missing or unclear in our knowledge base. On occasion we see a problem that requires a system prompt improvement. And sometimes it’s clear that there is confusion and we need to improve how we present the chatbot to the user.

We’ve seen about half of chats require an escalation to staff. About half of those escalated are cases where the bot could not have possibly handled the situation. The other half (about 25% of chats) are cases where the bot could have resolved the issue but failed, which are opportunities for improvement.

Of the chats that don’t result in an escalation to staff, it can be hard to tell whether the user actually resolved their issue or just gave up and left. When the bot gives a wrong answer, it’s obvious what needs to be improved. When it gives reasonable, good answers, it’s not always obvious whether the user fully understood and their problem was solved, unless they tell us.

Overall, we’re happy with this first iteration of the Surety support chatbot and are looking forward to the LLM improving as well as our knowledge base getting better over time. Working on the knowledge base is our biggest job now.

Discourse Chatbot has added support for hybrid search (both semantic and text search) since we started working on this project so we’ll probably experiment with that soon.

Thanks to @merefield for all his hard work on the Discourse Chatbot plugin! It’s been a lot of fun to work with and has proven to be up to the task.

If anyone else decides to build a technical support chatbot for their forum please reach out and let me know! It would be great to have others to collaborate with and bounce ideas off. I’ll update again as interesting things happen, changes are made, or we learn something new.

FYI this will be implemented in Discourse AI per:

I agree that this is a pretty important feature for RAG; LLMs are lazy and often refuse to search.
