Improving quality of search filters in Discourse AI

Firstly, this is truly incredible - thanks so much! :heart_eyes:

I have a question about a persona I created with these Enabled Commands: Search, Categories, Read, Tags.

I have also uploaded a .txt file. My goal with this persona is to have it always search my forum and the uploads, and do RAG with both.

After uploading the .txt file, I noticed that it stopped searching my forum. This is what I have in my Base Search Query: #docs #faqs #anking-wiki
#docs and #faqs are categories while #anking-wiki is a tag.


  • Do I actually need the categories and tags commands? I see these commands just list the tags and categories from my instance. It seems like maybe the search command should be enough.
  • Is my base search query OK?
  • Why did the bot stop searching my forum?
    • Is the solution simply to adjust my prompt to clarify that it should always use the search function?

Here is the prompt as seen in the debugging info:

You are a helpful Discourse assistant for AnkiHub users.
You understand and generate Discourse Markdown.
You live in a Discourse Forum Message.
You live in the forum with the URL:
The title of your site: AnkiHub Community
The description is: A community forum for the AnkiHub and AnKing projects
The participants in this conversation are: gpt4o_bot, andrew
The date now is: 2024-06-07 23:32:13 UTC, much has changed since you were trained.
You were trained on OLD data, lean on search to get up to date information about this forum
When searching try to SIMPLIFY search terms
Discourse search joins all terms with AND. Reduce and simplify terms to find more results.
The following texts will give you additional guidance for your response.
We included them because we believe they are relevant to this conversation topic.
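
The advice in the prompt about simplifying terms follows from AND semantics: every extra term is another condition a topic must satisfy. A toy sketch of that effect is below. It is not how Discourse search is implemented, and the topic titles are invented for illustration.

```python
def matches_all_terms(query: str, text: str) -> bool:
    """Return True only if every query term appears in the text (AND semantics)."""
    words = set(text.lower().split())
    return all(term in words for term in query.lower().split())


# Hypothetical topic titles, made up for this illustration.
TOPICS = [
    "missing images after update",
    "images not syncing",
    "how to update the add-on",
]


def toy_search(query: str) -> list:
    """Return the titles that contain every term of the query."""
    return [title for title in TOPICS if matches_all_terms(query, title)]


# A long query demands more matching terms, so it finds fewer topics.
print(toy_search("missing images after ankihub version update issue"))
print(toy_search("missing images"))
print(toy_search("images"))
```

Each term that is dropped can only keep or widen the result set, which is why short queries tend to find more.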

I tried adding “Be sure to always use the search function” to my prompt. This caused it to indeed use the search function, but with poor results:
Missing Images After AnkiHub Version Update Issue - AI Conversation - AnkiHub Community

It apparently did 3 searches and wrongly reported “Found 0 results” all three times. If you click the search results links, you can see there were, in fact, results. Further, you can see that the parameters in my “Base Search Query” weren’t used, so it searched the whole forum instead of the specific categories.

I think part of the problem here is that our search is a bit confusing:

Searches for bug AND feature - test

What you want here is:


I’m confused. This doesn’t work as I expect it to. I expected the search to return results only from the specified categories. However, the search returns results from any category. The first result from this search is in the theme-component category.


True, this is looking buggy. Oh no, I will have a look.


This is fixed per:

I agree it is intuitive to support #bug #feature something, but this is not quite supported yet and needs a bit of discussion.

categories:bug,feature something will work now, and category:bug,feature something will as well.
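
A minimal sketch of composing a query in that supported form. The helper name `build_search_query` is hypothetical and not part of Discourse or discourse-ai; only the `categories:a,b terms` syntax comes from the post above.

```python
from typing import Optional, Sequence


def build_search_query(terms: str, categories: Optional[Sequence[str]] = None) -> str:
    """Prefix free-text terms with a categories: filter when slugs are given."""
    parts = []
    if categories:
        # Category slugs are comma separated, with no spaces in between.
        parts.append("categories:" + ",".join(categories))
    if terms.strip():
        parts.append(terms.strip())
    return " ".join(parts)


print(build_search_query("something", ["bug", "feature"]))
# prints: categories:bug,feature something
```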


Awesome! Thanks for getting that fix out so quickly. :bowing_man:

While the search now works properly, the Discourse AI Bot still doesn’t seem to work as expected. Here are my observations with 3.3.0.beta3-dev (a1d881f625):

  • Doesn’t use the base_query option
    • It seems to ignore it entirely. Is my config valid?
      [Screenshot: CleanShot 2024-06-13 at 11.10.12]
    • AFAICT, there are no requirements for the values of the base search query, but no matter what I put in there, it isn’t appended to search_query.
  • Search results aren’t added to the context for RAG
    • when the search tool is called (albeit, without the base_query), the results don’t seem to be used in the LLM call.
    • Am I misunderstanding how this tool should work? I expected it to do a Discourse search and use the search results (i.e., the contents of the found topics) for RAG.
      See this request for example:
Request JSON
{"model":"gpt-4","messages":[{"role":"system","content":"You are a friendly and helpful assistant for Anki users specializing in the AnkiHub add-on and web app.\nYou _understand_ and **generate** Discourse Markdown.\nYou live in a Discourse Forum Message.\n\nYou live in the forum with the URL:\nYou answer questions specifically about using Anki with AnkiHub.\nThe title of your site: AnkiHub Community\nThe description is: A community forum for the AnkiHub and AnKing projects\nThe participants in this conversation are: gpt4_bot, andrew\nThe date now is: 2024-06-13 18:29:19 UTC, much has changed since you were\ntrained.\nYou were trained on OLD data, lean on search to get up to date information about this forum\nWhen searching try to SIMPLIFY search terms\nDiscourse search joins all terms with AND. Reduce and simplify terms to find more results."},{"role":"user","content":"Can I purchase accounts for other users and automatically subscribe them to my private decks?","name":"andrew"},{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"arguments":"{\"search_query\":\"purchase accounts for other users subscribe private decks\"}","name":"search"},"id":"call_fmzwIbU4ravFRKWRsOQlqGBq"}]},{"role":"tool","tool_call_id":"call_fmzwIbU4ravFRKWRsOQlqGBq","content":"{\"column_names\":[\"title\",\"url\",\"username\",\"excerpt\",\"created\",\"category\",\"likes\",\"topic_views\",\"topic_likes\",\"topic_replies\",\"tags\"],\"rows\":[[\"👪 Organizations\",\"/t/organizations/166454/1\",\"andrew\",\"\\u003ca name=\\\"what-are-ankihub-organizations-1\\\" class=\\\"anchor\\\" href=\\\"#what-are-ankihub-organizations-1\\\"\\u003e\\u003c/a\\u003eWhat are AnkiHub Organizations\\nAnkiHub Organizations allow Organization Owners to purchase AnkiHub for other users and invite those users to the Organization. 
\\nAnkiHub organizations work like this and have the following features: \\n\\u003ca name=\\\"linked-decks-2\\\" class=\\\"anchor\\\" href=\\\"#linked-decks-2\\\"\\u003e\\u003c/a\\u003eLinked Decks\\nUsers in your organization can automatically get access \\u0026hellip;\",\"2024-02-08T17:44:27.842Z\",\"🎓 Docs \\u003e 🤝 Create \\u0026 Collaborate\",1,166,1,1],[\"How can I get my school to buy AnkiHub for my class?\",\"/t/how-can-i-get-my-school-to-buy-ankihub-for-my-class/228569/1\",\"andrew\",\"Some students have successfully gotten their schools to sponsor AnkiHub memberships for their students. To do so, your school must create an AnkiHub Organization: \\n\\n\\n:bulb: Here are some tips for getting your school to purchase AnkiHub: \\n\\nConnect with your school’s student government people or med e\\u0026hellip;\",\"2024-05-23T21:40:37.477Z\",\"❓ FAQs\",0,50,0,1],[\"What happens after my AnkiHub subscription ends?\",\"/t/what-happens-after-my-ankihub-subscription-ends/167190/1\",\"Ahmed7\",\"If your subscription to AnkiHub runs out and you do not wish to renew. \\nAll your decks will stay, nothing will be removed. However, you will no longer be able to sync with AnkiHub and receive new changes to your decks, nor will you be able to suggest new changes or notes to any pre-existing decks yo\\u0026hellip;\",\"2024-02-10T06:14:53.787Z\",\"❓ FAQs\",0,307,0,1],[\"What if I can't afford AnkiHub?\",\"/t/what-if-i-cant-afford-ankihub/167280/1\",\"Ahmed7\",\"AnkiHub has a scholarship program to get access at a decreased price. Apply \\u003ca href=\\\"\\\"\\u003ehere \\u003c/a\\u003e. 
(You must \\u003ca href=\\\"\\\"\\u003esign up \\u003c/a\\u003efor an AnkiHub account before applying).\",\"2024-02-10T13:58:24.842Z\",\"❓ FAQs \\u003e AnKing Decks\",0,445,0,1,\"anking-wiki\"],[\"Does AnkiHub offer discounts?\",\"/t/does-ankihub-offer-discounts/228564/1\",\"andrew\",\"The are currently two ways to get a discount on AnkiHub: \\n\\nPurchase our \\u003ca href=\\\"\\\"\\u003eannual billing option\\u003c/a\\u003e for one free month per year\\n\\nAnkiHub is available at $4.58/user/month with an annual billing discount. We can’t afford to discount AnkiHub any further, unfortunately.\\n\\n\\nParticipate in an Ambassador program\\n\\n\\u0026hellip;\",\"2024-05-23T21:34:32.829Z\",\"❓ FAQs\",0,90,0,1]],\"args\":{\"search_query\":\"purchase accounts for other users subscribe private decks\"}}","name":"search"}],"stream":true,"stream_options":{"include_usage":true},"tools":[{"type":"function","function":{"name":"search","description":"Will search topics in the current discourse instance, when rendering always prefer to link to the topics you find","parameters":{"type":"object","properties":{"search_query":{"description":"Specific keywords to search for, space separated (correct bad spelling, remove connector words)","type":"string"},"user":{"description":"Filter search results to this username (only include if user explicitly asks to filter by user)","type":"string"},"order":{"description":"search result order","type":"string","enum":["latest","latest_topic","oldest","views","likes"]},"limit":{"description":"limit number of results returned (generally prefer to just keep to default)","type":"integer"},"max_posts":{"description":"maximum number of posts on the topics (topics where lots of people posted)","type":"integer"},"tags":{"description":"list of tags to search for. 
Use + to join with OR, use , to join with AND","type":"string"},"category":{"description":"category name to filter to","type":"string"},"before":{"description":"only topics created before a specific date YYYY-MM-DD","type":"string"},"after":{"description":"only topics created after a specific date YYYY-MM-DD","type":"string"},"status":{"description":"search for topics in a particular state","type":"string","enum":["open","closed","archived","noreplies","single_user"]}},"required":[]}}},{"type":"function","function":{"name":"read","description":"Will read a topic or a post on this Discourse instance","parameters":{"type":"object","properties":{"topic_id":{"description":"the id of the topic to read","type":"integer"},"post_numbers":{"description":"the post numbers to read (optional)","type":"array","items":{"type":"integer"}}},"required":["topic_id"]}}}]}

You can see that the search tool was called but doesn’t contain the base_query.

You can also see that content contains only the default system prompt, without any of the search results.
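
For reference when reading a request like this one: in this message format, tool output travels in the message whose role is "tool", not in the system message. Below is a minimal sketch of pulling the rows out of such a request. The sample is heavily abbreviated from the JSON above (fewer columns, two rows), and the helper name `tool_results` is hypothetical.

```python
import json


def tool_results(messages: list) -> list:
    """Collect rows from every role == "tool" message as dicts keyed by column name."""
    records = []
    for message in messages:
        if message.get("role") != "tool":
            continue
        payload = json.loads(message["content"])
        columns = payload["column_names"]
        for row in payload["rows"]:
            # zip stops at the shorter sequence, so a row without a trailing
            # "tags" value simply has no "tags" key.
            records.append(dict(zip(columns, row)))
    return records


# Abbreviated stand-in for the request shown above.
messages = [
    {"role": "system", "content": "You are a friendly and helpful assistant..."},
    {
        "role": "tool",
        "name": "search",
        "content": json.dumps({
            "column_names": ["title", "url", "username", "tags"],
            "rows": [
                ["How can I get my school to buy AnkiHub for my class?",
                 "/t/how-can-i-get-my-school-to-buy-ankihub-for-my-class/228569/1",
                 "andrew"],
                ["What if I can't afford AnkiHub?",
                 "/t/what-if-i-cant-afford-ankihub/167280/1",
                 "Ahmed7", "anking-wiki"],
            ],
        }),
    },
]

for record in tool_results(messages):
    print(record["title"], record["url"])
```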

  • Doesn’t display the correct number of search results
    • it says 0 when there are many or 5 when there are 0
    • doesn’t seem to correlate with either the normal search results or semantic similarity results
  • The bot will add invalid query parameters
    • e.g., it added category:non-existent-category to the end of the query.
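
One possible guard against invented filters, sketched as a hypothetical pre-processing step (nothing in this thread says discourse-ai does this). It assumes the tool-call arguments use the `category` parameter from the tool schema above and that the set of valid category slugs is known; `known_slugs` and the function name are made up for illustration.

```python
def drop_unknown_category(arguments: dict, known_slugs: set) -> dict:
    """Return a copy of the tool-call arguments without a category the site lacks."""
    cleaned = dict(arguments)
    category = cleaned.get("category")
    if category is not None and category not in known_slugs:
        # The model invented a category, so search without the filter instead.
        del cleaned["category"]
    return cleaned


known_slugs = {"docs", "faqs"}  # assumed list of real category slugs
call = {"search_query": "missing images", "category": "non-existent-category"}
print(drop_unknown_category(call, known_slugs))
```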

:pleading_face: Requests

  • It would be super helpful if there was an option to enable verbose logging for discourse-ai so that we non-Ruby-literate folks (I’m learning! :nerd_face: ) can quickly get some insight into what’s happening under the hood by visiting /logs
  • When the search tool is used, add the top results to the context for RAG
  • Advice on how to ensure the base_query is used.

Am I right that the function call should indeed be pushing the results of the tool call into the prompt according to this?

So perhaps it is updating the prompt, but that isn’t clear from the request that is shown from the debugging tool?

Should I not expect to see all of the context there? If not, that’s confusing because the entire prompt context is visible when using a persona that does RAG with uploads.

I’m still confused by the number of results that are reported.

That said, I’m getting much better results after adding the following to the system prompt!

  • Always call both the search and read functions.
  • Call the search function no more than 3 times.
  • When calling the search function, always use this value for the category parameter: “docs,faqs”

This is going to be an absolute game-changer once I get things configured properly.

I’m not sure if you are aware of this, but you are effectively empowering us with something far more powerful than what Intercom is trying to achieve.

I’m already trying to hijack Intercom’s messenger by directing users straight to Discourse AI :nerd_face: :

The AnkiHub AI link opens a PM with the AI Bot with a link like this: /new-message?username=ankihubai_bot

Yes, debugging tool shows everything we send the LLM.

I just made a fix to number of results per:

We were not properly linking to a filtered query.

I recommend against this. Use base_query instead; I just tested it and it works fine: categories:docs,faqs

I do recommend against using GPT4 these days. Be sure to use GPT4o or Turbo; they are so much cheaper and faster.

Just double checking … you did enable ai_bot_debugging_allowed_groups so you see the debug button?


Yes. I’ll PM you a quick screen share to show that the results of the read function call aren’t used in the prompt.

I also noticed that the read function is never called unless I add a note in the system prompt: “always call the read function.” It would be awesome if there was a way for us to have more control over function calling. For example, set tool_choice: "required" in the case of OpenAI.
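
To make the request concrete, here is a sketch of what such a payload could look like with OpenAI's Chat Completions `tool_choice` field. This only builds the request body; whether discourse-ai exposes such a setting is exactly what is being asked, so the helper `build_request` and its `force` parameter are hypothetical.

```python
from typing import Optional


def build_request(messages: list, tools: list, force: Optional[str] = None) -> dict:
    """Build a Chat Completions-style payload, optionally forcing tool use.

    force=None     -> leave tool_choice unset (the model decides)
    force="any"    -> tool_choice "required" (the model must call some tool)
    force="<name>" -> require a call to the named function
    """
    payload = {"model": "gpt-4o", "messages": messages, "tools": tools}
    if force == "any":
        payload["tool_choice"] = "required"
    elif force is not None:
        payload["tool_choice"] = {"type": "function", "function": {"name": force}}
    return payload


# Tool definitions shortened to the names used in this thread.
tools = [
    {"type": "function", "function": {"name": "search", "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "read", "parameters": {"type": "object", "properties": {}}}},
]
request = build_request([{"role": "user", "content": "hi"}], tools, force="any")
print(request["tool_choice"])
```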

I just tried again with this base query and the correct search parameters were used. This is the first time I’ve seen it work. :person_shrugging:

I know, I’m just experimenting with different models atm :smile: Was wondering if GPT4 would decide to make different function calls. Haiku sounds very promising.


Thanks heaps

Oh, that is correct: system prompt injection will only happen if the persona has uploads. If there are no uploads, then we don’t inject anything.

Without uploads we don’t use unconditional system prompt injection and instead rely on tool calls with tool call results; you can see them when you scroll through the request.

A tool free mode targeted at topics is certainly interesting, it would eat a fair bit of tokens, the challenge though is fishing the right content out of a topic.

If a topic is, for example, 5000 tokens, which tokens would we pick? How would we pick the right ones?

As it stands we only have N tokens embedded per topic depending on the embedding model you are using.

So we would need to start embedding more chunks per topic or come up with an algorithm to figure out what part of the topic is the most interesting.
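
A toy sketch of the "more chunks per topic" idea: split the topic into fixed-size chunks, score each against the question, and keep the best one. Bag-of-words cosine similarity stands in here for a real embedding model, chunks are measured in words instead of tokens, and the sample text is invented. None of this reflects how discourse-ai embeds topics.

```python
import math
from collections import Counter


def chunk_words(text: str, chunk_size: int) -> list:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def bag_of_words(text: str) -> Counter:
    """Count lowercase words; a crude stand-in for an embedding vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_chunk(topic_text: str, question: str, chunk_size: int = 8) -> str:
    """Return the chunk most similar to the question, or "" for an empty topic."""
    chunks = chunk_words(topic_text, chunk_size)
    if not chunks:
        return ""
    query = bag_of_words(question)
    return max(chunks, key=lambda chunk: cosine(bag_of_words(chunk), query))


# Invented sample text for illustration.
topic = (
    "Organizations let owners purchase AnkiHub for other users. "
    "Linked decks give members automatic access. "
    "Billing is handled once per year by the owner."
)
print(best_chunk(topic, "automatic access to linked decks"))
```

Fixed-size chunks are the simplest option; they can split a sentence in half, which is part of why picking "the right tokens" is hard.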
