Discourse Chatbot šŸ¤– (Now smarter than ChatGPT!*)

:information_source: Summary A cloud chatbot adaptor for Discourse, currently supporting OpenAI
:hammer_and_wrench: Repository Link GitHub - merefield/discourse-chatbot: An AI bot and agent for Topics and Chat in Discourse, currently powered by OpenAI
:open_book: Install Guide How to install plugins in Discourse

Enjoying this plugin? Please :star: it on GitHub ! :pray:

What is it?

  • The original Discourse AI Chatbot!
  • Converse with the bot in any Topic or Chat Channel, one to one or with others!
  • Customise the character of your bot to suit your forum!
    • want it to sound like William Shakespeare, or Winston Churchill? can do!
  • The new ā€œAgent Modeā€* can now:
    • Search your forum** for answers so the bot can be an expert on the subject of your forum.
      • not just be aware of the information on the current Topic or Channel.
    • Search Wikipedia
    • Search current news*
    • Search Google*
    • Return current End Of Day market data for stocks.*
    • Do ā€œcomplexā€ maths accurately (with no made up or ā€œhallucinatedā€ answers!)
  • Uses cutting edge Open AI API and functions capability of their excellent, industry leading Large Language Models.
  • Includes a special quota system to manage access to the bot: more trusted and/or paying members can have greater access to the bot!
  • Also supports Azure and proxy server connections.

*sign-up for external (not affiliated) API services required. Links in settings.

Agent mode is very smart and knows facts posted on your forum:

Normal bot mode can sometimes make mistakes, but is cheaper to run because it makes fewer calls to the Large Language Model:


(Sorry China! :wink: )

:biohazard: **Bot’s ā€œvisionā€ - what it can see (potentially share) and privacy :biohazard:

This bot can be used in public spaces on your forum. To make the bot especially useful there is the new (currently experimental) Agent mode. This is not set by default.

In this mode the bot is, by default, privy to all content a Trust Level 1 user would see, working from this setting:

image

Thus, if interacted with in a public facing Topic, there is a possibility the bot could ā€œleakā€ information if you tend to gate content at the Trust Level 0 or 1 level via Category permissions. This level was chosen because through experience most sites usually do not gate sensitive content at low trust levels but it depends on your specific needs.

This can be eliminated by:

  • only using the bot in normal mode (but the bot then won’t see any posts)
  • only allowing the bot to be used in Categories that require the set trust level or above to read.
  • mitigated with moderation

In addition, anything it can ā€œseeā€ gets shared with Open AI.

You can see that this setup is a compromise. In order to make the bot useful it needs to be knowledgeable about the content on your site. Currently it is not possible for the bot to selectively read members only content and share that only with members which some admins might find limiting but there is no way to easily solve the that whilst the bot is able to talk in public. Contact me if you have special needs and would like to sponsor some work in this space. Bot permissioning with semantic search is a non-trivial problem. The system is currently optimised for speed. NB Private Messages are never read by the bot.

FYI’s

  • May not work on mulit-site installs (not explicitly tested), but PR welcome to improve support :+1:
  • Open AI API response can be slow at times on more advanced models due to high demand. However Chatbot supports GPT 3.5 too which is fast and responsive and perfectly capable.
  • Is extensible and supporting other cloud bots is intended (hence the generic name for the plugin), but currently ā€˜only’ supports interaction with Open AI Large Language Models (LLM) such as ā€œChatGPTā€. This may change in the future. Please contact me if you wish to add additional bot types or want to support me to add more. PR welcome.
  • Is extensible to support the searching of other content beyond just the current set provided.

Setup

Intro

Be patient, it’s worth it. Also be aware there are some special steps involved in uninstalling this plugin, see the guide below.

Required changes to app.yml

This new update brings forum search which requires embeddings and parts of the changes represent a breaking change so listen up!

I use the Postgres extension known as pg_embeddings. This promises vector searches 20x the speed of pgvector but requires bespoke additions to the build script in app.yml.

Now needs the following added to app.yml in the after_code: section before the plugins are cloned.

(NB you may be able to omit the first three commands if your server can see the postgresql-server-dev-x package)

    - exec:
        cd: $home
        cmd:
          - sudo apt-get install wget ca-certificates
    - exec:
        cd: $home
        cmd:
          - wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    - exec:
        cd: $home
        cmd:
          - sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
    - exec:
        cd: $home
        cmd:
          - apt-get update
    - exec:
        cd: $home
        cmd:
          - apt-get -y install -y postgresql-server-dev-${PG_MAJOR}
    - exec:
        cd: $home/tmp
        cmd:
          - git clone https://github.com/neondatabase/pg_embedding.git
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config install
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "create extension if not exists embedding;"'

This is necessary to add the pg_embeddings extension.

It is required even if you are not using the agent functionality.

Creating the Embeddings

Only necessary if you want to use the agent type bot and ensure it is aware of the content on your forum, not just the current Topic.

Once built, we need to create the embeddings for all posts, so the bot can find forum information.

Note this is very memory demanding … I’m guesstimating 1 million Posts will require 0.5GB’s of memory for the index (I believe that’s linear so 100,000 Posts will need about 50MBs) but I’d love to hear your experience here. In other words, make sure you have an appropriately scaled machine that can cope with this expected memory demand.

Enter the container:

./launcher enter app

and run the following rake command:

rake chatbot:refresh_embeddings[1]

which at present will run twice due to unknown reason (sorry! feel free to PR) but the [1] ensures the second time it will only add missing embeddings (ie none immediately after first run).

If you get rate limited by OpenAI you can complete the embeddings by doing this:

rake chatbot:refresh_embeddings[1,1]

which will fill in the missing ones (so nothing lost from the error) but will continue more cautiously putting a 1 second delay between each call to Open AI.

Compared to bot interactions, embeddings are not expensive to create, but do watch your usage on your Open AI dashboard in any case.

NB Embeddings are only created for Posts and only those Posts for which a Trust Level One user would have access. This seemed like a reasonable compromise. It will not create embeddings for posts from Trust Level 2+ only accessible content.

Bot Type

Take a moment to read through the entire set of Plugin settings. The chatbot bot type setting is key:

Agent mode is superior but will make more calls to the API, potentially increasing cost. That said, the reduction in its propensity to ultimately output ā€˜hallucinations’ may facilitate you being able to drop down from GPT-4 to GPT-3.5 and you may end up spending less despite the significant increase in usefulness and reliability of the output. GPT 3.5 is also a better fit for the Agent type based on response times. A potential win-win! Experiment!

For Chatbot to work in Chat you must have Chat enabled.

OpenAI

You must get a token from https://platform.openai.com/ in order to use the current bot. A default language model is set (one of the most sophisticated), but you can try a cheaper alternative, the list is here

There is an automated part of the setup: upon addition to a Discourse, the plugin currently sets up a AI bot user with the following attributes

  • Name: ā€˜Chatbot’
  • User Id: -4
  • Bio: ā€œHi, I’m not a real person. I’m a bot that can discuss things with you. Don’t take me too seriously. Sometimes, I’m even right about stuff!ā€
  • Group Name: ā€œai_bot_groupā€
  • Group Full Name: ā€œAI Botsā€

You can edit the name, avatar and bio (see locale string in admin → customize → text) as you wish but make it easy to mention.

It’s not free, so there’s a quota system, and you have to set this up

Initially no-one will have access to the bot, not even staff.

Calling the Open AI API is not free after an initial free allocation has expired! So, I’ve implemented a quota system to keep this under control, keep costs down and prevent abuse. The cost is not crazy with these small interactions, but it may add up if it gets popular. You can read more about OpenAI pricing on their pricing page.

Example calculations can be found in this post:

In order to interact with the bot you must belong to a group that has been added to one of the three levels of trusted sets of groups, low, medium & high trust group sets. You can modify each of the number of allowed interactions per week per trusted group sets in the corresponding settings.

You must populate the groups too. That configuration is entirely up to you. They start out blank so initially no-one will have access to the bot:

image

In this example I’ve made staff have high trust access, whilst trust_level_0 have low trust. They get the corresponding quotas in three additional settings.

Note the user gets the quota based on the highest trusted group they are a member of.

ā€œPrompt Engineeringā€

There are several locale text ā€œsettingsā€ that influence what the bot receives and how the bot responds.

The most important one you should consider changing is the bot’s system prompt. This is sent every time you speak to the bot.

For the basic bot, you can try a system prompt like:

’You are an extreme Formula One fan, you love everything to do with motorsport and its high octane levels of excitement’ instead of the default.

(For the agent bot you must keep everything after ā€œYou are a helpful assistant.ā€ or you may break the agent behaviour. Reset it if you run into problems. Again experiment!)

Try one that is most appropriate for the subject matter of your forum. Be creative!

Changing these locale strings can make the bot behave very differently but cannot be amended on the fly. I would recommend changing only the system prompt as the others play an important role in agent behaviour or providing information on who said what to the bot.

NB In Topics, the first Post and Topic Title are sent in addition to the window of Posts (determined by the lookback setting) to give the bot more context.

You can edit these strings in Admin → Customize → Text under chatbot.prompt.

Supports both Posts & Chat Messages!

The bot supports Chat Messages and Topic Posts, including Private Messages (if configured).

You can prompt the bot to respond by replying to it, or @ mentioning it. You can set how far the bot looks behind to get context for a response. The bigger the value the more costly will be each call.

There’s a floating quick chat button that connects you immediately to the bot. Its styling is a little experimental (modifying some z-index values of your base forum on mobile) and it may clash on some pages. This can be disabled in settings. PR welcome to improve how it behaves.

image

Uninstalling the plugin - Important!

Because of the custom index installed for the plugin, removing the plugin requires additional work than simply removing those lines you added to app.yml. Your site will not function if you do not follow these steps as the container will fail to start properly.

  1. Ensure you have all the setup in place as described in ā€œSetupā€, ie the additional script in the after_code section and the plugin cloned and have rebuilt at least once since adding those. Complete them and rebuild if you missed any. (this is the plugin installed state).
  2. Before you remove these things do the following:
    • ./launcher enter app
    • rake db:migrate:down VERSION=20230826010103 - reverses an index rename
    • rake db:migrate:down VERSION=20230826010101 - reverses table name change
    • rake db:migrate:down VERSION=20230820010105 - drops the index
    • exit
  3. Now remove the app.yml edits you added to install the app (after_code script section and clone)
  4. Immediately rebuild with ./launcher rebuild app.

The site should now work without Chatbot.

The only actions that should be needed to re-install is to follow the original install instructions.

Thanks for your interest in the plugin!

Disclaimer: I’m not responsible for what the bot responds with. Consider the plugin to be at Beta stage and things could go wrong. It will improve with feedback. But not necessarily the bots response :rofl: Please understand the pro’s and con’s of a LLM and what they are and aren’t capable of and their limitations. They are very good at creating convincing text but can often be factually wrong.

Important Privacy Note: whatever you write on your forum may get forwarded to Open AI as part of the bots scan of the last few posts once it is prompted to reply (obviously this is restricted to the current Topic or Chat Channel). Whilst it almost certainly won’t be incorporated into their pre-trained models, they will use the data in their analytics and logging. Be sure to add this fact into your forum’s TOS & privacy statements. Related links: Terms of use, Privacy policy, OpenAI Platform

Copyright: Open AI made a statement about Copyright here: Will OpenAI claim copyright over what outputs I generate with the API? | OpenAI Help Center

TODO/Roadmap Items

  • Add front and back-end tests :construction:
  • Add ā€œbot typingā€ indicator and ā€œresponse streamingā€ (@Aizada_M, @MarcP)
  • forgot to mention the bot? Get bot to respond to edits that add its @ mention (@frold )
  • Add a badge? You did mention @botname (@frold )
  • Add setting to include Category and Pinned Posts prompt? (@Ed_S)
  • Ditto Bios to each message history prompt? (@Ed_S , @codergautam). Will this even work. Let’s get evidence.
  • Update Discourse Frotz with this better codebase?
  • Add semantic search so that the bot can read your forum Posts and become an ā€œexpertā€ :wink: :white_check_mark:
  • Add agent behaviour to reduce hallucinations and leverage reliable, factual information. :white_check_mark:
  • Add extra logic to convert suspected usernames into @ mentions (@frold ) :white_check_mark:
  • Add GPT-4 support (when Open AI deems me worthy enough of access! :sweat_smile: ) :white_check_mark:
  • Add custom model name support. :white_check_mark:
  • Add option to strip out quotes from Posts before passing text to API. :white_check_mark:
  • Improve error transparency & handling for when Open AI returns an error state :white_check_mark:
  • Add retry capability for timed out API requests :white_check_mark:
  • Add support for ChatGPT :white_check_mark:
  • Lint the plugin to Discourse core standards :white_check_mark:
  • Add CI workflows :white_check_mark:
  • Add settings to influence the nature of the bots response (e.g. how wacky it is). :white_check_mark:
  • include Topic Title & first Posts to prompt :white_check_mark:
  • Add setting to switch from raw Post/Message data to cooked to potentially leverage web training data better (suggestion by @MarcP). NB May cost more and limit what is returned as input tokens are counted and cooked is much bigger. think we’ve abandoned this idea

Credits:

*It still uses OpenAI’s chat GPT engine, but can now leverage local functions and data from API calls to limit hallucinations.

75 Likes

Wow! This is super interesting. I am still trying to wrap my head around all the use cases of GPT. I think so many people have become experts and know the best phrases and keywords and even booleans to use when searching Google. But we also know the limitations and our search queries have been within that box of what we know Google can actually spit out for us. GPT seems to take this same search functionality to a whole new level… even a tactical level where it can help you take action with the results you’re given.

Great work on this @merefield!

7 Likes

And would you believe the timing? This just turned up in my inbox:

image

I will try to squeeze in some work to support this at some point.

Present model is pretty good though.

Now back to work :sweat_smile:

9 Likes

We get some pretty strange responses at the moment:

It also sometimes responds like this, with the ā€œCREATE_POST_RESPONSESā€ and also I had one with a flag notice or something along the lines of that:

Any ideas?

3 Likes

That’s quite odd! I’ve had some odd responses but never that odd.

The responses aren’t always factually correct but rarely so bizarre. The demo in the screenshot is fairly typical.

Are you embedding anything in posts? Any images that might be throwing it off? Try it on the end of just a series of text posts?

I just tried this:

and this:

Both were requested at the beginning of a Topic.

You are better to ask it specific things, not just ā€œhelp us outā€: help us out with what?

If your posts are cluttered with links and images, try Chat.

4 Likes

This is just chat and nothing else & I still get these very peculiar responses. I feel like the AI lacks it’s usual knowledge and just seems kind of ā€œstupidā€.

2 Likes

I cannot explain your experience. It is far more capable. I’d be keen to know if others are having this issue.

Could you please try Discourse Chat?

2 Likes

Works fine in discourse chat, strange behaviour .At the moment our community doesn’t use the chat. :confused:

2 Likes

Did you try it with the default bot name?

2 Likes

ChatGPT model in test: (coming soon :rocket: )

4 Likes

Support for ChatGPT along with some other improvements has been merged:

  • Adds support for the model behind ChatGPT (ā€œgpt-3.5-turboā€)
    • improves responses and avoids some significant prompt ā€˜hacking’
    • reduces cost of use by a factor of 10!
    • provides option to modify nature of response by setting up a freeform expectation, e.g. ā€œYou are a helpful assistantā€ is default, or perhaps try ā€œYou are a mad scientistā€?!
  • Changing models is now possible via a helpful dropdown, no need to type in the exact name.
  • Adds setting for delay in seconds for paying customers who want a faster response and have better rate limits
  • Cleans up some code and improves console feedback.

@jimmynewtron please let me know if this resolves your issue - if not could you please share with me the list of plugins you are using? Feel free to PM.

6 Likes

Hey @merefield this is so cool and I love the group limits.

However, the behavior reported above, happened to me while testing in a slightly different way.

But I can conclude, somehow it got context from a very different topic (XYZ) and is using that in the reply in topic ABC. From front-end, it looks like it’s using the context from a wrong topic IP, is this possible?

5 Likes

That’s interesting. Bugs are possible, it’s early days too.

Is it working most of the time? Can you reproduce on the latest version?

4 Likes

I always used the latest version, never installed the one before.

I am still playing around, I will let you know.

Also, I notice it replies even if it’s not mentioned in topics. I assume you want it to only reply if @ or pressed Reply on one of his posts, not if you press Reply on the topic:

2 Likes

That’s intended behaviour so not a bug as such. If that’s demonstrably a problem I’m happy to reconsider that behaviour.

3 Likes

Of course, if you have a long topic and want to ask AI for input once, you don’t want it do keep participating with the topic for the rest of it’s lifetime, every time someone replies, right?

5 Likes

Right. That’s great feedback. I’ll see what I can do.

Presumably same will apply to chat? The only downside is the interface there makes replies more manual effort.

3 Likes

There might be some info (you probably already know) in this topic, I made a simple prompt-reply bot and it only listens for type 1/2 notifications and that works great.

In channels you’d want the same behavior, for private chat with the bot it can always reply but there should probably be a different limit in prompt history, and perhaps a ā€œmagic wordā€ to reset the context history for the bot?

3 Likes

I think there is a special case too, where if the bot is the only other actor in the topic, it makes sense to assume a further post is meant for the bot regardless of the means of replying if the last post was a bot post? In other words, retain the current behaviour if the only participants so far are the bot and the user and the same user is replying?

ie: add a clause that if you reply to the Topic, and the last post is owned by the bot, do not assume user is replying to the bot if there is more than one human user in the Topic*, otherwise do?

(*which would also be the case if the person replying has not yet posted, ie there’s about to be two users in the Topic)

This logic would then work for one to one PMs or chats as well as open topics/channels.

EDIT: in fact this is simple: count human participants in topic (and include typist of new Post) and if this is more than one switch off convenient reply? (convenient because it also has a hotkey).

5 Likes

Randomly, after testing a bit back and forth, the following response is given:

Sorry, I’m not well right now. Lets talk some other time. Meanwhile, please ask the admin to check the logs, thank you!

Error log: OpenAIBot: There was a problem: Couldn't find Post with 'id'=99 [WHERE "posts"."deleted_at" IS NULL]

That would be a possible approach - as long as so far is always respected.

I think behavior should be as follows:

  • Topic - only reply if mentioned or "reply button to an earlier reply of the bot is clicked.
    • I don’t think we should encourage long AI chats on a forum - AI is useful if you call him and ask him a prompt, if it always replies in a topic it would be more towards entertainment because you don’t really need it, right?
    • However, I could see this working if you can specify a category as ā€œAI will always replyā€-setting?
  • PM - always reply
  • Chat channels - only reply if mentioned or "reply button to an earlier reply of the bot is clicked. Unless a chat-category is specified as ā€œalways replyā€?
  • Private chat with bot - always reply

This is one way to approach it. But I still believe on the forums the AI would be better off (more functional) if it’s call-based only by default and specify categories where you want AI to be ā€œactive aggressiveā€, lol.

EDIT: Looks like AIBot is hijacking every private-chat without being mentioned. To repro chat a random user and AIBot will reply. Also noticed that mentioning Alice in another chat-category then where it’s allowed to reply, it still replies, not sure if that is intended.

Got to say this is way cooler than the simple bot I created and I hope the team will include this in official, I can see some very useful appliances with this bot (what if AIBot knows every (public!) thing that’s on your forum and can utilize that in the chat without a huge token usage. Keeps dreaming…

4 Likes