Discourse Chatbot 🤖 (Now smarter than ChatGPT!*)

:information_source: Summary: A cloud chatbot adaptor for Discourse, currently supporting OpenAI
:hammer_and_wrench: Repository Link: GitHub - merefield/discourse-chatbot: An AI bot and agent for Topics and Chat in Discourse, currently powered by OpenAI
:open_book: Install Guide: How to install plugins in Discourse

Enjoying this plugin? Please :star: it on GitHub! :pray:

What is it?

  • The original Discourse AI Chatbot!
  • Converse with the bot in any Topic or Chat Channel, one to one or with others!
  • Customise the character of your bot to suit your forum!
    • want it to sound like William Shakespeare, or Winston Churchill? can do!
  • The new “Agent Mode”* can now:
    • Search your forum** for answers so the bot can be an expert on the subject of your forum.
      • not just be aware of the information on the current Topic or Channel.
    • Search Wikipedia
    • Search current news*
    • Search Google*
    • Return current End Of Day market data for stocks.*
    • Do “complex” maths accurately (with no made up or “hallucinated” answers!)
  • Uses the cutting-edge OpenAI API and the functions capability of their excellent, industry-leading Large Language Models.
  • Includes a special quota system to manage access to the bot: more trusted and/or paying members can have greater access to the bot!
  • Also supports Azure and proxy server connections.

*sign-up for external (not affiliated) API services required. Links in settings.

Agent mode is very smart and knows facts posted on your forum:

Normal bot mode can sometimes make mistakes, but is cheaper to run because it makes fewer calls to the Large Language Model:


(Sorry China! :wink: )

:biohazard: Bot’s “vision” - what it can see (potentially share) and privacy :biohazard:

This bot can be used in public spaces on your forum. To make the bot especially useful there is the new (currently experimental) Agent mode. This is not set by default.

In this mode the bot is, by default, privy to all content a Trust Level 1 user would see, working from this setting:

image

Thus, if interacted with in a public facing Topic, there is a possibility the bot could “leak” information if you tend to gate content at the Trust Level 0 or 1 level via Category permissions. This level was chosen because through experience most sites usually do not gate sensitive content at low trust levels but it depends on your specific needs.

This risk can be eliminated by:

  • only using the bot in normal mode (but the bot then won’t see any posts)
  • only allowing the bot to be used in Categories that require the set trust level or above to read.

It can also be mitigated with moderation.

In addition, anything it can “see” gets shared with Open AI.

You can see that this setup is a compromise. In order to make the bot useful it needs to be knowledgeable about the content on your site. Currently it is not possible for the bot to selectively read members-only content and share that only with members, which some admins might find limiting, but there is no easy way to solve that whilst the bot is able to talk in public. Contact me if you have special needs and would like to sponsor some work in this space. Bot permissioning with semantic search is a non-trivial problem. The system is currently optimised for speed. NB Private Messages are never read by the bot.

FYI’s

  • May not work on multi-site installs (not explicitly tested), but PR welcome to improve support :+1:
  • Open AI API response can be slow at times on more advanced models due to high demand. However Chatbot supports GPT 3.5 too which is fast and responsive and perfectly capable.
  • Is extensible and supporting other cloud bots is intended (hence the generic name for the plugin), but currently ‘only’ supports interaction with Open AI Large Language Models (LLM) such as “ChatGPT”. This may change in the future. Please contact me if you wish to add additional bot types or want to support me to add more. PR welcome.
  • Is extensible to support the searching of other content beyond just the current set provided.

Setup

Intro

Be patient, it’s worth it. Also be aware there are some special steps involved in uninstalling this plugin, see the guide below.

Required changes to app.yml

These additions were required for the pg_embedding extension, which is now deprecated in favour of pgvector, which is available as standard within the standard install.

It is important to follow the right path here depending on whether you’ve previously installed Chatbot or not.

Note, because the (current) latest version of pgvector is now required (>= 0.5.1), new installs require a minor command added to ensure you have the latest version installed.

I’ve never installed Chatbot before

Please add the following to app.yml in the after_code: section but before the plugins are cloned:

    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "ALTER EXTENSION vector UPDATE;"' 

After one successful build with the plugin, you should be able to remove these additional lines and rebuild afterwards without issue.

I’ve already installed Chatbot before/have it installed

Please add/ensure you have the following in app.yml in the after_code: section but before the plugins are cloned (note that there are three more commands than before):

    - exec:
        cd: $home
        cmd:
          - sudo apt-get install wget ca-certificates
    - exec:
        cd: $home
        cmd:
          - wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    - exec:
        cd: $home
        cmd:
          - sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
    - exec:
        cd: $home
        cmd:
          - apt-get update
    - exec:
        cd: $home
        cmd:
          - apt-get -y install -y postgresql-server-dev-${PG_MAJOR}
    - exec:
        cd: $home/tmp
        cmd:
          - git clone https://github.com/neondatabase/pg_embedding.git
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config install
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "create extension if not exists embedding;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "DROP INDEX IF EXISTS hnsw_index_on_chatbot_post_embeddings;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "DROP EXTENSION IF EXISTS embedding;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "ALTER EXTENSION vector UPDATE;"' 

After one successful build with the plugin, you should be able to remove these additional lines and rebuild afterwards without issue.

Creating the Embeddings

Only necessary if you want to use the agent type bot and ensure it is aware of the content on your forum, not just the current Topic.

Once built, we need to create the embeddings for all posts, so the bot can find forum information.

Note this is very memory demanding. I’m guesstimating 1 million Posts will require 0.5 GB of memory for the index (I believe that’s linear, so 100,000 Posts will need about 50 MB), but I’d love to hear your experience here. In other words, make sure you have an appropriately scaled machine that can cope with this expected memory demand.
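
To make the linear guesstimate above concrete, here is a minimal Python sketch. The 0.5 GB per million Posts figure is the guesstimate from this post, not a measurement, and the function name `estimated_index_mb` is made up for illustration.

```python
# Rough index-size estimate, assuming the guesstimate above holds and
# memory grows linearly with the number of Posts (an unverified assumption).
GUESSTIMATE_BYTES = 0.5 * 1000**3   # 0.5 GB ...
GUESSTIMATE_POSTS = 1_000_000       # ... per 1 million Posts


def estimated_index_mb(post_count: int) -> float:
    """Return the estimated index size in MB for a given number of Posts."""
    bytes_per_post = GUESSTIMATE_BYTES / GUESSTIMATE_POSTS
    return post_count * bytes_per_post / 1000**2


if __name__ == "__main__":
    for posts in (100_000, 1_000_000):
        print(f"{posts:>9,} Posts -> ~{estimated_index_mb(posts):.0f} MB")
```

Treat the result as a lower bound for sizing: the rest of your Discourse instance needs its usual memory on top of this.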

Enter the container:

./launcher enter app

and run the following rake command:

rake chatbot:refresh_embeddings[1]

which at present will run twice for an unknown reason (sorry! feel free to PR), but the [1] ensures that the second time it will only add missing embeddings (i.e. none immediately after the first run).

If you get rate limited by OpenAI you can complete the embeddings by doing this:

rake chatbot:refresh_embeddings[1,1]

which will fill in the missing ones (so nothing is lost from the error) but will continue more cautiously, putting a 1 second delay between each call to OpenAI.
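
The “fill in the missing ones, but slowly” behaviour can be pictured with a short Python sketch. This is not the plugin’s rake task (which is Ruby); `fill_missing_embeddings`, `embed` and the dictionary store are hypothetical stand-ins, used only to show the idea of skipping existing embeddings and pausing between API calls.

```python
import time
from typing import Callable, Dict, Iterable, List


def fill_missing_embeddings(
    post_ids: Iterable[int],
    store: Dict[int, List[float]],
    embed: Callable[[int], List[float]],
    delay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Create embeddings only for Posts that do not have one yet.

    `embed` stands in for the call to the embeddings API (hypothetical).
    A positive `delay_seconds` pauses after each call to ease rate limiting.
    """
    created = []
    for post_id in post_ids:
        if post_id in store:
            continue  # already embedded on an earlier run, so nothing is redone
        store[post_id] = embed(post_id)
        created.append(post_id)
        if delay_seconds > 0:
            sleep(delay_seconds)
    return created
```

Because existing entries are skipped, re-running after a rate-limit error only pays for the Posts that were missed.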

Compared to bot interactions, embeddings are not expensive to create, but do watch your usage on your Open AI dashboard in any case.

NB Embeddings are only created for Posts and only those Posts for which a Trust Level One user would have access. This seemed like a reasonable compromise. It will not create embeddings for posts from Trust Level 2+ only accessible content.

Bot Type

Take a moment to read through the entire set of Plugin settings. The chatbot bot type setting is key:

Agent mode is superior but will make more calls to the API, potentially increasing cost. That said, the reduction in its propensity to ultimately output ‘hallucinations’ may facilitate you being able to drop down from GPT-4 to GPT-3.5 and you may end up spending less despite the significant increase in usefulness and reliability of the output. GPT 3.5 is also a better fit for the Agent type based on response times. A potential win-win! Experiment!

For Chatbot to work in Chat you must have Chat enabled.

Bot’s speed of response

This is governed mostly by a setting, chatbot_reply_job_time_delay, over which you have discretion.

The intention of having this setting is to:

  • protect you from reaching rate limits of Open AI
  • protect your site from users that would like to spam the bot and cost you money.

The default is now ‘1’ second and it can be reduced to zero :racing_car:, but be aware of the above risks.

Set this to zero and the bot, even in ‘agent’ mode, becomes a lot more ‘snappy’.

Obviously this can be a bit artificial and no real person would actually type that fast … but set it to your taste and wallet size.

Enjoy!

NB I cannot directly control the speed of response of OpenAI’s API, and the general rule is that the more sophisticated the model you set, the slower the response will usually be. So GPT 3.5 is much faster than GPT 4 … although this may change with the newer GPT 4 Turbo model.

OpenAI

You must get a token from https://platform.openai.com/ in order to use the current bot. A default language model is set (one of the most sophisticated), but you can try a cheaper alternative, the list is here

There is an automated part of the setup: upon addition to a Discourse, the plugin currently sets up an AI bot user with the following attributes:

  • Name: ‘Chatbot’
  • User Id: -4
  • Bio: “Hi, I’m not a real person. I’m a bot that can discuss things with you. Don’t take me too seriously. Sometimes, I’m even right about stuff!”
  • Group Name: “ai_bot_group”
  • Group Full Name: “AI Bots”

You can edit the name, avatar and bio (see locale string in admin → customize → text) as you wish but make it easy to mention.

It’s not free, so there’s a quota system, and you have to set this up

Initially no-one will have access to the bot, not even staff.

Calling the Open AI API is not free after an initial free allocation has expired! So, I’ve implemented a quota system to keep this under control, keep costs down and prevent abuse. The cost is not crazy with these small interactions, but it may add up if it gets popular. You can read more about OpenAI pricing on their pricing page.

In order to interact with the bot you must belong to a group that has been added to one of the three trusted sets of groups: low, medium and high trust. You can modify the number of allowed interactions per week for each trusted group set in the corresponding settings.

You must populate the groups too. That configuration is entirely up to you. They start out blank so initially no-one will have access to the bot:

image

In this example I’ve made staff have high trust access, whilst trust_level_0 have low trust. They get the corresponding quotas in three additional settings.

Note the user gets the quota based on the highest trusted group they are a member of.
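
As a rough illustration of the “highest trusted group wins” rule, here is a small Python sketch. It is not the plugin’s code: the function name, the data shapes and the quota numbers are all assumptions made up for the example; only the three trust sets and the staff/trust_level_0 example come from the text above.

```python
from typing import Dict, Set

# Ordered from least to most trusted, mirroring the three group sets.
TRUST_LEVELS = ("low", "medium", "high")


def weekly_quota(
    user_groups: Set[str],
    group_sets: Dict[str, Set[str]],
    quotas: Dict[str, int],
) -> int:
    """Return the quota of the most trusted set the user belongs to, or 0."""
    quota = 0  # no matching group means no access to the bot
    for level in TRUST_LEVELS:
        if user_groups & group_sets.get(level, set()):
            quota = quotas[level]  # a more trusted level overrides a lower one
    return quota


# Example mirroring the screenshot description; the quota numbers are invented.
GROUP_SETS = {"low": {"trust_level_0"}, "medium": set(), "high": {"staff"}}
QUOTAS = {"low": 5, "medium": 20, "high": 100}
```

With this configuration a staff member who is also in trust_level_0 gets the high trust quota, and a user in none of the configured groups gets zero.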

“Prompt Engineering”

There are several locale text “settings” that influence what the bot receives and how the bot responds.

The most important one you should consider changing is the bot’s system prompt. This is sent every time you speak to the bot.

For the basic bot, you can try a system prompt like:

’You are an extreme Formula One fan, you love everything to do with motorsport and its high octane levels of excitement’ instead of the default.

(For the agent bot you must keep everything after “You are a helpful assistant.” or you may break the agent behaviour. Reset it if you run into problems. Again experiment!)

Try one that is most appropriate for the subject matter of your forum. Be creative!

Changing these locale strings can make the bot behave very differently but cannot be amended on the fly. I would recommend changing only the system prompt as the others play an important role in agent behaviour or providing information on who said what to the bot.

NB In Topics, the first Post and Topic Title are sent in addition to the window of Posts (determined by the lookback setting) to give the bot more context.
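
A minimal Python sketch of that context assembly follows. The plugin’s real prompt format is not shown in this topic, so the plain-string representation, the `build_context` name and the choice not to repeat the first Post when it already falls inside the window are all assumptions.

```python
from typing import List


def build_context(title: str, posts: List[str], lookback: int) -> List[str]:
    """Combine the Topic Title, the first Post and the last `lookback` Posts."""
    start = max(len(posts) - lookback, 0)  # index where the lookback window begins
    context = [f"Topic title: {title}"]
    if posts and start > 0:
        context.append(posts[0])  # first Post lies outside the window, add it explicitly
    context.extend(posts[start:])
    return context
```

For a Topic with four Posts and a lookback of 2, this yields the title, the first Post and the last two Posts. A larger lookback sends more text with every call, which is why it costs more.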

You can edit these strings in Admin → Customize → Text under chatbot.prompt.

Supports both Posts & Chat Messages!

The bot supports Chat Messages and Topic Posts, including Private Messages (if configured).

You can prompt the bot to respond by replying to it, or @ mentioning it. You can set how far back the bot looks to get context for a response. The bigger the value, the more costly each call will be.

There’s a floating quick chat button that connects you immediately to the bot. Its styling is a little experimental (modifying some z-index values of your base forum on mobile) and it may clash on some pages. This can be disabled in settings. You can choose whether to load the bot into a 1 to 1 chat or a Personal Message.

image

Now you can choose your preferred icon (default :robot: ) or, if the setting is left blank, it will pick up the bot user’s avatar! :sunglasses:

avatar: image OR icon: image

And remember, you can also customise the text that appears when it is expanded:

image

… using Admin → Customize → Text

(though you may need to customise the CSS a little to accommodate colours and sizing you want).

Uninstalling the plugin - Important!

Due to recent efforts to simplify the plugin, the only steps necessary to uninstall the plugin are now to remove the clone statement.

Thanks for your interest in the plugin!

Disclaimer: I’m not responsible for what the bot responds with. Consider the plugin to be at Beta stage and things could go wrong. It will improve with feedback. But not necessarily the bot’s response :rofl: Please understand the pros and cons of an LLM, what they are and aren’t capable of, and their limitations. They are very good at creating convincing text but can often be factually wrong.

Important Privacy Note: whatever you write on your forum may get forwarded to OpenAI as part of the bot’s scan of the last few posts once it is prompted to reply (obviously this is restricted to the current Topic or Chat Channel). Whilst it almost certainly won’t be incorporated into their pre-trained models, they will use the data in their analytics and logging. Be sure to add this fact into your forum’s TOS & privacy statements. Related links: Terms of use, Privacy policy, https://platform.openai.com/docs/data-usage-policies

Copyright: Open AI made a statement about Copyright here: Will OpenAI claim copyright over what outputs I generate with the API? | OpenAI Help Center

TODO/Roadmap Items

  • Add front and back-end tests :construction:
  • Add “bot typing” indicator and “response streaming” (@Aizada_M, @MarcP) :construction:
  • forgot to mention the bot? Get bot to respond to edits that add its @ mention (@frold )
  • Add a badge? You did mention @botname (@frold )
  • Add setting to include Category and Pinned Posts prompt? (@Ed_S)
  • Ditto Bios to each message history prompt? (@Ed_S , @codergautam). Will this even work. Let’s get evidence.
  • Update Discourse Frotz with this better codebase?
  • Move to use pgvector in favour of pgembedding for vector search now that former supports fast HNSW lookup. :white_check_mark:
  • Add semantic search so that the bot can read your forum Posts and become an “expert” :wink: :white_check_mark:
  • Add agent behaviour to reduce hallucinations and leverage reliable, factual information. :white_check_mark:
  • Add extra logic to convert suspected usernames into @ mentions (@frold ) :white_check_mark:
  • Add GPT-4 support (when Open AI deems me worthy enough of access! :sweat_smile: ) :white_check_mark:
  • Add custom model name support. :white_check_mark:
  • Add option to strip out quotes from Posts before passing text to API. :white_check_mark:
  • Improve error transparency & handling for when Open AI returns an error state :white_check_mark:
  • Add retry capability for timed out API requests :white_check_mark:
  • Add support for ChatGPT :white_check_mark:
  • Lint the plugin to Discourse core standards :white_check_mark:
  • Add CI workflows :white_check_mark:
  • Add settings to influence the nature of the bots response (e.g. how wacky it is). :white_check_mark:
  • include Topic Title & first Posts to prompt :white_check_mark:
  • Add setting to switch from raw Post/Message data to cooked to potentially leverage web training data better (suggestion by @MarcP). NB May cost more and limit what is returned as input tokens are counted and cooked is much bigger. think we’ve abandoned this idea

Credits:

*It still uses OpenAI’s chat GPT engine, but can now leverage local functions and data from API calls to limit hallucinations.


I had used the plugin successfully before, but I’ve been trying to install it for days and it always returns errors. I went to the repository GitHub - neondatabase/pg_embedding: Hierarchical Navigable Small World (HNSW) algorithm for vector similarity search in PostgreSQL and there is this note:
“IMPORTANT NEWS: As of September 29, 2023, Neon will no longer commit to pg_embedding.
Support will remain in place for existing users of the extension, but we strongly recommend migrating to pgvector.
For migration instructions, see Migrate from pg_embedding to pgvector in the Neon documentation.”
Maybe that’s why it’s impossible to install the plugin. Would there be another option?


I’m already prepared for this. See:

if you are brave and want to test out the new version you can use this branch. This has been successfully deployed already on at least 5 Production forums. This is the branch I’m currently using now in any case and don’t intend to make any changes to main branch until this is merged.

Once ready I will merge this.

I’m waiting for pgvector >= 0.5.1 to be standard on Core (currently 0.4.4).


Isn’t it? It is on coop for instance

postgresql-13-pgvector/now 0.5.1-1.pgdg110+1 amd64 [installed,local


I think you must run this:

'psql discourse -c "ALTER EXTENSION vector UPDATE;"'

for that to be forced, but I could be wrong.

That depends on how old your instance is. Postgres takes the version you had when you first installed the extension (and did not update it). So on an older deployment that installed the extension, the database will have a lower version of the extension and you must rebuild (to get the new code) and update it (to deploy it in your db). But on a newer deployment it will have 0.5.1 from the beginning.

After a rebuild and that update statement, you’ll be on >= 0.5.1 for sure.
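
The version reasoning above boils down to a simple comparison, sketched here in Python. The helper names are hypothetical; the sketch assumes you have already read the version installed in the database, for example with a query such as `SELECT extversion FROM pg_extension WHERE extname = 'vector';`, and that it is a plain dotted number like `0.4.4` (a Debian package string such as `0.5.1-1.pgdg110+1` would need trimming first).

```python
from typing import Tuple

REQUIRED_PGVECTOR = "0.5.1"  # minimum version stated in this topic


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.5.1' into (0, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_extension_update(installed: str, required: str = REQUIRED_PGVECTOR) -> bool:
    """True when the extension version in the database is older than required."""
    return parse_version(installed) < parse_version(required)
```

An older deployment still on 0.4.4 would need the rebuild plus the ALTER EXTENSION statement, while a fresh one already on 0.5.1 would not.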


That tallies with experience, so yeah. :+1:

You will need to rebuild with that branch and follow the instructions on the PR.

Perhaps after a few more test installs I will merge.

FYI another reason for holding off a merge is to avoid the noise it will generate from this breaking change for at least a week: this week we will be seeing big changes merged by core wrt javascript/Ember upgrades and I think it would be unwise to combine the two :sweat_smile:

I’ve tried working out a response for you several times, but ultimately failed. Please contact the admin if this persists, thank you!

I’ve been getting the error mentioned above a lot for the last week. Responses have also been slower over the last week; previously they were faster. Do you have any suggestions for a solution?


The API can sometimes be slow on the more sophisticated models.

Are you using GPT 4? Perhaps switch to GPT 3.5 for improved response.

Which bot type are you using? Try normal. This will reduce the number of LLM calls.

You will need to provide detailed steps to reproduce.


I am using gpt 3.5 turbo 16k. Bot mode: normal


Very strange, I’m getting near instant response.

Anything unusual in the logs?

Apologies, my ability to provide much free support is limited.

Check that your quotas and account with OpenAI are in good order.


chatbot max response tokens : 4000

Could it be related to my settings above?

  • console error log: OpenAIBot Post Embedding: There was a problem, but will retry til limit: PG::ConnectionBad: PQconsumeInput() PANIC
  • OpenAI API rate limit:

This is your issue. I’ll contact you privately to resolve (On this occasion)


For those who saw the news about the great new large context and cheaper GPT 4 Turbo model that’s available (in preview), remember you don’t have to wait for an update, you can use these settings to try the new model out, now!:

be aware the rate limit on this preview is quite low, so don’t spam it!


Thanks for your work.

But I got the following error when I run rake chatbot:refresh_embeddings[1]:

Any ideas? Thx!

This is the app.yml file:

:warning: BREAKING CHANGE!!! :warning:

I’ve now merged this PR.

This change is about simplifying the install once and for all.

Everyone - if you do nothing, your build will probably break.

This should be the last breaking change that brings us in line with core and you can finally remove all the app.yml changes (after following these instructions carefully).

Please back up before proceeding.

Intro

Be patient, it’s worth it.

Required changes to app.yml

These additions were required for the pg_embedding extension, which is now deprecated in favour of pgvector, which is available as standard within the standard install.

It is important to follow the right path here depending on whether you’ve previously installed Chatbot or not.

Note, because the (current) latest version of pgvector is now required (>= 0.5.1), new installs require a minor command added to ensure you have the latest version installed.

I’ve never installed Chatbot before (never run ./launcher rebuild app with the previous instructions)

Please add the following to app.yml in the after_code: section but before the plugins are cloned:

    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "ALTER EXTENSION vector UPDATE;"' 

After one successful build with the plugin, you should be able to remove these additional lines and rebuild afterwards without issue.

I’ve already installed Chatbot before/have it installed

Please add/ensure you have the following in app.yml in the after_code: section but before the plugins are cloned (note that there are three more commands than before):

    - exec:
        cd: $home
        cmd:
          - sudo apt-get install wget ca-certificates
    - exec:
        cd: $home
        cmd:
          - wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    - exec:
        cd: $home
        cmd:
          - sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
    - exec:
        cd: $home
        cmd:
          - apt-get update
    - exec:
        cd: $home
        cmd:
          - apt-get -y install -y postgresql-server-dev-${PG_MAJOR}
    - exec:
        cd: $home/tmp
        cmd:
          - git clone https://github.com/neondatabase/pg_embedding.git
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config install
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "create extension if not exists embedding;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "DROP INDEX IF EXISTS hnsw_index_on_chatbot_post_embeddings;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "DROP EXTENSION IF EXISTS embedding;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "ALTER EXTENSION vector UPDATE;"' 

In either case you can now run ./launcher rebuild app

After one successful build with the plugin (including the above lines in app.yml depending on your case), you should be able to remove these additional lines and rebuild afterwards without issue.


Hey Rob, I of course installed this on one instance yesterday with the old instructions right before you posted this update :laughing:

I tried the new method you posted and it fails to rebuild, any ideas here? I disabled all other plugins to rule that out.

Error:

FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su postgres -c 'psql discourse -c "create extension if not exists embedding;"' failed with return #<Process::Status: pid 1433 exit 1>
Location of failure: /usr/local/lib/ruby/gems/3.2.0/gems/pups-1.2.1/lib/pups/exec_command.rb:132:in `spawn'
exec failed with the params {"cd"=>"$home", "cmd"=>["su postgres -c 'psql discourse -c \"create extension if not exists embedding;\"'"]}
bootstrap failed with exit code 1

app.yml

## Plugins go here
## see https://meta.discourse.org/t/19157 for details
hooks:
  after_code:
    - exec:
        cd: $home
        cmd:
          - sudo apt-get install wget ca-certificates
    - exec:
        cd: $home
        cmd:
          - wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    - exec:
        cd: $home
        cmd:
          - sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
    - exec:
        cd: $home
        cmd:
          - apt-get update
    - exec:
        cd: $home
        cmd:
          - apt-get -y install -y postgresql-server-dev-${PG_MAJOR}
    - exec:
        cd: $home/tmp
        cmd:
          - git clone https://github.com/neondatabase/pg_embedding.git
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config
    - exec:
        cd: $home/tmp/pg_embedding
        cmd:
          - make PG_CONFIG=/usr/lib/postgresql/${PG_MAJOR}/bin/pg_config install
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "create extension if not exists embedding;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "DROP INDEX IF EXISTS hnsw_index_on_chatbot_post_embeddings;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "DROP EXTENSION IF EXISTS embedding;"'
    - exec:
        cd: $home
        cmd:
          - su postgres -c 'psql discourse -c "ALTER EXTENSION vector UPDATE;"'
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/merefield/discourse-chatbot.git

I’ve contacted you privately, the provided info is insufficient to diagnose the problem.

The lack of an available rake task suggests the site wasn’t built successfully with the plugin and you are running a prior version of the site. I’ve contacted you privately to avoid unnecessary noise here.