Integrating GPT3-like bots?

Proposal for an Answering Bot with Scheduled Categorization and Fine-Tuning for Discourse Forums

Introduction: Discourse forums rely on user engagement and contributions, and a crucial aspect of this is the ability to get timely and accurate answers to questions. However, sometimes it may take a while for a response, discouraging users from continuing to participate in the conversation. To address this, we propose a bot that can automatically answer questions after a specific time frame to encourage community engagement. Additionally, the bot will allocate scheduled calls to categorize existing threads and build its own fine-tuning dataset, which can be updated from time to time.

Objectives: The primary objectives of the answering bot with scheduled categorization and fine-tuning for discourse forums are to:

  1. Encourage community engagement by providing timely and accurate answers to questions that may otherwise remain unanswered.
  2. Automate the categorization of existing threads to ensure that questions are correctly tagged, and users can easily find relevant information.
  3. Build a fine-tuning dataset for the bot to improve its performance and accuracy over time.

Proposed Solution: To achieve the objectives outlined above, we propose integrating a bot that can automatically answer questions after a specific time frame, allocate scheduled calls to categorize existing threads, and build its own fine-tuning dataset. The bot will be designed to analyze user input, understand the context of the conversation, and generate appropriate responses based on predefined rules and machine learning models.

The bot will use natural language processing (NLP) techniques to analyze user input and generate responses that are relevant to the conversation. It will be trained to understand the context of the question, the topic being discussed, and the user’s previous interactions to provide accurate and helpful answers. The bot will only respond to questions that have not been answered within a specific time frame or when summoned by a username directly.
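For the "respond only after a specific time frame" rule, a scheduled job could scan recent topics and pick out the ones still waiting for a first reply. A minimal sketch of that selection logic; the field names are illustrative, and real data would come from the Discourse API rather than hard-coded dictionaries:

```python
from datetime import datetime, timedelta

def find_unanswered(topics, now, max_wait=timedelta(hours=24)):
    """Return topics with zero replies that have been waiting longer than max_wait."""
    return [
        t for t in topics
        if t["reply_count"] == 0 and now - t["created_at"] > max_wait
    ]

# Toy data; in practice this would be fetched from e.g. /latest.json
topics = [
    {"id": 1, "reply_count": 0, "created_at": datetime(2023, 3, 1, 9, 0)},
    {"id": 2, "reply_count": 3, "created_at": datetime(2023, 3, 1, 9, 0)},
    {"id": 3, "reply_count": 0, "created_at": datetime(2023, 3, 2, 8, 0)},
]
now = datetime(2023, 3, 2, 10, 0)
stale = find_unanswered(topics, now)
print([t["id"] for t in stale])  # only topic 1 has waited over 24h unanswered
```

The job would then hand each stale topic to the language model for an answer, or stay silent if the bot is summoned by username instead.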

In addition to answering questions, the bot will allocate scheduled calls to categorize existing threads. It will analyze the thread’s content and tags to ensure that questions are correctly tagged and organized for easy navigation. The bot will also build its fine-tuning dataset by recording and categorizing user queries and responses. This dataset will be used to train and improve the bot’s performance over time.
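As a concrete sketch of what "building its fine-tuning dataset" could mean: question/answer pairs serialized into the prompt/completion JSONL format that OpenAI's GPT-3 fine-tuning accepted at the time. The separator and stop-token conventions below follow OpenAI's fine-tuning guide, but the exact choices are up to the implementer:

```python
import json

def to_finetune_records(qa_pairs):
    """Convert (question, accepted_answer) pairs into the prompt/completion
    JSONL format used for GPT-3 fine-tuning."""
    lines = []
    for question, answer in qa_pairs:
        record = {
            # the fixed separator marks where the prompt ends
            "prompt": question.strip() + "\n\n###\n\n",
            # leading space and an explicit stop sequence, per OpenAI's guide
            "completion": " " + answer.strip() + " END",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

pairs = [("How do I reset my password?",
          "Use the 'forgot password' link on the login page.")]
print(to_finetune_records(pairs))
```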

Benefits: The benefits of integrating an answering bot with scheduled categorization and fine-tuning for discourse forums are numerous, including:

  1. Encouraging community engagement: The bot will provide timely and accurate answers to questions, encouraging users to continue participating in the conversation.
  2. Improved categorization of threads: The bot will automate the categorization of threads, ensuring that questions are correctly tagged and organized for easy navigation.
  3. Improved accuracy and performance: The bot’s fine-tuning dataset will be used to train and improve its performance over time.
  4. Reduced workload for human moderators: The bot will reduce the workload of human moderators by automating the categorization of threads and answering questions that would otherwise go unanswered.

Conclusion: Integrating an answering bot with scheduled categorization and fine-tuning for discourse forums is a valuable investment that can help encourage community engagement, automate categorization tasks, and improve the accuracy and performance of the bot over time. We recommend exploring the available NLP and machine learning models to select the one that best meets the needs of the discourse forum. The integration process should be planned and executed carefully, with proper testing and training to ensure that the bot performs as intended.


This is a great start but unfortunately, as the OP, it still does not achieve what I said I was looking for at the beginning. However, having looked at this area for many years now, unless I win the lottery I am not going to get exactly what I want by paying for it myself. To reiterate, I need a bot that can not only do what you outline above but also has a persistent memory of previous discussions with individuals - just like a human has. Since I am not going to get exactly what I want anytime soon, but Discourse itself plus the developing proposal for a Discourse AI Bot would do a LOT of what I want, maybe I should just put all my efforts into helping as much as I can with that project, while I continue to investigate persistent memory using graph theory and other approaches - that could be added later?

If the Discourse implementation/proposals don’t meet your needs, and you’re happy to fund the development of open-source AI software (Apache-2.0, which Discourse themselves would then be free to re-purpose), I would be more than happy to set up an AI bot with memory for Discourse for you.

Everything here is going to depend on the model. I can see a lot of general interest here, but nobody has suggested which model to use, and nobody has demonstrated that any model can do something remotely useful.

Even getting good results out of OPT is hard, and Facebook gave it a fair number of parameters. My general concern here (which also applies to the industry at large) is that in the open space there is nothing even remotely close to GPT-3 davinci, and GPT-4 is on the horizon and will make it even harder to compete.


In the post linked above, both a model and its usefulness are detailed:

The stack is Supabase + the OpenAI GPT API. Right now GPT-3.5 plus OpenAI embeddings is sufficient for many of the tasks desired of a Discourse bot today.
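As a sketch of the retrieval half of that stack: store one embedding per document (in Supabase this would be a pgvector column populated from OpenAI's embeddings endpoint) and rank documents by cosine similarity to the query embedding. Toy three-dimensional vectors stand in for real embeddings here:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=2):
    """docs: list of (text, embedding). In production the ranking would be a
    pgvector similarity query rather than an in-memory sort."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

docs = [
    ("install guide", [1.0, 0.0, 0.0]),
    ("api reference", [0.0, 1.0, 0.0]),
    ("faq",           [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], docs))  # the two documents nearest the query
```

The top-ranked snippets are then stuffed into the GPT-3.5 prompt as context for the answer.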

The GPT API is not open source. But it is an API, and when an open-source model catches up, it can be swapped into its place.

I certainly agree. That’s why, for now, under the hood GPT 3.5 would be used until a better alternative is available.
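One way to keep that swap cheap is to hide the provider behind a small interface. A minimal sketch, with illustrative names and a stub provider standing in for a future open-source model; the real OpenAI call is elided:

```python
from abc import ABC, abstractmethod

class CompletionProvider(ABC):
    """Thin interface so GPT-3.5 can later be replaced by an open model."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider(CompletionProvider):
    def complete(self, prompt):
        # would call the OpenAI API here; omitted in this sketch
        raise NotImplementedError

class EchoProvider(CompletionProvider):
    """Stand-in local 'model' for testing the plumbing without an API key."""
    def complete(self, prompt):
        return f"[echo] {prompt}"

def answer(provider: CompletionProvider, question: str) -> str:
    return provider.complete(f"Answer this forum question: {question}")

print(answer(EchoProvider(), "How do I enable dark mode?"))
```

The rest of the bot only ever sees `CompletionProvider`, so switching backends is a one-line configuration change.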

My big concern here is attribution, especially when a corpus is enormous like a Discourse forum and so much of the data used to train the bot may be out of date.

There are some big fundamental issues with this ClippyGPT approach. You can’t replace search with something that is not providing links back to content. Training this would also be a monster of a task.


Prob better to PM you for a more detailed discussion of this idea.

Hi Phil,

I am also in the Sydney timezone (Wagga Wagga). My email address is Let’s organise a time to do a video call?


Oh wow! - I moved from Sydney to Cowra in 2015! - I just drove near you a few days ago going to and from Holbrook!

Are you around at all for the rest of the day?

Speaking of Sydney… :wink:

I’m surprised that nobody has mentioned the “new” Bing yet. I think it’s a great example of what could be implemented in Discourse.

Willing to throw some financial support into this :slight_smile:


So my plugin can use GPT3 (by default: “text-davinci-003”, but you can choose the model) to summarise Topics:

I’ve given it a go (even in Production) and I’m quite impressed with the results so far. I would go as far as saying it’s sometimes ‘sublime’.

However, whilst it often returns relevant, syntactically correct and convincing summaries, it is prone to factual inaccuracies that can be highly misleading and impair its usefulness. So much potential here though!

Note the plugin is still very experimental, but it now seems stable. Results will vary, but there are various quality-of-life settings to improve them, including a downvoting mechanic with a configurable threshold that prompts the system to retrieve a fresh summary from the LLM.
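Under the hood, a summarisation feature like this mostly amounts to packing the topic's posts into a prompt for the completions endpoint (e.g. text-davinci-003). A minimal sketch of the prompt-building step, using a character budget as a crude proxy for the token limit; the function and field names are illustrative, not the plugin's actual code:

```python
def build_summary_prompt(posts, max_chars=4000):
    """Pack (author, text) posts into one summarisation prompt, keeping the
    earliest posts and dropping the tail once over the character budget."""
    body = ""
    for author, text in posts:
        entry = f"{author}: {text}\n"
        if len(body) + len(entry) > max_chars:
            break  # stop before exceeding the (rough) context budget
        body += entry
    return "Summarise the following forum topic in a few sentences:\n\n" + body

prompt = build_summary_prompt([("alice", "How do I theme my site?"),
                               ("bob", "Start with a remote theme repo.")])
print(prompt)
```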


This is going quite well too:


I think ChatGPT would be great for FAQ and documentation. Take a look at this study by Richard Millington


ChatGPT has no support for fine-tuning (nor, for that matter, does it have an API as of today)

I think it would be interesting to train a GPT-based model (either fine-tune GPT-3 or use something else) on a corpus from a Discourse site to see how well captain word salad does once trained on the data and taught to respond. With the clear caveat that “garbage in, garbage out”.

Experiments are certainly going to happen, and the overconfident, lying GPT models will get better over time (due both to better data and to mitigating algorithms that somehow fact-check)

Richard’s post is certainly interesting, but ChatGPT is not ready for the task quite yet:

Compare that to, say, Bing, which fine-tunes based on search results that are pretty recent.


Something similar seems to have been released on Bubble Buddy | I Solve Bubble Issues | Goodspeed

I am uncertain whether the contents of are used as seed too.


If the OP does not feel this reply is on-topic let me know so that I can move it.

OpenAI just announced ChatGPT Plugins.

ChatGPT plugins are here!

Reading through the documentation, I came across the most extensive example,

which notes

This is a plugin for ChatGPT that enables semantic search and retrieval of personal or organizational documents. It allows users to obtain the most relevant document snippets from their data sources, such as files, notes, or emails, by asking questions or expressing needs in natural language. Enterprises can make their internal documents available to their employees through ChatGPT using this plugin.

Being a regular user of several Discourse sites it would be nice to be able to just ask a natural language question of ChatGPT and then it could via a to be created plugin search all of those Discourse sites for me and create a nice reply.
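If such a plugin existed, each question would become a request to the retrieval plugin's /query endpoint. A sketch of building the request body; the `source` filter field is an assumption about how per-site metadata might be stored, not part of the plugin's documented schema:

```python
import json

def build_query(question, sites, top_k=3):
    """Request body for the retrieval plugin's /query endpoint: one sub-query
    per Discourse site, using a hypothetical 'source' metadata filter."""
    return {
        "queries": [
            {"query": question, "top_k": top_k, "filter": {"source": site}}
            for site in sites
        ]
    }

payload = build_query("How do I theme Discourse?",
                      ["meta.discourse.org", "forum.example.com"])
print(json.dumps(payload, indent=2))
```

The plugin would return the most relevant snippets per site, and ChatGPT would compose the reply from them.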

This is more of an idea but worth mentioning.


Another ChatGPT plugin that currently exists. (Noted for those trying to wrap their head around what a plugin can do: I know for programmers this is common sense, but for non-programmers it is like speaking Greek, so hopefully this will give one a better understanding :slightly_smiling_face: )




Learning about LLMs and how to integrate them is not easy. This site ties many of the concepts together in a comprehensible format; it still requires a fairly high level of technical knowledge of LLMs, but it also serves as a template for what to pay attention to when learning how to integrate them.

LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also:

  1. Be data-aware: connect a language model to other sources of data
  2. Be agentic: Allow a language model to interact with its environment
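Those two properties can be illustrated without LangChain itself: "data-aware" means the prompt is assembled from retrieved data, and "agentic" means the model chooses which tools to call. A toy dispatch loop, with scripted actions standing in for a real model's decisions and placeholder tools (nothing here is LangChain API):

```python
def search_tool(query):
    # placeholder for a real retriever, e.g. a Discourse search API call
    return f"top results for '{query}'"

def calc_tool(expr):
    # toy arithmetic tool; eval is acceptable only in this sketch
    return str(eval(expr))

TOOLS = {"search": search_tool, "calc": calc_tool}

def run_agent(actions):
    """Dispatch loop: a real agent would let the LLM pick each (tool, input)
    pair based on prior observations; here the actions are scripted."""
    return [TOOLS[name](arg) for name, arg in actions]

print(run_agent([("calc", "6 * 7"), ("search", "discourse plugins")]))
```

Frameworks like LangChain mostly add prompt templates, memory, and parsers around exactly this kind of loop.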

I have been experimenting a lot in this space recently:

The topic focuses on integrating AI bots like GPT-3 and BlenderBot into Discourse, discussing technical design, APIs, and software. Participants explore potential use cases and address concerns about cost, ethics, and limitations. They express interest in developing plugins for GPT-3 integration and collaboration with the Discourse company. AI integration is prioritized for the next release cycle, but challenges remain, including the need for persistent memory. The topic also covers recent developments in plugins and technologies, such as the Wolfram plugin, WebGPU, and LangChain, which offer new ways to compute data, develop applications, and create custom visualizations.

The above :arrow_double_up: was generated by summarizing chunks of this topic and then making a summary of the summary.
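Mechanically, "summary of the summary" is a map-reduce: split the topic into context-sized chunks, summarise each chunk, then summarise the concatenated partial summaries. A toy sketch, with a fake summariser standing in for the LLM call:

```python
def chunk_posts(posts, max_chars=3000):
    """Split a list of post texts into chunks that each fit a rough
    character budget (a stand-in for real token counting)."""
    chunks, current = [], ""
    for post in posts:
        if current and len(current) + len(post) > max_chars:
            chunks.append(current)
            current = ""
        current += post + "\n"
    chunks.append(current)
    return chunks

def summarise(text):
    # stand-in for an LLM call; truncation fakes a "summary"
    return text[:40]

def summary_of_summaries(posts, max_chars=3000):
    """Map: summarise each chunk. Reduce: summarise the joined summaries."""
    partials = [summarise(c) for c in chunk_posts(posts, max_chars)]
    return summarise("\n".join(partials))
```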

The “next frontier” here is certainly around tying various GPT tooling to other external sources… like… the Discourse database, search engine, summarization engines and more.