Integrating GPT3-like bots?

@Festinger I did reply, maybe you missed my email? Check January 17th for the email:

@SimonBiggs upon reflecting a bit more on the issue, I realized that it might be a better approach to simply make a closed, external service that could receive invites to join a Discourse forum, set up its profile and then participate as a user, using the API. The profile would be realistic but be clear in the description that it’s a bot.

I figured out what the bot would do, but it doesn’t have to be an actual plugin to Discourse. It just has to run once in a while as a cron, and then post comments and replies using Discourse API. What do you think?
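For anyone curious what the cron side could look like, here is a minimal sketch, assuming the bot has its own API key and user account on the forum. The helper names and the example URL are made up for illustration. The `POST /posts.json` endpoint with `Api-Key` and `Api-Username` headers is how the Discourse API accepts replies, but check the API docs for your version before relying on this.

```python
import json
import urllib.request


def build_reply_request(base_url, api_key, api_username, topic_id, raw):
    """Build (but do not send) a POST /posts.json request that replies to a topic."""
    payload = json.dumps({"topic_id": topic_id, "raw": raw}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/posts.json",
        data=payload,
        method="POST",
        headers={
            "Api-Key": api_key,            # key issued to the bot (assumed to exist)
            "Api-Username": api_username,  # the bot's own user, clearly labelled as a bot
            "Content-Type": "application/json",
        },
    )


def post_reply(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A cron entry would then run a small script that picks the topics to answer, generates the text, and calls `post_reply(build_reply_request(...))` for each one.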

I have some ideas for use cases for such features. I realize this is going off topic for the OP but also seems to be the topic where all those interested in integrating a GPT3-like bot are visiting. If you care to start another topic (public or private) then there would be a single place where the ideas by the community are collected. :slightly_smiling_face:

1 Like

That would be wonderful, as the first phase for the dedicated AI team is to collect and catalog ideas around possible features.

3 Likes

I figured out what the bot would do, but it doesn’t have to be an actual plugin to Discourse. It just has to run once in a while as a cron, and then post comments and replies using Discourse API. What do you think?

That certainly makes sense. However, I personally would only want to make open source AI tools. And, given Discourse itself is planning on making a tool, I would ideally want to be helping them instead.

1 Like

Proposal for an Answering Bot with Scheduled Categorization and Fine-Tuning for Discourse Forums

Introduction: Discourse forums rely on user engagement and contributions, and a crucial aspect of this is the ability to get timely and accurate answers to questions. However, sometimes it may take a while for a response, discouraging users from continuing to participate in the conversation. To address this, we propose a bot that can automatically answer questions after a specific time frame to encourage community engagement. Additionally, the bot will allocate scheduled calls to categorize existing threads and build its own fine-tuning dataset, which can be updated from time to time.

Objectives: The primary objectives of the answering bot with scheduled categorization and fine-tuning for discourse forums are to:

  1. Encourage community engagement by providing timely and accurate answers to questions that may otherwise remain unanswered.
  2. Automate the categorization of existing threads to ensure that questions are correctly tagged, and users can easily find relevant information.
  3. Build a fine-tuning dataset for the bot to improve its performance and accuracy over time.

Proposed Solution: To achieve the objectives outlined above, we propose integrating a bot that can automatically answer questions after a specific time frame, allocate scheduled calls to categorize existing threads, and build its fine-tuning dataset. The bot will be designed to analyze user input, understand the context of the conversation, and generate appropriate responses based on predefined rules and machine learning models.

The bot will use natural language processing (NLP) techniques to analyze user input and generate responses that are relevant to the conversation. It will be trained to understand the context of the question, the topic being discussed, and the user’s previous interactions to provide accurate and helpful answers. The bot will only respond to questions that have not been answered within a specific time frame or when summoned by a username directly.
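The "only respond when unanswered for a while, or when summoned" rule can be sketched as a small predicate. This is only an illustration: the topic fields (`created_at`, `reply_count`, `last_post_raw`), the 24-hour default and the `helper_bot` username are all assumptions, not anything Discourse defines.

```python
import re
from datetime import datetime, timedelta, timezone


def should_reply(topic, now, wait=timedelta(hours=24), bot_username="helper_bot"):
    """Decide whether the bot should answer a topic.

    `topic` is a hypothetical dict with:
      created_at    - timezone-aware datetime of the first post
      reply_count   - number of replies so far
      last_post_raw - raw text of the most recent post
    """
    # Summoned: the latest post mentions the bot by @username.
    mention = re.compile(rf"@{re.escape(bot_username)}\b", re.IGNORECASE)
    summoned = bool(mention.search(topic["last_post_raw"]))

    # Unanswered: nobody has replied and the waiting period has passed.
    unanswered = topic["reply_count"] == 0
    waited_long_enough = now - topic["created_at"] >= wait

    return summoned or (unanswered and waited_long_enough)
```

Keeping this rule separate from the text generation makes it easy to tune the waiting period per category without touching the model side.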

In addition to answering questions, the bot will allocate scheduled calls to categorize existing threads. It will analyze the thread’s content and tags to ensure that questions are correctly tagged and organized for easy navigation. The bot will also build its fine-tuning dataset by recording and categorizing user queries and responses. This dataset will be used to train and improve the bot’s performance over time.
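As a rough sketch of the dataset-building step, answered threads could be turned into prompt/completion pairs and written out as JSONL, which is the layout OpenAI's GPT-3 fine-tuning accepted at the time of writing. The thread fields (`category`, `question`, `accepted_answer`) are hypothetical, and using only threads with an accepted answer is just one possible quality filter.

```python
import json


def to_finetune_records(threads):
    """Turn answered threads into prompt/completion pairs; skip unanswered ones."""
    records = []
    for thread in threads:
        if not thread.get("accepted_answer"):
            continue  # nothing to learn from a thread without an answer
        records.append({
            "prompt": (
                f"Category: {thread['category']}\n"
                f"Question: {thread['question']}\n\nAnswer:"
            ),
            # Leading space follows the usual convention for completion-style data.
            "completion": " " + thread["accepted_answer"].strip(),
        })
    return records


def to_jsonl(records):
    """Serialise records as one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

A scheduled job could append new records each time it runs, so the dataset grows as more threads get answered.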

Benefits: The benefits of integrating an answering bot with scheduled categorization and fine-tuning for discourse forums are numerous, including:

  1. Encouraging community engagement: The bot will provide timely and accurate answers to questions, encouraging users to continue participating in the conversation.
  2. Improved categorization of threads: The bot will automate the categorization of threads, ensuring that questions are correctly tagged and organized for easy navigation.
  3. Improved accuracy and performance: The bot’s fine-tuning dataset will be used to train and improve its performance over time.
  4. Reduced workload for human moderators: The bot will reduce the workload of human moderators by automating the categorization of threads and answering questions that would otherwise go unanswered.

Conclusion: Integrating an answering bot with scheduled categorization and fine-tuning for discourse forums is a valuable investment that can help encourage community engagement, automate categorization tasks, and improve the accuracy and performance of the bot over time. We recommend exploring the available NLP and machine learning models to select the one that best meets the needs of the discourse forum. The integration process should be planned and executed carefully, with proper testing and training to ensure that the bot performs as intended.

4 Likes

This is a great start but unfortunately, as the OP, it still does not achieve what I said I was looking for in the beginning. However, after looking at this area for many years now, unless I win the lottery, I am not going to get exactly what I want by trying to pay for it myself. To reiterate, I need a bot that can not only do what you outline above but also has a persistent memory for previous discussions with individuals - just like a human has. Since I am not going to get exactly what I want anytime soon but Discourse itself + the developing proposal for a Discourse AI Bot would do a LOT of what I want, maybe I should just put all my efforts into helping as much as I can with that project while I continue to investigate persistent memories using Graph Theory and other stuff - that could be added later?

If the Discourse implementation/proposals don’t meet your needs, and you’re happy to fund the development of Open Source AI software (Apache-2.0, which Discourse themselves would then be free to re-purpose), I would be more than happy to set up for you an AI bot for Discourse that has memory.

Everything here is going to depend on the model. I can see a lot of general interest here, but nobody has suggested which model to use and nobody proved that the model can do anything remotely useful.

Even getting good stuff out of OPT is hard, and Facebook added a fair bit of params. My general concern here (and for the industry at large) is that in the open space there is nothing even remotely close to GPT-3 davinci, and GPT-4 is on the horizon and will make it even harder to compete.

4 Likes

Both a model and its usefulness are detailed in the following post above:

The process is Supabase + OpenAI GPT API. Right now GPT 3.5 + OpenAI embeddings is sufficient to achieve many tasks desired today of a Discourse bot.

The GPT API is not open source. But, it is an API. And when an open source model catches up (such as https://github.com/LAION-AI/Open-Assistant), it can be swapped into its place.
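To make the "swap it out later" point concrete, here is a minimal sketch of a retrieve-then-answer step where the embedding function and the completion function are passed in as plain callables. Nothing here is the actual Supabase or OpenAI client code: the in-memory list stands in for a vector store, and `embed` and `complete` are placeholders that could wrap GPT 3.5 today and an open source model later.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Embedder = Callable[[str], Sequence[float]]   # text -> vector (any backend)
Completer = Callable[[str], str]              # prompt -> generated text (any backend)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer(question: str, posts: List[Dict[str, str]], embed: Embedder,
           complete: Completer, top_k: int = 2) -> Tuple[str, List[str]]:
    """Pick the most similar posts, build a prompt from them, and ask the model.

    Each post is a hypothetical dict with 'url' and 'text'. The URLs of the
    posts used are returned alongside the reply so it can link to its sources.
    """
    query_vector = embed(question)
    ranked = sorted(posts, key=lambda p: cosine(query_vector, embed(p["text"])),
                    reverse=True)[:top_k]
    context = "\n\n".join(f"[{p['url']}]\n{p['text']}" for p in ranked)
    prompt = (f"Answer using only these forum posts:\n\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return complete(prompt), [p["url"] for p in ranked]
```

In a real setup the post embeddings would be computed once and stored rather than recomputed per question; the point of the sketch is only that the model behind `embed` and `complete` is an implementation detail.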

I certainly agree. That’s why, for now, under the hood GPT 3.5 would be used until a better alternative is available.

My big concern here is attribution, especially when a corpus is enormous like a Discourse forum and so much of the data used to train the bot may be out of date.

There are some big fundamental issues with this ClippyGPT approach. You can’t replace search with something that is not providing links back to content. Training this would also be a monster of a task.

2 Likes

Probably better to PM you for a fuller discussion of the details of this idea...

Hi Phil,

I am also in the Sydney timezone (Wagga Wagga). My email address is me@simonbiggs.net. Let’s organise a time to do a video call?

Cheers,
Simon

Oh wow! - I moved from Sydney to Cowra in 2015! - I just drove near you a few days ago going to and from Holbrook!

Are you around at all for the rest of the day?

Speaking of Sydney… :wink:

I’m surprised that nobody has mentioned the “new” Bing yet. I think it’s a great example of what could be implemented in Discourse.
https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/

Willing to throw some financial support into this :slight_smile:

1 Like

So my plugin can use GPT3 (by default: “text-davinci-003”, but you can choose the model) to summarise Topics:

I’ve given it a go (even in Production) and I’m quite impressed with the results so far. I would go as far as saying it’s sometimes ‘sublime’.

However, whilst it often returns relevant, syntactically correct and convincing summaries, it is prone to factual inaccuracies that can be highly misleading and impair its usefulness. So much potential here though!

Note the plugin is still very experimental but now seems stable. Results will vary, but there are various quality of life settings to improve your results including a downvoting mechanic with a setting for threshold that will prompt the system to retrieve a new summary from the LLM.
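The downvote-threshold idea is roughly this (an illustrative Python sketch, not the plugin's actual code; the class name, the default threshold and the `summarise` callable are all made up for the example):

```python
class SummaryCache:
    """Cache one summary per topic and regenerate it after enough downvotes."""

    def __init__(self, summarise, downvote_threshold=3):
        self._summarise = summarise          # callable: topic text -> summary (e.g. an LLM call)
        self._threshold = downvote_threshold
        self._entries = {}                   # topic_id -> {"summary": str, "downvotes": int}

    def get(self, topic_id, topic_text):
        """Return the cached summary, generating it on first request."""
        entry = self._entries.get(topic_id)
        if entry is None:
            entry = {"summary": self._summarise(topic_text), "downvotes": 0}
            self._entries[topic_id] = entry
        return entry["summary"]

    def downvote(self, topic_id, topic_text):
        """Record a downvote; at the threshold, fetch a fresh summary and reset the count."""
        entry = self._entries[topic_id]
        entry["downvotes"] += 1
        if entry["downvotes"] >= self._threshold:
            entry["summary"] = self._summarise(topic_text)
            entry["downvotes"] = 0
        return entry["summary"]
```

Caching keeps the number of LLM calls down, and the threshold stops a single unhappy reader from triggering a regeneration.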

3 Likes

This is going quite well too:

7 Likes

I think ChatGPT would be great for FAQ and documentation. Take a look at this study by Richard Millington.

5 Likes

ChatGPT has no support for fine-tuning (nor does it have an API, for that matter, as of today).

I think it would be interesting to train a GPT-based model (either fine-tune GPT-3 or use something else) on the corpus of a Discourse site to see how well captain word salad does once trained on the data and taught to respond. With the caveat, clearly, of “garbage in, garbage out”.

Experiments are certainly going to happen, and the overconfident, lying GPT models will get better over time (both due to better data and mitigating algorithms that somehow fact check).

Richard’s post is certainly interesting, but ChatGPT is not ready for the task quite yet:

Compared to say Bing which fine tunes based on search results that are pretty recent.

5 Likes

Something similar seems to have been released: Bubble Buddy | Solve Bubble Issues | Goodspeed

I am uncertain whether the contents of https://forum.bubble.io/ are used as seed too.

1 Like