@merefield plugin has been around for longer and has many more knobs to configure it. AI Bot is also a bit more ambitious (especially since we have GPT 4 access) in that we attempt to integrate it into the Discourse experience - it knows how to search and summarize topics , for example.
Notable differences as of today are probably
We stream replies and offer a stop button
@merefield offers a lot more settings to tune stuff
We offer a “command” framework for getting the bot to act on your behalf - albeit experience is fairly flaky on GPT 3.5
@merefield offers discourse chat integration atm, we do not yet
To add: From my tests, it looks like AI Bot only works in PM and Chatbot works everywhere, unless I’m doing something wrong with the AI bot.
Image generation and streaming are nicely done, as well as search API, however, it sometimes still falls back to “I can’t search the web or can not generate images”. Are you using something similar to LangChain agents, that decide what steps to take?
Are we supposed to create a CX with scope for the full web, or just our instance URL?
That is correct. We will probably get to wider integration, but are taking our time here and trying to polish the existing stuff first.
Yes, this is the very frustrating thing about GPT 3.5 vs 4. Grounding the model for 3.5 is just super duper hard.
I am considering having an intermediary step prior to replying in GPT 3.5 that first triages prior to actually responding (Eg: does this interaction INTERACTION look like it should result in a !command, if so which?) It would sadly add cost and delay so this is my last resort
We use a “sort of” langchain, limited to 5 steps, but we try to be very frugal with tokens so balance is hard.
Up to you… I like having access to all of Google, it is mighty handy
What I do to ground 3.5 is adding a second, shorter system prompt lower in the final prompt to “remind” the model of some of the rules in the main system prompt.
So it would look something like (typing from phone, trying…)
system role “reminder”
new user prompt
Just by repeating the most important system role contents, the model adds more weight to it. I’ve been using this workaround for a few months now without too much strange responses.
Especially if prompts are becoming longer, the model tends to “forget” things that are higher in the final prompt. Things in AI are very hacky, it’s something I experience this in GPT models and langchain as well. Just today I got such a strong personality in langchain that the actions when asking the time in a random city, were “checking my watch”, “changing the timezone of my watch” and “ask a stranger”
I’m assuming you rely on a formatted LLM output to decide the next action to take. So this works way better with temperatures close to zero. This should help grounding 3.5 and should greatly improve results.
Maybe not within the scope, but it would be interesting to train a model on all the posts in my forum and use them to create an expert user AI bot that users could interact with, or that could answer questions from users on its own in threads, and link to/quote relevant posts from the past.