The Forum Helper persona is tuned to do RAG over the existing forum content, but your question has nothing to do with that content. In this case the “creative” persona, which is the vanilla LLM, is a better fit.
Creative Bot: “If there are more towels than can be laid out without overlapping or if the space becomes crowded, reducing the exposure to sunlight, then it might take longer. Assuming conditions remain optimal and similar to drying 10 towels, the drying time should remain the same: 30 minutes.”
I’m having good luck configuring various open and commercial LLMs that can be mixed and matched with standard and custom AI bot personas… and the idea of having multiple ‘agents’ that each have their own RAG context, categories, groups, etc. is way cool… it opens up a ton of use cases!
so… wow! this is an amazing “reimagining” of the Discourse platform that a lot of major apps haven’t managed to pull off for some reason… bravo, Disco peeps! onwards…
Yes. The earlier models are especially bad at even simple math and logic. For instance, take this simple puzzle:
I live in an apartment building with a broken elevator.
If I enter it and push the button for a floor with an even number, it goes to that floor but doesn’t open the door and then goes back to where I started.
So if I enter the elevator on floor 1 and push the floor 4 button… and when it stops, I get out and climb three additional flights on foot… what floor do I end up on?
best answer: floor 4
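The trick is that “when it stops” means after the round trip, so you exit back on floor 1. A minimal sketch of that reading of the puzzle (my reading, not anything official; the behavior of odd-numbered buttons is an assumption, since the puzzle only describes even ones):

```python
# My reading of the puzzle: pressing an even-numbered floor means the
# elevator rides there and back without opening, so you exit where you started.
# Odd-numbered floors presumably work normally (assumption).

def final_floor(start: int, pressed: int, flights_climbed: int) -> int:
    if pressed % 2 == 0:
        exit_floor = start    # door never opens on the even floor; you return to start
    else:
        exit_floor = pressed  # assumed: odd floors open normally
    return exit_floor + flights_climbed

print(final_floor(start=1, pressed=4, flights_climbed=3))  # -> floor 4
```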
All my Discourse AI personas based on GPT-4o get this right (except the one with only local/RAG context… because it has no relevant context…)
but I just ran this by GPT-3.5 in the OpenAI GUI, and it replied…
Let’s break it down step by step:
You enter the elevator on floor 1 and press the button for floor 4.
The elevator goes up to floor 4 but doesn’t open the door, then it goes back to where you started (floor 1).
You get out of the elevator on floor 4 and climb three additional flights of stairs by foot.
Since you started on floor 1 and climbed three additional flights of stairs, you would end up on floor 7.
and oddly… asking about towel drying in the Gemini 1.5 Pro browser GUI…
… with 10 towels, each towel has more surface area exposed to the sun and air, allowing for faster drying. With 15 towels, some towels might be bunched together or overlapping, reducing the effective drying surface area per towel.
Therefore, it won’t simply take 50% more time (1.5 times 30 minutes) to dry 50% more towels. It likely will take more time, but not necessarily an exact 1.5 fold increase.
Estimation:
A reasonable estimate could be around 45 minutes to 1 hour. This considers the reduced drying efficiency with more towels but avoids assuming a perfectly linear relationship between drying time and towel count.
Giving the LLM access to a calculator certainly helps (Chatbot has had that access for a long time) but does not make up for poor logic or reasoning: doing the wrong calculation “correctly” is arguably as bad as doing the right calculation incorrectly. Indeed, the former can actually make the error more convincing, so it might be harder to detect?
GPT 3.5 (OpenAI browser GUI):
“If you prioritize both high probability and a larger sample size, you might consider the second seller, as it has a high probability of positive reviews with a relatively larger sample size.”
Gemini 1.5 Pro (Google AI Studio):
“You should be most inclined to buy from seller 3, who offers the most statistically reliable data.”
Claude 3 Sonnet (Anthropic browser GUI):
“According to standard principles of probability and statistics, a larger sample size generally provides a more reliable estimate of the true population proportion. It would be most reasonable to choose Seller 3.”
My custom Discourse AI persona (Gemini Pro):
“You should likely go with product 3.”
My custom Discourse AI persona (GPT-4o):
“The second seller (96% with 50 reviews) might be a balanced choice between high probability and sufficient review volume.”
Some of the ‘logic’ put forth by these LLMs is truly laughable!… and none of them seemed to grasp the real statistical nuances…
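For reference, the nuance the later posts point at is presumably Laplace’s rule of succession, which estimates the chance the next review is positive as (positive + 1) / (total + 2). Only the “96% with 50 reviews” figure is quoted above, so the other two sellers’ numbers in this sketch are illustrative assumptions:

```python
# Laplace's rule of succession: P(next review positive) = (positive + 1) / (total + 2).
# Only "96% with 50 reviews" is quoted in the thread; the other two
# sellers' numbers below are illustrative assumptions.

sellers = {
    "seller 1": (10, 10),    # 100% positive, 10 reviews (assumed)
    "seller 2": (48, 50),    # 96% positive, 50 reviews (quoted above)
    "seller 3": (186, 200),  # 93% positive, 200 reviews (assumed)
}

for name, (positive, total) in sellers.items():
    laplace = (positive + 1) / (total + 2)
    print(f"{name}: raw {positive / total:.1%}, Laplace estimate {laplace:.1%}")

# With these numbers, seller 2 comes out on top (~94.2%), which is why
# "always pick the biggest sample" isn't automatically the right call.
```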
Considering how many variables there are in the LLM game, it would seem that comprehensive ‘in situ’ testing frameworks will be a non-optional feature going forward (a plugin? see the sketch after the factor list below)
Factors:
LLM model release/version (they seem to tweak the fine-tuning regularly)
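As a starting point, here’s a minimal sketch of what such an in-situ test loop could look like; `ask_model`, the prompts, and the expected strings are all hypothetical placeholders for your actual persona calls and test cases:

```python
# Hypothetical sketch of an "in situ" regression suite for bot personas;
# ask_model is a stand-in for whatever persona/API call you actually test.

from typing import Callable

TEST_CASES = [
    ("broken-elevator puzzle prompt here", "floor 4"),
    ("towel-drying prompt here", "30 minutes"),
]

def run_suite(ask_model: Callable[[str], str]) -> None:
    for prompt, expected in TEST_CASES:
        answer = ask_model(prompt)
        status = "PASS" if expected.lower() in answer.lower() else "FAIL"
        print(f"[{status}] expected {expected!r} for {prompt[:30]!r}")

# Re-run this after every model release/version bump or persona tweak.
```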
Not being one to leave well enough alone… I added context on the Laplace theory of probabilities to the Discourse AI bot that got it ‘wrong’ (the Gemini-based one)
… general conclusion: Bots are just weird… sorta just like people… but like people they learn in all sorts of interesting ways. Even though they are at heart just huge stochastic webs of probabilistic language inference… bots will help out with math, logic, and stats problems in ways that more than justify their place card at the Disco banquet table…
They don’t learn. That’s true of OpenAI models; I don’t know about the others. A bot may or may not use the information it’s given, depending on tokens, the algorithm, and some other mystical things.
But we can point it in the right direction. And yet after five or so answers it has forgotten that.
Yes, fair point… they don’t really learn like humans!
I think in this thread we are talking about methods relating to in-context learning, not conventional human long-term learning… and though it’s ephemeral, in-context learning is getting really interesting because of the insanely huge context sizes (e.g., 1M+ tokens) that the latest models are achieving.
for instance… if you wanted a certain model to more reliably answer questions that require knowledge of Laplace probability principles… with the context/prompting approach, you could feed in that context with either a hard-coded system prompt or vector-DB retrieval, etc. (rough sketch below)…
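A rough sketch of the hard-coded flavor, just to make the idea concrete; `call_llm` and the file name are hypothetical stand-ins, and a vector-DB retriever would pick relevant passages dynamically instead of inlining the whole document:

```python
# Hard-coded-context variant; call_llm and the file name are hypothetical
# stand-ins. A vector-DB retriever would select relevant passages dynamically
# instead of inlining the whole document.

from pathlib import Path

laplace_notes = Path("Laplace-tutorial.txt").read_text()  # ~1k words of Laplace material

def build_messages(question: str) -> list[dict]:
    return [
        {"role": "system",
         "content": "Answer using only the provided context, which describes "
                    "Laplace methods for probability comparisons.\n\n" + laplace_notes},
        {"role": "user", "content": question},
    ]

# answer = call_llm(build_messages("Which seller should I buy from?"))
```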
Here’s an example experiment based on uploading a small document (~1k words) of Laplace knowledge.
Assumptions:
The bot is not pretrained on Laplace (see the failed examples above)…
The bot is limited to what’s in the Discourse instance for specific knowledge
Custom Persona Settings
(plugin experts, please correct as needed!)
Name: AlphaBot
Description: Probability puzzle bot with Laplace knowledge
Default Language Model: GeminiPro
Enabled Commands: Search, Categories, Read
System Prompt:
Answer questions using the locally provided context that describes Laplace methods for probability comparisons. Be as thorough and comprehensive as possible, but don’t search the web or outside sources. Use only local context and focus on using Laplace techniques.
Upload: Laplace-tutorial.txt
Note how you don’t have to mention Laplace in your question, because it’s already in the instructions: