But that’s not the issue here, surely?
The issue is with the reasoning.
Giving the LLM access to a calculator certainly helps (Chatbot has had that access for a long time), but it does not make up for poor logic or reasoning: performing the wrong calculation “correctly” is arguably as bad as performing the right calculation incorrectly. Indeed, the former can make the error more convincing, and so perhaps harder to detect.
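To make that concrete, consider the classic bat-and-ball puzzle: a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball. A minimal sketch follows (the `calculator` tool here is hypothetical, standing in for whatever tool the model calls), showing how flawlessly executed arithmetic can still encode flawed reasoning:

```python
# Hypothetical calculator tool: evaluates an arithmetic expression exactly.
def calculator(expression: str) -> float:
    # Toy evaluator for illustration only; not safe for untrusted input.
    return eval(expression, {"__builtins__": {}})

# Bat-and-ball puzzle: bat + ball = 1.10, and bat = ball + 1.00.

# Flawed reasoning: "the ball costs the total minus the bat's premium".
# The calculator evaluates this expression perfectly, yet the answer is wrong.
wrong_answer = calculator("1.10 - 1.00")        # 0.10

# Sound reasoning: ball + (ball + 1.00) = 1.10, so ball = (1.10 - 1.00) / 2.
right_answer = calculator("(1.10 - 1.00) / 2")  # 0.05

print(wrong_answer, right_answer)
```

Both calls return exactly what was asked for; the first is convincing precisely because the arithmetic is impeccable. The error lives in which expression was chosen, not in how it was evaluated.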