I think I see the bug here… generally max_tools was there to avoid huge chains with tool calls, but there are probably 2 different types of max:
- Max times we round trip with the LLM prior to replying
- Maximum amount of tool calling (which can be much higher)
- Maximum amount of context tools can inject. (eg: if you blow the context in 3 tool calls and then pile on another 2 that cause the model to forget about 2 tool replies)
Going to have a think about this, it is solvable.