looking at GitHub - google-gemini/gemini-cli: An open-source AI agent that brings the power of Gemini directly into your terminal. what trick does this use to terminate a chain of llm calls once budget is reached?
(Searched the google-gemini/gemini-cli repository and retrieved packages/core/src/core/tokenLimits.ts, packages/core/src/agents/types.ts, packages/core/src/agents/executor.ts, and packages/core/src/core/client.ts from main.)
Based on my analysis of the gemini-cli repository, they use several strategies to terminate chains of LLM calls. The key “trick” is a combination of the following.
Turn limit (MAX_TURNS = 100)
In packages/core/src/core/client.ts:
const MAX_TURNS = 100;
The client bounds turns to prevent infinite loops:
// Ensure turns never exceeds MAX_TURNS to prevent infinite loops
const boundedTurns = Math.min(turns, MAX_TURNS);
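To make the clamping concrete, here is a minimal sketch of a loop driven by a bounded turn count. Only MAX_TURNS and the Math.min clamp come from the snippet above; runBounded, StepResult, and the step callback are hypothetical names for illustration, not gemini-cli APIs.

```typescript
// Hypothetical sketch, not the gemini-cli implementation.
const MAX_TURNS = 100;

type StepResult = { done: boolean };

// Runs `step` until it reports done or the bounded turn budget is spent.
function runBounded(
  requestedTurns: number,
  step: (turn: number) => StepResult,
): { turnsUsed: number; exhausted: boolean } {
  // Clamp the caller's request so the loop can never exceed MAX_TURNS.
  const boundedTurns = Math.min(requestedTurns, MAX_TURNS);
  for (let turn = 0; turn < boundedTurns; turn++) {
    if (step(turn).done) {
      return { turnsUsed: turn + 1, exhausted: false };
    }
  }
  // The budget ran out before the step reported completion.
  return { turnsUsed: Math.max(boundedTurns, 0), exhausted: true };
}
```

Even if a caller asks for a million turns, the loop stops after MAX_TURNS iterations, which is what keeps a runaway chain of calls finite.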
The agent executor (packages/core/src/agents/executor.ts) uses an AgentTerminateMode enum:
export enum AgentTerminateMode {
  ERROR = 'ERROR',
  TIMEOUT = 'TIMEOUT',
  GOAL = 'GOAL',
  MAX_TURNS = 'MAX_TURNS',
  ABORTED = 'ABORTED',
  ERROR_NO_COMPLETE_TASK_CALL = 'ERROR_NO_COMPLETE_TASK_CALL',
}
Agents define their own limits via RunConfig:
interface RunConfig {
  max_time_minutes: number; // Timeout in minutes
  max_turns?: number; // Max conversational turns
}
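As a rough illustration of how these two limits could be checked before each turn, here is a sketch. The RunConfig fields and the two mode names are taken from the snippets above; checkLimits, its parameters, and the use of a const object in place of the enum are assumptions made for this example and may not match the executor's actual code.

```typescript
// Hypothetical sketch: a subset of the termination modes, as a const object.
const AgentTerminateMode = {
  TIMEOUT: 'TIMEOUT',
  MAX_TURNS: 'MAX_TURNS',
} as const;
type AgentTerminateMode =
  (typeof AgentTerminateMode)[keyof typeof AgentTerminateMode];

interface RunConfig {
  max_time_minutes: number; // Timeout in minutes
  max_turns?: number; // Max conversational turns
}

// Returns the limit that has been hit, or null if the agent may continue.
function checkLimits(
  config: RunConfig,
  startTimeMs: number,
  turnCounter: number,
  nowMs: number,
): AgentTerminateMode | null {
  // The turn limit is optional, so only enforce it when it is configured.
  if (config.max_turns !== undefined && turnCounter >= config.max_turns) {
    return AgentTerminateMode.MAX_TURNS;
  }
  const elapsedMinutes = (nowMs - startTimeMs) / 60000;
  if (elapsedMinutes >= config.max_time_minutes) {
    return AgentTerminateMode.TIMEOUT;
  }
  return null;
}
```

Passing the clock in as nowMs, rather than calling Date.now() inside, is a design choice for this sketch that keeps the check deterministic and easy to test.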
When a limit is hit, instead of hard-stopping, they give the agent one final chance with a 1-minute grace period:
const GRACE_PERIOD_MS = 60 * 1000; // 1 min
The agent receives a final warning message forcing it to call complete_task immediately:
“You have exceeded the maximum number of turns. You have one final chance to complete the task with a short grace period. You MUST call complete_task immediately with your best answer…”
This lets the model gracefully wrap up rather than just cutting off mid-task.
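The recovery step might look roughly like the sketch below. GRACE_PERIOD_MS and the idea of a final warning come from the description above; attemptRecovery, SendFn, ModelReply, the shortened warning text, and the rule that a successful recovery is reported as GOAL are all assumptions for illustration, and the real executor is asynchronous rather than synchronous.

```typescript
// Hypothetical sketch of a one-shot grace period, not the executor's code.
const GRACE_PERIOD_MS = 60 * 1000; // 1 min

type Reason = 'TIMEOUT' | 'MAX_TURNS' | 'ERROR_NO_COMPLETE_TASK_CALL';

interface ModelReply {
  functionCalls: string[]; // names of the tools the model called
  finishedAtMs: number; // when the reply arrived
}

// Hypothetical model hook: gets the warning and a deadline, returns the reply.
type SendFn = (prompt: string, deadlineMs: number) => ModelReply;

// Illustrative wording only; the real message is longer.
function finalWarning(reason: Reason): string {
  return (
    `Limit reached (${reason}). You have one final chance. ` +
    'You MUST call complete_task immediately with your best answer.'
  );
}

// Gives the model exactly one more turn. Recovery succeeds only if the model
// calls complete_task before the deadline; otherwise the original reason stands.
function attemptRecovery(
  reason: Reason,
  nowMs: number,
  send: SendFn,
): 'GOAL' | Reason {
  const deadlineMs = nowMs + GRACE_PERIOD_MS;
  const reply = send(finalWarning(reason), deadlineMs);
  const inTime = reply.finishedAtMs <= deadlineMs;
  const completed = reply.functionCalls.includes('complete_task');
  return inTime && completed ? 'GOAL' : reason;
}
```

The point of the design is that the limit still wins: if the model ignores the warning or overruns the grace period, the run ends with the original termination reason.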
Agents must call a complete_task tool to finish. If they stop calling tools without calling this, they hit ERROR_NO_COMPLETE_TASK_CALL and trigger the recovery flow.
Summary: The main trick is turn counting + timeout with a grace period recovery mechanism that prompts the model to wrap up gracefully. There’s no explicit “token budget” termination - they rely on turn limits, time limits, and forcing the model to self-terminate via the complete_task tool.
wait explain the complete_task tool?
Looking at the code in packages/core/src/agents/executor.ts, the complete_task tool is a mandatory synthetic tool that gets injected into every agent’s toolset. It’s the only way an agent can signal it’s done.
In prepareToolsList(), this tool is always added regardless of what other tools the agent has:
const TASK_COMPLETE_TOOL_NAME = 'complete_task';
// Always inject complete_task.
const completeTool: FunctionDeclaration = {
  name: TASK_COMPLETE_TOOL_NAME,
  description: outputConfig
    ? 'Call this tool to submit your final answer and complete the task. This is the ONLY way to finish.'
    : 'Call this tool to signal that you have completed your task. This is the ONLY way to finish.',
  parameters: {
    type: Type.OBJECT,
    properties: {},
    required: [],
  },
};
If the agent has an outputConfig, the tool’s schema is dynamically modified to require that output:
if (outputConfig) {
  const jsonSchema = zodToJsonSchema(outputConfig.schema);
  // Excerpt: the step that derives `schema` from `jsonSchema` is omitted here.
  completeTool.parameters!.properties![outputConfig.outputName] = schema as Schema;
  completeTool.parameters!.required!.push(outputConfig.outputName);
}
So if an agent is configured to return a “report”, the tool becomes:
complete_task(report: string) // required parameter
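Here is a self-contained sketch of that construction using plain objects. The tool name and the two descriptions are copied from the snippet above; buildCompleteTool, ToolDeclaration, ObjectSchema, and OutputSpec are hypothetical stand-ins for the SDK and Zod types, and the schema is passed in as an already-converted JSON Schema fragment.

```typescript
// Hypothetical stand-ins for the SDK's declaration and schema types.
interface ObjectSchema {
  type: 'object';
  properties: Record<string, unknown>;
  required: string[];
}

interface ToolDeclaration {
  name: string;
  description: string;
  parameters: ObjectSchema;
}

// Assumed shape: a parameter name plus a JSON Schema fragment for its value.
interface OutputSpec {
  outputName: string;
  schema: Record<string, unknown>;
}

function buildCompleteTool(output?: OutputSpec): ToolDeclaration {
  const tool: ToolDeclaration = {
    name: 'complete_task',
    description: output
      ? 'Call this tool to submit your final answer and complete the task. This is the ONLY way to finish.'
      : 'Call this tool to signal that you have completed your task. This is the ONLY way to finish.',
    parameters: { type: 'object', properties: {}, required: [] },
  };
  if (output) {
    // Make the configured output a required argument of complete_task.
    tool.parameters.properties[output.outputName] = output.schema;
    tool.parameters.required.push(output.outputName);
  }
  return tool;
}
```

Calling buildCompleteTool({ outputName: 'report', schema: { type: 'string' } }) yields a declaration whose only required parameter is report, so the model cannot finish without supplying its answer.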
When the model calls complete_task, it’s handled differently from other tools - it’s not executed, it just signals completion:
if (functionCall.name === TASK_COMPLETE_TOOL_NAME) {
  taskCompleted = true;
  // Validate output if required
  if (outputConfig) {
    const validationResult = outputConfig.schema.safeParse(outputValue);
    if (!validationResult.success) {
      taskCompleted = false; // Validation failed, revoke completion
      // ... return error to model
    }
  }
  // ...
}
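The same flow can be sketched as a standalone function. The validator here is a plain callback standing in for the Zod schema's safeParse; handleCompleteTask, OutputCheck, CompletionResult, and the feedback string are hypothetical names and wording, not the executor's.

```typescript
const TASK_COMPLETE_TOOL_NAME = 'complete_task';

// Stand-in for a safeParse-style result.
type Validation = { success: true } | { success: false; error: string };

interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical: the required output's name and a validator for its value.
interface OutputCheck {
  outputName: string;
  validate: (value: unknown) => Validation;
}

interface CompletionResult {
  taskCompleted: boolean;
  result?: unknown; // the accepted output, if any
  feedback?: string; // error text to send back to the model
}

function handleCompleteTask(
  call: FunctionCall,
  output?: OutputCheck,
): CompletionResult {
  // Only complete_task can signal completion; other tools run normally.
  if (call.name !== TASK_COMPLETE_TOOL_NAME) {
    return { taskCompleted: false };
  }
  if (!output) {
    return { taskCompleted: true };
  }
  const value = call.args[output.outputName];
  const check = output.validate(value);
  if (check.success === false) {
    // Validation failed: completion is refused and the model hears why.
    return {
      taskCompleted: false,
      feedback: `Invalid ${output.outputName}: ${check.error}`,
    };
  }
  return { taskCompleted: true, result: value };
}
```

Returning the validation error as feedback, rather than ending the run, gives the model a chance to call complete_task again with a well-formed answer.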
The agent’s system prompt hammers home that this is mandatory:
finalPrompt += `
* When you have completed your task, you MUST call the \`complete_task\` tool.
* Do not call any other tools in the same turn as \`complete_task\`.
* This is the ONLY way to complete your mission. If you stop calling tools without calling this, you have failed.`;
If the model stops generating tool calls without ever calling complete_task, the executor catches this:
// If the model stops calling tools without calling complete_task, it's an error.
if (functionCalls.length === 0) {
  return {
    status: 'stop',
    terminateReason: AgentTerminateMode.ERROR_NO_COMPLETE_TASK_CALL,
    finalResult: null,
  };
}
This triggers the grace period recovery where the model gets one more chance.
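Putting the pieces together, a toy version of the turn loop shows how each exit maps to a termination reason. The model is replaced by a script listing the tool names called on each turn; runScripted and Terminate are hypothetical, and the grace period recovery is left out to keep the sketch short.

```typescript
type Terminate = 'GOAL' | 'MAX_TURNS' | 'ERROR_NO_COMPLETE_TASK_CALL';

// script[i] lists the tool names the (scripted) model calls on turn i.
function runScripted(script: string[][], maxTurns: number): Terminate {
  for (let turn = 0; turn < maxTurns; turn++) {
    const calls = script[turn] ?? [];
    // The model went quiet without calling complete_task: protocol violation.
    if (calls.length === 0) {
      return 'ERROR_NO_COMPLETE_TASK_CALL';
    }
    // The only successful exit is an explicit complete_task call.
    if (calls.includes('complete_task')) {
      return 'GOAL';
    }
    // Otherwise the tools would be executed and their results fed back.
  }
  // The turn budget ran out while the model was still calling tools.
  return 'MAX_TURNS';
}
```

For example, runScripted([['read_file'], []], 5) ends with ERROR_NO_COMPLETE_TASK_CALL, while a script whose second turn calls complete_task ends with GOAL.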
It’s a forcing function, essentially a protocol contract: “You have tools, use them, and when you’re done, call this specific tool with your answer.”