AI triage examples not sent properly?

I have an agent that checks posts for bank wiring information. (That's dangerous stuff to post.) I give it an example in the Examples section.

System Prompt

Inspect this post for bank wiring information including account numbers and routing numbers. If the post appears to contain wiring info, reply with the single word “flag”. Otherwise reply with the single word “ignore”.

Example 1 User Message

Hey everyone, just wanted to share the wire transfer details for the group purchase we organized. Receiving Bank: First National Trust Bank, Chicago, IL | ABA/Routing Number: 0710003 | Account Number: 4827093 | Account Name: Marcus T. Holdings LLC | Reference: GroupBuy-2024-Q4.

Example 1 Model Response

flag

It was flagging every post, even though none of them contained bank info. So I changed the system prompt to have it give the reason it was responding with “flag”, and got this in the review queue:

Response from the model:

flag

This post contains detailed bank wiring information in the first paragraph, including:
- Receiving Bank name and location (First National Trust Bank, Chicago, IL)
- ABA/Routing Number: 0710003
- Account Number: 4827093
- Account Name: Marcus T. Holdings LLC

So it’s interpreting the example as part of the post it’s supposed to evaluate. Are the examples being sent properly, with an explanation like “Here are some examples…”?

Instead of instructing your model to return strings, you can use the Triage with AI Agent automation type, then give this agent access to the flag tool.

Then you instruct the agent to call the tool when your conditions apply.

You’re right, that’s a cleaner solution, and I’ve done that, but it doesn’t change the issue: it still flags every post. It doesn’t understand that the example is just an example.

Automation Settings

(screenshot)

Agent Settings

(screenshot)

It flags every post, citing the text in the example.

  1. What LLM are you using?

  2. Those examples are wrong. They are sent as previous turns before your message, so they need to mimic the exact expected LLM response. If the example is for a situation where you want a tool call, then the example response should mimic a tool call from the LLM. That said, your use case is so simple that any current LLM should be able to one-shot it without examples, just with a clear prompt saying when to call the tool.
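If it helps to picture point 2, here’s a minimal sketch of how an Examples section typically becomes prior conversation turns before the real post. The `build_messages` helper is hypothetical; the actual wire format is up to the platform.

```python
# Sketch: each Examples entry is serialized as a prior user/assistant
# turn pair, and the post under review arrives as the final user message.
# (Hypothetical helper; the real platform controls the wire format.)

def build_messages(examples, post):
    """examples: list of (example_post, expected_model_reply) pairs."""
    messages = []
    for example_post, expected_reply in examples:
        messages.append({"role": "user", "content": example_post})
        # This assistant turn must look exactly like the reply you want
        # the model to produce -- here, the single word "flag".
        messages.append({"role": "assistant", "content": expected_reply})
    # Only this final message is the post actually being evaluated.
    messages.append({"role": "user", "content": post})
    return messages

msgs = build_messages(
    [("Receiving Bank: First National Trust Bank | Routing: 0710003", "flag")],
    "Does anyone know when the next meetup is?",
)
```

Because the example reads as an already-completed exchange, the model should evaluate only the final user message instead of treating the example text as part of the post.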

I’m using Sonnet 4.5, which I agree should not need examples for this simple case. But for more complex cases, how do I “mimic a tool call from the LLM”? What should I type in the example boxes? Are there example examples somewhere?
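For what it’s worth, one way to picture “mimicking a tool call”: in the Anthropic Messages API, a tool call appears as an assistant message whose content contains a `tool_use` block, followed by a user message carrying the matching `tool_result`. A sketch, with a hypothetical `flag` tool and made-up input fields and id:

```python
# Sketch of an example exchange that mimics an LLM tool call, using the
# Anthropic Messages API shape: an assistant tool_use block, then a user
# tool_result referencing it. Tool name, input schema, and id are
# placeholders for illustration.

example_turns = [
    {"role": "user", "content": "Receiving Bank: ... | Routing: 0710003 | Account: 4827093"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_example_01",  # placeholder id
                "name": "flag",            # the flag tool the agent can call
                "input": {"reason": "contains bank wiring details"},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_example_01",  # must match the tool_use id
                "content": "Post flagged.",
            }
        ],
    },
]
```

Whether the example boxes in the UI accept this structure or only plain text is a question for the platform docs; the shape above is just what the underlying API turns look like.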