AI triage examples not sent properly?

I have an agent to check for bank wiring information in a post. (That’s dangerous.) I give it an example in the Examples section.

System Prompt

Inspect this post for bank wiring information including account numbers and routing numbers. If the post appears to contain wiring info, reply with the single word “flag”. Otherwise reply with the single word “ignore”.

Example 1 User Message

Hey everyone, just wanted to share the wire transfer details for the group purchase we organized. Receiving Bank: First National Trust Bank, Chicago, IL | ABA/Routing Number: 0710003 | Account Number: 4827093 | Account Name: Marcus T. Holdings LLC | Reference: GroupBuy-2024-Q4.

Example 1 Model Response

flag

It was flagging every post, none of which contained bank info. So I changed the system prompt to tell me the reason it was responding with “flag”, and got this in the review queue:

Response from the model:

flag This post contains detailed bank wiring information in the first paragraph, including: - Receiving Bank name and location (First National Trust Bank, Chicago, IL) - ABA/Routing Number: 0710003 - Account Number: 4827093 - Account Name: Marcus T. Holdings LLC

So it’s interpreting the example as part of the post it’s supposed to evaluate. Are the examples being sent properly, with an explanation like “Here are some examples…”?