I have an agent to check for bank wiring information in a post. (That’s dangerous.) I give it an example in the Examples section.
System Prompt
Inspect this post for bank wiring information including account numbers and routing numbers. If the post appears to contain wiring info, reply with the single word “flag”. Otherwise reply with the single word “ignore”.
Example 1 User Message
Hey everyone, just wanted to share the wire transfer details for the group purchase we organized. Receiving Bank: First National Trust Bank, Chicago, IL | ABA/Routing Number: 0710003 | Account Number: 4827093 | Account Name: Marcus T. Holdings LLC | Reference: GroupBuy-2024-Q4.
Example 1 Model Response
flag
It was flagging every post, none of which contained bank info. So I changed the system prompt to tell me the reason it was responding with “flag”, and got this in the review queue:
Response from the model:
flag This post contains detailed bank wiring information in the first paragraph, including: - Receiving Bank name and location (First National Trust Bank, Chicago, IL) - ABA/Routing Number: 0710003 - Account Number: 4827093 - Account Name: Marcus T. Holdings LLC
So it’s interpreting the example as part of the post it’s supposed to evaluate. Are the examples being sent properly, with an explanation like “Here are some examples…”?