We’ve done quite a bit of testing with this, and we don’t seem to get reliable results at all. For context, we’re using the gpt-4o model.
To test its accuracy, I gave the following simple instructions:
You are a spam detection system. Analyze the following content and context.
Notes below. If *ANY* of the items are true below then mark it as spam:
- The username is very specifically "testjon", then it is *ALWAYS* spam.
- Respond only with "SPAM - It's Jon!" or "NOT SPAM".
Testing on a post, by the username testjon
, results in NOT SPAM. It seems like it’s not heeding instructions well at all. Any suggestions?
Have any others had any good or bad experiences with the AI spam detection?