Discourse AI - Spam detection

We’ve done quite a bit of testing with this, and we don’t seem to get reliable results at all. For context, we’re using the gpt-4o model.

To test its accuracy, I gave the following simple instructions:

You are a spam detection system. Analyze the following content and context.
Notes below. If *ANY* of the items are true below then mark it as spam:
- The username is very specifically "testjon", then it is *ALWAYS* spam.
- Respond only with "SPAM - It's Jon!" or "NOT SPAM".

Testing on a post, by the username testjon, results in NOT SPAM. It seems like it’s not heeding instructions well at all. Any suggestions?

Have any others had any good or bad experiences with the AI spam detection?