Experiments with AI-based moderation on Discourse Meta

Looking at the difference between these prompts:

Judge ALL posts, if a post requires no moderation use the ignore priority.

Judge ALL posts with a skeptical eye. Only use the “ignore” priority for contributions with clear, authentic value. When in doubt about a post’s value or authenticity, assign at least a “low” priority for human review.

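I think it’s important to remember that the models have a major recency bias. Perhaps all command words (such as the priority names) should be mentioned in prose near the end of the prompt, in reverse order of desired frequency, so that the one you want used most often is the last thing the model reads.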
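
To make the ordering idea concrete, here is a minimal sketch of how a closing sentence for the prompt could be generated. Everything in it is an assumption for illustration: the helper name `build_priority_footer`, the "medium" and "high" labels (only "ignore" and "low" appear in the prompts above), the target shares, and the wording of the sentence. It is not how the Discourse AI plugin builds its prompts.

```python
def build_priority_footer(desired_frequency):
    """Return a closing sentence that names every priority label,
    rarest first and most common last (hypothetical helper)."""
    # Sort labels ascending by desired frequency, so the label we want
    # the model to pick most often ends up closest to the end of the prompt.
    ordered = sorted(desired_frequency, key=desired_frequency.get)
    quoted = [f'"{label}"' for label in ordered]
    if len(quoted) == 1:
        listing = quoted[0]
    else:
        listing = ", ".join(quoted[:-1]) + " or " + quoted[-1]
    return f"Assign every post exactly one priority: {listing}."


# Assumed target shares, made up for this example; not measured data.
desired = {"ignore": 0.80, "low": 0.15, "medium": 0.04, "high": 0.01}

footer = build_priority_footer(desired)
print(footer)
```

The footer would be appended after the rest of the moderation instructions. Whether this ordering actually shifts how often each priority gets used is something to test against real posts, not something the sketch demonstrates.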