Suppress images for short emails

To prevent Discourse emails from being detected as spam, we should suppress images in short emails (those with a text length of 2800 bytes or less).

Currently, short emails with images are flagged as spam by Apache SpamAssassin due to the HTML_IMAGE_ONLY_24 rule.

I sent a PR for this:

https://github.com/discourse/discourse/pull/2805

Any suggestions or feedback on this?

cc @sam @codinghorror @supermathie

2 Likes

Too strong. A short email with HTML images is one indicator used by SA, contributing ~30% of the score needed to be detected as spam.

Bayesian filtering is independent of this.

1 Like

The scores we were shown are:

BAYES_95             3
HTML_IMAGE_ONLY_24   1.618
RDNS_NONE            0.793

Total score 5.4111

So that 30% number is right on target!

https://wiki.apache.org/spamassassin/Rules/HTML_IMAGE_ONLY_24
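To make the arithmetic behind that 30% figure explicit, here is a minimal sketch using the three scores quoted above. The 5.0 threshold is an assumption (it is SpamAssassin's default `required_score`; the thread does not say what the receiving server used), and the variable names are purely illustrative.

```python
# Rule scores exactly as quoted in the post above.
scores = {
    "BAYES_95": 3.0,
    "HTML_IMAGE_ONLY_24": 1.618,
    "RDNS_NONE": 0.793,
}

# Assumption: the default SpamAssassin threshold; a given server may differ.
REQUIRED_SCORE = 5.0

total = sum(scores.values())
image_rule = scores["HTML_IMAGE_ONLY_24"]

# Share of the image rule relative to the threshold and to the total score.
share_of_threshold = image_rule / REQUIRED_SCORE
share_of_total = image_rule / total

# What the same email would score if the image rule did not fire.
total_without_image_rule = total - image_rule

print(f"total score: {total:.3f}")
print(f"share of threshold: {share_of_threshold:.1%}")
print(f"share of total: {share_of_total:.1%}")
print(f"without image rule: {total_without_image_rule:.3f}")
print(f"still spam: {total_without_image_rule >= REQUIRED_SCORE}")
```

Under that assumed threshold, dropping the image rule alone would bring this particular email back under the limit, even with the Bayesian score unchanged.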

1 Like

This implementation strips images out using a regex. That means that if the post itself includes images, they will simply be missing from the email, which could be confusing to the recipient.

IME, the vast majority of emails I receive from my Discourse instance include only the avatar image; I bet 80% of the problem would be addressed by just suppressing the avatar for short messages. Then, if someone does send out a post whose content is nothing but an image, just that email may be (correctly!) flagged with HTML_IMAGE_ONLY_24.

(Indeed, the post itself might be actual spam posted by a spammer, so just by suppressing the avatar, we’d allow the spam catcher to do its own thing.)

5 Likes

I like the advice of narrowing the regex to suppress just the avatar images; however, we should also suppress emoji. A short message with an emoji will trigger this just as badly. And emoji aren’t so important that they should be allowed to trigger the SpamAssassin rule for short HTML with an image.

2 Likes

Good point. Many emoji have a Unicode representation. Others could be replaced by their textual representation.
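As a rough illustration of that fallback idea, the sketch below maps an emoji name to a Unicode character when one is known and otherwise falls back to its `:name:` text form. The lookup table and the function name `emoji_as_text` are hypothetical and not taken from Discourse.

```python
# Hypothetical, tiny lookup table; a real one would cover far more emoji.
UNICODE_EMOJI = {
    "heart": "\u2764\ufe0f",   # heavy black heart with emoji presentation
    "smile": "\U0001F604",     # smiling face with open mouth and smiling eyes
}


def emoji_as_text(name: str) -> str:
    """Return a Unicode character for `name`, else its :name: text form."""
    return UNICODE_EMOJI.get(name, f":{name}:")


# An emoji with a Unicode equivalent becomes a plain character.
print(emoji_as_text("heart"))
# A custom emoji without one stays readable as text.
print(emoji_as_text("trollface"))
```

Either way, the email no longer carries an `<img>` tag for the emoji.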

1 Like

Okay, as per the feedback, I have made the following updates:

  • Only avatar and emoji images will be suppressed in emails
  • Emoji images will be replaced with their title text, e.g. the heart emoji image will be replaced by :heart:
  • The short-email length is configurable in site settings; the default will be 2800
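The behaviour described in this list can be sketched roughly as follows. This is not the code from the PR (Discourse itself is written in Ruby); it is an illustrative Python sketch that assumes avatar and emoji `<img>` tags are marked with `class="avatar"` and `class="emoji"`, that emoji tags carry a `title` such as `:heart:`, and that "length" means the byte length of the text with tags stripped. All names here are hypothetical.

```python
import re

# Assumption: mirrors the default of 2800 mentioned in the thread.
SHORT_EMAIL_LENGTH = 2800

# Assumption: avatars and emoji are <img> tags identified by their class.
AVATAR_IMG = re.compile(r'<img\b[^>]*\bclass="[^"]*\bavatar\b[^"]*"[^>]*>', re.I)
EMOJI_IMG = re.compile(r'<img\b[^>]*\bclass="[^"]*\bemoji\b[^"]*"[^>]*>', re.I)
TITLE_ATTR = re.compile(r'\btitle="([^"]*)"', re.I)
ANY_TAG = re.compile(r"<[^>]+>")


def text_length(html: str) -> int:
    """Byte length of the email text once all tags are stripped."""
    return len(ANY_TAG.sub("", html).encode("utf-8"))


def _emoji_title(match: re.Match) -> str:
    """Replace an emoji <img> with its title text, or nothing if untitled."""
    title = TITLE_ATTR.search(match.group(0))
    return title.group(1) if title else ""


def suppress_images_if_short(html: str, limit: int = SHORT_EMAIL_LENGTH) -> str:
    """Drop avatars and textualize emoji, but only for short emails."""
    if text_length(html) > limit:
        return html  # long emails are left untouched
    html = AVATAR_IMG.sub("", html)
    return EMOJI_IMG.sub(_emoji_title, html)


email = (
    '<img class="avatar" src="a.png">'
    '<p>Nice <img class="emoji" title=":heart:" src="h.png"> '
    '<img src="photo.png"></p>'
)
print(suppress_images_if_short(email))
```

Images that belong to the post content, like `photo.png` in the sample, are deliberately left in place, so a post that really is nothing but an image can still be scored by the spam filter.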

Example:

This post on the forum:

Gets rendered in email as:

3 Likes

@techAPJ can you fix the alignment for this very common case?

2 Likes

Fixed:

2 Likes

Just to confirm this is definitely working. Here’s a short email notice I just got:

Notice the removed avatar. Good work @techapj!

1 Like