Discourse emails are marked as spam on GMail

(Kuba) #1

I am trying to set up a production installation of Discourse, hosted on Digital Ocean, and I have a simple issue: GMail marks all emails sent from Discourse as spam. Gmail says “It’s similar to messages that were detected by our spam filters.” Other mail providers seem to be fine, at least the ones I tried.

I know this probably does not belong here, because it’s not a problem of Discourse, but an issue of my server/hosting/mail configuration. But I’d still like to know if anyone else is having the same problem, or if anyone can point me to things to check or to a solution.

I do have SPF records and DKIM keys set up, and the email headers say:

Received-SPF: pass (google.com: domain of ... designates 198.199... as permitted sender) client-ip=198.199....;
Authentication-Results: mx.google.com;
   spf=pass (google.com: domain of ... designates 198.199... as permitted sender) smtp.mail=...;
   dkim=pass header.i=@...

Reverse DNS, full hostname on the server, all done.

I have experimented with it a little and found out: When I remove all links from the email, suddenly it’s not marked as spam. When I insert any link, such as a link to google.com, it is marked as spam!

I am running a Czech version of Discourse, but even when I switch back to English, it’s still the same. Anyway, I really have no idea how to even dig into this and get any info about what’s wrong. How do I troubleshoot this spam problem? What should I check?

(Iszi) #2

While it may not technically be Discourse’s problem, it would be very beneficial if the team could do something to avoid triggering spam filters.

That said, I also use GMail and I’m pretty sure I haven’t had a Discourse e-mail dropped into spam yet. I’ll have to watch out for that though.

(Jeff Atwood) #3

Well, this could also have to do with the specific language (read, not English) being used in the mail. Certainly we send emails via Gmail with links in them and they are not blanket marked as spam. In general Gmail works very well in not generating false spam positives.

Could also be that Google does not like the IP ranges used to send the email. Where are the servers located? The IPs do not appear on any spam lists at all?

(Kuba) #4

The checker at “Email Blacklist Check - See if your server is blacklisted” gives me all green for my IP address. Should I try to check other IPs from the ISP’s range? I also tried switching the site to English and sending emails in English, with no change.
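For anyone who wants to script the blacklist check instead of using a web form, here is a minimal sketch of how a DNS blacklist (DNSBL) lookup generally works: the IPv4 octets are reversed, the list's zone is appended, and the resulting name is resolved. The zone `dnsbl.example.org` and the address `198.51.100.7` are placeholders, not real values from this thread, and `is_listed` needs network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name a DNSBL lookup uses: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything but IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the DNSBL answers for this IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True   # the list returned an A record for this address
    except socket.gaierror:
        return False  # no record: not listed, or the lookup itself failed


# Placeholder values, for illustration only.
example_name = dnsbl_query_name("198.51.100.7", "dnsbl.example.org")
print(example_name)
```

Some lists return special answer codes when queries are refused, so a positive result is best treated as a prompt to check that list's own lookup page by hand.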

(Jeff Atwood) #5

I think we need the full text of the email that is being sent to even begin figuring this out. I see that the server is Digital Ocean, so presumably from there.

Maybe there are other trigger words in the text, the forum title, the domain name, etc?

(Iszi) #6

Have you tested from a brand-new GMail account after the switch to English? GMail for your account might be continuing to flag based on its past detections.

(Attila Szeremi) #7

Do you have links to any url shortening services such as tinyurl.com by any chance? Gmail always marks e-mails with url shortening service links as spam.

(James) #8

I had this problem when I started using Discourse (version 0.9.4 and my forum is in English). But after classifying those emails as ‘non spam,’ all new emails have been non-spam from then on.

I hope my users are getting their emails in their inbox, though.

(Arjan) #9

I had some trouble during the registration process, unrelated to Gmail, in which only the 2nd message was marked as spam.

This caused some confusion, as I was sure I followed the link in the (very first) email message I got, but still was prompted that I could not log in yet, but still needed to follow some instructions. Those were hidden in the 2nd message, in my Mail.app’s junk folder.

From what I see, I think the following applies. Maybe that can help investigate other problems:

  • It was OS X Mail.app that decided the 2nd was junk (not my provider).
  • Both messages had exactly the same sender, recipient and time stamp. (I’ve seen Amazon’s mail sender trip over this as well.)
  • The one that was marked as spam:
    • Was a multipart/alternative message
    • Had a subject that started with [Discourse Meta]
    • Included the header Auto-Submitted: auto-generated
    • Included the header List-Id: ...

I am subscribed to many more lists, which are not marked as junk. (But maybe their first messages were once marked as such too.)

Just in case it helps, some more headers. The first, “Reserve your username ‘arjan’ at discourse.org”, included:

Return-Path: <info@discourse.org>
Received: from tiefighter11.discourse.org ...
Received: from tiefighter3.discourse.internal ...
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=discourse.org;
	s=discourse; t=1376150548;
Received: by tiefighter3.discourse.internal ...
Date: Sat, 10 Aug 2013 16:02:28 +0000
From: info@discourse.org
To: ...xs4all.nl
Message-ID: <52066414a8c5e_71ea2b655c490654@tiefighter3.mail>
Subject: Reserve your username 'arjan' at discourse.org
Mime-Version: 1.0
Content-Type: text/plain;
Content-Transfer-Encoding: 7bit
X-CNFS-Analysis: v=2.1 cv=CriGLBID c=1 sm=0 tr=0
 a=UBIdLuuLFT1E1q43FnVHpw==:117 a=UBIdLuuLFT1E1q43FnVHpw==:17
 a=OTabC9T5AAAA:8 a=Xgr1hj0f96gA:10 a=-AomiwQpxb8A:10 a=IkcTkHD0fZMA:10
 a=FNKlpfQLJMcA:10 a=K9iyX2KQy423txhivgcA:9 a=6lYK0OJeA04bWXgO:21
 a=3lYO4EA5bMylmwtZ:21 a=QEXdDO2ut3YA:10 a=1Ih51tz8xUUA:10 a=Pq7XIEyI4IIA:10
X-Virus-Scanned: by XS4ALL Virus Scanner

The second, “[Discourse Meta] Activate your new account” included:

Return-Path: <info@discourse.org>
Received: from tiefighter11.discourse.org ...
Received: from tiefighter2.discourse.internal ...
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=discourse.org;
	s=discourse; t=1376150549;
Received: by tiefighter2.discourse.internal ...
Date: Sat, 10 Aug 2013 16:02:28 +0000
From: info@discourse.org
Reply-To: info@discourse.org
To: ...xs4all.nl
Message-ID: <52066414f2623_77892f71ac846392d@tiefighter2.mail>
Subject: [Discourse Meta] Activate your new account
Mime-Version: 1.0
Content-Type: multipart/alternative;
Content-Transfer-Encoding: 7bit
Auto-Submitted: auto-generated
List-Id: "Discourse Meta" <discourse.forum.discourse-meta.meta.discourse.org>
X-CNFS-Analysis: v=2.1 cv=We81NSRX c=1 sm=0 tr=0
 a=UBIdLuuLFT1E1q43FnVHpw==:117 a=UBIdLuuLFT1E1q43FnVHpw==:17
 a=OTabC9T5AAAA:8 a=bXROUj_kYy4A:10 a=-AomiwQpxb8A:10 a=g-eCnos8lUEA:10
 a=w3mm4B3hNtAA:10 a=DPDXvey5AAxAMoNGDYsA:9 a=QEXdDO2ut3YA:10
 a=-5lHRPY4d18A:10 a=I4uvm5oszqIA:10 a=hzJHkEa682gA:10 a=afHFR-C1xE0A:10
X-Virus-Scanned: by XS4ALL Virus Scanner
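Comparing two header dumps by eye is error-prone, so here is a small sketch that lists the header fields present in one message but not the other. It uses Python's standard `email` package; the two sample strings are abbreviated copies of the headers above, and the function names are made up for this example.

```python
from email.parser import HeaderParser


def header_names(raw_headers):
    """Return the lower-cased header field names found in a raw header block."""
    message = HeaderParser().parsestr(raw_headers)
    return {name.lower() for name in message.keys()}


def only_in_second(first_raw, second_raw):
    """Header names present in the second message but not in the first."""
    return sorted(header_names(second_raw) - header_names(first_raw))


# Abbreviated versions of the two messages quoted above.
first = (
    "From: info@discourse.org\n"
    "Subject: Reserve your username 'arjan' at discourse.org\n"
    "Content-Type: text/plain;\n"
    "\n"
)
second = (
    "From: info@discourse.org\n"
    "Reply-To: info@discourse.org\n"
    "Subject: [Discourse Meta] Activate your new account\n"
    "Content-Type: multipart/alternative;\n"
    "Auto-Submitted: auto-generated\n"
    'List-Id: "Discourse Meta" <discourse.forum.discourse-meta.meta.discourse.org>\n'
    "\n"
)

print(only_in_second(first, second))
# prints ['auto-submitted', 'list-id', 'reply-to']
```

This only compares which fields exist, not their values, so differences such as the Content-Type or the Subject prefix still need a manual look.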

(ABiS) #10

FYI I’ve been spending the last 24 hours doing everything I can to reduce the likelihood of emails sent by my Discourse instance being flagged as spam.

I did the usual things (still no signing yet though; thanks for Discourse’s deliverability test email). Apart from having a 6-day-old domain, which is therefore automatically included in some lists (and should drop out of them automatically as time goes by), I found the report by http://www.isnotspam.com/ interesting, especially the section with the SpamAssassin check details: they are about the actual content of the email and therefore something Discourse can definitely work on. Here is the report based on Discourse’s deliverability test email; I highlighted the SpamAssassin bits.

(I replaced my real domain with mydomain and the real IP with myIP)

SpamAssassin v3.3.1 (2010-03-19)
Result: ham (non-spam) (04.5points, 10.0 required)

  • 1.5 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
  • [URIs: mydomain.com]
  • -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
  • -0.0 SPF_PASS SPF: sender matches SPF record
  • -0.6 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
  • 2.0 BAYES_80 BODY: Bayes spam probability is 80 to 95% [score: 0.8247]
  • 1.4 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of words
  • 0.1 HTML_MESSAGE BODY: HTML included in message
  • X-Spam-Status: Yes, hits=4.5 required=-20.0 tests=BAYES_80,HTML_IMAGE_ONLY_28,
    autolearn=no version=3.3.1
  • X-Spam-Score: 4.5

and here is a descriptive list of the various tests: “SpamAssassin Tests Performed: v3.3.x”
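To see which rules contribute most, the per-rule lines of a report like the one above can be parsed and summed. This is a rough sketch that assumes each rule line looks like "score RULE_NAME description", as in the pasted report; the function name and regular expression are made up for the example.

```python
import re

# A rule line looks like "1.5 URIBL_RHS_DOB Contains an URI ..." once the bullet is removed.
RULE_LINE = re.compile(r"^(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]+)")


def parse_rule_scores(report):
    """Map each SpamAssassin rule name in the report to its listed score."""
    scores = {}
    for line in report.splitlines():
        cleaned = line.strip().lstrip("•").strip()  # drop indentation and bullet
        match = RULE_LINE.match(cleaned)
        if match:
            scores[match.group(2)] = float(match.group(1))
    return scores


report = """
  • 1.5 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
  • [URIs: mydomain.com]
  • -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
  • -0.0 SPF_PASS SPF: sender matches SPF record
  • -0.6 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
  • 2.0 BAYES_80 BODY: Bayes spam probability is 80 to 95% [score: 0.8247]
  • 1.4 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of words
  • 0.1 HTML_MESSAGE BODY: HTML included in message
"""

scores = parse_rule_scores(report)
total = sum(scores.values())

# List the rules from most to least costly.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:5.1f}  {name}")
print(f"total: {total:.1f}")
```

The listed scores add up to about 4.4, slightly under the 4.5 the report shows, presumably because the per-rule values are rounded for display. The content-based rules BAYES_80 and HTML_IMAGE_ONLY_28 account for most of the total.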

(ABiS) #12

and since I care about activation emails more than anything else I tried registering their email address as a user :smile:

The results are considerably better so I guess it’s less of an issue than I thought

  • 1.5 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
  • [URIs: mydomain.com]
  • -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
  • -0.0 SPF_PASS SPF: sender matches SPF record
  • -0.6 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
  • 0.1 HTML_MESSAGE BODY: HTML included in message
  • 0.0 BAYES_50 BODY: Bayes spam probability is 40 to 60%
  • [score: 0.5003]

would be interesting to delve into what other tests with negative score could be made to pass so that the overall score can be lowered

(Jeff Atwood) #13

As I replied earlier, and as your example (which was way too big, I edited it) shows – this depends heavily on what the content of the email is.

Now if you are seeing default Discourse copy emails get flagged heavily as spam I would want to know about that. But the content of a typical email notification will contain post body content from a user… meaning, completely entered by a user, not by us.

(ABiS) #14

you are right, sorry about that

all my tests at the moment are with the activation email since that’s the one I care the most about. My emails still go into the spam folder often but it should be because the domain is still very young (6 days now).

(Jeff Atwood) #15

Where exactly is the default activation email including an image? I am not aware of any images whatsoever in Discourse account activation emails.

(ABiS) #16

the first test I posted, that includes HTML_IMAGE_ONLY_28 BODY, is for discourse’s deliverability test email, not the activation email.

The second test I posted (that you rightly edited) is for the activation email and indeed there is no HTML_IMAGE_ONLY_28 BODY there

(Jeff Atwood) #17

I see, the deliverability test email has a single emoji in it, I will remove that later so it does not cause issues.

edit: confirmed, this image is removed from the deliverability test email copy.

(Jeff Atwood) #18

Just FYI we now strip all images from small emails because of this (in my opinion, kind of annoying) SpamAssassin rule.

So this is why you may notice small notification emails (short posts, single posts, etc) will not have the typical user avatars (in the email post header) or emoji (in the email post body) in them.
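For readers who want to apply the same idea in their own mail pipeline, here is a rough sketch of it. This is not Discourse's actual code: the 2800-byte cutoff is a hypothetical value borrowed from the "2400-2800 bytes of words" wording in the rule description, and the regex-based tag handling is a simplification.

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")

# Hypothetical cutoff, taken from the rule description rather than from Discourse.
SMALL_EMAIL_BYTES = 2800


def strip_images_if_small(html, limit=SMALL_EMAIL_BYTES):
    """Remove <img> tags when the email's visible text is shorter than `limit` bytes."""
    text_only = ANY_TAG.sub("", html)  # crude approximation of the visible text
    if len(text_only.encode("utf-8")) < limit:
        return IMG_TAG.sub("", html)
    return html


short_email = '<p>Hi <img src="smile.png" alt=":smile:"> there</p>'
long_email = "<p>" + "word " * 1000 + '<img src="avatar.png"></p>'

print(strip_images_if_small(short_email))  # image removed
print(strip_images_if_small(long_email) == long_email)  # long email left untouched
```

A real implementation would more likely use an HTML parser than regular expressions, since attribute values can legally contain a `>` character.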

(Angus McLeod) #19

A test email from my discourse app was marked spam by gmail. The short ‘Why’ note suggests it is the content itself. Previous mails sent using the same mail setup were not marked spam.

(Jeff Atwood) #20

If you view source via “show original” in the GMail options menu for that email, are the DKIM and SPF headers showing “pass”?
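If you save the “show original” text to a file, the SPF and DKIM verdicts can also be pulled out with a few lines of Python. This is a sketch using the standard `email` package; the sample message uses placeholder addresses, and the regular expression only looks for `spf=` and `dkim=` tokens in the Authentication-Results header rather than parsing the full header grammar.

```python
import re
from email.parser import HeaderParser

# Matches tokens such as "spf=pass" or "dkim=fail".
RESULT = re.compile(r"\b(spf|dkim)=(\w+)", re.IGNORECASE)


def auth_results(raw_message):
    """Return the spf/dkim verdicts from all Authentication-Results headers."""
    message = HeaderParser().parsestr(raw_message)
    results = {}
    for value in message.get_all("Authentication-Results", []):
        for method, verdict in RESULT.findall(value):
            results[method.lower()] = verdict.lower()
    return results


# Placeholder message; the domain and IP are examples, not real senders.
raw = (
    "Authentication-Results: mx.google.com;\n"
    "   spf=pass (google.com: domain of sender@example.org designates"
    " 198.51.100.7 as permitted sender) smtp.mail=sender@example.org;\n"
    "   dkim=pass header.i=@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(auth_results(raw))
# prints {'spf': 'pass', 'dkim': 'pass'}
```

Anything other than `pass` for either method is worth fixing before looking at the message content.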

(Angus McLeod) #21