Handling bouncing e-mails

Would be helpful for us too.

Seems like there is some confusion between the documentation from @zogstrip in this howto and the current state of things.

Was this where the change happened? I don’t know enough about GitHub to know if this change was tied to an issue.

https://github.com/discourse/discourse/blob/33abd68bdff4abaaafa7dc12ae5450bfa6af3126/db/migrate/20200130115859_remove_bounce_score_threshold_deactivate_site_setting.rb

2 Likes

Yes it was removed in

https://github.com/discourse/discourse/commit/62c21ba64995856470b01764934c6e1fb76f1b97

It was removed because once you hit bounce_score_threshold (default 4), it is almost impossible for the score to rise any further. Therefore bounce_score_threshold_deactivate (default 30) would never be hit. The vast majority of people never change these settings, and having a threshold which was impossible to hit caused confusion.
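To make the reasoning concrete, here is a minimal sketch under a simplified model: every bounce adds a fixed increment, and no further emails go out once the score reaches bounce_score_threshold. The function name and the increment of 2 are assumptions for illustration, not Discourse's actual code.

```python
def max_reachable_score(threshold: int, increment: int, attempts: int) -> int:
    """Simulate repeated bouncing sends under a simplified bounce-score model.

    Assumption: once the score reaches `threshold`, no further emails are
    sent, so no further bounces can raise the score.
    """
    score = 0
    for _ in range(attempts):
        if score >= threshold:
            break  # sending has stopped, so the score is frozen here
        score += increment  # worst case: every attempted send bounces
    return score


BOUNCE_SCORE_THRESHOLD = 4              # default quoted above
BOUNCE_SCORE_THRESHOLD_DEACTIVATE = 30  # default of the removed setting

# The increment of 2 is an assumed value for illustration.
final = max_reachable_score(BOUNCE_SCORE_THRESHOLD, increment=2, attempts=1000)
print(final, final >= BOUNCE_SCORE_THRESHOLD_DEACTIVATE)
```

In this model the score stalls at or just above the first threshold no matter how many sends are attempted, so the deactivate threshold is only reachable if it is set close to bounce_score_threshold.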

I’ve updated the howto here to reflect the change.

9 Likes

Ah, understood. I had bounce_score_threshold_deactivate set to be equal with bounce_score_threshold if I recall correctly and it worked as intended. Having it not hit by default sounds quite suitable for this setting imo.

2 Likes

Not sure if it actually means to “uncheck” the “group events” checkbox? The icon used is not checked, so I got confused.

I am following the instructions for Mailjet and the fourth bullet seems ambiguous. What does

check the :white_medium_square: in the group events column

mean?

The wording says “check”. The square is empty. The current default for Mailjet is checked. Should we uncheck it? Is the wording not up to date?

1 Like

It means check the box in the group events column. A little more description would help? I’m assuming if you want mail sent for group events, then check the box.

2 Likes

What are Mailgun’s rules for trying to send emails to an address after a permanent_fail?

I guess I should choose Discourse settings to match Mailgun’s rules as closely as possible.

2 Likes

I asked Mailgun and here is their reply:

Bounces, complaints and unsubscribes will be added to something called a Suppressions List, so that you are unable to send to them in the future and accrue a high bounce rate. You can manually remove the address from the Suppressions List, in order to contact that recipient again, if you’re certain that the address is correctly spelled and in good standing.

I didn’t ask about temporary_fail but imagine they try again until success or permanent_fail.

So if I understand correctly, Discourse allowing x bounces within y days might mean it keeps sending to addresses that are already on Mailgun’s suppression list following one permanent fail.

4 Likes

I’ve set up email feedback forwarding as described in the post for SES, but I’m not getting bounce notifications in Discourse. Do I need to create an SNS webhook? Or do I need to enable VERP? Please help.

1 Like

Yes, for SES you need to set up VERP on Discourse. I’ll update the OP here to make that clearer.

4 Likes

Thanks @david! I’ve set the reply by email address, but I am still not getting bounce notifications. Do I need to enable the reply by email enabled setting as well?

I’m sending emails via noreply@domain.com and I’ve set the reply by email address to replies+%{reply_key}@domain.com
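For anyone unsure what the %{reply_key} placeholder is doing, here is a rough sketch of the general VERP idea: a per-message key is embedded in the address, so a bounce delivered to that address can be traced back to the message that caused it. The function names and the hex key format are assumptions for illustration; Discourse's real key format and matching logic may differ.

```python
import re
import uuid
from typing import Optional

REPLY_KEY_PLACEHOLDER = "%{reply_key}"


def build_verp_address(template: str, reply_key: str) -> str:
    """Fill the per-message key into the reply-by-email address template."""
    if REPLY_KEY_PLACEHOLDER not in template:
        raise ValueError("template must contain %{reply_key}")
    return template.replace(REPLY_KEY_PLACEHOLDER, reply_key)


def extract_reply_key(template: str, address: str) -> Optional[str]:
    """Recover the key from the address a bounce was delivered to."""
    prefix, suffix = template.split(REPLY_KEY_PLACEHOLDER, 1)
    # Assumed key format: a run of hexadecimal characters.
    pattern = re.escape(prefix) + r"([0-9a-f]+)" + re.escape(suffix)
    match = re.fullmatch(pattern, address, flags=re.IGNORECASE)
    return match.group(1) if match else None


template = "replies+%{reply_key}@domain.com"
key = uuid.uuid4().hex  # hypothetical per-message key
address = build_verp_address(template, key)
print(address, extract_reply_key(template, address) == key)
```

Note that the sketch rejects a template without the leading % sign, since the placeholder would never be substituted.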

1 Like

I’m having the same issue with Amazon SES. I followed the instructions in the OP to enable VERP (set the reply by email address and enabled reply by email). For the address I’ve tried replies+{reply_key}@mydomain.com, replies+{reply_key}@replies.mydomain.com, and replies+{reply_key}@noreply.mydomain.com (which is what the site sends emails as).

I see my bounced email count going up in SES, so I know it’s processing them.

The only thing I’m not sure about was the Manual Polling vs POP3 Polling that I had to enable before I could set the Reply By Emails setting to enabled. I went with manual since it had no other settings.

2 Likes

After some testing using AWS Simple Email/Notification Service I got it working.

What I did:

  • On Discourse:
    • set reply by email enabled and reply by email address as per the top of OP
    • set manual polling enabled
  • On AWS:
    • create an SNS Topic
    • create a Subscription for the created Topic as HTTPS pointing to https://your.discourse/webhooks/aws
    • set this Topic as the value for Bounces and Complaints on the Notifications setting for my SES domain
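To show roughly what arrives at the HTTPS endpoint once the steps above are in place, here is a sketch that unpacks an SNS delivery carrying an SES bounce notification. It assumes the standard SNS envelope (raw message delivery off) and the standard SES bounce notification layout; it is not how Discourse itself handles /webhooks/aws, and the function name is hypothetical.

```python
import json
from typing import List


def bounced_recipients(sns_body: str) -> List[str]:
    """Return the bounced addresses from one SNS HTTPS delivery body."""
    envelope = json.loads(sns_body)
    if envelope.get("Type") != "Notification":
        # e.g. the SubscriptionConfirmation request SNS sends first
        return []
    # SNS wraps the SES notification as a JSON string in "Message".
    message = json.loads(envelope["Message"])
    if message.get("notificationType") != "Bounce":
        return []  # complaints and deliveries use other types
    return [r["emailAddress"] for r in message["bounce"]["bouncedRecipients"]]


sample = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "notificationType": "Bounce",
        "bounce": {
            "bounceType": "Permanent",
            "bouncedRecipients": [{"emailAddress": "user@example.com"}],
        },
    }),
})
print(bounced_recipients(sample))
```

The first request SNS sends to a new HTTPS subscription is a confirmation, not a bounce, which may be worth checking if the subscription stays pending.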

I’m not sure how the Discourse settings relate to each other. At first I assumed only manual polling enabled was needed when using the SNS approach, but if I reset the other values the bounces stop showing up on /admin/email/bounced.

I’m guessing you’d need either manual polling enabled + Webhook configuration on AWS (what I did) or pop3 polling enabled + pop3 polling * settings with the OP approach that recommends enabling Email Feedback Forwarding on AWS – but of course I may be completely wrong.

What I didn’t test yet is whether the Complaints notifications are used by Discourse – it’s working as expected for Bounces.

5 Likes

Just noticed this reply, and after managing to set the SNS bounce/complaint thing in the wrong place twice, after the third time it started working!

Thanks! (And boo Amazon for their confusing UI)

3 Likes

It may not always be desirable to enable the temporary_fail webhook. This hook is called for timeouts when sending mails - usually, these sends will be retried and will succeed on the second attempt. There may be other spurious “temporary” failures that trigger this hook as well. I have a number of users / hosts subscribing to my forum that occasionally time out during sends - if temporary_fail is connected, these addresses very quickly hit the bounce limit and are blocked.

AFAICT Mailgun has its own rubric for permanently suppressing badly-behaving addresses - from the API docs, it sounds like these are always reported to the permanent_fail webhook - so the only reason to connect the temporary_fail webhook is probably if you want bounce rules that are more strict than those of Mailgun. And since timeouts seem much more common than any other kind of temporary_fail, you run a risk of lots of false positives or false negatives no matter how you configure your bounce scores…

Something unfortunate: Mailgun classifies “mailbox full” errors as temporary_fail. So, if you disable this webhook, users probably won’t be notified of a “mailbox full” failure until it has occurred several times and they have triggered a hard bounce / suppression.
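The trade-off can be sketched as a small classifier over webhook payloads. This assumes the Mailgun webhook layout with an "event-data" object whose "event" is "failed" and whose "severity" is "permanent" or "temporary"; the increments and the use_temporary_fail switch are hypothetical, standing in for whether the temporary_fail webhook is connected at all.

```python
from typing import Optional

# Assumed increments, for illustration only.
SOFT_BOUNCE_SCORE = 1
HARD_BOUNCE_SCORE = 2


def bounce_increment(payload: dict, use_temporary_fail: bool = True) -> Optional[int]:
    """Return the score to add for one webhook payload, or None to ignore it."""
    event = payload.get("event-data", {})
    if event.get("event") != "failed":
        return None  # delivered, opened, etc. are not bounces
    severity = event.get("severity")
    if severity == "permanent":
        return HARD_BOUNCE_SCORE
    if severity == "temporary" and use_temporary_fail:
        # Timeouts and "mailbox full" both land here.
        return SOFT_BOUNCE_SCORE
    return None


timeout = {"event-data": {"event": "failed", "severity": "temporary"}}
print(bounce_increment(timeout), bounce_increment(timeout, use_temporary_fail=False))
```

With the switch off, a timeout and a full mailbox are both ignored, which is exactly the downside described above.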

3 Likes

I have always felt unsure about this, and the relationship between Discourse (the software I mean) and Mailgun. On balance, it sounds like you would generally recommend not using the temporary_fail webhook - is that right? After a while (reset bounce score after days) Discourse starts trying to send emails again (and presumably Mailgun then forms a view about whether to send) so maybe that is relevant (ie an argument for saying it’s ok to keep the temporary_fail webhook). Maybe the answer is to tweak the bounce threshold/increments?

(Also, as a Mailgun guru, do you have any view on "Discourse::NotFound" error when click "Email Type" field on admin/email/bounced - #8 by Jonathan5 ?)

2 Likes

It should be possible to tweak the settings to avoid some spurious Discourse bounces due to these timeouts (fwiw, the error is: Service closing transmission channel - command timeout).

Here’s the problem: in my logs, I see that timeouts are by far the most common “temporary” failure. If you’ve tweaked your Discourse increment/threshold settings such that timeout errors are not triggering bounces on Discourse, then you’re definitely not triggering bounces for any of the other kinds of errors, since these are less frequent. On the other hand: if your increment/threshold are such that you’ll e.g. catch a case where someone has a full mailbox and delivery fails a few times in a row, you’ll also probably end up catching some timeout cases as well, purely b/c these are much more statistically common.

At best, you can probably set your increment for soft bounces so that you’ll reliably catch e.g. an address where a send fails 20 times in a row (e.g. maybe something like reset time 4 days, threshold 12, soft increment 1, hard increment 12). But, it’s pretty likely that a case like this would already be flagged as a permanent_fail by Mailgun anyway - in this case, you don’t really have a choice to restrict email to that address: Mailgun is already suppressing mail via its own rules, until you manually re-enable it.
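Here is a small simulation of settings like these, as a sketch only. It assumes the score resets to zero when more than the reset period passes between two bounces, which is a simplification and may not match exactly how Discourse applies the reset; the function name and data shape are made up for the example.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def threshold_reached_at(
    bounces: List[Tuple[datetime, str]],
    threshold: int,
    soft: int,
    hard: int,
    reset_days: int,
) -> Optional[datetime]:
    """Return when the score first reaches the threshold, or None.

    `bounces` is a time-sorted list of (timestamp, "soft" | "hard").
    """
    score = 0
    last: Optional[datetime] = None
    for when, kind in bounces:
        # Assumed reset rule: a quiet gap longer than reset_days clears the score.
        if last is not None and when - last > timedelta(days=reset_days):
            score = 0
        score += hard if kind == "hard" else soft
        last = when
        if score >= threshold:
            return when
    return None


start = datetime(2024, 1, 1)
settings = dict(threshold=12, soft=1, hard=12, reset_days=4)

daily_soft = [(start + timedelta(days=i), "soft") for i in range(12)]
sparse_soft = [(start + timedelta(days=5 * i), "soft") for i in range(50)]

print(threshold_reached_at(daily_soft, **settings))
print(threshold_reached_at(sparse_soft, **settings))
print(threshold_reached_at([(start, "hard")], **settings))
```

Under these assumptions, occasional timeouts spaced more than four days apart never accumulate, while a single hard bounce trips the threshold immediately.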

TLDR: it seems like permanent_fail by itself will do what you need, and temporary_fail may not be needed unless you have a specific reason to make your bounce rules more strict than Mailgun’s own rules.

You can check for timeout related failures on your own Mailgun account here (replacing the *** with your domain):

https://app.mailgun.com/app/sending/domains/***.*****.***/logs?search=timeout

I see at least a few per day - curious what other people see? This may be a coincidence, but it seems like the failures are generally with obscure/personal/university email hosts, which would point to something like old SMTP server software or high server load (I see no gmail/yahoo/protonmail/etc timeouts in spite of these being the majority).
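If you export those log entries, a quick tally by recipient domain makes the pattern easier to compare across sites. This is a sketch that assumes each entry is a dict with a "recipient" field and a "delivery-status" object containing a "message" string; those field names are my assumption about the export format, so adjust them to whatever your data actually contains.

```python
from collections import Counter
from typing import Iterable


def timeout_failures_by_domain(events: Iterable[dict]) -> Counter:
    """Count failures whose status message mentions a timeout, per domain."""
    counts: Counter = Counter()
    for event in events:
        message = event.get("delivery-status", {}).get("message", "")
        if "timeout" in message.lower():
            # Everything after the last "@" is treated as the host.
            domain = event.get("recipient", "").rpartition("@")[2].lower()
            counts[domain] += 1
    return counts


sample_events = [
    {"recipient": "a@uni.example",
     "delivery-status": {"message": "Service closing transmission channel - command timeout"}},
    {"recipient": "b@uni.example",
     "delivery-status": {"message": "Service closing transmission channel - command timeout"}},
    {"recipient": "c@mail.example",
     "delivery-status": {"message": "Mailbox full"}},
]
print(timeout_failures_by_domain(sample_events).most_common())
```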

3 Likes

I wonder whether the problem is that Discourse was designed like Mailman (the mailing list software), ie to be used with your own mail server rather than an external commercial service. For example, if it had been designed with Mailgun in mind then likely it would obtain Mailgun’s suppression list via Mailgun’s API rather than rely on its own potentially inconsistent bounce scores/thresholds. (Also it would obtain the bounce information instead of showing an error message - see other topic.)
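As a sketch of what that could look like, the function below checks outgoing recipients against a list of suppressed addresses, however that list is obtained (for example exported from the provider). Nothing here is an existing Discourse feature; the function name and the case-insensitive matching are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def split_by_suppression(
    recipients: Iterable[str], suppressed: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split recipients into (deliverable, blocked) using a suppression list."""
    # Assumption: addresses are compared case-insensitively, ignoring whitespace.
    suppressed_set = {address.strip().lower() for address in suppressed}
    deliverable: List[str] = []
    blocked: List[str] = []
    for address in recipients:
        if address.strip().lower() in suppressed_set:
            blocked.append(address)
        else:
            deliverable.append(address)
    return deliverable, blocked


deliverable, blocked = split_by_suppression(
    ["alice@example.com", "Bob@Example.com"],
    ["bob@example.com"],
)
print(deliverable, blocked)
```

Treating the provider's list as the source of truth would avoid the case where the local bounce score says "send" while the provider silently drops the message.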

1 Like

Hi,

I am a little confused. I did all of this, but which email address should I add?

Thanks.

1 Like

Email feedback should be going to the VERP address - check those setup instructions for details.

2 Likes