Email notifications fail if duplicate headers exist

See Email notifications fail if duplicate headers exist - #14 by simonk for bug description
Above added by @pfaffman

Original text follows

Hello,

I’m running a rather small Discourse installation for some years now. It’s fairly low traffic, hence it took a while to notice that sending of emails (notifications, digests) obvoiusly began to fail some months ago. Forensics point to the upgrade to 2.8.0.beta7 around 2022-10-22, previously we were at 2.8.0.beta4. At least I haven’t received any email from that installation about posts or messages.

The emails pile up in Sidekiq, with a message I can’t relate to, nor find anything fitting when searching for it — there where reports of undefined method messages, but none of the conditions match mine. (It’s not TLS, it’s no timeout to the mail server, it’s not the events plugin, and the secure media fix should be in already — besides, the exact error messages were different.)

Error in Sidekiq:

Wrapped NoMethodError: undefined method `value' for #<Array:0x00007f7fd5277d68> Did you mean? values_at

The part after #<Array: is different for each held email. I reinstalled Discourse to a new VM last night and restored from a fresh backup — but seemingly the email issue was restored with the data :thinking:

As the error rate started to climb end of October, I’m rather sure this was introduced by 2.8.0.beta7:

Any help, or hint to debug the issue further, is much appreciated.

1 Like

Do you have any plugins installed? If you have non standard plugins you should remove them.

Tried this already, the issue prevails :frowning:

## Plugins go here
## see https://meta.discourse.org/t/19157 for details
hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/docker_manager.git

## Any custom commands to run after building

EDIT: FTR:

Intended Plugins:

root@discourse:/var/discourse# grep -B5 "git clone" containers/app.yml-with-plugins 
hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/discourse/discourse-spoiler-alert.git
          - git clone https://github.com/discourse/discourse-prometheus.git
          - git clone https://github.com/discourse/discourse-solved.git
          - git clone https://github.com/discourse/discourse-chat-integration.git
          - git clone https://github.com/discourse/discourse-voting.git
          - git clone https://github.com/discourse/discourse-checklist.git
          - git clone https://github.com/discourse/discourse-whos-online.git
          - git clone https://github.com/discourse/discourse-calendar.git
          - git clone https://github.com/discourse/discourse-affiliate.git
          - git clone https://github.com/discourse/discourse-reactions.git
          - git clone https://github.com/discourse/discourse-surveys.git

The old installation has:

root@discourse-old:/var/discourse# grep -B5 "git clone" containers/app.yml
hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/discourse/discourse-spoiler-alert.git
          - git clone https://github.com/discourse/discourse-prometheus.git
          - git clone https://github.com/discourse/discourse-solved.git
          - git clone https://github.com/discourse/discourse-chat-integration.git
          - git clone https://github.com/discourse/discourse-voting.git
          - git clone https://github.com/discourse/discourse-checklist.git
          - git clone https://github.com/davidtaylorhq/discourse-whos-online.git

This is a standard install? If you have separate data container has it been rebuilt? If postgres up to date? (Though I think that there is still a check for this on the code)

Yes, it is a standard install; AFAICS there is only one container running. (Bring up a new VM, repoint DNS, apt update, apt dist-upgrade, reboot, git pull, ./discourse-setup, Web-based setup, upload backup, restore backup, re-enable mail, see the errors again.) Note that the old install was able to email the link to the backup, and sending test emails still works in the new install — it’s seemingly only posting-related mails that fail.

To be more precise, it’s a minimal Debian 10 installation, on which I put discourse:

root@discourse:~# history
    1  apt update
    2  apt dist-upgrade
    3  reboot ; exit
    4  git clone https://github.com/discourse/discourse_docker.git /var/discourse
    5  apt install git rsync
    6  git clone https://github.com/discourse/discourse_docker.git /var/discourse
    7  cd /var/discourse
       [attach /dev/vdb, fdisk, mkfs.ext4, mount as /var/lib/docker]
   18  apt-get install git apt-transport-https ca-certificates curl gnupg2 software-properties-common -y
   19  df -h
   20  curl -fsSL https://download.docker.com/linux/$(. /etc/os-release; echo "$ID")/gpg | apt-key add -
   21  add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") $(lsb_release -cs) stable"
   22  apt-get update -y
   23  apt-get install docker-ce -y
   24  ./discourse-setup

And one container:

root@discourse:~# docker ps
CONTAINER ID   IMAGE                 COMMAND        CREATED        STATUS        PORTS                                                                      NAMES
ac408a70305d   local_discourse/app   "/sbin/boot"   12 hours ago   Up 12 hours   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   app

Edit:

root@discourse:~# apt update
Hit:1 https://download.docker.com/linux/debian buster InRelease
Hit:2 http://deb.debian.org/debian buster InRelease
Get:3 http://deb.debian.org/debian buster-updates InRelease [51,9 kB]
Get:4 http://security.debian.org/debian-security buster/updates InRelease [65,4 kB]
Fetched 117 kB in 2s (60,5 kB/s)    
Reading package lists... Done
Building dependency tree       
Reading state information... Done
All packages are up to date.

The first restores actually failed, though. database_restorer.rb failed in restore_dump due to duplicated links in post_id 3841. I ended up replacing these links in a post from 2017 by a screenshot of them on the old install and doing another backup; only then I was able to restore the backup on the new install. As you mentioned postgres, this happened during CREATE INDEX with ERROR: could not create unique index "unique_post_links". Further info: EXCEPTION: psql failed: DETAIL: Key (topic_id, post_id, url)=(1300, 3841, [redacted]) is duplicated.

While I don’t think this is directly related, I though I should mention it.

So, if nobody knows an easy way how to fix it, let’s debug this properly. But I need your help, as Discourse is a rather complex Application using a bunch of technologies I’m not familiar with. So: How are Mails “send to” Sidekiq?

Which component of Discourse is giving Sitekiq a method “value”?

With email notifications stopping to work after an Upgrade, Discourse is unfortunately rather useless now. Instead of immediate attention, topics now take days to get attention, if any, as people don’t do active polling in 2022. No notification ⇒ nothing happened ⇒ no need to check the site :frowning:

1 Like

This looks like a corrupt index. I don’t think I’ve seen this on postgres 13. It sounds like you fixed those on an old site and the upgraded by backing up and restoring to s new site?

It looks like the issue has to do with something in the code that’s sending notifications, but they it’s an issue with sidekiq.

As I said, old site is running, I did a backup and put that onto a fresh install, restore failed. I modified the post noted as the culprit until restore on the fresh install worked. Only to see having the issue with Sitekiq again/still.

The old site runs postgres 13 as well (but goes back several years, so it most likely didn’t start with that version :slightly_smiling_face:)

root@discourse-old:/var/discourse# ./launcher enter app
x86_64 arch detected.
root@discourse-app:/var/www/discourse# psql --version
psql (PostgreSQL) 13.5 (Debian 13.5-1.pgdg110+1)

So, according to this post’s closing comments, Discourse’s Database can get corrupted — and get fixed.

Tried with a new user, it get its registration email properly. But notifications on replies to its posts, no; Sidekiq errors out.

So, to me this means that Discourse is giving some fautly information to Sidekiq when it instructs it to send out Notifications (in contrast to registration mail). How to debug further?

OK, so if the indexes are repaired then this suggests to me that something is getting called with an array rather than a model that has value. The problem isnt’ with sidekiq, per se, but with the function that sidekiq is causing to get called.

So it sounds like something is getting called that’s returning an array rather than a single item, but I can’t guess just what. I think you’ll need to look in /var/discourse/shared/standalone/logs/rails/production.log (or something very much like that if my fingers or memory are failing me). Then you can look in those logs for that error (or cause it to happen again so it’ll be at the end of the file). You should get more information about what is failing there.

Thanks, but it’s not really telling much:

Started POST "/sidekiq/retries" for 185.39.142.187 at 2022-04-11 16:31:35 +0000
start
Started GET "/sidekiq/retries" for 185.39.142.187 at 2022-04-11 16:31:35 +0000
  Rendered email/notification.html.erb (Duration: 42.8ms | Allocations: 4323)
  Rendered layouts/email_template.html.erb (Duration: 0.3ms | Allocations: 29)
Job exception: undefined method `value' for #<Array:0x00007ff393af6c78>
Did you mean?  values_at

fail

shared/standalone/log/rails/production_errors.log is empty.

Does the error show up in Admin → Logs → Error Logs? If so, you can get a full stack trace which might help.

1 Like

Oh, nice — yes, it does:

So, if I understand this correctly,

 433    def header_value(name)
 434      header = @message.header[name]
 435      return nil unless header
*436      header.value
 437    end

is – in the Discource code? – where Sidekiq bails out?

This is called from …

 228      MessageBuilder.custom_headers(SiteSetting.email_custom_headers).each do |key, _|
*229        value = header_value(key)
 230
 231        # Remove Auto-Submitted header for group private message emails, it does
 232        # not make sense there and may hurt deliverability.

So it could be a custom header?

I indeed do have an entry there:

I’ve reset this to defaults, and … actually it seems that email notifications are indeed been sent out again, as verified by readers.

Thanks!

One question remains, though: Why is »Auto-Submitted: auto-generated|Precedence: bulk« causing this failure? It states that custom headers shall be seperated by »|«.

1 Like

(Disclaimer: not a ruby programmer)

I think this is particularly nasty behaviour in the mail library that discourse is using. Here’s the header_value function:

As far as I can tell, @message.header[name] is calling this method:

https://www.rubydoc.info/github/mikel/mail/Mail%2FHeader:[]

As per RFC, many fields can appear more than once, we will return a string of the value if there is only one header, or if there is more than one matching header, will return an array of values in order that they appear in the header ordered from top to bottom.

Discourse automatically sets a Precedence header, so because you’re adding one as well via the email_custom_headers setting, there are now two Precedence headers, and @message.header["Precedence"] is returning an array instead of a string.

I think this bug will be triggered any time email_custom_headers contains a header that already exists on the message object.

5 Likes

This looks to me like what’s happening (I suggested they something was an array instead of a single item, but couldn’t imagine how that could be true) and is a bug.

I’ll change the title and category of this topic.

3 Likes

I merged a fix for this today, if we detect a duplicate header we just use the one defined in Discourse core rather than the custom one from the site setting:

4 Likes

This topic was automatically closed after 2 days. New replies are no longer allowed.