I’m running a rather small Discourse installation for some years now. It’s fairly low traffic, hence it took a while to notice that sending of emails (notifications, digests) obvoiusly began to fail some months ago. Forensics point to the upgrade to 2.8.0.beta7 around 2022-10-22, previously we were at 2.8.0.beta4. At least I haven’t received any email from that installation about posts or messages.
The emails pile up in Sidekiq, with a message I can’t relate to, nor find anything fitting when searching for it — there where reports of undefined method messages, but none of the conditions match mine. (It’s not TLS, it’s no timeout to the mail server, it’s not the events plugin, and the secure media fix should be in already — besides, the exact error messages were different.)
Error in Sidekiq:
Wrapped NoMethodError: undefined method `value' for #<Array:0x00007f7fd5277d68> Did you mean? values_at
The part after #<Array: is different for each held email. I reinstalled Discourse to a new VM last night and restored from a fresh backup — but seemingly the email issue was restored with the data
As the error rate started to climb end of October, I’m rather sure this was introduced by 2.8.0.beta7:
## Plugins go here
## see https://meta.discourse.org/t/19157 for details
hooks:
after_code:
- exec:
cd: $home/plugins
cmd:
- git clone https://github.com/discourse/docker_manager.git
## Any custom commands to run after building
This is a standard install? If you have separate data container has it been rebuilt? If postgres up to date? (Though I think that there is still a check for this on the code)
Yes, it is a standard install; AFAICS there is only one container running. (Bring up a new VM, repoint DNS, apt update, apt dist-upgrade, reboot, git pull, ./discourse-setup, Web-based setup, upload backup, restore backup, re-enable mail, see the errors again.) Note that the old install was able to email the link to the backup, and sending test emails still works in the new install — it’s seemingly only posting-related mails that fail.
root@discourse:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ac408a70305d local_discourse/app "/sbin/boot" 12 hours ago Up 12 hours 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp app
Edit:
root@discourse:~# apt update
Hit:1 https://download.docker.com/linux/debian buster InRelease
Hit:2 http://deb.debian.org/debian buster InRelease
Get:3 http://deb.debian.org/debian buster-updates InRelease [51,9 kB]
Get:4 http://security.debian.org/debian-security buster/updates InRelease [65,4 kB]
Fetched 117 kB in 2s (60,5 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
The first restores actually failed, though. database_restorer.rb failed in restore_dump due to duplicated links in post_id 3841. I ended up replacing these links in a post from 2017 by a screenshot of them on the old install and doing another backup; only then I was able to restore the backup on the new install. As you mentioned postgres, this happened during CREATE INDEX with ERROR: could not create unique index "unique_post_links". Further info: EXCEPTION: psql failed: DETAIL: Key (topic_id, post_id, url)=(1300, 3841, [redacted]) is duplicated.
While I don’t think this is directly related, I though I should mention it.
So, if nobody knows an easy way how to fix it, let’s debug this properly. But I need your help, as Discourse is a rather complex Application using a bunch of technologies I’m not familiar with. So: How are Mails “send to” Sidekiq?
Which component of Discourse is giving Sitekiq a method “value”?
With email notifications stopping to work after an Upgrade, Discourse is unfortunately rather useless now. Instead of immediate attention, topics now take days to get attention, if any, as people don’t do active polling in 2022. No notification ⇒ nothing happened ⇒ no need to check the site
This looks like a corrupt index. I don’t think I’ve seen this on postgres 13. It sounds like you fixed those on an old site and the upgraded by backing up and restoring to s new site?
It looks like the issue has to do with something in the code that’s sending notifications, but they it’s an issue with sidekiq.
As I said, old site is running, I did a backup and put that onto a fresh install, restore failed. I modified the post noted as the culprit until restore on the fresh install worked. Only to see having the issue with Sitekiq again/still.
The old site runs postgres 13 as well (but goes back several years, so it most likely didn’t start with that version )
So, according to this post’s closing comments, Discourse’s Database can get corrupted — and get fixed.
Tried with a new user, it get its registration email properly. But notifications on replies to its posts, no; Sidekiq errors out.
So, to me this means that Discourse is giving some fautly information to Sidekiq when it instructs it to send out Notifications (in contrast to registration mail). How to debug further?
OK, so if the indexes are repaired then this suggests to me that something is getting called with an array rather than a model that has value. The problem isnt’ with sidekiq, per se, but with the function that sidekiq is causing to get called.
So it sounds like something is getting called that’s returning an array rather than a single item, but I can’t guess just what. I think you’ll need to look in /var/discourse/shared/standalone/logs/rails/production.log (or something very much like that if my fingers or memory are failing me). Then you can look in those logs for that error (or cause it to happen again so it’ll be at the end of the file). You should get more information about what is failing there.
Started POST "/sidekiq/retries" for 185.39.142.187 at 2022-04-11 16:31:35 +0000
start
Started GET "/sidekiq/retries" for 185.39.142.187 at 2022-04-11 16:31:35 +0000
Rendered email/notification.html.erb (Duration: 42.8ms | Allocations: 4323)
Rendered layouts/email_template.html.erb (Duration: 0.3ms | Allocations: 29)
Job exception: undefined method `value' for #<Array:0x00007ff393af6c78>
Did you mean? values_at
fail
shared/standalone/log/rails/production_errors.log is empty.
is – in the Discource code? – where Sidekiq bails out?
This is called from …
228 MessageBuilder.custom_headers(SiteSetting.email_custom_headers).each do |key, _|
*229 value = header_value(key)
230
231 # Remove Auto-Submitted header for group private message emails, it does
232 # not make sense there and may hurt deliverability.
I’ve reset this to defaults, and … actually it seems that email notifications are indeed been sent out again, as verified by readers.
Thanks!
One question remains, though: Why is »Auto-Submitted: auto-generated|Precedence: bulk« causing this failure? It states that custom headers shall be seperated by »|«.
As per RFC, many fields can appear more than once, we will return a string of the value if there is only one header, or if there is more than one matching header, will return an array of values in order that they appear in the header ordered from top to bottom.
Discourse automatically sets a Precedence header, so because you’re adding one as well via the email_custom_headers setting, there are now twoPrecedence headers, and @message.header["Precedence"] is returning an array instead of a string.
I think this bug will be triggered any time email_custom_headers contains a header that already exists on the message object.
This looks to me like what’s happening (I suggested they something was an array instead of a single item, but couldn’t imagine how that could be true) and is a bug.
I merged a fix for this today, if we detect a duplicate header we just use the one defined in Discourse core rather than the custom one from the site setting: