Imported users not receiving verification emails due to massive Sidekiq backlog

I just launched my forum after importing from vBulletin.

The user flow leading up to the error:

Imported user enters old username and password > successful login > verification email sent to user > USER NEVER GETS THE EMAIL

I have gone through the email troubleshooting guide, and everything seems fine:

✓ app.yml looks good
✓ DigitalOcean not the issue
✓ Mandrill working fine
✓ Test Emails through Admin Interface get sent
✓ Rebuilt several times just to be sure the SiteSettings match the cache
✓ PTR record checks out fine

I think my issue is with Sidekiq, but I’m not certain, so here is some information I have:

ps -ef | grep sidekiq
root     28274  6811  0 20:02 pts/0    00:00:00 grep --color=auto sidekiq
1000     31174 31145 98 18:56 ?        01:05:43 sidekiq 3.3.4 discourse [5 of 5 busy]            
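
The `[5 of 5 busy]` at the end of that process title means all five worker threads in that Sidekiq process are occupied, which fits the backlog theory. A small Python sketch for pulling those numbers out of `ps -ef` output; the process-title format is assumed from the line above and may differ between Sidekiq versions:

```python
import re

# Matches titles like "sidekiq 3.3.4 discourse [5 of 5 busy]".
# Format assumed from the ps output above; other Sidekiq versions may differ.
PROCLINE = re.compile(
    r"sidekiq\s+(?P<version>\S+)\s+(?P<tag>\S+)\s+"
    r"\[(?P<busy>\d+) of (?P<concurrency>\d+) busy\]"
)


def parse_sidekiq_ps(ps_output: str) -> list[dict]:
    """Return one dict per Sidekiq process found in `ps -ef` output."""
    processes = []
    for line in ps_output.splitlines():
        match = PROCLINE.search(line)
        if not match:
            continue  # e.g. the grep process itself
        busy = int(match["busy"])
        concurrency = int(match["concurrency"])
        processes.append({
            "version": match["version"],
            "tag": match["tag"],
            "busy": busy,
            "concurrency": concurrency,
            "saturated": busy >= concurrency,  # every thread is working
        })
    return processes


sample = """root     28274  6811  0 20:02 pts/0    00:00:00 grep --color=auto sidekiq
1000     31174 31145 98 18:56 ?        01:05:43 sidekiq 3.3.4 discourse [5 of 5 busy]"""

for proc in parse_sidekiq_ps(sample):
    print(proc)
```

A process that stays saturated for hours, as here, usually means jobs are arriving faster than they are finished.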

Any ideas? @sam??

“Imported user” sounds suspicious to me. What happens when a new user created from scratch signs up?

Forgot to mention: I tested that as well, with another of my email addresses.

No activation email was sent.

Is this a Docker based install, and is there anything mail related in forum.example.com/logs ?

It is a Docker-based install. I went by the book, whatever impression my early-summer reactions to Discourse may have given you.

Over 90% of the log errors are “Failed to pull hotlinked image”, but I am also seeing “Job exception: getaddrinfo: Name or service not known”, which I’m guessing is very relevant to the issue.

What’s likely happening is that the email jobs are stuck in the queue behind all of the post processing jobs from the import.
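
A toy model of why that happens: Sidekiq jobs in a single queue are picked up roughly in the order they were enqueued, so a verification email requested after the import waits behind every leftover post-processing job. The Python simulation below uses one FIFO queue and a single worker; the job names are hypothetical:

```python
from collections import deque


def processing_order(backlog_jobs: int, new_jobs: list[str]) -> list[str]:
    """Simulate one FIFO queue drained by a single worker."""
    # Jobs left over from the import are already waiting in the queue.
    queue = deque(f"post_process_{i}" for i in range(backlog_jobs))
    # Jobs requested later (e.g. a verification email) join at the back.
    queue.extend(new_jobs)
    order = []
    while queue:
        order.append(queue.popleft())
    return order


order = processing_order(3, ["verification_email"])
print(order.index("verification_email"))  # 3: it waits behind every post job
```

With hundreds of thousands of imported posts instead of three, the same ordering explains delays measured in hours or days.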

I’ve tried just about everything to get these transactional emails to send via Mandrill. It’s frustrating, mainly because my help inbox is filling up with users who aren’t receiving their validation emails.

I followed the installation guide and email troubleshooting step by step. I got confirmation that Digital Ocean is not blocking outbound emails.

I can also send emails via Mandrill from my local machine, and from both outside and inside the container on my DO server, using a Python script with my Mandrill credentials (the same ones as in app.yml). Mail from the admin test interface also works. After checking outbound email in Mandrill, it seems the transactional emails aren’t passing through at all?
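
A standalone check of that kind might look like the sketch below, using only Python’s standard `smtplib` and `email` modules. It is not the poster’s actual script; the environment variable names are hypothetical. Note that a test like this talks to the SMTP relay directly and never goes through Sidekiq, so it can succeed while Discourse’s queued emails are still waiting:

```python
import os
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Create a plain-text test email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP test from the Discourse host"
    msg.set_content("If you can read this, the SMTP relay works from this machine.")
    return msg


def send_test_message(msg: EmailMessage, host: str, port: int,
                      user: str, password: str) -> None:
    """Send over STARTTLS (port 587, as in DISCOURSE_SMTP_PORT)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)


# Hypothetical variable names; set them to the same values as in app.yml.
REQUIRED = ("SMTP_FROM", "SMTP_TO", "SMTP_USER", "SMTP_PASSWORD")

if __name__ == "__main__" and all(name in os.environ for name in REQUIRED):
    send_test_message(
        build_test_message(os.environ["SMTP_FROM"], os.environ["SMTP_TO"]),
        os.environ.get("SMTP_HOST", "smtp.mandrillapp.com"),
        int(os.environ.get("SMTP_PORT", "587")),
        os.environ["SMTP_USER"],
        os.environ["SMTP_PASSWORD"],
    )
```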

This is what my app.yml file looks like:

DISCOURSE_HOSTNAME: 'domain.com'
DISCOURSE_SMTP_ADDRESS: smtp.mandrillapp.com         # (mandatory)
DISCOURSE_SMTP_PORT: 587                             # (optional)
DISCOURSE_SMTP_USER_NAME: mail@domain.com            # (optional)
DISCOURSE_SMTP_PASSWORD: NfgduQX6JFV7YDolBGEahA      # (optional)
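
To rule out a typo in those lines, a quick sanity check can help. The Python sketch below is a naive `KEY: value  # comment` parser, not a full YAML parser and not how Discourse itself reads app.yml; it only checks that the key marked “(mandatory)” is set and that the port is numeric:

```python
# Key annotated "(mandatory)" in the app.yml snippet above.
MANDATORY = {"DISCOURSE_SMTP_ADDRESS"}


def parse_env_lines(text: str) -> dict[str, str]:
    """Parse simple `KEY: value  # comment` lines (not a full YAML parser)."""
    settings = {}
    for raw in text.splitlines():
        # Naive comment stripping: a value containing '#' would be cut short.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def smtp_problems(settings: dict[str, str]) -> list[str]:
    """List obvious problems with the SMTP settings."""
    problems = [f"missing {key}" for key in sorted(MANDATORY) if not settings.get(key)]
    port = settings.get("DISCOURSE_SMTP_PORT")
    if port is not None and not port.isdigit():
        problems.append(f"DISCOURSE_SMTP_PORT is not a number: {port!r}")
    return problems


sample = """
DISCOURSE_HOSTNAME: 'domain.com'
DISCOURSE_SMTP_ADDRESS: smtp.mandrillapp.com         # (mandatory)
DISCOURSE_SMTP_PORT: 587                             # (optional)
"""
print(smtp_problems(parse_env_lines(sample)))  # []
```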

What else could be going wrong? What else can I check?

Yes, there is.

It’s exactly like @riking said:

You can try temporarily setting UNICORN_SIDEKIQS to a higher number (e.g. 5) in your app.yml to process the queue a lot faster. But this will use a lot more memory, so make sure your machine has enough.

BTW: I was wondering whether it’s possible to enqueue high-priority tasks, like sending validation emails, ahead of all other tasks. Does Sidekiq support this? Everyone expects emails to be sent immediately, but nobody cares how long it takes to post-process posts.
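
Sidekiq can poll several named queues, either in a strict order or by weight, so the idea maps onto separate queues per priority. The Python simulation below is not Sidekiq code; it just shows a strict-priority dispatcher, with hypothetical queue names, where a verification email enqueued last still goes out first:

```python
from collections import deque
from typing import Optional


class StrictPriorityDispatcher:
    """Always take the next job from the highest-priority non-empty queue."""

    def __init__(self, queue_names: list[str]):
        # Earlier names have higher priority; dicts keep insertion order.
        self.queues = {name: deque() for name in queue_names}

    def enqueue(self, queue_name: str, job: str) -> None:
        self.queues[queue_name].append(job)

    def next_job(self) -> Optional[str]:
        for queue in self.queues.values():
            if queue:
                return queue.popleft()
        return None  # nothing left to do


dispatcher = StrictPriorityDispatcher(["critical", "default", "low"])
for i in range(3):
    dispatcher.enqueue("low", f"post_process_{i}")     # import leftovers
dispatcher.enqueue("critical", "verification_email")    # arrives last
print(dispatcher.next_job())  # verification_email
```

Strict ordering can starve the low queue while high-priority work keeps arriving; weighted polling trades some latency for fairness.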

To clarify this even more: your problem is not that mail isn’t getting through (your test emails show it is). Your problem is that your install is very busy, so these mails will take some time to be sent. Wait until the enqueued count has dropped to 0, or try to speed things up as @gerhard explained above (it will still take some time, though).

Until then, it’s normal for a lot of things in Discourse to appear broken. Also, don’t be surprised by all the requested mails arriving when the queue clears: they’re all in there and will be processed when they get through.

I have 8GB of memory. What is the maximum I could set UNICORN_SIDEKIQS to? At the current rate it is going to take days before the queue reaches zero. Days…
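
A rough way to estimate how long the queue will take, and how much adding processes helps, is to divide the backlog by total throughput. The Python sketch below assumes throughput grows linearly with the number of processes, which only holds until the database, CPU or memory becomes the limit; the job counts and per-thread rate are hypothetical:

```python
def hours_to_drain(enqueued: int, jobs_per_thread_per_second: float,
                   processes: int, threads_per_process: int = 5) -> float:
    """Estimate hours until the queue is empty.

    threads_per_process defaults to 5, matching the "[5 of 5 busy]" ps output.
    Assumes linear scaling, which ignores database and CPU limits.
    """
    throughput = jobs_per_thread_per_second * processes * threads_per_process
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return enqueued / throughput / 3600


# Hypothetical numbers: 1,000,000 queued jobs, 0.5 jobs/second per thread.
print(round(hours_to_drain(1_000_000, 0.5, processes=1), 1))  # 111.1
print(round(hours_to_drain(1_000_000, 0.5, processes=5), 1))  # 22.2
```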

You’ll have to experiment with this value a little. I’d say 5 should work fine. You’ll get 5 threads per Sidekiq process, so you’ll have 25 workers.

One more thing: You also need to raise the number of available connections in the database pool, otherwise the database will be the bottleneck. Adding this to your app.yml in the env section should work.

  DISCOURSE_DB_POOL: 50
  UNICORN_SIDEKIQS: 5
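
The arithmetic behind these two settings, as a Python sketch. It assumes (not confirmed in this thread) that DISCOURSE_DB_POOL sets the connection pool of each Sidekiq process separately, as a Rails connection pool normally is per process, so each process needs at least one connection per thread:

```python
def sidekiq_capacity(processes: int, threads_per_process: int, db_pool: int) -> dict:
    """Total worker threads, and whether the per-process pool covers them."""
    return {
        "total_threads": processes * threads_per_process,
        # Assumption: the pool is per process; fewer connections than threads
        # means some threads wait for a free connection.
        "pool_ok": db_pool >= threads_per_process,
    }


print(sidekiq_capacity(processes=5, threads_per_process=5, db_pool=50))
# {'total_threads': 25, 'pool_ok': True}
```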

@gerhard I very much appreciate the response. This turned out to be a bumpier ride than I anticipated. The good news is I am now aggressively reducing this queue and I see a light at the end of the tunnel. Woo :slight_smile:

Oh good, glad to see you found the solution before I could reply to your PM :smile:

Sidekiq queue is now empty. This import process was still nothing short of a disaster, however.

All those emails finally went out, as @riking said, and introduced a new set of problems.

:heavy_multiplication_x: Mandrill hourly quota has been exceeded - new backlog there
:heavy_multiplication_x: Mandrill has accepted some of my money but is closed on weekends – can’t lift my quota
:heavy_multiplication_x: Users are annoyed from numerous validation emails
:heavy_multiplication_x: Validation emails have expired
:heavy_multiplication_x: Users are now in a bad loop of trial and error (re-validating attempts, re-registering)
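
One way to keep a burst like this under a provider’s hourly limit is to count sends in a sliding one-hour window and hold back anything over the cap. A minimal Python sketch; the limit value is hypothetical, and this is not how Discourse or Mandrill enforce quotas. Holding mail back only trades rejection for delay, though: a validation email deferred past its link’s lifetime still arrives expired.

```python
from collections import deque


class HourlyQuota:
    """Sliding one-hour window: allow at most max_per_hour sends."""

    def __init__(self, max_per_hour: int):
        self.max_per_hour = max_per_hour
        self.sent_at = deque()  # timestamps (seconds) of recent sends

    def allow(self, now: float) -> bool:
        # Forget sends that are at least one hour old.
        while self.sent_at and now - self.sent_at[0] >= 3600:
            self.sent_at.popleft()
        if len(self.sent_at) < self.max_per_hour:
            self.sent_at.append(now)
            return True
        return False  # over quota: defer this email


quota = HourlyQuota(max_per_hour=2)  # hypothetical limit
print([quota.allow(t) for t in (0, 10, 20, 3600)])
```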

I want to say thank you to the Discourse team for everything they have provided, including personalized support here. At the same time, I hope future migrations for current forum owners will be more seamless. Even a simple cheat sheet outlining ‘at-large topics to think about before migrating and going live’ would help.

One thing we could do here is warn if there is a very high Sidekiq backlog. Didn’t we use to do this, @neil?

Would have made this problem a bit more obvious.

I’m guessing you tried to migrate everything in one go instead of in more manageable batches?

I am not familiar with the migration scripts, but I know that when SitePoint migrated from vB it was done in at least 3 stages.
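
Splitting a large import into stages is mostly a matter of chunking the source records. A generic Python sketch of that chunking follows; it is not part of the Discourse import scripts, and the per-batch import step is only indicated by a comment:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


for batch in batches(range(7), size=3):
    # A staged import would process this batch here, then let the
    # background queue drain before starting the next one.
    print(batch)
```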

Perhaps some kind of heads-up “are you sure?” message is in order?

@Mittineague I actually shut down my current forum for about 4-5 days and did an import on my local machine. The import took about 19-ish hours. I then backed up once it completed and restored the backup on Digital Ocean after following the installation guide step-by-step.

Doing imports on a running production server is a Bad Idea™ for this reason (and others). Running the import locally or on a separate digital ocean instance like you did is best. We should somehow display a warning when people try to do this.

We put back the /admin Sidekiq warning at 100k backlog. Hopefully that will help identify these problems in the future.
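
The check itself is simple. A Python sketch of the idea, not Discourse’s implementation; it assumes the warning fires at or above the threshold (the post only says “at 100k”), and the message wording is hypothetical:

```python
from typing import Optional

BACKLOG_WARNING_THRESHOLD = 100_000  # the 100k figure mentioned above


def backlog_warning(enqueued: int,
                    threshold: int = BACKLOG_WARNING_THRESHOLD) -> Optional[str]:
    """Return a warning when the queue is at or above the threshold."""
    if enqueued >= threshold:
        return (f"Sidekiq backlog is {enqueued:,} jobs; emails and other "
                f"background work may be delayed for a long time.")
    return None


print(backlog_warning(150_000))
```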

A user just emailed me this:

I am encountering a technical issue with the new forum which is that it won’t let me use my password or get a new one. Though I clicked on the “reset password” link, it didn’t quickly email a link to change the password and hours later when I received the reset link by email and clicked on it, it told me that it had expired. This happened twice. How can I get access to my account?

This is because Mandrill had limited my forum to a low number of messages per hour.

Idea: would it be possible to specify a priority email SMTP server in app.yml? This server would be used for time-sensitive emails, such as password resets. Users could set it to Gmail, since this server would see little usage.
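
A sketch of what that routing decision could look like, in Python. This is only the proposal above, not an existing Discourse setting; the email-type names are hypothetical, and the Gmail host/port are the standard `smtp.gmail.com` submission settings:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmtpServer:
    address: str
    port: int


# Hypothetical email-type names; Discourse's own identifiers may differ.
TIME_SENSITIVE = frozenset({"signup", "password_reset", "email_login"})

DEFAULT_SMTP = SmtpServer("smtp.mandrillapp.com", 587)
PRIORITY_SMTP = SmtpServer("smtp.gmail.com", 587)


def pick_smtp_server(email_type: str,
                     default: SmtpServer = DEFAULT_SMTP,
                     priority: SmtpServer = PRIORITY_SMTP) -> SmtpServer:
    """Route time-sensitive mail through the low-volume priority server."""
    return priority if email_type in TIME_SENSITIVE else default


print(pick_smtp_server("password_reset").address)  # smtp.gmail.com
print(pick_smtp_server("digest").address)          # smtp.mandrillapp.com
```

A separate server addresses the provider quota, but not the queue: the email job still has to reach the front of the Sidekiq backlog before it is sent.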