Huge server load processing digests after import

Hi,

After running an import with 300k users, I found that running the weekly digest is a bit of an issue: the server queues 100k emails at once, and processing them is hard on the machine. The load spikes to about 10x, making the site very slow for hours. I suppose this is an edge case of running an import - with organic signups these would be distributed more evenly throughout the week, making this less of an issue.

For now, I’ve flushed the email jobs and stopped generating digests, but I’d like to re-enable them.

What’s the recommended strategy for handling this - is there a way to ‘spread out’ the generation of the digests?

It sounds like most of your users have the same last_seen_at value. Usually import scripts set it to the user’s created_at or last_posted_at date & time.

https://github.com/discourse/discourse/blob/master/script/import_scripts/base.rb#L739-L745

Maybe that didn’t happen or lots of users have it set to the same time… Try updating the last_seen_at column with random times so that it evenly distributes the digest mails throughout the day.
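
As a rough illustration of the "random times" idea, here is a minimal Ruby sketch that assigns each user ID a random timestamp inside a window. The helper name `spread_timestamps`, the 24-hour window, and the sample IDs are all made up for the example; it only computes the values and does not touch the database.

```ruby
# Hypothetical helper (not part of Discourse): map each user ID to a
# random timestamp between window_start and window_end.
def spread_timestamps(user_ids, window_start, window_end, rng: Random.new)
  span = window_end - window_start # Time - Time gives seconds as a Float
  raise ArgumentError, "window_end must be after window_start" unless span.positive?

  user_ids.to_h do |id|
    # rng.rand returns a Float in [0, 1), so the offset stays inside the window
    [id, window_start + rng.rand * span]
  end
end

# Example: spread three users over an assumed 24-hour window.
window_end = Time.utc(2020, 1, 8)
window_start = window_end - 24 * 60 * 60
assignments = spread_timestamps([1, 2, 3], window_start, window_end)

assignments.each do |id, timestamp|
  puts "user #{id} -> #{timestamp}"
end

# Assumption: in a Rails console on a Discourse install, each value could
# then be written with something like
#   User.where(id: id).update_all(last_seen_at: timestamp)
# Take a backup first and try it on a few accounts before running it widely.
```

Which window makes sense depends on your data; the point is only that the values end up different from one another instead of all landing on the same moment.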

Exactly this. Also, if your import data didn’t have an explicit field for last_seen_at, you should update it with the timestamp of each user’s last post.

This way, a lot of those 300k users will not get the digest at all, because the time since their last_seen_at date will be greater than the suppress digest email after days setting.
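
To make the cutoff arithmetic concrete, here is a small Ruby sketch of the comparison. The method name `digest_suppressed?`, the 365-day value, and the dates are assumptions for illustration only; the actual check inside Discourse may be written differently.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical check: true when the user was last seen longer ago than
# the configured number of days.
def digest_suppressed?(last_seen_at, now, suppress_after_days)
  last_seen_at < now - suppress_after_days * SECONDS_PER_DAY
end

now = Time.utc(2020, 1, 1)
suppress_after_days = 365 # assumed value, check your own site setting

# A user whose last post was years ago falls past the cutoff.
puts digest_suppressed?(Time.utc(2017, 6, 1), now, suppress_after_days)  # prints "true"

# A user who posted a month ago is still eligible for the digest.
puts digest_suppressed?(Time.utc(2019, 12, 1), now, suppress_after_days) # prints "false"
```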
