Cleaning up e-mail logs

I was looking to free up some disk space on our Discourse installation and noticed that the email_logs table was using a substantial amount of space compared to everything else. For context, all other tables amount to about 100 MB, but the e-mail logs are in the range of 12 GB.

I’m curious, is there a process to limit the amount in this table? Can older entries be removed on a regular basis?

Thanks!

Sounds like there was a dire problem with your email config at one point?

Diving into this a little more, there are a lot of logs with the type ‘email_reject_destination’ containing a lot of spam content: ~500k rows in one week.

Is there any harm in selectively trimming the table, either by date alone or by removing only the ‘email_reject_destination’ logs older than X days?

EDIT - Yep, a lot of reject logs!

  count   |              email_type              
----------+--------------------------------------
 24086049 | email_reject_destination
  6293124 | email_reject_auto_generated
   639048 | mailing_list
    55853 | digest
     7398 | user_posted
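
For anyone who wants to reproduce a breakdown like the one above, here is a minimal sketch of the grouping query, run from Python against an in-memory SQLite stand-in rather than a real Discourse database. Only the email_logs table and its email_type column come from this thread; the helper name and the sample rows are made up for illustration.

```python
import sqlite3


def count_by_email_type(conn):
    """Return (count, email_type) rows, largest group first."""
    return conn.execute(
        "SELECT COUNT(*) AS n, email_type "
        "FROM email_logs GROUP BY email_type ORDER BY n DESC"
    ).fetchall()


# Stand-in table with a handful of made-up rows (not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_logs (id INTEGER PRIMARY KEY, email_type TEXT)")
conn.executemany(
    "INSERT INTO email_logs (email_type) VALUES (?)",
    [("email_reject_destination",)] * 3 + [("digest",)] * 2 + [("user_posted",)],
)

for n, email_type in count_by_email_type(conn):
    print(f"{n:>9} | {email_type}")
```

The same SELECT should work in psql against the real table, but try it on a copy first if the table is as large as the one described here.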

You can safely trim all the email_logs which have an empty reply_key (e.g. “email_reject_*” and “digest”).

For the ones that have a reply_key, if you delete them, you’ll prevent users from replying to that email. So I guess you could be fine deleting some that are older than X months.
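
A rough sketch of that two-step trim, using Python and an in-memory SQLite table as a stand-in. It assumes an "empty" reply_key means NULL and that the table has a created_at column stored as ISO text here; the function name, the 365-day cutoff, and the sample rows are illustrative, not anything Discourse ships.

```python
import sqlite3
from datetime import datetime, timedelta


def trim_email_logs(conn, now, keep_replyable_days=365):
    """Delete logs without a reply_key, then replyable logs older than the cutoff.

    Returns (rows_without_key_deleted, old_replyable_rows_deleted).
    """
    cutoff = (now - timedelta(days=keep_replyable_days)).isoformat()
    cur = conn.cursor()
    # No reply_key means nobody can reply to that email, so the row is safe to drop.
    cur.execute("DELETE FROM email_logs WHERE reply_key IS NULL")
    without_key = cur.rowcount
    # Deleting these breaks reply-by-email for those messages, so only drop old ones.
    cur.execute(
        "DELETE FROM email_logs WHERE reply_key IS NOT NULL AND created_at < ?",
        (cutoff,),
    )
    old_replyable = cur.rowcount
    conn.commit()
    return without_key, old_replyable


# Stand-in table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_logs (id INTEGER PRIMARY KEY, email_type TEXT, "
    "reply_key TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO email_logs (email_type, reply_key, created_at) VALUES (?, ?, ?)",
    [
        ("email_reject_destination", None, datetime(2015, 12, 31).isoformat()),
        ("digest", None, datetime(2015, 12, 1).isoformat()),
        ("user_posted", "abc", datetime(2014, 6, 1).isoformat()),
        ("user_posted", "def", datetime(2015, 11, 1).isoformat()),
    ],
)

deleted = trim_email_logs(conn, now=datetime(2016, 1, 1))
remaining = conn.execute("SELECT COUNT(*) FROM email_logs").fetchone()[0]
print(deleted, remaining)
```

On a table with tens of millions of rows you would probably want to delete in batches and reclaim the space afterwards, which this sketch does not attempt.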

Should we be doing this automatically?

We could, but I’m not sure what threshold to use. Add a new site setting which defaults to 1 year?

Note: this will also affect the stats we have on the dashboard.

After trimming the rejects, our database shrank from 13 GB to 900 MB, so our database is happy again.

Thanks!

Sure, defaulting to 1 year is safe. Let’s do that!

https://github.com/discourse/discourse/commit/ac863bab915a1b808e259d3bfd0d23b7bbba13ca

Suppress? Shouldn’t it be “purge” or “delete”?

I think the word is a bit confusing in this context

I was being consistent with https://github.com/discourse/discourse/blob/master/config/site_settings.yml#L503 but I agree, delete is much better. ("purge" might be too technical)

Yeah it should be delete.

https://github.com/discourse/discourse/commit/460665895c91b2f9018e361b393d7e00dc86b418

Hey @zogstrip,

When you changed from suppress_email_logs_after_days to delete_email_logs_after_days, were customized values lost on upgrade?

I just sent 15k e-mails to people :sadpanda: when digest ran.

Having a second look at it, I think you’re right. I forgot a migration… :disappointed:
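
To illustrate what the missing migration would need to do, here is a minimal sketch in Python against an in-memory SQLite stand-in. It assumes customized settings live as rows in a site_settings table with name and value columns, and that sites on the default have no row at all; both are assumptions for this example, and the helper is hypothetical rather than the actual fix.

```python
import sqlite3

OLD_NAME = "suppress_email_logs_after_days"
NEW_NAME = "delete_email_logs_after_days"


def rename_site_setting(conn, old_name, new_name):
    """Carry a customized value over to the new setting name.

    Returns the number of rows renamed (0 if the site never changed the default).
    """
    cur = conn.execute(
        "UPDATE site_settings SET name = ? WHERE name = ?", (new_name, old_name)
    )
    conn.commit()
    return cur.rowcount


# Stand-in table: one site that customized the old setting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_settings (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO site_settings VALUES (?, ?)", (OLD_NAME, "60"))

renamed = rename_site_setting(conn, OLD_NAME, NEW_NAME)
value = conn.execute(
    "SELECT value FROM site_settings WHERE name = ?", (NEW_NAME,)
).fetchone()[0]
print(renamed, value)
```

Without a step like this, a renamed setting simply starts from its default again, which matches the behaviour reported above.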

What did you set it to?

I have 110k users created through the API, and only 15k using the forum.

I just enabled digests last week and set it to 60 days, because my non-users were created 5 months ago.

When the update ran, the default was set back to 360 days, and many e-mails were sent.

I got back from lunch and managed to kill the Sidekiq queue with 15k already sent :slightly_smiling:.

So sorry about that :anguished:

Can we fix it so people who upgrade today no longer have this problem?

Ahhh, I re-read it… The odds of someone being on that exact version with the bad name are really low.

The default for SiteSetting.delete_email_logs_after_days has been reduced to 90 days as per

https://github.com/discourse/discourse/commit/186623acd070b71422dfe61e083fa3aeb67844ac

For busy sites that send out a lot of emails, this table can bloat fairly quickly. There are some optimizations we can make to the table so that it uses less disk space, but those will have to wait until Discourse version 2.1.

A couple of changes have been made here:

  1. https://github.com/discourse/discourse/commit/01a63f8b4b3d193ecf9d25a8f7a0365c2902f0aa
    This column was not required as we can obtain the topic through the post.

  2. https://github.com/discourse/discourse/commit/ae8b0a517ffda0bb57a67bb9773bfb441181dcee
    Emails that have been skipped are logged into another table, skipped_email_logs, from now on so that we can avoid creating indexes like this. Also, we were previously storing the skipped reasons as strings in the database, which was inefficient and meant the “locale” of the reason was fixed when the record was created. Going forward, each skipped email log contains a reason_type which is stored as an integer in the database. The reason is then translated on demand based on the given reason_type.

  3. https://github.com/discourse/discourse/commit/fad9c2b97113ddbac16d968c38edb7ab47aa8eb9
    Previously, we couldn’t delete email_logs with a reply_key because it would break the reply-to feature. To overcome this limitation, we’re storing the reply keys in a new table that contains fewer columns, making it more lightweight.
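
As a rough illustration of the reason_type idea in item 2, here is a small Python sketch: the database keeps only an integer, and the text is looked up in the reader's locale at display time. The enum members, their numeric values, and the translation strings are all hypothetical and are not Discourse's actual reason types.

```python
from enum import IntEnum


class ReasonType(IntEnum):
    """Hypothetical skip reasons; only the integer would be stored per row."""
    USER_EMAIL_BOUNCED = 1
    EXCEEDED_EMAILS_LIMIT = 2


# Hypothetical translation tables, keyed by locale and then by reason.
TRANSLATIONS = {
    "en": {
        ReasonType.USER_EMAIL_BOUNCED: "The email bounced",
        ReasonType.EXCEEDED_EMAILS_LIMIT: "Daily email limit exceeded",
    },
    "fr": {
        ReasonType.USER_EMAIL_BOUNCED: "L'e-mail a été rejeté",
        ReasonType.EXCEEDED_EMAILS_LIMIT: "Limite quotidienne d'e-mails dépassée",
    },
}


def skipped_reason(reason_type, locale="en"):
    """Translate a stored integer reason_type on demand, falling back to English."""
    reason = ReasonType(reason_type)  # raises ValueError for unknown integers
    messages = TRANSLATIONS.get(locale, TRANSLATIONS["en"])
    return messages[reason]


print(skipped_reason(1))
print(skipped_reason(2, locale="fr"))
```

Because the row holds an integer instead of a pre-rendered string, the same record can be shown in any locale, and the column is much smaller than a free-text reason.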

Note that I’m playing it safe here and have yet to drop the old columns from the tables so you’ll actually see an increase in disk space used for a short period of time.
