Cleaning up e-mail logs


(Garth) #1

I was looking to free up some disk space on our Discourse installation and noticed that the email_logs table was using a substantial amount of space compared to everything else. For context, all other tables amount to about 100 MB, but the e-mail logs are in the range of 12 GB.

I’m curious, is there a process to limit the amount in this table? Can older entries be removed on a regular basis?

Thanks!


(Jeff Atwood) #2

Sounds like there was a dire problem with your email config at one point?


(Garth) #3

Digging into this a little more, there are a lot of logs with the type ‘email_reject_destination’ containing a lot of spam content: ~500k rows in one week.

Is there any harm in selectively trimming the table, either just by date or perhaps by removing any ‘email_reject_destination’ logs older than X days?

EDIT - Yep, a lot of reject logs!

  count   |              email_type              
----------+--------------------------------------
 24086049 | email_reject_destination
  6293124 | email_reject_auto_generated
   639048 | mailing_list
    55853 | digest
     7398 | user_posted



(Régis Hanol) #4

You can safely trim all the email_logs which have an empty reply_key (e.g. “email_reject_*” and “digest”).

For the ones that have a reply_key, deleting them will prevent users from replying to that email. So I guess you should be fine deleting the ones that are older than X months.
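To make the rule above concrete, here is a minimal sketch using an in-memory SQLite table. It is only an illustration: a real Discourse install runs on PostgreSQL, the `created_at` column (stored here as a Unix timestamp) and the 180-day cutoff are assumptions, and `trim_email_logs` is a hypothetical helper, not something Discourse ships.

```python
import sqlite3

DAY = 86400  # seconds in a day


def trim_email_logs(conn, now, reply_key_max_age_days=180):
    """Delete rows with an empty reply_key, plus keyed rows older than the cutoff.

    Returns the number of rows deleted.
    """
    cutoff = now - reply_key_max_age_days * DAY
    cur = conn.execute(
        "DELETE FROM email_logs "
        "WHERE reply_key IS NULL OR reply_key = '' OR created_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


# Demo data: table layout is assumed, email_type values come from the thread.
now = 1_700_000_000
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_logs ("
    "id INTEGER PRIMARY KEY, email_type TEXT, reply_key TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO email_logs (email_type, reply_key, created_at) VALUES (?, ?, ?)",
    [
        ("email_reject_destination", None, now - 2 * DAY),  # no key: delete
        ("digest", None, now - 400 * DAY),                  # no key: delete
        ("user_posted", "abc123", now - 10 * DAY),          # recent key: keep
        ("user_posted", "def456", now - 300 * DAY),         # old key: delete
    ],
)

deleted = trim_email_logs(conn, now)
remaining = conn.execute(
    "SELECT email_type, reply_key FROM email_logs"
).fetchall()
```

Only the recent row that still carries a reply_key survives, so replying by email keeps working for recent notifications.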


(Jeff Atwood) #5

Should we be doing this automatically?


(Régis Hanol) #6

We could, but I’m not sure what threshold to use. Add a new site setting which defaults to 1 year?

Note: this will also affect the stats we have on the dashboard.


(Garth) #7

After trimming the rejects, our database shrank from 13 GB to 900 MB, so our database is happy again.

Thanks!


(Jeff Atwood) #8

Sure, defaulting to 1 year is safe. Let’s do that!


(Régis Hanol) #9

(Sam Saffron) #10

Suppress? Isn’t the word “purge” or “delete”?

I think the word is a bit confusing in this context.


(Régis Hanol) #11

I was being consistent with discourse/site_settings.yml at master · discourse/discourse · GitHub but I agree, delete is much better. ("purge" might be too technical)


(Jeff Atwood) #12

Yeah it should be delete.


(Régis Hanol) #13

(Rafael dos Santos Silva) #14

Hey @zogstrip,

When you changed from suppress_email_logs_after_days to delete_email_logs_after_days, are the customized values lost on upgrade?

I just sent 15k e-mails to people :sadpanda: when the digest ran.


(Régis Hanol) #15

Having a second look at it, I think you’re right. I forgot a migration… :disappointed:

What did you set it to?
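For readers wondering what the missing migration would have done, here is a rough sketch of the idea, assuming stored overrides behave like a simple name-to-value map. `rename_setting` and `effective_value` are hypothetical helpers, and the 365-day default is an assumption based on the "1 year" discussed above; this is not Discourse's actual migration code.

```python
def rename_setting(stored, old_name, new_name):
    """Return a copy of the stored overrides with old_name's value moved to new_name."""
    migrated = dict(stored)
    if old_name in migrated:
        value = migrated.pop(old_name)
        # Do not clobber a value already saved under the new name.
        migrated.setdefault(new_name, value)
    return migrated


def effective_value(stored, name, defaults):
    """An override wins; otherwise fall back to the shipped default."""
    return stored.get(name, defaults[name])


# Assumed default of one year; the override value is illustrative.
defaults = {"delete_email_logs_after_days": 365}
stored = {"suppress_email_logs_after_days": 60}

# Without a migration, the override is orphaned and the default applies.
without_migration = effective_value(stored, "delete_email_logs_after_days", defaults)

# With the rename carried over, the admin's value survives the upgrade.
migrated = rename_setting(
    stored, "suppress_email_logs_after_days", "delete_email_logs_after_days"
)
with_migration = effective_value(migrated, "delete_email_logs_after_days", defaults)
```

The point is only that a rename without a data migration silently resets any customized value to the default.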


(Rafael dos Santos Silva) #16

I have 110k users created through the API, and only 15k using the forum.

I just enabled digests last week and set it to 60 days, because my non-users were created 5 months ago.

When the update ran, the default was set back to 360 days, and many e-mails were sent.

I got back from lunch and managed to kill the Sidekiq queue with 15k already sent :slightly_smiling:.


(Régis Hanol) #17

So sorry about that :anguished:


(Sam Saffron) #18

Can we fix it so people who upgrade today no longer have this problem?

Ahhh, re-read… Odds of someone being on that exact version with the bad name are really low.


(Alan Tan) #19

The default for SiteSetting.delete_email_logs_after_days has been reduced to 90 days as per

For busy sites that send out a lot of emails, this table can bloat fairly quickly. There are some optimizations that we can do on the table to make it use less disk space, but that’ll have to wait until Discourse version 2.1.


(Alan Tan) #21

A couple of changes have been made here

  1. Drop `EmailLogs#topic_id`. · discourse/discourse@01a63f8 · GitHub
    This column was not required as we can obtain the topic through the post.

  2. PERF: Split skipped email logs into a seperate table. · discourse/discourse@ae8b0a5 · GitHub
    Emails that have been skipped are logged into another table, skipped_email_logs, from now on so that we can avoid creating indexes like this. Also, we were previously storing the skipped reasons as strings in the database, which was inefficient, and the “locale” of the reason was decided when the record was created. Going forward, each skipped email log contains a reason_type which is stored as an integer in the database. The reason is then translated on demand based on the given reason_type.

  3. PERF: Move `EmailLog#reply_key` into new `post_reply_keys` table. · discourse/discourse@fad9c2b · GitHub
    Previously, we couldn’t delete email_logs with a reply_key because it would break the reply-by-email feature. To overcome this limitation, we’re storing the reply keys in a new table that contains fewer columns, making it more lightweight.
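As an illustration of the reason_type idea in item 2, here is a small sketch of storing an integer and translating it only when it is displayed. The reason names, numeric values and translation strings are all made up for the example; they are not the ones Discourse uses.

```python
from enum import IntEnum


class ReasonType(IntEnum):
    # Hypothetical reasons; the stored value is just the integer.
    USER_EMAIL_DISABLED = 1
    EXCEEDED_DAILY_LIMIT = 2


# Translations live outside the database, keyed by locale.
TRANSLATIONS = {
    "en": {
        ReasonType.USER_EMAIL_DISABLED: "User has disabled e-mails",
        ReasonType.EXCEEDED_DAILY_LIMIT: "Daily e-mail limit exceeded",
    },
    "fr": {
        ReasonType.USER_EMAIL_DISABLED: "L'utilisateur a désactivé les e-mails",
        ReasonType.EXCEEDED_DAILY_LIMIT: "Limite quotidienne d'e-mails dépassée",
    },
}


def skipped_reason(reason_type, locale="en"):
    """Translate a stored integer reason_type on demand, falling back to English."""
    reason = ReasonType(reason_type)  # raises ValueError for unknown integers
    return TRANSLATIONS.get(locale, TRANSLATIONS["en"])[reason]


# The same stored row (reason_type = 2) can be shown in any locale.
stored_reason_type = 2
english = skipped_reason(stored_reason_type)
french = skipped_reason(stored_reason_type, locale="fr")
```

Because only the integer is persisted, the row is smaller and the displayed text follows the viewer's locale instead of the locale in effect when the log was written.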

Note that I’m playing it safe here and have yet to drop the old columns from the tables so you’ll actually see an increase in disk space used for a short period of time.
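The effect of item 3 can be sketched with two toy tables in SQLite. The column names on post_reply_keys (`post_id`, `user_id`, `reply_key`) and the `find_reply_target` helper are assumptions for illustration only; check the actual schema before relying on any of this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE email_logs (
        id INTEGER PRIMARY KEY, post_id INTEGER, email_type TEXT
    );
    -- Narrow table holding only what is needed to route a reply (assumed columns).
    CREATE TABLE post_reply_keys (
        post_id INTEGER, user_id INTEGER, reply_key TEXT UNIQUE
    );
    """
)
conn.execute(
    "INSERT INTO email_logs (post_id, email_type) VALUES (?, ?)",
    (42, "user_posted"),
)
conn.execute("INSERT INTO post_reply_keys VALUES (?, ?, ?)", (42, 7, "abc123"))


def find_reply_target(conn, reply_key):
    """Resolve an incoming reply key to (post_id, user_id), or None if unknown."""
    return conn.execute(
        "SELECT post_id, user_id FROM post_reply_keys WHERE reply_key = ?",
        (reply_key,),
    ).fetchone()


# Old logs can now be deleted without touching the reply keys.
conn.execute("DELETE FROM email_logs")
logs_left = conn.execute("SELECT COUNT(*) FROM email_logs").fetchone()[0]
target = find_reply_target(conn, "abc123")
```

Even with email_logs emptied, the reply key still resolves, which is what makes aggressive trimming of the log table safe.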