Removing email addresses from imported posts

Hi All

I’m setting up a Discourse instance for an org that currently uses a mailing list. We imported the mail (20K messages) with no major issues, but a lot of posts show people’s email addresses because of quoting, email signatures, etc.

I’m reasonably proficient with the rake/rails command line, but I’m struggling to figure out how to remove these email addresses. I’ve tried various forms of the following (regex and non-regex), but it never finds any posts:

rake posts:remap["\<vincent@domain\.com\>","","regex"]
rake posts:remap["\\<vincent@domain\\.com\\>","","regex"]

Is there a better way to do this? I started with a simple regex to find my own address first (and yes I have a backup!).

Since this is a new Discourse instance, I’m not concerned about removing any @ mentions etc.

Thx.

Here is a quick script @pfaffman wrote for us. Any crappy parts are things I changed. It has a few bells and whistles you may not need, for example a cutoff date so it only removes email addresses from posts before that date.

I found it best to replace addresses with ‘email@removed.com’ rather than remove them entirely. I can’t remember why; I think it played nicer with surrounding brackets.

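def remove_email_addresses
  n = 0
  test_mode = false   # true: print the rewritten text and pause, save nothing
  cutoff = DateTime.new(2019, 1, 1, 0, 0, 0)   # only posts created before this are changed
  placeholder = "email@removed.com"
  # "-" goes last in each character class so it is a literal hyphen; written as
  # "+-_" it is a range that also matches "<", ">", "@" and other punctuation.
  email_regex = /[a-z0-9+_.-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+/i
  Post.where("raw like '%@%'").find_each do |post|
    sleep 0.1
    if post.created_at < cutoff
      post.raw = post.raw.gsub(email_regex, placeholder)
      if test_mode
        puts post.raw
        sleep 10
      else
        post.save
        post.rebake!
        puts "saved"
      end
      n += 1
      puts n.to_s
    else
      puts "new post, leaving as-is"
    end
  end
  nil
end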

> remove_email_addresses
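
If you want to sanity-check the replacement before touching any posts, here is a rough plain-Ruby sketch that runs without Discourse or Rails. The names PLACEHOLDER, EMAIL_REGEX and scrub_emails are made up for illustration, and the pattern is only an assumption about what counts as an address (it needs at least one dot in the domain), so treat it as a starting point rather than a complete email matcher.

```ruby
# Hypothetical helper for trying the replacement on plain strings (no Discourse needed).
PLACEHOLDER = "email@removed.com"

# "-" is last in each character class so it is read as a literal hyphen.
# The domain part requires at least one ".label" after the first label.
EMAIL_REGEX = /[a-z0-9+_.-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+/i

# Returns a copy of text with every matched address swapped for the placeholder.
def scrub_emails(text)
  text.gsub(EMAIL_REGEX, PLACEHOLDER)
end

puts scrub_emails("On Mon, Vincent <vincent@domain.com> wrote:")
puts scrub_emails("Contact: jane.doe+list@mail.example.org, thanks.")
```

Feeding it a few real lines copied from the imported posts (quoted headers, signatures) should show fairly quickly whether the brackets and trailing punctuation survive the way you expect.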

Excellent, thanks. I’ll give it a try.