I’m setting up a Discourse instance for an org that currently uses a mailing list. We imported the mail (20K messages) with no major issues, but a lot of posts expose people’s email addresses through quoting, email signatures, etc.
I’m reasonably proficient with the rake/rails command line, but I’m struggling to figure out how to remove these email addresses. I’ve tried various forms of this (regex, not regex), but it never finds any posts.
Here is a quick script @pfaffman wrote for us. Any crappy parts are things I changed. It has a few bells and whistles you may not need, for example a cutoff date so it only removes email addresses from posts before that date.
I found it best to replace addresses with ‘email@removed.com’ rather than remove them entirely. I can’t remember why; I think it played nicer with surrounding brackets.
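To show what I mean about the brackets, here is a small plain-Ruby sketch you can run outside Discourse. This is not the script itself: `email_pattern`, `redact_emails` and the sample line are made up for the illustration, and the pattern is a simplified one, not a full address matcher. My guess is that the empty `<>` left behind by outright removal was what caused trouble, but treat that as an assumption.

```ruby
# Illustrative pattern only; it is not a full RFC 5322 address matcher.
def email_pattern
  /[a-z0-9+_.-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+/i
end

# Replace every address-like string in text with the given placeholder.
def redact_emails(text, placeholder = "email@removed.com")
  text.gsub(email_pattern, placeholder)
end

# Hypothetical quote header, as it might appear in an imported list post.
sample = "On Tue, Jane Doe <jane.doe@example.org> wrote:"

puts redact_emails(sample, "")  # On Tue, Jane Doe <> wrote:
puts redact_emails(sample)      # On Tue, Jane Doe <email@removed.com> wrote:
```

With the placeholder, the line still reads like a normal quote header, and it is easy to search for later if you want to check what was changed.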
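def remove_email_addresses
  test_mode = false                           # true: print the result, save nothing
  cutoff = DateTime.new(2019, 1, 1, 0, 0, 0)  # only posts created before this are changed
  placeholder = "email@removed.com"
  # Hyphen goes last in each character class so it is a literal, not a range.
  email_regex = /[a-z0-9+_.-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+/i
  n = 0

  Post.where("raw LIKE '%@%'").find_each do |post|
    sleep 0.1 # go easy on the database
    if post.created_at < cutoff
      new_raw = post.raw.gsub(email_regex, placeholder)
      next if new_raw == post.raw # only an @mention or similar, nothing to do

      if test_mode
        puts new_raw
        sleep 10 # time to read the output
      else
        post.raw = new_raw
        post.save
        post.rebake!
        n += 1
        puts "saved #{n}"
      end
    else
      puts "new post, leaving as-is"
    end
  end
  nil
end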
> remove_email_addresses
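One thing to watch if you adapt the pattern: inside a character class, a hyphen between two characters is a range, so `+-_` matches everything from `+` to `_`, which includes `<`, `>` and `@`. Putting the hyphen last makes it a literal. A minimal sketch in plain Ruby (the helper `local_part_match` and the sample string are just for illustration):

```ruby
# Return the part of text matched as "local part plus @" by char_class.
def local_part_match(text, char_class)
  text[/#{char_class}+@/i]
end

loose  = "[a-z0-9+-_.]"  # "+-_" is a range from "+" to "_"
strict = "[a-z0-9+_.-]"  # hyphen last, so it is a literal hyphen

sample = "<jane@example.org>"
puts local_part_match(sample, loose)   # <jane@
puts local_part_match(sample, strict)  # jane@
```

The loose class swallows the opening bracket along with the address, which may well be related to the bracket oddities mentioned above.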