(Superseded) Import MBOX (mailing list) files

If I wanted to import a mailing list into an already existing Discourse instance, would following the steps here wipe out my existing instance? If so, would it make sense to follow the steps here to import the archives and then use the Topic and Category Export/Import to move the mailing list archives instead?

Thanks!

The way that I’d do it is to

  1. back up Discourse,
  2. freeze Discourse,
  3. import that database on your development machine,
  4. import the mailing list on the dev machine,
  5. back up the dev machine,
  6. restore that backup on the production machine.

Makes sense. Thanks!

I am trying to import about 142,000 emails using this script, but some emails block the whole import for no apparent reason.
Is there any way to modify the script so that emails are skipped if they take too long to process?

There should be a way to modify it so that it processes or ignores those messages. Is there something about those messages that seems different?

Not really. I can’t see anything different about them. It might have something to do with CCs, but those are processed just fine in other emails…
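
On skipping messages that take too long: one possible approach is to wrap the per-message work in Ruby’s standard `Timeout.timeout`. This is only a minimal sketch, not code from the import script. The helper name `process_with_timeout`, the message ids, and the 0.1 second limit are all made up for illustration, and `sleep` stands in for the real processing.

```ruby
require "timeout"

# Hypothetical helper (not part of the import script): runs the block and
# returns nil if it takes longer than `seconds`.
def process_with_timeout(label, seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  warn "Skipping #{label}: took longer than #{seconds}s"
  nil
end

# Simulated messages: id => how long the "processing" takes, in seconds.
messages = { "msg-1" => 0.0, "msg-2" => 0.5 }

# Keep only the messages whose processing finished within the limit.
imported = messages.select do |id, cost|
  process_with_timeout(id, 0.1) { sleep(cost); true }
end.keys
```

In a real importer the block would contain whatever the script does per email, and the limit would need to be much more generous than 0.1 seconds.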

Is it possible to make the SQLite-to-Discourse step “verbose”? The “creating forum topics” part of the process halts reliably after a very limited number of created topics. I do not understand how to troubleshoot this part of the import.

importing users
Skipping 20 already imported users
Skipping 12 already imported users

creating forum topics
        2 / 192 (  1.0%)  [1093 items/min] 

After CTRL + C:

 ^C/home/discourse/.rbenv/versions/2.3.4/lib/ruby/gems/2.3.0/gems/email_reply_trimmer-0.1.7/lib/email_reply_trimmer.rb:182:in `gsub!': Interrupt
        from /home/discourse/.rbenv/versions/2.3.4/lib/ruby/gems/2.3.0/gems/email_reply_trimmer-0.1.7/lib/email_reply_trimmer.rb:182:in `block in preprocess!'
        from /home/discourse/.rbenv/versions/2.3.4/lib/ruby/gems/2.3.0/gems/email_reply_trimmer-0.1.7/lib/email_reply_trimmer.rb:181:in `each'
        from /home/discourse/.rbenv/versions/2.3.4/lib/ruby/gems/2.3.0/gems/email_reply_trimmer-0.1.7/lib/email_reply_trimmer.rb:181:in `preprocess!'
        from /home/discourse/.rbenv/versions/2.3.4/lib/ruby/gems/2.3.0/gems/email_reply_trimmer-0.1.7/lib/email_reply_trimmer.rb:33:in `trim'
        from /home/discourse/discourse/lib/email/receiver.rb:205:in `select_body'
        from script/import_scripts/mbox.rb:426:in `block (2 levels) in create_forum_topics'
        from /home/discourse/discourse/script/import_scripts/base.rb:432:in `block in create_posts'
        from /home/discourse/discourse/script/import_scripts/base.rb:431:in `each'
        from /home/discourse/discourse/script/import_scripts/base.rb:431:in `create_posts'
        from script/import_scripts/mbox.rb:419:in `block in create_forum_topics'
        from /home/discourse/discourse/script/import_scripts/base.rb:784:in `block in batches'
        from /home/discourse/discourse/script/import_scripts/base.rb:783:in `loop'
        from /home/discourse/discourse/script/import_scripts/base.rb:783:in `batches'
        from script/import_scripts/mbox.rb:413:in `create_forum_topics'
        from script/import_scripts/mbox.rb:57:in `execute'
        from /home/discourse/discourse/script/import_scripts/base.rb:45:in `perform'
        from script/import_scripts/mbox.rb:555:in `<main>'

Sure. Just add some statements like

puts "#{somevariable}"

at the top of that loop.
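
As a rough illustration of that idea, the sketch below prints one line per message before processing it, so the last line printed identifies the message the import is stuck on. The `rows` array and its `:msg_id` and `:subject` keys are assumptions made for this example; the real loop in the script iterates over its own data.

```ruby
# Hypothetical stand-in for the rows the importer loops over.
rows = [
  { msg_id: "<a@example.org>", subject: "Hello" },
  { msg_id: "<b@example.org>", subject: "Re: Hello" }
]

log = []
rows.each_with_index do |row, i|
  # Print before processing, so a hang shows which message caused it.
  line = "[#{i + 1}/#{rows.size}] #{row[:msg_id]} #{row[:subject].inspect}"
  $stderr.puts line
  log << line
  # ... the existing per-message processing would follow here ...
end
```

Writing to `$stderr` avoids output buffering, so the line appears even if the process is interrupted right afterwards.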

Thanks, that works well. Is there a place where the maximum body length of an imported message is defined?

Edit: Never mind. I think I’ve narrowed the problem down to forward slashes (/) in the content to be imported.

There is a site setting for that.

If you grep the other importers, you can find how to set a site setting in the importer.

@pfaffman I’m doing an import of a GNU mbox archive. The importer does all the message indexing and creates the SQLite DB index.db, but there is no message content.

I suspect it will have something to do with the errors I’m getting all through the import:

Ignoring bad email address foo.bar at example.org (Foo Bar) in

I’ve had a look at the source around that error message, which is checking that the from_email matches the regex, so presumably something in those email addresses (it affects them all) is not matching.

There is a bit of email processing code in the #extract_name method, which is designed to sort out real names in parentheses and replace _at_ with @, but clearly something isn’t working as intended.

Any pointers gratefully received.
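
For reference, here is a minimal sketch of the kind of normalization that the log line above would need. It is not the importer’s #extract_name code. `parse_obfuscated_from` is a hypothetical helper, and it assumes the From value always looks like `user at host (Real Name)`.

```ruby
# Hypothetical helper, not the importer's #extract_name: splits an obfuscated
# From value such as "foo.bar at example.org (Foo Bar)" into [email, name].
def parse_obfuscated_from(from)
  # The real name is the trailing parenthesised part, if there is one.
  name = from[/\(([^)]*)\)\s*\z/, 1]
  # Drop the parenthesised name, then turn the first " at " into "@".
  address = from.sub(/\s*\([^)]*\)\s*\z/, "").strip
  address = address.sub(/\s+at\s+/, "@")
  [address, name]
end

email, name = parse_obfuscated_from("foo.bar at example.org (Foo Bar)")
```

Running the affected addresses through something like this and comparing the result with what the importer produces might show which part of the pattern is not matching.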