Problem importing using mbox script

(Geoff Hutchison) #1

I’m trying to import some mailing list archives in mbox format from SourceForge.

I used formail to split the archive into individual e-mail messages. This was one snag, since it caused some minor corruption in certain messages (e.g., some had no real From: address).

Fine, now I have the JSON indexes created:

-rw-rw-r-- 1 discourse discourse 192003 Jan  7 00:00 replies-index.json
-rw-rw-r-- 1 discourse discourse  77279 Jan  7 00:00 topic-index.json
-rw-rw-r-- 1 discourse discourse  11363 Jan  7 00:00 user-index.json

But when it starts to actually do the work:

creating forum topics
/var/www/discourse/vendor/bundle/ruby/2.0.0/gems/rack-mini-profiler-0.9.7/lib/patches/db/pg.rb:50:in `exec': ERROR:  duplicate key value violates unique constraint "import_ids_pkey" (PG::UniqueViolation)
DETAIL:  Key (val)=(<>) already exists.
	from /var/www/discourse/vendor/bundle/ruby/2.0.0/gems/rack-mini-profiler-0.9.7/lib/patches/db/pg.rb:50:in `exec'
	from /var/www/discourse/lib/freedom_patches/active_record_base.rb:7:in `exec_sql'
	from /var/www/discourse/script/import_scripts/base.rb:203:in `all_records_exist?'
	from script/import_scripts/mbox.rb:147:in `block in create_forum_topics'

(Geoff Hutchison) #2

I should add that I’m more than willing to debug and/or tailor the importer script. I’m just not sure why there’s a duplicate key, since the script is supposed to hoist such topics into replies.
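
One possible reading of the DETAIL line in the error above is that the colliding key is an empty Message-ID (`<>`), which would clash as soon as two split messages lack a real ID. That is an assumption, not something confirmed from the importer source. A minimal Ruby sketch for checking a directory of split messages follows; `message_id_of` and `duplicate_message_ids` are hypothetical helper names, and folded (multi-line) headers are not handled:

```ruby
# Hypothetical helper: return the Message-ID header value of one raw message,
# or an empty string when the header is missing.
def message_id_of(raw)
  # Only inspect the header block (everything before the first blank line).
  headers = raw.split(/\r?\n\r?\n/, 2).first.to_s
  line = headers.each_line.find { |l| l =~ /\AMessage-ID:/i }
  line ? line.sub(/\AMessage-ID:/i, '').strip : ''
end

# Hypothetical helper: map each suspicious Message-ID to the files using it.
# "Suspicious" here means missing, literally "<>", or shared by several files.
def duplicate_message_ids(dir)
  ids = Hash.new { |hash, key| hash[key] = [] }
  Dir.glob(File.join(dir, '*')).sort.each do |path|
    next unless File.file?(path)
    ids[message_id_of(File.binread(path))] << File.basename(path)
  end
  ids.select { |id, files| files.size > 1 || id.empty? || id == '<>' }
end
```

Calling `duplicate_message_ids('/path/to/split/messages')` (the path is a placeholder) returns a hash of IDs to file names, so any entry under `<>` or an empty key points at messages worth inspecting by hand.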

(Geoff Hutchison) #3

I resolved this by disabling the all_records_exist? queries for topics and replies:

(line 146):

     next if all_records_exist? :posts, {|t| t['id']}

(line 182):

  next if all_records_exist? :posts, {|p| p['id']}

Since I’m doing the initial import, this should not be a problem - the topics and replies should not already exist in my Discourse instance.

For future reference, importing ~3000 messages from SourceForge mailing list mbox archives took ~20-30 minutes. I pulled two different mailing lists into different categories.

The result was successful.


@ghutchis - How do you create the JSON files from mbox? I searched around for an mbox import guide but can’t seem to find one.

(Geoff Hutchison) #5

You don’t create any JSON files yourself. You break the mbox into individual files in a directory (I used formail). Then you edit this script and run it:

discourse/mbox.rb at master · discourse/discourse · GitHub
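
If formail is not available, the splitting step can be approximated in Ruby. This is only a sketch under stated assumptions: it treats a line starting with `From ` as a separator when it is the first line or follows a blank line, it drops that separator line from the output (formail may keep it), and the `split_mbox` / `write_messages` names and the `00001.eml` file naming are hypothetical, not something the importer is known to require.

```ruby
require 'fileutils'

# Hypothetical helper: split mbox text into one string per message.
# A "From " line counts as a separator only at the start of the file
# or directly after a blank line; the separator line itself is dropped.
def split_mbox(text)
  messages = []
  previous_blank = true
  text.each_line do |line|
    if previous_blank && line.start_with?('From ')
      messages << String.new
    elsif messages.any?
      messages.last << line
    end
    previous_blank = line.strip.empty?
  end
  messages
end

# Hypothetical helper: write each message to its own numbered file
# and return how many messages were written.
def write_messages(mbox_path, out_dir)
  FileUtils.mkdir_p(out_dir)
  messages = split_mbox(File.binread(mbox_path))
  messages.each_with_index do |message, index|
    File.binwrite(File.join(out_dir, format('%05d.eml', index + 1)), message)
  end
  messages.size
end
```

Body lines that begin with `From ` right after a blank line are only safe if the archive escapes them (typically as `>From `), so it is worth spot-checking the message count against the list archive.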

Hope that helps!