Working through a Yahoo Groups mbox import and hitting some errors. Not sure which direction to take at this point, either for debugging or for the import itself. Here are the errors I’m seeing so far:
https://pastebin.com/raw/2WTN3GTj
You are using the mbox script, right? It worked well for me with no errors. Attachments are missing, but that's not the end of the world in my case.
That is correct @tobiaseigen. The importation ran for over 2 hours.
In addition to the last question, I wanted to add that I wasn’t sure if I should proceed with importation of this even with these failures… wondering if after the errors/failures are fixed we can simply import them AGAIN and it will skip the already imported messages and just move on with proper importation.
@gerhard perhaps we need some input and help here… Even after going through your guide Sidekiq is not showing the processing of these ~35,000 messages.
Not sure Sidekiq is relevant here - the import script runs outside Discourse, I think.
In case it helps, here’s my import log. There are in fact a few lines that are similar to yours, but I just decided not to worry about it. Life is too short.
Since you have so many errors, you seem to have a more systematic problem. Are you sure the system has enough RAM available? I don’t know if you have already tried it, but you may want to look at the import file a little more closely and see whether anything stands out there - maybe you just need to adjust the split_regex in some way, or upload the file to your server in a different format?
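One rough way to check a split pattern before re-running the import is to count how many lines in the file it would treat as message boundaries and compare that with the number of messages you expect. The sketch below is plain Ruby, not part of the import script; the helper name, the candidate pattern, and the sample text are all assumptions for illustration, so substitute the split_regex from your own settings.

```ruby
# Hypothetical helper (not part of the Discourse importer): count the lines
# in an mbox-style text that match a candidate separator pattern.
def count_message_boundaries(text, separator)
  text.each_line.count { |line| line.match?(separator) }
end

# Assumed pattern for illustration only; compare with your own split_regex.
CANDIDATE_SEPARATOR = /\AFrom \S+@\S+/

sample = <<~MBOX
  From alice@example.com Mon Jan  1 10:00:00 2007
  Subject: first

  Hello
  From bob@example.com Mon Jan  1 11:00:00 2007
  Subject: second

  From now on this line is body text, not a separator.
MBOX

puts count_message_boundaries(sample, CANDIDATE_SEPARATOR)
```

If the count is far off from the number of messages in the archive, the pattern is probably splitting in the wrong places (or not at all), which would explain a large number of skipped or malformed messages.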
If you keep having trouble, you could ask for help in marketplace - there are some consultants hanging out here who are quite experienced at doing imports. I’m certainly no expert - this was my first attempt.
root@discourse:/var/discourse# ./launcher enter import
root@discourse-import:/var/www/discourse# RAILS_DB=secondsite
root@discourse-import:/var/www/discourse# export RAILS_DB
root@discourse-import:/var/www/discourse# import_mbox.sh
The mbox import is starting...
Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...
creating index
indexing files in /shared/import/data/list
indexing /shared/import/data/list/18929486-3.mbox
indexing /shared/import/data/list/18929486-2.mbox
indexing replies and users
creating categories
1 / 1 (100.0%) [4916421 items/min]
creating users
69 / 69 (100.0%) [1178 items/min] ]
creating topics and posts
Date is missing. Skipping 0462b41b966d8c11e6e32cc14c0b576d
1 / 2333 ( 0.0%) [179689 items/min] Date is missing. Skipping 0adb9bd80082595a33130f7749d7f530
2 / 2333 ( 0.1%) [224693 items/min] Date is missing. Skipping 3bd86d7adb396fbeb7d6dfcfe9f0be5f
3 / 2333 ( 0.1%) [283328 items/min] Date is missing. Skipping 4f5397838e6c7f96eedfe116ce27be13
4 / 2333 ( 0.2%) [184374 items/min] Date is missing. Skipping c8c14ab80e92ae1cacd4af99351319bd
45 / 2333 ( 1.9%) [334 items/min] Failed to map post for 2f401ce90708241252h30bdae5iad2ae0096e067b71@mail.gmail.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:145:in `map_first_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:103:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
940 / 2333 ( 40.3%) [398 items/min] Failed to map post for BBCAF42471FF9540868B4DC02B885B1BBCDA1F@wn1217.or.providence.org
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
944 / 2333 ( 40.5%) [399 items/min] Failed to map post for 3A1D6C799D451B41BD0500303339622A023AA1@s-mail.integral-corp.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
1149 / 2333 ( 49.2%) [408 items/min] Failed to map post for FF35EE5B30156244A4370DC859B7F650F50626@s-mail.integral-corp.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
2328 / 2333 ( 99.8%) [467 items/min]
Updating topic status
Updating bumped_at on topics
Updating last posted at on users
Updating last seen at on users
Updating topic reply counts...
70 / 70 (100.0%) [10745 items/min]
Updating first_post_created_at...
Updating user post_count...
Updating user topic_count...
Updating topic users
Updating post timings
Updating featured topic users
Updating featured topics in categories
9 / 9 (100.0%) [2505 items/min] n]
Updating user topic reply counts
70 / 70 (100.0%) [9174 items/min] ]
Resetting topic counters
Done (00h 06min 58sec)
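If you save the output to a file, a small Ruby sketch like the one below can pull out which messages were skipped and which failed, so you can look them up in the mbox afterwards. The two message prefixes are copied from the log above; the method name and the shape of the result are assumptions made for illustration, and this is not something the import script provides.

```ruby
# Hypothetical helper: summarise an import log by collecting the ids of
# skipped messages and the Message-IDs of posts that failed to map.
def summarize_import_log(log_text)
  # "Date is missing. Skipping <hex digest>"
  skipped = log_text.scan(/Date is missing\. Skipping (\h+)/).flatten
  # "Failed to map post for <message id>"
  failed = log_text.scan(/Failed to map post for (\S+)/).flatten
  { skipped: skipped, failed: failed }
end

sample_log = <<~LOG
  Date is missing. Skipping 0462b41b966d8c11e6e32cc14c0b576d
  1 / 2333 ( 0.0%) [179689 items/min] Date is missing. Skipping 0adb9bd80082595a33130f7749d7f530
  940 / 2333 ( 40.3%) [398 items/min] Failed to map post for BBCAF42471FF9540868B4DC02B885B1BBCDA1F@wn1217.or.providence.org
  undefined method `hex' for nil:NilClass
LOG

summary = summarize_import_log(sample_log)
puts "skipped: #{summary[:skipped].size}, failed: #{summary[:failed].size}"
```

For a real log you would read the saved file instead of the sample string, for example with `File.read`, assuming the log fits comfortably in memory.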
So I went ahead and allowed this to proceed (I’ll look at the errors later), but now I have a very odd problem. I had attempted to import these into a category called “old-yahoo-group” by first creating this category within the system, and then I pushed all of the mbox folders into a folder here:
/var/discourse/shared/standalone/import/data/old-yahoo-group
I thought I had understood the instructions such that these messages, upon importation, would show up in the appropriate category, however they’re all hidden in the system.
I can do a search and find old messages just fine, however they do not appear in any aggregate location.
How can we manipulate this last import to go into a defined category, such that all ~35k messages show up in a convenient section that indicates these are old messages?
Upon looking further, I appear to have found what happened:
Now I need to figure out how to recover from this…
So this worked perfectly (where old-yahoo-group had already been created and NO other uncategorized posts existed (and it was actually disabled in the Settings)):
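/var/discourse/launcher enter app
rails c
# Look up the source and target categories by slug
un = Category.find_by_slug('uncategorized')
newcat = Category.find_by_slug('old-yahoo-group')
# Move every topic currently in Uncategorized to old-yahoo-group
Topic.where(category_id: un.id).update_all(category_id: newcat.id)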
Incidentally, I had a similar experience. For some reason the import script ignored the existing category I had created even though the slug was the same. But it created the new category for me so I had no problem. I just deleted the category I had created and renamed the category created by the script.