Yahoo Groups import errors

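I'm working on a Yahoo Groups mbox import but have run into some errors. I'm not sure which direction to take with debugging and the import. Here are the errors I'm seeing so far: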
https://pastebin.com/raw/2WTN3GTj

You're using the mbox script, right? It worked smoothly for me, without any errors. The attachments are missing, but that's not a big deal for me.

That's right, @tobiaseigen. The import ran for more than 2 hours.

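In addition to my last question, one more thing: I'm not sure whether I should let the import continue with these failures. I'm wondering whether, if I import again after fixing the errors/failures, the system will skip the messages that were already imported and carry on with a normal import.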

@gerhard Maybe we need some input and help here… Even after following your guide, Sidekiq still isn't showing any processing for these ~35,000 messages.

Not sure sidekiq is relevant here - the import script runs outside discourse I think.

In case it helps, here’s my import log. There are in fact a few lines that are similar to yours, but I just decided not to worry about it. Life is too short.

Since you have so many errors, you seem to have a more systematic problem. Are you sure the system has enough RAM available? I don’t know if you have already tried it, but you may want to look at the import file a little more closely and see whether anything stands out there. Maybe you just need to adjust the split_regex in some way, or upload the file to your server in a different format?
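In case it helps with looking at the import file more closely, here's a rough Ruby sketch that counts the messages in an mbox file and how many of them have no Date header (those are the ones the importer reports as "Date is missing"). To be clear, `mbox_summary` and `SPLIT_REGEX` are names I made up for this post. They are not part of the Discourse import script, and the regex is only my guess at a typical mbox separator, not the importer's actual split_regex setting.

```ruby
# Hypothetical helper, NOT part of the Discourse mbox importer.
# Assumed separator: a line starting with "From " followed by a sender.
# Adjust it to whatever your mbox files really use.
SPLIT_REGEX = /^From \S+/

# Splits raw mbox text into messages and counts those without a Date header.
def mbox_summary(text, split_regex = SPLIT_REGEX)
  messages = text.lines
                 .slice_before { |line| line.match?(split_regex) } # new chunk at each separator line
                 .map(&:join)
                 .select { |message| message.match?(split_regex) } # drop any preamble before the first separator

  missing_date = messages.count do |message|
    # Headers end at the first blank line.
    headers = message.split(/\r?\n\r?\n/, 2).first.to_s
    !headers.match?(/^Date:/i)
  end

  { messages: messages.size, missing_date: missing_date }
end

# Usage: ruby mbox_summary.rb /path/to/file.mbox
if __FILE__ == $0 && ARGV[0]
  puts mbox_summary(File.read(ARGV[0], mode: "rb"))
end
```

If the message count is far off from what you expect, the separator is probably wrong for your files. Note that a body line that happens to start with "From " will also be counted as a new message with this simple regex.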

If you keep having trouble, you could ask for help in marketplace - there are some consultants hanging out here who are quite experienced at doing imports. I’m certainly no expert - this was my first attempt. :wink:

root@discourse:/var/discourse# ./launcher enter import
root@discourse-import:/var/www/discourse# RAILS_DB=secondsite
root@discourse-import:/var/www/discourse# export RAILS_DB
root@discourse-import:/var/www/discourse# import_mbox.sh
The mbox import is starting...

Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

creating index
indexing files in /shared/import/data/list
indexing /shared/import/data/list/18929486-3.mbox
indexing /shared/import/data/list/18929486-2.mbox

indexing replies and users

creating categories
        1 / 1 (100.0%)  [4916421 items/min]  
creating users
       69 / 69 (100.0%)  [1178 items/min]
creating topics and posts
Date is missing. Skipping 0462b41b966d8c11e6e32cc14c0b576d
        1 / 2333 (  0.0%)  [179689 items/min]  Date is missing. Skipping 0adb9bd80082595a33130f7749d7f530
        2 / 2333 (  0.1%)  [224693 items/min]  Date is missing. Skipping 3bd86d7adb396fbeb7d6dfcfe9f0be5f
        3 / 2333 (  0.1%)  [283328 items/min]  Date is missing. Skipping 4f5397838e6c7f96eedfe116ce27be13
        4 / 2333 (  0.2%)  [184374 items/min]  Date is missing. Skipping c8c14ab80e92ae1cacd4af99351319bd
       45 / 2333 (  1.9%)  [334 items/min]  Failed to map post for 2f401ce90708241252h30bdae5iad2ae0096e067b71@mail.gmail.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:145:in `map_first_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:103:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
      940 / 2333 ( 40.3%)  [398 items/min]  Failed to map post for BBCAF42471FF9540868B4DC02B885B1BBCDA1F@wn1217.or.providence.org
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
      944 / 2333 ( 40.5%)  [399 items/min]  Failed to map post for 3A1D6C799D451B41BD0500303339622A023AA1@s-mail.integral-corp.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
     1149 / 2333 ( 49.2%)  [408 items/min]  Failed to map post for FF35EE5B30156244A4370DC859B7F650F50626@s-mail.integral-corp.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
     2328 / 2333 ( 99.8%)  [467 items/min]  

Updating topic status

Updating bumped_at on topics

Updating last posted at on users

Updating last seen at on users

Updating topic reply counts...
       70 / 70 (100.0%)  [10745 items/min]    
Updating first_post_created_at...

Updating user post_count...

Updating user topic_count...

Updating topic users

Updating post timings

Updating featured topic users

Updating featured topics in categories
        9 / 9 (100.0%)  [2505 items/min]
Updating user topic reply counts
       70 / 70 (100.0%)  [9174 items/min]
Resetting topic counters


Done (00h 06min 58sec)

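So I just let it continue (I'll look at the errors later), but now something very strange has happened. I tried to import these messages into a folder named "old-yahoo-group" by first creating that category in the system and then pushing all the mbox folders to the following directory: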

/var/discourse/shared/standalone/import/data/old-yahoo-group

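I thought I understood the instructions, namely that the messages would show up in the corresponding category after the import, but they are hidden throughout the system.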

I can find the old messages through search, but they don't appear in any of the listing views.

How can I adjust this last import so that it goes into a specific category, with all ~35,000 messages shown in one convenient section and marked as old messages?

After looking further, I seem to have found the cause:

Now I need to figure out how to recover from this…

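The following worked perfectly, provided the old-yahoo-group category has already been created and there are no other uncategorized topics in the system (in fact, uncategorized topics were disabled in the settings):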

/var/discourse/launcher enter app
rails c
un=Category.find_by_slug('uncategorized')
newcat=Category.find_by_slug('old-yahoo-group')
Topic.where(category_id: un.id).update_all(category_id: newcat.id)

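By the way, I had a similar experience. For some reason the import script ignored the category I had created, even though the slug was the same. But it created a new category for me, so I didn't run into a problem. I just deleted the category I had created and renamed the one the script created.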