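Yahoo Groups import errors

I'm working on a Yahoo Groups mbox import, but I'm running into some errors. I'm not sure which direction to take with debugging and importing. Here are the errors I'm seeing so far: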
https://pastebin.com/raw/2WTN3GTj

You're using the mbox script, right? It worked fine for me, without any errors. The attachments are missing, but that's not a big deal for me.

That's right, @tobiaseigen. The import ran for a little over 2 hours.

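On top of my last question: I'm not sure whether I should let the import continue with these failures. I'm wondering whether, if I import again after fixing the errors/failures, the system will skip the messages that were already imported and carry on with a normal import.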
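For what it's worth, one common way for an importer to make re-runs safe is to record an ID for every message it has imported and skip those IDs the next time. I haven't confirmed that the mbox script works this way. The sketch below is purely illustrative, and `SkippingImporter` and the `:id` key are made-up names:

```ruby
require "set"

# Hypothetical importer (not the Discourse mbox script): it remembers the IDs
# it has already imported and skips them when the import is run again.
class SkippingImporter
  attr_reader :imported_ids

  def initialize(already_imported = [])
    @imported_ids = Set.new(already_imported)
  end

  # Imports every message whose :id has not been seen before and
  # returns the list of IDs imported during this run.
  def import(messages)
    newly_imported = []
    messages.each do |message|
      next if @imported_ids.include?(message[:id])

      # A real importer would create the post at this point.
      @imported_ids << message[:id]
      newly_imported << message[:id]
    end
    newly_imported
  end
end
```

With this approach a second run over the same files only touches messages that failed or were added since the first run.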

@gerhard Maybe we need some input and help here... Even after following your guide, Sidekiq still shows no processing of these ~35,000 messages.

I'm not sure Sidekiq is relevant here. I think the import script runs outside of Discourse.

In case it helps, here is my import log. It actually has a few lines similar to yours, but I decided not to worry about them. Life is short.

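Since you have so many errors, there seems to be a more systematic problem. Are you sure the system has enough free memory? I don't know if you've tried this already, but you may want to look more closely at the import files for clues. Maybe you just need to adjust split_regex somehow, or upload the files to the server in a different format?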
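One way to look at the import files more closely is to split an mbox file yourself and list the messages that have no Date header. This is only a rough sketch and not part of the import scripts: the helper name `messages_missing_date` is made up, and the separator pattern is an assumption (the classic "From " line), so it may not match what split_regex is set to in your configuration:

```ruby
# Hypothetical helper, not part of the Discourse import scripts.
# Assumed separator: a line starting with "From " (classic mbox convention).
SPLIT_REGEX = /^From .+$/

# Returns the 1-based positions of messages whose header block has no Date header.
def messages_missing_date(mbox_text, split_regex = SPLIT_REGEX)
  missing = []
  # Everything before the first separator is dropped; each remaining chunk is one message.
  mbox_text.split(split_regex).drop(1).each_with_index do |message, index|
    # The header block ends at the first blank line.
    headers = message.lstrip.split(/\r?\n\r?\n/, 2).first.to_s
    missing << index + 1 unless headers.match?(/^Date:/i)
  end
  missing
end

# Usage: ruby check_mbox.rb /path/to/file.mbox
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  positions = messages_missing_date(File.binread(ARGV[0]))
  puts "Messages without a Date header: #{positions.inspect}"
end
```

If the separator also matches lines inside message bodies, the file is split in the wrong places and the reported positions will look odd, which is itself a hint that the regex needs adjusting.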

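If you keep having trouble, you can ask for help in Marketplace. There are some very experienced consultants there who specialize in imports. I'm certainly no expert; this was my first attempt. :wink: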

root@discourse:/var/discourse# ./launcher enter import
root@discourse-import:/var/www/discourse# RAILS_DB=secondsite
root@discourse-import:/var/www/discourse# export RAILS_DB
root@discourse-import:/var/www/discourse# import_mbox.sh
Starting mbox import...

Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

Creating index
Indexing files in /shared/import/data/list
Indexing /shared/import/data/list/18929486-3.mbox
Indexing /shared/import/data/list/18929486-2.mbox

Indexing replies and users

Creating categories
        1 / 1 (100.0%)  [4916421 items/min]
Creating users
       69 / 69 (100.0%)  [1178 items/min]
Creating topics and posts
Missing date. Skipping 0462b41b966d8c11e6e32cc14c0b576d
        1 / 2333 (  0.0%)  [179689 items/min]  Missing date. Skipping 0adb9bd80082595a33130f7749d7f530
        2 / 2333 (  0.1%)  [224693 items/min]  Missing date. Skipping 3bd86d7adb396fbeb7d6dfcfe9f0be5f
        3 / 2333 (  0.1%)  [283328 items/min]  Missing date. Skipping 4f5397838e6c7f96eedfe116ce27be13
        4 / 2333 (  0.2%)  [184374 items/min]  Missing date. Skipping c8c14ab80e92ae1cacd4af99351319bd
       45 / 2333 (  1.9%)  [334 items/min]  Failed to map post for 2f401ce90708241252h30bdae5iad2ae0096e067b71@mail.gmail.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:145:in `map_first_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:103:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
      940 / 2333 ( 40.3%)  [398 items/min]  Failed to map post for BBCAF42471FF9540868B4DC02B885B1BBCDA1F@wn1217.or.providence.org
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
      944 / 2333 ( 40.5%)  [399 items/min]  Failed to map post for 3A1D6C799D451B41BD0500303339622A023AA1@s-mail.integral-corp.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
     1149 / 2333 ( 49.2%)  [408 items/min]  Failed to map post for FF35EE5B30156244A4370DC859B7F650F50626@s-mail.integral-corp.com
undefined method `hex' for nil:NilClass
/var/www/discourse/app/models/upload.rb:132:in `base62_sha1'
/var/www/discourse/app/models/upload.rb:386:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:115:in `short_url'
/var/www/discourse/lib/upload_markdown.rb:17:in `image_markdown'
/var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
/var/www/discourse/lib/email/receiver.rb:1085:in `block in add_attachments'
/var/www/discourse/lib/email/receiver.rb:1060:in `each'
/var/www/discourse/lib/email/receiver.rb:1060:in `add_attachments'
/var/www/discourse/script/import_scripts/mbox/importer.rb:137:in `format_raw'
/var/www/discourse/script/import_scripts/mbox/importer.rb:121:in `map_post'
/var/www/discourse/script/import_scripts/mbox/importer.rb:159:in `map_reply'
/var/www/discourse/script/import_scripts/mbox/importer.rb:105:in `block (2 levels) in import_posts'
/var/www/discourse/script/import_scripts/base.rb:491:in `block in create_posts'
/var/www/discourse/script/import_scripts/base.rb:490:in `each'
/var/www/discourse/script/import_scripts/base.rb:490:in `create_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:97:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:870:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:869:in `loop'
/var/www/discourse/script/import_scripts/base.rb:869:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:83:in `batches'
/var/www/discourse/script/import_scripts/mbox/importer.rb:91:in `import_posts'
/var/www/discourse/script/import_scripts/mbox/importer.rb:35:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
script/import_scripts/mbox.rb:16:in `<module:Mbox>'
script/import_scripts/mbox.rb:10:in `<module:ImportScripts>'
script/import_scripts/mbox.rb:9:in `<main>'
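     2328 / 2333 ( 99.8%)  [467 items/min]

Updating topic status

Updating bumped_at on topics

Updating last posted at on users

Updating last seen at on users

Updating topic reply counts...
       70 / 70 (100.0%)  [10745 items/min]
Updating first_post_created_at...

Updating user post_count...

Updating user topic_count...

Updating topic users

Updating post timings

Updating featured topic users

Updating featured topics in categories
        9 / 9 (100.0%)  [2505 items/min]
Updating user topic reply counts
       70 / 70 (100.0%)  [9174 items/min]
Resetting topic counters


Done (00 hrs 06 min 58 sec)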
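
Reading the traces above, every failure ends in `base62_sha1` calling `hex` on nil while markdown is being built for an attachment. My guess, which nothing in this thread confirms, is that the upload record for that attachment had no SHA1. The snippet below is not Discourse's code. It is a small stand-alone sketch with made-up names (`FakeUpload`, `base62_encode`, `safe_short_basename`) that shows the failure mode and a nil guard:

```ruby
# Illustrative stand-in for an upload record; NOT Discourse's Upload model.
FakeUpload = Struct.new(:sha1)

BASE62_ALPHABET = (("0".."9").to_a + ("a".."z").to_a + ("A".."Z").to_a).freeze

# Encodes a non-negative integer with the 62-character alphabet above.
def base62_encode(number)
  return "0" if number.zero?

  digits = []
  while number > 0
    number, remainder = number.divmod(62)
    digits << BASE62_ALPHABET[remainder]
  end
  digits.reverse.join
end

# Returns nil instead of raising NoMethodError when the upload has no SHA1.
def safe_short_basename(upload)
  return nil if upload.sha1.nil?

  # String#hex parses the hexadecimal digest into an Integer.
  base62_encode(upload.sha1.hex)
end
```

Without the guard, `nil.hex` raises the same `undefined method` error seen in the log, so only the posts with such attachments fail to map while the rest of the import carries on.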

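So I just let this run to completion (I'll look at the errors later), but now something very strange has happened. I tried to import these emails into a folder named "old-yahoo-group" by first creating that category in the system and then pushing all of the mbox folders to the following directory: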

/var/discourse/shared/standalone/import/data/old-yahoo-group

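I thought I understood the instructions to mean that these emails would show up in the matching category after the import, but they are hidden throughout the system.

I can find the old emails via search, but they don't appear in any of the listing views.

How can I adjust this last import so that it goes into a specific category, with all ~35,000 emails shown in one convenient section and marked as old emails?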

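After looking into it further, I seem to have found the cause:

Now I need to figure out how to recover from this...

The following worked perfectly, provided that the old-yahoo-group category has already been created and there are no other uncategorized posts in the system (uncategorized posting was actually disabled in my settings anyway):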

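# Enter the app container and open a Rails console
/var/discourse/launcher enter app
rails c
# Look up the source and target categories by slug
un = Category.find_by_slug('uncategorized')
newcat = Category.find_by_slug('old-yahoo-group')
# Move every uncategorized topic into the target category
Topic.where(category_id: un.id).update_all(category_id: newcat.id)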

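By the way, I had a similar experience. For some reason the import script ignored the category I had already created, even though the slug was the same. But it created a new category for me, so I didn't run into a problem. I just deleted the category I had created and renamed the one the script created.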