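Hello, this topic gives some background on the migration I am slowly planning and testing. Last Friday I finally tried the Drupal importer on a test VPS, using a combination of this and this. The importer is still running as I type, so I have not been able to test the functionality of the test site yet, but it should finish soon.
The biggest problem I am facing is a "duplicate key value" error on 8 seemingly random nodes out of roughly 80,000 (a node is the Drupal equivalent of a topic in Discourse). In case there is some very strange Y2K-style math bug, here are the specific nid numbers: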
42081, 53125, 57807, 63932, 66756, 76561, 78250, 82707
Every time I re-run the importer, the same error occurs on these same nids:
Traceback (most recent call last):
19: from script/import_scripts/drupal.rb:537:in `<main>'
18: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
17: from script/import_scripts/drupal.rb:39:in `execute'
16: from script/import_scripts/drupal.rb:169:in `import_forum_topics'
15: from /var/www/discourse/script/import_scripts/base.rb:916:in `batches'
14: from /var/www/discourse/script/import_scripts/base.rb:916:in `loop'
13: from /var/www/discourse/script/import_scripts/base.rb:917:in `block in batches'
12: from script/import_scripts/drupal.rb:195:in `block in import_forum_topics'
11: from /var/www/discourse/script/import_scripts/base.rb:224:in `all_records_exist?'
10: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/transactions.rb:209:in `transaction'
9: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/connection_adapters/abstract/database_statements.rb:316:in `transaction'
8: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction'
7: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
6: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
5: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
4: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
3: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction'
2: from /var/www/discourse/script/import_scripts/base.rb:231:in `block in all_records_exist?'
1: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:56:in `exec'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:56:in `exec': ERROR: duplicate key value violates unique constraint "import_ids_pkey" (PG::UniqueViolation)
DETAIL: Key (val)=(nid:42081) already exists.
20: from script/import_scripts/drupal.rb:537:in `<main>'
19: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
18: from script/import_scripts/drupal.rb:39:in `execute'
17: from script/import_scripts/drupal.rb:169:in `import_forum_topics'
16: from /var/www/discourse/script/import_scripts/base.rb:916:in `batches'
15: from /var/www/discourse/script/import_scripts/base.rb:916:in `loop'
14: from /var/www/discourse/script/import_scripts/base.rb:917:in `block in batches'
13: from script/import_scripts/drupal.rb:195:in `block in import_forum_topics'
12: from /var/www/discourse/script/import_scripts/base.rb:224:in `all_records_exist?'
11: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/transactions.rb:209:in `transaction'
10: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/connection_adapters/abstract/database_statements.rb:316:in `transaction'
9: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction'
8: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
7: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
6: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
5: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
4: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-7.0.3.1/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction'
3: from /var/www/discourse/script/import_scripts/base.rb:243:in `block in all_records_exist?'
2: from /var/www/discourse/script/import_scripts/base.rb:243:in `ensure in block in all_records_exist?'
1: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:56:in `exec'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:56:in `exec': ERROR: current transaction is aborted, commands ignored until end of transaction block (PG::InFailedSqlTransaction)
The only way I could get it to continue was to modify the SQL condition:
...
LEFT JOIN node_counter nc ON nc.nid = n.nid
WHERE n.type = 'forum'
AND n.status = 1
AND n.nid != 42081
AND n.nid != 53125
AND n.nid != 57807
AND n.nid != 63932
AND n.nid != 66756
AND n.nid != 76561
AND n.nid != 78250
AND n.nid != 82707
LIMIT #{BATCH_SIZE}
OFFSET #{offset};
...
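The same exclusions could also be kept in one list and turned into a single condition, which is easier to adjust between runs. This is a minimal sketch only: SKIPPED_NIDS and skip_clause are hypothetical names that do not exist in drupal.rb, and the resulting string would still have to be interpolated into the query by hand.

```ruby
# Sketch only: SKIPPED_NIDS and skip_clause are hypothetical, not part of drupal.rb.
SKIPPED_NIDS = [42081, 53125, 57807, 63932, 66756, 76561, 78250, 82707].freeze

# Builds the extra WHERE condition, or an empty string when nothing is skipped.
def skip_clause(nids, column: "n.nid")
  return "" if nids.empty?

  # Integer() raises on anything that is not a whole number,
  # so only plain integers end up in the SQL string.
  ids = nids.map { |nid| Integer(nid) }
  "AND #{column} NOT IN (#{ids.join(', ')})"
end

puts skip_clause(SKIPPED_NIDS)
# prints: AND n.nid NOT IN (42081, 53125, 57807, 63932, 66756, 76561, 78250, 82707)
```

This only hides the failing nodes; those 8 topics would still be missing from the import.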
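I examined the first failing node, along with the nids immediately before and after it, in the source Drupal database and found nothing wrong. nid is the primary key and has AUTO_INCREMENT, and the original Drupal site runs fine, so there is no fundamental integrity problem with the source database.
Besides the error above, the script has the following limitations: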
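One possibility that may be worth ruling out, offered as an assumption rather than a diagnosis, is that the batch query returns the same nid more than once (for example when a joined table matches several rows), even though nid is unique in the node table itself. The sketch below shows how a batch could be checked for repeated nids. The rows variable and the sample data are made up for illustration, and duplicated_nids is a hypothetical helper, not something in the import scripts.

```ruby
# Sketch only: `rows` stands in for one batch of query results as an array of
# hashes with an "nid" key. The sample data below is invented for illustration.
def duplicated_nids(rows)
  rows
    .group_by { |row| row["nid"] }           # nid => all rows carrying that nid
    .select { |_nid, group| group.size > 1 } # keep nids that occur more than once
    .keys
end

sample_batch = [
  { "nid" => 42080, "title" => "a" },
  { "nid" => 42081, "title" => "b" },
  { "nid" => 42081, "title" => "b" } # same node returned twice
]

p duplicated_nids(sample_batch) # prints [42081]
```

If a real batch shows repeats, the query rather than the source data would be the place to look.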
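- Permalinks: It looks like the importer script creates permalinks for the former node URLs, example.com/node/XXXXXXX. But I also need to preserve links to specific comments within those nodes, which have the format example.com/comment/YYYYYYY#comment-YYYYYYY (YYYYYYY is the same in both places). Drupal's URL scheme does not include the ID of the node that a comment belongs to, whereas Discourse's does (example.com/t/topic-keywords/XXXXXXX/YY), so this looks like a major complication.
- Username restrictions: Drupal allows spaces in usernames. As I understand it, Discourse does not, at least not for new users creating accounts. This post suggests that the importer script automatically "converts" problematic usernames, but I do not see any code for that in /import_scripts/drupal.rb. Update: actually, it looks like Discourse already handles this automatically and correctly.
- Banned users: The script appears to import all users, including banned accounts. I could probably add a WHERE status = 1 condition to the SQL select quite easily, so that only active user accounts are imported, but I do not know whether that would cause problems with the serialization of records. Most importantly, I want to permanently block the names of these previously banned accounts and their associated email addresses, so that the same problem users cannot register again on Discourse.
- User profile fields: Does anyone know of example code in another importer that imports the personal information fields from user account profiles? I have only one profile field ("Location") that I need to import.
- Avatars (non-Gravatar): The Drupal importer has code for importing Gravatars but none for importing local account avatar images, which are more commonly used. That seems a bit odd.
- Private messages: Probably almost every Drupal 7 forum uses the third-party privatemsg module (Drupal has no official PM feature). The importer does not support importing PMs. In my case I need to import about 1.5 million of them.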
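On the comment links in the first list item above: the #comment-YYYYYYY fragment is handled by the browser and is never sent to the server, so only the comment/YYYYYYY path would need a redirect. The sketch below derives that path from an old URL and pairs it with a Discourse post ID. The helpers old_comment_path and permalink_attrs and the lookup hash are hypothetical. It is also an unverified assumption that a hash of this shape could be handed to Discourse's Permalink model, and the mapping from Drupal comment IDs to Discourse post IDs would have to come from the import itself.

```ruby
require "uri"

# Hypothetical helper: returns "comment/<cid>" for an old Drupal comment URL,
# or nil when the URL is not a comment link. The fragment is ignored.
def old_comment_path(url)
  path = URI.parse(url).path.to_s.sub(%r{\A/}, "")
  path.match?(%r{\Acomment/\d+\z}) ? path : nil
end

# Hypothetical helper: post_ids_by_cid maps Drupal comment IDs to Discourse
# post IDs. Returns nil when the URL or the comment ID is unknown.
def permalink_attrs(url, post_ids_by_cid)
  path = old_comment_path(url)
  return nil unless path

  post_id = post_ids_by_cid[path.split("/").last.to_i]
  post_id && { url: path, post_id: post_id }
end

lookup = { 1234567 => 987 } # made-up IDs for illustration
p permalink_attrs("https://example.com/comment/1234567#comment-1234567", lookup)
```

With roughly 80,000 topics and many more comments, rows like these would presumably be created in batches rather than one at a time.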
Thank you for your help, and for providing the Drupal importer script.




