将 vBulletin 3 论坛迁移到 Discourse,通过 XenForo

趁热打铁写下这些内容——目前仍在进行中,请务必测试并确保它满足您的需求。

据我所知,没有从 vB3 到 Discourse 的导入器,而且我手头也没有 vB4/5 的许可证(我认为 Discourse 的导入器是专为这些版本设计的)——但我确实有 XenForo 1.4 的许可证,而且有一个针对它的 Discourse 导入器!对于那些没有 XF 许可证的用户,通常可以在二手市场购买,或者您可以付费请人完成导入并为您提供 XF 数据库。

我之前曾成功将 vB3.6 导入到 XF,所以我知道这没问题(唯一无法导入的是个人资料照片,因为 XF 只有头像——但我有一个修复方案)。

好的…

首先,像往常一样将您的 vB 论坛导入到 XF。

我建议您保持 vB 论坛在线并可访问(因此请将导入操作放在 XF 的子目录中)。这只是一个安全措施,以防您后来决定保留 vB 论坛,同时也因为我们将要从在线站点复制个人资料图片(当然,如果您真的需要,也可以事先使用下面的个人资料复制脚本)。

一旦您确认论坛已成功转换为 XF,请备份这个新数据库,并将其复制到您的开发机器上。

我的开发机器是 Mac,所以这些说明适用于 macOS。

brew install mysql
// 同时确保它已启动

mysql -u root

create database xenforo_db;
exit;

mysql -u root -p xenforo_db < /path/to/your/backup/and/downloaded/xenforo_db.sql

按照常规设置您的 Discourse 开发环境(参见 这篇 macOS 指南),然后:

打开 database.yml 并将数据库名称更改为类似 discourse_development_sitename_01 的内容——使用数字,这样您可以通过简单地更改数字来重复执行导入。

bundle
bundle exec rake db:create
bundle exec rake db:migrate
RAILS_ENV=development bundle exec rake admin:create
RAILS_ENV=development bundle exec rake admin:create

对于第一个管理员账户,请尝试使用您在 vB/XF 安装中现有的管理员账户的相同电子邮件地址。当它询问是否授予管理员权限时,选择“是”。

对于第二次创建账户,请将电子邮件设置为类似 guest@something.com 的内容,并在询问是否将其创建为管理员账户时选择“否”。我们需要这个账户来处理与访客/已删除用户相关的帖子。您可以进入 rails c 然后运行 User.last 来检查其 ID,但它很可能为 2。我们将把这个 ID 添加到导入脚本中。

我对导入脚本做了一些修改,以下是我的版本(请用以下内容替换 script/import_scripts/xenforo.rb 的内容):

# frozen_string_literal: true

require "mysql2"
require_relative "base"

require "set" # 可能不需要 - 不记得为什么现在加上了
require "htmlentities" # 可能不需要 - 不记得为什么现在加上了

require File.expand_path(File.dirname(__FILE__) + "/base.rb")

# 调用方式如下:
#   RAILS_ENV=production bundle exec ruby script/import_scripts/xenforo.rb
class ImportScripts::XenForo < ImportScripts::Base

  XENFORO_DB = "xenforo_db_3"
  TABLE_PREFIX = "xf_"
  BATCH_SIZE = 1000
  ATTACHMENT_DIR = '/full/path/to/attachments/eg/name/projects/discourse/sitename/discourse/tmp/attachments'
  AVATAR_DIR = '/full/path/to/avatars/eg/name/projects/discourse/sitename/discourse/tmp/avatars'
  PROFILE_PIC_DIR = '/full/path/to/profilepics/eg/name/projects/discourse/sitename/discourse/tmp/profilepics'

  def initialize
    super
    @client = Mysql2::Client.new(
      host: "localhost",
      username: "root",
      password: "",
      database: XENFORO_DB
    )

    @category_mappings = {}
    @prefix_as_category = false
  end

  def execute
    import_users
    import_avatars
    import_categories
    import_posts
  end

  def import_users
    puts '', "creating users"

    total_count = mysql_query("SELECT count(*) count FROM #{TABLE_PREFIX}user;").first['count']

    batches(BATCH_SIZE) do |offset|
      results = mysql_query(
        "SELECT user_id id, username, email, custom_title title, register_date created_at,
                last_activity last_visit_time, user_group_id, is_moderator, is_admin, is_staff
         FROM #{TABLE_PREFIX}user
         LIMIT #{BATCH_SIZE}
         OFFSET #{offset};")

      break if results.size < 1

      next if all_records_exist? :users, results.map { |u| u["id"].to_i }

      create_users(results, total: total_count, offset: offset) do |user|
        next if user['username'].blank?
        { id: user['id'],
          email: user['email'],
          username: user['username'],
          title: user['title'],
          created_at: Time.zone.at(user['created_at']),
          last_seen_at: Time.zone.at(user['last_visit_time']),
          moderator: user['is_moderator'] == 1 || user['is_staff'] == 1,
          admin: user['is_admin'] == 1 }
      end
    end
  end

  def import_user_profiles
    puts "Importing user profiles..."

    user_profiles = mysql_query("
        SELECT user_id, location, about
        FROM #{TABLE_PREFIX}user_profile
        ORDER BY user_id;
    ")
    
    puts "Importing profiles: fetching info"
    user_profiles.each do |row|
      usf = UserCustomField.find_by(name: "import_id", value: row["user_id"])
      if user = User.find(usf.user_id)
        puts "Updating profile for #{user.username}"
        profile = user.user_profile
        profile.location = row["location"]
        profile.bio_raw = row["about"]
        profile.save
      end
    end
  end

  def import_categories
    puts "", "importing categories..."

    categories = mysql_query("
        SELECT node_id id,
               title,
               description,
               parent_node_id,
               display_order
          FROM #{TABLE_PREFIX}node
      ORDER BY parent_node_id, display_order
      ").to_a

    top_level_categories = categories.select { |c| c["parent_node_id"] == 0 }

    create_categories(top_level_categories) do |c|
      {
        id: c['id'],
        name: c['title'],
        description: c['description'],
        position: c['display_order']
      }
    end

    top_level_category_ids = Set.new(top_level_categories.map { |c| c["id"] })

    subcategories = categories.select { |c| top_level_category_ids.include?(c["parent_node_id"]) }

    create_categories(subcategories) do |c|
      {
        id: c['id'],
        name: c['title'],
        description: c['description'],
        position: c['display_order'],
        parent_category_id: category_id_from_imported_category_id(c['parent_node_id'])
      }
    end

    subcategory_ids = Set.new(subcategories.map { |c| c['id'] })

    # 更深层的分类需要转换为标签
    categories.each do |c|
      next if c['parent_node_id'] == 0
      next if top_level_category_ids.include?(c['id'])
      next if subcategory_ids.include?(c['id'])

      # 为属于此分类的主题找到一个子分类
      parent = c
      while !parent.nil? && !subcategory_ids.include?(parent['id'])
        parent = categories.find { |subcat| subcat['id'] == parent['parent_node_id'] }
      end

      if parent
        tag_name = DiscourseTagging.clean_tag(c['title'])
        @category_mappings[c['id']] = {
          category_id: category_id_from_imported_category_id(parent['id']),
          tag: Tag.find_by_name(tag_name) || Tag.create(name: tag_name)
        }
      else
        puts '', "Couldn't find a category for #{c['id']} '#{c['title']}'!"
      end
    end
  end

  # 此方法是 import_categories 的替代方案。
  # 它使用主题前缀而不是节点。
  def import_categories_from_thread_prefixes
    puts "", "importing categories..."

    categories = mysql_query("
                              SELECT prefix_id id
                              FROM #{TABLE_PREFIX}thread_prefix
                              ORDER BY prefix_id ASC
                            ").to_a

    create_categories(categories) do |category|
      {
        id: category["id"],
        name: "Category-#{category["id"]}"
      }
    end

    @prefix_as_category = true
  end

  def import_posts
    puts "", "creating topics and posts"

    total_count = mysql_query("SELECT count(*) count from #{TABLE_PREFIX}post").first["count"]

    posts_sql = "
        SELECT p.post_id id,
               t.thread_id topic_id,
               #{@prefix_as_category ? 't.prefix_id' : 't.node_id'} category_id,
               t.title title,
               t.first_post_id first_post_id,
               p.user_id user_id,
               p.message raw,
               p.post_date created_at
        FROM #{TABLE_PREFIX}post p,
             #{TABLE_PREFIX}thread t
        WHERE p.thread_id = t.thread_id
        AND p.message_state = 'visible'
        AND t.discussion_state = 'visible'
        ORDER BY p.post_date
        LIMIT #{BATCH_SIZE}" # 需要 OFFSET

    batches(BATCH_SIZE) do |offset|
      results = mysql_query("#{posts_sql} OFFSET #{offset};").to_a

      break if results.size < 1
      next if all_records_exist? :posts, results.map { |p| p['id'] }

      create_posts(results, total: total_count, offset: offset) do |m|
        skip = false
        mapped = {}

        mapped[:id] = m['id']
        mapped[:user_id] = user_id_from_imported_user_id(m['user_id']) || 2
        mapped[:raw] = process_xenforo_post(m['raw'], m['id'])
        mapped[:created_at] = Time.zone.at(m['created_at'])

        if m['id'] == m['first_post_id']
          if m['category_id'].to_i == 0 || m['category_id'].nil?
            mapped[:category] = SiteSetting.uncategorized_category_id
          else
            mapped[:category] = category_id_from_imported_category_id(m['category_id'].to_i) ||
              @category_mappings[m['category_id']].try(:[], :category_id)
          end
          mapped[:title] = CGI.unescapeHTML(m['title'])
        else
          parent = topic_lookup_from_imported_post_id(m['first_post_id'])
          if parent
            mapped[:topic_id] = parent[:topic_id]
          else
            puts "Parent post #{m['first_post_id']} doesn't exist. Skipping #{m["id"]}: #{m["title"][0..40]}"
            skip = true
          end
        end

        skip ? nil : mapped
      end
    end

    # 应用标签
    batches(BATCH_SIZE) do |offset|
      results = mysql_query("#{posts_sql} OFFSET #{offset};").to_a
      break if results.size < 1

      results.each do |m|
        next unless m['id'] == m['first_post_id'] && m['category_id'].to_i > 0
        next unless tag = @category_mappings[m['category_id']].try(:[], :tag)
        next unless topic_mapping = topic_lookup_from_imported_post_id(m['id'])

        topic = Topic.find_by_id(topic_mapping[:topic_id])

        topic.tags = [tag] if topic
      end
    end

  end
  
  def process_xenforo_post(raw, import_id)
    s = raw.dup

    # :) 被编码为 <!-- s:) --><img src="{SMILIES_PATH}/icon_e_smile.gif" alt=":)" title="Smile" /><!-- s:) -->
    s.gsub!(/<!-- s(\S+) --><img (?:[^>]+) \/><!-- s(?:\S+) -->/, '\1')

    # 一些链接看起来像这样:<!-- m --><a class="postlink" href="http://www.onegameamonth.com">http://www.onegameamonth.com</a><!-- m -->
    s.gsub!(/<!-- \w --><a(?:.+)href="(\S+)"(?:.*)>(.+)<\/a><!-- \w -->/, '[\2](\1)')

    # 许多 phpbb bbcode 标签带有哈希。例如:
    #   [url=https&#58;//google&#46;com:1qh1i7ky]click here[/url:1qh1i7ky]
    #   [quote=&quot;cybereality&quot;:b0wtlzex]Some text.[/quote:b0wtlzex]
    s.gsub!(/:(?:\w{8})\]/, ']')

    # 移除 mybb 视频标签。
    s.gsub!(/(^\[video=.*?\])|(\[\/video\]$)/, '')

    s = CGI.unescapeHTML(s)

    # phpBB 会这样缩短链接文本,这会破坏我们的 markdown 处理:
    #   [http://answers.yahoo.com/question/index ... 223AAkkPli](http://answers.yahoo.com/question/index?qid=20070920134223AAkkPli)
    #
    # 修复错误:xenforo.rb: 160: in `gsub!': invalid byte sequence in UTF-8 (ArgumentError)
    if ! s.valid_encoding?
      s = s.encode("UTF-16be", invalid: :replace, replace: "?").encode('UTF-8')
    end

    # 暂时绕过:
    s.gsub!(/\[http(s)?:\/\/(www\.)?/, '[')

    # [QUOTE]...[/QUOTE]
    s.gsub!(/\[quote\](.+?)\[\/quote\]/im) { "\n> #{$1}\n" }

    # 嵌套引用
    s.gsub!(/(\[\/?QUOTE.*?\])/mi) { |q| "\n#{q}\n" }

    # [QUOTE="username, post: 28662, member: 1283"]
    s.gsub!(/\[quote="(\w+), post: (\d*), member: (\d*)"\]/i) do
      username, imported_post_id, _imported_user_id = $1, $2, $3

      topic_mapping = topic_lookup_from_imported_post_id(imported_post_id)

      if topic_mapping
        "\n[quote=\"#{username}, post:#{topic_mapping[:post_number]}, topic:#{topic_mapping[:topic_id]}\"]\n"
      else
        "\n[quote=\"#{username}\"]\n"
      end
    end

    # [URL=...]...[/URL]
    s.gsub!(/\[url="?(.+?)"?\](.+)\[\/url\]/i) { "[#{$2}](#{$1})" }

    # [IMG]...[/IMG]
    s.gsub!(/\[\/?img\]/i, "")

    # 将 list 标签转换为 ul,将 list=1 标签转换为 ol
    # (基本上,我们只缺少 list=a...)
    s.gsub!(/\[list\](.*?)\[\/list:u\]/m, '[ul]\1[/ul]')
    s.gsub!(/\[list=1\](.*?)\[\/list:o\]/m, '[ol]\1[/ol]')
    # 将 *-标签转换为 li-标签,以便 bbcode-to-md 可以对 phpBB 的列表执行其魔法:
    s.gsub!(/\[\*\](.*?)\[\/\*:m\]/, '[li]\1[/li]')

    # [YOUTUBE]<id>[/YOUTUBE]
    s.gsub!(/\[youtube\](.+?)\[\/youtube\]/i) { "\nhttps://www.youtube.com/watch?v=#{$1}\n" }

    # [youtube=425,350]id[/youtube]
    s.gsub!(/\[youtube="?(.+?)"?\](.+)\[\/youtube\]/i) { "\nhttps://www.youtube.com/watch?v=#{$2}\n" }

    # [MEDIA=youtube]id[/MEDIA]
    s.gsub!(/\[MEDIA=youtube\](.+?)\[\/MEDIA\]/i) { "\nhttps://www.youtube.com/watch?v=#{$1}\n" }

    # [ame="youtube_link"]title[/ame]
    s.gsub!(/\[ame="?(.+?)"?\](.+)\[\/ame\]/i) { "\n#{$1}\n" }

    # [VIDEO=youtube;<id>]...[/VIDEO]
    s.gsub!(/\[video=youtube;([^\]]+)\].*?\[\/video\]/i) { "\nhttps://www.youtube.com/watch?v=#{$1}\n" }

    # [USER=706]@username[/USER]
    s.gsub!(/\[user="?(.+?)"?\](.+)\[\/user\]/i) { $2 }

    # 移除颜色标签
    s.gsub!(/\[color=[#a-z0-9]+\]/i, "")
    s.gsub!(/\[\/color\]/i, "")

    if Dir.exist? ATTACHMENT_DIR
      s = process_xf_attachments(:gallery, s)
      s = process_xf_attachments(:attachment, s)
    end

    s
  end

  def process_xf_attachments(xf_type, s)
    ids = Set.new
    ids.merge(s.scan(get_xf_regexp(xf_type)).map { |x| x[0].to_i })
    ids.each do |id|
      next unless id
      sql = get_xf_sql(xf_type, id).squish!
      results = mysql_query(sql)
      if results.size < 1
        # 剥离附件
        s.gsub!(get_xf_regexp(xf_type, id), '')
        STDERR.puts "#{xf_type.capitalize} id #{id} not found in source database. Stripping."
        next
      end
      original_filename = results.first['filename']
      result = results.first
      upload = import_xf_attachment(result['data_id'], result['file_hash'], result['user_id'], original_filename)
      next unless upload
      if upload.present? && upload.persisted?
        s.gsub!(get_xf_regexp(xf_type, id), @uploader.html_for_upload(upload, original_filename))
      else
        STDERR.puts "Could not find upload: #{upload.id}. Skipping attachment id #{id}"
      end
    end
    s
  end

  def import_xf_attachment(data_id, file_hash, owner_id, original_filename)
    current_filename = "#{data_id}-#{file_hash}.data"
    path = Pathname.new(ATTACHMENT_DIR + "/#{data_id / 1000}/#{current_filename}")
    new_path = path.dirname + original_filename
    upload = nil
    if File.exist? path
      FileUtils.cp path, new_path
      upload = @uploader.create_upload owner_id, new_path, original_filename
      FileUtils.rm new_path
    else
      STDERR.puts "Could not find file #{path}. Skipping attachment id #{data_id}"
    end
    upload
  end

  def get_xf_regexp(type, id = nil)
    case type
    when :gallery
      Regexp.new(/\[GALLERY=media,\s#{id ? id : '(\d+)'}\].+?\]/i)
    when :attachment
      Regexp.new(/\[ATTACH(?>=\w+)?\]#{id ? id : '(\d+)'}\[\/ATTACH\]/i)
    end
  end

  def get_xf_sql(type, id)
    case type
    when :gallery
      <<-SQL
		SELECT m.media_id, m.media_title, a.attachment_id, a.data_id, d.filename, d.file_hash,d.user_id
		FROM xengallery_media as m
		INNER JOIN #{TABLE_PREFIX}attachment a on m.attachment_id = a.attachment_id
		INNER JOIN #{TABLE_PREFIX}attachment_data d on a.data_id = d.data_id
		WHERE media_id = #{id}
      SQL
    when :attachment
      <<-SQL
		SELECT a.attachment_id, a.data_id, d.filename, d.file_hash, d.user_id
		FROM #{TABLE_PREFIX}attachment AS a
		INNER JOIN #{TABLE_PREFIX}attachment_data d ON a.data_id = d.data_id
		WHERE attachment_id = #{id}
      SQL
    end
  end

  def mysql_query(sql)
    @client.query(sql, cache_rows: false)
  end
  
  def import_avatars
    if AVATAR_DIR
      users = User.all
      users.each do |u|
        unless u.custom_fields["import_id"].nil?
          import_id = u.custom_fields["import_id"]
          if import_id.to_i < 1000
            dir_num = "0"
          elsif import_id.to_i > 1000
            dir_num = import_id.first
          end
        
          filename = "#{import_id}.jpg"
          avatar_file_path = "#{AVATAR_DIR}/l/#{dir_num}"
          avatar_file_path_and_name = "#{avatar_file_path}/#{filename}"
          profile_pic_file_path_and_name = "#{PROFILE_PIC_DIR}/#{filename}"
          
          if File.exists?(profile_pic_file_path_and_name)
            upload_pic_or_avatar(u, profile_pic_file_path_and_name, filename)
          elsif File.exists?(avatar_file_path_and_name)
            upload_pic_or_avatar(u, avatar_file_path_and_name, filename)
          end
        end
      end
    end
  end
  
  def upload_pic_or_avatar(u, file_path_and_name, filename)
    upload = create_upload(u.id, file_path_and_name, filename)
    if upload.persisted?
      puts "upload persisted"
      u.import_mode = false
      u.create_user_avatar
      u.import_mode = true
      u.user_avatar.update(custom_upload_id: upload.id)
      u.update(uploaded_avatar_id: upload.id)
    else
      puts "Error: Upload did not persist for #{u.username} #{filename}!"
    end
  end
  
  
end

ImportScripts::XenForo.new.perform

注意事项:

  • 它添加了一个 import_avatars 步骤/方法(这些必须是 jpg 格式)
  • 我们添加了头像和个人资料图片的路径
  • 我们将新创建的访客用户的 ID 添加为后备,当用户不存在但帖子存在时使用(访客用户)

现在,如果存在,让我们复制个人资料图片以用作头像;如果不存在,则使用用户已上传的头像。如果您只想直接进行头像到头像的导入,可以跳过此步骤。

个人资料图片复制器:

首先,使用 gem install down 安装 Down。

然后创建一个新文件,内容如下:

require 'down'

(1..NUMBER_OF_USERS).each do |u|
  puts "Fetching user #{u}"
  puts ""
  profile_pic_url = "https://www.forum-name.com/image.php?u=#{u}&type=profile"
  destination = "/full/path/where/you/want/to/save/profile/pics/#{u}.jpg"
  begin
    Down.download(profile_pic_url, destination: destination)
    puts "Completed #{u}"
  rescue
    puts "Failed #{u}"
  end
  puts ""
end

注意事项:

  • 假设所有个人资料图片(和头像)都是 jpg 格式。幸运的是,我们只允许 jpg 格式作为头像和个人资料图片,所以这对我们有效。
  • 确保您的路径和 URL 正确,并且个人资料和个人资料图片对访客可见。
  • 将 NUMBER_OF_USERS 替换为您拥有的用户数量(例如 3872)。

然后在终端中运行脚本:ruby /path/to/name-of-script.rb。这将把所有个人资料图片复制到该目录,然后只需进入该目录,按文件大小排序,并删除任何空白文件(会有很多——因为并非每个人都上传了个人资料图片)。

执行导入:

完成以上所有步骤后,您就可以开始了 :smiley:

RAILS_ENV=development bundle exec ruby script/import_scripts/xenforo.rb

导入一个拥有 10 万篇帖子和几千名成员的论坛大约需要 90 分钟,我的初步测试显示它运行良好。然而..

注意事项:

  • 它只会从个人资料中导入 locationabout 文本
  • 我尚未检查附件上传,因为我在用于测试的测试论坛上从未允许过附件上传。我在想要导入的其中一个论坛上(它要大得多,因此使用这个较小的论坛进行测试)已经进行了测试,稍后会报告结果。
  • 在开发机器上完成的导入现已成功迁移/“恢复”到生产环境中,一切顺利 :+1:
  • (仍需在带有附件的更大论坛上测试此功能——完成后将更新此帖子)

现在发布此内容,因为我认为目前有一些人正在尝试将他们的 vB3 论坛导入到 Discourse。

5 个赞