SMF2 to discourse - Troubles importing attachments


(Marco) #1

Dear all,
for a few days I’ve been working on importing the content of an SMF2 forum into a brand-new install of Discourse.

I’ve configured my host following this guide How to use the bbpress import script - or any other import script with mysql dependency, with a few tweaks to adapt it to SMF2.

My forum is quite big: I have about 283000 posts and 18000 attachments.

My problem is the following:

  • If I perform the import from SMF without attachments (I do this by simply renaming the attachments directory), the script performs well and I get all my posts (with a ton of errors for missing attachments, as expected).
  • If I perform the import with the attachments, I get random crashes during the post import. The message I receive is the following:
3093 / 282702 (  1.1%)  [905 items/min]  /var/www/discourse/vendor/bundle/ruby/2.4.0/gems/rack-mini-profiler-1.0.0/lib/patches/db/mysql2.rb:8:in `each': Lost connection to MySQL server during query (Mysql2::Error)
        from /var/www/discourse/vendor/bundle/ruby/2.4.0/gems/rack-mini-profiler-1.0.0/lib/patches/db/mysql2.rb:8:in `each'
        from /var/www/discourse/script/import_scripts/base.rb:493:in `create_posts'
        from script/import_scripts/smf2.rb:193:in `import_posts'
        from script/import_scripts/smf2.rb:67:in `execute'
        from /var/www/discourse/script/import_scripts/base.rb:46:in `perform'
        from script/import_scripts/smf2.rb:26:in `run'
        from script/import_scripts/smf2.rb:623:in `<main>'

For completeness, I should add that I mounted the SMF2 attachments directory as a volume in the container. I suspect the problem might be related to the performance of the mounted volume: when the attachments directory is found and used in the import, the whole process slows down considerably.

My server is pretty beefy, so I don’t think this is about the available resources.
I’m a newbie to Ruby and Discourse, so I don’t have much to offer here beyond asking for some hints or help.

Thanks in advance!


(Jay Pfaffman) #2

I’ve had that trouble before, and fixed it with the code below. For some reason, it always fails on the first connection, but it’ll catch the lost connection and try again.

  # Run a query; on a MySQL error, wait, reconnect, and retry once.
  def mysql_query(sql)
    @client.query(sql, cache_rows: true)
  rescue Mysql2::Error => e
    # Rescue only MySQL errors, so Ctrl-C and similar still abort the script.
    puts '=' * 50
    puts "Problem with database: #{e.message}. Trying again in 5 seconds"
    sleep 5
    @client = Mysql2::Client.new(
      host: DB_HOST,
      username: DB_USER,
      password: DB_PW,
      database: DB_NAME
    )
    @client.query(sql, cache_rows: true)
  end



(Marco) #3

Thanks a lot @pfaffman.
If I understood your suggestion correctly, I have to adapt it to the smf2.rb import script.
Since there is no mysql_query(sql) method in that script, I have updated what I think is its most likely equivalent.

def __query(db, sql, **opts)
  db.query(sql.gsub('{prefix}', options.prefix),
    { symbolize_keys: true, cache_rows: false }.merge(opts))
rescue Exception => e
  puts '=' * 50
  puts "Problem with database. Trying again in 5"
  sleep 5
  puts e.message
  db = create_db_connection
  db.query(sql.gsub('{prefix}', options.prefix),
    { symbolize_keys: true, cache_rows: false }.merge(opts))
end

What do you think?

Later today I’ll give it a try and come back with feedback.


(Colin Marshall) #4

I had a fix for this issue in this thread:

https://meta.discourse.org/t/importer-for-simple-machines-2-forums/17656

For reasons unknown, the topic no longer exists. I only made that post in the last 1-2 months. Hopefully the Discourse team can chime in and explain what happened to it.


(Marco) #5

As promised, I’m back with feedback… negative :frowning_face:
It may well be that I made the change in the wrong place.
Reading the error message once again, it seems to me that the MySQL access happens inside the loop here discourse/base.rb at master · discourse/discourse · GitHub, so the error should somehow be handled in that file…
The problem is that when the connection is lost and the error is raised, the loop is interrupted, and I see no obvious way (but I’m a newbie here, as I said) to catch and rescue the error so that I can reconnect to MySQL and carry on from the post being processed when the disconnection happened…
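
One possible way to approach the resume problem described above, as a minimal sketch and not what base.rb or smf2.rb actually do: read the rows in fixed-size batches and keep track of the offset, so that a lost connection only costs a retry of the current batch. The names each_row_in_batches, fetch_batch and reconnect, and the LostConnection class, are hypothetical. In a real importer the rescue would presumably target Mysql2::Error, and fetch_batch would run a query with ORDER BY, LIMIT and OFFSET.

```ruby
# Stand-in for Mysql2::Error so the sketch runs without the mysql2 gem
# (hypothetical; a real importer would rescue Mysql2::Error instead).
class LostConnection < StandardError; end

# Yields every row produced by fetch_batch, which is called as
# fetch_batch.call(offset, batch_size) and must return an array of rows.
# When a batch fails with LostConnection, reconnect is called and the
# same batch is fetched again, up to max_retries times per batch.
def each_row_in_batches(fetch_batch, reconnect, batch_size: 1000, max_retries: 3)
  offset = 0
  loop do
    retries = 0
    begin
      rows = fetch_batch.call(offset, batch_size)
    rescue LostConnection => e
      retries += 1
      raise if retries > max_retries
      warn "Lost connection at offset #{offset} (#{e.message}), retry #{retries}"
      reconnect.call
      retry
    end
    break if rows.empty?
    rows.each { |row| yield row }
    offset += rows.size # the next batch starts after the rows already handled
  end
end
```

Because a batch is fully fetched before any of its rows is yielded, a failure never leaves a batch half processed: it is either retried as a whole or the error is re-raised once the retries are used up.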


(Colin Marshall) #6

@marcozambi I didn’t delete the VM I did this in so I was able to get the info for my fix.

The first thing I did was modify smf2.rb. I changed the create_db_connection method to the following:

def create_db_connection
  Mysql2::Client.new(
    host: options.host,
    username: options.username,
    password: options.password,
    database: options.database,
    read_timeout: 10000,
    write_timeout: 10000,
    connect_timeout: 10000,
    reconnect: true
  )
end

Next, I installed MariaDB in place of MySQL. I’m not sure this step was necessary; it was just one of the things I tried while troubleshooting. I’m including it because I think some of the variables in the next step are exclusive to MariaDB, and I’m not exactly sure which setting actually solved the problem. I just went through all the settings dealing with timeouts, number of connections, packet size, etc. and set them to their maximum values.

Here’s a guide for replacing MySQL with MariaDB:

You’ll have to import your SMF database into MariaDB after you get it set up.

Finally, add the following to the end of the /etc/mysql/conf.d/mariadb.cnf configuration file:

max_allowed_packet=1073741824
connect_timeout=5000
max_connections=10000

max_length_for_sort_data=8388608
max_sort_length=8388608

net_read_timeout=10000
net_write_timeout=10000
net_retry_count=1000
net_buffer_length=1073741824

I believe one of these settings in the last step is what resolved it.
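
Since it is unclear which setting mattered, and a server may adjust values that fall outside a variable's permitted range, it can help to compare what the server actually reports with what was configured. The following is a minimal sketch under those assumptions: setting_mismatches and EXPECTED are hypothetical names, and the rows are assumed to be shaped like the result of SHOW VARIABLES (columns Variable_name and Value).

```ruby
# A subset of the values from the configuration above.
EXPECTED = {
  "max_allowed_packet" => 1_073_741_824,
  "net_read_timeout"   => 10_000,
  "net_write_timeout"  => 10_000
}.freeze

# rows: array of hashes with "Variable_name" and "Value" keys.
# Returns a hash of name => [expected, actual] for every setting whose
# reported value differs from the expected one (actual is nil if missing).
def setting_mismatches(rows, expected = EXPECTED)
  actual = rows.map { |r| [r["Variable_name"], r["Value"]] }.to_h
  expected.each_with_object({}) do |(name, want), diff|
    got = actual[name]
    diff[name] = [want, got] unless got.to_s == want.to_s
  end
end

# With the mysql2 gem and a connected client, the rows could come from:
#   rows = client.query("SHOW VARIABLES").to_a
```

An empty result means every checked setting is in effect; anything else points at a value the server did not accept as written.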

I think this solution will work better than the other suggestion because it makes it so the connection is never lost, as opposed to retrying the connection when it is lost.


(Marco) #7

THIS. Many many thanks @cmwebdev !!

In my previous attempts I had also set up the connection like this:

    Mysql2::Client.new(host: options.host, username: options.username, 
                       password: options.password, database: options.database,
                       read_timeout: 3600, write_timeout: 3600, connect_timeout: 3600,
                       reconnect: true)

but what definitely did the trick was tuning the MySQL server configuration to maximise buffers and timeouts.
I’m noting down everything I’m doing, and in a few days I’ll post a comprehensive guide to SMF2 migration.


How to migrate from SMF2 to Discourse