Migrate an SMF2 forum to Discourse

We tried the migration and ran into these issues: all posts were imported, but topic titles won’t appear and external images won’t render. The current SMF2 forum is https://forum.mundofotografico.com.br and we are trying to migrate it to Discourse at https://discourse.fotografos.online. The topic titles and descriptions didn’t come through and images are not loading. Please help! @marcozambi @miligraf @FireAllianceNX @pfaffman

I’m just embarking on the SMF migration process and I’m currently importing posts into a test instance at about 1000/hour. All is well so far, apart from the MySQL performance script: MySQL didn’t like the ‘ALTER USER’ command for some reason. I manually did a ‘CREATE USER’ and all was well after that.

I read the comment about deleted users, but I can’t easily create new users/fake emails to cover all my deleted users (my forum has been running for over 20 years and I’ve probably got more deleted users than real users now). I suspect I’ve got 4-5000 deleted users. Not all of them will have posted, but a lot will have, so I probably have many hundreds of ‘missing’ users.

The posts are being imported as belonging to ‘system’ which isn’t really ideal. I did wonder whether the following would work.

  1. Before importing, create a dummy user, e.g. ‘Deleted User’.
  2. Find out the user number for ‘Deleted User’.
  3. Modify the line "user_id: user_id_from_imported_user_id(message[:id_member]) || -1," in smf2.rb and replace the ‘-1’ with the user number of ‘Deleted User’ (I think the system user is -1?).

Would that work? Also, are there other places in smf2.rb where I’d need to make a similar change?

Hi there, by “deleted” do you mean that they’re actually deleted from the SMF database, or are they still in the database with their username and email and marked as suspended? How do posts by those “deleted” users currently display in SMF?

I’m in the middle of a huge migration from Drupal to Discourse, also from a long-established forum with tons of suspended users. I definitely wanted to maintain those same suspended usernames and their associated email addresses in Discourse, so I had to add that function to the Drupal importer script for Discourse. Basically the script imports all the users as normal active users, and if they had any publicly visible posts those will also get imported just like on the original forum. Then at the very tail end of the process I added a function that I lifted from another importer script to go through the Drupal database and if the user was marked as suspended to also suspend the Discourse account. You can see the code for that in my post history here.

Hi. The users are actually deleted, i.e. there’s no longer a record for them in the smf_members table. SMF doesn’t have a function to suspend users. You can ban users, but that doesn’t seem right to do for an account where the user has died or lost interest in the hobby/forum. It’s not really right from a data protection perspective either.

SMF posts have two fields stored for each record…the user member number, which is set to zero for deleted users, and the poster name, which contains the username of the poster. So you can see which user posted the message but there are no longer any details (email, full name, etc) available for the user. Their posts have a ‘Guest’ marker when displayed.

I guess I could create a new user account for every user who has posted a message that has a member ID of zero and assign a dummy email address for the account, then mark the user as suspended afterwards. I could mark accounts as suspended based on the format of the dummy email account if I used something unique but identifiable. That feels a bit weird in some cases though…creating accounts for people who I know died 10-15 years ago!

I have time to think about this though…the migration partially worked but I now have to figure out why attachments weren’t attached, in-forum links weren’t modified and why passwords for migrated users don’t work. There may be other problems too, but I’ll work at fixing those problems first and then see what else crops up.

You mean Postgres? I’m not sure what this is about.

What I would do is this: if the user id is 0, use the username for the ID. Then if find_username_by_import_id fails to find the user, create the user, setting the email address to fake_email (it’s a function in base.rb that generates a fake email address) and the username to the username that you have. Then, if you’re ambitious, at the end of the script you could suspend all users that have @email.invalid in their email address. They won’t be active, so I don’t think it would matter much if you didn’t suspend them.
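Here is a rough sketch of that idea in plain Ruby, outside of Discourse, just to show the shape of the logic. Everything in it is an assumption for illustration: the users hash stands in for the importer’s own lookup, poster_for and suspend_placeholders are made-up names, and the fake_email here is only a stand-in for the helper in base.rb, which may be implemented differently.

```ruby
require "securerandom"

# Stand-in for the fake_email helper mentioned above; the version in base.rb
# may be implemented differently.
def fake_email
  "#{SecureRandom.hex}@email.invalid"
end

# users: Hash of import ID => user record (a plain Hash standing in for the
# importer's own lookup). Returns the record that should own the message.
def poster_for(message, users)
  member_id = message[:id_member]
  return users[member_id] unless member_id.zero?

  # Deleted user: SMF keeps only the poster name, so use it as the import ID.
  name = message[:poster_name]
  users[name] ||= { username: name, email: fake_email, suspended: false }
end

# Optional last step: suspend every placeholder account by its email domain.
def suspend_placeholders(users)
  users.each_value do |user|
    user[:suspended] = true if user[:email].end_with?("@email.invalid")
  end
end

users = { 12 => { username: "alice", email: "alice@example.com", suspended: false } }
poster_for({ id_member: 0, poster_name: "OldTimer" }, users)
poster_for({ id_member: 0, poster_name: "OldTimer" }, users) # reuses the same record
suspend_placeholders(users)
puts users.keys.inspect
```

Keying deleted users by name means a second post by the same name reuses the first placeholder account instead of creating a duplicate.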

Another way would be to do a query that somehow generated a list of all of the deleted users and then created them before you started doing posts, but that seems harder.

If you want to create a ‘Deleted User’ account and have all of those posts owned by that user instead of system, you could do that and just replace the -1 with the user number of ‘Deleted User’. You could create it as a regular user, or do something fancy and give it a user id of -2 or something like that.
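As a minimal sketch of that fallback, assuming a plain hash in place of the importer’s real lookup and a made-up ID of 250 for the ‘Deleted User’ account (post_owner_id is a hypothetical name, not something from smf2.rb):

```ruby
# imported_users: Hash of SMF member ID => Discourse user ID, standing in for
# what user_id_from_imported_user_id would look up in the real importer.
# fallback_id: ID of the placeholder account that replaces the -1 (system).
def post_owner_id(message, imported_users, fallback_id)
  imported_users[message[:id_member]] || fallback_id
end

imported_users = { 12 => 101, 57 => 102 } # hypothetical mapping
deleted_user_id = 250                     # hypothetical 'Deleted User' ID

puts post_owner_id({ id_member: 12 }, imported_users, deleted_user_id)
puts post_owner_id({ id_member: 0 }, imported_users, deleted_user_id)
```

The trade-off is that every deleted poster ends up under one account, so the original poster names are no longer distinguishable.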

In some systems this happens because attachments are sometimes referenced in the body of the post and other times exist only as attachment records in the database.

Did you install the Migrated password hashes support plugin after you ran the import? (It can interfere with running imports in at least some circumstances.) Does SMF2 hash passwords the same way that SMF does?

Sorry, wrong name for the script. It’s the MySQL script referred to in the first post:

-- file: ~/smf2/script_for_mysql_tuning.sql
ALTER USER 'user'@'%' IDENTIFIED WITH mysql_native_password BY 'pass';

Thanks for the suggestions about users and particularly fake_email. My first task is to learn enough Ruby to be able to make changes to the import script!

SMF2 attachments are records in the database. Having dug a bit deeper it looks like some have been imported, but only a few hundred out of tens of thousands. I’ll keep digging to see if I can figure out why.

Ahhh, that’s probably what I’m missing! I’m pretty sure that SMF2 uses the same hashing (salted MD5 IIRC) as SMF1 so the plugin will hopefully fix the problem. I need to do more import runs before I worry too much about users logging in.

One other question comes to mind. Is there a way to reset the system to allow me to do another import? I should have taken a backup before I started but forgot :anguished:

Oh. You mean just getting MySQL set up. I see.

If you know some other languages, you can probably just muck along.
I wrote several importers before I did anything like learning Ruby. :slight_smile:

Here is one way to drop and create a new Discourse database.

sv stop unicorn;DISABLE_DATABASE_ENVIRONMENT_CHECK=1 IMPORT=1 rake db:drop db:create db:migrate; sv start unicorn

If you remember to make a backup, restoring it can be a bit faster. Maybe.

Another trick, once you have the users figured out, is to stop the script after users are imported and make a backup then. That will let you debug the post import without having to import all the users again.

I know a few. I wrote my first program in 1976 in binary machine code on an Intel 4004. I’m starting to make sense of smf2.rb with a bit of DuckDuckGoing to understand some of the code structures that are new to me.

Thanks for the database drop/create method. Time to start over and see if I can make some incremental changes to the importer for my data.

I’ve managed to mod the importer to create dummy accounts with a fake email address for deleted users and the dummy accounts own their correct posts so that’s a good start.

I’m trying to understand attachments next because I don’t see any on any of the posts I’ve imported so far (and there should be some).

If I create a message normally via the Discourse web page I get a record in the posts table (id=4346), two records in the uploads table (ids=403 and 404), four records in upload_references (403/Draft/4, 403/Post/4346, 404/Draft/4, 404/Post/4346). I also see 403 in the image_upload_id field for post 4346 and HTML referring to the two uploads in the posts/cooked field.

For imported posts, I get a post table record for each imported SMF message and a record in the uploads table for each attachment associated with an imported SMF message. The uploads table records refer to disk files that contain the correct images, so that part is working OK. However, I don’t get any upload_references records for the uploaded images or any of the upload ids in the image_upload_id field in the posts table.
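To make that gap easier to spot, here is a small sketch in plain Ruby that lists uploads without a ‘Post’ reference. The rows are plain hashes standing in for table records, the field names (upload_id, target_type, target_id) are assumed from the values quoted above, and unreferenced_upload_ids is a made-up helper, not part of the importer.

```ruby
# uploads:    rows from the uploads table, reduced to their id
# references: rows from upload_references (field names are an assumption)
# Returns the IDs of uploads that no post refers to.
def unreferenced_upload_ids(uploads, references)
  referenced = references
    .select { |ref| ref[:target_type] == "Post" }
    .map { |ref| ref[:upload_id] }
  uploads.map { |upload| upload[:id] }.reject { |id| referenced.include?(id) }
end

uploads = [{ id: 403 }, { id: 404 }, { id: 405 }]
references = [
  { upload_id: 403, target_type: "Post", target_id: 4346 },
  { upload_id: 404, target_type: "Draft", target_id: 4 }
]
puts unreferenced_upload_ids(uploads, references).inspect
```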

I assume that I need to get the upload_references records created and the posts.image_upload_id and cooked fields populated, but I wanted to check first that there isn’t some other way of associating uploads with posts that’s being used (or attempting to be used) by the importer?

It sounds like you need to add a reference to the upload into the post’s raw. There is some function that will generate a link for you. I can’t remember what it’s called. I think it’s in the uploads model, but it might be easier to find in some other import script if you don’t know what a model is.
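Roughly, the idea is that the raw text of each imported post ends with markup pointing at its uploads, so that Discourse can link them when it cooks the post. The sketch below is plain Ruby and only illustrates that shape. The helper names and the example URL are made up, and in a real importer the link text should come from the existing helper rather than being built by hand like this.

```ruby
# Builds Markdown for one upload: an image embed for common image types,
# a plain link for everything else. upload is a Hash with :filename and :url.
def upload_markup(upload)
  image = %w[.jpg .jpeg .png .gif].include?(File.extname(upload[:filename]).downcase)
  link = "[#{upload[:filename]}](#{upload[:url]})"
  image ? "!#{link}" : link
end

# Appends one line of markup per upload to the post's raw text.
def raw_with_uploads(raw, uploads)
  return raw if uploads.empty?

  ([raw] + uploads.map { |upload| upload_markup(upload) }).join("\n\n")
end

# Hypothetical upload record; the URL format is an assumption.
uploads = [{ filename: "sunset.JPG", url: "/uploads/default/original/1X/abc123.jpg" }]
puts raw_with_uploads("Here is the photo I mentioned.", uploads)
```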

I was making progress tweaking the import script to suit the vagaries of my forum, but came to a screeching halt a couple of days ago. After the latest update to Discourse beta I can no longer build the import container. I get…

> FAILED
> --------------------
> Pups::ExecError: cd /var/www/discourse && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libmariadb-dev failed with return #<Process::Status: pid 439 exit 100>
> Location of failure: /usr/local/lib/ruby/gems/3.1.0/gems/pups-1.1.1/lib/pups/exec_command.rb:117:in `spawn'
> exec failed with the params {"cd"=>"$home", "cmd"=>["echo \"gem 'mysql2'\" >> Gemfile", "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libmariadb-dev", "su discourse -c 'bundle config unset deployment'", "su discourse -c 'bundle install --no-deployment --path vendor/bundle --jobs 4 --without test development'"]}
> bootstrap failed with exit code 100
> ** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.

I’ve seen the posts about the yarn key expiry and have fixed that. That was stopping the libmariadb-dev package being installed but I did a manual install of the package which has worked correctly. The rebuild of import still doesn’t work with the mysql import template enabled even after manually installing the MariaDB package.

I’ve built a new server and started with a fresh install of Discourse just to avoid any potential issues with the previous server/install. The new server gives the same error as the old one though.

I’ve no ideas on what to try next so I’ll welcome any suggestions!

See Apt-get update fails inside container yarn repo not signed - #5 by pfaffman. You’ll need to edit the mysql template.

Thank you for pointing me in the right direction. I see what I did wrong now!

We’re starting with the test migration of an SMF 2 (v2.0.15 to be precise) forum and one of the first issues that surfaced is a problem with categories that have an ampersand in their name.

Thread titles containing an ampersand seem to be fine.

So far, it seems that the ampersand is the only problematic character; German umlauts, for example, are fine.
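One possible cause, and this is an assumption rather than something confirmed in this thread, is that the category names are stored HTML-escaped in the SMF database and come through with a literal ‘&amp;amp;’ in them. If that is what the names look like, decoding the entities before the category is created might be enough. A minimal sketch using Ruby’s standard library (clean_category_name is a made-up helper):

```ruby
require "cgi"

# Decodes HTML entities such as &amp; in a category name.
# Assumes the name was stored HTML-escaped in the source database.
def clean_category_name(name)
  CGI.unescapeHTML(name)
end

puts clean_category_name("Rules &amp; Errata")
```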

We’re probably also hit by issues reported here in the thread (deleted users, attachments, links within the forum), but the import is still running, so I’ll update once it has completed.

In this regard, I wonder if the import speed is really supposed to be that slow. We are currently importing at 1750 items/min (initially it was a bit closer to 2000 items/min) on an AMD Ryzen 5 3600 machine with 64GB RAM (Hetzner, Ubuntu 22.04), which puts the whole migration at roughly 3 days.
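For reference, the arithmetic behind that estimate can be sketched as below. The total item count is not stated above; it is only what the quoted rate and duration imply (1750 items/min for 3 days is about 7.5 million items). Both helper names are made up.

```ruby
MINUTES_PER_DAY = 60 * 24

# Days needed to import total_items at a steady rate.
def import_eta_days(total_items, items_per_minute)
  total_items.to_f / items_per_minute / MINUTES_PER_DAY
end

# Items processed in the given number of days at a steady rate.
def items_in_days(days, items_per_minute)
  days * MINUTES_PER_DAY * items_per_minute
end

puts items_in_days(3, 1750)
puts import_eta_days(items_in_days(3, 1750), 2000).round(2)
```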

That’s a pretty good speed.

I imagine whatever issues were reported before still exist, though a surprising number of things are unique to each forum. If you’d like some stuff fixed and have a budget, I’d be happy to help.

Thanks! I’ll get back to you once things become more concrete.

Right now we (a small non-profit organisation centered around tabletop roleplaying games and related hobbies) are in the evaluation phase. There’s consensus that we want new software, and Discourse is the favourite option so far. But we’ll have to collect all the to-dos (migration, new theme, new plugins/theme components if needed) and see if we can fit everything into our budget.
