Migrate a phpBB3 forum to Discourse

Great! Glad you got it sorted. What was the trick?

Hopefully you won’t need this: Discourse Migration - Literate Computing

I had to keep running import_phpbb3.sh until all the categories were imported. Script ran overnight, and a network interruption caused some error on my end. I re-started the script, and I think it is working now. If there are more issues, I’m going to have to go in and clean the MySQL table.

Hi everyone! The forum is mostly up and running… looks fantastic. I’m writing a script to modify the [/quote] tag. The board is fairly old, dating back to 2001, and just adding a blank line above and below the closing tag fixes a lot. I’m sure there was a setting that I could have updated in the import script, but it’s my first time migrating, so I’m learning as I go.

Question: I’m fixing 10 years of data, and have a live phpBB board running. It’s taking a bit of time. Can I use the import_phpbb3.sh script to import just the last X days of posts from my forum? I think it’s just a merge as far as the script goes. I can export the last 7 days of data from MySQL… but I don’t know if that would work. Thoughts?

I haven’t verified that the script is working… I tested it on small parts, but not in batches. The issue I had was the missing space above and below the closing tag. Now I’m going to go mow the lawn and come back and check later:

batch_size = 1000
total_processed = 0

# Process posts across the entire site in batches
Post.find_in_batches(batch_size: batch_size) do |batch|
  updated_posts = []

  batch.each do |post|
    original_raw = post.raw
    # Apply the correction
    new_raw = original_raw.gsub(/\n\\n\[\/quote\]\\n\n\n/, "\n\n[/quote]\n\n")

    if original_raw != new_raw
      post.update_column(:raw, new_raw)  # Direct column update to skip callbacks
      updated_posts << post
      total_processed += 1
    end
  end

  # Rebake only the updated posts to minimize load
  updated_posts.each(&:rebake!)

  puts "Processed a batch of #{batch.size}. Total processed so far: #{total_processed}."
end

puts "Total #{total_processed} posts processed across the entire site."

This worked on individual posts:

post = Post.find(344572) # Replace 344572 with the correct ID

post.raw = post.raw.gsub(/\n\n\[\/quote\]\n\n\n/, "\n\n[/quote]\n\n")
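
A pattern like this can be tried on a plain string before it touches real posts. The sketch below is not the exact pattern above: it assumes the goal is simply one blank line on each side of every closing tag, and the sample text and the normalize_quote_close name are made up for illustration.

```ruby
# Hypothetical helper (not part of Discourse or the import script).
# Collapses whatever whitespace surrounds a closing quote tag into
# exactly one blank line above and below it.
def normalize_quote_close(raw)
  raw.gsub(/\s*\[\/quote\]\s*/, "\n\n[/quote]\n\n")
end

sample = "[quote=\"bob\"]hello[/quote]more text"
once   = normalize_quote_close(sample)
twice  = normalize_quote_close(once)

puts once
puts "stable on a second pass: #{once == twice}"
```

Because the surrounding whitespace is replaced rather than added to, a second run leaves the text unchanged, which helps if a cleanup has to be restarted.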

I’d modify the script to do this when it imports the data. I’m surprised that it doesn’t already. It’s worth taking a closer look at things.

On several scripts that I’ve worked on, I’ve added an IMPORT_AFTER ENV setting and modified the queries to include where some_timestamp > import_after_data. I don’t think this one has such an option, but I haven’t paid careful attention.

But beware: data from 10 years ago is likely to differ from the last 2 years. Testing on just recent data is great for debugging stuff that you know is everywhere, but you’ll want to test on the whole database too.
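
Roughly, the IMPORT_AFTER idea could look like the sketch below. Everything in it is an assumption for illustration: the stock phpBB3 importer has no IMPORT_AFTER setting as far as this thread knows, import_after_clause is a made-up helper, and the p.post_time column is assumed to hold Unix seconds.

```ruby
require "time"

# Hypothetical helper: neither IMPORT_AFTER nor this method exists in the
# stock phpBB3 importer. It turns an ISO 8601 timestamp into an extra SQL
# condition, assuming the column stores Unix seconds.
def import_after_clause(column, import_after)
  return "" if import_after.nil? || import_after.strip.empty?

  cutoff = Time.iso8601(import_after.strip).to_i
  "AND #{column} > #{cutoff}"
end

# Reads the cutoff from the environment, e.g. IMPORT_AFTER=2024-01-01T00:00:00Z
puts import_after_clause("p.post_time", ENV["IMPORT_AFTER"])
# The same helper with an explicit value
puts import_after_clause("p.post_time", "2024-01-01T00:00:00Z")
```

The returned fragment would be appended to the importer's existing WHERE clause; when the variable is unset the helper returns an empty string, so a full import behaves as before.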

There’s a bunch of different stuff in the mix. Import is about 99% complete… I just need to go back and suck in last weeks of posts :slightly_smiling_face: Once the line breaks are added and this is fixed <LINK_TEXT text= all should be good :slight_smile:

Here is what I did to clean up a large chunk of my posts after I imported. I had AI provide an explanation for someone coming over from phpBB. I was up till 5 am trying to get this to work :rofl::rofl::rofl:

To run a Ruby script within your Discourse environment that processes forum topics in batches of 1000, and applies specific transformations to each post within those topics, you will follow a sequence of steps to access your server, enter the appropriate environment, and execute the script. Here’s a detailed guide including the script itself:

Step 1: Securely Connect to Your Server

Using a Secure Shell (SSH) client like PuTTY for Windows, connect to your server where the Discourse forum is hosted. You’ll need the IP address or domain name of your server, as well as your credentials (username and password or an SSH key).

Step 2: Access the Discourse Docker Container

Once logged into your server, navigate to the Discourse installation directory, typically /var/discourse. Then, enter the Docker container that runs Discourse using the following commands:

cd /var/discourse
./launcher enter app

Step 3: Open the Rails Console

Inside the Docker container, you can interact with your Discourse application through the Rails console. This is a Ruby on Rails environment that allows you to run Ruby code directly against your Discourse database and application logic. Start the console with:

rails c

Step 4: Execute the Ruby Script

With the Rails console open, you’re ready to run the Ruby script. The script should be prepared in advance and copied to your clipboard. In PuTTY, you can paste the script by right-clicking or pressing Shift + Insert.

Here’s the complete script you’ll be using:

# Retrieve an array of all topic IDs
topic_ids = Topic.pluck(:id)

# Define batch size
batch_size = 1000
current_batch_start = 0

while current_batch_start < topic_ids.length
  # Process a batch of 1000 topics at a time
  topic_ids[current_batch_start, batch_size].each do |topic_id|
    # Fetch the topic by ID; find_by returns nil (instead of raising) if it no longer exists
    topic = Topic.find_by(id: topic_id)
  
    # Skip if the topic is nil
    next if topic.nil?
  
    # Initialize a count of transformed posts for this topic
    transformed_count = 0

    # Iterate over each post within the topic
    topic.posts.each do |post|
      # Flag to track if transformations have been made
      transformed = false

      # Apply transformations
      transformed |= post.raw.gsub!(/<\/?r>/, '').present?
      transformed |= post.raw.gsub!(/<\/?s>/, '').present?
      transformed |= post.raw.gsub!(/<\/?e>/, '').present?
      transformed |= post.raw.gsub!(/<\/?QUOTE[^>]*>/, '').present?
      transformed |= post.raw.gsub!(/\[quote=""([^"]+)""\]/, '[quote="\1"]').present?
      transformed |= post.raw.gsub!(/\\n/, "\n").present?
      transformed |= post.raw.gsub!(/\[quote=([^\s]+)\s+post_id=\d+\s+time=\d+\s+user_id=\d+\]/, '[quote="\1"]').present?
      transformed |= post.raw.gsub!(/<URL url="([^"]+)">.*?<LINK_TEXT text="[^"]+">[^<]+<\/LINK_TEXT>.*?<\/URL>/, '\1').present?
      transformed |= post.raw.gsub!(/\[\/quote\]/, "\n[/quote]\n").present?
      transformed |= post.raw.gsub!(/\A\n/, '').present?

      # Save and rebake the post if any transformations have occurred
      if transformed
        post.save!
        post.rebake!
        transformed_count += 1
      end
    end

    # Output the result for the current topic
    if transformed_count > 0
      puts "Transformed #{transformed_count} posts in topic #{topic_id}."
    else
      puts "No transformations were necessary for topic #{topic_id}."
    end
  end

  # Update the starting index for the next batch
  current_batch_start += batch_size

  # Check if there are more topics to process
  if current_batch_start < topic_ids.length
    puts "Completed a batch of #{batch_size} topics. Do you want to continue to the next batch? (yes/no)"
    response = gets.strip.downcase
    break unless response == 'yes'
  end
end
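
One thing to watch in a script like this: gsub! returns nil when nothing matched, and .present? is false for a post that ends up blank, so the changed/unchanged flag depends on both. A minimal sketch of an alternative, assuming it is acceptable to build a cleaned copy and compare it with the original. It uses only three of the substitutions above, runs on plain strings outside Rails, and RULES and clean_raw are made-up names.

```ruby
# A few of the substitutions from the script above, as [pattern, replacement] pairs.
RULES = [
  # <r>, <s> and <e> wrapper tags
  [/<\/?[rse]>/, ""],
  # phpBB quote tags carrying post_id/time/user_id attributes
  [/\[quote=([^\s]+)\s+post_id=\d+\s+time=\d+\s+user_id=\d+\]/, '[quote="\1"]'],
  # literal backslash-n sequences left in the text
  [/\\n/, "\n"]
].freeze

# Hypothetical helper: returns a cleaned copy and leaves the input untouched.
def clean_raw(raw)
  RULES.reduce(raw) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
end

sample  = '<r>[quote=bob post_id=12 time=1700000000 user_id=3]hi[/quote]\nthanks</r>'
cleaned = clean_raw(sample)

puts cleaned
puts "changed: #{cleaned != sample}"
```

In the console version, a post would only be saved and rebaked when the cleaned text differs from post.raw.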

Understanding the Script and Batch Processing

  • Batch Processing: This approach allows you to handle large sets of data in smaller, manageable chunks. It’s particularly useful for reducing load on the server and for operations that may take a long time to complete if done all at once. Here, it’s applied to process Discourse topics in batches of 1000.

This is what it should look like as it’s running.

No transformations were necessary for topic 19556.
No transformations were necessary for topic 35766.
No transformations were necessary for topic 35783.
No transformations were necessary for topic 35778.
No transformations were necessary for topic 35774.
No transformations were necessary for topic 35770.
Transformed 292 posts in topic 20234.
No transformations were necessary for topic 35781.
No transformations were necessary for topic 35779.
Transformed 242 posts in topic 20218.
Transformed 22 posts in topic 19522.
No transformations were necessary for topic 35771.
No transformations were necessary for topic 35767.
Transformed 2 posts in topic 22560.
No transformations were necessary for topic 35797.
No transformations were necessary for topic 35789.
No transformations were necessary for topic 35785.
No transformations were necessary for topic 31889.
Transformed 1 posts in topic 31831.
No transformations were necessary for topic 31792.
No transformations were necessary for topic 35794.
No transformations were necessary for topic 35815.
  • Script Functionality: The script iterates over each topic ID retrieved from your Discourse database, applying specified transformations to each post within those

I’m getting this error (post-migration, on a functioning standard install) when trying to change a user’s display name (not username) to include special characters. I get an internal server error popup when I try and the logs show the same error @DDo had.

Notably, other users are able to change their display name to include the same character (™). The relevant difference seems to be that users who have logged in post migration can have UTF-8 characters, but users that haven’t logged in can only have ASCII-8BIT.

I also assume this error would be resolved by removing discourse-migratepassword, but haven’t tested it.

Is this a bug, or something inherent to making that plugin work? If the former, is it best to create an issue on Github to report it?

Hm trying to (re)build an import container but it fails:

FAILED
--------------------
Errno::ENOENT: No such file or directory @ rb_sysopen - /etc/service/unicorn/run
Location of failure: /usr/local/lib/ruby/gems/3.2.0/gems/pups-1.2.1/lib/pups/replace_command.rb:11:in `read'
replace failed with the params {"tag"=>"precompile", "filename"=>"/etc/service/unicorn/run", "from"=>"PRECOMPILE_ON_BOOT=1", "to"=>"PRECOMPILE_ON_BOOT=0"}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

I already disabled all plugins, but no change.

Does anybody have any idea?

So that’s the first idea.

Yeah I did not find anything which shows/points me to an error… I will look again…

hooks:
  after_web_config:
    - exec:
        cd: /etc/service
        cmd:
        # - rm -R unicorn
          - rm -R nginx
          - rm -R cron

I commented out the line - rm -R unicorn in templates/import/phpbb3.template.yml and the build went through without the error.

What happened here? phpbb3.template.yml is the version from Github, two years old. So there must be a change somewhere else?!?

That might be from before they switched from Ubuntu to Debian. Those are likely to be updated only when someone notices that they don’t work anymore.

It doesn’t make much sense to me that that rm was a problem, but I don’t pay much attention to those if someone’s not paying me. And even then I don’t remember paying much attention to that one. :slight_smile:

My thinking was: if the builder complains about a missing /etc/service/unicorn/run and exactly that is being removed, I should try commenting out the removal command. :wink: Worked.

Maybe somebody with more knowledge wants to verify this and update the script on GitHub. I could also make a PR, but without knowing more about how it all fits together, I’d rather not.

But changing from Ubuntu to Debian changes a lot, true.

First off, I’m a docker neophyte, so there’s a good chance that I’ve just screwed something up.

I have a clean install of Discourse on a DigitalOcean Droplet running Ubuntu 22.04 using their prebuilt app. The forum built just fine and runs in the standard configuration.

When I execute /var/discourse/launcher rebuild import, I get the following at the end of the build:

Errno::ENOENT: No such file or directory @ rb_sysopen - /etc/service/unicorn/run
Location of failure: /usr/local/lib/ruby/gems/3.3.0/gems/pups-1.2.1/lib/pups/replace_command.rb:11:in `read'
replace failed with the params {"tag"=>"precompile", "filename"=>"/etc/service/unicorn/run", "from"=>"PRECOMPILE_ON_BOOT=1", "to"=>"PRECOMPILE_ON_BOOT=0"}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

When I execute: /var/discourse/launcher enter import
I get:

x86_64 arch detected.
Error response from daemon: No such container: import

Is this because of the errors at the top of this post (and if so, how do I fix it), or what am I doing wrong?

Did you create import.yml and bootstrap that container as (I’m pretty sure) the instructions say?

The instructions show copying app.yml to import.yml and adding “templates/import/phpbb3.template.yml” to import.yml (which I did). Then, you rebuild import, which generates the error in my OP. I fail to see where there are any instructions on creating a bootstrap(?).

The instructions are pretty simple, which is why I’m confused about what’s going wrong.

# docker ps -a
CONTAINER ID   IMAGE                 COMMAND        CREATED        STATUS        PORTS                                                                      NAMES
81a2f335fd01   local_discourse/app   "/sbin/boot"   14 hours ago   Up 11 hours   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   app

Sorry. Rebuild does a bootstrap. If it had finished, the import container would be running.

Oh. I’m very sorry. I didn’t catch what was happening before. It looks like the phpbb3 template might be incompatible with recent changes to discourse_docker. But that’s all I can tell on my phone.

I think if you delete a line in the phpbb3 template that deletes “/etc/service/unicorn/run” that might make the build complete.

Jay, thank you for the answer to this problem. The build now completes properly.

Next problem: When I execute: import_phpbb3.sh I get:

The phpBB3 import is starting...

/usr/local/lib/ruby/3.3.0/psych/parser.rb:62:in `_native_parse': (<unknown>): did not find expected key while parsing a block mapping at line 3 column 1 (Psych::SyntaxError)
        from /usr/local/lib/ruby/3.3.0/psych/parser.rb:62:in `parse'
        from /usr/local/lib/ruby/3.3.0/psych.rb:455:in `parse_stream'
        from /usr/local/lib/ruby/3.3.0/psych.rb:399:in `parse'
        from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:129:in `strict_load'
        from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:186:in `input_to_storage'
        from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:232:in `fetch'
        from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:232:in `load_file'
        from /var/www/discourse/script/import_scripts/phpbb3/support/settings.rb:10:in `load'
        from script/import_scripts/phpbb3.rb:20:in `<module:PhpBB3>'
        from script/import_scripts/phpbb3.rb:16:in `<module:ImportScripts>'
        from script/import_scripts/phpbb3.rb:15:in `<main>'

I’m guessing that it doesn’t like something in my settings.yml file. How do I tell what it’s choking on?

database:
  type: MySQL # currently only MySQL is supported
  host: localhost
  port: 3306
  username: 
  password: 
  schema: phpbb
  table_prefix: phpbb_ # Change this, if your forum is using a different prefix. Usually all table names start with phpbb_
  batch_size: 1000 # Don't change this unless you know what you're doing. The default (1000) should work just fine.

import:
  # Set this if you import multiple phpBB forums into a single Discourse forum.
  #
  # For example, when importing multiple sites, prefix all imported IDs
  # with 'first' to avoid conflicts. Subsequent import runs must have a
  # different 'site_name'.
  #
  # site_name: first
  #
  site_name: Freedom Owners Forum

  # Create new categories
  #
  # For example, to create a parent category and a subcategory.
  #
  # new_categories:
  # - forum_id: foo
  #   name: Foo Category
  # - forum_id: bar
  #   name: Bar Category
  #   parent_id: foo
  #
  new_categories: 
 - forum_id: general
   name: General
 - forum_id: systems
   name: Boat Systems
 - forum_id: photos
   name: Photos
 - forum_id: docs
   name: Manuals and Documentation
 - forum_id: buy
   name: Buy/Sell/Trade
 - forum_id: site
   name: Site Usage
 - forum_id: archives
   name: Archives

  # Category mappings
  #
  # * "source_category_id" is the forum ID in phpBB3
  # * "target_category_id" is either a forum ID from phpBB3 or a "forum_id"
  #   from the "new_categories" setting (see above)
  # * "discourse_category_id" is a category ID from Discourse
  # * "skip" allows you to ignore a category during import
  #
  # Use "target_category_id" if you want to merge categories and use
  # "discourse_category_id" if you want to import a forum into an existing
  # category in Discourse.
  #
  #  category_mappings:
  #    - source_category_id: 1
  #      target_category_id: foo
  #    - source_category_id: 2
  #      discourse_category_id: 42
  #    - source_category_id: 6
  #      skip: true
  #
  category_mappings: 
  - source_category_id: 8
      target_category_id: systems
  - source_category_id: 7
      target_category_id: systems
  - source_category_id: 9
      target_category_id: systems
  - source_category_id: 10
      target_category_id: buy
  - source_category_id: 11
      target_category_id: general
  - source_category_id: 12
      target_category_id: general
  - source_category_id: 13
      target_category_id: general
  - source_category_id: 14
      target_category_id: general
  - source_category_id: 16
      target_category_id: docs
  - source_category_id: 17
      target_category_id: docs
  - source_category_id: 18
      target_category_id: general
  - source_category_id: 19
      target_category_id: general
  - source_category_id: 20
      target_category_id: general
  - source_category_id: 21
      target_category_id: docs
  - source_category_id: 22
      target_category_id: general
  - source_category_id: 23
      target_category_id: site
  - source_category_id: 24
      target_category_id: general
  - source_category_id: 25
      target_category_id: site
  - source_category_id: 42
      target_category_id: systems
  - source_category_id: 43
      target_category_id: docs
  - source_category_id: 44
      target_category_id: general
  - source_category_id: 45
      target_category_id: general
  - source_category_id: 46
      target_category_id: site
  - source_category_id: 48
      target_category_id: general
  - source_category_id: 56
      target_category_id: general
  - source_category_id: 58
      target_category_id: systems
  - source_category_id: 59
      skip: true
  - source_category_id: 60
      target_category_id: archives
  - source_category_id: 61
      target_category_id: archives
  - source_category_id: 62
      target_category_id: archives
  - source_category_id: 63
      target_category_id: archives
  - source_category_id: 64
      target_category_id: general
  - source_category_id: 65
      target_category_id: site

  # Tag mappings
  #
  # For example, imported topics from phpBB category 1 will be tagged
  # with 'first-category', etc.
  #
  # tag_mappings:
  #   1:
  #   - first-category
  #   2:
  #   - second-category
  #   3:
  #   - third-category
  #
  tag_mappings: {}

  # Rank to trust level mapping
  #
  # Map phpBB 3.x rank levels to trust level
  # Users with rank at least 3000 will have TL3, etc.
  #
   rank_mapping:
     trust_level_1: 200
     trust_level_2: 1000
     trust_level_3: 3000
  
#  rank_mapping: {}

  # WARNING: Do not activate this option unless you know what you are doing.
  # It will probably break the BBCode to Markdown conversion and slows down your import.
  use_bbcode_to_md: false

  # This is the path to the root directory of your current phpBB installation (or a copy of it).
  # The importer expects to find the /files and /images directories within the base directory.
  # You need to change this to something like /var/www/phpbb if you are not using the Docker based importer.
  # This is only needed if you want to import avatars, attachments or custom smilies.
  phpbb_base_dir: /shared/import/data

  site_prefix:
    # this is needed for rewriting internal links in posts
    original: freedomyachts.org    # without http(s)://
    new: https://test.freedomyachts.org       # with http:// or https://

  # Enable this, if you want to redirect old forum links to the new locations.
  permalinks:
    categories: true  # redirects   /viewforum.php?f=1            to  /c/category-name
    topics: true      # redirects   /viewtopic.php?f=6&t=43       to  /t/topic-name/81
    posts: false      # redirects   /viewtopic.php?p=2455#p2455   to  /t/topic-name/81/4
    # Append a prefix to each type of link, e.g. 'forum' to redirect /forum/viewtopic.php?f=6&t=43 to /t/topic-name/81
    # Leave it empty if your forum wasn't installed in a subfolder.
    prefix:

  avatars:
    uploaded: true  # import uploaded avatars
    gallery: true   # import the predefined avatars phpBB offers
    remote: false   # WARNING: This can considerably slow down your import. It will try to download remote avatars.

  # When true: Anonymous users are imported as suspended users. They can't login and have no email address.
  # When false: The system user will be used for all anonymous users.
  anonymous_users: true

  # Enable this, if you want import password hashes in order to use the "migratepassword" plugin.
  # This will allow users to login with their current password.
  # The plugin is available at: https://github.com/discoursehosting/discourse-migratepassword
  passwords: true

  # By default all the following things get imported. You can disable them by setting them to false.
  bookmarks: true
  attachments: true
  private_messages: true
  polls: true

  # Import likes from the phpBB's "Thanks for posts" extension
  likes: false

  # When true: each imported user will have the original username from phpBB as its name
  # When false: the name of each imported user will be blank unless the username was changed during import
  username_as_name: false

  # Map Emojis to smilies used in phpBB. Most of the default smilies already have a mapping, but you can override
  # the mappings here, if you don't like some of them.
  # The mapping syntax is: emoji_name: 'smiley_in_phpbb'
  # Or map multiple smilies to one Emoji: emoji_name: ['smiley1', 'smiley2']
  emojis:
    # here are two example mappings...
    smiley: [':D', ':-D', ':grin:']
    heart: ':love:'

  # Map custom profile fields from phpBB to custom user fields in Discourse (works for phpBB 3.1+)
  #
  #  custom_fields:
  #    - phpbb_field_name: "company_name"
  #      discourse_field_name: "Company"
  #    - phpbb_field_name: "facebook"
  #      discourse_field_name: "Facebook"
  custom_fields: []

You’re missing the username and password.
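
For the "how do I tell what it's choking on" part, one way to narrow it down is to parse the file a line at a time and see where it first breaks. This is only a sketch: first_failing_line is a made-up helper, and the sample mimics a list indented less than its parent key, which is an assumption about one possible cause and not a confirmed diagnosis of the file above. The line number in the Psych message may refer to where the enclosing block started, not to the offending line, which is why a prefix check can be more useful.

```ruby
require "yaml"

# Hypothetical helper: parses ever-longer prefixes of the text and returns the
# 1-based number of the first line that makes parsing fail, or nil if all parse.
def first_failing_line(yaml_text)
  lines = yaml_text.lines
  (1..lines.size).each do |count|
    begin
      YAML.parse(lines.first(count).join)
    rescue Psych::SyntaxError
      return count
    end
  end
  nil
end

# The list item is indented less than its parent key "new_categories".
broken = <<~YAML
  import:
    new_categories:
   - forum_id: general
     name: General
YAML

# Same content with the list indented below its key.
valid = <<~YAML
  import:
    new_categories:
      - forum_id: general
        name: General
YAML

puts "broken sample fails at line: #{first_failing_line(broken).inspect}"
puts "valid sample fails at line: #{first_failing_line(valid).inspect}"
```

To check a real file, pass File.read of your settings.yml to the helper. A prefix that cuts a multi-line quoted string or flow list in half can be reported as a failure even though the full file is fine, so treat the result as a hint about where to look.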