Migrate a phpBB3 forum to Discourse

Great! Glad you got it sorted. What was the trick?

Hopefully you won’t need this: Discourse Migration - Literate Computing

I had to keep running import_phpbb3.sh until all the categories were imported. Script ran overnight, and a network interruption caused some error on my end. I re-started the script, and I think it is working now. If there are more issues, I’m going to have to go in and clean the MySQL table.

Hi everyone! Forum is mostly up and running…looks fantastic. I’m writing a script to modify the [/quote] tag. Board is fairly old, dating back to 2001, and just adding a space above and below the closing tag fixes a lot. I’m sure there was a setting that I could have updated in the import script, but it’s my first time migrating, so I’m learning as I go.

Question: I’m fixing 10 years of data, and have a live phpBB board running. It’s taking a bit of time. Can I use the import_phpbb3.sh script to import the last X days of posts from my forum? I think it’s just a merge as far as the script goes. I can export the last 7 days of data from MySQL…but I don’t know if that would work. Thoughts?

I haven’t verified that the script is working…I tested it on small parts, but not in batches. The issue I had was the space above and below the closing tag. Now I’m going to go mow the lawn and come back and check later:

batch_size = 1000
total_processed = 0

# Process posts across the entire site in batches
Post.find_in_batches(batch_size: batch_size) do |batch|
  updated_posts = []

  batch.each do |post|
    original_raw = post.raw
    # Apply the correction
    new_raw = original_raw.gsub(/\n\\n\[\/quote\]\\n\n\n/, "\n\n[/quote]\n\n")

    if original_raw != new_raw
      post.update_column(:raw, new_raw)  # Direct column update to skip callbacks
      updated_posts << post
      total_processed += 1
    end
  end

  # Rebake only the updated posts to minimize load
  updated_posts.each(&:rebake!)

  puts "Processed a batch of #{batch.size}. Total processed so far: #{total_processed}."
end

puts "Total #{total_processed} posts processed across the entire site."

This worked on individual posts:

post = Post.find(344572) # Replace 344572 with the correct ID

post.raw = post.raw.gsub(/\n\n\[\/quote\]\n\n\n/, "\n\n[/quote]\n\n")
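
The substitution can be tried on a plain string first, without touching the database. This is only a minimal sketch, assuming the imported text has two newlines before the closing tag and three after it. `normalize_quote_close` and the sample text are made up for illustration and are not part of Discourse.

```ruby
# Hypothetical helper (not part of Discourse): collapse the extra blank line
# that follows a closing quote tag.
QUOTE_CLOSE = /\n\n\[\/quote\]\n\n\n/

def normalize_quote_close(raw)
  # gsub returns a new string and leaves the input untouched
  raw.gsub(QUOTE_CLOSE, "\n\n[/quote]\n\n")
end

# Made-up sample: two newlines before the tag, three after it
sample = "[quote=\"alice\"]\nhello\n\n[/quote]\n\n\nreply text"
fixed  = normalize_quote_close(sample)

puts fixed.inspect
puts "changed: #{fixed != sample}"
```

Running the helper a second time on its own output changes nothing, which is a quick way to check that a rule is safe to re-run.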

I’d modify the script to do this when it imports the data. I’m surprised that it doesn’t already. It’s worth taking a closer look at things.

On several scripts that I’ve worked on, I’ve added an IMPORT_AFTER ENV setting and modified the queries to include where some_timestamp > import_after_data. I don’t think this one has such an option, but I haven’t paid careful attention.
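
A rough sketch of that idea, not taken from the phpBB3 importer. The `IMPORT_AFTER` setting, the `posts_query` helper, and the `phpbb_posts` table and `post_time` column names are all assumptions for illustration; a real importer would have to match its own schema and queries.

```ruby
require "time"

# Build the SELECT used to fetch posts, optionally limited to rows newer
# than IMPORT_AFTER (an assumed ENV setting, e.g. "2024-01-01").
# Table and column names here are assumptions, not the importer's own.
def posts_query(import_after = ENV["IMPORT_AFTER"])
  sql = "SELECT post_id, topic_id, post_text, post_time FROM phpbb_posts"
  unless import_after.nil? || import_after.strip.empty?
    # Time.parse raises ArgumentError on unparseable input, and only the
    # resulting integer is interpolated, so no raw text reaches the SQL.
    cutoff = Time.parse(import_after).to_i
    sql += " WHERE post_time > #{cutoff}"
  end
  sql + " ORDER BY post_id"
end

puts posts_query(nil)
puts posts_query("2024-01-01 00:00:00 UTC")
```

Leaving the variable unset keeps the full import behaviour, so the same script serves both the first run and a later catch-up run.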

But beware that data from 10 years ago is likely to look different from data from the last 2 years. Testing on just recent data is great for debugging stuff that you know is everywhere, but you’ll want to test on the whole database too.

There’s a bunch of different stuff in the mix. Import is about 99% complete… I just need to go back and suck in last weeks of posts :slightly_smiling_face: Once the line breaks are added and this is fixed <LINK_TEXT text= all should be good :slight_smile:

Here is what I did to clean up a large chunk of my posts after I imported. I had AI write up an explanation for someone coming over from phpBB. I was up till 5 am trying to get this to work :rofl::rofl::rofl:

To run a Ruby script within your Discourse environment that processes forum topics in batches of 1000, and applies specific transformations to each post within those topics, you will follow a sequence of steps to access your server, enter the appropriate environment, and execute the script. Here’s a detailed guide including the script itself:

Step 1: Securely Connect to Your Server

Using a Secure Shell (SSH) client like PuTTY for Windows, connect to your server where the Discourse forum is hosted. You’ll need the IP address or domain name of your server, as well as your credentials (username and password or an SSH key).

Step 2: Access the Discourse Docker Container

Once logged into your server, navigate to the Discourse installation directory, typically /var/discourse. Then, enter the Docker container that runs Discourse using the following commands:

cd /var/discourse
./launcher enter app

Step 3: Open the Rails Console

Inside the Docker container, you can interact with your Discourse application through the Rails console. This is a Ruby on Rails environment that allows you to run Ruby code directly against your Discourse database and application logic. Start the console with:

rails c

Step 4: Execute the Ruby Script

With the Rails console open, you’re ready to run the Ruby script. The script should be prepared in advance and copied to your clipboard. In PuTTY, you can paste the script by right-clicking or pressing Shift + Insert.

Here’s the complete script you’ll be using:

# Retrieve an array of all topic IDs
topic_ids = Topic.pluck(:id)

# Define batch size
batch_size = 1000
current_batch_start = 0

while current_batch_start < topic_ids.length
  # Process a batch of 1000 topics at a time
  topic_ids[current_batch_start, batch_size].each do |topic_id|
    # Fetch the topic by ID; find_by returns nil (instead of raising) if it no longer exists
    topic = Topic.find_by(id: topic_id)
  
    # Skip if the topic is nil
    next if topic.nil?
  
    # Initialize a count of transformed posts for this topic
    transformed_count = 0

    # Iterate over each post within the topic
    topic.posts.each do |post|
      # Flag to track if transformations have been made
      transformed = false

      # Apply transformations
      transformed |= post.raw.gsub!(/<\/?r>/, '').present?
      transformed |= post.raw.gsub!(/<\/?s>/, '').present?
      transformed |= post.raw.gsub!(/<\/?e>/, '').present?
      transformed |= post.raw.gsub!(/<\/?QUOTE[^>]*>/, '').present?
      transformed |= post.raw.gsub!(/\[quote=““([^”]+)””\]/, '[quote="\1"]').present?
      transformed |= post.raw.gsub!(/\\n/, "\n").present?
      transformed |= post.raw.gsub!(/\[quote=([^\s]+)\s+post_id=\d+\s+time=\d+\s+user_id=\d+\]/, '[quote="\1"]').present?
      transformed |= post.raw.gsub!(/<URL url="([^"]+)">.*?<LINK_TEXT text="[^"]+">[^<]+<\/LINK_TEXT>.*?<\/URL>/, '\1').present?
      transformed |= post.raw.gsub!(/\[\/quote\]/, "\n[/quote]\n").present?
      transformed |= post.raw.gsub!(/\A\n/, '').present?

      # Save and rebake the post if any transformations have occurred
      if transformed
        post.save!
        post.rebake!
        transformed_count += 1
      end
    end

    # Output the result for the current topic
    if transformed_count > 0
      puts "Transformed #{transformed_count} posts in topic #{topic_id}."
    else
      puts "No transformations were necessary for topic #{topic_id}."
    end
  end

  # Update the starting index for the next batch
  current_batch_start += batch_size

  # Check if there are more topics to process
  if current_batch_start < topic_ids.length
    puts "Completed a batch of #{batch_size} topics. Do you want to continue to the next batch? (yes/no)"
    response = gets.strip.downcase
    break unless response == 'yes'
  end
end
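
Because the transformations are plain string substitutions, they can be checked on sample text before any post is touched. The following is a minimal sketch covering only a subset of the rules above. `clean_raw` and the sample post are hypothetical, and comparing the result with the input is swapped in for the `gsub!` return-value check used in the script.

```ruby
# Hypothetical helper: apply a subset of the cleanup rules to a copy of the
# text and return the result, leaving the input string untouched.
def clean_raw(raw)
  text = raw.gsub(/<\/?[rse]>/, "")           # strip <r>, <s>, <e> wrappers
  text = text.gsub(/<\/?QUOTE[^>]*>/, "")     # strip <QUOTE ...> wrappers
  text = text.gsub(/\\n/, "\n")               # literal backslash-n becomes a newline
  # drop the phpBB quote metadata and keep only the quoted user name
  text = text.gsub(/\[quote=([^\s]+)\s+post_id=\d+\s+time=\d+\s+user_id=\d+\]/,
                   '[quote="\1"]')
  text.sub(/\A\n/, "")                        # drop a single leading newline
end

# Made-up sample in the style of an imported phpBB post
sample = "<r><QUOTE author=\"bob\"><s>[quote=bob post_id=12 time=1700000000 user_id=3]</s>" +
         "hi\\nthere<e>[/quote]</e></QUOTE>thanks</r>"
cleaned = clean_raw(sample)

puts cleaned
puts "changed: #{cleaned != sample}"
```

Returning a new string keeps the "did anything change?" test to a simple comparison, and the same helper can be called from the console loop once the output looks right.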

Understanding the Script and Batch Processing

  • Batch Processing: This approach allows you to handle large sets of data in smaller, manageable chunks. It’s particularly useful for reducing load on the server and for operations that may take a long time to complete if done all at once. Here, it’s applied to process Discourse topics in batches of 1000.

This is what it should look like as it’s running.

No transformations were necessary for topic 19556.
No transformations were necessary for topic 35766.
No transformations were necessary for topic 35783.
No transformations were necessary for topic 35778.
No transformations were necessary for topic 35774.
No transformations were necessary for topic 35770.
Transformed 292 posts in topic 20234.
No transformations were necessary for topic 35781.
No transformations were necessary for topic 35779.
Transformed 242 posts in topic 20218.
Transformed 22 posts in topic 19522.
No transformations were necessary for topic 35771.
No transformations were necessary for topic 35767.
Transformed 2 posts in topic 22560.
No transformations were necessary for topic 35797.
No transformations were necessary for topic 35789.
No transformations were necessary for topic 35785.
No transformations were necessary for topic 31889.
Transformed 1 posts in topic 31831.
No transformations were necessary for topic 31792.
No transformations were necessary for topic 35794.
No transformations were necessary for topic 35815.
  • Script Functionality: The script iterates over each topic ID retrieved from your Discourse database, applying specified transformations to each post within those

I’m getting this error (post-migration, on a functioning standard install) when trying to change a user’s display name (not username) to include special characters. I get an internal server error popup when I try and the logs show the same error @DDo had.

Notably, other users are able to change their display name to include the same character (™). The relevant difference seems to be that users who have logged in post migration can have UTF-8 characters, but users that haven’t logged in can only have ASCII-8BIT.
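
For readers unfamiliar with the encoding names: in Ruby, ASCII-8BIT is the tag for raw bytes, and a string tagged that way cannot be combined with UTF-8 text once both contain non-ASCII characters. The sketch below only illustrates that general Ruby behaviour. It is not a diagnosis of the plugin, and the variable names and sample value are made up.

```ruby
utf8_name   = "Acme™"        # UTF-8 text, as typed into a form
binary_name = "Acme™".b      # same bytes, but tagged ASCII-8BIT

puts utf8_name.encoding
puts binary_name.encoding

begin
  # Both strings contain non-ASCII bytes, so Ruby refuses to join them
  binary_name + utf8_name
rescue Encoding::CompatibilityError => e
  puts "cannot combine: #{e.message}"
end

# Re-tagging the bytes (no conversion takes place) makes them compatible again
repaired = binary_name.dup.force_encoding(Encoding::UTF_8)
puts repaired == utf8_name
```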

I also assume this error would be resolved by removing discourse-migratepassword, but haven’t tested it.

Is this a bug, or something inherent to making that plugin work? If the former, is it best to create an issue on Github to report it?

Hm trying to (re)build an import container but it fails:

FAILED
--------------------
Errno::ENOENT: No such file or directory @ rb_sysopen - /etc/service/unicorn/run
Location of failure: /usr/local/lib/ruby/gems/3.2.0/gems/pups-1.2.1/lib/pups/replace_command.rb:11:in `read'
replace failed with the params {"tag"=>"precompile", "filename"=>"/etc/service/unicorn/run", "from"=>"PRECOMPILE_ON_BOOT=1", "to"=>"PRECOMPILE_ON_BOOT=0"}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

I already disabled all plugins, but no change.

Does anybody have any idea?

So that’s the first idea.

Yeah I did not find anything which shows/points me to an error… I will look again…

hooks:
  after_web_config:
    - exec:
        cd: /etc/service
        cmd:
        # - rm -R unicorn
          - rm -R nginx
          - rm -R cron

I commented out the line - rm -R unicorn in templates/import/phpbb3.template.yml and the build went through without the error.

What happened here? phpbb3.template.yml is the version from Github, two years old. So there must be a change somewhere else?!?

That might be from before they switched from Ubuntu to Debian. Those are likely to be updated only when someone notices that they don’t work anymore.

It doesn’t make much sense to me that that rm was a problem, but I don’t pay much attention to those if someone’s not paying me. And even then I don’t remember paying much attention to that one. :slight_smile:

I thought that if the builder complains about a missing /etc/service/unicorn/run and exactly that is being removed, I should try commenting out the removal command. :wink: Worked.

Maybe somebody with more knowledge about everything wants to verify and make an update on Github in the script. I can also make a PR - but without any knowledge about everything, I do not want to.

But changing from Ubuntu to Debian changes a lot, true.
