Great! Glad you got it sorted. What was the trick?
Hopefully you won’t need this Discourse Migration - Literate Computing
I had to keep running import_phpbb3.sh until all the categories were imported. The script ran overnight, and a network interruption caused an error on my end. I restarted the script, and I think it is working now. If there are more issues, I’ll have to go in and clean up the MySQL table.
Hi everyone! The forum is mostly up and running… looks fantastic. I’m writing a script to modify the [/quote] tag. The board is fairly old, dating back to 2001, and just adding a space above and below the closing tag fixes a lot. I’m sure there was a setting that I could have updated in the import script, but it’s my first time migrating, so I’m learning as I go.
Question: I’m fixing 10 years of data and have a live phpBB board running. It’s taking a bit of time. Can I use the import_phpbb3.sh script to import the last X days of posts from my forum? I think it’s just a merge as far as the script goes. I can export the last 7 days of data from MySQL… but I don’t know if that would work. Thoughts?
I haven’t verified that the script is working… I tested it on small parts, but not with batching. The issue I had was a space above and below the closing tag. Now I’m going to go mow the lawn and come back and check later:
batch_size = 1000
total_processed = 0
# Process posts across the entire site in batches
Post.find_in_batches(batch_size: batch_size) do |batch|
  updated_posts = []
  batch.each do |post|
    original_raw = post.raw
    # Apply the correction
    new_raw = original_raw.gsub(/\n\\n\[\/quote\]\\n\n\n/, "\n\n[/quote]\n\n")
    if original_raw != new_raw
      post.update_column(:raw, new_raw) # Direct column update to skip callbacks
      updated_posts << post
      total_processed += 1
    end
  end
  # Rebake only the updated posts to minimize load
  updated_posts.each(&:rebake!)
  puts "Processed a batch of #{batch.size}. Total processed so far: #{total_processed}."
end
puts "Total #{total_processed} posts processed across the entire site."
This worked on individual posts:
post = Post.find(344572) # Replace 344572 with the correct ID
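post.raw = post.raw.gsub(/\n\n\[\/quote\]\n\n\n/, "\n\n[/quote]\n\n") # brackets and slash escaped
post.save!   # persist the new raw text
post.rebake! # regenerate the cooked HTML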
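Before running either version against the database, the substitution can be tried on sample strings in plain Ruby. This is a minimal sketch, not the poster's code: the method name normalize_quote_close is hypothetical, and it swaps the fixed pattern for one that collapses any run of newlines around [/quote] into exactly one blank line on each side. That makes it safe to run twice, and it also inserts the blank lines when the tag has no newline around it at all.

```ruby
# Hypothetical helper: normalize the whitespace around closing quote tags.
# Any run of newlines (including none) on either side of [/quote] becomes
# exactly one blank line, so a second pass changes nothing.
def normalize_quote_close(raw)
  raw.gsub(/\n*\[\/quote\]\n*/, "\n\n[/quote]\n\n")
end

sample = "[quote=\"bob\"]hi\n[/quote]\n\n\nreply"
fixed = normalize_quote_close(sample)
p fixed                                  # "[quote=\"bob\"]hi\n\n[/quote]\n\nreply"
p normalize_quote_close(fixed) == fixed  # true
```

In a Rails console the same method body could replace the gsub call on post.raw, with the usual save and rebake afterwards.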
I’d modify the script to do this when it imports the data. I’m surprised that it doesn’t already. It’s worth taking a closer look at things.
On several scripts that I’ve worked on, I’ve added an IMPORT_AFTER ENV setting and modified the queries to include where some_timestamp > import_after_data. I don’t think this one has such an option, but I haven’t paid careful attention.
But beware: data from 10 years ago is likely to differ from data from the last 2 years. Testing on just recent data is great for debugging problems you know are everywhere, but you’ll want to test on the whole database too.
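The IMPORT_AFTER idea can be sketched as a small helper that turns an environment variable into an extra condition for the importer's SQL. This assumes the variable holds an ISO date such as 2024-05-01, that the posts table is aliased as p, and that the existing query already has a WHERE clause. phpBB stores post_time as a Unix timestamp, so the date is converted to an integer, which also keeps the interpolated value safe. The helper name is hypothetical, and the phpBB3 importer's queries would still have to be edited by hand to use it.

```ruby
require 'date'

# Hypothetical helper: extra SQL condition derived from IMPORT_AFTER.
# Returns "" when the variable is unset or blank, leaving the query unchanged.
def import_after_condition(env = ENV, column: "p.post_time")
  value = env["IMPORT_AFTER"].to_s.strip
  return "" if value.empty?

  date = Date.iso8601(value) # raises on malformed input instead of guessing
  # phpBB stores post_time as integer seconds since the Unix epoch (UTC).
  cutoff = Time.utc(date.year, date.month, date.day).to_i
  "AND #{column} > #{cutoff}"
end

# Example: only posts made after midnight UTC on 1 May 2024.
puts import_after_condition({ "IMPORT_AFTER" => "2024-05-01" })
# AND p.post_time > 1714521600
```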
There’s a bunch of different stuff in the mix. The import is about 99% complete… I just need to go back and pull in the most recent weeks of posts. Once the line breaks are added and the leftover <LINK_TEXT text= markup is fixed, all should be good.
Here is what I did to clean up a large chunk of my posts after I imported. I had AI write an explanation for someone coming over from phpBB. I was up till 5 am trying to get this to work.
To run a Ruby script within your Discourse environment that processes forum topics in batches of 1000, and applies specific transformations to each post within those topics, you will follow a sequence of steps to access your server, enter the appropriate environment, and execute the script. Here’s a detailed guide including the script itself:
Using a Secure Shell (SSH) client like PuTTY for Windows, connect to your server where the Discourse forum is hosted. You’ll need the IP address or domain name of your server, as well as your credentials (username and password or an SSH key).
Once logged into your server, navigate to the Discourse installation directory, typically /var/discourse. Then, enter the Docker container that runs Discourse using the following commands:
cd /var/discourse
./launcher enter app
Inside the Docker container, you can interact with your Discourse application through the Rails console. This is a Ruby on Rails environment that allows you to run Ruby code directly against your Discourse database and application logic. Start the console with:
rails c
With the Rails console open, you’re ready to run the Ruby script. The script should be prepared in advance and copied to your clipboard. In PuTTY, you can paste the script by right-clicking or pressing Shift + Insert.
Here’s the complete script you’ll be using:
# Retrieve an array of all topic IDs
topic_ids = Topic.pluck(:id)
# Define batch size
batch_size = 1000
current_batch_start = 0
while current_batch_start < topic_ids.length
  # Process a batch of 1000 topics at a time
  topic_ids[current_batch_start, batch_size].each do |topic_id|
    # Fetch the topic by ID (find_by returns nil instead of raising if the topic is gone)
    topic = Topic.find_by(id: topic_id)
    # Skip if the topic is nil
    next if topic.nil?
    # Initialize a count of transformed posts for this topic
    transformed_count = 0
    # Iterate over each post within the topic
    topic.posts.each do |post|
      # Flag to track if transformations have been made
      transformed = false
      # Apply transformations (gsub! returns nil when nothing matched)
      transformed |= post.raw.gsub!(/<\/?r>/, '').present?
      transformed |= post.raw.gsub!(/<\/?s>/, '').present?
      transformed |= post.raw.gsub!(/<\/?e>/, '').present?
      transformed |= post.raw.gsub!(/<\/?QUOTE[^>]*>/, '').present?
      transformed |= post.raw.gsub!(/\[quote=““([^”]+)””\]/, '[quote="\1"]').present?
      transformed |= post.raw.gsub!(/\\n/, "\n").present?
      transformed |= post.raw.gsub!(/\[quote=([^\s]+)\s+post_id=\d+\s+time=\d+\s+user_id=\d+\]/, '[quote="\1"]').present?
      transformed |= post.raw.gsub!(/<URL url="([^"]+)">.*?<LINK_TEXT text="[^"]+">[^<]+<\/LINK_TEXT>.*?<\/URL>/, '\1').present?
      # Note: this rule adds newlines around every [/quote] each time it runs,
      # so run the script only once over the same posts
      transformed |= post.raw.gsub!(/\[\/quote\]/, "\n[/quote]\n").present?
      transformed |= post.raw.gsub!(/\A\n/, '').present?
      # Save and rebake the post if any transformations have occurred
      if transformed
        post.save!
        post.rebake!
        transformed_count += 1
      end
    end
    # Output the result for the current topic
    if transformed_count > 0
      puts "Transformed #{transformed_count} posts in topic #{topic_id}."
    else
      puts "No transformations were necessary for topic #{topic_id}."
    end
  end
  # Update the starting index for the next batch
  current_batch_start += batch_size
  # Check if there are more topics to process
  if current_batch_start < topic_ids.length
    puts "Completed a batch of #{batch_size} topics. Do you want to continue to the next batch? (yes/no)"
    response = gets.strip.downcase
    break unless response == 'yes'
  end
end
This is what it should look like as it’s running.
No transformations were necessary for topic 19556.
No transformations were necessary for topic 35766.
No transformations were necessary for topic 35783.
No transformations were necessary for topic 35778.
No transformations were necessary for topic 35774.
No transformations were necessary for topic 35770.
Transformed 292 posts in topic 20234.
No transformations were necessary for topic 35781.
No transformations were necessary for topic 35779.
Transformed 242 posts in topic 20218.
Transformed 22 posts in topic 19522.
No transformations were necessary for topic 35771.
No transformations were necessary for topic 35767.
Transformed 2 posts in topic 22560.
No transformations were necessary for topic 35797.
No transformations were necessary for topic 35789.
No transformations were necessary for topic 35785.
No transformations were necessary for topic 31889.
Transformed 1 posts in topic 31831.
No transformations were necessary for topic 31792.
No transformations were necessary for topic 35794.
No transformations were necessary for topic 35815.
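To try the substitutions from the script on sample text before pasting it into a production console, they can be collected into a plain-Ruby method. This is a sketch, not the poster's script: clean_imported_raw and CLEANUP_RULES are hypothetical names, the smart-quote rule is left out, change detection compares the output with the input instead of relying on gsub! return values, and the [/quote] rule is swapped for an idempotent variant that collapses existing newlines rather than adding new ones on every run.

```ruby
# Hypothetical pure-Ruby version of most of the cleanup rules, in the same order.
CLEANUP_RULES = [
  [/<\/?r>/, ""],                         # leftover phpBB XML markup
  [/<\/?s>/, ""],
  [/<\/?e>/, ""],
  [/<\/?QUOTE[^>]*>/, ""],                # leftover XML quote tags
  [/\\n/, "\n"],                          # literal "\n" sequences -> real newlines
  [/\[quote=([^\s]+)\s+post_id=\d+\s+time=\d+\s+user_id=\d+\]/, '[quote="\1"]'],
  [/<URL url="([^"]+)">.*?<LINK_TEXT text="[^"]+">[^<]+<\/LINK_TEXT>.*?<\/URL>/, '\1'],
  [/\n*\[\/quote\]\n*/, "\n[/quote]\n"],  # idempotent variant of the original rule
  [/\A\n/, ""]                            # drop a leading newline
].freeze

# Returns [cleaned_text, changed?]; the input string is not modified.
def clean_imported_raw(raw)
  cleaned = CLEANUP_RULES.reduce(raw) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
  [cleaned, cleaned != raw]
end

sample = "<r>see <URL url=\"https://example.com/a\"><LINK_TEXT text=\"example.com/a\">example.com/a</LINK_TEXT></URL></r>"
cleaned, changed = clean_imported_raw(sample)
puts cleaned  # see https://example.com/a
puts changed  # true
```

Because the second return value only becomes true when the text actually differs, a console loop built on this would save and rebake only the posts that really changed.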
I’m getting this error (post-migration, on a functioning standard install) when trying to change a user’s display name (not username) to include special characters. I get an internal server error popup when I try, and the logs show the same error @DDo had.
Notably, other users are able to change their display name to include the same character (™). The relevant difference seems to be that users who have logged in post migration can have UTF-8 characters, but users that haven’t logged in can only have ASCII-8BIT.
I also assume this error would be resolved by removing discourse-migratepassword, but I haven’t tested it.
Is this a bug, or something inherent to making that plugin work? If the former, is it best to create an issue on Github to report it?
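For background on the ASCII-8BIT vs UTF-8 observation: in Ruby, combining a binary (ASCII-8BIT) string that contains non-ASCII bytes with a UTF-8 string that also contains non-ASCII characters raises Encoding::CompatibilityError. The plain-Ruby sketch below reproduces that and shows one way to reinterpret the bytes as UTF-8. It assumes, without confirming, that the failure comes from a string like this somewhere in the update; it does not show where Discourse or the migratepassword plugin produces it, and the to_utf8 helper is hypothetical.

```ruby
# Illustration only: how binary and UTF-8 strings clash in Ruby.
binary = "Caf\xC3\xA9".b   # the bytes of "Café", tagged ASCII-8BIT
utf8   = "Name™"           # UTF-8 string with a non-ASCII character

begin
  binary + utf8
rescue Encoding::CompatibilityError => e
  puts "Error: #{e.message}"
end

# Hypothetical helper: reinterpret the bytes as UTF-8 and replace any
# byte sequences that are not valid UTF-8 with "?".
def to_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

puts to_utf8(binary) + " / " + utf8  # Café / Name™
```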
Hm, I’m trying to (re)build an import container, but it fails:
FAILED
--------------------
Errno::ENOENT: No such file or directory @ rb_sysopen - /etc/service/unicorn/run
Location of failure: /usr/local/lib/ruby/gems/3.2.0/gems/pups-1.2.1/lib/pups/replace_command.rb:11:in `read'
replace failed with the params {"tag"=>"precompile", "filename"=>"/etc/service/unicorn/run", "from"=>"PRECOMPILE_ON_BOOT=1", "to"=>"PRECOMPILE_ON_BOOT=0"}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
I already disabled all plugins, but no change.
Does anybody have any idea?
So that’s the first idea.
Yeah, I didn’t find anything that shows or points me to an error… I will look again…
hooks:
  after_web_config:
    - exec:
        cd: /etc/service
        cmd:
          # - rm -R unicorn
          - rm -R nginx
          - rm -R cron
I commented out the line - rm -R unicorn in templates/import/phpbb3.template.yml, and the build went through without the error.
What happened here? phpbb3.template.yml is the version from GitHub, two years old. So there must be a change somewhere else?!?
That might be from before they switched from Ubuntu to Debian. Those are likely to be updated only when someone notices that they don’t work anymore.
It doesn’t make much sense to me that that rm was a problem, but I don’t pay much attention to those if someone’s not paying me. And even then I don’t remember paying much attention to that one.
I figured that since the builder complains about a missing /etc/service/unicorn/run, and exactly that file is being removed, I’d try commenting out the removal command. It worked.
Maybe somebody with more knowledge of all this wants to verify it and update the script on GitHub. I could also make a PR, but without that knowledge I’d rather not.
But switching from Ubuntu to Debian changes a lot, true.
First off, I’m a docker neophyte, so there’s a good chance that I’ve just screwed something up.
I have a clean install of Discourse on a DigitalOcean Droplet running Ubuntu 22.04 using their prebuilt app. The forum built just fine and runs in the standard configuration.
When I execute /var/discourse/launcher rebuild import, I get the following at the end of the build:
Errno::ENOENT: No such file or directory @ rb_sysopen - /etc/service/unicorn/run
Location of failure: /usr/local/lib/ruby/gems/3.3.0/gems/pups-1.2.1/lib/pups/replace_command.rb:11:in `read'
replace failed with the params {"tag"=>"precompile", "filename"=>"/etc/service/unicorn/run", "from"=>"PRECOMPILE_ON_BOOT=1", "to"=>"PRECOMPILE_ON_BOOT=0"}
bootstrap failed with exit code 1
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
When I execute: /var/discourse/launcher enter import
I get:
x86_64 arch detected.
Error response from daemon: No such container: import
Is this because of the errors at the top of this post (and if so, how do I fix it), or what am I doing wrong?
Did you create import.yml and bootstrap that container as (I’m pretty sure) the instructions say?
The instructions show copying app.yml to import.yml and adding “templates/import/phpbb3.template.yml” to import.yml (which I did). Then, you rebuild import, which generates the error in my OP. I fail to see where there are any instructions on creating a bootstrap(?).
The instructions are pretty simple, which is why I’m confused about what’s going wrong.
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
81a2f335fd01 local_discourse/app "/sbin/boot" 14 hours ago Up 11 hours 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp app
Sorry. Rebuild does a bootstrap. If it had finished, the import container would be running.
Oh. I’m very sorry. I didn’t catch what was happening before. It looks like the phpbb3 template might be incompatible with recent changes to discourse_docker. But that’s all I can tell on my phone.
I think if you delete a line in the phpbb3 template that deletes “/etc/service/unicorn/run” that might make the build complete.
Jay, thank you for the answer to this problem. The build now completes properly.
Next problem: When I execute: import_phpbb3.sh I get:
The phpBB3 import is starting...
/usr/local/lib/ruby/3.3.0/psych/parser.rb:62:in `_native_parse': (<unknown>): did not find expected key while parsing a block mapping at line 3 column 1 (Psych::SyntaxError)
from /usr/local/lib/ruby/3.3.0/psych/parser.rb:62:in `parse'
from /usr/local/lib/ruby/3.3.0/psych.rb:455:in `parse_stream'
from /usr/local/lib/ruby/3.3.0/psych.rb:399:in `parse'
from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:129:in `strict_load'
from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:186:in `input_to_storage'
from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:232:in `fetch'
from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/bootsnap-1.18.4/lib/bootsnap/compile_cache/yaml.rb:232:in `load_file'
from /var/www/discourse/script/import_scripts/phpbb3/support/settings.rb:10:in `load'
from script/import_scripts/phpbb3.rb:20:in `<module:PhpBB3>'
from script/import_scripts/phpbb3.rb:16:in `<module:ImportScripts>'
from script/import_scripts/phpbb3.rb:15:in `<main>'
I’m guessing that it doesn’t like something in my settings.yml file. How do I tell what it’s choking on?
database:
  type: MySQL # currently only MySQL is supported
  host: localhost
  port: 3306
  username:
  password:
  schema: phpbb
  table_prefix: phpbb_ # Change this, if your forum is using a different prefix. Usually all table names start with phpbb_
  batch_size: 1000 # Don't change this unless you know what you're doing. The default (1000) should work just fine.
import:
  # Set this if you import multiple phpBB forums into a single Discourse forum.
  #
  # For example, when importing multiple sites, prefix all imported IDs
  # with 'first' to avoid conflicts. Subsequent import runs must have a
  # different 'site_name'.
  #
  # site_name: first
  #
  site_name: Freedom Owners Forum
  # Create new categories
  #
  # For example, to create a parent category and a subcategory.
  #
  # new_categories:
  #   - forum_id: foo
  #     name: Foo Category
  #   - forum_id: bar
  #     name: Bar Category
  #     parent_id: foo
  #
  new_categories:
    - forum_id: general
      name: General
    - forum_id: systems
      name: Boat Systems
    - forum_id: photos
      name: Photos
    - forum_id: docs
      name: Manuals and Documentation
    - forum_id: buy
      name: Buy/Sell/Trade
    - forum_id: site
      name: Site Usage
    - forum_id: archives
      name: Archives
  # Category mappings
  #
  # * "source_category_id" is the forum ID in phpBB3
  # * "target_category_id" is either a forum ID from phpBB3 or a "forum_id"
  #   from the "new_categories" setting (see above)
  # * "discourse_category_id" is a category ID from Discourse
  # * "skip" allows you to ignore a category during import
  #
  # Use "target_category_id" if you want to merge categories and use
  # "discourse_category_id" if you want to import a forum into an existing
  # category in Discourse.
  #
  # category_mappings:
  #   - source_category_id: 1
  #     target_category_id: foo
  #   - source_category_id: 2
  #     discourse_category_id: 42
  #   - source_category_id: 6
  #     skip: true
  #
  category_mappings:
    - source_category_id: 8
      target_category_id: systems
    - source_category_id: 7
      target_category_id: systems
    - source_category_id: 9
      target_category_id: systems
    - source_category_id: 10
      target_category_id: buy
    - source_category_id: 11
      target_category_id: general
    - source_category_id: 12
      target_category_id: general
    - source_category_id: 13
      target_category_id: general
    - source_category_id: 14
      target_category_id: general
    - source_category_id: 16
      target_category_id: docs
    - source_category_id: 17
      target_category_id: docs
    - source_category_id: 18
      target_category_id: general
    - source_category_id: 19
      target_category_id: general
    - source_category_id: 20
      target_category_id: general
    - source_category_id: 21
      target_category_id: docs
    - source_category_id: 22
      target_category_id: general
    - source_category_id: 23
      target_category_id: site
    - source_category_id: 24
      target_category_id: general
    - source_category_id: 25
      target_category_id: site
    - source_category_id: 42
      target_category_id: systems
    - source_category_id: 43
      target_category_id: docs
    - source_category_id: 44
      target_category_id: general
    - source_category_id: 45
      target_category_id: general
    - source_category_id: 46
      target_category_id: site
    - source_category_id: 48
      target_category_id: general
    - source_category_id: 56
      target_category_id: general
    - source_category_id: 58
      target_category_id: systems
    - source_category_id: 59
      skip: true
    - source_category_id: 60
      target_category_id: archives
    - source_category_id: 61
      target_category_id: archives
    - source_category_id: 62
      target_category_id: archives
    - source_category_id: 63
      target_category_id: archives
    - source_category_id: 64
      target_category_id: general
    - source_category_id: 65
      target_category_id: site
  # Tag mappings
  #
  # For example, imported topics from phpBB category 1 will be tagged
  # with 'first-category', etc.
  #
  # tag_mappings:
  #   1:
  #     - first-category
  #   2:
  #     - second-category
  #   3:
  #     - third-category
  #
  tag_mappings: {}
  # Rank to trust level mapping
  #
  # Map phpBB 3.x rank levels to trust level
  # Users with rank at least 3000 will have TL3, etc.
  #
  rank_mapping:
    trust_level_1: 200
    trust_level_2: 1000
    trust_level_3: 3000
  # rank_mapping: {}
  # WARNING: Do not activate this option unless you know what you are doing.
  # It will probably break the BBCode to Markdown conversion and slows down your import.
  use_bbcode_to_md: false
  # This is the path to the root directory of your current phpBB installation (or a copy of it).
  # The importer expects to find the /files and /images directories within the base directory.
  # You need to change this to something like /var/www/phpbb if you are not using the Docker based importer.
  # This is only needed if you want to import avatars, attachments or custom smilies.
  phpbb_base_dir: /shared/import/data
  site_prefix:
    # this is needed for rewriting internal links in posts
    original: freedomyachts.org # without http(s)://
    new: https://test.freedomyachts.org # with http:// or https://
  # Enable this, if you want to redirect old forum links to the new locations.
  permalinks:
    categories: true # redirects /viewforum.php?f=1 to /c/category-name
    topics: true # redirects /viewtopic.php?f=6&t=43 to /t/topic-name/81
    posts: false # redirects /viewtopic.php?p=2455#p2455 to /t/topic-name/81/4
    # Append a prefix to each type of link, e.g. 'forum' to redirect /forum/viewtopic.php?f=6&t=43 to /t/topic-name/81
    # Leave it empty if your forum wasn't installed in a subfolder.
    prefix:
  avatars:
    uploaded: true # import uploaded avatars
    gallery: true # import the predefined avatars phpBB offers
    remote: false # WARNING: This can considerably slow down your import. It will try to download remote avatars.
  # When true: Anonymous users are imported as suspended users. They can't login and have no email address.
  # When false: The system user will be used for all anonymous users.
  anonymous_users: true
  # Enable this, if you want import password hashes in order to use the "migratepassword" plugin.
  # This will allow users to login with their current password.
  # The plugin is available at: https://github.com/discoursehosting/discourse-migratepassword
  passwords: true
  # By default all the following things get imported. You can disable them by setting them to false.
  bookmarks: true
  attachments: true
  private_messages: true
  polls: true
  # Import likes from the phpBB's "Thanks for posts" extension
  likes: false
  # When true: each imported user will have the original username from phpBB as its name
  # When false: the name of each imported user will be blank unless the username was changed during import
  username_as_name: false
  # Map Emojis to smilies used in phpBB. Most of the default smilies already have a mapping, but you can override
  # the mappings here, if you don't like some of them.
  # The mapping syntax is: emoji_name: 'smiley_in_phpbb'
  # Or map multiple smilies to one Emoji: emoji_name: ['smiley1', 'smiley2']
  emojis:
    # here are two example mappings...
    smiley: [':D', ':-D', ':grin:']
    heart: ':love:'
  # Map custom profile fields from phpBB to custom user fields in Discourse (works for phpBB 3.1+)
  #
  # custom_fields:
  #   - phpbb_field_name: "company_name"
  #     discourse_field_name: "Company"
  #   - phpbb_field_name: "facebook"
  #     discourse_field_name: "Facebook"
  custom_fields: []
You’re missing the username and password.
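A small pre-flight check can cover both the original question (where the YAML parser gives up) and this answer (empty credentials). This is a minimal sketch using Ruby's standard yaml library, whose Psych::SyntaxError exposes line, column, problem, and context. The method name, the list of required database keys, and running it as a standalone script are assumptions, not part of the phpBB3 importer.

```ruby
require 'yaml'

# Keys the database section should define (assumption based on the settings file above).
REQUIRED_DB_KEYS = %w[host username password schema].freeze

# Hypothetical pre-flight check for the importer's settings.yml.
# Returns a list of human-readable problems; an empty list means nothing was found.
def settings_problems(yaml_text)
  settings = YAML.safe_load(yaml_text)
rescue Psych::SyntaxError => e
  # The reported position is often where the enclosing construct (for example
  # a block mapping) began, so check the next few lines as well.
  ["YAML syntax error near line #{e.line}, column #{e.column}: #{e.problem} #{e.context}".strip]
else
  db = settings.is_a?(Hash) ? settings["database"] : nil
  return ["missing 'database' section"] unless db.is_a?(Hash)

  REQUIRED_DB_KEYS.select { |key| db[key].to_s.strip.empty? }
                  .map { |key| "database.#{key} is empty" }
end

example = <<~YAML
  database:
    host: localhost
    username:
    password:
    schema: phpbb
YAML
puts settings_problems(example)
# database.username is empty
# database.password is empty

# Usage: ruby check_settings.rb path/to/settings.yml
if ARGV[0]
  problems = settings_problems(File.read(ARGV[0]))
  puts(problems.empty? ? "No problems found" : problems)
end
```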