My phpBB Migration Journey (postgresql)

i’m sharing my findings as i make my way through a migration from phpBB3 to discourse.

solutions include:

  • postgres source db
  • emoji
  • tweaks to the importer (fixing quote bug, improved bbcode support (including youtube), attachment comments)
  • soft deleted posts being imported as normal posts

i’m super picky, so if you follow along, you should be able to get a pretty good result.


i’m considering moving to discourse from phpbb (installed version history: 3.2.1 - 3.2.8).
problem is, i use postgresql for the db. suggestions? i haven’t tried it but assume it’s not supported yet based on the OP.

did the new import script ever come out? i see that was a bit over a year ago.

1 Like

The bulk import script should work for you… discourse/phpbb_postgresql.rb at 52d4de7b45eb1b7c7997da2ba8bff83a3c22a3a5 · discourse/discourse · GitHub

4 Likes

thanks, @gerhard

is my understanding correct?

  • script/import_scripts/phpbb3 and script/bulk_import are completely separate importers
  • the bulk importer is less sophisticated (though faster), and, for example, it won’t import attachments?

if so, it sounds like there are downsides to using the bulk importer…

is there no other solution, then? would it be advisable to explore doing a db migration from postgresql to mysql or something?

2 Likes

Yes, there are drawbacks. If you are comfortable working with Ruby, I’d suggest modifying the files in discourse/script/import_scripts/phpbb3/database at main · discourse/discourse · GitHub so that they work with pg gem instead of mysql2.

Maybe? I have no idea how easy that would be and if it will actually work. You can give it a try…

1 Like

bummer. well, i’ll take a crack at it. i’ll follow up if i come up with something.

2 Likes

is that importer the one that will be deprecated? what’s the status of the ‘new’ one? i’m trying to decide how much effort i put into this—will there be upstream value?

2 Likes

I have not dealt with the postgresql to mysql issue myself, but purely from a perspective of migrating from phpBB to Discourse, my opinion is: do whatever it takes to accomplish the migration.

I’ve migrated two phpBB forums to Discourse, and although there were the usual grumblings from a small number of users that one gets with any change, the benefits of Discourse are well worth it! Not only is Discourse easier to maintain and administer, but the built-in user engagement, image-handling, user-customization, and legibility of Discourse are only a few of the features that are so far ahead of phpBB that there is no comparison. You also get much better support with Discourse.

I’m not an expert, but a quick search makes it look like you could migrate from postgresql to mysql by doing a schema dump, changing the data types in the schema statements to match those used by mysql, using the modified schema to create the tables in a mysql database, and then doing a table-by-table CSV export and import.

Once you have the mysql database, you could use the regular phpBB migration script and have all your attachments.

It will be worth the effort!

3 Likes

That sounds like a recipe for disaster if you aren’t careful with database charsets and collations.

A couple of years ago I started working on it, but it’s unfinished and out-of-date. Anyway, maybe it will be helpful.

3 Likes

I didn’t mean to imply that it would be easy, just that it’s worth the effort, even possible to do manually, and there do seem to be plenty of resources out there to accomplish it, both with automated tools and manually. :grinning:

3 Likes

given that i hate ruby, and the existing importer is going to be deprecated in favor of bulk importer (which is not good enough yet for my purposes), i am going ahead with the postgres → mysql migration strategy.

i’m getting somewhere with mysql workbench’s migration wizard. i’ll write up a little guide if it succeeds, but it looks promising so far.

2 Likes

so far, i’m stuck on this issue:

https://bugs.mysql.com/bug.php?id=89048

basically, mysql workbench fails on importing unicode. will try with mariadb tomorrow and see if i can set default db encoding or something before importing.

1 Like

It would have been great if MySQL Workbench would have worked. I saw a bunch of error reports with it when I looked at options, and so I didn’t recommend it.

I mentioned the schema dump and CSV export/import method once before, and I’ll only mention it once more with a slight modification to make it much easier and then shut up.

If you want to go a GUI route:

  1. Get your hands on a structure-only export of all the tables for a working copy of the same version of phpBB3 you’re using that is running MySQL/MariaDB. It would literally take someone two minutes to create one from phpMyAdmin, which is available on most web hosts that run MySQL/MariaDB. (Alternatively, you might be able to get the CREATE TABLE statements you need from the phpBB3 install scripts if you can’t get someone to do the structure-only export for you, or even use an inexpensive web hosting account to do a clean install of your version of phpBB3 into a MySQL/MariaDB environment and then delete the data in the tables it creates to create an empty MySQL/MariaDB database.)
  2. Find access to a web host with MySQL/MariaDB and phpMyAdmin, create the database, import the structure from the structure-only export you created in step 1 using phpMyAdmin.
  3. Create a CSV export for each table from your postgresql database and do a CSV import to the corresponding table using phpMyAdmin in your new, MySQL/MariaDB database.

I think that will prevent any issues with charset and encoding and you won’t have to figure out all the corresponding data types and field sizes that would be required if you tried to manually convert the schema dump from postgre.

I can give you a structure-only SQL statement, but the only DB I still have is from phpBB3 3.3.8, so it may add more problems if you’re using v. 3.2.x.

3 Likes

haha. yeah, it took some non obvious tricks to even get it working as far as i did. i think it still could work, but thanks for giving me another avenue to explore.

1 Like

ok, phew. here is my solution for postgresql import.

this will take you from a postgres dump to a working mariadb server that you can plug the importer into. requires docker compose.

it works for phpbb 3.2. no clue about 3.3. if you are dying for 3.3 support, i can try to fix it if you give me a phpbb 3.3 pg_dumpall dump.

3 Likes

phpbb_smilies.zip (87.2 KB)

(compare to default smilies)

fwiw here’s my emoji solution

i give you two options:

  1. native
    • if you don’t want to preserve the original phpBB emoji, you can map them to native emoji already existing in discourse

image

:grin: :smile: :wink: :slightly_frowning_face: :astonished: :exploding_head: :confounded: :sunglasses: :joy: :rage:
:stuck_out_tongue: :person_facepalming: :sob: :imp: :smiling_imp: :roll_eyes: :exclamation: :question: :bulb: :arrow_forward:
:neutral_face: :smiley_cat: :nerd_face: :nerd_face:

  1. imported
    • if you want to preserve the original phpBB emoji and have them available as shortcodes to continue using:
      • before running the importer, go to the emoji section of the discourse dashboard (admin/customize/emojis)
      • if you want, first create an emoji group called phpbb or something
      • drag the emoji files (5 at a time) in the new_files onto the browser to import them as custom emoji

i have a zip with the config for either option included as import.yml in the zip.

also in the zip is a spreadsheet to help you decide or make your own mapping:

phpbb_smilies
├── phpbb_smilies.xlsx # master reference to help you decide
├── import.yml # config file with both options
├── orig_phpbb_smilies.csv # original phpbb data
├── orig_files # original phpbb emoji files
│   ├── icon_arrow.gif
│   └── ...
└── new_files # emoji files renamed to match the `new_shortcode` column in the spreadsheet
    ├── phpbb_arrow.gif
    └── ...
1 Like

has anyone gotten quote attribution to work correctly?

here’s an example of an original post from phpbb containing a quote (username and content changed, IDs etc are the same)

[quote=someuser post_id=46649 time=1677556325 user_id=48]
foo
[/quote]

bar

here’s how it looks in discourse:

image

here’s the raw post migrated to discourse:

[quote=", post:37, topic:1893"]
foo
[/quote]

bar

the migrated postid and topicid are correct, but the username is missing. this causes it to be noninteractive. when username is an empty string, the quote doesn’t expand, and you can’t click it to follow the reference to the original post.

can i hope for a better experience without making improvements to the importer? i.e. am i just doing something wrong?

1 Like

I’m fairly certain that those worked properly on a phpbb3 import I did in the past month.

1 Like

i haven’t figured out the last problem yet, but here’s another question.

i notice that after the import finishes and you start the app container, my still existing phpbb server gets hammered really hard. i think this is during the sidekiq post processing phase.

phpbb still exists at www.example.com, and discourse is at dc.example.com

i’m trying to understand what’s actually happening, what settings make sense during this test migration, and what settings will make sense for the final migration. and if i need to have phpbb running for that sidekiq post processing. asking because i have no idea what happens in the post processing.

some possibly relevant settings in my current settings.yml:

import:
  # Set this if you import multiple phpBB forums into a single Discourse forum.
  site_name:
  site_prefix:
    # this is needed for rewriting internal links in posts
    original: example.com    # without http(s)://
    new: https://dc.example.com       # with http:// or https://

if there’s other stuff you need to look at, please lmk.

another thing i’m not clear on is www subdomain. i currently redirect to www with nginx for phpbb. so in the example above, does it matter if i put original: example.com vs original: www.example.com? similar question for new when i do the final migration. my users would actually access discourse from www.example.com, but idk what the best practice is.

1 Like

something is indeed wrong with the quote parsing.

i just tried importing a freshly-installed phpBB 3.2.8 (installed on mariadb)

quote attribution is broken – the username field is blank in the imported quotes, but everything else is legit

discourse screenshot of imported post:

image

discourse imported post:

[quote=", post:4, topic:12"]
[quote=", post:3, topic:12"]
[quote=", post:2, topic:12"]
msg 1
[/quote]

second
[/quote]

third
[/quote]

fourth

original phpbb post:

MariaDB [phpbb]> select post_text from phpbb_posts where post_id=5;
...
| <r><QUOTE author="admin" post_id="4" time="1678400691" user_id="2"><s>[quote=admin post_id=4 time=1678400691 user_id=2]</s>
<QUOTE author="admin" post_id="3" time="1678400685" user_id="2"><s>[quote=admin post_id=3 time=1678400685 user_id=2]</s>
<QUOTE author="admin" post_id="2" time="1678400675" user_id="2"><s>[quote=admin post_id=2 time=1678400675 user_id=2]</s>msg 1<e>[/quote]</e></QUOTE>

second
<e>[/quote]</e></QUOTE>

third
<e>[/quote]</e></QUOTE>

fourth</r> |

maybe i will end up making improvements to the importer after all…

3 Likes

fixed


i’m rather picky about my forum’s migration, so i am continuing to improve the importer. not sure if i’ll bother with making PRs since the importer is deprecated, and some of my fixes are semi-specific to my forum, but this branch will have all of my fixes combined in case it’s useful to someone:

i fixed the quotes issue, added support for some commonly-added bbcodes, made the youtube link parsing less broken, and support the mentions/simplementions extension. i still have more stuff to improve like adding support for multiple site prefixes (main use case is when you have links on your forum to example.com and www.example.com).

even though i’m supporting some non-vanilla stuff, it shouldn’t be a problem running it on a standard phpBB forum without extensions. i recommend just using mine in any case.

easiest way to use it is to pull my branch somewhere and clobber the import script dir inside of the container with a bind mount.

i.e. pull my changes somewhere:

git clone --filter=blob:none --no-checkout https://github.com/ftc2/discourse.git discourse_dev
cd discourse_dev
git sparse-checkout set --cone
git switch phpbb_import
git sparse-checkout set script/import_scripts

then add this to your import.yml container config:

docker_args:
  - '-v /path/to/discourse_dev/script/import_scripts:/var/www/discourse/script/import_scripts'

then rebuild the import container. after you rebuild, you will probably want to do a reset on where you pulled my repo because the build process will overwrite my files, lol.

cd /path/to/discourse_dev
git reset --hard HEAD
chown -R 1000:1000 .
5 Likes