Importing / migrating from phpBB3

I’m reporting a fix for the smiley conversion when importing from phpBB 3.0.7.

  • Some smilies weren’t properly converted to Discourse:


    (but not always, for some reason: identical emojis were sometimes displayed and sometimes not, which seemed random at first)

  • Also, some emojis simply vanished: they were present in the phpBB post but missing from the imported Discourse post.

The issue came from the regex used in replace_smilies(text) from

Faulty regex:

<!-- s(\S+) --><img src="\{SMILIES_PATH\}\/.+?" alt=".*?" title=".*?" \/><!-- s?:\S+ -->

Notice how the beginning of the regex doesn’t assume there’s a : character after:
<!-- s
But it does assume there is one at the end of the regex:
<!-- s?:
(I also wonder why the end of the regex has a ? that makes the s optional, while the beginning of the regex doesn’t.)

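I removed this : from the regex, and both of my smiley issues seemed completely resolved.

On my phpBB forum, many smilies do start with a :, like :mrgreen: or :evil:, but some don’t, like 8-) or ;).
The old regex led to faulty smiley captures. For example, multiple smilies next to each other were captured as a single one: when a smiley’s closing comment doesn’t start with : (like <!-- s;) -->), the lazy .+? and .*? parts of the regex keep extending until they reach a later closing comment that does, swallowing every smiley in between. A smiley like 8-) with no later colon-style smiley for the regex to reach isn’t converted at all, which is why the behavior looked random.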


Fixed regex:

<!-- s(\S+) --><img src="\{SMILIES_PATH\}\/.+?" alt=".*?" title=".*?" \/><!-- s?\S+ -->
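To check the difference, here’s a minimal Ruby sketch that runs both regexes over sample lines shaped like the raw phpBB text in my database. Only the two regexes come from the import script; the `convert_smilies` helper and its `[smiley ...]` placeholder are hypothetical stand-ins for the importer’s real emoji lookup, and the sample lines are made up.

```ruby
# Original (faulty) and fixed smiley regexes, copied from above.
OLD_SMILEY_REGEX = /<!-- s(\S+) --><img src="\{SMILIES_PATH\}\/.+?" alt=".*?" title=".*?" \/><!-- s?:\S+ -->/
NEW_SMILEY_REGEX = /<!-- s(\S+) --><img src="\{SMILIES_PATH\}\/.+?" alt=".*?" title=".*?" \/><!-- s?\S+ -->/

# Hypothetical helper: replaces each match with a placeholder showing the captured smiley code.
def convert_smilies(text, regex)
  text.gsub(regex) { "[smiley #{Regexp.last_match(1)}]" }
end

# Two smilies on one line; the first one (;)) has no leading colon.
WINK_THEN_EVIL = <<~'RAW'.chomp
  ok <!-- s;) --><img src="{SMILIES_PATH}/icon_e_wink.gif" alt=";)" title="Clin d\'oeil" /><!-- s;) --> et <!-- s:evil: --><img src="{SMILIES_PATH}/icon_evil.gif" alt=":evil:" title="Diable" /><!-- s:evil: -->
RAW

# A lone smiley without a colon.
LONE_COOL = <<~'RAW'.chomp
  <!-- s8-) --><img src="{SMILIES_PATH}/icon_cool.gif" alt="8-)" title="Cool" /><!-- s8-) -->
RAW

puts convert_smilies(WINK_THEN_EVIL, OLD_SMILEY_REGEX) # ok [smiley ;)]   (:evil: swallowed into the first match)
puts convert_smilies(WINK_THEN_EVIL, NEW_SMILEY_REGEX) # ok [smiley ;)] et [smiley :evil:]
puts convert_smilies(LONE_COOL, OLD_SMILEY_REGEX)      # unchanged: no match at all
puts convert_smilies(LONE_COOL, NEW_SMILEY_REGEX)      # [smiley 8-)]
```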

I’m not fixing the code directly in the Discourse repo because I’m not used to git, and I’m also not sure whether the change would affect imports from other phpBB versions. I don’t want to break anything.


Anyway, if people come across the same issues as me, here’s the solution.


Here’s another fix that could help people in my situation during a phpBB 3.0.7 migration.

For some reason, my phpBB posts sometimes had multiple consecutive spaces at the beginning of lines. I suspect some users “like” to frantically press the space bar without paying attention when writing their messages. It didn’t matter in phpBB, since the rendered page ignored these extra spaces:

Raw phpBB text content:

Salut tous  <!-- s:) --><img src="{SMILIES_PATH}/icon_e_smile.gif" alt=":)" title="Sourire" /><!-- s:) --> 
  
     Alors voilà, le combi n'a pas roulé beaucoup ces derniers temps cause CT pas OK  <!-- s:evil: --><img src="{SMILIES_PATH}/icon_evil.gif" alt=":evil:" title="Diable" /><!-- s:evil: --> mais il a fait ces 2000 kms sans broncher  <!-- s;) --><img src="{SMILIES_PATH}/icon_e_wink.gif" alt=";)" title="Clin d\'oeil" /><!-- s;) -->  
Maintenant le CT est OK . Merci L'Atelier Du Raz  8-')

    Je dois donc changer le joint-spi au bout de 40 000 kms en 10 ans  <!-- s:roll: --><img src="{SMILIES_PATH}/icon_rolleyes.gif" alt=":roll:" title="Yeux tournants" /><!-- s:roll: --> 
C'est un silicone et j'ai vu qu'il y avait des &quot;doubles lèvres &quot; !? 
What's About ?

             Je trouve ça un peu limte  <!-- s:evil: --><img src="{SMILIES_PATH}/icon_evil.gif" alt=":evil:" title="Diable" /><!-- s:evil: --> 
Merci tous, fred

Rendered page in the browser:


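But during the phpBB → Discourse import, these leading spaces turned those lines into code blocks (in Markdown, a line indented by four or more spaces after a blank line is rendered as a code block):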

This is how it should be displayed:


I fixed it by adding a regex that replaces the spaces at the beginning of each line with a newline:

 text.gsub!(/^[^\S\r\n]+/, "\n")

I added this just before process_smilies(text) in this file:

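Here’s a small Ruby sketch of that cleanup on a made-up sample shaped like the post above. As written, the fix replaces the leading whitespace with a newline rather than deleting it, so each previously indented line ends up starting a new paragraph; replacing with "" removes the spaces outright. Both variants are shown, and the helper names are hypothetical.

```ruby
# Horizontal whitespace (spaces, tabs) at the start of a line, but not CR/LF.
# In Ruby, ^ matches at the beginning of every line, not just of the string.
LEADING_WHITESPACE = /^[^\S\r\n]+/

# Variant used in the fix above: the leading whitespace becomes a newline.
def break_before_indented_lines(text)
  text.gsub(LEADING_WHITESPACE, "\n")
end

# Alternative: simply delete the leading whitespace.
def strip_leading_spaces(text)
  text.gsub(LEADING_WHITESPACE, "")
end

SAMPLE = "Hello all\n  \n     So here it is\nNow the CT is OK"

p strip_leading_spaces(SAMPLE)
# => "Hello all\n\nSo here it is\nNow the CT is OK"
p break_before_indented_lines(SAMPLE)
# => "Hello all\n\n\n\nSo here it is\nNow the CT is OK"
```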

Another issue I encountered.
In this code (still in text_processor.rb):

    def clean_bbcodes(text)
      # Many phpbb bbcode tags have a hash attached to them. Examples:
      #   [url=https&#58;//google&#46;com:1qh1i7ky]click here[/url:1qh1i7ky]
      #   [quote=&quot;cybereality&quot;:b0wtlzex]Some text.[/quote:b0wtlzex]
      text.gsub!(/:(?:\w{8})\]/, ']')

In my database, these hashes are between 5 and 8 characters long, but the regex only removes hashes of exactly 8 characters, so my import kept the shorter hashes instead of removing them.
I fixed this by changing the regex to:

text.gsub!(/:(?:\w{5,8})\]/, ']')
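A quick Ruby check of both regexes on made-up samples, one with a 5-character hash and one with an 8-character hash. A side effect of widening the range is worth keeping in mind: any : followed by 5 to 8 word characters and a ] is stripped, even if it isn’t a BBCode hash.

```ruby
OLD_HASH_REGEX = /:(?:\w{8})\]/
NEW_HASH_REGEX = /:(?:\w{5,8})\]/

# Made-up samples: a 5-character hash and an 8-character hash.
SHORT_HASH = "[b:1qh1i]bold[/b:1qh1i]"
LONG_HASH  = "[quote=&quot;cybereality&quot;:b0wtlzex]Some text.[/quote:b0wtlzex]"

p SHORT_HASH.gsub(OLD_HASH_REGEX, "]") # => "[b:1qh1i]bold[/b:1qh1i]"  (left untouched)
p SHORT_HASH.gsub(NEW_HASH_REGEX, "]") # => "[b]bold[/b]"
p LONG_HASH.gsub(NEW_HASH_REGEX, "]")  # => "[quote=&quot;cybereality&quot;]Some text.[/quote]"
```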

I had one more minor issue, still in the same file. The regex that removes [color] BBCode tags expects a hexadecimal value preceded by a mandatory #. But [color] also accepts color names such as “red” or “blue” as values. So I modified the original regex:

      # remove color tags
      text.gsub!(/\[\/?color(=#[a-z0-9]*)?\]/i, "")

I added a ? after the # to make it optional.
Fixed code:

      # remove color tags
      text.gsub!(/\[\/?color(=#?[a-z0-9]*)?\]/i, "")
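A quick Ruby check on made-up samples: with the original regex, a named color leaves its opening tag behind (only [/color] is removed), while the fixed regex strips both tags for named and hexadecimal values.

```ruby
OLD_COLOR_REGEX = /\[\/?color(=#[a-z0-9]*)?\]/i
NEW_COLOR_REGEX = /\[\/?color(=#?[a-z0-9]*)?\]/i

HEX_COLOR   = "[color=#FF0000]warning[/color]"
NAMED_COLOR = "[color=red]warning[/color]"

p NAMED_COLOR.gsub(OLD_COLOR_REGEX, "") # => "[color=red]warning"  (opening tag left behind)
p NAMED_COLOR.gsub(NEW_COLOR_REGEX, "") # => "warning"
p HEX_COLOR.gsub(NEW_COLOR_REGEX, "")   # => "warning"
```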

I don’t know if my issues are common in phpBB imports or very specific to my case. If the latter, I hope my explanations here aren’t unwelcome or superfluous. Just let me know if that’s the case so it won’t be awkward. :grinning_face_with_smiling_eyes:


Edit: Is it possible, after a migration, to mark all existing topics as “read” for every existing user?

The goal is to prevent existing users who click on existing (and sometimes old) topics after the migration from landing on the first post of topics they had already read before the migration.

Ideally, clicking on an existing topic would open the last post (as of the end of the migration, of course) rather than the first one.

It’s a small quality of life issue though (and it will naturally vanish after a few weeks as the users use the forum and read topics), but I was asked about this suggestion.


Thank you for sharing these fixes!

I’ve had to do similar adjustments to regexes for past migrations, so these will be helpful for future phpbb imports.

This topic may be helpful: How to mark imported posts as read - #2 by stuwest


That’s all pretty typical.

I think the changes you suggest likely won’t break any other imports. Often there’s a set of changes like these plus a bunch of other changes that are specific to the import at hand. Figuring out which are which, and testing again with only the general ones, would be a lot of work, so a PR doesn’t get created.

Glad you got it done!


Thank you Constanza, very helpful link. :+1:


Thanks for the information, Jay. I’ll submit a PR after I’m done with the migration, if it can help make the phpBB3 migration tool even better.

The current script also ignores [size=XXX] tags which I encountered during other migrations.
On my current migration, I added a quick script to my importer that uses the value of XXX to replace these tags with <small>content</small> or <big>content</big>, since Discourse supports those.
But that was a personal preference, and generally speaking it might be more appropriate to simply strip these [size] tags, like the import script already does with [color] tags.
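A minimal sketch of the <small>/<big> conversion described above, assuming the [size] value is a percentage where 100 means normal size (as in phpBB 3.0) and that the BBCode hashes were already stripped by clean_bbcodes. The convert_size_tags helper and its thresholds are hypothetical, not part of the import script, and nested [size] tags aren’t handled.

```ruby
# Matches [size=N]...[/size]; non-greedy so consecutive tags are converted separately.
SIZE_TAG = %r{\[size=(\d+)\](.*?)\[/size\]}m

# Hypothetical helper: below 100% -> <small>, above 100% -> <big>, exactly 100% -> plain text.
def convert_size_tags(text)
  text.gsub(SIZE_TAG) do
    size = Regexp.last_match(1).to_i
    content = Regexp.last_match(2)
    if size < 100
      "<small>#{content}</small>"
    elsif size > 100
      "<big>#{content}</big>"
    else
      content
    end
  end
end

p convert_size_tags("[size=85]tiny[/size] and [size=150]huge[/size] and [size=100]same[/size]")
# => "<small>tiny</small> and <big>huge</big> and same"
```

Dropping the tags entirely, as suggested above, would just be the `else` branch for every size.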


Thanks @Canapin for all your feedback. A PR is definitely welcome!

Also, a small teaser: We are working on a solution which should make all imports – not just phpBB – a lot better, faster, easier to customize and get rid of those pesky problems with the BBCode conversion… :wink:


While importing my posts, I got a bunch of these errors (not for all posts):

   251491 / 251672 ( 99.9%)  [14140 items/min]  Exception while creating post 354629. Skipping.
undefined method `[]' for nil:NilClass
/var/www/discourse/script/import_scripts/phpbb3/importers/post_importer.rb:66:in `block in map_first_post'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/core_ext/object/try.rb:15:in `public_send'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/core_ext/object/try.rb:15:in `try'
/var/www/discourse/script/import_scripts/base.rb:576:in `create_post'
/var/www/discourse/script/import_scripts/base.rb:523:in `block in create_posts'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-2.3.3/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-2.3.3/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/script/import_scripts/base.rb:510:in `create_posts'
/var/www/discourse/script/import_scripts/phpbb3/importer.rb:192:in `block in import_posts'
/var/www/discourse/script/import_scripts/base.rb:874:in `block in batches'
/var/www/discourse/script/import_scripts/base.rb:873:in `loop'
/var/www/discourse/script/import_scripts/base.rb:873:in `batches'
/var/www/discourse/script/import_scripts/phpbb3/importer.rb:254:in `batches'
/var/www/discourse/script/import_scripts/phpbb3/importer.rb:188:in `import_posts'
/var/www/discourse/script/import_scripts/phpbb3/importer.rb:38:in `execute'
/var/www/discourse/script/import_scripts/base.rb:47:in `perform'
/var/www/discourse/script/import_scripts/phpbb3/importer.rb:22:in `perform'
script/import_scripts/phpbb3.rb:33:in `<module:PhpBB3>'
script/import_scripts/phpbb3.rb:14:in `<module:ImportScripts>'
script/import_scripts/phpbb3.rb:13:in `<main>'

But I couldn’t figure out what causes this.

Any clue?

How can I check whether posts are actually missing? I couldn’t find how to output the data (the post content, for example) corresponding to “Exception while creating post 354629”.

I can’t recall having this error in my other import tests the previous days, but maybe I just didn’t pay attention… Could it be related to this known issue?

251490 / 251672 ( 99.9%)  [14140 items/min]  Parent post 337703 doesn't exist. Skipping 354628: vw-camper est malade !?

I ran the import script twice, as recommended for this particular issue.


Make sure that you set tag_mappings: {} in your settings.yml file if you aren’t using this feature.
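One plausible explanation for the undefined method `[]' for nil:NilClass error, assuming the importer indexes the tag_mappings setting directly: a YAML key that is present but left empty (for example, with only commented-out entries below it) loads as nil rather than as an empty hash, and indexing nil raises exactly that kind of NoMethodError. A minimal check with Ruby’s standard YAML library:

```ruby
require "yaml"

# Key present but empty (only a commented-out entry under it): loads as nil.
EMPTY_KEY = YAML.safe_load("tag_mappings:\n  # old_tag: new_tag\n")
# Key explicitly set to an empty hash, as recommended above.
EMPTY_HASH = YAML.safe_load("tag_mappings: {}\n")

p EMPTY_KEY["tag_mappings"]  # => nil
p EMPTY_HASH["tag_mappings"] # => {}

begin
  EMPTY_KEY["tag_mappings"]["some_tag"] # indexing nil
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end

p EMPTY_HASH["tag_mappings"]["some_tag"] # => nil, no exception
```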


I feel dumb! I even thought about doing that, but I was so sure I hadn’t encountered this error before that I didn’t try… :facepalm:

It resolved the issue, thank you very much.


I suppose you can’t say much at this time, but do you have any hints about how it will work, and when usable import scripts based on it will be available?
I might migrate another phpBB this year, so that could be very interesting for me to know a bit more. :slight_smile:


Hi, I’m testing a phpBB 3.2 migration to Discourse for a decently sized forum (30k topics / 600k posts). Almost everything works fine, apart from a couple of topics that aren’t imported (“Parent post xx doesn’t exist”), which multiple runs don’t fix, but that’s not a big issue.
My main problem is that subsequent imports with fresh data (verified with the sha256sum in the import/mysql/imported file) don’t import new posts into Discourse. I’m hitting an exception early in the process; I don’t know if it’s related:

Failed to map post with ID 6815

BIGINT UNSIGNED value is out of range in '(`phpbb_prod`.`o`.`poll_option_total` - (select count(distinct `phpbb_prod`.`v`.`vote_user_id`) from `phpbb_prod`.`phpbb3_poll_votes` `v` join `phpbb_prod`.`phpbb3_users` `u` join `phpbb_prod`.`phpbb3_topics` `t` where ((`phpbb_prod`.`u`.`user_id` = `phpbb_prod`.`v`.`vote_user_id`) and (`phpbb_prod`.`v`.`topic_id` = `phpbb_prod`.`t`.`topic_id`) and (`phpbb_prod`.`v`.`poll_option_id` = `phpbb_prod`.`o`.`poll_option_id`) and (`phpbb_prod`.`t`.`topic_id` = `phpbb_prod

Judging from the SQL shown, is this an issue with polls embedded in posts?

Have a nice day!


I can’t say much yet as it’s still under development. Plans can change… But “this year” is a good bet. :wink:


BIGINT UNSIGNED value is out of range

Hmm, are there that many anonymous votes in this poll?

You will need to fiddle with this SQL:

Something like this maybe? mysql - BIGINT UNSIGNED VALUE IS out of range My SQL - Stack Overflow
Please let me know when you’ve found a solution or create a PR with a fix. I’d appreciate it.


The polls are from the very beginning of the forum. I wasn’t in charge at the time, but I think anonymous users were never allowed to create or answer polls.

With

 SET sql_mode = 'NO_UNSIGNED_SUBTRACTION';

I’m able to query all poll votes, and for 2 of them I get negative “anonymous_votes”. It seems this forum somehow allowed anonymous voting after all :slight_smile:

Thanks @gerhard for the tip, I’m going to dig deeper…
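If changing sql_mode isn’t an option, one way to sidestep the unsigned subtraction is to fetch the option total and the count of distinct registered voters separately and subtract them in Ruby, where integers are signed, clamping negative results to zero. This is only a sketch: anonymous_votes is a hypothetical helper, not part of the import script.

```ruby
# Hypothetical helper: anonymous votes = option total minus distinct registered voters.
# Ruby integers are signed, so this can't overflow like BIGINT UNSIGNED does in MySQL;
# a negative result (inconsistent counters in the phpBB data) is clamped to 0.
def anonymous_votes(poll_option_total, registered_votes)
  [poll_option_total - registered_votes, 0].max
end

p anonymous_votes(12, 9) # => 3
p anonymous_votes(4, 6)  # => 0
```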


I’m currently looking into migrating a phpBB forum to Discourse. The problem is that our phpBB uses PostgreSQL instead of MySQL, so I would like to adapt the script to work with it too.

Could anyone give me some pointers on the best way to go about this? Ideally I’d like everyone to benefit from this work, so I’d like to use an approach that can be accepted upstream instead of a quick and dirty hack.

From what I can tell, it’s mostly a matter of adding the proper SQL statements for the PostgreSQL schema and using the right DB adapter. However, I’m not sure how the Docker container orchestration needs to be adapted to spin up a PostgreSQL instance (when the config specifies that DB type), import the DB dump into it, and execute the statements against it.


There’s a bulk import script for phpBB running on PostgreSQL. Did you see it? discourse/phpbb_postgresql.rb at main · discourse/discourse · GitHub

However, I don’t think it has been used recently, so it’s probably broken and might not work with current phpBB versions because the post storage format changed.

:+1:

Whatever path you choose, I wouldn’t spend too much time gold-plating your solution unless you really want to. The current import scripts will be deprecated sometime this year…


Thanks, I wasn’t aware of that! And thanks for the heads-up that the scripts are going to be deprecated. I guess it might not be worth upstreaming such a major feature if they’ll be deprecated soon anyway? Is there a replacement planned?


Yes, of course there will be a replacement.


Another important question: do I need to set anything other than disable_edit_notifications to ensure that none of the imported users get emailed by Discourse about anything I import? The instance isn’t public yet, and we’re just experimenting with the import and will probably need several tries until it’s good enough, so I want to avoid sending any emails.


You are looking for the disable_emails site setting…


While [b]text[/b] is supported inline in Discourse, it isn’t interpreted for blocks of text that contain blank lines.

[b]For example, if I have this text in my post…

And I continue my line after an empty line, I close the bbcode and the text won’t be converted to bold text[/b]

Same for [i] and probably other tags.

I can’t find a clean solution for this in my import. Any ideas? :man_shrugging:
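One possible workaround, sketched here as an assumption rather than something the import script does today, is to close and reopen the tag at every paragraph break, so that no single [b], [i] or [u] pair spans a blank line. The split_tags_across_paragraphs helper is hypothetical and doesn’t handle nested tags: [b][i]…[/i][/b] spanning paragraphs would end up with mismatched tags.

```ruby
# [b]...[/b], [i]...[/i] or [u]...[/u]; \1 makes the closing tag match the opening one.
SIMPLE_TAG = %r{\[(b|i|u)\](.*?)\[/\1\]}m
# A blank line (possibly containing spaces) between two paragraphs.
PARAGRAPH_BREAK = /\n\s*\n/

# Hypothetical helper: re-wrap each paragraph separately so no tag spans a blank line.
def split_tags_across_paragraphs(text)
  text.gsub(SIMPLE_TAG) do |match|
    tag = Regexp.last_match(1)
    body = Regexp.last_match(2)
    next match unless body.match?(PARAGRAPH_BREAK) # single paragraph: leave as-is

    body.split(PARAGRAPH_BREAK)
        .reject(&:empty?)
        .map { |para| "[#{tag}]#{para}[/#{tag}]" }
        .join("\n\n")
  end
end

p split_tags_across_paragraphs("[b]First paragraph\n\nSecond paragraph[/b]")
# => "[b]First paragraph[/b]\n\n[b]Second paragraph[/b]"
```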
