Migrate a phpBB3 forum to Discourse

@gerhard: Could I achieve the sanitizing of badly encoded characters after the migration by using Sidekiq? I am really new to Discourse, so I may not understand how it works.
Or maybe with an SQL script run directly against MariaDB?

Thanks a lot!

Sure! You could also just edit each post by hand. It’s on the order of 10X easier to do it beforehand, when it’s possible to just start over again.

I tried a bunch of stuff to try to fix up the encoding on the MariaDB side, but didn’t come up with a solution. Here’s some code that I used to fix up the encoding on an import that I’m working on.

    ### WIN1252 encoding
    win_encoded = ''
    begin
      win_encoded = raw.force_encoding('utf-8').encode("Windows-1252",
                            invalid: :replace, undef: :replace, replace: ""
                           ).force_encoding('utf-8').scrub
    rescue => e
      puts "\n#{'-'*50}\nWin1252 failed for \n\n#{raw}\n\n"
      win_encoded = ''
    end

nice thanks :slight_smile:


It was a painful experience. I tried multiple encodings and included multiple ones in the post so that I could compare them. This one seemed to solve most problems most of the time. It took me much longer than I’d have liked to figure out .scrub, as without it I’d end up with strings that could no longer be parsed with gsub.
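To illustrate what that chain does and why .scrub matters, here is a minimal, self-contained sketch. The helper name `fix_win1252` and the sample strings are made up for the illustration; the chain itself is the one from the snippet above, applied to a copy of the string.

```ruby
# Hypothetical helper name; it wraps the same chain as the snippet above,
# but works on a copy so the caller's string is left untouched.
def fix_win1252(raw)
  raw.dup.force_encoding('UTF-8')
     .encode('Windows-1252', invalid: :replace, undef: :replace, replace: '')
     .force_encoding('UTF-8')
     .scrub
end

# Build a double-encoded sample: UTF-8 bytes misread as Windows-1252
# and then converted to UTF-8 again.
garbled = "It\u2019s".dup.force_encoding('Windows-1252').encode('UTF-8')
fixed   = fix_win1252(garbled)
puts fixed

# A lone Latin-1 byte is not valid UTF-8. scrub swaps it for U+FFFD,
# which leaves a string that is valid UTF-8 again.
broken   = "caf\xE9".dup.force_encoding('UTF-8')
scrubbed = broken.scrub
puts broken.valid_encoding?
puts scrubbed.valid_encoding?
```

Note that this chain is only safe on text that really is double-encoded. Run on correctly encoded text, it turns accented Latin characters into replacement characters and drops anything Windows-1252 cannot represent.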

Does anyone yet have a good solution to import nested lists from phpBB?

I’m not enough of a Ruby guy to know how to approach it. I’m sitting over this:

    def process_lists(text)
      # convert list tags to ul and list=1 tags to ol
      # list=a is not supported, so handle it like list=1
      # list=9 and list=x have the same result as list=1 and list=a
      text.gsub!(/\[list\](.*?)\[\/list:u\]/mi) do
        $1.gsub(/\[\*\](.*?)\[\/\*(:m)?\]\n*/mi) { "* #{$1}\n" }
      end
      text.gsub!(/\[list=.*?\](.*?)\[\/list:o\]/mi) do
        $1.gsub(/\[\*\](.*?)\[\/\*(:m)?\]\n*/mi) { "1. #{$1}\n" }
      end
    end

Does anyone have an idea what “looping” over a nested list could look like?

Here’s a BBCode example:

[list]
   [*]
   [list=a]
      [*]a
      [*]b
      [*]c
      [*]d
      [*]e
   [/list]
   [*]outer list
[/list]
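
One way the “looping” could look, as a rough sketch rather than anything taken from the importer: split the text into tags and plain text, then keep a stack of open lists so each `[*]` knows its nesting depth. The method name `nested_lists_to_markdown`, the optional `:suffix` handling on tags, and the four-space indent per level are all assumptions made for this illustration.

```ruby
# Hypothetical sketch, not part of the Discourse phpBB3 import script.
# Matches [list], [list=x], [/list], [*] and [/*], each with an optional
# ":suffix" as seen in the regexes above (e.g. [/list:u], [/*:m]).
LIST_TOKEN = %r{(\[list(?:=[^\]]*|:\w+)?\]|\[/list(?::\w+)?\]|\[\*(?::\w+)?\]|\[/\*(?::\w+)?\])}i

def nested_lists_to_markdown(text)
  stack = [] # one entry per open list (:ul or :ol), innermost last
  out = +''
  # split with a capture group returns plain text and tags alternately
  text.split(LIST_TOKEN).each do |part|
    case part
    when /\A\[list(?::\w+)?\]\z/i
      stack.push(:ul)
    when /\A\[list=[^\]]*\]\z/i
      stack.push(:ol)
    when %r{\A\[/list(?::\w+)?\]\z}i
      stack.pop
    when /\A\[\*(?::\w+)?\]\z/
      next if stack.empty?
      marker = stack.last == :ol ? '1.' : '*'
      out << "\n" unless out.empty? || out.end_with?("\n")
      # assumption: four spaces of indentation per nesting level
      out << ('    ' * (stack.size - 1)) << marker
    when %r{\A\[/\*(?::\w+)?\]\z}
      next
    else
      if stack.empty?
        out << part # text outside any list is passed through unchanged
      else
        item = part.strip.gsub(/\s*\n\s*/, ' ')
        out << ' ' << item unless item.empty?
      end
    end
  end
  out
end

sample = "[list]\n   [*]\n   [list=a]\n      [*]a\n      [*]b\n   [/list]\n   [*]outer list\n[/list]\n"
puts nested_lists_to_markdown(sample)
```

The empty outer `[*]` is kept as a bare `*` item so the inner list still has a parent. Unbalanced or malformed tags are not handled, so this would need testing against real posts.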

@helmi

Have you looked at how this official Discourse plugin could help you with LIST tags?


I have not, but I will, thanks. Though I try to avoid using it, as I don’t want to support additional BBTags in the long run. I really would prefer to convert the stuff on import. The nested lists aren’t a deal breaker, but it would be great to have them.

You could create a backup, upgrade to phpBB 3.2 and use my experimental branch. It has much better support for BBCodes. :wink:


That sounds interesting, but I’d rather not fiddle with phpBB anymore. Apart from the lists, the import process is more or less streamlined already. That would kind of set us back into more work and testing. But good to see the 3.2 progress coming along nicely.

Hey @helmi

Yes, we have been wrangling with these kinds of issues for a month now, migrating a legacy vB3 forum with nearly two decades of all kinds of crazy uses of bbcode, nesting, embedding, etc.

It is non-trivial to get this kind of migration up to 99.9% perfect. For example, we wrote Ruby code to strip all BBCode tags from our code tags because Markdown does not “like” BBCode inside fenced code blocks.

On our end, we are still finalizing a lot of Ruby preprocessing routines and we are getting closer and closer, but we will never get to “perfection” or “100%” with nearly two decades of posts by some very creative BBCode users (not to mention all those users who cut and pasted into posts, etc.).

We are still working on cleaning up issues related to the BBCode-to-Markdown conversion.

Sometimes I think “just strip it all away” … LOL


Looks like we’re still facing some issues with usernames getting modified or not imported at all when they contain some special characters.

Are there any known issues with the importer, or am I getting the Unicode username option in Discourse wrong? I enabled it, but the whitelist still contains umlauts. Do I need to delete the whitelist to allow all Unicode characters, or do I need to list all characters? My thought was that enabling Unicode allows all Unicode characters and disabling it only allows the whitelisted ones.

When importing, it looks like some characters like @ or * are replaced by _. I can imagine how @ could cause problems with @mentioning, but… I just wanted to make sure before we manually deal with all those users before importing (which would be a big hassle).

Any hints on that?

Cheers,
Frank

Even when you enable Unicode support, it still only allows letters and numbers. See Unicode usernames and group names
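
To get a feel for how many accounts would be affected before importing, a small check like the following could help. This is only a sketch: the allowed character set (Unicode letters and numbers plus `_`, `.` and `-`) is an assumption for illustration and not Discourse's actual validation, and `usernames_needing_review` is a made-up name.

```ruby
# Assumption: Unicode letters and numbers plus "_", "." and "-" are fine;
# anything else is flagged for a manual look. This is NOT Discourse's
# real username validator, just an approximation for a pre-import check.
SAFE_USERNAME = /\A[\p{L}\p{N}_.\-]+\z/

# Returns the usernames that contain at least one other character.
def usernames_needing_review(usernames)
  usernames.reject { |name| name.match?(SAFE_USERNAME) }
end

names = ["J\u00FCrgen", 'foo@bar', 'star*man', 'plain_user-1']
puts usernames_needing_review(names)
```

The list could come from the usernames column of the phpBB users table; the exact query is left out here.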


Hello,

First of all, thank you for all the documentation and the help of this topic.
I’ve just imported an old phpBB3 forum into a brand new Discourse one.

Everything ran quite smoothly with the 200k posts and 20k attachments, but Sidekiq had a little trouble digesting everything after importing.

I’m now facing a new issue with image tags that were wrapped in url tags, like this:
[url=http://www.casimages.com][img]http://nsm01.casimages.com/img/2009/04/24//090424092900546293539010.jpg[/img][/url]

After importing (converting BBCode to Markdown), these images just look like links:
[nsm01.casimages.com/img/2009/04/24//090424092900546293539010.jpg](http://www.casimages.com)

Is there a way to process/rebake these links so that they are displayed as images and can be automatically uploaded to S3?
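
One possible approach, sketched under the assumption that the raw BBCode can still be preprocessed before it is converted: rewrite an `[img]` wrapped in `[url=...]` into a Markdown linked image. The method name `linked_images_to_markdown` is made up for this illustration, and whether the result is then downloaded and uploaded to S3 would still depend on the site settings.

```ruby
# Hypothetical preprocessing step, not part of the import script.
# Matches [url=TARGET][img]IMAGE[/img][/url], allowing whitespace
# between the tags.
URL_WRAPPED_IMG = %r{\[url=([^\]]+)\]\s*\[img\]([^\[\]\s]+)\[/img\]\s*\[/url\]}i

# Rewrites each match as a Markdown image wrapped in a link to TARGET.
def linked_images_to_markdown(raw)
  raw.gsub(URL_WRAPPED_IMG) do
    target = Regexp.last_match(1)
    image  = Regexp.last_match(2)
    "[![](#{image})](#{target})"
  end
end

bbcode = '[url=http://www.casimages.com][img]http://nsm01.casimages.com/img/2009/04/24//090424092900546293539010.jpg[/img][/url]'
puts linked_images_to_markdown(bbcode)
```

Text that does not match the pattern is returned unchanged, so the step can be run over every post.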

Thanks in advance for your help

Search system settings for “download remote”

EDIT: Oops. Looks like that wasn’t your problem. Sorry.

@pfaffman

Thank you, Jay. Yes, I already enabled the “download remote” setting. My problem is that these images are displayed as links (the image URL is only the displayed text of the link, and the link target is not an image).


Hi,

I currently have a forum imported from phpBB.
I’m thinking about importing another phpBB forum (36000 messages, 230 members) into my existing Discourse installation. They share common categories and users.

User accounts could be manually merged.
As for the messages, after the import they would be manually moved into existing categories.

Would that work more or less out of the box with the import script, or would it be mayhem?

What happens if I import a phpBB forum and some user emails already exist in my current Discourse?

You’ll need to wipe the import_id custom fields, or the new posts will be considered to already have been imported.

This sounds like something you should expect a lot of mucking around in a local install for, using a backup of your main database.


Thank you for this valuable information that I didn’t think about :+1:t6:
Yeah, I’ll do that and try on a test server to see if it can be managed.

About the context:

I host a national Discourse forum about unicycling.
I also host a phpBB forum for a local unicycle association.
Both forums share some of the same users, and some topics or discussion subjects are redundant. The idea would be to merge the local association forum into the national forum so there are no more redundant topics, and to create a group/category for the topics related to the local association.

Plus, people who are only on the local association forum will join the national forum this way and maybe bring more activity.

Does anyone have an idea what the reason may be for MySQL not running in the import container?

I’ve done quite a few runs and it always worked, but now I’m getting

Can't connect to local MySQL server through topic:30810

once I try starting the import. I triple-checked everything and rebuilt the import container several times. Everything is there, but MySQL doesn’t seem to be running. :frowning:

I’m a bit horrified, as this is actually our real move to Discourse today.

:question: :question::question: That looks strange. Where’s that coming from?