Imported MyBB database character encoding issue

Hi, I have a bunch of posts and even usernames imported from a MyBB forum that are displaying random characters like ’ and Â

From what I can glean from reports of similar behavior in WordPress, this may be a Latin1 vs UTF8 encoding issue?

Is there an easy way to remove these after the fact?

What do the characters actually relate to? - I can’t think what original characters they might have been substituted for.

Also I see some imported posts contain a bunch of un-parsed MyCode - is there a way for this to be parsed in Discourse?

Yes. That’s my guess. I’m working with an import now with similar problems. Most of them are things like curly quotes and em dashes.
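To see where the odd characters come from, here is a minimal sketch that simulates the mix-up. It assumes the text was stored as UTF-8 and then read back as Windows-1252 (what MySQL loosely calls latin1). The `garble` helper is a made-up name for illustration only.

```ruby
# Hypothetical helper: take the UTF-8 bytes of a string, pretend they are
# Windows-1252, and convert the result back to UTF-8.
def garble(text)
  text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

puts garble("\u2019")          # right single (curly) quote, bytes E2 80 99
puts garble("\u2014")          # em dash, bytes E2 80 94
puts garble("\u00A0").inspect  # non-breaking space, bytes C2 A0
```

Each multi-byte character turns into two or three single-byte characters, which is why a curly quote shows up as three characters starting with â and a non-breaking space shows up with a stray Â in front of it.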

It’s far from easy, but you can do some post-processing that either does a force_encoding or attempts to replace the characters one-by-one.

Something like

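Post.find_each do |post|
  # Turn the garbled text back into Windows-1252 bytes, then read those bytes as UTF-8
  fixed = post.raw.encode("Windows-1252").force_encoding("UTF-8")
  # Skip posts that were already fine (re-reading them would give invalid UTF-8)
  next unless fixed.valid_encoding? && fixed != post.raw
  post.raw = fixed
  post.save!
  post.rebake!
rescue Encoding::UndefinedConversionError
  # raw contains characters that do not exist in Windows-1252, so leave the post alone
end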

But I’d test it extensively on a staging site before you run it on your live data.
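For the one-by-one approach, a rough sketch could look like the following. The table and the `replace_mojibake` name are assumptions for illustration, and the list only covers a few common sequences; you would extend it with whatever actually shows up in your data.

```ruby
# Hypothetical lookup table: garbled sequence => intended character.
MOJIBAKE = {
  "\u00E2\u20AC\u2122" => "\u2019", # â€™ -> right single quote
  "\u00E2\u20AC\u02DC" => "\u2018", # garbled left single quote
  "\u00E2\u20AC\u0153" => "\u201C", # garbled left double quote
  "\u00E2\u20AC\u201D" => "\u2014", # garbled em dash
  "\u00E2\u20AC\u201C" => "\u2013", # garbled en dash
  "\u00E2\u20AC\u00A6" => "\u2026", # garbled ellipsis
  "\u00C2\u00A0"       => "\u00A0", # Â + non-breaking space -> non-breaking space
}.freeze

# One regexp that matches any of the garbled sequences above.
MOJIBAKE_PATTERN = Regexp.union(MOJIBAKE.keys)

# Replace every known garbled sequence; anything not in the table is left as-is.
def replace_mojibake(text)
  text.gsub(MOJIBAKE_PATTERN, MOJIBAKE)
end

puts replace_mojibake("It\u00E2\u20AC\u2122s fixed")
```

This is slower to set up than re-encoding the whole string, but it never touches text it does not recognise, so it is safer on posts that mix good and bad characters.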

Thanks Jay.
Is there any clever way of dealing with the issue at source, i.e. re-exporting the database from the old forum, then re-importing free of the character and MyCode issues?

If you’ve not gone live, so that starting over is still an option, that’s the best way to do it.

Site is not officially live - but what is the best way to deal with character issues and MyCode parsing when exporting from MyBB?

Exporting all data in UTF-8, if possible, will solve those issues.
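Before re-importing, it may be worth checking that the export really is clean UTF-8. A minimal sketch, assuming you have the export as a text file; `suspicious_lines` and the pattern are made up for illustration, and a lone Â can be legitimate in some languages, so treat hits as things to look at rather than definite errors.

```ruby
require "stringio"

# Telltale sequences from UTF-8 text that was read as Windows-1252.
SUSPECT = Regexp.union("\u00E2\u20AC", "\u00C2")

# Hypothetical helper: return [line_number, reason] pairs for lines that are
# not valid UTF-8 or that contain a suspect sequence.
def suspicious_lines(io)
  io.each_line.with_index(1).filter_map do |line, number|
    text = line.dup.force_encoding("UTF-8")
    if !text.valid_encoding?
      [number, :invalid_utf8]
    elsif text.match?(SUSPECT)
      [number, :possible_mojibake]
    end
  end
end

# Small in-memory sample standing in for an export file.
sample = StringIO.new("fine\nIt\u00E2\u20AC\u2122s\n".b + "\xE9\n".b)
p suspicious_lines(sample)
```

For a real export you would pass an open file instead of the StringIO, for example File.open(path, "rb") { |f| suspicious_lines(f) }, where path is wherever your dump lives.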

I went back to the original MyBB installation, and found a warning in the admin control panel under Tools and Maintenance/System Health:

It is recommend not to use different encodings in your database. This may cause unexpected behavior or MySQL errors.

The tables are listed, and I could see most but not all were in UTF-8 format. Looked like some, particularly associated with plugins, were in an older format

Clicking a ‘Convert all’ link brought up a response that /inc/config.php needed editing to support full 4 byte UTF-8

$config['database']['encoding'] = 'utf8mb4';

After editing config.php and trying the conversion again, all tables now show as matching. Will try re-importing to Discourse and report back if this helps with the character issues.

Not sure still how to deal with MyCode parsing though?

You didn’t include any examples or details of this - at this point, may be best to start a new thread and keep this one focused on the followup for the character encoding.

Hi, a new thread with an example is here