Plugin to help mapping pre-migration threads after migration

Hello, me and my group of crazy idiots are really close to finally migrate our Vbulletin3 forum into Discourse after having wrote an ad-hoc script that finally manages to migrate all the 21 millions replies from the original database into discourse.

Now, we have the problem of links to the topics/replies written in the replies themselves.

In the migration we have written, we write a mapping of the “old” topic and post Ids and to what they map into in discourse.

For example:

   id   | topic_id |   name    | value  |         created_at         |         updated_at
--------+----------+-----------+--------+----------------------------+----------------------------
 581727 |   581736 | import_id | 599137 | 2023-02-08 16:30:01.600759 | 2023-02-08 16:30:01.600759

What I was thinking now is a plugin that simply intercept links to the old forum format and transform them with reference to the new thread/reply.

So for example, something like:

https://oldforum.something.com/showthread.php?t=123456

Will trigger a query search using the topics_custom_field for the value 123456, find the discourse topic_id, then query the topic_links table with that id and find the url. Finally replace it in the post on the client side (assuming js to manipulate the content).

Something similar for posts.

However, I can’t find any good example of how to even start creating something like that for discourse.
Can someone give me some hints, example or plugins that would do something similar (check replies for some substring and replace it, query the API? DB? for one value to retrieve another?).

Thank you

This already exists in core, it is called Permalinks, and the existing VB4 importer has code for it

You should enter something like /showthread\.php\?(\d*)/thread/\1 into the permalink_normalizations setting.

2 Likes

Just to confirm, I should run this logic after the migration has completed right?
So it goes through all the replies again and changes the permalinks

How do you mean, changes? Do you already have permalinks?

When we migrate the content of a reply with, for example: https://oldforum.something.com/showthread.php?t=123456 doesn’t know what id that topic will have on discourse… no?

it will if you use the above code to create permalinks.

showthread.php?t= refers to a topic/thread and not to a reply btw

1 Like

I was just using that link as an example :slight_smile:

Unfortunately we cannot use that code because the import takes ages to import 20 million posts and the bulk import simply doesn’t work. There are missing pieces.

That’s why we had to write our own migration script. It does it all (pm, users, usergroups, categories, topic, replies) in about 6 hours with a 4 cores, 8gb ram but we noticed that we were missing the permalinks :slight_smile:

Maybe you can consider the pre-permalinks nginx map solution? Redirect vBulletin URLs to Discourse URLs

1 Like

We discussed internally and will simply do a second pass when all replies have been migrated.

Thanks for bouncing back ideas with me Richard :heart:

Did your script create import_ids? If so, even if your didn’t create the permalinks, you can fairly quickly process those to create them.

Hey, yes Jay, we do.

We were trying to avoid cycling the whole 20 million + replies again but realised that alternatives solutions (plugin, nginx redirect, etc) would be quite convoluted or rely on external factors that would make it a half-assed solution so, we will simply cycle the replies again and process the permalinks. It will add some time to the migration but hopefully not as much.

Everything else is “cooked” on the fly as we know what “raw” need to be converted into html.

For the permalinks we cannot do that as if a permalink is added with an edit, it could reference a topic that has not yet been processed (higher topic Id) and those not being found in the topics_custom_field table at the time it is being processed.

I don’t know how you could have created topic_custom_fields without first creating the topic. I’d think you could do something like

TopicCustomField.each do |tcf|

and create the permalinks, but there’s a lot I don’t know about your code.

Let me clarify:

Topics and all its replies are imported following the topic ids from smaller to greater on the vbulletin database. That also means that we are importing in chronological order.

However, that would lead to think that if you ever find a reference to another topic, it would always be for another one that already existed.

But there are cases in which this is not true, just a couple of example:

  • split topic with a comment that lead to the split. The split would be with an id that is higher but exists in a topic with a lesser id.
  • edit for future readers in which old topics’ post have reference to more recent ones

So, yeah, while the topics_custom_field is generated and filled up while the import progresses, as explained in the very first topic, it’s not reliable to do it “on the fly” because you can’t be sure to find always the right correspondence between ids.

Another pass after the full import has completed is needed.

About TopicCustomField.each do |tcf|, I’m not sure what the tcf part would do. Ruby is not a language I’ve learnt. Our script is written in C# as the majority of the people that offered to work with it, use it for work already.

1 Like