Plugin to help mapping pre-migration threads after migration

Crius · February 8, 2023, 6:43pm

Hello, my group of crazy idiots and I are really close to finally migrating our vBulletin 3 forum to Discourse, having written an ad-hoc script that manages to migrate all 21 million replies from the original database into Discourse.

Now we have the problem of links to topics/replies written in the replies themselves.

In the migration we have written, we record a mapping from the “old” topic and post ids to the ids they get in Discourse.

For example:

   id   | topic_id |   name    | value  |         created_at         |         updated_at
--------+----------+-----------+--------+----------------------------+----------------------------
 581727 |   581736 | import_id | 599137 | 2023-02-08 16:30:01.600759 | 2023-02-08 16:30:01.600759

What I was thinking of now is a plugin that simply intercepts links in the old forum format and transforms them into references to the new thread/reply.

So for example, something like:

https://oldforum.something.com/showthread.php?t=123456

would trigger a lookup in the topic_custom_fields table for the value 123456 to find the Discourse topic_id, then query the topic_links table with that id to find the URL. Finally, the link would be replaced in the post on the client side (assuming JS to manipulate the content).

Something similar for posts.
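
The lookup-and-replace idea can be sketched in plain Ruby, outside of Discourse. This is only an illustration: the `showpost.php?p=` pattern, the `/t/<id>` and `/p/<id>` target paths, and the in-memory hashes are assumptions, and in a real install the mapping would come from the import_id custom fields.

```ruby
# Assumed old-forum URL shapes; adjust to whatever the old forum actually used.
THREAD_LINK = %r{https?://oldforum\.something\.com/showthread\.php\?t=(\d+)}
POST_LINK   = %r{https?://oldforum\.something\.com/showpost\.php\?p=(\d+)}

# topic_map / post_map: hypothetical hashes of old id (String) => new Discourse id.
# Links whose id is not in the mapping are left untouched.
def rewrite_old_links(raw, topic_map, post_map)
  raw = raw.gsub(THREAD_LINK) do |match|
    new_id = topic_map[Regexp.last_match(1)]
    new_id ? "/t/#{new_id}" : match
  end
  raw.gsub(POST_LINK) do |match|
    new_id = post_map[Regexp.last_match(1)]
    new_id ? "/p/#{new_id}" : match
  end
end

# Old thread 599137 became Discourse topic 581736 in the sample row above.
puts rewrite_old_links(
  "See https://oldforum.something.com/showthread.php?t=599137",
  { "599137" => 581_736 },
  {}
)
```

Leaving unknown ids untouched means a later run with a more complete mapping can still pick them up.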

However, I can’t find any good example of how to even start creating something like that for Discourse.
Can someone give me some hints, examples, or plugins that do something similar (check replies for some substring and replace it, query the API or DB for one value to retrieve another)?

Thank you

RGJ · February 8, 2023, 7:19pm

This already exists in core: it is called Permalinks, and the existing VB4 importer has code for it.

https://github.com/discourse/discourse/blob/main/script/import_scripts/vbulletin.rb#L375-L389

      # Add the following to permalink_normalizations for this to work:
      # /forum\/.*?\/(\d*)\-.*/thread/\1

      topics.each do |thread|
        topic_id = "thread-#{thread["threadid"]}"
        topic = topic_lookup_from_imported_post_id(topic_id)
        if topic.present?
          url_slug = "thread/#{thread["threadid"]}" if thread["title"].present?
          if url_slug.present? && topic[:topic_id].present?
            Permalink.create(url: url_slug, topic_id: topic[:topic_id].to_i)
          end
        end
      end
    end
  end

You should enter something like /showthread\.php\?t=(\d+)/thread/\1 into the permalink_normalizations setting.
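
To see what such a rule does to an incoming URL, here is a rough stand-alone approximation in Ruby. It assumes a rule has the form /regex/replacement and is applied as a single regex substitution before the permalink lookup; the parsing below is simplified and is not Discourse's actual code. The rule shown assumes old URLs of the form showthread.php?t=<id>.

```ruby
# Split a "/regex/replacement" rule into its two parts.
# Simplification: assumes the regex part contains no "/" characters.
def parse_rule(rule)
  _, pattern, replacement = rule.split("/", 3)
  [Regexp.new(pattern), replacement]
end

# Apply one rule to a URL with a single substitution.
def normalize(url, rule)
  regex, replacement = parse_rule(rule)
  url.sub(regex, replacement)
end

# Hypothetical rule for URLs like showthread.php?t=123456.
RULE = '/showthread\.php\?t=(\d+)/thread/\1'

puts normalize("showthread.php?t=123456", RULE)
```

The normalized value thread/123456 is what a permalink created with url "thread/123456" would then match. Note that the pattern has to account for the t= before the digits, otherwise the capture group would come out empty.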

Crius · February 8, 2023, 7:21pm

Just to confirm, I should run this logic after the migration has completed, right?
So it goes through all the replies again and changes the permalinks?

RGJ · February 8, 2023, 7:27pm

How do you mean, changes? Do you already have permalinks?

Crius · February 8, 2023, 7:45pm

When we migrate the content of a reply containing, for example, https://oldforum.something.com/showthread.php?t=123456, the script doesn’t know what id that topic will have on Discourse… no?

RGJ · February 8, 2023, 7:46pm

It will if you use the above code to create permalinks.

showthread.php?t= refers to a topic/thread and not to a reply btw

Crius · February 8, 2023, 7:50pm

I was just using that link as an example

Unfortunately we cannot use that code, because the import takes ages for 20 million posts and the bulk import simply doesn’t work; there are missing pieces.

That’s why we had to write our own migration script. It does it all (PMs, users, user groups, categories, topics, replies) in about 6 hours on a machine with 4 cores and 8 GB of RAM, but we noticed that we were missing the permalinks.

RGJ · February 8, 2023, 8:27pm

Maybe you can consider the pre-permalinks nginx map solution? Redirect vBulletin URLs to Discourse URLs

Crius · February 8, 2023, 9:46pm

We discussed internally and will simply do a second pass when all replies have been migrated.

Thanks for bouncing ideas around with me, Richard.

pfaffman · February 9, 2023, 3:41am

Did your script create import_ids? If so, even if you didn’t create the permalinks, you can fairly quickly process those to create them.

Crius · February 9, 2023, 7:51am

Hey, yes Jay, we do.

Crius:

In the migration we have written, we record a mapping from the “old” topic and post ids to the ids they get in Discourse.

For example:
   id   | topic_id |   name    | value  |         created_at         |         updated_at
--------+----------+-----------+--------+----------------------------+----------------------------
 581727 |   581736 | import_id | 599137 | 2023-02-08 16:30:01.600759 | 2023-02-08 16:30:01.600759

We were trying to avoid cycling through all 20+ million replies again, but realised that the alternative solutions (plugin, nginx redirect, etc.) would be quite convoluted or would rely on external factors, making for a half-assed solution. So we will simply cycle through the replies again and process the permalinks. It will add some time to the migration, but hopefully not too much.

Everything else is “cooked” on the fly, as we know what “raw” needs to be converted into HTML.

For the permalinks we cannot do that: if a link is added with an edit, it could reference a topic that has not yet been processed (higher topic id) and would therefore not be found in the topic_custom_fields table at the time the post is processed.

pfaffman · February 9, 2023, 2:55pm

I don’t know how you could have created topic_custom_fields without first creating the topic. I’d think you could do something like

TopicCustomField.each do |tcf|

and create the permalinks, but there’s a lot I don’t know about your code.
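
As a rough illustration of that idea, here is a plain Ruby sketch that turns import_id rows into the attributes a permalink would need. `ImportRow` and `permalink_attrs` are hypothetical names, and the "thread/<old id>" URL shape is borrowed from the importer excerpt earlier in the thread. Inside a Discourse console the loop would presumably be something like `TopicCustomField.where(name: "import_id").find_each do |tcf|` followed by `Permalink.create(url: ..., topic_id: ...)`, but that is an assumption about your setup. In `do |row|` (or `do |tcf|`), the name between the bars is simply the variable that holds the current row on each iteration.

```ruby
# One row per imported topic, shaped like the topic_custom_fields sample above.
ImportRow = Struct.new(:topic_id, :name, :value)

# Build permalink attributes: old-style URL => new Discourse topic id.
# Rows that are not import_id rows are ignored.
def permalink_attrs(rows)
  rows
    .select { |row| row.name == "import_id" }
    .map { |row| { url: "thread/#{row.value}", topic_id: row.topic_id } }
end

rows = [ImportRow.new(581_736, "import_id", "599137")]
permalink_attrs(rows).each do |attrs|
  puts "#{attrs[:url]} -> topic #{attrs[:topic_id]}"
end
```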

Crius · February 9, 2023, 3:54pm

Let me clarify:

Each topic and all its replies are imported in ascending order of topic id from the vBulletin database. That also means that we are importing in chronological order.

That would lead you to think that any reference to another topic always points to one that already exists.

But there are cases in which this is not true. Just a couple of examples:

- A split topic, with a comment that led to the split: the split topic gets a higher id, but the link to it sits in a topic with a lower id.
- An edit for future readers, in which a post in an old topic references more recent topics.

So, yeah, while the topic_custom_fields table is generated and filled up as the import progresses, as explained in the very first post, it’s not reliable to do it “on the fly” because you can’t be sure of always finding the right correspondence between ids.

Another pass after the full import has completed is needed.
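
The ordering problem can be reproduced with a tiny stand-alone Ruby sketch. The two sample topics, the made-up new ids starting at 5000, and the `/t/<id>` target path are all assumptions for illustration; the point is only that rewriting during the import misses forward references, while a second pass over a complete mapping resolves them.

```ruby
OLD_THREAD = %r{https?://oldforum\.something\.com/showthread\.php\?t=(\d+)}

# Old topics in import order: [old_id, raw]. Topic 100 links forward to topic 200.
OLD_TOPICS = [
  ["100", "Edit: continued in https://oldforum.something.com/showthread.php?t=200"],
  ["200", "Split from https://oldforum.something.com/showthread.php?t=100"],
].freeze

# Rewrite links whose old id is already known; leave the rest untouched.
def rewrite(raw, mapping)
  raw.gsub(OLD_THREAD) do |match|
    new_id = mapping[Regexp.last_match(1)]
    new_id ? "/t/#{new_id}" : match
  end
end

# Returns [on_the_fly, final]: texts rewritten during the import vs. in a second pass.
def import_in_two_passes(old_topics)
  mapping = {}
  on_the_fly = {}
  imported = {}
  # Pass 1: import in order, recording old id => new id as we go.
  old_topics.each_with_index do |(old_id, raw), index|
    new_id = 5000 + index # made-up new ids
    mapping[old_id] = new_id
    on_the_fly[new_id] = rewrite(raw, mapping) # may miss forward references
    imported[new_id] = raw
  end
  # Pass 2: the mapping is complete, so every known link resolves.
  final = imported.transform_values { |raw| rewrite(raw, mapping) }
  [on_the_fly, final]
end

on_the_fly, final = import_in_two_passes(OLD_TOPICS)
puts on_the_fly[5000] # still contains the old showthread.php?t=200 link
puts final[5000]
```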

About TopicCustomField.each do |tcf|, I’m not sure what the tcf part would do. Ruby is not a language I’ve learnt. Our script is written in C#, as the majority of the people who offered to work on it already use it for work.
