Hello, me and my group of crazy idiots are really close to finally migrate our Vbulletin3 forum into Discourse after having wrote an ad-hoc script that finally manages to migrate all the 21 millions replies from the original database into discourse.
Now, we have the problem of links to the topics/replies written in the replies themselves.
In the migration we have written, we write a mapping of the “old” topic and post Ids and to what they map into in discourse.
For example:
id | topic_id | name | value | created_at | updated_at
--------+----------+-----------+--------+----------------------------+----------------------------
581727 | 581736 | import_id | 599137 | 2023-02-08 16:30:01.600759 | 2023-02-08 16:30:01.600759
What I was thinking now is a plugin that simply intercept links to the old forum format and transform them with reference to the new thread/reply.
Will trigger a query search using the topics_custom_field for the value 123456, find the discourse topic_id, then query the topic_links table with that id and find the url. Finally replace it in the post on the client side (assuming js to manipulate the content).
Something similar for posts.
However, I can’t find any good example of how to even start creating something like that for discourse.
Can someone give me some hints, example or plugins that would do something similar (check replies for some substring and replace it, query the API? DB? for one value to retrieve another?).
When we migrate the content of a reply with, for example: https://oldforum.something.com/showthread.php?t=123456 doesn’t know what id that topic will have on discourse… no?
Unfortunately we cannot use that code because the import takes ages to import 20 million posts and the bulk import simply doesn’t work. There are missing pieces.
That’s why we had to write our own migration script. It does it all (pm, users, usergroups, categories, topic, replies) in about 6 hours with a 4 cores, 8gb ram but we noticed that we were missing the permalinks
We were trying to avoid cycling the whole 20 million + replies again but realised that alternatives solutions (plugin, nginx redirect, etc) would be quite convoluted or rely on external factors that would make it a half-assed solution so, we will simply cycle the replies again and process the permalinks. It will add some time to the migration but hopefully not as much.
Everything else is “cooked” on the fly as we know what “raw” need to be converted into html.
For the permalinks we cannot do that as if a permalink is added with an edit, it could reference a topic that has not yet been processed (higher topic Id) and those not being found in the topics_custom_field table at the time it is being processed.
Topics and all its replies are imported following the topic ids from smaller to greater on the vbulletin database. That also means that we are importing in chronological order.
However, that would lead to think that if you ever find a reference to another topic, it would always be for another one that already existed.
But there are cases in which this is not true, just a couple of example:
split topic with a comment that lead to the split. The split would be with an id that is higher but exists in a topic with a lesser id.
edit for future readers in which old topics’ post have reference to more recent ones
So, yeah, while the topics_custom_field is generated and filled up while the import progresses, as explained in the very first topic, it’s not reliable to do it “on the fly” because you can’t be sure to find always the right correspondence between ids.
Another pass after the full import has completed is needed.
About TopicCustomField.each do |tcf|, I’m not sure what the tcf part would do. Ruby is not a language I’ve learnt. Our script is written in C# as the majority of the people that offered to work with it, use it for work already.