I managed to successfully migrate a legacy self written forum software to discourse.
After the 24h import of the 2M records everything works like a charm except the permalinks.
I have about 350.000 permalinks in the permalinks database table of discourse. They are all in lowercase eg. “forum/t140842-s1/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html”
Unfortunately Google links to “/forum/t140842/8P-Hilfe-bei-1-8-tfsi-guter-Motor-oder-schlechter-Motor/” and I get a 404.
Any hints how I can lowercase the request URL before looking it up on the permalinks table without breaking any other stuff? Shouldn’t this kind of conversion before lookup not standard?
You want to create a permalink normalization that throws away the slug and create permalinks that include only at the forum id, I think.
All that’s needed to identify the correct topic is t140842-s1, right?
There are some other importers that do that, though I’m not sure which ones. Maybe vBulletin? But if you grep them all for normalization you should find m example.
Did not knew before migration that the lookup is case sensitive - otherwise I had try to take this into account and made sure that the url was saved the way it was in the old software.
But I am after migration now and the site is productive. So unfortunately I am looking for ways to work around.
I can identify a topic by its ID so “/forum/t140842” would be enough. Via SQL I could modify the url field in the permalinks table this way but will it than look up and redirect? Did a quick test and it did not?
There is a option for permalinks normalization in the discourse settings - but I did not get what this does.
Permalink normalization will rewrite the url before it matches the permalink, so you can use it to remove the slug. The description of it in the settings explains it, but perhaps only if you already understand it.
The old topic ids should be in topicCustomField, so you should be able to create new permalinks by looping through those. Then just delete the old ones. (or delete all of them once you’re convinced your can make the ones you want).
You can search here and in the other importers for permalink and normalization to find some examples (grep -r is one way) . If you need more help and have a budget I can help out next week.
In the configuration I changed my regex now to (I mean mine should matched the topic id as well)
forum/t(\d*)/?$forum/t\1
When I add a new permalink to the table with the url “forum/t140842” and save it … discourse changes the url to “f?$forum/t140842” - Either there is a bug or I don’t understand the concept here.
After further trying the regex normalization I was not able to figure out how that works.
Using the example in the description caused again unexpected results in the url field when adding new permalinks.
When I migrated to discourse I dropped the www prefix of the domain.
This gave me now the opportunity to rewrite the URI on the old server via Apache configuration and .htaccess to lowercase before redirecting to the new discourse server. This kind of solved my problem for now.
or forum/t140842-s2/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html
or forum/t140842/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html
to a simple url with only the id of the old topic: forum/t140842
This is done by a SQL Command which rewrites the URL via the REGEXP_REPLACE function:
INSERT INTO permalinks (created_at, updated_at, topic_id, url) SELECT NOW(), NOW(), topic_id, REGEXP_REPLACE(url,'forum/t(\d*)(-?.*)/(.*)','forum/t\1','') url FROM permalinks WHERE topic_id > 0 ON CONFLICT DO NOTHING;
Rewrite old Requests via .htaccess on the old domain
Google has indexed and is linking to the URL https://old-domain.com/forum/t140842/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html. I was lucky that this request ended on an Apache server because of a different domain and i could use .htaccess to rewrite easily. So I am rewriting this request to https://discourse-domain.com/forum/t140842. In the permalinks table I added this forum/t140842 record by using the already added permalink with the whole slugged url using a regex (see above).
Hope this helps somebody else as a starting point.