Problem with permalinks after migration

Hi there,

I managed to successfully migrate a legacy self written forum software to discourse.
After the 24h import of the 2M records everything works like a charm except the permalinks.

I have about 350.000 permalinks in the permalinks database table of discourse. They are all in lowercase eg. “forum/t140842-s1/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html”

Unfortunately Google links to “/forum/t140842/8P-Hilfe-bei-1-8-tfsi-guter-Motor-oder-schlechter-Motor/” and I get a 404. :frowning:

Any hints how I can lowercase the request URL before looking it up on the permalinks table without breaking any other stuff? Shouldn’t this kind of conversion before lookup not standard?

Thank your for the great software!

Göks

Welcome! Congrats on getting this far!

You want to create a permalink normalization that throws away the slug and create permalinks that include only at the forum id, I think.

All that’s needed to identify the correct topic is t140842-s1, right?

There are some other importers that do that, though I’m not sure which ones. Maybe vBulletin? But if you grep them all for normalization you should find m example.

1 Like

Thanks for the fast reply.

Did not knew before migration that the lookup is case sensitive - otherwise I had try to take this into account and made sure that the url was saved the way it was in the old software.

But I am after migration now and the site is productive. So unfortunately I am looking for ways to work around.

I can identify a topic by its ID so “/forum/t140842” would be enough. Via SQL I could modify the url field in the permalinks table this way but will it than look up and redirect? Did a quick test and it did not?

There is a option for permalinks normalization in the discourse settings - but I did not get what this does.

Permalink normalization will rewrite the url before it matches the permalink, so you can use it to remove the slug. The description of it in the settings explains it, but perhaps only if you already understand it. :man_shrugging:

The old topic ids should be in topicCustomField, so you should be able to create new permalinks by looping through those. Then just delete the old ones. (or delete all of them once you’re convinced your can make the ones you want).

You can search here and in the other importers for permalink and normalization to find some examples (grep -r is one way) . If you need more help and have a budget I can help out next week.

1 Like

Alright, so I have this request

https://a3-freunde.de/forum/t140842/8P-Hilfe-bei-1-8-tfsi-guter-Motor-oder-schlechter-Motor

and one permalink normalization in the configuration like

/t(\d*)/?$forum/t\1

which should result in “forum/t140842” for the request above.

When I add the url “forum/t140842” to the discourse permalinks table it shows no effect?
Are my assumptions correct? What do I miss out?

You’re on the right track. That regex stuff can be fiddly. I think you need forum/ at the start of your permalink.

Thanks for the effort. I appreciate.

In the configuration I changed my regex now to (I mean mine should matched the topic id as well)

forum/t(\d*)/?$forum/t\1

When I add a new permalink to the table with the url “forum/t140842” and save it … discourse changes the url to “f?$forum/t140842” - Either there is a bug or I don’t understand the concept here.

:confused:

After further trying the regex normalization I was not able to figure out how that works.
Using the example in the description caused again unexpected results in the url field when adding new permalinks.

When I migrated to discourse I dropped the www prefix of the domain.
This gave me now the opportunity to rewrite the URI on the old server via Apache configuration and .htaccess to lowercase before redirecting to the new discourse server. This kind of solved my problem for now.

Did not work as expected because the slug part of the URL was generated in the migration-script slighty different as in the old software.

So I “normalized” the URL by my own.

Add/generate normalized URLs (Discourse Permalinks Table)

In the url-field get from

forum/t140842-s1/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html

or
forum/t140842-s2/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html

or
forum/t140842/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html

to a simple url with only the id of the old topic:
forum/t140842

This is done by a SQL Command which rewrites the URL via the REGEXP_REPLACE function:

INSERT INTO permalinks (created_at, updated_at, topic_id, url) SELECT NOW(), NOW(), topic_id, REGEXP_REPLACE(url,'forum/t(\d*)(-?.*)/(.*)','forum/t\1','') url FROM permalinks WHERE topic_id > 0 ON CONFLICT DO NOTHING;

Rewrite old Requests via .htaccess on the old domain

RewriteRule ^forum/t([0-9]*)(-?.*)/(.*)$ https://discourse-domain.com/forum/t$1 [R=301,L]

So what happens here?

Google has indexed and is linking to the URL https://old-domain.com/forum/t140842/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html. I was lucky that this request ended on an Apache server because of a different domain and i could use .htaccess to rewrite easily. So I am rewriting this request to https://discourse-domain.com/forum/t140842. In the permalinks table I added this forum/t140842 record by using the already added permalink with the whole slugged url using a regex (see above).

Hope this helps somebody else as a starting point.

1 Like