Logré migrar con éxito un antiguo software de foro desarrollado internamente a Discourse.
Después de la importación de 24 horas de los 2 millones de registros, todo funciona a la perfección, excepto los enlaces permanentes.
Tengo aproximadamente 350.000 enlaces permanentes en la tabla de la base de datos de permalinks de Discourse. Todos están en minúsculas, por ejemplo: “forum/t140842-s1/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html”.
Desafortunadamente, Google enlaza a “/forum/t140842/8P-Hilfe-bei-1-8-tfsi-guter-Motor-oder-schlechter-Motor/” y recibo un error 404.
¿Alguna sugerencia sobre cómo puedo convertir a minúsculas la URL de la solicitud antes de buscarla en la tabla de permalinks sin romper nada más? ¿No debería ser estándar realizar este tipo de conversión antes de la búsqueda?
You want to create a permalink normalization that throws away the slug and create permalinks that include only at the forum id, I think.
All that’s needed to identify the correct topic is t140842-s1, right?
There are some other importers that do that, though I’m not sure which ones. Maybe vBulletin? But if you grep them all for normalization you should find m example.
Did not knew before migration that the lookup is case sensitive - otherwise I had try to take this into account and made sure that the url was saved the way it was in the old software.
But I am after migration now and the site is productive. So unfortunately I am looking for ways to work around.
I can identify a topic by its ID so “/forum/t140842” would be enough. Via SQL I could modify the url field in the permalinks table this way but will it than look up and redirect? Did a quick test and it did not?
There is a option for permalinks normalization in the discourse settings - but I did not get what this does.
Permalink normalization will rewrite the url before it matches the permalink, so you can use it to remove the slug. The description of it in the settings explains it, but perhaps only if you already understand it.
The old topic ids should be in topicCustomField, so you should be able to create new permalinks by looping through those. Then just delete the old ones. (or delete all of them once you’re convinced your can make the ones you want).
You can search here and in the other importers for permalink and normalization to find some examples (grep -r is one way) . If you need more help and have a budget I can help out next week.
In the configuration I changed my regex now to (I mean mine should matched the topic id as well)
forum/t(\d*)/?$forum/t\1
When I add a new permalink to the table with the url “forum/t140842” and save it … discourse changes the url to “f?$forum/t140842” - Either there is a bug or I don’t understand the concept here.
After further trying the regex normalization I was not able to figure out how that works.
Using the example in the description caused again unexpected results in the url field when adding new permalinks.
When I migrated to discourse I dropped the www prefix of the domain.
This gave me now the opportunity to rewrite the URI on the old server via Apache configuration and .htaccess to lowercase before redirecting to the new discourse server. This kind of solved my problem for now.
or forum/t140842-s2/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html
or forum/t140842/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html
to a simple url with only the id of the old topic: forum/t140842
This is done by a SQL Command which rewrites the URL via the REGEXP_REPLACE function:
INSERT INTO permalinks (created_at, updated_at, topic_id, url) SELECT NOW(), NOW(), topic_id, REGEXP_REPLACE(url,'forum/t(\d*)(-?.*)/(.*)','forum/t\1','') url FROM permalinks WHERE topic_id > 0 ON CONFLICT DO NOTHING;
Rewrite old Requests via .htaccess on the old domain
Google has indexed and is linking to the URL https://old-domain.com/forum/t140842/8p-hilfe-bei-1-8-tfsi-guter-motor-oder-schlechter-motor.html. I was lucky that this request ended on an Apache server because of a different domain and i could use .htaccess to rewrite easily. So I am rewriting this request to https://discourse-domain.com/forum/t140842. In the permalinks table I added this forum/t140842 record by using the already added permalink with the whole slugged url using a regex (see above).
Hope this helps somebody else as a starting point.