Just wanted to circle back to this to mention a few of the gotchas that I found and perhaps leave some breadcrumbs for future travellers - because I found this hellishly difficult to debug.
Escaping in the permalink normalization string
The permalink normalization string has two components:
- the Regular Expression string
- the Replacement string
They appear, one immediately after the other, in the permalink normalization string like so:
Permalink Normalization
    Regular Expression       Replacement
<-------------------------><------------->
/(this)reallyis(intuitive)/\1reallyisn't\2
Importantly, slashes are treated differently in the different parts of the same string.
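A slash in the Regular Expression part of the string must be escaped as \/, because an unescaped slash marks where the regex ends and the replacement begins. Other regex characters (such as ?) also need escaping there if you want them matched literally. Slashes in the Replacement part of the same string do not need to be escaped and are treated literally.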
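To make the split concrete, here is a minimal Ruby sketch of how a string in this format could be pulled apart. `split_normalization` is a name I made up for illustration, and this is not Discourse's own code. It simply assumes the regex part ends at the first slash that is not escaped by a backslash.

```ruby
# Hypothetical helper, for illustration only (not Discourse's implementation).
# Splits "/regex/replacement" at the first unescaped slash after the opener.
def split_normalization(rule)
  raise ArgumentError, "rule must start with /" unless rule.start_with?("/")

  escaped = false
  rule.each_char.with_index do |ch, i|
    next if i.zero? # skip the opening delimiter

    if ch == "/" && !escaped
      # Everything before this slash is the regex; everything after it is the
      # replacement, where any further slashes are plain literal characters.
      return [Regexp.new(rule[1...i]), rule[(i + 1)..-1]]
    end

    # A backslash escapes the next character unless it is itself escaped.
    escaped = (ch == "\\" && !escaped)
  end

  raise ArgumentError, "no unescaped / closes the regex part"
end

regex, replacement = split_normalization('/chat\/(.*?)\/(post)\/(.+?)([#\?].*)?$/post/\3')
puts replacement                                                 # post/\3
puts "chat/the-topic-title/post/abc123".sub(regex, replacement)  # post/abc123
```

Note that the slash in `post/\3` ends up in the output untouched, while every slash inside the regex part had to be written as `\/` to avoid ending the regex early.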
The Format of incoming URL strings
Secondly, and this took me a while to nail down: the regex matches the URL as a path relative to the root, but the string it receives does not start with the leading /.
For example, if the URL that your old forum uses looks like this…
http://oldforum.com/chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1
…then the string that the regular expression in your permalink normalization matches against will look like this…
chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1
i.e. a path description from root but without the leading / slash. (I guess that YMMV here depending on the structure of the URLs that you are redirecting, but I don't think so.)
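If you want to test your regex outside of Discourse, a small Ruby sketch like the one below gets you from the old URL to the string being matched. `match_target_for` is a hypothetical helper name, and the sketch only looks at the path; whether a query string or fragment is passed along as well is not something it tries to settle.

```ruby
require "uri"

# Hypothetical helper: reduce a full old-forum URL to the string the regex
# is matched against, i.e. the path from root minus its leading slash.
def match_target_for(url)
  URI.parse(url).path.sub(%r{\A/}, "")
end

old_url = "http://oldforum.com/chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1"
puts match_target_for(old_url)
# chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1
```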
Examples
Here are some examples from my migration project:
CATEGORY_LINK_NORMALIZATION = '/(cat)\/(.*?)([#\?].*)?$/cat/\2'
POST_LINK_NORMALIZATION = '/chat\/(.*?)\/(post)\/(.+?)([#\?].*)?$/post/\3'
TOPIC_LINK_NORMALIZATION = '/(chat)\/(.*?)([#\?].*)?$/topic/\2'
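As a rough check of what these three rules do, the Ruby sketch below writes out the regex and replacement halves by hand and applies each to a sample path. The `RULES` hash and `normalize_with` are illustrative names of my own, and the category and topic sample paths are invented; only the post URL comes from the example earlier.

```ruby
# Regex and replacement halves of the three rules above, written out by hand.
RULES = {
  category: [Regexp.new('(cat)\/(.*?)([#\?].*)?$'), 'cat/\2'],
  post:     [Regexp.new('chat\/(.*?)\/(post)\/(.+?)([#\?].*)?$'), 'post/\3'],
  topic:    [Regexp.new('(chat)\/(.*?)([#\?].*)?$'), 'topic/\2'],
}

# Apply one rule to a path that has already lost its leading slash.
def normalize_with(rule_name, path)
  regex, replacement = RULES.fetch(rule_name)
  path.sub(regex, replacement)
end

puts normalize_with(:category, "cat/general#latest")
# cat/general
puts normalize_with(:post, "chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1")
# post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1
puts normalize_with(:topic, "chat/the-topic-title?page=2")
# topic/the-topic-title
```

The trailing `([#\?].*)?$` group in each regex swallows any fragment or query string, so it never reaches the normalised result.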
The Process
The Old URL is as it sounds - the URL of the item in the old system.
The permalink normalization (recorded in the permalink_normalizations system setting) will grab the incoming URL (without the leading slash /) and apply the regex match. The resulting normalised URL is then used to match against the URL Match Text entered on the /admin/customize/permalinks screen.
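Put together, the flow can be modelled in a few lines of Ruby. This is only a mental model, not how Discourse is implemented: the `PERMALINKS` hash stands in for the entries on the permalinks screen, and the destination `/t/the-topic-title/42` is an invented value.

```ruby
# Stand-in for the entries on /admin/customize/permalinks:
# URL Match Text => destination. The destination here is invented.
PERMALINKS = {
  "post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1" => "/t/the-topic-title/42",
}

POST_REGEX = Regexp.new('chat\/(.*?)\/(post)\/(.+?)([#\?].*)?$')
POST_REPLACEMENT = 'post/\3'

# Returns the destination for an incoming request path, or nil if none matches.
def resolve(request_path)
  target = request_path.delete_prefix("/")              # drop the leading slash
  normalized = target.sub(POST_REGEX, POST_REPLACEMENT) # apply the normalization
  PERMALINKS[normalized]                                # look up the match text
end

puts resolve("/chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1")
# /t/the-topic-title/42
```

If the normalised string does not exactly equal a URL Match Text entry, the lookup comes back empty, which is the first thing worth checking when a redirect does not fire.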