Redirecting old forum URLs to new Discourse URLs

Sorry to go necro on an old topic, but thanks so much!

I had to do this though because rails and ruby weren’t in my PATH:

cd /var/discourse
./launcher enter app
PATH="$PATH:/usr/local/bin"
rails c
Permalink.create(url: '/discussion/12345', topic_id: 987)

Does anyone know how to delete these though?

The following does not seem to remove it:
Permalink.delete(id: 1)

Hmm, that’s weird since at that point you’re inside the docker container…

That’s not how you delete a “record” in rails. Try

Permalink.where(topic: 1234, url: "/bla").destroy

Sorry, but I am rather clueless about how to deal with Rails, and I need some help about how to use this for redirecting categories and subcategories.

What is the category_id of a (sub)category exactly? Is it the “category slug”, or in the case of a subcategory is it “main category slug/subcategory slug”? Or is it really some number that I need to look up somehow?

An example for redirecting subcategories would be tremendously helpful!

I’ve messed up the first Permalink that I’ve created and I don’t know how to delete it again.

The suggested Permalink.where(...).destroy didn’t work for me. I get the following error:

Permalink.where(url: '/c/old-category', category_id: 'new-category').destroy
ArgumentError: wrong number of arguments (0 for 1)
from /var/www/discourse/vendor/bundle/ruby/2.0.0/gems/activerecord-4.1.10/lib/active_record/relation.rb:414:in `destroy'

How can I do this right?

To find the id of a subcategory, you can look it up by the slug like this:

Category.find_by_slug('products').id

To delete the permalink for that url, do this:

Permalink.find_by_url("/blah").destroy

There can be only one permalink record per url, so just search by url.


Is there a way to include this automatically in code when I run the migration script? Struggling with how to manage this for importing from Vanilla Forums…

I’ve just created a topic map from MyBB to Discourse automatically, using the migration script.

MyBB was set to use SEO-friendly URLs without IDs in them. Now for example when I navigate to /thread-foo-bar, nginx redirects to /t/foo-bar/12. Here’s how I did it:

  1. Patch the importer to output lines that can be turned into a map file for nginx’s map module. For the MyBB importer, I added this code in create_posts:

    parent = topic_lookup_from_imported_post_id(m['first_post_id'])
    if parent
      puts "\nXXX #{m['topic_id']}: #{parent[:topic_id]},"
    end
    

    After that, I grepped for lines starting with XXX, removed the XXX, and made the file a JSON object, which I pasted into this script. Change the URLs to your forums, run the script, and its output will be a series of nginx map lines. I saved it as /etc/nginx/mybb2discourse.map.

  2. Configure nginx to “run other websites on the same machine as Discourse”, while making the following modifications to the nginx config file (/etc/nginx/conf.d/discourse.conf) in order to point nginx to the map file:

    • insert this at the top of the file:
    map_hash_bucket_size 128;
    map_hash_max_size 50000;  # might have to increase this
    
    map $uri $new {
        include /etc/nginx/mybb2discourse.map;
    }
    
    • then in the server section, add:
    if ($new) {
        rewrite ^ $new permanent;
    }
    
  3. Complete the nginx reload and container rebuild steps from the end of the Configure nginx… post linked above.

Would be great if someone who’s better with Ruby patched the importer to output the topic IDs map (or even better, the nginx map directly).
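
The "XXX" lines printed by the patched importer can be turned into nginx map entries with a few lines of Ruby. This is only a sketch, not the script linked above: the function name nginx_map_lines, the /thread-<id> pattern for old URLs, and the /t/<id> form of the new path are all assumptions for illustration, so substitute whatever URL scheme your old forum actually used.

```ruby
# Sketch: convert importer output lines of the form "XXX <old_id>: <new_id>,"
# into nginx map entries. The old URL pattern is a placeholder (assumption).
def nginx_map_lines(log, old_url_format: "/thread-%d")
  # scan returns one [old_id, new_id] pair per matching line;
  # lines that do not start with "XXX" are ignored.
  log.scan(/^XXX (\d+): (\d+),/).map do |old_id, new_id|
    old_path = format(old_url_format, old_id.to_i)
    "#{old_path} /t/#{new_id};"
  end
end

log = <<~LOG
  importing posts...
  XXX 345: 12,
  XXX 7: 99,
LOG

puts nginx_map_lines(log)
# /thread-345 /t/12;
# /thread-7 /t/99;
```

Redirecting the output to a file gives something that could be included from the map block shown in step 2.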


Poor performance…
This will generate millions of regexps, and nginx has to process each of them on every request.

Do you have a better proposal?

Or a performance benchmark? I suspect that up to a pretty high number of regexps, the bottleneck is by far the entire Rails request processing + database lookup + response building stack, not nginx’s entirely memory-contained regexp matching.

After that, I grepped for lines starting with XXX, removed the XXX, and made the file a JSON object, which I pasted into this script…

I just copied this new piece of code into the script. How does it work, and how do I get this JSON object/file?

This doesn’t seem to capture the threads with 0 replies in my MyBB database :frowning: Any idea what the issue could be? Or any suggestions on a cleaner way to get a map of old to new threads?

Permalinks and normalizers are the most frustrating, unclear, under-documented feature of Discourse that I’ve run into so far. I’m having a horrible time setting these up. Just wanted to vent my frustration here. I’ve read Problem with permalinks, or regex? as well as other posts on vBulletin-specific importers.

Great feature idea, I just wish I could figure out how to use it properly.


I understand your frustration. It took me a while to figure them out.

I think it’s because the feature is used infrequently (just when you do an import) and by relatively few people (people who write importers). And once you’ve figured it out for the current problem, you just move on.


I’m soon to be moving on… to another vBulletin import to Discourse :slight_smile: So I’ll share what I’m doing, and after a couple more of these I’ll compile all my lessons learned somewhere.

I wrote an importer for permalinks that solved my vBulletin 4 redirects for old permalinks.

To get these redirects to work, add the following “permalink normalizations” in the Admin settings.

Example 1, you have urls like this:

/forums/f10/some-thread-title-here-51689/index1.html
/forums/f10/some-thread-title-here-51689/index15.html#323423

Normalization 1. This covers the two examples above. Add this normalization to the admin setting first (the order of normalizations is important!)

/(forums)/f[0-9]+/.+-([0-9]+)/index[0-9]*.html/\1\2

Example 2, your vbulletin also has permalinks like this:

/forums/f10/some-thread-title-here-51689/

Normalization 2. This covers the example permalink above. Add this normalization second (the order of normalizations is important!)

/(forums)/f[0-9]+/.+-([0-9]+)/\1\2

Then run this import script after completing the bulk-import or normal import scripts for vBulletin. (By the way, I had to use both official import scripts, and modified them, because neither solved my needs alone: the forum has around 1 million posts.)


Could you add that to the script and submit a PR?

Also, you can set the permalink normalizations in the script (rather than the web interface) something like this:

SiteSetting.permalink_normalizations='/topic/(.*t)\?.*/\1'

If you don’t know what logic is required to know which permalink normalization to use, just pick one and add the other one as a comment. People running the importer will see the code before they can find this thread. :slight_smile:


If you need to delete all Permalinks at once, use Permalink.all.each { |p| p.destroy } from rails console.


Permalink.destroy_all is shorter and more efficient :wink:



Apologies for opening an old thread but it seems like a good place for my question to sit as some of what I’m asking has been touched on but not fully answered.

I’m trying to ensure that I understand the workflow for permalink normalisation and, as others have said, there really doesn’t seem to be a great deal of documentation around this.

Can I just confirm my understanding / misunderstanding of the permalink normalisation process or at least the process that normalisation plays in redirects?

  1. URL comes in and isn’t matched to any route
  2. Before 404 is thrown - we check for a permalink rule matching our URL
  3. Before we attempt to match the URL, we apply a permalink_normalization regex on the inbound URL turning it into a new string
  4. We look for an exact match between the new string generated in 3. and the url column in the permalinks table
  5. If we find a match we redirect the visitor to the relevant category / topic / post described in the permalinks row.
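
The five steps above can be sketched in plain Ruby. This is an illustration of the flow as described, not Discourse's code: NORMALIZATIONS and PERMALINKS are hypothetical in-memory stand-ins for the permalink_normalizations setting and the permalinks table, the target /t/mindgames-in-football/42 is made up, and the sketch assumes that every rule is applied in order rather than stopping at the first match.

```ruby
# Hypothetical stand-in for the permalink_normalizations setting:
# each entry is a [regex, replacement] pair.
NORMALIZATIONS = [
  [/(chat)\/(.*?)([#\?].*)?$/, 'topic/\2']
].freeze

# Hypothetical stand-in for the permalinks table: url column => redirect target.
PERMALINKS = {
  "topic/mindgames-in-football" => "/t/mindgames-in-football/42"
}.freeze

# Returns the redirect target for an unrouted path, or nil (meaning: 404).
def resolve(path)
  # The URL is matched without its leading slash.
  candidate = path.sub(%r{\A/}, "")
  # Step 3: apply each normalization to the incoming URL.
  NORMALIZATIONS.each do |regex, replacement|
    candidate = candidate.sub(regex, replacement)
  end
  # Step 4: exact match against the url column.
  PERMALINKS[candidate]
end

puts resolve("/chat/mindgames-in-football?page=2").inspect
# "/t/mindgames-in-football/42"
puts resolve("/no/such/page").inspect
# nil
```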

IF that is the correct flow, can I ask

  1. What strategies do people use to generate the new string from the regex? Presumably, regardless of the incoming url, we could just generate /topic/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1, /post/4a512429-0e2d-4437-826c-a7590144617c or /category/elephants (yes, MVCF does use UUID descriptors on the url for topics and posts!)
  2. As you can have multiple permalink_normalization entries, are they applied in order until a match is found or a 404 is raised?
  3. Any other gotchas?

Thanks


Yup, I think that’s it.


Thanks for the advice/validation (AGAIN) @pfaffman, I did manage to get the redirects working.

Just wanted to circle back to this to mention a few of the gotchas that I found and perhaps leave some breadcrumbs for future travellers - because I found this hellishly difficult to debug.

Escaping in the permalink normalization string

The format of the permalink normalization string has two components

  1. the Regular Expression string
  2. the Replacement string

They appear, one immediately after the other, in the permalink normalization string like so

         Permalink Normalization
    Regular Expression       Replacement
<-------------------------><------------->
/(this)reallyis(intuitive)/\1reallyisn't\2

Importantly, slashes are treated differently in the different parts of the same string.

A slash (and other regex chars) in the Regular Expression part of the string must be escaped. Slashes in the Replacement part of the same string, however, do not need to be escaped and are treated literally.
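
One way to see why the escaping differs is to split a normalization string by hand. The sketch below is my own illustration, not Discourse's parser, and split_rule is a hypothetical helper name: it treats the first unescaped slash after the leading one as the boundary, so everything before it is the regex and everything after it is the replacement.

```ruby
# Sketch: split "/regex/replacement" at the first unescaped slash
# that follows the leading slash. Returns [Regexp, String].
def split_rule(rule)
  boundary = nil
  escaped = false
  rule.each_char.with_index do |ch, i|
    next if i.zero? # skip the leading slash
    if escaped
      escaped = false # this character was escaped by a backslash
    elsif ch == "\\"
      escaped = true
    elsif ch == "/"
      boundary = i
      break
    end
  end
  raise ArgumentError, "no closing slash in #{rule.inspect}" if boundary.nil?

  [Regexp.new(rule[1...boundary]), rule[(boundary + 1)..-1]]
end

regex, replacement = split_rule('/(chat)\/(.*?)([#\?].*)?$/topic/\2')
puts regex.source  # (chat)\/(.*?)([#\?].*)?$
puts replacement   # topic/\2
puts "chat/mindgames-in-football".sub(regex, replacement)
# topic/mindgames-in-football
```

The escaped slash inside the regex is skipped over, while the slash in topic/\2 comes after the boundary and is kept as a literal character.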

The Format of incoming URL strings

Secondly, and this took me a while to nail down, you match the URL as a relative path description from root but you will not receive the / as the first part of the string.

For example, if the URL that your old forum uses looked like this…

http://oldforum.com/chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1

…then the URL that the regular expression in your permalink normalization will match against will look like this…

chat/the-topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1

i.e. a path description from root but without the leading / slash. (I guess that YMMV here depending on the structure of the URLs that you are redirecting - but I don’t think so).

Examples

Here are some examples from my migration project

CATEGORY_LINK_NORMALIZATION = '/(cat)\/(.*?)([#\?].*)?$/cat/\2'
POST_LINK_NORMALIZATION = '/chat\/(.*?)\/(post)\/(.+?)([#\?].*)?$/post/\3'
TOPIC_LINK_NORMALIZATION = '/(chat)\/(.*?)([#\?].*)?$/topic/\2'

The Process

Old URL | Permalink Normalization | URL Match Text
http://oldsite.com/cat/history | /(cat)\/(.*?)([#\?].*)?$/cat/\2 | cat/history
http://oldsite.com/chat/topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1 | /chat\/(.*?)\/(post)\/(.+?)([#\?].*)?$/post/\3 | post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1
http://oldsite.com/chat/mindgames-in-football | /(chat)\/(.*?)([#\?].*)?$/topic/\2 | topic/mindgames-in-football

The Old URL is as it sounds - the URL of the item in the old system.

The permalink normalization (recorded in the permalink_normalizations system setting) will grab the incoming URL (without the leading slash /) and apply the regex match. The resulting normalised URL is then used to match against the URL Match Text entered on the /admin/customize/permalinks screen.
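
The table above can be checked with plain Ruby, outside Discourse. This is a sketch under assumptions: RULES and match_text are hypothetical names, the rules are already split into regex and replacement, and every rule is applied in the listed order.

```ruby
require "uri"

# The three normalizations from the examples, split into [regex, replacement].
RULES = [
  [/(cat)\/(.*?)([#\?].*)?$/, 'cat/\2'],                # category
  [/chat\/(.*?)\/(post)\/(.+?)([#\?].*)?$/, 'post/\3'], # post
  [/(chat)\/(.*?)([#\?].*)?$/, 'topic/\2']              # topic
].freeze

# Returns the text that would be compared with the URL Match Text column.
def match_text(old_url)
  # Take the path from root and drop the leading slash.
  path = URI.parse(old_url).path.sub(%r{\A/}, "")
  RULES.reduce(path) { |current, (regex, replacement)| current.sub(regex, replacement) }
end

puts match_text("http://oldsite.com/cat/history")
# cat/history
puts match_text("http://oldsite.com/chat/topic-title/post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1")
# post/d9aa09c3-19bd-4c6e-9d8d-a8f1008000a1
puts match_text("http://oldsite.com/chat/mindgames-in-football")
# topic/mindgames-in-football
```

In this sketch the post rule has to come before the topic rule, because the topic regex also matches a post URL and would rewrite it to topic/topic-title/post/… first.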
