Permalink normalizations don't work for internal links

(Tomek) #1

All permalink normalizations work only for users coming from external websites (like google, other domains, bookmarks or copy/paste to the browser) - is this to be expected or is it a bug?

Is it possible for normalization to work also for links published inside the forum?

When someone clicks on an internal link that needs to be normalized before checking the permalinks table, Discourse returns a 404. You can refresh this 404 page to get the right content.

It looks as if Discourse is performing the normalization only for new sessions.

(Sam Saffron) #2

We talked about adding a fallback at some time with @riking

At the moment the ember router needs to hit a whitelist to leak full internal reqs out.

I do wonder @eviltrout if we should change it so the Ember router falls through to doing a full page request for any routes it does not recognise.

(Jeff Atwood) #3

Why would you need this? Can you provide an example?

(Jay Pfaffman) #4

Imported data includes direct links to topics on the old site with the old URLs. The imported site includes two different redirect schemes (in its history, it had been on two different forums before Discourse), so links might be /oldurl/something.php?id=SOMEID. As described, if you visit that link directly, it works fine, but links to there on the forum 404. You can then reload them and they work.

(Jeff Atwood) #5

The primary way to avoid this is to put the new forum on a different subdomain. That way redirects can be handled upstream at the nginx level. If you are redirecting anyway, there is no reason to overload the same URLs.

I believe we ran into this situation before with a migration that @techapj did.

(Jay Pfaffman) #6

I don’t understand the internals well enough to know how this might be a bad idea, but this is definitely what this user wants.

This almost obviates the usefulness of permalinks in the first place (at least as most importers use them). It doesn’t seem like people wanting to put their new Discourse forum on their old URL is an edge case.

(Jeff Atwood) #7

Old domain, yes, exact same URLs? Not a good idea. In other words:

Then redirecting one to the other is super easy and avoids the problem of internal links.

It just depends how much pain you want to absorb after the migration. You will be redirecting in any case.

Also, I think there are url normalization regexes you can use. Check the permalink normalizations site setting

(Jay Pfaffman) #8

I don’t understand how using the old url is a bad idea.

Running a whole 'nother web server to handle redirects doesn’t seem super easy when Discourse handles those redirects just fine.

Oh, wait. Maybe I am starting to understand. Are you saying that if oldforum and newforum are the same machine/IP then those links will work? That would seem to solve the problem unless the old links are relative (not including hostname). I haven’t checked, but I’d imagine that most of those URLs linking to topics within the site don’t include the hostname, so I don’t see how this will help.

I’m still liking @sam’s suggestion.

(Jeff Atwood) #9

Did you look at this setting as requested?

(Jay Pfaffman) #10

Yes. I’m using permalink normalizations.

The problem is that for internal links permalink normalizations are not honored, and the user gets a 404. If they then reload, it works fine. This is just how links to /raw/123 used to work, but that got changed a while back.

I finally found an example. See this page. Click the 2nd or 3rd link (the first one doesn’t work for some reason). You get a 404, but if you reload, or instead open the link in a new tab, it works just fine.

(Tomek) #11

This is exactly the case. You cannot get around this problem by setting up the new forum on a new domain (and why should you).

As I see it, permalinks (+normalization) should help with this exact problem: old urls (from previous url schemes like an old forum) should redirect with 301 to moved content. It should handle redirects for users browsing the forum and encountering an old link inside some old post.
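To make the expected behaviour concrete, here is a minimal sketch of "normalize first, then look up the permalink". It is not Discourse's implementation: the rule, the `PERMALINKS` dictionary, and the target path are hypothetical, and only the old URL shape comes from the example earlier in this topic.

```python
import re

# Hypothetical normalization rule: keep only the numeric id from the old URL.
RULES = [(re.compile(r"^oldurl/something\.php\?id=(\d+).*$"), r"oldurl/\1")]

# Hypothetical stand-in for the permalinks table: normalized old path -> new path.
PERMALINKS = {"oldurl/4711": "/t/imported-topic/123"}


def normalize(path):
    """Apply each rule in order to the path without its leading slash."""
    path = path.lstrip("/")
    for pattern, replacement in RULES:
        path = pattern.sub(replacement, path)
    return path


def resolve(path):
    """Return (status, location): 301 plus target if a permalink matches, else 404."""
    target = PERMALINKS.get(normalize(path))
    if target is None:
        return 404, None
    return 301, target
```

In this sketch the same `resolve` call would have to run for clicks inside the forum as well as for fresh page loads, which is the behaviour being asked for here.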

(Jeff Atwood) #12

Why don’t you simply replace the old URLs in all post content with the new URLs?

The primary audience for this is people visiting links from other websites to your old URL scheme. For internal links, you control the content and should change the links to be correct.

(Tomek) #13

Changing 3m+ posts on a big forum is tricky and risky (running several regex replaces and rebaking posts will take forever and can easily break some content). It would be much easier for permalink normalization to handle this, but if this is not possible we will have to do a reformat :slight_smile:

(Jeff Atwood) #14

No need to rebake; just replace only the posts containing that link format.

(Robin Ward) #15

We’ve had a PR for this in the past, but I rejected it because it worries me that we could end up in a redirect loop. It would need to be very smart to only redirect once somehow (maybe a cookie?)
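A rough sketch of the "only redirect once" idea, assuming a cookie-like store is available to the client-side router. The function name, the `fallback_tried` key, and the returned action strings are all hypothetical; this is not code from the rejected PR.

```python
def handle_unknown_route(path, cookies):
    """Decide what to do with a path the client-side router cannot match."""
    if cookies.get("fallback_tried") == path:
        # The server already had its chance with this exact path: stop here
        # instead of requesting the same page again and looping.
        return "show_404"
    # Remember the path, then hand it to the server as a full page request.
    cookies["fallback_tried"] = path
    return "full_page_load"
```

The marker stores the path rather than a plain flag, so a second, different unknown link still gets its one full page request.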

(Sam Saffron) #16

This is actually desirable: it will save you a lot of traffic when Google crawls, because there will be no need for the redirects. I would recommend doing so; we have the discourse remap task, which already provides you a framework. You could also write a script that does it.

Actual process would only take minutes, no need to do a rebake, you just change both cooked and raw.

I think this is the correct way to go about this, having the app do extra work to work around a bad import seems like it is not the right way to go about things.

(Tomek) #17

Thanks, this is what we are going to do :slight_smile:

(Jay Pfaffman) #18

Thanks, @sam.

Though your explanation makes sense and I understand the potential problem with loops, I’ve not noticed any importers that do this kind of replacement of internal links. I don’t want to write more code that generates a “bad import.” Can you point to an example of what the importer should have done, and what the recommended way is to see that old internal links continue to work after import to Discourse?

When I asked about this in October, permalink normalizations seemed to be the answer (but I was concerned then with external sites linking in).

To remap the old IPB URLs to the new Discourse ones, you’d need to know the mapping of the IPB IDs to the Discourse IDs, right? Or are you suggesting that the old URLs be remapped to include another host name or path that would be handled by an external NGINX, which would then do the redirects?

Edit: Actually, the IPB importer that I wrote does replace some internal links. When a post is quoted, I rewrite the post as Discourse Markdown

[quote="user, post:xx, topic:yyy"]
The stuff they said
[/quote]

Is it best practice to identify all internal links and rewrite them?

Thanks for your help.

(Tomek) #19

After some consideration - it’s not so easy.

Simply running a remap task to replace strings is not possible because you need to change old post IDs (in my case, IPB) to new post IDs (Discourse). This happens during import, and the results of this operation are in the permalinks table. To do it again for existing posts would require writing a more complex script.
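As a sketch of what such a script would have to do, assuming the old-to-new ID mapping has already been read out of the permalinks table: the `OLD_TO_NEW` dictionary, the link pattern, and the `/t/<id>` target form below are illustrative assumptions, not the real IPB URL scheme or an existing Discourse task.

```python
import re

# Hypothetical excerpt of the mapping: old forum id -> new Discourse topic id.
OLD_TO_NEW = {"4711": 123}

# Hypothetical old link shape, borrowed from the example earlier in this topic.
OLD_LINK = re.compile(r"/oldurl/something\.php\?id=(\d+)")


def rewrite_links(raw):
    """Replace each old link whose id is in the mapping; leave the rest alone."""
    def swap(match):
        new_id = OLD_TO_NEW.get(match.group(1))
        # Unknown ids stay untouched so nothing is silently broken.
        return match.group(0) if new_id is None else f"/t/{new_id}"
    return OLD_LINK.sub(swap, raw)
```

The per-link lookup is what a plain string remap cannot do, and applying it only to posts that match `OLD_LINK` keeps the number of changed posts small.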

(Jay Pfaffman) #20

No, it’s not, and as I said above, I’m not aware of other importers that replace internal links with new Discourse ones. That said, I do replace some URLs already for the quotation references, so I’m going to look today at also replacing at least the IPB links.