Canonical structure for /u/* causing many URLs to be indexed

Initially, I wrote this for the “Bug” category …

… but don’t let me stop you if anyone feels this should warrant a PR or commit :crossed_fingers:

I’ve been studying the Discourse canonical structure pretty hard the past 2 months and overall it’s great.

Weird URLs with things like no_definitions=true or /search?q= give the correct canonical.

This all works when serving the JS version of the site to users and crawlers alike.

But it seems /u/* URLs were overlooked - they have canonicals, but those canonicals point to URLs that cause thousands of extra URLs to be indexed by Google.

The job:

I’d like every URL under a user’s name to have a canonical pointing to the main user profile page.

So /u/FlyNumber is the main profile page

The following would have a canonical to the above (instead of what happens now, which is a canonical to itself):

/u/FlyNumber/summary
/u/FlyNumber/activity
/u/FlyNumber/activity/topics
/u/FlyNumber/activity/replies
/u/FlyNumber/activity/likes-given
/u/FlyNumber/badges

Is this for your custom indexing setup discussed here?

If so, it might be helpful to note this so those looking at the job know what they’re getting into.

By default, /u/ URLs are not indexed, which is set both in robots.txt and passed in the initial header request on the page.


No, I’ve stopped using prerender - it couldn’t render the main menu, login button, etc.

Googlebot is getting the “app” directly.

I have the hidden Discourse site setting set to serve crawlers the JS version. Google seems to handle it well. (More updates on that soon.)

Good point, /badges as well…

I’m using a custom edited robots file

I’m using Cloudflare Workers to alter the header to 'index' for these URLs:

/u/FlyNumber/summary
/u/FlyNumber/activity
/u/FlyNumber/activity/topics
/u/FlyNumber/activity/replies
/u/FlyNumber/activity/likes-given
/u/FlyNumber/badges

I’d also like to point out that removing the canonicals altogether for those URLs would be a step in the right direction. (IMHO my method is better for SEO.)

As stated above, these pages are set to noindex by default, so I’m not sure why Discourse generates the canonical at all.

Perhaps someone knows a clever way of doing this with JS and Cloudflare Workers? That way I avoid messing with the Discourse code.

I can set up a “trigger” on /u/*/summary - (I can only trigger on /u/*) - and execute something like this:

const canonical = document.querySelector('link[rel="canonical"]');
if (canonical !== null) {
  canonical.href = 'NEW_HREF_GOES_HERE';
}

What can I do here to pass the username along to NEW_HREF_GOES_HERE, so that the canonical ends up being /u/* instead of /u/*/summary?
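One idea, as a rough and untested sketch: the username is already in the page path, so it could be pulled out with a regular expression and used to build the new href. The helper name profileCanonicalPath is made up for illustration (it is not part of Discourse), and the pattern assumes the forum is served from the site root rather than a subfolder.

```javascript
// Hypothetical helper (not a Discourse API): map any /u/<username>/... path
// to the main profile path /u/<username>. Returns null for non-user paths.
// Assumes the forum lives at the site root (no subfolder prefix).
function profileCanonicalPath(pathname) {
  const match = pathname.match(/^\/u\/([^\/]+)(?:\/.*)?$/);
  return match ? `/u/${match[1]}` : null;
}

// Browser-only part: rewrite the existing canonical tag, if there is one.
if (typeof document !== 'undefined') {
  const canonical = document.querySelector('link[rel="canonical"]');
  const path = profileCanonicalPath(window.location.pathname);
  if (canonical !== null && path !== null) {
    canonical.href = window.location.origin + path;
  }
}
```

With this, /u/FlyNumber/summary and /u/FlyNumber/activity/likes-given would both resolve to /u/FlyNumber, while a path such as /search is left alone. On the Workers side, the same helper could presumably feed an HTMLRewriter element handler that sets the href attribute on link[rel="canonical"], so the change happens in the served HTML rather than in the browser.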

Any help here is very much appreciated.

Edit

Perhaps someone can point me to the relevant GitHub page(s) - I’ll take my chances editing the code.