Canonical structure for /u/* causing many urls to be indexed

stance455 · May 5, 2022, 3:58pm

Initially, I wrote this for the “Bug” category, but don’t let me stop anyone who feels this warrants a PR or commit.

I’ve been studying the Discourse canonical structure pretty hard the past 2 months and overall it’s great.

Weird URLs with things like no_definitions=true or /search?q= give the correct canonical.

This all works when serving the JS version of the site to users and crawlers alike.

But it seems /u/* URLs were overlooked: they have canonicals, but each one points to itself, which is causing thousands of extra URLs to be indexed by Google.

The job:

I’d like every URL that continues past the username to have a canonical pointing to the main user profile page.

So /u/FlyNumber is the main profile page.

The following would have a canonical pointing to the above (instead of what happens now, which is a canonical pointing to itself):

/u/FlyNumber/summary
/u/FlyNumber/activity
/u/FlyNumber/activity/topics
/u/FlyNumber/activity/replies
/u/FlyNumber/activity/likes-given
/u/FlyNumber/badges

justin · May 5, 2022, 5:14pm

Is this for your custom indexing setup discussed here?

If so, it might be helpful to note this so those looking at the job know what they’re getting into.

By default, /u/ URLs are not indexed, which is set both in robots.txt and in a header sent with the initial page response.

stance455 · May 5, 2022, 5:36pm

No, I’ve dropped prerender, as it couldn’t render the main menu, login button, etc.

Googlebot is getting the “app” directly.

I have the hidden Discourse site setting set to serve crawlers the JS version. Google seems to handle it well. (More updates on that soon.)

Good point, /badges as well…

I’m using a custom-edited robots.txt file.

I’m using Cloudflare Workers to alter the header to 'index'.
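
For anyone wondering what that header change might look like, here is a minimal sketch. It assumes the header in question is X-Robots-Tag (the thread does not name it), and the function name withIndexHeader is hypothetical. It relies only on the standard Fetch API Headers and Response objects, which Workers expose.

```javascript
// Hypothetical helper: copy a response and override its robots header.
// Assumption: the noindex signal is delivered as the X-Robots-Tag header.
function withIndexHeader(response) {
  // Copy the headers, since headers on a fetched response may be immutable.
  const headers = new Headers(response.headers);
  headers.set('X-Robots-Tag', 'index');

  // Rebuild the response with the same body and status, but the new headers.
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

In a Worker's fetch handler this would presumably wrap the origin response, along the lines of `withIndexHeader(await fetch(request))`.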

stance455 · May 6, 2022, 6:22pm

/u/FlyNumber/summary
/u/FlyNumber/activity
/u/FlyNumber/activity/topics
/u/FlyNumber/activity/replies
/u/FlyNumber/activity/likes-given
/u/FlyNumber/badges

I’d also like to point out that removing the canonicals altogether for those URLs would be a step in the right direction. (IMHO my method is better for SEO.)

As stated above, these pages are set to noindex, so I’m not sure why Discourse generates the canonical at all.

stance455 · May 15, 2022, 5:14pm

Perhaps someone knows a clever way of doing this with JS and Cloudflare Workers? This way I avoid messing with the Discourse code.

I can set up a “trigger” on ~~/u/*/summary~~ (I can only trigger on /u/*) and execute something like this:

const canonical = document.querySelector('link[rel="canonical"]');
if (canonical !== null) {
  canonical.href = 'NEW_HREF_GOES_HERE';
}

What can I do here to pass the username along to NEW_HREF_GOES_HERE, so that the canonical ends up being /u/* instead of /u/*/summary?

Any help here is very much appreciated.
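
One possible direction, as a minimal sketch rather than a tested solution: pull the username out of the current path with a regular expression and rebuild the profile URL from it. The helper name canonicalProfilePath is hypothetical, and the sketch assumes the username is always the first path segment after /u/.

```javascript
// Hypothetical helper: map any /u/<username>/... path to /u/<username>.
// Returns null for paths that are not under /u/<username>.
function canonicalProfilePath(pathname) {
  // Capture the first path segment after /u/ (assumed to be the username).
  const match = /^\/u\/([^\/]+)/.exec(pathname);
  return match ? `/u/${match[1]}` : null;
}

// In a browser, rewrite the existing canonical tag if one is present.
if (typeof document !== 'undefined') {
  const canonical = document.querySelector('link[rel="canonical"]');
  const profilePath = canonicalProfilePath(window.location.pathname);
  if (canonical !== null && profilePath !== null) {
    canonical.href = window.location.origin + profilePath;
  }
}
```

With this, /u/FlyNumber/summary and /u/FlyNumber/activity/likes-given would both resolve to /u/FlyNumber. Note that a Worker has no document object, so the DOM part only applies in the browser; inside a Worker, the same helper could feed an HTMLRewriter element handler on link[rel="canonical"] instead.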

Edit

Perhaps someone can point me to the relevant GitHub page(s); I’ll take my chances editing the code.

system · June 14, 2022, 5:14pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
