Negative SEO because of too many category pages getting indexed

If you search Google for all indexed category pages on meta.discourse.org with the following query

site:meta.discourse.org/c/

you will find 1670 indexed pages. Similarly, I have a Discourse site with 50 categories,
and more than 23 thousand of its category pages are indexed. (A very, very dangerous signal for SEO because of duplicate content.)
It should be fixed, otherwise Google will penalise the rankings of Discourse websites.
Look at the following screenshot to see how duplicate pages are created and indexed.

I tried to write about this earlier. This is an example:
https://meta.discourse.org/t/seo-for-discourse-not-working-properly/60709/5?u=stranik

Possible robots.txt rules:

Disallow: /latest?exclude_*
Disallow: /*?no_subcategories=*
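
To check which URLs rules like these would actually catch, here is a minimal Python sketch of the wildcard matching Google documents for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end, and otherwise a rule matches as a prefix of the path plus query string. The helper names are hypothetical, and this is a simplification, not a full robots.txt parser (it ignores `Allow` rules and longest-match precedence).

```python
import re


def rule_to_regex(rule):
    """Translate one robots.txt path rule into a compiled regex.

    Assumes Google's documented semantics: '*' matches any sequence of
    characters, a trailing '$' anchors the end of the URL, and rules
    otherwise match as prefixes (re.match anchors only at the start).
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """Return True if any Disallow rule matches the path plus query string.

    Simplification: Allow rules and rule precedence are not modelled.
    """
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


RULES = ["/latest?exclude_*", "/*?no_subcategories=*"]

for path in [
    "/latest?exclude_category_ids=5",
    "/c/plugin?no_subcategories=false&page=2&slow_platform=20",
    "/c/plugin?page=2",
    "/c/plugin?page=2&no_subcategories=false",
]:
    print(path, "->", "blocked" if is_disallowed(path, RULES) else "allowed")
```

Under these semantics, `/*?no_subcategories=*` only matches when `no_subcategories` is the first query parameter, so `/c/plugin?page=2&no_subcategories=false` is still allowed. Robots.txt rules are order-sensitive in a way a canonical tag is not.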

I think it is more reliable to look directly at the search index. I don't know whether Google offers such a tool. Yandex has a detailed report covering duplicates, low-quality content, all types of errors, excluded and included pages, crawl graphs, etc. According to that report, there are a lot of duplicate pages.

I think you may be expecting the Google search modifier to behave in a way it doesn't.

Refine web searches - Google Search Help

Search for a specific site
Put “site:” in front of a site or domain. For example, site:youtube.com or site:.gov.

Do you have a Google reference indicating that a site: query containing a sub-folder limits results to only that folder?

See the screenshot: the same category pages are indexed multiple times.

The issue is these weirdo querystrings… where is this crap coming from?

page I can certainly understand, but…

no_subcategories?
slow_platform?
per_page?

wtf?

Just combining those three plus page, that’s

16 categories * 4 * 3 * 2 = 384

384 category “pages”
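
The 384 figure is just the product of how many distinct values each parameter can take. A small Python sketch makes the blow-up concrete by enumerating every URL with `itertools.product`. The 16 category slugs and the specific value lists are hypothetical placeholders sized to reproduce the 4 × 3 × 2 factors from the post; which parameter gets which factor is an assumption.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical placeholders: 16 category slugs plus value sets of size
# 4, 3 and 2, mirroring "16 categories * 4 * 3 * 2". The mapping of
# factors to parameters is assumed for illustration only.
categories = [f"category-{i}" for i in range(1, 17)]
pages = [1, 2, 3, 4]
per_page_values = [20, 30, 50]
no_subcategories_values = ["true", "false"]

# One URL per combination; each is a distinct crawlable address
# for what is essentially the same category listing.
urls = [
    f"/c/{cat}?" + urlencode({"page": p, "per_page": n, "no_subcategories": s})
    for cat, p, n, s in product(
        categories, pages, per_page_values, no_subcategories_values
    )
]

print(len(urls))  # 384
print(urls[0])    # /c/category-1?page=1&per_page=20&no_subcategories=true
```

Every extra parameter multiplies the count instead of adding to it, which is why a handful of query strings can turn 50 categories into tens of thousands of indexed URLs.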

I think I see the problem, and we need to fix it ASAP

https://meta.discourse.org/c/plugin?no_subcategories=false&page=2&slow_platform=20

This page must have a canonical meta-tag like so:

<link rel="canonical" href="https://meta.discourse.org/c/plugin?page=2" />
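
The idea behind that tag is to collapse every query-string variant onto a single address. Here is a hypothetical Python sketch of deriving it: keep a whitelist of meaningful parameters (assumed here to be just `page`) and drop everything else. This is not Discourse's implementation, which lives in its Ruby codebase; `KEEP_PARAMS`, `canonical_url` and `canonical_link_tag` are illustrative names.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only "page" changes the content of a category listing;
# every other query parameter is treated as a duplicate-producing variant.
KEEP_PARAMS = ("page",)


def canonical_url(url):
    """Strip all query parameters except those in KEEP_PARAMS, plus any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in KEEP_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Render the <link rel="canonical"> element, HTML-escaping the href."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


print(canonical_link_tag(
    "https://meta.discourse.org/c/plugin?no_subcategories=false&page=2&slow_platform=20"
))
```

For the URL above this prints exactly the tag suggested in the post. A whitelist is used rather than a blacklist so that new, unknown parameters are dropped by default.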

Can you take this, @techapj, please? The meta-tag should be served to all clients, including the Google crawler.

We used to include the canonical meta tag on the /c/plugin page, but (I think) this regressed 4 months ago when we introduced the “Default Topic List” category setting.

This should now be fixed, and a test case is added to prevent future regression.

https://github.com/discourse/discourse/commit/72c92b0f4e2c324a9902613c9464fc45bf7fc09b

Now all category pages should have a canonical meta tag, except the Top category page (/c/plugin/l/top).
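
A regression check of this kind boils down to "does the rendered HTML contain exactly one canonical link, and does it point where we expect?" Below is a minimal, offline Python sketch of that check using the standard library's `html.parser`. It is not the test case added in the Discourse commit (that one is Ruby); `CanonicalFinder` and `canonical_hrefs` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser's default handle_startendtag forwards self-closing
        # tags such as <link ... /> to this method as well.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, so test membership.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def canonical_hrefs(html):
    """Return all canonical hrefs found in an HTML document."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


sample_page = """<html><head>
<title>plugin</title>
<link rel="canonical" href="https://meta.discourse.org/c/plugin?page=2" />
</head><body></body></html>"""

print(canonical_hrefs(sample_page))
```

Asserting on exactly one result also catches pages that accidentally emit two conflicting canonical tags, not just pages that emit none.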

We should probably backport this fix…

Okay, I backported this fix to the beta and stable branches.

This topic was automatically closed after 28 hours. New replies are no longer allowed.