Excluding user profiles in robots.txt (or allow edit of file)


(Lowell Heddings) #1

After the latest Google Panda update, I found myself on the unhappy end of a 15% drop in visitors via Google, which led me to do some research on what exactly Google has indexed for my site.

A quick search for “site:discuss.howtogeek.com/users/” showed that all of the user profiles are indexed in Google, despite having almost no useful information on any of them. On my very small forum, that’s 4,310 additional indexed URLs with very little quality content.

I noticed there is a setting that lets me disallow the entire forum from Google, but I think it would be better to have a way to exclude just the user profiles in robots.txt.

The ideal scenario would be if I could just edit the robots.txt in the settings, of course.


(Neil Lalonde) #2

Good catch. Web crawlers will see almost nothing there. My page as crawlers see it:

Do we even need a setting for this? We could just add a rule to the robots.txt to disallow indexing of user pages. I don’t think any Discourse install would want them to be indexed.
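
For illustration, a rule like that can be sketched and checked with Python's standard `urllib.robotparser`. The robots.txt body, the `is_crawlable` helper, and the example paths are assumptions made up for this sketch, not the rule Discourse actually ships; only the hostname comes from the search quoted above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: block every crawler from paths under /users/.
# The rule Discourse actually ships may be worded differently.
ROBOTS_TXT = """\
User-agent: *
Disallow: /users/
"""


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the sketch rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "https://discuss.howtogeek.com"
    # Both paths are made-up examples: one profile page, one topic page.
    for path in ("/users/example-user", "/t/example-topic/123"):
        print(path, is_crawlable(base + path))
```

With a prefix rule like this, anything under `/users/` is off limits to compliant crawlers, while topic pages stay crawlable.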


(Lowell Heddings) #3

That definitely works as a quick fix, and personally I’d think the default should be to disallow indexing of users.


(Jeff Atwood) #4

Yeah, we can just disallow indexing of user pages for now via robots.txt as a quick fix.

At best they would just be links to topics the user participated in, and maybe an “about me” on the rare chance they filled it out.

Go ahead and do that tomorrow @neil.


(Jeff Atwood) #5

User profiles are now disallowed from indexing by default.


(Jeff Atwood) #6