User profiles / User directory in robots.txt / x-robots-tag header

EDIT: This turns out to be a bug; see post #3.

If “Hide user profiles from public” is checked, shouldn’t /u be disallowed in robots.txt?
Otherwise, search engines will hit a 403, which might affect ranking and visibility.

This was supposedly implemented in 2014: see “Excluding user profiles in robots.txt (or allow edit of file)” (post #2 by neil) and the commit “Disallow /users/ in robots.txt” (discourse/discourse@8267a45).

The only Discourse forum I could find that has Disallow: /u is Meta :thinking:

Maybe Meta was customized so the migration in “FIX: Remove /u/ from robots” (Pull Request #30782 by nattsw, discourse/discourse) was skipped.

Good find @Moin

Currently, we already add noindex to /u routes. However, due to robots.txt blocking this, search engines are not able to see the header.
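To make the point in that quote concrete, here is a minimal Ruby sketch of how a well-behaved crawler might behave. The helper names and the plain prefix matching are assumptions for illustration, not Discourse or search-engine code.

```ruby
# Hypothetical model: a crawler only sees response headers for paths it may fetch.
def disallowed?(path, disallow_rules)
  # Simplified prefix matching; real robots.txt parsers also handle wildcards.
  disallow_rules.any? { |rule| path.start_with?(rule) }
end

def crawler_sees_noindex?(path, disallow_rules, response_headers)
  # A disallowed path is never requested, so its headers are never read.
  return false if disallowed?(path, disallow_rules)

  response_headers.fetch("x-robots-tag", "").include?("noindex")
end

headers = { "x-robots-tag" => "noindex" }
puts crawler_sees_noindex?("/u/rgj", ["/u/"], headers) # false
puts crawler_sees_noindex?("/u/rgj", [], headers)      # true
```

In this model, the header can only take effect once /u/ is no longer disallowed.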

That PR in turn refers to a previous change, “FIX: Always noindex /u routes” (Pull Request #27712 by nattsw, discourse/discourse), which says:

Secondly, SiteSetting.hide_user_profiles_from_public raises a Forbidden, which disallows our after_action: add no index header from triggering.

This PR makes sure that the no index header gets added via before_action instead. We may consider removing /u from

discourse/app/controllers/robots_txt_controller.rb, line 24 in 2900cbe:

DISALLOWED_WITH_HEADER_PATHS = %w[/badges /u/ /my /search /tag/*/l /g /t/*/*.rss /c/*.rss]
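The before_action versus after_action difference described in that PR can be sketched in plain Ruby. This is a stand-in for a controller filter chain, not Rails or Discourse code; `run_request`, `add_noindex`, and `hidden_profile` are hypothetical names.

```ruby
# Plain-Ruby stand-in for a controller filter chain (an assumption, not Rails itself).
class Forbidden < StandardError; end

def run_request(before: [], after: [], &action)
  headers = {}
  status = 200
  begin
    before.each { |filter| filter.call(headers) }
    action.call
    after.each { |filter| filter.call(headers) } # skipped when the action raises
  rescue Forbidden
    status = 403
  end
  [status, headers]
end

add_noindex = ->(headers) { headers["X-Robots-Tag"] = "noindex" }
hidden_profile = -> { raise Forbidden } # models hide_user_profiles_from_public

status, headers = run_request(after: [add_noindex], &hidden_profile)
puts "after filter:  #{status}, noindex set? #{headers.key?('X-Robots-Tag')}"

status, headers = run_request(before: [add_noindex], &hidden_profile)
puts "before filter: #{status}, noindex set? #{headers.key?('X-Robots-Tag')}"
```

Both requests end in a 403 here, but only the before filter gets the header onto the response.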

Unfortunately, that does not always work. While /u/rgj does have an x-robots-tag: noindex header, /u/rgj/summary does not, so the most recent change seems to have some unwanted side effects.
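For anyone who wants to check more /u routes, a small Ruby helper along these lines could flag the ones missing the header. It is only a sketch: `paths_missing_noindex` is a hypothetical name, and the sample hash mirrors the observation above rather than a live fetch.

```ruby
# Returns the paths whose response headers lack a noindex X-Robots-Tag.
def paths_missing_noindex(headers_by_path)
  headers_by_path.reject do |_path, headers|
    # Header names are case-insensitive, so normalize before looking up.
    normalized = headers.transform_keys(&:downcase)
    normalized.fetch("x-robots-tag", "").include?("noindex")
  end.keys
end

# Sample data mirroring what is reported above; not fetched from a server.
observed = {
  "/u/rgj" => { "x-robots-tag" => "noindex" },
  "/u/rgj/summary" => {}
}
p paths_missing_noindex(observed) # ["/u/rgj/summary"]
```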

(Moving to Bug)
