Add rel="nofollow" admin setting is not working right; it's marking parent-domain links as nofollow


(Love Chopra) #1

There is a setting in Admin panel:
add rel nofollow to user content: Add rel nofollow to all submitted user content, except for internal links (including parent domains). If you change this, you must rebake all posts with: “rake posts:rebake”

This setting is checked by default, but all links to the parent domain still get the nofollow attribute.

Please use the following steps to reproduce:

1. Go to the admin panel and search for the setting “add rel nofollow to user content”, just to make sure it is checked.

2. Create a new post and include a link to the root domain.

3. The inserted link automatically gets the rel=“nofollow” attribute, despite being a parent-domain link.


(Jeff Atwood) #2

That does seem like a regression @sam


(Sam Saffron) #3

Not following; how are you changing the site setting on try.discourse.org?


(Jeff Atwood) #4

He’s not. The idea that links to

are nofollowed from

does not seem in tune with the description of the setting:

add rel nofollow to all submitted user content, except for internal links (including parent domains)


(Sam Saffron) #5

I think @eviltrout wrote this, but my guess is that www.discourse.org is not considered a parent domain of try.discourse.org … only discourse.org is.


(Jeff Atwood) #6

Yes, but I think it should be. Any subdomain of the parent domain should be followed.

Perhaps you can take this @techapj?


(Robin Ward) #7

Looks like uri.host.ends_with?(site_uri.host) is the code that does it. It seems that try.discourse.org would not match www.discourse.org.
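A minimal Ruby sketch of a check shaped like the one quoted, to make the mismatch concrete. It uses core `String#end_with?` (ActiveSupport's `ends_with?` is an alias for it) and a hypothetical helper name, `naive_internal?`; it is not the actual Discourse code, which may compare against a different host value.

```ruby
require "uri"

# Hypothetical helper mirroring the shape of the quoted check:
# a link counts as internal when its host ends with the site's host.
def naive_internal?(link_url, site_url)
  link_host = URI.parse(link_url).host
  site_host = URI.parse(site_url).host
  return false if link_host.nil? || site_host.nil?

  link_host.end_with?(site_host)
end

site = "https://try.discourse.org"
naive_internal?("https://try.discourse.org/t/123", site) # => true
naive_internal?("https://www.discourse.org/", site)      # => false (sibling subdomain)
```

A bare suffix test only accepts the site's own host and hosts ending in it, so a sibling such as www.discourse.org can never match try.discourse.org.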

Are we sure we’d want it to do that? It might be weird from a security standpoint to make try match www.


(cpradio) #8

Speaking for Sitepoint, I could see us making use of this with our article integration. Having Sitepoint articles followed by default would make a lot of sense; in our case, community.sitepoint.com would match www.sitepoint.com.

I would think other sites with article integration would see value in it too.


(Jeff Atwood) #9

It really should work this way.


(Arpit Jalan) #10

Fixed via:

https://github.com/discourse/discourse/pull/3501

There was also a bug where, when a domain like foo.com was added to the exclude_rel_nofollow_domains setting, a domain like nofoo.com was also excluded from nofollow. Fixed that too.
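The nofoo.com bug is what a suffix test does when it is not anchored at a label boundary. A minimal sketch of a boundary-aware check, assuming the setting is available as a plain array of domain strings and that a listed domain should also cover its subdomains; the helper names are hypothetical and this is not the code from the PR.

```ruby
require "uri"

# True when `host` equals `domain` or is a subdomain of it.
# Requiring a "." before the suffix stops foo.com from matching nofoo.com.
def host_matches_domain?(host, domain)
  host = host.downcase
  domain = domain.downcase
  host == domain || host.end_with?(".#{domain}")
end

# `excluded` stands in for the exclude_rel_nofollow_domains list (assumption).
def excluded_from_nofollow?(url, excluded)
  host = URI.parse(url).host
  return false if host.nil?

  excluded.any? { |domain| host_matches_domain?(host, domain) }
end

excluded = ["foo.com"]
excluded_from_nofollow?("https://foo.com/page", excluded)      # => true
excluded_from_nofollow?("https://blog.foo.com/page", excluded) # => true
excluded_from_nofollow?("https://nofoo.com/page", excluded)    # => false
```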


(Michael - DiscourseHosting.com) #11

It’s not that simple.

```ruby
uri_domain = uri.host
uri_domain = "#{uri.host.split('.')[1]}.#{uri.host.split('.')[2]}" if uri.host && uri.host.split('.').size == 3
uri_domain = "#{uri.host.split('.')[1]}.#{uri.host.split('.')[2]}.#{uri.host.split('.')[3]}" if uri.host && uri.host.split('.').size == 4
```

This code fails when people use naked ccSLD’s like example.co.uk (will follow everything in co.uk).

Also myblog.blogspot.com will follow everything at blogspot.com
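To see both failures directly, here is the quoted label-stripping logic wrapped in a hypothetical helper, `heuristic_domain`, with the same behaviour as the three quoted lines (drop the first label of a 3- or 4-label host, otherwise keep the host).

```ruby
# Same logic as the quoted snippet, restructured into a method.
def heuristic_domain(host)
  return host if host.nil?

  parts = host.split(".")
  case parts.size
  when 3 then parts[1..2].join(".") # a.b.c   -> b.c
  when 4 then parts[1..3].join(".") # a.b.c.d -> b.c.d
  else host
  end
end

heuristic_domain("forum.example.com")   # => "example.com"  (intended)
heuristic_domain("example.co.uk")       # => "co.uk"        (a public suffix)
heuristic_domain("myblog.blogspot.com") # => "blogspot.com" (every *.blogspot.com host now matches)
```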

You should use the Public Suffix List to resolve this…
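A minimal sketch of the Public Suffix List idea, assuming a tiny hand-picked subset of rules rather than the real list. `SAMPLE_SUFFIXES` and `registrable_domain` are hypothetical names, and the real list's wildcard and exception rules are deliberately not handled. The registrable domain is the longest matching public suffix plus one more label, which is what counting labels tries, and fails, to approximate.

```ruby
# Illustrative subset only: the real list has thousands of entries plus
# wildcard (*.) and exception (!) rules, none of which are handled here.
SAMPLE_SUFFIXES = %w[com uk co.uk website blogspot.com].freeze

# Returns the longest matching public suffix plus one more label,
# or nil when the host is itself a public suffix.
def registrable_domain(host, suffixes = SAMPLE_SUFFIXES)
  labels = host.downcase.split(".")
  # Scan from the whole host towards the last label, so the first
  # hit is the longest matching suffix.
  labels.each_index do |i|
    next unless suffixes.include?(labels[i..-1].join("."))
    return nil if i.zero?

    return labels[(i - 1)..-1].join(".")
  end
  # No rule matched: apply the list's default rule ("*"), i.e. treat
  # the last label as the public suffix.
  labels.size >= 2 ? labels.last(2).join(".") : nil
end

registrable_domain("forum.example.com")   # => "example.com"
registrable_domain("example.co.uk")       # => "example.co.uk"
registrable_domain("myblog.blogspot.com") # => "myblog.blogspot.com"
registrable_domain("co.uk")               # => nil
```

With co.uk in the rules, example.co.uk keeps all three labels instead of collapsing to co.uk. Including blogspot.com likewise keeps each Blogger site separate; whether entries like that should count is a policy choice.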


(Mittineague) #12

This is about making links followed (technically, not adding nofollow to them), not about making them nofollow.
Or is that what you meant?


(Michael - DiscourseHosting.com) #13

Yes, edited. Thanks.


(Arpit Jalan) #14

Thanks for pointing this out. Will fix.

This is the desired behaviour. If the instance is hosted at myblog.blogspot.com, then everything at blogspot.com should be followed.

Yes, I agree. Initially I used the publicsuffix-ruby gem to achieve this, but all the tests broke… :disappointed: Will try to fix the tests.


(Arpit Jalan) #15

Updated the PR to use the publicsuffix-ruby gem. Tests are passing.


(Sam Saffron) #16

I am worried about this kind of stuff; how heavy is this dependency?

This is a 150 KB file: https://github.com/weppos/publicsuffix-ruby/blob/master/data/definitions.txt
I bet it bloats us with at least 10k extra strings.


(Jeff Atwood) #17

I do not think a whole library should be necessary for this change @techAPJ

Why can’t the test be “ends with the correct domain suffix” and then “anything else in front, with a period between”?


(Kane York) #18

It’s because the requirements are like this:

  • Discourse at forum.example.com:
      • example.com/ - followed
      • www.example.com/ - followed
      • blog.example.com/ - followed
      • www.2example.com/ - nofollow
  • Discourse at forum.co.uk:
      • co.uk/ - nofollow
      • example.co.uk/ - nofollow
  • Discourse at awesome.website:
      • website./ - nofollow
      • example.website/ - nofollow

You can’t tell those apart with the number of dots.
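To see why, take one case from each of the first two groups. The site and link hosts have identical label counts but opposite expected results, so any rule that looks only at dots must get one of them wrong. `CASES` and `dot_counts` are illustrative names.

```ruby
# Two cases from the list, as [site host, link host, expected result].
CASES = [
  ["forum.example.com", "www.example.com", :followed],
  ["forum.co.uk",       "example.co.uk",   :nofollow],
].freeze

# A rule based only on dot counts sees this pair and nothing else.
def dot_counts(site, link)
  [site.count("."), link.count(".")]
end

CASES.each do |site, link, expected|
  puts "#{site} / #{link}: dots=#{dot_counts(site, link).inspect} expected=#{expected}"
end
# forum.example.com / www.example.com: dots=[2, 2] expected=followed
# forum.co.uk / example.co.uk: dots=[2, 2] expected=nofollow
```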


(Jeff Atwood) #19

Number of dots should not be used.

The site's known domain-name suffix should be used, e.g.

example.com
example.co.uk
example.website

The rule would be: “any domain name with a dot plus this suffix is followed”.

If necessary, add another site setting to hold this value. If the value is not present, the nofollow will simply be less accurate, i.e. it will nofollow some links that it technically should not. It errs on the side of caution.
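A minimal sketch of that rule, assuming a hypothetical site setting passed in as `site_suffix`, and assuming that a blank setting falls back to the forum's own host, which is the cautious behaviour described. `followed?` is a hypothetical name, not an existing Discourse method.

```ruby
require "uri"

# `site_suffix` stands in for a hypothetical new site setting such as
# "example.com", "example.co.uk" or "example.website".
def followed?(link_url, forum_host, site_suffix = nil)
  host = URI.parse(link_url).host
  return false if host.nil?

  # Blank setting: only the forum's own host (and its subdomains) is followed.
  suffix = site_suffix.to_s.empty? ? forum_host : site_suffix
  host = host.downcase
  suffix = suffix.downcase
  # The suffix itself, or anything + "." + suffix.
  host == suffix || host.end_with?(".#{suffix}")
end

followed?("https://www.example.com/", "forum.example.com", "example.com")  # => true
followed?("https://www.2example.com/", "forum.example.com", "example.com") # => false
followed?("https://example.co.uk/", "forum.co.uk")                         # => false
```

Checked against the cases from the list earlier in the thread: with the setting at example.com, example.com, www.example.com and blog.example.com are followed and www.2example.com is not; with no setting, co.uk, example.co.uk and example.website all stay nofollow.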

That’s much preferable to a giant library dependency…


(cpradio) #20

Worst case, default to the Discourse domain. That way, once the setting is added, only internal links (i.e. community.sitepoint.com) are followed, and we’d have to make it more generic by changing it to sitepoint.com.

However, it must not match URLs like

example.com/sitepoint.com/mypage.php or similar variations.
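Comparing the parsed host rather than the raw URL string handles this. A minimal sketch using Ruby's standard `URI` library; `host_under?` is a hypothetical helper.

```ruby
require "uri"

# Compare against the parsed host only, never the raw URL string, so a
# domain that appears in the path cannot make a link look internal.
def host_under?(url, suffix)
  host = URI.parse(url).host.to_s.downcase
  suffix = suffix.downcase
  host == suffix || host.end_with?(".#{suffix}")
end

url = "http://example.com/sitepoint.com/mypage.php"
url.include?("sitepoint.com")     # => true  (plain string matching is fooled)
URI.parse(url).host               # => "example.com"
host_under?(url, "sitepoint.com") # => false
```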