The “add rel nofollow” admin setting is not working right: it makes parent domain links nofollow

There is a setting in Admin panel:
add rel nofollow to user content: Add rel nofollow to all submitted user content, except for internal links (including parent domains). If you change this, you must rebake all posts with: “rake posts:rebake”

This setting is checked by default, but all links to the parent domain still get the nofollow attribute.

Please use the following steps to reproduce:

1 Go to the admin panel and search for the setting “add rel nofollow to user content” to make sure it is checked

2 Create a new post containing a link to the root domain

3 The inserted link automatically gets the rel=“nofollow” attribute despite being a parent domain link

That does seem like a regression @sam

Not following; how are you changing a site setting on try.discourse.org?

He’s not, the idea that links to

are nofollowed from

does not seem in tune with the description of the setting:

add rel nofollow to all submitted user content, except for internal links (including parent domains)

I think @eviltrout wrote this, but my guess is that www.discourse.org is not considered a parent domain of try.discourse.org … only discourse.org is.

Yes but I think it should be. Any subdomain of the parent domain should be followed.

Perhaps you can take this @techapj?

Looks like uri.host.ends_with?(site_uri.host) is the code that does it. It seems that try.discourse.org would not match www.discourse.org.

Are we sure we’d want it to do that? It might be weird from a security standpoint to make try match www.
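A minimal sketch of why the quoted check treats a sibling subdomain as external. The helper name `internal_by_suffix?` is hypothetical, and plain Ruby's `String#end_with?` stands in for the `ends_with?` alias that ActiveSupport provides; this is not the actual Discourse code, only the comparison quoted above.

```ruby
# Hypothetical helper mirroring the quoted comparison:
#   uri.host.ends_with?(site_uri.host)
# String#end_with? is the core Ruby method; ends_with? is the ActiveSupport alias.
def internal_by_suffix?(link_host, site_host)
  link_host.end_with?(site_host)
end

# A link back to the forum itself matches.
puts internal_by_suffix?("try.discourse.org", "try.discourse.org") # true

# A sibling subdomain does not end with the forum's full host name,
# so it is treated as external and gets nofollow.
puts internal_by_suffix?("www.discourse.org", "try.discourse.org") # false
```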

Speaking for Sitepoint, I could see us getting use out of it with our article integration. Having sitepoint articles followed by default would make a lot of sense, and in our case it would be community.sitepoint.com matching www.sitepoint.com.

I would think other article integrated sites would see value in it too.

It really should work this way.

Fixed via:

https://github.com/discourse/discourse/pull/3501

There was also a bug: when a domain like foo.com was added to the exclude_rel_nofollow_domains setting, a domain like nofoo.com was also excluded from nofollow. Fixed that too.
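For illustration, the foo.com / nofoo.com problem is what a bare suffix comparison produces, and requiring a dot boundary avoids it. The sketch below uses hypothetical helper names (`naive_excluded?`, `excluded?`) and is not the code from the pull request.

```ruby
# Bare suffix comparison: "nofoo.com" ends with "foo.com", so it slips through.
def naive_excluded?(host, excluded_domain)
  host.end_with?(excluded_domain)
end

# Boundary-aware comparison: the host must be the domain itself
# or end with "." followed by the domain.
def excluded?(host, excluded_domain)
  host == excluded_domain || host.end_with?(".#{excluded_domain}")
end

puts naive_excluded?("nofoo.com", "foo.com") # true  (the bug)
puts excluded?("nofoo.com", "foo.com")       # false
puts excluded?("www.foo.com", "foo.com")     # true
puts excluded?("foo.com", "foo.com")         # true
```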

It’s not that simple.

uri_domain = uri.host  
uri_domain = "#{uri.host.split('.')[1]}.#{uri.host.split('.')[2]}" if uri.host && uri.host.split('.').size == 3  
uri_domain = "#{uri.host.split('.')[1]}.#{uri.host.split('.')[2]}.#{uri.host.split('.')[3]}" if uri.host && uri.host.split('.').size == 4  

This code fails when people use naked ccSLDs like example.co.uk (it will follow everything in co.uk).

Also myblog.blogspot.com will follow everything at blogspot.com

You should use the Public Suffix List to resolve this…
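To make the failure concrete, here is a small sketch that wraps the three quoted lines in a hypothetical helper (`dot_count_domain` is an illustrative name, not something from the PR) and runs it on the hosts mentioned above.

```ruby
# Same logic as the quoted lines: drop the first label when the host
# has exactly three or four dot-separated labels, otherwise keep the host.
def dot_count_domain(host)
  parts = host.to_s.split(".")
  case parts.size
  when 3 then parts[1..2].join(".")
  when 4 then parts[1..3].join(".")
  else host
  end
end

puts dot_count_domain("forum.example.com")   # example.com   (intended result)
puts dot_count_domain("example.co.uk")       # co.uk         (far too broad)
puts dot_count_domain("myblog.blogspot.com") # blogspot.com
```

With a site hosted at example.co.uk, the computed "parent domain" is co.uk, so any host under co.uk would be treated as internal.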

This is about “making links follow” (technically, not making them nofollow), not about making them nofollow.
Or is that what you mean?

Yes, edited. Thanks.

Thanks for pointing this out. Will fix.

This is the desired behaviour. If the instance is hosted at myblog.blogspot.com then everything at blogspot.com should be followed.

Yes, I agree. Initially I used the publicsuffix-ruby gem to achieve this, but all the tests broke… :disappointed: I will try to fix the tests.

Updated the PR to use publicsuffix-ruby gem. Tests are passing.

I am worried about this kind of stuff. How heavy is this dependency?

This is a 150k file: https://github.com/weppos/publicsuffix-ruby/blob/master/data/definitions.txt. I bet this bloats us with at least 10k extra strings.

I do not think a whole library should be necessary for this change @techAPJ

Why can’t the test be “ends with correct domain suffix” and then “anything else on front with a period between”?

It’s because the requirements are like this:

  • Discourse at forum.example.com:
      • example.com/ - followed
      • www.example.com/ - followed
      • blog.example.com/ - followed
      • www.2example.com/ - nofollow
  • Discourse at forum.co.uk:
      • co.uk/ - nofollow
      • example.co.uk/ - nofollow
  • Discourse at awesome.website:
      • website./ - nofollow
      • example.website/ - nofollow

You can’t tell those apart with the number of dots.
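The requirements above can be sketched with a suffix lookup instead of dot counting. This is only an illustration: `SAMPLE_SUFFIXES` is a tiny hand-written stand-in for the real Public Suffix List, and `registrable_domain` and `same_site?` are hypothetical helper names, not the PR's code or the gem's API.

```ruby
require "set"

# Tiny hand-written sample; the real Public Suffix List has thousands of entries.
SAMPLE_SUFFIXES = Set["com", "uk", "co.uk", "website"].freeze

# Returns the longest matching public suffix plus one more label,
# or nil when the host is itself a public suffix or matches nothing.
def registrable_domain(host, suffixes = SAMPLE_SUFFIXES)
  labels = host.downcase.split(".")
  labels.each_index do |i|
    next unless suffixes.include?(labels[i..-1].join("."))
    return nil if i.zero? # the whole host is a public suffix
    return labels[(i - 1)..-1].join(".")
  end
  nil
end

# A link is followed when it shares the site's registrable domain.
def same_site?(link_host, site_host)
  site_domain = registrable_domain(site_host)
  !site_domain.nil? && registrable_domain(link_host) == site_domain
end

puts same_site?("www.example.com", "forum.example.com")  # true
puts same_site?("www.2example.com", "forum.example.com") # false
puts same_site?("example.co.uk", "forum.co.uk")          # false
puts same_site?("example.website", "awesome.website")    # false
```

Note that forum.example.com and example.co.uk both have three labels, yet their registrable domains are example.com and example.co.uk respectively, which is the distinction dot counting cannot make.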

Number of dots should not be used.

Known domain name suffix of site should be used, e.g.

example.com
example.co.uk
example.website

With the rule being “any domain name with a dot plus this suffix is followed”

If necessary add another site setting to hold this value. If the value is not present, the nofollow will simply not be as accurate, e.g. it will nofollow stuff that it technically should not. It errs on the side of caution.

That’s much preferable to a giant library dependency…

Worst case, default to the Discourse domain. That way, once added, it only follows internal links, i.e. community.sitepoint.com, and we’d have to make it more generic by changing it to sitepoint.com.

However, it needs to be able to not match

example.com/sitepoint.com/mypage.php or similar related URL renderings.
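A rough sketch of the setting-based approach, assuming a hypothetical `followed_url?` helper and a hypothetical `configured_suffix` value standing in for the proposed site setting. It compares only the parsed host, so a domain name appearing in the path cannot match, and it falls back to the forum's own host when the suffix is blank.

```ruby
require "uri"

# url:               the link found in the post
# site_host:         the forum's own host, used when no suffix is configured
# configured_suffix: hypothetical site setting, e.g. "sitepoint.com"
def followed_url?(url, site_host, configured_suffix = nil)
  suffix = (configured_suffix.to_s.empty? ? site_host : configured_suffix).downcase
  host = URI.parse(url).host
  return false if host.nil? # no host (e.g. scheme-less text): be cautious

  host = host.downcase
  # Followed only for the suffix itself or "<anything>." + suffix.
  host == suffix || host.end_with?(".#{suffix}")
rescue URI::InvalidURIError
  false
end

forum = "community.sitepoint.com"
puts followed_url?("https://www.sitepoint.com/article", forum, "sitepoint.com")            # true
puts followed_url?("https://example.com/sitepoint.com/mypage.php", forum, "sitepoint.com") # false
puts followed_url?("https://www.sitepoint.com/article", forum)                             # false
puts followed_url?("https://community.sitepoint.com/t/1", forum)                           # true
```

Without the setting, only the forum's own host is followed, which nofollows more than strictly necessary but errs on the side of caution.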