Add rel="nofollow" admin setting is not working right, it’s making parent domain links nofollow

It’s not that simple.

uri_domain = uri.host  
uri_domain = "#{uri.host.split('.')[1]}.#{uri.host.split('.')[2]}" if uri.host && uri.host.split('.').size == 3  
uri_domain = "#{uri.host.split('.')[1]}.#{uri.host.split('.')[2]}.#{uri.host.split('.')[3]}" if uri.host && uri.host.split('.').size == 4  

This code fails when people use naked ccSLDs like example.co.uk (it will follow everything in co.uk).

Also myblog.blogspot.com will follow everything at blogspot.com
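To make the two failure cases concrete, here is a minimal sketch that wraps the quoted lines in a helper. The name dot_count_domain is hypothetical and only exists for this illustration; the logic is the same dot counting as above.

```ruby
require "uri"

# Hypothetical wrapper around the dot-counting lines quoted above.
def dot_count_domain(url)
  host = URI.parse(url).host
  return host if host.nil?

  parts = host.split(".")
  case parts.size
  when 3 then parts[1..2].join(".") # drops the first label
  when 4 then parts[1..3].join(".") # drops the first label
  else host
  end
end

puts dot_count_domain("http://www.example.com/")     # example.com (the intended case)
puts dot_count_domain("http://example.co.uk/")       # co.uk
puts dot_count_domain("http://myblog.blogspot.com/") # blogspot.com
```

The last two results are the problem: the "domain" that gets followed is a shared suffix, not the site’s own domain.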

You should use the Public Suffix List to resolve this…

This is about “making links follow” (technically, not making them nofollow), not about making them nofollow.
Or is that what you mean?

Yes, edited. Thanks.

Thanks for pointing this out. Will fix.

This is the desired behaviour. If the instance is hosted at myblog.blogspot.com then everything at blogspot.com should be followed.

Yes I agree, initially I used publicsuffix-ruby gem to achieve this, but all tests broke… :disappointed: Will try to fix tests.

Updated the PR to use publicsuffix-ruby gem. Tests are passing.

I am worried about this kind of stuff. How heavy is this dependency?

This is a 150k file: https://github.com/weppos/publicsuffix-ruby/blob/master/data/definitions.txt I bet this bloats us with at least 10k extra strings.

I do not think a whole library should be necessary for this change @techAPJ

Why can’t the test be “ends with correct domain suffix” and then “anything else on front with a period between”?

It’s because the requirements are like this:

  • Discourse at forum.example.com:
      • example.com/ - followed
      • www.example.com/ - followed
      • blog.example.com/ - followed
      • www.2example.com/ - nofollow
  • Discourse at forum.co.uk:
      • co.uk/ - nofollow
      • example.co.uk/ - nofollow
  • Discourse at awesome.website:
      • website/ - nofollow
      • example.website/ - nofollow

You can’t tell those apart with the number of dots.
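As a rough sketch of what a suffix list buys you, the requirements above can be checked by comparing "suffix plus one label" for the site and the link. SAMPLE_SUFFIXES is a tiny hand-picked stand-in for the real Public Suffix List, and both helper names are hypothetical; this is not how the publicsuffix-ruby gem is implemented.

```ruby
# Tiny hand-picked stand-in for the Public Suffix List (illustration only).
SAMPLE_SUFFIXES = %w[com uk co.uk website].freeze

# Returns the matching suffix plus one more label,
# or nil if the host is itself a bare suffix or no suffix matches.
def registrable_domain(host)
  labels = host.downcase.split(".")
  # Walk from the longest candidate suffix to the shortest.
  (0...labels.size).each do |i|
    suffix = labels[i..-1].join(".")
    next unless SAMPLE_SUFFIXES.include?(suffix)
    return nil if i.zero? # e.g. "co.uk" or "website" on its own
    return labels[(i - 1)..-1].join(".")
  end
  nil
end

# A link is followed only when both hosts share a registrable domain.
def same_site?(site_host, link_host)
  site = registrable_domain(site_host)
  !site.nil? && site == registrable_domain(link_host)
end

puts same_site?("forum.example.com", "blog.example.com") # true
puts same_site?("forum.example.com", "www.2example.com") # false
puts same_site?("forum.co.uk", "example.co.uk")          # false
puts same_site?("awesome.website", "example.website")    # false
```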

Number of dots should not be used.

Known domain name suffix of site should be used, e.g.

example.com
example.co.uk
example.website

With the rule being “any domain name with a dot plus this suffix is followed”

If necessary add another site setting to hold this value. If the value is not present, the nofollow will simply not be as accurate, e.g. it will nofollow stuff that it technically should not. It errs on the side of caution.
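The rule itself is small. A minimal sketch, assuming the setting holds a single suffix string; followed_host? is a hypothetical name, and treating the bare suffix itself as followed is my assumption (it matches the example.com/ case in the requirements).

```ruby
# Hypothetical helper: `suffix` stands for the value of the proposed setting.
def followed_host?(host, suffix)
  # No suffix configured: err on the side of caution and follow nothing.
  return false if host.nil? || suffix.nil? || suffix.empty?

  host = host.downcase
  suffix = suffix.downcase
  # The leading dot is what keeps "www.2example.com" from matching "example.com".
  host == suffix || host.end_with?(".#{suffix}")
end

puts followed_host?("www.example.com", "example.com")       # true
puts followed_host?("www.2example.com", "example.com")      # false
puts followed_host?("blog.example.co.uk", "example.co.uk")  # true
puts followed_host?("www.example.com", "")                  # false
```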

That’s much preferable to a giant library dependency…

Worst case, default to the Discourse domain, e.g. community.sitepoint.com. That way only internal links are exempt from nofollow, and we’d have to make it more generic by changing it to sitepoint.com.

However, it needs to be able to not match

example.com/sitepoint.com/mypage.php or similar related URL renderings.
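One way to avoid that kind of false match is to parse the URL and compare only its host. A minimal sketch using Ruby’s standard URI library; link_host and followed_url? are hypothetical names for this example.

```ruby
require "uri"

# Returns the host part of a URL, or nil if it cannot be parsed.
def link_host(url)
  URI.parse(url).host
rescue URI::InvalidURIError
  nil
end

# Compares only the host against the domain; the path is never inspected.
def followed_url?(url, domain)
  host = link_host(url)
  return false if host.nil?

  host = host.downcase
  host == domain || host.end_with?(".#{domain}")
end

puts followed_url?("http://community.sitepoint.com/t/1", "sitepoint.com")          # true
puts followed_url?("http://example.com/sitepoint.com/mypage.php", "sitepoint.com") # false
```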

We are going to need an extra site setting.

If the forum is at forum.ninjas.co.uk, then www.ninjas.co.uk should be followed.

If the forum is at beta.forum.ninjas.com, then www.ninjas.com should be followed.

So we need a site setting here, follow_links_domains, which the user can set. It will help in lots of other cases, like ninjas.community wanting www.ninjas.com followed.

Yes this only applies to the domain part of the URL.

However, I think our default behavior is correct now.

meta.discourse.org should only follow a.meta.discourse.org etc. Otherwise we need to carry around logic that tells us what is a TLD,

eg:

.community vs .com vs co.uk vs co.il

OR we need to do DNS queries and add extra complex logic.

No we don’t need to do any of that, we just need to add one more site setting: it contains

discourse.org

or

example.com
example.co.uk
example.website

We know that anything with a period added to the front of that is allowed. If this setting is not specified, then nofollow is super strict.

Note that @techapj said he fixed a bug with the current handling that seems kind of severe…

So we need that fix…

Sure, just saying current behavior is correct.

Also site setting should allow for a list of domains, not just one.

According to @techapj current behavior is not correct (at least for exclude_rel_nofollow_domains)

Yes, I agree. However I think we should just rename setting exclude_rel_nofollow_domains to follow_links_domains, because exclude_rel_nofollow_domains is essentially doing the same thing. So:

  • By default meta.discourse.org will only follow meta.discourse.org, a.meta.discourse.org, etc.
  • If the user adds discourse.org to follow_links_domains setting then meta.discourse.org will also follow discourse.org, www.discourse.org, try.discourse.org, etc.
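The two cases above could look roughly like this. It is a sketch, not the actual Discourse code: follow_host? is a hypothetical name, and I am assuming the domain list setting is stored as a pipe-separated string.

```ruby
# Hypothetical helper.
# Assumption: extra_domains is a pipe-separated list, e.g. "discourse.org|example.com".
def follow_host?(link_host, site_host, extra_domains = "")
  allowed = [site_host] + extra_domains.to_s.split("|")
  host = link_host.to_s.downcase

  allowed.any? do |domain|
    domain = domain.strip.downcase
    # Match the domain itself or anything with a dot in front of it.
    !domain.empty? && (host == domain || host.end_with?(".#{domain}"))
  end
end

# Default: only the site's own host and its subdomains are followed.
puts follow_host?("a.meta.discourse.org", "meta.discourse.org")                # true
puts follow_host?("www.discourse.org", "meta.discourse.org")                   # false

# With discourse.org added to the setting, the parent domain is followed too.
puts follow_host?("www.discourse.org", "meta.discourse.org", "discourse.org")  # true
```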

Sounds good?

I think this is fine:

I don’t want to rename this site setting, seems pointless.

Is it fixed? If I upgrade to latest version will it work or do I need to wait for next release?

Also, do I need to do anything for previously posted links, or will ‘nofollow’ automatically be removed from older root domain links?

It works fine, but you need to plug the domain name into the site setting.
