"Host is invalid" error when TLD is longer than 7 characters


(Flimm) #1

When setting the allowed hosts for embedding, I get this error: Host is invalid.

This is because the host fails the regex on line 26 of embeddable_host.rb:

def host_must_be_valid
  if host !~ /\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,7}(:[0-9]{1,5})?(\/.*)?\Z/i &&
     host !~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\Z/
    errors.add(:host, I18n.t('errors.messages.invalid'))
  end
end

This regex only passes if the TLD is seven characters long or fewer. There are many TLDs that are longer than that nowadays, for instance:

  • .cancerresearch
  • .xn--y9a3aq
  • .consulting
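A quick way to confirm the diagnosis is to run the two patterns from `host_must_be_valid` in plain Ruby, outside Rails. This is a sketch: the regexes are copied verbatim, and `host_valid?` is a hypothetical helper that mirrors the validation condition (the host is accepted if it matches either pattern).

```ruby
# Patterns copied from host_must_be_valid in embeddable_host.rb.
HOST_RE = /\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,7}(:[0-9]{1,5})?(\/.*)?\Z/i
IP_RE   = /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\Z/

# Hypothetical helper (not part of Discourse): the validator only adds
# an error when the host matches neither pattern.
def host_valid?(host)
  HOST_RE.match?(host) || IP_RE.match?(host)
end

%w[
  example.com
  example.cancerresearch
  example.consulting
  example.xn--y9a3aq
].each do |host|
  puts "#{host.ljust(24)} #{host_valid?(host) ? 'valid' : 'invalid'}"
end
```

Only `example.com` passes. Note that `.xn--y9a3aq` fails for a second reason as well: the TLD class `[a-z]` admits no digits or hyphens, so raising the length bound alone will not help punycode TLDs.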

The fix should be fairly easy.


(Régis Hanol) #2

Then why don’t you submit a pull request? :wink:


(Flimm) #3

Because I haven’t decided what I think about CLAs in general or this one in particular. Also, I have not tested the fix. It should be fairly easy for someone who is fine with the CLA and who is already used to getting Discourse to build and run tests.


(Mittineague) #4

Please post your changes and I’ll be happy to test them and, if the tests pass, open a PR for you.


(Flimm) #5

def host_must_be_valid
  if host !~ /\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,70}(:[0-9]{1,5})?(\/.*)?\Z/i &&
     host !~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\Z/
    errors.add(:host, I18n.t('errors.messages.invalid'))
  end
end

There you go. I changed 7 to 70. I release my changes under the Creative Commons CC0 license, dedicating them to the public domain.
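To try the patch without a full Discourse build, the modified host pattern can be exercised in plain Ruby. This is a sketch: `patched_ok?` is a hypothetical helper, the sample hosts are illustrative, and the unchanged IP-address pattern is left out.

```ruby
# Host pattern from the patch above: only the TLD bound changed, {2,7} -> {2,70}.
PATCHED_HOST_RE = /\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,70}(:[0-9]{1,5})?(\/.*)?\Z/i

# Hypothetical helper for trying hosts against the patched pattern.
def patched_ok?(host)
  PATCHED_HOST_RE.match?(host)
end

%w[
  example.com
  behold.photography
  example.cancerresearch
  example.travelersinsurance
  example.com:8080/embed
].each { |host| puts "#{host}: #{patched_ok?(host) ? 'accepted' : 'rejected'}" }
```

All five sample hosts should be accepted, including the optional port and path. The character class `[a-z]` is unchanged, so TLDs containing digits or hyphens are still rejected.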


(Mittineague) #6

Thanks. That does look like an exceedingly simple change.

From what I could find, the DNS RFC (RFC 1035) allows labels of up to 63 characters.
Yet the longest approved TLD I could find is 18 characters.

http://data.iana.org/TLD/tlds-alpha-by-domain.txt

  • travelersinsurance
  • northwesternmutual
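The IANA file is plain text: comment lines start with `#`, followed by one upper-case TLD per line. A minimal Ruby sketch for finding the longest entries, assuming that format; `longest_tlds` is a hypothetical helper, and the sample string is a small stand-in built from TLDs mentioned in this thread, not the real file.

```ruby
# Returns every TLD (lower-cased) that has the maximum length.
# Assumes the IANA layout: '#' comment lines, then one TLD per line.
def longest_tlds(text)
  tlds = text.each_line
             .map(&:strip)
             .reject { |line| line.empty? || line.start_with?("#") }
             .map(&:downcase)
  max = tlds.map(&:length).max
  tlds.select { |tld| tld.length == max }
end

sample = <<~LIST
  # header line (the real file has a version comment here)
  COM
  PHOTOGRAPHY
  NORTHWESTERNMUTUAL
  TRAVELERSINSURANCE
  XN--Y9A3AQ
LIST

p longest_tlds(sample) # => ["northwesternmutual", "travelersinsurance"]
```

To run it against the live list, `longest_tlds(Net::HTTP.get(URI("http://data.iana.org/TLD/tlds-alpha-by-domain.txt")))` after `require "net/http"` should work; the answer depends on the list at the time it is fetched.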

A lot of code touches “host”, and hopefully allowing more characters won’t break anything, for example:

  • an HTML element whose CSS doesn’t account for longer names
  • a database table field that might truncate or null out longer names
  • other code that uses the value, e.g. something like a “reverse string position” lookup

I made up a fake domain, "kleinfeltersville.travelersinsurance", and it didn’t break the Admin -> Settings -> Onebox UI.

But I could use your help in trying to embed a real domain.

Please post the longest-named example URL you know of that passes the Iframely URL Debugger (Open Graph, Twitter Cards, oEmbed).


(Flimm) #7

Here’s an example of a URL with a long TLD: http://behold.photography/


(Eli the Bearded) #8

The patched regex still does not accept the Armenian internationalized TLD from the first post:

.xn--y9a3aq

However, I cannot find any active domains in that TLD to test with Iframely. ICANN names a Persian TLD and site, http://نمونہ.آزمایشی (http://xn--hhbbbh02d.xn--hgbk6aj7f53bba), which also does not appear to actually be in use, but it gives a more extreme example, and presumably one that should be supported.
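Running the punycode forms through the patched pattern in plain Ruby shows why they fail. The TLD part is still `[a-z]{2,70}`, which allows no digits or hyphens, and the leading `([\-\.]{1}[a-z0-9]+)*` part requires an alphanumeric after every hyphen, so the `xn--` double hyphen is rejected in any label. A sketch; `example.xn--y9a3aq` is a hypothetical host.

```ruby
# Patched pattern from post #5 (TLD bound raised to 70, character class unchanged).
PATCHED_HOST_RE = /\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,70}(:[0-9]{1,5})?(\/.*)?\Z/i

hosts = [
  "example.xn--y9a3aq",               # hypothetical host in the Armenian TLD
  "xn--hhbbbh02d.xn--hgbk6aj7f53bba", # ICANN's Persian example, punycode form
]

hosts.each do |host|
  tld = host.split(".").last
  verdict = PATCHED_HOST_RE.match?(host) ? "accepted" : "rejected"
  puts "#{host}: #{verdict} (TLD has digits or hyphens: #{tld.match?(/[0-9-]/)})"
end
```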

My understanding of the rules for DNS labels, from memory, is:

  • All labels are 1 to 63 characters long, drawn from case-insensitive ASCII A to Z, 0 to 9, and - (hyphen).
  • No label may start with a hyphen.
  • No top-level domain label may start with a digit.

That means a regexp for a valid domain name would look like:

/\A([a-z0-9][a-z0-9-]{0,62}\.)+[a-z][a-z0-9-]{0,62}\.?\z/i
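A quick Ruby check of that pattern against the hosts discussed in this thread. This sketch uses `\A`/`\z` anchors and the `i` flag: in Ruby, `^` and `$` match at line boundaries, and the rules call for case-insensitive matching. The sample hosts other than those quoted above are illustrative.

```ruby
# Domain-name pattern with Ruby string anchors (\A, \z) and /i.
DOMAIN_RE = /\A([a-z0-9][a-z0-9-]{0,62}\.)+[a-z][a-z0-9-]{0,62}\.?\z/i

%w[
  behold.photography
  Example.COM
  xn--hhbbbh02d.xn--hgbk6aj7f53bba
  example.com.
  example.123
  -bad.example
].each { |s| puts "#{s}: #{DOMAIN_RE.match?(s)}" }

# Unlike the Discourse pattern, this one has no port or path part.
puts DOMAIN_RE.match?("example.com:8080")        # false

# With ^/$ a multi-line string slips through; \A/\z rejects it.
loose = /^([a-z0-9][a-z0-9-]{0,62}\.)+[a-z][a-z0-9-]{0,62}\.?$/i
puts loose.match?("bad host\nexample.com")       # true
puts DOMAIN_RE.match?("bad host\nexample.com")   # false
```

Because it drops the optional `:port` and `/path` groups, using it in `host_must_be_valid` would mean re-adding those parts.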

Domains that are just a TLD are sufficiently bizarre as to be worth ignoring.
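The same rules can also be checked one label at a time instead of with a single pattern, which keeps each rule visible. A minimal Ruby sketch that encodes only the rules as stated in this post (for example, no overall length limit and no rule against trailing hyphens); `valid_dns_name?` is a hypothetical helper.

```ruby
# Hypothetical label-by-label check of the rules listed above:
#   1. each label is 1-63 ASCII characters from A-Z, 0-9 and '-' (case-insensitive)
#   2. no label starts with a hyphen
#   3. the top-level label does not start with a digit
# Bare TLDs (a single label) are rejected, per the suggestion to ignore them.
LABEL_RE = /\A[a-z0-9][a-z0-9-]{0,62}\z/i

def valid_dns_name?(name)
  name = name.chomp(".")             # allow one trailing root dot, e.g. "example.com."
  labels = name.split(".", -1)       # limit -1 keeps empty labels, so "a..b" fails
  return false if labels.length < 2  # ignore bare TLDs such as "com"
  return false unless labels.all? { |label| LABEL_RE.match?(label) }

  !labels.last.match?(/\A[0-9]/)     # rule 3: TLD must not start with a digit
end

%w[
  behold.photography
  xn--hhbbbh02d.xn--hgbk6aj7f53bba
  -bad.example
  example.123
  a..b
].each { |n| puts "#{n}: #{valid_dns_name?(n)}" }
```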