Onebox embeds too much content when target page has malformed HTML


(Cédric Moreau) #1

Hi,

I had an issue with the following link: Le charlatanisme de Galuel | ⊔Foundation

Put on a line by itself, it imports almost the whole page. I don't think this is normal behaviour, is it?


(Régis Hanol) #2

This is indeed really weird: the onebox preview is completely different from the cooked version…


(Jeff Atwood) #3

We have seen this before. As I recall, it means the target site has badly invalid markup that somehow confuses the oneboxer.


(Cédric Moreau) #4

So what should we do about it?


(Jeff Atwood) #5

Check whether the target site passes HTML validation with the W3C validator. If it doesn't, how many errors does it report?


(Kane York) #6

Error Line 1250, Column 113: Stray end tag a.

…s://fr.wordpress.com/?ref=lof">Build a website with WordPress.com</a></a></div>

Yep, that’ll mess up the document structure.


(Jeff Atwood) #7

Not sure how to fix this. Can we make onebox more tolerant of screwed-up HTML?


(Sam Saffron) #8

We can. Nokogiri can iron out bad HTML, and we can truncate it at some sane size.
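To illustrate what "iron out bad HTML and truncate it" could mean, here is a minimal sketch. It is not the fix that was eventually merged, and it swaps in Python's standard-library `html.parser` for Nokogiri (the Ruby library mentioned above). The names `BalancingTruncator` and `sanitize_and_truncate` and the 500-character default are hypothetical. The idea is to drop end tags that match nothing open (like the stray `</a>` the validator flagged), close any elements left open, and cap the amount of visible text:

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BalancingTruncator(HTMLParser):
    """Hypothetical helper: re-emits HTML with stray end tags dropped,
    unclosed tags closed, and visible text capped at max_chars.
    Comments and doctypes are dropped because no handler is defined for them."""

    def __init__(self, max_chars):
        super().__init__(convert_charrefs=True)
        self.max_chars = max_chars
        self.text_len = 0
        self.stack = []    # open elements, innermost last
        self.out = []
        self.done = False  # True once the text budget is used up

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # A stray end tag (like an extra </a>) matches nothing open: skip it.
        if self.done or tag not in self.stack:
            return
        # Close anything left open inside this element, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        remaining = self.max_chars - self.text_len
        if len(data) >= remaining:
            data = data[:remaining]
            self.done = True
        self.text_len += len(data)
        self.out.append(escape(data, quote=False))

    def result(self):
        # Close whatever is still open, innermost first, so the output is balanced.
        return "".join(self.out) + "".join(f"</{t}>" for t in reversed(self.stack))


def sanitize_and_truncate(html, max_chars=500):
    parser = BalancingTruncator(max_chars)
    parser.feed(html)
    parser.close()
    return parser.result()
```

For example, `sanitize_and_truncate('<div><a href="/x">Build a website</a></a></div>')` returns `'<div><a href="/x">Build a website</a></div>'`. The stray `</a>` is discarded instead of closing the wrong element. The cap applies to visible text rather than raw bytes, so the output is never cut in the middle of a tag.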


(Jeff Atwood) #9

@techapj, can you add this to your list?


(Arpit Jalan) #10

This is now fixed via:


(Arpit Jalan) #11