Onebox embeds too much content when target page has malformed HTML

(Cédric Moreau) #1


I had an issue with the following link: Le charlatanisme de Galuel | ⊔Foundation

When I put it on a line by itself, the onebox imports almost the whole page. That isn't normal behaviour, is it?

(Régis Hanol) #2

This is indeed really weird: the onebox preview is completely different from the cooked version…

(Jeff Atwood) #3

We have seen this before. As I recall, it means the target site has badly invalid markup that somehow confuses the oneboxer.

(Cédric Moreau) #4

So what should we do about it?

(Jeff Atwood) #5

Check whether the target site passes HTML validation with the W3C validator. If it doesn't, how many errors does it report?

(Kane York) #6

Error Line 1250, Column 113: Stray end tag a.

…s://">Build a website with</a></a></div>

Yep, that’ll mess up the document structure.
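A rough way to reproduce this kind of "stray end tag" complaint locally is a stack-based check: push each opened element, pop on its end tag, and flag any end tag with no matching open element. The sketch below assumes Python's standard `html.parser` module; the names `StrayEndTagFinder` and `find_stray_end_tags` are hypothetical, and this is only an approximation, not the W3C validator's actual algorithm.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class StrayEndTagFinder(HTMLParser):
    """Hypothetical helper: record end tags that match no open element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, innermost last
        self.stray = []  # (line, 0-based column, tag) for each stray end tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing form such as <br/>: nothing is left open

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close everything up to and including the innermost matching tag.
            while self.stack:
                if self.stack.pop() == tag:
                    break
        else:
            line, col = self.getpos()
            self.stray.append((line, col, tag))


def find_stray_end_tags(html):
    finder = StrayEndTagFinder()
    finder.feed(html)
    finder.close()
    return finder.stray
```

Fed a snippet shaped like the validator excerpt above, such as `<div><a href="https://">Build a website with</a></a></div>`, it flags the second `</a>`, because the first `</a>` has already closed the only open `<a>`.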

(Jeff Atwood) #7

Not sure how to fix this. Can we make onebox more tolerant of screwed-up HTML?

(Sam Saffron) #8

We can. Nokogiri can iron out bad HTML, and we can truncate the result at some sane size.
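A minimal sketch of the "iron out, then truncate" idea, assuming Python's standard `html.parser` as a stand-in for Nokogiri (a swapped-in technique, not Onebox's actual fix): parse leniently, keep only visible text, collapse whitespace, and cut at a word boundary. The names `TextExtractor` and `preview_text` and the `max_chars=300` default are hypothetical choices for illustration.

```python
from html.parser import HTMLParser

# Elements whose contents should never appear in a preview.
SKIP = {"script", "style", "noscript"}


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, tolerating stray tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside script/style/noscript
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # A stray end tag (e.g. an extra </a>) is simply ignored here.
        if tag in SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def preview_text(html, max_chars=300):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining chunks with a space keeps block-level text from gluing together;
    # split()/join() then collapses all runs of whitespace to single spaces.
    text = " ".join(" ".join(parser.chunks).split())
    if len(text) <= max_chars:
        return text
    # Cut at the last space inside the limit so no word is split in half.
    cut = text[:max_chars].rsplit(" ", 1)[0]
    return cut + "…"
```

Because the preview is rebuilt from plain text rather than from the page's own (broken) element tree, an unbalanced `</a></a>` cannot drag the rest of the page into the embed, and the length cap bounds the output either way.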

(Jeff Atwood) #9

@techapj, can you add this to your list?

(Arpit Jalan) #10

This is now fixed via:

(Arpit Jalan) #11