Strange onebox failure due to malformed 404

This relates to the following page:

https://www.nbcsandiego.com/news/national-international/CVS-to-Sell-CBD-Products-in-800-Stores-in-8-States-507484191.html

I get a onebox failure in logs:

Failed to onebox https://www.nbcsandiego.com/news/national-international/CVS-to-Sell-CBD-Products-in-800-Stores-in-8-States-507484191.html unexpected character () at line 9, column 2 [parse.c:704]

more info:

REQUEST_URI /onebox?url=https%3A%2F%2Fwww.nbcsandiego.com%2Fnews%2Fnational-international%2FCVS-to-Sell-CBD-Products-in-800-Stores-in-8-States-507484191.html&refresh=false

This link previews fine in iframely, facebook etc.

If anyone is interested in pursuing this, let me know if you need more info.

Feels like something is expecting JSON and getting HTML.

Yes, I’ve seen a related error. I did try to byebug it through … let me see if I can find that … will update

2019-03-22 10:42:42 - MultiJson::ParseError - 767: unexpected token at '<head>
<title>404 Not Found</title>
</head><body>
<h1>404 Not Found</h1>
  <p>The provided content is neither a Video Release nor does it contain a Video Release in Related Media.<br>Example Request Format: https://www.nbcnewyork.com/services/oembed/?url=https://www.nbcnewyork.com/sample/path/Sample-Article-Example-12345678.html</p>
</body></html>

      ':
	/Users/joffreyjaffeux/.rubies/ruby-2.6.1/lib/ruby/2.6.0/json/common.rb:156:in `parse'
	/Users/joffreyjaffeux/.rubies/ruby-2.6.1/lib/ruby/2.6.0/json/common.rb:156:in `parse'
	/Users/joffreyjaffeux/.gem/ruby/2.6.1/gems/multi_json-1.13.1/lib/multi_json/adapters/json_common.rb:14:in `load'
	/Users/joffreyjaffeux/.gem/ruby/2.6.1/gems/multi_json-1.13.1/lib/multi_json/adapter.rb:21:in `load'
	/Users/joffreyjaffeux/.gem/ruby/2.6.1/gems/multi_json-1.13.1/lib/multi_json.rb:122:in `load'
	/Users/joffreyjaffeux/Projects/onebox/lib/onebox/oembed.rb:6:in `initialize'

https://github.com/discourse/onebox/blob/master/lib/onebox/oembed.rb#L5

We are trying to get oembed content from https://www.nbcnewyork.com/services/oembed/?url=https://www.nbcdfw.com/news/health/CVS-to-Sell-CBD-Products-in-800-Stores-in-8-States-507484191.html which is a 404

The thing is, this URL comes from their page source:

<link rel="alternate" type="application/json+oembed"
    href="https://www.nbcnewyork.com/services/oembed/?url=https://www.nbcsandiego.com/news/national-international/CVS-to-Sell-CBD-Products-in-800-Stores-in-8-States-507484191.html"
    title="CVS to Sell CBD Products in 800 Stores in 8 States - NBC 7 San Diego " />

So I would say the bug is on their side, not ours.

Thank you, that doesn’t surprise me at all!

Forgive me if I sound stupid here: is there no way we could make this more resilient and tolerate this, given that the basic goal is just to retrieve the title, excerpt, thumbnail, etc.?

You can always make things more resilient; the proof is that it works on iFramely.

Now I don’t know if we want to invest time fixing blatant issues like this one. @zogstrip ?
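For anyone wondering what "more resilient" could look like, here is a minimal sketch, assuming we are happy to treat an unparseable oEmbed body as "no oEmbed data" and fall back to other metadata. The helper name `safe_oembed_data` is hypothetical and is not part of the onebox gem; it only uses Ruby's standard `json` library.

```ruby
require "json"

# Hypothetical helper, not part of the onebox gem: parse an oEmbed response
# body and return an empty hash when the body is not a valid JSON object.
def safe_oembed_data(body)
  data = JSON.parse(body.to_s, symbolize_names: true)
  # A valid oEmbed JSON response is an object; ignore anything else.
  data.is_a?(Hash) ? data : {}
rescue JSON::ParserError
  # HTML error pages (like the 404 body above) end up here.
  {}
end

html_404 = "<head>\n<title>404 Not Found</title>\n</head><body></body></html>"
puts safe_oembed_data(html_404).inspect
puts safe_oembed_data('{"title": "CVS to Sell CBD Products"}').inspect
```

With an empty hash coming back, the caller could skip oEmbed and use whatever else the page offers (title tag, Open Graph tags, and so on) instead of failing the whole onebox.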

Yeah, nooooo.

They’re sending back the HTTP status code 200 with the HTML content of a 404 on a JSON request… :man_facepalming:
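Because the status code cannot be trusted in a case like this, one option would be to look at the body itself before handing it to the JSON parser. This is only a rough sketch under that assumption: `usable_oembed_response?` is a made-up name, and the "starts with `{`" check is a heuristic, not something the onebox gem does.

```ruby
# Hypothetical guard, not the onebox gem's behaviour: accept a response only
# if the status is 200 AND the body looks like a JSON object, since an oEmbed
# JSON response is expected to be an object.
def usable_oembed_response?(status, body)
  status.to_i == 200 && body.to_s.lstrip.start_with?("{")
end

puts usable_oembed_response?(200, "<head>\n<title>404 Not Found</title>")
puts usable_oembed_response?(200, ' {"title": "CVS to Sell CBD Products"}')
```

A pre-check like this avoids raising at all for obvious HTML bodies, though a body that starts with `{` can still be invalid JSON, so the parse itself would still need its own error handling.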

FYI I even tried to tell them using their website feedback form … and the submission gets stuck at an unresponsive Captcha dialogue :man_facepalming:
