YouTube onebox makes excessive requests to YT with extra URL params


(Onyx) #1

Oneboxing YouTube videos triggers constant requests to YouTube’s servers. Repro conditions where I have confirmed my own browser sending requests:

  • Sharing a video that’s part of a playlist, along with the playlist. Sharing the video alone without any reference to the playlist seems to be fine, at least on the client side.
  • Sharing a video with URL of the form https://www.youtube.com/watch?v=Xqog63KOANc&feature=youtu.be.

Looks like a problem with additional parameters.

The main concern is that this might trigger a violation of YouTube ToS, specifically:

Section 5H:

you agree not to use or launch any automated system (including, without limitation, any robot, spider or offline reader) that accesses the Service in a manner that sends more request messages to the YouTube servers in a given period of time than a human can reasonably produce in the same period by using a publicly available, standard (i.e. not modified) web browser;

If I had oneboxed that video instead of blockquoting the link, I would have generated a total of 668 requests by now, one for each character I typed. And that is not counting me fixing typos. Which I did have to do multiple times.

I consider this a bug personally, but I’m posting in meta since the feature does work as intended. If the intention is to spam YouTube’s servers, that is.


(Michael Downey) #2

Isn’t that referring to views of the videos?


#3

Does it say anything about views?


(Onyx) #4

The policy states “requests”. There is no reference to view counts anywhere in section 5, as far as I can see.

Besides, if you limit that to views only, you can effectively “legally” DoS YouTube as long as you don’t watch the video.

Quotation marks because it’s an oversimplification; I am aware that DoSing a site is a separate issue.


(Ben T) #5

You’ll have to exclude additional content loaded by YouTube, as the server side never contacts YouTube. Other content is loaded client side by the “YouTube player”.

The main issue is that the onebox is re-evaluated for each letter typed, and YouTube returns valid responses as long as the video ID is correct. The embed is fully refreshed as letters are added, which causes all related information for the video to reload, on the client only. This is not an API limit issue, however, as Discourse is pushing this raw HTML back:

<img src='http://i1.ytimg.com/vi/#{video_id}/hqdefault.jpg' width='480' height='270'>
...
<iframe width="480" height="270" src="https://www.youtube.com/embed/#{video_id}?feature=oembed" frameborder="0" allowfullscreen></iframe>

Once the following regex is matched with two results:

/^https?:\/\/(?:www\.)?(?:m\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/|youtube\.com\/embed\/)([a-zA-Z0-9_\-]{11})(?:[#&\?]t=(([0-9]+[smh]?)+))?$/

(finds the video ID after v= and checks that it is a YouTube link)
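To see how that regex treats extra parameters, here is a minimal Ruby sketch. The regex is copied from above; the helper name `regex_video_id` is made up for illustration and is not part of the onebox gem. Because the pattern is anchored with `$` and only allows an optional `t=` suffix, any other trailing parameter makes the whole match fail:

```ruby
# Regex copied verbatim from the post above.
YOUTUBE_REGEX = /^https?:\/\/(?:www\.)?(?:m\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/|youtube\.com\/embed\/)([a-zA-Z0-9_\-]{11})(?:[#&\?]t=(([0-9]+[smh]?)+))?$/

# Hypothetical helper (not from the onebox gem): returns the captured
# video ID, or nil when the regex does not match the URL at all.
def regex_video_id(url)
  match = YOUTUBE_REGEX.match(url)
  match && match[1]
end

# Bare video link: matches.
regex_video_id("https://www.youtube.com/watch?v=Xqog63KOANc")
# => "Xqog63KOANc"

# A timestamp is the only suffix the pattern allows.
regex_video_id("https://www.youtube.com/watch?v=Xqog63KOANc&t=90s")
# => "Xqog63KOANc"

# Any other parameter leaves text before the `$` anchor, so nothing matches.
regex_video_id("https://www.youtube.com/watch?v=Xqog63KOANc&feature=youtu.be")
# => nil
```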

You would get the same result by going to YouTube and hitting enter after each letter as you typed. Not exactly “website scraping” levels. I think that there should be some client code here that prevents the refreshing of the data for oneboxes once they are evaluated… or defers loading for a few seconds as the user types.

(see the youtube onebox code)


(Jeff Atwood) #6

I agree. I think this should be generally true for all our oneboxing: we shouldn’t do it character by character, but on pause only.
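A rough sketch of the "on pause only" idea, written in Ruby purely to illustrate the logic. The real change would live in the client-side composer code, and the `PauseGate` class, its method names and the 500 ms delay are all assumptions made up for this example:

```ruby
# Hypothetical helper: decides when a onebox lookup is allowed to run,
# based on keystroke timestamps given in milliseconds.
class PauseGate
  def initialize(delay_ms)
    @delay_ms = delay_ms
    @pending_url = nil
    @last_keystroke_ms = nil
    @last_fetched_url = nil
  end

  # Record the URL as it currently stands in the editor.
  def keystroke(url, now_ms)
    @pending_url = url
    @last_keystroke_ms = now_ms
  end

  # Returns the URL to look up once typing has paused for delay_ms,
  # or nil if the user is still typing or this URL was already fetched.
  def due(now_ms)
    return nil if @pending_url.nil?
    return nil if now_ms - @last_keystroke_ms < @delay_ms
    return nil if @pending_url == @last_fetched_url
    @last_fetched_url = @pending_url
  end
end

gate = PauseGate.new(500)
gate.keystroke("https://youtu.be/Xqog63KOAN", 0)
gate.due(100)   # => nil, still typing
gate.keystroke("https://youtu.be/Xqog63KOANc", 200)
gate.due(600)   # => nil, only 400 ms since the last keystroke
gate.due(700)   # => "https://youtu.be/Xqog63KOANc"
gate.due(5000)  # => nil, already fetched, so no repeat request
```

With something along these lines, a link typed in one go would trigger a single lookup rather than one per character.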


(Kane York) #7

Hmm. Random video link I pulled off my sub feed:

I don’t see any requests being made while I type, watching the Network tab. The editor doesn’t have the actual embedded video, just a preview image that is requested once and then kept by the browser.

EDIT: Does this have anything to do with the YTLight or whatever thing we are using? Why isn’t the YTLight thing on all the forums? I thought it was put into core.


No repro on Meta, BBS, and the site this came from:


(Ben T) #8

I can repro here. Type a full link to a video, and add parameters to the end (anything will do).

Note that adding additional parameters also disables the YTLight plugin, which makes editing slow. YT video links will often include a playlist link as an additional parameter.


(Kane York) #9

Oh, I see - the URI oneboxer gets whacked by extra URL params because it’s using a regex. I think it would be better to just use URI.parse.
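Something along these lines is what I have in mind. This is only a sketch using Ruby's standard `uri` library; the helper name `youtube_video_id` is hypothetical and this is not the code in the onebox gem. It reads the `v` parameter from the parsed query string, so extra parameters no longer matter:

```ruby
require "uri"

# Hypothetical helper (not the onebox gem's code): extracts the video ID
# from a YouTube URL, ignoring any extra query parameters.
def youtube_video_id(url)
  uri = URI.parse(url)
  # Normalise the host: lower-case it and drop a leading "www." or "m.".
  host = uri.host.to_s.downcase.sub(/\A(?:www\.|m\.)/, "")

  id =
    case host
    when "youtu.be"
      uri.path.to_s.delete_prefix("/")
    when "youtube.com"
      if uri.path == "/watch"
        # Parse the query string into pairs and pick out "v".
        URI.decode_www_form(uri.query.to_s).to_h["v"]
      elsif uri.path.start_with?("/embed/")
        uri.path.delete_prefix("/embed/")
      end
    end

  # Accept only the same 11-character ID shape the regex allowed.
  id if id && id.match?(/\A[a-zA-Z0-9_-]{11}\z/)
rescue URI::InvalidURIError, ArgumentError
  nil
end

youtube_video_id("https://www.youtube.com/watch?v=Xqog63KOANc&feature=youtu.be")
# => "Xqog63KOANc"
youtube_video_id("https://youtu.be/Xqog63KOANc?t=30")
# => "Xqog63KOANc"
youtube_video_id("https://example.com/watch?v=Xqog63KOANc")
# => nil
```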


(Kane York) #10

I pretty much totally rewrote the YouTube oneboxing so that it no longer gives up when you pass in extra URL parameters:

https://github.com/discourse/onebox/pull/223


(Jeff Atwood) #11