YouTube video onebox embedding stopped working

tl;dr I’d like to add that we’re experiencing what seems to be the same problem here. If there is a rate-limit issue due to some recent change then I think other users will start to experience this during migration, re-baking of posts or perhaps just down to a really busy forum. The fact that onebox seemingly fails silently means that these issues aren’t visible until users start to complain that YouTube oneboxes are missing.

Background

We’re on 2.6.0.beta1.

Users were getting messages about non-secure content. On investigation, Chrome seemed to be complaining about images linked from HTTP sites. So I configured Discourse to download all images/media and serve them over HTTPS.

Once I’d changed the setting, I had to re-bake historic posts. Since that rebake, a large chunk of YouTube videos that were previously onebox’d have changed back to plain linked URLs.

We have one thread of 10,000 posts that consists of solely YouTube video replies and all posts are URLs and not oneboxes.

During the re-bake, all queued jobs were processed normally, so this is not a case of jobs being stuck in a queue or deleted.

I haven’t seen the same error messages that @marcozambi described, but I believe we are tripping a rate limit too.

What have I tried?

In support of this rate-limit theory, a small piece of code I wrote to re-bake posts worked (onebox’d) for the first 80+ YouTube videos in a thread then failed to convert the remaining videos.

At that point, even editing the post, making a small amendment and resaving did not force the URL to be onebox ‘expanded’. At the same time, all queues were empty or had minor jobs being instantly processed as I would expect.

Attempts to re-run that code over a 30 minute period failed to force the oneboxing of the links. I don’t think 80 is a magic number here, just what was available from the quota we had.

@marcozambi mentioned that the /embed/ format YouTube link worked when others failed, so I amended the code to use a regex search-and-replace of YouTube links to turn them into the /embed/ format.
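
For illustration only (this is not the code used above): a minimal Ruby sketch of that kind of search-and-replace. The constant and method names are hypothetical, the pattern assumes 11-character video IDs made of letters, digits, `_` and `-`, and any trailing query parameters (such as `&app=desktop`) are dropped.

```ruby
# Hypothetical pattern: matches youtu.be/<id> and (www.|m.)youtube.com/watch?v=<id>,
# capturing the 11-character video ID and swallowing any trailing query parameters.
YOUTUBE_LINK = %r{https?://(?:(?:www\.|m\.)?youtube\.com/watch\?v=|youtu\.be/)([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])(?:[?&][\w=&%-]*)?}

# Rewrites every matching link in the raw post text to the /embed/ form.
# Links that are already in /embed/ form do not match and are left alone.
def to_embed_links(raw)
  raw.gsub(YOUTUBE_LINK, 'https://www.youtube.com/embed/\1')
end
```

In a Discourse console the result would then be saved back to the post (the thread mentions post.revise for this); that step is left out here so the sketch stays runnable on its own.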

The code worked.

Re-running the code to just rebake the posts, without converting the links, again failed to turn them into oneboxes.

My plan is to experiment with a task that converts all YouTube links in the large thread to the /embed/ YouTube format. If that fails or we trip a higher rate limit, then I’ll take a look at @merefield’s Onebox Assistant.

I’ll post an update later.

OK, there is certainly something strange going on and it appears to be rate-limit related.

I’m not sure if we’re being rate-limited because I did a massive rebake and we’ve gone on the naughty step, or if we’re tripping limits that others will see.

Oneboxing of YouTube videos seems to have a limit and once that limit is reached the Oneboxing fails silently.

I feel this has to change, for hopefully obvious reasons, but specifically because anyone who does a migration or rebake will have no idea that lots of un-expanded or once-expanded Oneboxes are now just vanilla URLs.

@marcozambi mentioned above that the YouTube URL format that features /embed/ before the video ID works when the other formats are failing (presumably due to a rate-limit issue).

Here’s a video that illustrates that phenomenon well.

When this screencast was captured, there were no jobs clogging the queues and the forum was otherwise performing well.

Prior to this video, YouTube links had already started to fail to be expanded by OneBox.

What you will see is the compose window where Onebox fails to expand a YouTube link in the https://youtu.be/<video-id> format.

I then change the format to be in the https://youtube.com/embed/<video-id> format and Onebox expands it.

I then try again with the original format and it fails.

During this video capture, I tracked the browser console and network tabs. I recognise that the issue is surely between our server and YouTube rather than between my browser and our server, but I include them below in case they are useful.

(apologies for the zoomed-out nature of the image - I hope they’re visible when zoomed-in)

And here is the network trace when the Onebox worked.

I’m not convinced that the /embed/ format of link is a panacea here. I think it is simply a route with a separate rate limit, so that when the https://youtu.be/<video-id> route hits its limit, the https://youtube.com/embed/<video-id> route still has quota left.

Evidence of the fact that both routes are limited comes from a utility I wrote to change the format of the YouTube embeds on a monster 10K post thread featuring 99% YouTube video replies.

At this stage Onebox was already failing to expand the https://youtu.be/<video-id> formatted links.

My utility, which changed the YouTube video URL to the https://youtube.com/embed/<video-id> format ran against the first 3000 posts in the thread.

It worked well for the first 1108 posts. For the next ~1900 posts it still changed the link format, but the links were not expanded by Onebox.

During this time, lots of jobs were generated (my code used post.revise) and all processed without error or retry.

Anecdotally, I noticed that job processing seemed to dramatically accelerate at a certain stage. I guess this might have been because the Onebox code was quickly getting some form of error from YouTube - but I didn’t time it and it could have been a number of things.

I’d be happy to try to supply more detailed evidence here, but not sure what I can do without instrumenting the Onebox gem.

I’m a hacker and not a Ruby expert but I’d gladly try to follow some high-level instructions.

Running some short, repetitive curl scripts from the server’s command line with the same user agent might allow you to isolate a rate-limit issue.
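
The same idea can be sketched in Ruby instead of a curl loop. This is a rough sketch under assumptions: the user-agent string is the one guessed at later in the thread, the method names are made up for illustration, and redirects are deliberately not followed so each status code is reported as received.

```ruby
require "net/http"
require "uri"

# Assumed user agent; the thread guesses this is what Onebox sends.
USER_AGENT = "Discourse Forum Onebox v2.6.0.beta1"

# Issues a single GET (no redirect following) and returns the status code as a string.
def fetch_status(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.get(uri.request_uri, "User-Agent" => USER_AGENT).code
  end
end

# Requests each URL `attempts` times, pausing between requests,
# and returns a tally of status codes per URL.
def probe(urls, attempts: 5, pause: 1.0, fetcher: method(:fetch_status))
  urls.to_h do |url|
    codes = Array.new(attempts) do
      sleep(pause)
      fetcher.call(url)
    end
    [url, codes.tally]
  end
end
```

Calling `probe` with the /watch and /embed/ URLs of the same video from the server itself would show whether one route returns 429 while the other still returns 200. Keep the attempt count small, since the probing itself counts against whatever limit is in place.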

Agree that the workaround is probably working just because it’s a separate count.

Here are some more results. Note, there are many assumptions in the post below - based on a lack of real knowledge.

I’ll follow this post up with my opinion of what is going on and what should happen.

Thanks for the response, Robert.

Note that Oneboxing of videos using the /watch route was (and still is!) failing when I tried this, so I didn’t need a loop to force it to fail.

So one assumption I’ve made is that the user-agent that Onebox is using is Discourse Forum Onebox v2.6.0.beta1 based on this code…

https://github.com/discourse/discourse/blob/6a417c308f8abf30b0dec712426a009cabce859e/config/initializers/100-onebox_options.rb#L14

I picked a random video and attempted to use curl to read the headers.

I did this from inside the Docker container on my live site which produced the following response.

Result of first curl using /watch? route

command

curl --user-agent "Discourse Forum Onebox v2.6.0.beta1" -sD - -o /dev/null "https://m.youtube.com/watch?v=s0ONj4TG0UA"

response:

HTTP/2 303 
content-length: 0
p3p: CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=en-GB for more info."
cache-control: no-cache
x-frame-options: SAMEORIGIN
content-type: text/html; charset=utf-8
location: https://www.youtube.com/watch?v=s0ONj4TG0UA&app=desktop
accept-ch-lifetime: 2592000
x-content-type-options: nosniff
accept-ch: DPR
expires: Tue, 27 Apr 1971 19:44:06 GMT
strict-transport-security: max-age=31536000
date: Fri, 07 Aug 2020 11:35:21 GMT
server: YouTube Frontend Proxy
x-xss-protection: 0
set-cookie: VISITOR_INFO1_LIVE=rcVTSJn81Ck; path=/; domain=.youtube.com; secure; expires=Wed, 03-Feb-2021 11:35:20 GMT; httponly; samesite=None
set-cookie: YSC=cFXIPerzT3Y; path=/; domain=.youtube.com; secure; httponly; samesite=None
set-cookie: GPS=1; path=/; domain=.youtube.com; expires=Fri, 07-Aug-2020 12:05:20 GMT
alt-svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

So I was redirected using a 303 response to the URL in the location header, which was https://www.youtube.com/watch?v=s0ONj4TG0UA&app=desktop.

This simply had the effect of appending &app=desktop to the URL.

Result of the second curl to redirected URL - still on the /watch? route

command
curl --user-agent "Discourse Forum Onebox v2.6.0.beta1" -sD - -o /dev/null "https://www.youtube.com/watch?v=s0ONj4TG0UA&app=desktop"

response

HTTP/2 429 
x-content-type-options: nosniff
expires: Tue, 27 Apr 1971 19:44:06 GMT
x-frame-options: SAMEORIGIN
cache-control: no-cache
p3p: CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=en-GB for more info."
accept-ch-lifetime: 2592000
content-type: text/html; charset=utf-8
accept-ch: DPR
strict-transport-security: max-age=31536000
content-length: 48982
date: Fri, 07 Aug 2020 11:46:00 GMT
server: YouTube Frontend Proxy
x-xss-protection: 0
set-cookie: VISITOR_INFO1_LIVE=VQwNuouhq-s; path=/; domain=.youtube.com; secure; expires=Wed, 03-Feb-2021 11:46:00 GMT; httponly; samesite=None
set-cookie: YSC=8IRfPRFRY6c; path=/; domain=.youtube.com; secure; httponly; samesite=None
set-cookie: GPS=1; path=/; domain=.youtube.com; expires=Fri, 07-Aug-2020 12:16:00 GMT
set-cookie: VISITOR_INFO1_LIVE=VQwNuouhq-s; path=/; domain=.youtube.com; secure; expires=Wed, 03-Feb-2021 11:46:00 GMT; httponly; samesite=None
set-cookie: YSC=8IRfPRFRY6c; path=/; domain=.youtube.com; secure; httponly; samesite=None
set-cookie: GPS=1; path=/; domain=.youtube.com; expires=Fri, 07-Aug-2020 12:16:00 GMT
alt-svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

So I am being sent a 429 “too many requests” response code, but without being sent a retry-after header - cease and desist with no negotiation.
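
When a 429 arrives without a Retry-After header, a client has to choose its own waiting time. A minimal sketch of one common approach, exponential backoff with a cap, is shown here. The method name, the 60-second base and the one-hour cap are all assumptions for illustration, and only the numeric (seconds) form of Retry-After is handled.

```ruby
# Returns the number of seconds to wait before retrying after a 429.
# headers: Hash of response headers with lower-case names.
# attempt: 1-based count of retries made so far.
def retry_delay(headers, attempt, base: 60, cap: 3600)
  retry_after = headers["retry-after"]
  # Honour the server's hint when it is given as a whole number of seconds.
  return Integer(retry_after) if retry_after.to_s.match?(/\A\d+\z/)

  # Otherwise double the wait on each attempt, up to the cap.
  [base * (2**(attempt - 1)), cap].min
end
```

With the headers captured above (no Retry-After), the first retry would wait 60 seconds, the second 120, and so on until the cap is reached.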

Either way, if this is what Onebox is seeing, it’s ignoring the response or at least I don’t know where to go looking for it if it is being logged.

Whilst this might be a legitimate thing to do for a single 429, seeing many 429 responses in a very short period of time cannot be ignored.

Result of third curl - this time using the /embed/ route

For completeness, I immediately tried to get the same video but this time using the /embed/ route.

command

curl --user-agent "Discourse Forum Onebox v2.6.0.beta1" -sD - -o /dev/null "https://www.youtube.com/embed/s0ONj4TG0UA"

response

HTTP/2 200 
accept-ch-lifetime: 2592000
content-type: text/html; charset=utf-8
expires: Tue, 27 Apr 1971 19:44:06 GMT
x-content-type-options: nosniff
cache-control: no-cache
p3p: CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=en-GB for more info."
strict-transport-security: max-age=31536000
accept-ch: DPR
date: Fri, 07 Aug 2020 11:55:29 GMT
server: YouTube Frontend Proxy
x-xss-protection: 0
set-cookie: VISITOR_INFO1_LIVE=PNE6x6djF00; path=/; domain=.youtube.com; secure; expires=Wed, 03-Feb-2021 11:55:29 GMT; httponly; samesite=None
set-cookie: VISITOR_INFO1_LIVE=PNE6x6djF00; path=/; domain=.youtube.com; secure; expires=Wed, 03-Feb-2021 11:55:29 GMT; httponly; samesite=None
set-cookie: GPS=1; path=/; domain=.youtube.com; expires=Fri, 07-Aug-2020 12:25:29 GMT
set-cookie: YSC=pDW-hdbauK8; path=/; domain=.youtube.com; secure; httponly; samesite=None
alt-svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
accept-ranges: none
vary: Accept-Encoding

200 - Success.

lazy-yt plugin seems to re-write URLs in /watch format

Not sure if this is of any significance at all but…is the embedded plugin lazy-yt enabled by default? I noticed it in my development installation.

It seems to monkey patch the to_html method of the YouTube Oneboxer.

I don’t know if it is significant, but the original Onebox’s to_html method returns the /embed/ URL format…

https://github.com/discourse/onebox/blob/eb783a5ccf38b20224294b14c068b3ba01ff2579/lib/onebox/engine/youtube_onebox.rb#L32

Whereas the lazy-yt plugin uses the /watch?v= URL format.

https://github.com/discourse/discourse/blob/b86198198fe904899f6b31edfbb7c05873a95625/plugins/lazy-yt/plugin.rb#L43

Is there anything else I can do to show there is a problem that needs some form of attention? Next post will explain what I think is the root cause.

I’d love some advice on what to do here, but also have some ideas about how this might be better handled.

What might be happening

One theory is that our server may have been identified by YouTube as potentially farming music videos and we’re getting limited / blocked.

We’re a really unremarkable little forum in the UK with meagre traffic, but we have a couple of threads (actually one split in two due to size) of 10K + 2K posts of music videos. It’s a musical chain where the next poster simply posts a song related, often in some tangential way, to the previous post.

We have other threads with YouTube links, of course, but this one is particularly (~100%) dense with music.

Following a rebake at the weekend, I’m guessing that YouTube looked at the activity of the Oneboxer trying to grab headers for lots of music videos and their algorithm put us on the naughty step.

I subsequently tried to re-bake posts again, and that has presumably confirmed YouTube’s suspicions that all we do from this IP address is attempt to download music videos.

Might be Related to Digital Ocean

So Googling 429 errors on YouTube, and ignoring all of the YouTube API results, eventually points to a group of similar sounding issues that all say that their IP addresses have been grey-listed/black-listed/banned as a result of being on virtual servers - Digital Ocean’s name was specifically mentioned.

The suggestion is that YouTube may ‘ban’ a series of IP addresses and that other, legitimate servers, are often pulled in as collateral damage.

Here is one such problem and potential solution.

https://support.google.com/youtube/thread/21697789?hl=en

and here is another from a developer at medium.com who works on the embed.ly API.

https://support.google.com/youtube/thread/21939228?hl=en

The suggested resolution in the first post was to disable IPv6 on the droplet.

Does anyone know if that sounds likely, or if it would cause any issues?

I’ve held off doing this at the moment so I can collect any other data that might be needed to aid resolution.

What should happen?

Firstly, I understand the reaction here is likely to be that my experience is an edge-case and that no change is needed.

I can understand that. But if @marcozambi and I have found the same issue in a relatively short period of time then it suggests that it could be something that others will see. Perhaps YouTube has recently become more officious in its policing of embedding?

I’d ask you to remember that at the moment my forum is in complete limbo. I have tens of thousands of links that have failed to be Oneboxed, but I don’t know where they are and I can only think of a total rebake to find them all which will almost certainly get more negative attention from YouTube.

At the very least, Onebox cannot fail silently when it hits a rate-limit error. It has to bring it to the attention of admins and, where possible, pause the process that tripped the rate-limit issue.

Options for bringing this to the attention of the forum owner

I don’t think this is easy. At all.

I see Onebox as playing a number of distinctly different roles

a) it builds Oneboxes in real-time as posts are composed,
b) it responds to rebakes of old posts (with often already known links) in the background and headless
c) it is asked to handle masses of background job posts during a migration / rebake

If a rate-limit error were to occur during scenario a) then the author would recognise that the Onebox hadn’t expanded and may complain to the forum admins - it might be noticed. I guess it’s also possible to grab the single error during composition and push an error or message to admins?

In scenario b) the failure may happen during background processing of the post and therefore it would need to be either retried or if a retry count had been reached it may need to be failed and the forum admins notified.

In scenario c) the wider context of lots of failures happening at the same time would need to be known. At the moment I don’t believe there is any overarching control of the post rebake process, so 10,000 429 errors coming back from Onebox could not be recognised as one bigger problem rather than 10,000 individual messages to admins.

In reality, a failure such as that would mean that all calls to Onebox for YouTube (might be other providers as well) video expansion would need to be put on hold until we’re no longer rate-limited.
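
One way to picture "put on hold until we’re no longer rate-limited" is a small circuit breaker per provider. The sketch below is purely hypothetical: nothing like this class exists in Onebox or Discourse as far as this thread shows, and the threshold, window and cooldown values are placeholders.

```ruby
# Hypothetical per-provider circuit breaker for rate-limit responses.
class ProviderBreaker
  def initialize(threshold: 5, window: 60, cooldown: 900, clock: -> { Time.now.to_f })
    @threshold, @window, @cooldown, @clock = threshold, window, cooldown, clock
    @failures = Hash.new { |hash, host| hash[host] = [] }
    @paused_until = {}
  end

  # Records a rate-limit response (e.g. HTTP 429) for a provider host.
  # Returns true only when this failure trips the breaker, which is the
  # moment to notify admins once instead of once per failed post.
  def record_rate_limit(host)
    now = @clock.call
    recent = @failures[host]
    recent << now
    recent.reject! { |time| time < now - @window }
    return false if paused?(host) || recent.size < @threshold

    @paused_until[host] = now + @cooldown
    true
  end

  # True while requests to this host should be held back.
  def paused?(host)
    (@paused_until[host] || 0) > @clock.call
  end
end
```

A rebake job could check `paused?` before asking for a YouTube onebox and defer the post instead of baking it as a plain URL, which would also leave a record of which posts still need attention.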

Alternative approaches

Rate limit outgoing requests

I guess the prevention-is-better-than-the-cure axiom might dictate that a Discourse rate-limiter could be put in front of the requests so that Discourse would then hold requests back on some settings-configured basis.

This would play havoc with re-bakes and migrations.
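
To make the trade-off concrete, here is a minimal sketch of such an outgoing limiter: it enforces a minimum gap between requests to the same host. The class name and the interval are assumptions, it is not thread-safe, and it is not how Discourse currently behaves.

```ruby
# Hypothetical throttle: spaces out requests to each host by min_interval seconds.
class OutgoingThrottle
  def initialize(min_interval:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @next_slot = {}
  end

  # Waits until this host's next free slot, reserves the slot after it,
  # and returns the number of seconds waited.
  def wait_turn(host)
    now = @clock.call
    ready_at = [@next_slot.fetch(host, now), now].max
    delay = ready_at - now
    @sleeper.call(delay) if delay > 0
    @next_slot[host] = ready_at + @min_interval
    delay
  end
end
```

At one request every 2 seconds, the 10,000-post thread described earlier would take roughly five and a half hours to rebake, which illustrates why a limiter like this would slow re-bakes and migrations so much.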

Onebox Assistant

Perhaps we all need commercial relationships with organisations that cache and proxy embeds such as the one used by @merefield in his Onebox Assistant plugin?

Whitelist the 'bot

Could we perhaps find a way to whitelist the Discourse 'bot with YouTube? Probably too open to abuse.

YouTube API

Just like the Onebox does with Twitter, perhaps we could use the YouTube API if the rate limits are greater?

Some good research there, thank you.

I don’t know whether you noticed, but I raised that even Signal Messenger has problems with YouTube and it’s still an open issue:

https://meta.discourse.org/t/onebox-assistant-crawl-for-those-previews-reliably/107405/9?u=merefield

It also happens on WhatsApp.

I just wanted to add that we regularly get “banned” by YT every 2 to 3 weeks, we stay banned for a week or so, and then we get un-banned. Then the cycle repeats.
We have only a few hundred links to YT videos, and our users post no more than a handful per day…

So without using the OneBox Assistant plugin, there is no solution to preview the youtu.be links in the base discourse?

All formats of YouTube links should be expanded by Onebox.

Is it not working for you?

If not, what problems are you seeing?

I’m on the latest discourse and I just made a post with a youtu.be link and it didn’t show the preview for it. Regular youtube links work.

Let’s see

https://youtu.be/gLCduDJVksc looks fine to me

Yes, it works on meta.discourse. I’m trying to figure out why it doesn’t on mine. Are there any settings to configure this from the admin panel?

YouTube has likely banned your server or IP range, as discussed above.

But regular YouTube links work. If they had banned us, wouldn’t all of them fail?

Not the case, @AntiMetaman.

As @codinghorror suggested above, YouTube appears to rate-limit / ban certain formats of YouTube links while others continue to work.

In my case the ‘/embed’ format of link works but others don’t.

It looks like YouTube provides rate limits that are different per format. The ‘/embed’ one seems to allow for greater numbers. That’s purely anecdotal and I haven’t got any hard data on that.

A few questions

Are you hosting on Digital Ocean?

Are you running IPv6?

Have you checked that the video id is correctly formed - 11 characters? (stupid question - but you’d be surprised)

For a given YouTube video have you tried each of the three supported formats? - i.e.

Which worked and which didn’t?
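
A small helper can generate the variants to paste into the composer one at a time. This is a sketch under assumptions: the three formats listed are the ones that come up elsewhere in this thread (youtu.be, /watch?v= and /embed/), the ID check assumes 11 characters drawn from letters, digits, `_` and `-`, and the names are made up.

```ruby
# Assumed shape of a YouTube video ID: exactly 11 URL-safe characters.
VIDEO_ID = /\A[A-Za-z0-9_-]{11}\z/

# Returns the three link formats discussed in the thread for one video ID.
def youtube_variants(video_id)
  raise ArgumentError, "expected an 11-character video ID" unless VIDEO_ID.match?(video_id)

  {
    short: "https://youtu.be/#{video_id}",
    watch: "https://www.youtube.com/watch?v=#{video_id}",
    embed: "https://www.youtube.com/embed/#{video_id}"
  }
end
```

For example, `youtube_variants("gLCduDJVksc")` gives one URL per format, and a mistyped ID raises an error before any time is spent wondering why it will not expand.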

I have banged my head against this problem for weeks but managed to get a workable solution - but I really don’t understand why it works.

Assuming that you’re seeing the same problem try installing the Onebox Assistant, then enable it but don’t configure it.

I found I needed to bounce the server after enabling it to ensure it worked.

In this configuration - which really should not work at all - my otherwise rate-limited YouTube videos (in any format) get expanded by Onebox - but I have no idea how this works.

I have been discussing the issue with @merefield (the author) here…

https://meta.discourse.org/t/onebox-assistant-crawl-for-those-previews-reliably/107405/36?u=bletch

IF, and it’s a big IF, this also works for you, it’s not a proper fix as it’s just a quirk of method return values and shouldn’t be relied on long-term.

Separately, I’ve also used the Onebox Assistant plugin as it was intended - by subscribing to embed.rocks and it works like a charm.

Can you define regular and non-regular YouTube links please.

If that is the case, why can’t Onebox convert into the one format that works before polling the data?

It could, I guess, but it’s not a ‘fix’.

All of the formats appear to be rate-limited and eventually fail. Even the ‘/embed’ format failed for me after I tried to re-bake all of the YouTube embeds that had failed during a previous re-bake.

Plus, I’ve only seen two experiences shared on here where the ‘/embed’ route worked where others failed - including my own.

There’s not enough evidence yet, nor enough ‘sufferers’ yet to suggest making a change.

Are you experiencing the problem too, @Terrapop - or are you just an interested party?

I’m an interested party, as we are moving our popular community to Discourse soon. We’re just staging at the moment and it works for now, but we want to have a safe setup right from the start, so we will probably set up Onebox Assistant with our own endpoint, which will relay and cache responses from embed.rocks or, if that fails, pull from Iframely.
