YouTube embeds missing

Hi @Iceman

As a kludge, I tested this (not elegant) CSS for you, and it seems to stop oneboxes from showing up. It is a “hit a fly with a hammer” approach that you can turn on and off until you come up with something better. Give it a try and see:

.onebox-body {
    display: none;
}

Hope this helps.

Note:

I tested this on some onebox links: the oneboxes disappeared and the links remained. I did not test it in detail, sorry.

Just stop the current processing. Do you know what triggered the rebake that is running? Is it a background job? If it is, just lower the rebake old posts count setting to zero.

You can then rebake selected YouTube posts (matched with a regexp) with delays…
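
To make the "selected posts, with delays" idea concrete, here is a minimal Python sketch. It is not Discourse code: the URL pattern is an assumption about what your YouTube links look like, and `rebake` is a hypothetical callback standing in for whatever you use to rebake a single post.

```python
import re
import time

# Assumed pattern: matches common YouTube link forms; adjust for your content.
YOUTUBE_RE = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+"
)


def select_youtube_posts(posts):
    """Return the ids of posts whose raw text contains a YouTube link."""
    return [post_id for post_id, raw in posts if YOUTUBE_RE.search(raw)]


def rebake_with_delay(post_ids, rebake, delay_seconds, sleep=time.sleep):
    """Call the hypothetical rebake(post_id) callback, pausing between posts."""
    for index, post_id in enumerate(post_ids):
        if index > 0:
            sleep(delay_seconds)  # spread outbound requests over time
        rebake(post_id)
```

The `sleep` parameter is only there so the pause can be swapped out when trying the function without waiting.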

First of all, thank you both @neounix @Overgrow

Regarding this:

Will test it asap. As long as it blocks it so I can test, I’m more than happy :smiley:

Regarding @Overgrow’s question:

I recently learned (thanks to you guys) how to do custom rebakes, run some queries, and do some dark magic. However, how can I find out or query this:

I mean… is that on the Ruby console or by killing something on Sidekiq…?

rebake old posts count controls how many posts flagged with the various rake tasks are processed every 15 minutes.
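
As a rough illustration of what that means for a backlog, here is a small Python sketch of the arithmetic. It assumes the 15-minute interval stated above and that every run processes a full batch; the function name and the numbers you feed it are made up for the example.

```python
import math


def backlog_hours(pending_posts, rebake_old_posts_count, interval_minutes=15):
    """Estimate hours needed to clear the rebake backlog at the configured rate."""
    if rebake_old_posts_count <= 0:
        raise ValueError("rebake_old_posts_count must be positive")
    # Each run handles at most rebake_old_posts_count posts.
    batches = math.ceil(pending_posts / rebake_old_posts_count)
    return batches * interval_minutes / 60
```

Lowering the setting stretches the same backlog over more runs, which is why it also lowers the rate of outbound onebox requests.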

Hi @riking,

Thanks for your input. Just a quick question… how can I change that setting? Rails Console?

Yo @Iceman

It is a setting in the admin UI

God do I feel dumb :sweat_smile: Thanks @neounix!

I can’t set it to zero, though. The UI says that the value should be between 1 and 2000000000. So I guess I will set it to 1 and combine it with your hack… that should allow me to see if the ban lifts.

(Because the other options are way heavier, being new IPs, LBs or directly forcing X-Forwarded-For on everything, and I don’t want to screw up how Discourse works :sweat:)

Will update soon!

FYI … If you need to set it to zero, you can more than likely do it with a direct DB UPDATE query (or via Rails, which I cannot speak intelligently about).

Welp, no luck. I left it at 1 with the hack to not display anything, and the ban is still up 5h later. Worst part? Obviously there is no way to contact them, so I’m just guessing at ways to solve it. :sweat:

You need to be patient. It may take up to 2-3 days before the ban is lifted in my experience. Just make sure there are no further requests to that IP…

You can check whether there are still rebake jobs to run. The hint is in the code (this line selects the posts that need rebaking in the background, limited by the rebake old posts count setting):

https://github.com/discourse/discourse/blob/75b1298e997ce7dda904cc88b668a55eb13b457f/app/models/post.rb#L611

You are looking for the following posts:

WHERE (((baked_version IS NULL) OR (baked_version < 2)) AND (deleted_at IS NULL))
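
To show how that condition can be turned into a count, here is a minimal Python sketch. It uses an in-memory SQLite table as a stand-in (Discourse itself runs on PostgreSQL), the three columns are the only ones the condition needs, and the `2` is simply copied from the clause above, so it may differ on your version.

```python
import sqlite3

# Predicate copied from the post above; the table below is a tiny stand-in.
PENDING_REBAKE = (
    "SELECT COUNT(*) FROM posts "
    "WHERE (((baked_version IS NULL) OR (baked_version < 2)) "
    "AND (deleted_at IS NULL))"
)


def count_pending(conn):
    """Count posts that the background job would still pick up for rebaking."""
    return conn.execute(PENDING_REBAKE).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER, baked_version INTEGER, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, None, None),          # never baked: pending
        (2, 1, None),             # old baked version: pending
        (3, 2, None),             # already current: skipped
        (4, None, "2020-01-01"),  # deleted: skipped
    ],
)
pending = count_pending(conn)
```

If the count keeps dropping between checks, the background rebake is still running.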

Hello @Iceman

Yes, I searched the DB for you just now and could not find a table or field where that setting is stored (was not in the site_settings table)

As for your comment:

… and the Ban is still up 5h later

I don’t have any special working knowledge of how Google manages these “bans”, but I suspect after their algos trigger a ban, it could take a much longer time for Google to “unban” (days or even weeks).

But then again, I do not have any working knowledge of how this process of banning “works” or even what the “official Google name of this banning process” is.

Do you?

Is there a Google support page where they discuss this? What is the exact name of this process of banning you are referring to?

There’s definitely not a support page where Google explains exactly how their DoS-prevention blocking, or any related system, works.

After Blood, Sweat and Tears, I think it’s solved.

However, I’m not particularly “proud” of the solution, but hey, it’s Google: they won’t talk to you or explain anything, so… conclusions:

  • First of all, one important lesson: Don’t enable IPv6 on DigitalOcean if you are using Discourse, because their IPv6 range is blocked by YouTube.

  • After the IPv6 issue was fixed, and regardless of host (I changed a couple of times, what a journey), YouTube started IP-blocking my Discourse installation as traffic increased, because of the number of YouTube videos posted on the site and how Discourse loads them.

  • To check for this block, you need to either use your server as a proxy for a machine with a browser, or just run curl on the server and search for this line: “Sorry for the interruption. We have been receiving a large volume of requests from your network.” (there is this topic for reference)

  • Thanks to the help from @riking @neounix and @Overgrow I ran a series of commands (which you can read above) to try to stop, limit, or change the rate at which we bake the YouTube embeds. For most sites that would be enough, but we had the added drama of having been migrated after I tried a couple of hosts, so all the previous posts needed baking. As a matter of fact, limiting it to 1 every hour kind of solved it at first. But I guess my community just really likes to share videos, because that didn’t last long.

  • Obviously there is no feedback or help from YouTube here, except for a couple of threads on their forum with the error and all the comments saying “yeah I also have that issue” but no solutions there.

  • Given the circumstances, remembering that infomercial logic of “There has to be another way!” I opted for a “Rambo” approach: Purchased another IP address. Then added a cron that switches the outbound IP address every hour. Issue solved.
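
For readers wondering what the hourly switch boils down to, here is a minimal Python sketch of the selection logic only. It is not the actual cron job: the function name is made up, the addresses are documentation-range placeholders, and actually changing the server's outbound address depends on your host and network setup.

```python
def outbound_ip_for_hour(ips, hour_index):
    """Pick the outbound address for a given hour by cycling through the pool."""
    if not ips:
        raise ValueError("need at least one outbound IP")
    return ips[hour_index % len(ips)]


# Placeholder addresses from the documentation range, not real servers.
pool = ["203.0.113.10", "203.0.113.11"]
current = outbound_ip_for_hour(pool, 3)
```

Adding a third address is then just a longer list, and the rotation wraps around on its own.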

It is expected that, if the site keeps growing and people keep sharing that YouTube love, I may need to acquire a third IP. But hey, until I figure out a proper way of running a “distributed Discourse” on K8s or something, it’s as good as it gets.

Not the most elegant of solutions, I know.

Once again, thanks for all the help (and mostly patience, because I know I’m very n00bish with all the Rails/Sidekiq/RubyConsole combo, but I’m trying to improve by reading the Discourse Code).

Thanks!

Brilliant!

That’s a creative, effective, “out-of-the-box thinking” solution.

Congrats on solving your puzzle with style and finesse!

Having followed advice/recommendations I set up a CloudFront CDN for our AWS S3 bucket on our Discourse a few days ago.

I added the S3 CDN URL in our control panel, then duly issued a rebake command on 200,000+ posts.

Didn’t think much of it at that point; it was off and working its magic for the next 12 hours or so.

We have many, many videos embedded in our Discourse. We are a drone/uav community and people are posting and sharing their pictures and videos all day long. Tens of thousands of YouTube videos are on our Discourse posts.

Hindsight…? After adding a CDN URL, I probably only needed to rebake posts matching a *.jpg pattern or similar :man_facepalming:t2: :cry:

Anyway, what’s happened?

YouTube have blocked the IP address of our server :pensive:

We can no longer onebox any YouTube links; our community is met with:

429 Too Many Requests

:pensive:

(a simple curl / wget on the server itself also returns the same thing)
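
If you want to script that check, here is a minimal Python sketch of the test itself. It only inspects a status code and body you have already fetched; the marker text is the message quoted earlier in this thread, and the function name is made up.

```python
# Message quoted earlier in this thread; YouTube may change the wording.
BLOCK_MARKER = (
    "We have been receiving a large volume of requests from your network"
)


def looks_rate_limited(status_code, body):
    """Return True if a response looks like the YouTube block described here."""
    return status_code == 429 or BLOCK_MARKER in body
```

Run it against the output of a single request from the server, not in a tight loop, so the check itself does not add to the request volume.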

We obviously got blocked at some point during the rebake, as half of the existing posts that did have working videos don’t anymore :sob:

I’m assuming this block is permanent but as you’ll know, it’s impossible to find anyone at YouTube to contact and beg forgiveness.

On the off chance it’s permanent, a question for @Iceman please. Can you share the details of how you obtained a second IP address at Digital Ocean, and the changes you made to route “out” on that IP but leave incoming traffic on the existing IP?

And a question for everyone, does anybody know if this block is likely to be just temporary? :crossed_fingers:t2: And/or is there anything I can do in order to fix my now very broken YouTube posts?

For a heavily media-driven community, this is quite disastrous for us.

It’s unlikely to be permanent, it will probably go away over time.

If someone is looking for the same change, you can use a Remap instead of a rebake, which is almost instant and doesn’t make requests to anywhere.
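
To illustrate why a remap makes no outbound requests, here is a conceptual Python sketch. It is not Discourse's remap implementation: it only shows a plain string replacement over stored post bodies, and the host names are hypothetical.

```python
def remap(cooked_posts, old, new):
    """Replace old with new in every stored post body.

    Returns the updated posts and how many of them changed.
    """
    updated = {}
    changed = 0
    for post_id, html in cooked_posts.items():
        replaced = html.replace(old, new)
        if replaced != html:
            changed += 1
        updated[post_id] = replaced
    return updated, changed


# Hypothetical host names, for illustration only.
posts = {
    1: '<img src="https://bucket.example.com/a.jpg">',
    2: '<a href="https://www.youtube.com/watch?v=abc123">video</a>',
}
new_posts, changed = remap(
    posts, "https://bucket.example.com/", "https://cdn.example.com/"
)
```

The YouTube post is left untouched, so nothing needs to be fetched again for it.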

Sorry for the late reply.

I can’t help you with Digital Ocean; I moved away from them when their IPv6 support was lacking.

I keep adding more and more IPs to the “switch” that handles outgoing requests to YT, rotating between IPs to avoid being banned. But they eventually get banned and it’s a deadlock. If you are “banned” and that IP keeps making requests, it stays banned. You need to go between 1 and 8 hours without requests for YT to “unban” you. The more users you get and the more YT videos they post, the worse it gets.
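
The deadlock can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the 8-hour default is the upper end of my estimate above, not a documented YouTube value.

```python
def pick_rested_ip(last_request_at, now, quiet_hours=8):
    """Return an IP with no requests for quiet_hours, or None if all are too recent.

    last_request_at maps each IP to the timestamp (in seconds) of its last request.
    """
    quiet_seconds = quiet_hours * 3600
    rested = [
        ip for ip, ts in last_request_at.items() if now - ts >= quiet_seconds
    ]
    # Prefer the address that has been idle the longest.
    return min(rested, key=lambda ip: last_request_at[ip]) if rested else None
```

When it returns None, every address has been used too recently, which is exactly the situation where adding IPs stops helping.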

Thanks to the latest update to oneboxes, it’s easier to spot (because you can see the 429 within Discourse instead of hunting for it when videos don’t display correctly). However, with my limited understanding, I wonder if there is a better way of handling YT embedding: when the video is played, the request does come from the client’s IP (I guess), but when the video is displayed, it is your site making the request for each video.

Thanks @Iceman :+1:t2:

This is really useful to know.

I installed the onebox assistant plugin and routed all oneboxes through the embed.rocks proxy on Friday.

Today, on the server console, I tried a random wget of a YouTube video.

Sure enough, we’ve been unblocked!

Onebox assistant disabled again this afternoon and we’re pulling direct with no issues so far.

Had I not done that, I think you’re right and we would have never been unblocked because we’d be hitting YouTube every hour or so as people constantly post new videos :grimacing:

Thanks again :smiley:

Thanks for the additional info @Iceman and @Richie – it’s come up a fair bit recently, so any additional info about the way YouTube approaches rate limiting is super helpful.

We also want to reassure people that these “bans” are automatic in both directions – if you are patient and reduce the number of YouTube requests your site makes, you should be removed from Santa’s naughty list within a few days. :santa::page_with_curl:
