Embedding pens from CodePen

What’s weird? I think there was a change at some point where the user has to click “Run Pen”, because some pens can chew up a lot of resources, and not everyone will want them to run by default.

No, just that the preview differs significantly from the final posted output. That’s all. It works as designed.

The onebox implementation allows two different outputs, one for the preview and one for the posted result. We do that for YouTube because otherwise every keypress would re-render the embed, which becomes obnoxious (at least until we can hold the Markdown output in a virtual DOM and apply diffs). I guess that’s what is happening here.

Yup, but we can do much better in this case. Will fix :pencil:

This has regressed, again:

https://codepen.io/web-tiki/full/dNpgrR

https://codepen.io/web-tiki/pen/dNpgrR

:frowning:

@Roman_Rizzi can you have a look?

Looks like we’re being intercepted by Cloudflare when trying to fetch the URL’s HTML, and this prevents us from discovering the oEmbed endpoint.

I think our options are:

  • Manually add the oEmbed endpoint URL to onebox, so we don’t have to fetch the HTML.
  • Check if CodePen can allow-list us? :thinking:
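
For the first option, a minimal sketch of what skipping discovery could look like. The endpoint URL and the `codepen_oembed_url` helper are assumptions for illustration, not something taken from the onebox gem, so check the endpoint against CodePen’s own documentation before relying on it:

```ruby
require "uri"

# Assumption: CodePen serves oEmbed data from this endpoint.
CODEPEN_OEMBED_ENDPOINT = "https://codepen.io/api/oembed"

# Hypothetical helper: builds the oEmbed request URL for a pen
# directly, without fetching the pen's HTML first.
def codepen_oembed_url(pen_url)
  uri = URI(CODEPEN_OEMBED_ENDPOINT)
  # encode_www_form percent-encodes the pen URL so it is safe as a query value.
  uri.query = URI.encode_www_form(url: pen_url, format: "json")
  uri.to_s
end
```

The trade-off is that a hardcoded endpoint breaks silently if CodePen ever moves it, which is what discovery was meant to avoid.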

I’m sure we can get it fixed. You hit our oEmbed endpoint manually, yes? What would the referrer look like? Is it potentially different on every site running Discourse?

Thanks for jumping in so quickly @chriscoyier!

This happens when I try to onebox the link using the gem. The request is made with Ruby’s Net::HTTP.

The flow is:

  1. Fetch CodePen’s HTML (e.g. GET to https://codepen.io/web-tiki/full/dNpgrR).
  2. Discover the oEmbed URL from the <link> tag of type application/json+oembed in the page’s <head>.
  3. Fetch the oEmbed data and build the box.

We never reach step number 2.
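
To make step 2 concrete, here is a rough sketch of the discovery step. It is not the gem’s actual code: `discover_oembed_url` is a hypothetical helper, and it uses a regex instead of a real HTML parser to stay dependency-free, so treat it as an illustration only:

```ruby
# Hypothetical helper: returns the href of the first
# <link type="application/json+oembed"> tag, or nil if there is none.
def discover_oembed_url(html)
  html.scan(/<link\b[^>]*>/i).each do |tag|
    # Collect the tag's quoted attributes into a hash with lowercase keys.
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/).to_h { |k, v| [k.downcase, v] }
    next unless attrs["type"].to_s.downcase == "application/json+oembed"
    href = attrs["href"]
    # Only the common &amp; entity is decoded here; a real parser handles the rest.
    return href.gsub("&amp;", "&") if href
  end
  nil
end
```

When Cloudflare answers with its challenge page instead of the pen, there is no such link tag, so this returns nil and step 3 has nothing to fetch.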

Onebox can also work outside of Discourse since it’s a standalone gem, so I don’t think we can rely on a referrer. On the other hand, we could possibly set a specific user-agent that can be allowed on your side? (Is this acceptable, @sam?)

I believe we set one per:

Thanks for pointing that out!

If we allow a particular user-agent, we’ll have to move that into the gem to ensure that CodePen’s oneboxes always work, which means setting a different agent will no longer be possible.

How’s this going? Are we still screwing y’all up with blocking or has it resolved itself?

If it’s still a problem, we just need a way on our end to make sure we never block these requests. Discourse is self-hosted right? So we can’t count on any particular referrer URL. So it would probably have to be something unique in the UA?

On our end, we might be able to entirely unblock anything oEmbed related. That’s just tricky these days as anything that is entirely unchallenged is a potential attack vector for DDoS.

Most of our oEmbed usage is ultimately through https://embed.ly/ - not sure if that’s a possibility. Perhaps not perfect for an open source thing.

This is still a problem:

○ → curl --user-agent "Discourse Forum Onebox v2.2.0" https://codepen.io/web-tiki/full/dNpgrR
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
…

If our user-agent could be whitelisted that would solve the problem. The string we use for these requests is:
Discourse Forum Onebox v#{discourse_version}
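
For reference, a small sketch of how a request with that user-agent could be built with Net::HTTP. The `build_onebox_request` helper is hypothetical and only constructs the request, it does not send it:

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a GET request that
# carries the Onebox user-agent string quoted above.
def build_onebox_request(url, discourse_version)
  request = Net::HTTP::Get.new(URI(url))
  request["User-Agent"] = "Discourse Forum Onebox v#{discourse_version}"
  request
end

# Sending it would look roughly like this (needs network access):
#   uri = URI("https://codepen.io/web-tiki/full/dNpgrR")
#   response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
#     http.request(build_onebox_request(uri.to_s, "2.2.0"))
#   end
```

Because the version is interpolated, the header changes with every Discourse release, so an exact-match rule on one version would stop matching after an upgrade.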

Thanks for the info there. I’ve got a ticket opened and we’ll get it fixed up.

Give it another try when you have a moment.

Unfortunately it appears that ONLY v2.2.0 was whitelisted:

○ → curl -s --user-agent "Discourse Forum Onebox v2.2.0" https://codepen.io/web-tiki/full/dNpgrR | head
<!doctype html>
<!--[if lte IE 9]>
<html lang="en" class="oldie">
<![endif]-->
<!--[if gt IE 9]><!-->
<html lang="en">
<!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name='viewport' content='width=device-width, initial-scale=1'>

○ → curl -s --user-agent "Discourse Forum Onebox v2.2.1" https://codepen.io/web-tiki/full/dNpgrR | head
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

○ → curl -s --user-agent "Discourse Forum Onebox v2.3.1" https://codepen.io/web-tiki/full/dNpgrR | head
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

To make this work for everyone, any user-agent that starts with the leading string Discourse Forum Onebox should be whitelisted, regardless of the version that follows.
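
In other words, the check would need to be a prefix match rather than an exact match. A sketch of the idea, purely illustrative: this is not CodePen’s or Cloudflare’s actual rule, and both the constant and the method name are made up:

```ruby
# Hypothetical allow-list rule: match on the leading string only,
# so every Discourse version passes.
ALLOWED_UA_PREFIX = "Discourse Forum Onebox v"

def allowed_user_agent?(user_agent)
  # to_s guards against a missing (nil) User-Agent header.
  user_agent.to_s.start_with?(ALLOWED_UA_PREFIX)
end
```

With a rule like this, the v2.2.0, v2.2.1 and v2.3.1 requests above would all be let through, while a request with a different or missing user-agent would still be challenged.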

I can do that. This is tricky stuff though. This is essentially a hole in our protections (we’ve had DDoS issues lately). Plus, a map to that hole is right here in this public thread on the internet. I’ll whitelist it more broadly for now, but if it gets found and hammered, I’ll have to remove the whitelisting. Sorry for making this complicated.

Wait, you have NO other user agents whitelisted? That can’t be right…

It’s quite common for hosts to whitelist user agents for things like this. For example, WP Engine does that routinely for all the WordPress blogs they host, because all their customers pay for the CPU time of each request, and when they get lots of requests from bad or unknown crawlers…

Hey @chriscoyier :wave:

We’re still being blocked, is there something we can do from our side to help?

We’re still trying to figure out the best way to handle it. Sorry for the incredible delay here. I’ll update with news as I have it.