Onebox requests being incorrectly redirected due to user-agent

I’ve never developed with Ruby before, so I’m struggling to understand why OneBox returns a blank preview for our link when iFramely and Slack render it fine.

Here is a sample link we are trying to render; it has been whitelisted in our Discourse instance.
https://link.jig.space/kEhFbFwkcW

A working iFramely link is here: Iframely API for Responsive oEmbed
It extracts the appropriate open graph tags to make the preview.

We are using branch.io as our link shortener, with some intelligent redirects: if the user is on iOS and has our app installed, the link opens the app, but on desktop it goes to our microsite https://jig.space. Could this be why OneBox can’t infer the open graph tags while iFramely can?

I tried putting a few puts statements in the code to figure out why OneBox returns a 404 when previewing our links, and although the Discourse codebase is easy enough to read, I’m too new to Ruby to investigate further.

It would be an awesome feature to have some kind of debug flag for OneBox with more verbose logging, as it seems lots of people on here (including me!) are confused about how oEmbed and Open Graph are supposed to work.

Could anyone help me debug OneBox further or let me know of any tools to figure this out?


The Facebook og debugger throws a couple of warnings for that link.

Not sure if that’s what’s upsetting onebox.


A curl request returns HTML that appears to render the whole card.
curl -X GET https://link.jig.space/gXszoFiWkX

 <html amp>
   <!-- ...  lots of header and style stuff, with some meta tags, then the body-->
<link rel="alternate" href="ios-app://1111193492/jigspaceviewer/open?link_click_id=link-665670162338762596">
       <link rel="apple-touch-icon" href="https://jig.space/images/jigs/jig-AOp18L7z-color.3.png">
       <meta property="al:ios:url" content="jigspaceviewer://open?link_click_id=link-665670162338762596">
       <meta name="twitter:app:url:iphone" content="jigspaceviewer://open?link_click_id=link-665670162338762596">
       <meta property="al:ios:app_store_id" content="1111193492">
       <meta name="twitter:app:id:iphone" content="1111193492">
       <meta property="al:ios:app_name" content="JigSpace">
       <meta name="twitter:app:name:iphone" content="JigSpace">
           <meta name="twitter:card" content="summary_large_image">
       <meta property="og:image" content="https://jig.space/images/jigs/jig-AOp18L7z-color.3.png">
         <meta name="twitter:image:src" content="https://jig.space/images/jigs/jig-AOp18L7z-color.3.png">
       <title>Macintosh - Complete</title>
       <meta property="og:title" content="Macintosh - Complete">
         <meta name="twitter:title" content="Macintosh - Complete">
       <meta property="og:description" content="Apple Macintosh Computer">
         <meta name="twitter:description" content="Apple Macintosh Computer">
       <meta name="twitter:app:country" content="US">
   <body>
     <div class="card center">
       <div class="main-image"></div>
       <div id="content-container">
         <div class="app-title text-bold">JigSpace</div>
         <div class="card-title text-light">Macintosh - Complete</div>
         <div class="app-content text-light">Apple Macintosh Computer</div>
       </div>

If OneBox’s issue is the image issue which FB is complaining about, then we should add the FB open graph debugger to the OneBox docs thread: Rich link previews with Onebox

I’ll try to put in the og:image:width and such as FB suggests. Thanks @merefield.


I feel your pain in general though.

When oneboxing fails could we have some additional logging to say exactly why it is turning its nose up?


Yeah it’s just that I can’t follow the code past here:
https://github.com/discourse/discourse/blob/102be5a9e3a063bebe6a62927a102f84904a9bdf/lib/oneboxer.rb#L289

Then later on, in onebox_controller.rb, preview.blank? is true.
I’m pretty new to Ruby, so I’m just trying to figure out how to go deeper and output the options passed into Onebox.preview etc.


byebug is extremely useful. It’s already included in the project, I believe, so just put the command ‘byebug’ on its own line in the code and the console will drop into the debugger at that point.

Type any variable name to see its value.

Use commands ‘next’, ‘step’ and ‘up’ to get around. ‘continue’ to set it off on its merry way.

It’s a bit like ‘debugger’ in javascript.

This is an extreme length to go to for a failing onebox though! :slight_smile: (but a great educational exercise nonetheless).


I didn’t realise onebox is its own gem. I’ve just discovered the onebox repo and it has some awesome points:

Reading the readme there: I should be able to make an engine myself to get around any og tag issues.

I can’t see any onebox option for a debug flag or anything. So I guess that is the suggestion I’m making here: There should be a debug:true or verbose:true option to pass into onebox to understand some common issues.


Yup, it’s a ‘separate’ project. I’m in complete agreement about the logging. This should be verbose by default for all failures. We shouldn’t have to break into the code to see what’s going on: it’s too common a production problem. The Facebook og debugger is a nice extra, but onebox has made the decision to fail the preview using its own rules, and it should be transparent about when it does so. Perhaps I’m missing something? I obviously respect the platform’s priorities, but it would be great if this was addressed.


Does it have actual text? Image only oneboxes aren’t supported.


Hi @codinghorror!
What do you mean by actual text?
As an example here are some of the meta og tags returned from https://link.jig.space/gXszoFiWkX

<meta property="og:title" content="Macintosh - Complete">
<meta property="og:description" content="Apple Macintosh Computer">
<meta property="og:image" content="https://jig.space/images/jigs/jig-AOp18L7z-color.3.png">

Slack renders a preview from this og data, and so does iFramely (screenshots of both previews were attached here).

So I wouldn’t say it’s just the image, though it is just those three tags used so there’s not much text.
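
For anyone wanting to check which og tags a page actually exposes, here is a rough Ruby sketch. It is not how Onebox parses pages; og_tags and missing_og_tags are hypothetical helper names, and the regex only handles meta tags written with property before content, as in the snippet above. A real HTML parser would be more robust.

```ruby
# Hypothetical helpers, not Onebox's actual parser. A regex keeps the sketch
# dependency-free; it only matches <meta property="og:..." content="...">
# with the attributes in that order.
OG_META = /<meta\s+property=(["'])og:([^"']+)\1\s+content=(["'])(.*?)\3/i

# Collects og:* properties into a hash keyed by the part after "og:".
def og_tags(html)
  html.scan(OG_META).map { |_q1, name, _q2, content| [name, content] }.to_h
end

# Lists which of the wanted tags are absent or have blank content.
def missing_og_tags(html, wanted = %w[title description image])
  tags = og_tags(html)
  wanted.select { |name| tags[name].to_s.strip.empty? }
end
```

Feeding it the HTML returned for a link shows at a glance whether a title, description and image are all present and non-blank.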


My guess is you don’t meet our minimum text content requirement for the onebox.


What’s the minimum?

I have descriptions like this too which also don’t render a preview from onebox:
<meta property="og:description" content="Create and share interactive 3D knowledge for anything, and bring it to life with Augmented Reality." />


You’d need to look at the source code to be sure; the main time I see people get confused about this is when they expect blank or missing descriptions to work.

Other than that, we don’t see issues with this feature in general.


Oh wait, I remembered another thing that trips people up. If you are hosting on a platform that commonly blacklists user-agent strings, you’ll run into problems. For example, WPEngine is notorious for blacklisting pretty much every unknown user agent, so oneboxing WordPress blogs on WPEngine tends to fail as a result.

Discourse requests the onebox using the Discourse user agent.


Thanks @codinghorror and @merefield

When I try a curl with the User-Agent header set to Discourse, I get a completely different result.
curl -X GET https://link.jig.space/gXszoFiWkX -H 'User-Agent: Discourse'

It takes me to a 304 temp redirect to a page with a script that makes a 2nd redirect back to our home page:
window.top.location = validate("http://jig.space/?_branch_match_id=667900471741258199");

And of course there are no og: tags in those responses.

So it is something to do with my redirects upon getting the Discourse user agent.
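
In case it helps anyone else debug the same thing, the curl comparison can be scripted in Ruby with only the standard library. This is a minimal sketch: fetch_with_user_agent, summarize and probe are hypothetical names (nothing here is part of Discourse or Onebox), redirects are deliberately not followed so the first response is visible, and the og check is a crude regex.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fetch a URL once, without following redirects,
# sending the given User-Agent header.
def fetch_with_user_agent(url, user_agent)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request["User-Agent"] = user_agent
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end

# Summarise what matters for a link preview: the status code, any redirect
# target, and whether the body contains at least one og: meta tag.
def summarize(status, location, body)
  {
    status: status.to_i,
    redirect_to: location,
    has_og_tags: body.to_s.match?(/<meta[^>]+property=["']og:/i)
  }
end

# Fetch and summarise in one step (performs a real HTTP request).
def probe(url, user_agent)
  response = fetch_with_user_agent(url, user_agent)
  summarize(response.code, response["location"], response.body)
end

# Example usage (hits the network, so it is left commented out):
#   ["Mozilla/5.0", "Discourse"].each do |ua|
#     puts "#{ua}: #{probe('https://link.jig.space/gXszoFiWkX', ua)}"
#   end
```

Running probe with a browser-like user agent and then with the Discourse one should make it obvious when a server answers the two differently.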

Thanks again :slight_smile: I’ll report back when I fix it.


User agent is found here:


Just to close this off:
The devs at branch.io haven’t told me a way to detect and redirect differently for the Discourse User agent so I had to change the code on the Discourse side to remove the User Agent completely before Onebox.preview.

So if it’s a onebox request for one of our links, I remove the user agent. This is filthy but it works:

def self.external_onebox(url)
  Rails.cache.fetch(onebox_cache_key(url), expires_in: 1.day) do
    # Resolve the URL's final destination before asking Onebox for a preview.
    fd = FinalDestination.new(url, ignore_redirects: ignore_redirects, ignore_hostnames: blacklisted_domains, force_get_hosts: force_get_hosts, preserve_fragment_url_hosts: preserve_fragment_url_hosts)
    uri = fd.resolve
    return blank_onebox if uri.blank? || blacklisted_domains.any? { |hostname| uri.hostname.match?(hostname) }

    options = {
      cache: {},
      max_width: 695,
      sanitize_config: Sanitize::Config::DISCOURSE_ONEBOX
    }

    options[:cookie] = fd.cookie if fd.cookie

    if uri.to_s.start_with?('https://link.jig.space')
      # Our branch.io links redirect the Onebox.preview request when a user agent is sent, so we blank it here.
      Onebox.options = { user_agent: "" }
    else
      # Other websites might rely on this user agent to deliver special content to Onebox or Discourse, so set it back.
      Onebox.options = { user_agent: "Discourse Forum Onebox v#{Discourse::VERSION::STRING}" }
    end

    # In development, also allow the local site port.
    if Rails.env.development? && SiteSetting.port.to_i > 0
      Onebox.options = { allowed_ports: [80, 443, SiteSetting.port.to_i] }
    end

    r = Onebox.preview(uri.to_s, options)

    { onebox: r.to_s, preview: r&.placeholder_html.to_s }
  end
end
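
One caveat with the prefix check in the workaround above: start_with? would also match a URL such as https://link.jig.space.example.com. A slightly stricter variant compares the parsed host instead. This is only a sketch; NO_USER_AGENT_HOSTS and onebox_user_agent_for are hypothetical names, not part of Discourse or Onebox.

```ruby
require "uri"

# Hypothetical list of hosts that should be fetched with a blank user agent.
NO_USER_AGENT_HOSTS = ["link.jig.space"].freeze

# Returns "" for listed hosts and the supplied default for everything else,
# including strings that cannot be parsed as a URL.
def onebox_user_agent_for(url, default_user_agent)
  host = URI.parse(url).host&.downcase
  NO_USER_AGENT_HOSTS.include?(host) ? "" : default_user_agent
rescue URI::InvalidURIError
  default_user_agent
end
```

The return value could then be passed as the user_agent in the if/else branch above, keeping the host list in one place.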

It’s not just the user agent for Onebox though. Before the onebox request, there is a call to FinalDestination.new. This takes the ignore_redirects array, where I added our link domain:
https://github.com/discourse/discourse/blob/73a45048a015890a4bce6ec7c203430900adfa02/lib/oneboxer.rb#L26

Now it works for us,
Thanks again for your help guys :slight_smile:


That’s really going to confuse anyone who tries to onebox one of your links on any other discourse instance globally.


Yeh I know hence:

But without being able to get branch.io to detect Discourse AND FinalDestination user-agents, and distinguish them from other browser user agents, I don’t have a choice right now.

If I end up resolving this with branch.io support, then I’ll come back and edit these comments because it’s not ideal.
