Change max chars in Onebox - enlarge your Onebox?


(Frederik L) #1

Now that Maja has added a more advanced Onebox for Amazon links, I would like to know whether it is possible to “hack” the Onebox to pull in more chars so that the Onebox gets bigger.

E.g. I see that the max length for the description is 200 chars. Where can I change that? Then I could have a larger Onebox. Or is it not possible to set how large you want your Onebox to be?

I don't see any limit in onebox/amazon_onebox.rb at master · discourse/onebox · GitHub, so I thought it might be set more globally somewhere…

Best regards,


(Maja) #2

Unfortunately, it’s not possible to change the length of the onebox description (the only adjustable one is the internal post onebox).
The Amazon onebox uses whatever is displayed as the description excerpt on mobile.


(Frederik L) #3

Okay. Since you made such a good data ripper for Amazon, my hope was to be able to get even more data than the onebox normally gives you.

I can see that onebox/google_play_app_onebox.rb at master · discourse/onebox · GitHub uses this code:

MAX_DESCRIPTION_CHARS: 500

But I don't know if I could use something like that in your script… :slight_smile:


(Maja) #4

It certainly can be done; we’d just have to decide on a sensible description length, which would then be fixed, so the onebox doesn’t get too big.
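
To make the idea of a fixed description length a bit more concrete, here is a minimal sketch in plain Ruby. The helper name `truncate_description` and the constant are hypothetical (the 500 is just the value quoted from the Google Play onebox above), and none of this is code from the onebox gem itself.

```ruby
# Hypothetical limit, borrowed from the Google Play value quoted above.
MAX_DESCRIPTION_CHARS = 500

# Hypothetical helper (not part of the onebox gem): collapse whitespace and
# cut the text at the last space before the limit, then append "...".
def truncate_description(text, max_chars = MAX_DESCRIPTION_CHARS)
  return nil if text.nil?

  clean = text.strip.gsub(/\s+/, " ")
  return clean if clean.length <= max_chars

  cut = clean[0, max_chars]
  last_space = cut.rindex(" ")
  # Only back up to a word boundary if there is one after the first character.
  cut = cut[0, last_space] if last_space && last_space > 0
  "#{cut}..."
end

puts truncate_description("A short description")
puts truncate_description("word " * 200).length
```

Note that the "..." is added on top of the limit, so the result can be up to three characters longer than `max_chars`. Whether that matters depends on how strict the layout is.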


(Frederik L) #5

This isn't aimed at Maja specifically (she did a great job adding the extension to the Amazon script), but more generally:

How do I enlarge the onebox for Amazon to make a book review section on my site?

Do I need to code a plugin, or?

I see this script:

Use these settings:

    DEFAULTS = {
      EXPAND_ONE_LINER: EXPAND_AFTER | EXPAND_BEFORE, # set how to expand a one-liner; use EXPAND_NONE to disable expansion
      LINES_BEFORE: 10,
      LINES_AFTER: 10,
      SHOW_LINE_NUMBER: true,
      MAX_LINES: 20,
      MAX_CHARS: 5000
    }

Best regards,


(Frederik L) #6

I was now able to set up a dev box via this topic: How to edit the discourse files? A development box?

Can anybody tell me where to find the file responsible for the Amazon data ripper?

It seems like it is not the file I thought it was:

https://github.com/discourse/onebox/blob/master/lib/onebox/engine/amazon_onebox.rb

There is no such file on my system. Even so, the output on my dev box still includes the star rating, ISBN, etc.

Thank you.


(Maja) #7

onebox/amazon_onebox.rb at master · discourse/onebox · GitHub is the correct file to modify.

The best approach would be to clone the onebox library, GitHub - discourse/onebox: A gem for turning URLs into website previews (run bundle install inside your onebox directory after cloning).

After that, you can change the code and run bundle exec rake server to preview the changes in the browser at localhost:9000.


(Frederik L) #8

This way?

git clone https://github.com/discourse/onebox.git ~/discourse/lib/onebox

Or should I create a new folder in my home directory, alongside discourse?

git clone https://github.com/discourse/onebox.git ~/onebox


(Maja) #9

Use this one.

If you want the Discourse app to use your cloned onebox, you can replace gem 'onebox', '1.8.42' with gem 'onebox', path: 'path-to-your-cloned-onebox' in the Gemfile (run bundle after changing the Gemfile).


(Frederik L) #10

Nice, I was able to use

gem 'onebox', path: '/Users/username/onebox/'

Then I can edit a file and run

bundle update

before

bundle exec rails server

Or is there an easier way to do it?


(Maja) #11

You only have to run bundle once after changing the Gemfile.
After that, you can run bundle exec rails server inside /discourse and it should work.


(Frederik L) #12

It seems like I'm not able to override a setting that is set somewhere else?

I added

      include HTML

      # Set the max number of chars
      DEFAULTS = {
        MAX_DESCRIPTION_CHARS: 500
      }

And then, around line 109, I changed the line

            description: raw.at("#productDescription")&.inner_text,

to

            description: raw.at("#productDescription")&.inner_text&.slice(0..DEFAULTS[:MAX_DESCRIPTION_CHARS]),

But I'm not able to get more than 250 chars. I can reduce it to e.g. 50 chars, but I can't go over 250. How do I do that?
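
One thing worth checking at this point is what a slice can and cannot do. A rough sketch in plain Ruby follows; the 250-character string is a made-up stand-in for whatever text was scraped, and the variable names are only for illustration.

```ruby
# Hypothetical stand-in for the scraped description: exactly 250 characters.
excerpt = "x" * 250

# Illustrative limit, matching the value used in the post above.
max_chars = 500

shorter = excerpt[0, 50]         # a smaller limit shortens the text
same    = excerpt[0, max_chars]  # a larger limit just returns what is there

puts shorter.length  # 50
puts same.length     # 250

# Side note: 0..N is an inclusive range, so it keeps N + 1 characters.
puts "abcdef"[0..3]  # abcd
```

So if the text coming in is already capped at 250 chars, raising the limit in the slice has no visible effect. In that case the cap would have to be in the source data rather than in the slice.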


(Frederik L) #13

I'm a little stuck here :frowning:

I even tried this, but I'm still not able to get more than 250 chars. Can anyone push me in the right direction?

Code snippet
require 'json'

module Onebox
  module Engine
    class AmazonOnebox
      include Engine
      include LayoutSupport
      include HTML

      # Set the max number of chars
      DEFAULTS = {
        MAX_DESCRIPTION_CHARS: 500
      }

      always_https
      matches_regexp(/^https?:\/\/(?:www\.)?(?:smile\.)?(amazon|amzn)\.(?<tld>com|ca|de|it|es|fr|co\.jp|co\.uk|cn|in|com\.br)\//)

      def url
        if match && match[:id]
          return "https://www.amazon.#{tld}/gp/aw/d/#{URI::encode(match[:id])}"
        end

        @url
      end

      def tld
        @tld || @@matcher.match(@url)["tld"]
      end

      def http_params
        {
          'User-Agent' =>
          'Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3'
        }
      end

      private

      def match
        @match ||= @url.match(/(?:d|g)p\/(?:product\/)?(?<id>[^\/]+)(?:\/|$)/mi)
      end

      def image
        if (main_image = raw.css("#main-image")) && main_image.any?
          attributes = main_image.first.attributes

          return attributes["data-a-hires"].to_s if attributes["data-a-hires"]

          if attributes["data-a-dynamic-image"]
            return ::JSON.parse(attributes["data-a-dynamic-image"].value).keys.first
          end
        end

        if (landing_image = raw.css("#landingImage")) && landing_image.any?
          return landing_image.first["src"].to_s
        end

        if (ebook_image = raw.css("#ebooksImgBlkFront")) && ebook_image.any?
          ::JSON.parse(ebook_image.first.attributes["data-a-dynamic-image"].value).keys.first
        end
      end

      def price
        # get item price (Amazon markup is inconsistent, deal with it)
        if raw.css("#priceblock_ourprice .restOfPrice")[0] && raw.css("#priceblock_ourprice .restOfPrice")[0].inner_text
          "#{raw.css("#priceblock_ourprice .restOfPrice")[0].inner_text}#{raw.css("#priceblock_ourprice .buyingPrice")[0].inner_text}.#{raw.css("#priceblock_ourprice .restOfPrice")[1].inner_text}"
        elsif raw.css("#priceblock_dealprice") && (dealprice = raw.css("#priceblock_dealprice span")[0])
          dealprice.inner_text
        elsif !raw.css("#priceblock_ourprice").inner_text.empty?
          raw.css("#priceblock_ourprice").inner_text
        else
          raw.css(".mediaMatrixListItem.a-active .a-color-price").inner_text
        end
      end

      def multiple_authors(authors_xpath)
        author_list = raw.xpath(authors_xpath)
        authors = []
        author_list.each { |a| authors << a.inner_text.strip }
        authors.join(", ")
      end

      def data
        og = ::Onebox::Helpers.extract_opengraph(raw)

        if raw.at_css('#dp.book_mobile') #printed books
          title = raw.at("h1#title")&.inner_text
          authors = raw.at_css('#byline_secondary_view_div') ? multiple_authors("//div[@id='byline_secondary_view_div']//span[@class='a-text-bold']") : raw.at("#byline")&.inner_text
          rating = raw.at("#averageCustomerReviews_feature_div .a-icon")&.inner_text || raw.at("#cmrsArcLink .a-icon")&.inner_text

          table_xpath = "//div[@id='productDetails_secondary_view_div']//table[@id='productDetails_techSpec_section_1']"
          isbn = raw.xpath("#{table_xpath}//tr[8]//td").inner_text.strip

          # if ISBN is misplaced or absent it's hard to find out which data is
          # available and where to find it so just set it all to nil
          if /^\d(\-?\d){12}$/.match(isbn)
            publisher = raw.xpath("#{table_xpath}//tr[1]//td").inner_text.strip
            published = raw.xpath("#{table_xpath}//tr[2]//td").inner_text.strip
            book_length = raw.xpath("#{table_xpath}//tr[6]//td").inner_text.strip
          else
            isbn = publisher = published = book_length = nil
          end

          result = {
            link: link,
            title: title,
            by_info: authors,
            image: og[:image] || image,
            description: raw.at("#productDescription")&.inner_text&.slice(0..DEFAULTS[:MAX_DESCRIPTION_CHARS]),
            rating: "#{rating}#{', ' if rating && (!isbn&.empty? || !price&.empty?)}",
            price: price,
            isbn_asin_text: "ISBN",
            isbn_asin: isbn,
            publisher: publisher,
            published: "#{published}#{', ' if published && !price&.empty?}"
          }

        elsif raw.at_css('#dp.ebooks_mobile') # ebooks
          title = raw.at("#ebooksTitle")&.inner_text
          authors = raw.at_css('#a-popover-mobile-udp-contributor-popover-id') ? multiple_authors("//div[@id='a-popover-mobile-udp-contributor-popover-id']//span[contains(@class,'a-text-bold')]") : (raw.at("#byline")&.inner_text&.strip || raw.at("#bylineInfo")&.inner_text&.strip)
          rating = raw.at("#averageCustomerReviews_feature_div .a-icon")&.inner_text || raw.at("#cmrsArcLink .a-icon")&.inner_text || raw.at("#acrCustomerReviewLink .a-icon")&.inner_text

          table_xpath = "//div[@id='detailBullets_secondary_view_div']//ul"
          asin = raw.xpath("#{table_xpath}//li[4]/span/span[2]").inner_text

          # if ASIN is misplaced or absent it's hard to find out which data is
          # available and where to find it so just set it all to nil
          if /^[0-9A-Z]{10}$/.match(asin)
            publisher = raw.xpath("#{table_xpath}//li[2]/span/span[2]").inner_text
            published = raw.xpath("#{table_xpath}//li[1]/span/span[2]").inner_text
          else
            asin = publisher = published = nil
          end

          result = {
            link: link,
            title: title,
            by_info: authors,
            image: og[:image] || image,
            description: raw.at("#productDescription")&.inner_text&.slice(0..DEFAULTS[:MAX_DESCRIPTION_CHARS]),
            rating: "#{rating}#{', ' if rating && (!asin&.empty? || !price&.empty?)}",
            price: price,
            isbn_asin_text: "ASIN",
            isbn_asin: asin,
            publisher: publisher,
            published: "#{published}#{', ' if published && !price&.empty?}"
          }

        else
          title = og[:title] || CGI.unescapeHTML(raw.css("title").inner_text)
          result = {
            link: link,
            title: title,
            image: og[:image] || image,
            price: price
          }

          result[:by_info] = raw.at("#by-line")
          result[:by_info] = Onebox::Helpers.clean(result[:by_info].inner_html) if result[:by_info]

          summary = raw.at("#productDescription")&.inner_text
          description = og[:description] || summary
          result[:description] = description && description[0..DEFAULTS[:MAX_DESCRIPTION_CHARS]]
        end

        result
      end
    end
  end
end

(Frederik L) #14

Seems like a dead end?


(Frederik L) #15

Ahh, dooh.

It seems like the source that the script rips its data from only contains 250 chars by default.

If you e.g. go to

https://www.amazon.com/gp/aw/d/0323353177/

And view it with the user agent:

Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3

The productDescription div only contains 250 chars.

If you tap the field, you go to a new page with the path

https://www.amazon.com/gp/aw/d/0323353177/#productDescription_secondary_view_div_1520843164890

Here you have the full description inside the div productDescription_fullView.

It should have the XPath

//*[@id="productDescription_fullView"] 

A dirty solution in onebox/amazon_onebox.rb at master · discourse/onebox · GitHub:

Find the empty line 85 and add

      description = raw.at_css('#productDescription_fullView')&.inner_text

Change line 104

        description: raw.at("#productDescription")&.inner_text,

to

        description: description,

That gives a Onebox with the full description.
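
A slightly less dirty variant would keep the short excerpt as a fallback for pages that have no full-view div. Here is a minimal sketch of just that selection step in plain Ruby. The helper name `pick_description` is hypothetical and not part of the onebox gem; in the engine, the two arguments would presumably be the `inner_text` of `#productDescription_fullView` and of `#productDescription`.

```ruby
# Hypothetical helper (not part of the onebox gem): prefer the full
# description and fall back to the short excerpt when the full text is
# missing or blank. Returns nil if neither is usable.
def pick_description(full_view_text, excerpt_text)
  [full_view_text, excerpt_text]
    .map { |text| text.to_s.strip }
    .find { |text| !text.empty? }
end

puts pick_description("  The complete description.  ", "The short excerpt")
puts pick_description(nil, "The short excerpt")
```

The first call prints the full text (stripped), the second falls back to the excerpt.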


Adapt github changes to my own site
(Frederik L) #16

I submitted the changes as a pull request: Get full details for a printed book by frold · Pull Request #383 · discourse/onebox · GitHub

How do I use the changes on my live site?


(Frederik L) #17

Seems like a dead end?

I can't see that those changes have been either adopted or reviewed. :thinking: