Discourse API: get just the number of search results

Hi. I am trying to get just the number of search results from the API.
I have the following query, /search.json?q=query, but I just need to know how many results there are, not the blurbs, cooked, etc.
Is this possible with the Discourse API?

I don’t think we return a “count” in the response, but it is something you can calculate yourself.

See the search API docs for a more detailed response example, but it will look something like this:

{
    "posts": [],
    "topics": [],
    "users": [],
    "categories": [],
    "grouped_search_result": {}
}

By default the API will return a max of 50 results. To calculate the count, you just need to count the number of items in the posts array. The number of items in the topics array should be the same, so there is no reason to count that array too.
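
A minimal sketch of that counting step in Ruby, assuming the response body has already been fetched and has the shape shown above. The count_search_results helper and the sample body are made up for illustration and are not part of any Discourse library.

```ruby
require "json"

# Hypothetical helper: count the posts in a /search.json response body.
# A missing "posts" key is treated as zero results.
def count_search_results(body)
  parsed = JSON.parse(body)
  (parsed["posts"] || []).length
end

# Made-up sample body with the same top-level keys as the example above.
sample = '{"posts": [{"id": 1}, {"id": 2}], "topics": [{"id": 10}, {"id": 11}], ' \
         '"users": [], "categories": [], "grouped_search_result": {}}'

puts count_search_results(sample)
```

Because a single response holds at most one page of results, this gives the count for that page only, not a total across the whole site.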

I’m trying every way I can think of to just download all the topics and posts from my site. latest and top are limited, so I’m now trying to get all categories and run a search for each category (akin to how I can on the site). For example, on our site, if I search for Q&A #q-a I get over 50 results. When I search for that exact string with the discourse_api Ruby gem, I get only 5:

irb(main):123:0> topics["posts"].length
=> 5
irb(main):124:0> topics["topics"].length
=> 5

Why is this not consistent with the interface and with what you are reporting? What is the easiest way to export data? I’d like to do some NLP on our site’s content and it’s proving to be very hard just to get the data. Thanks!

latest paginates; you just need to pass the params properly and you should be able to reach all topics via the API.

The search paginates as well.

I recommend “How to reverse engineer the Discourse API” as a crash course for figuring out all the params you need.

Thanks @sam! I can see (even from the GET request) that it should be fairly intuitive - when I want to get page 2, I add an additional option for page. I can also see that “options” is something I can define with the discourse_api function:

# frozen_string_literal: true
module DiscourseApi
  module API
    module Search
      # Returns search results that match the specified term.
      #
      # @param term [String] a search term
      # @param options [Hash] A customizable set of options
      # @option options [String] :type_filter Returns results of the specified type.
      # @return [Array] Return results as an array of Hashes.
      def search(term, options = {})
        raise ArgumentError.new("#{term} is required but not specified") unless term
        raise ArgumentError.new("#{term} is required but not specified") unless !term.empty?

        response = get('/search/query', options.merge(term: term))
        response[:body]
      end
    end
  end
end

So - trying this out, I would expect to get different results here for page 1 and 2. Or let’s give a little more separation and do pages 1 and 3. The query is for all Q&A topics:

 query = category["name"] + " #" + category["slug"]
=> "Q&A #q-a"

Now let’s retrieve pages 1 and 3 using the discourse_api client:

topics1 = client.search(query, options={"page": "1"})
topics3 = client.search(query, options={"page": "3"})

I can look at the first topic for each, starting with topics1['topics'][0]:

=> {"id"=>220, "title"=>"Why am I exceeding the quota?", "fancy_title"=>"Why am I exceeding the quota?", "slug"=>"why-am-i-exceeding-the-quota", "posts_count"=>3, "reply_count"=>0, "highest_post_number"=>3, "image_url"=>nil, "created_at"=>"2018-06-01T12:56:12.120Z", "last_posted_at"=>"2018-06-15T16:41:44.736Z", "bumped"=>true, "bumped_at"=>"2018-06-15T16:41:44.736Z", "unseen"=>false, "pinned"=>false, "unpinned"=>nil, "visible"=>true, "closed"=>false, "archived"=>false, "bookmarked"=>nil, "liked"=>nil, "tags"=>["storage", "quota"], "category_id"=>26, "has_accepted_answer"=>false}

irb(main):148:0> topics3['topics'][0]
=> {"id"=>220, "title"=>"Why am I exceeding the quota?", "fancy_title"=>"Why am I exceeding the quota?", "slug"=>"why-am-i-exceeding-the-quota", "posts_count"=>3, "reply_count"=>0, "highest_post_number"=>3, "image_url"=>nil, "created_at"=>"2018-06-01T12:56:12.120Z", "last_posted_at"=>"2018-06-15T16:41:44.736Z", "bumped"=>true, "bumped_at"=>"2018-06-15T16:41:44.736Z", "unseen"=>false, "pinned"=>false, "unpinned"=>nil, "visible"=>true, "closed"=>false, "archived"=>false, "bookmarked"=>nil, "liked"=>nil, "tags"=>["storage", "quota"], "category_id"=>26, "has_accepted_answer"=>false}

They are exactly the same, which I think means the page parameter isn’t working. When I inspect the site in Chrome DevTools, the request is triggered by scrolling down (since the posts auto-load in the window), and I can confirm that page=2 is the correct parameter:

Request URL: https://ask.cyberinfrastructure.org/search?q=Q%26A%20%23q-a&page=2
Request Method: GET
Status Code: 200  (from ServiceWorker)
Referrer Policy: strict-origin-when-cross-origin

or better, just look at the parameters list:

Query String Parameters
q: Q&A #q-a
page: 2

This isn’t a form submit, so I don’t see any “Form Data” per the example.

Does anyone have any wisdom here? I’ve tried what was suggested, but I don’t see a logical next step. The page variable doesn’t seem to be working when given with the request.

The Discourse API gem is using the /search/query route. It doesn’t seem to respond to pagination. The Discourse UI uses the /search route. It does respond to pagination.

You can test this in your browser by going to http://forum.example.com/search.json?q=test and then trying http://forum.example.com/search.json?q=test&page=2
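
The same check can be scripted. Below is a rough Ruby sketch that requests /search.json page by page and stops at the first page with no posts. It assumes that a page past the end comes back with an empty posts array, which is worth verifying on your own site. The search_url and collect_search_posts names, the max_pages cap, and the stub fetcher are all invented for this example.

```ruby
require "json"
require "uri"

# Build the URL for one page of search results.
# base_url is a placeholder such as "http://forum.example.com".
def search_url(base_url, query, page)
  params = URI.encode_www_form([["q", query], ["page", page]])
  "#{base_url}/search.json?#{params}"
end

# Request pages until one comes back with no posts, collecting the posts.
# fetch is any callable that takes a URL and returns the response body.
# max_pages is a safety cap chosen for this sketch, not a Discourse limit.
def collect_search_posts(base_url, query, fetch, max_pages: 20)
  posts = []
  (1..max_pages).each do |page|
    body = JSON.parse(fetch.call(search_url(base_url, query, page)))
    page_posts = body["posts"] || []
    break if page_posts.empty?
    posts.concat(page_posts)
  end
  posts
end

# Stub fetcher standing in for a real HTTP call, so the sketch runs offline.
fake_pages = {
  1 => '{"posts": [{"id": 1}, {"id": 2}]}',
  2 => '{"posts": [{"id": 3}]}'
}
fake_fetch = lambda do |url|
  page = URI.decode_www_form(URI(url).query).to_h["page"].to_i
  fake_pages.fetch(page, '{"posts": []}')
end

posts = collect_search_posts("http://forum.example.com", "Q&A #q-a", fake_fetch)
puts posts.length
```

Against a real site, the stub could be swapped for something like ->(url) { Net::HTTP.get(URI(url)) } after require "net/http". Passing the fetcher in as an argument keeps the paging logic testable without network access.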

You might need to find a way to make the API call without using the Discourse API gem. If your goal is to get all the topics and posts on your site, using the /search route doesn’t seem like the best approach.

You might try making an API call to http://forum.example.com/c/your-category-slug.json. If all topics for the category are not returned in the response, the response’s topic_list will have a more_topics_url property that gives you the route to the next page of topics. That will look something like "/c/site-feedback?page=2". You’ll need to add .json to the path to get the JSON data (/c/site-feedback.json?page=2).
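
Here is a rough Ruby sketch of that loop: it rewrites each more_topics_url into its .json form and keeps following it until the property is absent. It assumes the topics themselves sit in a topics array inside topic_list. The json_route and collect_category_topics names, the max_pages cap, and the stub responses are invented for illustration.

```ruby
require "json"

# Turn a more_topics_url such as "/c/site-feedback?page=2" into its JSON
# form "/c/site-feedback.json?page=2" by adding .json to the path.
def json_route(more_topics_url)
  path, query = more_topics_url.split("?", 2)
  path = "#{path}.json" unless path.end_with?(".json")
  query ? "#{path}?#{query}" : path
end

# Follow more_topics_url from page to page, collecting every topic.
# fetch is any callable that takes a route and returns the response body.
# max_pages is a safety cap chosen for this sketch.
def collect_category_topics(first_route, fetch, max_pages: 100)
  topics = []
  route = first_route
  max_pages.times do
    break if route.nil?
    topic_list = JSON.parse(fetch.call(route))["topic_list"] || {}
    topics.concat(topic_list["topics"] || [])
    next_url = topic_list["more_topics_url"]
    route = next_url && json_route(next_url)
  end
  topics
end

# Stub responses standing in for real HTTP calls, so the sketch runs offline.
fake_responses = {
  "/c/site-feedback.json" =>
    '{"topic_list": {"topics": [{"id": 1}], "more_topics_url": "/c/site-feedback?page=2"}}',
  "/c/site-feedback.json?page=2" =>
    '{"topic_list": {"topics": [{"id": 2}]}}'
}
fake_fetch = ->(route) { fake_responses.fetch(route) }

topics = collect_category_topics("/c/site-feedback.json", fake_fetch)
puts topics.map { |t| t["id"] }.inspect
```

With a real site, the fetcher would prepend the forum's base URL to each route before making the request.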

Thank you! That worked perfectly, and it’s so much easier in Python with requests (I was making it hard on purpose to get more comfortable with Ruby, but the client didn’t have what I needed). I’m mostly done with exports and haven’t done any machine learning stuffs yet, but if anyone is interested in the calls that I made, the quick scripts are here: https://github.com/hpsee/discourse-cluster. Hopefully I’ll do some cool clustering soon!

Thanks again @sam and @simon! In case others are ever interested in doing a simple export of topics, or going further and doing some clustering with visualizations in d3, I wrote up a quick post that walks through it: AskCI Discourse Clustering | VanessaSaurus. And again, all the stuffs you need to start are at the repository I previously linked.
