How to run Data Explorer queries with the Discourse API

Virtually any action that can be performed through the Discourse user interface can also be triggered with the Discourse API. For a general overview of how to find the correct API request for an action, see How to reverse engineer the Discourse API.

To run a Data Explorer query from the API, you need to make a POST request to /admin/plugins/explorer/queries/<query-id>/run. You can find the a Data Explorer query’s ID by visiting the query on your site and getting the value of the id parameter that you’ll see in your browser’s address bar.

This example query has an ID of 20 on my site. It returns a list of topics, ordered by views, for a given date:

--[params]
-- date :viewed_at

SELECT
topic_id,
COUNT(1) AS views_for_date
FROM topic_views
WHERE viewed_at = :viewed_at
GROUP BY topic_id
ORDER BY views_for_date DESC

This query can be run from a terminal with:

curl -X	POST "https://discourse.example.com/admin/plugins/explorer/queries/20/run" \
-H "Content-Type: multipart/form-data;" \
-H "Api-Key: <your_all_users_api_key>" \
-H "Api-Username: system" \
-F 'params={"viewed_at":"2019-06-10"}'

You will need to substitute your site’s base URL, query ID, and All Users API key into the example.

Handling queries that return over 1000 rows

By default, the Data Explorer plugin returns a maximum of 1000 rows. You can bypass this limit by adding a limit=ALL parameter to the request. The query below will generate 10000 rows.

SELECT
*
FROM generate_series(1, 10000)

On my site, the query’s ID is 26. Here’s how to get all the query’s rows:

curl -X POST "https://discourse.example.com/admin/plugins/explorer/queries/26/run" \
-H "Content-Type: multipart/form-data;" \
-H "Api-Key: <your_all_users_api_key>" \
-H "Api-Username: system" \
-F "limit=ALL"

Paginating the results

For a query that returns a very large data set, you can paginate the results. Given this example query (it has an id of 27 on my site):

--[params]
-- integer :limit = 100
-- integer :page = 0

SELECT
*
FROM generate_series(1, 10000)
OFFSET :page * :limit
LIMIT :limit

you can return the first 100 rows with

curl -X POST "https://discourse.example.com/admin/plugins/explorer/queries/27/run" \
-H "Content-Type: multipart/form-data;" \
-H "Api-Key: <your_all_users_api_key>" \
-H "Api-Username: system" \
-F 'params={"page":"0"}'

You can keep incrementing the value of the page parameter until the result_count that is returned is 0.

Removing relations data from the results

When Data Explorer queries are run through the user interface, a relations object is added to the results. This data is used for rendering the user in UI results, but you are unlikely to need it when running queries via the API. To remove that data from the results, pass a download=true parameter with your request:

curl -X POST "https://discourse.example.com/admin/plugins/explorer/queries/27/run" \
-H "Content-Type: multipart/form-data;" \
-H "Api-Key: <your_all_users_api_key>" \
-H "Api-Username: system" \
-F 'params={"page":"0"}' \
-F "download=true"

Api authentication

For details about authenticating API requests, see the Authentication section of our API Documentation: https://docs.discourse.org/

27 Likes

Thanks @simonstarr for putting that how to together! It’s very clear, and I appreciate that you included a custom parameter in your example. In the response of a test query, I see "default_limit": 1000. One thing that I’m still not sure of is how to handle large sets with the api.

https://meta.discourse.org/t/data-explorer-plugin/32566/146?u=grayden_shand

So I have two questions:

  1. Can I easily increase the default cap for result sets with >1000 rows?
  2. In a really large set (eg. > 1000000 rows) can you suggest a pattern for iterating through the subsets? My first take would be to set two parameters :limit and :offset in the query, and iteratively paginate the set with each call. But I would love to hear if you have a better solution.

Thanks again!

5 Likes

Those are good questions. I’ve updated the topic to give details about setting the query’s limit and paginating the results.

8 Likes

@simon Is there any api endpoint I can use to get the list of reports and the ID numbers? I want to build a dropdown with the list in it.

Yes, you can make an authenticated GET request to /admin/plugins/explorer/queries.json to get a list of all queries on the site.

4 Likes

That worked perfect! Thanks a ton.

@simon, any idea if this changed in the past few days?

All of our clock processes that use the data explorer failed last night, and after debugging it looks like the endpoint is no longer handling “ALL” for the limit parameter. I’m simply returned an empty set.

The size of the expected result set doesn’t impact this, whether I’m running a query that expects 5 results or 5,000, it’s always empty if I set limit=“ALL”.

I can remove the limit parameter entirely, or set it to a value e.g. limit=1000 and the query returns a populated result set.

1 Like

I just tested this with the query that is in the “Handling queries that return over 1000 rows” section of the OP. When I set -F "limit=ALL" I get 10 000 results. When I don’t set the limit, I get the default 1000 results.

If you send me a PM I can have a look at the query that’s causing you trouble. There may be a useful error message in your site’s logs.

5 Likes

@simon I’ve noticed the same issue. limit=ALL no longer works for me as well. I fixed it by simply doing limit=1000000

4 Likes

@vinothkannans recently had to add circuit breaker limits to data explorer to ensure stuff like select * from posts to create a CSV of enormity will not blow memory up.

That is probably the reason limit=ALL hack is no longer supported.

4 Likes

Yes, I removed the limit=ALL support for CSV exports recently. But still, it should work for all JSON exports. Now I just pushed a fix regarding to it. Please check it and let me know if you still have the issue.

8 Likes

I’m still experiencing this issue.

Here’s an example.

The SQL I’m using for the query is:
SELECT user_id, count(*) FROM posts GROUP BY user_id ORDER BY count(*)

It returns 36 rows (one for each user) on our test-site.

From Postman:

POST to https://testlogin.akimbo.com/admin/plugins/explorer/queries/1/run

With no “limit” parameter

All results returned

    "success": true,
    "errors": [],
    "duration": 0.5,
    "result_count": 36,
    "params": {},
    "columns": [
        "user_id",
        "count"
    ],
    "default_limit": 1000,
    "relations": {
        "user": [
          ...
        ]
    },
    "colrender": {
        "0": "user"
    },
    "rows": [
        [
            108,
            1
        ],
       ...
   ]
}

With “limit”=1

Successfully returns just one row.

{
    "success": true,
    "errors": [],
    "duration": 0.7,
    "result_count": 1,
    "params": {},
    "columns": [
        "user_id",
        "count"
    ],
    "default_limit": 1000,
    "relations": {
        "user": [
            {
                "id": 108,
                "username": "alxpck",
                "name": null,
                "avatar_template": "https://avatars.discourse.org/v4/letter/a/7feea3/{size}.png"
            }
        ]
    },
    "colrender": {
        "0": "user"
    },
    "rows": [
        [
            108,
            1
        ]
    ]
}

With “limit” = “ALL”

Returns empty result set

{
    "success": true,
    "errors": [],
    "duration": 0.4,
    "result_count": 0,
    "params": {},
    "columns": [
        "user_id",
        "count"
    ],
    "default_limit": 1000,
    "relations": {
        "user": []
    },
    "colrender": {
        "0": "user"
    },
    "rows": []
}
1 Like

Today I tested it again in my local dev environment. It is working fine. Please make sure you are in the latest version of Discourse.

7 Likes

It looks like its been fixed since my last post. I just re-tested the same example and it now works as expected.

Thanks for the help!

5 Likes

Is it possible to send parameters with the post request?

So for example say I have a Data Explorer query that finds the latest 10 topics from a category, can I send the category ID with the POST request to /admin/plugins/explorer/<query-id>/run?

1 Like

Yes, the first code example in the OP should give you what you are looking for. It passes a viewed_at parameter, but a similar approach would work with a category_id parameter.

4 Likes

Ooo that’s awesome - thanks Simon!

(Also apologies for not looking that far down - I thought that might have been out of date after reading a reading a recent deprecation warning).

2 Likes

Anybody know why this might not be working?

I’m using the Discourse_api gem with the following:

id = 13
params = {"id":"#{id}"}
client.post("/admin/plugins/explorer/queries/86/run.json", params)

But I keep getting:

DiscourseApi::UnprocessableEntity ({"success"=>false, "errors"=>["DataExplorer::ValidationError: Missing parameter id of type int"]})

Running the same via CURL works no problem:

curl -X POST "https://forum.com/admin/plugins/explorer/queries/86/run" \
-H "Content-Type: multipart/form-data;" \
-H "Api-Key: 123345" \
-H "Api-Username: system" \
-F 'params={"id":"13"}'

…and running similar Data Explorer Queries that do not require params (or adding default params to this query) work fine too… so it’s the sending of params that’s the issue - but I can’t work out why. Any ideas?

1 Like

It’s subtle, but you need to pass in a hash with a key of params.

id = "8"
params = { params: '{"id":"' + id + '"}' }
puts client.post("/admin/plugins/explorer/queries/1/run.json", params)
4 Likes

Woohoo it works! Thanks Blake :blush:

2 Likes