Have Data Explorer Show number of results?

Data explorer returns only a certain number (500?) of items in the web interface. Too see them all you need to download them and look at them some other way (i.e., download a csv and open it in a spreadsheet).

It would be nice if it were easy to know how many items had been returned. What I’ve done in the past is to generate a second query that just did the count, which I find rather painful.

Feature Request

When the query is completed, would it be easy to pull out the number of results and print it next to (or below) the query completion time?

image

13 Likes

Sure, we should do something here @rishabh

When you run a query it should show Query completed in 12.7 ms, 102 results

IF we are truncating due to our enforced limit eg try SELECT id, raw from posts on meta. we should say Query completed in 12.7 ms, partial results, top 500 returned … something like that.

6 Likes

That’s an improvement, but the specific problem I’m trying to solve is that if there are more than 500,the only way to find out how many there are is to write another query or download the data set to a CSV and count the rows (and subtract one).

I’m not sure what the solution there would be.

I think what you are wanting is a SELECT COUNT() query.

I think for many queries it could be a matter of simply replacing the main SELECT fields with COUNT. But I don’t know if it would get messy with queries that had aggregates and GROUP BYs.

IMHO, a type of “results were limited to 500” would be good enough and would indicate that a separate COUNT query might be needed.

I quite often put together COUNT queries when I work up a query so I won’t be taken by surprise by a long running query. eg. from this table with mega rows join another table with mega rows join yet another table with mega rows …

In fact, I would recommend running heavy queries on a copy of the live database to avoid having the forum hang while the query runs.

That’s what I want, but the only way that I know to do that is to create a whole separate query that does the SELECT COUNT. So I’d need to create a query for (e.g., “closed topics” and “closed topics count”).

Just so you know, this is quite standard in SQL databases. You generally don’t get a count unless you explicitly ask for it, because counting “all the results” can be more expensive than getting the first {x} records.

We’re not secretly withholding information that we regularly get back from the database, because we’re jerks :wink:

7 Likes

I think a button for “Wrap the query with count() and get the result” might be viable.

WITH query AS (
#{sql}
) SELECT COUNT(*) FROM query
-- do not include LIMIT clause inside `sql`
4 Likes

That’s brilliant! That might be a PR that I could pull off (though there is a fairly long list of things that I need to submit).

1 Like

Would this in effect be running the query twice? Once to get the row count without the built-in resource use safety measures, and then again with the limit.

EDIT
AFAIK, though the Data Explorer plugin has a built in LIMIT, it does not have an OFFSET.

I think if there was some way to return the 500 result rows in “batches” - maybe with a OFFSET or a “not included in prior results” condition? - it would be safer. TBH, I’m thinking the better way would be to simply show a type of “LIMIT met, if you want more do it from the CLI” message where the admin could do a SELECT into an outfile query.

1 Like

This looks promising

PostgreSQL: Documentation: 9.1: Value Expressions

the window function is able to scan all the rows that would be part of the current row’s group according to the grouping specification

EDIT
I just had a brief play on my localhost as a POC

SELECT users.username 
 , COUNT(users.username) OVER ()  
FROM users 
WHERE users.id > 4 
LIMIT 5

over-count

3 Likes

For a first phase, the result limit has been updated from 250 to 1000 and number of results is displayed via:

https://github.com/discourse/discourse-data-explorer/commit/316923f1906d811d8547cea99bde8e44d58dd15e

When the limit is reached, the UI indicates that only partial results are shown:


When the limit is not reached, the exact number of results is shown:

9 Likes

Thanks for that tip, OVER() is a pretty cool function!
It works awesome but hiding the count column with the same identical values for every row was somewhat complicated.

4 Likes

We are showing counts now, closing this :slight_smile:

5 Likes