Data Explorer returns only a certain number (500?) of items in the web interface. To see them all, you need to download them and look at them some other way (e.g., download a CSV and open it in a spreadsheet).
It would be nice if it were easy to know how many items had been returned. What I’ve done in the past is write a second query that just does the count, which I find rather painful.
Feature Request
When the query is completed, would it be easy to pull out the number of results and print it next to (or below) the query completion time?
When you run a query, it should show Query completed in 12.7 ms, 102 results.
If we are truncating due to our enforced limit (e.g., try SELECT id, raw FROM posts on meta), we should say Query completed in 12.7 ms, partial results, top 500 returned … something like that.
That’s an improvement, but the specific problem I’m trying to solve is that if there are more than 500 results, the only way to find out how many there are is to write another query or download the data set to a CSV and count the rows (and subtract one for the header).
I think what you want is a SELECT COUNT(*) query.
I think for many queries it could be a matter of simply replacing the main SELECT fields with COUNT(*). But it might get messy with queries that have aggregates and GROUP BYs.
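To sketch what I mean (table and column names here are only illustrative): for a plain query you really can just swap the column list for COUNT(*), but for a query that already aggregates you’d have to wrap the whole thing and count the groups instead:

```sql
-- Original shape: one row per user
-- SELECT user_id, COUNT(*) AS post_count FROM posts GROUP BY user_id

-- To count how many rows that returns, wrap it and count the groups
SELECT COUNT(*)
FROM (
  SELECT user_id, COUNT(*) AS post_count
  FROM posts
  GROUP BY user_id
) AS grouped;
```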
IMHO, a “results were limited to 500” message would be good enough, and would indicate that a separate COUNT query might be needed.
I quite often put together COUNT queries when I work up a query, so I won’t be taken by surprise by a long-running one, e.g., from this table with mega rows, join another table with mega rows, join yet another table with mega rows …
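Something along these lines, just as a sketch (the join is made up for illustration):

```sql
-- Hypothetical pre-flight check: how many rows would the heavy join produce?
SELECT COUNT(*)
FROM posts p
JOIN topics t ON t.id = p.topic_id
JOIN users  u ON u.id = p.user_id;
```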
In fact, I would recommend running heavy queries on a copy of the live database to avoid having the forum hang while the query runs.
That’s what I want, but the only way I know to do that is to create a whole separate query that does the SELECT COUNT. So I’d need to create two queries (e.g., “closed topics” and “closed topics count”).
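For what it’s worth, the pair I end up with looks roughly like this (just a sketch, assuming the topics table’s closed flag):

```sql
-- "closed topics"
SELECT id, title
FROM topics
WHERE closed
ORDER BY id;

-- "closed topics count" -- the separate query I'd rather not have to maintain
SELECT COUNT(*)
FROM topics
WHERE closed;
```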
Just so you know, this is quite standard in SQL databases. You generally don’t get a count unless you explicitly ask for it, because counting “all the results” can be more expensive than getting the first {x} records.
We’re not secretly withholding information that we regularly get back from the database because we’re jerks.
Would this in effect be running the query twice? Once to get the row count without the built-in resource use safety measures, and then again with the limit.
EDIT
AFAIK, though the Data Explorer plugin has a built-in LIMIT, it does not have an OFFSET.
I think if there were some way to return the 500 result rows in “batches” (maybe with an OFFSET or a “not included in prior results” condition?) it would be safer. TBH, I’m thinking the better way would be to simply show a “LIMIT met, if you want more, do it from the CLI” message, where the admin could do a SELECT into an outfile query.
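If batching ever did become an option, I imagine it would look something like a user-supplied page parameter. This is only a sketch, and I’m going from memory on the Data Explorer parameter syntax:

```sql
-- [params]
-- int :page = 0

-- Hypothetical "batch" query: page 0 returns rows 1-500, page 1 rows 501-1000, etc.
SELECT id, raw
FROM posts
ORDER BY id
LIMIT 500 OFFSET 500 * :page
```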
Thanks for that tip, OVER() is a pretty cool function!
It works great, but hiding the count column, which has the same value on every row, was somewhat complicated.
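For anyone landing here later, the trick is roughly this (names are only illustrative): a window function attaches the overall count to every row of the normal result, so no second query is needed:

```sql
SELECT
  id,
  title,
  COUNT(*) OVER () AS total_rows  -- same total repeated on every row
FROM topics
WHERE closed
ORDER BY id;
```

The awkward part mentioned above is that total_rows then shows up as an extra, identical column in every row of the report.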