Hey all - I’m trying to wrap my head around how search results are generated and am running into some weird inconsistencies. I’d appreciate it if folks could point me in the right direction!
What I’m trying to do
I’m generating a monthly “this week in our project” report about activity in the project. It pulls from many sources, such as GitHub and Discourse. I’d like to do the following:
- Get a list of all new Discourse posts within a specific range of time (e.g. 2019-05-01 -> 2019-05-31)
- Get a list of all Discourse topics that were commented on during this range of time
- Provide a short list of “most active topics” as well as a list of “most active new topics” with links to our Discourse
- (ideally, but can’t figure out how) Provide a list of new and active users on the Discourse site
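Once the posts for the month are in hand, the ranking itself is just counting. Here is a minimal sketch, assuming each post is a dict with `topic_id`, `post_number`, `username`, and an ISO 8601 `created_at` string (roughly the shape of Discourse's post JSON, but treat those field names as assumptions to check). The rule that a topic is "new" when its first post falls inside the window is also a hypothetical choice, not something Discourse defines:

```python
from collections import Counter
from datetime import datetime, timezone


def parse_ts(value):
    # Timestamps like "2019-05-02T14:03:11.123Z"; fromisoformat() before
    # Python 3.11 rejects the trailing "Z", so replace it with an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(posts, start, end, top_n=5):
    """Rank topics and users by post count for posts created in [start, end)."""
    in_window = [p for p in posts if start <= parse_ts(p["created_at"]) < end]
    topic_counts = Counter(p["topic_id"] for p in in_window)
    # Hypothetical rule: a topic is "new" if its first post was created in the window.
    new_topics = {p["topic_id"] for p in in_window if p["post_number"] == 1}
    new_topic_counts = Counter({t: c for t, c in topic_counts.items() if t in new_topics})
    user_counts = Counter(p["username"] for p in in_window)
    return {
        "most_active_topics": topic_counts.most_common(top_n),
        "most_active_new_topics": new_topic_counts.most_common(top_n),
        "most_active_users": user_counts.most_common(top_n),
    }


start = datetime(2019, 5, 1, tzinfo=timezone.utc)
end = datetime(2019, 6, 1, tzinfo=timezone.utc)
sample = [  # made-up posts, for illustration only
    {"topic_id": 10, "post_number": 1, "username": "alice", "created_at": "2019-05-02T10:00:00Z"},
    {"topic_id": 10, "post_number": 2, "username": "bob", "created_at": "2019-05-03T09:30:00.000Z"},
    {"topic_id": 7, "post_number": 14, "username": "bob", "created_at": "2019-05-20T12:00:00Z"},
    {"topic_id": 7, "post_number": 13, "username": "carol", "created_at": "2019-04-28T08:00:00Z"},
]
print(summarize(sample, start, end))
```

Using a half-open window `[start, end)` means consecutive months never share a boundary instant, so a post can't be counted in two reports.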
Where I’m getting confused
I’m generating this report programmatically (with Python), so I’ve been looking into the Discourse API. It seems like the `/search` endpoint is the only way to request data within a date range (as opposed to `posts.json`, etc.). However, the results this endpoint returns seem to be off.
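A minimal sketch of issuing a date-only search from Python with just the standard library. It assumes the site serves search results as JSON at `/search.json`, takes the query in a `q` parameter, and accepts a `page` parameter for later pages. `forum.example.org` is a placeholder, and the `posts`/`topics` response keys mentioned below are assumptions worth checking against a real response:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://forum.example.org"  # placeholder: replace with your Discourse site


def search_url(after, before, page=1, base_url=BASE_URL):
    """Build a URL for a date-only query such as 'after:2019-05-01 before:2019-05-03'."""
    query = f"after:{after} before:{before}"
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    return f"{base_url}/search.json?" + urlencode({"q": query, "page": page})


def fetch_search(after, before, page=1):
    """Fetch one page of search results and return the decoded JSON (makes a network call)."""
    req = Request(search_url(after, before, page), headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `data = fetch_search("2019-05-01", "2019-05-03")`, followed by inspecting something like `len(data.get("posts", []))`. Comparing the count against the same query on page 2 is a quick way to check whether results are being paged.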
Here are a few example results. In each case I’m searching only by date range, with no keywords:
- If I search for `after:2019-05-01 before:2019-05-03`, it returns 15 results.
- If I search for `after:2019-05-03 before:2019-05-05`, it returns 11 results.
- If I search for `after:2019-05-01 before:2019-05-05`, it returns 20 results.
This confuses me, because I would expect the third search (which spans the full date range of the first and second searches) to return 15 + 11 = 26 results, not 20.
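One way to see where the missing results go is to compare what each query actually returns, not just the counts. A minimal sketch, assuming an identifier for each result (for example a post ID from the search response) has already been collected into one list per query. The lists in the example call are made-up placeholders, not real results:

```python
def compare_windows(first, second, combined):
    """Compare result IDs from two adjacent date windows against one spanning search."""
    first, second, combined = set(first), set(second), set(combined)
    union = first | second
    return {
        # IDs returned by both narrow searches; a plain sum of counts double-counts these.
        "in_both_windows": sorted(first & second),
        # IDs found by a narrow search but absent from the wide one.
        "missing_from_combined": sorted(union - combined),
        # IDs that only the wide search returned.
        "only_in_combined": sorted(combined - union),
    }


# Made-up IDs purely to show the output shape.
report = compare_windows([1, 2, 3, 4], [4, 5, 6], [1, 2, 4, 5, 6])
print(report)
```

A non-empty `in_both_windows` would mean the 15 + 11 arithmetic counts some results twice, while a non-empty `missing_from_combined` would mean the wider search leaves some results out.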
Could somebody explain this behavior, or point me to a resource that covers how search works in more depth?
More generally, I’m also open to “Chris, you’re going about this the wrong way; there’s a better way to get this information” responses.*
*With the exception of “pay for the Data Explorer plugin”: I’m coming from an open source community without a ton of resources, and while I’m working on getting buy-in from folks to pay for a version of Discourse, we’re not there yet.
(Thanks for reading this slightly long post!)