Aggregate dashboard report data by time interval

I recently wrote some Data Explorer queries that return similar data to what is found on the Discourse dashboard reports, but allow the data to be aggregated by a time period. For example, show the number of topics created between given start and end dates, but have the totals summed for weekly instead of daily periods.

The parameters for the queries are set with the following rules:

Query parameters: query_interval (a Postgres interval, for example, ‘1 day’, ‘7 days’, ‘1 week’, ‘1 month’), start_date (‘yyyy-mm-dd’), end_date (‘yyyy-mm-dd’), category_ids (a comma separated list of category ids, defaults to -1), include_subcategories (boolean, defaults to true.) Returns the number of posts created between the supplied start and end dates. Results are grouped by the query interval. If the category_ids list contains the value -1, results will be returned for all categories.

Average time to first response for interval

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
),

topics_and_replies AS (
    SELECT
    t.created_at AS topic_created_at,
    p.topic_id AS reply_topic_id,
    p.created_at AS reply_created_at,
    period_start
    FROM topics t
    JOIN query_periods
    ON t.created_at::date >= period_start AND t.created_at::date < period_start + interval :query_interval
    JOIN posts p
    ON p.topic_id = t.id
    WHERE t.posts_count > 1
    AND t.archetype = 'regular'
    AND t.deleted_at IS NULL
    AND CASE
        WHEN -1 IN (:category_ids)
            THEN true
        WHEN :include_subcategories = false
            THEN t.category_id IN (:category_ids)
        ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
    END
    AND p.post_number > 1
    AND p.post_type = 1
    AND p.deleted_at IS NULL
)

SELECT period_start, ROUND(AVG(reply_time_hours)::numeric, 2) AS response_time_hours FROM(
    SELECT
    qp.period_start,
    EXTRACT(EPOCH FROM MIN(reply_created_at) - topic_created_at):: float / 3600 AS reply_time_hours
    FROM query_periods qp
    JOIN topics_and_replies tar
    ON tar.period_start = qp.period_start
    GROUP BY reply_topic_id, topic_created_at, qp.period_start
) replies_for_period
GROUP BY period_start
ORDER BY period_start

Total solutions for interval

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT
period_start,
COUNT(1) AS solved_count
FROM user_actions ua
JOIN query_periods
ON ua.created_at::date >= period_start AND ua.created_at::date < period_start + interval :query_interval
JOIN topics t
ON t.id = ua.target_topic_id
JOIN posts p 
ON p.id = ua.target_post_id
WHERE ua.action_type = 15
AND t.deleted_at IS NULL
AND p.deleted_at IS NULL
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
GROUP BY period_start
ORDER BY period_start

Topic counts for interval

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT qp.period_start,
COUNT(t.id)
FROM query_periods qp
JOIN topics t
ON t.created_at::date >= qp.period_start AND t.created_at::date < qp.period_start + interval :query_interval
WHERE t.deleted_at IS NULL
AND t.archetype = 'regular'
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
GROUP BY qp.period_start
ORDER BY qp.period_start

Posts count for interval

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT
period_start,
COUNT(p.id)
FROM query_periods qp
JOIN posts p
ON p.created_at::date >= qp.period_start AND p.created_at::date < qp.period_start + interval :query_interval
JOIN topics t
ON t.id = p.topic_id
WHERE t.archetype = 'regular'
AND t.deleted_at IS NULL
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
AND p.deleted_at IS NULL
AND p.post_type = 1
GROUP BY period_start
ORDER BY period_start
7 Likes

Thanks @simon - these are excellent!

I was initially confused by having the start_date and end_date parameters still required when selecting an interval, and vice versa - now I see that it returns results per X interval, within the date range specified. That’s really handy for quickly looking at monthly changes across a year, or similar scenarios.

Category and subcategory inclusion are great - I track activity in different parts of my community, so it’s very handy to be able to quickly look at how an entire category + subcategories are performing.

Is there a simple way to modify these queries to display the subcategory results in a comma-separated list?

e.g. Posts made during the interval in category 1 (10 posts), 2 (20 posts) & 3 (30 posts).

Adding the category_ids 1,2,3 to the query returns a total (60 posts). I’d love to have a method for returning 10,20,30. That would allow for side-by-side comparisons between categories.

2 Likes

That could be possible. An easier approach is to modify the queries to return a row for each category. That can be done by changing the final GROUP BY clause to include the category ID. I haven’t tried this with all the examples that I posted, but here’s a modification of the “Post counts for interval” query that does that:

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT
period_start,
t.category_id,
COUNT(p.id)
FROM query_periods qp
JOIN posts p
ON p.created_at::date >= qp.period_start AND p.created_at::date < qp.period_start + interval :query_interval
JOIN topics t
ON t.id = p.topic_id
WHERE t.archetype = 'regular'
AND t.deleted_at IS NULL
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
AND p.deleted_at IS NULL
AND p.post_type = 1
GROUP BY period_start, t.category_id
ORDER BY period_start

Here’s what the results look like on my development site:

3 Likes

Fantastic - thanks again! I think that should do what I need it to :slight_smile:

2 Likes