Looking to build a custom report - time between last two responses on a topic

Similar to the time to first response, I’m looking for a way to determine what the average time (over any time period) is between the last two responses in topics in a selected list of categories.

The use case for this is in determining if an auto-close setting is appropriate.

I’ve got the data explorer plugin installed, but I don’t know enough about the database schema (or, for that matter, remember much from my college database/SQL course in the early 90s) to create the report myself.

Would appreciate pointers and suggestions on how to pull this data.

ETA: It looks like the approach is to do a join between topics and posts (so I can filter by category), and to then find the posts with the two latest update timestamps and difference them.

Since the point of the exercise is not dependent on whether or not the latest is a reply to the second-latest, but to determine the max time differential between the two most recent replies (regardless of their specific relationship) in order to see if the topic was old enough to be closed, and then to calculate the average across all categories (or per category, which would probably be more useful),

Thinking a bit more about it, it seems this might just be a variation on whatever query is used in the rake topics:auto_close process (the difference being that that uses the current time rather than differencing the two most recent posts in the topic).

For anyone interested, here’s what seems to work:

-- [params]
-- null date :start_date
-- null date :end_date
-- null int_list :category_ids

WITH RankedPosts AS (
    SELECT 
        p.topic_id,
        p.created_at,
        ROW_NUMBER() OVER (PARTITION BY p.topic_id ORDER BY p.post_number DESC) AS rank
    FROM 
        posts p
    WHERE 
        p.created_at BETWEEN :start_date AND :end_date
        AND EXISTS (
            SELECT 1
            FROM topics t
            WHERE t.id = p.topic_id
            AND t.category_id IN (:category_ids)
        )
),
FilteredPosts AS (
    SELECT 
        topic_id,
        created_at,
        rank
    FROM 
        RankedPosts
    WHERE 
        rank <= 2
),
PostDifferences AS (
    SELECT 
        topic_id,
        EXTRACT(days FROM (MAX(created_at) FILTER (WHERE rank = 1) - MAX(created_at) FILTER (WHERE rank = 2)))::numeric(9,2) AS days_difference
    FROM 
        FilteredPosts
    GROUP BY 
        topic_id
)
SELECT 
    t.category_id,
    AVG(pd.days_difference) AS avg_days_difference,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY pd.days_difference) AS median_days_difference
FROM 
    topics t
JOIN 
    PostDifferences pd ON t.id = pd.topic_id
GROUP BY 
    t.category_id
ORDER BY
    avg_days_difference DESC

This should give the mean and median of the times between the creation timestamps. The update timestamps seemed problematic (I got some negative values for some reason when just doing averages).

I had some AI assistance with creating the query, so it’s possible that there are mistakes - and I’m happy if anyone has suggestions about how to improve it (or correct any errors). The results that I got seemed reasonable based on checks made on smaller groups, as well as changes resulting from increasing the date range resulting in trends that aligned with expectations I had from using a larger data sample.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.