Similar to the time to first response, I’m looking for a way to determine what the average time (over any time period) is between the last two responses in topics in a selected list of categories.
The use case for this is in determining if an auto-close setting is appropriate.
I’ve got the data explorer plugin installed, but I don’t know enough about the database schema (or, for that matter, remember much from my college database/SQL course in the early 90s) to create the report myself.
Would appreciate pointers and suggestions on how to pull this data.
ETA: It looks like the approach is to do a join between topics and posts (so I can filter by category), and to then find the posts with the two latest update timestamps and difference them.
Since the point of the exercise is not dependent on whether or not the latest is a reply to the second-latest, but to determine the max time differential between the two most recent replies (regardless of their specific relationship) in order to see if the topic was old enough to be closed, and then to calculate the average across all categories (or per category, which would probably be more useful),
Thinking a bit more about it, it seems this might just be a variation on whatever query is used in the rake topics:auto_close process (the difference being that that uses the current time rather than differencing the two most recent posts in the topic).
-- [params]
-- null date :start_date
-- null date :end_date
-- null int_list :category_ids
WITH RankedPosts AS (
SELECT
p.topic_id,
p.created_at,
ROW_NUMBER() OVER (PARTITION BY p.topic_id ORDER BY p.post_number DESC) AS rank
FROM
posts p
WHERE
p.created_at BETWEEN :start_date AND :end_date
AND EXISTS (
SELECT 1
FROM topics t
WHERE t.id = p.topic_id
AND t.category_id IN (:category_ids)
)
),
FilteredPosts AS (
SELECT
topic_id,
created_at,
rank
FROM
RankedPosts
WHERE
rank <= 2
),
PostDifferences AS (
SELECT
topic_id,
EXTRACT(days FROM (MAX(created_at) FILTER (WHERE rank = 1) - MAX(created_at) FILTER (WHERE rank = 2)))::numeric(9,2) AS days_difference
FROM
FilteredPosts
GROUP BY
topic_id
)
SELECT
t.category_id,
AVG(pd.days_difference) AS avg_days_difference,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY pd.days_difference) AS median_days_difference
FROM
topics t
JOIN
PostDifferences pd ON t.id = pd.topic_id
GROUP BY
t.category_id
ORDER BY
avg_days_difference DESC
This should give the mean and median of the times between the creation timestamps. The update timestamps seemed problematic (I got some negative values for some reason when just doing averages).
I had some AI assistance with creating the query, so it’s possible that there are mistakes - and I’m happy if anyone has suggestions about how to improve it (or correct any errors). The results that I got seemed reasonable based on checks made on smaller groups, as well as changes resulting from increasing the date range resulting in trends that aligned with expectations I had from using a larger data sample.