按时间间隔聚合仪表板报告数据

我最近编写了一些数据探索器查询,它们返回的数据与 Discourse 仪表板报告中的数据相似,但允许按时间段对数据进行聚合。例如,显示在给定开始和结束日期之间创建的主题数量,但将总计按周而非按天汇总。

查询参数的设置遵循以下规则:

查询参数:query_interval(Postgres 区间,例如 ‘1 day’、‘7 days’、‘1 week’、‘1 month’)、start_date(‘yyyy-mm-dd’)、end_date(‘yyyy-mm-dd’)、category_ids(逗号分隔的分类 ID 列表,默认为 -1)、include_subcategories(布尔值,默认为 true)。返回在提供的开始和结束日期之间创建的帖子数量。结果按查询区间分组。如果 category_ids 列表包含值 -1,则将返回所有分类的结果。

区间内首次回复的平均时间

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
),

topics_and_replies AS (
    SELECT
    t.created_at AS topic_created_at,
    p.topic_id AS reply_topic_id,
    p.created_at AS reply_created_at,
    period_start
    FROM topics t
    JOIN query_periods
    ON t.created_at::date >= period_start AND t.created_at::date < period_start + interval :query_interval
    JOIN posts p
    ON p.topic_id = t.id
    WHERE t.posts_count > 1
    AND t.archetype = 'regular'
    AND t.deleted_at IS NULL
    AND CASE
        WHEN -1 IN (:category_ids)
            THEN true
        WHEN :include_subcategories = false
            THEN t.category_id IN (:category_ids)
        ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
    END
    AND p.post_number > 1
    AND p.post_type = 1
    AND p.deleted_at IS NULL
)

SELECT period_start, ROUND(AVG(reply_time_hours)::numeric, 2) AS response_time_hours FROM(
    SELECT
    qp.period_start,
    EXTRACT(EPOCH FROM MIN(reply_created_at) - topic_created_at):: float / 3600 AS reply_time_hours
    FROM query_periods qp
    JOIN topics_and_replies tar
    ON tar.period_start = qp.period_start
    GROUP BY reply_topic_id, topic_created_at, qp.period_start
) replies_for_period
GROUP BY period_start
ORDER BY period_start

区间内解决方案总数

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT
period_start,
COUNT(1) AS solved_count
FROM user_actions ua
JOIN query_periods
ON ua.created_at::date >= period_start AND ua.created_at::date < period_start + interval :query_interval
JOIN topics t
ON t.id = ua.target_topic_id
JOIN posts p 
ON p.id = ua.target_post_id
WHERE ua.action_type = 15
AND t.deleted_at IS NULL
AND p.deleted_at IS NULL
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
GROUP BY period_start
ORDER BY period_start

区间内主题数量

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT qp.period_start,
COUNT(t.id)
FROM query_periods qp
JOIN topics t
ON t.created_at::date >= qp.period_start AND t.created_at::date < qp.period_start + interval :query_interval
WHERE t.deleted_at IS NULL
AND t.archetype = 'regular'
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
GROUP BY qp.period_start
ORDER BY qp.period_start

区间内帖子数量

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT
period_start,
COUNT(p.id)
FROM query_periods qp
JOIN posts p
ON p.created_at::date >= qp.period_start AND p.created_at::date < qp.period_start + interval :query_interval
JOIN topics t
ON t.id = p.topic_id
WHERE t.archetype = 'regular'
AND t.deleted_at IS NULL
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
AND p.deleted_at IS NULL
AND p.post_type = 1
GROUP BY period_start
ORDER BY period_start
7 个赞

谢谢 @simon,这些功能太棒了!

起初我有些困惑:选择时间间隔时为何仍需要 start_dateend_date 参数,反之亦然。现在我明白了,它是在指定的日期范围内,按每 X 个时间间隔返回结果。这对于快速查看一年内的月度变化或类似场景非常实用。

类别和子类别的包含功能也非常出色——我会追踪社区不同区域的活动,因此能够快速查看整个类别及其子类别的表现,这非常有用。

有没有简单的方法可以修改这些查询,以逗号分隔的列表形式显示子类别的结果?

例如:在时间间隔内,类别 1 发布了 10 篇帖子,类别 2 发布了 20 篇,类别 3 发布了 30 篇。

在查询中添加 category_ids1,2,3 会返回总数(60 篇帖子)。我希望有一种方法能返回 10,20,30。这样就可以在类别之间进行并排比较了。

2 个赞

这是可行的。更简单的方法是修改查询,使其为每个类别返回一行。这可以通过在最后的 GROUP BY 子句中包含类别 ID 来实现。我尚未用所有示例测试过此方法,但以下是“间隔内帖子数量”查询的修改版本,实现了该功能:

--[params]
-- string :query_interval = 7 days
-- date :start_date
-- date :end_date
-- int_list :category_ids = -1
-- boolean :include_subcategories = true

WITH query_periods AS (
  SELECT generate_series(:start_date, :end_date, :query_interval::interval)::date as period_start
),
subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (:category_ids)
),
sub_subcategory_ids AS (
    SELECT id FROM categories
    WHERE parent_category_id IN (SELECT id FROM subcategory_ids)
)

SELECT
period_start,
t.category_id,
COUNT(p.id)
FROM query_periods qp
JOIN posts p
ON p.created_at::date >= qp.period_start AND p.created_at::date < qp.period_start + interval :query_interval
JOIN topics t
ON t.id = p.topic_id
WHERE t.archetype = 'regular'
AND t.deleted_at IS NULL
AND CASE
    WHEN -1 IN (:category_ids)
        THEN true
    WHEN :include_subcategories = false
        THEN t.category_id IN (:category_ids)
    ELSE t.category_id IN (:category_ids) OR t.category_id IN (SELECT id FROM subcategory_ids) OR t.category_id IN (SELECT id FROM sub_subcategory_ids)
END
AND p.deleted_at IS NULL
AND p.post_type = 1
GROUP BY period_start, t.category_id
ORDER BY period_start

以下是我开发站点上的结果示例:

3 个赞

太好了——再次感谢!我想这应该能满足我的需求了 :slight_smile:

2 个赞

这太棒了@simon,谢谢。
请原谅我简单的问题,但是否可以:

  1. 在仪表板、报告部分包含自定义编写的报告,以及如何操作?
  2. 根据在 DataExplorer 中运行的查询结果触发某些操作 - 例如向管理员发送消息?

谢谢

1 个赞