DATE_TRUNC 用于数据聚合

date_trunc 函数是 SQL 中一个强大的工具。它允许您根据指定的日期部分截断 TIMESTAMP 或 INTERVAL 值,这使得它在需要基于特定时间段聚合或分组数据时成为一个不可或缺的功能。

语法

date_trunc 函数的语法如下:

date_trunc('date_part', field)
  • date_part:这是一个字符串,用于指定要截断的日期或时间戳部分。它可以是以下值之一:
    • millennium
    • century
    • decade
    • year
    • quarter
    • month
    • week
    • day
    • hour
    • minute
    • second
    • milliseconds
    • microseconds
  • field:这是要截断的时间戳或间隔。

在 DE 查询中的示例用法

让我们看几个利用 date_trunc 的示例查询:

按月统计新主题数量

复杂度等级:入门

此 SQL 查询用于统计 Discourse 数据库中每个月创建的主题数量。

SELECT 
    date_trunc('month', created_at)::DATE AS month,
    count(id)
FROM topics
GROUP BY month
ORDER BY month DESC

在此查询中,date_trunc('month', created_at)::DATEcreated_at 时间戳截断到月份,然后将其转换为日期格式进行显示,从而有效地按创建月份对主题进行分组。

count(id) 函数随后统计每个月创建的主题数量。结果按月份降序排列,因此最近的月份将排在第一位。

示例结果:

month count
2023-09-01 1
2023-08-01 6
2023-07-01 10
详细解释及行内注释
-- 选择主题创建的月份以及主题数量
SELECT 
    -- 将 'created_at' 时间戳截断到月份并转换为日期
    -- 这按创建月份对主题进行分组
    date_trunc('month', created_at)::DATE AS month,
    -- 统计每个月创建的主题数量
    count(id)
-- 从 'topics' 表中选取
FROM topics
-- 按月份对结果进行分组
GROUP BY month
-- 按月份降序排列结果
-- 这意味着最近的月份将排在第一位
ORDER BY month DESC

用户累计总数

复杂度等级:中级

此查询将提供 Discourse 论坛每周的用户注册报告,以及用户累计总数。它使用 WITH 子句创建一个临时结果集(daily_signups),然后从该结果集中进行选择。

-- [params]
-- date :start_date = 2000-01-01
-- date :end_date

WITH daily_signups AS(
SELECT
    to_char(date_trunc('week', u.created_at)::date,'YYYY-MM-DD') AS Date,
    Count (id) as Signups
FROM users u
WHERE
    u.created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY Date
)

SELECT
    Date, Signups, SUM(Signups) OVER (ORDER BY Date) as total_users
FROM daily_signups
ORDER BY Date Asc

以下是该查询运行方式的分解说明:

  • WITH 子句创建一个名为 daily_signups 的临时结果集。该结果集包含在参数 :start_date:end_date 指定的开始和结束日期之间,每周的用户注册数量。
  • daily_signups 结果集中,date_trunc('week', u.created_at)::datecreated_at 时间戳截断到周,然后转换为日期。这有效地按注册周对用户进行了分组。
  • Count(id) 随后统计每周注册的用户数量。
  • 在主 SELECT 语句中,SUM(Signups) OVER (ORDER BY Date) 计算用户的累计总数。OVER (ORDER BY Date) 子句指定总和应按日期排序的行进行计算,因此它给出了截至每个日期的注册累计总和。
  • 结果随后按日期升序排列。

示例结果:

date signups total_users
2013-01-28 20 20.0
2013-02-04 2136 2156.0
2013-02-11 442 2598.0
详细解释及行内注释
-- 定义开始和结束日期参数
-- [params]
-- date :start_date = 2000-01-01
-- date :end_date

-- 创建公共表表达式 (CTE) 以统计每周的用户注册数量
WITH daily_signups AS(
SELECT
    -- 将 'created_at' 时间戳截断到周并格式化为日期字符串
    to_char(date_trunc('week', u.created_at)::date,'YYYY-MM-DD') AS Date,
    -- 统计注册的用户数量
    Count (id) as Signups
FROM users u
WHERE
    -- 仅包含在开始和结束日期之间注册的用户
    u.created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY Date
)

-- 选择日期、注册数量和注册累计总数
SELECT
    Date, 
    Signups, 
    -- 计算注册累计总数
    SUM(Signups) OVER (ORDER BY Date) as total_users
FROM daily_signups
-- 按日期升序排列结果
ORDER BY Date Asc

每月已解决和未解决问题数量

复杂度等级:中级 / 需要 Discourse Solved 插件

此查询将提供 Discourse 论坛上每月已解决和未解决问题的报告。此查询假设站点上的所有主题都可以被标记为已解决。

-- [params]
-- date :start_date
-- date :end_date

WITH monthly_questions AS (
    SELECT
        date_trunc('month', created_at)::DATE AS month,
        COUNT(id) as total_questions
    FROM topics
    WHERE
        created_at::date BETWEEN :start_date::date AND :end_date::date
    GROUP BY month
),
solved_questions AS (
    SELECT
        date_trunc('month', created_at)::DATE AS month,
        COUNT(id) as solved
    FROM user_actions
    WHERE
        created_at::date BETWEEN :start_date::date AND :end_date::date
    AND action_type = 15
    GROUP BY month
)

SELECT
    mq.month, 
    mq.total_questions, 
    COALESCE(sq.solved, 0) as solved,
    mq.total_questions - COALESCE(sq.solved, 0) as unsolved
FROM monthly_questions mq
LEFT JOIN solved_questions sq ON mq.month = sq.month
ORDER BY mq.month ASC

在此查询中,monthly_questions CTE 统计每月创建的问题(主题)总数。solved_questions CTE 通过统计 user_actions 表中 action type = 15id 数量,来统计每月被标记为已解决的问题数量。

SELECT 语句随后通过从问题总数中减去已解决问题数量来计算未解决问题数量。结果按月份升序排列,因此最早的月份将排在第一位。

示例结果:

month total_questions solved unsolved
2023-07-01 10 3 7
2023-08-01 6 0 6
2023-09-01 1 1 0
详细解释及行内注释
-- [params]
-- date :start_date = 2000-01-01
-- date :end_date

-- 创建 CTE 以统计每月创建的问题(主题)总数
WITH monthly_questions AS (
    SELECT
        -- 将 'created_at' 时间戳截断到月份
        date_trunc('month', created_at)::DATE AS month,
        -- 统计每月创建的主题数量
        COUNT(id) as total_questions
    FROM topics
    WHERE
        -- 仅包含在开始和结束日期之间创建的主题
        created_at::date BETWEEN :start_date::date AND :end_date::date
    GROUP BY month
),
-- 创建 CTE 以统计每月被标记为已解决的问题数量
solved_questions AS (
    SELECT
        -- 将 'created_at' 时间戳截断到月份
        date_trunc('month', created_at)::DATE AS month,
        -- 统计每月已解决的问题数量
        COUNT(id) as solved
    FROM user_actions
    WHERE
        -- 仅包含在开始和结束日期之间执行的操作
        created_at::date BETWEEN :start_date::date AND :end_date::date
        -- 仅考虑操作类型为 15 的操作(表示问题已解决)
        AND action_type = 15
    GROUP BY month
)

-- 选择月份、问题总数、已解决问题数量和未解决问题数量
SELECT
    mq.month, 
    mq.total_questions, 
    -- 如果某个月没有已解决的问题,则显示 0
    COALESCE(sq.solved, 0) as solved,
    -- 从问题总数中减去已解决问题数量以获得未解决问题数量
    mq.total_questions - COALESCE(sq.solved, 0) as unsolved
FROM monthly_questions mq
-- 按月份连接 'monthly_questions' 和 'solved_questions' CTE
LEFT JOIN solved_questions sq ON mq.month = sq.month
-- 按月份升序排列结果
ORDER BY mq.month ASC

主题回复统计

复杂度等级:高级

这个复杂的 SQL 查询提供了 Discourse 论坛每周的主题活动报告。它将主题数据分解为几个关键指标:至少有一条回复的主题数量、没有回复的主题数量、主题未获回复的最长天数,以及首次回复的平均时间。

WITH posts_list AS (
    SELECT 
        t.id topic_id,
        p.post_number,
        p.created_at,
        ROW_NUMBER () OVER (
                    PARTITION BY p.topic_id 
                    ORDER BY p.post_number)  AS post_order
    FROM posts p
    INNER JOIN topics t ON t.id = p.topic_id AND archetype = 'regular' AND t.deleted_at ISNULL
    WHERE p.post_type = 1
        AND p.deleted_at ISNULL
        AND t.deleted_at ISNULL
        AND t.archetype = 'regular'
    ORDER BY p.topic_id, p.post_number
),
atleast_1_response AS (
    SELECT 
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics with atleast one response"
    FROM topics t
    WHERE t.archetype = 'regular'
        AND t.deleted_at ISNULL
        AND t.posts_count >= 2
    GROUP BY "week"
),
no_response AS(
    SELECT 
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics without response"
    FROM topics t
    WHERE t.archetype = 'regular'
        AND t.deleted_at ISNULL
        AND t.posts_count = 1
    GROUP BY "week"
),
max_days_without_response AS(
    SELECT 
        date_trunc('week', t.created_at::date)::date AS "week",
        CURRENT_DATE::date-MIN(created_at)::date AS "max days without response"
    FROM topics t
    WHERE t.archetype = 'regular'
        AND t.deleted_at ISNULL
        AND t.posts_count = 1
    GROUP BY "week"
),
avg_time_first_response AS (
    SELECT 
        date_trunc('week', pl.created_at::date)::date AS "week",
        EXTRACT(epoch FROM AVG(pl2.created_at-pl.created_at))/3600 AS "avg time first response (h)"
    FROM posts_list pl
    INNER JOIN posts_list pl2 ON pl.topic_id = pl2.topic_id AND pl2.post_order = 2
    WHERE  pl.post_order = 1
    GROUP BY "week" 
)

SELECT ar.week, 
    "topics without response", 
    "max days without response", 
    "topics with atleast one response",
    "avg time first response (h)"
FROM atleast_1_response ar
INNER JOIN no_response nr ON nr.week = ar.week
INNER JOIN max_days_without_response mdwr ON mdwr.week = ar.week
INNER JOIN avg_time_first_response atfr ON atfr.week = ar.week
ORDER BY "week" DESC

以下是该查询工作原理的总结:

  • posts_list 公共表表达式 (CTE) 从 poststopics 表中选择所有常规帖子的列表,按 topic_idpost_number 排序。它还为其主题内的每个帖子分配了一个行号(post_order)。
  • atleast_1_response CTE 统计每周至少有一条回复的常规主题数量(即 posts_count 大于或等于 2)。
  • no_response CTE 统计每周没有回复的常规主题数量(即 posts_count 等于 1)。
  • max_days_without_response CTE 计算每周未获回复的主题未被回答的最长天数。
  • avg_time_first_response CTE 计算每周每个主题首次回复的平均时间(以小时为单位)。
  • SELECT 语句随后按周连接这些 CTE 并选择相关列。结果按周降序排列。
week topics without response max days without response topics with atleast one response avg time first response (h)
2023-09-04 15 2 47 2.6778684519444444
2023-08-28 30 9 138 8.7899938238888889
2023-08-21 22 16 130 9.3280889688888889
详细解释及行内注释
-- 创建所有常规帖子的临时表 (CTE),按 topic_id 和 post_number 排序
WITH posts_list AS (
    SELECT 
        t.id topic_id,  -- 主题 ID
        p.post_number,  -- 帖子编号
        p.created_at,   -- 帖子创建日期
        -- 为每个主题内的帖子分配行号
        ROW_NUMBER () OVER (
                    PARTITION BY p.topic_id 
                    ORDER BY p.post_number)  AS post_order
    FROM posts p
    -- 与 topics 表连接,仅考虑未删除的常规主题
    INNER JOIN topics t ON t.id = p.topic_id AND archetype = 'regular' AND t.deleted_at ISNULL
    WHERE p.post_type = 1
        AND p.deleted_at ISNULL  -- 排除已删除的帖子
        AND t.deleted_at ISNULL  -- 排除已删除的主题
        AND t.archetype = 'regular'  -- 仅考虑常规主题
    ORDER BY p.topic_id, p.post_number
),
-- 创建 CTE 以统计每周至少有一条回复的常规主题数量
atleast_1_response AS (
    SELECT 
        -- 将 created_at 时间戳截断到周
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics with atleast one response"
    FROM topics t
    WHERE t.archetype = 'regular'  -- 仅考虑常规主题
        AND t.deleted_at ISNULL  -- 排除已删除的主题
        AND t.posts_count >= 2  -- 仅考虑至少有一条回复的主题
    GROUP BY "week"
),
-- 创建 CTE 以统计每周没有回复的常规主题数量
no_response AS(
    SELECT 
        -- 将 created_at 时间戳截断到周
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics without response"
    FROM topics t
    WHERE t.archetype = 'regular'  -- 仅考虑常规主题
        AND t.deleted_at ISNULL  -- 排除已删除的主题
        AND t.posts_count = 1  -- 仅考虑没有回复的主题
    GROUP BY "week"
),
-- 创建 CTE 以计算每周未获回复的主题未被回答的最长天数
max_days_without_response AS(
    SELECT 
        -- 将 created_at 时间戳截断到周
        date_trunc('week', t.created_at::date)::date AS "week",
        -- 计算从主题创建日期到当前日期的天数
        CURRENT_DATE::date-MIN(created_at)::date AS "max days without response"
    FROM topics t
    WHERE t.archetype = 'regular'  -- 仅考虑常规主题
        AND t.deleted_at ISNULL  -- 排除已删除的主题
        AND t.posts_count = 1  -- 仅考虑没有回复的主题
    GROUP BY "week"
),
-- 创建 CTE 以计算每周每个主题首次回复的平均时间(小时)
avg_time_first_response AS (
    SELECT 
        -- 将 created_at 时间戳截断到周
        date_trunc('week', pl.created_at::date)::date AS "week",
        -- 计算首次回复的平均时间(小时)
        EXTRACT(epoch FROM AVG(pl2.created_at-pl.created_at))/3600 AS "avg time first response (h)"
    FROM posts_list pl
    -- 与 posts_list CTE 连接,仅考虑每个主题中的第二个帖子
    INNER JOIN posts_list pl2 ON pl.topic_id = pl2.topic_id AND pl2.post_order = 2
    WHERE  pl.post_order = 1  -- 仅考虑每个主题中的第一个帖子
    GROUP BY "week" 
)

-- 选择周、没有回复的主题数量、未获回复的最长天数、至少有一条回复的主题数量以及首次回复的平均时间
SELECT ar.week, 
    "topics without response", 
    "max days without response", 
    "topics with atleast one response",
    "avg time first response (h)"
FROM atleast_1_response ar
-- 按周连接 CTE
INNER JOIN no_response nr ON nr.week = ar.week
INNER JOIN max_days_without_response mdwr ON mdwr.week = ar.week
INNER JOIN avg_time_first_response atfr ON atfr.week = ar.week
-- 按周降序排列
ORDER BY "week" DESC

这些只是您可以在数据探索器查询中使用 date_trunc 的几种方式示例。请随意在您的网站上使用任何这些查询,如果您有任何问题,请在下方提出。:slight_smile:

5 个赞