在本教程中,我们将探讨在 Data Explorer SQL 查询中使用 COALESCE 函数。
COALESCE 允许您在查询结果中处理 NULL 值。如果您的数据中存在 NULL 值,可以使用 COALESCE 为这些 NULL 值提供默认值(例如 0)。
当您可能对查询结果进行后续计算或数据分析时,COALESCE 特别有用,因为 NULL 值可能会导致问题或误解。
语法
COALESCE 函数接受两个或更多参数,并返回从左到右遇到的第一个非 NULL 值。如果所有参数均为 NULL,则 COALESCE 返回 NULL。
COALESCE 的基本语法如下:
COALESCE(value1, value2, ..., valueN)
例如,COALESCE(NULL, 1, 2) 将返回 1,因为 1 是第一个非 NULL 参数。
示例查询
让我们查看几个示例查询,以了解 COALESCE 在 Data Explorer 查询中的用法。
帖子创建数、收到的点赞数和收到的书签数
复杂度级别:入门
此查询将获取每个用户在网站上创建的帖子总数、收到的点赞数和收到的书签数。如果用户没有帖子、点赞或书签,COALESCE 函数将返回 0 而不是 NULL。
SELECT
users.id AS user_id,
users.username,
COALESCE(COUNT(posts.id), 0) AS post_count,
COALESCE(SUM(posts.like_count), 0) AS likes_received,
COALESCE(SUM(posts.bookmark_count), 0) AS bookmarks_received
FROM
users
LEFT JOIN
posts ON users.id = posts.user_id
GROUP BY
users.id, users.username
ORDER BY
post_count DESC, likes_received DESC, bookmarks_received DESC
示例结果:
| user | username | post_count | likes_received | bookmarks_received |
|---|---|---|---|---|
| 1 | alice | 345 | 6 | 9 |
| 2 | bella | 278 | 5 | 6 |
| 3 | charlie | 37 | 3 | 3 |
| 4 | dave | 0 | 0 | 0 |
在此查询中,我们将 users 表与 posts 表通过 user_id 字段进行连接。然后使用 COALESCE 函数确保如果用户没有帖子、收到的点赞或收到的书签,则返回 0 而不是 NULL。结果按用户 ID 和用户名分组,并按帖子数、点赞数和书签数降序排列。
每个用户的主题和回复数
复杂度级别:中级
此查询获取每个用户在两个日期之间创建的主题数和回复数。如果用户没有主题或回复,COALESCE 将返回 0 而不是 NULL。
-- [params]
-- date :start_date
-- date :end_date
-- string NULL:username
WITH qtt_topics AS (
SELECT
t.user_id,
COUNT(*) AS topics
FROM topics t
WHERE
t.user_id > 0
AND t.deleted_at ISNULL
AND t.archetype = 'regular'
AND t.created_at::date BETWEEN :start_date AND :end_date
GROUP BY t.user_id
),
qtt_replies AS (
SELECT
p.user_id,
COUNT(*) AS replies
FROM posts p
WHERE
p.user_id > 0
AND p.deleted_at ISNULL
AND p.post_number != 1
AND p.created_at::date BETWEEN :start_date AND :end_date
GROUP BY p.user_id
),
total AS (
SELECT
COALESCE(qr.user_id, qt.user_id) user_id,
COALESCE(topics,0) qtt_topics,
COALESCE(replies,0) qtt_replies
FROM qtt_topics qt
FULL JOIN qtt_replies qr ON qt.user_id = qr.user_id
ORDER BY user_id)
SELECT
username,
qtt_topics,
qtt_replies
FROM total
INNER JOIN users u ON u.id = user_id
WHERE :username ISNULL OR LOWER(username) LIKE '%'||LOWER(:username)||'%'
示例结果:
| username | qtt_topics | qtt_replies |
|---|---|---|
| Alice | 10 | 50 |
| Bella | 15 | 45 |
| Charlie | 12 | 30 |
在此查询中,COALESCE 用于 total 公共表表达式(CTE)。它确保如果 user_id 在 qtt_topics 或 qtt_replies 中为 NULL,则使用另一个值。这很重要,因为使用了 FULL JOIN 来组合 qtt_topics 和 qtt_replies,如果用户只有主题而没有回复(或反之),其 user_id 在其中一张表中将为 NULL。COALESCE 可防止这种情况发生。
详细解释与内联注释
-- [params]
-- date :start_date
-- date :end_date
-- string NULL:username
-- 定义 CTE(公共表表达式)以统计每个用户的主题数
WITH qtt_topics AS (
SELECT
t.user_id, -- 用户 ID
COUNT(*) AS topics -- 主题数量
FROM topics t -- 来自 topics 表
WHERE
t.user_id > 0 -- 仅考虑非零用户 ID
AND t.deleted_at ISNULL -- 仅考虑未删除的主题
AND t.archetype = 'regular' -- 仅考虑常规主题
AND t.created_at::date BETWEEN :start_date AND :end_date -- 仅考虑在 start_date 和 end_date 之间创建的主题
GROUP BY t.user_id -- 按用户 ID 分组以获取每个用户的主题数量
),
-- 定义 CTE 以统计每个用户的回复数
qtt_replies AS (
SELECT
p.user_id, -- 用户 ID
COUNT(*) AS replies -- 回复数量
FROM posts p -- 来自 posts 表
WHERE
p.user_id > 0 -- 仅考虑非零用户 ID
AND p.deleted_at ISNULL -- 仅考虑未删除的帖子
AND p.post_number != 1 -- 仅考虑不是主题中第一个帖子的帖子(即回复)
AND p.created_at::date BETWEEN :start_date AND :end_date -- 仅考虑在 start_date 和 end_date 之间创建的帖子
GROUP BY p.user_id -- 按用户 ID 分组以获取每个用户的回复数量
),
-- 定义 CTE 以组合每个用户的主题和回复数量
total AS (
SELECT
COALESCE(qr.user_id, qt.user_id) user_id, -- 用户 ID(来自 qtt_replies 或 qtt_topics)
COALESCE(topics,0) qtt_topics, -- 主题数量(如果为 NULL,则返回 0)
COALESCE(replies,0) qtt_replies -- 回复数量(如果为 NULL,则返回 0)
FROM qtt_topics qt -- 来自 qtt_topics CTE
FULL JOIN qtt_replies qr ON qt.user_id = qr.user_id -- 通过用户 ID 与 qtt_replies CTE 连接
ORDER BY user_id -- 按用户 ID 排序
)
-- 主查询以获取最终结果集
SELECT
username, -- 用户名
qtt_topics, -- 主题数量
qtt_replies -- 回复数量
FROM total -- 来自 total CTE
INNER JOIN users u ON u.id = user_id -- 通过用户 ID 与 users 表连接
WHERE :username ISNULL OR LOWER(username) LIKE '%'||LOWER(:username)||'%' -- 按用户名过滤(如果提供)
已解决问题数据
复杂度级别:高级 / 需要 Discourse Solved 插件
此查询用于获取有关主题的详细信息,包括是否已解决、首次回复所需时间、解决方案所需时间以及其他相关统计信息。
此查询假设网站上的所有主题都可以解决。
--[params]
-- date :start_date = 2022-01-01
-- date :end_date = 2024-01-01
-- text :tag_name = all
WITH valid_topics AS (
SELECT
t.id,
t.user_id,
t.title,
t.views,
posts_count-1 AS "posts_count",
t.created_at,
(CURRENT_DATE::date-t.created_at::date) AS "total_days",
string_agg(tags.name, ', ') AS tag_names
FROM topics t
LEFT JOIN topic_tags tt ON tt.topic_id = t.id
LEFT JOIN tags ON tags.id = tt.tag_id
LEFT JOIN tag_group_memberships tgm ON tgm.tag_id = tt.tag_id
LEFT JOIN (SELECT topic_id, MIN(created_at) AS "created_at"
FROM posts
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) date_first_reply ON date_first_reply.topic_id = t.id
WHERE t.deleted_at ISNULL
AND t.created_at::date BETWEEN :start_date AND :end_date
AND t.archetype = 'regular'
GROUP BY t.id, t.user_id, t.title, t.views, posts_count, t.created_at
),
solved_topics AS (
SELECT
vt.id,
tcf.created_at
FROM topic_custom_fields tcf
INNER JOIN valid_topics vt ON vt.id = tcf.topic_id
WHERE tcf.name = 'accepted_answer_post_id'
),
last_reply AS (
SELECT p.topic_id, p.user_id FROM posts p
INNER JOIN (SELECT topic_id, MAX(id) AS post FROM posts p
WHERE deleted_at ISNULL
AND post_type = 1
GROUP BY topic_id) x ON x.post = p.id
),
first_reply AS (
SELECT p.topic_id, p.user_id, p.created_at FROM posts p
INNER JOIN (SELECT topic_id, MIN(id) AS post FROM posts p
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) x ON x.post = p.id
)
SELECT
CASE
WHEN st.id IS NOT NULL THEN 'solved'
ELSE 'unsolved'
END AS status,
vt.tag_names,
vt.id AS topic_id,
vt.user_id topic_user_id,
ue.email,
vt.title,
vt.views,
lr.user_id AS last_reply_user_id,
ue2.email AS last_reply_user_email,
vt.created_at::date topic_create,
COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create,
COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create,
COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)",
COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)",
COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)",
COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)",
posts_count AS number_of_replies,
total_days AS total_days_without_solution
FROM valid_topics vt
LEFT JOIN last_reply lr ON lr.topic_id = vt.id
LEFT JOIN first_reply fr ON fr.topic_id = vt.id
LEFT JOIN solved_topics st ON st.id = vt.id
INNER JOIN user_emails ue ON vt.user_id = ue.user_id AND ue."primary" = true
LEFT JOIN user_emails ue2 ON lr.user_id = ue2.user_id AND ue2."primary" = true
WHERE (:tag_name = 'all' OR LOWER(vt.tag_names) LIKE '%' || LOWER(:tag_name) || '%')
ORDER BY tag_names, total_days DESC
示例结果:
| status | tag_names | topic | topic_user | title | views | last_reply_user | last_reply_user_email | topic_create | first_reply_create | solution_create | time_first_reply(days) | time_first_reply(hours) | time_solution(days) | time_solution(hours) | number_of_replies | total_days_without_solution | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| solved | a, c, b | A Topic Title (7) | alice | alice@example.com | A Topic Title | 58 | bella | bella@example.com | 2023-08-25 | 2023-08-25 | 2023-08-29 | 0 | 1 | 1 | 24 | 9 | 4 |
| unsolved | tag1 | Welcome to the Lounge (3) | system | no_email | Welcome to the Lounge | 3 | system | no_email | 2023-05-01 | 0 | 0 | 0 | 0 | 2 | 134 |
在此查询中,COALESCE 用于以下行:
COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create:此行将首次回复的created_at日期转换为字符串。如果首次回复不存在(即fr.created_at为 null),则返回空字符串(‘’)。COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create:与上述类似,此行将解决方案的created_at日期转换为字符串。如果解决方案不存在(即st.created_at为 null),则返回空字符串(‘’)。COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)":此行计算主题创建与首次回复之间的时间差(以天为单位)。如果首次回复不存在(即fr.created_at为 null),则返回 0。COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)":此行计算主题创建与首次回复之间的时间差(以小时为单位)。如果首次回复不存在(即fr.created_at为 null),则返回 0。COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)":此行计算主题创建与解决方案之间的时间差(以天为单位)。如果解决方案不存在(即st.created_at为 null),则返回 0。COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)":此行计算主题创建与解决方案之间的时间差(以小时为单位)。如果解决方案不存在(即st.created_at为 null),则返回 0。
在所有这些情况下,COALESCE 用于防止 NULL 值出现在最终结果中,从而提高查询结果的可读性,并可能对后续数据处理或分析有所帮助。
详细解释与内联注释
--[params]
-- date :start_date = 2022-01-01
-- date :end_date = 2024-01-01
-- text :tag_name = all
-- 定义有效主题的 CTE
WITH valid_topics AS (
-- 选择必要字段
SELECT
t.id, -- 主题 ID
t.user_id, -- 用户 ID
t.title, -- 主题标题
t.views, -- 浏览量
posts_count-1 AS "posts_count", -- 主题中的帖子数量
t.created_at, -- 主题创建日期
(CURRENT_DATE::date-t.created_at::date) AS "total_days", -- 主题创建以来的总天数
string_agg(tags.name, ', ') AS tag_names -- 聚合与主题关联的所有标签
FROM topics t -- 来自 topics 表
-- 连接必要表以获取标签名称
LEFT JOIN topic_tags tt ON tt.topic_id = t.id
LEFT JOIN tags ON tags.id = tt.tag_id
LEFT JOIN tag_group_memberships tgm ON tgm.tag_id = tt.tag_id
-- 子查询以获取每个主题的首次回复日期
LEFT JOIN (SELECT topic_id, MIN(created_at) AS "created_at"
FROM posts
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) date_first_reply ON date_first_reply.topic_id = t.id
WHERE t.deleted_at ISNULL -- 仅考虑未删除的主题
AND t.created_at::date BETWEEN :start_date AND :end_date -- 仅考虑在 start_date 和 end_date 之间创建的主题
AND t.archetype = 'regular' -- 仅考虑常规主题
GROUP BY t.id, t.user_id, t.title, t.views, posts_count, t.created_at -- 按必要字段分组以获取正确的计数
),
-- 定义已解决主题的 CTE
solved_topics AS (
-- 选择主题 ID 和解决方案的创建日期
SELECT
vt.id,
tcf.created_at
FROM topic_custom_fields tcf -- 来自 topic_custom_fields 表
INNER JOIN valid_topics vt ON vt.id = tcf.topic_id -- 与 valid_topics CTE 连接
WHERE tcf.name = 'accepted_answer_post_id' -- 仅考虑有已接受答案的主题
),
-- 定义每个主题的最后回复的 CTE
last_reply AS (
-- 选择主题 ID 和最后回复的用户 ID
SELECT p.topic_id, p.user_id FROM posts p
INNER JOIN (SELECT topic_id, MAX(id) AS post FROM posts p -- 子查询以获取每个主题的最后帖子 ID
WHERE deleted_at ISNULL
AND post_type = 1
GROUP BY topic_id) x ON x.post = p.id -- 与 posts 表连接以获取最后回复的用户 ID
),
-- 定义每个主题的首次回复的 CTE
first_reply AS (
-- 选择主题 ID、用户 ID 和首次回复的创建日期
SELECT p.topic_id, p.user_id, p.created_at FROM posts p
INNER JOIN (SELECT topic_id, MIN(id) AS post FROM posts p -- 子查询以获取每个主题的首次回复 ID
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) x ON x.post = p.id -- 与 posts 表连接以获取首次回复的用户 ID 和创建日期
)
-- 主查询以获取最终结果集
SELECT
CASE
WHEN st.id IS NOT NULL THEN 'solved' -- 如果主题 ID 在 solved_topics CTE 中,则状态为 'solved'
ELSE 'unsolved' -- 否则,状态为 'unsolved'
END AS status,
vt.tag_names, -- 标签名称
vt.id AS topic_id, -- 主题 ID
vt.user_id topic_user_id, -- 用户 ID
ue.email, -- 用户邮箱
vt.title, -- 主题标题
vt.views, -- 浏览量
lr.user_id AS last_reply_user_id, -- 最后回复的用户 ID
ue2.email AS last_reply_user_email, -- 最后回复用户的邮箱
vt.created_at::date topic_create, -- 主题创建日期
COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create, -- 首次回复的创建日期(如果存在),否则为空字符串
COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create, -- 解决方案的创建日期(如果存在),否则为空字符串
COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)", -- 首次回复所需时间(天)
COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)", -- 首次回复所需时间(小时)
COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)", -- 解决方案所需时间(天)
COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)", -- 解决方案所需时间(小时)
posts_count AS number_of_replies, -- 回复数量
total_days AS total_days_without_solution -- 无解决方案的总天数
FROM valid_topics vt -- 来自 valid_topics CTE
LEFT JOIN last_reply lr ON lr.topic_id = vt.id -- 与 last_reply CTE 连接
LEFT JOIN first_reply fr ON fr.topic_id = vt.id -- 与 first_reply CTE 连接
LEFT JOIN solved_topics st ON st.id = vt.id -- 与 solved_topics CTE 连接
INNER JOIN user_emails ue ON vt.user_id = ue.user_id AND ue."primary" = true -- 与 user_emails 表连接以获取用户邮箱
LEFT JOIN user_emails ue2 ON lr.user_id = ue2.user_id AND ue2."primary" = true -- 与 user_emails 表连接以获取最后回复用户的邮箱
WHERE (:tag_name = 'all' OR LOWER(vt.tag_names) LIKE '%' || LOWER(:tag_name) || '%') -- 按标签名称过滤
ORDER BY tag_names, total_days DESC -- 按标签名称和总天数降序排列
如果您有任何问题或分享了您在 Data Explorer 查询中使用 COALESCE 的示例,请随时在下方分享。![]()