In this tutorial, we will explore the use of the COALESCE
function in Data Explorer SQL queries.
COALESCE
allows you to handle NULL values in your query results. If you have NULL values in your data, you can use COALESCE
to provide a default value (like 0) for these NULL values.
COALESCE
is particularly useful when you might be performing later calculations or data analysis on your query results where NULL values could potentially cause issues or misinterpretations.
Syntax
The COALESCE
function takes two or more arguments and returns the first non-NULL value it encounters from left to right in the list. If all arguments are NULL, COALESCE
returns NULL.
The basic syntax for COALESCE
is as follows:
COALESCE(value1, value2, ..., valueN)
For example, COALESCE(NULL, 1, 2)
will return 1
because 1
is the first non-NULL argument.
Example Queries
Let’s examine a few example queries to understand how COALESCE
is used in Data Explorer queries.
Posts Created, Likes Received, and Bookmarks Received
Complexity Level: Beginner
This query will get the total number of posts created, likes received, and bookmarks received on posts for each user on a site. If a user has no posts, likes, or bookmarks received, the COALESCE
function will return 0 instead of NULL.
SELECT
users.id AS user_id,
users.username,
COALESCE(COUNT(posts.id), 0) AS post_count,
COALESCE(SUM(posts.like_count), 0) AS likes_received,
COALESCE(SUM(posts.bookmark_count), 0) AS bookmarks_received
FROM
users
LEFT JOIN
posts ON users.id = posts.user_id
GROUP BY
users.id, users.username
ORDER BY
post_count DESC, likes_received DESC, bookmarks_received DESC
Example Results:
user | username | post_count | likes_received | bookmarks_received |
---|---|---|---|---|
1 | alice | 345 | 6 | 9 |
2 | bella | 278 | 5 | 6 |
3 | charlie | 37 | 3 | 3 |
4 | dave | 0 | 0 | 0 |
In this query, we’re joining the users
table with the posts
table on the user_id
field. We’re then using the COALESCE
function to ensure that if a user has no posts, likes received, or bookmarks received, we return 0 instead of NULL. The results are grouped by user id and username, and ordered by the number of posts, likes, and bookmarks in descending order.
Topics and Replies per Users
Complexity Level: Intermediate
This query gets the count of topics and replies made by each user between two dates. If a user has no topics or replies, COALESCE
will return 0 instead of NULL.
-- [params]
-- date :start_date
-- date :end_date
-- string NULL:username
WITH qtt_topics AS (
SELECT
t.user_id,
COUNT(*) AS topics
FROM topics t
WHERE
t.user_id > 0
AND t.deleted_at ISNULL
AND t.archetype = 'regular'
AND t.created_at::date BETWEEN :start_date AND :end_date
GROUP BY t.user_id
),
qtt_replies AS (
SELECT
p.user_id,
COUNT(*) AS replies
FROM posts p
WHERE
p.user_id > 0
AND p.deleted_at ISNULL
AND p.post_number != 1
AND p.created_at::date BETWEEN :start_date AND :end_date
GROUP BY p.user_id
),
total AS (
SELECT
COALESCE(qr.user_id, qt.user_id) user_id,
COALESCE(topics,0) qtt_topics,
COALESCE(replies,0) qtt_replies
FROM qtt_topics qt
FULL JOIN qtt_replies qr ON qt.user_id = qr.user_id
ORDER BY user_id)
SELECT
username,
qtt_topics,
qtt_replies
FROM total
INNER JOIN users u ON u.id = user_id
WHERE :username ISNULL OR LOWER(username) LIKE '%'||LOWER(:username)||'%'
Example Results:
username | qtt_topics | qtt_replies |
---|---|---|
Alice | 10 | 50 |
Bella | 15 | 45 |
Charlie | 12 | 30 |
In this query COALESCE
is used in the total
Common Table Expression (CTE). It ensures that if the user_id
is NULL in either qtt_topics
or qtt_replies
, the other value is used. This is important because a FULL JOIN is used to combine qtt_topics
and qtt_replies
, and if a user only has topics but no replies (or vice versa), their user_id
would be NULL in one of the tables. COALESCE
prevents this from happening.
Detailed Explanation with Inline Comments
-- [params]
-- date :start_date
-- date :end_date
-- string NULL:username
-- Define a CTE (Common Table Expression) for counting topics per user
WITH qtt_topics AS (
SELECT
t.user_id, -- user id
COUNT(*) AS topics -- count of topics
FROM topics t -- from topics table
WHERE
t.user_id > 0 -- only consider non-zero user ids
AND t.deleted_at ISNULL -- only consider topics that are not deleted
AND t.archetype = 'regular' -- only consider regular topics
AND t.created_at::date BETWEEN :start_date AND :end_date -- only consider topics created between start_date and end_date
GROUP BY t.user_id -- group by user id to get the count of topics per user
),
-- Define a CTE for counting replies per user
qtt_replies AS (
SELECT
p.user_id, -- user id
COUNT(*) AS replies -- count of replies
FROM posts p -- from posts table
WHERE
p.user_id > 0 -- only consider non-zero user ids
AND p.deleted_at ISNULL -- only consider posts that are not deleted
AND p.post_number != 1 -- only consider posts that are not the first post in a topic (i.e., replies)
AND p.created_at::date BETWEEN :start_date AND :end_date -- only consider posts created between start_date and end_date
GROUP BY p.user_id -- group by user id to get the count of replies per user
),
-- Define a CTE to combine the counts of topics and replies per user
total AS (
SELECT
COALESCE(qr.user_id, qt.user_id) user_id, -- user id (from either qtt_replies or qtt_topics)
COALESCE(topics,0) qtt_topics, -- count of topics (if NULL, return 0)
COALESCE(replies,0) qtt_replies -- count of replies (if NULL, return 0)
FROM qtt_topics qt -- from qtt_topics CTE
FULL JOIN qtt_replies qr ON qt.user_id = qr.user_id -- join with qtt_replies CTE on user id
ORDER BY user_id -- order by user id
)
-- Main query to get the final result set
SELECT
username, -- username
qtt_topics, -- count of topics
qtt_replies -- count of replies
FROM total -- from total CTE
INNER JOIN users u ON u.id = user_id -- join with users table on user id
WHERE :username ISNULL OR LOWER(username) LIKE '%'||LOWER(:username)||'%' -- filter by username (if provided)
Solved Questions Data
Complexity Level: Advanced / Requires Discourse Solved Plugin
This query is used to get details about topics, including whether they are solved or not, the time taken for the first reply, the time taken for the solution, and other related statistics.
This query assumes all topics on a site can be solved.
--[params]
-- date :start_date = 2022-01-01
-- date :end_date = 2024-01-01
-- text :tag_name = all
WITH valid_topics AS (
SELECT
t.id,
t.user_id,
t.title,
t.views,
posts_count-1 AS "posts_count",
t.created_at,
(CURRENT_DATE::date-t.created_at::date) AS "total_days",
string_agg(tags.name, ', ') AS tag_names
FROM topics t
LEFT JOIN topic_tags tt ON tt.topic_id = t.id
LEFT JOIN tags ON tags.id = tt.tag_id
LEFT JOIN tag_group_memberships tgm ON tgm.tag_id = tt.tag_id
LEFT JOIN (SELECT topic_id, MIN(created_at) AS "created_at"
FROM posts
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) date_first_reply ON date_first_reply.topic_id = t.id
WHERE t.deleted_at ISNULL
AND t.created_at::date BETWEEN :start_date AND :end_date
AND t.archetype = 'regular'
GROUP BY t.id, t.user_id, t.title, t.views, posts_count, t.created_at
),
solved_topics AS (
SELECT
vt.id,
tcf.created_at
FROM topic_custom_fields tcf
INNER JOIN valid_topics vt ON vt.id = tcf.topic_id
WHERE tcf.name = 'accepted_answer_post_id'
),
last_reply AS (
SELECT p.topic_id, p.user_id FROM posts p
INNER JOIN (SELECT topic_id, MAX(id) AS post FROM posts p
WHERE deleted_at ISNULL
AND post_type = 1
GROUP BY topic_id) x ON x.post = p.id
),
first_reply AS (
SELECT p.topic_id, p.user_id, p.created_at FROM posts p
INNER JOIN (SELECT topic_id, MIN(id) AS post FROM posts p
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) x ON x.post = p.id
)
SELECT
CASE
WHEN st.id IS NOT NULL THEN 'solved'
ELSE 'unsolved'
END AS status,
vt.tag_names,
vt.id AS topic_id,
vt.user_id topic_user_id,
ue.email,
vt.title,
vt.views,
lr.user_id AS last_reply_user_id,
ue2.email AS last_reply_user_email,
vt.created_at::date topic_create,
COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create,
COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create,
COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)",
COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)",
COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)",
COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)",
posts_count AS number_of_replies,
total_days AS total_days_without_solution
FROM valid_topics vt
LEFT JOIN last_reply lr ON lr.topic_id = vt.id
LEFT JOIN first_reply fr ON fr.topic_id = vt.id
LEFT JOIN solved_topics st ON st.id = vt.id
INNER JOIN user_emails ue ON vt.user_id = ue.user_id AND ue."primary" = true
LEFT JOIN user_emails ue2 ON lr.user_id = ue2.user_id AND ue2."primary" = true
WHERE (:tag_name = 'all' OR LOWER(vt.tag_names) LIKE '%' || LOWER(:tag_name) || '%')
ORDER BY tag_names, total_days DESC
Example Results:
status | tag_names | topic | topic_user | title | views | last_reply_user | last_reply_user_email | topic_create | first_reply_create | solution_create | time_first_reply(days) | time_first_reply(hours) | time_solution(days) | time_solution(hours) | number_of_replies | total_days_without_solution | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
solved | a, c, b | A Topic Title (7) | alice | alice@example.com | A Topic Title | 58 | bella | bella@example.com | 2023-08-25 | 2023-08-25 | 2023-08-29 | 0 | 1 | 1 | 24 | 9 | 4 |
unsolved | tag1 | Welcome to the Lounge (3) | system | no_email | Welcome to the Lounge | 3 | system | no_email | 2023-05-01 | 0 | 0 | 0 | 0 | 2 | 134 |
In this query, COALESCE
is used in the following lines:
COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create
: This line is converting thecreated_at
date of the first reply to a string. If the first reply does not exist (i.e.,fr.created_at
is null), it will return an empty string (‘’).COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create
: Similar to the above, this line is converting thecreated_at
date of the solution to a string. If the solution does not exist (i.e.,st.created_at
is null), it will return an empty string (‘’).COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)"
: This line is calculating the time difference in days between the creation of the topic and the first reply. If the first reply does not exist (i.e.,fr.created_at
is null), it will return 0.COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)"
: This line is calculating the time difference in hours between the creation of the topic and the first reply. If the first reply does not exist (i.e.,fr.created_at
is null), it will return 0.COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)"
: This line is calculating the time difference in days between the creation of the topic and the solution. If the solution does not exist (i.e.,st.created_at
is null), it will return 0.COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)"
: This line is calculating the time difference in hours between the creation of the topic and the solution. If the solution does not exist (i.e.,st.created_at
is null), it will return 0.
In all these cases, COALESCE
is used to prevent NULL values from appearing in the final results, which improves the readability of the resulting query and could be helpful for later data processing or analysis.
Detailed Explanation with Inline Comments
--[params]
-- date :start_date = 2022-01-01
-- date :end_date = 2024-01-01
-- text :tag_name = all
-- Define CTE for valid topics
WITH valid_topics AS (
-- Select necessary fields
SELECT
t.id, -- topic id
t.user_id, -- user id
t.title, -- topic title
t.views, -- number of views
posts_count-1 AS "posts_count", -- number of posts in the topic
t.created_at, -- creation date of the topic
(CURRENT_DATE::date-t.created_at::date) AS "total_days", -- total days since the topic was created
string_agg(tags.name, ', ') AS tag_names -- aggregate all tags associated with the topic
FROM topics t -- from topics table
-- Join necessary tables to get tag names
LEFT JOIN topic_tags tt ON tt.topic_id = t.id
LEFT JOIN tags ON tags.id = tt.tag_id
LEFT JOIN tag_group_memberships tgm ON tgm.tag_id = tt.tag_id
-- Subquery to get the date of the first reply for each topic
LEFT JOIN (SELECT topic_id, MIN(created_at) AS "created_at"
FROM posts
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) date_first_reply ON date_first_reply.topic_id = t.id
WHERE t.deleted_at ISNULL -- only consider topics that are not deleted
AND t.created_at::date BETWEEN :start_date AND :end_date -- only consider topics created between start_date and end_date
AND t.archetype = 'regular' -- only consider regular topics
GROUP BY t.id, t.user_id, t.title, t.views, posts_count, t.created_at -- group by necessary fields to get the correct count
),
-- Define CTE for solved topics
solved_topics AS (
-- Select topic id and creation date of the solution
SELECT
vt.id,
tcf.created_at
FROM topic_custom_fields tcf -- from topic_custom_fields table
INNER JOIN valid_topics vt ON vt.id = tcf.topic_id -- join with valid_topics CTE
WHERE tcf.name = 'accepted_answer_post_id' -- only consider topics with an accepted answer
),
-- Define CTE for the last reply of each topic
last_reply AS (
-- Select topic id and user id of the last reply
SELECT p.topic_id, p.user_id FROM posts p
INNER JOIN (SELECT topic_id, MAX(id) AS post FROM posts p -- subquery to get the id of the last post for each topic
WHERE deleted_at ISNULL
AND post_type = 1
GROUP BY topic_id) x ON x.post = p.id -- join with posts table to get the user id of the last reply
),
-- Define CTE for the first reply of each topic
first_reply AS (
-- Select topic id, user id and creation date of the first reply
SELECT p.topic_id, p.user_id, p.created_at FROM posts p
INNER JOIN (SELECT topic_id, MIN(id) AS post FROM posts p -- subquery to get the id of the first reply for each topic
WHERE deleted_at ISNULL
AND post_type = 1
AND post_number > 1
GROUP BY topic_id) x ON x.post = p.id -- join with posts table to get the user id and creation date of the first reply
)
-- Main query to get the final result set
SELECT
CASE
WHEN st.id IS NOT NULL THEN 'solved' -- if the topic id is in solved_topics CTE, then the status is 'solved'
ELSE 'unsolved' -- otherwise, the status is 'unsolved'
END AS status,
vt.tag_names, -- tag names
vt.id AS topic_id, -- topic id
vt.user_id topic_user_id, -- user id
ue.email, -- user email
vt.title, -- topic title
vt.views, -- number of views
lr.user_id AS last_reply_user_id, -- user id of the last reply
ue2.email AS last_reply_user_email, -- email of the user who made the last reply
vt.created_at::date topic_create, -- creation date of the topic
COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create, -- creation date of the first reply, if exists, otherwise empty string
COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create, -- creation date of the solution, if exists, otherwise empty string
COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)", -- time taken for the first reply in days
COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)", -- time taken for the first reply in hours
COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)", -- time taken for the solution in days
COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)", -- time taken for the solution in hours
posts_count AS number_of_replies, -- number of replies
total_days AS total_days_without_solution -- total days without solution
FROM valid_topics vt -- from valid_topics CTE
LEFT JOIN last_reply lr ON lr.topic_id = vt.id -- join with last_reply CTE
LEFT JOIN first_reply fr ON fr.topic_id = vt.id -- join with first_reply CTE
LEFT JOIN solved_topics st ON st.id = vt.id -- join with solved_topics CTE
INNER JOIN user_emails ue ON vt.user_id = ue.user_id AND ue."primary" = true -- join with user_emails table to get the email of the user
LEFT JOIN user_emails ue2 ON lr.user_id = ue2.user_id AND ue2."primary" = true -- join with user_emails table to get the email of the user who made the last reply
WHERE (:tag_name = 'all' OR LOWER(vt.tag_names) LIKE '%' || LOWER(:tag_name) || '%') -- filter by tag name
ORDER BY tag_names, total_days DESC -- order by tag names and total days in descending order
If you have any questions or examples of how you’ve used COALESCE
in your Data Explorer queries, please feel free to share them below.