Handling Null Values with COALESCE

In this tutorial, we will explore the use of the COALESCE function in Data Explorer SQL queries.

COALESCE allows you to handle NULL values in your query results. If you have NULL values in your data, you can use COALESCE to provide a default value (like 0) for these NULL values.

COALESCE is particularly useful when you might be performing later calculations or data analysis on your query results where NULL values could potentially cause issues or misinterpretations.

Syntax

The COALESCE function takes two or more arguments and returns the first non-NULL value it encounters from left to right in the list. If all arguments are NULL, COALESCE returns NULL.

The basic syntax for COALESCE is as follows:

COALESCE(value1, value2, ..., valueN)

For example, COALESCE(NULL, 1, 2) will return 1 because 1 is the first non-NULL argument.

Example Queries

Let’s examine a few example queries to understand how COALESCE is used in Data Explorer queries.

Posts Created, Likes Received, and Bookmarks Received

Complexity Level: Beginner

This query will get the total number of posts created, likes received, and bookmarks received on posts for each user on a site. If a user has no posts, likes, or bookmarks received, the COALESCE function will return 0 instead of NULL.

SELECT 
    users.id AS user_id,
    users.username,
    COALESCE(COUNT(posts.id), 0) AS post_count,
    COALESCE(SUM(posts.like_count), 0) AS likes_received,
    COALESCE(SUM(posts.bookmark_count), 0) AS bookmarks_received
FROM 
    users
LEFT JOIN 
    posts ON users.id = posts.user_id
GROUP BY 
    users.id, users.username
ORDER BY 
    post_count DESC, likes_received DESC, bookmarks_received DESC

Example Results:

user username post_count likes_received bookmarks_received
1 alice 345 6 9
2 bella 278 5 6
3 charlie 37 3 3
4 dave 0 0 0

In this query, we’re joining the users table with the posts table on the user_id field. We’re then using the COALESCE function to ensure that if a user has no posts, likes received, or bookmarks received, we return 0 instead of NULL. The results are grouped by user id and username, and ordered by the number of posts, likes, and bookmarks in descending order.

Topics and Replies per Users

Complexity Level: Intermediate

This query gets the count of topics and replies made by each user between two dates. If a user has no topics or replies, COALESCE will return 0 instead of NULL.

-- [params]
-- date :start_date 
-- date :end_date
-- string NULL:username

WITH qtt_topics AS (
    SELECT 
        t.user_id,
        COUNT(*) AS topics
    FROM topics t
    WHERE    
        t.user_id > 0 
        AND t.deleted_at ISNULL
        AND t.archetype = 'regular'
        AND t.created_at::date BETWEEN :start_date AND :end_date
    GROUP BY t.user_id
    ),
    
qtt_replies AS (
    SELECT 
        p.user_id,
        COUNT(*) AS replies
    FROM posts p
    WHERE    
        p.user_id > 0 
        AND p.deleted_at ISNULL
        AND p.post_number != 1
        AND p.created_at::date BETWEEN :start_date AND :end_date
    GROUP BY p.user_id
    ),

total AS (
    SELECT
        COALESCE(qr.user_id, qt.user_id) user_id,
        COALESCE(topics,0) qtt_topics,
        COALESCE(replies,0) qtt_replies
    FROM qtt_topics qt
    FULL JOIN qtt_replies qr ON qt.user_id = qr.user_id
    ORDER BY user_id)

SELECT 
    username,
    qtt_topics,
    qtt_replies
FROM total
INNER JOIN users u ON u.id = user_id
WHERE :username ISNULL OR LOWER(username) LIKE '%'||LOWER(:username)||'%'

Example Results:

username qtt_topics qtt_replies
Alice 10 50
Bella 15 45
Charlie 12 30

In this query COALESCE is used in the total Common Table Expression (CTE). It ensures that if the user_id is NULL in either qtt_topics or qtt_replies , the other value is used. This is important because a FULL JOIN is used to combine qtt_topics and qtt_replies , and if a user only has topics but no replies (or vice versa), their user_id would be NULL in one of the tables. COALESCE prevents this from happening.

Detailed Explanation with Inline Comments
-- [params]
-- date :start_date 
-- date :end_date
-- string NULL:username

-- Define a CTE (Common Table Expression) for counting topics per user
WITH qtt_topics AS (
    SELECT 
        t.user_id,  -- user id
        COUNT(*) AS topics  -- count of topics
    FROM topics t  -- from topics table
    WHERE    
        t.user_id > 0  -- only consider non-zero user ids
        AND t.deleted_at ISNULL  -- only consider topics that are not deleted
        AND t.archetype = 'regular'  -- only consider regular topics
        AND t.created_at::date BETWEEN :start_date AND :end_date  -- only consider topics created between start_date and end_date
    GROUP BY t.user_id  -- group by user id to get the count of topics per user
),
    
-- Define a CTE for counting replies per user
qtt_replies AS (
    SELECT 
        p.user_id,  -- user id
        COUNT(*) AS replies  -- count of replies
    FROM posts p  -- from posts table
    WHERE    
        p.user_id > 0  -- only consider non-zero user ids
        AND p.deleted_at ISNULL  -- only consider posts that are not deleted
        AND p.post_number != 1  -- only consider posts that are not the first post in a topic (i.e., replies)
        AND p.created_at::date BETWEEN :start_date AND :end_date  -- only consider posts created between start_date and end_date
    GROUP BY p.user_id  -- group by user id to get the count of replies per user
),

-- Define a CTE to combine the counts of topics and replies per user
total AS (
    SELECT
        COALESCE(qr.user_id, qt.user_id) user_id,  -- user id (from either qtt_replies or qtt_topics)
        COALESCE(topics,0) qtt_topics,  -- count of topics (if NULL, return 0)
        COALESCE(replies,0) qtt_replies  -- count of replies (if NULL, return 0)
    FROM qtt_topics qt  -- from qtt_topics CTE
    FULL JOIN qtt_replies qr ON qt.user_id = qr.user_id  -- join with qtt_replies CTE on user id
    ORDER BY user_id  -- order by user id
)

-- Main query to get the final result set
SELECT 
    username,  -- username
    qtt_topics,  -- count of topics
    qtt_replies  -- count of replies
FROM total  -- from total CTE
INNER JOIN users u ON u.id = user_id  -- join with users table on user id
WHERE :username ISNULL OR LOWER(username) LIKE '%'||LOWER(:username)||'%'  -- filter by username (if provided)

Solved Questions Data

Complexity Level: Advanced / Requires Discourse Solved Plugin

This query is used to get details about topics, including whether they are solved or not, the time taken for the first reply, the time taken for the solution, and other related statistics.

This query assumes all topics on a site can be solved.

--[params]
-- date :start_date = 2022-01-01
-- date :end_date = 2024-01-01
-- text :tag_name = all


WITH valid_topics AS (
    SELECT 
        t.id,
        t.user_id,
        t.title,
        t.views,
        posts_count-1 AS "posts_count", 
        t.created_at,
        (CURRENT_DATE::date-t.created_at::date) AS "total_days",
        string_agg(tags.name, ', ') AS tag_names
    FROM topics t
    LEFT JOIN topic_tags tt ON tt.topic_id = t.id
    LEFT JOIN tags ON tags.id = tt.tag_id
    LEFT JOIN tag_group_memberships tgm ON tgm.tag_id = tt.tag_id
    LEFT JOIN (SELECT topic_id, MIN(created_at) AS "created_at"
                FROM posts
                WHERE deleted_at ISNULL
                    AND post_type = 1
                    AND post_number > 1
                GROUP BY topic_id) date_first_reply ON date_first_reply.topic_id = t.id
    WHERE t.deleted_at ISNULL
        AND t.created_at::date BETWEEN :start_date AND :end_date
        AND t.archetype = 'regular'
    GROUP BY t.id, t.user_id, t.title, t.views, posts_count, t.created_at
),

solved_topics AS (
    SELECT 
        vt.id,
        tcf.created_at
    FROM topic_custom_fields tcf
    INNER JOIN valid_topics vt ON vt.id = tcf.topic_id
    WHERE tcf.name = 'accepted_answer_post_id'
),

last_reply AS (
    SELECT p.topic_id, p.user_id FROM posts p
    INNER JOIN (SELECT topic_id, MAX(id) AS post FROM posts p
                WHERE deleted_at ISNULL
                AND post_type = 1
                GROUP BY topic_id) x ON x.post = p.id
),

first_reply AS (
    SELECT p.topic_id, p.user_id, p.created_at FROM posts p
    INNER JOIN (SELECT topic_id, MIN(id) AS post FROM posts p
                WHERE deleted_at ISNULL
                AND post_type = 1
                AND post_number > 1
                GROUP BY topic_id) x ON x.post = p.id
)

SELECT
    CASE 
        WHEN st.id IS NOT NULL THEN 'solved'
        ELSE 'unsolved'
    END AS status,
    vt.tag_names,
    vt.id AS topic_id,
    vt.user_id topic_user_id,
    ue.email,
    vt.title,
    vt.views,
    lr.user_id AS last_reply_user_id,
    ue2.email AS last_reply_user_email,
    vt.created_at::date topic_create,
    COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create,
    COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create,
    COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)",
    COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)",
    COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)",
    COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)",
    posts_count AS number_of_replies,
    total_days AS total_days_without_solution
FROM valid_topics vt
LEFT JOIN last_reply lr ON lr.topic_id = vt.id
LEFT JOIN first_reply fr ON fr.topic_id = vt.id
LEFT JOIN solved_topics st ON st.id = vt.id
INNER JOIN user_emails ue ON vt.user_id = ue.user_id AND ue."primary" = true
LEFT JOIN user_emails ue2 ON lr.user_id = ue2.user_id AND ue2."primary" = true
WHERE (:tag_name =  'all' OR LOWER(vt.tag_names) LIKE '%' || LOWER(:tag_name) || '%')
ORDER BY tag_names, total_days DESC

Example Results:

status tag_names topic topic_user email title views last_reply_user last_reply_user_email topic_create first_reply_create solution_create time_first_reply(days) time_first_reply(hours) time_solution(days) time_solution(hours) number_of_replies total_days_without_solution
solved a, c, b A Topic Title (7) alice alice@example.com A Topic Title 58 bella bella@example.com 2023-08-25 2023-08-25 2023-08-29 0 1 1 24 9 4
unsolved tag1 Welcome to the Lounge (3) system no_email Welcome to the Lounge 3 system no_email 2023-05-01 0 0 0 0 2 134

In this query, COALESCE is used in the following lines:

  • COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create: This line is converting the created_at date of the first reply to a string. If the first reply does not exist (i.e., fr.created_at is null), it will return an empty string (‘’).
  • COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create: Similar to the above, this line is converting the created_at date of the solution to a string. If the solution does not exist (i.e., st.created_at is null), it will return an empty string (‘’).
  • COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)": This line is calculating the time difference in days between the creation of the topic and the first reply. If the first reply does not exist (i.e., fr.created_at is null), it will return 0.
  • COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)": This line is calculating the time difference in hours between the creation of the topic and the first reply. If the first reply does not exist (i.e., fr.created_at is null), it will return 0.
  • COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)": This line is calculating the time difference in days between the creation of the topic and the solution. If the solution does not exist (i.e., st.created_at is null), it will return 0.
  • COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)": This line is calculating the time difference in hours between the creation of the topic and the solution. If the solution does not exist (i.e., st.created_at is null), it will return 0.

In all these cases, COALESCE is used to prevent NULL values from appearing in the final results, which improves the readability of the resulting query and could be helpful for later data processing or analysis.

Detailed Explanation with Inline Comments
--[params]
-- date :start_date = 2022-01-01
-- date :end_date = 2024-01-01
-- text :tag_name = all

-- Define CTE for valid topics
WITH valid_topics AS (
    -- Select necessary fields
    SELECT 
        t.id,  -- topic id
        t.user_id,  -- user id
        t.title,  -- topic title
        t.views,  -- number of views
        posts_count-1 AS "posts_count",  -- number of posts in the topic
        t.created_at,  -- creation date of the topic
        (CURRENT_DATE::date-t.created_at::date) AS "total_days",  -- total days since the topic was created
        string_agg(tags.name, ', ') AS tag_names  -- aggregate all tags associated with the topic
    FROM topics t  -- from topics table
    -- Join necessary tables to get tag names
    LEFT JOIN topic_tags tt ON tt.topic_id = t.id
    LEFT JOIN tags ON tags.id = tt.tag_id
    LEFT JOIN tag_group_memberships tgm ON tgm.tag_id = tt.tag_id
    -- Subquery to get the date of the first reply for each topic
    LEFT JOIN (SELECT topic_id, MIN(created_at) AS "created_at"
                FROM posts
                WHERE deleted_at ISNULL
                    AND post_type = 1
                    AND post_number > 1
                GROUP BY topic_id) date_first_reply ON date_first_reply.topic_id = t.id
    WHERE t.deleted_at ISNULL  -- only consider topics that are not deleted
        AND t.created_at::date BETWEEN :start_date AND :end_date  -- only consider topics created between start_date and end_date
        AND t.archetype = 'regular'  -- only consider regular topics
    GROUP BY t.id, t.user_id, t.title, t.views, posts_count, t.created_at  -- group by necessary fields to get the correct count
),

-- Define CTE for solved topics
solved_topics AS (
    -- Select topic id and creation date of the solution
    SELECT 
        vt.id,
        tcf.created_at
    FROM topic_custom_fields tcf  -- from topic_custom_fields table
    INNER JOIN valid_topics vt ON vt.id = tcf.topic_id  -- join with valid_topics CTE
    WHERE tcf.name = 'accepted_answer_post_id'  -- only consider topics with an accepted answer
),

-- Define CTE for the last reply of each topic
last_reply AS (
    -- Select topic id and user id of the last reply
    SELECT p.topic_id, p.user_id FROM posts p
    INNER JOIN (SELECT topic_id, MAX(id) AS post FROM posts p  -- subquery to get the id of the last post for each topic
                WHERE deleted_at ISNULL
                AND post_type = 1
                GROUP BY topic_id) x ON x.post = p.id  -- join with posts table to get the user id of the last reply
),

-- Define CTE for the first reply of each topic
first_reply AS (
    -- Select topic id, user id and creation date of the first reply
    SELECT p.topic_id, p.user_id, p.created_at FROM posts p
    INNER JOIN (SELECT topic_id, MIN(id) AS post FROM posts p  -- subquery to get the id of the first reply for each topic
                WHERE deleted_at ISNULL
                AND post_type = 1
                AND post_number > 1
                GROUP BY topic_id) x ON x.post = p.id  -- join with posts table to get the user id and creation date of the first reply
)

-- Main query to get the final result set
SELECT
    CASE 
        WHEN st.id IS NOT NULL THEN 'solved'  -- if the topic id is in solved_topics CTE, then the status is 'solved'
        ELSE 'unsolved'  -- otherwise, the status is 'unsolved'
    END AS status,
    vt.tag_names,  -- tag names
    vt.id AS topic_id,  -- topic id
    vt.user_id topic_user_id,  -- user id
    ue.email,  -- user email
    vt.title,  -- topic title
    vt.views,  -- number of views
    lr.user_id AS last_reply_user_id,  -- user id of the last reply
    ue2.email AS last_reply_user_email,  -- email of the user who made the last reply
    vt.created_at::date topic_create,  -- creation date of the topic
    COALESCE(TO_CHAR(fr.created_at,'YYYY-MM-DD'),'') AS first_reply_create,  -- creation date of the first reply, if exists, otherwise empty string
    COALESCE(TO_CHAR(st.created_at,'YYYY-MM-DD'),'') AS solution_create,  -- creation date of the solution, if exists, otherwise empty string
    COALESCE(fr.created_at::date - vt.created_at::date,0) AS "time_first_reply(days)",  -- time taken for the first reply in days
    COALESCE(CEIL(extract(epoch from (fr.created_at - vt.created_at))/3600.00),0) as "time_first_reply(hours)",  -- time taken for the first reply in hours
    COALESCE(st.created_at::date - vt.created_at::date,0) AS "time_solution(days)",  -- time taken for the solution in days
    COALESCE(CEIL(extract(epoch from (st.created_at - vt.created_at))/3600.00),0) as "time_solution(hours)",  -- time taken for the solution in hours
    posts_count AS number_of_replies,  -- number of replies
    total_days AS total_days_without_solution  -- total days without solution
FROM valid_topics vt  -- from valid_topics CTE
LEFT JOIN last_reply lr ON lr.topic_id = vt.id  -- join with last_reply CTE
LEFT JOIN first_reply fr ON fr.topic_id = vt.id  -- join with first_reply CTE
LEFT JOIN solved_topics st ON st.id = vt.id  -- join with solved_topics CTE
INNER JOIN user_emails ue ON vt.user_id = ue.user_id AND ue."primary" = true  -- join with user_emails table to get the email of the user
LEFT JOIN user_emails ue2 ON lr.user_id = ue2.user_id AND ue2."primary" = true  -- join with user_emails table to get the email of the user who made the last reply
WHERE (:tag_name =  'all' OR LOWER(vt.tag_names) LIKE '%' || LOWER(:tag_name) || '%')  -- filter by tag name
ORDER BY tag_names, total_days DESC  -- order by tag names and total days in descending order

If you have any questions or examples of how you’ve used COALESCE in your Data Explorer queries, please feel free to share them below. :slightly_smiling_face:

5 Likes