Using DATE_TRUNC for Data Aggregation

The date_trunc function is a powerful tool in SQL. It allows you to truncate a TIMESTAMP or an INTERVAL value based on a specified date part, which makes it an invaluable function when you want to aggregate or group data based on a specific time period.

Syntax

The syntax for the date_trunc function is as follows:

date_trunc('date_part', field)
  • date_part: This is a string that specifies the part of the date or timestamp to truncate to. It can be one of the following values:
    • millennium
    • century
    • decade
    • year
    • quarter
    • month
    • week
    • day
    • hour
    • minute
    • second
    • milliseconds
    • microseconds
  • field: This is the timestamp or interval to be truncated.

Example Usage In DE Queries

Let’s take a look at a few example queries that utilize date_trunc:

Count of New Topics by Month

Complexity Level: Beginner

This SQL query is used to count the number of topics created in each month in the Discourse database.

SELECT 
    date_trunc('month', created_at)::DATE AS month,
    count(id)
FROM topics
GROUP BY month
ORDER BY month DESC

In this query, date_trunc('month', created_at)::DATE truncates the created_at timestamp to the month and then casts it to display a date, effectively grouping the topics by the month they were created.

The count(id) function then counts the number of topics created in each month. The results are ordered by month in descending order, so the most recent month will be first.

Example Results:

month count
2023-09-01 1
2023-08-01 6
2023-07-01 10
Detailed Explanation with Inline Comments
-- Select the month when the topic was created and the count of topics
SELECT 
    -- Truncate the 'created_at' timestamp to the month and cast it to a date
    -- This groups the topics by the month they were created
    date_trunc('month', created_at)::DATE AS month,
    -- Count the number of topics created in each month
    count(id)
-- From the 'topics' table
FROM topics
-- Group the results by the month
GROUP BY month
-- Order the results by the month in descending order
-- This means the most recent month will be first
ORDER BY month DESC

Cumulative Total Users

Complexity Level: Intermediate

This query will provide a weekly report of user signups on a Discourse forum, along with a running total of users. It uses a WITH clause to create a temporary result set (daily_signups ), and then selects from that result set.

-- [params]
-- date :start_date = 2000-01-01
-- date :end_date

WITH daily_signups AS(
SELECT
    to_char(date_trunc('week', u.created_at)::date,'YYYY-MM-DD') AS Date,
    Count (id) as Signups
FROM users u
WHERE
    u.created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY Date
)

SELECT
    Date, Signups, SUM(Signups) OVER (ORDER BY Date) as total_users
FROM daily_signups
ORDER BY Date Asc

Here’s a breakdown of how this query is functioning:

  • The WITH clause creates a temporary result set named daily_signups. This result set contains the number of user signups for each week between the start and end dates specified by the parameters :start_date and :end_date.
  • Within the daily_signups result set, date_trunc('week', u.created_at)::date truncates the created_at timestamp to the week and then casts it to a date. This effectively groups the users by the week they signed up.
  • Count(id) then counts the number of users who signed up in each week.
  • In the main SELECT statement, SUM(Signups) OVER (ORDER BY Date) calculates a running total of users. The OVER (ORDER BY Date) clause specifies that the sum should be calculated over the rows ordered by date, so it gives a cumulative sum of signups up to each date.
  • The results are then ordered by date in ascending order.

Example Results:

date signups total_users
2013-01-28 20 20.0
2013-02-04 2136 2156.0
2013-02-11 442 2598.0
Detailed Explanation with Inline Comments
-- Define parameters for the start and end dates
-- [params]
-- date :start_date = 2000-01-01
-- date :end_date

-- Create a common table expression (CTE) to count the number of user signups each week
WITH daily_signups AS(
SELECT
    -- Truncate the 'created_at' timestamp to the week and format it as a date string
    to_char(date_trunc('week', u.created_at)::date,'YYYY-MM-DD') AS Date,
    -- Count the number of users who signed up
    Count (id) as Signups
FROM users u
WHERE
    -- Only include users who signed up between the start and end dates
    u.created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY Date
)

-- Select the date, number of signups, and running total of signups
SELECT
    Date, 
    Signups, 
    -- Calculate the running total of signups
    SUM(Signups) OVER (ORDER BY Date) as total_users
FROM daily_signups
-- Order the results by date in ascending order
ORDER BY Date Asc

Number of Solved and Unsolved Questions per Month

Complexity Level: Intermediate / Requires Discourse Solved Plugin

This query will provide a monthly report of the number of solved and unsolved questions on a Discourse forum. This query assumes all topics on a site can be solved.

-- [params]
-- date :start_date
-- date :end_date

WITH monthly_questions AS (
    SELECT
        date_trunc('month', created_at)::DATE AS month,
        COUNT(id) as total_questions
    FROM topics
    WHERE
        created_at::date BETWEEN :start_date::date AND :end_date::date
    GROUP BY month
),
solved_questions AS (
    SELECT
        date_trunc('month', created_at)::DATE AS month,
        COUNT(id) as solved
    FROM user_actions
    WHERE
        created_at::date BETWEEN :start_date::date AND :end_date::date
    AND action_type = 15
    GROUP BY month
)

SELECT
    mq.month, 
    mq.total_questions, 
    COALESCE(sq.solved, 0) as solved,
    mq.total_questions - COALESCE(sq.solved, 0) as unsolved
FROM monthly_questions mq
LEFT JOIN solved_questions sq ON mq.month = sq.month
ORDER BY mq.month ASC

In this query, the monthly_questions CTE counts the total number of questions (topics) created each month. The solved_questions CTE counts the number of questions marked as solved each month by counting the number of id from the user_actions table WITH action type = 15

The main SELECT statement then calculates the number of unsolved questions by subtracting the number of solved questions from the total number of questions. The results are ordered by month in ascending order, so the earliest month will be first.

Example Results:

month total_questions solved unsolved
2023-07-01 10 3 7
2023-08-01 6 0 6
2023-09-01 1 1 0
Detailed Explanation with Inline Comments
-- [params]
-- date :start_date = 2000-01-01
-- date :end_date

-- Create a CTE to count the total number of questions (topics) created each month
WITH monthly_questions AS (
    SELECT
        -- Truncate the 'created_at' timestamp to the month
        date_trunc('month', created_at)::DATE AS month,
        -- Count the number of topics created in each month
        COUNT(id) as total_questions
    FROM topics
    WHERE
        -- Only include topics created between the start and end dates
        created_at::date BETWEEN :start_date::date AND :end_date::date
    GROUP BY month
),
-- Create a CTE to count the number of questions marked as solved each month
solved_questions AS (
    SELECT
        -- Truncate the 'created_at' timestamp to the month
        date_trunc('month', created_at)::DATE AS month,
        -- Count the number of solved questions in each month
        COUNT(id) as solved
    FROM user_actions
    WHERE
        -- Only include actions taken between the start and end dates
        created_at::date BETWEEN :start_date::date AND :end_date::date
        -- Only consider actions where the action type is 15 (indicating a solved question)
        AND action_type = 15
    GROUP BY month
)

-- Select the month, total number of questions, number of solved questions, and number of unsolved questions
SELECT
    mq.month, 
    mq.total_questions, 
    -- If there are no solved questions in a month, display 0
    COALESCE(sq.solved, 0) as solved,
    -- Subtract the number of solved questions from the total number of questions to get the number of unsolved questions
    mq.total_questions - COALESCE(sq.solved, 0) as unsolved
FROM monthly_questions mq
-- Join the 'monthly_questions' and 'solved_questions' CTEs on the month
LEFT JOIN solved_questions sq ON mq.month = sq.month
-- Order the results by month in ascending order
ORDER BY mq.month ASC

Topic Reply Statistics

Complexity Level: Advanced

This complex SQL query provides a weekly report on topic activity in a Discourse forum. It breaks down topic data into several key metrics: the number of topics with at least one response, the number of topics without a response, the maximum number of days a topic has gone without a response, and the average time to the first response.

WITH posts_list AS (
    SELECT 
        t.id topic_id,
        p.post_number,
        p.created_at,
        ROW_NUMBER () OVER (
                    PARTITION BY p.topic_id 
                    ORDER BY p.post_number)  AS post_order
    FROM posts p
    INNER JOIN topics t ON t.id = p.topic_id AND archetype = 'regular' AND t.deleted_at ISNULL
    WHERE p.post_type = 1
        AND p.deleted_at ISNULL
        AND t.deleted_at ISNULL
        AND t.archetype = 'regular'
    ORDER BY p.topic_id, p.post_number
),
atleast_1_response AS (
    SELECT 
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics with atleast one response"
    FROM topics t
    WHERE t.archetype = 'regular'
        AND t.deleted_at ISNULL
        AND t.posts_count >= 2
    GROUP BY "week"
),
no_response AS(
    SELECT 
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics without response"
    FROM topics t
    WHERE t.archetype = 'regular'
        AND t.deleted_at ISNULL
        AND t.posts_count = 1
    GROUP BY "week"
),
max_days_without_response AS(
    SELECT 
        date_trunc('week', t.created_at::date)::date AS "week",
        CURRENT_DATE::date-MIN(created_at)::date AS "max days without response"
    FROM topics t
    WHERE t.archetype = 'regular'
        AND t.deleted_at ISNULL
        AND t.posts_count = 1
    GROUP BY "week"
),
avg_time_first_response AS (
    SELECT 
        date_trunc('week', pl.created_at::date)::date AS "week",
        EXTRACT(epoch FROM AVG(pl2.created_at-pl.created_at))/3600 AS "avg time first response (h)"
    FROM posts_list pl
    INNER JOIN posts_list pl2 ON pl.topic_id = pl2.topic_id AND pl2.post_order = 2
    WHERE  pl.post_order = 1
    GROUP BY "week" 
)

SELECT ar.week, 
    "topics without response", 
    "max days without response", 
    "topics with atleast one response",
    "avg time first response (h)"
FROM atleast_1_response ar
INNER JOIN no_response nr ON nr.week = ar.week
INNER JOIN max_days_without_response mdwr ON mdwr.week = ar.week
INNER JOIN avg_time_first_response atfr ON atfr.week = ar.week
ORDER BY "week" DESC

Here’s a summary of how this query works:

  • The posts_list common table expression (CTE) selects a list of all regular posts from the posts and topics tables, ordered by topic_id and post_number. It also assigns a row number (post_order) to each post within its topic.
  • The atleast_1_response CTE counts the number of regular topics with at least one response (i.e., posts_count is greater than or equal to 2) for each week.
  • The no_response CTE counts the number of regular topics without a response (i.e., posts_count equals 1) for each week.
  • The max_days_without_response CTE calculates the maximum number of days a topic without a response has been left unanswered for each week.
  • The avg_time_first_response CTE calculates the average time to the first response for each topic in hours, for each week.
  • The main SELECT statement then joins these CTEs together on the week and selects the relevant columns. The results are ordered by week in descending order.
week topics without response max days without response topics with atleast one response avg time first response (h)
2023-09-04 15 2 47 2.6778684519444444
2023-08-28 30 9 138 8.7899938238888889
2023-08-21 22 16 130 9.3280889688888889
Detailed Explanation with Inline Comments
-- Create a temporary table (CTE) of all regular posts, ordered by topic_id and post_number
WITH posts_list AS (
    SELECT 
        t.id topic_id,  -- Topic ID
        p.post_number,  -- Post number
        p.created_at,   -- Post creation date
        -- Assign a row number to each post within its topic
        ROW_NUMBER () OVER (
                    PARTITION BY p.topic_id 
                    ORDER BY p.post_number)  AS post_order
    FROM posts p
    -- Join with topics table, only considering regular topics that are not deleted
    INNER JOIN topics t ON t.id = p.topic_id AND archetype = 'regular' AND t.deleted_at ISNULL
    WHERE p.post_type = 1
        AND p.deleted_at ISNULL  -- Exclude deleted posts
        AND t.deleted_at ISNULL  -- Exclude deleted topics
        AND t.archetype = 'regular'  -- Only consider regular topics
    ORDER BY p.topic_id, p.post_number
),
-- Create a CTE to count the number of regular topics with at least one response for each week
atleast_1_response AS (
    SELECT 
        -- Truncate the created_at timestamp to the week
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics with atleast one response"
    FROM topics t
    WHERE t.archetype = 'regular'  -- Only consider regular topics
        AND t.deleted_at ISNULL  -- Exclude deleted topics
        AND t.posts_count >= 2  -- Only consider topics with at least one response
    GROUP BY "week"
),
-- Create a CTE to count the number of regular topics without a response for each week
no_response AS(
    SELECT 
        -- Truncate the created_at timestamp to the week
        date_trunc('week', t.created_at::date)::date AS "week",
        COUNT(*) AS "topics without response"
    FROM topics t
    WHERE t.archetype = 'regular'  -- Only consider regular topics
        AND t.deleted_at ISNULL  -- Exclude deleted topics
        AND t.posts_count = 1  -- Only consider topics without a response
    GROUP BY "week"
),
-- Create a CTE to calculate the maximum number of days a topic without a response has been left unanswered for each week
max_days_without_response AS(
    SELECT 
        -- Truncate the created_at timestamp to the week
        date_trunc('week', t.created_at::date)::date AS "week",
        -- Calculate the number of days from the topic creation date to the current date
        CURRENT_DATE::date-MIN(created_at)::date AS "max days without response"
    FROM topics t
    WHERE t.archetype = 'regular'  -- Only consider regular topics
        AND t.deleted_at ISNULL  -- Exclude deleted topics
        AND t.posts_count = 1  -- Only consider topics without a response
    GROUP BY "week"
),
-- Create a CTE to calculate the average time to the first response for each topic in hours, for each week
avg_time_first_response AS (
    SELECT 
        -- Truncate the created_at timestamp to the week
        date_trunc('week', pl.created_at::date)::date AS "week",
        -- Calculate the average time to the first response in hours
        EXTRACT(epoch FROM AVG(pl2.created_at-pl.created_at))/3600 AS "avg time first response (h)"
    FROM posts_list pl
    -- Join with the posts_list CTE, only considering the second post in each topic
    INNER JOIN posts_list pl2 ON pl.topic_id = pl2.topic_id AND pl2.post_order = 2
    WHERE  pl.post_order = 1  -- Only consider the first post in each topic
    GROUP BY "week" 
)

-- Select the week, the number of topics without a response, the maximum number of days without a response, the number of topics with at least one response, and the average time to the first response
SELECT ar.week, 
    "topics without response", 
    "max days without response", 
    "topics with atleast one response",
    "avg time first response (h)"
FROM atleast_1_response ar
-- Join the CTEs together on the week
INNER JOIN no_response nr ON nr.week = ar.week
INNER JOIN max_days_without_response mdwr ON mdwr.week = ar.week
INNER JOIN avg_time_first_response atfr ON atfr.week = ar.week
-- Order by week in descending order
ORDER BY "week" DESC

These are just a few examples of the way you can use date_trunc in your Data Explorer queries. Feel free to use any of these queries on your site, and if you have any questions please ask them below. :slight_smile:

5 Likes