The date_trunc
function is a powerful tool in SQL. It allows you to truncate a TIMESTAMP or an INTERVAL value based on a specified date part, which makes it an invaluable function when you want to aggregate or group data based on a specific time period.
Syntax
The syntax for the date_trunc
function is as follows:
date_trunc('date_part', field)
date_part
: This is a string that specifies the part of the date or timestamp to truncate to. It can be one of the following values:millennium
century
decade
year
quarter
month
week
day
hour
minute
second
milliseconds
microseconds
field
: This is the timestamp or interval to be truncated.
Example Usage In DE Queries
Let’s take a look at a few example queries that utilize date_trunc
:
Count of New Topics by Month
Complexity Level: Beginner
This SQL query is used to count the number of topics created in each month in the Discourse database.
SELECT
date_trunc('month', created_at)::DATE AS month,
count(id)
FROM topics
GROUP BY month
ORDER BY month DESC
In this query, date_trunc('month', created_at)::DATE
truncates the created_at
timestamp to the month and then casts it to display a date, effectively grouping the topics by the month they were created.
The count(id)
function then counts the number of topics created in each month. The results are ordered by month in descending order, so the most recent month will be first.
Example Results:
month | count |
---|---|
2023-09-01 | 1 |
2023-08-01 | 6 |
2023-07-01 | 10 |
Detailed Explanation with Inline Comments
-- Select the month when the topic was created and the count of topics
SELECT
-- Truncate the 'created_at' timestamp to the month and cast it to a date
-- This groups the topics by the month they were created
date_trunc('month', created_at)::DATE AS month,
-- Count the number of topics created in each month
count(id)
-- From the 'topics' table
FROM topics
-- Group the results by the month
GROUP BY month
-- Order the results by the month in descending order
-- This means the most recent month will be first
ORDER BY month DESC
Cumulative Total Users
Complexity Level: Intermediate
This query will provide a weekly report of user signups on a Discourse forum, along with a running total of users. It uses a WITH
clause to create a temporary result set (daily_signups
), and then selects from that result set.
-- [params]
-- date :start_date = 2000-01-01
-- date :end_date
WITH daily_signups AS(
SELECT
to_char(date_trunc('week', u.created_at)::date,'YYYY-MM-DD') AS Date,
Count (id) as Signups
FROM users u
WHERE
u.created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY Date
)
SELECT
Date, Signups, SUM(Signups) OVER (ORDER BY Date) as total_users
FROM daily_signups
ORDER BY Date Asc
Here’s a breakdown of how this query is functioning:
- The
WITH
clause creates a temporary result set nameddaily_signups
. This result set contains the number of user signups for each week between the start and end dates specified by the parameters:start_date
and:end_date
. - Within the
daily_signups
result set,date_trunc('week', u.created_at)::date
truncates thecreated_at
timestamp to the week and then casts it to a date. This effectively groups the users by the week they signed up. Count(id)
then counts the number of users who signed up in each week.- In the main
SELECT
statement,SUM(Signups) OVER (ORDER BY Date)
calculates a running total of users. TheOVER (ORDER BY Date)
clause specifies that the sum should be calculated over the rows ordered by date, so it gives a cumulative sum of signups up to each date. - The results are then ordered by date in ascending order.
Example Results:
date | signups | total_users |
---|---|---|
2013-01-28 | 20 | 20.0 |
2013-02-04 | 2136 | 2156.0 |
2013-02-11 | 442 | 2598.0 |
Detailed Explanation with Inline Comments
-- Define parameters for the start and end dates
-- [params]
-- date :start_date = 2000-01-01
-- date :end_date
-- Create a common table expression (CTE) to count the number of user signups each week
WITH daily_signups AS(
SELECT
-- Truncate the 'created_at' timestamp to the week and format it as a date string
to_char(date_trunc('week', u.created_at)::date,'YYYY-MM-DD') AS Date,
-- Count the number of users who signed up
Count (id) as Signups
FROM users u
WHERE
-- Only include users who signed up between the start and end dates
u.created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY Date
)
-- Select the date, number of signups, and running total of signups
SELECT
Date,
Signups,
-- Calculate the running total of signups
SUM(Signups) OVER (ORDER BY Date) as total_users
FROM daily_signups
-- Order the results by date in ascending order
ORDER BY Date Asc
Number of Solved and Unsolved Questions per Month
Complexity Level: Intermediate / Requires Discourse Solved Plugin
This query will provide a monthly report of the number of solved and unsolved questions on a Discourse forum. This query assumes all topics on a site can be solved.
-- [params]
-- date :start_date
-- date :end_date
WITH monthly_questions AS (
SELECT
date_trunc('month', created_at)::DATE AS month,
COUNT(id) as total_questions
FROM topics
WHERE
created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY month
),
solved_questions AS (
SELECT
date_trunc('month', created_at)::DATE AS month,
COUNT(id) as solved
FROM user_actions
WHERE
created_at::date BETWEEN :start_date::date AND :end_date::date
AND action_type = 15
GROUP BY month
)
SELECT
mq.month,
mq.total_questions,
COALESCE(sq.solved, 0) as solved,
mq.total_questions - COALESCE(sq.solved, 0) as unsolved
FROM monthly_questions mq
LEFT JOIN solved_questions sq ON mq.month = sq.month
ORDER BY mq.month ASC
In this query, the monthly_questions
CTE counts the total number of questions (topics) created each month. The solved_questions
CTE counts the number of questions marked as solved each month by counting the number of id
from the user_actions
table WITH action type = 15
The main SELECT
statement then calculates the number of unsolved questions by subtracting the number of solved questions from the total number of questions. The results are ordered by month in ascending order, so the earliest month will be first.
Example Results:
month | total_questions | solved | unsolved |
---|---|---|---|
2023-07-01 | 10 | 3 | 7 |
2023-08-01 | 6 | 0 | 6 |
2023-09-01 | 1 | 1 | 0 |
Detailed Explanation with Inline Comments
-- [params]
-- date :start_date = 2000-01-01
-- date :end_date
-- Create a CTE to count the total number of questions (topics) created each month
WITH monthly_questions AS (
SELECT
-- Truncate the 'created_at' timestamp to the month
date_trunc('month', created_at)::DATE AS month,
-- Count the number of topics created in each month
COUNT(id) as total_questions
FROM topics
WHERE
-- Only include topics created between the start and end dates
created_at::date BETWEEN :start_date::date AND :end_date::date
GROUP BY month
),
-- Create a CTE to count the number of questions marked as solved each month
solved_questions AS (
SELECT
-- Truncate the 'created_at' timestamp to the month
date_trunc('month', created_at)::DATE AS month,
-- Count the number of solved questions in each month
COUNT(id) as solved
FROM user_actions
WHERE
-- Only include actions taken between the start and end dates
created_at::date BETWEEN :start_date::date AND :end_date::date
-- Only consider actions where the action type is 15 (indicating a solved question)
AND action_type = 15
GROUP BY month
)
-- Select the month, total number of questions, number of solved questions, and number of unsolved questions
SELECT
mq.month,
mq.total_questions,
-- If there are no solved questions in a month, display 0
COALESCE(sq.solved, 0) as solved,
-- Subtract the number of solved questions from the total number of questions to get the number of unsolved questions
mq.total_questions - COALESCE(sq.solved, 0) as unsolved
FROM monthly_questions mq
-- Join the 'monthly_questions' and 'solved_questions' CTEs on the month
LEFT JOIN solved_questions sq ON mq.month = sq.month
-- Order the results by month in ascending order
ORDER BY mq.month ASC
Topic Reply Statistics
Complexity Level: Advanced
This complex SQL query provides a weekly report on topic activity in a Discourse forum. It breaks down topic data into several key metrics: the number of topics with at least one response, the number of topics without a response, the maximum number of days a topic has gone without a response, and the average time to the first response.
WITH posts_list AS (
SELECT
t.id topic_id,
p.post_number,
p.created_at,
ROW_NUMBER () OVER (
PARTITION BY p.topic_id
ORDER BY p.post_number) AS post_order
FROM posts p
INNER JOIN topics t ON t.id = p.topic_id AND archetype = 'regular' AND t.deleted_at ISNULL
WHERE p.post_type = 1
AND p.deleted_at ISNULL
AND t.deleted_at ISNULL
AND t.archetype = 'regular'
ORDER BY p.topic_id, p.post_number
),
atleast_1_response AS (
SELECT
date_trunc('week', t.created_at::date)::date AS "week",
COUNT(*) AS "topics with atleast one response"
FROM topics t
WHERE t.archetype = 'regular'
AND t.deleted_at ISNULL
AND t.posts_count >= 2
GROUP BY "week"
),
no_response AS(
SELECT
date_trunc('week', t.created_at::date)::date AS "week",
COUNT(*) AS "topics without response"
FROM topics t
WHERE t.archetype = 'regular'
AND t.deleted_at ISNULL
AND t.posts_count = 1
GROUP BY "week"
),
max_days_without_response AS(
SELECT
date_trunc('week', t.created_at::date)::date AS "week",
CURRENT_DATE::date-MIN(created_at)::date AS "max days without response"
FROM topics t
WHERE t.archetype = 'regular'
AND t.deleted_at ISNULL
AND t.posts_count = 1
GROUP BY "week"
),
avg_time_first_response AS (
SELECT
date_trunc('week', pl.created_at::date)::date AS "week",
EXTRACT(epoch FROM AVG(pl2.created_at-pl.created_at))/3600 AS "avg time first response (h)"
FROM posts_list pl
INNER JOIN posts_list pl2 ON pl.topic_id = pl2.topic_id AND pl2.post_order = 2
WHERE pl.post_order = 1
GROUP BY "week"
)
SELECT ar.week,
"topics without response",
"max days without response",
"topics with atleast one response",
"avg time first response (h)"
FROM atleast_1_response ar
INNER JOIN no_response nr ON nr.week = ar.week
INNER JOIN max_days_without_response mdwr ON mdwr.week = ar.week
INNER JOIN avg_time_first_response atfr ON atfr.week = ar.week
ORDER BY "week" DESC
Here’s a summary of how this query works:
- The
posts_list
common table expression (CTE) selects a list of all regular posts from theposts
andtopics
tables, ordered bytopic_id
andpost_number
. It also assigns a row number (post_order
) to each post within its topic. - The
atleast_1_response
CTE counts the number of regular topics with at least one response (i.e.,posts_count
is greater than or equal to 2) for each week. - The
no_response
CTE counts the number of regular topics without a response (i.e.,posts_count
equals 1) for each week. - The
max_days_without_response
CTE calculates the maximum number of days a topic without a response has been left unanswered for each week. - The
avg_time_first_response
CTE calculates the average time to the first response for each topic in hours, for each week. - The main
SELECT
statement then joins these CTEs together on the week and selects the relevant columns. The results are ordered by week in descending order.
week | topics without response | max days without response | topics with atleast one response | avg time first response (h) |
---|---|---|---|---|
2023-09-04 | 15 | 2 | 47 | 2.6778684519444444 |
2023-08-28 | 30 | 9 | 138 | 8.7899938238888889 |
2023-08-21 | 22 | 16 | 130 | 9.3280889688888889 |
Detailed Explanation with Inline Comments
-- Create a temporary table (CTE) of all regular posts, ordered by topic_id and post_number
WITH posts_list AS (
SELECT
t.id topic_id, -- Topic ID
p.post_number, -- Post number
p.created_at, -- Post creation date
-- Assign a row number to each post within its topic
ROW_NUMBER () OVER (
PARTITION BY p.topic_id
ORDER BY p.post_number) AS post_order
FROM posts p
-- Join with topics table, only considering regular topics that are not deleted
INNER JOIN topics t ON t.id = p.topic_id AND archetype = 'regular' AND t.deleted_at ISNULL
WHERE p.post_type = 1
AND p.deleted_at ISNULL -- Exclude deleted posts
AND t.deleted_at ISNULL -- Exclude deleted topics
AND t.archetype = 'regular' -- Only consider regular topics
ORDER BY p.topic_id, p.post_number
),
-- Create a CTE to count the number of regular topics with at least one response for each week
atleast_1_response AS (
SELECT
-- Truncate the created_at timestamp to the week
date_trunc('week', t.created_at::date)::date AS "week",
COUNT(*) AS "topics with atleast one response"
FROM topics t
WHERE t.archetype = 'regular' -- Only consider regular topics
AND t.deleted_at ISNULL -- Exclude deleted topics
AND t.posts_count >= 2 -- Only consider topics with at least one response
GROUP BY "week"
),
-- Create a CTE to count the number of regular topics without a response for each week
no_response AS(
SELECT
-- Truncate the created_at timestamp to the week
date_trunc('week', t.created_at::date)::date AS "week",
COUNT(*) AS "topics without response"
FROM topics t
WHERE t.archetype = 'regular' -- Only consider regular topics
AND t.deleted_at ISNULL -- Exclude deleted topics
AND t.posts_count = 1 -- Only consider topics without a response
GROUP BY "week"
),
-- Create a CTE to calculate the maximum number of days a topic without a response has been left unanswered for each week
max_days_without_response AS(
SELECT
-- Truncate the created_at timestamp to the week
date_trunc('week', t.created_at::date)::date AS "week",
-- Calculate the number of days from the topic creation date to the current date
CURRENT_DATE::date-MIN(created_at)::date AS "max days without response"
FROM topics t
WHERE t.archetype = 'regular' -- Only consider regular topics
AND t.deleted_at ISNULL -- Exclude deleted topics
AND t.posts_count = 1 -- Only consider topics without a response
GROUP BY "week"
),
-- Create a CTE to calculate the average time to the first response for each topic in hours, for each week
avg_time_first_response AS (
SELECT
-- Truncate the created_at timestamp to the week
date_trunc('week', pl.created_at::date)::date AS "week",
-- Calculate the average time to the first response in hours
EXTRACT(epoch FROM AVG(pl2.created_at-pl.created_at))/3600 AS "avg time first response (h)"
FROM posts_list pl
-- Join with the posts_list CTE, only considering the second post in each topic
INNER JOIN posts_list pl2 ON pl.topic_id = pl2.topic_id AND pl2.post_order = 2
WHERE pl.post_order = 1 -- Only consider the first post in each topic
GROUP BY "week"
)
-- Select the week, the number of topics without a response, the maximum number of days without a response, the number of topics with at least one response, and the average time to the first response
SELECT ar.week,
"topics without response",
"max days without response",
"topics with atleast one response",
"avg time first response (h)"
FROM atleast_1_response ar
-- Join the CTEs together on the week
INNER JOIN no_response nr ON nr.week = ar.week
INNER JOIN max_days_without_response mdwr ON mdwr.week = ar.week
INNER JOIN avg_time_first_response atfr ON atfr.week = ar.week
-- Order by week in descending order
ORDER BY "week" DESC
These are just a few examples of the way you can use date_trunc
in your Data Explorer queries. Feel free to use any of these queries on your site, and if you have any questions please ask them below.