This is an SQL version of a Cohort Analysis Report for User Activity, for use within the Data Explorer plugin.
The Cohort Analysis Report is designed to provide administrators with insights into user engagement over time. By analyzing the activity of users grouped by their registration month (cohorts), this report tracks the number of active users each month post-registration who meet a minimum posting activity criteria.
This report can be a valuable resource for understanding user retention, engagement trends, evaluating community health, and the identifying effectiveness of community growth strategies.
Cohort Analysis Report - Active Users per Month
--[params]
-- date :start_date = 2023-01-01
-- int :min_posts_per_month = 1
WITH user_cohorts AS (
SELECT
id AS user_id,
DATE_TRUNC('month', created_at) AS cohort,
COUNT(id) OVER (PARTITION BY DATE_TRUNC('month', created_at)) AS users_signed_up
FROM users
WHERE created_at >= :start_date -- Use the start_date parameter to filter users
),
posts_activity AS (
SELECT
p.user_id,
EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at)) AS months_after_registration,
DATE_TRUNC('month', u.created_at) AS cohort
FROM posts p
JOIN users u ON p.user_id = u.id
WHERE p.created_at >= u.created_at
),
activity_counts AS (
SELECT
cohort,
months_after_registration,
COUNT(user_id) AS posts_count,
user_id
FROM posts_activity
GROUP BY cohort, months_after_registration, user_id
HAVING COUNT(user_id) >= :min_posts_per_month -- Filter users by the minimum posts per month
),
active_users AS (
SELECT
cohort,
months_after_registration,
COUNT(DISTINCT user_id) AS active_users
FROM activity_counts
GROUP BY cohort, months_after_registration
),
cohorts_series AS (
SELECT generate_series AS months_after_registration
FROM generate_series(0, 11)
),
cohorts AS (
SELECT
cohort,
MAX(users_signed_up) AS users_signed_up -- Aggregate to get the total users signed up for each cohort
FROM user_cohorts
GROUP BY cohort
),
cross_join AS (
SELECT
c.cohort,
c.users_signed_up,
cs.months_after_registration
FROM cohorts c
CROSS JOIN cohorts_series cs
),
final_counts AS (
SELECT
cj.cohort,
cj.users_signed_up,
cj.months_after_registration,
COALESCE(au.active_users, 0) AS active_users
FROM cross_join cj
LEFT JOIN active_users au ON au.cohort = cj.cohort AND au.months_after_registration = cj.months_after_registration
)
SELECT
TO_CHAR(cohort, 'Mon YYYY') AS "Joined In", -- Include the year in the Joined In column
users_signed_up AS "Users Signed Up",
MAX(CASE WHEN months_after_registration = 0 THEN active_users END) AS "Month 1",
MAX(CASE WHEN months_after_registration = 1 THEN active_users END) AS "Month 2",
MAX(CASE WHEN months_after_registration = 2 THEN active_users END) AS "Month 3",
MAX(CASE WHEN months_after_registration = 3 THEN active_users END) AS "Month 4",
MAX(CASE WHEN months_after_registration = 4 THEN active_users END) AS "Month 5",
MAX(CASE WHEN months_after_registration = 5 THEN active_users END) AS "Month 6",
MAX(CASE WHEN months_after_registration = 6 THEN active_users END) AS "Month 7",
MAX(CASE WHEN months_after_registration = 7 THEN active_users END) AS "Month 8",
MAX(CASE WHEN months_after_registration = 8 THEN active_users END) AS "Month 9",
MAX(CASE WHEN months_after_registration = 9 THEN active_users END) AS "Month 10",
MAX(CASE WHEN months_after_registration = 10 THEN active_users END) AS "Month 11",
MAX(CASE WHEN months_after_registration = 11 THEN active_users END) AS "Month 12"
FROM final_counts
GROUP BY cohort, users_signed_up
ORDER BY cohort
SQL Query Explanation
The report operates by segmenting users into cohorts based on the month they joined. It then tracks these cohorts to see how many users remain active in subsequent months, based on a defined minimum number of posts per month.
Parameters
This report has two parameters:
start_date
: The beginning date from which users are considered for cohort analysis. Users who joined after this date are included in the report.min_posts_per_month
: The minimum number of posts a user must make in a month to be considered active for that month.
CTEs
The Cohort Analysis Report utilizes several Common Table Expressions (CTEs) to organize and process data for analysis. Each CTE serves a specific purpose in the overall query, building upon the previous ones to ultimately produce the final report. Here’s a breakdown of how each CTE operates:
1. user_cohorts
This CTE identifies the cohorts based on the month users joined. For each user, it calculates the cohort they belong to by truncating their created_at
timestamp to the month. It also counts the number of users who signed up in each cohort.
- Key Operations:
DATE_TRUNC('month', created_at) AS cohort
: Truncates thecreated_at
timestamp to month granularity, effectively grouping users by their sign-up month.COUNT(id) OVER (PARTITION BY DATE_TRUNC('month', created_at))
: Counts the number of users in each cohort.
2. posts_activity
This CTE tracks the posting activity of users relative to their registration date. It joins the posts
and users
tables to associate each post with the user who made it and calculates how many months have passed since the user’s registration at the time of each post.
- Key Operations:
EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at))
: Calculates the number of months that have passed since the user’s registration for each post.DATE_TRUNC('month', u.created_at) AS cohort
: Identifies the user’s cohort based on their registration month.
3. activity_counts
This CTE aggregates the posting activity from posts_activity
to count the number of posts each user made in each month after registration. It filters these counts to include only users who meet the minimum posting activity specified by the min_posts_per_month
parameter.
- Key Operations:
GROUP BY cohort, months_after_registration, user_id
: Groups the data by cohort, months after registration, and user ID to prepare for counting posts.HAVING COUNT(user_id) >= :min_posts_per_month
: Filters the grouped data to include only users who made at least the minimum number of posts in a month.
4. active_users
This CTE further aggregates the data from activity_counts
to count the number of distinct active users in each cohort for each month after registration.
- Key Operations:
COUNT(DISTINCT user_id) AS active_users
: Counts the number of unique active users in each cohort for each month after registration.
5. cohorts_series
This CTE generates a series of integers from 0 to 11, representing the months after registration. This series is used to ensure that the final report includes all months up to 12 for each cohort, even if there’s no activity data for some months.
- Key Operations:
generate_series(0, 11)
: Generates a series of integers from 0 to 11.
6. cohorts
This CTE aggregates the data from user_cohorts
to get the total number of users signed up for each cohort.
- Key Operations:
MAX(users_signed_up) AS users_signed_up
: Aggregates the total number of users signed up for each cohort.
7. cross_join
This CTE performs a cross join between cohorts
and cohorts_series
to create a grid of all possible combinations of cohorts and months after registration. This ensures that the final report includes rows for each month for each cohort, facilitating the calculation of active users per month.
8. final_counts
This CTE combines the data from cross_join
and active_users
to calculate the final count of active users for each cohort for each month after registration. It uses a left join to ensure that all combinations of cohorts and months are included, even if there are no active users for some.
- Key Operations:
COALESCE(au.active_users, 0) AS active_users
: Ensures that the report shows 0 active users for combinations without any activity, rather than leaving them blank.
The final SELECT
statement outside of the CTEs then formats and presents this data, showing the number of users signed up and the number of active users for each month after registration for each cohort.
Results
The report generates a table with the following columns:
- Joined In: The month and year the cohort was created, indicating when these users signed up.
- Users Signed Up: The total number of users who signed up in that cohort.
- Month 1 to Month 12: Each of these columns represents the number of active users for the cohort in each subsequent month after joining, up to 12 months. An active user is defined as someone who has made at least the minimum number of posts specified by the
min_posts_per_month
parameter.
Example Results
Joined In | Users Signed Up | Month 1 | Month 2 | Month 3 | Month 4 | Month 5 | Month 6 | Month 7 | Month 8 | Month 9 | Month 10 | Month 11 | Month 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jan 2023 | 120 | 40 | 8 | 4 | 3 | 3 | 3 | 4 | 3 | 2 | 1 | 1 | 4 |
Feb 2023 | 119 | 40 | 7 | 5 | 3 | 2 | 2 | 7 | 2 | 2 | 2 | 1 | 1 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … |
The full results from the report will output a years worth of data after the start_date