This is an SQL version of a Cohort Analysis Report for User Activity, for use within the Data Explorer plugin.
This report requires the Discourse Solved plugin to be enabled.
This Cohort Analysis report provides insights into user engagement by tracking the activity of users who have signed up since a specified start date. It specifically focuses on users who meet or exceed certain thresholds for posts created and solutions provided each month after their registration.
This report is valuable for administrators looking to understand the effectiveness of their community in retaining and engaging new users over time, specifically related to posting and providing solutions to topics. This report can also be useful for evaluating community health and the identifying effectiveness of community growth strategies.
Active Users per Month after Signup w Min Posts + Solutions Parameters
--[params]
-- date :start_date = 2023-01-01
-- int :min_posts_per_month = 1
-- int :min_solutions_per_month = 1
WITH user_cohorts AS (
SELECT
id AS user_id,
DATE_TRUNC('month', created_at) AS cohort,
COUNT(id) OVER (PARTITION BY DATE_TRUNC('month', created_at)) AS users_signed_up
FROM users
WHERE created_at >= :start_date
),
posts_activity AS (
SELECT
p.user_id,
EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at)) AS months_after_registration,
DATE_TRUNC('month', u.created_at) AS cohort
FROM posts p
JOIN users u ON p.user_id = u.id
WHERE p.created_at >= u.created_at
),
solutions_counts AS (
SELECT
p.user_id,
COUNT(p.user_id) as solutions_count,
EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at)) AS months_after_registration,
DATE_TRUNC('month', u.created_at) AS cohort
FROM post_custom_fields pcf
INNER JOIN posts p ON p.id = pcf.post_id
JOIN topics t ON t.id = p.topic_id
JOIN users u ON p.user_id = u.id
WHERE pcf.name = 'is_accepted_answer'
AND p.created_at >= u.created_at
GROUP BY months_after_registration, cohort, p.user_id
HAVING COUNT(p.user_id) >= :min_solutions_per_month
),
activity_counts AS (
SELECT
cohort,
months_after_registration,
COUNT(user_id) AS posts_count,
user_id
FROM posts_activity
GROUP BY cohort, months_after_registration, user_id
HAVING COUNT(user_id) >= :min_posts_per_month
),
active_users AS (
SELECT
sc.cohort,
sc.months_after_registration,
COUNT(DISTINCT ac.user_id) AS active_users
FROM solutions_counts sc
FULL JOIN activity_counts ac ON sc.user_id = ac.user_id
AND sc.months_after_registration = ac.months_after_registration
AND sc.cohort = ac.cohort
GROUP BY sc.cohort, sc.months_after_registration
),
cohorts_series AS (
SELECT generate_series AS months_after_registration
FROM generate_series(0, 11)
),
cohorts AS (
SELECT
cohort,
MAX(users_signed_up) AS users_signed_up
FROM user_cohorts
GROUP BY cohort
),
cross_join AS (
SELECT
c.cohort,
c.users_signed_up,
cs.months_after_registration
FROM cohorts c
CROSS JOIN cohorts_series cs
),
final_counts AS (
SELECT
cj.cohort,
cj.users_signed_up,
cj.months_after_registration,
COALESCE(au.active_users, 0) AS active_users
FROM cross_join cj
LEFT JOIN active_users au ON au.cohort = cj.cohort AND au.months_after_registration = cj.months_after_registration
)
SELECT
TO_CHAR(cohort, 'Mon YYYY') AS "Joined In", -- Include the year in the Joined In column
users_signed_up AS "Users Signed Up",
MAX(CASE WHEN months_after_registration = 0 THEN active_users END) AS "Month 1",
MAX(CASE WHEN months_after_registration = 1 THEN active_users END) AS "Month 2",
MAX(CASE WHEN months_after_registration = 2 THEN active_users END) AS "Month 3",
MAX(CASE WHEN months_after_registration = 3 THEN active_users END) AS "Month 4",
MAX(CASE WHEN months_after_registration = 4 THEN active_users END) AS "Month 5",
MAX(CASE WHEN months_after_registration = 5 THEN active_users END) AS "Month 6",
MAX(CASE WHEN months_after_registration = 6 THEN active_users END) AS "Month 7",
MAX(CASE WHEN months_after_registration = 7 THEN active_users END) AS "Month 8",
MAX(CASE WHEN months_after_registration = 8 THEN active_users END) AS "Month 9",
MAX(CASE WHEN months_after_registration = 9 THEN active_users END) AS "Month 10",
MAX(CASE WHEN months_after_registration = 10 THEN active_users END) AS "Month 11",
MAX(CASE WHEN months_after_registration = 11 THEN active_users END) AS "Month 12"
FROM final_counts
GROUP BY cohort, users_signed_up
ORDER BY cohort
SQL Query Explanation
Parameters
This report uses three parameters:
start_date
: The date from which to start tracking new user signups.min_posts_per_month
: The minimum number of posts a user must make in a month to be considered active.min_solutions_per_month
: The minimum number of solutions (accepted answers) a user must provide in a month to be considered active.
Users must meet both the min_posts_per_month
and min_solutions_per_month
requirements to be considered active for the month.
CTEs
The SQL query provided uses several Common Table Expressions (CTEs) to break down the process of calculating active users per month after signup, based on minimum posts and solutions criteria. Here’s an explanation of each CTE:
user_cohorts
This CTE identifies the cohorts of users based on their signup month. For each user, it calculates the month of their signup (cohort
) and counts the total number of users who signed up in that same month. This helps in understanding the initial size of each cohort.
posts_activity
This CTE tracks the activity of users in terms of posts made after their signup. For each post, it calculates how many months have passed since the user’s registration date (months_after_registration
) and groups these by the user’s signup cohort. This is used to track how active users are in terms of posting content over time.
solutions_counts
This CTE focuses on counting the solutions (accepted answers) provided by users. It filters posts marked as solutions and counts them for each user, ensuring that only posts made after the user’s signup are considered. It also calculates how many months have passed since the user’s signup for each solution. Users are included in this count only if they meet or exceed the specified minimum number of solutions per month.
activity_counts
This CTE aggregates the number of posts made by each user per month after registration. It groups users by their signup cohort and the number of months since registration, then counts the number of posts made. Only users who meet or exceed the specified minimum number of posts per month are included in this count.
active_users
This CTE combines the data from solutions_counts
and activity_counts
to identify active users—those who meet both the posts and solutions criteria. It counts distinct users who are active based on the criteria set for posts and solutions, grouped by their signup cohort and the number of months since registration.
cohorts_series
This CTE generates a series of numbers from 0 to 11, representing the number of months after registration. This is used to ensure that the final report includes data for each month up to 12 months, even if there are no active users in some months.
cohorts
This CTE aggregates the data from user_cohorts
to get the total number of users who signed up in each cohort. It ensures that the final report includes the total number of users signed up for each cohort.
cross_join
This CTE performs a cross join between the cohorts
CTE and the cohorts_series
CTE. This ensures that each cohort is represented for each month after signup, facilitating the calculation of active users for each month in the final step.
final_counts
This CTE combines all the previous CTEs to calculate the final counts of active users for each month after signup, for each cohort. It uses a left join to match active users from the active_users
CTE with the cohorts and months generated in the cross_join
CTE. It ensures that each cohort-month pair has a count of active users, defaulting to 0 if there are no active users for that pair.
Final SELECT
The final SELECT
statement in the query groups users into cohorts based on their signup month and year, and then calculates the number of active users for each month up to a year after signup. This is done through a combination of transformations and conditional aggregations, which pivot the data into a format where each row corresponds to a cohort, and each column represents the number of active users for each month after signup, from “Month 1” to “Month 12”.
Results
The report outputs a table with the following columns:
- Joined In: The month and year of the cohort (when users signed up).
- Users Signed Up: The total number of users who signed up in that cohort.
- Month 1 to Month 12: The number of active users for each month after signup, up to 12 months.
Example Results
Joined In | Users Signed Up | Month 1 | Month 2 | Month 3 | Month 4 | Month 5 | Month 6 | Month 7 | Month 8 | Month 9 | Month 10 | Month 11 | Month 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jan 2023 | 50 | 40 | 8 | 4 | 3 | 3 | 3 | 4 | 3 | 2 | 1 | 1 | 4 |
Feb 2023 | 63 | 40 | 7 | 5 | 3 | 2 | 2 | 7 | 2 | 2 | 2 | 1 | 1 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … |
The full results from the report will output a years worth of data after the start_date