Cohort Analysis Report - Monthly User Activity

This is an SQL version of a Cohort Analysis Report for User Activity, for use within the Data Explorer plugin.

The Cohort Analysis Report is designed to provide administrators with insights into user engagement over time. By analyzing the activity of users grouped by their registration month (cohorts), this report tracks the number of active users each month post-registration who meet a minimum posting activity criteria.

This report can be a valuable resource for understanding user retention, engagement trends, evaluating community health, and the identifying effectiveness of community growth strategies.

Cohort Analysis Report - Active Users per Month

--[params]
-- date :start_date = 2023-01-01
-- int :min_posts_per_month = 1

WITH user_cohorts AS (
    SELECT
        id AS user_id,
        DATE_TRUNC('month', created_at) AS cohort,
        COUNT(id) OVER (PARTITION BY DATE_TRUNC('month', created_at)) AS users_signed_up
    FROM users
    WHERE created_at >= :start_date -- Use the start_date parameter to filter users
),
posts_activity AS (
    SELECT
        p.user_id,
        EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at)) AS months_after_registration,
        DATE_TRUNC('month', u.created_at) AS cohort
    FROM posts p
    JOIN users u ON p.user_id = u.id
    WHERE p.created_at >= u.created_at
),
activity_counts AS (
    SELECT
        cohort,
        months_after_registration,
        COUNT(user_id) AS posts_count,
        user_id
    FROM posts_activity
    GROUP BY cohort, months_after_registration, user_id
    HAVING COUNT(user_id) >= :min_posts_per_month -- Filter users by the minimum posts per month
),
active_users AS (
    SELECT
        cohort,
        months_after_registration,
        COUNT(DISTINCT user_id) AS active_users
    FROM activity_counts
    GROUP BY cohort, months_after_registration
),
cohorts_series AS (
    SELECT generate_series AS months_after_registration
    FROM generate_series(0, 11)
),
cohorts AS (
    SELECT
        cohort,
        MAX(users_signed_up) AS users_signed_up -- Aggregate to get the total users signed up for each cohort
    FROM user_cohorts
    GROUP BY cohort
),
cross_join AS (
    SELECT
        c.cohort,
        c.users_signed_up,
        cs.months_after_registration
    FROM cohorts c
    CROSS JOIN cohorts_series cs
),
final_counts AS (
    SELECT
        cj.cohort,
        cj.users_signed_up,
        cj.months_after_registration,
        COALESCE(au.active_users, 0) AS active_users
    FROM cross_join cj
    LEFT JOIN active_users au ON au.cohort = cj.cohort AND au.months_after_registration = cj.months_after_registration
)
SELECT
    TO_CHAR(cohort, 'Mon YYYY') AS "Joined In", -- Include the year in the Joined In column
    users_signed_up AS "Users Signed Up",
    MAX(CASE WHEN months_after_registration = 0 THEN active_users END) AS "Month 1",
    MAX(CASE WHEN months_after_registration = 1 THEN active_users END) AS "Month 2",
    MAX(CASE WHEN months_after_registration = 2 THEN active_users END) AS "Month 3",
    MAX(CASE WHEN months_after_registration = 3 THEN active_users END) AS "Month 4",
    MAX(CASE WHEN months_after_registration = 4 THEN active_users END) AS "Month 5",
    MAX(CASE WHEN months_after_registration = 5 THEN active_users END) AS "Month 6",
    MAX(CASE WHEN months_after_registration = 6 THEN active_users END) AS "Month 7",
    MAX(CASE WHEN months_after_registration = 7 THEN active_users END) AS "Month 8",
    MAX(CASE WHEN months_after_registration = 8 THEN active_users END) AS "Month 9",
    MAX(CASE WHEN months_after_registration = 9 THEN active_users END) AS "Month 10",
    MAX(CASE WHEN months_after_registration = 10 THEN active_users END) AS "Month 11",
    MAX(CASE WHEN months_after_registration = 11 THEN active_users END) AS "Month 12"
FROM final_counts
GROUP BY cohort, users_signed_up
ORDER BY cohort

SQL Query Explanation

The report operates by segmenting users into cohorts based on the month they joined. It then tracks these cohorts to see how many users remain active in subsequent months, based on a defined minimum number of posts per month.

Parameters

This report has two parameters:

  • start_date: The beginning date from which users are considered for cohort analysis. Users who joined after this date are included in the report.
  • min_posts_per_month: The minimum number of posts a user must make in a month to be considered active for that month.

CTEs

The Cohort Analysis Report utilizes several Common Table Expressions (CTEs) to organize and process data for analysis. Each CTE serves a specific purpose in the overall query, building upon the previous ones to ultimately produce the final report. Here’s a breakdown of how each CTE operates:

1. user_cohorts

This CTE identifies the cohorts based on the month users joined. For each user, it calculates the cohort they belong to by truncating their created_at timestamp to the month. It also counts the number of users who signed up in each cohort.

  • Key Operations:
    • DATE_TRUNC('month', created_at) AS cohort: Truncates the created_at timestamp to month granularity, effectively grouping users by their sign-up month.
    • COUNT(id) OVER (PARTITION BY DATE_TRUNC('month', created_at)): Counts the number of users in each cohort.

2. posts_activity

This CTE tracks the posting activity of users relative to their registration date. It joins the posts and users tables to associate each post with the user who made it and calculates how many months have passed since the user’s registration at the time of each post.

  • Key Operations:
    • EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at)): Calculates the number of months that have passed since the user’s registration for each post.
    • DATE_TRUNC('month', u.created_at) AS cohort: Identifies the user’s cohort based on their registration month.

3. activity_counts

This CTE aggregates the posting activity from posts_activity to count the number of posts each user made in each month after registration. It filters these counts to include only users who meet the minimum posting activity specified by the min_posts_per_month parameter.

  • Key Operations:
    • GROUP BY cohort, months_after_registration, user_id: Groups the data by cohort, months after registration, and user ID to prepare for counting posts.
    • HAVING COUNT(user_id) >= :min_posts_per_month: Filters the grouped data to include only users who made at least the minimum number of posts in a month.

4. active_users

This CTE further aggregates the data from activity_counts to count the number of distinct active users in each cohort for each month after registration.

  • Key Operations:
    • COUNT(DISTINCT user_id) AS active_users: Counts the number of unique active users in each cohort for each month after registration.

5. cohorts_series

This CTE generates a series of integers from 0 to 11, representing the months after registration. This series is used to ensure that the final report includes all months up to 12 for each cohort, even if there’s no activity data for some months.

  • Key Operations:
    • generate_series(0, 11): Generates a series of integers from 0 to 11.

6. cohorts

This CTE aggregates the data from user_cohorts to get the total number of users signed up for each cohort.

  • Key Operations:
    • MAX(users_signed_up) AS users_signed_up: Aggregates the total number of users signed up for each cohort.

7. cross_join

This CTE performs a cross join between cohorts and cohorts_series to create a grid of all possible combinations of cohorts and months after registration. This ensures that the final report includes rows for each month for each cohort, facilitating the calculation of active users per month.

8. final_counts

This CTE combines the data from cross_join and active_users to calculate the final count of active users for each cohort for each month after registration. It uses a left join to ensure that all combinations of cohorts and months are included, even if there are no active users for some.

  • Key Operations:
    • COALESCE(au.active_users, 0) AS active_users: Ensures that the report shows 0 active users for combinations without any activity, rather than leaving them blank.

The final SELECT statement outside of the CTEs then formats and presents this data, showing the number of users signed up and the number of active users for each month after registration for each cohort.

Results

The report generates a table with the following columns:

  • Joined In: The month and year the cohort was created, indicating when these users signed up.
  • Users Signed Up: The total number of users who signed up in that cohort.
  • Month 1 to Month 12: Each of these columns represents the number of active users for the cohort in each subsequent month after joining, up to 12 months. An active user is defined as someone who has made at least the minimum number of posts specified by the min_posts_per_month parameter.

Example Results

Joined In Users Signed Up Month 1 Month 2 Month 3 Month 4 Month 5 Month 6 Month 7 Month 8 Month 9 Month 10 Month 11 Month 12
Jan 2023 120 40 8 4 3 3 3 4 3 2 1 1 4
Feb 2023 119 40 7 5 3 2 2 7 2 2 2 1 1

The full results from the report will output a years worth of data after the start_date

3 Likes