Cohort Analysis Report - Monthly User Activity by Posts and Solutions

This is an SQL version of a Cohort Analysis Report for User Activity, for use within the Data Explorer plugin.

:discourse: This report requires the Discourse Solved plugin to be enabled.

This Cohort Analysis report provides insights into user engagement by tracking the activity of users who have signed up since a specified start date. It specifically focuses on users who meet or exceed certain thresholds for posts created and solutions provided each month after their registration.

This report is valuable for administrators looking to understand the effectiveness of their community in retaining and engaging new users over time, specifically related to posting and providing solutions to topics. This report can also be useful for evaluating community health and the identifying effectiveness of community growth strategies.

Active Users per Month after Signup w Min Posts + Solutions Parameters

--[params]
-- date :start_date = 2023-01-01
-- int :min_posts_per_month = 1
-- int :min_solutions_per_month = 1

WITH user_cohorts AS (
    SELECT
        id AS user_id,
        DATE_TRUNC('month', created_at) AS cohort,
        COUNT(id) OVER (PARTITION BY DATE_TRUNC('month', created_at)) AS users_signed_up
    FROM users
    WHERE created_at >= :start_date 
),
posts_activity AS (
    SELECT
        p.user_id,
        EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at)) AS months_after_registration,
        DATE_TRUNC('month', u.created_at) AS cohort
    FROM posts p
    JOIN users u ON p.user_id = u.id
    WHERE p.created_at >= u.created_at
),
solutions_counts AS (
    SELECT
        p.user_id,
        COUNT(p.user_id) as solutions_count,
        EXTRACT(YEAR FROM AGE(p.created_at, u.created_at)) * 12 + EXTRACT(MONTH FROM AGE(p.created_at, u.created_at)) AS months_after_registration,
        DATE_TRUNC('month', u.created_at) AS cohort
    FROM post_custom_fields pcf
    INNER JOIN posts p ON p.id = pcf.post_id
    JOIN topics t ON t.id = p.topic_id
    JOIN users u ON p.user_id = u.id
    WHERE pcf.name = 'is_accepted_answer'
    AND p.created_at >= u.created_at
    GROUP BY months_after_registration, cohort, p.user_id
    HAVING COUNT(p.user_id) >= :min_solutions_per_month
),
activity_counts AS (
    SELECT
        cohort,
        months_after_registration,
        COUNT(user_id) AS posts_count,
        user_id
    FROM posts_activity
    GROUP BY cohort, months_after_registration, user_id
    HAVING COUNT(user_id) >= :min_posts_per_month 
),
active_users AS (
    SELECT
        sc.cohort,
        sc.months_after_registration,
        COUNT(DISTINCT ac.user_id) AS active_users
    FROM solutions_counts sc
    FULL JOIN activity_counts ac ON sc.user_id = ac.user_id
    AND sc.months_after_registration = ac.months_after_registration
    AND sc.cohort = ac.cohort
    GROUP BY sc.cohort, sc.months_after_registration
),
cohorts_series AS (
    SELECT generate_series AS months_after_registration
    FROM generate_series(0, 11)
),
cohorts AS (
    SELECT
        cohort,
        MAX(users_signed_up) AS users_signed_up 
    FROM user_cohorts
    GROUP BY cohort
),
cross_join AS (
    SELECT
        c.cohort,
        c.users_signed_up,
        cs.months_after_registration
    FROM cohorts c
    CROSS JOIN cohorts_series cs
),
final_counts AS (
    SELECT
        cj.cohort,
        cj.users_signed_up,
        cj.months_after_registration,
        COALESCE(au.active_users, 0) AS active_users
    FROM cross_join cj
    LEFT JOIN active_users au ON au.cohort = cj.cohort AND au.months_after_registration = cj.months_after_registration
)
SELECT
    TO_CHAR(cohort, 'Mon YYYY') AS "Joined In", -- Include the year in the Joined In column
    users_signed_up AS "Users Signed Up",
    MAX(CASE WHEN months_after_registration = 0 THEN active_users END) AS "Month 1",
    MAX(CASE WHEN months_after_registration = 1 THEN active_users END) AS "Month 2",
    MAX(CASE WHEN months_after_registration = 2 THEN active_users END) AS "Month 3",
    MAX(CASE WHEN months_after_registration = 3 THEN active_users END) AS "Month 4",
    MAX(CASE WHEN months_after_registration = 4 THEN active_users END) AS "Month 5",
    MAX(CASE WHEN months_after_registration = 5 THEN active_users END) AS "Month 6",
    MAX(CASE WHEN months_after_registration = 6 THEN active_users END) AS "Month 7",
    MAX(CASE WHEN months_after_registration = 7 THEN active_users END) AS "Month 8",
    MAX(CASE WHEN months_after_registration = 8 THEN active_users END) AS "Month 9",
    MAX(CASE WHEN months_after_registration = 9 THEN active_users END) AS "Month 10",
    MAX(CASE WHEN months_after_registration = 10 THEN active_users END) AS "Month 11",
    MAX(CASE WHEN months_after_registration = 11 THEN active_users END) AS "Month 12"
FROM final_counts
GROUP BY cohort, users_signed_up
ORDER BY cohort

SQL Query Explanation

Parameters

This report uses three parameters:

  • start_date: The date from which to start tracking new user signups.
  • min_posts_per_month: The minimum number of posts a user must make in a month to be considered active.
  • min_solutions_per_month: The minimum number of solutions (accepted answers) a user must provide in a month to be considered active.

Users must meet both the min_posts_per_month and min_solutions_per_month requirements to be considered active for the month.

CTEs

The SQL query provided uses several Common Table Expressions (CTEs) to break down the process of calculating active users per month after signup, based on minimum posts and solutions criteria. Here’s an explanation of each CTE:

user_cohorts

This CTE identifies the cohorts of users based on their signup month. For each user, it calculates the month of their signup (cohort) and counts the total number of users who signed up in that same month. This helps in understanding the initial size of each cohort.

posts_activity

This CTE tracks the activity of users in terms of posts made after their signup. For each post, it calculates how many months have passed since the user’s registration date (months_after_registration) and groups these by the user’s signup cohort. This is used to track how active users are in terms of posting content over time.

solutions_counts

This CTE focuses on counting the solutions (accepted answers) provided by users. It filters posts marked as solutions and counts them for each user, ensuring that only posts made after the user’s signup are considered. It also calculates how many months have passed since the user’s signup for each solution. Users are included in this count only if they meet or exceed the specified minimum number of solutions per month.

activity_counts

This CTE aggregates the number of posts made by each user per month after registration. It groups users by their signup cohort and the number of months since registration, then counts the number of posts made. Only users who meet or exceed the specified minimum number of posts per month are included in this count.

active_users

This CTE combines the data from solutions_counts and activity_counts to identify active users—those who meet both the posts and solutions criteria. It counts distinct users who are active based on the criteria set for posts and solutions, grouped by their signup cohort and the number of months since registration.

cohorts_series

This CTE generates a series of numbers from 0 to 11, representing the number of months after registration. This is used to ensure that the final report includes data for each month up to 12 months, even if there are no active users in some months.

cohorts

This CTE aggregates the data from user_cohorts to get the total number of users who signed up in each cohort. It ensures that the final report includes the total number of users signed up for each cohort.

cross_join

This CTE performs a cross join between the cohorts CTE and the cohorts_series CTE. This ensures that each cohort is represented for each month after signup, facilitating the calculation of active users for each month in the final step.

final_counts

This CTE combines all the previous CTEs to calculate the final counts of active users for each month after signup, for each cohort. It uses a left join to match active users from the active_users CTE with the cohorts and months generated in the cross_join CTE. It ensures that each cohort-month pair has a count of active users, defaulting to 0 if there are no active users for that pair.

Final SELECT

The final SELECT statement in the query groups users into cohorts based on their signup month and year, and then calculates the number of active users for each month up to a year after signup. This is done through a combination of transformations and conditional aggregations, which pivot the data into a format where each row corresponds to a cohort, and each column represents the number of active users for each month after signup, from “Month 1” to “Month 12”.

Results

The report outputs a table with the following columns:

  • Joined In: The month and year of the cohort (when users signed up).
  • Users Signed Up: The total number of users who signed up in that cohort.
  • Month 1 to Month 12: The number of active users for each month after signup, up to 12 months.

Example Results

Joined In Users Signed Up Month 1 Month 2 Month 3 Month 4 Month 5 Month 6 Month 7 Month 8 Month 9 Month 10 Month 11 Month 12
Jan 2023 50 40 8 4 3 3 3 4 3 2 1 1 4
Feb 2023 63 40 7 5 3 2 2 7 2 2 2 1 1
… … … … … … … … … … … … … …

The full results from the report will output a years worth of data after the start_date

3 Likes