Retrospectively pulling number of users each calendar month

Hello! We are using the cloud-hosted version of Discourse and have been running our platform for about 1.5 years. We would like to pull some historical metrics on platform adoption and use. We are already looking at the ‘Signups’ report available in the Admin Reports tab, but we also want to be able to show the total number of active users each month. Does anyone have insight on how I could do this? I can’t seem to find anything appropriate in Reports or Data Explorer.

This data explorer query will return unique monthly user visits. Is that what you want?

--[params]
-- date :start_date = 2021-01-01
-- date :end_date = 2021-12-31
-- string_list :group_name = staff,admins

WITH not_allowed_groups AS (
    SELECT gu.user_id FROM group_users gu
    INNER JOIN groups g ON (g.id = gu.group_id)
    WHERE g.name IN (:group_name) 
)

SELECT 
    to_char(date_trunc('month', uv.visited_at)::DATE,'YYYY-MM') AS "month",
    date_trunc('month', uv.visited_at)::DATE AS date_user_visits,
    COUNT(distinct uv.user_id) AS qtt_user_visits
FROM user_visits uv
WHERE
    uv.user_id NOT IN (SELECT user_id FROM not_allowed_groups)
    AND uv.visited_at::DATE BETWEEN :start_date AND :end_date
GROUP BY date_trunc('month', uv.visited_at)::DATE
ORDER BY date_trunc('month', uv.visited_at)::DATE

Thank you very much but not exactly… We would like the number of users that are active (signed up and not deactivated or deleted) at the end of each calendar month, regardless of whether they visited the platform that month or not.

We are looking at the plot of signups, but the sum of signups during a given month doesn’t reflect our user base because each month we also have automatic deactivations due to inactivity. I wonder if the only way is to access retrospective user list exports for specific dates? But because we are Cloud-hosted I don’t know if we have access to that. Would we need to request from Discourse?

I am Discourse. :slight_smile:

You can either ask here or email us directly on team@discourse.org. Someone with better SQL knowledge will be able to help.

4 Likes

Didn’t realise as I’m new to this forum, thank you! Do you happen to have stored historical backups of cloud-hosted sites that you could provide upon request?

1 Like

We store backups for business continuity purposes and they can be restored in cases of emergency, but not really for other reasons.

1 Like

Ok, I have another idea. I asked our AI helper.

SQL Query for Active Users by Month AI

HAWK:

Write me a data explorer query which returns a list of user accounts that are still active and were registered within a given month

DiscourseHelper:

Searching

Searching for: Discourse Data Explorer query to find active users registered within a specific month

Reading

Read: Active users for specific months

To create a Data Explorer query that returns a list of user accounts that are still active and were registered within a specific month, you can use the following SQL query. This query assumes that an active user is someone who has made a post or reply within a given timeframe.

--[params]
-- int :month = 5
-- int :year = 2023

SELECT u.id, u.username, u.created_at, u.last_seen_at
FROM users u
WHERE u.active = TRUE
  AND EXTRACT(MONTH FROM u.created_at) = :month
  AND EXTRACT(YEAR FROM u.created_at) = :year
  AND u.last_seen_at IS NOT NULL -- Assuming active means they accessed the site recently
ORDER BY u.created_at

Explanation of the Query:

  • :month and :year are parameters you can set to filter for the specific month and year.
  • u.active = TRUE ensures that the user account is still active.
  • EXTRACT(MONTH FROM u.created_at) = :month and EXTRACT(YEAR FROM u.created_at) = :year filter the registration date to the specified month and year.
  • u.last_seen_at IS NOT NULL considers the user active if they have logged in at any time.
  • The result provides user ID, username, registration date, and last-seen date.

Feel free to adjust the parameters and fields according to your needs. Let me know if you need any more assistance!

Feel free to give it further context if I’m missing something.

3 Likes

This is an awesome query explanation, thank you! However, I’m not sure if it’s what we are looking for unfortunately.

For example, if I set parameters 01 and 2024 I am looking at January this year.
I understand that the output will include all the users who signed up in January who are still active today. Is that correct?

But what we want is the number of people who had u.active=TRUE each month since launch (not right now). Then we can plot and show how our user base has changed from month to month. At the end of every month, for us this would be:

Active users = Previous month’s value + new signups - new deletions - new deactivations

I don’t see anything about deactivations or dates being tied to the active variable in the Data Explorer. The most likely course then, may be to use the staff action logs and analyse by date BUT I don’t think that the export function is working for us. When we export staff action log results we strangely get a spreadsheet with the content of different posts, not a table of staff actions. If you guys could help us with this, maybe that would be an alternative solution! :slight_smile: Are you aware of others having the same issue?

An interesting question.

I first took a look at the example here. But that does ignore deleted users. You only get the number of users who were registered at that time and still are, not those who have been deleted in the meantime.
My idea was therefore to take the ID of the user who last registered in the month. This is the maximum possible number of users at that time. The number of deleted users can then be subtracted from this. However, bot accounts (such as forum-helper) have a negative ID, but are counted if they are deleted. (But this is probably a minor deviation). My query was:

-- [params]
-- date :start_date
-- date :end_date


WITH month_dates AS (
    -- Generate end-of-month dates between the start and end date
    SELECT DATE_TRUNC('month', generate_series)::date + INTERVAL '1 month' - INTERVAL '1 day' AS month_end
    FROM generate_series(:start_date::date, :end_date::date, '1 month'::interval)
),
recent_user AS (
    -- For each end-of-month date, find the most recent user created before that date
    SELECT md.month_end,
           (SELECT id 
            FROM users u 
            WHERE u.created_at < md.month_end 
            ORDER BY u.created_at DESC 
            LIMIT 1) AS user_max_id
    FROM month_dates md
),
cumulative_deletion_count AS (
    -- Calculate the cumulative deletions up to each end-of-month date
    SELECT md.month_end,
           (SELECT COUNT(*)
            FROM user_histories uh
            WHERE uh.action = 1 AND uh.updated_at < md.month_end) AS deletions_count
    FROM month_dates md
)
SELECT 
    md.month_end,
    ru.user_max_id,
    cdc.deletions_count,
    ru.user_max_id - cdc.deletions_count AS number_of_users
FROM 
    month_dates md
LEFT JOIN recent_user ru ON md.month_end = ru.month_end
LEFT JOIN cumulative_deletion_count cdc ON md.month_end = cdc.month_end
ORDER BY md.month_end

But what it doesn’t take into account is (de)activation, which are also stored in the user_histories table. But maybe it will help you as a starting point.

5 Likes

Hey @marvi,
Just wondering if @Moin’s answer works for you?