Help finding some monthly site stats about topics and user trust levels


(Tobias Eigen) #1

I’d like to track the following stats on a monthly basis and am finding them hard to find retroactively. I’d be grateful for any advice on how to proceed - maybe others would also be interested.

  1. Number of topics started this month, with participation of at least one member who is not in a certain group.

  2. Number of members in TL1 group this month.

  3. Number of members in TL2 group this month.


(Dean Taylor) #2

EDIT

Note there is a 2nd version of this query that doesn’t use badges to get results and includes TL3 numbers here.


For #2 and #3

Well badges would be a good thing to look at because they are “nearly” assigned at the point the user obtains that trust level.

Had a quick bash at a query…
… assumes the TL1 badge is called 'Basic User' and TL2 is called 'Member'.

Data Explorer: count-new-tl1-or-tl2-users-past-12-months.dcquery.json (938 Bytes)

SELECT
  *
FROM  (
  SELECT
    to_char(date(day),'YYYY-MM') as year_month
  FROM
    generate_series(
      ( date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' ),
      CURRENT_DATE,
      interval '1 month'
    ) AS day
) AS d
LEFT JOIN (
  SELECT
    to_char(date(granted_at),'YYYY-MM') as year_month,
    COUNT(CASE WHEN badges.name = 'Basic User' THEN 1 ELSE NULL END) AS "TL1",
    COUNT(CASE WHEN badges.name = 'Member' THEN 1 ELSE NULL END) AS "TL2"
  FROM user_badges
  JOIN badges ON badges.id = badge_id
  WHERE
    granted_at > ( date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' )
    AND ( badges.name = 'Basic User' OR badges.name = 'Member' )
  GROUP BY
      year_month
) AS t USING (year_month)
ORDER BY
    year_month

Which looks a little something like this:

But just with more rows.

You might want to note that a user can be counted in both TL1 and TL2 in a single month if they obtained that level in that month.


(Tobias Eigen) #3

I like it - thanks much for pulling this together.

TL badges were turned off on my site because we didn’t (and still don’t) like the having multiple trust level badges cluttering user cards so I won’t have the back history, unfortunately. But I turned them on now to get this useful data so will at least have a baseline to start with.


(Mittineague) #4

You still have the Trust Level groups, no?

I don’t have the time to mess with the query now, but I think you should be able to get what you want from tables other than the badges table. eg.
users.trust_level
groups.id - groups.user_count


(Tobias Eigen) #5

Thanks for the suggestion - but do the trust level groups keep track of when they were acquired? If so that would work. With badges we can see when they were granted.

Another thought: instead of 1 above, I could also benefit from tracking the number of posts every month but excluding posts in private staff-only categories. Perhaps that’s easier to get. In fact, it would be great to be able to filter by user on the /admin/reports/posts stats page like you can with preview digest.


(Mittineague) #6

For that I think a JOIN to the group_users table to get created_at / updated_at would work


(Dean Taylor) #7

Here is a “Topic Participation” query, someone else is going to want to check it though.

Data Explorer: topic-participation.dcquery (v4).json (7.9 KB)

  • EDIT #1 Updated to handle including rows for month when data doesn’t exist.
  • EDIT #2 Updated to check for responses within X days (default 15)
  • EDIT #3 Updated to:
  • include posts information
  • separated exclusions for “topics created by user from group” and “posts created by user from group”
  • impersonate a particular users view of topics - use an admin user to see everything.
  • optionally restrict results to that month (-1) for responses or check for replies within X days.
  • support output of response rate as percentage ready for Excel.
  • WARNING: You might want to look further down this thread to see if anybody has sanity checked the query.

Parameters

  • impersonate
  • Username to view of topics as - use an admin username like system user to see everything.
  • exclude_groups_topics
  • list of user groups (separated by a comma) to exclude topics from
  • enter a single comma (,) to not exclude any.
  • if the group doesn’t exist, no exclusion will occur (check your spelling)
  • exclude_groups_replies
  • list of user groups (separated by a comma) to exclude replies from
  • enter a single comma (,) to not exclude any.
  • if the group doesn’t exist, no exclusion will occur.
  • exclude_category_slugs
  • list of category slugs (separated by a comma) to exclude topics from
  • enter a single comma (,) to not exclude any.
  • reply_within_x_days
  • Set to -1 to restrict responses check to that month (default).
  • Set to 0 to restrict responses to the same day topic was created.
  • Set to 15 to count as response up to 15 day after initial topic creation even if 15 days is outside of that month.
  • pct_multiplier and pct_places
  • Set pct_multiplier to 1 and pct_places to 6 to get percent values ready for Excel.
  • Set pct_multiplier to 100 and pct_places to 1 to display as readable percentage values (default).
-- [params]
-- string :impersonate = system
-- string :exclude_groups_topics = staff
-- string :exclude_groups_replies = ,
-- string :exclude_category_slugs = staff,dev,simply-thank-you,was-it-you,wanted
-- int :reply_within_x_days = -1
-- int :pct_multiplier = 100
-- int :pct_places = 1



WITH
year_months AS (
  SELECT
    to_char(date(day),'YYYY-MM') as year_month
  FROM
    generate_series(
      (date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' ),
      CURRENT_DATE,
      interval '1 month'
    ) AS day
),
impersonated_usernames AS (
 SELECT unnest(string_to_array( :impersonate, ',')) AS username
),
impersonated_users AS (
 SELECT id
    FROM users AS u
    JOIN impersonated_usernames iun ON iun.username = u.username
    WHERE iun.username = u.username
),
excluded_group_names_topics AS (
 SELECT unnest(string_to_array( :exclude_groups_topics, ',')) AS name
),
excluded_group_names_replies AS (
 SELECT unnest(string_to_array( :exclude_groups_replies, ',')) AS name
),
excluded_users_topics AS (
    SELECT user_id AS id
    FROM users AS u
    JOIN group_users AS gu ON gu.user_id = u.id
    JOIN groups AS g ON g.id = gu.group_id
    JOIN excluded_group_names_topics eg ON eg.name = g.name
    WHERE g.name = eg.name
),
excluded_users_replies AS (
    SELECT user_id AS id
    FROM users AS u
    JOIN group_users AS gu ON gu.user_id = u.id
    JOIN groups AS g ON g.id = gu.group_id
    JOIN excluded_group_names_replies eg ON eg.name = g.name
    WHERE g.name = eg.name
),
excluded_category_slugs AS (
 SELECT unnest(string_to_array( :exclude_category_slugs, ',')) AS slug
),
excluded_categories AS (
    SELECT id
    FROM categories AS c
    JOIN excluded_category_slugs ec ON ec.slug = c.slug
    WHERE ec.slug = c.slug
),
restricted_topics AS (
  SELECT
    t.id,
    t.created_at,
    t.user_id
  FROM
    topics AS t
  JOIN users u on u.id IN (SELECT id FROM impersonated_users)
  JOIN user_stats AS us ON us.user_id = u.id
  JOIN user_options AS uo ON uo.user_id = u.id
  JOIN categories c ON c.id = t.category_id
  LEFT JOIN topic_users tu ON tu.topic_id = t.id AND tu.user_id = u.id
  WHERE
    u.id IN (SELECT id FROM impersonated_users)
    /** test **
    AND t.created_at > (date_trunc('day', CURRENT_DATE) - INTERVAL '1 day' )
    AND t.created_at < (date_trunc('day', CURRENT_DATE) )
    ** test **/
    AND t.created_at > (date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' )
    AND t.archetype <> 'private_message'
    AND t.deleted_at IS NULL
    AND (t.visible OR u.admin OR u.moderator)
    AND (
      NOT c.read_restricted OR u.admin OR category_id IN (
        SELECT c2.id FROM categories c2
        JOIN category_groups cg ON cg.category_id = c2.id
        JOIN group_users gu ON gu.user_id IN (SELECT id FROM impersonated_users)
          AND cg.group_id = gu.group_id
        WHERE c2.read_restricted
      )
    )
    AND t.user_id NOT IN (SELECT id FROM excluded_users_topics)
    AND t.category_id NOT IN (SELECT id from excluded_categories)
),
restricted_posts AS (
  SELECT
    p.id,
    p.created_at
  FROM
    posts AS p
  LEFT JOIN topics AS t ON p.topic_id = t.id
  JOIN users u on u.id IN (SELECT id FROM impersonated_users)
  JOIN user_stats AS us ON us.user_id = u.id
  JOIN user_options AS uo ON uo.user_id = u.id
  JOIN categories c ON c.id = t.category_id
  LEFT JOIN topic_users tu ON tu.topic_id = t.id AND tu.user_id = u.id
  WHERE
    u.id IN (SELECT id FROM impersonated_users)
    /** test **
    AND p.created_at > (date_trunc('day', CURRENT_DATE) - INTERVAL '1 day' )
    AND p.created_at < (date_trunc('day', CURRENT_DATE) )
    ** test **/
    AND p.created_at > (date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' )
    AND t.archetype <> 'private_message'
    AND t.deleted_at IS NULL
    AND (t.visible OR u.admin OR u.moderator)
    AND (
      NOT c.read_restricted OR u.admin OR category_id IN (
        SELECT c2.id FROM categories c2
        JOIN category_groups cg ON cg.category_id = c2.id
        JOIN group_users gu ON gu.user_id IN (SELECT id FROM impersonated_users)
          AND cg.group_id = gu.group_id
        WHERE c2.read_restricted
      )
    )
    AND t.user_id NOT IN (SELECT id FROM excluded_users_topics)
    AND t.category_id NOT IN (SELECT id from excluded_categories)
),
qualifying_topics AS (
  SELECT
    rt.id,
    rt.created_at::date AS created_at,
    MIN(p.post_number) AS first_reply,
    COUNT(p) AS new_topic_posts
  FROM
    restricted_topics AS rt
  LEFT JOIN posts p ON
    p.topic_id = rt.id
    AND p.deleted_at IS NULL
    AND p.user_id != rt.user_id
    AND p.user_id NOT IN (SELECT id FROM excluded_users_replies)
    AND (
      ( :reply_within_x_days >= 0
        AND p.created_at < ( rt.created_at::date + ( :reply_within_x_days + 1 ) )
      )
      OR
      (
       :reply_within_x_days < 0
        AND to_char(date(p.created_at),'YYYY-MM') = to_char(date(rt.created_at),'YYYY-MM')
      )
    )
  WHERE
    rt.created_at > (date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' )
  GROUP BY rt.id, rt.created_at
),
topic_year_month_rate AS (
    SELECT
        to_char(date(tt.created_at),'YYYY-MM') as year_month,
        COUNT(*) AS topics,
        SUM(new_topic_posts) AS new_topic_posts,
        SUM(CASE WHEN tt.first_reply IS NULL THEN 1 ELSE 0 END) AS no_response,
        SUM(CASE WHEN tt.first_reply IS NOT NULL THEN 1 ELSE 0 END) AS response
    FROM qualifying_topics AS tt
    GROUP BY
      year_month
    ORDER BY
      year_month
),
posts_year_month_rate AS (
    SELECT
        to_char(date(rp.created_at),'YYYY-MM') as year_month,
        COUNT(*) as posts
    FROM restricted_posts as rp
    GROUP BY
      year_month
    ORDER BY
      year_month
),
raw_year_month_rate AS (
  SELECT
    year_month,
    SUM(posts) AS posts,
    SUM(new_topic_posts) AS new_topic_posts,
    (SUM(posts) - SUM(new_topic_posts)) AS old_topic_posts,
    SUM(topics) AS topics,
    SUM(no_response) AS no_response,
    SUM(response) AS response
  FROM (
    SELECT year_month, topics, 0 AS posts, new_topic_posts, 0 AS old_topic_posts, no_response, response FROM topic_year_month_rate
    UNION ALL
    SELECT year_month, 0 AS topics, posts, 0 AS new_topic_posts, 0 AS old_topic_posts, 0 AS no_response, 0 AS response FROM posts_year_month_rate
  ) AS combined
  GROUP BY year_month
  ORDER BY year_month
),
year_month_rate AS (
    SELECT
      rymr.*,
      (CASE
        WHEN topics <= 0 THEN 0
        ELSE (
         ROUND( (:pct_multiplier + 0.0) * (
           (response * 1.0) / (topics * 1.0) 
          ), :pct_places)
        ) END
      ) AS response_rate
    FROM 
      raw_year_month_rate rymr
)

SELECT
 *
FROM
 year_months AS d
LEFT JOIN (
  SELECT * FROM year_month_rate
) AS t USING (year_month)
ORDER BY
  year_month


If you don’t want to use any exclusions just put a comma (,) in that input box.

So it should look a little like this:

So the columns are:

  • posts
  • all qualifying posts from topics accessible to impersonated user after exclusion of both users and categories.
  • new_topic_posts
  • posts made to topics created that month.
  • old_topic_posts
  • posts made to topics created before that month.
  • topics
  • all qualifying topics after exclusion of both users and categories.
  • no_response
  • count of topics that didn’t receive a reply from that type of user
  • e.g. a staff member might have replied, but if you excluded staff it would be counted as no_response if that topic wasn’t created itself by an excluded user.
  • a topic created by an excluded user is excluded anyway.
  • response
  • count of topics that did receive a reply the month the topic was created from a non-excluded user.
  • response_rate
  • percentage rate of response - higher the better

What cool data explorer queries have you come up with?
Reply reminder - Remind users to reply to new users topics with zero replies
(Dean Taylor) #8

So for #2 and #3:

Here is a 2nd version based on group_users table, I added “Trust Level 3” as well:

Data Explorer: count-new-tl1-or-tl2-users-past-12-months-v2.dcquery.json (1.3 KB)

WITH
year_months AS (
  SELECT
    to_char(date(day),'YYYY-MM') as year_month
  FROM
    generate_series(
      (date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' ),
      CURRENT_DATE,
      interval '1 month'
    ) AS day
),
qualifying_users AS (
  SELECT
    gu.user_id,
    to_char(date(gu.created_at),'YYYY-MM') as year_month,
    SUM(CASE WHEN g.name = 'trust_level_1' THEN 1 ELSE 0 END) AS tl1,
    SUM(CASE WHEN g.name = 'trust_level_2' THEN 1 ELSE 0 END) AS tl2,
    SUM(CASE WHEN g.name = 'trust_level_3' THEN 1 ELSE 0 END) AS tl3
  FROM group_users AS gu
  JOIN groups AS g ON g.id = gu.group_id
  WHERE
    gu.created_at > ( date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' )
    AND ( g.name = 'trust_level_1' OR g.name = 'trust_level_2' OR g.name = 'trust_level_3' )
  GROUP BY
      gu.user_id,
      year_month
),
year_month_rate AS (
  SELECT
    year_month,
    SUM(tl1) AS tl1,
    SUM(tl2) AS tl2,
    SUM(tl3) AS tl3
  FROM qualifying_users q
  GROUP BY
      year_month
)



SELECT
 *
FROM
 year_months AS d
LEFT JOIN (
  SELECT * FROM year_month_rate
) AS t USING (year_month)
ORDER BY
  year_month

What cool data explorer queries have you come up with?
(Dean Taylor) #9

It should be noted that for the Topic Participation query:

The query could be improved greatly, it currently checks for a reply within that calendar month for the topic.

So if someone creates a topic on April 28th there are only a couple of days left in that month to get a reply.

But if if someone creates a topic on April 1st they have the entire month to get a reply and have it counted.

Some kind of defined period for checking for replies would be good, perhaps 15 days.


EDIT

Topic Participation query has been updated.

Now includes reply_within_x_days parameter to allow the topic to be counted if replies are within that period.

If you set reply_within_x_days to 0 you can get figures on how many topics got replies the same day as the original topic was posted.


(Tobias Eigen) #10

Hmm… so maybe my second idea of just tracking the number of posts every month to public community categories is more sensible/meaningful.

Thanks for pulling this together - I really appreciate it! :rocket:


(Dean Taylor) #11

I believe as long as you don’t “use” last months figures until at least reply_within_x_days after the last day of the month these are good numbers.

Actually I’m seeing 1 days (78.6%), 4 days (78.6%) and 15 days (79.5%) for one sample set of data - so you could pick a smaller number of days and measure that, Yes 1 day and 4 days are the same.


Reply reminder - Remind users to reply to new users topics with zero replies
(Tobias Eigen) #12

this is awesome - works like a charm! :sparkles:


(Tobias Eigen) #13

This topic participation by month query is pretty neat, and will really help to give insight into activity over time by different groups. :thumbsup:

I think what I need is actually something else, something more simple, for my monthly post activity indicator. I’d just like to track the number of posts per month total, excluding private categories or excluding categories that a representative member has access to.

Has anyone done this already?

Loving the way you share queries, btw, this is extremely handy. Maybe we should set up a wiki topic to gather shared queries and maintain them.


(Dean Taylor) #14

So the "topic_count per month" is already in the “Topic Participation” query results.
These results already exclude categories by slug.

I have no idea what this means:

But it certainly sounds “more complicated” and not “more simple”.


(Tobias Eigen) #15

Hi Dean! Sorry I’m not making any sense. To try to clarify:

  • I am not interested in topics created per month, but posts - this includes new topics and all replies. Does your "topic_count per month include all of these?
  • We have dozens of private categories - excluding them all by name is cumbersome and subject to change over time. I’d rather do what I do on the digest preview and just enter the name of a user who I know doesn’t have access to any private categories. Simpler?

(Dean Taylor) #16

I’ve updated the “Topic Participation” query:

  • Include posts information
  • separated exclusions for “topics created by user from group” and “posts created by user from group”
  • impersonate a particular users view of topics - use an admin user like system to see everything.
  • optionally restrict results to that month (-1) for responses or check for replies within X days.
  • support output of response rate as percentage ready for Excel (i.e. a value between 0 and 1).

It would be nice if someone would sanity check the query please.

To check and:

  • ensure it makes sense
  • I’m not 100% confident the Discourse internals for security lookups
  • at over 200 lines of SQL it’s quite long.

PAID Data explorer challenge: chronological list of user topics started, replied and liked
PAID Data explorer challenge: chronological list of user topics started, replied and liked
(Tobias Eigen) #17

I think it works - but only if I limit to a user who doesn’t have access to every single post… otherwise it times out.

So great to be able to get the total number of posts per month in community discussions. Many thanks.

Here, you deserve a beer. Let me pour you one.


(Dean Taylor) #18

Yeah - it’s not optimised, you might want to make a change to search a shorter period.


(Dave McClure) #19

Thanks for sharing this one!

I also noticed the “within X days” parameter not only defines what counts as a response for the response_rate, but also determines what is new_topic_post and what is an old_topic_post.

Considering whether it’s useful to separate those two parameters… looks like it should be pretty easy to do if I want to.


(Dean Taylor) #20

Currently it’s “as designed”, but if you wish to separate that into a parameter, happy to make the post a wiki post so it can be updated to add that feature.

Somebody else reviewing it to sanity check things is always helpful.