What cool data explorer queries have you come up with?

List of all Members of a Group with Custom Field

We use this query to get a list of the name and matriculation number of all team members:

SELECT users.name, user_custom_fields.value as matriculation
FROM users
JOIN group_users ON users.id = group_users.user_id
JOIN groups ON groups.id = group_users.group_id
JOIN user_custom_fields ON users.id = user_custom_fields.user_id
WHERE groups.name = 'Team'
AND user_custom_fields.name = 'user_field_2'

Simply insert the group name and the custom field id in the last two names.

3 Likes

Anyone able to provide the query for all users with a custom title (not a badge granted title)?

1 Like

I don’t think this is strictly possible, because a custom title could be character-by-character identical to a badge granted one (possible of a badge the user doesn’t have).

Here’s a query that might do what you want:

SELECT username FROM users
WHERE title IS NOT NULL

AND (
    SELECT count(*) FROM badges
    WHERE allow_title = true
    AND name = title
    ) = 0

AND (
    SELECT count(*) FROM groups
    WHERE groups.title = users.title
    ) = 0

It selects every title any user has, minus every title where a corresponding badge exists, minus every title that could come from a group membership. It does not test whether the user has the badge or the group membership.

Maybe it does help you, at least as a starting point :slight_smile:

1 Like

Thanks @fefrei, that seems to work, but as you mentioned it might not cover all the edge cases. I did have to change the first line SELECT title FROM users to SELECT username FROM users as I was looking for the usernames, not the custom titles.

@Mittineague was able to create a query that seems to work as well in this topic.

SELECT 
users.username
, users.title 
FROM users 
JOIN user_profiles 
ON user_profiles.user_id = users.id 
WHERE users.title IS NOT NULL 
AND users.title NOT LIKE '' 
AND user_profiles.badge_granted_title IS NOT TRUE
1 Like

Top quality users in last six months

Top 20 users by average post score.

Post scores are calculated based on reply count, likes, incoming links, bookmarks, average time (reading?) and read count.

SELECT 
    sum(p.score) / count(p) as "average score per post", 
    count(p.id) as post_count, 
    p.user_id
FROM posts p
JOIN users u ON u.id = p.user_id
WHERE p.created_at >= CURRENT_DATE - INTERVAL '6 month'
  AND NOT u.admin
  AND NOT u.silenced
  AND u.active
GROUP by user_id, u.views
HAVING count(p.id) > 50
ORDER BY sum(p.score) / count(p) DESC
LIMIT 20
9 Likes

Who has been sending the most messages in the last week?

SELECT user_id, count(*) AS message_count
FROM topics
WHERE archetype = 'private_message' AND subtype = 'user_to_user'
AND age(created_at) < interval '7 days'
GROUP BY user_id
ORDER BY message_count DESC

Useful for tracking down suspicious PM activity.

7 Likes

Users ordered by creation date

SELECT users.username, users.created_at
FROM users
WHERE users.created_at < '2017-05-10'::timestamp
ORDER BY users.created_at
2 Likes

Posts per category (with deleted and without deleted)

I think that by default deleted topics/posts will be included in data-explorer statistics.

Doing some quick experimentation on a category where we delete a lot of stuff:

SELECT count(*) from topics 
WHERE category_id=40

returns 799

And then, deliberately excluding deleted topics

SELECT count(*) from topics 
WHERE category_id=40
AND deleted_at is null

returns 416.

So to get the statistics you want, I think you can just run the query without checking deleted_at

7 Likes

Interesting…I assumed (likely incorrectly) that deleted topics were not included. Thanks!

A post was split to a new topic: Collecting data on registration for display later

Recreate the Discourse user’s directory

OK, here is the query to recreate the user’s directory.

-- [params]
-- null int :period

SELECT users.username AS "Username",
directory_items.likes_received AS "Likes Received",
directory_items.likes_given AS "Likes Given",
directory_items.topic_count AS "Topics Created",
directory_items.post_count AS "Replied",
directory_items.days_visited AS "Vists",
directory_items.topics_entered AS "Viewed",
directory_items.posts_read AS "Read"
FROM users
JOIN  directory_items ON users.id =  directory_items.user_id
WHERE directory_items.period_type = :period
ORDER BY directory_items.likes_received DESC

Once the query is saved you can enter the period into the field below the query to determine what data you get. The periods are as follows:

1: all
2: yearly
3: monthly
4: weekly
5: daily
6: quarterly

You can also change the sort by adjusting directory_items.likes_received on the last line.

10 Likes

Hello Super Discourse Users

I wanted to tell you that I started to “play” with the Data-Explorer plugin and I was impressed by the possibilities it offers, I’m super happy with the tool.

I am grateful to the genius who has come up with the creation of the plugin.

As I’m debugging almost every query I’ve been reading, I’ve created a repository and a list of queries.

When I can, I add improvements to the query as parameters, to make it more dynamic.

You can check the list here:

I have maintained the sources of each query in each file, so that when in doubt about the queries, they can send their doubts to the user who created it.

Thank you for any contributions you want to make, we can also use the repository as a backup of the queries you have.

Thank you in advance for making this possible.

A hug from Argentina,
SidV

12 Likes

9 posts were split to a new topic: Load Data Explorer queries from library

Is there an easy way of modifying these queries so as to exclude

  • myself
  • certain staff
  • all staff?

Thinking a bit more along these lines, wouldn’t a comparison between the metrics for all_users with all_users_minus_staff be an interesting measurement in itself? Call it “community independence” or so. If your community develops well, you’d want to see the the role of staff gradually disappear. This will be most relevant in small communities but it should be possible to normalize the measure by only looking at the 100 or 1000 most active users…

User participation

Shows the following columns over a specified time range (in days):

       user_id,
       username,
       name,
       email,
       visits, 
       topics_viewed,
       posts_read, 
       posts_created,
       topics_created,
       topics_with_replies,
       likes_given,
       likes_received

Full query:

-- [params]
-- int :from_days_ago = 0
-- int :duration_days = 30

with
t as (
  select 
    CURRENT_TIMESTAMP - ((:from_days_ago + :duration_days) * (INTERVAL '1 days')) as start,
    CURRENT_TIMESTAMP - (:from_days_ago * (INTERVAL '1 days')) as end
),
pr as (
    select user_id, 
        count(1) as visits,
        sum(posts_read) as posts_read
    from user_visits, t
    where posts_read > 0
    and visited_at > t.start
    and visited_at < t.end
    group by user_id
),
pc as (
  select user_id, 
      count(1) as posts_created
  from posts, t
  where created_at > t.start
  and created_at < t.end
  group by user_id
),
ttopics as (
 select user_id, posts_count
  from topics, t
  where created_at > t.start
  and created_at < t.end
),
tc as (
  select user_id, 
      count(1) as topics_created
  from ttopics
  group by user_id
),
twr as (
  select user_id, 
      count(1) as topics_with_replies
  from ttopics
  where posts_count > 1
  group by user_id
),
tv as (
 select user_id, 
        count(distinct(topic_id)) as topics_viewed
  from topic_views, t
  where viewed_at > t.start
  and viewed_at < t.end
  group by user_id
),
likes as (
  select 
      post_actions.user_id as given_by_user_id, 
      posts.user_id as received_by_user_id
  from t, post_actions
  left join posts
  on post_actions.post_id = posts.id
  where post_actions.created_at > t.start
  and post_actions.created_at < t.end
  and post_action_type_id = 2

),
lg as (
  select given_by_user_id as user_id, 
      count(1) as likes_given
  from likes
  group by user_id
),
lr as (
  select received_by_user_id as user_id, 
      count(1) as likes_received
  from likes
  group by user_id
),
e as (
  select email, user_id
  from user_emails u
  where u.primary = true
)
select pr.user_id,
       username,
       name,
       email,
       visits, 
       coalesce(topics_viewed,0) as topics_viewed,
       coalesce(posts_read,0) as posts_read, 
       coalesce(posts_created,0) as posts_created,
       coalesce(topics_created,0) as topics_created,
       coalesce(topics_with_replies,0) as topics_with_replies,
       coalesce(likes_given,0) as likes_given,
       coalesce(likes_received,0) as likes_received
from pr
left join tv using (user_id)
left join pc using (user_id)
left join tc using (user_id)
left join twr using (user_id)
left join lg using (user_id)
left join lr using (user_id)
left join e using (user_id)
left join users on pr.user_id = users.id
order by 
  visits desc, 
  posts_read desc, 
  posts_created desc
10 Likes

Participation Histograms

(Output from these used to make graphs shown in this post).

Posts Read User Counts

Number of users who have read at least N posts over the specified time period.

-- [params]
-- int :from_days_ago = 0
-- int :duration_days = 28

with
t as (
  select 
    current_date::timestamp - ((:from_days_ago + :duration_days) * (INTERVAL '1 days')) as start,
    current_date::timestamp - (:from_days_ago * (INTERVAL '1 days')) as end
),
read_visits as (
    select user_id, 
        count(1) as visits,
        sum(posts_read) as posts_read
    from user_visits, t
    where posts_read >= 1
    and visited_at > t.start
    and visited_at < t.end
    group by user_id
)
select
      case when posts_read >= 1 and posts_read <= 1    then '0001'
          when posts_read >= 2 and posts_read <= 3   then '0002 - 03'
          when posts_read >= 4 and posts_read <= 7   then '0004 - 07'
          when posts_read >= 8 and posts_read <= 15  then '0008 - 15'
          when posts_read >= 16 and posts_read <= 40  then '0016 - 31'
          when posts_read >= 32 and posts_read <= 63  then '0032 - 63'
          when posts_read >= 64 and posts_read <= 127  then '0064 - 127'
          when posts_read >= 128 and posts_read <= 255  then '0128 - 255'
          when posts_read >= 256 and posts_read <= 511  then '0256 - 511'
          when posts_read >= 512 and posts_read <= 1023  then '0512 - 1023'
          when posts_read >= 1024 and posts_read <= 2047  then '1024 - 2047'
          else '>= 2048'
      end as num_posts_read,
      count(*) as num_users
  from
      read_visits
  group by num_posts_read
  order by num_posts_read

Posts Created User Counts

Number of users who have created at least N posts over the specified time period.

-- [params]
-- int :from_days_ago = 0
-- int :duration_days = 28

with
t as (
  select 
    current_date::timestamp - ((:from_days_ago + :duration_days) * (INTERVAL '1 days')) as start,
    current_date::timestamp - (:from_days_ago * (INTERVAL '1 days')) as end
),
pc as (
  select user_id, 
      count(1) as posts_created
  from posts, t
  where created_at > t.start
  and created_at < t.end
  group by user_id
)
select
      case when posts_created >= 1 and posts_created <= 1    then '01'
          when posts_created >= 2 and posts_created <= 3   then '02 - 03'
          when posts_created >= 4 and posts_created <= 7   then '04 - 07'
          when posts_created >= 8 and posts_created <= 15  then '08 - 15'
          when posts_created >= 16 and posts_created <= 31  then '16 - 31'
          when posts_created >= 32 and posts_created <= 63  then '32 - 63'
          when posts_created >= 64 and posts_created <= 127  then '64 - 127'
          when posts_created >= 128 and posts_created <= 255  then '128 - 255'
          else '> 256'
      end as num_posts_created,
      count(*) as num_users
  from
      pc
  group by num_posts_created
  order by num_posts_created
13 Likes

New users within the past week (formatted usernames)

@outofthebox

This should work ok for you -

    select concat ('@', username)
    from users
    where created_at >= CURRENT_DATE - INTERVAL '1 week'
    order by created_at desc 

Just change the 1 week to 1 month or what ever interval you want - you can then export as csv and should be able to copy and paste that into a post.

8 Likes

New users within a given time interval (formatted usernames)

I updated to include column header and parameter so you can easily change the interval:

-- [params]
-- string :interval = 1 week

select concat ('@', username) as "new users"
    from users
    where created_at >= CURRENT_DATE - INTERVAL :interval
    order by created_at desc
3 Likes

Hi! I’m looking for a bit of help modifying a query to return usernames, instead of user ID numbers.

This is what I’ve been using so far (modified from @DavidGNavas’ excellent Network Chart thread)

WITH pairs AS (
    SELECT p.user_id liked, pa.user_id liker
    FROM post_actions pa
    LEFT JOIN posts p ON p.id = pa.post_id
    LEFT JOIN topics t ON t.id = p.topic_id
    LEFT JOIN categories c ON c.id = t.category_id
    WHERE post_action_type_id = 2
    AND c.id = 47
)
SELECT liker liker_user_id, liked liked_user_id, count(*)
FROM pairs
GROUP BY liked, liker
ORDER BY count DESC

Which I’ve then used a very clunky vlookup to match user names with ID numbers…

Once I have a list of pairs (likers & liked) expressed as usernames, it lets me follow the method used in the Network Chart thread to generate these sorts of charts - showing activity within specific categories, as we have very distinct sub-communities on different parts of our forums:

The live charts are a valuable way of exploring our different sub-communities, quickly identifying influential members and clusters of people.

For example, compare our musician community with our graphic designers - there’s a big difference between the two :slight_smile:

11 Likes

Here’s a modified version that does that:

WITH pairs AS (
    SELECT p.user_id liked_id, pa.user_id liker_id
    FROM post_actions pa
    LEFT JOIN posts p ON p.id = pa.post_id
    LEFT JOIN topics t ON t.id = p.topic_id
    LEFT JOIN categories c ON c.id = t.category_id
    WHERE post_action_type_id = 2
    AND c.id = 1
)
SELECT 
  liker.username as liker,
  liked.username as liked,
  count(*)
FROM pairs
LEFT JOIN users liker
ON liker_id = liker.id
LEFT JOIN users liked
ON liked_id = liked.id
GROUP BY liked, liker
ORDER BY count DESC
10 Likes