What cool data explorer queries have you come up with?

Member Uploads

I put together this query to help find members that post a lot of uploads that potentially might lead to a problem.

Ordered by total upload weight per member

WITH heavy_uploads AS ( SELECT 
    ( SUM(uploads.filesize) / 1024) AS sum_kb
  , COUNT(uploads.filesize) AS upload_count
  , ( (SUM(uploads.filesize) / COUNT(uploads.filesize)) / 1024) AS avg_weight_kb
  , uploads.user_id 
  FROM uploads
  GROUP BY uploads.user_id
)
SELECT heavy_uploads.sum_kb
 , heavy_uploads.upload_count
 , heavy_uploads.avg_weight_kb
 , heavy_uploads.user_id
FROM heavy_uploads 
WHERE heavy_uploads.sum_kb > 100
ORDER BY heavy_uploads.sum_kb DESC

(the three Grimm members uploaded animated woge gifs)

18 Likes

Active users in the last 30 days

select username
from users
where last_posted_at > current_timestamp - interval '30' day
4 Likes

How would you adjust this with parameters so that previous months can also be checked? Eg: read between 62 and 31 days ago (or similar)?

I haven’t figured out a “between” way of doing this, but using “params” can provide a selectable “back to”

-- [params]
-- int :interval = 30

select username
from users
where last_posted_at > current_timestamp - interval ':interval' day
1 Like

should be able to do something like this (starting where @Mittineague left off):

-- [params]
-- int :until_days_ago = 30
-- int :since_days_ago = 60

select username
from users
where last_posted_at > current_timestamp - interval ':since_days_ago' day 
and last_posted_at < current_timestamp - interval ':until_days_ago' day
4 Likes

Posts created for period

Got what I need (thanks @meglio) so updating this for future posterity.

-- [params]
-- date :date_from
-- date :date_to
-- int  :min_posts = 1

WITH user_activity AS (
    SELECT p.user_id, count (p.id) as posts_count
    FROM posts p
    LEFT JOIN topics t ON t.id = p.topic_id
    WHERE p.created_at::date BETWEEN :date_from::date AND :date_to::date
        AND t.deleted_at IS NULL
        AND t.visible = TRUE
        AND t.closed = FALSE
        AND t.archived = FALSE
        AND t.archetype = 'regular'
        AND p.deleted_at IS NULL
        
    GROUP BY p.user_id
)
SELECT COUNT(user_id)
FROM user_activity
WHERE posts_count >= :min_posts
8 Likes

List of all Members of a Group with Custom Field

We use this query to get a list of the name and matriculation number of all team members:

SELECT users.name, user_custom_fields.value as matriculation
FROM users
JOIN group_users ON users.id = group_users.user_id
JOIN groups ON groups.id = group_users.group_id
JOIN user_custom_fields ON users.id = user_custom_fields.user_id
WHERE groups.name = 'Team'
AND user_custom_fields.name = 'user_field_2'

Simply insert the group name and the custom field id in the last two names.

3 Likes

Anyone able to provide the query for all users with a custom title (not a badge granted title)?

1 Like

I don’t think this is strictly possible, because a custom title could be character-by-character identical to a badge granted one (possible of a badge the user doesn’t have).

Here’s a query that might do what you want:

SELECT username FROM users
WHERE title IS NOT NULL

AND (
    SELECT count(*) FROM badges
    WHERE allow_title = true
    AND name = title
    ) = 0

AND (
    SELECT count(*) FROM groups
    WHERE groups.title = users.title
    ) = 0

It selects every title any user has, minus every title where a corresponding badge exists, minus every title that could come from a group membership. It does not test whether the user has the badge or the group membership.

Maybe it does help you, at least as a starting point :slight_smile:

1 Like

Thanks @fefrei, that seems to work, but as you mentioned it might not cover all the edge cases. I did have to change the first line SELECT title FROM users to SELECT username FROM users as I was looking for the usernames, not the custom titles.

@Mittineague was able to create a query that seems to work as well in this topic.

SELECT 
users.username
, users.title 
FROM users 
JOIN user_profiles 
ON user_profiles.user_id = users.id 
WHERE users.title IS NOT NULL 
AND users.title NOT LIKE '' 
AND user_profiles.badge_granted_title IS NOT TRUE
1 Like

Top quality users in last six months

Top 20 users by average post score.

Post scores are calculated based on reply count, likes, incoming links, bookmarks, average time (reading?) and read count.

SELECT 
    sum(p.score) / count(p) as "average score per post", 
    count(p.id) as post_count, 
    p.user_id
FROM posts p
JOIN users u ON u.id = p.user_id
WHERE p.created_at >= CURRENT_DATE - INTERVAL '6 month'
  AND NOT u.admin
  AND NOT u.silenced
  AND u.active
GROUP by user_id, u.views
HAVING count(p.id) > 50
ORDER BY sum(p.score) / count(p) DESC
LIMIT 20
8 Likes

Who has been sending the most messages in the last week?

SELECT user_id, count(*) AS message_count
FROM topics
WHERE archetype = 'private_message' AND subtype = 'user_to_user'
AND age(created_at) < interval '7 days'
GROUP BY user_id
ORDER BY message_count DESC

Useful for tracking down suspicious PM activity.

5 Likes

Users ordered by creation date

SELECT users.username, users.created_at
FROM users
WHERE users.created_at < '2017-05-10'::timestamp
ORDER BY users.created_at
1 Like

Posts per category (with deleted and without deleted)

I think that by default deleted topics/posts will be included in data-explorer statistics.

Doing some quick experimentation on a category where we delete a lot of stuff:

SELECT count(*) from topics 
WHERE category_id=40

returns 799

And then, deliberately excluding deleted topics

SELECT count(*) from topics 
WHERE category_id=40
AND deleted_at is null

returns 416.

So to get the statistics you want, I think you can just run the query without checking deleted_at

6 Likes

Interesting…I assumed (likely incorrectly) that deleted topics were not included. Thanks!

A post was split to a new topic: Collecting data on registration for display later

Recreate the Discourse user’s directory

OK, here is the query to recreate the user’s directory.

-- [params]
-- null int :period

SELECT users.username AS "Username",
directory_items.likes_received AS "Likes Received",
directory_items.likes_given AS "Likes Given",
directory_items.topic_count AS "Topics Created",
directory_items.post_count AS "Replied",
directory_items.days_visited AS "Vists",
directory_items.topics_entered AS "Viewed",
directory_items.posts_read AS "Read"
FROM users
JOIN  directory_items ON users.id =  directory_items.user_id
WHERE directory_items.period_type = :period
ORDER BY directory_items.likes_received DESC

Once the query is saved you can enter the period into the field below the query to determine what data you get. The periods are as follows:

1: all
2: yearly
3: monthly
4: weekly
5: daily
6: quarterly

You can also change the sort by adjusting directory_items.likes_received on the last line.

8 Likes

Hello Super Discourse Users

I wanted to tell you that I started to “play” with the Data-Explorer plugin and I was impressed by the possibilities it offers, I’m super happy with the tool.

I am grateful to the genius who has come up with the creation of the plugin.

As I’m debugging almost every query I’ve been reading, I’ve created a repository and a list of queries.

When I can, I add improvements to the query as parameters, to make it more dynamic.

You can check the list here:

I have maintained the sources of each query in each file, so that when in doubt about the queries, they can send their doubts to the user who created it.

Thank you for any contributions you want to make, we can also use the repository as a backup of the queries you have.

Thank you in advance for making this possible.

A hug from Argentina,
SidV

10 Likes

9 posts were split to a new topic: Load Data Explorer queries from library

Is there an easy way of modifying these queries so as to exclude

  • myself
  • certain staff
  • all staff?

Thinking a bit more along these lines, wouldn’t a comparison between the metrics for all_users with all_users_minus_staff be an interesting measurement in itself? Call it “community independence” or so. If your community develops well, you’d want to see the the role of staff gradually disappear. This will be most relevant in small communities but it should be possible to normalize the measure by only looking at the 100 or 1000 most active users…