What cool data explorer queries have you come up with?

Posts Read (Daily)

Total number of new posts read by all users per day

SELECT visited_at as day,
count(1) as users,
sum(posts_read) as posts_read,
sum(posts_read) / count(1) as avg_posts_read_per_user
FROM user_visits 
group by visited_at
order by visited_at desc

Posts Read Percentiles

Number of posts read for users in each percentile

with tentiles as (
    select posts_read_count as read, 
    ntile(10) 
    over (order by posts_read_count) as tentile 
    from user_stats
)
select (tentile - 1) * 10 as percentile,
    min(read) as min_posts_read, 
    max(read) as max_posts_read
from tentiles
group by tentile
order by tentile desc

Active Readers (Past Month)

Users with the most visits that include reading activity

select user_id, 
    count(1) as visits,
    sum(posts_read) as posts_read
from user_visits
where posts_read > 0
and visited_at > CURRENT_TIMESTAMP - INTERVAL '30 days'
group by user_id
order by visits desc, posts_read desc

Active Readers (Since N Days Ago)

Number of users who have read at least 1 post since N days ago

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '30 days',
                         CURRENT_TIMESTAMP - INTERVAL '1 day',
                         INTERVAL '1 days') n
),
latest_visits as (
    select 
        user_id, 
        max(visited_at) as visited_at
    from user_visits
    where posts_read > 0
    group by user_id
)
select 
  COUNT(1) as active_readers_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM users u
left join latest_visits v
on u.id = v.user_id
right join intervals i
on v.visited_at >= i.start_time and v.visited_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc
14 Likes

Count new TL1, TL2, TL3 users past 12 months

Number of members added to Trust Level (TL) 1, 2 and 3 month-by-month over the past year.

Answering the question how many users actually sticking around and interacting with the community and progressing through the trust levels.

Topic Participation

Number (by month) of topics started, posts to new topics, posts to old topics, topics with a response, topics without a response, percentage response rate.

Optionally exclude groups of users and / or impersonate a single users view of topics.

5 Likes

Likes from the team

This query assumes there is a group called team, and gives you the likes other users have received, split into likes from the team and likes from others:

SELECT
    pl.user_id,
    SUM(pl.team_likes) as team_likes,
    SUM(pl.student_likes) as student_likes,
    SUM(pl.team_likes + pl.student_likes) as likes
FROM (
    SELECT -- count likes per post
        p.id as post_id_workaround,
        p.user_id as user_id,
        (
            SELECT count(*)
            FROM post_actions pa
            WHERE
                pa.post_id = p.id
                AND post_action_type_id = (
                            SELECT id FROM post_action_types WHERE name_key = 'like'
                )
                AND pa.user_id IN (
                    SELECT gu.user_id
                    FROM group_users gu
                    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
                )
        ) as team_likes,
        (
            SELECT count(*)
            FROM post_actions pa
            WHERE
                pa.post_id = p.id
                AND post_action_type_id = (
                            SELECT id FROM post_action_types WHERE name_key = 'like'
                )
                AND pa.user_id NOT IN (
                    SELECT gu.user_id
                    FROM group_users gu
                    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
                )
        ) as student_likes
    FROM badge_posts p
) AS pl
WHERE pl.user_id NOT IN (
    SELECT gu.user_id
    FROM group_users gu
    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
)
GROUP BY pl.user_id
ORDER BY likes DESC
5 Likes

Recently Read Topics by User

Show the topics with that have been opened by a given user in the past N days, sorted by the amount of time the user has spent in that topic. (requested on feverbee)

-- [params]
-- integer :user = 1
-- integer :since_days_ago = 7

with topic_timing as (
  select user_id, topic_id, sum(msecs) / 1000 as seconds
  from post_timings
  where user_id = :user
  group by user_id, topic_id
)
SELECT tv.topic_id,
    tv.user_id,
    tv.viewed_at,
    tt.seconds
from topic_views tv
left join topic_timing tt
on tv.topic_id = tt.topic_id
and tv.user_id = tt.user_id
where tv.user_id = :user
and viewed_at + :since_days_ago > CURRENT_TIMESTAMP
order by seconds desc
14 Likes

6 posts were split to a new topic: “Recently Read Topics by User” query returns Parameter declaration error

While working on this

I was looking for a way to find topics that were created as a result of a moderator splitting posts into a new topic. (i.e. off-topic posts that deserved their own topic)

It wasn’t as easy as I had hoped. I looked at various “topic” and “post” tables with no success.

Being sure I could find a lead to what I needed in Admin → Logs → Staff Actions I was disappointed to find that “split” data was not there (not that it should, hopefully it doesn’t happen all that often)

The only place I could find the id of a topic created this way is in “small-action” posts as part of a link.

Getting the id portion of the string to an integer that can be used is a bit involved.

regexp_matches returns a string array
array_to_string converts the array to a string
CAST converts the string to an integer

WITH new_topics AS (
 SELECT 
 CAST( array_to_string(regexp_matches(posts.raw, '([\d]+)(?:\))$', 'g'), '') AS integer) AS new_topic_id
 FROM posts 
 WHERE action_code LIKE 'split_topic'
 AND raw LIKE '%posts were split to a new topic%'
)
SELECT topics.title 
FROM topics, new_topics
WHERE topics.id = new_topics.new_topic_id
2 Likes

Member Uploads

I put together this query to help find members that post a lot of uploads that potentially might lead to a problem.

Ordered by total upload weight per member

WITH heavy_uploads AS ( SELECT 
    ( SUM(uploads.filesize) / 1024) AS sum_kb
  , COUNT(uploads.filesize) AS upload_count
  , ( (SUM(uploads.filesize) / COUNT(uploads.filesize)) / 1024) AS avg_weight_kb
  , uploads.user_id 
  FROM uploads
  GROUP BY uploads.user_id
)
SELECT heavy_uploads.sum_kb
 , heavy_uploads.upload_count
 , heavy_uploads.avg_weight_kb
 , heavy_uploads.user_id
FROM heavy_uploads 
WHERE heavy_uploads.sum_kb > 100
ORDER BY heavy_uploads.sum_kb DESC

(the three Grimm members uploaded animated woge gifs)

20 Likes

Active users in the last 30 days

select username
from users
where last_posted_at > current_timestamp - interval '30' day
7 Likes

How would you adjust this with parameters so that previous months can also be checked? Eg: read between 62 and 31 days ago (or similar)?

I haven’t figured out a “between” way of doing this, but using “params” can provide a selectable “back to”

-- [params]
-- int :interval = 30

select username
from users
where last_posted_at > current_timestamp - interval ':interval' day
1 Like

should be able to do something like this (starting where @Mittineague left off):

-- [params]
-- int :until_days_ago = 30
-- int :since_days_ago = 60

select username
from users
where last_posted_at > current_timestamp - interval ':since_days_ago' day 
and last_posted_at < current_timestamp - interval ':until_days_ago' day
7 Likes

Posts created for period

Got what I need (thanks @meglio) so updating this for future posterity.

-- [params]
-- date :date_from
-- date :date_to
-- int  :min_posts = 1

WITH user_activity AS (
    SELECT p.user_id, count (p.id) as posts_count
    FROM posts p
    LEFT JOIN topics t ON t.id = p.topic_id
    WHERE p.created_at::date BETWEEN :date_from::date AND :date_to::date
        AND t.deleted_at IS NULL
        AND t.visible = TRUE
        AND t.closed = FALSE
        AND t.archived = FALSE
        AND t.archetype = 'regular'
        AND p.deleted_at IS NULL
        
    GROUP BY p.user_id
)
SELECT COUNT(user_id)
FROM user_activity
WHERE posts_count >= :min_posts
12 Likes

List of all Members of a Group with Custom Field

We use this query to get a list of the name and matriculation number of all team members:

SELECT users.name, user_custom_fields.value as matriculation
FROM users
JOIN group_users ON users.id = group_users.user_id
JOIN groups ON groups.id = group_users.group_id
JOIN user_custom_fields ON users.id = user_custom_fields.user_id
WHERE groups.name = 'Team'
AND user_custom_fields.name = 'user_field_2'

Simply insert the group name and the custom field id in the last two names.

3 Likes

Anyone able to provide the query for all users with a custom title (not a badge granted title)?

2 Likes

I don’t think this is strictly possible, because a custom title could be character-by-character identical to a badge granted one (possible of a badge the user doesn’t have).

Here’s a query that might do what you want:

SELECT username FROM users
WHERE title IS NOT NULL

AND (
    SELECT count(*) FROM badges
    WHERE allow_title = true
    AND name = title
    ) = 0

AND (
    SELECT count(*) FROM groups
    WHERE groups.title = users.title
    ) = 0

It selects every title any user has, minus every title where a corresponding badge exists, minus every title that could come from a group membership. It does not test whether the user has the badge or the group membership.

Maybe it does help you, at least as a starting point :slight_smile:

2 Likes

Thanks @fefrei, that seems to work, but as you mentioned it might not cover all the edge cases. I did have to change the first line SELECT title FROM users to SELECT username FROM users as I was looking for the usernames, not the custom titles.

@Mittineague was able to create a query that seems to work as well in this topic.

SELECT 
users.username
, users.title 
FROM users 
JOIN user_profiles 
ON user_profiles.user_id = users.id 
WHERE users.title IS NOT NULL 
AND users.title NOT LIKE '' 
AND user_profiles.badge_granted_title IS NOT TRUE
1 Like

Top quality users in last six months

Top 20 users by average post score.

Post scores are calculated based on reply count, likes, incoming links, bookmarks, average time (reading?) and read count.

SELECT 
    sum(p.score) / count(p) as "average score per post", 
    count(p.id) as post_count, 
    p.user_id
FROM posts p
JOIN users u ON u.id = p.user_id
WHERE p.created_at >= CURRENT_DATE - INTERVAL '6 month'
  AND NOT u.admin
  AND NOT u.silenced
  AND u.active
GROUP by user_id, u.views
HAVING count(p.id) > 50
ORDER BY sum(p.score) / count(p) DESC
LIMIT 20
10 Likes

Who has been sending the most messages in the last week?

SELECT user_id, count(*) AS message_count
FROM topics
WHERE archetype = 'private_message' AND subtype = 'user_to_user'
AND age(created_at) < interval '7 days'
GROUP BY user_id
ORDER BY message_count DESC

Useful for tracking down suspicious PM activity.

8 Likes

Users ordered by creation date

SELECT users.username, users.created_at
FROM users
WHERE users.created_at < '2017-05-10'::timestamp
ORDER BY users.created_at
2 Likes

Posts per category (with deleted and without deleted)

I think that by default deleted topics/posts will be included in data-explorer statistics.

Doing some quick experimentation on a category where we delete a lot of stuff:

SELECT count(*) from topics 
WHERE category_id=40

returns 799

And then, deliberately excluding deleted topics

SELECT count(*) from topics 
WHERE category_id=40
AND deleted_at is null

returns 416.

So to get the statistics you want, I think you can just run the query without checking deleted_at

8 Likes