What cool data explorer queries have you come up with?

data-explorer

(Dave McClure) #1

Continuing the discussion from Data Explorer Plugin:

Here’s a couple I’m starting with:

Users Last Seen Since (Since N Weeks Ago)

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '140 days',
                         CURRENT_TIMESTAMP - INTERVAL '7 days',
                         INTERVAL '7 days') n
)
select 
  COUNT(1) as users_seen_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM USERS u
right join intervals i
on u.last_seen_at >= i.start_time and u.last_seen_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc

Users Last Seen Since (Since N Days Ago)

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '30 days',
                         CURRENT_TIMESTAMP - INTERVAL '1 day',
                         INTERVAL '1 day') n
)
select 
  COUNT(1) as users_seen_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM USERS u
right join intervals i
on u.last_seen_at >= i.start_time and u.last_seen_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc

Data Explorer Plugin
See All User Activity As An Admin
Need log the who downloaded attachments
Is there a Data Explorer for Dummies article somewhere?
Top user list public
Group forum members based on topics viewed?
What would you like to see on your Discourse 2.0 Admin Dashboard?
Maximum Number of Search Results?
Apply CSS to all custom (non-badge) user titles
RGSoC 2016: Visual Forum Analytics Community Discussion
Can I access Discourse forum Database to use for PowerBI reporting?
(Mittineague) #2

Most of mine are “OK, but could be better”

For example, this one works, but I don’t like the duplicate sub-query much

Banner Stats

WITH all_users AS ( SELECT
  COUNT(users.id) AS user_count
  FROM users
  WHERE users.active 
  AND users.suspended_at IS NULL
  AND users.locale IS NOT NULL
)
, read_banner_topic AS ( SELECT 
  COUNT(topic_views.user_id) AS read_count
  FROM topic_views  
  WHERE topic_views.topic_id IN (
    SELECT topics.id 
    FROM topics 
    WHERE topics.archetype IS NOT NULL
    AND topics.archetype LIKE 'banner'
  )
  AND topic_views.user_id IS NOT NULL
)
, dismissed_banner AS ( SELECT 
  COUNT(user_profiles.user_id) AS dismissed_count
  FROM user_profiles
  WHERE user_profiles.dismissed_banner_key IS NOT NULL 
  AND user_profiles.dismissed_banner_key IN (
    SELECT topics.id 
    FROM topics 
    WHERE topics.archetype IS NOT NULL
    AND topics.archetype LIKE 'banner'
  )
  AND user_profiles.user_id IS NOT NULL
)
SELECT all_users.user_count
  , read_banner_topic.read_count
  , dismissed_banner.dismissed_count 
FROM all_users, read_banner_topic, dismissed_banner

* note, read banner topic is not the same as saw banner


(Dave McClure) #3

Posts Read (Daily)

Total number of new posts read by all users per day

SELECT visited_at as day,
count(1) as users,
sum(posts_read) as posts_read,
sum(posts_read) / count(1) as avg_posts_read_per_user
FROM user_visits 
group by visited_at
order by visited_at desc

Posts Read Percentiles

Number of posts read for users in each percentile

with tentiles as (
    select posts_read_count as read, 
    ntile(10) 
    over (order by posts_read_count) as tentile 
    from user_stats
)
select (tentile - 1) * 10 as percentile,
    min(read) as min_posts_read, 
    max(read) as max_posts_read
from tentiles
group by tentile
order by tentile desc

Active Readers (Past Month)

Users with the most visits that include reading activity

select user_id, 
    count(1) as visits,
    sum(posts_read) as posts_read
from user_visits
where posts_read > 0
and visited_at > CURRENT_TIMESTAMP - INTERVAL '30 days'
group by user_id
order by visits desc, posts_read desc

Active Readers (Since N Days Ago)

Number of users who have read at least 1 post since N days ago

with intervals as (
    select
        n as start_time, 
        CURRENT_TIMESTAMP as end_time
    from generate_series(CURRENT_TIMESTAMP - INTERVAL '30 days',
                         CURRENT_TIMESTAMP - INTERVAL '1 day',
                         INTERVAL '1 days') n
),
latest_visits as (
    select 
        user_id, 
        max(visited_at) as visited_at
    from user_visits
    where posts_read > 0
    group by user_id
)
select 
  COUNT(1) as active_readers_since,
  CURRENT_TIMESTAMP - i.start_time as time_ago
FROM users u
left join latest_visits v
on u.id = v.user_id
right join intervals i
on v.visited_at >= i.start_time and v.visited_at < i.end_time
group by i.start_time, i.end_time
order by i.start_time desc

Last seen date updates with no other obvious activity (eg. posts read)
(Dean Taylor) #4

Count new TL1, TL2, TL3 users past 12 months

Number of members added to Trust Level (TL) 1, 2 and 3 month-by-month over the past year.

Answering the question how many users actually sticking around and interacting with the community and progressing through the trust levels.

Topic Participation

Number (by month) of topics started, posts to new topics, posts to old topics, topics with a response, topics without a response, percentage response rate.

Optionally exclude groups of users and / or impersonate a single users view of topics.


RGSoC 2016: Visual Forum Analytics Community Discussion
(Felix Freiberger) #5

Likes from the team

This query assumes there is a group called team, and gives you the likes other users have received, split into likes from the team and likes from others:

SELECT
    pl.user_id,
    SUM(pl.team_likes) as team_likes,
    SUM(pl.student_likes) as student_likes,
    SUM(pl.team_likes + pl.student_likes) as likes
FROM (
    SELECT -- count likes per post
        p.id as post_id_workaround,
        p.user_id as user_id,
        (
            SELECT count(*)
            FROM post_actions pa
            WHERE
                pa.post_id = p.id
                AND post_action_type_id = (
                            SELECT id FROM post_action_types WHERE name_key = 'like'
                )
                AND pa.user_id IN (
                    SELECT gu.user_id
                    FROM group_users gu
                    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
                )
        ) as team_likes,
        (
            SELECT count(*)
            FROM post_actions pa
            WHERE
                pa.post_id = p.id
                AND post_action_type_id = (
                            SELECT id FROM post_action_types WHERE name_key = 'like'
                )
                AND pa.user_id NOT IN (
                    SELECT gu.user_id
                    FROM group_users gu
                    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
                )
        ) as student_likes
    FROM badge_posts p
) AS pl
WHERE pl.user_id NOT IN (
    SELECT gu.user_id
    FROM group_users gu
    WHERE gu.group_id = ( SELECT id FROM groups WHERE name ilike 'team' ) 
)
GROUP BY pl.user_id
ORDER BY likes DESC

(Dave McClure) #6

Recently Read Topics by User

Show the topics with that have been opened by a given user in the past N days, sorted by the amount of time the user has spent in that topic. (requested on feverbee)

-- [params]
-- integer :user = 1
-- integer :since_days_ago = 7

with topic_timing as (
  select user_id, topic_id, sum(msecs) / 1000 as seconds
  from post_timings
  where user_id = :user
  group by user_id, topic_id
)
SELECT tv.topic_id,
    tv.user_id,
    tv.viewed_at,
    tt.seconds
from topic_views tv
left join topic_timing tt
on tv.topic_id = tt.topic_id
and tv.user_id = tt.user_id
where tv.user_id = :user
and viewed_at + :since_days_ago > CURRENT_TIMESTAMP
order by seconds desc

(Erlend Sogge Heggen) #7

6 posts were split to a new topic: “Recently Read Topics by User” query returns Parameter declaration error


(Mittineague) #13

While working on this

I was looking for a way to find topics that were created as a result of a moderator splitting posts into a new topic. (i.e. off-topic posts that deserved their own topic)

It wasn’t as easy as I had hoped. I looked at various “topic” and “post” tables with no success.

Being sure I could find a lead to what I needed in Admin -> Logs -> Staff Actions I was disappointed to find that “split” data was not there (not that it should, hopefully it doesn’t happen all that often)

The only place I could find the id of a topic created this way is in “small-action” posts as part of a link.

Getting the id portion of the string to an integer that can be used is a bit involved.

regexp_matches returns a string array
array_to_string converts the array to a string
CAST converts the string to an integer

WITH new_topics AS (
 SELECT 
 CAST( array_to_string(regexp_matches(posts.raw, '([\d]+)(?:\))$', 'g'), '') AS integer) AS new_topic_id
 FROM posts 
 WHERE action_code LIKE 'split_topic'
 AND raw LIKE '%posts were split to a new topic%'
)
SELECT topics.title 
FROM topics, new_topics
WHERE topics.id = new_topics.new_topic_id

(Mittineague) #14

Member Uploads

I put together this query to help find members that post a lot of uploads that potentially might lead to a problem.

Ordered by total upload weight per member

WITH heavy_uploads AS ( SELECT 
    ( SUM(uploads.filesize) / 1024) AS sum_kb
  , COUNT(uploads.filesize) AS upload_count
  , ( (SUM(uploads.filesize) / COUNT(uploads.filesize)) / 1024) AS avg_weight_kb
  , uploads.user_id 
  FROM uploads
  GROUP BY uploads.user_id
)
SELECT heavy_uploads.sum_kb
 , heavy_uploads.upload_count
 , heavy_uploads.avg_weight_kb
 , heavy_uploads.user_id
FROM heavy_uploads 
WHERE heavy_uploads.sum_kb > 100
ORDER BY heavy_uploads.sum_kb DESC

(the three Grimm members uploaded animated woge gifs)


Would it be worth resizing uploaded images (to save space)?
(Alessio Fattorini) #15

Users active in last 30 days? Active = at least a post in the last 30 days.


(Kane York) #16
select username
from users
where last_posted_at > current_timestamp - interval '30' day

How to measure active users?
SQL query for users who have posted on a date (or date range)
(Tom Newsom) #17

Or just /users?period=monthly surely?

EDIT: No that also counts people who just read. sorry!


#18

How would you adjust this with parameters so that previous months can also be checked? Eg: read between 62 and 31 days ago (or similar)?


(Mittineague) #19

I haven’t figured out a “between” way of doing this, but using “params” can provide a selectable “back to”

-- [params]
-- int :interval = 30

select username
from users
where last_posted_at > current_timestamp - interval ':interval' day

(Dave McClure) #20

should be able to do something like this (starting where @Mittineague left off):

-- [params]
-- int :until_days_ago = 30
-- int :since_days_ago = 60

select username
from users
where last_posted_at > current_timestamp - interval ':since_days_ago' day 
and last_posted_at < current_timestamp - interval ':until_days_ago' day

(Mittineague) #21

Thanks, the call of Labor Day burgers was distracting me and all I could think of was finding a way to put current_timestamp into the params and getting to the burgers. At times my stomach overpowers my brain.


#22

Ok, so what about ‘posted during that period’ – not last posted during that period.
i.e. a count of active (made at least one post) members during that period?


#23

Got what I need (thanks @meglio) so updating this for future posterity.

-- [params]
-- date :date_from
-- date :date_to
-- int  :min_posts = 1

WITH user_activity AS (
    SELECT p.user_id, count (p.id) as posts_count
    FROM posts p
    LEFT JOIN topics t ON t.id = p.topic_id
    WHERE p.created_at::date BETWEEN :date_from::date AND :date_to::date
        AND t.deleted_at IS NULL
        AND t.visible = TRUE
        AND t.closed = FALSE
        AND t.archived = FALSE
        AND t.archetype = 'regular'
        AND p.deleted_at IS NULL
        
    GROUP BY p.user_id
)
SELECT COUNT(user_id)
FROM user_activity
WHERE posts_count >= :min_posts

Active users for specific months
Is there a Data Explorer for Dummies article somewhere?
How can I count posts in last month by a specific group of users?
Strange problem with Data Explorer
Strange problem with Data Explorer
What's the difference between "User Visits" and "Active Users"
(Felix Freiberger) #24

List of all Members of a Group with Custom Field

We use this query to get a list of the name and matriculation number of all team members:

SELECT users.name, user_custom_fields.value as matriculation
FROM users
JOIN group_users ON users.id = group_users.user_id
JOIN groups ON groups.id = group_users.group_id
JOIN user_custom_fields ON users.id = user_custom_fields.user_id
WHERE groups.name = 'Team'
AND user_custom_fields.name = 'user_field_2'

Simply insert the group name and the custom field id in the last two names.


(Joshua Rosenfeld) #25

Anyone able to provide the query for all users with a custom title (not a badge granted title)?