What cool data explorer queries have you come up with?

This is super helpful.

How would you go about augmenting this to actually retrieve those posts?

Would something like this meet your needs?

-- [params]
-- date :start_date
-- date :end_date

SELECT u.id AS user_id, p.id AS post_id, p.created_at
FROM users u
JOIN user_stats us
ON u.id = us.user_id
JOIN posts p
ON u.id = p.user_id
WHERE p.created_at = us.first_post_created_at
AND us.first_post_created_at BETWEEN :start_date::date AND :end_date::date
4 Likes

That’s perfect. Thanks a ton.

Can I ask a bit of a SQL newbie question? What are the u and the us doing on lines two and three? Is it sort of functioning like AS where users can now be referred to as the prefix u and user_stats as the prefix us?

2 Likes

No problem!

That’s correct. The u and us are table aliases. You can read more about them here: PostgreSQL Alias

It’s essentially shorthand. It allows your to use us.user_id instead of user_stats.user_id to refer to the user_id column in the user_stats table. It’s particularly helpful with long table names and big queries.

6 Likes

How to find total number of topics with 0 tags in it

(only 1 category should be checked - I’d like to enter the name of the category before doing the search)

1 Like

Is it possible to list active, non-suspended users who have the Map Location field in their profile blank? This field is from the Locations Plugin.

1 Like

Yes, adapt this query: discourse-data-explorer/check-url-user-profile.sql at queries · SidVal/discourse-data-explorer · GitHub (check version 1).

2 Likes

I’ve got a working query now which gives me details on the geo_location field and another custom field when geo is set to NOT NULL. :+1:

I still need to limit to non-suspended and active users. How would I do that?

Summary
WITH geo AS (
    SELECT ucf.value,
    ucf.user_id
    FROM user_custom_fields ucf
    WHERE ucf.name = 'geo_location'
),
user_field_2 AS (
    SELECT ucf.value,
    ucf.user_id
    FROM user_custom_fields ucf
    WHERE ucf.name = 'user_field_2'
),

target_fields AS (
    SELECT
    u.id AS user_id,
    uf1.value AS geo,
    uf2.value AS user_field_2
    FROM users u
    LEFT JOIN geo uf1
    ON uf1.user_id = u.id
    LEFT JOIN user_field_2 uf2
    ON uf2.user_id = u.id
    JOIN user_stats us ON u.id = us.user_id
)

SELECT * FROM target_fields tf
WHERE tf.geo IS NULL
ORDER BY tf.user_id
LIMIT 25

How could I calculate the number of active users per hour of the day averaged over the last month?

List TOP 20 Groups?
https://meta.discourse.org/g

Group Visit Activity

I have been working on queries that look at activity by groups.

  • Unique visitors
  • Total Visits
  • Mobile versus Desktop Visits
  • Percent of Group Membership Visiting
  • Avg Read Time per Visit
  • Avg Posts Read per Visit
  • Avg Read Time per Member
  • Avg Posts Read per Member

Questions

  1. I am suspecting that the user_visits table only captures one record per unique user_id per day given visited_at is a DATE and not a DATETIME or TIMESTAMP even if they happen to visit multiple times in a day, from multiple device types. When I run my query by DAY, I see the total unique members are equal to the total number of visits. Can someone confirm if my assumption is correct? If it is, the follow-up question becomes what happens to the value for mobile if the user visits three times in a day from a laptop, desktop, then phone, in any order?
  2. Oddly, I also see that the AVG(uv.posts_read) is not the same value as SUM(uv.posts_read)/COUNT(DISTINCT(uv.user_id), so I would be curious to hear where my query is making a wrong turn or if I am interpreting the user_visits table or fields incorrectly. Can someone chime in on the difference between those two calculations?

Query:

Summary
-- [params]
-- null string :group_name = YourGroupName
-- date :start_date = 2019/09/01
-- date :end_date = 2019/10/01
-- null string :frame = day

with mobile as (
SELECT uv.id,
    count(DISTINCT(uv.user_id)) as UniqueMobile,
    date_part(:frame, uv.visited_at::date) as Day,
    g.name as GroupName
from user_visits uv
join users u on uv.user_id = u.id
join group_users gu on gu.user_id = u.id
join groups g on g.id = gu.group_id
where mobile = true
    and uv.visited_at >= :start_date::date
    and uv.visited_at < :end_date::date
    and g.name = :group_name
GROUP BY GroupName, Day, uv.id
    ), 
    
desktop as (
SELECT uv.id,
    count(DISTINCT(uv.user_id)) as UniqueDesktop,
    date_part(:frame, uv.visited_at::date) as Day,
    g.name as GroupName
from user_visits uv
join users u on uv.user_id = u.id
join group_users gu on gu.user_id = u.id
join groups g on g.id = gu.group_id
where mobile = false
    and uv.visited_at >= :start_date::date
    and uv.visited_at < :end_date::date
    and g.name = :group_name
GROUP BY GroupName, Day, uv.id
    )

   SELECT 
    date_part(:frame, uv.visited_at::date) as VisitDate,
    count(DISTINCT(uv.user_id)) as UniqueMembers, 
    count(uv.id) as AllVisits,
    count(m.UniqueMobile) as MobileVisits,
    count(d.UniqueDesktop) as DesktopVisits,
    round((count(DISTINCT(uv.user_id)) * 100.0) / groups.user_count, 2) as Percent,
    round(avg(uv.posts_read),2) as "Posts Read (avg Visit)",
    (interval '1' minute * ROUND((cast(AVG(uv.time_read) as decimal)/60), 2)) as "Read Time (avg Visit)",
    (SUM(uv.posts_read)/count(DISTINCT(uv.user_id))) as "Posts Read (avg Member)",
    date_trunc('second',(interval '1' minute * (ROUND((cast(SUM(uv.time_read) as decimal)/60), 2)/(count(DISTINCT(uv.user_id)))))) as "Read Time (avg Member)"

FROM users
join group_users on group_users.user_id = users.id
join groups on group_users.group_id = groups.id
LEFT join user_visits uv on uv.user_id = users.id
LEFT JOIN mobile m ON m.id = uv.id
LEFT JOIN desktop d ON d.id = uv.id
    where groups.name = :group_name
    AND uv.visited_at::date >= :start_date
    and uv.visited_at::date < :end_date
    
group by VisitDate, groups.user_count 

order by VisitDate asc
1 Like

Yes, this is correct.

If the user reads posts when they visit from a new device, the mobile column will be updated based on the last device that they visited on. For example. If a user starts the day by reading 2 posts on a desktop browser, a user_visits entry will be created for the user’s id with posts_read set to 2 and mobile set to false. If the user then logs in on a mobile device and reads 3 more posts, the user_visits entry for the day will be updated to posts_read: 5 and mobile: true. This is something you can test with the Data Explorer, you just have to be sure that the user is reading posts they have not read before.

3 Likes

Thanks, any ideas on the last question? I am a fan of good data over bad, and that raises eyebrows to be similar and yet not the same, especially given your explanation that there is only one uv.id per uv.user_id in user_visits per uv.visited_at.

It may not be super helpful, but here is a sample from my data:

VisitDate Posts Read (avg Visit) Read (avg Member)
13 1.18 1
14 4.15 4
15 7.18 7
16 6.15 6

When you divide with integers, postgres will return an integer. Try something like SUM(posts_read)::float / COUNT(DISTINCT(user_id)) and see if you’re still finding a difference. You may need to round the result to two places.

2 Likes

I’m looking for a query that will give me a list of topics ordered by the number of different users that have participated in the topic.

Not sure how well I phrased that, this is the number I mean:

How can modify “Group Members Reply Count” to count likes received from the members of the group and visits?
My goal is to measure the “activity” of the members from the join date so far

I answered myself just used this and it’s amazing!
Extracted group_id from group.json

Just a question. How can I modify the script to select a specified range of dates?

3 posts were split to a new topic: Updating likes in user_stats via SQL

Is there a way or query where we can see the most valuable topics and most valued users for our forum?

I couldn’t find ‘Score’ and ‘Percent Ranks’ in any default reports, though I’m sure they are somewhere (my searching ‘Meta’ also didn’t work).

Check this: discourse-data-explorer/score-threads.sql at queries · SidVal/discourse-data-explorer · GitHub

3 Likes