What cool data explorer queries have you come up with?

Can anyone help me write a query that tells me what actions my members are taking in the first 7 days they’re part of the community?

So - a member accepts their invite and logs on for the first time. What do they do in the first 7 days? Could this be aggregated to tell me what students spend the most time doing? Are they posting or creating new topics? Liking things? Lurking?

I’m really struggling to put together something that reflects this.

1 Like

I don’t think this is possible via Data Explorer.
I think you need to analyze something like google analytics, or hotjar to track your members.

1 Like

Does anyone have a query to select the top ten posts by number of Likes, and include the number of likes along side the post titles/links?

Please @Richie, can you open a new topic and use #plugin:data-explorer category?
BTW, check the “Fetch top 10 posts by likes received in the last month” before open that topic :+1:

2 Likes

With a bit of help from a developer (@aluxian), I’ve adapted another one of Hawk’s reports here, to calculate what % of users, who joined within a certain timeframe, have posted X times, within a certain timeframe.

You could use this to see whether you’re improving onboarding for new users enough to make them feel comfortable posting, for example. @Richard_Millington shares some stats for benchmarking your community using these metrics (at 2:30) here.

Users who’ve become active

-- [params]
-- date :user_date_from = 01.06.2019
-- date :user_date_to = 30.06.2019
-- date :post_date_from = 01.06.2019
-- date :post_date_to = 30.06.2019
-- int  :min_posts = 1


WITH user_activity AS (
    SELECT p.user_id, COUNT(p.id) as posts_count
    FROM posts p
	LEFT JOIN users u ON u.id = p.user_id
    LEFT JOIN topics t ON t.id = p.topic_id
    WHERE u.created_at::date BETWEEN :user_date_from::date AND :user_date_to::date
	  AND p.created_at::date BETWEEN :post_date_from::date AND :post_date_to::date
      AND p.deleted_at IS NULL
	  AND t.deleted_at IS NULL
      AND t.visible = TRUE
      AND t.closed = FALSE
      AND t.archived = FALSE
      AND t.archetype = 'regular'
        
    GROUP BY p.user_id
)

SELECT (t1.new_users_with_posts::float) / (t2.new_users::float) * 100 as percent_users_with_posts

FROM ( SELECT COUNT(user_id) as new_users_with_posts
       FROM user_activity
       WHERE posts_count >= :min_posts )
       as t1 
cross join
     ( SELECT COUNT(id) as new_users
       FROM users u
       WHERE u.created_at::date BETWEEN :user_date_from::date AND :user_date_to::date )
       as t2
12 Likes

Hi there. I’m a SQL newb. I would like to be able to pull a list of first time posters by username within a defined time period.

Note this is the first time they post, not joined. Our use case is different than most so post is important.

Does anyone have something like that? Appreciate any assistance!

THANK YOU!

1 Like

This should do the trick @ThunderThighs. Add it to your Data Explorer, fill in the start and end date, and you’re off to the races. Dates must match a supported format, one of which is YYYY-MM-DD.

Users with first post within period

-- [params]
-- date :start_date
-- date :end_date

SELECT username
FROM users u
JOIN user_stats us
ON u.id = us.user_id
WHERE us.first_post_created_at BETWEEN :start_date::date AND :end_date::date
13 Likes

Here’s one that should return all non-PM topics that have not received a staff reply:

SELECT t.id topic_id, category_id
FROM topics t
WHERE t.deleted_at IS NULL
  AND t.category_id IS NOT NULL
  AND t.id NOT IN (
    SELECT p.topic_id
    FROM posts p 
    JOIN users u ON p.user_id = u.id
    WHERE  u.admin = 't' OR u.moderator = 't'
      AND p.post_number > 1
  )
7 Likes

This is super helpful.

How would you go about augmenting this to actually retrieve those posts?

Would something like this meet your needs?

-- [params]
-- date :start_date
-- date :end_date

SELECT u.id AS user_id, p.id AS post_id, p.created_at
FROM users u
JOIN user_stats us
ON u.id = us.user_id
JOIN posts p
ON u.id = p.user_id
WHERE p.created_at = us.first_post_created_at
AND us.first_post_created_at BETWEEN :start_date::date AND :end_date::date
4 Likes

That’s perfect. Thanks a ton.

Can I ask a bit of a SQL newbie question? What are the u and the us doing on lines two and three? Is it sort of functioning like AS where users can now be referred to as the prefix u and user_stats as the prefix us?

2 Likes

No problem!

That’s correct. The u and us are table aliases. You can read more about them here: PostgreSQL Alias

It’s essentially shorthand. It allows your to use us.user_id instead of user_stats.user_id to refer to the user_id column in the user_stats table. It’s particularly helpful with long table names and big queries.

6 Likes

How to find total number of topics with 0 tags in it

(only 1 category should be checked - I’d like to enter the name of the category before doing the search)

1 Like

Is it possible to list active, non-suspended users who have the Map Location field in their profile blank? This field is from the Locations Plugin.

1 Like

Yes, adapt this query: discourse-data-explorer/check-url-user-profile.sql at queries · SidVal/discourse-data-explorer · GitHub (check version 1).

2 Likes

I’ve got a working query now which gives me details on the geo_location field and another custom field when geo is set to NOT NULL. :+1:

I still need to limit to non-suspended and active users. How would I do that?

Summary
WITH geo AS (
    SELECT ucf.value,
    ucf.user_id
    FROM user_custom_fields ucf
    WHERE ucf.name = 'geo_location'
),
user_field_2 AS (
    SELECT ucf.value,
    ucf.user_id
    FROM user_custom_fields ucf
    WHERE ucf.name = 'user_field_2'
),

target_fields AS (
    SELECT
    u.id AS user_id,
    uf1.value AS geo,
    uf2.value AS user_field_2
    FROM users u
    LEFT JOIN geo uf1
    ON uf1.user_id = u.id
    LEFT JOIN user_field_2 uf2
    ON uf2.user_id = u.id
    JOIN user_stats us ON u.id = us.user_id
)

SELECT * FROM target_fields tf
WHERE tf.geo IS NULL
ORDER BY tf.user_id
LIMIT 25

How could I calculate the number of active users per hour of the day averaged over the last month?

List TOP 20 Groups?
https://meta.discourse.org/g

Group Visit Activity

I have been working on queries that look at activity by groups.

  • Unique visitors
  • Total Visits
  • Mobile versus Desktop Visits
  • Percent of Group Membership Visiting
  • Avg Read Time per Visit
  • Avg Posts Read per Visit
  • Avg Read Time per Member
  • Avg Posts Read per Member

Questions

  1. I am suspecting that the user_visits table only captures one record per unique user_id per day given visited_at is a DATE and not a DATETIME or TIMESTAMP even if they happen to visit multiple times in a day, from multiple device types. When I run my query by DAY, I see the total unique members are equal to the total number of visits. Can someone confirm if my assumption is correct? If it is, the follow-up question becomes what happens to the value for mobile if the user visits three times in a day from a laptop, desktop, then phone, in any order?
  2. Oddly, I also see that the AVG(uv.posts_read) is not the same value as SUM(uv.posts_read)/COUNT(DISTINCT(uv.user_id), so I would be curious to hear where my query is making a wrong turn or if I am interpreting the user_visits table or fields incorrectly. Can someone chime in on the difference between those two calculations?

Query:

Summary
-- [params]
-- null string :group_name = YourGroupName
-- date :start_date = 2019/09/01
-- date :end_date = 2019/10/01
-- null string :frame = day

with mobile as (
SELECT uv.id,
    count(DISTINCT(uv.user_id)) as UniqueMobile,
    date_part(:frame, uv.visited_at::date) as Day,
    g.name as GroupName
from user_visits uv
join users u on uv.user_id = u.id
join group_users gu on gu.user_id = u.id
join groups g on g.id = gu.group_id
where mobile = true
    and uv.visited_at >= :start_date::date
    and uv.visited_at < :end_date::date
    and g.name = :group_name
GROUP BY GroupName, Day, uv.id
    ), 
    
desktop as (
SELECT uv.id,
    count(DISTINCT(uv.user_id)) as UniqueDesktop,
    date_part(:frame, uv.visited_at::date) as Day,
    g.name as GroupName
from user_visits uv
join users u on uv.user_id = u.id
join group_users gu on gu.user_id = u.id
join groups g on g.id = gu.group_id
where mobile = false
    and uv.visited_at >= :start_date::date
    and uv.visited_at < :end_date::date
    and g.name = :group_name
GROUP BY GroupName, Day, uv.id
    )

   SELECT 
    date_part(:frame, uv.visited_at::date) as VisitDate,
    count(DISTINCT(uv.user_id)) as UniqueMembers, 
    count(uv.id) as AllVisits,
    count(m.UniqueMobile) as MobileVisits,
    count(d.UniqueDesktop) as DesktopVisits,
    round((count(DISTINCT(uv.user_id)) * 100.0) / groups.user_count, 2) as Percent,
    round(avg(uv.posts_read),2) as "Posts Read (avg Visit)",
    (interval '1' minute * ROUND((cast(AVG(uv.time_read) as decimal)/60), 2)) as "Read Time (avg Visit)",
    (SUM(uv.posts_read)/count(DISTINCT(uv.user_id))) as "Posts Read (avg Member)",
    date_trunc('second',(interval '1' minute * (ROUND((cast(SUM(uv.time_read) as decimal)/60), 2)/(count(DISTINCT(uv.user_id)))))) as "Read Time (avg Member)"

FROM users
join group_users on group_users.user_id = users.id
join groups on group_users.group_id = groups.id
LEFT join user_visits uv on uv.user_id = users.id
LEFT JOIN mobile m ON m.id = uv.id
LEFT JOIN desktop d ON d.id = uv.id
    where groups.name = :group_name
    AND uv.visited_at::date >= :start_date
    and uv.visited_at::date < :end_date
    
group by VisitDate, groups.user_count 

order by VisitDate asc
1 Like

Yes, this is correct.

If the user reads posts when they visit from a new device, the mobile column will be updated based on the last device that they visited on. For example. If a user starts the day by reading 2 posts on a desktop browser, a user_visits entry will be created for the user’s id with posts_read set to 2 and mobile set to false. If the user then logs in on a mobile device and reads 3 more posts, the user_visits entry for the day will be updated to posts_read: 5 and mobile: true. This is something you can test with the Data Explorer, you just have to be sure that the user is reading posts they have not read before.

3 Likes