This is super helpful.
How would you go about augmenting this to actually retrieve those posts?
This is super helpful.
How would you go about augmenting this to actually retrieve those posts?
Would something like this meet your needs?
-- [params]
-- date :start_date
-- date :end_date
SELECT u.id AS user_id, p.id AS post_id, p.created_at
FROM users u
JOIN user_stats us
ON u.id = us.user_id
JOIN posts p
ON u.id = p.user_id
WHERE p.created_at = us.first_post_created_at
AND us.first_post_created_at BETWEEN :start_date::date AND :end_date::date
That’s perfect. Thanks a ton.
Can I ask a bit of a SQL newbie question? What are the u
and the us
doing on lines two and three? Is it sort of functioning like AS where users
can now be referred to as the prefix u
and user_stats
as the prefix us
?
No problem!
That’s correct. The u
and us
are table aliases. You can read more about them here: PostgreSQL Alias
It’s essentially shorthand. It allows your to use us.user_id
instead of user_stats.user_id
to refer to the user_id
column in the user_stats
table. It’s particularly helpful with long table names and big queries.
How to find total number of topics with 0 tags in it
(only 1 category should be checked - I’d like to enter the name of the category before doing the search)
Is it possible to list active, non-suspended users who have the Map Location field in their profile blank? This field is from the Locations Plugin.
Yes, adapt this query: https://github.com/SidVal/discourse-data-explorer/blob/queries/queries/check-url-user-profile.sql (check version 1).
I’ve got a working query now which gives me details on the geo_location field and another custom field when geo is set to NOT NULL.
I still need to limit to non-suspended and active users. How would I do that?
WITH geo AS (
SELECT ucf.value,
ucf.user_id
FROM user_custom_fields ucf
WHERE ucf.name = 'geo_location'
),
user_field_2 AS (
SELECT ucf.value,
ucf.user_id
FROM user_custom_fields ucf
WHERE ucf.name = 'user_field_2'
),
target_fields AS (
SELECT
u.id AS user_id,
uf1.value AS geo,
uf2.value AS user_field_2
FROM users u
LEFT JOIN geo uf1
ON uf1.user_id = u.id
LEFT JOIN user_field_2 uf2
ON uf2.user_id = u.id
JOIN user_stats us ON u.id = us.user_id
)
SELECT * FROM target_fields tf
WHERE tf.geo IS NULL
ORDER BY tf.user_id
LIMIT 25
How could I calculate the number of active users per hour of the day averaged over the last month?
I have been working on queries that look at activity by groups.
user_visits
table only captures one record per unique user_id per day given visited_at
is a DATE and not a DATETIME or TIMESTAMP even if they happen to visit multiple times in a day, from multiple device types. When I run my query by DAY, I see the total unique members are equal to the total number of visits. Can someone confirm if my assumption is correct? If it is, the follow-up question becomes what happens to the value for mobile
if the user visits three times in a day from a laptop, desktop, then phone, in any order?AVG(uv.posts_read)
is not the same value as SUM(uv.posts_read)/COUNT(DISTINCT(uv.user_id)
, so I would be curious to hear where my query is making a wrong turn or if I am interpreting the user_visits
table or fields incorrectly. Can someone chime in on the difference between those two calculations?
-- [params]
-- null string :group_name = YourGroupName
-- date :start_date = 2019/09/01
-- date :end_date = 2019/10/01
-- null string :frame = day
with mobile as (
SELECT uv.id,
count(DISTINCT(uv.user_id)) as UniqueMobile,
date_part(:frame, uv.visited_at::date) as Day,
g.name as GroupName
from user_visits uv
join users u on uv.user_id = u.id
join group_users gu on gu.user_id = u.id
join groups g on g.id = gu.group_id
where mobile = true
and uv.visited_at >= :start_date::date
and uv.visited_at < :end_date::date
and g.name = :group_name
GROUP BY GroupName, Day, uv.id
),
desktop as (
SELECT uv.id,
count(DISTINCT(uv.user_id)) as UniqueDesktop,
date_part(:frame, uv.visited_at::date) as Day,
g.name as GroupName
from user_visits uv
join users u on uv.user_id = u.id
join group_users gu on gu.user_id = u.id
join groups g on g.id = gu.group_id
where mobile = false
and uv.visited_at >= :start_date::date
and uv.visited_at < :end_date::date
and g.name = :group_name
GROUP BY GroupName, Day, uv.id
)
SELECT
date_part(:frame, uv.visited_at::date) as VisitDate,
count(DISTINCT(uv.user_id)) as UniqueMembers,
count(uv.id) as AllVisits,
count(m.UniqueMobile) as MobileVisits,
count(d.UniqueDesktop) as DesktopVisits,
round((count(DISTINCT(uv.user_id)) * 100.0) / groups.user_count, 2) as Percent,
round(avg(uv.posts_read),2) as "Posts Read (avg Visit)",
(interval '1' minute * ROUND((cast(AVG(uv.time_read) as decimal)/60), 2)) as "Read Time (avg Visit)",
(SUM(uv.posts_read)/count(DISTINCT(uv.user_id))) as "Posts Read (avg Member)",
date_trunc('second',(interval '1' minute * (ROUND((cast(SUM(uv.time_read) as decimal)/60), 2)/(count(DISTINCT(uv.user_id)))))) as "Read Time (avg Member)"
FROM users
join group_users on group_users.user_id = users.id
join groups on group_users.group_id = groups.id
LEFT join user_visits uv on uv.user_id = users.id
LEFT JOIN mobile m ON m.id = uv.id
LEFT JOIN desktop d ON d.id = uv.id
where groups.name = :group_name
AND uv.visited_at::date >= :start_date
and uv.visited_at::date < :end_date
group by VisitDate, groups.user_count
order by VisitDate asc
Yes, this is correct.
If the user reads posts when they visit from a new device, the mobile
column will be updated based on the last device that they visited on. For example. If a user starts the day by reading 2 posts on a desktop browser, a user_visits
entry will be created for the user’s id with posts_read
set to 2 and mobile
set to false
. If the user then logs in on a mobile device and reads 3 more posts, the user_visits
entry for the day will be updated to posts_read: 5
and mobile: true
. This is something you can test with the Data Explorer, you just have to be sure that the user is reading posts they have not read before.
Thanks, any ideas on the last question? I am a fan of good data over bad, and that raises eyebrows to be similar and yet not the same, especially given your explanation that there is only one uv.id
per uv.user_id
in user_visits
per uv.visited_at
.
It may not be super helpful, but here is a sample from my data:
VisitDate | Posts Read (avg Visit) | Read (avg Member) |
---|---|---|
13 | 1.18 | 1 |
14 | 4.15 | 4 |
15 | 7.18 | 7 |
16 | 6.15 | 6 |
When you divide with integers, postgres will return an integer. Try something like SUM(posts_read)::float / COUNT(DISTINCT(user_id))
and see if you’re still finding a difference. You may need to round the result to two places.
I’m looking for a query that will give me a list of topics ordered by the number of different users that have participated in the topic.
Not sure how well I phrased that, this is the number I mean:
How can modify “Group Members Reply Count” to count likes received from the members of the group and visits?
My goal is to measure the “activity” of the members from the join date so far
I answered myself just used this and it’s amazing!
Extracted group_id from group.json
Just a question. How can I modify the script to select a specified range of dates?
Is there a way or query where we can see the most valuable topics and most valued users for our forum?
I couldn’t find ‘Score’ and ‘Percent Ranks’ in any default reports, though I’m sure they are somewhere (my searching ‘Meta’ also didn’t work).