Attempting to create Monthly Leaderboard query - head exploding


(Jeremy M) #1

I’m trying to build a query for a leaderboard comprising the things that matter to us during the last month:

  • Likes received
  • Topics solved
  • Badges gained

I started with the User Participation query, stripped out what I didn’t want/need, and then started with adding badges gained. The badges gained query by itself looks like:

-- [params] 
-- string :interval = 1 month 
-- string :trunc = month 
SELECT user_id, count(id) 
FROM user_badges 
WHERE granted_at > date_trunc(:trunc, CURRENT_TIMESTAMP - INTERVAL :interval) 
     AND granted_at < date_trunc(:trunc, CURRENT_TIMESTAMP) 
GROUP BY user_id 
ORDER BY count DESC

That works just fine, but then when I try to convert it into the modified User Participation query, that’s where my head explodes:

-- [params]
-- string :interval = 1 month
-- string :trunc = month

with
t as (
  select date_trunc(:trunc, CURRENT_TIMESTAMP - INTERVAL :interval) as start,
    date_trunc(:trunc, CURRENT_TIMESTAMP) as end
),
pr as (
    select user_id, 
        count(1) as visits,
        sum(posts_read) as posts_read
    from user_visits, t
    where posts_read > 0
    and visited_at > t.start
    and visited_at < t.end
    group by user_id
),
likes as (
  select 
      post_actions.user_id as given_by_user_id, 
      posts.user_id as received_by_user_id
  from t, post_actions
  left join posts
  on post_actions.post_id = posts.id
  where post_actions.created_at > t.start
  and post_actions.created_at < t.end
  and post_action_type_id = 2

),
lr as (
  select received_by_user_id as user_id, 
      count(1) as likes_received
  from likes
  group by user_id
),
e as (
  select email, user_id
  from user_emails u
  where u.primary = true
),
b as (
    SELECT b.user_id, count(1) as badges
    FROM user_badges b, t, pr
    WHERE granted_at > t.start
        AND granted_at < t.end
        AND pr.user_id = b.user_id
    GROUP BY b.user_id
)
select pr.user_id,
       username,
       name,
       email,
       visits, 
       coalesce(likes_received,0) as likes_received,
       coalesce(badges,0) as badges_recv
from pr
left join lr using (user_id)
left join e using (user_id)
left join users on pr.user_id = users.id
order by 
  likes_received desc,
  visits desc

I’m getting the following error (and I haven’t even started trying to add in Topics Solved):

PG::UndefinedColumn: ERROR:  column "badges" does not exist
LINE 65:        coalesce(badges,0) as badges_recv
                         ^

Which doesn’t really make sense to me since badges is clearly identified… So I am a bit stuck… Any help finishing this query with adding badges received and topics solved would be appreciated!


How do I see who is marking solved and on what?
(Dave McClure) #2

I think you just need to add a join in there:

left join b on b.user_id = users.id

(Jeremy M) #3

Ahh hell yes! Thanks! Now I need to figure out adding the topics solved to this


(Jeremy M) #4

ok so to come back to this as I have moved down the path a fair bit… trying to restrict to only a specific group, and then to all but a specific group (two separate queries).

-- User participation
-- [params]
-- string :interval = 1 month
-- string :trunc = month

with
t as (
  select date_trunc(:trunc, CURRENT_TIMESTAMP - INTERVAL :interval) as start,
    date_trunc(:trunc, CURRENT_TIMESTAMP) as end
),
pr as (
    select user_id, 
        count(1) as visits,
        sum(posts_read) as posts_read
    from user_visits, t
    where posts_read > 0
    and visited_at > t.start
    and visited_at < t.end
    group by user_id
),
likes as (
  select 
      post_actions.user_id as given_by_user_id, 
      posts.user_id as received_by_user_id
  from t, post_actions
  left join posts
  on post_actions.post_id = posts.id
  where post_actions.created_at > t.start
  and post_actions.created_at < t.end
  and post_action_type_id = 2

),
lr as (
  select received_by_user_id as user_id, 
      count(1) as likes_received
  from likes
  group by user_id
),
e as (
  select email, user_id
  from user_emails u
  where u.primary = true
),
r as (
SELECT p.user_id, count(p.id) as replies
    FROM posts p, t, pr
    WHERE p.post_type = 1 AND
        p.post_number > 1 AND
            (p.created_at > t.start and 
            p.created_at < t.end) and
            p.user_id IN 
                (select u.id 
                from users u
                where u.primary_group_id = 41) AND
        and p.user_id not in ('-1', '-2') 
        AND pr.user_id = p.user_id
        AND p.topic_id not in ('4','24','26','32','33','34','35','36','37','38','39','40')
GROUP BY p.user_id),
b as (
    SELECT b.user_id, count(1) as badges
    FROM user_badges b, t, pr
    WHERE granted_at > t.start
        AND granted_at < t.end
        AND pr.user_id = b.user_id
    GROUP BY b.user_id
)
select pr.user_id,
       username,
       name,
       email,
       visits, 
       coalesce(likes_received,0) as likes_recv,
       coalesce(replies,0) as replies_made,
       coalesce(badges,0) as badges_recv,
       ((coalesce(likes_received,0) * 2) + (coalesce(replies,0) * 1) + (coalesce(badges,0) * 3)) as total_points
from pr
left join lr using (user_id)
left join e using (user_id)
left join users on pr.user_id = users.id
left join b on b.user_id = users.id
left join r on r.user_id = users.id
order by 
  total_points desc NULLS LAST

Not sure where I would put the bits for that?


(Jeremy M) #5

pinging @Simon_Cossar to see if he has any thoughts here


(Simon Cossar) #6

I think you could create another WITH query to get the users who belong to a group, and then do an inner join to the in_group table in your final query. This should exclude any records that don’t have a user_id in the in_group table

-- returns user_ids for a named group

WITH in_group AS (
SELECT
gu.user_id
FROM group_users gu
JOIN
groups g
ON g.id = gu.group_id
WHERE g.name = :group_name
)

If you want another query that excludes the members of a group, you could do something like this

not_in_group AS (
SELECT
u.id 
FROM users u
WHERE u.id NOT IN (SELECT * FROM in_group)
)

(Jeremy M) #7

So I added the following based on group ID instead of name

in_group AS (
    SELECT gu.user_id
    FROM group_users gu
    JOIN groups g ON g.id = gu.group_id
    WHERE g.id = 41
)

and then added in the left join

left join in_group on in_group.user_id = users.id

and the same list appears - didn’t change anything.


(Simon Cossar) #8

Using a LEFT JOIN is the problem here. You need to use an INNER JOIN so that only records that match in both tables are returned. An INNER JOIN can be written as just JOIN.

This is a helpful article: A Visual Explanation of SQL Joins.


(Jeremy M) #9

Doh… I hate joins… =)

Thanks mate!


(Jeremy M) #10

So @Simon_Cossar I have another question around this… I am breaking things down for badges, giving different weight to which badge type… and getting a weird error when I try to add things together:

PG::SyntaxError: ERROR:  syntax error at or near "+"
LINE 97: ...sce(badges1,0) * 3)) + (coalesce(badges2,0) * 2)) + (coalesc...
                                                              ^

I am using “+” elsewhere in that line and it’s fine, but as soon as I try to break it down it gives that error…

Here is the query (wall of text hidden by spoiler):

Query
-- User participation
-- [params]
-- string :trunc = Day
-- string :interval = 1 day
-- string :start = 2018-06-01 00:00:00.000000
-- string :end = 2018-06-30 23:59:59.999999

with
--t as (
--  select date_trunc(:trunc, CURRENT_TIMESTAMP - INTERVAL :interval) as start,
--    date_trunc(:trunc, CURRENT_TIMESTAMP) as end
--),
pr as (
    select user_id, 
        count(1) as visits,
        sum(posts_read) as posts_read
    from user_visits
    where posts_read > 0
    and visited_at between :start and :end
    group by user_id
),
likes as (
  select 
      post_actions.user_id as given_by_user_id, 
      posts.user_id as received_by_user_id
  from post_actions
  left join posts
  on post_actions.post_id = posts.id
  where post_actions.created_at between :start and :end
  and post_action_type_id = 2

),
lr as (
  select received_by_user_id as user_id, 
      count(1) as likes_received
  from likes
  group by user_id
),
e as (
  select email, user_id
  from user_emails u
  where u.primary = true
),
r as (
SELECT p.user_id, count(p.id) as replies
    FROM posts p, pr
    WHERE p.post_type = 1 AND
        p.post_number > 1 AND
            (p.created_at between :start and :end)
        and p.user_id not in ('-1', '-2') 
        AND pr.user_id = p.user_id
        AND p.topic_id not in ('4','24','26','32','33','34','35','36','37','38','39','40')
GROUP BY p.user_id),
b1 as (
    SELECT b1.user_id, count(1) as badges1
    FROM user_badges b1, pr
    WHERE granted_at between :start and :end
        AND pr.user_id = b1.user_id
        AND b1.badge_id in (select id from badges where badge_type_id = 1)
    GROUP BY b1.user_id
),
b2 as (
    SELECT b2.user_id, count(1) as badges2
    FROM user_badges b2, pr
    WHERE granted_at between :start and :end
        AND pr.user_id = b2.user_id
        AND b2.badge_id in (select id from badges where badge_type_id = 2)
    GROUP BY b2.user_id
),
b3 as (
    SELECT b3.user_id, count(1) as badges3
    FROM user_badges b3, pr
    WHERE granted_at between :start and :end
        AND pr.user_id = b3.user_id
        AND b3.badge_id in (select id from badges where badge_type_id = 3)
    GROUP BY b3.user_id
)
select pr.user_id,
       username,
       name,
       email,
       visits, 
       coalesce(likes_received,0) as likes_recv,
       coalesce(replies,0) as replies_made,
       coalesce(badges1,0) as badges1_recv,
       coalesce(badges2,0) as badges2_recv,
       coalesce(badges3,0) as badges3_recv,
       ((coalesce(likes_received,0) * 2) + (coalesce(replies,0) * 1) + (coalesce(badges1,0) * 3)) + (coalesce(badges2,0) * 2)) + (coalesce(badges3,0) * 1)) as total_points
from pr
left join lr using (user_id)
left join e using (user_id)
left join users on pr.user_id = users.id
left join b1 on b1.user_id = users.id
left join b2 on b2.user_id = users.id
left join b3 on b3.user_id = users.id
left join r on r.user_id = users.id
order by 
  total_points desc NULLS LAST

(Mittineague) #11

It is not a weird error, the code has a Syntax Error. Maybe you can see the issue better this way?

( 
   (coalesce(likes_received,0) * 2) 
 + (coalesce(replies,0) * 1) 
 + (coalesce(badges1,0) * 3)
) 
 + (coalesce(badges2,0) * 2)
) 
 + (coalesce(badges3,0) * 1)
) as total_points

(Jeremy M) #12

DOH! I missed my parentheses… thanks!


(Mittineague) #13

No problem, many times it just takes a second pair of eyes to see something.

I’m a big fan of readability and don’t skimp on verbosity or newlines especially when things start to get gnarly looking. I rely heavily on indentation to keep track of nesting but many prefer to use an IDE or text editor that auto-magically adds closers as soon as a starter is added.


(Jeremy M) #14

And as I am continually refining this, I am trying to restrict posts/topics from certain categories from being included in the results, specifically categories 43, and 47-50 as seen in the query below, but they’re still included:

query
-- User participation
-- [params]
-- string :interval = 1 month
-- string :trunc = month

with
t as (
  select date_trunc(:trunc, CURRENT_TIMESTAMP - INTERVAL :interval) as start,
    date_trunc(:trunc, CURRENT_TIMESTAMP) as end
),
pr as (
    select user_id, 
        count(1) as visits,
        sum(posts_read) as posts_read
    from user_visits, t
    where posts_read > 0
    and visited_at > t.start
    and visited_at < t.end
    group by user_id
),
likes as (
  select 
      post_actions.user_id as given_by_user_id, 
      posts.user_id as received_by_user_id
  from t, post_actions
  left join posts
  on post_actions.post_id = posts.id
  where post_actions.created_at > t.start
  and post_actions.created_at < t.end
  and post_action_type_id = 2

),
lr as (
  select received_by_user_id as user_id, 
      count(1) as likes_received
  from likes
  group by user_id
),
e as (
  select email, user_id
  from user_emails u
  where u.primary = true
),
r as (
SELECT p.user_id, count(p.id) as replies
    FROM posts p, t, pr
    WHERE p.post_type = 1 AND
        p.post_number > 1 AND
            (p.created_at > t.start and 
            p.created_at < t.end)
        and p.user_id not in ('-1', '-2') 
        AND pr.user_id = p.user_id
        AND p.topic_id not in ('4','24','26','32','33','34','35','36','37','38','39','40','43','47','48','49','50')
GROUP BY p.user_id),
b1 as (
    SELECT b1.user_id, count(1) as badges1
    FROM user_badges b1, t, pr
    WHERE granted_at between t.start and t.end
        AND pr.user_id = b1.user_id
        AND b1.badge_id in (select id from badges where badge_type_id = 1)
    GROUP BY b1.user_id
),
b2 as (
    SELECT b2.user_id, count(1) as badges2
    FROM user_badges b2, t, pr
    WHERE granted_at between t.start and t.end
        AND pr.user_id = b2.user_id
        AND b2.badge_id in (select id from badges where badge_type_id = 2)
    GROUP BY b2.user_id
),
b3 as (
    SELECT b3.user_id, count(1) as badges3
    FROM user_badges b3, t, pr
    WHERE granted_at between t.start and t.end
        AND pr.user_id = b3.user_id
        AND b3.badge_id in (select id from badges where badge_type_id = 3)
    GROUP BY b3.user_id
),
in_group AS (
    SELECT gu.user_id
    FROM group_users gu
    JOIN groups g ON g.id = gu.group_id
   WHERE g.id = 41
)
select pr.user_id,
       username,
       name,
       email,
       visits, 
       coalesce(likes_received,0) as likes_recv,
       coalesce(replies,0) as replies_made,
       coalesce(badges1,0) as badges1_recv,
       coalesce(badges2,0) as badges2_recv,
       coalesce(badges3,0) as badges3_recv,
       (
           (coalesce(likes_received,0) * 2) + 
           (coalesce(replies,0) * 1) + 
           (coalesce(badges1,0) * 3) + 
           (coalesce(badges2,0) * 2) + 
           (coalesce(badges3,0) * 1)
       ) as total_points
from pr
left join lr using (user_id)
left join e using (user_id)
left join users on pr.user_id = users.id
join in_group on in_group.user_id = users.id
left join b1 on b1.user_id = users.id
left join b2 on b2.user_id = users.id
left join b3 on b3.user_id = users.id
left join r on r.user_id = users.id
order by 
  total_points desc NULLS LAST

(Jeremy M) #15

@Simon_Cossar do you have any thoughts on the above?


(Simon Cossar) #18

I haven’t run the full query. The first thing I would do is check that all the WITH statements are returning the results you are looking for. You can check them in your final query by running something like SELECT * FROM r.

If the WITH statements are returning the correct results, but rows are being returned in your final query that you expect to be excluded, then the issue is with the joins in the query. If the final result should only include rows that are found in your r table, the problem will be the left join you are using for the r table. A left join will return the complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null. Try using an inner join instead (JOIN and INNER JOIN are the same thing.)

Here’s a good reference on joins: A Visual Explanation of SQL Joins.


(Jeremy M) #19

Yeah so the WITH query looks to maybe be the problem, as it is pulling every user regardless of the category the topic is in. For instance I have one user that it’s returning 9 posts but they are in those protected topic IDs (43, 47, 48, 49, 59):

r as (
SELECT p.user_id, count(p.id) as replies
    FROM posts p, t, pr
    WHERE p.post_type = 1 AND
        p.post_number > 1 AND
            (p.created_at > t.start and 
            p.created_at < t.end)
        and p.user_id not in ('-1', '-2') 
        AND pr.user_id = p.user_id
        AND p.topic_id not in ('4','24','26','32','33','34','35','36','37','38','39','40','43','47','48','49','50')
GROUP BY p.user_id),

(Mittineague) #20

Are you sure the query is counting posts in the excluded topics and it’s not that the query is counting posts that are hidden, deleted, or not the current public_version?


(Jeremy M) #21

It is counting posts in the excluded categories because one of the users has only posted in those categories and yet still shows up.