New user of the month, is it working?

As you can see by the screenshots the “New user of the month” ran and picked two users with less liked posts.

Gary

You have other new users that are more active, as in their posts were liked by more people? Remember it is only new users that signed up in that calendar month. Those are the only eligible users for the award.

The badge won’t be awarded at all if

  • you don’t have any new users in that month (of course)

  • you don’t have any new users with at least a few posts that have likes in that month

Yes, that is correct. The only reason michael was selected is that his only three posts received likes. The other user, weather-display was only on the forum TWO days. There is no way that user is “most active.”

The way the badge works is based on active validation of other users, e.g. likes.

Simply posting 100 times isn’t enough to earn the badge; you’d need to get a fair number of likes, more than other new users.

Agreed. I understand.

A user with 52 likes and 17 topics created, one topic with 12 likes didn’t get selected. That user has more likes, many more than the two that were selected.

And you are 100% sure this user created their account in the specified time period?

I am. I even know the flaw in the code.

User A and B join on the same day.

User A posts three times and each post gets liked by user B and an Admin.
User A gets a total of 9 points. 9 / 3 is a score of 3.

User B posts 30 times. User A likes 20 posts. Admin likes 20 posts.
User B gets a total of 60 points. 60 / 30 = 2.

User B clearly much more likes than user A, yet user A gets a higher score. The logic in the code is flawed.

2 Likes

Ok, if you are certain, we can pull @eviltrout over to take a quick look, as he wrote the feature.

2 Likes

So, is this not a bug?

And, yes, I am certain. @zogstrip and I tested it and pretty much proved it doesn’t work as per the description.

LEFT OUTER JOIN posts AS p ON p.user_id = u.id

Instead of counting ALL the posts, count only the posts that have one or more like.

LEFT OUTER JOIN posts AS p ON p.user_id = u.id AND liked_by.id > 1

And frankly, this still isn’t the best way to find the most active.

I don’t profess to understand the query logic very well, but I’m wondering if it might be better to use badge_posts instead of posts
https://github.com/discourse/discourse/blob/master/app/jobs/scheduled/grant_new_user_of_the_month_badges.rb

What do the results look like if we divide the score by the number of days since signup? Perhaps pow(days, 0.8) so it doesn’t have quite as big as effect

The description states:

measured by how often their posts were liked, and by whom.

Somehow I got it in my mind this was about active users. I apologize for that. I understand it is about likes so I am thinking the score should not be divided at all.

I’m thinking this query might be better : just counting the :heart:s received

SELECT u.id,
  SUM(CASE
      WHEN liked_by.admin           THEN 3.0
      WHEN liked_by.moderator       THEN 3.0
      WHEN liked_by.trust_level = 4 THEN 2.0
      WHEN liked_by.trust_level = 3 THEN 1.5
      WHEN liked_by.trust_level = 2 THEN 1.0
      WHEN liked_by.trust_level = 1 THEN 0.25
      WHEN liked_by.trust_level = 0 THEN 0.1
      ELSE 1.0
      END) AS score
FROM users AS u
INNER JOIN user_stats AS us ON u.id = us.user_id
LEFT OUTER JOIN posts AS p ON p.user_id = u.id
LEFT OUTER JOIN post_actions AS pa ON pa.post_id = p.id AND pa.post_action_type_id = #{PostActionType.types[:like]}
LEFT OUTER JOIN users AS liked_by ON liked_by.id = pa.user_id
LEFT OUTER JOIN topics AS t ON t.id = p.topic_id
WHERE u.active
  AND u.id > 0
  AND liked_by.id > 0
  AND NOT u.staged
  AND NOT u.admin
  AND NOT u.moderator
  AND t.archetype <> '#{Archetype.private_message}'
  AND u.created_at >= CURRENT_TIMESTAMP - '1 month'::INTERVAL                                                                                                    
  AND u.id NOT IN (#{current_owners.join(',')})
GROUP BY u.id
HAVING COUNT(DISTINCT p.id) > 1
   AND COUNT(DISTINCT p.topic_id) > 1
   AND COUNT(pa.id) > 1
ORDER BY score DESC 
LIMIT #{MAX_AWARDED}
1 Like

Thank you for your help. This gives much better results. :heart:

Hmm, I believe there was some logic behind the way we had it – for the purposes of new user of the month badge, a person getting a single post with 30 likes shouldn’t “outscore” a person who has 5 posts with 6 likes each.

1 Like

The way it’s written, if that user only posted one time it certainly out weighs the other user if the second user posted six times and got no likes on the sixth post.

One solution is to add an additional point per post where liked.

Let’s work together to get this bug fixed and put into production.

What’s the next step and what are your ideas?

I think this is worth trying. Otherwise, I don’t agree that posting 60 times and getting 40 likes is better than posting 3 times and getting 9 likes. The first user is posting way too much.

4 Likes

There just isn’t a “one size fits all” for this badge. @zogstrip and I went through several ideas using real data and none are 100% perfect.

Idea: what about having an admin setting to disable the automatic awarding of the new user of the month badge?

If disabled, it would then become a badge that is manually awarded by staff each month. This addresses the concerns of those who will never be satisfied by the automatic calculations, or who simply want to have control over who is awarded the badge.

For the rest, it can continue to be awarded automatically based on the imperfect but largely satisfactory system.

3 Likes