Complex user activity report requirement

No, you would have to copy/paste sections of the query.

The posts table in the database includes a via_email boolean, so yes you can find out which posts were created via email :+1:

However, Discourse doesn’t include any trackers in the emails it sends, so there won’t be any way to determine whether an email notification has been ‘read’ or not

Is the ‘baked-in’ user activity reporting functionality handled via SQL queries that I can copy from anywhere and tinker with? - so I don’t end up spending weeks reinventing the wheel?

Are you looking at the “daily engaged users” report? That’s generated with this logic, which uses ActiveRecord (so no raw SQL). But still, the logic might be a useful starting point.

I am aiming for a per user activity report, in csv format, spanning from one date to another. The end goal is to give each user an activity points score for the year (or other period) based on messages received/read online or via email, messages posted online or via email, and where messages with likes get more points. The report I was wondering about using as a base is the first one you see when clicking Admin/Users, because it already does a lot of what I want

The user directory logic can be found here: discourse/directory_item.rb at 1e66e4602f408d7b2d1105b6c228542dce7f4b38 · discourse/discourse · GitHub

Hmm, not looking like a trivial undertaking to replicate via SQL (at least not for me), as I will need to learn enough SQL to pull it off from scratch at the same time, if there is nothing similar enough I can copy from directly.
Last time I did any programming was many years ago in high school, when BASIC was not quite as embarrassing to admit to.

1 Like

Yeah I think this will require a reasonable knowledge of SQL to implement. If you have a budget for the work, you might be able to find someone who can help in the #marketplace

Cheers, I don’t have a budget (forum generates little donation income beyond basic hosting costs), but looking like I may have to go that way anyway

1 Like

@Paul_King

This query might help.

Many thanks! I tried it, but am getting a syntax error

PG::SyntaxError: ERROR: syntax error at or near “WITH”
LINE 13: WITH date_range AS (

(apologies, I sent a PM to you via that thread before twigging that you and the original author there are one and the same!)

1 Like

Do you mind if I ask what values you are using for the variables?

I just pulled the query and ran it against a test site with the following values:

  • start_date: 2021-07-01
  • end_date: 2021-07-30
  • coverage: all

Hi, I tried a few, but all gave me that result - eg

Sorry, I am not able to reproduce the error.

Can you paste the query here as you have it?

SELECT 1-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END
1 Like

Ah I see.

The SELECT 1 at the beginning is not part of the query and is the source of your trouble. That’s the placeholder when you create a new query in the Data Explorer. Remove that and it should work.

-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END

Thanks @Grayden_Shand

The error has gone.

If I may pick your brains a little more, do the tallies generated by this query include posts by email, and replies by email for users on email list mode? If not, how can I include them?

Also how can I include the value of a custom user profile field beside the user’s name?

Are there any tips you can offer as to how to identify the field name involved, and implement this?

Yes it should. As David mentioned, the posts table contains a via_email boolean field. The query currently ignores this field and counts all posts whether they were posted via email or not.

There’s a table named user_custom_fields. You would need to join on this table to include a particular custom field.

I would probably do it in the post_summary sub-query.

e.g.

...
), posts_summary AS (
SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received, cf.value as LABEL_FOR_CUSTOM_FIELD
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
LEFT JOIN user_custom_fields cf ON (u.id == cf.user_id and cf.name == "YOUR CUSTOM FIELD NAME") 
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)
...

I added a column to the SELECT clause, and added a new JOIN clause for the user_custom_fields table.

Note that you’d need to replace "YOUR CUSTOM FIELD NAME" and LABEL_FOR_CUSTOM_FIELD.

You’ll then also need to update the columns you’re selecting in the final query.

...
SELECT ps.id, ps.username, ps.created_at, ps.LABEL_FOR_CUSTOM_FIELD, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
..

That’s probably how I would approach it.

Good luck!

Thank you so much for that!

The custom user profile field I am trying to use is ‘NZRAB number’

I had a go, with the resulting script below, but obviously got it wrong, as this gave ‘Undefined function’ error

I was not sure if field name is supposed to exclude capital letters or spaces - the Editor did not seem to like the space in the fieldname at least, so I swapped in an underscore

I might also be conflating the field name with the field label - but if so I am not sure how to tell what field name belongs to the label ‘NZRAB number’

-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (
SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received, cf.value as NZRAB_number
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
LEFT JOIN user_custom_fields cf ON (u.id == cf.user_id and cf.name == "NZRAB number") 
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)
SELECT ps.id, ps.username, ps.created_at, ps.NZRAB_number, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps

LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END

resulting error:

PG::UndefinedFunction: ERROR: operator does not exist: integer == integer
LINE 32: LEFT JOIN user_custom_fields cf ON (u.id == cf.user_id and c…
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

If you concluded from this that I have not a clue what I am doing, you would be right!

I was able to get a running example from your code.

-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (
SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received, cf.value as nzrab_number
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
LEFT JOIN user_custom_fields cf ON (u.id = cf.user_id and cf.name = CONCAT('user_field_', (select id::text from user_fields where name='NZRAB number'))) 
GROUP BY u.id, dr.date, dr.week, cf.value
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, ps.NZRAB_number, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
GROUP BY ps.id, ps.username, ps.created_at, ps.nzrab_number, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END

Note, the relationship on user_custom_fields is a little weird. If it doesn’t work, you may need to run a separate query to determine the correct name to use:

In a separate query run:

select * from user_custom_fields

Find a record in the result set that matches up with your “NZRAB number” field, and look for the name. For me it was “user_field_2”.

Once you have that replace this line in the above query:

LEFT JOIN user_custom_fields cf ON (u.id = cf.user_id and cf.name = CONCAT('user_field_', (select id::text from user_fields where name='NZRAB number'))) 

with

LEFT JOIN user_custom_fields cf ON (u.id = cf.user_id and cf.name = <YOUR FIELD NAME>)

Hopefully you don’t need to do that.

Awesome - that worked for me, just as is!

To push my luck, is it technically possible to separately tally posts made that received likes vs those that didn’t?

The reason I ask is that there is otherwise no way to tell whether someone who got 8 likes got them all for a single post, or if they were spread out over several posts

The other question was whether there is a way to constrain output to only those people with NZRAB_number custom field between two user supplied numbers? (cutting out those signups who entered text, or nothing at all for this field)

And to list the user’s real name beside their user id?