Complex user activity report requirement

I am looking for a way to get Discourse to generate activity reports in the form of an overview of all user activity between any two custom dates each year (current built in functionality only seems to have preset ranges, and only in relation to today’s date), and to save the results out to a excel compatible file (currently this looks to be only possible for individual user activity reports)

  • I would want to also include posts sent by email and responded to by email (without double counting where the same posts are read or posted while using the forum online). Current reporting seems to exclude posts via email.

  • I would want to filter these results according to a certain custom user profile field (a unique membership number).

  • I would want to exclude results for users with membership numbers that are not within a nominated numeric range.

  • I would also ideally want the ability to generate an activity points score per user for the period, weighted according to posts read, posts made, and posts that have received likes.

  • The weightings (multipliers) for each of these activity variables to be adjustable and set by administrator. Results to rounded down to the nearest 5 points, and to be cut off at a preset maximum number of points, set by the administrator.

  • Nice to have would be some sort of topic category / tag breakdown per user

  • Ideally this report would be auto generated and emailed to me at a nominated times each year (that would be the cherry on the cake though)

How feasible is all this?

Would it require a new custom plugin to implement, or is it possible via some sort of fancy SQL query within current the Data Explorer plugin?

Or would the smart move to be look for a relatively dumb export -all option, and attempt the rest using Excel?

2 Likes

Data explorer would probably be the best approach here. You may need to create different queries for each task (e.g. the per user breakdown would be a different query)

The only thing data explorer can’t do is the ‘auto generated and emailed to me’ point. If this is a requirement, you could potentially do it by calling the data explorer API from some other system.

1 Like

Thanks David.
Looks like I will need to get my head around SQL queries.

Can one query take as input filtered output from another query?. Also can email list based interaction definitely be included? is this stored in a different way somehow? I was worried that it had been excluded from standard activity reports for some obscure but unsurmountable reason…

No, you would have to copy/paste sections of the query.

The posts table in the database includes a via_email boolean, so yes you can find out which posts were created via email :+1:

However, Discourse doesn’t include any trackers in the emails it sends, so there won’t be any way to determine whether an email notification has been ‘read’ or not

Is the ‘baked-in’ user activity reporting functionality handled via SQL queries that I can copy from anywhere and tinker with? - so I don’t end up spending weeks reinventing the wheel?

Are you looking at the “daily engaged users” report? That’s generated with this logic, which uses ActiveRecord (so no raw SQL). But still, the logic might be a useful starting point.

I am aiming for a per user activity report, in csv format, spanning from one date to another. The end goal is to give each user an activity points score for the year (or other period) based on messages received/read online or via email, messages posted online or via email, and where messages with likes get more points. The report I was wondering about using as a base is the first one you see when clicking Admin/Users, because it already does a lot of what I want

The user directory logic can be found here: discourse/directory_item.rb at 1e66e4602f408d7b2d1105b6c228542dce7f4b38 · discourse/discourse · GitHub

Hmm, not looking like a trivial undertaking to replicate via SQL (at least not for me), as I will need to learn enough SQL to pull it off from scratch at the same time, if there is nothing similar enough I can copy from directly.
Last time I did any programming was many years ago in high school, when BASIC was not quite as embarrassing to admit to.

1 Like

Yeah I think this will require a reasonable knowledge of SQL to implement. If you have a budget for the work, you might be able to find someone who can help in the #marketplace

Cheers, I don’t have a budget (forum generates little donation income beyond basic hosting costs), but looking like I may have to go that way anyway

1 Like

@Paul_King

This query might help.

Many thanks! I tried it, but am getting a syntax error

PG::SyntaxError: ERROR: syntax error at or near “WITH”
LINE 13: WITH date_range AS (

(apologies, I sent a PM to you via that thread before twigging that you and the original author there are one and the same!)

1 Like

Do you mind if I ask what values you are using for the variables?

I just pulled the query and ran it against a test site with the following values:

  • start_date: 2021-07-01
  • end_date: 2021-07-30
  • coverage: all

Hi, I tried a few, but all gave me that result - eg

Sorry, I am not able to reproduce the error.

Can you paste the query here as you have it?

SELECT 1-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END
1 Like

Ah I see.

The SELECT 1 at the beginning is not part of the query and is the source of your trouble. That’s the placeholder when you create a new query in the Data Explorer. Remove that and it should work.

-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END

Thanks @Grayden_Shand

The error has gone.

If I may pick your brains a little more, do the tallies generated by this query include posts by email, and replies by email for users on email list mode? If not, how can I include them?

Also how can I include the value of a custom user profile field beside the user’s name?

Are there any tips you can offer as to how to identify the field name involved, and implement this?

Yes it should. As David mentioned, the posts table contains a via_email boolean field. The query currently ignores this field and counts all posts whether they were posted via email or not.

There’s a table named user_custom_fields. You would need to join on this table to include a particular custom field.

I would probably do it in the post_summary sub-query.

e.g.

...
), posts_summary AS (
SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received, cf.value as LABEL_FOR_CUSTOM_FIELD
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
LEFT JOIN user_custom_fields cf ON (u.id == cf.user_id and cf.name == "YOUR CUSTOM FIELD NAME") 
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)
...

I added a column to the SELECT clause, and added a new JOIN clause for the user_custom_fields table.

Note that you’d need to replace "YOUR CUSTOM FIELD NAME" and LABEL_FOR_CUSTOM_FIELD.

You’ll then also need to update the columns you’re selecting in the final query.

...
SELECT ps.id, ps.username, ps.created_at, ps.LABEL_FOR_CUSTOM_FIELD, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
..

That’s probably how I would approach it.

Good luck!