Requisito complesso per il report sull'attività utente

Sto cercando un modo per far sì che Discourse generi report di attività sotto forma di una panoramica di tutte le attività degli utenti tra due date personalizzate ogni anno (la funzionalità integrata attuale sembra avere solo intervalli predefiniti, e solo in relazione alla data di oggi) e salvare i risultati in un file compatibile con Excel (attualmente questo sembra possibile solo per i report di attività dei singoli utenti).

  • Vorrei includere anche i post inviati via email e a cui è stato risposto via email (senza doppi conteggi dove gli stessi post vengono letti o pubblicati mentre si utilizza il forum online). I report attuali sembrano escludere i post via email.

  • Vorrei filtrare questi risultati in base a un determinato campo del profilo utente personalizzato (un numero di adesione univoco).

  • Vorrei escludere i risultati per gli utenti con numeri di adesione non compresi in un intervallo numerico nominato.

  • Idealmente, vorrei anche la possibilità di generare un punteggio di punti di attività per utente per il periodo, ponderato in base ai post letti, ai post pubblicati e ai post che hanno ricevuto like.

  • Le ponderazioni (moltiplicatori) per ciascuna di queste variabili di attività dovrebbero essere regolabili e impostate dall’amministratore. I risultati dovrebbero essere arrotondati per difetto al multiplo di 5 più vicino e limitati a un numero massimo di punti preimpostato, stabilito dall’amministratore.

  • Sarebbe gradito avere una sorta di ripartizione per categoria di argomento/tag per utente.

  • Idealmente, questo report verrebbe generato automaticamente e inviato via email a me a orari prestabiliti ogni anno (questo sarebbe la ciliegina sulla torta).

Quanto è fattibile tutto questo?

Sarebbe necessario un nuovo plugin personalizzato per implementarlo, o è possibile tramite una sorta di query SQL avanzata all’interno del plugin Data Explorer attuale?

O sarebbe la mossa intelligente cercare un’opzione di esportazione “tutto” relativamente semplice e tentare il resto utilizzando Excel?

2 Mi Piace

Data explorer would probably be the best approach here. You may need to create different queries for each task (e.g. the per user breakdown would be a different query)

The only thing data explorer can’t do is the ‘auto generated and emailed to me’ point. If this is a requirement, you could potentially do it by calling the data explorer API from some other system.

3 Mi Piace

Thanks David.
Looks like I will need to get my head around SQL queries.

Can one query take as input filtered output from another query?. Also can email list based interaction definitely be included? is this stored in a different way somehow? I was worried that it had been excluded from standard activity reports for some obscure but unsurmountable reason…

No, you would have to copy/paste sections of the query.

The posts table in the database includes a via_email boolean, so yes you can find out which posts were created via email :+1:

However, Discourse doesn’t include any trackers in the emails it sends, so there won’t be any way to determine whether an email notification has been ‘read’ or not

Is the ‘baked-in’ user activity reporting functionality handled via SQL queries that I can copy from anywhere and tinker with? - so I don’t end up spending weeks reinventing the wheel?

Are you looking at the “daily engaged users” report? That’s generated with this logic, which uses ActiveRecord (so no raw SQL). But still, the logic might be a useful starting point.

I am aiming for a per user activity report, in csv format, spanning from one date to another. The end goal is to give each user an activity points score for the year (or other period) based on messages received/read online or via email, messages posted online or via email, and where messages with likes get more points. The report I was wondering about using as a base is the first one you see when clicking Admin/Users, because it already does a lot of what I want

The user directory logic can be found here: discourse/app/models/directory_item.rb at 1e66e4602f408d7b2d1105b6c228542dce7f4b38 · discourse/discourse · GitHub

Hmm, not looking like a trivial undertaking to replicate via SQL (at least not for me), as I will need to learn enough SQL to pull it off from scratch at the same time, if there is nothing similar enough I can copy from directly.
Last time I did any programming was many years ago in high school, when BASIC was not quite as embarrassing to admit to.

1 Mi Piace

Yeah I think this will require a reasonable knowledge of SQL to implement. If you have a budget for the work, you might be able to find someone who can help in the marketplace

Cheers, I don’t have a budget (forum generates little donation income beyond basic hosting costs), but looking like I may have to go that way anyway

1 Mi Piace

@Paul_King

Questa query potrebbe essere d’aiuto.

https://meta.discourse.org/t/daily-weekly-or-total-stats-by-user-over-a-specified-time-range/275167u=grayden_shand

Many thanks! I tried it, but am getting a syntax error

PG::SyntaxError: ERROR: syntax error at or near “WITH”
LINE 13: WITH date_range AS (

(apologies, I sent a PM to you via that thread before twigging that you and the original author there are one and the same!)

1 Mi Piace

Do you mind if I ask what values you are using for the variables?

I just pulled the query and ran it against a test site with the following values:

  • start_date: 2021-07-01
  • end_date: 2021-07-30
  • coverage: all

Hi, I tried a few, but all gave me that result - eg

Sorry, I am not able to reproduce the error.

Can you paste the query here as you have it?

SELECT 1-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END
1 Mi Piace

Ah I see.

The SELECT 1 at the beginning is not part of the query and is the source of your trouble. That’s the placeholder when you create a new query in the Data Explorer. Remove that and it should work.

-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END

Thanks @Grayden_Shand

The error has gone.

If I may pick your brains a little more, do the tallies generated by this query include posts by email, and replies by email for users on email list mode? If not, how can I include them?

Also how can I include the value of a custom user profile field beside the user’s name?

Are there any tips you can offer as to how to identify the field name involved, and implement this?

Yes it should. As David mentioned, the posts table contains a via_email boolean field. The query currently ignores this field and counts all posts whether they were posted via email or not.

There’s a table named user_custom_fields. You would need to join on this table to include a particular custom field.

I would probably do it in the post_summary sub-query.

e.g.

...
), posts_summary AS (
SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received, cf.value as LABEL_FOR_CUSTOM_FIELD
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
LEFT JOIN user_custom_fields cf ON (u.id == cf.user_id and cf.name == "YOUR CUSTOM FIELD NAME") 
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)
...

I added a column to the SELECT clause, and added a new JOIN clause for the user_custom_fields table.

Note that you’d need to replace "YOUR CUSTOM FIELD NAME" and LABEL_FOR_CUSTOM_FIELD.

You’ll then also need to update the columns you’re selecting in the final query.

...
SELECT ps.id, ps.username, ps.created_at, ps.LABEL_FOR_CUSTOM_FIELD, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
..

That’s probably how I would approach it.

Good luck!