Complex user activity report requirements

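I'm looking for a way to have Discourse generate an activity report: an overview of all user activity between any two custom dates in a given year (the built-in feature currently seems to support only preset ranges relative to the current date), exported as an Excel-compatible file (which currently seems to be available only for a single user's activity report).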

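  • I'd like to include posts sent and replied to via email as well (without double-counting when the same post is also read or posted through the online forum). The current reports seem to exclude posts made via email.

  • I'd like to filter these results by a specific custom user profile field (a unique membership number).

  • I'd like to exclude results for users whose membership number falls outside a specified numerical range.

  • Ideally, I'd also like to generate an activity points score for each user over the period, weighted by posts read, posts made, and posts that received likes.

  • The weight (multiplier) for each activity variable should be adjustable and set by the admin. The result should be rounded down to the nearest 5 points and capped at a preset maximum score set by the admin.

  • Some kind of per-user breakdown by topic category/tag would be even better.

  • Ideally, the report would be generated automatically at a specified time each year and emailed to me (this would be the icing on the cake).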
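The scoring rule described in the list above (weighted sum, rounded down to the nearest 5, capped at a maximum) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything Discourse provides: the `ScoreWeights` class, the `activity_score` function, and all default weight values are hypothetical placeholders an admin would replace.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreWeights:
    """Admin-tunable settings. All default values are made-up placeholders."""
    posts_read: float = 0.5      # points per post read
    posts_made: float = 2.0      # points per post made
    likes_received: float = 3.0  # points per like received
    max_score: int = 100         # scores are capped at this value
    step: int = 5                # scores are rounded down to a multiple of this


def activity_score(posts_read, posts_made, likes_received, weights=ScoreWeights()):
    """Weighted sum, floored to a multiple of `step`, then capped at `max_score`."""
    raw = (posts_read * weights.posts_read
           + posts_made * weights.posts_made
           + likes_received * weights.likes_received)
    floored = math.floor(raw / weights.step) * weights.step
    return min(floored, weights.max_score)


# 37 reads, 4 posts, 3 likes -> raw 35.5 -> floored to 35
print(activity_score(37, 4, 3))
# A very active user hits the cap
print(activity_score(1000, 50, 50))
```

The same arithmetic could be done in a spreadsheet after export, or in SQL with `LEAST(FLOOR(raw / 5) * 5, max_score)`, if keeping everything inside one query is preferred.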

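How feasible is all of this?

Would it require developing a new custom plugin, or could it be done with some kind of complex SQL query in the current Data Explorer plugin?

Or would it be more sensible to look for a relatively simple "export everything" option and then try to do the rest in Excel?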

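2 Likes

Data Explorer is probably your best option. You may need to create a separate query for each task (the per-user breakdown, for example, would be its own query).

The only thing Data Explorer can't do is the "generate automatically and email it to me" part. If that's a must-have, you could consider calling the Data Explorer API from another system to achieve it.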
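As a rough sketch of what "calling the Data Explorer API from another system" might look like, the Python below builds the HTTP request that runs a saved query. It assumes the commonly documented endpoint `POST /admin/plugins/explorer/queries/<id>/run` with `Api-Key` and `Api-Username` headers and the query parameters passed as a JSON string in a `params` form field; please check this against your own Discourse version. The function name `build_run_request`, the host, the query ID, and the key are all hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_run_request(base_url, query_id, api_key, api_username, params):
    """Build (but do not send) a request that runs a saved Data Explorer query.

    Assumes the endpoint and header names described in the lead-in above.
    """
    url = f"{base_url.rstrip('/')}/admin/plugins/explorer/queries/{query_id}/run"
    # Query parameters travel as one JSON-encoded form field named "params".
    body = urllib.parse.urlencode({"params": json.dumps(params)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Api-Key": api_key,
            "Api-Username": api_username,
            "Accept": "application/json",
        },
    )


# Hypothetical usage; the host, query ID and key are placeholders.
request = build_run_request(
    "https://forum.example.com",
    20,
    "YOUR_API_KEY",
    "system",
    {"start_date": "2023-01-01", "end_date": "2023-12-31", "coverage": "all"},
)
# To actually send it (requires a real site and key):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

A script like this could be run once a year from cron or any other scheduler on a separate machine, then write the result to CSV and email it using whatever mail tooling that machine already has.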

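3 Likes

Thanks David.
Looks like I need to get properly acquainted with SQL queries.

Can one query take the filtered output of another query as its input? Also, can email list-based interactions definitely be included? Is that data stored in some different way? I was worried it was excluded from the standard activity reports for some obscure but insurmountable reason.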

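No, you have to copy/paste parts of the queries manually.

The posts table in the database has a via_email boolean field, so you can indeed find out which posts were created via email :+1:

However, Discourse doesn't include any trackers in the emails it sends, so there's no way to tell whether an email notification has been "read".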
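To illustrate the via_email point, here is a small Python sketch that runs a counting query against a toy in-memory SQLite table standing in for Discourse's posts table. Only the column names `user_id`, `via_email`, and `deleted_at` are taken from this thread; the sample rows are invented. Because every post is a single row whether it arrived by email or through the web, counting rows does not double-count, and the flag only splits the total.

```python
import sqlite3

# Toy stand-in for the Discourse posts table (invented sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    via_email BOOLEAN,
    deleted_at TEXT
);
INSERT INTO posts (user_id, via_email, deleted_at) VALUES
    (1, 0, NULL),
    (1, 1, NULL),
    (1, 1, NULL),
    (2, 0, NULL),
    (2, 1, '2023-05-01');  -- deleted, so it is ignored below
""")

# The SELECT itself is plain SQL that should also be valid in PostgreSQL.
POSTS_BY_CHANNEL = """
SELECT user_id,
       COUNT(*) AS total_posts,
       SUM(CASE WHEN via_email THEN 1 ELSE 0 END) AS email_posts
FROM posts
WHERE deleted_at IS NULL
GROUP BY user_id
ORDER BY user_id
"""

rows = conn.execute(POSTS_BY_CHANNEL).fetchall()
for user_id, total_posts, email_posts in rows:
    print(user_id, total_posts, email_posts)
```

In Data Explorer only the SELECT statement would be needed; the table setup here exists just to make the sketch runnable on its own.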

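Are the built-in user activity reports implemented with SQL queries? Could I just copy those queries and tweak them, to avoid spending weeks reinventing the wheel?

Are you looking at the "Daily Active Users" report? That report is generated by this logic, which uses ActiveRecord (so there's no raw SQL). Still, the logic may be a useful starting point.

I'd like to generate a per-user activity report, in CSV format, covering one date to another. The end goal is to calculate annual (or other periodic) activity points for each user, based on: messages received/read online or via email, messages posted online or via email, with more points for messages that have been "liked". I'd like to base it on the first report you see after clicking Admin/Users, since it already does much of what I need.

The user directory logic can be found here: discourse/app/models/directory_item.rb at 1e66e4602f408d7b2d1105b6c228542dce7f4b38 · discourse/discourse · GitHub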

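Hmm, reproducing that in SQL doesn't look easy (at least not for me): without existing, similar code to work from, I'd have to learn enough SQL from scratch to do it. The last time I did any programming was many years ago in high school, back when programming in BASIC wasn't yet as embarrassing to admit to as it is now.

1 Like

Yes, I think this would require a reasonable understanding of SQL. If you have a budget for it, you might be able to find someone who can help in the Marketplace category.

Cheers! I don't have a budget (the donations the forum brings in only just cover basic hosting costs), but it looks like I may have to go that route.

1 Like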

@Paul_King

This query might help.

https://meta.discourse.org/t/daily-weekly-or-total-stats-by-user-over-a-specified-time-range/275167?u=grayden_shand

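Thank you very much! I tried it, but I got a syntax error:

PG::SyntaxError: ERROR: syntax error at or near "WITH"
LINE 13: WITH date_range AS (

(Sorry, I had messaged you privately from that topic earlier, before realizing that you and the original author there are the same person!)

1 Like

Mind if I ask what values you used for the variables?

I just grabbed the query and ran it on a test site with the following values: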

  • start_date: 2021-07-01
  • end_date: 2021-07-30
  • coverage: all

Hi, I tried several, but they all gave that result, for example

Sorry, I can't reproduce the error.

Could you paste your current query here?

SELECT 1-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END

1 Like

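Ah, I see.

The SELECT 1 at the start is not part of the query, and it is what's causing your problem. It's the placeholder that Data Explorer generates when you create a new query, and because it sits in front of WITH, PostgreSQL reports a syntax error at that keyword. Delete it and the problem should go away.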

-- coverage: 'week', 'all', or 'date'
-- [params]
-- date :start_date = 2019-08-27
-- date :end_date = 2019-09-30
-- text :coverage = week

WITH date_range AS (
SELECT date_trunc('day', dd):: date AS "date", EXTRACT(week from date_trunc('day', dd):: date) AS "week"
FROM generate_series
        ( :start_date::timestamp 
        , :end_date::timestamp
        , '1 day'::interval) dd
), likes_given AS (
SELECT u.id, dr.date, dr.week, count(pa.*) AS "likes"
FROM date_range dr
FULL JOIN users u ON (1=1)
LEFT JOIN post_actions pa ON (pa.created_at:: date = dr.date and post_action_type_id=2 AND user_id = u.id)
GROUP BY dr.date, dr.week, u.id
ORDER BY u.id, dr.date
), posts_summary AS (SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date), 
visits AS (SELECT u.id, dr.*, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, COUNT(uv.*) AS visits
FROM date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN user_visits uv ON (uv.user_id = u.id AND visited_at = dr.date)
GROUP BY u.id, dr.date, dr.week
ORDER BY u.id, dr.date
)

SELECT ps.id, ps.username, ps.created_at, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
LEFT JOIN likes_given lg ON (ps.date = lg.date AND ps.id = lg.id)
LEFT JOIN visits v ON (v.id = ps.id AND v.date = ps.date)
GROUP BY ps.id, ps.username, ps.created_at, CASE 
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  
ORDER BY ps.id, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END

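Thanks @Grayden_Shand

The error is gone.

If I may ask you one more thing: do the stats this query generates include posts made via email, and replies received via email by users in mailing list mode? If not, how could they be included?

Also, how can I show the value of a custom user profile field next to each user's name?

Could you give me some pointers on how to identify the relevant field name and make this work?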

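Yes, they should be included. As David mentioned, the posts table has a boolean field called via_email. The current query ignores that field and counts all posts, regardless of whether they were posted via email.

There is a table called user_custom_fields. You'll need to join that table in to include a specific custom field.

I would probably do this in the posts_summary subquery.

For example: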

...
), posts_summary AS (
SELECT u.id, u.username, u.created_at, dr.*, count(p.id) - count(t.id) AS replies, count(t.id) AS topics, COALESCE(sum(p.like_count),0) AS likes_received, cf.value as LABEL_FOR_CUSTOM_FIELD
from date_range dr
FULL OUTER JOIN users u ON (1=1)
LEFT JOIN posts p ON (p.user_id = u.id AND p.created_at::date=dr.date AND p.deleted_at IS NULL)
LEFT JOIN topics t ON (t.user_id = u.id AND t.created_at::date = dr.date AND p.topic_id = t.id AND t.deleted_at IS NULL)
LEFT JOIN user_custom_fields cf ON (u.id = cf.user_id AND cf.name = 'YOUR CUSTOM FIELD NAME')
-- cf.value is selected above, so it must also appear in GROUP BY
GROUP BY u.id, dr.date, dr.week, cf.value
ORDER BY u.id, dr.date
)
...

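I added a column to the SELECT clause and a new JOIN clause for the user_custom_fields table.

Note that you need to replace YOUR CUSTOM FIELD NAME and LABEL_FOR_CUSTOM_FIELD with actual values.

After that, you'll also need to update the columns selected in the final query, and add the new column to that query's GROUP BY clause so the aggregation stays valid.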

...
SELECT ps.id, ps.username, ps.created_at, ps.LABEL_FOR_CUSTOM_FIELD, CASE
WHEN :coverage::text = 'week' THEN ps.week::text
WHEN :coverage::text = 'all' THEN '-1'
ELSE ps.date::text
END  AS period, sum(ps.replies) AS replies, sum(ps.topics) AS topics, sum(ps.likes_received) AS likes_received, sum(lg.likes) AS likes_given, COALESCE(sum(posts_read),0) AS posts_read, COALESCE(sum(time_read),0) AS time_read, SUM(visits) AS visits
FROM posts_summary ps
...

That's roughly the approach I would take.
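For the membership-number range asked about earlier in the thread, the same user_custom_fields join can also act as a filter. The Python sketch below runs such a query against toy in-memory SQLite tables. It assumes the membership number lives in a custom user field stored under a name of the form `user_field_<id>` (here `user_field_1`, a guess you would need to check on your own site), and that every stored value is numeric; in PostgreSQL a non-numeric value would make the cast fail. The sample users are invented.

```python
import sqlite3

# Toy stand-ins for the Discourse users and user_custom_fields tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE user_custom_fields (user_id INTEGER, name TEXT, value TEXT);
INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO user_custom_fields (user_id, name, value) VALUES
    (1, 'user_field_1', '1005'),
    (2, 'user_field_1', '2500'),
    (3, 'user_field_1', '1999'),
    (1, 'some_other_field', 'x');
""")

# An inner JOIN drops users without the field; BETWEEN keeps only the range.
MEMBERS_IN_RANGE = """
SELECT u.id, u.username, CAST(cf.value AS INTEGER) AS member_number
FROM users u
JOIN user_custom_fields cf
  ON cf.user_id = u.id AND cf.name = :field_name
WHERE CAST(cf.value AS INTEGER) BETWEEN :min_number AND :max_number
ORDER BY u.id
"""

rows = conn.execute(
    MEMBERS_IN_RANGE,
    {"field_name": "user_field_1", "min_number": 1000, "max_number": 1999},
).fetchall()
for row in rows:
    print(row)
```

In the larger query, the same condition could go in a WHERE clause on the posts_summary subquery, with the range limits declared as Data Explorer parameters alongside start_date and end_date.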

Good luck!