各位数据探索高手,大家好!
是否有人尝试过通过数据探索查询,按月份统计总的参与次数?
我的意思是,按月份统计所有帖子、解决方案和点赞的总数。
提前感谢!
各位数据探索高手,大家好!
是否有人尝试过通过数据探索查询,按月份统计总的参与次数?
我的意思是,按月份统计所有帖子、解决方案和点赞的总数。
提前感谢!
看看您是否喜欢这个。
我没找到我要找的那个。
我并不是数据探索专家,但有空时我喜欢写查询。在过于投入之前,我假设您想要的是按月统计数据的细分。如果是这样,类似下面的查询或许能行:
--[params]
-- date :start_date
WITH month_starts AS (
SELECT generate_series(date_trunc('month', :start_date::date), CURRENT_DATE, interval '1 month')::date AS month_start
),
monthly_posts AS (
SELECT
month_start,
COUNT(1) AS posts_count
FROM posts p
JOIN month_starts
ON p.created_at::date >= month_start AND p.created_at::date <= month_start + interval '1 month - 1 day'
WHERE p.deleted_at IS NULL
AND p.post_type = 1
AND p.created_at >= :start_date
GROUP BY month_start
),
monthly_total_users AS (
SELECT
month_start,
COUNT(1) AS total_users_count
FROM users u
JOIN month_starts
ON u.created_at::date <= month_start + interval '1 month - 1 day'
WHERE u.id > 0
GROUP BY month_start
),
monthly_active_users AS (
SELECT
month_start,
COUNT(DISTINCT user_id) AS active_users_count
fROM user_visits uv
JOIN month_starts
ON uv.visited_at >= month_start AND uv.visited_at <= month_start + interval '1 month - 1 day'
WHERE uv.visited_at >= :start_date
GROUP BY month_start
),
monthly_solutions AS (
SELECT
month_start,
COUNT(1) AS solutions_count
FROM user_actions ua
JOIN month_starts ms
ON ua.created_at::date >= month_start AND ua.created_at::date <= month_start + interval '1 month - 1 day'
WHERE ua.action_type = 15
AND ua.created_at >= :start_date
GROUP BY month_start
),
monthly_likes AS (
SELECT
month_start,
COUNT(1) AS likes_count
FROM user_actions ua
JOIN month_starts ms
ON ua.created_at::date >= month_start AND ua.created_at::date <= month_start + interval '1 month - 1 day'
WHERE ua.action_type = 2
AND ua.created_at >= :start_date
GROUP BY month_start
)
SELECT
ms.month_start,
COALESCE(posts_count, 0) AS posts_count,
COALESCE(total_users_count, 0) AS total_users_count,
COALESCE(active_users_count, 0) AS active_users_count,
COALESCE(solutions_count, 0) AS solutions_count,
COALESCE(likes_count, 0) AS likes_count
FROM month_starts ms
LEFT JOIN monthly_posts mp ON mp.month_start = ms.month_start
LEFT JOIN monthly_total_users mtu ON mtu.month_start = ms.month_start
LEFT JOIN monthly_active_users mau ON mau.month_start = ms.month_start
LEFT JOIN monthly_solutions mts ON mts.month_start = ms.month_start
LEFT JOIN monthly_likes ml ON ml.month_start = ms.month_start
ORDER BY month_start DESC
在运行查询之前,您需要为 start_date 参数提供一个值。格式应为 yyyy-mm-dd。不过,该查询仅从该日期中提取月份部分。之所以需要起始日期参数,是因为如果查询针对站点上线以来的整个时间段运行,在大型站点上会导致超时。使用起始日期参数,我可以在 Meta 站点上运行长达数年的数据查询而不会超时。
关于该查询的一点说明是,active_users_count 列返回的是当月_登录_过站点的唯一用户数量——它并不检查这些用户是否在站点上执行了任何操作(例如,点赞帖子或创建帖子)。虽然查询或许可以做到这一点,但我担心会出现超时问题。
如果这不是您想要的数据类型,请告诉我。如果总体思路正确,请告知您是否需要向查询中添加其他数据,或者您是否发现结果中有任何看似错误的地方。
感谢 @simon 分享,这非常有用,基本上正是我需要的!我想问一下,是否可以对查询做一点修改?我不希望指定开始日期。我希望将所有这些值(帖子、用户、解决方案、点赞)汇总,并按月份显示查询结果,如下所示:
这或许可行,我会尝试一下。开始日期参数是在最后时刻添加到查询中的,因为我发现如果对整个站点上线以来的全部时间范围在 Meta 上运行该查询,会导致超时。也许有一些方法可以提高查询效率,从而解决超时问题。如果不行,查询应该允许设置一个时间范围,而不仅仅是指定开始日期。这样,你只需通过设置不同的时间范围运行几次查询,就能获取站点的所有数据。
没问题!等您有时间进行调整时,我们再继续。
嗨,Konrad,
以下是调整后的查询。
WITH monthly_users AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS "new_users_month"
FROM users
WHERE id > 0
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),
monthly_posts AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS "posts_count"
FROM posts p
WHERE p.deleted_at IS NULL
AND p.post_type = 1
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),
monthly_active_users AS (
SELECT
date_part('year', visited_at) AS year,
date_part('month', visited_at) AS month,
COUNT(DISTINCT user_id) AS "active_users_count"
FROM user_visits uv
GROUP BY date_part('year', visited_at), date_part('month', visited_at)
ORDER BY date_part('year', visited_at) ASC, date_part('month', visited_at)
),
monthly_solutions AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS "solutions_count"
FROM user_actions ua
WHERE ua.action_type = 15
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),
monthly_likes AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS "likes_count"
FROM user_actions ua
WHERE ua.action_type = 2
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
)
SELECT
mu.year,
mu.month,
SUM(new_users_month) over (ORDER BY mu.year, mu.month rows between unbounded preceding AND current row) AS total_users,
posts_count,
COALESCE(active_users_count, 0) AS active_users_count,
COALESCE(solutions_count, 0) AS solutions_count,
COALESCE(likes_count, 0) AS solutions_count
FROM monthly_users mu
LEFT JOIN monthly_posts mp ON mp.year = mu.year AND mp.month = mu.month
LEFT JOIN monthly_active_users mau ON mau.year = mu.year AND mau.month = mu.month
LEFT JOIN monthly_solutions ms ON ms.year = mu.year AND ms.month = mu.month
LEFT JOIN monthly_likes ml ON ml.year = mu.year AND ml.month = mu.month
ORDER BY mu.year, mu.month
WITH monthly_users AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS new_users_month
FROM users
WHERE id > 0
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),
monthly_posts AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS posts_count
FROM posts p
WHERE p.deleted_at IS NULL
AND p.post_type = 1
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),
monthly_active_users AS (
SELECT
date_part('year', visited_at) AS year,
date_part('month', visited_at) AS month,
COUNT(DISTINCT user_id) AS active_users_count
FROM user_visits uv
GROUP BY date_part('year', visited_at), date_part('month', visited_at)
ORDER BY date_part('year', visited_at) ASC, date_part('month', visited_at)
),
monthly_solutions AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS solutions_count
FROM user_actions ua
WHERE ua.action_type = 15
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),
monthly_likes AS (
SELECT
date_part('year', created_at) AS year,
date_part('month', created_at) AS month,
COUNT(*) AS likes_count
FROM user_actions ua
WHERE ua.action_type = 2
GROUP BY date_part('year', created_at), date_part('month', created_at)
ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
)
SELECT
mu.year,
mu.month,
SUM(new_users_month + COALESCE(posts_count,0) +
COALESCE(active_users_count, 0) +
COALESCE(solutions_count, 0) +
COALESCE(likes_count, 0))
over (ORDER BY mu.year, mu.month rows between unbounded preceding AND current row) AS sum_total
FROM monthly_users mu
LEFT JOIN monthly_posts mp ON mp.year = mu.year AND mp.month = mu.month
LEFT JOIN monthly_active_users mau ON mau.year = mu.year AND mau.month = mu.month
LEFT JOIN monthly_solutions ms ON ms.year = mu.year AND ms.month = mu.month
LEFT JOIN monthly_likes ml ON ml.year = mu.year AND ml.month = mu.month
ORDER BY mu.year, mu.month
这正是我想要的!再次感谢 @michebs!你们有没有考虑过创建一个开源的“数据探索器”查询集合,并与 Discourse 用户分享?
是的,这就是数据资源管理器内置的查询列表 ![]()
是的,我知道,但你会在此基础上扩展吗?例如,根据论坛里的这些问题。
我们可以接受添加查询的 PR,示例如下:
此外,(Superseded) What cool data explorer queries have you come up with? 以及 @SidV 的列表 discourse-data-explorer/querys.md at queries · SidVal/discourse-data-explorer · GitHub 也提供了大量有用的查询!
太好了!感谢分享!