使用 Data Explorer 按月计算总参与次数

各位数据探索高手,大家好!

是否有人尝试过通过数据探索查询,按月份统计总的参与次数?

我的意思是,按月份统计所有帖子、解决方案和点赞的总数。

提前感谢!

看看您是否喜欢这个。

我没找到我要找的那个。

我并不是数据探索专家,但有空时我喜欢写查询。在过于投入之前,我假设您想要的是按月统计数据的细分。如果是这样,类似下面的查询或许能行:

--[params]
-- date :start_date

WITH month_starts AS (
SELECT generate_series(date_trunc('month', :start_date::date), CURRENT_DATE, interval '1 month')::date AS month_start
),
monthly_posts AS (
SELECT
month_start,
COUNT(1) AS posts_count
FROM posts p
JOIN month_starts
ON p.created_at::date >= month_start AND p.created_at::date <= month_start + interval '1 month - 1 day'
WHERE p.deleted_at IS NULL
AND p.post_type = 1
AND p.created_at >= :start_date
GROUP BY month_start
),
monthly_total_users AS (
SELECT
month_start,
COUNT(1) AS total_users_count
FROM users u
JOIN month_starts
ON u.created_at::date <= month_start + interval '1 month - 1 day'
WHERE u.id > 0
GROUP BY month_start
),
monthly_active_users AS (
SELECT
month_start,
COUNT(DISTINCT user_id) AS active_users_count
fROM user_visits uv
JOIN month_starts
ON uv.visited_at >= month_start AND uv.visited_at <= month_start + interval '1 month - 1 day'
WHERE uv.visited_at >= :start_date
GROUP BY month_start
),
monthly_solutions AS (
SELECT
month_start,
COUNT(1) AS solutions_count
FROM user_actions ua
JOIN month_starts ms
ON ua.created_at::date >= month_start AND ua.created_at::date <= month_start + interval '1 month - 1 day'
WHERE ua.action_type = 15
AND ua.created_at >= :start_date
GROUP BY month_start
),
monthly_likes AS (
SELECT
month_start,
COUNT(1) AS likes_count
FROM user_actions ua
JOIN month_starts ms
ON ua.created_at::date >= month_start AND ua.created_at::date <= month_start + interval '1 month - 1 day'
WHERE ua.action_type = 2
AND ua.created_at >= :start_date
GROUP BY month_start
)

SELECT
ms.month_start,
COALESCE(posts_count, 0) AS posts_count,
COALESCE(total_users_count, 0) AS total_users_count,
COALESCE(active_users_count, 0) AS active_users_count,
COALESCE(solutions_count, 0) AS solutions_count,
COALESCE(likes_count, 0) AS likes_count
FROM month_starts ms
LEFT JOIN monthly_posts mp ON mp.month_start = ms.month_start
LEFT JOIN monthly_total_users mtu ON mtu.month_start = ms.month_start
LEFT JOIN monthly_active_users mau ON mau.month_start = ms.month_start
LEFT JOIN monthly_solutions mts ON mts.month_start = ms.month_start
LEFT JOIN monthly_likes ml ON ml.month_start = ms.month_start
ORDER BY month_start DESC

在运行查询之前,您需要为 start_date 参数提供一个值。格式应为 yyyy-mm-dd。不过,该查询仅从该日期中提取月份部分。之所以需要起始日期参数,是因为如果查询针对站点上线以来的整个时间段运行,在大型站点上会导致超时。使用起始日期参数,我可以在 Meta 站点上运行长达数年的数据查询而不会超时。

关于该查询的一点说明是,active_users_count 列返回的是当月_登录_过站点的唯一用户数量——它并不检查这些用户是否在站点上执行了任何操作(例如,点赞帖子或创建帖子)。虽然查询或许可以做到这一点,但我担心会出现超时问题。

如果这不是您想要的数据类型,请告诉我。如果总体思路正确,请告知您是否需要向查询中添加其他数据,或者您是否发现结果中有任何看似错误的地方。

感谢 @simon 分享,这非常有用,基本上正是我需要的!我想问一下,是否可以对查询做一点修改?我不希望指定开始日期。我希望将所有这些值(帖子、用户、解决方案、点赞)汇总,并按月份显示查询结果,如下所示:

这或许可行,我会尝试一下。开始日期参数是在最后时刻添加到查询中的,因为我发现如果对整个站点上线以来的全部时间范围在 Meta 上运行该查询,会导致超时。也许有一些方法可以提高查询效率,从而解决超时问题。如果不行,查询应该允许设置一个时间范围,而不仅仅是指定开始日期。这样,你只需通过设置不同的时间范围运行几次查询,就能获取站点的所有数据。

没问题!等您有时间进行调整时,我们再继续。

嗨,Konrad,

以下是调整后的查询。


WITH monthly_users AS (
    SELECT 
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
        COUNT(*) AS "new_users_month"
    FROM users
    WHERE id > 0
    GROUP BY date_part('year', created_at), date_part('month', created_at)
    ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),

monthly_posts AS (
	SELECT
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
		COUNT(*) AS "posts_count"
	FROM posts p
	WHERE p.deleted_at IS NULL
		AND p.post_type = 1
	GROUP BY date_part('year', created_at), date_part('month', created_at)
	ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),

monthly_active_users AS (
	SELECT
        date_part('year', visited_at) AS year, 
        date_part('month', visited_at) AS month,
		COUNT(DISTINCT user_id) AS "active_users_count"
	FROM user_visits uv
	GROUP BY date_part('year', visited_at), date_part('month', visited_at)
	ORDER BY date_part('year', visited_at) ASC, date_part('month', visited_at)
),

monthly_solutions AS (
	SELECT
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
		COUNT(*) AS "solutions_count"
	FROM user_actions ua
	WHERE ua.action_type = 15
	GROUP BY date_part('year', created_at), date_part('month', created_at)
	ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),

monthly_likes AS (
	SELECT
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
		COUNT(*) AS "likes_count"
	FROM user_actions ua
	WHERE ua.action_type = 2
	GROUP BY date_part('year', created_at), date_part('month', created_at)
	ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
)

SELECT
    mu.year,
    mu.month,
    SUM(new_users_month) over (ORDER BY mu.year, mu.month rows between unbounded preceding AND current row) AS total_users,
    posts_count,
    COALESCE(active_users_count, 0) AS active_users_count,
    COALESCE(solutions_count, 0) AS solutions_count,
    COALESCE(likes_count, 0) AS solutions_count
FROM monthly_users mu
LEFT JOIN monthly_posts mp ON mp.year = mu.year AND mp.month = mu.month
LEFT JOIN monthly_active_users mau ON mau.year = mu.year AND mau.month = mu.month
LEFT JOIN monthly_solutions ms ON ms.year = mu.year AND ms.month = mu.month
LEFT JOIN monthly_likes ml ON ml.year = mu.year AND ml.month = mu.month
ORDER BY mu.year, mu.month 

如果你需要添加所有列,使其与图片完全一致,请使用以下查询:

SQL 详情
WITH monthly_users AS (
    SELECT 
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
        COUNT(*) AS new_users_month
    FROM users
    WHERE id > 0
    GROUP BY date_part('year', created_at), date_part('month', created_at)
    ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),

monthly_posts AS (
	SELECT
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
		COUNT(*) AS posts_count
	FROM posts p
	WHERE p.deleted_at IS NULL
		AND p.post_type = 1
	GROUP BY date_part('year', created_at), date_part('month', created_at)
	ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),

monthly_active_users AS (
	SELECT
        date_part('year', visited_at) AS year, 
        date_part('month', visited_at) AS month,
		COUNT(DISTINCT user_id) AS active_users_count
	FROM user_visits uv
	GROUP BY date_part('year', visited_at), date_part('month', visited_at)
	ORDER BY date_part('year', visited_at) ASC, date_part('month', visited_at)
),

monthly_solutions AS (
	SELECT
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
		COUNT(*) AS solutions_count
	FROM user_actions ua
	WHERE ua.action_type = 15
	GROUP BY date_part('year', created_at), date_part('month', created_at)
	ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
),

monthly_likes AS (
	SELECT
        date_part('year', created_at) AS year, 
        date_part('month', created_at) AS month,
		COUNT(*) AS likes_count
	FROM user_actions ua
	WHERE ua.action_type = 2
	GROUP BY date_part('year', created_at), date_part('month', created_at)
	ORDER BY date_part('year', created_at) ASC, date_part('month', created_at)
)

SELECT
    mu.year,
    mu.month,   
    SUM(new_users_month + COALESCE(posts_count,0) + 
        COALESCE(active_users_count, 0) + 
        COALESCE(solutions_count, 0) + 
        COALESCE(likes_count, 0)) 
        over (ORDER BY mu.year, mu.month rows between unbounded preceding AND current row) AS sum_total
FROM monthly_users mu
LEFT JOIN monthly_posts mp ON mp.year = mu.year AND mp.month = mu.month
LEFT JOIN monthly_active_users mau ON mau.year = mu.year AND mau.month = mu.month
LEFT JOIN monthly_solutions ms ON ms.year = mu.year AND ms.month = mu.month
LEFT JOIN monthly_likes ml ON ml.year = mu.year AND ml.month = mu.month
ORDER BY mu.year, mu.month 

这正是我想要的!再次感谢 @michebs!你们有没有考虑过创建一个开源的“数据探索器”查询集合,并与 Discourse 用户分享?

是的,这就是数据资源管理器内置的查询列表 :wink:

是的,我知道,但你会在此基础上扩展吗?例如,根据论坛里的这些问题。

我们可以接受添加查询的 PR,示例如下:

此外,(Superseded) What cool data explorer queries have you come up with? 以及 @SidV 的列表 discourse-data-explorer/querys.md at queries · SidVal/discourse-data-explorer · GitHub 也提供了大量有用的查询!

太好了!感谢分享!