导出用户数据的技巧

当用户需要超出其偏好设置页面上提供的基本用户数据导出的自定义或详细数据导出时,管理员可以使用Data Explorer插件执行用户数据导出。

此过程涉及编写和执行针对特定数据需求的 SQL 查询,同时确保符合站点策略和数据保护法规。

管理员可能还出于其他原因想要导出用户数据,例如:

  • 用户参与度分析
    • 分析用户如何与站点互动,哪些帖子受欢迎,哪些主题获得最多的回复。
  • 报告生成
    • 为利益相关者创建自定义报告,以提供社区指标和用户参与度的见解。
  • 数据备份
    • 保留用户数据的备份以用于恢复目的或历史分析。
  • 数据迁移
    • 迁移到另一个平台或创建数据仓库时导出用户数据。
  • 内容审核
    • 在发生违规或争议时调查用户活动或内容。

用户数据导出查询的关键注意事项

在编写导出用户数据的查询时,重要的是要同时考虑导出的具体要求以及您的站点关于用户数据导出的策略。

以下是一些可能影响用户数据导出查询编写方式的因素和注意事项:

  1. 用户属性

    • 如果您需要特定的用户属性,例如用户名、电子邮件或注册日期,您的查询应从数据库中明确选择这些列。例如:
    SELECT username, email, created_at AS registration_date
    FROM users
    WHERE id = :user_id
    
  2. 活动数据

    • 要导出用户活动数据(如帖子、点赞或登录历史记录),需要与其他表(例如 postslikesuser_actions)进行其他连接:
    SELECT p.id AS post_id, p.topic_id, p.created_at AS post_date, u.username
    FROM posts p
    JOIN users u ON u.id = p.user_id
    WHERE u.id = :user_id
    
  3. 日期范围

    • 如果您需要特定时间段的数据,请在查询中包含日期过滤器以将结果限制在相关期间内:
    SELECT username, email, created_at AS registration_date
    FROM users
    WHERE created_at BETWEEN :start_date AND :end_date
    
  4. 数据隐私

    • 确保导出的数据符合数据隐私法律和站点策略。除非绝对必要,否则在包含敏感数据时要小心。
  5. 匿名化

    • 在某些情况下,您可能需要匿名化用户数据以保护用户隐私。例如,您可以用唯一标识符替换姓名。
  6. 用户同意

    • 如果数据导出涉及共享个人或敏感用户信息,请确保已到位明确的同意机制。

示例查询

以下是一些常见数据导出需求的示例查询。这些查询的设计与从用户偏好设置页面获得的数据导出类似:

导出用户信息:

-- [params]
-- int :user_id

SELECT id, username, email, created_at AS registration_date
FROM users
WHERE id = :user_id

导出帖子

-- [params]
-- int :user_id

SELECT
    id,
    topic_id,
    post_number,
    created_at,
    updated_at,
    reply_to_post_number,
    reply_count,
    quote_count,
    deleted_at,
    like_count,
    incoming_link_count,
    bookmark_count,
    score,
    reads,
    post_type,
    sort_order,
    last_editor_id,
    hidden,
    hidden_reason_id,
    notify_moderators_count,
    spam_count,
    illegal_count,
    inappropriate_count,
    last_version_at,
    user_deleted,
    reply_to_user_id,
    percent_rank,
    notify_user_count,
    like_score,
    deleted_by_id,
    word_count,
    version,
    cook_method,
    wiki,
    baked_at,
    baked_version,
    hidden_at,
    self_edits,
    reply_quoted,
    via_email,
    public_version,
    locked_by_id,
    image_upload_id,
    qa_vote_count,
    raw,
    cooked,
    edit_reason,
    raw_email,
    action_code,
    outbound_message_id
FROM
    posts
WHERE
    user_id = :user_id
ORDER BY
    created_at DESC

导出认证令牌日志:

-- [params]
-- int :user_id

SELECT
    path,
    action,
    user_auth_token_id,
    client_ip,
    user_agent,
    auth_token,
    created_at
FROM
    user_auth_token_logs
WHERE
    user_id = :user_id

导出认证令牌:

-- [params]
-- int :user_id

SELECT
    auth_token,
    client_ip,
    rotated_at,
    created_at,
    updated_at,
    seen_at,
    auth_token_seen,
    user_agent,
    prev_auth_token
FROM
    user_auth_tokens
WHERE
    user_id = :user_id

导出用户徽章:

-- [params]
-- int :user_id

SELECT
    notification_id,
    badge_id,
    granted_at,
    granted_by_id,
    post_id,
    seq,
    featured_rank,
    created_at,
    is_favorite
FROM
    user_badges
WHERE
    user_id = :user_id

导出用户书签:

-- [params]
-- int :user_id

SELECT
    name,
    reminder_at,
    created_at,
    updated_at,
    reminder_last_sent_at,
    reminder_set_at,
    auto_delete_preference,
    pinned,
    bookmarkable_id,
    bookmarkable_type
FROM
    bookmarks
WHERE
    user_id = :user_id

导出类别偏好设置:

-- [params]
-- int :user_id

SELECT
    category_id,
    notification_level,
    last_seen_at
FROM
    category_users
WHERE
    user_id = :user_id

导出标志:

-- [params]
-- int :user_id

SELECT
    id,
    name,
    name_key,
    description,
    notify_type,
    auto_action_type,
    custom_type,
    applies_to,
    position,
    enabled,
    created_at,
    updated_at,
    score_type
FROM
    flags
WHERE
    id IN (
        SELECT
            post_action_type_id
        FROM
            post_actions
        WHERE
            user_id = :user_id
    )

导出点赞:

-- [params]
-- int :user_id

SELECT
    post_id,
    post_action_type_id,
    created_at,
    updated_at,
    deleted_at,
    deleted_by_id,
    related_post_id,
    staff_took_action,
    deferred_by_id,
    targets_topic,
    agreed_at,
    agreed_by_id,
    deferred_at,
    disagreed_at,
    disagreed_by_id
FROM
    post_actions
WHERE
    user_id = :user_id
    AND post_action_type_id = 1

导出用户偏好设置:

-- [params]
-- int :user_id

SELECT
    mailing_list_mode,
    email_digests,
    external_links_in_new_tab,
    enable_quoting,
    dynamic_favicon,
    automatically_unpin_topics,
    digest_after_minutes,
    auto_track_topics_after_msecs,
    new_topic_duration_minutes,
    last_redirected_to_top_at,
    email_previous_replies,
    email_in_reply_to,
    like_notification_frequency,
    include_tl0_in_digests,
    mailing_list_mode_frequency,
    notification_level_when_replying,
    theme_key_seq,
    allow_private messages,
    homepage_id,
    theme_ids,
    hide_profile_and_presence,
    text_size_key,
    text_size_seq,
    email_level,
    email_messages_level,
    title_count_mode_key,
    enable_defer,
    timezone,
    enable_allowed_pm_users,
    dark_scheme_id,
    skip_new_user_tips,
    color_scheme_id,
    default_calendar,
    chat_enabled,
    only_chat_push_notifications,
    oldest_search_log_date,
    chat_sound,
    dismissed_channel_retention_reminder,
    dismissed_dm_retention_reminder,
    bookmark_auto_delete_preference,
    ignore_channel_wide_mention,
    chat_email_frequency,
    enable_experimental_sidebar,
    seen_popups,
    chat_header_indicator_preference,
    sidebar_link_to_filtered_list,
    sidebar_show_count_of_new_items,
    watched_precedence_over_muted,
    chat_separate_sidebar_mode,
    topics_unread_when_closed,
    policy_email_frequency,
    show_thread_title_prompts,
    auto_image_caption
FROM
    user_options
WHERE
    user_id = :user_id

导出排队帖子:

-- [params]
-- int :user_id

SELECT
    id,
    user_id,
    topic_id,
    post_number,
    raw,
    created_at,
    updated_at,
    state,
    error,
    queue
FROM
    queued_posts
WHERE
    user_id = :user_id

导出用户存档:

-- [params]
-- int :user_id

SELECT
    id,
    topic_id,
    created_at,
    updated_at
FROM
    user_archived_messages
WHERE
    user_id = :user_id

导出用户访问记录:

-- [params]
-- int :user_id

SELECT
    visited_at,
    posts_read,
    mobile,
    time_read
FROM
    user_visits
WHERE
    user_id = :user_id

有关创建和运行数据导出查询的更多信息,请参阅Discourse Data Explorer 插件文档#data-reporting::category 类别。

7 个赞