Legal Tools Plugin

Repository: GitHub - paviliondev/discourse-legal-tools: Tools to help with legal compliance when using Discourse

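This plugin provides tools to help with legal compliance when running a Discourse forum. More tools will be added over time.

Please note the disclaimer below. This plugin does not guarantee legal compliance.

Extended User Download

The extended user download is a single CSV file containing the following items, each separated from the next by two blank lines: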

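  • A header (can be edited: Customize > Text Content > “csv_export.extended.title”):

    All information of %{username} stored by %{site_name}

    • “username” is the username of the user whose information is in the download.
    • “site_name” is the title site setting.
  • A note below the header (can be edited: Customize > Text Content > “csv_export.extended.note”):

    Please note that some information associated with the user identifier of %{username}
    has been excluded from this download due to countervailing privacy and legal interests.
    For more information, please contact %{site_contact}.

    • “username” is the username of the user whose information is in the download.
    • “site_contact” is the contact email site setting.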
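  • Posts: the information included in the user download by default.

  • Account: account and profile information.

  • External accounts: information from external accounts (if any).

  • Statistics: statistics stored about the user.

  • Logins and login history: information about the user's logins.

  • Searches: an IP-logged record of all searches performed by the user.

  • Topic views: an IP-logged record of all the user's topic views.

  • Topic link clicks: an IP-logged record of all topic links clicked by the user.

  • Profile views: an IP-logged record of all the user's profile views.

  • Actions: all actions performed by the user.

  • History: an IP-logged record of all recordable actions involving the user.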
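For anyone who wants to check quickly which sections a given download contains, here is a minimal Ruby sketch that splits the file on the two-blank-line separator and prints the first line of each chunk. It assumes that no quoted CSV field (for example a multi-line post) contains two consecutive blank lines of its own, so treat it as a rough inspection aid rather than a real parser. The sample text, including the "posts" and "account" label rows, is made up for illustration and is not the plugin's actual output.

```ruby
# Naive splitter for an extended user download.
# Assumption: sections are separated by two blank lines, and no quoted CSV
# field contains two consecutive blank lines of its own.
def split_sections(text)
  text.gsub("\r\n", "\n").split(/\n{3,}/).map(&:strip).reject(&:empty?)
end

# Returns the first line and the line count of every section.
def section_summaries(text)
  split_sections(text).map do |section|
    lines = section.lines.map(&:chomp)
    { first_line: lines.first, line_count: lines.length }
  end
end

# Hypothetical sample; the label rows are illustrative only.
sample = "All information of alice stored by Example Forum\n\n\n" \
         "posts\ntopic_title,raw\nHello,First post\n\n\n" \
         "account\nusername,email\nalice,alice@example.com\n"

section_summaries(sample).each do |summary|
  puts "#{summary[:first_line]} (#{summary[:line_count]} lines)"
end
```

Reading a real file would be `section_summaries(File.read(path))`, where `path` points at the downloaded CSV.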

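There are two site settings that enable the extended download:

  • legal extended user download: When enabled, the “Download All” feature on the user activity page becomes an extended download.

    • Please note that, like the normal user download, the extended user download can only be performed by a user once a day.
  • legal extended user download admin: When enabled, permitted staff can download all the information of any user of the site. They will see a new “Download All” button at the top of the admin user information of each user.

      ![Screenshot%20at%20May%2026%2014-55-03|340x96,80%](upload://wvealMlRqo6xFgDCzkuXNmDAcOa.png)


    Options:

    • Disabled (default)
    • Admins Only (only admins can use the admin extended download feature)
    • Admins and Staff (both admins and staff can use the admin extended download feature)

The two settings are severable, i.e. you can enable legal extended user download without giving admins or staff the ability to download all the information of every user, or you can enable legal extended user download admin without giving users the ability to download all of their own information.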
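To make the "severable" point concrete, here is a small Ruby sketch of how the two settings could combine into a single permission check. This is not the plugin's code: the `DownloadSettings` and `Member` structs, the symbol values and the `can_extended_download?` helper are all hypothetical names chosen for illustration, and the real checks live in the plugin's Guardian changes.

```ruby
# Hypothetical model of the two settings; names and values are illustrative
# and are not taken from the plugin's source.
DownloadSettings = Struct.new(:user_download, :admin_download, keyword_init: true)

Member = Struct.new(:id, :admin, :moderator, keyword_init: true) do
  def staff?
    admin || moderator
  end
end

# May `actor` request the extended download of `target`?
def can_extended_download?(settings, actor, target)
  # Users downloading their own data depend only on the user-facing setting.
  return true if actor.id == target.id && settings.user_download

  # Everyone else depends only on the admin-facing setting.
  case settings.admin_download
  when :admins_only then actor.admin
  when :admins_and_staff then actor.staff?
  else false
  end
end

settings = DownloadSettings.new(user_download: true, admin_download: :admins_only)
alice = Member.new(id: 1, admin: false, moderator: false)
mod   = Member.new(id: 2, admin: false, moderator: true)
admin = Member.new(id: 3, admin: true, moderator: false)

puts can_extended_download?(settings, alice, alice) # own data, user setting on
puts can_extended_download?(settings, mod, alice)   # moderator, admins only
puts can_extended_download?(settings, admin, alice) # admin, admins only
```

Because the two branches never consult each other's setting, turning one off leaves the other untouched, which is the behaviour described above.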

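The context for this feature is the EU's GDPR. In particular, see:

Please note that you should consider carefully the security implications of allowing users, staff and/or admins to download all of the information listed. This feature may not be appropriate in all cases. See the topic above for more on this issue.

An alternative to the direct download of user information is for the relevant staff member (the “Data Protection Officer”) to compile the information using database queries.

Disclaimer

The Legal Tools Plugin (the “Plugin”) and its author, Angus McLeod (the “Author”), are not lawyers, nor a substitute for a lawyer or for legal advice. Communications between you and the Author are not protected by attorney-client privilege or the work product doctrine. The Plugin and its Author cannot provide any kind of advice, explanation, opinion, recommendation or guarantee about possible legal rights, remedies, defenses, options, selection of forms or strategies.

62 Likes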

@codinghorror how much of this do you think belongs in core? Should we just amend our default “download all my data” to include all this stuff?

9 Likes

No. First we need @riking to come through with the removal of IP addresses where they are recorded unnecessarily and should not be. If that can’t be achieved in a reasonable timeframe we need to get someone else to do it.

5 Likes

Update here.

Scope of the download

All user-related records in user activity (record of likes, bookmarks, topics, replies etc) have been added to the download (commit).

I initially didn’t add this as it seemed like overkill. However @RGJ raised it with me and we had a productive exchange on the question.

Essentially, we decided that the best approach for the purposes of this feature was to include all records of activity tied to the user’s identifier that don’t entail countervailing concerns about the privacy of others or similar rights.

I would emphasise “for the purposes of this feature”, as the purpose of this feature is to take a ‘maximalist’ approach to possible interpretations of the GDPR. It does not attempt to parse ‘likely’ approaches. I’ve laid out some of my own views on the ‘likely’ approaches in this topic (which remain unchanged).

The specifics of the reasoning behind this ‘maximalist’ approach are:

  • The broadest interpretation of A.4.1 (the definition of ‘Personal Data’ in the GDPR) as it applies to Discourse is any record in the db that contains the user’s user_id, i.e.:

    any information … identified or identifiable natural person … identified, directly or indirectly, in particular by reference to … an identification number

  • Read literally, this definition doesn’t care about how the data is produced (e.g. whether the user is acting or not). It merely requires the data to be related to the user’s identifier in some way.

  • However, applying that literally would produce a fair amount of duplication (e.g. the records in the directory_items table are duplicative of various other entries).

  • The point of the extended download is to guard against even the small risk that Article 4.1 could receive a very broad interpretation by some authority or court in Europe.

  • The factors against including it - size of download, potentially security (?) - do not outweigh the possible benefit of including it.

We also considered whether to include ‘administrative’ records with the user’s user_id such as flags, complaints and staff whispers. We decided against this, reasoning as follows:

  • They’re already in the territory of information associated with the user purely by their identifier. They are not information about the user per se (i.e. name, email, age, location etc). This is already assuming a wide interpretation of A.4.1.

  • Whether administrative records intrude on the privacy of other parties, or raise other relevant concerns (i.e. R. 63.5 & A.15.4), must be determined on a case-by-case basis.

  • Other parties, such as Facebook, do not include such data in their user download functionality.

12 Likes

Hey Angus,

I’m having some trouble getting this to work. I’ve enabled the ‘legal extended user download’ setting, refreshed the page and clicked the ‘Download all’ button on the activity page. This results in an archive that contains one CSV file with topics in it. I checked both as an admin and as a regular user. What is the expected output - multiple CSV files, each for a different table?

The expected output is a single csv with headers for each item mentioned in the first post in this topic. If you try it out on my sandbox, this is what you’ll get.

Do you see any errors at /sidekiq?

@RGJ have you had this issue?

Nope. We had a few users with the same kind of complaint, but it turned out to be a false alarm.
I guess they never scrolled down past the posts. Maybe it’s an idea to move the posts section (i.e. the most unstructured / multiline content) to be the last section.

1 Like

Could you provide a few of the header rows as an example so I can do a search for them?

See the ‘separator’ lines here.

Thanks Richard. Definitely not seeing those. So just to be sure I’m doing the right thing: this plugin is supposed to replace the ‘Download’ button in the Profile > Activity sidebar, right?

First words of admiration go to @angus :clap: :heart: :+1: for doing such a great job helping everyone here to get those highly useful tools (not only this plugin but other plugins too).

I’ve got one question though: wouldn’t it be better to have this export available to admins only (at least as an option for those concerned)?
Isn’t it potentially risky if an account’s password is compromised and ‘all activity’ can then easily be downloaded by an unauthorized person? (Sorry, that is two questions :slight_smile: )

3 Likes

I just did a few more tests and tried it on your sandbox too. My results are just the default download - they don’t match your new format. I don’t think I see any errors in sidekiq (at least not in ‘Failed’ - ‘Errors’ is not clickable). Any suggestions how I can best find more information here?

Please post (or PM) a screenshot of the csv you got from my sandbox with the ip addresses blacked out (if any).

I think I’ll make this a setting.

Potentially. This is partly what I meant by

Nevertheless, some people will need / want the functionality of allowing users to directly download their info. So I think a setting is the move here.

1 Like

There are now two site settings that enable the extended download:

  • legal extended user download: When enabled, the “Download All” feature in the User Activity page becomes an extended download.

    • Please note that, like the normal user download, the extended user download can only be performed by a user once a day.
  • legal extended user download admin: When enabled, permitted staff can download all the information of any user of the site. They will see a new “Download All” button at the top of the admin user information of each user.

    Options:

    • Disabled (default)
    • Admins Only (only admins can use the admin extended download feature)
    • Admins and Staff (both admins and staff can use the admin extended download feature)

The two settings are severable, i.e. you can enable legal extended user download without giving admins or staff the ability to download all the information of every user, or you can enable legal extended user download admin without giving users the ability to download all of their information.

Due to the security implications of allowing users (even if they’re admins) to download all the information of other users, I’ve been particularly careful with protecting the admin extended user download server methods. However, I would appreciate a second set of eyes on that aspect of the changes, particularly considering this feature seems to be quite popular. Perhaps you could take a look, @riking, if you have the time (particularly at the changes to the Guardian)?

I’ve also added text to the top of the extended csv:

  • A header (can be edited: Customize > Text Content > “csv_export.extended.title”):

    All information of %{username} stored by %{site_name}

    • “username” is the username of the user whose information is in the download.
    • “site_name” is the title site setting.
  • A note below the header (can be edited: Customize > Text Content > “csv_export.extended.note”):

    Please note that some information associated with the user identifier of %{username}
    has been excluded from this download due to countervailing privacy and legal interests.
    For more information, please contact %{site_contact}.

    • “username” is the username of the user whose information is in the download.
    • “site_contact” is the contact email site setting.
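As a rough illustration of how the `%{...}` placeholders in these two texts get filled in: plain Ruby's `format` understands the same named-placeholder syntax, so the substitution can be tried outside Discourse. Discourse itself presumably resolves the keys through its translation layer rather than through a helper like the one below; `render_text` and the sample values are assumptions made for this sketch only.

```ruby
# The two default texts quoted above, with their named placeholders.
TITLE_TEMPLATE = "All information of %{username} stored by %{site_name}"
NOTE_TEMPLATE = "Please note that some information associated with the user " \
                "identifier of %{username} has been excluded from this download " \
                "due to countervailing privacy and legal interests. " \
                "For more information, please contact %{site_contact}."

# Hypothetical helper: fills the named placeholders from a hash of values.
# Keys that a template does not use are simply ignored.
def render_text(template, values)
  format(template, values)
end

# Sample values, made up for illustration.
values = {
  username: "alice",
  site_name: "Example Forum",
  site_contact: "admin@example.com"
}

puts render_text(TITLE_TEMPLATE, values)
puts render_text(NOTE_TEMPLATE, values)
```

If you edit either text under Customize > Text Content, keep the placeholder names exactly as written; a placeholder with no matching key raises a `KeyError` in plain Ruby.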

This has all been tested on my sandbox where it is currently live.

@bartv Let me know if these updates fix your issue.

8 Likes

This new version solves the issue I had before. Thanks for all your hard work on this, Angus!

The extended note is interesting; could you shed some light on which information has been excluded and for which reasons?

3 Likes

That is primarily referring to the exclusion mentioned below, and also helps if we’ve not included something some authority happens to consider relevant.

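4 Likes

6 posts were split to a new topic: User data export failed (transaction aborted)

As an admin, I just tried to export a user's data (GDPR request) by clicking the correct button (Download All), but I was told that the data export failed and that I should check the logs.

I can't tell whether this is a problem inside the plugin or in Discourse itself.
Update, Jan 11, 17:40 UTC: The Discourse team says that, judging by the backtrace, this is a third-party plugin issue. @angus, could you take a look? Thanks.

The Discourse instance is currently running version 2.7.0.beta1 (4f72830eb0).
Update, Jan 11, 16:15 UTC: Now upgraded to version 2.7.0.beta1 (422f395042), but the error persists.

The configuration in the plugin is as follows:

The logs show the following error:

Job exception: undefined method `collect' for nil:NilClass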

/usr/local/lib/ruby/2.7.0/csv/writer.rb:46:in `<<'
/usr/local/lib/ruby/2.7.0/csv.rb:1230:in `<<'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:66:in `block (3 levels) in execute'
/var/www/discourse/plugins/discourse-legal-tools/lib/export_csv_file_extension.rb:267:in `user_archive_export_extended'
/var/www/discourse/plugins/discourse-legal-tools/lib/export_csv_file_extension.rb:226:in `admin_user_archive_export'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:66:in `each'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:66:in `block (2 levels) in execute'
/usr/local/lib/ruby/2.7.0/csv.rb:658:in `open'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:64:in `block in execute'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:63:in `each'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:63:in `execute'
/var/www/discourse/app/jobs/base.rb:232:in `block (2 levels) in perform'
rails_multisite-2.5.0/lib/rails_multisite/connection_management.rb:76:in `with_connection'
/var/www/discourse/app/jobs/base.rb:221:in `block in perform'
/var/www/discourse/app/jobs/base.rb:217:in `each'
/var/www/discourse/app/jobs/base.rb:217:in `perform'
sidekiq-6.1.2/lib/sidekiq/processor.rb:196:in `execute_job'
sidekiq-6.1.2/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:138:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:138:in `call'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:140:in `block in invoke'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:143:in `invoke'
sidekiq-6.1.2/lib/sidekiq/processor.rb:163:in `block in process'
sidekiq-6.1.2/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_retry.rb:111:in `local'
sidekiq-6.1.2/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq.rb:38:in `block in <module:Sidekiq>'
sidekiq-6.1.2/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/processor.rb:257:in `stats'
sidekiq-6.1.2/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_logger.rb:13:in `call'
sidekiq-6.1.2/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_retry.rb:78:in `global'
sidekiq-6.1.2/lib/sidekiq/processor.rb:124:in `block in dispatch'
sidekiq-6.1.2/lib/sidekiq/logger.rb:10:in `with'
sidekiq-6.1.2/lib/sidekiq/job_logger.rb:33:in `prepare'
sidekiq-6.1.2/lib/sidekiq/processor.rb:123:in `dispatch'
sidekiq-6.1.2/lib/sidekiq/processor.rb:162:in `process'
sidekiq-6.1.2/lib/sidekiq/processor.rb:78:in `process_one'
sidekiq-6.1.2/lib/sidekiq/processor.rb:68:in `run'
sidekiq-6.1.2/lib/sidekiq/util.rb:15:in `watchdog'
sidekiq-6.1.2/lib/sidekiq/util.rb:24:in `block in safe_thread'
1 Like
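The top frames of that backtrace are inside Ruby's csv library, which is consistent with a nil value being appended to the CSV where a row (an array) was expected. The thread does not say what the actual fix in the plugin was, so the following is only a sketch of how that class of error can be guarded against. The `rows_to_csv` helper and the sample rows are hypothetical.

```ruby
require "csv"

# Rows as a hypothetical export step might produce them. The nil stands in
# for a record that could not be turned into a row.
rows = [["id", "action"], [1, "like"], nil, [2, "bookmark"]]

# Writes the rows to a CSV string, skipping nil entries instead of
# appending them to the CSV.
def rows_to_csv(rows)
  CSV.generate do |csv|
    rows.compact.each { |row| csv << row }
  end
end

puts rows_to_csv(rows)
```

Skipping the entry silently is the simplest guard; logging which record produced the nil would be more helpful when tracking down the underlying cause.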

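Sorry for the late reply, I was away :beach_umbrella:

I've reproduced and fixed the issue. Please update and try the download again. If you get the chance, let me know how the request goes.

6 Likes

No problem, it was the weekend, and nobody needs an immediate response.
I'm grateful for this plugin.

I've updated and tried again.
Everything works. Thank you very much for the fix, I really appreciate your great work.

If I also wanted to include the ‘administrative’ records (such as flags, complaints and staff whispers) by the user's user_id in Data Explorer, what would be the best SQL statement to use?
If needed, I could then provide that data separately to the user who requested their data.

3 Likes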