Legal Tools Plugin

First words of admiration go to @angus :clap: :heart: :+1: for doing such a great job helping everyone here to get those highly useful tools (not only this plugin but other plugins too).

I’ve got one question though: wouldn’t it be better to have this export available to admins only (at least as an option for those concerned)?
Isn’t it potentially risky if an account’s password is compromised and ‘all activity’ can then be easily downloaded by an unauthorized person? (Sorry, that is two questions :slight_smile: )

2 Likes

I just did a few more tests and tried it on your sandbox too. My results are just the default download - they don’t match your new format. I don’t think I see any errors in sidekiq (at least not in ‘Failed’ - ‘Errors’ is not clickable). Any suggestions how I can best find more information here?

Please post (or PM) a screenshot of the csv you got from my sandbox with the ip addresses blacked out (if any).

I think I’ll make this a setting.

Potentially. This is partly what I meant by

Nevertheless, some people will need / want the functionality of allowing users to directly download their info. So I think a setting is the move here.

1 Like

There are now two site settings that enable the extended download:

  • legal extended user download: When enabled, the “Download All” feature in the User Activity page becomes an extended download.

    • Please note that, like the normal user download, the extended user download can only be performed by a user once a day.
  • legal extended user download admin: When enabled, permitted staff can download all the information of any user of the site. They will see a new “Download All” button at the top of the admin user information of each user.

    Options:

    • Disabled (default)
    • Admins Only (only admins can use the admin extended download feature)
    • Admins and Staff (both admins and staff can use the admin extended download feature)

The two settings are severable, i.e. you can enable legal extended user download without giving admins or staff the ability to download all the information of every user, or you can enable legal extended user download admin without giving users the ability to download all of their information.
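
To make the severability concrete, here is a minimal Ruby sketch of how the two settings could be checked independently of each other. The `Settings` and `User` structs, the `:disabled` / `:admins` / `:admins_and_staff` values, and both helper names are assumptions made for illustration; this is not the plugin's actual Guardian code.

```ruby
# Hypothetical stand-ins for the two site settings and for a user record.
Settings = Struct.new(:extended_user_download, :extended_user_download_admin, keyword_init: true)
User = Struct.new(:id, :admin, :staff, keyword_init: true)

# A user may run the extended download on their own account only when
# the user-facing setting is enabled.
def can_download_own_extended?(settings, user, target)
  settings.extended_user_download && user.id == target.id
end

# Downloading another user's data depends solely on the admin setting.
def can_download_other_extended?(settings, user)
  case settings.extended_user_download_admin
  when :admins then user.admin
  when :admins_and_staff then user.admin || user.staff
  else false # :disabled (the default) or any unrecognised value
  end
end
```

Because neither helper reads the other's setting, enabling one feature never implicitly enables the other.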

Due to the security implications of allowing users (even if they’re admins) to download all the information of other users, I’ve been particularly careful with protecting the admin extended user download server methods. However, I would appreciate a second set of eyes on that aspect of the changes, particularly considering this feature seems to be quite popular. Perhaps you could take a look, @riking, if you have the time (particularly at the changes to the Guardian)?

I’ve also added text to the top of the extended csv:

  • A header (can be edited: Customize > Text Content > “csv_export.extended.title”):

    All information of %{username} stored by %{site_name}

    • “username” is the username of the user whose information is in the download.
    • “site_name” is the title site setting.
  • A note below the header (can be edited: Customize > Text Content > “csv_export.extended.note”):

    Please note that some information associated with the user identifier of %{username}
    has been excluded from this download due to countervailing privacy and legal interests.
    For more information, please contact %{site_contact}.

    • “username” is the username of the user whose information is in the download.
    • “site_contact” is the contact email site setting.
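
As a rough illustration of how the `%{...}` placeholders get filled, here is a small Ruby sketch. The two templates are the default texts quoted above; the helper name `extended_export_preamble` is made up for this example. Discourse presumably renders these strings through its own translation system, and `Kernel#format` is used here only because it understands the same `%{name}` syntax.

```ruby
# Default header and note, as quoted above.
TITLE = "All information of %{username} stored by %{site_name}"
NOTE = "Please note that some information associated with the user identifier of %{username} " \
       "has been excluded from this download due to countervailing privacy and legal interests. " \
       "For more information, please contact %{site_contact}."

# Hypothetical helper: returns the two lines placed at the top of the csv.
def extended_export_preamble(username:, site_name:, site_contact:)
  [
    format(TITLE, username: username, site_name: site_name),
    format(NOTE, username: username, site_contact: site_contact)
  ]
end

title, note = extended_export_preamble(
  username: "alice",
  site_name: "Example Forum",
  site_contact: "admin@example.com"
)
puts title
puts note
```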

This has all been tested on my sandbox where it is currently live.

@bartv Let me know if these updates fix your issue.

7 Likes

This new version solves the issue I had before. Thanks for all your hard work on this, Angus!

The extended note is interesting; could you shed some light on which information has been excluded and for which reasons?

2 Likes

That is primarily referring to the exclusion mentioned below, and also helps if we’ve not included something some authority happens to consider relevant.

4 Likes

6 posts were split to a new topic: User data export failing (transaction aborted)

As an admin I just tried to export the data of a user (a GDPR request) by clicking the “Download All” button, but I got a message from the system saying that the data export failed and that I should check the logs.

I cannot figure out if this is now a problem inside the plugin or a problem of Discourse itself.
Update January 11th 17:40 UTC: The Discourse team just said that, judging by the backtrace, it is a problem in a third-party plugin. @angus, can you please have a look at this? Thanks.

The Discourse instance is running on 2.7.0.beta1 ( 4f72830eb0 )
Update January 11th 16:15 UTC: now on version 2.7.0.beta1 ( 422f395042 ), but the error remains the same.

This is what is configured inside the plugin:

[Screenshot of the plugin settings]

In the logs I see the following error:
Job exception: undefined method `collect' for nil:NilClass

/usr/local/lib/ruby/2.7.0/csv/writer.rb:46:in `<<'
/usr/local/lib/ruby/2.7.0/csv.rb:1230:in `<<'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:66:in `block (3 levels) in execute'
/var/www/discourse/plugins/discourse-legal-tools/lib/export_csv_file_extension.rb:267:in `user_archive_export_extended'
/var/www/discourse/plugins/discourse-legal-tools/lib/export_csv_file_extension.rb:226:in `admin_user_archive_export'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:66:in `each'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:66:in `block (2 levels) in execute'
/usr/local/lib/ruby/2.7.0/csv.rb:658:in `open'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:64:in `block in execute'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:63:in `each'
/var/www/discourse/app/jobs/regular/export_csv_file.rb:63:in `execute'
/var/www/discourse/app/jobs/base.rb:232:in `block (2 levels) in perform'
rails_multisite-2.5.0/lib/rails_multisite/connection_management.rb:76:in `with_connection'
/var/www/discourse/app/jobs/base.rb:221:in `block in perform'
/var/www/discourse/app/jobs/base.rb:217:in `each'
/var/www/discourse/app/jobs/base.rb:217:in `perform'
sidekiq-6.1.2/lib/sidekiq/processor.rb:196:in `execute_job'
sidekiq-6.1.2/lib/sidekiq/processor.rb:164:in `block (2 levels) in process'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:138:in `block in invoke'
/var/www/discourse/lib/sidekiq/pausable.rb:138:in `call'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:140:in `block in invoke'
sidekiq-6.1.2/lib/sidekiq/middleware/chain.rb:143:in `invoke'
sidekiq-6.1.2/lib/sidekiq/processor.rb:163:in `block in process'
sidekiq-6.1.2/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_retry.rb:111:in `local'
sidekiq-6.1.2/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq.rb:38:in `block in <module:Sidekiq>'
sidekiq-6.1.2/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/processor.rb:257:in `stats'
sidekiq-6.1.2/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_logger.rb:13:in `call'
sidekiq-6.1.2/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch'
sidekiq-6.1.2/lib/sidekiq/job_retry.rb:78:in `global'
sidekiq-6.1.2/lib/sidekiq/processor.rb:124:in `block in dispatch'
sidekiq-6.1.2/lib/sidekiq/logger.rb:10:in `with'
sidekiq-6.1.2/lib/sidekiq/job_logger.rb:33:in `prepare'
sidekiq-6.1.2/lib/sidekiq/processor.rb:123:in `dispatch'
sidekiq-6.1.2/lib/sidekiq/processor.rb:162:in `process'
sidekiq-6.1.2/lib/sidekiq/processor.rb:78:in `process_one'
sidekiq-6.1.2/lib/sidekiq/processor.rb:68:in `run'
sidekiq-6.1.2/lib/sidekiq/util.rb:15:in `watchdog'
sidekiq-6.1.2/lib/sidekiq/util.rb:24:in `block in safe_thread'
1 Like
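
For readers hitting a similar backtrace: the top frames show `CSV#<<` calling `collect` on `nil`, which is consistent with a `nil` row being handed to the csv writer. Below is a minimal Ruby sketch of a defensive pattern for that situation. The helper `rows_to_csv` is hypothetical and is not taken from the plugin or from Discourse.

```ruby
require "csv"

# Hypothetical helper: writes the header and every non-nil row, so a nil
# produced by a data source is skipped instead of reaching CSV#<<.
def rows_to_csv(header, rows)
  CSV.generate do |csv|
    csv << header
    rows.each do |row|
      next if row.nil?
      csv << row
    end
  end
end

puts rows_to_csv(%w[id name], [[1, "a"], nil, [2, "b"]])
```

Whether skipping, logging, or raising a clearer error is the right reaction to a `nil` row depends on where the rows come from.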

Apologies for the slow reply here, I’ve been away :beach_umbrella:.

I repro’d this and fixed it. Please update and try the download again. If possible, let me know how the request goes.

https://github.com/paviliondev/discourse-legal-tools/commit/a15193e9267ee5aee2d7eb8e5dc6c3b0d18d3b23

6 Likes

No problem, it was a weekend and nobody has to react right away.
I am thankful for this plugin.

I updated and tried it again.
It worked fine. Thanks a lot for the fix, I really appreciate your good work.

If I wanted to also include ‘administrative’ records containing the user’s user_id, such as flags, complaints and staff whispers, what would be the best SQL statements to get that data with the Data Explorer?
If needed I could then provide this data separately to the user who requested the data.

3 Likes

This is not a problem with this great plugin, but does anyone know how I can get this data? I have the Data Explorer, but I do not know the structure of the database tables: which tables I would have to look in, and which ones I would have to join, in order to get this data.

See the earlier post, I think:

No, these site settings were already enabled, and that is the data I already have.
But that does not include the administrative records, as Angus has written above:

And now I am looking at how I could extract these administrative records.

Hmm, I thought that was the meaning of extended - it’s everything practically exportable.

Yes, that’s right. Unfortunately, there’s not a straightforward answer to this question. This is why those records are not included, as explained above. Note, in particular

That said, to gather additional records containing the user’s user id, you can use this approach.

  1. Install the data explorer plugin

  2. Create a new query (perhaps call it “Additional User Records (GDPR)”)

  3. Do a search for “user_id” in the schema explorer on the right to see which database tables include a user_id in them. You’ll see that a number of them are already included (see list in OP + “user activity” as mentioned in my second post).

  4. Determine the user_id of the user in question (you can find it at /u/username.json)

  5. For each additional table you want to include, construct a query that extracts the rows where the user_id matches the relevant id. e.g.

    select * from [table_name] where user_id = [user_id]
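
Steps 4 and 5 can be sketched in Ruby as follows. This is only an illustration under stated assumptions: the profile JSON is assumed to nest the id under a `"user"` key, both helper names are made up, and the table names in the usage example are placeholders rather than a recommendation of which tables to export.

```ruby
require "json"

# Step 4: read the user id from the body of /u/<username>.json.
# The {"user" => {"id" => ...}} shape is an assumption; check your own site.
def user_id_from_profile_json(body)
  Integer(JSON.parse(body).fetch("user").fetch("id"))
end

# Step 5: build one query per table. Table names are checked against a
# strict pattern because identifiers cannot be passed as bound parameters.
def user_record_queries(tables, user_id)
  id = Integer(user_id)
  tables.map do |table|
    raise ArgumentError, "unexpected table name: #{table}" unless table.match?(/\A[a-z_]+\z/)
    "SELECT * FROM #{table} WHERE user_id = #{id}"
  end
end

sample_body = '{"user": {"id": 42, "username": "alice"}}'
id = user_id_from_profile_json(sample_body)
puts user_record_queries(%w[example_table_one example_table_two], id)
```

Each generated statement can then be pasted into its own Data Explorer query and reviewed table by table.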
    

I suggest you review each of these “additional” tables on their own merits, rather than just attempting to download every single record that contains the user’s id.

The records may contain other information relevant to other users with countervailing interests, or may be sensitive in some other way. Unfortunately there’s no single answer to the question of “scope” here. You’ll need to make that call based on how you read your specific mixture of responsibilities. The GDPR is not the only relevant responsibility here. You shouldn’t just hand over every single record that contains a user’s id.

I’m actually a little unclear what’s driving the interest in these additional records. Is this something the user has asked for, i.e. beyond what’s already included? If they haven’t, what’s motivating this? A different interpretation of your responsibilities under the GDPR than what I’ve laid out above? If so, I’d be curious to learn more about it and the legal reasoning behind it (I may want to consider assimilating the reasoning into this plugin).

2 Likes

Yes, but of course we are not willing to give him all this information, especially if the records also contain data from other users. We just want to be prepared to have this additional information if it is really needed. We will most probably not provide it to this user, but we might have to give information to the authorities, because we expect that the user will take the matter to them.
Our new data protection officer also told us that we should at least not yet provide the administrative records.

1 Like

I see.

If your data protection officer decides that additional records are needed, and has some tables in mind, I would be happy to provide a more specific SQL query to help you out. For the reasons I mentioned, I don’t want to nominate specific additional tables to be provided as a general piece of advice outside of the context of a case.

But if you need something specific as this case progresses, I’d be happy to help you out pro bono, as that is in the spirit of this plugin, i.e. to make it easier for Discourse communities to navigate the GDPR. If that happens, and you have specific tables in mind, and you’re in need of assistance with the SQL query, PM me here on meta.

In short, I’m happy to provide some ad-hoc technical (non-legal) assistance to Discourse communities in response to specific cases under the GDPR, but I’m conscious of not setting general standards beyond the scope of what is reasonable for the majority of cases. If there is a legal argument of that sort that the scope of the plugin should be expanded, I’m open to it.

4 Likes

Well, our data protection officer told me that, at least currently, there is no need to extract additional administrative data. Thanks a lot for your help; if needed, I will get back to you.

3 Likes

This plugin is great!

Handling GDPR subject access requests is a pain, a total time drain, and this helps cover all of that with much more confidence. Thank you.

Are there any plans to add more features? In particular, I’m struggling with the data retention and minimisation principles. Specifically, I’m interested in minimising ‘administrative records’: whispers and posts in team areas, which could contain notes on IP addresses and other personally identifying data that needs sifting and searching by hand. Five years on there’s too much to audit, and little value in the old messages, so I want / need to permanently delete them. I’d actually like to have just a 6 month retention policy on such messages and whispers.

So I can select and delete stuff using rake, but it’s just marked deleted and still all there in the database :frowning:

I’ve therefore been thinking about an ‘obliterator’ plug-in that would either change the raw and cooked text of deleted posts to something like ‘this message has been obliterated’, or (preferably) unpick and remove the posts entirely. Having never written ruby or a plug-in, I’m not at an ideal starting position, though could potentially just write some SQL to run against the db directly, then use rake to rebuild the search indexes afterwards.
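
The first variant of that idea (overwriting the text rather than removing rows) could look roughly like the Ruby sketch below. Everything here is hypothetical: `Post` is a plain struct rather than a Discourse model, the six-month window is approximated as 183 days, and measuring the window from `deleted_at` is an assumption about how the retention period would be counted.

```ruby
# Plain stand-in for a post record; not the Discourse model.
Post = Struct.new(:id, :raw, :cooked, :deleted_at, keyword_init: true)

RETENTION_SECONDS = 183 * 24 * 60 * 60 # roughly six months
PLACEHOLDER = "this message has been obliterated"

# Overwrites raw and cooked for posts deleted before the retention cutoff.
# Returns the posts that were changed.
def obliterate_expired!(posts, now: Time.now)
  cutoff = now - RETENTION_SECONDS
  expired = posts.select { |post| post.deleted_at && post.deleted_at < cutoff }
  expired.each do |post|
    post.raw = PLACEHOLDER
    post.cooked = "<p>#{PLACEHOLDER}</p>"
  end
end
```

On a real site the same text can also survive elsewhere, for example in post revisions and in the search index mentioned above, so overwriting these two fields alone would likely not be sufficient.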

@angus - I did wonder if in your legal considerations you had any thoughts on the data retention aspects of GDPR, and how you handle it?

Interesting!

Yes, I’m open to adding a feature for that. I’ll have to consider it in some more depth after doing a bit more research.

Could you please submit a detailed feature request (select “Legal Tools” at the plugin step) laying out all the relevant details of your use case and any other research you’ve collected? I’ll then follow up and engage after doing a bit of background research.

https://thepavilion.io/w/feature-request

1 Like