Providing data for GDPR

angus · March 28, 2018, 2:23am

I don’t mean to offend, but this topic and its companion are a little misleading.

If you’re looking for reliable information on this subject you should restrict yourself to:

- Official sources, e.g. the European Commission’s Article 29 Working Party.
- Formal legal advice.

Don’t rely on 3rd party summaries (or even what folks are saying here, including me).

Regarding the substantive points, I would point out a few things:

Concerning the Article 29 Working Party’s Guidelines on the Right to Data Portability I note:
- Availability of data via a JSON API is explicitly mentioned (multiple times) as a suitable data format. In fact one might even say it is encouraged vis-a-vis other methods.
- There is no requirement to provide everything in a single package, or instantly. The data needs to be provided “within a reasonable time not exceeding one month”.
- The thrust of the regulation is to avoid data “lock-in” and to promote interoperability.
As far as I can tell, there is nothing that Discourse needs to add to its existing functionality to allow forums to which this regulation applies to comply with it.
Concerning the Right to Erasure (aka “Right to be forgotten”), I would reiterate that the applicable timeline (like with the Right to Data Portability) is one month. There is no need to provide a one-click “Forget me” button for users. It is quite possible to comply with requests to be forgotten within the existing functionality of Discourse.
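To make the one-month timeline concrete, here is a minimal sketch of how a forum operator might track the latest response date for a request. It assumes "one month" means the same day of the following month, clamped to the end of shorter months. That reading is an illustration only, not a legal interpretation, and the function names are hypothetical rather than anything in Discourse.

```python
import calendar
from datetime import date


def response_deadline(received: date, months: int = 1) -> date:
    """Return the same day-of-month `months` later, clamped to month end.

    Assumption: this is one plausible way to count "one month"; check your
    own DPA's guidance for the rule that actually applies to you.
    """
    month_index = received.month - 1 + months
    year = received.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


def days_remaining(received: date, today: date) -> int:
    """Days left before the deadline; a negative value means it has passed."""
    return (response_deadline(received) - today).days
```

A request received on 31 January would, under this assumption, fall due on the last day of February.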

Moreover, it is not clear to me that it would be a good idea to allow users to completely erase all data concerning themselves, as the Right to Erasure explicitly requires the data controller to consider exceptions and countervailing rights when complying with a request.

The bottom line here is that, as far as I can tell, Discourse does not contain any structural impediments to your compliance with the GDPR. Compliance with the GDPR is up to you, as it arises in specific cases and is largely a matter of organisational management, not one of technical functionality.

If you think the GDPR may apply to you, you should at a minimum review the help documents provided by the relevant Data Protection Authority in your jurisdiction (as they will be the ones actually enforcing the GDPR), and seek legal advice if you have specific concerns. If you’re not sure which DPA applies to you, you can review the European Commission’s own documents I linked above, or just pick a DPA that uses a language you can understand.

None of the above constitutes legal advice, and I am not your lawyer.

sam · March 28, 2018, 2:35am

This is one huge sticking point for me. If you signed up and accepted in the TOS that you are licensing your content to the forum operator, I am not sure you have a leg to stand on when asking for erasure. Asking for anonymization, sure, but erasure is far stronger and more disruptive.

For example, with Stack Overflow you are licensing your content under cc by-sa 3.0 with attribution required. There are strong competing rights here between an existing granted license and a request for erasure.

KajMagnus · March 28, 2018, 3:00am

Actually that seems incorrect to me (so good advice to not listen to anyone, then) and, reading the docs, it seems to me that a delete-account (revoke consent) button is needed. From the docs:

However, when consent is obtained via electronic means through only one mouse-click, swipe, or
keystroke, data subjects must, in practice, be able to withdraw that consent equally as easily. Where
consent is obtained through use of a service-specific user interface (for example, via a website, an
app, a log-on account, the interface of an IoT device or by e-mail), there is no doubt a data subject
must be able to withdraw consent via the same electronic interface, as switching to another interface
for the sole reason of withdrawing consent would require undue effort. Furthermore, the data
subject should be able to withdraw his/her consent without detriment. This means, inter alia, that a
controller must make withdrawal of consent possible free of charge or without lowering service
levels.

From http://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=615239, section 5.2 Withdrawal of consent, on page 21.

Then they go on to describe an example where consent is given via a one-click web widget and withdrawn by making a phone call during business hours. And that’s not ok. To me it seems that having to switch to email and message the staff is not totally ok either (although not quite as bad).

HAWK · March 28, 2018, 3:06am

That’s giving and revoking consent. It’s essentially saying that if you check a box to give consent to your data being stored, then you need to be able to uncheck a box to revoke your consent, not to delete your data. What happens then is unrelated and, as @angus pointed out, can occur over a one month period.

KajMagnus · March 28, 2018, 4:36am

@HAWK I didn’t write anything about the personal data having to get deleted immediately. I said apparently there does need to be a button (or checkbox), when someone else said no-button-or-checkbox-needed.

(I’m assuming people mean the same thing when they talk about a delete-account widget, forget-me widget, and a revoke-consent widget. I’m thinking it would delete the user’s personal data (but not the user’s CC-By licensed posts).)

In fact I think it can make sense to schedule the deletion a week later, in case the user changes his/her mind.
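As a rough illustration of that idea, here is a sketch of a deletion request with a one-week grace period during which the user can change his/her mind. The class, its fields and the seven-day constant are all hypothetical; this is not existing Discourse functionality.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # assumption: "a week later"


@dataclass
class DeletionRequest:
    """Hypothetical record of a user's request to remove personal data."""

    username: str
    requested_at: datetime
    cancelled_at: Optional[datetime] = None

    @property
    def due_at(self) -> datetime:
        """Moment at which the deletion should actually be carried out."""
        return self.requested_at + GRACE_PERIOD

    def cancel(self, now: datetime) -> bool:
        """Cancel if still inside the grace period; return True on success."""
        if self.cancelled_at is None and now < self.due_at:
            self.cancelled_at = now
            return True
        return False

    def is_due(self, now: datetime) -> bool:
        """True once the grace period has elapsed without a cancellation."""
        return self.cancelled_at is None and now >= self.due_at
```

A periodic job could then act only on requests for which `is_due` returns true, which still leaves plenty of room inside the one month period.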

riking · March 28, 2018, 4:45am

This is important, and it would be reasonable to conclude that the rights of the other forum participants and readers to an accurate archive of conversations mean that the Anonymize feature is plenty sufficient. Personal information in the posts themselves should be dealt with via case-by-case review and manual editing, either by the moderators or by the user.

mpalmer · March 28, 2018, 4:46am

That is a rather poor assumption to make.

HAWK · March 28, 2018, 4:46am

That implies that revoking consent means all data has to be deleted and I’m not sure that’s the case.

That said, what I think we should avoid doing is debating the semantics here. I think that until we see this in action we’re shooting into the dark.

RGJ · March 28, 2018, 6:07am

Please note the differences between GDPR Article 15.3, “Right of access”, and GDPR Article 20, “Right to data portability”.

The JSON approach is an absolutely interesting approach (thank you for bringing it to our attention), but I have two concerns:

- Does it apply to Article 15 as well?
- One of the differences between Article 20 and Article 15 is that for Article 20 a subset of the data will suffice, whereas Article 15 requires all data to be made available. Right now there seem to be some fields (for instance my sign-up IP address, post reading times) that are not available to a user by means of a JSON API call.

For implementing the Right to Erasure it is indeed not required to provide a one-click button to users. The process (currently: “send a PM to admin”) should be clearly documented though. I also agree with you that deleting post content is not required, since the countervailing interest of the other users in keeping the discussion intact is larger.

However, I do see some impediments for GDPR compliance, and I think it’s a good idea to try to make a list. This is what I have right now:

- IP addresses are stored in too many places without legitimate interest
- GDPR does not require deletion of posts when a user is deleted, but Discourse does. You will have to anonymize instead, but that does not delete enough other data (see below)
- Anonymizing a user leaves @ mentions
- Anonymizing a user keeps (amongst others) IP addresses, reading times*, clicked links* in the database
- If JSON is a usable method to implement article 15, it does not provide enough data

*) need to double check
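To illustrate the @ mention item, here is a minimal sketch of rewriting leftover mentions of an anonymized user in post text. The function name, the replacement handle and the assumed username character set (letters, digits, `_`, `.`, `-`) are my own assumptions for illustration; this is not how Discourse itself implements anonymization.

```python
import re


def scrub_mentions(text: str, username: str, replacement: str) -> str:
    """Replace @username mentions with @replacement, case-insensitively.

    Assumption: usernames consist of letters, digits, '_', '.' and '-',
    so a mention only counts when it is not part of a longer handle or
    of an email address.
    """
    pattern = re.compile(
        r"(?<![\w.@-])@" + re.escape(username) + r"(?![\w-])(?!\.\w)",
        re.IGNORECASE,
    )
    # A function replacement avoids backslash handling in `replacement`.
    return pattern.sub(lambda _match: "@" + replacement, text)
```

Note that this only touches the current text of a post; older revisions, if they are stored, would need the same treatment.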

HAWK · March 28, 2018, 6:33am

This is currently being addressed.

fefrei · March 28, 2018, 8:00am

This is actually pretty difficult because Discourse doesn’t allow staff to truly delete posts or post revisions.

I think that keeping reading time and clicked links is actually fine, because they are no longer PII once any ties to an identity are cut. IP addresses being retained when an account is anonymized sounds pretty bad, though!

RGJ · March 28, 2018, 4:06pm

I can agree with that.

McBlu · March 28, 2018, 4:27pm

Here is a great YouTube webinar with three case studies on how three companies addressed GDPR compliance. The GDPR key facts and figures are really interesting, particularly for me the stats on consumers’ confidence, which appears to have impacted subscription and membership behavior. We also get a sense of the penalties for non-compliance and multi-national companies’ prioritization of GDPR compliance. For me, the silver lining in GDPR for companies is gaining members’/customers’ trust and being able to proudly and clearly state policies that the public can understand. The webinar goes into the three companies’ particular situations and what they did to comply in detail. In the beginning the presenter shows how unclear the three companies are about GDPR and how to comply by asking them to answer a few questions on what their understanding of GDPR is. The debate on the practical applications of GDPR seems to be widespread.

FYI I am interested in this from the gaining/retaining the public’s trust angle, am not in Europe, and unlike others in this conversation have no knowledge or expertise to contribute, except for interest in where this conversation ends up.

Here is a snapshot of the facts and figures page which is packed with interesting stats:

angus · March 30, 2018, 4:23am

For anyone reading this topic, it’s important to keep in mind that we’re talking about a major law reform that is not yet in force, has not yet been applied in practice by any authority and has not been tested in any court. It does build on previous laws, but it also introduces substantive changes.

It is also important to keep in mind that regulators are not going to be focused on your (relatively speaking) small community when they have to deal with companies like Facebook. This is not to say that you should not try to comply with the GDPR (you should!), but it’s important to keep in mind the hierarchy of concerns here.

Beyond Facebook there are a multitude of other companies that are of more interest to regulators, particularly advertisers whose business relies on third party data, before they get to your community, which is not built around selling, researching or otherwise processing data beyond what is required for the running of the community itself (assuming you’re using Discourse in a standard way).

That said, I also understand that changes like this are bound to cause anxiety, particularly for smaller operations who will struggle to afford a lawyer and don’t have the time to read and understand the seemingly complex detail in the GDPR, particularly if your business is based around Discourse, heightening your exposure to the issue (e.g. for @RGJ).

Consent

@KajMagnus raised the role of ‘consent’, so it’s worth dealing with (albeit, to point out that it probably doesn’t apply to data being processed in Discourse).

As has been pointed out, consent for data processing and the right to erasure are two different things.

If we were to look at consent as it applies to Discourse, there would be a few prior questions we would need to ask before we got to withdrawal, starting with: Is consent the basis on which data is being processed?

The other possible basis is the “legitimate interests pursued by the controller” (i.e. 6.1(f)). In fact, I think it’s much more likely that 6.1(f) is the basis on which most data is processed in Discourse as the user does not give explicit consent to the standard required in the GDPR for most “processing” that goes on in Discourse.

The exception here may be emails, but even if consent were the basis on which emails are being processed in Discourse (which is also open for debate), the withdrawal of consent for emails already exists (i.e. your email settings and the unsubscribe buttons).

Article 15

I would reiterate that the Right to Access, like the Right to Portability, is really an administrative matter rather than a technical one. If you were to get a request to access, you would not only have to provide the data, but all the other items listed in Article 15. Again, you (i.e. the Data Controller) will have up to one month to comply with the request.

I would also point out that the GDPR states that the reason the Right to Access exists is to allow the user to “…be aware of, and verify, the lawfulness of the processing” (Recital 63). This is where the hierarchy of concerns that I mentioned earlier is relevant. For a standard Discourse forum it is highly unlikely that any user would have concerns that their data was being processed illegally. The thrust of the regulation is focused on the digital advertising and marketing industries. Again, this is not to say that the right should be ignored, but the purpose and context matter in both the legal interpretation and how it will be enforced.

Given the tenor of the Art 29 Working Party’s guidelines on data portability, I think it’s likely that a JSON API will be considered just as legitimate as alternatives (e.g. CSV) with respect to all of the rights. I would note that both articles refer to “commonly used” electronic form or format. I would also note that the guidelines on data portability make statements like “commonly used open formats (e.g. XML, JSON, CSV,…)”. I see no reason to think that JSON would not be considered as “common” or less legitimate of a format than CSV for any of the rights.
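To show how little separates these formats in practice, here is a small sketch that serialises the same records to both JSON and CSV using only the Python standard library. The record fields (`post_id`, `created_at`, `raw`) are hypothetical and are not a statement of what Discourse actually exports.

```python
import csv
import io
import json
from typing import Dict, List


def export_json(records: List[Dict]) -> str:
    """Serialise the records as pretty-printed JSON with stable key order."""
    return json.dumps(records, indent=2, sort_keys=True)


def export_csv(records: List[Dict]) -> str:
    """Serialise the same records as CSV, one column per distinct key."""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()
```

Both functions carry the same information; JSON simply keeps nested structure and types a little more faithfully, which fits the emphasis on interoperability.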

Recital 63, which discusses the Right to Access in a more discursive form than Article 15, does contain this sentence:

Where possible, the controller should be able to provide remote access to a secure system which would provide the data subject with direct access to his or her personal data.

It’s important to note that this sentence does not read: “You should have a page where a user can download all their data in one csv zip file”. Having API access (including secure API access using user-tokens), seems to be a plausible implementation of this guideline.

None of this is to say that Discourse shouldn’t consider increasing the amount and types of data included in the download functionality on the user page. Facebook’s new features that allow you to download a copy of your data (which they seemingly launched in preparation for the GDPR) are an interesting point of comparison here (they give a list of what can be downloaded here). Rather, it does not seem that providing that specific functionality is required for GDPR compliance. Or even that it is considered better than providing API access to the same data.

Indeed, given that the GDPR seems quite keen on controllers and processors providing continuing and interoperable access to data, it seems, at this initial stage, that JSON API access is considered desirable.

Other

Which storage of IP addresses do you think are not legitimate?

I’m not sure what the concern is here.

I’m not sure what the concern is here either, as it applies to GDPR rights and responsibilities.

Again, I am not your lawyer and this is not legal advice.

tophee · March 30, 2018, 9:15pm

So “they” are the Data Protection Authorities, I suppose? But can’t any user bring me (the forum owner) to court if s/he thinks I am not respecting their rights?

angus · March 31, 2018, 1:14am

The GDPR provides for both individual claims for compensation (Article 82) and enforcement by regulatory authorities (Article 83). No doubt, there will be systems for individuals to make complaints to authorities to assist in or decide claims for compensation. As you’re probably aware, in Europe regulatory and judicial authorities tend to take a more proactive and involved role in the enforcement of law, as opposed to the more adversarial systems in common law countries (i.e. the UK and its former colonies). The level of involvement and the procedure by which claims are dealt depends on the country.

This is why I would re-emphasise that it’s important to consider who your relevant data protection authority is and to follow their guidance. If there is a claim for compensation under the GDPR, it is likely that they will be involved in some way, or that the guidance they publish will be relevant in any legal proceedings.

Nothing I laid out in my last post should be taken as saying “you should ignore the GDPR because you’re not Facebook”. As I said in my post prior to that, GDPR compliance involves more administrative preparation than technical fixes. If you read any of the guidance published by the DPAs you’ll see that they emphasise having appropriate procedures in place to deal with a request if you get one, having appropriate documentation and giving appropriate notices.

There may also be technical fixes that can be applied in certain circumstances. There may be some improvements that we could make to Discourse in the way it handles things like IP addresses. However, on my reading of the GDPR and my understanding of Discourse, I’m personally yet to see a situation in which I can clearly say there is an issue requiring a technical solution. One may well arise, or be pointed out, and we can address it then.

It’s important to keep this in perspective. Like running any business or organisation, being a forum provider can potentially involve a whole host of legal obligations that extend far beyond the GDPR, most of which you have probably never considered before. I bet if I looked closely at any of your forums I could find a number of potential legal issues (note: for various administrative reasons, I’m not currently in a position to provide this as a formal service, and I am not actually reviewing any of your forums for legal vulnerabilities). I’ve pointed out a few regarding the default Terms and Conditions, but that’s just scratching the surface.

I don’t say this to scare you, rather to point out that in your normal course of business you swim above an in-depth consideration of your strict legal obligations (which is normally just fine). On the whole it’s a good thing that the GDPR has made people think seriously about privacy. There are some good things to be said about the suite of rights the EU has devised to handle privacy in the internet age. However, for most people, trying to engage with the GDPR at the level of the regulation itself is risky as there are bound to be various ways in which you can misinterpret both what your obligations are and their scope.

testingsoftware · April 6, 2018, 9:44pm

I am pretty sure that users could make a request in writing that they want their data deleted. I don’t think there is a need to add buttons or tick boxes for this, but we would have to deal with the request, which could be made via message or email.

Another thing is that a user might want some of their posts deleted, as they might contain personal information that seemed a good idea to post at the time and that the user later regrets.

I tried to delete a post (without having to delete the whole account) and the post remains in the database.
I think this should be addressed and administrators should have the option to really delete a post.

KajMagnus · April 8, 2018, 2:15am

Yes I also think that should be enough — if you mean content data, like posts and stuff. For personal data, personally I think it makes sense & is simpler for the staff, to let people delete their own personal data via a button, I mean, anonymizing their own account.

Maybe it’d be good to make a distinction between data and data, and write “their content” (CC-By licensed) and “their personal data” instead, … otherwise when someone writes just “data” I’m never 100% certain what they mean :- ) (Content? Or personal data?)

a user might want some of the posts deleted as they might contain personal information

Yes, and … it needn’t even be the user him/herself who posted that personal info. Maybe a member contacts staff because another member posted someone’s personal data. Maybe the user who contacts the staff, to have [personal data in some post] deleted, is not even a member of the forum.

I tried to delete a post (without having to delete the whole account) and the post remains in the database.

Hmm wouldn’t it be enough to edit & remove the personal data from the post? I think since the post is CC-By licensed no one can force the staff to remove it … but, as far as I can tell, according to the CC-By license, one can withdraw one’s name from the CC-By post, so one isn’t associated with it any longer. So being able to edit the post and remove personal info about the author seems to me to be required by both CC-By (see the CC BY 4.0 Legal Code, section 3(a)(3)) and GDPR. … But what if @the_authors_full_name is present in older revisions of the post :- P

But if the post contains stuff that is illegal to even store on disk (e.g. because of copyright? or forbidden images?), then I suppose it’d be good to have a way to totally erase it. (But that’s not related to GDPR though?)

AstonJ · April 8, 2018, 4:58pm

I agree with this.

Allowing users to delete all of their posts can have a huge impact on the forum and the experience for everyone else, because Topics with posts missing can be difficult to read/follow, and thus much of the forum can be rendered useless by even a small handful of (rogue?) users.

The terms should be clear that users who submit content allow perpetual publishing rights. Forums are not social networks and users who don’t agree to this collective contribution and retention of content should not contribute anything to the forum.

ljpp · April 10, 2018, 7:23pm

@michaeld Any chance that you could share some information regarding your configuration that is GDPR compliant?

Introducing Discourse forum hosting in Europe
