Providing data for GDPR

Sorry I am late to this conversation, but this is a very important issue and it does not appear it reached resolution.

It strikes me that GDPR compliance does mean changes to Discourse - maybe this is all in hand but I have not found it!

Data Access Requests
Most obvious thing - work for @sam I guess! - is to provide a one click ‘Data Access Request’ that does not need admin intervention. As far as I understand it, the GDPR requires data controllers to provide all personal information on request. This is very much like the existing UK Data Protection Act and can be onerous if you do not build it into the systems.

What I was thinking of was a new section on your summary page (/u/richp10/summary) called something like ‘Your Data’. On that page, provide a brief explanation that the user can get a complete download of all the personal data kept on the site by clicking this button.

The app would then create a PDF of all the personal information and email it to them or display a link to download. Actually, the same mechanism as the ‘Download All’ button would be perfect. At a stroke, this solves compliance with that part of the legislation.

Right to be forgotten A
I think the above conversation reached the conclusion that posts are not - in general - personal information under the regulations. I agree with that assessment - whilst acknowledging that it is possible that posts exist which contain personal information.

My thought would be that on the ‘Your Data’ tab, we explain that shutting down the account removes all the personal data but leaves posts in place. Point out that if the user wants to shut the account they might want to remove some posts first.

I think it needs to be more obvious how to close down the account and more explicit what the implications are.

Right to be forgotten B
Although I think the above would comply with the law - I would like the option of going further.

My use of Discourse is with a vulnerable population, mostly with mental health problems. I would very much like to be able to allow people to entirely remove their content from the site, even though this has a cost in breaking up the conversations.

It would be great if we could offer two options: 1) shut down the account, removing personal data but leaving posts, or 2) shut down and deep clean, where everything must go. Ideally we could configure the site to allow just one or both of these options.

The privacy policy issue does not worry me, that is a question of wording.



Also worth pointing out - some people might think GDPR does not apply to you if you are not based in the EU - this is not the case - it applies if anyone uses your website who lives in the EU. This particular net is thrown very wide and an IP address is considered personal information under GDPR.

With the current online privacy fiasco at FB we really need to be on top of this - I think GDPR is setting the bar about right and most of us who run forums need to be able to confirm to stakeholders that our sites are GDPR compliant and we ‘get it’ that concerns about data privacy are legitimate.


What personal data are you referring to? I kinda feel like this might be overkill but I’m curious to hear more.

Aren’t those the two options that we currently have – delete or anonymise?


There is a clear definition. There is no overkill, this is required by GDPR.

Article 4:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

and article 15:

The controller shall provide a copy of the personal data undergoing processing


That is correct. There currently already is a ‘Download All’ button in your profile. Problem is, it doesn’t contain all personal information, it just contains the posts you have made.

The right to be forgotten would indeed be accomplished sufficiently with the delete and anonymize functions. But they can only be performed by an admin, if I’m correct.
On one hand you don’t want to encourage them, on the other hand it would help if this would at least be something that could be requested from within the user interface. Right now you have to know that you need to message an admin about this.


I meant overkill to download as a pdf not to access your personal data. And I was asking for clarification around which data was supposed to be in the pdf.

Yup but I think that’s important.


Under the Data Protection Act, on which GDPR seems to be based, it is considered best practice to actually provide a copy of the data. Saying that they can find this if they dig through various profile pages does not really cut the mustard.

This is the definition under the act.

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

So this would include every name, email address, date of birth, postal address, latitude/longitude location record and IP address. If you have recorded an IP address for every login, you need to provide every single one. The same goes for location data, or any changed addresses that you keep.

ALL such information that exists on the Discourse database needs to be provided to the user on request. I think it would be better to have a simple mechanism so they can do this themselves - and ideally a pdf emailed to them. Discourse should record that a data access request was made and the information sent - in case this is questioned later.

If anyone falls out with your forum - they might later claim that you breached GDPR regulations and it is a good idea to have data that proves you did as required.

The simplest and safest road is to provide everything - no quibble.
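
To make the idea concrete, here is a rough sketch in Python (not Discourse's own Ruby). It gathers every personal-data record held for a user and notes in an audit log that the request was served. All names here (`PROFILES`, `LOGIN_IPS`, `ACCESS_REQUEST_LOG`, `serve_access_request`) are invented for illustration and do not reflect the real Discourse schema.

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for database tables (NOT Discourse's real schema).
PROFILES = {1: {"name": "Alice Example", "email": "alice@example.org"}}
LOGIN_IPS = {1: ["203.0.113.5", "198.51.100.7"]}
ACCESS_REQUEST_LOG = []  # audit trail of served data access requests


def collect_personal_data(user_id):
    """Return (category, value) pairs for everything held about the user."""
    records = [(field, value) for field, value in PROFILES[user_id].items()]
    # Every recorded login IP is included, not only the most recent one.
    records += [("login_ip", ip) for ip in LOGIN_IPS.get(user_id, [])]
    return records


def serve_access_request(user_id, now=None):
    """Collect the user's data and record that the request was served."""
    now = now or datetime.now(timezone.utc)
    records = collect_personal_data(user_id)
    ACCESS_REQUEST_LOG.append(
        {
            "user_id": user_id,
            "served_at": now.isoformat(),
            "record_count": len(records),
        }
    )
    return records
```

The log entry is what you would point to if someone later questioned whether the request was honoured.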


No - I am proposing a delete option that removes everything. At the moment you cannot entirely delete an account which has posts (if over two months old I think), you can only anonymize it.


Nope, you can delete an account with posts older than 2 months; there is the site setting delete user max post age for that.


The GDPR is not based on the Data Protection Act. The GDPR is the ‘sequel’ to Directive 95/46/EC, and the DPA is an implementation of that same directive. That is why they are similar in some respects.

Under the GDPR, providing a copy of the data is required.

Although I personally think anything would be better than the current gzipped CSV file, it is more GDPR compliant than a PDF. The law requires (Article 20):

a structured, commonly used and machine-readable format

So a CSV would be better, since a PDF is less clearly structured and less machine-readable.
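
As an illustration of what ‘machine-readable’ buys you, the sketch below writes a small export as gzipped CSV and parses it straight back into the same records using only the Python standard library. The column names `category` and `value` are assumptions made up for this example, not the layout of the actual Discourse export.

```python
import csv
import gzip
import io

FIELDNAMES = ["category", "value"]  # hypothetical export columns


def write_export(rows):
    """Serialise a list of dicts as gzipped CSV bytes."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def read_export(blob):
    """Parse gzipped CSV bytes back into a list of dicts."""
    text = gzip.decompress(blob).decode("utf-8")
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]
```

Another program can recover the records exactly, which is much harder to guarantee with text scraped from a PDF.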


Ah - I didn’t know that - thanks…

Got it, I think you are right (though for most users a pdf might be easier!) Just adding the personal information would probably make it legally compliant…


Do we have clarity about what “the data” includes? I see two extreme possibilities:

  1. only the personal data (e.g. the name, IP address, etc.)
  2. all of the above plus all data somehow linked to these (e.g. how much time that person has spent reading each and every post on the forum).

I doubt that either of these extremes is the correct interpretation of “the data”, and I hope the “truth” doesn’t lie too close to #2.

BTW: To keep these discussions as focused as possible, I’d like to suggest that someone moves this latest topic digression about providing data into a new topic (and tags it #gdpr).


It’s #2.

‘personal data’ means any information relating to an identified or identifiable natural person


Entirely agree that for the purpose of a data access request it is #2 - everything must be provided. All posts, the lot. Good thing is that existing functionality provides most of this, we just need the personal information adding.

For the purposes of the right to be forgotten… Would you agree that:

If a user anonymizes their account but leaves posts in place - the remaining posts are no longer personal data because they are no longer ‘linked’ to identifiers. If so, it would appear that the ‘anonymize’ function is sufficient to comply with the right to be forgotten - as long as it also removes IP addresses, email addresses etc in the background.

There remains a risk that posts contain personal identifiers. My thinking is to offer users (not via admin) both options - anonymize and accept that posts remain (so we have consent in the event some identifiers remain) or complete deletion.
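
A minimal sketch of the two options, written in Python with invented names (`Account`, `Post`, `close_account`); the real Discourse anonymize and delete functions may well behave differently. In ‘anonymize’ mode the identifiers are scrubbed and posts are unlinked from the account but kept; in ‘delete’ mode the user's posts are removed as well.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Post:
    author_id: Optional[int]
    author_name: str
    body: str


@dataclass
class Account:
    user_id: int
    username: str
    email: Optional[str]
    ip_addresses: List[str] = field(default_factory=list)


def close_account(account, posts, mode):
    """Close an account and return the posts that remain on the site.

    mode="anonymize": unlink the user's posts but keep their text.
    mode="delete":    remove the user's posts entirely.
    """
    if mode not in ("anonymize", "delete"):
        raise ValueError(f"unknown mode: {mode}")

    if mode == "delete":
        remaining = [p for p in posts if p.author_id != account.user_id]
    else:
        for p in posts:
            if p.author_id == account.user_id:
                p.author_id = None
                p.author_name = "anonymous"
        remaining = list(posts)

    # In both modes the account-level identifiers are scrubbed.
    account.username = "anonymous"
    account.email = None
    account.ip_addresses = []
    return remaining
```

Note that ‘anonymize’ leaves each post body untouched, so any identifiers typed into the text itself survive. That is exactly the residual risk mentioned above.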


Isn’t that a risk also before the account got deleted? Personal data in comments & topics is maybe rather bad, most of the time? Because in general the staff doesn’t know if the one identified in the text, is okay with that. And if someone types his/her own name and details (which, intuitively, one should be allowed to do?) — then, in general, the staff still wouldn’t know if s/he is really the one s/he claims to be. Maybe s/he is an impostor, and the real person doesn’t want any of his/her name & info there.

Maybe personal data in comments & posts should in general be deleted immediately (I mean, when the staff or core members see it) and the author be sent a warning, rather than waiting until the relevant account gets deleted. For example, Reddit has a policy against posting any PII; one can get banned quickly by posting PII. (Public figures, like politicians & celebrities = exceptions)

If someone wants to tell the world who s/he is — then s/he can use his/her profile bio text, for that. And if later on s/he deletes the account, then the bio disappears, all is fine.

Maybe enabling-deletion-of-all-one’s-old-posts could be a forum wide option that could be turned on by admins? I’m thinking both alternatives make sense: some forums, with sensitive data (e.g. health issues) might want to make their users feel extra safe & respected, by enabling the “delete all my old comments” button. Whilst the default, for “normal” forums, could be to disallow that (to avoid “destroying” old discussions).

This is a requirement in some form in Europe with real world consequences if you don’t comply. Having this might also be a selling point in this current anti-Facebook backlash. People seem to be turned off by the sharing of their personal info and knowing they have control over deleting personal information could be a plus.


So on Discourse, that would mean all the read times for all posts etc? If so, I suppose it would suffice to provide the post ID, right? Or does the post itself become part of my personal data because I have read it for X seconds and Discourse is saving that information in its database?

No, reading a post will not make it part of your personal data.

Since a post ID is not visible to a user, the topic ID and post number would be better.
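
Something like the following sketch, in Python with made-up data, shows the translation: read-time rows keyed by an internal post ID are exported with the topic ID and post number instead. The table shapes and the name `export_read_times` are assumptions, not how Discourse stores this.

```python
# Hypothetical lookup: internal post ID -> (topic ID, post number in topic).
POSTS = {9001: (42, 1), 9002: (42, 2)}

# Hypothetical read-time rows: (internal post ID, seconds spent reading).
READ_TIMES = [(9001, 12), (9002, 30)]


def export_read_times(read_times, posts):
    """Replace internal post IDs with identifiers a user can recognise."""
    rows = []
    for post_id, seconds in read_times:
        topic_id, post_number = posts[post_id]
        rows.append(
            {
                "topic_id": topic_id,
                "post_number": post_number,
                "seconds_read": seconds,
            }
        )
    return rows
```

The export then refers to posts the way a user sees them on the site, without including the text of posts they merely read.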


Here is a good starter article for people interested in this topic: