Providing data for GDPR

This is actually pretty difficult because Discourse doesn’t allow staff to truly delete posts or post revisions.

I think that keeping reading time and clicked links is actually fine, because they are no longer PII once any ties to an identity are cut. IP addresses being retained when an account is anonymized sounds pretty bad, though!


I can agree with that.

Here is a great YouTube webinar with three case studies on how companies addressed GDPR compliance. The GDPR key facts and figures are really interesting, particularly, for me, the stats on consumers’ confidence, which appears to have affected subscription and membership behavior. We also get a sense of the penalties for non-compliance and of multinational companies’ prioritization of GDPR compliance. For me, the silver lining in GDPR for companies is gaining members’ and customers’ trust and being able to proudly and clearly state policies that the public can understand. The webinar goes into detail on the three companies’ particular situations and what each did to comply. At the beginning, the presenter shows how unclear the three companies are about GDPR and how to comply by asking them a few questions about their understanding of it. The debate on the practical applications of GDPR seems to be widespread.

FYI I am interested in this from the gaining/retaining the public’s trust angle, am not in Europe, and unlike others in this conversation have no knowledge or expertise to contribute, except for interest in where this conversation ends up.

Here is a snapshot of the facts and figures page which is packed with interesting stats:

For anyone reading this topic, it’s important to keep in mind that we’re talking about a major law reform that is not yet in force, has not yet been applied in practice by any authority and not been tested in any court. It does build on previous laws, but it also introduces substantive changes.

It is also important to keep in mind that regulators are not going to be focused on your (relatively speaking) small community when they have to deal with companies like Facebook. This is not to say that you should not try to comply with the GDPR (you should!), but it’s important to keep in mind the hierarchy of concerns here.

Beyond Facebook, there are a multitude of other companies that are of more interest to regulators, particularly advertisers whose business relies on third-party data, before they get to your community: a community which is not built around selling, researching or otherwise processing data beyond what is required for the running of the community itself (assuming you’re using Discourse in a standard way).

That said, I also understand that changes like this are bound to cause anxiety, particularly for smaller operations who will struggle to afford a lawyer and don’t have the time to read and understand the seemingly complex detail in the GDPR, particularly if your business is based around Discourse, heightening your exposure to the issue (e.g. for @RGJ).


@KajMagnus raised the role of ‘consent’, so it’s worth dealing with (if only to point out that it probably doesn’t apply to data being processed in Discourse).

As has been pointed out, consent for data processing and the right to erasure are two different things.

If we were to look at consent as it applies to Discourse, there would be a few prior questions we would need to ask before we got to withdrawal, starting with: Is consent the basis on which data is being processed?

The other possible basis is the “legitimate interests pursued by the controller” (i.e. 6.1(f)). In fact, I think it’s much more likely that 6.1(f) is the basis on which most data is processed in Discourse as the user does not give explicit consent to the standard required in the GDPR for most “processing” that goes on in Discourse.

The exception here may be emails, but even if consent were the basis on which emails are being processed in Discourse (which is also open for debate), the withdrawal of consent for emails already exists (i.e. your email settings and the unsubscribe buttons).

Article 15

I would reiterate that the Right to Access, like the Right to Portability, is really an administrative matter rather than a technical one. If you were to get a request to access, you would not only have to provide the data, but all the other items listed in Article 15. Again, you (i.e. the Data Controller) will have up to one month to comply with the request.

I would also point out that the GDPR states that the reason the Right to Access exists is to allow the user to “…be aware of, and verify, the lawfulness of the processing” (Recital 63). This is where the hierarchy of concerns that I mentioned earlier is relevant. For a standard Discourse forum it is highly unlikely that any user would have concerns that their data was being processed illegally. The thrust of the regulation is focused on the digital advertising and marketing industries. Again, this is not to say that the right should be ignored, but the purpose and context matter in both the legal interpretation and how it will be enforced.

Given the tenor of the Art 29 Working Party’s guidelines on data portability, I think it’s likely that a JSON API will be considered just as legitimate as alternatives (e.g. CSV) with respect to all of the rights. I would note that both articles refer to “commonly used” electronic form or format. I would also note that the guidelines on data portability make statements like “commonly used open formats (e.g. XML, JSON, CSV,…)”. I see no reason to think that JSON would not be considered as “common” or less legitimate of a format than CSV for any of the rights.

Recital 63, which discusses the Right to Access in a more discursive form than Article 15, does contain this sentence:

Where possible, the controller should be able to provide remote access to a secure system which would provide the data subject with direct access to his or her personal data.

It’s important to note that this sentence does not read: “You should have a page where a user can download all their data in one csv zip file”. Having API access (including secure API access using user-tokens), seems to be a plausible implementation of this guideline.

None of this is to say that Discourse shouldn’t consider increasing the amount and types of data included in the download functionality on the user page. Facebook’s new features that allow you to download a copy of your data (which they seemingly launched in preparation for the GDPR) are an interesting point of comparison here (they give a list of what can be downloaded here). Rather, it does not seem that providing that specific functionality is required for GDPR compliance. Or even that it is considered better than providing API access to the same data.

Indeed, given that the GDPR seems quite keen on controllers and processors providing continuing and interoperable access to data, it seems, at this initial stage, that JSON API access is considered desirable.
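
For anyone weighing up the formats, here is a minimal sketch of the same export rendered both ways. The record layout (`username`, `email`, `posts`) and the function names are made up for illustration; they are not Discourse’s actual schema or API.

```python
import csv
import io
import json

# Hypothetical export record; field names are assumptions, not Discourse's schema.
user_export = {
    "username": "example_user",
    "email": "user@example.com",
    "posts": [
        {"id": 1, "created_at": "2018-05-01T10:00:00Z", "raw": "Hello"},
        {"id": 2, "created_at": "2018-05-02T11:30:00Z", "raw": "A second post"},
    ],
}


def to_json(export: dict) -> str:
    """Serialize the whole nested record as one JSON document."""
    return json.dumps(export, indent=2, ensure_ascii=False)


def posts_to_csv(export: dict) -> str:
    """Serialize only the flat list of posts as CSV (one table per record type)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "created_at", "raw"])
    writer.writeheader()
    writer.writerows(export["posts"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(user_export))
    print(posts_to_csv(user_export))
```

The JSON version carries the account fields and the posts in a single structured document, while CSV needs a separate flat table for each kind of record. Both are “commonly used open formats” in the sense quoted above, so this is a difference of convenience rather than of legitimacy.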


Which storage of IP addresses do you think is not legitimate?

I’m not sure what the concern is here.

I’m not sure what the concern is here either, as it applies to GDPR rights and responsibilities.

Again, I am not your lawyer and this is not legal advice.


So “they” are the Data Protection Authorities, I suppose? But can’t any user bring me (the forum owner) to court if s/he thinks I am not respecting their rights?

The GDPR provides for both individual claims for compensation (Article 82) and enforcement by regulatory authorities (Article 83). No doubt, there will be systems for individuals to make complaints to authorities to assist in or decide claims for compensation. As you’re probably aware, in Europe regulatory and judicial authorities tend to take a more proactive and involved role in the enforcement of law, as opposed to the more adversarial systems in common law countries (i.e. the UK and its former colonies). The level of involvement and the procedure by which claims are dealt with depend on the country.

This is why I would re-emphasise that it’s important to consider who your relevant data protection authority is and to follow their guidance. If there is a claim for compensation under the GDPR, it is likely that they will be involved in some way, or that the guidance they publish will be relevant in any legal proceedings.

Nothing I laid out in my last post should be taken as saying “you should ignore the GDPR because you’re not Facebook”. As I said in my post prior to that, GDPR compliance involves more administrative preparation than technical fixes. If you read any of the guidance published by the DPAs you’ll see that they emphasise having appropriate procedures in place to deal with a request if you get one, having appropriate documentation and giving appropriate notices.

There may also be technical fixes that can be applied in certain circumstances. There may be some improvements that we could make to Discourse in the way it handles things like IP addresses. However, on my reading of the GDPR and my understanding of Discourse, I’m personally yet to see a situation in which I can clearly say there is an issue requiring a technical solution. One may well arise, or be pointed out, and we can address it then.

It’s important to keep this in perspective. Like running any business or organisation, being a forum provider can potentially involve a whole host of legal obligations that extend far beyond the GDPR. Most of which you have probably never considered before. I bet if I looked closely at any of your forums I could find a number of potential legal issues (note: for various administrative reasons, I’m not currently in a position to provide this as a formal service, and I am not actually reviewing any of your forums for legal vulnerabilities). I’ve pointed out a few regarding the default Terms and Conditions, but that’s just scratching the surface.

I don’t say this to scare you, rather to point out that in your normal course of business you swim above an in-depth consideration of your strict legal obligations (which is normally just fine). On the whole it’s a good thing that the GDPR has made people think seriously about privacy. There are some good things to be said about the suite of rights the EU has devised to handle privacy in the internet age. However, for most people, trying to engage with the GDPR at the level of the regulation itself is risky as there are bound to be various ways in which you can misinterpret both what your obligations are and their scope.


I am pretty sure that users could make a request in writing that they want their data deleted. I don’t think there is a need to add buttons or tick boxes for this, but we would have to deal with the request, which could be made via message or email.

Another thing is that a user might want some of the posts deleted as they might contain personal information that seemed like a good idea to post at the time but that they later regret.

I tried to delete a post (without having to delete the whole account) and the post remains in the database.
I think this should be addressed and administrators should have the option to really delete a post.

Yes I also think that should be enough — if you mean content data, like posts and stuff. For personal data, personally I think it makes sense & is simpler for the staff, to let people delete their own personal data via a button, I mean, anonymizing their own account.

Maybe it’d be good to make a distinction between data and data, and write “their content” (CC-By licensed) and “their personal data” instead, … otherwise when someone writes just “data” I’m never 100% certain what they mean :- ) (Content? Or personal data?)

a user might want some of the posts deleted as they might contain personal information

Yes, and … it needn’t even be the user him/herself who posted that personal info. Maybe a member contacts staff, because another member posted someone’s personal data. Maybe the user who contacts the staff, to have [personal data in some post] deleted, is not even a member of the forum.

I tried to delete a post (without having to delete the whole account) and the post remains in the database.

Hmm, wouldn’t it be enough to edit & remove the personal data from the post? I think since the post is CC-By licensed no one can force the staff to remove it … but, as far as I can tell, according to the CC-By license, one can withdraw one’s name from the CC-By post, so one isn’t associated with it any longer. So being able to edit the post and remove personal info about the author seems to me to be required by both CC-By (see Creative Commons Attribution 4.0 International (CC BY 4.0), section 3(a)(3)) and GDPR. … But what if @the_authors_full_name is present in older revisions of the post :- P

But if the post contains stuff that is illegal to even store on disk (e.g. because of copyright? or forbidden images?), then I suppose it’d be good to have a way to totally erase it. (But that’s not related to GDPR though?)

I agree with this.

Allowing users to delete all of their posts can have a huge impact on the forum and the experience for everyone else, because topics with posts missing can be difficult to read or follow; much of the forum can be rendered useless by even a small handful of rogue users.

The terms should be clear that users who submit content allow perpetual publishing rights. Forums are not social networks and users who don’t agree to this collective contribution and retention of content should not contribute anything to the forum.


@michaeld Any chance that you could share some information regarding your configuration that is GDPR compliant?

Introducing Discourse forum hosting in Europe

Sure. Although GDPR is mostly about processes and not that much about configuration.

Of course we have made sure that we have all the right things in place: patch management, security best practices, and an ISO 27001 data center provider (Frankfurt, Germany) with a data processing agreement between us and them. On top of that we will* run nginx (or more specifically, openresty) configured to remove the last octet from all IP addresses, and a Discourse with a patched rate limiter (using a plugin) so it can deal with the missing octet.

Backups and email use European data centers too (for European customers)

(*) I’m saying we “will” run that because we’re currently still ironing out the last details in that plugin.
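
As a rough illustration of the truncation idea only (this is not the nginx/openresty configuration or the plugin mentioned above, neither of which is shown in this thread), here is a minimal Python sketch that zeroes the last octet of an IPv4 address. The function name `truncate_ip` is hypothetical, and the IPv6 handling (keeping only the first 48 bits) is purely an assumption, since the post only talks about octets.

```python
import ipaddress


def truncate_ip(raw: str) -> str:
    """Return the address with its trailing bits zeroed.

    IPv4: the last octet is zeroed (a /24 mask), as described in the post.
    IPv6: only the first 48 bits are kept; this cut-off is an assumption.
    """
    addr = ipaddress.ip_address(raw)
    prefix = 24 if addr.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{raw}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(truncate_ip("203.0.113.77"))
    print(truncate_ip("203.0.113.200"))
```

Both example addresses collapse to the same truncated value, which hints at why a rate limiter keyed on IP addresses needs adjusting once the last octet is gone: up to 256 distinct clients now share one key.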

Does this answer your question?


Unfortunately, the regulators are not the ones to worry about. They are chronically understaffed, and a GDPR specialized lawyer has told me that the relevant agencies have only received minimal budget increases to deal with the new beast. The real threat stems from EU located individuals – either acting on their own, or as proxies for organizations and lawyers – who want to harm your business or community, for whatever reason.

People living in the German-speaking part of Europe are aware of the notorious “Abmahnanwälte”. These are typically individual lawyers or legal practices that specialize entirely in suing the operators of websites that are not compliant with various regulations. They will often go after small to midsize companies, which don’t have the expertise or resources to fight long, drawn-out legal disputes, in the hope that they will just give in and settle out of court, or accept a fine. A court ruling in the EU can be enforced in countries outside the EU, provided the country in question has a functional legal system.

We must not forget that a discussion forum can potentially have an important influence on broader public opinion, media and even policy. I am providing service to a quite vocal patient organization (on a purely nonprofit basis). A company with very deep pockets is not at all happy about their existence, and would be glad to see them gone. Even though I am not in panic mode, I am worried about GDPR being exploited for solving such conflicts of interest. In my case, I find it crucial to have as few flanks open as possible, as to not invite potential attacks.


Thanks, this is interesting.

I should reiterate up top that I fully stand by everything I said in my previous posts. Following the guidance of your Data Protection Authority is still the first (and normally last) port of call. What we’re discussing here is what to do in a (theoretical at this stage) edge case.

Yes, this is a fair point. Litigation is used like this in common law countries as well. This aspect of the discussion about the GDPR has been nagging me, as it does seem to introduce a private right of action (albeit, how that can and will be used is yet to be seen).

The typical way smaller entities deal with legal threats from bigger entities is by pooling resources. The point of abusive litigation tactics is to divide and conquer. Even if one community were to hire a lawyer now and get some initial advice, in the event of this kind of suit, it may not be enough.

One thing that occurred to me yesterday was whether it would be possible for small, community focused, data controllers and processors (i.e. Discourse communities) to join forces with the already existing community efforts to pool resources for GDPR enforcement against larger entities, in particular I had this organisation and its crowdfunding campaign in mind.

These guys seem to have a fair bit of support: Our Team, Members and Partners

It may seem a bit strange at first, but I think there are some shared cultural touchstones (e.g. support of open source, tech community culture, support for individuals and small entities vs big entities etc.) that could make projects like this a natural ally.

Even if it didn’t result in specific advice, there would be benefit in culturally aligning with this side of the privacy discourse in the EU.

Does anyone know Max Schrems…?

@erlend_sh I understand that Discourse itself may not want to get involved in this kind of thing, but I’d be interested in your thoughts on this specific point of the GDPR discussion (i.e. the pooling of resources and cultural alignment with the ‘privacy’ side of the tech community in the EU as a strategic step).


We’re certainly interested in such efforts, but at this point we’ve still got our hands full getting our own GDPR policies in place. I feel like there will be more of substance to talk about when we’ve lived with GDPR in practice for a little while.


Hi Everyone,

I think the most important thing to do about the GDPR is to let users download everything our Discourse websites hold about them, and also to let users delete everything if they want. At least that’s what this law asks for.

Someone asked why we should do that if the TOS says that everything a user publishes becomes the forum’s property. That’s exactly the point: the GDPR does not let companies own users’ information, even if the users agree.

Even this page becomes “illegal” as of May 25, 2018, because I’m from Europe and it doesn’t let me download all the data Discourse stores about me and my account (just an example). Also, there are no options to remove all my data without deleting my account.

That is not completely correct. It’s not about property or ownership, it’s about the right to request deletion. As I have pointed out before, article 17.3 of the GDPR provides for an exception where processing is necessary for “exercising the right of freedom of expression and information”.

That is not a requirement either.

There is no automatic mechanism, but maybe you can ask and they will process your request manually.


I really disagree with this quite a lot.

As a forum admin you can search for @bobthedeleted and just edit the posts and hide revisions if you must. Doing this automatically is very wrongheaded and full of edge cases.

What about posts that said:

I agree with what Bob the deleted said.


I agree with what Bob said.


Bob The Deleted was wrong


@bobthedeleted is a great username to use.

And so on and so on, I can list edge cases here all day.

After anonymization we can maybe queue a rebake on posts with mentions so they turn from a linked @sam mention into plain-text @sam, but this can be done today anyway. I don’t see why we are responsible for some magical, impossible-to-build-right feature here.
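
To make the edge cases concrete, here is a minimal sketch of what a purely mechanical mention rewrite can and cannot catch. It is not how Discourse handles mentions; the function `replace_mention`, the replacement name `anon1234`, and the assumed username characters (letters, digits, `_`, `-`, `.`) are all illustrative assumptions.

```python
import re


def replace_mention(text: str, old: str, new: str) -> str:
    """Rewrite exact @old mentions to @new; leave everything else untouched."""
    pattern = re.compile(
        r"(?<![\w@])@"            # '@' not glued to a preceding word (e.g. an email)
        + re.escape(old)
        + r"(?![\w-]|\.\w)",      # not followed by more username characters
        re.IGNORECASE,
    )
    # A function replacement avoids backslash handling in the new name.
    return pattern.sub(lambda match: "@" + new, text)


posts = [
    "I agree with what Bob the deleted said.",
    "I agree with what Bob said.",
    "Bob The Deleted was wrong",
    "@bobthedeleted is a great username to use.",
    "cc @bobthedeleted2 and @bobthedeleted.",
]

if __name__ == "__main__":
    for post in posts:
        print(replace_mention(post, "bobthedeleted", "anon1234"))
```

The first three posts come through unchanged because the name appears as ordinary prose, so the personal reference survives. The fourth is rewritten even though it talks about the username itself, which arguably changes its meaning. Only the exact mention in the last post is an uncontroversial hit, and even there the similar username `@bobthedeleted2` has to be carefully excluded.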


That anonymizing a user leaves @ mentions intact is not an opinion but a fact; how can you disagree?

I totally understand there are lots of edge cases, and I also understand that this is a pretty hard thing to do. But I wasn’t saying that you are “responsible” nor that you should fix it. I was merely stating that this is something where the user anonymization feature is not perfect.

Although I’m now getting confused whether you guys are working on this or not…


Hi Guys,

Discourse still has to let users download everything it has about them.

We already do that :)