Providing data for GDPR

Well, you’re not going to get a multi-million euro fine.
You’ll most likely get a warning and a timeframe to fix the infraction (assuming your infraction was in good faith).

And the rules are simple:

  • don’t store data you don’t have explicit permission to store
  • allow people to request a copy of the data you hold about them
  • provide a means for people to delete their data from your system

But yes, if you did not take this into account when you set up your system (and to be fair, at the time you didn’t have a legal reason to), it’ll hurt for a bit.


Unfortunately, this is one of the most ambiguously defined parts of the legislation.

Can you tell me all the places in the Discourse codebase where PII is stored, for how long, and in what format, and how this is treated by the GDPR, and what I would do to resolve it, and whether it counts as a “legitimate interest” or whether I need to prompt the user for consent?

If there was a single authoritative page on explaining this I’d feel a lot more confident, but there isn’t. There are a number of answers on meta, many of which are mutually exclusive.

It’s a mess, and I should point out that this is not the fault of the Discourse team at all - the problem lies squarely with the lawmakers IMO.


No; this really is on the Discourse team to provide. GDPR has been known to be on its way for two years now; this cannot have been a surprise.

That depends on the data; but again, for the default setup, this should be provided by Discourse.
There are guidelines that can help, but yes, every time you store someone’s data, you need to think to yourself why you’re doing it. And “it might be useful” ceased to be a valid reason.

I fully disagree. The law is 20 years or so overdue, you can blame the lawmakers for that. Apart from that, it’s on us to apply some reflection. We (online businesses) have been gobbling up data for decades and considered it to be our given right to do so, that has to change.

One more thing, this is a European law, in general (I’m not a lawyer though) they tend to be less trigger-happy with fines than their American cousins. I would be highly surprised if any forum-owner is going to be hit by a fine without prior notice.


To be fair to all of us who operate ethically, we have only stored data that the users have provided to us.

I’m not sure there’s anything wrong with that, provided we do not use this data for nefarious purposes. If we do use the data for nefarious purposes, I believe existing laws would give users the power to challenge that.

I think I prefer a stronger, more preventative law :slight_smile:

But enough of this, think we’ve sidetracked enough. Noticed your announcement on HN and followed you on Twitter; to be frank, I think our policy disagreement is a bit more fundamental than GDPR&Discourse :smiley:
Happy to discuss more, but let’s take it out of this topic

In that case, you’ll find compliance a lot easier than others :wink:

But I’ve only seen one instance so far where “we have only stored data that the users have provided to us”. In other words you don’t do any observing (eg. web analytics, so-called socialCRM) or inferencing (eg, personalisation).

Don’t forget that there is a third option: performance of a contract or service, described in article 6.1.b of GDPR. There are a number of restrictions there but you could argue that storing the e-mail address of a user is necessary for being able to deliver the forum service to them.

If you would be selling those e-mail addresses, or collecting physical address information (or storing their IP addresses for years), that would require consent, since you don’t need those in order to run your forum.


Here is a short video providing a quick overview of GDPR (aimed at consumers)


The more I read about GDPR, the more I am confused.

Sorry if this was answered before but…

Should forum owners be concerned by this?
If yes, is Discourse GDPR ready?

I am European, my business and servers are based in the UK. So I am kind of worried…. :expressionless:

Yes. Guaranteed that you are concerned if your forum is a commercial one (ads and whatnot), debatable if it’s a small non-commercial forum (but by debatable I mean I called a lawyer in my country and they told me that yes, I should try and be compliant).

Not exactly. There are a few main challenges left (and if those are taken care of, Discourse would be in a pretty good spot):

  • Agreement to your terms of service has to be explicit, not implicit, i.e. via a checkbox. You also need to store a record of when the user agreed to it. This is important because, for instance, sending someone emails without their consent can have quite severe repercussions (here’s an example from the UK). Also: one checkbox per type of personal data. You can’t have a generic “I agree to the terms of service” that points at a 50-page document.
  • If your ToS changes, everyone should be asked to reconfirm that they still agree.
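
To make the two points above concrete, here is a minimal sketch of what "storing when the user agreed" and "asking again after a ToS change" could look like. Everything in it is an assumption for illustration: the `ConsentRecord` type, the `purpose` labels and the version string are hypothetical and are not part of Discourse.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical label for the terms of service currently in force.
CURRENT_TOS_VERSION = "2018-05-25"


@dataclass(frozen=True)
class ConsentRecord:
    user_id: int
    purpose: str          # one record per purpose, e.g. "tos" or "newsletter"
    tos_version: str      # which version of the terms the user saw
    agreed_at: datetime   # when the checkbox was ticked


def record_consent(user_id, purpose, tos_version=CURRENT_TOS_VERSION, now=None):
    """Create a record of an explicit agreement (the ticked checkbox)."""
    return ConsentRecord(user_id, purpose, tos_version, now or datetime.now(timezone.utc))


def needs_reconfirmation(records, user_id, purpose, current_version=CURRENT_TOS_VERSION):
    """True if the user has not agreed to this purpose under the current ToS version."""
    return not any(
        r.user_id == user_id and r.purpose == purpose and r.tos_version == current_version
        for r in records
    )
```

Keeping one record per purpose, rather than a single "agreed" flag, is what lets you answer "did this user agree to emails?" separately from "did this user agree to the ToS?".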

You are at least partially covered with other points - Discourse provides means to be forgotten (although you still need to ensure that you won’t store this data forever in backups), there are ways of exporting personal data too.

Of course, you also need to write your own GDPR document: what you store, how important it is, and how you secure it. (GDPR does not actually lay down many official guidelines; you decide for yourself how to achieve a sufficient level of security for your application, and only in case of a failure do you need to prove that this was reasonable.) You can have a lawyer help you write one, or you can do it yourself. For example: all data is on a server hosted in Europe (good if you have a data processing agreement with your ISP), backups are encrypted and kept in an Amazon S3 bucket (located in Frankfurt, for instance), you use full-drive encryption, and you cleanse logs every 30 days (to get rid of old IP addresses and whatnot, since they too fall under GDPR). Then you write that you store email addresses (explicitly), IP addresses (implicitly, but only for logging/security reasons for X days), names/surnames (explicitly) and, if you use marketing services, which ones and what kind of data about your users goes to them. This should be sufficient for a smaller forum.
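
As an illustration of the "cleanse logs every 30 days" part, here is a rough sketch of a cleanup job. It assumes the logs are plain files in one directory and that file modification time is a good enough proxy for age; the function name and the directory are hypothetical, and a real setup may be better served by its existing log rotation tooling.

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days=30, now=None):
    """Delete regular files in log_dir last modified more than max_age_days ago.

    Returns the names of the deleted files, sorted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    deleted = []
    for path in Path(log_dir).iterdir():
        # Only touch plain files; subdirectories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

Run daily (from cron, say) against a hypothetical directory such as `/var/log/myforum`, this keeps the retention period you wrote down in your document and the retention you actually practice in step.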


If your forum is connected to a large-ish business (e.g. support forums for a product) then yes, you probably should be worried and may want to seek legal advice.

If you are running more of a stand-alone community then I probably wouldn’t be as worried… for now. (Unless you are doing something silly like sending mailers/spam that are not part of what’s expected of the forum.)


I considered this when setting up some servers recently. The obvious way to do it requires a human to type the password every time the machine reboots. Either I’m missing some obvious solution or this simply isn’t feasible.

Aha! Ask @mpalmer as he is an expert on this :slight_smile:

There are many, many things I’d like to ask @mpalmer! Don’t tell him I asked about this, though, because I’m really interested in his answer on this.

EDIT: Just great. He’s already noticed this message! And I really wanted his CPU cycles for my show-stopper problem. :wink:

Too late! The power of @-mentions compels me!

Yes, the default way of doing full-disk encryption means you’re entering a passphrase at the console on every reboot. For servers, that’s somewhere between annoying and a showstopper, depending on factors like whether you have access to the console during boot (via IPMI or in a VM), what your uptime requirements are, how often unexpected reboots happen, and the probable delay between someone noticing the unexpected reboot has happened (*cough* monitoring *cough*) and being able to enter the passphrase.

The “access to the console” problem can be ameliorated by installing the dropbear-initramfs package; it provides a minimal SSH server available before the system is booted, so you can SSH in and enter the unlock passphrase. It doesn’t solve any of the other problems, though. The other way around this problem is to not encrypt the root filesystem, but only the data filesystems; that requires you to partition appropriately, and script the unlock/mount sequence yourself. It is, however, a viable alternative if you’re not running a distro with dropbear-initramfs or equivalent available.
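
For the "encrypt only the data filesystems and script the unlock/mount sequence yourself" option, a sketch of such a script might look like the following. It assumes the data volume is a LUKS volume handled by `cryptsetup`, which the post does not specify; the device path, mapper name and mount point in the usage note are hypothetical.

```python
import subprocess


def unlock_and_mount_commands(device, mapper_name, mount_point):
    """Return the two commands that unlock a LUKS volume and mount it."""
    return [
        # cryptsetup prompts for the passphrase on the terminal.
        ["cryptsetup", "open", device, mapper_name],
        ["mount", f"/dev/mapper/{mapper_name}", mount_point],
    ]


def unlock_and_mount(device, mapper_name, mount_point, dry_run=True):
    """Print the commands (dry run) or execute them in order as root."""
    commands = unlock_and_mount_commands(device, mapper_name, mount_point)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failure, so nothing gets
            # mounted if the unlock step did not succeed.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `unlock_and_mount("/dev/sdb1", "data", "/srv/data")` only prints the two commands; pass `dry_run=False` after logging in over SSH to actually run them. Services that depend on the data filesystem still have to be started afterwards.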

If you’re running a larger cluster, you can avoid the downtime problems of delayed unlock with proper replication and high availability. Basically, if your setup could withstand a machine just up and dying (not a crash-and-needs-a-reboot, but “catches fire”-grade failure) without impacting availability, then it taking a few hours for someone to log in and enter a passphrase isn’t going to be a problem either.

Another way to avoid the passphrase problem in a larger cluster is to run something like Mandos, where machines provide each other the unlock passphrase at boot. It’s not as secure as manual entry, but it guards against all but the most determined of deliberate attacks (of the kind FDE would otherwise guard against, i.e. offline attacks).

Finally, the gold standard holy grail of FDE is to use the TPM (if the machine has one) to store the unlock passphrase. In order to do that properly, though, you need to fully enable secure boot (with all its attendant complexity and risks), because it’s a bit pointless to use the TPM to store the passphrase if an attacker could just ask it nicely to cough up the creds.

Security is hard. Let’s go debugging!


A+++ wall of text, deployed with surgical precision, would read again!


I’ve already read it twice. And though I’ve started using HAProxy, I’m afraid that the Literate Computing Server Farm is not (yet?) one where High Availability is an option. When a machine catches fire, I’ll need to wake up, take notice, and manually restore backups on another machine. Most of what these machines will be doing will be one-time tasks anyway.

How do we do this? The “Download All” button on the profile page does not provide enough data since it only provides topics and not any other personal data.

Hey there. You can download the rest yourself from your database to provide to the user. It’s not intended to be a one-click self-serve situation (GDPR doesn’t require that).

If you are the user, you’ll need to contact an admin on the site in question.

Does that clarify?


Yep, that clarifies it.
I know GDPR doesn’t require it to be a one-click self-serve thing, but that would indeed make it easy for us admins :slight_smile:
Are there any examples of a data explorer query I can run to fulfill this request, or should I manually go through all tables to see what data I should export to the user? I tried a quick search for one, but couldn’t find any in my short hunt.
I suppose I am not the only one looking for a solution to this, so instead of reinventing the wheel each time it would be awesome if it was in some kind of wiki on how to do it.
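
For what it’s worth, the manual "go through all tables" approach could be scripted roughly like this. This is only a sketch under loud assumptions: the table and column names below are placeholders and not a verified list of where Discourse keeps personal data, and it uses Python’s built-in `sqlite3` purely so the example is self-contained, whereas a real install would need the equivalent queries against its own database.

```python
import json
import sqlite3

# Hypothetical mapping of table name -> column that identifies the user.
# The real schema has to be checked by hand; these names are examples only.
USER_TABLES = {
    "users": "id",
    "user_emails": "user_id",
    "posts": "user_id",
}


def export_user_data(conn, user_id, tables=USER_TABLES):
    """Collect every row belonging to user_id from the listed tables."""
    export = {}
    for table, column in tables.items():
        # Table and column names come from the fixed mapping above,
        # never from user input; only the user id is a bound parameter.
        cur = conn.execute(f"SELECT * FROM {table} WHERE {column} = ?", (user_id,))
        names = [col[0] for col in cur.description]
        export[table] = [dict(zip(names, row)) for row in cur.fetchall()]
    return export


def export_user_json(conn, user_id, tables=USER_TABLES):
    """Same export, serialized as JSON that can be handed to the user."""
    return json.dumps(export_user_data(conn, user_id, tables), indent=2)
```

The mapping is the part worth maintaining in a shared wiki: once the list of tables that hold personal data is agreed on, the export itself is mechanical.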