Providing data for GDPR

Bas · April 30, 2018, 9:33am

Well, you’re not going to get a multi-million euro fine.
You’ll most likely get a warning and a timeframe to fix the infraction (assuming your infraction was in good faith).

And the rules are simple:

don’t store data you don’t have explicit permission to store
allow people to request a copy of the data you hold about them
provide a means for people to delete their data from your system

But yes, if you did not take this into account when you set up your system (and to be fair, at the time you didn’t have a legal reason to), it’ll hurt for a bit.

ChrisBeach · April 30, 2018, 12:34pm

This is one of the most ambiguously-defined parts of the legislation unfortunately.

Can you tell me all the places in the Discourse codebase where PII is stored, for how long, and in what format, and how this is treated by the GDPR, and what I would do to resolve it, and whether it counts as a “legitimate interest” or whether I need to prompt the user for consent?

If there was a single authoritative page on Discourse.org explaining this I’d feel a lot more confident, but there isn’t. There are a number of answers on meta, many of which are mutually exclusive.

It’s a mess, and I should point out that this is not the fault of the Discourse team at all - the problem lies squarely with the lawmakers IMO.

Bas · April 30, 2018, 12:49pm

No; this really is on the Discourse team to provide. GDPR has been known to be on its way for two years now; this cannot have been a surprise.

That depends on the data; but again, for the default set up, this should be provided by Discourse.
There are guidelines that can help, but yes, every time you store someone’s data, you need to think to yourself why you’re doing it. And “it might be useful” ceased to be a valid reason.

I fully disagree. The law is 20 years or so overdue; you can blame the lawmakers for that. Apart from that, it’s on us to apply some reflection. We (online businesses) have been gobbling up data for decades and considered it our given right to do so. That has to change.

One more thing: this is a European law, and in general (I’m not a lawyer, though) European regulators tend to be less trigger-happy with fines than their American cousins. I would be highly surprised if any forum owner were hit by a fine without prior notice.

ChrisBeach · April 30, 2018, 12:52pm

To be fair to all of us who operate ethically, we have only stored data that the users have provided to us.

I’m not sure there’s anything wrong with that, provided we do not use this data for nefarious purposes. If we do use the data for nefarious purposes, I believe existing laws would give users the power to challenge that.

Bas · April 30, 2018, 1:14pm

I think I prefer a stronger, more preventative law

But enough of this, think we’ve sidetracked enough. Noticed your announcement on HN and followed you on Twitter; to be frank, I think our policy disagreement is a bit more fundamental than GDPR&Discourse
Happy to discuss more, but let’s take it out of this topic

sheldrake · April 30, 2018, 1:31pm

In that case, you’ll find compliance a lot easier than others

But I’ve only seen one instance so far where “we have only stored data that the users have provided to us” actually held. In other words, you don’t do any observing (e.g. web analytics, so-called social CRM) or inferencing (e.g. personalisation).

RGJ · April 30, 2018, 2:26pm

Don’t forget that there is a third option: performance of a contract or service, described in Article 6(1)(b) of the GDPR. There are a number of restrictions there, but you could argue that storing a user’s e-mail address is necessary for delivering the forum service to them.

If you were selling those e-mail addresses, or collecting physical address information (or storing IP addresses for years), that would require consent, since you don’t need those in order to run your forum.

AstonJ · May 4, 2018, 2:51pm

Here is a short video providing a quick overview of GDPR (aimed at consumers)

Carlo · May 4, 2018, 3:38pm

The more I read about GDPR, the more I am confused.

Sorry if this was answered before but…

Should forum owners be concerned by this?
If yes, is Discourse GDPR-ready?

I am European, my business and servers are based in the UK. So I am kind of worried….

ziptofaf · May 4, 2018, 11:15pm

Yes. Guaranteed that you are concerned if your forum is a commercial one (ads and whatnot), debatable if it’s a small non-commercial forum (but by debatable I mean I called a lawyer in my country and they told me that yes, I should try and be compliant).

Not exactly. There are a few main challenges left (and if those are taken care of, Discourse would be in a pretty good spot):

Agreement to your terms of service has to be explicit, not implicit, i.e. via a checkbox. You also need to store when the user agreed to it. This is kind of important because, for instance, sending someone emails without their consent can have QUITE severe repercussions (here’s an example from the UK). Also: one checkbox per type of personal data. You can’t have a generic “I agree to the terms of service” with a 50-page document behind it.
If your ToS changes, everyone should be asked to reconfirm that they still agree.
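
A minimal sketch of what storing that agreement could look like, assuming one record per user and purpose, stamped with the time and the ToS version. The names ConsentRecord, record_consent, needs_reconfirmation and CURRENT_TOS_VERSION are hypothetical and are not part of Discourse.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

CURRENT_TOS_VERSION = 3  # hypothetical: bump whenever the ToS text changes


@dataclass(frozen=True)
class ConsentRecord:
    user_id: int
    purpose: str        # one record per purpose, e.g. "tos" or "newsletter"
    tos_version: int    # which ToS text the user actually agreed to
    agreed_at: datetime # when the checkbox was ticked (UTC)


def record_consent(user_id, purpose, tos_version=CURRENT_TOS_VERSION, now=None):
    """Create a consent record stamped with the current UTC time."""
    now = now or datetime.now(timezone.utc)
    return ConsentRecord(user_id, purpose, tos_version, now)


def needs_reconfirmation(records, user_id, purpose,
                         current_version=CURRENT_TOS_VERSION):
    """True if no consent is on file for this purpose at the current ToS version."""
    return not any(
        r.user_id == user_id
        and r.purpose == purpose
        and r.tos_version == current_version
        for r in records
    )
```

With records like these, a ToS change only requires bumping the version number; every user whose latest agreement predates it is then flagged for reconfirmation.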

You are at least partially covered with other points - Discourse provides means to be forgotten (although you still need to ensure that you won’t store this data forever in backups), there are ways of exporting personal data too.

Of course, you also need to create your own GDPR document: what you store, how important it is, and how you secure it. (GDPR does not actually state many official guidelines; you decide for yourself how to reach a sufficient level of security for your application, and only in case of a failure do you need to prove that it was reasonable.) You can have a lawyer help you write one or you can do it yourself, e.g. stating that all data is on a server hosted in Europe (good if you have a data processing agreement with your ISP), backups are encrypted and kept in an Amazon S3 bucket (located in Frankfurt, for instance), you are using full-drive encryption, and you cleanse logs every 30 days (to get rid of old IP addresses and whatnot, since they too fall under GDPR). Then you write that you store email addresses (explicitly), IP addresses (implicitly, but only for logging/security reasons for X days), names/surnames (explicitly) and, if you use marketing services, which ones and what kind of data about your users goes to them. This should be sufficient for a smaller forum.
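
The 30-day log cleanse could be sketched roughly as follows. This assumes each log line starts with an ISO-8601 timestamp followed by a space, which will not match every log format; prune_log_lines and RETENTION are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # the 30-day window from the post above


def prune_log_lines(lines, now, retention=RETENTION):
    """Keep only lines whose leading timestamp is within the retention window.

    Assumes lines look like '2018-05-04T23:15:00+00:00 <rest of entry>'.
    Lines without a parseable timestamp are dropped, erring on the side
    of deleting data rather than keeping it.
    """
    cutoff = now - retention
    kept = []
    for line in lines:
        stamp, _, _rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # unparseable: drop it
        if when.tzinfo is None:
            # Assumption: timestamps without an offset are UTC.
            when = when.replace(tzinfo=timezone.utc)
        if when >= cutoff:
            kept.append(line)
    return kept
```

In practice the same effect is often achieved with the log rotation settings of the web server or operating system; the point is only that the retention period written in the document is actually enforced somewhere.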

AstonJ · May 4, 2018, 11:48pm

If your forum is connected to a large-ish business (e.g. support forums for a product) then yes, you probably should be worried and may want to seek legal advice.

If you are running more of a stand-alone community then I probably wouldn’t be as worried… for now. (Unless you are doing something silly like sending mailers/spam that are not part of what’s expected of the forum.)

pfaffman · May 5, 2018, 12:42am

I considered this when setting up some servers recently. The obvious way to do it requires a human to type the password every time the machine reboots. Either I’m missing some obvious solution or this simply isn’t feasible.

codinghorror · May 5, 2018, 12:44am

Aha! Ask @mpalmer as he is an expert on this

pfaffman · May 5, 2018, 12:45am

There are many, many things I’d like to ask @mpalmer! Don’t tell him I asked about this, though, because I’m really interested in his answer on this.

EDIT: Just great. He’s already noticed this message! And I really wanted his CPU cycles for my show-stopper problem.

mpalmer · May 5, 2018, 12:57am

Too late! The power of @-mentions compels me!

Yes, the default way of doing full-disk encryption means you’re entering a passphrase at the console on every reboot. For servers, that’s somewhere between annoying and a showstopper, depending on factors like whether you have access to the console during boot (via IPMI or in a VM), what your uptime requirements are, how often unexpected reboots happen, and the probable delay between someone noticing the unexpected reboot has happened (*cough* monitoring *cough*) and being able to enter the passphrase.

The “access to the console” problem can be ameliorated by installing the dropbear-initramfs package; it provides a minimal SSH server available before the system is booted, so you can SSH in and enter the unlock passphrase. It doesn’t solve any of the other problems, though. The other way around this problem is to not encrypt the root filesystem, but only the data filesystems; that requires you to partition appropriately, and script the unlock/mount sequence yourself. It is, however, a viable alternative if you’re not running a distro with dropbear-initramfs or equivalent available.
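
As a rough sketch of the "script the unlock/mount sequence yourself" option, assuming the data partition is a LUKS volume: the function names, device path, mapper name and mount point are all hypothetical, and the commands need root.

```python
import subprocess


def unlock_and_mount_commands(device, mapper_name, mount_point):
    """Build the two commands: open the LUKS volume, then mount it."""
    return [
        # cryptsetup prompts for the passphrase on the terminal
        ["cryptsetup", "open", "--type", "luks", device, mapper_name],
        ["mount", f"/dev/mapper/{mapper_name}", mount_point],
    ]


def unlock_and_mount(device, mapper_name, mount_point):
    """Run the commands in order; stop at the first failure."""
    for cmd in unlock_and_mount_commands(device, mapper_name, mount_point):
        subprocess.run(cmd, check=True)


# Example (hypothetical paths), run as root after SSHing in post-boot:
# unlock_and_mount("/dev/sdb1", "forumdata", "/var/discourse")
```

Services that depend on the data filesystem would have to be started only after this has succeeded, which is the part you end up scripting yourself.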

If you’re running a larger cluster, you can avoid the downtime problems of delayed unlock with proper replication and high availability. Basically, if your setup could withstand a machine just up and dying (not a crash-and-needs-a-reboot, but “catches fire”-grade failure) without impacting availability, then it taking a few hours for someone to login and enter a passphrase isn’t going to be a problem either.

Another way to avoid the passphrase problem in a larger cluster is to run something like Mandos, where machines provide each other the unlock passphrase at boot. It’s not as secure as manual entry, but it guards against all but the most determined of deliberate attacks (that FDE would otherwise guard against, i.e. offline attacks).

Finally, the gold standard holy grail of FDE is to use the TPM (if the machine has one) to store the unlock passphrase. In order to do that properly, though, you need to fully enable secure boot (with all its attendant complexity and risks), because it’s a bit pointless to use the TPM to store the passphrase if an attacker could just ask it nicely to cough up the creds.

Security is hard. Let’s go debugging!

codinghorror · May 5, 2018, 12:58am

A+++ wall of text, deployed with surgical precision, would read again!

pfaffman · May 5, 2018, 1:16am

I’ve already read it twice. And though I’ve started using HAProxy, I’m afraid that the Literate Computing Server Farm is not (yet?) one where High Availability is an option. When a machine catches fire, I’ll need to wake up, take notice, and manually restore backups on another machine. Most of what these machines will be doing will be one-time tasks anyway.

fbjerggaard · December 19, 2018, 1:59pm

How do we do this? The “Download All” button on the profile page does not provide enough data since it only provides topics and not any other personal data.

HAWK · December 20, 2018, 1:12am

Hey there. You can download the rest yourself from your database to provide to the user. It’s not intended to be a one-click self-serve situation (GDPR doesn’t require that).

If you are the user, you’ll need to contact an admin on the site in question.

Does that clarify?

fbjerggaard · December 20, 2018, 9:15am

Yep, that clarifies it.
I know GDPR doesn’t require it to be a one-click self-serve thing, but that would indeed make it easy for us admins.
Are there any examples of a data explorer query I can run to fulfill this request, or should I manually go through all the tables to see what data I should export to the user? I tried a quick search for one, but couldn’t find any in my short hunt.
I suppose I am not the only one looking for a solution to this, so instead of reinventing the wheel each time, it would be awesome if there were some kind of wiki on how to do it.
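
For illustration only, the "go through the tables" approach might look something like the sketch below. Discourse runs on PostgreSQL; sqlite3 is used here purely so the example is self-contained. The table-to-column map is a hypothetical starting point, not an authoritative list of where personal data lives, and would need checking against the actual schema.

```python
import json
import sqlite3

# Hypothetical map of table -> column holding the user id.
# Not a complete inventory; verify against your own schema.
USER_TABLES = {
    "users": "id",
    "user_emails": "user_id",
    "posts": "user_id",
}


def export_user_data(conn, user_id, tables=USER_TABLES):
    """Collect every row belonging to user_id from the listed tables."""
    export = {}
    for table, column in tables.items():
        # Table and column names come from the fixed map above,
        # never from user input; the user id is a bound parameter.
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {column} = ?", (user_id,)
        )
        names = [d[0] for d in cur.description]
        export[table] = [dict(zip(names, row)) for row in cur.fetchall()]
    return export


def export_user_json(conn, user_id, tables=USER_TABLES):
    """Same export, serialised as JSON to hand to the user."""
    return json.dumps(export_user_data(conn, user_id, tables),
                      indent=2, default=str)
```

The awkward part is the map itself: the code is trivial once someone has decided which tables and columns count as personal data, which is exactly the list being asked for here.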
