GDPR countdown and compliance

The data are available, if not easy to retrieve. The likelihood that some Discourse customer will get such a request before you do, and that an easy way to get the data will exist by then, is pretty great.

Worst case, you’ll have 30 days to solve the problem. At that point you can either do it yourself, or pay someone no more than a few thousand dollars to do it for you. You likely have many larger risks in your life.

6 Likes

Unfortunately for me, the worst-case scenario is here now: I am under pressure to come up with a way to prove that I can export a GDPR-compliant dataset to meet data access requests before the act comes into force! I can do this, but I keep raising it because I am sure this is an issue for thousands of Discourse installations, whether they realise it or not.

The new privacy policy per notice from @HAWK appears to claim that the export data function fulfils the requirement for access to data under GDPR. The advice I am getting (backed by comments elsewhere on this forum and by just looking at the database) is that this is clearly not the case, and meta.discourse.org is clearly in breach of the requirements. Saying that the download includes “all of your activity” is just false: it does not. All personal information needs to be provided, which is more than just posts to the forum. For example, IP addresses are personal data when linked to an account.

Please @sam or @codinghorror am I missing something here?

8 Likes

I agree, there is much more to this.

Actually, you could argue that all your posts are not even personal data since you licensed them to the forum owner when you published them.

But there are IP addresses, post reading times, staff notes, all kinds of stuff that you cannot access in any way.

3 Likes

Can’t these be accessed in the data explorer? Perhaps I’m too optimistic here, but I suppose we could come up with a number of Data explorer queries to generate this data for a given user id?

Yes - we are currently working on such a set of queries.

6 Likes

That’s fantastic, will you be sharing them? Happy to take on a few if you want to divide up the work.

5 Likes

Going to wet blanket here and mention that you probably need extra privacy disclosures around how exactly Data Explorer queries get used…

2 Likes

Including a list of IP addresses in a self-service data export seems dangerous and counterproductive to me. There should only be enough there for what the user needs if they are migrating to another service or backing up their content before deleting it.

I can see what GDPR is trying to do but I’m not sure they have considered the implication of what happens when a hacker steals your account and is able to easily dump a local copy of all the data linked to that account, including the sensitive stuff. Even after you regain control of your account they still have all your data.

Those kinds of requests NEED to go through an information officer that manually verifies the identity of the requester.
Imagine if PayPal just had a button on your account page saying “download all my stored data” and that archive included all your credit card information for instance!

4 Likes

@HAWK - first, thank you very much for providing the updated privacy policy.

As others noted, the new policy is a great step forward, but it does not yet seem to address some of the GDPR requirements. There is no mention of child safety (the old policy covered COPPA), and the new E.U. rules require parental approval for all users under 16, so the policy should probably say that users under 16 are not allowed to use the forum. The new policy also does not explain cookies in as much detail as the old one, and it does not clearly specify the lawful bases under which the forum processes data. Different items of information can be processed for different purposes: collecting email is definitely required for “performance of contract”, and collecting IP addresses is OK because of the “legitimate purpose” of network security, but the only lawful basis for sending the digest would be voluntary consent. My impression (I am not a lawyer) is that a GDPR-compliant privacy policy should explicitly list every processed data item and the lawful basis for processing it. And as others noted, “Download all” only downloads posts, not all activity.

2 Likes

There is a difference between ‘right to access’ and ‘right to data portability’ in GDPR. For the second one, the migration argument holds, for the first one, it does not.

2 Likes

The final rules are the product of extensive debates and are very well thought out. We’re well past the point where expressing an opinion really matters; it’s a simple question of complying, or else.

I’m very familiar with @richp10’s dilemma; my own clients span the NHS, government and the UK education sector. As soon as GDPR was finalized it entered the standards and practices of many UK public sector organisations, which impacts all new services.

This situation isn’t new: a few years ago HEIs took a similar stance with identity and access management, mandating SAML2 and SIFA. Until compliance can be proven, products such as Discourse can’t even be proposed at many levels.

1 Like

Both true. However, in the case of “Right to Access,” the requested data must be provided within 30 days.
Therefore, a semi-manual process, run by an information control officer or other qualified person, and taking a few extra steps to verify the identity of the requester, would be compliant. It could also arguably be preferable from an account security viewpoint.
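To make the semi-manual process concrete, here is a minimal sketch of how a request could be tracked. It assumes the 30-day window mentioned above, and the `AccessRequest` class and its fields are hypothetical names made up for illustration; this is not an existing Discourse feature and not legal advice.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: the 30-day response window quoted in this thread.
RESPONSE_WINDOW = timedelta(days=30)


@dataclass
class AccessRequest:
    """Hypothetical record of one Right to Access request."""
    user_id: int
    received: date
    identity_verified: bool = False  # set by whoever checks the requester's identity

    @property
    def due(self) -> date:
        # Last day on which the data can still be delivered on time.
        return self.received + RESPONSE_WINDOW

    def days_left(self, today: date) -> int:
        # Negative once the deadline has passed.
        return (self.due - today).days

    def is_overdue(self, today: date) -> bool:
        return today > self.due

    def can_release(self) -> bool:
        # Nothing is handed over until a person has verified the requester.
        return self.identity_verified


if __name__ == "__main__":
    req = AccessRequest(user_id=42, received=date(2018, 5, 25))
    print(req.due, req.days_left(date(2018, 6, 1)), req.can_release())
```

The point of keeping `can_release` separate from the deadline check is that the export is gated on manual identity verification, not on the clock.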

2 Likes

This is about a million miles from the reality of the public sector. It’s the key difference between what we as technical people know we can deliver on, and the risk assessment that the non-technical decision makers will run through when approving new products and services for inclusion in a live service.

It doesn’t mean that we can spend 29 days writing code and deliver on the 30th. A request will be received and validated, a change proposed and tested, and the exported data scrutinised (because disclosing more than was requested can be as big a risk as failing to disclose), all before release. No large organisation is going to let the 25th come and go without knowing exactly how they’re going to execute the above; the penalties are just too large.

In the kinds of organisation mentioned above a Right to Access Request will be handled by a records manager in exactly the same way a Subject Access Request is handled today. While there are outliers, they’re typically non-technical managers and need solutions that don’t involve rooting around in a database. Every new solution touted by their peers within the technical organisation has to meet key criteria for audit, access, and discovery of user data. It’s one of the big reasons that a lot of in-house projects get killed off in favor of COTS alternatives which guarantee compliance with one or more standards.

Unless we can demonstrate a toolset which makes the product compliant with GDPR, many projects will be halted. This isn’t a hypothetical, though: I’ve already seen several streams of work which have been running for years put under review. It’s just like every other compliance exercise, and Discourse will be no exception.

2 Likes

That’s kinda the crux of it here, right? You can reasonably easily pull all data about a user (on request) within minutes. Whether or not your internal procedures make this onerous isn’t about the software.

1 Like

I want to bet that 90% of the Discourse forum admins are unable to pull all data about a user at all, 95% are unable to do it within 30 days and 99.9% are unable to do it within minutes.

Unless they have a script or SQL query prepared and ready to run, which does not exist (yet) to my knowledge.
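As a rough idea of what such a prepared script could look like, here is a minimal sketch that collects every row tied to one user id. It runs against an in-memory SQLite database with made-up tables: `users`, `posts`, `ip_log` and their columns are hypothetical and are not the real Discourse schema (Discourse runs on PostgreSQL), so the table list would need to be replaced with the real one.

```python
import json
import sqlite3

# Hypothetical mapping of table name -> column holding the user id.
# Illustrative only; NOT the real Discourse schema.
USER_TABLES = {
    "users": "id",
    "posts": "user_id",
    "ip_log": "user_id",
}


def export_user_data(conn, user_id):
    """Return {table_name: [row dicts]} for every row tied to user_id."""
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    export = {}
    for table, column in USER_TABLES.items():
        # Table and column names come from the fixed mapping above,
        # never from user input; only the user id is a bound parameter.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {column} = ?", (user_id,)
        ).fetchall()
        export[table] = [dict(row) for row in rows]
    return export


def build_demo_db():
    """Create a throwaway database with two users for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, raw TEXT);
        CREATE TABLE ip_log (id INTEGER PRIMARY KEY, user_id INTEGER, ip TEXT);
        INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
        INSERT INTO posts VALUES (1, 1, 'hello'), (2, 2, 'other user');
        INSERT INTO ip_log VALUES (1, 1, '203.0.113.7');
    """)
    return conn


if __name__ == "__main__":
    print(json.dumps(export_user_data(build_demo_db(), 1), indent=2))
```

The hard part is not the loop but agreeing on the full list of tables that hold personal data; anything missing from the mapping is silently left out of the export.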

3 Likes

Again, that’s my point. Software isn’t compliant or non-compliant. It is up to individual administrators to ensure that they comply – so perhaps step one for anyone that is concerned about whether they are compliant would be to source/write this query.

2 Likes

Working on it :slight_smile:

10 Likes

If you don’t have the in-house resources to do this, I’m sure someone would pull it together for you if you post in the #marketplace.

2 Likes

So does CDCK have a price in mind for hosted customers to fetch such data, or is it rolled into the hosting fee?

1 Like

Business or Enterprise customers could deal with these requests using the Data Explorer plugin so we wouldn’t need to be involved at all. Standard customers would need our assistance and we will talk to them on a case by case basis. There (obv) hasn’t yet been a requirement. :slight_smile:

2 Likes