Would it be possible to store just a hash of the IP address instead? That would remove the personal information and would still be useful for moderation purposes.
Hashes aren’t a magic bullet for privacy issues. The systems which need to retain IP data can do so lawfully. The remainder just don’t need to do so.
One of the benefits of having access to IPs is being able to see whether a user is still coming from the same subnet, source network or region. A hash would give only a simple Boolean answer as to whether the user was on the exact same IP. It would be true for 1 out of potentially millions of IPs controlled by their ISP, and false for the other ~3.7 billion addresses, while giving us no more information about the user and their behavior.
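To make that concrete, here is a minimal Python sketch of the difference. It assumes an unsalted SHA-256 hash and a /24 prefix purely for illustration (the thread does not say which hash or prefix would be used), and the helper names ip_hash and same_network are made up for this example. The addresses come from the reserved documentation ranges.

```python
import hashlib
import ipaddress


def ip_hash(ip: str) -> str:
    """Hypothetical helper: unsalted SHA-256 of the dotted-quad string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def same_network(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """Check whether ip_b falls inside the /prefix network containing ip_a."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in network


earlier = "203.0.113.10"
later = "203.0.113.77"  # same /24, different host

# With only hashes stored, all we can ask is "exactly the same IP?".
hashes_match = ip_hash(earlier) == ip_hash(later)

# With the raw IPs, we can also ask "same subnet?".
same_subnet = same_network(earlier, later)
```

Here hashes_match is False while same_subnet is True, which is the information a moderator loses when only the hash is kept.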
No - the number of possible IPv4 addresses is small enough (4 bytes, roughly 4.3 billion values) that it would take only a very short time to generate a lookup table containing all possible values.
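A rough Python sketch of the lookup-table idea, scaled down to a single /24 so it runs instantly. It assumes the stored value is an unsalted SHA-256 of the dotted-quad string, which is an assumption for illustration only; ip_hash and build_lookup are hypothetical names. Covering the whole IPv4 space is the same loop over 2**32 values.

```python
import hashlib
import ipaddress


def ip_hash(ip: str) -> str:
    """Hypothetical helper: unsalted SHA-256 of the dotted-quad string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def build_lookup(cidr: str) -> dict:
    """Map hash -> address for every address in the given network."""
    return {ip_hash(str(addr)): str(addr) for addr in ipaddress.ip_network(cidr)}


# Scaled-down demo: one /24 (256 addresses) from a documentation range.
table = build_lookup("198.51.100.0/24")

# Pretend this hash is all the forum kept about the user.
stored = ip_hash("198.51.100.42")

# A plain dictionary lookup turns the hash back into the address.
recovered = table[stored]
```

Once the table exists, every stored hash in that range can be reversed with a single lookup, so the hash offers little protection on its own.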
Those are actually the stored IPs we’ll be keeping. I’m going to work on removing any IP storage not shown in the UI.
Ok, general update as promised. It’s important to remember that Discourse can be used in GDPR-compliant ways, but software itself isn’t compliant or non-compliant. If you host Discourse for users covered by GDPR, you’ll need to do so in a GDPR-compliant way. What that looks like for you will depend on your interpretation of the guidelines and how you specifically use your members’ data.
A user’s right to be informed
We have updated the privacy statement that ships with Discourse. You can see our version here. You can edit your own version to suit.
A user’s right to be forgotten
Users and their posts can be deleted or anonymised by an admin. We have added support in v2.0.0beta8 to rename users in mentions and quotes when anonymising, as well as support for anonymising a user’s IP addresses.
A user’s right to a copy of their data
A user can download their activity as a .csv file by going to their activity summary. An admin can do this for other members by impersonating them.
A user’s right to modify their data
Depending on how you have configured your Discourse instance, a user can modify their data via their personal preferences and/or by contacting an Admin.
Gaining consent
You can customise your own instance to include a mandatory custom field on registration or you could use the Custom Wizard Plugin as a means to gain consent.
That is putting it a bit too simply. GDPR (especially article 25) but also other standards and regulations (think ISO27000, PCI) set very specific requirements for software. So technically you could argue that “software only helps to achieve compliance”, but if your software doesn’t meet the requirements, it can break compliance for your entire organisation.
Technically I could argue lots of things but that’s not how I’m choosing to spend my time right now. I’d rather wait and see how this pans out.
Discourse is amazing, the Discourse team are amazing, the Discourse Community is amazing - but I am not convinced that the GDPR issue has seen any of these at their best. One week from the deadline and it is very hard to comply with GDPR if you use Discourse.
As far as key stakeholders in my projects are concerned, Discourse makes it hard for them to meet their legal responsibilities under GDPR. Some people here seem to think there is not a problem and that I am making too much fuss - but the lawyers of people I am trying to engage with a product using Discourse are saying it is not good enough and they cannot allow their staff or service users to use Discourse until this is resolved. As such I am doing back-end coding to work around their concerns.
I shared my original concerns: Providing data for GDPR
Of these, I was wrong / reassured about the Discourse approach to Right to be Forgotten. I have set things up so users can choose whether to completely remove or anonymize and I explain the merits of each. I think Right to be forgotten has a big tick - thanks team.
Also mentioned there, and alluded to but not made explicit in @GBrowning’s post that started the current thread: we still have a big red cross beside the ‘Right of Access’ box. This is a separate issue from Data Portability.
Whatever people here may think, my stakeholders (NHS and major UK charities) demand that I am able to respond to data access requests in a way that is compliant with GDPR. At the moment I cannot comply with this basic requirement under GDPR using native Discourse tools.
The ‘download all posts’ simply does not cut the mustard because it does not provide all the personal data held on the database. My workaround: on my backend I am coding an extraction directly from the database to try and pull together all the ‘personal information’ that must, as a matter of law, be provided in response to a request. This includes IP addresses, private messages and other data that is not currently included in the download. I am finding it hard going because my knowledge of the database is limited.
As of next Friday, I could email @codinghorror making a Data Access Request and Discourse would as a matter of law have to carry out exactly the same exercise - building queries to extract all my personal data to send to me within a month. If they do not do this, the penalty could (theoretically) be 4% of your turnover.
Of course I am going to make no such request, but across the EU, companies, charities and public bodies are working hard to establish whether they are in a position to comply with Data Access Requests.
If they use Discourse, they cannot, unless they do some strenuous deep diving with SQL.
There might be a few other issues still worthy of more discussion (such as whether some old data no longer has legitimate use and should be deleted) but Right of Access requests are the big headache for me.
This is awesome, is there any way to get it in markdown in a handy way? I can re-craft by hand of course, but just checking.
Actually, just copy and paste will likely be pretty good. (looks like there is no route to get raw for it).
The data are available, if not easy to retrieve. The likelihood that some Discourse customer will get such a request before you do, and that an easy way to get the data will exist by then, is pretty great.
Worst case, you’ll have 30 days to solve the problem. At that point you can either do it yourself, or pay someone no more than a few thousand dollars to do it for you. You likely have many larger risks in your life.
Unfortunately for me the worst case scenario is here now - pressure to come up with a way that I can prove I can export a GDPR-compliant dataset to meet data access requests before the act comes into force! I can do this, but I keep raising this as I am sure this is an issue for thousands of Discourse installations, whether they realise it or not.
The new privacy policy per notice from @HAWK appears to claim that the export data function fulfils the requirement for access to data under GDPR. The advice I am getting (and comments elsewhere on this forum and just by looking at the database) is that this is clearly not the case - meta.discourse.org is clearly in breach of the requirements. Saying that the download includes “all of your activity” is just false - it does not. All personal information needs to be provided, which is more than just posts to the forum. For example IP addresses are personal data when linked to an account.
Please @sam or @codinghorror am I missing something here?
I agree, there is much more to this.
Actually, you could argue that all your posts are not even personal data since you licensed them to the forum owner when you published them.
But there are IP addresses, post reading times, staff notes, all kinds of stuff that you cannot access in any way.
Can’t these be accessed in the data explorer? Perhaps I’m too optimistic here, but I suppose we could come up with a number of Data explorer queries to generate this data for a given user id?
Yes - we are currently working on such a set of queries.
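For anyone wondering what such a set of per-user queries might look like in outline, here is a rough Python sketch using an in-memory SQLite database. Everything in it is an assumption for illustration: the table names (demo_posts, demo_ip_log, demo_staff_notes) and columns are invented and are NOT Discourse's real schema, and this is not the set of queries being worked on. The point is only the shape: one parameterised query per category of personal data, all keyed on the same user id, merged into one export.

```python
import json
import sqlite3

# Hypothetical tables and columns, invented for this sketch.
PER_USER_QUERIES = {
    "posts": "SELECT id, body FROM demo_posts WHERE user_id = ?",
    "ip_log": "SELECT ip_address FROM demo_ip_log WHERE user_id = ?",
    "staff_notes": "SELECT note FROM demo_staff_notes WHERE user_id = ?",
}


def export_user_data(conn: sqlite3.Connection, user_id: int) -> dict:
    """Run every per-user query and collect the rows under one key each."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    export = {}
    for name, sql in PER_USER_QUERIES.items():
        rows = conn.execute(sql, (user_id,)).fetchall()
        export[name] = [dict(row) for row in rows]
    return export


# Tiny demo database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demo_posts (id INTEGER, user_id INTEGER, body TEXT);
    CREATE TABLE demo_ip_log (user_id INTEGER, ip_address TEXT);
    CREATE TABLE demo_staff_notes (user_id INTEGER, note TEXT);
    INSERT INTO demo_posts VALUES (1, 7, 'Hello'), (2, 8, 'Other user');
    INSERT INTO demo_ip_log VALUES (7, '203.0.113.10');
    INSERT INTO demo_staff_notes VALUES (7, 'Warned about spam');
    """
)

export = export_user_data(conn, 7)
export_json = json.dumps(export, indent=2)
```

Keeping the queries in one dictionary means a new category of personal data can be added by adding one entry, and only rows belonging to the requested user id end up in the export.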
That’s fantastic, will you be sharing them? Happy to take on a few if you want to divide up the work.
Going to wet blanket here and mention that you probably need extra privacy disclosures around how exactly Data Explorer queries get used…
Including a list of IP addresses in a self-service data export seems dangerous and counterproductive to me. There should only be enough there for what the user needs if they are migrating to another service or backing up their content before deleting it.
I can see what GDPR is trying to do but I’m not sure they have considered the implication of what happens when a hacker steals your account and is able to easily dump a local copy of all the data linked to that account, including the sensitive stuff. Even after you regain control of your account they still have all your data.
Those kinds of requests NEED to go through an information officer that manually verifies the identity of the requester.
Imagine if PayPal just had a button on your account page saying “download all my stored data” and that archive included all your credit card information for instance!
@HAWK - first, thank you very much for providing updated privacy policy.
As others noted, the new policy is a great step forward, but it does not seem to address some of the GDPR requirements just yet. There is no mention of child safety (the old policy had COPPA), and the new E.U. rules require parental approval for all users under 16, so the policy should probably say that users under 16 are not allowed to use the forum. Also, the new policy does not explain cookies in as much detail as the old policy. The new policy does not specify clearly under which lawful bases the forum is processing the data. Various information items can be processed for different purposes. Collecting email is definitely required for “performance of contract”, and collecting IP is OK because of the “legitimate purpose” of network security. But the only lawful basis for sending the digest would be voluntary consent. My impression (I am not a lawyer) is that a GDPR-compliant privacy policy should explicitly list all processed data items and the lawful basis for processing each item. And as others noted, “Download all” only downloads posts, not all activity.