GDPR and anonymizing personal data

The GDPR, which takes effect in the EU on 25/5/2018, requires more careful handling of personal data. Analyzing the changes needed in our software, I’ve found the following:

  • We need explicit consent by user to gather personal data (username, ip addresses) - I think that registration in forum can be taken as explicit consent :white_check_mark:

  • We need to hand over any personal data on request, so is there a convenient way to export the data we are collecting about a user? I mean, is there any data not currently visible to the user in their profile? I can only think of IP addresses.

  • We need to delete the data on request - this could easily be done by removing all posts, removing the user or anonymizing the user. :white_check_mark:

Are you EU providers preparing for this new policy somehow?

11 Likes

I believe @erlend_sh is researching this on our end.

10 Likes

Sorry, new here so my use of markdown isn’t great. I’m commenting on GDPR from my (limited) understanding of the UK implementation and how we at FanFinders are preparing for it.

We need explicit consent by user to gather personal data (username, ip addresses) - I think that registration in forum can be taken as explicit consent

AD - Consent shouldn’t be a prerequisite for using the service where this is avoidable. I’d suggest it’s a case-by-case interpretation.

Also, consent needs to be a specific positive action. In our case, we are using an unchecked (this is important) checkbox. Nothing ground-breaking, but it is important.
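
To make the unchecked-checkbox point concrete, here is a minimal sketch of the server-side check. The field name `consent_personal_data` and the shape of the stored record are assumptions for illustration only; they are not taken from Discourse or from our own implementation.

```python
from datetime import datetime, timezone

# Hypothetical form field name, used only for this sketch.
CONSENT_FIELD = "consent_personal_data"


def record_consent(form, now=None):
    """Return a consent record only if the box was actively ticked.

    An HTML checkbox that is left unchecked is simply absent from the
    submitted form, so a missing field must never count as consent.
    """
    if form.get(CONSENT_FIELD) != "on":
        return None
    now = now or datetime.now(timezone.utc)
    # Keep the timestamp so the consent can be demonstrated later.
    return {"purpose": "personal data processing", "given_at": now.isoformat()}
```

The default is "no consent": only an explicit tick produces a record, and the record carries a timestamp.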

2 Likes

Discourse should have you fully covered with the user data export. It’s tough to determine how far-reaching these regulations are, but a lot of it comes down to being responsive and sharing data on a case-by-case basis where necessary.

If for example it turns out users are entitled to an “export” of their IP address as well, we’ll need some time to implement that into Discourse core. But in the meantime, if you received a request for retrieval of personal data, you could simply look it up and email them their IP address (which is super weird but no doubt we’ll run into pedantry sometimes with these new regulations).

5 Likes

I’m pretty sure that users are entitled to get all their personal data. IP addresses are part of that, because they are associated with user accounts in the database and can also be linked to a person by their ISP.

I really appreciate your effort and want to thank the whole Discourse team for the great support!

Exactly… Customers are also entitled to selectively rectify data concerning them. They are probably able to find and change 99% of their personal info through the profile. The only thing they can’t do is remove their IP address, and the Discourse team should probably target this feature in upcoming updates… The problem we are all aware of is that pretty much all of our spam defense features depend on the ability to block an IP address or IP range. I don’t think we can afford to lose this ability for the sake of users’ anonymity…
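
One possible middle ground, sketched here only as an idea: store just the network part of an address and keep blocking by range. This is not something Discourse does as far as I know, the prefix lengths below are arbitrary assumptions, and whether a truncated address still counts as personal data is a legal question this sketch does not answer.

```python
import ipaddress

# Assumed prefix lengths, chosen only for illustration.
V4_PREFIX = 24
V6_PREFIX = 48


def truncate_ip(ip):
    """Return the network an address belongs to, dropping the host bits."""
    addr = ipaddress.ip_address(ip)
    prefix = V4_PREFIX if addr.version == 4 else V6_PREFIX
    # strict=False lets ip_network accept an address with host bits set.
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


def is_blocked(ip, blocked_ranges):
    """True if the address falls inside any of the blocked ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in blocked_ranges)
```

Blocking a single address would no longer be possible with truncated storage, only blocking the whole range it belongs to.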

I wonder if we could tie the collection of IP addresses to registration (as I suggested earlier). Registration is an opt-in mechanism sui generis. I think that the principle of proportionality would give us the right to gather users’ IP addresses, because as a company we have a right to defend our server installation against attacks. I’ll try to analyse the EU documents in more detail to find the correct answer, and I welcome your opinions and ideas.

2 Likes

No, explicit consent means that you should ask a specific and separate question, i.e. a checkbox upon registration “I agree that…”

Article 7.2 of the GDPR:

If the data subject’s consent is given in the context of a written declaration which also concerns other matters, the request for consent shall be presented in a manner which is clearly distinguishable from the other matters, in an intelligible and easily accessible form, using clear and plain language

But more important: this only goes for registered users, so that means that you:

  • have to close your forum for unregistered visitors
    and/or

  • make sure that you remove all IP addresses from unregistered visits, including requests to the sign up form that never completed

It’s not tough to determine, it’s all in the law. IP addresses are categorized as ‘observed data’ and are included.

At this moment, I only see the ability to download my posts, likes and such, but not my profile data. Or am I overlooking something?


That is correct, Recital 47

The processing of personal data strictly necessary for the purposes of preventing fraud also constitutes a legitimate interest of the data controller concerned

5 Likes

I think that Discourse is already compliant in this… We don’t collect IP addresses of unregistered members, and non-verified members are deleted after some period. Yes, there are some server logs, but these are flushed within a short time period, so I don’t think that is a problem.

We could add a separate checkbox to the registration. But 1) the problem is that we don’t know if the user is from the EU and 2) we should focus on having a simple registration without any redundant steps, because we all know it affects user conversion rates. I’d prefer having the explicit consent after email verification, as a profile field.

The question is (in the case of public forums) - are we required to know that a user is from the EU? I guess that would be really hard to detect reliably. Maybe something along the lines of a checkbox in the profile saying “I’m an EU citizen and I agree with personal data collection + link to GDPR site policy” would suffice.

IP addresses are definitely collected by Discourse, even from anonymous visitors.
Check for instance the topic_views and search_logs tables.

2 Likes

Thanks for pointing that out.

https://www.whitecase.com/publications/alert/court-confirms-ip-addresses-are-personal-data-some-cases

It looks like an IP address without any other user information is not personal data.

Where a piece of information (such as an IP address) does not directly identify a person, that piece of information will nevertheless be personal data in the hands of any party that can lawfully obtain sufficient additional data to link the information to a person’s real world identity. On the other hand, that same piece of information will not be personal data in the hands of a party that has no legal means of obtaining sufficient additional data to make such a link.

What you are saying would force us to give up correct counting of unique topic views. I’m not sure why we need to log IPs with searches.

You’re going a bit too fast. You’re reading very selectively here. The title of the article is even “Court confirms that IP addresses are personal data in some cases” .

This case only ruled on a dynamic IP address, so it’s limited anyway. But more important is the last paragraph of that article: “Consequently, it may be necessary for the CJEU to revisit this issue after enforcement of the GDPR begins on 25 May 2018.”

2 Likes

I agree, but this whole policy is a little bit vague. In that case the affected party was using a dynamic IP. But are we able to detect whether an IP is dynamic or static? I don’t think so (with the exception of edge cases where a reverse lookup resolves to a domain containing someone’s surname).

Court confirms that IP addresses are personal data in some cases

Yes, it is personal data if you can link the IP to a person (e.g. you are the ISP that assigns that IP to your own customer).

Correct, you cannot always tell whether it’s dynamic or static - so you have to stay on the safe side.
By the way, it’s easier to just stop recording IP addresses than to stop recording them conditionally.

1 Like

Is deleting/anonymizing the data after a short period compliant? e.g. replacing the IP address with a sequential number.
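
To illustrate what I mean by a sequential number, here is a rough sketch. The row layout (dicts with an `ip_address` key) is an assumption for the example and not the actual Discourse schema.

```python
from itertools import count


def replace_ips_with_numbers(rows):
    """Replace each distinct IP in `rows` with a sequential number.

    The same IP always maps to the same number, so counting unique
    visitors still works on the cleaned rows.
    """
    next_id = count(1)
    mapping = {}
    cleaned = []
    for row in rows:
        ip = row["ip_address"]
        if ip not in mapping:
            mapping[ip] = next(next_id)
        # Copy the row so the input is left untouched.
        cleaned.append({**row, "ip_address": mapping[ip]})
    # The mapping is returned so the caller can decide to discard it.
    return cleaned, mapping
```

As long as the mapping is kept, the numbers can be translated back to addresses, so it would have to be thrown away for the result to be more than a pseudonym.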


Also, I think that exporting your read timings and topic tracking state is something that needs to be done at some point.

1 Like

Yes it is, but we would lose many essential features, as has been discussed above.

To add to this, our lawyer points out that there needs to be some sort of reasonable time limit for how long personal information is stored. Does anyone have any clarity around this? E.g. should inactive accounts be auto-deleted after a certain period? And if so, should previous posts by this account be anonymized, or would that require deleting posts by inactive users as well?
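
For the sake of discussion, a time limit could be checked roughly like this. The two-year window and the account fields (`username`, `last_seen_at`) are assumptions for the sketch, not legal guidance and not the Discourse data model.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window; the right value is a legal decision.
RETENTION = timedelta(days=730)


def inactive_usernames(accounts, now, retention=RETENTION):
    """Return usernames whose last visit is older than the retention window."""
    return [
        account["username"]
        for account in accounts
        if now - account["last_seen_at"] > retention
    ]
```

What happens to the accounts it returns (deletion, anonymization, or a warning email first) is exactly the open question.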

Also, will Discourse assist in updating the privacy policy and terms to be compliant, or do we have to do that ourselves?

5 Likes

These are very good questions. We are working through this now and would love your feedback and input!

6 Likes

I think a general overview of what information a Discourse setup actually collects about visitors/users, as well as the mechanisms for storing and deleting this information, would be useful in this discussion. Notes on how these settings may be changed would also be nice. I’m writing a privacy policy that I want to be as precise as possible, but I can’t seem to find documentation about this.

Nginx keeps access logs for 14 days. But copies may be kept in backups for a much longer time. (In Discourse’s own backups, too?) And is there an easy way to change these settings?

The IP addresses a user had when creating the account and during his/her last visit are retained. Can these be deleted somehow? Are the IP addresses for previous visits really wiped?

Does any documentation state exactly what the cookies Discourse uses do? Exactly what data is collected? (As opposed to the expiration time of the cookies, the visitor can’t infer these things by inspecting the cookie.) The privacy policy is not precise on this point. It doesn’t even mention that there are two different cookies, _forum_session and _t (if those are the only ones).

Edit: It probably should be noted that the IP-address is retained (in the admin control panel) both for profiles posting anonymously and for users that have been anonymized.

1 Like

Thanks! I think it would be very helpful if you provided, and updated if necessary, a privacy policy that is US and EU compliant.

1 Like

Possibly. Some of the EU rules are quite onerous and sometimes unnecessary, like the “alert everyone that a website uses cookies” rule, which is actively hostile to users and the web.

7 Likes