Problematic IP address fields

The GDPR is something I take very seriously.

However, my forums do not have legal teams to pick through this new and poorly defined law.

The fines are huge, and there are always axe-grinding members looking to cause trouble for a forum. For me to keep running forums, I need to know the software I’m using is compliant with the new law.

If I interpret the law correctly then we need to ensure the following:

  • IPs that have been stored for users without their consent absolutely need to be scrubbed from our database, and IPs must no longer be stored for anonymous visitors.

  • When a signed-up (or signing-up) user visits the forum, they need to see a consent screen with an unticked box and an explanation of how the IP will be used.

  • If consent is not given, they cannot be allowed to use the forum.

For the record, I absolutely deplore laws like this, as they do a poor job of protecting our rights yet harm millions of businesses and scare the hell out of well-meaning and ethical operators.

I’m absolutely relying on the Discourse team here to take some action to protect its forum operators.

They can be stored, but no longer than necessary for a legitimate purpose.

For rate limiting, there is a legitimate interest and this period is pretty short.

For deduplicating link clicks, there is a legitimate interest but they need only to be stored in Redis for 24 hours. I don’t see any reason at all to keep them in the database.
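
To make the 24-hour idea concrete, here is a minimal sketch of click deduplication with an expiring key. It is not how Discourse actually does it: a plain Python dict stands in for Redis, and the class name, method names, and the injectable clock are all made up for illustration. The point is only that the IP lives in a short-lived store and never reaches the database.

```python
import time


class ExpiringClickDeduper:
    """In-memory stand-in for Redis keys with a TTL (hypothetical helper)."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry can be tested
        self._expires_at = {}  # (link_id, ip) -> expiry timestamp

    def _purge(self, now):
        # Drop every entry whose TTL has elapsed, so IPs do not linger.
        expired = [key for key, expiry in self._expires_at.items() if expiry <= now]
        for key in expired:
            del self._expires_at[key]

    def should_count(self, link_id, ip):
        """Return True only for the first click from `ip` on `link_id` within the TTL."""
        now = self.clock()
        self._purge(now)
        key = (link_id, ip)
        if key in self._expires_at:
            return False
        self._expires_at[key] = now + self.ttl_seconds
        return True

    def stored_ips(self):
        """Return the set of IPs still held after expired entries are purged."""
        self._purge(self.clock())
        return {ip for (_, ip) in self._expires_at}
```

With real Redis the same effect would normally come from setting a key with an expiry, so the purge step would be handled by the server rather than by application code.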

I don’t see the purpose or a legitimate interest for keeping IP addresses in search logs or incoming links.

In contrast to the opening post, I do think that topic_views and user_profile_views are problematic. After all, Redis is already deduplicating IP addresses, so there is no need to store the IP address for longer than the "topic view duration hours" setting.

Thanks for the info. Out of interest, where are legitimate purposes and storage limits defined in the lawbooks?

Lawful purposes and legitimate interests are in Article 6 of the GDPR.

Recital 49 talks about usage of data for network and information security.
Recital 47 mentions fraud prevention and direct marketing as legitimate interests. Deduplicating link clicks and topic views could be considered fraud prevention.

There are no hard storage limits defined. The time you need to keep an IP address in order to deduplicate statistics depends on the granularity of the accumulated statistics.

Sent in the first PR for cleaning this up: https://github.com/discourse/discourse/pull/5826

Just sent in 4 more PRs:

https://github.com/discourse/discourse/pull/5850
https://github.com/discourse/discourse/pull/5851
https://github.com/discourse/discourse/pull/5852
https://github.com/discourse/discourse/pull/5853

aaaand the linkback bot is going crazy with the edits to the OP, oops…

Very good. @sam can review these and make the call on 2.0 versus 2.1 depending on risk.

Although I absolutely welcome these PRs, I do want to emphasize that storing the IP addresses of visitors without an account (for longer than needed for deduplication) is a much more problematic issue, since those people cannot easily be asked to give their consent.

Yeah, I was starting to work on that and it’s a bit tricky due to all the various ways that topic view data is used for logged-in users! And topic views are interesting in that only the first time a user or IP sees a topic is counted right now - it doesn’t reset daily like some of the other data.

One thing I should mention, since it can help with GDPR stuff: when IPs are anonymized, all of the problematic IPs identified in the OP are replaced.

This behavior is only available via plugins right now, but it does work.

12 posts were merged into an existing topic: GDPR countdown and compliance

@riking once we get ALL of these sorted we can start looking at “data hoarding” reduction.

So, for example, we can roll up incoming links daily, throwing away IPs and keeping only anon vs. logged-in counts per day (and follow a similar pattern for search).
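
A rough sketch of what such a daily rollup could look like, in Python for illustration only. The row shape (the keys created_at, user_id, ip_address) and the function name are assumptions, not the actual Discourse schema. The idea is just that the IP is read past and never copied into the aggregated result.

```python
from collections import defaultdict
from datetime import datetime


def roll_up_incoming_links(rows):
    """Collapse raw rows into per-day anon vs. logged-in counts, dropping IPs.

    Each row is a dict with hypothetical keys: 'created_at' (datetime),
    'user_id' (None for anonymous visitors) and 'ip_address'.
    """
    totals = defaultdict(lambda: {"anon": 0, "logged_in": 0})
    for row in rows:
        day = row["created_at"].date()
        bucket = "anon" if row["user_id"] is None else "logged_in"
        totals[day][bucket] += 1  # the IP address is never carried over
    # Return plain dicts ordered by day; only counts survive.
    return {day: dict(counts) for day, counts in sorted(totals.items())}


if __name__ == "__main__":
    sample = [
        {"created_at": datetime(2018, 5, 20, 9, 0), "user_id": None, "ip_address": "203.0.113.7"},
        {"created_at": datetime(2018, 5, 20, 17, 30), "user_id": 42, "ip_address": "198.51.100.3"},
    ]
    print(roll_up_incoming_links(sample))
```

Once a day has been rolled up like this, the raw rows for that day (and their IPs) could be deleted, and the same shape would work for search logs.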

But first let’s sort out all these PRs.
