Disabling unnecessary logging for GDPR compliance


#1

I’d like to disable logging for my Discourse instance for several reasons:

  • #gdpr paranoia - I do not want to be collecting and storing anything that’s surplus to requirements - especially where IP addresses are concerned* - anonymous users have not given me permission to collect their IPs. I don’t want to risk breaking this draconian and poorly-defined new law

  • Performance - logging all requests to disk is unnecessarily hammering my SSD, and probably incurs a small performance penalty when a page and all its assets are requested by the client

  • I don’t need the logs

What’s the easiest and most bullet-proof way of disabling:

  • nginx logging for my instance, please?
  • IP addresses in any other logs (e.g. rails logs) where anonymous users are concerned?

*yes, I know that the internal nginx instance may only see the 172.17.0.1 Docker interface IP. EU legislators are unlikely to understand or care about this technical detail. They’ll just see “IP address,” which is on their list of “personally identifying information”


#2

The truth is that in practice every HTTP server stores IP addresses at least temporarily, because that is how the web works. Without collecting and storing IP addresses you would not be able to protect the server against, for example, DoS attacks.

I am not a lawyer, but reading Art. 6 GDPR (Lawfulness of processing) suggests that GDPR does not require you to get consent to collect IP addresses of anon visitors. We can read that:

Processing shall be lawful only if and to the extent that at least one of the following applies:

f) processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party

Then this mentions that maintaining network and information security is a legitimate interest:


(Sam Saffron) #3

Going to help you do the fishing yourself here :fishing_pole_and_fish:

So start with NGINX and get it going

Step one … hack NGINX conf file

./launcher enter app
vim /etc/nginx/conf.d/discourse.conf
# edit the file so it does what you want ... remember the edits ... 
sv restart nginx
# confirm you are no longer logging by looking at `/var/log/nginx`
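For illustration only, here is a minimal sketch of what the edit in step one might look like when scripted rather than done by hand in vim. It assumes the config file contains one or more ordinary `access_log` directives and that GNU sed is available inside the container; the function name `disable_access_log` is hypothetical and not part of Discourse.

```shell
#!/bin/sh
# Sketch only: rewrite every access_log directive in an nginx config file
# to "access_log off;". The function name is hypothetical; the default path
# is the file edited in step one.
disable_access_log() {
    conf="${1:-/etc/nginx/conf.d/discourse.conf}"
    # Match an optional indent followed by "access_log <anything up to ;>"
    # and keep the indent. Commented-out lines ("# access_log ...") are not touched.
    # Running this twice is harmless: "access_log off;" is rewritten to itself.
    # Assumes GNU sed (-i without a suffix argument).
    sed -i -E 's/^([[:space:]]*)access_log[[:space:]][^;]*;/\1access_log off;/' "$conf"
}
```

Inside the container this would be followed by `sv restart nginx`, as above, and a look at `/var/log/nginx` to confirm that nothing new is being written.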

Step two … turn your hack into a re-runnable script and mix in to your bootstrap

See how the rate limiting yml script is able to “fix” up the nginx file with replace commands… you want that:

Step three

Post a howto on how to do this in case someone else wants to :vulcan_salute:t3:

And after we get this done we can move on to rails logs.

note … never logging this stuff may leave you in a minor bind if you are under attack or are trying to debug stuff.


#4

Many thanks Sam, that’s helpful.

I’ll post back with a howto.

In the case of an ongoing DDoS, I’d probably temporarily re-enable access logging (I assume this can be done by entering the running Discourse Docker container, altering the Nginx config and doing an nginx -s reload).
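A rough sketch of that temporary re-enable, assuming the config currently contains `access_log off;` and that GNU sed is available. The function name `enable_access_log` and the default log path are assumptions for illustration, not anything Discourse ships.

```shell
#!/bin/sh
# Sketch only: turn "access_log off;" back into a real access_log directive.
# The function name and the default log path are assumptions.
enable_access_log() {
    conf="${1:-/etc/nginx/conf.d/discourse.conf}"
    log="${2:-/var/log/nginx/access.log}"
    # "|" is used as the sed delimiter so slashes in the log path need no
    # escaping; a path containing "|" or "&" would break this sketch.
    # No log format is named, so nginx falls back to its predefined
    # "combined" format.
    sed -i -E "s|^([[:space:]]*)access_log[[:space:]]+off;|\1access_log ${log};|" "$conf"
}

# Inside the container (./launcher enter app) one would then check the
# config and reload it without dropping connections:
#   enable_access_log && nginx -t && nginx -s reload
```

Since the change is made by hand inside the running container, it would presumably not survive a rebuild, which is arguably what you want for a temporary measure.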


(Michael - DiscourseHosting.com) #5

That doesn’t make any sense. They will understand such details.

This entire change doesn’t make any sense either.

There is no need to disable logging IP addresses in access logs because of the legitimate interest in recital 49. It would be sufficient to limit the access log storage to a week or so.
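As a hedged illustration of limiting retention rather than disabling logging, the sketch below deletes rotated access logs once they pass a cutoff. It assumes the logs are rotated into files named `access.log.*` under `/var/log/nginx` and that `find` supports `-delete`; the function name `prune_access_logs` and the 7-day default are hypothetical.

```shell
#!/bin/sh
# Sketch only: delete rotated nginx access logs older than a retention period.
# The function name, directory and default of 7 days are assumptions.
prune_access_logs() {
    dir="${1:-/var/log/nginx}"
    days="${2:-7}"
    # 'access.log.*' matches rotated files (access.log.1, access.log.2.gz, ...)
    # but not the live access.log itself, and not error logs.
    # -mtime +N counts whole 24-hour periods, so +7 matches files that are
    # at least 8 days old.
    find "$dir" -maxdepth 1 -type f -name 'access.log.*' -mtime "+${days}" -delete
}
```

This only bounds how long rotated files are kept; how much history sits in the live `access.log` still depends on how often rotation runs.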

You will still be storing the IP addresses in the Discourse database, for instance where reading times and searches are logged. Now that is outside of the legitimate interest in recital 49, and that is the real problem here.


#6

Hold hold, hold up.

Where do you get the “week or so” from?

How can you be so sure that this would be legitimate interest, but more than a “week or so” would not be legitimate?


(Stephen) #7

Recital 49 covers data which the processor has a legitimate interest to collect, so as to be able to resist accidental events or malicious acts.

Log files fall well within that.

Depending on the type of attack, the scope and scale of breach, keeping nginx logs for months or even longer would qualify.

Turning them off entirely would, if anything, leave you open to a claim that you weren’t doing enough to protect user data.


(Michael - DiscourseHosting.com) #8

I’m not sure, that’s why I was saying “or so”.

Personally I would be able to explain why I would be keeping those log files around for a week, but I wouldn’t be able to explain why I’d still be keeping them after two weeks. So my retention period for legitimate interest would be between one and two weeks.

And of course: in case of an actual attack we’d be keeping them around longer, or at least we’d be keeping the interesting parts.