Hi. I’m reviving this discussion to extend it beyond IP addresses, which is what everyone has talked about so far. But Discourse logs, and displays publicly, far more information, which is not nearly as justifiable as IP logging: date joined, date of last post, trust level, how much time spent reading, how many topics read, how many hearts given and received, etc. In addition to the GDPR, I’m concerned about the US COPPA, because some of my site’s users will be under 13, and, as a privacy enthusiast, I find that collecting all this information just rubs me the wrong way.
If we really have to dig through the code to find all this, I suppose one of those 12-year-olds would be happy to do it. But really, I can’t be the only person who thinks this way, and I’d really like one big check box under Basic Setup saying “Maximize privacy.”
I can’t speak to GDPR/COPPA specifics (I’m not a lawyer), but I’m a bit confused here. Quite a bit of what you listed is already public: last post date, likes given/received, and most of the other data on the user summary page (topics and posts created, topic replies, top topics, top links…). As for the rest, I’m failing to understand the concern - knowing how many topics I’ve read, for example, doesn’t seem like personally identifiable information, or a privacy concern.
If users are concerned, they can enable the “Hide my public profile and presence features” user preference to prevent non-staff users from seeing their profile page.
The problem with this is that “privacy” means something different to everyone. What is a privacy leak in your mind is perfectly fine in someone else’s. Also, whose privacy is the setting trying to protect? And from whom?
Can the site be public? That allows non-users of the site to see content. I guess the site becomes login required - that prevents anonymous users from accessing the site. Perhaps registration should be disabled - that prevents anonymous users from signing up. What about private email - can we trust users of the site not to forward emails they receive with content from the site?
I’m guessing none of the settings I just listed were what you were thinking of for “maximize privacy”, but all of them change the privacy of user content. My point here is that a one-size-fits-all setting for something as broad as “privacy” isn’t going to work.
It would help a lot if that “Hide my public profile and presence features” user preference were split in two. Kids want to be able to show “I like Chinese food and anime” but not “I spent six hours here yesterday.” And then all I’d need is a way to turn on “Hide my presence” by default for new users.
But better than hide would be don't collect, so staff don’t have access either, and (not that I expect this to be an issue for our young users) neither do people with subpoena powers.
“Hide my presence” isn’t what you think it is. Presence is the indicator in the composer that someone else is typing, nothing else. Disabling presence will not hide “I spent six hours here yesterday”.
“Don't collect” would break Discourse. Don’t collect post date, no posting allowed. Don’t collect like count, no liking. Don’t track posts read, no indication to the user what they’ve read. Disabling all of this would turn Discourse from forum software into a static website.
All this data has to be stored somewhere. Sure you could write a plugin to hide it from the web UI, but it still would need to be in the database, and thus still very much subpoena’able.
Extreme over-simplification of the database to follow:
Going to use likes as the example as it’s simpler than posts.
When a user likes a post, that action causes multiple changes in the database:
The like_count for the post is incremented by 1.
The likes_given for the user is incremented by 1.
The fact that user foo liked post #123 is stored.
To show the like count underneath the post, we must increment the like count for the post. If likes_given isn’t incremented, that number isn’t as quickly obtained, but it can still be determined from the “user foo liked post” data.
If we also stop tracking which user likes which post, we would only be able to display the count of likes below each post, not who liked it. We’d also no longer be able to limit users from liking the same post multiple times, or limit how many likes they give each day. A single user could sit and like a post infinite times.
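To make the over-simplification above concrete, here is a minimal Python sketch. It is not Discourse's actual schema or code; the class and method names (`LikeStore`, `like`, `likes_given_from_records`) are made up for illustration, and only `like_count` and `likes_given` come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class LikeStore:
    """Toy in-memory stand-in for the three pieces of like data (hypothetical)."""
    post_like_count: Dict[int, int] = field(default_factory=dict)    # post id -> like_count
    user_likes_given: Dict[str, int] = field(default_factory=dict)   # username -> likes_given
    like_records: Set[Tuple[str, int]] = field(default_factory=set)  # (username, post id)

    def like(self, username: str, post_id: int) -> bool:
        """Record a like; return False if this user already liked this post."""
        if (username, post_id) in self.like_records:
            # Only the per-user record makes this duplicate check possible.
            return False
        self.like_records.add((username, post_id))
        self.post_like_count[post_id] = self.post_like_count.get(post_id, 0) + 1
        self.user_likes_given[username] = self.user_likes_given.get(username, 0) + 1
        return True

    def likes_given_from_records(self, username: str) -> int:
        """Recompute likes_given by scanning the records: slower, same answer."""
        return sum(1 for user, _ in self.like_records if user == username)


store = LikeStore()
store.like("foo", 123)
store.like("foo", 123)  # rejected as a duplicate
store.like("foo", 124)
print(store.post_like_count[123], store.user_likes_given["foo"])  # prints "1 2"
```

Dropping `like_records` from this sketch would leave the per-post counter working, but `like` could no longer reject the second call, which is the "like a post infinite times" problem.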
I see the argument about likes (although I don’t like likes as a forum feature), but how does not collecting who posted when prevent posts? I see that it prevents preventing multiple posts. So for spam-prevention maybe you want to keep the posting data for 24 hours and then delete it.
But, I get that I have very different ideas about these things from the Discourse team. You probably have a user community mostly of young adults. The needs of a forum for children are very different – and so are the preferences of someone old enough to remember when we had privacy.
In and of itself it doesn’t, but it would make the forum anonymous, and completely change how it works. So I may have been a bit overzealous here - it may not completely prevent posting, but it would definitely break how one expects Discourse to work.
You’ve got that right. We do not think about COPPA when building Discourse. Thinking back to helping set up an online service for someone under 13, I recall all sorts of extra hoops to jump through for registration, as well as full moderation of all content prior to posting. I don’t recall all posting being anonymous, though.
In any case, you could likely make Discourse work for your needs, but you’d need to do significant work via a plugin to make it happen. I don’t see this type of “don’t collect data” becoming a core feature.
Yes, I was just going to say that to Joshua (time for me to consult the university’s lawyers) when you posted that.
I don’t want posts to be anonymous, by the way. So yes, by scraping the actual posts one could reconstruct a log. But that’s different from us keeping a log on purpose and displaying it with the user’s profile.
Long ago, when I was a callow youth and had never thought about privacy as an issue, we implemented a public display of users’ time of last logout. The idea was that our users woke up and slept in different phases, and you could use the last logout time to make inferences about whether your friend was awake now or not. To my astonishment, users complained; they were worried that people might instead make inferences about whether they were slacking. That was when I first learned that sometimes it’s better for sysadmins not to know some things.
Anyway, thanks for the discussion and the education.
The one thing you will achieve with any of the above is indicating which users are under 13, effectively marking them out.
Organisations I work with who have children or vulnerable adults posting usually enquire about additional protections, but quickly realise that unless these changes are made to all users they’re basically drawing a bullseye on the people they’re trying to protect.
This part implements the Children’s Online Privacy Protection Act of 1998, (15 U.S.C. 6501, et seq.,) which prohibits unfair or deceptive acts or practices in connection with the collection, use, and/or disclosure of personal information from and about children on the Internet.
I don’t see how any of the various forum related data could be considered within that scope. That is, I don’t see it as being either unfair or deceptive. Nor do I consider it to be PII.
(I work with Brain, and am maintaining our Discourse instance.)
I think in many ways, the COPPA-specific discussion is a bit of a distraction from the general concern, which is doing the most we can for user privacy. There is a difference between something being technically possible but difficult to do, and something that is already done for you. (Especially since, given a large enough body of text, even an anonymous user can be de-anonymized with reasonable certainty.)
(We are using SSO for our forum, so those issues of user validation and age are handled elsewhere.)
But in general, I think the request is more basic: we’d simply like to have forum defaults for users which are pretty privacy conscious. I.e., “By default user stats are not shown on profile pages”, similar to how you can hide suspension reasons or whitelist custom fields to show on user pages. A starting point of “whitelist forum data to show” would be really helpful.
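As a rough illustration of what “whitelist forum data to show” could mean, here is a small Python sketch. It assumes nothing about how Discourse or its plugin API actually works; the function `public_profile`, the default set `DEFAULT_PUBLIC_FIELDS`, and the field names are all hypothetical.

```python
from typing import Any, Dict, Iterable

# Hypothetical privacy-conscious default: nothing is shown unless explicitly listed.
DEFAULT_PUBLIC_FIELDS = frozenset({"username", "bio"})


def public_profile(
    profile: Dict[str, Any],
    allowed: Iterable[str] = DEFAULT_PUBLIC_FIELDS,
) -> Dict[str, Any]:
    """Return only the whitelisted fields of a stored profile."""
    allowed_set = set(allowed)
    return {key: value for key, value in profile.items() if key in allowed_set}


# Made-up stored profile; the stat names are illustrative only.
stored = {
    "username": "foo",
    "bio": "I like Chinese food and anime",
    "time_read_hours": 6,
    "likes_given": 42,
}

print(public_profile(stored))
# An admin could opt specific stats back in:
print(public_profile(stored, DEFAULT_PUBLIC_FIELDS | {"likes_given"}))
```

Note that a filter like this only hides data at display time; the stats would still sit in the database, so it addresses the “shown on profile pages” part of the request and not the “don't collect” part.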
If these are settings we could make plugins for, that’s probably a reasonable path too.
There’s nothing stopping you from overriding these values directly in the database. For most everyone else, though, the things which have been flagged as privacy concerns above are actually central to promoting discussion.
Take the presence indicator on topics: I could be drafting a reply during a back-and-forth when I see that @codinghorror is also responding. It gives me the option to hold off on my reply until I’ve read his response. The indicator doesn’t facilitate any kind of abuse; it merely gives me more information and context around which I can make decisions.
Rather than broadly flagging the above as privacy concerns, could you possibly elaborate, for each one, on how you believe privacy is breached by displaying the information and how it might be abused?