After an update, Discourse deleted 1500 inactive users who hadn’t been seen for two years

Thank you @jomaxro for your thoughtful and clear response.

The issue that sticks out to me is an assumption about what this data means. My own main use case uses SSO and the Discourse API to merge users between 2 systems. I also have processes outside of Discourse’s view entirely that account for this data, so it strikes me as heavy-handed that Discourse thinks it knows more about the context of my own data than I do.

This is why I wonder why flipping the default wouldn’t make more sense so that we have to opt in to this cleanup functionality. It’s impossible for any Discourse instance to know with 100% certainty that something else isn’t using this data.

That should be for each one of us to decide for ourselves, opting into the pruning if desired.

As for documentation, I totally understand and appreciate the effort involved to keep up. That’s kind of the point: there are so many changes to detail that it’s also unrealistic to expect all of us to fully understand every single part of a release.

Just as it would be nearly a full-time job to fully document every change, it can also be a prohibitively expensive task to comb through every part of every release before upgrading, especially for those of us who run small businesses.

That’s why “Clean up inactive users” feels so surprising as a footnote. It’s painful to miss this detail specifically because it can result in unexpected loss of data. That note seems out of place alongside the 17 other “New Feature” lines that aren’t potentially destructive.

4 Likes

If you have many users who are only using Discourse for identity, then it’s probably time to deploy an IdM.

This isn’t a technical problem. This change has highlighted a design problem within your environment.

You are certainly entitled to an uninformed opinion about the details of my environment.

Regardless, I am still not sure what the harm is in making this feature opt-in. There are clearly several users here who came upon this as an unexpected and unpleasant surprise.

I agree though that this isn’t a technical problem. It’s a release management problem.

6 Likes

I think this is a pretty harmless change all in all, but maybe there could be a framework for presenting new site settings and important changes to the admin on the first visit after an upgrade so they can tweak them before they affect anything? Like a mini version of the first-time setup wizard?

Being able to sort the site settings list by “newness” would also be nice to help admins discover all the new features. :slight_smile:

13 Likes

This isn’t meaningful “data” that is being “lost”, though. As you can see with GDPR and similar, we feel it is more appropriate to stand up for users and the overall health of the web by having a safe default that cleans up old, unused accounts automatically.

Regardless, it is a site setting so site owners can tweak it to taste. If you want it to be three, four, or five years instead of two, go for it.

4 Likes

It isn’t for you to determine what my data means to me. See my earlier points about that specific assumption.

It is resolved for me now though yes, as I have updated that site setting.

The lingering concern is that I only found out about this by chance through this thread. I might not be so lucky in the future if Discourse sees this whole thing as a non-issue with no opportunity for improvement.

In that regard, @ssvenn’s suggestion seems like a fantastic idea to mitigate similar issues going forward should they pop up.

7 Likes

It also isn’t for you to decide that other people’s data from abandoned, never used accounts belongs unconditionally to you, forever.

6 Likes

I very much agree.

I’m not against this feature by any means. I think it is a great feature. It’s the handling of its rollout that strikes me as revealing an opportunity for improvement.

I don’t think it’s unreasonable to not want to be surprised by something like this.

11 Likes

IMO the release notes didn’t draw adequate attention to a destructive change.

Not only that, but many of us update our instances every few days - taking in changes before the official release notes round-up. We cannot feasibly read every git commit comment before hitting the button.

It would be handy for the admin updates page to show any major/breaking changes (or new site settings).

7 Likes

It also isn’t for you to decide that other people’s data from abandoned, never used accounts belongs unconditionally to you, forever.

Again, the principle is more than valid but the execution has been colossally flawed, and this standoffish attitude in service of the real goal (data lifecycle management) is increasingly serving only to undermine confidence in the transparency of Discourse’s release cycle, which threatens to reduce patching…

Please consider the holistic usage of the Discourse platform across the wider assortment of use cases that you and I can know little about, instead of painting dissent and questioning around the migration plan as some kind of bad-faith data hoarding exercise.

7 Likes

If Discourse is not working as you desire, feel free to choose another free open source software that better meets your needs.

3 Likes

Discourse isn’t an identity management platform. If there are accounts that have never read anything or responded to anything, they are by definition not users.

In any scenario where Discourse is subordinate to SSO or an IdM, were those users to return they would be re-created with the same attributes.

5 Likes

One question: if a user only likes to read posts, does their account remain active?

Correct, if their read time is non-zero they aren’t affected. The accounts have to be totally unused to be hit.
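To make that rule concrete, here is a rough sketch of the check as described in this thread. It is not Discourse's actual implementation: the `Account` fields, the `is_cleanup_candidate` function, and the `inactive_after_days` parameter (a stand-in for the site setting, using the two-year default mentioned above) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Account:
    """Hypothetical, simplified view of a forum account."""
    username: str
    last_seen_at: Optional[datetime]  # None if the user never visited
    created_at: datetime
    read_time_seconds: int
    post_count: int


def is_cleanup_candidate(account: Account, now: datetime,
                         inactive_after_days: int = 730) -> bool:
    """Return True if the account looks totally unused and stale.

    Assumption: a non-positive threshold means cleanup is turned off.
    """
    if inactive_after_days <= 0:
        return False
    # Any reading or posting means the account has been used.
    if account.read_time_seconds > 0 or account.post_count > 0:
        return False
    # Fall back to the creation date for accounts that were never seen.
    last_activity = account.last_seen_at or account.created_at
    return now - last_activity > timedelta(days=inactive_after_days)


if __name__ == "__main__":
    now = datetime(2019, 6, 1)
    ghost = Account("ghost", None, datetime(2016, 1, 1), 0, 0)
    reader = Account("reader", datetime(2016, 1, 1), datetime(2015, 1, 1), 120, 0)
    print(is_cleanup_candidate(ghost, now))
    print(is_cleanup_candidate(reader, now))
```

Under these assumptions, raising `inactive_after_days` (say, to five years) only narrows the set of candidates, and a read-only user is never a candidate at any threshold.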

2 Likes

Great!

It’s great that a user who only reads posts and never creates topics will keep an active account.

I want to change the setting so that accounts inactive for 1 year are deleted and don’t take up space :slight_smile:

3 Likes

This topic is getting way too heated for my liking, so I am putting it on freeze for a week.

  1. I am now going to update the old release notes with a very clear, prominent description of the feature.

  2. I am going to update our #releases topic to ensure we call this out properly on the blog in the upcoming major release. I feel that protecting user data by default is an important measure, so it should be announced properly when we finally release 2.3.

This change is good for 99.9% of communities. The edge-case 0.1% that use Discourse as an identity provider, or for some other obscure case, have a clear workaround.

EDIT

(1) and (2) are now done, see:

22 Likes

This topic was automatically opened after 6 days.

So Discourse is the SSO master? And users are logging in to the other system via Discourse? If so, it would seem like these logins should count as logging in.

1 Like

It’s the other way around: my other system handles auth and then logs into Discourse from there.

It’s beside the point either way though; it’s the users that aren’t logging in that I am specifically concerned about, as I (and others) may not necessarily agree with Discourse’s conclusion that these users represent worthless data. The implementation details are irrelevant to this concern.

It’s all a non-issue for me now though. The updates to the release notes were great :+1: and @ssvenn’s suggestion to call out new site settings sounds like an awesome check against any future cases we can’t yet anticipate. I hope it gets some further consideration.

5 Likes