Last Friday I updated Discourse to v2.3.0.beta8 (from beta 3, I think). Since then, it has deleted 1,500 inactive users (defined as trust level 0 without any posts, if I understand correctly). We did not want those to be deleted.
I am reporting this as a bug because I feel that Discourse should not change its behavior on such an important matter without either setting the defaults so these users are not deleted, or at least mentioning the change prominently in the release notes.
Yikes! Could someone from the team clarify the exact criteria for this automated deletion? I don't want any users deleted automatically.
I've seen people return to my forum and post after two years of TL0 inactivity. I'd hate for them to face the confusion (and annoyance) of a deleted account.
Edit: just saw this setting and have changed it (the default is 730 days).
It leads to artificially inflated stats: "we have 50,000 users", but … do you? If those users haven't been on your site in more than two years in any meaningful way, either reading or posting, at all, in any capacity whatsoever … are they really users of your site?
It's a form of global digital hygiene; it's unlikely any user wants accounts they don't use across {x} different websites on the internet. So removing them automatically cleans up their digital footprint with no effort on the user's part. (GDPR comes to mind, for example.)
Dangling unused accounts are potential points of compromise. For every user who re-uses passwords (and so many do), all it takes is one compromised account to give attackers a digital skeleton key that works on every website that user ever touched.
These reasons all make sense and I have no issue with the removal of such accounts. It seems to me the real issue in this topic is that users feel they haven't been informed of this change. I'm not sure whether this was actually in 2.3.0.beta8, but if it was, it's not in the release notes. (And yeah, if you're on beta you'd better always check those notes before upgrading.)
I have to say, I'm very disheartened by the approach here.
First of all, I'll make clear that automated data lifecycle management is a great feature, and I don't challenge the feature itself.
However, I feel a great deal of thought needs to go into rolling out new default behaviours that result in data destruction. Discourse the product clearly strives for convention over configuration, but when those conventions break, the impact on everyone can be significant.
We have clients that have been affected by this change and have had to perform full data recovery operations in their production environments. To go into some specifics here:
Why is a destructive change being rolled out as on-by-default? This functionality could have been given different defaults for upgrades than for new installs, to prevent unintended data destruction.
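One way to picture that suggestion is a setup step that picks the initial value of the cleanup setting depending on whether the site is upgrading or being freshly installed. This is a hypothetical sketch: `initial_cleanup_days`, `NEW_INSTALL_DEFAULT_DAYS` and the use of `None` to mean "never clean up" are illustrative assumptions, not Discourse's actual settings code; only the 730-day default comes from this thread.

```python
from typing import Optional

# Hypothetical names for illustration; not Discourse's real settings API.
NEW_INSTALL_DEFAULT_DAYS = 730         # default mentioned in this thread
NEVER_CLEAN_UP: Optional[int] = None   # assumption: None means "feature disabled"


def initial_cleanup_days(is_upgrade: bool, admin_value: Optional[int] = None) -> Optional[int]:
    """Choose the starting value for an inactive-user cleanup setting.

    An explicit admin choice always wins. Otherwise, existing sites that
    upgrade keep their old behaviour (no deletion), while brand-new installs
    get the new default.
    """
    if admin_value is not None:
        return admin_value
    if is_upgrade:
        return NEVER_CLEAN_UP
    return NEW_INSTALL_DEFAULT_DAYS
```

With this shape, an upgraded forum deletes nothing until an admin opts in, while new sites still get the cleanup behaviour by default.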
Why has this behaviour not been signalled more clearly? "Clean up inactive users" in a changelog under the heading "New Features" does not clearly signal the potential impact of this change.
Has sufficient consideration been given to wider use cases? Discourse does not exist in a vacuum as part of people's software stacks, and I can already think of many edge cases where Discourse might see users as "inactive" even though other integrations use the same user database for other purposes across their platforms. Not everyone uses the native SSO functionality supported by Discourse, and it is very probable that some production integrations use the User table more directly without updating Discourse's "view" of what active/inactive means. This data destruction could very well remove users who are active on those platforms but nonetheless remain inactive on the "forum" side of the integration.
While I recognise this is a beta release, I hope these concerns make sense. In general, Discourse is opinionated software, and this provides significant benefits in terms of administrative and moderation burden, but the approach taken with this inactive-user cleanup risks (and has already introduced) breaking changes. It is important to balance such changes with consistency if we want people in this ecosystem to trust update mechanisms and new features in future…
I'm slightly puzzled why anyone would roll out updates to a client site without reading the release notes and staging them first.
That's just asking for trouble.
Were I a client reading that post, I would be alarmed by the lack of due diligence being carried out here. I'm hoping that at the very minimum you've apologised for your oversight and assured them that your future release practices will involve both consulting the documentation and staging updates away from their live data?
This change happened in beta 6, about two and a half weeks ago. Why is this your first post on the topic if you've already effectively "managed an incident" and restored data? That's rather peculiar.
Discourse supports SSO and a fairly robust API; those are the widely supported means of integration. If you reach into the database directly you're asking for trouble; that's technical architecture 101. The schema can and will change without notice, which is why APIs exist. Using the user tables directly is a Very Bad Idea™.
The same reason people click "Agree" on ToS without reading them.
What a wonderful world it would be if each one of us had enough time to read all the release notes for every update, fully understand them, get any questions answered, and run comprehensive testing against a staged version to account for our own specific edge cases before moving forward.
That world, however, doesn't exist. That's great if it does for you, but not everybody can prioritize reading, understanding, and testing against release notes.
There's a certain degree of trust that upgrading Discourse (as we all should) isn't going to mess things up. Unexpectedly destroying data risks messing things up.
It's shifting blame to suggest that @TobyPinder should apologize for Discourse choosing this default for data destruction. Yes, we all have a responsibility for our own due diligence, but surely Discourse also has a responsibility to consider its users' real-world scenarios.
I, for one, had no idea this feature even existed and immediately thought "oh shit, did I lose accounts?" when I found out. I've since adjusted the "clean up inactive users" setting to 2000000000 days.
Does it really make sense, though, to have to opt out of this behavior when data destruction is irrecoverable?
At the very least, why isn't it worth considering flipping the default, so nobody risks unexpectedly losing data?
That was the client's decision, for reasons beyond the scope of this topic. I do not expect "beta" software to be supported, nor are any of my concerns here about "support". The discussion is very relevant to the subsequent release of this software.
As already identified in my original response and by multiple parties in this thread, the documentation is completely inadequate in describing this impact.
People can and will continue to use all software imperfectly. This does not absolve Discourse of the responsibility to make safe changes. I am providing one of many potential scenarios in which Discourse's unilateral deletion of data can have wider impacts on a wider system. This simplified example demonstrates those impacts on forum administrators very clearly, but it is by no means exhaustive.
If that's not a sufficient example for the members of this thread, consider that across many industry verticals there are legal or regulatory requirements for retention periods longer than the stated two years. So this change and the defaults it introduces could expose these platforms to legal risk through data destruction. That the ramifications of such a change have not been considered holistically, beyond a very vague changelog item, is troubling, and I would suggest it be reviewed.
This is the big difference between hobbyist IT and running any real production service at scale.
If any of my staff applied a bleeding-edge update to a live service without consulting the documentation and running it through staging, that would be immediate grounds for termination as gross misconduct.
15 years ago, establishing a representative test environment was an arduous task: it usually required additional hardware and in many cases additional software licensing, but even then any formal release management process required it. Now, in 2019, it's ridiculously easy to clone entire platforms to conduct such tests.
It does, and your practices will change the first time you make an egregious error and lose customers, revenue, or your job.
There is an alternative approach here which we frequently end up recommending: wait.
Tests-passed is the default, and it's much easier to support, but if you're really unwilling to do the basics to de-risk a technical change, then you should probably think about letting stable catch up and letting others do some of the heavy lifting. You're still going to need to read the documentation (that responsibility never goes away), but all of the other risks associated with "living on the edge" go away.
This really isn't true: if you decide to misuse something, then you can't really complain when it doesn't behave as you expected.
The users will still remain in backups, which is the actual regulatory requirement here in the UK. There's no legal requirement in any sector, public or private, to keep credentials in a live system. Financial records? Yes. Legal ledgers? Absolutely. I work with organisations on GSI and they have extensive journalling requirements, but empty user records should be purged once they've outlived their usefulness; that's both a recommendation from the ICO and part of Article 5 of the GDPR (data minimisation).
As mentioned by others, sometimes this decision is beyond our control, and the documentation around this is insufficient.
People are losing users unexpectedly as a result of this change. "That's your fault. You should apologize to your clients." is an unacceptable response.
You are correct though, there is a big difference between hobbyist IT and running any real production service at scale. If Discourse aims to serve both and everything in between, as it purports to, then potentially destructive changes need to be handled with more consideration.
So I'm with Jeff here. I'd argue that sites aren't losing "real" users. These are users who signed up two years ago, never created a single post, did not read more than 15 minutes' worth of posts, and haven't been seen since. These users haven't been emailed a digest in a year (assuming they didn't unsubscribe earlier). These aren't users. These are drive-by accounts that were never used.
Clean-up like this is not uncommon. If you open a bank account but never put money in and never visit the bank, your account will be closed. If you reach out to a doctor's office but never go in for an appointment, or even schedule one, you'll be removed as a patient. Same idea here.
Sure, if you do not have backups, this data is not recoverable. However, what is the purpose of keeping this data? Discourse is storing this user's email, encrypted/salted password, and the IP address they signed up from. Discourse might store the fact that they read a topic or two, but not many, since they haven't read for 15 minutes. What do you need this data for? Discourse isn't even emailing this user anymore, so why keep information about them?
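The criteria described in this thread can be summarised as a single predicate. This is a hypothetical sketch assembled from the posts (trust level 0, no posts, under 15 minutes of reading, and signed up and not seen for longer than the 730-day default); the `User` fields and the `is_cleanup_candidate` name are illustrative assumptions, not Discourse's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from this thread; the code itself is a hypothetical model.
CLEAN_UP_INACTIVE_USERS_AFTER_DAYS = 730  # default mentioned earlier in the thread
MAX_READ_TIME_SECONDS = 15 * 60           # "did not read more than 15 minutes"


@dataclass
class User:
    trust_level: int
    post_count: int
    time_read_seconds: int
    created_at: datetime
    last_seen_at: Optional[datetime]  # None if never seen after signup


def is_cleanup_candidate(user: User, now: datetime,
                         days: int = CLEAN_UP_INACTIVE_USERS_AFTER_DAYS) -> bool:
    """Return True if the user matches every inactivity criterion."""
    cutoff = now - timedelta(days=days)
    # Treat a user who was never seen as last active at signup.
    last_activity = user.last_seen_at or user.created_at
    return (
        user.trust_level == 0
        and user.post_count == 0
        and user.time_read_seconds < MAX_READ_TIME_SECONDS
        and user.created_at < cutoff
        and last_activity < cutoff
    )
```

Every condition must hold, so a single post, 15 minutes of reading, or one recent visit is enough to keep an account. In this sketch, raising `days` simply moves the cutoff further back.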
On the topic of release notes, we do our best to mention changes, but we can't detail everything every time. Discourse development moves very quickly, and writing up details on every change would nearly be a full-time job. To that point, we say right at the top of the second post in every set of release notes (emphasis mine):
We do our best to highlight new features and changes for you, but there's always too many changes to detail. For a full list of new features, bug fixes, UX improvements, and more, be sure to review the Additional Features and Fixes listed below.