After cleaning up my forum’s database, we have a few hundred hidden posts that were deleted during the process, and I have two questions about deleting posts in Discourse.
Is there a way to permanently delete all the hidden posts?
Will the posts be deleted automatically someday, or will they stay forever and be visible to admins when they check a user account?
Well, deleted posts are not completely deleted; they always remain visible to admins.
So the answer is no, unless you get your hands on the database directly. You can probably do it from the Rails console, but is it really essential to delete something that is visible only to you or your admins?
Deleting rows in the posts table from the database is not recommended. If you do try, make sure you keep a backup to restore to if (when) you mess it up.
Is there by now a way to really delete “deleted posts” from the database?
We have received our first GDPR hearing from the German authorities.
The hearing is not related to our Discourse forum, since the person who prompted the German authorities to get involved did not, as far as I know, have a login there, but it is related to other systems of ours.
Nevertheless, I would really like to be on the safe side by deleting these kinds of posts from the database. I think it is not GDPR compliant that an admin can view and even restore posts that were deleted weeks, months or even years ago. I can see posts that were deleted more than 17 months ago, when the forum started, and as an admin I could even restore them.
Is there no background job that could, for example, remove “deleted posts” from the database once they have been deleted for more than a month (a configurable time frame)?
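To make the idea concrete, here is a minimal sketch of the selection step such a job would need: picking the soft-deleted posts whose deletion is older than a configurable retention window. This is not an existing Discourse job or setting; `Post`, `posts_due_for_purge` and `retention_days` are hypothetical names used only for illustration, and the only thing taken from Discourse is the idea of a `deleted_at` timestamp on a post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Post:
    """Hypothetical stand-in for a forum post record."""
    id: int
    deleted_at: Optional[datetime]  # None means the post was never soft-deleted


def posts_due_for_purge(
    posts: List[Post],
    retention_days: int,
    now: Optional[datetime] = None,
) -> List[Post]:
    """Return the soft-deleted posts whose deletion is older than the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [
        post
        for post in posts
        if post.deleted_at is not None and post.deleted_at < cutoff
    ]
```

A scheduled job would call this with something like `retention_days=30` and then decide what to do with the result; what “really deleting” should mean for those posts is a separate question.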
Personal data shall be:
…
(c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’);
…
(e) kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; personal data may be stored for longer periods insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) subject to implementation of the appropriate technical and organisational measures required by this Regulation in order to safeguard the rights and freedoms of the data subject (‘storage limitation’);
This is such a subjective area that, if I were on the Discourse team, I would expect to see legal judgements or other precedents that enforce what you want to do.
From my point of view, yes, since it is a European law that everyone who handles the personal data of Europeans has to comply with. In the global corporation where I work as a senior IT consultant, we have spent the last 18 months making sure that no personal data is stored longer than it is needed for its intended use.
Next month we will have a re-audit for the ISO/IEC 27001 certificate, and GDPR compliance will now also be reviewed as part of the checks that we conform with laws and regulations.
Therefore I am quite sure that storing the data of “deleted posts”, especially with the possibility of reactivating it, is not GDPR compliant (at least not for data that was “deleted” more than 30 days ago).
So I also want to make sure that the forum where I am doing voluntary work is GDPR compliant.
Discourse has already changed a couple of things for GDPR, for example: changed the logging, added the ability for users and admins to extract personal data with external plugins, added the ability to remove user details from the logs after an anonymization …
This seems to be the last major topic that needs to be fixed to be fully GDPR compliant.
I do not want to wait until the authorities are suing our non-profit organization for not being GDPR compliant and only then fix the problem. A problem that has been identified should be fixed up front.
I would be interested to see where the law says “personal data” as in posted content vs. “personally identifiable information” as in email address etc.
I am not talking about an anonymized user, that is a totally different topic.
I am talking about the personal data of active users, data that was “deleted” but is not really deleted and could be reactivated at any time. That is personal data which is still stored even though its intended purpose no longer applies.
Personal data are any information which are related to an identified or identifiable natural person.
The data subjects are identifiable if they can be directly or indirectly identified, especially by reference to an identifier such as a name, an identification number, location data, an online identifier or one of several special characteristics, which expresses the physical, physiological, genetic, mental, commercial, cultural or social identity of these natural persons. In practice, these also include all data which are or can be assigned to a person in any kind of way.
I’m confused. All forum posts are connected to an account. You’re saying the law stipulates that all posts be deleted? Or that all accounts be anonymous? That seems so extreme I can’t believe that’s a correct interpretation of the law.
Well, it is not yet clear whether anonymization is sufficient or whether everything really has to be deleted.
I expect that this will be clarified in the future by the courts.
On that point I would also wait for the first court judgements before switching from anonymization to deleting everything.
But “deleted” posts should at least be really deleted after a given time frame.
No, why should the accounts be anonymous when they are active? As long as they are active we are fine.
I doubt it! Many people have opinions on this and generally they don’t seem to agree with the position you’re taking.
Here’s just one example of the many websites discussing these issues and taking a different position:
It is not “personal data” just because it was written by an identifiable person. The context of a public forum where a member submits posts under terms and conditions they have agreed to is very different from the types of situation and information that the GDPR appears to target. In those situations individuals may not be aware that data is being collected nor that they are being identified. An example I often hear about is the shopper being tracked by their mobile phone while shopping.
Anyway, what we debate here is not what is important. What is important is that you should be submitting something with legal standing such as precedents or relevant legal opinions. If you don’t have such authorities then your opinion, no more or less than mine, carries no legal weight in determining whether your fears are well-founded or not.
Well, if completely deleting the posts from the database is not an option, is there a simple way to overwrite a post and its revisions?
I understood from other posts in this forum that destroying just the row in the posts table might result in database inconsistencies because of the links to other tables. So that is not an option.
But would it be possible for a background job to overwrite the “deleted” post in the posts table and its revisions in the post_revisions table with a short text like “permanently deleted data”, for all “deleted” posts older than, for example, 30 days based on the deleted_at timestamp?
Are there more tables to consider like quoted_posts?
With such a solution at least the original data would be “deleted”, with no possibility for any admin to view or restore old “not really deleted” data. No user could then complain that his or her data has not been deleted, and even for the authorities the data would be “deleted”.
So if someone could tell me which tables should be taken into consideration, I would ask a programmer I know, who has already written plugins for Discourse, whether he could write a plugin that overwrites the “deleted” post data with the “permanently deleted data” text.
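To illustrate the overwrite idea, here is a rough sketch against a toy SQLite database rather than a real Discourse install. The schema is a simplified assumption: I am using `raw`, `cooked` and `deleted_at` on posts and `post_id` and `modifications` on post_revisions as stand-ins, and `create_toy_schema` and `scrub_deleted_posts` are hypothetical names. It does not answer which other tables hold copies of the content, and a real plugin would be written in Ruby against Discourse's own models.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional

PLACEHOLDER = "permanently deleted data"


def create_toy_schema(conn: sqlite3.Connection) -> None:
    """Create a simplified, assumed schema (not the real Discourse tables)."""
    conn.execute(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, raw TEXT, cooked TEXT, deleted_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE post_revisions ("
        "id INTEGER PRIMARY KEY, post_id INTEGER, modifications TEXT)"
    )


def scrub_deleted_posts(
    conn: sqlite3.Connection,
    retention_days: int = 30,
    now: Optional[datetime] = None,
) -> List[int]:
    """Overwrite content of posts soft-deleted before the retention cutoff.

    Assumes deleted_at holds UTC ISO 8601 strings with second precision,
    so plain string comparison orders them chronologically.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat(timespec="seconds")
    with conn:  # one transaction: either every overwrite is applied or none
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM posts "
                "WHERE deleted_at IS NOT NULL AND deleted_at < ? ORDER BY id",
                (cutoff,),
            )
        ]
        for post_id in ids:
            # Rows are kept so references from other tables stay valid;
            # only the stored text is replaced.
            conn.execute(
                "UPDATE posts SET raw = ?, cooked = ? WHERE id = ?",
                (PLACEHOLDER, PLACEHOLDER, post_id),
            )
            conn.execute(
                "UPDATE post_revisions SET modifications = ? WHERE post_id = ?",
                (PLACEHOLDER, post_id),
            )
    return ids
```

The point of the design is that no rows are removed, so the inconsistency concern about destroying rows in the posts table does not arise; only the text is replaced. Anything like this should be tried on a restored backup first.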