To comply with our legal obligations, we need a way to let users download their full posting history (not excerpts) via the Discourse API, so that the data can be included when a user submits a DSAR to our website / legal team directly (so they get a complete record of both their Last.fm data and their Discourse post data). Is this currently possible? And if so, what’s the correct way to go about it?
We had been trying to do this with get user actions (Discourse API Docs) but it seems that this is only returning a partial record of excerpts, not full posts.
I’m aware that users can already download their data directly from their activity settings page on Discourse, but I’m afraid this won’t be enough to satisfy our legal team - they’re insisting that the data be downloaded from our website. Any ideas? I’m worried that we might be forced to pull out of our hosting plan if we can’t sort this out.
Well, you could conceivably do it with the API (see How to reverse engineer the Discourse API), but there isn’t an easy way around the requirement that they click the email validation link that protects them from someone else downloading their data.
As Jay noted, an authenticated POST request to /export_csv/export_entity.json can be used to generate the archive. To do this, you need to use an All Users Global API key. Set the request’s API Username to the username of the user you want to create the archive for. You also need to supply an entity parameter with the request, set to user_archive. With this approach, a notification will be generated for the user. For most users, this will also send them an email that contains a download link, but I don’t think that can be relied on - it’s dependent on how the user has configured their email preferences.
An example curl request to generate the archive. I’ve substituted $api_key for a Global All Users API key in the request:
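For anyone scripting this rather than using curl, here is a minimal Python sketch of the same request. It only builds the request object, so nothing is sent until you pass it to `urllib.request.urlopen`. The base URL, the placeholder key, and the function name `build_archive_request` are assumptions for illustration; the endpoint and the `entity=user_archive` parameter are the ones described above, and `Api-Key` / `Api-Username` are the standard Discourse API headers.

```python
import urllib.parse
import urllib.request


def build_archive_request(base_url: str, api_key: str, username: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Discourse to generate a user archive.

    base_url, api_key and username are placeholders supplied by the caller;
    api_key is assumed to be an All Users Global API key.
    """
    # The endpoint expects an `entity` parameter set to `user_archive`.
    body = urllib.parse.urlencode({"entity": "user_archive"}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/export_csv/export_entity.json",
        data=body,
        method="POST",
        headers={
            "Api-Key": api_key,
            # Act as the user whose archive should be generated.
            "Api-Username": username,
        },
    )


# Hypothetical usage (forum.example.com and the key are placeholders):
# req = build_archive_request("https://forum.example.com", "YOUR_API_KEY", "some_user")
# urllib.request.urlopen(req)
```

Keep in mind that this only triggers the export. The archive itself is still delivered to the user through the notification and download link.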
I’m unsure if there is any way you can generate the archive in a way that allows you to share it without the user having to access the notification and click its download link. Possibly a Data Explorer query could be developed that returns the information. You could then run the Data Explorer query via the API. The downside of this approach is that if there are more than 10 000 rows of data returned by the query, you would need to make multiple requests to get the data. For more details about the Data Explorer approach, see: How to run Data Explorer queries with the Discourse API.
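To give a rough idea of the "multiple requests" part, here is a Python sketch of paging through a Data Explorer query. Treat the details as assumptions to check against the linked guide: I'm assuming the run endpoint is `/admin/plugins/explorer/queries/<id>/run`, that query parameters are passed as a JSON string in a `params` form field, that the response JSON has a `rows` array, and that your query declares an integer `page` parameter and applies its own `LIMIT` / `OFFSET` based on it. The function names and the `system` username are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List

PAGE_SIZE = 10_000  # rows per request; must match the LIMIT used inside the query


def build_query_request(base_url: str, api_key: str, query_id: int, page: int,
                        api_username: str = "system") -> urllib.request.Request:
    """Build the POST that runs a saved Data Explorer query for one page."""
    # Assumption: the query defines an int parameter named `page`.
    params = json.dumps({"page": str(page)})
    body = urllib.parse.urlencode({"params": params}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/admin/plugins/explorer/queries/{query_id}/run",
        data=body,
        method="POST",
        headers={"Api-Key": api_key, "Api-Username": api_username},
    )


def fetch_rows(req: urllib.request.Request) -> List[list]:
    """Send the request and return the `rows` array from the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("rows", [])


def iter_all_rows(base_url: str, api_key: str, query_id: int,
                  fetch: Callable[[urllib.request.Request], List[list]] = fetch_rows,
                  page_size: int = PAGE_SIZE) -> Iterator[list]:
    """Yield rows page by page until a page comes back shorter than page_size."""
    page = 0
    while True:
        rows = fetch(build_query_request(base_url, api_key, query_id, page))
        yield from rows
        if len(rows) < page_size:
            break
        page += 1
```

The `fetch` argument is only there so the paging loop can be tried against canned data before pointing it at a real site.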