Public data dumps

I’d like to propose a public data export feature similar to Stack Exchange’s. This is not the same as the backup feature, because it requires sanitizing all private user data first. Both JSON and HTML export formats would be great.

This is useful for:

  • making CC-licensed content more easily available
  • helping users feel comfortable that the site can’t suddenly disappear with all their content
  • data analysis, etc.
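
To make the difference from a backup concrete, here is a rough sketch of the sanitizing step, assuming users and posts are already loaded as plain dictionaries. The field names (`email`, `ip_address`, `password_hash`, `private`) and both function names are hypothetical, chosen only for illustration; they are not Discourse's actual schema or API.

```python
import json

# Hypothetical private field names; a real schema would need its own list.
PRIVATE_FIELDS = {"email", "ip_address", "password_hash"}


def sanitize_user(user: dict) -> dict:
    """Return a copy of a user record with private fields removed."""
    return {k: v for k, v in user.items() if k not in PRIVATE_FIELDS}


def export_public(users: list, posts: list) -> str:
    """Build a JSON dump containing only public data.

    Posts flagged with a (hypothetical) "private" key are dropped,
    and every user record is stripped of private fields.
    """
    public_posts = [p for p in posts if not p.get("private", False)]
    dump = {
        "users": [sanitize_user(u) for u in users],
        "posts": public_posts,
    }
    return json.dumps(dump, indent=2, sort_keys=True)
```

An HTML export could be rendered from the same sanitized structure, so the sanitizing logic would only need to exist once.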

Let me know what you think.

5 Likes

Sounds like a great idea. Would you like to work on something like this?

1 Like

I would rather see the individual download button on the user page working first.

2 Likes

Maybe these are related. For example, the export could offer filtering options: export content from these users, these topics, these keywords, etc.
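
A minimal sketch of what such filtering might look like, assuming each post is a dictionary with `username`, `topic_id`, and `raw` keys. Those key names and the `filter_posts` function are hypothetical, not part of any existing export code.

```python
def filter_posts(posts, usernames=None, topic_ids=None, keywords=None):
    """Keep only posts matching every filter that was supplied.

    A filter left as None is ignored. Keyword matching is a
    case-insensitive substring check on the post body.
    """
    def keep(post):
        if usernames is not None and post["username"] not in usernames:
            return False
        if topic_ids is not None and post["topic_id"] not in topic_ids:
            return False
        if keywords is not None:
            text = post["raw"].lower()
            if not any(k.lower() in text for k in keywords):
                return False
        return True

    return [p for p in posts if keep(p)]
```

With only `usernames` set to a single user, this would behave like the per-user download; with no filters at all, it would return everything, as a full public dump would.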

1 Like

This would be great!

Did this feature ever get built? What’s the best way to provide public exports of a site?

Now that ChatGPT has moved bots and AI closer to center stage, I’m starting to see mentions of using the data from a Discourse forum as training data. This topic seemed like one of two possibilities for that, the other being the Discourse REST API.

I know admins have the ability to grab a backup and then sanitize the data, but having a known standard would be preferred.

Can we get a current status on this, even if it is only “no change since last time”? :slightly_smiling_face:

2 Likes

This is in our roadmap for the AI team now. :smiley:

3 Likes

Glad to see Discourse is headed in the right direction, so sad for the little bird.

1 Like