Public data dumps

I’d like to propose a public data export feature similar to Stack Exchange’s. This is not the same as the backup feature, because it requires sanitizing all private user data first. Both JSON and HTML export formats would be great.

This is useful for:

  • CC-licensed content to be made available more easily
  • users to feel more comfortable knowing that the site can’t suddenly disappear with all their content
  • data analysis, etc.
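
To make the sanitizing step concrete, here is a minimal sketch of the idea. It is not how Discourse stores or exports data: the field names (`email`, `ip_address`, `password_hash`, `private`) and both function names are assumptions chosen only for illustration.

```python
import json

# Hypothetical private field names; a real backup schema will differ.
PRIVATE_FIELDS = {"email", "ip_address", "password_hash"}


def sanitize_post(post):
    """Return a copy of a post record with private fields removed."""
    return {key: value for key, value in post.items() if key not in PRIVATE_FIELDS}


def export_public(posts):
    """Drop posts flagged private, sanitize the rest, and serialize as JSON."""
    public = [sanitize_post(p) for p in posts if not p.get("private", False)]
    return json.dumps(public, ensure_ascii=False, indent=2)
```

An HTML export could reuse the same sanitized records and only swap the serialization step.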

Let me know what you think.

5 Likes

Sounds like a great idea. Would you like to work on something like this?

1 Like

I would rather see the individual download button on the user page working first.

2 Likes

Maybe these are related. For example, the export could offer filtering options: export content from these users, these topics, these keywords, etc.
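
A rough sketch of what such filtering could look like, assuming each post is a plain dictionary. The keys `username`, `topic_id`, and `raw`, and the function names, are hypothetical and not taken from any existing Discourse export.

```python
def matches(post, users=None, topics=None, keywords=None):
    """Return True if the post passes every filter that was supplied."""
    if users is not None and post["username"] not in users:
        return False
    if topics is not None and post["topic_id"] not in topics:
        return False
    if keywords is not None:
        text = post["raw"].lower()
        # Keep the post if at least one keyword appears in its text.
        if not any(keyword.lower() in text for keyword in keywords):
            return False
    return True


def filter_posts(posts, **criteria):
    """Keep only the posts that match the given criteria."""
    return [post for post in posts if matches(post, **criteria)]
```

A filter left as `None` is skipped, so the same function would cover both a single user's download and a full public dump.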

1 Like

This would be great!

Did this feature ever get built? What’s the best way to provide public exports of a site?

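Now that ChatGPT has pushed bots and AI to center stage, people have started to mention using data from Discourse forums as training data. This topic therefore seems to be one of two possible routes, the other being the Discourse REST API.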

I know admins are able to take a backup and then sanitize the data, but it would be better to have a known standard.

Could we get an update on the status of this, even if it is just "no change since last time"? :slightly_smiling_face:

2 Likes

This is on our AI team's roadmap. :smiley:

3 Likes

Glad to see Discourse heading in the right direction, and sad for that little bird.

1 Like