Discourse Public Data Dump

Given the advent of AI and the need for large datasets on local development machines, we have pulled together a quick pattern for getting a “workable” copy of all public (visible by anon) data from a Discourse forum.

Keeping the documentation up to date at:

Why would you care?

  • You want a local database with LOTS of topics
  • You don’t want ANY personal data on your system

This is still in very rough shape, but it is workable for initial experiments and gives you a well-populated local setup.


This document is version controlled - suggest changes on GitHub.


Hello, thank you for this work. I am pretty new to the Discourse API, but I would like to give it a try. From the README, it looks like topic_query and post_query are the key files in this repo. Do you know if we can customize those files to adapt the dump to our needs? For example, we only want to dump topics from a specific category or with specific tags. Thanks.