Dump all conversations in a file and structured data

This question is similar to this one: Does Discourse support export conversations as an organized bulk of data?

However, we are looking for a way to run some NLP on all the conversations on our Discourse site. Someone on our team asked whether this could be done at a low level, in the backend, e.g. by exporting the database but without the table, with something like pg_dump --schema-only. I didn’t fully understand what my colleague meant, but I thought maybe you would.

If you’re self-hosted, then they can do the pg_dump command that they think will help.

You can also dump data in various formats with the Data Explorer Plugin.


This plugin seems to provide most of what we’re looking for! Thanks!

So I installed the plugin and looked at all the queries posted in (Superseded) What cool data explorer queries have you come up with?, but none of them exports the actual conversations. For example, I asked for the top 100 active topics. I get database entries with topic IDs (see screenshot), but no conversations. Is this because the plugin only extracts data from the database and won’t pull the conversations themselves? If so, is there a way to use the topic IDs returned by the plugin to pull the corresponding conversations into JSON files?

 SELECT * FROM posts WHERE topic_id = 425

That will give you the posts in the first topic from your query (assuming I typed it correctly on this phone).

But if what you want is JSON, you could do something like

  https://meta.discourse.org/t/dump-all-conversations-in-a-file-and-structured-data/202351.json

I didn’t understand your 1st option, maybe a typo in your text? Did you mean I only get the 1st post of the topic?

Regarding the 2nd option with the .json extension, is there an alternative URL that uses the topic_id, or any other field, so that I can fetch the conversation as JSON programmatically without having to know the topic title?

Did you try the sql query? Was there an error? Edit: I checked. That query will return all posts in a topic.

You can get any topic with only the topic id.

https://meta.discourse.org/t/-/202351.json
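
For anyone who wants to script this, here is a minimal Python sketch of the idea: build the slug-free URL from a topic ID, then reduce the returned JSON to plain text per post. The helper names (topic_json_url, html_to_text, extract_posts, fetch_topic) are made up for illustration, and the field names it reads (post_stream, posts, cooked, post_number, username) are my assumption about the topic JSON layout, so check them against a real response from your site. A private site will probably also need authentication, which is not shown here.

```python
import json
import urllib.request
from html.parser import HTMLParser


def topic_json_url(base_url: str, topic_id: int) -> str:
    """Build the slug-free JSON URL for a topic, e.g. .../t/-/202351.json."""
    return f"{base_url.rstrip('/')}/t/-/{topic_id}.json"


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from rendered post HTML and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def extract_posts(topic: dict) -> list:
    """Reduce a topic JSON document to one small dict per post.

    Assumes the posts live under topic["post_stream"]["posts"] and that
    each post carries its rendered HTML in a "cooked" field.
    """
    posts = topic.get("post_stream", {}).get("posts", [])
    return [
        {
            "topic_id": topic.get("id"),
            "post_number": p.get("post_number"),
            "username": p.get("username"),
            "text": html_to_text(p.get("cooked", "")),
        }
        for p in posts
    ]


def fetch_topic(base_url: str, topic_id: int) -> dict:
    """Download and parse the JSON for one topic (performs a network call)."""
    with urllib.request.urlopen(topic_json_url(base_url, topic_id)) as resp:
        return json.load(resp)
```

You could then loop over the topic IDs returned by your Data Explorer query, call extract_posts(fetch_topic(base_url, topic_id)) for each one, and write the results out with json.dump. As far as I know, a long topic may come back with only its first batch of posts, so compare the number of posts you get against the topic's post count before relying on this for a full export.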

The query was fine; I just misunderstood your explanation of what it actually returns. Thanks for double-checking. These are great solutions.

