but we are looking for a way to do some NLP on all conversations of our Discourse site. Someone on our team asked whether this could be done at a low level, in the backend, e.g. by exporting the database but without the table, with something like pg_dump --schema-only. I didn’t fully understand what my colleague meant, but I thought maybe you would.
So I installed the plugin and looked at all the queries in (Superseded) What cool data explorer queries have you come up with?, but none of them exports the actual conversations. For example, I asked for the top 100 active topics. I get database entries with topic IDs (see screenshot), but no conversations. Is this because the plugin only extracts data from the database and won’t pull the conversations themselves? If so, is there a way to use the topic IDs returned by the plugin to pull the corresponding conversations as JSON files?
I didn’t understand your 1st option; maybe there is a typo in your text? Did you mean that I only get the 1st post of the topic?
Regarding the 2nd option with the .json extension, is there an alternative URL that uses the topic_id, or any other field, so that I can get the conversation as JSON programmatically without having to know the topic title?
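To make the question concrete, here is a rough Python sketch of what I have in mind. It assumes (and this is exactly what I would like confirmed) that a topic can be fetched at /t/<topic_id>.json without the title slug, and that the returned JSON keeps the posts under post_stream.posts with the rendered HTML in a cooked field. The base URL and the function names are made up for illustration, and I don’t know whether a long topic comes back in full in a single response.

```python
import json
import urllib.request
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(cooked):
    """Return the plain text of a rendered ("cooked") post body."""
    parser = _TextOnly()
    parser.feed(cooked)
    parser.close()
    return "".join(parser.parts).strip()


def topic_json_url(base_url, topic_id):
    """Build the JSON URL for a topic from its ID alone.

    Assumption: the site serves /t/<topic_id>.json without the slug.
    """
    return f"{base_url.rstrip('/')}/t/{topic_id}.json"


def extract_posts(topic):
    """Pull username and plain text out of a topic's JSON.

    Assumption: posts live under topic["post_stream"]["posts"].
    """
    posts = topic.get("post_stream", {}).get("posts", [])
    return [
        {"username": p.get("username"), "text": strip_html(p.get("cooked", ""))}
        for p in posts
    ]


def fetch_topic(base_url, topic_id):
    """Download and parse one topic (needs network access)."""
    with urllib.request.urlopen(topic_json_url(base_url, topic_id)) as resp:
        return json.load(resp)
```

The idea would then be to loop over the topic IDs returned by the Data Explorer query, call fetch_topic for each one, and write the result of extract_posts to one JSON file per topic.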