Hello all, I didn’t quite find a similar post to this, so here goes:
My org is building out our analytics dashboards (within Snowflake) and is hoping to integrate the user data we have in Discourse.
Currently, we are self-hosting Discourse in Docker on an EC2 instance. It looks like there are a couple of options here:
- Use AWS Database Migration Service to access the Postgres server hosted on the EC2 instance. This would require exposing the Postgres port (which I don’t believe is publicly accessible by default) and creating a new Postgres user, but otherwise this seems like a pretty good solution.
- A straight SQL dump to an S3 bucket (which is how backups work in Discourse) won’t work, as Snowflake doesn’t accept SQL as a data format.
- Install the Data Explorer plugin, write the queries that output the data we want, then run them and export the results as CSV/JSON files that we can then import into Snowflake.
- This looks to have more steps than the DMS option above, but it would also do the trick and has less of a chance of screwing up the Discourse DB.
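For what it’s worth, here is a rough sketch of how the Data Explorer route might be scripted instead of clicking export by hand. It is not tested against a live instance and rests on some assumptions: that saved queries can be run with a POST to `/admin/plugins/explorer/queries/<id>/run`, that requests are authenticated with `Api-Key` / `Api-Username` headers, and that the JSON response carries `columns` and `rows` keys. The function names, the `system` username and the output path are all made up for illustration, so please check the details against your own install.

```python
import csv
import io
import json
import urllib.request


def build_run_request(base_url, query_id, api_key, api_username="system"):
    """Build a POST request that runs a saved Data Explorer query.

    Assumption: the endpoint path and header names below match your
    Discourse / Data Explorer version; verify before relying on them.
    """
    url = f"{base_url.rstrip('/')}/admin/plugins/explorer/queries/{query_id}/run"
    headers = {
        "Api-Key": api_key,
        "Api-Username": api_username,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def result_to_csv(result):
    """Turn a result dict with "columns" and "rows" keys into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(result["columns"])   # header row
    writer.writerows(result["rows"])     # one CSV line per result row
    return buf.getvalue()


def export_query(base_url, query_id, api_key, out_path):
    """Run the query over HTTP and write the result to a local CSV file."""
    req = build_run_request(base_url, query_id, api_key)
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    with open(out_path, "w", newline="") as f:
        f.write(result_to_csv(result))
```

Something like `export_query("https://forum.example.com", 12, "<api key>", "users.csv")` would then produce a file that could be pushed to an S3 stage and loaded into Snowflake. Data Explorer may cap the number of rows returned per run, so larger tables could need paging inside the query itself.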
I would appreciate input or hearing anyone else’s tales of getting their DB data into an analytics pipeline. Thank you!