Download Archive of My Posts Implementation


(Duke) #1

Hey All,

I’m working with an undergraduate software engineering team to implement a feature in Discourse. @codinghorror suggested “download archive of my posts” as a starting point, and suggested that we begin by making a post here.

Does anybody have any tips or guidance involving this project?


(Erlend Sogge Heggen) #2

Maybe start with what your current plan is. It’s easier to comment on and add to your current path as opposed to detailing the entire path for you.

Without any idea of what you’ve familiarised yourself with already and how you’re planning to approach the problem, the ones who could help basically have to start at “have you read the docs?”


(Régis Hanol) #3

A few things to think about while implementing this feature:

  • That operation needs to run in the background. A job in sidekiq is probably fine.
  • Make sure only one instance of that job can run at any time for the same user
  • Make sure you think about large forums (i.e. users with 10,000 posts). There are some memory constraints.
  • You need to think about how to clean/remove old archives
  • It would be awesome if you could send a PM to the user with a link to download the archive once it’s generated

(Abhishek Gupta) #4

Also, it would be better to store the archive in both JSON and TXT formats: JSON for use in other web apps, if the user wants that, and TXT for standard record keeping. Once the data is in JSON, it can be converted to SQL, CSV, etc.
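
To illustrate the JSON-to-CSV conversion mentioned here, a minimal Python sketch follows. It assumes the archive is a JSON array of flat post objects; the function name `posts_json_to_csv` and the `id` and `raw` field names are made up for the example and are not taken from Discourse.

```python
import csv
import io
import json


def posts_json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    posts = json.loads(json_text)

    # Collect column names in first-seen order so the header is stable.
    fieldnames = []
    for post in posts:
        for key in post:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(posts)  # keys missing from a post are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    sample = '[{"id": 1, "raw": "Hello, world"}, {"id": 2, "raw": "Second"}]'
    print(posts_json_to_csv(sample))
```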


(Robin Ward) #5

I disagree with the TXT format, actually. While posts are written in Markdown, we cook them to HTML, and that’s the way they’re meant to be read.

Personally I would start with JSON, and then later we could consider making an offline reader, like Twitter does for reading your downloaded tweets in a web browser. EmberJS makes this particularly easy for us to implement. This would be a “version 2” feature though. I would concentrate on the main feature of being able to get the JSON first.


(Tobias Eigen) #6

I’m glad to see you are working on this - what’s the latest news on it? I’d be glad to contribute as a tester or to bounce ideas for what this should be able to do.

I’m not sure what JSON and TXT formats look like, but in the past I have always appreciated any forum tools that allow export into MBOX format, which can then be imported easily into a mail client.

Back in the day, I had some success with a forum called FUD Forum (great name, I know) that had a very slick email integration. It can be fed MBOX files to import email discussions. It automagically generates the users if they don’t already exist. I used it to import mailman archives.

FUD Forum ended up not working well as an actual forum, though, so I gave up on it and moved to Drupal organic groups which ended up being even worse. Maybe Discourse will be the answer for me in this new era. :smile:

Here’s some ancient history from FUD Forum in case you are interested.

Cheers,

Tobias


(Sam Saffron) #7

@codinghorror made a suggestion I really agree with.

To avoid a lot of the painful UI work needed to report progress and so on, you could:

  1. Change “Download archive” -> “Email me an archive of my posts”
  2. When they click on the button, show a confirmation bootbox: “Are you sure you would like a full archive of all your posts sent to your email?”
  3. Once they confirm, the button becomes disabled and the text “We are creating an archive and emailing it to you” is displayed
  4. Only allow operation once a day/week.
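
The once-a-day limit in step 4 could be sketched roughly like this in Python. The `ExportRateLimiter` class is hypothetical and keeps its state in memory. A real implementation would have to persist the last request time somewhere that survives restarts.

```python
from datetime import datetime, timedelta
from typing import Dict


class ExportRateLimiter:
    """Allow each user at most one archive request per `min_interval`."""

    def __init__(self, min_interval: timedelta = timedelta(days=1)) -> None:
        self.min_interval = min_interval
        self._last_request: Dict[int, datetime] = {}

    def try_request(self, user_id: int, now: datetime) -> bool:
        """Record the request and return True, or return False if it comes too soon."""
        last = self._last_request.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_request[user_id] = now
        return True


if __name__ == "__main__":
    limiter = ExportRateLimiter()
    start = datetime(2014, 1, 1, 12, 0)
    print(limiter.try_request(1, start))                       # first request is allowed
    print(limiter.try_request(1, start + timedelta(hours=1)))  # second request within a day is refused
```

The button from step 3 can stay disabled for as long as `try_request` would return False for that user.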

Of course, all the actual work needs to happen in a background job.


(Jeff Atwood) #8

Any updates on this project @dukeayers? Any way we can help?


(Sam Saffron) #9

I assume this effort died; people are more than welcome to pick this up. For now I have removed the “to be implemented” button from the UI, as it makes no sense to have it for V1.


(Erlend Sogge Heggen) #10

Certainly not a must-have for v1, but for the record I would love to have this eventually. Very important to encourage dedicated Discourse hosting services to have proper support for this kind of thing as well.

Matt Mullenweg: WordPress.com is the only service of its kind that not only lets you export your data, but gives you an open source package you can run on pretty much any web host out there to run your own instance of the software. So the freedom is really in your hands. I’ve always believed that if you make it easy for people to leave, they’re more likely to stay. - Techcrunch Interview.


(Robin Ward) #11

We definitely will have it eventually.

For the record, I think Discourse basically meets Matt’s description already. If you run the forum (similar to running a WP instance) you can export all the data using the awesome backup admin interface that @zogstrip built.

We want to take it a step further and let any of your site users export their stuff too! We’ll get there, but we already do a lot to prevent lock-in.


(Erlend Sogge Heggen) #12

Oh, right you are! I actually got my features mixed up. I was indeed referring to what @zogstrip has apparently already built. Happy camper, ←right here!


(Arpit Jalan) #13

This feature is now available :gift:

https://github.com/discourse/discourse/pull/3055


(Bcguy) #14

Can someone share how the user can read these downloaded posts? I’ve searched, but it’s not obvious to the non-technical users on my forum how to uncompress and then read these files.

Is there a FAQ somewhere?


(Jeff Atwood) #15

Open the file in Excel or upload it to Google Docs/Sheets.
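
For anyone who would rather script it, here is a minimal Python sketch. It assumes the download is a gzip-compressed CSV file, which this thread does not actually confirm. The function name `read_gzipped_csv` and the file name in the usage line are placeholders.

```python
import csv
import gzip
from typing import Dict, List


def read_gzipped_csv(path: str) -> List[Dict[str, str]]:
    """Decompress a .csv.gz file on the fly and return its rows as dictionaries."""
    # Assumption: the file is UTF-8 text with a header row.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

Usage would look something like `rows = read_gzipped_csv("my-posts.csv.gz")`, after which each row is a dictionary keyed by the column names in the header.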


(Jeff Atwood) #16