[bounty] Google+ (private) communities: export screenscraper + importer

Progress report: I have written an importer, in something vaguely resembling Ruby, that looks like it covers all the current requirements in general design. (It doesn’t translate G+ +1’s into likes, because the exporter does not represent the +1s.) I haven’t actually run it or imported any data with it yet. :roll_eyes:

My script takes as arguments paths to Friends+Me Google+ Exporter JSON export files, Friends+Me Google+ Exporter CSV image map files, and a single JSON file that maps Google+ category IDs to Discourse categories, subcategories, and tags. It expects all the target categories to have been created already and does not create new ones; if a category is missing, it complains and bails out after writing a skeleton file in which to fill in the missing mappings before re-running the import.
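To make the bail-out behavior concrete, here is a minimal sketch of that check in plain Ruby. This is not the actual script: the map key names (`"category"`, `"tags"`) and the `.missing` filename suffix are illustrative assumptions about the category-map format.

```ruby
require 'json'

# Hypothetical sketch: verify that every G+ category ID seen in the
# export has an entry in the operator-supplied category map. If any are
# missing, write a skeleton file for the operator to fill in, and return
# the missing IDs so the caller can bail out.
def check_category_map(map_path, gplus_category_ids)
  map = JSON.parse(File.read(map_path))
  missing = gplus_category_ids.reject { |id| map.key?(id) }
  unless missing.empty?
    # Skeleton entries with nil/empty targets, ready to be filled in.
    stub = missing.to_h { |id| [id, { 'category' => nil, 'tags' => [] }] }
    File.write(map_path + '.missing', JSON.pretty_generate(stub))
  end
  missing
end
```

Writing a fill-in skeleton rather than just printing the missing IDs saves the operator a round trip: they edit the generated file, merge it into the map, and re-run.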

The idea is to do all the imports into a single Discourse in one pass, so that all possible “plus-mentions” of other G+ users across multiple G+ communities turn into “at-mentions” in Discourse, even for people not active in the community in which they are mentioned, as long as they wrote some post or comment somewhere in the whole set of data being imported. This matters for my use case: so far it looks like I’ll be importing about 10 communities, with about 300MB of input JSON and about 40GB of images.
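The single-pass idea amounts to two phases: first walk every export file to build one global map of G+ authors, then rewrite mentions against that map. A toy sketch of both phases, where the JSON field names (`"posts"`, `"author"`, `"id"`, `"name"`) and the username-derivation rule are my assumptions, not the exporter's documented format:

```ruby
# Phase 1: collect every author across all parsed export files into one
# map from G+ user ID to a derived Discourse username, so mentions can
# resolve across community boundaries.
def build_global_user_map(exports)
  exports.each_with_object({}) do |export, map|
    export['posts'].each do |post|
      author = post['author']
      map[author['id']] ||= author['name'].downcase.gsub(/[^a-z0-9]+/, '_')
    end
  end
end

# Phase 2: rewrite "+Display Name" plus-mentions as "@username"
# at-mentions, leaving unresolved mentions untouched. The naive
# capitalized-words regex is purely illustrative.
def rewrite_mentions(text, name_to_username)
  text.gsub(/\+([A-Z]\w*(?: [A-Z]\w*)?)/) do
    username = name_to_username[Regexp.last_match(1)]
    username ? "@#{username}" : Regexp.last_match(0)
  end
end
```

Because phase 1 runs over the entire input set before any post is written, a mention in community A of someone who only ever posted in community B still resolves.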

It is intended to work on a Discourse instance that already has the users referenced in the import, keyed by Google ID, and that already has its content and categories created. I hope it will also let people log in with Google OAuth2 after the import and automatically own their content, because their Google auth ID is tied to the placeholder account holding the data.
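The ownership idea can be modeled in a few lines of plain Ruby (this deliberately avoids Discourse's actual models, which I won't guess at here): every imported post attaches to an account keyed by the author's Google ID, so a later OAuth2 login presenting the same ID lands on the same account instead of creating a duplicate.

```ruby
# Toy model of Google-ID-keyed account ownership. "Staged" here stands
# in for a placeholder account that a real login would later claim.
User = Struct.new(:username, :google_id, :staged)

# Return the existing account for this Google ID, or create a staged
# placeholder. The username derivation is an illustrative assumption.
def user_for_google_id(registry, google_id, display_name)
  registry[google_id] ||= User.new(
    display_name.downcase.gsub(/[^a-z0-9]+/, '_'),
    google_id,
    true
  )
end
```

The import path and the OAuth2 login path both resolve through the same Google-ID key, which is what preserves ownership across the two.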

I expect that a 431-line file that has never seen an interpreter will be loads of fun to debug, especially having been written, and soon tested, by someone who has never written any Ruby before. I don’t pretend that writing this script is the largest part of the work. I’ll share it now, or at any later time, with anyone seeking the bounty, as long as you’ll share your fixes with me regardless of bounty progress; just PM me. I’ll release it myself under GPLv3 once I get it working. In the meantime, I’m considering this work my contribution toward someone else claiming the bounty, to make it more likely to be worth the time for whoever takes it to completion, given the comment above that the bounty is smaller than typical.