[bounty] Google+ (private) communities: export screenscraper + importer

Status update

I believe that I have finished all the development work I can see for this plugin, though I’m not promising to go away.

It is of relatively time-limited utility. Import scripts that lack automated tests, as mine does, tend to rot as the rest of Discourse changes, so instead of being a useful pattern for others it would gradually and insensibly decay into (or decay further into?) a counter-example for the next author of an importer. I don’t expect to be an ongoing active developer here, even though I’m grateful to those who are, so I’m unlikely to catch such decay as it happens. Therefore, I won’t submit a PR.

That said, I have signed the CLA, and I support anyone else who does feel that it would bring value to submit a PR, with or without a rebase.

Using this importer

For anyone running imports with this code, I’m sorry, but you’ll just have to read this massive thread. I’m not going to try to summarize it into a README. For better or worse, reading this wall of text is still much less work than actually running an import.

When all your imports are done, you’ll want to rebake. For example, I have seen that oneboxes didn’t show up until after a rebake.

If you are doing an offline import before making content live, a simple rebake is the easiest thing. However, if you want to import posts into a live site, as we did for https://forum.makerforums.info/, you would probably rather rebake as a background task. To do that, I recommend following this example:

Post.in_batches.update_all('baked_version = NULL')

When you do that, make sure that rebake_old_posts_count is set to a value that your server can support. It’s the number of posts rebaked every 15 minutes. I suggest starting small and watching the Sidekiq jobs to make sure you are not falling behind. (https://yoursite/sidekiq/ will show this once you have logged into https://yoursite/.)
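
To get a feel for what "starting small" means in practice, here is a rough back-of-the-envelope sketch in plain Ruby. It is not part of the importer or of Discourse: the helper name `estimated_rebake_hours` and the example numbers are made up for illustration, and it assumes the server actually keeps up with rebake_old_posts_count posts every 15 minutes.

```ruby
# Hypothetical helper, not part of Discourse or the importer.
# Assumes rebake_old_posts_count posts are rebaked every 15 minutes
# (as described above) and that Sidekiq keeps up with that rate.
REBAKE_INTERVAL_MINUTES = 15

def estimated_rebake_hours(total_posts, rebake_old_posts_count)
  unless rebake_old_posts_count.positive?
    raise ArgumentError, "rebake_old_posts_count must be positive"
  end

  # Number of 15-minute intervals needed, rounding up for a partial batch.
  intervals = (total_posts.to_f / rebake_old_posts_count).ceil
  intervals * REBAKE_INTERVAL_MINUTES / 60.0
end

# Made-up example: 50,000 imported posts at 100 posts per interval.
puts estimated_rebake_hours(50_000, 100)
```

With those made-up numbers the estimate is 125 hours, a little over five days, which is a reasonable way to decide whether a small setting is tolerable or whether it is worth raising it while watching the Sidekiq queue.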

Automatically rebaking old posts starts with the newest posts and rebakes backwards in time, so the most immediately relevant content will rebake first.
