Rails Girls Summer of Code 2017: Backup Providers

eviltrout · July 4, 2017, 3:52pm

We’ve officially kicked off the Rails Girls Summer of Code 2017 with our group, berlin diamonds.

They’re going to be working on adding extra backup providers to Discourse. This topic can be used by the teams involved for questions and answers, as well to track progress.

Background

Discourse has always provided administrators the ability to back up their community data. It is important to us as an open source project that regardless of where you run your forum, that you be able to take your data with you.

Currently, we support downloading data dumps to your local computer as well as uploading data to Amazon S3. There is also a plugin for uploading to Dropbox available that could be used as an example.

First Steps

The team has decided to focus on a backup provider for Google Drive. Here’s how I’d suggest tackling this problem, although the team is free to approach this on their own schedule and pace:

Become very familiar with the dropbox plugin for backups. Try installing it locally and using it to connect to Dropbox. @Falco is also around to answer questions you might have about how it works and how it is designed.
Review how Google Drive’s API works. What is required to upload a file there? Are there any rubygems you can use to contact the API easily?
Create a new repository for the Google Drive backup provider, and try to build it up from the simplest possible thing that could work. Use Site Settings for any API keys and variables. When it doubt, base it on the Dropbox plugin.
Discourse will provide code reviews and offer assistance along the way.
The plugin is working!

Once we’re done the above we can figure out following steps for more providers, and perhaps DRYing up the plugins to avoid repeated code.

kajatiger · July 5, 2017, 12:03pm

@Falco Can I install the dropbox plugin without using the docker container? I would like to just clone it into the discourse/plugins folder…

sam · July 5, 2017, 12:19pm

Yes, just cloning it into the plugins folder should do the trick!

eviltrout · July 5, 2017, 2:04pm

To be more specific, you can do something like this locally:

$ cd discourse/plugins
$ git clone git@github.com:xfalcox/discourse-backups-to-dropbox.git
$ cd ..
$ rm -rm tmp
$ bundle exec rails server

Then the plugin will be installed in plugins/discourse-backups-to-dropbox

Jen_Lijo · July 6, 2017, 9:45am

Hi! This is Jen, and together with @kajatiger will be working on creating the google-drive plugin during this Rails Girls Summer of Code, happy to meet the community of Discourse!

Jen_Lijo · July 6, 2017, 11:38am

hi @Falco! should we be able to do backups from dev? We keep getting a “backup failed” with a long bunch of errors in our log.

2017-07-06 13:32:48] EXCEPTION: Failed to archive uploads.
tar: Option --warning=no-file-changed is not supported

@kajatiger

sam · July 6, 2017, 12:15pm

Looks to me like the version of the tar program you are running is missing some features we need. What OS are you running? If this is a mac can you try brew install tar

Jen_Lijo · July 6, 2017, 12:23pm

We’ve solved the issue installing a linux-compatible version of tar for OSX:

$ brew install coreutils
$ brew install gnu-tar --with-default-names

kajatiger · July 13, 2017, 2:59pm

Here is a little update on what we have been doing so far:

we looked at the drop-box plugin and learned that it overwrites the after_create_hook method inside the backup.rb model with a Backup.class_eval
we used the same method to hook into the backup in discourse
we realized that the GoogleDrive OAuth uses a complex way to get the access_token for a certain user with his account, so
we decided to use another way of authorization: the GoogleDrive Service Account (https://github.com/gimite/google-drive-ruby/blob/master/doc/authorization.md look at the 3rd option here)
we successfully sent our backup files to our service account in the development mode
we pushed our changes to a repository (https://github.com/berlindiamonds/discourse-googledrive-backup)
we started writing tests with Rspec

It’s kind of working, but still needs a lot of improvement. Please feel free to comment, criticize, suggest and ask

Jen_Lijo · July 18, 2017, 1:29pm

[status update]

Hi all!

We’ve created the first version of the Google Drive backup plugin. It works with a service account from Google Drive. How to use it it’s described in the README file

This is the link to the repository:

We are open to suggestions / improvements, please let us know! It’s been a great experience to solve this little “plugin” puzzle!

kajatiger · July 18, 2017, 2:27pm

here is also a new blog post to have an overview: http://berlindiamonds.blogspot.de/2017/07/creating-plugin-for-discourse.html

Mittineague · July 18, 2017, 4:32pm

If you add a line to the plugin.rb file, you can have a link to the plugins “home” in the Installed Plugins list in the Admin - Plugins panel. eg.

# name: discourse-backup-to-googledrive
# about: -
# version: 1.0
# authors: Kaja & Jen
# url: https://github.com/berlindiamonds/discourse-googledrive-backup

eviltrout · July 18, 2017, 5:52pm

Wow you got that working fast, so great job! I do think we should should spend a little time making it more awesome before moving on to another backup provider. I know you based this on @falco’s work and that’s a great start, but I’d like to see us move towards something that is more reusable and easier to test.

DriveSynchronizer

This class does all the work currently in a sync class method, including finding the backups to synchronize and then copying them up to Google Drive.

I think it would be better if DriveSynchronizer used more principals of object oriented programming, such as instance variables. I was thinking it might be nice DriveSynchronizer was responsible for uploading one backup to google drive at a time, since they can’t be uploaded in bulk anyway. What if it worked like this?

ds = DriveSynchronizer.new(backup)
ds.sync

It would also be good if it had a method called can_sync? that would return true or false if the file can be synchronized.

Finally I think it would be good to have a reader method to return the backup that the synchronizer is for.

I’ve created a pull request for your repo that adds a bunch of failing tests for the behavior I’ve described above.

You can consider this a form of test driven development, where the specification is the tests and the exercise is to make your code pass them

To run the tests, once you’ve accepted the pull request, from your discourse directory do this:

bundle exec rake plugin:spec["discourse-googledrive-backup"]

You should see 5 test failures. Your goal will be to get all those passing. In the process I hope you’ll learn how tests work in Discourse and how I’d like to see the class designed. As always feel free to post questions here on meta or in our slack channel

Cheers!

kajatiger · July 19, 2017, 2:00pm

all tests are green here is my suggestion to a more ruby-ish synchronizer:

https://github.com/berlindiamonds/discourse-googledrive-backup/blob/testservice/lib/drive_synchronizer.rb

eviltrout · July 19, 2017, 2:30pm

It’s looking good! Here’s what I’d suggest working on next:

self.sync is a class method. It should be converted to an instance method which will use the @backup instance variable and sync it to the destination.
The SyncBackupsToDrive job should be updated to use the class. You will need to move the logic that finds the recent backups and put it there. The job should loop through the backups, and create a DriveSynchronizer for each backup and call .sync on it
We should be calling can_sync? before .sync, but thinking ahead shouldn’t this be automatic? I suggest you make a protected method of perform_sync with the old contents of sync, then have the sync method call perform_sync, but only if can_sync? is true.
At this point, you might have noticed that our DriveSynchronizer class has some generic functionality that might be reusable! For example the constructor and sync will never change. You should create a new base class that takes a backup and has perform_sync method, and then have DriveSynchronizer extend it.
Your new base class should be tested! Try to come up wth a test that makes sure that perform_sync is only called when can_sync is true.
The test of the backup reader can be moved to the base class test too.

As always, feel free to ask for questions/feedback

kajatiger · July 20, 2017, 11:49am

Do you think, that the @backup variable should store the value that we are now calling local_backup_files? Or is it the output of Backup.new? Or is it the result of the whole synchronizer? Right now it is only an argument, I don’t know what we need to test it for…

eviltrout · July 20, 2017, 3:00pm

We chatted about this offline, but @backup should be an instance of the Backup class that represents one Backup to sync. The synchronizer class will then ask that object for its path and use it to synchronize to the remote provider.

eviltrout · July 26, 2017, 2:29pm

I’ve just reviewed the dev branch and have the following very minor notes:

sync_backups_to_drive.rb

The line of code is far too long. You should change to the do form of each so you can split it up into a few shorter lines.

synchronizer.rb

Looks like the abstract method should be can_sync? with a question mark here.

Where to go from here

You probably noticed that Synchronizer and its specs are code that multiple plugins will use, and it would be bad to copy and paste them into each plugin you will create.

Instead, I think it’s time to split your plugin in half!

One that providers the Synchronizer base
One that providers the Synchronizer extended class for your specific provider

The idea here is when someone writes a new backup provider, they’ll first install the base plugin, then install the plugins for the providers of their choice

Naming is hard, so you should think carefully about what you want to call the two repos. It would be good if they followed a similar naming convention so that when people see lists of files (alphabetically) they are grouped together.

Finally a gotcha! currently plugins in DIscourse are loaded in alphabetical order. We plan on fixing this but for now, your base class needs to be the first one in alphabetical order if you want things to load properly.

So in summary

Split the plugin in half
Each should have their own readme
Install both and
- make sure the specs pass
- make sure the backups still work

Thanks!

Robin

kajatiger · July 26, 2017, 3:23pm

not done yet, but open for comments and reviews:
https://github.com/berlindiamonds/basic-sync-plugin

eviltrout · July 27, 2017, 2:08pm

I think you’re going to want to name it something more consistent due to the alphabetical order I mentioned above. For example:

discourse-sync-base
discourse-sync-googledrive
discourse-sync-dropbox
discourse-sync-etc…

Longer Term Plan

Now that the basic structure for the plugin is in place, I see the bigger picture for the project coming together:

Update the documentation for the google sync plugin with the latest screenshots and documentation. It is important that the installation instructions explain that the base plugin must be present first.
Add a directory of available sync plugins to the README of the base plugin. For now you can just link to your google drive one.
Fork @falco’s dropbox sync plugin, and update it to use the new Synchronizer base class. When ready, send him a pull request with the changes that he will review and merge in.
At this point you will have two sync providers with a common base class! Quite an accomplishment!
One thing we identified earlier as smelly code in the plugins to set up the after_create hook. Currently you have to use a class_eval to set them up which is not good. Would it be possible to use add_model_callback from the plugin api to do this? Please investigate and come up with a better API
Create a new pull request to discourse core for your new API. Once we merge it, update your plugin (and the dropbox plugin) to use the new API.
At this point, you should start on your next sync provider! Pick another one from the list, and assuming we’ve designed the API well, implement a synchronizer that works
Pick another sync provider and repeat

Topic		Replies	Views
Google Drive: External backup storage Feature	2	2160	August 21, 2017
Box: External backup storage Feature	2	1898	August 21, 2017
Discourse google drive Backup plugin Support	13	3175	August 4, 2023
Improving import and export support Feature	29	8376	October 8, 2014
File upload from google drive or dropbox Support	1	735	April 5, 2022