We’ve officially kicked off the Rails Girls Summer of Code 2017 with our group, berlin diamonds.
They’re going to be working on adding extra backup providers to Discourse. This topic can be used by the teams involved for questions and answers, as well to track progress.
Discourse has always provided administrators the ability to back up their community data. It is important to us as an open source project that regardless of where you run your forum, that you be able to take your data with you.
Currently, we support downloading data dumps to your local computer as well as uploading data to Amazon S3. There is also a plugin for uploading to Dropbox available that could be used as an example.
The team has decided to focus on a backup provider for Google Drive. Here’s how I’d suggest tackling this problem, although the team is free to approach this on their own schedule and pace:
Become very familiar with the dropbox plugin for backups. Try installing it locally and using it to connect to Dropbox. @Falco is also around to answer questions you might have about how it works and how it is designed.
Review how Google Drive’s API works. What is required to upload a file there? Are there any rubygems you can use to contact the API easily?
Create a new repository for the Google Drive backup provider, and try to build it up from the simplest possible thing that could work. Use Site Settings for any API keys and variables. When it doubt, base it on the Dropbox plugin.
Discourse will provide code reviews and offer assistance along the way.
The plugin is working!
Once we’re done the above we can figure out following steps for more providers, and perhaps DRYing up the plugins to avoid repeated code.
@Falco Can I install the dropbox plugin without using the docker container? I would like to just clone it into the discourse/plugins folder…
Yes, just cloning it into the plugins folder should do the trick!
To be more specific, you can do something like this locally:
$ cd discourse/plugins
$ git clone firstname.lastname@example.org:xfalcox/discourse-backups-to-dropbox.git
$ cd ..
$ rm -rm tmp
$ bundle exec rails server
Then the plugin will be installed in
Hi! This is Jen, and together with @kajatiger will be working on creating the google-drive plugin during this Rails Girls Summer of Code, happy to meet the community of Discourse!
hi @Falco! should we be able to do backups from dev? We keep getting a “backup failed” with a long bunch of errors in our log.
2017-07-06 13:32:48] EXCEPTION: Failed to archive uploads.
tar: Option --warning=no-file-changed is not supported
Looks to me like the version of the tar program you are running is missing some features we need. What OS are you running? If this is a mac can you try
brew install tar
We’ve solved the issue installing a linux-compatible version of tar for OSX:
$ brew install coreutils
$ brew install gnu-tar --with-default-names
Here is a little update on what we have been doing so far:
- we looked at the drop-box plugin and learned that it overwrites the after_create_hook method inside the backup.rb model with a Backup.class_eval
- we used the same method to hook into the backup in discourse
- we realized that the GoogleDrive OAuth uses a complex way to get the access_token for a certain user with his account, so
- we decided to use another way of authorization: the GoogleDrive Service Account (https://github.com/gimite/google-drive-ruby/blob/master/doc/authorization.md look at the 3rd option here)
- we successfully sent our backup files to our service account in the development mode
- we pushed our changes to a repository (https://github.com/berlindiamonds/discourse-googledrive-backup)
- we started writing tests with Rspec
It’s kind of working, but still needs a lot of improvement. Please feel free to comment, criticize, suggest and ask
We’ve created the first version of the Google Drive backup plugin. It works with a service account from Google Drive. How to use it it’s described in the README file
This is the link to the repository:
We are open to suggestions / improvements, please let us know! It’s been a great experience to solve this little “plugin” puzzle!
If you add a line to the plugin.rb file, you can have a link to the plugins “home” in the Installed Plugins list in the Admin - Plugins panel. eg.
# name: discourse-backup-to-googledrive
# about: -
# version: 1.0
# authors: Kaja & Jen
# url: https://github.com/berlindiamonds/discourse-googledrive-backup
Wow you got that working fast, so great job! I do think we should should spend a little time making it more awesome before moving on to another backup provider. I know you based this on @falco’s work and that’s a great start, but I’d like to see us move towards something that is more reusable and easier to test.
This class does all the work currently in a
sync class method, including finding the backups to synchronize and then copying them up to Google Drive.
I think it would be better if
DriveSynchronizer used more principals of object oriented programming, such as instance variables. I was thinking it might be nice
DriveSynchronizer was responsible for uploading one backup to google drive at a time, since they can’t be uploaded in bulk anyway. What if it worked like this?
ds = DriveSynchronizer.new(backup)
It would also be good if it had a method called
can_sync? that would return
false if the file can be synchronized.
Finally I think it would be good to have a reader method to return the
backup that the synchronizer is for.
I’ve created a pull request for your repo that adds a bunch of failing tests for the behavior I’ve described above.
You can consider this a form of test driven development, where the specification is the tests and the exercise is to make your code pass them
To run the tests, once you’ve accepted the pull request, from your discourse directory do this:
bundle exec rake plugin:spec["discourse-googledrive-backup"]
You should see 5 test failures. Your goal will be to get all those passing. In the process I hope you’ll learn how tests work in Discourse and how I’d like to see the class designed. As always feel free to post questions here on meta or in our slack channel
It’s looking good! Here’s what I’d suggest working on next:
self.sync is a class method. It should be converted to an instance method which will use the
@backup instance variable and sync it to the destination.
SyncBackupsToDrive job should be updated to use the class. You will need to move the logic that finds the recent backups and put it there. The job should loop through the backups, and create a
DriveSynchronizer for each backup and call
.sync on it
We should be calling
.sync, but thinking ahead shouldn’t this be automatic? I suggest you make a
protected method of
perform_sync with the old contents of
sync, then have the
sync method call
perform_sync, but only if
can_sync? is true.
At this point, you might have noticed that our
DriveSynchronizer class has some generic functionality that might be reusable! For example the constructor and
sync will never change. You should create a new base class that takes a backup and has perform_sync method, and then have
DriveSynchronizer extend it.
Your new base class should be tested! Try to come up wth a test that makes sure that
perform_sync is only called when
can_sync is true.
The test of the
backup reader can be moved to the base class test too.
As always, feel free to ask for questions/feedback
Do you think, that the
@backup variable should store the value that we are now calling
local_backup_files? Or is it the output of
Backup.new? Or is it the result of the whole synchronizer? Right now it is only an argument, I don’t know what we need to test it for…
We chatted about this offline, but
@backup should be an instance of the
Backup class that represents one
Backup to sync. The synchronizer class will then ask that object for its path and use it to synchronize to the remote provider.
I’ve just reviewed the dev branch and have the following very minor notes:
- The line of code is far too long. You should change to the
do form of
each so you can split it up into a few shorter lines.
- Looks like the abstract method should be
can_sync? with a question mark here.
Where to go from here
You probably noticed that
Synchronizer and its specs are code that multiple plugins will use, and it would be bad to copy and paste them into each plugin you will create.
Instead, I think it’s time to split your plugin in half!
- One that providers the Synchronizer base
- One that providers the Synchronizer extended class for your specific provider
The idea here is when someone writes a new backup provider, they’ll first install the base plugin, then install the plugins for the providers of their choice
Naming is hard, so you should think carefully about what you want to call the two repos. It would be good if they followed a similar naming convention so that when people see lists of files (alphabetically) they are grouped together.
Finally a gotcha! currently plugins in DIscourse are loaded in alphabetical order. We plan on fixing this but for now, your base class needs to be the first one in alphabetical order if you want things to load properly.
So in summary
- Split the plugin in half
- Each should have their own readme
- Install both and
- make sure the specs pass
- make sure the backups still work
I think you’re going to want to name it something more consistent due to the alphabetical order I mentioned above. For example:
Longer Term Plan
Now that the basic structure for the plugin is in place, I see the bigger picture for the project coming together:
Update the documentation for the google sync plugin with the latest screenshots and documentation. It is important that the installation instructions explain that the base plugin must be present first.
Add a directory of available sync plugins to the README of the base plugin. For now you can just link to your google drive one.
Fork @falco’s dropbox sync plugin, and update it to use the new
Synchronizer base class. When ready, send him a pull request with the changes that he will review and merge in.
At this point you will have two sync providers with a common base class! Quite an accomplishment!
One thing we identified earlier as smelly code in the plugins to set up the
after_create hook. Currently you have to use a
class_eval to set them up which is not good. Would it be possible to use
add_model_callback from the plugin api to do this? Please investigate and come up with a better API
Create a new pull request to discourse core for your new API. Once we merge it, update your plugin (and the dropbox plugin) to use the new API.
At this point, you should start on your next sync provider! Pick another one from the list, and assuming we’ve designed the API well, implement a synchronizer that works
Pick another sync provider and repeat