We’ve officially kicked off the Rails Girls Summer of Code 2017 with our group, berlin diamonds.
They’re going to be working on adding extra backup providers to Discourse. This topic can be used by the teams involved for questions and answers, as well to track progress.
Background
Discourse has always provided administrators the ability to back up their community data. It is important to us as an open source project that regardless of where you run your forum, that you be able to take your data with you.
Currently, we support downloading data dumps to your local computer as well as uploading data to Amazon S3. There is also a plugin for uploading to Dropbox available that could be used as an example.
First Steps
The team has decided to focus on a backup provider for Google Drive. Here’s how I’d suggest tackling this problem, although the team is free to approach this on their own schedule and pace:
Become very familiar with the dropbox plugin for backups. Try installing it locally and using it to connect to Dropbox. @Falco is also around to answer questions you might have about how it works and how it is designed.
Review how Google Drive’s API works. What is required to upload a file there? Are there any rubygems you can use to contact the API easily?
Create a new repository for the Google Drive backup provider, and try to build it up from the simplest possible thing that could work. Use Site Settings for any API keys and variables. When it doubt, base it on the Dropbox plugin.
Discourse will provide code reviews and offer assistance along the way.
The plugin is working!
Once we’re done the above we can figure out following steps for more providers, and perhaps DRYing up the plugins to avoid repeated code.
Hi! This is Jen, and together with @kajatiger will be working on creating the google-drive plugin during this Rails Girls Summer of Code, happy to meet the community of Discourse!
Looks to me like the version of the tar program you are running is missing some features we need. What OS are you running? If this is a mac can you try brew install tar
We’ve created the first version of the Google Drive backup plugin. It works with a service account from Google Drive. How to use it it’s described in the README file
This is the link to the repository:
We are open to suggestions / improvements, please let us know! It’s been a great experience to solve this little “plugin” puzzle!
Wow you got that working fast, so great job! I do think we should should spend a little time making it more awesome before moving on to another backup provider. I know you based this on @falco’s work and that’s a great start, but I’d like to see us move towards something that is more reusable and easier to test.
DriveSynchronizer
This class does all the work currently in a sync class method, including finding the backups to synchronize and then copying them up to Google Drive.
I think it would be better if DriveSynchronizer used more principals of object oriented programming, such as instance variables. I was thinking it might be nice DriveSynchronizer was responsible for uploading one backup to google drive at a time, since they can’t be uploaded in bulk anyway. What if it worked like this?
ds = DriveSynchronizer.new(backup)
ds.sync
It would also be good if it had a method called can_sync? that would return true or false if the file can be synchronized.
Finally I think it would be good to have a reader method to return the backup that the synchronizer is for.
I’ve created a pull request for your repo that adds a bunch of failing tests for the behavior I’ve described above.
You can consider this a form of test driven development, where the specification is the tests and the exercise is to make your code pass them
To run the tests, once you’ve accepted the pull request, from your discourse directory do this:
You should see 5 test failures. Your goal will be to get all those passing. In the process I hope you’ll learn how tests work in Discourse and how I’d like to see the class designed. As always feel free to post questions here on meta or in our slack channel
It’s looking good! Here’s what I’d suggest working on next:
self.sync is a class method. It should be converted to an instance method which will use the @backup instance variable and sync it to the destination.
The SyncBackupsToDrive job should be updated to use the class. You will need to move the logic that finds the recent backups and put it there. The job should loop through the backups, and create a DriveSynchronizer for each backup and call .sync on it
We should be calling can_sync? before .sync, but thinking ahead shouldn’t this be automatic? I suggest you make a protected method of perform_sync with the old contents of sync, then have the sync method call perform_sync, but only ifcan_sync? is true.
At this point, you might have noticed that our DriveSynchronizer class has some generic functionality that might be reusable! For example the constructor and sync will never change. You should create a new base class that takes a backup and has perform_sync method, and then have DriveSynchronizer extend it.
Your new base class should be tested! Try to come up wth a test that makes sure that perform_sync is only called when can_sync is true.
The test of the backup reader can be moved to the base class test too.
As always, feel free to ask for questions/feedback
Do you think, that the @backup variable should store the value that we are now calling local_backup_files? Or is it the output of Backup.new? Or is it the result of the whole synchronizer? Right now it is only an argument, I don’t know what we need to test it for…
We chatted about this offline, but @backup should be an instance of the Backup class that represents one Backup to sync. The synchronizer class will then ask that object for its path and use it to synchronize to the remote provider.
I’ve just reviewed the dev branch and have the following very minor notes:
sync_backups_to_drive.rb
The line of code is far too long. You should change to the do form of each so you can split it up into a few shorter lines.
synchronizer.rb
Looks like the abstract method should be can_sync? with a question mark here.
Where to go from here
You probably noticed that Synchronizer and its specs are code that multiple plugins will use, and it would be bad to copy and paste them into each plugin you will create.
Instead, I think it’s time to split your plugin in half!
One that providers the Synchronizer base
One that providers the Synchronizer extended class for your specific provider
The idea here is when someone writes a new backup provider, they’ll first install the base plugin, then install the plugins for the providers of their choice
Naming is hard, so you should think carefully about what you want to call the two repos. It would be good if they followed a similar naming convention so that when people see lists of files (alphabetically) they are grouped together.
Finally a gotcha! currently plugins in DIscourse are loaded in alphabetical order. We plan on fixing this but for now, your base class needs to be the first one in alphabetical order if you want things to load properly.
Now that the basic structure for the plugin is in place, I see the bigger picture for the project coming together:
Update the documentation for the google sync plugin with the latest screenshots and documentation. It is important that the installation instructions explain that the base plugin must be present first.
Add a directory of available sync plugins to the README of the base plugin. For now you can just link to your google drive one.
Fork @falco’s dropbox sync plugin, and update it to use the new Synchronizer base class. When ready, send him a pull request with the changes that he will review and merge in.
At this point you will have two sync providers with a common base class! Quite an accomplishment!
One thing we identified earlier as smelly code in the plugins to set up the after_create hook. Currently you have to use a class_eval to set them up which is not good. Would it be possible to use add_model_callback from the plugin api to do this? Please investigate and come up with a better API
Create a new pull request to discourse core for your new API. Once we merge it, update your plugin (and the dropbox plugin) to use the new API.
At this point, you should start on your next sync provider! Pick another one from the list, and assuming we’ve designed the API well, implement a synchronizer that works