Rails Girls Summer of Code 2017: Backup Providers

Given that a discourse-sync-amazonfoo is likely to be wanted at some time, maybe not using that exact naming structure.

discourse-sync-base
discourse-sync-to-amazon-glacier
discourse-sync-to-googledrive

4 Likes

@Falco I am trying right now to adjust the dropbox plugin to our base class. for that i just took out the loop and structured the synchronizer more as a subclass. I dont know if it will work like this now… Jen also wanted to change the job and loop the synchronizer over each backup there. but if you already have suggestions here or something that will conflict with the dropbox API please feel free to criticize
https://github.com/berlindiamonds/discourse-backups-to-dropbox/tree/perform

7 Likes

Currently the sync:

  1. get’s the last N local backups

  2. check if they aren’t already on Dropbox

  3. upload the missing ones

  4. Delete backups who aren’t local anymore from Dropbox

That’s a lot of responsability :sweat:

Your new model, of one instance of the class handling each backup is way better.

However now steps 2 and 4 logic must be somewhere else.

Did you handle deletion of old backups on the Google Drive plugin? We should use the same pattern here.

7 Likes

Hi @Falco!

Thanks for the answer, just today we’ve talked about implementing the same logic of checking for existent files in Google Drive and only upload those that are not present :slight_smile:

The num of backups is now handled by the class SyncBackupsToDropbox (in the Job), that sends one backup to the dropbox synchronizer. I see two options to deal with existent files:

  1. Check in the DropboxSynchronizer class if the backup file exists. Upload it if it doesn’t exist, or ignore if it exists.

  2. Check in the class SyncBackupsToDropbox (in the Job): before sending the backup file to the DropboxSynchronizer. Then we should look for a way to access the provider’s space from here.


  • Delete backups who aren’t local anymore from Dropbox <<<< we totally missed on this, thanks for pointing, seems totally reasonable since storages are not bottomless :smile:
6 Likes

Hello Falco!
Thanks for your feedback :slight_smile:
In our DriveSynchronizer I added a method called remove_old_files and its first draft looks like this:

def remove_old_files
  google_files = files.collection_by_title
  local_files  = Backup.all.map(&:filename)
    google_files.each do |d|
      unless local_files.include?(d)
        d.delete
     end
   end
end

Now I wonder, if this is the right amount of files to keep in our GoogleDrive clowd and what kind of limit or measurement makes sense here. Some people might want to keep more files than just the ones that Discourse has locally…? Where to set the right measurements? Should there be a setting or question to the admin of how many files she wants to keep in the clowd? How do you decide that?

3 Likes

Hello :slight_smile:

I’ve worked on the perform_sync method in the dropbox_synchronizer by @Falco .
The plugin is working with the base class that @kajatiger created and it uploads only files that don’t exist already in Dropbox.

This is my proposal:

https://github.com/berlindiamonds/discourse-sync-to-dropbox/blob/master/lib/dropbox_synchronizer.rb

I feel like perform_sync / upload_unique_files / upload / chunked_upload are a collection of Matryoshka dolls because they’re successively calling the next method from within… I don’t even know if this is a bad practice? :smiley:

We still need to include a method that deletes the oldest files from dropbox.

Open for changes/feedback, cheers!

2 Likes

The current plugin uses the site setting discourse_backups_to_dropbox_quantity, so it will try to sync the most recent N backup files, where N is defined by the setting.

However, this has some flaws.

discourse_backups_to_dropbox_quantity can’t be greater than Discourse core setting maximum backups.

To me, something better would be to select how many backups you want to store on Dropbox, independently of how many you have on the local file system.

So you will need to perform the following in the remove_old_files method:

  1. List all files on the remote folder

  2. Order this files by creation date

  3. Delete all files except the most N recent ones, where N is a plugin setting.

5 Likes

hello!
my method looks like this now for google drive:

def remove_old_files
  google_files = session.files
  sorted = google_files.sort_by {|x| x.created_time}
  keep = sorted.take(SiteSetting.discourse_sync_to_googledrive_quantity)
  trash = google_files - keep
  trash.each { |d| d.delete(true) }
end

But google drive doesn’t delete the files.
I posted this problem to stack overflow because I couldn’t find any similar problem and nobody knew what the problem was.
I will look into the dropbox sync now for the same issue…

1 Like

That is quite unusual. I wonder if there’s a bug in the API gem you’re using?

It will be very interesting to see if you have the same issue with Dropbox. I suspect you won’t!

with dropbox it works completely different. I find it harder to understand the API

Today is a holiday in Canada but maybe we can discuss tomorrow and go through the pain points of the API.

2 Likes

One thing we discussed recently in slack is the ability for the Backup class to add a hook that doesn’t overwrite the current hook.

What I think you should do is use DiscourseEvent.

If you look through the code, you’ll find examples of how DiscourseEvent.trigger work. You should add events.

What I suggest is, alongside the old after_create_hook and after_remove_hook you trigger a new DiscourseEvent.

Then your plugin can listen for the event using DiscourseEvent.on and handle the backup.

5 Likes