Prior to launch, @neil spent a fair amount of time building an import/export system for Discourse. It has significant advantages over a standard database backup:
- It allows you to import a backup while the site is running
- It has robust rollback in case anything goes wrong
- It uses a single bundle which also contains uploads
Unfortunately, we have not promoted this feature well enough, and most Discourse site owners are not aware it exists.
As we near version 1.0, I feel we need to refocus and “finish off” this feature. Here is a basic list of things that should be done prior to 1.0:
1. @codinghorror, we need to decide on terminology: what are things called? Backup vs. Export, Restore vs. Import? We need clear terminology to describe this feature.
2. We need an admin UI capable of:
   - Exporting / backing up a site
   - Viewing available local backups
   - Downloading available backups (to be expanded)
   - Removing backup sets
   - Restoring both external (uploaded) and local backups
   - Displaying progress and logs of backup / restore operations
3. We need the backend to support restoring from “earlier” backup sets. At the moment, the site being restored must run the same version as the site that was backed up, which heavily limits the usefulness of this feature.
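Supporting earlier backups implies that each backup records which schema it was taken from. Below is a minimal sketch, assuming the backup metadata stores the latest migration timestamp as an integer (a hypothetical format), of how a restore could pick a strategy: restore directly, restore and then run pending migrations, or refuse a backup that is newer than the running code. `RestorePlan` and its return values are illustrative names, not existing Discourse code.

```ruby
# Hypothetical sketch: choose a restore strategy from the schema version
# stored in the backup. The version format (latest migration timestamp as
# an Integer) and all names here are assumptions for illustration.
module RestorePlan
  # Returns :restore or :restore_then_migrate; raises if the backup is newer.
  def self.choose(backup_version:, current_version:)
    if backup_version == current_version
      # Same schema: the dump can be loaded as-is.
      :restore
    elsif backup_version < current_version
      # Older backup: load it, then run the pending migrations to bring
      # the schema up to what the running code expects.
      :restore_then_migrate
    else
      # Newer backup: the running code knows nothing about that schema.
      raise ArgumentError,
            "backup schema #{backup_version} is newer than #{current_version}; upgrade first"
    end
  end
end
```

Refusing newer backups outright keeps the rule simple: migrations only ever run forward, so the only supported path is "restore, then migrate up".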
After version 1.0 of the feature, we can consider:
- Scheduling backups using the admin UI
- Scheduling uploads of backups to S3 or other external services.
### Why this feature is so important
Today, most VPS providers offer a built-in backup and restore function. This makes it easy to back up and restore in place, but it ties you tightly to the VPS provider you initially chose. What if you decide to move from Amazon to Digital Ocean, for example? You could easily spend a few hours figuring out backup and restore.
Having a “kick ass” backup/restore story gives the Discourse community the flexibility to move to another server without an expensive and complicated process. It also allows existing communities to easily migrate to our recommended Docker setup.
This feature also greatly helps people extending Discourse, since they can easily download copies of the sites they are working on and use them locally.
In my proposed implementation, there are two technical hurdles:
### Securely downloading backups
Leaving a backup in a /public/ folder (even with a hash obfuscating its name) is a big security hole: if anyone gets hold of the URL, you are in trouble. To download these potentially large files, we have three options:
1. Implement Rack middleware that hijacks the connection and streams the file to an authenticated admin
2. Use nginx X-Sendfile support (http://wiki.nginx.org/XSendfile) to send the files to authenticated users
3. A less secure, “obfuscated” public link that times out
For v1, I think we should implement options 2 (X-Sendfile) and 3 (the timed public link), with option 2 as the default and the choice configurable in site settings.
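To make options 2 and 3 concrete, here is a minimal sketch under stated assumptions. For the timed link, the filename and expiry are signed with HMAC-SHA256 so the link can neither be forged nor extended. For X-Sendfile, the app (after checking that the user is an admin) returns an `X-Accel-Redirect` header, which is nginx's X-Sendfile equivalent, and nginx streams the file from a location marked `internal`. The class name, the `/backups/` URL and the `/internal_backups/` nginx location are hypothetical.

```ruby
require "openssl"

# Hypothetical helper for serving backup downloads. BackupDownload, the
# "/backups/" URL and the "/internal_backups/" nginx location are
# illustrative assumptions, not existing Discourse routes.
class BackupDownload
  SAFE_NAME = /\A[\w.-]+\z/ # no "/", so a name cannot escape the backup dir

  def initialize(secret:, clock: -> { Time.now.to_i })
    @secret = secret
    @clock = clock
  end

  # Option 3: a public link that stops working after `ttl` seconds.
  def signed_path(filename, ttl: 15 * 60)
    raise ArgumentError, "bad filename" unless safe?(filename)
    expires = @clock.call + ttl
    "/backups/#{filename}?expires=#{expires}&sig=#{sign(filename, expires)}"
  end

  def valid_link?(filename, expires, sig)
    return false unless safe?(filename)
    return false if expires.to_i < @clock.call
    secure_compare(sign(filename, expires.to_i), sig.to_s)
  end

  # Option 2: call only after the admin check has passed. nginx intercepts
  # X-Accel-Redirect and serves the file from an `internal` location.
  def accel_response(filename)
    raise ArgumentError, "bad filename" unless safe?(filename)
    headers = {
      "Content-Type" => "application/octet-stream",
      "Content-Disposition" => %(attachment; filename="#{filename}"),
      "X-Accel-Redirect" => "/internal_backups/#{filename}"
    }
    [200, headers, []] # Rack response triplet; nginx supplies the body
  end

  private

  def safe?(filename)
    filename.match?(SAFE_NAME) && !filename.start_with?(".")
  end

  def sign(filename, expires)
    OpenSSL::HMAC.hexdigest("SHA256", @secret, "#{filename}:#{expires}")
  end

  # Constant-time comparison so the signature cannot be guessed byte by byte.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end
```

With option 2 the Ruby process never reads the (potentially multi-gigabyte) file; it only authorizes the request. The timed link trades that authorization for a short window in which anyone holding the URL can download.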
### Where to run the job and how to report progress
Backup and restore jobs can take a while to run. During this time, we need to tell users that “something is happening” and let them cancel. If the job runs in a web worker, Unicorn will kill it once the request exceeds its timeout. If you run it in a background thread, you may have trouble reaching it when the load balancer sends your status HTTP request to a different front end.
To resolve this, we can either put Sidekiq into a “single job” mode, hand the job to Sidekiq and have it publish progress to the Message Bus, or do the same thing from a background thread or forked process.
Personally, I would like backup and restore to work even if, for whatever reason, Sidekiq is not running, since that lets you easily migrate off problematic setups. My preference is either a fork, like the one in upgrader.rb in discourse/docker_manager on GitHub, or a background thread.
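The background-thread variant can be sketched as follows. This is a minimal sketch, assuming the work is split into named steps: the runner pushes progress through a callback (standing in for a Message Bus publish, so any front end can relay status without reaching the thread itself), checks a cancel flag between steps, and refuses to start a second operation while one is running. `OperationRunner` and the event hash shapes are hypothetical.

```ruby
# Hypothetical sketch: run backup/restore steps in a background thread.
# The publish callback stands in for a Message Bus publish; the class name
# and event hashes are assumptions for illustration.
class OperationRunner
  class Busy < StandardError; end

  def initialize(&publish)
    @publish = publish
    @lock = Mutex.new
    @thread = nil
    @cancel = false
  end

  def running?
    @lock.synchronize { !@thread.nil? && @thread.alive? }
  end

  # steps: array of [label, callable] pairs, executed in order.
  def start(steps)
    @lock.synchronize do
      # Only one operation at a time; the UI disables actions meanwhile.
      raise Busy, "another operation is running" if @thread&.alive?
      @cancel = false
      @thread = Thread.new { run(steps) }
    end
  end

  # Cancellation takes effect before the next step starts.
  def cancel!
    @lock.synchronize { @cancel = true }
  end

  def wait
    thread = @lock.synchronize { @thread }
    thread&.join
  end

  private

  def cancelled?
    @lock.synchronize { @cancel }
  end

  def run(steps)
    steps.each_with_index do |(label, step), i|
      if cancelled?
        @publish.call(status: "cancelled", step: label)
        return
      end
      @publish.call(status: "running", step: label, done: i, total: steps.size)
      step.call
    end
    @publish.call(status: "finished")
  rescue StandardError => e
    @publish.call(status: "failed", error: e.message)
  end
end
```

A forked process would follow the same shape, but the child would have to publish progress itself (for example to the Message Bus), because it shares no memory with the parent.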
### UI concerns
Once we have the terminology sorted, I would like a new tab for this functionality, visible only to admins.
It should list all available backups and have a sub-tab for logs, which are populated during backup / restore. For v1, we do not need an accurate progress bar with an estimated completion time. However, we must clearly communicate that a background job is running and disable all other operations while it runs.
For a restore, we should not tell users to visit another tab to put the site into maintenance mode; instead, simply present them with a bootbox confirmation dialog. We should always keep a backup of the “previous good” state when doing a restore, in case someone makes a mistake and restores the wrong thing.
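The “previous good” rule amounts to: back up first, restore, and roll back to that snapshot if the restore fails. Below is a minimal sketch, assuming the backup, restore and maintenance-mode operations can be passed in as callables; `SafeRestore` and its keyword names are hypothetical, not Discourse APIs.

```ruby
# Hypothetical sketch of a restore that always snapshots the current site
# first and rolls back to that snapshot on failure. The injected callables
# and the class name are assumptions for illustration.
class SafeRestore
  def initialize(backup:, restore:, enter_maintenance:, exit_maintenance:)
    @backup = backup
    @restore = restore
    @enter_maintenance = enter_maintenance
    @exit_maintenance = exit_maintenance
  end

  # Returns the "previous good" snapshot so the UI can offer it later,
  # e.g. if the admin realizes they restored the wrong archive.
  def call(archive)
    @enter_maintenance.call
    previous_good = @backup.call
    begin
      @restore.call(archive)
    rescue StandardError
      @restore.call(previous_good) # roll back to the state before the restore
      raise
    end
    previous_good
  ensure
    # Leave maintenance mode whether the restore succeeded or not.
    @exit_maintenance.call
  end
end
```

Keeping the snapshot even after a successful restore covers the “restored the wrong thing” case, which no automatic failure check can detect.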
@zogstrip will be working on this feature.
Let us know if you have any feedback or need clarification.