Improving import and export support

Prior to launch @neil spent a fair amount of time working on an import/export system for Discourse. It has significant advantages over a standard db backup.

  • It allows you to import a backup while the site is running
  • It has robust rollback in case anything goes wrong
  • It uses a single bundle which also contains uploads

Unfortunately we have not promoted this feature well enough and most Discourse site owners are not aware of its existence.

As we near version 1.0 I feel we need to refocus and “finish off” this feature. Here is a basic list of things that should be done prior to 1.0.

  • @codinghorror we need to decide on terminology: what are things called? Backup vs. Export, Restore vs. Import? We need clear terminology describing this feature.

  • We need an admin UI capable of:

    • Exporting / Backing up a site
    • Viewing available local backups
    • Download of available backups (to be expanded)
    • Removal of backup sets
    • Restore of external (uploaded) backups and local backups
    • Displaying progress and logs of the backup / restore operations
  • We need the backend to support restoring from “earlier” backup sets; at the moment, the site being restored must be running the same version as the site that was backed up, which heavily limits the usefulness of this feature.

After version 1.0 of the feature we can consider

  • Scheduling backups using the admin UI
  • Scheduling uploads of backups to S3 or other external services.

### Why this feature is so important

Today most VPSs have a built-in backup and restore function. This makes it easy to back up and restore locally, but it ties you very tightly to the initial VPS you chose. What if you decide to move from Amazon to Digital Ocean, and so forth? You are easily looking at a few hours of work figuring out backup and restore.

Having a “kick ass” backup/restore story gives the Discourse community the flexibility to move to another server without an expensive and complicated process. It also allows existing communities to easily migrate to our recommended Docker setup.

This feature also heavily assists people extending Discourse, as you can easily download copies of the sites you are working on and use them locally.

In my proposed implementation there are 2 technical hurdles:

#### Securely downloading backups

Having a backup hanging around in a /public/ folder (even with a hash obfuscating it) is a big security hole. If anyone gets hold of the URL you are in trouble. So when dealing with downloading these potentially big files we have 3 options:

  1. Implement Rack middleware that hijacks the connection and streams the file to an authenticated admin
  2. Implement http://wiki.nginx.org/XSendfile to send the files to authenticated users
  3. A less secure, “obfuscated” public link that times out

For v1 I think we should implement 2 and 3, 2 being the default but configurable with site settings.
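To make option 2 concrete, here is a minimal sketch, assuming a hypothetical admin-only controller and an nginx internal location (none of these names are the actual Discourse implementation): Rails does the authentication, then hands the heavy lifting to nginx via the X-Accel-Redirect header.

    class Admin::BackupsController < Admin::AdminController
      def show
        # File.basename strips any directory components, so a crafted filename
        # can't escape the backups directory.
        filename = File.basename(params[:filename].to_s)

        # nginx serves the file from an "internal" location, e.g.
        #   location /downloads/backups/ {
        #     internal;
        #     alias /var/www/discourse/public/backups/;
        #   }
        # so the URL is unreachable unless this authenticated action sets the header.
        response.headers['X-Accel-Redirect'] = "/downloads/backups/#{filename}"
        response.headers['Content-Type'] = 'application/gzip'
        head :ok
      end
    end

Rails can achieve the same thing with `send_file` once `config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'` is set, which would keep the controller free of nginx-specific headers.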

#### Where to run the job, getting progress

Backup and restore jobs can take a while to run. During this time we need to tell users “something is happening” and allow them to cancel. If the job runs in the web worker, unicorn will go ahead and nuke it. If you run it in a background thread, you may have trouble reaching it when the load balancer sends your status HTTP call to another front end.

To resolve this we can either place Sidekiq in a “single job” mode, kick the job to Sidekiq and have it pump progress into the Message Bus, or do the same thing in a background thread or forked process.

Personally, I would like backup and restore to work even if, for whatever reason, Sidekiq is not running. That allows you to easily migrate off problem setups. My preference is either a fork, like docker_manager/upgrader.rb at master · discourse/docker_manager · GitHub, or a background thread.
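A minimal sketch of the fork approach, assuming a hypothetical `BackupRestore.backup!` helper that yields progress and an illustrative Message Bus channel name (this is not the actual implementation): the child process does the work and publishes progress, so whichever front end serves the admin’s next request can stream the log to the UI.

    def start_backup
      pid = fork do
        begin
          # hypothetical helper that yields progress as it works
          BackupRestore.backup! do |step, percent|
            MessageBus.publish("/admin/backups/logs", step: step, percent: percent)
          end
          MessageBus.publish("/admin/backups/logs", done: true)
        rescue => e
          MessageBus.publish("/admin/backups/logs", error: e.message)
        ensure
          exit!(0) # never fall back into the parent's request handling
        end
      end
      Process.detach(pid) # the child reports via the Message Bus, not the return value
    end

The admin UI would then subscribe to that channel and append log lines as they arrive; the same publishing code works unchanged if we later decide to run the job in Sidekiq or a background thread instead.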

#### UI concerns

Once we get terminology sorted, I would like a new tab for this functionality (only visible to admins).

It should display all the available backups and have a sub-tab for logs (which are populated during backup / restore). We do not need an accurate progress bar with estimated completion time for v1, but we must clearly communicate that a background job is running and disable all operations during this process.

We should not tell users to visit another tab (to move the site into maintenance mode) for a restore; instead, simply present them with a bootbox. We should always keep a backup of the “previous good” state when doing a restore, in case someone makes a mistake and restores the wrong thing.


@zogstrip will be working on this feature.

Let us know if you have any feedback or need clarification.

11 Likes

When a backup/restore is running, we should disallow any writes on the system:

  • How do we alert users? Using a banner at the top of every page?
  • Should we prevent them from opening the composer?
  • What about admin settings?
  • What about users signing up?
1 Like

Be sure to review @neil’s existing implementation; I suspect you will need few to no changes in that department. He very wisely uses a backup schema during the import process, so it’s a non-problem there.

Disabling writes globally can be a challenge. @neil already added the maintenance flag to site settings; we just need a cleaner way of checking it and to ensure it is checked in all the proper places. A key thing would be to ensure our key pages (topic / user / list) don’t issue any updates / inserts, and only trigger them via background jobs. Then we can pretty much shut down the rest of the site during this process.

This particular area can quickly become a time sink, so be careful. We want the simplest thing that can possibly work for v1. I don’t want corrupt backups; they are a problem. But perhaps the import process can be robust enough to handle a handful of dangling records.

The absolute simplest implementation for v1, which is my recommendation:

During export:

  • Disable Sidekiq jobs (using the site setting) and ensure none are currently running
  • Reroute all routes except for the export page to a “site is in maintenance mode” page (a rough sketch follows below).

During import you don’t really need to do anything.
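As a rough sketch of that rerouting, assuming an illustrative whitelist of controller paths and reusing the `Discourse.maintenance_mode?` flag quoted further down in this topic (not the actual implementation):

    class ApplicationController < ActionController::Base
      before_filter :reroute_if_in_maintenance

      # endpoints that must keep working while the backup runs (illustrative names)
      MAINTENANCE_WHITELIST = %w(admin/backups exceptions)

      def reroute_if_in_maintenance
        return unless Discourse.maintenance_mode?
        return if MAINTENANCE_WHITELIST.include?(controller_path)

        render status: 503, file: File.join(Rails.root, 'public', '503.html'), layout: false
      end
    end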

1 Like

How are we going to handle uploads of such large files? Browsers are notoriously bad at sending files past a certain size.

I agree that automatically uploading to S3 (or other storage APIs like Dropbox) should wait until the next release, but using third-party storage does solve some of those uploading problems. At the very least it should be kept in mind for when we do choose to implement it.

I like the idea of a forked process working even when the site is in maintenance mode. Are forks copy-on-write? I ask because in resource-constrained environments, such as low-end Digital Ocean slices, people use up a lot of their memory on processes. Could creating a new one cause them to go over the limit and swap?

Actually, that brings up another point: how efficient is the import/export from a memory POV? If it uses GBs of RAM, it’s not going to work in memory-constrained environments without swapping a lot.

2 Likes

Yes, this can be a pain point at a certain scale; I guess we have a few options here:

  1. Split up large backups into multiple files
  2. Allow for backups that exclude images (and just contain image urls)
  3. Ensure our Docker setup of nginx (and the nginx sample) properly supports resuming GETs (range requests).

Yes, totally; since forks are copy-on-write, you would share a nice ~50% of memory with the master process. That said, we need to properly review the export / import code to ensure it uses SAX techniques to parse and a stream-based generator.
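On the generator side, a minimal sketch of what “stream based” means here, using the pg gem’s COPY streaming (illustrative only; the real export format is @neil’s, not this): rows go straight from Postgres into a gzipped file one line at a time, so memory stays flat no matter how big the table is.

    require 'zlib'

    def stream_table_to_file(table, path)
      raw = ActiveRecord::Base.connection.raw_connection # a PG::Connection

      Zlib::GzipWriter.open(path) do |gz|
        # `table` comes from our own schema list, never from user input
        raw.copy_data("COPY #{table} TO STDOUT") do
          while (line = raw.get_copy_data)
            gz.write(line) # one row at a time, nothing buffered in Ruby
          end
        end
      end
    end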

1 Like

I think the best solution will ultimately be a “Choose…” button that allows you to pick one of your backups from S3 or Dropbox. I know, I know, after v1.0, but it honestly solves this problem in a really nice way.

Great to hear about COW, that makes me feel better.

Currently, we return a 503 when the site is in maintenance mode for import/export:

    def block_if_maintenance_mode
      if Discourse.maintenance_mode?
        if request.format.json?
          render status: 503, json: failed_json.merge(message: I18n.t('site_under_maintenance'))
        else
          render status: 503, file: File.join( Rails.root, 'public', '503.html' ), layout: false
        end
      end
    end

Ideally, all the JSON calls will handle this response and do… something. If you were composing something and the response is a 503, it can become a mess. What happens to the draft? If an export is happening, then it’s no big deal: wait and submit again (but wait how long?). If an import is happening, the topic you’re replying to could be gone. The 503 may need to return some extra fields so that the composer can give useful advice, such as “Please copy your post out of the composer and save it for later” or “The site will return soon. Please submit again in 5 minutes.”
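One possible shape for those extra fields, extending the render call quoted above (purely illustrative; `BackupRestore.operation` is a hypothetical helper):

    render status: 503, json: failed_json.merge(
      message: I18n.t('site_under_maintenance'),
      operation: BackupRestore.operation,  # hypothetical: :backup or :restore
      retry_after_seconds: 300             # lets the composer say “try again in ~5 minutes”
    )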

This is a difficult problem… I took a stab at it with the import adapters, but it’s a lot of work. Whenever someone adds a migration, they would often need to write an import adapter. Here’s an example:

    module Import
      module Adapter
        class RemoveSubTagFromTopics < Base

          register version: '20130116151829', tables: [:topics]

          def up_column_names(table_name, column_names)
            # remove_column :topics, :sub_tag
            if table_name.to_sym == :topics
              column_names.reject { |col| col == 'sub_tag' }
            else
              column_names
            end
          end

          def up_row(table_name, row)
            # remove_column :topics, :sub_tag
            if table_name.to_sym == :topics
              row[0..29] + row[31..-1]
            else
              row
            end
          end

        end
      end
    end

Yuck. A simple adapter like this for remove_column could probably be generated. But there must be a better solution.

The export file contains the metadata about where it came from, including the last migration that was run on its database (ActiveRecord::Migrator.current_version). Could we create a database, migrate it to that version, run the import, and then migrate it to the latest version? Then we end up with two databases. They would need to be swapped or merged… Umm… Not sure.

3 Likes

I agree with @neil; it is an insane, out-of-scope requirement to demand that old database versions be magically importable into the current version. I can’t even think of any other software that allows this; certainly WordPress does not.

Solution seems simple:

  1. Install correct version of software that matches the database version
  2. Import the database
  3. Upgrade the software and database to latest

So really what is needed is the ability to install arbitrary Discourse versions. (This also assumes people do not run latest but only numbered releases; after v1 we will need to be strict about that.)

How about

  1. Swap current db to “old_schema”
  2. Migrate up to the point the db was at during export
  3. Import data
  4. Migrate again

I don’t see any real drama there. No need for an extra db.
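A rough sketch of steps 2–4, assuming the backup metadata records `ActiveRecord::Migrator.current_version` at export time (as mentioned above) and a hypothetical `restore_data_from` step:

    # version the source site was at when the backup was taken
    target_version = metadata['last_migration_version'].to_i

    ActiveRecord::Migrator.migrate('db/migrate', target_version) # schema exactly as it was at export
    restore_data_from(backup)                                    # hypothetical data-load step
    ActiveRecord::Migrator.migrate('db/migrate')                 # then migrate the rest of the way up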

2 Likes

Yeah that makes more sense. Worth a try!

Is there a wiki page, or sticky post somewhere that documents the current import/export format, or how to do an import/export?

Or is the source code itself the only source of this information?

I believe it is all possible via the admin web GUI now. @zogstrip worked on this and we use it extensively.

I just installed the latest Discourse via Docker on an Amazon EC2 box.

I have two dummy posts, with a file upload.

I went to /admin, then Backups, and tried to create a backup.

However, it seemed to stall at “Waiting for 17 jobs…”, then errored out.

Full log is on Gist here:

(Hmm - it seems my Gist link isn’t shown in the post? Is it some kind of parsing issue?)

gist.github.com/victorhooi/9339617

Any thoughts on what I did wrong?

It didn’t start the backup because Sidekiq had some jobs running.

Try visiting /sidekiq on your forum as an admin - what do you see?

Hmm, this is my Sidekiq admin:

Queues shows a single queue, “Default”, with no queued jobs.

However, the Backup job now refuses to start.

When I click the Backup button, nothing happens - it doesn’t put the forum in read-only mode, or anything like that.

Should I restart the docker box or something like that, to give it a kick? Or are there specific logs from the box that I can put here that would help diagnose what’s going on?

Also - this is a pretty small box, memory-wise - so could it simply be an issue with being memory constrained? There is only one user and two posts, though.

How much memory? 1GB is the minimum.

I slightly disagree - 1GB is the minimum if and only if you have at least 1GB of swap as well.

Anything less than 1GB RAM will be unable to run the forum, and anything less than 2GB RAM + swap will be unable to do upgrades or other ‘extra’ tasks.

So, really, 2GB RAM + swap is the minimum.

This is an EC2 Micro - so around 600 MB of RAM - I figured it was enough for one user (me) and 2 posts.

It’s obviously not a prod setup or even QA - I only set it up because I was excited to check out the admin export feature you mentioned above (Improving import and export support).

Is it possible to verify it’s a memory issue here? Or is there some other reason it’s seemingly in this confused state where the “Backup” button doesn’t do anything, or why the original job failed?

Are there any more detailed logs I can retrieve from the box to help troubleshoot?

Ok, I’ve put the output from ./launcher logs app on Gist here:

@riking - Aha, just saw your comment - wait - you replied after my last post? Lol. Ok, getting used to Discourse ordering =).

Hmm, so you’re saying Discourse is pretty memory hungry, huh? Bugger. So it wouldn’t even be possible to play around with it on any box with less than 2GB?

I plan to have a “low mem” Discourse config that gives up on perf, but it’s still in the works in my head :smile:

1 Like