Improving import and export support

Currently, we return a 503 when the site is in maintenance mode for import/export:

    def block_if_maintenance_mode
      if Discourse.maintenance_mode?
        if request.format.json?
          render status: 503, json: failed_json.merge(message: I18n.t('site_under_maintenance'))
        else
          render status: 503, file: File.join( Rails.root, 'public', '503.html' ), layout: false
        end
      end
    end

Ideally, all the json calls would handle this response and do… something. If you were composing something and the response is a 503, it can become a mess. What happens to the draft? If an export is happening, then it’s no big deal: wait and submit again (but wait how long?). If an import is happening, the topic you’re replying to could be gone. The 503 may need to return some extra fields so that the composer can give useful advice: “Please copy your post out of the composer and save it for later” or “The site will return soon. Please submit again in 5 minutes.”
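For example, the JSON branch above could carry a couple of extra fields for the composer to act on. This is only a sketch; the `reason` and `retry_after_seconds` keys are made up here, not an existing API:

    def block_if_maintenance_mode
      return unless Discourse.maintenance_mode?

      if request.format.json?
        # Hypothetical extra fields so the client can show useful advice.
        render status: 503, json: failed_json.merge(
          message: I18n.t('site_under_maintenance'),
          reason: 'import_in_progress',   # or 'export_in_progress'
          retry_after_seconds: 300        # "please submit again in 5 minutes"
        )
      else
        render status: 503, file: File.join(Rails.root, 'public', '503.html'), layout: false
      end
    end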

This is a difficult problem… I took a stab at it with the import adapters, but it’s a lot of work. Whenever someone adds a migration, they would often need to write an import adapter. Here’s an example:

    module Import
      module Adapter
        class RemoveSubTagFromTopics < Base

          register version: '20130116151829', tables: [:topics]

          def up_column_names(table_name, column_names)
            # remove_column :topics, :sub_tag
            if table_name.to_sym == :topics
              column_names.reject {|col| col == 'sub_tag'}
            else
              column_names
            end
          end

          def up_row(table_name, row)
            # remove_column :topics, :sub_tag
            if table_name.to_sym == :topics
              row[0..29] + row[31..-1]
            else
              row
            end
          end

        end
      end
    end

Yuck. A simple adapter like this for remove_column could probably be generated. But there must be a better solution.

The export file contains metadata about where it came from, including the last migration that was run on its database (ActiveRecord::Migrator.current_version). Could we create a database, migrate it to that version, run the import, and then migrate it to the latest version? Then we end up with two databases. They would need to be swapped or merged… Umm… Not sure.

3 Likes

I agree with @neil; that is an insane, out-of-scope requirement, to demand that old database versions be magically importable into the current version. I can’t even think of any other software that allows this; certainly WordPress does not.

Solution seems simple:

  1. Install correct version of software that matches the database version
  2. Import the database
  3. Upgrade the software and database to latest

So really what is needed is the ability to install arbitrary Discourse versions. (This also assumes people run numbered releases rather than latest; after v1 we will need to be strict about that.)

How about

  1. Swap current db to “old_schema”
  2. Migrate up to the point the db was at during export
  3. Import data
  4. Migrate again

I don’t see any real drama there. No need for an extra db.
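A rough sketch of those steps in Ruby might look like the following. It assumes the backup’s migration version has already been read from its metadata into `backup_version`; the schema and path names are illustrative, not actual Discourse code:

    conn = ActiveRecord::Base.connection

    # 1. Move the live data out of the way and start with an empty public schema.
    conn.execute("ALTER SCHEMA public RENAME TO old_schema")
    conn.execute("CREATE SCHEMA public")

    # 2. Recreate the schema as it looked at export time.
    ActiveRecord::Migrator.migrate("db/migrate", backup_version.to_i)

    # 3. Load the exported data here (e.g. feed dump.sql to psql).

    # 4. Bring everything up to the current schema.
    ActiveRecord::Migrator.migrate("db/migrate")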

2 Likes

Yeah that makes more sense. Worth a try!

Is there a wiki page, or sticky post somewhere that documents the current import/export format, or how to do an import/export?

Or is the source code itself the only source of this information?

I believe it is all possible via the admin web gui now. @zogstrip worked on this and we use it extensively.

I just installed the latest Discourse via Docker on an Amazon EC2 box.

I have two dummy posts, with a file upload.

I went to /admin, then Backups, and tried to create a backup.

However, it seemed to stall at “Waiting for 17 jobs…”, then errored out.

Full log is on Gist here:

(Hmm - seems my Gist link isn’t shown on the post? Is it some kind of parsing issue?)

gist.github.com/victorhooi/9339617

Any thoughts on what I did wrong?

It didn’t start the backup because Sidekiq had some jobs running.

Try visiting /sidekiq on your forum as an admin - what do you see?

Hmm, this is my Sidekiq admin:

Queues shows a single queue, “Default”, with no queued jobs.

However, the Backup job now refuses to start.

When I click the Backup button, nothing happens - it doesn’t put the forum in read-only mode, or anything like that.

Should I restart the docker box or something like that, to give it a kick? Or are there specific logs from the box that I can put here that would help diagnose what’s going on?

Also - this is a pretty small box, memory-wise - so could it simply be an issue with being memory constrained? There is only one user and two posts, though.

How much memory? 1GB is the minimum.

I slightly disagree - 1GB is the minimum if and only if you have at least 1GB of swap as well.

Anything less than 1GB RAM will be unable to run the forum, and anything less than 2GB RAM + swap will be unable to do upgrades or other ‘extra’ tasks.

So, really, 2GB ram + swap is the minimum.

This is an EC2 Micro - so around 600 MB of RAM - I figured it was enough for one user (me) and 2 posts.

It’s obviously not a prod setup or even QA - I only set it up because I was excited to check out the admin export feature you mentioned above (Improving import and export support).

Is it possible to verify that it’s a memory issue here? Or is there some other reason it’s seemingly in this confused state where the “Backup” button doesn’t do anything, and why the original job failed?

Are there any more detailed logs I can retrieve from the box to help troubleshoot?

Ok, I’ve put the output from ./launcher logs app on Gist here:

@riking - Aha, just saw your comment - wait - you replied after my last post? Lol. Ok, getting used to Discourse ordering =).

Hmm, so you’re saying Discourse is pretty memory-hungry, huh? Bugger. So it wouldn’t even be possible to play around with it on any box with less than 2GB?

I plan to have a “low mem” Discourse config that gives up on perf, but it’s still in the works in my head :smile:

1 Like

Hmm - sorry, I didn’t quite follow the sentence - do you mean some kind of warning sentence on the /admin page, just like we do for ImageMagick, missing Facebook tokens etc.?

Or something else?

No, any box with less than 2GB mem + swap combined.

I’m running one on a $10/mo DigitalOcean cloud server, which is 1GB of ram, but I also set up a 2GB swapfile.

1 Like

Ok, so I have it set up now on a higher-specced box, and I was able to run the backup =).

I downloaded the backup tar.gz file, and extracted it. Inside, I have:

  • dump.sql
  • meta.json

Just to clarify - the meta.json file has a timestamp as the version number - does this mean I need to match the exact same git commit in order to do a successful import/export? Is this going to be the case all the way until 1.0?

And the dump.sql - that’s just a straight PostgreSQL dump, right? So there isn’t really a specific Discourse import/export or backup format per se; it’s just a dump of the database.

Hmm, my original purpose in testing this feature was to see how I might import another forum (www.lefora.com) into Discourse.

If I want to do that - do I need to generate a dump.sql file exactly like this, including all the DDL statements, in order for Discourse to import it? There’s no easier way/format?

I set up a DigitalOcean box with low memory and enough swap to muck around, and I ran into the problem where running backups or restores from the admin panel would fail with ‘Waiting for sidekiq to finish running jobs…’ and multiple lines of ‘Waiting for 1 jobs…’. I don’t think it has anything to do with memory. I tried to resize the DO droplet and increase memory to 2GB, with no help.

Turns out that even though the (default) queue was empty, Sidekiq had one email worker from my registration email that I needed to remove for it to run. I didn’t set up email for this install since I just wanted to play around and not use the install for anything other than testing. After clicking ‘Clear workers list’ it ran fine even with 512MB memory (+ swap, of course).

2 Likes

The meta.json file currently only holds the current database version (which is the timestamp when the last migration was created). This value is used during the restore to make sure you’re not importing a newer version of the database without migrating first.
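In other words, a restore only has to compare that value against the running schema before doing anything destructive. A minimal sketch of that check, assuming the field is called version (the real restore code is more involved):

    require 'json'

    meta = JSON.parse(File.read('meta.json'))
    backup_version  = meta['version'].to_i                  # e.g. 20130116151829
    current_version = ActiveRecord::Migrator.current_version.to_i

    # Refuse to restore a dump made on a newer schema than this site is running.
    if backup_version > current_version
      raise "Backup schema (#{backup_version}) is newer than this site (#{current_version}); migrate first."
    end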

Not exactly.

We do use the standard pg_dump command to generate the dump, but then we add a slight modification.

In order to limit the amount of downtime during the restoration, we make sure to restore the backup into the restore schema instead of the standard public schema.

This allows us to limit the downtime to the time it takes to switch the public schema to the backup schema, and the restore schema to the public schema :wink:
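The swap itself boils down to a couple of schema renames. A minimal sketch with illustrative schema names (the actual backup/restore code handles many more edge cases):

    conn = ActiveRecord::Base.connection
    conn.transaction do
      conn.execute("DROP SCHEMA IF EXISTS backup CASCADE")   # discard any previous backup schema
      conn.execute("ALTER SCHEMA public RENAME TO backup")   # keep the old data around
      conn.execute("ALTER SCHEMA restore RENAME TO public")  # promote the freshly restored data
    end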

You will need to write custom code for that. You may want to take a look at one of the existing import scripts for inspiration.
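To give a rough idea of what that custom code involves, a toy importer could map the old forum’s records onto Discourse’s own classes. Everything below is illustrative only (made-up source data, generated email addresses); the existing import scripts are the real reference:

    # Run from the Discourse root, e.g. with `rails runner my_import.rb`.
    old_posts = [
      { username: 'alice', title: 'Hello from the old forum', raw: 'First post body' }
    ]

    old_posts.each do |p|
      user = User.find_by_username(p[:username]) ||
             User.create!(username: p[:username],
                          email: "#{p[:username]}@example.com",
                          password: SecureRandom.hex)

      # PostCreator builds the topic and its first post in one go.
      PostCreator.create(user, title: p[:title], raw: p[:raw], skip_validations: true)
    end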

Unfortunately no. Welcome to a world of pain :rage1:

Hi,

Have there been any changes/updates on the Discourse import/migration front?

Or is improving import/export support still on the roadmap somewhere?

There was talk about this Discourse migration service releasing some of their code as open source; however, that doesn’t appear to have eventuated.

Is anybody aware of any up-to-date migration/import tools for Discourse?