Schema.rb vs migrations

(Cameron Martin) #1

You shouldn’t run rake db:migrate on the initial deploy of the server, as this article suggests. Use rake db:setup instead. It loads the schema from schema.rb (and adds any seed data), which is faster and less error-prone than running every migration from scratch.

Discourse as Your First Rails App
(Sam Saffron) #2

What schema.rb file are you talking about? We have no schema.rb file. We had to give it up because there were migration features we needed that were missing from Rails 3; they may be there in Rails 4, I am not sure.

(Cameron Martin) #3

It is standard in Rails 4, which is the version in your Gemfile.lock. I don’t know why you wouldn’t want to use it. In fact, I would strongly recommend against not using it, as does the Rails team. Here is the comment from the top of schema.rb.

# This file is auto-generated from the current state of the database. Instead
# of editing this file, please use the migrations feature of Active Record to
# incrementally modify your database, and then regenerate this schema definition.
# Note that this schema.rb definition is the authoritative source for your
# database schema. If you need to create the application database on another
# system, you should be using db:schema:load, not running all the migrations
# from scratch. The latter is a flawed and unsustainable approach (the more migrations
# you'll amass, the slower it'll run and the greater likelihood for issues).
# It's strongly recommended that you check this file into your version control system.

Also read Active Record Migrations — Ruby on Rails Guides

(Sam Saffron) #4

I strongly disagree with this comment. If you use any database features that do not map to schema.rb, like fancy constraints, procs, or certain index optimisations, you are hosed. schema.rb only works if you define your schema using the subset exposed by ActiveRecord migrations. That may or may not be fine for your project. Maintaining a 1:1 mapping is impractical in the long run; there will always be missing bits.
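For example (a hypothetical sketch; the names are illustrative, not taken from Discourse): a migration can apply raw SQL that the Rails 4 migrations DSL has no vocabulary for, so schema.rb would silently omit it and a fresh db:schema:load would lose the constraint.

```ruby
# Hypothetical raw SQL a migration might run via execute.
# The regex CHECK below cannot be expressed in the migrations DSL,
# so it would not survive a round-trip through schema.rb.
CHECK_SQL = <<-SQL
  ALTER TABLE users
    ADD CONSTRAINT username_format
    CHECK (username ~ '^[A-Za-z][A-Za-z0-9]+$')
SQL

# Inside an actual Rails 4 migration this would be:
#   class AddUsernameFormatCheck < ActiveRecord::Migration
#     def up
#       execute CHECK_SQL
#     end
#   end
```

A developer who rebuilds their database from schema.rb instead of the migrations would end up with a users table that quietly accepts malformed usernames.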

Our migrations run fast enough, a couple of seconds. If we are totally freaking out about speed, I suggest freaking out about the asset pipeline precompilation phase first; it bites us hard daily and takes upwards of a minute on a Digital Ocean slice. If we design our migrations in a faulty way we will hurt our users badly, because they will end up with a version of Discourse they cannot upgrade.

I just cannot agree with that comment. (By the way, we do not check structure.sql into source control; that is very flawed.)

cc @rafaelfranca

(Cameron Martin) #5

Ah, I partially misread your previous comment.

By this, do you mean that you lose portability with other databases? If so, surely you already lose that when executing raw SQL in your migrations.
Or does the SQL dump (structure.sql) also fail to produce a 1:1 mapping when you use database features not present in the migrations DSL?

But I can see how it isn’t a problem if you are careful with your migrations.

(Sam Saffron) #6

Discourse is super duper tied to Postgres; we have no intention whatsoever of supporting any other database engine, because we make use of Postgres secret sauce.

The issue with structure.sql is that the paradigm is broken: the file is generated directly from whatever database the developer currently has. If the dev is experimenting with the db, or has a different version of Postgres installed, that file can change in some really bad ways.

So you end up having an “image” of an incorrect schema checked into your source control, which is super bad.

(Cameron Martin) #7

Awesome, glad we had this discussion.

As a side-note, what is the reasoning behind not committing the pre-compiled assets to git?

(Sam Saffron) #8

It’s complicated, because plugins can hook into the process. Ideally we can get to a stage where an automatic “robot” checks in portions of precompiled JavaScript that can then be reused by the outer precompilation process.

We already have the perfect spot to build this cache: discourse/assets.rake at master · discourse/discourse · GitHub

(Gabriel Mazetto) #9

Can’t schema_plus be of some help here?

(Sam Saffron) #10

Is it wrong to want to have features like this?

    CREATE DOMAIN dom_username AS TEXT
      CHECK (VALUE ~ '^[A-Za-z][A-Za-z0-9]+$');

    CREATE TABLE users (
      name  dom_username,
      email TEXT
    );

Postgres DDL is surprisingly expressive.
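For comparison, here is the same rule mirrored app-side in plain Ruby (a sketch; any actual Discourse validation may differ). The point of the Postgres domain is that the database enforces the rule no matter which code path writes the row, while an app-side check only guards one path.

```ruby
# App-side mirror of the dom_username check above (illustrative only).
# \A and \z anchor the whole string, matching the ^ and $ anchors
# in the Postgres regex.
USERNAME_FORMAT = /\A[A-Za-z][A-Za-z0-9]+\z/

def valid_username?(name)
  !!(USERNAME_FORMAT =~ name)
end
```

So "sam" passes, while "1sam" (leading digit) and "s" (the + requires at least two characters) are rejected, exactly as the domain constraint would reject them at the database layer.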

(Cameron Martin) #11

I suppose this rules out using NewSQL databases like FoundationDB for scaling once master-slave replication or sharding is no longer an option?

(Neil Lalonde) #12

So, redo our db schema for a NoSQL product? Probably not going to happen.

(Sam Saffron) #13

Yes, in my humble experience, designing a product to be database agnostic, while a lofty goal, is a compromise. It is not practical for us to develop that way; we want to take proper advantage of our platform, not restrict ourselves to a subset of it just because someone theoretically might want to run a different db engine.

We picked Postgres and are sticking with it. It’s going to take a super compelling argument to change our minds, not to mention the months and months of dev time.

(Cameron Martin) #14

NewSQL is not NoSQL. NewSQL databases are fully ACID-compliant distributed database engines, combining the scalability and fault tolerance of NoSQL with the ACID guarantees of traditional SQL databases. FoundationDB is strikingly similar to Google Spanner, and is probably the most compelling of the NewSQL databases. It is basically a transactional ordered key-value store, a model powerful enough to build a full SQL database on top of, which they implement through their SQL Layer (akin to Google F1).

But systems like this only become necessary when you have ridiculously high traffic.

I’m not suggesting that you redo the schema, I’m sure that you have good reasons for making the choices you do. I just find talking about this stuff interesting :smile:

(Neil Lalonde) #15

Oh you’re right. I was looking at “FoundationDB – a NoSQL Database with ACID Transactions” (their title). Haven’t looked at that one before.

(Cameron Martin) #16

What about write scalability? I see there are Postgres-XC and Postgres Plus, but I’ve never looked into them personally. Maybe you can shed more light on the matter.