Tuning Ruby and Rails for Discourse

performance

(Sam Saffron) #1

Here are some quick raw notes I would like to flesh out to a blog post, eventually.

Been noticing performance on the front page of Discourse is rather bad, after digging in with MiniProfiler I notices lots of GCs are hit for a single page view. I ran some tests in production with a few settings and wanted to share them with you.

What is going on?

A strong indicator you are seeing too many GCs are huge gaps in MiniProfiler

An even stronger one is running MiniProfiler with ?pp=profile-gc-time

(4 stop the world GCs for a single request)

###Installs tested

  • rvm install 1.9.3-p385
  • rvm install 1.9.3-p385 --patch https://github.com/funny-falcon/ruby/compare/p385...p385_falcon.diff
  • rvm install 2.0.0

note this is an in-progress post, feel free to comment, but it will grow quite a bit as I flesh it out.


###Environments testing

  • default
  • RUBY_GC_MALLOC_LIMIT=90000000 (GC)
  • LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 (GC+TC)
RAILS_ENV=production bundle exec thin start -p 3111

RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 bundle exec thin start -p 3111

RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 bundle exec thin start -p 3111

(tried a bunch of other “recommended” optimisations, most of them made little to no diff)

###Test rig

Production: Intel® Xeon® CPU E3-1280 V2 @ 3.60GHz

###Test performed

Results for: ab -n 100 -c 1 http://localhost:3111/
first run discarded

###rvm use 1.9.3-p385

RAILS_ENV=production bundle exec thin start -p 3111

                min  mean[+/-sd] median   max
Default:        209  260  15.3    263     297
GC     :         84  135  32.1    139     205
GC+TC  :         76  124  27.1    131     178

###rvm install 1.9.3-p385 --patch falcon

              min  mean[+/-sd] median   max
Default:        72  131  49.1    124     540
GC     :        70  120  18.9    126     158
GC+TC:          67  121  40.4    116     395

###rvm use 2.0.0

              min  mean[+/-sd] median   max
Default:       119  158  25.3    154     211
GC    :         65  120  36.1    118     221
GC+TC :         67  113  20.0    114     175

###What is the deal with RUBY_GC_MALLOC_LIMIT

This is one of the most poorly documented bits of optimisation out there:

Out-of-the box Ruby forces a GC every time 8 megs of memory is allocated this is insanely low, our front page caused about 32megs of allocation thus causing 4 GCs. bumping this up to allowing 90megs of allocation before forcing a GC means there is much less pressure of the GC. We are able to handle 3 or so requests without a GC.

As a rule we want the number large, but not too large, if its too big we will bleed way too much memory and GC will take forever. If its too small, like it is out-of-the-box we will have multiple GCs per page which is horrible on so many levels.

###What is the impact of raising RUBY_GC_MALLOC_LIMIT to 90 million ?

On our test rig, resident memory raised from 116Megs to 193Megs for the GC setting alone and up to 211Megs when combined with tcmalloc

###What do you recommend?

I have 2 recommendations depending on your stance:

  • Conservative: All conservative installs should use RUBY_GC_MALLOC_LIMIT=90000000 , its a no-brainer and the increased memory usage is well worth the extra perf, you get a 48% decrease in median request time just by doing that.

  • Bleeding Edge: I would recommend using Ruby 2.0 allocating memory using tcmalloc. You get a 57% decrease in median request time using that.

I can no longer recommend falcon patches due to the fact most of the patches are in 2.0 anyway, embrace the future.

###Issues with this test

The test was performed on a CPU that has Turbo Boost enabled. This means that clock speed varies based on load. It made it more complicated to get consistent test runs, that said I repeated all the tests multiple times and it was run on our production boxes.

###Why are we eating so much memory per request?

The easiest thing to answer here is what kind of memory we are consuming: (adding ?pp=profile-gc)

230 THOUSAND strings allocated during one request

It is going to take us a while to work through this to find the actual culprits, though quite a few of the strings allocated over and over again do seem to be coming from Active Record, so focusing on cutting the amount of querying is a sure way to reduce string allocations during a request life cycle.

###What about Rails 4?

We need to re-test this under rails 4 to see if its faster or slower and determine if string allocations have gone up or down.


When should we upgrade to Rails 4?
We are running Ruby 2.0 in production
The Discourse Servers
How can we speed up our test suite?
(Jeff Atwood) #2

Pretty cool, where did this 90000000 number come from?

We have 32gb per server so memory is unlikely to be a problem, but what are the current memory use profiles and how would this change impact memory?


(Sam Saffron) #3

Austin powers?

This is one of the most poorly documented bits of optimisation out there:

Out-of-the box Ruby forces a GC every time 8 megs of memory is allocated this is insanely low, our front page caused about 32megs of allocation thus causing 4 GCs. bumping this up to allowing 90megs of allocation before forcing a GC means there is much less pressure of the GC. We are able to handle 3 or so requests without a GC.

As a rule we want the number large, but not too large, if its too big we will bleed way too much memory and GC will take forever. If its too small, like it is out-of-the-box we will have multiple GCs per page which is horrible on so many levels.


Now wtf are we doing allocating 32megs of memory on a single page request is an entirely different and also fascinating question to answer.


(Sam Saffron) #4

FYI our thins are sitting at about 170-190MB resident, adding another 90meg there is really of very little significance, we could probably bump it even higher.


(Anthony Alberto) #5

Wow, that’s some valuable benchmark!
So switching to Ruby 2.0 looks like a great improvement, I’ll have to give it a try. No issues with Rails 3.2.12?
I saw there was a 3.2.13.rc1 released yesterday that includes fixes for Ruby 2.0 …


(tenderlove) #6

Are you able to reproduce these allocations on OS X? If so (since you’re able to use Ruby 2.0), you should be able to find the allocations pretty easily with DTrace. Use this script (I’m writing this from memory, so it may need tweaks):

ruby$target:::string-create {
  @_[copyinstr(arg1), arg2] = count();
}

Start your Rails process, run a few request, then attach to the pid with DTrace, run another request, and kill DTrace. You should end up with aggregate counts of file name / line number as the key, and number of string allocations done at that location.


(Sam Saffron) #7

Will try, give me a few days, very very sleep deprived at the moment. I would love it if we can get some of this improved in Rails 4.


(Sam Saffron) #8

results are here:

"~/.rvm/gems/ruby-2.0.0-p0/bundler/gems/MiniProfiler-d149f34fcdb6/Ruby/lib/patches/sql_patches.rb" 250L, 7628C
gems/activesupport-3.2.12/lib/active_support/time_with_zone.rb 326 210
gems/activesupport-3.2.12/lib/active_support/inflector/transliterate.rb 92 212
gems/rack-1.4.5/lib/rack/request.rb 26 214
gems/activesupport-3.2.12/lib/active_support/core_ext/string/output_safety.rb 184 259
gems/activesupport-3.2.12/lib/active_support/inflector/methods.rb 316 262
gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 26 272
gems/activesupport-3.2.12/lib/active_support/multibyte/unicode.rb 273 300
gems/activerecord-3.2.12/lib/active_record/connection_adapters/column.rb 227 360
gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 52 408
gems/activesupport-3.2.12/lib/active_support/time_with_zone.rb 115 420
gems/tzinfo-0.3.35/lib/tzinfo/time_or_datetime.rb 42 540
gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 665 606
gems/activerecord-3.2.12/lib/active_record/relation.rb 26 936
gems/activerecord-3.2.12/lib/active_record/relation.rb 27 1170
gems/activesupport-3.2.12/lib/active_support/inflector/methods.rb 313 1179
gems/activerecord-3.2.12/lib/active_record/base.rb 514 1471
gems/activerecord-3.2.12/lib/active_record/model_schema.rb 162 1471
gems/activerecord-3.2.12/lib/active_record/inheritance.rb 68 2942
gems/activesupport-3.2.12/lib/active_support/callbacks.rb 81 2950
gems/activerecord-3.2.12/lib/active_record/attribute_methods/read.rb 72 3624
gems/activesupport-3.2.12/lib/active_support/callbacks.rb 414 8850
bundler/gems/MiniProfiler-d149f34fcdb6/Ruby/lib/patches/sql_patches.rb 69 45816
gems/activerecord-3.2.12/lib/active_record/result.rb 30 52781

The biggest offenders according to the probes appear to be:

  # this MiniProfiler monkey patch
 def values(*args, &blk)
      return values_without_profiling(*args, &blk) unless @miniprofiler_sql_id

  # and this AR method
  def hash_rows
      @hash_rows ||= @rows.map { |row|
        # this string allocation 
        Hash[@columns.zip(row)]
      }
    end

One method AR could get rid of this double allocation would be to nuke the underlying hash in favor of a real object ™, cache some sort of deserializer.

A HUGE packet of these string allocations could go away if pg supported a mode where it returned coerced types as opposed to strings.

I wonder if the MiniProfiler monkey patch should really be attributed as the “responsible party” for the allocation, the allocation happened actually a fair bit deeper in the cruby extension.


(tenderlove) #9

Hey Sam, thanks for the info.

I guess we fixed the hash thing on master, but it wasn’t backported to 3-2-stable. I’ve backported it here:

Freeze columns only once per Result · rails/rails@d2b9862 · GitHub

I’m actively working on the callbacks stuff, but it isn’t as simple. :frowning:

I’ll try to put this backport patch in the next 3-2 bugfix release.


(Sam Saffron) #10

I confirm this fix removes that huge number of allocations, it is a huge perf boost for AR.

A very big reason for the excessive allocations is us, for example we run:

SELECT  "topics".* FROM "topics"  WHERE ("topics"."deleted_at" IS NULL) AND (topics.archetype <> 'private_message') ORDER BY CASE WHEN topics.category_id IS NULL and topics.pinned THEN 0 ELSE 1 END, topics.bumped_at DESC LIMIT 30

We are not limiting down to the actual columns we need, this table has 38 columns, so that means the pg gem is allocating at least double the amount of strings we need.

The natural thing is to add a .select there, but my concern is that if we forget a column the failure mode is very problematic, stuff will not “explode” it will fail silently pretending the missing columns are nil. I wonder if we should add some special mode in dev to assist here. Regardless, we will fix this.

Additionally, we have an N+1 query hanging off category.

I think I will add a mode to MiniProfiler that allows you to see object creation count per query, that will allow people to attack the biggest offenders.

The very frustrating thing about this topic table is that only 2 out of the 38 columns are actually strings.

I profiled the simple:

(ran it twice before in production to ensure it was warmed up), and got:

Topic.limit(30).to_a

Amount of string allocations still feels excessive (even after the patch - I patched local)

File / Line Number / Allocations.
-------------------------------------
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/abstract_adapter.rb 276 1
/gems/activerecord-3.2.12/lib/active_record/log_subscriber.rb 4 1
/gems/activerecord-3.2.12/lib/active_record/log_subscriber.rb 8 1
/gems/activerecord-3.2.12/lib/active_record/querying.rb 38 1
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 53 1
/gems/activerecord-3.2.12/lib/active_record/scoping/default.rb 119 1
/gems/arel-3.0.2/lib/arel.rb 43 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 126 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 131 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 135 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 139 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 192 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 200 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 283 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 339 1
test.rb    4 1
test.rb    7 1
/gems/activerecord-3.2.12/lib/active_record/relation/query_methods.rb 268 2
/gems/activerecord-3.2.12/lib/active_record/scoping/default.rb 123 2
/gems/arel-3.0.2/lib/arel.rb 39 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 134 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 136 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 187 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 282 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 366 2
test.rb    6 2
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/abstract/connection_specification.rb 149 3
/gems/activesupport-3.2.12/lib/active_support/log_subscriber.rb 91 3
/gems/activerecord-3.2.12/lib/active_record/scoping.rb 130 4
/gems/activerecord-3.2.12/lib/active_record/scoping.rb 126 5
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 14 6
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 26 8
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 52 12
/gems/activerecord-3.2.12/lib/active_record/base.rb 514 30
/gems/activerecord-3.2.12/lib/active_record/model_schema.rb 162 30
/gems/activerecord-3.2.12/lib/active_record/relation.rb 26 32
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 665 38
/gems/activerecord-3.2.12/lib/active_record/result.rb 33 38
/gems/activerecord-3.2.12/lib/active_record/relation.rb 27 40
/gems/activerecord-3.2.12/lib/active_record/inheritance.rb 68 60
/gems/activesupport-3.2.12/lib/active_support/callbacks.rb 81 60
/gems/activesupport-3.2.12/lib/active_support/callbacks.rb 414 180
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 605 910

(Sam Saffron) #11

More profiling (both warmed up in prd)

100.times do
Topic.limit(30).to_a
end

vs.

100.times do
ActiveRecord::Base.connection.raw_connection.async_exec("select * from topics limit 30").values
end

sampler:

ruby*:::cmethod-entry
{
  k = copyinstr(arg0);
  m = copyinstr(arg1);
}

tick-5000hz
/k != 0/
{
  @[k, m] = count();
}

AR

 Hash   each   72
 String  hash   82
 Kernel  freeze  87
 PG::Result       ftype  88
 Thread  []    89
 Module  name   94
 Array   zip   107
 Array   each  120
 Thread  current 123
 Class   allocate       131
 Fixnum  abs   136
 String  intern 147
 Hash   delete 150
 Kernel  instance_variable_set 183
 Hash   []=   195
 String  to_s  195
 String  initialize      237
 String  to_sym 238
 Kernel  respond_to?      247
 Fixnum  to_s  286
 Class   superclass      349
 PG::Connection     async_exec      354
 PG::Result       values 383
 #<Class:0x007ff6e8b9e510>       relation       399
 Module  ===   415
 Kernel  block_given?     450
 Kernel  hash  512
 Kernel  class  517
 Hash   []   566
 Symbol  to_s  586
 Hash   key?  6810

Raw

 Mutexd current 1
 Thread [] 1
 Mutex unlock 3
 Module name 8
 Thread current 13
 PG::Result values 78
 PG::Connection async_exec 266

My particular concern here is all these extraneous calls to .key? , seems the sampler is really spending much time there.

These key? calls are all coming form

  2 143160      vm_call_cfunc:cmethod-entry Hash key? /Users/sam/.rvm/gems/ruby-2.0.0-p0/gems/activerecord-3.2.12/lib/active_record/locking/optimistic.rb 174
  2 143160      vm_call_cfunc:cmethod-entry Hash key? /Users/sam/.rvm/gems/ruby-2.0.0-p0/gems/activerecord-3.2.12/lib/active_record/attribute_methods/serialization.rb 66

(Jack James) #12

Having just been directed here from this:

Most of the above discussion went completely over my head. What steps do I need to take to fix the problem?


(Régis Hanol) #13

According to admin_dashboard_data.rb you have to define the RUBY_GC_MALLOC_LIMIT environment variable.

@sam recommends it to be 90000000.


(Robin Ward) #14

Maybe we should provide a better link/doc? This topic is definitely for pros :slight_smile:


(Jack James) #15

ok so I’ve added RUBY_GC_MALLOC_LIMIT=90000000 to my .env and then re-exported to upstart (which works!)

Thanks!


(Kevin P. Fleming) #16

This is needed in the process(es) where ‘thin’ is running, I assume, not clockwork or sidekiq.


(funny_falcon) #18

(mentioned memory leak in a falcon.patch were fixed a week ago)


(Neil Lalonde) #19

A better link would be nice, but I don’t know of one. Most of those admin dashboard warnings need useful links. I’m just adding the text right now, and will add better links over time.


(Sam Saffron) #20

I will try to TLDR my post to fix this up … the topic is good imho


(czbond) #21

Out of curiosity, for your home page, why not strip it out of views (erb/haml) since Rails takes so much overhead to construct a view? We create a static homepage builder with assets in s3.

Absolutely great findings and tests. Thanks.