#1 By: Sam Saffron, February 27th, 2013 07:27
Here are some quick raw notes I would like to flesh out to a blog post, eventually.
Been noticing performance on the front page of Discourse is rather bad, after digging in with MiniProfiler I notices lots of GCs are hit for a single page view. I ran some tests in production with a few settings and wanted to share them with you.
What is going on?
A strong indicator you are seeing too many GCs are huge gaps in MiniProfiler

An even stronger one is running MiniProfiler with ?pp=profile-gc-time

(4 stop the world GCs for a single request)
Installs tested
note this is an in-progress post, feel free to comment, but it will grow quite a bit as I flesh it out.
Environments testing
- default
- RUBY_GC_MALLOC_LIMIT=90000000 (GC)
- LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 (GC+TC)
RAILS_ENV=production bundle exec thin start -p 3111
RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 bundle exec thin start -p 3111
RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 bundle exec thin start -p 3111
(tried a bunch of other "recommended" optimisations, most of them made little to no diff)
Test rig
Production: Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz
Test performed
Results for: ab -n 100 -c 1 http://localhost:3111/
first run discarded
rvm use 1.9.3-p385
RAILS_ENV=production bundle exec thin start -p 3111
min mean[+/-sd] median max
Default: 209 260 15.3 263 297
GC : 84 135 32.1 139 205
GC+TC : 76 124 27.1 131 178
rvm install 1.9.3-p385 --patch falcon
min mean[+/-sd] median max
Default: 72 131 49.1 124 540
GC : 70 120 18.9 126 158
GC+TC: 67 121 40.4 116 395
rvm use 2.0.0
min mean[+/-sd] median max
Default: 119 158 25.3 154 211
GC : 65 120 36.1 118 221
GC+TC : 67 113 20.0 114 175
What is the deal with RUBY_GC_MALLOC_LIMIT
This is one of the most poorly documented bits of optimisation out there:
github.com
#ifndef GC_MALLOC_LIMIT
#define GC_MALLOC_LIMIT 8000000
#endif
github.com
static void
aligned_free(void *ptr)
{
#if defined __MINGW32__
Out-of-the box Ruby forces a GC every time 8 megs of memory is allocated this is insanely low, our front page caused about 32megs of allocation thus causing 4 GCs. bumping this up to allowing 90megs of allocation before forcing a GC means there is much less pressure of the GC. We are able to handle 3 or so requests without a GC.
As a rule we want the number large, but not too large, if its too big we will bleed way too much memory and GC will take forever. If its too small, like it is out-of-the-box we will have multiple GCs per page which is horrible on so many levels.
What is the impact of raising RUBY_GC_MALLOC_LIMIT to 90 million ?
On our test rig, resident memory raised from 116Megs to 193Megs for the GC setting alone and up to 211Megs when combined with tcmalloc
What do you recommend?
I have 2 recommendations depending on your stance:
Conservative: All conservative installs should use RUBY_GC_MALLOC_LIMIT=90000000 , its a no-brainer and the increased memory usage is well worth the extra perf, you get a 48% decrease in median request time just by doing that.
Bleeding Edge: I would recommend using Ruby 2.0 allocating memory using tcmalloc. You get a 57% decrease in median request time using that.
I can no longer recommend falcon patches due to the fact most of the patches are in 2.0 anyway, embrace the future.
Issues with this test
The test was performed on a CPU that has Turbo Boost enabled. This means that clock speed varies based on load. It made it more complicated to get consistent test runs, that said I repeated all the tests multiple times and it was run on our production boxes.
Why are we eating so much memory per request?
The easiest thing to answer here is what kind of memory we are consuming: (adding ?pp=profile-gc)

230 THOUSAND strings allocated during one request
It is going to take us a while to work through this to find the actual culprits, though quite a few of the strings allocated over and over again do seem to be coming from Active Record, so focusing on cutting the amount of querying is a sure way to reduce string allocations during a request life cycle.
What about Rails 4?
We need to re-test this under rails 4 to see if its faster or slower and determine if string allocations have gone up or down.
#2 By: Jeff Atwood, February 27th, 2013 15:35
Pretty cool, where did this 90000000 number come from?
We have 32gb per server so memory is unlikely to be a problem, but what are the current memory use profiles and how would this change impact memory?
#3 By: Sam Saffron, February 28th, 2013 05:06
Austin powers?
This is one of the most poorly documented bits of optimisation out there:
github.com
#ifndef GC_MALLOC_LIMIT
#define GC_MALLOC_LIMIT 8000000
#endif
github.com
if ((ruby_gc_stress && !ruby_disable_gc_stress) ||
(malloc_increase+size) > malloc_limit) {
garbage_collect_with_gvl(objspace);
}
Out-of-the box Ruby forces a GC every time 8 megs of memory is allocated this is insanely low, our front page caused about 32megs of allocation thus causing 4 GCs. bumping this up to allowing 90megs of allocation before forcing a GC means there is much less pressure of the GC. We are able to handle 3 or so requests without a GC.
As a rule we want the number large, but not too large, if its too big we will bleed way too much memory and GC will take forever. If its too small, like it is out-of-the-box we will have multiple GCs per page which is horrible on so many levels.
Now wtf are we doing allocating 32megs of memory on a single page request is an entirely different and also fascinating question to answer.
#4 By: Sam Saffron, February 28th, 2013 05:29
FYI our thins are sitting at about 170-190MB resident, adding another 90meg there is really of very little significance, we could probably bump it even higher.
#5 By: Anthony Alberto, February 28th, 2013 08:17
Wow, that's some valuable benchmark!
So switching to Ruby 2.0 looks like a great improvement, I'll have to give it a try. No issues with Rails 3.2.12?
I saw there was a 3.2.13.rc1 released yesterday that includes fixes for Ruby 2.0 ...
#6 By: tenderlove, February 28th, 2013 13:15
Are you able to reproduce these allocations on OS X? If so (since you're able to use Ruby 2.0), you should be able to find the allocations pretty easily with DTrace. Use this script (I'm writing this from memory, so it may need tweaks):
ruby$target:::string-create {
@_[copyinstr(arg1), arg2] = count();
}
Start your Rails process, run a few request, then attach to the pid with DTrace, run another request, and kill DTrace. You should end up with aggregate counts of file name / line number as the key, and number of string allocations done at that location.
#7 By: Sam Saffron, February 28th, 2013 16:50
Will try, give me a few days, very very sleep deprived at the moment. I would love it if we can get some of this improved in Rails 4.
#8 By: Sam Saffron, March 3rd, 2013 07:54
results are here:
"~/.rvm/gems/ruby-2.0.0-p0/bundler/gems/MiniProfiler-d149f34fcdb6/Ruby/lib/patches/sql_patches.rb" 250L, 7628C
gems/activesupport-3.2.12/lib/active_support/time_with_zone.rb 326 210
gems/activesupport-3.2.12/lib/active_support/inflector/transliterate.rb 92 212
gems/rack-1.4.5/lib/rack/request.rb 26 214
gems/activesupport-3.2.12/lib/active_support/core_ext/string/output_safety.rb 184 259
gems/activesupport-3.2.12/lib/active_support/inflector/methods.rb 316 262
gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 26 272
gems/activesupport-3.2.12/lib/active_support/multibyte/unicode.rb 273 300
gems/activerecord-3.2.12/lib/active_record/connection_adapters/column.rb 227 360
gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 52 408
gems/activesupport-3.2.12/lib/active_support/time_with_zone.rb 115 420
gems/tzinfo-0.3.35/lib/tzinfo/time_or_datetime.rb 42 540
gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 665 606
gems/activerecord-3.2.12/lib/active_record/relation.rb 26 936
gems/activerecord-3.2.12/lib/active_record/relation.rb 27 1170
gems/activesupport-3.2.12/lib/active_support/inflector/methods.rb 313 1179
gems/activerecord-3.2.12/lib/active_record/base.rb 514 1471
gems/activerecord-3.2.12/lib/active_record/model_schema.rb 162 1471
gems/activerecord-3.2.12/lib/active_record/inheritance.rb 68 2942
gems/activesupport-3.2.12/lib/active_support/callbacks.rb 81 2950
gems/activerecord-3.2.12/lib/active_record/attribute_methods/read.rb 72 3624
gems/activesupport-3.2.12/lib/active_support/callbacks.rb 414 8850
bundler/gems/MiniProfiler-d149f34fcdb6/Ruby/lib/patches/sql_patches.rb 69 45816
gems/activerecord-3.2.12/lib/active_record/result.rb 30 52781
The biggest offenders according to the probes appear to be:
# this MiniProfiler monkey patch
def values(*args, &blk)
return values_without_profiling(*args, &blk) unless @miniprofiler_sql_id
# and this AR method
def hash_rows
@hash_rows ||= @rows.map { |row|
# this string allocation
Hash[@columns.zip(row)]
}
end
One method AR could get rid of this double allocation would be to nuke the underlying hash in favor of a real object (tm), cache some sort of deserializer.
A HUGE packet of these string allocations could go away if pg supported a mode where it returned coerced types as opposed to strings.
I wonder if the MiniProfiler monkey patch should really be attributed as the "responsible party" for the allocation, the allocation happened actually a fair bit deeper in the cruby extension.
#9 By: tenderlove, March 3rd, 2013 11:17
Hey Sam, thanks for the info.
I guess we fixed the hash thing on master, but it wasn't backported to 3-2-stable. I've backported it here:
Freeze columns only once per Result
Conflicts:
activerecord/lib/active_record/result.rb
Changed 1 files with 9 additions and 3 deletions.
I'm actively working on the callbacks stuff, but it isn't as simple. :-(
I'll try to put this backport patch in the next 3-2 bugfix release.
#10 By: Sam Saffron, March 3rd, 2013 18:18
I confirm this fix removes that huge number of allocations, it is a huge perf boost for AR.
A very big reason for the excessive allocations is us, for example we run:
SELECT "topics".* FROM "topics" WHERE ("topics"."deleted_at" IS NULL) AND (topics.archetype <> 'private_message') ORDER BY CASE WHEN topics.category_id IS NULL and topics.pinned THEN 0 ELSE 1 END, topics.bumped_at DESC LIMIT 30
We are not limiting down to the actual columns we need, this table has 38 columns, so that means the pg gem is allocating at least double the amount of strings we need.
The natural thing is to add a .select there, but my concern is that if we forget a column the failure mode is very problematic, stuff will not "explode" it will fail silently pretending the missing columns are nil. I wonder if we should add some special mode in dev to assist here. Regardless, we will fix this.
Additionally, we have an N+1 query hanging off category.
I think I will add a mode to MiniProfiler that allows you to see object creation count per query, that will allow people to attack the biggest offenders.
The very frustrating thing about this topic table is that only 2 out of the 38 columns are actually strings.
I profiled the simple:
(ran it twice before in production to ensure it was warmed up), and got:
Topic.limit(30).to_a
Amount of string allocations still feels excessive (even after the patch - I patched local)
File / Line Number / Allocations.
-------------------------------------
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/abstract_adapter.rb 276 1
/gems/activerecord-3.2.12/lib/active_record/log_subscriber.rb 4 1
/gems/activerecord-3.2.12/lib/active_record/log_subscriber.rb 8 1
/gems/activerecord-3.2.12/lib/active_record/querying.rb 38 1
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 53 1
/gems/activerecord-3.2.12/lib/active_record/scoping/default.rb 119 1
/gems/arel-3.0.2/lib/arel.rb 43 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 126 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 131 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 135 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 139 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 192 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 200 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 283 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 339 1
test.rb 4 1
test.rb 7 1
/gems/activerecord-3.2.12/lib/active_record/relation/query_methods.rb 268 2
/gems/activerecord-3.2.12/lib/active_record/scoping/default.rb 123 2
/gems/arel-3.0.2/lib/arel.rb 39 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 134 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 136 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 187 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 282 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 366 2
test.rb 6 2
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/abstract/connection_specification.rb 149 3
/gems/activesupport-3.2.12/lib/active_support/log_subscriber.rb 91 3
/gems/activerecord-3.2.12/lib/active_record/scoping.rb 130 4
/gems/activerecord-3.2.12/lib/active_record/scoping.rb 126 5
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 14 6
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 26 8
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 52 12
/gems/activerecord-3.2.12/lib/active_record/base.rb 514 30
/gems/activerecord-3.2.12/lib/active_record/model_schema.rb 162 30
/gems/activerecord-3.2.12/lib/active_record/relation.rb 26 32
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 665 38
/gems/activerecord-3.2.12/lib/active_record/result.rb 33 38
/gems/activerecord-3.2.12/lib/active_record/relation.rb 27 40
/gems/activerecord-3.2.12/lib/active_record/inheritance.rb 68 60
/gems/activesupport-3.2.12/lib/active_support/callbacks.rb 81 60
/gems/activesupport-3.2.12/lib/active_support/callbacks.rb 414 180
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 605 910
#11 By: Sam Saffron, March 4th, 2013 03:16
More profiling (both warmed up in prd)
100.times do
Topic.limit(30).to_a
end
vs.
100.times do
ActiveRecord::Base.connection.raw_connection.async_exec("select * from topics limit 30").values
end
sampler:
ruby*:::cmethod-entry
{
k = copyinstr(arg0);
m = copyinstr(arg1);
}
tick-5000hz
/k != 0/
{
@[k, m] = count();
}
AR
Hash each 72
String hash 82
Kernel freeze 87
PG::Result ftype 88
Thread [] 89
Module name 94
Array zip 107
Array each 120
Thread current 123
Class allocate 131
Fixnum abs 136
String intern 147
Hash delete 150
Kernel instance_variable_set 183
Hash []= 195
String to_s 195
String initialize 237
String to_sym 238
Kernel respond_to? 247
Fixnum to_s 286
Class superclass 349
PG::Connection async_exec 354
PG::Result values 383
#<Class:0x007ff6e8b9e510> relation 399
Module === 415
Kernel block_given? 450
Kernel hash 512
Kernel class 517
Hash [] 566
Symbol to_s 586
Hash key? 6810
Raw
Mutexd current 1
Thread [] 1
Mutex unlock 3
Module name 8
Thread current 13
PG::Result values 78
PG::Connection async_exec 266
My particular concern here is all these extraneous calls to .key? , seems the sampler is really spending much time there.
These key? calls are all coming form
2 143160 vm_call_cfunc:cmethod-entry Hash key? /Users/sam/.rvm/gems/ruby-2.0.0-p0/gems/activerecord-3.2.12/lib/active_record/locking/optimistic.rb 174
2 143160 vm_call_cfunc:cmethod-entry Hash key? /Users/sam/.rvm/gems/ruby-2.0.0-p0/gems/activerecord-3.2.12/lib/active_record/attribute_methods/serialization.rb 66
#12 By: Jack James, March 22nd, 2013 08:11
Having just been directed here from this:

Most of the above discussion went completely over my head. What steps do I need to take to fix the problem?
#13 By: Régis Hanol, March 22nd, 2013 09:06
According to admin_dashboard_data.rb you have to define the RUBY_GC_MALLOC_LIMIT environment variable.
@sam recommends it to be 90000000.
#14 By: Robin Ward, March 22nd, 2013 09:55
Maybe we should provide a better link/doc? This topic is definitely for pros :)
#15 By: Jack James, March 22nd, 2013 10:13
ok so I've added RUBY_GC_MALLOC_LIMIT=90000000 to my .env and then re-exported to upstart (which works!)
Thanks!
#16 By: Kevin P. Fleming, March 22nd, 2013 11:39
This is needed in the process(es) where 'thin' is running, I assume, not clockwork or sidekiq.
#18 By: funny_falcon, March 22nd, 2013 12:57
(mentioned memory leak in a falcon.patch were fixed a week ago)
#19 By: Neil, March 22nd, 2013 17:10
A better link would be nice, but I don't know of one. Most of those admin dashboard warnings need useful links. I'm just adding the text right now, and will add better links over time.
#20 By: Sam Saffron, March 22nd, 2013 17:12
I will try to TLDR my post to fix this up ... the topic is good imho
#21 By: czbond, March 23rd, 2013 12:47
Out of curiosity, for your home page, why not strip it out of views (erb/haml) since Rails takes so much overhead to construct a view? We create a static homepage builder with assets in s3.
Absolutely great findings and tests. Thanks.
#22 By: all4miller, March 24th, 2013 05:03
Slim Scrooge monitors your queries and modifies the select list to only include required columns. May be either of use in itself or for ideas on reducing select * overhead.
Powered by Discourse, best viewed with JavaScript enabled