Speeding up Development Reloads when changing Ruby files

Recently a few team members noticed that reloading after saving ruby files was painfully slow. It seems to take about 10 seconds after each save to reload in a browser.

TL;DR: Check out the latest Discourse, add a ENV variable CHECKOUT_TIMEOUT=0.1 for a workaround.

I am quite busy right now but the pain of reloading was significant enough that I spent a few hours spelunking into Discourse and Rails to try and track it down.

The good news is I found a workaround. The bad news is I’m still not sure what the root cause is. I’m writing it up here in hopes that someone smarter than me can figure it out so we can fix it properly.

Findings

When reloading in development mode, the clear_reloadable_connections! method is called.

From its comments:

# The pool first tries to gain ownership of all connections. If unable to
# do so within a timeout interval (default duration is
# <tt>spec.config[:checkout_timeout] * 2</tt> seconds), then the pool forcefully
# clears the cache and reloads connections without any regard for other
# connection owning threads.

It seems in our application, every time it tries to do this it fails, and has to wait the full 5 * 2 = 10 seconds before continuing, when it works fine.

I tried to dig deeper and quickly got caught up in threads and locks.

My gut feeling is the problem is something in our stack has its hands on a connection, and is not handing it back properly. It always seems to work when the timeout is reached, so simply lowering the timeout to something small like 0.1 makes the request work quickly.

I’ve added support to Discourse to support changing the timeout via an ENV variable:

Things I’ve tried to fix it

  • Upgrading the pg gem

  • Disabling our custom reload code blocks

  • Different web servers like Webrick instead of Puma

  • Commenting out “weird” bits of Discourse code

So far, I’ve come up with nothing.

If you have any ideas I’d love to hear them!

Reproducing the bug

  1. rm -rf tmp
  2. bundle exec rails s
  3. Visit a dynamic URL such as /admin/dashboard/problems.json (it should load quickly)
  4. Make a change to a file such as discourse/app/controllers/admin/dashboard_controller.rb at main · discourse/discourse · GitHub - adding a comment with different text is enough
  5. Reload the problems.json URL.
  6. It should spin for 10 seconds until the timeout is reached.
18 Likes

Thanks for all the legwork here @eviltrout it made it way easier to fix.

Fixed per:

https://github.com/discourse/discourse/commit/7ca08216bdf6436764a2b8964d06f4bcd84e5eb4

How I debugged this?

I edited my local copy of active_record/connection_adapters/abstract/connnection_pool.rb

In def checkout I added:

def checkout(checkout_timeout = @checkout_timeout)
  c = checkout_and_verify(acquire_connection(checkout_timeout))
  puts "CHECKOUT-#{c.object_id}" + "-"* 10
  puts caller.grep(/discourse/)
  puts "-" * 10
  c
end

In def checkin I added:

def checkin(conn)
  puts "CHECKIN-#{conn.object_id}" + "-"* 10
  puts caller.grep(/discourse/)
  puts "-" * 10

   ... 

This revealed 2 spots that were not returning connections to the pool.

Minor problem was on boot, plugins sometimes opened connections to the db without giving connection back to pool.

https://github.com/discourse/discourse/blob/master/config/application.rb#L211

Major issue was the defer queue that was never releasing its connection (fixed here)

https://github.com/discourse/discourse/blob/7ca08216bdf6436764a2b8964d06f4bcd84e5eb4/lib/scheduler/defer.rb#L78

The defer queue exists to kick off lightweight background work we have the is too light to belong in Sidekiq (like increasing user view counts and stuff like that). In a puma environment it runs in a background thread, in a unicorn environment it runs between requests.

Enjoy :innocent:

11 Likes