Memory is running out and Discourse stops working

Learning that it started in beta5/6 prompted me to look further back.

Initially I thought it was something we did in the last couple of weeks, but after graphing the memory usage of a month-old build against a current build, nothing stood out, except that all my recent memory work has made our baseline much better.

I also noticed a rogue, fairly old Sidekiq process that was using 2GB of memory.

I also noticed this fairly recent report on Google Groups about multithreading issues with pg.

Our web workers use 5 threads, which could be triggering some of this.

My plan is:

  1. Downgrade to the old “good” version of pg (done). Unfortunately, this means this issue is back.
  2. Amend the internal logic in Unicorn so we do not run 5 threads and instead do everything from the master thread.
  3. Create a standalone app that reproduces the memory issue under the latest pg gem and report it to the pg authors.
  4. Work with the pg authors to resolve it, so we can upgrade to the latest version again.
  5. Deploy extensive memory profiling to our internal infrastructure (in progress) so we can catch this in the future.
  6. Work on cutting down our Redis memory requirement, which is quite high now.
  7. Consider building protection against rogue memory usage into our base image.
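Item 7 only says to consider such protection; the post does not say how it would work. As a rough illustration, here is a minimal sketch, assuming a Linux host with `/proc`, of a check that flags processes (like the 2GB Sidekiq above) whose resident memory exceeds a threshold. It is written in Python purely for illustration and is not part of Discourse or its base image; the 1 GiB limit and the names `ProcessSample`, `read_rss_bytes`, `sample_processes` and `find_rogue` are hypothetical, and what to do with a flagged process (alert, restart) is left out.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical threshold: flag anything using more than 1 GiB of resident memory.
DEFAULT_LIMIT_BYTES = 1024 ** 3


@dataclass
class ProcessSample:
    pid: int
    name: str
    rss_bytes: int


def read_rss_bytes(pid: int) -> int:
    """Return the resident set size of a process, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:    123456 kB".
                return int(line.split()[1]) * 1024
    return 0  # kernel threads and zombies have no VmRSS line


def sample_processes() -> List[ProcessSample]:
    """Take one snapshot of every readable process on the host."""
    samples = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
            rss = read_rss_bytes(pid)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited mid-scan or is not readable
        samples.append(ProcessSample(pid, name, rss))
    return samples


def find_rogue(samples: Iterable[ProcessSample],
               limit_bytes: int = DEFAULT_LIMIT_BYTES) -> List[ProcessSample]:
    """Return processes over the limit, largest first."""
    over = [s for s in samples if s.rss_bytes > limit_bytes]
    return sorted(over, key=lambda s: s.rss_bytes, reverse=True)


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for s in find_rogue(sample_processes()):
            print(f"{s.pid} {s.name}: {s.rss_bytes / 1024 ** 2:.0f} MiB")
```

Keeping the threshold check separate from the `/proc` scan means the policy can be tested with synthetic samples and the scan could later be swapped for whatever the memory profiling in item 5 already collects.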