Constant 500 Internal Server Errors


(Kirupa Chinnathambi) #1

Hi, everyone!
For the past week, since we moved to Discourse, our forum installation (http://forum.kirupa.com) has been constantly throwing 500 Internal Server Error messages. Checking through the logs, nothing seems out of the ordinary. I was watching my processes a few moments ago, and about 5 minutes before the errors started appearing, this is what I saw:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
discour+    90  0.4 12.8 545920 264392 ?       Sl   Nov05   0:38 sidekiq 3.2.5 discourse [0 of 5 busy]                                                               
discour+   128  0.4 12.0 485496 247020 ?       Sl   Nov05   0:41 unicorn worker[2] -E production -c config/unicorn.conf.rb                                           
discour+   116  0.4 11.8 481400 242492 ?       Sl   Nov05   0:43 unicorn worker[0] -E production -c config/unicorn.conf.rb                                           
discour+   122  0.3 11.8 1048164 242288 ?      Sl   Nov05   0:30 unicorn worker[1] -E production -c config/unicorn.conf.rb                                           
postgres 10002  3.1 11.5 422128 236708 ?       Ss   02:00   0:08 postgres: discourse discourse [local] idle                        
postgres 10025  5.1 11.3 397684 232976 ?       Ss   02:01   0:12 postgres: discourse discourse [local] idle                        
discour+    59  0.2 10.9 459696 224712 ?       Sl   Nov05   0:25 unicorn master -E production -c config/unicorn.conf.rb                                              
postgres   325  0.0 10.7 396600 219640 ?       Ss   Nov05   0:02 postgres: discourse discourse [local] idle                        
postgres  8908  0.3  7.9 397376 162688 ?       Ss   01:44   0:04 postgres: discourse discourse [local] idle                        
postgres  9940  2.2  7.8 422128 161112 ?       Ss   01:59   0:07 postgres: discourse discourse [local] idle                        
discour+  7730  0.7  6.8 621904 140188 pts/0   Sl+  01:26   0:17 pry    

Is anything out of the ordinary here?

The only workaround I’ve found is to restart the DigitalOcean droplet. That seems to fix things for a few hours before the errors start occurring again.

Cheers,
Kirupa


(Jeff Atwood) #2

Any plugins running?

If not, I suspect it is somehow related to the prior import of 2 million records you guys did?


#3

The only plugin running is this one, which is a very minor modification of the emoji plugin that swaps which emoticons are available and uses .gifs instead of .pngs.

The source of the error is a DB timeout: ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds))

It didn’t seem to me like Postgres was doing anything at all when it wasn’t accepting those connections.
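For anyone following along, here is a rough sketch of how a pool-checkout timeout like the one above can come about while the database itself sits idle. This is not ActiveRecord's code; `TinyPool`, `PoolTimeoutError`, and the `conn-N` strings are hypothetical names made up for the illustration, and the timeout is shortened so the demo runs quickly. The assumption being illustrated is that every pooled connection is held by some caller and never returned, so the next caller waits on the application side and then gives up.

```python
import queue


class PoolTimeoutError(Exception):
    """Hypothetical stand-in for ActiveRecord::ConnectionTimeoutError."""


class TinyPool:
    """Toy fixed-size connection pool; connections are just strings here."""

    def __init__(self, size, checkout_timeout):
        self._timeout = checkout_timeout
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def checkout(self):
        # Block until a connection is free, or give up after the timeout.
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeoutError(
                f"could not obtain a connection within {self._timeout:.3f} seconds"
            ) from None

    def checkin(self, conn):
        # Return a connection so another caller can use it.
        self._free.put(conn)


def demo():
    pool = TinyPool(size=2, checkout_timeout=0.05)
    # Both connections are checked out and not yet returned.
    held = [pool.checkout(), pool.checkout()]
    try:
        pool.checkout()  # nothing free: waits, then times out
        timed_out = None
    except PoolTimeoutError as exc:
        timed_out = str(exc)
    # Once a holder returns its connection, checkout succeeds again.
    pool.checkin(held.pop())
    recovered = pool.checkout()
    return timed_out, recovered


if __name__ == "__main__":
    print(demo())
```

In this toy version the wait happens entirely inside the pool, so the "database" never sees the blocked request, which would at least be consistent with the backends showing as idle.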

It certainly could be.


(Sam Saffron) #4

What kind of load is this server under? Does the downtime correlate with a job running?

I recall you ran a massive import, no?


(Mittineague) #5

I would try temporarily disabling it.

It looks to be the emoji plugin (copy/paste save for a few name substitutions).

Because it is still using “emoji” in some places, it might be that routing is getting confused somewhere with the text parsing.

And it isn’t gifs instead of pngs; it’s gifs instead of symlinks to pngs.


#6

True, I’m not trying to mask that.

True, for sure. As I was editing the plugin, it wasn’t entirely clear to me where the unicode folder came from, since GitHub’s display of symlinks in repos isn’t super clear.

Certainly! I’ve done just so and will report back.


(Jeff Atwood) #7

I believe this was also resolved with the perf work @sam did while using a large local db?


(Kirupa Chinnathambi) #8

I believe so. I haven’t seen too many 500 Internal Server Errors since updating with sam’s change a short while ago :smile:

