The post with the pointer to logs is super helpful.
Lots of workers are timing out:
unicorn.stderr.log
E, [2018-02-17T19:44:21.208533 #800] ERROR -- : worker=4 PID:29923 timeout (31s > 30s), killing
I, [2018-02-17T19:44:29.211582 #29979] INFO -- : worker=5 ready
I, [2018-02-17T19:44:29.699253 #29990] INFO -- : worker=7 ready
I, [2018-02-17T19:44:31.985875 #30008] INFO -- : worker=2 ready
I, [2018-02-17T19:44:46.317249 #30050] INFO -- : worker=4 ready
E, [2018-02-17T19:45:57.690129 #800] ERROR -- : worker=5 PID:29979 timeout (36s > 30s), killing
E, [2018-02-17T19:46:07.212510 #800] ERROR -- : worker=3 PID:29952 timeout (35s > 30s), killing
E, [2018-02-17T19:46:07.221215 #800] ERROR -- : worker=7 PID:29990 timeout (33s > 30s), killing
E, [2018-02-17T19:46:07.783812 #800] ERROR -- : worker=7 PID:29990 timeout (33s > 30s), killing
I, [2018-02-17T19:46:44.135563 #30165] INFO -- : worker=3 ready
I, [2018-02-17T19:46:44.840466 #30156] INFO -- : worker=5 ready
I, [2018-02-17T19:46:46.952119 #30174] INFO -- : worker=7 ready
E, [2018-02-17T19:46:59.823120 #800] ERROR -- : worker=2 PID:30008 timeout (31s > 30s), killing
E, [2018-02-17T19:47:43.274893 #800] ERROR -- : worker=7 PID:30174 timeout (31s > 30s), killing
E, [2018-02-17T19:47:46.393225 #800] ERROR -- : worker=1 PID:29767 timeout (33s > 30s), killing
E, [2018-02-17T19:47:46.579485 #800] ERROR -- : worker=5 PID:30156 timeout (31s > 30s), killing
I, [2018-02-17T19:47:46.686074 #30254] INFO -- : worker=2 ready
E, [2018-02-17T19:47:46.894589 #800] ERROR -- : worker=1 PID:29767 timeout (33s > 30s), killing
E, [2018-02-17T19:47:49.736359 #800] ERROR -- : worker=3 PID:30165 timeout (32s > 30s), killing
I, [2018-02-17T19:48:14.864398 #30307] INFO -- : worker=5 ready
I, [2018-02-17T19:48:16.177603 #30336] INFO -- : worker=3 ready
I, [2018-02-17T19:48:16.345038 #30322] INFO -- : worker=1 ready
I, [2018-02-17T19:48:20.622346 #30296] INFO -- : worker=7 ready
D, [2018-02-17T19:48:22.908873 #800] DEBUG -- : waiting 16.0s after suspend/hibernation
E, [2018-02-17T19:49:22.238502 #800] ERROR -- : worker=5 PID:30307 timeout (31s > 30s), killing
E, [2018-02-17T19:49:31.480543 #800] ERROR -- : worker=4 PID:30050 timeout (32s > 30s), killing
E, [2018-02-17T19:49:31.588904 #800] ERROR -- : worker=1 PID:30322 timeout (33s > 30s), killing
E, [2018-02-17T19:49:32.221465 #800] ERROR -- : worker=1 PID:30322 timeout (34s > 30s), killing
E, [2018-02-17T19:49:45.783025 #800] ERROR -- : worker=7 PID:30296 timeout (32s > 30s), killing
E, [2018-02-17T19:49:50.598973 #800] ERROR -- : worker=3 PID:30336 timeout (31s > 30s), killing
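To get a quick picture of how widespread the timeouts are, something like the rough Python sketch below could tally them from unicorn.stderr.log. It assumes the line format shown above; the function names and the de-duplication rule (the master sometimes logs the same worker/PID kill twice) are my own choices for illustration, not anything unicorn provides.

```python
import re
from collections import Counter

# Matches unicorn master lines such as:
# E, [2018-02-17T19:44:21.208533 #800] ERROR -- : worker=4 PID:29923 timeout (31s > 30s), killing
TIMEOUT_RE = re.compile(
    r"\[(?P<ts>\S+) #\d+\]\s+ERROR -- : worker=(?P<worker>\d+) "
    r"PID:(?P<pid>\d+) timeout \((?P<took>\d+)s > (?P<limit>\d+)s\)"
)


def summarize_timeouts(lines):
    """Return (kills per worker slot, longest reported run time in seconds).

    A worker/PID pair is counted once even if the master logged the kill twice.
    """
    seen = set()
    kills = Counter()
    worst = 0
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if not m:
            continue  # "worker=N ready" and other non-timeout lines
        worst = max(worst, int(m.group("took")))
        key = (m.group("worker"), m.group("pid"))
        if key not in seen:
            seen.add(key)
            kills[int(m.group("worker"))] += 1
    return kills, worst


def summarize_file(path):
    """Convenience wrapper: read a log file and summarize its timeouts."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return summarize_timeouts(f)
```

Calling summarize_file with the path to unicorn.stderr.log returns the per-worker kill counts and the longest run time seen, which makes it easier to tell whether one worker slot is misbehaving or all of them are.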
After digging in, it looks like Postgres is being hammered: a lot of queries are taking 700+ ms, and one query took 50 seconds.
I’ve posted the last 1000 lines here: https://gist.github.com/886f2324e50958dea43c51a2595ec15d
Edit: deleted log – security
If anything looks weird or non-standard, I’d appreciate it if anyone could let me know.
For now I’ll try upgrading to an even beefier instance and see if that helps.