Thanks for all the details. It’s good that you did this
but I think that change will probably only last until the next reboot, after which you would need to do it again. See MKJ’s Opinionated Discourse Deployment Configuration for tips on making it permanent.
You may have too little memory (by which I mean RAM+swap), although 2+4 should be enough. Please run the following quick diagnostics and post the results:
cat /etc/lsb-release
uptime
df -h /
free
swapon
vmstat 5 5
dmesg | grep -Ei "memory|oom|kill"
ps auxrc
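If it is easier to paste one report than eight separate outputs, something like the sketch below might help. The function name run_labelled is my own invention, not a standard tool; it simply runs each command you pass it and prints a header line before the output.

```shell
#!/bin/sh
# run_labelled is a hypothetical helper (not part of any package).
# It runs each argument as a shell command and labels the output,
# so the whole report can be pasted into a reply in one go.
run_labelled() {
    for cmd in "$@"; do
        printf '=== %s ===\n' "$cmd"
        # A non-zero exit is reported rather than treated as fatal,
        # e.g. dmesg without root, or grep finding no matching lines.
        sh -c "$cmd" 2>&1 || echo "(exited with status $?)"
        echo
    done
}

# Usage, with the commands from the list above:
#   run_labelled "cat /etc/lsb-release" uptime "df -h /" free swapon \
#       "vmstat 5 5" "dmesg | grep -Ei 'memory|oom|kill'" "ps auxrc" \
#       > diagnostics.txt
```

Note that a "(exited with status 1)" line after the dmesg command just means grep found nothing, which is itself useful to know.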
Please also share your app.yml file here - but not the passwords and secret tokens inside it!
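To make the redaction less error-prone, here is a rough sketch that masks the value of any setting whose name contains PASSWORD, SECRET, TOKEN or KEY. The function name redact_app_yml and the list of words are my assumptions, and I am assuming the usual /var/discourse/containers/app.yml location, so please still read the result before posting it. It only prints to the terminal and does not modify the file.

```shell
#!/bin/sh
# redact_app_yml is a hypothetical helper. It prints the file with the
# values of sensitive-looking settings replaced by REDACTED.
# Assumption: setting names are upper-case, as in a stock app.yml.
redact_app_yml() {
    file=${1:-/var/discourse/containers/app.yml}
    sed -E 's/^([[:space:]]*#?[[:space:]]*[A-Za-z0-9_]*(PASSWORD|SECRET|TOKEN|KEY)[A-Za-z0-9_]*:).*/\1 REDACTED/' "$file"
}

# Usage:
#   redact_app_yml > app-redacted.yml
```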
If you are able to set up two ssh connections, you can use one to run an app rebuild and use the other to see what the machine is doing. I like to alternate between these two:
vmstat 5 5
ps auxrc
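The alternation can be left running in the second ssh session with a small loop. This is only a sketch; watch_rebuild is a name I made up, and the timestamp is there so you can match the output against the rebuild log afterwards.

```shell
#!/bin/sh
# watch_rebuild is a hypothetical helper. It alternates vmstat and ps,
# printing a timestamp before each round.
# Argument: number of rounds; 0 or nothing means run until Ctrl-C.
watch_rebuild() {
    rounds=${1:-0}
    i=0
    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        date
        vmstat 5 5
        ps auxrc
        i=$((i + 1))
    done
}

# Usage, in the second ssh session while the rebuild runs in the first:
#   watch_rebuild | tee rebuild-watch.log
```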
It’s possible you are swapping to a remote disk (network-attached storage), which is known to be a problem because it is very slow. That might cause a timeout, and the timeout might be what makes the rebuild fail. If so, there may be a way to adjust the timeout.
I found this - maybe it helps?
(The default systemd timeout is 90 seconds, at least in some releases of systemd, so this fits quite nicely).
You could try to work around this by increasing TimeoutStartSec in postgresql’s systemd unit (or even globally), which perhaps only hides the problem until the next service suddenly doesn’t start anymore.
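For the per-unit route, systemd reads drop-in files from a directory named after the unit, so the original unit file does not need editing. The sketch below writes such a drop-in. The helper name write_timeout_dropin, the unit name postgresql.service and the 300 second value are all assumptions for illustration; the unit on your machine may be named differently, so check with systemctl first.

```shell
#!/bin/sh
# write_timeout_dropin is a hypothetical helper. It writes
# <root>/<unit>.d/timeout.conf with a [Service] TimeoutStartSec override.
# Arguments: unit name, timeout, and optionally the systemd config root.
write_timeout_dropin() {
    unit=$1
    timeout=$2
    root=${3:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir"
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "$dir/timeout.conf"
}

# Usage (as root; unit name and value are illustrative):
#   write_timeout_dropin postgresql.service 300
#   systemctl daemon-reload
```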
Edit: if so, then this advice might be good:
You can uncomment in
/etc/systemd/system.conf
the lines:
DefaultTimeoutStartSec=90s
DefaultTimeoutStopSec=90s
And change the value to what you consider appropriate.
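If you would rather not edit the file by hand, a sed one-liner can do the uncommenting. In this sketch the function name set_default_timeouts and the 300s value are my own choices, not a recommendation. It prints the modified file instead of changing it, so you can review the result first. As far as I know the new defaults only take effect after a reboot or a systemctl daemon-reexec.

```shell
#!/bin/sh
# set_default_timeouts is a hypothetical helper. It prints the config
# file with both DefaultTimeout lines uncommented and set to the value.
# Arguments: the new value (e.g. 300s) and optionally the config path.
set_default_timeouts() {
    value=$1
    conf=${2:-/etc/systemd/system.conf}
    sed -E "s/^#?[[:space:]]*(DefaultTimeoutStartSec|DefaultTimeoutStopSec)=.*/\1=$value/" "$conf"
}

# Usage (review the output before copying it into place as root):
#   set_default_timeouts 300s > /tmp/system.conf.new
#   diff /etc/systemd/system.conf /tmp/system.conf.new
```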