Lockup with CPU wait state at 90% or more

My Discourse is locking up with the CPU wait state (the wa: value in top) at 90% or more. Have other admins seen a common cause for this condition? I’m running Debian on AWS.

Is the database in RDS, or in the same container as the web app?

Is the machine's disk an EBS network volume? Did you check whether you have run out of allowed IOPS?


The database is in the same Docker container. The fellow who set this up for me created two EBS volumes, one 8 GiB and the other 32 GiB. Both are gp2 volumes with 100 IOPS. Is that enough IOPS? I’m reading this article, "Optimize the Performance of Amazon EBS Provisioned IOPS Volumes", to learn, but any hints pointing me in the right direction would be much appreciated.
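
For what it's worth, the 100 IOPS figure on both volumes looks like the gp2 floor rather than something that was chosen. Below is a rough sketch of the arithmetic, assuming the gp2 numbers in the AWS EBS documentation as I understand them (3 IOPS per GiB, a 100 IOPS minimum, bursting to 3,000 IOPS from a bucket of 5.4 million I/O credits). The function and constant names are made up for illustration, and the figures should be checked against the current AWS docs.

```python
# Rough gp2 arithmetic. The constants are assumptions taken from the AWS EBS
# documentation for gp2 volumes; verify them before relying on the results.
GP2_IOPS_PER_GIB = 3         # baseline IOPS earned per GiB of volume size
GP2_MIN_IOPS = 100           # floor applied to small volumes
GP2_MAX_IOPS = 16_000        # ceiling for very large volumes
GP2_BURST_IOPS = 3_000       # burst rate available to small volumes
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full burst bucket


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_burst_seconds(size_gib: int) -> float:
    """Approximate time a full credit bucket lasts at the full burst rate.

    Uses the form credits / (burst IOPS - 3 * size in GiB).
    """
    refill = GP2_IOPS_PER_GIB * size_gib
    if refill >= GP2_BURST_IOPS:
        return float("inf")  # large volumes never drain the bucket
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - refill)


if __name__ == "__main__":
    for size in (8, 32):  # the two volume sizes mentioned in this thread
        minutes = gp2_burst_seconds(size) / 60
        print(f"{size} GiB: baseline {gp2_baseline_iops(size)} IOPS, "
              f"about {minutes:.0f} min of sustained burst")
```

If those assumptions hold, both volumes sit at the 100 IOPS floor and can sustain full burst for only about half an hour before being throttled back to baseline, which would fit a lockup with high wa. The BurstBalance metric for the volume in CloudWatch should show whether the credits actually ran out around the time of the outage.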

Edit: I found that the Queue Length (mentioned in the above article) got very long during the last outage on the 19th (see the chart below). The question now is: how do I find out what is causing that, and how do I prevent it?
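
One way to see which process is generating the disk traffic is to sample the per-process counters the Linux kernel exposes in /proc/<pid>/io and rank processes by how much they read and wrote over a short interval. The script below is only a minimal sketch of that idea, not a Discourse or AWS tool; the helper names are hypothetical, and it assumes it is run as root on the Docker host so that the counters of other users' processes (including those inside the container) are readable.

```python
import os
import time


def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def snapshot() -> dict:
    """Map pid -> (command name, read_bytes + write_bytes) for readable processes."""
    result = {}
    if not os.path.isdir("/proc"):
        return result  # not a Linux system, nothing to sample
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as f:
                io = parse_proc_io(f.read())
            with open(f"/proc/{entry}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited, or we lack permission to read it
        total = io.get("read_bytes", 0) + io.get("write_bytes", 0)
        result[int(entry)] = (name, total)
    return result


def top_io(before: dict, after: dict, limit: int = 10) -> list:
    """Return (bytes moved, pid, name) tuples, largest first, for pids in both snapshots."""
    deltas = []
    for pid, (name, total) in after.items():
        if pid in before:
            deltas.append((total - before[pid][1], pid, name))
    deltas.sort(reverse=True)
    return deltas[:limit]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(5)  # sampling interval in seconds, chosen arbitrarily
    for delta, pid, name in top_io(first, snapshot()):
        print(f"{delta:>12} bytes  pid {pid:<7} {name}")
```

Running it a few times while wa is climbing should show whether the traffic comes from postgres, redis, a backup job, or something else. Note that it counts bytes rather than I/O operations, so it is only a proxy for what consumes IOPS.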