Ubuntu 20.04 kernel update with Docker causing a crash on EC2 and Lightsail

I ran into this issue last night when my Ubuntu 20.04 LTS system automatically upgraded itself. It installed a new kernel and I lost control of the system: it would crash a few minutes after booting. I tried again today with a fresh Discourse install, and as soon as I upgraded the system it started crashing again.

Just a note for folks: don’t update your Linux kernels just yet. This is a known bug; see this for more details.
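For anyone who wants to pin the kernel until a fix lands, here is a rough sketch of one way to do it with `apt-mark hold`. The helper names (`kernel_flavour`, `kernel_meta_packages`) are made up for illustration, the version strings are only examples, and the `linux-<flavour>` meta-package naming is an assumption you should check against what is actually installed on your instance.

```shell
#!/bin/sh
# Sketch only: work out which kernel meta-packages to hold, based on a
# release string such as the output of `uname -r`.
# Assumption: meta-packages are named linux-<flavour>, linux-image-<flavour>
# and linux-headers-<flavour> (e.g. linux-aws). Verify with `dpkg -l`.

kernel_flavour() {
  # "5.4.0-1045-aws" -> "aws": keep everything after the last dash.
  printf '%s\n' "${1##*-}"
}

kernel_meta_packages() (
  flavour=$(kernel_flavour "$1")
  printf 'linux-%s linux-image-%s linux-headers-%s\n' \
    "$flavour" "$flavour" "$flavour"
)

# Example use on the server itself (not executed by this sketch):
#   sudo apt-mark hold $(kernel_meta_packages "$(uname -r)")
#   apt-mark showhold
# Once a fixed kernel is available:
#   sudo apt-mark unhold $(kernel_meta_packages "$(uname -r)")
```

Held packages are skipped by normal upgrades, so this should stop a new kernel from being pulled in, but remember to unhold them once the fixed kernel is out.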


My question: is there a way to start the system without it starting Discourse/Docker? I’m running on AWS Lightsail. The only other option is to rebuild the whole system again, which is a PITA right now given the backup/restore issues I’m facing.

EDIT: This is what I found. It’s hit or miss, depending on how fast the instance comes up.

# Retry until one SSH attempt gets in and disables both services, then stop.
until ssh -o ConnectTimeout=5 <instance> "sudo systemctl disable docker.service containerd.service"; do
  sleep 1
done

I had this happen on two EC2 instances as well. They went down at 5 AM EDT for a reboot and never came back up.


Per the link, this affects people running Canonical “cloud kernels” on Ubuntu machines. Canonical removed a patch that affects OverlayFS.

While Canonical rolls out a fix, people can try a different kernel version, or use Debian or another distro as a workaround.


I was able to interrupt the cycle with a quick SSH about 15 seconds after the instance starts to disable the docker and containerd services. I then downgraded the kernel to 5.4 and it seems to be working.
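In case it helps anyone doing the same downgrade, this is a minimal sketch of how one might pick the kernel to fall back to. `previous_kernel` and `grub_default_entry` are hypothetical helpers, the version numbers are only examples, and the GRUB menu title format is the usual Ubuntu one but should be checked against `/boot/grub/grub.cfg` on your machine. It relies on GNU `sort -V`.

```shell
#!/bin/sh
# Sketch only: choose the newest installed kernel that is older than the
# one currently running, and print the GRUB_DEFAULT value that would
# select it.

# previous_kernel RUNNING
# Reads kernel releases on stdin (one per line) and prints the highest
# one that version-sorts below RUNNING. Prints nothing if there is none.
previous_kernel() {
  { cat; printf '%s\n' "$1"; } |
    sort -V -u |
    awk -v r="$1" '$0 == r { if (prev != "") print prev; exit } { prev = $0 }'
}

# grub_default_entry RELEASE
# Assumption: the stock Ubuntu submenu and entry titles are in use.
grub_default_entry() {
  printf 'Advanced options for Ubuntu>Ubuntu, with Linux %s\n' "$1"
}

# Example use on the server itself (not executed by this sketch):
#   old=$(ls /boot/vmlinuz-* | sed 's|^/boot/vmlinuz-||' | previous_kernel "$(uname -r)")
#   grub_default_entry "$old"
# Put the printed string in GRUB_DEFAULT="..." in /etc/default/grub,
# then run: sudo update-grub
```

On cloud images, files under `/etc/default/grub.d/` may override what is set in `/etc/default/grub`, so it is worth checking there too before rebooting.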


Yes, as I just posted in your other thread on restore troubles, that was essentially what I did as well when this bug crashed my server. Well, I booted the old kernel; I didn’t have to disable docker or containerd. And the current kernel is safe again. Here is a link to what I said in your other thread. In a bit I’ll try to write up my permanent solution to keep this from happening again.

Nasty kernel bug, that was!


You can simply revert to the previous kernel and the machine is restored. Or update to the current, fixed kernel, which came out on Thursday.


I have written up a tutorial on how to avoid kernel oops! issues like this that crash your server or keep it from coming back up.

I put the tutorial on my Discourse site, since that seemed convenient to me. My site has nothing to do with tech, though. So I unlisted the topic but published it to HTML.

Enjoy.

https://discourse.bluebottlefly.com/pub/hardening-your-server

@RBoy, maybe you in particular will find this useful.

/dr
