Ubuntu 20.04 kernel update with Docker causes crashes on EC2 and Lightsail

I ran into this issue last night when my Ubuntu 20.04 LTS server automatically upgraded itself. It installed a new kernel and I lost control of the system: it would crash a few minutes after booting. I tried again today with a fresh Discourse install, and as soon as I upgraded the system it started crashing again.

Just a note for folks: don’t update your Linux kernels just yet. This is a known bug; see this for more details.

5 Likes

The question is: is there a way to boot the system without it starting Discourse/Docker? I’m running on AWS Lightsail. The only other option is to rebuild the whole system again, which is a PITA right now given the backup/restore issues I’m facing.

EDIT: This is what I found. It’s hit or miss, depending on how fast the instance comes up.

# Retry until SSH connects and the disable succeeds; <instance> is a placeholder for your host.
until ssh -o ConnectTimeout=5 <instance> "sudo systemctl disable docker.service containerd.service"; do
  sleep 1
done

I had this happen on two EC2 instances as well. They went down at 5AM EDT for a reboot and never came back up.

2 Likes

Per the link, this affects people running Canonical “cloud kernels” on Ubuntu machines. They removed a patch that affects OverlayFS.

While Canonical rolls out a fix, people can try a different kernel version, or use Debian or another distro as a workaround.

6 Likes

I was able to interrupt the cycle with a quick SSH about 15 seconds after the instance starts, to disable the docker/containerd services. I then downgraded the kernel to 5.4 and it seems to be working.

5 Likes

Yes, as I just posted in your other thread on restore troubles, that was essentially what I did as well when this bug crashed my server. Well, I booted the old kernel; didn’t have to disable docker or containers. And the current kernel is safe again. Here is a link to what I said in your other thread. In a bit I’ll try to write up my permanent solution to keep this from happening again.

Nasty kernel bug, that was!

1 Like

You can simply revert to the previous kernel and the machine is restored. Or update to the current, fixed kernel, which came out on Thursday.
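
In case it helps anyone figure out which kernel "the previous kernel" is, here is a rough sketch, not a definitive procedure. The helper name `previous_kernel` is made up for illustration. It reads kernel version strings on stdin and prints the newest one that sorts just below the version you pass in. It assumes GNU `sort -V` is available, and it only prints a suggestion; it changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): read kernel versions on
# stdin, one per line, and print the version that sorts immediately
# below the one given as $1. Prints an empty line if $1 is the oldest.
# If $1 is not in the list at all, the newest listed version is printed.
previous_kernel() {
  current="$1"
  # sort -V orders version strings naturally (5.4 before 5.11).
  sort -V | awk -v cur="$current" '
    $0 == cur { exit }      # stop at the running kernel
    { prev = $0 }           # remember the last version seen before it
    END { print prev }
  '
}
```

One way to feed it, assuming kernels are installed as `/boot/vmlinuz-<version>` as on a stock Ubuntu image: `ls /boot/vmlinuz-* | sed 's|^/boot/vmlinuz-||' | previous_kernel "$(uname -r)"`. The version it prints is the one to look for in the boot menu.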

1 Like

I have written up a tutorial on how to avoid kernel oops! issues like this that crash your server or keep it from coming back up.

I put the tutorial on my Discourse site, since that seemed convenient to me. My site has nothing to do with tech, though. So I unlisted the topic but published it to HTML.

Enjoy.

https://discourse.bluebottlefly.com/pub/hardening-your-server

@RBoy, maybe you in particular will find this useful.

/dr

1 Like