Kernel oops / CPU stalls due to XFS

Hmm. I would tend to agree, except for the kernel errors that started first. The VM had been running since 06/Jul without a single kernel oops until this morning. Here’s the full output of that instant. Notice the page_fault_oops and handle_mm_fault and xfs_filemap_map_pages stuff:

Jul 22 05:00:22 myvm kernel: BUG: kernel NULL pointer dereference, address: 0000000000000002
Jul 22 05:00:22 myvm kernel: #PF: supervisor read access in kernel mode
Jul 22 05:00:22 myvm kernel: #PF: error_code(0x0000) - not-present page
Jul 22 05:00:22 myvm kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jul 22 05:00:22 myvm kernel: CPU: 3 PID: 3235204 Comm: postmaster Not tainted 6.1.0-10-amd64 #1  Debian 6.1.37-1
Jul 22 05:00:22 myvm kernel: Hardware name: Hetzner vServer/Standard PC (Q35 + ICH9, 2009), BIOS 20171111 11/11/2017
Jul 22 05:00:22 myvm kernel: RIP: 0010:next_uptodate_page+0x45/0x1f0
Jul 22 05:00:22 myvm kernel: Code: 0f 84 2f 01 00 00 48 81 ff 06 04 00 00 0f 84 a3 00 00 00 48 81 ff 02 04 00 00 0f 84 26 01 00 00 40 f6 c7 01 0f 85 8c 00 00 00 <48> 8b 07 a8 01 0f 85 81 00 00 00 8b 47 34 85 c0 74 7a 8d 50 01 4c
Jul 22 05:00:22 myvm kernel: RSP: 0000:ffffc1ae8274bcc0 EFLAGS: 00010246
Jul 22 05:00:22 myvm kernel: RAX: 0000000000000002 RBX: ffffc1ae8274bd18 RCX: 000000000000005e
Jul 22 05:00:22 myvm kernel: RDX: ffffc1ae8274bd18 RSI: ffffa0210863d2b0 RDI: 0000000000000002
Jul 22 05:00:22 myvm kernel: RBP: ffffa0210863d2b0 R08: 000000000000005e R09: 000055fb22bbdfff
Jul 22 05:00:22 myvm kernel: R10: 000000000000004f R11: 0000000000000000 R12: 000000000000005e
Jul 22 05:00:22 myvm kernel: R13: ffffa02194ad6980 R14: ffffa0210863d2b0 R15: ffffa02118538f60
Jul 22 05:00:22 myvm kernel: FS:  00007f423625fa40(0000) GS:ffffa0226bf80000(0000) knlGS:0000000000000000
Jul 22 05:00:22 myvm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 22 05:00:22 myvm kernel: CR2: 0000000000000002 CR3: 000000010d87e000 CR4: 0000000000350ee0
Jul 22 05:00:22 myvm kernel: Call Trace:
Jul 22 05:00:22 myvm kernel:  <TASK>
Jul 22 05:00:22 myvm kernel:  ? __die_body.cold+0x1a/0x1f
Jul 22 05:00:22 myvm kernel:  ? page_fault_oops+0xd2/0x2b0
Jul 22 05:00:22 myvm kernel:  ? finish_task_switch.isra.0+0x9b/0x300
Jul 22 05:00:22 myvm kernel:  ? exc_page_fault+0x70/0x170
Jul 22 05:00:22 myvm kernel:  ? asm_exc_page_fault+0x22/0x30
Jul 22 05:00:22 myvm kernel:  ? next_uptodate_page+0x45/0x1f0
Jul 22 05:00:22 myvm kernel:  filemap_map_pages+0xb0/0x6e0
Jul 22 05:00:22 myvm kernel:  xfs_filemap_map_pages+0x41/0x60 [xfs]
Jul 22 05:00:22 myvm kernel:  do_fault+0x1a7/0x410
Jul 22 05:00:22 myvm kernel:  __handle_mm_fault+0x660/0xfa0
Jul 22 05:00:22 myvm kernel:  handle_mm_fault+0xdb/0x2d0
Jul 22 05:00:22 myvm kernel:  do_user_addr_fault+0x19c/0x570
Jul 22 05:00:22 myvm kernel:  exc_page_fault+0x70/0x170
Jul 22 05:00:22 myvm kernel:  asm_exc_page_fault+0x22/0x30
Jul 22 05:00:22 myvm kernel: RIP: 0033:0x7f42398b32a6
Jul 22 05:00:22 myvm kernel: Code: c7 5d 41 5c e9 3b 3d 00 00 5a 31 c0 5d 41 5c c3 0f 1f 40 00 89 f1 89 f8 48 83 e1 3f 48 83 e0 3f 83 f9 30 77 3f 83 f8 30 77 3a <66> 0f 12 0f 66 0f 12 16 66 0f 16 4f 08 66 0f 16 56 08 66 0f ef c0
Jul 22 05:00:22 myvm kernel: RSP: 002b:00007ffc8a9aae68 EFLAGS: 00010287
Jul 22 05:00:22 myvm kernel: RAX: 0000000000000001 RBX: 000055fb22b39750 RCX: 0000000000000010
Jul 22 05:00:22 myvm kernel: RDX: 0000000000000000 RSI: 00007f41b1534550 RDI: 000055fb22b59d01
Jul 22 05:00:22 myvm kernel: RBP: 0000000000000009 R08: 0000000000000000 R09: 000055fb22b39750
Jul 22 05:00:22 myvm kernel: R10: 00007f41b1534550 R11: 000000000000002c R12: 00007f42398c3180
Jul 22 05:00:22 myvm kernel: R13: 0000000000000000 R14: 0000000000000009 R15: 00007f42398c3180
Jul 22 05:00:22 myvm kernel:  </TASK>
Jul 22 05:00:22 myvm kernel: Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc lz4 lz4_compress zram zsmalloc overlay binfmt_misc intel_rapl_msr intel_rapl_common ghash_clmulni_intel sha512_ssse3 sha512_generic iTCO_wdt intel_pmc_bxt iTCO_vendor_support virtio_rng aesni_intel crypto_simd watchdog cryptd pcspkr rng_core virtio_gpu virtio_console virtio_balloon virtio_dma_buf drm_shmem_helper drm_kms_helper button evdev joydev serio_raw sg fuse dm_mod drm loop efi_pstore configfs qemu_fw_cfg ip_tables x_tables autofs4 xfs libcrc32c crc32c_generic hid_generic usbhid hid sr_mod cdrom sd_mod t10_pi ahci crc64_rocksoft crc64 crc_t10dif libahci crct10dif_generic virtio_net net_failover virtio_scsi failover libata xhci_pci scsi_mod psmouse xhci_hcd crct10dif_pclmul crct10dif_common
Jul 22 05:00:22 myvm kernel:  crc32_pclmul crc32c_intel i2c_i801 i2c_smbus lpc_ich scsi_common usbcore virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio usb_common virtio_ring
Jul 22 05:00:22 myvm kernel: CR2: 0000000000000002
Jul 22 05:00:22 myvm kernel: ---[ end trace 0000000000000000 ]---
Jul 22 05:00:22 myvm kernel: RIP: 0010:next_uptodate_page+0x45/0x1f0
Jul 22 05:00:22 myvm kernel: Code: 0f 84 2f 01 00 00 48 81 ff 06 04 00 00 0f 84 a3 00 00 00 48 81 ff 02 04 00 00 0f 84 26 01 00 00 40 f6 c7 01 0f 85 8c 00 00 00 <48> 8b 07 a8 01 0f 85 81 00 00 00 8b 47 34 85 c0 74 7a 8d 50 01 4c
Jul 22 05:00:22 myvm kernel: RSP: 0000:ffffc1ae8274bcc0 EFLAGS: 00010246
Jul 22 05:00:22 myvm kernel: RAX: 0000000000000002 RBX: ffffc1ae8274bd18 RCX: 000000000000005e
Jul 22 05:00:22 myvm kernel: RDX: ffffc1ae8274bd18 RSI: ffffa0210863d2b0 RDI: 0000000000000002
Jul 22 05:00:22 myvm kernel: RBP: ffffa0210863d2b0 R08: 000000000000005e R09: 000055fb22bbdfff
Jul 22 05:00:22 myvm kernel: R10: 000000000000004f R11: 0000000000000000 R12: 000000000000005e
Jul 22 05:00:22 myvm kernel: R13: ffffa02194ad6980 R14: ffffa0210863d2b0 R15: ffffa02118538f60
Jul 22 05:00:22 myvm kernel: FS:  00007f423625fa40(0000) GS:ffffa0226bf80000(0000) knlGS:0000000000000000
Jul 22 05:00:22 myvm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 22 05:00:22 myvm kernel: CR2: 0000000000000002 CR3: 000000010d87e000 CR4: 0000000000350ee0
Jul 22 05:00:22 myvm kernel: ------------[ cut here ]------------
Jul 22 05:00:22 myvm kernel: Voluntary context switch within RCU read-side critical section!
Jul 22 05:00:22 myvm kernel: WARNING: CPU: 3 PID: 3235204 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
Jul 22 05:00:22 myvm kernel: Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc lz4 lz4_compress zram zsmalloc overlay binfmt_misc intel_rapl_msr intel_rapl_common ghash_clmulni_intel sha512_ssse3 sha512_generic iTCO_wdt intel_pmc_bxt iTCO_vendor_support virtio_rng aesni_intel crypto_simd watchdog cryptd pcspkr rng_core virtio_gpu virtio_console virtio_balloon virtio_dma_buf drm_shmem_helper drm_kms_helper button evdev joydev serio_raw sg fuse dm_mod drm loop efi_pstore configfs qemu_fw_cfg ip_tables x_tables autofs4 xfs libcrc32c crc32c_generic hid_generic usbhid hid sr_mod cdrom sd_mod t10_pi ahci crc64_rocksoft crc64 crc_t10dif libahci crct10dif_generic virtio_net net_failover virtio_scsi failover libata xhci_pci scsi_mod psmouse xhci_hcd crct10dif_pclmul crct10dif_common
Jul 22 05:00:22 myvm kernel:  crc32_pclmul crc32c_intel i2c_i801 i2c_smbus lpc_ich scsi_common usbcore virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio usb_common virtio_ring
Jul 22 05:00:22 myvm kernel: CPU: 3 PID: 3235204 Comm: postmaster Tainted: G      D            6.1.0-10-amd64 #1  Debian 6.1.37-1
Jul 22 05:00:22 myvm kernel: Hardware name: Hetzner vServer/Standard PC (Q35 + ICH9, 2009), BIOS 20171111 11/11/2017
Jul 22 05:00:22 myvm kernel: RIP: 0010:rcu_note_context_switch+0x4ee/0x690
Jul 22 05:00:22 myvm kernel: Code: 49 89 3f 49 83 bc 24 98 00 00 00 00 0f 85 66 fe ff ff e9 58 fe ff ff 48 c7 c7 68 53 70 94 c6 05 d7 0e ad 01 01 e8 d2 8e f6 ff <0f> 0b e9 70 fb ff ff a9 ff ff ff 7f 0f 84 2c fc ff ff 65 48 8b 3c
Jul 22 05:00:22 myvm kernel: RSP: 0018:ffffc1ae8274bc60 EFLAGS: 00010086
Jul 22 05:00:22 myvm kernel: RAX: 0000000000000000 RBX: ffffa0226bfb1c00 RCX: 0000000000000000
Jul 22 05:00:22 myvm kernel: RDX: 0000000000000003 RSI: ffffffff9474105e RDI: 00000000ffffffff
Jul 22 05:00:22 myvm kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffc1ae8274bad0
Jul 22 05:00:22 myvm kernel: R10: 0000000000000003 R11: ffffffff94ed43a8 R12: 0000000000030e40
Jul 22 05:00:22 myvm kernel: R13: ffffa02175d09980 R14: ffffc1ae8274bd50 R15: 0000000000000000
Jul 22 05:00:22 myvm kernel: FS:  0000000000000000(0000) GS:ffffa0226bf80000(0000) knlGS:0000000000000000
Jul 22 05:00:22 myvm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 22 05:00:22 myvm kernel: CR2: 00007f41ef6dd70e CR3: 00000000059f6000 CR4: 0000000000350ee0
Jul 22 05:00:22 myvm kernel: Call Trace:
Jul 22 05:00:22 myvm kernel:  <TASK>
Jul 22 05:00:22 myvm kernel:  ? __warn+0x7d/0xc0
Jul 22 05:00:22 myvm kernel:  ? rcu_note_context_switch+0x4ee/0x690
Jul 22 05:00:22 myvm kernel:  ? report_bug+0xe6/0x170
Jul 22 05:00:22 myvm kernel:  ? irq_work_queue+0xa/0x50
Jul 22 05:00:22 myvm kernel:  ? handle_bug+0x41/0x70
Jul 22 05:00:22 myvm kernel:  ? exc_invalid_op+0x13/0x60
Jul 22 05:00:22 myvm kernel:  ? asm_exc_invalid_op+0x16/0x20
Jul 22 05:00:22 myvm kernel:  ? rcu_note_context_switch+0x4ee/0x690
Jul 22 05:00:22 myvm kernel:  __schedule+0xac/0xa20
Jul 22 05:00:22 myvm kernel:  schedule+0x5d/0xe0
Jul 22 05:00:22 myvm kernel:  rwsem_down_write_slowpath+0x34e/0x730
Jul 22 05:00:22 myvm kernel:  exit_mmap+0xf6/0x2f0
Jul 22 05:00:22 myvm kernel:  __mmput+0x3e/0x130
Jul 22 05:00:22 myvm kernel:  do_exit+0x2fc/0xb10
Jul 22 05:00:22 myvm kernel:  make_task_dead+0x8d/0x90
Jul 22 05:00:22 myvm kernel:  rewind_stack_and_make_dead+0x17/0x20
Jul 22 05:00:22 myvm kernel: RIP: 0033:0x7f42398b32a6
Jul 22 05:00:22 myvm kernel: Code: Unable to access opcode bytes at 0x7f42398b327c.
Jul 22 05:00:22 myvm kernel: RSP: 002b:00007ffc8a9aae68 EFLAGS: 00010287
Jul 22 05:00:22 myvm kernel: RAX: 0000000000000001 RBX: 000055fb22b39750 RCX: 0000000000000010
Jul 22 05:00:22 myvm kernel: RDX: 0000000000000000 RSI: 00007f41b1534550 RDI: 000055fb22b59d01
Jul 22 05:00:22 myvm kernel: RBP: 0000000000000009 R08: 0000000000000000 R09: 000055fb22b39750
Jul 22 05:00:22 myvm kernel: R10: 00007f41b1534550 R11: 000000000000002c R12: 00007f42398c3180
Jul 22 05:00:22 myvm kernel: R13: 0000000000000000 R14: 0000000000000009 R15: 00007f42398c3180
Jul 22 05:00:22 myvm kernel:  </TASK>
Jul 22 05:00:22 myvm kernel: ---[ end trace 0000000000000000 ]---

I kind of think the same thing, except that this is somewhat of a repeat issue, feels slightly not random enough. I suspect that Hetzner probably doesn’t use ECC RAM, that’s probably how they can offer so much for the price. Even their dedicated servers apparently don’t/didn’t have ECC. But even so Hetzner is generally regarded as quite reliable in terms of their infrastructure.