Hello,
According to your log, the kernel backtrace seems to be generated when the containers are being deactivated (presumably while the operating system is shutting down): can you please confirm this?
The process involved in this error is containerd-shim (included in the package containerd), as reported by the "Comm: containerd-shim" field of the backtrace.
The kernel complains about a "page fault" (#PF): the kernel, on behalf of the containerd-shim program, cannot write to a memory page at a specific address. The error code 2 returned by the CPU (see [2], Vol. 3A, page 4-37) is decoded in the list that follows the log excerpts.
Code:
Aug 25 05:11:38 machine systemd[1]: run-docker-runtime\x2drunc-moby-eef16d249922cf43624ee73338deddcc35be2ef3bf81f1e594d38d20c0233dc7-runc.QLbOd7.mount: Deactivated successfully.
[.. other docker deactivation messages ..]
Aug 25 05:20:40 machine systemd[1]: run-docker-runtime\x2drunc-moby-1e3c156667abd51865a2286598994970c92ab8908f238e5e6c3b8fe594f015bc-runc.dmydl6.mount: Deactivated successfully.
Aug 25 05:20:40 machine kernel: BUG: unable to handle page fault for address: 0000010000000000
Aug 25 05:20:40 machine kernel: #PF: supervisor write access in kernel mode
Aug 25 05:20:40 machine kernel: #PF: error_code(0x0002) - not-present page
Aug 25 05:20:40 machine kernel: PGD 0 P4D 0
Aug 25 05:20:40 machine kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Aug 25 05:20:40 machine kernel: CPU: 1 PID: 7190 Comm: containerd-shim Not tainted 6.1.0-23-amd64 #1 Debian 6.1.99-1
Aug 25 05:20:40 machine kernel: Hardware name: Micro-Star International Co., Ltd. PRO ADL-N Cubi N (MS-B0A9)/MS-B0A91, BIOS 8.00 02/23/2023
Aug 25 05:20:40 machine kernel: RIP: 0010:_raw_spin_lock+0x13/0x30
[..]
Aug 25 05:20:40 machine kernel: Call Trace:
Aug 25 05:20:40 machine kernel: <TASK>
Aug 25 05:20:40 machine kernel: ? __die_body.cold+0x1a/0x1f
Aug 25 05:20:40 machine kernel: ? page_fault_oops+0xd2/0x2b0
Aug 25 05:20:40 machine kernel: ? try_to_wake_up+0x26b/0x570
Aug 25 05:20:40 machine kernel: ? exc_page_fault+0x70/0x170
Aug 25 05:20:40 machine kernel: ? asm_exc_page_fault+0x22/0x30
Aug 25 05:20:40 machine kernel: ? _raw_spin_lock+0x13/0x30
Aug 25 05:20:40 machine kernel: fsnotify_grab_connector+0x29/0x80
Aug 25 05:20:40 machine kernel: fsnotify_destroy_marks+0x26/0x180
Aug 25 05:20:40 machine kernel: __destroy_inode+0x7e/0x180
Aug 25 05:20:40 machine kernel: destroy_inode+0x2a/0x70
Aug 25 05:20:40 machine kernel: __dentry_kill+0xdc/0x170
Aug 25 05:20:40 machine kernel: shrink_dentry_list+0x7d/0x160
Aug 25 05:20:40 machine kernel: shrink_dcache_parent+0xcc/0x120
Aug 25 05:20:40 machine kernel: d_invalidate+0x62/0xf0
Aug 25 05:20:40 machine kernel: ? d_find_any_alias+0x46/0x60
Aug 25 05:20:40 machine kernel: proc_invalidate_siblings_dcache+0x12b/0x150
Aug 25 05:20:40 machine kernel: release_task+0x39b/0x560
Aug 25 05:20:40 machine kernel: wait_consider_task+0x4f9/0xaa0
Aug 25 05:20:40 machine kernel: do_wait+0x1f0/0x2f0
Aug 25 05:20:40 machine kernel: kernel_wait4+0xb4/0x160
Aug 25 05:20:40 machine kernel: ? thread_group_exited+0x50/0x50
Aug 25 05:20:40 machine kernel: __do_sys_wait4+0x4b/0xb0
Aug 25 05:20:40 machine kernel: ? update_load_avg+0x7e/0x780
Aug 25 05:20:40 machine kernel: do_syscall_64+0x55/0xb0
Aug 25 05:20:40 machine kernel: ? update_load_avg+0x613/0x780
Aug 25 05:20:40 machine kernel: ? update_curr+0x69/0x1e0
Aug 25 05:20:40 machine kernel: ? check_preempt_wakeup+0x136/0x2c0
Aug 25 05:20:40 machine kernel: ? enqueue_task_fair+0xe7/0x3d0
Aug 25 05:20:40 machine kernel: ? check_preempt_curr+0x5a/0x70
Aug 25 05:20:40 machine kernel: ? ttwu_do_wakeup+0x17/0x170
Aug 25 05:20:40 machine kernel: ? try_to_wake_up+0x26b/0x570
Aug 25 05:20:40 machine kernel: ? schedule+0x5a/0xd0
Aug 25 05:20:40 machine kernel: ? wake_up_q+0x4a/0x90
Aug 25 05:20:40 machine kernel: ? futex_wake+0x151/0x180
Aug 25 05:20:40 machine kernel: ? do_futex+0xda/0x1b0
Aug 25 05:20:40 machine kernel: ? __x64_sys_futex+0x8e/0x1d0
Aug 25 05:20:40 machine kernel: ? exit_to_user_mode_prepare+0x40/0x1e0
Aug 25 05:20:40 machine kernel: ? syscall_exit_to_user_mode+0x1e/0x40
Aug 25 05:20:40 machine kernel: ? do_syscall_64+0x61/0xb0
Aug 25 05:20:40 machine kernel: ? switch_fpu_return+0x4c/0xd0
Aug 25 05:20:40 machine kernel: ? exit_to_user_mode_prepare+0x14b/0x1e0
Aug 25 05:20:40 machine kernel: ? syscall_exit_to_user_mode+0x1e/0x40
Aug 25 05:20:40 machine kernel: ? do_syscall_64+0x61/0xb0
Aug 25 05:20:40 machine kernel: ? syscall_exit_to_user_mode+0x1e/0x40
Aug 25 05:20:40 machine kernel: ? do_syscall_64+0x61/0xb0
Aug 25 05:20:40 machine kernel: ? exit_to_user_mode_prepare+0x40/0x1e0
Aug 25 05:20:40 machine kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Aug 25 05:20:40 machine kernel: RIP: 0033:0x40720e
Aug 25 05:20:40 machine kernel: Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
Aug 25 05:20:40 machine kernel: RSP: 002b:000000c0000c8b90 EFLAGS: 00000212 ORIG_RAX: 000000000000003d
Aug 25 05:20:40 machine kernel: RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 000000000040720e
Aug 25 05:20:40 machine kernel: RDX: 0000000000000001 RSI: 000000c0000c8d4c RDI: ffffffffffffffff
Aug 25 05:20:40 machine kernel: RBP: 000000c0000c8bd0 R08: 0000000000000000 R09: 0000000000000000
Aug 25 05:20:40 machine kernel: R10: 000000c0000c8dd8 R11: 0000000000000212 R12: 000000c0000c8de0
Aug 25 05:20:40 machine kernel: R13: ffffffffffffffff R14: 000000c0000061a0 R15: 0000000000000000
Aug 25 05:20:40 machine kernel: </TASK>
Aug 25 05:20:40 machine kernel: Modules linked in: tcp_diag udp_diag inet_diag iptable_filter iptable_nat wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc sd_mod sg uas usb_storage overlay snd_hda_codec_hdmi binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic snd_sof_pci_intel_tgl ledtrig_audio nls_ascii snd_sof_intel_hda_common nls_cp437 soundwire_intel vfat soundwire_generic_allocation x86_pkg_temp_thermal fat soundwire_cadence intel_powerclamp snd_sof_intel_hda coretemp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils kvm_intel snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi kvm snd_soc_core btusb irqbypass btrtl ghash_clmulni_intel snd_compress btbcm
Aug 25 05:20:40 machine kernel: sha256_ssse3 iwlmvm i915 btintel soundwire_bus btmtk sha1_ssse3 snd_hda_intel bluetooth mac80211 snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec aesni_intel libarc4 drm_buddy jitterentropy_rng snd_hda_core crypto_simd drm_display_helper iwlwifi cryptd sha512_ssse3 snd_hwdep rapl cec sha512_generic snd_pcm mei_hdcp pmt_telemetry rc_core intel_cstate ctr intel_rapl_msr pmt_class evdev drbg snd_timer processor_thermal_device_pci intel_uncore wmi_bmof processor_thermal_device ttm cfg80211 iTCO_wdt mei_me ansi_cprng pcspkr snd processor_thermal_rfim ecdh_generic intel_pmc_bxt drm_kms_helper processor_thermal_mbox iTCO_vendor_support mxm_wmi soundcore ecc processor_thermal_rapl watchdog mei intel_rapl_common int3403_thermal ee1004 i2c_algo_bit intel_vsec int340x_thermal_zone rfkill int3400_thermal acpi_tad intel_pmc_core acpi_thermal_rel acpi_pad button fuse drm loop efi_pstore dm_mod configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic nvme
Aug 25 05:20:40 machine kernel: nvme_core t10_pi crc64_rocksoft ahci crc64 libahci crc_t10dif xhci_pci r8169 libata crct10dif_generic realtek xhci_hcd crc32_pclmul crct10dif_pclmul i2c_i801 mdio_devres crc32c_intel crct10dif_common libphy scsi_mod video i2c_smbus usbcore scsi_common usb_common wmi fan
Aug 25 05:20:40 machine kernel: CR2: 0000010000000000
Aug 25 05:20:40 machine kernel: ---[ end trace 0000000000000000 ]---
[..]
Aug 25 05:20:40 machine kernel: note: containerd-shim[7190] exited with irqs disabled
Aug 25 05:20:40 machine kernel: note: containerd-shim[7190] exited with preempt_count 1
Aug 25 05:21:10 machine dockerd[1467]: time="2024-08-25T05:21:10.264293797-04:00" level=warning msg="Health check for container 1e3c156667abd51865a2286598994970c92ab8908f238e5e6c3b8fe594f015bc error: timed out starting health check for container 1e3c156667abd51865a2286598994970c92ab8908f238e5e6c3b8fe594f015bc"
Aug 25 05:21:10 machine dockerd[1467]: time="2024-08-25T05:21:10.264872608-04:00" level=error msg="stream copy error: reading from a closed fifo"
Aug 25 05:21:10 machine dockerd[1467]: time="2024-08-25T05:21:10.265035912-04:00" level=error msg="stream copy error: reading from a closed fifo"
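If the same backtrace shows up again, it can help to pull out just the key fields (faulting address, error code, PID, process name, faulting kernel function) so that different occurrences can be compared. Below is a minimal Python sketch, not a complete oops parser. The names PATTERNS, summarize_oops and SAMPLE are made up for this example, and it assumes the lines look like the ones in the log above (in particular that the kernel-side RIP is printed with code segment 0010, as on this x86-64 machine).

```python
import re

# Patterns for a few fields of the oops header, written against the
# line layout seen in the log above (an assumption, not a kernel ABI).
PATTERNS = {
    "address": re.compile(r"BUG: unable to handle page fault for address: ([0-9a-f]+)"),
    "error_code": re.compile(r"#PF: error_code\((0x[0-9a-f]+)\)"),
    "pid": re.compile(r"CPU: \d+ PID: (\d+) Comm: "),
    "comm": re.compile(r"CPU: \d+ PID: \d+ Comm: (\S+)"),
    "rip": re.compile(r"RIP: 0010:(\S+)"),  # 0010 = kernel code segment here
}


def summarize_oops(lines):
    """Return the first match of each pattern found in the given log lines."""
    summary = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            if key in summary:
                continue  # keep only the first occurrence of each field
            match = pattern.search(line)
            if match:
                summary[key] = match.group(1)
    return summary


# A few lines copied from the log above.
SAMPLE = [
    "Aug 25 05:20:40 machine kernel: BUG: unable to handle page fault for address: 0000010000000000",
    "Aug 25 05:20:40 machine kernel: #PF: supervisor write access in kernel mode",
    "Aug 25 05:20:40 machine kernel: #PF: error_code(0x0002) - not-present page",
    "Aug 25 05:20:40 machine kernel: CPU: 1 PID: 7190 Comm: containerd-shim Not tainted 6.1.0-23-amd64 #1 Debian 6.1.99-1",
    "Aug 25 05:20:40 machine kernel: RIP: 0010:_raw_spin_lock+0x13/0x30",
    "Aug 25 05:20:40 machine kernel: RIP: 0033:0x40720e",
]

if __name__ == "__main__":
    for field, value in summarize_oops(SAMPLE).items():
        print(f"{field}: {value}")
```

The lines could also be read from a saved copy of the journal instead of the SAMPLE list.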
Code:
Aug 25 05:20:40 machine kernel: BUG: unable to handle page fault for address: 0000010000000000
Aug 25 05:20:40 machine kernel: #PF: supervisor write access in kernel mode
Aug 25 05:20:40 machine kernel: #PF: error_code(0x0002) - not-present page
- The fault was caused by a non-present page
- The access causing the fault was a write
- A supervisor-mode access caused the fault (kernel code running on behalf of the containerd-shim program)
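The three points above come from the low bits of the page-fault error code. As an illustration, here is a small Python sketch that decodes them. The bit meanings follow the page-fault error code layout in the Intel manual [2]; the names PF_BITS and decode_pf_error_code are made up for this example, and only bits 0 to 4 are handled.

```python
# (bit, meaning when set, meaning when clear) for the low bits of the
# x86 page-fault error code. None means "nothing to report when clear".
PF_BITS = [
    (0, "protection violation on a present page", "non-present page"),
    (1, "write access", "read access"),
    (2, "user-mode access", "supervisor-mode access"),
    (3, "reserved bit set in a paging-structure entry", None),
    (4, "instruction fetch", None),
]


def decode_pf_error_code(code: int) -> list[str]:
    """Describe bits 0-4 of a page-fault error code, lowest bit first."""
    description = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            description.append(if_set)
        elif if_clear is not None:
            description.append(if_clear)
    return description


if __name__ == "__main__":
    # 0x0002 is the value reported in the log: error_code(0x0002)
    for item in decode_pf_error_code(0x0002):
        print("-", item)
```

For the value 0x0002 from the log, only bit 1 is set, which gives: non-present page, write access, supervisor-mode access.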
If it is confirmed that the fault appears only during shutdowns, faulty RAM becomes an unlikely cause: it should trigger errors during normal operation too, not only during shutdowns. In that case, the fault could be caused by something related to the container supervisor (containerd-shim).
By the way, which version of containerd is installed? You can check with the command:
Code:
apt list containerd
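If you need to collect the version from several machines, the output line of that command can be split into its fields. Here is a minimal Python sketch, assuming the usual "name/suites version arch [status]" layout of apt list. Note that apt itself warns that its command-line output is not a stable interface, so treat this as a convenience only. The function name parse_apt_list_line is made up for this example, and the sample line uses an invented version number.

```python
def parse_apt_list_line(line: str):
    """Split one 'apt list' result line into fields; return None otherwise."""
    parts = line.split()
    if len(parts) < 3 or "/" not in parts[0]:
        return None  # e.g. the "Listing... Done" header line
    name = parts[0].split("/", 1)[0]
    # The optional trailing field looks like "[installed]" or
    # "[installed,automatic]"; it is absent for packages not installed.
    status = " ".join(parts[3:]).strip("[]")
    return {
        "name": name,
        "version": parts[1],
        "arch": parts[2],
        "installed": "installed" in status.split(","),
    }


if __name__ == "__main__":
    # Hypothetical sample line: the version shown here is invented.
    sample = "containerd/stable,now 1.2.3-1 amd64 [installed]"
    print(parse_apt_list_line(sample))
```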
Hope that helps.
----
[1] decode_stacktrace: make stack dump output useful again
[2] Combined Volume Set of Intel® 64 and IA-32 Architectures Software Developer’s Manuals
[3] Bug hunting
Statistics: Posted by Aki — 2024-08-30 20:27 — Replies 2 — Views 355