
LVM commands sometimes freeze the whole server

I maintain several physical machines, each running a few virtual machines whose "disks" are stored on logical volumes of the host. Occasionally, running lvm commands such as "lvremove" or "pvmove" causes some kind of lock-up that eventually kills the whole server and requires a hard reboot.

The (physical) machines run Debian stable; the problems have shown up on both Debian 9 "stretch" and Debian 10 "buster". The virtualization system is kvm/qemu, and the configuration of an LV as storage for a virtual machine looks like this:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='writeback' io='threads'/>
  <source dev='/dev/vg_ssds_0/machine-2-root'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>

Other values for "cache" and "io" have been used over time, with no noticeable effect.

The problems have been triggered on "slow" storage hardware (software ("md") RAID 5 on spinning disks used as PVs for LVM) as well as on "faster" hardware (lvmraid on SSDs).

The problems have been triggered both by removing (inactive) snapshots with "lvremove" and by moving (inactive) LVs from one PV to another with "pvmove".

When the problem occurs, the symptoms are usually as follows:

  1. the original lvm command shows up in state "D" (uninterruptible sleep)

  2. more and more processes end up in state "D" as they try to write to (or possibly read from?) other LVs in the same VG

  3. this makes the load average climb higher and higher as more and more processes sit in state "D". CPU usage, of course, is not affected.

  4. the virtual machines running on the host (and using LVs as "disks") stop responding

  5. sometimes even the host stops responding completely (it does not even answer ping requests)
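To watch symptoms 1 to 3 develop, it can help to list the processes that are currently in state "D". The following is only a minimal sketch, assuming a Linux host with /proc mounted; the function names `parse_stat_line` and `uninterruptible_tasks` are made up for this illustration. It reads the state field from each /proc/<pid>/stat file.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split at the first "(" and the last ")".
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state "D"."""
    if not os.path.isdir(proc_root):
        return []
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were looking
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Running this every few seconds while an lvm command is in progress would show whether the set of blocked processes keeps growing, which is the pattern described in the list above.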

This has happened about five times over several years, so it is not easy to reproduce. Also, I tried to set up a "test case" that triggers the same problem without using

Once I managed to "fix" the problem with "dmsetup resume", as described at https://dumbailo.wordpress.com/2019/07/12/linux-lvm-commands-hangs-forever/. Another time I tried hard to do something similar and got nowhere; the "dmsetup resume" command itself got stuck in state "D". Several times the machine stopped responding completely, and I could not find out which device was stuck in the "suspended" state.
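One way to look for a device stuck in the "suspended" state is to scan the output of "dmsetup info" for entries whose State line says SUSPENDED. The sketch below assumes the usual "Name:" / "State:" layout of that output and that dmsetup is run as root; the function name `suspended_devices` is made up for this illustration. It only reports candidates and does not resume anything, and on a host that is already wedged the dmsetup call itself may block.

```python
import shutil
import subprocess


def suspended_devices(dmsetup_info_text):
    """Return the names of device-mapper devices reported as SUSPENDED."""
    names = []
    current = None
    for line in dmsetup_info_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between devices, or unexpected text
        key, value = key.strip(), value.strip()
        if key == "Name":
            current = value
        elif key == "State" and value.startswith("SUSPENDED") and current:
            names.append(current)
    return names


if __name__ == "__main__" and shutil.which("dmsetup"):
    # Needs root; without it stdout is empty and nothing is listed.
    result = subprocess.run(["dmsetup", "info"], capture_output=True, text=True)
    for name in suspended_devices(result.stdout):
        print(name)
```

Each name printed this way would be a candidate for "dmsetup resume <name>", with the caveat from above that the resume can itself hang.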

The log entries during the problems vary a lot, but here is what happened yesterday, when I tried to move (part of) an lvmraid LV to another PV by running

pvmove -n guesthouse /dev/sdf /dev/sdk

Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.598315] rcu: INFO: rcu_sched self-detected stall on CPU
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.599523] rcu:         3-....: (5248 ticks this GP) idle=c2e/1/0x4000000000000002 softirq=178427/178473 fqs=1927 
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.600694] rcu:          (t=5251 jiffies g=381357 q=5792)
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601814] NMI backtrace for cpu 3
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601817] CPU: 3 PID: 4481 Comm: mdX_raid5 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601818] Hardware name: IBM System x3650 M4 : -[7915E3G]-/00Y8494, BIOS -[VVE166CUS-3.00]- 06/12/2019
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601819] Call Trace:
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601823]  <IRQ>
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601830]  dump_stack+0x66/0x90
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601834]  nmi_cpu_backtrace.cold.4+0x13/0x50
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601838]  ? lapic_can_unplug_cpu.cold.31+0x37/0x37
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601841]  nmi_trigger_cpumask_backtrace+0xf9/0xfb
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601845]  rcu_dump_cpu_stacks+0x9b/0xcb
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601847]  rcu_check_callbacks.cold.81+0x1db/0x335
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601851]  ? tick_sched_do_timer+0x60/0x60
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601854]  update_process_times+0x28/0x60
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601856]  tick_sched_handle+0x22/0x60
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601858]  tick_sched_timer+0x37/0x70
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601860]  __hrtimer_run_queues+0x100/0x280
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601863]  hrtimer_interrupt+0x100/0x220
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601868]  ? handle_irq_event+0x47/0x5c
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601871]  smp_apic_timer_interrupt+0x6a/0x140
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601874]  apic_timer_interrupt+0xf/0x20
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601875]  </IRQ>
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601882] RIP: 0010:raid5_get_active_stripe+0x150/0x5e0 [raid456]
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601885] Code: ff 8b 40 50 49 89 ef 85 c0 74 59 8d 48 01 49 8d 55 50 f0 41 0f b1 4d 50 75 39 48 8b 7c 24 18 c6 07 00 66 66 66 90 fb 66 66 90 <66> 66 90 48 8b 5c 24 48 65 48 33 1c 25 28 00 00 00 4c 89 e8 0f 85
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601886] RSP: 0018:ffff9e014f043d58 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601888] RAX: 0000000000000000 RBX: ffff88e3481f1b1c RCX: dead000000000200
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601889] RDX: ffff88e3481f1888 RSI: ffff88e348b59d10 RDI: ffff88e3481f1820
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601890] RBP: ffff88e3481f1801 R08: 0000000000000000 R09: 000000000a6abe00
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601891] R10: 0000000000000002 R11: 0000000000000000 R12: ffff88e3481f1a20
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601892] R13: ffff88e348b59d00 R14: ffff88e348b59d10 R15: ffff88e3481f1800
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601899]  ? raid5_get_active_stripe+0x1d1/0x5e0 [raid456]
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601904]  raid5d+0x299/0x5b0 [raid456]
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601908]  ? schedule_timeout+0x26d/0x390
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601910]  ? prepare_to_wait_event+0xbb/0x140
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601918]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601922]  md_thread+0x94/0x150 [md_mod]
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601925]  ? finish_wait+0x80/0x80
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601929]  kthread+0x112/0x130
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601932]  ? kthread_bind+0x30/0x30
Apr 13 22:28:28 vmhost-eq-1 kernel: [ 5079.601935]  ret_from_fork+0x35/0x40
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250300] ------------[ cut here ]------------
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250305] NETDEV WATCHDOG: eno2 (igb): transmit queue 6 timed out
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250327] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:466 dev_watchdog+0x20d/0x220
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250328] Modules linked in: vhost_net vhost tap tun loop devlink nf_tables nfnetlink dm_mirror dm_region_hash dm_log bridge 8021q garp stp mrp llc dm_raid bonding dm_snapshot dm_bufio intel_rapl sb_edac nls_ascii x86_pkg_temp_thermal intel_powerclamp nls_cp437 efi_pstore vfat fat coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif ttm drm_kms_helper cdc_ether usbnet intel_cstate mii joydev drm evdev intel_uncore tpm_tis ipmi_si tpm_tis_core pcc_cpufreq ipmi_devintf pcspkr intel_rapl_perf efivars tpm dm_mod ioatdma sg rng_core iTCO_wdt wmi ipmi_msghandler iTCO_vendor_support button efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb hid_generic usbhid hid sr_mod cdrom raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250371]  async_tx xor ses enclosure scsi_transport_sas sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod crc32c_intel ahci libahci libata aesni_intel ehci_pci ehci_hcd megaraid_sas aes_x86_64 igb crypto_simd cryptd usbcore glue_helper scsi_mod lpc_ich i2c_algo_bit mfd_core dca usb_common
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250388] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250389] Hardware name: IBM System x3650 M4 : -[7915E3G]-/00Y8494, BIOS -[VVE166CUS-3.00]- 06/12/2019
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250392] RIP: 0010:dev_watchdog+0x20d/0x220
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250394] Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 92 f2 ad 00 01 e8 37 b9 fc ff 89 d9 4c 89 e6 48 c7 c7 58 d9 ad 8b 48 89 c2 e8 8d 1b a6 ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 66
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250395] RSP: 0018:ffff88e387803e90 EFLAGS: 00010286
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250397] RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000006
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250398] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88e3878166b0
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250399] RBP: ffff88e37128c45c R08: 000000000000080f R09: 0000000000000004
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250400] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88e37128c000
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250401] R13: 0000000000000000 R14: ffff88e37128c480 R15: 0000000000000008
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250402] FS:  0000000000000000(0000) GS:ffff88e387800000(0000) knlGS:0000000000000000
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250403] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250404] CR2: 00007f71d9b23000 CR3: 00000004d240a001 CR4: 00000000000626f0
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250406] Call Trace:
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250409]  <IRQ>
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250413]  ? pfifo_fast_enqueue+0x110/0x110
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250417]  call_timer_fn+0x2b/0x130
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250420]  run_timer_softirq+0x1c7/0x3e0
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250423]  ? __hrtimer_run_queues+0x130/0x280
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250427]  ? recalibrate_cpu_khz+0x10/0x10
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250430]  ? ktime_get+0x3a/0xa0
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250434]  __do_softirq+0xde/0x2d8
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250439]  irq_exit+0xba/0xc0
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250441]  smp_apic_timer_interrupt+0x74/0x140
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250444]  apic_timer_interrupt+0xf/0x20
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250445]  </IRQ>
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250450] RIP: 0010:cpuidle_enter_state+0xb6/0x320
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250451] Code: 90 31 ff e8 3c dc b0 ff 80 7c 24 0b 00 74 17 9c 58 66 66 90 66 90 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 be c4 b6 ff fb 66 66 90 <66> 66 90 48 b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250452] RSP: 0018:ffffffff8bc03e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250454] RAX: ffff88e3878220c0 RBX: 0000049f8953fac9 RCX: 000000000000001f
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250455] RDX: 0000049f8953fac9 RSI: 0000000040000431 RDI: 0000000000000000
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250456] RBP: ffff88e38782a710 R08: 0000000000000002 R09: 0000000000021980
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250457] R10: 00000999a026fa64 R11: ffff88e3878210a8 R12: 0000000000000002
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250458] R13: ffffffff8bcb71b8 R14: 0000000000000002 R15: 00000000558118aa
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250463]  do_idle+0x228/0x270
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250466]  cpu_startup_entry+0x6f/0x80
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250469]  start_kernel+0x50c/0x52c
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250473]  secondary_startup_64+0xa4/0xb0
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250476] ---[ end trace 410da044632bdcd2 ]---
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.250492] igb 0000:06:00.0 eno2: Reset adapter
Apr 13 22:28:32 vmhost-eq-1 kernel: [ 5083.302359] bond0: link status down for active interface eno2, disabling it in 200 ms
Apr 13 22:28:33 vmhost-eq-1 kernel: [ 5084.266354] bond0: link status down for active interface eno2, disabling it in 200 ms
Apr 13 22:28:33 vmhost-eq-1 kernel: [ 5084.274331] igb 0000:06:00.0 eno2: igb: eno2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Apr 13 22:28:33 vmhost-eq-1 kernel: [ 5084.274355] bond0: link status down for active interface eno2, disabling it in 200 ms
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.518186] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [mdX_raid5:4481]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519321] Modules linked in: vhost_net vhost tap tun loop devlink nf_tables nfnetlink dm_mirror dm_region_hash dm_log bridge 8021q garp stp mrp llc dm_raid bonding dm_snapshot dm_bufio intel_rapl sb_edac nls_ascii x86_pkg_temp_thermal intel_powerclamp nls_cp437 efi_pstore vfat fat coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif ttm drm_kms_helper cdc_ether usbnet intel_cstate mii joydev drm evdev intel_uncore tpm_tis ipmi_si tpm_tis_core pcc_cpufreq ipmi_devintf pcspkr intel_rapl_perf efivars tpm dm_mod ioatdma sg rng_core iTCO_wdt wmi ipmi_msghandler iTCO_vendor_support button efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb hid_generic usbhid hid sr_mod cdrom raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519365]  async_tx xor ses enclosure scsi_transport_sas sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod crc32c_intel ahci libahci libata aesni_intel ehci_pci ehci_hcd megaraid_sas aes_x86_64 igb crypto_simd cryptd usbcore glue_helper scsi_mod lpc_ich i2c_algo_bit mfd_core dca usb_common
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519384] CPU: 3 PID: 4481 Comm: mdX_raid5 Tainted: G        W         4.19.0-8-amd64 #1 Debian 4.19.98-1
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519385] Hardware name: IBM System x3650 M4 : -[7915E3G]-/00Y8494, BIOS -[VVE166CUS-3.00]- 06/12/2019
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519395] RIP: 0010:raid5d+0x222/0x5b0 [raid456]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519397] Code: 42 01 00 00 41 8b 87 e0 00 00 00 49 c7 87 d8 00 00 00 00 00 00 00 89 44 24 2c 48 8b 7c 24 40 c6 07 00 66 66 66 90 fb 66 66 90 <66> 66 90 4d 8b 66 28 48 8d 4c 24 50 45 31 c0 31 d2 4c 89 ff 49 83
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519398] RSP: 0018:ffff9e014f043de0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519400] RAX: 0000000000000000 RBX: 000000001f403c70 RCX: 0000000000000000
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519401] RDX: 0000000000000001 RSI: ffff88e3481f18b8 RDI: ffff88e3481f1b1c
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519402] RBP: ffff9e014f043eb0 R08: 0000000000000000 R09: ffff88e348441c00
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519403] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519404] R13: ffff88e348b59d00 R14: ffff88e37be48370 R15: ffff88e3481f1800
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519405] FS:  0000000000000000(0000) GS:ffff88e3878c0000(0000) knlGS:0000000000000000
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519407] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519408] CR2: 00007fe8d995cff0 CR3: 00000004d240a002 CR4: 00000000000626e0
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519409] Call Trace:
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519419]  ? schedule_timeout+0x26d/0x390
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519423]  ? prepare_to_wait_event+0xbb/0x140
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519431]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519436]  md_thread+0x94/0x150 [md_mod]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519439]  ? finish_wait+0x80/0x80
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519443]  kthread+0x112/0x130
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519445]  ? kthread_bind+0x30/0x30
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5105.519449]  ret_from_fork+0x35/0x40
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.034197] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 3-... } 5696 jiffies s: 1197 root: 0x1/.
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.035352] rcu: blocking rcu_node structures: l=1:0-11:0x8/.
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036422] Task dump for CPU 3:
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036424] mdX_raid5       R  running task        0  4481      2 0x80000808
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036427] Call Trace:
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036440]  ? release_stripe_list+0x57/0x70 [raid456]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036445]  ? raid5d+0x2b9/0x5b0 [raid456]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036451]  ? schedule_timeout+0x26d/0x390
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036454]  ? prepare_to_wait_event+0xbb/0x140
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036462]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036466]  ? md_thread+0x94/0x150 [md_mod]
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036468]  ? finish_wait+0x80/0x80
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036471]  ? kthread+0x112/0x130
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036473]  ? kthread_bind+0x30/0x30
Apr 13 22:28:54 vmhost-eq-1 kernel: [ 5106.036476]  ? ret_from_fork+0x35/0x40
Apr 13 22:29:07 vmhost-eq-1 systemd-udevd[546]: dm-646: Worker [22289] processing SEQNUM=8427 is taking a long time
Apr 13 22:29:07 vmhost-eq-1 systemd-udevd[546]: dm-642: Worker [22291] processing SEQNUM=8428 is taking a long time
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.518044] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [mdX_raid5:4481]
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519195] Modules linked in: vhost_net vhost tap tun loop devlink nf_tables nfnetlink dm_mirror dm_region_hash dm_log bridge 8021q garp stp mrp llc dm_raid bonding dm_snapshot dm_bufio intel_rapl sb_edac nls_ascii x86_pkg_temp_thermal intel_powerclamp nls_cp437 efi_pstore vfat fat coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif ttm drm_kms_helper cdc_ether usbnet intel_cstate mii joydev drm evdev intel_uncore tpm_tis ipmi_si tpm_tis_core pcc_cpufreq ipmi_devintf pcspkr intel_rapl_perf efivars tpm dm_mod ioatdma sg rng_core iTCO_wdt wmi ipmi_msghandler iTCO_vendor_support button efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb hid_generic usbhid hid sr_mod cdrom raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519237]  async_tx xor ses enclosure scsi_transport_sas sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod crc32c_intel ahci libahci libata aesni_intel ehci_pci ehci_hcd megaraid_sas aes_x86_64 igb crypto_simd cryptd usbcore glue_helper scsi_mod lpc_ich i2c_algo_bit mfd_core dca usb_common
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519255] CPU: 3 PID: 4481 Comm: mdX_raid5 Tainted: G        W    L    4.19.0-8-amd64 #1 Debian 4.19.98-1
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519256] Hardware name: IBM System x3650 M4 : -[7915E3G]-/00Y8494, BIOS -[VVE166CUS-3.00]- 06/12/2019
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519266] RIP: 0010:raid5d+0x222/0x5b0 [raid456]
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519268] Code: 42 01 00 00 41 8b 87 e0 00 00 00 49 c7 87 d8 00 00 00 00 00 00 00 89 44 24 2c 48 8b 7c 24 40 c6 07 00 66 66 66 90 fb 66 66 90 <66> 66 90 4d 8b 66 28 48 8d 4c 24 50 45 31 c0 31 d2 4c 89 ff 49 83
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519269] RSP: 0018:ffff9e014f043de0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519271] RAX: 0000000000000000 RBX: 000000001f403c70 RCX: 0000000000000000
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519272] RDX: 0000000000000001 RSI: ffff88e3481f18b8 RDI: ffff88e3481f1b1c
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519273] RBP: ffff9e014f043eb0 R08: 0000000000000000 R09: ffff88e348441c00
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519274] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519275] R13: ffff88e348b59d00 R14: ffff88e37be48370 R15: ffff88e3481f1800
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519277] FS:  0000000000000000(0000) GS:ffff88e3878c0000(0000) knlGS:0000000000000000
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519279] CR2: 00007fe8d995cff0 CR3: 00000004d240a002 CR4: 00000000000626e0
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519280] Call Trace:
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519290]  ? schedule_timeout+0x26d/0x390
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519294]  ? prepare_to_wait_event+0xbb/0x140
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519301]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519306]  md_thread+0x94/0x150 [md_mod]
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519309]  ? finish_wait+0x80/0x80
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519312]  kthread+0x112/0x130
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519314]  ? kthread_bind+0x30/0x30
Apr 13 22:29:22 vmhost-eq-1 kernel: [ 5133.519318]  ret_from_fork+0x35/0x40
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.605998] rcu: INFO: rcu_sched self-detected stall on CPU
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.607147] rcu:         3-....: (21000 ticks this GP) idle=c2e/1/0x4000000000000002 softirq=178427/178473 fqs=8415 
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.608244] rcu:          (t=21003 jiffies g=381357 q=7130)
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609353] NMI backtrace for cpu 3
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609356] CPU: 3 PID: 4481 Comm: mdX_raid5 Tainted: G        W    L    4.19.0-8-amd64 #1 Debian 4.19.98-1
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609357] Hardware name: IBM System x3650 M4 : -[7915E3G]-/00Y8494, BIOS -[VVE166CUS-3.00]- 06/12/2019
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609358] Call Trace:
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609362]  <IRQ>
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609369]  dump_stack+0x66/0x90
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609373]  nmi_cpu_backtrace.cold.4+0x13/0x50
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609377]  ? lapic_can_unplug_cpu.cold.31+0x37/0x37
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609379]  nmi_trigger_cpumask_backtrace+0xf9/0xfb
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609383]  rcu_dump_cpu_stacks+0x9b/0xcb
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609385]  rcu_check_callbacks.cold.81+0x1db/0x335
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609388]  ? tick_sched_do_timer+0x60/0x60
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609392]  update_process_times+0x28/0x60
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609394]  tick_sched_handle+0x22/0x60
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609395]  tick_sched_timer+0x37/0x70
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609398]  __hrtimer_run_queues+0x100/0x280
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609401]  hrtimer_interrupt+0x100/0x220
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609405]  smp_apic_timer_interrupt+0x6a/0x140
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609407]  apic_timer_interrupt+0xf/0x20
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609408]  </IRQ>
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609417] RIP: 0010:handle_active_stripes.isra.73+0x17f/0x5b0 [raid456]
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609419] Code: 48 8b 10 48 39 d0 0f 85 08 02 00 00 48 83 c0 10 48 39 c8 75 eb 48 8d ab 1c 03 00 00 48 89 ef c6 07 00 66 66 66 90 fb 66 66 90 <66> 66 90 48 8b bb 30 05 00 00 48 85 ff 0f 84 c2 03 00 00 e8 89 40
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609420] RSP: 0018:ffff9e014f043d38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609422] RAX: ffff88e3481f1d20 RBX: ffff88e3481f1800 RCX: ffff88e3481f1d20
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609423] RDX: ffff88e3481f1d10 RSI: 00000000ffffffff RDI: ffff88e3481f1b1c
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609424] RBP: ffff88e3481f1b1c R08: 0000000000000000 R09: 0000000000000000
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609425] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609426] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609432]  ? handle_active_stripes.isra.73+0x49f/0x5b0 [raid456]
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609437]  ? raid5_release_stripe+0x10f/0x120 [raid456]
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609441]  raid5d+0x392/0x5b0 [raid456]
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609444]  ? schedule_timeout+0x26d/0x390
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609447]  ? prepare_to_wait_event+0xbb/0x140
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609455]  ? md_rdev_init+0xb0/0xb0 [md_mod]
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609459]  md_thread+0x94/0x150 [md_mod]
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609462]  ? finish_wait+0x80/0x80
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609466]  kthread+0x112/0x130
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609468]  ? kthread_bind+0x30/0x30
Apr 13 22:29:31 vmhost-eq-1 kernel: [ 5142.609471]  ret_from_fork+0x35/0x40
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.617963] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [migration/10:63]
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619192] Modules linked in: (same-as-before-omitted-here-to-save-space-because-the-question-is-too-big-for-serverfault)
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619234]  async_tx xor ses enclosure scsi_transport_sas sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod crc32c_intel ahci libahci libata aesni_intel ehci_pci ehci_hcd megaraid_sas aes_x86_64 igb crypto_simd cryptd usbcore glue_helper scsi_mod lpc_ich i2c_algo_bit mfd_core dca usb_common
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619253] CPU: 10 PID: 63 Comm: migration/10 Tainted: G        W    L    4.19.0-8-amd64 #1 Debian 4.19.98-1
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619254] Hardware name: IBM System x3650 M4 : -[7915E3G]-/00Y8494, BIOS -[VVE166CUS-3.00]- 06/12/2019
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619263] RIP: 0010:multi_cpu_stop+0x4b/0xf0
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619265] Code: 90 66 90 48 89 04 24 48 8b 47 18 48 85 c0 0f 84 95 00 00 00 89 db 48 0f a3 18 41 0f 92 c7 4c 8d 65 24 45 31 f6 45 31 ed f3 90 <8b> 5d 20 44 39 eb 74 3f 83 fb 02 74 4c 83 fb 03 75 15 45 84 ff 74
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619266] RSP: 0018:ffff9e0146523e78 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619268] RAX: ffffffff8b82dd80 RBX: 0000000000000001 RCX: dead000000000200
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619269] RDX: ffff88f18791d350 RSI: ffff9e014869f690 RDI: ffff9e014869f6e0
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619270] RBP: ffff9e014869f6e0 R08: 0000000000000000 R09: 0000000000000001
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619271] R10: 0000000000000001 R11: 00000000000040f8 R12: ffff9e014869f704
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619272] R13: 0000000000000001 R14: 0000000000000000 R15: ffff88f18791d300
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619274] FS:  0000000000000000(0000) GS:ffff88f187900000(0000) knlGS:0000000000000000
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619275] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619276] CR2: 00007fa4ece2d6e0 CR3: 00000004d240a006 CR4: 00000000000626e0
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619277] Call Trace:
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619286]  ? cpu_stopper_thread+0x100/0x100
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619287]  cpu_stopper_thread+0x47/0x100
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619291]  ? sort_range+0x20/0x20
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619293]  smpboot_thread_fn+0xc5/0x160
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619297]  kthread+0x112/0x130
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619299]  ? kthread_bind+0x30/0x30
Apr 13 22:29:38 vmhost-eq-1 kernel: [ 5149.619305]  ret_from_fork+0x35/0x40

After the hard reboot, the system "found" the pvmove and completed it without any problems. Further pvmoves like the one shown above also worked fine.

Any ideas what could be done to keep it from hanging? :-)

Should I report this somewhere? As a bug against the Debian lvm package? As a bug against the Debian kernel package? To the lvm developers themselves? (where?)

Regards

Andreas Trottmann