My lab's server, running Debian Wheezy 7.8 (stable), keeps rebooting without any notice, typically after a few hours of trouble-free uptime. The server is set up for heavy numerical computation, including parallel jobs. I printed /var/log/messages and the output of last reboot, but I had a hard time understanding the log messages. I tried looking in /var/log/messages for entries just before each reboot time, but it seems that /var/log/messages only shows messages logged after the reboot.
I searched and found that other people have run into the same problem, but the cause seems to differ from case to case, and /var/log/messages seems to be the key to diagnosing it. What does my /var/log/messages actually say about these unwanted reboots? And how can a beginner start learning to read this log? I mean, are there any important keywords to search for, or something like that?
Thanks for any help you can provide.
last reboot
reboot system boot 3.2.0-4-amd64 Wed May 20 03:29 - 12:43 (09:14)
reboot system boot 3.2.0-4-amd64 Tue May 19 16:01 - 12:43 (20:42)
/var/log/messages
May 18 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 19 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 19 16:01:19 labserver kernel: imklog 5.8.11, log source = /proc/kmsg started.
May 19 16:01:19 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2401" x-info="http://www.rsyslog.com"] start
May 19 16:01:19 labserver kernel: [ 0.000000] Initializing cgroup subsys cpuset
May 19 16:01:19 labserver kernel: [ 0.000000] Initializing cgroup subsys cpu
May 19 16:01:19 labserver kernel: [ 0.000000] Linux version 3.2.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.65-1+deb7u2
May 19 16:01:19 labserver kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-provided physical RAM map:
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000000100000 - 000000007df71000 (usable)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007df71000 - 000000007e0f1000 (reserved)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007e0f1000 - 000000007e2ec000 (ACPI NVS)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007e2ec000 - 000000007f367000 (reserved)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 000000007f367000 - 000000007f800000 (ACPI NVS)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000080000000 - 0000000090000000 (reserved)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 00000000fed1c000 - 00000000fed40000 (reserved)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
May 19 16:01:19 labserver kernel: [ 0.000000] BIOS-e820: 0000000100000000 - 0000000880000000 (usable)
May 19 16:01:19 labserver kernel: [ 0.000000] NX (Execute Disable) protection: active
May 19 16:01:19 labserver kernel: [ 0.000000] SMBIOS 2.7 present.
May 19 16:01:19 labserver kernel: [ 0.000000] No AGP bridge found
May 19 16:01:19 labserver kernel: [ 0.000000] last_pfn = 0x880000 max_arch_pfn = 0x400000000
May 19 16:01:19 labserver kernel: [ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
May 19 16:01:19 labserver kernel: [ 0.000000] last_pfn = 0x7df71 max_arch_pfn = 0x400000000
May 19 16:01:19 labserver kernel: [ 0.000000] found SMP MP-table at [ffff8800000fd900] fd900
May 19 16:01:19 labserver kernel: [ 0.000000] Using GB pages for direct mapping
May 19 16:01:19 labserver kernel: [ 0.000000] init_memory_mapping: 0000000000000000-000000007df71000
May 19 16:01:19 labserver kernel: [ 0.000000] init_memory_mapping: 0000000100000000-0000000880000000
May 19 16:01:19 labserver kernel: [ 0.000000] RAMDISK: 36bea000 - 375ed000
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: RSDP 00000000000f04a0 00024 (v02 ALASKA)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: XSDT 000000007e204088 0008C (v01 ALASKA A M I 01072009 AMI 00010013)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: FACP 000000007e211040 0010C (v05 ALASKA A M I 01072009 AMI 00010013)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI Warning: FADT (revision 5) is longer than ACPI 2.0 version, truncating length 268 to 244 (20110623/tbfadt-288)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: DSDT 000000007e2041a8 0CE96 (v02 ALASKA A M I 00000015 INTL 20051117)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: FACS 000000007e2e3080 00040
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: APIC 000000007e211150 00100 (v03 ALASKA A M I 01072009 AMI 00010013)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: FPDT 000000007e211250 00044 (v01 ALASKA A M I 01072009 AMI 00010013)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: MCFG 000000007e211298 0003C (v01 ALASKA OEMMCFG. 01072009 MSFT 00000097)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: HPET 000000007e2112d8 00038 (v01 ALASKA A M I 01072009 AMI. 00000005)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: PRAD 000000007e211310 000BE (v02 PRADID PRADTID 00000001 MSFT 03000001)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: SPMI 000000007e2113d0 00040 (v05 A M I OEMSPMI 00000000 AMI. 00000000)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: SSDT 000000007e211410 D0CB0 (v02 INTEL CpuPm 00004000 INTL 20051117)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: EINJ 000000007e2e20c0 00130 (v01 AMI AMI EINJ 00000000 00000000)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: ERST 000000007e2e21f0 00230 (v01 AMIER AMI ERST 00000000 00000000)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: HEST 000000007e2e2420 000A8 (v01 AMI AMI HEST 00000000 00000000)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: BERT 000000007e2e24c8 00030 (v01 AMI AMI BERT 00000000 00000000)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: DMAR 000000007e2e24f8 000C4 (v01 A M I OEMDMAR 00000001 INTL 00000001)
May 19 16:01:19 labserver kernel: [ 0.000000] No NUMA configuration found
May 19 16:01:19 labserver kernel: [ 0.000000] Faking a node at 0000000000000000-0000000880000000
May 19 16:01:19 labserver kernel: [ 0.000000] Initmem setup node 0 0000000000000000-0000000880000000
May 19 16:01:19 labserver kernel: [ 0.000000] NODE_DATA [000000087fffb000 - 000000087fffffff]
May 19 16:01:19 labserver kernel: [ 0.000000] Zone PFN ranges:
May 19 16:01:19 labserver kernel: [ 0.000000] DMA 0x00000010 -> 0x00001000
May 19 16:01:19 labserver kernel: [ 0.000000] DMA32 0x00001000 -> 0x00100000
May 19 16:01:19 labserver kernel: [ 0.000000] Normal 0x00100000 -> 0x00880000
May 19 16:01:19 labserver kernel: [ 0.000000] Movable zone start PFN for each node
May 19 16:01:19 labserver kernel: [ 0.000000] early_node_map[3] active PFN ranges
May 19 16:01:19 labserver kernel: [ 0.000000] 0: 0x00000010 -> 0x0000009a
May 19 16:01:19 labserver kernel: [ 0.000000] 0: 0x00000100 -> 0x0007df71
May 19 16:01:19 labserver kernel: [ 0.000000] 0: 0x00100000 -> 0x00880000
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: PM-Timer IO Port: 0x408
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
May 19 16:01:19 labserver kernel: [ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24])
May 19 16:01:19 labserver kernel: [ 0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
May 19 16:01:19 labserver kernel: [ 0.000000] Using ACPI (MADT) for SMP configuration information
May 19 16:01:19 labserver kernel: [ 0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
May 19 16:01:19 labserver kernel: [ 0.000000] SMP: Allowing 12 CPUs, 0 hotplug CPUs
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000000009a000 - 000000000009b000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000000009b000 - 00000000000a0000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007df71000 - 000000007e0f1000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007e0f1000 - 000000007e2ec000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007e2ec000 - 000000007f367000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007f367000 - 000000007f800000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 000000007f800000 - 0000000080000000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 0000000080000000 - 0000000090000000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 0000000090000000 - 00000000fed1c000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000fed1c000 - 00000000fed40000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000fed40000 - 00000000ff000000
May 19 16:01:19 labserver kernel: [ 0.000000] PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
May 19 16:01:19 labserver kernel: [ 0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
May 19 16:01:19 labserver kernel: [ 0.000000] Booting paravirtualized kernel on bare hardware
May 19 16:01:19 labserver kernel: [ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:12 nr_node_ids:1
May 19 16:01:19 labserver kernel: [ 0.000000] PERCPU: Embedded 27 pages/cpu @ffff88087fc00000 s78848 r8192 d23552 u131072
May 19 16:01:19 labserver kernel: [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 8258294
May 19 16:01:19 labserver kernel: [ 0.000000] Policy zone: Normal
May 19 16:01:19 labserver kernel: [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text
May 19 16:01:19 labserver kernel: [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
May 19 16:01:19 labserver kernel: [ 0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
May 19 16:01:19 labserver kernel: [ 0.000000] Checking aperture...
May 19 16:01:19 labserver kernel: [ 0.000000] No AGP bridge found
May 19 16:01:19 labserver kernel: [ 0.000000] Memory: 32975732k/35651584k available (3434k kernel code, 2130964k absent, 544888k reserved, 3305k data, 576k init)
May 19 16:01:19 labserver kernel: [ 0.000000] Hierarchical RCU implementation.
May 19 16:01:19 labserver kernel: [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
May 19 16:01:19 labserver kernel: [ 0.000000] NR_IRQS:33024 nr_irqs:1184 16
May 19 16:01:19 labserver kernel: [ 0.000000] Extended CMOS year: 2000
May 19 16:01:19 labserver kernel: [ 0.000000] Console: colour VGA+ 80x25
May 19 16:01:19 labserver kernel: [ 0.000000] console [tty0] enabled
May 19 16:01:19 labserver kernel: [ 0.000000] Fast TSC calibration using PIT
May 19 16:01:19 labserver kernel: [ 0.004000] Detected 2100.074 MHz processor.
May 19 16:01:19 labserver kernel: [ 0.000003] Calibrating delay loop (skipped), value calculated using timer frequency.. 4200.14 BogoMIPS (lpj=8400296)
May 19 16:01:19 labserver kernel: [ 0.000144] pid_max: default: 32768 minimum: 301
May 19 16:01:19 labserver kernel: [ 0.000253] Security Framework initialized
May 19 16:01:19 labserver kernel: [ 0.000324] AppArmor: AppArmor disabled by boot time parameter
May 19 16:01:19 labserver kernel: [ 0.002355] Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes)
May 19 16:01:19 labserver kernel: [ 0.011585] Inode-cache hash table entries: 2097152 (order: 12, 16777216 bytes)
May 19 16:01:19 labserver kernel: [ 0.015724] Mount-cache hash table entries: 256
May 19 16:01:19 labserver kernel: [ 0.015915] Initializing cgroup subsys cpuacct
May 19 16:01:19 labserver kernel: [ 0.015986] Initializing cgroup subsys memory
May 19 16:01:19 labserver kernel: [ 0.016063] Initializing cgroup subsys devices
May 19 16:01:19 labserver kernel: [ 0.016133] Initializing cgroup subsys freezer
May 19 16:01:19 labserver kernel: [ 0.016201] Initializing cgroup subsys net_cls
May 19 16:01:19 labserver kernel: [ 0.016270] Initializing cgroup subsys blkio
May 19 16:01:19 labserver kernel: [ 0.016344] Initializing cgroup subsys perf_event
May 19 16:01:19 labserver kernel: [ 0.016441] CPU: Physical Processor ID: 0
May 19 16:01:19 labserver kernel: [ 0.016509] CPU: Processor Core ID: 0
May 19 16:01:19 labserver kernel: [ 0.017564] mce: CPU supports 23 MCE banks
May 19 16:01:19 labserver kernel: [ 0.017670] CPU0: Thermal monitoring enabled (TM1)
May 19 16:01:19 labserver kernel: [ 0.017768] using mwait in idle threads.
May 19 16:01:19 labserver kernel: [ 0.018315] ACPI: Core revision 20110623
May 19 16:01:19 labserver kernel: [ 0.049889] DMAR: Host address width 46
May 19 16:01:19 labserver kernel: [ 0.049958] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1
May 19 16:01:19 labserver kernel: [ 0.050034] IOMMU 0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
May 19 16:01:19 labserver kernel: [ 0.050122] DMAR: RMRR base: 0x0000007f239000 end: 0x0000007f247fff
May 19 16:01:19 labserver kernel: [ 0.050195] DMAR: ATSR flags: 0x0
May 19 16:01:19 labserver kernel: [ 0.050261] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x0
May 19 16:01:19 labserver kernel: [ 0.050427] IOAPIC id 0 under DRHD base 0xfbffc000 IOMMU 0
May 19 16:01:19 labserver kernel: [ 0.050497] IOAPIC id 2 under DRHD base 0xfbffc000 IOMMU 0
May 19 16:01:19 labserver kernel: [ 0.050568] HPET id 0 under DRHD base 0xfbffc000
May 19 16:01:19 labserver kernel: [ 0.050741] Enabled IRQ remapping in x2apic mode
May 19 16:01:19 labserver kernel: [ 0.050810] Enabling x2apic
May 19 16:01:19 labserver kernel: [ 0.050875] Enabled x2apic
May 19 16:01:19 labserver kernel: [ 0.050943] Switched APIC routing to cluster x2apic.
May 19 16:01:19 labserver kernel: [ 0.051552] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
May 19 16:01:19 labserver kernel: [ 0.091256] CPU0: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping 04
May 19 16:01:19 labserver kernel: [ 0.195570] Performance Events: PEBS fmt1+, generic architected perfmon, Intel PMU driver.
May 19 16:01:19 labserver kernel: [ 0.195802] ... version: 3
May 19 16:01:19 labserver kernel: [ 0.195869] ... bit width: 48
May 19 16:01:19 labserver kernel: [ 0.195936] ... generic registers: 4
May 19 16:01:19 labserver kernel: [ 0.196003] ... value mask: 0000ffffffffffff
May 19 16:01:19 labserver kernel: [ 0.196073] ... max period: 000000007fffffff
May 19 16:01:19 labserver kernel: [ 0.196143] ... fixed-purpose events: 3
May 19 16:01:19 labserver kernel: [ 0.196210] ... event mask: 000000070000000f
May 19 16:01:19 labserver kernel: [ 0.196468] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.196637] Booting Node 0, Processors #1
May 19 16:01:19 labserver kernel: [ 0.312587] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.312765] #2
May 19 16:01:19 labserver kernel: [ 0.424400] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.424578] #3
May 19 16:01:19 labserver kernel: [ 0.536316] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.536489] #4
May 19 16:01:19 labserver kernel: [ 0.648124] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.648303] #5
May 19 16:01:19 labserver kernel: [ 0.759941] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.760115] #6
May 19 16:01:19 labserver kernel: [ 0.871864] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.872050] #7
May 19 16:01:19 labserver kernel: [ 0.983690] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 0.983866] #8
May 19 16:01:19 labserver kernel: [ 1.095600] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 1.095774] #9
May 19 16:01:19 labserver kernel: [ 1.207414] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 1.207589] #10
May 19 16:01:19 labserver kernel: [ 1.319223] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 1.319400] #11 Ok.
May 19 16:01:19 labserver kernel: [ 1.431095] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [ 1.431192] Brought up 12 CPUs
May 19 16:01:19 labserver kernel: [ 1.431260] Total of 12 processors activated (50398.84 BogoMIPS).
May 19 16:01:19 labserver kernel: [ 1.450786] devtmpfs: initialized
May 19 16:01:19 labserver kernel: [ 1.455360] PM: Registering ACPI NVS region at 7e0f1000 (2076672 bytes)
May 19 16:01:19 labserver kernel: [ 1.455494] PM: Registering ACPI NVS region at 7f367000 (4820992 bytes)
May 19 16:01:19 labserver kernel: [ 1.455843] print_constraints: dummy:
May 19 16:01:19 labserver kernel: [ 1.455977] NET: Registered protocol family 16
May 19 16:01:19 labserver kernel: [ 1.456140] ACPI: bus type pci registered
May 19 16:01:19 labserver kernel: [ 1.456268] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
May 19 16:01:19 labserver kernel: [ 1.456361] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
May 19 16:01:19 labserver kernel: [ 1.466673] PCI: Using configuration type 1 for base access
May 19 16:01:19 labserver kernel: [ 1.468173] bio: create slab <bio-0> at 0
May 19 16:01:19 labserver kernel: [ 1.468353] ACPI: Added _OSI(Module Device)
May 19 16:01:19 labserver kernel: [ 1.468422] ACPI: Added _OSI(Processor Device)
May 19 16:01:19 labserver kernel: [ 1.468491] ACPI: Added _OSI(3.0 _SCP Extensions)
May 19 16:01:19 labserver kernel: [ 1.468560] ACPI: Added _OSI(Processor Aggregator Device)
May 19 16:01:19 labserver kernel: [ 1.484562] ACPI: Executed 1 blocks of module-level executable AML code
May 19 16:01:19 labserver kernel: [ 1.727818] ACPI: Interpreter enabled
May 19 16:01:19 labserver kernel: [ 1.727891] ACPI: (supports S0 S1 S4 S5)
May 19 16:01:19 labserver kernel: [ 1.728159] ACPI: Using IOAPIC for interrupt routing
May 19 16:01:19 labserver kernel: [ 1.736531] ACPI: No dock devices found.
May 19 16:01:19 labserver kernel: [ 1.736630] HEST: Table parsing has been initialized.
May 19 16:01:19 labserver kernel: [ 1.736704] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
May 19 16:01:19 labserver kernel: [ 1.737041] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
May 19 16:01:19 labserver kernel: [ 1.737361] pci_root PNP0A08:00: host bridge window [io 0x0000-0x03af]
May 19 16:01:19 labserver kernel: [ 1.737435] pci_root PNP0A08:00: host bridge window [io 0x03e0-0x0cf7]
May 19 16:01:19 labserver kernel: [ 1.737508] pci_root PNP0A08:00: host bridge window [io 0x03b0-0x03df]
May 19 16:01:19 labserver kernel: [ 1.737586] pci_root PNP0A08:00: host bridge window [io 0x0d00-0xffff]
May 19 16:01:19 labserver kernel: [ 1.737659] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
May 19 16:01:19 labserver kernel: [ 1.737747] pci_root PNP0A08:00: host bridge window [mem 0x000c0000-0x000dffff]
May 19 16:01:19 labserver kernel: [ 1.737834] pci_root PNP0A08:00: host bridge window [mem 0xfed0e000-0xfed0ffff]
May 19 16:01:19 labserver kernel: [ 1.737922] pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xfbffffff]
May 19 16:01:19 labserver kernel: [ 1.740791] pci 0000:00:01.0: PCI bridge to [bus 01-01]
May 19 16:01:19 labserver kernel: [ 1.745575] pci 0000:00:01.1: PCI bridge to [bus 02-03]
May 19 16:01:19 labserver kernel: [ 1.745700] pci 0000:00:02.0: PCI bridge to [bus 04-04]
May 19 16:01:19 labserver kernel: [ 1.745816] pci 0000:00:03.0: PCI bridge to [bus 05-05]
May 19 16:01:19 labserver kernel: [ 1.745933] pci 0000:00:03.2: PCI bridge to [bus 06-06]
May 19 16:01:19 labserver kernel: [ 1.746285] pci 0000:00:11.0: PCI bridge to [bus 07-07]
May 19 16:01:19 labserver kernel: [ 1.746541] pci 0000:00:1e.0: PCI bridge to [bus 08-08] (subtractive decode)
May 19 16:01:19 labserver kernel: [ 1.747170] pci0000:00: Requesting ACPI _OSC control (0x1d)
May 19 16:01:19 labserver kernel: [ 1.747465] pci0000:00: ACPI _OSC control (0x15) granted
May 19 16:01:19 labserver kernel: [ 1.756901] ACPI: PCI Root Bridge [UNC0] (domain 0000 [bus ff])
May 19 16:01:19 labserver kernel: [ 1.758443] pci0000:ff: Requesting ACPI _OSC control (0x1d)
May 19 16:01:19 labserver kernel: [ 1.758528] pci0000:ff: ACPI _OSC control (0x1d) granted
May 19 16:01:19 labserver kernel: [ 1.759439] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
May 19 16:01:19 labserver kernel: [ 1.760105] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
May 19 16:01:19 labserver kernel: [ 1.760768] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 10 11 12 14 15)
May 19 16:01:19 labserver kernel: [ 1.761383] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 10 *11 12 14 15)
May 19 16:01:19 labserver kernel: [ 1.762006] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [ 1.762729] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [ 1.763450] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [ 1.764170] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *7 10 11 12 14 15)
You need to provide more information, especially the log entries from immediately before the system rebooted. However, as I understand it, they may not tell you anything more. Check the other logs as well, such as syslog.
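To pull out the entries written just before each reboot, a small script can help. This is only a sketch: it assumes a fresh boot shows up in the log as the rsyslog line seen in the question ("imklog ..., log source = /proc/kmsg started."), and the function name and the `context` parameter are made up for illustration.

```python
from collections import deque

# Assumption: rsyslog marks a fresh boot with a line like the one in the
# question: "kernel: imklog 5.8.11, log source = /proc/kmsg started."
BOOT_MARKER = "log source = /proc/kmsg started"


def lines_before_boots(lines, context=5, marker=BOOT_MARKER):
    """Return one list per boot marker with the `context` lines preceding it."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            # Whatever is in the window was logged right before this boot.
            result.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return result
```

It accepts any iterable of lines, so an open file works, for example `lines_before_boots(open("/var/log/syslog", errors="replace"), context=20)`. If the lines it returns are all routine entries from hours earlier, that in itself suggests the machine went down without getting a chance to log anything.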
In my experience, sudden restarts with no indication of what actually went wrong are most often hardware-related. Otherwise, the kernel would in most cases have had a chance to write something to the logs to give a hint.
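As for keywords: if the kernel did manage to write something, a scan like the following might surface it. The pattern list is my own starting set of strings that often accompany crashes (kernel panics, machine check errors, out-of-memory kills, thermal throttling, disk I/O errors). It is an assumption, not an exhaustive or authoritative list, so adjust it to what your logs contain.

```python
import re

# Assumption: a starting list of strings that commonly accompany crashes.
# It is illustrative only and does not come from the original post.
SUSPICIOUS = re.compile(
    r"panic|oops|machine check|mce:.*error|hardware error|"
    r"out of memory|temperature above threshold|i/o error|segfault",
    re.IGNORECASE,
)


def suspicious_lines(lines, pattern=SUSPICIOUS):
    """Yield (line_number, text) for every line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")
```

Run it over /var/log/messages, /var/log/syslog and /var/log/kern.log, including the rotated copies. Harmless boot lines such as "mce: CPU supports 23 MCE banks" are deliberately not matched.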
Some frequent causes of sudden restarts:
Overheating: Probably the main cause. Get an idea of the temperature and try to log it. Does the server have a display that can show the temperature? Is the room properly cooled? Perhaps replace the thermal paste on the heatsinks covering the CPUs.
Bad hardware or drivers: Get a list with "lspci". For example, a bad DIMM can cause the system to freeze and/or reboot suddenly (reseat the DIMMs, CPUs and cards). I remember a server that occasionally rebooted because of a problem with an Intel network card. Sometimes a bad disk can also cause such problems, although it usually just hangs rather than reboots.
Bad UPS: I remember a battery-backed UPS that was slowly failing, and one indication of this was a regular weekly power cycle of the servers connected to it. Perhaps you just have a misconfigured power cycle schedule.
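For the temperature logging suggested under overheating, something along these lines could be run from cron once a minute. It is a rough sketch that assumes the kernel exposes sensors as /sys/class/thermal/thermal_zone*/temp in millidegrees Celsius. Not every server does, in which case lm-sensors or the board's IPMI interface would be the place to look. All function names here are invented for the example.

```python
import glob
import os
import time


def read_temperatures(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for each thermal zone under `base`.

    Assumption: each thermal_zone*/temp file holds millidegrees Celsius.
    """
    readings = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as fh:
                readings[zone] = int(fh.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or malformed sensor: skip it
    return readings


def format_entry(readings, now=None):
    """Build one timestamped log line from a readings dict."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    values = " ".join(
        f"{zone}={temp:.1f}C" for zone, temp in sorted(readings.items())
    )
    return f"{stamp} {values}"


def append_entry(logfile, base="/sys/class/thermal"):
    """Append one reading and force it to disk so it survives a sudden reset."""
    with open(logfile, "a") as fh:
        fh.write(format_entry(read_temperatures(base)) + "\n")
        fh.flush()
        os.fsync(fh.fileno())
```

The fsync is there on purpose: if the box resets without warning, the last line in the file is then the last reading actually taken, and you can compare it against the reboot times from last reboot.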