I set up a vanilla cluster using fence_gce. stonith_rhnfs01 runs on node 1 and stonith_rhnfs02 on node 2. Then I take node2 down; when it rejoins, stonith_rhnfs01 is still on node 1, but stonith_rhnfs02 fails to start on node 2 and ends up running on node 1, with no specific error in the logs. Cleaning up the stonith_rhnfs02 resource brings it back to node 2.
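The cleanup is just the standard pcs command (run from either node):

[root@rhnfs01 ~]# pcs resource cleanup stonith_rhnfs02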
I also increased the stonith monitor interval to 120 seconds, but still no luck.
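The interval was changed with something along these lines (exact syntax may differ slightly between pcs versions; resource names and value match the config below):

[root@rhnfs01 ~]# pcs resource update stonith_rhnfs01 op monitor interval=120s
[root@rhnfs01 ~]# pcs resource update stonith_rhnfs02 op monitor interval=120s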
Output is included below for reference.
[root@rhnfs01 ~]# pcs status
Cluster name: etutorialguru_cluster
Stack: corosync
Current DC: rhnfs01 (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 2 11:50:41 2020
Last change: Tue Jun 2 08:20:27 2020 by hacluster via crmd on rhnfs01
2 nodes configured
2 resources configured
Online: [ rhnfs01 rhnfs02 ]
Full list of resources:
stonith_rhnfs01 (stonith:fence_gce): Started rhnfs01
stonith_rhnfs02 (stonith:fence_gce): Started rhnfs01
Failed Actions:
* stonith_rhnfs02_start_0 on rhnfs02 'unknown error' (1): call=10, status=Timed Out, exitreason='',
last-rc-change='Tue Jun 2 11:34:00 2020', queued=0ms, exec=20013ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Cluster configuration:
[root@rhnfs02 ~]# pcs config
Cluster Name: etutorialguru_cluster
Corosync Nodes:
rhnfs01 rhnfs02
Pacemaker Nodes:
rhnfs01 rhnfs02
Resources:
Stonith Devices:
Resource: stonith_rhnfs01 (class=stonith type=fence_gce)
Attributes: pcmk_host_map=rhnfs01:rhnfs01 pcmk_reboot_retries=4 pcmk_reboot_timeout=480s power_timeout=240 zone=us-central1-a project=mytower
Operations: monitor interval=120s (stonith_rhnfs01-monitor-interval-120s)
Resource: stonith_rhnfs02 (class=stonith type=fence_gce)
Attributes: pcmk_host_map=rhnfs02:rhnfs02 pcmk_reboot_retries=4 pcmk_reboot_timeout=480s power_timeout=240 zone=us-central1-b project=mytower
Operations: monitor interval=120s (stonith_rhnfs02-monitor-interval-120s)
Fencing Levels:
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: etutorialguru_cluster
dc-version: 1.1.19-8.el7_6.5-c3c624ea3d
have-watchdog: false
last-lrm-refresh: 1591086027
maintenance-mode: false
no-quorum-policy: ignore
stonith-enabled: true
Quorum:
Options:
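For completeness, the two stonith devices above would have been created with commands roughly like the following (a sketch reconstructed from the pcs config output; the exact invocations may have differed):

[root@rhnfs01 ~]# pcs stonith create stonith_rhnfs01 fence_gce \
    pcmk_host_map="rhnfs01:rhnfs01" pcmk_reboot_retries=4 pcmk_reboot_timeout=480s \
    power_timeout=240 zone=us-central1-a project=mytower \
    op monitor interval=120s
[root@rhnfs01 ~]# pcs stonith create stonith_rhnfs02 fence_gce \
    pcmk_host_map="rhnfs02:rhnfs02" pcmk_reboot_retries=4 pcmk_reboot_timeout=480s \
    power_timeout=240 zone=us-central1-b project=mytower \
    op monitor interval=120s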
Log output from /var/log/messages:
[root@rhnfs01 ~]# tail -f /var/log/messages
Jun 2 11:33:52 rhnfs01 corosync[1046]: [TOTEM ] A new membership (192.168.0.68:96) was formed. Members joined: 2
Jun 2 11:33:52 rhnfs01 corosync[1046]: [QUORUM] Members[2]: 1 2
Jun 2 11:33:52 rhnfs01 corosync[1046]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 2 11:33:52 rhnfs01 crmd[1162]: notice: Node rhnfs02 state is now member
Jun 2 11:33:52 rhnfs01 pacemakerd[1089]: notice: Node rhnfs02 state is now member
Jun 2 11:33:54 rhnfs01 attrd[1159]: notice: Node rhnfs02 state is now member
Jun 2 11:33:54 rhnfs01 stonith-ng[1156]: notice: Node rhnfs02 state is now member
Jun 2 11:33:55 rhnfs01 cib[1155]: notice: Node rhnfs02 state is now member
Jun 2 11:33:55 rhnfs01 crmd[1162]: notice: State transition S_IDLE -> S_INTEGRATION
Jun 2 11:33:59 rhnfs01 pengine[1161]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:33:59 rhnfs01 pengine[1161]: notice: * Move stonith_rhnfs02 ( rhnfs01 -> rhnfs02 )
Jun 2 11:33:59 rhnfs01 pengine[1161]: notice: Calculated transition 13, saving inputs in /var/lib/pacemaker/pengine/pe-input-30.bz2
Jun 2 11:33:59 rhnfs01 crmd[1162]: notice: Initiating monitor operation stonith_rhnfs01_monitor_0 on rhnfs02
Jun 2 11:33:59 rhnfs01 crmd[1162]: notice: Initiating stop operation stonith_rhnfs02_stop_0 locally on rhnfs01
Jun 2 11:33:59 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:33:59 rhnfs01 crmd[1162]: notice: Result of stop operation for stonith_rhnfs02 on rhnfs01: 0 (ok)
Jun 2 11:33:59 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:00 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:00 rhnfs01 crmd[1162]: notice: Initiating monitor operation stonith_rhnfs02_monitor_0 on rhnfs02
Jun 2 11:34:00 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:00 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:00 rhnfs01 crmd[1162]: notice: Initiating start operation stonith_rhnfs02_start_0 on rhnfs02
Jun 2 11:34:00 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:00 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:20 rhnfs01 crmd[1162]: warning: Action 8 (stonith_rhnfs02_start_0) on rhnfs02 failed (target: 0 vs. rc: 1): Error
Jun 2 11:34:20 rhnfs01 crmd[1162]: notice: Transition aborted by operation stonith_rhnfs02_start_0 'modify' on rhnfs02: Event failed
Jun 2 11:34:20 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:20 rhnfs01 crmd[1162]: notice: Transition 13 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-30.bz2): Complete
Jun 2 11:34:20 rhnfs01 pengine[1161]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun 2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun 2 11:34:20 rhnfs01 pengine[1161]: notice: * Recover stonith_rhnfs02 ( rhnfs02 )
Jun 2 11:34:20 rhnfs01 pengine[1161]: notice: Calculated transition 14, saving inputs in /var/lib/pacemaker/pengine/pe-input-31.bz2
Jun 2 11:34:20 rhnfs01 crmd[1162]: notice: Transition aborted by transient_attributes.2 'create': Transient attribute change
Jun 2 11:34:20 rhnfs01 crmd[1162]: notice: Transition 14 (Complete=0, Pending=0, Fired=0, Skipped=1, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-input-31.bz2): Stopped
Jun 2 11:34:20 rhnfs01 pengine[1161]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun 2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun 2 11:34:20 rhnfs01 pengine[1161]: warning: Forcing stonith_rhnfs02 away from rhnfs02 after 1000000 failures (max=1000000)
Jun 2 11:34:20 rhnfs01 pengine[1161]: notice: * Recover stonith_rhnfs02 ( rhnfs02 -> rhnfs01 )
Jun 2 11:34:20 rhnfs01 pengine[1161]: notice: Calculated transition 15, saving inputs in /var/lib/pacemaker/pengine/pe-input-32.bz2
Jun 2 11:34:20 rhnfs01 crmd[1162]: notice: Initiating stop operation stonith_rhnfs02_stop_0 on rhnfs02
Jun 2 11:34:20 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:20 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:20 rhnfs01 crmd[1162]: notice: Initiating start operation stonith_rhnfs02_start_0 locally on rhnfs01
Jun 2 11:34:20 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:21 rhnfs01 crmd[1162]: notice: Result of start operation for stonith_rhnfs02 on rhnfs01: 0 (ok)
Jun 2 11:34:21 rhnfs01 crmd[1162]: notice: Initiating monitor operation stonith_rhnfs02_monitor_120000 locally on rhnfs01
Jun 2 11:34:21 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:21 rhnfs01 stonith-ng[1156]: notice: On loss of CCM Quorum: Ignore
Jun 2 11:34:22 rhnfs01 crmd[1162]: notice: Transition 15 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-32.bz2): Complete
Jun 2 11:34:22 rhnfs01 crmd[1162]: notice: State transition S_TRANSITION_ENGINE
During testing I also saw that sometimes stonith_rhnfs01 ends up running on node2 and stonith_rhnfs02 on node1.
Please advise.