
fence_gce not working as expected on GCP?

I have set up a vanilla two-node cluster using fence_gce. stonith_rhnfs01 runs on node 1 and stonith_rhnfs02 on node 2. When I reboot node2 and it rejoins the cluster, stonith_rhnfs01 stays on node 1 as expected, but stonith_rhnfs02 fails to start on node 2 and instead recovers onto node 1, with no specific error in the logs. A resource cleanup of stonith_rhnfs02 brings it back to node 2.
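For reference, the cleanup that temporarily fixes the placement is just the standard one:

```shell
# Clear the failed start record for stonith_rhnfs02; Pacemaker then
# re-evaluates placement and starts the resource on rhnfs02 again.
pcs resource cleanup stonith_rhnfs02
```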

I also increased the stonith monitor interval to 120 seconds, but still no luck.
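The interval was changed roughly like this (a sketch of the commands I used; the resulting interval=120s is visible in the `pcs config` output below):

```shell
# Raise the monitor interval on both fencing resources to 120s
pcs resource update stonith_rhnfs01 op monitor interval=120s
pcs resource update stonith_rhnfs02 op monitor interval=120s
```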

Below is the relevant output for reference.

[root@rhnfs01 ~]# pcs status
Cluster name: etutorialguru_cluster
Stack: corosync
Current DC: rhnfs01 (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Tue Jun  2 11:50:41 2020
Last change: Tue Jun  2 08:20:27 2020 by hacluster via crmd on rhnfs01

2 nodes configured
2 resources configured

Online: [ rhnfs01 rhnfs02 ]

Full list of resources:

 stonith_rhnfs01        (stonith:fence_gce):    Started rhnfs01
 stonith_rhnfs02        (stonith:fence_gce):    Started rhnfs01

Failed Actions:
* stonith_rhnfs02_start_0 on rhnfs02 'unknown error' (1): call=10, status=Timed Out, exitreason='',
    last-rc-change='Tue Jun  2 11:34:00 2020', queued=0ms, exec=20013ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Cluster configuration:

[root@rhnfs02 ~]# pcs config
Cluster Name: etutorialguru_cluster
Corosync Nodes:
 rhnfs01 rhnfs02
Pacemaker Nodes:
 rhnfs01 rhnfs02

Resources:

Stonith Devices:
 Resource: stonith_rhnfs01 (class=stonith type=fence_gce)
  Attributes: pcmk_host_map=rhnfs01:rhnfs01 pcmk_reboot_retries=4 pcmk_reboot_timeout=480s power_timeout=240 zone=us-central1-a project=mytower
  Operations: monitor interval=120s (stonith_rhnfs01-monitor-interval-120s)
 Resource: stonith_rhnfs02 (class=stonith type=fence_gce)
  Attributes: pcmk_host_map=rhnfs02:rhnfs02 pcmk_reboot_retries=4 pcmk_reboot_timeout=480s power_timeout=240 zone=us-central1-b project=mytower
  Operations: monitor interval=120s (stonith_rhnfs02-monitor-interval-120s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: etutorialguru_cluster
 dc-version: 1.1.19-8.el7_6.5-c3c624ea3d
 have-watchdog: false
 last-lrm-refresh: 1591086027
 maintenance-mode: false
 no-quorum-policy: ignore
 stonith-enabled: true

Quorum:
  Options:
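For completeness, the two fencing devices were created along these lines (reconstructed from the attributes in the `pcs config` output above, so an approximation of the original commands rather than an exact transcript):

```shell
# One fence_gce device per node; attributes match the pcs config output
pcs stonith create stonith_rhnfs01 fence_gce \
    pcmk_host_map="rhnfs01:rhnfs01" pcmk_reboot_retries=4 \
    pcmk_reboot_timeout=480s power_timeout=240 \
    zone=us-central1-a project=mytower \
    op monitor interval=120s

pcs stonith create stonith_rhnfs02 fence_gce \
    pcmk_host_map="rhnfs02:rhnfs02" pcmk_reboot_retries=4 \
    pcmk_reboot_timeout=480s power_timeout=240 \
    zone=us-central1-b project=mytower \
    op monitor interval=120s
```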

/var/log/messages output:

[root@rhnfs01 ~]# tail -f /var/log/messages
Jun  2 11:33:52 rhnfs01 corosync[1046]: [TOTEM ] A new membership (192.168.0.68:96) was formed. Members joined: 2
Jun  2 11:33:52 rhnfs01 corosync[1046]: [QUORUM] Members[2]: 1 2
Jun  2 11:33:52 rhnfs01 corosync[1046]: [MAIN  ] Completed service synchronization, ready to provide service.
Jun  2 11:33:52 rhnfs01 crmd[1162]:  notice: Node rhnfs02 state is now member
Jun  2 11:33:52 rhnfs01 pacemakerd[1089]:  notice: Node rhnfs02 state is now member
Jun  2 11:33:54 rhnfs01 attrd[1159]:  notice: Node rhnfs02 state is now member
Jun  2 11:33:54 rhnfs01 stonith-ng[1156]:  notice: Node rhnfs02 state is now member
Jun  2 11:33:55 rhnfs01 cib[1155]:  notice: Node rhnfs02 state is now member
Jun  2 11:33:55 rhnfs01 crmd[1162]:  notice: State transition S_IDLE -> S_INTEGRATION
Jun  2 11:33:59 rhnfs01 pengine[1161]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:33:59 rhnfs01 pengine[1161]:  notice:  * Move       stonith_rhnfs02     ( rhnfs01 -> rhnfs02 )
Jun  2 11:33:59 rhnfs01 pengine[1161]:  notice: Calculated transition 13, saving inputs in /var/lib/pacemaker/pengine/pe-input-30.bz2
Jun  2 11:33:59 rhnfs01 crmd[1162]:  notice: Initiating monitor operation stonith_rhnfs01_monitor_0 on rhnfs02
Jun  2 11:33:59 rhnfs01 crmd[1162]:  notice: Initiating stop operation stonith_rhnfs02_stop_0 locally on rhnfs01
Jun  2 11:33:59 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:33:59 rhnfs01 crmd[1162]:  notice: Result of stop operation for stonith_rhnfs02 on rhnfs01: 0 (ok)
Jun  2 11:33:59 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:00 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:00 rhnfs01 crmd[1162]:  notice: Initiating monitor operation stonith_rhnfs02_monitor_0 on rhnfs02
Jun  2 11:34:00 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:00 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:00 rhnfs01 crmd[1162]:  notice: Initiating start operation stonith_rhnfs02_start_0 on rhnfs02
Jun  2 11:34:00 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:00 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:20 rhnfs01 crmd[1162]: warning: Action 8 (stonith_rhnfs02_start_0) on rhnfs02 failed (target: 0 vs. rc: 1): Error
Jun  2 11:34:20 rhnfs01 crmd[1162]:  notice: Transition aborted by operation stonith_rhnfs02_start_0 'modify' on rhnfs02: Event failed
Jun  2 11:34:20 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:20 rhnfs01 crmd[1162]:  notice: Transition 13 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-30.bz2): Complete
Jun  2 11:34:20 rhnfs01 pengine[1161]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun  2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun  2 11:34:20 rhnfs01 pengine[1161]:  notice:  * Recover    stonith_rhnfs02     (            rhnfs02 )
Jun  2 11:34:20 rhnfs01 pengine[1161]:  notice: Calculated transition 14, saving inputs in /var/lib/pacemaker/pengine/pe-input-31.bz2
Jun  2 11:34:20 rhnfs01 crmd[1162]:  notice: Transition aborted by transient_attributes.2 'create': Transient attribute change
Jun  2 11:34:20 rhnfs01 crmd[1162]:  notice: Transition 14 (Complete=0, Pending=0, Fired=0, Skipped=1, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-input-31.bz2): Stopped
Jun  2 11:34:20 rhnfs01 pengine[1161]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun  2 11:34:20 rhnfs01 pengine[1161]: warning: Processing failed start of stonith_rhnfs02 on rhnfs02: unknown error
Jun  2 11:34:20 rhnfs01 pengine[1161]: warning: Forcing stonith_rhnfs02 away from rhnfs02 after 1000000 failures (max=1000000)
Jun  2 11:34:20 rhnfs01 pengine[1161]:  notice:  * Recover    stonith_rhnfs02     ( rhnfs02 -> rhnfs01 )
Jun  2 11:34:20 rhnfs01 pengine[1161]:  notice: Calculated transition 15, saving inputs in /var/lib/pacemaker/pengine/pe-input-32.bz2
Jun  2 11:34:20 rhnfs01 crmd[1162]:  notice: Initiating stop operation stonith_rhnfs02_stop_0 on rhnfs02
Jun  2 11:34:20 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:20 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:20 rhnfs01 crmd[1162]:  notice: Initiating start operation stonith_rhnfs02_start_0 locally on rhnfs01
Jun  2 11:34:20 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:21 rhnfs01 crmd[1162]:  notice: Result of start operation for stonith_rhnfs02 on rhnfs01: 0 (ok)
Jun  2 11:34:21 rhnfs01 crmd[1162]:  notice: Initiating monitor operation stonith_rhnfs02_monitor_120000 locally on rhnfs01
Jun  2 11:34:21 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:21 rhnfs01 stonith-ng[1156]:  notice: On loss of CCM Quorum: Ignore
Jun  2 11:34:22 rhnfs01 crmd[1162]:  notice: Transition 15 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-32.bz2): Complete
Jun  2 11:34:22 rhnfs01 crmd[1162]:  notice: State transition S_TRANSITION_ENGINE

During testing I also saw that sometimes stonith_rhnfs01 ends up running on node2 and stonith_rhnfs02 on node1.

Any suggestions would be appreciated.