I am running a GlusterFS cluster with a trusted storage pool of 4 peers.
It works correctly (the volume can be mounted and files are stored properly), with one exception: rebalancing does not run.
In addition, the output of peer status and the entries in glus-glusterfs-glusterd.vol.log look worrying. Something is going wrong, and I don't know how to fix it.
I'm worried that one day the whole system will fail and I'll lose all the data, so I think I need to solve these problems.
All servers run gluster 3.7.6 on Ubuntu 16.04.
gluster> volume status
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick example-storage1:/data/brick1/gv0 49152 0 Y 1413
Brick 100.100.250.25:/data/brick2/gv0 49152 0 Y 3081
Brick 100.100.255.40:/data/brick3/gv0 N/A N/A N N/A
NFS Server on localhost N/A N/A N N/A
NFS Server on example-storage2 N/A N/A N N/A
NFS Server on example-storage1.example.com
2049 0 Y 24490
NFS Server on example-storage3 N/A N/A N N/A
Task Status of Volume gv0
------------------------------------------------------------------------------
Task : Rebalance
ID : 1ee56040-6bb5-4407-8ae5-f176e6c89db1
Status : completed
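To pick the offline bricks out of this output mechanically (handy when comparing it across nodes), here is a minimal Python sketch. It assumes the single-line brick layout pasted above (Brick, host:path, TCP port, RDMA port, Online, Pid); entries whose long hostname wraps onto a second line, like the NFS Server line for example-storage1.example.com, are not handled. The name offline_bricks is hypothetical and not part of any gluster tooling.

```python
import re

# Brick lines copied from the "volume status" output above.
VOLUME_STATUS = """\
Brick example-storage1:/data/brick1/gv0 49152 0 Y 1413
Brick 100.100.250.25:/data/brick2/gv0 49152 0 Y 3081
Brick 100.100.255.40:/data/brick3/gv0 N/A N/A N N/A
"""

# Assumed layout: "Brick <host>:<path> <tcp-port> <rdma-port> <online Y/N> <pid>".
# \s+ tolerates the multi-space column alignment of the real CLI output.
BRICK_RE = re.compile(r"^Brick\s+(\S+)\s+(\S+)\s+(\S+)\s+([YN])\s+(\S+)\s*$")


def offline_bricks(text):
    """Return the host:path of every brick whose Online column is 'N'."""
    result = []
    for line in text.splitlines():
        m = BRICK_RE.match(line.strip())
        if m and m.group(4) == "N":
            result.append(m.group(1))
    return result


if __name__ == "__main__":
    for brick in offline_bricks(VOLUME_STATUS):
        print("offline:", brick)
```

Run against the pasted lines, it reports only 100.100.255.40:/data/brick3/gv0, the brick whose Online column is N.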
gluster> peer status
Number of Peers: 3
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Other names:
example-storage2.example.com
Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Connected)
Other names:
example-storage1.example.com
Number of Peers: 5
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Other names:
example-storage2
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Hostname: example-storage3
Uuid: 49d9bc0a-b67d-4850-bff9-edeaa0dac8ca
State: Peer Rejected (Connected)
Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Connected)
Other names:
example-prod
Number of Peers: 3
Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Connected)
Other names:
example-storage1
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Connected)
Other names:
example-prod
Note the "Disconnected" state.
Number of Peers: 3
Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Disconnected)
Other names:
example-prod
Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Disconnected)
Other names:
example-storage1
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Disconnected)
Other names:
example-storage2
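The peer status listings are easier to compare when anomalies are flagged automatically. The sketch below is a hypothetical helper (parse_peers and report are illustrative names, not gluster commands) that assumes the Hostname/Uuid/State layout shown above and checks three things: a state other than "Peer in Cluster (Connected)", a hostname listed with more than one UUID, and a UUID listed more than once. It is fed the listing that reports five peers.

```python
from collections import defaultdict

# Peer entries copied from the listing above that reports "Number of Peers: 5".
PEER_STATUS = """\
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Other names:
example-storage2
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Hostname: example-storage3
Uuid: 49d9bc0a-b67d-4850-bff9-edeaa0dac8ca
State: Peer Rejected (Connected)
Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Connected)
Other names:
example-prod
"""

HEALTHY = "Peer in Cluster (Connected)"


def parse_peers(text):
    """Collect (hostname, uuid, state) triples; alias lines are skipped."""
    peers, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # alias lines under "Other names:" contain no colon
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            current = {"hostname": value}
        elif key == "Uuid":
            current["uuid"] = value
        elif key == "State":
            peers.append((current["hostname"], current["uuid"], value))
    return peers


def report(peers):
    """Return human-readable descriptions of suspicious peer entries."""
    problems = []
    uuids_by_host = defaultdict(set)
    entries_by_uuid = defaultdict(int)
    for host, uuid, state in peers:
        uuids_by_host[host].add(uuid)
        entries_by_uuid[uuid] += 1
        if state != HEALTHY:
            problems.append(f"{host} ({uuid}): {state}")
    for host, uuids in sorted(uuids_by_host.items()):
        if len(uuids) > 1:
            problems.append(f"{host} appears with {len(uuids)} different UUIDs")
    for uuid, count in sorted(entries_by_uuid.items()):
        if count > 1:
            problems.append(f"UUID {uuid} is listed {count} times")
    return problems


if __name__ == "__main__":
    for problem in report(parse_peers(PEER_STATUS)):
        print(problem)
```

On that listing it flags the Peer Rejected entry for example-storage3 with UUID 49d9bc0a-b67d-4850-bff9-edeaa0dac8ca, the fact that example-storage3 appears under two different UUIDs, and the duplicate entry for UUID 54566d17-f76b-45d0-82a2-ed8a474289c8 (example-storage2). The other listings can be pasted into PEER_STATUS the same way.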
// the following line repeats every 3 seconds
[2018-04-13 07:07:05.602742] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)
[2018-04-13 07:07:08.603156] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)
// the following line repeats every 3 seconds
[2018-04-13 07:00:38.987432] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
[2018-04-13 07:00:41.987968] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
// the following line repeats every 3 seconds
[2018-04-13 07:08:24.119264] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
[2018-04-13 07:08:27.119618] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
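To check how often a repeated warning is actually written, one can compute the gaps between the bracketed timestamps. This is a minimal sketch, assuming the [YYYY-MM-DD HH:MM:SS.ffffff] prefix shown in these log lines; intervals is a hypothetical helper name. It is fed the two lines from the first excerpt.

```python
import re
from datetime import datetime

LOG = """\
[2018-04-13 07:07:05.602742] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)
[2018-04-13 07:07:08.603156] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)
"""

# Matches the bracketed timestamp at the start of each log line.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def intervals(text):
    """Seconds elapsed between consecutive timestamped log lines."""
    stamps = []
    for line in text.splitlines():
        m = TS_RE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(intervals(LOG))
```

For that pair the gap comes out at about 3 seconds (3.000414 s); the other two excerpts give similar gaps.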
// The following lines repeat
[2018-04-13 07:07:54.599955] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2018-04-13 07:07:54.600003] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2018-04-13 07:08:02.697437] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <example-storage2> (<54566d17-f76b-45d0-82a2-ed8a474289c8>), in state <Peer in Cluster>, has disconnected from glusterd.
[2018-04-13 07:08:04.625465] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid argument
[2018-04-13 07:08:04.625513] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
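The keep-alive errors report an attempt to set TCP_USER_TIMEOUT to -1000. On Linux, the kernel rejects a negative TCP_USER_TIMEOUT value with EINVAL, which matches the "Invalid argument" in the log. The sketch below reproduces that with Python's socket module on a fresh, unconnected TCP socket. It does not show where glusterd gets the value -1000 from, and set_user_timeout is a hypothetical helper name. It is Linux-only, since socket.TCP_USER_TIMEOUT is not defined on other platforms.

```python
import errno
import socket


def set_user_timeout(ms):
    """Try to set TCP_USER_TIMEOUT (in milliseconds) on a fresh TCP socket.

    Returns 0 on success, otherwise the errno reported by setsockopt().
    Linux-only: socket.TCP_USER_TIMEOUT does not exist on other platforms.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, ms)
        except OSError as exc:
            return exc.errno
    return 0


if __name__ == "__main__":
    for value in (-1000, 0, 42000):
        rc = set_user_timeout(value)
        print(value, "->", "ok" if rc == 0 else errno.errorcode[rc])
```

On a Linux host, -1000 should come back as EINVAL while non-negative values are accepted, so the error itself points at the value being passed rather than at the network.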