
Initial Xtradb Cluster setup - every node ends up isolated

I'm setting up percona-xtradb-cluster-57 (3 nodes) on Ubuntu 16.04. The nodes are supposed to talk to each other over a private network, 10.254.10.101 through 10.254.10.103.

When I follow the instructions on the Percona site as written, after bootstrapping the first node I start one of the other two normally. However, SHOW STATUS LIKE 'wsrep%'; reports a cluster size of 1 and a cluster ID that differs from the one on the bootstrapped node.

I've checked the firewall, disabled the firewall, tried connecting to each node on ports 3306 and 4567, and double-checked that each machine sees its neighbors one hop away. All of that is as expected. Ports 4444 and 4568 are also open, although netstat doesn't show anything listening on them. FWIW, 4444 and 4568 aren't listening on a working cluster in the same software environment either (that one is spread across several data centers).
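For reference, the port check described above can be sketched roughly as follows. This is a minimal sketch, assuming Python 3 is available on the nodes; the helper names `port_is_open` and `check_nodes` are made up for illustration, and only the addresses and the two ports that are expected to be listening come from the setup above.

```python
import socket

# Addresses and ports are taken from the question; everything else is illustrative.
NODES = ["10.254.10.101", "10.254.10.102", "10.254.10.103"]
PORTS = [3306, 4567]  # MySQL client port and Galera group communication port


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_nodes(nodes, ports):
    """Map every (host, port) pair to whether it accepted a TCP connection."""
    return {
        (host, port): port_is_open(host, port)
        for host in nodes
        for port in ports
    }
```

Running `check_nodes(NODES, PORTS)` on each node and printing the result shows which peers accept connections from that node. A successful connection only proves the port is reachable; it says nothing about whether the nodes agree on cluster settings.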

I'm using SST for replication, with the SST user created on every node. I have also tried creating it only on the bootstrap node.

Here is the config:

[mysqld]
wsrep_provider=/usr/lib/libgalera_smm.so

wsrep_cluster_name=dbcluster
wsrep_cluster_address=gcomm://10.254.10.101,10.254.10.102,10.254.10.103

wsrep_node_name=pxc1
wsrep_node_address=10.254.10.101

wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst-user:sst-pass

pxc_strict_mode=ENFORCING

binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

What diagnostics would you like to see to help track this down? My first thought was a network problem, but I've already done all the usual checks there, as described above.

Watching the logs during startup, the two nodes that were not bootstrapped don't even seem to look for the other nodes.

2017-11-15T15:23:20.255455Z mysqld_safe Logging to '/var/log/mysqld.log'.
2017-11-15T15:23:20.272183Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2017-11-15T15:23:20.279082Z mysqld_safe Skipping wsrep-recover for 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14 pair
2017-11-15T15:23:20.280168Z mysqld_safe Assigning 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14 to wsrep_start_position

2017-11-15T15:23:20.481916Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-15T15:23:20.483755Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57-log) starting as process 29150 ...
2017-11-15T15:23:20.487158Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=app3-bin' to avoid this problem.
2017-11-15T15:23:20.487910Z 0 [Note] WSREP: Setting wsrep_ready to false
2017-11-15T15:23:20.488055Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2017-11-15T15:23:20.488173Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-11-15T15:23:20.491810Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <info@codership.com> loaded successfully.
2017-11-15T15:23:20.491980Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-11-15T15:23:20.492510Z 0 [Note] WSREP: Found saved state: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14, safe_to_bootsrap: 1
2017-11-15T15:23:20.493950Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.254.10.103; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-11-15T15:23:20.513602Z 0 [Note] WSREP: GCache history reset: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:0 -> 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14
2017-11-15T15:23:20.514428Z 0 [Note] WSREP: Assign initial position for certification: 14, protocol version: -1
2017-11-15T15:23:20.514546Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-11-15T15:23:20.514663Z 0 [Note] WSREP: Starting replication
2017-11-15T15:23:20.514791Z 0 [Note] WSREP: Setting initial position to 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14
2017-11-15T15:23:20.515118Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-11-15T15:23:20.515346Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-11-15T15:23:20.515543Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-15T15:23:20.515662Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-11-15T15:23:20.516044Z 0 [Note] WSREP: GMCast version 0
2017-11-15T15:23:20.516252Z 0 [Note] WSREP: (e997d882, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-11-15T15:23:20.516316Z 0 [Note] WSREP: (e997d882, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-11-15T15:23:20.516739Z 0 [Note] WSREP: EVS version 0
2017-11-15T15:23:20.516889Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer ''
2017-11-15T15:23:20.516986Z 0 [Note] WSREP: start_prim is enabled, turn off pc_recovery
2017-11-15T15:23:20.517230Z 0 [Note] WSREP: Node e997d882 state primary
2017-11-15T15:23:20.517313Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,e997d882,1)
memb {
        e997d882,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2017-11-15T15:23:20.517372Z 0 [Note] WSREP: Save the discovered primary-component to disk
2017-11-15T15:23:20.517555Z 0 [Note] WSREP: gcomm: connected
2017-11-15T15:23:20.517672Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-11-15T15:23:20.517836Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-11-15T15:23:20.517987Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: e9982b85-ca18-11e7-bcf3-22cba208212d
2017-11-15T15:23:20.518021Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: e9982b85-ca18-11e7-bcf3-22cba208212d
2017-11-15T15:23:20.518038Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: e9982b85-ca18-11e7-bcf3-22cba208212d from 0 (pxc-cluster-node-1)
2017-11-15T15:23:20.518054Z 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 0,
        members    = 1/1 (primary/total),
        act_id     = 14,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 71bf4cd8-ca02-11e7-8b84-630bd10b8205
2017-11-15T15:23:20.518071Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2017-11-15T15:23:20.518084Z 0 [Note] WSREP: Trying to continue unpaused monitor
2017-11-15T15:23:20.518097Z 0 [Note] WSREP: Restored state OPEN -> JOINED (14)
2017-11-15T15:23:20.518141Z 0 [Note] WSREP: Member 0.0 (pxc-cluster-node-1) synced with group.
2017-11-15T15:23:20.517998Z 0 [Note] WSREP: Waiting for SST/IST to complete.
2017-11-15T15:23:20.518154Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 14)
2017-11-15T15:23:20.518501Z 1 [Note] WSREP: New cluster view: global state: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-11-15T15:23:20.518541Z 1 [Note] WSREP: Setting wsrep_ready to true
2017-11-15T15:23:20.518594Z 0 [Note] WSREP: SST complete, seqno: 14
2017-11-15T15:23:20.520410Z 0 [Note] InnoDB: PUNCH HOLE support available
2017-11-15T15:23:20.520445Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-11-15T15:23:20.520460Z 0 [Note] InnoDB: Uses event mutexes
2017-11-15T15:23:20.520467Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-11-15T15:23:20.520474Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-11-15T15:23:20.520481Z 0 [Note] InnoDB: Using Linux native AIO
2017-11-15T15:23:20.520824Z 0 [Note] InnoDB: Number of pools: 1
2017-11-15T15:23:20.520978Z 0 [Note] InnoDB: Using CPU crc32 instructions
2017-11-15T15:23:20.522918Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2017-11-15T15:23:20.528901Z 0 [Note] InnoDB: Completed initialization of buffer pool
2017-11-15T15:23:20.531671Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-11-15T15:23:20.543934Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2017-11-15T15:23:20.544844Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2017-11-15T15:23:20.560093Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 3932160 bytes
2017-11-15T15:23:20.566433Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-11-15T15:23:20.566788Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2017-11-15T15:23:20.588535Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2017-11-15T15:23:20.589701Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2017-11-15T15:23:20.589810Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2017-11-15T15:23:20.590375Z 0 [Note] InnoDB: Waiting for purge to start
2017-11-15T15:23:20.640641Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.19-17 started; log sequence number 2548109
2017-11-15T15:23:20.641178Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2017-11-15T15:23:20.641384Z 0 [Note] Plugin 'FEDERATED' is disabled.
2017-11-15T15:23:20.643176Z 0 [Note] InnoDB: Buffer pool(s) load completed at 171115  9:23:20
2017-11-15T15:23:20.650925Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2017-11-15T15:23:20.650959Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2017-11-15T15:23:20.651471Z 0 [Warning] CA certificate ca.pem is self signed.
2017-11-15T15:23:20.651529Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2017-11-15T15:23:20.651638Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2017-11-15T15:23:20.651688Z 0 [Note] IPv6 is available.
2017-11-15T15:23:20.651706Z 0 [Note]   - '::' resolves to '::';
2017-11-15T15:23:20.651733Z 0 [Note] Server socket created on IP: '::'.
2017-11-15T15:23:20.661454Z 0 [Note] Event Scheduler: Loaded 0 events
2017-11-15T15:23:20.662904Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.19-17-57-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22
2017-11-15T15:23:20.662945Z 0 [Note] Executing 'SELECT * FROM INFORMATION_SCHEMA.TABLES;' to get a list of tables using the deprecated partition engine. You may use the startup option '--disable-partition-engine-check' to skip this check.
2017-11-15T15:23:20.663009Z 0 [Note] Beginning of list of non-natively partitioned tables
2017-11-15T15:23:20.663172Z 1 [Note] WSREP: Initialized wsrep sidno 2
2017-11-15T15:23:20.663199Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-11-15T15:23:20.663233Z 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-11-15T15:23:20.663248Z 1 [Note] WSREP: Assign initial position for certification: 14, protocol version: 3
2017-11-15T15:23:20.663291Z 0 [Note] WSREP: Service thread queue flushed.
2017-11-15T15:23:20.663350Z 1 [Note] WSREP: GCache history reset: 71bf4cd8-ca02-11e7-8b84-630bd10b8205:0 -> 71bf4cd8-ca02-11e7-8b84-630bd10b8205:14
2017-11-15T15:23:20.663818Z 1 [Note] WSREP: Synchronized with group, ready for connections
2017-11-15T15:23:20.663839Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-11-15T15:23:20.679843Z 0 [Note] End of list of non-natively partitioned tables

I think your problem may be this:

2017-11-15T15:23:20.516889Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer ''

Assuming this is your second node, and your first node is configured with:

wsrep_cluster_name=dbcluster

They won't see each other. Make sure every node has the same cluster name.
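If you want to check this on each node without reading the logs by eye, something along these lines might help. It is only a rough sketch: it assumes the log line format shown in your output and a my.cnf-style `wsrep_cluster_name=...` line, and the function names are made up for illustration rather than taken from any Percona tool.

```python
import re

# Matches the gcomm line from the log, capturing the group name and the peer list.
GROUP_RE = re.compile(r"gcomm: connecting to group '([^']*)', peer '([^']*)'")
# Matches a wsrep_cluster_name setting in a my.cnf-style file.
NAME_RE = re.compile(r"^\s*wsrep_cluster_name\s*=\s*(\S+)", re.MULTILINE)


def group_from_log(log_text):
    """Return (group, peer) from the first gcomm connect line, or None."""
    match = GROUP_RE.search(log_text)
    return (match.group(1), match.group(2)) if match else None


def cluster_name_from_config(config_text):
    """Return the configured wsrep_cluster_name (quotes stripped), or None."""
    match = NAME_RE.search(config_text)
    return match.group(1).strip("'\"") if match else None


def names_match(log_text, config_text):
    """True only if both names were found and they are identical."""
    group = group_from_log(log_text)
    name = cluster_name_from_config(config_text)
    return group is not None and name is not None and group[0] == name
```

Fed the log line above and the posted config, `names_match` returns False, because the running node reports 'pxc-cluster' while the config says dbcluster. That would suggest the node is reading a different config file than the one you posted.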