I have been trying to troubleshoot this problem all day without success.
I have two servers, server1 and server2, both running Ubuntu 14.04.5 LTS and connected to a Cisco SG200-08 switch via LAG trunks with LACP. The switch's IP address is 172.128.1.254/24, and the servers' interfaces are shown below, including the route table and ARP entries for the relevant IP addresses.
On server1:
root@server1:~# ip addr show bond0
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:11:0a:10:03:29 brd ff:ff:ff:ff:ff:ff
inet 172.128.1.129/24 brd 172.128.1.255 scope global bond0
valid_lft forever preferred_lft forever
root@server1:~# ip addr show bond0.53
13: bond0.53@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:11:0a:10:03:29 brd ff:ff:ff:ff:ff:ff
inet 192.168.53.1/24 brd 192.168.53.255 scope global bond0.53
valid_lft forever preferred_lft forever
root@server1:~# ip route get 192.168.53.2
192.168.53.2 dev bond0.53 src 192.168.53.1
cache
root@server1:~# arp -n | grep '192.168.53.2'
192.168.53.2 (incomplete) bond0.53
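For what it's worth, the same stuck neighbour entry can be confirmed with the iproute2 tooling (a minimal check; the INCOMPLETE state below matches the arp output above):
root@server1:~# ip neigh show dev bond0.53
192.168.53.2 INCOMPLETE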
On server2:
root@server2:~# ip addr show bond0
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:15:17:2e:ab:b4 brd ff:ff:ff:ff:ff:ff
inet 172.128.1.130/24 brd 172.128.1.255 scope global bond0
valid_lft forever preferred_lft forever
root@server2:~# ip addr show bond0.53
22: bond0.53@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 00:15:17:2e:ab:b4 brd ff:ff:ff:ff:ff:ff
inet 192.168.53.2/24 brd 192.168.53.255 scope global bond0.53
valid_lft forever preferred_lft forever
root@server2:~# ip route get 192.168.53.1
192.168.53.1 dev bond0.53 src 192.168.53.2
cache
root@server2:~# arp -n | grep '192.168.53.1'
192.168.53.1 ether 00:11:0a:10:03:29 C bond0.53
When I ping server2 from server1, I don't see any ARP replies coming back to server1:
root@server1:~# tcpdump -ennqt -i bond0 \( arp or icmp \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
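Since a capture on bond0 does not show which physical slave a frame actually arrives on, the same capture can be repeated on the individual slaves (a sketch using server1's slave names; run one per terminal):
root@server1:~# tcpdump -ennqt -i eth0 vlan 53 and arp
root@server1:~# tcpdump -ennqt -i eth2 vlan 53 and arp
If the replies were to show up on a slave but never on bond0, the bonding driver would be discarding them, for example because that slave is not part of the active aggregator.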
On the server2 side, however, I can see both the ARP requests from server1 AND the replies being sent back over VLAN 53:
root@server2:~# tcpdump -ennqt -i bond0 \( arp or icmp \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
For a ping in the opposite direction, I see only this on server2:
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 1, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 2, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 3, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 4, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 5, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.1 tell 192.168.53.2, length 28
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.1 tell 192.168.53.2, length 28
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.1 tell 192.168.53.2, length 28
There are no firewall, arptables, or ebtables rules on either side, and no kernel sysctl is blocking ICMP traffic. The links are up and healthy. The switch has 2 ports in each LAG group, configured as a trunk towards each server, carrying VLAN 1 (native/default, untagged) and VLANs 51, 52, 53 and 54 tagged. I can ping the bond0 addresses 172.128.1.129 and 172.128.1.130 from the switch. I can ping 172.128.1.129 (server1) from another Linux machine connected to the switch (IP 172.128.1.5), but not 172.128.1.130 (server2).
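For completeness, these are the kinds of checks meant above (a sketch; on both servers the rule listings were empty with default ACCEPT policies, and the sysctl was at its default):
root@server1:~# iptables -L -n
root@server1:~# arptables -L
root@server1:~# ebtables -L
root@server1:~# sysctl net.ipv4.icmp_echo_ignore_all
net.ipv4.icmp_echo_ignore_all = 0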
Thanks in advance for any pointers, ideas or suggestions.
CORRECTION: I can ping BOTH servers from a third host on the network:
igorc@client:~$ ip -f inet addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 172.128.1.5/24 brd 172.128.1.255 scope global dynamic eth1
valid_lft 22497sec preferred_lft 22497sec
igorc@client:~$ ping -c 2 172.128.1.129
PING 172.128.1.129 (172.128.1.129) 56(84) bytes of data.
64 bytes from 172.128.1.129: icmp_seq=1 ttl=64 time=0.618 ms
64 bytes from 172.128.1.129: icmp_seq=2 ttl=64 time=0.541 ms
--- 172.128.1.129 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.541/0.579/0.618/0.045 ms
igorc@client:~$ ping -c 2 172.128.1.130
PING 172.128.1.130 (172.128.1.130) 56(84) bytes of data.
64 bytes from 172.128.1.130: icmp_seq=1 ttl=64 time=0.645 ms
64 bytes from 172.128.1.130: icmp_seq=2 ttl=64 time=0.693 ms
--- 172.128.1.130 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.645/0.669/0.693/0.024 ms
UPDATE: bonding status on both servers:
root@server1:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 17
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:11:0a:10:03:29
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:11:0a:10:03:28
Aggregator ID: 2
Slave queue ID: 0
root@server2:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 1
Actor Key: 17
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00
Slave Interface: p1p1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:15:17:2e:ab:b4
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: p1p2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:15:17:2e:ab:b5
Aggregator ID: 2
Slave queue ID: 0
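Note the symptoms common to both outputs: the active aggregator has only 1 port, the two slaves report different Aggregator IDs, and the Partner Mac Address is all zeros. A quick way to pull out just those fields (a sketch; server1 shown, matching the full output above):
root@server1:~# grep -E 'Number of ports|Aggregator ID|Partner Mac' /proc/net/bonding/bond0
Aggregator ID: 1
Number of ports: 1
Partner Mac Address: 00:00:00:00:00:00
Aggregator ID: 1
Aggregator ID: 2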
Solved. I had mistakenly set the LAGs on the Cisco switch to static rather than dynamic, which disables LACP. The inline image won't display, probably because my account lacks the reputation points, but I'm attaching it anyway.
Now everything looks much better:
root@server1:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
**Number of ports: 2**
Actor Key: 17
Partner Key: 10
**Partner Mac Address: 20:bb:c0:78:7e:9b**
Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:11:0a:10:03:28
**Aggregator ID: 1**
Slave queue ID: 0
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:11:0a:10:03:29
**Aggregator ID: 1**
Slave queue ID: 0
The changes are highlighted in bold (if that comes through in the code widget): first, the number of ports is now correctly set to 2 instead of 1; second, the Aggregator ID now correctly has the same value for both slaves; and finally, the Partner Mac Address is now populated (compared to 00:00:00:00:00:00 before), indicating that LACPDUs are being exchanged between the peers.
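For anyone reproducing the Linux side of this setup on Ubuntu 14.04, here is a minimal sketch of the relevant /etc/network/interfaces stanzas for server1 (assuming the ifenslave and vlan packages; slave names, addresses and bonding options are taken from the outputs above, the rest is illustrative):

auto bond0
iface bond0 inet static
    address 172.128.1.129
    netmask 255.255.255.0
    bond-slaves eth0 eth2
    # IEEE 802.3ad dynamic link aggregation; only negotiates when
    # the switch LAG is set to LACP (dynamic), not static
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate fast
    bond-xmit-hash-policy layer2+3

auto bond0.53
iface bond0.53 inet static
    address 192.168.53.1
    netmask 255.255.255.0
    # 802.1Q VLAN 53 tagged on top of the bond
    vlan-raw-device bond0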