ARP / ICMP problem in a VLAN over a bonded interface

I have been trying to troubleshoot this problem all day without success.

I have two servers, server1 and server2, both running Ubuntu 14.04.5 LTS and connected to a Cisco sg200-08 switch via LACP LAG trunks. The switch's IP address is 172.128.1.254/24, and the interfaces on the servers are shown below, including the route and ARP table entries for the relevant IP addresses:

On server1:

root@server1:~# ip addr show bond0
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 00:11:0a:10:03:29 brd ff:ff:ff:ff:ff:ff
    inet 172.128.1.129/24 brd 172.128.1.255 scope global bond0
       valid_lft forever preferred_lft forever

root@server1:~# ip addr show bond0.53
13: bond0.53@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 00:11:0a:10:03:29 brd ff:ff:ff:ff:ff:ff
    inet 192.168.53.1/24 brd 192.168.53.255 scope global bond0.53
       valid_lft forever preferred_lft forever

root@server1:~# ip route get 192.168.53.2
192.168.53.2 dev bond0.53  src 192.168.53.1 
    cache

root@server1:~# arp -n | grep '192.168.53.2'
192.168.53.2                     (incomplete)                              bond0.53

On server2:

root@server2:~# ip addr show bond0
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 00:15:17:2e:ab:b4 brd ff:ff:ff:ff:ff:ff
    inet 172.128.1.130/24 brd 172.128.1.255 scope global bond0
       valid_lft forever preferred_lft forever

root@server2:~# ip addr show bond0.53
22: bond0.53@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 00:15:17:2e:ab:b4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.53.2/24 brd 192.168.53.255 scope global bond0.53
       valid_lft forever preferred_lft forever

root@server2:~# ip route get 192.168.53.1
192.168.53.1 dev bond0.53  src 192.168.53.2 
    cache

root@server2:~# arp -n | grep '192.168.53.1'
192.168.53.1             ether   00:11:0a:10:03:29   C                     bond0.53
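
For completeness, the bonding and VLAN configuration on both servers is along these lines (a minimal sketch of an Ubuntu 14.04 /etc/network/interfaces using the ifenslave and vlan packages; it is reconstructed from the output above, so treat the exact stanzas as an approximation rather than the actual files):

# /etc/network/interfaces on server1 (server2 is analogous, with p1p1/p1p2 as slaves)
auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth2
iface eth2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address 172.128.1.129
    netmask 255.255.255.0
    bond-mode 802.3ad        # IEEE 802.3ad dynamic link aggregation
    bond-miimon 100
    bond-lacp-rate slow
    bond-slaves none         # slaves declare bond-master above

auto bond0.53
iface bond0.53 inet static
    address 192.168.53.1
    netmask 255.255.255.0
    vlan-raw-device bond0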

When I ping server2 from server1, I don't see the ARP replies coming back to server1:

root@server1:~# tcpdump -ennqt -i bond0 \( arp or icmp \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes

00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 28

but on the server2 side I can see both the ARP requests from server1 AND the replies being sent back over VLAN 53:

root@server2:~# tcpdump -ennqt -i bond0 \( arp or icmp \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes

00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28
00:11:0a:10:03:29 > ff:ff:ff:ff:ff:ff, 802.1Q, length 64: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.2 tell 192.168.53.1, length 46
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Reply 192.168.53.2 is-at 00:15:17:2e:ab:b4, length 28

For a ping in the opposite direction, I only see this, on server2:

00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 1, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 2, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 3, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 4, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 102: vlan 53, p 0, ethertype IPv4, 192.168.53.2 > 192.168.53.1: ICMP echo request, id 6506, seq 5, length 64
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.1 tell 192.168.53.2, length 28
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.1 tell 192.168.53.2, length 28
00:15:17:2e:ab:b4 > 00:11:0a:10:03:29, 802.1Q, length 46: vlan 53, p 0, ethertype ARP, Request who-has 192.168.53.1 tell 192.168.53.2, length 28

There are no firewall, arptables or ebtables rules on either side, and no kernel sysctl is blocking ICMP traffic. The bonds are up and healthy. The switch has 2 ports in each LAG, configured as a trunk towards each server, carrying VLAN 1 (native/default, untagged) and VLANs 51, 52, 53 and 54 tagged. I can ping the bond0 IPs 172.128.1.129 and 172.128.1.130 from the switch. I can ping 172.128.1.129 (server1) from another Linux machine connected to the switch (IP 172.128.1.5), but not 172.128.1.130 (server2).
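
The checks behind that statement were roughly the following (a sketch; the exact sysctls worth looking at are my assumption, not something shown in the post):

# confirm no filtering rules are loaded on either server
iptables -L -n
arptables -L
ebtables -L

# sysctls that could interfere with ARP or ICMP handling
sysctl net.ipv4.icmp_echo_ignore_all
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.all.arp_ignore net.ipv4.conf.all.arp_filter
cat /proc/sys/net/ipv4/conf/bond0.53/rp_filter   # per-interface value; note the dot in the name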

Thanks in advance for any pointers, ideas or suggestions.

CORRECTION: I can ping BOTH servers from a third host on the network

igorc@client:~$ ip -f inet addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 172.128.1.5/24 brd 172.128.1.255 scope global dynamic eth1
       valid_lft 22497sec preferred_lft 22497sec

igorc@client:~$ ping -c 2 172.128.1.129
PING 172.128.1.129 (172.128.1.129) 56(84) bytes of data.
64 bytes from 172.128.1.129: icmp_seq=1 ttl=64 time=0.618 ms
64 bytes from 172.128.1.129: icmp_seq=2 ttl=64 time=0.541 ms

--- 172.128.1.129 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.541/0.579/0.618/0.045 ms

igorc@client:~$ ping -c 2 172.128.1.130
PING 172.128.1.130 (172.128.1.130) 56(84) bytes of data.
64 bytes from 172.128.1.130: icmp_seq=1 ttl=64 time=0.645 ms
64 bytes from 172.128.1.130: icmp_seq=2 ttl=64 time=0.693 ms

--- 172.128.1.130 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.645/0.669/0.693/0.024 ms

UPDATE: Bonding state on both servers

root@server1:~# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 1
    Actor Key: 17
    Partner Key: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:11:0a:10:03:29
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:11:0a:10:03:28
Aggregator ID: 2
Slave queue ID: 0


root@server2:~# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
    Aggregator ID: 2
    Number of ports: 1
    Actor Key: 17
    Partner Key: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: p1p1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:15:17:2e:ab:b4
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: p1p2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:15:17:2e:ab:b5
Aggregator ID: 2
Slave queue ID: 0

Solved. I had mistakenly set the LAG on the Cisco switch to static instead of dynamic, which prevents LACP from being used. The inline image probably won't display because my account lacks reputation points, but I'm attaching it anyway.

Cisco sg200-08 LAG Management

Now things look much better:

root@server1:~# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
    Aggregator ID: 1
    **Number of ports: 2**
    Actor Key: 17
    Partner Key: 10
    **Partner Mac Address: 20:bb:c0:78:7e:9b**

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:11:0a:10:03:28
**Aggregator ID: 1**
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:11:0a:10:03:29
**Aggregator ID: 1**
Slave queue ID: 0

The changes are in bold (if visible in the code widget): first, the number of ports is now correctly set to 2 instead of 1; next, the Aggregator ID now correctly has the same value for both slaves; and finally, the Partner Mac Address now has a real value (compared to 00:00:00:00:00:00 before), indicating that LACPDUs are being exchanged between the peers.
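
For reference, the server-side bond options that correspond to the output above would look roughly like this (a sketch in ifupdown syntax; the post does not show where the hash policy and LACP rate were actually set, so treat the option names as an assumption):

# relevant bond options matching the working state (same idea on both servers)
bond-mode 802.3ad                  # IEEE 802.3ad dynamic link aggregation
bond-lacp-rate fast                # LACP rate: fast
bond-xmit-hash-policy layer2+3     # Transmit Hash Policy: layer2+3 (2)
bond-miimon 100                    # MII Polling Interval (ms): 100

The switch-side change (static LAG to dynamic/LACP) is what actually let both slaves negotiate into the same aggregator; the options above only need to match it.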