Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

IPoIB: Fix loss of connectivity after bonding failover on both sides

Fix bonding failover in the case where both peers fail over and the
gratuitous ARP is lost. In that case, the sender side first creates an
ipoib_neigh and issues a path request with the old GID. When
skb->dst->neighbour->ha changes due to an ARP refresh, this ipoib_neigh
is not added to the path->list of the path for the new GID, because
the ipoib_neigh already exists. It does not have an AH either, because
of the sender-side failover. Therefore, it will not get an AH when the
path is resolved.

The solution here is to compare GIDs in ipoib_start_xmit() even if
neigh->ah is invalid. Comparing with an uninitialized value of
neigh->dgid should be fine, since a spurious match is harmless (and
astronomically unlikely too).

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

Authored by Yossi Etigin and committed by Roland Dreier
a50df398 6a94cb73

+19 -19
drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -711,26 +711,26 @@
 
 		neigh = *to_ipoib_neigh(skb->dst->neighbour);
 
-		if (neigh->ah)
-			if (unlikely((memcmp(&neigh->dgid.raw,
-					    skb->dst->neighbour->ha + 4,
-					    sizeof(union ib_gid))) ||
-					 (neigh->dev != dev))) {
-				spin_lock_irqsave(&priv->lock, flags);
-				/*
-				 * It's safe to call ipoib_put_ah() inside
-				 * priv->lock here, because we know that
-				 * path->ah will always hold one more reference,
-				 * so ipoib_put_ah() will never do more than
-				 * decrement the ref count.
-				 */
+		if (unlikely((memcmp(&neigh->dgid.raw,
+				    skb->dst->neighbour->ha + 4,
+				    sizeof(union ib_gid))) ||
+			     (neigh->dev != dev))) {
+			spin_lock_irqsave(&priv->lock, flags);
+			/*
+			 * It's safe to call ipoib_put_ah() inside
+			 * priv->lock here, because we know that
+			 * path->ah will always hold one more reference,
+			 * so ipoib_put_ah() will never do more than
+			 * decrement the ref count.
+			 */
+			if (neigh->ah)
 				ipoib_put_ah(neigh->ah);
-				list_del(&neigh->list);
-				ipoib_neigh_free(dev, neigh);
-				spin_unlock_irqrestore(&priv->lock, flags);
-				ipoib_path_lookup(skb, dev);
-				return NETDEV_TX_OK;
-			}
+			list_del(&neigh->list);
+			ipoib_neigh_free(dev, neigh);
+			spin_unlock_irqrestore(&priv->lock, flags);
+			ipoib_path_lookup(skb, dev);
+			return NETDEV_TX_OK;
+		}
 
 		if (ipoib_cm_get(neigh)) {
 			if (ipoib_cm_up(neigh)) {