Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: arp: introduce arp_evict_nocarrier sysctl parameter

This change introduces a new sysctl parameter, arp_evict_nocarrier.
When set (the default), the ARP cache is cleared on a NOCARRIER event.
Defaulting the option to '1' maintains the existing behavior.

Clearing the ARP cache on NOCARRIER is relatively new, introduced by:

commit 859bd2ef1fc1110a8031b967ee656c53a6260a76
Author: David Ahern <dsahern@gmail.com>
Date: Thu Oct 11 20:33:49 2018 -0700

net: Evict neighbor entries on carrier down

The reason for this change is to prevent the ARP cache from being
cleared when a wireless device roams. Specifically, for wireless roams
the ARP cache should not be cleared because the underlying network has not
changed. Clearing the ARP cache in this case can introduce significant
delays in sending out packets after a roam.

A user reported such a situation here:

https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/

After some investigation it was found that the kernel was holding onto
packets until ARP finished, which resulted in a 1 second delay. It
was also found that the first ARP who-has was never responded to,
which is actually what causes the delay. This change more or less
works around this behavior, but again, there is no reason to clear
the cache on a roam anyway.

As for the unanswered who-has, we know the packet made it over the air
(OTA) since it was seen while monitoring. Why it never received a
response is unknown. In any case, since this is a problem on the AP side
of things, all that can be done is to work around it until it is solved.

Some background on testing/reproducing the packet delay:

Hardware:
- 2 access points configured for Fast BSS Transition (though I don't
see why regular reassociation wouldn't show the same behavior)
- Wireless station running IWD as supplicant
- A device on the network able to respond to pings (I used one of the APs)

Procedure:
- Connect to first AP
- Ping once to establish an ARP entry
- Start a tcpdump
- Roam to second AP
- Wait for operstate UP event, and note the timestamp
- Start pinging

Results:

Below is the tcpdump capture after UP. The interface was recorded going UP
at 10:42:01.432875.

10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46
10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64
10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64
10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64
10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64
10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64
10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64
10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64
10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64
10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64
10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64
10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64
10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64

You can see the first ARP who-has went out very quickly after UP, but
was never responded to. Nearly a second later the kernel retries and
gets a response. Only then do the ping packets go out. If an ARP entry
is manually added prior to UP (after the cache is cleared), the first
ping is likewise never responded to, so this is not only an issue with
ARP but with data packets in general.

As mentioned earlier, the wireless interface was also monitored to
verify that the ping/ARP packets made it OTA, and they did.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by James Prestwood and committed by Jakub Kicinski.
Commit fcdb44d0 (parent 1d6d336f): 6 files changed, +27 -1.
Documentation/networking/ip-sysctl.rst (+9)
···
 	gratuitous arp frame, the arp table will be updated regardless
 	if this setting is on or off.
 
+arp_evict_nocarrier - BOOLEAN
+	Clears the ARP cache on NOCARRIER events. This option is important for
+	wireless devices where the ARP cache should not be cleared when roaming
+	between access points on the same network. In most cases this should
+	remain as the default (1).
+
+	- 1 - (default): Clear the ARP cache on NOCARRIER events
+	- 0 - Do not clear ARP cache on NOCARRIER events
+
 mcast_solicit - INTEGER
 	The maximum number of multicast probes in INCOMPLETE state,
 	when the associated hardware address is unknown. Defaults
include/linux/inetdevice.h (+2)
···
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_ARP_EVICT_NOCARRIER(in_dev) IN_DEV_ANDCONF((in_dev), \
+						  ARP_EVICT_NOCARRIER)
 
 struct in_ifaddr {
 	struct hlist_node hash;
include/uapi/linux/ip.h (+1)
···
 	IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
 	IPV4_DEVCONF_DROP_GRATUITOUS_ARP,
 	IPV4_DEVCONF_BC_FORWARDING,
+	IPV4_DEVCONF_ARP_EVICT_NOCARRIER,
 	__IPV4_DEVCONF_MAX
 };
 
include/uapi/linux/sysctl.h (+1)
···
 	NET_IPV4_CONF_PROMOTE_SECONDARIES=20,
 	NET_IPV4_CONF_ARP_ACCEPT=21,
 	NET_IPV4_CONF_ARP_NOTIFY=22,
+	NET_IPV4_CONF_ARP_EVICT_NOCARRIER=23,
 };
 
 /* /proc/sys/net/ipv4/netfilter */
net/ipv4/arp.c (+10 -1)
···
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	struct netdev_notifier_change_info *change_info;
+	struct in_device *in_dev;
+	bool evict_nocarrier;
 
 	switch (event) {
 	case NETDEV_CHANGEADDR:
···
 		change_info = ptr;
 		if (change_info->flags_changed & IFF_NOARP)
 			neigh_changeaddr(&arp_tbl, dev);
-		if (!netif_carrier_ok(dev))
+
+		in_dev = __in_dev_get_rtnl(dev);
+		if (!in_dev)
+			evict_nocarrier = true;
+		else
+			evict_nocarrier = IN_DEV_ARP_EVICT_NOCARRIER(in_dev);
+
+		if (evict_nocarrier && !netif_carrier_ok(dev))
 			neigh_carrier_down(&arp_tbl, dev);
 		break;
 	default:
net/ipv4/devinet.c (+4)
···
 		[IPV4_DEVCONF_SHARED_MEDIA - 1] = 1,
 		[IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL - 1] = 10000 /*ms*/,
 		[IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL - 1] = 1000 /*ms*/,
+		[IPV4_DEVCONF_ARP_EVICT_NOCARRIER - 1] = 1,
 	},
 };
 
···
 		[IPV4_DEVCONF_ACCEPT_SOURCE_ROUTE - 1] = 1,
 		[IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL - 1] = 10000 /*ms*/,
 		[IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL - 1] = 1000 /*ms*/,
+		[IPV4_DEVCONF_ARP_EVICT_NOCARRIER - 1] = 1,
 	},
 };
 
···
 		DEVINET_SYSCTL_RW_ENTRY(ARP_IGNORE, "arp_ignore"),
 		DEVINET_SYSCTL_RW_ENTRY(ARP_ACCEPT, "arp_accept"),
 		DEVINET_SYSCTL_RW_ENTRY(ARP_NOTIFY, "arp_notify"),
+		DEVINET_SYSCTL_RW_ENTRY(ARP_EVICT_NOCARRIER,
+					"arp_evict_nocarrier"),
 		DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP_PVLAN, "proxy_arp_pvlan"),
 		DEVINET_SYSCTL_RW_ENTRY(FORCE_IGMP_VERSION,
 					"force_igmp_version"),