Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'icmp-add-rfc-5837-support'

Ido Schimmel says:

====================
icmp: Add RFC 5837 support

tl;dr
=====

This patchset extends certain ICMP error messages (e.g., "Time
Exceeded") with incoming interface information in accordance with RFC
5837 [1]. This is required for more meaningful traceroute results in
unnumbered networks. Like other ICMP settings, the feature is controlled
via a per-{netns, address family} sysctl. The interface and the
implementation are designed to support more ICMP extensions.

Motivation
==========

Over the years, the kernel was extended with the ability to derive the
source IP of ICMP error messages from the interface that received the
datagram which elicited the ICMP error [2][3][4]. This is especially
important for "Time Exceeded" messages as it allows traceroute users to
trace the actual packet path along the network.

The above scheme does not work in unnumbered networks. In these
networks, only the loopback / VRF interface is assigned a global IP
address while router interfaces are assigned IPv6 link-local addresses.
As such, ICMP error messages are generated with a source IP derived from
the loopback / VRF interface, making it impossible to trace the actual
packet path when parallel links exist between routers.

The problem can be solved by implementing the solution proposed by RFC
4884 [5] and RFC 5837. The former defines an ICMP extension structure
that can be appended to selected ICMP messages and carry extension
objects. The latter defines an extension object called the "Interface
Information Object" (IIO) that can carry interface information (e.g.,
name, index, MTU) about interfaces with certain roles such as the
interface that received the datagram which elicited the ICMP error.
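The RFC 4884 structure described above can be made concrete with a small encoder. The following is a minimal userspace sketch of the wire format, not the kernel implementation. The helpers put_ext_hdr() and put_obj_hdr() and the macro names are hypothetical. The field widths follow RFC 4884: the extension header holds a 4-bit version (2), 12 reserved bits and a 16-bit checksum, and each object header holds a 16-bit length, an 8-bit Class-Num and an 8-bit C-Type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace encoders for the RFC 4884 headers (not kernel code).
 * All multi-byte fields are written in network byte order. */
#define EXT_HDR_LEN 4
#define OBJ_HDR_LEN 4
#define EXT_VERSION 2
#define CLASS_IIO 2	/* Interface Information Object, RFC 5837 */

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Writes an extension header with the checksum left at zero. */
static size_t put_ext_hdr(uint8_t *buf)
{
	buf[0] = EXT_VERSION << 4;	/* version lives in the high nibble */
	buf[1] = 0;			/* remaining reserved bits */
	put_be16(buf + 2, 0);		/* checksum, computed last */
	return EXT_HDR_LEN;
}

/* Writes an object header; obj_len counts the header plus its payload. */
static size_t put_obj_hdr(uint8_t *buf, uint16_t obj_len, uint8_t class_num,
			  uint8_t c_type)
{
	put_be16(buf, obj_len);
	buf[2] = class_num;
	buf[3] = c_type;
	return OBJ_HDR_LEN;
}
```

Because each object's length includes its own 4-byte header, a receiver can skip objects it does not understand.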

The payload of the datagram that elicited the error (potentially padded
/ trimmed) along with the ICMP extension structure will be queued to the
error queue of the originating socket, thereby allowing traceroute
applications to parse and display the information encoded in the ICMP
extension structure. Example:

# traceroute6 -e 2001:db8:1::3
traceroute to 2001:db8:1::3 (2001:db8:1::3), 30 hops max, 80 byte packets
1 2001:db8:1::2 (2001:db8:1::2) <INC:11,"eth1",mtu=1500> 0.214 ms 0.171 ms 0.162 ms
2 2001:db8:1::3 (2001:db8:1::3) <INC:12,"eth2",mtu=1500> 0.154 ms 0.135 ms 0.127 ms

# traceroute -e 192.0.2.3
traceroute to 192.0.2.3 (192.0.2.3), 30 hops max, 60 byte packets
1 192.0.2.2 (192.0.2.2) <INC:11,"eth1",mtu=1500> 0.191 ms 0.148 ms 0.144 ms
2 192.0.2.3 (192.0.2.3) <INC:12,"eth2",mtu=1500> 0.137 ms 0.122 ms 0.114 ms
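To print output like the above, an application has to locate and walk the extension structure in the queued payload. The following is a minimal sketch of such a walk, not traceroute's actual code. It assumes that `buf` holds the quoted original datagram followed by the extension structure, and that `ext_off` is either the legacy fixed offset of 128 or the offset reported through the IP{,6}_RECVERR_RFC4884 socket options. The names walk_ext_objs(), struct ext_obj and ext_obj_cb are hypothetical, and checksum verification is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one extension object inside a received buffer. */
struct ext_obj {
	uint16_t len;		/* object length, 4-byte header included */
	uint8_t class_num;	/* e.g. 2 for an RFC 5837 IIO */
	uint8_t c_type;
	const uint8_t *data;	/* payload following the object header */
};

typedef void (*ext_obj_cb)(const struct ext_obj *obj, void *arg);

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Walks the extension structure that starts ext_off bytes into buf.
 * Returns the number of objects visited, or -1 if there is no version 2
 * extension header or an object length is inconsistent. The extension
 * checksum is not verified here. */
static int walk_ext_objs(const uint8_t *buf, size_t len, size_t ext_off,
			 ext_obj_cb cb, void *arg)
{
	size_t off = ext_off;
	int n = 0;

	if (len < 4 || off > len - 4 || (buf[off] >> 4) != 2)
		return -1;
	off += 4;	/* skip version, reserved bits and checksum */

	while (off < len) {
		struct ext_obj obj;

		if (len - off < 4)
			return -1;	/* truncated object header */
		obj.len = get_be16(buf + off);
		if (obj.len < 4 || obj.len > len - off)
			return -1;	/* object runs past the buffer */
		obj.class_num = buf[off + 2];
		obj.c_type = buf[off + 3];
		obj.data = buf + off + 4;
		if (cb)
			cb(&obj, arg);
		off += obj.len;
		n++;
	}
	return n;
}
```

A legacy parser would pass a fixed `ext_off` of 128, and that is the offset the kernel has to honor for such parsers.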

Implementation
==============

As previously stated, the feature is controlled via a per-{netns,
address family} sysctl. Specifically, the sysctl is a bitmask where each
bit controls the addition of a different ICMP extension to ICMP error
messages. Currently, only a single bit is supported: appending the
incoming interface information.
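The bitmask semantics can be sketched in userspace as follows. The enum mirrors the one that this patchset adds to include/linux/icmp.h. EXT_MASK_ALL, ext_mask_valid() and ext_enabled() are hypothetical helpers. The known bits are contiguous from bit 0, so the sysctl's min/max range check (0 to GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0)) has the same effect as rejecting any unknown bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the enum added to include/linux/icmp.h: one bit per extension. */
enum {
	ICMP_ERR_EXT_IIO_IIF,	/* RFC 5837 incoming interface, bit 0 */
	ICMP_ERR_EXT_COUNT,
};

/* Userspace stand-in for GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0). */
#define EXT_MASK_ALL ((uint8_t)((1u << ICMP_ERR_EXT_COUNT) - 1))

/* Range check as done by the sysctl's extra1/extra2 bounds. */
static bool ext_mask_valid(unsigned int val)
{
	return val <= EXT_MASK_ALL;
}

/* Tests whether a given extension is enabled in the mask. */
static bool ext_enabled(uint8_t mask, int ext)
{
	return mask & (1u << ext);
}
```

If a second extension were added before ICMP_ERR_EXT_COUNT, the upper bound would rise to 0x03 and the validation logic would not need to change.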

Key points:

1. Global knob vs finer control. I am not aware of users who require
finer control, but it is possible that some users will want to avoid
appending ICMP extensions when the packet is sent out of a specific
interface (e.g., the management interface) or to a specific subnet. This
can be accomplished via a tc-bpf program that trims the ICMP extension
structure. An example program can be found here [6].

2. Split implementation between IPv4 / IPv6. While the implementation is
currently similar, there are some differences between the two address
families. In addition, some extensions (e.g., RFC 8883 [7]) are
IPv6-specific. Given the above and given that the implementation is not
very complex, it makes sense to keep both implementations separate.

3. Compatibility with legacy applications. RFC 4884 from 2007 extended
certain ICMP messages with a length field that encodes the length of the
"original datagram" field, so that applications will be able to tell
where the "original datagram" ends and where the ICMP extension
structure starts.

Before the introduction of the IP{,6}_RECVERR_RFC4884 socket options
[8][9] in 2020, it was impossible for applications to know where the ICMP
extension structure starts. To this day, some applications assume that
it starts at offset 128, which is the minimum length of the "original
datagram" field as specified by RFC 4884.

Therefore, in order to be compatible with both legacy and modern
applications, the datagram that elicited the ICMP error is trimmed /
padded to 128 bytes before appending the ICMP extension structure.
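The trim / pad rule and the resulting RFC 4884 length field can be sketched as shown below. This is a simplified userspace illustration that operates on flat buffers. fill_orig_dgram() and the rfc4884_len_*() helpers are hypothetical names. RFC 4884 defines the word units: 32-bit words for ICMPv4 and 64-bit words for ICMPv6. These units match the values that the patches below write (128 / sizeof(u32) and 128 / sizeof(u64)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORIG_DGRAM_MIN_LEN 128	/* RFC 4884 minimum and legacy offset */

/* Copies the offending datagram into out, trimming or zero-padding it to
 * exactly ORIG_DGRAM_MIN_LEN bytes so that the extension structure always
 * starts at offset 128. Returns the number of bytes written. */
static size_t fill_orig_dgram(uint8_t *out, const uint8_t *dgram, size_t len)
{
	size_t copy = len < ORIG_DGRAM_MIN_LEN ? len : ORIG_DGRAM_MIN_LEN;

	memcpy(out, dgram, copy);
	memset(out + copy, 0, ORIG_DGRAM_MIN_LEN - copy);
	return ORIG_DGRAM_MIN_LEN;
}

/* RFC 4884 length field: ICMPv4 counts 32-bit words, ICMPv6 64-bit words. */
static uint8_t rfc4884_len_v4(void)
{
	return ORIG_DGRAM_MIN_LEN / 4;
}

static uint8_t rfc4884_len_v6(void)
{
	return ORIG_DGRAM_MIN_LEN / 8;
}
```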

This behavior is specifically called out by RFC 4884: "Those wishing to
be backward compatible with non-compliant TRACEROUTE implementations
will include exactly 128 octets" [10].

Note that in 128 bytes we should be able to include enough headers for
the originating node to match the ICMP error message with the relevant
socket. For example, the following headers will be present in the
"original datagram" field when a VXLAN encapsulated IPv6 packet elicits
an ICMP error in an IPv6 underlay: IPv6 (40) | UDP (8) | VXLAN (8) | Eth
(14) | IPv6 (40) | UDP (8). Overall, 118 bytes.
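The header budget in the VXLAN example can be checked with a simple sum. The array below restates the sizes listed above. It assumes there are no IP options, IPv6 extension headers or VLAN tags. vxlan_v6_stack and stack_len() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes (bytes) for a VXLAN-encapsulated IPv6/UDP packet in an IPv6
 * underlay, as listed in the text. Options and extension headers ignored. */
static const size_t vxlan_v6_stack[] = {
	40,	/* outer IPv6 */
	8,	/* outer UDP */
	8,	/* VXLAN */
	14,	/* inner Ethernet */
	40,	/* inner IPv6 */
	8,	/* inner UDP */
};

/* Sums n header sizes. */
static size_t stack_len(const size_t *hdrs, size_t n)
{
	size_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += hdrs[i];
	return sum;
}
```

An 802.1Q tag on the inner Ethernet frame would add 4 bytes, for a total of 122. This still fits within 128.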

If the 128-byte limit proves to be insufficient for some use case, we
can consider dedicating a new bit in the previously mentioned sysctl to
allow for more bytes to be included in the "original datagram" field.

4. Extensibility. This patchset adds partial support for a single ICMP
extension. However, the interface and the implementation should be able
to support more extensions, if needed. Examples:

* More interface information objects as part of RFC 5837. We should be
able to derive the outgoing interface information and nexthop IP from
the dst entry attached to the packet that elicited the error.

* Node identification object (e.g., hostname / loopback IP) [11].

* Extended Information object which encodes aggregate header limits as
part of RFC 8883.

A previous proposal from Ishaan Gandhi and Ron Bonica is available here
[12].

Testing
=======

The existing traceroute selftest is extended to test that ICMP
extensions are reported correctly when enabled. Both address families
are tested, each with different packet sizes, in order to make sure that
trimming / padding works correctly. Also tested that packets are parsed
correctly by the IP{,6}_RECVERR_RFC4884 socket options using Willem's
selftest [13].

Changelog
=========

Changes since v1 [14]:

* Patches #1-#2: Added a comment about field ordering and review tags.

* Patch #3: Converted "sysctl" to "echo" when testing the return value.
Added a check to skip the test if the traceroute version is older
than 2.1.5.

[1] https://datatracker.ietf.org/doc/html/rfc5837
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c2fb7f93cb20621772bf304f3dba0849942e5db
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fac6fce9bdb59837bb89930c3a92f5e0d1482f0b
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a8c416602d97a4e2073ed563d4d4c7627de19cf
[5] https://datatracker.ietf.org/doc/html/rfc4884
[6] https://gist.github.com/idosch/5013448cdb5e9e060e6bfdc8b433577c
[7] https://datatracker.ietf.org/doc/html/rfc8883
[8] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eba75c587e811d3249c8bd50d22bb2266ccd3c0f
[9] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=01370434df85eb76ecb1527a4466013c4aca2436
[10] https://datatracker.ietf.org/doc/html/rfc4884#section-5.3
[11] https://datatracker.ietf.org/doc/html/draft-ietf-intarea-extended-icmp-nodeid-04
[12] https://lore.kernel.org/netdev/20210317221959.4410-1-ishaangandhi@gmail.com/
[13] https://lore.kernel.org/netdev/aPpMItF35gwpgzZx@shredder/
[14] https://lore.kernel.org/netdev/20251022065349.434123-1-idosch@nvidia.com/
====================

Link: https://patch.msgid.link/20251027082232.232571-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+796 -3
+34
Documentation/networking/ip-sysctl.rst
···
 
 	Default: 0 (disabled)
 
+icmp_errors_extension_mask - UNSIGNED INTEGER
+	Bitmask of ICMP extensions to append to ICMPv4 error messages
+	("Destination Unreachable", "Time Exceeded" and "Parameter Problem").
+	The original datagram is trimmed / padded to 128 bytes in order to be
+	compatible with applications that do not comply with RFC 4884.
+
+	Possible extensions are:
+
+	==== ==============================================================
+	0x01 Incoming IP interface information according to RFC 5837.
+	     Extension will include the index, IPv4 address (if present),
+	     name and MTU of the IP interface that received the datagram
+	     which elicited the ICMP error.
+	==== ==============================================================
+
+	Default: 0x00 (no extensions)
+
 igmp_max_memberships - INTEGER
 	Change the maximum number of multicast groups we can subscribe to.
 	Default: 20
···
 	- 1 (enabled)
 
 	Default: 0 (disabled)
+
+errors_extension_mask - UNSIGNED INTEGER
+	Bitmask of ICMP extensions to append to ICMPv6 error messages
+	("Destination Unreachable" and "Time Exceeded"). The original datagram
+	is trimmed / padded to 128 bytes in order to be compatible with
+	applications that do not comply with RFC 4884.
+
+	Possible extensions are:
+
+	==== ==============================================================
+	0x01 Incoming IP interface information according to RFC 5837.
+	     Extension will include the index, IPv6 address (if present),
+	     name and MTU of the IP interface that received the datagram
+	     which elicited the ICMP error.
+	==== ==============================================================
+
+	Default: 0x00 (no extensions)
 
 xfrm6_gc_thresh - INTEGER
 	(Obsolete since linux-4.14)
+32
include/linux/icmp.h
···
 			   struct sock_ee_data_rfc4884 *out,
 			   int thlen, int off);
 
+/* RFC 4884 */
+#define ICMP_EXT_ORIG_DGRAM_MIN_LEN 128
+#define ICMP_EXT_VERSION_2 2
+
+/* ICMP Extension Object Classes */
+#define ICMP_EXT_OBJ_CLASS_IIO 2 /* RFC 5837 */
+
+/* Interface Information Object - RFC 5837 */
+enum {
+	ICMP_EXT_CTYPE_IIO_ROLE_IIF,
+};
+
+#define ICMP_EXT_CTYPE_IIO_ROLE(ROLE) ((ROLE) << 6)
+#define ICMP_EXT_CTYPE_IIO_MTU BIT(0)
+#define ICMP_EXT_CTYPE_IIO_NAME BIT(1)
+#define ICMP_EXT_CTYPE_IIO_IPADDR BIT(2)
+#define ICMP_EXT_CTYPE_IIO_IFINDEX BIT(3)
+
+struct icmp_ext_iio_name_subobj {
+	u8 len;
+	char name[IFNAMSIZ];
+};
+
+enum {
+	/* RFC 5837 - Incoming IP Interface Role */
+	ICMP_ERR_EXT_IIO_IIF,
+	/* Add new constants above. Used by "icmp_errors_extension_mask"
+	 * sysctl.
+	 */
+	ICMP_ERR_EXT_COUNT,
+};
+
 #endif /* _LINUX_ICMP_H */
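The following worked example shows how the C-Type macros above combine. This userspace sketch copies the macros and combines them the way the append functions in this patchset do. iio_iif_ctype() and iio_role() are hypothetical helpers. BIT() is redefined locally because the kernel macro is not available in userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of the C-Type constants from include/linux/icmp.h. */
#define BIT(n) (1u << (n))
#define ICMP_EXT_CTYPE_IIO_ROLE_IIF 0
#define ICMP_EXT_CTYPE_IIO_ROLE(ROLE) ((ROLE) << 6)
#define ICMP_EXT_CTYPE_IIO_MTU BIT(0)
#define ICMP_EXT_CTYPE_IIO_NAME BIT(1)
#define ICMP_EXT_CTYPE_IIO_IPADDR BIT(2)
#define ICMP_EXT_CTYPE_IIO_IFINDEX BIT(3)

/* Builds the C-Type octet for an incoming-interface IIO whose device was
 * found. The address bit is set only when a suitable address exists. */
static uint8_t iio_iif_ctype(int have_addr)
{
	uint8_t ctype = ICMP_EXT_CTYPE_IIO_ROLE(ICMP_EXT_CTYPE_IIO_ROLE_IIF) |
			ICMP_EXT_CTYPE_IIO_IFINDEX | ICMP_EXT_CTYPE_IIO_NAME |
			ICMP_EXT_CTYPE_IIO_MTU;

	if (have_addr)
		ctype |= ICMP_EXT_CTYPE_IIO_IPADDR;
	return ctype;
}

/* Decodes the 2-bit interface role from the top of the C-Type octet. */
static unsigned int iio_role(uint8_t ctype)
{
	return ctype >> 6;
}
```

So an IIO that carries all four fields has C-Type 0x0F, and one without an address has 0x0B. If the device lookup fails, the object carries only the ifIndex, which gives 0x08.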
+1
include/net/netns/ipv4.h
···
 	u8 sysctl_icmp_echo_ignore_broadcasts;
 	u8 sysctl_icmp_ignore_bogus_error_responses;
 	u8 sysctl_icmp_errors_use_inbound_ifaddr;
+	u8 sysctl_icmp_errors_extension_mask;
 	int sysctl_icmp_ratelimit;
 	int sysctl_icmp_ratemask;
 	int sysctl_icmp_msgs_per_sec;
+1
include/net/netns/ipv6.h
···
 	u8 skip_notify_on_dev_down;
 	u8 fib_notify_on_flag_change;
 	u8 icmpv6_error_anycast_as_unicast;
+	u8 icmpv6_errors_extension_mask;
 };
 
 struct netns_ipv6 {
+1
net/core/dev.c
···
 		strscpy(name, dev->name, IFNAMSIZ);
 	} while (read_seqretry(&netdev_rename_lock, seq));
 }
+EXPORT_IPV6_MOD_GPL(netdev_copy_name);
 
 /**
  *	netdev_get_name - get a netdevice name, knowing its ifindex.
+190 -1
net/ipv4/icmp.c
···
 	return ERR_PTR(err);
 }
 
+struct icmp_ext_iio_addr4_subobj {
+	__be16 afi;
+	__be16 reserved;
+	__be32 addr4;
+};
+
+static unsigned int icmp_ext_iio_len(void)
+{
+	return sizeof(struct icmp_extobj_hdr) +
+		/* ifIndex */
+		sizeof(__be32) +
+		/* Interface Address Sub-Object */
+		sizeof(struct icmp_ext_iio_addr4_subobj) +
+		/* Interface Name Sub-Object. Length must be a multiple of 4
+		 * bytes.
+		 */
+		ALIGN(sizeof(struct icmp_ext_iio_name_subobj), 4) +
+		/* MTU */
+		sizeof(__be32);
+}
+
+static unsigned int icmp_ext_max_len(u8 ext_objs)
+{
+	unsigned int ext_max_len;
+
+	ext_max_len = sizeof(struct icmp_ext_hdr);
+
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		ext_max_len += icmp_ext_iio_len();
+
+	return ext_max_len;
+}
+
+static __be32 icmp_ext_iio_addr4_find(const struct net_device *dev)
+{
+	struct in_device *in_dev;
+	struct in_ifaddr *ifa;
+
+	in_dev = __in_dev_get_rcu(dev);
+	if (!in_dev)
+		return 0;
+
+	/* It is unclear from RFC 5837 which IP address should be chosen, but
+	 * it makes sense to choose a global unicast address.
+	 */
+	in_dev_for_each_ifa_rcu(ifa, in_dev) {
+		if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
+			continue;
+		if (ifa->ifa_scope != RT_SCOPE_UNIVERSE ||
+		    ipv4_is_multicast(ifa->ifa_address))
+			continue;
+		return ifa->ifa_address;
+	}
+
+	return 0;
+}
+
+static void icmp_ext_iio_iif_append(struct net *net, struct sk_buff *skb,
+				    int iif)
+{
+	struct icmp_ext_iio_name_subobj *name_subobj;
+	struct icmp_extobj_hdr *objh;
+	struct net_device *dev;
+	__be32 data;
+
+	if (!iif)
+		return;
+
+	/* Add the fields in the order specified by RFC 5837. */
+	objh = skb_put(skb, sizeof(*objh));
+	objh->class_num = ICMP_EXT_OBJ_CLASS_IIO;
+	objh->class_type = ICMP_EXT_CTYPE_IIO_ROLE(ICMP_EXT_CTYPE_IIO_ROLE_IIF);
+
+	data = htonl(iif);
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_IFINDEX;
+
+	rcu_read_lock();
+
+	dev = dev_get_by_index_rcu(net, iif);
+	if (!dev)
+		goto out;
+
+	data = icmp_ext_iio_addr4_find(dev);
+	if (data) {
+		struct icmp_ext_iio_addr4_subobj *addr4_subobj;
+
+		addr4_subobj = skb_put_zero(skb, sizeof(*addr4_subobj));
+		addr4_subobj->afi = htons(ICMP_AFI_IP);
+		addr4_subobj->addr4 = data;
+		objh->class_type |= ICMP_EXT_CTYPE_IIO_IPADDR;
+	}
+
+	name_subobj = skb_put_zero(skb, ALIGN(sizeof(*name_subobj), 4));
+	name_subobj->len = ALIGN(sizeof(*name_subobj), 4);
+	netdev_copy_name(dev, name_subobj->name);
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_NAME;
+
+	data = htonl(READ_ONCE(dev->mtu));
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_MTU;
+
+out:
+	rcu_read_unlock();
+	objh->length = htons(skb_tail_pointer(skb) - (unsigned char *)objh);
+}
+
+static void icmp_ext_objs_append(struct net *net, struct sk_buff *skb,
+				 u8 ext_objs, int iif)
+{
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		icmp_ext_iio_iif_append(net, skb, iif);
+}
+
+static struct sk_buff *
+icmp_ext_append(struct net *net, struct sk_buff *skb_in, struct icmphdr *icmph,
+		unsigned int room, int iif)
+{
+	unsigned int payload_len, ext_max_len, ext_len;
+	struct icmp_ext_hdr *ext_hdr;
+	struct sk_buff *skb;
+	u8 ext_objs;
+	int nhoff;
+
+	switch (icmph->type) {
+	case ICMP_DEST_UNREACH:
+	case ICMP_TIME_EXCEEDED:
+	case ICMP_PARAMETERPROB:
+		break;
+	default:
+		return NULL;
+	}
+
+	ext_objs = READ_ONCE(net->ipv4.sysctl_icmp_errors_extension_mask);
+	if (!ext_objs)
+		return NULL;
+
+	ext_max_len = icmp_ext_max_len(ext_objs);
+	if (ICMP_EXT_ORIG_DGRAM_MIN_LEN + ext_max_len > room)
+		return NULL;
+
+	skb = skb_clone(skb_in, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	nhoff = skb_network_offset(skb);
+	payload_len = min(skb->len - nhoff, ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+	if (!pskb_network_may_pull(skb, payload_len))
+		goto free_skb;
+
+	if (pskb_trim(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN) ||
+	    __skb_put_padto(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN, false))
+		goto free_skb;
+
+	if (pskb_expand_head(skb, 0, ext_max_len, GFP_ATOMIC))
+		goto free_skb;
+
+	ext_hdr = skb_put_zero(skb, sizeof(*ext_hdr));
+	ext_hdr->version = ICMP_EXT_VERSION_2;
+
+	icmp_ext_objs_append(net, skb, ext_objs, iif);
+
+	/* Do not send an empty extension structure. */
+	ext_len = skb_tail_pointer(skb) - (unsigned char *)ext_hdr;
+	if (ext_len == sizeof(*ext_hdr))
+		goto free_skb;
+
+	ext_hdr->checksum = ip_compute_csum(ext_hdr, ext_len);
+	/* The length of the original datagram in 32-bit words (RFC 4884). */
+	icmph->un.reserved[1] = ICMP_EXT_ORIG_DGRAM_MIN_LEN / sizeof(u32);
+
+	return skb;
+
+free_skb:
+	consume_skb(skb);
+	return NULL;
+}
+
 /*
  *	Send an ICMP message in response to a situation
  *
···
 	struct icmp_bxm icmp_param;
 	struct rtable *rt = skb_rtable(skb_in);
 	bool apply_ratelimit = false;
+	struct sk_buff *ext_skb;
 	struct ipcm_cookie ipc;
 	struct flowi4 fl4;
 	__be32 saddr;
···
 	if (room <= (int)sizeof(struct iphdr))
 		goto ende;
 
-	icmp_param.data_len = skb_in->len - icmp_param.offset;
+	ext_skb = icmp_ext_append(net, skb_in, &icmp_param.data.icmph, room,
+				  parm->iif);
+	if (ext_skb)
+		icmp_param.skb = ext_skb;
+
+	icmp_param.data_len = icmp_param.skb->len - icmp_param.offset;
 	if (icmp_param.data_len > room)
 		icmp_param.data_len = room;
 	icmp_param.head_len = sizeof(struct icmphdr);
···
 	trace_icmp_send(skb_in, type, code);
 
 	icmp_push_reply(sk, &icmp_param, &fl4, &ipc, &rt);
+
+	if (ext_skb)
+		consume_skb(ext_skb);
 ende:
 	ip_rt_put(rt);
 out_unlock:
···
 	net->ipv4.sysctl_icmp_ratelimit = 1 * HZ;
 	net->ipv4.sysctl_icmp_ratemask = 0x1818;
 	net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr = 0;
+	net->ipv4.sysctl_icmp_errors_extension_mask = 0;
 	net->ipv4.sysctl_icmp_msgs_per_sec = 1000;
 	net->ipv4.sysctl_icmp_msgs_burst = 50;
 
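The worst-case extension length that icmp_ext_append() checks against `room` can be worked out by hand. The sketch below restates the terms of icmp_ext_iio_len() as plain byte counts. It assumes that IFNAMSIZ is 16, as on Linux. The enum names, ALIGN4() and ext_fits() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IFNAMSIZ 16			/* Linux interface name buffer size */
#define ALIGN4(x) (((x) + 3u) & ~(size_t)3)
#define ORIG_DGRAM_MIN_LEN 128

/* Byte counts mirroring icmp_ext_iio_len() for IPv4. */
enum {
	EXT_HDR_LEN = 4,		/* struct icmp_ext_hdr */
	OBJ_HDR_LEN = 4,		/* struct icmp_extobj_hdr */
	IFINDEX_LEN = 4,
	ADDR4_SUBOBJ_LEN = 2 + 2 + 4,	/* AFI, reserved, IPv4 address */
	NAME_SUBOBJ_LEN = 1 + IFNAMSIZ,	/* length octet + name buffer */
	MTU_LEN = 4,
};

/* Worst-case size of one IPv4 incoming-interface IIO. */
static size_t iio4_len(void)
{
	return OBJ_HDR_LEN + IFINDEX_LEN + ADDR4_SUBOBJ_LEN +
	       ALIGN4(NAME_SUBOBJ_LEN) + MTU_LEN;
}

/* Mirrors the room check in icmp_ext_append(): skip extensions when the
 * padded datagram plus the worst-case extension would not fit. */
static int ext_fits(size_t room)
{
	return ORIG_DGRAM_MIN_LEN + EXT_HDR_LEN + iio4_len() <= room;
}
```

This gives a 40-byte IIO and a 44-byte worst-case extension structure. On IPv4, at least 172 bytes of room are therefore needed before any extension is appended.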
+11
net/ipv4/sysctl_net_ipv4.c
···
 static int tcp_plb_max_cong_thresh = 256;
 static unsigned int tcp_tw_reuse_delay_max = TCP_PAWS_MSL * MSEC_PER_SEC;
 static int tcp_ecn_mode_max = 2;
+static u32 icmp_errors_extension_mask_all =
+	GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0);
 
 /* obsolete */
 static int sysctl_tcp_low_latency __read_mostly;
···
 		.proc_handler = proc_dou8vec_minmax,
 		.extra1 = SYSCTL_ZERO,
 		.extra2 = SYSCTL_ONE
+	},
+	{
+		.procname = "icmp_errors_extension_mask",
+		.data = &init_net.ipv4.sysctl_icmp_errors_extension_mask,
+		.maxlen = sizeof(u8),
+		.mode = 0644,
+		.proc_handler = proc_dou8vec_minmax,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = &icmp_errors_extension_mask_all,
 	},
 	{
 		.procname = "icmp_ratelimit",
+1
net/ipv6/af_inet6.c
···
 	net->ipv6.sysctl.icmpv6_echo_ignore_multicast = 0;
 	net->ipv6.sysctl.icmpv6_echo_ignore_anycast = 0;
 	net->ipv6.sysctl.icmpv6_error_anycast_as_unicast = 0;
+	net->ipv6.sysctl.icmpv6_errors_extension_mask = 0;
 
 	/* By default, rate limit error messages.
 	 * Except for pmtu discovery, it would break it.
+212 -2
net/ipv6/icmp.c
···
 	return icmp6_dev(skb)->ifindex;
 }
 
+struct icmp6_ext_iio_addr6_subobj {
+	__be16 afi;
+	__be16 reserved;
+	struct in6_addr addr6;
+};
+
+static unsigned int icmp6_ext_iio_len(void)
+{
+	return sizeof(struct icmp_extobj_hdr) +
+		/* ifIndex */
+		sizeof(__be32) +
+		/* Interface Address Sub-Object */
+		sizeof(struct icmp6_ext_iio_addr6_subobj) +
+		/* Interface Name Sub-Object. Length must be a multiple of 4
+		 * bytes.
+		 */
+		ALIGN(sizeof(struct icmp_ext_iio_name_subobj), 4) +
+		/* MTU */
+		sizeof(__be32);
+}
+
+static unsigned int icmp6_ext_max_len(u8 ext_objs)
+{
+	unsigned int ext_max_len;
+
+	ext_max_len = sizeof(struct icmp_ext_hdr);
+
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		ext_max_len += icmp6_ext_iio_len();
+
+	return ext_max_len;
+}
+
+static struct in6_addr *icmp6_ext_iio_addr6_find(const struct net_device *dev)
+{
+	struct inet6_dev *in6_dev;
+	struct inet6_ifaddr *ifa;
+
+	in6_dev = __in6_dev_get(dev);
+	if (!in6_dev)
+		return NULL;
+
+	/* It is unclear from RFC 5837 which IP address should be chosen, but
+	 * it makes sense to choose a global unicast address.
+	 */
+	list_for_each_entry_rcu(ifa, &in6_dev->addr_list, if_list) {
+		if (ifa->flags & (IFA_F_TENTATIVE | IFA_F_DADFAILED))
+			continue;
+		if (ipv6_addr_type(&ifa->addr) != IPV6_ADDR_UNICAST ||
+		    ipv6_addr_src_scope(&ifa->addr) != IPV6_ADDR_SCOPE_GLOBAL)
+			continue;
+		return &ifa->addr;
+	}
+
+	return NULL;
+}
+
+static void icmp6_ext_iio_iif_append(struct net *net, struct sk_buff *skb,
+				     int iif)
+{
+	struct icmp_ext_iio_name_subobj *name_subobj;
+	struct icmp_extobj_hdr *objh;
+	struct net_device *dev;
+	struct in6_addr *addr6;
+	__be32 data;
+
+	if (!iif)
+		return;
+
+	/* Add the fields in the order specified by RFC 5837. */
+	objh = skb_put(skb, sizeof(*objh));
+	objh->class_num = ICMP_EXT_OBJ_CLASS_IIO;
+	objh->class_type = ICMP_EXT_CTYPE_IIO_ROLE(ICMP_EXT_CTYPE_IIO_ROLE_IIF);
+
+	data = htonl(iif);
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_IFINDEX;
+
+	rcu_read_lock();
+
+	dev = dev_get_by_index_rcu(net, iif);
+	if (!dev)
+		goto out;
+
+	addr6 = icmp6_ext_iio_addr6_find(dev);
+	if (addr6) {
+		struct icmp6_ext_iio_addr6_subobj *addr6_subobj;
+
+		addr6_subobj = skb_put_zero(skb, sizeof(*addr6_subobj));
+		addr6_subobj->afi = htons(ICMP_AFI_IP6);
+		addr6_subobj->addr6 = *addr6;
+		objh->class_type |= ICMP_EXT_CTYPE_IIO_IPADDR;
+	}
+
+	name_subobj = skb_put_zero(skb, ALIGN(sizeof(*name_subobj), 4));
+	name_subobj->len = ALIGN(sizeof(*name_subobj), 4);
+	netdev_copy_name(dev, name_subobj->name);
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_NAME;
+
+	data = htonl(READ_ONCE(dev->mtu));
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_MTU;
+
+out:
+	rcu_read_unlock();
+	objh->length = htons(skb_tail_pointer(skb) - (unsigned char *)objh);
+}
+
+static void icmp6_ext_objs_append(struct net *net, struct sk_buff *skb,
+				  u8 ext_objs, int iif)
+{
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		icmp6_ext_iio_iif_append(net, skb, iif);
+}
+
+static struct sk_buff *
+icmp6_ext_append(struct net *net, struct sk_buff *skb_in,
+		 struct icmp6hdr *icmp6h, unsigned int room, int iif)
+{
+	unsigned int payload_len, ext_max_len, ext_len;
+	struct icmp_ext_hdr *ext_hdr;
+	struct sk_buff *skb;
+	u8 ext_objs;
+	int nhoff;
+
+	switch (icmp6h->icmp6_type) {
+	case ICMPV6_DEST_UNREACH:
+	case ICMPV6_TIME_EXCEED:
+		break;
+	default:
+		return NULL;
+	}
+
+	/* Do not overwrite existing extensions. This can happen when we
+	 * receive an ICMPv4 message with extensions from a tunnel and
+	 * translate it to an ICMPv6 message towards an IPv6 host in the
+	 * overlay network.
+	 */
+	if (icmp6h->icmp6_datagram_len)
+		return NULL;
+
+	ext_objs = READ_ONCE(net->ipv6.sysctl.icmpv6_errors_extension_mask);
+	if (!ext_objs)
+		return NULL;
+
+	ext_max_len = icmp6_ext_max_len(ext_objs);
+	if (ICMP_EXT_ORIG_DGRAM_MIN_LEN + ext_max_len > room)
+		return NULL;
+
+	skb = skb_clone(skb_in, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	nhoff = skb_network_offset(skb);
+	payload_len = min(skb->len - nhoff, ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+	if (!pskb_network_may_pull(skb, payload_len))
+		goto free_skb;
+
+	if (pskb_trim(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN) ||
+	    __skb_put_padto(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN, false))
+		goto free_skb;
+
+	if (pskb_expand_head(skb, 0, ext_max_len, GFP_ATOMIC))
+		goto free_skb;
+
+	ext_hdr = skb_put_zero(skb, sizeof(*ext_hdr));
+	ext_hdr->version = ICMP_EXT_VERSION_2;
+
+	icmp6_ext_objs_append(net, skb, ext_objs, iif);
+
+	/* Do not send an empty extension structure. */
+	ext_len = skb_tail_pointer(skb) - (unsigned char *)ext_hdr;
+	if (ext_len == sizeof(*ext_hdr))
+		goto free_skb;
+
+	ext_hdr->checksum = ip_compute_csum(ext_hdr, ext_len);
+	/* The length of the original datagram in 64-bit words (RFC 4884). */
+	icmp6h->icmp6_datagram_len = ICMP_EXT_ORIG_DGRAM_MIN_LEN / sizeof(u64);
+
+	return skb;
+
+free_skb:
+	consume_skb(skb);
+	return NULL;
+}
+
 /*
  *	Send an ICMP message in response to a packet in error
  */
···
 	struct ipv6_pinfo *np;
 	const struct in6_addr *saddr = NULL;
 	bool apply_ratelimit = false;
+	struct sk_buff *ext_skb;
 	struct dst_entry *dst;
+	unsigned int room;
 	struct icmp6hdr tmp_hdr;
 	struct flowi6 fl6;
 	struct icmpv6_msg msg;
···
 	msg.offset = skb_network_offset(skb);
 	msg.type = type;
 
-	len = skb->len - msg.offset;
-	len = min_t(unsigned int, len, IPV6_MIN_MTU - sizeof(struct ipv6hdr) - sizeof(struct icmp6hdr));
+	room = IPV6_MIN_MTU - sizeof(struct ipv6hdr) - sizeof(struct icmp6hdr);
+	ext_skb = icmp6_ext_append(net, skb, &tmp_hdr, room, parm->iif);
+	if (ext_skb)
+		msg.skb = ext_skb;
+
+	len = msg.skb->len - msg.offset;
+	len = min_t(unsigned int, len, room);
 	if (len < 0) {
 		net_dbg_ratelimited("icmp: len problem [%pI6c > %pI6c]\n",
 				    &hdr->saddr, &hdr->daddr);
···
 	}
 
 out_dst_release:
+	if (ext_skb)
+		consume_skb(ext_skb);
 	dst_release(dst);
 out_unlock:
 	icmpv6_xmit_unlock(sk);
···
 EXPORT_SYMBOL(icmpv6_err_convert);
 
 #ifdef CONFIG_SYSCTL
+
+static u32 icmpv6_errors_extension_mask_all =
+	GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0);
+
 static struct ctl_table ipv6_icmp_table_template[] = {
 	{
 		.procname = "ratelimit",
···
 		.extra1 = SYSCTL_ZERO,
 		.extra2 = SYSCTL_ONE,
 	},
+	{
+		.procname = "errors_extension_mask",
+		.data = &init_net.ipv6.sysctl.icmpv6_errors_extension_mask,
+		.maxlen = sizeof(u8),
+		.mode = 0644,
+		.proc_handler = proc_dou8vec_minmax,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = &icmpv6_errors_extension_mask_all,
+	},
 };
 
 struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
···
 		table[3].data = &net->ipv6.sysctl.icmpv6_echo_ignore_anycast;
 		table[4].data = &net->ipv6.sysctl.icmpv6_ratemask_ptr;
 		table[5].data = &net->ipv6.sysctl.icmpv6_error_anycast_as_unicast;
+		table[6].data = &net->ipv6.sysctl.icmpv6_errors_extension_mask;
 	}
 	return table;
 }
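Both append functions finish by filling in `ext_hdr->checksum` with ip_compute_csum() over the extension structure. The following is a portable sketch of that step. It assumes the standard RFC 1071 one's-complement algorithm that ip_compute_csum() implements. inet_csum() and ext_set_csum() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement checksum over buf, with 16-bit words read in
 * network byte order. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries */
	return (uint16_t)~sum;
}

/* Fills in the checksum of an RFC 4884 extension structure in place:
 * zero the field, sum header plus objects, store the result big-endian. */
static void ext_set_csum(uint8_t *ext, size_t ext_len)
{
	uint16_t csum;

	ext[2] = ext[3] = 0;
	csum = inet_csum(ext, ext_len);
	ext[2] = (uint8_t)(csum >> 8);
	ext[3] = (uint8_t)(csum & 0xff);
}
```

A receiver can validate the structure by running the same sum over it with the checksum in place. An intact structure sums to zero.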
+313
tools/testing/selftests/net/traceroute.sh
··· 36 36 return $rc 37 37 } 38 38 39 + __check_traceroute_version() 40 + { 41 + local cmd=$1; shift 42 + local req_ver=$1; shift 43 + local ver 44 + 45 + req_ver=$(echo "$req_ver" | sed 's/\.//g') 46 + ver=$($cmd -V 2>&1 | grep -Eo '[0-9]+.[0-9]+.[0-9]+' | sed 's/\.//g') 47 + if [[ $ver -lt $req_ver ]]; then 48 + return 1 49 + else 50 + return 0 51 + fi 52 + } 53 + 54 + check_traceroute6_version() 55 + { 56 + local req_ver=$1; shift 57 + 58 + __check_traceroute_version traceroute6 "$req_ver" 59 + } 60 + 61 + check_traceroute_version() 62 + { 63 + local req_ver=$1; shift 64 + 65 + __check_traceroute_version traceroute "$req_ver" 66 + } 67 + 39 68 ################################################################################ 40 69 # create namespaces and interconnects 41 70 ··· 88 59 ip netns exec ${ns} ip -6 ro add unreachable default metric 8192 89 60 90 61 ip netns exec ${ns} sysctl -qw net.ipv4.ip_forward=1 62 + ip netns exec ${ns} sysctl -qw net.ipv4.icmp_ratelimit=0 63 + ip netns exec ${ns} sysctl -qw net.ipv6.icmp.ratelimit=0 91 64 ip netns exec ${ns} sysctl -qw net.ipv6.conf.all.keep_addr_on_down=1 92 65 ip netns exec ${ns} sysctl -qw net.ipv6.conf.all.forwarding=1 93 66 ip netns exec ${ns} sysctl -qw net.ipv6.conf.default.forwarding=1 ··· 329 298 } 330 299 331 300 ################################################################################ 301 + # traceroute6 with ICMP extensions test 302 + # 303 + # Verify that in this scenario 304 + # 305 + # ---- ---- ---- 306 + # |H1|--------------------------|R1|--------------------------|H2| 307 + # ---- N1 ---- N2 ---- 308 + # 309 + # ICMP extensions are correctly reported. The loopback interfaces on all the 310 + # nodes are assigned global addresses and the interfaces connecting the nodes 311 + # are assigned IPv6 link-local addresses. 
+
+cleanup_traceroute6_ext()
+{
+	cleanup_all_ns
+}
+
+setup_traceroute6_ext()
+{
+	# Start clean
+	cleanup_traceroute6_ext
+
+	setup_ns h1 r1 h2
+	create_ns "$h1"
+	create_ns "$r1"
+	create_ns "$h2"
+
+	# Setup N1
+	connect_ns "$h1" eth1 - fe80::1/64 "$r1" eth1 - fe80::2/64
+	# Setup N2
+	connect_ns "$r1" eth2 - fe80::3/64 "$h2" eth2 - fe80::4/64
+
+	# Setup H1
+	ip -n "$h1" address add 2001:db8:1::1/128 dev lo
+	ip -n "$h1" route add ::/0 nexthop via fe80::2 dev eth1
+
+	# Setup R1
+	ip -n "$r1" address add 2001:db8:1::2/128 dev lo
+	ip -n "$r1" route add 2001:db8:1::1/128 nexthop via fe80::1 dev eth1
+	ip -n "$r1" route add 2001:db8:1::3/128 nexthop via fe80::4 dev eth2
+
+	# Setup H2
+	ip -n "$h2" address add 2001:db8:1::3/128 dev lo
+	ip -n "$h2" route add ::/0 nexthop via fe80::3 dev eth2
+
+	# Prime the network
+	ip netns exec "$h1" ping6 -c5 2001:db8:1::3 >/dev/null 2>&1
+}
+
+traceroute6_ext_iio_iif_test()
+{
+	local r1_ifindex h2_ifindex
+	local pkt_len=$1; shift
+
+	# Test that incoming interface info is not appended by default.
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended by default when should not"
+
+	# Test that the extension is appended when enabled.
+	run_cmd "$r1" "bash -c \"echo 0x01 > /proc/sys/net/ipv6/icmp/errors_extension_mask\""
+	check_err $? "Failed to enable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep INC"
+	check_err $? "Incoming interface info not appended after enable"
+
+	# Test that the extension is not appended when disabled.
+	run_cmd "$r1" "bash -c \"echo 0x00 > /proc/sys/net/ipv6/icmp/errors_extension_mask\""
+	check_err $? "Failed to disable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended after disable"
+
+	# Test that the extension is sent correctly from both R1 and H2.
+	run_cmd "$r1" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x01"
+	r1_ifindex=$(ip -n "$r1" -j link show dev eth1 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1"
+
+	run_cmd "$h2" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x01"
+	h2_ifindex=$(ip -n "$h2" -j link show dev eth2 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$h2_ifindex,\"eth2\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from H2"
+
+	# Add a global address on the incoming interface of R1 and check that
+	# it is reported.
+	run_cmd "$r1" "ip address add 2001:db8:100::1/64 dev eth1 nodad"
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$r1_ifindex,2001:db8:100::1,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1 after address addition"
+	run_cmd "$r1" "ip address del 2001:db8:100::1/64 dev eth1"
+
+	# Change name and MTU and make sure the result is still correct.
+	run_cmd "$r1" "ip link set dev eth1 name eth1tag mtu 1501"
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1tag\",mtu=1501>'"
+	check_err $? "Wrong incoming interface info reported from R1 after name and MTU change"
+	run_cmd "$r1" "ip link set dev eth1tag name eth1 mtu 1500"
+
+	run_cmd "$r1" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x00"
+	run_cmd "$h2" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x00"
+}
+
+run_traceroute6_ext()
+{
+	# Need at least version 2.1.5 for RFC 5837 support.
+	if ! check_traceroute6_version 2.1.5; then
+		log_test_skip "traceroute6 too old, missing ICMP extensions support"
+		return
+	fi
+
+	setup_traceroute6_ext
+
+	RET=0
+
+	## General ICMP extensions tests
+
+	# Test that ICMP extensions are disabled by default.
+	run_cmd "$h1" "sysctl net.ipv6.icmp.errors_extension_mask | grep \"= 0$\""
+	check_err $? "ICMP extensions are not disabled by default"
+
+	# Test that unsupported values are rejected. Do not use "sysctl" as
+	# older versions do not return an error code upon failure.
+	run_cmd "$h1" "bash -c \"echo 0x80 > /proc/sys/net/ipv6/icmp/errors_extension_mask\""
+	check_fail $? "Unsupported sysctl value was not rejected"
+
+	## Extension-specific tests
+
+	# Incoming interface info test. Test with various packet sizes,
+	# including the default one.
+	traceroute6_ext_iio_iif_test
+	traceroute6_ext_iio_iif_test 127
+	traceroute6_ext_iio_iif_test 128
+	traceroute6_ext_iio_iif_test 129
+
+	log_test "IPv6 traceroute with ICMP extensions"
+
+	cleanup_traceroute6_ext
+}
+
+################################################################################
 # traceroute test
 #
 # Verify that traceroute from H1 to H2 shows 1.0.3.1 and 1.0.1.1 when
···
 }
 
 ################################################################################
+# traceroute with ICMP extensions test
+#
+# Verify that in this scenario
+#
+# ----                          ----                          ----
+# |H1|--------------------------|R1|--------------------------|H2|
+# ----            N1            ----            N2            ----
+#
+# ICMP extensions are correctly reported. The loopback interfaces on all the
+# nodes are assigned global addresses and the interfaces connecting the nodes
+# are assigned IPv6 link-local addresses.
+
+cleanup_traceroute_ext()
+{
+	cleanup_all_ns
+}
+
+setup_traceroute_ext()
+{
+	# Start clean
+	cleanup_traceroute_ext
+
+	setup_ns h1 r1 h2
+	create_ns "$h1"
+	create_ns "$r1"
+	create_ns "$h2"
+
+	# Setup N1
+	connect_ns "$h1" eth1 - fe80::1/64 "$r1" eth1 - fe80::2/64
+	# Setup N2
+	connect_ns "$r1" eth2 - fe80::3/64 "$h2" eth2 - fe80::4/64
+
+	# Setup H1
+	ip -n "$h1" address add 192.0.2.1/32 dev lo
+	ip -n "$h1" route add 0.0.0.0/0 nexthop via inet6 fe80::2 dev eth1
+
+	# Setup R1
+	ip -n "$r1" address add 192.0.2.2/32 dev lo
+	ip -n "$r1" route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev eth1
+	ip -n "$r1" route add 192.0.2.3/32 nexthop via inet6 fe80::4 dev eth2
+
+	# Setup H2
+	ip -n "$h2" address add 192.0.2.3/32 dev lo
+	ip -n "$h2" route add 0.0.0.0/0 nexthop via inet6 fe80::3 dev eth2
+
+	# Prime the network
+	ip netns exec "$h1" ping -c5 192.0.2.3 >/dev/null 2>&1
+}
+
+traceroute_ext_iio_iif_test()
+{
+	local r1_ifindex h2_ifindex
+	local pkt_len=$1; shift
+
+	# Test that incoming interface info is not appended by default.
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended by default when should not"
+
+	# Test that the extension is appended when enabled.
+	run_cmd "$r1" "bash -c \"echo 0x01 > /proc/sys/net/ipv4/icmp_errors_extension_mask\""
+	check_err $? "Failed to enable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep INC"
+	check_err $? "Incoming interface info not appended after enable"
+
+	# Test that the extension is not appended when disabled.
+	run_cmd "$r1" "bash -c \"echo 0x00 > /proc/sys/net/ipv4/icmp_errors_extension_mask\""
+	check_err $? "Failed to disable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended after disable"
+
+	# Test that the extension is sent correctly from both R1 and H2.
+	run_cmd "$r1" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x01"
+	r1_ifindex=$(ip -n "$r1" -j link show dev eth1 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1"
+
+	run_cmd "$h2" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x01"
+	h2_ifindex=$(ip -n "$h2" -j link show dev eth2 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$h2_ifindex,\"eth2\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from H2"
+
+	# Add a global address on the incoming interface of R1 and check that
+	# it is reported.
+	run_cmd "$r1" "ip address add 198.51.100.1/24 dev eth1"
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$r1_ifindex,198.51.100.1,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1 after address addition"
+	run_cmd "$r1" "ip address del 198.51.100.1/24 dev eth1"
+
+	# Change name and MTU and make sure the result is still correct.
+	# Re-add the route towards H1 since it was deleted when we removed the
+	# last IPv4 address from eth1 on R1.
+	run_cmd "$r1" "ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev eth1"
+	run_cmd "$r1" "ip link set dev eth1 name eth1tag mtu 1501"
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1tag\",mtu=1501>'"
+	check_err $? "Wrong incoming interface info reported from R1 after name and MTU change"
+	run_cmd "$r1" "ip link set dev eth1tag name eth1 mtu 1500"
+
+	run_cmd "$r1" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x00"
+	run_cmd "$h2" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x00"
+}
+
+run_traceroute_ext()
+{
+	# Need at least version 2.1.5 for RFC 5837 support.
+	if ! check_traceroute_version 2.1.5; then
+		log_test_skip "traceroute too old, missing ICMP extensions support"
+		return
+	fi
+
+	setup_traceroute_ext
+
+	RET=0
+
+	## General ICMP extensions tests
+
+	# Test that ICMP extensions are disabled by default.
+	run_cmd "$h1" "sysctl net.ipv4.icmp_errors_extension_mask | grep \"= 0$\""
+	check_err $? "ICMP extensions are not disabled by default"
+
+	# Test that unsupported values are rejected. Do not use "sysctl" as
+	# older versions do not return an error code upon failure.
+	run_cmd "$h1" "bash -c \"echo 0x80 > /proc/sys/net/ipv4/icmp_errors_extension_mask\""
+	check_fail $? "Unsupported sysctl value was not rejected"
+
+	## Extension-specific tests
+
+	# Incoming interface info test. Test with various packet sizes,
+	# including the default one.
+	traceroute_ext_iio_iif_test
+	traceroute_ext_iio_iif_test 127
+	traceroute_ext_iio_iif_test 128
+	traceroute_ext_iio_iif_test 129
+
+	log_test "IPv4 traceroute with ICMP extensions"
+
+	cleanup_traceroute_ext
+}
+
+################################################################################
 # Run tests
 
 run_tests()
 {
 	run_traceroute6
 	run_traceroute6_vrf
+	run_traceroute6_ext
 	run_traceroute
 	run_traceroute_vrf
+	run_traceroute_ext
 }
 
 ################################################################################
···
 
 require_command traceroute6
 require_command traceroute
+require_command jq
 
 run_tests
 
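Both test functions in this diff verify the extension by grepping the output of "traceroute -e" / "traceroute6 -e" for an "<INC:...>" annotation built from the incoming interface's ifindex, optional global address, name and MTU. The sketch below factors that string construction into a small bash function so the two expected layouts are explicit. The function name iio_iif_pattern is hypothetical and is not part of the selftest; the layout is assumed to be exactly the one used in the grep patterns of the tests (traceroute 2.1.5 or later), and the ifindex values are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of the selftest: print the "<INC:...>"
# annotation that "traceroute -e" is assumed to emit for an RFC 5837
# Interface Information Object describing the incoming interface.
# The layout mirrors the grep patterns used by the tests:
#   <INC:ifindex,"name",mtu=N>          (no global address on the interface)
#   <INC:ifindex,addr,"name",mtu=N>     (interface has a global address)
iio_iif_pattern()
{
	local ifindex=$1 name=$2 mtu=$3 addr=${4:-}

	if [ -n "$addr" ]; then
		printf '<INC:%s,%s,"%s",mtu=%s>\n' "$ifindex" "$addr" "$name" "$mtu"
	else
		printf '<INC:%s,"%s",mtu=%s>\n' "$ifindex" "$name" "$mtu"
	fi
}

# Patterns equivalent to the R1 checks, assuming R1's eth1 has ifindex 2.
iio_iif_pattern 2 eth1 1500                # <INC:2,"eth1",mtu=1500>
iio_iif_pattern 2 eth1 1500 198.51.100.1   # <INC:2,198.51.100.1,"eth1",mtu=1500>
iio_iif_pattern 2 eth1tag 1501             # <INC:2,"eth1tag",mtu=1501>
```

The result could be passed to "grep -F" (for example, grep -F "$(iio_iif_pattern "$r1_ifindex" eth1 1500)"), which matches the string literally. The selftest instead uses plain grep, where each "." in an IPv4 address matches any character; that is harmless for these fixed strings but slightly less strict.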