Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cls_flower: Fix inability to match GRE/IPIP packets

When a packet of a new flow arrives at the openvswitch kernel module, the
module dissects the packet and passes the extracted flow key to the
ovs-vswitchd daemon. If hw-offload is enabled, the daemon creates a new TC
flower entry so that the flow bypasses the openvswitch kernel module (TC flower
can also offload flows to NICs, but that is not relevant here).

In this processing flow, I found the following issue with GRE/IPIP packets.

When ovs_flow_key_extract() in the openvswitch module parses a packet of a new
GRE (or IPIP) flow received on a non-tunneling vport, it takes the
ip_proto/src_ip/dst_ip match keys from the outer IP header.

This means ovs-vswitchd creates a TC flower entry whose IP protocol and address
match keys hold the values of the outer IP header. OTOH, TC flower, which uses
flow_dissector (a different parser from the one in the openvswitch module),
extracts the same fields from the inner IP header.

The following flow is an example to describe the issue in more detail.

 <----------- Outer IP -----------------> <---------- Inner IP ---------->
+----------+--------------+--------------+----------+----------+----------+
| ip_proto |    src_ip    |    dst_ip    | ip_proto |  src_ip  |  dst_ip  |
| 47 (GRE) | 192.168.10.1 | 192.168.10.2 | 6 (TCP)  | 10.0.0.1 | 10.0.0.2 |
+----------+--------------+--------------+----------+----------+----------+

In this case, the TC flower entry and the extracted information are as follows:

  - ovs-vswitchd creates a TC flower entry with:
      - ip_proto: 47
      - src_ip: 192.168.10.1
      - dst_ip: 192.168.10.2

  - TC flower extracts the following for IP header matches:
      - ip_proto: 6
      - src_ip: 10.0.0.1
      - dst_ip: 10.0.0.2

Thus, GRE or IPIP packets never match the TC flower entry, as each
dissector behaves differently.
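
The mismatch can be reproduced outside the kernel with a minimal sketch. The
struct ip_key type, the IPV4() macro and ip_key_match() are hypothetical names
made up for this illustration, and the exact comparison only stands in for the
real (masked) flower lookup. The values are the ones from the example flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified match key: only the three fields discussed here. */
struct ip_key {
	uint8_t  ip_proto;
	uint32_t src_ip;
	uint32_t dst_ip;
};

/* Pack a dotted quad into a 32-bit value (host order, illustration only). */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Key installed by ovs-vswitchd: taken from the outer IP header. */
static const struct ip_key entry_key = {
	47, IPV4(192, 168, 10, 1), IPV4(192, 168, 10, 2)
};

/* Key extracted by TC flower without the fix: taken from the inner header. */
static const struct ip_key dissected_key = {
	6, IPV4(10, 0, 0, 1), IPV4(10, 0, 0, 2)
};

/* Exact match on all three fields, standing in for the flower lookup. */
static bool ip_key_match(const struct ip_key *a, const struct ip_key *b)
{
	assert(a != NULL && b != NULL);
	return a->ip_proto == b->ip_proto &&
	       a->src_ip == b->src_ip &&
	       a->dst_ip == b->dst_ip;
}
```

Here ip_key_match(&entry_key, &dissected_key) returns false because all three
fields differ, which is why the entry is never hit.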

IMHO, the behavior of TC flower (flow dissector) does not look correct: the
ip_proto/src_ip/dst_ip keys of a TC flower match refer to the outermost IP
header in every case except GRE/IPIP. This patch adds a new flow_dissector
flag, FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP, which stops dissection before the
encapsulated inner header of GRE/IPIP packets, and passes it from the TC
flower classifier.
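
The effect of the flag can be sketched in userspace with a toy dissector. This
is not the kernel's flow_dissector: every toy_* and TOY_* name is hypothetical,
only IPv4 is handled, GRE is assumed to carry IPv4 with no optional header
fields, and checksums in the sample packet are left at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_IPPROTO_IPIP  4
#define TOY_IPPROTO_GRE   47
#define TOY_IPV4_MIN_HLEN 20
#define TOY_GRE_BASE_HLEN 4   /* assumes no checksum/key/sequence fields */

/* Hypothetical stand-in for FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP. */
#define TOY_F_STOP_BEFORE_ENCAP (1u << 3)

struct toy_key {
	uint8_t ip_proto;
	uint8_t src[4];
	uint8_t dst[4];
};

/* The example flow: outer IPv4 (GRE), base GRE header, inner IPv4 (TCP). */
static const uint8_t toy_gre_packet[] = {
	0x45, 0, 0, 44, 0, 0, 0, 0, 64, 47, 0, 0,  /* outer: IHL 5, proto 47 */
	192, 168, 10, 1, 192, 168, 10, 2,          /* outer src, dst */
	0x00, 0x00, 0x08, 0x00,                    /* GRE: no flags, IPv4 */
	0x45, 0, 0, 20, 0, 0, 0, 0, 64, 6, 0, 0,   /* inner: IHL 5, proto 6 */
	10, 0, 0, 1, 10, 0, 0, 2,                  /* inner src, dst */
};

/* Copy proto/src/dst of the IPv4 header at pkt; return its header length. */
static size_t toy_parse_ipv4(const uint8_t *pkt, struct toy_key *key)
{
	key->ip_proto = pkt[9];
	memcpy(key->src, pkt + 12, 4);
	memcpy(key->dst, pkt + 16, 4);
	return (size_t)(pkt[0] & 0x0f) * 4;
}

static void toy_dissect(const uint8_t *pkt, size_t len, unsigned int flags,
			struct toy_key *key)
{
	size_t off;

	assert(pkt != NULL && key != NULL && len >= TOY_IPV4_MIN_HLEN);
	off = toy_parse_ipv4(pkt, key);

	if (key->ip_proto != TOY_IPPROTO_GRE &&
	    key->ip_proto != TOY_IPPROTO_IPIP)
		return;                         /* not a tunnel: done */
	if (flags & TOY_F_STOP_BEFORE_ENCAP)
		return;                         /* keep the outer header fields */
	if (key->ip_proto == TOY_IPPROTO_GRE)
		off += TOY_GRE_BASE_HLEN;
	if (off + TOY_IPV4_MIN_HLEN <= len)
		toy_parse_ipv4(pkt + off, key); /* overwrite with inner header */
}
```

Calling toy_dissect() on toy_gre_packet with flags 0 leaves the inner values
(6, 10.0.0.1, 10.0.0.2) in the key, while passing TOY_F_STOP_BEFORE_ENCAP keeps
the outer ones (47, 192.168.10.1, 192.168.10.2), which is what the entry
created by ovs-vswitchd expects.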

Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

authored by Yoshiki Komachi and committed by David S. Miller
6de6e46d 34d7ecb3

+18 -1
+1
include/net/flow_dissector.h
···
 #define FLOW_DISSECTOR_F_PARSE_1ST_FRAG		BIT(0)
 #define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	BIT(1)
 #define FLOW_DISSECTOR_F_STOP_AT_ENCAP		BIT(2)
+#define FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP	BIT(3)
 
 struct flow_dissector_key {
 	enum flow_dissector_key_id key_id;
+15
net/core/flow_dissector.c
···
 
 	switch (ip_proto) {
 	case IPPROTO_GRE:
+		if (flags & FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP) {
+			fdret = FLOW_DISSECT_RET_OUT_GOOD;
+			break;
+		}
+
 		fdret = __skb_flow_dissect_gre(skb, key_control, flow_dissector,
 					       target_container, data,
 					       &proto, &nhoff, &hlen, flags);
···
 		break;
 	}
 	case IPPROTO_IPIP:
+		if (flags & FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP) {
+			fdret = FLOW_DISSECT_RET_OUT_GOOD;
+			break;
+		}
+
 		proto = htons(ETH_P_IP);
 
 		key_control->flags |= FLOW_DIS_ENCAPSULATION;
···
 		break;
 
 	case IPPROTO_IPV6:
+		if (flags & FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP) {
+			fdret = FLOW_DISSECT_RET_OUT_GOOD;
+			break;
+		}
+
 		proto = htons(ETH_P_IPV6);
 
 		key_control->flags |= FLOW_DIS_ENCAPSULATION;
+2 -1
net/sched/cls_flower.c
···
 				    ARRAY_SIZE(fl_ct_info_to_flower_map),
 				    post_ct);
 		skb_flow_dissect_hash(skb, &mask->dissector, &skb_key);
-		skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
+		skb_flow_dissect(skb, &mask->dissector, &skb_key,
+				 FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP);
 
 		f = fl_mask_lookup(mask, &skb_key);
 		if (f && !tc_skip_sw(f->flags)) {