Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'Introduce-connection-tracking-offload'

Paul Blakey says:

====================
Introduce connection tracking offload

Background
----------

The connection tracking action provides the ability to associate connection state with a packet.
The connection state may be used for stateful packet processing, such as stateful firewalls
and NAT operations.

Connection tracking in TC SW
----------------------------

The CT state may be matched only after the CT action is performed.
As such, CT use cases are commonly implemented using multiple chains.
Consider the following TC filters, as an example:
1. tc filter add dev ens1f0_0 ingress prio 1 chain 0 proto ip flower \
src_mac 24:8a:07:a5:28:01 ct_state -trk \
action ct \
pipe action goto chain 2

2. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \
ct_state +trk+new \
action ct commit \
pipe action tunnel_key set \
src_ip 0.0.0.0 \
dst_ip 7.7.7.8 \
id 98 \
dst_port 4789 \
action mirred egress redirect dev vxlan0

3. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \
ct_state +trk+est \
action tunnel_key set \
src_ip 0.0.0.0 \
dst_ip 7.7.7.8 \
id 98 \
dst_port 4789 \
action mirred egress redirect dev vxlan0

Filter #1 (chain 0) decides, after initial packet classification, to send the packet to the
connection tracking module (ct action).
Once the ct_state is initialized by the CT action, packet processing continues on chain 2.

Chain 2 classifies the packet based on the ct_state.
Filter #2 matches on the +trk+new CT state while filter #3 matches on the +trk+est ct_state.

MLX5 Connection tracking HW offload - MLX5 driver patches
---------------------------------------------------------

The MLX5 hardware model aligns with the software model by realizing a multi-table
architecture. In SW the TC CT action sets the CT state on the skb. Similarly,
HW sets the CT state on a HW register. The driver gets this CT state while offloading
a tuple, via a new ct_metadata action that provides it.

Matches on ct_state are translated to HW register matches.

A TC filter with a CT action is broken into two rules: a pre_ct rule and a post_ct rule.
pre_ct rule:
Inserted in the corresponding tc chain table, matches on the original tc match, with
actions: any pre-ct actions, set fte_id, set zone, and goto the ct table.
The fte_id is a register mapping uniquely identifying this filter.
post_ct rule:
Inserted in a post_ct table, matches on the fte_id register mapping, with
actions: counter + any post-ct actions (this is usually 'goto chain X').

The post_ct table is the table that all tuples inserted into the ct table go to, so
on a tuple hit the packet continues from the ct table to the post_ct table,
after being marked with the CT state (mark/label/...).

This design ensures that the rule's actions and counters will be executed only after a CT hit.
HW misses will continue processing in SW from the last chain ID that was processed in hardware.

The following illustrates the HW model:

+-------------------+      +--------------------+    +--------------+
+ pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct      +----->
+ original match    +  |   + tuple + zone match + |  + fte_id match +  |
+-------------------+  |   +--------------------+ |  +--------------+  |
                       v                          v                    v
                      set chain miss mapping  set mark             original
                      set fte_id              set label            filter
                      set zone                set established      actions
                      set tunnel_id           do nat (if needed)
                      do decap

To fill the CT table, the driver registers a callback for flow offload events on
each new flow table passed to it while offloading ct actions. Once a flow offload
event is triggered on this callback, the driver offloads the flow to the hardware CT table.

Established events offload
--------------------------

Currently, act_ct maintains an FT (flow table) instance per ct zone. Flow table
entries are created per ct connection when connections enter an established
state, and deleted when they leave it. Once an entry is created, the FT assumes
ownership of the entry and manages its aging. The FT is used for software
offload of conntrack. FT entries associate 5-tuples with an action list.

The act_ct changes in this patchset:
Populate the action list with a (new) ct_metadata action, providing the
connection's CT state (zone, mark and label), and mangle actions if NAT
is configured.

Pass the action's flow table instance as a ct action entry parameter,
so when the action is offloaded, the driver may register a callback on
its block to receive FT flow offload add/del/stats events.

Netfilter changes
-----------------
The netfilter changes export the relevant bits, and add the relevant CBs
to support the above.

Applying this patchset
----------------------

On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"),
pull Saeed's ct-offload branch from git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
and resolve the following non-trivial conflict in fs_core.c:

Then apply this patchset.

Changelog:
v2->v3:
Added the first two patches needed after rebasing on net-next:
"net/mlx5: E-Switch, Enable reg c1 loopback when possible"
"net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+2134 -70
+10
drivers/net/ethernet/mellanox/mlx5/core/Kconfig
 	  Legacy SRIOV mode (L2 mac vlan steering based).
 	  Switchdev mode (eswitch offloads).
 
+config MLX5_TC_CT
+	bool "MLX5 TC connection tracking offload support"
+	depends on MLX5_CORE_EN && NET_SWITCHDEV && NF_FLOW_TABLE && NET_ACT_CT && NET_TC_SKB_EXT
+	default y
+	help
+	  Say Y here if you want to support offloading connection tracking rules
+	  via tc ct action.
+
+	  If unsure, set to Y
+
 config MLX5_CORE_EN_DCB
 	bool "Data Center Bridging (DCB) Support"
 	default y
+1
drivers/net/ethernet/mellanox/mlx5/core/Makefile
 	lib/geneve.o en/mapping.o en/tc_tun_vxlan.o en/tc_tun_gre.o \
 	en/tc_tun_geneve.o diag/en_tc_tracepoint.o
 mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
+mlx5_core-$(CONFIG_MLX5_TC_CT) += en/tc_ct.o
 
 #
 # Core extra
+1356
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + /* Copyright (c) 2019 Mellanox Technologies. */ 3 + 4 + #include <net/netfilter/nf_conntrack.h> 5 + #include <net/netfilter/nf_conntrack_core.h> 6 + #include <net/netfilter/nf_conntrack_zones.h> 7 + #include <net/netfilter/nf_conntrack_labels.h> 8 + #include <net/netfilter/nf_conntrack_helper.h> 9 + #include <net/netfilter/nf_conntrack_acct.h> 10 + #include <uapi/linux/tc_act/tc_pedit.h> 11 + #include <net/tc_act/tc_ct.h> 12 + #include <net/flow_offload.h> 13 + #include <net/netfilter/nf_flow_table.h> 14 + #include <linux/workqueue.h> 15 + 16 + #include "en/tc_ct.h" 17 + #include "en.h" 18 + #include "en_tc.h" 19 + #include "en_rep.h" 20 + #include "eswitch_offloads_chains.h" 21 + 22 + #define MLX5_CT_ZONE_BITS (mlx5e_tc_attr_to_reg_mappings[ZONE_TO_REG].mlen * 8) 23 + #define MLX5_CT_ZONE_MASK GENMASK(MLX5_CT_ZONE_BITS - 1, 0) 24 + #define MLX5_CT_STATE_ESTABLISHED_BIT BIT(1) 25 + #define MLX5_CT_STATE_TRK_BIT BIT(2) 26 + 27 + #define MLX5_FTE_ID_BITS (mlx5e_tc_attr_to_reg_mappings[FTEID_TO_REG].mlen * 8) 28 + #define MLX5_FTE_ID_MAX GENMASK(MLX5_FTE_ID_BITS - 1, 0) 29 + #define MLX5_FTE_ID_MASK MLX5_FTE_ID_MAX 30 + 31 + #define ct_dbg(fmt, args...)\ 32 + netdev_dbg(ct_priv->netdev, "ct_debug: " fmt "\n", ##args) 33 + 34 + struct mlx5_tc_ct_priv { 35 + struct mlx5_eswitch *esw; 36 + const struct net_device *netdev; 37 + struct idr fte_ids; 38 + struct idr tuple_ids; 39 + struct rhashtable zone_ht; 40 + struct mlx5_flow_table *ct; 41 + struct mlx5_flow_table *ct_nat; 42 + struct mlx5_flow_table *post_ct; 43 + struct mutex control_lock; /* guards parallel adds/dels */ 44 + }; 45 + 46 + struct mlx5_ct_flow { 47 + struct mlx5_esw_flow_attr pre_ct_attr; 48 + struct mlx5_esw_flow_attr post_ct_attr; 49 + struct mlx5_flow_handle *pre_ct_rule; 50 + struct mlx5_flow_handle *post_ct_rule; 51 + struct mlx5_ct_ft *ft; 52 + u32 fte_id; 53 + u32 chain_mapping; 54 + }; 55 + 56 + struct mlx5_ct_zone_rule { 57 + struct 
mlx5_flow_handle *rule; 58 + struct mlx5_esw_flow_attr attr; 59 + int tupleid; 60 + bool nat; 61 + }; 62 + 63 + struct mlx5_ct_ft { 64 + struct rhash_head node; 65 + u16 zone; 66 + refcount_t refcount; 67 + struct nf_flowtable *nf_ft; 68 + struct mlx5_tc_ct_priv *ct_priv; 69 + struct rhashtable ct_entries_ht; 70 + struct list_head ct_entries_list; 71 + }; 72 + 73 + struct mlx5_ct_entry { 74 + struct list_head list; 75 + u16 zone; 76 + struct rhash_head node; 77 + struct flow_rule *flow_rule; 78 + struct mlx5_fc *counter; 79 + unsigned long lastuse; 80 + unsigned long cookie; 81 + unsigned long restore_cookie; 82 + struct mlx5_ct_zone_rule zone_rules[2]; 83 + }; 84 + 85 + static const struct rhashtable_params cts_ht_params = { 86 + .head_offset = offsetof(struct mlx5_ct_entry, node), 87 + .key_offset = offsetof(struct mlx5_ct_entry, cookie), 88 + .key_len = sizeof(((struct mlx5_ct_entry *)0)->cookie), 89 + .automatic_shrinking = true, 90 + .min_size = 16 * 1024, 91 + }; 92 + 93 + static const struct rhashtable_params zone_params = { 94 + .head_offset = offsetof(struct mlx5_ct_ft, node), 95 + .key_offset = offsetof(struct mlx5_ct_ft, zone), 96 + .key_len = sizeof(((struct mlx5_ct_ft *)0)->zone), 97 + .automatic_shrinking = true, 98 + }; 99 + 100 + static struct mlx5_tc_ct_priv * 101 + mlx5_tc_ct_get_ct_priv(struct mlx5e_priv *priv) 102 + { 103 + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; 104 + struct mlx5_rep_uplink_priv *uplink_priv; 105 + struct mlx5e_rep_priv *uplink_rpriv; 106 + 107 + uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH); 108 + uplink_priv = &uplink_rpriv->uplink_priv; 109 + return uplink_priv->ct_priv; 110 + } 111 + 112 + static int 113 + mlx5_tc_ct_set_tuple_match(struct mlx5_flow_spec *spec, 114 + struct flow_rule *rule) 115 + { 116 + void *headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, 117 + outer_headers); 118 + void *headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, 119 + outer_headers); 120 + u16 
addr_type = 0; 121 + u8 ip_proto = 0; 122 + 123 + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) { 124 + struct flow_match_basic match; 125 + 126 + flow_rule_match_basic(rule, &match); 127 + 128 + MLX5_SET(fte_match_set_lyr_2_4, headers_c, ethertype, 129 + ntohs(match.mask->n_proto)); 130 + MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype, 131 + ntohs(match.key->n_proto)); 132 + MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_protocol, 133 + match.mask->ip_proto); 134 + MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol, 135 + match.key->ip_proto); 136 + 137 + ip_proto = match.key->ip_proto; 138 + } 139 + 140 + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_CONTROL)) { 141 + struct flow_match_control match; 142 + 143 + flow_rule_match_control(rule, &match); 144 + addr_type = match.key->addr_type; 145 + } 146 + 147 + if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) { 148 + struct flow_match_ipv4_addrs match; 149 + 150 + flow_rule_match_ipv4_addrs(rule, &match); 151 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c, 152 + src_ipv4_src_ipv6.ipv4_layout.ipv4), 153 + &match.mask->src, sizeof(match.mask->src)); 154 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v, 155 + src_ipv4_src_ipv6.ipv4_layout.ipv4), 156 + &match.key->src, sizeof(match.key->src)); 157 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c, 158 + dst_ipv4_dst_ipv6.ipv4_layout.ipv4), 159 + &match.mask->dst, sizeof(match.mask->dst)); 160 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v, 161 + dst_ipv4_dst_ipv6.ipv4_layout.ipv4), 162 + &match.key->dst, sizeof(match.key->dst)); 163 + } 164 + 165 + if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) { 166 + struct flow_match_ipv6_addrs match; 167 + 168 + flow_rule_match_ipv6_addrs(rule, &match); 169 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c, 170 + src_ipv4_src_ipv6.ipv6_layout.ipv6), 171 + &match.mask->src, sizeof(match.mask->src)); 172 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v, 173 + 
src_ipv4_src_ipv6.ipv6_layout.ipv6), 174 + &match.key->src, sizeof(match.key->src)); 175 + 176 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c, 177 + dst_ipv4_dst_ipv6.ipv6_layout.ipv6), 178 + &match.mask->dst, sizeof(match.mask->dst)); 179 + memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v, 180 + dst_ipv4_dst_ipv6.ipv6_layout.ipv6), 181 + &match.key->dst, sizeof(match.key->dst)); 182 + } 183 + 184 + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS)) { 185 + struct flow_match_ports match; 186 + 187 + flow_rule_match_ports(rule, &match); 188 + switch (ip_proto) { 189 + case IPPROTO_TCP: 190 + MLX5_SET(fte_match_set_lyr_2_4, headers_c, 191 + tcp_sport, ntohs(match.mask->src)); 192 + MLX5_SET(fte_match_set_lyr_2_4, headers_v, 193 + tcp_sport, ntohs(match.key->src)); 194 + 195 + MLX5_SET(fte_match_set_lyr_2_4, headers_c, 196 + tcp_dport, ntohs(match.mask->dst)); 197 + MLX5_SET(fte_match_set_lyr_2_4, headers_v, 198 + tcp_dport, ntohs(match.key->dst)); 199 + break; 200 + 201 + case IPPROTO_UDP: 202 + MLX5_SET(fte_match_set_lyr_2_4, headers_c, 203 + udp_sport, ntohs(match.mask->src)); 204 + MLX5_SET(fte_match_set_lyr_2_4, headers_v, 205 + udp_sport, ntohs(match.key->src)); 206 + 207 + MLX5_SET(fte_match_set_lyr_2_4, headers_c, 208 + udp_dport, ntohs(match.mask->dst)); 209 + MLX5_SET(fte_match_set_lyr_2_4, headers_v, 210 + udp_dport, ntohs(match.key->dst)); 211 + break; 212 + default: 213 + break; 214 + } 215 + } 216 + 217 + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_TCP)) { 218 + struct flow_match_tcp match; 219 + 220 + flow_rule_match_tcp(rule, &match); 221 + MLX5_SET(fte_match_set_lyr_2_4, headers_c, tcp_flags, 222 + ntohs(match.mask->flags)); 223 + MLX5_SET(fte_match_set_lyr_2_4, headers_v, tcp_flags, 224 + ntohs(match.key->flags)); 225 + } 226 + 227 + return 0; 228 + } 229 + 230 + static void 231 + mlx5_tc_ct_entry_del_rule(struct mlx5_tc_ct_priv *ct_priv, 232 + struct mlx5_ct_entry *entry, 233 + bool nat) 234 + { 235 + struct mlx5_ct_zone_rule 
*zone_rule = &entry->zone_rules[nat]; 236 + struct mlx5_esw_flow_attr *attr = &zone_rule->attr; 237 + struct mlx5_eswitch *esw = ct_priv->esw; 238 + 239 + ct_dbg("Deleting ct entry rule in zone %d", entry->zone); 240 + 241 + mlx5_eswitch_del_offloaded_rule(esw, zone_rule->rule, attr); 242 + mlx5_modify_header_dealloc(esw->dev, attr->modify_hdr); 243 + idr_remove(&ct_priv->tuple_ids, zone_rule->tupleid); 244 + } 245 + 246 + static void 247 + mlx5_tc_ct_entry_del_rules(struct mlx5_tc_ct_priv *ct_priv, 248 + struct mlx5_ct_entry *entry) 249 + { 250 + mlx5_tc_ct_entry_del_rule(ct_priv, entry, true); 251 + mlx5_tc_ct_entry_del_rule(ct_priv, entry, false); 252 + 253 + mlx5_fc_destroy(ct_priv->esw->dev, entry->counter); 254 + } 255 + 256 + static struct flow_action_entry * 257 + mlx5_tc_ct_get_ct_metadata_action(struct flow_rule *flow_rule) 258 + { 259 + struct flow_action *flow_action = &flow_rule->action; 260 + struct flow_action_entry *act; 261 + int i; 262 + 263 + flow_action_for_each(i, act, flow_action) { 264 + if (act->id == FLOW_ACTION_CT_METADATA) 265 + return act; 266 + } 267 + 268 + return NULL; 269 + } 270 + 271 + static int 272 + mlx5_tc_ct_entry_set_registers(struct mlx5_tc_ct_priv *ct_priv, 273 + struct mlx5e_tc_mod_hdr_acts *mod_acts, 274 + u8 ct_state, 275 + u32 mark, 276 + u32 label, 277 + u32 tupleid) 278 + { 279 + struct mlx5_eswitch *esw = ct_priv->esw; 280 + int err; 281 + 282 + err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, 283 + CTSTATE_TO_REG, ct_state); 284 + if (err) 285 + return err; 286 + 287 + err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, 288 + MARK_TO_REG, mark); 289 + if (err) 290 + return err; 291 + 292 + err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, 293 + LABELS_TO_REG, label); 294 + if (err) 295 + return err; 296 + 297 + err = mlx5e_tc_match_to_reg_set(esw->dev, mod_acts, 298 + TUPLEID_TO_REG, tupleid); 299 + if (err) 300 + return err; 301 + 302 + return 0; 303 + } 304 + 305 + static int 306 + 
mlx5_tc_ct_parse_mangle_to_mod_act(struct flow_action_entry *act, 307 + char *modact) 308 + { 309 + u32 offset = act->mangle.offset, field; 310 + 311 + switch (act->mangle.htype) { 312 + case FLOW_ACT_MANGLE_HDR_TYPE_IP4: 313 + MLX5_SET(set_action_in, modact, length, 0); 314 + if (offset == offsetof(struct iphdr, saddr)) 315 + field = MLX5_ACTION_IN_FIELD_OUT_SIPV4; 316 + else if (offset == offsetof(struct iphdr, daddr)) 317 + field = MLX5_ACTION_IN_FIELD_OUT_DIPV4; 318 + else 319 + return -EOPNOTSUPP; 320 + break; 321 + 322 + case FLOW_ACT_MANGLE_HDR_TYPE_IP6: 323 + MLX5_SET(set_action_in, modact, length, 0); 324 + if (offset == offsetof(struct ipv6hdr, saddr)) 325 + field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_31_0; 326 + else if (offset == offsetof(struct ipv6hdr, saddr) + 4) 327 + field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_63_32; 328 + else if (offset == offsetof(struct ipv6hdr, saddr) + 8) 329 + field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64; 330 + else if (offset == offsetof(struct ipv6hdr, saddr) + 12) 331 + field = MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96; 332 + else if (offset == offsetof(struct ipv6hdr, daddr)) 333 + field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_31_0; 334 + else if (offset == offsetof(struct ipv6hdr, daddr) + 4) 335 + field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_63_32; 336 + else if (offset == offsetof(struct ipv6hdr, daddr) + 8) 337 + field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_95_64; 338 + else if (offset == offsetof(struct ipv6hdr, daddr) + 12) 339 + field = MLX5_ACTION_IN_FIELD_OUT_DIPV6_127_96; 340 + else 341 + return -EOPNOTSUPP; 342 + break; 343 + 344 + case FLOW_ACT_MANGLE_HDR_TYPE_TCP: 345 + MLX5_SET(set_action_in, modact, length, 16); 346 + if (offset == offsetof(struct tcphdr, source)) 347 + field = MLX5_ACTION_IN_FIELD_OUT_TCP_SPORT; 348 + else if (offset == offsetof(struct tcphdr, dest)) 349 + field = MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT; 350 + else 351 + return -EOPNOTSUPP; 352 + break; 353 + 354 + case FLOW_ACT_MANGLE_HDR_TYPE_UDP: 355 + 
MLX5_SET(set_action_in, modact, length, 16); 356 + if (offset == offsetof(struct udphdr, source)) 357 + field = MLX5_ACTION_IN_FIELD_OUT_UDP_SPORT; 358 + else if (offset == offsetof(struct udphdr, dest)) 359 + field = MLX5_ACTION_IN_FIELD_OUT_UDP_DPORT; 360 + else 361 + return -EOPNOTSUPP; 362 + break; 363 + 364 + default: 365 + return -EOPNOTSUPP; 366 + } 367 + 368 + MLX5_SET(set_action_in, modact, action_type, MLX5_ACTION_TYPE_SET); 369 + MLX5_SET(set_action_in, modact, offset, 0); 370 + MLX5_SET(set_action_in, modact, field, field); 371 + MLX5_SET(set_action_in, modact, data, act->mangle.val); 372 + 373 + return 0; 374 + } 375 + 376 + static int 377 + mlx5_tc_ct_entry_create_nat(struct mlx5_tc_ct_priv *ct_priv, 378 + struct flow_rule *flow_rule, 379 + struct mlx5e_tc_mod_hdr_acts *mod_acts) 380 + { 381 + struct flow_action *flow_action = &flow_rule->action; 382 + struct mlx5_core_dev *mdev = ct_priv->esw->dev; 383 + struct flow_action_entry *act; 384 + size_t action_size; 385 + char *modact; 386 + int err, i; 387 + 388 + action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto); 389 + 390 + flow_action_for_each(i, act, flow_action) { 391 + switch (act->id) { 392 + case FLOW_ACTION_MANGLE: { 393 + err = alloc_mod_hdr_actions(mdev, 394 + MLX5_FLOW_NAMESPACE_FDB, 395 + mod_acts); 396 + if (err) 397 + return err; 398 + 399 + modact = mod_acts->actions + 400 + mod_acts->num_actions * action_size; 401 + 402 + err = mlx5_tc_ct_parse_mangle_to_mod_act(act, modact); 403 + if (err) 404 + return err; 405 + 406 + mod_acts->num_actions++; 407 + } 408 + break; 409 + 410 + case FLOW_ACTION_CT_METADATA: 411 + /* Handled earlier */ 412 + continue; 413 + default: 414 + return -EOPNOTSUPP; 415 + } 416 + } 417 + 418 + return 0; 419 + } 420 + 421 + static int 422 + mlx5_tc_ct_entry_create_mod_hdr(struct mlx5_tc_ct_priv *ct_priv, 423 + struct mlx5_esw_flow_attr *attr, 424 + struct flow_rule *flow_rule, 425 + u32 tupleid, 426 + bool nat) 427 + { 428 + struct 
mlx5e_tc_mod_hdr_acts mod_acts = {}; 429 + struct mlx5_eswitch *esw = ct_priv->esw; 430 + struct mlx5_modify_hdr *mod_hdr; 431 + struct flow_action_entry *meta; 432 + int err; 433 + 434 + meta = mlx5_tc_ct_get_ct_metadata_action(flow_rule); 435 + if (!meta) 436 + return -EOPNOTSUPP; 437 + 438 + if (meta->ct_metadata.labels[1] || 439 + meta->ct_metadata.labels[2] || 440 + meta->ct_metadata.labels[3]) { 441 + ct_dbg("Failed to offload ct entry due to unsupported label"); 442 + return -EOPNOTSUPP; 443 + } 444 + 445 + if (nat) { 446 + err = mlx5_tc_ct_entry_create_nat(ct_priv, flow_rule, 447 + &mod_acts); 448 + if (err) 449 + goto err_mapping; 450 + } 451 + 452 + err = mlx5_tc_ct_entry_set_registers(ct_priv, &mod_acts, 453 + (MLX5_CT_STATE_ESTABLISHED_BIT | 454 + MLX5_CT_STATE_TRK_BIT), 455 + meta->ct_metadata.mark, 456 + meta->ct_metadata.labels[0], 457 + tupleid); 458 + if (err) 459 + goto err_mapping; 460 + 461 + mod_hdr = mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_FDB, 462 + mod_acts.num_actions, 463 + mod_acts.actions); 464 + if (IS_ERR(mod_hdr)) { 465 + err = PTR_ERR(mod_hdr); 466 + goto err_mapping; 467 + } 468 + attr->modify_hdr = mod_hdr; 469 + 470 + dealloc_mod_hdr_actions(&mod_acts); 471 + return 0; 472 + 473 + err_mapping: 474 + dealloc_mod_hdr_actions(&mod_acts); 475 + return err; 476 + } 477 + 478 + static int 479 + mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv, 480 + struct flow_rule *flow_rule, 481 + struct mlx5_ct_entry *entry, 482 + bool nat) 483 + { 484 + struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat]; 485 + struct mlx5_esw_flow_attr *attr = &zone_rule->attr; 486 + struct mlx5_eswitch *esw = ct_priv->esw; 487 + struct mlx5_flow_spec spec = {}; 488 + u32 tupleid = 1; 489 + int err; 490 + 491 + zone_rule->nat = nat; 492 + 493 + /* Get tuple unique id */ 494 + err = idr_alloc_u32(&ct_priv->tuple_ids, zone_rule, &tupleid, 495 + TUPLE_ID_MAX, GFP_KERNEL); 496 + if (err) { 497 + netdev_warn(ct_priv->netdev, 498 + 
"Failed to allocate tuple id, err: %d\n", err); 499 + return err; 500 + } 501 + zone_rule->tupleid = tupleid; 502 + 503 + err = mlx5_tc_ct_entry_create_mod_hdr(ct_priv, attr, flow_rule, 504 + tupleid, nat); 505 + if (err) { 506 + ct_dbg("Failed to create ct entry mod hdr"); 507 + goto err_mod_hdr; 508 + } 509 + 510 + attr->action = MLX5_FLOW_CONTEXT_ACTION_MOD_HDR | 511 + MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | 512 + MLX5_FLOW_CONTEXT_ACTION_COUNT; 513 + attr->dest_chain = 0; 514 + attr->dest_ft = ct_priv->post_ct; 515 + attr->fdb = nat ? ct_priv->ct_nat : ct_priv->ct; 516 + attr->outer_match_level = MLX5_MATCH_L4; 517 + attr->counter = entry->counter; 518 + attr->flags |= MLX5_ESW_ATTR_FLAG_NO_IN_PORT; 519 + 520 + mlx5_tc_ct_set_tuple_match(&spec, flow_rule); 521 + mlx5e_tc_match_to_reg_match(&spec, ZONE_TO_REG, 522 + entry->zone & MLX5_CT_ZONE_MASK, 523 + MLX5_CT_ZONE_MASK); 524 + 525 + zone_rule->rule = mlx5_eswitch_add_offloaded_rule(esw, &spec, attr); 526 + if (IS_ERR(zone_rule->rule)) { 527 + err = PTR_ERR(zone_rule->rule); 528 + ct_dbg("Failed to add ct entry rule, nat: %d", nat); 529 + goto err_rule; 530 + } 531 + 532 + ct_dbg("Offloaded ct entry rule in zone %d", entry->zone); 533 + 534 + return 0; 535 + 536 + err_rule: 537 + mlx5_modify_header_dealloc(esw->dev, attr->modify_hdr); 538 + err_mod_hdr: 539 + idr_remove(&ct_priv->tuple_ids, zone_rule->tupleid); 540 + return err; 541 + } 542 + 543 + static int 544 + mlx5_tc_ct_entry_add_rules(struct mlx5_tc_ct_priv *ct_priv, 545 + struct flow_rule *flow_rule, 546 + struct mlx5_ct_entry *entry) 547 + { 548 + struct mlx5_eswitch *esw = ct_priv->esw; 549 + int err; 550 + 551 + entry->counter = mlx5_fc_create(esw->dev, true); 552 + if (IS_ERR(entry->counter)) { 553 + err = PTR_ERR(entry->counter); 554 + ct_dbg("Failed to create counter for ct entry"); 555 + return err; 556 + } 557 + 558 + err = mlx5_tc_ct_entry_add_rule(ct_priv, flow_rule, entry, false); 559 + if (err) 560 + goto err_orig; 561 + 562 + err = 
mlx5_tc_ct_entry_add_rule(ct_priv, flow_rule, entry, true); 563 + if (err) 564 + goto err_nat; 565 + 566 + return 0; 567 + 568 + err_nat: 569 + mlx5_tc_ct_entry_del_rule(ct_priv, entry, false); 570 + err_orig: 571 + mlx5_fc_destroy(esw->dev, entry->counter); 572 + return err; 573 + } 574 + 575 + static int 576 + mlx5_tc_ct_block_flow_offload_add(struct mlx5_ct_ft *ft, 577 + struct flow_cls_offload *flow) 578 + { 579 + struct flow_rule *flow_rule = flow_cls_offload_flow_rule(flow); 580 + struct mlx5_tc_ct_priv *ct_priv = ft->ct_priv; 581 + struct flow_action_entry *meta_action; 582 + unsigned long cookie = flow->cookie; 583 + struct mlx5_ct_entry *entry; 584 + int err; 585 + 586 + meta_action = mlx5_tc_ct_get_ct_metadata_action(flow_rule); 587 + if (!meta_action) 588 + return -EOPNOTSUPP; 589 + 590 + entry = rhashtable_lookup_fast(&ft->ct_entries_ht, &cookie, 591 + cts_ht_params); 592 + if (entry) 593 + return 0; 594 + 595 + entry = kzalloc(sizeof(*entry), GFP_KERNEL); 596 + if (!entry) 597 + return -ENOMEM; 598 + 599 + entry->zone = ft->zone; 600 + entry->flow_rule = flow_rule; 601 + entry->cookie = flow->cookie; 602 + entry->restore_cookie = meta_action->ct_metadata.cookie; 603 + 604 + err = mlx5_tc_ct_entry_add_rules(ct_priv, flow_rule, entry); 605 + if (err) 606 + goto err_rules; 607 + 608 + err = rhashtable_insert_fast(&ft->ct_entries_ht, &entry->node, 609 + cts_ht_params); 610 + if (err) 611 + goto err_insert; 612 + 613 + list_add(&entry->list, &ft->ct_entries_list); 614 + 615 + return 0; 616 + 617 + err_insert: 618 + mlx5_tc_ct_entry_del_rules(ct_priv, entry); 619 + err_rules: 620 + kfree(entry); 621 + netdev_warn(ct_priv->netdev, 622 + "Failed to offload ct entry, err: %d\n", err); 623 + return err; 624 + } 625 + 626 + static int 627 + mlx5_tc_ct_block_flow_offload_del(struct mlx5_ct_ft *ft, 628 + struct flow_cls_offload *flow) 629 + { 630 + unsigned long cookie = flow->cookie; 631 + struct mlx5_ct_entry *entry; 632 + 633 + entry = 
rhashtable_lookup_fast(&ft->ct_entries_ht, &cookie, 634 + cts_ht_params); 635 + if (!entry) 636 + return -ENOENT; 637 + 638 + mlx5_tc_ct_entry_del_rules(ft->ct_priv, entry); 639 + WARN_ON(rhashtable_remove_fast(&ft->ct_entries_ht, 640 + &entry->node, 641 + cts_ht_params)); 642 + list_del(&entry->list); 643 + kfree(entry); 644 + 645 + return 0; 646 + } 647 + 648 + static int 649 + mlx5_tc_ct_block_flow_offload_stats(struct mlx5_ct_ft *ft, 650 + struct flow_cls_offload *f) 651 + { 652 + unsigned long cookie = f->cookie; 653 + struct mlx5_ct_entry *entry; 654 + u64 lastuse, packets, bytes; 655 + 656 + entry = rhashtable_lookup_fast(&ft->ct_entries_ht, &cookie, 657 + cts_ht_params); 658 + if (!entry) 659 + return -ENOENT; 660 + 661 + mlx5_fc_query_cached(entry->counter, &bytes, &packets, &lastuse); 662 + flow_stats_update(&f->stats, bytes, packets, lastuse); 663 + 664 + return 0; 665 + } 666 + 667 + static int 668 + mlx5_tc_ct_block_flow_offload(enum tc_setup_type type, void *type_data, 669 + void *cb_priv) 670 + { 671 + struct flow_cls_offload *f = type_data; 672 + struct mlx5_ct_ft *ft = cb_priv; 673 + 674 + if (type != TC_SETUP_CLSFLOWER) 675 + return -EOPNOTSUPP; 676 + 677 + switch (f->command) { 678 + case FLOW_CLS_REPLACE: 679 + return mlx5_tc_ct_block_flow_offload_add(ft, f); 680 + case FLOW_CLS_DESTROY: 681 + return mlx5_tc_ct_block_flow_offload_del(ft, f); 682 + case FLOW_CLS_STATS: 683 + return mlx5_tc_ct_block_flow_offload_stats(ft, f); 684 + default: 685 + break; 686 + }; 687 + 688 + return -EOPNOTSUPP; 689 + } 690 + 691 + int 692 + mlx5_tc_ct_parse_match(struct mlx5e_priv *priv, 693 + struct mlx5_flow_spec *spec, 694 + struct flow_cls_offload *f, 695 + struct netlink_ext_ack *extack) 696 + { 697 + struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv); 698 + struct flow_dissector_key_ct *mask, *key; 699 + bool trk, est, untrk, unest, new, unnew; 700 + u32 ctstate = 0, ctstate_mask = 0; 701 + u16 ct_state_on, ct_state_off; 702 + u16 ct_state, 
ct_state_mask; 703 + struct flow_match_ct match; 704 + 705 + if (!flow_rule_match_key(f->rule, FLOW_DISSECTOR_KEY_CT)) 706 + return 0; 707 + 708 + if (!ct_priv) { 709 + NL_SET_ERR_MSG_MOD(extack, 710 + "offload of ct matching isn't available"); 711 + return -EOPNOTSUPP; 712 + } 713 + 714 + flow_rule_match_ct(f->rule, &match); 715 + 716 + key = match.key; 717 + mask = match.mask; 718 + 719 + ct_state = key->ct_state; 720 + ct_state_mask = mask->ct_state; 721 + 722 + if (ct_state_mask & ~(TCA_FLOWER_KEY_CT_FLAGS_TRACKED | 723 + TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED | 724 + TCA_FLOWER_KEY_CT_FLAGS_NEW)) { 725 + NL_SET_ERR_MSG_MOD(extack, 726 + "only ct_state trk, est and new are supported for offload"); 727 + return -EOPNOTSUPP; 728 + } 729 + 730 + if (mask->ct_labels[1] || mask->ct_labels[2] || mask->ct_labels[3]) { 731 + NL_SET_ERR_MSG_MOD(extack, 732 + "only lower 32bits of ct_labels are supported for offload"); 733 + return -EOPNOTSUPP; 734 + } 735 + 736 + ct_state_on = ct_state & ct_state_mask; 737 + ct_state_off = (ct_state & ct_state_mask) ^ ct_state_mask; 738 + trk = ct_state_on & TCA_FLOWER_KEY_CT_FLAGS_TRACKED; 739 + new = ct_state_on & TCA_FLOWER_KEY_CT_FLAGS_NEW; 740 + est = ct_state_on & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED; 741 + untrk = ct_state_off & TCA_FLOWER_KEY_CT_FLAGS_TRACKED; 742 + unnew = ct_state_off & TCA_FLOWER_KEY_CT_FLAGS_NEW; 743 + unest = ct_state_off & TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED; 744 + 745 + ctstate |= trk ? MLX5_CT_STATE_TRK_BIT : 0; 746 + ctstate |= est ? MLX5_CT_STATE_ESTABLISHED_BIT : 0; 747 + ctstate_mask |= (untrk || trk) ? MLX5_CT_STATE_TRK_BIT : 0; 748 + ctstate_mask |= (unest || est) ? 
MLX5_CT_STATE_ESTABLISHED_BIT : 0; 749 + 750 + if (new) { 751 + NL_SET_ERR_MSG_MOD(extack, 752 + "matching on ct_state +new isn't supported"); 753 + return -EOPNOTSUPP; 754 + } 755 + 756 + if (mask->ct_zone) 757 + mlx5e_tc_match_to_reg_match(spec, ZONE_TO_REG, 758 + key->ct_zone, MLX5_CT_ZONE_MASK); 759 + if (ctstate_mask) 760 + mlx5e_tc_match_to_reg_match(spec, CTSTATE_TO_REG, 761 + ctstate, ctstate_mask); 762 + if (mask->ct_mark) 763 + mlx5e_tc_match_to_reg_match(spec, MARK_TO_REG, 764 + key->ct_mark, mask->ct_mark); 765 + if (mask->ct_labels[0]) 766 + mlx5e_tc_match_to_reg_match(spec, LABELS_TO_REG, 767 + key->ct_labels[0], 768 + mask->ct_labels[0]); 769 + 770 + return 0; 771 + } 772 + 773 + int 774 + mlx5_tc_ct_parse_action(struct mlx5e_priv *priv, 775 + struct mlx5_esw_flow_attr *attr, 776 + const struct flow_action_entry *act, 777 + struct netlink_ext_ack *extack) 778 + { 779 + struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv); 780 + 781 + if (!ct_priv) { 782 + NL_SET_ERR_MSG_MOD(extack, 783 + "offload of ct action isn't available"); 784 + return -EOPNOTSUPP; 785 + } 786 + 787 + attr->ct_attr.zone = act->ct.zone; 788 + attr->ct_attr.ct_action = act->ct.action; 789 + attr->ct_attr.nf_ft = act->ct.flow_table; 790 + 791 + return 0; 792 + } 793 + 794 + static struct mlx5_ct_ft * 795 + mlx5_tc_ct_add_ft_cb(struct mlx5_tc_ct_priv *ct_priv, u16 zone, 796 + struct nf_flowtable *nf_ft) 797 + { 798 + struct mlx5_ct_ft *ft; 799 + int err; 800 + 801 + ft = rhashtable_lookup_fast(&ct_priv->zone_ht, &zone, zone_params); 802 + if (ft) { 803 + refcount_inc(&ft->refcount); 804 + return ft; 805 + } 806 + 807 + ft = kzalloc(sizeof(*ft), GFP_KERNEL); 808 + if (!ft) 809 + return ERR_PTR(-ENOMEM); 810 + 811 + ft->zone = zone; 812 + ft->nf_ft = nf_ft; 813 + ft->ct_priv = ct_priv; 814 + INIT_LIST_HEAD(&ft->ct_entries_list); 815 + refcount_set(&ft->refcount, 1); 816 + 817 + err = rhashtable_init(&ft->ct_entries_ht, &cts_ht_params); 818 + if (err) 819 + goto err_init; 820 
+	err = rhashtable_insert_fast(&ct_priv->zone_ht, &ft->node,
+				     zone_params);
+	if (err)
+		goto err_insert;
+
+	err = nf_flow_table_offload_add_cb(ft->nf_ft,
+					   mlx5_tc_ct_block_flow_offload, ft);
+	if (err)
+		goto err_add_cb;
+
+	return ft;
+
+err_add_cb:
+	rhashtable_remove_fast(&ct_priv->zone_ht, &ft->node, zone_params);
+err_insert:
+	rhashtable_destroy(&ft->ct_entries_ht);
+err_init:
+	kfree(ft);
+	return ERR_PTR(err);
+}
+
+static void
+mlx5_tc_ct_flush_ft(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft)
+{
+	struct mlx5_ct_entry *entry;
+
+	list_for_each_entry(entry, &ft->ct_entries_list, list)
+		mlx5_tc_ct_entry_del_rules(ft->ct_priv, entry);
+}
+
+static void
+mlx5_tc_ct_del_ft_cb(struct mlx5_tc_ct_priv *ct_priv, struct mlx5_ct_ft *ft)
+{
+	if (!refcount_dec_and_test(&ft->refcount))
+		return;
+
+	nf_flow_table_offload_del_cb(ft->nf_ft,
+				     mlx5_tc_ct_block_flow_offload, ft);
+	mlx5_tc_ct_flush_ft(ct_priv, ft);
+	rhashtable_remove_fast(&ct_priv->zone_ht, &ft->node, zone_params);
+	rhashtable_destroy(&ft->ct_entries_ht);
+	kfree(ft);
+}
+
+/* We translate the tc filter with CT action to the following HW model:
+ *
+ * +-------------------+      +--------------------+    +--------------+
+ * + pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct      +----->
+ * + original match    +  |   + tuple + zone match +  | + fte_id match +  |
+ * +-------------------+  |   +--------------------+  | +--------------+  |
+ *                        v                           v                   v
+ *                set chain miss mapping  set mark             original
+ *                set fte_id              set label            filter
+ *                set zone                set established      actions
+ *                set tunnel_id           do nat (if needed)
+ *                do decap
+ */
+static int
+__mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			  struct mlx5e_tc_flow *flow,
+			  struct mlx5_flow_spec *orig_spec,
+			  struct mlx5_esw_flow_attr *attr,
+			  struct mlx5_flow_handle **flow_rule)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	bool nat = attr->ct_attr.ct_action & TCA_CT_ACT_NAT;
+	struct mlx5e_tc_mod_hdr_acts pre_mod_acts = {};
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_flow_spec post_ct_spec = {};
+	struct mlx5_esw_flow_attr *pre_ct_attr;
+	struct mlx5_modify_hdr *mod_hdr;
+	struct mlx5_flow_handle *rule;
+	struct mlx5_ct_flow *ct_flow;
+	int chain_mapping = 0, err;
+	struct mlx5_ct_ft *ft;
+	u32 fte_id = 1;
+
+	ct_flow = kzalloc(sizeof(*ct_flow), GFP_KERNEL);
+	if (!ct_flow)
+		return -ENOMEM;
+
+	/* Register for CT established events */
+	ft = mlx5_tc_ct_add_ft_cb(ct_priv, attr->ct_attr.zone,
+				  attr->ct_attr.nf_ft);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		ct_dbg("Failed to register to ft callback");
+		goto err_ft;
+	}
+	ct_flow->ft = ft;
+
+	err = idr_alloc_u32(&ct_priv->fte_ids, ct_flow, &fte_id,
+			    MLX5_FTE_ID_MAX, GFP_KERNEL);
+	if (err) {
+		netdev_warn(priv->netdev,
+			    "Failed to allocate fte id, err: %d\n", err);
+		goto err_idr;
+	}
+	ct_flow->fte_id = fte_id;
+
+	/* Base esw attributes of both rules on original rule attribute */
+	pre_ct_attr = &ct_flow->pre_ct_attr;
+	memcpy(pre_ct_attr, attr, sizeof(*attr));
+	memcpy(&ct_flow->post_ct_attr, attr, sizeof(*attr));
+
+	/* Modify the original rule's action to fwd and modify, leave decap */
+	pre_ct_attr->action = attr->action & MLX5_FLOW_CONTEXT_ACTION_DECAP;
+	pre_ct_attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+			       MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+
+	/* Write chain miss tag for miss in ct table as we
+	 * don't go through all prios of this chain as normal tc rules
+	 * miss.
+	 */
+	err = mlx5_esw_chains_get_chain_mapping(esw, attr->chain,
+						&chain_mapping);
+	if (err) {
+		ct_dbg("Failed to get chain register mapping for chain");
+		goto err_get_chain;
+	}
+	ct_flow->chain_mapping = chain_mapping;
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+					CHAIN_TO_REG, chain_mapping);
+	if (err) {
+		ct_dbg("Failed to set chain register mapping");
+		goto err_mapping;
+	}
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts, ZONE_TO_REG,
+					attr->ct_attr.zone &
+					MLX5_CT_ZONE_MASK);
+	if (err) {
+		ct_dbg("Failed to set zone register mapping");
+		goto err_mapping;
+	}
+
+	err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+					FTEID_TO_REG, fte_id);
+	if (err) {
+		ct_dbg("Failed to set fte_id register mapping");
+		goto err_mapping;
+	}
+
+	/* If original flow is decap, we do it before going into ct table
+	 * so add a rewrite for the tunnel match_id.
+	 */
+	if ((pre_ct_attr->action & MLX5_FLOW_CONTEXT_ACTION_DECAP) &&
+	    attr->chain == 0) {
+		u32 tun_id = mlx5e_tc_get_flow_tun_id(flow);
+
+		err = mlx5e_tc_match_to_reg_set(esw->dev, &pre_mod_acts,
+						TUNNEL_TO_REG,
+						tun_id);
+		if (err) {
+			ct_dbg("Failed to set tunnel register mapping");
+			goto err_mapping;
+		}
+	}
+
+	mod_hdr = mlx5_modify_header_alloc(esw->dev,
+					   MLX5_FLOW_NAMESPACE_FDB,
+					   pre_mod_acts.num_actions,
+					   pre_mod_acts.actions);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		ct_dbg("Failed to create pre ct mod hdr");
+		goto err_mapping;
+	}
+	pre_ct_attr->modify_hdr = mod_hdr;
+
+	/* Post ct rule matches on fte_id and executes original rule's
+	 * tc rule action
+	 */
+	mlx5e_tc_match_to_reg_match(&post_ct_spec, FTEID_TO_REG,
+				    fte_id, MLX5_FTE_ID_MASK);
+
+	/* Put post_ct rule on post_ct fdb */
+	ct_flow->post_ct_attr.chain = 0;
+	ct_flow->post_ct_attr.prio = 0;
+	ct_flow->post_ct_attr.fdb = ct_priv->post_ct;
+
+	ct_flow->post_ct_attr.inner_match_level = MLX5_MATCH_NONE;
+	ct_flow->post_ct_attr.outer_match_level = MLX5_MATCH_NONE;
+	ct_flow->post_ct_attr.action &= ~(MLX5_FLOW_CONTEXT_ACTION_DECAP);
+	rule = mlx5_eswitch_add_offloaded_rule(esw, &post_ct_spec,
+					       &ct_flow->post_ct_attr);
+	ct_flow->post_ct_rule = rule;
+	if (IS_ERR(ct_flow->post_ct_rule)) {
+		err = PTR_ERR(ct_flow->post_ct_rule);
+		ct_dbg("Failed to add post ct rule");
+		goto err_insert_post_ct;
+	}
+
+	/* Change original rule to point to ct table */
+	pre_ct_attr->dest_chain = 0;
+	pre_ct_attr->dest_ft = nat ? ct_priv->ct_nat : ct_priv->ct;
+	ct_flow->pre_ct_rule = mlx5_eswitch_add_offloaded_rule(esw,
+							       orig_spec,
+							       pre_ct_attr);
+	if (IS_ERR(ct_flow->pre_ct_rule)) {
+		err = PTR_ERR(ct_flow->pre_ct_rule);
+		ct_dbg("Failed to add pre ct rule");
+		goto err_insert_orig;
+	}
+
+	attr->ct_attr.ct_flow = ct_flow;
+	*flow_rule = ct_flow->post_ct_rule;
+	dealloc_mod_hdr_actions(&pre_mod_acts);
+
+	return 0;
+
+err_insert_orig:
+	mlx5_eswitch_del_offloaded_rule(ct_priv->esw, ct_flow->post_ct_rule,
+					&ct_flow->post_ct_attr);
+err_insert_post_ct:
+	mlx5_modify_header_dealloc(priv->mdev, pre_ct_attr->modify_hdr);
+err_mapping:
+	dealloc_mod_hdr_actions(&pre_mod_acts);
+	mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
+err_get_chain:
+	idr_remove(&ct_priv->fte_ids, fte_id);
+err_idr:
+	mlx5_tc_ct_del_ft_cb(ct_priv, ft);
+err_ft:
+	kfree(ct_flow);
+	netdev_warn(priv->netdev, "Failed to offload ct flow, err %d\n", err);
+	return err;
+}
+
+static int
+__mlx5_tc_ct_flow_offload_clear(struct mlx5e_priv *priv,
+				struct mlx5e_tc_flow *flow,
+				struct mlx5_flow_spec *orig_spec,
+				struct mlx5_esw_flow_attr *attr,
+				struct mlx5e_tc_mod_hdr_acts *mod_acts,
+				struct mlx5_flow_handle **flow_rule)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	struct mlx5_eswitch *esw = ct_priv->esw;
+	struct mlx5_esw_flow_attr *pre_ct_attr;
+	struct mlx5_modify_hdr *mod_hdr;
+	struct mlx5_flow_handle *rule;
+	struct mlx5_ct_flow *ct_flow;
+	int err;
+
+	ct_flow = kzalloc(sizeof(*ct_flow), GFP_KERNEL);
+	if (!ct_flow)
+		return -ENOMEM;
+
+	/* Base esw attributes on original rule attribute */
+	pre_ct_attr = &ct_flow->pre_ct_attr;
+	memcpy(pre_ct_attr, attr, sizeof(*attr));
+
+	err = mlx5_tc_ct_entry_set_registers(ct_priv, mod_acts, 0, 0, 0, 0);
+	if (err) {
+		ct_dbg("Failed to set register for ct clear");
+		goto err_set_registers;
+	}
+
+	mod_hdr = mlx5_modify_header_alloc(esw->dev,
+					   MLX5_FLOW_NAMESPACE_FDB,
+					   mod_acts->num_actions,
+					   mod_acts->actions);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		ct_dbg("Failed to create ct clear mod hdr");
+		goto err_set_registers;
+	}
+
+	dealloc_mod_hdr_actions(mod_acts);
+	pre_ct_attr->modify_hdr = mod_hdr;
+	pre_ct_attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+
+	rule = mlx5_eswitch_add_offloaded_rule(esw, orig_spec, pre_ct_attr);
+	if (IS_ERR(rule)) {
+		err = PTR_ERR(rule);
+		ct_dbg("Failed to add ct clear rule");
+		goto err_insert;
+	}
+
+	attr->ct_attr.ct_flow = ct_flow;
+	ct_flow->pre_ct_rule = rule;
+	*flow_rule = rule;
+
+	return 0;
+
+err_insert:
+	mlx5_modify_header_dealloc(priv->mdev, mod_hdr);
+err_set_registers:
+	netdev_warn(priv->netdev,
+		    "Failed to offload ct clear flow, err %d\n", err);
+	return err;
+}
+
+struct mlx5_flow_handle *
+mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			struct mlx5e_tc_flow *flow,
+			struct mlx5_flow_spec *spec,
+			struct mlx5_esw_flow_attr *attr,
+			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
+{
+	bool clear_action = attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR;
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	struct mlx5_flow_handle *rule;
+	int err;
+
+	if (!ct_priv)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mutex_lock(&ct_priv->control_lock);
+	if (clear_action)
+		err = __mlx5_tc_ct_flow_offload_clear(priv, flow, spec, attr,
+						      mod_hdr_acts, &rule);
+	else
+		err = __mlx5_tc_ct_flow_offload(priv, flow, spec, attr,
+						&rule);
+	mutex_unlock(&ct_priv->control_lock);
+	if (err)
+		return ERR_PTR(err);
+
+	return rule;
+}
+
+static void
+__mlx5_tc_ct_delete_flow(struct mlx5_tc_ct_priv *ct_priv,
+			 struct mlx5_ct_flow *ct_flow)
+{
+	struct mlx5_esw_flow_attr *pre_ct_attr = &ct_flow->pre_ct_attr;
+	struct mlx5_eswitch *esw = ct_priv->esw;
+
+	mlx5_eswitch_del_offloaded_rule(esw, ct_flow->pre_ct_rule,
+					pre_ct_attr);
+	mlx5_modify_header_dealloc(esw->dev, pre_ct_attr->modify_hdr);
+
+	if (ct_flow->post_ct_rule) {
+		mlx5_eswitch_del_offloaded_rule(esw, ct_flow->post_ct_rule,
+						&ct_flow->post_ct_attr);
+		mlx5_esw_chains_put_chain_mapping(esw, ct_flow->chain_mapping);
+		idr_remove(&ct_priv->fte_ids, ct_flow->fte_id);
+		mlx5_tc_ct_del_ft_cb(ct_priv, ct_flow->ft);
+	}
+
+	kfree(ct_flow);
+}
+
+void
+mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow,
+		       struct mlx5_esw_flow_attr *attr)
+{
+	struct mlx5_tc_ct_priv *ct_priv = mlx5_tc_ct_get_ct_priv(priv);
+	struct mlx5_ct_flow *ct_flow = attr->ct_attr.ct_flow;
+
+	/* We are called on error to clean up stuff from parsing
+	 * but we don't have anything for now
+	 */
+	if (!ct_flow)
+		return;
+
+	mutex_lock(&ct_priv->control_lock);
+	__mlx5_tc_ct_delete_flow(ct_priv, ct_flow);
+	mutex_unlock(&ct_priv->control_lock);
+}
+
+static int
+mlx5_tc_ct_init_check_support(struct mlx5_eswitch *esw,
+			      const char **err_msg)
+{
+#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	/* cannot restore chain ID on HW miss */
+
+	*err_msg = "tc skb extension missing";
+	return -EOPNOTSUPP;
+#endif
+
+	if (!MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ignore_flow_level)) {
+		*err_msg = "firmware level support is missing";
+		return -EOPNOTSUPP;
+	}
+
+	if (!mlx5_eswitch_vlan_actions_supported(esw->dev, 1)) {
+		/* vlan workaround should be avoided for multi chain rules.
+		 * This is just a sanity check as pop vlan action should
+		 * be supported by any FW that supports ignore_flow_level
+		 */
+
+		*err_msg = "firmware vlan actions support is missing";
+		return -EOPNOTSUPP;
+	}
+
+	if (!MLX5_CAP_ESW_FLOWTABLE(esw->dev,
+				    fdb_modify_header_fwd_to_table)) {
+		/* CT always writes to registers which are mod header actions.
+		 * Therefore, mod header and goto is required
+		 */
+
+		*err_msg = "firmware fwd and modify support is missing";
+		return -EOPNOTSUPP;
+	}
+
+	if (!mlx5_eswitch_reg_c1_loopback_enabled(esw)) {
+		*err_msg = "register loopback isn't supported";
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static void
+mlx5_tc_ct_init_err(struct mlx5e_rep_priv *rpriv, const char *msg, int err)
+{
+	if (msg)
+		netdev_warn(rpriv->netdev,
+			    "tc ct offload not supported, %s, err: %d\n",
+			    msg, err);
+	else
+		netdev_warn(rpriv->netdev,
+			    "tc ct offload not supported, err: %d\n",
+			    err);
+}
+
+int
+mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+	struct mlx5_tc_ct_priv *ct_priv;
+	struct mlx5e_rep_priv *rpriv;
+	struct mlx5_eswitch *esw;
+	struct mlx5e_priv *priv;
+	const char *msg;
+	int err;
+
+	rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
+	priv = netdev_priv(rpriv->netdev);
+	esw = priv->mdev->priv.eswitch;
+
+	err = mlx5_tc_ct_init_check_support(esw, &msg);
+	if (err) {
+		mlx5_tc_ct_init_err(rpriv, msg, err);
+		goto err_support;
+	}
+
+	ct_priv = kzalloc(sizeof(*ct_priv), GFP_KERNEL);
+	if (!ct_priv) {
+		mlx5_tc_ct_init_err(rpriv, NULL, -ENOMEM);
+		goto err_alloc;
+	}
+
+	ct_priv->esw = esw;
+	ct_priv->netdev = rpriv->netdev;
+	ct_priv->ct = mlx5_esw_chains_create_global_table(esw);
+	if (IS_ERR(ct_priv->ct)) {
+		err = PTR_ERR(ct_priv->ct);
+		mlx5_tc_ct_init_err(rpriv, "failed to create ct table", err);
+		goto err_ct_tbl;
+	}
+
+	ct_priv->ct_nat = mlx5_esw_chains_create_global_table(esw);
+	if (IS_ERR(ct_priv->ct_nat)) {
+		err = PTR_ERR(ct_priv->ct_nat);
+		mlx5_tc_ct_init_err(rpriv, "failed to create ct nat table",
+				    err);
+		goto err_ct_nat_tbl;
+	}
+
+	ct_priv->post_ct = mlx5_esw_chains_create_global_table(esw);
+	if (IS_ERR(ct_priv->post_ct)) {
+		err = PTR_ERR(ct_priv->post_ct);
+		mlx5_tc_ct_init_err(rpriv, "failed to create post ct table",
+				    err);
+		goto err_post_ct_tbl;
+	}
+
+	idr_init(&ct_priv->fte_ids);
+	idr_init(&ct_priv->tuple_ids);
+	mutex_init(&ct_priv->control_lock);
+	rhashtable_init(&ct_priv->zone_ht, &zone_params);
+
+	/* Done, set ct_priv to know it is initialized */
+	uplink_priv->ct_priv = ct_priv;
+
+	return 0;
+
+err_post_ct_tbl:
+	mlx5_esw_chains_destroy_global_table(esw, ct_priv->ct_nat);
+err_ct_nat_tbl:
+	mlx5_esw_chains_destroy_global_table(esw, ct_priv->ct);
+err_ct_tbl:
+	kfree(ct_priv);
+err_alloc:
+err_support:
+
+	return 0;
+}
+
+void
+mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+	struct mlx5_tc_ct_priv *ct_priv = uplink_priv->ct_priv;
+
+	if (!ct_priv)
+		return;
+
+	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->post_ct);
+	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct_nat);
+	mlx5_esw_chains_destroy_global_table(ct_priv->esw, ct_priv->ct);
+
+	rhashtable_destroy(&ct_priv->zone_ht);
+	mutex_destroy(&ct_priv->control_lock);
+	idr_destroy(&ct_priv->tuple_ids);
+	idr_destroy(&ct_priv->fte_ids);
+	kfree(ct_priv);
+
+	uplink_priv->ct_priv = NULL;
+}
+
+bool
+mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+			 struct sk_buff *skb, u32 tupleid)
+{
+	struct mlx5_tc_ct_priv *ct_priv = uplink_priv->ct_priv;
+	struct mlx5_ct_zone_rule *zone_rule;
+	struct mlx5_ct_entry *entry;
+
+	if (!ct_priv || !tupleid)
+		return true;
+
+	zone_rule = idr_find(&ct_priv->tuple_ids, tupleid);
+	if (!zone_rule)
+		return false;
+
+	entry = container_of(zone_rule, struct mlx5_ct_entry,
+			     zone_rules[zone_rule->nat]);
+	tcf_ct_flow_table_restore_skb(skb, entry->restore_cookie);
+
+	return true;
+}
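The multi-table model in the comment above (pre_ct sets metadata and jumps to the CT table; the CT table sets connection state; post_ct matches the fte_id and runs the original actions) can be illustrated with a small userspace sketch. This is not driver code — all names here (`struct pkt`, `run_pipeline`, the register fields) are hypothetical stand-ins for the HW reg_c metadata:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the HW metadata registers (reg_c_*). */
struct pkt {
	uint32_t chain_miss_tag; /* chain to resume in SW on CT miss */
	uint16_t zone;
	uint32_t fte_id;         /* ties the pre_ct rule to its post_ct rule */
	uint32_t mark;
	bool established;
	bool delivered;          /* original filter actions executed */
};

/* pre_ct: original match already happened; set metadata, "goto" CT table */
static void pre_ct_stage(struct pkt *p, uint32_t chain, uint16_t zone,
			 uint32_t fte_id)
{
	p->chain_miss_tag = chain;
	p->zone = zone;
	p->fte_id = fte_id;
}

/* CT table: a hit on (tuple, zone) writes connection state */
static bool ct_stage(struct pkt *p, bool tuple_known)
{
	if (!tuple_known)
		return false; /* miss: SW continues at chain_miss_tag */
	p->mark = 0x1234;
	p->established = true;
	return true;
}

/* post_ct: match on fte_id and run the original rule's actions */
static void post_ct_stage(struct pkt *p, uint32_t fte_id)
{
	if (p->fte_id == fte_id)
		p->delivered = true;
}

uint32_t run_pipeline(bool tuple_known)
{
	struct pkt p = {0};

	pre_ct_stage(&p, 2, 5, 42);
	if (ct_stage(&p, tuple_known))
		post_ct_stage(&p, 42);
	return p.delivered ? p.mark : p.chain_miss_tag;
}
```

A CT hit flows through all three stages and executes the original actions; a miss leaves only the chain tag behind, which is what the restore path later uses to resume software processing.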
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h (+171)
···
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018 Mellanox Technologies. */
+
+#ifndef __MLX5_EN_TC_CT_H__
+#define __MLX5_EN_TC_CT_H__
+
+#include <net/pkt_cls.h>
+#include <linux/mlx5/fs.h>
+#include <net/tc_act/tc_ct.h>
+
+struct mlx5_esw_flow_attr;
+struct mlx5e_tc_mod_hdr_acts;
+struct mlx5_rep_uplink_priv;
+struct mlx5e_tc_flow;
+struct mlx5e_priv;
+
+struct mlx5_ct_flow;
+
+struct nf_flowtable;
+
+struct mlx5_ct_attr {
+	u16 zone;
+	u16 ct_action;
+	struct mlx5_ct_flow *ct_flow;
+	struct nf_flowtable *nf_ft;
+};
+
+#define zone_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_2,\
+	.moffset = 0,\
+	.mlen = 2,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_2) + 2,\
+}
+
+#define ctstate_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_2,\
+	.moffset = 2,\
+	.mlen = 2,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_2),\
+}
+
+#define mark_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_3,\
+	.moffset = 0,\
+	.mlen = 4,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_3),\
+}
+
+#define labels_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_4,\
+	.moffset = 0,\
+	.mlen = 4,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_4),\
+}
+
+#define fteid_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_5,\
+	.moffset = 0,\
+	.mlen = 4,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_5),\
+}
+
+#define tupleid_to_reg_ct {\
+	.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_1,\
+	.moffset = 0,\
+	.mlen = 3,\
+	.soffset = MLX5_BYTE_OFF(fte_match_param,\
+				 misc_parameters_2.metadata_reg_c_1),\
+}
+
+#define TUPLE_ID_BITS (mlx5e_tc_attr_to_reg_mappings[TUPLEID_TO_REG].mlen * 8)
+#define TUPLE_ID_MAX GENMASK(TUPLE_ID_BITS - 1, 0)
+
+#if IS_ENABLED(CONFIG_MLX5_TC_CT)
+
+int
+mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv);
+void
+mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv);
+
+int
+mlx5_tc_ct_parse_match(struct mlx5e_priv *priv,
+		       struct mlx5_flow_spec *spec,
+		       struct flow_cls_offload *f,
+		       struct netlink_ext_ack *extack);
+int
+mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+			struct mlx5_esw_flow_attr *attr,
+			const struct flow_action_entry *act,
+			struct netlink_ext_ack *extack);
+
+struct mlx5_flow_handle *
+mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			struct mlx5e_tc_flow *flow,
+			struct mlx5_flow_spec *spec,
+			struct mlx5_esw_flow_attr *attr,
+			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
+void
+mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
+		       struct mlx5e_tc_flow *flow,
+		       struct mlx5_esw_flow_attr *attr);
+
+bool
+mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+			 struct sk_buff *skb, u32 tupleid);
+
+#else /* CONFIG_MLX5_TC_CT */
+
+static inline int
+mlx5_tc_ct_init(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+	return 0;
+}
+
+static inline void
+mlx5_tc_ct_clean(struct mlx5_rep_uplink_priv *uplink_priv)
+{
+}
+
+static inline int
+mlx5_tc_ct_parse_match(struct mlx5e_priv *priv,
+		       struct mlx5_flow_spec *spec,
+		       struct flow_cls_offload *f,
+		       struct netlink_ext_ack *extack)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int
+mlx5_tc_ct_parse_action(struct mlx5e_priv *priv,
+			struct mlx5_esw_flow_attr *attr,
+			const struct flow_action_entry *act,
+			struct netlink_ext_ack *extack)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline struct mlx5_flow_handle *
+mlx5_tc_ct_flow_offload(struct mlx5e_priv *priv,
+			struct mlx5e_tc_flow *flow,
+			struct mlx5_flow_spec *spec,
+			struct mlx5_esw_flow_attr *attr,
+			struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void
+mlx5_tc_ct_delete_flow(struct mlx5e_priv *priv,
+		       struct mlx5e_tc_flow *flow,
+		       struct mlx5_esw_flow_attr *attr)
+{
+}
+
+static inline bool
+mlx5e_tc_ct_restore_flow(struct mlx5_rep_uplink_priv *uplink_priv,
+			 struct sk_buff *skb, u32 tupleid)
+{
+	if (!tupleid)
+		return true;
+
+	return false;
+}
+
+#endif /* !IS_ENABLED(CONFIG_MLX5_TC_CT) */
+#endif /* __MLX5_EN_TC_CT_H__ */
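The `*_to_reg_ct` mappings above pack several CT properties into 32-bit metadata registers: zone and ctstate split reg_c_2 (two bytes each, at byte offsets 0 and 2), mark, labels and fte_id each take a full register, and the tuple id uses three bytes of reg_c_1. A userspace sketch of that byte-offset packing scheme — illustrative only, with `reg_set`/`reg_get` as hypothetical helpers mirroring the `moffset`/`mlen` fields:

```c
#include <stdint.h>

/* Write `val`, `mlen` bytes wide, at byte offset `moffset` of a register,
 * mirroring the moffset/mlen fields of the register mappings. */
static uint32_t reg_set(uint32_t reg, unsigned int moffset,
			unsigned int mlen, uint32_t val)
{
	uint32_t mask = (mlen == 4) ?
		0xffffffffu :
		((1u << (mlen * 8)) - 1) << (moffset * 8);

	return (reg & ~mask) | ((val << (moffset * 8)) & mask);
}

/* Read back an mlen-byte field from byte offset moffset. */
static uint32_t reg_get(uint32_t reg, unsigned int moffset, unsigned int mlen)
{
	uint32_t val = reg >> (moffset * 8);

	return (mlen == 4) ? val : (val & ((1u << (mlen * 8)) - 1));
}
```

With these, reg_c_2 holds zone in its low 16 bits and ctstate in its high 16 bits, which is why `zone_to_reg_ct` uses `moffset = 0` and `ctstate_to_reg_ct` uses `moffset = 2` on the same register.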
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c (+1)
···
 	}
 
 	ft_attr.max_fte = 0; /* Empty table, miss rule will always point to next table */
+	ft_attr.prio = 1;
 	ft_attr.level = 1;
 
 	rpriv->root_ft = mlx5_create_flow_table(ns, &ft_attr);
drivers/net/ethernet/mellanox/mlx5/core/en_rep.h (+3)
···
 	unsigned long min_interval; /* jiffies */
 };
 
+struct mlx5_tc_ct_priv;
 struct mlx5_rep_uplink_priv {
 	/* Filters DB - instantiated by the uplink representor and shared by
 	 * the uplink's VFs
···
 	struct mapping_ctx *tunnel_mapping;
 	/* maps tun_enc_opts to a unique id */
 	struct mapping_ctx *tunnel_enc_opts_mapping;
+
+	struct mlx5_tc_ct_priv *ct_priv;
 };
 
 struct mlx5e_rep_priv {
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c (+103 -17)
···
 #include "en/port.h"
 #include "en/tc_tun.h"
 #include "en/mapping.h"
+#include "en/tc_ct.h"
 #include "lib/devcom.h"
 #include "lib/geneve.h"
 #include "diag/en_tc_tracepoint.h"
···
 	MLX5E_TC_FLOW_FLAG_DUP = MLX5E_TC_FLOW_BASE + 4,
 	MLX5E_TC_FLOW_FLAG_NOT_READY = MLX5E_TC_FLOW_BASE + 5,
 	MLX5E_TC_FLOW_FLAG_DELETED = MLX5E_TC_FLOW_BASE + 6,
+	MLX5E_TC_FLOW_FLAG_CT = MLX5E_TC_FLOW_BASE + 7,
 };
 
 #define MLX5E_TC_MAX_SPLITS 1
···
 		.soffset = MLX5_BYTE_OFF(fte_match_param,
 					 misc_parameters_2.metadata_reg_c_1),
 	},
+	[ZONE_TO_REG] = zone_to_reg_ct,
+	[CTSTATE_TO_REG] = ctstate_to_reg_ct,
+	[MARK_TO_REG] = mark_to_reg_ct,
+	[LABELS_TO_REG] = labels_to_reg_ct,
+	[FTEID_TO_REG] = fteid_to_reg_ct,
+	[TUPLEID_TO_REG] = tupleid_to_reg_ct,
 };
 
 static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow);
···
 			   struct mlx5_flow_spec *spec,
 			   struct mlx5_esw_flow_attr *attr)
 {
+	struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts;
 	struct mlx5_flow_handle *rule;
+
+	if (flow_flag_test(flow, CT)) {
+		mod_hdr_acts = &attr->parse_attr->mod_hdr_acts;
+
+		return mlx5_tc_ct_flow_offload(flow->priv, flow, spec, attr,
+					       mod_hdr_acts);
+	}
 
 	rule = mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
 	if (IS_ERR(rule))
···
 static void
 mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
 			     struct mlx5e_tc_flow *flow,
-			     struct mlx5_esw_flow_attr *attr)
+			     struct mlx5_esw_flow_attr *attr)
 {
 	flow_flag_clear(flow, OFFLOADED);
+
+	if (flow_flag_test(flow, CT)) {
+		mlx5_tc_ct_delete_flow(flow->priv, flow, attr);
+		return;
+	}
 
 	if (attr->split_count)
 		mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
···
 					   enc_opts_id);
 }
 
+u32 mlx5e_tc_get_flow_tun_id(struct mlx5e_tc_flow *flow)
+{
+	return flow->tunnel_id;
+}
+
 static int parse_tunnel_attr(struct mlx5e_priv *priv,
 			     struct mlx5e_tc_flow *flow,
 			     struct mlx5_flow_spec *spec,
···
 	      BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) |
 	      BIT(FLOW_DISSECTOR_KEY_TCP) |
 	      BIT(FLOW_DISSECTOR_KEY_IP) |
+	      BIT(FLOW_DISSECTOR_KEY_CT) |
 	      BIT(FLOW_DISSECTOR_KEY_ENC_IP) |
 	      BIT(FLOW_DISSECTOR_KEY_ENC_OPTS))) {
 		NL_SET_ERR_MSG_MOD(extack, "Unsupported key");
···
 	__u8 hop_limit;
 };
 
-static bool is_action_keys_supported(const struct flow_action_entry *act)
+static int is_action_keys_supported(const struct flow_action_entry *act,
+				    bool ct_flow, bool *modify_ip_header,
+				    struct netlink_ext_ack *extack)
 {
 	u32 mask, offset;
 	u8 htype;
···
 		if (offset != offsetof(struct iphdr, ttl) ||
 		    ttl_word->protocol ||
 		    ttl_word->check) {
-			return true;
+			*modify_ip_header = true;
+		}
+
+		if (ct_flow && offset >= offsetof(struct iphdr, saddr)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "can't offload re-write of ipv4 address with action ct");
+			return -EOPNOTSUPP;
 		}
 	} else if (htype == FLOW_ACT_MANGLE_HDR_TYPE_IP6) {
 		struct ipv6_hoplimit_word *hoplimit_word =
···
 		if (offset != offsetof(struct ipv6hdr, payload_len) ||
 		    hoplimit_word->payload_len ||
 		    hoplimit_word->nexthdr) {
-			return true;
+			*modify_ip_header = true;
 		}
+
+		if (ct_flow && offset >= offsetof(struct ipv6hdr, saddr)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "can't offload re-write of ipv6 address with action ct");
+			return -EOPNOTSUPP;
+		}
+	} else if (ct_flow && (htype == FLOW_ACT_MANGLE_HDR_TYPE_TCP ||
+			       htype == FLOW_ACT_MANGLE_HDR_TYPE_UDP)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "can't offload re-write of transport header ports with action ct");
+		return -EOPNOTSUPP;
 	}
-	return false;
+
+	return 0;
 }
 
 static bool modify_header_match_supported(struct mlx5_flow_spec *spec,
 					  struct flow_action *flow_action,
-					  u32 actions,
+					  u32 actions, bool ct_flow,
 					  struct netlink_ext_ack *extack)
 {
 	const struct flow_action_entry *act;
···
 	void *headers_v;
 	u16 ethertype;
 	u8 ip_proto;
-	int i;
+	int i, err;
 
 	headers_v = get_match_headers_value(actions, spec);
 	ethertype = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ethertype);
···
 		    act->id != FLOW_ACTION_ADD)
 			continue;
 
-		if (is_action_keys_supported(act)) {
-			modify_ip_header = true;
-			break;
-		}
+		err = is_action_keys_supported(act, ct_flow,
+					       &modify_ip_header, extack);
+		if (err)
+			return err;
 	}
 
 	ip_proto = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ip_protocol);
···
 				struct netlink_ext_ack *extack)
 {
 	struct net_device *filter_dev = parse_attr->filter_dev;
-	bool drop_action, pop_action;
+	bool drop_action, pop_action, ct_flow;
 	u32 actions;
 
-	if (mlx5e_is_eswitch_flow(flow))
+	ct_flow = flow_flag_test(flow, CT);
+	if (mlx5e_is_eswitch_flow(flow)) {
 		actions = flow->esw_attr->action;
-	else
+
+		if (flow->esw_attr->split_count && ct_flow) {
+			/* All registers used by ct are cleared when using
+			 * split rules.
+			 */
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Can't offload mirroring with action ct");
+			return -EOPNOTSUPP;
+		}
+	} else {
 		actions = flow->nic_attr->action;
+	}
 
 	drop_action = actions & MLX5_FLOW_CONTEXT_ACTION_DROP;
 	pop_action = actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
···
 	if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		return modify_header_match_supported(&parse_attr->spec,
 						     flow_action, actions,
-						     extack);
+						     ct_flow, extack);
 
 	return true;
 }
···
 		action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
 		attr->dest_chain = act->chain_index;
 		break;
+	case FLOW_ACTION_CT:
+		err = mlx5_tc_ct_parse_action(priv, attr, act, extack);
+		if (err)
+			return err;
+
+		flow_flag_set(flow, CT);
+		break;
 	default:
 		NL_SET_ERR_MSG_MOD(extack, "The offload action is not supported");
 		return -EOPNOTSUPP;
···
 		goto err_free;
 
 	err = parse_tc_fdb_actions(priv, &rule->action, flow, extack);
+	if (err)
+		goto err_free;
+
+	err = mlx5_tc_ct_parse_match(priv, &parse_attr->spec, f, extack);
 	if (err)
 		goto err_free;
···
 		goto errout;
 	}
 
-	if (mlx5e_is_offloaded_flow(flow)) {
+	if (mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, CT)) {
 		counter = mlx5e_tc_get_counter(flow);
 		if (!counter)
 			goto errout;
···
 	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
 	priv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
 
+	err = mlx5_tc_ct_init(uplink_priv);
+	if (err)
+		goto err_ct;
+
 	mapping = mapping_create(sizeof(struct tunnel_match_key),
 				 TUNNEL_INFO_BITS_MASK, true);
 	if (IS_ERR(mapping)) {
···
 err_enc_opts_mapping:
 	mapping_destroy(uplink_priv->tunnel_mapping);
 err_tun_mapping:
+	mlx5_tc_ct_clean(uplink_priv);
+err_ct:
 	netdev_warn(priv->netdev,
 		    "Failed to initialize tc (eswitch), err: %d", err);
 	return err;
···
 	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
 	mapping_destroy(uplink_priv->tunnel_enc_opts_mapping);
 	mapping_destroy(uplink_priv->tunnel_mapping);
+
+	mlx5_tc_ct_clean(uplink_priv);
 }
 
 int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags)
···
 			     struct mlx5e_tc_update_priv *tc_priv)
 {
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
-	u32 chain = 0, reg_c0, reg_c1, tunnel_id;
+	u32 chain = 0, reg_c0, reg_c1, tunnel_id, tuple_id;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
 	struct tc_skb_ext *tc_skb_ext;
 	struct mlx5_eswitch *esw;
 	struct mlx5e_priv *priv;
···
 		}
 
 		tc_skb_ext->chain = chain;
+
+		tuple_id = reg_c1 & TUPLE_ID_MAX;
+
+		uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+		uplink_priv = &uplink_rpriv->uplink_priv;
+		if (!mlx5e_tc_ct_restore_flow(uplink_priv, skb, tuple_id))
+			return false;
 	}
 
 	tunnel_moffset = mlx5e_tc_attr_to_reg_mappings[TUNNEL_TO_REG].moffset;
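In the restore path above, reg_c_1 is shared: the tuple id occupies only the low bytes (TUPLEID_TO_REG has `mlen` 3, so TUPLE_ID_BITS is 24 and TUPLE_ID_MAX is GENMASK(23, 0)), which is why the driver masks reg_c_1 before the tuple lookup. A minimal userspace illustration of that masking, with GENMASK expanded by hand:

```c
#include <stdint.h>

#define TUPLE_ID_BITS 24                          /* TUPLEID_TO_REG.mlen * 8 */
#define TUPLE_ID_MAX  ((1u << TUPLE_ID_BITS) - 1) /* GENMASK(23, 0) */

/* Extract the tuple id from the reg_c_1 value delivered with the CQE;
 * bits above TUPLE_ID_BITS are used for other metadata and must be masked. */
static uint32_t tuple_id_from_reg_c1(uint32_t reg_c1)
{
	return reg_c1 & TUPLE_ID_MAX;
}
```

A zero tuple id means "no CT state to restore", matching the early-return in mlx5e_tc_ct_restore_flow().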
drivers/net/ethernet/mellanox/mlx5/core/en_tc.h (+9)
···
 enum mlx5e_tc_attr_to_reg {
 	CHAIN_TO_REG,
 	TUNNEL_TO_REG,
+	CTSTATE_TO_REG,
+	ZONE_TO_REG,
+	MARK_TO_REG,
+	LABELS_TO_REG,
+	FTEID_TO_REG,
+	TUPLEID_TO_REG,
 };
 
 struct mlx5e_tc_attr_to_reg_mapping {
···
 			      int namespace,
 			      struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
 void dealloc_mod_hdr_actions(struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
+
+struct mlx5e_tc_flow;
+u32 mlx5e_tc_get_flow_tun_id(struct mlx5e_tc_flow *flow);
 
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h (+6)
···
 #include <linux/mlx5/vport.h>
 #include <linux/mlx5/fs.h>
 #include "lib/mpfs.h"
+#include "en/tc_ct.h"
 
 #define FDB_TC_MAX_CHAIN 3
 #define FDB_FT_CHAIN (FDB_TC_MAX_CHAIN + 1)
···
 
 enum {
 	MLX5_ESWITCH_VPORT_MATCH_METADATA = BIT(0),
+	MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED = BIT(1),
 };
 
 struct mlx5_eswitch {
···
 enum {
 	MLX5_ESW_ATTR_FLAG_VLAN_HANDLED = BIT(0),
 	MLX5_ESW_ATTR_FLAG_SLOW_PATH = BIT(1),
+	MLX5_ESW_ATTR_FLAG_NO_IN_PORT = BIT(2),
 };
 
 struct mlx5_esw_flow_attr {
···
 	u16 prio;
 	u32 dest_chain;
 	u32 flags;
+	struct mlx5_flow_table *fdb;
+	struct mlx5_flow_table *dest_ft;
+	struct mlx5_ct_attr ct_attr;
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 };
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c (+49 -17)
···
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
 		struct mlx5_flow_table *ft;

-		if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) {
+		if (attr->dest_ft) {
+			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
+			dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+			dest[i].ft = attr->dest_ft;
+			i++;
+		} else if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) {
 			flow_act.flags |= FLOW_ACT_IGNORE_FLOW_LEVEL;
 			dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 			dest[i].ft = mlx5_esw_chains_get_tc_end_ft(esw);
···
 	if (split) {
 		fdb = esw_vport_tbl_get(esw, attr);
 	} else {
-		fdb = mlx5_esw_chains_get_table(esw, attr->chain, attr->prio,
-						0);
-		mlx5_eswitch_set_rule_source_port(esw, spec, attr);
+		if (attr->chain || attr->prio)
+			fdb = mlx5_esw_chains_get_table(esw, attr->chain,
+							attr->prio, 0);
+		else
+			fdb = attr->fdb;
+
+		if (!(attr->flags & MLX5_ESW_ATTR_FLAG_NO_IN_PORT))
+			mlx5_eswitch_set_rule_source_port(esw, spec, attr);
 	}
 	if (IS_ERR(fdb)) {
 		rule = ERR_CAST(fdb);
···
 err_add_rule:
 	if (split)
 		esw_vport_tbl_put(esw, attr);
-	else
+	else if (attr->chain || attr->prio)
 		mlx5_esw_chains_put_table(esw, attr->chain, attr->prio, 0);
 err_esw_get:
 	if (!(attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH) && attr->dest_chain)
···
 	} else {
 		if (split)
 			esw_vport_tbl_put(esw, attr);
-		else
+		else if (attr->chain || attr->prio)
 			mlx5_esw_chains_put_table(esw, attr->chain, attr->prio,
 						  0);
 		if (attr->dest_chain)
···
 	mlx5_del_flow_rules(rule);
 }

+static bool mlx5_eswitch_reg_c1_loopback_supported(struct mlx5_eswitch *esw)
+{
+	return MLX5_CAP_ESW_FLOWTABLE(esw->dev, fdb_to_vport_reg_c_id) &
+	       MLX5_FDB_TO_VPORT_REG_C_1;
+}
+
 static int esw_set_passing_vport_metadata(struct mlx5_eswitch *esw, bool enable)
 {
 	u32 out[MLX5_ST_SZ_DW(query_esw_vport_context_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)] = {};
-	u8 fdb_to_vport_reg_c_id;
+	u8 curr, wanted;
 	int err;

-	if (!mlx5_eswitch_vport_match_metadata_enabled(esw))
+	if (!mlx5_eswitch_reg_c1_loopback_supported(esw) &&
+	    !mlx5_eswitch_vport_match_metadata_enabled(esw))
 		return 0;

 	err = mlx5_eswitch_query_esw_vport_context(esw->dev, 0, false,
···
 	if (err)
 		return err;

-	fdb_to_vport_reg_c_id = MLX5_GET(query_esw_vport_context_out, out,
-					 esw_vport_context.fdb_to_vport_reg_c_id);
+	curr = MLX5_GET(query_esw_vport_context_out, out,
+			esw_vport_context.fdb_to_vport_reg_c_id);
+	wanted = MLX5_FDB_TO_VPORT_REG_C_0;
+	if (mlx5_eswitch_reg_c1_loopback_supported(esw))
+		wanted |= MLX5_FDB_TO_VPORT_REG_C_1;

 	if (enable)
-		fdb_to_vport_reg_c_id |= MLX5_FDB_TO_VPORT_REG_C_0 |
-					 MLX5_FDB_TO_VPORT_REG_C_1;
+		curr |= wanted;
 	else
-		fdb_to_vport_reg_c_id &= ~(MLX5_FDB_TO_VPORT_REG_C_0 |
-					   MLX5_FDB_TO_VPORT_REG_C_1);
+		curr &= ~wanted;

 	MLX5_SET(modify_esw_vport_context_in, in,
-		 esw_vport_context.fdb_to_vport_reg_c_id, fdb_to_vport_reg_c_id);
+		 esw_vport_context.fdb_to_vport_reg_c_id, curr);

 	MLX5_SET(modify_esw_vport_context_in, in,
 		 field_select.fdb_to_vport_reg_c_id, 1);

-	return mlx5_eswitch_modify_esw_vport_context(esw->dev, 0, false,
-						     in, sizeof(in));
+	err = mlx5_eswitch_modify_esw_vport_context(esw->dev, 0, false, in,
+						    sizeof(in));
+	if (!err) {
+		if (enable && (curr & MLX5_FDB_TO_VPORT_REG_C_1))
+			esw->flags |= MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED;
+		else
+			esw->flags &= ~MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED;
+	}
+
+	return err;
 }

 static void peer_miss_rules_setup(struct mlx5_eswitch *esw,
···
 	return vport_num >= MLX5_VPORT_FIRST_VF &&
 	       vport_num <= esw->dev->priv.sriov.max_vfs;
 }
+
+bool mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw)
+{
+	return !!(esw->flags & MLX5_ESWITCH_REG_C1_LOOPBACK_ENABLED);
+}
+EXPORT_SYMBOL(mlx5_eswitch_reg_c1_loopback_enabled);

 bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw)
 {
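esw_set_passing_vport_metadata() only toggles the capability bits it actually wants: it builds a `wanted` mask (reg C0, plus reg C1 when loopback is supported), then sets or clears just those bits in the current value, so unrelated bits survive a disable. A small userspace sketch of that pattern, with hypothetical stand-in names for the capability check and the bit values:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for MLX5_FDB_TO_VPORT_REG_C_0/_1; values are illustrative. */
#define REG_C_0 0x1u
#define REG_C_1 0x2u

/* Mirror of the patch's enable/disable bookkeeping: reg_c1_supported plays
 * the role of mlx5_eswitch_reg_c1_loopback_supported(). Only the bits in
 * "wanted" are ever modified in the current register value. */
static uint8_t update_fdb_to_vport(uint8_t curr, int reg_c1_supported,
				   int enable)
{
	uint8_t wanted = REG_C_0;

	if (reg_c1_supported)
		wanted |= REG_C_1;

	return enable ? (uint8_t)(curr | wanted) : (uint8_t)(curr & ~wanted);
}
```

Masking with `wanted` on disable is what lets firmware-owned or future bits in the same field pass through untouched.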
+43
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
···
 	return tc_end_fdb(esw);
 }

+struct mlx5_flow_table *
+mlx5_esw_chains_create_global_table(struct mlx5_eswitch *esw)
+{
+	int chain, prio, level, err;
+
+	if (!fdb_ignore_flow_level_supported(esw)) {
+		err = -EOPNOTSUPP;
+
+		esw_warn(esw->dev,
+			 "Couldn't create global flow table, ignore_flow_level not supported.");
+		goto err_ignore;
+	}
+
+	chain = mlx5_esw_chains_get_chain_range(esw),
+	prio = mlx5_esw_chains_get_prio_range(esw);
+	level = mlx5_esw_chains_get_level_range(esw);
+
+	return mlx5_esw_chains_create_fdb_table(esw, chain, prio, level);
+
+err_ignore:
+	return ERR_PTR(err);
+}
+
+void
+mlx5_esw_chains_destroy_global_table(struct mlx5_eswitch *esw,
+				     struct mlx5_flow_table *ft)
+{
+	mlx5_esw_chains_destroy_fdb_table(esw, ft);
+}
+
 static int
 mlx5_esw_chains_init(struct mlx5_eswitch *esw)
 {
···
 {
 	mlx5_esw_chains_close(esw);
 	mlx5_esw_chains_cleanup(esw);
+}
+
+int
+mlx5_esw_chains_get_chain_mapping(struct mlx5_eswitch *esw, u32 chain,
+				  u32 *chain_mapping)
+{
+	return mapping_add(esw_chains_mapping(esw), &chain, chain_mapping);
+}
+
+int
+mlx5_esw_chains_put_chain_mapping(struct mlx5_eswitch *esw, u32 chain_mapping)
+{
+	return mapping_remove(esw_chains_mapping(esw), chain_mapping);
 }

 int mlx5_eswitch_get_chain_for_tag(struct mlx5_eswitch *esw, u32 tag,
+13
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
···
 struct mlx5_flow_table *
 mlx5_esw_chains_get_tc_end_ft(struct mlx5_eswitch *esw);

+struct mlx5_flow_table *
+mlx5_esw_chains_create_global_table(struct mlx5_eswitch *esw);
+void
+mlx5_esw_chains_destroy_global_table(struct mlx5_eswitch *esw,
+				     struct mlx5_flow_table *ft);
+
+int
+mlx5_esw_chains_get_chain_mapping(struct mlx5_eswitch *esw, u32 chain,
+				  u32 *chain_mapping);
+int
+mlx5_esw_chains_put_chain_mapping(struct mlx5_eswitch *esw,
+				  u32 chain_mapping);
+
 int mlx5_esw_chains_create(struct mlx5_eswitch *esw);
 void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw);
+7
include/linux/mlx5/eswitch.h
···
 enum devlink_eswitch_encap_mode
 mlx5_eswitch_get_encap_mode(const struct mlx5_core_dev *dev);

+bool mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw);
 bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw);

 /* Reg C0 usage:
···
 {
 	return DEVLINK_ESWITCH_ENCAP_MODE_NONE;
 }
+
+static inline bool
+mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw)
+{
+	return false;
+};

 static inline bool
 mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw)
+13
include/net/flow_offload.h
···
 	struct flow_dissector_key_enc_opts *key, *mask;
 };

+struct flow_match_ct {
+	struct flow_dissector_key_ct *key, *mask;
+};
+
 struct flow_rule;

 void flow_rule_match_meta(const struct flow_rule *rule,
···
 			     struct flow_match_enc_keyid *out);
 void flow_rule_match_enc_opts(const struct flow_rule *rule,
 			      struct flow_match_enc_opts *out);
+void flow_rule_match_ct(const struct flow_rule *rule,
+			struct flow_match_ct *out);

 enum flow_action_id {
 	FLOW_ACTION_ACCEPT = 0,
···
 	FLOW_ACTION_SAMPLE,
 	FLOW_ACTION_POLICE,
 	FLOW_ACTION_CT,
+	FLOW_ACTION_CT_METADATA,
 	FLOW_ACTION_MPLS_PUSH,
 	FLOW_ACTION_MPLS_POP,
 	FLOW_ACTION_MPLS_MANGLE,
···
 		struct { /* FLOW_ACTION_CT */
 			int action;
 			u16 zone;
+			struct nf_flowtable *flow_table;
 		} ct;
+		struct {
+			unsigned long cookie;
+			u32 mark;
+			u32 labels[4];
+		} ct_metadata;
 		struct { /* FLOW_ACTION_MPLS_PUSH */
 			u32 label;
 			__be16 proto;
+32
include/net/netfilter/nf_flow_table.h
···
 struct flow_offload;
 enum flow_offload_tuple_dir;

+struct nf_flow_key {
+	struct flow_dissector_key_meta meta;
+	struct flow_dissector_key_control control;
+	struct flow_dissector_key_basic basic;
+	union {
+		struct flow_dissector_key_ipv4_addrs ipv4;
+		struct flow_dissector_key_ipv6_addrs ipv6;
+	};
+	struct flow_dissector_key_tcp tcp;
+	struct flow_dissector_key_ports tp;
+} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
+
+struct nf_flow_match {
+	struct flow_dissector dissector;
+	struct nf_flow_key key;
+	struct nf_flow_key mask;
+};
+
+struct nf_flow_rule {
+	struct nf_flow_match match;
+	struct flow_rule *rule;
+};
+
 struct nf_flowtable_type {
 	struct list_head list;
 	int family;
···
 	struct delayed_work gc_work;
 	unsigned int flags;
 	struct flow_block flow_block;
+	struct mutex flow_block_lock; /* Guards flow_block */
 	possible_net_t net;
 };
···
 struct flow_offload *flow_offload_alloc(struct nf_conn *ct);
 void flow_offload_free(struct flow_offload *flow);

+int nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table,
+				 flow_setup_cb_t *cb, void *cb_priv);
+void nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
+				  flow_setup_cb_t *cb, void *cb_priv);
+
 int flow_offload_route_init(struct flow_offload *flow,
 			    const struct nf_flow_route *route);

 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
+void flow_offload_refresh(struct nf_flowtable *flow_table,
+			  struct flow_offload *flow);
+
 struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable *flow_table,
 						     struct flow_offload_tuple *tuple);
 void nf_flow_table_cleanup(struct net_device *dev);
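`nf_flow_key` is annotated `__aligned(BITS_PER_LONG / 8)` so that a key, its mask, and a match value can be compared word-at-a-time rather than byte-wise. A userspace sketch of that masked long-at-a-time comparison, using a hypothetical two-field key in place of the real dissector layout:

```c
#include <assert.h>
#include <stddef.h>

/* Illustration of why nf_flow_key is long-aligned: an aligned,
 * padding-free key can be viewed as an array of unsigned long and
 * masked/compared one word at a time. This two-field key is a stand-in. */
struct demo_key {
	unsigned long src;
	unsigned long dst;
} __attribute__((aligned(sizeof(unsigned long))));

/* Return 1 when (key & mask) equals want, comparing whole longs. */
static int key_match(const struct demo_key *key, const struct demo_key *mask,
		     const struct demo_key *want)
{
	const unsigned long *k = (const unsigned long *)key;
	const unsigned long *m = (const unsigned long *)mask;
	const unsigned long *w = (const unsigned long *)want;
	size_t i;

	for (i = 0; i < sizeof(*key) / sizeof(unsigned long); i++)
		if ((k[i] & m[i]) != w[i])
			return 0;
	return 1;
}
```

A zeroed mask word wildcards the corresponding field, which is how partial matches fall out of the same loop.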
+17
include/net/tc_act/tc_ct.h
···
 	struct rcu_head rcu;

 	struct tcf_ct_flow_table *ct_ft;
+	struct nf_flowtable *nf_ft;
 };

 struct tcf_ct {
···
 	return to_ct_params(a)->ct_action;
 }

+static inline struct nf_flowtable *tcf_ct_ft(const struct tc_action *a)
+{
+	return to_ct_params(a)->nf_ft;
+}
+
 #else
 static inline uint16_t tcf_ct_zone(const struct tc_action *a) { return 0; }
 static inline int tcf_ct_action(const struct tc_action *a) { return 0; }
+static inline struct nf_flowtable *tcf_ct_ft(const struct tc_action *a)
+{
+	return NULL;
+}
 #endif /* CONFIG_NF_CONNTRACK */
+
+#if IS_ENABLED(CONFIG_NET_ACT_CT)
+void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie);
+#else
+static inline void
+tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie) { }
+#endif

 static inline bool is_tcf_ct(const struct tc_action *a)
 {
+7
net/core/flow_offload.c
···
 }
 EXPORT_SYMBOL(flow_action_cookie_destroy);

+void flow_rule_match_ct(const struct flow_rule *rule,
+			struct flow_match_ct *out)
+{
+	FLOW_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_CT, out);
+}
+EXPORT_SYMBOL(flow_rule_match_ct);
+
 struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
+60
net/netfilter/nf_flow_table_core.c
···
 }
 EXPORT_SYMBOL_GPL(flow_offload_add);

+void flow_offload_refresh(struct nf_flowtable *flow_table,
+			  struct flow_offload *flow)
+{
+	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
+
+	if (likely(!nf_flowtable_hw_offload(flow_table) ||
+		   !test_and_clear_bit(NF_FLOW_HW_REFRESH, &flow->flags)))
+		return;
+
+	nf_flow_offload_add(flow_table, flow);
+}
+EXPORT_SYMBOL_GPL(flow_offload_refresh);
+
 static inline bool nf_flow_has_expired(const struct flow_offload *flow)
 {
 	return nf_flow_timeout_delta(flow->timeout) <= 0;
···
 	nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, flow_table);
 	queue_delayed_work(system_power_efficient_wq, &flow_table->gc_work, HZ);
 }

+int nf_flow_table_offload_add_cb(struct nf_flowtable *flow_table,
+				 flow_setup_cb_t *cb, void *cb_priv)
+{
+	struct flow_block *block = &flow_table->flow_block;
+	struct flow_block_cb *block_cb;
+	int err = 0;
+
+	mutex_lock(&flow_table->flow_block_lock);
+	block_cb = flow_block_cb_lookup(block, cb, cb_priv);
+	if (block_cb) {
+		err = -EEXIST;
+		goto unlock;
+	}
+
+	block_cb = flow_block_cb_alloc(cb, cb_priv, cb_priv, NULL);
+	if (IS_ERR(block_cb)) {
+		err = PTR_ERR(block_cb);
+		goto unlock;
+	}
+
+	list_add_tail(&block_cb->list, &block->cb_list);
+
+unlock:
+	mutex_unlock(&flow_table->flow_block_lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_offload_add_cb);
+
+void nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
+				  flow_setup_cb_t *cb, void *cb_priv)
+{
+	struct flow_block *block = &flow_table->flow_block;
+	struct flow_block_cb *block_cb;
+
+	mutex_lock(&flow_table->flow_block_lock);
+	block_cb = flow_block_cb_lookup(block, cb, cb_priv);
+	if (block_cb)
+		list_del(&block_cb->list);
+	else
+		WARN_ON(true);
+	mutex_unlock(&flow_table->flow_block_lock);
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_offload_del_cb);

 static int nf_flow_nat_port_tcp(struct sk_buff *skb, unsigned int thoff,
 				__be16 port, __be16 new_port)
···
 	INIT_DEFERRABLE_WORK(&flowtable->gc_work, nf_flow_offload_work_gc);
 	flow_block_init(&flowtable->flow_block);
+	mutex_init(&flowtable->flow_block_lock);

 	err = rhashtable_init(&flowtable->rhashtable,
 			      &nf_flow_offload_rhash_params);
···
 	mutex_lock(&flowtable_lock);
 	list_del(&flow_table->list);
 	mutex_unlock(&flowtable_lock);
+
 	cancel_delayed_work_sync(&flow_table->gc_work);
 	nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
 	nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, flow_table);
 	nf_flow_table_offload_flush(flow_table);
 	rhashtable_destroy(&flow_table->rhashtable);
+	mutex_destroy(&flow_table->flow_block_lock);
 }
 EXPORT_SYMBOL_GPL(nf_flow_table_free);
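nf_flow_table_offload_add_cb()/del_cb() give drivers a per-flowtable callback registry: registering an already-present (cb, cb_priv) pair fails with -EEXIST, and removal drops the pair if found. A userspace sketch of those semantics; the kernel keeps a linked list guarded by flow_block_lock, while the fixed-size array, names, and elided locking here are illustrative only:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_CBS 8

typedef int (*setup_cb_t)(void *priv);

struct cb_entry {
	setup_cb_t cb;
	void *priv;
};

static struct cb_entry cb_list[MAX_CBS];

/* Find a registered (cb, priv) pair; -1 when absent. */
static int cb_lookup(setup_cb_t cb, void *priv)
{
	int i;

	for (i = 0; i < MAX_CBS; i++)
		if (cb_list[i].cb == cb && cb_list[i].priv == priv)
			return i;
	return -1;
}

static int offload_add_cb(setup_cb_t cb, void *priv)
{
	int i;

	if (cb_lookup(cb, priv) >= 0)
		return -EEXIST; /* same pair registered twice */

	for (i = 0; i < MAX_CBS; i++) {
		if (!cb_list[i].cb) {
			cb_list[i].cb = cb;
			cb_list[i].priv = priv;
			return 0;
		}
	}
	return -ENOMEM;
}

static void offload_del_cb(setup_cb_t cb, void *priv)
{
	int i = cb_lookup(cb, priv);

	if (i >= 0)
		cb_list[i].cb = NULL;
}

static int demo_cb(void *priv) { (void)priv; return 0; }
```

The duplicate check is what lets act_ct bind the same flowtable to several tc blocks without double-registering a driver.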
+2 -13
net/netfilter/nf_flow_table_ip.c
···
 	return NF_STOLEN;
 }

-static bool nf_flow_offload_refresh(struct nf_flowtable *flow_table,
-				    struct flow_offload *flow)
-{
-	return nf_flowtable_hw_offload(flow_table) &&
-	       test_and_clear_bit(NF_FLOW_HW_REFRESH, &flow->flags);
-}
-
 unsigned int
 nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 			const struct nf_hook_state *state)
···
 	if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
 		return NF_ACCEPT;

-	if (unlikely(nf_flow_offload_refresh(flow_table, flow)))
-		nf_flow_offload_add(flow_table, flow);
+	flow_offload_refresh(flow_table, flow);

 	if (nf_flow_offload_dst_check(&rt->dst)) {
 		flow_offload_teardown(flow);
···
 	if (nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
 		return NF_DROP;

-	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
 	iph = ip_hdr(skb);
 	ip_decrease_ttl(iph);
 	skb->tstamp = 0;
···
 			    sizeof(*ip6h)))
 		return NF_ACCEPT;

-	if (unlikely(nf_flow_offload_refresh(flow_table, flow)))
-		nf_flow_offload_add(flow_table, flow);
+	flow_offload_refresh(flow_table, flow);

 	if (nf_flow_offload_dst_check(&rt->dst)) {
 		flow_offload_teardown(flow);
···
 	if (nf_flow_nat_ipv6(flow, skb, dir) < 0)
 		return NF_DROP;

-	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
 	ip6h = ipv6_hdr(skb);
 	ip6h->hop_limit--;
 	skb->tstamp = 0;
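The consolidated flow_offload_refresh() replaces the per-hook timeout bump and conditional re-offload: the timeout is always extended, but the flow is pushed back to hardware only when offload is enabled on the table and the one-shot NF_FLOW_HW_REFRESH bit was set. A sketch of that test-and-clear decision; bit 0 and the helper names stand in for the kernel primitives:

```c
#include <assert.h>

#define HW_REFRESH_BIT 0 /* stand-in for NF_FLOW_HW_REFRESH */

/* Non-atomic model of test_and_clear_bit(): report whether the bit was
 * set, and clear it either way. */
static int test_and_clear_bit_ul(int bit, unsigned long *flags)
{
	unsigned long mask = 1UL << bit;
	int was_set = !!(*flags & mask);

	*flags &= ~mask;
	return was_set;
}

/* Returns 1 when the caller should re-add the flow to hardware. Note the
 * short-circuit: when offload is disabled the flag is left untouched. */
static int refresh_should_readd(int hw_offload_enabled, unsigned long *flags)
{
	if (!hw_offload_enabled)
		return 0;
	return test_and_clear_bit_ul(HW_REFRESH_BIT, flags);
}
```

Clearing on read makes the re-add a one-shot per refresh request, so steady-state traffic pays only the timeout update.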
+4 -23
net/netfilter/nf_flow_table_offload.c
···
 	struct flow_offload *flow;
 };

-struct nf_flow_key {
-	struct flow_dissector_key_meta meta;
-	struct flow_dissector_key_control control;
-	struct flow_dissector_key_basic basic;
-	union {
-		struct flow_dissector_key_ipv4_addrs ipv4;
-		struct flow_dissector_key_ipv6_addrs ipv6;
-	};
-	struct flow_dissector_key_tcp tcp;
-	struct flow_dissector_key_ports tp;
-} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
-
-struct nf_flow_match {
-	struct flow_dissector dissector;
-	struct nf_flow_key key;
-	struct nf_flow_key mask;
-};
-
-struct nf_flow_rule {
-	struct nf_flow_match match;
-	struct flow_rule *rule;
-};
-
 #define NF_FLOW_DISSECTOR(__match, __type, __field) \
 	(__match)->dissector.offset[__type] = \
 		offsetof(struct nf_flow_key, __field)
···
 	if (cmd == FLOW_CLS_REPLACE)
 		cls_flow.rule = flow_rule->rule;

+	mutex_lock(&flowtable->flow_block_lock);
 	list_for_each_entry(block_cb, block_cb_list, list) {
 		err = block_cb->cb(TC_SETUP_CLSFLOWER, &cls_flow,
 				   block_cb->cb_priv);
···
 		i++;
 	}
+	mutex_unlock(&flowtable->flow_block_lock);

 	return i;
 }
···
 			     FLOW_CLS_STATS,
 			     &offload->flow->tuplehash[dir].tuple, &extack);

+	mutex_lock(&flowtable->flow_block_lock);
 	list_for_each_entry(block_cb, &flowtable->flow_block.cb_list, list)
 		block_cb->cb(TC_SETUP_CLSFLOWER, &cls_flow, block_cb->cb_priv);
+	mutex_unlock(&flowtable->flow_block_lock);
 	memcpy(stats, &cls_flow.stats, sizeof(*stats));
 }
+226
net/sched/act_ct.c
···
 	.automatic_shrinking = true,
 };

+static struct flow_action_entry *
+tcf_ct_flow_table_flow_action_get_next(struct flow_action *flow_action)
+{
+	int i = flow_action->num_entries++;
+
+	return &flow_action->entries[i];
+}
+
+static void tcf_ct_add_mangle_action(struct flow_action *action,
+				     enum flow_action_mangle_base htype,
+				     u32 offset,
+				     u32 mask,
+				     u32 val)
+{
+	struct flow_action_entry *entry;
+
+	entry = tcf_ct_flow_table_flow_action_get_next(action);
+	entry->id = FLOW_ACTION_MANGLE;
+	entry->mangle.htype = htype;
+	entry->mangle.mask = ~mask;
+	entry->mangle.offset = offset;
+	entry->mangle.val = val;
+}
+
+/* The following nat helper functions check if the inverted reverse tuple
+ * (target) is different than the current dir tuple - meaning nat for ports
+ * and/or ip is needed, and add the relevant mangle actions.
+ */
+static void
+tcf_ct_flow_table_add_action_nat_ipv4(const struct nf_conntrack_tuple *tuple,
+				      struct nf_conntrack_tuple target,
+				      struct flow_action *action)
+{
+	if (memcmp(&target.src.u3, &tuple->src.u3, sizeof(target.src.u3)))
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_IP4,
+					 offsetof(struct iphdr, saddr),
+					 0xFFFFFFFF,
+					 be32_to_cpu(target.src.u3.ip));
+	if (memcmp(&target.dst.u3, &tuple->dst.u3, sizeof(target.dst.u3)))
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_IP4,
+					 offsetof(struct iphdr, daddr),
+					 0xFFFFFFFF,
+					 be32_to_cpu(target.dst.u3.ip));
+}
+
+static void
+tcf_ct_add_ipv6_addr_mangle_action(struct flow_action *action,
+				   union nf_inet_addr *addr,
+				   u32 offset)
+{
+	int i;
+
+	for (i = 0; i < sizeof(struct in6_addr) / sizeof(u32); i++)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_IP6,
+					 i * sizeof(u32) + offset,
+					 0xFFFFFFFF, be32_to_cpu(addr->ip6[i]));
+}
+
+static void
+tcf_ct_flow_table_add_action_nat_ipv6(const struct nf_conntrack_tuple *tuple,
+				      struct nf_conntrack_tuple target,
+				      struct flow_action *action)
+{
+	if (memcmp(&target.src.u3, &tuple->src.u3, sizeof(target.src.u3)))
+		tcf_ct_add_ipv6_addr_mangle_action(action, &target.src.u3,
+						   offsetof(struct ipv6hdr,
+							    saddr));
+	if (memcmp(&target.dst.u3, &tuple->dst.u3, sizeof(target.dst.u3)))
+		tcf_ct_add_ipv6_addr_mangle_action(action, &target.dst.u3,
+						   offsetof(struct ipv6hdr,
+							    daddr));
+}
+
+static void
+tcf_ct_flow_table_add_action_nat_tcp(const struct nf_conntrack_tuple *tuple,
+				     struct nf_conntrack_tuple target,
+				     struct flow_action *action)
+{
+	__be16 target_src = target.src.u.tcp.port;
+	__be16 target_dst = target.dst.u.tcp.port;
+
+	if (target_src != tuple->src.u.tcp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct tcphdr, source),
+					 0xFFFF, be16_to_cpu(target_src));
+	if (target_dst != tuple->dst.u.tcp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct tcphdr, dest),
+					 0xFFFF, be16_to_cpu(target_dst));
+}
+
+static void
+tcf_ct_flow_table_add_action_nat_udp(const struct nf_conntrack_tuple *tuple,
+				     struct nf_conntrack_tuple target,
+				     struct flow_action *action)
+{
+	__be16 target_src = target.src.u.udp.port;
+	__be16 target_dst = target.dst.u.udp.port;
+
+	if (target_src != tuple->src.u.udp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct udphdr, source),
+					 0xFFFF, be16_to_cpu(target_src));
+	if (target_dst != tuple->dst.u.udp.port)
+		tcf_ct_add_mangle_action(action, FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+					 offsetof(struct udphdr, dest),
+					 0xFFFF, be16_to_cpu(target_dst));
+}
+
+static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct,
+					      enum ip_conntrack_dir dir,
+					      struct flow_action *action)
+{
+	struct nf_conn_labels *ct_labels;
+	struct flow_action_entry *entry;
+	enum ip_conntrack_info ctinfo;
+	u32 *act_ct_labels;
+
+	entry = tcf_ct_flow_table_flow_action_get_next(action);
+	entry->id = FLOW_ACTION_CT_METADATA;
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
+	entry->ct_metadata.mark = ct->mark;
+#endif
+	ctinfo = dir == IP_CT_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
+					     IP_CT_ESTABLISHED_REPLY;
+	/* aligns with the CT reference on the SKB nf_ct_set */
+	entry->ct_metadata.cookie = (unsigned long)ct | ctinfo;
+
+	act_ct_labels = entry->ct_metadata.labels;
+	ct_labels = nf_ct_labels_find(ct);
+	if (ct_labels)
+		memcpy(act_ct_labels, ct_labels->bits, NF_CT_LABELS_MAX_SIZE);
+	else
+		memset(act_ct_labels, 0, NF_CT_LABELS_MAX_SIZE);
+}
+
+static int tcf_ct_flow_table_add_action_nat(struct net *net,
+					    struct nf_conn *ct,
+					    enum ip_conntrack_dir dir,
+					    struct flow_action *action)
+{
+	const struct nf_conntrack_tuple *tuple = &ct->tuplehash[dir].tuple;
+	struct nf_conntrack_tuple target;
+
+	nf_ct_invert_tuple(&target, &ct->tuplehash[!dir].tuple);
+
+	switch (tuple->src.l3num) {
+	case NFPROTO_IPV4:
+		tcf_ct_flow_table_add_action_nat_ipv4(tuple, target,
+						      action);
+		break;
+	case NFPROTO_IPV6:
+		tcf_ct_flow_table_add_action_nat_ipv6(tuple, target,
+						      action);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	switch (nf_ct_protonum(ct)) {
+	case IPPROTO_TCP:
+		tcf_ct_flow_table_add_action_nat_tcp(tuple, target, action);
+		break;
+	case IPPROTO_UDP:
+		tcf_ct_flow_table_add_action_nat_udp(tuple, target, action);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int tcf_ct_flow_table_fill_actions(struct net *net,
+					  const struct flow_offload *flow,
+					  enum flow_offload_tuple_dir tdir,
+					  struct nf_flow_rule *flow_rule)
+{
+	struct flow_action *action = &flow_rule->rule->action;
+	int num_entries = action->num_entries;
+	struct nf_conn *ct = flow->ct;
+	enum ip_conntrack_dir dir;
+	int i, err;
+
+	switch (tdir) {
+	case FLOW_OFFLOAD_DIR_ORIGINAL:
+		dir = IP_CT_DIR_ORIGINAL;
+		break;
+	case FLOW_OFFLOAD_DIR_REPLY:
+		dir = IP_CT_DIR_REPLY;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	err = tcf_ct_flow_table_add_action_nat(net, ct, dir, action);
+	if (err)
+		goto err_nat;
+
+	tcf_ct_flow_table_add_action_meta(ct, dir, action);
+	return 0;
+
+err_nat:
+	/* Clear filled actions */
+	for (i = num_entries; i < action->num_entries; i++)
+		memset(&action->entries[i], 0, sizeof(action->entries[i]));
+	action->num_entries = num_entries;
+
+	return err;
+}
+
 static struct nf_flowtable_type flowtable_ct = {
+	.action = tcf_ct_flow_table_fill_actions,
 	.owner = THIS_MODULE,
 };
···
 		goto err_insert;

 	ct_ft->nf_ft.type = &flowtable_ct;
+	ct_ft->nf_ft.flags |= NF_FLOWTABLE_HW_OFFLOAD;
 	err = nf_flow_table_init(&ct_ft->nf_ft);
 	if (err)
 		goto err_init;
···
 	__module_get(THIS_MODULE);
 out_unlock:
 	params->ct_ft = ct_ft;
+	params->nf_ft = &ct_ft->nf_ft;
 	mutex_unlock(&zones_mutex);

 	return 0;
···
 	ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED :
 						    IP_CT_ESTABLISHED_REPLY;

+	flow_offload_refresh(nf_ft, flow);
 	nf_conntrack_get(&ct->ct_general);
 	nf_ct_set(skb, ct, ctinfo);
···
 	tcf_ct_flow_tables_uninit();
 	destroy_workqueue(act_ct_wq);
 }
+
+void tcf_ct_flow_table_restore_skb(struct sk_buff *skb, unsigned long cookie)
+{
+	enum ip_conntrack_info ctinfo = cookie & NFCT_INFOMASK;
+	struct nf_conn *ct;
+
+	ct = (struct nf_conn *)(cookie & NFCT_PTRMASK);
+	nf_conntrack_get(&ct->ct_general);
+	nf_ct_set(skb, ct, ctinfo);
+}
+EXPORT_SYMBOL_GPL(tcf_ct_flow_table_restore_skb);

 module_init(ct_init_module);
 module_exit(ct_cleanup_module);
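tcf_ct_flow_table_add_action_meta() packs the conntrack pointer and the ctinfo value into a single cookie, `(unsigned long)ct | ctinfo`, the same encoding nf_ct_set() uses on the SKB, and tcf_ct_flow_table_restore_skb() splits it back apart with NFCT_PTRMASK/NFCT_INFOMASK. A userspace sketch of the packing, assuming (as upstream) a 3-bit info mask and an 8-byte-aligned conn allocation; addresses are bare integers for the demo:

```c
#include <assert.h>

/* NFCT_INFOMASK is 7 upstream: struct nf_conn allocations are at least
 * 8-byte aligned, so the pointer's low three bits are free to carry the
 * ip_conntrack_info value. */
#define INFOMASK 7UL
#define PTRMASK (~INFOMASK)

/* ct_addr must be 8-byte aligned for the low bits to be reusable. */
static unsigned long pack_cookie(unsigned long ct_addr, unsigned long ctinfo)
{
	return ct_addr | ctinfo;
}

static unsigned long cookie_ct(unsigned long cookie)
{
	return cookie & PTRMASK;
}

static unsigned long cookie_ctinfo(unsigned long cookie)
{
	return cookie & INFOMASK;
}
```

This is why the driver can restore full CT state from a single register-sized value: one aligned pointer carries both the connection and its direction/state tag.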
+1
net/sched/cls_api.c
···
 		entry->id = FLOW_ACTION_CT;
 		entry->ct.action = tcf_ct_action(act);
 		entry->ct.zone = tcf_ct_zone(act);
+		entry->ct.flow_table = tcf_ct_ft(act);
 	} else if (is_tcf_mpls(act)) {
 		switch (tcf_mpls_action(act)) {
 		case TCA_MPLS_ACT_PUSH: