Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'mlx5-updates-2019-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 tc flow handling for concurrent execution (Part 3)

This series includes updates to mlx5 ethernet and core driver:

Vlad submits part 3 of a 3-part series to allow TC flow handling
for concurrent execution.

Vlad says:
==========

Structure mlx5e_neigh_hash_entry and the code that uses it are refactored
in the following ways:

- Extend neigh_hash_entry with rcu and modify its users to always take a
reference to the structure when using it (neigh_hash_entry already had an
atomic reference counter, which was only used when scheduling a neigh
update on the workqueue from the atomic context of a neigh update netevent).

- Always use mlx5e_neigh_update_table->encap_lock when modifying the neigh
update hash table and list. Originally, this lock was only used to
synchronize with the netevent handler function, which is called from bh
context and cannot use the rtnl lock for synchronization. Use the rcu read
lock instead of encap_lock to look up the nhe in the atomic context of the
netevent event handler function. Convert encap_lock to a mutex to allow
creating new neigh hash entries while holding it, which is safe to do
because the lock is no longer used in atomic context.

- Rcu-ify mlx5e_neigh_hash_entry->encap_list by changing operations on the
encap list to their rcu counterparts and extending the encap structure
with an rcu_head to free encap instances after an rcu grace period. This
allows fast traversal of the list of encaps attached to an nhe under rcu
read lock protection.

- Take encap_table_lock when accessing encap entries in neigh update and
neigh stats update code to protect from concurrent encap entry
insertion or removal.

This approach leads to a potential race condition where neigh update and
neigh stats update code can access encap and flow entries that are not
fully initialized or are being destroyed, or where a neigh can change state
without updating encaps that are created concurrently. Prevent these
issues with the following changes to flow and encap initialization:

- Extend mlx5e_tc_flow with an 'init_done' completion. Modify neigh update
to wait for both the encap and flow completions to prevent concurrent
access to a structure that is being initialized by tc.

- Skip structures that failed during initialization: encaps with
encap_id<0 and flows that don't have OFFLOADED flag set.

- To ensure that no new flows are added to an encap while it is being
accessed by neigh update or neigh stats update, take the encap_table_lock
mutex.

- To prevent concurrent deletion by tc, ensure that neigh update and
neigh stats update hold references to encap and flow instances while
using them.

With the changes presented in this patch set it is now safe to execute tc
concurrently with neigh update and neigh stats update. However, these
two workqueue tasks modify the same flow "tmp_list" field to store flows
(with a reference taken) on a temporary list, so that the references can
be released after the update operation finishes; they must therefore not
be executed concurrently with each other.

The last 3 patches of this series provide 3 new mlx5 trace points to track
mlx5 tc requests and mlx5 neigh updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+545 -126
+46
Documentation/networking/device_drivers/mellanox/mlx5.rst
···
   - `Enabling the driver and kconfig options`_
   - `Devlink info`_
   - `Devlink health reporters`_
 + - `mlx5 tracepoints`_

   Enabling the driver and kconfig options
   ================================================
···
   $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

   NOTE: This command can run only on PF.
 +
 + mlx5 tracepoints
 + ================
 +
 + mlx5 driver provides internal trace points for tracking and debugging using
 + kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).
 +
 + For the list of supported mlx5 events, check /sys/kernel/debug/tracing/events/mlx5/
 +
 + tc and eswitch offloads tracepoints:
 +
 + - mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::
 +
 +     $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
 +     $ cat /sys/kernel/debug/tracing/trace
 +     ...
 +     tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT
 +
 + - mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::
 +
 +     $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
 +     $ cat /sys/kernel/debug/tracing/trace
 +     ...
 +     tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL
 +
 + - mlx5e_stats_flower: trace flower stats request::
 +
 +     $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
 +     $ cat /sys/kernel/debug/tracing/trace
 +     ...
 +     tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217
 +
 + - mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::
 +
 +     $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
 +     $ cat /sys/kernel/debug/tracing/trace
 +     ...
 +     kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1
 +
 + - mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::
 +
 +     $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
 +     $ cat /sys/kernel/debug/tracing/trace
 +     ...
 +     kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/Makefile
··· 35 35 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o 36 36 mlx5_core-$(CONFIG_MLX5_ESWITCH) += en_rep.o en_tc.o en/tc_tun.o lib/port_tun.o lag_mp.o \ 37 37 lib/geneve.o en/tc_tun_vxlan.o en/tc_tun_gre.o \ 38 - en/tc_tun_geneve.o 38 + en/tc_tun_geneve.o diag/en_tc_tracepoint.o 39 39 40 40 # 41 41 # Core extra
+54
drivers/net/ethernet/mellanox/mlx5/core/diag/en_rep_tracepoint.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ 2 + /* Copyright (c) 2019 Mellanox Technologies. */ 3 + 4 + #undef TRACE_SYSTEM 5 + #define TRACE_SYSTEM mlx5 6 + 7 + #if !defined(_MLX5_EN_REP_TP_) || defined(TRACE_HEADER_MULTI_READ) 8 + #define _MLX5_EN_REP_TP_ 9 + 10 + #include <linux/tracepoint.h> 11 + #include <linux/trace_seq.h> 12 + #include "en_rep.h" 13 + 14 + TRACE_EVENT(mlx5e_rep_neigh_update, 15 + TP_PROTO(const struct mlx5e_neigh_hash_entry *nhe, const u8 *ha, 16 + bool neigh_connected), 17 + TP_ARGS(nhe, ha, neigh_connected), 18 + TP_STRUCT__entry(__string(devname, nhe->m_neigh.dev->name) 19 + __array(u8, ha, ETH_ALEN) 20 + __array(u8, v4, 4) 21 + __array(u8, v6, 16) 22 + __field(bool, neigh_connected) 23 + ), 24 + TP_fast_assign(const struct mlx5e_neigh *mn = &nhe->m_neigh; 25 + struct in6_addr *pin6; 26 + __be32 *p32; 27 + 28 + __assign_str(devname, mn->dev->name); 29 + __entry->neigh_connected = neigh_connected; 30 + memcpy(__entry->ha, ha, ETH_ALEN); 31 + 32 + p32 = (__be32 *)__entry->v4; 33 + pin6 = (struct in6_addr *)__entry->v6; 34 + if (mn->family == AF_INET) { 35 + *p32 = mn->dst_ip.v4; 36 + ipv6_addr_set_v4mapped(*p32, pin6); 37 + } else if (mn->family == AF_INET6) { 38 + *pin6 = mn->dst_ip.v6; 39 + } 40 + ), 41 + TP_printk("netdev: %s MAC: %pM IPv4: %pI4 IPv6: %pI6c neigh_connected=%d\n", 42 + __get_str(devname), __entry->ha, 43 + __entry->v4, __entry->v6, __entry->neigh_connected 44 + ) 45 + ); 46 + 47 + #endif /* _MLX5_EN_REP_TP_ */ 48 + 49 + /* This part must be outside protection */ 50 + #undef TRACE_INCLUDE_PATH 51 + #define TRACE_INCLUDE_PATH ./diag 52 + #undef TRACE_INCLUDE_FILE 53 + #define TRACE_INCLUDE_FILE en_rep_tracepoint 54 + #include <trace/define_trace.h>
+58
drivers/net/ethernet/mellanox/mlx5/core/diag/en_tc_tracepoint.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + /* Copyright (c) 2019 Mellanox Technologies. */ 3 + 4 + #define CREATE_TRACE_POINTS 5 + #include "en_tc_tracepoint.h" 6 + 7 + void put_ids_to_array(int *ids, 8 + const struct flow_action_entry *entries, 9 + unsigned int num) 10 + { 11 + unsigned int i; 12 + 13 + for (i = 0; i < num; i++) 14 + ids[i] = entries[i].id; 15 + } 16 + 17 + #define NAME_SIZE 16 18 + 19 + static const char FLOWACT2STR[NUM_FLOW_ACTIONS][NAME_SIZE] = { 20 + [FLOW_ACTION_ACCEPT] = "ACCEPT", 21 + [FLOW_ACTION_DROP] = "DROP", 22 + [FLOW_ACTION_TRAP] = "TRAP", 23 + [FLOW_ACTION_GOTO] = "GOTO", 24 + [FLOW_ACTION_REDIRECT] = "REDIRECT", 25 + [FLOW_ACTION_MIRRED] = "MIRRED", 26 + [FLOW_ACTION_VLAN_PUSH] = "VLAN_PUSH", 27 + [FLOW_ACTION_VLAN_POP] = "VLAN_POP", 28 + [FLOW_ACTION_VLAN_MANGLE] = "VLAN_MANGLE", 29 + [FLOW_ACTION_TUNNEL_ENCAP] = "TUNNEL_ENCAP", 30 + [FLOW_ACTION_TUNNEL_DECAP] = "TUNNEL_DECAP", 31 + [FLOW_ACTION_MANGLE] = "MANGLE", 32 + [FLOW_ACTION_ADD] = "ADD", 33 + [FLOW_ACTION_CSUM] = "CSUM", 34 + [FLOW_ACTION_MARK] = "MARK", 35 + [FLOW_ACTION_WAKE] = "WAKE", 36 + [FLOW_ACTION_QUEUE] = "QUEUE", 37 + [FLOW_ACTION_SAMPLE] = "SAMPLE", 38 + [FLOW_ACTION_POLICE] = "POLICE", 39 + [FLOW_ACTION_CT] = "CT", 40 + }; 41 + 42 + const char *parse_action(struct trace_seq *p, 43 + int *ids, 44 + unsigned int num) 45 + { 46 + const char *ret = trace_seq_buffer_ptr(p); 47 + unsigned int i; 48 + 49 + for (i = 0; i < num; i++) { 50 + if (ids[i] < NUM_FLOW_ACTIONS) 51 + trace_seq_printf(p, "%s ", FLOWACT2STR[ids[i]]); 52 + else 53 + trace_seq_printf(p, "UNKNOWN "); 54 + } 55 + 56 + trace_seq_putc(p, 0); 57 + return ret; 58 + }
+114
drivers/net/ethernet/mellanox/mlx5/core/diag/en_tc_tracepoint.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ 2 + /* Copyright (c) 2019 Mellanox Technologies. */ 3 + 4 + #undef TRACE_SYSTEM 5 + #define TRACE_SYSTEM mlx5 6 + 7 + #if !defined(_MLX5_TC_TP_) || defined(TRACE_HEADER_MULTI_READ) 8 + #define _MLX5_TC_TP_ 9 + 10 + #include <linux/tracepoint.h> 11 + #include <linux/trace_seq.h> 12 + #include <net/flow_offload.h> 13 + #include "en_rep.h" 14 + 15 + #define __parse_action(ids, num) parse_action(p, ids, num) 16 + 17 + void put_ids_to_array(int *ids, 18 + const struct flow_action_entry *entries, 19 + unsigned int num); 20 + 21 + const char *parse_action(struct trace_seq *p, 22 + int *ids, 23 + unsigned int num); 24 + 25 + DECLARE_EVENT_CLASS(mlx5e_flower_template, 26 + TP_PROTO(const struct flow_cls_offload *f), 27 + TP_ARGS(f), 28 + TP_STRUCT__entry(__field(void *, cookie) 29 + __field(unsigned int, num) 30 + __dynamic_array(int, ids, f->rule ? 31 + f->rule->action.num_entries : 0) 32 + ), 33 + TP_fast_assign(__entry->cookie = (void *)f->cookie; 34 + __entry->num = (f->rule ? 35 + f->rule->action.num_entries : 0); 36 + if (__entry->num) 37 + put_ids_to_array(__get_dynamic_array(ids), 38 + f->rule->action.entries, 39 + f->rule->action.num_entries); 40 + ), 41 + TP_printk("cookie=%p actions= %s\n", 42 + __entry->cookie, __entry->num ? 
43 + __parse_action(__get_dynamic_array(ids), 44 + __entry->num) : "NULL" 45 + ) 46 + ); 47 + 48 + DEFINE_EVENT(mlx5e_flower_template, mlx5e_configure_flower, 49 + TP_PROTO(const struct flow_cls_offload *f), 50 + TP_ARGS(f) 51 + ); 52 + 53 + DEFINE_EVENT(mlx5e_flower_template, mlx5e_delete_flower, 54 + TP_PROTO(const struct flow_cls_offload *f), 55 + TP_ARGS(f) 56 + ); 57 + 58 + TRACE_EVENT(mlx5e_stats_flower, 59 + TP_PROTO(const struct flow_cls_offload *f), 60 + TP_ARGS(f), 61 + TP_STRUCT__entry(__field(void *, cookie) 62 + __field(u64, bytes) 63 + __field(u64, packets) 64 + __field(u64, lastused) 65 + ), 66 + TP_fast_assign(__entry->cookie = (void *)f->cookie; 67 + __entry->bytes = f->stats.bytes; 68 + __entry->packets = f->stats.pkts; 69 + __entry->lastused = f->stats.lastused; 70 + ), 71 + TP_printk("cookie=%p bytes=%llu packets=%llu lastused=%llu\n", 72 + __entry->cookie, __entry->bytes, 73 + __entry->packets, __entry->lastused 74 + ) 75 + ); 76 + 77 + TRACE_EVENT(mlx5e_tc_update_neigh_used_value, 78 + TP_PROTO(const struct mlx5e_neigh_hash_entry *nhe, bool neigh_used), 79 + TP_ARGS(nhe, neigh_used), 80 + TP_STRUCT__entry(__string(devname, nhe->m_neigh.dev->name) 81 + __array(u8, v4, 4) 82 + __array(u8, v6, 16) 83 + __field(bool, neigh_used) 84 + ), 85 + TP_fast_assign(const struct mlx5e_neigh *mn = &nhe->m_neigh; 86 + struct in6_addr *pin6; 87 + __be32 *p32; 88 + 89 + __assign_str(devname, mn->dev->name); 90 + __entry->neigh_used = neigh_used; 91 + 92 + p32 = (__be32 *)__entry->v4; 93 + pin6 = (struct in6_addr *)__entry->v6; 94 + if (mn->family == AF_INET) { 95 + *p32 = mn->dst_ip.v4; 96 + ipv6_addr_set_v4mapped(*p32, pin6); 97 + } else if (mn->family == AF_INET6) { 98 + *pin6 = mn->dst_ip.v6; 99 + } 100 + ), 101 + TP_printk("netdev: %s IPv4: %pI4 IPv6: %pI6c neigh_used=%d\n", 102 + __get_str(devname), __entry->v4, __entry->v6, 103 + __entry->neigh_used 104 + ) 105 + ); 106 + 107 + #endif /* _MLX5_TC_TP_ */ 108 + 109 + /* This part must be outside protection 
*/ 110 + #undef TRACE_INCLUDE_PATH 111 + #define TRACE_INCLUDE_PATH ./diag 112 + #undef TRACE_INCLUDE_FILE 113 + #define TRACE_INCLUDE_FILE en_tc_tracepoint 114 + #include <trace/define_trace.h>
+138 -86
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
··· 46 46 #include "en/tc_tun.h" 47 47 #include "fs_core.h" 48 48 #include "lib/port_tun.h" 49 + #define CREATE_TRACE_POINTS 50 + #include "diag/en_rep_tracepoint.h" 49 51 50 52 #define MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE \ 51 53 max(0x7, MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE) ··· 526 524 neigh_update->min_interval); 527 525 } 528 526 527 + static bool mlx5e_rep_neigh_entry_hold(struct mlx5e_neigh_hash_entry *nhe) 528 + { 529 + return refcount_inc_not_zero(&nhe->refcnt); 530 + } 531 + 532 + static void mlx5e_rep_neigh_entry_remove(struct mlx5e_neigh_hash_entry *nhe); 533 + 534 + static void mlx5e_rep_neigh_entry_release(struct mlx5e_neigh_hash_entry *nhe) 535 + { 536 + if (refcount_dec_and_test(&nhe->refcnt)) { 537 + mlx5e_rep_neigh_entry_remove(nhe); 538 + kfree_rcu(nhe, rcu); 539 + } 540 + } 541 + 542 + static struct mlx5e_neigh_hash_entry * 543 + mlx5e_get_next_nhe(struct mlx5e_rep_priv *rpriv, 544 + struct mlx5e_neigh_hash_entry *nhe) 545 + { 546 + struct mlx5e_neigh_hash_entry *next = NULL; 547 + 548 + rcu_read_lock(); 549 + 550 + for (next = nhe ? 
551 + list_next_or_null_rcu(&rpriv->neigh_update.neigh_list, 552 + &nhe->neigh_list, 553 + struct mlx5e_neigh_hash_entry, 554 + neigh_list) : 555 + list_first_or_null_rcu(&rpriv->neigh_update.neigh_list, 556 + struct mlx5e_neigh_hash_entry, 557 + neigh_list); 558 + next; 559 + next = list_next_or_null_rcu(&rpriv->neigh_update.neigh_list, 560 + &next->neigh_list, 561 + struct mlx5e_neigh_hash_entry, 562 + neigh_list)) 563 + if (mlx5e_rep_neigh_entry_hold(next)) 564 + break; 565 + 566 + rcu_read_unlock(); 567 + 568 + if (nhe) 569 + mlx5e_rep_neigh_entry_release(nhe); 570 + 571 + return next; 572 + } 573 + 529 574 static void mlx5e_rep_neigh_stats_work(struct work_struct *work) 530 575 { 531 576 struct mlx5e_rep_priv *rpriv = container_of(work, struct mlx5e_rep_priv, 532 577 neigh_update.neigh_stats_work.work); 533 578 struct net_device *netdev = rpriv->netdev; 534 579 struct mlx5e_priv *priv = netdev_priv(netdev); 535 - struct mlx5e_neigh_hash_entry *nhe; 580 + struct mlx5e_neigh_hash_entry *nhe = NULL; 536 581 537 582 rtnl_lock(); 538 583 if (!list_empty(&rpriv->neigh_update.neigh_list)) 539 584 mlx5e_rep_queue_neigh_stats_work(priv); 540 585 541 - list_for_each_entry(nhe, &rpriv->neigh_update.neigh_list, neigh_list) 586 + while ((nhe = mlx5e_get_next_nhe(rpriv, nhe)) != NULL) 542 587 mlx5e_tc_update_neigh_used_value(nhe); 543 588 544 589 rtnl_unlock(); 545 - } 546 - 547 - static void mlx5e_rep_neigh_entry_hold(struct mlx5e_neigh_hash_entry *nhe) 548 - { 549 - refcount_inc(&nhe->refcnt); 550 - } 551 - 552 - static void mlx5e_rep_neigh_entry_release(struct mlx5e_neigh_hash_entry *nhe) 553 - { 554 - if (refcount_dec_and_test(&nhe->refcnt)) 555 - kfree(nhe); 556 590 } 557 591 558 592 static void mlx5e_rep_update_flows(struct mlx5e_priv *priv, ··· 597 559 unsigned char ha[ETH_ALEN]) 598 560 { 599 561 struct ethhdr *eth = (struct ethhdr *)e->encap_header; 562 + struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; 563 + bool encap_connected; 564 + LIST_HEAD(flow_list); 
600 565 601 566 ASSERT_RTNL(); 602 567 568 + /* wait for encap to be fully initialized */ 569 + wait_for_completion(&e->res_ready); 570 + 571 + mutex_lock(&esw->offloads.encap_tbl_lock); 572 + encap_connected = !!(e->flags & MLX5_ENCAP_ENTRY_VALID); 573 + if (e->compl_result || (encap_connected == neigh_connected && 574 + ether_addr_equal(e->h_dest, ha))) 575 + goto unlock; 576 + 577 + mlx5e_take_all_encap_flows(e, &flow_list); 578 + 603 579 if ((e->flags & MLX5_ENCAP_ENTRY_VALID) && 604 580 (!neigh_connected || !ether_addr_equal(e->h_dest, ha))) 605 - mlx5e_tc_encap_flows_del(priv, e); 581 + mlx5e_tc_encap_flows_del(priv, e, &flow_list); 606 582 607 583 if (neigh_connected && !(e->flags & MLX5_ENCAP_ENTRY_VALID)) { 608 584 ether_addr_copy(e->h_dest, ha); ··· 626 574 */ 627 575 ether_addr_copy(eth->h_source, e->route_dev->dev_addr); 628 576 629 - mlx5e_tc_encap_flows_add(priv, e); 577 + mlx5e_tc_encap_flows_add(priv, e, &flow_list); 630 578 } 579 + unlock: 580 + mutex_unlock(&esw->offloads.encap_tbl_lock); 581 + mlx5e_put_encap_flow_list(priv, &flow_list); 631 582 } 632 583 633 584 static void mlx5e_rep_neigh_update(struct work_struct *work) ··· 642 587 unsigned char ha[ETH_ALEN]; 643 588 struct mlx5e_priv *priv; 644 589 bool neigh_connected; 645 - bool encap_connected; 646 590 u8 nud_state, dead; 647 591 648 592 rtnl_lock(); ··· 659 605 660 606 neigh_connected = (nud_state & NUD_VALID) && !dead; 661 607 608 + trace_mlx5e_rep_neigh_update(nhe, ha, neigh_connected); 609 + 662 610 list_for_each_entry(e, &nhe->encap_list, encap_list) { 663 611 if (!mlx5e_encap_take(e)) 664 612 continue; 665 613 666 - encap_connected = !!(e->flags & MLX5_ENCAP_ENTRY_VALID); 667 614 priv = netdev_priv(e->out_dev); 668 - 669 - if (encap_connected != neigh_connected || 670 - !ether_addr_equal(e->h_dest, ha)) 671 - mlx5e_rep_update_flows(priv, e, neigh_connected, ha); 672 - 615 + mlx5e_rep_update_flows(priv, e, neigh_connected, ha); 673 616 mlx5e_encap_put(priv, e); 674 617 } 675 618 
mlx5e_rep_neigh_entry_release(nhe); ··· 872 821 return NOTIFY_OK; 873 822 } 874 823 824 + static void 825 + mlx5e_rep_queue_neigh_update_work(struct mlx5e_priv *priv, 826 + struct mlx5e_neigh_hash_entry *nhe, 827 + struct neighbour *n) 828 + { 829 + /* Take a reference to ensure the neighbour and mlx5 encap 830 + * entry won't be destructed until we drop the reference in 831 + * delayed work. 832 + */ 833 + neigh_hold(n); 834 + 835 + /* This assignment is valid as long as the the neigh reference 836 + * is taken 837 + */ 838 + nhe->n = n; 839 + 840 + if (!queue_work(priv->wq, &nhe->neigh_update_work)) { 841 + mlx5e_rep_neigh_entry_release(nhe); 842 + neigh_release(n); 843 + } 844 + } 845 + 875 846 static struct mlx5e_neigh_hash_entry * 876 847 mlx5e_rep_neigh_entry_lookup(struct mlx5e_priv *priv, 877 848 struct mlx5e_neigh *m_neigh); ··· 926 853 m_neigh.family = n->ops->family; 927 854 memcpy(&m_neigh.dst_ip, n->primary_key, n->tbl->key_len); 928 855 929 - /* We are in atomic context and can't take RTNL mutex, so use 930 - * spin_lock_bh to lookup the neigh table. bh is used since 931 - * netevent can be called from a softirq context. 932 - */ 933 - spin_lock_bh(&neigh_update->encap_lock); 856 + rcu_read_lock(); 934 857 nhe = mlx5e_rep_neigh_entry_lookup(priv, &m_neigh); 935 - if (!nhe) { 936 - spin_unlock_bh(&neigh_update->encap_lock); 858 + rcu_read_unlock(); 859 + if (!nhe) 937 860 return NOTIFY_DONE; 938 - } 939 861 940 - /* This assignment is valid as long as the the neigh reference 941 - * is taken 942 - */ 943 - nhe->n = n; 944 - 945 - /* Take a reference to ensure the neighbour and mlx5 encap 946 - * entry won't be destructed until we drop the reference in 947 - * delayed work. 
948 - */ 949 - neigh_hold(n); 950 - mlx5e_rep_neigh_entry_hold(nhe); 951 - 952 - if (!queue_work(priv->wq, &nhe->neigh_update_work)) { 953 - mlx5e_rep_neigh_entry_release(nhe); 954 - neigh_release(n); 955 - } 956 - spin_unlock_bh(&neigh_update->encap_lock); 862 + mlx5e_rep_queue_neigh_update_work(priv, nhe, n); 957 863 break; 958 864 959 865 case NETEVENT_DELAY_PROBE_TIME_UPDATE: ··· 949 897 #endif 950 898 return NOTIFY_DONE; 951 899 952 - /* We are in atomic context and can't take RTNL mutex, 953 - * so use spin_lock_bh to walk the neigh list and look for 954 - * the relevant device. bh is used since netevent can be 955 - * called from a softirq context. 956 - */ 957 - spin_lock_bh(&neigh_update->encap_lock); 958 - list_for_each_entry(nhe, &neigh_update->neigh_list, neigh_list) { 900 + rcu_read_lock(); 901 + list_for_each_entry_rcu(nhe, &neigh_update->neigh_list, 902 + neigh_list) { 959 903 if (p->dev == nhe->m_neigh.dev) { 960 904 found = true; 961 905 break; 962 906 } 963 907 } 964 - spin_unlock_bh(&neigh_update->encap_lock); 908 + rcu_read_unlock(); 965 909 if (!found) 966 910 return NOTIFY_DONE; 967 911 ··· 988 940 return err; 989 941 990 942 INIT_LIST_HEAD(&neigh_update->neigh_list); 991 - spin_lock_init(&neigh_update->encap_lock); 943 + mutex_init(&neigh_update->encap_lock); 992 944 INIT_DELAYED_WORK(&neigh_update->neigh_stats_work, 993 945 mlx5e_rep_neigh_stats_work); 994 946 mlx5e_rep_neigh_update_init_interval(rpriv); ··· 1015 967 1016 968 cancel_delayed_work_sync(&rpriv->neigh_update.neigh_stats_work); 1017 969 970 + mutex_destroy(&neigh_update->encap_lock); 1018 971 rhashtable_destroy(&neigh_update->neigh_ht); 1019 972 } 1020 973 ··· 1031 982 if (err) 1032 983 return err; 1033 984 1034 - list_add(&nhe->neigh_list, &rpriv->neigh_update.neigh_list); 985 + list_add_rcu(&nhe->neigh_list, &rpriv->neigh_update.neigh_list); 1035 986 1036 987 return err; 1037 988 } 1038 989 1039 - static void mlx5e_rep_neigh_entry_remove(struct mlx5e_priv *priv, 1040 - struct 
mlx5e_neigh_hash_entry *nhe) 990 + static void mlx5e_rep_neigh_entry_remove(struct mlx5e_neigh_hash_entry *nhe) 1041 991 { 1042 - struct mlx5e_rep_priv *rpriv = priv->ppriv; 992 + struct mlx5e_rep_priv *rpriv = nhe->priv->ppriv; 1043 993 1044 - spin_lock_bh(&rpriv->neigh_update.encap_lock); 994 + mutex_lock(&rpriv->neigh_update.encap_lock); 1045 995 1046 - list_del(&nhe->neigh_list); 996 + list_del_rcu(&nhe->neigh_list); 1047 997 1048 998 rhashtable_remove_fast(&rpriv->neigh_update.neigh_ht, 1049 999 &nhe->rhash_node, 1050 1000 mlx5e_neigh_ht_params); 1051 - spin_unlock_bh(&rpriv->neigh_update.encap_lock); 1001 + mutex_unlock(&rpriv->neigh_update.encap_lock); 1052 1002 } 1053 1003 1054 - /* This function must only be called under RTNL lock or under the 1055 - * representor's encap_lock in case RTNL mutex can't be held. 1004 + /* This function must only be called under the representor's encap_lock or 1005 + * inside rcu read lock section. 1056 1006 */ 1057 1007 static struct mlx5e_neigh_hash_entry * 1058 1008 mlx5e_rep_neigh_entry_lookup(struct mlx5e_priv *priv, ··· 1059 1011 { 1060 1012 struct mlx5e_rep_priv *rpriv = priv->ppriv; 1061 1013 struct mlx5e_neigh_update_table *neigh_update = &rpriv->neigh_update; 1014 + struct mlx5e_neigh_hash_entry *nhe; 1062 1015 1063 - return rhashtable_lookup_fast(&neigh_update->neigh_ht, m_neigh, 1064 - mlx5e_neigh_ht_params); 1016 + nhe = rhashtable_lookup_fast(&neigh_update->neigh_ht, m_neigh, 1017 + mlx5e_neigh_ht_params); 1018 + return nhe && mlx5e_rep_neigh_entry_hold(nhe) ? 
nhe : NULL; 1065 1019 } 1066 1020 1067 1021 static int mlx5e_rep_neigh_entry_create(struct mlx5e_priv *priv, ··· 1076 1026 if (!*nhe) 1077 1027 return -ENOMEM; 1078 1028 1029 + (*nhe)->priv = priv; 1079 1030 memcpy(&(*nhe)->m_neigh, &e->m_neigh, sizeof(e->m_neigh)); 1080 1031 INIT_WORK(&(*nhe)->neigh_update_work, mlx5e_rep_neigh_update); 1032 + spin_lock_init(&(*nhe)->encap_list_lock); 1081 1033 INIT_LIST_HEAD(&(*nhe)->encap_list); 1082 1034 refcount_set(&(*nhe)->refcnt, 1); 1083 1035 ··· 1091 1039 out_free: 1092 1040 kfree(*nhe); 1093 1041 return err; 1094 - } 1095 - 1096 - static void mlx5e_rep_neigh_entry_destroy(struct mlx5e_priv *priv, 1097 - struct mlx5e_neigh_hash_entry *nhe) 1098 - { 1099 - /* The neigh hash entry must be removed from the hash table regardless 1100 - * of the reference count value, so it won't be found by the next 1101 - * neigh notification call. The neigh hash entry reference count is 1102 - * incremented only during creation and neigh notification calls and 1103 - * protects from freeing the nhe struct. 
1104 - */ 1105 - mlx5e_rep_neigh_entry_remove(priv, nhe); 1106 - mlx5e_rep_neigh_entry_release(nhe); 1107 1042 } 1108 1043 1109 1044 int mlx5e_rep_encap_entry_attach(struct mlx5e_priv *priv, ··· 1105 1066 err = mlx5_tun_entropy_refcount_inc(tun_entropy, e->reformat_type); 1106 1067 if (err) 1107 1068 return err; 1069 + 1070 + mutex_lock(&rpriv->neigh_update.encap_lock); 1108 1071 nhe = mlx5e_rep_neigh_entry_lookup(priv, &e->m_neigh); 1109 1072 if (!nhe) { 1110 1073 err = mlx5e_rep_neigh_entry_create(priv, e, &nhe); 1111 1074 if (err) { 1075 + mutex_unlock(&rpriv->neigh_update.encap_lock); 1112 1076 mlx5_tun_entropy_refcount_dec(tun_entropy, 1113 1077 e->reformat_type); 1114 1078 return err; 1115 1079 } 1116 1080 } 1117 - list_add(&e->encap_list, &nhe->encap_list); 1081 + 1082 + e->nhe = nhe; 1083 + spin_lock(&nhe->encap_list_lock); 1084 + list_add_rcu(&e->encap_list, &nhe->encap_list); 1085 + spin_unlock(&nhe->encap_list_lock); 1086 + 1087 + mutex_unlock(&rpriv->neigh_update.encap_lock); 1088 + 1118 1089 return 0; 1119 1090 } 1120 1091 ··· 1134 1085 struct mlx5e_rep_priv *rpriv = priv->ppriv; 1135 1086 struct mlx5_rep_uplink_priv *uplink_priv = &rpriv->uplink_priv; 1136 1087 struct mlx5_tun_entropy *tun_entropy = &uplink_priv->tun_entropy; 1137 - struct mlx5e_neigh_hash_entry *nhe; 1138 1088 1139 - list_del(&e->encap_list); 1140 - nhe = mlx5e_rep_neigh_entry_lookup(priv, &e->m_neigh); 1089 + if (!e->nhe) 1090 + return; 1141 1091 1142 - if (list_empty(&nhe->encap_list)) 1143 - mlx5e_rep_neigh_entry_destroy(priv, nhe); 1092 + spin_lock(&e->nhe->encap_list_lock); 1093 + list_del_rcu(&e->encap_list); 1094 + spin_unlock(&e->nhe->encap_list_lock); 1095 + 1096 + mlx5e_rep_neigh_entry_release(e->nhe); 1097 + e->nhe = NULL; 1144 1098 mlx5_tun_entropy_refcount_dec(tun_entropy, e->reformat_type); 1145 1099 } 1146 1100
+10 -1
drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
··· 35 35 36 36 #include <net/ip_tunnels.h> 37 37 #include <linux/rhashtable.h> 38 + #include <linux/mutex.h> 38 39 #include "eswitch.h" 39 40 #include "en.h" 40 41 #include "lib/port_tun.h" ··· 49 48 */ 50 49 struct list_head neigh_list; 51 50 /* protect lookup/remove operations */ 52 - spinlock_t encap_lock; 51 + struct mutex encap_lock; 53 52 struct notifier_block netevent_nb; 54 53 struct delayed_work neigh_stats_work; 55 54 unsigned long min_interval; /* jiffies */ ··· 111 110 struct mlx5e_neigh_hash_entry { 112 111 struct rhash_head rhash_node; 113 112 struct mlx5e_neigh m_neigh; 113 + struct mlx5e_priv *priv; 114 114 115 115 /* Save the neigh hash entry in a list on the representor in 116 116 * addition to the hash table. In order to iterate easily over the ··· 119 117 */ 120 118 struct list_head neigh_list; 121 119 120 + /* protects encap list */ 121 + spinlock_t encap_list_lock; 122 122 /* encap list sharing the same neigh */ 123 123 struct list_head encap_list; 124 124 ··· 141 137 * 'used' value and avoid neigh deleting by the kernel. 142 138 */ 143 139 unsigned long reported_lastuse; 140 + 141 + struct rcu_head rcu; 144 142 }; 145 143 146 144 enum { ··· 151 145 }; 152 146 153 147 struct mlx5e_encap_entry { 148 + /* attached neigh hash entry */ 149 + struct mlx5e_neigh_hash_entry *nhe; 154 150 /* neigh hash entry list of encaps sharing the same neigh */ 155 151 struct list_head encap_list; 156 152 struct mlx5e_neigh m_neigh; ··· 175 167 refcount_t refcnt; 176 168 struct completion res_ready; 177 169 int compl_result; 170 + struct rcu_head rcu; 178 171 }; 179 172 180 173 struct mlx5e_rep_sq {
+116 -36
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
···
 #include "en/tc_tun.h"
 #include "lib/devcom.h"
 #include "lib/geneve.h"
+#include "diag/en_tc_tracepoint.h"
 
 struct mlx5_nic_flow_attr {
 	u32 action;
···
 	struct list_head hairpin; /* flows sharing the same hairpin */
 	struct list_head peer;    /* flows with peer flow */
 	struct list_head unready; /* flows not ready to be offloaded (e.g due to missing route) */
+	int tmp_efi_index;
+	struct list_head tmp_list; /* temporary flow list used by neigh update */
 	refcount_t refcnt;
 	struct rcu_head rcu_head;
+	struct completion init_done;
 	union {
 		struct mlx5_esw_flow_attr esw_attr[0];
 		struct mlx5_nic_flow_attr nic_attr[0];
···
 }
 
 void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
-			      struct mlx5e_encap_entry *e)
+			      struct mlx5e_encap_entry *e,
+			      struct list_head *flow_list)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_esw_flow_attr slow_attr, *esw_attr;
-	struct encap_flow_item *efi, *tmp;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_flow_spec *spec;
 	struct mlx5e_tc_flow *flow;
···
 	e->flags |= MLX5_ENCAP_ENTRY_VALID;
 	mlx5e_rep_queue_neigh_stats_work(priv);
 
-	list_for_each_entry_safe(efi, tmp, &e->flows, list) {
+	list_for_each_entry(flow, flow_list, tmp_list) {
 		bool all_flow_encaps_valid = true;
 		int i;
 
-		flow = container_of(efi, struct mlx5e_tc_flow, encaps[efi->index]);
-		if (IS_ERR(mlx5e_flow_get(flow)))
+		if (!mlx5e_is_offloaded_flow(flow))
 			continue;
-
 		esw_attr = flow->esw_attr;
 		spec = &esw_attr->parse_attr->spec;
 
-		esw_attr->dests[efi->index].encap_id = e->encap_id;
-		esw_attr->dests[efi->index].flags |= MLX5_ESW_DEST_ENCAP_VALID;
+		esw_attr->dests[flow->tmp_efi_index].encap_id = e->encap_id;
+		esw_attr->dests[flow->tmp_efi_index].flags |= MLX5_ESW_DEST_ENCAP_VALID;
 		/* Flow can be associated with multiple encap entries.
 		 * Before offloading the flow verify that all of them have
 		 * a valid neighbour.
···
 		}
 		/* Do not offload flows with unresolved neighbors */
 		if (!all_flow_encaps_valid)
-			goto loop_cont;
+			continue;
 		/* update from slow path rule to encap rule */
 		rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, esw_attr);
 		if (IS_ERR(rule)) {
 			err = PTR_ERR(rule);
 			mlx5_core_warn(priv->mdev, "Failed to update cached encapsulation flow, %d\n",
 				       err);
-			goto loop_cont;
+			continue;
 		}
 
 		mlx5e_tc_unoffload_from_slow_path(esw, flow, &slow_attr);
 		flow->rule[0] = rule;
 		/* was unset when slow path rule removed */
 		flow_flag_set(flow, OFFLOADED);
-
-loop_cont:
-		mlx5e_flow_put(priv, flow);
 	}
 }
 
 void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
-			      struct mlx5e_encap_entry *e)
+			      struct mlx5e_encap_entry *e,
+			      struct list_head *flow_list)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_esw_flow_attr slow_attr;
-	struct encap_flow_item *efi, *tmp;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_flow_spec *spec;
 	struct mlx5e_tc_flow *flow;
 	int err;
 
-	list_for_each_entry_safe(efi, tmp, &e->flows, list) {
-		flow = container_of(efi, struct mlx5e_tc_flow, encaps[efi->index]);
-		if (IS_ERR(mlx5e_flow_get(flow)))
+	list_for_each_entry(flow, flow_list, tmp_list) {
+		if (!mlx5e_is_offloaded_flow(flow))
 			continue;
-
 		spec = &flow->esw_attr->parse_attr->spec;
 
 		/* update from encap rule to slow path rule */
 		rule = mlx5e_tc_offload_to_slow_path(esw, flow, spec, &slow_attr);
 		/* mark the flow's encap dest as non-valid */
-		flow->esw_attr->dests[efi->index].flags &= ~MLX5_ESW_DEST_ENCAP_VALID;
+		flow->esw_attr->dests[flow->tmp_efi_index].flags &= ~MLX5_ESW_DEST_ENCAP_VALID;
 
 		if (IS_ERR(rule)) {
 			err = PTR_ERR(rule);
 			mlx5_core_warn(priv->mdev, "Failed to update slow path (encap) flow, %d\n",
 				       err);
-			goto loop_cont;
+			continue;
 		}
 
 		mlx5e_tc_unoffload_fdb_rules(esw, flow, flow->esw_attr);
 		flow->rule[0] = rule;
 		/* was unset when fast path rule removed */
 		flow_flag_set(flow, OFFLOADED);
-
-loop_cont:
-		mlx5e_flow_put(priv, flow);
 	}
 
 	/* we know that the encap is valid */
···
 	return flow->nic_attr->counter;
 }
 
+/* Takes reference to all flows attached to encap and adds the flows to
+ * flow_list using 'tmp_list' list_head in mlx5e_tc_flow.
+ */
+void mlx5e_take_all_encap_flows(struct mlx5e_encap_entry *e, struct list_head *flow_list)
+{
+	struct encap_flow_item *efi;
+	struct mlx5e_tc_flow *flow;
+
+	list_for_each_entry(efi, &e->flows, list) {
+		flow = container_of(efi, struct mlx5e_tc_flow, encaps[efi->index]);
+		if (IS_ERR(mlx5e_flow_get(flow)))
+			continue;
+		wait_for_completion(&flow->init_done);
+
+		flow->tmp_efi_index = efi->index;
+		list_add(&flow->tmp_list, flow_list);
+	}
+}
+
+/* Iterate over tmp_list of flows attached to flow_list head. */
+void mlx5e_put_encap_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list)
+{
+	struct mlx5e_tc_flow *flow, *tmp;
+
+	list_for_each_entry_safe(flow, tmp, flow_list, tmp_list)
+		mlx5e_flow_put(priv, flow);
+}
+
+static struct mlx5e_encap_entry *
+mlx5e_get_next_valid_encap(struct mlx5e_neigh_hash_entry *nhe,
+			   struct mlx5e_encap_entry *e)
+{
+	struct mlx5e_encap_entry *next = NULL;
+
+retry:
+	rcu_read_lock();
+
+	/* find encap with non-zero reference counter value */
+	for (next = e ?
+		     list_next_or_null_rcu(&nhe->encap_list,
+					   &e->encap_list,
+					   struct mlx5e_encap_entry,
+					   encap_list) :
+		     list_first_or_null_rcu(&nhe->encap_list,
+					    struct mlx5e_encap_entry,
+					    encap_list);
+	     next;
+	     next = list_next_or_null_rcu(&nhe->encap_list,
+					  &next->encap_list,
+					  struct mlx5e_encap_entry,
+					  encap_list))
+		if (mlx5e_encap_take(next))
+			break;
+
+	rcu_read_unlock();
+
+	/* release starting encap */
+	if (e)
+		mlx5e_encap_put(netdev_priv(e->out_dev), e);
+	if (!next)
+		return next;
+
+	/* wait for encap to be fully initialized */
+	wait_for_completion(&next->res_ready);
+	/* continue searching if encap entry is not in valid state after completion */
+	if (!(next->flags & MLX5_ENCAP_ENTRY_VALID)) {
+		e = next;
+		goto retry;
+	}
+
+	return next;
+}
+
 void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
 {
 	struct mlx5e_neigh *m_neigh = &nhe->m_neigh;
+	struct mlx5e_encap_entry *e = NULL;
 	struct mlx5e_tc_flow *flow;
-	struct mlx5e_encap_entry *e;
 	struct mlx5_fc *counter;
 	struct neigh_table *tbl;
 	bool neigh_used = false;
···
 	else
 		return;
 
-	list_for_each_entry(e, &nhe->encap_list, encap_list) {
+	/* mlx5e_get_next_valid_encap() releases previous encap before returning
+	 * next one.
+	 */
+	while ((e = mlx5e_get_next_valid_encap(nhe, e)) != NULL) {
+		struct mlx5e_priv *priv = netdev_priv(e->out_dev);
 		struct encap_flow_item *efi, *tmp;
+		struct mlx5_eswitch *esw;
+		LIST_HEAD(flow_list);
 
-		if (!(e->flags & MLX5_ENCAP_ENTRY_VALID) ||
-		    !mlx5e_encap_take(e))
-			continue;
-
+		esw = priv->mdev->priv.eswitch;
+		mutex_lock(&esw->offloads.encap_tbl_lock);
 		list_for_each_entry_safe(efi, tmp, &e->flows, list) {
 			flow = container_of(efi, struct mlx5e_tc_flow,
 					    encaps[efi->index]);
 			if (IS_ERR(mlx5e_flow_get(flow)))
 				continue;
+			list_add(&flow->tmp_list, &flow_list);
 
 			if (mlx5e_is_offloaded_flow(flow)) {
 				counter = mlx5e_tc_get_counter(flow);
 				lastuse = mlx5_fc_query_lastuse(counter);
 				if (time_after((unsigned long)lastuse, nhe->reported_lastuse)) {
-					mlx5e_flow_put(netdev_priv(e->out_dev), flow);
 					neigh_used = true;
 					break;
 				}
 			}
-
-			mlx5e_flow_put(netdev_priv(e->out_dev), flow);
 		}
+		mutex_unlock(&esw->offloads.encap_tbl_lock);
 
-		mlx5e_encap_put(netdev_priv(e->out_dev), e);
-		if (neigh_used)
+		mlx5e_put_encap_flow_list(priv, &flow_list);
+		if (neigh_used) {
+			/* release current encap before breaking the loop */
+			mlx5e_encap_put(priv, e);
 			break;
+		}
 	}
+
+	trace_mlx5e_tc_update_neigh_used_value(nhe, neigh_used);
 
 	if (neigh_used) {
 		nhe->reported_lastuse = jiffies;
···
 	}
 
 	kfree(e->encap_header);
-	kfree(e);
+	kfree_rcu(e, rcu);
 }
 
 void mlx5e_encap_put(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e)
···
 	INIT_LIST_HEAD(&flow->mod_hdr);
 	INIT_LIST_HEAD(&flow->hairpin);
 	refcount_set(&flow->refcnt, 1);
+	init_completion(&flow->init_done);
 
 	*__flow = flow;
 	*__parse_attr = parse_attr;
···
 		goto err_free;
 
 	err = mlx5e_tc_add_fdb_flow(priv, flow, extack);
+	complete_all(&flow->init_done);
 	if (err) {
 		if (!(err == -ENETUNREACH && mlx5_lag_is_multipath(in_mdev)))
 			goto err_free;
···
 		goto out;
 	}
 
+	trace_mlx5e_configure_flower(f);
 	err = mlx5e_tc_add_flow(priv, f, flags, dev, &flow);
 	if (err)
 		goto out;
···
 	rhashtable_remove_fast(tc_ht, &flow->node, tc_ht_params);
 	rcu_read_unlock();
 
+	trace_mlx5e_delete_flower(f);
 	mlx5e_flow_put(priv, flow);
 
 	return 0;
···
 	mlx5_devcom_release_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
 out:
 	flow_stats_update(&f->stats, bytes, packets, lastuse);
+	trace_mlx5e_stats_flower(f);
 errout:
 	mlx5e_flow_put(priv, flow);
 	return err;
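The core pattern the patch introduces in mlx5e_take_all_encap_flows()/mlx5e_put_encap_flow_list() is: take a reference on every live flow while holding a lock, batch the flows on a temporary intrusive list, then drop all references after the lock is released. A minimal user-space sketch of that pattern (not the driver code; the list_head helpers below stand in for the kernel's <linux/list.h>, and the toy `struct flow` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's intrusive list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy flow with a refcount and a tmp_list hook used only for batching. */
struct flow {
	int refcnt;
	int freed;
	struct list_head tmp_list;
};

/* Analogue of mlx5e_flow_get(): refuse flows that are already dying. */
static int flow_get(struct flow *f)
{
	if (f->refcnt == 0)
		return -1;
	f->refcnt++;
	return 0;
}

/* Analogue of mlx5e_flow_put(): dropping the last reference frees the flow. */
static void flow_put(struct flow *f)
{
	if (--f->refcnt == 0)
		f->freed = 1;
}

/* Take a reference on each live flow and collect it on flow_list. */
static void take_all_flows(struct flow *flows, int n, struct list_head *flow_list)
{
	for (int i = 0; i < n; i++)
		if (flow_get(&flows[i]) == 0)
			list_add(&flows[i].tmp_list, flow_list);
}

/* Drop the references taken above; safe even if put frees the entry. */
static void put_flow_list(struct list_head *flow_list)
{
	struct list_head *pos = flow_list->next;

	while (pos != flow_list) {
		struct list_head *next = pos->next;

		flow_put(container_of(pos, struct flow, tmp_list));
		pos = next;
	}
}
```

The point of the batching is that flow_put() may free a flow and so must not run under the lock that protects the encap's flow list; collecting references first lets the caller release the lock before the puts, which is exactly why the patch adds tmp_list to mlx5e_tc_flow.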
+7 -2
drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
···
 
 struct mlx5e_encap_entry;
 void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
-			      struct mlx5e_encap_entry *e);
+			      struct mlx5e_encap_entry *e,
+			      struct list_head *flow_list);
 void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
-			      struct mlx5e_encap_entry *e);
+			      struct mlx5e_encap_entry *e,
+			      struct list_head *flow_list);
 bool mlx5e_encap_take(struct mlx5e_encap_entry *e);
 void mlx5e_encap_put(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e);
+
+void mlx5e_take_all_encap_flows(struct mlx5e_encap_entry *e, struct list_head *flow_list);
+void mlx5e_put_encap_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list);
 
 struct mlx5e_neigh_hash_entry;
 void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe);
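The flow->init_done completion added above lets readers (neigh update, stats update) block until a concurrently inserted flow is fully initialized: the creating thread calls complete_all() exactly once after mlx5e_tc_add_fdb_flow(), and every waiter passes once that has happened. In user space the same one-shot broadcast barrier can be sketched with a mutex and condition variable (a hypothetical completion type mirroring the kernel's semantics, not the kernel API itself):

```c
#include <assert.h>
#include <pthread.h>

/* User-space sketch of a kernel-style completion: one-shot, broadcast. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* complete_all(): wake every current waiter and let future waiters pass. */
static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* wait_for_completion(): block until complete_all() has run. */
static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Toy flow whose initialization a concurrent reader must wait for. */
struct toy_flow {
	struct completion init_done;
	int initialized;
};

static void *init_thread(void *arg)
{
	struct toy_flow *f = arg;

	f->initialized = 1;          /* ...offload setup would go here... */
	complete_all(&f->init_done); /* publish: init finished, ok or not */
	return NULL;
}
```

Note that, as in the patch, complete_all() is called whether or not initialization succeeded; waiters must still check the flow's state afterwards, which is why the driver pairs the wait with mlx5e_is_offloaded_flow() checks.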
+1
include/net/flow_offload.h
···
 	FLOW_ACTION_MPLS_PUSH,
 	FLOW_ACTION_MPLS_POP,
 	FLOW_ACTION_MPLS_MANGLE,
+	NUM_FLOW_ACTIONS,
 };
 
 /* This is mirroring enum pedit_header_type definition for easy mapping between
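A trailing sentinel such as NUM_FLOW_ACTIONS is a common C enum idiom: since the values are consecutive from 0, the sentinel equals the number of real entries, so per-action tables can be sized without a magic constant and stay correct when actions are added before it. A minimal sketch using a hypothetical cut-down enum (not the real flow_action_id list):

```c
#include <assert.h>

/* Hypothetical shortened version of enum flow_action_id: consecutive
 * values starting at 0, with a trailing sentinel counting the entries.
 */
enum toy_flow_action_id {
	TOY_FLOW_ACTION_ACCEPT,
	TOY_FLOW_ACTION_DROP,
	TOY_FLOW_ACTION_REDIRECT,
	NUM_TOY_FLOW_ACTIONS,	/* sentinel: number of real actions */
};

/* The sentinel sizes a per-action table automatically. */
static int action_hits[NUM_TOY_FLOW_ACTIONS];

static void count_action(enum toy_flow_action_id id)
{
	if (id < NUM_TOY_FLOW_ACTIONS)	/* sentinel doubles as bounds check */
		action_hits[id]++;
}
```

The same bounds-check use is why a sentinel is handy for drivers iterating over all action kinds or validating an action id received from user space.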