Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'expose-burst-period-for-devlink-health-reporter'

Mark Bloch says:

====================
Expose burst period for devlink health reporter

Shahar writes:
--------------------------------------------------------------------------

Currently, the devlink health reporter initiates the grace period
immediately after recovering an error, which blocks further recovery
attempts until the grace period concludes. Since additional errors
are not generally expected during this short interval, any new error
reported during the grace period is not only rejected but also causes
the reporter to enter an error state that requires manual intervention.

This approach poses a problem in scenarios where a single root cause
triggers multiple related errors in quick succession - for example,
a PCI issue affecting multiple hardware queues. Because these errors
are closely related and occur rapidly, it is more effective to handle
them together rather than handling only the first one reported and
blocking any subsequent recovery attempts. Furthermore, setting the
reporter to an error state in this context can be misleading, as these
multiple errors are manifestations of a single underlying issue, making
it unlike the general case where additional errors are not expected
during the grace period.

To resolve this, introduce a configurable burst period attribute to the
devlink health reporter. This period starts when the first error
is recovered and lasts for a user-defined duration. Once this error
burst period expires, the grace period begins. After the grace period
ends, a new reported error will start the same flow again.

Timeline summary:

----|--------|------------------------------/----------------------/--
error is  error is       burst period             grace period
reported  recovered  (recoveries allowed)     (recoveries blocked)

The burst period creates a time window during which recovery attempts
are permitted, allowing all reported errors to be handled sequentially
before the grace period starts. Once the grace period begins, it
prevents any further error recoveries until it ends.

When burst period is set to 0, current behavior is preserved.
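
The timeline can be modelled in user space as a small sketch. This is
not the kernel code: struct reporter_model, reporter_phase_at(),
recovery_allowed() and recovery_done() are hypothetical names, times are
plain milliseconds rather than jiffies, and the auto_recover and
previous-state checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; all times are in milliseconds. */
enum reporter_phase { PHASE_IDLE, PHASE_BURST, PHASE_GRACE };

struct reporter_model {
	uint64_t burst_ms;        /* length of the burst period */
	uint64_t grace_ms;        /* length of the grace period */
	uint64_t window_start_ms; /* first recovery of the current window */
	bool recovered_once;      /* false until a recovery has completed */
};

/* Which part of the timeline does now_ms fall into? */
enum reporter_phase reporter_phase_at(const struct reporter_model *r,
				      uint64_t now_ms)
{
	uint64_t burst_end = r->window_start_ms + r->burst_ms;
	uint64_t grace_end = burst_end + r->grace_ms;

	if (!r->recovered_once)
		return PHASE_IDLE;
	if (now_ms < burst_end)
		return PHASE_BURST;
	if (now_ms < grace_end)
		return PHASE_GRACE;
	return PHASE_IDLE;
}

/* Recoveries are blocked only while the grace period is running. */
bool recovery_allowed(const struct reporter_model *r, uint64_t now_ms)
{
	return reporter_phase_at(r, now_ms) != PHASE_GRACE;
}

/* A recovery inside the burst period does not restart the window. */
void recovery_done(struct reporter_model *r, uint64_t now_ms)
{
	if (reporter_phase_at(r, now_ms) == PHASE_BURST)
		return;
	r->window_start_ms = now_ms;
	r->recovered_once = true;
}
```

With burst_ms and grace_ms both set to 500 and a first recovery at
t=1000, recoveries stay allowed until t=1500 and are blocked from t=1500
until t=2000. With burst_ms set to 0 the grace period starts right at
the first recovery, which matches the behavior before this series.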

Design alternatives considered:

1. Recover all queues upon any error:
A brute-force approach that recovers all queues on any error.
While simple, it is overly aggressive and disrupts unaffected queues
unnecessarily. Also, because this is handled entirely within the
driver, it leads to a driver-specific implementation rather than a
generic one.

2. Per-queue reporter:
This design would isolate recovery handling per SQ or RQ, effectively
removing interdependencies between queues. While conceptually clean,
it introduces significant scalability challenges as the number of
queues grows, as well as synchronization challenges across multiple
reporters.

3. Error aggregation with delayed handling:
Errors arriving during the grace period are saved and processed after
it ends. While this addresses related errors whose recovery is aborted
because the grace period has started, it adds complexity due to
synchronization needs and contradicts the assumption that no errors
should occur during a healthy system’s grace period. It also breaks
the important role of the grace period in preventing an infinite loop
of immediate error detection following recovery. In such cases we
want to stop.

4. Allowing a fixed burst of errors before starting grace period:
Allows a set number of recoveries before the grace period begins.
However, it also requires limiting the window in which those errors
may be reported. Once such a time limit exists, the burst threshold
becomes redundant, so it is dropped to keep the design simple.

The burst period design was chosen for its simplicity and precision in
addressing the problem at hand. It effectively captures the temporal
correlation of related errors and aligns with the original intent of
the grace period as a stabilization window where further errors are
unexpected, and if they do occur, they indicate an abnormal system
state.

v3: https://lore.kernel.org/1755111349-416632-1-git-send-email-tariqt@nvidia.com
v2: https://lore.kernel.org/1753390134-345154-1-git-send-email-tariqt@nvidia.com
v1: https://lore.kernel.org/1752768442-264413-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/20250824084354.533182-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+187 -85
+7
Documentation/netlink/specs/devlink.yaml
···
         type: nest
         multi-attr: true
         nested-attributes: dl-rate-tc-bws
+      -
+        name: health-reporter-burst-period
+        type: u64
+        doc: Time (in msec) for recoveries before starting the grace period.
   -
     name: dl-dev-stats
     subset-of: devlink
···
         name: health-reporter-dump-ts-ns
       -
         name: health-reporter-auto-dump
+      -
+        name: health-reporter-burst-period
 
   -
     name: dl-attr-stats
···
             - health-reporter-graceful-period
             - health-reporter-auto-recover
             - health-reporter-auto-dump
+            - health-reporter-burst-period
 
     -
       name: health-reporter-recover
+1 -1
drivers/net/ethernet/amd/pds_core/main.c
···
 		goto err_out_del_dev;
 	}
 
-	hr = devl_health_reporter_create(dl, &pdsc_fw_reporter_ops, 0, pdsc);
+	hr = devl_health_reporter_create(dl, &pdsc_fw_reporter_ops, pdsc);
 	if (IS_ERR(hr)) {
 		devl_unlock(dl);
 		dev_warn(pdsc->dev, "Failed to create fw reporter: %pe\n", hr);
+1 -1
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
···
 {
 	struct devlink_health_reporter *reporter;
 
-	reporter = devlink_health_reporter_create(bp->dl, ops, 0, bp);
+	reporter = devlink_health_reporter_create(bp->dl, ops, bp);
 	if (IS_ERR(reporter)) {
 		netdev_warn(bp->dev, "Failed to create %s health reporter, rc = %ld\n",
 			    ops->name, PTR_ERR(reporter));
+6 -4
drivers/net/ethernet/huawei/hinic/hinic_devlink.c
···
 	struct devlink *devlink = priv_to_devlink(priv);
 
 	priv->hw_fault_reporter =
-		devlink_health_reporter_create(devlink, &hinic_hw_fault_reporter_ops,
-					       0, priv);
+		devlink_health_reporter_create(devlink,
+					       &hinic_hw_fault_reporter_ops,
+					       priv);
 	if (IS_ERR(priv->hw_fault_reporter)) {
 		dev_warn(&priv->hwdev->hwif->pdev->dev, "Failed to create hw fault reporter, err: %ld\n",
 			 PTR_ERR(priv->hw_fault_reporter));
···
 	}
 
 	priv->fw_fault_reporter =
-		devlink_health_reporter_create(devlink, &hinic_fw_fault_reporter_ops,
-					       0, priv);
+		devlink_health_reporter_create(devlink,
+					       &hinic_fw_fault_reporter_ops,
+					       priv);
 	if (IS_ERR(priv->fw_fault_reporter)) {
 		dev_warn(&priv->hwdev->hwif->pdev->dev, "Failed to create fw fault reporter, err: %ld\n",
 			 PTR_ERR(priv->fw_fault_reporter));
+1 -2
drivers/net/ethernet/intel/ice/devlink/health.c
···
 {
 	struct devlink *devlink = priv_to_devlink(pf);
 	struct devlink_health_reporter *rep;
-	const u64 graceful_period = 0;
 
-	rep = devl_health_reporter_create(devlink, ops, graceful_period, pf);
+	rep = devl_health_reporter_create(devlink, ops, pf);
 	if (IS_ERR(rep)) {
 		struct device *dev = ice_pf_to_dev(pf);
 
+24 -8
drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
···
 
 	rvu_reporters->nix_event_ctx = nix_event_context;
 	rvu_reporters->rvu_hw_nix_intr_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_intr_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_intr_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_intr_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_intr reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_intr_reporter));
···
 	}
 
 	rvu_reporters->rvu_hw_nix_gen_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_gen_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_gen_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_gen_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_gen reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_gen_reporter));
···
 	}
 
 	rvu_reporters->rvu_hw_nix_err_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_err_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_err_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_err_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_err reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_err_reporter));
···
 	}
 
 	rvu_reporters->rvu_hw_nix_ras_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_ras_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_nix_ras_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_nix_ras_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_nix_ras reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_nix_ras_reporter));
···
 
 	rvu_reporters->npa_event_ctx = npa_event_context;
 	rvu_reporters->rvu_hw_npa_intr_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_intr_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_intr_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_intr_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_intr reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_intr_reporter));
···
 	}
 
 	rvu_reporters->rvu_hw_npa_gen_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_gen_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_gen_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_gen_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_gen reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_gen_reporter));
···
 	}
 
 	rvu_reporters->rvu_hw_npa_err_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_err_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_err_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_err_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_err reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_err_reporter));
···
 	}
 
 	rvu_reporters->rvu_hw_npa_ras_reporter =
-		devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_ras_reporter_ops, 0, rvu);
+		devlink_health_reporter_create(rvu_dl->dl,
+					       &rvu_hw_npa_ras_reporter_ops,
+					       rvu);
 	if (IS_ERR(rvu_reporters->rvu_hw_npa_ras_reporter)) {
 		dev_warn(rvu->dev, "Failed to create hw_npa_ras reporter, err=%ld\n",
 			 PTR_ERR(rvu_reporters->rvu_hw_npa_ras_reporter));
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
···
 	health->vnic_reporter =
 		devlink_health_reporter_create(devlink,
 					       &mlx5_reporter_vnic_ops,
-					       0, dev);
+					       dev);
 	if (IS_ERR(health->vnic_reporter))
 		mlx5_core_warn(dev,
 			       "Failed to create vnic reporter, err = %ld\n",
+8 -4
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
···
 	mutex_unlock(&c->icosq_recovery_lock);
 }
 
+#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_RX_BURST_PERIOD 500
+
 static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 	.name = "rx",
 	.recover = mlx5e_rx_reporter_recover,
 	.diagnose = mlx5e_rx_reporter_diagnose,
 	.dump = mlx5e_rx_reporter_dump,
+	.default_graceful_period = MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
+	.default_burst_period = MLX5E_REPORTER_RX_BURST_PERIOD,
 };
-
-#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
 
 void mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
 {
+	struct devlink_port *port = priv->netdev->devlink_port;
 	struct devlink_health_reporter *reporter;
 
-	reporter = devlink_port_health_reporter_create(priv->netdev->devlink_port,
+	reporter = devlink_port_health_reporter_create(port,
 						       &mlx5_rx_reporter_ops,
-						       MLX5E_REPORTER_RX_GRACEFUL_PERIOD, priv);
+						       priv);
 	if (IS_ERR(reporter)) {
 		netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
 			    PTR_ERR(reporter));
+8 -4
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
···
 	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
 }
 
+#define MLX5E_REPORTER_TX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_TX_BURST_PERIOD 500
+
 static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
 	.name = "tx",
 	.recover = mlx5e_tx_reporter_recover,
 	.diagnose = mlx5e_tx_reporter_diagnose,
 	.dump = mlx5e_tx_reporter_dump,
+	.default_graceful_period = MLX5E_REPORTER_TX_GRACEFUL_PERIOD,
+	.default_burst_period = MLX5E_REPORTER_TX_BURST_PERIOD,
 };
-
-#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
 
 void mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
 {
+	struct devlink_port *port = priv->netdev->devlink_port;
 	struct devlink_health_reporter *reporter;
 
-	reporter = devlink_port_health_reporter_create(priv->netdev->devlink_port,
+	reporter = devlink_port_health_reporter_create(port,
 						       &mlx5_tx_reporter_ops,
-						       MLX5_REPORTER_TX_GRACEFUL_PERIOD, priv);
+						       priv);
 	if (IS_ERR(reporter)) {
 		netdev_warn(priv->netdev,
 			    "Failed to create tx reporter, err = %ld\n",
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
···
 
 	reporter = devl_port_health_reporter_create(dl_port,
 						    &mlx5_rep_vnic_reporter_ops,
-						    0, rpriv);
+						    rpriv);
 	if (IS_ERR(reporter)) {
 		mlx5_core_err(priv->mdev,
 			      "Failed to create representor vnic reporter, err = %ld\n",
+24 -17
drivers/net/ethernet/mellanox/mlx5/core/health.c
···
 	}
 }
 
+#define MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD 180000
+#define MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD 60000
+#define MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD 30000
+#define MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD \
+	MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD
+
+static
+const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ecpf_ops = {
+	.name = "fw_fatal",
+	.recover = mlx5_fw_fatal_reporter_recover,
+	.dump = mlx5_fw_fatal_reporter_dump,
+	.default_graceful_period =
+		MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD,
+};
+
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_pf_ops = {
 	.name = "fw_fatal",
 	.recover = mlx5_fw_fatal_reporter_recover,
 	.dump = mlx5_fw_fatal_reporter_dump,
+	.default_graceful_period = MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD,
 };
 
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
 	.name = "fw_fatal",
 	.recover = mlx5_fw_fatal_reporter_recover,
+	.default_graceful_period =
+		MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD,
 };
-
-#define MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD 180000
-#define MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD 60000
-#define MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD 30000
-#define MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD
 
 void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 {
···
 	struct mlx5_core_health *health = &dev->priv.health;
 	const struct devlink_health_reporter_ops *fw_ops;
 	struct devlink *devlink = priv_to_devlink(dev);
-	u64 grace_period;
 
-	fw_fatal_ops = &mlx5_fw_fatal_reporter_pf_ops;
 	fw_ops = &mlx5_fw_reporter_pf_ops;
 	if (mlx5_core_is_ecpf(dev)) {
-		grace_period = MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD;
+		fw_fatal_ops = &mlx5_fw_fatal_reporter_ecpf_ops;
 	} else if (mlx5_core_is_pf(dev)) {
-		grace_period = MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD;
+		fw_fatal_ops = &mlx5_fw_fatal_reporter_pf_ops;
 	} else {
 		/* VF or SF */
-		grace_period = MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD;
 		fw_fatal_ops = &mlx5_fw_fatal_reporter_ops;
 		fw_ops = &mlx5_fw_reporter_ops;
 	}
 
-	health->fw_reporter =
-		devl_health_reporter_create(devlink, fw_ops, 0, dev);
+	health->fw_reporter = devl_health_reporter_create(devlink, fw_ops, dev);
 	if (IS_ERR(health->fw_reporter))
 		mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n",
 			       PTR_ERR(health->fw_reporter));
 
-	health->fw_fatal_reporter =
-		devl_health_reporter_create(devlink,
-					    fw_fatal_ops,
-					    grace_period,
-					    dev);
+	health->fw_fatal_reporter = devl_health_reporter_create(devlink,
+								fw_fatal_ops,
+								dev);
 	if (IS_ERR(health->fw_fatal_reporter))
 		mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n",
 			       PTR_ERR(health->fw_fatal_reporter));
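
Because the grace period now lives in the ops table instead of being a
creation-time argument, the mlx5 change above picks one of three
fw_fatal ops tables per function type. A minimal stand-alone sketch of
that selection pattern follows. The enum, struct and function names are
hypothetical stand-ins, not mlx5 symbols; only the three period values
(180000, 60000 and 30000 msec) are taken from the diff.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the function type and the ops table. */
enum fn_kind { FN_ECPF, FN_PF, FN_VF_OR_SF };

struct fw_fatal_ops_model {
	const char *name;
	uint64_t default_graceful_period; /* msec */
};

/* One table per function type, differing only in the default period. */
static const struct fw_fatal_ops_model fw_fatal_ecpf_ops = {
	.name = "fw_fatal",
	.default_graceful_period = 180000,
};

static const struct fw_fatal_ops_model fw_fatal_pf_ops = {
	.name = "fw_fatal",
	.default_graceful_period = 60000,
};

static const struct fw_fatal_ops_model fw_fatal_vf_sf_ops = {
	.name = "fw_fatal",
	.default_graceful_period = 30000,
};

/* Select the table instead of passing the period at creation time. */
const struct fw_fatal_ops_model *pick_fw_fatal_ops(enum fn_kind kind)
{
	switch (kind) {
	case FN_ECPF:
		return &fw_fatal_ecpf_ops;
	case FN_PF:
		return &fw_fatal_pf_ops;
	default:
		return &fw_fatal_vf_sf_ops;
	}
}
```

The cost of this layout is one extra const table per variant; in
exchange, the create call no longer needs a period argument.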
+1 -1
drivers/net/ethernet/mellanox/mlxsw/core.c
···
 		return 0;
 
 	fw_fatal = devl_health_reporter_create(devlink, &mlxsw_core_health_fw_fatal_ops,
-					       0, mlxsw_core);
+					       mlxsw_core);
 	if (IS_ERR(fw_fatal)) {
 		dev_err(mlxsw_core->bus_info->dev, "Failed to create fw fatal reporter");
 		return PTR_ERR(fw_fatal);
+5 -4
drivers/net/ethernet/qlogic/qed/qed_devlink.c
···
 	return 0;
 }
 
+#define QED_REPORTER_FW_GRACEFUL_PERIOD 0
+
 static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = {
 		.name = "fw_fatal",
 		.recover = qed_fw_fatal_reporter_recover,
 		.dump = qed_fw_fatal_reporter_dump,
+		.default_graceful_period = QED_REPORTER_FW_GRACEFUL_PERIOD,
 };
-
-#define QED_REPORTER_FW_GRACEFUL_PERIOD 0
 
 void qed_fw_reporters_create(struct devlink *devlink)
 {
 	struct qed_devlink *dl = devlink_priv(devlink);
 
-	dl->fw_reporter = devlink_health_reporter_create(devlink, &qed_fw_fatal_reporter_ops,
-							 QED_REPORTER_FW_GRACEFUL_PERIOD, dl);
+	dl->fw_reporter = devlink_health_reporter_create(devlink,
+							 &qed_fw_fatal_reporter_ops, dl);
 	if (IS_ERR(dl->fw_reporter)) {
 		DP_NOTICE(dl->cdev, "Failed to create fw reporter, err = %ld\n",
 			  PTR_ERR(dl->fw_reporter));
+2 -2
drivers/net/netdevsim/health.c
···
 	health->empty_reporter =
 		devl_health_reporter_create(devlink,
 					    &nsim_dev_empty_reporter_ops,
-					    0, health);
+					    health);
 	if (IS_ERR(health->empty_reporter))
 		return PTR_ERR(health->empty_reporter);
 
 	health->dummy_reporter =
 		devl_health_reporter_create(devlink,
 					    &nsim_dev_dummy_reporter_ops,
-					    0, health);
+					    health);
 	if (IS_ERR(health->dummy_reporter)) {
 		err = PTR_ERR(health->dummy_reporter);
 		goto err_empty_reporter_destroy;
+10 -4
include/net/devlink.h
···
  *        if priv_ctx is NULL, run a full dump
  * @diagnose: callback to diagnose the current status
  * @test: callback to trigger a test event
+ * @default_graceful_period: default min time (in msec)
+ *	between recovery attempts
+ * @default_burst_period: default time (in msec) for
+ *	error recoveries before starting the grace period
  */
 
 struct devlink_health_reporter_ops {
···
 			struct netlink_ext_ack *extack);
 	int (*test)(struct devlink_health_reporter *reporter,
 		    struct netlink_ext_ack *extack);
+	u64 default_graceful_period;
+	u64 default_burst_period;
 };
 
 /**
···
 struct devlink_health_reporter *
 devl_port_health_reporter_create(struct devlink_port *port,
 				 const struct devlink_health_reporter_ops *ops,
-				 u64 graceful_period, void *priv);
+				 void *priv);
 
 struct devlink_health_reporter *
 devlink_port_health_reporter_create(struct devlink_port *port,
 				    const struct devlink_health_reporter_ops *ops,
-				    u64 graceful_period, void *priv);
+				    void *priv);
 
 struct devlink_health_reporter *
 devl_health_reporter_create(struct devlink *devlink,
 			    const struct devlink_health_reporter_ops *ops,
-			    u64 graceful_period, void *priv);
+			    void *priv);
 
 struct devlink_health_reporter *
 devlink_health_reporter_create(struct devlink *devlink,
 			       const struct devlink_health_reporter_ops *ops,
-			       u64 graceful_period, void *priv);
+			       void *priv);
 
 void
 devl_health_reporter_destroy(struct devlink_health_reporter *reporter);
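
With the defaults carried in the ops structure, the core can validate
them when a reporter is created: a default grace period requires a
recover callback, and a default burst period requires a non-zero
default grace period. The sketch below mirrors those two checks in user
space. struct reporter_ops_model and reporter_ops_validate() are
hypothetical names, and has_recover stands in for a non-NULL .recover
callback.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for devlink_health_reporter_ops. */
struct reporter_ops_model {
	const char *name;
	bool has_recover;                 /* models a non-NULL .recover */
	uint64_t default_graceful_period; /* msec */
	uint64_t default_burst_period;    /* msec */
};

/* Returns 0 if the defaults are consistent, -EINVAL otherwise. */
int reporter_ops_validate(const struct reporter_ops_model *ops)
{
	/* A grace period is meaningless without a recover callback. */
	if (ops->default_graceful_period && !ops->has_recover)
		return -EINVAL;

	/* A burst period only makes sense in front of a grace period. */
	if (ops->default_burst_period && !ops->default_graceful_period)
		return -EINVAL;

	return 0;
}
```

An ops table with both periods set to 500 and a recover callback passes
these checks, while one that sets only a burst period is rejected.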
+2
include/uapi/linux/devlink.h
···
 
 	DEVLINK_ATTR_RATE_TC_BWS,		/* nested */
 
+	DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD,	/* u64 */
+
 	/* Add new attributes above here, update the spec in
 	 * Documentation/netlink/specs/devlink.yaml and re-generate
 	 * net/devlink/netlink_gen.c.
+81 -28
net/devlink/health.c
···
 	struct devlink_port *devlink_port;
 	struct devlink_fmsg *dump_fmsg;
 	u64 graceful_period;
+	u64 burst_period;
 	bool auto_recover;
 	bool auto_dump;
 	u8 health_state;
···
 static struct devlink_health_reporter *
 __devlink_health_reporter_create(struct devlink *devlink,
 				 const struct devlink_health_reporter_ops *ops,
-				 u64 graceful_period, void *priv)
+				 void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
-	if (WARN_ON(graceful_period && !ops->recover))
+	if (WARN_ON(ops->default_graceful_period && !ops->recover))
+		return ERR_PTR(-EINVAL);
+
+	if (WARN_ON(ops->default_burst_period && !ops->default_graceful_period))
 		return ERR_PTR(-EINVAL);
 
 	reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
···
 	reporter->priv = priv;
 	reporter->ops = ops;
 	reporter->devlink = devlink;
-	reporter->graceful_period = graceful_period;
+	reporter->graceful_period = ops->default_graceful_period;
+	reporter->burst_period = ops->default_burst_period;
 	reporter->auto_recover = !!ops->recover;
 	reporter->auto_dump = !!ops->dump;
 	return reporter;
···
  *
  * @port: devlink_port to which health reports will relate
  * @ops: devlink health reporter ops
- * @graceful_period: min time (in msec) between recovery attempts
  * @priv: driver priv pointer
  */
 struct devlink_health_reporter *
 devl_port_health_reporter_create(struct devlink_port *port,
 				 const struct devlink_health_reporter_ops *ops,
-				 u64 graceful_period, void *priv)
+				 void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
···
 						   ops->name))
 		return ERR_PTR(-EEXIST);
 
-	reporter = __devlink_health_reporter_create(port->devlink, ops,
-						    graceful_period, priv);
+	reporter = __devlink_health_reporter_create(port->devlink, ops, priv);
 	if (IS_ERR(reporter))
 		return reporter;
 
···
 struct devlink_health_reporter *
 devlink_port_health_reporter_create(struct devlink_port *port,
 				    const struct devlink_health_reporter_ops *ops,
-				    u64 graceful_period, void *priv)
+				    void *priv)
 {
 	struct devlink_health_reporter *reporter;
 	struct devlink *devlink = port->devlink;
 
 	devl_lock(devlink);
-	reporter = devl_port_health_reporter_create(port, ops,
-						    graceful_period, priv);
+	reporter = devl_port_health_reporter_create(port, ops, priv);
 	devl_unlock(devlink);
 	return reporter;
 }
···
  *
  * @devlink: devlink instance which the health reports will relate
  * @ops: devlink health reporter ops
- * @graceful_period: min time (in msec) between recovery attempts
  * @priv: driver priv pointer
  */
 struct devlink_health_reporter *
 devl_health_reporter_create(struct devlink *devlink,
 			    const struct devlink_health_reporter_ops *ops,
-			    u64 graceful_period, void *priv)
+			    void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
···
 	if (devlink_health_reporter_find_by_name(devlink, ops->name))
 		return ERR_PTR(-EEXIST);
 
-	reporter = __devlink_health_reporter_create(devlink, ops,
-						    graceful_period, priv);
+	reporter = __devlink_health_reporter_create(devlink, ops, priv);
 	if (IS_ERR(reporter))
 		return reporter;
 
···
 struct devlink_health_reporter *
 devlink_health_reporter_create(struct devlink *devlink,
 			       const struct devlink_health_reporter_ops *ops,
-			       u64 graceful_period, void *priv)
+			       void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
 	devl_lock(devlink);
-	reporter = devl_health_reporter_create(devlink, ops,
-					       graceful_period, priv);
+	reporter = devl_health_reporter_create(devlink, ops, priv);
 	devl_unlock(devlink);
 	return reporter;
 }
···
 	if (reporter->ops->recover &&
 	    devlink_nl_put_u64(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
 			       reporter->graceful_period))
+		goto reporter_nest_cancel;
+	if (reporter->ops->recover &&
+	    devlink_nl_put_u64(msg, DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD,
+			       reporter->burst_period))
 		goto reporter_nest_cancel;
 	if (reporter->ops->recover &&
 	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
···
 
 	if (!reporter->ops->recover &&
 	    (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
-	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] ||
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD]))
 		return -EOPNOTSUPP;
 
 	if (!reporter->ops->dump &&
 	    info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
 		return -EOPNOTSUPP;
 
-	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) {
 		reporter->graceful_period =
 			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+		if (!reporter->graceful_period)
+			reporter->burst_period = 0;
+	}
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD]) {
+		u64 burst_period =
+			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD]);
+
+		if (!reporter->graceful_period && burst_period) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "Cannot set burst period without a grace period.");
+			return -EINVAL;
+		}
+
+		reporter->burst_period = burst_period;
+	}
 
 	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
 		reporter->auto_recover =
···
 	devlink_nl_notify_send_desc(devlink, msg, &desc);
 }
 
+static bool
+devlink_health_reporter_in_burst(struct devlink_health_reporter *reporter)
+{
+	unsigned long burst_threshold = reporter->last_recovery_ts +
+		msecs_to_jiffies(reporter->burst_period);
+
+	return time_is_after_jiffies(burst_threshold);
+}
+
 void
 devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter)
 {
 	reporter->recovery_count++;
-	reporter->last_recovery_ts = jiffies;
+	if (!devlink_health_reporter_in_burst(reporter))
+		/* When burst period is set, last_recovery_ts marks the first
+		 * recovery within the burst period, not necessarily the last
+		 * one.
+		 */
+		reporter->last_recovery_ts = jiffies;
 }
 EXPORT_SYMBOL_GPL(devlink_health_reporter_recovery_done);
 
···
 	return err;
 }
 
+static bool
+devlink_health_recover_abort(struct devlink_health_reporter *reporter,
+			     enum devlink_health_reporter_state prev_state)
+{
+	unsigned long recover_ts_threshold;
+
+	if (!reporter->auto_recover)
+		return false;
+
+	/* abort if the previous error wasn't recovered */
+	if (prev_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY)
+		return true;
+
+	if (devlink_health_reporter_in_burst(reporter))
+		return false;
+
+	recover_ts_threshold = reporter->last_recovery_ts +
+		msecs_to_jiffies(reporter->burst_period) +
+		msecs_to_jiffies(reporter->graceful_period);
+	if (reporter->last_recovery_ts && reporter->recovery_count &&
+	    time_is_after_jiffies(recover_ts_threshold))
+		return true;
+
+	return false;
+}
+
 int devlink_health_report(struct devlink_health_reporter *reporter,
 			  const char *msg, void *priv_ctx)
 {
 	enum devlink_health_reporter_state prev_health_state;
 	struct devlink *devlink = reporter->devlink;
-	unsigned long recover_ts_threshold;
 	int ret;
 
 	/* write a log message of the current error */
···
 	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
 	devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER);
 
-	/* abort if the previous error wasn't recovered */
-	recover_ts_threshold = reporter->last_recovery_ts +
-			       msecs_to_jiffies(reporter->graceful_period);
-	if (reporter->auto_recover &&
-	    (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
-	     (reporter->last_recovery_ts && reporter->recovery_count &&
-	      time_is_after_jiffies(recover_ts_threshold)))) {
+	if (devlink_health_recover_abort(reporter, prev_health_state)) {
 		trace_devlink_health_recover_aborted(devlink,
 						     reporter->ops->name,
 						     reporter->health_state,
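
The set handler in net/devlink/health.c ties the two periods together:
writing a grace period of 0 also clears the burst period, and a non-zero
burst period is rejected while the grace period is 0. The user-space
sketch below mirrors only that rule. struct reporter_cfg and
reporter_cfg_set() are hypothetical names, and a NULL pointer stands in
for a netlink attribute that is absent from the request.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two tunables; values are in msec. */
struct reporter_cfg {
	uint64_t graceful_period;
	uint64_t burst_period;
};

/*
 * Apply an update. A NULL pointer models an absent attribute.
 * As in the diff above, the grace period is applied first, so a
 * request carrying both values is judged against the new grace period.
 */
int reporter_cfg_set(struct reporter_cfg *cfg,
		     const uint64_t *new_grace, const uint64_t *new_burst)
{
	if (new_grace) {
		cfg->graceful_period = *new_grace;
		/* Disabling the grace period also disables the burst. */
		if (!cfg->graceful_period)
			cfg->burst_period = 0;
	}

	if (new_burst) {
		/* A burst period cannot exist without a grace period. */
		if (!cfg->graceful_period && *new_burst)
			return -EINVAL;
		cfg->burst_period = *new_burst;
	}

	return 0;
}
```

Since the grace period is written before the burst period is checked, a
request that sets both a valid grace period and a burst period succeeds
even if the reporter previously had no grace period.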