Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

RDMA/mlx5: Print wc status on CQE error and dump needed

mlx5_handle_error_cqe() only dumps the content of the CQE, which is raw hex
data and not straightforward to debug. Print the WC status message when we
get a CQE error and a dump is needed.

Here is an example of what the dmesg log looks like with this change:

infiniband mlx5_0: mlx5_handle_error_cqe:333:(pid 0): WC error: 10, message: remote access error
infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 00 00 88 13 08 03 61 b3 1e a1 42 d3

Link: https://lore.kernel.org/r/20211227123806.47530-1-dust.li@linux.alibaba.com
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Authored by Dust Li and committed by Jason Gunthorpe
a7ad9dde 8d1cfb88

+4 -1
drivers/infiniband/hw/mlx5/cq.c
@@ -328,8 +328,11 @@
 	}
 
 	wc->vendor_err = cqe->vendor_err_synd;
-	if (dump)
+	if (dump) {
+		mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
+			     ib_wc_status_msg(wc->status));
 		dump_cqe(dev, cqe);
+	}
 }
 
 static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
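
The effect of the change can be approximated in user space. The following is a minimal sketch, not the kernel code: demo_wc_status, demo_wc_status_msg() and demo_format_wc_error() are hypothetical stand-ins for enum ib_wc_status, ib_wc_status_msg() and the mlx5_ib_warn() call, and the message table is an illustrative subset whose only entry taken from the log above is status 10. It shows that the status line is produced only when a dump is requested, using the same format string as the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of the kernel's enum ib_wc_status (illustration only). */
enum demo_wc_status {
	DEMO_WC_SUCCESS = 0,
	DEMO_WC_REM_ACCESS_ERR = 10,
};

/* Stand-in for ib_wc_status_msg(): map a status code to a readable string. */
static const char *demo_wc_status_msg(int status)
{
	static const char *const msgs[] = {
		[DEMO_WC_SUCCESS] = "success",
		[DEMO_WC_REM_ACCESS_ERR] = "remote access error",
	};
	size_t n = sizeof(msgs) / sizeof(msgs[0]);

	/* Codes outside the table, or gaps inside it, have no message. */
	if (status < 0 || (size_t)status >= n || !msgs[status])
		return "unknown";
	return msgs[status];
}

/*
 * Stand-in for the patched branch: write the status line into buf only
 * when dump is set. Returns the snprintf() result, or 0 when nothing
 * is written. Assumes len > 0.
 */
static int demo_format_wc_error(char *buf, size_t len, int status, int dump)
{
	if (!dump) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "WC error: %d, Message: %s\n",
			status, demo_wc_status_msg(status));
}
```

Calling demo_format_wc_error(buf, sizeof(buf), 10, 1) fills buf with the status line for a remote access error, while passing dump as 0 leaves buf empty, mirroring the `if (dump) { ... }` block in the diff.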