IB/mlx5: Fix initializing CQ fragments buffer

The function init_cq_frag_buf() can be called to initialize the current CQ
fragments buffer cq->buf, or the temporary cq->resize_buf that is filled
during a CQ resize operation.

However, the offending commit started to use get_cqe() to fetch the CQEs.
The problem is that get_cqe() always returns CQEs from cq->buf, so when
init_cq_frag_buf() is called for cq->resize_buf we initialize the wrong
buffer. When the CQ is being enlarged, we also access elements beyond the
end of the current cq->buf and eventually hit a kernel panic.

[exception RIP: init_cq_frag_buf+103]
[ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
[ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
[ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
[ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
[ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
[ffff9f799ddcbec8] kthread at ffffffffa66c5da1
[ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd

Fix it by fetching each CQE with mlx5_frag_buf_get_wqe(), which takes the
correct source buffer as a parameter.
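
The bug class and the shape of the fix can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the
model_* names, the single contiguous allocation (instead of fragments), the
marker byte value, and writing the marker to the last byte of each entry
are all hypothetical simplifications, not the driver's actual layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Arbitrary marker byte for this model only. */
#define MODEL_INVALID 0xF0

/* Hypothetical, simplified stand-ins for the driver's CQ structures. */
struct model_cq_buf {
	unsigned char *mem;	/* one contiguous allocation, not fragments */
	int nent;		/* number of entries in this buffer */
	int cqe_size;		/* size of one entry in bytes */
};

struct model_cq {
	struct model_cq_buf buf;		/* current buffer */
	struct model_cq_buf *resize_buf;	/* temporary buffer during resize */
};

/* Fetch entry n from the buffer that was passed in (the shape of the fix). */
static void *model_buf_get_entry(struct model_cq_buf *buf, int n)
{
	return buf->mem + (size_t)n * buf->cqe_size;
}

/* Mark every entry of the given buffer as invalid. */
static void model_init_buf(struct model_cq_buf *buf)
{
	int i;

	for (i = 0; i < buf->nent; i++) {
		unsigned char *entry = model_buf_get_entry(buf, i);

		entry[buf->cqe_size - 1] = MODEL_INVALID;
	}
}

/*
 * Model an enlarging resize. Returns 1 if every entry of the new buffer
 * was marked and the current buffer was left untouched, 0 otherwise.
 */
static int model_resize_ok(int old_nent, int new_nent)
{
	struct model_cq cq;
	struct model_cq_buf bigger;
	int i, ok = 1;

	cq.buf.nent = old_nent;
	cq.buf.cqe_size = 64;
	cq.buf.mem = calloc((size_t)old_nent, 64);
	bigger.nent = new_nent;
	bigger.cqe_size = 64;
	bigger.mem = calloc((size_t)new_nent, 64);
	if (!cq.buf.mem || !bigger.mem) {
		free(cq.buf.mem);
		free(bigger.mem);
		return 0;
	}
	cq.resize_buf = &bigger;

	/* Only the buffer being initialized is passed in. */
	model_init_buf(cq.resize_buf);

	for (i = 0; i < new_nent; i++)
		if (bigger.mem[(size_t)i * 64 + 63] != MODEL_INVALID)
			ok = 0;
	for (i = 0; i < old_nent; i++)
		if (cq.buf.mem[(size_t)i * 64 + 63] != 0)
			ok = 0;

	free(cq.buf.mem);
	free(bigger.mem);
	return ok;
}
```

Calling model_resize_ok(4, 16) models enlarging a 4-entry CQ to 16 entries.
Had model_init_buf() fetched entries through a helper that always indexed
cq.buf, the same 16-iteration loop would have run past the 4-entry
allocation, which is the out-of-bounds access described above.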

Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Link: https://lore.kernel.org/r/90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

authored by Alaa Hleihel and committed by Jason Gunthorpe 2ba0aa2f 6466f03f

+4 -5
drivers/infiniband/hw/mlx5/cq.c
···
 	ib_umem_release(cq->buf.umem);
 }

-static void init_cq_frag_buf(struct mlx5_ib_cq *cq,
-			     struct mlx5_ib_cq_buf *buf)
+static void init_cq_frag_buf(struct mlx5_ib_cq_buf *buf)
 {
 	int i;
 	void *cqe;
 	struct mlx5_cqe64 *cqe64;

 	for (i = 0; i < buf->nent; i++) {
-		cqe = get_cqe(cq, i);
+		cqe = mlx5_frag_buf_get_wqe(&buf->fbc, i);
 		cqe64 = buf->cqe_size == 64 ? cqe : cqe + 64;
 		cqe64->op_own = MLX5_CQE_INVALID << 4;
 	}
···
 	if (err)
 		goto err_db;

-	init_cq_frag_buf(cq, &cq->buf);
+	init_cq_frag_buf(&cq->buf);

 	*inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
 		 MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) *
···
 	if (err)
 		goto ex;

-	init_cq_frag_buf(cq, cq->resize_buf);
+	init_cq_frag_buf(cq->resize_buf);

 	return 0;