scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ

When SCSI-MQ is enabled, the SCSI-MQ layers pre-allocate MQ resources
based on shost values set by the driver. Newer versions of the driver
attempt to set nr_hw_queues to the CPU count, so the multipliers become
excessive, with SCSI-MQ pre-allocation for a single shost reaching into
the multiple-GByte range. NPIV, which creates additional shosts, only
multiplies this overhead. On lower-memory systems, this can exhaust system
memory very quickly, resulting in a system crash or failures in the driver
or elsewhere due to low-memory conditions.
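
To make the multiplier effect concrete, here is a rough sketch, assuming the pre-allocation scales as the product of shost count, hardware queue count, per-queue depth, and per-command size. The function name and every input value are hypothetical; none of these figures come from the commit or from measurements.

```c
#include <stdint.h>

/*
 * Rough model only: assumes one pre-allocated command per tag, per
 * hardware queue, per shost. All parameters are illustrative inputs,
 * not values taken from the lpfc driver.
 */
static uint64_t mq_prealloc_bytes(uint64_t nr_shosts, uint64_t nr_hw_queues,
                                  uint64_t queue_depth, uint64_t bytes_per_cmd)
{
    return nr_shosts * nr_hw_queues * queue_depth * bytes_per_cmd;
}
```

Under these assumed inputs, mq_prealloc_bytes(1, 128, 8192, 1024) comes to 1 GiB for one shost, while the same call with 4 hardware queues comes to 32 MiB. Each additional NPIV shost scales the total linearly.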

After testing several scenarios, the situation can be mitigated by limiting
the value set in shost->nr_hw_queues to 4. Although the shost values were
changed, the driver still had per-cpu hardware queues of its own that
allowed parallelization per-cpu. Testing revealed that even with the
small nr_hw_queues value for SCSI-MQ, performance levels remained
near maximum with the within-driver affinitization.

A module parameter, lpfc_fcp_mq_threshold, was created to make the value
set for nr_hw_queues tunable.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Authored by James Smart and committed by Martin K. Petersen
77ffd346 7c7cfdcf

+27 -4
+1
drivers/scsi/lpfc/lpfc.h
···
     uint32_t cfg_cq_poll_threshold;
     uint32_t cfg_cq_max_proc_limit;
     uint32_t cfg_fcp_cpu_map;
+    uint32_t cfg_fcp_mq_threshold;
     uint32_t cfg_hdw_queue;
     uint32_t cfg_irq_chann;
     uint32_t cfg_suppress_rsp;
+15
drivers/scsi/lpfc/lpfc_attr.c
···
           "Embed NVME Command in WQE");

 /*
+ * lpfc_fcp_mq_threshold: Set the maximum number of Hardware Queues
+ * the driver will advertise it supports to the SCSI layer.
+ *
+ *      0    = Set nr_hw_queues by the number of CPUs or HW queues.
+ *      1,128 = Manually specify the maximum nr_hw_queue value to be set,
+ *
+ * Value range is [0,128]. Default value is 8.
+ */
+LPFC_ATTR_R(fcp_mq_threshold, LPFC_FCP_MQ_THRESHOLD_DEF,
+            LPFC_FCP_MQ_THRESHOLD_MIN, LPFC_FCP_MQ_THRESHOLD_MAX,
+            "Set the number of SCSI Queues advertised");
+
+/*
  * lpfc_hdw_queue: Set the number of Hardware Queues the driver
  * will advertise it supports to the NVME and SCSI layers. This also
  * will map to the number of CQ/WQ pairs the driver will create.
···
     &dev_attr_lpfc_cq_poll_threshold,
     &dev_attr_lpfc_cq_max_proc_limit,
     &dev_attr_lpfc_fcp_cpu_map,
+    &dev_attr_lpfc_fcp_mq_threshold,
     &dev_attr_lpfc_hdw_queue,
     &dev_attr_lpfc_irq_chann,
     &dev_attr_lpfc_suppress_rsp,
···
     /* Initialize first burst. Target vs Initiator are different. */
     lpfc_nvme_enable_fb_init(phba, lpfc_nvme_enable_fb);
     lpfc_nvmet_fb_size_init(phba, lpfc_nvmet_fb_size);
+    lpfc_fcp_mq_threshold_init(phba, lpfc_fcp_mq_threshold);
     lpfc_hdw_queue_init(phba, lpfc_hdw_queue);
     lpfc_irq_chann_init(phba, lpfc_irq_chann);
     lpfc_enable_bbcr_init(phba, lpfc_enable_bbcr);
+6 -4
drivers/scsi/lpfc/lpfc_init.c
···
     shost->max_cmd_len = 16;

     if (phba->sli_rev == LPFC_SLI_REV4) {
-        if (phba->cfg_fcp_io_sched == LPFC_FCP_SCHED_BY_HDWQ)
-            shost->nr_hw_queues = phba->cfg_hdw_queue;
-        else
-            shost->nr_hw_queues = phba->sli4_hba.num_present_cpu;
+        if (!phba->cfg_fcp_mq_threshold ||
+            phba->cfg_fcp_mq_threshold > phba->cfg_hdw_queue)
+            phba->cfg_fcp_mq_threshold = phba->cfg_hdw_queue;
+
+        shost->nr_hw_queues = min_t(int, 2 * num_possible_nodes(),
+                                    phba->cfg_fcp_mq_threshold);

         shost->dma_boundary =
             phba->sli4_hba.pc_sli4_params.sge_supp_len-1;
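
The queue-count selection added in lpfc_init.c can be modeled in userspace to see what values it yields. This is a minimal sketch, not driver code: the function name lpfc_model_nr_hw_queues is hypothetical, and the possible_nodes argument stands in for the kernel's num_possible_nodes(). Unlike the driver, the model does not write the clamped threshold back into the configuration.

```c
/*
 * Userspace model of the nr_hw_queues selection in the lpfc_init.c change.
 * mq_threshold models cfg_fcp_mq_threshold, hdw_queue models cfg_hdw_queue,
 * and possible_nodes is a stand-in for num_possible_nodes().
 */
static int lpfc_model_nr_hw_queues(unsigned int mq_threshold,
                                   unsigned int hdw_queue,
                                   int possible_nodes)
{
    int by_nodes;
    int by_threshold;

    /*
     * A threshold of 0 means "no explicit limit"; a threshold above the
     * driver's own hardware queue count is reduced to that count.
     */
    if (mq_threshold == 0 || mq_threshold > hdw_queue)
        mq_threshold = hdw_queue;

    /* Equivalent of min_t(int, 2 * num_possible_nodes(), threshold). */
    by_nodes = 2 * possible_nodes;
    by_threshold = (int)mq_threshold;
    return by_nodes < by_threshold ? by_nodes : by_threshold;
}
```

With the default threshold of 8 and an assumed 64 hardware queues, a two-node system gets min(4, 8) = 4 queues and an eight-node system gets min(16, 8) = 8. The node-count term therefore caps nr_hw_queues even when the threshold is set to 0.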
+5
drivers/scsi/lpfc/lpfc_sli4.h
···
 #define LPFC_HBA_HDWQ_MAX	128
 #define LPFC_HBA_HDWQ_DEF	0

+/* FCP MQ queue count limiting */
+#define LPFC_FCP_MQ_THRESHOLD_MIN	0
+#define LPFC_FCP_MQ_THRESHOLD_MAX	128
+#define LPFC_FCP_MQ_THRESHOLD_DEF	8
+
 /* Common buffer size to accomidate SCSI and NVME IO buffers */
 #define LPFC_COMMON_IO_BUF_SZ	768