Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

PCI: hv: Use effective affinity mask

The effective_affinity_mask is always set when an interrupt is assigned in
__assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic
apic_physflat: -> default_cpu_mask_to_apicid() ->
irq_data_update_effective_affinity(). However, d->common->affinity appears
to remain all 1's until user space or the kernel changes it later.

In the early allocation/initialization phase of an IRQ, we should use the
effective_affinity_mask; otherwise, Hyper-V may not deliver the interrupt to
the expected CPU. Without the patch, if we assign 7 Mellanox ConnectX-3
VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts.

Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Jork Loeser <jloeser@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>

Authored by Dexuan Cui and committed by Bjorn Helgaas
79aa801e 9e66317d

+5 -3
drivers/pci/host/pci-hyperv.c
@@ -879,7 +879,7 @@
 	int cpu;
 	u64 res;
 
-	dest = irq_data_get_affinity_mask(data);
+	dest = irq_data_get_effective_affinity_mask(data);
 	pdev = msi_desc_to_pci_dev(msi_desc);
 	pbus = pdev->bus;
 	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
@@ -1042,6 +1042,7 @@
 	struct hv_pci_dev *hpdev;
 	struct pci_bus *pbus;
 	struct pci_dev *pdev;
+	struct cpumask *dest;
 	struct compose_comp_ctxt comp;
 	struct tran_int_desc *int_desc;
 	struct {
@@ -1056,6 +1057,7 @@
 	int ret;
 
 	pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data));
+	dest = irq_data_get_effective_affinity_mask(data);
 	pbus = pdev->bus;
 	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
 	hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
@@ -1081,14 +1083,14 @@
 	switch (pci_protocol_version) {
 	case PCI_PROTOCOL_VERSION_1_1:
 		size = hv_compose_msi_req_v1(&ctxt.int_pkts.v1,
-					irq_data_get_affinity_mask(data),
+					dest,
 					hpdev->desc.win_slot.slot,
 					cfg->vector);
 		break;
 
 	case PCI_PROTOCOL_VERSION_1_2:
 		size = hv_compose_msi_req_v2(&ctxt.int_pkts.v2,
-					irq_data_get_affinity_mask(data),
+					dest,
 					hpdev->desc.win_slot.slot,
 					cfg->vector);
 		break;
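
Illustration (not part of the patch): the mismatch described in the commit
message can be modeled in user space. This is only a sketch under stated
assumptions. struct model_irq, model_assign_vector() and model_first_cpu()
are hypothetical names invented here, plain 64-bit integers stand in for
struct cpumask, and nothing below calls real kernel or Hyper-V interfaces.
It only shows why a target derived from an all-1's requested mask can differ
from the CPU on which the vector was actually assigned.

```c
#include <stdint.h>

/* Hypothetical userspace model of the two masks; not kernel code. */
struct model_irq {
	uint64_t affinity;           /* requested mask, models d->common->affinity */
	uint64_t effective_affinity; /* CPU the vector was actually assigned on */
};

/*
 * Models vector assignment: exactly one CPU is recorded in the effective
 * mask, while the requested mask is left untouched (assumed to be all 1's).
 */
static void model_assign_vector(struct model_irq *irq, unsigned int cpu)
{
	irq->effective_affinity = UINT64_C(1) << cpu;
}

/* Lowest set bit in the mask, or -1 if the mask is empty. */
static int model_first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

With a 32-vCPU requested mask of 0xffffffff and the vector assigned on CPU 5,
model_first_cpu() yields 0 for the requested mask but 5 for the effective
mask. Under the assumptions of this model, only the effective mask names the
CPU that was programmed for the interrupt.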