Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfio: Stall vfio_del_group_dev() for container group detach

When the user unbinds the last device of a group from a vfio bus
driver, the devices within that group should be available for other
purposes. We currently have a race that makes this generally, but
not always, true. The device can be unbound from the vfio bus driver
while the group's IOMMU context is still attached to the container,
which can result in errors as the next driver configures DMA for the
device.

Wait for the group to be detached from the IOMMU backend before
allowing the bus driver remove callback to complete.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

 drivers/vfio/vfio.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -85,6 +85,7 @@
 	struct list_head unbound_list;
 	struct mutex unbound_lock;
 	atomic_t opened;
+	wait_queue_head_t container_q;
 	bool noiommu;
 	struct kvm *kvm;
 	struct blocking_notifier_head notifier;
@@ -339,6 +338,7 @@
 	mutex_init(&group->unbound_lock);
 	atomic_set(&group->container_users, 0);
 	atomic_set(&group->opened, 0);
+	init_waitqueue_head(&group->container_q);
 	group->iommu_group = iommu_group;
 #ifdef CONFIG_VFIO_NOIOMMU
 	group->noiommu = (iommu_group_get_iommudata(iommu_group) == &noiommu);
@@ -996,6 +994,23 @@
 		}
 	} while (ret <= 0);
 
+	/*
+	 * In order to support multiple devices per group, devices can be
+	 * plucked from the group while other devices in the group are still
+	 * in use. The container persists with this group and those remaining
+	 * devices still attached. If the user creates an isolation violation
+	 * by binding this device to another driver while the group is still in
+	 * use, that's their fault. However, in the case of removing the last,
+	 * or potentially the only, device in the group there can be no other
+	 * in-use devices in the group. The user has done their due diligence
+	 * and we should lay no claims to those devices. In order to do that,
+	 * we need to make sure the group is detached from the container.
+	 * Without this stall, we're potentially racing with a user process
+	 * that may attempt to immediately bind this device to another driver.
+	 */
+	if (list_empty(&group->device_list))
+		wait_event(group->container_q, !group->container);
+
 	vfio_group_put(group);
 
 	return device_data;
@@ -1318,6 +1299,7 @@
 			group->iommu_group);
 
 	group->container = NULL;
+	wake_up(&group->container_q);
 	list_del(&group->container_next);
 
 	/* Detaching the last group deprivileges a container, remove iommu */