Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

USB: cdc-wdm: Make wdm_flush() interruptible and add wdm_fsync().

syzbot is reporting hung task at wdm_flush() [1], for there is a circular
dependency that wdm_flush() from flip_close() for /dev/cdc-wdm0 forever
waits for /dev/raw-gadget to be closed while close() for /dev/raw-gadget
cannot be called unless close() for /dev/cdc-wdm0 completes.

Tetsuo Handa considered that such circular dependency is an usage error [2]
which corresponds to an unresponding broken hardware [3]. But Alan Stern
responded that we should be prepared for such hardware [4]. Therefore,
this patch changes wdm_flush() to use wait_event_interruptible_timeout()
which gives up after 30 seconds, for hardware that remains silent must be
ignored. The 30 seconds are coming out of thin air.

Changing wait_event() to wait_event_interruptible_timeout() makes error
reporting from close() syscall less reliable. To compensate it, this patch
also implements wdm_fsync() which does not use timeout. Those who want to
be very sure that data has gone out to the device are now advised to call
fsync(), with a caveat that fsync() can return -EINVAL when running on
older kernels which do not implement wdm_fsync().

This patch also fixes three more problems (listed below) found during
exhaustive discussion and testing.

Since multiple threads can concurrently call wdm_write()/wdm_flush(),
we need to use wake_up_all() whenever clearing WDM_IN_USE in order to
make sure that all waiters are woken up. Also, error reporting needs
to use fetch-and-clear approach in order not to report same error for
multiple times.

Since wdm_flush() checks WDM_DISCONNECTING, wdm_write() should as well
check WDM_DISCONNECTING.

In wdm_flush(), since locks are not held, it is not safe to dereference
desc->intf after checking that WDM_DISCONNECTING is not set [5]. Thus,
remove dev_err() from wdm_flush().

[1] https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186
[2] https://lkml.kernel.org/r/27b7545e-8f41-10b8-7c02-e35a08eb1611@i-love.sakura.ne.jp
[3] https://lkml.kernel.org/r/79ba410f-e0ef-2465-b94f-6b9a4a82adf5@i-love.sakura.ne.jp
[4] https://lkml.kernel.org/r/20200530011040.GB12419@rowland.harvard.edu
[5] https://lkml.kernel.org/r/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp

Reported-by: syzbot <syzbot+854768b99f19e89d7f81@syzkaller.appspotmail.com>
Cc: stable <stable@vger.kernel.org>
Co-developed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20200928141755.3476-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

authored by

Oliver Neukum and committed by
Greg Kroah-Hartman
37d2a363 fb58cf4f

+54 -16
+54 -16
drivers/usb/class/cdc-wdm.c
··· 58 58 59 59 #define WDM_MAX 16 60 60 61 + /* we cannot wait forever at flush() */ 62 + #define WDM_FLUSH_TIMEOUT (30 * HZ) 63 + 61 64 /* CDC-WMC r1.1 requires wMaxCommand to be "at least 256 decimal (0x100)" */ 62 65 #define WDM_DEFAULT_BUFSIZE 256 63 66 ··· 154 151 kfree(desc->outbuf); 155 152 desc->outbuf = NULL; 156 153 clear_bit(WDM_IN_USE, &desc->flags); 157 - wake_up(&desc->wait); 154 + wake_up_all(&desc->wait); 158 155 } 159 156 160 157 static void wdm_in_callback(struct urb *urb) ··· 396 393 if (test_bit(WDM_RESETTING, &desc->flags)) 397 394 r = -EIO; 398 395 396 + if (test_bit(WDM_DISCONNECTING, &desc->flags)) 397 + r = -ENODEV; 398 + 399 399 if (r < 0) { 400 400 rv = r; 401 401 goto out_free_mem_pm; ··· 430 424 if (rv < 0) { 431 425 desc->outbuf = NULL; 432 426 clear_bit(WDM_IN_USE, &desc->flags); 427 + wake_up_all(&desc->wait); /* for wdm_wait_for_response() */ 433 428 dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv); 434 429 rv = usb_translate_errors(rv); 435 430 goto out_free_mem_pm; ··· 590 583 return rv; 591 584 } 592 585 593 - static int wdm_flush(struct file *file, fl_owner_t id) 586 + static int wdm_wait_for_response(struct file *file, long timeout) 594 587 { 595 588 struct wdm_device *desc = file->private_data; 589 + long rv; /* Use long here because (int) MAX_SCHEDULE_TIMEOUT < 0. */ 596 590 597 - wait_event(desc->wait, 598 - /* 599 - * needs both flags. We cannot do with one 600 - * because resetting it would cause a race 601 - * with write() yet we need to signal 602 - * a disconnect 603 - */ 604 - !test_bit(WDM_IN_USE, &desc->flags) || 605 - test_bit(WDM_DISCONNECTING, &desc->flags)); 591 + /* 592 + * Needs both flags. We cannot do with one because resetting it would 593 + * cause a race with write() yet we need to signal a disconnect. 594 + */ 595 + rv = wait_event_interruptible_timeout(desc->wait, 596 + !test_bit(WDM_IN_USE, &desc->flags) || 597 + test_bit(WDM_DISCONNECTING, &desc->flags), 598 + timeout); 606 599 607 - /* cannot dereference desc->intf if WDM_DISCONNECTING */ 600 + /* 601 + * To report the correct error. This is best effort. 602 + * We are inevitably racing with the hardware. 603 + */ 608 604 if (test_bit(WDM_DISCONNECTING, &desc->flags)) 609 605 return -ENODEV; 610 - if (desc->werr < 0) 611 - dev_err(&desc->intf->dev, "Error in flush path: %d\n", 612 - desc->werr); 606 + if (!rv) 607 + return -EIO; 608 + if (rv < 0) 609 + return -EINTR; 613 610 614 - return usb_translate_errors(desc->werr); 611 + spin_lock_irq(&desc->iuspin); 612 + rv = desc->werr; 613 + desc->werr = 0; 614 + spin_unlock_irq(&desc->iuspin); 615 + 616 + return usb_translate_errors(rv); 617 + 618 + } 619 + 620 + /* 621 + * You need to send a signal when you react to malicious or defective hardware. 622 + * Also, don't abort when fsync() returned -EINVAL, for older kernels which do 623 + * not implement wdm_flush() will return -EINVAL. 624 + */ 625 + static int wdm_fsync(struct file *file, loff_t start, loff_t end, int datasync) 626 + { 627 + return wdm_wait_for_response(file, MAX_SCHEDULE_TIMEOUT); 628 + } 629 + 630 + /* 631 + * Same with wdm_fsync(), except it uses finite timeout in order to react to 632 + * malicious or defective hardware which ceased communication after close() was 633 + * implicitly called due to process termination. 634 + */ 635 + static int wdm_flush(struct file *file, fl_owner_t id) 636 + { 637 + return wdm_wait_for_response(file, WDM_FLUSH_TIMEOUT); 615 638 } 616 639 617 640 static __poll_t wdm_poll(struct file *file, struct poll_table_struct *wait) ··· 766 729 .owner = THIS_MODULE, 767 730 .read = wdm_read, 768 731 .write = wdm_write, 732 + .fsync = wdm_fsync, 769 733 .open = wdm_open, 770 734 .flush = wdm_flush, 771 735 .release = wdm_release,