commits

Fixes CVE-2016-9191, proc_sys_readdir doesn't drop reference
added by grab_header when return from !dir_emit_dots path.
It can cause any path called unregister_sysctl_table will
wait forever.

The calltrace of CVE-2016-9191:

[ 5535.960522] Call Trace:
[ 5535.963265] [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 5535.968817] [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
[ 5535.975346] [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
[ 5535.982256] [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
[ 5535.988972] [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
[ 5535.994804] [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
[ 5536.001227] [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
[ 5536.007648] [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
[ 5536.014654] [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
[ 5536.021657] [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40
[ 5536.029344] [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
[ 5536.036447] [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
[ 5536.043844] [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
[ 5536.051336] [<ffffffff8116789d>] update_flag+0x11d/0x210
[ 5536.057373] [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.064186] [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
[ 5536.070899] [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
[ 5536.077420] [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.084234] [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
[ 5536.091049] [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
[ 5536.097571] [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
[ 5536.104207] [<ffffffff810bc83f>] process_one_work+0x1df/0x710
[ 5536.110736] [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
[ 5536.117461] [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
[ 5536.123697] [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
[ 5536.130426] [<ffffffff810c3f7e>] kthread+0xfe/0x120
[ 5536.135991] [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
[ 5536.142041] [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230

One cgroup maintainer mentioned that "cgroup is trying to offline
a cpuset css, which takes place under cgroup_mutex. The offlining
ends up trying to drain active usages of a sysctl table which apprently
is not happening."
The real reason is that proc_sys_readdir doesn't drop reference added
by grab_header when return from !dir_emit_dots path. So this cpuset
offline path will wait here forever.

See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13

Fixes: f0c3b5093add ("[readdir] convert procfs")
Cc: stable@vger.kernel.org
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: Yang Shukui <yangshukui@huawei.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

9y ago

Linus Torvalds

2d5a7101

Merge tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

9y ago

Augusto Mecking Caringi

c8a6a09c

vme: Fix wrong pointer utilization in ca91cx42_slave_get

9y ago

Andrei Vagin

add7c65c

pid: fix lockdep deadlock warning due to ucount_lock

9y ago

Linus Torvalds

7f138b97

Merge tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

9y ago

Greg Kroah-Hartman

c7334ce8

Revert "driver core: Add deferred_probe attribute to devices in sysfs"

9y ago

Randy Dunlap

546cf3ef

auxdisplay: fix new ht16k33 build errors

9y ago

Eric W. Biederman

75422726

libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount

9y ago

Linus Torvalds

793e039e

Merge tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

9y ago

Akinobu Mita

802c0388

sysrq: attach sysrq handler correctly for 32-bit kernel

9y ago

Linus Torvalds

a121103c

Linux 4.10-rc3 v4.10-rc3

9y ago

Colin Ian King

0fa2c8eb

ppdev: don't print a free'd string

9y ago

Eric W. Biederman

3895dbf8

mnt: Protect the mountpoint hashtable with mount_lock

9y ago

Linus Torvalds

aa2797b3

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

9y ago

Greg Kroah-Hartman

97f9c5f2

Merge tag 'usb-serial-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

9y ago

Herbert Xu

6741f551

Revert "tty: serial: 8250: add CON_CONSDEV to flags"

9y ago

Linus Torvalds

83280e90

Merge tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

9y ago

Pan Bian

5b11ebed

extcon: return error code on failure

9y ago

Linus Torvalds

0c744ea4

Linux 4.10-rc2 v4.10-rc2

9y ago

Linus Torvalds

83346fbc

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9y ago

Ricardo Ribalda

701dc207

i2c: piix4: Avoid race conditions with IMC

9y ago

Mathias Nyman

d6169d04

xhci: fix deadlock at host remove by running watchdog correctly

9y ago

Johan Hovold

2d5a9c72

USB: serial: ch341: fix control-message error handling

9y ago

Daniel Jedrychowski

2bed8a8e

Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break

9y ago

Linus Torvalds

cc250e26

Merge tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

9y ago

Alan Stern

0a8fd134

USB: fix problems with duplicate endpoint addresses

When checking a new device's descriptors, the USB core does not check
for duplicate endpoint addresses. This can cause a problem when the
sysfs files for those endpoints are created; trying to create multiple
files with the same name will provoke a WARNING:

WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
sysfs: cannot create duplicate filename
'/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
Kernel panic - not syncing: panic_on_warn set ...

CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
ffff88006bee64c8 ffffffff81f96b8a ffffffff00000001 1ffff1000d7dcc2c
ffffed000d7dcc24 0000000000000001 0000000041b58ab3 ffffffff8598b510
ffffffff81f968f8 ffffffff850fee20 ffffffff85cff020 dffffc0000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
[<ffffffff8168c88e>] panic+0x1cb/0x3a9 kernel/panic.c:179
[<ffffffff812b80b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542
[<ffffffff812b8195>] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565
[<ffffffff819e70ca>] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30
[<ffffffff819e7308>] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59
[< inline >] create_dir lib/kobject.c:71
[<ffffffff81fa1b07>] kobject_add_internal+0x227/0xa60 lib/kobject.c:229
[< inline >] kobject_add_varg lib/kobject.c:366
[<ffffffff81fa2479>] kobject_add+0x139/0x220 lib/kobject.c:411
[<ffffffff82737a63>] device_add+0x353/0x1660 drivers/base/core.c:1088
[<ffffffff82738d8d>] device_register+0x1d/0x20 drivers/base/core.c:1206
[<ffffffff82cb77d3>] usb_create_ep_devs+0x163/0x260 drivers/usb/core/endpoint.c:195
[<ffffffff82c9f27b>] create_intf_ep_devs+0x13b/0x200 drivers/usb/core/message.c:1030
[<ffffffff82ca39d3>] usb_set_configuration+0x1083/0x18d0 drivers/usb/core/message.c:1937
[<ffffffff82cc9e2e>] generic_probe+0x6e/0xe0 drivers/usb/core/generic.c:172
[<ffffffff82caa7fa>] usb_probe_device+0xaa/0xe0 drivers/usb/core/driver.c:263

This patch prevents the problem by checking for duplicate endpoint
addresses during enumeration and skipping any duplicates.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9y ago

Robin Murphy

488debb9

drivers: char: mem: Fix thinkos in kmem address checks

9y ago

Linus Torvalds

4759d386

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

9y ago

Linus Torvalds

a11ce3a4

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9y ago

Tobias Klauser

45382862

x86/mpx: Use compatible types in comparison to fix sparse error

9y ago

Colin Ian King

2659161d

i2c: fix spelling mistake: "insufficent" -> "insufficient"

9y ago

Bin Liu

7b6c1b4c

usb: musb: fix runtime PM in debugfs

9y ago

Johan Hovold

146cc8a1

USB: serial: kl5kusb105: fix line-state error handling

9y ago

Gabriel Krisman Bertazi

c130b666

8250_pci: Fix potential use-after-free in error path

9y ago

Linus Torvalds

6ea17ed1

Merge tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

9y ago

Stephen Hemminger

421463b8

hyper-v: Add myself as additional MAINTAINER

9y ago

Peter Rosin

8f12dc24

usb: ohci-at91: use descriptor-based gpio APIs correctly

9y ago

Alexander Usyskin

7ee7f45a

mei: bus: enable OS version only for SPT and newer

9y ago

Linus Torvalds

238d1d0f

Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux

9y ago

Jan Kara

1db17542

ext4: Simplify DAX fault path

9y ago

Linus Torvalds

79078c53

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9y ago

Frederic Weisbecker

24b91e36

nohz: Fix collision between tick and other hrtimers

When the tick is stopped and an interrupt occurs afterward, we check on
that interrupt exit if the next tick needs to be rescheduled. If it
doesn't need any update, we don't want to do anything.

In order to check if the tick needs an update, we compare it against the
clockevent device deadline. Now that's a problem because the clockevent
device is at a lower level than the tick itself if it is implemented
on top of hrtimer.

Every hrtimer share this clockevent device. So comparing the next tick
deadline against the clockevent device deadline is wrong because the
device may be programmed for another hrtimer whose deadline collides
with the tick. As a result we may end up not reprogramming the tick
accidentally.

In a worst case scenario under full dynticks mode, the tick stops firing
as it is supposed to every 1hz, leaving /proc/stat stalled:

Task in a full dynticks CPU
----------------------------

* hrtimer A is queued 2 seconds ahead
* the tick is stopped, scheduled 1 second ahead
* tick fires 1 second later
* on tick exit, nohz schedules the tick 1 second ahead but sees
the clockevent device is already programmed to that deadline,
fooled by hrtimer A, the tick isn't rescheduled.
* hrtimer A is cancelled before its deadline
* tick never fires again until an interrupt happens...

In order to fix this, store the next tick deadline to the tick_sched
local structure and reuse that value later to check whether we need to
reprogram the clock after an interrupt.

On the other hand, ts->sleep_length still wants to know about the next
clock event and not just the tick, so we want to improve the related
comment to avoid confusion.

Reported-by: James Hartsock <hartsjc@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/1483539124-5693-1-git-send-email-fweisbec@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

9y ago

Len Brown

695085b4

x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()

9y ago

John Garry

6f724fb3

i2c: print correct device invalid address

9y ago

Andy Lutomirski

620f1a63

wusbcore: Fix one more crypto-on-the-stack bug

9y ago

Johan Hovold

55fa15b5

USB: serial: ch341: fix baud rate and line-control handling

9y ago

Richard Genoud

b389f173

tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done

9y ago

Johannes Weiner

ea07b862

mm: workingset: fix use-after-free in shadow node shrinker

Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.

Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
Call Trace:
delete_node+0x1e4/0x200
__radix_tree_delete_node+0xd/0x10
shadow_lru_isolate+0xe6/0x220
__list_lru_walk_one.isra.4+0x9b/0x190
list_lru_walk_one+0x23/0x30
scan_shadow_nodes+0x2e/0x40
shrink_slab.part.44+0x23d/0x5d0
shrink_node+0x22c/0x330
kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
inlined radix_tree_shrink().

The problem is with 14b468791fa9 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.

While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk. If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.

Consider this tree, where the s are shadow entries:

root->rnode
|
[0 n]
| |
[s ] [sssss]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

root->rnode
|
[0 ]
|
[s ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

root->rnode
|
[s ]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.

root->rnode
|
s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.

Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.

Fixes: 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <david@fromorbit.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Leech <cleech@redhat.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>