commits

While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue
copy feature" => kernel 3.8), I discovered a couple of bugs in the
implementation. The two bugs concern MSG_COPY interactions with other
msgrcv() flags, namely:

(A) MSG_COPY + MSG_EXCEPT
(B) MSG_COPY + !IPC_NOWAIT

The bugs are distinct (and the fix for the first one is obvious),
however my fix for both is a single-line patch, which is why I'm
combining them in a single mail, rather than writing two mails+patches.

===== (A) MSG_COPY + MSG_EXCEPT =====

With the addition of the MSG_COPY flag, there are now two msgrcv()
flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
argument in unrelated ways. Specifying both in the same call is a
logical error that is currently permitted, with the effect that MSG_COPY
has priority and MSG_EXCEPT is ignored. The call should give an error
if both flags are specified. The patch below implements that behavior.

===== (B) (B) MSG_COPY + !IPC_NOWAIT =====

The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC
message queue copy feature test") shows MSG_COPY being used in
conjunction with IPC_NOWAIT. In other words, if there is no message at
the position 'msgtyp'. return immediately with the error in ENOMSG.

What was not (fully) tested is the behavior if MSG_COPY is specified
*without* IPC_NOWAIT, and there is an odd behavior. If the queue
contains less than 'msgtyp' messages, then the call blocks until the
next message is written to the queue. At that point, the msgrcv() call
returns a copy of the newly added message, regardless of whether that
message is at the ordinal position 'msgtyp'. This is clearly bogus, and
problematic for applications that might want to make use of the MSG_COPY
flag.

I considered the following possible solutions to this problem:

(1) Force the call to block until a message *does* appear at the
position 'msgtyp'.

(2) If the MSG_COPY flag is specified, the kernel should implicitly add
IPC_NOWAIT, so that the call fails with ENOMSG for this case.

(3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
an error (probably, EINVAL is the right one).

I do not know if any application would really want to have the
functionality of solution (1), especially since an application can
determine in advance the number of messages in the queue using msgctl()
IPC_STAT. Obviously, this solution would be the most work to implement.

Solution (2) would have the effect of silently fixing any applications
that tried to employ broken behavior. However, it would mean that if we
later decided to implement solution (1), then user-space could not
easily detect what the kernel supports (but, since I'm somewhat doubtful
that solution (1) is needed, I'm not sure that this is much of a
problem).

Solution (3) would have the effect of informing broken applications that
they are doing something broken. The downside is that this would cause
a ABI breakage for any applications that are currently employing the
broken behavior. However:

a) Those applications are almost certainly not getting the results they
expect.
b) Possibly, those applications don't even exist, because MSG_COPY is
currently hidden behind CONFIG_CHECKPOINT_RESTORE.

The upside of solution (3) is that if we later decided to implement
solution (1), user-space could determine what the kernel supports, via
the error return.

In my view, solution (3) is mildly preferable to solution (2), and
solution (1) could still be done later if anyone really cares. The
patch below implements solution (3).

PS. For anyone out there still listening, it's the usual story:
documenting an API (and the thinking about, and the testing of the API,
that documentation entails) is the one of the single best ways of
finding bugs in the API, as I've learned from a lot of experience. Best
to do that documentation before releasing the API.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11y ago

Dave Jones

b7b4839d

perf/x86: Fix leak in uncore_type_init failure paths

12y ago

Peter Zijlstra

177c53d9

stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()

12y ago

Linus Torvalds

3b4df68d

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

12y ago

Ingo Molnar

b8ad0f91

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

12y ago

Juri Lelli

d44753b8

sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy

12y ago

Linus Torvalds

a4ecdf82

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

12y ago

Ales Novak

b12bb60d

[SCSI] storvsc: NULL pointer dereference fix

12y ago

Linus Torvalds

8712a005

Merge branch 'akpm' (patches from Andrew Morton)

12y ago

Don Zickus

fdf57dd0

perf machine: Use map as success in ip__resolve_ams

12y ago

Linus Torvalds

cee152ff

Merge tag 'pm+acpi-3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"Three of these are regression fixes, for two recent regressions and
one introduced during the 3.13 cycle, and the fourth one is a working
version of the fix that had to be reverted last time.

Specifics:

- A recent ACPI resources handling fix overlooked the fact that it
had to update the ACPI PNP subsystem's resources parsing too and
caused confusing warning messages to be printed during system
intialization on some systems (with arguably buggy ACPI tables).
Fix from Zhang Rui.

- Moving the early ACPI initialization before timekeeping_init()
earlier in this cycle broke fast TSC calibration on at least one
system, so it needs to be done later, but still before
efi_enter_virtual_mode() to allow the EFI initialization to refer
to ACPI.

- A change related to code duplication reduction in the cpufreq core
inadvertently caused cpufreq intialization to fail for some CPUs
handled by intel_pstate by adding checks that may fail for that
driver, but aren't even necessary when it is used. The issue is
addressed by preventing those checks from run in the configurations
in which they aren't needed.

- If the Hardware Reduced ACPI flag is set in the ACPI tables, system
suspend, hibernation and ACPI power off will only work when special
sleep control and sleep status registeres are provided (their
addresses in the ACPI tables are not zero). If those registers are
not available, the features in question have no chances to work, so
they shouldn't even be regarded as supported. That helps with
power off in particular, because alternative power off methods may
be used then and they may actually work"

* tag 'pm+acpi-3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states
ACPI / init: Invoke early ACPI initialization later
cpufreq: Skip current frequency initialization for ->setpolicy drivers
PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures

12y ago

Daniel J Blueman

847d7970

x86/amd/numa: Fix northbridge quirk to assign correct NUMA node

12y ago

Giridhar Malavali

b77ed25c

[SCSI] qla2xxx: Poll during initialization for ISP25xx and ISP83xx

12y ago

Linus Torvalds

e6a4b6f5

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

12y ago

Geert Uytterhoeven

0eb808eb

cris: convert ffs from an object-like macro to a function-like macro

12y ago

Jiri Olsa

155b3a13

perf symbols: Fix crash in elf_section_by_name

12y ago

Linus Torvalds

0c01b452

Merge tag 'dm-3.14-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

12y ago

Rafael J. Wysocki

d5af40d6

Merge branches 'pnp', 'acpi-init', 'acpi-sleep' and 'pm-cpufreq'

12y ago

Suresh Siddha

731bd6a9

x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU

12y ago

Lukasz Dorau

c59053a2

[SCSI] isci: correct erroneous for_each_isci_host macro

In the first place, the loop 'for' in the macro 'for_each_isci_host'
(drivers/scsi/isci/host.h:314) is incorrect, because it accesses
the 3rd element of 2 element array. After the 2nd iteration it executes
the instruction:
ihost = to_pci_info(pdev)->hosts[2]
(while the size of the 'hosts' array equals 2) and reads an
out of range element.

In the second place, this loop is incorrectly optimized by GCC v4.8
(see http://marc.info/?l=linux-kernel&m=138998871911336&w=2).
As a result, on platforms with two SCU controllers,
the loop is executed more times than it can be (for i=0,1 and 2).
It causes kernel panic during entering the S3 state
and the following oops after 'rmmod isci':

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8131360b>] __list_add+0x1b/0xc0
Oops: 0000 [#1] SMP
RIP: 0010:[<ffffffff8131360b>] [<ffffffff8131360b>] __list_add+0x1b/0xc0
Call Trace:
[<ffffffff81661b84>] __mutex_lock_slowpath+0x114/0x1b0
[<ffffffff81661c3f>] mutex_lock+0x1f/0x30
[<ffffffffa03e97cb>] sas_disable_events+0x1b/0x50 [libsas]
[<ffffffffa03e9818>] sas_unregister_ha+0x18/0x60 [libsas]
[<ffffffffa040316e>] isci_unregister+0x1e/0x40 [isci]
[<ffffffffa0403efd>] isci_pci_remove+0x5d/0x100 [isci]
[<ffffffff813391cb>] pci_device_remove+0x3b/0xb0
[<ffffffff813fbf7f>] __device_release_driver+0x7f/0xf0
[<ffffffff813fc8f8>] driver_detach+0xa8/0xb0
[<ffffffff813fbb8b>] bus_remove_driver+0x9b/0x120
[<ffffffff813fcf2c>] driver_unregister+0x2c/0x50
[<ffffffff813381f3>] pci_unregister_driver+0x23/0x80
[<ffffffffa04152f8>] isci_exit+0x10/0x1e [isci]
[<ffffffff810d199b>] SyS_delete_module+0x16b/0x2d0
[<ffffffff81012a21>] ? do_notify_resume+0x61/0xa0
[<ffffffff8166ce29>] system_call_fastpath+0x16/0x1b

The loop has been corrected.
This patch fixes kernel panic during entering the S3 state
and the above oops.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Lukasz Dorau <lukasz.dorau@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

12y ago

Linus Torvalds

2b64c543

Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

12y ago

Al Viro

bd2a31d5

get rid of fget_light()

12y ago

Sergei Antonov

d7d673a5

hfsplus: add HFSX subfolder count support

12y ago

Ben Hutchings

02c5bb4a

perf trace: Decode architecture-specific signal numbers

12y ago

Linus Torvalds

c60f7d5a

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

12y ago

Heinz Mauelshagen

e893fba9

dm cache: fix access beyond end of origin device

12y ago

Zhang Rui

89935315

PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures

12y ago

Rafael J. Wysocki

c4e1acbb

ACPI / init: Invoke early ACPI initialization later

12y ago

Rafael J. Wysocki

a4e90bed

ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states

12y ago

Rafael J. Wysocki

2ed99e39

cpufreq: Skip current frequency initialization for ->setpolicy drivers

12y ago

Linus Torvalds

fa389e22

Linux 3.14-rc6 v3.14-rc6

12y ago

Dan Williams

ddfadd77

[SCSI] isci: fix reset timeout handling

12y ago

Tejun Heo

83493d7e

libata: use wider match for blacklisting Crucial M500

12y ago

Al Viro

00e188ef

sockfd_lookup_light(): switch to fdget^W^Waway from fget_light

12y ago

Colin Ian King

53942232

tools/testing/selftests/ipc/msgque.c: handle msgget failure return correctly

12y ago

Ingo Molnar

af76815a

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

12y ago

Linus Torvalds

c14c06b7

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

12y ago

Dave Airlie

f042cc4a

Merge tag 'ttm-fixes-3.14-2014-03-12' of git://people.freedesktop.org/~thomash/linux into drm-fixes

12y ago

Heinz Mauelshagen

8b9d9666

dm cache: fix truncation bug when copying a block to/from >2TB fast device

12y ago

Linus Torvalds

79e61542

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

12y ago

Mike Christie

126e964a

[SCSI] be2iscsi: fix bad if expression

12y ago

Michele Baldessari

b28a613e

libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8 (2BA30001)

12y ago

Linus Torvalds

9c225f26

vfs: atomic f_pos accesses as per POSIX

12y ago

Michael Opdenacker

1443176f

MAINTAINERS: blackfin: add git repository

12y ago

Ingo Molnar

b6e53f32

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

12y ago

Jiri Olsa

b39c2a57

perf tools: Fix strict alias issue for find_first_bit

12y ago

Linus Torvalds

53611c0c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"I know this is a bit more than you want to see, and I've told the
wireless folks under no uncertain terms that they must severely scale
back the extent of the fixes they are submitting this late in the
game.

Anyways:

1) vmxnet3's netpoll doesn't perform the equivalent of an ISR, which
is the correct implementation, like it should. Instead it does
something like a NAPI poll operation. This leads to crashes.

From Neil Horman and Arnd Bergmann.

2) Segmentation of SKBs requires proper socket orphaning of the
fragments, otherwise we might access stale state released by the
release callbacks.

This is a 5 patch fix, but the initial patches are giving
variables and such significantly clearer names such that the
actual fix itself at the end looks trivial.

From Michael S. Tsirkin.

3) TCP control block release can deadlock if invoked from a timer on
an already "owned" socket. Fix from Eric Dumazet.

4) In the bridge multicast code, we must validate that the
destination address of general queries is the link local all-nodes
multicast address. From Linus Lüssing.

5) The x86 BPF JIT support for negative offsets puts the parameter
for the helper function call in the wrong register. Fix from
Alexei Starovoitov.

6) The descriptor type used for RTL_GIGA_MAC_VER_17 chips in the
r8169 driver is incorrect. Fix from Hayes Wang.

7) The xen-netback driver tests skb_shinfo(skb)->gso_type bits to see
if a packet is a GSO frame, but that's not the correct test. It
should use skb_is_gso(skb) instead. Fix from Wei Liu.

8) Negative msg->msg_namelen values should generate an error, from
Matthew Leach.

9) at86rf230 can deadlock because it takes the same lock from it's
ISR and it's hard_start_xmit method, without disabling interrupts
in the latter. Fix from Alexander Aring.

10) The FEC driver's restart doesn't perform operations in the correct
order, so promiscuous settings can get lost. Fix from Stefan
Wahren.

11) Fix SKB leak in SCTP cookie handling, from Daniel Borkmann.

12) Reference count and memory leak fixes in TIPC from Ying Xue and
Erik Hugne.

13) Forced eviction in inet_frag_evictor() must strictly make sure all
frags are deleted, otherwise module unload (f.e. 6lowpan) can
crash. Fix from Florian Westphal.

14) Remove assumptions in AF_UNIX's use of csum_partial() (which it
uses as a hash function), which breaks on PowerPC. From Anton
Blanchard.

The main gist of the issue is that csum_partial() is defined only
as a value that, once folded (f.e. via csum_fold()) produces a
correct 16-bit checksum. It is legitimate, therefore, for
csum_partial() to produce two different 32-bit values over the
same data if their respective alignments are different.

15) Fix endiannes bug in MAC address handling of ibmveth driver, also
from Anton Blanchard.

16) Error checks for ipv6 exthdrs offload registration are reversed,
from Anton Nayshtut.

17) Externally triggered ipv6 addrconf routes should count against the
garbage collection threshold. Fix from Sabrina Dubroca.

18) The PCI shutdown handler added to the bnx2 driver can wedge the
chip if it was not brought up earlier already, which in particular
causes the firmware to shut down the PHY. Fix from Michael Chan.

19) Adjust the sanity WARN_ON_ONCE() in qdisc_list_add() because as
currently coded it can and does trigger in legitimate situations.
From Eric Dumazet.

20) BNA driver fails to build on ARM because of a too large udelay()
call, fix from Ben Hutchings.

21) Fair-Queue qdisc holds locks during GFP_KERNEL allocations, fix
from Eric Dumazet.

22) The vlan passthrough ops added in the previous release causes a
regression in source MAC address setting of outgoing headers in
some circumstances. Fix from Peter Boström"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
ipv6: Avoid unnecessary temporary addresses being generated
eth: fec: Fix lost promiscuous mode after reconnecting cable
bonding: set correct vlan id for alb xmit path
at86rf230: fix lockdep splats
net/mlx4_en: Deregister multicast vxlan steering rules when going down
vmxnet3: fix building without CONFIG_PCI_MSI
MAINTAINERS: add networking selftests to NETWORKING
net: socket: error on a negative msg_namelen
MAINTAINERS: Add tools/net to NETWORKING [GENERAL]
packet: doc: Spelling s/than/that/
net/mlx4_core: Load the IB driver when the device supports IBoE
net/mlx4_en: Handle vxlan steering rules for mac address changes
net/mlx4_core: Fix wrong dump of the vxlan offloads device capability
xen-netback: use skb_is_gso in xenvif_start_xmit
r8169: fix the incorrect tx descriptor version
tools/net/Makefile: Define PACKAGE to fix build problems
x86: bpf_jit: support negative offsets
bridge: multicast: enable snooping on general queries only
bridge: multicast: add sanity check for general query destination
tcp: tcp_release_cb() should release socket ownership
...

12y ago

Richard Weinberger

62c19c9d

i2c: Remove usage of orphaned symbol OF_I2C

12y ago

Dave Airlie

bf21d605

Merge branch 'drm-fixes-3.14' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

12y ago

Rob Clark

9ef7506f

drm/ttm: don't oops if no invalidate_caches()

A few of the simpler TTM drivers (cirrus, ast, mgag200) do not implement
this function. Yet can end up somehow with an evicted bo:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
PGD 16e761067 PUD 16e6cf067 PMD 0
Oops: 0010 [#1] SMP
Modules linked in: bnep bluetooth rfkill fuse ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg btrfs zlib_deflate raid6_pq xor dm_queue_length iTCO_wdt iTCO_vendor_support coretemp kvm dcdbas dm_service_time microcode serio_raw pcspkr lpc_ich mfd_core i7core_edac edac_core ses enclosure ipmi_si ipmi_msghandler shpchp acpi_power_meter mperf nfsd auth_rpcgss nfs_acl lockd uinput sunrpc dm_multipath xfs libcrc32c ata_generic pata_acpi sr_mod cdrom
sd_mod usb_storage mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit lpfc drm_kms_helper ttm crc32c_intel ata_piix bfa drm ixgbe libata i2c_core mdio crc_t10dif ptp crct10dif_common pps_core scsi_transport_fc dca scsi_tgt megaraid_sas bnx2 dm_mirror dm_region_hash dm_log dm_mod
CPU: 16 PID: 2572 Comm: X Not tainted 3.10.0-86.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R810/0H235N, BIOS 0.3.0 11/14/2009
task: ffff8801799dabc0 ti: ffff88016c884000 task.ti: ffff88016c884000
RIP: 0010:[<0000000000000000>] [< (null)>] (null)
RSP: 0018:ffff88016c885ad8 EFLAGS: 00010202
RAX: ffffffffa04e94c0 RBX: ffff880178937a20 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000240004 RDI: ffff880178937a00
RBP: ffff88016c885b60 R08: 00000000000171a0 R09: ffff88007cf171a0
R10: ffffea0005842540 R11: ffffffff810487b9 R12: ffff880178937b30
R13: ffff880178937a00 R14: ffff88016c885b78 R15: ffff880179929400
FS: 00007f81ba2ef980(0000) GS:ffff88007cf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000016e763000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffffa0306fae ffff8801799295c0 0000000000260004 0000000000000001
ffff88016c885b60 ffffffffa0307669 00ff88007cf17738 ffff88017cf17700
ffff880178937a00 ffff880100000000 ffff880100000000 0000000079929400
Call Trace:
[<ffffffffa0306fae>] ? ttm_bo_handle_move_mem+0x54e/0x5b0 [ttm]
[<ffffffffa0307669>] ? ttm_bo_mem_space+0x169/0x340 [ttm]
[<ffffffffa0307bd7>] ttm_bo_move_buffer+0x117/0x130 [ttm]
[<ffffffff81130001>] ? perf_event_init_context+0x141/0x220
[<ffffffffa0307cb1>] ttm_bo_validate+0xc1/0x130 [ttm]
[<ffffffffa04e7377>] mgag200_bo_pin+0x87/0xc0 [mgag200]
[<ffffffffa04e56c4>] mga_crtc_cursor_set+0x474/0xbb0 [mgag200]
[<ffffffff811971d2>] ? __mem_cgroup_commit_charge+0x152/0x3b0
[<ffffffff815c4182>] ? mutex_lock+0x12/0x2f
[<ffffffffa0201433>] drm_mode_cursor_common+0x123/0x170 [drm]
[<ffffffffa0205231>] drm_mode_cursor_ioctl+0x41/0x50 [drm]
[<ffffffffa01f5ca2>] drm_ioctl+0x502/0x630 [drm]
[<ffffffff815cbab4>] ? __do_page_fault+0x1f4/0x510
[<ffffffff8101cb68>] ? __restore_xstate_sig+0x218/0x4f0
[<ffffffff811b4445>] do_vfs_ioctl+0x2e5/0x4d0
[<ffffffff8124488e>] ? file_has_perm+0x8e/0xa0
[<ffffffff811b46b1>] SyS_ioctl+0x81/0xa0
[<ffffffff815d05d9>] system_call_fastpath+0x16/0x1b
Code: Bad RIP value.
RIP [< (null)>] (null)
RSP <ffff88016c885ad8>
CR2: 0000000000000000

Signed-off-by: Rob Clark <rclark@redhat.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: stable@vger.kernel.org

12y ago

Joe Thornber

cebc2de4

dm space map metadata: fix refcount decrement below 0 which caused corruption

This has been a relatively long-standing issue that wasn't nailed down
until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014,
see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html

From that report:
"When decreasing the reference count of a metadata block with its
reference count equals 3, we will call dm_btree_remove() to remove
this enrty from the B+tree which keeps the reference count info in
metadata device.

The B+tree will try to rebalance the entry of the child nodes in each
node it traversed, and the rebalance process contains the following
steps.

(1) Finding the corresponding children in current node (shadow_current(s))
(2) Shadow the children block (issue BOP_INC)
(3) redistribute keys among children, and free children if necessary (issue BOP_DEC)

Since the update of a metadata block's reference count could be
recursive, we will stash these reference count update operations in
smm->uncommitted and then process them in a FILO fashion.

The problem is that step(3) could free the children which is created
in step(2), so the BOP_DEC issued in step(3) will be carried out
before the BOP_INC issued in step(2) since these BOPs will be
processed in FILO fashion. Once the BOP_DEC from step(3) tries to
decrease the reference count of newly shadow block, it will report
failure for its reference equals 0 before decreasing. It looks like we
can solve this issue by processing these BOPs in a FIFO fashion
instead of FILO."

Commit 5b564d80 ("dm space map: disallow decrementing a reference count
below zero") changed the code to report an error for this temporary
refcount decrement below zero. So what was previously a harmless
invalid refcount became a hard failure due to the new error path:

device-mapper: space map common: unable to decrement a reference count below 0
device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22
device-mapper: thin: 253:6: switching pool to read-only mode

This bug is in dm persistent-data code that is common to the DM thin and
cache targets. So any users of those targets should apply this fix.

Fix this by applying recursive space map operations in FIFO order rather
than FILO.

Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801

Reported-by: Apollon Oikonomopoulos <apoikos@debian.org>
Reported-by: edwillam1007@gmail.com
Reported-by: Teng-Feng Yang <shinrairis@gmail.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.13+

12y ago

Linus Torvalds

fe9ea91c

Merge tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

12y ago

Olof Johansson

10554647

Merge tag 'omap-for-v3.14/fixes-dt-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

12y ago

Chad Dupuis

f324777e

[SCSI] qla2xxx: Fix multiqueue MSI-X registration.

12y ago

Marios Andreopoulos

2564338b

libata: disable queued TRIM for Crucial M500 mSATA SSDs

12y ago

Al Viro

1b56e989

ocfs2 syncs the wrong range...

12y ago

Linux 3.14-rc7 v3.14-rc7

dcb99fd9

Linus Torvalds

11y

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

59bf6c3c

Linus Torvalds

11y

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

b44eeb4d

Linus Torvalds

11y

sched/clock: Prevent tracing recursion in sched_clock_cpu()

96b3d28b

Fernando Luis Vazquez Cao

12y

ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation

4f87dac3

Michael Kerrisk

11y

perf/x86: Fix leak in uncore_type_init failure paths

b7b4839d

Dave Jones

12y

stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()

177c53d9

Peter Zijlstra

12y

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

3b4df68d

Linus Torvalds

12y

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

b8ad0f91

Ingo Molnar

12y

sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy

d44753b8

Juri Lelli

12y

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

a4ecdf82

Linus Torvalds

12y

[SCSI] storvsc: NULL pointer dereference fix

b12bb60d

Ales Novak

12y

Merge branch 'akpm' (patches from Andrew Morton)

8712a005

Linus Torvalds

12y

perf machine: Use map as success in ip__resolve_ams

fdf57dd0

Don Zickus

12y

Merge tag 'pm+acpi-3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

cee152ff

Linus Torvalds

12y

x86/amd/numa: Fix northbridge quirk to assign correct NUMA node

847d7970

Daniel J Blueman

12y

[SCSI] qla2xxx: Poll during initialization for ISP25xx and ISP83xx

b77ed25c

Giridhar Malavali

12y

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

e6a4b6f5

Linus Torvalds

12y

cris: convert ffs from an object-like macro to a function-like macro

0eb808eb

Geert Uytterhoeven

12y

perf symbols: Fix crash in elf_section_by_name

155b3a13

Jiri Olsa

12y

Merge tag 'dm-3.14-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

0c01b452

Linus Torvalds

12y

Merge branches 'pnp', 'acpi-init', 'acpi-sleep' and 'pm-cpufreq'

d5af40d6

Rafael J. Wysocki

12y

x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU

For non-eager fpu mode, thread's fpu state is allocated during the first
fpu usage (in the context of device not available exception). This
(math_state_restore()) can be a blocking call and hence we enable
interrupts (which were originally disabled when the exception happened),
allocate memory and disable interrupts etc.

But the eager-fpu mode, call's the same math_state_restore() from
kernel_fpu_end(). The assumption being that tsk_used_math() is always
set for the eager-fpu mode and thus avoid the code path of enabling
interrupts, allocating fpu state using blocking call and disable
interrupts etc.

But the below issue was noticed by Maarten Baert, Nate Eldredge and
few others:

If a user process dumps core on an ecrypt fs while aesni-intel is loaded,
we get a BUG() in __find_get_block() complaining that it was called with
interrupts disabled; then all further accesses to our ecrypt fs hang
and we have to reboot.

The aesni-intel code (encrypting the core file that we are writing) needs
the FPU and quite properly wraps its code in kernel_fpu_{begin,end}(),
the latter of which calls math_state_restore(). So after kernel_fpu_end(),
interrupts may be disabled, which nobody seems to expect, and they stay
that way until we eventually get to __find_get_block() which barfs.

For eager fpu, most the time, tsk_used_math() is true. At few instances
during thread exit, signal return handling etc, tsk_used_math() might
be false.

In kernel_fpu_end(), for eager-fpu, call math_state_restore()
only if tsk_used_math() is set. Otherwise, don't bother. Kernel code
path which cleared tsk_used_math() knows what needs to be done
with the fpu state.

Reported-by: Maarten Baert <maarten-baert@hotmail.com>
Reported-by: Nate Eldredge <nate@thatsmathematics.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/1391410583.3801.6.camel@europa
Cc: George Spelvin <linux@horizon.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>