commits

journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,

LTP: starting fsync04
/dev/zero: Can't open blockdev
EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]

write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
__jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
__jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
(inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
kjournald2+0x13b/0x450 [jbd2]
kthread+0x1cd/0x1f0
ret_from_fork+0x27/0x50

read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
jbd2_write_access_granted+0x1b2/0x250 [jbd2]
jbd2_write_access_granted at fs/jbd2/transaction.c:1155
jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
__ext4_journal_get_write_access+0x50/0x90 [ext4]
ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
ext4_mb_new_blocks+0x54f/0xca0 [ext4]
ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
ext4_map_blocks+0x3b4/0x950 [ext4]
_ext4_get_block+0xfc/0x270 [ext4]
ext4_get_block+0x3b/0x50 [ext4]
__block_write_begin_int+0x22e/0xae0
__block_write_begin+0x39/0x50
ext4_write_begin+0x388/0xb50 [ext4]
generic_perform_write+0x15d/0x290
ext4_buffered_write_iter+0x11f/0x210 [ext4]
ext4_file_write_iter+0xce/0x9e0 [ext4]
new_sync_write+0x29c/0x3b0
__vfs_write+0x92/0xa0
vfs_write+0x103/0x260
ksys_write+0x9d/0x130
__x64_sys_write+0x4c/0x60
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe

5 locks held by fsync04/25724:
#0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
#1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
#2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
#3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
#4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
irq event stamp: 1407125
hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0

Reported by Kernel Concurrency Sanitizer on:
CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

5y ago

Linus Torvalds

7557c1b3

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

5y ago

Wolfram Sang

38b17afb

macintosh: therm_windtunnel: fix regression when instantiating devices

5y ago

Paolo Bonzini

e951445f

Merge tag 'kvmarm-fixes-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

5y ago

Linus Torvalds

f8788d86

Linux 5.6-rc3 v5.6-rc3

5y ago

Linus Torvalds

29795de0

Merge tag 'pci-v5.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

5y ago

Adam Williamson

03264ddd

scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places

5y ago

Gustavo A. R. Silva

54498e80

i2c: altera: Fix potential integer overflow

5y ago

Erwan Velu

ef935c25

kvm: x86: Limit the number of "kvm: disabled by bios" messages

5y ago

James Morse

e43f1331

arm64: Ask the compiler to __always_inline functions used by KVM at HYP

5y ago

Linus Torvalds

d2eee258

Merge tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

5y ago

Linus Torvalds

2edc78b9

Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block

5y ago

Lukas Bulwahn

5901b51f

MAINTAINERS: Correct Cadence PCI driver path

5y ago

Benjamin Block

a3fd4bfe

scsi: zfcp: fix wrong data and display format of SFP+ temperature

5y ago

Wolfram Sang

9e661ced

i2c: jz4780: silence log flood on txabrt

5y ago

Paolo Bonzini

aaec7c03

KVM: x86: avoid useless copy of cpufreq policy

5y ago

James Morse

8c2d146e

KVM: arm64: Define our own swab32() to avoid a uapi static inline

5y ago

Linus Torvalds

a3163ca0

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

5y ago

Filipe Manana

a5ae50de

Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof

While logging the prealloc extents of an inode during a fast fsync we call
btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while
holding a read lock on a leaf of the inode's root (not the log root, the
fs/subvol root), and then that function locks the file range in the inode's
iotree. This can lead to a deadlock when:

* the fsync is ranged

* the file has prealloc extents beyond eof

* writeback for a range different from the fsync range starts
during the fsync

* the size of the file is not sector size aligned

Because when finishing an ordered extent we lock first a file range and
then try to COW the fs/subvol tree to insert an extent item.

The following diagram shows how the deadlock can happen.

CPU 1 CPU 2

btrfs_sync_file()
--> for range [0, 1MiB)

--> inode has a size of
1MiB and has 1 prealloc
extent beyond the
i_size, starting at offset
4MiB

flushes all delalloc for the
range [0MiB, 1MiB) and waits
for the respective ordered
extents to complete

--> before task at CPU 1 locks the
inode, a write into file range
[1MiB, 2MiB + 1KiB) is made

--> i_size is updated to 2MiB + 1KiB

--> writeback is started for that
range, [1MiB, 2MiB + 4KiB)
--> end offset rounded up to
be sector size aligned

btrfs_log_dentry_safe()
btrfs_log_inode_parent()
btrfs_log_inode()

btrfs_log_changed_extents()
btrfs_log_prealloc_extents()
--> does a search on the
inode's root
--> holds a read lock on
leaf X

btrfs_finish_ordered_io()
--> locks range [1MiB, 2MiB + 4KiB)
--> end offset rounded up
to be sector size aligned

--> tries to cow leaf X, through
insert_reserved_file_extent()
--> already locked by the
task at CPU 1

btrfs_truncate_inode_items()

--> gets an i_size of
2MiB + 1KiB, which is
not sector size
aligned

--> tries to lock file
range [2MiB, (u64)-1)
--> the start range
is rounded down
from 2MiB + 1K
to 2MiB to be sector
size aligned

--> but the subrange
[2MiB, 2MiB + 4KiB) is
already locked by
task at CPU 2 which
is waiting to get a
write lock on leaf X
for which we are
holding a read lock

*** deadlock ***

This results in a stack trace like the following, triggered by test case
generic/561 from fstests:

[ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds.
[ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000
[ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[ 2779.990466] Call Trace:
[ 2779.990491] ? __schedule+0x384/0xa30
[ 2779.990521] schedule+0x33/0xe0
[ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs]
[ 2779.990632] ? remove_wait_queue+0x60/0x60
[ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs]
[ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs]
[ 2779.990987] ? mark_held_locks+0x6d/0xc0
[ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100
[ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs]
[ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs]
[ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs]
[ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs]
[ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs]
[ 2779.991432] process_one_work+0x26d/0x6a0
[ 2779.991460] worker_thread+0x4f/0x3e0
[ 2779.991481] ? process_one_work+0x6a0/0x6a0
[ 2779.991489] kthread+0x103/0x140
[ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70
[ 2779.991515] ret_from_fork+0x3a/0x50
(...)
[ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds.
[ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2780.030035] fsstress D 0 17375 17373 0x00004000
[ 2780.030038] Call Trace:
[ 2780.030044] ? __schedule+0x384/0xa30
[ 2780.030052] schedule+0x33/0xe0
[ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs]
[ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs]
[ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0
[ 2780.030102] ? remove_wait_queue+0x60/0x60
[ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs]
[ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs]
[ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs]
[ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs]
[ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0
[ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs]
[ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs]
[ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0
[ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs]
[ 2780.030282] ? dget_parent+0xa1/0x370
[ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs]
[ 2780.030339] do_fsync+0x38/0x60
[ 2780.030343] __x64_sys_fdatasync+0x13/0x20
[ 2780.030345] do_syscall_64+0x5c/0x280
[ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2780.030356] RIP: 0033:0x7f2d80f6d5f0
[ 2780.030361] Code: Bad RIP value.
[ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0
[ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003
[ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c
[ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4
[ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90

So fix this by making btrfs_truncate_inode_items() not lock the range in
the inode's iotree when the target root is a log root, since it's not
needed to lock the range for log roots as the protection from the inode's
lock and log_mutex are all that's needed.

Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5y ago

Linus Torvalds

74dea5d9

Merge tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block

5y ago

Jens Axboe

5b8ea58b

Merge branch 'nvme-5.6-rc4' of git://git.infradead.org/nvme into block-5.6

5y ago

Marek Szyprowski

73a7a271

PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers

5y ago

Damien Le Moal

51fdaa04

scsi: sd_sbc: Fix sd_zbc_report_zones()

5y ago

Linus Torvalds

bb6d3fb3

Linux 5.6-rc1 v5.6-rc1

6y ago

Paolo Bonzini

4f337faf

KVM: allow disabling -Werror

5y ago

James Morse

5c37f1ae

KVM: arm64: Ask the compiler to __always_inline functions used at HYP

5y ago

Linus Torvalds

c6188dff

Merge tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux

5y ago

Jan Kara

9db176bc

ext4: fix mount failure with quota configured as module

5y ago

Filipe Manana

e75fd33b

Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents

5y ago

Linus Torvalds

c60c0402

Merge tag 'acpi-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

5y ago

Jens Axboe

d8768362

io_uring: fix 32-bit compatability with sendmsg/recvmsg

5y ago

John Garry

cae740a0

blk-mq: Remove some unused function arguments

5y ago

Bijan Mottahedeh

9515743b

nvme-pci: Hold cq_poll_lock while completing CQEs

5y ago

Igor Druzhinin

ff6993bb

scsi: libfc: free response frame from GPN_ID

5y ago

Linus Torvalds

89a47dd1

Merge tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

6y ago

Valdis Klētnieks

575b255c

KVM: x86: allow compiling as non-module with W=1

5y ago

Mark Rutland

b3f15ec3

kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()

5y ago

Linus Torvalds

dca132a6

Merge tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Geert Uytterhoeven

99db590b

csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>

5y ago

wangyan

8eedabfd

jbd2: fix ocfs2 corrupt when clearing block group bits

I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
The running environment:
kernel version: 4.19
A cluster with two nodes, 5 luns mounted on two nodes, and do some
file operations like dd/fallocate/truncate/rm on every lun with storage
network disconnection.

The fallocate operation on dm-23-45 caused an null pointer dereference.

The information of NULL pointer dereference as follows:
[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
[577992.878290] Aborting journal on device dm-23-45.
...
[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
[577992.890908] __journal_remove_journal_head: freeing b_committed_data
[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
[577992.890918] __journal_remove_journal_head: freeing b_committed_data
[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
[577992.890922] __journal_remove_journal_head: freeing b_committed_data
[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
[577992.890928] __journal_remove_journal_head: freeing b_committed_data
[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
[577992.890933] __journal_remove_journal_head: freeing b_committed_data
[577992.890939] __journal_remove_journal_head: freeing b_committed_data
[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[577992.890950] Mem abort info:
[577992.890951] ESR = 0x96000004
[577992.890952] Exception class = DABT (current EL), IL = 32 bits
[577992.890952] SET = 0, FnV = 0
[577992.890953] EA = 0, S1PTW = 0
[577992.890954] Data abort info:
[577992.890955] ISV = 0, ISS = 0x00000004
[577992.890956] CM = 0, WnR = 0
[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
[577992.890960] [0000000000000020] pgd=0000000000000000
[577992.890964] Internal error: Oops: 96000004 [#1] SMP
[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
[577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
[577992.891084] sp : ffff0000c8e2b810
[577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
[577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
[577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
[577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
[577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
[577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
[577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
[577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
[577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
[577992.891103] x11: 0000000000000038 x10: 0101010101010101
[577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
[577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
[577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
[577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
[577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
[577992.891116] Call trace:
[577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
[577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
[577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
[577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
[577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
[577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
[577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
[577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
[577992.891316] vfs_fallocate+0x11c/0x250
[577992.891317] ksys_fallocate+0x54/0x88
[577992.891319] __arm64_sys_fallocate+0x28/0x38
[577992.891323] el0_svc_common+0x78/0x130
[577992.891325] el0_svc_handler+0x38/0x78
[577992.891327] el0_svc+0x8/0xc

My analysis process as follows:
ocfs2_fallocate
__ocfs2_change_file_space
ocfs2_allocate_extents
ocfs2_extend_allocation
ocfs2_add_inode_data
ocfs2_add_clusters_in_btree
ocfs2_insert_extent
ocfs2_do_insert_extent
ocfs2_rotate_tree_right
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2_journal_restart
jbd2__journal_restart
/* handle->h_transaction is NULL,
* is_handle_aborted(handle) is true
*/
handle->h_transaction = NULL;
start_this_handle
return -EROFS;
ocfs2_free_clusters
_ocfs2_free_clusters
_ocfs2_free_suballoc_bits
ocfs2_block_group_clear_bits
ocfs2_journal_access_gd
__ocfs2_journal_access
jbd2_journal_get_undo_access
/* I think jbd2_write_access_granted() will
* return true, because do_get_write_access()
* will return -EROFS.
*/
if (jbd2_write_access_granted(...)) return 0;
do_get_write_access
/* handle->h_transaction is NULL, it will
* return -EROFS here, so do_get_write_access()
* was not called.
*/
if (is_handle_aborted(handle)) return -EROFS;
/* bh2jh(group_bh) is NULL, caused NULL
pointer dereference */
undo_bg = (struct ocfs2_group_desc *)
bh2jh(group_bh)->b_committed_data;

If handle->h_transaction == NULL, then jbd2_write_access_granted()
does not really guarantee that journal_head will stay around,
not even speaking of its b_committed_data. The bh2jh(group_bh)
can be removed after ocfs2_journal_access_gd() and before call
"bh2jh(group_bh)->b_committed_data". So, we should move
is_handle_aborted() check from do_get_write_access() into
jbd2_journal_get_undo_access() and jbd2_journal_get_write_access()
before the call to jbd2_write_access_granted().

Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
Signed-off-by: Yan Wang <wangyan122@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org

5y ago

Josef Bacik

b778cf96

btrfs: fix bytes_may_use underflow in prealloc error condtition

5y ago

Linus Torvalds

36428598

Merge tag 'pm-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

5y ago

Mika Westerberg

cabe17d0

ACPI: watchdog: Set default timeout in probe

5y ago

Tobias Klauser

bebdb65e

io_uring: define and set show_fdinfo only if procfs is enabled

5y ago

Dongli Zhang

93d7c318

null_blk: remove unused fields in 'nullb_cmd'

5y ago

Bart Van Assche

807b9515

scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"

5y ago

Linus Torvalds

380a129e

Merge tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

6y ago

Masahiro Yamada

f566e1fb

kbuild: make multiple directory targets work

6y ago

Wanpeng Li

8a9442f4

KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis

5y ago

Jeremy Cline

51b25694

KVM: arm/arm64: Fix up includes for trace.h

6y ago

Linus Torvalds

f3cc2494

Merge tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Thomas Gleixner

51dede9c

x86/mce/amd: Fix kobject lifetime

5y ago

Guo Ren

0b9f386c

csky: Implement copy_thread_tls

5y ago

Eric Biggers

cb85f4d2

ext4: fix race between writepages and enabling EXT4_EXTENTS_FL

5y ago

Linux 5.6-rc4 v5.6-rc4

98d54f81

Linus Torvalds

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

e7086982

Linus Torvalds

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

f853ed90

Linus Torvalds

ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()

37b0b6b8

Dan Carpenter

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

fb279f4e

Linus Torvalds

KVM: VMX: check descriptor table exits on instruction emulation

86f7e90c

Oliver Upton

jbd2: fix data races at struct journal_head

6c5d9112

Qian Cai

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

7557c1b3

Linus Torvalds

macintosh: therm_windtunnel: fix regression when instantiating devices

38b17afb

Wolfram Sang

Merge tag 'kvmarm-fixes-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

e951445f

Paolo Bonzini

Linux 5.6-rc3 v5.6-rc3

f8788d86

Linus Torvalds

Merge tag 'pci-v5.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

29795de0

Linus Torvalds

scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places

03264ddd

Adam Williamson

i2c: altera: Fix potential integer overflow

54498e80

Gustavo A. R. Silva

kvm: x86: Limit the number of "kvm: disabled by bios" messages

ef935c25

Erwan Velu

arm64: Ask the compiler to __always_inline functions used by KVM at HYP

e43f1331

James Morse

Merge tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

d2eee258

Linus Torvalds

Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block

2edc78b9

Linus Torvalds

MAINTAINERS: Correct Cadence PCI driver path

5901b51f

Lukas Bulwahn

scsi: zfcp: fix wrong data and display format of SFP+ temperature

a3fd4bfe

Benjamin Block

i2c: jz4780: silence log flood on txabrt

9e661ced

Wolfram Sang

KVM: x86: avoid useless copy of cpufreq policy

aaec7c03

Paolo Bonzini

KVM: arm64: Define our own swab32() to avoid a uapi static inline

8c2d146e

James Morse

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

a3163ca0

Linus Torvalds

Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof

a5ae50de

Filipe Manana

Merge tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block

74dea5d9

Linus Torvalds

Merge branch 'nvme-5.6-rc4' of git://git.infradead.org/nvme into block-5.6

5b8ea58b

Jens Axboe

PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers

73a7a271

Marek Szyprowski

scsi: sd_sbc: Fix sd_zbc_report_zones()

The block layer generic blk_revalidate_disk_zones() checks the validity of
zone descriptors reported by a disk using the blk_revalidate_zone_cb()
callback function executed for each zone descriptor. If a ZBC disk reports
invalid zone descriptors, blk_revalidate_disk_zones() returns an error and
sd_zbc_read_zones() changes the disk capacity to 0, which in turn results
in the gendisk structure capacity to be set to 0. This all works well for
the first revalidate pass on a disk and the block layer detects the
capactiy change.

On the second revalidate pass, blk_revalidate_disk_zones() is called again
and sd_zbc_report_zones() executed to check the zones a second time.
However, for this second pass, the gendisk capacity is now 0, which results
in sd_zbc_report_zones() to do nothing and to report success and no
zones. blk_revalidate_disk_zones() in turn returns success and sets the
disk queue chunk_sectors limit with zero as no zones were checked, causing
a oops to trigger on the BUG_ON(!is_power_of_2(chunk_sectors)) in
blk_queue_chunk_sectors().

Fix this by using the sdkp capacity field rather than the gendisk capacity
for the report zones loop in sd_zbc_report_zones(). Also add a check to
return immediately an error if the sdkp capacity is 0. With this fix,
invalid/buggy ZBC disk scan does not trigger a oops and are exposed with a
0 capacity. This change also preserve the chance for the disk to be
correctly revalidated on the second revalidate pass as the scsi disk
structure capacity field is always set to the disk reported value when
sd_zbc_report_zones() is called.

Link: https://lore.kernel.org/r/20200219063800.880834-1-damien.lemoal@wdc.com
Fixes: d41003513e61 ("block: rework zone reporting")
Cc: Cc: <stable@vger.kernel.org> # v5.5
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>