commits

xfs: reduce AGF hold times during fstrim operations

A recent log space overflow and recovery failure was root caused to
a long running truncate blocking on the AGF and ending up pinning
the tail of the log. The filesystem then hung, the machine was
rebooted, and log recovery then refused to run because there wasn't
enough space in the log for EFI transaction reservation.

The reason the long running truncate got blocked on the AGF for so
long was that an fstrim was being run. The underlying block device
was large and very slow (10TB ceph rbd volume) and so discarding all
the free space in the AG took a really long time.

The current fstrim implementation holds the AGF across the entire
operation - both the free space scan and the issuing of all the
discards. The discards are synchronous and single depth, so if there
are millions of free spaces, we hold the AGF lock across millions of
discard operations.

It doesn't really need to be said that this is a Bad Thing.

This series reworks the fstrim discard path to use the same
mechanisms as online discard. This allows discards to be issued
asynchronously without holding the AGF locked, enabling higher
discard queue depths (much faster on fast devices) and only
requiring the AGF lock to be held whilst we are scanning free space.

To do this, we make use of busy extents - we lock the AGF, mark all
the extents we want to discard as "busy under discard" so that
nothing will be allowed to allocate them, and then drop the AGF
lock. We then issue discards on the gathered busy extents and on
discard completion remove them from the busy list.

This results in AGF lock hold times for fstrim dropping to a few
milliseconds for each batch of free extents we scan, and so the
hours-long hold times that can currently occur on large, slow, badly
fragmented devices no longer occur.
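
The busy-extent sequence described above can be sketched as a small
user-space model. This is only an illustration under stated assumptions:
struct ag, struct extent, gather_busy(), can_allocate() and
complete_discards() are hypothetical names, not the XFS API, and a plain
boolean stands in for the AGF lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are real XFS identifiers. */
#define MAX_EXTENTS 8

struct extent {
	unsigned int start;	/* first block of the free extent */
	unsigned int len;	/* length in blocks */
	bool busy;		/* true while a discard is in flight */
};

struct ag {
	bool agf_locked;			/* stands in for the AGF lock */
	struct extent free[MAX_EXTENTS];	/* free space in this AG */
	size_t nfree;
};

/*
 * Step 1: with the AGF "locked", mark every free extent of at least
 * minlen blocks as busy, then drop the lock before any discard is
 * issued. Returns the number of extents gathered.
 */
size_t gather_busy(struct ag *ag, unsigned int minlen)
{
	size_t gathered = 0;

	ag->agf_locked = true;
	for (size_t i = 0; i < ag->nfree; i++) {
		if (ag->free[i].len >= minlen && !ag->free[i].busy) {
			ag->free[i].busy = true;
			gathered++;
		}
	}
	ag->agf_locked = false;
	return gathered;
}

/* An allocator must skip busy extents, so in-flight discards are safe. */
bool can_allocate(const struct ag *ag, size_t idx)
{
	return idx < ag->nfree && !ag->free[idx].busy;
}

/*
 * Step 2: discard completion clears the busy state. It runs without
 * the AGF lock held. Returns the number of extents released.
 */
size_t complete_discards(struct ag *ag)
{
	size_t done = 0;

	assert(!ag->agf_locked);
	for (size_t i = 0; i < ag->nfree; i++) {
		if (ag->free[i].busy) {
			ag->free[i].busy = false;
			done++;
		}
	}
	return done;
}
```

In this model the lock is held only for the marking loop in
gather_busy(). However long the discards themselves take, other users of
the AGF are not blocked, and can_allocate() keeps them away from the
extents being discarded.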

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'xfs-fstrim-busy-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: abort fstrim if kernel is suspending
xfs: reduce AGF hold times during fstrim operations
xfs: move log discard work to xfs_discard.c

2y ago

Jordan Rife

cedc019b

smb: use kernel_connect() and kernel_bind()

2y ago

Linus Torvalds

ce9ecca0

Linux 6.6-rc2 v6.6-rc2

2y ago

Linus Torvalds

e81a2dab

Merge tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

2y ago

Namjae Jeon

53ff5cf8

ksmbd: fix race condition between session lookup and expire

2y ago

Linus Torvalds

b036cda9

Merge tag 'media/v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

2y ago

Mike Snitzer

3da5d2de

MAINTAINERS: update the dm-devel mailing list

2y ago

Dave Chinner

e78a40b8

xfs: abort fstrim if kernel is suspending

2y ago

Linus Torvalds

e7892864

Merge tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2y ago

Linus Torvalds

d2c52315

Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2y ago

Masahiro Yamada

2d7d1bc1

kbuild: remove stale code for 'source' symlink in packaging scripts

2y ago

Linus Torvalds

5e5558f5

Merge tag 'devicetree-fixes-for-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

2y ago

Irui Wang

1146bec0

media: mediatek: vcodec: Fix encoder access NULL pointer

2y ago

Fedor Pchelkin

9850ccd5

dm zoned: free dmz->ddev array in dmz_put_zoned_devices

2y ago

Dave Chinner

89cfa899

xfs: reduce AGF hold times during fstrim operations

2y ago

Linus Torvalds

e5a710d1

Merge tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2y ago

Song Liu

75b2f7e4

x86/purgatory: Remove LTO flags

2y ago

Linus Torvalds

8f633369

Merge tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

2y ago

Baoquan He

e2a8f20d

Crash: add lock to serialize crash hotplug handling

2y ago

Uwe Kleine-König

f177cd0c

modpost: Don't let "driver"s reference .exit.*

2y ago

Linus Torvalds

22823378

Merge tag 'gpio-fixes-for-v6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

2y ago

Luca Ceresoli

19007c62

dt-bindings: trivial-devices: Fix MEMSIC MXC4005 compatible string

2y ago

Luca Ceresoli

6bd01c42

staging: media: tegra-video: fix infinite recursion regression

Since commit 9bf19fbf0c8b ("media: v4l: async: Rework internal lists"), aka
v6.6-rc1~97^2~198, probing the tegra-video VI driver causes infinite
recursion due to tegra_vi_graph_parse_one() calling itself until:

[ 1.571168] Insufficient stack space to handle exception!
...
[ 1.591416] Internal error: kernel stack overflow: 0 [#1] PREEMPT SMP ARM
...
[ 3.861013] of_phandle_iterator_init from __of_parse_phandle_with_args+0x40/0xf0
[ 3.868497] __of_parse_phandle_with_args from of_fwnode_graph_get_remote_endpoint+0x68/0xa8
[ 3.876938] of_fwnode_graph_get_remote_endpoint from fwnode_graph_get_remote_port_parent+0x30/0x7c
[ 3.885984] fwnode_graph_get_remote_port_parent from tegra_vi_graph_parse_one+0x7c/0x224
[ 3.894158] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.901459] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.908760] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.916061] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
...
[ 4.857892] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 4.865193] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 4.872494] tegra_vi_graph_parse_one from tegra_vi_init+0x574/0x6d4
[ 4.878842] tegra_vi_init from host1x_device_init+0x84/0x15c
[ 4.884594] host1x_device_init from host1x_video_probe+0xa0/0x114
[ 4.890770] host1x_video_probe from really_probe+0xe0/0x400

The reason is the mentioned commit changed tegra_vi_graph_find_entity() to
search for an entity in the done notifier list:

> @@ -1464,7 +1464,7 @@ tegra_vi_graph_find_entity(struct tegra_vi_channel *chan,
>  	struct tegra_vi_graph_entity *entity;
>  	struct v4l2_async_connection *asd;
>
> -	list_for_each_entry(asd, &chan->notifier.asc_list, asc_entry) {
> +	list_for_each_entry(asd, &chan->notifier.done_list, asc_entry) {
>  		entity = to_tegra_vi_graph_entity(asd);
>  		if (entity->asd.match.fwnode == fwnode)
>  			return entity;

This is not always correct, because tegra_vi_graph_find_entity() is called
in three locations, in this order:

1. tegra_vi_graph_parse_one() -- called while probing
2. tegra_vi_graph_notify_bound() -- the .bound notifier op
3. tegra_vi_graph_build() -- called in the .complete notifier op

Locations 1 and 2 are called before moving the entity from waiting_list to
done_list, thus they won't find what they are looking for in
done_list. Location 3 happens afterwards and thus it is not broken, however
it means tegra_vi_graph_find_entity() should not search in the same list
every time.

The error appears at step 1: tegra_vi_graph_parse_one() iterates
recursively until it finds the entity already notified, which now never
happens.

Fix by passing the specific notifier list pointer to
tegra_vi_graph_find_entity() instead of the channel, so each caller can
search in whatever list is correct.
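
The idea of passing the list rather than the channel can be illustrated
with a simplified, self-contained model. The types and functions below
(struct entity, struct notifier, find_entity(), mark_bound()) are
hypothetical stand-ins with singly linked lists, not the V4L2 async or
tegra-video code, and the sketch does not claim which list each real
caller passes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the V4L2 async structures. */
struct entity {
	int fwnode;		/* stands in for the fwnode being matched */
	struct entity *next;
};

struct notifier {
	struct entity *waiting_list;	/* entities not yet bound */
	struct entity *done_list;	/* entities already bound */
};

/* Search only the list the caller passes in. */
struct entity *find_entity(struct entity *list, int fwnode)
{
	for (struct entity *e = list; e; e = e->next)
		if (e->fwnode == fwnode)
			return e;
	return NULL;
}

/* Binding moves an entity from waiting_list to done_list. */
void mark_bound(struct notifier *n, struct entity *e)
{
	struct entity **pp = &n->waiting_list;

	assert(e != NULL);
	while (*pp && *pp != e)
		pp = &(*pp)->next;
	assert(*pp == e);	/* caller must pass a waiting entity */
	*pp = e->next;		/* unlink from waiting_list */
	e->next = n->done_list;	/* push onto done_list */
	n->done_list = e;
}
```

Before mark_bound() runs, a lookup in done_list returns NULL for an
entity that a lookup in waiting_list finds. That is the situation the
probe-time caller was in when the search was hardwired to done_list.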

Also improve the tegra_vi_graph_find_entity() comment.

Fixes: 9bf19fbf0c8b ("media: v4l: async: Rework internal lists")
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Sowjanya Komatineni <skomatineni@nvidia.com>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
[Sakari Ailus: Wrapped some long lines.]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>

2y ago

Dave Chinner

428c4435

xfs: move log discard work to xfs_discard.c

2y ago

Linus Torvalds

e54ca3c8

Merge tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2y ago

Ricardo Neri

108af4b4

x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain

2y ago

Kirill A. Shutemov

f530ee95

x86/boot/compressed: Reserve more memory for page tables

2y ago

Linus Torvalds

3abd15e2

Merge tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

2y ago

Ricky WU

0e4cac55

misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe

2y ago

Juntong Deng

bbe246f8

selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error

2y ago

Masahiro Yamada

15e86643

vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros

2y ago

Linus Torvalds

8fea9f8f

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

2y ago

Bartosz Golaszewski

f9315f17

gpio: aspeed: fix the GPIO number passed to pinctrl_gpio_set_config()

2y ago

Rob Herring

d6e201f8

dt-bindings: PCI: brcm,iproc-pcie: Fix 'msi' child node schema

2y ago

Arnd Bergmann

90d3c11a

media: pci: intel: ivsc: select V4L2_FWNODE

2y ago

Linus Torvalds

99a73f9e

Merge tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2y ago

Josh Poimboeuf

72178d5d

objtool: Fix _THIS_IP_ detection for cold functions

2y ago

Tim Chen

450e7497

sched/fair: Fix SMT4 group_smt_balance handling

2y ago

Linux 6.6-rc5 v6.6-rc5

94f6f055

Linus Torvalds

Merge tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd

37faf07b

Linus Torvalds

Merge tag 'sched-urgent-2023-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

f707e40d

Linus Torvalds

ksmbd: fix race condition between tree conn lookup and disconnect

33b235a6

Namjae Jeon

Merge tag 'x86-urgent-2023-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

7e20d344

Linus Torvalds

cpufreq: schedutil: Update next_freq when cpufreq_limits change

9e0bc36a

Xuewen Yan

ksmbd: fix race condition from parallel smb2 lock requests

75ac9a3d

Namjae Jeon

Merge tag 'parisc-for-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

b9ddbb0c

Linus Torvalds

x86/sev: Change npages to unsigned long in snp_accept_memory()

62d5e970

Tom Lendacky

sched/eevdf: Fix avg_vruntime()

650cad56

Peter Zijlstra

ksmbd: fix race condition from parallel smb2 logoff requests

7ca9da7d

Namjae Jeon

Merge tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

59f3fd30

Linus Torvalds

parisc: Restore __ldcw_align for PA-RISC 2.0 processors

914988e0

John David Anglin

x86/sev: Use the GHCB protocol when available for SNP CPUID requests

6bc6f7d9

Tom Lendacky

sched/eevdf: Also update slice on placement

2f2fc17b

Peter Zijlstra

ksmbd: fix uaf in smb20_oplock_break_ack

c6981347

luosili

Merge tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

102363a3

Linus Torvalds

smb: client: do not start laundromat thread on nohandlecache

3b8bb317

Paulo Alcantara

parisc: Fix crash with nr_cpus=1 option

d3b3c637

Helge Deller

Linux 6.6-rc4 v6.6-rc4

8a749fd1

Linus Torvalds

ksmbd: fix race condition with fp

5a7ee91d

Namjae Jeon

Merge tag 'for-6.6/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

4aef108a

Linus Torvalds

Merge tag 'xfs-fstrim-busy-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-6.6-fixesC

4e69f490

Chandan Babu R

smb: use kernel_connect() and kernel_bind()

cedc019b

Jordan Rife

Linux 6.6-rc2 v6.6-rc2

ce9ecca0

Linus Torvalds

Merge tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

e81a2dab

Linus Torvalds

ksmbd: fix race condition between session lookup and expire

53ff5cf8

Namjae Jeon

Merge tag 'media/v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

b036cda9

Linus Torvalds

MAINTAINERS: update the dm-devel mailing list

3da5d2de

Mike Snitzer

xfs: abort fstrim if kernel is suspending

e78a40b8

Dave Chinner

Merge tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

e7892864

Linus Torvalds

Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

d2c52315

Linus Torvalds

kbuild: remove stale code for 'source' symlink in packaging scripts

2d7d1bc1

Masahiro Yamada

Merge tag 'devicetree-fixes-for-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

5e5558f5

Linus Torvalds

media: mediatek: vcodec: Fix encoder access NULL pointer

1146bec0

Irui Wang

dm zoned: free dmz->ddev array in dmz_put_zoned_devices

9850ccd5

Fedor Pchelkin

xfs: reduce AGF hold times during fstrim operations

fstrim will hold the AGF lock for as long as it takes to walk and
discard all the free space in the AG that meets the userspace trim
criteria. For AGs with lots of free space extents (e.g. millions),
or where the underlying device is really slow at processing discard
requests (e.g. Ceph RBD), this means the AGF hold time is often
measured in minutes to hours, not a few milliseconds as we normally
see with non-discard based operations.
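
As a rough back-of-the-envelope illustration of that scaling: when
discards are issued one at a time under the lock, the hold time is about
the number of extents multiplied by the per-discard latency. The helper
name and the figures in the note below are hypothetical, chosen only to
show the arithmetic.

```c
/*
 * Estimated AGF hold time in seconds when every discard is issued
 * synchronously, one at a time, with the lock held throughout.
 * Hypothetical helper for illustration only.
 */
double agf_hold_seconds(unsigned long nr_extents, double discard_latency_ms)
{
	return (double)nr_extents * discard_latency_ms / 1000.0;
}
```

With an assumed 2,000,000 free extents and an assumed 5 ms per discard,
agf_hold_seconds(2000000UL, 5.0) gives 10000 seconds, or roughly 2.8
hours.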

This can result in the entire filesystem hanging whilst the
long-running fstrim is in progress. We can have transactions get
stuck waiting for the AGF lock (data or metadata extent allocation
and freeing), and then more transactions get stuck waiting on the
locks those transactions hold. We can get to the point where fstrim
blocks an extent allocation or free operation long enough that it
ends up pinning the tail of the log and the log then runs out of
space. At this point, every modification in the filesystem gets
blocked. This includes read operations, if atime updates need to be
made.

To fix this problem, we need to be able to discard free space
extents safely without holding the AGF lock. Fortunately, we already
do this with online discard via busy extents. We can mark free space
extents as "busy being discarded" under the AGF lock and then unlock
the AGF, knowing that nobody will be able to allocate that free
space extent until we remove it from the busy tree.

Modify xfs_trim_extents to use the same asynchronous discard
mechanism backed by busy extents as is used with online discard.
This results in the AGF only needing to be held for short periods of
time and it is never held while we issue discards. Hence if discard
submission gets throttled because it is slow and/or there are lots
of them, we aren't preventing other operations from being performed
on the AGF while we wait for discards to complete...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

89cfa899

Dave Chinner

Merge tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

e5a710d1

Linus Torvalds

x86/purgatory: Remove LTO flags

75b2f7e4

Song Liu

Merge tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

8f633369

Linus Torvalds

Crash: add lock to serialize crash hotplug handling

e2a8f20d

Baoquan He

modpost: Don't let "driver"s reference .exit.*

f177cd0c

Uwe Kleine-König

Merge tag 'gpio-fixes-for-v6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

22823378

Linus Torvalds

dt-bindings: trivial-devices: Fix MEMSIC MXC4005 compatible string

19007c62

Luca Ceresoli

staging: media: tegra-video: fix infinite recursion regression

6bd01c42

Luca Ceresoli

xfs: move log discard work to xfs_discard.c

428c4435

Dave Chinner

Merge tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

e54ca3c8

Linus Torvalds

x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain

108af4b4

Ricardo Neri

x86/boot/compressed: Reserve more memory for page tables

f530ee95

Kirill A. Shutemov

Merge tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

3abd15e2

Linus Torvalds

misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe

0e4cac55

Ricky WU

selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error

bbe246f8

Juntong Deng

vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros

15e86643

Masahiro Yamada

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

8fea9f8f

Linus Torvalds

gpio: aspeed: fix the GPIO number passed to pinctrl_gpio_set_config()

f9315f17

Bartosz Golaszewski

dt-bindings: PCI: brcm,iproc-pcie: Fix 'msi' child node schema

d6e201f8

Rob Herring

media: pci: intel: ivsc: select V4L2_FWNODE

90d3c11a

Arnd Bergmann

Merge tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

99a73f9e

Linus Torvalds

objtool: Fix _THIS_IP_ detection for cold functions

Cold functions and their non-cold counterparts can use _THIS_IP_ to
reference each other. Don't warn about !ENDBR in that case.

Note that for GCC this is currently irrelevant in light of the following
commit

c27cd083cfb9 ("Compiler attributes: GCC cold function alignment workarounds")

which disabled cold functions in the kernel. However this may still be
possible with Clang.

Fixes several warnings like the following:

drivers/scsi/bnx2i/bnx2i.prelink.o: warning: objtool: bnx2i_hw_ep_disconnect+0x19d: relocation to !ENDBR: bnx2i_hw_ep_disconnect.cold+0x0
drivers/net/ipvlan/ipvlan.prelink.o: warning: objtool: ipvlan_addr4_event.cold+0x28: relocation to !ENDBR: ipvlan_addr4_event+0xda
drivers/net/ipvlan/ipvlan.prelink.o: warning: objtool: ipvlan_addr6_event.cold+0x26: relocation to !ENDBR: ipvlan_addr6_event+0xb7
drivers/net/ethernet/broadcom/tg3.prelink.o: warning: objtool: tg3_set_ringparam.cold+0x17: relocation to !ENDBR: tg3_set_ringparam+0x115
drivers/net/ethernet/broadcom/tg3.prelink.o: warning: objtool: tg3_self_test.cold+0x17: relocation to !ENDBR: tg3_self_test+0x2e1
drivers/target/iscsi/cxgbit/cxgbit.prelink.o: warning: objtool: __cxgbit_free_conn.cold+0x24: relocation to !ENDBR: __cxgbit_free_conn+0xfb
net/can/can.prelink.o: warning: objtool: can_rx_unregister.cold+0x2c: relocation to !ENDBR: can_rx_unregister+0x11b
drivers/net/ethernet/qlogic/qed/qed.prelink.o: warning: objtool: qed_spq_post+0xc0: relocation to !ENDBR: qed_spq_post.cold+0x9a
drivers/net/ethernet/qlogic/qed/qed.prelink.o: warning: objtool: qed_iwarp_ll2_comp_syn_pkt.cold+0x12f: relocation to !ENDBR: qed_iwarp_ll2_comp_syn_pkt+0x34b
net/tipc/tipc.prelink.o: warning: objtool: tipc_nametbl_publish.cold+0x21: relocation to !ENDBR: tipc_nametbl_publish+0xa6

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/d8f1ab6a23a6105bc023c132b105f245c7976be6.1694476559.git.jpoimboe@kernel.org

72178d5d

Josh Poimboeuf

sched/fair: Fix SMT4 group_smt_balance handling

450e7497

Tim Chen