commits

Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time. In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before finally going OOM. In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a8b
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse. Systems at or near an OOM state that cannot
be recovered must reach OOM quickly and memcg should kill tasks if a
memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback. If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly. Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.
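
The stall conditions listed above can be sketched as a single predicate. This is a simplified userspace illustration, not the code from the patch: struct reclaim_state, its fields and noprogress_stall_ticks() are hypothetical names invented here, and the real mm/vmscan.c logic is spread across several functions and checks more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the state the kernel consults. */
struct reclaim_state {
	unsigned long nr_reclaimed;  /* pages reclaimed so far by this scan */
	bool first_zone;             /* first zone in the zonelist? */
	bool kswapd_active;          /* false once kswapd gave up after repeated failures */
	bool writeback_excessive;    /* excessive pages pending for writeback */
};

/*
 * Returns how many ticks to stall for the "no progress" case:
 * 1 tick when every condition from the description holds, 0 otherwise.
 */
long noprogress_stall_ticks(const struct reclaim_state *rs)
{
	assert(rs != NULL);

	if (!rs->first_zone)
		return 0;	/* only stall for the first zone */
	if (!rs->kswapd_active)
		return 0;	/* kswapd stopped: let OOM trigger quickly */
	if (rs->nr_reclaimed != 0)
		return 0;	/* some progress was made */
	if (!rs->writeback_excessive)
		return 0;	/* nothing to wait for */

	return 1;		/* timeout reduced to a single tick */
}
```

Keeping every early return at 0 reflects the intent stated above: any doubt about whether waiting will help resolves in favour of not stalling, so an unrecoverable system reaches OOM quickly.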

Alexey's original case was the most straightforward:

for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily. After the patch, the test
completes in a few seconds, similar to 5.15.

Alexey's second test case added watching a youtube video while tail runs
10 times. On 5.15, playback only jitters slightly, whereas 5.16-rc1 stalls
a lot, with many missing frames and numerous audio glitches. With this
patch applied, the video plays similarly to 5.15.

[lkp@intel.com: Fix W=1 build warning]

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/
Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4y ago

Pavel Skripkin

9f3ccdc3

Input: appletouch - initialize work before device registration

4y ago

Linus Torvalds

e8ffcd3a

Merge tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Adrian Hunter

a78abde2

perf intel-pt: Fix parsing of VM time correlation arguments

4y ago

Linus Torvalds

f87bcc88

Merge branch 'akpm' (patches from Andrew)

4y ago

Johnny Chuang

4ebfee2b

Input: elants_i2c - do not check Remark ID on eKTH3900/eKTH5312

4y ago

Linus Torvalds

2afa90bd

Merge tag 'objtool_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Andrew Cooper

57690554

x86/pkey: Fix undefined behaviour with PKRU_WD_BIT

4y ago

Miaoqian Lin

9f3c16a4

perf expr: Fix return value of ids__new()

4y ago

Linus Torvalds

e46227bf

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

4y ago

SeongJae Park

ebb3f994

mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'

4y ago

José Expósito

12f247ab

Input: atmel_mxt_ts - fix double free in mxt_read_info_block

4y ago

Linus Torvalds

43864519

Merge tag 'pinctrl-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

4y ago

Josh Poimboeuf

dcce50e6

compiler.h: Fix annotation macro misplacement with Clang

4y ago

Mike Rapoport

2f5b3514

x86/boot: Move EFI range reservation after cmdline parsing

4y ago

Linus Torvalds

ecf71de7

Merge tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux

4y ago

Linus Torvalds

4f3d93c6

Merge tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"This is a bit bigger than I'd like, however it has two weeks of amdgpu
fixes in it, since they missed last week, which was very small.

The nouveau regression is probably the biggest fix in here, and it
needs to go into 5.15 as well, two i915 fixes, and then a scattering
of amdgpu fixes. The biggest fix in there is for a fencing NULL
pointer dereference, the rest are pretty minor.

For the misc team, I've pulled the two misc fixes manually since I'm
not sure what is happening at this time of year!

The amdgpu maintainers have the outstanding runpm regression to fix
still, they are just working through the last bits of it now.

Summary:

nouveau:
- fencing regression fix

i915:
- Fix possible uninitialized variable
- Fix composite fence seqno increment on each fence creation

amdgpu:
- Fencing fix
- XGMI fix
- VCN regression fix
- IP discovery regression fixes
- Fix runpm documentation
- Suspend/resume fixes
- Yellow Carp display fixes
- MCLK power management fix
- dma-buf fix"

* tag 'drm-fixes-2021-12-31' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
drm/amd/display: Set optimize_pwr_state for DCN31
drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
drm/amd/display: Added power down for DCN10
drm/amd/display: fix B0 TMDS deepcolor no dislay issue
drm/amdgpu: no DC support for headless chips
drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
drm/amdgpu: always reset the asic in suspend (v2)
drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
drm/i915: Increment composite fence seqno
drm/i915: Fix possible uninitialized variable in parallel extension
drm/amdgpu: fix runpm documentation
drm/nouveau: wait for the exclusive fence after the shared ones v2
drm/amdgpu: add support for IP discovery gc_info table v2
drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
drm/amd/pm: Fix xgmi link control on aldebaran
drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence
drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify

4y ago

Alexey Makhalov

142c779d

scsi: vmw_pvscsi: Set residual data length conditionally

4y ago

Mike Kravetz

f5c73297

userfaultfd/selftests: fix hugetlb area allocations

Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh
or any environment where there are 'just enough' hugetlb pages will
always fail with:

testing events (fork, remap, remove):
ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)

The ENOMEM error code implies there are not enough hugetlb pages.
However, there are free hugetlb pages but they are all reserved. There
is a basic problem with the way the test allocates hugetlb pages which
has existed since the test was originally written.

Due to the way 'cleanup' was done between different phases of the test,
this issue was masked until recently. The issue was uncovered by commit
8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each
test").

For the hugetlb test, src and dst areas are allocated as PRIVATE
mappings of a hugetlb file. This means that at mmap time, pages are
reserved for the src and dst areas. At the start of event testing (and
other tests) the src area is populated which results in allocation of
huge pages to fill the area and consumption of reserves associated with
the area. Then, a child is forked to fault in the dst area. Note that
the dst area was allocated in the parent and hence the parent owns the
reserves associated with the mapping. The child has normal access to
the dst area, but cannot use the reserves created/owned by the parent.
Thus, if there are no other huge pages available, allocation of a page
for the dst by the child will fail.

Fix by not creating reserves for the dst area. In this way the child
can use free (non-reserved) pages.

Also, MAP_PRIVATE of a file only makes sense if you are interested in
the contents of the file before making a COW copy. The test does not do
this. So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
hugetlb mapping. There is no need to create a hugetlb file in the
non-shared case.
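
A minimal, Linux-only sketch of the mapping described above follows. The helper names hugetlb_private_flags() and alloc_hugetlb_area() are hypothetical, and using MAP_NORESERVE to avoid creating reserves for the dst area is an assumption about how "not creating reserves" is expressed; the selftest's own code may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: flags for a private, anonymous hugetlb mapping.
 * Assumption: the dst area skips reservations via MAP_NORESERVE so that
 * a forked child can fault it in from free (non-reserved) huge pages.
 */
int hugetlb_private_flags(bool is_dst)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;

	if (is_dst)
		flags |= MAP_NORESERVE;
	return flags;
}

/*
 * Hypothetical helper: map len bytes (a multiple of the huge page size)
 * without creating a hugetlb file. Returns MAP_FAILED on error.
 */
void *alloc_hugetlb_area(size_t len, bool is_dst)
{
	assert(len > 0);
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    hugetlb_private_flags(is_dst), -1, 0);
}
```

A caller would check the result against MAP_FAILED, since the mapping can fail when no huge pages are configured on the system.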

Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4y ago

José Expósito

3fd6e12a

Input: goodix - fix memory leak in goodix_firmware_upload

4y ago

Linus Torvalds

e2ae0d4a

Merge tag 'hwmon-for-v5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

4y ago

Fabien Dessenne

b67210cc

pinctrl: stm32: consider the GPIO offset to expose all the GPIO lines

4y ago

Ismael Luceno

cb8747b7

uapi: Fix undefined __always_inline on non-glibc systems

4y ago

Borislav Petkov

fbe61839

Revert "x86/boot: Pull up cmdline preparation and early param parsing"

4y ago

Linus Torvalds

f651faaa

Merge tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

4y ago

Luiz Sampaio

4daa9ff8

auxdisplay: charlcd: checking for pointer reference before dereferencing

4y ago

Christian Brauner

012e3322

fs/mount_setattr: always cleanup mount_kattr

4y ago

Dave Airlie

ce9b333c

Merge branch 'drm-misc-fixes' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes

4y ago

Lixiaokeng

1b8d0300

scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()

4y ago

Hans de Goede

81e81886

Input: goodix - add id->model mapping for the "9111" model

4y ago

Linus Torvalds

5b5e3d03

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

4y ago

Guenter Roeck

cdc5287a

hwmon: (lm90) Do not report 'busy' status bit as alarm

4y ago

Phil Elwell

266423e6

pinctrl: bcm2835: Change init order for gpio hogs

4y ago

Linus Torvalds

a7904a53

Linux 5.16-rc6 v5.16-rc6

4y ago

Borislav Petkov

58e138d6

Revert "x86/boot: Mark prepare_command_line() __init"

4y ago

Linus Torvalds

a8ad9a24

Merge tag 'efi-urgent-for-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

4y ago

Michael Ellerman

8d84fca4

powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion

4y ago

Luiz Sampaio

94047df1

auxdisplay: charlcd: fixing coding style issue

4y ago

Linus Torvalds

74c78b42

Merge tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from.. Santa?

No regressions on our radar at this point. The igc problem fixed here
was the last one I was tracking but it was broken in previous
releases, anyway. Mostly driver fixes and a couple of largish SMC
fixes.

Current release - regressions:

- xsk: initialise xskb free_list_node, fixup for a -rc7 fix

Current release - new code bugs:

- mlx5: handful of minor fixes:

- use first online CPU instead of hard coded CPU

- fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'

- fix skb memory leak when TC classifier action offloads are disabled

- fix memory leak with rules with internal OvS port

Previous releases - regressions:

- igc: do not enable crosstimestamping for i225-V models

Previous releases - always broken:

- udp: use datalen to cap ipv6 udp max gso segments

- fix use-after-free in tw_timer_handler due to early free of stats

- smc: fix kernel panic caused by race of smc_sock

- smc: don't send CDC/LLC message if link not ready, avoid timeouts

- sctp: use call_rcu to free endpoint, avoid UAF in sock diag

- bridge: mcast: add and enforce query interval minimum

- usb: pegasus: do not drop long Ethernet frames

- mlx5e: fix ICOSQ recovery flow for XSK

- nfc: uapi: use kernel size_t to fix user-space builds"

* tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
fsl/fman: Fix missing put_device() call in fman_port_probe
selftests: net: using ping6 for IPv6 in udpgro_fwd.sh
Documentation: fix outdated interpretation of ip_no_pmtu_disc
net/ncsi: check for error return from call to nla_put_u32
net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
net: fix use-after-free in tw_timer_handler
selftests: net: Fix a typo in udpgro_fwd.sh
selftests/net: udpgso_bench_tx: fix dst ip argument
net: bridge: mcast: add and enforce startup query interval minimum
net: bridge: mcast: add and enforce query interval minimum
ipv6: raw: check passed optlen before reading
xsk: Initialise xskb free_list_node
net/mlx5e: Fix wrong features assignment in case of error
net/mlx5e: TC, Fix memory leak with rules with internal port
ionic: Initialize the 'lif->dbid_inuse' bitmap
igc: Fix TX timestamp support for non-MSI-X platforms
igc: Do not enable crosstimestamping for i225-V models
net/smc: fix kernel panic caused by race of smc_sock
net/smc: don't send CDC/LLC message if link not ready
NFC: st21nfca: Fix memory leak in device probe and remove
...