Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.

+607 -372
+3
.mailmap
···
 Baolin Wang <baolin.wang@linux.alibaba.com> <baolin.wang7@gmail.com>
 Bart Van Assche <bvanassche@acm.org> <bart.vanassche@sandisk.com>
 Bart Van Assche <bvanassche@acm.org> <bart.vanassche@wdc.com>
+Ben Dooks <ben-linux@fluff.org> <ben.dooks@simtec.co.uk>
+Ben Dooks <ben-linux@fluff.org> <ben.dooks@sifive.com>
 Ben Gardner <bgardner@wabtec.com>
 Ben M Cahill <ben.m.cahill@intel.com>
 Ben Widawsky <bwidawsk@kernel.org> <ben@bwidawsk.net>
···
 Johan Hovold <johan@kernel.org> <jhovold@gmail.com>
 Johan Hovold <johan@kernel.org> <johan@hovoldconsulting.com>
 John Crispin <john@phrozen.org> <blogic@openwrt.org>
+John Keeping <john@keeping.me.uk> <john@metanate.com>
 John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
 John Stultz <johnstul@us.ibm.com>
 <jon.toppins+linux@gmail.com> <jtoppins@cumulusnetworks.com>
+32 -32
Documentation/trace/histogram.rst
···
 in place of an explicit value field - this is simply a count of
 event hits. If 'values' isn't specified, an implicit 'hitcount'
 value will be automatically created and used as the only value.
-Keys can be any field, or the special string 'stacktrace', which
+Keys can be any field, or the special string 'common_stacktrace', which
 will use the event's kernel stacktrace as the key. The keywords
 'keys' or 'key' can be used to specify keys, and the keywords
 'values', 'vals', or 'val' can be used to specify values. Compound
···
 'compatible' if the fields named in the trigger share the same
 number and type of fields and those fields also have the same names.
 Note that any two events always share the compatible 'hitcount' and
-'stacktrace' fields and can therefore be combined using those
+'common_stacktrace' fields and can therefore be combined using those
 fields, however pointless that may be.
 
 'hist' triggers add a 'hist' file to each event's subdirectory.
···
 the hist trigger display symbolic call_sites, we can have the hist
 trigger additionally display the complete set of kernel stack traces
 that led to each call_site. To do that, we simply use the special
-value 'stacktrace' for the key parameter::
+value 'common_stacktrace' for the key parameter::
 
-  # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
+  # echo 'hist:keys=common_stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
          /sys/kernel/tracing/events/kmem/kmalloc/trigger
 
 The above trigger will use the kernel stack trace in effect when an
···
 every callpath to a kmalloc for a kernel compile)::
 
   # cat /sys/kernel/tracing/events/kmem/kmalloc/hist
-  # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
+  # trigger info: hist:keys=common_stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
 
-  { stacktrace:
+  { common_stacktrace:
          __kmalloc_track_caller+0x10b/0x1a0
          kmemdup+0x20/0x50
          hidraw_report_event+0x8a/0x120 [hid]
···
          cpu_startup_entry+0x315/0x3e0
          rest_init+0x7c/0x80
   } hitcount: 3  bytes_req: 21  bytes_alloc: 24
-  { stacktrace:
+  { common_stacktrace:
          __kmalloc_track_caller+0x10b/0x1a0
          kmemdup+0x20/0x50
          hidraw_report_event+0x8a/0x120 [hid]
···
          do_IRQ+0x5a/0xf0
          ret_from_intr+0x0/0x30
   } hitcount: 3  bytes_req: 21  bytes_alloc: 24
-  { stacktrace:
+  { common_stacktrace:
          kmem_cache_alloc_trace+0xeb/0x150
          aa_alloc_task_context+0x27/0x40
          apparmor_cred_prepare+0x1f/0x50
···
   .
   .
   .
-  { stacktrace:
+  { common_stacktrace:
          __kmalloc+0x11b/0x1b0
          i915_gem_execbuffer2+0x6c/0x2c0 [i915]
          drm_ioctl+0x349/0x670 [drm]
···
          SyS_ioctl+0x81/0xa0
          system_call_fastpath+0x12/0x6a
   } hitcount: 17726  bytes_req: 13944120  bytes_alloc: 19593808
-  { stacktrace:
+  { common_stacktrace:
          __kmalloc+0x11b/0x1b0
          load_elf_phdrs+0x76/0xa0
          load_elf_binary+0x102/0x1650
···
          SyS_execve+0x3a/0x50
          return_from_execve+0x0/0x23
   } hitcount: 33348  bytes_req: 17152128  bytes_alloc: 20226048
-  { stacktrace:
+  { common_stacktrace:
          kmem_cache_alloc_trace+0xeb/0x150
          apparmor_file_alloc_security+0x27/0x40
          security_file_alloc+0x16/0x20
···
          SyS_open+0x1e/0x20
          system_call_fastpath+0x12/0x6a
   } hitcount: 4766422  bytes_req: 9532844  bytes_alloc: 38131376
-  { stacktrace:
+  { common_stacktrace:
          __kmalloc+0x11b/0x1b0
          seq_buf_alloc+0x1b/0x50
          seq_read+0x2cc/0x370
···
 First we set up an initially paused stacktrace trigger on the
 netif_receive_skb event::
 
-  # echo 'hist:key=stacktrace:vals=len:pause' > \
+  # echo 'hist:key=common_stacktrace:vals=len:pause' > \
          /sys/kernel/tracing/events/net/netif_receive_skb/trigger
 
 Next, we set up an 'enable_hist' trigger on the sched_process_exec
···
   $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
 
   # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist
-  # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+  # trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused]
 
-  { stacktrace:
+  { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          netif_receive_skb_internal+0x23/0x90
···
          kthread+0xd2/0xf0
          ret_from_fork+0x42/0x70
   } hitcount: 85  len: 28884
-  { stacktrace:
+  { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          netif_receive_skb_internal+0x23/0x90
···
          irq_thread+0x11f/0x150
          kthread+0xd2/0xf0
   } hitcount: 98  len: 664329
-  { stacktrace:
+  { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          process_backlog+0xa8/0x150
···
          inet_sendmsg+0x64/0xa0
          sock_sendmsg+0x3d/0x50
   } hitcount: 115  len: 13030
-  { stacktrace:
+  { common_stacktrace:
          __netif_receive_skb_core+0x46d/0x990
          __netif_receive_skb+0x18/0x60
          netif_receive_skb_internal+0x23/0x90
···
 into the histogram. In order to avoid having to set everything up
 again, we can just clear the histogram first::
 
-  # echo 'hist:key=stacktrace:vals=len:clear' >> \
+  # echo 'hist:key=common_stacktrace:vals=len:clear' >> \
          /sys/kernel/tracing/events/net/netif_receive_skb/trigger
 
 Just to verify that it is in fact cleared, here's what we now see in
 the hist file::
 
   # cat /sys/kernel/tracing/events/net/netif_receive_skb/hist
-  # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+  # trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused]
 
   Totals:
       Hits: 0
···
 
 And here's an example that shows how to combine histogram data from
 any two events even if they don't share any 'compatible' fields
-other than 'hitcount' and 'stacktrace'. These commands create a
+other than 'hitcount' and 'common_stacktrace'. These commands create a
 couple of triggers named 'bar' using those fields::
 
-  # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
+  # echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \
          /sys/kernel/tracing/events/sched/sched_process_fork/trigger
-  # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
+  # echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \
          /sys/kernel/tracing/events/net/netif_rx/trigger
 
 And displaying the output of either shows some interesting if
···
 
   # event histogram
   #
-  # trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
+  # trigger info: hist:name=bar:keys=common_stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
   #
 
-  { stacktrace:
+  { common_stacktrace:
          kernel_clone+0x18e/0x330
          kernel_thread+0x29/0x30
          kthreadd+0x154/0x1b0
          ret_from_fork+0x3f/0x70
   } hitcount: 1
-  { stacktrace:
+  { common_stacktrace:
          netif_rx_internal+0xb2/0xd0
          netif_rx_ni+0x20/0x70
          dev_loopback_xmit+0xaa/0xd0
···
          call_cpuidle+0x3b/0x60
          cpu_startup_entry+0x22d/0x310
   } hitcount: 1
-  { stacktrace:
+  { common_stacktrace:
          netif_rx_internal+0xb2/0xd0
          netif_rx_ni+0x20/0x70
          dev_loopback_xmit+0xaa/0xd0
···
          SyS_sendto+0xe/0x10
          entry_SYSCALL_64_fastpath+0x12/0x6a
   } hitcount: 2
-  { stacktrace:
+  { common_stacktrace:
          netif_rx_internal+0xb2/0xd0
          netif_rx+0x1c/0x60
          loopback_xmit+0x6c/0xb0
···
          sock_sendmsg+0x38/0x50
          ___sys_sendmsg+0x14e/0x270
   } hitcount: 76
-  { stacktrace:
+  { common_stacktrace:
          netif_rx_internal+0xb2/0xd0
          netif_rx+0x1c/0x60
          loopback_xmit+0x6c/0xb0
···
          sock_sendmsg+0x38/0x50
          ___sys_sendmsg+0x269/0x270
   } hitcount: 77
-  { stacktrace:
+  { common_stacktrace:
          netif_rx_internal+0xb2/0xd0
          netif_rx+0x1c/0x60
          loopback_xmit+0x6c/0xb0
···
          sock_sendmsg+0x38/0x50
          SYSC_sendto+0xef/0x170
   } hitcount: 88
-  { stacktrace:
+  { common_stacktrace:
          kernel_clone+0x18e/0x330
          SyS_clone+0x19/0x20
          entry_SYSCALL_64_fastpath+0x12/0x6a
···
 
   # cd /sys/kernel/tracing
   # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
-  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
+  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=common_stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
   # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
   # echo 1 > events/synthetic/block_lat/enable
   # cat trace
+5
arch/powerpc/purgatory/Makefile
···
 
 targets += trampoline_$(BITS).o purgatory.ro
 
+# When profile-guided optimization is enabled, llvm emits two different
+# overlapping text sections, which is not supported by kexec. Remove profile
+# optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
+
 LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined
 
 $(obj)/purgatory.ro: $(obj)/trampoline_$(BITS).o FORCE
+5
arch/riscv/purgatory/Makefile
···
 CFLAGS_string.o := -D__DISABLE_EXPORTS
 CFLAGS_ctype.o := -D__DISABLE_EXPORTS
 
+# When profile-guided optimization is enabled, llvm emits two different
+# overlapping text sections, which is not supported by kexec. Remove profile
+# optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
+
 # When linking purgatory.ro with -r unresolved symbols are not checked,
 # also link a purgatory.chk binary without -r to check for unresolved symbols.
 PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
-2
arch/x86/crypto/aria-aesni-avx-asm_64.S
···
 .octa 0x3F893781E95FE1576CDA64D2BA0CB204
 
 #ifdef CONFIG_AS_GFNI
-.section	.rodata.cst8, "aM", @progbits, 8
-.align 8
 /* AES affine: */
 #define tf_aff_const BV8(1, 1, 0, 0, 0, 1, 1, 0)
 .Ltf_aff_bitmatrix:
+5
arch/x86/purgatory/Makefile
···
 
 CFLAGS_sha256.o := -D__DISABLE_EXPORTS
 
+# When profile-guided optimization is enabled, llvm emits two different
+# overlapping text sections, which is not supported by kexec. Remove profile
+# optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
+
 # When linking purgatory.ro with -r unresolved symbols are not checked,
 # also link a purgatory.chk binary without -r to check for unresolved symbols.
 PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
+6 -41
drivers/dma-buf/udmabuf.c
···
 #include <linux/shmem_fs.h>
 #include <linux/slab.h>
 #include <linux/udmabuf.h>
-#include <linux/hugetlb.h>
 #include <linux/vmalloc.h>
 #include <linux/iosys-map.h>
···
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
 	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-	struct page *page, *hpage = NULL;
-	pgoff_t subpgoff, maxsubpgs;
-	struct hstate *hpstate;
+	struct page *page;
 	int seals, ret = -EINVAL;
 	u32 i, flags;
···
 		if (!memfd)
 			goto err;
 		mapping = memfd->f_mapping;
-		if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
+		if (!shmem_mapping(mapping))
 			goto err;
 		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
 		if (seals == -EINVAL)
···
 			goto err;
 		pgoff = list[i].offset >> PAGE_SHIFT;
 		pgcnt = list[i].size >> PAGE_SHIFT;
-		if (is_file_hugepages(memfd)) {
-			hpstate = hstate_file(memfd);
-			pgoff = list[i].offset >> huge_page_shift(hpstate);
-			subpgoff = (list[i].offset &
-				    ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
-			maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
-		}
 		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-			if (is_file_hugepages(memfd)) {
-				if (!hpage) {
-					hpage = find_get_page_flags(mapping, pgoff,
-								    FGP_ACCESSED);
-					if (!hpage) {
-						ret = -EINVAL;
-						goto err;
-					}
-				}
-				page = hpage + subpgoff;
-				get_page(page);
-				subpgoff++;
-				if (subpgoff == maxsubpgs) {
-					put_page(hpage);
-					hpage = NULL;
-					subpgoff = 0;
-					pgoff++;
-				}
-			} else {
-				page = shmem_read_mapping_page(mapping,
-							       pgoff + pgidx);
-				if (IS_ERR(page)) {
-					ret = PTR_ERR(page);
-					goto err;
-				}
+			page = shmem_read_mapping_page(mapping, pgoff + pgidx);
+			if (IS_ERR(page)) {
+				ret = PTR_ERR(page);
+				goto err;
 			}
 			ubuf->pages[pgbuf++] = page;
 		}
 		fput(memfd);
 		memfd = NULL;
-		if (hpage) {
-			put_page(hpage);
-			hpage = NULL;
-		}
 	}
 
 	exp_info.ops = &udmabuf_ops;
+10 -7
drivers/dma/at_hdmac.c
···
 #define ATC_DST_PIP		BIT(12)		/* Destination Picture-in-Picture enabled */
 #define ATC_SRC_DSCR_DIS	BIT(16)		/* Src Descriptor fetch disable */
 #define ATC_DST_DSCR_DIS	BIT(20)		/* Dst Descriptor fetch disable */
-#define ATC_FC			GENMASK(22, 21)	/* Choose Flow Controller */
+#define ATC_FC			GENMASK(23, 21)	/* Choose Flow Controller */
 #define ATC_FC_MEM2MEM		0x0		/* Mem-to-Mem (DMA) */
 #define ATC_FC_MEM2PER		0x1		/* Mem-to-Periph (DMA) */
 #define ATC_FC_PER2MEM		0x2		/* Periph-to-Mem (DMA) */
···
 #define ATC_AUTO		BIT(31)		/* Auto multiple buffer tx enable */
 
 /* Bitfields in CFG */
-#define ATC_PER_MSB(h)	((0x30U & (h)) >> 4)	/* Extract most significant bits of a handshaking identifier */
-
 #define ATC_SRC_PER		GENMASK(3, 0)	/* Channel src rq associated with periph handshaking ifc h */
 #define ATC_DST_PER		GENMASK(7, 4)	/* Channel dst rq associated with periph handshaking ifc h */
 #define ATC_SRC_REP		BIT(8)		/* Source Replay Mod */
···
 #define ATC_DPIP_HOLE		GENMASK(15, 0)
 #define ATC_DPIP_BOUNDARY	GENMASK(25, 16)
 
-#define ATC_SRC_PER_ID(id)	(FIELD_PREP(ATC_SRC_PER_MSB, (id)) |	\
-				 FIELD_PREP(ATC_SRC_PER, (id)))
-#define ATC_DST_PER_ID(id)	(FIELD_PREP(ATC_DST_PER_MSB, (id)) |	\
-				 FIELD_PREP(ATC_DST_PER, (id)))
+#define ATC_PER_MSB		GENMASK(5, 4)	/* Extract MSBs of a handshaking identifier */
+#define ATC_SRC_PER_ID(id)						\
+	({ typeof(id) _id = (id);					\
+	   FIELD_PREP(ATC_SRC_PER_MSB, FIELD_GET(ATC_PER_MSB, _id)) |	\
+	   FIELD_PREP(ATC_SRC_PER, _id); })
+#define ATC_DST_PER_ID(id)						\
+	({ typeof(id) _id = (id);					\
+	   FIELD_PREP(ATC_DST_PER_MSB, FIELD_GET(ATC_PER_MSB, _id)) |	\
+	   FIELD_PREP(ATC_DST_PER, _id); })
+5 -2
drivers/dma/at_xdmac.c
···
 					       NULL,
 					       src_addr, dst_addr,
 					       xt, xt->sgl);
+		if (!first)
+			return NULL;
 
 		/* Length of the block is (BLEN+1) microblocks. */
 		for (i = 0; i < xt->numf - 1; i++)
···
 						       src_addr, dst_addr,
 						       xt, chunk);
 			if (!desc) {
-				list_splice_tail_init(&first->descs_list,
-						      &atchan->free_descs_list);
+				if (first)
+					list_splice_tail_init(&first->descs_list,
+							      &atchan->free_descs_list);
 				return NULL;
 			}
-1
drivers/dma/idxd/cdev.c
···
 	if (wq_dedicated(wq)) {
 		rc = idxd_wq_set_pasid(wq, pasid);
 		if (rc < 0) {
-			iommu_sva_unbind_device(sva);
 			dev_err(dev, "wq set pasid failed: %d\n", rc);
 			goto failed_set_pasid;
 		}
+4 -4
drivers/dma/pl330.c
···
 	return true;
 }
 
-static bool _start(struct pl330_thread *thrd)
+static bool pl330_start_thread(struct pl330_thread *thrd)
 {
 	switch (_state(thrd)) {
 	case PL330_STATE_FAULT_COMPLETING:
···
 		thrd->req_running = -1;
 
 		/* Get going again ASAP */
-		_start(thrd);
+		pl330_start_thread(thrd);
 
 		/* For now, just make a list of callbacks to be done */
 		list_add_tail(&descdone->rqd, &pl330->req_done);
···
 	} else {
 		/* Make sure the PL330 Channel thread is active */
 		spin_lock(&pch->thread->dmac->lock);
-		_start(pch->thread);
+		pl330_start_thread(pch->thread);
 		spin_unlock(&pch->thread->dmac->lock);
 	}
 
···
 		if (power_down) {
 			pch->active = true;
 			spin_lock(&pch->thread->dmac->lock);
-			_start(pch->thread);
+			pl330_start_thread(pch->thread);
 			spin_unlock(&pch->thread->dmac->lock);
 			power_down = false;
 		}
+2 -2
drivers/dma/ti/k3-udma.c
···
 	return ret;
 }
 
-static int udma_pm_suspend(struct device *dev)
+static int __maybe_unused udma_pm_suspend(struct device *dev)
 {
 	struct udma_dev *ud = dev_get_drvdata(dev);
 	struct dma_device *dma_dev = &ud->ddev;
···
 	return 0;
 }
 
-static int udma_pm_resume(struct device *dev)
+static int __maybe_unused udma_pm_resume(struct device *dev)
 {
 	struct udma_dev *ud = dev_get_drvdata(dev);
 	struct dma_device *dma_dev = &ud->ddev;
+1 -1
drivers/md/dm-cache-metadata.c
···
 	 * Replacement block manager (new_bm) is created and old_bm destroyed outside of
 	 * cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
 	 * shrinker associated with the block manager's bufio client vs cmd root_lock).
-	 * - must take shrinker_mutex without holding cmd->root_lock
+	 * - must take shrinker_rwsem without holding cmd->root_lock
 	 */
 	new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
 					 CACHE_MAX_CONCURRENT_LOCKS);
+1 -1
drivers/md/dm-thin-metadata.c
···
 	 * Replacement block manager (new_bm) is created and old_bm destroyed outside of
 	 * pmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
 	 * shrinker associated with the block manager's bufio client vs pmd root_lock).
-	 * - must take shrinker_mutex without holding pmd->root_lock
+	 * - must take shrinker_rwsem without holding pmd->root_lock
 	 */
 	new_bm = dm_block_manager_create(pmd->bdev, THIN_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
 					 THIN_MAX_CONCURRENT_LOCKS);
+1 -1
drivers/phy/amlogic/phy-meson-g12a-mipi-dphy-analog.c
···
 			   HHI_MIPI_CNTL1_BANDGAP);
 
 	regmap_write(priv->regmap, HHI_MIPI_CNTL2,
-		     FIELD_PREP(HHI_MIPI_CNTL2_DIF_TX_CTL0, 0x459) |
+		     FIELD_PREP(HHI_MIPI_CNTL2_DIF_TX_CTL0, 0x45a) |
 		     FIELD_PREP(HHI_MIPI_CNTL2_DIF_TX_CTL1, 0x2680));
 
 	reg = DSI_LANE_CLK;
+5 -5
drivers/phy/mediatek/phy-mtk-hdmi-mt8195.c
···
 	 */
 	if (tmds_clk < 54 * MEGA)
 		txposdiv = 8;
-	else if (tmds_clk >= 54 * MEGA && tmds_clk < 148.35 * MEGA)
+	else if (tmds_clk >= 54 * MEGA && (tmds_clk * 100) < 14835 * MEGA)
 		txposdiv = 4;
-	else if (tmds_clk >= 148.35 * MEGA && tmds_clk < 296.7 * MEGA)
+	else if ((tmds_clk * 100) >= 14835 * MEGA && (tmds_clk * 10) < 2967 * MEGA)
 		txposdiv = 2;
-	else if (tmds_clk >= 296.7 * MEGA && tmds_clk <= 594 * MEGA)
+	else if ((tmds_clk * 10) >= 2967 * MEGA && tmds_clk <= 594 * MEGA)
 		txposdiv = 1;
 	else
 		return -EINVAL;
···
 		clk_channel_bias = 0x34;	/* 20mA */
 		impedance_en = 0xf;
 		impedance = 0x36;	/* 100ohm */
-	} else if (pixel_clk >= 74.175 * MEGA && pixel_clk <= 300 * MEGA) {
+	} else if (((u64)pixel_clk * 1000) >= 74175 * MEGA && pixel_clk <= 300 * MEGA) {
 		data_channel_bias = 0x34;	/* 20mA */
 		clk_channel_bias = 0x2c;	/* 16mA */
 		impedance_en = 0xf;
 		impedance = 0x36;	/* 100ohm */
-	} else if (pixel_clk >= 27 * MEGA && pixel_clk < 74.175 * MEGA) {
+	} else if (pixel_clk >= 27 * MEGA && ((u64)pixel_clk * 1000) < 74175 * MEGA) {
 		data_channel_bias = 0x14;	/* 10mA */
 		clk_channel_bias = 0x14;	/* 10mA */
 		impedance_en = 0x0;
+3 -2
drivers/phy/qualcomm/phy-qcom-qmp-combo.c
···
 	ret = regulator_bulk_enable(cfg->num_vregs, qmp->vregs);
 	if (ret) {
 		dev_err(qmp->dev, "failed to enable regulators, err=%d\n", ret);
-		goto err_unlock;
+		goto err_decrement_count;
 	}
 
 	ret = reset_control_bulk_assert(cfg->num_resets, qmp->resets);
···
 	reset_control_bulk_assert(cfg->num_resets, qmp->resets);
 err_disable_regulators:
 	regulator_bulk_disable(cfg->num_vregs, qmp->vregs);
-err_unlock:
+err_decrement_count:
+	qmp->init_count--;
 	mutex_unlock(&qmp->phy_mutex);
 
 	return ret;
+3 -2
drivers/phy/qualcomm/phy-qcom-qmp-pcie-msm8996.c
···
 	ret = regulator_bulk_enable(cfg->num_vregs, qmp->vregs);
 	if (ret) {
 		dev_err(qmp->dev, "failed to enable regulators, err=%d\n", ret);
-		goto err_unlock;
+		goto err_decrement_count;
 	}
 
 	ret = reset_control_bulk_assert(cfg->num_resets, qmp->resets);
···
 	reset_control_bulk_assert(cfg->num_resets, qmp->resets);
 err_disable_regulators:
 	regulator_bulk_disable(cfg->num_vregs, qmp->vregs);
-err_unlock:
+err_decrement_count:
+	qmp->init_count--;
 	mutex_unlock(&qmp->phy_mutex);
 
 	return ret;
+1 -1
drivers/phy/qualcomm/phy-qcom-snps-femto-v2.c
···
  *
  * @cfg_ahb_clk: AHB2PHY interface clock
  * @ref_clk: phy reference clock
- * @iface_clk: phy interface clock
  * @phy_reset: phy reset control
  * @vregs: regulator supplies bulk data
  * @phy_initialized: if PHY has been initialized correctly
  * @mode: contains the current mode the PHY is in
+ * @update_seq_cfg: tuning parameters for phy init
  */
 struct qcom_snps_hsphy {
 	struct phy *phy;
+5 -1
fs/eventpoll.c
···
 {
 	int ret = default_wake_function(wq_entry, mode, sync, key);
 
-	list_del_init(&wq_entry->entry);
+	/*
+	 * Pairs with list_empty_careful in ep_poll, and ensures future loop
+	 * iterations see the cause of this wakeup.
+	 */
+	list_del_init_careful(&wq_entry->entry);
 	return ret;
 }
+10 -2
fs/nilfs2/btnode.c
···
 	if (nbh == NULL) {	/* blocksize == pagesize */
 		xa_erase_irq(&btnc->i_pages, newkey);
 		unlock_page(ctxt->bh->b_page);
-	} else
-		brelse(nbh);
+	} else {
+		/*
+		 * When canceling a buffer that a prepare operation has
+		 * allocated to copy a node block to another location, use
+		 * nilfs_btnode_delete() to initialize and release the buffer
+		 * so that the buffer flags will not be in an inconsistent
+		 * state when it is reallocated.
+		 */
+		nilfs_btnode_delete(nbh);
+	}
 }
+9 -1
fs/nilfs2/page.c
···
 			struct folio *folio = fbatch.folios[i];
 
 			folio_lock(folio);
-			nilfs_clear_dirty_page(&folio->page, silent);
+
+			/*
+			 * This folio may have been removed from the address
+			 * space by truncation or invalidation when the lock
+			 * was acquired. Skip processing in that case.
+			 */
+			if (likely(folio->mapping == mapping))
+				nilfs_clear_dirty_page(&folio->page, silent);
+
 			folio_unlock(folio);
 		}
 		folio_batch_release(&fbatch);
+6
fs/nilfs2/segbuf.c
···
 	if (unlikely(!bh))
 		return -ENOMEM;
 
+	lock_buffer(bh);
+	if (!buffer_uptodate(bh)) {
+		memset(bh->b_data, 0, bh->b_size);
+		set_buffer_uptodate(bh);
+	}
+	unlock_buffer(bh);
 	nilfs_segbuf_add_segsum_buffer(segbuf, bh);
 	return 0;
 }
+7
fs/nilfs2/segment.c
···
 	unsigned int isz, srsz;
 
 	bh_sr = NILFS_LAST_SEGBUF(&sci->sc_segbufs)->sb_super_root;
+
+	lock_buffer(bh_sr);
 	raw_sr = (struct nilfs_super_root *)bh_sr->b_data;
 	isz = nilfs->ns_inode_size;
 	srsz = NILFS_SR_BYTES(isz);
 
+	raw_sr->sr_sum = 0;  /* Ensure initialization within this update */
 	raw_sr->sr_bytes = cpu_to_le16(srsz);
 	raw_sr->sr_nongc_ctime
 		= cpu_to_le64(nilfs_doing_gc() ?
···
 	nilfs_write_inode_common(nilfs->ns_sufile, (void *)raw_sr +
 				 NILFS_SR_SUFILE_OFFSET(isz), 1);
 	memset((void *)raw_sr + srsz, 0, nilfs->ns_blocksize - srsz);
+	set_buffer_uptodate(bh_sr);
+	unlock_buffer(bh_sr);
 }
 
 static void nilfs_redirty_inodes(struct list_head *head)
···
 	list_for_each_entry(segbuf, logs, sb_list) {
 		list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
 				    b_assoc_buffers) {
+			clear_buffer_uptodate(bh);
 			if (bh->b_page != bd_page) {
 				if (bd_page)
 					end_page_writeback(bd_page);
···
 				    b_assoc_buffers) {
 			clear_buffer_async_write(bh);
 			if (bh == segbuf->sb_super_root) {
+				clear_buffer_uptodate(bh);
 				if (bh->b_page != bd_page) {
 					end_page_writeback(bd_page);
 					bd_page = bh->b_page;
+9
fs/nilfs2/sufile.c
···
 			goto out_header;
 
 		sui->ncleansegs -= nsegs - newnsegs;
+
+		/*
+		 * If the sufile is successfully truncated, immediately adjust
+		 * the segment allocation space while locking the semaphore
+		 * "mi_sem" so that nilfs_sufile_alloc() never allocates
+		 * segments in the truncated space.
+		 */
+		sui->allocmax = newnsegs - 1;
+		sui->allocmin = 0;
 	}
 
 	kaddr = kmap_atomic(header_bh->b_page);
+23 -2
fs/nilfs2/super.c
···
 		goto out;
 	}
 	nsbp = (void *)nsbh->b_data + offset;
-	memset(nsbp, 0, nilfs->ns_blocksize);
+
+	lock_buffer(nsbh);
+	if (sb2i >= 0) {
+		/*
+		 * The position of the second superblock only changes by 4KiB,
+		 * which is larger than the maximum superblock data size
+		 * (= 1KiB), so there is no need to use memmove() to allow
+		 * overlap between source and destination.
+		 */
+		memcpy(nsbp, nilfs->ns_sbp[sb2i], nilfs->ns_sbsize);
+
+		/*
+		 * Zero fill after copy to avoid overwriting in case of move
+		 * within the same block.
+		 */
+		memset(nsbh->b_data, 0, offset);
+		memset((void *)nsbp + nilfs->ns_sbsize, 0,
+		       nsbh->b_size - offset - nilfs->ns_sbsize);
+	} else {
+		memset(nsbh->b_data, 0, nsbh->b_size);
+	}
+	set_buffer_uptodate(nsbh);
+	unlock_buffer(nsbh);
 
 	if (sb2i >= 0) {
-		memcpy(nsbp, nilfs->ns_sbp[sb2i], nilfs->ns_sbsize);
 		brelse(nilfs->ns_sbh[sb2i]);
 		nilfs->ns_sbh[sb2i] = nsbh;
 		nilfs->ns_sbp[sb2i] = nsbp;
+42 -1
fs/nilfs2/the_nilfs.c
···
 			   100));
 }
 
+/**
+ * nilfs_max_segment_count - calculate the maximum number of segments
+ * @nilfs: nilfs object
+ */
+static u64 nilfs_max_segment_count(struct the_nilfs *nilfs)
+{
+	u64 max_count = U64_MAX;
+
+	do_div(max_count, nilfs->ns_blocks_per_segment);
+	return min_t(u64, max_count, ULONG_MAX);
+}
+
 void nilfs_set_nsegments(struct the_nilfs *nilfs, unsigned long nsegs)
 {
 	nilfs->ns_nsegments = nsegs;
···
 static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
 				   struct nilfs_super_block *sbp)
 {
+	u64 nsegments, nblocks;
+
 	if (le32_to_cpu(sbp->s_rev_level) < NILFS_MIN_SUPP_REV) {
 		nilfs_err(nilfs->ns_sb,
 			  "unsupported revision (superblock rev.=%d.%d, current rev.=%d.%d). Please check the version of mkfs.nilfs(2).",
···
 		return -EINVAL;
 	}
 
-	nilfs_set_nsegments(nilfs, le64_to_cpu(sbp->s_nsegments));
+	nsegments = le64_to_cpu(sbp->s_nsegments);
+	if (nsegments > nilfs_max_segment_count(nilfs)) {
+		nilfs_err(nilfs->ns_sb,
+			  "segment count %llu exceeds upper limit (%llu segments)",
+			  (unsigned long long)nsegments,
+			  (unsigned long long)nilfs_max_segment_count(nilfs));
+		return -EINVAL;
+	}
+
+	nblocks = sb_bdev_nr_blocks(nilfs->ns_sb);
+	if (nblocks) {
+		u64 min_block_count = nsegments * nilfs->ns_blocks_per_segment;
+		/*
+		 * To avoid failing to mount early device images without a
+		 * second superblock, exclude that block count from the
+		 * "min_block_count" calculation.
+		 */
+
+		if (nblocks < min_block_count) {
+			nilfs_err(nilfs->ns_sb,
+				  "total number of segment blocks %llu exceeds device size (%llu blocks)",
+				  (unsigned long long)min_block_count,
+				  (unsigned long long)nblocks);
+			return -EINVAL;
+		}
+	}
+
+	nilfs_set_nsegments(nilfs, nsegments);
 	nilfs->ns_crc_seed = le32_to_cpu(sbp->s_crc_seed);
 	return 0;
 }
+7 -1
fs/ocfs2/file.c
···
 	struct ocfs2_space_resv sr;
 	int change_size = 1;
 	int cmd = OCFS2_IOC_RESVSP64;
+	int ret = 0;
 
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
 		return -EOPNOTSUPP;
 	if (!ocfs2_writes_unwritten_extents(osb))
 		return -EOPNOTSUPP;
 
-	if (mode & FALLOC_FL_KEEP_SIZE)
+	if (mode & FALLOC_FL_KEEP_SIZE) {
 		change_size = 0;
+	} else {
+		ret = inode_newsize_ok(inode, offset + len);
+		if (ret)
+			return ret;
+	}
 
 	if (mode & FALLOC_FL_PUNCH_HOLE)
 		cmd = OCFS2_IOC_UNRESVSP64;
+4 -2
fs/ocfs2/super.c
···
 	for (type = 0; type < OCFS2_MAXQUOTAS; type++) {
 		if (!sb_has_quota_loaded(sb, type))
 			continue;
-		oinfo = sb_dqinfo(sb, type)->dqi_priv;
-		cancel_delayed_work_sync(&oinfo->dqi_sync_work);
+		if (!sb_has_quota_suspended(sb, type)) {
+			oinfo = sb_dqinfo(sb, type)->dqi_priv;
+			cancel_delayed_work_sync(&oinfo->dqi_sync_work);
+		}
 		inode = igrab(sb->s_dquot.files[type]);
 		/* Turn off quotas. This will remove all dquot structures from
 		 * memory and so they will be automatically synced to global
+1 -1
fs/super.c
··· 54 54 * One thing we have to be careful of with a per-sb shrinker is that we don't 55 55 * drop the last active reference to the superblock from within the shrinker. 56 56 * If that happens we could trigger unregistering the shrinker from within the 57 - * shrinker path and that leads to deadlock on the shrinker_mutex. Hence we 57 + * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we 58 58 * take a passive reference to the superblock to avoid this from occurring. 59 59 */ 60 60 static unsigned long super_cache_scan(struct shrinker *shrink,
+11 -2
fs/userfaultfd.c
··· 1322 1322 bool basic_ioctls; 1323 1323 unsigned long start, end, vma_end; 1324 1324 struct vma_iterator vmi; 1325 + pgoff_t pgoff; 1325 1326 1326 1327 user_uffdio_register = (struct uffdio_register __user *) arg; 1327 1328 ··· 1450 1449 1451 1450 vma_iter_set(&vmi, start); 1452 1451 prev = vma_prev(&vmi); 1452 + if (vma->vm_start < start) 1453 + prev = vma; 1453 1454 1454 1455 ret = 0; 1455 1456 for_each_vma_range(vmi, vma, end) { ··· 1475 1472 vma_end = min(end, vma->vm_end); 1476 1473 1477 1474 new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; 1475 + pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); 1478 1476 prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags, 1479 - vma->anon_vma, vma->vm_file, vma->vm_pgoff, 1477 + vma->anon_vma, vma->vm_file, pgoff, 1480 1478 vma_policy(vma), 1481 1479 ((struct vm_userfaultfd_ctx){ ctx }), 1482 1480 anon_vma_name(vma)); ··· 1557 1553 unsigned long start, end, vma_end; 1558 1554 const void __user *buf = (void __user *)arg; 1559 1555 struct vma_iterator vmi; 1556 + pgoff_t pgoff; 1560 1557 1561 1558 ret = -EFAULT; 1562 1559 if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister))) ··· 1620 1615 1621 1616 vma_iter_set(&vmi, start); 1622 1617 prev = vma_prev(&vmi); 1618 + if (vma->vm_start < start) 1619 + prev = vma; 1620 + 1623 1621 ret = 0; 1624 1622 for_each_vma_range(vmi, vma, end) { 1625 1623 cond_resched(); ··· 1660 1652 uffd_wp_range(vma, start, vma_end - start, false); 1661 1653 1662 1654 new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; 1655 + pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); 1663 1656 prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags, 1664 - vma->anon_vma, vma->vm_file, vma->vm_pgoff, 1657 + vma->anon_vma, vma->vm_file, pgoff, 1665 1658 vma_policy(vma), 1666 1659 NULL_VM_UFFD_CTX, anon_vma_name(vma)); 1667 1660 if (prev) {
-6
include/linux/fs.h
··· 2566 2566 struct inode *inode = file_inode(file); 2567 2567 return atomic_dec_unless_positive(&inode->i_writecount) ? 0 : -ETXTBSY; 2568 2568 } 2569 - static inline int exclusive_deny_write_access(struct file *file) 2570 - { 2571 - int old = 0; 2572 - struct inode *inode = file_inode(file); 2573 - return atomic_try_cmpxchg(&inode->i_writecount, &old, -1) ? 0 : -ETXTBSY; 2574 - } 2575 2569 static inline void put_write_access(struct inode * inode) 2576 2570 { 2577 2571 atomic_dec(&inode->i_writecount);
+1
include/linux/trace_events.h
··· 806 806 FILTER_TRACE_FN, 807 807 FILTER_COMM, 808 808 FILTER_CPU, 809 + FILTER_STACKTRACE, 809 810 }; 810 811 811 812 extern int trace_event_raw_init(struct trace_event_call *call);
+2 -1
include/linux/user_events.h
··· 17 17 18 18 #ifdef CONFIG_USER_EVENTS 19 19 struct user_event_mm { 20 - struct list_head link; 20 + struct list_head mms_link; 21 21 struct list_head enablers; 22 22 struct mm_struct *mm; 23 + /* Used for one-shot lists, protected by event_mutex */ 23 24 struct user_event_mm *next; 24 25 refcount_t refcnt; 25 26 refcount_t tasks;
+1 -1
include/trace/events/writeback.h
··· 68 68 strscpy_pad(__entry->name, 69 69 bdi_dev_name(mapping ? inode_to_bdi(mapping->host) : 70 70 NULL), 32); 71 - __entry->ino = mapping ? mapping->host->i_ino : 0; 71 + __entry->ino = (mapping && mapping->host) ? mapping->host->i_ino : 0; 72 72 __entry->index = folio->index; 73 73 ), 74 74
+13 -1
kernel/kexec_file.c
··· 901 901 } 902 902 903 903 offset = ALIGN(offset, align); 904 + 905 + /* 906 + * Check if the segment contains the entry point, if so, 907 + * calculate the value of image->start based on it. 908 + * If the compiler has produced more than one .text section 909 + * (Eg: .text.hot), they are generally after the main .text 910 + * section, and they shall not be used to calculate 911 + * image->start. So do not re-calculate image->start if it 912 + * is not set to the initial value, and warn the user so they 913 + * have a chance to fix their purgatory's linker script. 914 + */ 904 915 if (sechdrs[i].sh_flags & SHF_EXECINSTR && 905 916 pi->ehdr->e_entry >= sechdrs[i].sh_addr && 906 917 pi->ehdr->e_entry < (sechdrs[i].sh_addr 907 - + sechdrs[i].sh_size)) { 918 + + sechdrs[i].sh_size) && 919 + !WARN_ON(kbuf->image->start != pi->ehdr->e_entry)) { 908 920 kbuf->image->start -= sechdrs[i].sh_addr; 909 921 kbuf->image->start += kbuf->mem + offset; 910 922 }
+22 -50
kernel/module/main.c
··· 3057 3057 return load_module(&info, uargs, 0); 3058 3058 } 3059 3059 3060 - static int file_init_module(struct file *file, const char __user * uargs, int flags) 3060 + SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags) 3061 3061 { 3062 3062 struct load_info info = { }; 3063 3063 void *buf = NULL; 3064 3064 int len; 3065 - 3066 - len = kernel_read_file(file, 0, &buf, INT_MAX, NULL, 3067 - READING_MODULE); 3068 - if (len < 0) { 3069 - mod_stat_inc(&failed_kreads); 3070 - mod_stat_add_long(len, &invalid_kread_bytes); 3071 - return len; 3072 - } 3073 - 3074 - if (flags & MODULE_INIT_COMPRESSED_FILE) { 3075 - int err = module_decompress(&info, buf, len); 3076 - vfree(buf); /* compressed data is no longer needed */ 3077 - if (err) { 3078 - mod_stat_inc(&failed_decompress); 3079 - mod_stat_add_long(len, &invalid_decompress_bytes); 3080 - return err; 3081 - } 3082 - } else { 3083 - info.hdr = buf; 3084 - info.len = len; 3085 - } 3086 - 3087 - return load_module(&info, uargs, flags); 3088 - } 3089 - 3090 - /* 3091 - * kernel_read_file() will already deny write access, but module 3092 - * loading wants _exclusive_ access to the file, so we do that 3093 - * here, along with basic sanity checks. 
3094 - */ 3095 - static int prepare_file_for_module_load(struct file *file) 3096 - { 3097 - if (!file || !(file->f_mode & FMODE_READ)) 3098 - return -EBADF; 3099 - if (!S_ISREG(file_inode(file)->i_mode)) 3100 - return -EINVAL; 3101 - return exclusive_deny_write_access(file); 3102 - } 3103 - 3104 - SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags) 3105 - { 3106 - struct fd f; 3107 3065 int err; 3108 3066 3109 3067 err = may_init_module(); ··· 3075 3117 |MODULE_INIT_COMPRESSED_FILE)) 3076 3118 return -EINVAL; 3077 3119 3078 - f = fdget(fd); 3079 - err = prepare_file_for_module_load(f.file); 3080 - if (!err) { 3081 - err = file_init_module(f.file, uargs, flags); 3082 - allow_write_access(f.file); 3120 + len = kernel_read_file_from_fd(fd, 0, &buf, INT_MAX, NULL, 3121 + READING_MODULE); 3122 + if (len < 0) { 3123 + mod_stat_inc(&failed_kreads); 3124 + mod_stat_add_long(len, &invalid_kread_bytes); 3125 + return len; 3083 3126 } 3084 - fdput(f); 3085 - return err; 3127 + 3128 + if (flags & MODULE_INIT_COMPRESSED_FILE) { 3129 + err = module_decompress(&info, buf, len); 3130 + vfree(buf); /* compressed data is no longer needed */ 3131 + if (err) { 3132 + mod_stat_inc(&failed_decompress); 3133 + mod_stat_add_long(len, &invalid_decompress_bytes); 3134 + return err; 3135 + } 3136 + } else { 3137 + info.hdr = buf; 3138 + info.len = len; 3139 + } 3140 + 3141 + return load_module(&info, uargs, flags); 3086 3142 } 3087 3143 3088 3144 /* Keep in sync with MODULE_FLAGS_BUF_SIZE !!! */
+36 -8
kernel/trace/trace.c
··· 60 60 */ 61 61 bool ring_buffer_expanded; 62 62 63 + #ifdef CONFIG_FTRACE_STARTUP_TEST 63 64 /* 64 65 * We need to change this state when a selftest is running. 65 66 * A selftest will lurk into the ring-buffer to count the ··· 76 75 */ 77 76 bool __read_mostly tracing_selftest_disabled; 78 77 79 - #ifdef CONFIG_FTRACE_STARTUP_TEST 80 78 void __init disable_tracing_selftest(const char *reason) 81 79 { 82 80 if (!tracing_selftest_disabled) { ··· 83 83 pr_info("Ftrace startup test is disabled due to %s\n", reason); 84 84 } 85 85 } 86 + #else 87 + #define tracing_selftest_running 0 88 + #define tracing_selftest_disabled 0 86 89 #endif 87 90 88 91 /* Pipe tracepoints to printk */ ··· 1054 1051 if (!(tr->trace_flags & TRACE_ITER_PRINTK)) 1055 1052 return 0; 1056 1053 1057 - if (unlikely(tracing_selftest_running || tracing_disabled)) 1054 + if (unlikely(tracing_selftest_running && tr == &global_trace)) 1055 + return 0; 1056 + 1057 + if (unlikely(tracing_disabled)) 1058 1058 return 0; 1059 1059 1060 1060 alloc = sizeof(*entry) + size + 2; /* possible \n added */ ··· 2047 2041 return 0; 2048 2042 } 2049 2043 2044 + static int do_run_tracer_selftest(struct tracer *type) 2045 + { 2046 + int ret; 2047 + 2048 + /* 2049 + * Tests can take a long time, especially if they are run one after the 2050 + * other, as does happen during bootup when all the tracers are 2051 + * registered. This could cause the soft lockup watchdog to trigger. 
2052 + */ 2053 + cond_resched(); 2054 + 2055 + tracing_selftest_running = true; 2056 + ret = run_tracer_selftest(type); 2057 + tracing_selftest_running = false; 2058 + 2059 + return ret; 2060 + } 2061 + 2050 2062 static __init int init_trace_selftests(void) 2051 2063 { 2052 2064 struct trace_selftests *p, *n; ··· 2116 2092 { 2117 2093 return 0; 2118 2094 } 2095 + static inline int do_run_tracer_selftest(struct tracer *type) 2096 + { 2097 + return 0; 2098 + } 2119 2099 #endif /* CONFIG_FTRACE_STARTUP_TEST */ 2120 2100 2121 2101 static void add_tracer_options(struct trace_array *tr, struct tracer *t); ··· 2155 2127 2156 2128 mutex_lock(&trace_types_lock); 2157 2129 2158 - tracing_selftest_running = true; 2159 - 2160 2130 for (t = trace_types; t; t = t->next) { 2161 2131 if (strcmp(type->name, t->name) == 0) { 2162 2132 /* already found */ ··· 2183 2157 /* store the tracer for __set_tracer_option */ 2184 2158 type->flags->trace = type; 2185 2159 2186 - ret = run_tracer_selftest(type); 2160 + ret = do_run_tracer_selftest(type); 2187 2161 if (ret < 0) 2188 2162 goto out; 2189 2163 ··· 2192 2166 add_tracer_options(&global_trace, type); 2193 2167 2194 2168 out: 2195 - tracing_selftest_running = false; 2196 2169 mutex_unlock(&trace_types_lock); 2197 2170 2198 2171 if (ret || !default_bootup_tracer) ··· 3515 3490 unsigned int trace_ctx; 3516 3491 char *tbuffer; 3517 3492 3518 - if (tracing_disabled || tracing_selftest_running) 3493 + if (tracing_disabled) 3519 3494 return 0; 3520 3495 3521 3496 /* Don't pollute graph traces with trace_vprintk internals */ ··· 3563 3538 int trace_array_vprintk(struct trace_array *tr, 3564 3539 unsigned long ip, const char *fmt, va_list args) 3565 3540 { 3541 + if (tracing_selftest_running && tr == &global_trace) 3542 + return 0; 3543 + 3566 3544 return __trace_array_vprintk(tr->array_buffer.buffer, ip, fmt, args); 3567 3545 } 3568 3546 ··· 5780 5752 "\t table using the key(s) and value(s) named, and the value of a\n" 5781 5753 "\t sum called 
'hitcount' is incremented. Keys and values\n" 5782 5754 "\t correspond to fields in the event's format description. Keys\n" 5783 - "\t can be any field, or the special string 'stacktrace'.\n" 5755 + "\t can be any field, or the special string 'common_stacktrace'.\n" 5784 5756 "\t Compound keys consisting of up to two fields can be specified\n" 5785 5757 "\t by the 'keys' keyword. Values must correspond to numeric\n" 5786 5758 "\t fields. Sort keys consisting of up to two fields can be\n"
+2
kernel/trace/trace_events.c
··· 194 194 __generic_field(int, common_cpu, FILTER_CPU); 195 195 __generic_field(char *, COMM, FILTER_COMM); 196 196 __generic_field(char *, comm, FILTER_COMM); 197 + __generic_field(char *, stacktrace, FILTER_STACKTRACE); 198 + __generic_field(char *, STACKTRACE, FILTER_STACKTRACE); 197 199 198 200 return ret; 199 201 }
+26 -13
kernel/trace/trace_events_hist.c
··· 1364 1364 if (field->field) 1365 1365 field_name = field->field->name; 1366 1366 else 1367 - field_name = "stacktrace"; 1367 + field_name = "common_stacktrace"; 1368 1368 } else if (field->flags & HIST_FIELD_FL_HITCOUNT) 1369 1369 field_name = "hitcount"; 1370 1370 ··· 2367 2367 hist_data->enable_timestamps = true; 2368 2368 if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS) 2369 2369 hist_data->attrs->ts_in_usecs = true; 2370 - } else if (strcmp(field_name, "stacktrace") == 0) { 2370 + } else if (strcmp(field_name, "common_stacktrace") == 0) { 2371 2371 *flags |= HIST_FIELD_FL_STACKTRACE; 2372 2372 } else if (strcmp(field_name, "common_cpu") == 0) 2373 2373 *flags |= HIST_FIELD_FL_CPU; ··· 2378 2378 if (!field || !field->size) { 2379 2379 /* 2380 2380 * For backward compatibility, if field_name 2381 - * was "cpu", then we treat this the same as 2382 - * common_cpu. This also works for "CPU". 2381 + * was "cpu" or "stacktrace", then we treat this 2382 + * the same as common_cpu and common_stacktrace 2383 + * respectively. This also works for "CPU", and 2384 + * "STACKTRACE". 
2383 2385 */ 2384 2386 if (field && field->filter_type == FILTER_CPU) { 2385 2387 *flags |= HIST_FIELD_FL_CPU; 2388 + } else if (field && field->filter_type == FILTER_STACKTRACE) { 2389 + *flags |= HIST_FIELD_FL_STACKTRACE; 2386 2390 } else { 2387 2391 hist_err(tr, HIST_ERR_FIELD_NOT_FOUND, 2388 2392 errpos(field_name)); ··· 4242 4238 goto out; 4243 4239 } 4244 4240 4245 - /* Some types cannot be a value */ 4246 - if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT | 4247 - HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2 | 4248 - HIST_FIELD_FL_SYM | HIST_FIELD_FL_SYM_OFFSET | 4249 - HIST_FIELD_FL_SYSCALL | HIST_FIELD_FL_STACKTRACE)) { 4250 - hist_err(file->tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(field_str)); 4251 - ret = -EINVAL; 4241 + /* values and variables should not have some modifiers */ 4242 + if (hist_field->flags & HIST_FIELD_FL_VAR) { 4243 + /* Variable */ 4244 + if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT | 4245 + HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2)) 4246 + goto err; 4247 + } else { 4248 + /* Value */ 4249 + if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT | 4250 + HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2 | 4251 + HIST_FIELD_FL_SYM | HIST_FIELD_FL_SYM_OFFSET | 4252 + HIST_FIELD_FL_SYSCALL | HIST_FIELD_FL_STACKTRACE)) 4253 + goto err; 4252 4254 } 4253 4255 4254 4256 hist_data->fields[val_idx] = hist_field; ··· 4266 4256 ret = -EINVAL; 4267 4257 out: 4268 4258 return ret; 4259 + err: 4260 + hist_err(file->tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(field_str)); 4261 + return -EINVAL; 4269 4262 } 4270 4263 4271 4264 static int create_val_field(struct hist_trigger_data *hist_data, ··· 5398 5385 if (key_field->field) 5399 5386 seq_printf(m, "%s.stacktrace", key_field->field->name); 5400 5387 else 5401 - seq_puts(m, "stacktrace:\n"); 5388 + seq_puts(m, "common_stacktrace:\n"); 5402 5389 hist_trigger_stacktrace_print(m, 5403 5390 key + key_field->offset, 5404 5391 HIST_STACKTRACE_DEPTH); ··· 5981 5968 
if (field->field) 5982 5969 seq_printf(m, "%s.stacktrace", field->field->name); 5983 5970 else 5984 - seq_puts(m, "stacktrace"); 5971 + seq_puts(m, "common_stacktrace"); 5985 5972 } else 5986 5973 hist_field_print(m, field); 5987 5974 }
+73 -39
kernel/trace/trace_events_user.c
··· 96 96 * these to track enablement sites that are tied to an event. 97 97 */ 98 98 struct user_event_enabler { 99 - struct list_head link; 99 + struct list_head mm_enablers_link; 100 100 struct user_event *event; 101 101 unsigned long addr; 102 102 103 103 /* Track enable bit, flags, etc. Aligned for bitops. */ 104 - unsigned int values; 104 + unsigned long values; 105 105 }; 106 106 107 107 /* Bits 0-5 are for the bit to update upon enable/disable (0-63 allowed) */ ··· 116 116 /* Only duplicate the bit value */ 117 117 #define ENABLE_VAL_DUP_MASK ENABLE_VAL_BIT_MASK 118 118 119 - #define ENABLE_BITOPS(e) ((unsigned long *)&(e)->values) 119 + #define ENABLE_BITOPS(e) (&(e)->values) 120 + 121 + #define ENABLE_BIT(e) ((int)((e)->values & ENABLE_VAL_BIT_MASK)) 120 122 121 123 /* Used for asynchronous faulting in of pages */ 122 124 struct user_event_enabler_fault { ··· 155 153 #define VALIDATOR_REL (1 << 1) 156 154 157 155 struct user_event_validator { 158 - struct list_head link; 156 + struct list_head user_event_link; 159 157 int offset; 160 158 int flags; 161 159 }; ··· 261 259 262 260 static void user_event_enabler_destroy(struct user_event_enabler *enabler) 263 261 { 264 - list_del_rcu(&enabler->link); 262 + list_del_rcu(&enabler->mm_enablers_link); 265 263 266 264 /* No longer tracking the event via the enabler */ 267 265 refcount_dec(&enabler->event->refcnt); ··· 425 423 426 424 /* Update bit atomically, user tracers must be atomic as well */ 427 425 if (enabler->event && enabler->event->status) 428 - set_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr); 426 + set_bit(ENABLE_BIT(enabler), ptr); 429 427 else 430 - clear_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr); 428 + clear_bit(ENABLE_BIT(enabler), ptr); 431 429 432 430 kunmap_local(kaddr); 433 431 unpin_user_pages_dirty_lock(&page, 1, true); ··· 439 437 unsigned long uaddr, unsigned char bit) 440 438 { 441 439 struct user_event_enabler *enabler; 442 - struct user_event_enabler *next; 443 440 444 - 
list_for_each_entry_safe(enabler, next, &mm->enablers, link) { 445 - if (enabler->addr == uaddr && 446 - (enabler->values & ENABLE_VAL_BIT_MASK) == bit) 441 + list_for_each_entry(enabler, &mm->enablers, mm_enablers_link) { 442 + if (enabler->addr == uaddr && ENABLE_BIT(enabler) == bit) 447 443 return true; 448 444 } 449 445 ··· 451 451 static void user_event_enabler_update(struct user_event *user) 452 452 { 453 453 struct user_event_enabler *enabler; 454 - struct user_event_mm *mm = user_event_mm_get_all(user); 455 454 struct user_event_mm *next; 455 + struct user_event_mm *mm; 456 456 int attempt; 457 + 458 + lockdep_assert_held(&event_mutex); 459 + 460 + /* 461 + * We need to build a one-shot list of all the mms that have an 462 + * enabler for the user_event passed in. This list is only valid 463 + * while holding the event_mutex. The only reason for this is due 464 + * to the global mm list being RCU protected and we use methods 465 + * which can wait (mmap_read_lock and pin_user_pages_remote). 466 + * 467 + * NOTE: user_event_mm_get_all() increments the ref count of each 468 + * mm that is added to the list to prevent removal timing windows. 469 + * We must always put each mm after they are used, which may wait. 
470 + */ 471 + mm = user_event_mm_get_all(user); 457 472 458 473 while (mm) { 459 474 next = mm->next; 460 475 mmap_read_lock(mm->mm); 461 - rcu_read_lock(); 462 476 463 - list_for_each_entry_rcu(enabler, &mm->enablers, link) { 477 + list_for_each_entry(enabler, &mm->enablers, mm_enablers_link) { 464 478 if (enabler->event == user) { 465 479 attempt = 0; 466 480 user_event_enabler_write(mm, enabler, true, &attempt); 467 481 } 468 482 } 469 483 470 - rcu_read_unlock(); 471 484 mmap_read_unlock(mm->mm); 472 485 user_event_mm_put(mm); 473 486 mm = next; ··· 508 495 enabler->values = orig->values & ENABLE_VAL_DUP_MASK; 509 496 510 497 refcount_inc(&enabler->event->refcnt); 511 - list_add_rcu(&enabler->link, &mm->enablers); 498 + 499 + /* Enablers not exposed yet, RCU not required */ 500 + list_add(&enabler->mm_enablers_link, &mm->enablers); 512 501 513 502 return true; 514 503 } ··· 529 514 struct user_event_mm *mm; 530 515 531 516 /* 517 + * We use the mm->next field to build a one-shot list from the global 518 + * RCU protected list. To build this list the event_mutex must be held. 519 + * This lets us build a list without requiring allocs that could fail 520 + * when user based events are most wanted for diagnostics. 521 + */ 522 + lockdep_assert_held(&event_mutex); 523 + 524 + /* 532 525 * We do not want to block fork/exec while enablements are being 533 526 * updated, so we use RCU to walk the current tasks that have used 534 527 * user_events ABI for 1 or more events. 
Each enabler found in each ··· 548 525 */ 549 526 rcu_read_lock(); 550 527 551 - list_for_each_entry_rcu(mm, &user_event_mms, link) 552 - list_for_each_entry_rcu(enabler, &mm->enablers, link) 528 + list_for_each_entry_rcu(mm, &user_event_mms, mms_link) { 529 + list_for_each_entry_rcu(enabler, &mm->enablers, mm_enablers_link) { 553 530 if (enabler->event == user) { 554 531 mm->next = found; 555 532 found = user_event_mm_get(mm); 556 533 break; 557 534 } 535 + } 536 + } 558 537 559 538 rcu_read_unlock(); 560 539 561 540 return found; 562 541 } 563 542 564 - static struct user_event_mm *user_event_mm_create(struct task_struct *t) 543 + static struct user_event_mm *user_event_mm_alloc(struct task_struct *t) 565 544 { 566 545 struct user_event_mm *user_mm; 567 - unsigned long flags; 568 546 569 547 user_mm = kzalloc(sizeof(*user_mm), GFP_KERNEL_ACCOUNT); 570 548 ··· 576 552 INIT_LIST_HEAD(&user_mm->enablers); 577 553 refcount_set(&user_mm->refcnt, 1); 578 554 refcount_set(&user_mm->tasks, 1); 579 - 580 - spin_lock_irqsave(&user_event_mms_lock, flags); 581 - list_add_rcu(&user_mm->link, &user_event_mms); 582 - spin_unlock_irqrestore(&user_event_mms_lock, flags); 583 - 584 - t->user_event_mm = user_mm; 585 555 586 556 /* 587 557 * The lifetime of the memory descriptor can slightly outlast ··· 590 572 return user_mm; 591 573 } 592 574 575 + static void user_event_mm_attach(struct user_event_mm *user_mm, struct task_struct *t) 576 + { 577 + unsigned long flags; 578 + 579 + spin_lock_irqsave(&user_event_mms_lock, flags); 580 + list_add_rcu(&user_mm->mms_link, &user_event_mms); 581 + spin_unlock_irqrestore(&user_event_mms_lock, flags); 582 + 583 + t->user_event_mm = user_mm; 584 + } 585 + 593 586 static struct user_event_mm *current_user_event_mm(void) 594 587 { 595 588 struct user_event_mm *user_mm = current->user_event_mm; ··· 608 579 if (user_mm) 609 580 goto inc; 610 581 611 - user_mm = user_event_mm_create(current); 582 + user_mm = user_event_mm_alloc(current); 612 583 
613 584 if (!user_mm) 614 585 goto error; 586 + 587 + user_event_mm_attach(user_mm, current); 615 588 inc: 616 589 refcount_inc(&user_mm->refcnt); 617 590 error: ··· 624 593 { 625 594 struct user_event_enabler *enabler, *next; 626 595 627 - list_for_each_entry_safe(enabler, next, &mm->enablers, link) 596 + list_for_each_entry_safe(enabler, next, &mm->enablers, mm_enablers_link) 628 597 user_event_enabler_destroy(enabler); 629 598 630 599 mmdrop(mm->mm); ··· 661 630 662 631 /* Remove the mm from the list, so it can no longer be enabled */ 663 632 spin_lock_irqsave(&user_event_mms_lock, flags); 664 - list_del_rcu(&mm->link); 633 + list_del_rcu(&mm->mms_link); 665 634 spin_unlock_irqrestore(&user_event_mms_lock, flags); 666 635 667 636 /* ··· 701 670 702 671 void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm) 703 672 { 704 - struct user_event_mm *mm = user_event_mm_create(t); 673 + struct user_event_mm *mm = user_event_mm_alloc(t); 705 674 struct user_event_enabler *enabler; 706 675 707 676 if (!mm) ··· 709 678 710 679 rcu_read_lock(); 711 680 712 - list_for_each_entry_rcu(enabler, &old_mm->enablers, link) 681 + list_for_each_entry_rcu(enabler, &old_mm->enablers, mm_enablers_link) { 713 682 if (!user_event_enabler_dup(enabler, mm)) 714 683 goto error; 684 + } 715 685 716 686 rcu_read_unlock(); 717 687 688 + user_event_mm_attach(mm, t); 718 689 return; 719 690 error: 720 691 rcu_read_unlock(); 721 - user_event_mm_remove(t); 692 + user_event_mm_destroy(mm); 722 693 } 723 694 724 695 static bool current_user_event_enabler_exists(unsigned long uaddr, ··· 781 748 */ 782 749 if (!*write_result) { 783 750 refcount_inc(&enabler->event->refcnt); 784 - list_add_rcu(&enabler->link, &user_mm->enablers); 751 + list_add_rcu(&enabler->mm_enablers_link, &user_mm->enablers); 785 752 } 786 753 787 754 mutex_unlock(&event_mutex); ··· 937 904 struct user_event_validator *validator, *next; 938 905 struct list_head *head = &user->validators; 939 906 940 - 
list_for_each_entry_safe(validator, next, head, link) { 941 - list_del(&validator->link); 907 + list_for_each_entry_safe(validator, next, head, user_event_link) { 908 + list_del(&validator->user_event_link); 942 909 kfree(validator); 943 910 } 944 911 } ··· 992 959 validator->offset = offset; 993 960 994 961 /* Want sequential access when validating */ 995 - list_add_tail(&validator->link, &user->validators); 962 + list_add_tail(&validator->user_event_link, &user->validators); 996 963 997 964 add_field: 998 965 field->type = type; ··· 1382 1349 void *pos, *end = data + len; 1383 1350 u32 loc, offset, size; 1384 1351 1385 - list_for_each_entry(validator, head, link) { 1352 + list_for_each_entry(validator, head, user_event_link) { 1386 1353 pos = data + validator->offset; 1387 1354 1388 1355 /* Already done min_size check, no bounds check here */ ··· 2303 2270 */ 2304 2271 mutex_lock(&event_mutex); 2305 2272 2306 - list_for_each_entry_safe(enabler, next, &mm->enablers, link) 2273 + list_for_each_entry_safe(enabler, next, &mm->enablers, mm_enablers_link) { 2307 2274 if (enabler->addr == reg.disable_addr && 2308 - (enabler->values & ENABLE_VAL_BIT_MASK) == reg.disable_bit) { 2275 + ENABLE_BIT(enabler) == reg.disable_bit) { 2309 2276 set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler)); 2310 2277 2311 2278 if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler))) ··· 2314 2281 /* Removed at least one */ 2315 2282 ret = 0; 2316 2283 } 2284 + } 2317 2285 2318 2286 mutex_unlock(&event_mutex); 2319 2287
+2
kernel/trace/trace_osnoise.c
··· 1652 1652 osnoise_stop_tracing(); 1653 1653 notify_new_max_latency(diff); 1654 1654 1655 + wake_up_process(tlat->kthread); 1656 + 1655 1657 return HRTIMER_NORESTART; 1656 1658 } 1657 1659 }
+10
kernel/trace/trace_selftest.c
··· 848 848 } 849 849 850 850 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS 851 + /* 852 + * These tests can take some time to run. Make sure on non PREEMPT 853 + * kernels, we do not trigger the softlockup detector. 854 + */ 855 + cond_resched(); 856 + 851 857 tracing_reset_online_cpus(&tr->array_buffer); 852 858 set_graph_array(tr); 853 859 ··· 874 868 (unsigned long)ftrace_stub_direct_tramp); 875 869 if (ret) 876 870 goto out; 871 + 872 + cond_resched(); 877 873 878 874 ret = register_ftrace_graph(&fgraph_ops); 879 875 if (ret) { ··· 898 890 true); 899 891 if (ret) 900 892 goto out; 893 + 894 + cond_resched(); 901 895 902 896 tracing_start(); 903 897
+2
lib/radix-tree.c
··· 27 27 #include <linux/string.h> 28 28 #include <linux/xarray.h> 29 29 30 + #include "radix-tree.h" 31 + 30 32 /* 31 33 * Radix tree node cache. 32 34 */
+8
lib/radix-tree.h
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 2 + /* radix-tree helpers that are only shared with xarray */ 3 + 4 + struct kmem_cache; 5 + struct rcu_head; 6 + 7 + extern struct kmem_cache *radix_tree_node_cachep; 8 + extern void radix_tree_node_rcu_free(struct rcu_head *head);
+1 -1
lib/test_vmalloc.c
··· 369 369 int i; 370 370 371 371 map_nr_pages = nr_pages > 0 ? nr_pages:1; 372 - pages = kmalloc(map_nr_pages * sizeof(struct page), GFP_KERNEL); 372 + pages = kcalloc(map_nr_pages, sizeof(struct page *), GFP_KERNEL); 373 373 if (!pages) 374 374 return -1; 375 375
+2 -4
lib/xarray.c
··· 12 12 #include <linux/slab.h> 13 13 #include <linux/xarray.h> 14 14 15 + #include "radix-tree.h" 16 + 15 17 /* 16 18 * Coding conventions in this file: 17 19 * ··· 248 246 return entry; 249 247 } 250 248 EXPORT_SYMBOL_GPL(xas_load); 251 - 252 - /* Move the radix tree node cache here */ 253 - extern struct kmem_cache *radix_tree_node_cachep; 254 - extern void radix_tree_node_rcu_free(struct rcu_head *head); 255 249 256 250 #define XA_RCU_FREE ((struct xarray *)1) 257 251
+2
mm/damon/core.c
··· 551 551 return -EINVAL; 552 552 if (attrs->min_nr_regions > attrs->max_nr_regions) 553 553 return -EINVAL; 554 + if (attrs->sample_interval > attrs->aggr_interval) 555 + return -EINVAL; 554 556 555 557 damon_update_monitoring_results(ctx, attrs); 556 558 ctx->attrs = *attrs;
+16 -10
mm/filemap.c
··· 1728 1728 * 1729 1729 * Return: The index of the gap if found, otherwise an index outside the 1730 1730 * range specified (in which case 'return - index >= max_scan' will be true). 1731 - * In the rare case of index wrap-around, 0 will be returned. 1731 + * In the rare case of index wrap-around, 0 will be returned. 0 will also 1732 + * be returned if index == 0 and there is a gap at the index. We can not 1733 + * wrap-around if passed index == 0. 1732 1734 */ 1733 1735 pgoff_t page_cache_next_miss(struct address_space *mapping, 1734 1736 pgoff_t index, unsigned long max_scan) ··· 1740 1738 while (max_scan--) { 1741 1739 void *entry = xas_next(&xas); 1742 1740 if (!entry || xa_is_value(entry)) 1743 - break; 1744 - if (xas.xa_index == 0) 1745 - break; 1741 + return xas.xa_index; 1742 + if (xas.xa_index == 0 && index != 0) 1743 + return xas.xa_index; 1746 1744 } 1747 1745 1748 - return xas.xa_index; 1746 + /* No gaps in range and no wrap-around, return index beyond range */ 1747 + return xas.xa_index + 1; 1749 1748 } 1750 1749 EXPORT_SYMBOL(page_cache_next_miss); 1751 1750 ··· 1767 1764 * 1768 1765 * Return: The index of the gap if found, otherwise an index outside the 1769 1766 * range specified (in which case 'index - return >= max_scan' will be true). 1770 - * In the rare case of wrap-around, ULONG_MAX will be returned. 1767 + * In the rare case of wrap-around, ULONG_MAX will be returned. ULONG_MAX 1768 + * will also be returned if index == ULONG_MAX and there is a gap at the 1769 + * index. We can not wrap-around if passed index == ULONG_MAX. 
1771 1770 */ 1772 1771 pgoff_t page_cache_prev_miss(struct address_space *mapping, 1773 1772 pgoff_t index, unsigned long max_scan) ··· 1779 1774 while (max_scan--) { 1780 1775 void *entry = xas_prev(&xas); 1781 1776 if (!entry || xa_is_value(entry)) 1782 - break; 1783 - if (xas.xa_index == ULONG_MAX) 1784 - break; 1777 + return xas.xa_index; 1778 + if (xas.xa_index == ULONG_MAX && index != ULONG_MAX) 1779 + return xas.xa_index; 1785 1780 } 1786 1781 1787 - return xas.xa_index; 1782 + /* No gaps in range and no wrap-around, return index beyond range */ 1783 + return xas.xa_index - 1; 1788 1784 } 1789 1785 EXPORT_SYMBOL(page_cache_prev_miss); 1790 1786
+1
mm/gup_test.c
··· 380 380 static const struct file_operations gup_test_fops = { 381 381 .open = nonseekable_open, 382 382 .unlocked_ioctl = gup_test_ioctl, 383 + .compat_ioctl = compat_ptr_ioctl, 383 384 .release = gup_test_release, 384 385 }; 385 386
-1
mm/khugepaged.c
··· 2089 2089 TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH); 2090 2090 2091 2091 xas_lock_irq(&xas); 2092 - xas_set(&xas, index); 2093 2092 2094 2093 VM_BUG_ON_PAGE(page != xas_load(&xas), page); 2095 2094
+6 -3
mm/memfd.c
··· 371 371 372 372 inode->i_mode &= ~0111; 373 373 file_seals = memfd_file_seals_ptr(file); 374 - *file_seals &= ~F_SEAL_SEAL; 375 - *file_seals |= F_SEAL_EXEC; 374 + if (file_seals) { 375 + *file_seals &= ~F_SEAL_SEAL; 376 + *file_seals |= F_SEAL_EXEC; 377 + } 376 378 } else if (flags & MFD_ALLOW_SEALING) { 377 379 /* MFD_EXEC and MFD_ALLOW_SEALING are set */ 378 380 file_seals = memfd_file_seals_ptr(file); 379 - *file_seals &= ~F_SEAL_SEAL; 381 + if (file_seals) 382 + *file_seals &= ~F_SEAL_SEAL; 380 383 } 381 384 382 385 fd_install(fd, file);
+1 -1
mm/mprotect.c
··· 824 824 } 825 825 tlb_finish_mmu(&tlb); 826 826 827 - if (!error && vma_iter_end(&vmi) < end) 827 + if (!error && tmp < end) 828 828 error = -ENOMEM; 829 829 830 830 out:
+24 -15
mm/shrinker_debug.c
··· 5 5 #include <linux/seq_file.h> 6 6 #include <linux/shrinker.h> 7 7 #include <linux/memcontrol.h> 8 - #include <linux/srcu.h> 9 8 10 9 /* defined in vmscan.c */ 11 - extern struct mutex shrinker_mutex; 10 + extern struct rw_semaphore shrinker_rwsem; 12 11 extern struct list_head shrinker_list; 13 - extern struct srcu_struct shrinker_srcu; 14 12 15 13 static DEFINE_IDA(shrinker_debugfs_ida); 16 14 static struct dentry *shrinker_debugfs_root; ··· 49 51 struct mem_cgroup *memcg; 50 52 unsigned long total; 51 53 bool memcg_aware; 52 - int ret = 0, nid, srcu_idx; 54 + int ret, nid; 53 55 54 56 count_per_node = kcalloc(nr_node_ids, sizeof(unsigned long), GFP_KERNEL); 55 57 if (!count_per_node) 56 58 return -ENOMEM; 57 59 58 - srcu_idx = srcu_read_lock(&shrinker_srcu); 60 + ret = down_read_killable(&shrinker_rwsem); 61 + if (ret) { 62 + kfree(count_per_node); 63 + return ret; 64 + } 65 + rcu_read_lock(); 59 66 60 67 memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE; 61 68 ··· 91 88 } 92 89 } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); 93 90 94 - srcu_read_unlock(&shrinker_srcu, srcu_idx); 91 + rcu_read_unlock(); 92 + up_read(&shrinker_rwsem); 95 93 96 94 kfree(count_per_node); 97 95 return ret; ··· 115 111 .gfp_mask = GFP_KERNEL, 116 112 }; 117 113 struct mem_cgroup *memcg = NULL; 118 - int nid, srcu_idx; 114 + int nid; 119 115 char kbuf[72]; 116 + ssize_t ret; 120 117 121 118 read_len = size < (sizeof(kbuf) - 1) ? 
size : (sizeof(kbuf) - 1); 122 119 if (copy_from_user(kbuf, buf, read_len)) ··· 146 141 return -EINVAL; 147 142 } 148 143 149 - srcu_idx = srcu_read_lock(&shrinker_srcu); 144 + ret = down_read_killable(&shrinker_rwsem); 145 + if (ret) { 146 + mem_cgroup_put(memcg); 147 + return ret; 148 + } 150 149 151 150 sc.nid = nid; 152 151 sc.memcg = memcg; ··· 159 150 160 151 shrinker->scan_objects(shrinker, &sc); 161 152 162 - srcu_read_unlock(&shrinker_srcu, srcu_idx); 153 + up_read(&shrinker_rwsem); 163 154 mem_cgroup_put(memcg); 164 155 165 156 return size; ··· 177 168 char buf[128]; 178 169 int id; 179 170 180 - lockdep_assert_held(&shrinker_mutex); 171 + lockdep_assert_held(&shrinker_rwsem); 181 172 182 173 /* debugfs isn't initialized yet, add debugfs entries later. */ 183 174 if (!shrinker_debugfs_root) ··· 220 211 if (!new) 221 212 return -ENOMEM; 222 213 223 - mutex_lock(&shrinker_mutex); 214 + down_write(&shrinker_rwsem); 224 215 225 216 old = shrinker->name; 226 217 shrinker->name = new; ··· 238 229 shrinker->debugfs_entry = entry; 239 230 } 240 231 241 - mutex_unlock(&shrinker_mutex); 232 + up_write(&shrinker_rwsem); 242 233 243 234 kfree_const(old); 244 235 ··· 251 242 { 252 243 struct dentry *entry = shrinker->debugfs_entry; 253 244 254 - lockdep_assert_held(&shrinker_mutex); 245 + lockdep_assert_held(&shrinker_rwsem); 255 246 256 247 kfree_const(shrinker->name); 257 248 shrinker->name = NULL; ··· 280 271 shrinker_debugfs_root = dentry; 281 272 282 273 /* Create debugfs entries for shrinkers registered at boot */ 283 - mutex_lock(&shrinker_mutex); 274 + down_write(&shrinker_rwsem); 284 275 list_for_each_entry(shrinker, &shrinker_list, list) 285 276 if (!shrinker->debugfs_entry) { 286 277 ret = shrinker_debugfs_add(shrinker); 287 278 if (ret) 288 279 break; 289 280 } 290 - mutex_unlock(&shrinker_mutex); 281 + up_write(&shrinker_rwsem); 291 282 292 283 return ret; 293 284 }
+13 -4
mm/vmalloc.c
··· 3148 3148 * allocation request, free them via vfree() if any. 3149 3149 */ 3150 3150 if (area->nr_pages != nr_small_pages) { 3151 - /* vm_area_alloc_pages() can also fail due to a fatal signal */ 3152 - if (!fatal_signal_pending(current)) 3151 + /* 3152 + * vm_area_alloc_pages() can fail due to insufficient memory but 3153 + * also:- 3154 + * 3155 + * - a pending fatal signal 3156 + * - insufficient huge page-order pages 3157 + * 3158 + * Since we always retry allocations at order-0 in the huge page 3159 + * case a warning for either is spurious. 3160 + */ 3161 + if (!fatal_signal_pending(current) && page_order == 0) 3153 3162 warn_alloc(gfp_mask, NULL, 3154 - "vmalloc error: size %lu, page order %u, failed to allocate pages", 3155 - area->nr_pages * PAGE_SIZE, page_order); 3163 + "vmalloc error: size %lu, failed to allocate pages", 3164 + area->nr_pages * PAGE_SIZE); 3156 3165 goto fail; 3157 3166 } 3158 3167
+62 -76
mm/vmscan.c
··· 35 35 #include <linux/cpuset.h> 36 36 #include <linux/compaction.h> 37 37 #include <linux/notifier.h> 38 - #include <linux/mutex.h> 38 + #include <linux/rwsem.h> 39 39 #include <linux/delay.h> 40 40 #include <linux/kthread.h> 41 41 #include <linux/freezer.h> ··· 57 57 #include <linux/khugepaged.h> 58 58 #include <linux/rculist_nulls.h> 59 59 #include <linux/random.h> 60 - #include <linux/srcu.h> 61 60 62 61 #include <asm/tlbflush.h> 63 62 #include <asm/div64.h> ··· 189 190 int vm_swappiness = 60; 190 191 191 192 LIST_HEAD(shrinker_list); 192 - DEFINE_MUTEX(shrinker_mutex); 193 - DEFINE_SRCU(shrinker_srcu); 194 - static atomic_t shrinker_srcu_generation = ATOMIC_INIT(0); 193 + DECLARE_RWSEM(shrinker_rwsem); 195 194 196 195 #ifdef CONFIG_MEMCG 197 196 static int shrinker_nr_max; ··· 208 211 static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg, 209 212 int nid) 210 213 { 211 - return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info, 212 - &shrinker_srcu, 213 - lockdep_is_held(&shrinker_mutex)); 214 - } 215 - 216 - static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg, 217 - int nid) 218 - { 219 - return srcu_dereference(memcg->nodeinfo[nid]->shrinker_info, 220 - &shrinker_srcu); 221 - } 222 - 223 - static void free_shrinker_info_rcu(struct rcu_head *head) 224 - { 225 - kvfree(container_of(head, struct shrinker_info, rcu)); 214 + return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info, 215 + lockdep_is_held(&shrinker_rwsem)); 226 216 } 227 217 228 218 static int expand_one_shrinker_info(struct mem_cgroup *memcg, ··· 250 266 defer_size - old_defer_size); 251 267 252 268 rcu_assign_pointer(pn->shrinker_info, new); 253 - call_srcu(&shrinker_srcu, &old->rcu, free_shrinker_info_rcu); 269 + kvfree_rcu(old, rcu); 254 270 } 255 271 256 272 return 0; ··· 276 292 int nid, size, ret = 0; 277 293 int map_size, defer_size = 0; 278 294 279 - mutex_lock(&shrinker_mutex); 295 + down_write(&shrinker_rwsem); 280 296
map_size = shrinker_map_size(shrinker_nr_max); 281 297 defer_size = shrinker_defer_size(shrinker_nr_max); 282 298 size = map_size + defer_size; ··· 292 308 info->map_nr_max = shrinker_nr_max; 293 309 rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info); 294 310 } 295 - mutex_unlock(&shrinker_mutex); 311 + up_write(&shrinker_rwsem); 296 312 297 313 return ret; 298 314 } ··· 308 324 if (!root_mem_cgroup) 309 325 goto out; 310 326 311 - lockdep_assert_held(&shrinker_mutex); 327 + lockdep_assert_held(&shrinker_rwsem); 312 328 313 329 map_size = shrinker_map_size(new_nr_max); 314 330 defer_size = shrinker_defer_size(new_nr_max); ··· 336 352 { 337 353 if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) { 338 354 struct shrinker_info *info; 339 - int srcu_idx; 340 355 341 - srcu_idx = srcu_read_lock(&shrinker_srcu); 342 - info = shrinker_info_srcu(memcg, nid); 356 + rcu_read_lock(); 357 + info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info); 343 358 if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) { 344 359 /* Pairs with smp mb in shrink_slab() */ 345 360 smp_mb__before_atomic(); 346 361 set_bit(shrinker_id, info->map); 347 362 } 348 - srcu_read_unlock(&shrinker_srcu, srcu_idx); 363 + rcu_read_unlock(); 349 364 } 350 365 } 351 366 ··· 357 374 if (mem_cgroup_disabled()) 358 375 return -ENOSYS; 359 376 360 - mutex_lock(&shrinker_mutex); 377 + down_write(&shrinker_rwsem); 378 + /* This may call shrinker, so it must use down_read_trylock() */ 361 379 id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL); 362 380 if (id < 0) 363 381 goto unlock; ··· 372 388 shrinker->id = id; 373 389 ret = 0; 374 390 unlock: 375 - mutex_unlock(&shrinker_mutex); 391 + up_write(&shrinker_rwsem); 376 392 return ret; 377 393 } ··· 382 398 383 399 BUG_ON(id < 0); 384 400 385 - lockdep_assert_held(&shrinker_mutex); 401 + lockdep_assert_held(&shrinker_rwsem); 386 402 387 403 idr_remove(&shrinker_idr, id); 388 404 } ··· 392 408 { 393 409 struct shrinker_info
*info; 394 410 395 - info = shrinker_info_srcu(memcg, nid); 411 + info = shrinker_info_protected(memcg, nid); 396 412 return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0); 397 413 } 398 414 ··· 401 417 { 402 418 struct shrinker_info *info; 403 419 404 - info = shrinker_info_srcu(memcg, nid); 420 + info = shrinker_info_protected(memcg, nid); 405 421 return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]); 406 422 } 407 423 ··· 417 433 parent = root_mem_cgroup; 418 434 419 435 /* Prevent from concurrent shrinker_info expand */ 420 - mutex_lock(&shrinker_mutex); 436 + down_read(&shrinker_rwsem); 421 437 for_each_node(nid) { 422 438 child_info = shrinker_info_protected(memcg, nid); 423 439 parent_info = shrinker_info_protected(parent, nid); ··· 426 442 atomic_long_add(nr, &parent_info->nr_deferred[i]); 427 443 } 428 444 } 429 - mutex_unlock(&shrinker_mutex); 445 + up_read(&shrinker_rwsem); 430 446 } 431 447 432 448 static bool cgroup_reclaim(struct scan_control *sc) ··· 727 743 shrinker->name = NULL; 728 744 #endif 729 745 if (shrinker->flags & SHRINKER_MEMCG_AWARE) { 730 - mutex_lock(&shrinker_mutex); 746 + down_write(&shrinker_rwsem); 731 747 unregister_memcg_shrinker(shrinker); 732 - mutex_unlock(&shrinker_mutex); 748 + up_write(&shrinker_rwsem); 733 749 return; 734 750 } 735 751 ··· 739 755 740 756 void register_shrinker_prepared(struct shrinker *shrinker) 741 757 { 742 - mutex_lock(&shrinker_mutex); 743 - list_add_tail_rcu(&shrinker->list, &shrinker_list); 758 + down_write(&shrinker_rwsem); 759 + list_add_tail(&shrinker->list, &shrinker_list); 744 760 shrinker->flags |= SHRINKER_REGISTERED; 745 761 shrinker_debugfs_add(shrinker); 746 762 - mutex_unlock(&shrinker_mutex); 762 + up_write(&shrinker_rwsem); 747 763 } 748 764 749 765 static int __register_shrinker(struct shrinker *shrinker) ··· 794 810 if (!(shrinker->flags & SHRINKER_REGISTERED)) 795 811 return; 796 812 797 813 - mutex_lock(&shrinker_mutex); 798 813 +
down_write(&shrinker_rwsem); 814 + list_del(&shrinker->list); 799 815 shrinker->flags &= ~SHRINKER_REGISTERED; 800 816 if (shrinker->flags & SHRINKER_MEMCG_AWARE) 801 817 unregister_memcg_shrinker(shrinker); 802 818 debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id); 803 - mutex_unlock(&shrinker_mutex); 804 - 805 - atomic_inc(&shrinker_srcu_generation); 806 - synchronize_srcu(&shrinker_srcu); 819 + up_write(&shrinker_rwsem); 807 820 808 821 shrinker_debugfs_remove(debugfs_entry, debugfs_id); ··· 812 831 /** 813 832 * synchronize_shrinkers - Wait for all running shrinkers to complete. 814 833 * 815 - * This is useful to guarantee that all shrinker invocations have seen an 816 - * update, before freeing memory. 834 + * This is equivalent to calling unregister_shrink() and register_shrinker(), 835 + * but atomically and with less overhead. This is useful to guarantee that all 836 + * shrinker invocations have seen an update, before freeing memory, similar to 837 + * rcu.
817 838 */ 818 839 void synchronize_shrinkers(void) 819 840 { 820 - atomic_inc(&shrinker_srcu_generation); 821 - synchronize_srcu(&shrinker_srcu); 841 + down_write(&shrinker_rwsem); 842 + up_write(&shrinker_rwsem); 822 843 } 823 844 EXPORT_SYMBOL(synchronize_shrinkers); 824 845 ··· 929 946 { 930 947 struct shrinker_info *info; 931 948 unsigned long ret, freed = 0; 932 - int srcu_idx, generation; 933 - int i = 0; 949 + int i; 934 950 935 951 if (!mem_cgroup_online(memcg)) 936 952 return 0; 937 953 938 - again: 939 - srcu_idx = srcu_read_lock(&shrinker_srcu); 940 - info = shrinker_info_srcu(memcg, nid); 954 + if (!down_read_trylock(&shrinker_rwsem)) 955 + return 0; 956 + 957 + info = shrinker_info_protected(memcg, nid); 941 958 if (unlikely(!info)) 942 959 goto unlock; 943 960 944 - generation = atomic_read(&shrinker_srcu_generation); 945 - for_each_set_bit_from(i, info->map, info->map_nr_max) { 961 + for_each_set_bit(i, info->map, info->map_nr_max) { 946 962 struct shrink_control sc = { 947 963 .gfp_mask = gfp_mask, 948 964 .nid = nid, ··· 987 1005 set_shrinker_bit(memcg, nid, i); 988 1006 } 989 1007 freed += ret; 990 - if (atomic_read(&shrinker_srcu_generation) != generation) { 991 - srcu_read_unlock(&shrinker_srcu, srcu_idx); 992 - i++; 993 - goto again; 1008 + 1009 + if (rwsem_is_contended(&shrinker_rwsem)) { 1010 + freed = freed ?
: 1; 1011 + break; 994 1012 } 995 1013 } 996 1014 unlock: 997 - srcu_read_unlock(&shrinker_srcu, srcu_idx); 1015 + up_read(&shrinker_rwsem); 998 1016 return freed; 999 1017 } 1000 1018 #else /* CONFIG_MEMCG */ ··· 1031 1049 { 1032 1050 unsigned long ret, freed = 0; 1033 1051 struct shrinker *shrinker; 1034 - int srcu_idx, generation; 1035 1052 1036 1053 /* 1037 1054 * The root memcg might be allocated even though memcg is disabled ··· 1042 1061 if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) 1043 1062 return shrink_slab_memcg(gfp_mask, nid, memcg, priority); 1044 1063 1045 - srcu_idx = srcu_read_lock(&shrinker_srcu); 1064 + if (!down_read_trylock(&shrinker_rwsem)) 1065 + goto out; 1046 1066 1047 - generation = atomic_read(&shrinker_srcu_generation); 1048 - list_for_each_entry_srcu(shrinker, &shrinker_list, list, 1049 - srcu_read_lock_held(&shrinker_srcu)) { 1067 + list_for_each_entry(shrinker, &shrinker_list, list) { 1050 1068 struct shrink_control sc = { 1051 1069 .gfp_mask = gfp_mask, 1052 1070 .nid = nid, ··· 1056 1076 if (ret == SHRINK_EMPTY) 1057 1077 ret = 0; 1058 1078 freed += ret; 1059 - 1060 - if (atomic_read(&shrinker_srcu_generation) != generation) { 1079 + /* 1080 + * Bail out if someone want to register a new shrinker to 1081 + * prevent the registration from being stalled for long periods 1082 + * by parallel ongoing shrinking. 1083 + */ 1084 + if (rwsem_is_contended(&shrinker_rwsem)) { 1061 1085 freed = freed ?
: 1; 1062 1086 break; 1063 1087 } 1064 1088 } 1065 1089 1066 - srcu_read_unlock(&shrinker_srcu, srcu_idx); 1090 + up_read(&shrinker_rwsem); 1091 + out: 1067 1092 cond_resched(); 1068 1093 return freed; 1069 1094 } ··· 4759 4774 { 4760 4775 int seg; 4761 4776 int old, new; 4777 + unsigned long flags; 4762 4778 int bin = get_random_u32_below(MEMCG_NR_BINS); 4763 4779 struct pglist_data *pgdat = lruvec_pgdat(lruvec); 4764 4780 4765 - spin_lock(&pgdat->memcg_lru.lock); 4781 + spin_lock_irqsave(&pgdat->memcg_lru.lock, flags); 4766 4782 4767 4783 VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list)); 4768 4784 ··· 4798 4812 if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq)) 4799 4813 WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1); 4800 4814 4801 - spin_unlock(&pgdat->memcg_lru.lock); 4815 + spin_unlock_irqrestore(&pgdat->memcg_lru.lock, flags); 4802 4816 } 4803 4817 4804 4818 void lru_gen_online_memcg(struct mem_cgroup *memcg) ··· 4811 4825 struct pglist_data *pgdat = NODE_DATA(nid); 4812 4826 struct lruvec *lruvec = get_lruvec(memcg, nid); 4813 4827 4814 - spin_lock(&pgdat->memcg_lru.lock); 4828 + spin_lock_irq(&pgdat->memcg_lru.lock); 4815 4829 4816 4830 VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list)); 4817 4831 ··· 4822 4836 4823 4837 lruvec->lrugen.gen = gen; 4824 4838 4825 - spin_unlock(&pgdat->memcg_lru.lock); 4839 + spin_unlock_irq(&pgdat->memcg_lru.lock); 4826 4840 } 4827 4841 } 4828 4842 ··· 4846 4860 struct pglist_data *pgdat = NODE_DATA(nid); 4847 4861 struct lruvec *lruvec = get_lruvec(memcg, nid); 4848 4862 4849 - spin_lock(&pgdat->memcg_lru.lock); 4863 + spin_lock_irq(&pgdat->memcg_lru.lock); 4850 4864 4851 4865 VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list)); 4852 4866 ··· 4858 4872 if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq)) 4859 4873 WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1); 4860 4874 4861 -
spin_unlock(&pgdat->memcg_lru.lock); 4875 + spin_unlock_irq(&pgdat->memcg_lru.lock); 4862 4876 } 4863 4877 } 4864 4878
+9 -2
mm/zswap.c
··· 1229 1229 goto reject; 1230 1230 } 1231 1231 1232 + /* 1233 + * XXX: zswap reclaim does not work with cgroups yet. Without a 1234 + * cgroup-aware entry LRU, we will push out entries system-wide based on 1235 + * local cgroup limits. 1236 + */ 1232 1237 objcg = get_obj_cgroup_from_page(page); 1233 - if (objcg && !obj_cgroup_may_zswap(objcg)) 1234 - goto shrink; 1238 + if (objcg && !obj_cgroup_may_zswap(objcg)) { 1239 + ret = -ENOMEM; 1240 + goto reject; 1241 + } 1235 1242 1236 1243 /* reclaim space if needed */ 1237 1244 if (zswap_is_full()) {
+6 -6
scripts/gdb/linux/constants.py.in
··· 48 48 LX_GDBPARSED(CLK_GET_RATE_NOCACHE) 49 49 50 50 /* linux/fs.h */ 51 - LX_VALUE(SB_RDONLY) 52 - LX_VALUE(SB_SYNCHRONOUS) 53 - LX_VALUE(SB_MANDLOCK) 54 - LX_VALUE(SB_DIRSYNC) 55 - LX_VALUE(SB_NOATIME) 56 - LX_VALUE(SB_NODIRATIME) 51 + LX_GDBPARSED(SB_RDONLY) 52 + LX_GDBPARSED(SB_SYNCHRONOUS) 53 + LX_GDBPARSED(SB_MANDLOCK) 54 + LX_GDBPARSED(SB_DIRSYNC) 55 + LX_GDBPARSED(SB_NOATIME) 56 + LX_GDBPARSED(SB_NODIRATIME) 57 57 58 58 /* linux/htimer.h */ 59 59 LX_GDBPARSED(hrtimer_resolution)
+3 -3
scripts/gfp-translate
··· 63 63 64 64 # Extract GFP flags from the kernel source 65 65 TMPFILE=`mktemp -t gfptranslate-XXXXXX` || exit 1 66 - grep -q ___GFP $SOURCE/include/linux/gfp.h 66 + grep -q ___GFP $SOURCE/include/linux/gfp_types.h 67 67 if [ $? -eq 0 ]; then 68 - grep "^#define ___GFP" $SOURCE/include/linux/gfp.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE 68 + grep "^#define ___GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE 69 69 else 70 - grep "^#define __GFP" $SOURCE/include/linux/gfp.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)\//) \//' > $TMPFILE 70 + grep "^#define __GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)\//) \//' > $TMPFILE 71 71 fi 72 72 73 73 # Parse the flags
+3 -2
tools/testing/radix-tree/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 3 - CFLAGS += -I. -I../../include -g -Og -Wall -D_LGPL_SOURCE -fsanitize=address \ 4 - -fsanitize=undefined 3 + CFLAGS += -I. -I../../include -I../../../lib -g -Og -Wall \ 4 + -D_LGPL_SOURCE -fsanitize=address -fsanitize=undefined 5 5 LDFLAGS += -fsanitize=address -fsanitize=undefined 6 6 LDLIBS+= -lpthread -lurcu 7 7 TARGETS = main idr-test multiorder xarray maple ··· 49 49 ../../../include/linux/xarray.h \ 50 50 ../../../include/linux/maple_tree.h \ 51 51 ../../../include/linux/radix-tree.h \ 52 + ../../../lib/radix-tree.h \ 52 53 ../../../include/linux/idr.h 53 54 54 55 radix-tree.c: ../../../lib/radix-tree.c
+24
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack-legacy.tc
··· 1 + #!/bin/sh 2 + # SPDX-License-Identifier: GPL-2.0 3 + # description: event trigger - test inter-event histogram trigger trace action with dynamic string param (legacy stack) 4 + # requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[] stack' >> synthetic_events":README 5 + 6 + fail() { #msg 7 + echo $1 8 + exit_fail 9 + } 10 + 11 + echo "Test create synthetic event with stack" 12 + 13 + # Test the old stacktrace keyword (for backward compatibility) 14 + echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events 15 + echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger 16 + echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger 17 + echo 1 > events/synthetic/wake_lat/enable 18 + sleep 1 19 + 20 + if ! grep -q "=>.*sched" trace; then 21 + fail "Failed to create synthetic event with stack" 22 + fi 23 + 24 + exit 0
+2 -3
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc
··· 1 1 #!/bin/sh 2 2 # SPDX-License-Identifier: GPL-2.0 3 3 # description: event trigger - test inter-event histogram trigger trace action with dynamic string param 4 - # requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[]' >> synthetic_events":README 4 + # requires: set_event synthetic_events events/sched/sched_process_exec/hist "can be any field, or the special string 'common_stacktrace'":README 5 5 6 6 fail() { #msg 7 7 echo $1 ··· 10 10 11 11 echo "Test create synthetic event with stack" 12 12 13 - 14 13 echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events 15 - echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger 14 + echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=common_stacktrace if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger 16 15 echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger 17 16 echo 1 > events/synthetic/wake_lat/enable 18 17 sleep 1
+8 -5
tools/testing/selftests/mm/Makefile
··· 5 5 6 6 include local_config.mk 7 7 8 + ifeq ($(ARCH),) 9 + 8 10 ifeq ($(CROSS_COMPILE),) 9 11 uname_M := $(shell uname -m 2>/dev/null || echo not) 10 12 else 11 13 uname_M := $(shell echo $(CROSS_COMPILE) | grep -o '^[a-z0-9]\+') 12 14 endif 13 - MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/') 15 + ARCH ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/') 16 + endif 14 17 15 18 # Without this, failed build products remain, with up-to-date timestamps, 16 19 # thus tricking Make (and you!) into believing that All Is Well, in subsequent ··· 69 66 TEST_GEN_PROGS += ksm_functional_tests 70 67 TEST_GEN_PROGS += mdwe_test 71 68 72 - ifeq ($(MACHINE),x86_64) 69 + ifeq ($(ARCH),x86_64) 73 70 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32) 74 71 CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c) 75 72 CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie) ··· 91 88 endif 92 89 else 93 90 94 - ifneq (,$(findstring $(MACHINE),ppc64)) 91 + ifneq (,$(findstring $(ARCH),ppc64)) 95 92 TEST_GEN_PROGS += protection_keys 96 93 endif 97 94 98 95 endif 99 96 100 - ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sparc64 x86_64)) 97 + ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sparc64 x86_64)) 101 98 TEST_GEN_PROGS += va_high_addr_switch 102 99 TEST_GEN_PROGS += virtual_address_range 103 100 TEST_GEN_PROGS += write_to_hugetlbfs ··· 116 113 $(OUTPUT)/uffd-stress: uffd-common.c 117 114 $(OUTPUT)/uffd-unit-tests: uffd-common.c 118 115 119 - ifeq ($(MACHINE),x86_64) 116 + ifeq ($(ARCH),x86_64) 120 117 BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32)) 121 118 BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64)) 122 119