···
 D: Initial implementation of VC's, pty's and select()

 N: Pavel Machek
-E: pavel@ucw.cz
+E: pavel@kernel.org
 P: 4096R/92DFCE96 4FA7 9EEF FCD4 C44F C585 B8C7 C060 2241 92DF CE96
-D: Softcursor for vga, hypertech cdrom support, vcsa bugfix, nbd,
-D: sun4/330 port, capabilities for elf, speedup for rm on ext2, USB,
-D: work on suspend-to-ram/disk, killing duplicates from ioctl32,
+D: NBD, Sun4/330 port, USB, work on suspend-to-ram/disk,
 D: Altera SoCFPGA and Nokia N900 support.
 S: Czech Republic
···
 description: |
   The Microchip LAN966x outband interrupt controller (OIC) maps the internal
-  interrupt sources of the LAN966x device to an external interrupt.
-  When the LAN966x device is used as a PCI device, the external interrupt is
-  routed to the PCI interrupt.
+  interrupt sources of the LAN966x device to a PCI interrupt when the LAN966x
+  device is used as a PCI device.

 properties:
   compatible:
+Submitting patches to bcachefs:
+===============================
+
+Patches must be tested before being submitted, either with the xfstests suite
+[0], or the full bcachefs test suite in ktest [1], depending on what's being
+touched. Note that ktest wraps xfstests and will be an easier way to run it
+for most users; it includes single-command wrappers for all the mainstream
+in-kernel local filesystems.
+
+Patches will undergo more testing after being merged (including
+lockdep/kasan/preempt/etc. variants); these are not generally required to be
+run by the submitter - but do put some thought into what you're changing and
+which tests might be relevant. E.g. if you're dealing with tricky memory
+layout work, run kasan; if you're doing locking work, run lockdep. ktest
+includes single-command variants for the debug build types you'll most likely
+need.
+
+The exception to this rule is incomplete WIP/RFC patches: if you're working on
+something nontrivial, it's encouraged to send out a WIP patch to let people
+know what you're doing and make sure you're on the right track. Just make sure
+it includes a brief note as to what's done and what's incomplete, to avoid
+confusion.
+
+Rigorous checkpatch.pl adherence is not required (many of its warnings are
+considered out of date), but try not to deviate too much without reason.
+
+Focus on writing code that reads well and is organized well; code should be
+aesthetically pleasing.
+
+CI:
+===
+
+Instead of running your tests locally, when running the full test suite it's
+preferable to let a server farm do it in parallel, and then have the results
+in a nice test dashboard (which can tell you which failures are new, and
+presents results in a git log view, avoiding the need for most bisecting).
+
+That exists [2], and community members may request an account. If you work for
+a big tech company, you'll need to help out with server costs to get access -
+but the CI is not restricted to running bcachefs tests: it runs any ktest test
+(which generally makes it easy to wrap other tests that can run in qemu).
+
+Other things to think about:
+============================
+
+- How will we debug this code? Is there sufficient introspection to diagnose
+  when something starts acting wonky on a user machine?
+
+  We don't necessarily need every single field of every data structure visible
+  with introspection, but having the important fields of all the core data
+  types wired up makes debugging drastically easier - a bit of thoughtful
+  foresight greatly reduces the need to have people build custom kernels with
+  debug patches.
+
+  More broadly, think about all the debug tooling that might be needed.
+
+- Does it make the codebase more or less of a mess? Can we do some organizing,
+  too?
+
+- Do new tests need to be written? New assertions? How do we know and verify
+  that the code is correct, and what happens if something goes wrong?
+
+  We don't yet have automated code coverage analysis or easy fault injection -
+  but for now, pretend we do and ask what they might tell us.
+
+  Assertions are hugely important, given that we don't yet have a systems
+  language that can do ergonomic embedded correctness proofs. Hitting an assert
+  in testing is much better than wandering off into undefined behaviour la-la
+  land - use them. Use them judiciously, and not as a replacement for proper
+  error handling, but use them.
+
+- Does it need to be performance tested? Should we add new performance
+  counters?
+
+  bcachefs has a set of persistent runtime counters which can be viewed with
+  the 'bcachefs fs top' command; this should give users a basic idea of what
+  their filesystem is currently doing. If you're adding a new feature or
+  looking at old code, think about whether anything should be added.
+
+- If it's a new on-disk format feature - have upgrades and downgrades been
+  tested? (Automated tests exist but aren't in the CI, due to the hassle of
+  disk image management; coordinate to have them run.)
+
+Mailing list, IRC:
+==================
+
+Patches should hit the list [3], but much discussion and code review happens on
+IRC as well [4]; many people appreciate the more conversational approach and
+quicker feedback.
+
+Additionally, we have a lively user community doing excellent QA work, which
+exists primarily on IRC. Please make use of that resource; user feedback is
+important for any nontrivial feature, and documenting it in commit messages
+would be a good idea.
+
+[0]: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
+[1]: https://evilpiepirate.org/git/ktest.git/
+[2]: https://evilpiepirate.org/~testdashboard/ci/
+[3]: linux-bcachefs@vger.kernel.org
+[4]: irc.oftc.net#bcache, #bcachefs-dev
···
 S390:
 ^^^^^

-Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set.
+Returns -EINVAL or -EEXIST if the VM has the KVM_VM_S390_UCONTROL flag set.
 Returns -EINVAL if called on a protected VM.

 4.36 KVM_SET_TSS_ADDR
MAINTAINERS (+50 -7)
···
 F:	sound/soc/codecs/ssm3515.c

 ARM/APPLE MACHINE SUPPORT
-M:	Hector Martin <marcan@marcan.st>
 M:	Sven Peter <sven@svenpeter.dev>
 R:	Alyssa Rosenzweig <alyssa@rosenzweig.io>
 L:	asahi@lists.linux.dev
···
 L:	linux-bcachefs@vger.kernel.org
 S:	Supported
 C:	irc://irc.oftc.net/bcache
+P:	Documentation/filesystems/bcachefs/SubmittingPatches.rst
 T:	git https://evilpiepirate.org/git/bcachefs.git
 F:	fs/bcachefs/
 F:	Documentation/filesystems/bcachefs/
···
 FREEZER
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 F:	Documentation/power/freezing-of-tasks.rst
···
 F:	drivers/staging/gpib/

 GPIO ACPI SUPPORT
-M:	Mika Westerberg <mika.westerberg@linux.intel.com>
+M:	Mika Westerberg <westeri@kernel.org>
 M:	Andy Shevchenko <andriy.shevchenko@linux.intel.com>
 L:	linux-gpio@vger.kernel.org
 L:	linux-acpi@vger.kernel.org
···
 HIBERNATION (aka Software Suspend, aka swsusp)
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 B:	https://bugzilla.kernel.org
···
 F:	scripts/leaking_addresses.pl

 LED SUBSYSTEM
-M:	Pavel Machek <pavel@ucw.cz>
 M:	Lee Jones <lee@kernel.org>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-leds@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git
···
 F:	net/dsa/
 F:	tools/testing/selftests/drivers/net/dsa/

+NETWORKING [ETHTOOL]
+M:	Andrew Lunn <andrew@lunn.ch>
+M:	Jakub Kicinski <kuba@kernel.org>
+F:	Documentation/netlink/specs/ethtool.yaml
+F:	Documentation/networking/ethtool-netlink.rst
+F:	include/linux/ethtool*
+F:	include/uapi/linux/ethtool*
+F:	net/ethtool/
+F:	tools/testing/selftests/drivers/net/*/ethtool*
+
+NETWORKING [ETHTOOL CABLE TEST]
+M:	Andrew Lunn <andrew@lunn.ch>
+F:	net/ethtool/cabletest.c
+F:	tools/testing/selftests/drivers/net/*/ethtool*
+K:	cable_test
+
 NETWORKING [GENERAL]
 M:	"David S. Miller" <davem@davemloft.net>
 M:	Eric Dumazet <edumazet@google.com>
···
 NETWORKING [TCP]
 M:	Eric Dumazet <edumazet@google.com>
 M:	Neal Cardwell <ncardwell@google.com>
+R:	Kuniyuki Iwashima <kuniyu@amazon.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	Documentation/networking/net_cachelines/tcp_sock.rst
···
 F:	include/net/tls.h
 F:	include/uapi/linux/tls.h
 F:	net/tls/*
+
+NETWORKING [SOCKETS]
+M:	Eric Dumazet <edumazet@google.com>
+M:	Kuniyuki Iwashima <kuniyu@amazon.com>
+M:	Paolo Abeni <pabeni@redhat.com>
+M:	Willem de Bruijn <willemb@google.com>
+S:	Maintained
+F:	include/linux/sock_diag.h
+F:	include/linux/socket.h
+F:	include/linux/sockptr.h
+F:	include/net/sock.h
+F:	include/net/sock_reuseport.h
+F:	include/uapi/linux/socket.h
+F:	net/core/*sock*
+F:	net/core/scm.c
+F:	net/socket.c
+
+NETWORKING [UNIX SOCKETS]
+M:	Kuniyuki Iwashima <kuniyu@amazon.com>
+S:	Maintained
+F:	include/net/af_unix.h
+F:	include/net/netns/unix.h
+F:	include/uapi/linux/unix_diag.h
+F:	net/unix/
+F:	tools/testing/selftests/net/af_unix/

 NETXEN (1/10) GbE SUPPORT
 M:	Manish Chopra <manishc@marvell.com>
···
 F:	kernel/time/tick*.*

 NOKIA N900 CAMERA SUPPORT (ET8EK8 SENSOR, AD5820 FOCUS)
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 M:	Sakari Ailus <sakari.ailus@iki.fi>
 L:	linux-media@vger.kernel.org
 S:	Maintained
···
 L:	dev@openvswitch.org
 S:	Maintained
 W:	http://openvswitch.org
+F:	Documentation/networking/openvswitch.rst
 F:	include/uapi/linux/openvswitch.h
 F:	net/openvswitch/
 F:	tools/testing/selftests/net/openvswitch/
···
 SUSPEND TO RAM
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
 M:	Len Brown <len.brown@intel.com>
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 B:	https://bugzilla.kernel.org
···
 /*
  * This is used to ensure we don't load something for the wrong architecture.
  */
-#define elf_check_arch(x) ((x)->e_machine == EM_ALPHA)
+#define elf_check_arch(x) (((x)->e_machine == EM_ALPHA) && !((x)->e_flags & EF_ALPHA_32BIT))

 /*
  * These are used to set parameters in the core dumps.
···
 	( i_ == IMPLVER_EV5 ? "ev56"			\
 	  : amask (AMASK_CIX) ? "ev6" : "ev67");	\
 })
-
-#define SET_PERSONALITY(EX)					\
-	set_personality(((EX).e_flags & EF_ALPHA_32BIT)		\
-		? PER_LINUX_32BIT : PER_LINUX)

 extern int alpha_l1i_cacheshape;
 extern int alpha_l1d_cacheshape;
arch/alpha/include/asm/pgtable.h (+1 -1)
···
 extern void paging_init(void);

-/* We have our own get_unmapped_area to cope with ADDR_LIMIT_32BIT.  */
+/* We have our own get_unmapped_area */
 #define HAVE_ARCH_UNMAPPED_AREA

 #endif /* _ALPHA_PGTABLE_H */
arch/alpha/include/asm/processor.h (+2 -6)
···
 #ifndef __ASM_ALPHA_PROCESSOR_H
 #define __ASM_ALPHA_PROCESSOR_H

-#include <linux/personality.h>	/* for ADDR_LIMIT_32BIT */
-
 /*
  * We have a 42-bit user address space: 4TB user VM...
  */
 #define TASK_SIZE (0x40000000000UL)

-#define STACK_TOP \
-  (current->personality & ADDR_LIMIT_32BIT ? 0x80000000 : 0x00120000000UL)
+#define STACK_TOP	(0x00120000000UL)

 #define STACK_TOP_MAX	0x00120000000UL

 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
-#define TASK_UNMAPPED_BASE \
-  ((current->personality & ADDR_LIMIT_32BIT) ? 0x40000000 : TASK_SIZE / 2)
+#define TASK_UNMAPPED_BASE	(TASK_SIZE / 2)

 /* This is dead.  Everything has been moved to thread_info. */
 struct thread_struct { };
arch/alpha/kernel/osf_sys.c (+2 -9)
···
 	return ret;
 }

-/* Get an address range which is currently unmapped.  Similar to the
-   generic version except that we know how to honor ADDR_LIMIT_32BIT.  */
+/* Get an address range which is currently unmapped. */

 static unsigned long
 arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
···
 		       unsigned long len, unsigned long pgoff,
 		       unsigned long flags, vm_flags_t vm_flags)
 {
-	unsigned long limit;
-
-	/* "32 bit" actually means 31 bit, since pointers sign extend.  */
-	if (current->personality & ADDR_LIMIT_32BIT)
-		limit = 0x80000000;
-	else
-		limit = TASK_SIZE;
+	unsigned long limit = TASK_SIZE;

 	if (len > limit)
 		return -ENOMEM;
arch/arm64/kvm/arch_timer.c (+11 -38)
···
 	trace_kvm_timer_emulate(ctx, should_fire);

-	if (should_fire != ctx->irq.level) {
+	if (should_fire != ctx->irq.level)
 		kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
-		return;
-	}

 	kvm_timer_update_status(ctx, should_fire);
···
 					    timer_irq(map->direct_ptimer),
 					    &arch_timer_irq_ops);
 		WARN_ON_ONCE(ret);
-
-		/*
-		 * The virtual offset behaviour is "interesting", as it
-		 * always applies when HCR_EL2.E2H==0, but only when
-		 * accessed from EL1 when HCR_EL2.E2H==1. So make sure we
-		 * track E2H when putting the HV timer in "direct" mode.
-		 */
-		if (map->direct_vtimer == vcpu_hvtimer(vcpu)) {
-			struct arch_timer_offset *offs = &map->direct_vtimer->offset;
-
-			if (vcpu_el2_e2h_is_set(vcpu))
-				offs->vcpu_offset = NULL;
-			else
-				offs->vcpu_offset = &__vcpu_sys_reg(vcpu, CNTVOFF_EL2);
-		}
 	}
 }
···
 	 * which allows trapping of the timer registers even with NV2.
	 * Still, this is still worse than FEAT_NV on its own. Meh.
	 */
-	if (!vcpu_el2_e2h_is_set(vcpu)) {
-		if (cpus_have_final_cap(ARM64_HAS_ECV))
-			return;
-
-		/*
-		 * A non-VHE guest hypervisor doesn't have any direct access
-		 * to its timers: the EL2 registers trap (and the HW is
-		 * fully emulated), while the EL0 registers access memory
-		 * despite the access being notionally direct. Boo.
-		 *
-		 * We update the hardware timer registers with the
-		 * latest value written by the guest to the VNCR page
-		 * and let the hardware take care of the rest.
-		 */
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0), SYS_CNTV_CTL);
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0), SYS_CNTP_CTL);
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
-	} else {
+	if (!cpus_have_final_cap(ARM64_HAS_ECV)) {
 		/*
 		 * For a VHE guest hypervisor, the EL2 state is directly
-		 * stored in the host EL1 timers, while the emulated EL0
+		 * stored in the host EL1 timers, while the emulated EL1
 		 * state is stored in the VNCR page. The latter could have
 		 * been updated behind our back, and we must reset the
 		 * emulation of the timers.
+		 *
+		 * A non-VHE guest hypervisor doesn't have any direct access
+		 * to its timers: the EL2 registers trap despite being
+		 * notionally direct (we use the EL1 HW, as for VHE), while
+		 * the EL1 registers access memory.
+		 *
+		 * In both cases, process the emulated timers on each guest
+		 * exit. Boo.
 		 */
 		struct timer_map map;
 		get_timer_map(vcpu, &map);
arch/arm64/kvm/arm.c (+20)
···
 		break;
 	case -ENODEV:
 	case -ENXIO:
+		/*
+		 * No VGIC? No pKVM for you.
+		 *
+		 * Protected mode assumes that VGICv3 is present, so no point
+		 * in trying to hobble along if vgic initialization fails.
+		 */
+		if (is_protected_kvm_enabled())
+			goto out;
+
+		/*
+		 * Otherwise, userspace could choose to implement a GIC for its
+		 * guest on non-cooperative hardware.
+		 */
 		vgic_present = false;
 		err = 0;
 		break;
···
 	kvm_nvhe_sym(id_aa64smfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64SMFR0_EL1);
 	kvm_nvhe_sym(__icache_flags) = __icache_flags;
 	kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
+
+	/*
+	 * Flush entire BSS since part of its data containing init symbols is read
+	 * while the MMU is off.
+	 */
+	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
+				kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));
 }

 static int __init kvm_hyp_init_protection(u32 hyp_va_bits)
···
 	if (!tmp)
 		return -ENOMEM;

+	swap(kvm->arch.nested_mmus, tmp);
+
 	/*
 	 * If we went through a reallocation, adjust the MMU back-pointers in
 	 * the previously initialised kvm_pgtable structures.
 	 */
 	if (kvm->arch.nested_mmus != tmp)
 		for (int i = 0; i < kvm->arch.nested_mmus_size; i++)
-			tmp[i].pgt->mmu = &tmp[i];
+			kvm->arch.nested_mmus[i].pgt->mmu = &kvm->arch.nested_mmus[i];

 	for (int i = kvm->arch.nested_mmus_size; !ret && i < num_mmus; i++)
-		ret = init_nested_s2_mmu(kvm, &tmp[i]);
+		ret = init_nested_s2_mmu(kvm, &kvm->arch.nested_mmus[i]);

 	if (ret) {
 		for (int i = kvm->arch.nested_mmus_size; i < num_mmus; i++)
-			kvm_free_stage2_pgd(&tmp[i]);
+			kvm_free_stage2_pgd(&kvm->arch.nested_mmus[i]);

 		return ret;
 	}

 	kvm->arch.nested_mmus_size = num_mmus;
-	kvm->arch.nested_mmus = tmp;

 	return 0;
 }
···
 /**
  * struct gmap_struct - guest address space
  * @list: list head for the mm->context gmap list
- * @crst_list: list of all crst tables used in the guest address space
  * @mm: pointer to the parent mm_struct
  * @guest_to_host: radix tree with guest to host address translation
  * @host_to_guest: radix tree with pointer to segment table entries
···
  * @guest_handle: protected virtual machine handle for the ultravisor
  * @host_to_rmap: radix tree with gmap_rmap lists
  * @children: list of shadow gmap structures
- * @pt_list: list of all page tables used in the shadow guest address space
  * @shadow_lock: spinlock to protect the shadow gmap list
  * @parent: pointer to the parent gmap for shadow guest address spaces
  * @orig_asce: ASCE for which the shadow page table has been created
···
  */
 struct gmap {
 	struct list_head list;
-	struct list_head crst_list;
 	struct mm_struct *mm;
 	struct radix_tree_root guest_to_host;
 	struct radix_tree_root host_to_guest;
···
 	/* Additional data for shadow guest address spaces */
 	struct radix_tree_root host_to_rmap;
 	struct list_head children;
-	struct list_head pt_list;
 	spinlock_t shadow_lock;
 	struct gmap *parent;
 	unsigned long orig_asce;
···
 void gmap_remove(struct gmap *gmap);
 struct gmap *gmap_get(struct gmap *gmap);
 void gmap_put(struct gmap *gmap);
+void gmap_free(struct gmap *gmap);
+struct gmap *gmap_alloc(unsigned long limit);

 int gmap_map_segment(struct gmap *gmap, unsigned long from,
 		     unsigned long to, unsigned long len);
 int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len);
 unsigned long __gmap_translate(struct gmap *, unsigned long gaddr);
-unsigned long gmap_translate(struct gmap *, unsigned long gaddr);
 int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr);
-int gmap_fault(struct gmap *, unsigned long gaddr, unsigned int fault_flags);
 void gmap_discard(struct gmap *, unsigned long from, unsigned long to);
 void __gmap_zap(struct gmap *, unsigned long gaddr);
 void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);

 int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);

-struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
-			 int edat_level);
-int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level);
+void gmap_unshadow(struct gmap *sg);
 int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
 		    int fake);
 int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
···
 		    int fake);
 int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 		    int fake);
-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
-			   unsigned long *pgt, int *dat_protection, int *fake);
 int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);

 void gmap_register_pte_notifier(struct gmap_notifier *);
 void gmap_unregister_pte_notifier(struct gmap_notifier *);

-int gmap_mprotect_notify(struct gmap *, unsigned long start,
-			 unsigned long len, int prot);
+int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits);

 void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
 int s390_disable_cow_sharing(void);
-void s390_unlist_old_asce(struct gmap *gmap);
 int s390_replace_asce(struct gmap *gmap);
 void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
 int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
 			    unsigned long end, bool interruptible);
+int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split);
+unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);

 /**
  * s390_uv_destroy_range - Destroy a range of pages in the given mm.
arch/s390/include/asm/kvm_host.h (+5 -1)
···
 #define KVM_S390_ESCA_CPU_SLOTS 248
 #define KVM_MAX_VCPUS 255

+#define KVM_INTERNAL_MEM_SLOTS 1
+
 /*
  * These seem to be used for allocating ->chip in the routing table, which we
  * don't use. 1 is as small as we can get to reduce the needed memory. If we
···
 	u8 reserved928[0x1000 - 0x928];		/* 0x0928 */
 };

+struct vsie_page;
+
 struct kvm_s390_vsie {
 	struct mutex mutex;
 	struct radix_tree_root addr_to_page;
 	int page_count;
 	int next;
-	struct page *pages[KVM_MAX_VCPUS];
+	struct vsie_page *pages[KVM_MAX_VCPUS];
 };

 struct kvm_s390_gisa_iam {
arch/s390/include/asm/pgtable.h (+18 -3)
···
 #define PGSTE_HC_BIT	0x0020000000000000UL
 #define PGSTE_GR_BIT	0x0004000000000000UL
 #define PGSTE_GC_BIT	0x0002000000000000UL
-#define PGSTE_UC_BIT	0x0000800000000000UL	/* user dirty (migration) */
-#define PGSTE_IN_BIT	0x0000400000000000UL	/* IPTE notify bit */
-#define PGSTE_VSIE_BIT	0x0000200000000000UL	/* ref'd in a shadow table */
+#define PGSTE_ST2_MASK	0x0000ffff00000000UL
+#define PGSTE_UC_BIT	0x0000000000008000UL	/* user dirty (migration) */
+#define PGSTE_IN_BIT	0x0000000000004000UL	/* IPTE notify bit */
+#define PGSTE_VSIE_BIT	0x0000000000002000UL	/* ref'd in a shadow table */

 /* Guest Page State used for virtualization */
 #define _PGSTE_GPS_ZERO			0x0000000080000000UL
···

 #define pmd_pgtable(pmd) \
	((pgtable_t)__va(pmd_val(pmd) & -sizeof(pte_t)*PTRS_PER_PTE))
+
+static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
+{
+	unsigned long *pgstes, res;
+
+	pgstes = pgt + _PAGE_ENTRIES;
+
+	res = (pgstes[0] & PGSTE_ST2_MASK) << 16;
+	res |= pgstes[1] & PGSTE_ST2_MASK;
+	res |= (pgstes[2] & PGSTE_ST2_MASK) >> 16;
+	res |= (pgstes[3] & PGSTE_ST2_MASK) >> 32;
+
+	return res;
+}

 #endif /* _S390_PAGE_H */
arch/s390/include/asm/uv.h (+3 -3)
···
 }

 int uv_pin_shared(unsigned long paddr);
-int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
-int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);
 int uv_destroy_folio(struct folio *folio);
 int uv_destroy_pte(pte_t pte);
 int uv_convert_from_secure_pte(pte_t pte);
-int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);
+int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb);
+int uv_convert_from_secure(unsigned long paddr);
+int uv_convert_from_secure_folio(struct folio *folio);

 void setup_uv(void);
arch/s390/kernel/uv.c (+29 -263)
···
 #include <asm/sections.h>
 #include <asm/uv.h>

-#if !IS_ENABLED(CONFIG_KVM)
-unsigned long __gmap_translate(struct gmap *gmap, unsigned long gaddr)
-{
-	return 0;
-}
-
-int gmap_fault(struct gmap *gmap, unsigned long gaddr,
-	       unsigned int fault_flags)
-{
-	return 0;
-}
-#endif
-
 /* the bootdata_preserved fields come from ones in arch/s390/boot/uv.c */
 int __bootdata_preserved(prot_virt_guest);
 EXPORT_SYMBOL(prot_virt_guest);
···
 	folio_put(folio);
 	return rc;
 }
+EXPORT_SYMBOL(uv_destroy_folio);

 /*
  * The present PTE still indirectly holds a folio reference through the mapping.
···
  *
  * @paddr: Absolute host address of page to be exported
  */
-static int uv_convert_from_secure(unsigned long paddr)
+int uv_convert_from_secure(unsigned long paddr)
 {
 	struct uv_cb_cfs uvcb = {
 		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
···
 		return -EINVAL;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(uv_convert_from_secure);

 /*
  * The caller must already hold a reference to the folio.
  */
-static int uv_convert_from_secure_folio(struct folio *folio)
+int uv_convert_from_secure_folio(struct folio *folio)
 {
 	int rc;
···
 	folio_put(folio);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(uv_convert_from_secure_folio);

 /*
  * The present PTE still indirectly holds a folio reference through the mapping.
···
 	return res;
 }

-static int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
+/**
+ * make_folio_secure() - make a folio secure
+ * @folio: the folio to make secure
+ * @uvcb: the uvcb that describes the UVC to be used
+ *
+ * The folio @folio will be made secure if possible, @uvcb will be passed
+ * as-is to the UVC.
+ *
+ * Return: 0 on success;
+ *         -EBUSY if the folio is in writeback or has too many references;
+ *         -E2BIG if the folio is large;
+ *         -EAGAIN if the UVC needs to be attempted again;
+ *         -ENXIO if the address is not mapped;
+ *         -EINVAL if the UVC failed for other reasons.
+ *
+ * Context: The caller must hold exactly one extra reference on the folio
+ *          (it's the same logic as split_folio())
+ */
+int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 {
 	int expected, cc = 0;

+	if (folio_test_large(folio))
+		return -E2BIG;
 	if (folio_test_writeback(folio))
-		return -EAGAIN;
-	expected = expected_folio_refs(folio);
+		return -EBUSY;
+	expected = expected_folio_refs(folio) + 1;
 	if (!folio_ref_freeze(folio, expected))
 		return -EBUSY;
 	set_bit(PG_arch_1, &folio->flags);
···
 		return -EAGAIN;
 	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
 }
-
-/**
- * should_export_before_import - Determine whether an export is needed
- *				 before an import-like operation
- * @uvcb: the Ultravisor control block of the UVC to be performed
- * @mm: the mm of the process
- *
- * Returns whether an export is needed before every import-like operation.
- * This is needed for shared pages, which don't trigger a secure storage
- * exception when accessed from a different guest.
- *
- * Although considered as one, the Unpin Page UVC is not an actual import,
- * so it is not affected.
- *
- * No export is needed also when there is only one protected VM, because the
- * page cannot belong to the wrong VM in that case (there is no "other VM"
- * it can belong to).
- *
- * Return: true if an export is needed before every import, otherwise false.
- */
-static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
-{
-	/*
-	 * The misc feature indicates, among other things, that importing a
-	 * shared page from a different protected VM will automatically also
-	 * transfer its ownership.
-	 */
-	if (uv_has_feature(BIT_UV_FEAT_MISC))
-		return false;
-	if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
-		return false;
-	return atomic_read(&mm->context.protected_count) > 1;
-}
-
-/*
- * Drain LRU caches: the local one on first invocation and the ones of all
- * CPUs on successive invocations. Returns "true" on the first invocation.
- */
-static bool drain_lru(bool *drain_lru_called)
-{
-	/*
-	 * If we have tried a local drain and the folio refcount
-	 * still does not match our expected safe value, try with a
-	 * system wide drain. This is needed if the pagevecs holding
-	 * the page are on a different CPU.
-	 */
-	if (*drain_lru_called) {
-		lru_add_drain_all();
-		/* We give up here, don't retry immediately. */
-		return false;
-	}
-	/*
-	 * We are here if the folio refcount does not match the
-	 * expected safe value. The main culprits are usually
-	 * pagevecs. With lru_add_drain() we drain the pagevecs
-	 * on the local CPU so that hopefully the refcount will
-	 * reach the expected safe value.
-	 */
-	lru_add_drain();
-	*drain_lru_called = true;
-	/* The caller should try again immediately */
-	return true;
-}
-
-/*
- * Requests the Ultravisor to make a page accessible to a guest.
- * If it's brought in the first time, it will be cleared. If
- * it has been exported before, it will be decrypted and integrity
- * checked.
- */
-int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
-{
-	struct vm_area_struct *vma;
-	bool drain_lru_called = false;
-	spinlock_t *ptelock;
-	unsigned long uaddr;
-	struct folio *folio;
-	pte_t *ptep;
-	int rc;
-
-again:
-	rc = -EFAULT;
-	mmap_read_lock(gmap->mm);
-
-	uaddr = __gmap_translate(gmap, gaddr);
-	if (IS_ERR_VALUE(uaddr))
-		goto out;
-	vma = vma_lookup(gmap->mm, uaddr);
-	if (!vma)
-		goto out;
-	/*
-	 * Secure pages cannot be huge and userspace should not combine both.
-	 * In case userspace does it anyway this will result in an -EFAULT for
-	 * the unpack. The guest is thus never reaching secure mode. If
-	 * userspace is playing dirty tricks with mapping huge pages later
-	 * on this will result in a segmentation fault.
-	 */
-	if (is_vm_hugetlb_page(vma))
-		goto out;
-
-	rc = -ENXIO;
-	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
-	if (!ptep)
-		goto out;
-	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
-		folio = page_folio(pte_page(*ptep));
-		rc = -EAGAIN;
-		if (folio_test_large(folio)) {
-			rc = -E2BIG;
-		} else if (folio_trylock(folio)) {
-			if (should_export_before_import(uvcb, gmap->mm))
-				uv_convert_from_secure(PFN_PHYS(folio_pfn(folio)));
-			rc = make_folio_secure(folio, uvcb);
-			folio_unlock(folio);
-		}
-
-		/*
-		 * Once we drop the PTL, the folio may get unmapped and
-		 * freed immediately. We need a temporary reference.
-		 */
-		if (rc == -EAGAIN || rc == -E2BIG)
-			folio_get(folio);
-	}
-	pte_unmap_unlock(ptep, ptelock);
-out:
-	mmap_read_unlock(gmap->mm);
-
-	switch (rc) {
-	case -E2BIG:
-		folio_lock(folio);
-		rc = split_folio(folio);
-		folio_unlock(folio);
-		folio_put(folio);
-
-		switch (rc) {
-		case 0:
-			/* Splitting succeeded, try again immediately. */
-			goto again;
-		case -EAGAIN:
-			/* Additional folio references. */
-			if (drain_lru(&drain_lru_called))
-				goto again;
-			return -EAGAIN;
-		case -EBUSY:
-			/* Unexpected race. */
-			return -EAGAIN;
-		}
-		WARN_ON_ONCE(1);
-		return -ENXIO;
-	case -EAGAIN:
-		/*
-		 * If we are here because the UVC returned busy or partial
-		 * completion, this is just a useless check, but it is safe.
-		 */
-		folio_wait_writeback(folio);
-		folio_put(folio);
-		return -EAGAIN;
-	case -EBUSY:
-		/* Additional folio references. */
-		if (drain_lru(&drain_lru_called))
-			goto again;
-		return -EAGAIN;
-	case -ENXIO:
-		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
-			return -EFAULT;
-		return -EAGAIN;
-	}
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_make_secure);
-
-int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
-{
-	struct uv_cb_cts uvcb = {
-		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
-		.header.len = sizeof(uvcb),
-		.guest_handle = gmap->guest_handle,
-		.gaddr = gaddr,
-	};
-
-	return gmap_make_secure(gmap, gaddr, &uvcb);
-}
-EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
-
-/**
- * gmap_destroy_page - Destroy a guest page.
- * @gmap: the gmap of the guest
- * @gaddr: the guest address to destroy
- *
- * An attempt will be made to destroy the given guest page. If the attempt
- * fails, an attempt is made to export the page. If both attempts fail, an
- * appropriate error is returned.
- */
-int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
-{
-	struct vm_area_struct *vma;
-	struct folio_walk fw;
-	unsigned long uaddr;
-	struct folio *folio;
-	int rc;
-
-	rc = -EFAULT;
-	mmap_read_lock(gmap->mm);
-
-	uaddr = __gmap_translate(gmap, gaddr);
-	if (IS_ERR_VALUE(uaddr))
-		goto out;
-	vma = vma_lookup(gmap->mm, uaddr);
-	if (!vma)
-		goto out;
-	/*
-	 * Huge pages should not be able to become secure
-	 */
-	if (is_vm_hugetlb_page(vma))
-		goto out;
-
-	rc = 0;
-	folio = folio_walk_start(&fw, vma, uaddr, 0);
-	if (!folio)
-		goto out;
-	/*
-	 * See gmap_make_secure(): large folios cannot be secure. Small
-	 * folio implies FW_LEVEL_PTE.
-	 */
-	if (folio_test_large(folio) || !pte_write(fw.pte))
-		goto out_walk_end;
-	rc = uv_destroy_folio(folio);
-	/*
-	 * Fault handlers can race; it is possible that two CPUs will fault
-	 * on the same secure page. One CPU can destroy the page, reboot,
-	 * re-enter secure mode and import it, while the second CPU was
-	 * stuck at the beginning of the handler. At some point the second
-	 * CPU will be able to progress, and it will not be able to destroy
-	 * the page. In that case we do not want to terminate the process,
-	 * we instead try to export the page.
-	 */
-	if (rc)
-		rc = uv_convert_from_secure_folio(folio);
-out_walk_end:
-	folio_walk_end(&fw, vma);
-out:
-	mmap_read_unlock(gmap->mm);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_destroy_page);
+EXPORT_SYMBOL_GPL(make_folio_secure);

 /*
  * To be called with the folio locked or with an extra reference! This will
···1616#include <asm/gmap.h>1717#include <asm/dat-bits.h>1818#include "kvm-s390.h"1919+#include "gmap.h"1920#include "gaccess.h"20212122/*···13941393}1395139413961395/**13961396+ * shadow_pgt_lookup() - find a shadow page table13971397+ * @sg: pointer to the shadow guest address space structure13981398+ * @saddr: the address in the shadow guest address space13991399+ * @pgt: parent gmap address of the page table to get shadowed14001400+ * @dat_protection: if the pgtable is marked as protected by dat14011401+ * @fake: pgt references contiguous guest memory block, not a pgtable14021402+ *14031403+ * Returns 0 if the shadow page table was found and -EAGAIN if the page14041404+ * table was not found.14051405+ *14061406+ * Called with sg->mm->mmap_lock held in read mode.14071407+ */14081408+static int shadow_pgt_lookup(struct gmap *sg, unsigned long saddr, unsigned long *pgt,14091409+ int *dat_protection, int *fake)14101410+{14111411+ unsigned long pt_index;14121412+ unsigned long *table;14131413+ struct page *page;14141414+ int rc;14151415+14161416+ spin_lock(&sg->guest_table_lock);14171417+ table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */14181418+ if (table && !(*table & _SEGMENT_ENTRY_INVALID)) {14191419+ /* Shadow page tables are full pages (pte+pgste) */14201420+ page = pfn_to_page(*table >> PAGE_SHIFT);14211421+ pt_index = gmap_pgste_get_pgt_addr(page_to_virt(page));14221422+ *pgt = pt_index & ~GMAP_SHADOW_FAKE_TABLE;14231423+ *dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT);14241424+ *fake = !!(pt_index & GMAP_SHADOW_FAKE_TABLE);14251425+ rc = 0;14261426+ } else {14271427+ rc = -EAGAIN;14281428+ }14291429+ spin_unlock(&sg->guest_table_lock);14301430+ return rc;14311431+}14321432+14331433+/**13971434 * kvm_s390_shadow_fault - handle fault on a shadow page table13981435 * @vcpu: virtual cpu13991436 * @sg: pointer to the shadow guest address space structure···14541415 int dat_protection, fake;14551416 int rc;1456141714181418+ if 
(KVM_BUG_ON(!gmap_is_shadow(sg), vcpu->kvm))14191419+ return -EFAULT;14201420+14571421 mmap_read_lock(sg->mm);14581422 /*14591423 * We don't want any guest-2 tables to change - so the parent···14651423 */14661424 ipte_lock(vcpu->kvm);1467142514681468- rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);14261426+ rc = shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);14691427 if (rc)14701428 rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,14711429 &fake);
+142
arch/s390/kvm/gmap-vsie.c
···11+// SPDX-License-Identifier: GPL-2.022+/*33+ * Guest memory management for KVM/s390 nested VMs.44+ *55+ * Copyright IBM Corp. 2008, 2020, 202466+ *77+ * Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>88+ * Martin Schwidefsky <schwidefsky@de.ibm.com>99+ * David Hildenbrand <david@redhat.com>1010+ * Janosch Frank <frankja@linux.vnet.ibm.com>1111+ */1212+1313+#include <linux/compiler.h>1414+#include <linux/kvm.h>1515+#include <linux/kvm_host.h>1616+#include <linux/pgtable.h>1717+#include <linux/pagemap.h>1818+#include <linux/mman.h>1919+2020+#include <asm/lowcore.h>2121+#include <asm/gmap.h>2222+#include <asm/uv.h>2323+2424+#include "kvm-s390.h"2525+#include "gmap.h"2626+2727+/**2828+ * gmap_find_shadow - find a specific asce in the list of shadow tables2929+ * @parent: pointer to the parent gmap3030+ * @asce: ASCE for which the shadow table is created3131+ * @edat_level: edat level to be used for the shadow translation3232+ *3333+ * Returns the pointer to a gmap if a shadow table with the given asce is3434+ * already available, ERR_PTR(-EAGAIN) if another one is just being created,3535+ * otherwise NULL3636+ *3737+ * Context: Called with parent->shadow_lock held3838+ */3939+static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long asce, int edat_level)4040+{4141+ struct gmap *sg;4242+4343+ lockdep_assert_held(&parent->shadow_lock);4444+ list_for_each_entry(sg, &parent->children, list) {4545+ if (!gmap_shadow_valid(sg, asce, edat_level))4646+ continue;4747+ if (!sg->initialized)4848+ return ERR_PTR(-EAGAIN);4949+ refcount_inc(&sg->ref_count);5050+ return sg;5151+ }5252+ return NULL;5353+}5454+5555+/**5656+ * gmap_shadow - create/find a shadow guest address space5757+ * @parent: pointer to the parent gmap5858+ * @asce: ASCE for which the shadow table is created5959+ * @edat_level: edat level to be used for the shadow translation6060+ *6161+ * The pages of the top level page table referred by the asce parameter6262+ * will be set to read-only and 
marked in the PGSTEs of the kvm process.6363+ * The shadow table will be removed automatically on any change to the6464+ * PTE mapping for the source table.6565+ *6666+ * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,6767+ * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the6868+ * parent gmap table could not be protected.6969+ */7070+struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat_level)7171+{7272+ struct gmap *sg, *new;7373+ unsigned long limit;7474+ int rc;7575+7676+ if (KVM_BUG_ON(parent->mm->context.allow_gmap_hpage_1m, (struct kvm *)parent->private) ||7777+ KVM_BUG_ON(gmap_is_shadow(parent), (struct kvm *)parent->private))7878+ return ERR_PTR(-EFAULT);7979+ spin_lock(&parent->shadow_lock);8080+ sg = gmap_find_shadow(parent, asce, edat_level);8181+ spin_unlock(&parent->shadow_lock);8282+ if (sg)8383+ return sg;8484+ /* Create a new shadow gmap */8585+ limit = -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11));8686+ if (asce & _ASCE_REAL_SPACE)8787+ limit = -1UL;8888+ new = gmap_alloc(limit);8989+ if (!new)9090+ return ERR_PTR(-ENOMEM);9191+ new->mm = parent->mm;9292+ new->parent = gmap_get(parent);9393+ new->private = parent->private;9494+ new->orig_asce = asce;9595+ new->edat_level = edat_level;9696+ new->initialized = false;9797+ spin_lock(&parent->shadow_lock);9898+ /* Recheck if another CPU created the same shadow */9999+ sg = gmap_find_shadow(parent, asce, edat_level);100100+ if (sg) {101101+ spin_unlock(&parent->shadow_lock);102102+ gmap_free(new);103103+ return sg;104104+ }105105+ if (asce & _ASCE_REAL_SPACE) {106106+ /* only allow one real-space gmap shadow */107107+ list_for_each_entry(sg, &parent->children, list) {108108+ if (sg->orig_asce & _ASCE_REAL_SPACE) {109109+ spin_lock(&sg->guest_table_lock);110110+ gmap_unshadow(sg);111111+ spin_unlock(&sg->guest_table_lock);112112+ list_del(&sg->list);113113+ gmap_put(sg);114114+ break;115115+ }116116+ }117117+ }118118+ 
refcount_set(&new->ref_count, 2);119119+ list_add(&new->list, &parent->children);120120+ if (asce & _ASCE_REAL_SPACE) {121121+ /* nothing to protect, return right away */122122+ new->initialized = true;123123+ spin_unlock(&parent->shadow_lock);124124+ return new;125125+ }126126+ spin_unlock(&parent->shadow_lock);127127+ /* protect after insertion, so it will get properly invalidated */128128+ mmap_read_lock(parent->mm);129129+ rc = __kvm_s390_mprotect_many(parent, asce & _ASCE_ORIGIN,130130+ ((asce & _ASCE_TABLE_LENGTH) + 1),131131+ PROT_READ, GMAP_NOTIFY_SHADOW);132132+ mmap_read_unlock(parent->mm);133133+ spin_lock(&parent->shadow_lock);134134+ new->initialized = true;135135+ if (rc) {136136+ list_del(&new->list);137137+ gmap_free(new);138138+ new = ERR_PTR(rc);139139+ }140140+ spin_unlock(&parent->shadow_lock);141141+ return new;142142+}
+212
arch/s390/kvm/gmap.c
···11+// SPDX-License-Identifier: GPL-2.022+/*33+ * Guest memory management for KVM/s39044+ *55+ * Copyright IBM Corp. 2008, 2020, 202466+ *77+ * Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>88+ * Martin Schwidefsky <schwidefsky@de.ibm.com>99+ * David Hildenbrand <david@redhat.com>1010+ * Janosch Frank <frankja@linux.vnet.ibm.com>1111+ */1212+1313+#include <linux/compiler.h>1414+#include <linux/kvm.h>1515+#include <linux/kvm_host.h>1616+#include <linux/pgtable.h>1717+#include <linux/pagemap.h>1818+1919+#include <asm/lowcore.h>2020+#include <asm/gmap.h>2121+#include <asm/uv.h>2222+2323+#include "gmap.h"2424+2525+/**2626+ * should_export_before_import - Determine whether an export is needed2727+ * before an import-like operation2828+ * @uvcb: the Ultravisor control block of the UVC to be performed2929+ * @mm: the mm of the process3030+ *3131+ * Returns whether an export is needed before every import-like operation.3232+ * This is needed for shared pages, which don't trigger a secure storage3333+ * exception when accessed from a different guest.3434+ *3535+ * Although it is considered one, the Unpin Page UVC is not an actual import,3636+ * so it is not affected.3737+ *3838+ * An export is also not needed when there is only one protected VM, because3939+ * the page cannot belong to the wrong VM in that case (there is no "other VM"4040+ * it can belong to).4141+ *4242+ * Return: true if an export is needed before every import, otherwise false.4343+ */4444+static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)4545+{4646+ /*4747+ * The misc feature indicates, among other things, that importing a4848+ * shared page from a different protected VM will automatically also4949+ * transfer its ownership.5050+ */5151+ if (uv_has_feature(BIT_UV_FEAT_MISC))5252+ return false;5353+ if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)5454+ return false;5555+ return atomic_read(&mm->context.protected_count) > 1;5656+}5757+5858+static int 
__gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)5959+{6060+ struct folio *folio = page_folio(page);6161+ int rc;6262+6363+ /*6464+ * Secure pages cannot be huge and userspace should not combine both.6565+ * In case userspace does it anyway this will result in an -EFAULT for6666+ * the unpack. The guest thus never reaches secure mode.6767+ * If userspace plays dirty tricks and decides to map huge pages at a6868+ * later point in time, it will receive a segmentation fault or6969+ * KVM_RUN will return -EFAULT.7070+ */7171+ if (folio_test_hugetlb(folio))7272+ return -EFAULT;7373+ if (folio_test_large(folio)) {7474+ mmap_read_unlock(gmap->mm);7575+ rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, true);7676+ mmap_read_lock(gmap->mm);7777+ if (rc)7878+ return rc;7979+ folio = page_folio(page);8080+ }8181+8282+ if (!folio_trylock(folio))8383+ return -EAGAIN;8484+ if (should_export_before_import(uvcb, gmap->mm))8585+ uv_convert_from_secure(folio_to_phys(folio));8686+ rc = make_folio_secure(folio, uvcb);8787+ folio_unlock(folio);8888+8989+ /*9090+ * In theory a race is possible and the folio might have become9191+ * large again before the folio_trylock() above. 
In that case, no9292+ * action is performed and -EAGAIN is returned; the callers will9393+ * have to try again later.9494+ * In most cases this implies running the VM again, getting the same9595+ * exception again, and making another attempt in this function.9696+ * This is expected to happen extremely rarely.9797+ */9898+ if (rc == -E2BIG)9999+ return -EAGAIN;100100+ /* The folio has too many references, try to shake some off */101101+ if (rc == -EBUSY) {102102+ mmap_read_unlock(gmap->mm);103103+ kvm_s390_wiggle_split_folio(gmap->mm, folio, false);104104+ mmap_read_lock(gmap->mm);105105+ return -EAGAIN;106106+ }107107+108108+ return rc;109109+}110110+111111+/**112112+ * gmap_make_secure() - make one guest page secure113113+ * @gmap: the guest gmap114114+ * @gaddr: the guest address that needs to be made secure115115+ * @uvcb: the UVCB specifying which operation needs to be performed116116+ *117117+ * Context: needs to be called with kvm->srcu held.118118+ * Return: 0 on success, < 0 in case of error (see __gmap_make_secure()).119119+ */120120+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)121121+{122122+ struct kvm *kvm = gmap->private;123123+ struct page *page;124124+ int rc = 0;125125+126126+ lockdep_assert_held(&kvm->srcu);127127+128128+ page = gfn_to_page(kvm, gpa_to_gfn(gaddr));129129+ mmap_read_lock(gmap->mm);130130+ if (page)131131+ rc = __gmap_make_secure(gmap, page, uvcb);132132+ kvm_release_page_clean(page);133133+ mmap_read_unlock(gmap->mm);134134+135135+ return rc;136136+}137137+138138+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)139139+{140140+ struct uv_cb_cts uvcb = {141141+ .header.cmd = UVC_CMD_CONV_TO_SEC_STOR,142142+ .header.len = sizeof(uvcb),143143+ .guest_handle = gmap->guest_handle,144144+ .gaddr = gaddr,145145+ };146146+147147+ return gmap_make_secure(gmap, gaddr, &uvcb);148148+}149149+150150+/**151151+ * __gmap_destroy_page() - Destroy a guest page.152152+ * @gmap: the gmap of the guest153153+ * 
@page: the page to destroy154154+ *155155+ * An attempt will be made to destroy the given guest page. If the attempt156156+ * fails, an attempt is made to export the page. If both attempts fail, an157157+ * appropriate error is returned.158158+ *159159+ * Context: must be called holding the mm lock for gmap->mm160160+ */161161+static int __gmap_destroy_page(struct gmap *gmap, struct page *page)162162+{163163+ struct folio *folio = page_folio(page);164164+ int rc;165165+166166+ /*167167+ * See gmap_make_secure(): large folios cannot be secure. Small168168+ * folio implies FW_LEVEL_PTE.169169+ */170170+ if (folio_test_large(folio))171171+ return -EFAULT;172172+173173+ rc = uv_destroy_folio(folio);174174+ /*175175+ * Fault handlers can race; it is possible that two CPUs will fault176176+ * on the same secure page. One CPU can destroy the page, reboot,177177+ * re-enter secure mode and import it, while the second CPU was178178+ * stuck at the beginning of the handler. At some point the second179179+ * CPU will be able to progress, and it will not be able to destroy180180+ * the page. In that case we do not want to terminate the process,181181+ * we instead try to export the page.182182+ */183183+ if (rc)184184+ rc = uv_convert_from_secure_folio(folio);185185+186186+ return rc;187187+}188188+189189+/**190190+ * gmap_destroy_page() - Destroy a guest page.191191+ * @gmap: the gmap of the guest192192+ * @gaddr: the guest address to destroy193193+ *194194+ * An attempt will be made to destroy the given guest page. If the attempt195195+ * fails, an attempt is made to export the page. 
If both attempts fail, an196196+ * appropriate error is returned.197197+ *198198+ * Context: may sleep.199199+ */200200+int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)201201+{202202+ struct page *page;203203+ int rc = 0;204204+205205+ mmap_read_lock(gmap->mm);206206+ page = gfn_to_page(gmap->private, gpa_to_gfn(gaddr));207207+ if (page)208208+ rc = __gmap_destroy_page(gmap, page);209209+ kvm_release_page_clean(page);210210+ mmap_read_unlock(gmap->mm);211211+ return rc;212212+}
+39
arch/s390/kvm/gmap.h
···11+/* SPDX-License-Identifier: GPL-2.0 */22+/*33+ * KVM guest address space mapping code44+ *55+ * Copyright IBM Corp. 2007, 2016, 202566+ * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>77+ * Claudio Imbrenda <imbrenda@linux.ibm.com>88+ */99+1010+#ifndef ARCH_KVM_S390_GMAP_H1111+#define ARCH_KVM_S390_GMAP_H1212+1313+#define GMAP_SHADOW_FAKE_TABLE 1ULL1414+1515+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);1616+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);1717+int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);1818+struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat_level);1919+2020+/**2121+ * gmap_shadow_valid - check if a shadow guest address space matches the2222+ * given properties and is still valid2323+ * @sg: pointer to the shadow guest address space structure2424+ * @asce: ASCE for which the shadow table is requested2525+ * @edat_level: edat level to be used for the shadow translation2626+ *2727+ * Returns 1 if the gmap shadow is still valid and matches the given2828+ * properties; the caller can continue using it. Returns 0 otherwise; the2929+ * caller has to request a new shadow gmap in that case.3030+ *3131+ */3232+static inline int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level)3333+{3434+ if (sg->removed)3535+ return 0;3636+ return sg->orig_asce == asce && sg->edat_level == edat_level;3737+}3838+3939+#endif
+4-3
arch/s390/kvm/intercept.c
···2121#include "gaccess.h"2222#include "trace.h"2323#include "trace-s390.h"2424+#include "gmap.h"24252526u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu)2627{···368367 reg2, &srcaddr, GACC_FETCH, 0);369368 if (rc)370369 return kvm_s390_inject_prog_cond(vcpu, rc);371371- rc = gmap_fault(vcpu->arch.gmap, srcaddr, 0);370370+ rc = kvm_s390_handle_dat_fault(vcpu, srcaddr, 0);372371 if (rc != 0)373372 return rc;374373···377376 reg1, &dstaddr, GACC_STORE, 0);378377 if (rc)379378 return kvm_s390_inject_prog_cond(vcpu, rc);380380- rc = gmap_fault(vcpu->arch.gmap, dstaddr, FAULT_FLAG_WRITE);379379+ rc = kvm_s390_handle_dat_fault(vcpu, dstaddr, FOLL_WRITE);381380 if (rc != 0)382381 return rc;383382···550549 * If the unpin did not succeed, the guest will exit again for the UVC551550 * and we will retry the unpin.552551 */553553- if (rc == -EINVAL)552552+ if (rc == -EINVAL || rc == -ENXIO)554553 return 0;555554 /*556555 * If we got -EAGAIN here, we simply return it. It will eventually
+11-8
arch/s390/kvm/interrupt.c
···28932893 struct kvm_kernel_irq_routing_entry *e,28942894 const struct kvm_irq_routing_entry *ue)28952895{28962896- u64 uaddr;28962896+ u64 uaddr_s, uaddr_i;28972897+ int idx;2897289828982899 switch (ue->type) {28992900 /* we store the userspace addresses instead of the guest addresses */···29022901 if (kvm_is_ucontrol(kvm))29032902 return -EINVAL;29042903 e->set = set_adapter_int;29052905- uaddr = gmap_translate(kvm->arch.gmap, ue->u.adapter.summary_addr);29062906- if (uaddr == -EFAULT)29042904+29052905+ idx = srcu_read_lock(&kvm->srcu);29062906+ uaddr_s = gpa_to_hva(kvm, ue->u.adapter.summary_addr);29072907+ uaddr_i = gpa_to_hva(kvm, ue->u.adapter.ind_addr);29082908+ srcu_read_unlock(&kvm->srcu, idx);29092909+29102910+ if (kvm_is_error_hva(uaddr_s) || kvm_is_error_hva(uaddr_i))29072911 return -EFAULT;29082908- e->adapter.summary_addr = uaddr;29092909- uaddr = gmap_translate(kvm->arch.gmap, ue->u.adapter.ind_addr);29102910- if (uaddr == -EFAULT)29112911- return -EFAULT;29122912- e->adapter.ind_addr = uaddr;29122912+ e->adapter.summary_addr = uaddr_s;29132913+ e->adapter.ind_addr = uaddr_i;29132914 e->adapter.summary_offset = ue->u.adapter.summary_offset;29142915 e->adapter.ind_offset = ue->u.adapter.ind_offset;29152916 e->adapter.adapter_id = ue->u.adapter.adapter_id;
+197-40
arch/s390/kvm/kvm-s390.c
···5050#include "kvm-s390.h"5151#include "gaccess.h"5252#include "pci.h"5353+#include "gmap.h"53545455#define CREATE_TRACE_POINTS5556#include "trace.h"···34293428 VM_EVENT(kvm, 3, "vm created with type %lu", type);3430342934313430 if (type & KVM_VM_S390_UCONTROL) {34313431+ struct kvm_userspace_memory_region2 fake_memslot = {34323432+ .slot = KVM_S390_UCONTROL_MEMSLOT,34333433+ .guest_phys_addr = 0,34343434+ .userspace_addr = 0,34353435+ .memory_size = ALIGN_DOWN(TASK_SIZE, _SEGMENT_SIZE),34363436+ .flags = 0,34373437+ };34383438+34323439 kvm->arch.gmap = NULL;34333440 kvm->arch.mem_limit = KVM_S390_NO_MEM_LIMIT;34413441+ /* one flat fake memslot covering the whole address-space */34423442+ mutex_lock(&kvm->slots_lock);34433443+ KVM_BUG_ON(kvm_set_internal_memslot(kvm, &fake_memslot), kvm);34443444+ mutex_unlock(&kvm->slots_lock);34343445 } else {34353446 if (sclp.hamax == U64_MAX)34363447 kvm->arch.mem_limit = TASK_SIZE_MAX;···45114498 return kvm_s390_test_cpuflags(vcpu, CPUSTAT_IBS);45124499}4513450045014501+static int __kvm_s390_fixup_fault_sync(struct gmap *gmap, gpa_t gaddr, unsigned int flags)45024502+{45034503+ struct kvm *kvm = gmap->private;45044504+ gfn_t gfn = gpa_to_gfn(gaddr);45054505+ bool unlocked;45064506+ hva_t vmaddr;45074507+ gpa_t tmp;45084508+ int rc;45094509+45104510+ if (kvm_is_ucontrol(kvm)) {45114511+ tmp = __gmap_translate(gmap, gaddr);45124512+ gfn = gpa_to_gfn(tmp);45134513+ }45144514+45154515+ vmaddr = gfn_to_hva(kvm, gfn);45164516+ rc = fixup_user_fault(gmap->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked);45174517+ if (!rc)45184518+ rc = __gmap_link(gmap, gaddr, vmaddr);45194519+ return rc;45204520+}45214521+45224522+/**45234523+ * __kvm_s390_mprotect_many() - Apply specified protection to guest pages45244524+ * @gmap: the gmap of the guest45254525+ * @gpa: the starting guest address45264526+ * @npages: how many pages to protect45274527+ * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE45284528+ * @bits: pgste notification 
bits to set45294529+ *45304530+ * Returns: 0 in case of success, < 0 in case of error - see gmap_protect_one()45314531+ *45324532+ * Context: kvm->srcu and gmap->mm need to be held in read mode45334533+ */45344534+int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsigned int prot,45354535+ unsigned long bits)45364536+{45374537+ unsigned int fault_flag = (prot & PROT_WRITE) ? FAULT_FLAG_WRITE : 0;45384538+ gpa_t end = gpa + npages * PAGE_SIZE;45394539+ int rc;45404540+45414541+ for (; gpa < end; gpa = ALIGN(gpa + 1, rc)) {45424542+ rc = gmap_protect_one(gmap, gpa, prot, bits);45434543+ if (rc == -EAGAIN) {45444544+ __kvm_s390_fixup_fault_sync(gmap, gpa, fault_flag);45454545+ rc = gmap_protect_one(gmap, gpa, prot, bits);45464546+ }45474547+ if (rc < 0)45484548+ return rc;45494549+ }45504550+45514551+ return 0;45524552+}45534553+45544554+static int kvm_s390_mprotect_notify_prefix(struct kvm_vcpu *vcpu)45554555+{45564556+ gpa_t gaddr = kvm_s390_get_prefix(vcpu);45574557+ int idx, rc;45584558+45594559+ idx = srcu_read_lock(&vcpu->kvm->srcu);45604560+ mmap_read_lock(vcpu->arch.gmap->mm);45614561+45624562+ rc = __kvm_s390_mprotect_many(vcpu->arch.gmap, gaddr, 2, PROT_WRITE, GMAP_NOTIFY_MPROT);45634563+45644564+ mmap_read_unlock(vcpu->arch.gmap->mm);45654565+ srcu_read_unlock(&vcpu->kvm->srcu, idx);45664566+45674567+ return rc;45684568+}45694569+45144570static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)45154571{45164572retry:···45954513 */45964514 if (kvm_check_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu)) {45974515 int rc;45984598- rc = gmap_mprotect_notify(vcpu->arch.gmap,45994599- kvm_s390_get_prefix(vcpu),46004600- PAGE_SIZE * 2, PROT_WRITE);45164516+45174517+ rc = kvm_s390_mprotect_notify_prefix(vcpu);46014518 if (rc) {46024519 kvm_make_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);46034520 return rc;···48474766 return kvm_s390_inject_prog_irq(vcpu, &pgm_info);48484767}4849476847694769+static void kvm_s390_assert_primary_as(struct kvm_vcpu 
*vcpu)47704770+{47714771+ KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,47724772+ "Unexpected program interrupt 0x%x, TEID 0x%016lx",47734773+ current->thread.gmap_int_code, current->thread.gmap_teid.val);47744774+}47754775+47764776+/*47774777+ * __kvm_s390_handle_dat_fault() - handle a dat fault for the gmap of a vcpu47784778+ * @vcpu: the vCPU whose gmap is to be fixed up47794779+ * @gfn: the guest frame number used for memslots (including fake memslots)47804780+ * @gaddr: the gmap address, does not have to match @gfn for ucontrol gmaps47814781+ * @flags: FOLL_* flags47824782+ *47834783+ * Return: 0 on success, < 0 in case of error.47844784+ * Context: The mm lock must not be held before calling. May sleep.47854785+ */47864786+int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, unsigned int flags)47874787+{47884788+ struct kvm_memory_slot *slot;47894789+ unsigned int fault_flags;47904790+ bool writable, unlocked;47914791+ unsigned long vmaddr;47924792+ struct page *page;47934793+ kvm_pfn_t pfn;47944794+ int rc;47954795+47964796+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);47974797+ if (!slot || slot->flags & KVM_MEMSLOT_INVALID)47984798+ return vcpu_post_run_addressing_exception(vcpu);47994799+48004800+ fault_flags = flags & FOLL_WRITE ? 
FAULT_FLAG_WRITE : 0;48014801+ if (vcpu->arch.gmap->pfault_enabled)48024802+ flags |= FOLL_NOWAIT;48034803+ vmaddr = __gfn_to_hva_memslot(slot, gfn);48044804+48054805+try_again:48064806+ pfn = __kvm_faultin_pfn(slot, gfn, flags, &writable, &page);48074807+48084808+ /* Access outside memory, inject addressing exception */48094809+ if (is_noslot_pfn(pfn))48104810+ return vcpu_post_run_addressing_exception(vcpu);48114811+ /* Signal pending: try again */48124812+ if (pfn == KVM_PFN_ERR_SIGPENDING)48134813+ return -EAGAIN;48144814+48154815+ /* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT) */48164816+ if (pfn == KVM_PFN_ERR_NEEDS_IO) {48174817+ trace_kvm_s390_major_guest_pfault(vcpu);48184818+ if (kvm_arch_setup_async_pf(vcpu))48194819+ return 0;48204820+ vcpu->stat.pfault_sync++;48214821+ /* Could not setup async pfault, try again synchronously */48224822+ flags &= ~FOLL_NOWAIT;48234823+ goto try_again;48244824+ }48254825+ /* Any other error */48264826+ if (is_error_pfn(pfn))48274827+ return -EFAULT;48284828+48294829+ /* Success */48304830+ mmap_read_lock(vcpu->arch.gmap->mm);48314831+ /* Mark the userspace PTEs as young and/or dirty, to avoid page fault loops */48324832+ rc = fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlocked);48334833+ if (!rc)48344834+ rc = __gmap_link(vcpu->arch.gmap, gaddr, vmaddr);48354835+ scoped_guard(spinlock, &vcpu->kvm->mmu_lock) {48364836+ kvm_release_faultin_page(vcpu->kvm, page, false, writable);48374837+ }48384838+ mmap_read_unlock(vcpu->arch.gmap->mm);48394839+ return rc;48404840+}48414841+48424842+static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gaddr, unsigned int flags)48434843+{48444844+ unsigned long gaddr_tmp;48454845+ gfn_t gfn;48464846+48474847+ gfn = gpa_to_gfn(gaddr);48484848+ if (kvm_is_ucontrol(vcpu->kvm)) {48494849+ /*48504850+ * This translates the per-vCPU guest address into a48514851+ * fake guest address, which can then be used with the48524852+ * fake 
memslots that are identity mapping userspace.48534853+ * This allows ucontrol VMs to use the normal fault48544854+ * resolution path, like normal VMs.48554855+ */48564856+ mmap_read_lock(vcpu->arch.gmap->mm);48574857+ gaddr_tmp = __gmap_translate(vcpu->arch.gmap, gaddr);48584858+ mmap_read_unlock(vcpu->arch.gmap->mm);48594859+ if (gaddr_tmp == -EFAULT) {48604860+ vcpu->run->exit_reason = KVM_EXIT_S390_UCONTROL;48614861+ vcpu->run->s390_ucontrol.trans_exc_code = gaddr;48624862+ vcpu->run->s390_ucontrol.pgm_code = PGM_SEGMENT_TRANSLATION;48634863+ return -EREMOTE;48644864+ }48654865+ gfn = gpa_to_gfn(gaddr_tmp);48664866+ }48674867+ return __kvm_s390_handle_dat_fault(vcpu, gfn, gaddr, flags);48684868+}48694869+48504870static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu)48514871{48524872 unsigned int flags = 0;48534873 unsigned long gaddr;48544854- int rc = 0;4855487448564875 gaddr = current->thread.gmap_teid.addr * PAGE_SIZE;48574876 if (kvm_s390_cur_gmap_fault_is_write())···49624781 vcpu->stat.exit_null++;49634782 break;49644783 case PGM_NON_SECURE_STORAGE_ACCESS:49654965- KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,49664966- "Unexpected program interrupt 0x%x, TEID 0x%016lx",49674967- current->thread.gmap_int_code, current->thread.gmap_teid.val);47844784+ kvm_s390_assert_primary_as(vcpu);49684785 /*49694786 * This is normal operation; a page belonging to a protected49704787 * guest has not been imported yet. 
Try to import the page into···49734794 break;49744795 case PGM_SECURE_STORAGE_ACCESS:49754796 case PGM_SECURE_STORAGE_VIOLATION:49764976- KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,49774977- "Unexpected program interrupt 0x%x, TEID 0x%016lx",49784978- current->thread.gmap_int_code, current->thread.gmap_teid.val);47974797+ kvm_s390_assert_primary_as(vcpu);49794798 /*49804799 * This can happen after a reboot with asynchronous teardown;49814800 * the new guest (normal or protected) will run on top of the···50024825 case PGM_REGION_FIRST_TRANS:50034826 case PGM_REGION_SECOND_TRANS:50044827 case PGM_REGION_THIRD_TRANS:50055005- KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,50065006- "Unexpected program interrupt 0x%x, TEID 0x%016lx",50075007- current->thread.gmap_int_code, current->thread.gmap_teid.val);50085008- if (vcpu->arch.gmap->pfault_enabled) {50095009- rc = gmap_fault(vcpu->arch.gmap, gaddr, flags | FAULT_FLAG_RETRY_NOWAIT);50105010- if (rc == -EFAULT)50115011- return vcpu_post_run_addressing_exception(vcpu);50125012- if (rc == -EAGAIN) {50135013- trace_kvm_s390_major_guest_pfault(vcpu);50145014- if (kvm_arch_setup_async_pf(vcpu))50155015- return 0;50165016- vcpu->stat.pfault_sync++;50175017- } else {50185018- return rc;50195019- }50205020- }50215021- rc = gmap_fault(vcpu->arch.gmap, gaddr, flags);50225022- if (rc == -EFAULT) {50235023- if (kvm_is_ucontrol(vcpu->kvm)) {50245024- vcpu->run->exit_reason = KVM_EXIT_S390_UCONTROL;50255025- vcpu->run->s390_ucontrol.trans_exc_code = gaddr;50265026- vcpu->run->s390_ucontrol.pgm_code = 0x10;50275027- return -EREMOTE;50285028- }50295029- return vcpu_post_run_addressing_exception(vcpu);50305030- }50315031- break;48284828+ kvm_s390_assert_primary_as(vcpu);48294829+ return vcpu_dat_fault_handler(vcpu, gaddr, flags);50324830 default:50334831 KVM_BUG(1, vcpu->kvm, "Unexpected program interrupt 0x%x, TEID 0x%016lx",50344832 current->thread.gmap_int_code, 
current->thread.gmap_teid.val);50354833 send_sig(SIGSEGV, current, 0);50364834 break;50374835 }50385038- return rc;48364836+ return 0;50394837}5040483850414839static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)···58895737 }58905738#endif58915739 case KVM_S390_VCPU_FAULT: {58925892- r = gmap_fault(vcpu->arch.gmap, arg, 0);57405740+ idx = srcu_read_lock(&vcpu->kvm->srcu);57415741+ r = vcpu_dat_fault_handler(vcpu, arg, 0);57425742+ srcu_read_unlock(&vcpu->kvm->srcu, idx);58935743 break;58945744 }58955745 case KVM_ENABLE_CAP:···60075853{60085854 gpa_t size;6009585560106010- if (kvm_is_ucontrol(kvm))58565856+ if (kvm_is_ucontrol(kvm) && new->id < KVM_USER_MEM_SLOTS)60115857 return -EINVAL;6012585860135859 /* When we are protected, we should not change the memory slots */···60585904 enum kvm_mr_change change)60595905{60605906 int rc = 0;59075907+59085908+ if (kvm_is_ucontrol(kvm))59095909+ return;6061591060625911 switch (change) {60635912 case KVM_MR_DELETE:
···1717#include <linux/sched/mm.h>1818#include <linux/mmu_notifier.h>1919#include "kvm-s390.h"2020+#include "gmap.h"20212122bool kvm_s390_pv_is_protected(struct kvm *kvm)2223{···639638 .tweak[1] = offset,640639 };641640 int ret = gmap_make_secure(kvm->arch.gmap, addr, &uvcb);641641+ unsigned long vmaddr;642642+ bool unlocked;642643643644 *rc = uvcb.header.rc;644645 *rrc = uvcb.header.rrc;646646+647647+ if (ret == -ENXIO) {648648+ mmap_read_lock(kvm->mm);649649+ vmaddr = gfn_to_hva(kvm, gpa_to_gfn(addr));650650+ if (kvm_is_error_hva(vmaddr)) {651651+ ret = -EFAULT;652652+ } else {653653+ ret = fixup_user_fault(kvm->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked);654654+ if (!ret)655655+ ret = __gmap_link(kvm->arch.gmap, addr, vmaddr);656656+ }657657+ mmap_read_unlock(kvm->mm);658658+ if (!ret)659659+ return -EAGAIN;660660+ return ret;661661+ }645662646663 if (ret && ret != -EAGAIN)647664 KVM_UV_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx with rc %x rrc %x",···678659679660 KVM_UV_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",680661 addr, size);662662+663663+ guard(srcu)(&kvm->srcu);681664682665 while (offset < size) {683666 ret = unpack_one(kvm, addr, tweak, offset, rc, rrc);
arch/s390/kvm/vsie.c
···1313#include <linux/bitmap.h>1414#include <linux/sched/signal.h>1515#include <linux/io.h>1616+#include <linux/mman.h>16171718#include <asm/gmap.h>1819#include <asm/mmu_context.h>···2322#include <asm/facility.h>2423#include "kvm-s390.h"2524#include "gaccess.h"2525+#include "gmap.h"2626+2727+enum vsie_page_flags {2828+ VSIE_PAGE_IN_USE = 0,2929+};26302731struct vsie_page {2832 struct kvm_s390_sie_block scb_s; /* 0x0000 */···5246 gpa_t gvrd_gpa; /* 0x0240 */5347 gpa_t riccbd_gpa; /* 0x0248 */5448 gpa_t sdnx_gpa; /* 0x0250 */5555- __u8 reserved[0x0700 - 0x0258]; /* 0x0258 */4949+ /*5050+ * guest address of the original SCB. Remains set for free vsie5151+ * pages, so we can properly look them up in our addr_to_page5252+ * radix tree.5353+ */5454+ gpa_t scb_gpa; /* 0x0258 */5555+ /*5656+ * Flags: must be set/cleared atomically after the vsie page can be5757+ * looked up by other CPUs.5858+ */5959+ unsigned long flags; /* 0x0260 */6060+ __u8 reserved[0x0700 - 0x0268]; /* 0x0268 */5661 struct kvm_s390_crypto_cb crycb; /* 0x0700 */5762 __u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE]; /* 0x0800 */5863};···601584 struct kvm *kvm = gmap->private;602585 struct vsie_page *cur;603586 unsigned long prefix;604604- struct page *page;605587 int i;606588607589 if (!gmap_is_shadow(gmap))···610594 * therefore we can safely reference them all the time.611595 */612596 for (i = 0; i < kvm->arch.vsie.page_count; i++) {613613- page = READ_ONCE(kvm->arch.vsie.pages[i]);614614- if (!page)597597+ cur = READ_ONCE(kvm->arch.vsie.pages[i]);598598+ if (!cur)615599 continue;616616- cur = page_to_virt(page);617600 if (READ_ONCE(cur->gmap) != gmap)618601 continue;619602 prefix = cur->scb_s.prefix << GUEST_PREFIX_SHIFT;···13601345 return rc;13611346}1362134713481348+/* Try getting a given vsie page, returning "true" on success. 
*/13491349+static inline bool try_get_vsie_page(struct vsie_page *vsie_page)13501350+{13511351+ if (test_bit(VSIE_PAGE_IN_USE, &vsie_page->flags))13521352+ return false;13531353+ return !test_and_set_bit(VSIE_PAGE_IN_USE, &vsie_page->flags);13541354+}13551355+13561356+/* Put a vsie page acquired through get_vsie_page / try_get_vsie_page. */13571357+static void put_vsie_page(struct vsie_page *vsie_page)13581358+{13591359+ clear_bit(VSIE_PAGE_IN_USE, &vsie_page->flags);13601360+}13611361+13631362/*13641363 * Get or create a vsie page for a scb address.13651364 *···13841355static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr)13851356{13861357 struct vsie_page *vsie_page;13871387- struct page *page;13881358 int nr_vcpus;1389135913901360 rcu_read_lock();13911391- page = radix_tree_lookup(&kvm->arch.vsie.addr_to_page, addr >> 9);13611361+ vsie_page = radix_tree_lookup(&kvm->arch.vsie.addr_to_page, addr >> 9);13921362 rcu_read_unlock();13931393- if (page) {13941394- if (page_ref_inc_return(page) == 2)13951395- return page_to_virt(page);13961396- page_ref_dec(page);13631363+ if (vsie_page) {13641364+ if (try_get_vsie_page(vsie_page)) {13651365+ if (vsie_page->scb_gpa == addr)13661366+ return vsie_page;13671367+ /*13681368+ * We raced with someone reusing + putting this vsie13691369+ * page before we grabbed it.13701370+ */13711371+ put_vsie_page(vsie_page);13721372+ }13971373 }1398137413991375 /*···1409137514101376 mutex_lock(&kvm->arch.vsie.mutex);14111377 if (kvm->arch.vsie.page_count < nr_vcpus) {14121412- page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | GFP_DMA);14131413- if (!page) {13781378+ vsie_page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | GFP_DMA);13791379+ if (!vsie_page) {14141380 mutex_unlock(&kvm->arch.vsie.mutex);14151381 return ERR_PTR(-ENOMEM);14161382 }14171417- page_ref_inc(page);14181418- kvm->arch.vsie.pages[kvm->arch.vsie.page_count] = page;13831383+ __set_bit(VSIE_PAGE_IN_USE, &vsie_page->flags);13841384+ 
kvm->arch.vsie.pages[kvm->arch.vsie.page_count] = vsie_page;14191385 kvm->arch.vsie.page_count++;14201386 } else {14211387 /* reuse an existing entry that belongs to nobody */14221388 while (true) {14231423- page = kvm->arch.vsie.pages[kvm->arch.vsie.next];14241424- if (page_ref_inc_return(page) == 2)13891389+ vsie_page = kvm->arch.vsie.pages[kvm->arch.vsie.next];13901390+ if (try_get_vsie_page(vsie_page))14251391 break;14261426- page_ref_dec(page);14271392 kvm->arch.vsie.next++;14281393 kvm->arch.vsie.next %= nr_vcpus;14291394 }14301430- radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9);13951395+ if (vsie_page->scb_gpa != ULONG_MAX)13961396+ radix_tree_delete(&kvm->arch.vsie.addr_to_page,13971397+ vsie_page->scb_gpa >> 9);14311398 }14321432- page->index = addr;14331433- /* double use of the same address */14341434- if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, page)) {14351435- page_ref_dec(page);13991399+ /* Mark it as invalid until it resides in the tree. */14001400+ vsie_page->scb_gpa = ULONG_MAX;14011401+14021402+ /* Double use of the same address or allocation failure. 
*/14031403+ if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9,14041404+ vsie_page)) {14051405+ put_vsie_page(vsie_page);14361406 mutex_unlock(&kvm->arch.vsie.mutex);14371407 return NULL;14381408 }14091409+ vsie_page->scb_gpa = addr;14391410 mutex_unlock(&kvm->arch.vsie.mutex);1440141114411441- vsie_page = page_to_virt(page);14421412 memset(&vsie_page->scb_s, 0, sizeof(struct kvm_s390_sie_block));14431413 release_gmap_shadow(vsie_page);14441414 vsie_page->fault_addr = 0;14451415 vsie_page->scb_s.ihcpu = 0xffffU;14461416 return vsie_page;14471447-}14481448-14491449-/* put a vsie page acquired via get_vsie_page */14501450-static void put_vsie_page(struct kvm *kvm, struct vsie_page *vsie_page)14511451-{14521452- struct page *page = pfn_to_page(__pa(vsie_page) >> PAGE_SHIFT);14531453-14541454- page_ref_dec(page);14551417}1456141814571419int kvm_s390_handle_vsie(struct kvm_vcpu *vcpu)···15001470out_unpin_scb:15011471 unpin_scb(vcpu, vsie_page, scb_addr);15021472out_put:15031503- put_vsie_page(vcpu->kvm, vsie_page);14731473+ put_vsie_page(vsie_page);1504147415051475 return rc < 0 ? 
rc : 0;15061476}···15161486void kvm_s390_vsie_destroy(struct kvm *kvm)15171487{15181488 struct vsie_page *vsie_page;15191519- struct page *page;15201489 int i;1521149015221491 mutex_lock(&kvm->arch.vsie.mutex);15231492 for (i = 0; i < kvm->arch.vsie.page_count; i++) {15241524- page = kvm->arch.vsie.pages[i];14931493+ vsie_page = kvm->arch.vsie.pages[i];15251494 kvm->arch.vsie.pages[i] = NULL;15261526- vsie_page = page_to_virt(page);15271495 release_gmap_shadow(vsie_page);15281496 /* free the radix tree entry */15291529- radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9);15301530- __free_page(page);14971497+ if (vsie_page->scb_gpa != ULONG_MAX)14981498+ radix_tree_delete(&kvm->arch.vsie.addr_to_page,14991499+ vsie_page->scb_gpa >> 9);15001500+ free_page((unsigned long)vsie_page);15311501 }15321502 kvm->arch.vsie.page_count = 0;15331503 mutex_unlock(&kvm->arch.vsie.mutex);
arch/s390/mm/gmap.c
···2424#include <asm/page.h>2525#include <asm/tlb.h>26262727+/*2828+ * The address is saved in a radix tree directly; NULL would be ambiguous,2929+ * since 0 is a valid address, and NULL is returned when nothing was found.3030+ * The lower bits are ignored by all users of the macro, so it can be used3131+ * to distinguish a valid address 0 from a NULL.3232+ */3333+#define VALID_GADDR_FLAG 13434+#define IS_GADDR_VALID(gaddr) ((gaddr) & VALID_GADDR_FLAG)3535+#define MAKE_VALID_GADDR(gaddr) (((gaddr) & HPAGE_MASK) | VALID_GADDR_FLAG)3636+2737#define GMAP_SHADOW_FAKE_TABLE 1ULL28382939static struct page *gmap_alloc_crst(void)···5343 *5444 * Returns a guest address space structure.5545 */5656-static struct gmap *gmap_alloc(unsigned long limit)4646+struct gmap *gmap_alloc(unsigned long limit)5747{5848 struct gmap *gmap;5949 struct page *page;···8070 gmap = kzalloc(sizeof(struct gmap), GFP_KERNEL_ACCOUNT);8171 if (!gmap)8272 goto out;8383- INIT_LIST_HEAD(&gmap->crst_list);8473 INIT_LIST_HEAD(&gmap->children);8585- INIT_LIST_HEAD(&gmap->pt_list);8674 INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL_ACCOUNT);8775 INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC | __GFP_ACCOUNT);8876 INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC | __GFP_ACCOUNT);···9082 page = gmap_alloc_crst();9183 if (!page)9284 goto out_free;9393- page->index = 0;9494- list_add(&page->lru, &gmap->crst_list);9585 table = page_to_virt(page);9686 crst_table_init(table, etype);9787 gmap->table = table;···10397out:10498 return NULL;10599}100100+EXPORT_SYMBOL_GPL(gmap_alloc);106101107102/**108103 * gmap_create - create a guest address space···192185 } while (nr > 0);193186}194187188188+static void gmap_free_crst(unsigned long *table, bool free_ptes)189189+{190190+ bool is_segment = (table[0] & _SEGMENT_ENTRY_TYPE_MASK) == 0;191191+ int i;192192+193193+ if (is_segment) {194194+ if (!free_ptes)195195+ goto out;196196+ for (i = 0; i < _CRST_ENTRIES; i++)197197+ if (!(table[i] & _SEGMENT_ENTRY_INVALID))198198+ 
page_table_free_pgste(page_ptdesc(phys_to_page(table[i])));199199+ } else {200200+ for (i = 0; i < _CRST_ENTRIES; i++)201201+ if (!(table[i] & _REGION_ENTRY_INVALID))202202+ gmap_free_crst(__va(table[i] & PAGE_MASK), free_ptes);203203+ }204204+205205+out:206206+ free_pages((unsigned long)table, CRST_ALLOC_ORDER);207207+}208208+195209/**196210 * gmap_free - free a guest address space197211 * @gmap: pointer to the guest address space structure198212 *199213 * No locks required. There are no references to this gmap anymore.200214 */201201-static void gmap_free(struct gmap *gmap)215215+void gmap_free(struct gmap *gmap)202216{203203- struct page *page, *next;204204-205217 /* Flush tlb of all gmaps (if not already done for shadows) */206218 if (!(gmap_is_shadow(gmap) && gmap->removed))207219 gmap_flush_tlb(gmap);208220 /* Free all segment & region tables. */209209- list_for_each_entry_safe(page, next, &gmap->crst_list, lru)210210- __free_pages(page, CRST_ALLOC_ORDER);221221+ gmap_free_crst(gmap->table, gmap_is_shadow(gmap));222222+211223 gmap_radix_tree_free(&gmap->guest_to_host);212224 gmap_radix_tree_free(&gmap->host_to_guest);213225214226 /* Free additional data for a shadow gmap */215227 if (gmap_is_shadow(gmap)) {216216- struct ptdesc *ptdesc, *n;217217-218218- /* Free all page tables. 
*/219219- list_for_each_entry_safe(ptdesc, n, &gmap->pt_list, pt_list)220220- page_table_free_pgste(ptdesc);221228 gmap_rmap_radix_tree_free(&gmap->host_to_rmap);222229 /* Release reference to the parent */223230 gmap_put(gmap->parent);···239218240219 kfree(gmap);241220}221221+EXPORT_SYMBOL_GPL(gmap_free);242222243223/**244224 * gmap_get - increase reference counter for guest address space···320298 crst_table_init(new, init);321299 spin_lock(&gmap->guest_table_lock);322300 if (*table & _REGION_ENTRY_INVALID) {323323- list_add(&page->lru, &gmap->crst_list);324301 *table = __pa(new) | _REGION_ENTRY_LENGTH |325302 (*table & _REGION_ENTRY_TYPE_MASK);326326- page->index = gaddr;327303 page = NULL;328304 }329305 spin_unlock(&gmap->guest_table_lock);···330310 return 0;331311}332312333333-/**334334- * __gmap_segment_gaddr - find virtual address from segment pointer335335- * @entry: pointer to a segment table entry in the guest address space336336- *337337- * Returns the virtual address in the guest address space for the segment338338- */339339-static unsigned long __gmap_segment_gaddr(unsigned long *entry)313313+static unsigned long host_to_guest_lookup(struct gmap *gmap, unsigned long vmaddr)340314{341341- struct page *page;342342- unsigned long offset;315315+ return (unsigned long)radix_tree_lookup(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);316316+}343317344344- offset = (unsigned long) entry / sizeof(unsigned long);345345- offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;346346- page = pmd_pgtable_page((pmd_t *) entry);347347- return page->index + offset;318318+static unsigned long host_to_guest_delete(struct gmap *gmap, unsigned long vmaddr)319319+{320320+ return (unsigned long)radix_tree_delete(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);321321+}322322+323323+static pmd_t *host_to_guest_pmd_delete(struct gmap *gmap, unsigned long vmaddr,324324+ unsigned long *gaddr)325325+{326326+ *gaddr = host_to_guest_delete(gmap, vmaddr);327327+ if (IS_GADDR_VALID(*gaddr))328328+ 
return (pmd_t *)gmap_table_walk(gmap, *gaddr, 1);329329+ return NULL;348330}349331350332/**···358336 */359337static int __gmap_unlink_by_vmaddr(struct gmap *gmap, unsigned long vmaddr)360338{361361- unsigned long *entry;339339+ unsigned long gaddr;362340 int flush = 0;341341+ pmd_t *pmdp;363342364343 BUG_ON(gmap_is_shadow(gmap));365344 spin_lock(&gmap->guest_table_lock);366366- entry = radix_tree_delete(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);367367- if (entry) {368368- flush = (*entry != _SEGMENT_ENTRY_EMPTY);369369- *entry = _SEGMENT_ENTRY_EMPTY;345345+346346+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);347347+ if (pmdp) {348348+ flush = (pmd_val(*pmdp) != _SEGMENT_ENTRY_EMPTY);349349+ *pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);370350 }351351+371352 spin_unlock(&gmap->guest_table_lock);372353 return flush;373354}···489464EXPORT_SYMBOL_GPL(__gmap_translate);490465491466/**492492- * gmap_translate - translate a guest address to a user space address493493- * @gmap: pointer to guest mapping meta data structure494494- * @gaddr: guest address495495- *496496- * Returns user space address which corresponds to the guest address or497497- * -EFAULT if no such mapping exists.498498- * This function does not establish potentially missing page table entries.499499- */500500-unsigned long gmap_translate(struct gmap *gmap, unsigned long gaddr)501501-{502502- unsigned long rc;503503-504504- mmap_read_lock(gmap->mm);505505- rc = __gmap_translate(gmap, gaddr);506506- mmap_read_unlock(gmap->mm);507507- return rc;508508-}509509-EXPORT_SYMBOL_GPL(gmap_translate);510510-511511-/**512467 * gmap_unlink - disconnect a page table from the gmap shadow tables513468 * @mm: pointer to the parent mm_struct514469 * @table: pointer to the host page table···587582 spin_lock(&gmap->guest_table_lock);588583 if (*table == _SEGMENT_ENTRY_EMPTY) {589584 rc = radix_tree_insert(&gmap->host_to_guest,590590- vmaddr >> PMD_SHIFT, table);585585+ vmaddr >> PMD_SHIFT,586586+ (void 
*)MAKE_VALID_GADDR(gaddr));591587 if (!rc) {592588 if (pmd_leaf(*pmd)) {593589 *table = (pmd_val(*pmd) &···611605 radix_tree_preload_end();612606 return rc;613607}614614-615615-/**616616- * fixup_user_fault_nowait - manually resolve a user page fault without waiting617617- * @mm: mm_struct of target mm618618- * @address: user address619619- * @fault_flags:flags to pass down to handle_mm_fault()620620- * @unlocked: did we unlock the mmap_lock while retrying621621- *622622- * This function behaves similarly to fixup_user_fault(), but it guarantees623623- * that the fault will be resolved without waiting. The function might drop624624- * and re-acquire the mm lock, in which case @unlocked will be set to true.625625- *626626- * The guarantee is that the fault is handled without waiting, but the627627- * function itself might sleep, due to the lock.628628- *629629- * Context: Needs to be called with mm->mmap_lock held in read mode, and will630630- * return with the lock held in read mode; @unlocked will indicate whether631631- * the lock has been dropped and re-acquired. This is the same behaviour as632632- * fixup_user_fault().633633- *634634- * Return: 0 on success, -EAGAIN if the fault cannot be resolved without635635- * waiting, -EFAULT if the fault cannot be resolved, -ENOMEM if out of636636- * memory.637637- */638638-static int fixup_user_fault_nowait(struct mm_struct *mm, unsigned long address,639639- unsigned int fault_flags, bool *unlocked)640640-{641641- struct vm_area_struct *vma;642642- unsigned int test_flags;643643- vm_fault_t fault;644644- int rc;645645-646646- fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;647647- test_flags = fault_flags & FAULT_FLAG_WRITE ? 
VM_WRITE : VM_READ;648648-649649- vma = find_vma(mm, address);650650- if (unlikely(!vma || address < vma->vm_start))651651- return -EFAULT;652652- if (unlikely(!(vma->vm_flags & test_flags)))653653- return -EFAULT;654654-655655- fault = handle_mm_fault(vma, address, fault_flags, NULL);656656- /* the mm lock has been dropped, take it again */657657- if (fault & VM_FAULT_COMPLETED) {658658- *unlocked = true;659659- mmap_read_lock(mm);660660- return 0;661661- }662662- /* the mm lock has not been dropped */663663- if (fault & VM_FAULT_ERROR) {664664- rc = vm_fault_to_errno(fault, 0);665665- BUG_ON(!rc);666666- return rc;667667- }668668- /* the mm lock has not been dropped because of FAULT_FLAG_RETRY_NOWAIT */669669- if (fault & VM_FAULT_RETRY)670670- return -EAGAIN;671671- /* nothing needed to be done and the mm lock has not been dropped */672672- return 0;673673-}674674-675675-/**676676- * __gmap_fault - resolve a fault on a guest address677677- * @gmap: pointer to guest mapping meta data structure678678- * @gaddr: guest address679679- * @fault_flags: flags to pass down to handle_mm_fault()680680- *681681- * Context: Needs to be called with mm->mmap_lock held in read mode. Might682682- * drop and re-acquire the lock. 
Will always return with the lock held.683683- */684684-static int __gmap_fault(struct gmap *gmap, unsigned long gaddr, unsigned int fault_flags)685685-{686686- unsigned long vmaddr;687687- bool unlocked;688688- int rc = 0;689689-690690-retry:691691- unlocked = false;692692-693693- vmaddr = __gmap_translate(gmap, gaddr);694694- if (IS_ERR_VALUE(vmaddr))695695- return vmaddr;696696-697697- if (fault_flags & FAULT_FLAG_RETRY_NOWAIT)698698- rc = fixup_user_fault_nowait(gmap->mm, vmaddr, fault_flags, &unlocked);699699- else700700- rc = fixup_user_fault(gmap->mm, vmaddr, fault_flags, &unlocked);701701- if (rc)702702- return rc;703703- /*704704- * In the case that fixup_user_fault unlocked the mmap_lock during705705- * fault-in, redo __gmap_translate() to avoid racing with a706706- * map/unmap_segment.707707- * In particular, __gmap_translate(), fixup_user_fault{,_nowait}(),708708- * and __gmap_link() must all be called atomically in one go; if the709709- * lock had been dropped in between, a retry is needed.710710- */711711- if (unlocked)712712- goto retry;713713-714714- return __gmap_link(gmap, gaddr, vmaddr);715715-}716716-717717-/**718718- * gmap_fault - resolve a fault on a guest address719719- * @gmap: pointer to guest mapping meta data structure720720- * @gaddr: guest address721721- * @fault_flags: flags to pass down to handle_mm_fault()722722- *723723- * Returns 0 on success, -ENOMEM for out of memory conditions, -EFAULT if the724724- * vm address is already mapped to a different guest segment, and -EAGAIN if725725- * FAULT_FLAG_RETRY_NOWAIT was specified and the fault could not be processed726726- * immediately.727727- */728728-int gmap_fault(struct gmap *gmap, unsigned long gaddr, unsigned int fault_flags)729729-{730730- int rc;731731-732732- mmap_read_lock(gmap->mm);733733- rc = __gmap_fault(gmap, gaddr, fault_flags);734734- mmap_read_unlock(gmap->mm);735735- return 
rc;736736-}737737-EXPORT_SYMBOL_GPL(gmap_fault);608608+EXPORT_SYMBOL(__gmap_link);738609739610/*740611 * this function is assumed to be called with mmap_lock held···736853 *737854 * Note: Can also be called for shadow gmaps.738855 */739739-static inline unsigned long *gmap_table_walk(struct gmap *gmap,740740- unsigned long gaddr, int level)856856+unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level)741857{742858 const int asce_type = gmap->asce & _ASCE_TYPE_MASK;743859 unsigned long *table = gmap->table;···787905 }788906 return table;789907}908908+EXPORT_SYMBOL(gmap_table_walk);790909791910/**792911 * gmap_pte_op_walk - walk the gmap page table, get the page table lock···9841101 * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE9851102 * @bits: pgste notification bits to set9861103 *987987- * Returns 0 if successfully protected, -ENOMEM if out of memory and988988- * -EFAULT if gaddr is invalid (or mapping for shadows is missing).11041104+ * Returns:11051105+ * PAGE_SIZE if a small page was successfully protected;11061106+ * HPAGE_SIZE if a large page was successfully protected;11071107+ * -ENOMEM if out of memory;11081108+ * -EFAULT if gaddr is invalid (or mapping for shadows is missing);11091109+ * -EAGAIN if the guest mapping is missing and should be fixed by the caller.9891110 *990990- * Called with sg->mm->mmap_lock in read.11111111+ * Context: Called with sg->mm->mmap_lock in read.9911112 */992992-static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,993993- unsigned long len, int prot, unsigned long bits)11131113+int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits)9941114{995995- unsigned long vmaddr, dist;9961115 pmd_t *pmdp;997997- int rc;11161116+ int rc = 0;99811179991118 BUG_ON(gmap_is_shadow(gmap));10001000- while (len) {10011001- rc = -EAGAIN;10021002- pmdp = gmap_pmd_op_walk(gmap, gaddr);10031003- if (pmdp) {10041004- if (!pmd_leaf(*pmdp)) {10051005- rc 
= gmap_protect_pte(gmap, gaddr, pmdp, prot,10061006- bits);10071007- if (!rc) {10081008- len -= PAGE_SIZE;10091009- gaddr += PAGE_SIZE;10101010- }10111011- } else {10121012- rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot,10131013- bits);10141014- if (!rc) {10151015- dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);10161016- len = len < dist ? 0 : len - dist;10171017- gaddr = (gaddr & HPAGE_MASK) + HPAGE_SIZE;10181018- }10191019- }10201020- gmap_pmd_op_end(gmap, pmdp);10211021- }10221022- if (rc) {10231023- if (rc == -EINVAL)10241024- return rc;1025111910261026- /* -EAGAIN, fixup of userspace mm and gmap */10271027- vmaddr = __gmap_translate(gmap, gaddr);10281028- if (IS_ERR_VALUE(vmaddr))10291029- return vmaddr;10301030- rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);10311031- if (rc)10321032- return rc;10331033- }11201120+ pmdp = gmap_pmd_op_walk(gmap, gaddr);11211121+ if (!pmdp)11221122+ return -EAGAIN;11231123+11241124+ if (!pmd_leaf(*pmdp)) {11251125+ rc = gmap_protect_pte(gmap, gaddr, pmdp, prot, bits);11261126+ if (!rc)11271127+ rc = PAGE_SIZE;11281128+ } else {11291129+ rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot, bits);11301130+ if (!rc)11311131+ rc = HPAGE_SIZE;10341132 }10351035- return 0;10361036-}11331133+ gmap_pmd_op_end(gmap, pmdp);1037113410381038-/**10391039- * gmap_mprotect_notify - change access rights for a range of ptes and10401040- * call the notifier if any pte changes again10411041- * @gmap: pointer to guest mapping meta data structure10421042- * @gaddr: virtual address in the guest address space10431043- * @len: size of area10441044- * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE10451045- *10461046- * Returns 0 if for each page in the given range a gmap mapping exists,10471047- * the new access rights could be set and the notifier could be armed.10481048- * If the gmap mapping is missing for one or more pages -EFAULT is10491049- * returned. 
If no memory could be allocated -ENOMEM is returned.10501050- * This function establishes missing page table entries.10511051- */10521052-int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,10531053- unsigned long len, int prot)10541054-{10551055- int rc;10561056-10571057- if ((gaddr & ~PAGE_MASK) || (len & ~PAGE_MASK) || gmap_is_shadow(gmap))10581058- return -EINVAL;10591059- if (!MACHINE_HAS_ESOP && prot == PROT_READ)10601060- return -EINVAL;10611061- mmap_read_lock(gmap->mm);10621062- rc = gmap_protect_range(gmap, gaddr, len, prot, GMAP_NOTIFY_MPROT);10631063- mmap_read_unlock(gmap->mm);10641135 return rc;10651136}10661066-EXPORT_SYMBOL_GPL(gmap_mprotect_notify);11371137+EXPORT_SYMBOL_GPL(gmap_protect_one);1067113810681139/**10691140 * gmap_read_table - get an unsigned long value from a guest page table using···12511414 __gmap_unshadow_pgt(sg, raddr, __va(pgt));12521415 /* Free page table */12531416 ptdesc = page_ptdesc(phys_to_page(pgt));12541254- list_del(&ptdesc->pt_list);12551417 page_table_free_pgste(ptdesc);12561418}12571419···12781442 __gmap_unshadow_pgt(sg, raddr, __va(pgt));12791443 /* Free page table */12801444 ptdesc = page_ptdesc(phys_to_page(pgt));12811281- list_del(&ptdesc->pt_list);12821445 page_table_free_pgste(ptdesc);12831446 }12841447}···13071472 __gmap_unshadow_sgt(sg, raddr, __va(sgt));13081473 /* Free segment table */13091474 page = phys_to_page(sgt);13101310- list_del(&page->lru);13111475 __free_pages(page, CRST_ALLOC_ORDER);13121476}13131477···13341500 __gmap_unshadow_sgt(sg, raddr, __va(sgt));13351501 /* Free segment table */13361502 page = phys_to_page(sgt);13371337- list_del(&page->lru);13381503 __free_pages(page, CRST_ALLOC_ORDER);13391504 }13401505}···13631530 __gmap_unshadow_r3t(sg, raddr, __va(r3t));13641531 /* Free region 3 table */13651532 page = phys_to_page(r3t);13661366- list_del(&page->lru);13671533 __free_pages(page, CRST_ALLOC_ORDER);13681534}13691535···13901558 __gmap_unshadow_r3t(sg, raddr, __va(r3t));13911559 
/* Free region 3 table */13921560 page = phys_to_page(r3t);13931393- list_del(&page->lru);13941561 __free_pages(page, CRST_ALLOC_ORDER);13951562 }13961563}···14191588 __gmap_unshadow_r2t(sg, raddr, __va(r2t));14201589 /* Free region 2 table */14211590 page = phys_to_page(r2t);14221422- list_del(&page->lru);14231591 __free_pages(page, CRST_ALLOC_ORDER);14241592}14251593···14501620 r1t[i] = _REGION1_ENTRY_EMPTY;14511621 /* Free region 2 table */14521622 page = phys_to_page(r2t);14531453- list_del(&page->lru);14541623 __free_pages(page, CRST_ALLOC_ORDER);14551624 }14561625}···14601631 *14611632 * Called with sg->guest_table_lock14621633 */14631463-static void gmap_unshadow(struct gmap *sg)16341634+void gmap_unshadow(struct gmap *sg)14641635{14651636 unsigned long *table;14661637···14861657 break;14871658 }14881659}14891489-14901490-/**14911491- * gmap_find_shadow - find a specific asce in the list of shadow tables14921492- * @parent: pointer to the parent gmap14931493- * @asce: ASCE for which the shadow table is created14941494- * @edat_level: edat level to be used for the shadow translation14951495- *14961496- * Returns the pointer to a gmap if a shadow table with the given asce is14971497- * already available, ERR_PTR(-EAGAIN) if another one is just being created,14981498- * otherwise NULL14991499- */15001500-static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long asce,15011501- int edat_level)15021502-{15031503- struct gmap *sg;15041504-15051505- list_for_each_entry(sg, &parent->children, list) {15061506- if (sg->orig_asce != asce || sg->edat_level != edat_level ||15071507- sg->removed)15081508- continue;15091509- if (!sg->initialized)15101510- return ERR_PTR(-EAGAIN);15111511- refcount_inc(&sg->ref_count);15121512- return sg;15131513- }15141514- return NULL;15151515-}15161516-15171517-/**15181518- * gmap_shadow_valid - check if a shadow guest address space matches the15191519- * given properties and is still valid15201520- * @sg: pointer to the 
shadow guest address space structure15211521- * @asce: ASCE for which the shadow table is requested15221522- * @edat_level: edat level to be used for the shadow translation15231523- *15241524- * Returns 1 if the gmap shadow is still valid and matches the given15251525- * properties, the caller can continue using it. Returns 0 otherwise, the15261526- * caller has to request a new shadow gmap in this case.15271527- *15281528- */15291529-int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level)15301530-{15311531- if (sg->removed)15321532- return 0;15331533- return sg->orig_asce == asce && sg->edat_level == edat_level;15341534-}15351535-EXPORT_SYMBOL_GPL(gmap_shadow_valid);15361536-15371537-/**15381538- * gmap_shadow - create/find a shadow guest address space15391539- * @parent: pointer to the parent gmap15401540- * @asce: ASCE for which the shadow table is created15411541- * @edat_level: edat level to be used for the shadow translation15421542- *15431543- * The pages of the top level page table referred by the asce parameter15441544- * will be set to read-only and marked in the PGSTEs of the kvm process.15451545- * The shadow table will be removed automatically on any change to the15461546- * PTE mapping for the source table.15471547- *15481548- * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,15491549- * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the15501550- * parent gmap table could not be protected.15511551- */15521552-struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,15531553- int edat_level)15541554-{15551555- struct gmap *sg, *new;15561556- unsigned long limit;15571557- int rc;15581558-15591559- BUG_ON(parent->mm->context.allow_gmap_hpage_1m);15601560- BUG_ON(gmap_is_shadow(parent));15611561- spin_lock(&parent->shadow_lock);15621562- sg = gmap_find_shadow(parent, asce, edat_level);15631563- spin_unlock(&parent->shadow_lock);15641564- if (sg)15651565- return sg;15661566- /* 
Create a new shadow gmap */15671567- limit = -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11));15681568- if (asce & _ASCE_REAL_SPACE)15691569- limit = -1UL;15701570- new = gmap_alloc(limit);15711571- if (!new)15721572- return ERR_PTR(-ENOMEM);15731573- new->mm = parent->mm;15741574- new->parent = gmap_get(parent);15751575- new->private = parent->private;15761576- new->orig_asce = asce;15771577- new->edat_level = edat_level;15781578- new->initialized = false;15791579- spin_lock(&parent->shadow_lock);15801580- /* Recheck if another CPU created the same shadow */15811581- sg = gmap_find_shadow(parent, asce, edat_level);15821582- if (sg) {15831583- spin_unlock(&parent->shadow_lock);15841584- gmap_free(new);15851585- return sg;15861586- }15871587- if (asce & _ASCE_REAL_SPACE) {15881588- /* only allow one real-space gmap shadow */15891589- list_for_each_entry(sg, &parent->children, list) {15901590- if (sg->orig_asce & _ASCE_REAL_SPACE) {15911591- spin_lock(&sg->guest_table_lock);15921592- gmap_unshadow(sg);15931593- spin_unlock(&sg->guest_table_lock);15941594- list_del(&sg->list);15951595- gmap_put(sg);15961596- break;15971597- }15981598- }15991599- }16001600- refcount_set(&new->ref_count, 2);16011601- list_add(&new->list, &parent->children);16021602- if (asce & _ASCE_REAL_SPACE) {16031603- /* nothing to protect, return right away */16041604- new->initialized = true;16051605- spin_unlock(&parent->shadow_lock);16061606- return new;16071607- }16081608- spin_unlock(&parent->shadow_lock);16091609- /* protect after insertion, so it will get properly invalidated */16101610- mmap_read_lock(parent->mm);16111611- rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,16121612- ((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,16131613- PROT_READ, GMAP_NOTIFY_SHADOW);16141614- mmap_read_unlock(parent->mm);16151615- spin_lock(&parent->shadow_lock);16161616- new->initialized = true;16171617- if (rc) {16181618- list_del(&new->list);16191619- gmap_free(new);16201620- new = 
ERR_PTR(rc);16211621- }16221622- spin_unlock(&parent->shadow_lock);16231623- return new;16241624-}16251625-EXPORT_SYMBOL_GPL(gmap_shadow);16601660+EXPORT_SYMBOL(gmap_unshadow);1626166116271662/**16281663 * gmap_shadow_r2t - create an empty shadow region 2 table···15201827 page = gmap_alloc_crst();15211828 if (!page)15221829 return -ENOMEM;15231523- page->index = r2t & _REGION_ENTRY_ORIGIN;15241524- if (fake)15251525- page->index |= GMAP_SHADOW_FAKE_TABLE;15261830 s_r2t = page_to_phys(page);15271831 /* Install shadow region second table */15281832 spin_lock(&sg->guest_table_lock);···15411851 _REGION_ENTRY_TYPE_R1 | _REGION_ENTRY_INVALID;15421852 if (sg->edat_level >= 1)15431853 *table |= (r2t & _REGION_ENTRY_PROTECT);15441544- list_add(&page->lru, &sg->crst_list);15451854 if (fake) {15461855 /* nothing to protect for fake tables */15471856 *table &= ~_REGION_ENTRY_INVALID;···16001911 page = gmap_alloc_crst();16011912 if (!page)16021913 return -ENOMEM;16031603- page->index = r3t & _REGION_ENTRY_ORIGIN;16041604- if (fake)16051605- page->index |= GMAP_SHADOW_FAKE_TABLE;16061914 s_r3t = page_to_phys(page);16071915 /* Install shadow region second table */16081916 spin_lock(&sg->guest_table_lock);···16211935 _REGION_ENTRY_TYPE_R2 | _REGION_ENTRY_INVALID;16221936 if (sg->edat_level >= 1)16231937 *table |= (r3t & _REGION_ENTRY_PROTECT);16241624- list_add(&page->lru, &sg->crst_list);16251938 if (fake) {16261939 /* nothing to protect for fake tables */16271940 *table &= ~_REGION_ENTRY_INVALID;···16801995 page = gmap_alloc_crst();16811996 if (!page)16821997 return -ENOMEM;16831683- page->index = sgt & _REGION_ENTRY_ORIGIN;16841684- if (fake)16851685- page->index |= GMAP_SHADOW_FAKE_TABLE;16861998 s_sgt = page_to_phys(page);16871999 /* Install shadow region second table */16882000 spin_lock(&sg->guest_table_lock);···17012019 _REGION_ENTRY_TYPE_R3 | _REGION_ENTRY_INVALID;17022020 if (sg->edat_level >= 1)17032021 *table |= sgt & _REGION_ENTRY_PROTECT;17041704- 
list_add(&page->lru, &sg->crst_list);17052022 if (fake) {17062023 /* nothing to protect for fake tables */17072024 *table &= ~_REGION_ENTRY_INVALID;···17332052}17342053EXPORT_SYMBOL_GPL(gmap_shadow_sgt);1735205417361736-/**17371737- * gmap_shadow_pgt_lookup - find a shadow page table17381738- * @sg: pointer to the shadow guest address space structure17391739- * @saddr: the address in the shadow aguest address space17401740- * @pgt: parent gmap address of the page table to get shadowed17411741- * @dat_protection: if the pgtable is marked as protected by dat17421742- * @fake: pgt references contiguous guest memory block, not a pgtable17431743- *17441744- * Returns 0 if the shadow page table was found and -EAGAIN if the page17451745- * table was not found.17461746- *17471747- * Called with sg->mm->mmap_lock in read.17481748- */17491749-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,17501750- unsigned long *pgt, int *dat_protection,17511751- int *fake)20552055+static void gmap_pgste_set_pgt_addr(struct ptdesc *ptdesc, unsigned long pgt_addr)17522056{17531753- unsigned long *table;17541754- struct page *page;17551755- int rc;20572057+ unsigned long *pgstes = page_to_virt(ptdesc_page(ptdesc));1756205817571757- BUG_ON(!gmap_is_shadow(sg));17581758- spin_lock(&sg->guest_table_lock);17591759- table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */17601760- if (table && !(*table & _SEGMENT_ENTRY_INVALID)) {17611761- /* Shadow page tables are full pages (pte+pgste) */17621762- page = pfn_to_page(*table >> PAGE_SHIFT);17631763- *pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;17641764- *dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT);17651765- *fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);17661766- rc = 0;17671767- } else {17681768- rc = -EAGAIN;17691769- }17701770- spin_unlock(&sg->guest_table_lock);17711771- return rc;20592059+ pgstes += _PAGE_ENTRIES;1772206020612061+ pgstes[0] &= ~PGSTE_ST2_MASK;20622062+ pgstes[1] &= ~PGSTE_ST2_MASK;20632063+ 
pgstes[2] &= ~PGSTE_ST2_MASK;20642064+ pgstes[3] &= ~PGSTE_ST2_MASK;20652065+20662066+ pgstes[0] |= (pgt_addr >> 16) & PGSTE_ST2_MASK;20672067+ pgstes[1] |= pgt_addr & PGSTE_ST2_MASK;20682068+ pgstes[2] |= (pgt_addr << 16) & PGSTE_ST2_MASK;20692069+ pgstes[3] |= (pgt_addr << 32) & PGSTE_ST2_MASK;17732070}17741774-EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);1775207117762072/**17772073 * gmap_shadow_pgt - instantiate a shadow page table···17772119 ptdesc = page_table_alloc_pgste(sg->mm);17782120 if (!ptdesc)17792121 return -ENOMEM;17801780- ptdesc->pt_index = pgt & _SEGMENT_ENTRY_ORIGIN;21222122+ origin = pgt & _SEGMENT_ENTRY_ORIGIN;17812123 if (fake)17821782- ptdesc->pt_index |= GMAP_SHADOW_FAKE_TABLE;21242124+ origin |= GMAP_SHADOW_FAKE_TABLE;21252125+ gmap_pgste_set_pgt_addr(ptdesc, origin);17832126 s_pgt = page_to_phys(ptdesc_page(ptdesc));17842127 /* Install shadow page table */17852128 spin_lock(&sg->guest_table_lock);···17992140 /* mark as invalid as long as the parent table is not protected */18002141 *table = (unsigned long) s_pgt | _SEGMENT_ENTRY |18012142 (pgt & _SEGMENT_ENTRY_PROTECT) | _SEGMENT_ENTRY_INVALID;18021802- list_add(&ptdesc->pt_list, &sg->pt_list);18032143 if (fake) {18042144 /* nothing to protect for fake tables */18052145 *table &= ~_SEGMENT_ENTRY_INVALID;···19762318 pte_t *pte, unsigned long bits)19772319{19782320 unsigned long offset, gaddr = 0;19791979- unsigned long *table;19802321 struct gmap *gmap, *sg, *next;1981232219822323 offset = ((unsigned long) pte) & (255 * sizeof(pte_t));···19832326 rcu_read_lock();19842327 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {19852328 spin_lock(&gmap->guest_table_lock);19861986- table = radix_tree_lookup(&gmap->host_to_guest,19871987- vmaddr >> PMD_SHIFT);19881988- if (table)19891989- gaddr = __gmap_segment_gaddr(table) + offset;23292329+ gaddr = host_to_guest_lookup(gmap, vmaddr) + offset;19902330 spin_unlock(&gmap->guest_table_lock);19911991- if (!table)23312331+ if 
(!IS_GADDR_VALID(gaddr))19922332 continue;1993233319942334 if (!list_empty(&gmap->children) && (bits & PGSTE_VSIE_BIT)) {···20452391 rcu_read_lock();20462392 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {20472393 spin_lock(&gmap->guest_table_lock);20482048- pmdp = (pmd_t *)radix_tree_delete(&gmap->host_to_guest,20492049- vmaddr >> PMD_SHIFT);23942394+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);20502395 if (pmdp) {20512051- gaddr = __gmap_segment_gaddr((unsigned long *)pmdp);20522396 pmdp_notify_gmap(gmap, pmdp, gaddr);20532397 WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |20542398 _SEGMENT_ENTRY_GMAP_UC |···20902438 */20912439void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr)20922440{20932093- unsigned long *entry, gaddr;24412441+ unsigned long gaddr;20942442 struct gmap *gmap;20952443 pmd_t *pmdp;2096244420972445 rcu_read_lock();20982446 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {20992447 spin_lock(&gmap->guest_table_lock);21002100- entry = radix_tree_delete(&gmap->host_to_guest,21012101- vmaddr >> PMD_SHIFT);21022102- if (entry) {21032103- pmdp = (pmd_t *)entry;21042104- gaddr = __gmap_segment_gaddr(entry);24482448+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);24492449+ if (pmdp) {21052450 pmdp_notify_gmap(gmap, pmdp, gaddr);21062106- WARN_ON(*entry & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |21072107- _SEGMENT_ENTRY_GMAP_UC |21082108- _SEGMENT_ENTRY));24512451+ WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |24522452+ _SEGMENT_ENTRY_GMAP_UC |24532453+ _SEGMENT_ENTRY));21092454 if (MACHINE_HAS_TLB_GUEST)21102455 __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,21112456 gmap->asce, IDTE_LOCAL);21122457 else if (MACHINE_HAS_IDTE)21132458 __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_LOCAL);21142114- *entry = _SEGMENT_ENTRY_EMPTY;24592459+ *pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);21152460 }21162461 spin_unlock(&gmap->guest_table_lock);21172462 }···21232474 */21242475void 
gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)21252476{21262126- unsigned long *entry, gaddr;24772477+ unsigned long gaddr;21272478 struct gmap *gmap;21282479 pmd_t *pmdp;2129248021302481 rcu_read_lock();21312482 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {21322483 spin_lock(&gmap->guest_table_lock);21332133- entry = radix_tree_delete(&gmap->host_to_guest,21342134- vmaddr >> PMD_SHIFT);21352135- if (entry) {21362136- pmdp = (pmd_t *)entry;21372137- gaddr = __gmap_segment_gaddr(entry);24842484+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);24852485+ if (pmdp) {21382486 pmdp_notify_gmap(gmap, pmdp, gaddr);21392139- WARN_ON(*entry & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |21402140- _SEGMENT_ENTRY_GMAP_UC |21412141- _SEGMENT_ENTRY));24872487+ WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |24882488+ _SEGMENT_ENTRY_GMAP_UC |24892489+ _SEGMENT_ENTRY));21422490 if (MACHINE_HAS_TLB_GUEST)21432491 __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,21442492 gmap->asce, IDTE_GLOBAL);···21432497 __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_GLOBAL);21442498 else21452499 __pmdp_csp(pmdp);21462146- *entry = _SEGMENT_ENTRY_EMPTY;25002500+ *pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);21472501 }21482502 spin_unlock(&gmap->guest_table_lock);21492503 }···25892943EXPORT_SYMBOL_GPL(__s390_uv_destroy_range);2590294425912945/**25922592- * s390_unlist_old_asce - Remove the topmost level of page tables from the25932593- * list of page tables of the gmap.25942594- * @gmap: the gmap whose table is to be removed25952595- *25962596- * On s390x, KVM keeps a list of all pages containing the page tables of the25972597- * gmap (the CRST list). 
This list is used at tear down time to free all25982598- * pages that are now not needed anymore.25992599- *26002600- * This function removes the topmost page of the tree (the one pointed to by26012601- * the ASCE) from the CRST list.26022602- *26032603- * This means that it will not be freed when the VM is torn down, and needs26042604- * to be handled separately by the caller, unless a leak is actually26052605- * intended. Notice that this function will only remove the page from the26062606- * list, the page will still be used as a top level page table (and ASCE).26072607- */26082608-void s390_unlist_old_asce(struct gmap *gmap)26092609-{26102610- struct page *old;26112611-26122612- old = virt_to_page(gmap->table);26132613- spin_lock(&gmap->guest_table_lock);26142614- list_del(&old->lru);26152615- /*26162616- * Sometimes the topmost page might need to be "removed" multiple26172617- * times, for example if the VM is rebooted into secure mode several26182618- * times concurrently, or if s390_replace_asce fails after calling26192619- * s390_remove_old_asce and is attempted again later. 
In that case26202620- * the old asce has been removed from the list, and therefore it26212621- * will not be freed when the VM terminates, but the ASCE is still26222622- * in use and still pointed to.26232623- * A subsequent call to replace_asce will follow the pointer and try26242624- * to remove the same page from the list again.26252625- * Therefore it's necessary that the page of the ASCE has valid26262626- * pointers, so list_del can work (and do nothing) without26272627- * dereferencing stale or invalid pointers.26282628- */26292629- INIT_LIST_HEAD(&old->lru);26302630- spin_unlock(&gmap->guest_table_lock);26312631-}26322632-EXPORT_SYMBOL_GPL(s390_unlist_old_asce);26332633-26342634-/**26352946 * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy26362947 * @gmap: the gmap whose ASCE needs to be replaced26372948 *···26073004 struct page *page;26083005 void *table;2609300626102610- s390_unlist_old_asce(gmap);26112611-26123007 /* Replacing segment type ASCEs would cause serious issues */26133008 if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)26143009 return -EINVAL;···26143013 page = gmap_alloc_crst();26153014 if (!page)26163015 return -ENOMEM;26172617- page->index = 0;26183016 table = page_to_virt(page);26193017 memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));26202620-26212621- /*26222622- * The caller has to deal with the old ASCE, but here we make sure26232623- * the new one is properly added to the CRST list, so that26242624- * it will be freed when the VM is torn down.26252625- */26262626- spin_lock(&gmap->guest_table_lock);26272627- list_add(&page->lru, &gmap->crst_list);26282628- spin_unlock(&gmap->guest_table_lock);2629301826303019 /* Set new table origin while preserving existing ASCE control bits */26313020 asce = (gmap->asce & ~_ASCE_ORIGIN) | __pa(table);···26263035 return 0;26273036}26283037EXPORT_SYMBOL_GPL(s390_replace_asce);30383038+30393039+/**30403040+ * kvm_s390_wiggle_split_folio() - try to 
drain extra references to a folio and optionally split30413041+ * @mm: the mm containing the folio to work on30423042+ * @folio: the folio30433043+ * @split: whether to split a large folio30443044+ *30453045+ * Context: Must be called while holding an extra reference to the folio;30463046+ * the mm lock should not be held.30473047+ */30483048+int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split)30493049+{30503050+ int rc;30513051+30523052+ lockdep_assert_not_held(&mm->mmap_lock);30533053+ folio_wait_writeback(folio);30543054+ lru_add_drain_all();30553055+ if (split) {30563056+ folio_lock(folio);30573057+ rc = split_folio(folio);30583058+ folio_unlock(folio);30593059+30603060+ if (rc != -EBUSY)30613061+ return rc;30623062+ }30633063+ return -EAGAIN;30643064+}30653065+EXPORT_SYMBOL_GPL(kvm_s390_wiggle_split_folio);
-2
arch/s390/mm/pgalloc.c
···176176 }177177 table = ptdesc_to_virt(ptdesc);178178 __arch_set_page_dat(table, 1);179179- /* pt_list is used by gmap only */180180- INIT_LIST_HEAD(&ptdesc->pt_list);181179 memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);182180 memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);183181 return table;
+1
arch/x86/boot/compressed/Makefile
···2525# avoid errors with '-march=i386', and future flags may depend on the target to2626# be valid.2727KBUILD_CFLAGS := -m$(BITS) -O2 $(CLANG_FLAGS)2828+KBUILD_CFLAGS += -std=gnu112829KBUILD_CFLAGS += -fno-strict-aliasing -fPIE2930KBUILD_CFLAGS += -Wundef3031KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
···71207120 kmem_cache_destroy(mmu_page_header_cache);71217121}7122712271237123+static void kvm_wake_nx_recovery_thread(struct kvm *kvm)71247124+{71257125+ /*71267126+ * The NX recovery thread is spawned on-demand at the first KVM_RUN and71277127+ * may not be valid even though the VM is globally visible. Do nothing,71287128+ * as such a VM can't have any possible NX huge pages.71297129+ */71307130+ struct vhost_task *nx_thread = READ_ONCE(kvm->arch.nx_huge_page_recovery_thread);71317131+71327132+ if (nx_thread)71337133+ vhost_task_wake(nx_thread);71347134+}71357135+71237136static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)71247137{71257138 if (nx_hugepage_mitigation_hard_disabled)···71937180 kvm_mmu_zap_all_fast(kvm);71947181 mutex_unlock(&kvm->slots_lock);7195718271967196- vhost_task_wake(kvm->arch.nx_huge_page_recovery_thread);71837183+ kvm_wake_nx_recovery_thread(kvm);71977184 }71987185 mutex_unlock(&kvm_lock);71997186 }···73287315 mutex_lock(&kvm_lock);7329731673307317 list_for_each_entry(kvm, &vm_list, vm_list)73317331- vhost_task_wake(kvm->arch.nx_huge_page_recovery_thread);73187318+ kvm_wake_nx_recovery_thread(kvm);7332731973337320 mutex_unlock(&kvm_lock);73347321 }···74647451{74657452 struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);74667453 struct kvm *kvm = container_of(ka, struct kvm, arch);74547454+ struct vhost_task *nx_thread;7467745574687456 kvm->arch.nx_huge_page_last = get_jiffies_64();74697469- kvm->arch.nx_huge_page_recovery_thread = vhost_task_create(74707470- kvm_nx_huge_page_recovery_worker, kvm_nx_huge_page_recovery_worker_kill,74717471- kvm, "kvm-nx-lpage-recovery");74577457+ nx_thread = vhost_task_create(kvm_nx_huge_page_recovery_worker,74587458+ kvm_nx_huge_page_recovery_worker_kill,74597459+ kvm, "kvm-nx-lpage-recovery");7472746074737473- if (kvm->arch.nx_huge_page_recovery_thread)74747474- vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);74617461+ if (!nx_thread)74627462+ 
return;74637463+74647464+ vhost_task_start(nx_thread);74657465+74667466+ /* Make the task visible only once it is fully started. */74677467+ WRITE_ONCE(kvm->arch.nx_huge_page_recovery_thread, nx_thread);74757468}7476746974777470int kvm_mmu_post_init_vm(struct kvm *kvm)
···100100 push %r10101101 push %r9102102 push %r8103103-#ifdef CONFIG_FRAME_POINTER104104- pushq $0 /* Dummy push for stack alignment. */105105-#endif106103#endif107104 /* Set the vendor specific function. */108105 call __xen_hypercall_setfunc···114117 pop %ebx115118 pop %eax116119#else117117- lea xen_hypercall_amd(%rip), %rbx118118- cmp %rax, %rbx119119-#ifdef CONFIG_FRAME_POINTER120120- pop %rax /* Dummy pop. */121121-#endif120120+ lea xen_hypercall_amd(%rip), %rcx121121+ cmp %rax, %rcx122122 pop %r8123123 pop %r9124124 pop %r10···126132 pop %rcx127133 pop %rax128134#endif135135+ FRAME_END129136 /* Use correct hypercall function. */130137 jz xen_hypercall_amd131138 jmp xen_hypercall_intel
+5
drivers/accel/amdxdna/amdxdna_pci_drv.c
···21212222#define AMDXDNA_AUTOSUSPEND_DELAY 5000 /* milliseconds */23232424+MODULE_FIRMWARE("amdnpu/1502_00/npu.sbin");2525+MODULE_FIRMWARE("amdnpu/17f0_10/npu.sbin");2626+MODULE_FIRMWARE("amdnpu/17f0_11/npu.sbin");2727+MODULE_FIRMWARE("amdnpu/17f0_20/npu.sbin");2828+2429/*2530 * Bind the driver base on (vendor_id, device_id) pair and later use the2631 * (device_id, rev_id) pair as a key to select the devices. The devices with
+6-2
drivers/accel/ivpu/ivpu_drv.c
···397397 if (ivpu_fw_is_cold_boot(vdev)) {398398 ret = ivpu_pm_dct_init(vdev);399399 if (ret)400400- goto err_diagnose_failure;400400+ goto err_disable_ipc;401401402402 ret = ivpu_hw_sched_init(vdev);403403 if (ret)404404- goto err_diagnose_failure;404404+ goto err_disable_ipc;405405 }406406407407 return 0;408408409409+err_disable_ipc:410410+ ivpu_ipc_disable(vdev);411411+ ivpu_hw_irq_disable(vdev);412412+ disable_irq(vdev->irq);409413err_diagnose_failure:410414 ivpu_hw_diagnose_failure(vdev);411415 ivpu_mmu_evtq_dump(vdev);
···17171818config ARM_AIROHA_SOC_CPUFREQ1919 tristate "Airoha EN7581 SoC CPUFreq support"2020- depends on (ARCH_AIROHA && OF) || COMPILE_TEST2020+ depends on ARCH_AIROHA || COMPILE_TEST2121+ depends on OF2122 select PM_OPP2223 default ARCH_AIROHA2324 help
+10-10
drivers/cpufreq/amd-pstate.c
···699699 if (min_perf < lowest_nonlinear_perf)700700 min_perf = lowest_nonlinear_perf;701701702702- max_perf = cap_perf;702702+ max_perf = cpudata->max_limit_perf;703703 if (max_perf < min_perf)704704 max_perf = min_perf;705705···747747 guard(mutex)(&amd_pstate_driver_lock);748748749749 ret = amd_pstate_cpu_boost_update(policy, state);750750- policy->boost_enabled = !ret ? state : false;751750 refresh_frequency_limits(policy);752751753752 return ret;···821822822823static void amd_pstate_update_limits(unsigned int cpu)823824{824824- struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);825825+ struct cpufreq_policy *policy = NULL;825826 struct amd_cpudata *cpudata;826827 u32 prev_high = 0, cur_high = 0;827828 int ret;828829 bool highest_perf_changed = false;829830831831+ if (!amd_pstate_prefcore)832832+ return;833833+834834+ policy = cpufreq_cpu_get(cpu);830835 if (!policy)831836 return;832837833838 cpudata = policy->driver_data;834839835835- if (!amd_pstate_prefcore)836836- return;837837-838840 guard(mutex)(&amd_pstate_driver_lock);839841840842 ret = amd_get_highest_perf(cpu, &cur_high);841841- if (ret)842842- goto free_cpufreq_put;843843+ if (ret) {844844+ cpufreq_cpu_put(policy);845845+ return;846846+ }843847844848 prev_high = READ_ONCE(cpudata->prefcore_ranking);845849 highest_perf_changed = (prev_high != cur_high);···852850 if (cur_high < CPPC_MAX_PERF)853851 sched_set_itmt_core_prio((int)cur_high, cpu);854852 }855855-856856-free_cpufreq_put:857853 cpufreq_cpu_put(policy);858854859855 if (!highest_perf_changed)
+2-1
drivers/cpufreq/cpufreq.c
···15711571 policy->cdev = of_cpufreq_cooling_register(policy);1572157215731573 /* Let the per-policy boost flag mirror the cpufreq_driver boost during init */15741574- if (policy->boost_enabled != cpufreq_boost_enabled()) {15741574+ if (cpufreq_driver->set_boost &&15751575+ policy->boost_enabled != cpufreq_boost_enabled()) {15751576 policy->boost_enabled = cpufreq_boost_enabled();15761577 ret = cpufreq_driver->set_boost(policy, policy->boost_enabled);15771578 if (ret) {
+1-1
drivers/firmware/Kconfig
···106106 select ISCSI_BOOT_SYSFS107107 select ISCSI_IBFT_FIND if X86108108 depends on ACPI && SCSI && SCSI_LOWLEVEL109109- default n109109+ default n110110 help111111 This option enables support for detection and exposing of iSCSI112112 Boot Firmware Table (iBFT) via sysfs to userspace. If you wish to
+4-1
drivers/firmware/iscsi_ibft.c
···310310 str += sprintf_ipaddr(str, nic->ip_addr);311311 break;312312 case ISCSI_BOOT_ETH_SUBNET_MASK:313313- val = cpu_to_be32(~((1 << (32-nic->subnet_mask_prefix))-1));313313+ if (nic->subnet_mask_prefix > 32)314314+ val = cpu_to_be32(~0);315315+ else316316+ val = cpu_to_be32(~((1 << (32-nic->subnet_mask_prefix))-1));314317 str += sprintf(str, "%pI4", &val);315318 break;316319 case ISCSI_BOOT_ETH_PREFIX_LEN:
+1
drivers/gpio/Kconfig
···338338339339config GPIO_GRGPIO340340 tristate "Aeroflex Gaisler GRGPIO support"341341+ depends on OF || COMPILE_TEST341342 select GPIO_GENERIC342343 select IRQ_DOMAIN343344 help
-19
drivers/gpio/gpio-pca953x.c
···841841 DECLARE_BITMAP(trigger, MAX_LINE);842842 int ret;843843844844- if (chip->driver_data & PCA_PCAL) {845845- /* Read the current interrupt status from the device */846846- ret = pca953x_read_regs(chip, PCAL953X_INT_STAT, trigger);847847- if (ret)848848- return false;849849-850850- /* Check latched inputs and clear interrupt status */851851- ret = pca953x_read_regs(chip, chip->regs->input, cur_stat);852852- if (ret)853853- return false;854854-855855- /* Apply filter for rising/falling edge selection */856856- bitmap_replace(new_stat, chip->irq_trig_fall, chip->irq_trig_raise, cur_stat, gc->ngpio);857857-858858- bitmap_and(pending, new_stat, trigger, gc->ngpio);859859-860860- return !bitmap_empty(pending, gc->ngpio);861861- }862862-863844 ret = pca953x_read_regs(chip, chip->regs->input, cur_stat);864845 if (ret)865846 return false;
+8-5
drivers/gpio/gpio-sim.c
···10281028 struct configfs_subsystem *subsys = dev->group.cg_subsys;10291029 struct gpio_sim_bank *bank;10301030 struct gpio_sim_line *line;10311031+ struct config_item *item;1031103210321033 /*10331033- * The device only needs to depend on leaf line entries. This is10341034+ * The device only needs to depend on leaf entries. This is10341035 * sufficient to lock up all the configfs entries that the10351036 * instantiated, alive device depends on.10361037 */10371038 list_for_each_entry(bank, &dev->bank_list, siblings) {10381039 list_for_each_entry(line, &bank->line_list, siblings) {10401040+ item = line->hog ? &line->hog->item10411041+ : &line->group.cg_item;10421042+10391043 if (lock)10401040- WARN_ON(configfs_depend_item_unlocked(10411041- subsys, &line->group.cg_item));10441044+ WARN_ON(configfs_depend_item_unlocked(subsys,10451045+ item));10421046 else10431043- configfs_undepend_item_unlocked(10441044- &line->group.cg_item);10471047+ configfs_undepend_item_unlocked(item);10451048 }10461049 }10471050}
···2133213321342134 dc_enable_stereo(dc, context, dc_streams, context->stream_count);2135213521362136- if (context->stream_count > get_seamless_boot_stream_count(context) ||21362136+ if (get_seamless_boot_stream_count(context) == 0 ||21372137 context->stream_count == 0) {21382138 /* Must wait for no flips to be pending before doing optimize bw */21392139 hwss_wait_for_no_pipes_pending(dc, context);
···256256 if (enabled)257257 vgacrdf_test |= AST_IO_VGACRDF_DP_VIDEO_ENABLE;258258259259- for (i = 0; i < 200; ++i) {259259+ for (i = 0; i < 1000; ++i) {260260 if (i)261261 mdelay(1);262262 vgacrdf = ast_get_index_reg_mask(ast, AST_IO_VGACRI, 0xdf,
+3-11
drivers/gpu/drm/display/drm_dp_cec.c
···311311 if (!aux->transfer)312312 return;313313314314-#ifndef CONFIG_MEDIA_CEC_RC315315- /*316316- * CEC_CAP_RC is part of CEC_CAP_DEFAULTS, but it is stripped by317317- * cec_allocate_adapter() if CONFIG_MEDIA_CEC_RC is undefined.318318- *319319- * Do this here as well to ensure the tests against cec_caps are320320- * correct.321321- */322322- cec_caps &= ~CEC_CAP_RC;323323-#endif324314 cancel_delayed_work_sync(&aux->cec.unregister_work);325315326316 mutex_lock(&aux->cec.lock);···327337 num_las = CEC_MAX_LOG_ADDRS;328338329339 if (aux->cec.adap) {330330- if (aux->cec.adap->capabilities == cec_caps &&340340+ /* Check if the adapter properties have changed */341341+ if ((aux->cec.adap->capabilities & CEC_CAP_MONITOR_ALL) ==342342+ (cec_caps & CEC_CAP_MONITOR_ALL) &&331343 aux->cec.adap->available_log_addrs == num_las) {332344 /* Unchanged, so just set the phys addr */333345 cec_s_phys_addr(aux->cec.adap, source_physical_address, false);
···119119 drm_puts(&p, "\n**** GuC CT ****\n");120120 xe_guc_ct_snapshot_print(ss->guc.ct, &p);121121122122- /*123123- * Don't add a new section header here because the mesa debug decoder124124- * tool expects the context information to be in the 'GuC CT' section.125125- */126126- /* drm_puts(&p, "\n**** Contexts ****\n"); */122122+ drm_puts(&p, "\n**** Contexts ****\n");127123 xe_guc_exec_queue_snapshot_print(ss->ge, &p);128124129125 drm_puts(&p, "\n**** Job ****\n");···391395/**392396 * xe_print_blob_ascii85 - print a BLOB to some useful location in ASCII85393397 *394394- * The output is split to multiple lines because some print targets, e.g. dmesg395395- * cannot handle arbitrarily long lines. Note also that printing to dmesg in396396- * piece-meal fashion is not possible, each separate call to drm_puts() has a397397- * line-feed automatically added! Therefore, the entire output line must be398398- * constructed in a local buffer first, then printed in one atomic output call.398398+ * The output is split into multiple calls to drm_puts() because some print399399+ * targets, e.g. dmesg, cannot handle arbitrarily long lines. These targets may400400+ * add newlines, as is the case with dmesg: each drm_puts() call creates a401401+ * separate line.399402 *400403 * There is also a scheduler yield call to prevent the 'task has been stuck for401404 * 120s' kernel hang check feature from firing when printing to a slow target402405 * such as dmesg over a serial port.403406 *404404- * TODO: Add compression prior to the ASCII85 encoding to shrink huge buffers down.405405- *406407 * @p: the printer object to output to407408 * @prefix: optional prefix to add to output string409409+ * @suffix: optional suffix to add at the end. 
0 disables it and is410410+ * not added to the output, which is useful when using multiple calls411411+ * to dump data to @p408412 * @blob: the Binary Large OBject to dump out409413 * @offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32)410414 * @size: the size in bytes of the BLOB, must be a multiple of sizeof(u32)411415 */412412-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,416416+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,413417 const void *blob, size_t offset, size_t size)414418{415419 const u32 *blob32 = (const u32 *)blob;416420 char buff[ASCII85_BUFSZ], *line_buff;417421 size_t line_pos = 0;418422419419- /*420420- * Splitting blobs across multiple lines is not compatible with the mesa421421- * debug decoder tool. Note that even dropping the explicit '\n' below422422- * doesn't help because the GuC log is so big some underlying implementation423423- * still splits the lines at 512K characters. So just bail completely for424424- * the moment.425425- */426426- return;427427-428423#define DMESG_MAX_LINE_LEN 800429429-#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */424424+ /* Always leave space for the suffix char and the \0 */425425+#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "<suffix>\0" */430426431427 if (size & 3)432428 drm_printf(p, "Size not word aligned: %zu", size);···450462 line_pos += strlen(line_buff + line_pos);451463452464 if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {453453- line_buff[line_pos++] = '\n';454465 line_buff[line_pos++] = 0;455466456467 drm_puts(p, line_buff);···461474 }462475 }463476464464- if (line_pos) {465465- line_buff[line_pos++] = '\n';466466- line_buff[line_pos++] = 0;477477+ if (suffix)478478+ line_buff[line_pos++] = suffix;467479480480+ if (line_pos) {481481+ line_buff[line_pos++] = 0;468482 drm_puts(p, line_buff);469483 }470484
···
 		info.flags |= I2C_CLIENT_SLAVE;
 	}
 
-	info.flags |= I2C_CLIENT_USER;
-
 	client = i2c_new_client_device(adap, &info);
 	if (IS_ERR(client))
 		return PTR_ERR(client);
 
+	/* Keep track of the added device */
+	mutex_lock(&adap->userspace_clients_lock);
+	list_add_tail(&client->detected, &adap->userspace_clients);
+	mutex_unlock(&adap->userspace_clients_lock);
 	dev_info(dev, "%s: Instantiated device %s at 0x%02hx\n", "new_device",
 		 info.type, info.addr);
 
 	return count;
 }
 static DEVICE_ATTR_WO(new_device);
-
-static int __i2c_find_user_addr(struct device *dev, const void *addrp)
-{
-	struct i2c_client *client = i2c_verify_client(dev);
-	unsigned short addr = *(unsigned short *)addrp;
-
-	return client && client->flags & I2C_CLIENT_USER &&
-	       i2c_encode_flags_to_addr(client) == addr;
-}
 
 /*
  * And of course let the users delete the devices they instantiated, if
···
 				 const char *buf, size_t count)
 {
 	struct i2c_adapter *adap = to_i2c_adapter(dev);
-	struct device *child_dev;
+	struct i2c_client *client, *next;
 	unsigned short addr;
 	char end;
 	int res;
···
 		return -EINVAL;
 	}
 
-	mutex_lock(&core_lock);
 	/* Make sure the device was added through sysfs */
-	child_dev = device_find_child(&adap->dev, &addr, __i2c_find_user_addr);
-	if (child_dev) {
-		i2c_unregister_device(i2c_verify_client(child_dev));
-		put_device(child_dev);
-	} else {
-		dev_err(dev, "Can't find userspace-created device at %#x\n", addr);
-		count = -ENOENT;
-	}
-	mutex_unlock(&core_lock);
+	res = -ENOENT;
+	mutex_lock_nested(&adap->userspace_clients_lock,
+			  i2c_adapter_depth(adap));
+	list_for_each_entry_safe(client, next, &adap->userspace_clients,
+				 detected) {
+		if (i2c_encode_flags_to_addr(client) == addr) {
+			dev_info(dev, "%s: Deleting device %s at 0x%02hx\n",
+				 "delete_device", client->name, client->addr);
 
-	return count;
+			list_del(&client->detected);
+			i2c_unregister_device(client);
+			res = count;
+			break;
+		}
+	}
+	mutex_unlock(&adap->userspace_clients_lock);
+
+	if (res < 0)
+		dev_err(dev, "%s: Can't find device in list\n",
+			"delete_device");
+	return res;
 }
 static DEVICE_ATTR_IGNORE_LOCKDEP(delete_device, S_IWUSR, NULL,
 				  delete_device_store);
···
 	adap->locked_flags = 0;
 	rt_mutex_init(&adap->bus_lock);
 	rt_mutex_init(&adap->mux_lock);
+	mutex_init(&adap->userspace_clients_lock);
+	INIT_LIST_HEAD(&adap->userspace_clients);
 
 	/* Set default timeout to 1 second if not already set */
 	if (adap->timeout == 0)
···
 }
 EXPORT_SYMBOL_GPL(i2c_add_numbered_adapter);
 
+static void i2c_do_del_adapter(struct i2c_driver *driver,
+			       struct i2c_adapter *adapter)
+{
+	struct i2c_client *client, *_n;
+
+	/* Remove the devices we created ourselves as the result of hardware
+	 * probing (using a driver's detect method) */
+	list_for_each_entry_safe(client, _n, &driver->clients, detected) {
+		if (client->adapter == adapter) {
+			dev_dbg(&adapter->dev, "Removing %s at 0x%x\n",
+				client->name, client->addr);
+			list_del(&client->detected);
+			i2c_unregister_device(client);
+		}
+	}
+}
+
 static int __unregister_client(struct device *dev, void *dummy)
 {
 	struct i2c_client *client = i2c_verify_client(dev);
···
 	return 0;
 }
 
+static int __process_removed_adapter(struct device_driver *d, void *data)
+{
+	i2c_do_del_adapter(to_i2c_driver(d), data);
+	return 0;
+}
+
 /**
  * i2c_del_adapter - unregister I2C adapter
  * @adap: the adapter being unregistered
···
 void i2c_del_adapter(struct i2c_adapter *adap)
 {
 	struct i2c_adapter *found;
+	struct i2c_client *client, *next;
 
 	/* First make sure that this adapter was ever added */
 	mutex_lock(&core_lock);
···
 	}
 
 	i2c_acpi_remove_space_handler(adap);
+	/* Tell drivers about this removal */
+	mutex_lock(&core_lock);
+	bus_for_each_drv(&i2c_bus_type, NULL, adap,
+			 __process_removed_adapter);
+	mutex_unlock(&core_lock);
+
+	/* Remove devices instantiated from sysfs */
+	mutex_lock_nested(&adap->userspace_clients_lock,
+			  i2c_adapter_depth(adap));
+	list_for_each_entry_safe(client, next, &adap->userspace_clients,
+				 detected) {
+		dev_dbg(&adap->dev, "Removing %s at 0x%x\n", client->name,
+			client->addr);
+		list_del(&client->detected);
+		i2c_unregister_device(client);
+	}
+	mutex_unlock(&adap->userspace_clients_lock);
 
 	/* Detach any active clients. This can't fail, thus we do not
 	 * check the returned value. This is a two-pass process, because
 	 * we can't remove the dummy devices during the first pass: they
 	 * could have been instantiated by real devices wishing to clean
 	 * them up properly, so we give them a chance to do that first. */
-	mutex_lock(&core_lock);
 	device_for_each_child(&adap->dev, NULL, __unregister_client);
 	device_for_each_child(&adap->dev, NULL, __unregister_dummy);
-	mutex_unlock(&core_lock);
 
 	/* device name is gone after device_unregister */
 	dev_dbg(&adap->dev, "adapter [%s] unregistered\n", adap->name);
···
 	/* add the driver to the list of i2c drivers in the driver core */
 	driver->driver.owner = owner;
 	driver->driver.bus = &i2c_bus_type;
+	INIT_LIST_HEAD(&driver->clients);
 
 	/* When registration returns, the driver core
 	 * will have called probe() for all matching-but-unbound devices.
···
 }
 EXPORT_SYMBOL(i2c_register_driver);
 
-static int __i2c_unregister_detected_client(struct device *dev, void *argp)
+static int __process_removed_driver(struct device *dev, void *data)
 {
-	struct i2c_client *client = i2c_verify_client(dev);
-
-	if (client && client->flags & I2C_CLIENT_AUTO)
-		i2c_unregister_device(client);
-
+	if (dev->type == &i2c_adapter_type)
+		i2c_do_del_adapter(data, to_i2c_adapter(dev));
 	return 0;
 }
···
 */
void i2c_del_driver(struct i2c_driver *driver)
{
-	mutex_lock(&core_lock);
-	/* Satisfy __must_check, function can't fail */
-	if (driver_for_each_device(&driver->driver, NULL, NULL,
-				   __i2c_unregister_detected_client)) {
-	}
-	mutex_unlock(&core_lock);
+	i2c_for_each_dev(driver, __process_removed_driver);
 
 	driver_unregister(&driver->driver);
 	pr_debug("driver [%s] unregistered\n", driver->driver.name);
···
 	/* Finally call the custom detection function */
 	memset(&info, 0, sizeof(struct i2c_board_info));
 	info.addr = addr;
-	info.flags = I2C_CLIENT_AUTO;
 	err = driver->detect(temp_client, &info);
 	if (err) {
 		/* -ENODEV is returned if the detection fails. We catch it
···
 		dev_dbg(&adapter->dev, "Creating %s at 0x%02x\n",
 			info.type, info.addr);
 		client = i2c_new_client_device(adapter, &info);
-		if (IS_ERR(client))
+		if (!IS_ERR(client))
+			list_add_tail(&client->detected, &driver->clients);
+		else
 			dev_err(&adapter->dev, "Failed creating %s at 0x%02x\n",
 				info.type, info.addr);
 	}
drivers/irqchip/Kconfig (+1)
···
 config LAN966X_OIC
 	tristate "Microchip LAN966x OIC Support"
+	depends on MCHP_LAN966X_PCI || COMPILE_TEST
 	select GENERIC_IRQ_CHIP
 	select IRQ_DOMAIN
 	help
drivers/irqchip/irq-apple-aic.c (+2, -1)
···
 			  AIC_FIQ_HWIRQ(AIC_TMR_EL02_VIRT));
 	}
 
-	if (read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & PMCR0_IACT) {
+	if ((read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & (PMCR0_IMODE | PMCR0_IACT)) ==
+	    (FIELD_PREP(PMCR0_IMODE, PMCR0_IMODE_FIQ) | PMCR0_IACT)) {
 		int irq;
 		if (cpumask_test_cpu(smp_processor_id(),
 				     &aic_irqc->fiq_aff[AIC_CPU_PMU_P]->aff))
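The fix above tightens the PMU FIQ check: an active interrupt (PMCR0_IACT) only counts if the interrupt-mode field is also configured for FIQ delivery. A minimal userspace sketch of the same mask-and-compare pattern; the field layout below is invented for illustration and is not the real Apple sysreg encoding:

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define PMCR0_IMODE_MASK  (0x7u << 8)   /* interrupt mode field    */
#define PMCR0_IMODE_FIQ   (0x4u << 8)   /* "deliver as FIQ" value  */
#define PMCR0_IACT        (1u << 11)    /* interrupt active bit    */

/* Returns 1 only when the PMU interrupt is active AND routed as FIQ;
 * masking both fields at once is what the patch above switches to. */
static int pmu_fiq_pending(uint32_t pmcr0)
{
    return (pmcr0 & (PMCR0_IMODE_MASK | PMCR0_IACT)) ==
           (PMCR0_IMODE_FIQ | PMCR0_IACT);
}
```

Checking only the active bit would wrongly claim a pending FIQ when the interrupt is routed elsewhere (e.g. as a regular IRQ), which is exactly the spurious-FIQ case the patch closes.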
drivers/irqchip/irq-mvebu-icu.c (+2, -1)
···
 			       unsigned long *hwirq, unsigned int *type)
 {
 	unsigned int param_count = static_branch_unlikely(&legacy_bindings) ? 3 : 2;
-	struct mvebu_icu_msi_data *msi_data = d->host_data;
+	struct msi_domain_info *info = d->host_data;
+	struct mvebu_icu_msi_data *msi_data = info->chip_data;
 	struct mvebu_icu *icu = msi_data->icu;
 
 	/* Check the count of the parameters in dt */
···
 	aq_ptp_ring_free(self);
 	aq_ptp_free(self);
 
-	if (likely(self->aq_fw_ops->deinit) && link_down) {
+	/* May be invoked during hot unplug. */
+	if (pci_device_is_present(self->pdev) &&
+	    likely(self->aq_fw_ops->deinit) && link_down) {
 		mutex_lock(&self->fwreq_mutex);
 		self->aq_fw_ops->deinit(self->aq_hw);
 		mutex_unlock(&self->fwreq_mutex);
···
 {
 	struct bcmgenet_priv *priv = netdev_priv(dev);
 	struct device *kdev = &priv->pdev->dev;
+	u32 phy_wolopts = 0;
 
-	if (dev->phydev)
+	if (dev->phydev) {
 		phy_ethtool_get_wol(dev->phydev, wol);
+		phy_wolopts = wol->wolopts;
+	}
 
 	/* MAC is not wake-up capable, return what the PHY does */
 	if (!device_can_wakeup(kdev))
···
 	/* Overlay MAC capabilities with that of the PHY queried before */
 	wol->supported |= WAKE_MAGIC | WAKE_MAGICSECURE | WAKE_FILTER;
-	wol->wolopts = priv->wolopts;
-	memset(wol->sopass, 0, sizeof(wol->sopass));
+	wol->wolopts |= priv->wolopts;
 
+	/* Return the PHY configured magic password */
+	if (phy_wolopts & WAKE_MAGICSECURE)
+		return;
+
+	/* Otherwise the MAC one */
+	memset(wol->sopass, 0, sizeof(wol->sopass));
 	if (wol->wolopts & WAKE_MAGICSECURE)
 		memcpy(wol->sopass, priv->sopass, sizeof(priv->sopass));
 }
···
 	/* Try Wake-on-LAN from the PHY first */
 	if (dev->phydev) {
 		ret = phy_ethtool_set_wol(dev->phydev, wol);
-		if (ret != -EOPNOTSUPP)
+		if (ret != -EOPNOTSUPP && wol->wolopts)
 			return ret;
 	}
drivers/net/ethernet/broadcom/tg3.c (+58)
···
 #include <linux/hwmon.h>
 #include <linux/hwmon-sysfs.h>
 #include <linux/crc32poly.h>
+#include <linux/dmi.h>
 
 #include <net/checksum.h>
 #include <net/gso.h>
···
 
 static SIMPLE_DEV_PM_OPS(tg3_pm_ops, tg3_suspend, tg3_resume);
 
+/* Systems where ACPI _PTS (Prepare To Sleep) S5 will result in a fatal
+ * PCIe AER event on the tg3 device if the tg3 device is not, or cannot
+ * be, powered down.
+ */
+static const struct dmi_system_id tg3_restart_aer_quirk_table[] = {
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R440"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R540"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R640"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R650"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R740"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R750"),
+		},
+	},
+	{}
+};
+
 static void tg3_shutdown(struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
···
 
 	if (system_state == SYSTEM_POWER_OFF)
 		tg3_power_down(tp);
+	else if (system_state == SYSTEM_RESTART &&
+		 dmi_first_match(tg3_restart_aer_quirk_table) &&
+		 pdev->current_state != PCI_D3cold &&
+		 pdev->current_state != PCI_UNKNOWN) {
+		/* Disable PCIe AER on the tg3 to avoid a fatal
+		 * error during this system restart.
+		 */
+		pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
+					   PCI_EXP_DEVCTL_CERE |
+					   PCI_EXP_DEVCTL_NFERE |
+					   PCI_EXP_DEVCTL_FERE |
+					   PCI_EXP_DEVCTL_URRE);
+	}
 
 	rtnl_unlock();
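The quirk table above is NULL-terminated and walked by dmi_first_match() until the first entry whose fields all match. A small userspace sketch of the same first-match-in-a-sentinel-terminated-table idea, matching on a single string instead of the kernel's DMI fields; the table contents and names are illustrative, not the kernel API:

```c
#include <string.h>
#include <stddef.h>

struct board_quirk {
    const char *product;   /* NULL product terminates the table */
};

/* Hypothetical table in the style of tg3_restart_aer_quirk_table. */
static const struct board_quirk quirk_table[] = {
    { "PowerEdge R440" },
    { "PowerEdge R740" },
    { NULL },
};

/* Return the first matching entry, or NULL when no quirk applies,
 * mirroring how dmi_first_match() is used as a boolean above. */
static const struct board_quirk *first_match(const char *product)
{
    for (const struct board_quirk *q = quirk_table; q->product; q++)
        if (strcmp(q->product, product) == 0)
            return q;
    return NULL;
}
```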
···
  * @xdp: xdp_buff used as input to the XDP program
  * @xdp_prog: XDP program to run
  * @xdp_ring: ring to be used for XDP_TX action
- * @rx_buf: Rx buffer to store the XDP action
  * @eop_desc: Last descriptor in packet to read metadata from
  *
  * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
  */
-static void
+static u32
 ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 	    struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
-	    struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
+	    union ice_32b_rx_flex_desc *eop_desc)
 {
 	unsigned int ret = ICE_XDP_PASS;
 	u32 act;
···
 		ret = ICE_XDP_CONSUMED;
 	}
 exit:
-	ice_set_rx_bufs_act(xdp, rx_ring, ret);
+	return ret;
 }
 
 /**
···
 		xdp_buff_set_frags_flag(xdp);
 	}
 
-	if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
-		ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED);
+	if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS))
 		return -ENOMEM;
-	}
 
 	__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page,
 				   rx_buf->page_offset, size);
···
 	struct ice_rx_buf *rx_buf;
 
 	rx_buf = &rx_ring->rx_buf[ntc];
-	rx_buf->pgcnt = page_count(rx_buf->page);
 	prefetchw(rx_buf->page);
 
 	if (!size)
···
 	rx_buf->pagecnt_bias--;
 
 	return rx_buf;
+}
+
+/**
+ * ice_get_pgcnts - grab page_count() for gathered fragments
+ * @rx_ring: Rx descriptor ring to store the page counts on
+ *
+ * This function is intended to be called right before running XDP
+ * program so that the page recycling mechanism will be able to take
+ * a correct decision regarding underlying pages; this is done in such
+ * way as XDP program can change the refcount of page
+ */
+static void ice_get_pgcnts(struct ice_rx_ring *rx_ring)
+{
+	u32 nr_frags = rx_ring->nr_frags + 1;
+	u32 idx = rx_ring->first_desc;
+	struct ice_rx_buf *rx_buf;
+	u32 cnt = rx_ring->count;
+
+	for (int i = 0; i < nr_frags; i++) {
+		rx_buf = &rx_ring->rx_buf[idx];
+		rx_buf->pgcnt = page_count(rx_buf->page);
+
+		if (++idx == cnt)
+			idx = 0;
+	}
 }
 
 /**
···
 				       rx_buf->page_offset + headlen, size,
 				       xdp->frame_sz);
 	} else {
-		/* buffer is unused, change the act that should be taken later
-		 * on; data was copied onto skb's linear part so there's no
+		/* buffer is unused, restore biased page count in Rx buffer;
+		 * data was copied onto skb's linear part so there's no
 		 * need for adjusting page offset and we can reuse this buffer
 		 * as-is
 		 */
-		rx_buf->act = ICE_SKB_CONSUMED;
+		rx_buf->pagecnt_bias++;
 	}
 
 	if (unlikely(xdp_buff_has_frags(xdp))) {
···
 }
 
 /**
+ * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all frame frags
+ * @rx_ring: Rx ring with all the auxiliary data
+ * @xdp: XDP buffer carrying linear + frags part
+ * @xdp_xmit: XDP_TX/XDP_REDIRECT verdict storage
+ * @ntc: a current next_to_clean value to be stored at rx_ring
+ * @verdict: return code from XDP program execution
+ *
+ * Walk through gathered fragments and satisfy internal page
+ * recycle mechanism; we take here an action related to verdict
+ * returned by XDP program;
+ */
+static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
+			    u32 *xdp_xmit, u32 ntc, u32 verdict)
+{
+	u32 nr_frags = rx_ring->nr_frags + 1;
+	u32 idx = rx_ring->first_desc;
+	u32 cnt = rx_ring->count;
+	u32 post_xdp_frags = 1;
+	struct ice_rx_buf *buf;
+	int i;
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags;
+
+	for (i = 0; i < post_xdp_frags; i++) {
+		buf = &rx_ring->rx_buf[idx];
+
+		if (verdict & (ICE_XDP_TX | ICE_XDP_REDIR)) {
+			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+			*xdp_xmit |= verdict;
+		} else if (verdict & ICE_XDP_CONSUMED) {
+			buf->pagecnt_bias++;
+		} else if (verdict == ICE_XDP_PASS) {
+			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+		}
+
+		ice_put_rx_buf(rx_ring, buf);
+
+		if (++idx == cnt)
+			idx = 0;
+	}
+	/* handle buffers that represented frags released by XDP prog;
+	 * for these we keep pagecnt_bias as-is; refcount from struct page
+	 * has been decremented within XDP prog and we do not have to increase
+	 * the biased refcnt
+	 */
+	for (; i < nr_frags; i++) {
+		buf = &rx_ring->rx_buf[idx];
+		ice_put_rx_buf(rx_ring, buf);
+		if (++idx == cnt)
+			idx = 0;
+	}
+
+	xdp->data = NULL;
+	rx_ring->first_desc = ntc;
+	rx_ring->nr_frags = 0;
+}
+
+/**
  * ice_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @rx_ring: Rx descriptor ring to transact packets on
  * @budget: Total limit on number of packets to process
···
 	unsigned int total_rx_bytes = 0, total_rx_pkts = 0;
 	unsigned int offset = rx_ring->rx_offset;
 	struct xdp_buff *xdp = &rx_ring->xdp;
-	u32 cached_ntc = rx_ring->first_desc;
 	struct ice_tx_ring *xdp_ring = NULL;
 	struct bpf_prog *xdp_prog = NULL;
 	u32 ntc = rx_ring->next_to_clean;
+	u32 cached_ntu, xdp_verdict;
 	u32 cnt = rx_ring->count;
 	u32 xdp_xmit = 0;
-	u32 cached_ntu;
 	bool failure;
-	u32 first;
 
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	if (xdp_prog) {
···
 			xdp_prepare_buff(xdp, hard_start, offset, size, !!offset);
 			xdp_buff_clear_frags_flag(xdp);
 		} else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) {
+			ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc, ICE_XDP_CONSUMED);
 			break;
 		}
 		if (++ntc == cnt)
···
 		if (ice_is_non_eop(rx_ring, rx_desc))
 			continue;
 
-		ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc);
-		if (rx_buf->act == ICE_XDP_PASS)
+		ice_get_pgcnts(rx_ring);
+		xdp_verdict = ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_desc);
+		if (xdp_verdict == ICE_XDP_PASS)
 			goto construct_skb;
 		total_rx_bytes += xdp_get_buff_len(xdp);
 		total_rx_pkts++;
 
-		xdp->data = NULL;
-		rx_ring->first_desc = ntc;
-		rx_ring->nr_frags = 0;
+		ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
+
 		continue;
construct_skb:
 		if (likely(ice_ring_uses_build_skb(rx_ring)))
···
 		/* exit if we failed to retrieve a buffer */
 		if (!skb) {
 			rx_ring->ring_stats->rx_stats.alloc_page_failed++;
-			rx_buf->act = ICE_XDP_CONSUMED;
-			if (unlikely(xdp_buff_has_frags(xdp)))
-				ice_set_rx_bufs_act(xdp, rx_ring,
-						    ICE_XDP_CONSUMED);
-			xdp->data = NULL;
-			rx_ring->first_desc = ntc;
-			rx_ring->nr_frags = 0;
-			break;
+			xdp_verdict = ICE_XDP_CONSUMED;
 		}
-		xdp->data = NULL;
-		rx_ring->first_desc = ntc;
-		rx_ring->nr_frags = 0;
+		ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
+
+		if (!skb)
+			break;
 
 		stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
 		if (unlikely(ice_test_staterr(rx_desc->wb.status_error0,
···
 		total_rx_pkts++;
 	}
 
-	first = rx_ring->first_desc;
-	while (cached_ntc != first) {
-		struct ice_rx_buf *buf = &rx_ring->rx_buf[cached_ntc];
-
-		if (buf->act & (ICE_XDP_TX | ICE_XDP_REDIR)) {
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-			xdp_xmit |= buf->act;
-		} else if (buf->act & ICE_XDP_CONSUMED) {
-			buf->pagecnt_bias++;
-		} else if (buf->act == ICE_XDP_PASS) {
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-		}
-
-		ice_put_rx_buf(rx_ring, buf);
-		if (++cached_ntc >= cnt)
-			cached_ntc = 0;
-	}
 	rx_ring->next_to_clean = ntc;
 	/* return up to cleaned_count buffers to hardware */
 	failure = ice_alloc_rx_bufs(rx_ring, ICE_RX_DESC_UNUSED(rx_ring));
drivers/net/ethernet/intel/ice/ice_txrx.h (-1)
···
 	struct page *page;
 	unsigned int page_offset;
 	unsigned int pgcnt;
-	unsigned int act;
 	unsigned int pagecnt_bias;
 };
drivers/net/ethernet/intel/ice/ice_txrx_lib.h (-43)
···
 #include "ice.h"
 
 /**
- * ice_set_rx_bufs_act - propagate Rx buffer action to frags
- * @xdp: XDP buffer representing frame (linear and frags part)
- * @rx_ring: Rx ring struct
- * act: action to store onto Rx buffers related to XDP buffer parts
- *
- * Set action that should be taken before putting Rx buffer from first frag
- * to the last.
- */
-static inline void
-ice_set_rx_bufs_act(struct xdp_buff *xdp, const struct ice_rx_ring *rx_ring,
-		    const unsigned int act)
-{
-	u32 sinfo_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
-	u32 nr_frags = rx_ring->nr_frags + 1;
-	u32 idx = rx_ring->first_desc;
-	u32 cnt = rx_ring->count;
-	struct ice_rx_buf *buf;
-
-	for (int i = 0; i < nr_frags; i++) {
-		buf = &rx_ring->rx_buf[idx];
-		buf->act = act;
-
-		if (++idx == cnt)
-			idx = 0;
-	}
-
-	/* adjust pagecnt_bias on frags freed by XDP prog */
-	if (sinfo_frags < rx_ring->nr_frags && act == ICE_XDP_CONSUMED) {
-		u32 delta = rx_ring->nr_frags - sinfo_frags;
-
-		while (delta) {
-			if (idx == 0)
-				idx = cnt - 1;
-			else
-				idx--;
-			buf = &rx_ring->rx_buf[idx];
-			buf->pagecnt_bias--;
-			delta--;
-		}
-	}
-}
-
-/**
  * ice_test_staterr - tests bits in Rx descriptor status and error fields
  * @status_err_n: Rx descriptor status_error0 or status_error1 bits
  * @stat_err_bits: value to mask
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c (+17, -18)
···
 	u32 chan = 0;
 	u8 qmode = 0;
 
+	if (rxfifosz == 0)
+		rxfifosz = priv->dma_cap.rx_fifo_size;
+	if (txfifosz == 0)
+		txfifosz = priv->dma_cap.tx_fifo_size;
+
 	/* Split up the shared Tx/Rx FIFO memory on DW QoS Eth and DW XGMAC */
 	if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
 		rxfifosz /= rx_channels_count;
···
 	u32 tx_channels_count = priv->plat->tx_queues_to_use;
 	int rxfifosz = priv->plat->rx_fifo_size;
 	int txfifosz = priv->plat->tx_fifo_size;
+
+	if (rxfifosz == 0)
+		rxfifosz = priv->dma_cap.rx_fifo_size;
+	if (txfifosz == 0)
+		txfifosz = priv->dma_cap.tx_fifo_size;
 
 	/* Adjust for real per queue fifo size */
 	rxfifosz /= rx_channels_count;
···
 	const int mtu = new_mtu;
 	int ret;
 
+	if (txfifosz == 0)
+		txfifosz = priv->dma_cap.tx_fifo_size;
+
 	txfifosz /= priv->plat->tx_queues_to_use;
 
 	if (stmmac_xdp_is_enabled(priv) && new_mtu > ETH_DATA_LEN) {
···
 		priv->plat->tx_queues_to_use = priv->dma_cap.number_tx_queues;
 	}
 
-	if (!priv->plat->rx_fifo_size) {
-		if (priv->dma_cap.rx_fifo_size) {
-			priv->plat->rx_fifo_size = priv->dma_cap.rx_fifo_size;
-		} else {
-			dev_err(priv->device, "Can't specify Rx FIFO size\n");
-			return -ENODEV;
-		}
-	} else if (priv->dma_cap.rx_fifo_size &&
-		   priv->plat->rx_fifo_size > priv->dma_cap.rx_fifo_size) {
+	if (priv->dma_cap.rx_fifo_size &&
+	    priv->plat->rx_fifo_size > priv->dma_cap.rx_fifo_size) {
 		dev_warn(priv->device,
 			 "Rx FIFO size (%u) exceeds dma capability\n",
 			 priv->plat->rx_fifo_size);
 		priv->plat->rx_fifo_size = priv->dma_cap.rx_fifo_size;
 	}
-	if (!priv->plat->tx_fifo_size) {
-		if (priv->dma_cap.tx_fifo_size) {
-			priv->plat->tx_fifo_size = priv->dma_cap.tx_fifo_size;
-		} else {
-			dev_err(priv->device, "Can't specify Tx FIFO size\n");
-			return -ENODEV;
-		}
-	} else if (priv->dma_cap.tx_fifo_size &&
-		   priv->plat->tx_fifo_size > priv->dma_cap.tx_fifo_size) {
+	if (priv->dma_cap.tx_fifo_size &&
+	    priv->plat->tx_fifo_size > priv->dma_cap.tx_fifo_size) {
 		dev_warn(priv->device,
 			 "Tx FIFO size (%u) exceeds dma capability\n",
 			 priv->plat->tx_fifo_size);
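The stmmac change above moves the "platform value, else DMA-capability value" fallback to the point of use and keeps only the clamp at probe time. A tiny sketch of the resulting effective-size logic, with illustrative names (not the driver's API):

```c
#include <stdint.h>

/* Effective FIFO size: prefer the platform-provided value, fall back
 * to the size probed from the DMA capability register when the
 * platform leaves it at 0, and clamp to what the hardware reports
 * (when it reports anything) - mirroring the change above. */
static uint32_t effective_fifo_size(uint32_t plat_size, uint32_t dma_cap_size)
{
    uint32_t size = plat_size ? plat_size : dma_cap_size;

    if (dma_cap_size && size > dma_cap_size)
        size = dma_cap_size;
    return size;
}
```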
···
 	if (likely(cpu < tq_number))
 		tq = &adapter->tx_queue[cpu];
 	else
-		tq = &adapter->tx_queue[reciprocal_scale(cpu, tq_number)];
+		tq = &adapter->tx_queue[cpu % tq_number];
 
 	return tq;
 }
···
 	u32 buf_size;
 	u32 dw2;
 
+	spin_lock_irq(&tq->tx_lock);
 	dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
 	dw2 |= xdpf->len;
 	ctx.sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill;
···
 
 	if (vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) == 0) {
 		tq->stats.tx_ring_full++;
+		spin_unlock_irq(&tq->tx_lock);
 		return -ENOSPC;
 	}
 
···
 		tbi->dma_addr = dma_map_single(&adapter->pdev->dev,
 					       xdpf->data, buf_size,
 					       DMA_TO_DEVICE);
-		if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr))
+		if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr)) {
+			spin_unlock_irq(&tq->tx_lock);
 			return -EFAULT;
+		}
 		tbi->map_type |= VMXNET3_MAP_SINGLE;
 	} else { /* XDP buffer from page pool */
 		page = virt_to_page(xdpf->data);
···
 	dma_wmb();
 	gdesc->dword[2] = cpu_to_le32(le32_to_cpu(gdesc->dword[2]) ^
 				      VMXNET3_TXD_GEN);
+	spin_unlock_irq(&tq->tx_lock);
 
 	/* No need to handle the case when tx_num_deferred doesn't reach
 	 * threshold. Backend driver at hypervisor side will poll and reset
···
 {
 	struct vmxnet3_adapter *adapter = netdev_priv(dev);
 	struct vmxnet3_tx_queue *tq;
+	struct netdev_queue *nq;
 	int i;
 
 	if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
···
 	if (tq->stopped)
 		return -ENETDOWN;
 
+	nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
+
+	__netif_tx_lock(nq, smp_processor_id());
 	for (i = 0; i < n; i++) {
 		if (vmxnet3_xdp_xmit_frame(adapter, frames[i], tq, true)) {
 			tq->stats.xdp_xmit_err++;
···
 		}
 	}
 	tq->stats.xdp_xmit += i;
+	__netif_tx_unlock(nq);
 
 	return i;
 }
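The queue-selection change above swaps reciprocal_scale() for a plain modulo, so any CPU index above the queue count still maps deterministically into range while CPUs below it keep their identity mapping. A minimal sketch of the resulting mapping (illustrative function name, not the driver's):

```c
#include <stdint.h>

/* Pick a Tx queue index for a CPU: identity while cpu < nqueues,
 * plain modulo wrap-around otherwise, as in the vmxnet3 change. */
static uint32_t pick_tx_queue(uint32_t cpu, uint32_t nqueues)
{
    return cpu < nqueues ? cpu : cpu % nqueues;
}
```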
drivers/nvme/host/core.c (+7, -1)
···
 
 	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
 			&result);
-	if (status < 0)
+
+	/*
+	 * It's either a kernel error or the host observed a connection
+	 * lost. In either case it's not possible communicate with the
+	 * controller and thus enter the error code path.
+	 */
+	if (status < 0 || status == NVME_SC_HOST_PATH_ERROR)
 		return status;
 
 	/*
drivers/nvme/host/fc.c (+25, -10)
···
 static void
 nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 {
+	enum nvme_ctrl_state state;
+	unsigned long flags;
+
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: controller connectivity lost. Awaiting "
 		"Reconnect", ctrl->cnum);
 
-	switch (nvme_ctrl_state(&ctrl->ctrl)) {
+	spin_lock_irqsave(&ctrl->lock, flags);
+	set_bit(ASSOC_FAILED, &ctrl->flags);
+	state = nvme_ctrl_state(&ctrl->ctrl);
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	switch (state) {
 	case NVME_CTRL_NEW:
 	case NVME_CTRL_LIVE:
 		/*
···
 	nvme_fc_complete_rq(rq);
 
check_error:
-	if (terminate_assoc && ctrl->ctrl.state != NVME_CTRL_RESETTING)
+	if (terminate_assoc &&
+	    nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
 		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
 }
 
···
 static void
 nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 {
+	enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
+
 	/*
 	 * if an error (io timeout, etc) while (re)connecting, the remote
 	 * port requested terminating of the association (disconnect_ls)
···
 	 * the controller. Abort any ios on the association and let the
 	 * create_association error path resolve things.
 	 */
-	if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
+	if (state == NVME_CTRL_CONNECTING) {
 		__nvme_fc_abort_outstanding_ios(ctrl, true);
-		set_bit(ASSOC_FAILED, &ctrl->flags);
 		dev_warn(ctrl->ctrl.device,
 			"NVME-FC{%d}: transport error during (re)connect\n",
 			ctrl->cnum);
···
 	}
 
 	/* Otherwise, only proceed if in LIVE state - e.g. on first error */
-	if (ctrl->ctrl.state != NVME_CTRL_LIVE)
+	if (state != NVME_CTRL_LIVE)
 		return;
 
 	dev_warn(ctrl->ctrl.device,
···
 		else
 			ret = nvme_fc_recreate_io_queues(ctrl);
 	}
-	if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
-		ret = -EIO;
 	if (ret)
 		goto out_term_aen_ops;
 
-	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	spin_lock_irqsave(&ctrl->lock, flags);
+	if (!test_bit(ASSOC_FAILED, &ctrl->flags))
+		changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	else
+		ret = -EIO;
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	if (ret)
+		goto out_term_aen_ops;
 
 	ctrl->ctrl.nr_reconnects = 0;
 
···
 	list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
 	spin_unlock_irqrestore(&rport->lock, flags);
 
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
-	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		dev_err(ctrl->ctrl.device,
 			"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
 		goto fail_ctrl;
drivers/nvme/host/pci.c (+3, -9)
···
 	return 0;
 
out_free_bufs:
-	while (--i >= 0) {
-		size_t size = le32_to_cpu(descs[i].size) * NVME_CTRL_PAGE_SIZE;
-
-		dma_free_attrs(dev->dev, size, bufs[i],
-			       le64_to_cpu(descs[i].addr),
-			       DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
-	}
-
 	kfree(bufs);
out_free_descs:
 	dma_free_coherent(dev->dev, descs_size, descs, descs_dma);
···
 		 * because of high power consumption (> 2 Watt) in s2idle
 		 * sleep. Only some boards with Intel CPU are affected.
 		 */
-		if (dmi_match(DMI_BOARD_NAME, "GMxPXxx") ||
+		if (dmi_match(DMI_BOARD_NAME, "DN50Z-140HC-YD") ||
+		    dmi_match(DMI_BOARD_NAME, "GMxPXxx") ||
+		    dmi_match(DMI_BOARD_NAME, "GXxMRXx") ||
 		    dmi_match(DMI_BOARD_NAME, "PH4PG31") ||
 		    dmi_match(DMI_BOARD_NAME, "PH4PRX1_PH6PRX1") ||
 		    dmi_match(DMI_BOARD_NAME, "PH6PG01_PH6PG71"))
···
 	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
 	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
 
-	if (parent->state_saved)
-		return;
-
 	/*
 	 * Save parent's L1 substate configuration so we have it for
 	 * pci_restore_aspm_l1ss_state(pdev) to restore.
···
  * IFS Image
  * ---------
  *
- * Intel provides a firmware file containing the scan tests via
- * github [#f1]_. Similar to microcode there is a separate file for each
+ * Intel provides firmware files containing the scan tests via the webpage [#f1]_.
+ * Look under "In-Field Scan Test Images Download" section towards the
+ * end of the page. Similar to microcode, there are separate files for each
  * family-model-stepping. IFS Images are not applicable for some test types.
  * Wherever applicable the sysfs directory would provide a "current_batch" file
  * (see below) for loading the image.
  *
+ * .. [#f1] https://intel.com/InFieldScan
  *
  * IFS Image Loading
  * -----------------
···
  *
  * 2) Hardware allows for some number of cores to be tested in parallel.
  * The driver does not make use of this, it only tests one core at a time.
- *
- * .. [#f1] https://github.com/intel/TBD
- *
  *
  * Structural Based Functional Test at Field (SBAF):
  * -------------------------------------------------
···
 		case 0x1a: /* start stop unit in progress */
 		case 0x1b: /* sanitize in progress */
 		case 0x1d: /* configuration in progress */
-		case 0x24: /* depopulation in progress */
-		case 0x25: /* depopulation restore in progress */
 			action = ACTION_DELAYED_RETRY;
 			break;
 		case 0x0a: /* ALUA state transition */
 			action = ACTION_DELAYED_REPREP;
 			break;
+		/*
+		 * Depopulation might take many hours,
+		 * thus it is not worthwhile to retry.
+		 */
+		case 0x24: /* depopulation in progress */
+		case 0x25: /* depopulation restore in progress */
+			fallthrough;
 		default:
 			action = ACTION_FAIL;
 			break;
drivers/scsi/scsi_lib_test.c (+7)
···
 	};
 	int i;
 
+	/* Success */
+	sc.result = 0;
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, NULL));
+	/* Command failed but caller did not pass in a failures array */
+	scsi_build_sense(&sc, 0, ILLEGAL_REQUEST, 0x91, 0x36);
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, NULL));
 	/* Match end of array */
 	scsi_build_sense(&sc, 0, ILLEGAL_REQUEST, 0x91, 0x36);
 	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
drivers/scsi/scsi_scan.c (+1, -1)
···
 	}
 	ret = sbitmap_init_node(&sdev->budget_map,
 				scsi_device_max_queue_depth(sdev),
-				new_shift, GFP_KERNEL,
+				new_shift, GFP_NOIO,
 				sdev->request_queue->node, false, true);
 	if (!ret)
 		sbitmap_resize(&sdev->budget_map, depth);
···
 			rcu_read_unlock();
 			mutex_lock(&bc->table.mutex);
 			mutex_unlock(&bc->table.mutex);
-			rcu_read_lock();
 			continue;
 		}
 		for (i = 0; i < tbl->size; i++)
fs/bcachefs/buckets_waiting_for_journal.c (+5, -7)
···
 	memset(t->d, 0, sizeof(t->d[0]) << t->bits);
 }
 
-bool bch2_bucket_needs_journal_commit(struct buckets_waiting_for_journal *b,
-				      u64 flushed_seq,
-				      unsigned dev, u64 bucket)
+u64 bch2_bucket_journal_seq_ready(struct buckets_waiting_for_journal *b,
+				  unsigned dev, u64 bucket)
 {
 	struct buckets_waiting_for_journal_table *t;
 	u64 dev_bucket = (u64) dev << 56 | bucket;
-	bool ret = false;
-	unsigned i;
+	u64 ret = 0;
 
 	mutex_lock(&b->lock);
 	t = b->t;
 
-	for (i = 0; i < ARRAY_SIZE(t->hash_seeds); i++) {
+	for (unsigned i = 0; i < ARRAY_SIZE(t->hash_seeds); i++) {
 		struct bucket_hashed *h = bucket_hash(t, i, dev_bucket);
 
 		if (h->dev_bucket == dev_bucket) {
-			ret = h->journal_seq > flushed_seq;
+			ret = h->journal_seq;
 			break;
 		}
 	}
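The API change above returns the bucket's recorded journal sequence number (0 when the bucket has no entry) instead of a precomputed boolean, so the caller can both test readiness and know which sequence to flush. A sketch of the caller-side check that preserves the old `seq > flushed_seq` semantics; the function name is illustrative:

```c
#include <stdint.h>

/* Old-style boolean check rebuilt on top of the new "return the seq"
 * API: a bucket still needs a journal commit when its recorded seq is
 * newer than the last flushed seq. A returned 0 means "no entry" and
 * is never newer than any flushed seq. */
static int bucket_needs_journal_commit(uint64_t bucket_seq,
                                       uint64_t flushed_seq)
{
    return bucket_seq > flushed_seq;
}
```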
···12291229 */12301230 if (WARN_ON_ONCE(len >= ordered->num_bytes))12311231 return ERR_PTR(-EINVAL);12321232+ /*12331233+ * If our ordered extent had an error there's no point in continuing.12341234+ * The error may have come from a transaction abort done either by this12351235+ * task or some other concurrent task, and the transaction abort path12361236+ * iterates over all existing ordered extents and sets the flag12371237+ * BTRFS_ORDERED_IOERR on them.12381238+ */12391239+ if (unlikely(flags & (1U << BTRFS_ORDERED_IOERR))) {12401240+ const int fs_error = BTRFS_FS_ERROR(fs_info);12411241+12421242+ return fs_error ? ERR_PTR(fs_error) : ERR_PTR(-EIO);12431243+ }12321244 /* We cannot split partially completed ordered extents. */12331245 if (ordered->bytes_left) {12341246 ASSERT(!(flags & ~BTRFS_ORDERED_TYPE_FLAGS));
+5-6
fs/btrfs/qgroup.c
···18801880 * Commit current transaction to make sure all the rfer/excl numbers18811881 * get updated.18821882 */18831883- trans = btrfs_start_transaction(fs_info->quota_root, 0);18841884- if (IS_ERR(trans))18851885- return PTR_ERR(trans);18861886-18871887- ret = btrfs_commit_transaction(trans);18831883+ ret = btrfs_commit_current_transaction(fs_info->quota_root);18881884 if (ret < 0)18891885 return ret;18901886···18931897 /*18941898 * It's squota and the subvolume still has numbers needed for future18951899 * accounting, in this case we can not delete it. Just skip it.19001900+ *19011901+ * Or the qgroup is already removed by a qgroup rescan. For both cases we're19021902+ * safe to ignore them.18961903 */18971897- if (ret == -EBUSY)19041904+ if (ret == -EBUSY || ret == -ENOENT)18981905 ret = 0;18991906 return ret;19001907}
+3-1
fs/btrfs/transaction.c
···274274 cur_trans = fs_info->running_transaction;275275 if (cur_trans) {276276 if (TRANS_ABORTED(cur_trans)) {277277+ const int abort_error = cur_trans->aborted;278278+277279 spin_unlock(&fs_info->trans_lock);278278- return cur_trans->aborted;280280+ return abort_error;279281 }280282 if (btrfs_blocked_trans_types[cur_trans->state] & type) {281283 spin_unlock(&fs_info->trans_lock);
+3-3
fs/dcache.c
···17001700 smp_store_release(&dentry->d_name.name, dname); /* ^^^ */1701170117021702 dentry->d_flags = 0;17031703- lockref_init(&dentry->d_lockref, 1);17031703+ lockref_init(&dentry->d_lockref);17041704 seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);17051705 dentry->d_inode = NULL;17061706 dentry->d_parent = dentry;···29662966 goto out_err;29672967 m2 = &alias->d_parent->d_inode->i_rwsem;29682968out_unalias:29692969- if (alias->d_op->d_unalias_trylock &&29692969+ if (alias->d_op && alias->d_op->d_unalias_trylock &&29702970 !alias->d_op->d_unalias_trylock(alias))29712971 goto out_err;29722972 __d_move(alias, dentry, false);29732973- if (alias->d_op->d_unalias_unlock)29732973+ if (alias->d_op && alias->d_op->d_unalias_unlock)29742974 alias->d_op->d_unalias_unlock(alias);29752975 ret = 0;29762976out_err:
+1-1
fs/erofs/zdata.c
···726726 if (IS_ERR(pcl))727727 return PTR_ERR(pcl);728728729729- lockref_init(&pcl->lockref, 1); /* one ref for this request */729729+ lockref_init(&pcl->lockref); /* one ref for this request */730730 pcl->algorithmformat = map->m_algorithmformat;731731 pcl->length = 0;732732 pcl->partial = true;
+16
fs/file_table.c
···194194 * refcount bumps we should reinitialize the reused file first.195195 */196196 file_ref_init(&f->f_ref, 1);197197+ /*198198+ * Disable permission and pre-content events for all files by default.199199+ * They may be enabled later by file_set_fsnotify_mode_from_watchers().200200+ */201201+ file_set_fsnotify_mode(f, FMODE_NONOTIFY_PERM);197202 return 0;198203}199204···380375 if (IS_ERR(file)) {381376 ihold(inode);382377 path_put(&path);378378+ return file;383379 }380380+ /*381381+ * Disable all fsnotify events for pseudo files by default.382382+ * They may be enabled by caller with file_set_fsnotify_mode().383383+ */384384+ file_set_fsnotify_mode(file, FMODE_NONOTIFY);384385 return file;385386}386387EXPORT_SYMBOL(alloc_file_pseudo);···411400 return file;412401 }413402 file_init_path(file, &path, fops);403403+ /*404404+ * Disable all fsnotify events for pseudo files by default.405405+ * They may be enabled by caller with file_set_fsnotify_mode().406406+ */407407+ file_set_fsnotify_mode(file, FMODE_NONOTIFY);414408 return file;415409}416410EXPORT_SYMBOL_GPL(alloc_file_pseudo_noaccount);
···50875087{50885088 struct vfsmount *mnt = s->mnt;50895089 struct super_block *sb = mnt->mnt_sb;50905090+ size_t start = seq->count;50905091 int err;5091509250935093+ err = security_sb_show_options(seq, sb);50945094+ if (err)50955095+ return err;50965096+50925097 if (sb->s_op->show_options) {50935093- size_t start = seq->count;50945094-50955095- err = security_sb_show_options(seq, sb);50965096- if (err)50975097- return err;50985098-50995098 err = sb->s_op->show_options(seq, mnt->mnt_root);51005099 if (err)51015100 return err;51025102-51035103- if (unlikely(seq_has_overflowed(seq)))51045104- return -EAGAIN;51055105-51065106- if (seq->count == start)51075107- return 0;51085108-51095109- /* skip leading comma */51105110- memmove(seq->buf + start, seq->buf + start + 1,51115111- seq->count - start - 1);51125112- seq->count--;51135101 }51025102+51035103+ if (unlikely(seq_has_overflowed(seq)))51045104+ return -EAGAIN;51055105+51065106+ if (seq->count == start)51075107+ return 0;51085108+51095109+ /* skip leading comma */51105110+ memmove(seq->buf + start, seq->buf + start + 1,51115111+ seq->count - start - 1);51125112+ seq->count--;5114511351155114 return 0;51165115}···51905191 size_t kbufsize;51915192 struct seq_file *seq = &s->seq;51925193 struct statmount *sm = &s->sm;51935193- u32 start = seq->count;51945194+ u32 start, *offp;51955195+51965196+ /* Reserve an empty string at the beginning for any unset offsets */51975197+ if (!seq->count)51985198+ seq_putc(seq, 0);51995199+52005200+ start = seq->count;5194520151955202 switch (flag) {51965203 case STATMOUNT_FS_TYPE:51975197- sm->fs_type = start;52045204+ offp = &sm->fs_type;51985205 ret = statmount_fs_type(s, seq);51995206 break;52005207 case STATMOUNT_MNT_ROOT:52015201- sm->mnt_root = start;52085208+ offp = &sm->mnt_root;52025209 ret = statmount_mnt_root(s, seq);52035210 break;52045211 case STATMOUNT_MNT_POINT:52055205- sm->mnt_point = start;52125212+ offp = &sm->mnt_point;52065213 ret = statmount_mnt_point(s, 
seq);52075214 break;52085215 case STATMOUNT_MNT_OPTS:52095209- sm->mnt_opts = start;52165216+ offp = &sm->mnt_opts;52105217 ret = statmount_mnt_opts(s, seq);52115218 break;52125219 case STATMOUNT_OPT_ARRAY:52135213- sm->opt_array = start;52205220+ offp = &sm->opt_array;52145221 ret = statmount_opt_array(s, seq);52155222 break;52165223 case STATMOUNT_OPT_SEC_ARRAY:52175217- sm->opt_sec_array = start;52245224+ offp = &sm->opt_sec_array;52185225 ret = statmount_opt_sec_array(s, seq);52195226 break;52205227 case STATMOUNT_FS_SUBTYPE:52215221- sm->fs_subtype = start;52285228+ offp = &sm->fs_subtype;52225229 statmount_fs_subtype(s, seq);52235230 break;52245231 case STATMOUNT_SB_SOURCE:52255225- sm->sb_source = start;52325232+ offp = &sm->sb_source;52265233 ret = statmount_sb_source(s, seq);52275234 break;52285235 default:···5256525152575252 seq->buf[seq->count++] = '\0';52585253 sm->mask |= flag;52545254+ *offp = start;52595255 return 0;52605256}52615257
+12-6
fs/notify/fsnotify.c
···648648 * Later, fsnotify permission hooks do not check if there are permission event649649 * watches, but that there were permission event watches at open time.650650 */651651-void file_set_fsnotify_mode(struct file *file)651651+void file_set_fsnotify_mode_from_watchers(struct file *file)652652{653653 struct dentry *dentry = file->f_path.dentry, *parent;654654 struct super_block *sb = dentry->d_sb;···665665 */666666 if (likely(!fsnotify_sb_has_priority_watchers(sb,667667 FSNOTIFY_PRIO_CONTENT))) {668668- file->f_mode |= FMODE_NONOTIFY_PERM;668668+ file_set_fsnotify_mode(file, FMODE_NONOTIFY_PERM);669669 return;670670 }671671···676676 if ((!d_is_dir(dentry) && !d_is_reg(dentry)) ||677677 likely(!fsnotify_sb_has_priority_watchers(sb,678678 FSNOTIFY_PRIO_PRE_CONTENT))) {679679- file->f_mode |= FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;679679+ file_set_fsnotify_mode(file, FMODE_NONOTIFY | FMODE_NONOTIFY_PERM);680680 return;681681 }682682···686686 */687687 mnt_mask = READ_ONCE(real_mount(file->f_path.mnt)->mnt_fsnotify_mask);688688 if (unlikely(fsnotify_object_watched(d_inode(dentry), mnt_mask,689689- FSNOTIFY_PRE_CONTENT_EVENTS)))689689+ FSNOTIFY_PRE_CONTENT_EVENTS))) {690690+ /* Enable pre-content events */691691+ file_set_fsnotify_mode(file, 0);690692 return;693693+ }691694692695 /* Is parent watching for pre-content events on this file? */693696 if (dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) {694697 parent = dget_parent(dentry);695698 p_mask = fsnotify_inode_watches_children(d_inode(parent));696699 dput(parent);697697- if (p_mask & FSNOTIFY_PRE_CONTENT_EVENTS)700700+ if (p_mask & FSNOTIFY_PRE_CONTENT_EVENTS) {701701+ /* Enable pre-content events */702702+ file_set_fsnotify_mode(file, 0);698703 return;704704+ }699705 }700706 /* Nobody watching for pre-content events from this file */701701- file->f_mode |= FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;707707+ file_set_fsnotify_mode(file, FMODE_NONOTIFY | FMODE_NONOTIFY_PERM);702708}703709#endif704710
+6-5
fs/open.c
···905905 f->f_sb_err = file_sample_sb_err(f);906906907907 if (unlikely(f->f_flags & O_PATH)) {908908- f->f_mode = FMODE_PATH | FMODE_OPENED | FMODE_NONOTIFY;908908+ f->f_mode = FMODE_PATH | FMODE_OPENED;909909+ file_set_fsnotify_mode(f, FMODE_NONOTIFY);909910 f->f_op = &empty_fops;910911 return 0;911912 }···936935937936 /*938937 * Set FMODE_NONOTIFY_* bits according to existing permission watches.939939- * If FMODE_NONOTIFY was already set for an fanotify fd, this doesn't940940- * change anything.938938+ * If FMODE_NONOTIFY mode was already set for an fanotify fd or for a939939+ * pseudo file, this call will not change the mode.941940 */942942- file_set_fsnotify_mode(f);941941+ file_set_fsnotify_mode_from_watchers(f);943942 error = fsnotify_open_perm(f);944943 if (error)945944 goto cleanup_all;···11231122 if (!IS_ERR(f)) {11241123 int error;1125112411261126- f->f_mode |= FMODE_NONOTIFY;11251125+ file_set_fsnotify_mode(f, FMODE_NONOTIFY);11271126 error = vfs_open(path, f);11281127 if (error) {11291128 fput(f);
+11-1
fs/pidfs.c
···287287 switch (cmd) {288288 case FS_IOC_GETVERSION:289289 case PIDFD_GET_CGROUP_NAMESPACE:290290- case PIDFD_GET_INFO:291290 case PIDFD_GET_IPC_NAMESPACE:292291 case PIDFD_GET_MNT_NAMESPACE:293292 case PIDFD_GET_NET_NAMESPACE:···297298 case PIDFD_GET_USER_NAMESPACE:298299 case PIDFD_GET_PID_NAMESPACE:299300 return true;301301+ }302302+303303+ /* Extensible ioctls require some more careful checks. */304304+ switch (_IOC_NR(cmd)) {305305+ case _IOC_NR(PIDFD_GET_INFO):306306+ /*307307+ * Try to prevent performing a pidfd ioctl when someone308308+ * erronously mistook the file descriptor for a pidfd.309309+ * This is not perfect but will catch most cases.310310+ */311311+ return (_IOC_TYPE(cmd) == _IOC_TYPE(PIDFD_GET_INFO));300312 }301313302314 return false;
···35633563 int error;3564356435653565 /*35663566- * If there are already extents in the file, try an exact EOF block35673567- * allocation to extend the file as a contiguous extent. If that fails,35683568- * or it's the first allocation in a file, just try for a stripe aligned35693569- * allocation.35663566+ * If there are already extents in the file, and xfs_bmap_adjacent() has35673567+ * given a better blkno, try an exact EOF block allocation to extend the35683568+ * file as a contiguous extent. If that fails, or it's the first35693569+ * allocation in a file, just try for a stripe aligned allocation.35703570 */35713571- if (ap->offset) {35713571+ if (ap->eof) {35723572 xfs_extlen_t nextminlen = 0;3573357335743574 /*···37363736 int error;3737373737383738 ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);37393739- xfs_bmap_adjacent(ap);37393739+ if (!xfs_bmap_adjacent(ap))37403740+ ap->eof = false;3740374137413742 /*37423743 * Search for an allocation group with a single extent large enough for
+17-19
fs/xfs/xfs_buf.c
···4141 *4242 * xfs_buf_rele:4343 * b_lock4444- * pag_buf_lock4545- * lru_lock4444+ * lru_lock4645 *4746 * xfs_buftarg_drain_rele4847 * lru_lock···219220 */220221 flags &= ~(XBF_UNMAPPED | XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD);221222222222- spin_lock_init(&bp->b_lock);223223+ /*224224+ * A new buffer is held and locked by the owner. This ensures that the225225+ * buffer is owned by the caller and racing RCU lookups right after226226+ * inserting into the hash table are safe (and will have to wait for227227+ * the unlock to do anything non-trivial).228228+ */223229 bp->b_hold = 1;230230+ sema_init(&bp->b_sema, 0); /* held, no waiters */231231+232232+ spin_lock_init(&bp->b_lock);224233 atomic_set(&bp->b_lru_ref, 1);225234 init_completion(&bp->b_iowait);226235 INIT_LIST_HEAD(&bp->b_lru);227236 INIT_LIST_HEAD(&bp->b_list);228237 INIT_LIST_HEAD(&bp->b_li_list);229229- sema_init(&bp->b_sema, 0); /* held, no waiters */230238 bp->b_target = target;231239 bp->b_mount = target->bt_mount;232240 bp->b_flags = flags;233241234234- /*235235- * Set length and io_length to the same value initially.236236- * I/O routines should use io_length, which will be the same in237237- * most cases but may be reset (e.g. XFS recovery).238238- */239242 error = xfs_buf_get_maps(bp, nmaps);240243 if (error) {241244 kmem_cache_free(xfs_buf_cache, bp);···503502xfs_buf_cache_init(504503 struct xfs_buf_cache *bch)505504{506506- spin_lock_init(&bch->bc_lock);507505 return rhashtable_init(&bch->bc_hash, &xfs_buf_hash_params);508506}509507···652652 if (error)653653 goto out_free_buf;654654655655- spin_lock(&bch->bc_lock);655655+ /* The new buffer keeps the perag reference until it is freed. 
*/656656+ new_bp->b_pag = pag;657657+658658+ rcu_read_lock();656659 bp = rhashtable_lookup_get_insert_fast(&bch->bc_hash,657660 &new_bp->b_rhash_head, xfs_buf_hash_params);658661 if (IS_ERR(bp)) {662662+ rcu_read_unlock();659663 error = PTR_ERR(bp);660660- spin_unlock(&bch->bc_lock);661664 goto out_free_buf;662665 }663666 if (bp && xfs_buf_try_hold(bp)) {664667 /* found an existing buffer */665665- spin_unlock(&bch->bc_lock);668668+ rcu_read_unlock();666669 error = xfs_buf_find_lock(bp, flags);667670 if (error)668671 xfs_buf_rele(bp);···673670 *bpp = bp;674671 goto out_free_buf;675672 }673673+ rcu_read_unlock();676674677677- /* The new buffer keeps the perag reference until it is freed. */678678- new_bp->b_pag = pag;679679- spin_unlock(&bch->bc_lock);680675 *bpp = new_bp;681676 return 0;682677···10911090 }1092109110931092 /* we are asked to drop the last reference */10941094- spin_lock(&bch->bc_lock);10951093 __xfs_buf_ioacct_dec(bp);10961094 if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {10971095 /*···11021102 bp->b_state &= ~XFS_BSTATE_DISPOSE;11031103 else11041104 bp->b_hold--;11051105- spin_unlock(&bch->bc_lock);11061105 } else {11071106 bp->b_hold--;11081107 /*···11191120 ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));11201121 rhashtable_remove_fast(&bch->bc_hash, &bp->b_rhash_head,11211122 xfs_buf_hash_params);11221122- spin_unlock(&bch->bc_lock);11231123 if (pag)11241124 xfs_perag_put(pag);11251125 freebuf = true;
···329329 * successfully but before locks are dropped.330330 */331331332332-/* Verify that we have security clearance to perform this operation. */333333-static int334334-xfs_exchange_range_verify_area(335335- struct xfs_exchrange *fxr)336336-{337337- int ret;338338-339339- ret = remap_verify_area(fxr->file1, fxr->file1_offset, fxr->length,340340- true);341341- if (ret)342342- return ret;343343-344344- return remap_verify_area(fxr->file2, fxr->file2_offset, fxr->length,345345- true);346346-}347347-348332/*349333 * Performs necessary checks before doing a range exchange, having stabilized350334 * mutable inode attributes via i_rwsem.···339355 unsigned int alloc_unit)340356{341357 struct inode *inode1 = file_inode(fxr->file1);358358+ loff_t size1 = i_size_read(inode1);342359 struct inode *inode2 = file_inode(fxr->file2);360360+ loff_t size2 = i_size_read(inode2);343361 uint64_t allocmask = alloc_unit - 1;344362 int64_t test_len;345363 uint64_t blen;346346- loff_t size1, size2, tmp;364364+ loff_t tmp;347365 int error;348366349367 /* Don't touch certain kinds of inodes */···354368 if (IS_SWAPFILE(inode1) || IS_SWAPFILE(inode2))355369 return -ETXTBSY;356370357357- size1 = i_size_read(inode1);358358- size2 = i_size_read(inode2);359359-360371 /* Ranges cannot start after EOF. 
*/361372 if (fxr->file1_offset > size1 || fxr->file2_offset > size2)362373 return -EINVAL;363374364364- /*365365- * If the caller said to exchange to EOF, we set the length of the366366- * request large enough to cover everything to the end of both files.367367- */368375 if (fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF) {376376+ /*377377+ * If the caller said to exchange to EOF, we set the length of378378+ * the request large enough to cover everything to the end of379379+ * both files.380380+ */369381 fxr->length = max_t(int64_t, size1 - fxr->file1_offset,370382 size2 - fxr->file2_offset);371371-372372- error = xfs_exchange_range_verify_area(fxr);373373- if (error)374374- return error;383383+ } else {384384+ /*385385+ * Otherwise we require both ranges to end within EOF.386386+ */387387+ if (fxr->file1_offset + fxr->length > size1 ||388388+ fxr->file2_offset + fxr->length > size2)389389+ return -EINVAL;375390 }376391377392 /*···386399 /* Ensure offsets don't wrap. */387400 if (check_add_overflow(fxr->file1_offset, fxr->length, &tmp) ||388401 check_add_overflow(fxr->file2_offset, fxr->length, &tmp))389389- return -EINVAL;390390-391391- /*392392- * We require both ranges to end within EOF, unless we're exchanging393393- * to EOF.394394- */395395- if (!(fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF) &&396396- (fxr->file1_offset + fxr->length > size1 ||397397- fxr->file2_offset + fxr->length > size2))398402 return -EINVAL;399403400404 /*···725747{726748 struct inode *inode1 = file_inode(fxr->file1);727749 struct inode *inode2 = file_inode(fxr->file2);750750+ loff_t check_len = fxr->length;728751 int ret;729752730753 BUILD_BUG_ON(XFS_EXCHANGE_RANGE_ALL_FLAGS &···758779 return -EBADF;759780760781 /*761761- * If we're not exchanging to EOF, we can check the areas before762762- * stabilizing both files' i_size.782782+ * If we're exchanging to EOF we can't calculate the length until taking783783+ * the iolock. 
Pass a 0 length to remap_verify_area similar to the784784+ * FICLONE and FICLONERANGE ioctls that support cloning to EOF as well.763785 */764764- if (!(fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF)) {765765- ret = xfs_exchange_range_verify_area(fxr);766766- if (ret)767767- return ret;768768- }786786+ if (fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF)787787+ check_len = 0;788788+ ret = remap_verify_area(fxr->file1, fxr->file1_offset, check_len, true);789789+ if (ret)790790+ return ret;791791+ ret = remap_verify_area(fxr->file2, fxr->file2_offset, check_len, true);792792+ if (ret)793793+ return ret;769794770795 /* Update cmtime if the fd/inode don't forbid it. */771796 if (!(fxr->file1->f_mode & FMODE_NOCMTIME) && !IS_NOCMTIME(inode1))
+5-2
fs/xfs/xfs_inode.c
···14041404 goto out;1405140514061406 /* Try to clean out the cow blocks if there are any. */14071407- if (xfs_inode_has_cow_data(ip))14081408- xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);14071407+ if (xfs_inode_has_cow_data(ip)) {14081408+ error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);14091409+ if (error)14101410+ goto out;14111411+ }1409141214101413 if (VFS_I(ip)->i_nlink != 0) {14111414 /*
+2-4
fs/xfs/xfs_iomap.c
···976976 if (!xfs_is_cow_inode(ip))977977 return 0;978978979979- if (!written) {980980- xfs_reflink_cancel_cow_range(ip, pos, length, true);981981- return 0;982982- }979979+ if (!written)980980+ return xfs_reflink_cancel_cow_range(ip, pos, length, true);983981984982 return xfs_reflink_end_cow(ip, pos, written);985983}
+1
include/asm-generic/vmlinux.lds.h
···10381038 *(.discard) \10391039 *(.discard.*) \10401040 *(.export_symbol) \10411041+ *(.no_trim_symbol) \10411042 *(.modinfo) \10421043 /* ld.bfd warns about .gnu.version* even when not emitted */ \10431044 *(.gnu.version*) \
···191191 __v; \192192})193193194194+#ifdef __CHECKER__195195+#define __BUILD_BUG_ON_ZERO_MSG(e, msg) (0)196196+#else /* __CHECKER__ */197197+#define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))198198+#endif /* __CHECKER__ */199199+200200+/* &a[0] degrades to a pointer: a different type from an array */201201+#define __is_array(a) (!__same_type((a), &(a)[0]))202202+#define __must_be_array(a) __BUILD_BUG_ON_ZERO_MSG(!__is_array(a), \203203+ "must be array")204204+205205+#define __is_byte_array(a) (__is_array(a) && sizeof((a)[0]) == 1)206206+#define __must_be_byte_array(a) __BUILD_BUG_ON_ZERO_MSG(!__is_byte_array(a), \207207+ "must be byte array")208208+209209+/* Require C Strings (i.e. NUL-terminated) lack the "nonstring" attribute. */210210+#define __must_be_cstr(p) \211211+ __BUILD_BUG_ON_ZERO_MSG(__annotated(p, nonstring), "must be cstr (NUL-terminated)")212212+194213#endif /* __KERNEL__ */195214196215/**···249230 .popsection;250231251232#define __ADDRESSABLE_ASM_STR(sym) __stringify(__ADDRESSABLE_ASM(sym))252252-253253-#ifdef __CHECKER__254254-#define __BUILD_BUG_ON_ZERO_MSG(e, msg) (0)255255-#else /* __CHECKER__ */256256-#define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))257257-#endif /* __CHECKER__ */258258-259259-/* &a[0] degrades to a pointer: a different type from an array */260260-#define __must_be_array(a) __BUILD_BUG_ON_ZERO_MSG(__same_type((a), &(a)[0]), "must be array")261261-262262-/* Require C Strings (i.e. NUL-terminated) lack the "nonstring" attribute. */263263-#define __must_be_cstr(p) \264264- __BUILD_BUG_ON_ZERO_MSG(__annotated(p, nonstring), "must be cstr (NUL-terminated)")265233266234/*267235 * This returns a constant expression while determining if an argument is
+19-1
include/linux/fs.h
···222222#define FMODE_FSNOTIFY_HSM(mode) 0223223#endif224224225225-226225/*227226 * Attribute flags. These should be or-ed together to figure out what228227 * has been changed!···790791791792static inline void inode_set_cached_link(struct inode *inode, char *link, int linklen)792793{794794+ int testlen;795795+796796+ /*797797+ * TODO: patch it into a debug-only check if relevant macros show up.798798+ * In the meantime, since we are suffering strlen even on production kernels799799+ * to find the right length, do a fixup if the wrong value got passed.800800+ */801801+ testlen = strlen(link);802802+ if (testlen != linklen) {803803+ WARN_ONCE(1, "bad length passed for symlink [%s] (got %d, expected %d)",804804+ link, linklen, testlen);805805+ linklen = testlen;806806+ }793807 inode->i_link = link;794808 inode->i_linklen = linklen;795809 inode->i_opflags |= IOP_CACHED_LINK;···31503138 if (unlikely(!exe_file || FMODE_FSNOTIFY_HSM(exe_file->f_mode)))31513139 return;31523140 allow_write_access(exe_file);31413141+}31423142+31433143+static inline void file_set_fsnotify_mode(struct file *file, fmode_t mode)31443144+{31453145+ file->f_mode &= ~FMODE_FSNOTIFY_MASK;31463146+ file->f_mode |= mode;31533147}3154314831553149static inline bool inode_is_open_for_write(const struct inode *inode)
···244244 * @id_table: List of I2C devices supported by this driver245245 * @detect: Callback for device detection246246 * @address_list: The I2C addresses to probe (for detect)247247+ * @clients: List of detected clients we created (for i2c-core use only)247248 * @flags: A bitmask of flags defined in &enum i2c_driver_flags248249 *249250 * The driver.owner field should be set to the module owner of this driver.···299298 /* Device detection callback for automatic device creation */300299 int (*detect)(struct i2c_client *client, struct i2c_board_info *info);301300 const unsigned short *address_list;301301+ struct list_head clients;302302303303 u32 flags;304304};···315313 * @dev: Driver model device node for the slave.316314 * @init_irq: IRQ that was set at initialization317315 * @irq: indicates the IRQ generated by this device (if any)316316+ * @detected: member of an i2c_driver.clients list or i2c-core's317317+ * userspace_devices list318318 * @slave_cb: Callback when I2C slave mode of an adapter is used. 
The adapter319319 * calls it to pass on slave events to the slave driver.320320 * @devres_group_id: id of the devres group that will be created for resources···336332#define I2C_CLIENT_SLAVE 0x20 /* we are the slave */337333#define I2C_CLIENT_HOST_NOTIFY 0x40 /* We want to use I2C host notify */338334#define I2C_CLIENT_WAKE 0x80 /* for board_info; true iff can wake */339339-#define I2C_CLIENT_AUTO 0x100 /* client was auto-detected */340340-#define I2C_CLIENT_USER 0x200 /* client was userspace-created */341335#define I2C_CLIENT_SCCB 0x9000 /* Use Omnivision SCCB protocol */342336 /* Must match I2C_M_STOP|IGNORE_NAK */343337···347345 struct device dev; /* the device structure */348346 int init_irq; /* irq set at initialization */349347 int irq; /* irq issued by device */348348+ struct list_head detected;350349#if IS_ENABLED(CONFIG_I2C_SLAVE)351350 i2c_slave_cb_t slave_cb; /* callback for slave mode */352351#endif···753750 int nr;754751 char name[48];755752 struct completion dev_released;753753+754754+ struct mutex userspace_clients_lock;755755+ struct list_head userspace_clients;756756757757 struct i2c_bus_recovery_info *bus_recovery_info;758758 const struct i2c_adapter_quirks *quirks;
···411411/* GFX12 and later: */412412#define AMDGPU_TILING_GFX12_SWIZZLE_MODE_SHIFT 0413413#define AMDGPU_TILING_GFX12_SWIZZLE_MODE_MASK 0x7414414-/* These are DCC recompression setting for memory management: */414414+/* These are DCC recompression settings for memory management: */415415#define AMDGPU_TILING_GFX12_DCC_MAX_COMPRESSED_BLOCK_SHIFT 3416416#define AMDGPU_TILING_GFX12_DCC_MAX_COMPRESSED_BLOCK_MASK 0x3 /* 0:64B, 1:128B, 2:256B */417417#define AMDGPU_TILING_GFX12_DCC_NUMBER_TYPE_SHIFT 5418418#define AMDGPU_TILING_GFX12_DCC_NUMBER_TYPE_MASK 0x7 /* CB_COLOR0_INFO.NUMBER_TYPE */419419#define AMDGPU_TILING_GFX12_DCC_DATA_FORMAT_SHIFT 8420420#define AMDGPU_TILING_GFX12_DCC_DATA_FORMAT_MASK 0x3f /* [0:4]:CB_COLOR0_INFO.FORMAT, [5]:MM */421421+/* When clearing the buffer or moving it from VRAM to GTT, don't compress and set DCC metadata422422+ * to uncompressed. Set when parts of an allocation bypass DCC and read raw data. */423423+#define AMDGPU_TILING_GFX12_DCC_WRITE_COMPRESS_DISABLE_SHIFT 14424424+#define AMDGPU_TILING_GFX12_DCC_WRITE_COMPRESS_DISABLE_MASK 0x1425425+/* bit gap */426426+#define AMDGPU_TILING_GFX12_SCANOUT_SHIFT 63427427+#define AMDGPU_TILING_GFX12_SCANOUT_MASK 0x1421428422429/* Set/Get helpers for tiling flags. */423430#define AMDGPU_TILING_SET(field, value) \
···285285}286286287287extern void __futex_unqueue(struct futex_q *q);288288-extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb);288288+extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb,289289+ struct task_struct *task);289290extern int futex_unqueue(struct futex_q *q);290291291292/**292293 * futex_queue() - Enqueue the futex_q on the futex_hash_bucket293294 * @q: The futex_q to enqueue294295 * @hb: The destination hash bucket296296+ * @task: Task queueing this futex295297 *296298 * The hb->lock must be held by the caller, and is released here. A call to297299 * futex_queue() is typically paired with exactly one call to futex_unqueue(). The···301299 * or nothing if the unqueue is done as part of the wake process and the unqueue302300 * state is implicit in the state of woken task (see futex_wait_requeue_pi() for303301 * an example).302302+ *303303+ * Note that @task may be NULL, for async usage of futexes.304304 */305305-static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)305305+static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb,306306+ struct task_struct *task)306307 __releases(&hb->lock)307308{308308- __futex_queue(q, hb);309309+ __futex_queue(q, hb, task);309310 spin_unlock(&hb->lock);310311}311312
+1-1
kernel/futex/pi.c
···982982 /*983983 * Only actually queue now that the atomic ops are done:984984 */985985- __futex_queue(&q, hb);985985+ __futex_queue(&q, hb, current);986986987987 if (trylock) {988988 ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);
+2-2
kernel/futex/waitwake.c
···349349 * access to the hash list and forcing another memory barrier.350350 */351351 set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);352352- futex_queue(q, hb);352352+ futex_queue(q, hb, current);353353354354 /* Arm the timer */355355 if (timeout)···460460 * next futex. Queue each futex at this moment so hb can461461 * be unlocked.462462 */463463- futex_queue(q, hb);463463+ futex_queue(q, hb, current);464464 continue;465465 }466466
+2-2
kernel/kthread.c
···859859 struct kthread *kthread = to_kthread(p);860860 cpumask_var_t affinity;861861 unsigned long flags;862862- int ret;862862+ int ret = 0;863863864864 if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {865865 WARN_ON(1);···892892out:893893 free_cpumask_var(affinity);894894895895- return 0;895895+ return ret;896896}897897898898/*
+2
kernel/sched/debug.c
···12621262 if (task_has_dl_policy(p)) {12631263 P(dl.runtime);12641264 P(dl.deadline);12651265+ } else if (fair_policy(p->policy)) {12661266+ P(se.slice);12651267 }12661268#ifdef CONFIG_SCHED_CLASS_EXT12671269 __PS("ext.enabled", task_on_scx(p));
+19
kernel/sched/fair.c
···53855385static void set_delayed(struct sched_entity *se)53865386{53875387 se->sched_delayed = 1;53885388+53895389+ /*53905390+ * Delayed se of cfs_rq have no tasks queued on them.53915391+ * Do not adjust h_nr_runnable since dequeue_entities()53925392+ * will account it for blocked tasks.53935393+ */53945394+ if (!entity_is_task(se))53955395+ return;53965396+53885397 for_each_sched_entity(se) {53895398 struct cfs_rq *cfs_rq = cfs_rq_of(se);53905399···54065397static void clear_delayed(struct sched_entity *se)54075398{54085399 se->sched_delayed = 0;54005400+54015401+ /*54025402+ * Delayed se of cfs_rq have no tasks queued on them.54035403+ * Do not adjust h_nr_runnable since a dequeue has54045404+ * already accounted for it or an enqueue of a task54055405+ * below it will account for it in enqueue_task_fair().54065406+ */54075407+ if (!entity_is_task(se))54085408+ return;54095409+54095410 for_each_sched_entity(se) {54105411 struct cfs_rq *cfs_rq = cfs_rq_of(se);54115412
+12
kernel/seccomp.c
···749749 if (WARN_ON_ONCE(!fprog))750750 return false;751751752752+ /* Our single exception to filtering. */753753+#ifdef __NR_uretprobe754754+#ifdef SECCOMP_ARCH_COMPAT755755+ if (sd->arch == SECCOMP_ARCH_NATIVE)756756+#endif757757+ if (sd->nr == __NR_uretprobe)758758+ return true;759759+#endif760760+752761 for (pc = 0; pc < fprog->len; pc++) {753762 struct sock_filter *insn = &fprog->filter[pc];754763 u16 code = insn->code;···10321023 */10331024static const int mode1_syscalls[] = {10341025 __NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,10261026+#ifdef __NR_uretprobe10271027+ __NR_uretprobe,10281028+#endif10351029 -1, /* negative terminated */10361030};10371031
+6-3
kernel/time/clocksource.c
···373373 cpumask_clear(&cpus_ahead);374374 cpumask_clear(&cpus_behind);375375 cpus_read_lock();376376- preempt_disable();376376+ migrate_disable();377377 clocksource_verify_choose_cpus();378378 if (cpumask_empty(&cpus_chosen)) {379379- preempt_enable();379379+ migrate_enable();380380 cpus_read_unlock();381381 pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);382382 return;383383 }384384 testcpu = smp_processor_id();385385- pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen));385385+ pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n",386386+ cs->name, testcpu, cpumask_pr_args(&cpus_chosen));387387+ preempt_disable();386388 for_each_cpu(cpu, &cpus_chosen) {387389 if (cpu == testcpu)388390 continue;···404402 cs_nsec_min = cs_nsec;405403 }406404 preempt_enable();405405+ migrate_enable();407406 cpus_read_unlock();408407 if (!cpumask_empty(&cpus_ahead))409408 pr_warn(" CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+94-31
kernel/time/hrtimer.c
···
 #define HRTIMER_ACTIVE_SOFT	(HRTIMER_ACTIVE_HARD << MASK_SHIFT)
 #define HRTIMER_ACTIVE_ALL	(HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
 
+static void retrigger_next_event(void *arg);
+
 /*
  * The timer bases:
  *
···
 		.clockid = CLOCK_TAI,
 		.get_time = &ktime_get_clocktai,
 	},
-	}
+	},
+	.csd = CSD_INIT(retrigger_next_event, NULL)
 };
 
 static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
···
 	[CLOCK_BOOTTIME]	= HRTIMER_BASE_BOOTTIME,
 	[CLOCK_TAI]		= HRTIMER_BASE_TAI,
 };
+
+static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
+{
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		return true;
+	else
+		return likely(base->online);
+}
 
 /*
  * Functions and macros which are different for UP/SMP systems are kept in a
···
 };
 
 #define migration_base	migration_cpu_base.clock_base[0]
-
-static inline bool is_migration_base(struct hrtimer_clock_base *base)
-{
-	return base == &migration_base;
-}
 
 /*
  * We are using hashed locking: holding per_cpu(hrtimer_bases)[n].lock
···
 }
 
 /*
- * We do not migrate the timer when it is expiring before the next
- * event on the target cpu. When high resolution is enabled, we cannot
- * reprogram the target cpu hardware and we would cause it to fire
- * late. To keep it simple, we handle the high resolution enabled and
- * disabled case similar.
+ * Check if the elected target is suitable considering its next
+ * event and the hotplug state of the current CPU.
+ *
+ * If the elected target is remote and its next event is after the timer
+ * to queue, then a remote reprogram is necessary. However there is no
+ * guarantee the IPI handling the operation would arrive in time to meet
+ * the high resolution deadline. In this case the local CPU becomes a
+ * preferred target, unless it is offline.
+ *
+ * High and low resolution modes are handled the same way for simplicity.
  *
  * Called with cpu_base->lock of target cpu held.
  */
-static int
-hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
+static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base,
+				    struct hrtimer_cpu_base *new_cpu_base,
+				    struct hrtimer_cpu_base *this_cpu_base)
 {
 	ktime_t expires;
 
+	/*
+	 * The local CPU clockevent can be reprogrammed. Also get_target_base()
+	 * guarantees it is online.
+	 */
+	if (new_cpu_base == this_cpu_base)
+		return true;
+
+	/*
+	 * The offline local CPU can't be the default target if the
+	 * next remote target event is after this timer. Keep the
+	 * elected new base. An IPI will be issued to reprogram
+	 * it as a last resort.
+	 */
+	if (!hrtimer_base_is_online(this_cpu_base))
+		return true;
+
 	expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
-	return expires < new_base->cpu_base->expires_next;
+
+	return expires >= new_base->cpu_base->expires_next;
 }
 
-static inline
-struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
-					 int pinned)
+static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, int pinned)
 {
+	if (!hrtimer_base_is_online(base)) {
+		int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+
+		return &per_cpu(hrtimer_bases, cpu);
+	}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 	if (static_branch_likely(&timers_migration_enabled) && !pinned)
 		return &per_cpu(hrtimer_bases, get_nohz_timer_target());
···
 		raw_spin_unlock(&base->cpu_base->lock);
 		raw_spin_lock(&new_base->cpu_base->lock);
 
-		if (new_cpu_base != this_cpu_base &&
-		    hrtimer_check_target(timer, new_base)) {
+		if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
+					     this_cpu_base)) {
 			raw_spin_unlock(&new_base->cpu_base->lock);
 			raw_spin_lock(&base->cpu_base->lock);
 			new_cpu_base = this_cpu_base;
···
 		}
 		WRITE_ONCE(timer->base, new_base);
 	} else {
-		if (new_cpu_base != this_cpu_base &&
-		    hrtimer_check_target(timer, new_base)) {
+		if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_base)) {
 			new_cpu_base = this_cpu_base;
 			goto again;
 		}
···
 }
 
 #else /* CONFIG_SMP */
-
-static inline bool is_migration_base(struct hrtimer_clock_base *base)
-{
-	return false;
-}
 
 static inline struct hrtimer_clock_base *
 lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
···
 {
 	return hrtimer_hres_enabled;
 }
-
-static void retrigger_next_event(void *arg);
 
 /*
  * Switch to high resolution mode
···
 			u64 delta_ns, const enum hrtimer_mode mode,
 			struct hrtimer_clock_base *base)
 {
+	struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
 	struct hrtimer_clock_base *new_base;
 	bool force_local, first;
 
···
 	 * and enforce reprogramming after it is queued no matter whether
 	 * it is the new first expiring timer again or not.
 	 */
-	force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+	force_local = base->cpu_base == this_cpu_base;
 	force_local &= base->cpu_base->next_timer == timer;
+
+	/*
+	 * Don't force local queuing if this enqueue happens on an unplugged
+	 * CPU after hrtimer_cpu_dying() has been invoked.
+	 */
+	force_local &= this_cpu_base->online;
 
 	/*
 	 * Remove an active timer from the queue. In case it is not queued
···
 	}
 
 	first = enqueue_hrtimer(timer, new_base, mode);
-	if (!force_local)
-		return first;
+	if (!force_local) {
+		/*
+		 * If the current CPU base is online, then the timer is
+		 * never queued on a remote CPU if it would be the first
+		 * expiring timer there.
+		 */
+		if (hrtimer_base_is_online(this_cpu_base))
+			return first;
+
+		/*
+		 * Timer was enqueued remote because the current base is
+		 * already offline. If the timer is the first to expire,
+		 * kick the remote CPU to reprogram the clock event.
+		 */
+		if (first) {
+			struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
+
+			smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
+		}
+		return 0;
+	}
 
 	/*
 	 * Timer was forced to stay on the current CPU to avoid
···
 		raw_spin_lock_irq(&cpu_base->lock);
 	}
 }
+
+#ifdef CONFIG_SMP
+static __always_inline bool is_migration_base(struct hrtimer_clock_base *base)
+{
+	return base == &migration_base;
+}
+#else
+static __always_inline bool is_migration_base(struct hrtimer_clock_base *base)
+{
+	return false;
+}
+#endif
 
 /*
  * This function is called on PREEMPT_RT kernels when the fast path
+9-1
kernel/time/timer_migration.c
···
 
 	} while (i < tmigr_hierarchy_levels);
 
+	/* Assert single root */
+	WARN_ON_ONCE(!err && !group->parent && !list_is_singular(&tmigr_level_list[top]));
+
 	while (i > 0) {
 		group = stack[--i];
···
 	WARN_ON_ONCE(top == 0);
 
 	lvllist = &tmigr_level_list[top];
-	if (group->num_children == 1 && list_is_singular(lvllist)) {
+
+	/*
+	 * Newly created root level should have accounted the upcoming
+	 * CPU's child group and pre-accounted the old root.
+	 */
+	if (group->num_children == 2 && list_is_singular(lvllist)) {
 		/*
 		 * The target CPU must never do the prepare work, except
 		 * on early boot when the boot CPU is the target. Otherwise
+1-1
kernel/trace/trace_functions_graph.c
···
 	 * returning from the function.
 	 */
 	if (ftrace_graph_notrace_addr(trace->func)) {
-		*task_var |= TRACE_GRAPH_NOTRACE_BIT;
+		*task_var |= TRACE_GRAPH_NOTRACE;
 		/*
 		 * Need to return 1 to have the return called
 		 * that will clear the NOTRACE bit.
+4-2
lib/stackinit_kunit.c
···
  */
 #ifdef CONFIG_M68K
 #define FILL_SIZE_STRING	8
+#define FILL_SIZE_ARRAY		2
 #else
 #define FILL_SIZE_STRING	16
+#define FILL_SIZE_ARRAY		8
 #endif
 
 #define INIT_CLONE_SCALAR	/**/
···
 	short three;
 	unsigned long four;
 	struct big_struct {
-		unsigned long array[8];
+		unsigned long array[FILL_SIZE_ARRAY];
 	} big;
 };
 
-/* Mismatched sizes, with one and two being small */
+/* Mismatched sizes, with three and four being small */
 union test_small_end {
 	short one;
 	unsigned long two;
+14
net/core/dev.c
···
 	const struct net_device_ops *ops = dev->netdev_ops;
 	const struct net_device_core_stats __percpu *p;
 
+	/*
+	 * IPv{4,6} and udp tunnels share common stat helpers and use
+	 * different stat type (NETDEV_PCPU_STAT_TSTATS vs
+	 * NETDEV_PCPU_STAT_DSTATS). Ensure the accounting is consistent.
+	 */
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, rx_bytes) !=
+		     offsetof(struct pcpu_dstats, rx_bytes));
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, rx_packets) !=
+		     offsetof(struct pcpu_dstats, rx_packets));
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, tx_bytes) !=
+		     offsetof(struct pcpu_dstats, tx_bytes));
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, tx_packets) !=
+		     offsetof(struct pcpu_dstats, tx_packets));
+
 	if (ops->ndo_get_stats64) {
 		memset(storage, 0, sizeof(*storage));
 		ops->ndo_get_stats64(dev, storage);
+1-1
net/ethtool/ioctl.c
···
 		return rc;
 
 	/* Nonzero ring with RSS only makes sense if NIC adds them together */
-	if (cmd == ETHTOOL_SRXCLSRLINS && info.flow_type & FLOW_RSS &&
+	if (cmd == ETHTOOL_SRXCLSRLINS && info.fs.flow_type & FLOW_RSS &&
 	    !ops->cap_rss_rxnfc_adds &&
 	    ethtool_get_flow_spec_ring(info.fs.ring_cookie))
 		return -EINVAL;
···
 		const int hlen = skb_network_header_len(skb) +
 				 sizeof(struct udphdr);
 
-		if (hlen + cork->gso_size > cork->fragsize) {
+		if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
 			kfree_skb(skb);
-			return -EINVAL;
+			return -EMSGSIZE;
 		}
 		if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
 			kfree_skb(skb);
+9-5
net/ipv6/ioam6_iptunnel.c
···
 
 static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	struct dst_entry *dst = skb_dst(skb), *cache_dst;
+	struct dst_entry *dst = skb_dst(skb), *cache_dst = NULL;
 	struct in6_addr orig_daddr;
 	struct ioam6_lwt *ilwt;
 	int err = -EINVAL;
···
 		cache_dst = ip6_route_output(net, NULL, &fl6);
 		if (cache_dst->error) {
 			err = cache_dst->error;
-			dst_release(cache_dst);
 			goto drop;
 		}
 
-		local_bh_disable();
-		dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
-		local_bh_enable();
+		/* cache only if we don't create a dst reference loop */
+		if (dst->lwtstate != cache_dst->lwtstate) {
+			local_bh_disable();
+			dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
+			local_bh_enable();
+		}
 
 		err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev));
 		if (unlikely(err))
···
 		return dst_output(net, sk, skb);
 	}
 out:
+	dst_release(cache_dst);
 	return dst->lwtstate->orig_output(net, sk, skb);
 drop:
+	dst_release(cache_dst);
 	kfree_skb(skb);
 	return err;
 }
+10-5
net/ipv6/rpl_iptunnel.c
···
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst->error) {
 		err = dst->error;
-		dst_release(dst);
 		goto drop;
 	}
 
-	local_bh_disable();
-	dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
-	local_bh_enable();
+	/* cache only if we don't create a dst reference loop */
+	if (orig_dst->lwtstate != dst->lwtstate) {
+		local_bh_disable();
+		dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
+		local_bh_enable();
+	}
 
 	err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
 	if (unlikely(err))
···
 	return dst_output(net, sk, skb);
 
 drop:
+	dst_release(dst);
 	kfree_skb(skb);
 	return err;
 }
···
 	local_bh_enable();
 
 	err = rpl_do_srh(skb, rlwt, dst);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		dst_release(dst);
 		goto drop;
+	}
 
 	if (!dst) {
 		ip6_route_input(skb);
+10-5
net/ipv6/seg6_iptunnel.c
···
 	local_bh_enable();
 
 	err = seg6_do_srh(skb, dst);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		dst_release(dst);
 		goto drop;
+	}
 
 	if (!dst) {
 		ip6_route_input(skb);
···
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst->error) {
 		err = dst->error;
-		dst_release(dst);
 		goto drop;
 	}
 
-	local_bh_disable();
-	dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
-	local_bh_enable();
+	/* cache only if we don't create a dst reference loop */
+	if (orig_dst->lwtstate != dst->lwtstate) {
+		local_bh_disable();
+		dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
+		local_bh_enable();
+	}
 
 	err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
 	if (unlikely(err))
···
 
 	return dst_output(net, sk, skb);
 drop:
+	dst_release(dst);
 	kfree_skb(skb);
 	return err;
 }
+2-2
net/ipv6/udp.c
···
 		const int hlen = skb_network_header_len(skb) +
 				 sizeof(struct udphdr);
 
-		if (hlen + cork->gso_size > cork->fragsize) {
+		if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
 			kfree_skb(skb);
-			return -EINVAL;
+			return -EMSGSIZE;
 		}
 		if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
 			kfree_skb(skb);
+16-8
net/rose/af_rose.c
···
 	struct net_device *dev;
 	ax25_address *source;
 	ax25_uid_assoc *user;
+	int err = -EINVAL;
 	int n;
-
-	if (!sock_flag(sk, SOCK_ZAPPED))
-		return -EINVAL;
 
 	if (addr_len != sizeof(struct sockaddr_rose) && addr_len != sizeof(struct full_sockaddr_rose))
 		return -EINVAL;
···
 	if ((unsigned int) addr->srose_ndigis > ROSE_MAX_DIGIS)
 		return -EINVAL;
 
-	if ((dev = rose_dev_get(&addr->srose_addr)) == NULL)
-		return -EADDRNOTAVAIL;
+	lock_sock(sk);
+
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		goto out_release;
+
+	err = -EADDRNOTAVAIL;
+	dev = rose_dev_get(&addr->srose_addr);
+	if (!dev)
+		goto out_release;
 
 	source = &addr->srose_call;
 
···
 	} else {
 		if (ax25_uid_policy && !capable(CAP_NET_BIND_SERVICE)) {
 			dev_put(dev);
-			return -EACCES;
+			err = -EACCES;
+			goto out_release;
 		}
 		rose->source_call = *source;
 	}
···
 	rose_insert_socket(sk);
 
 	sock_reset_flag(sk, SOCK_ZAPPED);
-
-	return 0;
+	err = 0;
+out_release:
+	release_sock(sk);
+	return err;
 }
 
 static int rose_connect(struct socket *sock, struct sockaddr *uaddr, int addr_len, int flags)
+1-1
net/rxrpc/ar-internal.h
···
 	RXRPC_CALL_EXCLUSIVE,		/* The call uses a once-only connection */
 	RXRPC_CALL_RX_IS_IDLE,		/* recvmsg() is idle - send an ACK */
 	RXRPC_CALL_RECVMSG_READ_ALL,	/* recvmsg() read all of the received data */
+	RXRPC_CALL_CONN_CHALLENGING,	/* The connection is being challenged */
 };
 
 /*
···
 	RXRPC_CALL_CLIENT_AWAIT_REPLY,	/* - client awaiting reply */
 	RXRPC_CALL_CLIENT_RECV_REPLY,	/* - client receiving reply phase */
 	RXRPC_CALL_SERVER_PREALLOC,	/* - service preallocation */
-	RXRPC_CALL_SERVER_SECURING,	/* - server securing request connection */
 	RXRPC_CALL_SERVER_RECV_REQUEST,	/* - server receiving request */
 	RXRPC_CALL_SERVER_ACK_REQUEST,	/* - server pending ACK of request */
 	RXRPC_CALL_SERVER_SEND_REPLY,	/* - server sending reply */
···
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	bool last = sp->hdr.flags & RXRPC_LAST_PACKET;
 
-	skb_queue_tail(&call->recvmsg_queue, skb);
+	spin_lock_irq(&call->recvmsg_queue.lock);
+
+	__skb_queue_tail(&call->recvmsg_queue, skb);
 	rxrpc_input_update_ack_window(call, window, wtop);
 	trace_rxrpc_receive(call, last ? why + 1 : why, sp->hdr.serial, sp->hdr.seq);
 	if (last)
+		/* Change the state inside the lock so that recvmsg syncs
+		 * correctly with it and using sendmsg() to send a reply
+		 * doesn't race.
+		 */
 		rxrpc_end_rx_phase(call, sp->hdr.serial);
+
+	spin_unlock_irq(&call->recvmsg_queue.lock);
 }
 
 /*
···
 		rxrpc_propose_delay_ACK(call, sp->hdr.serial,
 					rxrpc_propose_ack_input_data);
 	}
-	if (notify) {
+	if (notify && !test_bit(RXRPC_CALL_CONN_CHALLENGING, &call->flags)) {
 		trace_rxrpc_notify_socket(call->debug_id, sp->hdr.serial);
 		rxrpc_notify_socket(call);
 	}
+1-1
net/rxrpc/sendmsg.c
···
 	} else {
 		switch (rxrpc_call_state(call)) {
 		case RXRPC_CALL_CLIENT_AWAIT_CONN:
-		case RXRPC_CALL_SERVER_SECURING:
+		case RXRPC_CALL_SERVER_RECV_REQUEST:
 			if (p.command == RXRPC_CMD_SEND_ABORT)
 				break;
 			fallthrough;
+3
net/sched/sch_fifo.c
···
 {
 	unsigned int prev_backlog;
 
+	if (unlikely(READ_ONCE(sch->limit) == 0))
+		return qdisc_drop(skb, sch, to_free);
+
 	if (likely(sch->q.qlen < READ_ONCE(sch->limit)))
 		return qdisc_enqueue_tail(skb, sch);
 
···
 ifdef CONFIG_CC_IS_CLANG
 # The kernel builds with '-std=gnu11' so use of GNU extensions is acceptable.
 KBUILD_CFLAGS += -Wno-gnu
+
+# Clang checks for overflow/truncation with '%p', while GCC does not:
+# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111219
+KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow-non-kprintf)
+KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation-non-kprintf)
 else
 
 # gcc inanely warns about local variables called 'main'
···
 KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
 ifdef CONFIG_CC_IS_GCC
 KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
-else
-# Clang checks for overflow/truncation with '%p', while GCC does not:
-# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111219
-KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow-non-kprintf)
-KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation-non-kprintf)
 endif
 KBUILD_CFLAGS += $(call cc-disable-warning, stringop-truncation)
···
 KBUILD_CFLAGS += -Wno-tautological-constant-out-of-range-compare
 KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
 KBUILD_CFLAGS += -Wno-enum-compare-conditional
-KBUILD_CFLAGS += -Wno-enum-enum-conversion
 endif
 
 endif
···
 KBUILD_CFLAGS += -Wno-missing-field-initializers
 KBUILD_CFLAGS += -Wno-type-limits
 KBUILD_CFLAGS += -Wno-shift-negative-value
+
+ifdef CONFIG_CC_IS_CLANG
+KBUILD_CFLAGS += -Wno-enum-enum-conversion
+endif
 
 ifdef CONFIG_CC_IS_GCC
 KBUILD_CFLAGS += -Wno-maybe-uninitialized
+1-1
scripts/Makefile.lib
···
 # These are shared by some Makefile.* files.
 
 ifdef CONFIG_LTO_CLANG
-# Run $(LD) here to covert LLVM IR to ELF in the following cases:
+# Run $(LD) here to convert LLVM IR to ELF in the following cases:
 #  - when this object needs objtool processing, as objtool cannot process LLVM IR
 #  - when this is a single-object module, as modpost cannot process LLVM IR
 cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) -r -o $(tmp-target) $@; mv $(tmp-target) $@)
+18
scripts/generate_rust_target.rs
···
         let option = "CONFIG_".to_owned() + option;
         self.0.contains_key(&option)
     }
+
+    /// Is the rustc version at least `major.minor.patch`?
+    fn rustc_version_atleast(&self, major: u32, minor: u32, patch: u32) -> bool {
+        let check_version = 100000 * major + 100 * minor + patch;
+        let actual_version = self
+            .0
+            .get("CONFIG_RUSTC_VERSION")
+            .unwrap()
+            .parse::<u32>()
+            .unwrap();
+        check_version <= actual_version
+    }
 }
 
 fn main() {
···
         }
     } else if cfg.has("X86_64") {
         ts.push("arch", "x86_64");
+        if cfg.rustc_version_atleast(1, 86, 0) {
+            ts.push("rustc-abi", "x86-softfloat");
+        }
         ts.push(
             "data-layout",
             "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
···
             panic!("32-bit x86 only works under UML");
         }
         ts.push("arch", "x86");
+        if cfg.rustc_version_atleast(1, 86, 0) {
+            ts.push("rustc-abi", "x86-softfloat");
+        }
         ts.push(
             "data-layout",
             "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128",
+35
scripts/mod/modpost.c
···
 		info->modinfo_len = sechdrs[i].sh_size;
 	} else if (!strcmp(secname, ".export_symbol")) {
 		info->export_symbol_secndx = i;
+	} else if (!strcmp(secname, ".no_trim_symbol")) {
+		info->no_trim_symbol = (void *)hdr + sechdrs[i].sh_offset;
+		info->no_trim_symbol_len = sechdrs[i].sh_size;
 	}
 
 	if (sechdrs[i].sh_type == SHT_SYMTAB) {
···
 	/* strip trailing .o */
 	mod = new_module(modname, strlen(modname) - strlen(".o"));
 
+	/* save .no_trim_symbol section for later use */
+	if (info.no_trim_symbol_len) {
+		mod->no_trim_symbol = xmalloc(info.no_trim_symbol_len);
+		memcpy(mod->no_trim_symbol, info.no_trim_symbol,
+		       info.no_trim_symbol_len);
+		mod->no_trim_symbol_len = info.no_trim_symbol_len;
+	}
+
 	if (!mod->is_vmlinux) {
 		license = get_modinfo(&info, "license");
 		if (!license)
···
 	}
 
 	free(buf);
+}
+
+/*
+ * Keep symbols recorded in the .no_trim_symbol section. This is necessary to
+ * prevent CONFIG_TRIM_UNUSED_KSYMS from dropping EXPORT_SYMBOL because
+ * symbol_get() relies on the symbol being present in the ksymtab for lookups.
+ */
+static void keep_no_trim_symbols(struct module *mod)
+{
+	unsigned long size = mod->no_trim_symbol_len;
+
+	for (char *s = mod->no_trim_symbol; s; s = next_string(s, &size)) {
+		struct symbol *sym;
+
+		/*
+		 * If find_symbol() returns NULL, this symbol is not provided
+		 * by any module, and symbol_get() will fail.
+		 */
+		sym = find_symbol(s);
+		if (sym)
+			sym->used = true;
+	}
 }
 
 static void check_modname_len(struct module *mod)
···
 		read_symbols_from_files(files_source);
 
 	list_for_each_entry(mod, &modules, list) {
+		keep_no_trim_symbols(mod);
+
 		if (mod->dump_file || mod->is_vmlinux)
 			continue;
 
+6
scripts/mod/modpost.h
···
  *
  * @dump_file: path to the .symvers file if loaded from a file
  * @aliases: list head for module_aliases
+ * @no_trim_symbol: .no_trim_symbol section data
+ * @no_trim_symbol_len: length of the .no_trim_symbol section
  */
 struct module {
 	struct list_head list;
···
 	// Actual imported namespaces
 	struct list_head imported_namespaces;
 	struct list_head aliases;
+	char *no_trim_symbol;
+	unsigned int no_trim_symbol_len;
 	char name[];
 };
 
···
 	char *strtab;
 	char *modinfo;
 	unsigned int modinfo_len;
+	char *no_trim_symbol;
+	unsigned int no_trim_symbol_len;
 
 	/* support for 32bit section numbers */
 
···
         "matchPattern": "qdisc bfifo 1: root",
         "matchCount": "0",
         "teardown": [
+        ]
+    },
+    {
+        "id": "d774",
+        "name": "Check pfifo_head_drop qdisc enqueue behaviour when limit == 0",
+        "category": [
+            "qdisc",
+            "pfifo_head_drop"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: pfifo_head_drop limit 0",
+            "$IP link set dev $DUMMY up || true"
+        ],
+        "cmdUnderTest": "ping -c2 -W0.01 -I $DUMMY 10.10.10.1",
+        "expExitCode": "1",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "dropped 2",
+        "matchCount": "1",
+        "teardown": [
         ]
     }
 ]
+9-16
virt/kvm/kvm_main.c
···
 }
 
 /*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
-	return 0;
-}
-
-/*
  * Called just after removing the VM from the vm_list, but before doing any
  * other destruction.
  */
···
 	if (r)
 		goto out_err_no_debugfs;
 
-	r = kvm_arch_post_init_vm(kvm);
-	if (r)
-		goto out_err;
-
 	mutex_lock(&kvm_lock);
 	list_add(&kvm->vm_list, &vm_list);
 	mutex_unlock(&kvm_lock);
···
 
 	return kvm;
 
-out_err:
-	kvm_destroy_vm_debugfs(kvm);
 out_err_no_debugfs:
 	kvm_coalesced_mmio_free(kvm);
 out_no_coalesced_mmio:
···
 		return -EINVAL;
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		return -EINVAL;
-	if ((mem->memory_size >> PAGE_SHIFT) > KVM_MEM_MAX_NR_PAGES)
+
+	/*
+	 * The size of userspace-defined memory regions is restricted in order
+	 * to play nice with dirty bitmap operations, which are indexed with an
+	 * "unsigned int". KVM's internal memory regions don't support dirty
+	 * logging, and so are exempt.
+	 */
+	if (id < KVM_USER_MEM_SLOTS &&
+	    (mem->memory_size >> PAGE_SHIFT) > KVM_MEM_MAX_NR_PAGES)
 		return -EINVAL;
 
 	slots = __kvm_memslots(kvm, as_id);