···
 D: Initial implementation of VC's, pty's and select()

 N: Pavel Machek
-E: pavel@ucw.cz
+E: pavel@kernel.org
 P: 4096R/92DFCE96 4FA7 9EEF FCD4 C44F C585 B8C7 C060 2241 92DF CE96
-D: Softcursor for vga, hypertech cdrom support, vcsa bugfix, nbd,
-D: sun4/330 port, capabilities for elf, speedup for rm on ext2, USB,
-D: work on suspend-to-ram/disk, killing duplicates from ioctl32,
+D: NBD, Sun4/330 port, USB, work on suspend-to-ram/disk,
 D: Altera SoCFPGA and Nokia N900 support.
 S: Czech Republic
···
 description: |
   The Microchip LAN966x outband interrupt controller (OIC) maps the internal
-  interrupt sources of the LAN966x device to an external interrupt.
-  When the LAN966x device is used as a PCI device, the external interrupt is
-  routed to the PCI interrupt.
+  interrupt sources of the LAN966x device to a PCI interrupt when the LAN966x
+  device is used as a PCI device.

 properties:
   compatible:
+Submitting patches to bcachefs:
+===============================
+
+Patches must be tested before being submitted, either with the xfstests suite
+[0], or the full bcachefs test suite in ktest [1], depending on what's being
+touched. Note that ktest wraps xfstests and will be an easier way to run it
+for most users; it includes single-command wrappers for all the mainstream
+in-kernel local filesystems.
+
+Patches will undergo more testing after being merged (including
+lockdep/kasan/preempt/etc. variants); these are not generally required to be
+run by the submitter - but do put some thought into what you're changing and
+which tests might be relevant. E.g. if you're dealing with tricky memory
+layout work, run kasan; if you're doing locking work, run lockdep. ktest
+includes single-command variants for the debug build types you'll most likely
+need.
+
+The exception to this rule is incomplete WIP/RFC patches: if you're working on
+something nontrivial, it's encouraged to send out a WIP patch to let people
+know what you're doing and make sure you're on the right track. Just make sure
+it includes a brief note as to what's done and what's incomplete, to avoid
+confusion.
+
+Rigorous checkpatch.pl adherence is not required (many of its warnings are
+considered out of date), but try not to deviate too much without reason.
+
+Focus on writing code that reads well and is organized well; code should be
+aesthetically pleasing.
+
+CI:
+===
+
+Instead of running your tests locally, when running the full test suite it's
+preferable to let a server farm do it in parallel, and then have the results
+in a nice test dashboard (which can tell you which failures are new, and
+presents results in a git log view, avoiding the need for most bisecting).
+
+That exists [2], and community members may request an account. If you work for
+a big tech company, you'll need to help out with server costs to get access -
+but the CI is not restricted to running bcachefs tests: it runs any ktest test
+(which generally makes it easy to wrap other tests that can run in qemu).
+
+Other things to think about:
+============================
+
+- How will we debug this code? Is there sufficient introspection to diagnose
+  when something starts acting wonky on a user machine?
+
+  We don't necessarily need every single field of every data structure visible
+  with introspection, but having the important fields of all the core data
+  types wired up makes debugging drastically easier - a bit of thoughtful
+  foresight greatly reduces the need to have people build custom kernels with
+  debug patches.
+
+  More broadly, think about all the debug tooling that might be needed.
+
+- Does it make the codebase more or less of a mess? Can we do some organizing,
+  too?
+
+- Do new tests need to be written? New assertions? How do we know and verify
+  that the code is correct, and what happens if something goes wrong?
+
+  We don't yet have automated code coverage analysis or easy fault injection -
+  but for now, pretend we do and ask what they might tell us.
+
+  Assertions are hugely important, given that we don't yet have a systems
+  language that can do ergonomic embedded correctness proofs. Hitting an assert
+  in testing is much better than wandering off into undefined behaviour la-la
+  land - use them. Use them judiciously, and not as a replacement for proper
+  error handling, but use them.
+
+- Does it need to be performance tested? Should we add new performance
+  counters?
+
+  bcachefs has a set of persistent runtime counters which can be viewed with
+  the 'bcachefs fs top' command; this should give users a basic idea of what
+  their filesystem is currently doing. If you're adding a new feature or
+  looking at old code, think about whether anything should be added.
+
+- If it's a new on-disk format feature - have upgrades and downgrades been
+  tested? (Automated tests exist but aren't in the CI, due to the hassle of
+  disk image management; coordinate to have them run.)
+
+Mailing list, IRC:
+==================
+
+Patches should hit the list [3], but much discussion and code review happens on
+IRC as well [4]; many people appreciate the more conversational approach and
+quicker feedback.
+
+Additionally, we have a lively user community doing excellent QA work, which
+exists primarily on IRC. Please make use of that resource; user feedback is
+important for any nontrivial feature, and documenting it in commit messages
+would be a good idea.
+
+[0]: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
+[1]: https://evilpiepirate.org/git/ktest.git/
+[2]: https://evilpiepirate.org/~testdashboard/ci/
+[3]: linux-bcachefs@vger.kernel.org
+[4]: irc.oftc.net#bcache, #bcachefs-dev
···
 S390:
 ^^^^^

-Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set.
+Returns -EINVAL or -EEXIST if the VM has the KVM_VM_S390_UCONTROL flag set.
 Returns -EINVAL if called on a protected VM.

 4.36 KVM_SET_TSS_ADDR
MAINTAINERS (+50 -7)
···
 F:	sound/soc/codecs/ssm3515.c

 ARM/APPLE MACHINE SUPPORT
-M:	Hector Martin <marcan@marcan.st>
 M:	Sven Peter <sven@svenpeter.dev>
 R:	Alyssa Rosenzweig <alyssa@rosenzweig.io>
 L:	asahi@lists.linux.dev
···
 L:	linux-bcachefs@vger.kernel.org
 S:	Supported
 C:	irc://irc.oftc.net/bcache
+P:	Documentation/filesystems/bcachefs/SubmittingPatches.rst
 T:	git https://evilpiepirate.org/git/bcachefs.git
 F:	fs/bcachefs/
 F:	Documentation/filesystems/bcachefs/
···
 FREEZER
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 F:	Documentation/power/freezing-of-tasks.rst
···
 F:	drivers/staging/gpib/

 GPIO ACPI SUPPORT
-M:	Mika Westerberg <mika.westerberg@linux.intel.com>
+M:	Mika Westerberg <westeri@kernel.org>
 M:	Andy Shevchenko <andriy.shevchenko@linux.intel.com>
 L:	linux-gpio@vger.kernel.org
 L:	linux-acpi@vger.kernel.org
···
 HIBERNATION (aka Software Suspend, aka swsusp)
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 B:	https://bugzilla.kernel.org
···
 F:	scripts/leaking_addresses.pl

 LED SUBSYSTEM
-M:	Pavel Machek <pavel@ucw.cz>
 M:	Lee Jones <lee@kernel.org>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-leds@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git
···
 F:	net/dsa/
 F:	tools/testing/selftests/drivers/net/dsa/

+NETWORKING [ETHTOOL]
+M:	Andrew Lunn <andrew@lunn.ch>
+M:	Jakub Kicinski <kuba@kernel.org>
+F:	Documentation/netlink/specs/ethtool.yaml
+F:	Documentation/networking/ethtool-netlink.rst
+F:	include/linux/ethtool*
+F:	include/uapi/linux/ethtool*
+F:	net/ethtool/
+F:	tools/testing/selftests/drivers/net/*/ethtool*
+
+NETWORKING [ETHTOOL CABLE TEST]
+M:	Andrew Lunn <andrew@lunn.ch>
+F:	net/ethtool/cabletest.c
+F:	tools/testing/selftests/drivers/net/*/ethtool*
+K:	cable_test
+
 NETWORKING [GENERAL]
 M:	"David S. Miller" <davem@davemloft.net>
 M:	Eric Dumazet <edumazet@google.com>
···
 NETWORKING [TCP]
 M:	Eric Dumazet <edumazet@google.com>
 M:	Neal Cardwell <ncardwell@google.com>
+R:	Kuniyuki Iwashima <kuniyu@amazon.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	Documentation/networking/net_cachelines/tcp_sock.rst
···
 F:	include/net/tls.h
 F:	include/uapi/linux/tls.h
 F:	net/tls/*
+
+NETWORKING [SOCKETS]
+M:	Eric Dumazet <edumazet@google.com>
+M:	Kuniyuki Iwashima <kuniyu@amazon.com>
+M:	Paolo Abeni <pabeni@redhat.com>
+M:	Willem de Bruijn <willemb@google.com>
+S:	Maintained
+F:	include/linux/sock_diag.h
+F:	include/linux/socket.h
+F:	include/linux/sockptr.h
+F:	include/net/sock.h
+F:	include/net/sock_reuseport.h
+F:	include/uapi/linux/socket.h
+F:	net/core/*sock*
+F:	net/core/scm.c
+F:	net/socket.c
+
+NETWORKING [UNIX SOCKETS]
+M:	Kuniyuki Iwashima <kuniyu@amazon.com>
+S:	Maintained
+F:	include/net/af_unix.h
+F:	include/net/netns/unix.h
+F:	include/uapi/linux/unix_diag.h
+F:	net/unix/
+F:	tools/testing/selftests/net/af_unix/

 NETXEN (1/10) GbE SUPPORT
 M:	Manish Chopra <manishc@marvell.com>
···
 F:	kernel/time/tick*.*

 NOKIA N900 CAMERA SUPPORT (ET8EK8 SENSOR, AD5820 FOCUS)
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 M:	Sakari Ailus <sakari.ailus@iki.fi>
 L:	linux-media@vger.kernel.org
 S:	Maintained
···
 L:	dev@openvswitch.org
 S:	Maintained
 W:	http://openvswitch.org
+F:	Documentation/networking/openvswitch.rst
 F:	include/uapi/linux/openvswitch.h
 F:	net/openvswitch/
 F:	tools/testing/selftests/net/openvswitch/
···
 SUSPEND TO RAM
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
 M:	Len Brown <len.brown@intel.com>
-M:	Pavel Machek <pavel@ucw.cz>
+M:	Pavel Machek <pavel@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 B:	https://bugzilla.kernel.org
···
 /*
  * This is used to ensure we don't load something for the wrong architecture.
  */
-#define elf_check_arch(x) ((x)->e_machine == EM_ALPHA)
+#define elf_check_arch(x) (((x)->e_machine == EM_ALPHA) && !((x)->e_flags & EF_ALPHA_32BIT))

 /*
  * These are used to set parameters in the core dumps.
···
 	( i_ == IMPLVER_EV5 ? "ev56"			\
 	  : amask (AMASK_CIX) ? "ev6" : "ev67");	\
 })
-
-#define SET_PERSONALITY(EX)					\
-	set_personality(((EX).e_flags & EF_ALPHA_32BIT)		\
-		? PER_LINUX_32BIT : PER_LINUX)

 extern int alpha_l1i_cacheshape;
 extern int alpha_l1d_cacheshape;
arch/alpha/include/asm/pgtable.h (+1 -1)
···
 extern void paging_init(void);

-/* We have our own get_unmapped_area to cope with ADDR_LIMIT_32BIT.  */
+/* We have our own get_unmapped_area */
 #define HAVE_ARCH_UNMAPPED_AREA

 #endif /* _ALPHA_PGTABLE_H */
arch/alpha/include/asm/processor.h (+2 -6)
···
 #ifndef __ASM_ALPHA_PROCESSOR_H
 #define __ASM_ALPHA_PROCESSOR_H

-#include <linux/personality.h>	/* for ADDR_LIMIT_32BIT */
-
 /*
  * We have a 42-bit user address space: 4TB user VM...
  */
 #define TASK_SIZE (0x40000000000UL)

-#define STACK_TOP \
-  (current->personality & ADDR_LIMIT_32BIT ? 0x80000000 : 0x00120000000UL)
+#define STACK_TOP	(0x00120000000UL)

 #define STACK_TOP_MAX	0x00120000000UL

 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
-#define TASK_UNMAPPED_BASE \
-  ((current->personality & ADDR_LIMIT_32BIT) ? 0x40000000 : TASK_SIZE / 2)
+#define TASK_UNMAPPED_BASE	(TASK_SIZE / 2)

 /* This is dead.  Everything has been moved to thread_info. */
 struct thread_struct { };
arch/alpha/kernel/osf_sys.c (+2 -9)
···
 	return ret;
 }

-/* Get an address range which is currently unmapped.  Similar to the
-   generic version except that we know how to honor ADDR_LIMIT_32BIT.  */
+/* Get an address range which is currently unmapped. */

 static unsigned long
 arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
···
 		       unsigned long len, unsigned long pgoff,
 		       unsigned long flags, vm_flags_t vm_flags)
 {
-	unsigned long limit;
-
-	/* "32 bit" actually means 31 bit, since pointers sign extend.  */
-	if (current->personality & ADDR_LIMIT_32BIT)
-		limit = 0x80000000;
-	else
-		limit = TASK_SIZE;
+	unsigned long limit = TASK_SIZE;

 	if (len > limit)
 		return -ENOMEM;
arch/arm64/kvm/arch_timer.c (+11 -38)
···
 	trace_kvm_timer_emulate(ctx, should_fire);

-	if (should_fire != ctx->irq.level) {
+	if (should_fire != ctx->irq.level)
 		kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
-		return;
-	}

 	kvm_timer_update_status(ctx, should_fire);
···
 					    timer_irq(map->direct_ptimer),
 					    &arch_timer_irq_ops);
 		WARN_ON_ONCE(ret);
-
-		/*
-		 * The virtual offset behaviour is "interesting", as it
-		 * always applies when HCR_EL2.E2H==0, but only when
-		 * accessed from EL1 when HCR_EL2.E2H==1. So make sure we
-		 * track E2H when putting the HV timer in "direct" mode.
-		 */
-		if (map->direct_vtimer == vcpu_hvtimer(vcpu)) {
-			struct arch_timer_offset *offs = &map->direct_vtimer->offset;
-
-			if (vcpu_el2_e2h_is_set(vcpu))
-				offs->vcpu_offset = NULL;
-			else
-				offs->vcpu_offset = &__vcpu_sys_reg(vcpu, CNTVOFF_EL2);
-		}
 	}
 }
···
 	 * which allows trapping of the timer registers even with NV2.
	 * Still, this is still worse than FEAT_NV on its own. Meh.
	 */
-	if (!vcpu_el2_e2h_is_set(vcpu)) {
-		if (cpus_have_final_cap(ARM64_HAS_ECV))
-			return;
-
-		/*
-		 * A non-VHE guest hypervisor doesn't have any direct access
-		 * to its timers: the EL2 registers trap (and the HW is
-		 * fully emulated), while the EL0 registers access memory
-		 * despite the access being notionally direct. Boo.
-		 *
-		 * We update the hardware timer registers with the
-		 * latest value written by the guest to the VNCR page
-		 * and let the hardware take care of the rest.
-		 */
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0), SYS_CNTV_CTL);
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0), SYS_CNTP_CTL);
-		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
-	} else {
+	if (!cpus_have_final_cap(ARM64_HAS_ECV)) {
 		/*
 		 * For a VHE guest hypervisor, the EL2 state is directly
-		 * stored in the host EL1 timers, while the emulated EL0
+		 * stored in the host EL1 timers, while the emulated EL1
 		 * state is stored in the VNCR page. The latter could have
 		 * been updated behind our back, and we must reset the
 		 * emulation of the timers.
+		 *
+		 * A non-VHE guest hypervisor doesn't have any direct access
+		 * to its timers: the EL2 registers trap despite being
+		 * notionally direct (we use the EL1 HW, as for VHE), while
+		 * the EL1 registers access memory.
+		 *
+		 * In both cases, process the emulated timers on each guest
+		 * exit. Boo.
 		 */
 		struct timer_map map;
 		get_timer_map(vcpu, &map);
arch/arm64/kvm/arm.c (+20)
···
 		break;
 	case -ENODEV:
 	case -ENXIO:
+		/*
+		 * No VGIC? No pKVM for you.
+		 *
+		 * Protected mode assumes that VGICv3 is present, so no point
+		 * in trying to hobble along if vgic initialization fails.
+		 */
+		if (is_protected_kvm_enabled())
+			goto out;
+
+		/*
+		 * Otherwise, userspace could choose to implement a GIC for its
+		 * guest on non-cooperative hardware.
+		 */
 		vgic_present = false;
 		err = 0;
 		break;
···
 	kvm_nvhe_sym(id_aa64smfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64SMFR0_EL1);
 	kvm_nvhe_sym(__icache_flags) = __icache_flags;
 	kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
+
+	/*
+	 * Flush entire BSS since part of its data containing init symbols is read
+	 * while the MMU is off.
+	 */
+	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
+				kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));
 }

 static int __init kvm_hyp_init_protection(u32 hyp_va_bits)
···
 	if (!tmp)
 		return -ENOMEM;

+	swap(kvm->arch.nested_mmus, tmp);
+
 	/*
 	 * If we went through a reallocation, adjust the MMU back-pointers in
 	 * the previously initialised kvm_pgtable structures.
 	 */
 	if (kvm->arch.nested_mmus != tmp)
 		for (int i = 0; i < kvm->arch.nested_mmus_size; i++)
-			tmp[i].pgt->mmu = &tmp[i];
+			kvm->arch.nested_mmus[i].pgt->mmu = &kvm->arch.nested_mmus[i];

 	for (int i = kvm->arch.nested_mmus_size; !ret && i < num_mmus; i++)
-		ret = init_nested_s2_mmu(kvm, &tmp[i]);
+		ret = init_nested_s2_mmu(kvm, &kvm->arch.nested_mmus[i]);

 	if (ret) {
 		for (int i = kvm->arch.nested_mmus_size; i < num_mmus; i++)
-			kvm_free_stage2_pgd(&tmp[i]);
+			kvm_free_stage2_pgd(&kvm->arch.nested_mmus[i]);

 		return ret;
 	}

 	kvm->arch.nested_mmus_size = num_mmus;
-	kvm->arch.nested_mmus = tmp;

 	return 0;
 }
···
 /**
  * struct gmap_struct - guest address space
  * @list: list head for the mm->context gmap list
- * @crst_list: list of all crst tables used in the guest address space
  * @mm: pointer to the parent mm_struct
  * @guest_to_host: radix tree with guest to host address translation
  * @host_to_guest: radix tree with pointer to segment table entries
···
  * @guest_handle: protected virtual machine handle for the ultravisor
  * @host_to_rmap: radix tree with gmap_rmap lists
  * @children: list of shadow gmap structures
- * @pt_list: list of all page tables used in the shadow guest address space
  * @shadow_lock: spinlock to protect the shadow gmap list
  * @parent: pointer to the parent gmap for shadow guest address spaces
  * @orig_asce: ASCE for which the shadow page table has been created
···
  */
 struct gmap {
 	struct list_head list;
-	struct list_head crst_list;
 	struct mm_struct *mm;
 	struct radix_tree_root guest_to_host;
 	struct radix_tree_root host_to_guest;
···
 	/* Additional data for shadow guest address spaces */
 	struct radix_tree_root host_to_rmap;
 	struct list_head children;
-	struct list_head pt_list;
 	spinlock_t shadow_lock;
 	struct gmap *parent;
 	unsigned long orig_asce;
···
 void gmap_remove(struct gmap *gmap);
 struct gmap *gmap_get(struct gmap *gmap);
 void gmap_put(struct gmap *gmap);
+void gmap_free(struct gmap *gmap);
+struct gmap *gmap_alloc(unsigned long limit);

 int gmap_map_segment(struct gmap *gmap, unsigned long from,
 		     unsigned long to, unsigned long len);
 int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len);
 unsigned long __gmap_translate(struct gmap *, unsigned long gaddr);
-unsigned long gmap_translate(struct gmap *, unsigned long gaddr);
 int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr);
-int gmap_fault(struct gmap *, unsigned long gaddr, unsigned int fault_flags);
 void gmap_discard(struct gmap *, unsigned long from, unsigned long to);
 void __gmap_zap(struct gmap *, unsigned long gaddr);
 void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);

 int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);

-struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
-			 int edat_level);
-int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level);
+void gmap_unshadow(struct gmap *sg);
 int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
 		    int fake);
 int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
···
 		    int fake);
 int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 		    int fake);
-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
-			   unsigned long *pgt, int *dat_protection, int *fake);
 int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);

 void gmap_register_pte_notifier(struct gmap_notifier *);
 void gmap_unregister_pte_notifier(struct gmap_notifier *);

-int gmap_mprotect_notify(struct gmap *, unsigned long start,
-			 unsigned long len, int prot);
+int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits);

 void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
 int s390_disable_cow_sharing(void);
-void s390_unlist_old_asce(struct gmap *gmap);
 int s390_replace_asce(struct gmap *gmap);
 void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
 int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
 			    unsigned long end, bool interruptible);
+int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split);
+unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);

 /**
  * s390_uv_destroy_range - Destroy a range of pages in the given mm.
arch/s390/include/asm/kvm_host.h (+5 -1)
···
 #define KVM_S390_ESCA_CPU_SLOTS 248
 #define KVM_MAX_VCPUS 255

+#define KVM_INTERNAL_MEM_SLOTS 1
+
 /*
  * These seem to be used for allocating ->chip in the routing table, which we
  * don't use. 1 is as small as we can get to reduce the needed memory. If we
···
 	u8 reserved928[0x1000 - 0x928];		/* 0x0928 */
 };

+struct vsie_page;
+
 struct kvm_s390_vsie {
 	struct mutex mutex;
 	struct radix_tree_root addr_to_page;
 	int page_count;
 	int next;
-	struct page *pages[KVM_MAX_VCPUS];
+	struct vsie_page *pages[KVM_MAX_VCPUS];
 };

 struct kvm_s390_gisa_iam {
arch/s390/include/asm/pgtable.h (+18 -3)
···
 #define PGSTE_HC_BIT	0x0020000000000000UL
 #define PGSTE_GR_BIT	0x0004000000000000UL
 #define PGSTE_GC_BIT	0x0002000000000000UL
-#define PGSTE_UC_BIT	0x0000800000000000UL	/* user dirty (migration) */
-#define PGSTE_IN_BIT	0x0000400000000000UL	/* IPTE notify bit */
-#define PGSTE_VSIE_BIT	0x0000200000000000UL	/* ref'd in a shadow table */
+#define PGSTE_ST2_MASK	0x0000ffff00000000UL
+#define PGSTE_UC_BIT	0x0000000000008000UL	/* user dirty (migration) */
+#define PGSTE_IN_BIT	0x0000000000004000UL	/* IPTE notify bit */
+#define PGSTE_VSIE_BIT	0x0000000000002000UL	/* ref'd in a shadow table */

 /* Guest Page State used for virtualization */
 #define _PGSTE_GPS_ZERO			0x0000000080000000UL
···

 #define pmd_pgtable(pmd) \
	((pgtable_t)__va(pmd_val(pmd) & -sizeof(pte_t)*PTRS_PER_PTE))
+
+static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
+{
+	unsigned long *pgstes, res;
+
+	pgstes = pgt + _PAGE_ENTRIES;
+
+	res = (pgstes[0] & PGSTE_ST2_MASK) << 16;
+	res |= pgstes[1] & PGSTE_ST2_MASK;
+	res |= (pgstes[2] & PGSTE_ST2_MASK) >> 16;
+	res |= (pgstes[3] & PGSTE_ST2_MASK) >> 32;
+
+	return res;
+}

 #endif /* _S390_PAGE_H */
arch/s390/include/asm/uv.h (+3 -3)
···
 }

 int uv_pin_shared(unsigned long paddr);
-int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
-int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);
 int uv_destroy_folio(struct folio *folio);
 int uv_destroy_pte(pte_t pte);
 int uv_convert_from_secure_pte(pte_t pte);
-int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);
+int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb);
+int uv_convert_from_secure(unsigned long paddr);
+int uv_convert_from_secure_folio(struct folio *folio);

 void setup_uv(void);
arch/s390/kernel/uv.c (+29 -263)
···
 #include <asm/sections.h>
 #include <asm/uv.h>

-#if !IS_ENABLED(CONFIG_KVM)
-unsigned long __gmap_translate(struct gmap *gmap, unsigned long gaddr)
-{
-	return 0;
-}
-
-int gmap_fault(struct gmap *gmap, unsigned long gaddr,
-	       unsigned int fault_flags)
-{
-	return 0;
-}
-#endif
-
 /* the bootdata_preserved fields come from ones in arch/s390/boot/uv.c */
 int __bootdata_preserved(prot_virt_guest);
 EXPORT_SYMBOL(prot_virt_guest);
···
 	folio_put(folio);
 	return rc;
 }
+EXPORT_SYMBOL(uv_destroy_folio);

 /*
  * The present PTE still indirectly holds a folio reference through the mapping.
···
  *
  * @paddr: Absolute host address of page to be exported
  */
-static int uv_convert_from_secure(unsigned long paddr)
+int uv_convert_from_secure(unsigned long paddr)
 {
 	struct uv_cb_cfs uvcb = {
 		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
···
 		return -EINVAL;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(uv_convert_from_secure);

 /*
  * The caller must already hold a reference to the folio.
  */
-static int uv_convert_from_secure_folio(struct folio *folio)
+int uv_convert_from_secure_folio(struct folio *folio)
 {
 	int rc;
···
 	folio_put(folio);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(uv_convert_from_secure_folio);

 /*
  * The present PTE still indirectly holds a folio reference through the mapping.
···
 	return res;
 }

-static int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
+/**
+ * make_folio_secure() - make a folio secure
+ * @folio: the folio to make secure
+ * @uvcb: the uvcb that describes the UVC to be used
+ *
+ * The folio @folio will be made secure if possible, @uvcb will be passed
+ * as-is to the UVC.
+ *
+ * Return: 0 on success;
+ *         -EBUSY if the folio is in writeback or has too many references;
+ *         -E2BIG if the folio is large;
+ *         -EAGAIN if the UVC needs to be attempted again;
+ *         -ENXIO if the address is not mapped;
+ *         -EINVAL if the UVC failed for other reasons.
+ *
+ * Context: The caller must hold exactly one extra reference on the folio
+ *          (it's the same logic as split_folio())
+ */
+int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 {
 	int expected, cc = 0;

+	if (folio_test_large(folio))
+		return -E2BIG;
 	if (folio_test_writeback(folio))
-		return -EAGAIN;
-	expected = expected_folio_refs(folio);
+		return -EBUSY;
+	expected = expected_folio_refs(folio) + 1;
 	if (!folio_ref_freeze(folio, expected))
 		return -EBUSY;
 	set_bit(PG_arch_1, &folio->flags);
···
 		return -EAGAIN;
 	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
 }
-
-/**
- * should_export_before_import - Determine whether an export is needed
- *				 before an import-like operation
- * @uvcb: the Ultravisor control block of the UVC to be performed
- * @mm: the mm of the process
- *
- * Returns whether an export is needed before every import-like operation.
- * This is needed for shared pages, which don't trigger a secure storage
- * exception when accessed from a different guest.
- *
- * Although considered as one, the Unpin Page UVC is not an actual import,
- * so it is not affected.
- *
- * No export is needed also when there is only one protected VM, because the
- * page cannot belong to the wrong VM in that case (there is no "other VM"
- * it can belong to).
- *
- * Return: true if an export is needed before every import, otherwise false.
- */
-static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
-{
-	/*
-	 * The misc feature indicates, among other things, that importing a
-	 * shared page from a different protected VM will automatically also
-	 * transfer its ownership.
-	 */
-	if (uv_has_feature(BIT_UV_FEAT_MISC))
-		return false;
-	if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
-		return false;
-	return atomic_read(&mm->context.protected_count) > 1;
-}
-
-/*
- * Drain LRU caches: the local one on first invocation and the ones of all
- * CPUs on successive invocations. Returns "true" on the first invocation.
- */
-static bool drain_lru(bool *drain_lru_called)
-{
-	/*
-	 * If we have tried a local drain and the folio refcount
-	 * still does not match our expected safe value, try with a
-	 * system wide drain. This is needed if the pagevecs holding
-	 * the page are on a different CPU.
-	 */
-	if (*drain_lru_called) {
-		lru_add_drain_all();
-		/* We give up here, don't retry immediately. */
-		return false;
-	}
-	/*
-	 * We are here if the folio refcount does not match the
-	 * expected safe value. The main culprits are usually
-	 * pagevecs. With lru_add_drain() we drain the pagevecs
-	 * on the local CPU so that hopefully the refcount will
-	 * reach the expected safe value.
-	 */
-	lru_add_drain();
-	*drain_lru_called = true;
-	/* The caller should try again immediately */
-	return true;
-}
-
-/*
- * Requests the Ultravisor to make a page accessible to a guest.
- * If it's brought in the first time, it will be cleared. If
- * it has been exported before, it will be decrypted and integrity
- * checked.
- */
-int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
-{
-	struct vm_area_struct *vma;
-	bool drain_lru_called = false;
-	spinlock_t *ptelock;
-	unsigned long uaddr;
-	struct folio *folio;
-	pte_t *ptep;
-	int rc;
-
-again:
-	rc = -EFAULT;
-	mmap_read_lock(gmap->mm);
-
-	uaddr = __gmap_translate(gmap, gaddr);
-	if (IS_ERR_VALUE(uaddr))
-		goto out;
-	vma = vma_lookup(gmap->mm, uaddr);
-	if (!vma)
-		goto out;
-	/*
-	 * Secure pages cannot be huge and userspace should not combine both.
-	 * In case userspace does it anyway this will result in an -EFAULT for
-	 * the unpack. The guest is thus never reaching secure mode. If
-	 * userspace is playing dirty tricks with mapping huge pages later
-	 * on this will result in a segmentation fault.
-	 */
-	if (is_vm_hugetlb_page(vma))
-		goto out;
-
-	rc = -ENXIO;
-	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
-	if (!ptep)
-		goto out;
-	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
-		folio = page_folio(pte_page(*ptep));
-		rc = -EAGAIN;
-		if (folio_test_large(folio)) {
-			rc = -E2BIG;
-		} else if (folio_trylock(folio)) {
-			if (should_export_before_import(uvcb, gmap->mm))
-				uv_convert_from_secure(PFN_PHYS(folio_pfn(folio)));
-			rc = make_folio_secure(folio, uvcb);
-			folio_unlock(folio);
-		}
-
-		/*
-		 * Once we drop the PTL, the folio may get unmapped and
-		 * freed immediately. We need a temporary reference.
-		 */
-		if (rc == -EAGAIN || rc == -E2BIG)
-			folio_get(folio);
-	}
-	pte_unmap_unlock(ptep, ptelock);
-out:
-	mmap_read_unlock(gmap->mm);
-
-	switch (rc) {
-	case -E2BIG:
-		folio_lock(folio);
-		rc = split_folio(folio);
-		folio_unlock(folio);
-		folio_put(folio);
-
-		switch (rc) {
-		case 0:
-			/* Splitting succeeded, try again immediately. */
-			goto again;
-		case -EAGAIN:
-			/* Additional folio references. */
-			if (drain_lru(&drain_lru_called))
-				goto again;
-			return -EAGAIN;
-		case -EBUSY:
-			/* Unexpected race. */
-			return -EAGAIN;
-		}
-		WARN_ON_ONCE(1);
-		return -ENXIO;
-	case -EAGAIN:
-		/*
-		 * If we are here because the UVC returned busy or partial
-		 * completion, this is just a useless check, but it is safe.
-		 */
-		folio_wait_writeback(folio);
-		folio_put(folio);
-		return -EAGAIN;
-	case -EBUSY:
-		/* Additional folio references. */
-		if (drain_lru(&drain_lru_called))
-			goto again;
-		return -EAGAIN;
-	case -ENXIO:
-		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
-			return -EFAULT;
-		return -EAGAIN;
-	}
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_make_secure);
-
-int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
-{
-	struct uv_cb_cts uvcb = {
-		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
-		.header.len = sizeof(uvcb),
-		.guest_handle = gmap->guest_handle,
-		.gaddr = gaddr,
-	};
-
-	return gmap_make_secure(gmap, gaddr, &uvcb);
-}
-EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
-
-/**
- * gmap_destroy_page - Destroy a guest page.
- * @gmap: the gmap of the guest
- * @gaddr: the guest address to destroy
- *
- * An attempt will be made to destroy the given guest page. If the attempt
- * fails, an attempt is made to export the page. If both attempts fail, an
- * appropriate error is returned.
- */
-int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
-{
-	struct vm_area_struct *vma;
-	struct folio_walk fw;
-	unsigned long uaddr;
-	struct folio *folio;
-	int rc;
-
-	rc = -EFAULT;
-	mmap_read_lock(gmap->mm);
-
-	uaddr = __gmap_translate(gmap, gaddr);
-	if (IS_ERR_VALUE(uaddr))
-		goto out;
-	vma = vma_lookup(gmap->mm, uaddr);
-	if (!vma)
-		goto out;
-	/*
-	 * Huge pages should not be able to become secure
-	 */
-	if (is_vm_hugetlb_page(vma))
-		goto out;
-
-	rc = 0;
-	folio = folio_walk_start(&fw, vma, uaddr, 0);
-	if (!folio)
-		goto out;
-	/*
-	 * See gmap_make_secure(): large folios cannot be secure. Small
-	 * folio implies FW_LEVEL_PTE.
-	 */
-	if (folio_test_large(folio) || !pte_write(fw.pte))
-		goto out_walk_end;
-	rc = uv_destroy_folio(folio);
-	/*
-	 * Fault handlers can race; it is possible that two CPUs will fault
-	 * on the same secure page. One CPU can destroy the page, reboot,
-	 * re-enter secure mode and import it, while the second CPU was
-	 * stuck at the beginning of the handler. At some point the second
-	 * CPU will be able to progress, and it will not be able to destroy
-	 * the page. In that case we do not want to terminate the process,
-	 * we instead try to export the page.
-	 */
-	if (rc)
-		rc = uv_convert_from_secure_folio(folio);
-out_walk_end:
-	folio_walk_end(&fw, vma);
-out:
-	mmap_read_unlock(gmap->mm);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(gmap_destroy_page);
+EXPORT_SYMBOL_GPL(make_folio_secure);

 /*
  * To be called with the folio locked or with an extra reference! This will
···1616#include <asm/gmap.h>1717#include <asm/dat-bits.h>1818#include "kvm-s390.h"1919+#include "gmap.h"1920#include "gaccess.h"20212122/*···13941393}1395139413961395/**13961396+ * shadow_pgt_lookup() - find a shadow page table13971397+ * @sg: pointer to the shadow guest address space structure13981398+ * @saddr: the address in the shadow guest address space13991399+ * @pgt: parent gmap address of the page table to get shadowed14001400+ * @dat_protection: if the pgtable is marked as protected by dat14011401+ * @fake: pgt references contiguous guest memory block, not a pgtable14021402+ *14031403+ * Returns 0 if the shadow page table was found and -EAGAIN if the page14041404+ * table was not found.14051405+ *14061406+ * Called with sg->mm->mmap_lock held in read mode.14071407+ */14081408+static int shadow_pgt_lookup(struct gmap *sg, unsigned long saddr, unsigned long *pgt,14091409+ int *dat_protection, int *fake)14101410+{14111411+ unsigned long pt_index;14121412+ unsigned long *table;14131413+ struct page *page;14141414+ int rc;14151415+14161416+ spin_lock(&sg->guest_table_lock);14171417+ table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */14181418+ if (table && !(*table & _SEGMENT_ENTRY_INVALID)) {14191419+ /* Shadow page tables are full pages (pte+pgste) */14201420+ page = pfn_to_page(*table >> PAGE_SHIFT);14211421+ pt_index = gmap_pgste_get_pgt_addr(page_to_virt(page));14221422+ *pgt = pt_index & ~GMAP_SHADOW_FAKE_TABLE;14231423+ *dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT);14241424+ *fake = !!(pt_index & GMAP_SHADOW_FAKE_TABLE);14251425+ rc = 0;14261426+ } else {14271427+ rc = -EAGAIN;14281428+ }14291429+ spin_unlock(&sg->guest_table_lock);14301430+ return rc;14311431+}14321432+14331433+/**13971434 * kvm_s390_shadow_fault - handle fault on a shadow page table13981435 * @vcpu: virtual cpu13991436 * @sg: pointer to the shadow guest address space structure···14541415 int dat_protection, fake;14551416 int rc;1456141714181418+ if 
(KVM_BUG_ON(!gmap_is_shadow(sg), vcpu->kvm))14191419+ return -EFAULT;14201420+14571421 mmap_read_lock(sg->mm);14581422 /*14591423 * We don't want any guest-2 tables to change - so the parent···14651423 */14661424 ipte_lock(vcpu->kvm);1467142514681468- rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);14261426+ rc = shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);14691427 if (rc)14701428 rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,14711429 &fake);
+142
arch/s390/kvm/gmap-vsie.c
···11+// SPDX-License-Identifier: GPL-2.022+/*33+ * Guest memory management for KVM/s390 nested VMs.44+ *55+ * Copyright IBM Corp. 2008, 2020, 202466+ *77+ * Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>88+ * Martin Schwidefsky <schwidefsky@de.ibm.com>99+ * David Hildenbrand <david@redhat.com>1010+ * Janosch Frank <frankja@linux.vnet.ibm.com>1111+ */1212+1313+#include <linux/compiler.h>1414+#include <linux/kvm.h>1515+#include <linux/kvm_host.h>1616+#include <linux/pgtable.h>1717+#include <linux/pagemap.h>1818+#include <linux/mman.h>1919+2020+#include <asm/lowcore.h>2121+#include <asm/gmap.h>2222+#include <asm/uv.h>2323+2424+#include "kvm-s390.h"2525+#include "gmap.h"2626+2727+/**2828+ * gmap_find_shadow - find a specific asce in the list of shadow tables2929+ * @parent: pointer to the parent gmap3030+ * @asce: ASCE for which the shadow table is created3131+ * @edat_level: edat level to be used for the shadow translation3232+ *3333+ * Returns the pointer to a gmap if a shadow table with the given asce is3434+ * already available, ERR_PTR(-EAGAIN) if another one is just being created,3535+ * otherwise NULL3636+ *3737+ * Context: Called with parent->shadow_lock held3838+ */3939+static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long asce, int edat_level)4040+{4141+ struct gmap *sg;4242+4343+ lockdep_assert_held(&parent->shadow_lock);4444+ list_for_each_entry(sg, &parent->children, list) {4545+ if (!gmap_shadow_valid(sg, asce, edat_level))4646+ continue;4747+ if (!sg->initialized)4848+ return ERR_PTR(-EAGAIN);4949+ refcount_inc(&sg->ref_count);5050+ return sg;5151+ }5252+ return NULL;5353+}5454+5555+/**5656+ * gmap_shadow - create/find a shadow guest address space5757+ * @parent: pointer to the parent gmap5858+ * @asce: ASCE for which the shadow table is created5959+ * @edat_level: edat level to be used for the shadow translation6060+ *6161+ * The pages of the top level page table referred by the asce parameter6262+ * will be set to read-only and 
marked in the PGSTEs of the kvm process.6363+ * The shadow table will be removed automatically on any change to the6464+ * PTE mapping for the source table.6565+ *6666+ * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,6767+ * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the6868+ * parent gmap table could not be protected.6969+ */7070+struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat_level)7171+{7272+ struct gmap *sg, *new;7373+ unsigned long limit;7474+ int rc;7575+7676+ if (KVM_BUG_ON(parent->mm->context.allow_gmap_hpage_1m, (struct kvm *)parent->private) ||7777+ KVM_BUG_ON(gmap_is_shadow(parent), (struct kvm *)parent->private))7878+ return ERR_PTR(-EFAULT);7979+ spin_lock(&parent->shadow_lock);8080+ sg = gmap_find_shadow(parent, asce, edat_level);8181+ spin_unlock(&parent->shadow_lock);8282+ if (sg)8383+ return sg;8484+ /* Create a new shadow gmap */8585+ limit = -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11));8686+ if (asce & _ASCE_REAL_SPACE)8787+ limit = -1UL;8888+ new = gmap_alloc(limit);8989+ if (!new)9090+ return ERR_PTR(-ENOMEM);9191+ new->mm = parent->mm;9292+ new->parent = gmap_get(parent);9393+ new->private = parent->private;9494+ new->orig_asce = asce;9595+ new->edat_level = edat_level;9696+ new->initialized = false;9797+ spin_lock(&parent->shadow_lock);9898+ /* Recheck if another CPU created the same shadow */9999+ sg = gmap_find_shadow(parent, asce, edat_level);100100+ if (sg) {101101+ spin_unlock(&parent->shadow_lock);102102+ gmap_free(new);103103+ return sg;104104+ }105105+ if (asce & _ASCE_REAL_SPACE) {106106+ /* only allow one real-space gmap shadow */107107+ list_for_each_entry(sg, &parent->children, list) {108108+ if (sg->orig_asce & _ASCE_REAL_SPACE) {109109+ spin_lock(&sg->guest_table_lock);110110+ gmap_unshadow(sg);111111+ spin_unlock(&sg->guest_table_lock);112112+ list_del(&sg->list);113113+ gmap_put(sg);114114+ break;115115+ }116116+ }117117+ }118118+ 
refcount_set(&new->ref_count, 2);119119+ list_add(&new->list, &parent->children);120120+ if (asce & _ASCE_REAL_SPACE) {121121+ /* nothing to protect, return right away */122122+ new->initialized = true;123123+ spin_unlock(&parent->shadow_lock);124124+ return new;125125+ }126126+ spin_unlock(&parent->shadow_lock);127127+ /* protect after insertion, so it will get properly invalidated */128128+ mmap_read_lock(parent->mm);129129+ rc = __kvm_s390_mprotect_many(parent, asce & _ASCE_ORIGIN,130130+ ((asce & _ASCE_TABLE_LENGTH) + 1),131131+ PROT_READ, GMAP_NOTIFY_SHADOW);132132+ mmap_read_unlock(parent->mm);133133+ spin_lock(&parent->shadow_lock);134134+ new->initialized = true;135135+ if (rc) {136136+ list_del(&new->list);137137+ gmap_free(new);138138+ new = ERR_PTR(rc);139139+ }140140+ spin_unlock(&parent->shadow_lock);141141+ return new;142142+}
+212
arch/s390/kvm/gmap.c
···11+// SPDX-License-Identifier: GPL-2.022+/*33+ * Guest memory management for KVM/s39044+ *55+ * Copyright IBM Corp. 2008, 2020, 202466+ *77+ * Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>88+ * Martin Schwidefsky <schwidefsky@de.ibm.com>99+ * David Hildenbrand <david@redhat.com>1010+ * Janosch Frank <frankja@linux.vnet.ibm.com>1111+ */1212+1313+#include <linux/compiler.h>1414+#include <linux/kvm.h>1515+#include <linux/kvm_host.h>1616+#include <linux/pgtable.h>1717+#include <linux/pagemap.h>1818+1919+#include <asm/lowcore.h>2020+#include <asm/gmap.h>2121+#include <asm/uv.h>2222+2323+#include "gmap.h"2424+2525+/**2626+ * should_export_before_import - Determine whether an export is needed2727+ * before an import-like operation2828+ * @uvcb: the Ultravisor control block of the UVC to be performed2929+ * @mm: the mm of the process3030+ *3131+ * Returns whether an export is needed before every import-like operation.3232+ * This is needed for shared pages, which don't trigger a secure storage3333+ * exception when accessed from a different guest.3434+ *3535+ * Although it is considered one, the Unpin Page UVC is not an actual import,3636+ * so it is not affected.3737+ *3838+ * An export is also not needed when there is only one protected VM, because3939+ * the page cannot belong to the wrong VM in that case (there is no "other VM"4040+ * it can belong to).4141+ *4242+ * Return: true if an export is needed before every import, otherwise false.4343+ */4444+static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)4545+{4646+ /*4747+ * The misc feature indicates, among other things, that importing a4848+ * shared page from a different protected VM will automatically also4949+ * transfer its ownership.5050+ */5151+ if (uv_has_feature(BIT_UV_FEAT_MISC))5252+ return false;5353+ if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)5454+ return false;5555+ return atomic_read(&mm->context.protected_count) > 1;5656+}5757+5858+static int 
__gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)5959+{6060+ struct folio *folio = page_folio(page);6161+ int rc;6262+6363+ /*6464+ * Secure pages cannot be huge and userspace should not combine both.6565+ * In case userspace does it anyway this will result in an -EFAULT for6666+ * the unpack. The guest thus never reaches secure mode.6767+ * If userspace plays dirty tricks and decides to map huge pages at a6868+ * later point in time, it will receive a segmentation fault or6969+ * KVM_RUN will return -EFAULT.7070+ */7171+ if (folio_test_hugetlb(folio))7272+ return -EFAULT;7373+ if (folio_test_large(folio)) {7474+ mmap_read_unlock(gmap->mm);7575+ rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, true);7676+ mmap_read_lock(gmap->mm);7777+ if (rc)7878+ return rc;7979+ folio = page_folio(page);8080+ }8181+8282+ if (!folio_trylock(folio))8383+ return -EAGAIN;8484+ if (should_export_before_import(uvcb, gmap->mm))8585+ uv_convert_from_secure(folio_to_phys(folio));8686+ rc = make_folio_secure(folio, uvcb);8787+ folio_unlock(folio);8888+8989+ /*9090+ * In theory a race is possible and the folio might have become9191+ * large again before the folio_trylock() above. 
In that case, no9292+ * action is performed and -EAGAIN is returned; the callers will9393+ * have to try again later.9494+ * In most cases this implies running the VM again, getting the same9595+ * exception again, and making another attempt in this function.9696+ * This is expected to happen extremely rarely.9797+ */9898+ if (rc == -E2BIG)9999+ return -EAGAIN;100100+ /* The folio has too many references, try to shake some off */101101+ if (rc == -EBUSY) {102102+ mmap_read_unlock(gmap->mm);103103+ kvm_s390_wiggle_split_folio(gmap->mm, folio, false);104104+ mmap_read_lock(gmap->mm);105105+ return -EAGAIN;106106+ }107107+108108+ return rc;109109+}110110+111111+/**112112+ * gmap_make_secure() - make one guest page secure113113+ * @gmap: the guest gmap114114+ * @gaddr: the guest address that needs to be made secure115115+ * @uvcb: the UVCB specifying which operation needs to be performed116116+ *117117+ * Context: needs to be called with kvm->srcu held.118118+ * Return: 0 on success, < 0 in case of error (see __gmap_make_secure()).119119+ */120120+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)121121+{122122+ struct kvm *kvm = gmap->private;123123+ struct page *page;124124+ int rc = 0;125125+126126+ lockdep_assert_held(&kvm->srcu);127127+128128+ page = gfn_to_page(kvm, gpa_to_gfn(gaddr));129129+ mmap_read_lock(gmap->mm);130130+ if (page)131131+ rc = __gmap_make_secure(gmap, page, uvcb);132132+ kvm_release_page_clean(page);133133+ mmap_read_unlock(gmap->mm);134134+135135+ return rc;136136+}137137+138138+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)139139+{140140+ struct uv_cb_cts uvcb = {141141+ .header.cmd = UVC_CMD_CONV_TO_SEC_STOR,142142+ .header.len = sizeof(uvcb),143143+ .guest_handle = gmap->guest_handle,144144+ .gaddr = gaddr,145145+ };146146+147147+ return gmap_make_secure(gmap, gaddr, &uvcb);148148+}149149+150150+/**151151+ * __gmap_destroy_page() - Destroy a guest page.152152+ * @gmap: the gmap of the guest153153+ * 
@page: the page to destroy154154+ *155155+ * An attempt will be made to destroy the given guest page. If the attempt156156+ * fails, an attempt is made to export the page. If both attempts fail, an157157+ * appropriate error is returned.158158+ *159159+ * Context: must be called holding the mm lock for gmap->mm160160+ */161161+static int __gmap_destroy_page(struct gmap *gmap, struct page *page)162162+{163163+ struct folio *folio = page_folio(page);164164+ int rc;165165+166166+ /*167167+ * See gmap_make_secure(): large folios cannot be secure. Small168168+ * folio implies FW_LEVEL_PTE.169169+ */170170+ if (folio_test_large(folio))171171+ return -EFAULT;172172+173173+ rc = uv_destroy_folio(folio);174174+ /*175175+ * Fault handlers can race; it is possible that two CPUs will fault176176+ * on the same secure page. One CPU can destroy the page, reboot,177177+ * re-enter secure mode and import it, while the second CPU was178178+ * stuck at the beginning of the handler. At some point the second179179+ * CPU will be able to progress, and it will not be able to destroy180180+ * the page. In that case we do not want to terminate the process,181181+ * we instead try to export the page.182182+ */183183+ if (rc)184184+ rc = uv_convert_from_secure_folio(folio);185185+186186+ return rc;187187+}188188+189189+/**190190+ * gmap_destroy_page() - Destroy a guest page.191191+ * @gmap: the gmap of the guest192192+ * @gaddr: the guest address to destroy193193+ *194194+ * An attempt will be made to destroy the given guest page. If the attempt195195+ * fails, an attempt is made to export the page. 
If both attempts fail, an196196+ * appropriate error is returned.197197+ *198198+ * Context: may sleep.199199+ */200200+int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)201201+{202202+ struct page *page;203203+ int rc = 0;204204+205205+ mmap_read_lock(gmap->mm);206206+ page = gfn_to_page(gmap->private, gpa_to_gfn(gaddr));207207+ if (page)208208+ rc = __gmap_destroy_page(gmap, page);209209+ kvm_release_page_clean(page);210210+ mmap_read_unlock(gmap->mm);211211+ return rc;212212+}
+39
arch/s390/kvm/gmap.h
···11+/* SPDX-License-Identifier: GPL-2.0 */22+/*33+ * KVM guest address space mapping code44+ *55+ * Copyright IBM Corp. 2007, 2016, 202566+ * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>77+ * Claudio Imbrenda <imbrenda@linux.ibm.com>88+ */99+1010+#ifndef ARCH_KVM_S390_GMAP_H1111+#define ARCH_KVM_S390_GMAP_H1212+1313+#define GMAP_SHADOW_FAKE_TABLE 1ULL1414+1515+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);1616+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);1717+int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);1818+struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat_level);1919+2020+/**2121+ * gmap_shadow_valid - check if a shadow guest address space matches the2222+ * given properties and is still valid2323+ * @sg: pointer to the shadow guest address space structure2424+ * @asce: ASCE for which the shadow table is requested2525+ * @edat_level: edat level to be used for the shadow translation2626+ *2727+ * Returns 1 if the gmap shadow is still valid and matches the given2828+ * properties; the caller can continue using it. Returns 0 otherwise; the2929+ * caller has to request a new shadow gmap in that case.3030+ *3131+ */3232+static inline int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level)3333+{3434+ if (sg->removed)3535+ return 0;3636+ return sg->orig_asce == asce && sg->edat_level == edat_level;3737+}3838+3939+#endif
+4-3
arch/s390/kvm/intercept.c
···2121#include "gaccess.h"2222#include "trace.h"2323#include "trace-s390.h"2424+#include "gmap.h"24252526u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu)2627{···368367 reg2, &srcaddr, GACC_FETCH, 0);369368 if (rc)370369 return kvm_s390_inject_prog_cond(vcpu, rc);371371- rc = gmap_fault(vcpu->arch.gmap, srcaddr, 0);370370+ rc = kvm_s390_handle_dat_fault(vcpu, srcaddr, 0);372371 if (rc != 0)373372 return rc;374373···377376 reg1, &dstaddr, GACC_STORE, 0);378377 if (rc)379378 return kvm_s390_inject_prog_cond(vcpu, rc);380380- rc = gmap_fault(vcpu->arch.gmap, dstaddr, FAULT_FLAG_WRITE);379379+ rc = kvm_s390_handle_dat_fault(vcpu, dstaddr, FOLL_WRITE);381380 if (rc != 0)382381 return rc;383382···550549 * If the unpin did not succeed, the guest will exit again for the UVC551550 * and we will retry the unpin.552551 */553553- if (rc == -EINVAL)552552+ if (rc == -EINVAL || rc == -ENXIO)554553 return 0;555554 /*556555 * If we got -EAGAIN here, we simply return it. It will eventually
+11-8
arch/s390/kvm/interrupt.c
···28932893 struct kvm_kernel_irq_routing_entry *e,28942894 const struct kvm_irq_routing_entry *ue)28952895{28962896- u64 uaddr;28962896+ u64 uaddr_s, uaddr_i;28972897+ int idx;2897289828982899 switch (ue->type) {28992900 /* we store the userspace addresses instead of the guest addresses */···29022901 if (kvm_is_ucontrol(kvm))29032902 return -EINVAL;29042903 e->set = set_adapter_int;29052905- uaddr = gmap_translate(kvm->arch.gmap, ue->u.adapter.summary_addr);29062906- if (uaddr == -EFAULT)29042904+29052905+ idx = srcu_read_lock(&kvm->srcu);29062906+ uaddr_s = gpa_to_hva(kvm, ue->u.adapter.summary_addr);29072907+ uaddr_i = gpa_to_hva(kvm, ue->u.adapter.ind_addr);29082908+ srcu_read_unlock(&kvm->srcu, idx);29092909+29102910+ if (kvm_is_error_hva(uaddr_s) || kvm_is_error_hva(uaddr_i))29072911 return -EFAULT;29082908- e->adapter.summary_addr = uaddr;29092909- uaddr = gmap_translate(kvm->arch.gmap, ue->u.adapter.ind_addr);29102910- if (uaddr == -EFAULT)29112911- return -EFAULT;29122912- e->adapter.ind_addr = uaddr;29122912+ e->adapter.summary_addr = uaddr_s;29132913+ e->adapter.ind_addr = uaddr_i;29132914 e->adapter.summary_offset = ue->u.adapter.summary_offset;29142915 e->adapter.ind_offset = ue->u.adapter.ind_offset;29152916 e->adapter.adapter_id = ue->u.adapter.adapter_id;
+197-40
arch/s390/kvm/kvm-s390.c
···5050#include "kvm-s390.h"5151#include "gaccess.h"5252#include "pci.h"5353+#include "gmap.h"53545455#define CREATE_TRACE_POINTS5556#include "trace.h"···34293428 VM_EVENT(kvm, 3, "vm created with type %lu", type);3430342934313430 if (type & KVM_VM_S390_UCONTROL) {34313431+ struct kvm_userspace_memory_region2 fake_memslot = {34323432+ .slot = KVM_S390_UCONTROL_MEMSLOT,34333433+ .guest_phys_addr = 0,34343434+ .userspace_addr = 0,34353435+ .memory_size = ALIGN_DOWN(TASK_SIZE, _SEGMENT_SIZE),34363436+ .flags = 0,34373437+ };34383438+34323439 kvm->arch.gmap = NULL;34333440 kvm->arch.mem_limit = KVM_S390_NO_MEM_LIMIT;34413441+ /* one flat fake memslot covering the whole address-space */34423442+ mutex_lock(&kvm->slots_lock);34433443+ KVM_BUG_ON(kvm_set_internal_memslot(kvm, &fake_memslot), kvm);34443444+ mutex_unlock(&kvm->slots_lock);34343445 } else {34353446 if (sclp.hamax == U64_MAX)34363447 kvm->arch.mem_limit = TASK_SIZE_MAX;···45114498 return kvm_s390_test_cpuflags(vcpu, CPUSTAT_IBS);45124499}4513450045014501+static int __kvm_s390_fixup_fault_sync(struct gmap *gmap, gpa_t gaddr, unsigned int flags)45024502+{45034503+ struct kvm *kvm = gmap->private;45044504+ gfn_t gfn = gpa_to_gfn(gaddr);45054505+ bool unlocked;45064506+ hva_t vmaddr;45074507+ gpa_t tmp;45084508+ int rc;45094509+45104510+ if (kvm_is_ucontrol(kvm)) {45114511+ tmp = __gmap_translate(gmap, gaddr);45124512+ gfn = gpa_to_gfn(tmp);45134513+ }45144514+45154515+ vmaddr = gfn_to_hva(kvm, gfn);45164516+ rc = fixup_user_fault(gmap->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked);45174517+ if (!rc)45184518+ rc = __gmap_link(gmap, gaddr, vmaddr);45194519+ return rc;45204520+}45214521+45224522+/**45234523+ * __kvm_s390_mprotect_many() - Apply specified protection to guest pages45244524+ * @gmap: the gmap of the guest45254525+ * @gpa: the starting guest address45264526+ * @npages: how many pages to protect45274527+ * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE45284528+ * @bits: pgste notification 
bits to set45294529+ *45304530+ * Returns: 0 in case of success, < 0 in case of error - see gmap_protect_one()45314531+ *45324532+ * Context: kvm->srcu and gmap->mm need to be held in read mode45334533+ */45344534+int __kvm_s390_mprotect_many(struct gmap *gmap, gpa_t gpa, u8 npages, unsigned int prot,45354535+ unsigned long bits)45364536+{45374537+ unsigned int fault_flag = (prot & PROT_WRITE) ? FAULT_FLAG_WRITE : 0;45384538+ gpa_t end = gpa + npages * PAGE_SIZE;45394539+ int rc;45404540+45414541+ for (; gpa < end; gpa = ALIGN(gpa + 1, rc)) {45424542+ rc = gmap_protect_one(gmap, gpa, prot, bits);45434543+ if (rc == -EAGAIN) {45444544+ __kvm_s390_fixup_fault_sync(gmap, gpa, fault_flag);45454545+ rc = gmap_protect_one(gmap, gpa, prot, bits);45464546+ }45474547+ if (rc < 0)45484548+ return rc;45494549+ }45504550+45514551+ return 0;45524552+}45534553+45544554+static int kvm_s390_mprotect_notify_prefix(struct kvm_vcpu *vcpu)45554555+{45564556+ gpa_t gaddr = kvm_s390_get_prefix(vcpu);45574557+ int idx, rc;45584558+45594559+ idx = srcu_read_lock(&vcpu->kvm->srcu);45604560+ mmap_read_lock(vcpu->arch.gmap->mm);45614561+45624562+ rc = __kvm_s390_mprotect_many(vcpu->arch.gmap, gaddr, 2, PROT_WRITE, GMAP_NOTIFY_MPROT);45634563+45644564+ mmap_read_unlock(vcpu->arch.gmap->mm);45654565+ srcu_read_unlock(&vcpu->kvm->srcu, idx);45664566+45674567+ return rc;45684568+}45694569+45144570static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)45154571{45164572retry:···45954513 */45964514 if (kvm_check_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu)) {45974515 int rc;45984598- rc = gmap_mprotect_notify(vcpu->arch.gmap,45994599- kvm_s390_get_prefix(vcpu),46004600- PAGE_SIZE * 2, PROT_WRITE);45164516+45174517+ rc = kvm_s390_mprotect_notify_prefix(vcpu);46014518 if (rc) {46024519 kvm_make_request(KVM_REQ_REFRESH_GUEST_PREFIX, vcpu);46034520 return rc;···48474766 return kvm_s390_inject_prog_irq(vcpu, &pgm_info);48484767}4849476847694769+static void kvm_s390_assert_primary_as(struct kvm_vcpu 
*vcpu)47704770+{47714771+ KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,47724772+ "Unexpected program interrupt 0x%x, TEID 0x%016lx",47734773+ current->thread.gmap_int_code, current->thread.gmap_teid.val);47744774+}47754775+47764776+/*47774777+ * __kvm_s390_handle_dat_fault() - handle a dat fault for the gmap of a vcpu47784778+ * @vcpu: the vCPU whose gmap is to be fixed up47794779+ * @gfn: the guest frame number used for memslots (including fake memslots)47804780+ * @gaddr: the gmap address, does not have to match @gfn for ucontrol gmaps47814781+ * @flags: FOLL_* flags47824782+ *47834783+ * Return: 0 on success, < 0 in case of error.47844784+ * Context: The mm lock must not be held before calling. May sleep.47854785+ */47864786+int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, unsigned int flags)47874787+{47884788+ struct kvm_memory_slot *slot;47894789+ unsigned int fault_flags;47904790+ bool writable, unlocked;47914791+ unsigned long vmaddr;47924792+ struct page *page;47934793+ kvm_pfn_t pfn;47944794+ int rc;47954795+47964796+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);47974797+ if (!slot || slot->flags & KVM_MEMSLOT_INVALID)47984798+ return vcpu_post_run_addressing_exception(vcpu);47994799+48004800+ fault_flags = flags & FOLL_WRITE ? 
FAULT_FLAG_WRITE : 0;48014801+ if (vcpu->arch.gmap->pfault_enabled)48024802+ flags |= FOLL_NOWAIT;48034803+ vmaddr = __gfn_to_hva_memslot(slot, gfn);48044804+48054805+try_again:48064806+ pfn = __kvm_faultin_pfn(slot, gfn, flags, &writable, &page);48074807+48084808+ /* Access outside memory, inject addressing exception */48094809+ if (is_noslot_pfn(pfn))48104810+ return vcpu_post_run_addressing_exception(vcpu);48114811+ /* Signal pending: try again */48124812+ if (pfn == KVM_PFN_ERR_SIGPENDING)48134813+ return -EAGAIN;48144814+48154815+ /* Needs I/O, try to setup async pfault (only possible with FOLL_NOWAIT) */48164816+ if (pfn == KVM_PFN_ERR_NEEDS_IO) {48174817+ trace_kvm_s390_major_guest_pfault(vcpu);48184818+ if (kvm_arch_setup_async_pf(vcpu))48194819+ return 0;48204820+ vcpu->stat.pfault_sync++;48214821+ /* Could not setup async pfault, try again synchronously */48224822+ flags &= ~FOLL_NOWAIT;48234823+ goto try_again;48244824+ }48254825+ /* Any other error */48264826+ if (is_error_pfn(pfn))48274827+ return -EFAULT;48284828+48294829+ /* Success */48304830+ mmap_read_lock(vcpu->arch.gmap->mm);48314831+ /* Mark the userspace PTEs as young and/or dirty, to avoid page fault loops */48324832+ rc = fixup_user_fault(vcpu->arch.gmap->mm, vmaddr, fault_flags, &unlocked);48334833+ if (!rc)48344834+ rc = __gmap_link(vcpu->arch.gmap, gaddr, vmaddr);48354835+ scoped_guard(spinlock, &vcpu->kvm->mmu_lock) {48364836+ kvm_release_faultin_page(vcpu->kvm, page, false, writable);48374837+ }48384838+ mmap_read_unlock(vcpu->arch.gmap->mm);48394839+ return rc;48404840+}48414841+48424842+static int vcpu_dat_fault_handler(struct kvm_vcpu *vcpu, unsigned long gaddr, unsigned int flags)48434843+{48444844+ unsigned long gaddr_tmp;48454845+ gfn_t gfn;48464846+48474847+ gfn = gpa_to_gfn(gaddr);48484848+ if (kvm_is_ucontrol(vcpu->kvm)) {48494849+ /*48504850+ * This translates the per-vCPU guest address into a48514851+ * fake guest address, which can then be used with the48524852+ * fake 
memslots that are identity mapping userspace.48534853+ * This allows ucontrol VMs to use the normal fault48544854+ * resolution path, like normal VMs.48554855+ */48564856+ mmap_read_lock(vcpu->arch.gmap->mm);48574857+ gaddr_tmp = __gmap_translate(vcpu->arch.gmap, gaddr);48584858+ mmap_read_unlock(vcpu->arch.gmap->mm);48594859+ if (gaddr_tmp == -EFAULT) {48604860+ vcpu->run->exit_reason = KVM_EXIT_S390_UCONTROL;48614861+ vcpu->run->s390_ucontrol.trans_exc_code = gaddr;48624862+ vcpu->run->s390_ucontrol.pgm_code = PGM_SEGMENT_TRANSLATION;48634863+ return -EREMOTE;48644864+ }48654865+ gfn = gpa_to_gfn(gaddr_tmp);48664866+ }48674867+ return __kvm_s390_handle_dat_fault(vcpu, gfn, gaddr, flags);48684868+}48694869+48504870static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu)48514871{48524872 unsigned int flags = 0;48534873 unsigned long gaddr;48544854- int rc = 0;4855487448564875 gaddr = current->thread.gmap_teid.addr * PAGE_SIZE;48574876 if (kvm_s390_cur_gmap_fault_is_write())···49624781 vcpu->stat.exit_null++;49634782 break;49644783 case PGM_NON_SECURE_STORAGE_ACCESS:49654965- KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,49664966- "Unexpected program interrupt 0x%x, TEID 0x%016lx",49674967- current->thread.gmap_int_code, current->thread.gmap_teid.val);47844784+ kvm_s390_assert_primary_as(vcpu);49684785 /*49694786 * This is normal operation; a page belonging to a protected49704787 * guest has not been imported yet. 
Try to import the page into···49734794 break;49744795 case PGM_SECURE_STORAGE_ACCESS:49754796 case PGM_SECURE_STORAGE_VIOLATION:49764976- KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,49774977- "Unexpected program interrupt 0x%x, TEID 0x%016lx",49784978- current->thread.gmap_int_code, current->thread.gmap_teid.val);47974797+ kvm_s390_assert_primary_as(vcpu);49794798 /*49804799 * This can happen after a reboot with asynchronous teardown;49814800 * the new guest (normal or protected) will run on top of the···50024825 case PGM_REGION_FIRST_TRANS:50034826 case PGM_REGION_SECOND_TRANS:50044827 case PGM_REGION_THIRD_TRANS:50055005- KVM_BUG(current->thread.gmap_teid.as != PSW_BITS_AS_PRIMARY, vcpu->kvm,50065006- "Unexpected program interrupt 0x%x, TEID 0x%016lx",50075007- current->thread.gmap_int_code, current->thread.gmap_teid.val);50085008- if (vcpu->arch.gmap->pfault_enabled) {50095009- rc = gmap_fault(vcpu->arch.gmap, gaddr, flags | FAULT_FLAG_RETRY_NOWAIT);50105010- if (rc == -EFAULT)50115011- return vcpu_post_run_addressing_exception(vcpu);50125012- if (rc == -EAGAIN) {50135013- trace_kvm_s390_major_guest_pfault(vcpu);50145014- if (kvm_arch_setup_async_pf(vcpu))50155015- return 0;50165016- vcpu->stat.pfault_sync++;50175017- } else {50185018- return rc;50195019- }50205020- }50215021- rc = gmap_fault(vcpu->arch.gmap, gaddr, flags);50225022- if (rc == -EFAULT) {50235023- if (kvm_is_ucontrol(vcpu->kvm)) {50245024- vcpu->run->exit_reason = KVM_EXIT_S390_UCONTROL;50255025- vcpu->run->s390_ucontrol.trans_exc_code = gaddr;50265026- vcpu->run->s390_ucontrol.pgm_code = 0x10;50275027- return -EREMOTE;50285028- }50295029- return vcpu_post_run_addressing_exception(vcpu);50305030- }50315031- break;48284828+ kvm_s390_assert_primary_as(vcpu);48294829+ return vcpu_dat_fault_handler(vcpu, gaddr, flags);50324830 default:50334831 KVM_BUG(1, vcpu->kvm, "Unexpected program interrupt 0x%x, TEID 0x%016lx",50344832 current->thread.gmap_int_code, 
current->thread.gmap_teid.val);50354833 send_sig(SIGSEGV, current, 0);50364834 break;50374835 }50385038- return rc;48364836+ return 0;50394837}5040483850414839static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)···58895737 }58905738#endif58915739 case KVM_S390_VCPU_FAULT: {58925892- r = gmap_fault(vcpu->arch.gmap, arg, 0);57405740+ idx = srcu_read_lock(&vcpu->kvm->srcu);57415741+ r = vcpu_dat_fault_handler(vcpu, arg, 0);57425742+ srcu_read_unlock(&vcpu->kvm->srcu, idx);58935743 break;58945744 }58955745 case KVM_ENABLE_CAP:···60075853{60085854 gpa_t size;6009585560106010- if (kvm_is_ucontrol(kvm))58565856+ if (kvm_is_ucontrol(kvm) && new->id < KVM_USER_MEM_SLOTS)60115857 return -EINVAL;6012585860135859 /* When we are protected, we should not change the memory slots */···60585904 enum kvm_mr_change change)60595905{60605906 int rc = 0;59075907+59085908+ if (kvm_is_ucontrol(kvm))59095909+ return;6061591060625911 switch (change) {60635912 case KVM_MR_DELETE:
···1717#include <linux/sched/mm.h>1818#include <linux/mmu_notifier.h>1919#include "kvm-s390.h"2020+#include "gmap.h"20212122bool kvm_s390_pv_is_protected(struct kvm *kvm)2223{···639638 .tweak[1] = offset,640639 };641640 int ret = gmap_make_secure(kvm->arch.gmap, addr, &uvcb);641641+ unsigned long vmaddr;642642+ bool unlocked;642643643644 *rc = uvcb.header.rc;644645 *rrc = uvcb.header.rrc;646646+647647+ if (ret == -ENXIO) {648648+ mmap_read_lock(kvm->mm);649649+ vmaddr = gfn_to_hva(kvm, gpa_to_gfn(addr));650650+ if (kvm_is_error_hva(vmaddr)) {651651+ ret = -EFAULT;652652+ } else {653653+ ret = fixup_user_fault(kvm->mm, vmaddr, FAULT_FLAG_WRITE, &unlocked);654654+ if (!ret)655655+ ret = __gmap_link(kvm->arch.gmap, addr, vmaddr);656656+ }657657+ mmap_read_unlock(kvm->mm);658658+ if (!ret)659659+ return -EAGAIN;660660+ return ret;661661+ }645662646663 if (ret && ret != -EAGAIN)647664 KVM_UV_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx with rc %x rrc %x",···678659679660 KVM_UV_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",680661 addr, size);662662+663663+ guard(srcu)(&kvm->srcu);681664682665 while (offset < size) {683666 ret = unpack_one(kvm, addr, tweak, offset, rc, rrc);
arch/s390/kvm/vsie.c
···1313#include <linux/bitmap.h>1414#include <linux/sched/signal.h>1515#include <linux/io.h>1616+#include <linux/mman.h>16171718#include <asm/gmap.h>1819#include <asm/mmu_context.h>···2322#include <asm/facility.h>2423#include "kvm-s390.h"2524#include "gaccess.h"2525+#include "gmap.h"2626+2727+enum vsie_page_flags {2828+ VSIE_PAGE_IN_USE = 0,2929+};26302731struct vsie_page {2832 struct kvm_s390_sie_block scb_s; /* 0x0000 */···5246 gpa_t gvrd_gpa; /* 0x0240 */5347 gpa_t riccbd_gpa; /* 0x0248 */5448 gpa_t sdnx_gpa; /* 0x0250 */5555- __u8 reserved[0x0700 - 0x0258]; /* 0x0258 */4949+ /*5050+ * guest address of the original SCB. Remains set for free vsie5151+ * pages, so we can properly look them up in our addr_to_page5252+ * radix tree.5353+ */5454+ gpa_t scb_gpa; /* 0x0258 */5555+ /*5656+ * Flags: must be set/cleared atomically after the vsie page can be5757+ * looked up by other CPUs.5858+ */5959+ unsigned long flags; /* 0x0260 */6060+ __u8 reserved[0x0700 - 0x0268]; /* 0x0268 */5661 struct kvm_s390_crypto_cb crycb; /* 0x0700 */5762 __u8 fac[S390_ARCH_FAC_LIST_SIZE_BYTE]; /* 0x0800 */5863};···601584 struct kvm *kvm = gmap->private;602585 struct vsie_page *cur;603586 unsigned long prefix;604604- struct page *page;605587 int i;606588607589 if (!gmap_is_shadow(gmap))···610594 * therefore we can safely reference them all the time.611595 */612596 for (i = 0; i < kvm->arch.vsie.page_count; i++) {613613- page = READ_ONCE(kvm->arch.vsie.pages[i]);614614- if (!page)597597+ cur = READ_ONCE(kvm->arch.vsie.pages[i]);598598+ if (!cur)615599 continue;616616- cur = page_to_virt(page);617600 if (READ_ONCE(cur->gmap) != gmap)618601 continue;619602 prefix = cur->scb_s.prefix << GUEST_PREFIX_SHIFT;···13601345 return rc;13611346}1362134713481348+/* Try getting a given vsie page, returning "true" on success. 
*/13491349+static inline bool try_get_vsie_page(struct vsie_page *vsie_page)13501350+{13511351+ if (test_bit(VSIE_PAGE_IN_USE, &vsie_page->flags))13521352+ return false;13531353+ return !test_and_set_bit(VSIE_PAGE_IN_USE, &vsie_page->flags);13541354+}13551355+13561356+/* Put a vsie page acquired through get_vsie_page / try_get_vsie_page. */13571357+static void put_vsie_page(struct vsie_page *vsie_page)13581358+{13591359+ clear_bit(VSIE_PAGE_IN_USE, &vsie_page->flags);13601360+}13611361+13631362/*13641363 * Get or create a vsie page for a scb address.13651364 *···13841355static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr)13851356{13861357 struct vsie_page *vsie_page;13871387- struct page *page;13881358 int nr_vcpus;1389135913901360 rcu_read_lock();13911391- page = radix_tree_lookup(&kvm->arch.vsie.addr_to_page, addr >> 9);13611361+ vsie_page = radix_tree_lookup(&kvm->arch.vsie.addr_to_page, addr >> 9);13921362 rcu_read_unlock();13931393- if (page) {13941394- if (page_ref_inc_return(page) == 2)13951395- return page_to_virt(page);13961396- page_ref_dec(page);13631363+ if (vsie_page) {13641364+ if (try_get_vsie_page(vsie_page)) {13651365+ if (vsie_page->scb_gpa == addr)13661366+ return vsie_page;13671367+ /*13681368+ * We raced with someone reusing + putting this vsie13691369+ * page before we grabbed it.13701370+ */13711371+ put_vsie_page(vsie_page);13721372+ }13971373 }1398137413991375 /*···1409137514101376 mutex_lock(&kvm->arch.vsie.mutex);14111377 if (kvm->arch.vsie.page_count < nr_vcpus) {14121412- page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | GFP_DMA);14131413- if (!page) {13781378+ vsie_page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | GFP_DMA);13791379+ if (!vsie_page) {14141380 mutex_unlock(&kvm->arch.vsie.mutex);14151381 return ERR_PTR(-ENOMEM);14161382 }14171417- page_ref_inc(page);14181418- kvm->arch.vsie.pages[kvm->arch.vsie.page_count] = page;13831383+ __set_bit(VSIE_PAGE_IN_USE, &vsie_page->flags);13841384+ 
kvm->arch.vsie.pages[kvm->arch.vsie.page_count] = vsie_page;14191385 kvm->arch.vsie.page_count++;14201386 } else {14211387 /* reuse an existing entry that belongs to nobody */14221388 while (true) {14231423- page = kvm->arch.vsie.pages[kvm->arch.vsie.next];14241424- if (page_ref_inc_return(page) == 2)13891389+ vsie_page = kvm->arch.vsie.pages[kvm->arch.vsie.next];13901390+ if (try_get_vsie_page(vsie_page))14251391 break;14261426- page_ref_dec(page);14271392 kvm->arch.vsie.next++;14281393 kvm->arch.vsie.next %= nr_vcpus;14291394 }14301430- radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9);13951395+ if (vsie_page->scb_gpa != ULONG_MAX)13961396+ radix_tree_delete(&kvm->arch.vsie.addr_to_page,13971397+ vsie_page->scb_gpa >> 9);14311398 }14321432- page->index = addr;14331433- /* double use of the same address */14341434- if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9, page)) {14351435- page_ref_dec(page);13991399+ /* Mark it as invalid until it resides in the tree. */14001400+ vsie_page->scb_gpa = ULONG_MAX;14011401+14021402+ /* Double use of the same address or allocation failure. 
*/14031403+ if (radix_tree_insert(&kvm->arch.vsie.addr_to_page, addr >> 9,14041404+ vsie_page)) {14051405+ put_vsie_page(vsie_page);14361406 mutex_unlock(&kvm->arch.vsie.mutex);14371407 return NULL;14381408 }14091409+ vsie_page->scb_gpa = addr;14391410 mutex_unlock(&kvm->arch.vsie.mutex);1440141114411441- vsie_page = page_to_virt(page);14421412 memset(&vsie_page->scb_s, 0, sizeof(struct kvm_s390_sie_block));14431413 release_gmap_shadow(vsie_page);14441414 vsie_page->fault_addr = 0;14451415 vsie_page->scb_s.ihcpu = 0xffffU;14461416 return vsie_page;14471447-}14481448-14491449-/* put a vsie page acquired via get_vsie_page */14501450-static void put_vsie_page(struct kvm *kvm, struct vsie_page *vsie_page)14511451-{14521452- struct page *page = pfn_to_page(__pa(vsie_page) >> PAGE_SHIFT);14531453-14541454- page_ref_dec(page);14551417}1456141814571419int kvm_s390_handle_vsie(struct kvm_vcpu *vcpu)···15001470out_unpin_scb:15011471 unpin_scb(vcpu, vsie_page, scb_addr);15021472out_put:15031503- put_vsie_page(vcpu->kvm, vsie_page);14731473+ put_vsie_page(vsie_page);1504147415051475 return rc < 0 ? 
rc : 0;15061476}···15161486void kvm_s390_vsie_destroy(struct kvm *kvm)15171487{15181488 struct vsie_page *vsie_page;15191519- struct page *page;15201489 int i;1521149015221491 mutex_lock(&kvm->arch.vsie.mutex);15231492 for (i = 0; i < kvm->arch.vsie.page_count; i++) {15241524- page = kvm->arch.vsie.pages[i];14931493+ vsie_page = kvm->arch.vsie.pages[i];15251494 kvm->arch.vsie.pages[i] = NULL;15261526- vsie_page = page_to_virt(page);15271495 release_gmap_shadow(vsie_page);15281496 /* free the radix tree entry */15291529- radix_tree_delete(&kvm->arch.vsie.addr_to_page, page->index >> 9);15301530- __free_page(page);14971497+ if (vsie_page->scb_gpa != ULONG_MAX)14981498+ radix_tree_delete(&kvm->arch.vsie.addr_to_page,14991499+ vsie_page->scb_gpa >> 9);15001500+ free_page((unsigned long)vsie_page);15311501 }15321502 kvm->arch.vsie.page_count = 0;15331503 mutex_unlock(&kvm->arch.vsie.mutex);
arch/s390/mm/gmap.c
···2424#include <asm/page.h>2525#include <asm/tlb.h>26262727+/*2828+ * The address is saved in a radix tree directly; NULL would be ambiguous,2929+ * since 0 is a valid address, and NULL is returned when nothing was found.3030+ * The lower bits are ignored by all users of the macro, so it can be used3131+ * to distinguish a valid address 0 from a NULL.3232+ */3333+#define VALID_GADDR_FLAG 13434+#define IS_GADDR_VALID(gaddr) ((gaddr) & VALID_GADDR_FLAG)3535+#define MAKE_VALID_GADDR(gaddr) (((gaddr) & HPAGE_MASK) | VALID_GADDR_FLAG)3636+2737#define GMAP_SHADOW_FAKE_TABLE 1ULL28382939static struct page *gmap_alloc_crst(void)···5343 *5444 * Returns a guest address space structure.5545 */5656-static struct gmap *gmap_alloc(unsigned long limit)4646+struct gmap *gmap_alloc(unsigned long limit)5747{5848 struct gmap *gmap;5949 struct page *page;···8070 gmap = kzalloc(sizeof(struct gmap), GFP_KERNEL_ACCOUNT);8171 if (!gmap)8272 goto out;8383- INIT_LIST_HEAD(&gmap->crst_list);8473 INIT_LIST_HEAD(&gmap->children);8585- INIT_LIST_HEAD(&gmap->pt_list);8674 INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL_ACCOUNT);8775 INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC | __GFP_ACCOUNT);8876 INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC | __GFP_ACCOUNT);···9082 page = gmap_alloc_crst();9183 if (!page)9284 goto out_free;9393- page->index = 0;9494- list_add(&page->lru, &gmap->crst_list);9585 table = page_to_virt(page);9686 crst_table_init(table, etype);9787 gmap->table = table;···10397out:10498 return NULL;10599}100100+EXPORT_SYMBOL_GPL(gmap_alloc);106101107102/**108103 * gmap_create - create a guest address space···192185 } while (nr > 0);193186}194187188188+static void gmap_free_crst(unsigned long *table, bool free_ptes)189189+{190190+ bool is_segment = (table[0] & _SEGMENT_ENTRY_TYPE_MASK) == 0;191191+ int i;192192+193193+ if (is_segment) {194194+ if (!free_ptes)195195+ goto out;196196+ for (i = 0; i < _CRST_ENTRIES; i++)197197+ if (!(table[i] & _SEGMENT_ENTRY_INVALID))198198+ 
page_table_free_pgste(page_ptdesc(phys_to_page(table[i])));199199+ } else {200200+ for (i = 0; i < _CRST_ENTRIES; i++)201201+ if (!(table[i] & _REGION_ENTRY_INVALID))202202+ gmap_free_crst(__va(table[i] & PAGE_MASK), free_ptes);203203+ }204204+205205+out:206206+ free_pages((unsigned long)table, CRST_ALLOC_ORDER);207207+}208208+195209/**196210 * gmap_free - free a guest address space197211 * @gmap: pointer to the guest address space structure198212 *199213 * No locks required. There are no references to this gmap anymore.200214 */201201-static void gmap_free(struct gmap *gmap)215215+void gmap_free(struct gmap *gmap)202216{203203- struct page *page, *next;204204-205217 /* Flush tlb of all gmaps (if not already done for shadows) */206218 if (!(gmap_is_shadow(gmap) && gmap->removed))207219 gmap_flush_tlb(gmap);208220 /* Free all segment & region tables. */209209- list_for_each_entry_safe(page, next, &gmap->crst_list, lru)210210- __free_pages(page, CRST_ALLOC_ORDER);221221+ gmap_free_crst(gmap->table, gmap_is_shadow(gmap));222222+211223 gmap_radix_tree_free(&gmap->guest_to_host);212224 gmap_radix_tree_free(&gmap->host_to_guest);213225214226 /* Free additional data for a shadow gmap */215227 if (gmap_is_shadow(gmap)) {216216- struct ptdesc *ptdesc, *n;217217-218218- /* Free all page tables. 
*/219219- list_for_each_entry_safe(ptdesc, n, &gmap->pt_list, pt_list)220220- page_table_free_pgste(ptdesc);221228 gmap_rmap_radix_tree_free(&gmap->host_to_rmap);222229 /* Release reference to the parent */223230 gmap_put(gmap->parent);···239218240219 kfree(gmap);241220}221221+EXPORT_SYMBOL_GPL(gmap_free);242222243223/**244224 * gmap_get - increase reference counter for guest address space···320298 crst_table_init(new, init);321299 spin_lock(&gmap->guest_table_lock);322300 if (*table & _REGION_ENTRY_INVALID) {323323- list_add(&page->lru, &gmap->crst_list);324301 *table = __pa(new) | _REGION_ENTRY_LENGTH |325302 (*table & _REGION_ENTRY_TYPE_MASK);326326- page->index = gaddr;327303 page = NULL;328304 }329305 spin_unlock(&gmap->guest_table_lock);···330310 return 0;331311}332312333333-/**334334- * __gmap_segment_gaddr - find virtual address from segment pointer335335- * @entry: pointer to a segment table entry in the guest address space336336- *337337- * Returns the virtual address in the guest address space for the segment338338- */339339-static unsigned long __gmap_segment_gaddr(unsigned long *entry)313313+static unsigned long host_to_guest_lookup(struct gmap *gmap, unsigned long vmaddr)340314{341341- struct page *page;342342- unsigned long offset;315315+ return (unsigned long)radix_tree_lookup(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);316316+}343317344344- offset = (unsigned long) entry / sizeof(unsigned long);345345- offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;346346- page = pmd_pgtable_page((pmd_t *) entry);347347- return page->index + offset;318318+static unsigned long host_to_guest_delete(struct gmap *gmap, unsigned long vmaddr)319319+{320320+ return (unsigned long)radix_tree_delete(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);321321+}322322+323323+static pmd_t *host_to_guest_pmd_delete(struct gmap *gmap, unsigned long vmaddr,324324+ unsigned long *gaddr)325325+{326326+ *gaddr = host_to_guest_delete(gmap, vmaddr);327327+ if (IS_GADDR_VALID(*gaddr))328328+ 
return (pmd_t *)gmap_table_walk(gmap, *gaddr, 1);329329+ return NULL;348330}349331350332/**···358336 */359337static int __gmap_unlink_by_vmaddr(struct gmap *gmap, unsigned long vmaddr)360338{361361- unsigned long *entry;339339+ unsigned long gaddr;362340 int flush = 0;341341+ pmd_t *pmdp;363342364343 BUG_ON(gmap_is_shadow(gmap));365344 spin_lock(&gmap->guest_table_lock);366366- entry = radix_tree_delete(&gmap->host_to_guest, vmaddr >> PMD_SHIFT);367367- if (entry) {368368- flush = (*entry != _SEGMENT_ENTRY_EMPTY);369369- *entry = _SEGMENT_ENTRY_EMPTY;345345+346346+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);347347+ if (pmdp) {348348+ flush = (pmd_val(*pmdp) != _SEGMENT_ENTRY_EMPTY);349349+ *pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);370350 }351351+371352 spin_unlock(&gmap->guest_table_lock);372353 return flush;373354}···489464EXPORT_SYMBOL_GPL(__gmap_translate);490465491466/**492492- * gmap_translate - translate a guest address to a user space address493493- * @gmap: pointer to guest mapping meta data structure494494- * @gaddr: guest address495495- *496496- * Returns user space address which corresponds to the guest address or497497- * -EFAULT if no such mapping exists.498498- * This function does not establish potentially missing page table entries.499499- */500500-unsigned long gmap_translate(struct gmap *gmap, unsigned long gaddr)501501-{502502- unsigned long rc;503503-504504- mmap_read_lock(gmap->mm);505505- rc = __gmap_translate(gmap, gaddr);506506- mmap_read_unlock(gmap->mm);507507- return rc;508508-}509509-EXPORT_SYMBOL_GPL(gmap_translate);510510-511511-/**512467 * gmap_unlink - disconnect a page table from the gmap shadow tables513468 * @mm: pointer to the parent mm_struct514469 * @table: pointer to the host page table···587582 spin_lock(&gmap->guest_table_lock);588583 if (*table == _SEGMENT_ENTRY_EMPTY) {589584 rc = radix_tree_insert(&gmap->host_to_guest,590590- vmaddr >> PMD_SHIFT, table);585585+ vmaddr >> PMD_SHIFT,586586+ (void 
*)MAKE_VALID_GADDR(gaddr));591587 if (!rc) {592588 if (pmd_leaf(*pmd)) {593589 *table = (pmd_val(*pmd) &···611605 radix_tree_preload_end();612606 return rc;613607}614614-615615-/**616616- * fixup_user_fault_nowait - manually resolve a user page fault without waiting617617- * @mm: mm_struct of target mm618618- * @address: user address619619- * @fault_flags:flags to pass down to handle_mm_fault()620620- * @unlocked: did we unlock the mmap_lock while retrying621621- *622622- * This function behaves similarly to fixup_user_fault(), but it guarantees623623- * that the fault will be resolved without waiting. The function might drop624624- * and re-acquire the mm lock, in which case @unlocked will be set to true.625625- *626626- * The guarantee is that the fault is handled without waiting, but the627627- * function itself might sleep, due to the lock.628628- *629629- * Context: Needs to be called with mm->mmap_lock held in read mode, and will630630- * return with the lock held in read mode; @unlocked will indicate whether631631- * the lock has been dropped and re-acquired. This is the same behaviour as632632- * fixup_user_fault().633633- *634634- * Return: 0 on success, -EAGAIN if the fault cannot be resolved without635635- * waiting, -EFAULT if the fault cannot be resolved, -ENOMEM if out of636636- * memory.637637- */638638-static int fixup_user_fault_nowait(struct mm_struct *mm, unsigned long address,639639- unsigned int fault_flags, bool *unlocked)640640-{641641- struct vm_area_struct *vma;642642- unsigned int test_flags;643643- vm_fault_t fault;644644- int rc;645645-646646- fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;647647- test_flags = fault_flags & FAULT_FLAG_WRITE ? 
VM_WRITE : VM_READ;648648-649649- vma = find_vma(mm, address);650650- if (unlikely(!vma || address < vma->vm_start))651651- return -EFAULT;652652- if (unlikely(!(vma->vm_flags & test_flags)))653653- return -EFAULT;654654-655655- fault = handle_mm_fault(vma, address, fault_flags, NULL);656656- /* the mm lock has been dropped, take it again */657657- if (fault & VM_FAULT_COMPLETED) {658658- *unlocked = true;659659- mmap_read_lock(mm);660660- return 0;661661- }662662- /* the mm lock has not been dropped */663663- if (fault & VM_FAULT_ERROR) {664664- rc = vm_fault_to_errno(fault, 0);665665- BUG_ON(!rc);666666- return rc;667667- }668668- /* the mm lock has not been dropped because of FAULT_FLAG_RETRY_NOWAIT */669669- if (fault & VM_FAULT_RETRY)670670- return -EAGAIN;671671- /* nothing needed to be done and the mm lock has not been dropped */672672- return 0;673673-}674674-675675-/**676676- * __gmap_fault - resolve a fault on a guest address677677- * @gmap: pointer to guest mapping meta data structure678678- * @gaddr: guest address679679- * @fault_flags: flags to pass down to handle_mm_fault()680680- *681681- * Context: Needs to be called with mm->mmap_lock held in read mode. Might682682- * drop and re-acquire the lock. 
Will always return with the lock held.683683- */684684-static int __gmap_fault(struct gmap *gmap, unsigned long gaddr, unsigned int fault_flags)685685-{686686- unsigned long vmaddr;687687- bool unlocked;688688- int rc = 0;689689-690690-retry:691691- unlocked = false;692692-693693- vmaddr = __gmap_translate(gmap, gaddr);694694- if (IS_ERR_VALUE(vmaddr))695695- return vmaddr;696696-697697- if (fault_flags & FAULT_FLAG_RETRY_NOWAIT)698698- rc = fixup_user_fault_nowait(gmap->mm, vmaddr, fault_flags, &unlocked);699699- else700700- rc = fixup_user_fault(gmap->mm, vmaddr, fault_flags, &unlocked);701701- if (rc)702702- return rc;703703- /*704704- * In the case that fixup_user_fault unlocked the mmap_lock during705705- * fault-in, redo __gmap_translate() to avoid racing with a706706- * map/unmap_segment.707707- * In particular, __gmap_translate(), fixup_user_fault{,_nowait}(),708708- * and __gmap_link() must all be called atomically in one go; if the709709- * lock had been dropped in between, a retry is needed.710710- */711711- if (unlocked)712712- goto retry;713713-714714- return __gmap_link(gmap, gaddr, vmaddr);715715-}716716-717717-/**718718- * gmap_fault - resolve a fault on a guest address719719- * @gmap: pointer to guest mapping meta data structure720720- * @gaddr: guest address721721- * @fault_flags: flags to pass down to handle_mm_fault()722722- *723723- * Returns 0 on success, -ENOMEM for out of memory conditions, -EFAULT if the724724- * vm address is already mapped to a different guest segment, and -EAGAIN if725725- * FAULT_FLAG_RETRY_NOWAIT was specified and the fault could not be processed726726- * immediately.727727- */728728-int gmap_fault(struct gmap *gmap, unsigned long gaddr, unsigned int fault_flags)729729-{730730- int rc;731731-732732- mmap_read_lock(gmap->mm);733733- rc = __gmap_fault(gmap, gaddr, fault_flags);734734- mmap_read_unlock(gmap->mm);735735- return 
rc;736736-}737737-EXPORT_SYMBOL_GPL(gmap_fault);608608+EXPORT_SYMBOL(__gmap_link);738609739610/*740611 * this function is assumed to be called with mmap_lock held···736853 *737854 * Note: Can also be called for shadow gmaps.738855 */739739-static inline unsigned long *gmap_table_walk(struct gmap *gmap,740740- unsigned long gaddr, int level)856856+unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level)741857{742858 const int asce_type = gmap->asce & _ASCE_TYPE_MASK;743859 unsigned long *table = gmap->table;···787905 }788906 return table;789907}908908+EXPORT_SYMBOL(gmap_table_walk);790909791910/**792911 * gmap_pte_op_walk - walk the gmap page table, get the page table lock···9841101 * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE9851102 * @bits: pgste notification bits to set9861103 *987987- * Returns 0 if successfully protected, -ENOMEM if out of memory and988988- * -EFAULT if gaddr is invalid (or mapping for shadows is missing).11041104+ * Returns:11051105+ * PAGE_SIZE if a small page was successfully protected;11061106+ * HPAGE_SIZE if a large page was successfully protected;11071107+ * -ENOMEM if out of memory;11081108+ * -EFAULT if gaddr is invalid (or mapping for shadows is missing);11091109+ * -EAGAIN if the guest mapping is missing and should be fixed by the caller.9891110 *990990- * Called with sg->mm->mmap_lock in read.11111111+ * Context: Called with sg->mm->mmap_lock in read.9911112 */992992-static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,993993- unsigned long len, int prot, unsigned long bits)11131113+int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits)9941114{995995- unsigned long vmaddr, dist;9961115 pmd_t *pmdp;997997- int rc;11161116+ int rc = 0;99811179991118 BUG_ON(gmap_is_shadow(gmap));10001000- while (len) {10011001- rc = -EAGAIN;10021002- pmdp = gmap_pmd_op_walk(gmap, gaddr);10031003- if (pmdp) {10041004- if (!pmd_leaf(*pmdp)) {10051005- rc 
= gmap_protect_pte(gmap, gaddr, pmdp, prot,10061006- bits);10071007- if (!rc) {10081008- len -= PAGE_SIZE;10091009- gaddr += PAGE_SIZE;10101010- }10111011- } else {10121012- rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot,10131013- bits);10141014- if (!rc) {10151015- dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);10161016- len = len < dist ? 0 : len - dist;10171017- gaddr = (gaddr & HPAGE_MASK) + HPAGE_SIZE;10181018- }10191019- }10201020- gmap_pmd_op_end(gmap, pmdp);10211021- }10221022- if (rc) {10231023- if (rc == -EINVAL)10241024- return rc;1025111910261026- /* -EAGAIN, fixup of userspace mm and gmap */10271027- vmaddr = __gmap_translate(gmap, gaddr);10281028- if (IS_ERR_VALUE(vmaddr))10291029- return vmaddr;10301030- rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);10311031- if (rc)10321032- return rc;10331033- }11201120+ pmdp = gmap_pmd_op_walk(gmap, gaddr);11211121+ if (!pmdp)11221122+ return -EAGAIN;11231123+11241124+ if (!pmd_leaf(*pmdp)) {11251125+ rc = gmap_protect_pte(gmap, gaddr, pmdp, prot, bits);11261126+ if (!rc)11271127+ rc = PAGE_SIZE;11281128+ } else {11291129+ rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot, bits);11301130+ if (!rc)11311131+ rc = HPAGE_SIZE;10341132 }10351035- return 0;10361036-}11331133+ gmap_pmd_op_end(gmap, pmdp);1037113410381038-/**10391039- * gmap_mprotect_notify - change access rights for a range of ptes and10401040- * call the notifier if any pte changes again10411041- * @gmap: pointer to guest mapping meta data structure10421042- * @gaddr: virtual address in the guest address space10431043- * @len: size of area10441044- * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE10451045- *10461046- * Returns 0 if for each page in the given range a gmap mapping exists,10471047- * the new access rights could be set and the notifier could be armed.10481048- * If the gmap mapping is missing for one or more pages -EFAULT is10491049- * returned. 
If no memory could be allocated -ENOMEM is returned.10501050- * This function establishes missing page table entries.10511051- */10521052-int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,10531053- unsigned long len, int prot)10541054-{10551055- int rc;10561056-10571057- if ((gaddr & ~PAGE_MASK) || (len & ~PAGE_MASK) || gmap_is_shadow(gmap))10581058- return -EINVAL;10591059- if (!MACHINE_HAS_ESOP && prot == PROT_READ)10601060- return -EINVAL;10611061- mmap_read_lock(gmap->mm);10621062- rc = gmap_protect_range(gmap, gaddr, len, prot, GMAP_NOTIFY_MPROT);10631063- mmap_read_unlock(gmap->mm);10641135 return rc;10651136}10661066-EXPORT_SYMBOL_GPL(gmap_mprotect_notify);11371137+EXPORT_SYMBOL_GPL(gmap_protect_one);1067113810681139/**10691140 * gmap_read_table - get an unsigned long value from a guest page table using···12511414 __gmap_unshadow_pgt(sg, raddr, __va(pgt));12521415 /* Free page table */12531416 ptdesc = page_ptdesc(phys_to_page(pgt));12541254- list_del(&ptdesc->pt_list);12551417 page_table_free_pgste(ptdesc);12561418}12571419···12781442 __gmap_unshadow_pgt(sg, raddr, __va(pgt));12791443 /* Free page table */12801444 ptdesc = page_ptdesc(phys_to_page(pgt));12811281- list_del(&ptdesc->pt_list);12821445 page_table_free_pgste(ptdesc);12831446 }12841447}···13071472 __gmap_unshadow_sgt(sg, raddr, __va(sgt));13081473 /* Free segment table */13091474 page = phys_to_page(sgt);13101310- list_del(&page->lru);13111475 __free_pages(page, CRST_ALLOC_ORDER);13121476}13131477···13341500 __gmap_unshadow_sgt(sg, raddr, __va(sgt));13351501 /* Free segment table */13361502 page = phys_to_page(sgt);13371337- list_del(&page->lru);13381503 __free_pages(page, CRST_ALLOC_ORDER);13391504 }13401505}···13631530 __gmap_unshadow_r3t(sg, raddr, __va(r3t));13641531 /* Free region 3 table */13651532 page = phys_to_page(r3t);13661366- list_del(&page->lru);13671533 __free_pages(page, CRST_ALLOC_ORDER);13681534}13691535···13901558 __gmap_unshadow_r3t(sg, raddr, __va(r3t));13911559 
/* Free region 3 table */13921560 page = phys_to_page(r3t);13931393- list_del(&page->lru);13941561 __free_pages(page, CRST_ALLOC_ORDER);13951562 }13961563}···14191588 __gmap_unshadow_r2t(sg, raddr, __va(r2t));14201589 /* Free region 2 table */14211590 page = phys_to_page(r2t);14221422- list_del(&page->lru);14231591 __free_pages(page, CRST_ALLOC_ORDER);14241592}14251593···14501620 r1t[i] = _REGION1_ENTRY_EMPTY;14511621 /* Free region 2 table */14521622 page = phys_to_page(r2t);14531453- list_del(&page->lru);14541623 __free_pages(page, CRST_ALLOC_ORDER);14551624 }14561625}···14601631 *14611632 * Called with sg->guest_table_lock14621633 */14631463-static void gmap_unshadow(struct gmap *sg)16341634+void gmap_unshadow(struct gmap *sg)14641635{14651636 unsigned long *table;14661637···14861657 break;14871658 }14881659}14891489-14901490-/**14911491- * gmap_find_shadow - find a specific asce in the list of shadow tables14921492- * @parent: pointer to the parent gmap14931493- * @asce: ASCE for which the shadow table is created14941494- * @edat_level: edat level to be used for the shadow translation14951495- *14961496- * Returns the pointer to a gmap if a shadow table with the given asce is14971497- * already available, ERR_PTR(-EAGAIN) if another one is just being created,14981498- * otherwise NULL14991499- */15001500-static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long asce,15011501- int edat_level)15021502-{15031503- struct gmap *sg;15041504-15051505- list_for_each_entry(sg, &parent->children, list) {15061506- if (sg->orig_asce != asce || sg->edat_level != edat_level ||15071507- sg->removed)15081508- continue;15091509- if (!sg->initialized)15101510- return ERR_PTR(-EAGAIN);15111511- refcount_inc(&sg->ref_count);15121512- return sg;15131513- }15141514- return NULL;15151515-}15161516-15171517-/**15181518- * gmap_shadow_valid - check if a shadow guest address space matches the15191519- * given properties and is still valid15201520- * @sg: pointer to the 
shadow guest address space structure15211521- * @asce: ASCE for which the shadow table is requested15221522- * @edat_level: edat level to be used for the shadow translation15231523- *15241524- * Returns 1 if the gmap shadow is still valid and matches the given15251525- * properties, the caller can continue using it. Returns 0 otherwise, the15261526- * caller has to request a new shadow gmap in this case.15271527- *15281528- */15291529-int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level)15301530-{15311531- if (sg->removed)15321532- return 0;15331533- return sg->orig_asce == asce && sg->edat_level == edat_level;15341534-}15351535-EXPORT_SYMBOL_GPL(gmap_shadow_valid);15361536-15371537-/**15381538- * gmap_shadow - create/find a shadow guest address space15391539- * @parent: pointer to the parent gmap15401540- * @asce: ASCE for which the shadow table is created15411541- * @edat_level: edat level to be used for the shadow translation15421542- *15431543- * The pages of the top level page table referred by the asce parameter15441544- * will be set to read-only and marked in the PGSTEs of the kvm process.15451545- * The shadow table will be removed automatically on any change to the15461546- * PTE mapping for the source table.15471547- *15481548- * Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,15491549- * ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the15501550- * parent gmap table could not be protected.15511551- */15521552-struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,15531553- int edat_level)15541554-{15551555- struct gmap *sg, *new;15561556- unsigned long limit;15571557- int rc;15581558-15591559- BUG_ON(parent->mm->context.allow_gmap_hpage_1m);15601560- BUG_ON(gmap_is_shadow(parent));15611561- spin_lock(&parent->shadow_lock);15621562- sg = gmap_find_shadow(parent, asce, edat_level);15631563- spin_unlock(&parent->shadow_lock);15641564- if (sg)15651565- return sg;15661566- /* 
Create a new shadow gmap */15671567- limit = -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11));15681568- if (asce & _ASCE_REAL_SPACE)15691569- limit = -1UL;15701570- new = gmap_alloc(limit);15711571- if (!new)15721572- return ERR_PTR(-ENOMEM);15731573- new->mm = parent->mm;15741574- new->parent = gmap_get(parent);15751575- new->private = parent->private;15761576- new->orig_asce = asce;15771577- new->edat_level = edat_level;15781578- new->initialized = false;15791579- spin_lock(&parent->shadow_lock);15801580- /* Recheck if another CPU created the same shadow */15811581- sg = gmap_find_shadow(parent, asce, edat_level);15821582- if (sg) {15831583- spin_unlock(&parent->shadow_lock);15841584- gmap_free(new);15851585- return sg;15861586- }15871587- if (asce & _ASCE_REAL_SPACE) {15881588- /* only allow one real-space gmap shadow */15891589- list_for_each_entry(sg, &parent->children, list) {15901590- if (sg->orig_asce & _ASCE_REAL_SPACE) {15911591- spin_lock(&sg->guest_table_lock);15921592- gmap_unshadow(sg);15931593- spin_unlock(&sg->guest_table_lock);15941594- list_del(&sg->list);15951595- gmap_put(sg);15961596- break;15971597- }15981598- }15991599- }16001600- refcount_set(&new->ref_count, 2);16011601- list_add(&new->list, &parent->children);16021602- if (asce & _ASCE_REAL_SPACE) {16031603- /* nothing to protect, return right away */16041604- new->initialized = true;16051605- spin_unlock(&parent->shadow_lock);16061606- return new;16071607- }16081608- spin_unlock(&parent->shadow_lock);16091609- /* protect after insertion, so it will get properly invalidated */16101610- mmap_read_lock(parent->mm);16111611- rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,16121612- ((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,16131613- PROT_READ, GMAP_NOTIFY_SHADOW);16141614- mmap_read_unlock(parent->mm);16151615- spin_lock(&parent->shadow_lock);16161616- new->initialized = true;16171617- if (rc) {16181618- list_del(&new->list);16191619- gmap_free(new);16201620- new = 
ERR_PTR(rc);16211621- }16221622- spin_unlock(&parent->shadow_lock);16231623- return new;16241624-}16251625-EXPORT_SYMBOL_GPL(gmap_shadow);16601660+EXPORT_SYMBOL(gmap_unshadow);1626166116271662/**16281663 * gmap_shadow_r2t - create an empty shadow region 2 table···15201827 page = gmap_alloc_crst();15211828 if (!page)15221829 return -ENOMEM;15231523- page->index = r2t & _REGION_ENTRY_ORIGIN;15241524- if (fake)15251525- page->index |= GMAP_SHADOW_FAKE_TABLE;15261830 s_r2t = page_to_phys(page);15271831 /* Install shadow region second table */15281832 spin_lock(&sg->guest_table_lock);···15411851 _REGION_ENTRY_TYPE_R1 | _REGION_ENTRY_INVALID;15421852 if (sg->edat_level >= 1)15431853 *table |= (r2t & _REGION_ENTRY_PROTECT);15441544- list_add(&page->lru, &sg->crst_list);15451854 if (fake) {15461855 /* nothing to protect for fake tables */15471856 *table &= ~_REGION_ENTRY_INVALID;···16001911 page = gmap_alloc_crst();16011912 if (!page)16021913 return -ENOMEM;16031603- page->index = r3t & _REGION_ENTRY_ORIGIN;16041604- if (fake)16051605- page->index |= GMAP_SHADOW_FAKE_TABLE;16061914 s_r3t = page_to_phys(page);16071915 /* Install shadow region second table */16081916 spin_lock(&sg->guest_table_lock);···16211935 _REGION_ENTRY_TYPE_R2 | _REGION_ENTRY_INVALID;16221936 if (sg->edat_level >= 1)16231937 *table |= (r3t & _REGION_ENTRY_PROTECT);16241624- list_add(&page->lru, &sg->crst_list);16251938 if (fake) {16261939 /* nothing to protect for fake tables */16271940 *table &= ~_REGION_ENTRY_INVALID;···16801995 page = gmap_alloc_crst();16811996 if (!page)16821997 return -ENOMEM;16831683- page->index = sgt & _REGION_ENTRY_ORIGIN;16841684- if (fake)16851685- page->index |= GMAP_SHADOW_FAKE_TABLE;16861998 s_sgt = page_to_phys(page);16871999 /* Install shadow region second table */16882000 spin_lock(&sg->guest_table_lock);···17012019 _REGION_ENTRY_TYPE_R3 | _REGION_ENTRY_INVALID;17022020 if (sg->edat_level >= 1)17032021 *table |= sgt & _REGION_ENTRY_PROTECT;17041704- 
list_add(&page->lru, &sg->crst_list);17052022 if (fake) {17062023 /* nothing to protect for fake tables */17072024 *table &= ~_REGION_ENTRY_INVALID;···17332052}17342053EXPORT_SYMBOL_GPL(gmap_shadow_sgt);1735205417361736-/**17371737- * gmap_shadow_pgt_lookup - find a shadow page table17381738- * @sg: pointer to the shadow guest address space structure17391739- * @saddr: the address in the shadow aguest address space17401740- * @pgt: parent gmap address of the page table to get shadowed17411741- * @dat_protection: if the pgtable is marked as protected by dat17421742- * @fake: pgt references contiguous guest memory block, not a pgtable17431743- *17441744- * Returns 0 if the shadow page table was found and -EAGAIN if the page17451745- * table was not found.17461746- *17471747- * Called with sg->mm->mmap_lock in read.17481748- */17491749-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,17501750- unsigned long *pgt, int *dat_protection,17511751- int *fake)20552055+static void gmap_pgste_set_pgt_addr(struct ptdesc *ptdesc, unsigned long pgt_addr)17522056{17531753- unsigned long *table;17541754- struct page *page;17551755- int rc;20572057+ unsigned long *pgstes = page_to_virt(ptdesc_page(ptdesc));1756205817571757- BUG_ON(!gmap_is_shadow(sg));17581758- spin_lock(&sg->guest_table_lock);17591759- table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */17601760- if (table && !(*table & _SEGMENT_ENTRY_INVALID)) {17611761- /* Shadow page tables are full pages (pte+pgste) */17621762- page = pfn_to_page(*table >> PAGE_SHIFT);17631763- *pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;17641764- *dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT);17651765- *fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);17661766- rc = 0;17671767- } else {17681768- rc = -EAGAIN;17691769- }17701770- spin_unlock(&sg->guest_table_lock);17711771- return rc;20592059+ pgstes += _PAGE_ENTRIES;1772206020612061+ pgstes[0] &= ~PGSTE_ST2_MASK;20622062+ pgstes[1] &= ~PGSTE_ST2_MASK;20632063+ 
pgstes[2] &= ~PGSTE_ST2_MASK;20642064+ pgstes[3] &= ~PGSTE_ST2_MASK;20652065+20662066+ pgstes[0] |= (pgt_addr >> 16) & PGSTE_ST2_MASK;20672067+ pgstes[1] |= pgt_addr & PGSTE_ST2_MASK;20682068+ pgstes[2] |= (pgt_addr << 16) & PGSTE_ST2_MASK;20692069+ pgstes[3] |= (pgt_addr << 32) & PGSTE_ST2_MASK;17732070}17741774-EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);1775207117762072/**17772073 * gmap_shadow_pgt - instantiate a shadow page table···17772119 ptdesc = page_table_alloc_pgste(sg->mm);17782120 if (!ptdesc)17792121 return -ENOMEM;17801780- ptdesc->pt_index = pgt & _SEGMENT_ENTRY_ORIGIN;21222122+ origin = pgt & _SEGMENT_ENTRY_ORIGIN;17812123 if (fake)17821782- ptdesc->pt_index |= GMAP_SHADOW_FAKE_TABLE;21242124+ origin |= GMAP_SHADOW_FAKE_TABLE;21252125+ gmap_pgste_set_pgt_addr(ptdesc, origin);17832126 s_pgt = page_to_phys(ptdesc_page(ptdesc));17842127 /* Install shadow page table */17852128 spin_lock(&sg->guest_table_lock);···17992140 /* mark as invalid as long as the parent table is not protected */18002141 *table = (unsigned long) s_pgt | _SEGMENT_ENTRY |18012142 (pgt & _SEGMENT_ENTRY_PROTECT) | _SEGMENT_ENTRY_INVALID;18021802- list_add(&ptdesc->pt_list, &sg->pt_list);18032143 if (fake) {18042144 /* nothing to protect for fake tables */18052145 *table &= ~_SEGMENT_ENTRY_INVALID;···19762318 pte_t *pte, unsigned long bits)19772319{19782320 unsigned long offset, gaddr = 0;19791979- unsigned long *table;19802321 struct gmap *gmap, *sg, *next;1981232219822323 offset = ((unsigned long) pte) & (255 * sizeof(pte_t));···19832326 rcu_read_lock();19842327 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {19852328 spin_lock(&gmap->guest_table_lock);19861986- table = radix_tree_lookup(&gmap->host_to_guest,19871987- vmaddr >> PMD_SHIFT);19881988- if (table)19891989- gaddr = __gmap_segment_gaddr(table) + offset;23292329+ gaddr = host_to_guest_lookup(gmap, vmaddr) + offset;19902330 spin_unlock(&gmap->guest_table_lock);19911991- if (!table)23312331+ if 
(!IS_GADDR_VALID(gaddr))19922332 continue;1993233319942334 if (!list_empty(&gmap->children) && (bits & PGSTE_VSIE_BIT)) {···20452391 rcu_read_lock();20462392 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {20472393 spin_lock(&gmap->guest_table_lock);20482048- pmdp = (pmd_t *)radix_tree_delete(&gmap->host_to_guest,20492049- vmaddr >> PMD_SHIFT);23942394+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);20502395 if (pmdp) {20512051- gaddr = __gmap_segment_gaddr((unsigned long *)pmdp);20522396 pmdp_notify_gmap(gmap, pmdp, gaddr);20532397 WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |20542398 _SEGMENT_ENTRY_GMAP_UC |···20902438 */20912439void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr)20922440{20932093- unsigned long *entry, gaddr;24412441+ unsigned long gaddr;20942442 struct gmap *gmap;20952443 pmd_t *pmdp;2096244420972445 rcu_read_lock();20982446 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {20992447 spin_lock(&gmap->guest_table_lock);21002100- entry = radix_tree_delete(&gmap->host_to_guest,21012101- vmaddr >> PMD_SHIFT);21022102- if (entry) {21032103- pmdp = (pmd_t *)entry;21042104- gaddr = __gmap_segment_gaddr(entry);24482448+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);24492449+ if (pmdp) {21052450 pmdp_notify_gmap(gmap, pmdp, gaddr);21062106- WARN_ON(*entry & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |21072107- _SEGMENT_ENTRY_GMAP_UC |21082108- _SEGMENT_ENTRY));24512451+ WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |24522452+ _SEGMENT_ENTRY_GMAP_UC |24532453+ _SEGMENT_ENTRY));21092454 if (MACHINE_HAS_TLB_GUEST)21102455 __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,21112456 gmap->asce, IDTE_LOCAL);21122457 else if (MACHINE_HAS_IDTE)21132458 __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_LOCAL);21142114- *entry = _SEGMENT_ENTRY_EMPTY;24592459+ *pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);21152460 }21162461 spin_unlock(&gmap->guest_table_lock);21172462 }···21232474 */21242475void 
gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)21252476{21262126- unsigned long *entry, gaddr;24772477+ unsigned long gaddr;21272478 struct gmap *gmap;21282479 pmd_t *pmdp;2129248021302481 rcu_read_lock();21312482 list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {21322483 spin_lock(&gmap->guest_table_lock);21332133- entry = radix_tree_delete(&gmap->host_to_guest,21342134- vmaddr >> PMD_SHIFT);21352135- if (entry) {21362136- pmdp = (pmd_t *)entry;21372137- gaddr = __gmap_segment_gaddr(entry);24842484+ pmdp = host_to_guest_pmd_delete(gmap, vmaddr, &gaddr);24852485+ if (pmdp) {21382486 pmdp_notify_gmap(gmap, pmdp, gaddr);21392139- WARN_ON(*entry & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |21402140- _SEGMENT_ENTRY_GMAP_UC |21412141- _SEGMENT_ENTRY));24872487+ WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |24882488+ _SEGMENT_ENTRY_GMAP_UC |24892489+ _SEGMENT_ENTRY));21422490 if (MACHINE_HAS_TLB_GUEST)21432491 __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,21442492 gmap->asce, IDTE_GLOBAL);···21432497 __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_GLOBAL);21442498 else21452499 __pmdp_csp(pmdp);21462146- *entry = _SEGMENT_ENTRY_EMPTY;25002500+ *pmdp = __pmd(_SEGMENT_ENTRY_EMPTY);21472501 }21482502 spin_unlock(&gmap->guest_table_lock);21492503 }···25892943EXPORT_SYMBOL_GPL(__s390_uv_destroy_range);2590294425912945/**25922592- * s390_unlist_old_asce - Remove the topmost level of page tables from the25932593- * list of page tables of the gmap.25942594- * @gmap: the gmap whose table is to be removed25952595- *25962596- * On s390x, KVM keeps a list of all pages containing the page tables of the25972597- * gmap (the CRST list). 
This list is used at tear down time to free all25982598- * pages that are now not needed anymore.25992599- *26002600- * This function removes the topmost page of the tree (the one pointed to by26012601- * the ASCE) from the CRST list.26022602- *26032603- * This means that it will not be freed when the VM is torn down, and needs26042604- * to be handled separately by the caller, unless a leak is actually26052605- * intended. Notice that this function will only remove the page from the26062606- * list, the page will still be used as a top level page table (and ASCE).26072607- */26082608-void s390_unlist_old_asce(struct gmap *gmap)26092609-{26102610- struct page *old;26112611-26122612- old = virt_to_page(gmap->table);26132613- spin_lock(&gmap->guest_table_lock);26142614- list_del(&old->lru);26152615- /*26162616- * Sometimes the topmost page might need to be "removed" multiple26172617- * times, for example if the VM is rebooted into secure mode several26182618- * times concurrently, or if s390_replace_asce fails after calling26192619- * s390_remove_old_asce and is attempted again later. 
In that case26202620- * the old asce has been removed from the list, and therefore it26212621- * will not be freed when the VM terminates, but the ASCE is still26222622- * in use and still pointed to.26232623- * A subsequent call to replace_asce will follow the pointer and try26242624- * to remove the same page from the list again.26252625- * Therefore it's necessary that the page of the ASCE has valid26262626- * pointers, so list_del can work (and do nothing) without26272627- * dereferencing stale or invalid pointers.26282628- */26292629- INIT_LIST_HEAD(&old->lru);26302630- spin_unlock(&gmap->guest_table_lock);26312631-}26322632-EXPORT_SYMBOL_GPL(s390_unlist_old_asce);26332633-26342634-/**26352946 * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy26362947 * @gmap: the gmap whose ASCE needs to be replaced26372948 *···26073004 struct page *page;26083005 void *table;2609300626102610- s390_unlist_old_asce(gmap);26112611-26123007 /* Replacing segment type ASCEs would cause serious issues */26133008 if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)26143009 return -EINVAL;···26143013 page = gmap_alloc_crst();26153014 if (!page)26163015 return -ENOMEM;26172617- page->index = 0;26183016 table = page_to_virt(page);26193017 memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));26202620-26212621- /*26222622- * The caller has to deal with the old ASCE, but here we make sure26232623- * the new one is properly added to the CRST list, so that26242624- * it will be freed when the VM is torn down.26252625- */26262626- spin_lock(&gmap->guest_table_lock);26272627- list_add(&page->lru, &gmap->crst_list);26282628- spin_unlock(&gmap->guest_table_lock);2629301826303019 /* Set new table origin while preserving existing ASCE control bits */26313020 asce = (gmap->asce & ~_ASCE_ORIGIN) | __pa(table);···26263035 return 0;26273036}26283037EXPORT_SYMBOL_GPL(s390_replace_asce);30383038+30393039+/**30403040+ * kvm_s390_wiggle_split_folio() - try to 
drain extra references to a folio and optionally split30413041+ * @mm: the mm containing the folio to work on30423042+ * @folio: the folio30433043+ * @split: whether to split a large folio30443044+ *30453045+ * Context: Must be called while holding an extra reference to the folio;30463046+ * the mm lock should not be held.30473047+ */30483048+int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split)30493049+{30503050+ int rc;30513051+30523052+ lockdep_assert_not_held(&mm->mmap_lock);30533053+ folio_wait_writeback(folio);30543054+ lru_add_drain_all();30553055+ if (split) {30563056+ folio_lock(folio);30573057+ rc = split_folio(folio);30583058+ folio_unlock(folio);30593059+30603060+ if (rc != -EBUSY)30613061+ return rc;30623062+ }30633063+ return -EAGAIN;30643064+}30653065+EXPORT_SYMBOL_GPL(kvm_s390_wiggle_split_folio);
-2
arch/s390/mm/pgalloc.c
···176176 }177177 table = ptdesc_to_virt(ptdesc);178178 __arch_set_page_dat(table, 1);179179- /* pt_list is used by gmap only */180180- INIT_LIST_HEAD(&ptdesc->pt_list);181179 memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);182180 memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);183181 return table;
+1
arch/x86/boot/compressed/Makefile
···2525# avoid errors with '-march=i386', and future flags may depend on the target to2626# be valid.2727KBUILD_CFLAGS := -m$(BITS) -O2 $(CLANG_FLAGS)2828+KBUILD_CFLAGS += -std=gnu112829KBUILD_CFLAGS += -fno-strict-aliasing -fPIE2930KBUILD_CFLAGS += -Wundef3031KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
···71207120 kmem_cache_destroy(mmu_page_header_cache);71217121}7122712271237123+static void kvm_wake_nx_recovery_thread(struct kvm *kvm)71247124+{71257125+ /*71267126+ * The NX recovery thread is spawned on-demand at the first KVM_RUN and71277127+ * may not be valid even though the VM is globally visible. Do nothing,71287128+ * as such a VM can't have any possible NX huge pages.71297129+ */71307130+ struct vhost_task *nx_thread = READ_ONCE(kvm->arch.nx_huge_page_recovery_thread);71317131+71327132+ if (nx_thread)71337133+ vhost_task_wake(nx_thread);71347134+}71357135+71237136static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)71247137{71257138 if (nx_hugepage_mitigation_hard_disabled)···71937180 kvm_mmu_zap_all_fast(kvm);71947181 mutex_unlock(&kvm->slots_lock);7195718271967196- vhost_task_wake(kvm->arch.nx_huge_page_recovery_thread);71837183+ kvm_wake_nx_recovery_thread(kvm);71977184 }71987185 mutex_unlock(&kvm_lock);71997186 }···73287315 mutex_lock(&kvm_lock);7329731673307317 list_for_each_entry(kvm, &vm_list, vm_list)73317331- vhost_task_wake(kvm->arch.nx_huge_page_recovery_thread);73187318+ kvm_wake_nx_recovery_thread(kvm);7332731973337320 mutex_unlock(&kvm_lock);73347321 }···74647451{74657452 struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);74667453 struct kvm *kvm = container_of(ka, struct kvm, arch);74547454+ struct vhost_task *nx_thread;7467745574687456 kvm->arch.nx_huge_page_last = get_jiffies_64();74697469- kvm->arch.nx_huge_page_recovery_thread = vhost_task_create(74707470- kvm_nx_huge_page_recovery_worker, kvm_nx_huge_page_recovery_worker_kill,74717471- kvm, "kvm-nx-lpage-recovery");74577457+ nx_thread = vhost_task_create(kvm_nx_huge_page_recovery_worker,74587458+ kvm_nx_huge_page_recovery_worker_kill,74597459+ kvm, "kvm-nx-lpage-recovery");7472746074737473- if (kvm->arch.nx_huge_page_recovery_thread)74747474- vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);74617461+ if (!nx_thread)74627462+ 
return;74637463+74647464+ vhost_task_start(nx_thread);74657465+74667466+ /* Make the task visible only once it is fully started. */74677467+ WRITE_ONCE(kvm->arch.nx_huge_page_recovery_thread, nx_thread);74757468}7476746974777470int kvm_mmu_post_init_vm(struct kvm *kvm)
···100100 push %r10101101 push %r9102102 push %r8103103-#ifdef CONFIG_FRAME_POINTER104104- pushq $0 /* Dummy push for stack alignment. */105105-#endif106103#endif107104 /* Set the vendor specific function. */108105 call __xen_hypercall_setfunc···114117 pop %ebx115118 pop %eax116119#else117117- lea xen_hypercall_amd(%rip), %rbx118118- cmp %rax, %rbx119119-#ifdef CONFIG_FRAME_POINTER120120- pop %rax /* Dummy pop. */121121-#endif120120+ lea xen_hypercall_amd(%rip), %rcx121121+ cmp %rax, %rcx122122 pop %r8123123 pop %r9124124 pop %r10···126132 pop %rcx127133 pop %rax128134#endif135135+ FRAME_END129136 /* Use correct hypercall function. */130137 jz xen_hypercall_amd131138 jmp xen_hypercall_intel
+5
drivers/accel/amdxdna/amdxdna_pci_drv.c
···21212222#define AMDXDNA_AUTOSUSPEND_DELAY 5000 /* milliseconds */23232424+MODULE_FIRMWARE("amdnpu/1502_00/npu.sbin");2525+MODULE_FIRMWARE("amdnpu/17f0_10/npu.sbin");2626+MODULE_FIRMWARE("amdnpu/17f0_11/npu.sbin");2727+MODULE_FIRMWARE("amdnpu/17f0_20/npu.sbin");2828+2429/*2530 * Bind the driver base on (vendor_id, device_id) pair and later use the2631 * (device_id, rev_id) pair as a key to select the devices. The devices with
+6-2
drivers/accel/ivpu/ivpu_drv.c
···397397 if (ivpu_fw_is_cold_boot(vdev)) {398398 ret = ivpu_pm_dct_init(vdev);399399 if (ret)400400- goto err_diagnose_failure;400400+ goto err_disable_ipc;401401402402 ret = ivpu_hw_sched_init(vdev);403403 if (ret)404404- goto err_diagnose_failure;404404+ goto err_disable_ipc;405405 }406406407407 return 0;408408409409+err_disable_ipc:410410+ ivpu_ipc_disable(vdev);411411+ ivpu_hw_irq_disable(vdev);412412+ disable_irq(vdev->irq);409413err_diagnose_failure:410414 ivpu_hw_diagnose_failure(vdev);411415 ivpu_mmu_evtq_dump(vdev);
···17171818config ARM_AIROHA_SOC_CPUFREQ1919 tristate "Airoha EN7581 SoC CPUFreq support"2020- depends on (ARCH_AIROHA && OF) || COMPILE_TEST2020+ depends on ARCH_AIROHA || COMPILE_TEST2121+ depends on OF2122 select PM_OPP2223 default ARCH_AIROHA2324 help
+10-10
drivers/cpufreq/amd-pstate.c
···699699 if (min_perf < lowest_nonlinear_perf)700700 min_perf = lowest_nonlinear_perf;701701702702- max_perf = cap_perf;702702+ max_perf = cpudata->max_limit_perf;703703 if (max_perf < min_perf)704704 max_perf = min_perf;705705···747747 guard(mutex)(&amd_pstate_driver_lock);748748749749 ret = amd_pstate_cpu_boost_update(policy, state);750750- policy->boost_enabled = !ret ? state : false;751750 refresh_frequency_limits(policy);752751753752 return ret;···821822822823static void amd_pstate_update_limits(unsigned int cpu)823824{824824- struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);825825+ struct cpufreq_policy *policy = NULL;825826 struct amd_cpudata *cpudata;826827 u32 prev_high = 0, cur_high = 0;827828 int ret;828829 bool highest_perf_changed = false;829830831831+ if (!amd_pstate_prefcore)832832+ return;833833+834834+ policy = cpufreq_cpu_get(cpu);830835 if (!policy)831836 return;832837833838 cpudata = policy->driver_data;834839835835- if (!amd_pstate_prefcore)836836- return;837837-838840 guard(mutex)(&amd_pstate_driver_lock);839841840842 ret = amd_get_highest_perf(cpu, &cur_high);841841- if (ret)842842- goto free_cpufreq_put;843843+ if (ret) {844844+ cpufreq_cpu_put(policy);845845+ return;846846+ }843847844848 prev_high = READ_ONCE(cpudata->prefcore_ranking);845849 highest_perf_changed = (prev_high != cur_high);···852850 if (cur_high < CPPC_MAX_PERF)853851 sched_set_itmt_core_prio((int)cur_high, cpu);854852 }855855-856856-free_cpufreq_put:857853 cpufreq_cpu_put(policy);858854859855 if (!highest_perf_changed)
+2-1
drivers/cpufreq/cpufreq.c
···15711571 policy->cdev = of_cpufreq_cooling_register(policy);1572157215731573 /* Let the per-policy boost flag mirror the cpufreq_driver boost during init */15741574- if (policy->boost_enabled != cpufreq_boost_enabled()) {15741574+ if (cpufreq_driver->set_boost &&15751575+ policy->boost_enabled != cpufreq_boost_enabled()) {15751576 policy->boost_enabled = cpufreq_boost_enabled();15761577 ret = cpufreq_driver->set_boost(policy, policy->boost_enabled);15771578 if (ret) {
+1-1
drivers/firmware/Kconfig
···106106 select ISCSI_BOOT_SYSFS107107 select ISCSI_IBFT_FIND if X86108108 depends on ACPI && SCSI && SCSI_LOWLEVEL109109- default n109109+ default n110110 help111111 This option enables support for detection and exposing of iSCSI112112 Boot Firmware Table (iBFT) via sysfs to userspace. If you wish to
+4-1
drivers/firmware/iscsi_ibft.c
···310310 str += sprintf_ipaddr(str, nic->ip_addr);311311 break;312312 case ISCSI_BOOT_ETH_SUBNET_MASK:313313- val = cpu_to_be32(~((1 << (32-nic->subnet_mask_prefix))-1));313313+ if (nic->subnet_mask_prefix > 32)314314+ val = cpu_to_be32(~0);315315+ else316316+ val = cpu_to_be32(~((1 << (32-nic->subnet_mask_prefix))-1));314317 str += sprintf(str, "%pI4", &val);315318 break;316319 case ISCSI_BOOT_ETH_PREFIX_LEN:
+1
drivers/gpio/Kconfig
···338338339339config GPIO_GRGPIO340340 tristate "Aeroflex Gaisler GRGPIO support"341341+ depends on OF || COMPILE_TEST341342 select GPIO_GENERIC342343 select IRQ_DOMAIN343344 help
-19
drivers/gpio/gpio-pca953x.c
···841841 DECLARE_BITMAP(trigger, MAX_LINE);842842 int ret;843843844844- if (chip->driver_data & PCA_PCAL) {845845- /* Read the current interrupt status from the device */846846- ret = pca953x_read_regs(chip, PCAL953X_INT_STAT, trigger);847847- if (ret)848848- return false;849849-850850- /* Check latched inputs and clear interrupt status */851851- ret = pca953x_read_regs(chip, chip->regs->input, cur_stat);852852- if (ret)853853- return false;854854-855855- /* Apply filter for rising/falling edge selection */856856- bitmap_replace(new_stat, chip->irq_trig_fall, chip->irq_trig_raise, cur_stat, gc->ngpio);857857-858858- bitmap_and(pending, new_stat, trigger, gc->ngpio);859859-860860- return !bitmap_empty(pending, gc->ngpio);861861- }862862-863844 ret = pca953x_read_regs(chip, chip->regs->input, cur_stat);864845 if (ret)865846 return false;
+8-5
drivers/gpio/gpio-sim.c
···10281028 struct configfs_subsystem *subsys = dev->group.cg_subsys;10291029 struct gpio_sim_bank *bank;10301030 struct gpio_sim_line *line;10311031+ struct config_item *item;1031103210321033 /*10331033- * The device only needs to depend on leaf line entries. This is10341034+ * The device only needs to depend on leaf entries. This is10341035 * sufficient to lock up all the configfs entries that the10351036 * instantiated, alive device depends on.10361037 */10371038 list_for_each_entry(bank, &dev->bank_list, siblings) {10381039 list_for_each_entry(line, &bank->line_list, siblings) {10401040+ item = line->hog ? &line->hog->item10411041+ : &line->group.cg_item;10421042+10391043 if (lock)10401040- WARN_ON(configfs_depend_item_unlocked(10411041- subsys, &line->group.cg_item));10441044+ WARN_ON(configfs_depend_item_unlocked(subsys,10451045+ item));10421046 else10431043- configfs_undepend_item_unlocked(10441044- &line->group.cg_item);10471047+ configfs_undepend_item_unlocked(item);10451048 }10461049 }10471050}
···2133213321342134 dc_enable_stereo(dc, context, dc_streams, context->stream_count);2135213521362136- if (context->stream_count > get_seamless_boot_stream_count(context) ||21362136+ if (get_seamless_boot_stream_count(context) == 0 ||21372137 context->stream_count == 0) {21382138 /* Must wait for no flips to be pending before doing optimize bw */21392139 hwss_wait_for_no_pipes_pending(dc, context);
···256256 if (enabled)257257 vgacrdf_test |= AST_IO_VGACRDF_DP_VIDEO_ENABLE;258258259259- for (i = 0; i < 200; ++i) {259259+ for (i = 0; i < 1000; ++i) {260260 if (i)261261 mdelay(1);262262 vgacrdf = ast_get_index_reg_mask(ast, AST_IO_VGACRI, 0xdf,
+3-11
drivers/gpu/drm/display/drm_dp_cec.c
···311311 if (!aux->transfer)312312 return;313313314314-#ifndef CONFIG_MEDIA_CEC_RC315315- /*316316- * CEC_CAP_RC is part of CEC_CAP_DEFAULTS, but it is stripped by317317- * cec_allocate_adapter() if CONFIG_MEDIA_CEC_RC is undefined.318318- *319319- * Do this here as well to ensure the tests against cec_caps are320320- * correct.321321- */322322- cec_caps &= ~CEC_CAP_RC;323323-#endif324314 cancel_delayed_work_sync(&aux->cec.unregister_work);325315326316 mutex_lock(&aux->cec.lock);···327337 num_las = CEC_MAX_LOG_ADDRS;328338329339 if (aux->cec.adap) {330330- if (aux->cec.adap->capabilities == cec_caps &&340340+ /* Check if the adapter properties have changed */341341+ if ((aux->cec.adap->capabilities & CEC_CAP_MONITOR_ALL) ==342342+ (cec_caps & CEC_CAP_MONITOR_ALL) &&331343 aux->cec.adap->available_log_addrs == num_las) {332344 /* Unchanged, so just set the phys addr */333345 cec_s_phys_addr(aux->cec.adap, source_physical_address, false);
···119119 drm_puts(&p, "\n**** GuC CT ****\n");120120 xe_guc_ct_snapshot_print(ss->guc.ct, &p);121121122122- /*123123- * Don't add a new section header here because the mesa debug decoder124124- * tool expects the context information to be in the 'GuC CT' section.125125- */126126- /* drm_puts(&p, "\n**** Contexts ****\n"); */122122+ drm_puts(&p, "\n**** Contexts ****\n");127123 xe_guc_exec_queue_snapshot_print(ss->ge, &p);128124129125 drm_puts(&p, "\n**** Job ****\n");···391395/**392396 * xe_print_blob_ascii85 - print a BLOB to some useful location in ASCII85393397 *394394- * The output is split to multiple lines because some print targets, e.g. dmesg395395- * cannot handle arbitrarily long lines. Note also that printing to dmesg in396396- * piece-meal fashion is not possible, each separate call to drm_puts() has a397397- * line-feed automatically added! Therefore, the entire output line must be398398- * constructed in a local buffer first, then printed in one atomic output call.398398+ * The output is split into multiple calls to drm_puts() because some print399399+ * targets, e.g. dmesg, cannot handle arbitrarily long lines. These targets may400400+ * add newlines, as is the case with dmesg: each drm_puts() call creates a401401+ * separate line.399402 *400403 * There is also a scheduler yield call to prevent the 'task has been stuck for401404 * 120s' kernel hang check feature from firing when printing to a slow target402405 * such as dmesg over a serial port.403406 *404404- * TODO: Add compression prior to the ASCII85 encoding to shrink huge buffers down.405405- *406407 * @p: the printer object to output to407408 * @prefix: optional prefix to add to output string409409+ * @suffix: optional suffix to add at the end. 
0 disables it and is410410+ * not added to the output, which is useful when using multiple calls411411+ * to dump data to @p408412 * @blob: the Binary Large OBject to dump out409413 * @offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32)410414 * @size: the size in bytes of the BLOB, must be a multiple of sizeof(u32)411415 */412412-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,416416+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,413417 const void *blob, size_t offset, size_t size)414418{415419 const u32 *blob32 = (const u32 *)blob;416420 char buff[ASCII85_BUFSZ], *line_buff;417421 size_t line_pos = 0;418422419419- /*420420- * Splitting blobs across multiple lines is not compatible with the mesa421421- * debug decoder tool. Note that even dropping the explicit '\n' below422422- * doesn't help because the GuC log is so big some underlying implementation423423- * still splits the lines at 512K characters. So just bail completely for424424- * the moment.425425- */426426- return;427427-428423#define DMESG_MAX_LINE_LEN 800429429-#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */424424+ /* Always leave space for the suffix char and the \0 */425425+#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "<suffix>\0" */430426431427 if (size & 3)432428 drm_printf(p, "Size not word aligned: %zu", size);···450462 line_pos += strlen(line_buff + line_pos);451463452464 if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {453453- line_buff[line_pos++] = '\n';454465 line_buff[line_pos++] = 0;455466456467 drm_puts(p, line_buff);···461474 }462475 }463476464464- if (line_pos) {465465- line_buff[line_pos++] = '\n';466466- line_buff[line_pos++] = 0;477477+ if (suffix)478478+ line_buff[line_pos++] = suffix;467479480480+ if (line_pos) {481481+ line_buff[line_pos++] = 0;468482 drm_puts(p, line_buff);469483 }470484
···
 		info.flags |= I2C_CLIENT_SLAVE;
 	}
 
-	info.flags |= I2C_CLIENT_USER;
-
 	client = i2c_new_client_device(adap, &info);
 	if (IS_ERR(client))
 		return PTR_ERR(client);
 
+	/* Keep track of the added device */
+	mutex_lock(&adap->userspace_clients_lock);
+	list_add_tail(&client->detected, &adap->userspace_clients);
+	mutex_unlock(&adap->userspace_clients_lock);
 	dev_info(dev, "%s: Instantiated device %s at 0x%02hx\n", "new_device",
 		 info.type, info.addr);
 
 	return count;
 }
 static DEVICE_ATTR_WO(new_device);
-
-static int __i2c_find_user_addr(struct device *dev, const void *addrp)
-{
-	struct i2c_client *client = i2c_verify_client(dev);
-	unsigned short addr = *(unsigned short *)addrp;
-
-	return client && client->flags & I2C_CLIENT_USER &&
-	       i2c_encode_flags_to_addr(client) == addr;
-}
 
 /*
  * And of course let the users delete the devices they instantiated, if
···
 				 const char *buf, size_t count)
 {
 	struct i2c_adapter *adap = to_i2c_adapter(dev);
-	struct device *child_dev;
+	struct i2c_client *client, *next;
 	unsigned short addr;
 	char end;
 	int res;
···
 		return -EINVAL;
 	}
 
-	mutex_lock(&core_lock);
 	/* Make sure the device was added through sysfs */
-	child_dev = device_find_child(&adap->dev, &addr, __i2c_find_user_addr);
-	if (child_dev) {
-		i2c_unregister_device(i2c_verify_client(child_dev));
-		put_device(child_dev);
-	} else {
-		dev_err(dev, "Can't find userspace-created device at %#x\n", addr);
-		count = -ENOENT;
-	}
-	mutex_unlock(&core_lock);
+	res = -ENOENT;
+	mutex_lock_nested(&adap->userspace_clients_lock,
+			  i2c_adapter_depth(adap));
+	list_for_each_entry_safe(client, next, &adap->userspace_clients,
+				 detected) {
+		if (i2c_encode_flags_to_addr(client) == addr) {
+			dev_info(dev, "%s: Deleting device %s at 0x%02hx\n",
+				 "delete_device", client->name, client->addr);
 
-	return count;
+			list_del(&client->detected);
+			i2c_unregister_device(client);
+			res = count;
+			break;
+		}
+	}
+	mutex_unlock(&adap->userspace_clients_lock);
+
+	if (res < 0)
+		dev_err(dev, "%s: Can't find device in list\n",
+			"delete_device");
+	return res;
 }
 static DEVICE_ATTR_IGNORE_LOCKDEP(delete_device, S_IWUSR, NULL,
 				  delete_device_store);
···
 	adap->locked_flags = 0;
 	rt_mutex_init(&adap->bus_lock);
 	rt_mutex_init(&adap->mux_lock);
+	mutex_init(&adap->userspace_clients_lock);
+	INIT_LIST_HEAD(&adap->userspace_clients);
 
 	/* Set default timeout to 1 second if not already set */
 	if (adap->timeout == 0)
···
 }
 EXPORT_SYMBOL_GPL(i2c_add_numbered_adapter);
 
+static void i2c_do_del_adapter(struct i2c_driver *driver,
+			       struct i2c_adapter *adapter)
+{
+	struct i2c_client *client, *_n;
+
+	/* Remove the devices we created ourselves as the result of hardware
+	 * probing (using a driver's detect method) */
+	list_for_each_entry_safe(client, _n, &driver->clients, detected) {
+		if (client->adapter == adapter) {
+			dev_dbg(&adapter->dev, "Removing %s at 0x%x\n",
+				client->name, client->addr);
+			list_del(&client->detected);
+			i2c_unregister_device(client);
+		}
+	}
+}
+
 static int __unregister_client(struct device *dev, void *dummy)
 {
 	struct i2c_client *client = i2c_verify_client(dev);
···
 	return 0;
 }
 
+static int __process_removed_adapter(struct device_driver *d, void *data)
+{
+	i2c_do_del_adapter(to_i2c_driver(d), data);
+	return 0;
+}
+
 /**
  * i2c_del_adapter - unregister I2C adapter
  * @adap: the adapter being unregistered
···
 void i2c_del_adapter(struct i2c_adapter *adap)
 {
 	struct i2c_adapter *found;
+	struct i2c_client *client, *next;
 
 	/* First make sure that this adapter was ever added */
 	mutex_lock(&core_lock);
···
 	}
 
 	i2c_acpi_remove_space_handler(adap);
+	/* Tell drivers about this removal */
+	mutex_lock(&core_lock);
+	bus_for_each_drv(&i2c_bus_type, NULL, adap,
+			 __process_removed_adapter);
+	mutex_unlock(&core_lock);
+
+	/* Remove devices instantiated from sysfs */
+	mutex_lock_nested(&adap->userspace_clients_lock,
+			  i2c_adapter_depth(adap));
+	list_for_each_entry_safe(client, next, &adap->userspace_clients,
+				 detected) {
+		dev_dbg(&adap->dev, "Removing %s at 0x%x\n", client->name,
+			client->addr);
+		list_del(&client->detected);
+		i2c_unregister_device(client);
+	}
+	mutex_unlock(&adap->userspace_clients_lock);
 
 	/* Detach any active clients. This can't fail, thus we do not
 	 * check the returned value. This is a two-pass process, because
 	 * we can't remove the dummy devices during the first pass: they
 	 * could have been instantiated by real devices wishing to clean
 	 * them up properly, so we give them a chance to do that first. */
-	mutex_lock(&core_lock);
 	device_for_each_child(&adap->dev, NULL, __unregister_client);
 	device_for_each_child(&adap->dev, NULL, __unregister_dummy);
-	mutex_unlock(&core_lock);
 
 	/* device name is gone after device_unregister */
 	dev_dbg(&adap->dev, "adapter [%s] unregistered\n", adap->name);
···
 	/* add the driver to the list of i2c drivers in the driver core */
 	driver->driver.owner = owner;
 	driver->driver.bus = &i2c_bus_type;
+	INIT_LIST_HEAD(&driver->clients);
 
 	/* When registration returns, the driver core
 	 * will have called probe() for all matching-but-unbound devices.
···
 }
 EXPORT_SYMBOL(i2c_register_driver);
 
-static int __i2c_unregister_detected_client(struct device *dev, void *argp)
+static int __process_removed_driver(struct device *dev, void *data)
 {
-	struct i2c_client *client = i2c_verify_client(dev);
-
-	if (client && client->flags & I2C_CLIENT_AUTO)
-		i2c_unregister_device(client);
-
+	if (dev->type == &i2c_adapter_type)
+		i2c_do_del_adapter(data, to_i2c_adapter(dev));
 	return 0;
 }
···
 */
void i2c_del_driver(struct i2c_driver *driver)
{
-	mutex_lock(&core_lock);
-	/* Satisfy __must_check, function can't fail */
-	if (driver_for_each_device(&driver->driver, NULL, NULL,
-				   __i2c_unregister_detected_client)) {
-	}
-	mutex_unlock(&core_lock);
+	i2c_for_each_dev(driver, __process_removed_driver);
 
 	driver_unregister(&driver->driver);
 	pr_debug("driver [%s] unregistered\n", driver->driver.name);
···
 	/* Finally call the custom detection function */
 	memset(&info, 0, sizeof(struct i2c_board_info));
 	info.addr = addr;
-	info.flags = I2C_CLIENT_AUTO;
 	err = driver->detect(temp_client, &info);
 	if (err) {
 		/* -ENODEV is returned if the detection fails. We catch it
···
 		dev_dbg(&adapter->dev, "Creating %s at 0x%02x\n",
 			info.type, info.addr);
 		client = i2c_new_client_device(adapter, &info);
-		if (IS_ERR(client))
+		if (!IS_ERR(client))
+			list_add_tail(&client->detected, &driver->clients);
+		else
 			dev_err(&adapter->dev, "Failed creating %s at 0x%02x\n",
 				info.type, info.addr);
 	}
drivers/irqchip/Kconfig (+1)
···
 config LAN966X_OIC
 	tristate "Microchip LAN966x OIC Support"
+	depends on MCHP_LAN966X_PCI || COMPILE_TEST
 	select GENERIC_IRQ_CHIP
 	select IRQ_DOMAIN
 	help
drivers/irqchip/irq-apple-aic.c (+2, -1)
···
 			  AIC_FIQ_HWIRQ(AIC_TMR_EL02_VIRT));
 	}
 
-	if (read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & PMCR0_IACT) {
+	if ((read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & (PMCR0_IMODE | PMCR0_IACT)) ==
+	    (FIELD_PREP(PMCR0_IMODE, PMCR0_IMODE_FIQ) | PMCR0_IACT)) {
 		int irq;
 		if (cpumask_test_cpu(smp_processor_id(),
 				     &aic_irqc->fiq_aff[AIC_CPU_PMU_P]->aff))
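The fix above tightens the PMU FIQ check: an active interrupt (PMCR0_IACT) only counts if the interrupt-mode field is also configured for FIQ delivery. A minimal userspace sketch of the same mask-and-compare pattern; the field layout below is invented for illustration and is not the real Apple sysreg encoding:

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define PMCR0_IMODE_MASK  (0x7u << 8)   /* interrupt mode field    */
#define PMCR0_IMODE_FIQ   (0x4u << 8)   /* "deliver as FIQ" value  */
#define PMCR0_IACT        (1u << 11)    /* interrupt active bit    */

/* Returns 1 only when the PMU interrupt is active AND routed as FIQ;
 * masking both fields at once is what the patch above switches to. */
static int pmu_fiq_pending(uint32_t pmcr0)
{
    return (pmcr0 & (PMCR0_IMODE_MASK | PMCR0_IACT)) ==
           (PMCR0_IMODE_FIQ | PMCR0_IACT);
}
```

Checking only the active bit would wrongly claim a pending FIQ when the interrupt is routed elsewhere (e.g. as a regular IRQ), which is exactly the spurious-FIQ case the patch closes.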
drivers/irqchip/irq-mvebu-icu.c (+2, -1)
···
 			       unsigned long *hwirq, unsigned int *type)
 {
 	unsigned int param_count = static_branch_unlikely(&legacy_bindings) ? 3 : 2;
-	struct mvebu_icu_msi_data *msi_data = d->host_data;
+	struct msi_domain_info *info = d->host_data;
+	struct mvebu_icu_msi_data *msi_data = info->chip_data;
 	struct mvebu_icu *icu = msi_data->icu;
 
 	/* Check the count of the parameters in dt */
···
 	aq_ptp_ring_free(self);
 	aq_ptp_free(self);
 
-	if (likely(self->aq_fw_ops->deinit) && link_down) {
+	/* May be invoked during hot unplug. */
+	if (pci_device_is_present(self->pdev) &&
+	    likely(self->aq_fw_ops->deinit) && link_down) {
 		mutex_lock(&self->fwreq_mutex);
 		self->aq_fw_ops->deinit(self->aq_hw);
 		mutex_unlock(&self->fwreq_mutex);
···
 {
 	struct bcmgenet_priv *priv = netdev_priv(dev);
 	struct device *kdev = &priv->pdev->dev;
+	u32 phy_wolopts = 0;
 
-	if (dev->phydev)
+	if (dev->phydev) {
 		phy_ethtool_get_wol(dev->phydev, wol);
+		phy_wolopts = wol->wolopts;
+	}
 
 	/* MAC is not wake-up capable, return what the PHY does */
 	if (!device_can_wakeup(kdev))
···
 	/* Overlay MAC capabilities with that of the PHY queried before */
 	wol->supported |= WAKE_MAGIC | WAKE_MAGICSECURE | WAKE_FILTER;
-	wol->wolopts = priv->wolopts;
-	memset(wol->sopass, 0, sizeof(wol->sopass));
+	wol->wolopts |= priv->wolopts;
 
+	/* Return the PHY configured magic password */
+	if (phy_wolopts & WAKE_MAGICSECURE)
+		return;
+
+	/* Otherwise the MAC one */
+	memset(wol->sopass, 0, sizeof(wol->sopass));
 	if (wol->wolopts & WAKE_MAGICSECURE)
 		memcpy(wol->sopass, priv->sopass, sizeof(priv->sopass));
 }
···
 	/* Try Wake-on-LAN from the PHY first */
 	if (dev->phydev) {
 		ret = phy_ethtool_set_wol(dev->phydev, wol);
-		if (ret != -EOPNOTSUPP)
+		if (ret != -EOPNOTSUPP && wol->wolopts)
 			return ret;
 	}
drivers/net/ethernet/broadcom/tg3.c (+58)
···
 #include <linux/hwmon.h>
 #include <linux/hwmon-sysfs.h>
 #include <linux/crc32poly.h>
+#include <linux/dmi.h>
 
 #include <net/checksum.h>
 #include <net/gso.h>
···
 
 static SIMPLE_DEV_PM_OPS(tg3_pm_ops, tg3_suspend, tg3_resume);
 
+/* Systems where ACPI _PTS (Prepare To Sleep) S5 will result in a fatal
+ * PCIe AER event on the tg3 device if the tg3 device is not, or cannot
+ * be, powered down.
+ */
+static const struct dmi_system_id tg3_restart_aer_quirk_table[] = {
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R440"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R540"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R640"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R650"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R740"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R750"),
+		},
+	},
+	{}
+};
+
 static void tg3_shutdown(struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
···
 
 	if (system_state == SYSTEM_POWER_OFF)
 		tg3_power_down(tp);
+	else if (system_state == SYSTEM_RESTART &&
+		 dmi_first_match(tg3_restart_aer_quirk_table) &&
+		 pdev->current_state != PCI_D3cold &&
+		 pdev->current_state != PCI_UNKNOWN) {
+		/* Disable PCIe AER on the tg3 to avoid a fatal
+		 * error during this system restart.
+		 */
+		pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
+					   PCI_EXP_DEVCTL_CERE |
+					   PCI_EXP_DEVCTL_NFERE |
+					   PCI_EXP_DEVCTL_FERE |
+					   PCI_EXP_DEVCTL_URRE);
+	}
 
 	rtnl_unlock();
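The quirk table above is NULL-terminated and walked by dmi_first_match() until the first entry whose fields all match. A small userspace sketch of the same first-match-in-a-sentinel-terminated-table idea, matching on a single string instead of the kernel's DMI fields; the table contents and names are illustrative, not the kernel API:

```c
#include <string.h>
#include <stddef.h>

struct board_quirk {
    const char *product;   /* NULL product terminates the table */
};

/* Hypothetical table in the style of tg3_restart_aer_quirk_table. */
static const struct board_quirk quirk_table[] = {
    { "PowerEdge R440" },
    { "PowerEdge R740" },
    { NULL },
};

/* Return the first matching entry, or NULL when no quirk applies,
 * mirroring how dmi_first_match() is used as a boolean above. */
static const struct board_quirk *first_match(const char *product)
{
    for (const struct board_quirk *q = quirk_table; q->product; q++)
        if (strcmp(q->product, product) == 0)
            return q;
    return NULL;
}
```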
···
  * @xdp: xdp_buff used as input to the XDP program
  * @xdp_prog: XDP program to run
  * @xdp_ring: ring to be used for XDP_TX action
- * @rx_buf: Rx buffer to store the XDP action
  * @eop_desc: Last descriptor in packet to read metadata from
  *
  * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
  */
-static void
+static u32
 ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 	    struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
-	    struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
+	    union ice_32b_rx_flex_desc *eop_desc)
 {
 	unsigned int ret = ICE_XDP_PASS;
 	u32 act;
···
 		ret = ICE_XDP_CONSUMED;
 	}
 exit:
-	ice_set_rx_bufs_act(xdp, rx_ring, ret);
+	return ret;
 }
 
 /**
···
 		xdp_buff_set_frags_flag(xdp);
 	}
 
-	if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
-		ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED);
+	if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS))
 		return -ENOMEM;
-	}
 
 	__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page,
 				   rx_buf->page_offset, size);
···
 	struct ice_rx_buf *rx_buf;
 
 	rx_buf = &rx_ring->rx_buf[ntc];
-	rx_buf->pgcnt = page_count(rx_buf->page);
 	prefetchw(rx_buf->page);
 
 	if (!size)
···
 	rx_buf->pagecnt_bias--;
 
 	return rx_buf;
+}
+
+/**
+ * ice_get_pgcnts - grab page_count() for gathered fragments
+ * @rx_ring: Rx descriptor ring to store the page counts on
+ *
+ * This function is intended to be called right before running XDP
+ * program so that the page recycling mechanism will be able to take
+ * a correct decision regarding underlying pages; this is done in such
+ * way as XDP program can change the refcount of page
+ */
+static void ice_get_pgcnts(struct ice_rx_ring *rx_ring)
+{
+	u32 nr_frags = rx_ring->nr_frags + 1;
+	u32 idx = rx_ring->first_desc;
+	struct ice_rx_buf *rx_buf;
+	u32 cnt = rx_ring->count;
+
+	for (int i = 0; i < nr_frags; i++) {
+		rx_buf = &rx_ring->rx_buf[idx];
+		rx_buf->pgcnt = page_count(rx_buf->page);
+
+		if (++idx == cnt)
+			idx = 0;
+	}
 }
 
 /**
···
 				       rx_buf->page_offset + headlen, size,
 				       xdp->frame_sz);
 	} else {
-		/* buffer is unused, change the act that should be taken later
-		 * on; data was copied onto skb's linear part so there's no
+		/* buffer is unused, restore biased page count in Rx buffer;
+		 * data was copied onto skb's linear part so there's no
 		 * need for adjusting page offset and we can reuse this buffer
 		 * as-is
 		 */
-		rx_buf->act = ICE_SKB_CONSUMED;
+		rx_buf->pagecnt_bias++;
 	}
 
 	if (unlikely(xdp_buff_has_frags(xdp))) {
···
 }
 
 /**
+ * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all frame frags
+ * @rx_ring: Rx ring with all the auxiliary data
+ * @xdp: XDP buffer carrying linear + frags part
+ * @xdp_xmit: XDP_TX/XDP_REDIRECT verdict storage
+ * @ntc: a current next_to_clean value to be stored at rx_ring
+ * @verdict: return code from XDP program execution
+ *
+ * Walk through gathered fragments and satisfy internal page
+ * recycle mechanism; we take here an action related to verdict
+ * returned by XDP program;
+ */
+static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
+			    u32 *xdp_xmit, u32 ntc, u32 verdict)
+{
+	u32 nr_frags = rx_ring->nr_frags + 1;
+	u32 idx = rx_ring->first_desc;
+	u32 cnt = rx_ring->count;
+	u32 post_xdp_frags = 1;
+	struct ice_rx_buf *buf;
+	int i;
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags;
+
+	for (i = 0; i < post_xdp_frags; i++) {
+		buf = &rx_ring->rx_buf[idx];
+
+		if (verdict & (ICE_XDP_TX | ICE_XDP_REDIR)) {
+			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+			*xdp_xmit |= verdict;
+		} else if (verdict & ICE_XDP_CONSUMED) {
+			buf->pagecnt_bias++;
+		} else if (verdict == ICE_XDP_PASS) {
+			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+		}
+
+		ice_put_rx_buf(rx_ring, buf);
+
+		if (++idx == cnt)
+			idx = 0;
+	}
+	/* handle buffers that represented frags released by XDP prog;
+	 * for these we keep pagecnt_bias as-is; refcount from struct page
+	 * has been decremented within XDP prog and we do not have to increase
+	 * the biased refcnt
+	 */
+	for (; i < nr_frags; i++) {
+		buf = &rx_ring->rx_buf[idx];
+		ice_put_rx_buf(rx_ring, buf);
+		if (++idx == cnt)
+			idx = 0;
+	}
+
+	xdp->data = NULL;
+	rx_ring->first_desc = ntc;
+	rx_ring->nr_frags = 0;
+}
+
+/**
  * ice_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @rx_ring: Rx descriptor ring to transact packets on
  * @budget: Total limit on number of packets to process
···
 	unsigned int total_rx_bytes = 0, total_rx_pkts = 0;
 	unsigned int offset = rx_ring->rx_offset;
 	struct xdp_buff *xdp = &rx_ring->xdp;
-	u32 cached_ntc = rx_ring->first_desc;
 	struct ice_tx_ring *xdp_ring = NULL;
 	struct bpf_prog *xdp_prog = NULL;
 	u32 ntc = rx_ring->next_to_clean;
+	u32 cached_ntu, xdp_verdict;
 	u32 cnt = rx_ring->count;
 	u32 xdp_xmit = 0;
-	u32 cached_ntu;
 	bool failure;
-	u32 first;
 
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	if (xdp_prog) {
···
 			xdp_prepare_buff(xdp, hard_start, offset, size, !!offset);
 			xdp_buff_clear_frags_flag(xdp);
 		} else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) {
+			ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc, ICE_XDP_CONSUMED);
 			break;
 		}
 		if (++ntc == cnt)
···
 		if (ice_is_non_eop(rx_ring, rx_desc))
 			continue;
 
-		ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc);
-		if (rx_buf->act == ICE_XDP_PASS)
+		ice_get_pgcnts(rx_ring);
+		xdp_verdict = ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_desc);
+		if (xdp_verdict == ICE_XDP_PASS)
 			goto construct_skb;
 		total_rx_bytes += xdp_get_buff_len(xdp);
 		total_rx_pkts++;
 
-		xdp->data = NULL;
-		rx_ring->first_desc = ntc;
-		rx_ring->nr_frags = 0;
+		ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
+
 		continue;
construct_skb:
 		if (likely(ice_ring_uses_build_skb(rx_ring)))
···
 		/* exit if we failed to retrieve a buffer */
 		if (!skb) {
 			rx_ring->ring_stats->rx_stats.alloc_page_failed++;
-			rx_buf->act = ICE_XDP_CONSUMED;
-			if (unlikely(xdp_buff_has_frags(xdp)))
-				ice_set_rx_bufs_act(xdp, rx_ring,
-						    ICE_XDP_CONSUMED);
-			xdp->data = NULL;
-			rx_ring->first_desc = ntc;
-			rx_ring->nr_frags = 0;
-			break;
+			xdp_verdict = ICE_XDP_CONSUMED;
 		}
-		xdp->data = NULL;
-		rx_ring->first_desc = ntc;
-		rx_ring->nr_frags = 0;
+		ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
+
+		if (!skb)
+			break;
 
 		stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
 		if (unlikely(ice_test_staterr(rx_desc->wb.status_error0,
···
 		total_rx_pkts++;
 	}
 
-	first = rx_ring->first_desc;
-	while (cached_ntc != first) {
-		struct ice_rx_buf *buf = &rx_ring->rx_buf[cached_ntc];
-
-		if (buf->act & (ICE_XDP_TX | ICE_XDP_REDIR)) {
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-			xdp_xmit |= buf->act;
-		} else if (buf->act & ICE_XDP_CONSUMED) {
-			buf->pagecnt_bias++;
-		} else if (buf->act == ICE_XDP_PASS) {
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-		}
-
-		ice_put_rx_buf(rx_ring, buf);
-		if (++cached_ntc >= cnt)
-			cached_ntc = 0;
-	}
 	rx_ring->next_to_clean = ntc;
 	/* return up to cleaned_count buffers to hardware */
 	failure = ice_alloc_rx_bufs(rx_ring, ICE_RX_DESC_UNUSED(rx_ring));
drivers/net/ethernet/intel/ice/ice_txrx.h (-1)
···
 	struct page *page;
 	unsigned int page_offset;
 	unsigned int pgcnt;
-	unsigned int act;
 	unsigned int pagecnt_bias;
 };
drivers/net/ethernet/intel/ice/ice_txrx_lib.h (-43)
···
 #include "ice.h"
 
 /**
- * ice_set_rx_bufs_act - propagate Rx buffer action to frags
- * @xdp: XDP buffer representing frame (linear and frags part)
- * @rx_ring: Rx ring struct
- * act: action to store onto Rx buffers related to XDP buffer parts
- *
- * Set action that should be taken before putting Rx buffer from first frag
- * to the last.
- */
-static inline void
-ice_set_rx_bufs_act(struct xdp_buff *xdp, const struct ice_rx_ring *rx_ring,
-		    const unsigned int act)
-{
-	u32 sinfo_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
-	u32 nr_frags = rx_ring->nr_frags + 1;
-	u32 idx = rx_ring->first_desc;
-	u32 cnt = rx_ring->count;
-	struct ice_rx_buf *buf;
-
-	for (int i = 0; i < nr_frags; i++) {
-		buf = &rx_ring->rx_buf[idx];
-		buf->act = act;
-
-		if (++idx == cnt)
-			idx = 0;
-	}
-
-	/* adjust pagecnt_bias on frags freed by XDP prog */
-	if (sinfo_frags < rx_ring->nr_frags && act == ICE_XDP_CONSUMED) {
-		u32 delta = rx_ring->nr_frags - sinfo_frags;
-
-		while (delta) {
-			if (idx == 0)
-				idx = cnt - 1;
-			else
-				idx--;
-			buf = &rx_ring->rx_buf[idx];
-			buf->pagecnt_bias--;
-			delta--;
-		}
-	}
-}
-
-/**
  * ice_test_staterr - tests bits in Rx descriptor status and error fields
  * @status_err_n: Rx descriptor status_error0 or status_error1 bits
  * @stat_err_bits: value to mask
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c (+17, -18)
···
 	u32 chan = 0;
 	u8 qmode = 0;
 
+	if (rxfifosz == 0)
+		rxfifosz = priv->dma_cap.rx_fifo_size;
+	if (txfifosz == 0)
+		txfifosz = priv->dma_cap.tx_fifo_size;
+
 	/* Split up the shared Tx/Rx FIFO memory on DW QoS Eth and DW XGMAC */
 	if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
 		rxfifosz /= rx_channels_count;
···
 	u32 tx_channels_count = priv->plat->tx_queues_to_use;
 	int rxfifosz = priv->plat->rx_fifo_size;
 	int txfifosz = priv->plat->tx_fifo_size;
+
+	if (rxfifosz == 0)
+		rxfifosz = priv->dma_cap.rx_fifo_size;
+	if (txfifosz == 0)
+		txfifosz = priv->dma_cap.tx_fifo_size;
 
 	/* Adjust for real per queue fifo size */
 	rxfifosz /= rx_channels_count;
···
 	const int mtu = new_mtu;
 	int ret;
 
+	if (txfifosz == 0)
+		txfifosz = priv->dma_cap.tx_fifo_size;
+
 	txfifosz /= priv->plat->tx_queues_to_use;
 
 	if (stmmac_xdp_is_enabled(priv) && new_mtu > ETH_DATA_LEN) {
···
 		priv->plat->tx_queues_to_use = priv->dma_cap.number_tx_queues;
 	}
 
-	if (!priv->plat->rx_fifo_size) {
-		if (priv->dma_cap.rx_fifo_size) {
-			priv->plat->rx_fifo_size = priv->dma_cap.rx_fifo_size;
-		} else {
-			dev_err(priv->device, "Can't specify Rx FIFO size\n");
-			return -ENODEV;
-		}
-	} else if (priv->dma_cap.rx_fifo_size &&
-		   priv->plat->rx_fifo_size > priv->dma_cap.rx_fifo_size) {
+	if (priv->dma_cap.rx_fifo_size &&
+	    priv->plat->rx_fifo_size > priv->dma_cap.rx_fifo_size) {
 		dev_warn(priv->device,
 			 "Rx FIFO size (%u) exceeds dma capability\n",
 			 priv->plat->rx_fifo_size);
 		priv->plat->rx_fifo_size = priv->dma_cap.rx_fifo_size;
 	}
-	if (!priv->plat->tx_fifo_size) {
-		if (priv->dma_cap.tx_fifo_size) {
-			priv->plat->tx_fifo_size = priv->dma_cap.tx_fifo_size;
-		} else {
-			dev_err(priv->device, "Can't specify Tx FIFO size\n");
-			return -ENODEV;
-		}
-	} else if (priv->dma_cap.tx_fifo_size &&
-		   priv->plat->tx_fifo_size > priv->dma_cap.tx_fifo_size) {
+	if (priv->dma_cap.tx_fifo_size &&
+	    priv->plat->tx_fifo_size > priv->dma_cap.tx_fifo_size) {
 		dev_warn(priv->device,
 			 "Tx FIFO size (%u) exceeds dma capability\n",
 			 priv->plat->tx_fifo_size);
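The stmmac change above moves the "platform value, else DMA-capability value" fallback to the point of use and keeps only the clamp at probe time. A tiny sketch of the resulting effective-size logic, with illustrative names (not the driver's API):

```c
#include <stdint.h>

/* Effective FIFO size: prefer the platform-provided value, fall back
 * to the size probed from the DMA capability register when the
 * platform leaves it at 0, and clamp to what the hardware reports
 * (when it reports anything) - mirroring the change above. */
static uint32_t effective_fifo_size(uint32_t plat_size, uint32_t dma_cap_size)
{
    uint32_t size = plat_size ? plat_size : dma_cap_size;

    if (dma_cap_size && size > dma_cap_size)
        size = dma_cap_size;
    return size;
}
```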
···
 	if (likely(cpu < tq_number))
 		tq = &adapter->tx_queue[cpu];
 	else
-		tq = &adapter->tx_queue[reciprocal_scale(cpu, tq_number)];
+		tq = &adapter->tx_queue[cpu % tq_number];
 
 	return tq;
 }
···
 	u32 buf_size;
 	u32 dw2;
 
+	spin_lock_irq(&tq->tx_lock);
 	dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
 	dw2 |= xdpf->len;
 	ctx.sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill;
···
 
 	if (vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) == 0) {
 		tq->stats.tx_ring_full++;
+		spin_unlock_irq(&tq->tx_lock);
 		return -ENOSPC;
 	}
 
···
 		tbi->dma_addr = dma_map_single(&adapter->pdev->dev,
 					       xdpf->data, buf_size,
 					       DMA_TO_DEVICE);
-		if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr))
+		if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr)) {
+			spin_unlock_irq(&tq->tx_lock);
 			return -EFAULT;
+		}
 		tbi->map_type |= VMXNET3_MAP_SINGLE;
 	} else { /* XDP buffer from page pool */
 		page = virt_to_page(xdpf->data);
···
 	dma_wmb();
 	gdesc->dword[2] = cpu_to_le32(le32_to_cpu(gdesc->dword[2]) ^
 				      VMXNET3_TXD_GEN);
+	spin_unlock_irq(&tq->tx_lock);
 
 	/* No need to handle the case when tx_num_deferred doesn't reach
 	 * threshold. Backend driver at hypervisor side will poll and reset
···
 {
 	struct vmxnet3_adapter *adapter = netdev_priv(dev);
 	struct vmxnet3_tx_queue *tq;
+	struct netdev_queue *nq;
 	int i;
 
 	if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
···
 	if (tq->stopped)
 		return -ENETDOWN;
 
+	nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
+
+	__netif_tx_lock(nq, smp_processor_id());
 	for (i = 0; i < n; i++) {
 		if (vmxnet3_xdp_xmit_frame(adapter, frames[i], tq, true)) {
 			tq->stats.xdp_xmit_err++;
···
 		}
 	}
 	tq->stats.xdp_xmit += i;
+	__netif_tx_unlock(nq);
 
 	return i;
 }
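The queue-selection change above swaps reciprocal_scale() for a plain modulo, so any CPU index above the queue count still maps deterministically into range while CPUs below it keep their identity mapping. A minimal sketch of the resulting mapping (illustrative function name, not the driver's):

```c
#include <stdint.h>

/* Pick a Tx queue index for a CPU: identity while cpu < nqueues,
 * plain modulo wrap-around otherwise, as in the vmxnet3 change. */
static uint32_t pick_tx_queue(uint32_t cpu, uint32_t nqueues)
{
    return cpu < nqueues ? cpu : cpu % nqueues;
}
```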
drivers/nvme/host/core.c (+7, -1)
···
 
 	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
 			&result);
-	if (status < 0)
+
+	/*
+	 * It's either a kernel error or the host observed a connection
+	 * lost. In either case it's not possible communicate with the
+	 * controller and thus enter the error code path.
+	 */
+	if (status < 0 || status == NVME_SC_HOST_PATH_ERROR)
 		return status;
 
 	/*
drivers/nvme/host/fc.c (+25, -10)
···
 static void
 nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 {
+	enum nvme_ctrl_state state;
+	unsigned long flags;
+
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: controller connectivity lost. Awaiting "
 		"Reconnect", ctrl->cnum);
 
-	switch (nvme_ctrl_state(&ctrl->ctrl)) {
+	spin_lock_irqsave(&ctrl->lock, flags);
+	set_bit(ASSOC_FAILED, &ctrl->flags);
+	state = nvme_ctrl_state(&ctrl->ctrl);
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	switch (state) {
 	case NVME_CTRL_NEW:
 	case NVME_CTRL_LIVE:
 		/*
···
 	nvme_fc_complete_rq(rq);
 
check_error:
-	if (terminate_assoc && ctrl->ctrl.state != NVME_CTRL_RESETTING)
+	if (terminate_assoc &&
+	    nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
 		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
 }
 
···
 static void
 nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 {
+	enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
+
 	/*
 	 * if an error (io timeout, etc) while (re)connecting, the remote
 	 * port requested terminating of the association (disconnect_ls)
···
 	 * the controller. Abort any ios on the association and let the
 	 * create_association error path resolve things.
 	 */
-	if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
+	if (state == NVME_CTRL_CONNECTING) {
 		__nvme_fc_abort_outstanding_ios(ctrl, true);
-		set_bit(ASSOC_FAILED, &ctrl->flags);
 		dev_warn(ctrl->ctrl.device,
 			"NVME-FC{%d}: transport error during (re)connect\n",
 			ctrl->cnum);
···
 	}
 
 	/* Otherwise, only proceed if in LIVE state - e.g. on first error */
-	if (ctrl->ctrl.state != NVME_CTRL_LIVE)
+	if (state != NVME_CTRL_LIVE)
 		return;
 
 	dev_warn(ctrl->ctrl.device,
···
 		else
 			ret = nvme_fc_recreate_io_queues(ctrl);
 	}
-	if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
-		ret = -EIO;
 	if (ret)
 		goto out_term_aen_ops;
 
-	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	spin_lock_irqsave(&ctrl->lock, flags);
+	if (!test_bit(ASSOC_FAILED, &ctrl->flags))
+		changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	else
+		ret = -EIO;
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+
+	if (ret)
+		goto out_term_aen_ops;
 
 	ctrl->ctrl.nr_reconnects = 0;
 
···
 	list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
 	spin_unlock_irqrestore(&rport->lock, flags);
 
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
-	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		dev_err(ctrl->ctrl.device,
 			"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
 		goto fail_ctrl;
drivers/nvme/host/pci.c (+3, -9)
···
 	return 0;
 
out_free_bufs:
-	while (--i >= 0) {
-		size_t size = le32_to_cpu(descs[i].size) * NVME_CTRL_PAGE_SIZE;
-
-		dma_free_attrs(dev->dev, size, bufs[i],
-			       le64_to_cpu(descs[i].addr),
-			       DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
-	}
-
 	kfree(bufs);
out_free_descs:
 	dma_free_coherent(dev->dev, descs_size, descs, descs_dma);
···
 		 * because of high power consumption (> 2 Watt) in s2idle
 		 * sleep. Only some boards with Intel CPU are affected.
 		 */
-		if (dmi_match(DMI_BOARD_NAME, "GMxPXxx") ||
+		if (dmi_match(DMI_BOARD_NAME, "DN50Z-140HC-YD") ||
+		    dmi_match(DMI_BOARD_NAME, "GMxPXxx") ||
+		    dmi_match(DMI_BOARD_NAME, "GXxMRXx") ||
 		    dmi_match(DMI_BOARD_NAME, "PH4PG31") ||
 		    dmi_match(DMI_BOARD_NAME, "PH4PRX1_PH6PRX1") ||
 		    dmi_match(DMI_BOARD_NAME, "PH6PG01_PH6PG71"))
···
 	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
 	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
 
-	if (parent->state_saved)
-		return;
-
 	/*
 	 * Save parent's L1 substate configuration so we have it for
 	 * pci_restore_aspm_l1ss_state(pdev) to restore.
···
  * IFS Image
  * ---------
  *
- * Intel provides a firmware file containing the scan tests via
- * github [#f1]_. Similar to microcode there is a separate file for each
+ * Intel provides firmware files containing the scan tests via the webpage [#f1]_.
+ * Look under "In-Field Scan Test Images Download" section towards the
+ * end of the page. Similar to microcode, there are separate files for each
  * family-model-stepping. IFS Images are not applicable for some test types.
  * Wherever applicable the sysfs directory would provide a "current_batch" file
  * (see below) for loading the image.
  *
+ * .. [#f1] https://intel.com/InFieldScan
  *
  * IFS Image Loading
  * -----------------
···
  *
  * 2) Hardware allows for some number of cores to be tested in parallel.
  * The driver does not make use of this, it only tests one core at a time.
- *
- * .. [#f1] https://github.com/intel/TBD
- *
  *
  * Structural Based Functional Test at Field (SBAF):
  * -------------------------------------------------
···
 		case 0x1a: /* start stop unit in progress */
 		case 0x1b: /* sanitize in progress */
 		case 0x1d: /* configuration in progress */
-		case 0x24: /* depopulation in progress */
-		case 0x25: /* depopulation restore in progress */
 			action = ACTION_DELAYED_RETRY;
 			break;
 		case 0x0a: /* ALUA state transition */
 			action = ACTION_DELAYED_REPREP;
 			break;
+		/*
+		 * Depopulation might take many hours,
+		 * thus it is not worthwhile to retry.
+		 */
+		case 0x24: /* depopulation in progress */
+		case 0x25: /* depopulation restore in progress */
+			fallthrough;
 		default:
 			action = ACTION_FAIL;
 			break;
drivers/scsi/scsi_lib_test.c (+7)
···
 	};
 	int i;
 
+	/* Success */
+	sc.result = 0;
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, &failures));
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, NULL));
+	/* Command failed but caller did not pass in a failures array */
+	scsi_build_sense(&sc, 0, ILLEGAL_REQUEST, 0x91, 0x36);
+	KUNIT_EXPECT_EQ(test, 0, scsi_check_passthrough(&sc, NULL));
 	/* Match end of array */
 	scsi_build_sense(&sc, 0, ILLEGAL_REQUEST, 0x91, 0x36);
 	KUNIT_EXPECT_EQ(test, -EAGAIN, scsi_check_passthrough(&sc, &failures));
drivers/scsi/scsi_scan.c (+1, -1)
···
 	}
 	ret = sbitmap_init_node(&sdev->budget_map,
 				scsi_device_max_queue_depth(sdev),
-				new_shift, GFP_KERNEL,
+				new_shift, GFP_NOIO,
 				sdev->request_queue->node, false, true);
 	if (!ret)
 		sbitmap_resize(&sdev->budget_map, depth);
···
 			rcu_read_unlock();
 			mutex_lock(&bc->table.mutex);
 			mutex_unlock(&bc->table.mutex);
-			rcu_read_lock();
 			continue;
 		}
 		for (i = 0; i < tbl->size; i++)
fs/bcachefs/buckets_waiting_for_journal.c (+5, -7)
···
 	memset(t->d, 0, sizeof(t->d[0]) << t->bits);
 }
 
-bool bch2_bucket_needs_journal_commit(struct buckets_waiting_for_journal *b,
-				      u64 flushed_seq,
-				      unsigned dev, u64 bucket)
+u64 bch2_bucket_journal_seq_ready(struct buckets_waiting_for_journal *b,
+				  unsigned dev, u64 bucket)
 {
 	struct buckets_waiting_for_journal_table *t;
 	u64 dev_bucket = (u64) dev << 56 | bucket;
-	bool ret = false;
-	unsigned i;
+	u64 ret = 0;
 
 	mutex_lock(&b->lock);
 	t = b->t;
 
-	for (i = 0; i < ARRAY_SIZE(t->hash_seeds); i++) {
+	for (unsigned i = 0; i < ARRAY_SIZE(t->hash_seeds); i++) {
 		struct bucket_hashed *h = bucket_hash(t, i, dev_bucket);
 
 		if (h->dev_bucket == dev_bucket) {
-			ret = h->journal_seq > flushed_seq;
+			ret = h->journal_seq;
 			break;
 		}
 	}
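The API change above returns the bucket's recorded journal sequence number (0 when the bucket has no entry) instead of a precomputed boolean, so the caller can both test readiness and know which sequence to flush. A sketch of the caller-side check that preserves the old `seq > flushed_seq` semantics; the function name is illustrative:

```c
#include <stdint.h>

/* Old-style boolean check rebuilt on top of the new "return the seq"
 * API: a bucket still needs a journal commit when its recorded seq is
 * newer than the last flushed seq. A returned 0 means "no entry" and
 * is never newer than any flushed seq. */
static int bucket_needs_journal_commit(uint64_t bucket_seq,
                                       uint64_t flushed_seq)
{
    return bucket_seq > flushed_seq;
}
```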
···12291229 */12301230 if (WARN_ON_ONCE(len >= ordered->num_bytes))12311231 return ERR_PTR(-EINVAL);12321232+ /*12331233+ * If our ordered extent had an error there's no point in continuing.12341234+ * The error may have come from a transaction abort done either by this12351235+ * task or some other concurrent task, and the transaction abort path12361236+ * iterates over all existing ordered extents and sets the flag12371237+ * BTRFS_ORDERED_IOERR on them.12381238+ */12391239+ if (unlikely(flags & (1U << BTRFS_ORDERED_IOERR))) {12401240+ const int fs_error = BTRFS_FS_ERROR(fs_info);12411241+12421242+ return fs_error ? ERR_PTR(fs_error) : ERR_PTR(-EIO);12431243+ }12321244 /* We cannot split partially completed ordered extents. */12331245 if (ordered->bytes_left) {12341246 ASSERT(!(flags & ~BTRFS_ORDERED_TYPE_FLAGS));
+5-6
fs/btrfs/qgroup.c
···18801880 * Commit current transaction to make sure all the rfer/excl numbers18811881 * get updated.18821882 */18831883- trans = btrfs_start_transaction(fs_info->quota_root, 0);18841884- if (IS_ERR(trans))18851885- return PTR_ERR(trans);18861886-18871887- ret = btrfs_commit_transaction(trans);18831883+ ret = btrfs_commit_current_transaction(fs_info->quota_root);18881884 if (ret < 0)18891885 return ret;18901886···18931897 /*18941898 * It's squota and the subvolume still has numbers needed for future18951899 * accounting, in this case we can not delete it. Just skip it.19001900+ *19011901+ * Or the qgroup is already removed by a qgroup rescan. For both cases we're19021902+ * safe to ignore them.18961903 */18971897- if (ret == -EBUSY)19041904+ if (ret == -EBUSY || ret == -ENOENT)18981905 ret = 0;18991906 return ret;19001907}
+3-1
fs/btrfs/transaction.c
···274274 cur_trans = fs_info->running_transaction;275275 if (cur_trans) {276276 if (TRANS_ABORTED(cur_trans)) {277277+ const int abort_error = cur_trans->aborted;278278+277279 spin_unlock(&fs_info->trans_lock);278278- return cur_trans->aborted;280280+ return abort_error;279281 }280282 if (btrfs_blocked_trans_types[cur_trans->state] & type) {281283 spin_unlock(&fs_info->trans_lock);
+3-3
fs/dcache.c
···17001700 smp_store_release(&dentry->d_name.name, dname); /* ^^^ */1701170117021702 dentry->d_flags = 0;17031703- lockref_init(&dentry->d_lockref, 1);17031703+ lockref_init(&dentry->d_lockref);17041704 seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);17051705 dentry->d_inode = NULL;17061706 dentry->d_parent = dentry;···29662966 goto out_err;29672967 m2 = &alias->d_parent->d_inode->i_rwsem;29682968out_unalias:29692969- if (alias->d_op->d_unalias_trylock &&29692969+ if (alias->d_op && alias->d_op->d_unalias_trylock &&29702970 !alias->d_op->d_unalias_trylock(alias))29712971 goto out_err;29722972 __d_move(alias, dentry, false);29732973- if (alias->d_op->d_unalias_unlock)29732973+ if (alias->d_op && alias->d_op->d_unalias_unlock)29742974 alias->d_op->d_unalias_unlock(alias);29752975 ret = 0;29762976out_err:
+1-1
fs/erofs/zdata.c
···726726 if (IS_ERR(pcl))727727 return PTR_ERR(pcl);728728729729- lockref_init(&pcl->lockref, 1); /* one ref for this request */729729+ lockref_init(&pcl->lockref); /* one ref for this request */730730 pcl->algorithmformat = map->m_algorithmformat;731731 pcl->length = 0;732732 pcl->partial = true;
+16
fs/file_table.c
···194194 * refcount bumps we should reinitialize the reused file first.195195 */196196 file_ref_init(&f->f_ref, 1);197197+ /*198198+ * Disable permission and pre-content events for all files by default.199199+ * They may be enabled later by file_set_fsnotify_mode_from_watchers().200200+ */201201+ file_set_fsnotify_mode(f, FMODE_NONOTIFY_PERM);197202 return 0;198203}199204···380375 if (IS_ERR(file)) {381376 ihold(inode);382377 path_put(&path);378378+ return file;383379 }380380+ /*381381+ * Disable all fsnotify events for pseudo files by default.382382+ * They may be enabled by caller with file_set_fsnotify_mode().383383+ */384384+ file_set_fsnotify_mode(file, FMODE_NONOTIFY);384385 return file;385386}386387EXPORT_SYMBOL(alloc_file_pseudo);···411400 return file;412401 }413402 file_init_path(file, &path, fops);403403+ /*404404+ * Disable all fsnotify events for pseudo files by default.405405+ * They may be enabled by caller with file_set_fsnotify_mode().406406+ */407407+ file_set_fsnotify_mode(file, FMODE_NONOTIFY);414408 return file;415409}416410EXPORT_SYMBOL_GPL(alloc_file_pseudo_noaccount);
···50875087{50885088 struct vfsmount *mnt = s->mnt;50895089 struct super_block *sb = mnt->mnt_sb;50905090+ size_t start = seq->count;50905091 int err;5091509250935093+ err = security_sb_show_options(seq, sb);50945094+ if (err)50955095+ return err;50965096+50925097 if (sb->s_op->show_options) {50935093- size_t start = seq->count;50945094-50955095- err = security_sb_show_options(seq, sb);50965096- if (err)50975097- return err;50985098-50995098 err = sb->s_op->show_options(seq, mnt->mnt_root);51005099 if (err)51015100 return err;51025102-51035103- if (unlikely(seq_has_overflowed(seq)))51045104- return -EAGAIN;51055105-51065106- if (seq->count == start)51075107- return 0;51085108-51095109- /* skip leading comma */51105110- memmove(seq->buf + start, seq->buf + start + 1,51115111- seq->count - start - 1);51125112- seq->count--;51135101 }51025102+51035103+ if (unlikely(seq_has_overflowed(seq)))51045104+ return -EAGAIN;51055105+51065106+ if (seq->count == start)51075107+ return 0;51085108+51095109+ /* skip leading comma */51105110+ memmove(seq->buf + start, seq->buf + start + 1,51115111+ seq->count - start - 1);51125112+ seq->count--;5114511351155114 return 0;51165115}···51905191 size_t kbufsize;51915192 struct seq_file *seq = &s->seq;51925193 struct statmount *sm = &s->sm;51935193- u32 start = seq->count;51945194+ u32 start, *offp;51955195+51965196+ /* Reserve an empty string at the beginning for any unset offsets */51975197+ if (!seq->count)51985198+ seq_putc(seq, 0);51995199+52005200+ start = seq->count;5194520151955202 switch (flag) {51965203 case STATMOUNT_FS_TYPE:51975197- sm->fs_type = start;52045204+ offp = &sm->fs_type;51985205 ret = statmount_fs_type(s, seq);51995206 break;52005207 case STATMOUNT_MNT_ROOT:52015201- sm->mnt_root = start;52085208+ offp = &sm->mnt_root;52025209 ret = statmount_mnt_root(s, seq);52035210 break;52045211 case STATMOUNT_MNT_POINT:52055205- sm->mnt_point = start;52125212+ offp = &sm->mnt_point;52065213 ret = statmount_mnt_point(s, 
seq);52075214 break;52085215 case STATMOUNT_MNT_OPTS:52095209- sm->mnt_opts = start;52165216+ offp = &sm->mnt_opts;52105217 ret = statmount_mnt_opts(s, seq);52115218 break;52125219 case STATMOUNT_OPT_ARRAY:52135213- sm->opt_array = start;52205220+ offp = &sm->opt_array;52145221 ret = statmount_opt_array(s, seq);52155222 break;52165223 case STATMOUNT_OPT_SEC_ARRAY:52175217- sm->opt_sec_array = start;52245224+ offp = &sm->opt_sec_array;52185225 ret = statmount_opt_sec_array(s, seq);52195226 break;52205227 case STATMOUNT_FS_SUBTYPE:52215221- sm->fs_subtype = start;52285228+ offp = &sm->fs_subtype;52225229 statmount_fs_subtype(s, seq);52235230 break;52245231 case STATMOUNT_SB_SOURCE:52255225- sm->sb_source = start;52325232+ offp = &sm->sb_source;52265233 ret = statmount_sb_source(s, seq);52275234 break;52285235 default:···5256525152575252 seq->buf[seq->count++] = '\0';52585253 sm->mask |= flag;52545254+ *offp = start;52595255 return 0;52605256}52615257
+12-6
fs/notify/fsnotify.c
···648648 * Later, fsnotify permission hooks do not check if there are permission event649649 * watches, but that there were permission event watches at open time.650650 */651651-void file_set_fsnotify_mode(struct file *file)651651+void file_set_fsnotify_mode_from_watchers(struct file *file)652652{653653 struct dentry *dentry = file->f_path.dentry, *parent;654654 struct super_block *sb = dentry->d_sb;···665665 */666666 if (likely(!fsnotify_sb_has_priority_watchers(sb,667667 FSNOTIFY_PRIO_CONTENT))) {668668- file->f_mode |= FMODE_NONOTIFY_PERM;668668+ file_set_fsnotify_mode(file, FMODE_NONOTIFY_PERM);669669 return;670670 }671671···676676 if ((!d_is_dir(dentry) && !d_is_reg(dentry)) ||677677 likely(!fsnotify_sb_has_priority_watchers(sb,678678 FSNOTIFY_PRIO_PRE_CONTENT))) {679679- file->f_mode |= FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;679679+ file_set_fsnotify_mode(file, FMODE_NONOTIFY | FMODE_NONOTIFY_PERM);680680 return;681681 }682682···686686 */687687 mnt_mask = READ_ONCE(real_mount(file->f_path.mnt)->mnt_fsnotify_mask);688688 if (unlikely(fsnotify_object_watched(d_inode(dentry), mnt_mask,689689- FSNOTIFY_PRE_CONTENT_EVENTS)))689689+ FSNOTIFY_PRE_CONTENT_EVENTS))) {690690+ /* Enable pre-content events */691691+ file_set_fsnotify_mode(file, 0);690692 return;693693+ }691694692695 /* Is parent watching for pre-content events on this file? */693696 if (dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) {694697 parent = dget_parent(dentry);695698 p_mask = fsnotify_inode_watches_children(d_inode(parent));696699 dput(parent);697697- if (p_mask & FSNOTIFY_PRE_CONTENT_EVENTS)700700+ if (p_mask & FSNOTIFY_PRE_CONTENT_EVENTS) {701701+ /* Enable pre-content events */702702+ file_set_fsnotify_mode(file, 0);698703 return;704704+ }699705 }700706 /* Nobody watching for pre-content events from this file */701701- file->f_mode |= FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;707707+ file_set_fsnotify_mode(file, FMODE_NONOTIFY | FMODE_NONOTIFY_PERM);702708}703709#endif704710
+6-5
fs/open.c
···905905 f->f_sb_err = file_sample_sb_err(f);906906907907 if (unlikely(f->f_flags & O_PATH)) {908908- f->f_mode = FMODE_PATH | FMODE_OPENED | FMODE_NONOTIFY;908908+ f->f_mode = FMODE_PATH | FMODE_OPENED;909909+ file_set_fsnotify_mode(f, FMODE_NONOTIFY);909910 f->f_op = &empty_fops;910911 return 0;911912 }···936935937936 /*938937 * Set FMODE_NONOTIFY_* bits according to existing permission watches.939939- * If FMODE_NONOTIFY was already set for an fanotify fd, this doesn't940940- * change anything.938938+ * If FMODE_NONOTIFY mode was already set for an fanotify fd or for a939939+ * pseudo file, this call will not change the mode.941940 */942942- file_set_fsnotify_mode(f);941941+ file_set_fsnotify_mode_from_watchers(f);943942 error = fsnotify_open_perm(f);944943 if (error)945944 goto cleanup_all;···11231122 if (!IS_ERR(f)) {11241123 int error;1125112411261126- f->f_mode |= FMODE_NONOTIFY;11251125+ file_set_fsnotify_mode(f, FMODE_NONOTIFY);11271126 error = vfs_open(path, f);11281127 if (error) {11291128 fput(f);
+11-1
fs/pidfs.c
···287287 switch (cmd) {288288 case FS_IOC_GETVERSION:289289 case PIDFD_GET_CGROUP_NAMESPACE:290290- case PIDFD_GET_INFO:291290 case PIDFD_GET_IPC_NAMESPACE:292291 case PIDFD_GET_MNT_NAMESPACE:293292 case PIDFD_GET_NET_NAMESPACE:···297298 case PIDFD_GET_USER_NAMESPACE:298299 case PIDFD_GET_PID_NAMESPACE:299300 return true;301301+ }302302+303303+ /* Extensible ioctls require some more careful checks. */304304+ switch (_IOC_NR(cmd)) {305305+ case _IOC_NR(PIDFD_GET_INFO):306306+ /*307307+ * Try to prevent performing a pidfd ioctl when someone308308+ * erronously mistook the file descriptor for a pidfd.309309+ * This is not perfect but will catch most cases.310310+ */311311+ return (_IOC_TYPE(cmd) == _IOC_TYPE(PIDFD_GET_INFO));300312 }301313302314 return false;
···35633563 int error;3564356435653565 /*35663566- * If there are already extents in the file, try an exact EOF block35673567- * allocation to extend the file as a contiguous extent. If that fails,35683568- * or it's the first allocation in a file, just try for a stripe aligned35693569- * allocation.35663566+ * If there are already extents in the file, and xfs_bmap_adjacent() has35673567+ * given a better blkno, try an exact EOF block allocation to extend the35683568+ * file as a contiguous extent. If that fails, or it's the first35693569+ * allocation in a file, just try for a stripe aligned allocation.35703570 */35713571- if (ap->offset) {35713571+ if (ap->eof) {35723572 xfs_extlen_t nextminlen = 0;3573357335743574 /*···37363736 int error;3737373737383738 ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);37393739- xfs_bmap_adjacent(ap);37393739+ if (!xfs_bmap_adjacent(ap))37403740+ ap->eof = false;3740374137413742 /*37423743 * Search for an allocation group with a single extent large enough for
+17-19
fs/xfs/xfs_buf.c
···4141 *4242 * xfs_buf_rele:4343 * b_lock4444- * pag_buf_lock4545- * lru_lock4444+ * lru_lock4645 *4746 * xfs_buftarg_drain_rele4847 * lru_lock···219220 */220221 flags &= ~(XBF_UNMAPPED | XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD);221222222222- spin_lock_init(&bp->b_lock);223223+ /*224224+ * A new buffer is held and locked by the owner. This ensures that the225225+ * buffer is owned by the caller and racing RCU lookups right after226226+ * inserting into the hash table are safe (and will have to wait for227227+ * the unlock to do anything non-trivial).228228+ */223229 bp->b_hold = 1;230230+ sema_init(&bp->b_sema, 0); /* held, no waiters */231231+232232+ spin_lock_init(&bp->b_lock);224233 atomic_set(&bp->b_lru_ref, 1);225234 init_completion(&bp->b_iowait);226235 INIT_LIST_HEAD(&bp->b_lru);227236 INIT_LIST_HEAD(&bp->b_list);228237 INIT_LIST_HEAD(&bp->b_li_list);229229- sema_init(&bp->b_sema, 0); /* held, no waiters */230238 bp->b_target = target;231239 bp->b_mount = target->bt_mount;232240 bp->b_flags = flags;233241234234- /*235235- * Set length and io_length to the same value initially.236236- * I/O routines should use io_length, which will be the same in237237- * most cases but may be reset (e.g. XFS recovery).238238- */239242 error = xfs_buf_get_maps(bp, nmaps);240243 if (error) {241244 kmem_cache_free(xfs_buf_cache, bp);···503502xfs_buf_cache_init(504503 struct xfs_buf_cache *bch)505504{506506- spin_lock_init(&bch->bc_lock);507505 return rhashtable_init(&bch->bc_hash, &xfs_buf_hash_params);508506}509507···652652 if (error)653653 goto out_free_buf;654654655655- spin_lock(&bch->bc_lock);655655+ /* The new buffer keeps the perag reference until it is freed. 
*/656656+ new_bp->b_pag = pag;657657+658658+ rcu_read_lock();656659 bp = rhashtable_lookup_get_insert_fast(&bch->bc_hash,657660 &new_bp->b_rhash_head, xfs_buf_hash_params);658661 if (IS_ERR(bp)) {662662+ rcu_read_unlock();659663 error = PTR_ERR(bp);660660- spin_unlock(&bch->bc_lock);661664 goto out_free_buf;662665 }663666 if (bp && xfs_buf_try_hold(bp)) {664667 /* found an existing buffer */665665- spin_unlock(&bch->bc_lock);668668+ rcu_read_unlock();666669 error = xfs_buf_find_lock(bp, flags);667670 if (error)668671 xfs_buf_rele(bp);···673670 *bpp = bp;674671 goto out_free_buf;675672 }673673+ rcu_read_unlock();676674677677- /* The new buffer keeps the perag reference until it is freed. */678678- new_bp->b_pag = pag;679679- spin_unlock(&bch->bc_lock);680675 *bpp = new_bp;681676 return 0;682677···10911090 }1092109110931092 /* we are asked to drop the last reference */10941094- spin_lock(&bch->bc_lock);10951093 __xfs_buf_ioacct_dec(bp);10961094 if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {10971095 /*···11021102 bp->b_state &= ~XFS_BSTATE_DISPOSE;11031103 else11041104 bp->b_hold--;11051105- spin_unlock(&bch->bc_lock);11061105 } else {11071106 bp->b_hold--;11081107 /*···11191120 ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));11201121 rhashtable_remove_fast(&bch->bc_hash, &bp->b_rhash_head,11211122 xfs_buf_hash_params);11221122- spin_unlock(&bch->bc_lock);11231123 if (pag)11241124 xfs_perag_put(pag);11251125 freebuf = true;
···329329 * successfully but before locks are dropped.330330 */331331332332-/* Verify that we have security clearance to perform this operation. */333333-static int334334-xfs_exchange_range_verify_area(335335- struct xfs_exchrange *fxr)336336-{337337- int ret;338338-339339- ret = remap_verify_area(fxr->file1, fxr->file1_offset, fxr->length,340340- true);341341- if (ret)342342- return ret;343343-344344- return remap_verify_area(fxr->file2, fxr->file2_offset, fxr->length,345345- true);346346-}347347-348332/*349333 * Performs necessary checks before doing a range exchange, having stabilized350334 * mutable inode attributes via i_rwsem.···339355 unsigned int alloc_unit)340356{341357 struct inode *inode1 = file_inode(fxr->file1);358358+ loff_t size1 = i_size_read(inode1);342359 struct inode *inode2 = file_inode(fxr->file2);360360+ loff_t size2 = i_size_read(inode2);343361 uint64_t allocmask = alloc_unit - 1;344362 int64_t test_len;345363 uint64_t blen;346346- loff_t size1, size2, tmp;364364+ loff_t tmp;347365 int error;348366349367 /* Don't touch certain kinds of inodes */···354368 if (IS_SWAPFILE(inode1) || IS_SWAPFILE(inode2))355369 return -ETXTBSY;356370357357- size1 = i_size_read(inode1);358358- size2 = i_size_read(inode2);359359-360371 /* Ranges cannot start after EOF. 
*/361372 if (fxr->file1_offset > size1 || fxr->file2_offset > size2)362373 return -EINVAL;363374364364- /*365365- * If the caller said to exchange to EOF, we set the length of the366366- * request large enough to cover everything to the end of both files.367367- */368375 if (fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF) {376376+ /*377377+ * If the caller said to exchange to EOF, we set the length of378378+ * the request large enough to cover everything to the end of379379+ * both files.380380+ */369381 fxr->length = max_t(int64_t, size1 - fxr->file1_offset,370382 size2 - fxr->file2_offset);371371-372372- error = xfs_exchange_range_verify_area(fxr);373373- if (error)374374- return error;383383+ } else {384384+ /*385385+ * Otherwise we require both ranges to end within EOF.386386+ */387387+ if (fxr->file1_offset + fxr->length > size1 ||388388+ fxr->file2_offset + fxr->length > size2)389389+ return -EINVAL;375390 }376391377392 /*···386399 /* Ensure offsets don't wrap. */387400 if (check_add_overflow(fxr->file1_offset, fxr->length, &tmp) ||388401 check_add_overflow(fxr->file2_offset, fxr->length, &tmp))389389- return -EINVAL;390390-391391- /*392392- * We require both ranges to end within EOF, unless we're exchanging393393- * to EOF.394394- */395395- if (!(fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF) &&396396- (fxr->file1_offset + fxr->length > size1 ||397397- fxr->file2_offset + fxr->length > size2))398402 return -EINVAL;399403400404 /*···725747{726748 struct inode *inode1 = file_inode(fxr->file1);727749 struct inode *inode2 = file_inode(fxr->file2);750750+ loff_t check_len = fxr->length;728751 int ret;729752730753 BUILD_BUG_ON(XFS_EXCHANGE_RANGE_ALL_FLAGS &···758779 return -EBADF;759780760781 /*761761- * If we're not exchanging to EOF, we can check the areas before762762- * stabilizing both files' i_size.782782+ * If we're exchanging to EOF we can't calculate the length until taking783783+ * the iolock. 
Pass a 0 length to remap_verify_area similar to the784784+ * FICLONE and FICLONERANGE ioctls that support cloning to EOF as well.763785 */764764- if (!(fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF)) {765765- ret = xfs_exchange_range_verify_area(fxr);766766- if (ret)767767- return ret;768768- }786786+ if (fxr->flags & XFS_EXCHANGE_RANGE_TO_EOF)787787+ check_len = 0;788788+ ret = remap_verify_area(fxr->file1, fxr->file1_offset, check_len, true);789789+ if (ret)790790+ return ret;791791+ ret = remap_verify_area(fxr->file2, fxr->file2_offset, check_len, true);792792+ if (ret)793793+ return ret;769794770795 /* Update cmtime if the fd/inode don't forbid it. */771796 if (!(fxr->file1->f_mode & FMODE_NOCMTIME) && !IS_NOCMTIME(inode1))
+5-2
fs/xfs/xfs_inode.c
···14041404 goto out;1405140514061406 /* Try to clean out the cow blocks if there are any. */14071407- if (xfs_inode_has_cow_data(ip))14081408- xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);14071407+ if (xfs_inode_has_cow_data(ip)) {14081408+ error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);14091409+ if (error)14101410+ goto out;14111411+ }1409141214101413 if (VFS_I(ip)->i_nlink != 0) {14111414 /*
+2-4
fs/xfs/xfs_iomap.c
···976976 if (!xfs_is_cow_inode(ip))977977 return 0;978978979979- if (!written) {980980- xfs_reflink_cancel_cow_range(ip, pos, length, true);981981- return 0;982982- }979979+ if (!written)980980+ return xfs_reflink_cancel_cow_range(ip, pos, length, true);983981984982 return xfs_reflink_end_cow(ip, pos, written);985983}
+1
include/asm-generic/vmlinux.lds.h
···10381038 *(.discard) \10391039 *(.discard.*) \10401040 *(.export_symbol) \10411041+ *(.no_trim_symbol) \10411042 *(.modinfo) \10421043 /* ld.bfd warns about .gnu.version* even when not emitted */ \10431044 *(.gnu.version*) \
···191191 __v; \192192})193193194194+#ifdef __CHECKER__195195+#define __BUILD_BUG_ON_ZERO_MSG(e, msg) (0)196196+#else /* __CHECKER__ */197197+#define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))198198+#endif /* __CHECKER__ */199199+200200+/* &a[0] degrades to a pointer: a different type from an array */201201+#define __is_array(a) (!__same_type((a), &(a)[0]))202202+#define __must_be_array(a) __BUILD_BUG_ON_ZERO_MSG(!__is_array(a), \203203+ "must be array")204204+205205+#define __is_byte_array(a) (__is_array(a) && sizeof((a)[0]) == 1)206206+#define __must_be_byte_array(a) __BUILD_BUG_ON_ZERO_MSG(!__is_byte_array(a), \207207+ "must be byte array")208208+209209+/* Require C Strings (i.e. NUL-terminated) lack the "nonstring" attribute. */210210+#define __must_be_cstr(p) \211211+ __BUILD_BUG_ON_ZERO_MSG(__annotated(p, nonstring), "must be cstr (NUL-terminated)")212212+194213#endif /* __KERNEL__ */195214196215/**···249230 .popsection;250231251232#define __ADDRESSABLE_ASM_STR(sym) __stringify(__ADDRESSABLE_ASM(sym))252252-253253-#ifdef __CHECKER__254254-#define __BUILD_BUG_ON_ZERO_MSG(e, msg) (0)255255-#else /* __CHECKER__ */256256-#define __BUILD_BUG_ON_ZERO_MSG(e, msg) ((int)sizeof(struct {_Static_assert(!(e), msg);}))257257-#endif /* __CHECKER__ */258258-259259-/* &a[0] degrades to a pointer: a different type from an array */260260-#define __must_be_array(a) __BUILD_BUG_ON_ZERO_MSG(__same_type((a), &(a)[0]), "must be array")261261-262262-/* Require C Strings (i.e. NUL-terminated) lack the "nonstring" attribute. */263263-#define __must_be_cstr(p) \264264- __BUILD_BUG_ON_ZERO_MSG(__annotated(p, nonstring), "must be cstr (NUL-terminated)")265233266234/*267235 * This returns a constant expression while determining if an argument is
+19-1
include/linux/fs.h
···222222#define FMODE_FSNOTIFY_HSM(mode) 0223223#endif224224225225-226225/*227226 * Attribute flags. These should be or-ed together to figure out what228227 * has been changed!···790791791792static inline void inode_set_cached_link(struct inode *inode, char *link, int linklen)792793{794794+ int testlen;795795+796796+ /*797797+ * TODO: patch it into a debug-only check if relevant macros show up.798798+ * In the meantime, since we are suffering strlen even on production kernels799799+ * to find the right length, do a fixup if the wrong value got passed.800800+ */801801+ testlen = strlen(link);802802+ if (testlen != linklen) {803803+ WARN_ONCE(1, "bad length passed for symlink [%s] (got %d, expected %d)",804804+ link, linklen, testlen);805805+ linklen = testlen;806806+ }793807 inode->i_link = link;794808 inode->i_linklen = linklen;795809 inode->i_opflags |= IOP_CACHED_LINK;···31503138 if (unlikely(!exe_file || FMODE_FSNOTIFY_HSM(exe_file->f_mode)))31513139 return;31523140 allow_write_access(exe_file);31413141+}31423142+31433143+static inline void file_set_fsnotify_mode(struct file *file, fmode_t mode)31443144+{31453145+ file->f_mode &= ~FMODE_FSNOTIFY_MASK;31463146+ file->f_mode |= mode;31533147}3154314831553149static inline bool inode_is_open_for_write(const struct inode *inode)
···244244 * @id_table: List of I2C devices supported by this driver245245 * @detect: Callback for device detection246246 * @address_list: The I2C addresses to probe (for detect)247247+ * @clients: List of detected clients we created (for i2c-core use only)247248 * @flags: A bitmask of flags defined in &enum i2c_driver_flags248249 *249250 * The driver.owner field should be set to the module owner of this driver.···299298 /* Device detection callback for automatic device creation */300299 int (*detect)(struct i2c_client *client, struct i2c_board_info *info);301300 const unsigned short *address_list;301301+ struct list_head clients;302302303303 u32 flags;304304};···315313 * @dev: Driver model device node for the slave.316314 * @init_irq: IRQ that was set at initialization317315 * @irq: indicates the IRQ generated by this device (if any)316316+ * @detected: member of an i2c_driver.clients list or i2c-core's317317+ * userspace_devices list318318 * @slave_cb: Callback when I2C slave mode of an adapter is used. 
The adapter319319 * calls it to pass on slave events to the slave driver.320320 * @devres_group_id: id of the devres group that will be created for resources···336332#define I2C_CLIENT_SLAVE 0x20 /* we are the slave */337333#define I2C_CLIENT_HOST_NOTIFY 0x40 /* We want to use I2C host notify */338334#define I2C_CLIENT_WAKE 0x80 /* for board_info; true iff can wake */339339-#define I2C_CLIENT_AUTO 0x100 /* client was auto-detected */340340-#define I2C_CLIENT_USER 0x200 /* client was userspace-created */341335#define I2C_CLIENT_SCCB 0x9000 /* Use Omnivision SCCB protocol */342336 /* Must match I2C_M_STOP|IGNORE_NAK */343337···347345 struct device dev; /* the device structure */348346 int init_irq; /* irq set at initialization */349347 int irq; /* irq issued by device */348348+ struct list_head detected;350349#if IS_ENABLED(CONFIG_I2C_SLAVE)351350 i2c_slave_cb_t slave_cb; /* callback for slave mode */352351#endif···753750 int nr;754751 char name[48];755752 struct completion dev_released;753753+754754+ struct mutex userspace_clients_lock;755755+ struct list_head userspace_clients;756756757757 struct i2c_bus_recovery_info *bus_recovery_info;758758 const struct i2c_adapter_quirks *quirks;
···411411/* GFX12 and later: */412412#define AMDGPU_TILING_GFX12_SWIZZLE_MODE_SHIFT 0413413#define AMDGPU_TILING_GFX12_SWIZZLE_MODE_MASK 0x7414414-/* These are DCC recompression setting for memory management: */414414+/* These are DCC recompression settings for memory management: */415415#define AMDGPU_TILING_GFX12_DCC_MAX_COMPRESSED_BLOCK_SHIFT 3416416#define AMDGPU_TILING_GFX12_DCC_MAX_COMPRESSED_BLOCK_MASK 0x3 /* 0:64B, 1:128B, 2:256B */417417#define AMDGPU_TILING_GFX12_DCC_NUMBER_TYPE_SHIFT 5418418#define AMDGPU_TILING_GFX12_DCC_NUMBER_TYPE_MASK 0x7 /* CB_COLOR0_INFO.NUMBER_TYPE */419419#define AMDGPU_TILING_GFX12_DCC_DATA_FORMAT_SHIFT 8420420#define AMDGPU_TILING_GFX12_DCC_DATA_FORMAT_MASK 0x3f /* [0:4]:CB_COLOR0_INFO.FORMAT, [5]:MM */421421+/* When clearing the buffer or moving it from VRAM to GTT, don't compress and set DCC metadata422422+ * to uncompressed. Set when parts of an allocation bypass DCC and read raw data. */423423+#define AMDGPU_TILING_GFX12_DCC_WRITE_COMPRESS_DISABLE_SHIFT 14424424+#define AMDGPU_TILING_GFX12_DCC_WRITE_COMPRESS_DISABLE_MASK 0x1425425+/* bit gap */426426+#define AMDGPU_TILING_GFX12_SCANOUT_SHIFT 63427427+#define AMDGPU_TILING_GFX12_SCANOUT_MASK 0x1421428422429/* Set/Get helpers for tiling flags. */423430#define AMDGPU_TILING_SET(field, value) \
···285285}286286287287extern void __futex_unqueue(struct futex_q *q);288288-extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb);288288+extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb,289289+ struct task_struct *task);289290extern int futex_unqueue(struct futex_q *q);290291291292/**292293 * futex_queue() - Enqueue the futex_q on the futex_hash_bucket293294 * @q: The futex_q to enqueue294295 * @hb: The destination hash bucket296296+ * @task: Task queueing this futex295297 *296298 * The hb->lock must be held by the caller, and is released here. A call to297299 * futex_queue() is typically paired with exactly one call to futex_unqueue(). The···301299 * or nothing if the unqueue is done as part of the wake process and the unqueue302300 * state is implicit in the state of woken task (see futex_wait_requeue_pi() for303301 * an example).302302+ *303303+ * Note that @task may be NULL, for async usage of futexes.304304 */305305-static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)305305+static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb,306306+ struct task_struct *task)306307 __releases(&hb->lock)307308{308308- __futex_queue(q, hb);309309+ __futex_queue(q, hb, task);309310 spin_unlock(&hb->lock);310311}311312
+1-1
kernel/futex/pi.c
···982982 /*983983 * Only actually queue now that the atomic ops are done:984984 */985985- __futex_queue(&q, hb);985985+ __futex_queue(&q, hb, current);986986987987 if (trylock) {988988 ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);
+2-2
kernel/futex/waitwake.c
···349349 * access to the hash list and forcing another memory barrier.350350 */351351 set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);352352- futex_queue(q, hb);352352+ futex_queue(q, hb, current);353353354354 /* Arm the timer */355355 if (timeout)···460460 * next futex. Queue each futex at this moment so hb can461461 * be unlocked.462462 */463463- futex_queue(q, hb);463463+ futex_queue(q, hb, current);464464 continue;465465 }466466
+2-2
kernel/kthread.c
···859859 struct kthread *kthread = to_kthread(p);860860 cpumask_var_t affinity;861861 unsigned long flags;862862- int ret;862862+ int ret = 0;863863864864 if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {865865 WARN_ON(1);···892892out:893893 free_cpumask_var(affinity);894894895895- return 0;895895+ return ret;896896}897897898898/*
+2
kernel/sched/debug.c
···12621262 if (task_has_dl_policy(p)) {12631263 P(dl.runtime);12641264 P(dl.deadline);12651265+ } else if (fair_policy(p->policy)) {12661266+ P(se.slice);12651267 }12661268#ifdef CONFIG_SCHED_CLASS_EXT12671269 __PS("ext.enabled", task_on_scx(p));
+19
kernel/sched/fair.c
···53855385static void set_delayed(struct sched_entity *se)53865386{53875387 se->sched_delayed = 1;53885388+53895389+ /*53905390+ * Delayed se of cfs_rq have no tasks queued on them.53915391+ * Do not adjust h_nr_runnable since dequeue_entities()53925392+ * will account it for blocked tasks.53935393+ */53945394+ if (!entity_is_task(se))53955395+ return;53965396+53885397 for_each_sched_entity(se) {53895398 struct cfs_rq *cfs_rq = cfs_rq_of(se);53905399···54065397static void clear_delayed(struct sched_entity *se)54075398{54085399 se->sched_delayed = 0;54005400+54015401+ /*54025402+ * Delayed se of cfs_rq have no tasks queued on them.54035403+ * Do not adjust h_nr_runnable since a dequeue has54045404+ * already accounted for it or an enqueue of a task54055405+ * below it will account for it in enqueue_task_fair().54065406+ */54075407+ if (!entity_is_task(se))54085408+ return;54095409+54095410 for_each_sched_entity(se) {54105411 struct cfs_rq *cfs_rq = cfs_rq_of(se);54115412
+12
kernel/seccomp.c
···749749 if (WARN_ON_ONCE(!fprog))750750 return false;751751752752+ /* Our single exception to filtering. */753753+#ifdef __NR_uretprobe754754+#ifdef SECCOMP_ARCH_COMPAT755755+ if (sd->arch == SECCOMP_ARCH_NATIVE)756756+#endif757757+ if (sd->nr == __NR_uretprobe)758758+ return true;759759+#endif760760+752761 for (pc = 0; pc < fprog->len; pc++) {753762 struct sock_filter *insn = &fprog->filter[pc];754763 u16 code = insn->code;···10321023 */10331024static const int mode1_syscalls[] = {10341025 __NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,10261026+#ifdef __NR_uretprobe10271027+ __NR_uretprobe,10281028+#endif10351029 -1, /* negative terminated */10361030};10371031
+6-3
kernel/time/clocksource.c
···373373 cpumask_clear(&cpus_ahead);374374 cpumask_clear(&cpus_behind);375375 cpus_read_lock();376376- preempt_disable();376376+ migrate_disable();377377 clocksource_verify_choose_cpus();378378 if (cpumask_empty(&cpus_chosen)) {379379- preempt_enable();379379+ migrate_enable();380380 cpus_read_unlock();381381 pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);382382 return;383383 }384384 testcpu = smp_processor_id();385385- pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen));385385+ pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n",386386+ cs->name, testcpu, cpumask_pr_args(&cpus_chosen));387387+ preempt_disable();386388 for_each_cpu(cpu, &cpus_chosen) {387389 if (cpu == testcpu)388390 continue;···404402 cs_nsec_min = cs_nsec;405403 }406404 preempt_enable();405405+ migrate_enable();407406 cpus_read_unlock();408407 if (!cpumask_empty(&cpus_ahead))409408 pr_warn(" CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+94-31
kernel/time/hrtimer.c
···
 #define HRTIMER_ACTIVE_SOFT	(HRTIMER_ACTIVE_HARD << MASK_SHIFT)
 #define HRTIMER_ACTIVE_ALL	(HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
 
+static void retrigger_next_event(void *arg);
+
 /*
  * The timer bases:
  *
···
 		.clockid = CLOCK_TAI,
 		.get_time = &ktime_get_clocktai,
 	},
-	}
+	},
+	.csd = CSD_INIT(retrigger_next_event, NULL)
 };
 
 static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
···
 	[CLOCK_BOOTTIME]	= HRTIMER_BASE_BOOTTIME,
 	[CLOCK_TAI]		= HRTIMER_BASE_TAI,
 };
+
+static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
+{
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		return true;
+	else
+		return likely(base->online);
+}
 
 /*
  * Functions and macros which are different for UP/SMP systems are kept in a
···
 };
 
 #define migration_base	migration_cpu_base.clock_base[0]
-
-static inline bool is_migration_base(struct hrtimer_clock_base *base)
-{
-	return base == &migration_base;
-}
 
 /*
  * We are using hashed locking: holding per_cpu(hrtimer_bases)[n].lock
···
 }
 
 /*
- * We do not migrate the timer when it is expiring before the next
- * event on the target cpu. When high resolution is enabled, we cannot
- * reprogram the target cpu hardware and we would cause it to fire
- * late. To keep it simple, we handle the high resolution enabled and
- * disabled case similar.
+ * Check if the elected target is suitable considering its next
+ * event and the hotplug state of the current CPU.
+ *
+ * If the elected target is remote and its next event is after the timer
+ * to queue, then a remote reprogram is necessary. However there is no
+ * guarantee the IPI handling the operation would arrive in time to meet
+ * the high resolution deadline. In this case the local CPU becomes a
+ * preferred target, unless it is offline.
+ *
+ * High and low resolution modes are handled the same way for simplicity.
  *
  * Called with cpu_base->lock of target cpu held.
  */
-static int
-hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
+static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base,
+				    struct hrtimer_cpu_base *new_cpu_base,
+				    struct hrtimer_cpu_base *this_cpu_base)
 {
 	ktime_t expires;
 
+	/*
+	 * The local CPU clockevent can be reprogrammed. Also get_target_base()
+	 * guarantees it is online.
+	 */
+	if (new_cpu_base == this_cpu_base)
+		return true;
+
+	/*
+	 * The offline local CPU can't be the default target if the
+	 * next remote target event is after this timer. Keep the
+	 * elected new base. An IPI will be issued to reprogram
+	 * it as a last resort.
+	 */
+	if (!hrtimer_base_is_online(this_cpu_base))
+		return true;
+
 	expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
-	return expires < new_base->cpu_base->expires_next;
+
+	return expires >= new_base->cpu_base->expires_next;
 }
 
-static inline
-struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
-					 int pinned)
+static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, int pinned)
 {
+	if (!hrtimer_base_is_online(base)) {
+		int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+
+		return &per_cpu(hrtimer_bases, cpu);
+	}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 	if (static_branch_likely(&timers_migration_enabled) && !pinned)
 		return &per_cpu(hrtimer_bases, get_nohz_timer_target());
···
 		raw_spin_unlock(&base->cpu_base->lock);
 		raw_spin_lock(&new_base->cpu_base->lock);
 
-		if (new_cpu_base != this_cpu_base &&
-		    hrtimer_check_target(timer, new_base)) {
+		if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
+					     this_cpu_base)) {
 			raw_spin_unlock(&new_base->cpu_base->lock);
 			raw_spin_lock(&base->cpu_base->lock);
 			new_cpu_base = this_cpu_base;
···
 		}
 		WRITE_ONCE(timer->base, new_base);
 	} else {
-		if (new_cpu_base != this_cpu_base &&
-		    hrtimer_check_target(timer, new_base)) {
+		if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_base)) {
 			new_cpu_base = this_cpu_base;
 			goto again;
 		}
···
 }
 
 #else /* CONFIG_SMP */
-
-static inline bool is_migration_base(struct hrtimer_clock_base *base)
-{
-	return false;
-}
 
 static inline struct hrtimer_clock_base *
 lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
···
 {
 	return hrtimer_hres_enabled;
 }
-
-static void retrigger_next_event(void *arg);
 
 /*
  * Switch to high resolution mode
···
 			u64 delta_ns, const enum hrtimer_mode mode,
 			struct hrtimer_clock_base *base)
 {
+	struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
 	struct hrtimer_clock_base *new_base;
 	bool force_local, first;
 
···
 	 * and enforce reprogramming after it is queued no matter whether
 	 * it is the new first expiring timer again or not.
 	 */
-	force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+	force_local = base->cpu_base == this_cpu_base;
 	force_local &= base->cpu_base->next_timer == timer;
+
+	/*
+	 * Don't force local queuing if this enqueue happens on an unplugged
+	 * CPU after hrtimer_cpu_dying() has been invoked.
+	 */
+	force_local &= this_cpu_base->online;
 
 	/*
 	 * Remove an active timer from the queue. In case it is not queued
···
 	}
 
 	first = enqueue_hrtimer(timer, new_base, mode);
-	if (!force_local)
-		return first;
+	if (!force_local) {
+		/*
+		 * If the current CPU base is online, then the timer is
+		 * never queued on a remote CPU if it would be the first
+		 * expiring timer there.
+		 */
+		if (hrtimer_base_is_online(this_cpu_base))
+			return first;
+
+		/*
+		 * Timer was enqueued remote because the current base is
+		 * already offline. If the timer is the first to expire,
+		 * kick the remote CPU to reprogram the clock event.
+		 */
+		if (first) {
+			struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
+
+			smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
+		}
+		return 0;
+	}
 
 	/*
 	 * Timer was forced to stay on the current CPU to avoid
···
 		raw_spin_lock_irq(&cpu_base->lock);
 	}
 }
+
+#ifdef CONFIG_SMP
+static __always_inline bool is_migration_base(struct hrtimer_clock_base *base)
+{
+	return base == &migration_base;
+}
+#else
+static __always_inline bool is_migration_base(struct hrtimer_clock_base *base)
+{
+	return false;
+}
+#endif
 
 /*
  * This function is called on PREEMPT_RT kernels when the fast path
+9-1
kernel/time/timer_migration.c
···
 
 	} while (i < tmigr_hierarchy_levels);
 
+	/* Assert single root */
+	WARN_ON_ONCE(!err && !group->parent && !list_is_singular(&tmigr_level_list[top]));
+
 	while (i > 0) {
 		group = stack[--i];
···
 	WARN_ON_ONCE(top == 0);
 
 	lvllist = &tmigr_level_list[top];
-	if (group->num_children == 1 && list_is_singular(lvllist)) {
+
+	/*
+	 * Newly created root level should have accounted the upcoming
+	 * CPU's child group and pre-accounted the old root.
+	 */
+	if (group->num_children == 2 && list_is_singular(lvllist)) {
 		/*
 		 * The target CPU must never do the prepare work, except
 		 * on early boot when the boot CPU is the target. Otherwise
+1-1
kernel/trace/trace_functions_graph.c
···
 	 * returning from the function.
 	 */
 	if (ftrace_graph_notrace_addr(trace->func)) {
-		*task_var |= TRACE_GRAPH_NOTRACE_BIT;
+		*task_var |= TRACE_GRAPH_NOTRACE;
 		/*
 		 * Need to return 1 to have the return called
 		 * that will clear the NOTRACE bit.
+4-2
lib/stackinit_kunit.c
···
  */
 #ifdef CONFIG_M68K
 #define FILL_SIZE_STRING	8
+#define FILL_SIZE_ARRAY		2
 #else
 #define FILL_SIZE_STRING	16
+#define FILL_SIZE_ARRAY		8
 #endif
 
 #define INIT_CLONE_SCALAR	/**/
···
 	short three;
 	unsigned long four;
 	struct big_struct {
-		unsigned long array[8];
+		unsigned long array[FILL_SIZE_ARRAY];
 	} big;
 };
 
-/* Mismatched sizes, with one and two being small */
+/* Mismatched sizes, with three and four being small */
 union test_small_end {
 	short one;
 	unsigned long two;
+14
net/core/dev.c
···
 	const struct net_device_ops *ops = dev->netdev_ops;
 	const struct net_device_core_stats __percpu *p;
 
+	/*
+	 * IPv{4,6} and udp tunnels share common stat helpers and use
+	 * different stat type (NETDEV_PCPU_STAT_TSTATS vs
+	 * NETDEV_PCPU_STAT_DSTATS). Ensure the accounting is consistent.
+	 */
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, rx_bytes) !=
+		     offsetof(struct pcpu_dstats, rx_bytes));
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, rx_packets) !=
+		     offsetof(struct pcpu_dstats, rx_packets));
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, tx_bytes) !=
+		     offsetof(struct pcpu_dstats, tx_bytes));
+	BUILD_BUG_ON(offsetof(struct pcpu_sw_netstats, tx_packets) !=
+		     offsetof(struct pcpu_dstats, tx_packets));
+
 	if (ops->ndo_get_stats64) {
 		memset(storage, 0, sizeof(*storage));
 		ops->ndo_get_stats64(dev, storage);
+1-1
net/ethtool/ioctl.c
···
 		return rc;
 
 	/* Nonzero ring with RSS only makes sense if NIC adds them together */
-	if (cmd == ETHTOOL_SRXCLSRLINS && info.flow_type & FLOW_RSS &&
+	if (cmd == ETHTOOL_SRXCLSRLINS && info.fs.flow_type & FLOW_RSS &&
 	    !ops->cap_rss_rxnfc_adds &&
 	    ethtool_get_flow_spec_ring(info.fs.ring_cookie))
 		return -EINVAL;
···
 		const int hlen = skb_network_header_len(skb) +
 				 sizeof(struct udphdr);
 
-		if (hlen + cork->gso_size > cork->fragsize) {
+		if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
 			kfree_skb(skb);
-			return -EINVAL;
+			return -EMSGSIZE;
 		}
 		if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
 			kfree_skb(skb);
+9-5
net/ipv6/ioam6_iptunnel.c
···
 
 static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	struct dst_entry *dst = skb_dst(skb), *cache_dst;
+	struct dst_entry *dst = skb_dst(skb), *cache_dst = NULL;
 	struct in6_addr orig_daddr;
 	struct ioam6_lwt *ilwt;
 	int err = -EINVAL;
···
 		cache_dst = ip6_route_output(net, NULL, &fl6);
 		if (cache_dst->error) {
 			err = cache_dst->error;
-			dst_release(cache_dst);
 			goto drop;
 		}
 
-		local_bh_disable();
-		dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
-		local_bh_enable();
+		/* cache only if we don't create a dst reference loop */
+		if (dst->lwtstate != cache_dst->lwtstate) {
+			local_bh_disable();
+			dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
+			local_bh_enable();
+		}
 
 		err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev));
 		if (unlikely(err))
···
 		return dst_output(net, sk, skb);
 	}
 out:
+	dst_release(cache_dst);
 	return dst->lwtstate->orig_output(net, sk, skb);
 drop:
+	dst_release(cache_dst);
 	kfree_skb(skb);
 	return err;
 }
+10-5
net/ipv6/rpl_iptunnel.c
···
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst->error) {
 		err = dst->error;
-		dst_release(dst);
 		goto drop;
 	}
 
-	local_bh_disable();
-	dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
-	local_bh_enable();
+	/* cache only if we don't create a dst reference loop */
+	if (orig_dst->lwtstate != dst->lwtstate) {
+		local_bh_disable();
+		dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
+		local_bh_enable();
+	}
 
 	err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
 	if (unlikely(err))
···
 	return dst_output(net, sk, skb);
 
 drop:
+	dst_release(dst);
 	kfree_skb(skb);
 	return err;
 }
···
 	local_bh_enable();
 
 	err = rpl_do_srh(skb, rlwt, dst);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		dst_release(dst);
 		goto drop;
+	}
 
 	if (!dst) {
 		ip6_route_input(skb);
+10-5
net/ipv6/seg6_iptunnel.c
···
 	local_bh_enable();
 
 	err = seg6_do_srh(skb, dst);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		dst_release(dst);
 		goto drop;
+	}
 
 	if (!dst) {
 		ip6_route_input(skb);
···
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst->error) {
 		err = dst->error;
-		dst_release(dst);
 		goto drop;
 	}
 
-	local_bh_disable();
-	dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
-	local_bh_enable();
+	/* cache only if we don't create a dst reference loop */
+	if (orig_dst->lwtstate != dst->lwtstate) {
+		local_bh_disable();
+		dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
+		local_bh_enable();
+	}
 
 	err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
 	if (unlikely(err))
···
 
 	return dst_output(net, sk, skb);
 drop:
+	dst_release(dst);
 	kfree_skb(skb);
 	return err;
 }
+2-2
net/ipv6/udp.c
···
 		const int hlen = skb_network_header_len(skb) +
 				 sizeof(struct udphdr);
 
-		if (hlen + cork->gso_size > cork->fragsize) {
+		if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
 			kfree_skb(skb);
-			return -EINVAL;
+			return -EMSGSIZE;
 		}
 		if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
 			kfree_skb(skb);
+16-8
net/rose/af_rose.c
···
 	struct net_device *dev;
 	ax25_address *source;
 	ax25_uid_assoc *user;
+	int err = -EINVAL;
 	int n;
-
-	if (!sock_flag(sk, SOCK_ZAPPED))
-		return -EINVAL;
 
 	if (addr_len != sizeof(struct sockaddr_rose) && addr_len != sizeof(struct full_sockaddr_rose))
 		return -EINVAL;
···
 	if ((unsigned int) addr->srose_ndigis > ROSE_MAX_DIGIS)
 		return -EINVAL;
 
-	if ((dev = rose_dev_get(&addr->srose_addr)) == NULL)
-		return -EADDRNOTAVAIL;
+	lock_sock(sk);
+
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		goto out_release;
+
+	err = -EADDRNOTAVAIL;
+	dev = rose_dev_get(&addr->srose_addr);
+	if (!dev)
+		goto out_release;
 
 	source = &addr->srose_call;
 
···
 	} else {
 		if (ax25_uid_policy && !capable(CAP_NET_BIND_SERVICE)) {
 			dev_put(dev);
-			return -EACCES;
+			err = -EACCES;
+			goto out_release;
 		}
 		rose->source_call = *source;
 	}
···
 	rose_insert_socket(sk);
 
 	sock_reset_flag(sk, SOCK_ZAPPED);
-
-	return 0;
+	err = 0;
+out_release:
+	release_sock(sk);
+	return err;
 }
 
 static int rose_connect(struct socket *sock, struct sockaddr *uaddr, int addr_len, int flags)
+1-1
net/rxrpc/ar-internal.h
···
 	RXRPC_CALL_EXCLUSIVE,		/* The call uses a once-only connection */
 	RXRPC_CALL_RX_IS_IDLE,		/* recvmsg() is idle - send an ACK */
 	RXRPC_CALL_RECVMSG_READ_ALL,	/* recvmsg() read all of the received data */
+	RXRPC_CALL_CONN_CHALLENGING,	/* The connection is being challenged */
 };
 
 /*
···
 	RXRPC_CALL_CLIENT_AWAIT_REPLY,	/* - client awaiting reply */
 	RXRPC_CALL_CLIENT_RECV_REPLY,	/* - client receiving reply phase */
 	RXRPC_CALL_SERVER_PREALLOC,	/* - service preallocation */
-	RXRPC_CALL_SERVER_SECURING,	/* - server securing request connection */
 	RXRPC_CALL_SERVER_RECV_REQUEST,	/* - server receiving request */
 	RXRPC_CALL_SERVER_ACK_REQUEST,	/* - server pending ACK of request */
 	RXRPC_CALL_SERVER_SEND_REPLY,	/* - server sending reply */
···
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	bool last = sp->hdr.flags & RXRPC_LAST_PACKET;
 
-	skb_queue_tail(&call->recvmsg_queue, skb);
+	spin_lock_irq(&call->recvmsg_queue.lock);
+
+	__skb_queue_tail(&call->recvmsg_queue, skb);
 	rxrpc_input_update_ack_window(call, window, wtop);
 	trace_rxrpc_receive(call, last ? why + 1 : why, sp->hdr.serial, sp->hdr.seq);
 	if (last)
+		/* Change the state inside the lock so that recvmsg syncs
+		 * correctly with it and using sendmsg() to send a reply
+		 * doesn't race.
+		 */
 		rxrpc_end_rx_phase(call, sp->hdr.serial);
+
+	spin_unlock_irq(&call->recvmsg_queue.lock);
 }
 
 /*
···
 		rxrpc_propose_delay_ACK(call, sp->hdr.serial,
 					rxrpc_propose_ack_input_data);
 	}
-	if (notify) {
+	if (notify && !test_bit(RXRPC_CALL_CONN_CHALLENGING, &call->flags)) {
 		trace_rxrpc_notify_socket(call->debug_id, sp->hdr.serial);
 		rxrpc_notify_socket(call);
 	}
+1-1
net/rxrpc/sendmsg.c
···
 	} else {
 		switch (rxrpc_call_state(call)) {
 		case RXRPC_CALL_CLIENT_AWAIT_CONN:
-		case RXRPC_CALL_SERVER_SECURING:
+		case RXRPC_CALL_SERVER_RECV_REQUEST:
 			if (p.command == RXRPC_CMD_SEND_ABORT)
 				break;
 			fallthrough;
+3
net/sched/sch_fifo.c
···
 {
 	unsigned int prev_backlog;
 
+	if (unlikely(READ_ONCE(sch->limit) == 0))
+		return qdisc_drop(skb, sch, to_free);
+
 	if (likely(sch->q.qlen < READ_ONCE(sch->limit)))
 		return qdisc_enqueue_tail(skb, sch);
 
···
 ifdef CONFIG_CC_IS_CLANG
 # The kernel builds with '-std=gnu11' so use of GNU extensions is acceptable.
 KBUILD_CFLAGS += -Wno-gnu
+
+# Clang checks for overflow/truncation with '%p', while GCC does not:
+# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111219
+KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow-non-kprintf)
+KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation-non-kprintf)
 else
 
 # gcc inanely warns about local variables called 'main'
···
 KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
 ifdef CONFIG_CC_IS_GCC
 KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
-else
-# Clang checks for overflow/truncation with '%p', while GCC does not:
-# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111219
-KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow-non-kprintf)
-KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation-non-kprintf)
 endif
 KBUILD_CFLAGS += $(call cc-disable-warning, stringop-truncation)
···
 KBUILD_CFLAGS += -Wno-tautological-constant-out-of-range-compare
 KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
 KBUILD_CFLAGS += -Wno-enum-compare-conditional
-KBUILD_CFLAGS += -Wno-enum-enum-conversion
 endif
 
 endif
···
 KBUILD_CFLAGS += -Wno-missing-field-initializers
 KBUILD_CFLAGS += -Wno-type-limits
 KBUILD_CFLAGS += -Wno-shift-negative-value
+
+ifdef CONFIG_CC_IS_CLANG
+KBUILD_CFLAGS += -Wno-enum-enum-conversion
+endif
 
 ifdef CONFIG_CC_IS_GCC
 KBUILD_CFLAGS += -Wno-maybe-uninitialized
+1-1
scripts/Makefile.lib
···
 # These are shared by some Makefile.* files.
 
 ifdef CONFIG_LTO_CLANG
-# Run $(LD) here to covert LLVM IR to ELF in the following cases:
+# Run $(LD) here to convert LLVM IR to ELF in the following cases:
 #  - when this object needs objtool processing, as objtool cannot process LLVM IR
 #  - when this is a single-object module, as modpost cannot process LLVM IR
 cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) -r -o $(tmp-target) $@; mv $(tmp-target) $@)
+18
scripts/generate_rust_target.rs
···
         let option = "CONFIG_".to_owned() + option;
         self.0.contains_key(&option)
     }
+
+    /// Is the rustc version at least `major.minor.patch`?
+    fn rustc_version_atleast(&self, major: u32, minor: u32, patch: u32) -> bool {
+        let check_version = 100000 * major + 100 * minor + patch;
+        let actual_version = self
+            .0
+            .get("CONFIG_RUSTC_VERSION")
+            .unwrap()
+            .parse::<u32>()
+            .unwrap();
+        check_version <= actual_version
+    }
 }
 
 fn main() {
···
         }
     } else if cfg.has("X86_64") {
         ts.push("arch", "x86_64");
+        if cfg.rustc_version_atleast(1, 86, 0) {
+            ts.push("rustc-abi", "x86-softfloat");
+        }
         ts.push(
             "data-layout",
             "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
···
             panic!("32-bit x86 only works under UML");
         }
         ts.push("arch", "x86");
+        if cfg.rustc_version_atleast(1, 86, 0) {
+            ts.push("rustc-abi", "x86-softfloat");
+        }
         ts.push(
             "data-layout",
             "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128",
+35
scripts/mod/modpost.c
···
 		info->modinfo_len = sechdrs[i].sh_size;
 	} else if (!strcmp(secname, ".export_symbol")) {
 		info->export_symbol_secndx = i;
+	} else if (!strcmp(secname, ".no_trim_symbol")) {
+		info->no_trim_symbol = (void *)hdr + sechdrs[i].sh_offset;
+		info->no_trim_symbol_len = sechdrs[i].sh_size;
 	}
 
 	if (sechdrs[i].sh_type == SHT_SYMTAB) {
···
 	/* strip trailing .o */
 	mod = new_module(modname, strlen(modname) - strlen(".o"));
 
+	/* save .no_trim_symbol section for later use */
+	if (info.no_trim_symbol_len) {
+		mod->no_trim_symbol = xmalloc(info.no_trim_symbol_len);
+		memcpy(mod->no_trim_symbol, info.no_trim_symbol,
+		       info.no_trim_symbol_len);
+		mod->no_trim_symbol_len = info.no_trim_symbol_len;
+	}
+
 	if (!mod->is_vmlinux) {
 		license = get_modinfo(&info, "license");
 		if (!license)
···
 	}
 
 	free(buf);
+}
+
+/*
+ * Keep symbols recorded in the .no_trim_symbol section. This is necessary to
+ * prevent CONFIG_TRIM_UNUSED_KSYMS from dropping EXPORT_SYMBOL because
+ * symbol_get() relies on the symbol being present in the ksymtab for lookups.
+ */
+static void keep_no_trim_symbols(struct module *mod)
+{
+	unsigned long size = mod->no_trim_symbol_len;
+
+	for (char *s = mod->no_trim_symbol; s; s = next_string(s, &size)) {
+		struct symbol *sym;
+
+		/*
+		 * If find_symbol() returns NULL, this symbol is not provided
+		 * by any module, and symbol_get() will fail.
+		 */
+		sym = find_symbol(s);
+		if (sym)
+			sym->used = true;
+	}
 }
 
 static void check_modname_len(struct module *mod)
···
 		read_symbols_from_files(files_source);
 
 	list_for_each_entry(mod, &modules, list) {
+		keep_no_trim_symbols(mod);
+
 		if (mod->dump_file || mod->is_vmlinux)
 			continue;
 
+6
scripts/mod/modpost.h
···
  *
  * @dump_file: path to the .symvers file if loaded from a file
  * @aliases: list head for module_aliases
+ * @no_trim_symbol: .no_trim_symbol section data
+ * @no_trim_symbol_len: length of the .no_trim_symbol section
  */
 struct module {
 	struct list_head list;
···
 	// Actual imported namespaces
 	struct list_head imported_namespaces;
 	struct list_head aliases;
+	char *no_trim_symbol;
+	unsigned int no_trim_symbol_len;
 	char name[];
 };
 
···
 	char *strtab;
 	char *modinfo;
 	unsigned int modinfo_len;
+	char *no_trim_symbol;
+	unsigned int no_trim_symbol_len;
 
 	/* support for 32bit section numbers */
 
···
         "matchPattern": "qdisc bfifo 1: root",
         "matchCount": "0",
         "teardown": [
+        ]
+    },
+    {
+        "id": "d774",
+        "name": "Check pfifo_head_drop qdisc enqueue behaviour when limit == 0",
+        "category": [
+            "qdisc",
+            "pfifo_head_drop"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: pfifo_head_drop limit 0",
+            "$IP link set dev $DUMMY up || true"
+        ],
+        "cmdUnderTest": "ping -c2 -W0.01 -I $DUMMY 10.10.10.1",
+        "expExitCode": "1",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "dropped 2",
+        "matchCount": "1",
+        "teardown": [
         ]
     }
 ]
+9-16
virt/kvm/kvm_main.c
···
 }
 
 /*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
-	return 0;
-}
-
-/*
  * Called just after removing the VM from the vm_list, but before doing any
  * other destruction.
  */
···
 	if (r)
 		goto out_err_no_debugfs;
 
-	r = kvm_arch_post_init_vm(kvm);
-	if (r)
-		goto out_err;
-
 	mutex_lock(&kvm_lock);
 	list_add(&kvm->vm_list, &vm_list);
 	mutex_unlock(&kvm_lock);
···
 
 	return kvm;
 
-out_err:
-	kvm_destroy_vm_debugfs(kvm);
 out_err_no_debugfs:
 	kvm_coalesced_mmio_free(kvm);
 out_no_coalesced_mmio:
···
 		return -EINVAL;
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		return -EINVAL;
-	if ((mem->memory_size >> PAGE_SHIFT) > KVM_MEM_MAX_NR_PAGES)
+
+	/*
+	 * The size of userspace-defined memory regions is restricted in order
+	 * to play nice with dirty bitmap operations, which are indexed with an
+	 * "unsigned int". KVM's internal memory regions don't support dirty
+	 * logging, and so are exempt.
+	 */
+	if (id < KVM_USER_MEM_SLOTS &&
+	    (mem->memory_size >> PAGE_SHIFT) > KVM_MEM_MAX_NR_PAGES)
 		return -EINVAL;
 
 	slots = __kvm_memslots(kvm, as_id);