Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING

CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen
time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled.
Implement this using the VPA accumulated wait counters.

Note this will not work on current KVM hosts because KVM does not
implement the VPA dispatch counters (yet). It could be implemented
with the dispatch trace log, as it is for VIRT_CPU_ACCOUNTING_NATIVE,
but that is not necessary for the more limited accounting provided
by PARAVIRT_TIME_ACCOUNTING. It is also more expensive and complex,
and has downsides such as potential log wrap.

From Shrikanth:

[...] it was tested on a Power10 [PowerVM] shared LPAR. The system has
two LPARs; we will call the first one LPAR1 and the second one LPAR2.
The test was carried out with SMT=1. Similar observations were seen
with SMT=8 as well.

The LPAR config header from each LPAR is below. LPAR1 is twice as big
as LPAR2. Since both share the same underlying hardware, work stealing
will happen when both LPARs are contending for the same resource.

LPAR1:
type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00
LPAR2:
type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00

mpstat was used to check utilization, and stress-ng was used as the
workload. A few cases were tested. When both LPARs are idle, there is
no steal time. When LPAR1 starts running at 100%, which consumes all
of the physical resources, steal time starts to get accounted. With
LPAR1 running at 100%, when LPAR2 starts running, steal time starts
increasing. This is as expected. When the LPAR2 load is increased
further, steal time increases further.

Case 1: 0% LPAR1; 0% LPAR2
  %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
  0.00   0.00   0.05    0.00   0.00   0.00   0.00   0.00   0.00  99.95

Case 2: 100% LPAR1; 0% LPAR2
  %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
 97.68   0.00   0.00    0.00   0.00   0.00   2.32   0.00   0.00   0.00

Case 3: 100% LPAR1; 50% LPAR2
  %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
 86.34   0.00   0.10    0.00   0.00   0.03  13.54   0.00   0.00   0.00

Case 4: 100% LPAR1; 100% LPAR2
  %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
 78.54   0.00   0.07    0.00   0.00   0.02  21.36   0.00   0.00   0.00

Case 5: 50% LPAR1; 100% LPAR2
  %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
 49.37   0.00   0.00    0.00   0.00   0.00   1.17   0.00   0.00  49.47

The patch accounts for the steal time, and the basic tests hold up
well.
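
As a rough illustration of how a %steal column like the ones above can be derived, the sketch below computes the steal share between two snapshots of cumulative CPU tick counters, in the style of the aggregate "cpu" line in /proc/stat. The struct cpu_sample type and the helper names are hypothetical and are not part of this patch or of mpstat; the guest columns are left out because /proc/stat already folds them into user and nice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of cumulative CPU time counters, in ticks. */
struct cpu_sample {
	uint64_t user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Sum of all accounted time in one snapshot. */
static uint64_t sample_total(const struct cpu_sample *s)
{
	return s->user + s->nice + s->system + s->idle +
	       s->iowait + s->irq + s->softirq + s->steal;
}

/* Percentage of the interval between two snapshots accounted as steal. */
static double steal_percent(const struct cpu_sample *prev,
			    const struct cpu_sample *cur)
{
	uint64_t total = sample_total(cur) - sample_total(prev);

	assert(total != 0);	/* the snapshots must be taken at different times */
	return 100.0 * (double)(cur->steal - prev->steal) / (double)total;
}
```

Calling steal_percent() with an interval of 750 user ticks and 250 steal ticks would report 25% steal.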

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
[mpe: Add SPDX tag to new paravirt_api_clock.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com

Authored by Nicholas Piggin and committed by Michael Ellerman
0e8a6313 a8933c8d

+55 -3
+3 -3
Documentation/admin-guide/kernel-parameters.txt
···
 			[X86,PV_OPS] Disable paravirtualized VMware scheduler
 			clock and use the default one.
 
-	no-steal-acc	[X86,PV_OPS,ARM64] Disable paravirtualized steal time
-			accounting. steal time is computed, but won't
-			influence scheduler behaviour
+	no-steal-acc	[X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized
+			steal time accounting. steal time is computed, but
+			won't influence scheduler behaviour
 
 	nolapic		[X86-32,APIC] Do not enable or use the local APIC.
 
+12
arch/powerpc/include/asm/paravirt.h
···
 	return static_branch_unlikely(&shared_processor);
 }
 
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 pseries_paravirt_steal_clock(int cpu);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+	return pseries_paravirt_steal_clock(cpu);
+}
+#endif
+
 /* If bit 0 is set, the cpu has been ceded, conferred, or preempted */
 static inline u32 yield_count_of(int cpu)
 {
+2
arch/powerpc/include/asm/paravirt_api_clock.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/paravirt.h>
+8
arch/powerpc/platforms/pseries/Kconfig
···
 	select SWIOTLB
 	default y
 
+config PARAVIRT
+	bool
+
 config PARAVIRT_SPINLOCKS
+	bool
+
+config PARAVIRT_TIME_ACCOUNTING
+	select PARAVIRT
 	bool
 
 config PPC_SPLPAR
 	bool "Support for shared-processor logical partitions"
 	depends on PPC_PSERIES
 	select PARAVIRT_SPINLOCKS if PPC_QUEUED_SPINLOCKS
+	select PARAVIRT_TIME_ACCOUNTING if VIRT_CPU_ACCOUNTING_GEN
 	default y
 	help
 	  Enabling this option will make the kernel run more efficiently
+11
arch/powerpc/platforms/pseries/lpar.c
···
 }
 
 machine_device_initcall(pseries, vcpudispatch_stats_procfs_init);
+
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+u64 pseries_paravirt_steal_clock(int cpu)
+{
+	struct lppaca *lppaca = &lppaca_of(cpu);
+
+	return be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) +
+		be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb));
+}
+#endif
+
 #endif /* CONFIG_PPC_SPLPAR */
 
 void vpa_init(int cpu)
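
The steal clock in the lpar.c hunk is a cumulative value: the sum of two big-endian wait accumulators. A consumer therefore has to remember the previous reading to get the steal accrued over an interval. The user-space sketch below models that under stated assumptions: struct vpa_counters, load_be64(), store_be64(), steal_clock() and steal_delta() are hypothetical stand-ins invented for illustration, and only the two field names are taken from the diff. It is not the kernel's accounting code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two VPA wait accumulators (big-endian). */
struct vpa_counters {
	uint8_t enqueue_dispatch_tb[8];
	uint8_t ready_enqueue_tb[8];
};

/* Portable big-endian load, playing the role of be64_to_cpu(). */
static uint64_t load_be64(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Big-endian store, used only to build example data. */
static void store_be64(uint8_t b[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		b[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Cumulative steal clock: the sum of both wait accumulators. */
static uint64_t steal_clock(const struct vpa_counters *c)
{
	return load_be64(c->enqueue_dispatch_tb) +
	       load_be64(c->ready_enqueue_tb);
}

/* Steal accrued since the last reading; updates *prev for the next call. */
static uint64_t steal_delta(const struct vpa_counters *c, uint64_t *prev)
{
	assert(c && prev);

	uint64_t now = steal_clock(c);
	uint64_t delta = now - *prev;	/* unsigned math tolerates wraparound */

	*prev = now;
	return delta;
}
```

A caller would keep one prev value per CPU and call steal_delta() at each accounting point; the units are whatever the counters are expressed in.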
+19
arch/powerpc/platforms/pseries/setup.c
···
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL(shared_processor);
 
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static bool steal_acc = true;
+static int __init parse_no_stealacc(char *arg)
+{
+	steal_acc = false;
+	return 0;
+}
+
+early_param("no-steal-acc", parse_no_stealacc);
+#endif
+
 int CMO_PrPSP = -1;
 int CMO_SecPSP = -1;
 unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
···
 	if (lppaca_shared_proc(get_lppaca())) {
 		static_branch_enable(&shared_processor);
 		pv_spinlocks_init();
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+		static_key_slow_inc(&paravirt_steal_enabled);
+		if (steal_acc)
+			static_key_slow_inc(&paravirt_steal_rq_enabled);
+#endif
 	}
 
 	ppc_md.power_save = pseries_lpar_idle;
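
The setup.c hunks boil down to a small decision: steal time is computed whenever the partition runs in shared-processor mode, and it additionally feeds the scheduler unless no-steal-acc is on the command line. The user-space sketch below models that decision with plain booleans in place of static keys. struct steal_config, has_param() and steal_setup() are hypothetical names, and the command-line matching is a simplification of what early_param() does in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Plain-boolean model of the two static keys touched by the patch. */
struct steal_config {
	bool steal_enabled;	/* models paravirt_steal_enabled */
	bool steal_rq_enabled;	/* models paravirt_steal_rq_enabled */
};

/* True if name appears as a whole space-separated word in cmdline. */
static bool has_param(const char *cmdline, const char *name)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	while ((p = strstr(p, name)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/* Mirrors the order of checks in the setup.c hunk. */
static struct steal_config steal_setup(bool shared_processor,
				       const char *cmdline)
{
	struct steal_config cfg = { false, false };
	bool steal_acc;

	assert(cmdline != NULL);
	steal_acc = !has_param(cmdline, "no-steal-acc");

	if (shared_processor) {
		cfg.steal_enabled = true;
		if (steal_acc)
			cfg.steal_rq_enabled = true;
	}
	return cfg;
}
```

In this model a dedicated-processor partition leaves both flags off regardless of the command line, which matches the placement of the static_key_slow_inc() calls inside the lppaca_shared_proc() branch.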