Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

s390/cpumf: simplify detection of guest samples

There are three different code levels in regard to the identification
of guest samples. They differ in the way the LPP instruction is used.

1) Old kernels without the LPP instruction. The guest program parameter
is always zero.
2) Newer kernels load the process pid into the program parameter with LPP.
The guest program parameter is non-zero if the guest executes in a
process != idle.
3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value
non-zero even for the idle task. The guest program parameter is non-zero
if the guest is running.
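
The three code levels can be sketched in C as follows. This is only an
illustration: the enum and the function name are hypothetical and do not
exist in the kernel, and the pid is assumed to fit below bit 31.

```c
#include <stdint.h>

/* Hypothetical labels for the three code levels described above. */
enum lpp_level {
	LPP_NONE,	/* 1) old kernel, no LPP instruction */
	LPP_PID,	/* 2) LPP loads the plain pid */
	LPP_PID_BIT31,	/* 3) LPP loads (1UL << 31) | pid */
};

/* Program parameter a guest kernel of the given level would expose. */
static uint64_t guest_program_parameter(enum lpp_level level, uint32_t pid)
{
	switch (level) {
	case LPP_PID:
		/* zero while the idle task (pid 0) runs */
		return pid;
	case LPP_PID_BIT31:
		/* bit 31 keeps the value non-zero even for pid 0 */
		return (UINT64_C(1) << 31) | pid;
	case LPP_NONE:
	default:
		/* the program parameter is never loaded */
		return 0;
	}
}
```

Only the third variant yields a non-zero value for pid 0, which is why
the first two still need the CR4 heuristic.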

All kernels load the process pid to CR4 on context switch. The CPU sampling
code uses the value in CR4 to decide between guest and host samples in case
the guest program parameter is zero. The three cases:

1) CR4==pid, gpp==0
2) CR4==pid, gpp==pid
3) CR4==pid, gpp==((1UL << 31) | pid)

The load-control instruction that loads the pid into CR4 is expensive, and
the goal is to remove it. To keep the host CR4 distinguishable from a guest
CR4 holding the pid of the idle process (pid 0), the host uses the maximum
PASN value, 0xffff.
This adds a fourth case for a guest OS with an updated kernel:

4) CR4==0xffff, gpp==((1UL << 31) | pid)

The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!=0xffff)
to identify guest samples. This works with all four cases. The only
remaining issue is a guest with an old kernel (gpp==0) running a process
with pid 0xffff: its samples are attributed to the host. Well, don't do that.
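
As a minimal sketch, the check can be written as a standalone C function
and tried against the four cases. The names sample_in_guest and HOST_PASN
are hypothetical; the kernel performs the comparison inline on
sfr->basic.gpp and sfr->basic.prim_asn.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the value the host keeps in CR4 after this change. */
#define HOST_PASN 0xffff

/*
 * A sample is attributed to a guest if the guest program parameter is
 * non-zero or the sampled primary ASN differs from the host value.
 */
static bool sample_in_guest(uint64_t gpp, uint16_t prim_asn)
{
	return gpp != 0 || prim_asn != HOST_PASN;
}
```

With a pid of 42, cases 1 to 4 all evaluate to true, and a host sample
(gpp==0, CR4==0xffff) evaluates to false. An old guest running pid 0xffff
is indistinguishable from the host and also evaluates to false.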

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

+4 -9
+0 -4
arch/s390/kernel/entry.S
···
         stg     %r3,__LC_CURRENT                # store task struct of next
         stg     %r15,__LC_KERNEL_STACK          # store end of kernel stack
         lg      %r15,__THREAD_ksp(%r1)          # load kernel stack of next
-        /* c4 is used in guest detection: arch/s390/kernel/perf_cpum_sf.c */
-        xc      __SF_EMPTY(8,%r15),__SF_EMPTY(%r15)
-        mvc     __SF_EMPTY+4(4,%r15),__TASK_pid(%r3)
-        lctlg   %c4,%c4,__SF_EMPTY(%r15)        # load pid to control reg. 4
         mvc     __LC_CURRENT_PID(4,%r0),__TASK_pid(%r3) # store pid of next
         lmg     %r6,%r15,__SF_GPRS(%r15)        # load gprs of next task
         TSTMSK  __LC_MACHINE_FLAGS,MACHINE_FLAG_LPP
+1 -1
arch/s390/kernel/head64.S
···
         .quad   0                       # cr1: primary space segment table
         .quad   .Lduct                  # cr2: dispatchable unit control table
         .quad   0                       # cr3: instruction authorization
-        .quad   0                       # cr4: instruction authorization
+        .quad   0xffff                  # cr4: instruction authorization
         .quad   .Lduct                  # cr5: primary-aste origin
         .quad   0                       # cr6: I/O interrupts
         .quad   0                       # cr7: secondary space segment table
+3 -4
arch/s390/kernel/perf_cpum_sf.c
···
          * sample. Some early samples or samples from guests without
          * lpp usage would be misaccounted to the host. We use the asn
          * value as an addon heuristic to detect most of these guest samples.
-         * If the value differs from the host hpp value, we assume to be a
-         * KVM guest.
+         * If the value differs from 0xffff (the host value), we assume to
+         * be a KVM guest.
          */
         switch (sfr->basic.CL) {
         case 1: /* logical partition */
···
                 sde_regs->in_guest = 1;
                 break;
         default: /* old machine, use heuristics */
-                if (sfr->basic.gpp ||
-                    sfr->basic.prim_asn != (u16)sfr->basic.hpp)
+                if (sfr->basic.gpp || sfr->basic.prim_asn != 0xffff)
                         sde_regs->in_guest = 1;
                 break;
         }