Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache quality monitoring update from Thomas Gleixner:
"This update provides a complete rewrite of the Cache Quality
Monitoring (CQM) facility.

The existing CQM support was duct-taped into perf with a lot of issues
and the attempts to fix those turned out to be incomplete and
horrible.

After lengthy discussions it was decided to integrate the CQM support
into the Resource Director Technology (RDT) facility, which is the
obvious choice since, in hardware, CQM is part of RDT. This made it
possible to add Memory Bandwidth Monitoring support on top.

As a result the mechanisms for allocating cache/memory bandwidth and
the corresponding monitoring mechanisms are integrated into a single
management facility with a consistent user interface"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/intel_rdt: Turn off most RDT features on Skylake
x86/intel_rdt: Add command line options for resource director technology
x86/intel_rdt: Move special case code for Haswell to a quirk function
x86/intel_rdt: Remove redundant ternary operator on return
x86/intel_rdt/cqm: Improve limbo list processing
x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
x86/intel_rdt: Modify the intel_pqr_state for better performance
x86/intel_rdt/cqm: Clear the default RMID during hotcpu
x86/intel_rdt: Show bitmask of shareable resource with other executing units
x86/intel_rdt/mbm: Handle counter overflow
x86/intel_rdt/mbm: Add mbm counter initialization
x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
x86/intel_rdt/cqm: Add CPU hotplug support
x86/intel_rdt/cqm: Add sched_in support
x86/intel_rdt: Introduce rdt_enable_key for scheduling
x86/intel_rdt/cqm: Add mount,umount support
x86/intel_rdt/cqm: Add rmdir support
x86/intel_rdt: Separate the ctrl bits from rmdir
x86/intel_rdt/cqm: Add mon_data
x86/intel_rdt: Prepare for RDT monitor data support
...
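
To illustrate what the consolidated interface provides, here is a minimal
userspace sketch (not part of this merge) that estimates memory bandwidth by
sampling one of the new monitoring files twice. The mount point, the "p1"
group name and the one-second interval are assumptions for illustration; the
file layout follows Documentation/x86/intel_rdt_ui.txt as changed below.

/* Sketch: estimate memory bandwidth for a CTRL_MON group by sampling
 * mbm_total_bytes twice. Path and group name are examples only. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static unsigned long long read_counter(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned long long val = 0;

	if (!f) {
		perror(path);
		exit(1);
	}
	if (fscanf(f, "%llu", &val) != 1)
		val = 0;
	fclose(f);
	return val;
}

int main(void)
{
	const char *path =
		"/sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes";
	unsigned long long before, after;

	before = read_counter(path);
	sleep(1);			/* one-second sampling window */
	after = read_counter(path);

	printf("approx. memory bandwidth: %llu bytes/s\n", after - before);
	return 0;
}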

+2643 -2439
+1
Documentation/admin-guide/kernel-parameters.rst
···
 	PPT	Parallel port support is enabled.
 	PS2	Appropriate PS/2 support is enabled.
 	RAM	RAM disk support is enabled.
+	RDT	Intel Resource Director Technology.
 	S390	S390 architecture is enabled.
 	SCSI	Appropriate SCSI support is enabled.
 A lot of drivers have their options described inside
+6
Documentation/admin-guide/kernel-parameters.txt
···
 			Run specified binary instead of /init from the ramdisk,
 			used for early userspace startup. See initrd.
 
+	rdt=		[HW,X86,RDT]
+			Turn on/off individual RDT features. List is:
+			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, mba.
+			E.g. to turn on cmt and turn off mba use:
+			rdt=cmt,!mba
+
 	reboot=		[KNL]
 			Format (x86 or x86_64):
 			[w[arm] | c[old] | h[ard] | s[oft] | g[pio]] \
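
The sketch below only illustrates the documented "rdt=" syntax (a comma
separated feature list, with '!' disabling a feature). It is a standalone
userspace illustration, not the kernel's parser (which lives in
arch/x86/kernel/cpu/intel_rdt.c), and it does not model the default
enable/disable state of features that are not listed.

/* Sketch: parse an "rdt=" style option string per the documented syntax. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static const char * const rdt_options[] = {
	"cmt", "mbmtotal", "mbmlocal", "l3cat", "l3cdp", "l2cat", "mba",
};
#define NR_RDT_OPTIONS (sizeof(rdt_options) / sizeof(rdt_options[0]))

static bool rdt_enabled[NR_RDT_OPTIONS];

static void parse_rdt_options(char *str)
{
	char *tok;

	for (tok = strtok(str, ","); tok; tok = strtok(NULL, ",")) {
		bool enable = true;
		size_t i;

		if (*tok == '!') {	/* "!mba" turns the feature off */
			enable = false;
			tok++;
		}
		for (i = 0; i < NR_RDT_OPTIONS; i++) {
			if (!strcmp(tok, rdt_options[i]))
				rdt_enabled[i] = enable;
		}
	}
}

int main(void)
{
	char cmdline[] = "cmt,!mba";	/* as in the documented "rdt=cmt,!mba" */
	size_t i;

	parse_rdt_options(cmdline);
	for (i = 0; i < NR_RDT_OPTIONS; i++)
		printf("%-10s %s\n", rdt_options[i],
		       rdt_enabled[i] ? "on" : "off");
	return 0;
}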
+285 -38
Documentation/x86/intel_rdt_ui.txt
··· 6 Tony Luck <tony.luck@intel.com> 7 Vikas Shivappa <vikas.shivappa@intel.com> 8 9 - This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the 10 - X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3". 11 12 To use the feature mount the file system: 13 ··· 17 18 "cdp": Enable code/data prioritization in L3 cache allocations. 19 20 21 Info directory 22 -------------- ··· 31 The 'info' directory contains information about the enabled 32 resources. Each resource has its own subdirectory. The subdirectory 33 names reflect the resource names. 34 - Cache resource(L3/L2) subdirectory contains the following files: 35 36 "num_closids": The number of CLOSIDs which are valid for this 37 resource. The kernel uses the smallest number of ··· 48 "min_cbm_bits": The minimum number of consecutive bits which 49 must be set when writing a mask. 50 51 - Memory bandwitdh(MB) subdirectory contains the following files: 52 53 "min_bandwidth": The minimum memory bandwidth percentage which 54 user can request. ··· 72 non-linear. This field is purely informational 73 only. 74 75 - Resource groups 76 - --------------- 77 Resource groups are represented as directories in the resctrl file 78 - system. The default group is the root directory. Other groups may be 79 - created as desired by the system administrator using the "mkdir(1)" 80 - command, and removed using "rmdir(1)". 81 82 - There are three files associated with each group: 83 84 - "tasks": A list of tasks that belongs to this group. Tasks can be 85 - added to a group by writing the task ID to the "tasks" file 86 - (which will automatically remove them from the previous 87 - group to which they belonged). New tasks created by fork(2) 88 - and clone(2) are added to the same group as their parent. 89 - If a pid is not in any sub partition, it is in root partition 90 - (i.e. default partition). 91 92 - "cpus": A bitmask of logical CPUs assigned to this group. Writing 93 - a new mask can add/remove CPUs from this group. Added CPUs 94 - are removed from their previous group. Removed ones are 95 - given to the default (root) group. You cannot remove CPUs 96 - from the default group. 97 98 - "cpus_list": One or more CPU ranges of logical CPUs assigned to this 99 - group. Same rules apply like for the "cpus" file. 100 101 - "schemata": A list of all the resources available to this group. 102 - Each resource has its own line and format - see below for 103 - details. 104 105 - When a task is running the following rules define which resources 106 - are available to it: 107 108 1) If the task is a member of a non-default group, then the schemata 109 - for that group is used. 110 111 2) Else if the task belongs to the default group, but is running on a 112 - CPU that is assigned to some specific group, then the schemata for 113 - the CPU's group is used. 114 115 3) Otherwise the schemata for the default group is used. 116 117 118 Schemata files - general concepts 119 --------------------------------- ··· 267 sharing a core will result in both threads being throttled to use the 268 low bandwidth. 269 270 - L3 details (code and data prioritization disabled) 271 - -------------------------------------------------- 272 With CDP disabled the L3 schemata format is: 273 274 L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 
275 276 - L3 details (CDP enabled via mount option to resctrl) 277 - ---------------------------------------------------- 278 When CDP is enabled L3 control is split into two separate resources 279 so you can specify independent masks for code and data like this: 280 281 L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 282 L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 283 284 - L2 details 285 - ---------- 286 L2 cache does not support code and data prioritization, so the 287 schemata format is always: 288 ··· 308 # cat schemata 309 L3DATA:0=fffff;1=fffff;2=3c0;3=fffff 310 L3CODE:0=fffff;1=fffff;2=fffff;3=fffff 311 312 Example 1 313 --------- ··· 536 /* code to read and write directory contents */ 537 resctrl_release_lock(fd); 538 }
··· 6 Tony Luck <tony.luck@intel.com> 7 Vikas Shivappa <vikas.shivappa@intel.com> 8 9 + This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the 10 + X86 /proc/cpuinfo flag bits "rdt", "cqm", "cat_l3" and "cdp_l3". 11 12 To use the feature mount the file system: 13 ··· 17 18 "cdp": Enable code/data prioritization in L3 cache allocations. 19 20 + RDT features are orthogonal. A particular system may support only 21 + monitoring, only control, or both monitoring and control. 22 + 23 + The mount succeeds if either of allocation or monitoring is present, but 24 + only those files and directories supported by the system will be created. 25 + For more details on the behavior of the interface during monitoring 26 + and allocation, see the "Resource alloc and monitor groups" section. 27 28 Info directory 29 -------------- ··· 24 The 'info' directory contains information about the enabled 25 resources. Each resource has its own subdirectory. The subdirectory 26 names reflect the resource names. 27 + 28 + Each subdirectory contains the following files with respect to 29 + allocation: 30 + 31 + Cache resource(L3/L2) subdirectory contains the following files 32 + related to allocation: 33 34 "num_closids": The number of CLOSIDs which are valid for this 35 resource. The kernel uses the smallest number of ··· 36 "min_cbm_bits": The minimum number of consecutive bits which 37 must be set when writing a mask. 38 39 + "shareable_bits": Bitmask of shareable resource with other executing 40 + entities (e.g. I/O). User can use this when 41 + setting up exclusive cache partitions. Note that 42 + some platforms support devices that have their 43 + own settings for cache use which can over-ride 44 + these bits. 45 + 46 + Memory bandwitdh(MB) subdirectory contains the following files 47 + with respect to allocation: 48 49 "min_bandwidth": The minimum memory bandwidth percentage which 50 user can request. ··· 52 non-linear. This field is purely informational 53 only. 54 55 + If RDT monitoring is available there will be an "L3_MON" directory 56 + with the following files: 57 + 58 + "num_rmids": The number of RMIDs available. This is the 59 + upper bound for how many "CTRL_MON" + "MON" 60 + groups can be created. 61 + 62 + "mon_features": Lists the monitoring events if 63 + monitoring is enabled for the resource. 64 + 65 + "max_threshold_occupancy": 66 + Read/write file provides the largest value (in 67 + bytes) at which a previously used LLC_occupancy 68 + counter can be considered for re-use. 69 + 70 + 71 + Resource alloc and monitor groups 72 + --------------------------------- 73 + 74 Resource groups are represented as directories in the resctrl file 75 + system. The default group is the root directory which, immediately 76 + after mounting, owns all the tasks and cpus in the system and can make 77 + full use of all resources. 78 79 + On a system with RDT control features additional directories can be 80 + created in the root directory that specify different amounts of each 81 + resource (see "schemata" below). The root and these additional top level 82 + directories are referred to as "CTRL_MON" groups below. 83 84 + On a system with RDT monitoring the root directory and other top level 85 + directories contain a directory named "mon_groups" in which additional 86 + directories can be created to monitor subsets of tasks in the CTRL_MON 87 + group that is their ancestor. These are called "MON" groups in the rest 88 + of this document. 
89 90 + Removing a directory will move all tasks and cpus owned by the group it 91 + represents to the parent. Removing one of the created CTRL_MON groups 92 + will automatically remove all MON groups below it. 93 94 + All groups contain the following files: 95 96 + "tasks": 97 + Reading this file shows the list of all tasks that belong to 98 + this group. Writing a task id to the file will add a task to the 99 + group. If the group is a CTRL_MON group the task is removed from 100 + whichever previous CTRL_MON group owned the task and also from 101 + any MON group that owned the task. If the group is a MON group, 102 + then the task must already belong to the CTRL_MON parent of this 103 + group. The task is removed from any previous MON group. 104 105 + 106 + "cpus": 107 + Reading this file shows a bitmask of the logical CPUs owned by 108 + this group. Writing a mask to this file will add and remove 109 + CPUs to/from this group. As with the tasks file a hierarchy is 110 + maintained where MON groups may only include CPUs owned by the 111 + parent CTRL_MON group. 112 + 113 + 114 + "cpus_list": 115 + Just like "cpus", only using ranges of CPUs instead of bitmasks. 116 + 117 + 118 + When control is enabled all CTRL_MON groups will also contain: 119 + 120 + "schemata": 121 + A list of all the resources available to this group. 122 + Each resource has its own line and format - see below for details. 123 + 124 + When monitoring is enabled all MON groups will also contain: 125 + 126 + "mon_data": 127 + This contains a set of files organized by L3 domain and by 128 + RDT event. E.g. on a system with two L3 domains there will 129 + be subdirectories "mon_L3_00" and "mon_L3_01". Each of these 130 + directories have one file per event (e.g. "llc_occupancy", 131 + "mbm_total_bytes", and "mbm_local_bytes"). In a MON group these 132 + files provide a read out of the current value of the event for 133 + all tasks in the group. In CTRL_MON groups these files provide 134 + the sum for all tasks in the CTRL_MON group and all tasks in 135 + MON groups. Please see example section for more details on usage. 136 + 137 + Resource allocation rules 138 + ------------------------- 139 + When a task is running the following rules define which resources are 140 + available to it: 141 142 1) If the task is a member of a non-default group, then the schemata 143 + for that group is used. 144 145 2) Else if the task belongs to the default group, but is running on a 146 + CPU that is assigned to some specific group, then the schemata for the 147 + CPU's group is used. 148 149 3) Otherwise the schemata for the default group is used. 150 151 + Resource monitoring rules 152 + ------------------------- 153 + 1) If a task is a member of a MON group, or non-default CTRL_MON group 154 + then RDT events for the task will be reported in that group. 155 + 156 + 2) If a task is a member of the default CTRL_MON group, but is running 157 + on a CPU that is assigned to some specific group, then the RDT events 158 + for the task will be reported in that group. 159 + 160 + 3) Otherwise RDT events for the task will be reported in the root level 161 + "mon_data" group. 162 + 163 + 164 + Notes on cache occupancy monitoring and control 165 + ----------------------------------------------- 166 + When moving a task from one group to another you should remember that 167 + this only affects *new* cache allocations by the task. E.g. you may have 168 + a task in a monitor group showing 3 MB of cache occupancy. 
If you move 169 + to a new group and immediately check the occupancy of the old and new 170 + groups you will likely see that the old group is still showing 3 MB and 171 + the new group zero. When the task accesses locations still in cache from 172 + before the move, the h/w does not update any counters. On a busy system 173 + you will likely see the occupancy in the old group go down as cache lines 174 + are evicted and re-used while the occupancy in the new group rises as 175 + the task accesses memory and loads into the cache are counted based on 176 + membership in the new group. 177 + 178 + The same applies to cache allocation control. Moving a task to a group 179 + with a smaller cache partition will not evict any cache lines. The 180 + process may continue to use them from the old partition. 181 + 182 + Hardware uses CLOSid(Class of service ID) and an RMID(Resource monitoring ID) 183 + to identify a control group and a monitoring group respectively. Each of 184 + the resource groups are mapped to these IDs based on the kind of group. The 185 + number of CLOSid and RMID are limited by the hardware and hence the creation of 186 + a "CTRL_MON" directory may fail if we run out of either CLOSID or RMID 187 + and creation of "MON" group may fail if we run out of RMIDs. 188 + 189 + max_threshold_occupancy - generic concepts 190 + ------------------------------------------ 191 + 192 + Note that an RMID once freed may not be immediately available for use as 193 + the RMID is still tagged the cache lines of the previous user of RMID. 194 + Hence such RMIDs are placed on limbo list and checked back if the cache 195 + occupancy has gone down. If there is a time when system has a lot of 196 + limbo RMIDs but which are not ready to be used, user may see an -EBUSY 197 + during mkdir. 198 + 199 + max_threshold_occupancy is a user configurable value to determine the 200 + occupancy at which an RMID can be freed. 201 202 Schemata files - general concepts 203 --------------------------------- ··· 143 sharing a core will result in both threads being throttled to use the 144 low bandwidth. 145 146 + L3 schemata file details (code and data prioritization disabled) 147 + ---------------------------------------------------------------- 148 With CDP disabled the L3 schemata format is: 149 150 L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 151 152 + L3 schemata file details (CDP enabled via mount option to resctrl) 153 + ------------------------------------------------------------------ 154 When CDP is enabled L3 control is split into two separate resources 155 so you can specify independent masks for code and data like this: 156 157 L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 158 L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 159 160 + L2 schemata file details 161 + ------------------------ 162 L2 cache does not support code and data prioritization, so the 163 schemata format is always: 164 ··· 184 # cat schemata 185 L3DATA:0=fffff;1=fffff;2=3c0;3=fffff 186 L3CODE:0=fffff;1=fffff;2=fffff;3=fffff 187 + 188 + Examples for RDT allocation usage: 189 190 Example 1 191 --------- ··· 410 /* code to read and write directory contents */ 411 resctrl_release_lock(fd); 412 } 413 + 414 + Examples for RDT Monitoring along with allocation usage: 415 + 416 + Reading monitored data 417 + ---------------------- 418 + Reading an event file (for ex: mon_data/mon_L3_00/llc_occupancy) would 419 + show the current snapshot of LLC occupancy of the corresponding MON 420 + group or CTRL_MON group. 
421 + 422 + 423 + Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group) 424 + --------- 425 + On a two socket machine (one L3 cache per socket) with just four bits 426 + for cache bit masks 427 + 428 + # mount -t resctrl resctrl /sys/fs/resctrl 429 + # cd /sys/fs/resctrl 430 + # mkdir p0 p1 431 + # echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata 432 + # echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata 433 + # echo 5678 > p1/tasks 434 + # echo 5679 > p1/tasks 435 + 436 + The default resource group is unmodified, so we have access to all parts 437 + of all caches (its schemata file reads "L3:0=f;1=f"). 438 + 439 + Tasks that are under the control of group "p0" may only allocate from the 440 + "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1. 441 + Tasks in group "p1" use the "lower" 50% of cache on both sockets. 442 + 443 + Create monitor groups and assign a subset of tasks to each monitor group. 444 + 445 + # cd /sys/fs/resctrl/p1/mon_groups 446 + # mkdir m11 m12 447 + # echo 5678 > m11/tasks 448 + # echo 5679 > m12/tasks 449 + 450 + fetch data (data shown in bytes) 451 + 452 + # cat m11/mon_data/mon_L3_00/llc_occupancy 453 + 16234000 454 + # cat m11/mon_data/mon_L3_01/llc_occupancy 455 + 14789000 456 + # cat m12/mon_data/mon_L3_00/llc_occupancy 457 + 16789000 458 + 459 + The parent ctrl_mon group shows the aggregated data. 460 + 461 + # cat /sys/fs/resctrl/p1/mon_data/mon_l3_00/llc_occupancy 462 + 31234000 463 + 464 + Example 2 (Monitor a task from its creation) 465 + --------- 466 + On a two socket machine (one L3 cache per socket) 467 + 468 + # mount -t resctrl resctrl /sys/fs/resctrl 469 + # cd /sys/fs/resctrl 470 + # mkdir p0 p1 471 + 472 + An RMID is allocated to the group once its created and hence the <cmd> 473 + below is monitored from its creation. 474 + 475 + # echo $$ > /sys/fs/resctrl/p1/tasks 476 + # <cmd> 477 + 478 + Fetch the data 479 + 480 + # cat /sys/fs/resctrl/p1/mon_data/mon_l3_00/llc_occupancy 481 + 31789000 482 + 483 + Example 3 (Monitor without CAT support or before creating CAT groups) 484 + --------- 485 + 486 + Assume a system like HSW has only CQM and no CAT support. In this case 487 + the resctrl will still mount but cannot create CTRL_MON directories. 488 + But user can create different MON groups within the root group thereby 489 + able to monitor all tasks including kernel threads. 490 + 491 + This can also be used to profile jobs cache size footprint before being 492 + able to allocate them to different allocation groups. 493 + 494 + # mount -t resctrl resctrl /sys/fs/resctrl 495 + # cd /sys/fs/resctrl 496 + # mkdir mon_groups/m01 497 + # mkdir mon_groups/m02 498 + 499 + # echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks 500 + # echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks 501 + 502 + Monitor the groups separately and also get per domain data. From the 503 + below its apparent that the tasks are mostly doing work on 504 + domain(socket) 0. 505 + 506 + # cat /sys/fs/resctrl/mon_groups/m01/mon_L3_00/llc_occupancy 507 + 31234000 508 + # cat /sys/fs/resctrl/mon_groups/m01/mon_L3_01/llc_occupancy 509 + 34555 510 + # cat /sys/fs/resctrl/mon_groups/m02/mon_L3_00/llc_occupancy 511 + 31234000 512 + # cat /sys/fs/resctrl/mon_groups/m02/mon_L3_01/llc_occupancy 513 + 32789 514 + 515 + 516 + Example 4 (Monitor real time tasks) 517 + ----------------------------------- 518 + 519 + A single socket system which has real time tasks running on cores 4-7 520 + and non real time tasks on other cpus. 
We want to monitor the cache 521 + occupancy of the real time threads on these cores. 522 + 523 + # mount -t resctrl resctrl /sys/fs/resctrl 524 + # cd /sys/fs/resctrl 525 + # mkdir p1 526 + 527 + Move the cpus 4-7 over to p1 528 + # echo f0 > p0/cpus 529 + 530 + View the llc occupancy snapshot 531 + 532 + # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy 533 + 11234000
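
The shell examples above translate directly to programmatic use. Below is a
minimal C sketch mirroring Example 3: it creates a MON group, moves one task
into it and reads its LLC occupancy. The mount point, the "m01" group name
and PID 3478 are taken from the example text and are purely illustrative.

/* Sketch: create a MON group, add a task, read llc_occupancy. */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(void)
{
	const char *grp = "/sys/fs/resctrl/mon_groups/m01";
	unsigned long long occupancy;
	char path[256];
	FILE *f;

	/* Creating the directory allocates an RMID for the new MON group. */
	if (mkdir(grp, 0755) && errno != EEXIST) {
		perror("mkdir");
		return 1;
	}

	/* Writing a PID to "tasks" moves that task into the group. */
	snprintf(path, sizeof(path), "%s/tasks", grp);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%d\n", 3478);	/* example PID from the text above */
	fclose(f);

	/* llc_occupancy reports the group's current occupancy in bytes. */
	snprintf(path, sizeof(path), "%s/mon_data/mon_L3_00/llc_occupancy", grp);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%llu", &occupancy) != 1)
		occupancy = 0;
	fclose(f);

	printf("llc_occupancy: %llu bytes\n", occupancy);
	return 0;
}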
+1 -1
MAINTAINERS
···
 L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	arch/x86/kernel/cpu/intel_rdt*
-F:	arch/x86/include/asm/intel_rdt*
+F:	arch/x86/include/asm/intel_rdt_sched.h
 F:	Documentation/x86/intel_rdt*
 
 READ-COPY UPDATE (RCU)
+6 -6
arch/x86/Kconfig
···
 	def_bool y
 	depends on X86_GOLDFISH
 
-config INTEL_RDT_A
-	bool "Intel Resource Director Technology Allocation support"
+config INTEL_RDT
+	bool "Intel Resource Director Technology support"
 	default n
 	depends on X86 && CPU_SUP_INTEL
 	select KERNFS
 	help
-	  Select to enable resource allocation which is a sub-feature of
-	  Intel Resource Director Technology(RDT). More information about
-	  RDT can be found in the Intel x86 Architecture Software
-	  Developer Manual.
+	  Select to enable resource allocation and monitoring which are
+	  sub-features of Intel Resource Director Technology(RDT). More
+	  information about RDT can be found in the Intel x86
+	  Architecture Software Developer Manual.
 
 	  Say N if unsure.
+1 -1
arch/x86/events/intel/Makefile
-obj-$(CONFIG_CPU_SUP_INTEL)		+= core.o bts.o cqm.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= core.o bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= lbr.o p4.o p6.o pt.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)	+= intel-rapl-perf.o
-1766
arch/x86/events/intel/cqm.c
··· 1 - /* 2 - * Intel Cache Quality-of-Service Monitoring (CQM) support. 3 - * 4 - * Based very, very heavily on work by Peter Zijlstra. 5 - */ 6 - 7 - #include <linux/perf_event.h> 8 - #include <linux/slab.h> 9 - #include <asm/cpu_device_id.h> 10 - #include <asm/intel_rdt_common.h> 11 - #include "../perf_event.h" 12 - 13 - #define MSR_IA32_QM_CTR 0x0c8e 14 - #define MSR_IA32_QM_EVTSEL 0x0c8d 15 - 16 - #define MBM_CNTR_WIDTH 24 17 - /* 18 - * Guaranteed time in ms as per SDM where MBM counters will not overflow. 19 - */ 20 - #define MBM_CTR_OVERFLOW_TIME 1000 21 - 22 - static u32 cqm_max_rmid = -1; 23 - static unsigned int cqm_l3_scale; /* supposedly cacheline size */ 24 - static bool cqm_enabled, mbm_enabled; 25 - unsigned int mbm_socket_max; 26 - 27 - /* 28 - * The cached intel_pqr_state is strictly per CPU and can never be 29 - * updated from a remote CPU. Both functions which modify the state 30 - * (intel_cqm_event_start and intel_cqm_event_stop) are called with 31 - * interrupts disabled, which is sufficient for the protection. 32 - */ 33 - DEFINE_PER_CPU(struct intel_pqr_state, pqr_state); 34 - static struct hrtimer *mbm_timers; 35 - /** 36 - * struct sample - mbm event's (local or total) data 37 - * @total_bytes #bytes since we began monitoring 38 - * @prev_msr previous value of MSR 39 - */ 40 - struct sample { 41 - u64 total_bytes; 42 - u64 prev_msr; 43 - }; 44 - 45 - /* 46 - * samples profiled for total memory bandwidth type events 47 - */ 48 - static struct sample *mbm_total; 49 - /* 50 - * samples profiled for local memory bandwidth type events 51 - */ 52 - static struct sample *mbm_local; 53 - 54 - #define pkg_id topology_physical_package_id(smp_processor_id()) 55 - /* 56 - * rmid_2_index returns the index for the rmid in mbm_local/mbm_total array. 57 - * mbm_total[] and mbm_local[] are linearly indexed by socket# * max number of 58 - * rmids per socket, an example is given below 59 - * RMID1 of Socket0: vrmid = 1 60 - * RMID1 of Socket1: vrmid = 1 * (cqm_max_rmid + 1) + 1 61 - * RMID1 of Socket2: vrmid = 2 * (cqm_max_rmid + 1) + 1 62 - */ 63 - #define rmid_2_index(rmid) ((pkg_id * (cqm_max_rmid + 1)) + rmid) 64 - /* 65 - * Protects cache_cgroups and cqm_rmid_free_lru and cqm_rmid_limbo_lru. 66 - * Also protects event->hw.cqm_rmid 67 - * 68 - * Hold either for stability, both for modification of ->hw.cqm_rmid. 69 - */ 70 - static DEFINE_MUTEX(cache_mutex); 71 - static DEFINE_RAW_SPINLOCK(cache_lock); 72 - 73 - /* 74 - * Groups of events that have the same target(s), one RMID per group. 75 - */ 76 - static LIST_HEAD(cache_groups); 77 - 78 - /* 79 - * Mask of CPUs for reading CQM values. We only need one per-socket. 80 - */ 81 - static cpumask_t cqm_cpumask; 82 - 83 - #define RMID_VAL_ERROR (1ULL << 63) 84 - #define RMID_VAL_UNAVAIL (1ULL << 62) 85 - 86 - /* 87 - * Event IDs are used to program IA32_QM_EVTSEL before reading event 88 - * counter from IA32_QM_CTR 89 - */ 90 - #define QOS_L3_OCCUP_EVENT_ID 0x01 91 - #define QOS_MBM_TOTAL_EVENT_ID 0x02 92 - #define QOS_MBM_LOCAL_EVENT_ID 0x03 93 - 94 - /* 95 - * This is central to the rotation algorithm in __intel_cqm_rmid_rotate(). 96 - * 97 - * This rmid is always free and is guaranteed to have an associated 98 - * near-zero occupancy value, i.e. no cachelines are tagged with this 99 - * RMID, once __intel_cqm_rmid_rotate() returns. 100 - */ 101 - static u32 intel_cqm_rotation_rmid; 102 - 103 - #define INVALID_RMID (-1) 104 - 105 - /* 106 - * Is @rmid valid for programming the hardware? 
107 - * 108 - * rmid 0 is reserved by the hardware for all non-monitored tasks, which 109 - * means that we should never come across an rmid with that value. 110 - * Likewise, an rmid value of -1 is used to indicate "no rmid currently 111 - * assigned" and is used as part of the rotation code. 112 - */ 113 - static inline bool __rmid_valid(u32 rmid) 114 - { 115 - if (!rmid || rmid == INVALID_RMID) 116 - return false; 117 - 118 - return true; 119 - } 120 - 121 - static u64 __rmid_read(u32 rmid) 122 - { 123 - u64 val; 124 - 125 - /* 126 - * Ignore the SDM, this thing is _NOTHING_ like a regular perfcnt, 127 - * it just says that to increase confusion. 128 - */ 129 - wrmsr(MSR_IA32_QM_EVTSEL, QOS_L3_OCCUP_EVENT_ID, rmid); 130 - rdmsrl(MSR_IA32_QM_CTR, val); 131 - 132 - /* 133 - * Aside from the ERROR and UNAVAIL bits, assume this thing returns 134 - * the number of cachelines tagged with @rmid. 135 - */ 136 - return val; 137 - } 138 - 139 - enum rmid_recycle_state { 140 - RMID_YOUNG = 0, 141 - RMID_AVAILABLE, 142 - RMID_DIRTY, 143 - }; 144 - 145 - struct cqm_rmid_entry { 146 - u32 rmid; 147 - enum rmid_recycle_state state; 148 - struct list_head list; 149 - unsigned long queue_time; 150 - }; 151 - 152 - /* 153 - * cqm_rmid_free_lru - A least recently used list of RMIDs. 154 - * 155 - * Oldest entry at the head, newest (most recently used) entry at the 156 - * tail. This list is never traversed, it's only used to keep track of 157 - * the lru order. That is, we only pick entries of the head or insert 158 - * them on the tail. 159 - * 160 - * All entries on the list are 'free', and their RMIDs are not currently 161 - * in use. To mark an RMID as in use, remove its entry from the lru 162 - * list. 163 - * 164 - * 165 - * cqm_rmid_limbo_lru - list of currently unused but (potentially) dirty RMIDs. 166 - * 167 - * This list is contains RMIDs that no one is currently using but that 168 - * may have a non-zero occupancy value associated with them. The 169 - * rotation worker moves RMIDs from the limbo list to the free list once 170 - * the occupancy value drops below __intel_cqm_threshold. 171 - * 172 - * Both lists are protected by cache_mutex. 173 - */ 174 - static LIST_HEAD(cqm_rmid_free_lru); 175 - static LIST_HEAD(cqm_rmid_limbo_lru); 176 - 177 - /* 178 - * We use a simple array of pointers so that we can lookup a struct 179 - * cqm_rmid_entry in O(1). This alleviates the callers of __get_rmid() 180 - * and __put_rmid() from having to worry about dealing with struct 181 - * cqm_rmid_entry - they just deal with rmids, i.e. integers. 182 - * 183 - * Once this array is initialized it is read-only. No locks are required 184 - * to access it. 185 - * 186 - * All entries for all RMIDs can be looked up in the this array at all 187 - * times. 188 - */ 189 - static struct cqm_rmid_entry **cqm_rmid_ptrs; 190 - 191 - static inline struct cqm_rmid_entry *__rmid_entry(u32 rmid) 192 - { 193 - struct cqm_rmid_entry *entry; 194 - 195 - entry = cqm_rmid_ptrs[rmid]; 196 - WARN_ON(entry->rmid != rmid); 197 - 198 - return entry; 199 - } 200 - 201 - /* 202 - * Returns < 0 on fail. 203 - * 204 - * We expect to be called with cache_mutex held. 
205 - */ 206 - static u32 __get_rmid(void) 207 - { 208 - struct cqm_rmid_entry *entry; 209 - 210 - lockdep_assert_held(&cache_mutex); 211 - 212 - if (list_empty(&cqm_rmid_free_lru)) 213 - return INVALID_RMID; 214 - 215 - entry = list_first_entry(&cqm_rmid_free_lru, struct cqm_rmid_entry, list); 216 - list_del(&entry->list); 217 - 218 - return entry->rmid; 219 - } 220 - 221 - static void __put_rmid(u32 rmid) 222 - { 223 - struct cqm_rmid_entry *entry; 224 - 225 - lockdep_assert_held(&cache_mutex); 226 - 227 - WARN_ON(!__rmid_valid(rmid)); 228 - entry = __rmid_entry(rmid); 229 - 230 - entry->queue_time = jiffies; 231 - entry->state = RMID_YOUNG; 232 - 233 - list_add_tail(&entry->list, &cqm_rmid_limbo_lru); 234 - } 235 - 236 - static void cqm_cleanup(void) 237 - { 238 - int i; 239 - 240 - if (!cqm_rmid_ptrs) 241 - return; 242 - 243 - for (i = 0; i < cqm_max_rmid; i++) 244 - kfree(cqm_rmid_ptrs[i]); 245 - 246 - kfree(cqm_rmid_ptrs); 247 - cqm_rmid_ptrs = NULL; 248 - cqm_enabled = false; 249 - } 250 - 251 - static int intel_cqm_setup_rmid_cache(void) 252 - { 253 - struct cqm_rmid_entry *entry; 254 - unsigned int nr_rmids; 255 - int r = 0; 256 - 257 - nr_rmids = cqm_max_rmid + 1; 258 - cqm_rmid_ptrs = kzalloc(sizeof(struct cqm_rmid_entry *) * 259 - nr_rmids, GFP_KERNEL); 260 - if (!cqm_rmid_ptrs) 261 - return -ENOMEM; 262 - 263 - for (; r <= cqm_max_rmid; r++) { 264 - struct cqm_rmid_entry *entry; 265 - 266 - entry = kmalloc(sizeof(*entry), GFP_KERNEL); 267 - if (!entry) 268 - goto fail; 269 - 270 - INIT_LIST_HEAD(&entry->list); 271 - entry->rmid = r; 272 - cqm_rmid_ptrs[r] = entry; 273 - 274 - list_add_tail(&entry->list, &cqm_rmid_free_lru); 275 - } 276 - 277 - /* 278 - * RMID 0 is special and is always allocated. It's used for all 279 - * tasks that are not monitored. 280 - */ 281 - entry = __rmid_entry(0); 282 - list_del(&entry->list); 283 - 284 - mutex_lock(&cache_mutex); 285 - intel_cqm_rotation_rmid = __get_rmid(); 286 - mutex_unlock(&cache_mutex); 287 - 288 - return 0; 289 - 290 - fail: 291 - cqm_cleanup(); 292 - return -ENOMEM; 293 - } 294 - 295 - /* 296 - * Determine if @a and @b measure the same set of tasks. 297 - * 298 - * If @a and @b measure the same set of tasks then we want to share a 299 - * single RMID. 300 - */ 301 - static bool __match_event(struct perf_event *a, struct perf_event *b) 302 - { 303 - /* Per-cpu and task events don't mix */ 304 - if ((a->attach_state & PERF_ATTACH_TASK) != 305 - (b->attach_state & PERF_ATTACH_TASK)) 306 - return false; 307 - 308 - #ifdef CONFIG_CGROUP_PERF 309 - if (a->cgrp != b->cgrp) 310 - return false; 311 - #endif 312 - 313 - /* If not task event, we're machine wide */ 314 - if (!(b->attach_state & PERF_ATTACH_TASK)) 315 - return true; 316 - 317 - /* 318 - * Events that target same task are placed into the same cache group. 319 - * Mark it as a multi event group, so that we update ->count 320 - * for every event rather than just the group leader later. 321 - */ 322 - if (a->hw.target == b->hw.target) { 323 - b->hw.is_group_event = true; 324 - return true; 325 - } 326 - 327 - /* 328 - * Are we an inherited event? 
329 - */ 330 - if (b->parent == a) 331 - return true; 332 - 333 - return false; 334 - } 335 - 336 - #ifdef CONFIG_CGROUP_PERF 337 - static inline struct perf_cgroup *event_to_cgroup(struct perf_event *event) 338 - { 339 - if (event->attach_state & PERF_ATTACH_TASK) 340 - return perf_cgroup_from_task(event->hw.target, event->ctx); 341 - 342 - return event->cgrp; 343 - } 344 - #endif 345 - 346 - /* 347 - * Determine if @a's tasks intersect with @b's tasks 348 - * 349 - * There are combinations of events that we explicitly prohibit, 350 - * 351 - * PROHIBITS 352 - * system-wide -> cgroup and task 353 - * cgroup -> system-wide 354 - * -> task in cgroup 355 - * task -> system-wide 356 - * -> task in cgroup 357 - * 358 - * Call this function before allocating an RMID. 359 - */ 360 - static bool __conflict_event(struct perf_event *a, struct perf_event *b) 361 - { 362 - #ifdef CONFIG_CGROUP_PERF 363 - /* 364 - * We can have any number of cgroups but only one system-wide 365 - * event at a time. 366 - */ 367 - if (a->cgrp && b->cgrp) { 368 - struct perf_cgroup *ac = a->cgrp; 369 - struct perf_cgroup *bc = b->cgrp; 370 - 371 - /* 372 - * This condition should have been caught in 373 - * __match_event() and we should be sharing an RMID. 374 - */ 375 - WARN_ON_ONCE(ac == bc); 376 - 377 - if (cgroup_is_descendant(ac->css.cgroup, bc->css.cgroup) || 378 - cgroup_is_descendant(bc->css.cgroup, ac->css.cgroup)) 379 - return true; 380 - 381 - return false; 382 - } 383 - 384 - if (a->cgrp || b->cgrp) { 385 - struct perf_cgroup *ac, *bc; 386 - 387 - /* 388 - * cgroup and system-wide events are mutually exclusive 389 - */ 390 - if ((a->cgrp && !(b->attach_state & PERF_ATTACH_TASK)) || 391 - (b->cgrp && !(a->attach_state & PERF_ATTACH_TASK))) 392 - return true; 393 - 394 - /* 395 - * Ensure neither event is part of the other's cgroup 396 - */ 397 - ac = event_to_cgroup(a); 398 - bc = event_to_cgroup(b); 399 - if (ac == bc) 400 - return true; 401 - 402 - /* 403 - * Must have cgroup and non-intersecting task events. 404 - */ 405 - if (!ac || !bc) 406 - return false; 407 - 408 - /* 409 - * We have cgroup and task events, and the task belongs 410 - * to a cgroup. Check for for overlap. 411 - */ 412 - if (cgroup_is_descendant(ac->css.cgroup, bc->css.cgroup) || 413 - cgroup_is_descendant(bc->css.cgroup, ac->css.cgroup)) 414 - return true; 415 - 416 - return false; 417 - } 418 - #endif 419 - /* 420 - * If one of them is not a task, same story as above with cgroups. 421 - */ 422 - if (!(a->attach_state & PERF_ATTACH_TASK) || 423 - !(b->attach_state & PERF_ATTACH_TASK)) 424 - return true; 425 - 426 - /* 427 - * Must be non-overlapping. 428 - */ 429 - return false; 430 - } 431 - 432 - struct rmid_read { 433 - u32 rmid; 434 - u32 evt_type; 435 - atomic64_t value; 436 - }; 437 - 438 - static void __intel_cqm_event_count(void *info); 439 - static void init_mbm_sample(u32 rmid, u32 evt_type); 440 - static void __intel_mbm_event_count(void *info); 441 - 442 - static bool is_cqm_event(int e) 443 - { 444 - return (e == QOS_L3_OCCUP_EVENT_ID); 445 - } 446 - 447 - static bool is_mbm_event(int e) 448 - { 449 - return (e >= QOS_MBM_TOTAL_EVENT_ID && e <= QOS_MBM_LOCAL_EVENT_ID); 450 - } 451 - 452 - static void cqm_mask_call(struct rmid_read *rr) 453 - { 454 - if (is_mbm_event(rr->evt_type)) 455 - on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_count, rr, 1); 456 - else 457 - on_each_cpu_mask(&cqm_cpumask, __intel_cqm_event_count, rr, 1); 458 - } 459 - 460 - /* 461 - * Exchange the RMID of a group of events. 
462 - */ 463 - static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid) 464 - { 465 - struct perf_event *event; 466 - struct list_head *head = &group->hw.cqm_group_entry; 467 - u32 old_rmid = group->hw.cqm_rmid; 468 - 469 - lockdep_assert_held(&cache_mutex); 470 - 471 - /* 472 - * If our RMID is being deallocated, perform a read now. 473 - */ 474 - if (__rmid_valid(old_rmid) && !__rmid_valid(rmid)) { 475 - struct rmid_read rr = { 476 - .rmid = old_rmid, 477 - .evt_type = group->attr.config, 478 - .value = ATOMIC64_INIT(0), 479 - }; 480 - 481 - cqm_mask_call(&rr); 482 - local64_set(&group->count, atomic64_read(&rr.value)); 483 - } 484 - 485 - raw_spin_lock_irq(&cache_lock); 486 - 487 - group->hw.cqm_rmid = rmid; 488 - list_for_each_entry(event, head, hw.cqm_group_entry) 489 - event->hw.cqm_rmid = rmid; 490 - 491 - raw_spin_unlock_irq(&cache_lock); 492 - 493 - /* 494 - * If the allocation is for mbm, init the mbm stats. 495 - * Need to check if each event in the group is mbm event 496 - * because there could be multiple type of events in the same group. 497 - */ 498 - if (__rmid_valid(rmid)) { 499 - event = group; 500 - if (is_mbm_event(event->attr.config)) 501 - init_mbm_sample(rmid, event->attr.config); 502 - 503 - list_for_each_entry(event, head, hw.cqm_group_entry) { 504 - if (is_mbm_event(event->attr.config)) 505 - init_mbm_sample(rmid, event->attr.config); 506 - } 507 - } 508 - 509 - return old_rmid; 510 - } 511 - 512 - /* 513 - * If we fail to assign a new RMID for intel_cqm_rotation_rmid because 514 - * cachelines are still tagged with RMIDs in limbo, we progressively 515 - * increment the threshold until we find an RMID in limbo with <= 516 - * __intel_cqm_threshold lines tagged. This is designed to mitigate the 517 - * problem where cachelines tagged with an RMID are not steadily being 518 - * evicted. 519 - * 520 - * On successful rotations we decrease the threshold back towards zero. 521 - * 522 - * __intel_cqm_max_threshold provides an upper bound on the threshold, 523 - * and is measured in bytes because it's exposed to userland. 524 - */ 525 - static unsigned int __intel_cqm_threshold; 526 - static unsigned int __intel_cqm_max_threshold; 527 - 528 - /* 529 - * Test whether an RMID has a zero occupancy value on this cpu. 530 - */ 531 - static void intel_cqm_stable(void *arg) 532 - { 533 - struct cqm_rmid_entry *entry; 534 - 535 - list_for_each_entry(entry, &cqm_rmid_limbo_lru, list) { 536 - if (entry->state != RMID_AVAILABLE) 537 - break; 538 - 539 - if (__rmid_read(entry->rmid) > __intel_cqm_threshold) 540 - entry->state = RMID_DIRTY; 541 - } 542 - } 543 - 544 - /* 545 - * If we have group events waiting for an RMID that don't conflict with 546 - * events already running, assign @rmid. 547 - */ 548 - static bool intel_cqm_sched_in_event(u32 rmid) 549 - { 550 - struct perf_event *leader, *event; 551 - 552 - lockdep_assert_held(&cache_mutex); 553 - 554 - leader = list_first_entry(&cache_groups, struct perf_event, 555 - hw.cqm_groups_entry); 556 - event = leader; 557 - 558 - list_for_each_entry_continue(event, &cache_groups, 559 - hw.cqm_groups_entry) { 560 - if (__rmid_valid(event->hw.cqm_rmid)) 561 - continue; 562 - 563 - if (__conflict_event(event, leader)) 564 - continue; 565 - 566 - intel_cqm_xchg_rmid(event, rmid); 567 - return true; 568 - } 569 - 570 - return false; 571 - } 572 - 573 - /* 574 - * Initially use this constant for both the limbo queue time and the 575 - * rotation timer interval, pmu::hrtimer_interval_ms. 
576 - * 577 - * They don't need to be the same, but the two are related since if you 578 - * rotate faster than you recycle RMIDs, you may run out of available 579 - * RMIDs. 580 - */ 581 - #define RMID_DEFAULT_QUEUE_TIME 250 /* ms */ 582 - 583 - static unsigned int __rmid_queue_time_ms = RMID_DEFAULT_QUEUE_TIME; 584 - 585 - /* 586 - * intel_cqm_rmid_stabilize - move RMIDs from limbo to free list 587 - * @nr_available: number of freeable RMIDs on the limbo list 588 - * 589 - * Quiescent state; wait for all 'freed' RMIDs to become unused, i.e. no 590 - * cachelines are tagged with those RMIDs. After this we can reuse them 591 - * and know that the current set of active RMIDs is stable. 592 - * 593 - * Return %true or %false depending on whether stabilization needs to be 594 - * reattempted. 595 - * 596 - * If we return %true then @nr_available is updated to indicate the 597 - * number of RMIDs on the limbo list that have been queued for the 598 - * minimum queue time (RMID_AVAILABLE), but whose data occupancy values 599 - * are above __intel_cqm_threshold. 600 - */ 601 - static bool intel_cqm_rmid_stabilize(unsigned int *available) 602 - { 603 - struct cqm_rmid_entry *entry, *tmp; 604 - 605 - lockdep_assert_held(&cache_mutex); 606 - 607 - *available = 0; 608 - list_for_each_entry(entry, &cqm_rmid_limbo_lru, list) { 609 - unsigned long min_queue_time; 610 - unsigned long now = jiffies; 611 - 612 - /* 613 - * We hold RMIDs placed into limbo for a minimum queue 614 - * time. Before the minimum queue time has elapsed we do 615 - * not recycle RMIDs. 616 - * 617 - * The reasoning is that until a sufficient time has 618 - * passed since we stopped using an RMID, any RMID 619 - * placed onto the limbo list will likely still have 620 - * data tagged in the cache, which means we'll probably 621 - * fail to recycle it anyway. 622 - * 623 - * We can save ourselves an expensive IPI by skipping 624 - * any RMIDs that have not been queued for the minimum 625 - * time. 626 - */ 627 - min_queue_time = entry->queue_time + 628 - msecs_to_jiffies(__rmid_queue_time_ms); 629 - 630 - if (time_after(min_queue_time, now)) 631 - break; 632 - 633 - entry->state = RMID_AVAILABLE; 634 - (*available)++; 635 - } 636 - 637 - /* 638 - * Fast return if none of the RMIDs on the limbo list have been 639 - * sitting on the queue for the minimum queue time. 640 - */ 641 - if (!*available) 642 - return false; 643 - 644 - /* 645 - * Test whether an RMID is free for each package. 646 - */ 647 - on_each_cpu_mask(&cqm_cpumask, intel_cqm_stable, NULL, true); 648 - 649 - list_for_each_entry_safe(entry, tmp, &cqm_rmid_limbo_lru, list) { 650 - /* 651 - * Exhausted all RMIDs that have waited min queue time. 652 - */ 653 - if (entry->state == RMID_YOUNG) 654 - break; 655 - 656 - if (entry->state == RMID_DIRTY) 657 - continue; 658 - 659 - list_del(&entry->list); /* remove from limbo */ 660 - 661 - /* 662 - * The rotation RMID gets priority if it's 663 - * currently invalid. In which case, skip adding 664 - * the RMID to the the free lru. 665 - */ 666 - if (!__rmid_valid(intel_cqm_rotation_rmid)) { 667 - intel_cqm_rotation_rmid = entry->rmid; 668 - continue; 669 - } 670 - 671 - /* 672 - * If we have groups waiting for RMIDs, hand 673 - * them one now provided they don't conflict. 674 - */ 675 - if (intel_cqm_sched_in_event(entry->rmid)) 676 - continue; 677 - 678 - /* 679 - * Otherwise place it onto the free list. 
680 - */ 681 - list_add_tail(&entry->list, &cqm_rmid_free_lru); 682 - } 683 - 684 - 685 - return __rmid_valid(intel_cqm_rotation_rmid); 686 - } 687 - 688 - /* 689 - * Pick a victim group and move it to the tail of the group list. 690 - * @next: The first group without an RMID 691 - */ 692 - static void __intel_cqm_pick_and_rotate(struct perf_event *next) 693 - { 694 - struct perf_event *rotor; 695 - u32 rmid; 696 - 697 - lockdep_assert_held(&cache_mutex); 698 - 699 - rotor = list_first_entry(&cache_groups, struct perf_event, 700 - hw.cqm_groups_entry); 701 - 702 - /* 703 - * The group at the front of the list should always have a valid 704 - * RMID. If it doesn't then no groups have RMIDs assigned and we 705 - * don't need to rotate the list. 706 - */ 707 - if (next == rotor) 708 - return; 709 - 710 - rmid = intel_cqm_xchg_rmid(rotor, INVALID_RMID); 711 - __put_rmid(rmid); 712 - 713 - list_rotate_left(&cache_groups); 714 - } 715 - 716 - /* 717 - * Deallocate the RMIDs from any events that conflict with @event, and 718 - * place them on the back of the group list. 719 - */ 720 - static void intel_cqm_sched_out_conflicting_events(struct perf_event *event) 721 - { 722 - struct perf_event *group, *g; 723 - u32 rmid; 724 - 725 - lockdep_assert_held(&cache_mutex); 726 - 727 - list_for_each_entry_safe(group, g, &cache_groups, hw.cqm_groups_entry) { 728 - if (group == event) 729 - continue; 730 - 731 - rmid = group->hw.cqm_rmid; 732 - 733 - /* 734 - * Skip events that don't have a valid RMID. 735 - */ 736 - if (!__rmid_valid(rmid)) 737 - continue; 738 - 739 - /* 740 - * No conflict? No problem! Leave the event alone. 741 - */ 742 - if (!__conflict_event(group, event)) 743 - continue; 744 - 745 - intel_cqm_xchg_rmid(group, INVALID_RMID); 746 - __put_rmid(rmid); 747 - } 748 - } 749 - 750 - /* 751 - * Attempt to rotate the groups and assign new RMIDs. 752 - * 753 - * We rotate for two reasons, 754 - * 1. To handle the scheduling of conflicting events 755 - * 2. To recycle RMIDs 756 - * 757 - * Rotating RMIDs is complicated because the hardware doesn't give us 758 - * any clues. 759 - * 760 - * There's problems with the hardware interface; when you change the 761 - * task:RMID map cachelines retain their 'old' tags, giving a skewed 762 - * picture. In order to work around this, we must always keep one free 763 - * RMID - intel_cqm_rotation_rmid. 764 - * 765 - * Rotation works by taking away an RMID from a group (the old RMID), 766 - * and assigning the free RMID to another group (the new RMID). We must 767 - * then wait for the old RMID to not be used (no cachelines tagged). 768 - * This ensure that all cachelines are tagged with 'active' RMIDs. At 769 - * this point we can start reading values for the new RMID and treat the 770 - * old RMID as the free RMID for the next rotation. 771 - * 772 - * Return %true or %false depending on whether we did any rotating. 773 - */ 774 - static bool __intel_cqm_rmid_rotate(void) 775 - { 776 - struct perf_event *group, *start = NULL; 777 - unsigned int threshold_limit; 778 - unsigned int nr_needed = 0; 779 - unsigned int nr_available; 780 - bool rotated = false; 781 - 782 - mutex_lock(&cache_mutex); 783 - 784 - again: 785 - /* 786 - * Fast path through this function if there are no groups and no 787 - * RMIDs that need cleaning. 
788 - */ 789 - if (list_empty(&cache_groups) && list_empty(&cqm_rmid_limbo_lru)) 790 - goto out; 791 - 792 - list_for_each_entry(group, &cache_groups, hw.cqm_groups_entry) { 793 - if (!__rmid_valid(group->hw.cqm_rmid)) { 794 - if (!start) 795 - start = group; 796 - nr_needed++; 797 - } 798 - } 799 - 800 - /* 801 - * We have some event groups, but they all have RMIDs assigned 802 - * and no RMIDs need cleaning. 803 - */ 804 - if (!nr_needed && list_empty(&cqm_rmid_limbo_lru)) 805 - goto out; 806 - 807 - if (!nr_needed) 808 - goto stabilize; 809 - 810 - /* 811 - * We have more event groups without RMIDs than available RMIDs, 812 - * or we have event groups that conflict with the ones currently 813 - * scheduled. 814 - * 815 - * We force deallocate the rmid of the group at the head of 816 - * cache_groups. The first event group without an RMID then gets 817 - * assigned intel_cqm_rotation_rmid. This ensures we always make 818 - * forward progress. 819 - * 820 - * Rotate the cache_groups list so the previous head is now the 821 - * tail. 822 - */ 823 - __intel_cqm_pick_and_rotate(start); 824 - 825 - /* 826 - * If the rotation is going to succeed, reduce the threshold so 827 - * that we don't needlessly reuse dirty RMIDs. 828 - */ 829 - if (__rmid_valid(intel_cqm_rotation_rmid)) { 830 - intel_cqm_xchg_rmid(start, intel_cqm_rotation_rmid); 831 - intel_cqm_rotation_rmid = __get_rmid(); 832 - 833 - intel_cqm_sched_out_conflicting_events(start); 834 - 835 - if (__intel_cqm_threshold) 836 - __intel_cqm_threshold--; 837 - } 838 - 839 - rotated = true; 840 - 841 - stabilize: 842 - /* 843 - * We now need to stablize the RMID we freed above (if any) to 844 - * ensure that the next time we rotate we have an RMID with zero 845 - * occupancy value. 846 - * 847 - * Alternatively, if we didn't need to perform any rotation, 848 - * we'll have a bunch of RMIDs in limbo that need stabilizing. 849 - */ 850 - threshold_limit = __intel_cqm_max_threshold / cqm_l3_scale; 851 - 852 - while (intel_cqm_rmid_stabilize(&nr_available) && 853 - __intel_cqm_threshold < threshold_limit) { 854 - unsigned int steal_limit; 855 - 856 - /* 857 - * Don't spin if nobody is actively waiting for an RMID, 858 - * the rotation worker will be kicked as soon as an 859 - * event needs an RMID anyway. 860 - */ 861 - if (!nr_needed) 862 - break; 863 - 864 - /* Allow max 25% of RMIDs to be in limbo. */ 865 - steal_limit = (cqm_max_rmid + 1) / 4; 866 - 867 - /* 868 - * We failed to stabilize any RMIDs so our rotation 869 - * logic is now stuck. In order to make forward progress 870 - * we have a few options: 871 - * 872 - * 1. rotate ("steal") another RMID 873 - * 2. increase the threshold 874 - * 3. do nothing 875 - * 876 - * We do both of 1. and 2. until we hit the steal limit. 877 - * 878 - * The steal limit prevents all RMIDs ending up on the 879 - * limbo list. This can happen if every RMID has a 880 - * non-zero occupancy above threshold_limit, and the 881 - * occupancy values aren't dropping fast enough. 882 - * 883 - * Note that there is prioritisation at work here - we'd 884 - * rather increase the number of RMIDs on the limbo list 885 - * than increase the threshold, because increasing the 886 - * threshold skews the event data (because we reuse 887 - * dirty RMIDs) - threshold bumps are a last resort. 
888 - */ 889 - if (nr_available < steal_limit) 890 - goto again; 891 - 892 - __intel_cqm_threshold++; 893 - } 894 - 895 - out: 896 - mutex_unlock(&cache_mutex); 897 - return rotated; 898 - } 899 - 900 - static void intel_cqm_rmid_rotate(struct work_struct *work); 901 - 902 - static DECLARE_DELAYED_WORK(intel_cqm_rmid_work, intel_cqm_rmid_rotate); 903 - 904 - static struct pmu intel_cqm_pmu; 905 - 906 - static void intel_cqm_rmid_rotate(struct work_struct *work) 907 - { 908 - unsigned long delay; 909 - 910 - __intel_cqm_rmid_rotate(); 911 - 912 - delay = msecs_to_jiffies(intel_cqm_pmu.hrtimer_interval_ms); 913 - schedule_delayed_work(&intel_cqm_rmid_work, delay); 914 - } 915 - 916 - static u64 update_sample(unsigned int rmid, u32 evt_type, int first) 917 - { 918 - struct sample *mbm_current; 919 - u32 vrmid = rmid_2_index(rmid); 920 - u64 val, bytes, shift; 921 - u32 eventid; 922 - 923 - if (evt_type == QOS_MBM_LOCAL_EVENT_ID) { 924 - mbm_current = &mbm_local[vrmid]; 925 - eventid = QOS_MBM_LOCAL_EVENT_ID; 926 - } else { 927 - mbm_current = &mbm_total[vrmid]; 928 - eventid = QOS_MBM_TOTAL_EVENT_ID; 929 - } 930 - 931 - wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid); 932 - rdmsrl(MSR_IA32_QM_CTR, val); 933 - if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) 934 - return mbm_current->total_bytes; 935 - 936 - if (first) { 937 - mbm_current->prev_msr = val; 938 - mbm_current->total_bytes = 0; 939 - return mbm_current->total_bytes; 940 - } 941 - 942 - /* 943 - * The h/w guarantees that counters will not overflow 944 - * so long as we poll them at least once per second. 945 - */ 946 - shift = 64 - MBM_CNTR_WIDTH; 947 - bytes = (val << shift) - (mbm_current->prev_msr << shift); 948 - bytes >>= shift; 949 - 950 - bytes *= cqm_l3_scale; 951 - 952 - mbm_current->total_bytes += bytes; 953 - mbm_current->prev_msr = val; 954 - 955 - return mbm_current->total_bytes; 956 - } 957 - 958 - static u64 rmid_read_mbm(unsigned int rmid, u32 evt_type) 959 - { 960 - return update_sample(rmid, evt_type, 0); 961 - } 962 - 963 - static void __intel_mbm_event_init(void *info) 964 - { 965 - struct rmid_read *rr = info; 966 - 967 - update_sample(rr->rmid, rr->evt_type, 1); 968 - } 969 - 970 - static void init_mbm_sample(u32 rmid, u32 evt_type) 971 - { 972 - struct rmid_read rr = { 973 - .rmid = rmid, 974 - .evt_type = evt_type, 975 - .value = ATOMIC64_INIT(0), 976 - }; 977 - 978 - /* on each socket, init sample */ 979 - on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1); 980 - } 981 - 982 - /* 983 - * Find a group and setup RMID. 984 - * 985 - * If we're part of a group, we use the group's RMID. 986 - */ 987 - static void intel_cqm_setup_event(struct perf_event *event, 988 - struct perf_event **group) 989 - { 990 - struct perf_event *iter; 991 - bool conflict = false; 992 - u32 rmid; 993 - 994 - event->hw.is_group_event = false; 995 - list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) { 996 - rmid = iter->hw.cqm_rmid; 997 - 998 - if (__match_event(iter, event)) { 999 - /* All tasks in a group share an RMID */ 1000 - event->hw.cqm_rmid = rmid; 1001 - *group = iter; 1002 - if (is_mbm_event(event->attr.config) && __rmid_valid(rmid)) 1003 - init_mbm_sample(rmid, event->attr.config); 1004 - return; 1005 - } 1006 - 1007 - /* 1008 - * We only care about conflicts for events that are 1009 - * actually scheduled in (and hence have a valid RMID). 
1010 - */ 1011 - if (__conflict_event(iter, event) && __rmid_valid(rmid)) 1012 - conflict = true; 1013 - } 1014 - 1015 - if (conflict) 1016 - rmid = INVALID_RMID; 1017 - else 1018 - rmid = __get_rmid(); 1019 - 1020 - if (is_mbm_event(event->attr.config) && __rmid_valid(rmid)) 1021 - init_mbm_sample(rmid, event->attr.config); 1022 - 1023 - event->hw.cqm_rmid = rmid; 1024 - } 1025 - 1026 - static void intel_cqm_event_read(struct perf_event *event) 1027 - { 1028 - unsigned long flags; 1029 - u32 rmid; 1030 - u64 val; 1031 - 1032 - /* 1033 - * Task events are handled by intel_cqm_event_count(). 1034 - */ 1035 - if (event->cpu == -1) 1036 - return; 1037 - 1038 - raw_spin_lock_irqsave(&cache_lock, flags); 1039 - rmid = event->hw.cqm_rmid; 1040 - 1041 - if (!__rmid_valid(rmid)) 1042 - goto out; 1043 - 1044 - if (is_mbm_event(event->attr.config)) 1045 - val = rmid_read_mbm(rmid, event->attr.config); 1046 - else 1047 - val = __rmid_read(rmid); 1048 - 1049 - /* 1050 - * Ignore this reading on error states and do not update the value. 1051 - */ 1052 - if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) 1053 - goto out; 1054 - 1055 - local64_set(&event->count, val); 1056 - out: 1057 - raw_spin_unlock_irqrestore(&cache_lock, flags); 1058 - } 1059 - 1060 - static void __intel_cqm_event_count(void *info) 1061 - { 1062 - struct rmid_read *rr = info; 1063 - u64 val; 1064 - 1065 - val = __rmid_read(rr->rmid); 1066 - 1067 - if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) 1068 - return; 1069 - 1070 - atomic64_add(val, &rr->value); 1071 - } 1072 - 1073 - static inline bool cqm_group_leader(struct perf_event *event) 1074 - { 1075 - return !list_empty(&event->hw.cqm_groups_entry); 1076 - } 1077 - 1078 - static void __intel_mbm_event_count(void *info) 1079 - { 1080 - struct rmid_read *rr = info; 1081 - u64 val; 1082 - 1083 - val = rmid_read_mbm(rr->rmid, rr->evt_type); 1084 - if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) 1085 - return; 1086 - atomic64_add(val, &rr->value); 1087 - } 1088 - 1089 - static enum hrtimer_restart mbm_hrtimer_handle(struct hrtimer *hrtimer) 1090 - { 1091 - struct perf_event *iter, *iter1; 1092 - int ret = HRTIMER_RESTART; 1093 - struct list_head *head; 1094 - unsigned long flags; 1095 - u32 grp_rmid; 1096 - 1097 - /* 1098 - * Need to cache_lock as the timer Event Select MSR reads 1099 - * can race with the mbm/cqm count() and mbm_init() reads. 
1100 - */ 1101 - raw_spin_lock_irqsave(&cache_lock, flags); 1102 - 1103 - if (list_empty(&cache_groups)) { 1104 - ret = HRTIMER_NORESTART; 1105 - goto out; 1106 - } 1107 - 1108 - list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) { 1109 - grp_rmid = iter->hw.cqm_rmid; 1110 - if (!__rmid_valid(grp_rmid)) 1111 - continue; 1112 - if (is_mbm_event(iter->attr.config)) 1113 - update_sample(grp_rmid, iter->attr.config, 0); 1114 - 1115 - head = &iter->hw.cqm_group_entry; 1116 - if (list_empty(head)) 1117 - continue; 1118 - list_for_each_entry(iter1, head, hw.cqm_group_entry) { 1119 - if (!iter1->hw.is_group_event) 1120 - break; 1121 - if (is_mbm_event(iter1->attr.config)) 1122 - update_sample(iter1->hw.cqm_rmid, 1123 - iter1->attr.config, 0); 1124 - } 1125 - } 1126 - 1127 - hrtimer_forward_now(hrtimer, ms_to_ktime(MBM_CTR_OVERFLOW_TIME)); 1128 - out: 1129 - raw_spin_unlock_irqrestore(&cache_lock, flags); 1130 - 1131 - return ret; 1132 - } 1133 - 1134 - static void __mbm_start_timer(void *info) 1135 - { 1136 - hrtimer_start(&mbm_timers[pkg_id], ms_to_ktime(MBM_CTR_OVERFLOW_TIME), 1137 - HRTIMER_MODE_REL_PINNED); 1138 - } 1139 - 1140 - static void __mbm_stop_timer(void *info) 1141 - { 1142 - hrtimer_cancel(&mbm_timers[pkg_id]); 1143 - } 1144 - 1145 - static void mbm_start_timers(void) 1146 - { 1147 - on_each_cpu_mask(&cqm_cpumask, __mbm_start_timer, NULL, 1); 1148 - } 1149 - 1150 - static void mbm_stop_timers(void) 1151 - { 1152 - on_each_cpu_mask(&cqm_cpumask, __mbm_stop_timer, NULL, 1); 1153 - } 1154 - 1155 - static void mbm_hrtimer_init(void) 1156 - { 1157 - struct hrtimer *hr; 1158 - int i; 1159 - 1160 - for (i = 0; i < mbm_socket_max; i++) { 1161 - hr = &mbm_timers[i]; 1162 - hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 1163 - hr->function = mbm_hrtimer_handle; 1164 - } 1165 - } 1166 - 1167 - static u64 intel_cqm_event_count(struct perf_event *event) 1168 - { 1169 - unsigned long flags; 1170 - struct rmid_read rr = { 1171 - .evt_type = event->attr.config, 1172 - .value = ATOMIC64_INIT(0), 1173 - }; 1174 - 1175 - /* 1176 - * We only need to worry about task events. System-wide events 1177 - * are handled like usual, i.e. entirely with 1178 - * intel_cqm_event_read(). 1179 - */ 1180 - if (event->cpu != -1) 1181 - return __perf_event_count(event); 1182 - 1183 - /* 1184 - * Only the group leader gets to report values except in case of 1185 - * multiple events in the same group, we still need to read the 1186 - * other events.This stops us 1187 - * reporting duplicate values to userspace, and gives us a clear 1188 - * rule for which task gets to report the values. 1189 - * 1190 - * Note that it is impossible to attribute these values to 1191 - * specific packages - we forfeit that ability when we create 1192 - * task events. 1193 - */ 1194 - if (!cqm_group_leader(event) && !event->hw.is_group_event) 1195 - return 0; 1196 - 1197 - /* 1198 - * Getting up-to-date values requires an SMP IPI which is not 1199 - * possible if we're being called in interrupt context. Return 1200 - * the cached values instead. 1201 - */ 1202 - if (unlikely(in_interrupt())) 1203 - goto out; 1204 - 1205 - /* 1206 - * Notice that we don't perform the reading of an RMID 1207 - * atomically, because we can't hold a spin lock across the 1208 - * IPIs. 1209 - * 1210 - * Speculatively perform the read, since @event might be 1211 - * assigned a different (possibly invalid) RMID while we're 1212 - * busying performing the IPI calls. 
It's therefore necessary to 1213 - * check @event's RMID afterwards, and if it has changed, 1214 - * discard the result of the read. 1215 - */ 1216 - rr.rmid = ACCESS_ONCE(event->hw.cqm_rmid); 1217 - 1218 - if (!__rmid_valid(rr.rmid)) 1219 - goto out; 1220 - 1221 - cqm_mask_call(&rr); 1222 - 1223 - raw_spin_lock_irqsave(&cache_lock, flags); 1224 - if (event->hw.cqm_rmid == rr.rmid) 1225 - local64_set(&event->count, atomic64_read(&rr.value)); 1226 - raw_spin_unlock_irqrestore(&cache_lock, flags); 1227 - out: 1228 - return __perf_event_count(event); 1229 - } 1230 - 1231 - static void intel_cqm_event_start(struct perf_event *event, int mode) 1232 - { 1233 - struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); 1234 - u32 rmid = event->hw.cqm_rmid; 1235 - 1236 - if (!(event->hw.cqm_state & PERF_HES_STOPPED)) 1237 - return; 1238 - 1239 - event->hw.cqm_state &= ~PERF_HES_STOPPED; 1240 - 1241 - if (state->rmid_usecnt++) { 1242 - if (!WARN_ON_ONCE(state->rmid != rmid)) 1243 - return; 1244 - } else { 1245 - WARN_ON_ONCE(state->rmid); 1246 - } 1247 - 1248 - state->rmid = rmid; 1249 - wrmsr(MSR_IA32_PQR_ASSOC, rmid, state->closid); 1250 - } 1251 - 1252 - static void intel_cqm_event_stop(struct perf_event *event, int mode) 1253 - { 1254 - struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); 1255 - 1256 - if (event->hw.cqm_state & PERF_HES_STOPPED) 1257 - return; 1258 - 1259 - event->hw.cqm_state |= PERF_HES_STOPPED; 1260 - 1261 - intel_cqm_event_read(event); 1262 - 1263 - if (!--state->rmid_usecnt) { 1264 - state->rmid = 0; 1265 - wrmsr(MSR_IA32_PQR_ASSOC, 0, state->closid); 1266 - } else { 1267 - WARN_ON_ONCE(!state->rmid); 1268 - } 1269 - } 1270 - 1271 - static int intel_cqm_event_add(struct perf_event *event, int mode) 1272 - { 1273 - unsigned long flags; 1274 - u32 rmid; 1275 - 1276 - raw_spin_lock_irqsave(&cache_lock, flags); 1277 - 1278 - event->hw.cqm_state = PERF_HES_STOPPED; 1279 - rmid = event->hw.cqm_rmid; 1280 - 1281 - if (__rmid_valid(rmid) && (mode & PERF_EF_START)) 1282 - intel_cqm_event_start(event, mode); 1283 - 1284 - raw_spin_unlock_irqrestore(&cache_lock, flags); 1285 - 1286 - return 0; 1287 - } 1288 - 1289 - static void intel_cqm_event_destroy(struct perf_event *event) 1290 - { 1291 - struct perf_event *group_other = NULL; 1292 - unsigned long flags; 1293 - 1294 - mutex_lock(&cache_mutex); 1295 - /* 1296 - * Hold the cache_lock as mbm timer handlers could be 1297 - * scanning the list of events. 1298 - */ 1299 - raw_spin_lock_irqsave(&cache_lock, flags); 1300 - 1301 - /* 1302 - * If there's another event in this group... 1303 - */ 1304 - if (!list_empty(&event->hw.cqm_group_entry)) { 1305 - group_other = list_first_entry(&event->hw.cqm_group_entry, 1306 - struct perf_event, 1307 - hw.cqm_group_entry); 1308 - list_del(&event->hw.cqm_group_entry); 1309 - } 1310 - 1311 - /* 1312 - * And we're the group leader.. 1313 - */ 1314 - if (cqm_group_leader(event)) { 1315 - /* 1316 - * If there was a group_other, make that leader, otherwise 1317 - * destroy the group and return the RMID. 1318 - */ 1319 - if (group_other) { 1320 - list_replace(&event->hw.cqm_groups_entry, 1321 - &group_other->hw.cqm_groups_entry); 1322 - } else { 1323 - u32 rmid = event->hw.cqm_rmid; 1324 - 1325 - if (__rmid_valid(rmid)) 1326 - __put_rmid(rmid); 1327 - list_del(&event->hw.cqm_groups_entry); 1328 - } 1329 - } 1330 - 1331 - raw_spin_unlock_irqrestore(&cache_lock, flags); 1332 - 1333 - /* 1334 - * Stop the mbm overflow timers when the last event is destroyed. 
1335 - */ 1336 - if (mbm_enabled && list_empty(&cache_groups)) 1337 - mbm_stop_timers(); 1338 - 1339 - mutex_unlock(&cache_mutex); 1340 - } 1341 - 1342 - static int intel_cqm_event_init(struct perf_event *event) 1343 - { 1344 - struct perf_event *group = NULL; 1345 - bool rotate = false; 1346 - unsigned long flags; 1347 - 1348 - if (event->attr.type != intel_cqm_pmu.type) 1349 - return -ENOENT; 1350 - 1351 - if ((event->attr.config < QOS_L3_OCCUP_EVENT_ID) || 1352 - (event->attr.config > QOS_MBM_LOCAL_EVENT_ID)) 1353 - return -EINVAL; 1354 - 1355 - if ((is_cqm_event(event->attr.config) && !cqm_enabled) || 1356 - (is_mbm_event(event->attr.config) && !mbm_enabled)) 1357 - return -EINVAL; 1358 - 1359 - /* unsupported modes and filters */ 1360 - if (event->attr.exclude_user || 1361 - event->attr.exclude_kernel || 1362 - event->attr.exclude_hv || 1363 - event->attr.exclude_idle || 1364 - event->attr.exclude_host || 1365 - event->attr.exclude_guest || 1366 - event->attr.sample_period) /* no sampling */ 1367 - return -EINVAL; 1368 - 1369 - INIT_LIST_HEAD(&event->hw.cqm_group_entry); 1370 - INIT_LIST_HEAD(&event->hw.cqm_groups_entry); 1371 - 1372 - event->destroy = intel_cqm_event_destroy; 1373 - 1374 - mutex_lock(&cache_mutex); 1375 - 1376 - /* 1377 - * Start the mbm overflow timers when the first event is created. 1378 - */ 1379 - if (mbm_enabled && list_empty(&cache_groups)) 1380 - mbm_start_timers(); 1381 - 1382 - /* Will also set rmid */ 1383 - intel_cqm_setup_event(event, &group); 1384 - 1385 - /* 1386 - * Hold the cache_lock as mbm timer handlers be 1387 - * scanning the list of events. 1388 - */ 1389 - raw_spin_lock_irqsave(&cache_lock, flags); 1390 - 1391 - if (group) { 1392 - list_add_tail(&event->hw.cqm_group_entry, 1393 - &group->hw.cqm_group_entry); 1394 - } else { 1395 - list_add_tail(&event->hw.cqm_groups_entry, 1396 - &cache_groups); 1397 - 1398 - /* 1399 - * All RMIDs are either in use or have recently been 1400 - * used. Kick the rotation worker to clean/free some. 1401 - * 1402 - * We only do this for the group leader, rather than for 1403 - * every event in a group to save on needless work. 
1404 - */ 1405 - if (!__rmid_valid(event->hw.cqm_rmid)) 1406 - rotate = true; 1407 - } 1408 - 1409 - raw_spin_unlock_irqrestore(&cache_lock, flags); 1410 - mutex_unlock(&cache_mutex); 1411 - 1412 - if (rotate) 1413 - schedule_delayed_work(&intel_cqm_rmid_work, 0); 1414 - 1415 - return 0; 1416 - } 1417 - 1418 - EVENT_ATTR_STR(llc_occupancy, intel_cqm_llc, "event=0x01"); 1419 - EVENT_ATTR_STR(llc_occupancy.per-pkg, intel_cqm_llc_pkg, "1"); 1420 - EVENT_ATTR_STR(llc_occupancy.unit, intel_cqm_llc_unit, "Bytes"); 1421 - EVENT_ATTR_STR(llc_occupancy.scale, intel_cqm_llc_scale, NULL); 1422 - EVENT_ATTR_STR(llc_occupancy.snapshot, intel_cqm_llc_snapshot, "1"); 1423 - 1424 - EVENT_ATTR_STR(total_bytes, intel_cqm_total_bytes, "event=0x02"); 1425 - EVENT_ATTR_STR(total_bytes.per-pkg, intel_cqm_total_bytes_pkg, "1"); 1426 - EVENT_ATTR_STR(total_bytes.unit, intel_cqm_total_bytes_unit, "MB"); 1427 - EVENT_ATTR_STR(total_bytes.scale, intel_cqm_total_bytes_scale, "1e-6"); 1428 - 1429 - EVENT_ATTR_STR(local_bytes, intel_cqm_local_bytes, "event=0x03"); 1430 - EVENT_ATTR_STR(local_bytes.per-pkg, intel_cqm_local_bytes_pkg, "1"); 1431 - EVENT_ATTR_STR(local_bytes.unit, intel_cqm_local_bytes_unit, "MB"); 1432 - EVENT_ATTR_STR(local_bytes.scale, intel_cqm_local_bytes_scale, "1e-6"); 1433 - 1434 - static struct attribute *intel_cqm_events_attr[] = { 1435 - EVENT_PTR(intel_cqm_llc), 1436 - EVENT_PTR(intel_cqm_llc_pkg), 1437 - EVENT_PTR(intel_cqm_llc_unit), 1438 - EVENT_PTR(intel_cqm_llc_scale), 1439 - EVENT_PTR(intel_cqm_llc_snapshot), 1440 - NULL, 1441 - }; 1442 - 1443 - static struct attribute *intel_mbm_events_attr[] = { 1444 - EVENT_PTR(intel_cqm_total_bytes), 1445 - EVENT_PTR(intel_cqm_local_bytes), 1446 - EVENT_PTR(intel_cqm_total_bytes_pkg), 1447 - EVENT_PTR(intel_cqm_local_bytes_pkg), 1448 - EVENT_PTR(intel_cqm_total_bytes_unit), 1449 - EVENT_PTR(intel_cqm_local_bytes_unit), 1450 - EVENT_PTR(intel_cqm_total_bytes_scale), 1451 - EVENT_PTR(intel_cqm_local_bytes_scale), 1452 - NULL, 1453 - }; 1454 - 1455 - static struct attribute *intel_cmt_mbm_events_attr[] = { 1456 - EVENT_PTR(intel_cqm_llc), 1457 - EVENT_PTR(intel_cqm_total_bytes), 1458 - EVENT_PTR(intel_cqm_local_bytes), 1459 - EVENT_PTR(intel_cqm_llc_pkg), 1460 - EVENT_PTR(intel_cqm_total_bytes_pkg), 1461 - EVENT_PTR(intel_cqm_local_bytes_pkg), 1462 - EVENT_PTR(intel_cqm_llc_unit), 1463 - EVENT_PTR(intel_cqm_total_bytes_unit), 1464 - EVENT_PTR(intel_cqm_local_bytes_unit), 1465 - EVENT_PTR(intel_cqm_llc_scale), 1466 - EVENT_PTR(intel_cqm_total_bytes_scale), 1467 - EVENT_PTR(intel_cqm_local_bytes_scale), 1468 - EVENT_PTR(intel_cqm_llc_snapshot), 1469 - NULL, 1470 - }; 1471 - 1472 - static struct attribute_group intel_cqm_events_group = { 1473 - .name = "events", 1474 - .attrs = NULL, 1475 - }; 1476 - 1477 - PMU_FORMAT_ATTR(event, "config:0-7"); 1478 - static struct attribute *intel_cqm_formats_attr[] = { 1479 - &format_attr_event.attr, 1480 - NULL, 1481 - }; 1482 - 1483 - static struct attribute_group intel_cqm_format_group = { 1484 - .name = "format", 1485 - .attrs = intel_cqm_formats_attr, 1486 - }; 1487 - 1488 - static ssize_t 1489 - max_recycle_threshold_show(struct device *dev, struct device_attribute *attr, 1490 - char *page) 1491 - { 1492 - ssize_t rv; 1493 - 1494 - mutex_lock(&cache_mutex); 1495 - rv = snprintf(page, PAGE_SIZE-1, "%u\n", __intel_cqm_max_threshold); 1496 - mutex_unlock(&cache_mutex); 1497 - 1498 - return rv; 1499 - } 1500 - 1501 - static ssize_t 1502 - max_recycle_threshold_store(struct device *dev, 1503 - struct device_attribute 
*attr, 1504 - const char *buf, size_t count) 1505 - { 1506 - unsigned int bytes, cachelines; 1507 - int ret; 1508 - 1509 - ret = kstrtouint(buf, 0, &bytes); 1510 - if (ret) 1511 - return ret; 1512 - 1513 - mutex_lock(&cache_mutex); 1514 - 1515 - __intel_cqm_max_threshold = bytes; 1516 - cachelines = bytes / cqm_l3_scale; 1517 - 1518 - /* 1519 - * The new maximum takes effect immediately. 1520 - */ 1521 - if (__intel_cqm_threshold > cachelines) 1522 - __intel_cqm_threshold = cachelines; 1523 - 1524 - mutex_unlock(&cache_mutex); 1525 - 1526 - return count; 1527 - } 1528 - 1529 - static DEVICE_ATTR_RW(max_recycle_threshold); 1530 - 1531 - static struct attribute *intel_cqm_attrs[] = { 1532 - &dev_attr_max_recycle_threshold.attr, 1533 - NULL, 1534 - }; 1535 - 1536 - static const struct attribute_group intel_cqm_group = { 1537 - .attrs = intel_cqm_attrs, 1538 - }; 1539 - 1540 - static const struct attribute_group *intel_cqm_attr_groups[] = { 1541 - &intel_cqm_events_group, 1542 - &intel_cqm_format_group, 1543 - &intel_cqm_group, 1544 - NULL, 1545 - }; 1546 - 1547 - static struct pmu intel_cqm_pmu = { 1548 - .hrtimer_interval_ms = RMID_DEFAULT_QUEUE_TIME, 1549 - .attr_groups = intel_cqm_attr_groups, 1550 - .task_ctx_nr = perf_sw_context, 1551 - .event_init = intel_cqm_event_init, 1552 - .add = intel_cqm_event_add, 1553 - .del = intel_cqm_event_stop, 1554 - .start = intel_cqm_event_start, 1555 - .stop = intel_cqm_event_stop, 1556 - .read = intel_cqm_event_read, 1557 - .count = intel_cqm_event_count, 1558 - }; 1559 - 1560 - static inline void cqm_pick_event_reader(int cpu) 1561 - { 1562 - int reader; 1563 - 1564 - /* First online cpu in package becomes the reader */ 1565 - reader = cpumask_any_and(&cqm_cpumask, topology_core_cpumask(cpu)); 1566 - if (reader >= nr_cpu_ids) 1567 - cpumask_set_cpu(cpu, &cqm_cpumask); 1568 - } 1569 - 1570 - static int intel_cqm_cpu_starting(unsigned int cpu) 1571 - { 1572 - struct intel_pqr_state *state = &per_cpu(pqr_state, cpu); 1573 - struct cpuinfo_x86 *c = &cpu_data(cpu); 1574 - 1575 - state->rmid = 0; 1576 - state->closid = 0; 1577 - state->rmid_usecnt = 0; 1578 - 1579 - WARN_ON(c->x86_cache_max_rmid != cqm_max_rmid); 1580 - WARN_ON(c->x86_cache_occ_scale != cqm_l3_scale); 1581 - 1582 - cqm_pick_event_reader(cpu); 1583 - return 0; 1584 - } 1585 - 1586 - static int intel_cqm_cpu_exit(unsigned int cpu) 1587 - { 1588 - int target; 1589 - 1590 - /* Is @cpu the current cqm reader for this package ? 
*/ 1591 - if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask)) 1592 - return 0; 1593 - 1594 - /* Find another online reader in this package */ 1595 - target = cpumask_any_but(topology_core_cpumask(cpu), cpu); 1596 - 1597 - if (target < nr_cpu_ids) 1598 - cpumask_set_cpu(target, &cqm_cpumask); 1599 - 1600 - return 0; 1601 - } 1602 - 1603 - static const struct x86_cpu_id intel_cqm_match[] = { 1604 - { .vendor = X86_VENDOR_INTEL, .feature = X86_FEATURE_CQM_OCCUP_LLC }, 1605 - {} 1606 - }; 1607 - 1608 - static void mbm_cleanup(void) 1609 - { 1610 - if (!mbm_enabled) 1611 - return; 1612 - 1613 - kfree(mbm_local); 1614 - kfree(mbm_total); 1615 - mbm_enabled = false; 1616 - } 1617 - 1618 - static const struct x86_cpu_id intel_mbm_local_match[] = { 1619 - { .vendor = X86_VENDOR_INTEL, .feature = X86_FEATURE_CQM_MBM_LOCAL }, 1620 - {} 1621 - }; 1622 - 1623 - static const struct x86_cpu_id intel_mbm_total_match[] = { 1624 - { .vendor = X86_VENDOR_INTEL, .feature = X86_FEATURE_CQM_MBM_TOTAL }, 1625 - {} 1626 - }; 1627 - 1628 - static int intel_mbm_init(void) 1629 - { 1630 - int ret = 0, array_size, maxid = cqm_max_rmid + 1; 1631 - 1632 - mbm_socket_max = topology_max_packages(); 1633 - array_size = sizeof(struct sample) * maxid * mbm_socket_max; 1634 - mbm_local = kmalloc(array_size, GFP_KERNEL); 1635 - if (!mbm_local) 1636 - return -ENOMEM; 1637 - 1638 - mbm_total = kmalloc(array_size, GFP_KERNEL); 1639 - if (!mbm_total) { 1640 - ret = -ENOMEM; 1641 - goto out; 1642 - } 1643 - 1644 - array_size = sizeof(struct hrtimer) * mbm_socket_max; 1645 - mbm_timers = kmalloc(array_size, GFP_KERNEL); 1646 - if (!mbm_timers) { 1647 - ret = -ENOMEM; 1648 - goto out; 1649 - } 1650 - mbm_hrtimer_init(); 1651 - 1652 - out: 1653 - if (ret) 1654 - mbm_cleanup(); 1655 - 1656 - return ret; 1657 - } 1658 - 1659 - static int __init intel_cqm_init(void) 1660 - { 1661 - char *str = NULL, scale[20]; 1662 - int cpu, ret; 1663 - 1664 - if (x86_match_cpu(intel_cqm_match)) 1665 - cqm_enabled = true; 1666 - 1667 - if (x86_match_cpu(intel_mbm_local_match) && 1668 - x86_match_cpu(intel_mbm_total_match)) 1669 - mbm_enabled = true; 1670 - 1671 - if (!cqm_enabled && !mbm_enabled) 1672 - return -ENODEV; 1673 - 1674 - cqm_l3_scale = boot_cpu_data.x86_cache_occ_scale; 1675 - 1676 - /* 1677 - * It's possible that not all resources support the same number 1678 - * of RMIDs. Instead of making scheduling much more complicated 1679 - * (where we have to match a task's RMID to a cpu that supports 1680 - * that many RMIDs) just find the minimum RMIDs supported across 1681 - * all cpus. 1682 - * 1683 - * Also, check that the scales match on all cpus. 1684 - */ 1685 - cpus_read_lock(); 1686 - for_each_online_cpu(cpu) { 1687 - struct cpuinfo_x86 *c = &cpu_data(cpu); 1688 - 1689 - if (c->x86_cache_max_rmid < cqm_max_rmid) 1690 - cqm_max_rmid = c->x86_cache_max_rmid; 1691 - 1692 - if (c->x86_cache_occ_scale != cqm_l3_scale) { 1693 - pr_err("Multiple LLC scale values, disabling\n"); 1694 - ret = -EINVAL; 1695 - goto out; 1696 - } 1697 - } 1698 - 1699 - /* 1700 - * A reasonable upper limit on the max threshold is the number 1701 - * of lines tagged per RMID if all RMIDs have the same number of 1702 - * lines tagged in the LLC. 1703 - * 1704 - * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC. 
1705 - */ 1706 - __intel_cqm_max_threshold = 1707 - boot_cpu_data.x86_cache_size * 1024 / (cqm_max_rmid + 1); 1708 - 1709 - snprintf(scale, sizeof(scale), "%u", cqm_l3_scale); 1710 - str = kstrdup(scale, GFP_KERNEL); 1711 - if (!str) { 1712 - ret = -ENOMEM; 1713 - goto out; 1714 - } 1715 - 1716 - event_attr_intel_cqm_llc_scale.event_str = str; 1717 - 1718 - ret = intel_cqm_setup_rmid_cache(); 1719 - if (ret) 1720 - goto out; 1721 - 1722 - if (mbm_enabled) 1723 - ret = intel_mbm_init(); 1724 - if (ret && !cqm_enabled) 1725 - goto out; 1726 - 1727 - if (cqm_enabled && mbm_enabled) 1728 - intel_cqm_events_group.attrs = intel_cmt_mbm_events_attr; 1729 - else if (!cqm_enabled && mbm_enabled) 1730 - intel_cqm_events_group.attrs = intel_mbm_events_attr; 1731 - else if (cqm_enabled && !mbm_enabled) 1732 - intel_cqm_events_group.attrs = intel_cqm_events_attr; 1733 - 1734 - ret = perf_pmu_register(&intel_cqm_pmu, "intel_cqm", -1); 1735 - if (ret) { 1736 - pr_err("Intel CQM perf registration failed: %d\n", ret); 1737 - goto out; 1738 - } 1739 - 1740 - if (cqm_enabled) 1741 - pr_info("Intel CQM monitoring enabled\n"); 1742 - if (mbm_enabled) 1743 - pr_info("Intel MBM enabled\n"); 1744 - 1745 - /* 1746 - * Setup the hot cpu notifier once we are sure cqm 1747 - * is enabled to avoid notifier leak. 1748 - */ 1749 - cpuhp_setup_state_cpuslocked(CPUHP_AP_PERF_X86_CQM_STARTING, 1750 - "perf/x86/cqm:starting", 1751 - intel_cqm_cpu_starting, NULL); 1752 - cpuhp_setup_state_cpuslocked(CPUHP_AP_PERF_X86_CQM_ONLINE, 1753 - "perf/x86/cqm:online", 1754 - NULL, intel_cqm_cpu_exit); 1755 - out: 1756 - cpus_read_unlock(); 1757 - 1758 - if (ret) { 1759 - kfree(str); 1760 - cqm_cleanup(); 1761 - mbm_cleanup(); 1762 - } 1763 - 1764 - return ret; 1765 - } 1766 - device_initcall(intel_cqm_init);
···
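Editor's note: the counter-read path removed above (and kept, in reworked form, by the new intel_rdt.h below) always masks readings with the RMID_VAL_ERROR and RMID_VAL_UNAVAIL bits before using them. The following is a minimal user-space sketch of that check only; read_qm_ctr() is a stub standing in for the real MSR access and is not a kernel API.

/*
 * Sketch of the per-RMID counter read logic: read the raw counter for an
 * RMID/event pair and discard readings that have the error/unavailable
 * bits set.  Bit positions match RMID_VAL_ERROR/RMID_VAL_UNAVAIL.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define RMID_VAL_ERROR   (1ULL << 63)
#define RMID_VAL_UNAVAIL (1ULL << 62)

/* Stub: pretend the hardware returned this raw counter value. */
static uint64_t read_qm_ctr(uint32_t rmid, uint32_t evtid)
{
	(void)rmid;
	(void)evtid;
	return 4096;
}

/* Store the value only if the reading is usable. */
static bool rmid_read(uint32_t rmid, uint32_t evtid, uint64_t *val)
{
	uint64_t raw = read_qm_ctr(rmid, evtid);

	if (raw & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
		return false;
	*val = raw;
	return true;
}

int main(void)
{
	uint64_t val;

	if (rmid_read(1, 0x01 /* L3 occupancy event id */, &val))
		printf("RMID 1 occupancy: %llu\n", (unsigned long long)val);
	else
		printf("reading not usable, keeping previous value\n");
	return 0;
}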
-286
arch/x86/include/asm/intel_rdt.h
··· 1 - #ifndef _ASM_X86_INTEL_RDT_H 2 - #define _ASM_X86_INTEL_RDT_H 3 - 4 - #ifdef CONFIG_INTEL_RDT_A 5 - 6 - #include <linux/sched.h> 7 - #include <linux/kernfs.h> 8 - #include <linux/jump_label.h> 9 - 10 - #include <asm/intel_rdt_common.h> 11 - 12 - #define IA32_L3_QOS_CFG 0xc81 13 - #define IA32_L3_CBM_BASE 0xc90 14 - #define IA32_L2_CBM_BASE 0xd10 15 - #define IA32_MBA_THRTL_BASE 0xd50 16 - 17 - #define L3_QOS_CDP_ENABLE 0x01ULL 18 - 19 - /** 20 - * struct rdtgroup - store rdtgroup's data in resctrl file system. 21 - * @kn: kernfs node 22 - * @rdtgroup_list: linked list for all rdtgroups 23 - * @closid: closid for this rdtgroup 24 - * @cpu_mask: CPUs assigned to this rdtgroup 25 - * @flags: status bits 26 - * @waitcount: how many cpus expect to find this 27 - * group when they acquire rdtgroup_mutex 28 - */ 29 - struct rdtgroup { 30 - struct kernfs_node *kn; 31 - struct list_head rdtgroup_list; 32 - int closid; 33 - struct cpumask cpu_mask; 34 - int flags; 35 - atomic_t waitcount; 36 - }; 37 - 38 - /* rdtgroup.flags */ 39 - #define RDT_DELETED 1 40 - 41 - /* rftype.flags */ 42 - #define RFTYPE_FLAGS_CPUS_LIST 1 43 - 44 - /* List of all resource groups */ 45 - extern struct list_head rdt_all_groups; 46 - 47 - extern int max_name_width, max_data_width; 48 - 49 - int __init rdtgroup_init(void); 50 - 51 - /** 52 - * struct rftype - describe each file in the resctrl file system 53 - * @name: File name 54 - * @mode: Access mode 55 - * @kf_ops: File operations 56 - * @flags: File specific RFTYPE_FLAGS_* flags 57 - * @seq_show: Show content of the file 58 - * @write: Write to the file 59 - */ 60 - struct rftype { 61 - char *name; 62 - umode_t mode; 63 - struct kernfs_ops *kf_ops; 64 - unsigned long flags; 65 - 66 - int (*seq_show)(struct kernfs_open_file *of, 67 - struct seq_file *sf, void *v); 68 - /* 69 - * write() is the generic write callback which maps directly to 70 - * kernfs write operation and overrides all other operations. 71 - * Maximum write size is determined by ->max_write_len. 72 - */ 73 - ssize_t (*write)(struct kernfs_open_file *of, 74 - char *buf, size_t nbytes, loff_t off); 75 - }; 76 - 77 - /** 78 - * struct rdt_domain - group of cpus sharing an RDT resource 79 - * @list: all instances of this resource 80 - * @id: unique id for this instance 81 - * @cpu_mask: which cpus share this resource 82 - * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID) 83 - * @new_ctrl: new ctrl value to be loaded 84 - * @have_new_ctrl: did user provide new_ctrl for this domain 85 - */ 86 - struct rdt_domain { 87 - struct list_head list; 88 - int id; 89 - struct cpumask cpu_mask; 90 - u32 *ctrl_val; 91 - u32 new_ctrl; 92 - bool have_new_ctrl; 93 - }; 94 - 95 - /** 96 - * struct msr_param - set a range of MSRs from a domain 97 - * @res: The resource to use 98 - * @low: Beginning index from base MSR 99 - * @high: End index 100 - */ 101 - struct msr_param { 102 - struct rdt_resource *res; 103 - int low; 104 - int high; 105 - }; 106 - 107 - /** 108 - * struct rdt_cache - Cache allocation related data 109 - * @cbm_len: Length of the cache bit mask 110 - * @min_cbm_bits: Minimum number of consecutive bits to be set 111 - * @cbm_idx_mult: Multiplier of CBM index 112 - * @cbm_idx_offset: Offset of CBM index. 
CBM index is computed by: 113 - * closid * cbm_idx_multi + cbm_idx_offset 114 - * in a cache bit mask 115 - */ 116 - struct rdt_cache { 117 - unsigned int cbm_len; 118 - unsigned int min_cbm_bits; 119 - unsigned int cbm_idx_mult; 120 - unsigned int cbm_idx_offset; 121 - }; 122 - 123 - /** 124 - * struct rdt_membw - Memory bandwidth allocation related data 125 - * @max_delay: Max throttle delay. Delay is the hardware 126 - * representation for memory bandwidth. 127 - * @min_bw: Minimum memory bandwidth percentage user can request 128 - * @bw_gran: Granularity at which the memory bandwidth is allocated 129 - * @delay_linear: True if memory B/W delay is in linear scale 130 - * @mb_map: Mapping of memory B/W percentage to memory B/W delay 131 - */ 132 - struct rdt_membw { 133 - u32 max_delay; 134 - u32 min_bw; 135 - u32 bw_gran; 136 - u32 delay_linear; 137 - u32 *mb_map; 138 - }; 139 - 140 - /** 141 - * struct rdt_resource - attributes of an RDT resource 142 - * @enabled: Is this feature enabled on this machine 143 - * @capable: Is this feature available on this machine 144 - * @name: Name to use in "schemata" file 145 - * @num_closid: Number of CLOSIDs available 146 - * @cache_level: Which cache level defines scope of this resource 147 - * @default_ctrl: Specifies default cache cbm or memory B/W percent. 148 - * @msr_base: Base MSR address for CBMs 149 - * @msr_update: Function pointer to update QOS MSRs 150 - * @data_width: Character width of data when displaying 151 - * @domains: All domains for this resource 152 - * @cache: Cache allocation related data 153 - * @info_files: resctrl info files for the resource 154 - * @nr_info_files: Number of info files 155 - * @format_str: Per resource format string to show domain value 156 - * @parse_ctrlval: Per resource function pointer to parse control values 157 - */ 158 - struct rdt_resource { 159 - bool enabled; 160 - bool capable; 161 - char *name; 162 - int num_closid; 163 - int cache_level; 164 - u32 default_ctrl; 165 - unsigned int msr_base; 166 - void (*msr_update) (struct rdt_domain *d, struct msr_param *m, 167 - struct rdt_resource *r); 168 - int data_width; 169 - struct list_head domains; 170 - struct rdt_cache cache; 171 - struct rdt_membw membw; 172 - struct rftype *info_files; 173 - int nr_info_files; 174 - const char *format_str; 175 - int (*parse_ctrlval) (char *buf, struct rdt_resource *r, 176 - struct rdt_domain *d); 177 - }; 178 - 179 - void rdt_get_cache_infofile(struct rdt_resource *r); 180 - void rdt_get_mba_infofile(struct rdt_resource *r); 181 - int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d); 182 - int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d); 183 - 184 - extern struct mutex rdtgroup_mutex; 185 - 186 - extern struct rdt_resource rdt_resources_all[]; 187 - extern struct rdtgroup rdtgroup_default; 188 - DECLARE_STATIC_KEY_FALSE(rdt_enable_key); 189 - 190 - int __init rdtgroup_init(void); 191 - 192 - enum { 193 - RDT_RESOURCE_L3, 194 - RDT_RESOURCE_L3DATA, 195 - RDT_RESOURCE_L3CODE, 196 - RDT_RESOURCE_L2, 197 - RDT_RESOURCE_MBA, 198 - 199 - /* Must be the last */ 200 - RDT_NUM_RESOURCES, 201 - }; 202 - 203 - #define for_each_capable_rdt_resource(r) \ 204 - for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\ 205 - r++) \ 206 - if (r->capable) 207 - 208 - #define for_each_enabled_rdt_resource(r) \ 209 - for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\ 210 - r++) \ 211 - if (r->enabled) 212 - 213 - /* CPUID.(EAX=10H, ECX=ResID=1).EAX */ 214 - 
union cpuid_0x10_1_eax { 215 - struct { 216 - unsigned int cbm_len:5; 217 - } split; 218 - unsigned int full; 219 - }; 220 - 221 - /* CPUID.(EAX=10H, ECX=ResID=3).EAX */ 222 - union cpuid_0x10_3_eax { 223 - struct { 224 - unsigned int max_delay:12; 225 - } split; 226 - unsigned int full; 227 - }; 228 - 229 - /* CPUID.(EAX=10H, ECX=ResID).EDX */ 230 - union cpuid_0x10_x_edx { 231 - struct { 232 - unsigned int cos_max:16; 233 - } split; 234 - unsigned int full; 235 - }; 236 - 237 - DECLARE_PER_CPU_READ_MOSTLY(int, cpu_closid); 238 - 239 - void rdt_ctrl_update(void *arg); 240 - struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn); 241 - void rdtgroup_kn_unlock(struct kernfs_node *kn); 242 - ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of, 243 - char *buf, size_t nbytes, loff_t off); 244 - int rdtgroup_schemata_show(struct kernfs_open_file *of, 245 - struct seq_file *s, void *v); 246 - 247 - /* 248 - * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR 249 - * 250 - * Following considerations are made so that this has minimal impact 251 - * on scheduler hot path: 252 - * - This will stay as no-op unless we are running on an Intel SKU 253 - * which supports resource control and we enable by mounting the 254 - * resctrl file system. 255 - * - Caches the per cpu CLOSid values and does the MSR write only 256 - * when a task with a different CLOSid is scheduled in. 257 - * 258 - * Must be called with preemption disabled. 259 - */ 260 - static inline void intel_rdt_sched_in(void) 261 - { 262 - if (static_branch_likely(&rdt_enable_key)) { 263 - struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); 264 - int closid; 265 - 266 - /* 267 - * If this task has a closid assigned, use it. 268 - * Else use the closid assigned to this cpu. 269 - */ 270 - closid = current->closid; 271 - if (closid == 0) 272 - closid = this_cpu_read(cpu_closid); 273 - 274 - if (closid != state->closid) { 275 - state->closid = closid; 276 - wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid); 277 - } 278 - } 279 - } 280 - 281 - #else 282 - 283 - static inline void intel_rdt_sched_in(void) {} 284 - 285 - #endif /* CONFIG_INTEL_RDT_A */ 286 - #endif /* _ASM_X86_INTEL_RDT_H */
···
-27
arch/x86/include/asm/intel_rdt_common.h
··· 1 - #ifndef _ASM_X86_INTEL_RDT_COMMON_H 2 - #define _ASM_X86_INTEL_RDT_COMMON_H 3 - 4 - #define MSR_IA32_PQR_ASSOC 0x0c8f 5 - 6 - /** 7 - * struct intel_pqr_state - State cache for the PQR MSR 8 - * @rmid: The cached Resource Monitoring ID 9 - * @closid: The cached Class Of Service ID 10 - * @rmid_usecnt: The usage counter for rmid 11 - * 12 - * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the 13 - * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always 14 - * contains both parts, so we need to cache them. 15 - * 16 - * The cache also helps to avoid pointless updates if the value does 17 - * not change. 18 - */ 19 - struct intel_pqr_state { 20 - u32 rmid; 21 - u32 closid; 22 - int rmid_usecnt; 23 - }; 24 - 25 - DECLARE_PER_CPU(struct intel_pqr_state, pqr_state); 26 - 27 - #endif /* _ASM_X86_INTEL_RDT_COMMON_H */
···
+92
arch/x86/include/asm/intel_rdt_sched.h
···
··· 1 + #ifndef _ASM_X86_INTEL_RDT_SCHED_H 2 + #define _ASM_X86_INTEL_RDT_SCHED_H 3 + 4 + #ifdef CONFIG_INTEL_RDT 5 + 6 + #include <linux/sched.h> 7 + #include <linux/jump_label.h> 8 + 9 + #define IA32_PQR_ASSOC 0x0c8f 10 + 11 + /** 12 + * struct intel_pqr_state - State cache for the PQR MSR 13 + * @cur_rmid: The cached Resource Monitoring ID 14 + * @cur_closid: The cached Class Of Service ID 15 + * @default_rmid: The user assigned Resource Monitoring ID 16 + * @default_closid: The user assigned cached Class Of Service ID 17 + * 18 + * The upper 32 bits of IA32_PQR_ASSOC contain closid and the 19 + * lower 10 bits rmid. The update to IA32_PQR_ASSOC always 20 + * contains both parts, so we need to cache them. This also 21 + * stores the user configured per cpu CLOSID and RMID. 22 + * 23 + * The cache also helps to avoid pointless updates if the value does 24 + * not change. 25 + */ 26 + struct intel_pqr_state { 27 + u32 cur_rmid; 28 + u32 cur_closid; 29 + u32 default_rmid; 30 + u32 default_closid; 31 + }; 32 + 33 + DECLARE_PER_CPU(struct intel_pqr_state, pqr_state); 34 + 35 + DECLARE_STATIC_KEY_FALSE(rdt_enable_key); 36 + DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key); 37 + DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key); 38 + 39 + /* 40 + * __intel_rdt_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR 41 + * 42 + * Following considerations are made so that this has minimal impact 43 + * on scheduler hot path: 44 + * - This will stay as no-op unless we are running on an Intel SKU 45 + * which supports resource control or monitoring and we enable by 46 + * mounting the resctrl file system. 47 + * - Caches the per cpu CLOSid/RMID values and does the MSR write only 48 + * when a task with a different CLOSid/RMID is scheduled in. 49 + * - We allocate RMIDs/CLOSids globally in order to keep this as 50 + * simple as possible. 51 + * Must be called with preemption disabled. 52 + */ 53 + static void __intel_rdt_sched_in(void) 54 + { 55 + struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); 56 + u32 closid = state->default_closid; 57 + u32 rmid = state->default_rmid; 58 + 59 + /* 60 + * If this task has a closid/rmid assigned, use it. 61 + * Else use the closid/rmid assigned to this cpu. 62 + */ 63 + if (static_branch_likely(&rdt_alloc_enable_key)) { 64 + if (current->closid) 65 + closid = current->closid; 66 + } 67 + 68 + if (static_branch_likely(&rdt_mon_enable_key)) { 69 + if (current->rmid) 70 + rmid = current->rmid; 71 + } 72 + 73 + if (closid != state->cur_closid || rmid != state->cur_rmid) { 74 + state->cur_closid = closid; 75 + state->cur_rmid = rmid; 76 + wrmsr(IA32_PQR_ASSOC, rmid, closid); 77 + } 78 + } 79 + 80 + static inline void intel_rdt_sched_in(void) 81 + { 82 + if (static_branch_likely(&rdt_enable_key)) 83 + __intel_rdt_sched_in(); 84 + } 85 + 86 + #else 87 + 88 + static inline void intel_rdt_sched_in(void) {} 89 + 90 + #endif /* CONFIG_INTEL_RDT */ 91 + 92 + #endif /* _ASM_X86_INTEL_RDT_SCHED_H */
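Editor's note: the new header keeps the context-switch hook cheap by caching the CLOSID/RMID pair last written to IA32_PQR_ASSOC and skipping the MSR write when nothing changed. Below is a user-space sketch of that caching pattern, mirroring the selection rule of __intel_rdt_sched_in() above (task value if set, else the CPU default); wrmsr_pqr_assoc() is an illustrative stand-in for wrmsr(), not the kernel function.

/*
 * Sketch of the PQR caching: resolve the task's closid/rmid (falling back
 * to the CPU defaults) and only touch the stubbed MSR when the resolved
 * pair differs from what is already cached.
 */
#include <stdint.h>
#include <stdio.h>

struct pqr_state {
	uint32_t cur_rmid;
	uint32_t cur_closid;
	uint32_t default_rmid;
	uint32_t default_closid;
};

struct task {
	uint32_t closid;	/* 0 means "no explicit assignment" */
	uint32_t rmid;		/* 0 means "no explicit assignment" */
};

static int msr_writes;		/* count how often we would hit the MSR */

static void wrmsr_pqr_assoc(uint32_t rmid, uint32_t closid)
{
	msr_writes++;
	printf("PQR_ASSOC <- rmid=%u closid=%u\n",
	       (unsigned)rmid, (unsigned)closid);
}

static void sched_in(struct pqr_state *state, const struct task *t)
{
	uint32_t closid = t->closid ? t->closid : state->default_closid;
	uint32_t rmid = t->rmid ? t->rmid : state->default_rmid;

	if (closid != state->cur_closid || rmid != state->cur_rmid) {
		state->cur_closid = closid;
		state->cur_rmid = rmid;
		wrmsr_pqr_assoc(rmid, closid);
	}
}

int main(void)
{
	struct pqr_state cpu0 = { .default_rmid = 0, .default_closid = 0 };
	struct task a = { .closid = 2, .rmid = 5 };
	struct task b = { .closid = 2, .rmid = 5 };
	struct task c = { 0, 0 };

	sched_in(&cpu0, &a);	/* writes the MSR */
	sched_in(&cpu0, &b);	/* same pair: no write */
	sched_in(&cpu0, &c);	/* back to CPU defaults: one write */
	printf("MSR writes: %d\n", msr_writes);
	return 0;
}

The point of the cache is that a context switch between tasks in the same resource group costs nothing beyond the two compares.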
+1 -1
arch/x86/kernel/cpu/Makefile
··· 33 obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o 34 obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o 35 36 - obj-$(CONFIG_INTEL_RDT_A) += intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_schemata.o 37 38 obj-$(CONFIG_X86_MCE) += mcheck/ 39 obj-$(CONFIG_MTRR) += mtrr/
··· 33 obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o 34 obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o 35 36 + obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o intel_rdt_ctrlmondata.o 37 38 obj-$(CONFIG_X86_MCE) += mcheck/ 39 obj-$(CONFIG_MTRR) += mtrr/
+314 -61
arch/x86/kernel/cpu/intel_rdt.c
··· 30 #include <linux/cpuhotplug.h> 31 32 #include <asm/intel-family.h> 33 - #include <asm/intel_rdt.h> 34 35 #define MAX_MBA_BW 100u 36 #define MBA_IS_LINEAR 0x4 ··· 39 /* Mutex to protect rdtgroup access. */ 40 DEFINE_MUTEX(rdtgroup_mutex); 41 42 - DEFINE_PER_CPU_READ_MOSTLY(int, cpu_closid); 43 44 /* 45 * Used to store the max resource name width and max resource data width 46 * to display the schemata in a tabular format 47 */ 48 int max_name_width, max_data_width; 49 50 static void 51 mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r); ··· 67 #define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains) 68 69 struct rdt_resource rdt_resources_all[] = { 70 { 71 .name = "L3", 72 .domains = domain_init(RDT_RESOURCE_L3), 73 .msr_base = IA32_L3_CBM_BASE, ··· 82 }, 83 .parse_ctrlval = parse_cbm, 84 .format_str = "%d=%0*x", 85 }, 86 { 87 .name = "L3DATA", 88 .domains = domain_init(RDT_RESOURCE_L3DATA), 89 .msr_base = IA32_L3_CBM_BASE, ··· 99 }, 100 .parse_ctrlval = parse_cbm, 101 .format_str = "%d=%0*x", 102 }, 103 { 104 .name = "L3CODE", 105 .domains = domain_init(RDT_RESOURCE_L3CODE), 106 .msr_base = IA32_L3_CBM_BASE, ··· 116 }, 117 .parse_ctrlval = parse_cbm, 118 .format_str = "%d=%0*x", 119 }, 120 { 121 .name = "L2", 122 .domains = domain_init(RDT_RESOURCE_L2), 123 .msr_base = IA32_L2_CBM_BASE, ··· 133 }, 134 .parse_ctrlval = parse_cbm, 135 .format_str = "%d=%0*x", 136 }, 137 { 138 .name = "MB", 139 .domains = domain_init(RDT_RESOURCE_MBA), 140 .msr_base = IA32_MBA_THRTL_BASE, ··· 145 .cache_level = 3, 146 .parse_ctrlval = parse_bw, 147 .format_str = "%d=%*d", 148 }, 149 }; 150 ··· 172 * is always 20 on hsw server parts. The minimum cache bitmask length 173 * allowed for HSW server is always 2 bits. Hardcode all of them. 174 */ 175 - static inline bool cache_alloc_hsw_probe(void) 176 { 177 - if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && 178 - boot_cpu_data.x86 == 6 && 179 - boot_cpu_data.x86_model == INTEL_FAM6_HASWELL_X) { 180 - struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3]; 181 - u32 l, h, max_cbm = BIT_MASK(20) - 1; 182 183 - if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0)) 184 - return false; 185 - rdmsr(IA32_L3_CBM_BASE, l, h); 186 187 - /* If all the bits were set in MSR, return success */ 188 - if (l != max_cbm) 189 - return false; 190 191 - r->num_closid = 4; 192 - r->default_ctrl = max_cbm; 193 - r->cache.cbm_len = 20; 194 - r->cache.min_cbm_bits = 2; 195 - r->capable = true; 196 - r->enabled = true; 197 198 - return true; 199 - } 200 - 201 - return false; 202 } 203 204 /* ··· 236 return false; 237 } 238 r->data_width = 3; 239 - rdt_get_mba_infofile(r); 240 241 - r->capable = true; 242 - r->enabled = true; 243 244 return true; 245 } 246 247 - static void rdt_get_cache_config(int idx, struct rdt_resource *r) 248 { 249 union cpuid_0x10_1_eax eax; 250 union cpuid_0x10_x_edx edx; ··· 253 r->num_closid = edx.split.cos_max + 1; 254 r->cache.cbm_len = eax.split.cbm_len + 1; 255 r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1; 256 r->data_width = (r->cache.cbm_len + 3) / 4; 257 - rdt_get_cache_infofile(r); 258 - r->capable = true; 259 - r->enabled = true; 260 } 261 262 static void rdt_get_cdp_l3_config(int type) ··· 268 r->cache.cbm_len = r_l3->cache.cbm_len; 269 r->default_ctrl = r_l3->default_ctrl; 270 r->data_width = (r->cache.cbm_len + 3) / 4; 271 - r->capable = true; 272 /* 273 * By default, CDP is disabled. CDP can be enabled by mount parameter 274 * "cdp" during resctrl file system mount time. 
275 */ 276 - r->enabled = false; 277 } 278 279 static int get_cache_id(int cpu, int level) ··· 322 wrmsrl(r->msr_base + cbm_idx(r, i), d->ctrl_val[i]); 323 } 324 325 void rdt_ctrl_update(void *arg) 326 { 327 struct msr_param *m = arg; ··· 342 int cpu = smp_processor_id(); 343 struct rdt_domain *d; 344 345 - list_for_each_entry(d, &r->domains, list) { 346 - /* Find the domain that contains this CPU */ 347 - if (cpumask_test_cpu(cpu, &d->cpu_mask)) { 348 - r->msr_update(d, m, r); 349 - return; 350 - } 351 } 352 pr_warn_once("cpu %d not found in any domain for resource %s\n", 353 cpu, r->name); ··· 359 * caller, return the first domain whose id is bigger than the input id. 360 * The domain list is sorted by id in ascending order. 361 */ 362 - static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, 363 - struct list_head **pos) 364 { 365 struct rdt_domain *d; 366 struct list_head *l; ··· 410 return 0; 411 } 412 413 /* 414 * domain_add_cpu - Add a cpu to a resource's domain list. 415 * ··· 483 return; 484 485 d->id = id; 486 487 - if (domain_setup_ctrlval(r, d)) { 488 kfree(d); 489 return; 490 } 491 492 - cpumask_set_cpu(cpu, &d->cpu_mask); 493 list_add_tail(&d->list, add_pos); 494 } 495 496 static void domain_remove_cpu(int cpu, struct rdt_resource *r) ··· 518 519 cpumask_clear_cpu(cpu, &d->cpu_mask); 520 if (cpumask_empty(&d->cpu_mask)) { 521 kfree(d->ctrl_val); 522 list_del(&d->list); 523 kfree(d); 524 } 525 } 526 527 - static void clear_closid(int cpu) 528 { 529 struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); 530 531 - per_cpu(cpu_closid, cpu) = 0; 532 - state->closid = 0; 533 - wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0); 534 } 535 536 static int intel_rdt_online_cpu(unsigned int cpu) ··· 581 domain_add_cpu(cpu, r); 582 /* The cpu is set in default rdtgroup after online. 
*/ 583 cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask); 584 - clear_closid(cpu); 585 mutex_unlock(&rdtgroup_mutex); 586 587 return 0; 588 } 589 590 static int intel_rdt_offline_cpu(unsigned int cpu) ··· 607 for_each_capable_rdt_resource(r) 608 domain_remove_cpu(cpu, r); 609 list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) { 610 - if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) 611 break; 612 } 613 - clear_closid(cpu); 614 mutex_unlock(&rdtgroup_mutex); 615 616 return 0; ··· 627 struct rdt_resource *r; 628 int cl; 629 630 - for_each_capable_rdt_resource(r) { 631 cl = strlen(r->name); 632 if (cl > max_name_width) 633 max_name_width = cl; ··· 637 } 638 } 639 640 - static __init bool get_rdt_resources(void) 641 { 642 bool ret = false; 643 644 - if (cache_alloc_hsw_probe()) 645 return true; 646 647 if (!boot_cpu_has(X86_FEATURE_RDT_A)) 648 return false; 649 650 - if (boot_cpu_has(X86_FEATURE_CAT_L3)) { 651 - rdt_get_cache_config(1, &rdt_resources_all[RDT_RESOURCE_L3]); 652 - if (boot_cpu_has(X86_FEATURE_CDP_L3)) { 653 rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA); 654 rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE); 655 } 656 ret = true; 657 } 658 - if (boot_cpu_has(X86_FEATURE_CAT_L2)) { 659 /* CPUID 0x10.2 fields are same format at 0x10.1 */ 660 - rdt_get_cache_config(2, &rdt_resources_all[RDT_RESOURCE_L2]); 661 ret = true; 662 } 663 664 - if (boot_cpu_has(X86_FEATURE_MBA)) { 665 if (rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA])) 666 ret = true; 667 } 668 - 669 return ret; 670 } 671 672 static int __init intel_rdt_late_init(void) ··· 806 return ret; 807 } 808 809 - for_each_capable_rdt_resource(r) 810 pr_info("Intel RDT %s allocation detected\n", r->name); 811 812 return 0; 813 }
··· 30 #include <linux/cpuhotplug.h> 31 32 #include <asm/intel-family.h> 33 + #include <asm/intel_rdt_sched.h> 34 + #include "intel_rdt.h" 35 36 #define MAX_MBA_BW 100u 37 #define MBA_IS_LINEAR 0x4 ··· 38 /* Mutex to protect rdtgroup access. */ 39 DEFINE_MUTEX(rdtgroup_mutex); 40 41 + /* 42 + * The cached intel_pqr_state is strictly per CPU and can never be 43 + * updated from a remote CPU. Functions which modify the state 44 + * are called with interrupts disabled and no preemption, which 45 + * is sufficient for the protection. 46 + */ 47 + DEFINE_PER_CPU(struct intel_pqr_state, pqr_state); 48 49 /* 50 * Used to store the max resource name width and max resource data width 51 * to display the schemata in a tabular format 52 */ 53 int max_name_width, max_data_width; 54 + 55 + /* 56 + * Global boolean for rdt_alloc which is true if any 57 + * resource allocation is enabled. 58 + */ 59 + bool rdt_alloc_capable; 60 61 static void 62 mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r); ··· 54 #define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains) 55 56 struct rdt_resource rdt_resources_all[] = { 57 + [RDT_RESOURCE_L3] = 58 { 59 + .rid = RDT_RESOURCE_L3, 60 .name = "L3", 61 .domains = domain_init(RDT_RESOURCE_L3), 62 .msr_base = IA32_L3_CBM_BASE, ··· 67 }, 68 .parse_ctrlval = parse_cbm, 69 .format_str = "%d=%0*x", 70 + .fflags = RFTYPE_RES_CACHE, 71 }, 72 + [RDT_RESOURCE_L3DATA] = 73 { 74 + .rid = RDT_RESOURCE_L3DATA, 75 .name = "L3DATA", 76 .domains = domain_init(RDT_RESOURCE_L3DATA), 77 .msr_base = IA32_L3_CBM_BASE, ··· 81 }, 82 .parse_ctrlval = parse_cbm, 83 .format_str = "%d=%0*x", 84 + .fflags = RFTYPE_RES_CACHE, 85 }, 86 + [RDT_RESOURCE_L3CODE] = 87 { 88 + .rid = RDT_RESOURCE_L3CODE, 89 .name = "L3CODE", 90 .domains = domain_init(RDT_RESOURCE_L3CODE), 91 .msr_base = IA32_L3_CBM_BASE, ··· 95 }, 96 .parse_ctrlval = parse_cbm, 97 .format_str = "%d=%0*x", 98 + .fflags = RFTYPE_RES_CACHE, 99 }, 100 + [RDT_RESOURCE_L2] = 101 { 102 + .rid = RDT_RESOURCE_L2, 103 .name = "L2", 104 .domains = domain_init(RDT_RESOURCE_L2), 105 .msr_base = IA32_L2_CBM_BASE, ··· 109 }, 110 .parse_ctrlval = parse_cbm, 111 .format_str = "%d=%0*x", 112 + .fflags = RFTYPE_RES_CACHE, 113 }, 114 + [RDT_RESOURCE_MBA] = 115 { 116 + .rid = RDT_RESOURCE_MBA, 117 .name = "MB", 118 .domains = domain_init(RDT_RESOURCE_MBA), 119 .msr_base = IA32_MBA_THRTL_BASE, ··· 118 .cache_level = 3, 119 .parse_ctrlval = parse_bw, 120 .format_str = "%d=%*d", 121 + .fflags = RFTYPE_RES_MB, 122 }, 123 }; 124 ··· 144 * is always 20 on hsw server parts. The minimum cache bitmask length 145 * allowed for HSW server is always 2 bits. Hardcode all of them. 
146 */ 147 + static inline void cache_alloc_hsw_probe(void) 148 { 149 + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3]; 150 + u32 l, h, max_cbm = BIT_MASK(20) - 1; 151 152 + if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0)) 153 + return; 154 + rdmsr(IA32_L3_CBM_BASE, l, h); 155 156 + /* If all the bits were set in MSR, return success */ 157 + if (l != max_cbm) 158 + return; 159 160 + r->num_closid = 4; 161 + r->default_ctrl = max_cbm; 162 + r->cache.cbm_len = 20; 163 + r->cache.shareable_bits = 0xc0000; 164 + r->cache.min_cbm_bits = 2; 165 + r->alloc_capable = true; 166 + r->alloc_enabled = true; 167 168 + rdt_alloc_capable = true; 169 } 170 171 /* ··· 213 return false; 214 } 215 r->data_width = 3; 216 217 + r->alloc_capable = true; 218 + r->alloc_enabled = true; 219 220 return true; 221 } 222 223 + static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r) 224 { 225 union cpuid_0x10_1_eax eax; 226 union cpuid_0x10_x_edx edx; ··· 231 r->num_closid = edx.split.cos_max + 1; 232 r->cache.cbm_len = eax.split.cbm_len + 1; 233 r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1; 234 + r->cache.shareable_bits = ebx & r->default_ctrl; 235 r->data_width = (r->cache.cbm_len + 3) / 4; 236 + r->alloc_capable = true; 237 + r->alloc_enabled = true; 238 } 239 240 static void rdt_get_cdp_l3_config(int type) ··· 246 r->cache.cbm_len = r_l3->cache.cbm_len; 247 r->default_ctrl = r_l3->default_ctrl; 248 r->data_width = (r->cache.cbm_len + 3) / 4; 249 + r->alloc_capable = true; 250 /* 251 * By default, CDP is disabled. CDP can be enabled by mount parameter 252 * "cdp" during resctrl file system mount time. 253 */ 254 + r->alloc_enabled = false; 255 } 256 257 static int get_cache_id(int cpu, int level) ··· 300 wrmsrl(r->msr_base + cbm_idx(r, i), d->ctrl_val[i]); 301 } 302 303 + struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r) 304 + { 305 + struct rdt_domain *d; 306 + 307 + list_for_each_entry(d, &r->domains, list) { 308 + /* Find the domain that contains this CPU */ 309 + if (cpumask_test_cpu(cpu, &d->cpu_mask)) 310 + return d; 311 + } 312 + 313 + return NULL; 314 + } 315 + 316 void rdt_ctrl_update(void *arg) 317 { 318 struct msr_param *m = arg; ··· 307 int cpu = smp_processor_id(); 308 struct rdt_domain *d; 309 310 + d = get_domain_from_cpu(cpu, r); 311 + if (d) { 312 + r->msr_update(d, m, r); 313 + return; 314 } 315 pr_warn_once("cpu %d not found in any domain for resource %s\n", 316 cpu, r->name); ··· 326 * caller, return the first domain whose id is bigger than the input id. 327 * The domain list is sorted by id in ascending order. 
328 */ 329 + struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, 330 + struct list_head **pos) 331 { 332 struct rdt_domain *d; 333 struct list_head *l; ··· 377 return 0; 378 } 379 380 + static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d) 381 + { 382 + size_t tsize; 383 + 384 + if (is_llc_occupancy_enabled()) { 385 + d->rmid_busy_llc = kcalloc(BITS_TO_LONGS(r->num_rmid), 386 + sizeof(unsigned long), 387 + GFP_KERNEL); 388 + if (!d->rmid_busy_llc) 389 + return -ENOMEM; 390 + INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo); 391 + } 392 + if (is_mbm_total_enabled()) { 393 + tsize = sizeof(*d->mbm_total); 394 + d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL); 395 + if (!d->mbm_total) { 396 + kfree(d->rmid_busy_llc); 397 + return -ENOMEM; 398 + } 399 + } 400 + if (is_mbm_local_enabled()) { 401 + tsize = sizeof(*d->mbm_local); 402 + d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL); 403 + if (!d->mbm_local) { 404 + kfree(d->rmid_busy_llc); 405 + kfree(d->mbm_total); 406 + return -ENOMEM; 407 + } 408 + } 409 + 410 + if (is_mbm_enabled()) { 411 + INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow); 412 + mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL); 413 + } 414 + 415 + return 0; 416 + } 417 + 418 /* 419 * domain_add_cpu - Add a cpu to a resource's domain list. 420 * ··· 412 return; 413 414 d->id = id; 415 + cpumask_set_cpu(cpu, &d->cpu_mask); 416 417 + if (r->alloc_capable && domain_setup_ctrlval(r, d)) { 418 kfree(d); 419 return; 420 } 421 422 + if (r->mon_capable && domain_setup_mon_state(r, d)) { 423 + kfree(d); 424 + return; 425 + } 426 + 427 list_add_tail(&d->list, add_pos); 428 + 429 + /* 430 + * If resctrl is mounted, add 431 + * per domain monitor data directories. 432 + */ 433 + if (static_branch_unlikely(&rdt_mon_enable_key)) 434 + mkdir_mondata_subdir_allrdtgrp(r, d); 435 } 436 437 static void domain_remove_cpu(int cpu, struct rdt_resource *r) ··· 435 436 cpumask_clear_cpu(cpu, &d->cpu_mask); 437 if (cpumask_empty(&d->cpu_mask)) { 438 + /* 439 + * If resctrl is mounted, remove all the 440 + * per domain monitor data directories. 441 + */ 442 + if (static_branch_unlikely(&rdt_mon_enable_key)) 443 + rmdir_mondata_subdir_allrdtgrp(r, d->id); 444 kfree(d->ctrl_val); 445 + kfree(d->rmid_busy_llc); 446 + kfree(d->mbm_total); 447 + kfree(d->mbm_local); 448 list_del(&d->list); 449 + if (is_mbm_enabled()) 450 + cancel_delayed_work(&d->mbm_over); 451 + if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) { 452 + /* 453 + * When a package is going down, forcefully 454 + * decrement rmid->ebusy. There is no way to know 455 + * that the L3 was flushed and hence may lead to 456 + * incorrect counts in rare scenarios, but leaving 457 + * the RMID as busy creates RMID leaks if the 458 + * package never comes back. 
459 + */ 460 + __check_limbo(d, true); 461 + cancel_delayed_work(&d->cqm_limbo); 462 + } 463 + 464 kfree(d); 465 + return; 466 + } 467 + 468 + if (r == &rdt_resources_all[RDT_RESOURCE_L3]) { 469 + if (is_mbm_enabled() && cpu == d->mbm_work_cpu) { 470 + cancel_delayed_work(&d->mbm_over); 471 + mbm_setup_overflow_handler(d, 0); 472 + } 473 + if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu && 474 + has_busy_rmid(r, d)) { 475 + cancel_delayed_work(&d->cqm_limbo); 476 + cqm_setup_limbo_handler(d, 0); 477 + } 478 } 479 } 480 481 + static void clear_closid_rmid(int cpu) 482 { 483 struct intel_pqr_state *state = this_cpu_ptr(&pqr_state); 484 485 + state->default_closid = 0; 486 + state->default_rmid = 0; 487 + state->cur_closid = 0; 488 + state->cur_rmid = 0; 489 + wrmsr(IA32_PQR_ASSOC, 0, 0); 490 } 491 492 static int intel_rdt_online_cpu(unsigned int cpu) ··· 459 domain_add_cpu(cpu, r); 460 /* The cpu is set in default rdtgroup after online. */ 461 cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask); 462 + clear_closid_rmid(cpu); 463 mutex_unlock(&rdtgroup_mutex); 464 465 return 0; 466 + } 467 + 468 + static void clear_childcpus(struct rdtgroup *r, unsigned int cpu) 469 + { 470 + struct rdtgroup *cr; 471 + 472 + list_for_each_entry(cr, &r->mon.crdtgrp_list, mon.crdtgrp_list) { 473 + if (cpumask_test_and_clear_cpu(cpu, &cr->cpu_mask)) { 474 + break; 475 + } 476 + } 477 } 478 479 static int intel_rdt_offline_cpu(unsigned int cpu) ··· 474 for_each_capable_rdt_resource(r) 475 domain_remove_cpu(cpu, r); 476 list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) { 477 + if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) { 478 + clear_childcpus(rdtgrp, cpu); 479 break; 480 + } 481 } 482 + clear_closid_rmid(cpu); 483 mutex_unlock(&rdtgroup_mutex); 484 485 return 0; ··· 492 struct rdt_resource *r; 493 int cl; 494 495 + for_each_alloc_capable_rdt_resource(r) { 496 cl = strlen(r->name); 497 if (cl > max_name_width) 498 max_name_width = cl; ··· 502 } 503 } 504 505 + enum { 506 + RDT_FLAG_CMT, 507 + RDT_FLAG_MBM_TOTAL, 508 + RDT_FLAG_MBM_LOCAL, 509 + RDT_FLAG_L3_CAT, 510 + RDT_FLAG_L3_CDP, 511 + RDT_FLAG_L2_CAT, 512 + RDT_FLAG_MBA, 513 + }; 514 + 515 + #define RDT_OPT(idx, n, f) \ 516 + [idx] = { \ 517 + .name = n, \ 518 + .flag = f \ 519 + } 520 + 521 + struct rdt_options { 522 + char *name; 523 + int flag; 524 + bool force_off, force_on; 525 + }; 526 + 527 + static struct rdt_options rdt_options[] __initdata = { 528 + RDT_OPT(RDT_FLAG_CMT, "cmt", X86_FEATURE_CQM_OCCUP_LLC), 529 + RDT_OPT(RDT_FLAG_MBM_TOTAL, "mbmtotal", X86_FEATURE_CQM_MBM_TOTAL), 530 + RDT_OPT(RDT_FLAG_MBM_LOCAL, "mbmlocal", X86_FEATURE_CQM_MBM_LOCAL), 531 + RDT_OPT(RDT_FLAG_L3_CAT, "l3cat", X86_FEATURE_CAT_L3), 532 + RDT_OPT(RDT_FLAG_L3_CDP, "l3cdp", X86_FEATURE_CDP_L3), 533 + RDT_OPT(RDT_FLAG_L2_CAT, "l2cat", X86_FEATURE_CAT_L2), 534 + RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA), 535 + }; 536 + #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options) 537 + 538 + static int __init set_rdt_options(char *str) 539 + { 540 + struct rdt_options *o; 541 + bool force_off; 542 + char *tok; 543 + 544 + if (*str == '=') 545 + str++; 546 + while ((tok = strsep(&str, ",")) != NULL) { 547 + force_off = *tok == '!'; 548 + if (force_off) 549 + tok++; 550 + for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) { 551 + if (strcmp(tok, o->name) == 0) { 552 + if (force_off) 553 + o->force_off = true; 554 + else 555 + o->force_on = true; 556 + break; 557 + } 558 + } 559 + } 560 + return 1; 561 + } 562 + __setup("rdt", set_rdt_options); 563 + 
564 + static bool __init rdt_cpu_has(int flag) 565 + { 566 + bool ret = boot_cpu_has(flag); 567 + struct rdt_options *o; 568 + 569 + if (!ret) 570 + return ret; 571 + 572 + for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) { 573 + if (flag == o->flag) { 574 + if (o->force_off) 575 + ret = false; 576 + if (o->force_on) 577 + ret = true; 578 + break; 579 + } 580 + } 581 + return ret; 582 + } 583 + 584 + static __init bool get_rdt_alloc_resources(void) 585 { 586 bool ret = false; 587 588 + if (rdt_alloc_capable) 589 return true; 590 591 if (!boot_cpu_has(X86_FEATURE_RDT_A)) 592 return false; 593 594 + if (rdt_cpu_has(X86_FEATURE_CAT_L3)) { 595 + rdt_get_cache_alloc_cfg(1, &rdt_resources_all[RDT_RESOURCE_L3]); 596 + if (rdt_cpu_has(X86_FEATURE_CDP_L3)) { 597 rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA); 598 rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE); 599 } 600 ret = true; 601 } 602 + if (rdt_cpu_has(X86_FEATURE_CAT_L2)) { 603 /* CPUID 0x10.2 fields are same format at 0x10.1 */ 604 + rdt_get_cache_alloc_cfg(2, &rdt_resources_all[RDT_RESOURCE_L2]); 605 ret = true; 606 } 607 608 + if (rdt_cpu_has(X86_FEATURE_MBA)) { 609 if (rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA])) 610 ret = true; 611 } 612 return ret; 613 + } 614 + 615 + static __init bool get_rdt_mon_resources(void) 616 + { 617 + if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) 618 + rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID); 619 + if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) 620 + rdt_mon_features |= (1 << QOS_L3_MBM_TOTAL_EVENT_ID); 621 + if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) 622 + rdt_mon_features |= (1 << QOS_L3_MBM_LOCAL_EVENT_ID); 623 + 624 + if (!rdt_mon_features) 625 + return false; 626 + 627 + return !rdt_get_mon_l3_config(&rdt_resources_all[RDT_RESOURCE_L3]); 628 + } 629 + 630 + static __init void rdt_quirks(void) 631 + { 632 + switch (boot_cpu_data.x86_model) { 633 + case INTEL_FAM6_HASWELL_X: 634 + if (!rdt_options[RDT_FLAG_L3_CAT].force_off) 635 + cache_alloc_hsw_probe(); 636 + break; 637 + case INTEL_FAM6_SKYLAKE_X: 638 + if (boot_cpu_data.x86_mask <= 4) 639 + set_rdt_options("!cmt,!mbmtotal,!mbmlocal,!l3cat"); 640 + } 641 + } 642 + 643 + static __init bool get_rdt_resources(void) 644 + { 645 + rdt_quirks(); 646 + rdt_alloc_capable = get_rdt_alloc_resources(); 647 + rdt_mon_capable = get_rdt_mon_resources(); 648 + 649 + return (rdt_mon_capable || rdt_alloc_capable); 650 } 651 652 static int __init intel_rdt_late_init(void) ··· 556 return ret; 557 } 558 559 + for_each_alloc_capable_rdt_resource(r) 560 pr_info("Intel RDT %s allocation detected\n", r->name); 561 + 562 + for_each_mon_capable_rdt_resource(r) 563 + pr_info("Intel RDT %s monitoring detected\n", r->name); 564 565 return 0; 566 }
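Editor's note: the new "rdt=" boot option is handled by set_rdt_options() above, which walks a comma-separated list, treats a leading '!' as force-off, and records per-keyword overrides that rdt_cpu_has() later applies on top of CPUID. The sketch below reproduces only that parsing in user space; the option table mirrors rdt_options[], and everything else (main, printing) is illustrative.

/*
 * Sketch of the rdt= option parsing: split a list such as "cmt,!mba",
 * with '!' meaning force-off, and record force_on/force_off per keyword.
 */
#define _DEFAULT_SOURCE		/* for strsep() on glibc */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct rdt_opt {
	const char *name;
	bool force_off, force_on;
};

static struct rdt_opt opts[] = {
	{ "cmt" }, { "mbmtotal" }, { "mbmlocal" },
	{ "l3cat" }, { "l3cdp" }, { "l2cat" }, { "mba" },
};

static void set_rdt_options(char *str)
{
	char *tok;

	if (*str == '=')
		str++;
	while ((tok = strsep(&str, ",")) != NULL) {
		bool off = (*tok == '!');
		size_t i;

		if (off)
			tok++;
		for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
			if (!strcmp(tok, opts[i].name)) {
				if (off)
					opts[i].force_off = true;
				else
					opts[i].force_on = true;
				break;
			}
		}
	}
}

int main(void)
{
	char cmdline[] = "cmt,!mba";	/* turn cmt on, mba off */
	size_t i;

	set_rdt_options(cmdline);
	for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++)
		printf("%-8s on=%d off=%d\n", opts[i].name,
		       opts[i].force_on, opts[i].force_off);
	return 0;
}

The same override table is what rdt_quirks() uses to switch off broken monitoring/allocation features on early Skylake-X steppings.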
+440
arch/x86/kernel/cpu/intel_rdt.h
···
··· 1 + #ifndef _ASM_X86_INTEL_RDT_H 2 + #define _ASM_X86_INTEL_RDT_H 3 + 4 + #include <linux/sched.h> 5 + #include <linux/kernfs.h> 6 + #include <linux/jump_label.h> 7 + 8 + #define IA32_L3_QOS_CFG 0xc81 9 + #define IA32_L3_CBM_BASE 0xc90 10 + #define IA32_L2_CBM_BASE 0xd10 11 + #define IA32_MBA_THRTL_BASE 0xd50 12 + 13 + #define L3_QOS_CDP_ENABLE 0x01ULL 14 + 15 + /* 16 + * Event IDs are used to program IA32_QM_EVTSEL before reading event 17 + * counter from IA32_QM_CTR 18 + */ 19 + #define QOS_L3_OCCUP_EVENT_ID 0x01 20 + #define QOS_L3_MBM_TOTAL_EVENT_ID 0x02 21 + #define QOS_L3_MBM_LOCAL_EVENT_ID 0x03 22 + 23 + #define CQM_LIMBOCHECK_INTERVAL 1000 24 + 25 + #define MBM_CNTR_WIDTH 24 26 + #define MBM_OVERFLOW_INTERVAL 1000 27 + 28 + #define RMID_VAL_ERROR BIT_ULL(63) 29 + #define RMID_VAL_UNAVAIL BIT_ULL(62) 30 + 31 + DECLARE_STATIC_KEY_FALSE(rdt_enable_key); 32 + 33 + /** 34 + * struct mon_evt - Entry in the event list of a resource 35 + * @evtid: event id 36 + * @name: name of the event 37 + */ 38 + struct mon_evt { 39 + u32 evtid; 40 + char *name; 41 + struct list_head list; 42 + }; 43 + 44 + /** 45 + * struct mon_data_bits - Monitoring details for each event file 46 + * @rid: Resource id associated with the event file. 47 + * @evtid: Event id associated with the event file 48 + * @domid: The domain to which the event file belongs 49 + */ 50 + union mon_data_bits { 51 + void *priv; 52 + struct { 53 + unsigned int rid : 10; 54 + unsigned int evtid : 8; 55 + unsigned int domid : 14; 56 + } u; 57 + }; 58 + 59 + struct rmid_read { 60 + struct rdtgroup *rgrp; 61 + struct rdt_domain *d; 62 + int evtid; 63 + bool first; 64 + u64 val; 65 + }; 66 + 67 + extern unsigned int intel_cqm_threshold; 68 + extern bool rdt_alloc_capable; 69 + extern bool rdt_mon_capable; 70 + extern unsigned int rdt_mon_features; 71 + 72 + enum rdt_group_type { 73 + RDTCTRL_GROUP = 0, 74 + RDTMON_GROUP, 75 + RDT_NUM_GROUP, 76 + }; 77 + 78 + /** 79 + * struct mongroup - store mon group's data in resctrl fs. 80 + * @mon_data_kn kernlfs node for the mon_data directory 81 + * @parent: parent rdtgrp 82 + * @crdtgrp_list: child rdtgroup node list 83 + * @rmid: rmid for this rdtgroup 84 + */ 85 + struct mongroup { 86 + struct kernfs_node *mon_data_kn; 87 + struct rdtgroup *parent; 88 + struct list_head crdtgrp_list; 89 + u32 rmid; 90 + }; 91 + 92 + /** 93 + * struct rdtgroup - store rdtgroup's data in resctrl file system. 94 + * @kn: kernfs node 95 + * @rdtgroup_list: linked list for all rdtgroups 96 + * @closid: closid for this rdtgroup 97 + * @cpu_mask: CPUs assigned to this rdtgroup 98 + * @flags: status bits 99 + * @waitcount: how many cpus expect to find this 100 + * group when they acquire rdtgroup_mutex 101 + * @type: indicates type of this rdtgroup - either 102 + * monitor only or ctrl_mon group 103 + * @mon: mongroup related data 104 + */ 105 + struct rdtgroup { 106 + struct kernfs_node *kn; 107 + struct list_head rdtgroup_list; 108 + u32 closid; 109 + struct cpumask cpu_mask; 110 + int flags; 111 + atomic_t waitcount; 112 + enum rdt_group_type type; 113 + struct mongroup mon; 114 + }; 115 + 116 + /* rdtgroup.flags */ 117 + #define RDT_DELETED 1 118 + 119 + /* rftype.flags */ 120 + #define RFTYPE_FLAGS_CPUS_LIST 1 121 + 122 + /* 123 + * Define the file type flags for base and info directories. 
124 + */ 125 + #define RFTYPE_INFO BIT(0) 126 + #define RFTYPE_BASE BIT(1) 127 + #define RF_CTRLSHIFT 4 128 + #define RF_MONSHIFT 5 129 + #define RFTYPE_CTRL BIT(RF_CTRLSHIFT) 130 + #define RFTYPE_MON BIT(RF_MONSHIFT) 131 + #define RFTYPE_RES_CACHE BIT(8) 132 + #define RFTYPE_RES_MB BIT(9) 133 + #define RF_CTRL_INFO (RFTYPE_INFO | RFTYPE_CTRL) 134 + #define RF_MON_INFO (RFTYPE_INFO | RFTYPE_MON) 135 + #define RF_CTRL_BASE (RFTYPE_BASE | RFTYPE_CTRL) 136 + 137 + /* List of all resource groups */ 138 + extern struct list_head rdt_all_groups; 139 + 140 + extern int max_name_width, max_data_width; 141 + 142 + int __init rdtgroup_init(void); 143 + 144 + /** 145 + * struct rftype - describe each file in the resctrl file system 146 + * @name: File name 147 + * @mode: Access mode 148 + * @kf_ops: File operations 149 + * @flags: File specific RFTYPE_FLAGS_* flags 150 + * @fflags: File specific RF_* or RFTYPE_* flags 151 + * @seq_show: Show content of the file 152 + * @write: Write to the file 153 + */ 154 + struct rftype { 155 + char *name; 156 + umode_t mode; 157 + struct kernfs_ops *kf_ops; 158 + unsigned long flags; 159 + unsigned long fflags; 160 + 161 + int (*seq_show)(struct kernfs_open_file *of, 162 + struct seq_file *sf, void *v); 163 + /* 164 + * write() is the generic write callback which maps directly to 165 + * kernfs write operation and overrides all other operations. 166 + * Maximum write size is determined by ->max_write_len. 167 + */ 168 + ssize_t (*write)(struct kernfs_open_file *of, 169 + char *buf, size_t nbytes, loff_t off); 170 + }; 171 + 172 + /** 173 + * struct mbm_state - status for each MBM counter in each domain 174 + * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes) 175 + * @prev_msr Value of IA32_QM_CTR for this RMID last time we read it 176 + */ 177 + struct mbm_state { 178 + u64 chunks; 179 + u64 prev_msr; 180 + }; 181 + 182 + /** 183 + * struct rdt_domain - group of cpus sharing an RDT resource 184 + * @list: all instances of this resource 185 + * @id: unique id for this instance 186 + * @cpu_mask: which cpus share this resource 187 + * @rmid_busy_llc: 188 + * bitmap of which limbo RMIDs are above threshold 189 + * @mbm_total: saved state for MBM total bandwidth 190 + * @mbm_local: saved state for MBM local bandwidth 191 + * @mbm_over: worker to periodically read MBM h/w counters 192 + * @cqm_limbo: worker to periodically read CQM h/w counters 193 + * @mbm_work_cpu: 194 + * worker cpu for MBM h/w counters 195 + * @cqm_work_cpu: 196 + * worker cpu for CQM h/w counters 197 + * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID) 198 + * @new_ctrl: new ctrl value to be loaded 199 + * @have_new_ctrl: did user provide new_ctrl for this domain 200 + */ 201 + struct rdt_domain { 202 + struct list_head list; 203 + int id; 204 + struct cpumask cpu_mask; 205 + unsigned long *rmid_busy_llc; 206 + struct mbm_state *mbm_total; 207 + struct mbm_state *mbm_local; 208 + struct delayed_work mbm_over; 209 + struct delayed_work cqm_limbo; 210 + int mbm_work_cpu; 211 + int cqm_work_cpu; 212 + u32 *ctrl_val; 213 + u32 new_ctrl; 214 + bool have_new_ctrl; 215 + }; 216 + 217 + /** 218 + * struct msr_param - set a range of MSRs from a domain 219 + * @res: The resource to use 220 + * @low: Beginning index from base MSR 221 + * @high: End index 222 + */ 223 + struct msr_param { 224 + struct rdt_resource *res; 225 + int low; 226 + int high; 227 + }; 228 + 229 + /** 230 + * struct rdt_cache - Cache allocation related data 231 + * @cbm_len: Length of the cache 
bit mask 232 + * @min_cbm_bits: Minimum number of consecutive bits to be set 233 + * @cbm_idx_mult: Multiplier of CBM index 234 + * @cbm_idx_offset: Offset of CBM index. CBM index is computed by: 235 + * closid * cbm_idx_multi + cbm_idx_offset 236 + * in a cache bit mask 237 + * @shareable_bits: Bitmask of shareable resource with other 238 + * executing entities 239 + */ 240 + struct rdt_cache { 241 + unsigned int cbm_len; 242 + unsigned int min_cbm_bits; 243 + unsigned int cbm_idx_mult; 244 + unsigned int cbm_idx_offset; 245 + unsigned int shareable_bits; 246 + }; 247 + 248 + /** 249 + * struct rdt_membw - Memory bandwidth allocation related data 250 + * @max_delay: Max throttle delay. Delay is the hardware 251 + * representation for memory bandwidth. 252 + * @min_bw: Minimum memory bandwidth percentage user can request 253 + * @bw_gran: Granularity at which the memory bandwidth is allocated 254 + * @delay_linear: True if memory B/W delay is in linear scale 255 + * @mb_map: Mapping of memory B/W percentage to memory B/W delay 256 + */ 257 + struct rdt_membw { 258 + u32 max_delay; 259 + u32 min_bw; 260 + u32 bw_gran; 261 + u32 delay_linear; 262 + u32 *mb_map; 263 + }; 264 + 265 + static inline bool is_llc_occupancy_enabled(void) 266 + { 267 + return (rdt_mon_features & (1 << QOS_L3_OCCUP_EVENT_ID)); 268 + } 269 + 270 + static inline bool is_mbm_total_enabled(void) 271 + { 272 + return (rdt_mon_features & (1 << QOS_L3_MBM_TOTAL_EVENT_ID)); 273 + } 274 + 275 + static inline bool is_mbm_local_enabled(void) 276 + { 277 + return (rdt_mon_features & (1 << QOS_L3_MBM_LOCAL_EVENT_ID)); 278 + } 279 + 280 + static inline bool is_mbm_enabled(void) 281 + { 282 + return (is_mbm_total_enabled() || is_mbm_local_enabled()); 283 + } 284 + 285 + static inline bool is_mbm_event(int e) 286 + { 287 + return (e >= QOS_L3_MBM_TOTAL_EVENT_ID && 288 + e <= QOS_L3_MBM_LOCAL_EVENT_ID); 289 + } 290 + 291 + /** 292 + * struct rdt_resource - attributes of an RDT resource 293 + * @rid: The index of the resource 294 + * @alloc_enabled: Is allocation enabled on this machine 295 + * @mon_enabled: Is monitoring enabled for this feature 296 + * @alloc_capable: Is allocation available on this machine 297 + * @mon_capable: Is monitor feature available on this machine 298 + * @name: Name to use in "schemata" file 299 + * @num_closid: Number of CLOSIDs available 300 + * @cache_level: Which cache level defines scope of this resource 301 + * @default_ctrl: Specifies default cache cbm or memory B/W percent. 
302 + * @msr_base: Base MSR address for CBMs 303 + * @msr_update: Function pointer to update QOS MSRs 304 + * @data_width: Character width of data when displaying 305 + * @domains: All domains for this resource 306 + * @cache: Cache allocation related data 307 + * @format_str: Per resource format string to show domain value 308 + * @parse_ctrlval: Per resource function pointer to parse control values 309 + * @evt_list: List of monitoring events 310 + * @num_rmid: Number of RMIDs available 311 + * @mon_scale: cqm counter * mon_scale = occupancy in bytes 312 + * @fflags: flags to choose base and info files 313 + */ 314 + struct rdt_resource { 315 + int rid; 316 + bool alloc_enabled; 317 + bool mon_enabled; 318 + bool alloc_capable; 319 + bool mon_capable; 320 + char *name; 321 + int num_closid; 322 + int cache_level; 323 + u32 default_ctrl; 324 + unsigned int msr_base; 325 + void (*msr_update) (struct rdt_domain *d, struct msr_param *m, 326 + struct rdt_resource *r); 327 + int data_width; 328 + struct list_head domains; 329 + struct rdt_cache cache; 330 + struct rdt_membw membw; 331 + const char *format_str; 332 + int (*parse_ctrlval) (char *buf, struct rdt_resource *r, 333 + struct rdt_domain *d); 334 + struct list_head evt_list; 335 + int num_rmid; 336 + unsigned int mon_scale; 337 + unsigned long fflags; 338 + }; 339 + 340 + int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d); 341 + int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d); 342 + 343 + extern struct mutex rdtgroup_mutex; 344 + 345 + extern struct rdt_resource rdt_resources_all[]; 346 + extern struct rdtgroup rdtgroup_default; 347 + DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key); 348 + 349 + int __init rdtgroup_init(void); 350 + 351 + enum { 352 + RDT_RESOURCE_L3, 353 + RDT_RESOURCE_L3DATA, 354 + RDT_RESOURCE_L3CODE, 355 + RDT_RESOURCE_L2, 356 + RDT_RESOURCE_MBA, 357 + 358 + /* Must be the last */ 359 + RDT_NUM_RESOURCES, 360 + }; 361 + 362 + #define for_each_capable_rdt_resource(r) \ 363 + for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\ 364 + r++) \ 365 + if (r->alloc_capable || r->mon_capable) 366 + 367 + #define for_each_alloc_capable_rdt_resource(r) \ 368 + for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\ 369 + r++) \ 370 + if (r->alloc_capable) 371 + 372 + #define for_each_mon_capable_rdt_resource(r) \ 373 + for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\ 374 + r++) \ 375 + if (r->mon_capable) 376 + 377 + #define for_each_alloc_enabled_rdt_resource(r) \ 378 + for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\ 379 + r++) \ 380 + if (r->alloc_enabled) 381 + 382 + #define for_each_mon_enabled_rdt_resource(r) \ 383 + for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\ 384 + r++) \ 385 + if (r->mon_enabled) 386 + 387 + /* CPUID.(EAX=10H, ECX=ResID=1).EAX */ 388 + union cpuid_0x10_1_eax { 389 + struct { 390 + unsigned int cbm_len:5; 391 + } split; 392 + unsigned int full; 393 + }; 394 + 395 + /* CPUID.(EAX=10H, ECX=ResID=3).EAX */ 396 + union cpuid_0x10_3_eax { 397 + struct { 398 + unsigned int max_delay:12; 399 + } split; 400 + unsigned int full; 401 + }; 402 + 403 + /* CPUID.(EAX=10H, ECX=ResID).EDX */ 404 + union cpuid_0x10_x_edx { 405 + struct { 406 + unsigned int cos_max:16; 407 + } split; 408 + unsigned int full; 409 + }; 410 + 411 + void rdt_ctrl_update(void *arg); 412 + struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn); 413 + void rdtgroup_kn_unlock(struct 
kernfs_node *kn); 414 + struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, 415 + struct list_head **pos); 416 + ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of, 417 + char *buf, size_t nbytes, loff_t off); 418 + int rdtgroup_schemata_show(struct kernfs_open_file *of, 419 + struct seq_file *s, void *v); 420 + struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r); 421 + int alloc_rmid(void); 422 + void free_rmid(u32 rmid); 423 + int rdt_get_mon_l3_config(struct rdt_resource *r); 424 + void mon_event_count(void *info); 425 + int rdtgroup_mondata_show(struct seq_file *m, void *arg); 426 + void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, 427 + unsigned int dom_id); 428 + void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, 429 + struct rdt_domain *d); 430 + void mon_event_read(struct rmid_read *rr, struct rdt_domain *d, 431 + struct rdtgroup *rdtgrp, int evtid, int first); 432 + void mbm_setup_overflow_handler(struct rdt_domain *dom, 433 + unsigned long delay_ms); 434 + void mbm_handle_overflow(struct work_struct *work); 435 + void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms); 436 + void cqm_handle_limbo(struct work_struct *work); 437 + bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d); 438 + void __check_limbo(struct rdt_domain *d, bool force_free); 439 + 440 + #endif /* _ASM_X86_INTEL_RDT_H */
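
The cpuid_0x10_* unions near the end of the new header mirror the RDT enumeration sub-leaves of CPUID leaf 0x10. As a rough illustration only (not part of the patch), the user-space sketch below decodes the same fields with the identical bitfield layout; it assumes the __get_cpuid_count() helper from GCC/clang's <cpuid.h> and prints only the raw field values - the kernel derives its final cbm_len/num_closid/max_delay numbers from these elsewhere.

/* Illustrative sketch only (not part of the patch): decode the same
 * CPUID.(EAX=10H) sub-leaf fields that the unions in intel_rdt.h describe.
 * Assumes the __get_cpuid_count() helper from GCC/clang <cpuid.h>.
 */
#include <stdio.h>
#include <cpuid.h>

union cpuid_0x10_1_eax { struct { unsigned int cbm_len:5; } split; unsigned int full; };
union cpuid_0x10_3_eax { struct { unsigned int max_delay:12; } split; unsigned int full; };
union cpuid_0x10_x_edx { struct { unsigned int cos_max:16; } split; unsigned int full; };

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Sub-leaf 1: L3 cache allocation enumeration */
	if (__get_cpuid_count(0x10, 1, &eax, &ebx, &ecx, &edx)) {
		union cpuid_0x10_1_eax a = { .full = eax };
		union cpuid_0x10_x_edx d = { .full = edx };

		printf("L3: cbm_len field=%u cos_max field=%u\n",
		       a.split.cbm_len, d.split.cos_max);
	}

	/* Sub-leaf 3: memory bandwidth allocation enumeration */
	if (__get_cpuid_count(0x10, 3, &eax, &ebx, &ecx, &edx)) {
		union cpuid_0x10_3_eax a = { .full = eax };

		printf("MBA: max_delay field=%u\n", a.split.max_delay);
	}
	return 0;
}
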
+499
arch/x86/kernel/cpu/intel_rdt_monitor.c
···
··· 1 + /* 2 + * Resource Director Technology(RDT) 3 + * - Monitoring code 4 + * 5 + * Copyright (C) 2017 Intel Corporation 6 + * 7 + * Author: 8 + * Vikas Shivappa <vikas.shivappa@intel.com> 9 + * 10 + * This replaces the cqm.c based on perf but we reuse a lot of 11 + * code and datastructures originally from Peter Zijlstra and Matt Fleming. 12 + * 13 + * This program is free software; you can redistribute it and/or modify it 14 + * under the terms and conditions of the GNU General Public License, 15 + * version 2, as published by the Free Software Foundation. 16 + * 17 + * This program is distributed in the hope it will be useful, but WITHOUT 18 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 19 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for 20 + * more details. 21 + * 22 + * More information about RDT be found in the Intel (R) x86 Architecture 23 + * Software Developer Manual June 2016, volume 3, section 17.17. 24 + */ 25 + 26 + #include <linux/module.h> 27 + #include <linux/slab.h> 28 + #include <asm/cpu_device_id.h> 29 + #include "intel_rdt.h" 30 + 31 + #define MSR_IA32_QM_CTR 0x0c8e 32 + #define MSR_IA32_QM_EVTSEL 0x0c8d 33 + 34 + struct rmid_entry { 35 + u32 rmid; 36 + int busy; 37 + struct list_head list; 38 + }; 39 + 40 + /** 41 + * @rmid_free_lru A least recently used list of free RMIDs 42 + * These RMIDs are guaranteed to have an occupancy less than the 43 + * threshold occupancy 44 + */ 45 + static LIST_HEAD(rmid_free_lru); 46 + 47 + /** 48 + * @rmid_limbo_count count of currently unused but (potentially) 49 + * dirty RMIDs. 50 + * This counts RMIDs that no one is currently using but that 51 + * may have a occupancy value > intel_cqm_threshold. User can change 52 + * the threshold occupancy value. 53 + */ 54 + unsigned int rmid_limbo_count; 55 + 56 + /** 57 + * @rmid_entry - The entry in the limbo and free lists. 58 + */ 59 + static struct rmid_entry *rmid_ptrs; 60 + 61 + /* 62 + * Global boolean for rdt_monitor which is true if any 63 + * resource monitoring is enabled. 64 + */ 65 + bool rdt_mon_capable; 66 + 67 + /* 68 + * Global to indicate which monitoring events are enabled. 69 + */ 70 + unsigned int rdt_mon_features; 71 + 72 + /* 73 + * This is the threshold cache occupancy at which we will consider an 74 + * RMID available for re-allocation. 75 + */ 76 + unsigned int intel_cqm_threshold; 77 + 78 + static inline struct rmid_entry *__rmid_entry(u32 rmid) 79 + { 80 + struct rmid_entry *entry; 81 + 82 + entry = &rmid_ptrs[rmid]; 83 + WARN_ON(entry->rmid != rmid); 84 + 85 + return entry; 86 + } 87 + 88 + static u64 __rmid_read(u32 rmid, u32 eventid) 89 + { 90 + u64 val; 91 + 92 + /* 93 + * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured 94 + * with a valid event code for supported resource type and the bits 95 + * IA32_QM_EVTSEL.RMID (bits 41:32) are configured with valid RMID, 96 + * IA32_QM_CTR.data (bits 61:0) reports the monitored data. 97 + * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62) 98 + * are error bits. 99 + */ 100 + wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid); 101 + rdmsrl(MSR_IA32_QM_CTR, val); 102 + 103 + return val; 104 + } 105 + 106 + static bool rmid_dirty(struct rmid_entry *entry) 107 + { 108 + u64 val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID); 109 + 110 + return val >= intel_cqm_threshold; 111 + } 112 + 113 + /* 114 + * Check the RMIDs that are marked as busy for this domain. 
If the 115 + * reported LLC occupancy is below the threshold clear the busy bit and 116 + * decrement the count. If the busy count gets to zero on an RMID, we 117 + * free the RMID 118 + */ 119 + void __check_limbo(struct rdt_domain *d, bool force_free) 120 + { 121 + struct rmid_entry *entry; 122 + struct rdt_resource *r; 123 + u32 crmid = 1, nrmid; 124 + 125 + r = &rdt_resources_all[RDT_RESOURCE_L3]; 126 + 127 + /* 128 + * Skip RMID 0 and start from RMID 1 and check all the RMIDs that 129 + * are marked as busy for occupancy < threshold. If the occupancy 130 + * is less than the threshold decrement the busy counter of the 131 + * RMID and move it to the free list when the counter reaches 0. 132 + */ 133 + for (;;) { 134 + nrmid = find_next_bit(d->rmid_busy_llc, r->num_rmid, crmid); 135 + if (nrmid >= r->num_rmid) 136 + break; 137 + 138 + entry = __rmid_entry(nrmid); 139 + if (force_free || !rmid_dirty(entry)) { 140 + clear_bit(entry->rmid, d->rmid_busy_llc); 141 + if (!--entry->busy) { 142 + rmid_limbo_count--; 143 + list_add_tail(&entry->list, &rmid_free_lru); 144 + } 145 + } 146 + crmid = nrmid + 1; 147 + } 148 + } 149 + 150 + bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d) 151 + { 152 + return find_first_bit(d->rmid_busy_llc, r->num_rmid) != r->num_rmid; 153 + } 154 + 155 + /* 156 + * As of now the RMIDs allocation is global. 157 + * However we keep track of which packages the RMIDs 158 + * are used to optimize the limbo list management. 159 + */ 160 + int alloc_rmid(void) 161 + { 162 + struct rmid_entry *entry; 163 + 164 + lockdep_assert_held(&rdtgroup_mutex); 165 + 166 + if (list_empty(&rmid_free_lru)) 167 + return rmid_limbo_count ? -EBUSY : -ENOSPC; 168 + 169 + entry = list_first_entry(&rmid_free_lru, 170 + struct rmid_entry, list); 171 + list_del(&entry->list); 172 + 173 + return entry->rmid; 174 + } 175 + 176 + static void add_rmid_to_limbo(struct rmid_entry *entry) 177 + { 178 + struct rdt_resource *r; 179 + struct rdt_domain *d; 180 + int cpu; 181 + u64 val; 182 + 183 + r = &rdt_resources_all[RDT_RESOURCE_L3]; 184 + 185 + entry->busy = 0; 186 + cpu = get_cpu(); 187 + list_for_each_entry(d, &r->domains, list) { 188 + if (cpumask_test_cpu(cpu, &d->cpu_mask)) { 189 + val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID); 190 + if (val <= intel_cqm_threshold) 191 + continue; 192 + } 193 + 194 + /* 195 + * For the first limbo RMID in the domain, 196 + * setup up the limbo worker. 
197 + */ 198 + if (!has_busy_rmid(r, d)) 199 + cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL); 200 + set_bit(entry->rmid, d->rmid_busy_llc); 201 + entry->busy++; 202 + } 203 + put_cpu(); 204 + 205 + if (entry->busy) 206 + rmid_limbo_count++; 207 + else 208 + list_add_tail(&entry->list, &rmid_free_lru); 209 + } 210 + 211 + void free_rmid(u32 rmid) 212 + { 213 + struct rmid_entry *entry; 214 + 215 + if (!rmid) 216 + return; 217 + 218 + lockdep_assert_held(&rdtgroup_mutex); 219 + 220 + entry = __rmid_entry(rmid); 221 + 222 + if (is_llc_occupancy_enabled()) 223 + add_rmid_to_limbo(entry); 224 + else 225 + list_add_tail(&entry->list, &rmid_free_lru); 226 + } 227 + 228 + static int __mon_event_count(u32 rmid, struct rmid_read *rr) 229 + { 230 + u64 chunks, shift, tval; 231 + struct mbm_state *m; 232 + 233 + tval = __rmid_read(rmid, rr->evtid); 234 + if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) { 235 + rr->val = tval; 236 + return -EINVAL; 237 + } 238 + switch (rr->evtid) { 239 + case QOS_L3_OCCUP_EVENT_ID: 240 + rr->val += tval; 241 + return 0; 242 + case QOS_L3_MBM_TOTAL_EVENT_ID: 243 + m = &rr->d->mbm_total[rmid]; 244 + break; 245 + case QOS_L3_MBM_LOCAL_EVENT_ID: 246 + m = &rr->d->mbm_local[rmid]; 247 + break; 248 + default: 249 + /* 250 + * Code would never reach here because 251 + * an invalid event id would fail the __rmid_read. 252 + */ 253 + return -EINVAL; 254 + } 255 + 256 + if (rr->first) { 257 + m->prev_msr = tval; 258 + m->chunks = 0; 259 + return 0; 260 + } 261 + 262 + shift = 64 - MBM_CNTR_WIDTH; 263 + chunks = (tval << shift) - (m->prev_msr << shift); 264 + chunks >>= shift; 265 + m->chunks += chunks; 266 + m->prev_msr = tval; 267 + 268 + rr->val += m->chunks; 269 + return 0; 270 + } 271 + 272 + /* 273 + * This is called via IPI to read the CQM/MBM counters 274 + * on a domain. 275 + */ 276 + void mon_event_count(void *info) 277 + { 278 + struct rdtgroup *rdtgrp, *entry; 279 + struct rmid_read *rr = info; 280 + struct list_head *head; 281 + 282 + rdtgrp = rr->rgrp; 283 + 284 + if (__mon_event_count(rdtgrp->mon.rmid, rr)) 285 + return; 286 + 287 + /* 288 + * For Ctrl groups read data from child monitor groups. 289 + */ 290 + head = &rdtgrp->mon.crdtgrp_list; 291 + 292 + if (rdtgrp->type == RDTCTRL_GROUP) { 293 + list_for_each_entry(entry, head, mon.crdtgrp_list) { 294 + if (__mon_event_count(entry->mon.rmid, rr)) 295 + return; 296 + } 297 + } 298 + } 299 + 300 + static void mbm_update(struct rdt_domain *d, int rmid) 301 + { 302 + struct rmid_read rr; 303 + 304 + rr.first = false; 305 + rr.d = d; 306 + 307 + /* 308 + * This is protected from concurrent reads from user 309 + * as both the user and we hold the global mutex. 310 + */ 311 + if (is_mbm_total_enabled()) { 312 + rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID; 313 + __mon_event_count(rmid, &rr); 314 + } 315 + if (is_mbm_local_enabled()) { 316 + rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID; 317 + __mon_event_count(rmid, &rr); 318 + } 319 + } 320 + 321 + /* 322 + * Handler to scan the limbo list and move the RMIDs 323 + * to free list whose occupancy < threshold_occupancy. 
324 + */ 325 + void cqm_handle_limbo(struct work_struct *work) 326 + { 327 + unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL); 328 + int cpu = smp_processor_id(); 329 + struct rdt_resource *r; 330 + struct rdt_domain *d; 331 + 332 + mutex_lock(&rdtgroup_mutex); 333 + 334 + r = &rdt_resources_all[RDT_RESOURCE_L3]; 335 + d = get_domain_from_cpu(cpu, r); 336 + 337 + if (!d) { 338 + pr_warn_once("Failure to get domain for limbo worker\n"); 339 + goto out_unlock; 340 + } 341 + 342 + __check_limbo(d, false); 343 + 344 + if (has_busy_rmid(r, d)) 345 + schedule_delayed_work_on(cpu, &d->cqm_limbo, delay); 346 + 347 + out_unlock: 348 + mutex_unlock(&rdtgroup_mutex); 349 + } 350 + 351 + void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms) 352 + { 353 + unsigned long delay = msecs_to_jiffies(delay_ms); 354 + struct rdt_resource *r; 355 + int cpu; 356 + 357 + r = &rdt_resources_all[RDT_RESOURCE_L3]; 358 + 359 + cpu = cpumask_any(&dom->cpu_mask); 360 + dom->cqm_work_cpu = cpu; 361 + 362 + schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay); 363 + } 364 + 365 + void mbm_handle_overflow(struct work_struct *work) 366 + { 367 + unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL); 368 + struct rdtgroup *prgrp, *crgrp; 369 + int cpu = smp_processor_id(); 370 + struct list_head *head; 371 + struct rdt_domain *d; 372 + 373 + mutex_lock(&rdtgroup_mutex); 374 + 375 + if (!static_branch_likely(&rdt_enable_key)) 376 + goto out_unlock; 377 + 378 + d = get_domain_from_cpu(cpu, &rdt_resources_all[RDT_RESOURCE_L3]); 379 + if (!d) 380 + goto out_unlock; 381 + 382 + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) { 383 + mbm_update(d, prgrp->mon.rmid); 384 + 385 + head = &prgrp->mon.crdtgrp_list; 386 + list_for_each_entry(crgrp, head, mon.crdtgrp_list) 387 + mbm_update(d, crgrp->mon.rmid); 388 + } 389 + 390 + schedule_delayed_work_on(cpu, &d->mbm_over, delay); 391 + 392 + out_unlock: 393 + mutex_unlock(&rdtgroup_mutex); 394 + } 395 + 396 + void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms) 397 + { 398 + unsigned long delay = msecs_to_jiffies(delay_ms); 399 + int cpu; 400 + 401 + if (!static_branch_likely(&rdt_enable_key)) 402 + return; 403 + cpu = cpumask_any(&dom->cpu_mask); 404 + dom->mbm_work_cpu = cpu; 405 + schedule_delayed_work_on(cpu, &dom->mbm_over, delay); 406 + } 407 + 408 + static int dom_data_init(struct rdt_resource *r) 409 + { 410 + struct rmid_entry *entry = NULL; 411 + int i, nr_rmids; 412 + 413 + nr_rmids = r->num_rmid; 414 + rmid_ptrs = kcalloc(nr_rmids, sizeof(struct rmid_entry), GFP_KERNEL); 415 + if (!rmid_ptrs) 416 + return -ENOMEM; 417 + 418 + for (i = 0; i < nr_rmids; i++) { 419 + entry = &rmid_ptrs[i]; 420 + INIT_LIST_HEAD(&entry->list); 421 + 422 + entry->rmid = i; 423 + list_add_tail(&entry->list, &rmid_free_lru); 424 + } 425 + 426 + /* 427 + * RMID 0 is special and is always allocated. It's used for all 428 + * tasks that are not monitored. 
429 + */ 430 + entry = __rmid_entry(0); 431 + list_del(&entry->list); 432 + 433 + return 0; 434 + } 435 + 436 + static struct mon_evt llc_occupancy_event = { 437 + .name = "llc_occupancy", 438 + .evtid = QOS_L3_OCCUP_EVENT_ID, 439 + }; 440 + 441 + static struct mon_evt mbm_total_event = { 442 + .name = "mbm_total_bytes", 443 + .evtid = QOS_L3_MBM_TOTAL_EVENT_ID, 444 + }; 445 + 446 + static struct mon_evt mbm_local_event = { 447 + .name = "mbm_local_bytes", 448 + .evtid = QOS_L3_MBM_LOCAL_EVENT_ID, 449 + }; 450 + 451 + /* 452 + * Initialize the event list for the resource. 453 + * 454 + * Note that MBM events are also part of RDT_RESOURCE_L3 resource 455 + * because as per the SDM the total and local memory bandwidth 456 + * are enumerated as part of L3 monitoring. 457 + */ 458 + static void l3_mon_evt_init(struct rdt_resource *r) 459 + { 460 + INIT_LIST_HEAD(&r->evt_list); 461 + 462 + if (is_llc_occupancy_enabled()) 463 + list_add_tail(&llc_occupancy_event.list, &r->evt_list); 464 + if (is_mbm_total_enabled()) 465 + list_add_tail(&mbm_total_event.list, &r->evt_list); 466 + if (is_mbm_local_enabled()) 467 + list_add_tail(&mbm_local_event.list, &r->evt_list); 468 + } 469 + 470 + int rdt_get_mon_l3_config(struct rdt_resource *r) 471 + { 472 + int ret; 473 + 474 + r->mon_scale = boot_cpu_data.x86_cache_occ_scale; 475 + r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1; 476 + 477 + /* 478 + * A reasonable upper limit on the max threshold is the number 479 + * of lines tagged per RMID if all RMIDs have the same number of 480 + * lines tagged in the LLC. 481 + * 482 + * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC. 483 + */ 484 + intel_cqm_threshold = boot_cpu_data.x86_cache_size * 1024 / r->num_rmid; 485 + 486 + /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */ 487 + intel_cqm_threshold /= r->mon_scale; 488 + 489 + ret = dom_data_init(r); 490 + if (ret) 491 + return ret; 492 + 493 + l3_mon_evt_init(r); 494 + 495 + r->mon_capable = true; 496 + r->mon_enabled = true; 497 + 498 + return 0; 499 + }
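
__mon_event_count() handles wrap-around of the free-running MBM counter with a pair of shifts: the current and previous MSR values are both shifted up by (64 - MBM_CNTR_WIDTH) so the subtraction wraps naturally in 64-bit arithmetic, and the difference is shifted back down. The standalone sketch below shows just that arithmetic (illustration only, not part of the patch; the 24-bit width is an assumption for the example - the kernel takes it from its MBM_CNTR_WIDTH constant).

/* Illustrative sketch only (not part of the patch): the wrap-around
 * delta computation used by __mon_event_count() above.
 */
#include <stdio.h>
#include <stdint.h>

#define CNTR_WIDTH	24	/* assumed counter width, for illustration */

static uint64_t mbm_delta(uint64_t cur_msr, uint64_t prev_msr)
{
	unsigned int shift = 64 - CNTR_WIDTH;

	/*
	 * Shifting both values so the counter's top bit lands in bit 63
	 * makes the subtraction wrap in 64-bit arithmetic; shifting the
	 * difference back down yields the chunks counted since the last
	 * read, even if the hardware counter rolled over in between.
	 */
	return ((cur_msr << shift) - (prev_msr << shift)) >> shift;
}

int main(void)
{
	uint64_t prev = (1ULL << CNTR_WIDTH) - 5;	/* just before a wrap */
	uint64_t cur = 10;				/* counter has wrapped */

	/* 5 chunks up to the wrap plus 10 after it: prints 15 */
	printf("delta across wrap: %llu chunks\n",
	       (unsigned long long)mbm_delta(cur, prev));
	return 0;
}
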
+930 -211
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
··· 32 33 #include <uapi/linux/magic.h> 34 35 - #include <asm/intel_rdt.h> 36 - #include <asm/intel_rdt_common.h> 37 38 DEFINE_STATIC_KEY_FALSE(rdt_enable_key); 39 - struct kernfs_root *rdt_root; 40 struct rdtgroup rdtgroup_default; 41 LIST_HEAD(rdt_all_groups); 42 43 /* Kernel fs node for "info" directory under root */ 44 static struct kernfs_node *kn_info; 45 46 /* 47 * Trivial allocator for CLOSIDs. Since h/w only supports a small number, ··· 74 int rdt_min_closid = 32; 75 76 /* Compute rdt_min_closid across all resources */ 77 - for_each_enabled_rdt_resource(r) 78 rdt_min_closid = min(rdt_min_closid, r->num_closid); 79 80 closid_free_map = BIT_MASK(rdt_min_closid) - 1; ··· 83 closid_free_map &= ~1; 84 } 85 86 - int closid_alloc(void) 87 { 88 - int closid = ffs(closid_free_map); 89 90 if (closid == 0) 91 return -ENOSPC; ··· 133 return 0; 134 } 135 136 - static int rdtgroup_add_files(struct kernfs_node *kn, struct rftype *rfts, 137 - int len) 138 - { 139 - struct rftype *rft; 140 - int ret; 141 - 142 - lockdep_assert_held(&rdtgroup_mutex); 143 - 144 - for (rft = rfts; rft < rfts + len; rft++) { 145 - ret = rdtgroup_add_file(kn, rft); 146 - if (ret) 147 - goto error; 148 - } 149 - 150 - return 0; 151 - error: 152 - pr_warn("Failed to add %s, err=%d\n", rft->name, ret); 153 - while (--rft >= rfts) 154 - kernfs_remove_by_name(kn, rft->name); 155 - return ret; 156 - } 157 - 158 static int rdtgroup_seqfile_show(struct seq_file *m, void *arg) 159 { 160 struct kernfs_open_file *of = m->private; ··· 158 .atomic_write_len = PAGE_SIZE, 159 .write = rdtgroup_file_write, 160 .seq_show = rdtgroup_seqfile_show, 161 }; 162 163 static bool is_cpu_list(struct kernfs_open_file *of) ··· 194 /* 195 * This is safe against intel_rdt_sched_in() called from __switch_to() 196 * because __switch_to() is executed with interrupts disabled. A local call 197 - * from rdt_update_closid() is proteced against __switch_to() because 198 * preemption is disabled. 199 */ 200 - static void rdt_update_cpu_closid(void *closid) 201 { 202 - if (closid) 203 - this_cpu_write(cpu_closid, *(int *)closid); 204 /* 205 * We cannot unconditionally write the MSR because the current 206 * executing task might have its own closid selected. Just reuse ··· 217 /* 218 * Update the PGR_ASSOC MSR on all cpus in @cpu_mask, 219 * 220 - * Per task closids must have been set up before calling this function. 221 - * 222 - * The per cpu closids are updated with the smp function call, when @closid 223 - * is not NULL. If @closid is NULL then all affected percpu closids must 224 - * have been set up before calling this function. 
225 */ 226 static void 227 - rdt_update_closid(const struct cpumask *cpu_mask, int *closid) 228 { 229 int cpu = get_cpu(); 230 231 if (cpumask_test_cpu(cpu, cpu_mask)) 232 - rdt_update_cpu_closid(closid); 233 - smp_call_function_many(cpu_mask, rdt_update_cpu_closid, closid, 1); 234 put_cpu(); 235 } 236 237 static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, 238 char *buf, size_t nbytes, loff_t off) 239 { 240 - cpumask_var_t tmpmask, newmask; 241 - struct rdtgroup *rdtgrp, *r; 242 int ret; 243 244 if (!buf) ··· 348 return -ENOMEM; 349 if (!zalloc_cpumask_var(&newmask, GFP_KERNEL)) { 350 free_cpumask_var(tmpmask); 351 return -ENOMEM; 352 } 353 ··· 377 goto unlock; 378 } 379 380 - /* Check whether cpus are dropped from this group */ 381 - cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask); 382 - if (cpumask_weight(tmpmask)) { 383 - /* Can't drop from default group */ 384 - if (rdtgrp == &rdtgroup_default) { 385 - ret = -EINVAL; 386 - goto unlock; 387 - } 388 - /* Give any dropped cpus to rdtgroup_default */ 389 - cpumask_or(&rdtgroup_default.cpu_mask, 390 - &rdtgroup_default.cpu_mask, tmpmask); 391 - rdt_update_closid(tmpmask, &rdtgroup_default.closid); 392 - } 393 - 394 - /* 395 - * If we added cpus, remove them from previous group that owned them 396 - * and update per-cpu closid 397 - */ 398 - cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask); 399 - if (cpumask_weight(tmpmask)) { 400 - list_for_each_entry(r, &rdt_all_groups, rdtgroup_list) { 401 - if (r == rdtgrp) 402 - continue; 403 - cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask); 404 - } 405 - rdt_update_closid(tmpmask, &rdtgrp->closid); 406 - } 407 - 408 - /* Done pushing/pulling - update this group with new mask */ 409 - cpumask_copy(&rdtgrp->cpu_mask, newmask); 410 411 unlock: 412 rdtgroup_kn_unlock(of->kn); 413 free_cpumask_var(tmpmask); 414 free_cpumask_var(newmask); 415 416 return ret ?: nbytes; 417 } ··· 414 if (atomic_dec_and_test(&rdtgrp->waitcount) && 415 (rdtgrp->flags & RDT_DELETED)) { 416 current->closid = 0; 417 kfree(rdtgrp); 418 } 419 ··· 453 atomic_dec(&rdtgrp->waitcount); 454 kfree(callback); 455 } else { 456 - tsk->closid = rdtgrp->closid; 457 } 458 return ret; 459 } ··· 546 547 rcu_read_lock(); 548 for_each_process_thread(p, t) { 549 - if (t->closid == r->closid) 550 seq_printf(s, "%d\n", t->pid); 551 } 552 rcu_read_unlock(); ··· 568 569 return ret; 570 } 571 - 572 - /* Files in each rdtgroup */ 573 - static struct rftype rdtgroup_base_files[] = { 574 - { 575 - .name = "cpus", 576 - .mode = 0644, 577 - .kf_ops = &rdtgroup_kf_single_ops, 578 - .write = rdtgroup_cpus_write, 579 - .seq_show = rdtgroup_cpus_show, 580 - }, 581 - { 582 - .name = "cpus_list", 583 - .mode = 0644, 584 - .kf_ops = &rdtgroup_kf_single_ops, 585 - .write = rdtgroup_cpus_write, 586 - .seq_show = rdtgroup_cpus_show, 587 - .flags = RFTYPE_FLAGS_CPUS_LIST, 588 - }, 589 - { 590 - .name = "tasks", 591 - .mode = 0644, 592 - .kf_ops = &rdtgroup_kf_single_ops, 593 - .write = rdtgroup_tasks_write, 594 - .seq_show = rdtgroup_tasks_show, 595 - }, 596 - { 597 - .name = "schemata", 598 - .mode = 0644, 599 - .kf_ops = &rdtgroup_kf_single_ops, 600 - .write = rdtgroup_schemata_write, 601 - .seq_show = rdtgroup_schemata_show, 602 - }, 603 - }; 604 605 static int rdt_num_closids_show(struct kernfs_open_file *of, 606 struct seq_file *seq, void *v) ··· 596 return 0; 597 } 598 599 static int rdt_min_bw_show(struct kernfs_open_file *of, 600 struct seq_file *seq, void *v) 601 { 602 struct rdt_resource *r = of->kn->parent->priv; 603 604 
seq_printf(seq, "%u\n", r->membw.min_bw); 605 return 0; 606 } 607 ··· 654 return 0; 655 } 656 657 /* rdtgroup information files for one cache resource. */ 658 - static struct rftype res_cache_info_files[] = { 659 { 660 .name = "num_closids", 661 .mode = 0444, 662 .kf_ops = &rdtgroup_kf_single_ops, 663 .seq_show = rdt_num_closids_show, 664 }, 665 { 666 .name = "cbm_mask", 667 .mode = 0444, 668 .kf_ops = &rdtgroup_kf_single_ops, 669 .seq_show = rdt_default_ctrl_show, 670 }, 671 { 672 .name = "min_cbm_bits", 673 .mode = 0444, 674 .kf_ops = &rdtgroup_kf_single_ops, 675 .seq_show = rdt_min_cbm_bits_show, 676 }, 677 - }; 678 - 679 - /* rdtgroup information files for memory bandwidth. */ 680 - static struct rftype res_mba_info_files[] = { 681 { 682 - .name = "num_closids", 683 .mode = 0444, 684 .kf_ops = &rdtgroup_kf_single_ops, 685 - .seq_show = rdt_num_closids_show, 686 }, 687 { 688 .name = "min_bandwidth", 689 .mode = 0444, 690 .kf_ops = &rdtgroup_kf_single_ops, 691 .seq_show = rdt_min_bw_show, 692 }, 693 { 694 .name = "bandwidth_gran", 695 .mode = 0444, 696 .kf_ops = &rdtgroup_kf_single_ops, 697 .seq_show = rdt_bw_gran_show, 698 }, 699 { 700 .name = "delay_linear", 701 .mode = 0444, 702 .kf_ops = &rdtgroup_kf_single_ops, 703 .seq_show = rdt_delay_linear_show, 704 }, 705 }; 706 707 - void rdt_get_mba_infofile(struct rdt_resource *r) 708 { 709 - r->info_files = res_mba_info_files; 710 - r->nr_info_files = ARRAY_SIZE(res_mba_info_files); 711 } 712 713 - void rdt_get_cache_infofile(struct rdt_resource *r) 714 { 715 - r->info_files = res_cache_info_files; 716 - r->nr_info_files = ARRAY_SIZE(res_cache_info_files); 717 } 718 719 static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn) 720 { 721 - struct kernfs_node *kn_subdir; 722 - struct rftype *res_info_files; 723 struct rdt_resource *r; 724 - int ret, len; 725 726 /* create the directory */ 727 kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL); ··· 855 return PTR_ERR(kn_info); 856 kernfs_get(kn_info); 857 858 - for_each_enabled_rdt_resource(r) { 859 - kn_subdir = kernfs_create_dir(kn_info, r->name, 860 - kn_info->mode, r); 861 - if (IS_ERR(kn_subdir)) { 862 - ret = PTR_ERR(kn_subdir); 863 - goto out_destroy; 864 - } 865 - kernfs_get(kn_subdir); 866 - ret = rdtgroup_kn_set_ugid(kn_subdir); 867 if (ret) 868 goto out_destroy; 869 870 - res_info_files = r->info_files; 871 - len = r->nr_info_files; 872 - 873 - ret = rdtgroup_add_files(kn_subdir, res_info_files, len); 874 if (ret) 875 goto out_destroy; 876 - kernfs_activate(kn_subdir); 877 } 878 879 /* ··· 889 return ret; 890 } 891 892 static void l3_qos_cfg_update(void *arg) 893 { 894 bool *enable = arg; ··· 962 struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3]; 963 int ret; 964 965 - if (!r_l3->capable || !r_l3data->capable || !r_l3code->capable) 966 return -EINVAL; 967 968 ret = set_l3_qos_cfg(r_l3, true); 969 if (!ret) { 970 - r_l3->enabled = false; 971 - r_l3data->enabled = true; 972 - r_l3code->enabled = true; 973 } 974 return ret; 975 } ··· 979 { 980 struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3]; 981 982 - r->enabled = r->capable; 983 984 - if (rdt_resources_all[RDT_RESOURCE_L3DATA].enabled) { 985 - rdt_resources_all[RDT_RESOURCE_L3DATA].enabled = false; 986 - rdt_resources_all[RDT_RESOURCE_L3CODE].enabled = false; 987 set_l3_qos_cfg(r, false); 988 } 989 } ··· 1068 } 1069 } 1070 1071 static struct dentry *rdt_mount(struct file_system_type *fs_type, 1072 int flags, const char *unused_dev_name, 1073 void *data) 1074 { 1075 struct dentry 
*dentry; 1076 int ret; 1077 ··· 1104 goto out_cdp; 1105 } 1106 1107 dentry = kernfs_mount(fs_type, flags, rdt_root, 1108 RDTGROUP_SUPER_MAGIC, NULL); 1109 if (IS_ERR(dentry)) 1110 - goto out_destroy; 1111 1112 - static_branch_enable(&rdt_enable_key); 1113 goto out; 1114 1115 - out_destroy: 1116 kernfs_remove(kn_info); 1117 out_cdp: 1118 cdp_disable(); ··· 1199 return 0; 1200 } 1201 1202 /* 1203 * Move tasks from one to the other group. If @from is NULL, then all tasks 1204 * in the systems are moved unconditionally (used for teardown). ··· 1226 1227 read_lock(&tasklist_lock); 1228 for_each_process_thread(p, t) { 1229 - if (!from || t->closid == from->closid) { 1230 t->closid = to->closid; 1231 #ifdef CONFIG_SMP 1232 /* 1233 * This is safe on x86 w/o barriers as the ordering ··· 1249 read_unlock(&tasklist_lock); 1250 } 1251 1252 /* 1253 * Forcibly remove all of subdirectories under root. 1254 */ ··· 1273 rdt_move_group_tasks(NULL, &rdtgroup_default, NULL); 1274 1275 list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) { 1276 /* Remove each rdtgroup other than root */ 1277 if (rdtgrp == &rdtgroup_default) 1278 continue; ··· 1288 cpumask_or(&rdtgroup_default.cpu_mask, 1289 &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask); 1290 1291 kernfs_remove(rdtgrp->kn); 1292 list_del(&rdtgrp->rdtgroup_list); 1293 kfree(rdtgrp); 1294 } 1295 /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */ 1296 get_online_cpus(); 1297 - rdt_update_closid(cpu_online_mask, &rdtgroup_default.closid); 1298 put_online_cpus(); 1299 1300 kernfs_remove(kn_info); 1301 } 1302 1303 static void rdt_kill_sb(struct super_block *sb) ··· 1311 mutex_lock(&rdtgroup_mutex); 1312 1313 /*Put everything back to default values. */ 1314 - for_each_enabled_rdt_resource(r) 1315 reset_all_ctrls(r); 1316 cdp_disable(); 1317 rmdir_all_sub(); 1318 static_branch_disable(&rdt_enable_key); 1319 kernfs_kill_sb(sb); 1320 mutex_unlock(&rdtgroup_mutex); ··· 1328 .kill_sb = rdt_kill_sb, 1329 }; 1330 1331 - static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name, 1332 - umode_t mode) 1333 { 1334 - struct rdtgroup *parent, *rdtgrp; 1335 struct kernfs_node *kn; 1336 - int ret, closid; 1337 1338 - /* Only allow mkdir in the root directory */ 1339 - if (parent_kn != rdtgroup_default.kn) 1340 - return -EPERM; 1341 1342 - /* Do not accept '\n' to avoid unparsable situation. */ 1343 - if (strchr(name, '\n')) 1344 - return -EINVAL; 1345 1346 - parent = rdtgroup_kn_lock_live(parent_kn); 1347 - if (!parent) { 1348 ret = -ENODEV; 1349 goto out_unlock; 1350 } 1351 - 1352 - ret = closid_alloc(); 1353 - if (ret < 0) 1354 - goto out_unlock; 1355 - closid = ret; 1356 1357 /* allocate the rdtgroup. 
*/ 1358 rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL); 1359 if (!rdtgrp) { 1360 ret = -ENOSPC; 1361 - goto out_closid_free; 1362 } 1363 - rdtgrp->closid = closid; 1364 - list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups); 1365 1366 /* kernfs creates the directory for rdtgrp */ 1367 - kn = kernfs_create_dir(parent->kn, name, mode, rdtgrp); 1368 if (IS_ERR(kn)) { 1369 ret = PTR_ERR(kn); 1370 - goto out_cancel_ref; 1371 } 1372 rdtgrp->kn = kn; 1373 ··· 1560 if (ret) 1561 goto out_destroy; 1562 1563 - ret = rdtgroup_add_files(kn, rdtgroup_base_files, 1564 - ARRAY_SIZE(rdtgroup_base_files)); 1565 if (ret) 1566 goto out_destroy; 1567 1568 kernfs_activate(kn); 1569 1570 - ret = 0; 1571 - goto out_unlock; 1572 1573 out_destroy: 1574 kernfs_remove(rdtgrp->kn); 1575 - out_cancel_ref: 1576 - list_del(&rdtgrp->rdtgroup_list); 1577 kfree(rdtgrp); 1578 - out_closid_free: 1579 - closid_free(closid); 1580 out_unlock: 1581 - rdtgroup_kn_unlock(parent_kn); 1582 return ret; 1583 } 1584 1585 static int rdtgroup_rmdir(struct kernfs_node *kn) 1586 { 1587 - int ret, cpu, closid = rdtgroup_default.closid; 1588 struct rdtgroup *rdtgrp; 1589 cpumask_var_t tmpmask; 1590 1591 if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) 1592 return -ENOMEM; ··· 1822 goto out; 1823 } 1824 1825 - /* Give any tasks back to the default group */ 1826 - rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask); 1827 - 1828 - /* Give any CPUs back to the default group */ 1829 - cpumask_or(&rdtgroup_default.cpu_mask, 1830 - &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask); 1831 - 1832 - /* Update per cpu closid of the moved CPUs first */ 1833 - for_each_cpu(cpu, &rdtgrp->cpu_mask) 1834 - per_cpu(cpu_closid, cpu) = closid; 1835 /* 1836 - * Update the MSR on moved CPUs and CPUs which have moved 1837 - * task running on them. 1838 */ 1839 - cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask); 1840 - rdt_update_closid(tmpmask, NULL); 1841 1842 - rdtgrp->flags = RDT_DELETED; 1843 - closid_free(rdtgrp->closid); 1844 - list_del(&rdtgrp->rdtgroup_list); 1845 - 1846 - /* 1847 - * one extra hold on this, will drop when we kfree(rdtgrp) 1848 - * in rdtgroup_kn_unlock() 1849 - */ 1850 - kernfs_get(kn); 1851 - kernfs_remove(rdtgrp->kn); 1852 - ret = 0; 1853 out: 1854 rdtgroup_kn_unlock(kn); 1855 free_cpumask_var(tmpmask); ··· 1845 1846 static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) 1847 { 1848 - if (rdt_resources_all[RDT_RESOURCE_L3DATA].enabled) 1849 seq_puts(seq, ",cdp"); 1850 return 0; 1851 } ··· 1869 mutex_lock(&rdtgroup_mutex); 1870 1871 rdtgroup_default.closid = 0; 1872 list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups); 1873 1874 - ret = rdtgroup_add_files(rdt_root->kn, rdtgroup_base_files, 1875 - ARRAY_SIZE(rdtgroup_base_files)); 1876 if (ret) { 1877 kernfs_destroy_root(rdt_root); 1878 goto out;
··· 32 33 #include <uapi/linux/magic.h> 34 35 + #include <asm/intel_rdt_sched.h> 36 + #include "intel_rdt.h" 37 38 DEFINE_STATIC_KEY_FALSE(rdt_enable_key); 39 + DEFINE_STATIC_KEY_FALSE(rdt_mon_enable_key); 40 + DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key); 41 + static struct kernfs_root *rdt_root; 42 struct rdtgroup rdtgroup_default; 43 LIST_HEAD(rdt_all_groups); 44 45 /* Kernel fs node for "info" directory under root */ 46 static struct kernfs_node *kn_info; 47 + 48 + /* Kernel fs node for "mon_groups" directory under root */ 49 + static struct kernfs_node *kn_mongrp; 50 + 51 + /* Kernel fs node for "mon_data" directory under root */ 52 + static struct kernfs_node *kn_mondata; 53 54 /* 55 * Trivial allocator for CLOSIDs. Since h/w only supports a small number, ··· 66 int rdt_min_closid = 32; 67 68 /* Compute rdt_min_closid across all resources */ 69 + for_each_alloc_enabled_rdt_resource(r) 70 rdt_min_closid = min(rdt_min_closid, r->num_closid); 71 72 closid_free_map = BIT_MASK(rdt_min_closid) - 1; ··· 75 closid_free_map &= ~1; 76 } 77 78 + static int closid_alloc(void) 79 { 80 + u32 closid = ffs(closid_free_map); 81 82 if (closid == 0) 83 return -ENOSPC; ··· 125 return 0; 126 } 127 128 static int rdtgroup_seqfile_show(struct seq_file *m, void *arg) 129 { 130 struct kernfs_open_file *of = m->private; ··· 172 .atomic_write_len = PAGE_SIZE, 173 .write = rdtgroup_file_write, 174 .seq_show = rdtgroup_seqfile_show, 175 + }; 176 + 177 + static struct kernfs_ops kf_mondata_ops = { 178 + .atomic_write_len = PAGE_SIZE, 179 + .seq_show = rdtgroup_mondata_show, 180 }; 181 182 static bool is_cpu_list(struct kernfs_open_file *of) ··· 203 /* 204 * This is safe against intel_rdt_sched_in() called from __switch_to() 205 * because __switch_to() is executed with interrupts disabled. A local call 206 + * from update_closid_rmid() is proteced against __switch_to() because 207 * preemption is disabled. 208 */ 209 + static void update_cpu_closid_rmid(void *info) 210 { 211 + struct rdtgroup *r = info; 212 + 213 + if (r) { 214 + this_cpu_write(pqr_state.default_closid, r->closid); 215 + this_cpu_write(pqr_state.default_rmid, r->mon.rmid); 216 + } 217 + 218 /* 219 * We cannot unconditionally write the MSR because the current 220 * executing task might have its own closid selected. Just reuse ··· 221 /* 222 * Update the PGR_ASSOC MSR on all cpus in @cpu_mask, 223 * 224 + * Per task closids/rmids must have been set up before calling this function. 
225 */ 226 static void 227 + update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r) 228 { 229 int cpu = get_cpu(); 230 231 if (cpumask_test_cpu(cpu, cpu_mask)) 232 + update_cpu_closid_rmid(r); 233 + smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1); 234 put_cpu(); 235 + } 236 + 237 + static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask, 238 + cpumask_var_t tmpmask) 239 + { 240 + struct rdtgroup *prgrp = rdtgrp->mon.parent, *crgrp; 241 + struct list_head *head; 242 + 243 + /* Check whether cpus belong to parent ctrl group */ 244 + cpumask_andnot(tmpmask, newmask, &prgrp->cpu_mask); 245 + if (cpumask_weight(tmpmask)) 246 + return -EINVAL; 247 + 248 + /* Check whether cpus are dropped from this group */ 249 + cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask); 250 + if (cpumask_weight(tmpmask)) { 251 + /* Give any dropped cpus to parent rdtgroup */ 252 + cpumask_or(&prgrp->cpu_mask, &prgrp->cpu_mask, tmpmask); 253 + update_closid_rmid(tmpmask, prgrp); 254 + } 255 + 256 + /* 257 + * If we added cpus, remove them from previous group that owned them 258 + * and update per-cpu rmid 259 + */ 260 + cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask); 261 + if (cpumask_weight(tmpmask)) { 262 + head = &prgrp->mon.crdtgrp_list; 263 + list_for_each_entry(crgrp, head, mon.crdtgrp_list) { 264 + if (crgrp == rdtgrp) 265 + continue; 266 + cpumask_andnot(&crgrp->cpu_mask, &crgrp->cpu_mask, 267 + tmpmask); 268 + } 269 + update_closid_rmid(tmpmask, rdtgrp); 270 + } 271 + 272 + /* Done pushing/pulling - update this group with new mask */ 273 + cpumask_copy(&rdtgrp->cpu_mask, newmask); 274 + 275 + return 0; 276 + } 277 + 278 + static void cpumask_rdtgrp_clear(struct rdtgroup *r, struct cpumask *m) 279 + { 280 + struct rdtgroup *crgrp; 281 + 282 + cpumask_andnot(&r->cpu_mask, &r->cpu_mask, m); 283 + /* update the child mon group masks as well*/ 284 + list_for_each_entry(crgrp, &r->mon.crdtgrp_list, mon.crdtgrp_list) 285 + cpumask_and(&crgrp->cpu_mask, &r->cpu_mask, &crgrp->cpu_mask); 286 + } 287 + 288 + static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask, 289 + cpumask_var_t tmpmask, cpumask_var_t tmpmask1) 290 + { 291 + struct rdtgroup *r, *crgrp; 292 + struct list_head *head; 293 + 294 + /* Check whether cpus are dropped from this group */ 295 + cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask); 296 + if (cpumask_weight(tmpmask)) { 297 + /* Can't drop from default group */ 298 + if (rdtgrp == &rdtgroup_default) 299 + return -EINVAL; 300 + 301 + /* Give any dropped cpus to rdtgroup_default */ 302 + cpumask_or(&rdtgroup_default.cpu_mask, 303 + &rdtgroup_default.cpu_mask, tmpmask); 304 + update_closid_rmid(tmpmask, &rdtgroup_default); 305 + } 306 + 307 + /* 308 + * If we added cpus, remove them from previous group and 309 + * the prev group's child groups that owned them 310 + * and update per-cpu closid/rmid. 
311 + */ 312 + cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask); 313 + if (cpumask_weight(tmpmask)) { 314 + list_for_each_entry(r, &rdt_all_groups, rdtgroup_list) { 315 + if (r == rdtgrp) 316 + continue; 317 + cpumask_and(tmpmask1, &r->cpu_mask, tmpmask); 318 + if (cpumask_weight(tmpmask1)) 319 + cpumask_rdtgrp_clear(r, tmpmask1); 320 + } 321 + update_closid_rmid(tmpmask, rdtgrp); 322 + } 323 + 324 + /* Done pushing/pulling - update this group with new mask */ 325 + cpumask_copy(&rdtgrp->cpu_mask, newmask); 326 + 327 + /* 328 + * Clear child mon group masks since there is a new parent mask 329 + * now and update the rmid for the cpus the child lost. 330 + */ 331 + head = &rdtgrp->mon.crdtgrp_list; 332 + list_for_each_entry(crgrp, head, mon.crdtgrp_list) { 333 + cpumask_and(tmpmask, &rdtgrp->cpu_mask, &crgrp->cpu_mask); 334 + update_closid_rmid(tmpmask, rdtgrp); 335 + cpumask_clear(&crgrp->cpu_mask); 336 + } 337 + 338 + return 0; 339 } 340 341 static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of, 342 char *buf, size_t nbytes, loff_t off) 343 { 344 + cpumask_var_t tmpmask, newmask, tmpmask1; 345 + struct rdtgroup *rdtgrp; 346 int ret; 347 348 if (!buf) ··· 252 return -ENOMEM; 253 if (!zalloc_cpumask_var(&newmask, GFP_KERNEL)) { 254 free_cpumask_var(tmpmask); 255 + return -ENOMEM; 256 + } 257 + if (!zalloc_cpumask_var(&tmpmask1, GFP_KERNEL)) { 258 + free_cpumask_var(tmpmask); 259 + free_cpumask_var(newmask); 260 return -ENOMEM; 261 } 262 ··· 276 goto unlock; 277 } 278 279 + if (rdtgrp->type == RDTCTRL_GROUP) 280 + ret = cpus_ctrl_write(rdtgrp, newmask, tmpmask, tmpmask1); 281 + else if (rdtgrp->type == RDTMON_GROUP) 282 + ret = cpus_mon_write(rdtgrp, newmask, tmpmask); 283 + else 284 + ret = -EINVAL; 285 286 unlock: 287 rdtgroup_kn_unlock(of->kn); 288 free_cpumask_var(tmpmask); 289 free_cpumask_var(newmask); 290 + free_cpumask_var(tmpmask1); 291 292 return ret ?: nbytes; 293 } ··· 336 if (atomic_dec_and_test(&rdtgrp->waitcount) && 337 (rdtgrp->flags & RDT_DELETED)) { 338 current->closid = 0; 339 + current->rmid = 0; 340 kfree(rdtgrp); 341 } 342 ··· 374 atomic_dec(&rdtgrp->waitcount); 375 kfree(callback); 376 } else { 377 + /* 378 + * For ctrl_mon groups move both closid and rmid. 379 + * For monitor groups, can move the tasks only from 380 + * their parent CTRL group. 
381 + */ 382 + if (rdtgrp->type == RDTCTRL_GROUP) { 383 + tsk->closid = rdtgrp->closid; 384 + tsk->rmid = rdtgrp->mon.rmid; 385 + } else if (rdtgrp->type == RDTMON_GROUP) { 386 + if (rdtgrp->mon.parent->closid == tsk->closid) 387 + tsk->rmid = rdtgrp->mon.rmid; 388 + else 389 + ret = -EINVAL; 390 + } 391 } 392 return ret; 393 } ··· 454 455 rcu_read_lock(); 456 for_each_process_thread(p, t) { 457 + if ((r->type == RDTCTRL_GROUP && t->closid == r->closid) || 458 + (r->type == RDTMON_GROUP && t->rmid == r->mon.rmid)) 459 seq_printf(s, "%d\n", t->pid); 460 } 461 rcu_read_unlock(); ··· 475 476 return ret; 477 } 478 479 static int rdt_num_closids_show(struct kernfs_open_file *of, 480 struct seq_file *seq, void *v) ··· 536 return 0; 537 } 538 539 + static int rdt_shareable_bits_show(struct kernfs_open_file *of, 540 + struct seq_file *seq, void *v) 541 + { 542 + struct rdt_resource *r = of->kn->parent->priv; 543 + 544 + seq_printf(seq, "%x\n", r->cache.shareable_bits); 545 + return 0; 546 + } 547 + 548 static int rdt_min_bw_show(struct kernfs_open_file *of, 549 struct seq_file *seq, void *v) 550 { 551 struct rdt_resource *r = of->kn->parent->priv; 552 553 seq_printf(seq, "%u\n", r->membw.min_bw); 554 + return 0; 555 + } 556 + 557 + static int rdt_num_rmids_show(struct kernfs_open_file *of, 558 + struct seq_file *seq, void *v) 559 + { 560 + struct rdt_resource *r = of->kn->parent->priv; 561 + 562 + seq_printf(seq, "%d\n", r->num_rmid); 563 + 564 + return 0; 565 + } 566 + 567 + static int rdt_mon_features_show(struct kernfs_open_file *of, 568 + struct seq_file *seq, void *v) 569 + { 570 + struct rdt_resource *r = of->kn->parent->priv; 571 + struct mon_evt *mevt; 572 + 573 + list_for_each_entry(mevt, &r->evt_list, list) 574 + seq_printf(seq, "%s\n", mevt->name); 575 + 576 return 0; 577 } 578 ··· 563 return 0; 564 } 565 566 + static int max_threshold_occ_show(struct kernfs_open_file *of, 567 + struct seq_file *seq, void *v) 568 + { 569 + struct rdt_resource *r = of->kn->parent->priv; 570 + 571 + seq_printf(seq, "%u\n", intel_cqm_threshold * r->mon_scale); 572 + 573 + return 0; 574 + } 575 + 576 + static ssize_t max_threshold_occ_write(struct kernfs_open_file *of, 577 + char *buf, size_t nbytes, loff_t off) 578 + { 579 + struct rdt_resource *r = of->kn->parent->priv; 580 + unsigned int bytes; 581 + int ret; 582 + 583 + ret = kstrtouint(buf, 0, &bytes); 584 + if (ret) 585 + return ret; 586 + 587 + if (bytes > (boot_cpu_data.x86_cache_size * 1024)) 588 + return -EINVAL; 589 + 590 + intel_cqm_threshold = bytes / r->mon_scale; 591 + 592 + return nbytes; 593 + } 594 + 595 /* rdtgroup information files for one cache resource. 
*/ 596 + static struct rftype res_common_files[] = { 597 { 598 .name = "num_closids", 599 .mode = 0444, 600 .kf_ops = &rdtgroup_kf_single_ops, 601 .seq_show = rdt_num_closids_show, 602 + .fflags = RF_CTRL_INFO, 603 + }, 604 + { 605 + .name = "mon_features", 606 + .mode = 0444, 607 + .kf_ops = &rdtgroup_kf_single_ops, 608 + .seq_show = rdt_mon_features_show, 609 + .fflags = RF_MON_INFO, 610 + }, 611 + { 612 + .name = "num_rmids", 613 + .mode = 0444, 614 + .kf_ops = &rdtgroup_kf_single_ops, 615 + .seq_show = rdt_num_rmids_show, 616 + .fflags = RF_MON_INFO, 617 }, 618 { 619 .name = "cbm_mask", 620 .mode = 0444, 621 .kf_ops = &rdtgroup_kf_single_ops, 622 .seq_show = rdt_default_ctrl_show, 623 + .fflags = RF_CTRL_INFO | RFTYPE_RES_CACHE, 624 }, 625 { 626 .name = "min_cbm_bits", 627 .mode = 0444, 628 .kf_ops = &rdtgroup_kf_single_ops, 629 .seq_show = rdt_min_cbm_bits_show, 630 + .fflags = RF_CTRL_INFO | RFTYPE_RES_CACHE, 631 }, 632 { 633 + .name = "shareable_bits", 634 .mode = 0444, 635 .kf_ops = &rdtgroup_kf_single_ops, 636 + .seq_show = rdt_shareable_bits_show, 637 + .fflags = RF_CTRL_INFO | RFTYPE_RES_CACHE, 638 }, 639 { 640 .name = "min_bandwidth", 641 .mode = 0444, 642 .kf_ops = &rdtgroup_kf_single_ops, 643 .seq_show = rdt_min_bw_show, 644 + .fflags = RF_CTRL_INFO | RFTYPE_RES_MB, 645 }, 646 { 647 .name = "bandwidth_gran", 648 .mode = 0444, 649 .kf_ops = &rdtgroup_kf_single_ops, 650 .seq_show = rdt_bw_gran_show, 651 + .fflags = RF_CTRL_INFO | RFTYPE_RES_MB, 652 }, 653 { 654 .name = "delay_linear", 655 .mode = 0444, 656 .kf_ops = &rdtgroup_kf_single_ops, 657 .seq_show = rdt_delay_linear_show, 658 + .fflags = RF_CTRL_INFO | RFTYPE_RES_MB, 659 + }, 660 + { 661 + .name = "max_threshold_occupancy", 662 + .mode = 0644, 663 + .kf_ops = &rdtgroup_kf_single_ops, 664 + .write = max_threshold_occ_write, 665 + .seq_show = max_threshold_occ_show, 666 + .fflags = RF_MON_INFO | RFTYPE_RES_CACHE, 667 + }, 668 + { 669 + .name = "cpus", 670 + .mode = 0644, 671 + .kf_ops = &rdtgroup_kf_single_ops, 672 + .write = rdtgroup_cpus_write, 673 + .seq_show = rdtgroup_cpus_show, 674 + .fflags = RFTYPE_BASE, 675 + }, 676 + { 677 + .name = "cpus_list", 678 + .mode = 0644, 679 + .kf_ops = &rdtgroup_kf_single_ops, 680 + .write = rdtgroup_cpus_write, 681 + .seq_show = rdtgroup_cpus_show, 682 + .flags = RFTYPE_FLAGS_CPUS_LIST, 683 + .fflags = RFTYPE_BASE, 684 + }, 685 + { 686 + .name = "tasks", 687 + .mode = 0644, 688 + .kf_ops = &rdtgroup_kf_single_ops, 689 + .write = rdtgroup_tasks_write, 690 + .seq_show = rdtgroup_tasks_show, 691 + .fflags = RFTYPE_BASE, 692 + }, 693 + { 694 + .name = "schemata", 695 + .mode = 0644, 696 + .kf_ops = &rdtgroup_kf_single_ops, 697 + .write = rdtgroup_schemata_write, 698 + .seq_show = rdtgroup_schemata_show, 699 + .fflags = RF_CTRL_BASE, 700 }, 701 }; 702 703 + static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags) 704 { 705 + struct rftype *rfts, *rft; 706 + int ret, len; 707 + 708 + rfts = res_common_files; 709 + len = ARRAY_SIZE(res_common_files); 710 + 711 + lockdep_assert_held(&rdtgroup_mutex); 712 + 713 + for (rft = rfts; rft < rfts + len; rft++) { 714 + if ((fflags & rft->fflags) == rft->fflags) { 715 + ret = rdtgroup_add_file(kn, rft); 716 + if (ret) 717 + goto error; 718 + } 719 + } 720 + 721 + return 0; 722 + error: 723 + pr_warn("Failed to add %s, err=%d\n", rft->name, ret); 724 + while (--rft >= rfts) { 725 + if ((fflags & rft->fflags) == rft->fflags) 726 + kernfs_remove_by_name(kn, rft->name); 727 + } 728 + return ret; 729 } 730 731 + static int 
rdtgroup_mkdir_info_resdir(struct rdt_resource *r, char *name, 732 + unsigned long fflags) 733 { 734 + struct kernfs_node *kn_subdir; 735 + int ret; 736 + 737 + kn_subdir = kernfs_create_dir(kn_info, name, 738 + kn_info->mode, r); 739 + if (IS_ERR(kn_subdir)) 740 + return PTR_ERR(kn_subdir); 741 + 742 + kernfs_get(kn_subdir); 743 + ret = rdtgroup_kn_set_ugid(kn_subdir); 744 + if (ret) 745 + return ret; 746 + 747 + ret = rdtgroup_add_files(kn_subdir, fflags); 748 + if (!ret) 749 + kernfs_activate(kn_subdir); 750 + 751 + return ret; 752 } 753 754 static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn) 755 { 756 struct rdt_resource *r; 757 + unsigned long fflags; 758 + char name[32]; 759 + int ret; 760 761 /* create the directory */ 762 kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL); ··· 638 return PTR_ERR(kn_info); 639 kernfs_get(kn_info); 640 641 + for_each_alloc_enabled_rdt_resource(r) { 642 + fflags = r->fflags | RF_CTRL_INFO; 643 + ret = rdtgroup_mkdir_info_resdir(r, r->name, fflags); 644 if (ret) 645 goto out_destroy; 646 + } 647 648 + for_each_mon_enabled_rdt_resource(r) { 649 + fflags = r->fflags | RF_MON_INFO; 650 + sprintf(name, "%s_MON", r->name); 651 + ret = rdtgroup_mkdir_info_resdir(r, name, fflags); 652 if (ret) 653 goto out_destroy; 654 } 655 656 /* ··· 678 return ret; 679 } 680 681 + static int 682 + mongroup_create_dir(struct kernfs_node *parent_kn, struct rdtgroup *prgrp, 683 + char *name, struct kernfs_node **dest_kn) 684 + { 685 + struct kernfs_node *kn; 686 + int ret; 687 + 688 + /* create the directory */ 689 + kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp); 690 + if (IS_ERR(kn)) 691 + return PTR_ERR(kn); 692 + 693 + if (dest_kn) 694 + *dest_kn = kn; 695 + 696 + /* 697 + * This extra ref will be put in kernfs_remove() and guarantees 698 + * that @rdtgrp->kn is always accessible. 
699 + */ 700 + kernfs_get(kn); 701 + 702 + ret = rdtgroup_kn_set_ugid(kn); 703 + if (ret) 704 + goto out_destroy; 705 + 706 + kernfs_activate(kn); 707 + 708 + return 0; 709 + 710 + out_destroy: 711 + kernfs_remove(kn); 712 + return ret; 713 + } 714 static void l3_qos_cfg_update(void *arg) 715 { 716 bool *enable = arg; ··· 718 struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3]; 719 int ret; 720 721 + if (!r_l3->alloc_capable || !r_l3data->alloc_capable || 722 + !r_l3code->alloc_capable) 723 return -EINVAL; 724 725 ret = set_l3_qos_cfg(r_l3, true); 726 if (!ret) { 727 + r_l3->alloc_enabled = false; 728 + r_l3data->alloc_enabled = true; 729 + r_l3code->alloc_enabled = true; 730 } 731 return ret; 732 } ··· 734 { 735 struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3]; 736 737 + r->alloc_enabled = r->alloc_capable; 738 739 + if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled) { 740 + rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled = false; 741 + rdt_resources_all[RDT_RESOURCE_L3CODE].alloc_enabled = false; 742 set_l3_qos_cfg(r, false); 743 } 744 } ··· 823 } 824 } 825 826 + static int mkdir_mondata_all(struct kernfs_node *parent_kn, 827 + struct rdtgroup *prgrp, 828 + struct kernfs_node **mon_data_kn); 829 + 830 static struct dentry *rdt_mount(struct file_system_type *fs_type, 831 int flags, const char *unused_dev_name, 832 void *data) 833 { 834 + struct rdt_domain *dom; 835 + struct rdt_resource *r; 836 struct dentry *dentry; 837 int ret; 838 ··· 853 goto out_cdp; 854 } 855 856 + if (rdt_mon_capable) { 857 + ret = mongroup_create_dir(rdtgroup_default.kn, 858 + NULL, "mon_groups", 859 + &kn_mongrp); 860 + if (ret) { 861 + dentry = ERR_PTR(ret); 862 + goto out_info; 863 + } 864 + kernfs_get(kn_mongrp); 865 + 866 + ret = mkdir_mondata_all(rdtgroup_default.kn, 867 + &rdtgroup_default, &kn_mondata); 868 + if (ret) { 869 + dentry = ERR_PTR(ret); 870 + goto out_mongrp; 871 + } 872 + kernfs_get(kn_mondata); 873 + rdtgroup_default.mon.mon_data_kn = kn_mondata; 874 + } 875 + 876 dentry = kernfs_mount(fs_type, flags, rdt_root, 877 RDTGROUP_SUPER_MAGIC, NULL); 878 if (IS_ERR(dentry)) 879 + goto out_mondata; 880 881 + if (rdt_alloc_capable) 882 + static_branch_enable(&rdt_alloc_enable_key); 883 + if (rdt_mon_capable) 884 + static_branch_enable(&rdt_mon_enable_key); 885 + 886 + if (rdt_alloc_capable || rdt_mon_capable) 887 + static_branch_enable(&rdt_enable_key); 888 + 889 + if (is_mbm_enabled()) { 890 + r = &rdt_resources_all[RDT_RESOURCE_L3]; 891 + list_for_each_entry(dom, &r->domains, list) 892 + mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL); 893 + } 894 + 895 goto out; 896 897 + out_mondata: 898 + if (rdt_mon_capable) 899 + kernfs_remove(kn_mondata); 900 + out_mongrp: 901 + if (rdt_mon_capable) 902 + kernfs_remove(kn_mongrp); 903 + out_info: 904 kernfs_remove(kn_info); 905 out_cdp: 906 cdp_disable(); ··· 909 return 0; 910 } 911 912 + static bool is_closid_match(struct task_struct *t, struct rdtgroup *r) 913 + { 914 + return (rdt_alloc_capable && 915 + (r->type == RDTCTRL_GROUP) && (t->closid == r->closid)); 916 + } 917 + 918 + static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r) 919 + { 920 + return (rdt_mon_capable && 921 + (r->type == RDTMON_GROUP) && (t->rmid == r->mon.rmid)); 922 + } 923 + 924 /* 925 * Move tasks from one to the other group. If @from is NULL, then all tasks 926 * in the systems are moved unconditionally (used for teardown). 
··· 924 925 read_lock(&tasklist_lock); 926 for_each_process_thread(p, t) { 927 + if (!from || is_closid_match(t, from) || 928 + is_rmid_match(t, from)) { 929 t->closid = to->closid; 930 + t->rmid = to->mon.rmid; 931 + 932 #ifdef CONFIG_SMP 933 /* 934 * This is safe on x86 w/o barriers as the ordering ··· 944 read_unlock(&tasklist_lock); 945 } 946 947 + static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp) 948 + { 949 + struct rdtgroup *sentry, *stmp; 950 + struct list_head *head; 951 + 952 + head = &rdtgrp->mon.crdtgrp_list; 953 + list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) { 954 + free_rmid(sentry->mon.rmid); 955 + list_del(&sentry->mon.crdtgrp_list); 956 + kfree(sentry); 957 + } 958 + } 959 + 960 /* 961 * Forcibly remove all of subdirectories under root. 962 */ ··· 955 rdt_move_group_tasks(NULL, &rdtgroup_default, NULL); 956 957 list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) { 958 + /* Free any child rmids */ 959 + free_all_child_rdtgrp(rdtgrp); 960 + 961 /* Remove each rdtgroup other than root */ 962 if (rdtgrp == &rdtgroup_default) 963 continue; ··· 967 cpumask_or(&rdtgroup_default.cpu_mask, 968 &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask); 969 970 + free_rmid(rdtgrp->mon.rmid); 971 + 972 kernfs_remove(rdtgrp->kn); 973 list_del(&rdtgrp->rdtgroup_list); 974 kfree(rdtgrp); 975 } 976 /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */ 977 get_online_cpus(); 978 + update_closid_rmid(cpu_online_mask, &rdtgroup_default); 979 put_online_cpus(); 980 981 kernfs_remove(kn_info); 982 + kernfs_remove(kn_mongrp); 983 + kernfs_remove(kn_mondata); 984 } 985 986 static void rdt_kill_sb(struct super_block *sb) ··· 986 mutex_lock(&rdtgroup_mutex); 987 988 /*Put everything back to default values. */ 989 + for_each_alloc_enabled_rdt_resource(r) 990 reset_all_ctrls(r); 991 cdp_disable(); 992 rmdir_all_sub(); 993 + static_branch_disable(&rdt_alloc_enable_key); 994 + static_branch_disable(&rdt_mon_enable_key); 995 static_branch_disable(&rdt_enable_key); 996 kernfs_kill_sb(sb); 997 mutex_unlock(&rdtgroup_mutex); ··· 1001 .kill_sb = rdt_kill_sb, 1002 }; 1003 1004 + static int mon_addfile(struct kernfs_node *parent_kn, const char *name, 1005 + void *priv) 1006 { 1007 struct kernfs_node *kn; 1008 + int ret = 0; 1009 1010 + kn = __kernfs_create_file(parent_kn, name, 0444, 0, 1011 + &kf_mondata_ops, priv, NULL, NULL); 1012 + if (IS_ERR(kn)) 1013 + return PTR_ERR(kn); 1014 1015 + ret = rdtgroup_kn_set_ugid(kn); 1016 + if (ret) { 1017 + kernfs_remove(kn); 1018 + return ret; 1019 + } 1020 1021 + return ret; 1022 + } 1023 + 1024 + /* 1025 + * Remove all subdirectories of mon_data of ctrl_mon groups 1026 + * and monitor groups with given domain id. 
1027 + */ 1028 + void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id) 1029 + { 1030 + struct rdtgroup *prgrp, *crgrp; 1031 + char name[32]; 1032 + 1033 + if (!r->mon_enabled) 1034 + return; 1035 + 1036 + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) { 1037 + sprintf(name, "mon_%s_%02d", r->name, dom_id); 1038 + kernfs_remove_by_name(prgrp->mon.mon_data_kn, name); 1039 + 1040 + list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) 1041 + kernfs_remove_by_name(crgrp->mon.mon_data_kn, name); 1042 + } 1043 + } 1044 + 1045 + static int mkdir_mondata_subdir(struct kernfs_node *parent_kn, 1046 + struct rdt_domain *d, 1047 + struct rdt_resource *r, struct rdtgroup *prgrp) 1048 + { 1049 + union mon_data_bits priv; 1050 + struct kernfs_node *kn; 1051 + struct mon_evt *mevt; 1052 + struct rmid_read rr; 1053 + char name[32]; 1054 + int ret; 1055 + 1056 + sprintf(name, "mon_%s_%02d", r->name, d->id); 1057 + /* create the directory */ 1058 + kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp); 1059 + if (IS_ERR(kn)) 1060 + return PTR_ERR(kn); 1061 + 1062 + /* 1063 + * This extra ref will be put in kernfs_remove() and guarantees 1064 + * that kn is always accessible. 1065 + */ 1066 + kernfs_get(kn); 1067 + ret = rdtgroup_kn_set_ugid(kn); 1068 + if (ret) 1069 + goto out_destroy; 1070 + 1071 + if (WARN_ON(list_empty(&r->evt_list))) { 1072 + ret = -EPERM; 1073 + goto out_destroy; 1074 + } 1075 + 1076 + priv.u.rid = r->rid; 1077 + priv.u.domid = d->id; 1078 + list_for_each_entry(mevt, &r->evt_list, list) { 1079 + priv.u.evtid = mevt->evtid; 1080 + ret = mon_addfile(kn, mevt->name, priv.priv); 1081 + if (ret) 1082 + goto out_destroy; 1083 + 1084 + if (is_mbm_event(mevt->evtid)) 1085 + mon_event_read(&rr, d, prgrp, mevt->evtid, true); 1086 + } 1087 + kernfs_activate(kn); 1088 + return 0; 1089 + 1090 + out_destroy: 1091 + kernfs_remove(kn); 1092 + return ret; 1093 + } 1094 + 1095 + /* 1096 + * Add all subdirectories of mon_data for "ctrl_mon" groups 1097 + * and "monitor" groups with given domain id. 1098 + */ 1099 + void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, 1100 + struct rdt_domain *d) 1101 + { 1102 + struct kernfs_node *parent_kn; 1103 + struct rdtgroup *prgrp, *crgrp; 1104 + struct list_head *head; 1105 + 1106 + if (!r->mon_enabled) 1107 + return; 1108 + 1109 + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) { 1110 + parent_kn = prgrp->mon.mon_data_kn; 1111 + mkdir_mondata_subdir(parent_kn, d, r, prgrp); 1112 + 1113 + head = &prgrp->mon.crdtgrp_list; 1114 + list_for_each_entry(crgrp, head, mon.crdtgrp_list) { 1115 + parent_kn = crgrp->mon.mon_data_kn; 1116 + mkdir_mondata_subdir(parent_kn, d, r, crgrp); 1117 + } 1118 + } 1119 + } 1120 + 1121 + static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn, 1122 + struct rdt_resource *r, 1123 + struct rdtgroup *prgrp) 1124 + { 1125 + struct rdt_domain *dom; 1126 + int ret; 1127 + 1128 + list_for_each_entry(dom, &r->domains, list) { 1129 + ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp); 1130 + if (ret) 1131 + return ret; 1132 + } 1133 + 1134 + return 0; 1135 + } 1136 + 1137 + /* 1138 + * This creates a directory mon_data which contains the monitored data. 1139 + * 1140 + * mon_data has one directory for each domain which are named 1141 + * in the format mon_<domain_name>_<domain_id>. For ex: A mon_data 1142 + * with L3 domain looks as below: 1143 + * ./mon_data: 1144 + * mon_L3_00 1145 + * mon_L3_01 1146 + * mon_L3_02 1147 + * ...
1148 + * 1149 + * Each domain directory has one file per event: 1150 + * ./mon_L3_00/: 1151 + * llc_occupancy 1152 + * 1153 + */ 1154 + static int mkdir_mondata_all(struct kernfs_node *parent_kn, 1155 + struct rdtgroup *prgrp, 1156 + struct kernfs_node **dest_kn) 1157 + { 1158 + struct rdt_resource *r; 1159 + struct kernfs_node *kn; 1160 + int ret; 1161 + 1162 + /* 1163 + * Create the mon_data directory first. 1164 + */ 1165 + ret = mongroup_create_dir(parent_kn, NULL, "mon_data", &kn); 1166 + if (ret) 1167 + return ret; 1168 + 1169 + if (dest_kn) 1170 + *dest_kn = kn; 1171 + 1172 + /* 1173 + * Create the subdirectories for each domain. Note that all events 1174 + * in a domain like L3 are grouped into a resource whose domain is L3 1175 + */ 1176 + for_each_mon_enabled_rdt_resource(r) { 1177 + ret = mkdir_mondata_subdir_alldom(kn, r, prgrp); 1178 + if (ret) 1179 + goto out_destroy; 1180 + } 1181 + 1182 + return 0; 1183 + 1184 + out_destroy: 1185 + kernfs_remove(kn); 1186 + return ret; 1187 + } 1188 + 1189 + static int mkdir_rdt_prepare(struct kernfs_node *parent_kn, 1190 + struct kernfs_node *prgrp_kn, 1191 + const char *name, umode_t mode, 1192 + enum rdt_group_type rtype, struct rdtgroup **r) 1193 + { 1194 + struct rdtgroup *prdtgrp, *rdtgrp; 1195 + struct kernfs_node *kn; 1196 + uint files = 0; 1197 + int ret; 1198 + 1199 + prdtgrp = rdtgroup_kn_lock_live(prgrp_kn); 1200 + if (!prdtgrp) { 1201 ret = -ENODEV; 1202 goto out_unlock; 1203 } 1204 1205 /* allocate the rdtgroup. */ 1206 rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL); 1207 if (!rdtgrp) { 1208 ret = -ENOSPC; 1209 + goto out_unlock; 1210 } 1211 + *r = rdtgrp; 1212 + rdtgrp->mon.parent = prdtgrp; 1213 + rdtgrp->type = rtype; 1214 + INIT_LIST_HEAD(&rdtgrp->mon.crdtgrp_list); 1215 1216 /* kernfs creates the directory for rdtgrp */ 1217 + kn = kernfs_create_dir(parent_kn, name, mode, rdtgrp); 1218 if (IS_ERR(kn)) { 1219 ret = PTR_ERR(kn); 1220 + goto out_free_rgrp; 1221 } 1222 rdtgrp->kn = kn; 1223 ··· 1056 if (ret) 1057 goto out_destroy; 1058 1059 + files = RFTYPE_BASE | RFTYPE_CTRL; 1060 + files = RFTYPE_BASE | BIT(RF_CTRLSHIFT + rtype); 1061 + ret = rdtgroup_add_files(kn, files); 1062 if (ret) 1063 goto out_destroy; 1064 1065 + if (rdt_mon_capable) { 1066 + ret = alloc_rmid(); 1067 + if (ret < 0) 1068 + goto out_destroy; 1069 + rdtgrp->mon.rmid = ret; 1070 + 1071 + ret = mkdir_mondata_all(kn, rdtgrp, &rdtgrp->mon.mon_data_kn); 1072 + if (ret) 1073 + goto out_idfree; 1074 + } 1075 kernfs_activate(kn); 1076 1077 + /* 1078 + * The caller unlocks the prgrp_kn upon success. 1079 + */ 1080 + return 0; 1081 1082 + out_idfree: 1083 + free_rmid(rdtgrp->mon.rmid); 1084 out_destroy: 1085 kernfs_remove(rdtgrp->kn); 1086 + out_free_rgrp: 1087 kfree(rdtgrp); 1088 out_unlock: 1089 + rdtgroup_kn_unlock(prgrp_kn); 1090 return ret; 1091 + } 1092 + 1093 + static void mkdir_rdt_prepare_clean(struct rdtgroup *rgrp) 1094 + { 1095 + kernfs_remove(rgrp->kn); 1096 + free_rmid(rgrp->mon.rmid); 1097 + kfree(rgrp); 1098 + } 1099 + 1100 + /* 1101 + * Create a monitor group under "mon_groups" directory of a control 1102 + * and monitor group(ctrl_mon). This is a resource group 1103 + * to monitor a subset of tasks and cpus in its parent ctrl_mon group. 
1104 + */ 1105 + static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn, 1106 + struct kernfs_node *prgrp_kn, 1107 + const char *name, 1108 + umode_t mode) 1109 + { 1110 + struct rdtgroup *rdtgrp, *prgrp; 1111 + int ret; 1112 + 1113 + ret = mkdir_rdt_prepare(parent_kn, prgrp_kn, name, mode, RDTMON_GROUP, 1114 + &rdtgrp); 1115 + if (ret) 1116 + return ret; 1117 + 1118 + prgrp = rdtgrp->mon.parent; 1119 + rdtgrp->closid = prgrp->closid; 1120 + 1121 + /* 1122 + * Add the rdtgrp to the list of rdtgrps the parent 1123 + * ctrl_mon group has to track. 1124 + */ 1125 + list_add_tail(&rdtgrp->mon.crdtgrp_list, &prgrp->mon.crdtgrp_list); 1126 + 1127 + rdtgroup_kn_unlock(prgrp_kn); 1128 + return ret; 1129 + } 1130 + 1131 + /* 1132 + * These are rdtgroups created under the root directory. Can be used 1133 + * to allocate and monitor resources. 1134 + */ 1135 + static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn, 1136 + struct kernfs_node *prgrp_kn, 1137 + const char *name, umode_t mode) 1138 + { 1139 + struct rdtgroup *rdtgrp; 1140 + struct kernfs_node *kn; 1141 + u32 closid; 1142 + int ret; 1143 + 1144 + ret = mkdir_rdt_prepare(parent_kn, prgrp_kn, name, mode, RDTCTRL_GROUP, 1145 + &rdtgrp); 1146 + if (ret) 1147 + return ret; 1148 + 1149 + kn = rdtgrp->kn; 1150 + ret = closid_alloc(); 1151 + if (ret < 0) 1152 + goto out_common_fail; 1153 + closid = ret; 1154 + 1155 + rdtgrp->closid = closid; 1156 + list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups); 1157 + 1158 + if (rdt_mon_capable) { 1159 + /* 1160 + * Create an empty mon_groups directory to hold the subset 1161 + * of tasks and cpus to monitor. 1162 + */ 1163 + ret = mongroup_create_dir(kn, NULL, "mon_groups", NULL); 1164 + if (ret) 1165 + goto out_id_free; 1166 + } 1167 + 1168 + goto out_unlock; 1169 + 1170 + out_id_free: 1171 + closid_free(closid); 1172 + list_del(&rdtgrp->rdtgroup_list); 1173 + out_common_fail: 1174 + mkdir_rdt_prepare_clean(rdtgrp); 1175 + out_unlock: 1176 + rdtgroup_kn_unlock(prgrp_kn); 1177 + return ret; 1178 + } 1179 + 1180 + /* 1181 + * We allow creating mon groups only within a directory called "mon_groups" 1182 + * which is present in every ctrl_mon group. Check if this is a valid 1183 + * "mon_groups" directory. 1184 + * 1185 + * 1. The directory should be named "mon_groups". 1186 + * 2. The mon group itself should "not" be named "mon_groups". 1187 + * This makes sure "mon_groups" directory always has a ctrl_mon group 1188 + * as parent. 1189 + */ 1190 + static bool is_mon_groups(struct kernfs_node *kn, const char *name) 1191 + { 1192 + return (!strcmp(kn->name, "mon_groups") && 1193 + strcmp(name, "mon_groups")); 1194 + } 1195 + 1196 + static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name, 1197 + umode_t mode) 1198 + { 1199 + /* Do not accept '\n' to avoid an unparsable situation. */ 1200 + if (strchr(name, '\n')) 1201 + return -EINVAL; 1202 + 1203 + /* 1204 + * If the parent directory is the root directory and RDT 1205 + * allocation is supported, add a control and monitoring 1206 + * subdirectory 1207 + */ 1208 + if (rdt_alloc_capable && parent_kn == rdtgroup_default.kn) 1209 + return rdtgroup_mkdir_ctrl_mon(parent_kn, parent_kn, name, mode); 1210 + 1211 + /* 1212 + * If RDT monitoring is supported and the parent directory is a valid 1213 + * "mon_groups" directory, add a monitoring subdirectory.
1214 + */ 1215 + if (rdt_mon_capable && is_mon_groups(parent_kn, name)) 1216 + return rdtgroup_mkdir_mon(parent_kn, parent_kn->parent, name, mode); 1217 + 1218 + return -EPERM; 1219 + } 1220 + 1221 + static int rdtgroup_rmdir_mon(struct kernfs_node *kn, struct rdtgroup *rdtgrp, 1222 + cpumask_var_t tmpmask) 1223 + { 1224 + struct rdtgroup *prdtgrp = rdtgrp->mon.parent; 1225 + int cpu; 1226 + 1227 + /* Give any tasks back to the parent group */ 1228 + rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask); 1229 + 1230 + /* Update per cpu rmid of the moved CPUs first */ 1231 + for_each_cpu(cpu, &rdtgrp->cpu_mask) 1232 + per_cpu(pqr_state.default_rmid, cpu) = prdtgrp->mon.rmid; 1233 + /* 1234 + * Update the MSR on moved CPUs and CPUs which have moved 1235 + * task running on them. 1236 + */ 1237 + cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask); 1238 + update_closid_rmid(tmpmask, NULL); 1239 + 1240 + rdtgrp->flags = RDT_DELETED; 1241 + free_rmid(rdtgrp->mon.rmid); 1242 + 1243 + /* 1244 + * Remove the rdtgrp from the parent ctrl_mon group's list 1245 + */ 1246 + WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list)); 1247 + list_del(&rdtgrp->mon.crdtgrp_list); 1248 + 1249 + /* 1250 + * one extra hold on this, will drop when we kfree(rdtgrp) 1251 + * in rdtgroup_kn_unlock() 1252 + */ 1253 + kernfs_get(kn); 1254 + kernfs_remove(rdtgrp->kn); 1255 + 1256 + return 0; 1257 + } 1258 + 1259 + static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp, 1260 + cpumask_var_t tmpmask) 1261 + { 1262 + int cpu; 1263 + 1264 + /* Give any tasks back to the default group */ 1265 + rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask); 1266 + 1267 + /* Give any CPUs back to the default group */ 1268 + cpumask_or(&rdtgroup_default.cpu_mask, 1269 + &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask); 1270 + 1271 + /* Update per cpu closid and rmid of the moved CPUs first */ 1272 + for_each_cpu(cpu, &rdtgrp->cpu_mask) { 1273 + per_cpu(pqr_state.default_closid, cpu) = rdtgroup_default.closid; 1274 + per_cpu(pqr_state.default_rmid, cpu) = rdtgroup_default.mon.rmid; 1275 + } 1276 + 1277 + /* 1278 + * Update the MSR on moved CPUs and CPUs which have moved 1279 + * task running on them. 1280 + */ 1281 + cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask); 1282 + update_closid_rmid(tmpmask, NULL); 1283 + 1284 + rdtgrp->flags = RDT_DELETED; 1285 + closid_free(rdtgrp->closid); 1286 + free_rmid(rdtgrp->mon.rmid); 1287 + 1288 + /* 1289 + * Free all the child monitor group rmids. 1290 + */ 1291 + free_all_child_rdtgrp(rdtgrp); 1292 + 1293 + list_del(&rdtgrp->rdtgroup_list); 1294 + 1295 + /* 1296 + * one extra hold on this, will drop when we kfree(rdtgrp) 1297 + * in rdtgroup_kn_unlock() 1298 + */ 1299 + kernfs_get(kn); 1300 + kernfs_remove(rdtgrp->kn); 1301 + 1302 + return 0; 1303 } 1304 1305 static int rdtgroup_rmdir(struct kernfs_node *kn) 1306 { 1307 + struct kernfs_node *parent_kn = kn->parent; 1308 struct rdtgroup *rdtgrp; 1309 cpumask_var_t tmpmask; 1310 + int ret = 0; 1311 1312 if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) 1313 return -ENOMEM; ··· 1093 goto out; 1094 } 1095 1096 /* 1097 + * If the rdtgroup is a ctrl_mon group and parent directory 1098 + * is the root directory, remove the ctrl_mon group. 1099 + * 1100 + * If the rdtgroup is a mon group and parent directory 1101 + * is a valid "mon_groups" directory, remove the mon group. 
1102 */ 1103 + if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn) 1104 + ret = rdtgroup_rmdir_ctrl(kn, rdtgrp, tmpmask); 1105 + else if (rdtgrp->type == RDTMON_GROUP && 1106 + is_mon_groups(parent_kn, kn->name)) 1107 + ret = rdtgroup_rmdir_mon(kn, rdtgrp, tmpmask); 1108 + else 1109 + ret = -EPERM; 1110 1111 out: 1112 rdtgroup_kn_unlock(kn); 1113 free_cpumask_var(tmpmask); ··· 1129 1130 static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) 1131 { 1132 + if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled) 1133 seq_puts(seq, ",cdp"); 1134 return 0; 1135 } ··· 1153 mutex_lock(&rdtgroup_mutex); 1154 1155 rdtgroup_default.closid = 0; 1156 + rdtgroup_default.mon.rmid = 0; 1157 + rdtgroup_default.type = RDTCTRL_GROUP; 1158 + INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list); 1159 + 1160 list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups); 1161 1162 + ret = rdtgroup_add_files(rdt_root->kn, RF_CTRL_BASE); 1163 if (ret) { 1164 kernfs_destroy_root(rdt_root); 1165 goto out;
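
The rdtgroup_mkdir()/rdtgroup_rmdir() handlers above define the user-visible rules: ctrl_mon groups may only be created directly under the resctrl root, and plain monitor groups only inside an existing "mon_groups" directory; removal is split the same way between rdtgroup_rmdir_ctrl() and rdtgroup_rmdir_mon(). A minimal userspace sketch of those rules follows. It is illustrative only, assumes resctrl is already mounted at /sys/fs/resctrl, and the group names "grp0" and "mon0" are arbitrary examples.

	/*
	 * Illustrative only, not part of this patch set: exercises the
	 * directory rules enforced by rdtgroup_mkdir()/rdtgroup_rmdir().
	 */
	#include <errno.h>
	#include <stdio.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		const char *ctrl = "/sys/fs/resctrl/grp0";
		const char *mon  = "/sys/fs/resctrl/grp0/mon_groups/mon0";

		/* ctrl_mon groups may only be created directly under the root... */
		if (mkdir(ctrl, 0755) && errno != EEXIST) {
			perror("mkdir ctrl_mon group");
			return 1;
		}

		/* ...mon groups only inside an existing "mon_groups" directory. */
		if (mkdir(mon, 0755) && errno != EEXIST) {
			perror("mkdir mon group");
			return 1;
		}

		/* Removal follows the same split: rdtgroup_rmdir_mon()/_ctrl(). */
		rmdir(mon);
		rmdir(ctrl);
		return 0;
	}
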
+61 -6
arch/x86/kernel/cpu/intel_rdt_schemata.c arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
··· 26 #include <linux/kernfs.h> 27 #include <linux/seq_file.h> 28 #include <linux/slab.h> 29 - #include <asm/intel_rdt.h> 30 31 /* 32 * Check whether MBA bandwidth percentage value is correct. The value is ··· 192 { 193 struct rdt_resource *r; 194 195 - for_each_enabled_rdt_resource(r) { 196 if (!strcmp(resname, r->name) && closid < r->num_closid) 197 return parse_line(tok, r); 198 } ··· 221 222 closid = rdtgrp->closid; 223 224 - for_each_enabled_rdt_resource(r) { 225 list_for_each_entry(dom, &r->domains, list) 226 dom->have_new_ctrl = false; 227 } ··· 237 goto out; 238 } 239 240 - for_each_enabled_rdt_resource(r) { 241 ret = update_domains(r, closid); 242 if (ret) 243 goto out; ··· 269 { 270 struct rdtgroup *rdtgrp; 271 struct rdt_resource *r; 272 - int closid, ret = 0; 273 274 rdtgrp = rdtgroup_kn_lock_live(of->kn); 275 if (rdtgrp) { 276 closid = rdtgrp->closid; 277 - for_each_enabled_rdt_resource(r) { 278 if (closid < r->num_closid) 279 show_doms(s, r, closid); 280 } 281 } else { 282 ret = -ENOENT; 283 } 284 rdtgroup_kn_unlock(of->kn); 285 return ret; 286 }
··· 26 #include <linux/kernfs.h> 27 #include <linux/seq_file.h> 28 #include <linux/slab.h> 29 + #include "intel_rdt.h" 30 31 /* 32 * Check whether MBA bandwidth percentage value is correct. The value is ··· 192 { 193 struct rdt_resource *r; 194 195 + for_each_alloc_enabled_rdt_resource(r) { 196 if (!strcmp(resname, r->name) && closid < r->num_closid) 197 return parse_line(tok, r); 198 } ··· 221 222 closid = rdtgrp->closid; 223 224 + for_each_alloc_enabled_rdt_resource(r) { 225 list_for_each_entry(dom, &r->domains, list) 226 dom->have_new_ctrl = false; 227 } ··· 237 goto out; 238 } 239 240 + for_each_alloc_enabled_rdt_resource(r) { 241 ret = update_domains(r, closid); 242 if (ret) 243 goto out; ··· 269 { 270 struct rdtgroup *rdtgrp; 271 struct rdt_resource *r; 272 + int ret = 0; 273 + u32 closid; 274 275 rdtgrp = rdtgroup_kn_lock_live(of->kn); 276 if (rdtgrp) { 277 closid = rdtgrp->closid; 278 + for_each_alloc_enabled_rdt_resource(r) { 279 if (closid < r->num_closid) 280 show_doms(s, r, closid); 281 } 282 } else { 283 ret = -ENOENT; 284 } 285 + rdtgroup_kn_unlock(of->kn); 286 + return ret; 287 + } 288 + 289 + void mon_event_read(struct rmid_read *rr, struct rdt_domain *d, 290 + struct rdtgroup *rdtgrp, int evtid, int first) 291 + { 292 + /* 293 + * setup the parameters to send to the IPI to read the data. 294 + */ 295 + rr->rgrp = rdtgrp; 296 + rr->evtid = evtid; 297 + rr->d = d; 298 + rr->val = 0; 299 + rr->first = first; 300 + 301 + smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1); 302 + } 303 + 304 + int rdtgroup_mondata_show(struct seq_file *m, void *arg) 305 + { 306 + struct kernfs_open_file *of = m->private; 307 + u32 resid, evtid, domid; 308 + struct rdtgroup *rdtgrp; 309 + struct rdt_resource *r; 310 + union mon_data_bits md; 311 + struct rdt_domain *d; 312 + struct rmid_read rr; 313 + int ret = 0; 314 + 315 + rdtgrp = rdtgroup_kn_lock_live(of->kn); 316 + 317 + md.priv = of->kn->priv; 318 + resid = md.u.rid; 319 + domid = md.u.domid; 320 + evtid = md.u.evtid; 321 + 322 + r = &rdt_resources_all[resid]; 323 + d = rdt_find_domain(r, domid, NULL); 324 + if (!d) { 325 + ret = -ENOENT; 326 + goto out; 327 + } 328 + 329 + mon_event_read(&rr, d, rdtgrp, evtid, false); 330 + 331 + if (rr.val & RMID_VAL_ERROR) 332 + seq_puts(m, "Error\n"); 333 + else if (rr.val & RMID_VAL_UNAVAIL) 334 + seq_puts(m, "Unavailable\n"); 335 + else 336 + seq_printf(m, "%llu\n", rr.val * r->mon_scale); 337 + 338 + out: 339 rdtgroup_kn_unlock(of->kn); 340 return ret; 341 }
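
rdtgroup_mondata_show() above backs a read of each per-domain event file under mon_data: it unpacks the resource/domain/event IDs from the kernfs private data, triggers an IPI-based counter read via mon_event_read(), and prints either "Error", "Unavailable", or the counter value scaled by r->mon_scale. A small, hypothetical userspace reader handling those three outputs is sketched below; the path assumes the default group's mon_data directory and L3 domain 0.

	/* Illustrative reader for a mon_data event file (not part of the patch). */
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		const char *path = "/sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy";
		char buf[64];
		unsigned long long bytes;
		FILE *f = fopen(path, "r");

		if (!f) {
			perror(path);
			return 1;
		}
		if (!fgets(buf, sizeof(buf), f)) {
			fclose(f);
			return 1;
		}
		fclose(f);

		if (!strncmp(buf, "Error", 5))
			printf("counter read reported an error\n");
		else if (!strncmp(buf, "Unavailable", 11))
			printf("no data available for this RMID\n");
		else if (sscanf(buf, "%llu", &bytes) == 1)
			printf("llc_occupancy: %llu bytes\n", bytes);
		return 0;
	}
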
+1 -1
arch/x86/kernel/process_32.c
··· 56 #include <asm/debugreg.h> 57 #include <asm/switch_to.h> 58 #include <asm/vm86.h> 59 - #include <asm/intel_rdt.h> 60 #include <asm/proto.h> 61 62 void __show_regs(struct pt_regs *regs, int all)
··· 56 #include <asm/debugreg.h> 57 #include <asm/switch_to.h> 58 #include <asm/vm86.h> 59 + #include <asm/intel_rdt_sched.h> 60 #include <asm/proto.h> 61 62 void __show_regs(struct pt_regs *regs, int all)
+1 -1
arch/x86/kernel/process_64.c
··· 52 #include <asm/switch_to.h> 53 #include <asm/xen/hypervisor.h> 54 #include <asm/vdso.h> 55 - #include <asm/intel_rdt.h> 56 #include <asm/unistd.h> 57 #ifdef CONFIG_IA32_EMULATION 58 /* Not included via unistd.h */
··· 52 #include <asm/switch_to.h> 53 #include <asm/xen/hypervisor.h> 54 #include <asm/vdso.h> 55 + #include <asm/intel_rdt_sched.h> 56 #include <asm/unistd.h> 57 #ifdef CONFIG_IA32_EMULATION 58 /* Not included via unistd.h */
-18
include/linux/perf_event.h
··· 139 /* for tp_event->class */ 140 struct list_head tp_list; 141 }; 142 - struct { /* intel_cqm */ 143 - int cqm_state; 144 - u32 cqm_rmid; 145 - int is_group_event; 146 - struct list_head cqm_events_entry; 147 - struct list_head cqm_groups_entry; 148 - struct list_head cqm_group_entry; 149 - }; 150 struct { /* amd_power */ 151 u64 pwr_acc; 152 u64 ptsc; ··· 404 */ 405 size_t task_ctx_size; 406 407 - 408 - /* 409 - * Return the count value for a counter. 410 - */ 411 - u64 (*count) (struct perf_event *event); /*optional*/ 412 413 /* 414 * Set up pmu-private data structures for an AUX area ··· 1097 1098 if (static_branch_unlikely(&perf_sched_events)) 1099 __perf_event_task_sched_out(prev, next); 1100 - } 1101 - 1102 - static inline u64 __perf_event_count(struct perf_event *event) 1103 - { 1104 - return local64_read(&event->count) + atomic64_read(&event->child_count); 1105 } 1106 1107 extern void perf_event_mmap(struct vm_area_struct *vma);
··· 139 /* for tp_event->class */ 140 struct list_head tp_list; 141 }; 142 struct { /* amd_power */ 143 u64 pwr_acc; 144 u64 ptsc; ··· 412 */ 413 size_t task_ctx_size; 414 415 416 /* 417 * Set up pmu-private data structures for an AUX area ··· 1110 1111 if (static_branch_unlikely(&perf_sched_events)) 1112 __perf_event_task_sched_out(prev, next); 1113 } 1114 1115 extern void perf_event_mmap(struct vm_area_struct *vma);
+3 -2
include/linux/sched.h
··· 909 /* cg_list protected by css_set_lock and tsk->alloc_lock: */ 910 struct list_head cg_list; 911 #endif 912 - #ifdef CONFIG_INTEL_RDT_A 913 - int closid; 914 #endif 915 #ifdef CONFIG_FUTEX 916 struct robust_list_head __user *robust_list;
··· 909 /* cg_list protected by css_set_lock and tsk->alloc_lock: */ 910 struct list_head cg_list; 911 #endif 912 + #ifdef CONFIG_INTEL_RDT 913 + u32 closid; 914 + u32 rmid; 915 #endif 916 #ifdef CONFIG_FUTEX 917 struct robust_list_head __user *robust_list;
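
Keeping both a CLOSID and an RMID per task is what lets the context-switch path choose between the task's own IDs and the per-CPU defaults kept in pqr_state (see the default_closid/default_rmid updates in the rmdir paths earlier), writing IA32_PQR_ASSOC only when the values actually change. The sketch below is a simplified illustration, not the verbatim implementation (which lives in the new asm/intel_rdt_sched.h and the RDT core); the cur_closid/cur_rmid member names are assumptions made for this example. IA32_PQR_ASSOC carries the RMID in its low 32 bits and the CLOSID in its high 32 bits, which matches the wrmsr() argument order used here.

	/*
	 * Simplified sketch of how the per-task closid/rmid are consumed at
	 * context switch time; "cur_closid"/"cur_rmid" are assumed names.
	 */
	static void rdt_sched_in_sketch(struct task_struct *tsk)
	{
		struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
		u32 closid = state->default_closid;
		u32 rmid = state->default_rmid;

		if (tsk->closid)	/* task was placed in a ctrl_mon group */
			closid = tsk->closid;
		if (tsk->rmid)		/* task was placed in a mon group */
			rmid = tsk->rmid;

		/* Avoid the costly MSR write unless something changed. */
		if (closid != state->cur_closid || rmid != state->cur_rmid) {
			state->cur_closid = closid;
			state->cur_rmid = rmid;
			wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
		}
	}
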
+1 -13
kernel/events/core.c
··· 3673 3674 static inline u64 perf_event_count(struct perf_event *event) 3675 { 3676 - if (event->pmu->count) 3677 - return event->pmu->count(event); 3678 - 3679 - return __perf_event_count(event); 3680 } 3681 3682 /* ··· 3700 * all child counters from atomic context. 3701 */ 3702 if (event->attr.inherit) { 3703 - ret = -EOPNOTSUPP; 3704 - goto out; 3705 - } 3706 - 3707 - /* 3708 - * It must not have a pmu::count method, those are not 3709 - * NMI safe. 3710 - */ 3711 - if (event->pmu->count) { 3712 ret = -EOPNOTSUPP; 3713 goto out; 3714 }
··· 3673 3674 static inline u64 perf_event_count(struct perf_event *event) 3675 { 3676 + return local64_read(&event->count) + atomic64_read(&event->child_count); 3677 } 3678 3679 /* ··· 3703 * all child counters from atomic context. 3704 */ 3705 if (event->attr.inherit) { 3706 ret = -EOPNOTSUPP; 3707 goto out; 3708 }