Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

- Add support for AMD's Smart Data Cache Injection feature which allows
for direct insertion of data from I/O devices into the L3 cache, thus
bypassing DRAM and saving its bandwidth; the resctrl side of the
feature allows the size of the L3 used for data injection to be
controlled

- Add Intel Clearwater Forest to the list of CPUs which support
Sub-NUMA clustering

- Other fixes and cleanups

* tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Update bit_usage to reflect io_alloc
fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks
fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID
fs/resctrl: Introduce interface to display io_alloc CBMs
fs/resctrl: Add user interface to enable/disable io_alloc feature
fs/resctrl: Introduce interface to display "io_alloc" support
x86,fs/resctrl: Implement "io_alloc" enable/disable handlers
x86,fs/resctrl: Detect io_alloc feature
x86/resctrl: Add SDCIAE feature in the command line options
x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement
fs/resctrl: Consider sparse masks when initializing new group's allocation
x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest

+580 -47
+1 -1
Documentation/admin-guide/kernel-parameters.txt
···
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec, abmc.
+			mba, smba, bmec, abmc, sdciae.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
+109 -25
Documentation/filesystems/resctrl.rst
···
 This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
 flag bits:
 
-=============================================== ================================
-RDT (Resource Director Technology) Allocation	"rdt_a"
-CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
-CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
-CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
-MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
-MBA (Memory Bandwidth Allocation)		"mba"
-SMBA (Slow Memory Bandwidth Allocation)		""
-BMEC (Bandwidth Monitoring Event Configuration)	""
-ABMC (Assignable Bandwidth Monitoring Counters)	""
-=============================================== ================================
+=============================================================== ================================
+RDT (Resource Director Technology) Allocation			"rdt_a"
+CAT (Cache Allocation Technology)				"cat_l3", "cat_l2"
+CDP (Code and Data Prioritization)				"cdp_l3", "cdp_l2"
+CQM (Cache QoS Monitoring)					"cqm_llc", "cqm_occup_llc"
+MBM (Memory Bandwidth Monitoring)				"cqm_mbm_total", "cqm_mbm_local"
+MBA (Memory Bandwidth Allocation)				"mba"
+SMBA (Slow Memory Bandwidth Allocation)				""
+BMEC (Bandwidth Monitoring Event Configuration)			""
+ABMC (Assignable Bandwidth Monitoring Counters)			""
+SDCIAE (Smart Data Cache Injection Allocation Enforcement)	""
+=============================================================== ================================
 
 Historically, new features were made visible by default in /proc/cpuinfo. This
 resulted in the feature flags becoming hard to parse by humans. Adding a new
···
 resources. Each resource has its own subdirectory. The subdirectory
 names reflect the resource names.
 
+Most of the files in the resource's subdirectory are read-only, and
+describe properties of the resource. Resources that support global
+configuration options also include writable files that can be used
+to modify those settings.
+
 Each subdirectory contains the following files with respect to
 allocation:
···
 	must be set when writing a mask.
 
 "shareable_bits":
-		Bitmask of shareable resource with other executing
-		entities (e.g. I/O). User can use this when
-		setting up exclusive cache partitions. Note that
-		some platforms support devices that have their
-		own settings for cache use which can over-ride
-		these bits.
+		Bitmask of shareable resource with other executing entities
+		(e.g. I/O). Applies to all instances of this resource. User
+		can use this when setting up exclusive cache partitions.
+		Note that some platforms support devices that have their
+		own settings for cache use which can over-ride these bits.
+
+		When "io_alloc" is enabled, a portion of each cache instance can
+		be configured for shared use between hardware and software.
+		"bit_usage" should be used to see which portions of each cache
+		instance is configured for hardware use via "io_alloc" feature
+		because every cache instance can have its "io_alloc" bitmask
+		configured independently via "io_alloc_cbm".
+
 "bit_usage":
 		Annotated capacity bitmasks showing how all
 		instances of the resource are used. The legend is:
···
 		"H":
 			Corresponding region is used by hardware only
 			but available for software use. If a resource
-			has bits set in "shareable_bits" but not all
-			of these bits appear in the resource groups'
-			schematas then the bits appearing in
-			"shareable_bits" but no resource group will
-			be marked as "H".
+			has bits set in "shareable_bits" or "io_alloc_cbm"
+			but not all of these bits appear in the resource
+			groups' schemata then the bits appearing in
+			"shareable_bits" or "io_alloc_cbm" but no
+			resource group will be marked as "H".
 		"X":
 			Corresponding region is available for sharing and
-			used by hardware and software. These are the
-			bits that appear in "shareable_bits" as
-			well as a resource group's allocation.
+			used by hardware and software. These are the bits
+			that appear in "shareable_bits" or "io_alloc_cbm"
+			as well as a resource group's allocation.
 		"S":
 			Corresponding region is used by software
 			and available for sharing.
···
 			Only contiguous 1s value in CBM is supported.
 		"1":
 			Non-contiguous 1s value in CBM is supported.
+
+"io_alloc":
+		"io_alloc" enables system software to configure the portion of
+		the cache allocated for I/O traffic. File may only exist if the
+		system supports this feature on some of its cache resources.
+
+		"disabled":
+			Resource supports "io_alloc" but the feature is disabled.
+			Portions of cache used for allocation of I/O traffic cannot
+			be configured.
+		"enabled":
+			Portions of cache used for allocation of I/O traffic
+			can be configured using "io_alloc_cbm".
+		"not supported":
+			Support not available for this resource.
+
+		The feature can be modified by writing to the interface, for example:
+
+		To enable::
+
+			# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
+
+		To disable::
+
+			# echo 0 > /sys/fs/resctrl/info/L3/io_alloc
+
+		The underlying implementation may reduce resources available to
+		general (CPU) cache allocation. See architecture specific notes
+		below. Depending on usage requirements the feature can be enabled
+		or disabled.
+
+		On AMD systems, io_alloc feature is supported by the L3 Smart
+		Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
+		io_alloc is the highest CLOSID supported by the resource. When
+		io_alloc is enabled, the highest CLOSID is dedicated to io_alloc and
+		no longer available for general (CPU) cache allocation. When CDP is
+		enabled, io_alloc routes I/O traffic using the highest CLOSID allocated
+		for the instruction cache (CDP_CODE), making this CLOSID no longer
+		available for general (CPU) cache allocation for both the CDP_CODE
+		and CDP_DATA resources.
+
+"io_alloc_cbm":
+		Capacity bitmasks that describe the portions of cache instances to
+		which I/O traffic from supported I/O devices are routed when "io_alloc"
+		is enabled.
+
+		CBMs are displayed in the following format:
+
+			<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+		Example::
+
+			# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+			0=ffff;1=ffff
+
+		CBMs can be configured by writing to the interface.
+
+		Example::
+
+			# echo 1=ff > /sys/fs/resctrl/info/L3/io_alloc_cbm
+			# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+			0=ffff;1=00ff
+
+			# echo "0=ff;1=f" > /sys/fs/resctrl/info/L3/io_alloc_cbm
+			# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+			0=00ff;1=000f
+
+		When CDP is enabled "io_alloc_cbm" associated with the CDP_DATA and CDP_CODE
+		resources may reflect the same values. For example, values read from and
+		written to /sys/fs/resctrl/info/L3DATA/io_alloc_cbm may be reflected by
+		/sys/fs/resctrl/info/L3CODE/io_alloc_cbm and vice versa.
 
 Memory bandwidth(MB) subdirectory contains the following files
 with respect to allocation:
+2
arch/x86/include/asm/cpufeatures.h
···
 #define X86_FEATURE_ABMC		(21*32+15) /* Assignable Bandwidth Monitoring Counters */
 #define X86_FEATURE_MSR_IMM		(21*32+16) /* MSR immediate form instructions */
 
+#define X86_FEATURE_SDCIAE		(21*32+18) /* L3 Smart Data Cache Injection Allocation Enforcement */
+
 /*
  * BUG word(s)
  */
+1
arch/x86/kernel/cpu/cpuid-deps.c
···
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL },
 	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL },
+	{ X86_FEATURE_SDCIAE,			X86_FEATURE_CAT_L3 },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
+9
arch/x86/kernel/cpu/resctrl/core.c
···
 		rdt_resources_all[level].r_resctrl.cdp_capable = true;
 }
 
+static void rdt_set_io_alloc_capable(struct rdt_resource *r)
+{
+	r->cache.io_alloc_capable = true;
+}
+
 static void rdt_get_cdp_l3_config(void)
 {
 	rdt_get_cdp_config(RDT_RESOURCE_L3);
···
 	RDT_FLAG_SMBA,
 	RDT_FLAG_BMEC,
 	RDT_FLAG_ABMC,
+	RDT_FLAG_SDCIAE,
 };
 
 #define RDT_OPT(idx, n, f)	\
···
 	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
 	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
 	RDT_OPT(RDT_FLAG_ABMC,	    "abmc",	X86_FEATURE_ABMC),
+	RDT_OPT(RDT_FLAG_SDCIAE,    "sdciae",	X86_FEATURE_SDCIAE),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
···
 		rdt_get_cache_alloc_cfg(1, r);
 		if (rdt_cpu_has(X86_FEATURE_CDP_L3))
 			rdt_get_cdp_l3_config();
+		if (rdt_cpu_has(X86_FEATURE_SDCIAE))
+			rdt_set_io_alloc_capable(r);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CAT_L2)) {
+40
arch/x86/kernel/cpu/resctrl/ctrlmondata.c
···
 
 	return hw_dom->ctrl_val[idx];
 }
+
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r)
+{
+	return resctrl_to_arch_res(r)->sdciae_enabled;
+}
+
+static void resctrl_sdciae_set_one_amd(void *arg)
+{
+	bool *enable = arg;
+
+	if (*enable)
+		msr_set_bit(MSR_IA32_L3_QOS_EXT_CFG, SDCIAE_ENABLE_BIT);
+	else
+		msr_clear_bit(MSR_IA32_L3_QOS_EXT_CFG, SDCIAE_ENABLE_BIT);
+}
+
+static void _resctrl_sdciae_enable(struct rdt_resource *r, bool enable)
+{
+	struct rdt_ctrl_domain *d;
+
+	/* Walking r->ctrl_domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
+	/* Update MSR_IA32_L3_QOS_EXT_CFG MSR on all the CPUs in all domains */
+	list_for_each_entry(d, &r->ctrl_domains, hdr.list)
+		on_each_cpu_mask(&d->hdr.cpu_mask, resctrl_sdciae_set_one_amd, &enable, 1);
+}
+
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+	if (hw_res->r_resctrl.cache.io_alloc_capable &&
+	    hw_res->sdciae_enabled != enable) {
+		_resctrl_sdciae_enable(r, enable);
+		hw_res->sdciae_enabled = enable;
+	}
+
+	return 0;
+}
+5
arch/x86/kernel/cpu/resctrl/internal.h
···
 #define ABMC_EXTENDED_EVT_ID	BIT(31)
 #define ABMC_EVT_ID		BIT(0)
 
+/* Setting bit 1 in MSR_IA32_L3_QOS_EXT_CFG enables the SDCIAE feature. */
+#define SDCIAE_ENABLE_BIT	1
+
 /**
  * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
  *			       a resource for a control function
···
  * @mbm_width:		Monitor width, to detect and correct for overflow.
  * @cdp_enabled:	CDP state of this resource
  * @mbm_cntr_assign_enabled:	ABMC feature is enabled
+ * @sdciae_enabled:	SDCIAE feature (backing "io_alloc") is enabled.
  *
  * Members of this structure are either private to the architecture
  * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
···
 	unsigned int		mbm_width;
 	bool			cdp_enabled;
 	bool			mbm_cntr_assign_enabled;
+	bool			sdciae_enabled;
 };
 
 static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
+1
arch/x86/kernel/cpu/resctrl/monitor.c
···
 	X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, 0),
 	X86_MATCH_VFM(INTEL_GRANITERAPIDS_X, 0),
 	X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X, 0),
+	X86_MATCH_VFM(INTEL_ATOM_DARKMONT_X, 0),
 	{}
 };
+1
arch/x86/kernel/cpu/scattered.c
···
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_ABMC,		CPUID_EBX,  5, 0x80000020, 0 },
+	{ X86_FEATURE_SDCIAE,		CPUID_EBX,  6, 0x80000020, 0 },
 	{ X86_FEATURE_TSA_SQ_NO,	CPUID_ECX,  1, 0x80000021, 0 },
 	{ X86_FEATURE_TSA_L1_NO,	CPUID_ECX,  2, 0x80000021, 0 },
 	{ X86_FEATURE_AMD_WORKLOAD_CLASS, CPUID_EAX, 22, 0x80000021, 0 },
+295 -14
fs/resctrl/ctrlmondata.c
···
 #include "internal.h"
 
 struct rdt_parse_data {
-	struct rdtgroup		*rdtgrp;
+	u32			closid;
+	enum rdtgrp_mode	mode;
 	char			*buf;
 };
···
 		    struct rdt_ctrl_domain *d)
 {
 	struct resctrl_staged_config *cfg;
-	u32 closid = data->rdtgrp->closid;
 	struct rdt_resource *r = s->res;
+	u32 closid = data->closid;
 	u32 bw_val;
 
 	cfg = &d->staged_config[s->conf_type];
···
 static int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
 		     struct rdt_ctrl_domain *d)
 {
-	struct rdtgroup *rdtgrp = data->rdtgrp;
+	enum rdtgrp_mode mode = data->mode;
 	struct resctrl_staged_config *cfg;
 	struct rdt_resource *r = s->res;
+	u32 closid = data->closid;
 	u32 cbm_val;
 
 	cfg = &d->staged_config[s->conf_type];
···
 	 * Cannot set up more than one pseudo-locked region in a cache
 	 * hierarchy.
 	 */
-	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
+	if (mode == RDT_MODE_PSEUDO_LOCKSETUP &&
 	    rdtgroup_pseudo_locked_in_hierarchy(d)) {
 		rdt_last_cmd_puts("Pseudo-locked region in hierarchy\n");
 		return -EINVAL;
···
 	if (!cbm_validate(data->buf, &cbm_val, r))
 		return -EINVAL;
 
-	if ((rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
-	     rdtgrp->mode == RDT_MODE_SHAREABLE) &&
+	if ((mode == RDT_MODE_EXCLUSIVE || mode == RDT_MODE_SHAREABLE) &&
 	    rdtgroup_cbm_overlaps_pseudo_locked(d, cbm_val)) {
 		rdt_last_cmd_puts("CBM overlaps with pseudo-locked region\n");
 		return -EINVAL;
···
 	 * The CBM may not overlap with the CBM of another closid if
 	 * either is exclusive.
 	 */
-	if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, true)) {
+	if (rdtgroup_cbm_overlaps(s, d, cbm_val, closid, true)) {
 		rdt_last_cmd_puts("Overlaps with exclusive group\n");
 		return -EINVAL;
 	}
 
-	if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, false)) {
-		if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
-		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+	if (rdtgroup_cbm_overlaps(s, d, cbm_val, closid, false)) {
+		if (mode == RDT_MODE_EXCLUSIVE ||
+		    mode == RDT_MODE_PSEUDO_LOCKSETUP) {
 			rdt_last_cmd_puts("Overlaps with other group\n");
 			return -EINVAL;
 		}
···
 	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
 		if (d->hdr.id == dom_id) {
 			data.buf = dom;
-			data.rdtgrp = rdtgrp;
+			data.closid = rdtgrp->closid;
+			data.mode = rdtgrp->mode;
 			if (parse_ctrlval(&data, s, d))
 				return -EINVAL;
 			if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
···
 	return ret ?: nbytes;
 }
 
-static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
+static void show_doms(struct seq_file *s, struct resctrl_schema *schema,
+		      char *resource_name, int closid)
 {
 	struct rdt_resource *r = schema->res;
 	struct rdt_ctrl_domain *dom;
···
 	/* Walking r->domains, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
 
-	seq_printf(s, "%*s:", max_name_width, schema->name);
+	if (resource_name)
+		seq_printf(s, "%*s:", max_name_width, resource_name);
 	list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
 		if (sep)
 			seq_puts(s, ";");
···
 		closid = rdtgrp->closid;
 		list_for_each_entry(schema, &resctrl_schema_all, list) {
 			if (closid < schema->num_closid)
-				show_doms(s, schema, closid);
+				show_doms(s, schema, schema->name, closid);
 		}
 	}
 } else {
···
 out:
 	rdtgroup_kn_unlock(of->kn);
 	return ret;
+}
+
+int resctrl_io_alloc_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
+{
+	struct resctrl_schema *s = rdt_kn_parent_priv(of->kn);
+	struct rdt_resource *r = s->res;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	if (r->cache.io_alloc_capable) {
+		if (resctrl_arch_get_io_alloc_enabled(r))
+			seq_puts(seq, "enabled\n");
+		else
+			seq_puts(seq, "disabled\n");
+	} else {
+		seq_puts(seq, "not supported\n");
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+/*
+ * resctrl_io_alloc_closid_supported() - io_alloc feature utilizes the
+ * highest CLOSID value to direct I/O traffic. Ensure that io_alloc_closid
+ * is in the supported range.
+ */
+static bool resctrl_io_alloc_closid_supported(u32 io_alloc_closid)
+{
+	return io_alloc_closid < closids_supported();
+}
+
+/*
+ * Initialize io_alloc CLOSID cache resource CBM with all usable (shared
+ * and unused) cache portions.
+ */
+static int resctrl_io_alloc_init_cbm(struct resctrl_schema *s, u32 closid)
+{
+	enum resctrl_conf_type peer_type;
+	struct rdt_resource *r = s->res;
+	struct rdt_ctrl_domain *d;
+	int ret;
+
+	rdt_staged_configs_clear();
+
+	ret = rdtgroup_init_cat(s, closid);
+	if (ret < 0)
+		goto out;
+
+	/* Keep CDP_CODE and CDP_DATA of io_alloc CLOSID's CBM in sync. */
+	if (resctrl_arch_get_cdp_enabled(r->rid)) {
+		peer_type = resctrl_peer_type(s->conf_type);
+		list_for_each_entry(d, &s->res->ctrl_domains, hdr.list)
+			memcpy(&d->staged_config[peer_type],
+			       &d->staged_config[s->conf_type],
+			       sizeof(d->staged_config[0]));
+	}
+
+	ret = resctrl_arch_update_domains(r, closid);
+out:
+	rdt_staged_configs_clear();
+	return ret;
+}
+
+/*
+ * resctrl_io_alloc_closid() - io_alloc feature routes I/O traffic using
+ * the highest available CLOSID. Retrieve the maximum CLOSID supported by the
+ * resource. Note that if Code Data Prioritization (CDP) is enabled, the number
+ * of available CLOSIDs is reduced by half.
+ */
+u32 resctrl_io_alloc_closid(struct rdt_resource *r)
+{
+	if (resctrl_arch_get_cdp_enabled(r->rid))
+		return resctrl_arch_get_num_closid(r) / 2 - 1;
+	else
+		return resctrl_arch_get_num_closid(r) - 1;
+}
+
+ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
+			       size_t nbytes, loff_t off)
+{
+	struct resctrl_schema *s = rdt_kn_parent_priv(of->kn);
+	struct rdt_resource *r = s->res;
+	char const *grp_name;
+	u32 io_alloc_closid;
+	bool enable;
+	int ret;
+
+	ret = kstrtobool(buf, &enable);
+	if (ret)
+		return ret;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!r->cache.io_alloc_capable) {
+		rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name);
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	/* If the feature is already up to date, no action is needed. */
+	if (resctrl_arch_get_io_alloc_enabled(r) == enable)
+		goto out_unlock;
+
+	io_alloc_closid = resctrl_io_alloc_closid(r);
+	if (!resctrl_io_alloc_closid_supported(io_alloc_closid)) {
+		rdt_last_cmd_printf("io_alloc CLOSID (ctrl_hw_id) %u is not available\n",
+				    io_alloc_closid);
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (enable) {
+		if (!closid_alloc_fixed(io_alloc_closid)) {
+			grp_name = rdtgroup_name_by_closid(io_alloc_closid);
+			WARN_ON_ONCE(!grp_name);
+			rdt_last_cmd_printf("CLOSID (ctrl_hw_id) %u for io_alloc is used by %s group\n",
+					    io_alloc_closid, grp_name ? grp_name : "another");
+			ret = -ENOSPC;
+			goto out_unlock;
+		}
+
+		ret = resctrl_io_alloc_init_cbm(s, io_alloc_closid);
+		if (ret) {
+			rdt_last_cmd_puts("Failed to initialize io_alloc allocations\n");
+			closid_free(io_alloc_closid);
+			goto out_unlock;
+		}
+	} else {
+		closid_free(io_alloc_closid);
+	}
+
+	ret = resctrl_arch_io_alloc_enable(r, enable);
+	if (enable && ret) {
+		rdt_last_cmd_puts("Failed to enable io_alloc feature\n");
+		closid_free(io_alloc_closid);
+	}
+
+out_unlock:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
+int resctrl_io_alloc_cbm_show(struct kernfs_open_file *of, struct seq_file *seq, void *v)
+{
+	struct resctrl_schema *s = rdt_kn_parent_priv(of->kn);
+	struct rdt_resource *r = s->res;
+	int ret = 0;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	if (!r->cache.io_alloc_capable) {
+		rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name);
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	if (!resctrl_arch_get_io_alloc_enabled(r)) {
+		rdt_last_cmd_printf("io_alloc is not enabled on %s\n", s->name);
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/*
+	 * When CDP is enabled, the CBMs of the highest CLOSID of CDP_CODE and
+	 * CDP_DATA are kept in sync. As a result, the io_alloc CBMs shown for
+	 * either CDP resource are identical and accurately represent the CBMs
+	 * used for I/O.
+	 */
+	show_doms(seq, s, NULL, resctrl_io_alloc_closid(r));
+
+out_unlock:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+	return ret;
+}
+
+static int resctrl_io_alloc_parse_line(char *line, struct rdt_resource *r,
+				       struct resctrl_schema *s, u32 closid)
+{
+	enum resctrl_conf_type peer_type;
+	struct rdt_parse_data data;
+	struct rdt_ctrl_domain *d;
+	char *dom = NULL, *id;
+	unsigned long dom_id;
+
+next:
+	if (!line || line[0] == '\0')
+		return 0;
+
+	dom = strsep(&line, ";");
+	id = strsep(&dom, "=");
+	if (!dom || kstrtoul(id, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing '=' or non-numeric domain\n");
+		return -EINVAL;
+	}
+
+	dom = strim(dom);
+	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+		if (d->hdr.id == dom_id) {
+			data.buf = dom;
+			data.mode = RDT_MODE_SHAREABLE;
+			data.closid = closid;
+			if (parse_cbm(&data, s, d))
+				return -EINVAL;
+			/*
+			 * Keep io_alloc CLOSID's CBM of CDP_CODE and CDP_DATA
+			 * in sync.
+			 */
+			if (resctrl_arch_get_cdp_enabled(r->rid)) {
+				peer_type = resctrl_peer_type(s->conf_type);
+				memcpy(&d->staged_config[peer_type],
+				       &d->staged_config[s->conf_type],
+				       sizeof(d->staged_config[0]));
+			}
+			goto next;
+		}
+	}
+
+	return -EINVAL;
+}
+
+ssize_t resctrl_io_alloc_cbm_write(struct kernfs_open_file *of, char *buf,
+				   size_t nbytes, loff_t off)
+{
+	struct resctrl_schema *s = rdt_kn_parent_priv(of->kn);
+	struct rdt_resource *r = s->res;
+	u32 io_alloc_closid;
+	int ret = 0;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	buf[nbytes - 1] = '\0';
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+	rdt_last_cmd_clear();
+
+	if (!r->cache.io_alloc_capable) {
+		rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name);
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	if (!resctrl_arch_get_io_alloc_enabled(r)) {
+		rdt_last_cmd_printf("io_alloc is not enabled on %s\n", s->name);
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	io_alloc_closid = resctrl_io_alloc_closid(r);
+
+	rdt_staged_configs_clear();
+	ret = resctrl_io_alloc_parse_line(buf, r, s, io_alloc_closid);
+	if (ret)
+		goto out_clear_configs;
+
+	ret = resctrl_arch_update_domains(r, io_alloc_closid);
+
+out_clear_configs:
+	rdt_staged_configs_clear();
+out_unlock:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
 }
+17
fs/resctrl/internal.h
···
 
 bool closid_allocated(unsigned int closid);
 
+bool closid_alloc_fixed(u32 closid);
+
 int resctrl_find_cleanest_closid(void);
 
 void *rdt_kn_parent_priv(struct kernfs_node *kn);
···
 
 ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf, size_t nbytes,
 				 loff_t off);
+int resctrl_io_alloc_show(struct kernfs_open_file *of, struct seq_file *seq, void *v);
+
+int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid);
+
+enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type);
+
+ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
+			       size_t nbytes, loff_t off);
+
+const char *rdtgroup_name_by_closid(u32 closid);
+int resctrl_io_alloc_cbm_show(struct kernfs_open_file *of, struct seq_file *seq,
+			      void *v);
+ssize_t resctrl_io_alloc_cbm_write(struct kernfs_open_file *of, char *buf,
+				   size_t nbytes, loff_t off);
+u32 resctrl_io_alloc_closid(struct rdt_resource *r);
 
 #ifdef CONFIG_RESCTRL_FS_PSEUDO_LOCK
 int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
+75 -7
fs/resctrl/rdtgroup.c
···
 	return !test_bit(closid, closid_free_map);
 }
 
+bool closid_alloc_fixed(u32 closid)
+{
+	return __test_and_clear_bit(closid, closid_free_map);
+}
+
 /**
  * rdtgroup_mode_by_closid - Return mode of resource group with closid
  * @closid: closid if the resource group
···
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
-	hw_shareable = r->cache.shareable_bits;
 	list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
 		if (sep)
 			seq_putc(seq, ';');
+		hw_shareable = r->cache.shareable_bits;
 		sw_shareable = 0;
 		exclusive = 0;
 		seq_printf(seq, "%d=", dom->hdr.id);
 		for (i = 0; i < closids_supported(); i++) {
-			if (!closid_allocated(i))
+			if (!closid_allocated(i) ||
+			    (resctrl_arch_get_io_alloc_enabled(r) &&
+			     i == resctrl_io_alloc_closid(r)))
 				continue;
 			ctrl_val = resctrl_arch_get_config(r, dom, i,
 							   s->conf_type);
···
 				break;
 			}
 		}
+
+		/*
+		 * When the "io_alloc" feature is enabled, a portion of the cache
+		 * is configured for shared use between hardware and software.
+		 * Also, when CDP is enabled the CBMs of CDP_CODE and CDP_DATA
+		 * resources are kept in sync. So, the CBMs for "io_alloc" can
+		 * be accessed through either resource.
+		 */
+		if (resctrl_arch_get_io_alloc_enabled(r)) {
+			ctrl_val = resctrl_arch_get_config(r, dom,
+							   resctrl_io_alloc_closid(r),
+							   s->conf_type);
+			hw_shareable |= ctrl_val;
+		}
+
 		for (i = r->cache.cbm_len - 1; i >= 0; i--) {
 			pseudo_locked = dom->plr ? dom->plr->cbm : 0;
 			hwb = test_bit(i, &hw_shareable);
···
 	return 0;
 }
 
-static enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
+enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
 {
 	switch (my_type) {
 	case CDP_CODE:
···
 	kernfs_put(mon_kn);
 }
 
+const char *rdtgroup_name_by_closid(u32 closid)
+{
+	struct rdtgroup *rdtgrp;
+
+	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+		if (rdtgrp->closid == closid)
+			return rdt_kn_name(rdtgrp->kn);
+	}
+
+	return NULL;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
···
 		.mode		= 0444,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= rdt_thread_throttle_mode_show,
+	},
+	{
+		.name		= "io_alloc",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= resctrl_io_alloc_show,
+		.write		= resctrl_io_alloc_write,
+	},
+	{
+		.name		= "io_alloc_cbm",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= resctrl_io_alloc_cbm_show,
+		.write		= resctrl_io_alloc_cbm_write,
 	},
 	{
 		.name		= "max_threshold_occupancy",
···
 
 	resctrl_file_fflags_init("thread_throttle_mode",
 				 RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
+}
+
+/*
+ * The resctrl file "io_alloc" is added using L3 resource. However, it results
+ * in this file being visible for *all* cache resources (eg. L2 cache),
+ * whether it supports "io_alloc" or not.
+ */
+static void io_alloc_init(void)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+
+	if (r->cache.io_alloc_capable) {
+		resctrl_file_fflags_init("io_alloc", RFTYPE_CTRL_INFO |
+					 RFTYPE_RES_CACHE);
+		resctrl_file_fflags_init("io_alloc_cbm",
+					 RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
+	}
 }
 
 void resctrl_file_fflags_init(const char *config, unsigned long fflags)
···
 {
 	unsigned int cbm_len = r->cache.cbm_len;
 	unsigned long first_bit, zero_bit;
-	unsigned long val = _val;
+	unsigned long val;
 
-	if (!val)
-		return 0;
+	if (!_val || r->cache.arch_has_sparse_bitmasks)
+		return _val;
 
+	val = _val;
 	first_bit = find_first_bit(&val, cbm_len);
 	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);
···
  * If there are no more shareable bits available on any domain then
  * the entire allocation will fail.
  */
-static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
+int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
 {
 	struct rdt_ctrl_domain *d;
 	int ret;
···
 	rdtgroup_setup_default();
 
 	thread_throttle_mode_init();
+
+	io_alloc_init();
 
 	ret = resctrl_mon_resource_init();
 	if (ret)
+24
include/linux/resctrl.h
···
  * @arch_has_sparse_bitmasks:	True if a bitmask like f00f is valid.
  * @arch_has_per_cpu_cfg:	True if QOS_CFG register for this cache
  *				level has CPU scope.
+ * @io_alloc_capable:		True if portion of the cache can be configured
+ *				for I/O traffic.
  */
 struct resctrl_cache {
 	unsigned int	cbm_len;
···
 	unsigned int	shareable_bits;
 	bool		arch_has_sparse_bitmasks;
 	bool		arch_has_per_cpu_cfg;
+	bool		io_alloc_capable;
 };
 
 /**
···
 void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 			     u32 closid, u32 rmid, int cntr_id,
 			     enum resctrl_event_id eventid);
+
+/**
+ * resctrl_arch_io_alloc_enable() - Enable/disable io_alloc feature.
+ * @r:		The resctrl resource.
+ * @enable:	Enable (true) or disable (false) io_alloc on resource @r.
+ *
+ * This can be called from any CPU.
+ *
+ * Return:
+ * 0 on success, <0 on error.
+ */
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable);
+
+/**
+ * resctrl_arch_get_io_alloc_enabled() - Get io_alloc feature state.
+ * @r:		The resctrl resource.
+ *
+ * Return:
+ * true if io_alloc is enabled or false if disabled.
+ */
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
 
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;