Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

platform/x86/intel-uncore-freq: Support for cluster level controls

An SoC can contain multiple power domains, each with an individual mesh
partition or a collection of mesh partitions. Such a partition is called
a fabric cluster.

Certain types of meshes need to run at the same frequency, so they are
placed in the same fabric cluster. The benefit of a fabric cluster is that
it offers a scalable mechanism to deal with partitioned fabrics in an SoC.

The current sysfs interface supports control at package and die level.
This interface is not enough to support more granular control at
fabric cluster level.

SoCs that support TPMI (Topology Aware Register and PM Capsule
Interface) can have multiple power domains. Each power domain can
contain one or more fabric clusters.

To support such granular controls, enhance the uncore common code to
optionally create new directories that provide controls at fabric cluster
level. It is also important to keep the flexibility to change the
granularity for future versions of SoCs. If the directory name encodes the
scope, as in "package_*_die_*_power_domain_*_cluster_*", then the
interface is not expandable.

The cpufreq policies also have different scopes. There, the scope of a
policy (affected_cpus) is specified by attributes inside each policy
directory. So, follow the same model for the uncore frequency scaling
sysfs as:
"/sys/devices/system/cpu/cpufreq/policy*"

Allow client drivers to optionally support granular control for each
fabric cluster. Here, the directory name will be "uncore" suffixed with
a unique instance number. For example: uncore00, uncore01 etc.
Attributes in the directory identify the package id, power domain id and
fabric cluster id. This interface remains expandable even if some new
level of granularity is introduced: a new sysfs attribute can identify
the new level.
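
The naming rule above can be sketched in userspace C. This is an illustrative model rather than the driver code: uncore_entry_name() and DOMAIN_ID_INVALID are hypothetical names introduced here, and only the two format strings are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the driver's "no power domain" marker. */
#define DOMAIN_ID_INVALID (-1)

/*
 * Build the sysfs directory name for one uncore instance.
 * With a valid power domain the name is "uncoreNN", where NN is a unique
 * instance number; otherwise the legacy "package_XX_die_YY" name is used.
 * Returns the snprintf() result.
 */
static int uncore_entry_name(char *buf, size_t len, int package_id, int die_id,
			     int domain_id, int instance_id)
{
	assert(buf != NULL);

	if (domain_id != DOMAIN_ID_INVALID)
		return snprintf(buf, len, "uncore%02d", instance_id);

	return snprintf(buf, len, "package_%02d_die_%02d", package_id, die_id);
}
```

Because the scope is not encoded in the "uncoreNN" name, userspace has to read the identifying attributes inside the directory to learn which package, power domain and fabric cluster an instance covers.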

For compatibility with the existing sysfs, and to provide an easy way to
set limits for every fabric cluster in a package/die, the existing
controls at package/die level are still provided. For the majority of
users, this is the easier approach.
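
The effect of a package/die level write can be modelled as a fan-out to every cluster entry of that package. The sketch below is a userspace model under stated assumptions, not the driver implementation: struct cluster_entry and set_package_max() are hypothetical, and entries are matched on package id only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one fabric cluster entry (uncoreNN). */
struct cluster_entry {
	int package_id;
	unsigned int max_freq_khz;
};

/*
 * Model of a write to package_XX_die_YY/max_freq_khz: the new limit is
 * copied to every cluster entry with the same package id.
 * Returns the number of entries updated.
 */
static size_t set_package_max(struct cluster_entry *entries, size_t count,
			      int package_id, unsigned int max_freq_khz)
{
	size_t i, updated = 0;

	assert(entries != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (entries[i].package_id != package_id)
			continue;
		entries[i].max_freq_khz = max_freq_khz;
		updated++;
	}

	return updated;
}
```

After such a write, an individual entry can still be given its own limit, which matches the description of the per-cluster controls as the more restrictive setting.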

For example: On a single package/die system, with three power domains
and one fabric cluster per power domain:

$ tree -L 2 /sys/devices/system/cpu/intel_uncore_frequency/
/sys/devices/system/cpu/intel_uncore_frequency/
├── package_00_die_00
│   ├── current_freq_khz
│   ├── initial_max_freq_khz
│   ├── initial_min_freq_khz
│   ├── max_freq_khz
│   └── min_freq_khz
├── uncore00
│   ├── current_freq_khz
│   ├── domain_id
│   ├── fabric_cluster_id
│   ├── initial_max_freq_khz
│   ├── initial_min_freq_khz
│   ├── max_freq_khz
│   ├── min_freq_khz
│   └── package_id
├── uncore01
│   ├── current_freq_khz
│   ├── domain_id
│   ├── fabric_cluster_id
│   ├── initial_max_freq_khz
│   ├── initial_min_freq_khz
│   ├── max_freq_khz
│   ├── min_freq_khz
│   └── package_id
└── uncore02
    ├── current_freq_khz
    ├── domain_id
    ├── fabric_cluster_id
    ├── initial_max_freq_khz
    ├── initial_min_freq_khz
    ├── max_freq_khz
    ├── min_freq_khz
    └── package_id
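
A userspace tool could locate and read these attributes roughly as follows. This is a minimal sketch: uncore_attr_path() and read_uint_attr() are hypothetical helpers, and the only facts taken from the listing above are the sysfs root, the "uncoreNN" directory naming and the attribute file names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UNCORE_SYSFS_ROOT "/sys/devices/system/cpu/intel_uncore_frequency"

/* Build "<root>/uncoreNN/<attr>" into buf; returns the snprintf() result. */
static int uncore_attr_path(char *buf, size_t len, int instance, const char *attr)
{
	assert(buf != NULL && attr != NULL);

	return snprintf(buf, len, "%s/uncore%02d/%s", UNCORE_SYSFS_ROOT,
			instance, attr);
}

/*
 * Read one unsigned decimal value from a sysfs attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_uint_attr(const char *path, unsigned int *value)
{
	FILE *fp;
	int ok;

	assert(path != NULL && value != NULL);

	fp = fopen(path, "r");
	if (!fp)
		return -1;

	ok = fscanf(fp, "%u", value) == 1;
	fclose(fp);

	return ok ? 0 : -1;
}
```

Reading package_id, domain_id and fabric_cluster_id from each uncoreNN directory is enough to map an instance number back to its scope.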

The attribute for the cluster id is named "fabric_cluster_id" instead of
just "cluster_id" to avoid confusion with the use of the term "cluster"
in other parts of the Linux kernel.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Link: https://lore.kernel.org/r/20230418171340.681662-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

Authored by Srinivas Pandruvada and committed by Hans de Goede
9b8dea80 8a54e225

+121 -4
+56 -1
Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
···
 Intel Uncore Frequency Scaling
 ==============================
 
-:Copyright: |copy| 2022 Intel Corporation
+:Copyright: |copy| 2022-2023 Intel Corporation
 
 :Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
 
···
 
 ``current_freq_khz``
 	This attribute is used to get the current uncore frequency.
+
+SoCs with TPMI (Topology Aware Register and PM Capsule Interface)
+-----------------------------------------------------------------
+
+An SoC can contain multiple power domains with individual or collection
+of mesh partitions. This partition is called fabric cluster.
+
+Certain type of meshes will need to run at the same frequency, they will
+be placed in the same fabric cluster. Benefit of fabric cluster is that it
+offers a scalable mechanism to deal with partitioned fabrics in a SoC.
+
+The current sysfs interface supports controls at package and die level.
+This interface is not enough to support more granular control at
+fabric cluster level.
+
+SoCs with the support of TPMI (Topology Aware Register and PM Capsule
+Interface), can have multiple power domains. Each power domain can
+contain one or more fabric clusters.
+
+To represent controls at fabric cluster level in addition to the
+controls at package and die level (like systems without TPMI
+support), sysfs is enhanced. This granular interface is presented in the
+sysfs with directories names prefixed with "uncore". For example:
+uncore00, uncore01 etc.
+
+The scope of control is specified by attributes "package_id", "domain_id"
+and "fabric_cluster_id" in the directory.
+
+Attributes in each directory:
+
+``domain_id``
+	This attribute is used to get the power domain id of this instance.
+
+``fabric_cluster_id``
+	This attribute is used to get the fabric cluster id of this instance.
+
+``package_id``
+	This attribute is used to get the package id of this instance.
+
+The other attributes are same as presented at package_*_die_* level.
+
+In most of current use cases, the "max_freq_khz" and "min_freq_khz"
+is updated at "package_*_die_*" level. This model will be still supported
+with the following approach:
+
+When user uses controls at "package_*_die_*" level, then every fabric
+cluster is affected in that package and die. For example: user changes
+"max_freq_khz" in the package_00_die_00, then "max_freq_khz" for uncore*
+directory with the same package id will be updated. In this case user can
+still update "max_freq_khz" at each uncore* level, which is more restrictive.
+Similarly, user can update "min_freq_khz" at "package_*_die_*" level
+to apply at each uncore* level.
+
+Support for "current_freq_khz" is available only at each fabric cluster
+level (i.e., in uncore* directory).
+49 -2
drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
···
 /* uncore instance count */
 static int uncore_instance_count;
 
+static DEFINE_IDA(intel_uncore_ida);
+
 /* callbacks for actual HW read/write */
 static int (*uncore_read)(struct uncore_data *data, unsigned int *min, unsigned int *max);
 static int (*uncore_write)(struct uncore_data *data, unsigned int input, unsigned int min_max);
 static int (*uncore_read_freq)(struct uncore_data *data, unsigned int *freq);
+
+static ssize_t show_domain_id(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct uncore_data *data = container_of(attr, struct uncore_data, domain_id_dev_attr);
+
+	return sprintf(buf, "%u\n", data->domain_id);
+}
+
+static ssize_t show_fabric_cluster_id(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct uncore_data *data = container_of(attr, struct uncore_data, fabric_cluster_id_dev_attr);
+
+	return sprintf(buf, "%u\n", data->cluster_id);
+}
+
+static ssize_t show_package_id(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct uncore_data *data = container_of(attr, struct uncore_data, package_id_dev_attr);
+
+	return sprintf(buf, "%u\n", data->package_id);
+}
 
 static ssize_t show_min_max_freq_khz(struct uncore_data *data,
 				      char *buf, int min_max)
···
 	init_attribute_ro(initial_max_freq_khz);
 	init_attribute_root_ro(current_freq_khz);
 
+	if (data->domain_id != UNCORE_DOMAIN_ID_INVALID) {
+		init_attribute_root_ro(domain_id);
+		data->uncore_attrs[index++] = &data->domain_id_dev_attr.attr;
+		init_attribute_root_ro(fabric_cluster_id);
+		data->uncore_attrs[index++] = &data->fabric_cluster_id_dev_attr.attr;
+		init_attribute_root_ro(package_id);
+		data->uncore_attrs[index++] = &data->package_id_dev_attr.attr;
+	}
+
 	data->uncore_attrs[index++] = &data->max_freq_khz_dev_attr.attr;
 	data->uncore_attrs[index++] = &data->min_freq_khz_dev_attr.attr;
 	data->uncore_attrs[index++] = &data->initial_min_freq_khz_dev_attr.attr;
···
 		goto uncore_unlock;
 	}
 
-	sprintf(data->name, "package_%02d_die_%02d", data->package_id, data->die_id);
+	if (data->domain_id != UNCORE_DOMAIN_ID_INVALID) {
+		ret = ida_alloc(&intel_uncore_ida, GFP_KERNEL);
+		if (ret < 0)
+			goto uncore_unlock;
+
+		data->instance_id = ret;
+		sprintf(data->name, "uncore%02d", ret);
+	} else {
+		sprintf(data->name, "package_%02d_die_%02d", data->package_id, data->die_id);
+	}
 
 	uncore_read(data, &data->initial_min_freq_khz, &data->initial_max_freq_khz);
 
 	ret = create_attr_group(data, data->name);
-	if (!ret) {
+	if (ret) {
+		if (data->domain_id != UNCORE_DOMAIN_ID_INVALID)
+			ida_free(&intel_uncore_ida, data->instance_id);
+	} else {
 		data->control_cpu = cpu;
 		data->valid = true;
 	}
···
 	delete_attr_group(data, data->name);
 	data->control_cpu = -1;
 	data->valid = false;
+	if (data->domain_id != UNCORE_DOMAIN_ID_INVALID)
+		ida_free(&intel_uncore_ida, data->instance_id);
+
 	mutex_unlock(&uncore_lock);
 }
 EXPORT_SYMBOL_NS_GPL(uncore_freq_remove_die_entry, INTEL_UNCORE_FREQUENCY);
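
The instance-number lifecycle in this hunk (allocate an id before naming the directory, release it when group creation fails or when the entry is removed) can be illustrated with a toy userspace allocator. This is not the kernel IDA: struct toy_ida and its helpers are hypothetical, the 64-id limit is arbitrary, and handing out the lowest free id is an assumption of this sketch about allocation order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy allocator: one bit per instance id, at most 64 ids. */
struct toy_ida {
	uint64_t used;
};

/* Return the lowest free id and mark it used, or -1 when all are taken. */
static int toy_ida_alloc(struct toy_ida *ida)
{
	int id;

	assert(ida != NULL);

	for (id = 0; id < 64; id++) {
		if (!(ida->used & (UINT64_C(1) << id))) {
			ida->used |= UINT64_C(1) << id;
			return id;
		}
	}

	return -1;
}

/* Release an id so that a later allocation can reuse it. */
static void toy_ida_free(struct toy_ida *ida, int id)
{
	assert(ida != NULL && id >= 0 && id < 64);

	ida->used &= ~(UINT64_C(1) << id);
}
```

Releasing the id on both the error path and the remove path, as the patch does, keeps instance numbers from leaking when entries are added and removed repeatedly.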
+15 -1
drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
···
  * @valid: Mark the data valid/invalid
  * @package_id: Package id for this instance
  * @die_id: Die id for this instance
+ * @domain_id: Power domain id for this instance
+ * @cluster_id: cluster id in a domain
+ * @instance_id: Unique instance id to append to directory name
  * @name: Sysfs entry name for this instance
  * @uncore_attr_group: Attribute group storage
  * @max_freq_khz_dev_attr: Storage for device attribute max_freq_khz
···
  * @initial_max_freq_khz_dev_attr: Storage for device attribute initial_max_freq_khz
  * @initial_min_freq_khz_dev_attr: Storage for device attribute initial_min_freq_khz
  * @current_freq_khz_dev_attr: Storage for device attribute current_freq_khz
+ * @domain_id_dev_attr: Storage for device attribute domain_id
+ * @fabric_cluster_id_dev_attr: Storage for device attribute fabric_cluster_id
+ * @package_id_dev_attr: Storage for device attribute package_id
  * @uncore_attrs: Attribute storage for group creation
  *
  * This structure is used to encapsulate all data related to uncore sysfs
···
 	bool valid;
 	int package_id;
 	int die_id;
+	int domain_id;
+	int cluster_id;
+	int instance_id;
 	char name[32];
 
 	struct attribute_group uncore_attr_group;
···
 	struct device_attribute initial_max_freq_khz_dev_attr;
 	struct device_attribute initial_min_freq_khz_dev_attr;
 	struct device_attribute current_freq_khz_dev_attr;
-	struct attribute *uncore_attrs[6];
+	struct device_attribute domain_id_dev_attr;
+	struct device_attribute fabric_cluster_id_dev_attr;
+	struct device_attribute package_id_dev_attr;
+	struct attribute *uncore_attrs[9];
 };
+
+#define UNCORE_DOMAIN_ID_INVALID -1
 
 int uncore_freq_common_init(int (*read_control_freq)(struct uncore_data *data, unsigned int *min, unsigned int *max),
 			     int (*write_control_freq)(struct uncore_data *data, unsigned int input, unsigned int min_max),
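
The growth of uncore_attrs from 6 to 9 slots follows from the attribute counts in the commit message: five attributes per package/die directory, eight per uncoreNN directory, plus one NULL terminator. The sketch below models that with plain strings. fill_attr_names() is hypothetical, the order of the last two names is a guess, and it assumes all five pre-existing attributes are always populated.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTRS 9	/* eight attributes plus the NULL terminator */

/*
 * Fill a NULL-terminated list of attribute names, adding the three scope
 * attributes only when the entry has a valid power domain.
 * Returns the number of names stored, not counting the terminator.
 */
static size_t fill_attr_names(const char *attrs[MAX_ATTRS], int has_domain)
{
	size_t index = 0;

	assert(attrs != NULL);

	if (has_domain) {
		attrs[index++] = "domain_id";
		attrs[index++] = "fabric_cluster_id";
		attrs[index++] = "package_id";
	}

	attrs[index++] = "max_freq_khz";
	attrs[index++] = "min_freq_khz";
	attrs[index++] = "initial_min_freq_khz";
	attrs[index++] = "initial_max_freq_khz";
	attrs[index++] = "current_freq_khz";
	attrs[index] = NULL;

	assert(index < MAX_ATTRS);
	return index;
}
```

With a valid domain the list holds eight names and the terminator lands in the ninth slot, which is why the array could not stay at six entries.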
+1
drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
···
 
 	data->package_id = topology_physical_package_id(cpu);
 	data->die_id = topology_die_id(cpu);
+	data->domain_id = UNCORE_DOMAIN_ID_INVALID;
 
 	return uncore_freq_add_entry(data, cpu);
 }