Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node

== Problem ==

The amount of SGX memory on a system is determined by the BIOS and it
varies wildly between systems. It can be as small as dozens of MBs
and as large as many GBs on servers. Just like how applications need
to know how much regular RAM is available, enclave builders need to
know how much SGX memory an enclave can consume.

== Solution ==

Introduce a new sysfs file:

/sys/devices/system/node/nodeX/x86/sgx_total_bytes

to enumerate the amount of SGX memory available in each NUMA node.
This serves the same function for SGX as /proc/meminfo or
/sys/devices/system/node/nodeX/meminfo does for normal RAM.
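
As an illustration of how userspace might consume the new file, here is a minimal sketch in C. It assumes the file holds one unsigned decimal byte count, which matches the "%lu\n" format emitted by the show routine in this patch. The helper names read_u64_file() and sgx_node_total_bytes() and the node_root parameter are hypothetical and not part of the patch; node_root exists only so the directory can be overridden when testing.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read a single unsigned decimal value from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_u64_file(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: build the per-node path and read it.
 * node_root would normally be "/sys/devices/system/node".
 */
static int sgx_node_total_bytes(const char *node_root, int nid, uint64_t *out)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "%s/node%d/x86/sgx_total_bytes",
			 node_root, nid);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return read_u64_file(path, out);
}
```

A caller would pass "/sys/devices/system/node" and a node id, and should treat a -1 return as "no SGX attribute on this node or kernel" rather than as zero bytes.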

'sgx_total_bytes' is needed today to help drive the SGX selftests.
SGX-specific swap code is exercised by creating overcommitted enclaves
which are larger than the physical SGX memory on the system. They
currently use a CPUID-based approach which can diverge from the actual
amount of SGX memory available. 'sgx_total_bytes' ensures that the
selftests can work efficiently and do not attempt stupid things like
creating a 100,000 MB enclave on a system with 128 MB of SGX memory.
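
To make the sizing idea concrete, the sketch below shows one way a test could derive a modestly overcommitted enclave size from the reported total. This is not the selftest code; the helper overcommit_enclave_size(), the divisor parameter, and the "total plus a fraction" policy are assumptions chosen for illustration.

```c
#include <stdint.h>

#define SGX_PAGE_SIZE 4096ULL	/* EPC pages are 4 KiB */

/*
 * Hypothetical helper: return total_bytes plus total_bytes / divisor,
 * rounded up to a whole page, so the enclave is only slightly larger
 * than physical SGX memory. Returns 0 on invalid input or overflow.
 */
static uint64_t overcommit_enclave_size(uint64_t total_bytes, uint64_t divisor)
{
	uint64_t extra, size;

	if (total_bytes == 0 || divisor == 0)
		return 0;

	extra = total_bytes / divisor;
	if (extra == 0)
		extra = SGX_PAGE_SIZE;	/* always exceed the total by a page */

	if (extra > UINT64_MAX - total_bytes)
		return 0;
	size = total_bytes + extra;

	if (size > UINT64_MAX - (SGX_PAGE_SIZE - 1))
		return 0;
	return (size + SGX_PAGE_SIZE - 1) & ~(SGX_PAGE_SIZE - 1);
}
```

With 128 MB of SGX memory and a divisor of 4, this yields a 160 MB enclave: large enough to force swapping, but nowhere near the 100,000 MB case described above.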

== Implementation Details ==

Introduce the CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch-specific attribute group, and add an attribute for the amount of
SGX memory in bytes to each NUMA node.

== ABI Design Discussion ==

As opposed to the per-node ABI, a single, global ABI was considered.
However, this would prevent enclaves from being able to size
themselves so that they fit on a single NUMA node. Essentially, a
single value would rule out NUMA optimizations for enclaves.
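
The per-node sizing argument can be sketched as follows: given the per-node totals, pick a node whose SGX memory can hold the whole enclave. The function pick_sgx_node() and its best-fit policy are hypothetical. Note that the totals are only an upper bound on what an enclave can actually get, since a free-memory counterpart is not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical best-fit selection: return the index of the smallest node
 * whose total SGX memory is at least want_bytes, or -1 if no single node
 * is large enough. node_total[i] is the sgx_total_bytes value of node i.
 */
static int pick_sgx_node(const uint64_t *node_total, size_t nr_nodes,
			 uint64_t want_bytes)
{
	int best = -1;
	size_t i;

	for (i = 0; i < nr_nodes; i++) {
		if (node_total[i] < want_bytes)
			continue;
		if (best < 0 || node_total[i] < node_total[(size_t)best])
			best = (int)i;
	}
	return best;
}
```

A single global total could not answer this question: three nodes with 64 MB, 256 MB and 128 MB sum to 448 MB, yet a 300 MB enclave fits on none of them.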

Create a new "x86/" directory inside each "nodeX/" sysfs directory.
'sgx_total_bytes' is expected to be the first of at least a few
SGX-specific files to be placed in the new directory. Just scanning
/proc/meminfo, these are the no-brainers that we have for RAM and
still need for SGX:

  MemTotal:   xxxx kB    // sgx_total_bytes (implemented here)
  MemFree:    yyyy kB    // sgx_free_bytes
  SwapTotal:  zzzz kB    // sgx_swapped_bytes

So, at *least* three. I think we will eventually end up needing
something more along the lines of a dozen. A new directory (as
opposed to placing the files in the nodeX/ "root" directory) avoids
cluttering the root with several "sgx_*" files.

Place the new file in a new "nodeX/x86/" directory because SGX is
highly x86-specific. It is very unlikely that any other architecture
(or even a non-Intel x86 vendor) will ever implement SGX. Using "sgx/"
as opposed to "x86/" was also considered. However, there is a real
chance that "x86/" will get used for other arch-specific purposes.

[ dhansen: rewrite changelog ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org

Authored by Jarkko Sakkinen and committed by Dave Hansen
50468e43 5c16f7ee

+39
+6
Documentation/ABI/stable/sysfs-devices-node
···
 Description:
 		The cache write policy: 0 for write-back, 1 for write-through,
 		other or unknown.
+
+What:		/sys/devices/system/node/nodeX/x86/sgx_total_bytes
+Date:		November 2021
+Contact:	Jarkko Sakkinen <jarkko@kernel.org>
+Description:
+		The total amount of SGX physical memory in bytes.

+4
arch/Kconfig
···
 config DYNAMIC_SIGFRAME
 	bool
 
+# Select, if arch has a named attribute group bound to NUMA device nodes.
+config HAVE_ARCH_NODE_DEV_GROUP
+	bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"

+1
arch/x86/Kconfig
···
 	select HAVE_ARCH_KCSAN if X86_64
 	select X86_FEATURE_NAMES if PROC_FS
 	select PROC_PID_ARCH_STATUS if PROC_FS
+	select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
 
 config INSTRUCTION_DECODER

+20
arch/x86/kernel/cpu/sgx/main.c
···
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
 			node_set(nid, sgx_numa_mask);
+			sgx_numa_nodes[nid].size = 0;
 		}
 
 		sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
+		sgx_numa_nodes[nid].size += size;
 
 		sgx_nr_epc_sections++;
 	}
···
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
+
+#ifdef CONFIG_NUMA
+static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%lu\n", sgx_numa_nodes[dev->id].size);
+}
+static DEVICE_ATTR_RO(sgx_total_bytes);
+
+static struct attribute *arch_node_dev_attrs[] = {
+	&dev_attr_sgx_total_bytes.attr,
+	NULL,
+};
+
+const struct attribute_group arch_node_dev_group = {
+	.name = "x86",
+	.attrs = arch_node_dev_attrs,
+};
+#endif /* CONFIG_NUMA */
 
 static int __init sgx_init(void)
 {

+1
arch/x86/kernel/cpu/sgx/sgx.h
···
 struct sgx_numa_node {
 	struct list_head free_page_list;
 	struct list_head sgx_poison_page_list;
+	unsigned long size;
 	spinlock_t lock;
 };
 

+3
drivers/base/node.c
···
 
 static const struct attribute_group *node_dev_groups[] = {
 	&node_dev_group,
+#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
+	&arch_node_dev_group,
+#endif
 	NULL
 };
 

+4
include/linux/numa.h
···
 }
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
+extern const struct attribute_group arch_node_dev_group;
+#endif
+
 #endif /* _LINUX_NUMA_H */