
hugetlb: support node specified when using cma for gigantic hugepages

Currently the size of the CMA area used for runtime allocation of gigantic
hugepages is balanced across all online nodes, but we also want to be able
to specify the CMA size per node, or for only one node in some cases,
similar to patch [1].

For example, on some multi-node systems each node's memory can differ, so
allocating the same CMA size on every node is not suitable for the
low-memory nodes. Meanwhile, some workloads, such as the DPDK case
mentioned by Zhenguo in patch [1], only need hugepages on one node.

On the other hand, we have some machines with multiple types of memory,
like DRAM and PMEM (persistent memory). On such systems, we may want to
place all the hugepages on the DRAM node only, or to specify the
proportion between the DRAM and PMEM nodes, to tune the performance of
the workloads.

Thus this patch adds a node format for the 'hugetlb_cma' parameter to
support specifying the CMA size per node. An example is as follows:

hugetlb_cma=0:5G,2:5G

which allocates a 5G CMA area on node 0 and node 2 respectively. Users
should then use the node-specific sysfs files to allocate gigantic
hugepages on any node for which a CMA size was specified.
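As a hypothetical usage sketch (assuming the example command line above,
1 GiB gigantic pages, and that both nodes are online), the per-node
reservation could then be consumed through the node-specific sysfs files:

```shell
# Boot with (example from above): hugetlb_cma=0:5G,2:5G
# Allocate gigantic (1 GiB) hugepages from each node's CMA area via the
# node-specific sysfs interface:
echo 5 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 5 > /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages

# Check how many pages were actually allocated on node 0:
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
```

The node counts (0 and 2) and the 5-page request are illustrative only;
the sysfs paths are the standard per-node hugepage interface.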

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com [1]
Link: https://lkml.kernel.org/r/bb790775ca60bb8f4b26956bb3f6988f74e075c7.1634261144.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Baolin Wang and committed by Linus Torvalds
38e719ab 12b61320

+81 -11

Documentation/admin-guide/kernel-parameters.txt  +4 -2

···
 			registers.  Default set by CONFIG_HPET_MMAP_DEFAULT.
 
 	hugetlb_cma=	[HW,CMA] The size of a CMA area used for allocation
-			of gigantic hugepages.
-			Format: nn[KMGTPE]
+			of gigantic hugepages. Or using node format, the size
+			of a CMA area per node can be specified.
+			Format: nn[KMGTPE] or (node format)
+			<node>:nn[KMGTPE][,<node>:nn[KMGTPE]]
 
 			Reserve a CMA area of given size and allocate gigantic
 			hugepages using the CMA allocator. If enabled, the
mm/hugetlb.c  +77 -9

···
 
 #ifdef CONFIG_CMA
 static struct cma *hugetlb_cma[MAX_NUMNODES];
+static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
 static bool hugetlb_cma_page(struct page *page, unsigned int order)
 {
 	return cma_pages_valid(hugetlb_cma[page_to_nid(page)], page,
···
 
 static int __init cmdline_parse_hugetlb_cma(char *p)
 {
-	hugetlb_cma_size = memparse(p, &p);
+	int nid, count = 0;
+	unsigned long tmp;
+	char *s = p;
+
+	while (*s) {
+		if (sscanf(s, "%lu%n", &tmp, &count) != 1)
+			break;
+
+		if (s[count] == ':') {
+			nid = tmp;
+			if (nid < 0 || nid >= MAX_NUMNODES)
+				break;
+
+			s += count + 1;
+			tmp = memparse(s, &s);
+			hugetlb_cma_size_in_node[nid] = tmp;
+			hugetlb_cma_size += tmp;
+
+			/*
+			 * Skip the separator if have one, otherwise
+			 * break the parsing.
+			 */
+			if (*s == ',')
+				s++;
+			else
+				break;
+		} else {
+			hugetlb_cma_size = memparse(p, &p);
+			break;
+		}
+	}
+
 	return 0;
 }
···
 void __init hugetlb_cma_reserve(int order)
 {
 	unsigned long size, reserved, per_node;
+	bool node_specific_cma_alloc = false;
 	int nid;
 
 	cma_reserve_called = true;
 
+	if (!hugetlb_cma_size)
+		return;
+
+	for (nid = 0; nid < MAX_NUMNODES; nid++) {
+		if (hugetlb_cma_size_in_node[nid] == 0)
+			continue;
+
+		if (!node_state(nid, N_ONLINE)) {
+			pr_warn("hugetlb_cma: invalid node %d specified\n", nid);
+			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
+			hugetlb_cma_size_in_node[nid] = 0;
+			continue;
+		}
+
+		if (hugetlb_cma_size_in_node[nid] < (PAGE_SIZE << order)) {
+			pr_warn("hugetlb_cma: cma area of node %d should be at least %lu MiB\n",
+				nid, (PAGE_SIZE << order) / SZ_1M);
+			hugetlb_cma_size -= hugetlb_cma_size_in_node[nid];
+			hugetlb_cma_size_in_node[nid] = 0;
+		} else {
+			node_specific_cma_alloc = true;
+		}
+	}
+
+	/* Validate the CMA size again in case some invalid nodes specified. */
 	if (!hugetlb_cma_size)
 		return;
 
···
 		return;
 	}
 
-	/*
-	 * If 3 GB area is requested on a machine with 4 numa nodes,
-	 * let's allocate 1 GB on first three nodes and ignore the last one.
-	 */
-	per_node = DIV_ROUND_UP(hugetlb_cma_size, nr_online_nodes);
-	pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
-		hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
+	if (!node_specific_cma_alloc) {
+		/*
+		 * If 3 GB area is requested on a machine with 4 numa nodes,
+		 * let's allocate 1 GB on first three nodes and ignore the last one.
+		 */
+		per_node = DIV_ROUND_UP(hugetlb_cma_size, nr_online_nodes);
+		pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
+			hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
+	}
 
 	reserved = 0;
 	for_each_node_state(nid, N_ONLINE) {
 		int res;
 		char name[CMA_MAX_NAME];
 
-		size = min(per_node, hugetlb_cma_size - reserved);
+		if (node_specific_cma_alloc) {
+			if (hugetlb_cma_size_in_node[nid] == 0)
+				continue;
+
+			size = hugetlb_cma_size_in_node[nid];
+		} else {
+			size = min(per_node, hugetlb_cma_size - reserved);
+		}
+
 		size = round_up(size, PAGE_SIZE << order);
 
 		snprintf(name, sizeof(name), "hugetlb%d", nid);