[PATCH] cpuset: top_cpuset tracks hotplug changes to cpu_online_map

Change the list of cpus allowed to tasks in the top (root) cpuset to
dynamically track what cpus are online, using a CPU hotplug notifier. Make
this top cpus file read-only.

On systems that have cpusets configured in their kernel, but that aren't
actively using cpusets (for some distros, this covers the majority of
systems) all tasks end up in the top cpuset.

If that system does support CPU hotplug, then these tasks cannot make use
of CPUs that are added after system boot, because the CPUs are not allowed
in the top cpuset. This is a surprising regression over earlier kernels
that didn't have cpusets enabled.

In order to keep the behaviour of cpusets consistent between systems
actively making use of them and systems not using them, this patch changes
the behaviour of the 'cpus' file in the top (root) cpuset, making it read
only, and making it automatically track the value of cpu_online_map. Thus
tasks in the top cpuset will have automatic use of hot plugged CPUs allowed
by their cpuset.
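
The behaviour described above can be sketched as a small userspace model. This is an illustration only, not kernel code: `cpumask_model_t`, `struct cpuset_model` and every `model_*` name are hypothetical stand-ins, a 64-bit word replaces the kernel's `cpumask_t`, and the validation and locking done by the real `update_cpumask()` are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, not the kernel's cpumask_t. */
typedef uint64_t cpumask_model_t;

struct cpuset_model {
	cpumask_model_t cpus_allowed;
};

/* Stand-ins for the kernel's cpu_online_map and top_cpuset. */
static cpumask_model_t cpu_online_map_model = 0x3;	/* CPUs 0-1 online at boot */
static struct cpuset_model top_cpuset_model;

/* Mirrors cpuset_init_smp(): the top cpuset starts out as the online map. */
static void model_init(void)
{
	top_cpuset_model.cpus_allowed = cpu_online_map_model;
}

/*
 * Mirrors the check added to update_cpumask(): a write to the top
 * cpuset is refused. Other cpusets accept the new mask (the real
 * function also parses and validates it, which this model skips).
 */
static int model_update_cpumask(struct cpuset_model *cs,
				cpumask_model_t new_mask)
{
	if (cs == &top_cpuset_model)
		return -EACCES;
	cs->cpus_allowed = new_mask;
	return 0;
}

/* Mirrors cpuset_handle_cpuhp(): re-copy the online map on any event. */
static void model_handle_cpuhp(void)
{
	top_cpuset_model.cpus_allowed = cpu_online_map_model;
}

/* Simulate bringing a CPU online and delivering the notification. */
static void model_cpu_up(unsigned int cpu)
{
	assert(cpu < 64);
	cpu_online_map_model |= (cpumask_model_t)1 << cpu;
	model_handle_cpuhp();
}
```

In this model, calling `model_init()` and then `model_cpu_up(2)` leaves the top cpuset allowing CPUs 0-2, while `model_update_cpumask(&top_cpuset_model, ...)` always returns `-EACCES` and leaves the mask untouched.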

Thanks to Anton Blanchard and Nathan Lynch for reporting this problem,
driving the fix, and providing earlier versions of this patch.

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Nathan Lynch <ntl@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

authored by Paul Jackson and committed by Linus Torvalds 4c4d50f7 6394cca5

2 files changed, 39 insertions(+)

Documentation/cpusets.txt (+6)
···
 to represent the cpuset hierarchy provides for a familiar permission
 and name space for cpusets, with a minimum of additional kernel code.
 
+The cpus file in the root (top_cpuset) cpuset is read-only.
+It automatically tracks the value of cpu_online_map, using a CPU
+hotplug notifier. If and when memory nodes can be hotplugged,
+we expect to make the mems file in the root cpuset read-only
+as well, and have it track the value of node_online_map.
+
 
 1.4 What are exclusive cpusets ?
 --------------------------------
kernel/cpuset.c (+33)
···
 	struct cpuset trialcs;
 	int retval, cpus_unchanged;
 
+	/* top_cpuset.cpus_allowed tracks cpu_online_map; it's read-only */
+	if (cs == &top_cpuset)
+		return -EACCES;
+
 	trialcs = *cs;
 	retval = cpulist_parse(buf, trialcs.cpus_allowed);
 	if (retval < 0)
···
 	return err;
 }
 
+/*
+ * The top_cpuset tracks what CPUs and Memory Nodes are online,
+ * period. This is necessary in order to make cpusets transparent
+ * (of no affect) on systems that are actively using CPU hotplug
+ * but making no active use of cpusets.
+ *
+ * This handles CPU hotplug (cpuhp) events. If someday Memory
+ * Nodes can be hotplugged (dynamically changing node_online_map)
+ * then we should handle that too, perhaps in a similar way.
+ */
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int cpuset_handle_cpuhp(struct notifier_block *nb,
+				unsigned long phase, void *cpu)
+{
+	mutex_lock(&manage_mutex);
+	mutex_lock(&callback_mutex);
+
+	top_cpuset.cpus_allowed = cpu_online_map;
+
+	mutex_unlock(&callback_mutex);
+	mutex_unlock(&manage_mutex);
+
+	return 0;
+}
+#endif
+
 /**
  * cpuset_init_smp - initialize cpus_allowed
  *
···
 {
 	top_cpuset.cpus_allowed = cpu_online_map;
 	top_cpuset.mems_allowed = node_online_map;
+
+	hotcpu_notifier(cpuset_handle_cpuhp, 0);
 }
 
 /**
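
One way to observe the new read-only behaviour from userspace is to attempt a write to the top cpuset's cpus file and inspect the error. The helper below is a hedged sketch: `try_write_cpus` is a hypothetical name, and the path passed to it (for example `/dev/cpuset/cpus`, assuming the cpuset filesystem is mounted at `/dev/cpuset`) is an assumption about the local setup. With this patch applied, the `-EACCES` returned by `update_cpumask()` should surface as a failed `write()`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to write a CPU list (e.g. "0-1") to a cpuset 'cpus' file.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper for illustration; not part of the patch.
 */
static int try_write_cpus(const char *path, const char *cpulist)
{
	assert(path != NULL && cpulist != NULL);

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* The kernel-side check runs at write time, not at open time. */
	ssize_t n = write(fd, cpulist, strlen(cpulist));
	int err = (n < 0) ? -errno : 0;

	close(fd);
	return err;
}
```

On a patched kernel, `try_write_cpus("/dev/cpuset/cpus", "0")` would be expected to return `-EACCES`, while the same call on a child cpuset's cpus file can still succeed if the requested list is valid.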