Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[PATCH] cpusets: confine oom_killer to mem_exclusive cpuset

Now the real motivation for this cpuset mem_exclusive patch series seems
trivial.

This patch keeps a task in or under one mem_exclusive cpuset from provoking an
oom kill of a task under a non-overlapping mem_exclusive cpuset. Since only
interrupt and GFP_ATOMIC allocations are allowed to escape mem_exclusive
containment, there is little to gain from oom killing a task under a
non-overlapping mem_exclusive cpuset, as almost all kernel and user memory
allocation must come from disjoint memory nodes.

This patch enables configuring a system so that a runaway job under one
mem_exclusive cpuset cannot cause the killing of a job in another such cpuset
that might be using very high compute and memory resources for a prolonged
time.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Authored by Paul Jackson and committed by Linus Torvalds
ef08e3b4 9bf2229f

3 files changed, 44 insertions(+)
+6
include/linux/cpuset.h
···
 void cpuset_restrict_to_mems_allowed(unsigned long *nodes);
 int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
 extern int cpuset_zone_allowed(struct zone *z, unsigned int __nocast gfp_mask);
+extern int cpuset_excl_nodes_overlap(const struct task_struct *p);
 extern struct file_operations proc_cpuset_operations;
 extern char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);
 
···
 
 static inline int cpuset_zone_allowed(struct zone *z,
 					unsigned int __nocast gfp_mask)
+{
+	return 1;
+}
+
+static inline int cpuset_excl_nodes_overlap(const struct task_struct *p)
 {
 	return 1;
 }
+33
kernel/cpuset.c
···
 	return allowed;
 }
 
+/**
+ * cpuset_excl_nodes_overlap - Do we overlap @p's mem_exclusive ancestors?
+ * @p: pointer to task_struct of some other task.
+ *
+ * Description: Return true if the nearest mem_exclusive ancestor
+ * cpusets of tasks @p and current overlap. Used by oom killer to
+ * determine if task @p's memory usage might impact the memory
+ * available to the current task.
+ *
+ * Acquires cpuset_sem - not suitable for calling from a fast path.
+ **/
+
+int cpuset_excl_nodes_overlap(const struct task_struct *p)
+{
+	const struct cpuset *cs1, *cs2;	/* my and p's cpuset ancestors */
+	int overlap = 0;		/* do cpusets overlap? */
+
+	down(&cpuset_sem);
+	cs1 = current->cpuset;
+	if (!cs1)
+		goto done;		/* current task exiting */
+	cs2 = p->cpuset;
+	if (!cs2)
+		goto done;		/* task p is exiting */
+	cs1 = nearest_exclusive_ancestor(cs1);
+	cs2 = nearest_exclusive_ancestor(cs2);
+	overlap = nodes_intersects(cs1->mems_allowed, cs2->mems_allowed);
+done:
+	up(&cpuset_sem);
+
+	return overlap;
+}
+
 /*
  * proc_cpuset_show()
  * - Print tasks cpuset path into seq_file.
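The overlap test in this hunk can be tried outside the kernel with a small user-space model. The sketch below is an illustration only, not kernel code: `struct model_cpuset`, `model_nearest_exclusive_ancestor()` and `model_excl_nodes_overlap()` are hypothetical names, a plain `unsigned long` bitmask stands in for the kernel's node mask, the ancestor walk is assumed to stop at the first mem_exclusive cpuset or at the root, and the `cpuset_sem` locking is omitted.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct cpuset. */
struct model_cpuset {
	const struct model_cpuset *parent;	/* NULL for the root cpuset */
	int mem_exclusive;			/* nonzero if mem_exclusive is set */
	unsigned long mems_allowed;		/* bit n set: memory node n allowed */
};

/*
 * Assumed behaviour of the ancestor walk: climb until a mem_exclusive
 * cpuset is found, or stop at the root if none is.
 */
static const struct model_cpuset *
model_nearest_exclusive_ancestor(const struct model_cpuset *cs)
{
	while (!cs->mem_exclusive && cs->parent)
		cs = cs->parent;
	return cs;
}

/*
 * Model of the overlap test: nonzero if the nearest mem_exclusive
 * ancestors of the two cpusets share at least one memory node.
 * A NULL cpuset models an exiting task and reports no overlap,
 * as in the patch.
 */
static int model_excl_nodes_overlap(const struct model_cpuset *mine,
				    const struct model_cpuset *theirs)
{
	if (!mine || !theirs)
		return 0;
	mine = model_nearest_exclusive_ancestor(mine);
	theirs = model_nearest_exclusive_ancestor(theirs);
	return (mine->mems_allowed & theirs->mems_allowed) != 0;
}
```

Under these assumptions, a task in a cpuset that has no mem_exclusive ancestor below the root resolves to the root cpuset and therefore overlaps every other task, so it stays eligible for an oom kill.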
+5
mm/oom_kill.c
···
 #include <linux/swap.h>
 #include <linux/timex.h>
 #include <linux/jiffies.h>
+#include <linux/cpuset.h>
 
 /* #define DEBUG */
 
···
 			continue;
 		if (p->oomkilladj == OOM_DISABLE)
 			continue;
+		/* If p's nodes don't overlap ours, it won't help to kill p. */
+		if (!cpuset_excl_nodes_overlap(p))
+			continue;
+
 		/*
 		 * This is in the process of releasing memory so for wait it
 		 * to finish before killing some other task by mistake.
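The effect of the new `continue` on victim selection can be sketched in user space. This is a simplified model under stated assumptions, not the kernel's selection loop: `struct model_task` and `model_select_victim()` are hypothetical, `points` stands in for whatever score the oom killer assigns, and `excl_nodes` is assumed to hold the precomputed node mask of each task's nearest mem_exclusive ancestor cpuset.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a candidate task. */
struct model_task {
	const char *name;
	unsigned long points;		/* stand-in for the oom score */
	unsigned long excl_nodes;	/* nodes of nearest mem_exclusive ancestor */
	int oom_disabled;		/* models p->oomkilladj == OOM_DISABLE */
};

/*
 * Pick the highest-scoring task that is allowed to be killed and whose
 * exclusive nodes overlap those of the allocating task. Returns NULL
 * if no task qualifies.
 */
static const struct model_task *
model_select_victim(const struct model_task *tasks, size_t ntasks,
		    unsigned long current_excl_nodes)
{
	const struct model_task *chosen = NULL;
	size_t i;

	for (i = 0; i < ntasks; i++) {
		const struct model_task *p = &tasks[i];

		if (p->oom_disabled)
			continue;
		/* Killing p frees nothing on our nodes, so skip it. */
		if (!(p->excl_nodes & current_excl_nodes))
			continue;
		if (!chosen || p->points > chosen->points)
			chosen = p;
	}
	return chosen;
}
```

In this model, a large job confined to nodes 2-3 is never chosen when the allocating task is confined to nodes 0-1, however high its score. That is the isolation the commit message describes.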