Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched: Fix nohz load accounting -- again!

Various people reported nohz load tracking still being wrecked, but Doug
spotted the actual problem. We fold the nohz remainder in too soon,
causing us to lose samples and under-account.

So instead of playing catch-up up-front, always do a single load-fold
with whatever state we encounter and only then fold the nohz remainder
and play catch-up.

Reported-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Lesław Kopeć <leslaw.kopec@nasza-klasa.pl>
Reported-by: Aman Gupta <aman@tmm1.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-4v31etnhgg9kwd6ocgx3rxl8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

authored by Peter Zijlstra, committed by Ingo Molnar
c308b56b 8e3fabfd

+27 -28
kernel/sched/core.c
···
  * Once we've updated the global active value, we need to apply the exponential
  * weights adjusted to the number of cycles missed.
  */
-static void calc_global_nohz(unsigned long ticks)
+static void calc_global_nohz(void)
 {
 	long delta, active, n;
-
-	if (time_before(jiffies, calc_load_update))
-		return;
 
 	/*
 	 * If we crossed a calc_load_update boundary, make sure to fold
···
 	atomic_long_add(delta, &calc_load_tasks);
 
 	/*
-	 * If we were idle for multiple load cycles, apply them.
+	 * It could be the one fold was all it took, we done!
 	 */
-	if (ticks >= LOAD_FREQ) {
-		n = ticks / LOAD_FREQ;
-
-		active = atomic_long_read(&calc_load_tasks);
-		active = active > 0 ? active * FIXED_1 : 0;
-
-		avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
-		avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
-		avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
-
-		calc_load_update += n * LOAD_FREQ;
-	}
+	if (time_before(jiffies, calc_load_update + 10))
+		return;
 
 	/*
-	 * Its possible the remainder of the above division also crosses
-	 * a LOAD_FREQ period, the regular check in calc_global_load()
-	 * which comes after this will take care of that.
-	 *
-	 * Consider us being 11 ticks before a cycle completion, and us
-	 * sleeping for 4*LOAD_FREQ + 22 ticks, then the above code will
-	 * age us 4 cycles, and the test in calc_global_load() will
-	 * pick up the final one.
+	 * Catch-up, fold however many we are behind still
 	 */
+	delta = jiffies - calc_load_update - 10;
+	n = 1 + (delta / LOAD_FREQ);
+
+	active = atomic_long_read(&calc_load_tasks);
+	active = active > 0 ? active * FIXED_1 : 0;
+
+	avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
+	avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
+	avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
+
+	calc_load_update += n * LOAD_FREQ;
 }
 #else
 void calc_load_account_idle(struct rq *this_rq)
···
 	return 0;
 }
 
-static void calc_global_nohz(unsigned long ticks)
+static void calc_global_nohz(void)
 {
 }
 #endif
···
 {
 	long active;
 
-	calc_global_nohz(ticks);
-
 	if (time_before(jiffies, calc_load_update + 10))
 		return;
···
 	avenrun[2] = calc_load(avenrun[2], EXP_15, active);
 
 	calc_load_update += LOAD_FREQ;
+
+	/*
+	 * Account one period with whatever state we found before
+	 * folding in the nohz state and ageing the entire idle period.
+	 *
+	 * This avoids loosing a sample when we go idle between
+	 * calc_load_account_active() (10 ticks ago) and now and thus
+	 * under-accounting.
+	 */
+	calc_global_nohz();
 }
 
 /*
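The new catch-up arithmetic counts one fold for the boundary just crossed,
plus one per whole LOAD_FREQ interval slept through. A minimal sketch of just
that counting (the HZ value and the helper name catchup_periods are
illustrative assumptions, not kernel code; the function is only meaningful
once jiffies has passed calc_load_update + 10, mirroring the patch's early
return):

```c
#include <assert.h>

#define HZ        1000
#define LOAD_FREQ (5 * HZ + 1)   /* one load sample every ~5 seconds */

/*
 * Hypothetical helper mirroring the patch: delta is how far past the
 * calc_load_update + 10 deadline we woke up, and the result is 1 fold
 * for the period just closed plus one per additional whole LOAD_FREQ
 * period missed while idle.
 */
static unsigned int catchup_periods(unsigned long jiffies_now,
                                    unsigned long calc_load_update)
{
	unsigned long delta = jiffies_now - calc_load_update - 10;

	return 1 + (delta / LOAD_FREQ);
}
```

So waking exactly at the deadline yields a single fold, and each further full
period adds one more, which calc_load_n() then applies in one step.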