Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sh: add sleazy FPU optimization

SH port of the sleazy-FPU feature currently implemented for some other
architectures, such as i386.

Right now the SH kernel has 100% lazy FPU behaviour.
This is of course great for applications that have very sporadic or no FPU use.
However, very frequent FPU users take an extra trap on every context switch.
The patch below adds a simple heuristic to this code: after 5 consecutive
context switches with FPU use, the lazy behaviour is disabled and the context
is restored on every context switch.
After 256 switches, the counter wraps and the 100% lazy behaviour returns.

Tests with LMbench showed no regression.
I saw a small improvement (~2%) due to the prefetching.

The tables below also show that, with this sleazy patch, the number of FPU
exceptions is indeed reduced.
To test this, I hacked LMbench's lat_ctx to use the FPU a little more.

Sleazy implementation
===========================================
switch_to calls            | 79326
sleazy calls               | 42577
do_fpu_state_restore calls | 59232
restore_fpu calls          | 59032

Exceptions: 0x800 (FPU disabled): 16604

100% lazy (default implementation)
===========================================
switch_to calls            | 79690
do_fpu_state_restore calls | 53299
restore_fpu calls          | 53101

Exceptions: 0x800 (FPU disabled): 53273

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Authored by: Giuseppe Cavallaro
Committed by: Paul Mundt
a0458b07 a8a8a669

+31 -4
arch/sh/include/asm/fpu.h (+3)

···
 struct task_struct;

 extern void save_fpu(struct task_struct *__tsk, struct pt_regs *regs);
+void fpu_state_restore(struct pt_regs *regs);
 #else

 #define release_fpu(regs)	do { } while (0)
···
 	preempt_disable();
 	if (test_tsk_thread_flag(tsk, TIF_USEDFPU))
 		save_fpu(tsk, regs);
+	else
+		tsk->fpu_counter = 0;
 	preempt_enable();
 }
arch/sh/kernel/cpu/sh4/fpu.c (+12 -4)

···
 		force_sig(SIGFPE, tsk);
 }

-BUILD_TRAP_HANDLER(fpu_state_restore)
+void fpu_state_restore(struct pt_regs *regs)
 {
 	struct task_struct *tsk = current;
-	TRAP_HANDLER_DECL;

 	grab_fpu(regs);
-	if (!user_mode(regs)) {
+	if (unlikely(!user_mode(regs))) {
 		printk(KERN_ERR "BUG: FPU is used in kernel mode.\n");
+		BUG();
 		return;
 	}

-	if (used_math()) {
+	if (likely(used_math())) {
 		/* Using the FPU again. */
 		restore_fpu(tsk);
 	} else {
···
 		set_used_math();
 	}
 	set_tsk_thread_flag(tsk, TIF_USEDFPU);
+	tsk->fpu_counter++;
+}
+
+BUILD_TRAP_HANDLER(fpu_state_restore)
+{
+	TRAP_HANDLER_DECL;
+
+	fpu_state_restore(regs);
 }
arch/sh/kernel/process_32.c (+16)

···
 __notrace_funcgraph struct task_struct *
 __switch_to(struct task_struct *prev, struct task_struct *next)
 {
+	struct thread_struct *next_t = &next->thread;
+
 #if defined(CONFIG_SH_FPU)
 	unlazy_fpu(prev, task_pt_regs(prev));
+
+	/* we're going to use this soon, after a few expensive things */
+	if (next->fpu_counter > 5)
+		prefetch(&next_t->fpu.hard);
 #endif

 #ifdef CONFIG_MMU
···
 		ctrl_outw(0, UBC_BBRB);
 #endif
 	}
+
+#if defined(CONFIG_SH_FPU)
+	/* If the task has used fpu the last 5 timeslices, just do a full
+	 * restore of the math state immediately to avoid the trap; the
+	 * chances of needing FPU soon are obviously high now
+	 */
+	if (next->fpu_counter > 5) {
+		fpu_state_restore(task_pt_regs(next));
+	}
+#endif

 	return prev;
 }