Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/mce: fix off-by-one errors in mce event handling

Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
save_mce_event, index got the value of mce_nest_count, and
mce_nest_count was incremented *after* index was set.

However, that patch changed the behaviour so that mce_nest_count was
incremented *before* setting index.

This causes an off-by-one error, as get_mce_event sets index as
mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads
bogus data, causing warnings like
"Machine Check Exception, Unknown event version 0 !"
and breaking MCE handling.

Restore the old behaviour and unbreak MCE handling by subtracting one
from the newly incremented value.
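
The effect can be reproduced outside the kernel with a small userspace
model. This is only a sketch: model_inc_return() stands in for
__this_cpu_inc_return() (increment, then return the new value),
MODEL_MAX_EVT is an arbitrary illustrative size, and every model_* name
is hypothetical rather than kernel API. The reader is reduced to the one
step described above, reading slot mce_nest_count - 1.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_EVT 4	/* illustrative size, not the kernel's value */

struct model_event {
	int version;	/* 0 means "never written" in this model */
};

static struct model_event model_mce_event[MODEL_MAX_EVT];
static int model_nest_count;

/* Stand-in for __this_cpu_inc_return(): increment, return the NEW value. */
static int model_inc_return(int *counter)
{
	return ++*counter;
}

static void model_reset(void)
{
	memset(model_mce_event, 0, sizeof(model_mce_event));
	model_nest_count = 0;
}

/* Broken behaviour: index is the already incremented count. */
static void model_save_buggy(int version)
{
	int index = model_inc_return(&model_nest_count);

	if (index < MODEL_MAX_EVT)
		model_mce_event[index].version = version;
}

/* Restored behaviour: subtract one to get the pre-increment value. */
static void model_save_fixed(int version)
{
	int index = model_inc_return(&model_nest_count) - 1;

	if (index < MODEL_MAX_EVT)
		model_mce_event[index].version = version;
}

/* Reader: looks at slot (count - 1), as get_mce_event is described to do. */
static int model_get_version(void)
{
	int index = model_nest_count - 1;

	assert(index >= 0 && index < MODEL_MAX_EVT);
	return model_mce_event[index].version;
}
```

After model_reset() and one model_save_buggy(1), the writer has filled
slot 1 while model_get_version() reads slot 0 and sees version 0, which
matches the "Unknown event version 0" warning. With model_save_fixed(1)
the writer and the reader agree on slot 0.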

The same broken change occurred in machine_check_queue_event (which sets
a queue read by machine_check_process_queued_event). Fix that too,
unbreaking printing of MCE information.
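
The queue case can be modelled the same way. Again this is a hedged
sketch, not the kernel code: the q_* names and Q_MAX_EVT are
hypothetical, and the consumer is assumed to drain from the top by
reading slot count - 1, which the text above does not spell out. The
producer keeps the "queue is full" check from the patched function.

```c
#include <assert.h>
#include <string.h>

#define Q_MAX_EVT 4	/* illustrative size, not the kernel's value */

static int q_slots[Q_MAX_EVT];
static int q_count;

/* Stand-in for __this_cpu_inc_return(): increment, return the NEW value. */
static int q_inc_return(int *counter)
{
	return ++*counter;
}

static void q_reset(void)
{
	memset(q_slots, 0, sizeof(q_slots));
	q_count = 0;
}

/*
 * Producer. With fixed == 0 the index is the incremented count (broken);
 * with fixed != 0 one is subtracted (restored). Returns 1 if queued,
 * 0 if the queue was considered full.
 */
static int q_queue_event(int value, int fixed)
{
	int index = q_inc_return(&q_count);

	if (fixed)
		index -= 1;

	/* If queue is full, undo the increment and give up. */
	if (index >= Q_MAX_EVT) {
		q_count--;
		return 0;
	}
	q_slots[index] = value;
	return 1;
}

/* Consumer (assumed shape): drain from the top, reading slot count - 1. */
static int q_process_queued(int *out, int max)
{
	int n = 0;

	assert(out != NULL);
	while (q_count > 0 && n < max) {
		out[n++] = q_slots[q_count - 1];
		q_count--;
	}
	return n;
}
```

In this model the broken producer never writes slot 0, so a single
queued event is read back as stale data, and the queue reports full one
entry early (Q_MAX_EVT - 1 events instead of Q_MAX_EVT).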

Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
CC: stable@vger.kernel.org
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
CC: Christoph Lameter <cl@linux.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Authored by Daniel Axtens and committed by Michael Ellerman
ffb2d78e 7b868e81

+2 -2
arch/powerpc/kernel/mce.c
@@ -73,7 +73,7 @@
 		    uint64_t nip, uint64_t addr)
 {
 	uint64_t srr1;
-	int index = __this_cpu_inc_return(mce_nest_count);
+	int index = __this_cpu_inc_return(mce_nest_count) - 1;
 	struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
 
 	/*
@@ -184,7 +184,7 @@
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return;
 
-	index = __this_cpu_inc_return(mce_queue_count);
+	index = __this_cpu_inc_return(mce_queue_count) - 1;
 	/* If queue is full, just return for now. */
 	if (index >= MAX_MC_EVT) {
 		__this_cpu_dec(mce_queue_count);