Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/MCE/AMD: Clear DFR errors found in THR handler

AMD's MCA Thresholding feature counts errors of all severity levels, not
just correctable errors. If a deferred error causes the threshold limit
to be reached (it was the error that caused the overflow), then both a
deferred error interrupt and a thresholding interrupt will be triggered.

The order of the interrupts is not guaranteed. If the threshold
interrupt handler is executed first, then it will clear MCA_STATUS for
the error. It will not check or clear MCA_DESTAT, which also holds a copy
of the deferred error. When the deferred error interrupt handler runs, it
will not find an error in MCA_STATUS, but it will find the error in
MCA_DESTAT. This will cause the same error to be logged twice.

Check for deferred errors when handling a threshold interrupt. If a bank
contains a deferred error, then clear the bank's MCA_DESTAT register.

Define a new helper function to do the deferred error check and clearing
of MCA_DESTAT.

[ bp: Simplify, convert comment to passive voice. ]

Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220621155943.33623-1-yazen.ghannam@amd.com

Authored by Yazen Ghannam and committed by Borislav Petkov
bc1b705b 9abf2313

+20 -13
arch/x86/kernel/cpu/mce/amd.c
···
 	return status & MCI_STATUS_DEFERRED;
 }
 
+static bool _log_error_deferred(unsigned int bank, u32 misc)
+{
+	if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
+			     mca_msr_reg(bank, MCA_ADDR), misc))
+		return false;
+
+	/*
+	 * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
+	 * Return true here to avoid accessing these registers.
+	 */
+	if (!mce_flags.smca)
+		return true;
+
+	/* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
+	wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+	return true;
+}
+
 /*
  * We have three scenarios for checking for Deferred errors:
  *
···
  */
 static void log_error_deferred(unsigned int bank)
 {
-	bool defrd;
-
-	defrd = _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
-				mca_msr_reg(bank, MCA_ADDR), 0);
-
-	if (!mce_flags.smca)
+	if (_log_error_deferred(bank, 0))
 		return;
-
-	/* Clear MCA_DESTAT if we logged the deferred error from MCA_STATUS. */
-	if (defrd) {
-		wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
-		return;
-	}
 
 	/*
 	 * Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
···
 
 static void log_error_thresholding(unsigned int bank, u64 misc)
 {
-	_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), mca_msr_reg(bank, MCA_ADDR), misc);
+	_log_error_deferred(bank, misc);
 }
 
 static void log_and_reset_block(struct threshold_block *block)