Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

EDAC/mce_amd: Add support for FRU text in MCA

A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).

The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
multiple devices, for example, a Unified Memory Controller (UMC) bank
that manages two DIMMs.
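
The decoding described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: the helper names bank_has_frutext() and decode_frutext() and the FRUTEXT_BIT and FRUTEXT_LEN macros are hypothetical, and the sketch assumes the register values have already been read. It treats the 16 bytes of MCA_SYND1 and MCA_SYND2 as raw text, in that order, and appends a terminating NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same bit position as MCI_CONFIG_FRUTEXT: MCA_CONFIG[9] (McaFruTextInMca). */
#define FRUTEXT_BIT	(UINT64_C(1) << 9)

/* Two 8-byte syndrome registers hold the text. */
#define FRUTEXT_LEN	16

/* Hypothetical helper: does this bank's MCA_CONFIG value advertise FRU text? */
bool bank_has_frutext(uint64_t mca_config)
{
	return (mca_config & FRUTEXT_BIT) != 0;
}

/*
 * Hypothetical helper: copy the raw bytes of SYND1, then SYND2, into out
 * and NUL-terminate the result. out must hold FRUTEXT_LEN + 1 bytes.
 */
void decode_frutext(uint64_t synd1, uint64_t synd2, char out[FRUTEXT_LEN + 1])
{
	assert(out != NULL);

	memcpy(&out[0], &synd1, sizeof(synd1));
	memcpy(&out[8], &synd2, sizeof(synd2));
	out[FRUTEXT_LEN] = '\0';
}
```

A caller would check bank_has_frutext() on the bank's MCA_CONFIG value first and only then pass the two syndrome values to decode_frutext(). Because the bytes are copied with memcpy() rather than extracted by shifting, the string comes out in the registers' in-memory byte order.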

[ Yazen: Add Avadhut as co-developer for wrapper changes. ]
[ bp: Do not expose MCA_CONFIG to userspace yet. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com

Authored by Yazen Ghannam and committed by Borislav Petkov (AMD)
612c2add e9876daf

+13 -6
+1
arch/x86/include/asm/mce.h
···
  * - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
+#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
+12 -6
drivers/edac/mce_amd.c
···
 	struct mce *m = (struct mce *)data;
 	struct mce_hw_err *err = to_mce_hw_err(m);
 	unsigned int fam = x86_family(m->cpuid);
+	u32 mca_config_lo = 0, dummy;
 	int ecc;
 
 	if (m->kflags & MCE_HANDLED_CEC)
···
 		((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+		rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, &dummy);
 
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config_lo & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
 
 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
···
 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+
+				frutext[16] = '\0';
+				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			}
 		}
 
 		pr_cont("\n");