Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors

The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.

Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.
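
The check described above can be sketched as standalone userspace C. This is only an illustration, not the kernel code. The `demo_` enum and helpers are hypothetical stand-ins for `enum smca_bank_types` and `smca_get_bank_type()`, and the enum values are arbitrary. Only the ErrCodeExt extraction from status bits [20:16] and the `SMCA_UMC || SMCA_UMC_V2` comparison follow the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's enum smca_bank_types.
 * The numeric values are illustrative and do not match the kernel's.
 */
enum demo_bank_type {
	DEMO_SMCA_LS,
	DEMO_SMCA_UMC,
	DEMO_SMCA_UMC_V2,
};

/* Extract ErrCodeExt, bits [20:16] of the MCA status value. */
static uint8_t demo_xec(uint64_t status)
{
	return (status >> 16) & 0x1f;
}

/*
 * Treat both UMC bank types as memory controllers; an extended
 * error code of 0 marks the DRAM ECC error case, as in the patch.
 */
static bool demo_is_umc_memory_error(enum demo_bank_type type, uint64_t status)
{
	return (type == DEMO_SMCA_UMC || type == DEMO_SMCA_UMC_V2) &&
	       demo_xec(status) == 0x0;
}
```

With this sketch, a status value whose bits [20:16] are zero counts as a memory error for either UMC type, while any other bank type or a non-zero extended error code does not.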

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com

Authored by Yazen Ghannam and committed by Borislav Petkov (AMD)
c35977b0 f5e87cd5

+6 -3
+4 -2
arch/x86/kernel/cpu/mce/amd.c
···
 
 bool amd_mce_is_memory_error(struct mce *m)
 {
+	enum smca_bank_types bank_type;
 	/* ErrCodeExt[20:16] */
 	u8 xec = (m->status >> 16) & 0x1f;
 
+	bank_type = smca_get_bank_type(m->extcpu, m->bank);
 	if (mce_flags.smca)
-		return smca_get_bank_type(m->extcpu, m->bank) == SMCA_UMC && xec == 0x0;
+		return (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) && xec == 0x0;
 
 	return m->bank == 4 && xec == 0x8;
 }
···
 	if (bank_type >= N_SMCA_BANK_TYPES)
 		return NULL;
 
-	if (b && bank_type == SMCA_UMC) {
+	if (b && (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2)) {
 		if (b->block < ARRAY_SIZE(smca_umc_block_names))
 			return smca_umc_block_names[b->block];
 		return NULL;
+2 -1
drivers/edac/mce_amd.c
···
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
+	if ((bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) &&
+	    xec == 0 && decode_dram_ecc)
 		decode_dram_ecc(topology_die_id(m->extcpu), m);
 }
 