Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

RAS/AMD/FMPM: Safely handle saved records of various sizes

Currently, the size of the locally cached FRU record structures is
based on the module parameter "max_nr_entries".

This creates issues when restoring records if a user changes the
parameter.

If the number of entries is reduced, then old, larger records will not
be restored. The opportunity to take action on the saved data is missed.
Also, new records will be created and written to storage, even as the old
records remain in storage, resulting in wasted space.

If the number of entries is increased, then the length of the old,
smaller records will not be adjusted. This causes a checksum failure,
which leads to the old record being cleared from storage. Again, the
opportunity to take action on the saved data is missed.

Allocate the temporary record with the maximum possible size based on
the current maximum number of supported entries (255). This allows the
ERST read operation to succeed if max_nr_entries has been increased.
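
The sizing argument above can be sketched in plain user-space C. The header and descriptor sizes below are made-up placeholders (the real values are sizeof(struct fru_rec) and sizeof(struct cper_fru_poison_desc)), and the sketch_* names are hypothetical. Only the 255-entry ceiling comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder sizes for illustration only, not the real structure sizes. */
#define SKETCH_HDR_LEN     200u	/* assumed stand-in for sizeof(struct fru_rec) */
#define SKETCH_DESC_LEN    32u	/* assumed stand-in for one descriptor entry */
#define SKETCH_MAX_ENTRIES 255u	/* maximum number of supported entries */

/* Total record length for a given number of descriptor entries. */
static size_t sketch_rec_len(unsigned int nr_entries)
{
	assert(nr_entries <= SKETCH_MAX_ENTRIES);
	return SKETCH_HDR_LEN + (size_t)SKETCH_DESC_LEN * nr_entries;
}

/* Nonzero if a record saved with saved_entries fits a buffer sized for buf_entries. */
static int sketch_fits(unsigned int saved_entries, unsigned int buf_entries)
{
	return sketch_rec_len(saved_entries) <= sketch_rec_len(buf_entries);
}
```

With a buffer sized for SKETCH_MAX_ENTRIES, sketch_fits() holds for any valid saved record, whereas a buffer sized for the current parameter value can be too small for a record saved under a larger one.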

Warn the user if a saved record exceeds the expected size and fail to
load the module. This allows the user to adjust the module parameter
without losing data or the opportunity to restore larger records.
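
As a rough illustration of the oversize check, the entry count to suggest can be derived from the length of the saved record. This is a user-space sketch with placeholder sizes and hypothetical sketch_* names. The return value -1 stands in for the error that makes the module load fail.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder sizes for illustration only, not the real structure sizes. */
#define SKETCH_HDR_LEN  200u	/* assumed stand-in for sizeof(struct fru_rec) */
#define SKETCH_DESC_LEN 32u	/* assumed stand-in for one descriptor entry */

/* Number of descriptor entries implied by a saved record of len bytes. */
static unsigned int sketch_saved_nr_entries(size_t len)
{
	assert(len >= SKETCH_HDR_LEN);
	return (unsigned int)((len - SKETCH_HDR_LEN) / SKETCH_DESC_LEN);
}

/*
 * Return 0 if the saved record fits the current maximum length.
 * Otherwise return -1 and report how many entries the parameter
 * would need to cover the saved record.
 */
static int sketch_check_saved_len(size_t len, size_t max_rec_len,
				  unsigned int *needed)
{
	if (len <= max_rec_len)
		return 0;

	*needed = sketch_saved_nr_entries(len);
	return -1;
}
```

Failing early, rather than skipping the record, leaves the saved data in storage so that it can still be restored once the parameter is raised.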

Increase the size of a saved record up to the current max_rec_len. The
checksum will be recalculated, and the updated record will be written to
storage.
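
The grow-and-recalculate step can be sketched as follows. The record layout and the additive checksum (stored as a complement so that the covered words sum to zero) are assumptions made for this example and do not reproduce struct fru_rec or the driver's checksum routine. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_WORDS 16u

/* Hypothetical record layout, for illustration only. */
struct sketch_rec {
	uint32_t record_length;			/* payload bytes in use */
	uint32_t checksum;			/* complement of the covered sum */
	uint32_t payload[SKETCH_MAX_WORDS];
};

/* Sum the length field and every payload word the length covers. */
static uint32_t sketch_sum(const struct sketch_rec *rec)
{
	uint32_t nr_words = rec->record_length / (uint32_t)sizeof(uint32_t);
	uint32_t sum = rec->record_length;
	uint32_t i;

	assert(nr_words <= SKETCH_MAX_WORDS);
	for (i = 0; i < nr_words; i++)
		sum += rec->payload[i];

	return sum;
}

static void sketch_update_checksum(struct sketch_rec *rec)
{
	rec->checksum = 0u - sketch_sum(rec);
}

static int sketch_is_valid(const struct sketch_rec *rec)
{
	return (uint32_t)(sketch_sum(rec) + rec->checksum) == 0u;
}

/* Grow a saved, smaller record to new_len, then refresh the checksum. */
static void sketch_grow(struct sketch_rec *rec, uint32_t new_len)
{
	if (rec->record_length && rec->record_length < new_len)
		rec->record_length = new_len;	/* keep everything else as-is */

	sketch_update_checksum(rec);
}
```

In this model, changing the length field without refreshing the checksum makes sketch_is_valid() fail, which is why the length update and the checksum update belong together.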

Fixes: 6f15e617cc99 ("RAS: Introduce a FRU memory poison manager")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Muralidhara M K <muralidhara.mk@amd.com>
Link: https://lore.kernel.org/r/20240319113322.280096-3-yazen.ghannam@amd.com

Authored by Yazen Ghannam and committed by Borislav Petkov (AMD)
9b195439 4b0e527c

+37 -18
drivers/ras/amd/fmpm.c
···
 /* Total length of record including headers and list of descriptor entries. */
 static size_t max_rec_len;
 
+#define FMPM_MAX_REC_LEN (sizeof(struct fru_rec) + (sizeof(struct cper_fru_poison_desc) * 255))
+
 /* Total number of SPA entries across all FRUs. */
 static unsigned int spa_nr_entries;
 
···
 	struct cper_section_descriptor *sec_desc = &rec->sec_desc;
 	struct cper_record_header *hdr = &rec->hdr;
 
+	/*
+	 * This is a saved record created with fewer max_nr_entries.
+	 * Update the record lengths and keep everything else as-is.
+	 */
+	if (hdr->record_length && hdr->record_length < max_rec_len) {
+		pr_debug("Growing record 0x%016llx from %u to %zu bytes\n",
+			 hdr->record_id, hdr->record_length, max_rec_len);
+		goto update_lengths;
+	}
+
 	memcpy(hdr->signature, CPER_SIG_RECORD, CPER_SIG_SIZE);
 	hdr->revision = CPER_RECORD_REV;
 	hdr->signature_end = CPER_SIG_END;
···
 	hdr->error_severity = CPER_SEV_RECOVERABLE;
 
 	hdr->validation_bits = 0;
-	hdr->record_length = max_rec_len;
 	hdr->creator_id = CPER_CREATOR_FMP;
 	hdr->notification_type = CPER_NOTIFY_MCE;
 	hdr->record_id = cper_next_record_id();
 	hdr->flags = CPER_HW_ERROR_FLAGS_PREVERR;
 
 	sec_desc->section_offset = sizeof(struct cper_record_header);
-	sec_desc->section_length = max_rec_len - sizeof(struct cper_record_header);
 	sec_desc->revision = CPER_SEC_REV;
 	sec_desc->validation_bits = 0;
 	sec_desc->flags = CPER_SEC_PRIMARY;
 	sec_desc->section_type = CPER_SECTION_TYPE_FMP;
 	sec_desc->section_severity = CPER_SEV_RECOVERABLE;
+
+update_lengths:
+	hdr->record_length = max_rec_len;
+	sec_desc->section_length = max_rec_len - sizeof(struct cper_record_header);
 }
 
 static int save_new_records(void)
···
 	int ret = 0;
 
 	for_each_fru(i, rec) {
-		if (rec->hdr.record_length)
+		/* No need to update saved records that match the current record size. */
+		if (rec->hdr.record_length == max_rec_len)
 			continue;
+
+		if (!rec->hdr.record_length)
+			set_bit(i, new_records);
 
 		set_rec_fields(rec);
 
 		ret = update_record_on_storage(rec);
 		if (ret)
 			goto out_clear;
-
-		set_bit(i, new_records);
 	}
 
 	return ret;
···
 	int ret, pos;
 	ssize_t len;
 
-	/*
-	 * Assume saved records match current max size.
-	 *
-	 * However, this may not be true depending on module parameters.
-	 */
-	old = kmalloc(max_rec_len, GFP_KERNEL);
+	old = kmalloc(FMPM_MAX_REC_LEN, GFP_KERNEL);
 	if (!old) {
 		ret = -ENOMEM;
 		goto out;
···
 		 * Make sure to clear temporary buffer between reads to avoid
 		 * leftover data from records of various sizes.
 		 */
-		memset(old, 0, max_rec_len);
+		memset(old, 0, FMPM_MAX_REC_LEN);
 
-		len = erst_read_record(record_id, &old->hdr, max_rec_len,
+		len = erst_read_record(record_id, &old->hdr, FMPM_MAX_REC_LEN,
 				       sizeof(struct fru_rec), &CPER_CREATOR_FMP);
 		if (len < 0)
 			continue;
-
-		if (len > max_rec_len) {
-			pr_debug("Found record larger than max_rec_len\n");
-			continue;
-		}
 
 		new = get_valid_record(old);
 		if (!new) {
 			erst_clear(record_id);
 			continue;
+		}
+
+		if (len > max_rec_len) {
+			unsigned int saved_nr_entries;
+
+			saved_nr_entries = len - sizeof(struct fru_rec);
+			saved_nr_entries /= sizeof(struct cper_fru_poison_desc);
+
+			pr_warn("Saved record found with %u entries.\n", saved_nr_entries);
+			pr_warn("Please increase max_nr_entries to %u.\n", saved_nr_entries);
+
+			ret = -EINVAL;
+			goto out_end;
 		}
 
 		/* Restore the record */