Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

s390/mm: prevent and break zero page mappings in case of storage keys

As soon as storage keys are enabled we need to stop working on zero page
mappings to prevent inconsistencies between storage keys and pgste.

Otherwise the following data corruption could happen:

1) guest enables storage keys
2) guest sets storage key for unmapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for unmapped page Y (same logic as above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will go to storage key for zero page,
      overwriting storage key for X

While holding mmap_sem for writing, we are safe against concurrent changes
to entries we have already fixed, as every fault would need to take
mmap_sem (read).

Other vCPUs executing storage key instructions will get a one-time
interception and will also be serialized on mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Authored by Dominik Dingel, committed by Martin Schwidefsky
2faee8ff 593befa6

+17 -1 total

arch/s390/include/asm/pgtable.h (+5 -0)
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -479,6 +479,11 @@
 	return 0;
 }
 
+/*
+ * In the case that a guest uses storage keys
+ * faults should no longer be backed by zero pages
+ */
+#define mm_forbids_zeropage mm_use_skey
 static inline int mm_use_skey(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
arch/s390/mm/pgtable.c (+12 -1)
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1256,6 +1256,15 @@
 	pgste_t pgste;
 
 	pgste = pgste_get_lock(pte);
+	/*
+	 * Remove all zero page mappings,
+	 * after establishing a policy to forbid zero page mappings
+	 * following faults for that page will get fresh anonymous pages
+	 */
+	if (is_zero_pfn(pte_pfn(*pte))) {
+		ptep_flush_direct(walk->mm, addr, pte);
+		pte_val(*pte) = _PAGE_INVALID;
+	}
 	/* Clear storage key */
 	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
 			      PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1274,9 +1283,11 @@
 	down_write(&mm->mmap_sem);
 	if (mm_use_skey(mm))
 		goto out_up;
+
+	mm->context.use_skey = 1;
+
 	walk.mm = mm;
 	walk_page_range(0, TASK_SIZE, &walk);
-	mm->context.use_skey = 1;
 
 out_up:
 	up_write(&mm->mmap_sem);