Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER

Add a new LANDLOCK_ACCESS_FS_REFER access right to enable policy writers
to allow sandboxed processes to link and rename files from and to a
specific set of file hierarchies. This access right should be composed
with LANDLOCK_ACCESS_FS_MAKE_* for the destination of a link or rename,
and with LANDLOCK_ACCESS_FS_REMOVE_* for a source of a rename. This
lifts a Landlock limitation that always denied changing the parent of an
inode.

Renaming or linking to the same directory is still always allowed,
whether LANDLOCK_ACCESS_FS_REFER is used or not, because it is not
considered a threat to user data.

However, creating multiple links or renaming to a different parent
directory may lead to privilege escalations if not handled properly.
Indeed, we must be sure that the source doesn't gain more privileges by
being accessible from the destination. This is handled by making sure
that the destination hierarchy restricts at least as much as the source
hierarchy (including the referenced file or directory itself). If this
is not the case, an EXDEV error is returned, which lets user space fall
back to copying the file hierarchy instead of moving or linking it.
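
As an illustration only (not part of this patch), user space could react
to EXDEV roughly as sketched below. ex_move_file() and ex_copy_file()
are hypothetical helpers, the sketch handles regular files only, and it
assumes the sandbox still allows reading the source and creating the
destination.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copies the content of a regular file to a new file. */
static int ex_copy_file(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t len;
	int ret = -1;
	const int in = open(src, O_RDONLY | O_CLOEXEC);
	int out;

	if (in < 0)
		return -1;
	/* O_EXCL: unlike rename(2), this sketch never replaces @dst. */
	out = open(dst, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
	if (out < 0) {
		close(in);
		return -1;
	}
	while ((len = read(in, buf, sizeof(buf))) > 0) {
		ssize_t done = 0;

		while (done < len) {
			const ssize_t written =
				write(out, buf + done, len - done);

			if (written < 0)
				goto cleanup;
			done += written;
		}
	}
	if (len == 0)
		ret = 0;
cleanup:
	close(in);
	close(out);
	if (ret)
		unlink(dst); /* Do not leave a partial copy behind. */
	return ret;
}

/* Hypothetical helper: renames, or copies then unlinks on EXDEV. */
static int ex_move_file(const char *src, const char *dst)
{
	assert(src && dst);
	if (rename(src, dst) == 0)
		return 0;
	/* Any error other than EXDEV is treated as final. */
	if (errno != EXDEV)
		return -1;
	if (ex_copy_file(src, dst))
		return -1;
	return unlink(src);
}
```

Calling ex_move_file("a/file", "b/file") would then behave like a plain
rename when reparenting is allowed, and degrade to a copy when the
kernel answers EXDEV. This mirrors what tools such as mv(1) already do
for cross-mount renames.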

Instead of creating different access rights for the source and the
destination, we choose to make it simple and consistent for users.
Indeed, considering the previous constraint, it would be weird to
require such destination access right to be also granted to the source
(to make it a superset). Moreover, RENAME_EXCHANGE would also add to
the confusion because of paths being both a source and a destination.

See the provided documentation for additional details.

New tests are provided in a following commit.

Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220506161102.525323-8-mic@digikod.net

+558 -82
+25 -2
include/uapi/linux/landlock.h
···
 	/**
 	 * @handled_access_fs: Bitmask of actions (cf. `Filesystem flags`_)
 	 * that is handled by this ruleset and should then be forbidden if no
-	 * rule explicitly allow them. This is needed for backward
-	 * compatibility reasons.
+	 * rule explicitly allow them: it is a deny-by-default list that should
+	 * contain as much Landlock access rights as possible. Indeed, all
+	 * Landlock filesystem access rights that are not part of
+	 * handled_access_fs are allowed. This is needed for backward
+	 * compatibility reasons. One exception is the
+	 * LANDLOCK_ACCESS_FS_REFER access right, which is always implicitly
+	 * handled, but must still be explicitly handled to add new rules with
+	 * this access right.
 	 */
 	__u64 handled_access_fs;
 };
···
  * - %LANDLOCK_ACCESS_FS_MAKE_FIFO: Create (or rename or link) a named pipe.
  * - %LANDLOCK_ACCESS_FS_MAKE_BLOCK: Create (or rename or link) a block device.
  * - %LANDLOCK_ACCESS_FS_MAKE_SYM: Create (or rename or link) a symbolic link.
+ * - %LANDLOCK_ACCESS_FS_REFER: Link or rename a file from or to a different
+ *   directory (i.e. reparent a file hierarchy). This access right is
+ *   available since the second version of the Landlock ABI. This is also the
+ *   only access right which is always considered handled by any ruleset in
+ *   such a way that reparenting a file hierarchy is always denied by default.
+ *   To avoid privilege escalation, it is not enough to add a rule with this
+ *   access right. When linking or renaming a file, the destination directory
+ *   hierarchy must also always have the same or a superset of restrictions of
+ *   the source hierarchy. If it is not the case, or if the domain doesn't
+ *   handle this access right, such actions are denied by default with errno
+ *   set to EXDEV. Linking also requires a LANDLOCK_ACCESS_FS_MAKE_* access
+ *   right on the destination directory, and renaming also requires a
+ *   LANDLOCK_ACCESS_FS_REMOVE_* access right on the source's (file or
+ *   directory) parent. Otherwise, such actions are denied with errno set to
+ *   EACCES. The EACCES errno prevails over EXDEV to let user space
+ *   efficiently deal with an unrecoverable error.
  *
  * .. warning::
  *
···
 #define LANDLOCK_ACCESS_FS_MAKE_FIFO			(1ULL << 10)
 #define LANDLOCK_ACCESS_FS_MAKE_BLOCK			(1ULL << 11)
 #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
+#define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
 /* clang-format on */

 #endif /* _UAPI_LINUX_LANDLOCK_H */
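
The composition rule documented above can be sketched as two small
helpers that compute which rights a rule would need on each side of a
link or rename. This is an illustration under stated assumptions, not
code from this patch: the EX_* constants mirror the UAPI bit values so
that the sketch stands alone, the helper names are hypothetical, and
replacing an existing destination (which additionally needs a REMOVE_*
right there) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the UAPI bit values, to keep the sketch standalone. */
#define EX_ACCESS_FS_REMOVE_DIR (1ULL << 4)
#define EX_ACCESS_FS_REMOVE_FILE (1ULL << 5)
#define EX_ACCESS_FS_MAKE_DIR (1ULL << 7)
#define EX_ACCESS_FS_MAKE_REG (1ULL << 8)
#define EX_ACCESS_FS_REFER (1ULL << 13)

/* Hypothetical helper: rights needed on the source's parent hierarchy. */
static uint64_t ex_source_access(const bool is_rename, const bool is_dir)
{
	uint64_t access = EX_ACCESS_FS_REFER;

	/* Only a rename removes the source; a link leaves it in place. */
	if (is_rename)
		access |= is_dir ? EX_ACCESS_FS_REMOVE_DIR :
				   EX_ACCESS_FS_REMOVE_FILE;
	return access;
}

/* Hypothetical helper: rights needed on the destination hierarchy. */
static uint64_t ex_destination_access(const bool is_dir)
{
	const uint64_t access = EX_ACCESS_FS_REFER |
		(is_dir ? EX_ACCESS_FS_MAKE_DIR : EX_ACCESS_FS_MAKE_REG);

	/* Reparenting always involves the refer right on both sides. */
	assert(access & EX_ACCESS_FS_REFER);
	return access;
}
```

These masks are necessary but not sufficient: as the documentation
states, the destination hierarchy must also carry at least the same
restrictions as the source hierarchy, otherwise the action still fails
with EXDEV.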
+528 -76
security/landlock/fs.c
···
  *
  * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
  * Copyright © 2018-2020 ANSSI
+ * Copyright © 2021-2022 Microsoft Corporation
  */

 #include <linux/atomic.h>
···
 		unlikely(IS_PRIVATE(d_backing_inode(dentry))));
 }

-static int check_access_path(const struct landlock_ruleset *const domain,
-			     const struct path *const path,
-			     const access_mask_t access_request)
+static inline access_mask_t
+get_handled_accesses(const struct landlock_ruleset *const domain)
 {
-	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
-	bool allowed = false, has_access = false;
-	struct path walker_path;
-	size_t i;
+	access_mask_t access_dom = 0;
+	unsigned long access_bit;

+	for (access_bit = 0; access_bit < LANDLOCK_NUM_ACCESS_FS;
+	     access_bit++) {
+		size_t layer_level;
+
+		for (layer_level = 0; layer_level < domain->num_layers;
+		     layer_level++) {
+			if (domain->fs_access_masks[layer_level] &
+			    BIT_ULL(access_bit)) {
+				access_dom |= BIT_ULL(access_bit);
+				break;
+			}
+		}
+	}
+	return access_dom;
+}
+
+static inline access_mask_t
+init_layer_masks(const struct landlock_ruleset *const domain,
+		 const access_mask_t access_request,
+		 layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
+{
+	access_mask_t handled_accesses = 0;
+	size_t layer_level;
+
+	memset(layer_masks, 0, sizeof(*layer_masks));
+	/* An empty access request can happen because of O_WRONLY | O_RDWR. */
 	if (!access_request)
+		return 0;
+
+	/* Saves all handled accesses per layer. */
+	for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
+		const unsigned long access_req = access_request;
+		unsigned long access_bit;
+
+		for_each_set_bit(access_bit, &access_req,
+				 ARRAY_SIZE(*layer_masks)) {
+			if (domain->fs_access_masks[layer_level] &
+			    BIT_ULL(access_bit)) {
+				(*layer_masks)[access_bit] |=
+					BIT_ULL(layer_level);
+				handled_accesses |= BIT_ULL(access_bit);
+			}
+		}
+	}
+	return handled_accesses;
+}
+
+/*
+ * Check that a destination file hierarchy has more restrictions than a source
+ * file hierarchy. This is only used for link and rename actions.
+ *
+ * @layer_masks_child2: Optional child masks.
+ */
+static inline bool no_more_access(
+	const layer_mask_t (*const layer_masks_parent1)[LANDLOCK_NUM_ACCESS_FS],
+	const layer_mask_t (*const layer_masks_child1)[LANDLOCK_NUM_ACCESS_FS],
+	const bool child1_is_directory,
+	const layer_mask_t (*const layer_masks_parent2)[LANDLOCK_NUM_ACCESS_FS],
+	const layer_mask_t (*const layer_masks_child2)[LANDLOCK_NUM_ACCESS_FS],
+	const bool child2_is_directory)
+{
+	unsigned long access_bit;
+
+	for (access_bit = 0; access_bit < ARRAY_SIZE(*layer_masks_parent2);
+	     access_bit++) {
+		/* Ignores accesses that only make sense for directories. */
+		const bool is_file_access =
+			!!(BIT_ULL(access_bit) & ACCESS_FILE);
+
+		if (child1_is_directory || is_file_access) {
+			/*
+			 * Checks if the destination restrictions are a
+			 * superset of the source ones (i.e. inherited access
+			 * rights without child exceptions):
+			 * restrictions(parent2) >= restrictions(child1)
+			 */
+			if ((((*layer_masks_parent1)[access_bit] &
+			      (*layer_masks_child1)[access_bit]) |
+			     (*layer_masks_parent2)[access_bit]) !=
+			    (*layer_masks_parent2)[access_bit])
+				return false;
+		}
+
+		if (!layer_masks_child2)
+			continue;
+		if (child2_is_directory || is_file_access) {
+			/*
+			 * Checks inverted restrictions for RENAME_EXCHANGE:
+			 * restrictions(parent1) >= restrictions(child2)
+			 */
+			if ((((*layer_masks_parent2)[access_bit] &
+			      (*layer_masks_child2)[access_bit]) |
+			     (*layer_masks_parent1)[access_bit]) !=
+			    (*layer_masks_parent1)[access_bit])
+				return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * Removes @layer_masks accesses that are not requested.
+ *
+ * Returns true if the request is allowed, false otherwise.
+ */
+static inline bool
+scope_to_request(const access_mask_t access_request,
+		 layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
+{
+	const unsigned long access_req = access_request;
+	unsigned long access_bit;
+
+	if (WARN_ON_ONCE(!layer_masks))
+		return true;
+
+	for_each_clear_bit(access_bit, &access_req, ARRAY_SIZE(*layer_masks))
+		(*layer_masks)[access_bit] = 0;
+	return !memchr_inv(layer_masks, 0, sizeof(*layer_masks));
+}
+
+/*
+ * Returns true if there is at least one access right different than
+ * LANDLOCK_ACCESS_FS_REFER.
+ */
+static inline bool
+is_eacces(const layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS],
+	  const access_mask_t access_request)
+{
+	unsigned long access_bit;
+	/* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
+	const unsigned long access_check = access_request &
+					   ~LANDLOCK_ACCESS_FS_REFER;
+
+	if (!layer_masks)
+		return false;
+
+	for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) {
+		if ((*layer_masks)[access_bit])
+			return true;
+	}
+	return false;
+}
+
+/**
+ * check_access_path_dual - Check accesses for requests with a common path
+ *
+ * @domain: Domain to check against.
+ * @path: File hierarchy to walk through.
+ * @access_request_parent1: Accesses to check, once @layer_masks_parent1 is
+ *     equal to @layer_masks_parent2 (if any). This is tied to the unique
+ *     requested path for most actions, or the source in case of a refer action
+ *     (i.e. rename or link), or the source and destination in case of
+ *     RENAME_EXCHANGE.
+ * @layer_masks_parent1: Pointer to a matrix of layer masks per access
+ *     masks, identifying the layers that forbid a specific access. Bits from
+ *     this matrix can be unset according to the @path walk. An empty matrix
+ *     means that @domain allows all possible Landlock accesses (i.e. not only
+ *     those identified by @access_request_parent1). This matrix can
+ *     initially refer to domain layer masks and, when the accesses for the
+ *     destination and source are the same, to requested layer masks.
+ * @dentry_child1: Dentry to the initial child of the parent1 path. This
+ *     pointer must be NULL for non-refer actions (i.e. not link nor rename).
+ * @access_request_parent2: Similar to @access_request_parent1 but for a
+ *     request involving a source and a destination. This refers to the
+ *     destination, except in case of RENAME_EXCHANGE where it also refers to
+ *     the source. Must be set to 0 when using a simple path request.
+ * @layer_masks_parent2: Similar to @layer_masks_parent1 but for a refer
+ *     action. This must be NULL otherwise.
+ * @dentry_child2: Dentry to the initial child of the parent2 path. This
+ *     pointer is only set for RENAME_EXCHANGE actions and must be NULL
+ *     otherwise.
+ *
+ * This helper first checks that the destination has a superset of restrictions
+ * compared to the source (if any) for a common path. Because of
+ * RENAME_EXCHANGE actions, source and destinations may be swapped. It then
+ * checks that the collected accesses and the remaining ones are enough to
+ * allow the request.
+ *
+ * Returns:
+ * - 0 if the access request is granted;
+ * - -EACCES if it is denied because of access right other than
+ *   LANDLOCK_ACCESS_FS_REFER;
+ * - -EXDEV if the renaming or linking would be a privileged escalation
+ *   (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
+ *   not allowed by the source or the destination.
+ */
+static int check_access_path_dual(
+	const struct landlock_ruleset *const domain,
+	const struct path *const path,
+	const access_mask_t access_request_parent1,
+	layer_mask_t (*const layer_masks_parent1)[LANDLOCK_NUM_ACCESS_FS],
+	const struct dentry *const dentry_child1,
+	const access_mask_t access_request_parent2,
+	layer_mask_t (*const layer_masks_parent2)[LANDLOCK_NUM_ACCESS_FS],
+	const struct dentry *const dentry_child2)
+{
+	bool allowed_parent1 = false, allowed_parent2 = false, is_dom_check,
+	     child1_is_directory = true, child2_is_directory = true;
+	struct path walker_path;
+	access_mask_t access_masked_parent1, access_masked_parent2;
+	layer_mask_t _layer_masks_child1[LANDLOCK_NUM_ACCESS_FS],
+		_layer_masks_child2[LANDLOCK_NUM_ACCESS_FS];
+	layer_mask_t(*layer_masks_child1)[LANDLOCK_NUM_ACCESS_FS] = NULL,
+	(*layer_masks_child2)[LANDLOCK_NUM_ACCESS_FS] = NULL;
+
+	if (!access_request_parent1 && !access_request_parent2)
 		return 0;
 	if (WARN_ON_ONCE(!domain || !path))
 		return 0;
 	if (is_nouser_or_private(path->dentry))
 		return 0;
-	if (WARN_ON_ONCE(domain->num_layers < 1))
+	if (WARN_ON_ONCE(domain->num_layers < 1 || !layer_masks_parent1))
 		return -EACCES;

-	/* Saves all layers handling a subset of requested accesses. */
-	for (i = 0; i < domain->num_layers; i++) {
-		const unsigned long access_req = access_request;
-		unsigned long access_bit;
-
-		for_each_set_bit(access_bit, &access_req,
-				 ARRAY_SIZE(layer_masks)) {
-			if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
-				layer_masks[access_bit] |= BIT_ULL(i);
-				has_access = true;
-			}
-		}
+	if (unlikely(layer_masks_parent2)) {
+		if (WARN_ON_ONCE(!dentry_child1))
+			return -EACCES;
+		/*
+		 * For a double request, first check for potential privilege
+		 * escalation by looking at domain handled accesses (which are
+		 * a superset of the meaningful requested accesses).
+		 */
+		access_masked_parent1 = access_masked_parent2 =
+			get_handled_accesses(domain);
+		is_dom_check = true;
+	} else {
+		if (WARN_ON_ONCE(dentry_child1 || dentry_child2))
+			return -EACCES;
+		/* For a simple request, only check for requested accesses. */
+		access_masked_parent1 = access_request_parent1;
+		access_masked_parent2 = access_request_parent2;
+		is_dom_check = false;
 	}
-	/* An access request not handled by the domain is allowed. */
-	if (!has_access)
-		return 0;
+
+	if (unlikely(dentry_child1)) {
+		unmask_layers(find_rule(domain, dentry_child1),
+			      init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS,
+					       &_layer_masks_child1),
+			      &_layer_masks_child1);
+		layer_masks_child1 = &_layer_masks_child1;
+		child1_is_directory = d_is_dir(dentry_child1);
+	}
+	if (unlikely(dentry_child2)) {
+		unmask_layers(find_rule(domain, dentry_child2),
+			      init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS,
+					       &_layer_masks_child2),
+			      &_layer_masks_child2);
+		layer_masks_child2 = &_layer_masks_child2;
+		child2_is_directory = d_is_dir(dentry_child2);
+	}

 	walker_path = *path;
 	path_get(&walker_path);
···
 	 */
 	while (true) {
 		struct dentry *parent_dentry;
+		const struct landlock_rule *rule;

-		allowed = unmask_layers(find_rule(domain, walker_path.dentry),
-					access_request, &layer_masks);
-		if (allowed)
-			/* Stops when a rule from each layer grants access. */
+		/*
+		 * If at least all accesses allowed on the destination are
+		 * already allowed on the source, respectively if there is at
+		 * least as much as restrictions on the destination than on the
+		 * source, then we can safely refer files from the source to
+		 * the destination without risking a privilege escalation.
+		 * This also applies in the case of RENAME_EXCHANGE, which
+		 * implies checks on both direction. This is crucial for
+		 * standalone multilayered security policies. Furthermore,
+		 * this helps avoid policy writers to shoot themselves in the
+		 * foot.
+		 */
+		if (unlikely(is_dom_check &&
+			     no_more_access(
+				     layer_masks_parent1, layer_masks_child1,
+				     child1_is_directory, layer_masks_parent2,
+				     layer_masks_child2,
+				     child2_is_directory))) {
+			allowed_parent1 = scope_to_request(
+				access_request_parent1, layer_masks_parent1);
+			allowed_parent2 = scope_to_request(
+				access_request_parent2, layer_masks_parent2);
+
+			/* Stops when all accesses are granted. */
+			if (allowed_parent1 && allowed_parent2)
+				break;
+
+			/*
+			 * Now, downgrades the remaining checks from domain
+			 * handled accesses to requested accesses.
+			 */
+			is_dom_check = false;
+			access_masked_parent1 = access_request_parent1;
+			access_masked_parent2 = access_request_parent2;
+		}
+
+		rule = find_rule(domain, walker_path.dentry);
+		allowed_parent1 = unmask_layers(rule, access_masked_parent1,
+						layer_masks_parent1);
+		allowed_parent2 = unmask_layers(rule, access_masked_parent2,
+						layer_masks_parent2);
+
+		/* Stops when a rule from each layer grants access. */
+		if (allowed_parent1 && allowed_parent2)
 			break;

 jump_up:
···
 				 * Stops at the real root. Denies access
 				 * because not all layers have granted access.
 				 */
-				allowed = false;
 				break;
 			}
 		}
···
 			 * access to internal filesystems (e.g. nsfs, which is
 			 * reachable through /proc/<pid>/ns/<namespace>).
 			 */
-			allowed = !!(walker_path.mnt->mnt_flags & MNT_INTERNAL);
+			allowed_parent1 = allowed_parent2 =
+				!!(walker_path.mnt->mnt_flags & MNT_INTERNAL);
 			break;
 		}
 		parent_dentry = dget_parent(walker_path.dentry);
···
 		walker_path.dentry = parent_dentry;
 	}
 	path_put(&walker_path);
-	return allowed ? 0 : -EACCES;
+
+	if (allowed_parent1 && allowed_parent2)
+		return 0;
+
+	/*
+	 * This prioritizes EACCES over EXDEV for all actions, including
+	 * renames with RENAME_EXCHANGE.
+	 */
+	if (likely(is_eacces(layer_masks_parent1, access_request_parent1) ||
+		   is_eacces(layer_masks_parent2, access_request_parent2)))
+		return -EACCES;
+
+	/*
+	 * Gracefully forbids reparenting if the destination directory
+	 * hierarchy is not a superset of restrictions of the source directory
+	 * hierarchy, or if LANDLOCK_ACCESS_FS_REFER is not allowed by the
+	 * source or the destination.
+	 */
+	return -EXDEV;
+}
+
+static inline int check_access_path(const struct landlock_ruleset *const domain,
+				    const struct path *const path,
+				    access_mask_t access_request)
+{
+	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
+
+	access_request = init_layer_masks(domain, access_request, &layer_masks);
+	return check_access_path_dual(domain, path, access_request,
+				      &layer_masks, NULL, 0, NULL, NULL);
 }

 static inline int current_check_access_path(const struct path *const path,
···
 		return 0;
 	return d_is_dir(dentry) ? LANDLOCK_ACCESS_FS_REMOVE_DIR :
 				  LANDLOCK_ACCESS_FS_REMOVE_FILE;
+}
+
+/**
+ * collect_domain_accesses - Walk through a file path and collect accesses
+ *
+ * @domain: Domain to check against.
+ * @mnt_root: Last directory to check.
+ * @dir: Directory to start the walk from.
+ * @layer_masks_dom: Where to store the collected accesses.
+ *
+ * This helper is useful to begin a path walk from the @dir directory to a
+ * @mnt_root directory used as a mount point. This mount point is the common
+ * ancestor between the source and the destination of a renamed and linked
+ * file. While walking from @dir to @mnt_root, we record all the domain's
+ * allowed accesses in @layer_masks_dom.
+ *
+ * This is similar to check_access_path_dual() but much simpler because it only
+ * handles walking on the same mount point and only check one set of accesses.
+ *
+ * Returns:
+ * - true if all the domain access rights are allowed for @dir;
+ * - false if the walk reached @mnt_root.
+ */
+static bool collect_domain_accesses(
+	const struct landlock_ruleset *const domain,
+	const struct dentry *const mnt_root, struct dentry *dir,
+	layer_mask_t (*const layer_masks_dom)[LANDLOCK_NUM_ACCESS_FS])
+{
+	unsigned long access_dom;
+	bool ret = false;
+
+	if (WARN_ON_ONCE(!domain || !mnt_root || !dir || !layer_masks_dom))
+		return true;
+	if (is_nouser_or_private(dir))
+		return true;
+
+	access_dom = init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS,
+				      layer_masks_dom);
+
+	dget(dir);
+	while (true) {
+		struct dentry *parent_dentry;
+
+		/* Gets all layers allowing all domain accesses. */
+		if (unmask_layers(find_rule(domain, dir), access_dom,
+				  layer_masks_dom)) {
+			/*
+			 * Stops when all handled accesses are allowed by at
+			 * least one rule in each layer.
+			 */
+			ret = true;
+			break;
+		}
+
+		/* We should not reach a root other than @mnt_root. */
+		if (dir == mnt_root || WARN_ON_ONCE(IS_ROOT(dir)))
+			break;
+
+		parent_dentry = dget_parent(dir);
+		dput(dir);
+		dir = parent_dentry;
+	}
+	dput(dir);
+	return ret;
+}
+
+/**
+ * current_check_refer_path - Check if a rename or link action is allowed
+ *
+ * @old_dentry: File or directory requested to be moved or linked.
+ * @new_dir: Destination parent directory.
+ * @new_dentry: Destination file or directory.
+ * @removable: Sets to true if it is a rename operation.
+ * @exchange: Sets to true if it is a rename operation with RENAME_EXCHANGE.
+ *
+ * Because of its unprivileged constraints, Landlock relies on file hierarchies
+ * (and not only inodes) to tie access rights to files. Being able to link or
+ * rename a file hierarchy brings some challenges. Indeed, moving or linking a
+ * file (i.e. creating a new reference to an inode) can have an impact on the
+ * actions allowed for a set of files if it would change its parent directory
+ * (i.e. reparenting).
+ *
+ * To avoid trivial access right bypasses, Landlock first checks if the file or
+ * directory requested to be moved would gain new access rights inherited from
+ * its new hierarchy. Before returning any error, Landlock then checks that
+ * the parent source hierarchy and the destination hierarchy would allow the
+ * link or rename action. If it is not the case, an error with EACCES is
+ * returned to inform user space that there is no way to remove or create the
+ * requested source file type. If it should be allowed but the new inherited
+ * access rights would be greater than the source access rights, then the
+ * kernel returns an error with EXDEV. Prioritizing EACCES over EXDEV enables
+ * user space to abort the whole operation if there is no way to do it, or to
+ * manually copy the source to the destination if this remains allowed, e.g.
+ * because file creation is allowed on the destination directory but not direct
+ * linking.
+ *
+ * To achieve this goal, the kernel needs to compare two file hierarchies: the
+ * one identifying the source file or directory (including itself), and the
+ * destination one. This can be seen as a multilayer partial ordering problem.
+ * The kernel walks through these paths and collects in a matrix the access
+ * rights that are denied per layer. These matrices are then compared to see
+ * if the destination one has more (or the same) restrictions as the source
+ * one. If this is the case, the requested action will not return EXDEV, which
+ * doesn't mean the action is allowed. The parent hierarchy of the source
+ * (i.e. parent directory), and the destination hierarchy must also be checked
+ * to verify that they explicitly allow such action (i.e. referencing,
+ * creation and potentially removal rights). The kernel implementation is then
+ * required to rely on potentially four matrices of access rights: one for the
+ * source file or directory (i.e. the child), a potentially other one for the
+ * other source/destination (in case of RENAME_EXCHANGE), one for the source
+ * parent hierarchy and a last one for the destination hierarchy. These
+ * ephemeral matrices take some space on the stack, which limits the number of
+ * layers to a deemed reasonable number: 16.
+ *
+ * Returns:
+ * - 0 if access is allowed;
+ * - -EXDEV if @old_dentry would inherit new access rights from @new_dir;
+ * - -EACCES if file removal or creation is denied.
+ */
+static int current_check_refer_path(struct dentry *const old_dentry,
+				    const struct path *const new_dir,
+				    struct dentry *const new_dentry,
+				    const bool removable, const bool exchange)
+{
+	const struct landlock_ruleset *const dom =
+		landlock_get_current_domain();
+	bool allow_parent1, allow_parent2;
+	access_mask_t access_request_parent1, access_request_parent2;
+	struct path mnt_dir;
+	layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS],
+		layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS];
+
+	if (!dom)
+		return 0;
+	if (WARN_ON_ONCE(dom->num_layers < 1))
+		return -EACCES;
+	if (unlikely(d_is_negative(old_dentry)))
+		return -ENOENT;
+	if (exchange) {
+		if (unlikely(d_is_negative(new_dentry)))
+			return -ENOENT;
+		access_request_parent1 =
+			get_mode_access(d_backing_inode(new_dentry)->i_mode);
+	} else {
+		access_request_parent1 = 0;
+	}
+	access_request_parent2 =
+		get_mode_access(d_backing_inode(old_dentry)->i_mode);
+	if (removable) {
+		access_request_parent1 |= maybe_remove(old_dentry);
+		access_request_parent2 |= maybe_remove(new_dentry);
+	}
+
+	/* The mount points are the same for old and new paths, cf. EXDEV. */
+	if (old_dentry->d_parent == new_dir->dentry) {
+		/*
+		 * The LANDLOCK_ACCESS_FS_REFER access right is not required
+		 * for same-directory referer (i.e. no reparenting).
+		 */
+		access_request_parent1 = init_layer_masks(
+			dom, access_request_parent1 | access_request_parent2,
+			&layer_masks_parent1);
+		return check_access_path_dual(dom, new_dir,
+					      access_request_parent1,
+					      &layer_masks_parent1, NULL, 0,
+					      NULL, NULL);
+	}
+
+	/* Backward compatibility: no reparenting support. */
+	if (!(get_handled_accesses(dom) & LANDLOCK_ACCESS_FS_REFER))
+		return -EXDEV;
+
+	access_request_parent1 |= LANDLOCK_ACCESS_FS_REFER;
+	access_request_parent2 |= LANDLOCK_ACCESS_FS_REFER;
+
+	/* Saves the common mount point. */
+	mnt_dir.mnt = new_dir->mnt;
+	mnt_dir.dentry = new_dir->mnt->mnt_root;
+
+	/* new_dir->dentry is equal to new_dentry->d_parent */
+	allow_parent1 = collect_domain_accesses(dom, mnt_dir.dentry,
+						old_dentry->d_parent,
+						&layer_masks_parent1);
+	allow_parent2 = collect_domain_accesses(
+		dom, mnt_dir.dentry, new_dir->dentry, &layer_masks_parent2);
+
+	if (allow_parent1 && allow_parent2)
+		return 0;
+
+	/*
+	 * To be able to compare source and destination domain access rights,
+	 * take into account the @old_dentry access rights aggregated with its
+	 * parent access rights. This will be useful to compare with the
+	 * destination parent access rights.
+	 */
+	return check_access_path_dual(dom, &mnt_dir, access_request_parent1,
+				      &layer_masks_parent1, old_dentry,
+				      access_request_parent2,
+				      &layer_masks_parent2,
+				      exchange ? new_dentry : NULL);
 }

 /* Inode hooks */
···

 /* Path hooks */

-/*
- * Creating multiple links or renaming may lead to privilege escalations if not
- * handled properly. Indeed, we must be sure that the source doesn't gain more
- * privileges by being accessible from the destination. This is getting more
- * complex when dealing with multiple layers. The whole picture can be seen as
- * a multilayer partial ordering problem. A future version of Landlock will
- * deal with that.
- */
 static int hook_path_link(struct dentry *const old_dentry,
 			  const struct path *const new_dir,
 			  struct dentry *const new_dentry)
 {
-	const struct landlock_ruleset *const dom =
-		landlock_get_current_domain();
-
-	if (!dom)
-		return 0;
-	/* The mount points are the same for old and new paths, cf. EXDEV. */
-	if (old_dentry->d_parent != new_dir->dentry)
-		/* Gracefully forbids reparenting. */
-		return -EXDEV;
-	if (unlikely(d_is_negative(old_dentry)))
-		return -ENOENT;
-	return check_access_path(
-		dom, new_dir,
-		get_mode_access(d_backing_inode(old_dentry)->i_mode));
+	return current_check_refer_path(old_dentry, new_dir, new_dentry, false,
+					false);
 }

 static int hook_path_rename(const struct path *const old_dir,
···
 			    struct dentry *const new_dentry,
 			    const unsigned int flags)
 {
-	const struct landlock_ruleset *const dom =
-		landlock_get_current_domain();
-	u32 exchange_access = 0;
-
-	if (!dom)
-		return 0;
-	/* The mount points are the same for old and new paths, cf. EXDEV. */
-	if (old_dir->dentry != new_dir->dentry)
-		/* Gracefully forbids reparenting. */
-		return -EXDEV;
-	if (flags & RENAME_EXCHANGE) {
-		if (unlikely(d_is_negative(new_dentry)))
-			return -ENOENT;
-		exchange_access =
-			get_mode_access(d_backing_inode(new_dentry)->i_mode);
-	}
-	if (unlikely(d_is_negative(old_dentry)))
-		return -ENOENT;
-	/* RENAME_EXCHANGE is handled because directories are the same. */
-	return check_access_path(
-		dom, old_dir,
-		maybe_remove(old_dentry) | maybe_remove(new_dentry) |
-			exchange_access |
-			get_mode_access(d_backing_inode(old_dentry)->i_mode));
+	/* old_dir refers to old_dentry->d_parent and new_dir->mnt */
+	return current_check_refer_path(old_dentry, new_dir, new_dentry, true,
+					!!(flags & RENAME_EXCHANGE));
 }

 static int hook_path_mkdir(const struct path *const dir,
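
The core comparison done by no_more_access() can be modeled outside the
kernel as a plain bitwise superset test. The sketch below is a
simplified, standalone illustration and not the kernel code: the ex_*
names are hypothetical, the file/directory filtering and the
RENAME_EXCHANGE direction are left out, and each array entry is assumed
to hold the set of layers that still deny the corresponding access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of filesystem access rights once LANDLOCK_ACCESS_FS_REFER exists. */
#define EX_NUM_ACCESS 14

/* One bit per layer; 16 bits match the 16-layer limit. */
typedef uint16_t ex_layer_mask_t;

/*
 * Returns true if moving child1 from under parent1 to under parent2 cannot
 * lift any denial. A layer denies an access on the source only if neither
 * the parent hierarchy nor a rule on the child itself granted it, hence the
 * AND of both masks.
 */
static bool ex_no_more_access(const ex_layer_mask_t parent1[EX_NUM_ACCESS],
			      const ex_layer_mask_t child1[EX_NUM_ACCESS],
			      const ex_layer_mask_t parent2[EX_NUM_ACCESS])
{
	size_t access_bit;

	assert(parent1 && child1 && parent2);
	for (access_bit = 0; access_bit < EX_NUM_ACCESS; access_bit++) {
		const ex_layer_mask_t source_denied =
			parent1[access_bit] & child1[access_bit];

		/* The destination must deny at least what the source denies. */
		if ((source_denied | parent2[access_bit]) !=
		    parent2[access_bit])
			return false;
	}
	return true;
}
```

With this model, a source whose layer 0 denies an access can only be
moved under a destination where layer 0 denies it too. A destination
with extra denials is always acceptable.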
+1 -1
security/landlock/limits.h
···
 #define LANDLOCK_MAX_NUM_LAYERS		16
 #define LANDLOCK_MAX_NUM_RULES		U32_MAX

-#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_MAKE_SYM
+#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_REFER
 #define LANDLOCK_MASK_ACCESS_FS		((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
 #define LANDLOCK_NUM_ACCESS_FS		__const_hweight64(LANDLOCK_MASK_ACCESS_FS)

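
To make the effect of this one-line change concrete, the mask
arithmetic can be replayed in user space. This is a standalone sketch:
the EX_* macros copy the kernel definitions, and ex_num_access_fs() is
a hypothetical stand-in for __const_hweight64().

```c
#include <assert.h>
#include <stdint.h>

#define EX_ACCESS_FS_REFER (1ULL << 13)
#define EX_LAST_ACCESS_FS EX_ACCESS_FS_REFER
/* All bits up to and including the last access right. */
#define EX_MASK_ACCESS_FS ((EX_LAST_ACCESS_FS << 1) - 1)

/* Counts the set bits of the mask, i.e. the number of access rights. */
static unsigned int ex_num_access_fs(void)
{
	uint64_t mask = EX_MASK_ACCESS_FS;
	unsigned int count = 0;

	assert(mask != 0);
	/* Clears the lowest set bit on each iteration. */
	for (; mask; mask &= mask - 1)
		count++;
	return count;
}
```

The mask becomes 0x3fff and the number of access rights grows from 13
to 14, which in turn sizes every layer_masks array used in fs.c.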
+1 -1
security/landlock/syscalls.c
···
 	.write = fop_dummy_write,
 };

-#define LANDLOCK_ABI_VERSION 1
+#define LANDLOCK_ABI_VERSION 2

 /**
  * sys_landlock_create_ruleset - Create a new ruleset
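
Because the new access right only exists from ABI version 2 onward, a
program that must also run on older kernels could mask it out before
creating a ruleset. The helper below is a hypothetical sketch of that
best-effort approach: it takes the ABI version as a parameter (as
returned by landlock_create_ruleset(NULL, 0,
LANDLOCK_CREATE_RULESET_VERSION)) so that it stays free of system
calls.

```c
#include <assert.h>
#include <stdint.h>

#define EX_ACCESS_FS_REFER (1ULL << 13)

/*
 * Hypothetical helper: drops the access rights that the running kernel
 * does not know about, based on the Landlock ABI version it reported.
 */
static uint64_t ex_handled_access_for_abi(const int abi, uint64_t wanted)
{
	/* Callers are expected to have checked that Landlock is available. */
	assert(abi >= 1);
	if (abi < 2)
		wanted &= ~EX_ACCESS_FS_REFER;
	return wanted;
}
```

On an ABI 1 kernel the resulting ruleset still denies reparenting, so
the sandbox behaves as before this patch. On ABI 2 the right is kept
and rules may grant it.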
+1 -1
tools/testing/selftests/landlock/base_test.c
···
 	const struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
 	};
-	ASSERT_EQ(1, landlock_create_ruleset(NULL, 0,
+	ASSERT_EQ(2, landlock_create_ruleset(NULL, 0,
 					     LANDLOCK_CREATE_RULESET_VERSION));

 	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
+2 -1
tools/testing/selftests/landlock/fs_test.c
···
 	LANDLOCK_ACCESS_FS_WRITE_FILE | \
 	LANDLOCK_ACCESS_FS_READ_FILE)

-#define ACCESS_LAST LANDLOCK_ACCESS_FS_MAKE_SYM
+#define ACCESS_LAST LANDLOCK_ACCESS_FS_REFER

 #define ACCESS_ALL ( \
 	ACCESS_FILE | \
···
 	LANDLOCK_ACCESS_FS_MAKE_SOCK | \
 	LANDLOCK_ACCESS_FS_MAKE_FIFO | \
 	LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
+	LANDLOCK_ACCESS_FS_MAKE_SYM | \
 	ACCESS_LAST)

 /* clang-format on */