Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

fs-verity: add a documentation file

Add a documentation file for fs-verity, covering:

- Introduction
- Use cases
- User API
    - FS_IOC_ENABLE_VERITY
    - FS_IOC_MEASURE_VERITY
    - FS_IOC_GETFLAGS
- Accessing verity files
- File measurement computation
    - Merkle tree
    - fs-verity descriptor
- Built-in signature verification
- Filesystem support
    - ext4
    - f2fs
- Implementation details
    - Verifying data
        - Pagecache
        - Block device based filesystems
- Userspace utility
- Tests
- FAQ

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>

+727
+726
Documentation/filesystems/fsverity.rst
.. SPDX-License-Identifier: GPL-2.0

.. _fsverity:

=======================================================
fs-verity: read-only file-based authenticity protection
=======================================================

Introduction
============

fs-verity (``fs/verity/``) is a support layer that filesystems can
hook into to support transparent integrity and authenticity protection
of read-only files. Currently, it is supported by the ext4 and f2fs
filesystems. Like fscrypt, not too much filesystem-specific code is
needed to support fs-verity.

fs-verity is similar to `dm-verity
<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
but works on files rather than block devices. On regular files on
filesystems supporting fs-verity, userspace can execute an ioctl that
causes the filesystem to build a Merkle tree for the file and persist
it to a filesystem-specific location associated with the file.

After this, the file is made readonly, and all reads from the file are
automatically verified against the file's Merkle tree. Reads of any
corrupted data, including mmap reads, will fail.

Userspace can use another ioctl to retrieve the root hash (actually
the "file measurement", which is a hash that includes the root hash)
that fs-verity is enforcing for the file. This ioctl executes in
constant time, regardless of the file size.

fs-verity is essentially a way to hash a file in constant time,
subject to the caveat that reads which would violate the hash will
fail at runtime.

Use cases
=========

By itself, the base fs-verity feature only provides integrity
protection, i.e. detection of accidental (non-malicious) corruption.

However, because fs-verity makes retrieving the file hash extremely
efficient, it's primarily meant to be used as a tool to support
authentication (detection of malicious modifications) or auditing
(logging file hashes before use).

Trusted userspace code (e.g. operating system code running on a
read-only partition that is itself authenticated by dm-verity) can
authenticate the contents of an fs-verity file by using the
`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
digital signature of it.

A standard file hash could be used instead of fs-verity. However,
this is inefficient if the file is large and only a small portion may
be accessed. This is often the case for Android application package
(APK) files, for example. These typically contain many translations,
classes, and other resources that are infrequently or even never
accessed on a particular device. It would be slow and wasteful to
read and hash the entire file before starting the application.

Unlike an ahead-of-time hash, fs-verity also re-verifies data each
time it's paged in. This ensures that malicious disk firmware can't
undetectably change the contents of the file at runtime.

fs-verity does not replace or obsolete dm-verity. dm-verity should
still be used on read-only filesystems. fs-verity is for files that
must live on a read-write filesystem because they are independently
updated and potentially user-installed, so dm-verity cannot be used.

The base fs-verity feature is a hashing mechanism only; actually
authenticating the files is up to userspace. However, to meet some
users' needs, fs-verity optionally supports a simple signature
verification mechanism where users can configure the kernel to require
that all fs-verity files be signed by a key loaded into a keyring; see
`Built-in signature verification`_.
Support for fs-verity file hashes
in IMA (Integrity Measurement Architecture) policies is also planned.

User API
========

FS_IOC_ENABLE_VERITY
--------------------

The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes
in a pointer to a :c:type:`struct fsverity_enable_arg`, defined as
follows::

    struct fsverity_enable_arg {
            __u32 version;
            __u32 hash_algorithm;
            __u32 block_size;
            __u32 salt_size;
            __u64 salt_ptr;
            __u32 sig_size;
            __u32 __reserved1;
            __u64 sig_ptr;
            __u64 __reserved2[11];
    };

This structure contains the parameters of the Merkle tree to build for
the file, and optionally contains a signature. It must be initialized
as follows:

- ``version`` must be 1.
- ``hash_algorithm`` must be the identifier for the hash algorithm to
  use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See
  ``include/uapi/linux/fsverity.h`` for the list of possible values.
- ``block_size`` must be the Merkle tree block size. Currently, this
  must be equal to the system page size, which is usually 4096 bytes.
  Other sizes may be supported in the future. This value is not
  necessarily the same as the filesystem block size.
- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
  provided. The salt is a value that is prepended to every hashed
  block; it can be used to personalize the hashing for a particular
  file or device. Currently the maximum salt size is 32 bytes.
- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
  provided.
- ``sig_size`` is the size of the signature in bytes, or 0 if no
  signature is provided. Currently the signature is (somewhat
  arbitrarily) limited to 16128 bytes. See `Built-in signature
  verification`_ for more information.
- ``sig_ptr`` is the pointer to the signature, or NULL if no
  signature is provided.
- All reserved fields must be zeroed.

FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
the file and persist it to a filesystem-specific location associated
with the file, then mark the file as a verity file. This ioctl may
take a long time to execute on large files, and it is interruptible by
fatal signals.

FS_IOC_ENABLE_VERITY checks for write access to the inode. However,
it must be executed on an O_RDONLY file descriptor and no processes
can have the file open for writing. Attempts to open the file for
writing while this ioctl is executing will fail with ETXTBSY. (This
is necessary to guarantee that no writable file descriptors will exist
after verity is enabled, and to guarantee that the file's contents are
stable while the Merkle tree is being built over it.)

On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
verity file. On failure (including the case of interruption by a
fatal signal), no changes are made to the file.

FS_IOC_ENABLE_VERITY can fail with the following errors:

- ``EACCES``: the process does not have write access to the file
- ``EBADMSG``: the signature is malformed
- ``EBUSY``: this ioctl is already running on the file
- ``EEXIST``: the file already has verity enabled
- ``EFAULT``: the caller provided inaccessible memory
- ``EINTR``: the operation was interrupted by a fatal signal
- ``EINVAL``: unsupported version, hash algorithm, or block size; or
  reserved bits are set; or the file descriptor refers to neither a
  regular file nor a directory.
- ``EISDIR``: the file descriptor refers to a directory
- ``EKEYREJECTED``: the signature doesn't match the file
- ``EMSGSIZE``: the salt or signature is too long
- ``ENOKEY``: the fs-verity keyring doesn't contain the certificate
  needed to verify the signature
- ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
  available in the kernel's crypto API as currently configured (e.g.
  for SHA-512, missing CONFIG_CRYPTO_SHA512).
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support; or the filesystem superblock has not had the 'verity'
  feature enabled on it; or the filesystem does not support fs-verity
  on this file. (See `Filesystem support`_.)
- ``EPERM``: the file is append-only; or, a signature is required and
  one was not provided.
- ``EROFS``: the filesystem is read-only
- ``ETXTBSY``: someone has the file open for writing. This can be the
  caller's file descriptor, another open file descriptor, or the file
  reference held by a writable memory map.

FS_IOC_MEASURE_VERITY
---------------------

The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity
file. The file measurement is a digest that cryptographically
identifies the file contents that are being enforced on reads.

This ioctl takes in a pointer to a variable-length structure::

    struct fsverity_digest {
            __u16 digest_algorithm;
            __u16 digest_size; /* input/output */
            __u8 digest[];
    };

``digest_size`` is an input/output field. On input, it must be
initialized to the number of bytes allocated for the variable-length
``digest`` field.

On success, 0 is returned and the kernel fills in the structure as
follows:

- ``digest_algorithm`` will be the hash algorithm used for the file
  measurement. It will match ``fsverity_enable_arg::hash_algorithm``.
- ``digest_size`` will be the size of the digest in bytes, e.g. 32
  for SHA-256. (This can be redundant with ``digest_algorithm``.)
- ``digest`` will be the actual bytes of the digest.

FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
regardless of the size of the file.

FS_IOC_MEASURE_VERITY can fail with the following errors:

- ``EFAULT``: the caller provided inaccessible memory
- ``ENODATA``: the file is not a verity file
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support, or the filesystem superblock has not had the 'verity'
  feature enabled on it. (See `Filesystem support`_.)
- ``EOVERFLOW``: the digest is longer than the specified
  ``digest_size`` bytes. Try providing a larger buffer.

FS_IOC_GETFLAGS
---------------

The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
can also be used to check whether a file has fs-verity enabled or not.
To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.

The verity flag is not settable via FS_IOC_SETFLAGS. You must use
FS_IOC_ENABLE_VERITY instead, since parameters must be provided.

Accessing verity files
======================

Applications can transparently access a verity file just like a
non-verity one, with the following exceptions:

- Verity files are readonly. They cannot be opened for writing or
  truncate()d, even if the file mode bits allow it. Attempts to do
  one of these things will fail with EPERM.
  However, changes to
  metadata such as owner, mode, timestamps, and xattrs are still
  allowed, since these are not measured by fs-verity. Verity files
  can also still be renamed, deleted, and linked to.

- Direct I/O is not supported on verity files. Attempts to use direct
  I/O on such files will fall back to buffered I/O.

- DAX (Direct Access) is not supported on verity files, because this
  would circumvent the data verification.

- Reads of data that doesn't match the verity Merkle tree will fail
  with EIO (for read()) or SIGBUS (for mmap() reads).

- If the sysctl "fs.verity.require_signatures" is set to 1 and the
  file's verity measurement is not signed by a key in the fs-verity
  keyring, then opening the file will fail. See `Built-in signature
  verification`_.

Direct access to the Merkle tree is not supported. Therefore, if a
verity file is copied, or is backed up and restored, then it will lose
its "verity"-ness. fs-verity is primarily meant for files like
executables that are managed by a package manager.

File measurement computation
============================

This section describes how fs-verity hashes the file contents using a
Merkle tree to produce the "file measurement" which cryptographically
identifies the file contents. This algorithm is the same for all
filesystems that support fs-verity.

Userspace only needs to be aware of this algorithm if it needs to
compute the file measurement itself, e.g. in order to sign the file.

.. _fsverity_merkle_tree:

Merkle tree
-----------

The file contents are divided into blocks, where the block size is
configurable but is usually 4096 bytes. The end of the last block is
zero-padded if needed. Each block is then hashed, producing the first
level of hashes.
Then, the hashes in this first level are grouped
into 'blocksize'-byte blocks (zero-padding the ends as needed) and
these blocks are hashed, producing the second level of hashes. This
proceeds up the tree until only a single block remains. The hash of
this block is the "Merkle tree root hash".

If the file fits in one block and is nonempty, then the "Merkle tree
root hash" is simply the hash of the single data block. If the file
is empty, then the "Merkle tree root hash" is all zeroes.

The "blocks" here are not necessarily the same as "filesystem blocks".

If a salt was specified, then it's zero-padded to the closest multiple
of the input size of the hash algorithm's compression function, e.g.
64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is
prepended to every data or Merkle tree block that is hashed.

The purpose of the block padding is to cause every hash to be taken
over the same amount of data, which simplifies the implementation and
keeps open more possibilities for hardware acceleration. The purpose
of the salt padding is to make the salting "free" when the salted hash
state is precomputed, then imported for each hash.

Example: in the recommended configuration of SHA-256 and 4K blocks,
128 hash values fit in each block. Thus, each level of the Merkle
tree is approximately 128 times smaller than the previous, and for
large files the Merkle tree's size converges to approximately 1/127 of
the original file size. However, for small files, the padding is
significant, making the space overhead proportionally larger.

.. _fsverity_descriptor:

fs-verity descriptor
--------------------

By itself, the Merkle tree root hash is ambiguous.
For example, it
can't distinguish a large file from a small second file whose data
is exactly the top-level hash block of the first file. Ambiguities
also arise from the convention of padding to the next block boundary.

To solve this problem, the verity file measurement is actually
computed as a hash of the following structure, which contains the
Merkle tree root hash as well as other fields such as the file size::

    struct fsverity_descriptor {
            __u8 version;           /* must be 1 */
            __u8 hash_algorithm;    /* Merkle tree hash algorithm */
            __u8 log_blocksize;     /* log2 of size of data and tree blocks */
            __u8 salt_size;         /* size of salt in bytes; 0 if none */
            __le32 sig_size;        /* must be 0 */
            __le64 data_size;       /* size of file the Merkle tree is built over */
            __u8 root_hash[64];     /* Merkle tree root hash */
            __u8 salt[32];          /* salt prepended to each hashed block */
            __u8 __reserved[144];   /* must be 0's */
    };

Note that the ``sig_size`` field must be set to 0 for the purpose of
computing the file measurement, even if a signature was provided (or
will be provided) to `FS_IOC_ENABLE_VERITY`_.

Built-in signature verification
===============================

With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
a portion of an authentication policy (see `Use cases`_) in the
kernel. Specifically, it adds support for:

1. At fs-verity module initialization time, a keyring ".fs-verity" is
   created. The root user can add trusted X.509 certificates to this
   keyring using the add_key() system call, then (when done)
   optionally use keyctl_restrict_keyring() to prevent additional
   certificates from being added.

2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
   detached signature in DER format of the file measurement.
   On success, this signature is persisted alongside the Merkle tree.
   Then, any time the file is opened, the kernel will verify the
   file's actual measurement against this signature, using the
   certificates in the ".fs-verity" keyring.

3. A new sysctl "fs.verity.require_signatures" is made available.
   When set to 1, the kernel requires that all verity files have a
   correctly signed file measurement as described in (2).

File measurements must be signed in the following format, which is
similar to the structure used by `FS_IOC_MEASURE_VERITY`_::

    struct fsverity_signed_digest {
            char magic[8];                  /* must be "FSVerity" */
            __le16 digest_algorithm;
            __le16 digest_size;
            __u8 digest[];
    };

fs-verity's built-in signature verification support is meant as a
relatively simple mechanism that can be used to provide some level of
authenticity protection for verity files, as an alternative to doing
the signature verification in userspace or using IMA-appraisal.
However, with this mechanism, userspace programs still need to check
that the verity bit is set, and there is no protection against verity
files being swapped around.

Filesystem support
==================

fs-verity is currently supported by the ext4 and f2fs filesystems.
The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
on either filesystem.

``include/linux/fsverity.h`` declares the interface between the
``fs/verity/`` support layer and filesystems. Briefly, filesystems
must provide an ``fsverity_operations`` structure that provides
methods to read and write the verity metadata to a filesystem-specific
location, including the Merkle tree blocks and
``fsverity_descriptor``.
Filesystems must also call functions in
``fs/verity/`` at certain times, such as when a file is opened or when
pages have been read into the pagecache. (See `Verifying data`_.)

ext4
----

ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2.

To create verity files on an ext4 filesystem, the filesystem must have
been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
it. "verity" is an RO_COMPAT filesystem feature, so once set, old
kernels will only be able to mount the filesystem readonly, and old
versions of e2fsck will be unable to check the filesystem. Moreover,
currently ext4 only supports mounting a filesystem with the "verity"
feature when its block size is equal to PAGE_SIZE (often 4096 bytes).

ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It
can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.

ext4 also supports encryption, which can be used simultaneously with
fs-verity. In this case, the plaintext data is verified rather than
the ciphertext. This is necessary in order to make the file
measurement meaningful, since every file is encrypted differently.

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size. This approach works because (a) verity files are readonly,
and (b) pages fully beyond i_size aren't visible to userspace but can
be read/written internally by ext4 with only some relatively small
changes to ext4. This approach avoids having to depend on the
EA_INODE feature and on rearchitecting ext4's xattr support to
support paging multi-gigabyte xattrs into memory, and to support
encrypting xattrs.
Note that the verity metadata *must* be encrypted
when the file is, since it contains hashes of the plaintext data.

Currently, ext4 verity only supports the case where the Merkle tree
block size, filesystem block size, and page size are all the same. It
also only supports extent-based files.

f2fs
----

f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0.

To create verity files on an f2fs filesystem, the filesystem must have
been formatted with ``-O verity``.

f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
cleared.

Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first
64K boundary beyond i_size. See explanation for ext4 above.
Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
which wouldn't be enough for even a single Merkle tree block.

Currently, f2fs verity only supports a Merkle tree block size of 4096.
Also, f2fs doesn't support enabling verity on files that currently
have atomic or volatile writes pending.

Implementation details
======================

Verifying data
--------------

fs-verity ensures that all reads of a verity file's data are verified,
regardless of which syscall is used to do the read (e.g. mmap(),
read(), pread()) and regardless of whether it's the first read or a
later read (unless the later read can return cached data that was
already verified). Below, we describe how filesystems implement this.

Pagecache
~~~~~~~~~

For filesystems using Linux's pagecache, the ``->readpage()`` and
``->readpages()`` methods must be modified to verify pages before they
are marked Uptodate.
Merely hooking ``->read_iter()`` would be
insufficient, since ``->read_iter()`` is not used for memory maps.

Therefore, fs/verity/ provides a function fsverity_verify_page() which
verifies a page that has been read into the pagecache of a verity
inode, but is still locked and not Uptodate, so it's not yet readable
by userspace. As needed to do the verification,
fsverity_verify_page() will call back into the filesystem to read
Merkle tree pages via fsverity_operations::read_merkle_tree_page().

fsverity_verify_page() returns false if verification failed; in this
case, the filesystem must not set the page Uptodate. Following this,
as per the usual Linux pagecache behavior, attempts by userspace to
read() from the part of the file containing the page will fail with
EIO, and accesses to the page within a memory map will raise SIGBUS.

fsverity_verify_page() currently only supports the case where the
Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).

In principle, fsverity_verify_page() verifies the entire path in the
Merkle tree from the data page to the root hash. However, for
efficiency the filesystem may cache the hash pages. Therefore,
fsverity_verify_page() only ascends the tree reading hash pages until
an already-verified hash page is seen, as indicated by the PageChecked
bit being set. It then verifies the path to that page.

This optimization, which is also used by dm-verity, results in
excellent sequential read performance. This is because usually (e.g.
127 in 128 times for 4K blocks and SHA-256) the hash page from the
bottom level of the tree will already be cached and checked from
reading a previous data page. However, random reads perform worse.

Block device based filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
the pagecache, so the above subsection applies too. However, they
also usually read many pages from a file at once, grouped into a
structure called a "bio". To make it easier for these types of
filesystems to support fs-verity, fs/verity/ also provides a function
fsverity_verify_bio() which verifies all pages in a bio.

ext4 and f2fs also support encryption. If a verity file is also
encrypted, the pages must be decrypted before being verified. To
support this, these filesystems allocate a "post-read context" for
each bio and store it in ``->bi_private``::

    struct bio_post_read_ctx {
           struct bio *bio;
           struct work_struct work;
           unsigned int cur_step;
           unsigned int enabled_steps;
    };

``enabled_steps`` is a bitmask that specifies whether decryption,
verity, or both is enabled. After the bio completes, for each needed
postprocessing step the filesystem enqueues the bio_post_read_ctx on a
workqueue, and then the workqueue work does the decryption or
verification. Finally, pages where no decryption or verity error
occurred are marked Uptodate, and the pages are unlocked.

Files on ext4 and f2fs may contain holes. Normally, ``->readpages()``
simply zeroes holes and sets the corresponding pages Uptodate; no bios
are issued. To prevent this case from bypassing fs-verity, these
filesystems use fsverity_verify_page() to verify hole pages.

ext4 and f2fs disable direct I/O on verity files, since otherwise
direct I/O would bypass fs-verity. (They also do the same for
encrypted files.)

Userspace utility
=================

This document focuses on the kernel, but a userspace utility for
fs-verity can be found at:

    https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git

See the README.md file in the fsverity-utils source tree for details,
including examples of setting up fs-verity protected files.

Tests
=====

To test fs-verity, use xfstests. For example, using `kvm-xfstests
<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::

    kvm-xfstests -c ext4,f2fs -g verity

FAQ
===

This section answers frequently asked questions about fs-verity that
weren't already directly answered in other parts of this document.

:Q: Why isn't fs-verity part of IMA?
:A: fs-verity and IMA (Integrity Measurement Architecture) have
    different focuses. fs-verity is a filesystem-level mechanism for
    hashing individual files using a Merkle tree. In contrast, IMA
    specifies a system-wide policy that specifies which files are
    hashed and what to do with those hashes, such as log them,
    authenticate them, or add them to a measurement list.

    IMA is planned to support the fs-verity hashing mechanism as an
    alternative to doing full file hashes, for people who want the
    performance and security benefits of the Merkle tree based hash.
    But it doesn't make sense to force all uses of fs-verity to be
    through IMA. As a standalone filesystem feature, fs-verity
    already meets many users' needs, and it's testable like other
    filesystem features e.g. with xfstests.

:Q: Isn't fs-verity useless because the attacker can just modify the
    hashes in the Merkle tree, which is stored on-disk?
:A: To verify the authenticity of an fs-verity file you must verify
    the authenticity of the "file measurement", which is basically the
    root hash of the Merkle tree. See `Use cases`_.

:Q: Isn't fs-verity useless because the attacker can just replace a
    verity file with a non-verity one?
:A: See `Use cases`_. In the initial use case, it's really trusted
    userspace code that authenticates the files; fs-verity is just a
    tool to do this job efficiently and securely. The trusted
    userspace code will consider non-verity files to be inauthentic.

:Q: Why does the Merkle tree need to be stored on-disk? Couldn't you
    store just the root hash?
:A: If the Merkle tree wasn't stored on-disk, then you'd have to
    compute the entire tree when the file is first accessed, even if
    just one byte is being read. This is a fundamental consequence of
    how Merkle tree hashing works. To verify a leaf node, you need to
    verify the whole path to the root hash, including the root node
    (the thing which the root hash is a hash of). But if the root
    node isn't stored on-disk, you have to compute it by hashing its
    children, and so on until you've actually hashed the entire file.

    That defeats most of the point of doing a Merkle tree-based hash,
    since if you have to hash the whole file ahead of time anyway,
    then you could simply do sha256(file) instead. That would be much
    simpler, and a bit faster too.

    It's true that an in-memory Merkle tree could still provide the
    advantage of verification on every read rather than just on the
    first read. However, it would be inefficient because every time a
    hash page gets evicted (you can't pin the entire Merkle tree into
    memory, since it may be very large), in order to restore it you
    again need to hash everything below it in the tree.
    This again
    defeats most of the point of doing a Merkle tree-based hash, since
    a single block read could trigger re-hashing gigabytes of data.

:Q: But couldn't you store just the leaf nodes and compute the rest?
:A: See previous answer; this really just moves up one level, since
    one could alternatively interpret the data blocks as being the
    leaf nodes of the Merkle tree. It's true that the tree can be
    computed much faster if the leaf level is stored rather than just
    the data, but that's only because each level is less than 1% the
    size of the level below (assuming the recommended settings of
    SHA-256 and 4K blocks). For the exact same reason, by storing
    "just the leaf nodes" you'd already be storing over 99% of the
    tree, so you might as well simply store the whole tree.

:Q: Can the Merkle tree be built ahead of time, e.g. distributed as
    part of a package that is installed to many computers?
:A: This isn't currently supported. It was part of the original
    design, but was removed to simplify the kernel UAPI and because it
    wasn't a critical use case. Files are usually installed once and
    used many times, and cryptographic hashing is somewhat fast on
    most modern processors.

:Q: Why doesn't fs-verity support writes?
:A: Write support would be very difficult and would require a
    completely different design, so it's well outside the scope of
    fs-verity. Write support would require:

    - A way to maintain consistency between the data and hashes,
      including all levels of hashes, since corruption after a crash
      (especially of potentially the entire file!) is unacceptable.
      The main options for solving this are data journalling,
      copy-on-write, and log-structured volume. But it's very hard to
      retrofit existing filesystems with new consistency mechanisms.
+      Data journalling is available on ext4, but is very slow.
+
+    - Rebuilding the Merkle tree after every write, which would be
+      extremely inefficient. Alternatively, a different authenticated
+      dictionary structure such as an "authenticated skiplist" could
+      be used. However, this would be far more complex.
+
+    Compare it to dm-verity vs. dm-integrity. dm-verity is very
+    simple: the kernel just verifies read-only data against a
+    read-only Merkle tree. In contrast, dm-integrity supports writes
+    but is slow, is much more complex, and doesn't actually support
+    full-device authentication since it authenticates each sector
+    independently, i.e. there is no "root hash". It doesn't really
+    make sense for the same device-mapper target to support these two
+    very different cases; the same applies to fs-verity.
+
+:Q: Since verity files are immutable, why isn't the immutable bit set?
+:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
+    specific set of semantics which not only make the file contents
+    read-only, but also prevent the file from being deleted, renamed,
+    linked to, or having its owner or mode changed. These extra
+    properties are unwanted for fs-verity, so reusing the immutable
+    bit isn't appropriate.
+
+:Q: Why does the API use ioctls instead of setxattr() and getxattr()?
+:A: Abusing the xattr interface for basically arbitrary syscalls is
+    heavily frowned upon by most of the Linux filesystem developers.
+    An xattr should really just be an xattr on-disk, not an API to
+    e.g. magically trigger construction of a Merkle tree.
+
+:Q: Does fs-verity support remote filesystems?
+:A: Only ext4 and f2fs support is implemented currently, but in
+    principle any filesystem that can store per-file verity metadata
+    can support fs-verity, regardless of whether it's local or remote.
+    Some filesystems may have fewer options of where to store the
+    verity metadata; one possibility is to store it past the end of
+    the file and "hide" it from userspace by manipulating i_size. The
+    data verification functions provided by ``fs/verity/`` also assume
+    that the filesystem uses the Linux pagecache, but both local and
+    remote filesystems normally do so.
+
+:Q: Why is anything filesystem-specific at all? Shouldn't fs-verity
+    be implemented entirely at the VFS level?
+:A: There are many reasons why this is not possible or would be very
+    difficult, including the following:
+
+    - To prevent bypassing verification, pages must not be marked
+      Uptodate until they've been verified. Currently, each
+      filesystem is responsible for marking pages Uptodate via
+      ``->readpages()``. Therefore, currently it's not possible for
+      the VFS to do the verification on its own. Changing this would
+      require significant changes to the VFS and all filesystems.
+
+    - It would require defining a filesystem-independent way to store
+      the verity metadata. Extended attributes don't work for this
+      because (a) the Merkle tree may be gigabytes, but many
+      filesystems assume that all xattrs fit into a single 4K
+      filesystem block, and (b) ext4 and f2fs encryption doesn't
+      encrypt xattrs, yet the Merkle tree *must* be encrypted when the
+      file contents are, because it stores hashes of the plaintext
+      file contents.
+
+      So the verity metadata would have to be stored in an actual
+      file. Using a separate file would be very ugly, since the
+      metadata is fundamentally part of the file to be protected, and
+      it could cause problems where users could delete the real file
+      but not the metadata file or vice versa. On the other hand,
+      having it be in the same file would break applications unless
+      filesystems' notion of i_size were divorced from the VFS's,
+      which would be complex and require changes to all filesystems.
+
+    - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
+      transaction mechanism so that either the file ends up with
+      verity enabled, or no changes were made. Allowing intermediate
+      states to occur after a crash may cause problems.
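As a rough check of the FAQ answer about storing just the leaf nodes, the sketch below works out how many hash blocks each Merkle tree level needs under the recommended settings of SHA-256 and 4K blocks. The function name ``merkle_level_blocks`` and the 1 GiB example file size are illustrative assumptions, not part of fs-verity or its UAPI. The sketch only counts blocks and does no hashing.

```python
def merkle_level_blocks(file_size, block_size=4096, digest_size=32):
    """Count the hash blocks in each tree level, leaf level first.

    Assumes SHA-256 (32-byte digests) and 4K blocks by default.
    """
    hashes_per_block = block_size // digest_size  # 128 with the defaults
    # Number of data blocks, rounded up.
    blocks = -(-file_size // block_size)
    levels = []
    # Each level stores one digest per block of the level below it.
    while blocks > 1:
        blocks = -(-blocks // hashes_per_block)
        levels.append(blocks)
    return levels


# Hypothetical example: a 1 GiB file.
levels = merkle_level_blocks(1 << 30)
leaf_fraction = levels[0] / sum(levels)
print(levels, f"leaf level is {leaf_fraction:.2%} of the tree")
```

Under these assumptions each 4K hash block holds 128 digests, so every level is 1/128 (about 0.8%) of the level below it, and the leaf level alone accounts for over 99% of the tree's blocks for this file size.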
+1
Documentation/filesystems/index.rst
··· 32 32 33 33 journalling 34 34 fscrypt 35 + fsverity