Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

xfs: clean up inode lockdep annotations

Lockdep annotations are a maintenance nightmare. Locking has to be
modified to suit the limitations of the annotations, and we're
always having to fix the annotations because they are unable to
express the complexity of locking heirarchies correctly.

So, next up, we've got more issues with lockdep annotations for
inode locking w.r.t. XFS_LOCK_PARENT:

- lockdep classes are exclusive and can't be ORed together
to form new classes.
- IOLOCK needs multiple PARENT subclasses to express the
changes needed for the readdir locking rework needed to
stop the endless flow of lockdep false positives involving
readdir calling filldir under the ILOCK.
- there are only 8 unique lockdep subclasses available,
so we can't create a generic solution.

IOWs we need to treat the 3-bit space available to each lock type
differently:

- IOLOCK uses xfs_lock_two_inodes(), so needs:
- at least 2 IOLOCK subclasses
- at least 2 IOLOCK_PARENT subclasses
- MMAPLOCK uses xfs_lock_two_inodes(), so needs:
- at least 2 MMAPLOCK subclasses
- ILOCK uses xfs_lock_inodes with up to 5 inodes, so needs:
- at least 5 ILOCK subclasses
- one ILOCK_PARENT subclass
- one RTBITMAP subclass
- one RTSUM subclass

For the IOLOCK, split the space into two sets of subclasses.
For the MMAPLOCK, just use half the space for the one subclass to
match the non-parent lock classes of the IOLOCK.
For the ILOCK, use 0-4 as the ILOCK subclasses, 5-7 for the
remaining individual subclasses.

Because they are now all different, modify xfs_lock_inumorder() to
handle the nested subclasses, and to assert fail if passed an
invalid subclass. Further, annotate xfs_lock_inodes() to assert fail
if an invalid combination of lock primitives and inode counts are
passed that would result in a lockdep subclass annotation overflow.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

authored by

Dave Chinner and committed by
Dave Chinner
0952c818 7df1c170

+107 -40
+51 -17
fs/xfs/xfs_inode.c
··· 164 164 (XFS_MMAPLOCK_SHARED | XFS_MMAPLOCK_EXCL)); 165 165 ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) != 166 166 (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); 167 - ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0); 167 + ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0); 168 168 169 169 if (lock_flags & XFS_IOLOCK_EXCL) 170 170 mrupdate_nested(&ip->i_iolock, XFS_IOLOCK_DEP(lock_flags)); ··· 212 212 (XFS_MMAPLOCK_SHARED | XFS_MMAPLOCK_EXCL)); 213 213 ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) != 214 214 (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); 215 - ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0); 215 + ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0); 216 216 217 217 if (lock_flags & XFS_IOLOCK_EXCL) { 218 218 if (!mrtryupdate(&ip->i_iolock)) ··· 281 281 (XFS_MMAPLOCK_SHARED | XFS_MMAPLOCK_EXCL)); 282 282 ASSERT((lock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)) != 283 283 (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); 284 - ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_DEP_MASK)) == 0); 284 + ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0); 285 285 ASSERT(lock_flags != 0); 286 286 287 287 if (lock_flags & XFS_IOLOCK_EXCL) ··· 364 364 365 365 /* 366 366 * Bump the subclass so xfs_lock_inodes() acquires each lock with a different 367 - * value. This shouldn't be called for page fault locking, but we also need to 368 - * ensure we don't overrun the number of lockdep subclasses for the iolock or 369 - * mmaplock as that is limited to 12 by the mmap lock lockdep annotations. 367 + * value. This can be called for any type of inode lock combination, including 368 + * parent locking. Care must be taken to ensure we don't overrun the subclass 369 + * storage fields in the class mask we build. 370 370 */ 371 371 static inline int 372 372 xfs_lock_inumorder(int lock_mode, int subclass) 373 373 { 374 + int class = 0; 375 + 376 + ASSERT(!(lock_mode & (XFS_ILOCK_PARENT | XFS_ILOCK_RTBITMAP | 377 + XFS_ILOCK_RTSUM))); 378 + 374 379 if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL)) { 375 - ASSERT(subclass + XFS_LOCK_INUMORDER < 376 - (1 << (XFS_MMAPLOCK_SHIFT - XFS_IOLOCK_SHIFT))); 377 - lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT; 380 + ASSERT(subclass <= XFS_IOLOCK_MAX_SUBCLASS); 381 + ASSERT(subclass + XFS_IOLOCK_PARENT_VAL < 382 + MAX_LOCKDEP_SUBCLASSES); 383 + class += subclass << XFS_IOLOCK_SHIFT; 384 + if (lock_mode & XFS_IOLOCK_PARENT) 385 + class += XFS_IOLOCK_PARENT_VAL << XFS_IOLOCK_SHIFT; 378 386 } 379 387 380 388 if (lock_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) { 381 - ASSERT(subclass + XFS_LOCK_INUMORDER < 382 - (1 << (XFS_ILOCK_SHIFT - XFS_MMAPLOCK_SHIFT))); 383 - lock_mode |= (subclass + XFS_LOCK_INUMORDER) << 384 - XFS_MMAPLOCK_SHIFT; 389 + ASSERT(subclass <= XFS_MMAPLOCK_MAX_SUBCLASS); 390 + class += subclass << XFS_MMAPLOCK_SHIFT; 385 391 } 386 392 387 - if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)) 388 - lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT; 393 + if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)) { 394 + ASSERT(subclass <= XFS_ILOCK_MAX_SUBCLASS); 395 + class += subclass << XFS_ILOCK_SHIFT; 396 + } 389 397 390 - return lock_mode; 398 + return (lock_mode & ~XFS_LOCK_SUBCLASS_MASK) | class; 391 399 } 392 400 393 401 /* ··· 407 399 * transaction (such as truncate). This can result in deadlock since the long 408 400 * running trans might need to wait for the inode we just locked in order to 409 401 * push the tail and free space in the log. 402 + * 403 + * xfs_lock_inodes() can only be used to lock one type of lock at a time - 404 + * the iolock, the mmaplock or the ilock, but not more than one at a time. If we 405 + * lock more than one at a time, lockdep will report false positives saying we 406 + * have violated locking orders. 410 407 */ 411 408 void 412 409 xfs_lock_inodes( ··· 422 409 int attempts = 0, i, j, try_lock; 423 410 xfs_log_item_t *lp; 424 411 425 - /* currently supports between 2 and 5 inodes */ 412 + /* 413 + * Currently supports between 2 and 5 inodes with exclusive locking. We 414 + * support an arbitrary depth of locking here, but absolute limits on 415 + * inodes depend on the the type of locking and the limits placed by 416 + * lockdep annotations in xfs_lock_inumorder. These are all checked by 417 + * the asserts. 418 + */ 426 419 ASSERT(ips && inodes >= 2 && inodes <= 5); 420 + ASSERT(lock_mode & (XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL | 421 + XFS_ILOCK_EXCL)); 422 + ASSERT(!(lock_mode & (XFS_IOLOCK_SHARED | XFS_MMAPLOCK_SHARED | 423 + XFS_ILOCK_SHARED))); 424 + ASSERT(!(lock_mode & XFS_IOLOCK_EXCL) || 425 + inodes <= XFS_IOLOCK_MAX_SUBCLASS + 1); 426 + ASSERT(!(lock_mode & XFS_MMAPLOCK_EXCL) || 427 + inodes <= XFS_MMAPLOCK_MAX_SUBCLASS + 1); 428 + ASSERT(!(lock_mode & XFS_ILOCK_EXCL) || 429 + inodes <= XFS_ILOCK_MAX_SUBCLASS + 1); 430 + 431 + if (lock_mode & XFS_IOLOCK_EXCL) { 432 + ASSERT(!(lock_mode & (XFS_MMAPLOCK_EXCL | XFS_ILOCK_EXCL))); 433 + } else if (lock_mode & XFS_MMAPLOCK_EXCL) 434 + ASSERT(!(lock_mode & XFS_ILOCK_EXCL)); 427 435 428 436 try_lock = 0; 429 437 i = 0;
+56 -23
fs/xfs/xfs_inode.h
··· 284 284 * Flags for lockdep annotations. 285 285 * 286 286 * XFS_LOCK_PARENT - for directory operations that require locking a 287 - * parent directory inode and a child entry inode. The parent gets locked 288 - * with this flag so it gets a lockdep subclass of 1 and the child entry 289 - * lock will have a lockdep subclass of 0. 287 + * parent directory inode and a child entry inode. IOLOCK requires nesting, 288 + * MMAPLOCK does not support this class, ILOCK requires a single subclass 289 + * to differentiate parent from child. 290 290 * 291 291 * XFS_LOCK_RTBITMAP/XFS_LOCK_RTSUM - the realtime device bitmap and summary 292 292 * inodes do not participate in the normal lock order, and thus have their ··· 295 295 * XFS_LOCK_INUMORDER - for locking several inodes at the some time 296 296 * with xfs_lock_inodes(). This flag is used as the starting subclass 297 297 * and each subsequent lock acquired will increment the subclass by one. 298 - * So the first lock acquired will have a lockdep subclass of 4, the 299 - * second lock will have a lockdep subclass of 5, and so on. It is 300 - * the responsibility of the class builder to shift this to the correct 301 - * portion of the lock_mode lockdep mask. 298 + * However, MAX_LOCKDEP_SUBCLASSES == 8, which means we are greatly 299 + * limited to the subclasses we can represent via nesting. We need at least 300 + * 5 inodes nest depth for the ILOCK through rename, and we also have to support 301 + * XFS_ILOCK_PARENT, which gives 6 subclasses. Then we have XFS_ILOCK_RTBITMAP 302 + * and XFS_ILOCK_RTSUM, which are another 2 unique subclasses, so that's all 303 + * 8 subclasses supported by lockdep. 304 + * 305 + * This also means we have to number the sub-classes in the lowest bits of 306 + * the mask we keep, and we have to ensure we never exceed 3 bits of lockdep 307 + * mask and we can't use bit-masking to build the subclasses. What a mess. 308 + * 309 + * Bit layout: 310 + * 311 + * Bit Lock Region 312 + * 16-19 XFS_IOLOCK_SHIFT dependencies 313 + * 20-23 XFS_MMAPLOCK_SHIFT dependencies 314 + * 24-31 XFS_ILOCK_SHIFT dependencies 315 + * 316 + * IOLOCK values 317 + * 318 + * 0-3 subclass value 319 + * 4-7 PARENT subclass values 320 + * 321 + * MMAPLOCK values 322 + * 323 + * 0-3 subclass value 324 + * 4-7 unused 325 + * 326 + * ILOCK values 327 + * 0-4 subclass values 328 + * 5 PARENT subclass (not nestable) 329 + * 6 RTBITMAP subclass (not nestable) 330 + * 7 RTSUM subclass (not nestable) 331 + * 302 332 */ 303 - #define XFS_LOCK_PARENT 1 304 - #define XFS_LOCK_RTBITMAP 2 305 - #define XFS_LOCK_RTSUM 3 306 - #define XFS_LOCK_INUMORDER 4 333 + #define XFS_IOLOCK_SHIFT 16 334 + #define XFS_IOLOCK_PARENT_VAL 4 335 + #define XFS_IOLOCK_MAX_SUBCLASS (XFS_IOLOCK_PARENT_VAL - 1) 336 + #define XFS_IOLOCK_DEP_MASK 0x000f0000 337 + #define XFS_IOLOCK_PARENT (XFS_IOLOCK_PARENT_VAL << XFS_IOLOCK_SHIFT) 307 338 308 - #define XFS_IOLOCK_SHIFT 16 309 - #define XFS_IOLOCK_PARENT (XFS_LOCK_PARENT << XFS_IOLOCK_SHIFT) 339 + #define XFS_MMAPLOCK_SHIFT 20 340 + #define XFS_MMAPLOCK_NUMORDER 0 341 + #define XFS_MMAPLOCK_MAX_SUBCLASS 3 342 + #define XFS_MMAPLOCK_DEP_MASK 0x00f00000 310 343 311 - #define XFS_MMAPLOCK_SHIFT 20 344 + #define XFS_ILOCK_SHIFT 24 345 + #define XFS_ILOCK_PARENT_VAL 5 346 + #define XFS_ILOCK_MAX_SUBCLASS (XFS_ILOCK_PARENT_VAL - 1) 347 + #define XFS_ILOCK_RTBITMAP_VAL 6 348 + #define XFS_ILOCK_RTSUM_VAL 7 349 + #define XFS_ILOCK_DEP_MASK 0xff000000 350 + #define XFS_ILOCK_PARENT (XFS_ILOCK_PARENT_VAL << XFS_ILOCK_SHIFT) 351 + #define XFS_ILOCK_RTBITMAP (XFS_ILOCK_RTBITMAP_VAL << XFS_ILOCK_SHIFT) 352 + #define XFS_ILOCK_RTSUM (XFS_ILOCK_RTSUM_VAL << XFS_ILOCK_SHIFT) 312 353 313 - #define XFS_ILOCK_SHIFT 24 314 - #define XFS_ILOCK_PARENT (XFS_LOCK_PARENT << XFS_ILOCK_SHIFT) 315 - #define XFS_ILOCK_RTBITMAP (XFS_LOCK_RTBITMAP << XFS_ILOCK_SHIFT) 316 - #define XFS_ILOCK_RTSUM (XFS_LOCK_RTSUM << XFS_ILOCK_SHIFT) 317 - 318 - #define XFS_IOLOCK_DEP_MASK 0x000f0000 319 - #define XFS_MMAPLOCK_DEP_MASK 0x00f00000 320 - #define XFS_ILOCK_DEP_MASK 0xff000000 321 - #define XFS_LOCK_DEP_MASK (XFS_IOLOCK_DEP_MASK | \ 354 + #define XFS_LOCK_SUBCLASS_MASK (XFS_IOLOCK_DEP_MASK | \ 322 355 XFS_MMAPLOCK_DEP_MASK | \ 323 356 XFS_ILOCK_DEP_MASK) 324 357