
Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs updates from Kent Overstreet:

- More safety fixes, primarily found by syzbot

- Run the upgrade/downgrade paths in nochanges mode. Nochanges mode is
primarily for testing fsck/recovery in dry run mode, so it shouldn't
change anything besides disabling writes and holding dirty metadata
in memory.

The idea here was to reduce the amount of activity if we can't write
anything out, so that bringing up a filesystem in "super ro" mode
would be more likely to work for data recovery - but norecovery is
the correct option for this.

- btree_trans->locked; we now track whether a btree_trans has any btree
nodes locked, and this is used for improved assertions related to
trans_unlock() and trans_relock(). We'll also be using it for
improving how we work with lockdep in the future: we don't want
lockdep to be tracking individual btree node locks because we take
too many for lockdep to track, and it's not necessary since we have a
cycle detector.
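
  A minimal sketch of the idea (trans_unlock()/trans_relock() are the real
  entry points mentioned above; the field layout and bodies here are
  illustrative, not the exact bcachefs code):

      struct btree_trans {
          /* ... */
          bool locked;    /* do we currently hold any btree node locks? */
      };

      void bch2_trans_unlock(struct btree_trans *trans)
      {
          /* ... drop all held node locks ... */
          trans->locked = false;
      }

      int bch2_trans_relock(struct btree_trans *trans)
      {
          /* relocking when locks were never dropped indicates a bug: */
          BUG_ON(trans->locked);
          /* ... retake node locks, checking for intervening changes ... */
          trans->locked = true;
          return 0;
      }

  One flag per transaction is also what makes the planned lockdep
  integration cheap: lockdep sees a single "trans locked" token instead of
  every individual node lock.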

- Trigger improvements that are prep work for online fsck

- BTREE_TRIGGER_check_repair; this regularizes how we do some repair
work for extents that goes with running triggers in fsck, and fixes
some subtle issues with transaction restarts there.

- bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot
equivalence classes are for when snapshot deletion leaves behind
redundant snapshot nodes, but snapshot deletion now cleans this up
right away, so the abstraction doesn't need to leak.

- Improvements to how we resume writing to the journal in recovery. The
code for picking the new place to write when reading the journal is
greatly simplified and we also store the position in the superblock
for when we don't read the journal; this means that we preserve more
of the journal for list_journal debugging.

- Improvements to sysfs btree_cache and btree_node_cache, for debugging
memory reclaim.

- We now detect when we've blocked for 10 seconds on the allocator in
the write path and dump some useful info.
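
  A sketch of the pattern (the two helpers here are placeholders for
  illustration, not the actual bcachefs functions):

      static void wait_on_allocator(struct bch_fs *c, struct closure *cl)
      {
          unsigned long warn_at = jiffies + 10 * HZ;
          bool warned = false;

          while (!alloc_attempt_succeeded(c)) {   /* placeholder */
              if (!warned && time_after(jiffies, warn_at)) {
                  /* placeholder: dump watermarks, counters, open buckets */
                  dump_allocator_state(c);
                  warned = true;
              }
              closure_sync(cl);   /* sleep until woken, then retry */
          }
      }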

- Safety fixes for device references: this is a big series that
changes almost all device lookups to properly check if the device
exists and take a reference to it.

Previously we assumed that if a bkey exists that references a device
then the device must exist, and this was enforced in .invalid
methods, but this was incorrect because it meant device removal
relied on accounting being correct to not leave keys pointing to
invalid devices, and that's not something we can assume.

Getting the "pointer to invalid device" checks out of our .invalid()
methods fixes some long standing device removal bugs; the only
outstanding bug with device removal now is a race between the discard
path and deleting alloc info, which should be easily fixed.
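
  The shape of the change, visible throughout the diffs below - lookups
  that used to assume the device exists now return NULL for a missing
  device and hand back a reference the caller must drop:

      /* Before: existence assumed, no reference taken: */
      struct bch_dev *ca = bch_dev_bkey_exists(c, k.k->p.inode);

      /* After: the lookup can fail, and the caller owns a reference: */
      struct bch_dev *ca = bch2_dev_bucket_tryget(c, k.k->p);
      if (!ca)
          return -EIO;
      /* ... use ca ... */
      bch2_dev_put(ca);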

- The allocator now prefers not to expand the new
member_info.btree_allocated bitmap, meaning if repair ever requires
scanning for btree nodes (because of a corrupt interior node) we
won't have to scan the whole device(s).
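
  The allocator-side check, condensed from the alloc_foreground.c diff
  below: when allocating a bucket for btree data, skip buckets whose
  bitmap state doesn't match, jumping ahead a whole bitmap granule at a
  time:

      if (s->btree_bitmap != BTREE_BITMAP_ANY &&
          s->btree_bitmap != bch2_dev_btree_bitmap_marked_sectors(ca,
                  bucket_to_sector(ca, bucket), ca->mi.bucket_size)) {
          /* advance to the next bitmap granule and keep scanning */
          bucket = sector_to_bucket(ca,
                  round_up(bucket_to_sector(ca, bucket) + 1,
                           1ULL << ca->mi.btree_bitmap_shift));
          continue;
      }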

- New coding style document, which among other things talks about the
correct usage of assertions

* tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits)
bcachefs: add no_invalid_checks flag
bcachefs: add counters for failed shrinker reclaim
bcachefs: Fix sb_field_downgrade validation
bcachefs: Plumb bch_validate_flags to sb_field_ops.validate()
bcachefs: s/bkey_invalid_flags/bch_validate_flags
bcachefs: fsync() should not return -EROFS
bcachefs: Invalid devices are now checked for by fsck, not .invalid methods
bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs()
bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio()
bcachefs: bch2_dev_get_ioref() checks for device not present
bcachefs: bch2_dev_get_ioref2(); io_read.c
bcachefs: bch2_dev_get_ioref2(); debug.c
bcachefs: bch2_dev_get_ioref2(); journal_io.c
bcachefs: bch2_dev_get_ioref2(); io_write.c
bcachefs: bch2_dev_get_ioref2(); btree_io.c
bcachefs: bch2_dev_get_ioref2(); backpointers.c
bcachefs: bch2_dev_get_ioref2(); alloc_background.c
bcachefs: for_each_bset() declares loop iter
bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h
bcachefs: Improve sysfs internal/btree_cache
...

+4692 -4384
+186
Documentation/filesystems/bcachefs/CodingStyle.rst
.. SPDX-License-Identifier: GPL-2.0

bcachefs coding style
=====================

Good development is like gardening, and codebases are our gardens. Tend to them
every day; look for little things that are out of place or in need of tidying.
A little weeding here and there goes a long way; don't wait until things have
spiraled out of control.

Things don't always have to be perfect - nitpicking often does more harm than
good. But appreciate beauty when you see it - and let people know.

The code that you are afraid to touch is the code most in need of refactoring.

A little organizing here and there goes a long way.

Put real thought into how you organize things.

Good code is readable code, where the structure is simple and leaves nowhere
for bugs to hide.

Assertions are one of our most important tools for writing reliable code. If in
the course of writing a patchset you encounter a condition that shouldn't
happen (and will have unpredictable or undefined behaviour if it does), or
you're not sure if it can happen and not sure how to handle it yet - make it a
BUG_ON(). Don't leave undefined or unspecified behavior lurking in the codebase.

By the time you finish the patchset, you should understand better which
assertions need to be handled and turned into checks with error paths, and
which should be logically impossible. Leave the BUG_ON()s in for the ones which
are logically impossible. (Or, make them debug mode assertions if they're
expensive - but don't turn everything into a debug mode assertion, so that
we're not stuck debugging undefined behaviour should it turn out that you were
wrong).

Assertions are documentation that can't go out of date. Good assertions are
wonderful.

Good assertions drastically and dramatically reduce the amount of testing
required to shake out bugs.

Good assertions are based on state, not logic. To write good assertions, you
have to think about what the invariants on your state are.

Good invariants and assertions will hold everywhere in your codebase. This
means that you can run them in only a few places in the checked in version, but
should you need to debug something that caused the assertion to fail, you can
quickly shotgun them everywhere to find the codepath that broke the invariant.

A good assertion checks something that the compiler could check for us, and
elide - if we were working in a language with embedded correctness proofs that
the compiler could check. This is something that exists today, but it'll likely
still be a few decades before it comes to systems programming languages. But we
can still incorporate that kind of thinking into our code and document the
invariants with runtime checks - much like the way people working in
dynamically typed languages may add type annotations, gradually making their
code statically typed.

Looking for ways to make your assertions simpler - and higher level - will
often nudge you towards making the entire system simpler and more robust.

Good code is code where you can poke around and see what it's doing -
introspection. We can't debug anything if we can't see what's going on.

Whenever we're debugging, and the solution isn't immediately obvious, if the
issue is that we don't know where the issue is because we can't see what's
going on - fix that first.

We have the tools to make anything visible at runtime, efficiently - RCU and
percpu data structures among them. Don't let things stay hidden.

The most important tool for introspection is the humble pretty printer - in
bcachefs, this means `*_to_text()` functions, which output to printbufs.

Pretty printers are wonderful, because they compose and you can use them
everywhere. Having functions to print whatever object you're working with will
make your error messages much easier to write (therefore they will actually
exist) and much more informative. And they can be used from sysfs/debugfs, as
well as tracepoints.

Runtime info and debugging tools should come with clear descriptions and
labels, and good structure - we don't want files with a list of bare integers,
like in procfs. Part of the job of the debugging tools is to educate users and
new developers as to how the system works.

Error messages should, whenever possible, tell you everything you need to debug
the issue. It's worth putting effort into them.

Tracepoints shouldn't be the first thing you reach for. They're an important
tool, but always look for more immediate ways to make things visible. When we
have to rely on tracing, we have to know which tracepoints we're looking for,
and then we have to run the troublesome workload, and then we have to sift
through logs. This is a lot of steps to go through when a user is hitting
something, and if it's intermittent it may not even be possible.

The humble counter is an incredibly useful tool. They're cheap and simple to
use, and many complicated internal operations with lots of things that can
behave weirdly (anything involving memory reclaim, for example) become
shockingly easy to debug once you have counters on every distinct codepath.

Persistent counters are even better.

When debugging, try to get the most out of every bug you come across; don't
rush to fix the initial issue. Look for things that will make related bugs
easier the next time around - introspection, new assertions, better error
messages, new debug tools, and do those first. Look for ways to make the system
better behaved; often one bug will uncover several other bugs through
downstream effects.

Fix all that first, and then the original bug last - even if that means keeping
a user waiting. They'll thank you in the long run, and when they understand
what you're doing you'll be amazed at how patient they're happy to be. Users
like to help - otherwise they wouldn't be reporting the bug in the first place.

Talk to your users. Don't isolate yourself.

Users notice all sorts of interesting things, and by just talking to them and
interacting with them you can benefit from their experience.

Spend time doing support and helpdesk stuff. Don't just write code - code isn't
finished until it's being used trouble free.

This will also motivate you to make your debugging tools as good as possible,
and perhaps even your documentation, too. Like anything else in life, the more
time you spend at it the better you'll get, and you the developer are the
person most able to improve the tools to make debugging quick and easy.

Be wary of how you take on and commit to big projects. Don't let development
become product-manager focused. Often an idea is a good one but needs to
wait for its proper time - but you won't know if it's the proper time for an
idea until you start writing code.

Expect to throw a lot of things away, or leave them half finished for later.
Nobody writes all perfect code that all gets shipped, and you'll be much more
productive in the long run if you notice this early and shift to something
else. The experience gained and lessons learned will be valuable for all the
other work you do.

But don't be afraid to tackle projects that require significant rework of
existing code. Sometimes these can be the best projects, because they can lead
us to make existing code more general, more flexible, more multipurpose and
perhaps more robust. Just don't hesitate to abandon the idea if it looks like
it's going to make a mess of things.

Complicated features can often be done as a series of refactorings, with the
final change that actually implements the feature as a quite small patch at the
end. It's wonderful when this happens, especially when those refactorings are
things that improve the codebase in their own right. When that happens there's
much less risk of wasted effort if the feature you were going for doesn't work
out.

Always strive to work incrementally. Always strive to turn the big projects
into little bite sized projects that can prove their own merits.

Instead of always tackling those big projects, look for little things that
will be useful, and make the big projects easier.

The question of what's likely to be useful is where junior developers most
often go astray - doing something because it seems like it'll be useful often
leads to overengineering. Knowing what's useful comes from many years of
experience, or talking with people who have that experience - or from simply
reading lots of code and looking for common patterns and issues. Don't be
afraid to throw things away and do something simpler.

Talk about your ideas with your fellow developers; often times the best things
come from relaxed conversations where people aren't afraid to say "what if?".

Don't neglect your tools.

The most important tools (besides the compiler and our text editor) are the
tools we use for testing. The shortest possible edit/test/debug cycle is
essential for working productively. We learn, gain experience, and discover the
errors in our thinking by running our code and seeing what happens. If your
time is being wasted because your tools are bad or too slow - don't accept it,
fix it.

Put effort into your documentation, commit messages, and code comments - but
don't go overboard. A good commit message is wonderful - but if the information
was important enough to go in a commit message, ask yourself if it would be
even better as a code comment.

A good code comment is wonderful, but even better is the comment that didn't
need to exist because the code was so straightforward as to be obvious;
organized into small clean and tidy modules, with clear and descriptive names
for functions and variables, where every line of code has a clear purpose.
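
A sketch of the pretty-printer convention the coding style document
describes - struct printbuf and prt_printf() are the real bcachefs
interfaces (used throughout the diffs below); the example object is
invented:

    struct foo {
        u64 seq;
        u32 dirty_sectors;
    };

    static void foo_to_text(struct printbuf *out, const struct foo *f)
    {
        prt_printf(out, "seq\t%llu\n", f->seq);
        prt_printf(out, "dirty_sectors\t%u\n", f->dirty_sectors);
    }

The same function then serves error messages, sysfs/debugfs files and
tracepoints alike.
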
+1
Documentation/filesystems/bcachefs/index.rst
···
    :maxdepth: 2
    :numbered:

+   CodingStyle
    errorcodes
+15 -30
fs/bcachefs/acl.c
···
     struct btree_trans *trans = bch2_trans_get(c);
     struct btree_iter iter = { NULL };
     struct posix_acl *acl = NULL;
-    struct bkey_s_c k;
-    int ret;
 retry:
     bch2_trans_begin(trans);
 
-    ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
-            &hash, inode_inum(inode), &search, 0);
-    if (ret)
-        goto err;
-
-    k = bch2_btree_iter_peek_slot(&iter);
-    ret = bkey_err(k);
+    struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
+            &hash, inode_inum(inode), &search, 0);
+    int ret = bkey_err(k);
     if (ret)
         goto err;
 
···
 
     ret = bch2_subvol_is_ro_trans(trans, inode->ei_subvol) ?:
         bch2_inode_peek(trans, &inode_iter, &inode_u, inode_inum(inode),
-                BTREE_ITER_INTENT);
+                BTREE_ITER_intent);
     if (ret)
         goto btree_err;
 
···
     struct bch_hash_info hash_info = bch2_hash_info_init(trans->c, inode);
     struct xattr_search_key search = X_SEARCH(KEY_TYPE_XATTR_INDEX_POSIX_ACL_ACCESS, "", 0);
     struct btree_iter iter;
-    struct bkey_s_c_xattr xattr;
-    struct bkey_i_xattr *new;
     struct posix_acl *acl = NULL;
-    struct bkey_s_c k;
-    int ret;
 
-    ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
-            &hash_info, inum, &search, BTREE_ITER_INTENT);
+    struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
+            &hash_info, inum, &search, BTREE_ITER_intent);
+    int ret = bkey_err(k);
     if (ret)
         return bch2_err_matches(ret, ENOENT) ? 0 : ret;
 
-    k = bch2_btree_iter_peek_slot(&iter);
-    ret = bkey_err(k);
-    if (ret)
-        goto err;
-    xattr = bkey_s_c_to_xattr(k);
+    struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k);
 
     acl = bch2_acl_from_disk(trans, xattr_val(xattr.v),
             le16_to_cpu(xattr.v->x_val_len));
     ret = PTR_ERR_OR_ZERO(acl);
-    if (IS_ERR_OR_NULL(acl))
-        goto err;
-
-    ret = allocate_dropping_locks_errcode(trans,
-            __posix_acl_chmod(&acl, _gfp, mode));
     if (ret)
         goto err;
 
-    new = bch2_acl_to_xattr(trans, acl, ACL_TYPE_ACCESS);
-    if (IS_ERR(new)) {
-        ret = PTR_ERR(new);
+    ret = allocate_dropping_locks_errcode(trans, __posix_acl_chmod(&acl, _gfp, mode));
+    if (ret)
         goto err;
-    }
+
+    struct bkey_i_xattr *new = bch2_acl_to_xattr(trans, acl, ACL_TYPE_ACCESS);
+    ret = PTR_ERR_OR_ZERO(new);
+    if (ret)
+        goto err;
 
     new->k.p = iter.pos;
     ret = bch2_trans_update(trans, &iter, &new->k_i, 0);
+169 -170
fs/bcachefs/alloc_background.c
···
 }
 
 int bch2_alloc_v1_invalid(struct bch_fs *c, struct bkey_s_c k,
-                          enum bkey_invalid_flags flags,
+                          enum bch_validate_flags flags,
                           struct printbuf *err)
 {
     struct bkey_s_c_alloc a = bkey_s_c_to_alloc(k);
···
 }
 
 int bch2_alloc_v2_invalid(struct bch_fs *c, struct bkey_s_c k,
-                          enum bkey_invalid_flags flags,
+                          enum bch_validate_flags flags,
                           struct printbuf *err)
 {
     struct bkey_alloc_unpacked u;
···
 }
 
 int bch2_alloc_v3_invalid(struct bch_fs *c, struct bkey_s_c k,
-                          enum bkey_invalid_flags flags,
+                          enum bch_validate_flags flags,
                           struct printbuf *err)
 {
     struct bkey_alloc_unpacked u;
···
 }
 
 int bch2_alloc_v4_invalid(struct bch_fs *c, struct bkey_s_c k,
-                          enum bkey_invalid_flags flags, struct printbuf *err)
+                          enum bch_validate_flags flags, struct printbuf *err)
 {
     struct bkey_s_c_alloc_v4 a = bkey_s_c_to_alloc_v4(k);
     int ret = 0;
···
     case BCH_DATA_free:
     case BCH_DATA_need_gc_gens:
     case BCH_DATA_need_discard:
-        bkey_fsck_err_on(bch2_bucket_sectors(*a.v) || a.v->stripe,
+        bkey_fsck_err_on(bch2_bucket_sectors_total(*a.v) || a.v->stripe,
                          c, err, alloc_key_empty_but_have_data,
                          "empty data type free but have data");
         break;
···
     prt_printf(out, "gen %u oldest_gen %u data_type ", a->gen, a->oldest_gen);
     bch2_prt_data_type(out, a->data_type);
     prt_newline(out);
-    prt_printf(out, "journal_seq %llu", a->journal_seq);
-    prt_newline(out);
-    prt_printf(out, "need_discard %llu", BCH_ALLOC_V4_NEED_DISCARD(a));
-    prt_newline(out);
-    prt_printf(out, "need_inc_gen %llu", BCH_ALLOC_V4_NEED_INC_GEN(a));
-    prt_newline(out);
-    prt_printf(out, "dirty_sectors %u", a->dirty_sectors);
-    prt_newline(out);
-    prt_printf(out, "cached_sectors %u", a->cached_sectors);
-    prt_newline(out);
-    prt_printf(out, "stripe %u", a->stripe);
-    prt_newline(out);
-    prt_printf(out, "stripe_redundancy %u", a->stripe_redundancy);
-    prt_newline(out);
-    prt_printf(out, "io_time[READ] %llu", a->io_time[READ]);
-    prt_newline(out);
-    prt_printf(out, "io_time[WRITE] %llu", a->io_time[WRITE]);
-    prt_newline(out);
-    prt_printf(out, "fragmentation %llu", a->fragmentation_lru);
-    prt_newline(out);
-    prt_printf(out, "bp_start %llu", BCH_ALLOC_V4_BACKPOINTERS_START(a));
+    prt_printf(out, "journal_seq %llu\n", a->journal_seq);
+    prt_printf(out, "need_discard %llu\n", BCH_ALLOC_V4_NEED_DISCARD(a));
+    prt_printf(out, "need_inc_gen %llu\n", BCH_ALLOC_V4_NEED_INC_GEN(a));
+    prt_printf(out, "dirty_sectors %u\n", a->dirty_sectors);
+    prt_printf(out, "cached_sectors %u\n", a->cached_sectors);
+    prt_printf(out, "stripe %u\n", a->stripe);
+    prt_printf(out, "stripe_redundancy %u\n", a->stripe_redundancy);
+    prt_printf(out, "io_time[READ] %llu\n", a->io_time[READ]);
+    prt_printf(out, "io_time[WRITE] %llu\n", a->io_time[WRITE]);
+    prt_printf(out, "fragmentation %llu\n", a->fragmentation_lru);
+    prt_printf(out, "bp_start %llu\n", BCH_ALLOC_V4_BACKPOINTERS_START(a));
     printbuf_indent_sub(out, 2);
 }
···
 }
 
 struct bkey_i_alloc_v4 *
-bch2_trans_start_alloc_update(struct btree_trans *trans, struct btree_iter *iter,
-                              struct bpos pos)
+bch2_trans_start_alloc_update_noupdate(struct btree_trans *trans, struct btree_iter *iter,
+                                       struct bpos pos)
 {
-    struct bkey_s_c k;
-    struct bkey_i_alloc_v4 *a;
-    int ret;
-
-    k = bch2_bkey_get_iter(trans, iter, BTREE_ID_alloc, pos,
-                           BTREE_ITER_WITH_UPDATES|
-                           BTREE_ITER_CACHED|
-                           BTREE_ITER_INTENT);
-    ret = bkey_err(k);
+    struct bkey_s_c k = bch2_bkey_get_iter(trans, iter, BTREE_ID_alloc, pos,
+                                           BTREE_ITER_with_updates|
+                                           BTREE_ITER_cached|
+                                           BTREE_ITER_intent);
+    int ret = bkey_err(k);
     if (unlikely(ret))
         return ERR_PTR(ret);
 
-    a = bch2_alloc_to_v4_mut_inlined(trans, k);
+    struct bkey_i_alloc_v4 *a = bch2_alloc_to_v4_mut_inlined(trans, k);
     ret = PTR_ERR_OR_ZERO(a);
     if (unlikely(ret))
         goto err;
···
 err:
     bch2_trans_iter_exit(trans, iter);
     return ERR_PTR(ret);
+}
+
+__flatten
+struct bkey_i_alloc_v4 *bch2_trans_start_alloc_update(struct btree_trans *trans, struct bpos pos)
+{
+    struct btree_iter iter;
+    struct bkey_i_alloc_v4 *a = bch2_trans_start_alloc_update_noupdate(trans, &iter, pos);
+    int ret = PTR_ERR_OR_ZERO(a);
+    if (ret)
+        return ERR_PTR(ret);
+
+    ret = bch2_trans_update(trans, &iter, &a->k_i, 0);
+    bch2_trans_iter_exit(trans, &iter);
+    return unlikely(ret) ? ERR_PTR(ret) : a;
 }
 
 static struct bpos alloc_gens_pos(struct bpos pos, unsigned *offset)
···
 }
 
 int bch2_bucket_gens_invalid(struct bch_fs *c, struct bkey_s_c k,
-                             enum bkey_invalid_flags flags,
+                             enum bch_validate_flags flags,
                              struct printbuf *err)
 {
     int ret = 0;
···
     int ret;
 
     ret = for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN,
-                             BTREE_ITER_PREFETCH, k, ({
+                             BTREE_ITER_prefetch, k, ({
         /*
          * Not a fsck error because this is checked/repaired by
          * bch2_check_alloc_key() which runs later:
···
 int bch2_alloc_read(struct bch_fs *c)
 {
     struct btree_trans *trans = bch2_trans_get(c);
+    struct bch_dev *ca = NULL;
     int ret;
 
     down_read(&c->gc_lock);
 
     if (c->sb.version_upgrade_complete >= bcachefs_metadata_version_bucket_gens) {
         ret = for_each_btree_key(trans, iter, BTREE_ID_bucket_gens, POS_MIN,
-                                 BTREE_ITER_PREFETCH, k, ({
+                                 BTREE_ITER_prefetch, k, ({
             u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset;
             u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset;
 
             if (k.k->type != KEY_TYPE_bucket_gens)
                 continue;
 
-            const struct bch_bucket_gens *g = bkey_s_c_to_bucket_gens(k).v;
-
+            ca = bch2_dev_iterate(c, ca, k.k->p.inode);
             /*
              * Not a fsck error because this is checked/repaired by
              * bch2_check_alloc_key() which runs later:
              */
-            if (!bch2_dev_exists2(c, k.k->p.inode))
+            if (!ca) {
+                bch2_btree_iter_set_pos(&iter, POS(k.k->p.inode + 1, 0));
                 continue;
+            }
 
-            struct bch_dev *ca = bch_dev_bkey_exists(c, k.k->p.inode);
+            const struct bch_bucket_gens *g = bkey_s_c_to_bucket_gens(k).v;
 
             for (u64 b = max_t(u64, ca->mi.first_bucket, start);
                  b < min_t(u64, ca->mi.nbuckets, end);
···
         }));
     } else {
         ret = for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN,
-                                 BTREE_ITER_PREFETCH, k, ({
+                                 BTREE_ITER_prefetch, k, ({
+            ca = bch2_dev_iterate(c, ca, k.k->p.inode);
             /*
              * Not a fsck error because this is checked/repaired by
              * bch2_check_alloc_key() which runs later:
              */
-            if (!bch2_dev_bucket_exists(c, k.k->p))
+            if (!ca) {
+                bch2_btree_iter_set_pos(&iter, POS(k.k->p.inode + 1, 0));
                 continue;
-
-            struct bch_dev *ca = bch_dev_bkey_exists(c, k.k->p.inode);
+            }
 
             struct bch_alloc_v4 a;
             *bucket_gen(ca, k.k->p.offset) = bch2_alloc_to_v4(k, &a)->gen;
···
         }));
     }
 
+    bch2_dev_put(ca);
     bch2_trans_put(trans);
     up_read(&c->gc_lock);
···
 /* Free space/discard btree: */
 
 static int bch2_bucket_do_index(struct btree_trans *trans,
+                                struct bch_dev *ca,
                                 struct bkey_s_c alloc_k,
                                 const struct bch_alloc_v4 *a,
                                 bool set)
 {
     struct bch_fs *c = trans->c;
-    struct bch_dev *ca = bch_dev_bkey_exists(c, alloc_k.k->p.inode);
     struct btree_iter iter;
     struct bkey_s_c old;
     struct bkey_i *k;
···
     old = bch2_bkey_get_iter(trans, &iter, btree,
                              bkey_start_pos(&k->k),
-                             BTREE_ITER_INTENT);
+                             BTREE_ITER_intent);
     ret = bkey_err(old);
     if (ret)
         return ret;
···
         return ret;
 
     k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_bucket_gens, pos,
-                           BTREE_ITER_INTENT|
-                           BTREE_ITER_WITH_UPDATES);
+                           BTREE_ITER_intent|
+                           BTREE_ITER_with_updates);
     ret = bkey_err(k);
     if (ret)
         return ret;
···
 int bch2_trigger_alloc(struct btree_trans *trans,
                        enum btree_id btree, unsigned level,
                        struct bkey_s_c old, struct bkey_s new,
-                       unsigned flags)
+                       enum btree_iter_update_trigger_flags flags)
 {
     struct bch_fs *c = trans->c;
     int ret = 0;
 
-    if (bch2_trans_inconsistent_on(!bch2_dev_bucket_exists(c, new.k->p), trans,
-                                   "alloc key for invalid device or bucket"))
+    struct bch_dev *ca = bch2_dev_bucket_tryget(c, new.k->p);
+    if (!ca)
         return -EIO;
-
-    struct bch_dev *ca = bch_dev_bkey_exists(c, new.k->p.inode);
 
     struct bch_alloc_v4 old_a_convert;
     const struct bch_alloc_v4 *old_a = bch2_alloc_to_v4(old, &old_a_convert);
 
-    if (flags & BTREE_TRIGGER_TRANSACTIONAL) {
+    if (flags & BTREE_TRIGGER_transactional) {
         struct bch_alloc_v4 *new_a = bkey_s_to_alloc_v4(new).v;
 
-        new_a->data_type = alloc_data_type(*new_a, new_a->data_type);
+        alloc_data_type_set(new_a, new_a->data_type);
 
-        if (bch2_bucket_sectors(*new_a) > bch2_bucket_sectors(*old_a)) {
+        if (bch2_bucket_sectors_total(*new_a) > bch2_bucket_sectors_total(*old_a)) {
             new_a->io_time[READ] = max_t(u64, 1, atomic64_read(&c->io_clock[READ].now));
             new_a->io_time[WRITE]= max_t(u64, 1, atomic64_read(&c->io_clock[WRITE].now));
             SET_BCH_ALLOC_V4_NEED_INC_GEN(new_a, true);
···
         if (old_a->data_type != new_a->data_type ||
             (new_a->data_type == BCH_DATA_free &&
              alloc_freespace_genbits(*old_a) != alloc_freespace_genbits(*new_a))) {
-            ret = bch2_bucket_do_index(trans, old, old_a, false) ?:
-                  bch2_bucket_do_index(trans, new.s_c, new_a, true);
+            ret = bch2_bucket_do_index(trans, ca, old, old_a, false) ?:
+                  bch2_bucket_do_index(trans, ca, new.s_c, new_a, true);
             if (ret)
-                return ret;
+                goto err;
         }
 
         if (new_a->data_type == BCH_DATA_cached &&
···
                               bucket_to_u64(new.k->p),
                               old_lru, new_lru);
             if (ret)
-                return ret;
+                goto err;
         }
 
-        new_a->fragmentation_lru = alloc_lru_idx_fragmentation(*new_a,
-                bch_dev_bkey_exists(c, new.k->p.inode));
+        new_a->fragmentation_lru = alloc_lru_idx_fragmentation(*new_a, ca);
         if (old_a->fragmentation_lru != new_a->fragmentation_lru) {
             ret = bch2_lru_change(trans,
                                   BCH_LRU_FRAGMENTATION_START,
                                   bucket_to_u64(new.k->p),
                                   old_a->fragmentation_lru, new_a->fragmentation_lru);
             if (ret)
-                return ret;
+                goto err;
         }
 
         if (old_a->gen != new_a->gen) {
             ret = bch2_bucket_gen_update(trans, new.k->p, new_a->gen);
             if (ret)
-                return ret;
+                goto err;
         }
 
         /*
···
          * not:
          */
 
-        if ((flags & BTREE_TRIGGER_BUCKET_INVALIDATE) &&
+        if ((flags & BTREE_TRIGGER_bucket_invalidate) &&
             old_a->cached_sectors) {
             ret = bch2_update_cached_sectors_list(trans, new.k->p.inode,
                                                   -((s64) old_a->cached_sectors));
             if (ret)
-                return ret;
+                goto err;
         }
     }
 
-    if ((flags & BTREE_TRIGGER_ATOMIC) && (flags & BTREE_TRIGGER_INSERT)) {
+    if ((flags & BTREE_TRIGGER_atomic) && (flags & BTREE_TRIGGER_insert)) {
         struct bch_alloc_v4 *new_a = bkey_s_to_alloc_v4(new).v;
         u64 journal_seq = trans->journal_res.seq;
         u64 bucket_journal_seq = new_a->journal_seq;
 
-        if ((flags & BTREE_TRIGGER_INSERT) &&
+        if ((flags & BTREE_TRIGGER_insert) &&
             data_type_is_empty(old_a->data_type) !=
             data_type_is_empty(new_a->data_type) &&
             new.k->type == KEY_TYPE_alloc_v4) {
···
         if (ret) {
             bch2_fs_fatal_error(c,
                 "setting bucket_needs_journal_commit: %s", bch2_err_str(ret));
-            return ret;
+            goto err;
         }
     }
···
             bch2_do_invalidates(c);
 
         if (statechange(a->data_type == BCH_DATA_need_gc_gens))
-            bch2_do_gc_gens(c);
+            bch2_gc_gens_async(c);
     }
 
-    if ((flags & BTREE_TRIGGER_GC) &&
-        (flags & BTREE_TRIGGER_BUCKET_INVALIDATE)) {
+    if ((flags & BTREE_TRIGGER_gc) &&
+        (flags & BTREE_TRIGGER_bucket_invalidate)) {
         struct bch_alloc_v4 new_a_convert;
         const struct bch_alloc_v4 *new_a = bch2_alloc_to_v4(new.s_c, &new_a_convert);
···
         bucket_unlock(g);
         percpu_up_read(&c->mark_lock);
     }
-
-    return 0;
+err:
+    bch2_dev_put(ca);
+    return ret;
 }
 
 /*
- * This synthesizes deleted extents for holes, similar to BTREE_ITER_SLOTS for
+ * This synthesizes deleted extents for holes, similar to BTREE_ITER_slots for
  * extents style btrees, but works on non-extents btrees:
  */
 static struct bkey_s_c bch2_get_key_or_hole(struct btree_iter *iter, struct bpos end, struct bkey *hole)
···
     }
 }
 
-static bool next_bucket(struct bch_fs *c, struct bpos *bucket)
+static bool next_bucket(struct bch_fs *c, struct bch_dev **ca, struct bpos *bucket)
 {
-    struct bch_dev *ca;
+    if (*ca) {
+        if (bucket->offset < (*ca)->mi.first_bucket)
+            bucket->offset = (*ca)->mi.first_bucket;
 
-    if (bch2_dev_bucket_exists(c, *bucket))
-        return true;
-
-    if (bch2_dev_exists2(c, bucket->inode)) {
-        ca = bch_dev_bkey_exists(c, bucket->inode);
-
-        if (bucket->offset < ca->mi.first_bucket) {
-            bucket->offset = ca->mi.first_bucket;
+        if (bucket->offset < (*ca)->mi.nbuckets)
             return true;
-        }
 
+        bch2_dev_put(*ca);
+        *ca = NULL;
         bucket->inode++;
         bucket->offset = 0;
     }
 
     rcu_read_lock();
-    ca = __bch2_next_dev_idx(c, bucket->inode, NULL);
-    if (ca)
-        *bucket = POS(ca->dev_idx, ca->mi.first_bucket);
+    *ca = __bch2_next_dev_idx(c, bucket->inode, NULL);
+    if (*ca) {
+        *bucket = POS((*ca)->dev_idx, (*ca)->mi.first_bucket);
+        bch2_dev_get(*ca);
+    }
     rcu_read_unlock();
 
-    return ca != NULL;
+    return *ca != NULL;
 }
 
-static struct bkey_s_c bch2_get_key_or_real_bucket_hole(struct btree_iter *iter, struct bkey *hole)
+static struct bkey_s_c bch2_get_key_or_real_bucket_hole(struct btree_iter *iter,
+                                                        struct bch_dev **ca, struct bkey *hole)
 {
     struct bch_fs *c = iter->trans->c;
     struct bkey_s_c k;
···
     if (bkey_err(k))
         return k;
 
-    if (!k.k->type) {
-        struct bpos bucket = bkey_start_pos(k.k);
+    *ca = bch2_dev_iterate_noerror(c, *ca, k.k->p.inode);
 
-        if (!bch2_dev_bucket_exists(c, bucket)) {
-            if (!next_bucket(c, &bucket))
+    if (!k.k->type) {
+        struct bpos hole_start = bkey_start_pos(k.k);
+
+        if (!*ca || !bucket_valid(*ca, hole_start.offset)) {
+            if (!next_bucket(c, ca, &hole_start))
                 return bkey_s_c_null;
 
-            bch2_btree_iter_set_pos(iter, bucket);
+            bch2_btree_iter_set_pos(iter, hole_start);
             goto again;
         }
 
-        if (!bch2_dev_bucket_exists(c, k.k->p)) {
-            struct bch_dev *ca = bch_dev_bkey_exists(c, bucket.inode);
-
-            bch2_key_resize(hole, ca->mi.nbuckets - bucket.offset);
-        }
+        if (k.k->p.offset > (*ca)->mi.nbuckets)
+            bch2_key_resize(hole, (*ca)->mi.nbuckets - hole_start.offset);
     }
 
     return k;
···
                           struct btree_iter *bucket_gens_iter)
 {
     struct bch_fs *c = trans->c;
-    struct bch_dev *ca;
     struct bch_alloc_v4 a_convert;
     const struct bch_alloc_v4 *a;
     unsigned discard_key_type, freespace_key_type;
     unsigned gens_offset;
     struct bkey_s_c k;
     struct printbuf buf = PRINTBUF;
-    int ret;
+    int ret = 0;
 
-    if (fsck_err_on(!bch2_dev_bucket_exists(c, alloc_k.k->p), c,
-                    alloc_key_to_missing_dev_bucket,
+    struct bch_dev *ca = bch2_dev_bucket_tryget_noerror(c, alloc_k.k->p);
+    if (fsck_err_on(!ca,
+                    c, alloc_key_to_missing_dev_bucket,
                     "alloc key for invalid device:bucket %llu:%llu",
                     alloc_k.k->p.inode, alloc_k.k->p.offset))
-        return bch2_btree_delete_at(trans, alloc_iter, 0);
+        ret = bch2_btree_delete_at(trans, alloc_iter, 0);
+    if (!ca)
+        return ret;
 
-    ca = bch_dev_bkey_exists(c, alloc_k.k->p.inode);
     if (!ca->mi.freespace_initialized)
-        return 0;
+        goto out;
 
     a = bch2_alloc_to_v4(alloc_k, &a_convert);
···
         if (ret)
             goto err;
     }
+out:
 err:
 fsck_err:
+    bch2_dev_put(ca);
     printbuf_exit(&buf);
     return ret;
 }
 
 static noinline_for_stack
 int bch2_check_alloc_hole_freespace(struct btree_trans *trans,
+                                    struct bch_dev *ca,
                                     struct bpos start,
                                     struct bpos *end,
                                     struct btree_iter *freespace_iter)
 {
     struct bch_fs *c = trans->c;
-    struct bch_dev *ca;
     struct bkey_s_c k;
     struct printbuf buf = PRINTBUF;
     int ret;
 
-    ca = bch_dev_bkey_exists(c, start.inode);
     if (!ca->mi.freespace_initialized)
         return 0;
···
         goto delete;
 out:
 fsck_err:
-    set_btree_iter_dontneed(&alloc_iter);
+    bch2_set_btree_iter_dontneed(&alloc_iter);
     bch2_trans_iter_exit(trans, &alloc_iter);
     printbuf_exit(&buf);
     return ret;
···
 {
     struct bch_fs *c = trans->c;
     struct bkey_i_bucket_gens g;
-    struct bch_dev *ca;
     u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset;
     u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset;
     u64 b;
-    bool need_update = false, dev_exists;
+    bool need_update = false;
     struct printbuf buf = PRINTBUF;
     int ret = 0;
 
     BUG_ON(k.k->type != KEY_TYPE_bucket_gens);
     bkey_reassemble(&g.k_i, k);
 
-    /* if no bch_dev, skip out whether we repair or not */
-    dev_exists = bch2_dev_exists2(c, k.k->p.inode);
-    if (!dev_exists) {
-        if (fsck_err_on(!dev_exists, c,
-                        bucket_gens_to_invalid_dev,
-                        "bucket_gens key for invalid device:\n  %s",
-                        (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) {
+    struct bch_dev *ca = bch2_dev_tryget_noerror(c, k.k->p.inode);
+    if (!ca) {
+        if (fsck_err(c, bucket_gens_to_invalid_dev,
+                     "bucket_gens key for invalid device:\n  %s",
+                     (bch2_bkey_val_to_text(&buf, c, k), buf.buf)))
             ret = bch2_btree_delete_at(trans, iter, 0);
-        }
         goto out;
     }
 
-    ca = bch_dev_bkey_exists(c, k.k->p.inode);
     if (fsck_err_on(end <= ca->mi.first_bucket ||
                     start >= ca->mi.nbuckets, c,
                     bucket_gens_to_invalid_buckets,
···
     }
 out:
 fsck_err:
+    bch2_dev_put(ca);
     printbuf_exit(&buf);
     return ret;
 }
···
 {
     struct btree_trans *trans = bch2_trans_get(c);
     struct btree_iter iter, discard_iter, freespace_iter, bucket_gens_iter;
+    struct bch_dev *ca = NULL;
     struct bkey hole;
     struct bkey_s_c k;
     int ret = 0;
 
     bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc, POS_MIN,
-                         BTREE_ITER_PREFETCH);
+                         BTREE_ITER_prefetch);
     bch2_trans_iter_init(trans, &discard_iter, BTREE_ID_need_discard, POS_MIN,
-                         BTREE_ITER_PREFETCH);
+                         BTREE_ITER_prefetch);
     bch2_trans_iter_init(trans, &freespace_iter, BTREE_ID_freespace, POS_MIN,
-                         BTREE_ITER_PREFETCH);
+                         BTREE_ITER_prefetch);
     bch2_trans_iter_init(trans, &bucket_gens_iter, BTREE_ID_bucket_gens, POS_MIN,
-                         BTREE_ITER_PREFETCH);
+                         BTREE_ITER_prefetch);
 
     while (1) {
         struct bpos next;
 
         bch2_trans_begin(trans);
 
-        k = bch2_get_key_or_real_bucket_hole(&iter, &hole);
+        k = bch2_get_key_or_real_bucket_hole(&iter, &ca, &hole);
         ret = bkey_err(k);
         if (ret)
             goto bkey_err;
···
         } else {
             next = k.k->p;
 
-            ret = bch2_check_alloc_hole_freespace(trans,
+            ret = bch2_check_alloc_hole_freespace(trans, ca,
                     bkey_start_pos(k.k),
                     &next,
                     &freespace_iter) ?:
···
     bch2_trans_iter_exit(trans, &freespace_iter);
     bch2_trans_iter_exit(trans, &discard_iter);
     bch2_trans_iter_exit(trans, &iter);
+    bch2_dev_put(ca);
+    ca = NULL;
 
     if (ret < 0)
         goto err;
 
     ret = for_each_btree_key(trans, iter,
                              BTREE_ID_need_discard, POS_MIN,
-                             BTREE_ITER_PREFETCH, k,
+                             BTREE_ITER_prefetch, k,
                              bch2_check_discard_freespace_key(trans, &iter));
     if (ret)
         goto err;
 
     bch2_trans_iter_init(trans, &iter, BTREE_ID_freespace, POS_MIN,
-                         BTREE_ITER_PREFETCH);
+                         BTREE_ITER_prefetch);
     while (1) {
         bch2_trans_begin(trans);
         k = bch2_btree_iter_peek(&iter);
···
     ret = for_each_btree_key_commit(trans, iter,
                                     BTREE_ID_bucket_gens, POS_MIN,
-                                    BTREE_ITER_PREFETCH, k,
+                                    BTREE_ITER_prefetch, k,
                                     NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
                                     bch2_check_bucket_gens_key(trans, &iter, k));
 err:
···
     a_mut->v.io_time[READ] = atomic64_read(&c->io_clock[READ].now);
     ret = bch2_trans_update(trans, alloc_iter,
-                            &a_mut->k_i, BTREE_TRIGGER_NORUN);
+                            &a_mut->k_i, BTREE_TRIGGER_norun);
     if (ret)
         goto err;
···
 {
     int ret = bch2_trans_run(c,
         for_each_btree_key_commit(trans, iter, BTREE_ID_alloc,
-                                  POS_MIN, BTREE_ITER_PREFETCH, k,
+                                  POS_MIN, BTREE_ITER_prefetch, k,
                                   NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
                                   bch2_check_alloc_to_lru_ref(trans, &iter)));
     bch_err_fn(c, ret);
···
         bch2_journal_flush_async(&c->journal, NULL);
 
     if (s->ca)
-        percpu_ref_put(&s->ca->ref);
-    if (ca)
-        percpu_ref_get(&ca->ref);
+        percpu_ref_put(&s->ca->io_ref);
     s->ca = ca;
     s->need_journal_commit_this_dev = 0;
 }
···
     struct bpos pos = need_discard_iter->pos;
     struct btree_iter iter = { NULL };
     struct bkey_s_c k;
-    struct bch_dev *ca;
     struct bkey_i_alloc_v4 *a;
     struct printbuf buf = PRINTBUF;
     bool discard_locked = false;
     int ret = 0;
 
-    ca = bch_dev_bkey_exists(c, pos.inode);
-
-    if (!percpu_ref_tryget(&ca->io_ref)) {
+    struct bch_dev *ca = s->ca && s->ca->dev_idx == pos.inode
+        ? s->ca
+        : bch2_dev_get_ioref(c, pos.inode, WRITE);
+    if (!ca) {
         bch2_btree_iter_set_pos(need_discard_iter, POS(pos.inode + 1, 0));
         return 0;
     }
···
     k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_alloc,
                            need_discard_iter->pos,
-                           BTREE_ITER_CACHED);
+                           BTREE_ITER_cached);
     ret = bkey_err(k);
     if (ret)
         goto out;
···
     if (ret)
         goto out;
 
-    if (a->v.dirty_sectors) {
+    if (bch2_bucket_sectors_total(a->v)) {
         if (bch2_trans_inconsistent_on(c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info,
                 trans, "attempting to discard bucket with dirty data\n%s",
                 (bch2_bkey_val_to_text(&buf, c, k), buf.buf)))
···
     }
 
     SET_BCH_ALLOC_V4_NEED_DISCARD(&a->v, false);
-    a->v.data_type = alloc_data_type(a->v, a->v.data_type);
+    alloc_data_type_set(&a->v, a->v.data_type);
 write:
     ret = bch2_trans_update(trans, &iter, &a->k_i, 0) ?:
           bch2_trans_commit(trans, NULL, NULL,
···
     discard_in_flight_remove(c, iter.pos);
     s->seen++;
     bch2_trans_iter_exit(trans, &iter);
-    percpu_ref_put(&ca->io_ref);
     printbuf_exit(&buf);
     return ret;
 }
···
 static int bch2_clear_bucket_needs_discard(struct btree_trans *trans, struct bpos bucket)
 {
     struct btree_iter iter;
-    bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc, bucket, BTREE_ITER_INTENT);
+    bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc, bucket, BTREE_ITER_intent);
     struct bkey_s_c k = bch2_btree_iter_peek_slot(&iter);
     int ret = bkey_err(k);
     if (ret)
···
     BUG_ON(a->v.dirty_sectors);
     SET_BCH_ALLOC_V4_NEED_DISCARD(&a->v, false);
-    a->v.data_type = alloc_data_type(a->v, a->v.data_type);
+    alloc_data_type_set(&a->v, a->v.data_type);
 
     ret = bch2_trans_update(trans, &iter, &a->k_i, 0);
 err:
···
         if (i->snapshot)
             continue;
 
-        ca = bch_dev_bkey_exists(c, i->inode);
-
-        if (!percpu_ref_tryget(&ca->io_ref)) {
+        ca = bch2_dev_get_ioref(c, i->inode, WRITE);
+        if (!ca) {
             darray_remove_item(&c->discard_buckets_in_flight, i);
             continue;
         }
···
 static void bch2_discard_one_bucket_fast(struct bch_fs *c, struct bpos bucket)
 {
-    struct bch_dev *ca = bch_dev_bkey_exists(c, bucket.inode);
+    rcu_read_lock();
+    struct bch_dev *ca = bch2_dev_rcu(c, bucket.inode);
+    bool dead = !ca || percpu_ref_is_dying(&ca->io_ref);
+    rcu_read_unlock();
 
-    if (!percpu_ref_is_dying(&ca->io_ref) &&
+    if (!dead &&
         !discard_in_flight_add(c, bucket) &&
         bch2_write_ref_tryget(c, BCH_WRITE_REF_discard_fast) &&
         !queue_work(c->write_ref_wq, &c->discard_fast_work))
···
                                s64 *nr_to_invalidate)
 {
     struct bch_fs *c = trans->c;
-    struct btree_iter alloc_iter = { NULL };
     struct bkey_i_alloc_v4 *a = NULL;
     struct printbuf buf = PRINTBUF;
     struct bpos bucket = u64_to_bucket(lru_k.k->p.offset);
···
     if (bch2_bucket_is_open_safe(c, bucket.inode, bucket.offset))
         return 0;
 
-    a = bch2_trans_start_alloc_update(trans, &alloc_iter, bucket);
+    a = bch2_trans_start_alloc_update(trans, bucket);
     ret = PTR_ERR_OR_ZERO(a);
     if (ret)
         goto out;
···
     a->v.io_time[READ] = atomic64_read(&c->io_clock[READ].now);
     a->v.io_time[WRITE] = atomic64_read(&c->io_clock[WRITE].now);
 
-    ret = bch2_trans_update(trans, &alloc_iter, &a->k_i,
-                            BTREE_TRIGGER_BUCKET_INVALIDATE) ?:
-          bch2_trans_commit(trans, NULL, NULL,
-                            BCH_WATERMARK_btree|
-                            BCH_TRANS_COMMIT_no_enospc);
+    ret = bch2_trans_commit(trans, NULL, NULL,
+                            BCH_WATERMARK_btree|
+                            BCH_TRANS_COMMIT_no_enospc);
     if (ret)
         goto out;
 
     trace_and_count(c, bucket_invalidate, c, bucket.inode, bucket.offset, cached_sectors);
     --*nr_to_invalidate;
 out:
-    bch2_trans_iter_exit(trans, &alloc_iter);
     printbuf_exit(&buf);
     return ret;
 err:
···
         ret = for_each_btree_key_upto(trans, iter, BTREE_ID_lru,
                 lru_pos(ca->dev_idx, 0, 0),
                 lru_pos(ca->dev_idx, U64_MAX, LRU_TIME_MAX),
-                BTREE_ITER_INTENT, k,
+                BTREE_ITER_intent, k,
                 invalidate_one_bucket(trans, &iter, k, &nr_to_invalidate));
 
         if (ret < 0) {
-            percpu_ref_put(&ca->ref);
+            bch2_dev_put(ca);
             break;
         }
     }
···
     bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc,
             POS(ca->dev_idx, max_t(u64, ca->mi.first_bucket, bucket_start)),
-            BTREE_ITER_PREFETCH);
+            BTREE_ITER_prefetch);
     /*
      * Scan the alloc btree for every bucket on @ca, and add buckets to the
      * freespace/need_discard/need_gc_gens btrees as needed:
···
         struct bch_alloc_v4 a_convert;
         const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &a_convert);
 
-        ret = bch2_bucket_do_index(trans, k, a, true) ?:
+        ret = bch2_bucket_do_index(trans, ca, k, a, true) ?:
               bch2_trans_commit(trans, NULL, NULL,
                                 BCH_TRANS_COMMIT_no_enospc);
         if (ret)
···
     ret = bch2_dev_freespace_init(c, ca, 0, ca->mi.nbuckets);
     if (ret) {
-        percpu_ref_put(&ca->ref);
+        bch2_dev_put(ca);
         bch_err_fn(c, ret);
         return ret;
     }
···
     u64 now;
     int ret = 0;
 
-    a = bch2_trans_start_alloc_update(trans, &iter, POS(dev, bucket_nr));
+    if (bch2_trans_relock(trans))
+        bch2_trans_begin(trans);
+
+    a = bch2_trans_start_alloc_update_noupdate(trans, &iter, POS(dev, bucket_nr));
     ret = PTR_ERR_OR_ZERO(a);
     if (ret)
         return ret;
+71 -38
fs/bcachefs/alloc_background.h
···
 #include "debug.h"
 #include "super.h"
 
-enum bkey_invalid_flags;
+enum bch_validate_flags;
 
 /* How out of date a pointer gen is allowed to be: */
 #define BUCKET_GC_GEN_MAX 96U
 
 static inline bool bch2_dev_bucket_exists(struct bch_fs *c, struct bpos pos)
 {
-    struct bch_dev *ca;
-
-    if (!bch2_dev_exists2(c, pos.inode))
-        return false;
-
-    ca = bch_dev_bkey_exists(c, pos.inode);
-    return pos.offset >= ca->mi.first_bucket &&
-           pos.offset < ca->mi.nbuckets;
+    rcu_read_lock();
+    struct bch_dev *ca = bch2_dev_rcu(c, pos.inode);
+    bool ret = ca && bucket_valid(ca, pos.offset);
+    rcu_read_unlock();
+    return ret;
 }
 
 static inline u64 bucket_to_u64(struct bpos bucket)
···
     return a.gen - a.oldest_gen;
 }
 
-static inline enum bch_data_type __alloc_data_type(u32 dirty_sectors,
-                                                   u32 cached_sectors,
-                                                   u32 stripe,
-                                                   struct bch_alloc_v4 a,
-                                                   enum bch_data_type data_type)
+static inline void alloc_to_bucket(struct bucket *dst, struct bch_alloc_v4 src)
 {
-    if (stripe)
-        return data_type == BCH_DATA_parity ? data_type : BCH_DATA_stripe;
-    if (dirty_sectors)
-        return data_type;
-    if (cached_sectors)
-        return BCH_DATA_cached;
-    if (BCH_ALLOC_V4_NEED_DISCARD(&a))
-        return BCH_DATA_need_discard;
-    if (alloc_gc_gen(a) >= BUCKET_GC_GEN_MAX)
-        return BCH_DATA_need_gc_gens;
-    return BCH_DATA_free;
+    dst->gen = src.gen;
+    dst->data_type = src.data_type;
+    dst->dirty_sectors = src.dirty_sectors;
+    dst->cached_sectors = src.cached_sectors;
+    dst->stripe = src.stripe;
 }
 
-static inline enum bch_data_type alloc_data_type(struct bch_alloc_v4 a,
-                                                 enum bch_data_type data_type)
+static inline void __bucket_m_to_alloc(struct bch_alloc_v4 *dst, struct bucket src)
 {
-    return __alloc_data_type(a.dirty_sectors, a.cached_sectors,
-                             a.stripe, a, data_type);
+    dst->gen = src.gen;
+    dst->data_type = src.data_type;
+    dst->dirty_sectors = src.dirty_sectors;
+    dst->cached_sectors = src.cached_sectors;
+    dst->stripe = src.stripe;
+}
+
+static inline struct bch_alloc_v4 bucket_m_to_alloc(struct bucket b)
+{
+    struct bch_alloc_v4 ret = {};
+    __bucket_m_to_alloc(&ret, b);
+    return ret;
 }
 
 static inline enum bch_data_type bucket_data_type(enum bch_data_type data_type)
 {
-    return data_type == BCH_DATA_stripe ? BCH_DATA_user : data_type;
+    switch (data_type) {
+    case BCH_DATA_cached:
+    case BCH_DATA_stripe:
+        return BCH_DATA_user;
+    default:
+        return data_type;
+    }
 }
 
-static inline unsigned bch2_bucket_sectors(struct bch_alloc_v4 a)
+static inline bool bucket_data_type_mismatch(enum bch_data_type bucket,
+                                             enum bch_data_type ptr)
+{
+    return !data_type_is_empty(bucket) &&
+           bucket_data_type(bucket) != bucket_data_type(ptr);
+}
+
+static inline unsigned bch2_bucket_sectors_total(struct bch_alloc_v4 a)
 {
     return a.dirty_sectors + a.cached_sectors;
 }
···
     int d = bch2_bucket_sectors_dirty(a);
 
     return d ? max(0, ca->mi.bucket_size - d) : 0;
+}
+
+static inline enum bch_data_type alloc_data_type(struct bch_alloc_v4 a,
+                                                 enum bch_data_type data_type)
+{
+    if (a.stripe)
+        return data_type == BCH_DATA_parity ? data_type : BCH_DATA_stripe;
+    if (a.dirty_sectors)
+        return data_type;
+    if (a.cached_sectors)
+        return BCH_DATA_cached;
+    if (BCH_ALLOC_V4_NEED_DISCARD(&a))
+        return BCH_DATA_need_discard;
+    if (alloc_gc_gen(a) >= BUCKET_GC_GEN_MAX)
+        return BCH_DATA_need_gc_gens;
+    return BCH_DATA_free;
+}
+
+static inline void alloc_data_type_set(struct bch_alloc_v4 *a, enum bch_data_type data_type)
+{
+    a->data_type = alloc_data_type(*a, data_type);
 }
 
 static inline u64 alloc_lru_idx_read(struct bch_alloc_v4 a)
···
 }
 
 struct bkey_i_alloc_v4 *
-bch2_trans_start_alloc_update(struct btree_trans *, struct btree_iter *, struct bpos);
+bch2_trans_start_alloc_update_noupdate(struct btree_trans *, struct btree_iter *, struct bpos);
+struct bkey_i_alloc_v4 *
+bch2_trans_start_alloc_update(struct btree_trans *, struct bpos);
 
 void __bch2_alloc_to_v4(struct bkey_s_c, struct bch_alloc_v4 *);
···
 int bch2_bucket_io_time_reset(struct btree_trans *, unsigned, size_t, int);
 
 int bch2_alloc_v1_invalid(struct bch_fs *, struct bkey_s_c,
-                          enum bkey_invalid_flags, struct printbuf *);
+                          enum bch_validate_flags, struct printbuf *);
 int bch2_alloc_v2_invalid(struct bch_fs *, struct bkey_s_c,
-                          enum bkey_invalid_flags, struct printbuf *);
+                          enum bch_validate_flags, struct printbuf *);
 int bch2_alloc_v3_invalid(struct bch_fs *, struct bkey_s_c,
-                          enum bkey_invalid_flags, struct printbuf *);
+                          enum bch_validate_flags, struct printbuf *);
 int bch2_alloc_v4_invalid(struct bch_fs *, struct bkey_s_c,
-                          enum bkey_invalid_flags, struct printbuf *);
+                          enum bch_validate_flags, struct printbuf *);
 void bch2_alloc_v4_swab(struct bkey_s);
 void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
···
 })
 
 int bch2_bucket_gens_invalid(struct bch_fs *, struct bkey_s_c,
-                             enum bkey_invalid_flags, struct printbuf *);
+                             enum bch_validate_flags, struct printbuf *);
 void bch2_bucket_gens_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_bucket_gens ((struct bkey_ops) { \
···
 int bch2_alloc_read(struct bch_fs *);
 
 int bch2_trigger_alloc(struct btree_trans *, enum btree_id, unsigned,
-                       struct bkey_s_c, struct bkey_s, unsigned);
+                       struct bkey_s_c, struct bkey_s,
+                       enum btree_iter_update_trigger_flags);
 int bch2_check_alloc_info(struct bch_fs *);
 int bch2_check_alloc_to_lru_refs(struct bch_fs *);
 void bch2_do_discards(struct bch_fs *);
+238 -66
fs/bcachefs/alloc_foreground.c
···
 {
     rcu_read_lock();
     for_each_member_device_rcu(c, ca, NULL)
-        ca->alloc_cursor = 0;
+        memset(ca->alloc_cursor, 0, sizeof(ca->alloc_cursor));
     rcu_read_unlock();
 }
···
 void __bch2_open_bucket_put(struct bch_fs *c, struct open_bucket *ob)
 {
-    struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev);
+    struct bch_dev *ca = ob_dev(c, ob);
 
     if (ob->ec) {
         ec_stripe_new_put(c, ob->ec, STRIPE_REF_io);
···
     k = bch2_bkey_get_iter(trans, &iter,
             BTREE_ID_alloc, POS(ca->dev_idx, b),
-            BTREE_ITER_CACHED);
+            BTREE_ITER_cached);
     ret = bkey_err(k);
     if (ret) {
         ob = ERR_PTR(ret);
···
     struct bch_backpointer bp;
     struct bpos bp_pos = POS_MIN;
 
-    ret = bch2_get_next_backpointer(trans, POS(ca->dev_idx, b), -1,
+    ret = bch2_get_next_backpointer(trans, ca, POS(ca->dev_idx, b), -1,
             &bp_pos, &bp,
-            BTREE_ITER_NOPRESERVE);
+            BTREE_ITER_nopreserve);
     if (ret) {
         ob = ERR_PTR(ret);
         goto err;
···
     ob = __try_alloc_bucket(c, ca, b, watermark, a, s, cl);
     if (!ob)
-        set_btree_iter_dontneed(&iter);
+        bch2_set_btree_iter_dontneed(&iter);
 err:
     if (iter.path)
-        set_btree_iter_dontneed(&iter);
+        bch2_set_btree_iter_dontneed(&iter);
     bch2_trans_iter_exit(trans, &iter);
     printbuf_exit(&buf);
     return ob;
···
     struct bkey_s_c k, ck;
     struct open_bucket *ob = NULL;
     u64 first_bucket = max_t(u64, ca->mi.first_bucket, ca->new_fs_bucket_idx);
-    u64 alloc_start = max(first_bucket, READ_ONCE(ca->alloc_cursor));
+    u64 *dev_alloc_cursor = &ca->alloc_cursor[s->btree_bitmap];
+    u64 alloc_start = max(first_bucket, *dev_alloc_cursor);
     u64 alloc_cursor = alloc_start;
     int ret;
···
  */
 again:
     for_each_btree_key_norestart(trans, iter, BTREE_ID_alloc, POS(ca->dev_idx, alloc_cursor),
-                                 BTREE_ITER_SLOTS, k, ret) {
-        struct bch_alloc_v4 a_convert;
-        const struct bch_alloc_v4 *a;
+                                 BTREE_ITER_slots, k, ret) {
+        u64 bucket = k.k->p.offset;
 
         if (bkey_ge(k.k->p, POS(ca->dev_idx, ca->mi.nbuckets)))
             break;
···
             is_superblock_bucket(ca, k.k->p.offset))
             continue;
 
-        a = bch2_alloc_to_v4(k, &a_convert);
+        if (s->btree_bitmap != BTREE_BITMAP_ANY &&
+            s->btree_bitmap != bch2_dev_btree_bitmap_marked_sectors(ca,
+                    bucket_to_sector(ca, bucket), ca->mi.bucket_size)) {
+            if (s->btree_bitmap == BTREE_BITMAP_YES &&
+                bucket_to_sector(ca, bucket) > 64ULL << ca->mi.btree_bitmap_shift)
+                break;
+
+            bucket = sector_to_bucket(ca,
+                    round_up(bucket_to_sector(ca, bucket) + 1,
+                             1ULL << ca->mi.btree_bitmap_shift));
+            bch2_btree_iter_set_pos(&iter, POS(ca->dev_idx, bucket));
+            s->buckets_seen++;
+            s->skipped_mi_btree_bitmap++;
+            continue;
+        }
+
+        struct bch_alloc_v4 a_convert;
+        const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &a_convert);
         if (a->data_type != BCH_DATA_free)
             continue;
 
         /* now check the cached key to serialize concurrent allocs of the bucket */
-        ck = bch2_bkey_get_iter(trans, &citer, BTREE_ID_alloc, k.k->p, BTREE_ITER_CACHED);
+        ck = bch2_bkey_get_iter(trans, &citer, BTREE_ID_alloc, k.k->p, BTREE_ITER_cached);
         ret = bkey_err(ck);
         if (ret)
             break;
···
 
         ob = __try_alloc_bucket(trans->c, ca, k.k->p.offset, watermark, a, s, cl);
 next:
-        set_btree_iter_dontneed(&citer);
+        bch2_set_btree_iter_dontneed(&citer);
         bch2_trans_iter_exit(trans, &citer);
         if (ob)
             break;
···
     bch2_trans_iter_exit(trans, &iter);
 
     alloc_cursor = iter.pos.offset;
-    ca->alloc_cursor = alloc_cursor;
 
     if (!ob && ret)
         ob = ERR_PTR(ret);
···
         alloc_cursor = alloc_start = first_bucket;
         goto again;
     }
+
+    *dev_alloc_cursor = alloc_cursor;
 
     return ob;
 }
···
     struct btree_iter iter;
     struct bkey_s_c k;
     struct open_bucket *ob = NULL;
-    u64 alloc_start = max_t(u64, ca->mi.first_bucket, READ_ONCE(ca->alloc_cursor));
+    u64 *dev_alloc_cursor = &ca->alloc_cursor[s->btree_bitmap];
+    u64 alloc_start = max_t(u64, ca->mi.first_bucket, READ_ONCE(*dev_alloc_cursor));
     u64 alloc_cursor = alloc_start;
     int ret;
···
 
         s->buckets_seen++;
 
+        u64 bucket = alloc_cursor & ~(~0ULL << 56);
+        if (s->btree_bitmap != BTREE_BITMAP_ANY &&
+            s->btree_bitmap != bch2_dev_btree_bitmap_marked_sectors(ca,
+                    bucket_to_sector(ca, bucket), ca->mi.bucket_size)) {
+            if (s->btree_bitmap == BTREE_BITMAP_YES &&
+                bucket_to_sector(ca, bucket) > 64ULL << ca->mi.btree_bitmap_shift)
+                goto fail;
+
+            bucket = sector_to_bucket(ca,
+                    round_up(bucket_to_sector(ca, bucket) + 1,
+                             1ULL << ca->mi.btree_bitmap_shift));
+            u64 genbits = alloc_cursor >> 56;
+            alloc_cursor = bucket | (genbits << 56);
+
+            if (alloc_cursor > k.k->p.offset)
+                bch2_btree_iter_set_pos(&iter, POS(ca->dev_idx, alloc_cursor));
+            s->skipped_mi_btree_bitmap++;
+            continue;
+        }
+
         ob = try_alloc_bucket(trans, ca, watermark,
                               alloc_cursor, s, k, cl);
         if (ob) {
-            set_btree_iter_dontneed(&iter);
+            bch2_set_btree_iter_dontneed(&iter);
             break;
         }
     }
···
         if (ob || ret)
             break;
     }
+fail:
     bch2_trans_iter_exit(trans, &iter);
-
-    ca->alloc_cursor = alloc_cursor;
 
     if (!ob && ret)
         ob = ERR_PTR(ret);
···
         goto again;
     }
 
+    *dev_alloc_cursor = alloc_cursor;
+
     return ob;
+}
+
+static noinline void trace_bucket_alloc2(struct bch_fs *c, struct bch_dev *ca,
+                                         enum bch_watermark watermark,
+                                         enum bch_data_type data_type,
+                                         struct closure *cl,
+                                         struct bch_dev_usage *usage,
+                                         struct bucket_alloc_state *s,
+                                         struct open_bucket *ob)
+{
+    struct printbuf buf = PRINTBUF;
+
+    printbuf_tabstop_push(&buf, 24);
+
+    prt_printf(&buf, "dev\t%s (%u)\n", ca->name, ca->dev_idx);
+    prt_printf(&buf, "watermark\t%s\n", bch2_watermarks[watermark]);
+    prt_printf(&buf, "data type\t%s\n", __bch2_data_types[data_type]);
+    prt_printf(&buf, "blocking\t%u\n", cl != NULL);
+    prt_printf(&buf, "free\t%llu\n", usage->d[BCH_DATA_free].buckets);
+    prt_printf(&buf, "avail\t%llu\n", dev_buckets_free(ca, *usage, watermark));
+    prt_printf(&buf, "copygc_wait\t%lu/%lli\n",
+               bch2_copygc_wait_amount(c),
+               c->copygc_wait - atomic64_read(&c->io_clock[WRITE].now));
+    prt_printf(&buf, "seen\t%llu\n", s->buckets_seen);
+    prt_printf(&buf, "open\t%llu\n", s->skipped_open);
+    prt_printf(&buf, "need journal commit\t%llu\n", s->skipped_need_journal_commit);
+    prt_printf(&buf, "nocow\t%llu\n", s->skipped_nocow);
+    prt_printf(&buf, "nouse\t%llu\n", s->skipped_nouse);
+    prt_printf(&buf, "mi_btree_bitmap\t%llu\n", s->skipped_mi_btree_bitmap);
+
+    if (!IS_ERR(ob)) {
+        prt_printf(&buf, "allocated\t%llu\n", ob->bucket);
+        trace_bucket_alloc(c, buf.buf);
+    } else {
+        prt_printf(&buf, "err\t%s\n", bch2_err_str(PTR_ERR(ob)));
+        trace_bucket_alloc_fail(c, buf.buf);
+    }
+
+    printbuf_exit(&buf);
 }
 
 /**
···
  * @trans: transaction object
  * @ca: device to allocate from
  * @watermark: how important is this allocation?
+ * @data_type: BCH_DATA_journal, btree, user...
  * @cl: if not NULL, closure to be used to wait if buckets not available
  * @usage: for secondarily also returning the current device usage
  *
···
 static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
                                                    struct bch_dev *ca,
                                                    enum bch_watermark watermark,
+                                                   enum bch_data_type data_type,
                                                    struct closure *cl,
                                                    struct bch_dev_usage *usage)
 {
···
     struct open_bucket *ob = NULL;
     bool freespace = READ_ONCE(ca->mi.freespace_initialized);
     u64 avail;
-    struct bucket_alloc_state s = { 0 };
+    struct bucket_alloc_state s = {
+        .btree_bitmap = data_type == BCH_DATA_btree,
+    };
     bool waiting = false;
 again:
     bch2_dev_usage_read_fast(ca, usage);
···
         bch2_do_discards(c);
 
     if (usage->d[BCH_DATA_need_gc_gens].buckets > avail)
-        bch2_do_gc_gens(c);
+        bch2_gc_gens_async(c);
 
     if (should_invalidate_buckets(ca, *usage))
         bch2_do_invalidates(c);
···
     if (s.skipped_need_journal_commit * 2 > avail)
         bch2_journal_flush_async(&c->journal, NULL);
 
+    if (!ob && s.btree_bitmap != BTREE_BITMAP_ANY) {
+        s.btree_bitmap = BTREE_BITMAP_ANY;
+        goto alloc;
+    }
+
     if (!ob && freespace && c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_alloc_info) {
         freespace = false;
         goto alloc;
···
         ob = ERR_PTR(-BCH_ERR_no_buckets_found);
 
     if (!IS_ERR(ob))
-        trace_and_count(c, bucket_alloc, ca,
-                        bch2_watermarks[watermark],
-                        ob->bucket,
-                        usage->d[BCH_DATA_free].buckets,
-                        avail,
-                        bch2_copygc_wait_amount(c),
-                        c->copygc_wait - atomic64_read(&c->io_clock[WRITE].now),
-                        &s,
-                        cl == NULL,
-                        "");
+        ob->data_type = data_type;
+
+    if (!IS_ERR(ob))
+        count_event(c, bucket_alloc);
     else if (!bch2_err_matches(PTR_ERR(ob), BCH_ERR_transaction_restart))
-        trace_and_count(c, bucket_alloc_fail, ca,
-                        bch2_watermarks[watermark],
-                        0,
-                        usage->d[BCH_DATA_free].buckets,
-                        avail,
-                        bch2_copygc_wait_amount(c),
-                        c->copygc_wait - atomic64_read(&c->io_clock[WRITE].now),
-                        &s,
-                        cl == NULL,
-                        bch2_err_str(PTR_ERR(ob)));
+        count_event(c, bucket_alloc_fail);
+
+    if (!IS_ERR(ob)
+        ?
trace_bucket_alloc_enabled() 590 + : trace_bucket_alloc_fail_enabled()) 591 + trace_bucket_alloc2(c, ca, watermark, data_type, cl, usage, &s, ob); 690 592 691 593 return ob; 692 594 } 693 595 694 596 struct open_bucket *bch2_bucket_alloc(struct bch_fs *c, struct bch_dev *ca, 695 597 enum bch_watermark watermark, 598 + enum bch_data_type data_type, 696 599 struct closure *cl) 697 600 { 698 601 struct bch_dev_usage usage; ··· 691 612 692 613 bch2_trans_do(c, NULL, NULL, 0, 693 614 PTR_ERR_OR_ZERO(ob = bch2_bucket_alloc_trans(trans, ca, watermark, 694 - cl, &usage))); 615 + data_type, cl, &usage))); 695 616 return ob; 696 617 } 697 618 ··· 757 678 unsigned flags, 758 679 struct open_bucket *ob) 759 680 { 760 - unsigned durability = 761 - bch_dev_bkey_exists(c, ob->dev)->mi.durability; 681 + unsigned durability = ob_dev(c, ob)->mi.durability; 762 682 763 683 BUG_ON(*nr_effective >= nr_replicas); 764 684 ··· 789 711 struct bch_fs *c = trans->c; 790 712 struct dev_alloc_list devs_sorted = 791 713 bch2_dev_alloc_list(c, stripe, devs_may_alloc); 792 - unsigned dev; 793 - struct bch_dev *ca; 794 714 int ret = -BCH_ERR_insufficient_devices; 795 - unsigned i; 796 715 797 716 BUG_ON(*nr_effective >= nr_replicas); 798 717 799 - for (i = 0; i < devs_sorted.nr; i++) { 718 + for (unsigned i = 0; i < devs_sorted.nr; i++) { 800 719 struct bch_dev_usage usage; 801 720 struct open_bucket *ob; 802 721 803 - dev = devs_sorted.devs[i]; 804 - 805 - rcu_read_lock(); 806 - ca = rcu_dereference(c->devs[dev]); 807 - if (ca) 808 - percpu_ref_get(&ca->ref); 809 - rcu_read_unlock(); 810 - 722 + unsigned dev = devs_sorted.devs[i]; 723 + struct bch_dev *ca = bch2_dev_tryget_noerror(c, dev); 811 724 if (!ca) 812 725 continue; 813 726 814 727 if (!ca->mi.durability && *have_cache) { 815 - percpu_ref_put(&ca->ref); 728 + bch2_dev_put(ca); 816 729 continue; 817 730 } 818 731 819 - ob = bch2_bucket_alloc_trans(trans, ca, watermark, cl, &usage); 732 + ob = bch2_bucket_alloc_trans(trans, ca, watermark, data_type, cl, &usage); 820 733 if (!IS_ERR(ob)) 821 734 bch2_dev_stripe_increment_inlined(ca, stripe, &usage); 822 - percpu_ref_put(&ca->ref); 735 + bch2_dev_put(ca); 823 736 824 737 if (IS_ERR(ob)) { 825 738 ret = PTR_ERR(ob); ··· 818 749 break; 819 750 continue; 820 751 } 821 - 822 - ob->data_type = data_type; 823 752 824 753 if (add_new_bucket(c, ptrs, devs_may_alloc, 825 754 nr_replicas, nr_effective, ··· 903 836 bool *have_cache, bool ec, 904 837 struct open_bucket *ob) 905 838 { 906 - struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev); 839 + struct bch_dev *ca = ob_dev(c, ob); 907 840 908 841 if (!test_bit(ob->dev, devs_may_alloc->d)) 909 842 return false; ··· 973 906 struct open_bucket *ob = c->open_buckets + c->open_buckets_partial[i]; 974 907 975 908 if (want_bucket(c, wp, devs_may_alloc, have_cache, ec, ob)) { 976 - struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev); 909 + struct bch_dev *ca = ob_dev(c, ob); 977 910 struct bch_dev_usage usage; 978 911 u64 avail; 979 912 ··· 1358 1291 unsigned i; 1359 1292 1360 1293 open_bucket_for_each(c, ptrs, ob, i) { 1361 - unsigned d = bch_dev_bkey_exists(c, ob->dev)->mi.durability; 1294 + unsigned d = ob_dev(c, ob)->mi.durability; 1362 1295 1363 1296 if (d && d <= extra_replicas) { 1364 1297 extra_replicas -= d; ··· 1408 1341 have_cache = false; 1409 1342 1410 1343 *wp_ret = wp = writepoint_find(trans, write_point.v); 1344 + 1345 + ret = bch2_trans_relock(trans); 1346 + if (ret) 1347 + goto err; 1411 1348 1412 1349 /* metadata may not allocate on cache devices: */ 1413 1350 if 
(wp->data_type != BCH_DATA_user) ··· 1515 1444 1516 1445 struct bch_extent_ptr bch2_ob_ptr(struct bch_fs *c, struct open_bucket *ob) 1517 1446 { 1518 - struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev); 1447 + struct bch_dev *ca = ob_dev(c, ob); 1519 1448 1520 1449 return (struct bch_extent_ptr) { 1521 1450 .type = 1 << BCH_EXTENT_ENTRY_ptr, ··· 1591 1520 1592 1521 static void bch2_open_bucket_to_text(struct printbuf *out, struct bch_fs *c, struct open_bucket *ob) 1593 1522 { 1594 - struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev); 1523 + struct bch_dev *ca = ob_dev(c, ob); 1595 1524 unsigned data_type = ob->data_type; 1596 1525 barrier(); /* READ_ONCE() doesn't work on bitfields */ 1597 1526 ··· 1692 1621 1693 1622 prt_str(out, "Btree write point\n"); 1694 1623 bch2_write_point_to_text(out, c, &c->btree_write_point); 1624 + } 1625 + 1626 + void bch2_fs_alloc_debug_to_text(struct printbuf *out, struct bch_fs *c) 1627 + { 1628 + unsigned nr[BCH_DATA_NR]; 1629 + 1630 + memset(nr, 0, sizeof(nr)); 1631 + 1632 + for (unsigned i = 0; i < ARRAY_SIZE(c->open_buckets); i++) 1633 + nr[c->open_buckets[i].data_type]++; 1634 + 1635 + printbuf_tabstop_push(out, 24); 1636 + 1637 + percpu_down_read(&c->mark_lock); 1638 + prt_printf(out, "hidden\t%llu\n", bch2_fs_usage_read_one(c, &c->usage_base->b.hidden)); 1639 + prt_printf(out, "btree\t%llu\n", bch2_fs_usage_read_one(c, &c->usage_base->b.btree)); 1640 + prt_printf(out, "data\t%llu\n", bch2_fs_usage_read_one(c, &c->usage_base->b.data)); 1641 + prt_printf(out, "cached\t%llu\n", bch2_fs_usage_read_one(c, &c->usage_base->b.cached)); 1642 + prt_printf(out, "reserved\t%llu\n", bch2_fs_usage_read_one(c, &c->usage_base->b.reserved)); 1643 + prt_printf(out, "online_reserved\t%llu\n", percpu_u64_get(c->online_reserved)); 1644 + prt_printf(out, "nr_inodes\t%llu\n", bch2_fs_usage_read_one(c, &c->usage_base->b.nr_inodes)); 1645 + percpu_up_read(&c->mark_lock); 1646 + 1647 + prt_newline(out); 1648 + prt_printf(out, "freelist_wait\t%s\n", c->freelist_wait.list.first ? "waiting" : "empty"); 1649 + prt_printf(out, "open buckets allocated\t%i\n", OPEN_BUCKETS_COUNT - c->open_buckets_nr_free); 1650 + prt_printf(out, "open buckets total\t%u\n", OPEN_BUCKETS_COUNT); 1651 + prt_printf(out, "open_buckets_wait\t%s\n", c->open_buckets_wait.list.first ? 
"waiting" : "empty"); 1652 + prt_printf(out, "open_buckets_btree\t%u\n", nr[BCH_DATA_btree]); 1653 + prt_printf(out, "open_buckets_user\t%u\n", nr[BCH_DATA_user]); 1654 + prt_printf(out, "btree reserve cache\t%u\n", c->btree_reserve_cache_nr); 1655 + } 1656 + 1657 + void bch2_dev_alloc_debug_to_text(struct printbuf *out, struct bch_dev *ca) 1658 + { 1659 + struct bch_fs *c = ca->fs; 1660 + struct bch_dev_usage stats = bch2_dev_usage_read(ca); 1661 + unsigned nr[BCH_DATA_NR]; 1662 + 1663 + memset(nr, 0, sizeof(nr)); 1664 + 1665 + for (unsigned i = 0; i < ARRAY_SIZE(c->open_buckets); i++) 1666 + nr[c->open_buckets[i].data_type]++; 1667 + 1668 + printbuf_tabstop_push(out, 12); 1669 + printbuf_tabstop_push(out, 16); 1670 + printbuf_tabstop_push(out, 16); 1671 + printbuf_tabstop_push(out, 16); 1672 + printbuf_tabstop_push(out, 16); 1673 + 1674 + bch2_dev_usage_to_text(out, &stats); 1675 + 1676 + prt_newline(out); 1677 + 1678 + prt_printf(out, "reserves:\n"); 1679 + for (unsigned i = 0; i < BCH_WATERMARK_NR; i++) 1680 + prt_printf(out, "%s\t%llu\r\n", bch2_watermarks[i], bch2_dev_buckets_reserved(ca, i)); 1681 + 1682 + prt_newline(out); 1683 + 1684 + printbuf_tabstops_reset(out); 1685 + printbuf_tabstop_push(out, 12); 1686 + printbuf_tabstop_push(out, 16); 1687 + 1688 + prt_printf(out, "open buckets\t%i\r\n", ca->nr_open_buckets); 1689 + prt_printf(out, "buckets to invalidate\t%llu\r\n", should_invalidate_buckets(ca, stats)); 1690 + } 1691 + 1692 + void bch2_print_allocator_stuck(struct bch_fs *c) 1693 + { 1694 + struct printbuf buf = PRINTBUF; 1695 + 1696 + prt_printf(&buf, "Allocator stuck? Waited for 10 seconds\n"); 1697 + 1698 + prt_printf(&buf, "Allocator debug:\n"); 1699 + printbuf_indent_add(&buf, 2); 1700 + bch2_fs_alloc_debug_to_text(&buf, c); 1701 + printbuf_indent_sub(&buf, 2); 1702 + prt_newline(&buf); 1703 + 1704 + for_each_online_member(c, ca) { 1705 + prt_printf(&buf, "Dev %u:\n", ca->dev_idx); 1706 + printbuf_indent_add(&buf, 2); 1707 + bch2_dev_alloc_debug_to_text(&buf, ca); 1708 + printbuf_indent_sub(&buf, 2); 1709 + prt_newline(&buf); 1710 + } 1711 + 1712 + prt_printf(&buf, "Copygc debug:\n"); 1713 + printbuf_indent_add(&buf, 2); 1714 + bch2_copygc_wait_to_text(&buf, c); 1715 + printbuf_indent_sub(&buf, 2); 1716 + prt_newline(&buf); 1717 + 1718 + prt_printf(&buf, "Journal debug:\n"); 1719 + printbuf_indent_add(&buf, 2); 1720 + bch2_journal_debug_to_text(&buf, &c->journal); 1721 + printbuf_indent_sub(&buf, 2); 1722 + 1723 + bch2_print_string_as_lines(KERN_ERR, buf.buf); 1724 + printbuf_exit(&buf); 1695 1725 }
+13 -2
fs/bcachefs/alloc_foreground.h
··· 30 30 31 31 long bch2_bucket_alloc_new_fs(struct bch_dev *); 32 32 33 + static inline struct bch_dev *ob_dev(struct bch_fs *c, struct open_bucket *ob) 34 + { 35 + return bch2_dev_have_ref(c, ob->dev); 36 + } 37 + 33 38 struct open_bucket *bch2_bucket_alloc(struct bch_fs *, struct bch_dev *, 34 - enum bch_watermark, struct closure *); 39 + enum bch_watermark, enum bch_data_type, 40 + struct closure *); 35 41 36 42 static inline void ob_push(struct bch_fs *c, struct open_buckets *obs, 37 43 struct open_bucket *ob) ··· 190 184 wp->sectors_allocated += sectors; 191 185 192 186 open_bucket_for_each(c, &wp->ptrs, ob, i) { 193 - struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev); 187 + struct bch_dev *ca = ob_dev(c, ob); 194 188 struct bch_extent_ptr ptr = bch2_ob_ptr(c, ob); 195 189 196 190 ptr.cached = cached || ··· 226 220 void bch2_open_buckets_partial_to_text(struct printbuf *, struct bch_fs *); 227 221 228 222 void bch2_write_points_to_text(struct printbuf *, struct bch_fs *); 223 + 224 + void bch2_fs_alloc_debug_to_text(struct printbuf *, struct bch_fs *); 225 + void bch2_dev_alloc_debug_to_text(struct printbuf *, struct bch_dev *); 226 + 227 + void bch2_print_allocator_stuck(struct bch_fs *); 229 228 230 229 #endif /* _BCACHEFS_ALLOC_FOREGROUND_H */
+7
fs/bcachefs/alloc_types.h
··· 9 9 #include "fifo.h" 10 10 11 11 struct bucket_alloc_state { 12 + enum { 13 + BTREE_BITMAP_NO, 14 + BTREE_BITMAP_YES, 15 + BTREE_BITMAP_ANY, 16 + } btree_bitmap; 17 + 12 18 u64 buckets_seen; 13 19 u64 skipped_open; 14 20 u64 skipped_need_journal_commit; 15 21 u64 skipped_nocow; 16 22 u64 skipped_nouse; 23 + u64 skipped_mi_btree_bitmap; 17 24 }; 18 25 19 26 #define BCH_WATERMARKS() \
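The btree_bitmap tri-state added here drives a two-pass allocation in bch2_bucket_alloc_trans(): the first pass only accepts buckets whose bitmap state matches (BTREE_BITMAP_YES for BCH_DATA_btree, BTREE_BITMAP_NO otherwise), and only if that finds nothing does the allocator retry with BTREE_BITMAP_ANY; the enum also indexes the new per-device alloc_cursor[3] so each pass resumes from its own rotor. A compile-only sketch of that control flow, with try_alloc() as a hypothetical stand-in for the real freelist walk:

    #include <stdbool.h>
    #include <stdint.h>

    enum btree_bitmap_pref { BITMAP_NO, BITMAP_YES, BITMAP_ANY };

    /* one rotor per preference so the passes don't reset each other: */
    static uint64_t alloc_cursor[3];

    /* hypothetical stand-in for the real bucket scan: */
    static bool try_alloc(enum btree_bitmap_pref pref, uint64_t *bucket)
    {
        *bucket = alloc_cursor[pref];   /* resume where this pass left off */
        return false;                   /* pretend nothing matched */
    }

    static bool alloc_with_fallback(bool for_btree, uint64_t *bucket)
    {
        enum btree_bitmap_pref pref = for_btree ? BITMAP_YES : BITMAP_NO;

        if (try_alloc(pref, bucket))
            return true;
        /* preferred pass found nothing: retry without the constraint */
        return try_alloc(BITMAP_ANY, bucket);
    }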
+88 -70
fs/bcachefs/backpointers.c
··· 23 23 const union bch_extent_entry *entry; 24 24 struct extent_ptr_decoded p; 25 25 26 + rcu_read_lock(); 26 27 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { 27 28 struct bpos bucket2; 28 29 struct bch_backpointer bp2; ··· 31 30 if (p.ptr.cached) 32 31 continue; 33 32 34 - bch2_extent_ptr_to_bp(c, btree_id, level, k, p, entry, &bucket2, &bp2); 33 + struct bch_dev *ca = bch2_dev_rcu(c, p.ptr.dev); 34 + if (!ca) 35 + continue; 36 + 37 + bch2_extent_ptr_to_bp(c, ca, btree_id, level, k, p, entry, &bucket2, &bp2); 35 38 if (bpos_eq(bucket, bucket2) && 36 - !memcmp(&bp, &bp2, sizeof(bp))) 39 + !memcmp(&bp, &bp2, sizeof(bp))) { 40 + rcu_read_unlock(); 37 41 return true; 42 + } 38 43 } 44 + rcu_read_unlock(); 39 45 40 46 return false; 41 47 } 42 48 43 49 int bch2_backpointer_invalid(struct bch_fs *c, struct bkey_s_c k, 44 - enum bkey_invalid_flags flags, 50 + enum bch_validate_flags flags, 45 51 struct printbuf *err) 46 52 { 47 53 struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k); 48 54 49 - /* these will be caught by fsck */ 50 - if (!bch2_dev_exists2(c, bp.k->p.inode)) 55 + rcu_read_lock(); 56 + struct bch_dev *ca = bch2_dev_rcu(c, bp.k->p.inode); 57 + if (!ca) { 58 + /* these will be caught by fsck */ 59 + rcu_read_unlock(); 51 60 return 0; 61 + } 52 62 53 - struct bch_dev *ca = bch_dev_bkey_exists(c, bp.k->p.inode); 54 - struct bpos bucket = bp_pos_to_bucket(c, bp.k->p); 63 + struct bpos bucket = bp_pos_to_bucket(ca, bp.k->p); 64 + struct bpos bp_pos = bucket_pos_to_bp_noerror(ca, bucket, bp.v->bucket_offset); 65 + rcu_read_unlock(); 55 66 int ret = 0; 56 67 57 68 bkey_fsck_err_on((bp.v->bucket_offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT) >= ca->mi.bucket_size || 58 - !bpos_eq(bp.k->p, bucket_pos_to_bp_noerror(ca, bucket, bp.v->bucket_offset)), 69 + !bpos_eq(bp.k->p, bp_pos), 59 70 c, err, 60 71 backpointer_bucket_offset_wrong, 61 72 "backpointer bucket_offset wrong"); ··· 88 75 89 76 void bch2_backpointer_k_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) 90 77 { 91 - if (bch2_dev_exists2(c, k.k->p.inode)) { 78 + rcu_read_lock(); 79 + struct bch_dev *ca = bch2_dev_rcu(c, k.k->p.inode); 80 + if (ca) { 81 + struct bpos bucket = bp_pos_to_bucket(ca, k.k->p); 82 + rcu_read_unlock(); 92 83 prt_str(out, "bucket="); 93 - bch2_bpos_to_text(out, bp_pos_to_bucket(c, k.k->p)); 84 + bch2_bpos_to_text(out, bucket); 94 85 prt_str(out, " "); 86 + } else { 87 + rcu_read_unlock(); 95 88 } 96 89 97 90 bch2_backpointer_to_text(out, bkey_s_c_to_backpointer(k).v); ··· 136 117 137 118 bch_err(c, "%s", buf.buf); 138 119 } else if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers) { 139 - prt_printf(&buf, "backpointer not found when deleting"); 140 - prt_newline(&buf); 120 + prt_printf(&buf, "backpointer not found when deleting\n"); 141 121 printbuf_indent_add(&buf, 2); 142 122 143 123 prt_printf(&buf, "searching for "); ··· 163 145 } 164 146 165 147 int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *trans, 148 + struct bch_dev *ca, 166 149 struct bpos bucket, 167 150 struct bch_backpointer bp, 168 151 struct bkey_s_c orig_k, ··· 180 161 return ret; 181 162 182 163 bkey_backpointer_init(&bp_k->k_i); 183 - bp_k->k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset); 164 + bp_k->k.p = bucket_pos_to_bp(ca, bucket, bp.bucket_offset); 184 165 bp_k->v = bp; 185 166 186 167 if (!insert) { ··· 190 171 191 172 k = bch2_bkey_get_iter(trans, &bp_iter, BTREE_ID_backpointers, 192 173 bp_k->k.p, 193 - BTREE_ITER_INTENT| 194 - BTREE_ITER_SLOTS| 195 - 
BTREE_ITER_WITH_UPDATES); 174 + BTREE_ITER_intent| 175 + BTREE_ITER_slots| 176 + BTREE_ITER_with_updates); 196 177 ret = bkey_err(k); 197 178 if (ret) 198 179 goto err; ··· 216 197 * Find the next backpointer >= *bp_offset: 217 198 */ 218 199 int bch2_get_next_backpointer(struct btree_trans *trans, 200 + struct bch_dev *ca, 219 201 struct bpos bucket, int gen, 220 202 struct bpos *bp_pos, 221 203 struct bch_backpointer *bp, 222 204 unsigned iter_flags) 223 205 { 224 - struct bch_fs *c = trans->c; 225 - struct bpos bp_end_pos = bucket_pos_to_bp(c, bpos_nosnap_successor(bucket), 0); 206 + struct bpos bp_end_pos = bucket_pos_to_bp(ca, bpos_nosnap_successor(bucket), 0); 226 207 struct btree_iter alloc_iter = { NULL }, bp_iter = { NULL }; 227 208 struct bkey_s_c k; 228 209 int ret = 0; ··· 232 213 233 214 if (gen >= 0) { 234 215 k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc, 235 - bucket, BTREE_ITER_CACHED|iter_flags); 216 + bucket, BTREE_ITER_cached|iter_flags); 236 217 ret = bkey_err(k); 237 218 if (ret) 238 219 goto out; ··· 242 223 goto done; 243 224 } 244 225 245 - *bp_pos = bpos_max(*bp_pos, bucket_pos_to_bp(c, bucket, 0)); 226 + *bp_pos = bpos_max(*bp_pos, bucket_pos_to_bp(ca, bucket, 0)); 246 227 247 228 for_each_btree_key_norestart(trans, bp_iter, BTREE_ID_backpointers, 248 229 *bp_pos, iter_flags, k, ret) { ··· 268 249 { 269 250 struct bch_fs *c = trans->c; 270 251 struct printbuf buf = PRINTBUF; 271 - struct bpos bucket = bp_pos_to_bucket(c, bp_pos); 272 252 273 253 /* 274 254 * If we're using the btree write buffer, the backpointer we were ··· 275 257 * pointed to is not an error: 276 258 */ 277 259 if (likely(!bch2_backpointers_no_use_write_buffer)) 260 + return; 261 + 262 + struct bpos bucket; 263 + if (!bp_pos_to_bucket_nodev(c, bp_pos, &bucket)) 278 264 return; 279 265 280 266 prt_printf(&buf, "backpointer doesn't match %s it points to:\n ", ··· 310 288 { 311 289 if (likely(!bp.level)) { 312 290 struct bch_fs *c = trans->c; 313 - struct bpos bucket = bp_pos_to_bucket(c, bp_pos); 314 - struct bkey_s_c k; 291 + 292 + struct bpos bucket; 293 + if (!bp_pos_to_bucket_nodev(c, bp_pos, &bucket)) 294 + return bkey_s_c_err(-EIO); 315 295 316 296 bch2_trans_node_iter_init(trans, iter, 317 297 bp.btree_id, 318 298 bp.pos, 319 299 0, 0, 320 300 iter_flags); 321 - k = bch2_btree_iter_peek_slot(iter); 301 + struct bkey_s_c k = bch2_btree_iter_peek_slot(iter); 322 302 if (bkey_err(k)) { 323 303 bch2_trans_iter_exit(trans, iter); 324 304 return k; ··· 349 325 struct bch_backpointer bp) 350 326 { 351 327 struct bch_fs *c = trans->c; 352 - struct bpos bucket = bp_pos_to_bucket(c, bp_pos); 353 - struct btree *b; 354 328 355 329 BUG_ON(!bp.level); 330 + 331 + struct bpos bucket; 332 + if (!bp_pos_to_bucket_nodev(c, bp_pos, &bucket)) 333 + return ERR_PTR(-EIO); 356 334 357 335 bch2_trans_node_iter_init(trans, iter, 358 336 bp.btree_id, ··· 362 336 0, 363 337 bp.level - 1, 364 338 0); 365 - b = bch2_btree_iter_peek_node(iter); 339 + struct btree *b = bch2_btree_iter_peek_node(iter); 366 340 if (IS_ERR_OR_NULL(b)) 367 341 goto err; 368 342 ··· 393 367 struct printbuf buf = PRINTBUF; 394 368 int ret = 0; 395 369 396 - if (fsck_err_on(!bch2_dev_exists2(c, k.k->p.inode), c, 397 - backpointer_to_missing_device, 398 - "backpointer for missing device:\n%s", 399 - (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 400 - ret = bch2_btree_delete_at(trans, bp_iter, 0); 370 + struct bpos bucket; 371 + if (!bp_pos_to_bucket_nodev_noerror(c, k.k->p, &bucket)) { 372 + if (fsck_err(c, 
backpointer_to_missing_device, 373 + "backpointer for missing device:\n%s", 374 + (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 375 + ret = bch2_btree_delete_at(trans, bp_iter, 0); 401 376 goto out; 402 377 } 403 378 404 - alloc_k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc, 405 - bp_pos_to_bucket(c, k.k->p), 0); 379 + alloc_k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc, bucket, 0); 406 380 ret = bkey_err(alloc_k); 407 381 if (ret) 408 382 goto out; ··· 486 460 487 461 bytes = p.crc.compressed_size << 9; 488 462 489 - struct bch_dev *ca = bch_dev_bkey_exists(c, dev); 490 - if (!bch2_dev_get_ioref(ca, READ)) 463 + struct bch_dev *ca = bch2_dev_get_ioref(c, dev, READ); 464 + if (!ca) 491 465 return false; 492 466 493 467 data_buf = kvmalloc(bytes, GFP_KERNEL); ··· 537 511 struct printbuf buf = PRINTBUF; 538 512 struct bkey_s_c bp_k; 539 513 struct bkey_buf tmp; 540 - int ret; 514 + int ret = 0; 541 515 542 516 bch2_bkey_buf_init(&tmp); 543 517 544 - if (!bch2_dev_bucket_exists(c, bucket)) { 518 + struct bch_dev *ca = bch2_dev_bucket_tryget(c, bucket); 519 + if (!ca) { 545 520 prt_str(&buf, "extent for nonexistent device:bucket "); 546 521 bch2_bpos_to_text(&buf, bucket); 547 522 prt_str(&buf, "\n "); 548 523 bch2_bkey_val_to_text(&buf, c, orig_k); 549 524 bch_err(c, "%s", buf.buf); 550 - return -BCH_ERR_fsck_repair_unimplemented; 525 + ret = -BCH_ERR_fsck_repair_unimplemented; 526 + goto err; 551 527 } 552 528 553 529 if (bpos_lt(bucket, s->bucket_start) || 554 530 bpos_gt(bucket, s->bucket_end)) 555 - return 0; 531 + goto out; 556 532 557 533 bp_k = bch2_bkey_get_iter(trans, &bp_iter, BTREE_ID_backpointers, 558 - bucket_pos_to_bp(c, bucket, bp.bucket_offset), 534 + bucket_pos_to_bp(ca, bucket, bp.bucket_offset), 559 535 0); 560 536 ret = bkey_err(bp_k); 561 537 if (ret) ··· 590 562 bch2_trans_iter_exit(trans, &other_extent_iter); 591 563 bch2_trans_iter_exit(trans, &bp_iter); 592 564 bch2_bkey_buf_exit(&tmp, c); 565 + bch2_dev_put(ca); 593 566 printbuf_exit(&buf); 594 567 return ret; 595 568 check_existing_bp: ··· 666 637 667 638 struct bkey_i_backpointer n_bp_k; 668 639 bkey_backpointer_init(&n_bp_k.k_i); 669 - n_bp_k.k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset); 640 + n_bp_k.k.p = bucket_pos_to_bp(ca, bucket, bp.bucket_offset); 670 641 n_bp_k.v = bp; 671 642 prt_printf(&buf, "\n want: "); 672 643 bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&n_bp_k.k_i)); 673 644 674 645 if (fsck_err(c, ptr_to_missing_backpointer, "%s", buf.buf)) 675 - ret = bch2_bucket_backpointer_mod(trans, bucket, bp, orig_k, true); 646 + ret = bch2_bucket_backpointer_mod(trans, ca, bucket, bp, orig_k, true); 676 647 677 648 goto out; 678 649 } ··· 696 667 if (p.ptr.cached) 697 668 continue; 698 669 699 - bch2_extent_ptr_to_bp(c, btree, level, k, p, entry, &bucket_pos, &bp); 670 + rcu_read_lock(); 671 + struct bch_dev *ca = bch2_dev_rcu(c, p.ptr.dev); 672 + if (ca) 673 + bch2_extent_ptr_to_bp(c, ca, btree, level, k, p, entry, &bucket_pos, &bp); 674 + rcu_read_unlock(); 675 + 676 + if (!ca) 677 + continue; 700 678 701 679 ret = check_bp_exists(trans, s, bucket_pos, bp, k); 702 680 if (ret) ··· 796 760 797 761 __for_each_btree_node(trans, iter, btree, 798 762 btree == start.btree ? 
start.pos : POS_MIN, 799 - 0, depth, BTREE_ITER_PREFETCH, b, ret) { 763 + 0, depth, BTREE_ITER_prefetch, b, ret) { 800 764 mem_may_pin -= btree_buf_bytes(b); 801 765 if (mem_may_pin <= 0) { 802 766 c->btree_cache.pinned_nodes_end = *end = ··· 830 794 831 795 while (level >= depth) { 832 796 struct btree_iter iter; 833 - bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0, 834 - level, 835 - BTREE_ITER_PREFETCH); 836 - while (1) { 837 - bch2_trans_begin(trans); 797 + bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0, level, 798 + BTREE_ITER_prefetch); 838 799 839 - struct bkey_s_c k = bch2_btree_iter_peek(&iter); 840 - if (!k.k) 841 - break; 842 - ret = bkey_err(k) ?: 843 - check_extent_to_backpointers(trans, s, btree_id, level, k) ?: 844 - bch2_trans_commit(trans, NULL, NULL, 845 - BCH_TRANS_COMMIT_no_enospc); 846 - if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) { 847 - ret = 0; 848 - continue; 849 - } 850 - if (ret) 851 - break; 852 - if (bpos_eq(iter.pos, SPOS_MAX)) 853 - break; 854 - bch2_btree_iter_advance(&iter); 855 - } 856 - bch2_trans_iter_exit(trans, &iter); 857 - 800 + ret = for_each_btree_key_continue(trans, iter, 0, k, ({ 801 + check_extent_to_backpointers(trans, s, btree_id, level, k) ?: 802 + bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 803 + })); 858 804 if (ret) 859 805 return ret; 860 806 ··· 954 936 struct bpos last_flushed_pos = SPOS_MAX; 955 937 956 938 return for_each_btree_key_commit(trans, iter, BTREE_ID_backpointers, 957 - POS_MIN, BTREE_ITER_PREFETCH, k, 939 + POS_MIN, BTREE_ITER_prefetch, k, 958 940 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 959 941 check_one_backpointer(trans, start, end, 960 942 bkey_s_c_to_backpointer(k),
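The recurring pattern in these backpointers.c changes is the device-reference hardening described in the merge message: every bch_dev_bkey_exists() call, which assumed the device index in a key is valid, becomes an RCU-protected bch2_dev_rcu() or a refcounted bch2_dev_tryget() that can return NULL, and each caller now has an explicit missing-device path. A userspace analogue of the shape of that fix (all names hypothetical, kernel RCU omitted):

    #include <stdio.h>

    #define NR_DEVS 4

    struct dev { int idx; };

    static struct dev *devs[NR_DEVS];   /* slots may be NULL after removal */

    /* fallible lookup: the caller must handle NULL */
    static struct dev *dev_tryget(unsigned idx)
    {
        return idx < NR_DEVS ? devs[idx] : NULL;
    }

    int main(void)
    {
        static struct dev d2 = { .idx = 2 };

        devs[2] = &d2;

        for (unsigned i = 0; i < NR_DEVS; i++) {
            struct dev *d = dev_tryget(i);

            /* a key may reference a removed device; skip it, don't crash: */
            printf("dev %u: %s\n", i, d ? "present" : "missing");
        }
        return 0;
    }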
+29 -14
fs/bcachefs/backpointers.h
··· 6 6 #include "btree_iter.h" 7 7 #include "btree_update.h" 8 8 #include "buckets.h" 9 + #include "error.h" 9 10 #include "super.h" 10 11 11 12 static inline u64 swab40(u64 x) ··· 19 18 } 20 19 21 20 int bch2_backpointer_invalid(struct bch_fs *, struct bkey_s_c k, 22 - enum bkey_invalid_flags, struct printbuf *); 21 + enum bch_validate_flags, struct printbuf *); 23 22 void bch2_backpointer_to_text(struct printbuf *, const struct bch_backpointer *); 24 23 void bch2_backpointer_k_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 25 24 void bch2_backpointer_swab(struct bkey_s); ··· 37 36 * Convert from pos in backpointer btree to pos of corresponding bucket in alloc 38 37 * btree: 39 38 */ 40 - static inline struct bpos bp_pos_to_bucket(const struct bch_fs *c, 41 - struct bpos bp_pos) 39 + static inline struct bpos bp_pos_to_bucket(const struct bch_dev *ca, struct bpos bp_pos) 42 40 { 43 - struct bch_dev *ca = bch_dev_bkey_exists(c, bp_pos.inode); 44 41 u64 bucket_sector = bp_pos.offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT; 45 42 46 43 return POS(bp_pos.inode, sector_to_bucket(ca, bucket_sector)); 44 + } 45 + 46 + static inline bool bp_pos_to_bucket_nodev_noerror(struct bch_fs *c, struct bpos bp_pos, struct bpos *bucket) 47 + { 48 + rcu_read_lock(); 49 + struct bch_dev *ca = bch2_dev_rcu(c, bp_pos.inode); 50 + if (ca) 51 + *bucket = bp_pos_to_bucket(ca, bp_pos); 52 + rcu_read_unlock(); 53 + return ca != NULL; 54 + } 55 + 56 + static inline bool bp_pos_to_bucket_nodev(struct bch_fs *c, struct bpos bp_pos, struct bpos *bucket) 57 + { 58 + return !bch2_fs_inconsistent_on(!bp_pos_to_bucket_nodev_noerror(c, bp_pos, bucket), 59 + c, "backpointer for missing device %llu", bp_pos.inode); 47 60 } 48 61 49 62 static inline struct bpos bucket_pos_to_bp_noerror(const struct bch_dev *ca, ··· 72 57 /* 73 58 * Convert from pos in alloc btree + bucket offset to pos in backpointer btree: 74 59 */ 75 - static inline struct bpos bucket_pos_to_bp(const struct bch_fs *c, 60 + static inline struct bpos bucket_pos_to_bp(const struct bch_dev *ca, 76 61 struct bpos bucket, 77 62 u64 bucket_offset) 78 63 { 79 - struct bch_dev *ca = bch_dev_bkey_exists(c, bucket.inode); 80 64 struct bpos ret = bucket_pos_to_bp_noerror(ca, bucket, bucket_offset); 81 - EBUG_ON(!bkey_eq(bucket, bp_pos_to_bucket(c, ret))); 65 + EBUG_ON(!bkey_eq(bucket, bp_pos_to_bucket(ca, ret))); 82 66 return ret; 83 67 } 84 68 85 - int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bpos bucket, 86 - struct bch_backpointer, struct bkey_s_c, bool); 69 + int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bch_dev *, 70 + struct bpos bucket, struct bch_backpointer, struct bkey_s_c, bool); 87 71 88 72 static inline int bch2_bucket_backpointer_mod(struct btree_trans *trans, 73 + struct bch_dev *ca, 89 74 struct bpos bucket, 90 75 struct bch_backpointer bp, 91 76 struct bkey_s_c orig_k, 92 77 bool insert) 93 78 { 94 79 if (unlikely(bch2_backpointers_no_use_write_buffer)) 95 - return bch2_bucket_backpointer_mod_nowritebuffer(trans, bucket, bp, orig_k, insert); 80 + return bch2_bucket_backpointer_mod_nowritebuffer(trans, ca, bucket, bp, orig_k, insert); 96 81 97 82 struct bkey_i_backpointer bp_k; 98 83 99 84 bkey_backpointer_init(&bp_k.k_i); 100 - bp_k.k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset); 85 + bp_k.k.p = bucket_pos_to_bp(ca, bucket, bp.bucket_offset); 101 86 bp_k.v = bp; 102 87 103 88 if (!insert) { ··· 135 120 } 136 121 } 137 122 138 - static inline void 
bch2_extent_ptr_to_bp(struct bch_fs *c, 123 + static inline void bch2_extent_ptr_to_bp(struct bch_fs *c, struct bch_dev *ca, 139 124 enum btree_id btree_id, unsigned level, 140 125 struct bkey_s_c k, struct extent_ptr_decoded p, 141 126 const union bch_extent_entry *entry, ··· 145 130 s64 sectors = level ? btree_sectors(c) : k.k->size; 146 131 u32 bucket_offset; 147 132 148 - *bucket_pos = PTR_BUCKET_POS_OFFSET(c, &p.ptr, &bucket_offset); 133 + *bucket_pos = PTR_BUCKET_POS_OFFSET(ca, &p.ptr, &bucket_offset); 149 134 *bp = (struct bch_backpointer) { 150 135 .btree_id = btree_id, 151 136 .level = level, ··· 157 142 }; 158 143 } 159 144 160 - int bch2_get_next_backpointer(struct btree_trans *, struct bpos, int, 145 + int bch2_get_next_backpointer(struct btree_trans *, struct bch_dev *ca, struct bpos, int, 161 146 struct bpos *, struct bch_backpointer *, unsigned); 162 147 struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *, struct btree_iter *, 163 148 struct bpos, struct bch_backpointer,
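The bucket/backpointer position conversions above are pure arithmetic: a backpointer key's offset is the bucket's start sector shifted left by MAX_EXTENT_COMPRESS_RATIO_SHIFT plus the within-bucket offset, so shifting right recovers the sector and hence the bucket. A standalone round-trip check, with a toy bucket size and an illustrative shift value (the kernel derives both from the device's member info):

    #include <assert.h>
    #include <stdint.h>

    #define SHIFT          10      /* stands in for MAX_EXTENT_COMPRESS_RATIO_SHIFT */
    #define BUCKET_SECTORS 128ULL  /* toy */

    static uint64_t bucket_pos_to_bp_offset(uint64_t bucket, uint64_t bucket_offset)
    {
        return ((bucket * BUCKET_SECTORS) << SHIFT) + bucket_offset;
    }

    static uint64_t bp_offset_to_bucket(uint64_t bp_offset)
    {
        return (bp_offset >> SHIFT) / BUCKET_SECTORS;
    }

    int main(void)
    {
        /* any bucket_offset < BUCKET_SECTORS << SHIFT round-trips: */
        for (uint64_t b = 0; b < 1000; b += 37)
            assert(bp_offset_to_bucket(bucket_pos_to_bp_offset(b, 12345)) == b);
        return 0;
    }

This also explains the EBUG_ON() in bucket_pos_to_bp(): the conversion must be exactly invertible, while the _noerror variant serves the validation path, where a mismatch is reported as a fsck error rather than asserted.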
+17 -17
fs/bcachefs/bcachefs.h
··· 359 359 #define BCH_DEBUG_PARAMS_ALWAYS() \ 360 360 BCH_DEBUG_PARAM(key_merging_disabled, \ 361 361 "Disables merging of extents") \ 362 + BCH_DEBUG_PARAM(btree_node_merging_disabled, \ 363 + "Disables merging of btree nodes") \ 362 364 BCH_DEBUG_PARAM(btree_gc_always_rewrite, \ 363 365 "Causes mark and sweep to compact and rewrite every " \ 364 366 "btree node it traverses") \ ··· 470 468 #include "quota_types.h" 471 469 #include "rebalance_types.h" 472 470 #include "replicas_types.h" 471 + #include "sb-members_types.h" 473 472 #include "subvolume_types.h" 474 473 #include "super_types.h" 475 474 #include "thread_with_file_types.h" ··· 519 516 520 517 struct gc_pos { 521 518 enum gc_phase phase; 519 + u16 level; 522 520 struct bpos pos; 523 - unsigned level; 524 521 }; 525 522 526 523 struct reflink_gc { ··· 537 534 538 535 struct bch_dev { 539 536 struct kobject kobj; 537 + #ifdef CONFIG_BCACHEFS_DEBUG 538 + atomic_long_t ref; 539 + bool dying; 540 + unsigned long last_put; 541 + #else 540 542 struct percpu_ref ref; 543 + #endif 541 544 struct completion ref_completion; 542 545 struct percpu_ref io_ref; 543 546 struct completion io_ref_completion; ··· 569 560 570 561 struct bch_devs_mask self; 571 562 572 - /* biosets used in cloned bios for writing multiple replicas */ 573 - struct bio_set replica_set; 574 - 575 563 /* 576 564 * Buckets: 577 565 * Per-bucket arrays are protected by c->mark_lock, bucket_lock and 578 566 * gc_lock, for device resize - holding any is sufficient for access: 579 - * Or rcu_read_lock(), but only for ptr_stale(): 567 + * Or rcu_read_lock(), but only for dev_ptr_stale(): 580 568 */ 581 569 struct bucket_array __rcu *buckets_gc; 582 570 struct bucket_gens __rcu *bucket_gens; ··· 587 581 588 582 /* Allocator: */ 589 583 u64 new_fs_bucket_idx; 590 - u64 alloc_cursor; 584 + u64 alloc_cursor[3]; 591 585 592 586 unsigned nr_open_buckets; 593 587 unsigned nr_btree_reserve; ··· 633 627 x(clean_shutdown) \ 634 628 x(fsck_running) \ 635 629 x(initial_gc_unfixed) \ 636 - x(need_another_gc) \ 637 630 x(need_delete_dead_snapshots) \ 638 631 x(error) \ 639 632 x(topology_error) \ 640 633 x(errors_fixed) \ 641 - x(errors_not_fixed) 634 + x(errors_not_fixed) \ 635 + x(no_invalid_checks) 642 636 643 637 enum bch_fs_flags { 644 638 #define x(n) BCH_FS_##n, ··· 721 715 x(discard_fast) \ 722 716 x(invalidate) \ 723 717 x(delete_dead_snapshots) \ 718 + x(gc_gens) \ 724 719 x(snapshot_delete_pagecache) \ 725 720 x(sysfs) \ 726 721 x(btree_write_buffer) ··· 933 926 /* JOURNAL SEQ BLACKLIST */ 934 927 struct journal_seq_blacklist_table * 935 928 journal_seq_blacklist_table; 936 - struct work_struct journal_seq_blacklist_gc_work; 937 929 938 930 /* ALLOCATOR */ 939 931 spinlock_t freelist_lock; ··· 963 957 struct work_struct discard_fast_work; 964 958 965 959 /* GARBAGE COLLECTION */ 966 - struct task_struct *gc_thread; 967 - atomic_t kick_gc; 960 + struct work_struct gc_gens_work; 968 961 unsigned long gc_count; 969 962 970 963 enum btree_id gc_gens_btree; ··· 993 988 struct bio_set bio_read; 994 989 struct bio_set bio_read_split; 995 990 struct bio_set bio_write; 991 + struct bio_set replica_set; 996 992 struct mutex bio_bounce_pages_lock; 997 993 mempool_t bio_bounce_pages; 998 994 struct bucket_nocow_lock_table ··· 1121 1115 u64 counters_on_mount[BCH_COUNTER_NR]; 1122 1116 u64 __percpu *counters; 1123 1117 1124 - unsigned btree_gc_periodic:1; 1125 1118 unsigned copy_gc_enabled:1; 1126 1119 bool promote_whole_extents; 1127 1120 ··· 1253 1248 1254 1249 
ktime_get_coarse_real_ts64(&now); 1255 1250 return timespec_to_bch2_time(c, now); 1256 - } 1257 - 1258 - static inline bool bch2_dev_exists2(const struct bch_fs *c, unsigned dev) 1259 - { 1260 - return dev < c->sb.nr_devices && c->devs[dev]; 1261 1251 } 1262 1252 1263 1253 static inline struct stdio_redirect *bch2_fs_stdio_redirect(struct bch_fs *c)
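Among the bcachefs.h changes, the bch_dev refcount now has two builds: CONFIG_BCACHEFS_DEBUG uses a plain atomic_long_t plus a dying flag and last_put timestamp so leaked or late references are easy to pin down, while release builds keep the cheaper percpu_ref. A much-simplified userspace model of the debug flavour (the semantics of dying are inferred from the field name here, and the real lookup-side synchronization against removal is omitted):

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    struct dev_ref {
        atomic_long ref;
        bool        dying;  /* presumably set once removal starts */
    };

    static bool dev_ref_tryget(struct dev_ref *d)
    {
        if (d->dying)       /* simplified: not race-free on its own */
            return false;
        atomic_fetch_add(&d->ref, 1);
        return true;
    }

    static void dev_ref_put(struct dev_ref *d)
    {
        long old = atomic_fetch_sub(&d->ref, 1);

        assert(old > 0);    /* a double put trips immediately in debug */
    }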
+9 -1
fs/bcachefs/bcachefs_format.h
··· 76 76 #include <asm/byteorder.h> 77 77 #include <linux/kernel.h> 78 78 #include <linux/uuid.h> 79 + #include <uapi/linux/magic.h> 79 80 #include "vstructs.h" 80 81 81 82 #ifdef __KERNEL__ ··· 590 589 __le64 errors_reset_time; 591 590 __le64 seq; 592 591 __le64 btree_allocated_bitmap; 592 + /* 593 + * On recovery from a clean shutdown we don't normally read the journal, 594 + * but we still want to resume writing from where we left off so we 595 + * don't overwrite more than is necessary, for list journal debugging: 596 + */ 597 + __le32 last_journal_bucket; 598 + __le32 last_journal_bucket_offset; 593 599 }; 594 600 595 601 /* ··· 1291 1283 UUID_INIT(0xc68573f6, 0x66ce, 0x90a9, \ 1292 1284 0xd9, 0x6a, 0x60, 0xcf, 0x80, 0x3d, 0xf7, 0xef) 1293 1285 1294 - #define BCACHEFS_STATFS_MAGIC 0xca451a4e 1286 + #define BCACHEFS_STATFS_MAGIC BCACHEFS_SUPER_MAGIC 1295 1287 1296 1288 #define JSET_MAGIC __cpu_to_le64(0x245235c1a3625032ULL) 1297 1289 #define BSET_MAGIC __cpu_to_le64(0x90135c78b99e07f5ULL)
+6 -9
fs/bcachefs/bkey.c
··· 640 640 641 641 int bch2_bkey_format_invalid(struct bch_fs *c, 642 642 struct bkey_format *f, 643 - enum bkey_invalid_flags flags, 643 + enum bch_validate_flags flags, 644 644 struct printbuf *err) 645 645 { 646 646 unsigned i, bits = KEY_PACKED_BITS_START; ··· 656 656 * unpacked format: 657 657 */ 658 658 for (i = 0; i < f->nr_fields; i++) { 659 - if (!c || c->sb.version_min >= bcachefs_metadata_version_snapshot) { 659 + if ((!c || c->sb.version_min >= bcachefs_metadata_version_snapshot) && 660 + bch2_bkey_format_field_overflows(f, i)) { 660 661 unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i]; 661 662 u64 unpacked_max = ~((~0ULL << 1) << (unpacked_bits - 1)); 662 663 u64 packed_max = f->bits_per_field[i] 663 664 ? ~((~0ULL << 1) << (f->bits_per_field[i] - 1)) 664 665 : 0; 665 - u64 field_offset = le64_to_cpu(f->field_offset[i]); 666 666 667 - if (packed_max + field_offset < packed_max || 668 - packed_max + field_offset > unpacked_max) { 669 - prt_printf(err, "field %u too large: %llu + %llu > %llu", 670 - i, packed_max, field_offset, unpacked_max); 671 - return -BCH_ERR_invalid; 672 - } 667 + prt_printf(err, "field %u too large: %llu + %llu > %llu", 668 + i, packed_max, le64_to_cpu(f->field_offset[i]), unpacked_max); 669 + return -BCH_ERR_invalid; 673 670 } 674 671 675 672 bits += f->bits_per_field[i];
+28 -5
fs/bcachefs/bkey.h
··· 9 9 #include "util.h" 10 10 #include "vstructs.h" 11 11 12 - enum bkey_invalid_flags { 13 - BKEY_INVALID_WRITE = (1U << 0), 14 - BKEY_INVALID_COMMIT = (1U << 1), 15 - BKEY_INVALID_JOURNAL = (1U << 2), 12 + enum bch_validate_flags { 13 + BCH_VALIDATE_write = (1U << 0), 14 + BCH_VALIDATE_commit = (1U << 1), 15 + BCH_VALIDATE_journal = (1U << 2), 16 16 }; 17 17 18 18 #if 0 ··· 574 574 575 575 void bch2_bkey_format_add_pos(struct bkey_format_state *, struct bpos); 576 576 struct bkey_format bch2_bkey_format_done(struct bkey_format_state *); 577 + 578 + static inline bool bch2_bkey_format_field_overflows(struct bkey_format *f, unsigned i) 579 + { 580 + unsigned f_bits = f->bits_per_field[i]; 581 + unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i]; 582 + u64 unpacked_mask = ~((~0ULL << 1) << (unpacked_bits - 1)); 583 + u64 field_offset = le64_to_cpu(f->field_offset[i]); 584 + 585 + if (f_bits > unpacked_bits) 586 + return true; 587 + 588 + if ((f_bits == unpacked_bits) && field_offset) 589 + return true; 590 + 591 + u64 f_mask = f_bits 592 + ? ~((~0ULL << (f_bits - 1)) << 1) 593 + : 0; 594 + 595 + if (((field_offset + f_mask) & unpacked_mask) < field_offset) 596 + return true; 597 + return false; 598 + } 599 + 577 600 int bch2_bkey_format_invalid(struct bch_fs *, struct bkey_format *, 578 - enum bkey_invalid_flags, struct printbuf *); 601 + enum bch_validate_flags, struct printbuf *); 579 602 void bch2_bkey_format_to_text(struct printbuf *, const struct bkey_format *); 580 603 581 604 #endif /* _BCACHEFS_BKEY_H */
+14 -8
fs/bcachefs/bkey_methods.c
··· 27 27 }; 28 28 29 29 static int deleted_key_invalid(struct bch_fs *c, struct bkey_s_c k, 30 - enum bkey_invalid_flags flags, struct printbuf *err) 30 + enum bch_validate_flags flags, struct printbuf *err) 31 31 { 32 32 return 0; 33 33 } ··· 41 41 }) 42 42 43 43 static int empty_val_key_invalid(struct bch_fs *c, struct bkey_s_c k, 44 - enum bkey_invalid_flags flags, struct printbuf *err) 44 + enum bch_validate_flags flags, struct printbuf *err) 45 45 { 46 46 int ret = 0; 47 47 ··· 58 58 }) 59 59 60 60 static int key_type_cookie_invalid(struct bch_fs *c, struct bkey_s_c k, 61 - enum bkey_invalid_flags flags, struct printbuf *err) 61 + enum bch_validate_flags flags, struct printbuf *err) 62 62 { 63 63 return 0; 64 64 } ··· 82 82 }) 83 83 84 84 static int key_type_inline_data_invalid(struct bch_fs *c, struct bkey_s_c k, 85 - enum bkey_invalid_flags flags, struct printbuf *err) 85 + enum bch_validate_flags flags, struct printbuf *err) 86 86 { 87 87 return 0; 88 88 } ··· 123 123 }; 124 124 125 125 int bch2_bkey_val_invalid(struct bch_fs *c, struct bkey_s_c k, 126 - enum bkey_invalid_flags flags, 126 + enum bch_validate_flags flags, 127 127 struct printbuf *err) 128 128 { 129 + if (test_bit(BCH_FS_no_invalid_checks, &c->flags)) 130 + return 0; 131 + 129 132 const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type); 130 133 int ret = 0; 131 134 ··· 162 159 163 160 int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k, 164 161 enum btree_node_type type, 165 - enum bkey_invalid_flags flags, 162 + enum bch_validate_flags flags, 166 163 struct printbuf *err) 167 164 { 165 + if (test_bit(BCH_FS_no_invalid_checks, &c->flags)) 166 + return 0; 167 + 168 168 int ret = 0; 169 169 170 170 bkey_fsck_err_on(k.k->u64s < BKEY_U64s, c, err, ··· 178 172 return 0; 179 173 180 174 bkey_fsck_err_on(k.k->type < KEY_TYPE_MAX && 181 - (type == BKEY_TYPE_btree || (flags & BKEY_INVALID_COMMIT)) && 175 + (type == BKEY_TYPE_btree || (flags & BCH_VALIDATE_commit)) && 182 176 !(bch2_key_types_allowed[type] & BIT_ULL(k.k->type)), c, err, 183 177 bkey_invalid_type_for_btree, 184 178 "invalid key type for btree %s (%s)", ··· 230 224 231 225 int bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k, 232 226 enum btree_node_type type, 233 - enum bkey_invalid_flags flags, 227 + enum bch_validate_flags flags, 234 228 struct printbuf *err) 235 229 { 236 230 return __bch2_bkey_invalid(c, k, type, flags, err) ?:
+15 -58
fs/bcachefs/bkey_methods.h
··· 22 22 */ 23 23 struct bkey_ops { 24 24 int (*key_invalid)(struct bch_fs *c, struct bkey_s_c k, 25 - enum bkey_invalid_flags flags, struct printbuf *err); 25 + enum bch_validate_flags flags, struct printbuf *err); 26 26 void (*val_to_text)(struct printbuf *, struct bch_fs *, 27 27 struct bkey_s_c); 28 28 void (*swab)(struct bkey_s); 29 29 bool (*key_normalize)(struct bch_fs *, struct bkey_s); 30 30 bool (*key_merge)(struct bch_fs *, struct bkey_s, struct bkey_s_c); 31 31 int (*trigger)(struct btree_trans *, enum btree_id, unsigned, 32 - struct bkey_s_c, struct bkey_s, unsigned); 32 + struct bkey_s_c, struct bkey_s, 33 + enum btree_iter_update_trigger_flags); 33 34 void (*compat)(enum btree_id id, unsigned version, 34 35 unsigned big_endian, int write, 35 36 struct bkey_s); ··· 49 48 } 50 49 51 50 int bch2_bkey_val_invalid(struct bch_fs *, struct bkey_s_c, 52 - enum bkey_invalid_flags, struct printbuf *); 51 + enum bch_validate_flags, struct printbuf *); 53 52 int __bch2_bkey_invalid(struct bch_fs *, struct bkey_s_c, enum btree_node_type, 54 - enum bkey_invalid_flags, struct printbuf *); 53 + enum bch_validate_flags, struct printbuf *); 55 54 int bch2_bkey_invalid(struct bch_fs *, struct bkey_s_c, enum btree_node_type, 56 - enum bkey_invalid_flags, struct printbuf *); 55 + enum bch_validate_flags, struct printbuf *); 57 56 int bch2_bkey_in_btree_node(struct bch_fs *, struct btree *, 58 57 struct bkey_s_c, struct printbuf *); 59 58 ··· 77 76 78 77 bool bch2_bkey_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c); 79 78 80 - enum btree_update_flags { 81 - __BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE = __BTREE_ITER_FLAGS_END, 82 - __BTREE_UPDATE_NOJOURNAL, 83 - __BTREE_UPDATE_KEY_CACHE_RECLAIM, 84 - 85 - __BTREE_TRIGGER_NORUN, 86 - __BTREE_TRIGGER_TRANSACTIONAL, 87 - __BTREE_TRIGGER_ATOMIC, 88 - __BTREE_TRIGGER_GC, 89 - __BTREE_TRIGGER_INSERT, 90 - __BTREE_TRIGGER_OVERWRITE, 91 - __BTREE_TRIGGER_BUCKET_INVALIDATE, 92 - }; 93 - 94 - #define BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE (1U << __BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) 95 - #define BTREE_UPDATE_NOJOURNAL (1U << __BTREE_UPDATE_NOJOURNAL) 96 - #define BTREE_UPDATE_KEY_CACHE_RECLAIM (1U << __BTREE_UPDATE_KEY_CACHE_RECLAIM) 97 - 98 - /* Don't run triggers at all */ 99 - #define BTREE_TRIGGER_NORUN (1U << __BTREE_TRIGGER_NORUN) 100 - 101 - /* 102 - * If set, we're running transactional triggers as part of a transaction commit: 103 - * triggers may generate new updates 104 - * 105 - * If cleared, and either BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE are set, 106 - * we're running atomic triggers during a transaction commit: we have our 107 - * journal reservation, we're holding btree node write locks, and we know the 108 - * transaction is going to commit (returning an error here is a fatal error, 109 - * causing us to go emergency read-only) 110 - */ 111 - #define BTREE_TRIGGER_TRANSACTIONAL (1U << __BTREE_TRIGGER_TRANSACTIONAL) 112 - #define BTREE_TRIGGER_ATOMIC (1U << __BTREE_TRIGGER_ATOMIC) 113 - 114 - /* We're in gc/fsck: running triggers to recalculate e.g. 
disk usage */ 115 - #define BTREE_TRIGGER_GC (1U << __BTREE_TRIGGER_GC) 116 - 117 - /* @new is entering the btree */ 118 - #define BTREE_TRIGGER_INSERT (1U << __BTREE_TRIGGER_INSERT) 119 - 120 - /* @old is leaving the btree */ 121 - #define BTREE_TRIGGER_OVERWRITE (1U << __BTREE_TRIGGER_OVERWRITE) 122 - 123 - /* signal from bucket invalidate path to alloc trigger */ 124 - #define BTREE_TRIGGER_BUCKET_INVALIDATE (1U << __BTREE_TRIGGER_BUCKET_INVALIDATE) 125 - 126 79 static inline int bch2_key_trigger(struct btree_trans *trans, 127 80 enum btree_id btree, unsigned level, 128 81 struct bkey_s_c old, struct bkey_s new, 129 - unsigned flags) 82 + enum btree_iter_update_trigger_flags flags) 130 83 { 131 84 const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new.k->type); 132 85 ··· 90 135 } 91 136 92 137 static inline int bch2_key_trigger_old(struct btree_trans *trans, 93 - enum btree_id btree_id, unsigned level, 94 - struct bkey_s_c old, unsigned flags) 138 + enum btree_id btree_id, unsigned level, 139 + struct bkey_s_c old, 140 + enum btree_iter_update_trigger_flags flags) 95 141 { 96 142 struct bkey_i deleted; 97 143 ··· 100 144 deleted.k.p = old.k->p; 101 145 102 146 return bch2_key_trigger(trans, btree_id, level, old, bkey_i_to_s(&deleted), 103 - BTREE_TRIGGER_OVERWRITE|flags); 147 + BTREE_TRIGGER_overwrite|flags); 104 148 } 105 149 106 150 static inline int bch2_key_trigger_new(struct btree_trans *trans, 107 - enum btree_id btree_id, unsigned level, 108 - struct bkey_s new, unsigned flags) 151 + enum btree_id btree_id, unsigned level, 152 + struct bkey_s new, 153 + enum btree_iter_update_trigger_flags flags) 109 154 { 110 155 struct bkey_i deleted; 111 156 ··· 114 157 deleted.k.p = new.k->p; 115 158 116 159 return bch2_key_trigger(trans, btree_id, level, bkey_i_to_s_c(&deleted), new, 117 - BTREE_TRIGGER_INSERT|flags); 160 + BTREE_TRIGGER_insert|flags); 118 161 } 119 162 120 163 void bch2_bkey_renumber(enum btree_node_type, struct bkey_packed *, int);
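The bch2_key_trigger_old()/bch2_key_trigger_new() wrappers kept above show a pattern worth noting: a trigger always receives an (old, new) pair, so a pure delete or pure insert is expressed by synthesizing a deleted key at the same position for the missing side, with BTREE_TRIGGER_overwrite or BTREE_TRIGGER_insert identifying which side is real. A stripped-down illustration of the same shape (toy types, not the kernel API):

    #include <stdio.h>

    struct key { int pos; int live; };  /* live == 0 models a deleted key */

    typedef void (*trigger_fn)(struct key old, struct key new);

    static void trigger_new(trigger_fn fn, struct key new)
    {
        struct key deleted = { .pos = new.pos };
        fn(deleted, new);               /* insert: old side is synthetic */
    }

    static void trigger_old(trigger_fn fn, struct key old)
    {
        struct key deleted = { .pos = old.pos };
        fn(old, deleted);               /* overwrite: new side is synthetic */
    }

    static void count(struct key old, struct key new)
    {
        printf("delta = %d\n", new.live - old.live);
    }

    int main(void)
    {
        trigger_new(count, (struct key) { .pos = 1, .live = 1 });  /* +1 */
        trigger_old(count, (struct key) { .pos = 1, .live = 1 });  /* -1 */
        return 0;
    }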
+47 -34
fs/bcachefs/bkey_sort.c
··· 6 6 #include "bset.h" 7 7 #include "extents.h" 8 8 9 - typedef int (*sort_cmp_fn)(struct btree *, 10 - struct bkey_packed *, 11 - struct bkey_packed *); 9 + typedef int (*sort_cmp_fn)(const struct btree *, 10 + const struct bkey_packed *, 11 + const struct bkey_packed *); 12 12 13 13 static inline bool sort_iter_end(struct sort_iter *iter) 14 14 { ··· 70 70 /* 71 71 * If keys compare equal, compare by pointer order: 72 72 */ 73 - static inline int key_sort_fix_overlapping_cmp(struct btree *b, 74 - struct bkey_packed *l, 75 - struct bkey_packed *r) 73 + static inline int key_sort_fix_overlapping_cmp(const struct btree *b, 74 + const struct bkey_packed *l, 75 + const struct bkey_packed *r) 76 76 { 77 77 return bch2_bkey_cmp_packed(b, l, r) ?: 78 78 cmp_int((unsigned long) l, (unsigned long) r); ··· 154 154 return nr; 155 155 } 156 156 157 - static inline int sort_keys_cmp(struct btree *b, 158 - struct bkey_packed *l, 159 - struct bkey_packed *r) 157 + static inline int keep_unwritten_whiteouts_cmp(const struct btree *b, 158 + const struct bkey_packed *l, 159 + const struct bkey_packed *r) 160 160 { 161 161 return bch2_bkey_cmp_packed_inlined(b, l, r) ?: 162 162 (int) bkey_deleted(r) - (int) bkey_deleted(l) ?: 163 - (int) l->needs_whiteout - (int) r->needs_whiteout; 163 + (long) l - (long) r; 164 164 } 165 165 166 - unsigned bch2_sort_keys(struct bkey_packed *dst, 167 - struct sort_iter *iter, 168 - bool filter_whiteouts) 166 + #include "btree_update_interior.h" 167 + 168 + /* 169 + * For sorting in the btree node write path: whiteouts not in the unwritten 170 + * whiteouts area are dropped, whiteouts in the unwritten whiteouts area are 171 + * dropped if overwritten by real keys: 172 + */ 173 + unsigned bch2_sort_keys_keep_unwritten_whiteouts(struct bkey_packed *dst, struct sort_iter *iter) 169 174 { 170 - const struct bkey_format *f = &iter->b->format; 171 175 struct bkey_packed *in, *next, *out = dst; 172 176 173 - sort_iter_sort(iter, sort_keys_cmp); 177 + sort_iter_sort(iter, keep_unwritten_whiteouts_cmp); 174 178 175 - while ((in = sort_iter_next(iter, sort_keys_cmp))) { 176 - bool needs_whiteout = false; 177 - 178 - if (bkey_deleted(in) && 179 - (filter_whiteouts || !in->needs_whiteout)) 179 + while ((in = sort_iter_next(iter, keep_unwritten_whiteouts_cmp))) { 180 + if (bkey_deleted(in) && in < unwritten_whiteouts_start(iter->b)) 180 181 continue; 181 182 182 - while ((next = sort_iter_peek(iter)) && 183 - !bch2_bkey_cmp_packed_inlined(iter->b, in, next)) { 184 - BUG_ON(in->needs_whiteout && 185 - next->needs_whiteout); 186 - needs_whiteout |= in->needs_whiteout; 187 - in = sort_iter_next(iter, sort_keys_cmp); 188 - } 183 + if ((next = sort_iter_peek(iter)) && 184 + !bch2_bkey_cmp_packed_inlined(iter->b, in, next)) 185 + continue; 189 186 190 - if (bkey_deleted(in)) { 191 - memcpy_u64s_small(out, in, bkeyp_key_u64s(f, in)); 192 - set_bkeyp_val_u64s(f, out, 0); 193 - } else { 194 - bkey_p_copy(out, in); 195 - } 196 - out->needs_whiteout |= needs_whiteout; 187 + bkey_p_copy(out, in); 188 + out = bkey_p_next(out); 189 + } 190 + 191 + return (u64 *) out - (u64 *) dst; 192 + } 193 + 194 + /* 195 + * Main sort routine for compacting a btree node in memory: we always drop 196 + * whiteouts because any whiteouts that need to be written are in the unwritten 197 + * whiteouts area: 198 + */ 199 + unsigned bch2_sort_keys(struct bkey_packed *dst, struct sort_iter *iter) 200 + { 201 + struct bkey_packed *in, *out = dst; 202 + 203 + sort_iter_sort(iter, bch2_bkey_cmp_packed_inlined); 204 + 205 + 
while ((in = sort_iter_next(iter, bch2_bkey_cmp_packed_inlined))) { 206 + if (bkey_deleted(in)) 207 + continue; 208 + 209 + bkey_p_copy(out, in); 197 210 out = bkey_p_next(out); 198 211 } 199 212
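The rewritten sort paths simplify whiteout handling: bch2_sort_keys() now drops all whiteouts (anything that still needs writing lives in the unwritten whiteouts area), and keep_unwritten_whiteouts_cmp() orders equal keys deterministically (whiteouts first, then by address), letting the copy loop skip an element whenever the next one compares equal, i.e. keep only the last duplicate. The same keep-last idiom on a plain sorted array:

    #include <assert.h>
    #include <stddef.h>

    struct elem { int key; int val; };

    /* keep the last of each run of equal keys; returns the new length: */
    static size_t keep_last_dup(struct elem *v, size_t n)
    {
        size_t out = 0;

        for (size_t i = 0; i < n; i++) {
            if (i + 1 < n && v[i + 1].key == v[i].key)
                continue;   /* a later duplicate supersedes this one */
            v[out++] = v[i];
        }
        return out;
    }

    int main(void)
    {
        struct elem v[] = { {1, 10}, {2, 20}, {2, 21}, {3, 30} };
        size_t n = keep_last_dup(v, 4);

        assert(n == 3 && v[1].val == 21);   /* newer value for key 2 won */
        return 0;
    }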
+2 -2
fs/bcachefs/bkey_sort.h
··· 48 48 struct btree_node_iter *, 49 49 struct bkey_format *, bool); 50 50 51 - unsigned bch2_sort_keys(struct bkey_packed *, 52 - struct sort_iter *, bool); 51 + unsigned bch2_sort_keys_keep_unwritten_whiteouts(struct bkey_packed *, struct sort_iter *); 52 + unsigned bch2_sort_keys(struct bkey_packed *, struct sort_iter *); 53 53 54 54 #endif /* _BCACHEFS_BKEY_SORT_H */
+10 -19
fs/bcachefs/bset.c
··· 103 103 104 104 void bch2_dump_btree_node(struct bch_fs *c, struct btree *b) 105 105 { 106 - struct bset_tree *t; 107 - 108 106 console_lock(); 109 107 for_each_bset(b, t) 110 108 bch2_dump_bset(c, b, bset(b, t), t - b->set); ··· 134 136 135 137 struct btree_nr_keys bch2_btree_node_count_keys(struct btree *b) 136 138 { 137 - struct bset_tree *t; 138 139 struct bkey_packed *k; 139 140 struct btree_nr_keys nr = {}; 140 141 ··· 195 198 { 196 199 struct btree_node_iter_set *set, *s2; 197 200 struct bkey_packed *k, *p; 198 - struct bset_tree *t; 199 201 200 202 if (bch2_btree_node_iter_end(iter)) 201 203 return; ··· 209 213 /* Verify that set->end is correct: */ 210 214 btree_node_iter_for_each(iter, set) { 211 215 for_each_bset(b, t) 212 - if (set->end == t->end_offset) 216 + if (set->end == t->end_offset) { 217 + BUG_ON(set->k < btree_bkey_first_offset(t) || 218 + set->k >= t->end_offset); 213 219 goto found; 220 + } 214 221 BUG(); 215 222 found: 216 - BUG_ON(set->k < btree_bkey_first_offset(t) || 217 - set->k >= t->end_offset); 223 + do {} while (0); 218 224 } 219 225 220 226 /* Verify iterator is sorted: */ ··· 375 377 return ro_aux_tree_base(b, t)->f + idx; 376 378 } 377 379 378 - static void bset_aux_tree_verify(const struct btree *b) 380 + static void bset_aux_tree_verify(struct btree *b) 379 381 { 380 382 #ifdef CONFIG_BCACHEFS_DEBUG 381 - const struct bset_tree *t; 382 - 383 383 for_each_bset(b, t) { 384 384 if (t->aux_data_offset == U16_MAX) 385 385 continue; ··· 681 685 } 682 686 683 687 /* bytes remaining - only valid for last bset: */ 684 - static unsigned __bset_tree_capacity(const struct btree *b, const struct bset_tree *t) 688 + static unsigned __bset_tree_capacity(struct btree *b, const struct bset_tree *t) 685 689 { 686 690 bset_aux_tree_verify(b); 687 691 688 692 return btree_aux_data_bytes(b) - t->aux_data_offset * sizeof(u64); 689 693 } 690 694 691 - static unsigned bset_ro_tree_capacity(const struct btree *b, const struct bset_tree *t) 695 + static unsigned bset_ro_tree_capacity(struct btree *b, const struct bset_tree *t) 692 696 { 693 697 return __bset_tree_capacity(b, t) / 694 698 (sizeof(struct bkey_float) + sizeof(u8)); 695 699 } 696 700 697 - static unsigned bset_rw_tree_capacity(const struct btree *b, const struct bset_tree *t) 701 + static unsigned bset_rw_tree_capacity(struct btree *b, const struct bset_tree *t) 698 702 { 699 703 return __bset_tree_capacity(b, t) / sizeof(struct rw_aux_tree); 700 704 } ··· 1370 1374 void bch2_btree_node_iter_init_from_start(struct btree_node_iter *iter, 1371 1375 struct btree *b) 1372 1376 { 1373 - struct bset_tree *t; 1374 - 1375 1377 memset(iter, 0, sizeof(*iter)); 1376 1378 1377 1379 for_each_bset(b, t) ··· 1475 1481 { 1476 1482 struct bkey_packed *k, *prev = NULL; 1477 1483 struct btree_node_iter_set *set; 1478 - struct bset_tree *t; 1479 1484 unsigned end = 0; 1480 1485 1481 1486 if (bch2_expensive_debug_checks) ··· 1543 1550 1544 1551 void bch2_btree_keys_stats(const struct btree *b, struct bset_stats *stats) 1545 1552 { 1546 - const struct bset_tree *t; 1547 - 1548 - for_each_bset(b, t) { 1553 + for_each_bset_c(b, t) { 1549 1554 enum bset_aux_tree_type type = bset_aux_tree_type(t); 1550 1555 size_t j; 1551 1556
+4 -2
fs/bcachefs/bset.h
··· 206 206 } 207 207 208 208 #define for_each_bset(_b, _t) \ 209 - for (_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++) 209 + for (struct bset_tree *_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++) 210 + 211 + #define for_each_bset_c(_b, _t) \ 212 + for (const struct bset_tree *_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++) 210 213 211 214 #define bset_tree_for_each_key(_b, _t, _k) \ 212 215 for (_k = btree_bkey_first(_b, _t); \ ··· 297 294 bch2_bkey_to_bset_inlined(struct btree *b, struct bkey_packed *k) 298 295 { 299 296 unsigned offset = __btree_node_key_to_offset(b, k); 300 - struct bset_tree *t; 301 297 302 298 for_each_bset(b, t) 303 299 if (offset <= t->end_offset) {
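The for_each_bset() change is a small but pervasive cleanup: the macro now declares its own loop variable (C99 style), which is why every struct bset_tree *t local disappears from bset.c, and a const variant (for_each_bset_c) covers read-only users. It is also why a null statement now follows the found: label in bset.c, since a label needs a statement to attach to once the declaration moved into the macro. The pattern, demonstrated on a toy container:

    #include <stdio.h>

    struct item { int x; };
    struct vec  { struct item items[4]; unsigned nr; };

    /* the iterator is scoped to the loop itself: */
    #define for_each_item(_v, _it) \
        for (struct item *_it = (_v)->items; \
             _it < (_v)->items + (_v)->nr; _it++)

    int main(void)
    {
        struct vec v = { .items = { {1}, {2}, {3} }, .nr = 3 };

        for_each_item(&v, it)
            printf("%d\n", it->x);
        /* `it` doesn't exist here, unlike with a caller-declared cursor */
        return 0;
    }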
fs/bcachefs/btree_cache.c  +123 -26
··· 16 16 #include <linux/prefetch.h> 17 17 #include <linux/sched/mm.h> 18 18 19 + #define BTREE_CACHE_NOT_FREED_INCREMENT(counter) \ 20 + do { \ 21 + if (shrinker_counter) \ 22 + bc->not_freed_##counter++; \ 23 + } while (0) 24 + 19 25 const char * const bch2_btree_node_flags[] = { 20 26 #define x(f) #f, 21 27 BTREE_FLAGS() ··· 168 162 169 163 /* Cause future lookups for this node to fail: */ 170 164 b->hash_val = 0; 165 + 166 + if (b->c.btree_id < BTREE_ID_NR) 167 + --bc->used_by_btree[b->c.btree_id]; 171 168 } 172 169 173 170 int __bch2_btree_node_hash_insert(struct btree_cache *bc, struct btree *b) ··· 178 169 BUG_ON(b->hash_val); 179 170 b->hash_val = btree_ptr_hash_val(&b->key); 180 171 181 - return rhashtable_lookup_insert_fast(&bc->table, &b->hash, 182 - bch_btree_cache_params); 172 + int ret = rhashtable_lookup_insert_fast(&bc->table, &b->hash, 173 + bch_btree_cache_params); 174 + if (!ret && b->c.btree_id < BTREE_ID_NR) 175 + bc->used_by_btree[b->c.btree_id]++; 176 + return ret; 183 177 } 184 178 185 179 int bch2_btree_node_hash_insert(struct btree_cache *bc, struct btree *b, ··· 202 190 return ret; 203 191 } 204 192 193 + void bch2_btree_node_update_key_early(struct btree_trans *trans, 194 + enum btree_id btree, unsigned level, 195 + struct bkey_s_c old, struct bkey_i *new) 196 + { 197 + struct bch_fs *c = trans->c; 198 + struct btree *b; 199 + struct bkey_buf tmp; 200 + int ret; 201 + 202 + bch2_bkey_buf_init(&tmp); 203 + bch2_bkey_buf_reassemble(&tmp, c, old); 204 + 205 + b = bch2_btree_node_get_noiter(trans, tmp.k, btree, level, true); 206 + if (!IS_ERR_OR_NULL(b)) { 207 + mutex_lock(&c->btree_cache.lock); 208 + 209 + bch2_btree_node_hash_remove(&c->btree_cache, b); 210 + 211 + bkey_copy(&b->key, new); 212 + ret = __bch2_btree_node_hash_insert(&c->btree_cache, b); 213 + BUG_ON(ret); 214 + 215 + mutex_unlock(&c->btree_cache.lock); 216 + six_unlock_read(&b->c.lock); 217 + } 218 + 219 + bch2_bkey_buf_exit(&tmp, c); 220 + } 221 + 205 222 __flatten 206 223 static inline struct btree *btree_cache_find(struct btree_cache *bc, 207 224 const struct bkey_i *k) ··· 244 203 * this version is for btree nodes that have already been freed (we're not 245 204 * reaping a real btree node) 246 205 */ 247 - static int __btree_node_reclaim(struct bch_fs *c, struct btree *b, bool flush) 206 + static int __btree_node_reclaim(struct bch_fs *c, struct btree *b, bool flush, bool shrinker_counter) 248 207 { 249 208 struct btree_cache *bc = &c->btree_cache; 250 209 int ret = 0; ··· 266 225 if (b->flags & ((1U << BTREE_NODE_dirty)| 267 226 (1U << BTREE_NODE_read_in_flight)| 268 227 (1U << BTREE_NODE_write_in_flight))) { 269 - if (!flush) 228 + if (!flush) { 229 + if (btree_node_dirty(b)) 230 + BTREE_CACHE_NOT_FREED_INCREMENT(dirty); 231 + else if (btree_node_read_in_flight(b)) 232 + BTREE_CACHE_NOT_FREED_INCREMENT(read_in_flight); 233 + else if (btree_node_write_in_flight(b)) 234 + BTREE_CACHE_NOT_FREED_INCREMENT(write_in_flight); 270 235 return -BCH_ERR_ENOMEM_btree_node_reclaim; 236 + } 271 237 272 238 /* XXX: waiting on IO with btree cache lock held */ 273 239 bch2_btree_node_wait_on_read(b); 274 240 bch2_btree_node_wait_on_write(b); 275 241 } 276 242 277 - if (!six_trylock_intent(&b->c.lock)) 243 + if (!six_trylock_intent(&b->c.lock)) { 244 + BTREE_CACHE_NOT_FREED_INCREMENT(lock_intent); 278 245 return -BCH_ERR_ENOMEM_btree_node_reclaim; 246 + } 279 247 280 - if (!six_trylock_write(&b->c.lock)) 248 + if (!six_trylock_write(&b->c.lock)) { 249 + BTREE_CACHE_NOT_FREED_INCREMENT(lock_write); 281 250 goto 
out_unlock_intent; 251 + } 282 252 283 253 /* recheck under lock */ 284 254 if (b->flags & ((1U << BTREE_NODE_read_in_flight)| 285 255 (1U << BTREE_NODE_write_in_flight))) { 286 - if (!flush) 256 + if (!flush) { 257 + if (btree_node_read_in_flight(b)) 258 + BTREE_CACHE_NOT_FREED_INCREMENT(read_in_flight); 259 + else if (btree_node_write_in_flight(b)) 260 + BTREE_CACHE_NOT_FREED_INCREMENT(write_in_flight); 287 261 goto out_unlock; 262 + } 288 263 six_unlock_write(&b->c.lock); 289 264 six_unlock_intent(&b->c.lock); 290 265 goto wait_on_io; 291 266 } 292 267 293 - if (btree_node_noevict(b) || 294 - btree_node_write_blocked(b) || 295 - btree_node_will_make_reachable(b)) 268 + if (btree_node_noevict(b)) { 269 + BTREE_CACHE_NOT_FREED_INCREMENT(noevict); 296 270 goto out_unlock; 271 + } 272 + if (btree_node_write_blocked(b)) { 273 + BTREE_CACHE_NOT_FREED_INCREMENT(write_blocked); 274 + goto out_unlock; 275 + } 276 + if (btree_node_will_make_reachable(b)) { 277 + BTREE_CACHE_NOT_FREED_INCREMENT(will_make_reachable); 278 + goto out_unlock; 279 + } 297 280 298 281 if (btree_node_dirty(b)) { 299 - if (!flush) 282 + if (!flush) { 283 + BTREE_CACHE_NOT_FREED_INCREMENT(dirty); 300 284 goto out_unlock; 285 + } 301 286 /* 302 287 * Using the underscore version because we don't want to compact 303 288 * bsets after the write, since this node is about to be evicted ··· 353 286 goto out; 354 287 } 355 288 356 - static int btree_node_reclaim(struct bch_fs *c, struct btree *b) 289 + static int btree_node_reclaim(struct bch_fs *c, struct btree *b, bool shrinker_counter) 357 290 { 358 - return __btree_node_reclaim(c, b, false); 291 + return __btree_node_reclaim(c, b, false, shrinker_counter); 359 292 } 360 293 361 294 static int btree_node_write_and_reclaim(struct bch_fs *c, struct btree *b) 362 295 { 363 - return __btree_node_reclaim(c, b, true); 296 + return __btree_node_reclaim(c, b, true, false); 364 297 } 365 298 366 299 static unsigned long bch2_btree_cache_scan(struct shrinker *shrink, ··· 408 341 if (touched >= nr) 409 342 goto out; 410 343 411 - if (!btree_node_reclaim(c, b)) { 344 + if (!btree_node_reclaim(c, b, true)) { 412 345 btree_node_data_free(c, b); 413 346 six_unlock_write(&b->c.lock); 414 347 six_unlock_intent(&b->c.lock); 415 348 freed++; 349 + bc->freed++; 416 350 } 417 351 } 418 352 restart: ··· 422 354 423 355 if (btree_node_accessed(b)) { 424 356 clear_btree_node_accessed(b); 425 - } else if (!btree_node_reclaim(c, b)) { 357 + bc->not_freed_access_bit++; 358 + } else if (!btree_node_reclaim(c, b, true)) { 426 359 freed++; 427 360 btree_node_data_free(c, b); 361 + bc->freed++; 428 362 429 363 bch2_btree_node_hash_remove(bc, b); 430 364 six_unlock_write(&b->c.lock); ··· 634 564 struct btree *b; 635 565 636 566 list_for_each_entry_reverse(b, &bc->live, list) 637 - if (!btree_node_reclaim(c, b)) 567 + if (!btree_node_reclaim(c, b, false)) 638 568 return b; 639 569 640 570 while (1) { ··· 670 600 * disk node. Check the freed list before allocating a new one: 671 601 */ 672 602 list_for_each_entry(b, freed, list) 673 - if (!btree_node_reclaim(c, b)) { 603 + if (!btree_node_reclaim(c, b, false)) { 674 604 list_del_init(&b->list); 675 605 goto got_node; 676 606 } ··· 696 626 * the list. 
Check if there's any freed nodes there: 697 627 */ 698 628 list_for_each_entry(b2, &bc->freeable, list) 699 - if (!btree_node_reclaim(c, b2)) { 629 + if (!btree_node_reclaim(c, b2, false)) { 700 630 swap(b->data, b2->data); 701 631 swap(b->aux_data, b2->aux_data); 702 632 btree_node_to_freedlist(bc, b2); ··· 916 846 struct bch_fs *c = trans->c; 917 847 struct btree_cache *bc = &c->btree_cache; 918 848 struct btree *b; 919 - struct bset_tree *t; 920 849 bool need_relock = false; 921 850 int ret; 922 851 ··· 1035 966 { 1036 967 struct bch_fs *c = trans->c; 1037 968 struct btree *b; 1038 - struct bset_tree *t; 1039 969 int ret; 1040 970 1041 971 EBUG_ON(level >= BTREE_MAX_DEPTH); ··· 1111 1043 struct bch_fs *c = trans->c; 1112 1044 struct btree_cache *bc = &c->btree_cache; 1113 1045 struct btree *b; 1114 - struct bset_tree *t; 1115 1046 int ret; 1116 1047 1117 1048 EBUG_ON(level >= BTREE_MAX_DEPTH); ··· 1307 1240 stats.failed); 1308 1241 } 1309 1242 1310 - void bch2_btree_cache_to_text(struct printbuf *out, const struct bch_fs *c) 1243 + static void prt_btree_cache_line(struct printbuf *out, const struct bch_fs *c, 1244 + const char *label, unsigned nr) 1311 1245 { 1312 - prt_printf(out, "nr nodes:\t\t%u\n", c->btree_cache.used); 1313 - prt_printf(out, "nr dirty:\t\t%u\n", atomic_read(&c->btree_cache.dirty)); 1314 - prt_printf(out, "cannibalize lock:\t%p\n", c->btree_cache.alloc_lock); 1246 + prt_printf(out, "%s\t", label); 1247 + prt_human_readable_u64(out, nr * c->opts.btree_node_size); 1248 + prt_printf(out, " (%u)\n", nr); 1249 + } 1250 + 1251 + void bch2_btree_cache_to_text(struct printbuf *out, const struct btree_cache *bc) 1252 + { 1253 + struct bch_fs *c = container_of(bc, struct bch_fs, btree_cache); 1254 + 1255 + if (!out->nr_tabstops) 1256 + printbuf_tabstop_push(out, 32); 1257 + 1258 + prt_btree_cache_line(out, c, "total:", bc->used); 1259 + prt_btree_cache_line(out, c, "nr dirty:", atomic_read(&bc->dirty)); 1260 + prt_printf(out, "cannibalize lock:\t%p\n", bc->alloc_lock); 1261 + prt_newline(out); 1262 + 1263 + for (unsigned i = 0; i < ARRAY_SIZE(bc->used_by_btree); i++) 1264 + prt_btree_cache_line(out, c, bch2_btree_id_str(i), bc->used_by_btree[i]); 1265 + 1266 + prt_newline(out); 1267 + prt_printf(out, "freed:\t%u\n", bc->freed); 1268 + prt_printf(out, "not freed:\n"); 1269 + prt_printf(out, " dirty\t%u\n", bc->not_freed_dirty); 1270 + prt_printf(out, " write in flight\t%u\n", bc->not_freed_write_in_flight); 1271 + prt_printf(out, " read in flight\t%u\n", bc->not_freed_read_in_flight); 1272 + prt_printf(out, " lock intent failed\t%u\n", bc->not_freed_lock_intent); 1273 + prt_printf(out, " lock write failed\t%u\n", bc->not_freed_lock_write); 1274 + prt_printf(out, " access bit\t%u\n", bc->not_freed_access_bit); 1275 + prt_printf(out, " no evict failed\t%u\n", bc->not_freed_noevict); 1276 + prt_printf(out, " write blocked\t%u\n", bc->not_freed_write_blocked); 1277 + prt_printf(out, " will make reachable\t%u\n", bc->not_freed_will_make_reachable); 1315 1278 }
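The new BTREE_CACHE_NOT_FREED_INCREMENT() macro is the heart of the "counters for failed shrinker reclaim" change: every early-exit path in __btree_node_reclaim() now records why a node could not be freed, but only when the caller is the shrinker (`shrinker_counter == true`), so internal reclaim on behalf of the allocator does not pollute the stats. A stand-alone sketch of the pattern, with a toy struct in place of struct btree_cache:

#include <stdio.h>

struct btree_cache_toy {
    unsigned not_freed_dirty;
    unsigned not_freed_lock_intent;
};

#define NOT_FREED_INCREMENT(bc, shrinker_counter, counter)  \
do {                                                        \
    if (shrinker_counter)                                   \
        (bc)->not_freed_##counter++;                        \
} while (0)

int main(void)
{
    struct btree_cache_toy bc = { 0 };

    NOT_FREED_INCREMENT(&bc, 1, dirty);         /* shrinker path: counted */
    NOT_FREED_INCREMENT(&bc, 0, lock_intent);   /* internal path: not counted */

    printf("dirty=%u lock_intent=%u\n",
           bc.not_freed_dirty, bc.not_freed_lock_intent);
    return 0;
}

The do/while(0) wrapper keeps the macro usable as a single statement under an unbraced `if`, and the `##` token paste selects the right counter field.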
fs/bcachefs/btree_cache.h  +4 -1
··· 17 17 int bch2_btree_node_hash_insert(struct btree_cache *, struct btree *, 18 18 unsigned, enum btree_id); 19 19 20 + void bch2_btree_node_update_key_early(struct btree_trans *, enum btree_id, unsigned, 21 + struct bkey_s_c, struct bkey_i *); 22 + 20 23 void bch2_btree_cache_cannibalize_unlock(struct btree_trans *); 21 24 int bch2_btree_cache_cannibalize_lock(struct btree_trans *, struct closure *); 22 25 ··· 134 131 const char *bch2_btree_id_str(enum btree_id); 135 132 void bch2_btree_pos_to_text(struct printbuf *, struct bch_fs *, const struct btree *); 136 133 void bch2_btree_node_to_text(struct printbuf *, struct bch_fs *, const struct btree *); 137 - void bch2_btree_cache_to_text(struct printbuf *, const struct bch_fs *); 134 + void bch2_btree_cache_to_text(struct printbuf *, const struct btree_cache *); 138 135 139 136 #endif /* _BCACHEFS_BTREE_CACHE_H */
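Note the prototype change above: bch2_btree_cache_to_text() now takes the btree_cache itself, and the implementation recovers the owning bch_fs with container_of() when it needs fs-wide state such as opts.btree_node_size. A minimal illustration of that pattern with stand-in types (the kernel's container_of() additionally type-checks the member):

#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct btree_cache_toy { unsigned used; };

struct fs_toy {
    int id;
    struct btree_cache_toy btree_cache;
};

static void cache_to_text(struct btree_cache_toy *bc)
{
    struct fs_toy *c = container_of(bc, struct fs_toy, btree_cache);

    printf("fs %d: %u cached nodes\n", c->id, bc->used);
}

int main(void)
{
    struct fs_toy fs = { .id = 7, .btree_cache = { .used = 42 } };

    cache_to_text(&fs.btree_cache);
    return 0;
}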
fs/bcachefs/btree_gc.c  +204 -846
··· 52 52 }}}; 53 53 } 54 54 55 - static bool should_restart_for_topology_repair(struct bch_fs *c) 56 - { 57 - return c->opts.fix_errors != FSCK_FIX_no && 58 - !(c->recovery_passes_complete & BIT_ULL(BCH_RECOVERY_PASS_check_topology)); 59 - } 60 - 61 55 static inline void __gc_pos_set(struct bch_fs *c, struct gc_pos new_pos) 62 56 { 63 57 preempt_disable(); ··· 63 69 64 70 static inline void gc_pos_set(struct bch_fs *c, struct gc_pos new_pos) 65 71 { 66 - BUG_ON(gc_pos_cmp(new_pos, c->gc_pos) <= 0); 72 + BUG_ON(gc_pos_cmp(new_pos, c->gc_pos) < 0); 67 73 __gc_pos_set(c, new_pos); 68 74 } 69 75 ··· 89 95 default: 90 96 BUG(); 91 97 } 92 - } 93 - 94 - static void bch2_btree_node_update_key_early(struct btree_trans *trans, 95 - enum btree_id btree, unsigned level, 96 - struct bkey_s_c old, struct bkey_i *new) 97 - { 98 - struct bch_fs *c = trans->c; 99 - struct btree *b; 100 - struct bkey_buf tmp; 101 - int ret; 102 - 103 - bch2_bkey_buf_init(&tmp); 104 - bch2_bkey_buf_reassemble(&tmp, c, old); 105 - 106 - b = bch2_btree_node_get_noiter(trans, tmp.k, btree, level, true); 107 - if (!IS_ERR_OR_NULL(b)) { 108 - mutex_lock(&c->btree_cache.lock); 109 - 110 - bch2_btree_node_hash_remove(&c->btree_cache, b); 111 - 112 - bkey_copy(&b->key, new); 113 - ret = __bch2_btree_node_hash_insert(&c->btree_cache, b); 114 - BUG_ON(ret); 115 - 116 - mutex_unlock(&c->btree_cache.lock); 117 - six_unlock_read(&b->c.lock); 118 - } 119 - 120 - bch2_bkey_buf_exit(&tmp, c); 121 98 } 122 99 123 100 static int set_node_min(struct bch_fs *c, struct btree *b, struct bpos new_min) ··· 511 546 if (!bch2_btree_has_scanned_nodes(c, i)) { 512 547 mustfix_fsck_err(c, btree_root_unreadable_and_scan_found_nothing, 513 548 "no nodes found for btree %s, continue?", bch2_btree_id_str(i)); 514 - bch2_btree_root_alloc_fake(c, i, 0); 549 + bch2_btree_root_alloc_fake_trans(trans, i, 0); 515 550 } else { 516 - bch2_btree_root_alloc_fake(c, i, 1); 551 + bch2_btree_root_alloc_fake_trans(trans, i, 1); 517 552 bch2_shoot_down_journal_keys(c, i, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); 518 553 ret = bch2_get_scanned_nodes(c, i, 0, POS_MIN, SPOS_MAX); 519 554 if (ret) ··· 541 576 goto reconstruct_root; 542 577 543 578 bch_err(c, "empty btree root %s", bch2_btree_id_str(i)); 544 - bch2_btree_root_alloc_fake(c, i, 0); 579 + bch2_btree_root_alloc_fake_trans(trans, i, 0); 545 580 r->alive = false; 546 581 ret = 0; 547 582 } ··· 551 586 return ret; 552 587 } 553 588 554 - static int bch2_check_fix_ptrs(struct btree_trans *trans, enum btree_id btree_id, 555 - unsigned level, bool is_root, 556 - struct bkey_s_c *k) 557 - { 558 - struct bch_fs *c = trans->c; 559 - struct bkey_ptrs_c ptrs_c = bch2_bkey_ptrs_c(*k); 560 - const union bch_extent_entry *entry_c; 561 - struct extent_ptr_decoded p = { 0 }; 562 - bool do_update = false; 563 - struct printbuf buf = PRINTBUF; 564 - int ret = 0; 565 - 566 - /* 567 - * XXX 568 - * use check_bucket_ref here 569 - */ 570 - bkey_for_each_ptr_decode(k->k, ptrs_c, p, entry_c) { 571 - struct bch_dev *ca = bch_dev_bkey_exists(c, p.ptr.dev); 572 - struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); 573 - enum bch_data_type data_type = bch2_bkey_ptr_data_type(*k, p, entry_c); 574 - 575 - if (fsck_err_on(!g->gen_valid, 576 - c, ptr_to_missing_alloc_key, 577 - "bucket %u:%zu data type %s ptr gen %u missing in alloc btree\n" 578 - "while marking %s", 579 - p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), 580 - bch2_data_type_str(ptr_data_type(k->k, &p.ptr)), 581 - p.ptr.gen, 582 - (printbuf_reset(&buf), 583 - bch2_bkey_val_to_text(&buf, c, *k), 
buf.buf))) { 584 - if (!p.ptr.cached) { 585 - g->gen_valid = true; 586 - g->gen = p.ptr.gen; 587 - } else { 588 - do_update = true; 589 - } 590 - } 591 - 592 - if (fsck_err_on(gen_cmp(p.ptr.gen, g->gen) > 0, 593 - c, ptr_gen_newer_than_bucket_gen, 594 - "bucket %u:%zu data type %s ptr gen in the future: %u > %u\n" 595 - "while marking %s", 596 - p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), 597 - bch2_data_type_str(ptr_data_type(k->k, &p.ptr)), 598 - p.ptr.gen, g->gen, 599 - (printbuf_reset(&buf), 600 - bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) { 601 - if (!p.ptr.cached) { 602 - g->gen_valid = true; 603 - g->gen = p.ptr.gen; 604 - g->data_type = 0; 605 - g->dirty_sectors = 0; 606 - g->cached_sectors = 0; 607 - set_bit(BCH_FS_need_another_gc, &c->flags); 608 - } else { 609 - do_update = true; 610 - } 611 - } 612 - 613 - if (fsck_err_on(gen_cmp(g->gen, p.ptr.gen) > BUCKET_GC_GEN_MAX, 614 - c, ptr_gen_newer_than_bucket_gen, 615 - "bucket %u:%zu gen %u data type %s: ptr gen %u too stale\n" 616 - "while marking %s", 617 - p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), g->gen, 618 - bch2_data_type_str(ptr_data_type(k->k, &p.ptr)), 619 - p.ptr.gen, 620 - (printbuf_reset(&buf), 621 - bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) 622 - do_update = true; 623 - 624 - if (fsck_err_on(!p.ptr.cached && gen_cmp(p.ptr.gen, g->gen) < 0, 625 - c, stale_dirty_ptr, 626 - "bucket %u:%zu data type %s stale dirty ptr: %u < %u\n" 627 - "while marking %s", 628 - p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), 629 - bch2_data_type_str(ptr_data_type(k->k, &p.ptr)), 630 - p.ptr.gen, g->gen, 631 - (printbuf_reset(&buf), 632 - bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) 633 - do_update = true; 634 - 635 - if (data_type != BCH_DATA_btree && p.ptr.gen != g->gen) 636 - continue; 637 - 638 - if (fsck_err_on(bucket_data_type(g->data_type) && 639 - bucket_data_type(g->data_type) != 640 - bucket_data_type(data_type), c, 641 - ptr_bucket_data_type_mismatch, 642 - "bucket %u:%zu different types of data in same bucket: %s, %s\n" 643 - "while marking %s", 644 - p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), 645 - bch2_data_type_str(g->data_type), 646 - bch2_data_type_str(data_type), 647 - (printbuf_reset(&buf), 648 - bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) { 649 - if (data_type == BCH_DATA_btree) { 650 - g->data_type = data_type; 651 - set_bit(BCH_FS_need_another_gc, &c->flags); 652 - } else { 653 - do_update = true; 654 - } 655 - } 656 - 657 - if (p.has_ec) { 658 - struct gc_stripe *m = genradix_ptr(&c->gc_stripes, p.ec.idx); 659 - 660 - if (fsck_err_on(!m || !m->alive, c, 661 - ptr_to_missing_stripe, 662 - "pointer to nonexistent stripe %llu\n" 663 - "while marking %s", 664 - (u64) p.ec.idx, 665 - (printbuf_reset(&buf), 666 - bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) 667 - do_update = true; 668 - 669 - if (fsck_err_on(m && m->alive && !bch2_ptr_matches_stripe_m(m, p), c, 670 - ptr_to_incorrect_stripe, 671 - "pointer does not match stripe %llu\n" 672 - "while marking %s", 673 - (u64) p.ec.idx, 674 - (printbuf_reset(&buf), 675 - bch2_bkey_val_to_text(&buf, c, *k), buf.buf))) 676 - do_update = true; 677 - } 678 - } 679 - 680 - if (do_update) { 681 - if (is_root) { 682 - bch_err(c, "cannot update btree roots yet"); 683 - ret = -EINVAL; 684 - goto err; 685 - } 686 - 687 - struct bkey_i *new = kmalloc(bkey_bytes(k->k), GFP_KERNEL); 688 - if (!new) { 689 - ret = -BCH_ERR_ENOMEM_gc_repair_key; 690 - bch_err_msg(c, ret, "allocating new key"); 691 - goto err; 692 - } 693 - 694 - bkey_reassemble(new, *k); 695 - 696 - if (level) { 697 - /* 698 - * We 
don't want to drop btree node pointers - if the 699 - * btree node isn't there anymore, the read path will 700 - * sort it out: 701 - */ 702 - struct bkey_ptrs ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 703 - bkey_for_each_ptr(ptrs, ptr) { 704 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 705 - struct bucket *g = PTR_GC_BUCKET(ca, ptr); 706 - 707 - ptr->gen = g->gen; 708 - } 709 - } else { 710 - struct bkey_ptrs ptrs; 711 - union bch_extent_entry *entry; 712 - restart_drop_ptrs: 713 - ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 714 - bkey_for_each_ptr_decode(bkey_i_to_s(new).k, ptrs, p, entry) { 715 - struct bch_dev *ca = bch_dev_bkey_exists(c, p.ptr.dev); 716 - struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); 717 - enum bch_data_type data_type = bch2_bkey_ptr_data_type(bkey_i_to_s_c(new), p, entry); 718 - 719 - if ((p.ptr.cached && 720 - (!g->gen_valid || gen_cmp(p.ptr.gen, g->gen) > 0)) || 721 - (!p.ptr.cached && 722 - gen_cmp(p.ptr.gen, g->gen) < 0) || 723 - gen_cmp(g->gen, p.ptr.gen) > BUCKET_GC_GEN_MAX || 724 - (g->data_type && 725 - g->data_type != data_type)) { 726 - bch2_bkey_drop_ptr(bkey_i_to_s(new), &entry->ptr); 727 - goto restart_drop_ptrs; 728 - } 729 - } 730 - again: 731 - ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 732 - bkey_extent_entry_for_each(ptrs, entry) { 733 - if (extent_entry_type(entry) == BCH_EXTENT_ENTRY_stripe_ptr) { 734 - struct gc_stripe *m = genradix_ptr(&c->gc_stripes, 735 - entry->stripe_ptr.idx); 736 - union bch_extent_entry *next_ptr; 737 - 738 - bkey_extent_entry_for_each_from(ptrs, next_ptr, entry) 739 - if (extent_entry_type(next_ptr) == BCH_EXTENT_ENTRY_ptr) 740 - goto found; 741 - next_ptr = NULL; 742 - found: 743 - if (!next_ptr) { 744 - bch_err(c, "aieee, found stripe ptr with no data ptr"); 745 - continue; 746 - } 747 - 748 - if (!m || !m->alive || 749 - !__bch2_ptr_matches_stripe(&m->ptrs[entry->stripe_ptr.block], 750 - &next_ptr->ptr, 751 - m->sectors)) { 752 - bch2_bkey_extent_entry_drop(new, entry); 753 - goto again; 754 - } 755 - } 756 - } 757 - } 758 - 759 - if (level) 760 - bch2_btree_node_update_key_early(trans, btree_id, level - 1, *k, new); 761 - 762 - if (0) { 763 - printbuf_reset(&buf); 764 - bch2_bkey_val_to_text(&buf, c, *k); 765 - bch_info(c, "updated %s", buf.buf); 766 - 767 - printbuf_reset(&buf); 768 - bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(new)); 769 - bch_info(c, "new key %s", buf.buf); 770 - } 771 - 772 - ret = bch2_journal_key_insert_take(c, btree_id, level, new); 773 - if (ret) { 774 - kfree(new); 775 - goto err; 776 - } 777 - 778 - *k = bkey_i_to_s_c(new); 779 - } 780 - err: 781 - fsck_err: 782 - printbuf_exit(&buf); 783 - return ret; 784 - } 785 - 786 589 /* marking of btree keys/nodes: */ 787 590 788 591 static int bch2_gc_mark_key(struct btree_trans *trans, enum btree_id btree_id, 789 - unsigned level, bool is_root, 790 - struct bkey_s_c *k, 592 + unsigned level, struct btree **prev, 593 + struct btree_iter *iter, struct bkey_s_c k, 791 594 bool initial) 792 595 { 793 596 struct bch_fs *c = trans->c; 597 + 598 + if (iter) { 599 + struct btree_path *path = btree_iter_path(trans, iter); 600 + struct btree *b = path_l(path)->b; 601 + 602 + if (*prev != b) { 603 + int ret = bch2_btree_node_check_topology(trans, b); 604 + if (ret) 605 + return ret; 606 + } 607 + *prev = b; 608 + } 609 + 794 610 struct bkey deleted = KEY(0, 0, 0); 795 611 struct bkey_s_c old = (struct bkey_s_c) { &deleted, NULL }; 796 612 struct printbuf buf = PRINTBUF; 797 613 int ret = 0; 798 614 799 - deleted.p = k->k->p; 615 + deleted.p = k.k->p; 
800 616 801 617 if (initial) { 802 618 BUG_ON(bch2_journal_seq_verify && 803 - k->k->version.lo > atomic64_read(&c->journal.seq)); 619 + k.k->version.lo > atomic64_read(&c->journal.seq)); 804 620 805 - if (fsck_err_on(k->k->version.lo > atomic64_read(&c->key_version), c, 621 + if (fsck_err_on(k.k->version.lo > atomic64_read(&c->key_version), c, 806 622 bkey_version_in_future, 807 623 "key version number higher than recorded: %llu > %llu", 808 - k->k->version.lo, 624 + k.k->version.lo, 809 625 atomic64_read(&c->key_version))) 810 - atomic64_set(&c->key_version, k->k->version.lo); 626 + atomic64_set(&c->key_version, k.k->version.lo); 811 627 } 812 628 813 - ret = bch2_check_fix_ptrs(trans, btree_id, level, is_root, k); 814 - if (ret) 815 - goto err; 816 - 817 - if (mustfix_fsck_err_on(level && !bch2_dev_btree_bitmap_marked(c, *k), 629 + if (mustfix_fsck_err_on(level && !bch2_dev_btree_bitmap_marked(c, k), 818 630 c, btree_bitmap_not_marked, 819 631 "btree ptr not marked in member info btree allocated bitmap\n %s", 820 - (bch2_bkey_val_to_text(&buf, c, *k), 632 + (bch2_bkey_val_to_text(&buf, c, k), 821 633 buf.buf))) { 822 634 mutex_lock(&c->sb_lock); 823 - bch2_dev_btree_bitmap_mark(c, *k); 635 + bch2_dev_btree_bitmap_mark(c, k); 824 636 bch2_write_super(c); 825 637 mutex_unlock(&c->sb_lock); 826 638 } 827 639 828 - ret = commit_do(trans, NULL, NULL, 0, 829 - bch2_key_trigger(trans, btree_id, level, old, 830 - unsafe_bkey_s_c_to_s(*k), BTREE_TRIGGER_GC)); 640 + /* 641 + * We require a commit before key_trigger() because 642 + * key_trigger(BTREE_TRIGGER_GC) is not idempotant; we'll calculate the 643 + * wrong result if we run it multiple times. 644 + */ 645 + unsigned flags = !iter ? BTREE_TRIGGER_is_root : 0; 646 + 647 + ret = bch2_key_trigger(trans, btree_id, level, old, unsafe_bkey_s_c_to_s(k), 648 + BTREE_TRIGGER_check_repair|flags); 649 + if (ret) 650 + goto out; 651 + 652 + if (trans->nr_updates) { 653 + ret = bch2_trans_commit(trans, NULL, NULL, 0) ?: 654 + -BCH_ERR_transaction_restart_nested; 655 + goto out; 656 + } 657 + 658 + ret = bch2_key_trigger(trans, btree_id, level, old, unsafe_bkey_s_c_to_s(k), 659 + BTREE_TRIGGER_gc|flags); 660 + out: 831 661 fsck_err: 832 - err: 833 662 printbuf_exit(&buf); 834 663 bch_err_fn(c, ret); 835 664 return ret; 836 665 } 837 666 838 - static int btree_gc_mark_node(struct btree_trans *trans, struct btree *b, bool initial) 667 + static int bch2_gc_btree(struct btree_trans *trans, enum btree_id btree, bool initial) 839 668 { 840 - struct btree_node_iter iter; 841 - struct bkey unpacked; 842 - struct bkey_s_c k; 669 + struct bch_fs *c = trans->c; 670 + int level = 0, target_depth = btree_node_type_needs_gc(__btree_node_type(0, btree)) ? 
0 : 1; 843 671 int ret = 0; 844 672 845 - ret = bch2_btree_node_check_topology(trans, b); 673 + /* We need to make sure every leaf node is readable before going RW */ 674 + if (initial) 675 + target_depth = 0; 676 + 677 + /* root */ 678 + mutex_lock(&c->btree_root_lock); 679 + struct btree *b = bch2_btree_id_root(c, btree)->b; 680 + if (!btree_node_fake(b)) { 681 + gc_pos_set(c, gc_pos_btree(btree, b->c.level + 1, SPOS_MAX)); 682 + ret = lockrestart_do(trans, 683 + bch2_gc_mark_key(trans, b->c.btree_id, b->c.level + 1, 684 + NULL, NULL, bkey_i_to_s_c(&b->key), initial)); 685 + level = b->c.level; 686 + } 687 + mutex_unlock(&c->btree_root_lock); 688 + 846 689 if (ret) 847 690 return ret; 848 691 849 - if (!btree_node_type_needs_gc(btree_node_type(b))) 850 - return 0; 692 + for (; level >= target_depth; --level) { 693 + struct btree *prev = NULL; 694 + struct btree_iter iter; 695 + bch2_trans_node_iter_init(trans, &iter, btree, POS_MIN, 0, level, 696 + BTREE_ITER_prefetch); 851 697 852 - bch2_btree_node_iter_init_from_start(&iter, b); 853 - 854 - while ((k = bch2_btree_node_iter_peek_unpack(&iter, b, &unpacked)).k) { 855 - ret = bch2_gc_mark_key(trans, b->c.btree_id, b->c.level, false, 856 - &k, initial); 857 - if (ret) 858 - return ret; 859 - 860 - bch2_btree_node_iter_advance(&iter, b); 861 - } 862 - 863 - return 0; 864 - } 865 - 866 - static int bch2_gc_btree(struct btree_trans *trans, enum btree_id btree_id, 867 - bool initial, bool metadata_only) 868 - { 869 - struct bch_fs *c = trans->c; 870 - struct btree_iter iter; 871 - struct btree *b; 872 - unsigned depth = metadata_only ? 1 : 0; 873 - int ret = 0; 874 - 875 - gc_pos_set(c, gc_pos_btree(btree_id, POS_MIN, 0)); 876 - 877 - __for_each_btree_node(trans, iter, btree_id, POS_MIN, 878 - 0, depth, BTREE_ITER_PREFETCH, b, ret) { 879 - bch2_verify_btree_nr_keys(b); 880 - 881 - gc_pos_set(c, gc_pos_btree_node(b)); 882 - 883 - ret = btree_gc_mark_node(trans, b, initial); 698 + ret = for_each_btree_key_continue(trans, iter, 0, k, ({ 699 + gc_pos_set(c, gc_pos_btree(btree, level, k.k->p)); 700 + bch2_gc_mark_key(trans, btree, level, &prev, &iter, k, initial); 701 + })); 884 702 if (ret) 885 703 break; 886 704 } 887 - bch2_trans_iter_exit(trans, &iter); 888 705 889 - if (ret) 890 - return ret; 891 - 892 - mutex_lock(&c->btree_root_lock); 893 - b = bch2_btree_id_root(c, btree_id)->b; 894 - if (!btree_node_fake(b)) { 895 - struct bkey_s_c k = bkey_i_to_s_c(&b->key); 896 - 897 - ret = bch2_gc_mark_key(trans, b->c.btree_id, b->c.level + 1, 898 - true, &k, initial); 899 - } 900 - gc_pos_set(c, gc_pos_btree_root(b->c.btree_id)); 901 - mutex_unlock(&c->btree_root_lock); 902 - 903 - return ret; 904 - } 905 - 906 - static int bch2_gc_btree_init_recurse(struct btree_trans *trans, struct btree *b, 907 - unsigned target_depth) 908 - { 909 - struct bch_fs *c = trans->c; 910 - struct btree_and_journal_iter iter; 911 - struct bkey_s_c k; 912 - struct bkey_buf cur; 913 - struct printbuf buf = PRINTBUF; 914 - int ret = 0; 915 - 916 - ret = bch2_btree_node_check_topology(trans, b); 917 - if (ret) 918 - return ret; 919 - 920 - bch2_btree_and_journal_iter_init_node_iter(trans, &iter, b); 921 - bch2_bkey_buf_init(&cur); 922 - 923 - while ((k = bch2_btree_and_journal_iter_peek(&iter)).k) { 924 - BUG_ON(bpos_lt(k.k->p, b->data->min_key)); 925 - BUG_ON(bpos_gt(k.k->p, b->data->max_key)); 926 - 927 - ret = bch2_gc_mark_key(trans, b->c.btree_id, b->c.level, 928 - false, &k, true); 929 - if (ret) 930 - goto fsck_err; 931 - 932 - 
bch2_btree_and_journal_iter_advance(&iter); 933 - } 934 - 935 - if (b->c.level > target_depth) { 936 - bch2_btree_and_journal_iter_exit(&iter); 937 - bch2_btree_and_journal_iter_init_node_iter(trans, &iter, b); 938 - iter.prefetch = true; 939 - 940 - while ((k = bch2_btree_and_journal_iter_peek(&iter)).k) { 941 - struct btree *child; 942 - 943 - bch2_bkey_buf_reassemble(&cur, c, k); 944 - bch2_btree_and_journal_iter_advance(&iter); 945 - 946 - child = bch2_btree_node_get_noiter(trans, cur.k, 947 - b->c.btree_id, b->c.level - 1, 948 - false); 949 - ret = PTR_ERR_OR_ZERO(child); 950 - 951 - if (bch2_err_matches(ret, EIO)) { 952 - bch2_topology_error(c); 953 - 954 - if (__fsck_err(c, 955 - FSCK_CAN_FIX| 956 - FSCK_CAN_IGNORE| 957 - FSCK_NO_RATELIMIT, 958 - btree_node_read_error, 959 - "Unreadable btree node at btree %s level %u:\n" 960 - " %s", 961 - bch2_btree_id_str(b->c.btree_id), 962 - b->c.level - 1, 963 - (printbuf_reset(&buf), 964 - bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(cur.k)), buf.buf)) && 965 - should_restart_for_topology_repair(c)) { 966 - bch_info(c, "Halting mark and sweep to start topology repair pass"); 967 - ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology); 968 - goto fsck_err; 969 - } else { 970 - /* Continue marking when opted to not 971 - * fix the error: */ 972 - ret = 0; 973 - set_bit(BCH_FS_initial_gc_unfixed, &c->flags); 974 - continue; 975 - } 976 - } else if (ret) { 977 - bch_err_msg(c, ret, "getting btree node"); 978 - break; 979 - } 980 - 981 - ret = bch2_gc_btree_init_recurse(trans, child, 982 - target_depth); 983 - six_unlock_read(&child->c.lock); 984 - 985 - if (ret) 986 - break; 987 - } 988 - } 989 - fsck_err: 990 - bch2_bkey_buf_exit(&cur, c); 991 - bch2_btree_and_journal_iter_exit(&iter); 992 - printbuf_exit(&buf); 993 - return ret; 994 - } 995 - 996 - static int bch2_gc_btree_init(struct btree_trans *trans, 997 - enum btree_id btree_id, 998 - bool metadata_only) 999 - { 1000 - struct bch_fs *c = trans->c; 1001 - struct btree *b; 1002 - unsigned target_depth = metadata_only ? 
1 : 0; 1003 - struct printbuf buf = PRINTBUF; 1004 - int ret = 0; 1005 - 1006 - b = bch2_btree_id_root(c, btree_id)->b; 1007 - 1008 - six_lock_read(&b->c.lock, NULL, NULL); 1009 - printbuf_reset(&buf); 1010 - bch2_bpos_to_text(&buf, b->data->min_key); 1011 - if (mustfix_fsck_err_on(!bpos_eq(b->data->min_key, POS_MIN), c, 1012 - btree_root_bad_min_key, 1013 - "btree root with incorrect min_key: %s", buf.buf)) { 1014 - bch_err(c, "repair unimplemented"); 1015 - ret = -BCH_ERR_fsck_repair_unimplemented; 1016 - goto fsck_err; 1017 - } 1018 - 1019 - printbuf_reset(&buf); 1020 - bch2_bpos_to_text(&buf, b->data->max_key); 1021 - if (mustfix_fsck_err_on(!bpos_eq(b->data->max_key, SPOS_MAX), c, 1022 - btree_root_bad_max_key, 1023 - "btree root with incorrect max_key: %s", buf.buf)) { 1024 - bch_err(c, "repair unimplemented"); 1025 - ret = -BCH_ERR_fsck_repair_unimplemented; 1026 - goto fsck_err; 1027 - } 1028 - 1029 - if (b->c.level >= target_depth) 1030 - ret = bch2_gc_btree_init_recurse(trans, b, target_depth); 1031 - 1032 - if (!ret) { 1033 - struct bkey_s_c k = bkey_i_to_s_c(&b->key); 1034 - 1035 - ret = bch2_gc_mark_key(trans, b->c.btree_id, b->c.level + 1, true, 1036 - &k, true); 1037 - } 1038 - fsck_err: 1039 - six_unlock_read(&b->c.lock); 1040 - 1041 - bch_err_fn(c, ret); 1042 - printbuf_exit(&buf); 1043 706 return ret; 1044 707 } 1045 708 ··· 677 1084 (int) btree_id_to_gc_phase(r); 678 1085 } 679 1086 680 - static int bch2_gc_btrees(struct bch_fs *c, bool initial, bool metadata_only) 1087 + static int bch2_gc_btrees(struct bch_fs *c) 681 1088 { 682 1089 struct btree_trans *trans = bch2_trans_get(c); 683 1090 enum btree_id ids[BTREE_ID_NR]; ··· 688 1095 ids[i] = i; 689 1096 bubble_sort(ids, BTREE_ID_NR, btree_id_gc_phase_cmp); 690 1097 691 - for (i = 0; i < BTREE_ID_NR && !ret; i++) 692 - ret = initial 693 - ? bch2_gc_btree_init(trans, ids[i], metadata_only) 694 - : bch2_gc_btree(trans, ids[i], initial, metadata_only); 1098 + for (i = 0; i < btree_id_nr_alive(c) && !ret; i++) { 1099 + unsigned btree = i < BTREE_ID_NR ? ids[i] : i; 695 1100 696 - for (i = BTREE_ID_NR; i < btree_id_nr_alive(c) && !ret; i++) { 697 - if (!bch2_btree_id_root(c, i)->alive) 1101 + if (IS_ERR_OR_NULL(bch2_btree_id_root(c, btree)->b)) 698 1102 continue; 699 1103 700 - ret = initial 701 - ? 
bch2_gc_btree_init(trans, i, metadata_only) 702 - : bch2_gc_btree(trans, i, initial, metadata_only); 703 - } 1104 + ret = bch2_gc_btree(trans, btree, true); 704 1105 1106 + if (mustfix_fsck_err_on(bch2_err_matches(ret, EIO), 1107 + c, btree_node_read_error, 1108 + "btree node read error for %s", 1109 + bch2_btree_id_str(btree))) 1110 + ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology); 1111 + } 1112 + fsck_err: 705 1113 bch2_trans_put(trans); 706 1114 bch_err_fn(c, ret); 707 1115 return ret; 708 1116 } 709 1117 710 - static void mark_metadata_sectors(struct bch_fs *c, struct bch_dev *ca, 711 - u64 start, u64 end, 712 - enum bch_data_type type, 713 - unsigned flags) 714 - { 715 - u64 b = sector_to_bucket(ca, start); 716 - 717 - do { 718 - unsigned sectors = 719 - min_t(u64, bucket_to_sector(ca, b + 1), end) - start; 720 - 721 - bch2_mark_metadata_bucket(c, ca, b, type, sectors, 722 - gc_phase(GC_PHASE_SB), flags); 723 - b++; 724 - start += sectors; 725 - } while (start < end); 726 - } 727 - 728 - static void bch2_mark_dev_superblock(struct bch_fs *c, struct bch_dev *ca, 729 - unsigned flags) 730 - { 731 - struct bch_sb_layout *layout = &ca->disk_sb.sb->layout; 732 - unsigned i; 733 - u64 b; 734 - 735 - for (i = 0; i < layout->nr_superblocks; i++) { 736 - u64 offset = le64_to_cpu(layout->sb_offset[i]); 737 - 738 - if (offset == BCH_SB_SECTOR) 739 - mark_metadata_sectors(c, ca, 0, BCH_SB_SECTOR, 740 - BCH_DATA_sb, flags); 741 - 742 - mark_metadata_sectors(c, ca, offset, 743 - offset + (1 << layout->sb_max_size_bits), 744 - BCH_DATA_sb, flags); 745 - } 746 - 747 - for (i = 0; i < ca->journal.nr; i++) { 748 - b = ca->journal.buckets[i]; 749 - bch2_mark_metadata_bucket(c, ca, b, BCH_DATA_journal, 750 - ca->mi.bucket_size, 751 - gc_phase(GC_PHASE_SB), flags); 752 - } 753 - } 754 - 755 - static void bch2_mark_superblocks(struct bch_fs *c) 1118 + static int bch2_mark_superblocks(struct bch_fs *c) 756 1119 { 757 1120 mutex_lock(&c->sb_lock); 758 1121 gc_pos_set(c, gc_phase(GC_PHASE_SB)); 759 1122 760 - for_each_online_member(c, ca) 761 - bch2_mark_dev_superblock(c, ca, BTREE_TRIGGER_GC); 1123 + int ret = bch2_trans_mark_dev_sbs_flags(c, BTREE_TRIGGER_gc); 762 1124 mutex_unlock(&c->sb_lock); 1125 + return ret; 763 1126 } 764 - 765 - #if 0 766 - /* Also see bch2_pending_btree_node_free_insert_done() */ 767 - static void bch2_mark_pending_btree_node_frees(struct bch_fs *c) 768 - { 769 - struct btree_update *as; 770 - struct pending_btree_node_free *d; 771 - 772 - mutex_lock(&c->btree_interior_update_lock); 773 - gc_pos_set(c, gc_phase(GC_PHASE_PENDING_DELETE)); 774 - 775 - for_each_pending_btree_node_free(c, as, d) 776 - if (d->index_update_done) 777 - bch2_mark_key(c, bkey_i_to_s_c(&d->key), BTREE_TRIGGER_GC); 778 - 779 - mutex_unlock(&c->btree_interior_update_lock); 780 - } 781 - #endif 782 1127 783 1128 static void bch2_gc_free(struct bch_fs *c) 784 1129 { ··· 735 1204 c->usage_gc = NULL; 736 1205 } 737 1206 738 - static int bch2_gc_done(struct bch_fs *c, 739 - bool initial, bool metadata_only) 1207 + static int bch2_gc_done(struct bch_fs *c) 740 1208 { 741 1209 struct bch_dev *ca = NULL; 742 1210 struct printbuf buf = PRINTBUF; 743 - bool verify = !metadata_only && 744 - !c->opts.reconstruct_alloc && 745 - (!initial || (c->sb.compat & (1ULL << BCH_COMPAT_alloc_info))); 746 1211 unsigned i; 747 1212 int ret = 0; 748 1213 749 1214 percpu_down_write(&c->mark_lock); 750 1215 751 - #define copy_field(_err, _f, _msg, ...) 
\ 752 - if (dst->_f != src->_f && \ 753 - (!verify || \ 754 - fsck_err(c, _err, _msg ": got %llu, should be %llu" \ 755 - , ##__VA_ARGS__, dst->_f, src->_f))) \ 1216 + #define copy_field(_err, _f, _msg, ...) \ 1217 + if (fsck_err_on(dst->_f != src->_f, c, _err, \ 1218 + _msg ": got %llu, should be %llu" , ##__VA_ARGS__, \ 1219 + dst->_f, src->_f)) \ 756 1220 dst->_f = src->_f 757 - #define copy_dev_field(_err, _f, _msg, ...) \ 1221 + #define copy_dev_field(_err, _f, _msg, ...) \ 758 1222 copy_field(_err, _f, "dev %u has wrong " _msg, ca->dev_idx, ##__VA_ARGS__) 759 - #define copy_fs_field(_err, _f, _msg, ...) \ 1223 + #define copy_fs_field(_err, _f, _msg, ...) \ 760 1224 copy_field(_err, _f, "fs has wrong " _msg, ##__VA_ARGS__) 761 1225 762 1226 for (i = 0; i < ARRAY_SIZE(c->usage); i++) ··· 784 1258 copy_fs_field(fs_usage_btree_wrong, 785 1259 b.btree, "btree"); 786 1260 787 - if (!metadata_only) { 788 - copy_fs_field(fs_usage_data_wrong, 789 - b.data, "data"); 790 - copy_fs_field(fs_usage_cached_wrong, 791 - b.cached, "cached"); 792 - copy_fs_field(fs_usage_reserved_wrong, 793 - b.reserved, "reserved"); 794 - copy_fs_field(fs_usage_nr_inodes_wrong, 795 - b.nr_inodes,"nr_inodes"); 1261 + copy_fs_field(fs_usage_data_wrong, 1262 + b.data, "data"); 1263 + copy_fs_field(fs_usage_cached_wrong, 1264 + b.cached, "cached"); 1265 + copy_fs_field(fs_usage_reserved_wrong, 1266 + b.reserved, "reserved"); 1267 + copy_fs_field(fs_usage_nr_inodes_wrong, 1268 + b.nr_inodes,"nr_inodes"); 796 1269 797 - for (i = 0; i < BCH_REPLICAS_MAX; i++) 798 - copy_fs_field(fs_usage_persistent_reserved_wrong, 799 - persistent_reserved[i], 800 - "persistent_reserved[%i]", i); 801 - } 1270 + for (i = 0; i < BCH_REPLICAS_MAX; i++) 1271 + copy_fs_field(fs_usage_persistent_reserved_wrong, 1272 + persistent_reserved[i], 1273 + "persistent_reserved[%i]", i); 802 1274 803 1275 for (i = 0; i < c->replicas.nr; i++) { 804 1276 struct bch_replicas_entry_v1 *e = 805 1277 cpu_replicas_entry(&c->replicas, i); 806 - 807 - if (metadata_only && 808 - (e->data_type == BCH_DATA_user || 809 - e->data_type == BCH_DATA_cached)) 810 - continue; 811 1278 812 1279 printbuf_reset(&buf); 813 1280 bch2_replicas_entry_to_text(&buf, e); ··· 815 1296 #undef copy_stripe_field 816 1297 #undef copy_field 817 1298 fsck_err: 818 - if (ca) 819 - percpu_ref_put(&ca->ref); 1299 + bch2_dev_put(ca); 820 1300 bch_err_fn(c, ret); 821 - 822 1301 percpu_up_write(&c->mark_lock); 823 1302 printbuf_exit(&buf); 824 1303 return ret; ··· 839 1322 ca->usage_gc = alloc_percpu(struct bch_dev_usage); 840 1323 if (!ca->usage_gc) { 841 1324 bch_err(c, "error allocating ca->usage_gc"); 842 - percpu_ref_put(&ca->ref); 1325 + bch2_dev_put(ca); 843 1326 return -BCH_ERR_ENOMEM_gc_start; 844 1327 } 845 1328 ··· 848 1331 } 849 1332 850 1333 return 0; 851 - } 852 - 853 - static int bch2_gc_reset(struct bch_fs *c) 854 - { 855 - for_each_member_device(c, ca) { 856 - free_percpu(ca->usage_gc); 857 - ca->usage_gc = NULL; 858 - } 859 - 860 - free_percpu(c->usage_gc); 861 - c->usage_gc = NULL; 862 - 863 - return bch2_gc_start(c); 864 1334 } 865 1335 866 1336 /* returns true if not equal */ ··· 865 1361 866 1362 static int bch2_alloc_write_key(struct btree_trans *trans, 867 1363 struct btree_iter *iter, 868 - struct bkey_s_c k, 869 - bool metadata_only) 1364 + struct bch_dev *ca, 1365 + struct bkey_s_c k) 870 1366 { 871 1367 struct bch_fs *c = trans->c; 872 - struct bch_dev *ca = bch_dev_bkey_exists(c, iter->pos.inode); 873 - struct bucket old_gc, gc, *b; 874 1368 struct bkey_i_alloc_v4 *a; 
875 - struct bch_alloc_v4 old_convert, new; 1369 + struct bch_alloc_v4 old_gc, gc, old_convert, new; 876 1370 const struct bch_alloc_v4 *old; 877 1371 int ret; 878 1372 879 1373 old = bch2_alloc_to_v4(k, &old_convert); 880 - new = *old; 1374 + gc = new = *old; 881 1375 882 1376 percpu_down_read(&c->mark_lock); 883 - b = gc_bucket(ca, iter->pos.offset); 884 - old_gc = *b; 1377 + __bucket_m_to_alloc(&gc, *gc_bucket(ca, iter->pos.offset)); 1378 + 1379 + old_gc = gc; 885 1380 886 1381 if ((old->data_type == BCH_DATA_sb || 887 1382 old->data_type == BCH_DATA_journal) && 888 1383 !bch2_dev_is_online(ca)) { 889 - b->data_type = old->data_type; 890 - b->dirty_sectors = old->dirty_sectors; 1384 + gc.data_type = old->data_type; 1385 + gc.dirty_sectors = old->dirty_sectors; 891 1386 } 892 1387 893 1388 /* 894 - * b->data_type doesn't yet include need_discard & need_gc_gen states - 1389 + * gc.data_type doesn't yet include need_discard & need_gc_gen states - 895 1390 * fix that here: 896 1391 */ 897 - b->data_type = __alloc_data_type(b->dirty_sectors, 898 - b->cached_sectors, 899 - b->stripe, 900 - *old, 901 - b->data_type); 902 - gc = *b; 1392 + alloc_data_type_set(&gc, gc.data_type); 903 1393 904 1394 if (gc.data_type != old_gc.data_type || 905 1395 gc.dirty_sectors != old_gc.dirty_sectors) 906 - bch2_dev_usage_update_m(c, ca, &old_gc, &gc); 1396 + bch2_dev_usage_update(c, ca, &old_gc, &gc, 0, true); 907 1397 percpu_up_read(&c->mark_lock); 908 - 909 - if (metadata_only && 910 - gc.data_type != BCH_DATA_sb && 911 - gc.data_type != BCH_DATA_journal && 912 - gc.data_type != BCH_DATA_btree) 913 - return 0; 914 - 915 - if (gen_after(old->gen, gc.gen)) 916 - return 0; 917 1398 918 1399 if (fsck_err_on(new.data_type != gc.data_type, c, 919 1400 alloc_key_data_type_wrong, ··· 949 1460 if (a->v.data_type == BCH_DATA_cached && !a->v.io_time[READ]) 950 1461 a->v.io_time[READ] = max_t(u64, 1, atomic64_read(&c->io_clock[READ].now)); 951 1462 952 - ret = bch2_trans_update(trans, iter, &a->k_i, BTREE_TRIGGER_NORUN); 1463 + ret = bch2_trans_update(trans, iter, &a->k_i, BTREE_TRIGGER_norun); 953 1464 fsck_err: 954 1465 return ret; 955 1466 } 956 1467 957 - static int bch2_gc_alloc_done(struct bch_fs *c, bool metadata_only) 1468 + static int bch2_gc_alloc_done(struct bch_fs *c) 958 1469 { 959 1470 int ret = 0; 960 1471 ··· 963 1474 for_each_btree_key_upto_commit(trans, iter, BTREE_ID_alloc, 964 1475 POS(ca->dev_idx, ca->mi.first_bucket), 965 1476 POS(ca->dev_idx, ca->mi.nbuckets - 1), 966 - BTREE_ITER_SLOTS|BTREE_ITER_PREFETCH, k, 1477 + BTREE_ITER_slots|BTREE_ITER_prefetch, k, 967 1478 NULL, NULL, BCH_TRANS_COMMIT_lazy_rw, 968 - bch2_alloc_write_key(trans, &iter, k, metadata_only))); 1479 + bch2_alloc_write_key(trans, &iter, ca, k))); 969 1480 if (ret) { 970 - percpu_ref_put(&ca->ref); 1481 + bch2_dev_put(ca); 971 1482 break; 972 1483 } 973 1484 } ··· 976 1487 return ret; 977 1488 } 978 1489 979 - static int bch2_gc_alloc_start(struct bch_fs *c, bool metadata_only) 1490 + static int bch2_gc_alloc_start(struct bch_fs *c) 980 1491 { 981 1492 for_each_member_device(c, ca) { 982 1493 struct bucket_array *buckets = kvmalloc(sizeof(struct bucket_array) + 983 1494 ca->mi.nbuckets * sizeof(struct bucket), 984 1495 GFP_KERNEL|__GFP_ZERO); 985 1496 if (!buckets) { 986 - percpu_ref_put(&ca->ref); 1497 + bch2_dev_put(ca); 987 1498 bch_err(c, "error allocating ca->buckets[gc]"); 988 1499 return -BCH_ERR_ENOMEM_gc_alloc_start; 989 1500 } ··· 993 1504 rcu_assign_pointer(ca->buckets_gc, buckets); 994 1505 } 995 1506 1507 + struct 
bch_dev *ca = NULL; 996 1508 int ret = bch2_trans_run(c, 997 1509 for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN, 998 - BTREE_ITER_PREFETCH, k, ({ 999 - struct bch_dev *ca = bch_dev_bkey_exists(c, k.k->p.inode); 1000 - struct bucket *g = gc_bucket(ca, k.k->p.offset); 1510 + BTREE_ITER_prefetch, k, ({ 1511 + ca = bch2_dev_iterate(c, ca, k.k->p.inode); 1512 + if (!ca) { 1513 + bch2_btree_iter_set_pos(&iter, POS(k.k->p.inode + 1, 0)); 1514 + continue; 1515 + } 1001 1516 1002 1517 struct bch_alloc_v4 a_convert; 1003 1518 const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &a_convert); 1004 1519 1520 + struct bucket *g = gc_bucket(ca, k.k->p.offset); 1005 1521 g->gen_valid = 1; 1006 1522 g->gen = a->gen; 1007 - 1008 - if (metadata_only && 1009 - (a->data_type == BCH_DATA_user || 1010 - a->data_type == BCH_DATA_cached || 1011 - a->data_type == BCH_DATA_parity)) { 1012 - g->data_type = a->data_type; 1013 - g->dirty_sectors = a->dirty_sectors; 1014 - g->cached_sectors = a->cached_sectors; 1015 - g->stripe = a->stripe; 1016 - g->stripe_redundancy = a->stripe_redundancy; 1017 - } 1018 - 1019 1523 0; 1020 1524 }))); 1525 + bch2_dev_put(ca); 1021 1526 bch_err_fn(c, ret); 1022 1527 return ret; 1023 - } 1024 - 1025 - static void bch2_gc_alloc_reset(struct bch_fs *c, bool metadata_only) 1026 - { 1027 - for_each_member_device(c, ca) { 1028 - struct bucket_array *buckets = gc_bucket_array(ca); 1029 - struct bucket *g; 1030 - 1031 - for_each_bucket(g, buckets) { 1032 - if (metadata_only && 1033 - (g->data_type == BCH_DATA_user || 1034 - g->data_type == BCH_DATA_cached || 1035 - g->data_type == BCH_DATA_parity)) 1036 - continue; 1037 - g->data_type = 0; 1038 - g->dirty_sectors = 0; 1039 - g->cached_sectors = 0; 1040 - } 1041 - } 1042 1528 } 1043 1529 1044 1530 static int bch2_gc_write_reflink_key(struct btree_trans *trans, ··· 1065 1601 return ret; 1066 1602 } 1067 1603 1068 - static int bch2_gc_reflink_done(struct bch_fs *c, bool metadata_only) 1604 + static int bch2_gc_reflink_done(struct bch_fs *c) 1069 1605 { 1070 1606 size_t idx = 0; 1071 - 1072 - if (metadata_only) 1073 - return 0; 1074 1607 1075 1608 int ret = bch2_trans_run(c, 1076 1609 for_each_btree_key_commit(trans, iter, 1077 1610 BTREE_ID_reflink, POS_MIN, 1078 - BTREE_ITER_PREFETCH, k, 1611 + BTREE_ITER_prefetch, k, 1079 1612 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 1080 1613 bch2_gc_write_reflink_key(trans, &iter, k, &idx))); 1081 1614 c->reflink_gc_nr = 0; 1082 1615 return ret; 1083 1616 } 1084 1617 1085 - static int bch2_gc_reflink_start(struct bch_fs *c, 1086 - bool metadata_only) 1618 + static int bch2_gc_reflink_start(struct bch_fs *c) 1087 1619 { 1088 - 1089 - if (metadata_only) 1090 - return 0; 1091 - 1092 1620 c->reflink_gc_nr = 0; 1093 1621 1094 1622 int ret = bch2_trans_run(c, 1095 1623 for_each_btree_key(trans, iter, BTREE_ID_reflink, POS_MIN, 1096 - BTREE_ITER_PREFETCH, k, ({ 1624 + BTREE_ITER_prefetch, k, ({ 1097 1625 const __le64 *refcount = bkey_refcount_c(k); 1098 1626 1099 1627 if (!refcount) ··· 1106 1650 1107 1651 bch_err_fn(c, ret); 1108 1652 return ret; 1109 - } 1110 - 1111 - static void bch2_gc_reflink_reset(struct bch_fs *c, bool metadata_only) 1112 - { 1113 - struct genradix_iter iter; 1114 - struct reflink_gc *r; 1115 - 1116 - genradix_for_each(&c->reflink_gc_table, iter, r) 1117 - r->refcount = 0; 1118 1653 } 1119 1654 1120 1655 static int bch2_gc_write_stripes_key(struct btree_trans *trans, ··· 1161 1714 return ret; 1162 1715 } 1163 1716 1164 - static int bch2_gc_stripes_done(struct bch_fs *c, bool 
metadata_only) 1717 + static int bch2_gc_stripes_done(struct bch_fs *c) 1165 1718 { 1166 - if (metadata_only) 1167 - return 0; 1168 - 1169 1719 return bch2_trans_run(c, 1170 1720 for_each_btree_key_commit(trans, iter, 1171 1721 BTREE_ID_stripes, POS_MIN, 1172 - BTREE_ITER_PREFETCH, k, 1722 + BTREE_ITER_prefetch, k, 1173 1723 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 1174 1724 bch2_gc_write_stripes_key(trans, &iter, k))); 1175 1725 } 1176 1726 1177 - static void bch2_gc_stripes_reset(struct bch_fs *c, bool metadata_only) 1178 - { 1179 - genradix_free(&c->gc_stripes); 1180 - } 1181 - 1182 1727 /** 1183 - * bch2_gc - walk _all_ references to buckets, and recompute them: 1728 + * bch2_check_allocations - walk all references to buckets, and recompute them: 1184 1729 * 1185 1730 * @c: filesystem object 1186 - * @initial: are we in recovery? 1187 - * @metadata_only: are we just checking metadata references, or everything? 1188 1731 * 1189 1732 * Returns: 0 on success, or standard errcode on failure 1190 1733 * ··· 1193 1756 * move around - if references move backwards in the ordering GC 1194 1757 * uses, GC could skip past them 1195 1758 */ 1196 - int bch2_gc(struct bch_fs *c, bool initial, bool metadata_only) 1759 + int bch2_check_allocations(struct bch_fs *c) 1197 1760 { 1198 - unsigned iter = 0; 1199 1761 int ret; 1200 1762 1201 1763 lockdep_assert_held(&c->state_lock); ··· 1204 1768 bch2_btree_interior_updates_flush(c); 1205 1769 1206 1770 ret = bch2_gc_start(c) ?: 1207 - bch2_gc_alloc_start(c, metadata_only) ?: 1208 - bch2_gc_reflink_start(c, metadata_only); 1771 + bch2_gc_alloc_start(c) ?: 1772 + bch2_gc_reflink_start(c); 1209 1773 if (ret) 1210 1774 goto out; 1211 - again: 1775 + 1212 1776 gc_pos_set(c, gc_phase(GC_PHASE_START)); 1213 1777 1214 - bch2_mark_superblocks(c); 1778 + ret = bch2_mark_superblocks(c); 1779 + BUG_ON(ret); 1215 1780 1216 - ret = bch2_gc_btrees(c, initial, metadata_only); 1217 - 1781 + ret = bch2_gc_btrees(c); 1218 1782 if (ret) 1219 1783 goto out; 1220 1784 1221 - #if 0 1222 - bch2_mark_pending_btree_node_frees(c); 1223 - #endif 1224 1785 c->gc_count++; 1225 1786 1226 - if (test_bit(BCH_FS_need_another_gc, &c->flags) || 1227 - (!iter && bch2_test_restart_gc)) { 1228 - if (iter++ > 2) { 1229 - bch_info(c, "Unable to fix bucket gens, looping"); 1230 - ret = -EINVAL; 1231 - goto out; 1232 - } 1233 - 1234 - /* 1235 - * XXX: make sure gens we fixed got saved 1236 - */ 1237 - bch_info(c, "Second GC pass needed, restarting:"); 1238 - clear_bit(BCH_FS_need_another_gc, &c->flags); 1239 - __gc_pos_set(c, gc_phase(GC_PHASE_NOT_RUNNING)); 1240 - 1241 - bch2_gc_stripes_reset(c, metadata_only); 1242 - bch2_gc_alloc_reset(c, metadata_only); 1243 - bch2_gc_reflink_reset(c, metadata_only); 1244 - ret = bch2_gc_reset(c); 1245 - if (ret) 1246 - goto out; 1247 - 1248 - /* flush fsck errors, reset counters */ 1249 - bch2_flush_fsck_errs(c); 1250 - goto again; 1251 - } 1787 + bch2_journal_block(&c->journal); 1252 1788 out: 1253 - if (!ret) { 1254 - bch2_journal_block(&c->journal); 1789 + ret = bch2_gc_alloc_done(c) ?: 1790 + bch2_gc_done(c) ?: 1791 + bch2_gc_stripes_done(c) ?: 1792 + bch2_gc_reflink_done(c); 1255 1793 1256 - ret = bch2_gc_alloc_done(c, metadata_only) ?: 1257 - bch2_gc_done(c, initial, metadata_only) ?: 1258 - bch2_gc_stripes_done(c, metadata_only) ?: 1259 - bch2_gc_reflink_done(c, metadata_only); 1260 - 1261 - bch2_journal_unblock(&c->journal); 1262 - } 1794 + bch2_journal_unblock(&c->journal); 1263 1795 1264 1796 percpu_down_write(&c->mark_lock); 1265 1797 /* Indicates 
that gc is no longer in progress: */ ··· 1256 1852 struct bkey_i *u; 1257 1853 int ret; 1258 1854 1259 - percpu_down_read(&c->mark_lock); 1260 - bkey_for_each_ptr(ptrs, ptr) { 1261 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 1855 + if (unlikely(test_bit(BCH_FS_going_ro, &c->flags))) 1856 + return -EROFS; 1262 1857 1263 - if (ptr_stale(ca, ptr) > 16) { 1858 + percpu_down_read(&c->mark_lock); 1859 + rcu_read_lock(); 1860 + bkey_for_each_ptr(ptrs, ptr) { 1861 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1862 + if (!ca) 1863 + continue; 1864 + 1865 + if (dev_ptr_stale(ca, ptr) > 16) { 1866 + rcu_read_unlock(); 1264 1867 percpu_up_read(&c->mark_lock); 1265 1868 goto update; 1266 1869 } 1267 1870 } 1268 1871 1269 1872 bkey_for_each_ptr(ptrs, ptr) { 1270 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 1271 - u8 *gen = &ca->oldest_gen[PTR_BUCKET_NR(ca, ptr)]; 1873 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1874 + if (!ca) 1875 + continue; 1272 1876 1877 + u8 *gen = &ca->oldest_gen[PTR_BUCKET_NR(ca, ptr)]; 1273 1878 if (gen_after(*gen, ptr->gen)) 1274 1879 *gen = ptr->gen; 1275 1880 } 1881 + rcu_read_unlock(); 1276 1882 percpu_up_read(&c->mark_lock); 1277 1883 return 0; 1278 1884 update: ··· 1295 1881 return 0; 1296 1882 } 1297 1883 1298 - static int bch2_alloc_write_oldest_gen(struct btree_trans *trans, struct btree_iter *iter, 1299 - struct bkey_s_c k) 1884 + static int bch2_alloc_write_oldest_gen(struct btree_trans *trans, struct bch_dev *ca, 1885 + struct btree_iter *iter, struct bkey_s_c k) 1300 1886 { 1301 - struct bch_dev *ca = bch_dev_bkey_exists(trans->c, iter->pos.inode); 1302 1887 struct bch_alloc_v4 a_convert; 1303 1888 const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &a_convert); 1304 1889 struct bkey_i_alloc_v4 *a_mut; ··· 1312 1899 return ret; 1313 1900 1314 1901 a_mut->v.oldest_gen = ca->oldest_gen[iter->pos.offset]; 1315 - a_mut->v.data_type = alloc_data_type(a_mut->v, a_mut->v.data_type); 1902 + alloc_data_type_set(&a_mut->v, a_mut->v.data_type); 1316 1903 1317 1904 return bch2_trans_update(trans, iter, &a_mut->k_i, 0); 1318 1905 } ··· 1340 1927 1341 1928 ca->oldest_gen = kvmalloc(gens->nbuckets, GFP_KERNEL); 1342 1929 if (!ca->oldest_gen) { 1343 - percpu_ref_put(&ca->ref); 1930 + bch2_dev_put(ca); 1344 1931 ret = -BCH_ERR_ENOMEM_gc_gens; 1345 1932 goto err; 1346 1933 } ··· 1358 1945 ret = bch2_trans_run(c, 1359 1946 for_each_btree_key_commit(trans, iter, i, 1360 1947 POS_MIN, 1361 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, 1948 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, 1362 1949 k, 1363 1950 NULL, NULL, 1364 1951 BCH_TRANS_COMMIT_no_enospc, ··· 1367 1954 goto err; 1368 1955 } 1369 1956 1957 + struct bch_dev *ca = NULL; 1370 1958 ret = bch2_trans_run(c, 1371 1959 for_each_btree_key_commit(trans, iter, BTREE_ID_alloc, 1372 1960 POS_MIN, 1373 - BTREE_ITER_PREFETCH, 1961 + BTREE_ITER_prefetch, 1374 1962 k, 1375 1963 NULL, NULL, 1376 - BCH_TRANS_COMMIT_no_enospc, 1377 - bch2_alloc_write_oldest_gen(trans, &iter, k))); 1964 + BCH_TRANS_COMMIT_no_enospc, ({ 1965 + ca = bch2_dev_iterate(c, ca, k.k->p.inode); 1966 + if (!ca) { 1967 + bch2_btree_iter_set_pos(&iter, POS(k.k->p.inode + 1, 0)); 1968 + continue; 1969 + } 1970 + bch2_alloc_write_oldest_gen(trans, ca, &iter, k); 1971 + }))); 1972 + bch2_dev_put(ca); 1973 + 1378 1974 if (ret) 1379 1975 goto err; 1380 1976 ··· 1407 1985 return ret; 1408 1986 } 1409 1987 1410 - static int bch2_gc_thread(void *arg) 1988 + static void bch2_gc_gens_work(struct work_struct *work) 1411 1989 { 1412 - struct 
bch_fs *c = arg; 1413 - struct io_clock *clock = &c->io_clock[WRITE]; 1414 - unsigned long last = atomic64_read(&clock->now); 1415 - unsigned last_kick = atomic_read(&c->kick_gc); 1416 - 1417 - set_freezable(); 1418 - 1419 - while (1) { 1420 - while (1) { 1421 - set_current_state(TASK_INTERRUPTIBLE); 1422 - 1423 - if (kthread_should_stop()) { 1424 - __set_current_state(TASK_RUNNING); 1425 - return 0; 1426 - } 1427 - 1428 - if (atomic_read(&c->kick_gc) != last_kick) 1429 - break; 1430 - 1431 - if (c->btree_gc_periodic) { 1432 - unsigned long next = last + c->capacity / 16; 1433 - 1434 - if (atomic64_read(&clock->now) >= next) 1435 - break; 1436 - 1437 - bch2_io_clock_schedule_timeout(clock, next); 1438 - } else { 1439 - schedule(); 1440 - } 1441 - 1442 - try_to_freeze(); 1443 - } 1444 - __set_current_state(TASK_RUNNING); 1445 - 1446 - last = atomic64_read(&clock->now); 1447 - last_kick = atomic_read(&c->kick_gc); 1448 - 1449 - /* 1450 - * Full gc is currently incompatible with btree key cache: 1451 - */ 1452 - #if 0 1453 - ret = bch2_gc(c, false, false); 1454 - #else 1455 - bch2_gc_gens(c); 1456 - #endif 1457 - debug_check_no_locks_held(); 1458 - } 1459 - 1460 - return 0; 1990 + struct bch_fs *c = container_of(work, struct bch_fs, gc_gens_work); 1991 + bch2_gc_gens(c); 1992 + bch2_write_ref_put(c, BCH_WRITE_REF_gc_gens); 1461 1993 } 1462 1994 1463 - void bch2_gc_thread_stop(struct bch_fs *c) 1995 + void bch2_gc_gens_async(struct bch_fs *c) 1464 1996 { 1465 - struct task_struct *p; 1466 - 1467 - p = c->gc_thread; 1468 - c->gc_thread = NULL; 1469 - 1470 - if (p) { 1471 - kthread_stop(p); 1472 - put_task_struct(p); 1473 - } 1997 + if (bch2_write_ref_tryget(c, BCH_WRITE_REF_gc_gens) && 1998 + !queue_work(c->write_ref_wq, &c->gc_gens_work)) 1999 + bch2_write_ref_put(c, BCH_WRITE_REF_gc_gens); 1474 2000 } 1475 2001 1476 - int bch2_gc_thread_start(struct bch_fs *c) 2002 + void bch2_fs_gc_init(struct bch_fs *c) 1477 2003 { 1478 - struct task_struct *p; 2004 + seqcount_init(&c->gc_pos_lock); 1479 2005 1480 - if (c->gc_thread) 1481 - return 0; 1482 - 1483 - p = kthread_create(bch2_gc_thread, c, "bch-gc/%s", c->name); 1484 - if (IS_ERR(p)) { 1485 - bch_err_fn(c, PTR_ERR(p)); 1486 - return PTR_ERR(p); 1487 - } 1488 - 1489 - get_task_struct(p); 1490 - c->gc_thread = p; 1491 - wake_up_process(p); 1492 - return 0; 2006 + INIT_WORK(&c->gc_gens_work, bch2_gc_gens_work); 1493 2007 }
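Part of the device-reference safety series: the alloc-key walks above no longer assume a bkey's device exists (the old bch_dev_bkey_exists() is gone), they look it up and skip ahead when it is missing, and bch2_dev_iterate() appears to keep the previously acquired reference alive across consecutive keys on the same device. A userspace sketch of that caching idea, simplified (the real code also repositions the btree iterator past the missing device, and the refcounting is percpu):

#include <stdio.h>

struct dev_toy { unsigned idx; int refs; };

static struct dev_toy devs[4] = { {0, 0}, {1, 0}, {2, 0}, {3, 0} };
static const int dev_exists[4] = { 1, 0, 1, 1 };    /* device 1 was removed */

static struct dev_toy *dev_get(unsigned idx)
{
    if (idx >= 4 || !dev_exists[idx])
        return NULL;
    devs[idx].refs++;
    return &devs[idx];
}

static void dev_put(struct dev_toy *d)
{
    if (d)
        d->refs--;
}

/* reuse the held reference when the device index hasn't changed */
static struct dev_toy *dev_iterate(struct dev_toy *prev, unsigned idx)
{
    if (prev && prev->idx == idx)
        return prev;
    dev_put(prev);
    return dev_get(idx);
}

int main(void)
{
    unsigned key_dev[] = { 0, 0, 1, 2, 2, 3 };
    struct dev_toy *ca = NULL;

    for (int i = 0; i < 6; i++) {
        ca = dev_iterate(ca, key_dev[i]);
        if (!ca) {
            printf("key %d: device %u gone, skipping\n", i, key_dev[i]);
            continue;
        }
        printf("key %d: device %u\n", i, ca->idx);
    }
    dev_put(ca);
    return 0;
}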
fs/bcachefs/btree_gc.h  +13 -31
··· 6 6 #include "btree_types.h" 7 7 8 8 int bch2_check_topology(struct bch_fs *); 9 - int bch2_gc(struct bch_fs *, bool, bool); 10 - int bch2_gc_gens(struct bch_fs *); 11 - void bch2_gc_thread_stop(struct bch_fs *); 12 - int bch2_gc_thread_start(struct bch_fs *); 9 + int bch2_check_allocations(struct bch_fs *); 13 10 14 11 /* 15 12 * For concurrent mark and sweep (with other index updates), we define a total ··· 34 37 { 35 38 return (struct gc_pos) { 36 39 .phase = phase, 37 - .pos = POS_MIN, 38 40 .level = 0, 41 + .pos = POS_MIN, 39 42 }; 40 43 } 41 44 42 45 static inline int gc_pos_cmp(struct gc_pos l, struct gc_pos r) 43 46 { 44 - return cmp_int(l.phase, r.phase) ?: 45 - bpos_cmp(l.pos, r.pos) ?: 46 - cmp_int(l.level, r.level); 47 + return cmp_int(l.phase, r.phase) ?: 48 + -cmp_int(l.level, r.level) ?: 49 + bpos_cmp(l.pos, r.pos); 47 50 } 48 51 49 52 static inline enum gc_phase btree_id_to_gc_phase(enum btree_id id) ··· 57 60 } 58 61 } 59 62 60 - static inline struct gc_pos gc_pos_btree(enum btree_id id, 61 - struct bpos pos, unsigned level) 63 + static inline struct gc_pos gc_pos_btree(enum btree_id btree, unsigned level, 64 + struct bpos pos) 62 65 { 63 66 return (struct gc_pos) { 64 - .phase = btree_id_to_gc_phase(id), 65 - .pos = pos, 67 + .phase = btree_id_to_gc_phase(btree), 66 68 .level = level, 69 + .pos = pos, 67 70 }; 68 71 } 69 72 ··· 73 76 */ 74 77 static inline struct gc_pos gc_pos_btree_node(struct btree *b) 75 78 { 76 - return gc_pos_btree(b->c.btree_id, b->key.k.p, b->c.level); 77 - } 78 - 79 - /* 80 - * GC position of the pointer to a btree root: we don't use 81 - * gc_pos_pointer_to_btree_node() here to avoid a potential race with 82 - * btree_split() increasing the tree depth - the new root will have level > the 83 - * old root and thus have a greater gc position than the old root, but that 84 - * would be incorrect since once gc has marked the root it's not coming back. 85 - */ 86 - static inline struct gc_pos gc_pos_btree_root(enum btree_id id) 87 - { 88 - return gc_pos_btree(id, SPOS_MAX, BTREE_MAX_DEPTH); 79 + return gc_pos_btree(b->c.btree_id, b->c.level, b->key.k.p); 89 80 } 90 81 91 82 static inline bool gc_visited(struct bch_fs *c, struct gc_pos pos) ··· 89 104 return ret; 90 105 } 91 106 92 - static inline void bch2_do_gc_gens(struct bch_fs *c) 93 - { 94 - atomic_inc(&c->kick_gc); 95 - if (c->gc_thread) 96 - wake_up_process(c->gc_thread); 97 - } 107 + int bch2_gc_gens(struct bch_fs *); 108 + void bch2_gc_gens_async(struct bch_fs *); 109 + void bch2_fs_gc_init(struct bch_fs *); 98 110 99 111 #endif /* _BCACHEFS_BTREE_GC_H */
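The comparator rework above is subtle: level is now compared before pos and negated, so within a btree the ordering runs root-to-leaf, matching the new top-down mark pass in bch2_gc_btree() and making the dedicated gc_pos_btree_root() helper unnecessary. A toy mirror of the new ordering, using a plain integer in place of bpos:

#include <stdio.h>

#define cmp_int(l, r)   (((l) > (r)) - ((l) < (r)))

struct gc_pos_toy {
    int phase;
    unsigned level;
    unsigned long long pos;
};

static int gc_pos_cmp_toy(struct gc_pos_toy l, struct gc_pos_toy r)
{
    int c = cmp_int(l.phase, r.phase);
    if (c)
        return c;
    c = -cmp_int(l.level, r.level); /* higher levels order first */
    if (c)
        return c;
    return cmp_int(l.pos, r.pos);
}

int main(void)
{
    struct gc_pos_toy root = { .phase = 1, .level = 2, .pos = 0 };
    struct gc_pos_toy leaf = { .phase = 1, .level = 0, .pos = 100 };

    printf("%d\n", gc_pos_cmp_toy(root, leaf));  /* -1: root marked first */
    return 0;
}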
+62 -55
fs/bcachefs/btree_io.c
··· 23 23 24 24 #include <linux/sched/mm.h> 25 25 26 + static void bch2_btree_node_header_to_text(struct printbuf *out, struct btree_node *bn) 27 + { 28 + prt_printf(out, "btree=%s l=%u seq %llux\n", 29 + bch2_btree_id_str(BTREE_NODE_ID(bn)), 30 + (unsigned) BTREE_NODE_LEVEL(bn), bn->keys.seq); 31 + prt_str(out, "min: "); 32 + bch2_bpos_to_text(out, bn->min_key); 33 + prt_newline(out); 34 + prt_str(out, "max: "); 35 + bch2_bpos_to_text(out, bn->max_key); 36 + } 37 + 26 38 void bch2_btree_node_io_unlock(struct btree *b) 27 39 { 28 40 EBUG_ON(!btree_node_write_in_flight(b)); ··· 229 217 230 218 static bool bch2_drop_whiteouts(struct btree *b, enum compact_mode mode) 231 219 { 232 - struct bset_tree *t; 233 220 bool ret = false; 234 221 235 222 for_each_bset(b, t) { ··· 299 288 300 289 static void btree_node_sort(struct bch_fs *c, struct btree *b, 301 290 unsigned start_idx, 302 - unsigned end_idx, 303 - bool filter_whiteouts) 291 + unsigned end_idx) 304 292 { 305 293 struct btree_node *out; 306 294 struct sort_iter_stack sort_iter; ··· 330 320 331 321 start_time = local_clock(); 332 322 333 - u64s = bch2_sort_keys(out->keys.start, &sort_iter.iter, filter_whiteouts); 323 + u64s = bch2_sort_keys(out->keys.start, &sort_iter.iter); 334 324 335 325 out->keys.u64s = cpu_to_le16(u64s); 336 326 ··· 436 426 break; 437 427 438 428 if (b->nsets - unwritten_idx > 1) { 439 - btree_node_sort(c, b, unwritten_idx, 440 - b->nsets, false); 429 + btree_node_sort(c, b, unwritten_idx, b->nsets); 441 430 ret = true; 442 431 } 443 432 444 433 if (unwritten_idx > 1) { 445 - btree_node_sort(c, b, 0, unwritten_idx, false); 434 + btree_node_sort(c, b, 0, unwritten_idx); 446 435 ret = true; 447 436 } 448 437 ··· 450 441 451 442 void bch2_btree_build_aux_trees(struct btree *b) 452 443 { 453 - struct bset_tree *t; 454 - 455 444 for_each_bset(b, t) 456 445 bch2_bset_build_aux_tree(b, t, 457 446 !bset_written(b, bset(b, t)) && ··· 531 524 prt_printf(out, "at btree "); 532 525 bch2_btree_pos_to_text(out, c, b); 533 526 534 - prt_printf(out, "\n node offset %u/%u", 527 + printbuf_indent_add(out, 2); 528 + 529 + prt_printf(out, "\nnode offset %u/%u", 535 530 b->written, btree_ptr_sectors_written(&b->key)); 536 531 if (i) 537 532 prt_printf(out, " bset u64s %u", le16_to_cpu(i->u64s)); ··· 552 543 const char *fmt, ...) 553 544 { 554 545 struct printbuf out = PRINTBUF; 546 + bool silent = c->curr_recovery_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes; 555 547 va_list args; 556 548 557 549 btree_err_msg(&out, c, ca, b, i, b->written, write); ··· 574 564 if (!have_retry && ret == -BCH_ERR_btree_node_read_err_must_retry) 575 565 ret = -BCH_ERR_btree_node_read_err_bad_node; 576 566 577 - if (ret != -BCH_ERR_btree_node_read_err_fixable) 567 + if (!silent && ret != -BCH_ERR_btree_node_read_err_fixable) 578 568 bch2_sb_error_count(c, err_type); 579 569 580 570 switch (ret) { 581 571 case -BCH_ERR_btree_node_read_err_fixable: 582 - ret = bch2_fsck_err(c, FSCK_CAN_FIX, err_type, "%s", out.buf); 572 + ret = !silent 573 + ? 
bch2_fsck_err(c, FSCK_CAN_FIX, err_type, "%s", out.buf) 574 + : -BCH_ERR_fsck_fix; 583 575 if (ret != -BCH_ERR_fsck_fix && 584 576 ret != -BCH_ERR_fsck_ignore) 585 577 goto fsck_err; ··· 589 577 break; 590 578 case -BCH_ERR_btree_node_read_err_want_retry: 591 579 case -BCH_ERR_btree_node_read_err_must_retry: 592 - bch2_print_string_as_lines(KERN_ERR, out.buf); 580 + if (!silent) 581 + bch2_print_string_as_lines(KERN_ERR, out.buf); 593 582 break; 594 583 case -BCH_ERR_btree_node_read_err_bad_node: 595 - bch2_print_string_as_lines(KERN_ERR, out.buf); 584 + if (!silent) 585 + bch2_print_string_as_lines(KERN_ERR, out.buf); 596 586 ret = bch2_topology_error(c); 597 587 break; 598 588 case -BCH_ERR_btree_node_read_err_incompatible: 599 - bch2_print_string_as_lines(KERN_ERR, out.buf); 589 + if (!silent) 590 + bch2_print_string_as_lines(KERN_ERR, out.buf); 600 591 ret = -BCH_ERR_fsck_errors_not_fixed; 601 592 break; 602 593 default: ··· 634 619 __cold 635 620 void bch2_btree_node_drop_keys_outside_node(struct btree *b) 636 621 { 637 - struct bset_tree *t; 638 - 639 622 for_each_bset(b, t) { 640 623 struct bset *i = bset(b, t); 641 624 struct bkey_packed *k; ··· 1034 1021 -BCH_ERR_btree_node_read_err_must_retry, 1035 1022 c, ca, b, NULL, 1036 1023 btree_node_bad_seq, 1037 - "got wrong btree node (want %llx got %llx)\n" 1038 - "got btree %s level %llu pos %s", 1039 - bp->seq, b->data->keys.seq, 1040 - bch2_btree_id_str(BTREE_NODE_ID(b->data)), 1041 - BTREE_NODE_LEVEL(b->data), 1042 - buf.buf); 1024 + "got wrong btree node: got\n%s", 1025 + (printbuf_reset(&buf), 1026 + bch2_btree_node_header_to_text(&buf, b->data), 1027 + buf.buf)); 1043 1028 } else { 1044 1029 btree_err_on(!b->data->keys.seq, 1045 1030 -BCH_ERR_btree_node_read_err_must_retry, 1046 1031 c, ca, b, NULL, 1047 1032 btree_node_bad_seq, 1048 - "bad btree header: seq 0"); 1033 + "bad btree header: seq 0\n%s", 1034 + (printbuf_reset(&buf), 1035 + bch2_btree_node_header_to_text(&buf, b->data), 1036 + buf.buf)); 1049 1037 } 1050 1038 1051 1039 while (b->written < (ptr_written ?: btree_sectors(c))) { ··· 1109 1095 nonce = btree_nonce(i, b->written << 9); 1110 1096 struct bch_csum csum = csum_vstruct(c, BSET_CSUM_TYPE(i), nonce, bne); 1111 1097 csum_bad = bch2_crc_cmp(bne->csum, csum); 1112 - if (csum_bad) 1098 + if (ca && csum_bad) 1113 1099 bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); 1114 1100 1115 1101 btree_err_on(csum_bad, ··· 1263 1249 1264 1250 btree_node_reset_sib_u64s(b); 1265 1251 1252 + rcu_read_lock(); 1266 1253 bkey_for_each_ptr(bch2_bkey_ptrs(bkey_i_to_s(&b->key)), ptr) { 1267 - struct bch_dev *ca2 = bch_dev_bkey_exists(c, ptr->dev); 1254 + struct bch_dev *ca2 = bch2_dev_rcu(c, ptr->dev); 1268 1255 1269 - if (ca2->mi.state != BCH_MEMBER_STATE_rw) 1256 + if (!ca2 || ca2->mi.state != BCH_MEMBER_STATE_rw) 1270 1257 set_btree_node_need_rewrite(b); 1271 1258 } 1259 + rcu_read_unlock(); 1272 1260 1273 1261 if (!ptr_written) 1274 1262 set_btree_node_need_rewrite(b); ··· 1295 1279 struct btree_read_bio *rb = 1296 1280 container_of(work, struct btree_read_bio, work); 1297 1281 struct bch_fs *c = rb->c; 1282 + struct bch_dev *ca = rb->have_ioref ? 
bch2_dev_have_ref(c, rb->pick.ptr.dev) : NULL; 1298 1283 struct btree *b = rb->b; 1299 - struct bch_dev *ca = bch_dev_bkey_exists(c, rb->pick.ptr.dev); 1300 1284 struct bio *bio = &rb->bio; 1301 1285 struct bch_io_failures failed = { .nr = 0 }; 1302 1286 struct printbuf buf = PRINTBUF; ··· 1308 1292 while (1) { 1309 1293 retry = true; 1310 1294 bch_info(c, "retrying read"); 1311 - ca = bch_dev_bkey_exists(c, rb->pick.ptr.dev); 1312 - rb->have_ioref = bch2_dev_get_ioref(ca, READ); 1295 + ca = bch2_dev_get_ioref(c, rb->pick.ptr.dev, READ); 1296 + rb->have_ioref = ca != NULL; 1313 1297 bio_reset(bio, NULL, REQ_OP_READ|REQ_SYNC|REQ_META); 1314 1298 bio->bi_iter.bi_sector = rb->pick.ptr.offset; 1315 1299 bio->bi_iter.bi_size = btree_buf_bytes(b); ··· 1323 1307 start: 1324 1308 printbuf_reset(&buf); 1325 1309 bch2_btree_pos_to_text(&buf, c, b); 1326 - bch2_dev_io_err_on(bio->bi_status, ca, BCH_MEMBER_ERROR_read, 1310 + bch2_dev_io_err_on(ca && bio->bi_status, ca, BCH_MEMBER_ERROR_read, 1327 1311 "btree read error %s for %s", 1328 1312 bch2_blk_status_to_str(bio->bi_status), buf.buf); 1329 1313 if (rb->have_ioref) ··· 1379 1363 struct bch_fs *c = rb->c; 1380 1364 1381 1365 if (rb->have_ioref) { 1382 - struct bch_dev *ca = bch_dev_bkey_exists(c, rb->pick.ptr.dev); 1366 + struct bch_dev *ca = bch2_dev_have_ref(c, rb->pick.ptr.dev); 1383 1367 1384 1368 bch2_latency_acct(ca, rb->start_time, READ); 1385 1369 } ··· 1576 1560 struct btree_node_read_all *ra = rb->ra; 1577 1561 1578 1562 if (rb->have_ioref) { 1579 - struct bch_dev *ca = bch_dev_bkey_exists(c, rb->pick.ptr.dev); 1563 + struct bch_dev *ca = bch2_dev_have_ref(c, rb->pick.ptr.dev); 1580 1564 1581 1565 bch2_latency_acct(ca, rb->start_time, READ); 1582 1566 } ··· 1618 1602 1619 1603 i = 0; 1620 1604 bkey_for_each_ptr_decode(k.k, ptrs, pick, entry) { 1621 - struct bch_dev *ca = bch_dev_bkey_exists(c, pick.ptr.dev); 1605 + struct bch_dev *ca = bch2_dev_get_ioref(c, pick.ptr.dev, READ); 1622 1606 struct btree_read_bio *rb = 1623 1607 container_of(ra->bio[i], struct btree_read_bio, bio); 1624 1608 rb->c = c; 1625 1609 rb->b = b; 1626 1610 rb->ra = ra; 1627 1611 rb->start_time = local_clock(); 1628 - rb->have_ioref = bch2_dev_get_ioref(ca, READ); 1612 + rb->have_ioref = ca != NULL; 1629 1613 rb->idx = i; 1630 1614 rb->pick = pick; 1631 1615 rb->bio.bi_iter.bi_sector = pick.ptr.offset; ··· 1695 1679 return; 1696 1680 } 1697 1681 1698 - ca = bch_dev_bkey_exists(c, pick.ptr.dev); 1682 + ca = bch2_dev_get_ioref(c, pick.ptr.dev, READ); 1699 1683 1700 1684 bio = bio_alloc_bioset(NULL, 1701 1685 buf_pages(b->data, btree_buf_bytes(b)), ··· 1707 1691 rb->b = b; 1708 1692 rb->ra = NULL; 1709 1693 rb->start_time = local_clock(); 1710 - rb->have_ioref = bch2_dev_get_ioref(ca, READ); 1694 + rb->have_ioref = ca != NULL; 1711 1695 rb->pick = pick; 1712 1696 INIT_WORK(&rb->work, btree_node_read_work); 1713 1697 bio->bi_iter.bi_sector = pick.ptr.offset; ··· 1862 1846 container_of(work, struct btree_write_bio, work); 1863 1847 struct bch_fs *c = wbio->wbio.c; 1864 1848 struct btree *b = wbio->wbio.bio.bi_private; 1865 - struct bch_extent_ptr *ptr; 1866 1849 int ret = 0; 1867 1850 1868 1851 btree_bounce_free(c, ··· 1911 1896 struct btree_write_bio *wb = container_of(orig, struct btree_write_bio, wbio); 1912 1897 struct bch_fs *c = wbio->c; 1913 1898 struct btree *b = wbio->bio.bi_private; 1914 - struct bch_dev *ca = bch_dev_bkey_exists(c, wbio->dev); 1899 + struct bch_dev *ca = wbio->have_ioref ? 
bch2_dev_have_ref(c, wbio->dev) : NULL; 1915 1900 unsigned long flags; 1916 1901 1917 1902 if (wbio->have_ioref) 1918 1903 bch2_latency_acct(ca, wbio->submit_time, WRITE); 1919 1904 1920 - if (bch2_dev_io_err_on(bio->bi_status, ca, BCH_MEMBER_ERROR_write, 1905 + if (!ca || 1906 + bch2_dev_io_err_on(bio->bi_status, ca, BCH_MEMBER_ERROR_write, 1921 1907 "btree write error: %s", 1922 1908 bch2_blk_status_to_str(bio->bi_status)) || 1923 1909 bch2_meta_write_fault("btree")) { ··· 1985 1969 void __bch2_btree_node_write(struct bch_fs *c, struct btree *b, unsigned flags) 1986 1970 { 1987 1971 struct btree_write_bio *wbio; 1988 - struct bset_tree *t; 1989 1972 struct bset *i; 1990 1973 struct btree_node *bn = NULL; 1991 1974 struct btree_node_entry *bne = NULL; ··· 2110 2095 unwritten_whiteouts_end(b)); 2111 2096 SET_BSET_SEPARATE_WHITEOUTS(i, false); 2112 2097 2113 - b->whiteout_u64s = 0; 2114 - 2115 - u64s = bch2_sort_keys(i->start, &sort_iter.iter, false); 2098 + u64s = bch2_sort_keys_keep_unwritten_whiteouts(i->start, &sort_iter.iter); 2116 2099 le16_add_cpu(&i->u64s, u64s); 2100 + 2101 + b->whiteout_u64s = 0; 2117 2102 2118 2103 BUG_ON(!b->written && i->u64s != b->data->keys.u64s); 2119 2104 ··· 2241 2226 { 2242 2227 bool invalidated_iter = false; 2243 2228 struct btree_node_entry *bne; 2244 - struct bset_tree *t; 2245 2229 2246 2230 if (!btree_node_just_written(b)) 2247 2231 return false; ··· 2263 2249 * single bset: 2264 2250 */ 2265 2251 if (b->nsets > 1) { 2266 - btree_node_sort(c, b, 0, b->nsets, true); 2252 + btree_node_sort(c, b, 0, b->nsets); 2267 2253 invalidated_iter = true; 2268 2254 } else { 2269 2255 invalidated_iter = bch2_drop_whiteouts(b, COMPACT_ALL); ··· 2360 2346 printbuf_tabstop_push(out, 20); 2361 2347 printbuf_tabstop_push(out, 10); 2362 2348 2363 - prt_tab(out); 2364 - prt_str(out, "nr"); 2365 - prt_tab(out); 2366 - prt_str(out, "size"); 2367 - prt_newline(out); 2349 + prt_printf(out, "\tnr\tsize\n"); 2368 2350 2369 2351 for (unsigned i = 0; i < BTREE_WRITE_TYPE_NR; i++) { 2370 2352 u64 nr = atomic64_read(&c->btree_write_stats[i].nr); 2371 2353 u64 bytes = atomic64_read(&c->btree_write_stats[i].bytes); 2372 2354 2373 - prt_printf(out, "%s:", bch2_btree_write_types[i]); 2374 - prt_tab(out); 2375 - prt_u64(out, nr); 2376 - prt_tab(out); 2355 + prt_printf(out, "%s:\t%llu\t", bch2_btree_write_types[i], nr); 2377 2356 prt_human_readable_u64(out, nr ? div64_u64(bytes, nr) : 0); 2378 2357 prt_newline(out); 2379 2358 }
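One pattern repeats throughout this file's hunks: bch_dev_bkey_exists(), which assumed a pointer's device must exist, is replaced by lookups that can fail — bch2_dev_rcu() under rcu_read_lock(), bch2_dev_get_ioref() combining the lookup with taking an io reference, and bch2_dev_have_ref() where a reference is already held — and every caller now tolerates ca == NULL. A compilable userspace sketch of that discipline; the helpers below are simplified stand-ins (the real ones involve RCU and percpu refcounts):

  #include <stdatomic.h>
  #include <stddef.h>
  #include <stdio.h>

  #define MEMBERS_MAX 64          /* illustrative bound */

  struct bch_dev { int idx; atomic_int io_ref; };
  struct bch_fs  { struct bch_dev *devs[MEMBERS_MAX]; };

  /* may return NULL: a bkey can reference a device that was removed */
  static struct bch_dev *dev_rcu(struct bch_fs *c, unsigned dev)
  {
          return dev < MEMBERS_MAX ? c->devs[dev] : NULL;
  }

  /* lookup + take an io ref in one step; NULL means "skip this device" */
  static struct bch_dev *dev_get_ioref(struct bch_fs *c, unsigned dev)
  {
          struct bch_dev *ca = dev_rcu(c, dev);

          if (ca)
                  atomic_fetch_add(&ca->io_ref, 1);
          return ca;
  }

  static void dev_put_ioref(struct bch_dev *ca)
  {
          atomic_fetch_sub(&ca->io_ref, 1);
  }

  int main(void)
  {
          struct bch_dev d0 = { .idx = 0 };
          struct bch_fs c = { .devs = { [0] = &d0 } };

          struct bch_dev *ca = dev_get_ioref(&c, 0);
          if (ca) {
                  printf("dev %d ref %d\n", ca->idx, atomic_load(&ca->io_ref));
                  dev_put_ioref(ca);
          }

          /* device 1 was removed or never existed: caller handles NULL */
          if (!dev_get_ioref(&c, 1))
                  puts("dev 1 gone, stale pointer tolerated");
          return 0;
  }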
-2
fs/bcachefs/btree_io.h
··· 81 81 82 82 static inline bool bch2_maybe_compact_whiteouts(struct bch_fs *c, struct btree *b) 83 83 { 84 - struct bset_tree *t; 85 - 86 84 for_each_bset(b, t) 87 85 if (should_compact_bset_lazy(b, t)) 88 86 return bch2_compact_whiteouts(c, b, COMPACT_LAZY);
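Both this header and btree_io.c drop a free-standing "struct bset_tree *t;" ahead of for_each_bset() loops, which implies the macro now declares its own iterator — the same scoping cleanup the kernel applied to its list iterators. A toy model of that macro style (all types invented for illustration):

  #include <stdio.h>

  struct bset_tree { int id; };
  struct btree { struct bset_tree set[3]; unsigned nsets; };

  /* the iterator variable lives only inside the loop */
  #define for_each_bset(b, t)                                     \
          for (struct bset_tree *t = (b)->set;                    \
               t < (b)->set + (b)->nsets;                         \
               t++)

  int main(void)
  {
          struct btree b = { .set = { {0}, {1}, {2} }, .nsets = 3 };

          for_each_bset(&b, t)    /* no separate declaration of t */
                  printf("bset %d\n", t->id);
          return 0;
  }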
+238 -109
fs/bcachefs/btree_iter.c
··· 61 61 static inline struct bpos bkey_successor(struct btree_iter *iter, struct bpos p) 62 62 { 63 63 /* Are we iterating over keys in all snapshots? */ 64 - if (iter->flags & BTREE_ITER_ALL_SNAPSHOTS) { 64 + if (iter->flags & BTREE_ITER_all_snapshots) { 65 65 p = bpos_successor(p); 66 66 } else { 67 67 p = bpos_nosnap_successor(p); ··· 74 74 static inline struct bpos bkey_predecessor(struct btree_iter *iter, struct bpos p) 75 75 { 76 76 /* Are we iterating over keys in all snapshots? */ 77 - if (iter->flags & BTREE_ITER_ALL_SNAPSHOTS) { 77 + if (iter->flags & BTREE_ITER_all_snapshots) { 78 78 p = bpos_predecessor(p); 79 79 } else { 80 80 p = bpos_nosnap_predecessor(p); ··· 88 88 { 89 89 struct bpos pos = iter->pos; 90 90 91 - if ((iter->flags & BTREE_ITER_IS_EXTENTS) && 91 + if ((iter->flags & BTREE_ITER_is_extents) && 92 92 !bkey_eq(pos, POS_MAX)) 93 93 pos = bkey_successor(iter, pos); 94 94 return pos; ··· 253 253 254 254 BUG_ON(iter->btree_id >= BTREE_ID_NR); 255 255 256 - BUG_ON(!!(iter->flags & BTREE_ITER_CACHED) != btree_iter_path(trans, iter)->cached); 256 + BUG_ON(!!(iter->flags & BTREE_ITER_cached) != btree_iter_path(trans, iter)->cached); 257 257 258 - BUG_ON((iter->flags & BTREE_ITER_IS_EXTENTS) && 259 - (iter->flags & BTREE_ITER_ALL_SNAPSHOTS)); 258 + BUG_ON((iter->flags & BTREE_ITER_is_extents) && 259 + (iter->flags & BTREE_ITER_all_snapshots)); 260 260 261 - BUG_ON(!(iter->flags & __BTREE_ITER_ALL_SNAPSHOTS) && 262 - (iter->flags & BTREE_ITER_ALL_SNAPSHOTS) && 261 + BUG_ON(!(iter->flags & BTREE_ITER_snapshot_field) && 262 + (iter->flags & BTREE_ITER_all_snapshots) && 263 263 !btree_type_has_snapshot_field(iter->btree_id)); 264 264 265 265 if (iter->update_path) ··· 269 269 270 270 static void bch2_btree_iter_verify_entry_exit(struct btree_iter *iter) 271 271 { 272 - BUG_ON((iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) && 272 + BUG_ON((iter->flags & BTREE_ITER_filter_snapshots) && 273 273 !iter->pos.snapshot); 274 274 275 - BUG_ON(!(iter->flags & BTREE_ITER_ALL_SNAPSHOTS) && 275 + BUG_ON(!(iter->flags & BTREE_ITER_all_snapshots) && 276 276 iter->pos.snapshot != iter->snapshot); 277 277 278 278 BUG_ON(bkey_lt(iter->pos, bkey_start_pos(&iter->k)) || ··· 289 289 if (!bch2_debug_check_iterators) 290 290 return 0; 291 291 292 - if (!(iter->flags & BTREE_ITER_FILTER_SNAPSHOTS)) 292 + if (!(iter->flags & BTREE_ITER_filter_snapshots)) 293 293 return 0; 294 294 295 295 if (bkey_err(k) || !k.k) ··· 300 300 k.k->p.snapshot)); 301 301 302 302 bch2_trans_iter_init(trans, &copy, iter->btree_id, iter->pos, 303 - BTREE_ITER_NOPRESERVE| 304 - BTREE_ITER_ALL_SNAPSHOTS); 303 + BTREE_ITER_nopreserve| 304 + BTREE_ITER_all_snapshots); 305 305 prev = bch2_btree_iter_prev(&copy); 306 306 if (!prev.k) 307 307 goto out; ··· 897 897 898 898 bch2_bkey_buf_reassemble(out, c, k); 899 899 900 - if ((flags & BTREE_ITER_PREFETCH) && 900 + if ((flags & BTREE_ITER_prefetch) && 901 901 c->opts.btree_node_prefetch) 902 902 ret = btree_path_prefetch_j(trans, path, &jiter); 903 903 ··· 944 944 945 945 bch2_bkey_buf_unpack(&tmp, c, l->b, k); 946 946 947 - if ((flags & BTREE_ITER_PREFETCH) && 947 + if ((flags & BTREE_ITER_prefetch) && 948 948 c->opts.btree_node_prefetch) { 949 949 ret = btree_path_prefetch(trans, path); 950 950 if (ret) ··· 999 999 1000 1000 bch2_trans_unlock(trans); 1001 1001 cond_resched(); 1002 + trans->locked = true; 1002 1003 1003 1004 if (unlikely(trans->memory_allocation_failure)) { 1004 1005 struct closure cl; ··· 1163 1162 goto out_uptodate; 1164 1163 1165 1164 path->level = 
btree_path_up_until_good_node(trans, path, 0); 1165 + unsigned max_level = path->level; 1166 1166 1167 1167 EBUG_ON(btree_path_node(path, path->level) && 1168 1168 !btree_node_locked(path, path->level)); ··· 1194 1192 goto out; 1195 1193 } 1196 1194 } 1195 + 1196 + if (unlikely(max_level > path->level)) { 1197 + struct btree_path *linked; 1198 + unsigned iter; 1199 + 1200 + trans_for_each_path_with_node(trans, path_l(path)->b, linked, iter) 1201 + for (unsigned j = path->level + 1; j < max_level; j++) 1202 + linked->l[j] = path->l[j]; 1203 + } 1204 + 1197 1205 out_uptodate: 1198 1206 path->uptodate = BTREE_ITER_UPTODATE; 1199 1207 out: ··· 1233 1221 } 1234 1222 1235 1223 static btree_path_idx_t btree_path_clone(struct btree_trans *trans, btree_path_idx_t src, 1236 - bool intent) 1224 + bool intent, unsigned long ip) 1237 1225 { 1238 1226 btree_path_idx_t new = btree_path_alloc(trans, src); 1239 1227 btree_path_copy(trans, trans->paths + new, trans->paths + src); 1240 1228 __btree_path_get(trans->paths + new, intent); 1229 + #ifdef TRACK_PATH_ALLOCATED 1230 + trans->paths[new].ip_allocated = ip; 1231 + #endif 1241 1232 return new; 1242 1233 } 1243 1234 ··· 1249 1234 btree_path_idx_t path, bool intent, unsigned long ip) 1250 1235 { 1251 1236 __btree_path_put(trans->paths + path, intent); 1252 - path = btree_path_clone(trans, path, intent); 1237 + path = btree_path_clone(trans, path, intent, ip); 1253 1238 trans->paths[path].preserve = false; 1254 1239 return path; 1255 1240 } ··· 1349 1334 __clear_bit(path, trans->paths_allocated); 1350 1335 } 1351 1336 1337 + static bool bch2_btree_path_can_relock(struct btree_trans *trans, struct btree_path *path) 1338 + { 1339 + unsigned l = path->level; 1340 + 1341 + do { 1342 + if (!btree_path_node(path, l)) 1343 + break; 1344 + 1345 + if (!is_btree_node(path, l)) 1346 + return false; 1347 + 1348 + if (path->l[l].lock_seq != path->l[l].b->c.lock.seq) 1349 + return false; 1350 + 1351 + l++; 1352 + } while (l < path->locks_want); 1353 + 1354 + return true; 1355 + } 1356 + 1352 1357 void bch2_path_put(struct btree_trans *trans, btree_path_idx_t path_idx, bool intent) 1353 1358 { 1354 1359 struct btree_path *path = trans->paths + path_idx, *dup; ··· 1383 1348 if (!dup && !(!path->preserve && !is_btree_node(path, path->level))) 1384 1349 return; 1385 1350 1386 - if (path->should_be_locked && 1387 - !trans->restarted && 1388 - (!dup || !bch2_btree_path_relock_norestart(trans, dup))) 1389 - return; 1351 + if (path->should_be_locked && !trans->restarted) { 1352 + if (!dup) 1353 + return; 1354 + 1355 + if (!(trans->locked 1356 + ? 
bch2_btree_path_relock_norestart(trans, dup) 1357 + : bch2_btree_path_can_relock(trans, dup))) 1358 + return; 1359 + } 1390 1360 1391 1361 if (dup) { 1392 1362 dup->preserve |= path->preserve; ··· 1424 1384 (void *) trans->last_restarted_ip); 1425 1385 } 1426 1386 1387 + void __noreturn bch2_trans_unlocked_error(struct btree_trans *trans) 1388 + { 1389 + panic("trans should be locked, unlocked by %pS\n", 1390 + (void *) trans->last_unlock_ip); 1391 + } 1392 + 1427 1393 noinline __cold 1428 1394 void bch2_trans_updates_to_text(struct printbuf *buf, struct btree_trans *trans) 1429 1395 { 1430 - prt_printf(buf, "transaction updates for %s journal seq %llu", 1396 + prt_printf(buf, "transaction updates for %s journal seq %llu\n", 1431 1397 trans->fn, trans->journal_res.seq); 1432 - prt_newline(buf); 1433 1398 printbuf_indent_add(buf, 2); 1434 1399 1435 1400 trans_for_each_update(trans, i) { 1436 1401 struct bkey_s_c old = { &i->old_k, i->old_v }; 1437 1402 1438 - prt_printf(buf, "update: btree=%s cached=%u %pS", 1403 + prt_printf(buf, "update: btree=%s cached=%u %pS\n", 1439 1404 bch2_btree_id_str(i->btree_id), 1440 1405 i->cached, 1441 1406 (void *) i->ip_allocated); 1442 - prt_newline(buf); 1443 1407 1444 1408 prt_printf(buf, " old "); 1445 1409 bch2_bkey_val_to_text(buf, trans->c, old); ··· 1472 1428 printbuf_exit(&buf); 1473 1429 } 1474 1430 1475 - static void bch2_btree_path_to_text(struct printbuf *out, struct btree_trans *trans, btree_path_idx_t path_idx) 1431 + static void bch2_btree_path_to_text_short(struct printbuf *out, struct btree_trans *trans, btree_path_idx_t path_idx) 1476 1432 { 1477 1433 struct btree_path *path = trans->paths + path_idx; 1478 1434 1479 - prt_printf(out, "path: idx %2u ref %u:%u %c %c btree=%s l=%u pos ", 1435 + prt_printf(out, "path: idx %2u ref %u:%u %c %c %c btree=%s l=%u pos ", 1480 1436 path_idx, path->ref, path->intent_ref, 1481 1437 path->preserve ? 'P' : ' ', 1482 1438 path->should_be_locked ? 'S' : ' ', 1439 + path->cached ? 
'C' : 'B', 1483 1440 bch2_btree_id_str(path->btree_id), 1484 1441 path->level); 1485 1442 bch2_bpos_to_text(out, path->pos); 1486 1443 1487 - prt_printf(out, " locks %u", path->nodes_locked); 1488 1444 #ifdef TRACK_PATH_ALLOCATED 1489 1445 prt_printf(out, " %pS", (void *) path->ip_allocated); 1490 1446 #endif 1447 + } 1448 + 1449 + static const char *btree_node_locked_str(enum btree_node_locked_type t) 1450 + { 1451 + switch (t) { 1452 + case BTREE_NODE_UNLOCKED: 1453 + return "unlocked"; 1454 + case BTREE_NODE_READ_LOCKED: 1455 + return "read"; 1456 + case BTREE_NODE_INTENT_LOCKED: 1457 + return "intent"; 1458 + case BTREE_NODE_WRITE_LOCKED: 1459 + return "write"; 1460 + default: 1461 + return NULL; 1462 + } 1463 + } 1464 + 1465 + void bch2_btree_path_to_text(struct printbuf *out, struct btree_trans *trans, btree_path_idx_t path_idx) 1466 + { 1467 + bch2_btree_path_to_text_short(out, trans, path_idx); 1468 + 1469 + struct btree_path *path = trans->paths + path_idx; 1470 + 1471 + prt_printf(out, " uptodate %u locks_want %u", path->uptodate, path->locks_want); 1491 1472 prt_newline(out); 1473 + 1474 + printbuf_indent_add(out, 2); 1475 + for (unsigned l = 0; l < BTREE_MAX_DEPTH; l++) { 1476 + prt_printf(out, "l=%u locks %s seq %u node ", l, 1477 + btree_node_locked_str(btree_node_locked_type(path, l)), 1478 + path->l[l].lock_seq); 1479 + 1480 + int ret = PTR_ERR_OR_ZERO(path->l[l].b); 1481 + if (ret) 1482 + prt_str(out, bch2_err_str(ret)); 1483 + else 1484 + prt_printf(out, "%px", path->l[l].b); 1485 + prt_newline(out); 1486 + } 1487 + printbuf_indent_sub(out, 2); 1492 1488 } 1493 1489 1494 1490 static noinline __cold ··· 1540 1456 if (!nosort) 1541 1457 btree_trans_sort_paths(trans); 1542 1458 1543 - trans_for_each_path_idx_inorder(trans, iter) 1544 - bch2_btree_path_to_text(out, trans, iter.path_idx); 1459 + trans_for_each_path_idx_inorder(trans, iter) { 1460 + bch2_btree_path_to_text_short(out, trans, iter.path_idx); 1461 + prt_newline(out); 1462 + } 1545 1463 } 1546 1464 1547 1465 noinline __cold ··· 1694 1608 unsigned flags, unsigned long ip) 1695 1609 { 1696 1610 struct btree_path *path; 1697 - bool cached = flags & BTREE_ITER_CACHED; 1698 - bool intent = flags & BTREE_ITER_INTENT; 1611 + bool cached = flags & BTREE_ITER_cached; 1612 + bool intent = flags & BTREE_ITER_intent; 1699 1613 struct trans_for_each_path_inorder_iter iter; 1700 1614 btree_path_idx_t path_pos = 0, path_idx; 1701 1615 1616 + bch2_trans_verify_not_unlocked(trans); 1702 1617 bch2_trans_verify_not_in_restart(trans); 1703 1618 bch2_trans_verify_locks(trans); 1704 1619 ··· 1744 1657 trans->paths_sorted = false; 1745 1658 } 1746 1659 1747 - if (!(flags & BTREE_ITER_NOPRESERVE)) 1660 + if (!(flags & BTREE_ITER_nopreserve)) 1748 1661 path->preserve = true; 1749 1662 1750 1663 if (path->intent_ref) ··· 1762 1675 if (locks_want > path->locks_want) 1763 1676 bch2_btree_path_upgrade_noupgrade_sibs(trans, path, locks_want, NULL); 1764 1677 1678 + return path_idx; 1679 + } 1680 + 1681 + btree_path_idx_t bch2_path_get_unlocked_mut(struct btree_trans *trans, 1682 + enum btree_id btree_id, 1683 + unsigned level, 1684 + struct bpos pos) 1685 + { 1686 + btree_path_idx_t path_idx = bch2_path_get(trans, btree_id, pos, level + 1, level, 1687 + BTREE_ITER_nopreserve| 1688 + BTREE_ITER_intent, _RET_IP_); 1689 + path_idx = bch2_btree_path_make_mut(trans, path_idx, true, _RET_IP_); 1690 + 1691 + struct btree_path *path = trans->paths + path_idx; 1692 + bch2_btree_path_downgrade(trans, path); 1693 + __bch2_btree_path_unlock(trans, path); 
1765 1694 return path_idx; 1766 1695 } 1767 1696 ··· 1822 1719 return (struct bkey_s_c) { u, NULL }; 1823 1720 } 1824 1721 1722 + 1723 + void bch2_set_btree_iter_dontneed(struct btree_iter *iter) 1724 + { 1725 + struct btree_trans *trans = iter->trans; 1726 + 1727 + if (!iter->path || trans->restarted) 1728 + return; 1729 + 1730 + struct btree_path *path = btree_iter_path(trans, iter); 1731 + path->preserve = false; 1732 + if (path->ref == 1) 1733 + path->should_be_locked = false; 1734 + } 1825 1735 /* Btree iterators: */ 1826 1736 1827 1737 int __must_check ··· 1849 1733 struct btree_trans *trans = iter->trans; 1850 1734 int ret; 1851 1735 1736 + bch2_trans_verify_not_unlocked(trans); 1737 + 1852 1738 iter->path = bch2_btree_path_set_pos(trans, iter->path, 1853 1739 btree_iter_search_key(iter), 1854 - iter->flags & BTREE_ITER_INTENT, 1740 + iter->flags & BTREE_ITER_intent, 1855 1741 btree_iter_ip_allocated(iter)); 1856 1742 1857 1743 ret = bch2_btree_path_traverse(iter->trans, iter->path, iter->flags); ··· 1892 1774 iter->k.p = iter->pos = b->key.k.p; 1893 1775 1894 1776 iter->path = bch2_btree_path_set_pos(trans, iter->path, b->key.k.p, 1895 - iter->flags & BTREE_ITER_INTENT, 1777 + iter->flags & BTREE_ITER_intent, 1896 1778 btree_iter_ip_allocated(iter)); 1897 1779 btree_path_set_should_be_locked(btree_iter_path(trans, iter)); 1898 1780 out: ··· 1953 1835 if (bpos_eq(iter->pos, b->key.k.p)) { 1954 1836 __btree_path_set_level_up(trans, path, path->level++); 1955 1837 } else { 1838 + if (btree_lock_want(path, path->level + 1) == BTREE_NODE_UNLOCKED) 1839 + btree_node_unlock(trans, path, path->level + 1); 1840 + 1956 1841 /* 1957 1842 * Haven't gotten to the end of the parent node: go back down to 1958 1843 * the next child node 1959 1844 */ 1960 1845 iter->path = bch2_btree_path_set_pos(trans, iter->path, 1961 1846 bpos_successor(iter->pos), 1962 - iter->flags & BTREE_ITER_INTENT, 1847 + iter->flags & BTREE_ITER_intent, 1963 1848 btree_iter_ip_allocated(iter)); 1964 1849 1965 1850 path = btree_iter_path(trans, iter); ··· 1980 1859 iter->k.p = iter->pos = b->key.k.p; 1981 1860 1982 1861 iter->path = bch2_btree_path_set_pos(trans, iter->path, b->key.k.p, 1983 - iter->flags & BTREE_ITER_INTENT, 1862 + iter->flags & BTREE_ITER_intent, 1984 1863 btree_iter_ip_allocated(iter)); 1985 1864 btree_path_set_should_be_locked(btree_iter_path(trans, iter)); 1986 1865 EBUG_ON(btree_iter_path(trans, iter)->uptodate); ··· 1999 1878 inline bool bch2_btree_iter_advance(struct btree_iter *iter) 2000 1879 { 2001 1880 struct bpos pos = iter->k.p; 2002 - bool ret = !(iter->flags & BTREE_ITER_ALL_SNAPSHOTS 1881 + bool ret = !(iter->flags & BTREE_ITER_all_snapshots 2003 1882 ? bpos_eq(pos, SPOS_MAX) 2004 1883 : bkey_eq(pos, SPOS_MAX)); 2005 1884 2006 - if (ret && !(iter->flags & BTREE_ITER_IS_EXTENTS)) 1885 + if (ret && !(iter->flags & BTREE_ITER_is_extents)) 2007 1886 pos = bkey_successor(iter, pos); 2008 1887 bch2_btree_iter_set_pos(iter, pos); 2009 1888 return ret; ··· 2012 1891 inline bool bch2_btree_iter_rewind(struct btree_iter *iter) 2013 1892 { 2014 1893 struct bpos pos = bkey_start_pos(&iter->k); 2015 - bool ret = !(iter->flags & BTREE_ITER_ALL_SNAPSHOTS 1894 + bool ret = !(iter->flags & BTREE_ITER_all_snapshots 2016 1895 ? 
bpos_eq(pos, POS_MIN) 2017 1896 : bkey_eq(pos, POS_MIN)); 2018 1897 2019 - if (ret && !(iter->flags & BTREE_ITER_IS_EXTENTS)) 1898 + if (ret && !(iter->flags & BTREE_ITER_is_extents)) 2020 1899 pos = bkey_predecessor(iter, pos); 2021 1900 bch2_btree_iter_set_pos(iter, pos); 2022 1901 return ret; ··· 2127 2006 struct bkey_s_c k; 2128 2007 int ret; 2129 2008 2130 - if ((iter->flags & BTREE_ITER_KEY_CACHE_FILL) && 2009 + bch2_trans_verify_not_in_restart(trans); 2010 + bch2_trans_verify_not_unlocked(trans); 2011 + 2012 + if ((iter->flags & BTREE_ITER_key_cache_fill) && 2131 2013 bpos_eq(iter->pos, pos)) 2132 2014 return bkey_s_c_null; 2133 2015 ··· 2139 2015 2140 2016 if (!iter->key_cache_path) 2141 2017 iter->key_cache_path = bch2_path_get(trans, iter->btree_id, pos, 2142 - iter->flags & BTREE_ITER_INTENT, 0, 2143 - iter->flags|BTREE_ITER_CACHED| 2144 - BTREE_ITER_CACHED_NOFILL, 2018 + iter->flags & BTREE_ITER_intent, 0, 2019 + iter->flags|BTREE_ITER_cached| 2020 + BTREE_ITER_cached_nofill, 2145 2021 _THIS_IP_); 2146 2022 2147 2023 iter->key_cache_path = bch2_btree_path_set_pos(trans, iter->key_cache_path, pos, 2148 - iter->flags & BTREE_ITER_INTENT, 2024 + iter->flags & BTREE_ITER_intent, 2149 2025 btree_iter_ip_allocated(iter)); 2150 2026 2151 2027 ret = bch2_btree_path_traverse(trans, iter->key_cache_path, 2152 - iter->flags|BTREE_ITER_CACHED) ?: 2028 + iter->flags|BTREE_ITER_cached) ?: 2153 2029 bch2_btree_path_relock(trans, btree_iter_path(trans, iter), _THIS_IP_); 2154 2030 if (unlikely(ret)) 2155 2031 return bkey_s_c_err(ret); ··· 2177 2053 struct btree_path_level *l; 2178 2054 2179 2055 iter->path = bch2_btree_path_set_pos(trans, iter->path, search_key, 2180 - iter->flags & BTREE_ITER_INTENT, 2056 + iter->flags & BTREE_ITER_intent, 2181 2057 btree_iter_ip_allocated(iter)); 2182 2058 2183 2059 ret = bch2_btree_path_traverse(trans, iter->path, iter->flags); ··· 2202 2078 2203 2079 k = btree_path_level_peek_all(trans->c, l, &iter->k); 2204 2080 2205 - if (unlikely(iter->flags & BTREE_ITER_WITH_KEY_CACHE) && 2081 + if (unlikely(iter->flags & BTREE_ITER_with_key_cache) && 2206 2082 k.k && 2207 2083 (k2 = btree_trans_peek_key_cache(iter, k.k->p)).k) { 2208 2084 k = k2; ··· 2213 2089 } 2214 2090 } 2215 2091 2216 - if (unlikely(iter->flags & BTREE_ITER_WITH_JOURNAL)) 2092 + if (unlikely(iter->flags & BTREE_ITER_with_journal)) 2217 2093 k = btree_trans_peek_journal(trans, iter, k); 2218 2094 2219 - if (unlikely((iter->flags & BTREE_ITER_WITH_UPDATES) && 2095 + if (unlikely((iter->flags & BTREE_ITER_with_updates) && 2220 2096 trans->nr_updates)) 2221 2097 bch2_btree_trans_peek_updates(trans, iter, &k); 2222 2098 ··· 2268 2144 struct bpos iter_pos; 2269 2145 int ret; 2270 2146 2271 - EBUG_ON((iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) && bkey_eq(end, POS_MAX)); 2147 + bch2_trans_verify_not_unlocked(trans); 2148 + EBUG_ON((iter->flags & BTREE_ITER_filter_snapshots) && bkey_eq(end, POS_MAX)); 2272 2149 2273 2150 if (iter->update_path) { 2274 2151 bch2_path_put_nokeep(trans, iter->update_path, 2275 - iter->flags & BTREE_ITER_INTENT); 2152 + iter->flags & BTREE_ITER_intent); 2276 2153 iter->update_path = 0; 2277 2154 } 2278 2155 ··· 2296 2171 * isn't monotonically increasing before FILTER_SNAPSHOTS, and 2297 2172 * that's what we check against in extents mode: 2298 2173 */ 2299 - if (unlikely(!(iter->flags & BTREE_ITER_IS_EXTENTS) 2174 + if (unlikely(!(iter->flags & BTREE_ITER_is_extents) 2300 2175 ? 
bkey_gt(k.k->p, end) 2301 2176 : k.k->p.inode > end.inode)) 2302 2177 goto end; ··· 2304 2179 if (iter->update_path && 2305 2180 !bkey_eq(trans->paths[iter->update_path].pos, k.k->p)) { 2306 2181 bch2_path_put_nokeep(trans, iter->update_path, 2307 - iter->flags & BTREE_ITER_INTENT); 2182 + iter->flags & BTREE_ITER_intent); 2308 2183 iter->update_path = 0; 2309 2184 } 2310 2185 2311 - if ((iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) && 2312 - (iter->flags & BTREE_ITER_INTENT) && 2313 - !(iter->flags & BTREE_ITER_IS_EXTENTS) && 2186 + if ((iter->flags & BTREE_ITER_filter_snapshots) && 2187 + (iter->flags & BTREE_ITER_intent) && 2188 + !(iter->flags & BTREE_ITER_is_extents) && 2314 2189 !iter->update_path) { 2315 2190 struct bpos pos = k.k->p; 2316 2191 ··· 2325 2200 * advance, same as on exit for iter->path, but only up 2326 2201 * to snapshot 2327 2202 */ 2328 - __btree_path_get(trans->paths + iter->path, iter->flags & BTREE_ITER_INTENT); 2203 + __btree_path_get(trans->paths + iter->path, iter->flags & BTREE_ITER_intent); 2329 2204 iter->update_path = iter->path; 2330 2205 2331 2206 iter->update_path = bch2_btree_path_set_pos(trans, 2332 2207 iter->update_path, pos, 2333 - iter->flags & BTREE_ITER_INTENT, 2208 + iter->flags & BTREE_ITER_intent, 2334 2209 _THIS_IP_); 2335 2210 ret = bch2_btree_path_traverse(trans, iter->update_path, iter->flags); 2336 2211 if (unlikely(ret)) { ··· 2343 2218 * We can never have a key in a leaf node at POS_MAX, so 2344 2219 * we don't have to check these successor() calls: 2345 2220 */ 2346 - if ((iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) && 2221 + if ((iter->flags & BTREE_ITER_filter_snapshots) && 2347 2222 !bch2_snapshot_is_ancestor(trans->c, 2348 2223 iter->snapshot, 2349 2224 k.k->p.snapshot)) { ··· 2352 2227 } 2353 2228 2354 2229 if (bkey_whiteout(k.k) && 2355 - !(iter->flags & BTREE_ITER_ALL_SNAPSHOTS)) { 2230 + !(iter->flags & BTREE_ITER_all_snapshots)) { 2356 2231 search_key = bkey_successor(iter, k.k->p); 2357 2232 continue; 2358 2233 } ··· 2362 2237 * equal to the key we just returned - except extents can 2363 2238 * straddle iter->pos: 2364 2239 */ 2365 - if (!(iter->flags & BTREE_ITER_IS_EXTENTS)) 2240 + if (!(iter->flags & BTREE_ITER_is_extents)) 2366 2241 iter_pos = k.k->p; 2367 2242 else 2368 2243 iter_pos = bkey_max(iter->pos, bkey_start_pos(k.k)); 2369 2244 2370 - if (unlikely(!(iter->flags & BTREE_ITER_IS_EXTENTS) 2245 + if (unlikely(!(iter->flags & BTREE_ITER_is_extents) 2371 2246 ? 
bkey_gt(iter_pos, end) 2372 2247 : bkey_ge(iter_pos, end))) 2373 2248 goto end; ··· 2378 2253 iter->pos = iter_pos; 2379 2254 2380 2255 iter->path = bch2_btree_path_set_pos(trans, iter->path, k.k->p, 2381 - iter->flags & BTREE_ITER_INTENT, 2256 + iter->flags & BTREE_ITER_intent, 2382 2257 btree_iter_ip_allocated(iter)); 2383 2258 2384 2259 btree_path_set_should_be_locked(btree_iter_path(trans, iter)); ··· 2391 2266 btree_path_set_should_be_locked(trans->paths + iter->update_path); 2392 2267 } 2393 2268 2394 - if (!(iter->flags & BTREE_ITER_ALL_SNAPSHOTS)) 2269 + if (!(iter->flags & BTREE_ITER_all_snapshots)) 2395 2270 iter->pos.snapshot = iter->snapshot; 2396 2271 2397 2272 ret = bch2_btree_iter_verify_ret(iter, k); ··· 2441 2316 btree_path_idx_t saved_path = 0; 2442 2317 int ret; 2443 2318 2319 + bch2_trans_verify_not_unlocked(trans); 2444 2320 EBUG_ON(btree_iter_path(trans, iter)->cached || 2445 2321 btree_iter_path(trans, iter)->level); 2446 2322 2447 - if (iter->flags & BTREE_ITER_WITH_JOURNAL) 2323 + if (iter->flags & BTREE_ITER_with_journal) 2448 2324 return bkey_s_c_err(-BCH_ERR_btree_iter_with_journal_not_supported); 2449 2325 2450 2326 bch2_btree_iter_verify(iter); 2451 2327 bch2_btree_iter_verify_entry_exit(iter); 2452 2328 2453 - if (iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) 2329 + if (iter->flags & BTREE_ITER_filter_snapshots) 2454 2330 search_key.snapshot = U32_MAX; 2455 2331 2456 2332 while (1) { 2457 2333 iter->path = bch2_btree_path_set_pos(trans, iter->path, search_key, 2458 - iter->flags & BTREE_ITER_INTENT, 2334 + iter->flags & BTREE_ITER_intent, 2459 2335 btree_iter_ip_allocated(iter)); 2460 2336 2461 2337 ret = bch2_btree_path_traverse(trans, iter->path, iter->flags); ··· 2471 2345 2472 2346 k = btree_path_level_peek(trans, path, &path->l[0], &iter->k); 2473 2347 if (!k.k || 2474 - ((iter->flags & BTREE_ITER_IS_EXTENTS) 2348 + ((iter->flags & BTREE_ITER_is_extents) 2475 2349 ? 
bpos_ge(bkey_start_pos(k.k), search_key) 2476 2350 : bpos_gt(k.k->p, search_key))) 2477 2351 k = btree_path_level_prev(trans, path, &path->l[0], &iter->k); 2478 2352 2479 - if (unlikely((iter->flags & BTREE_ITER_WITH_UPDATES) && 2353 + if (unlikely((iter->flags & BTREE_ITER_with_updates) && 2480 2354 trans->nr_updates)) 2481 2355 bch2_btree_trans_peek_prev_updates(trans, iter, &k); 2482 2356 2483 2357 if (likely(k.k)) { 2484 - if (iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) { 2358 + if (iter->flags & BTREE_ITER_filter_snapshots) { 2485 2359 if (k.k->p.snapshot == iter->snapshot) 2486 2360 goto got_key; 2487 2361 ··· 2492 2366 */ 2493 2367 if (saved_path && !bkey_eq(k.k->p, saved_k.p)) { 2494 2368 bch2_path_put_nokeep(trans, iter->path, 2495 - iter->flags & BTREE_ITER_INTENT); 2369 + iter->flags & BTREE_ITER_intent); 2496 2370 iter->path = saved_path; 2497 2371 saved_path = 0; 2498 2372 iter->k = saved_k; ··· 2505 2379 k.k->p.snapshot)) { 2506 2380 if (saved_path) 2507 2381 bch2_path_put_nokeep(trans, saved_path, 2508 - iter->flags & BTREE_ITER_INTENT); 2382 + iter->flags & BTREE_ITER_intent); 2509 2383 saved_path = btree_path_clone(trans, iter->path, 2510 - iter->flags & BTREE_ITER_INTENT); 2384 + iter->flags & BTREE_ITER_intent, 2385 + _THIS_IP_); 2511 2386 path = btree_iter_path(trans, iter); 2512 2387 saved_k = *k.k; 2513 2388 saved_v = k.v; ··· 2519 2392 } 2520 2393 got_key: 2521 2394 if (bkey_whiteout(k.k) && 2522 - !(iter->flags & BTREE_ITER_ALL_SNAPSHOTS)) { 2395 + !(iter->flags & BTREE_ITER_all_snapshots)) { 2523 2396 search_key = bkey_predecessor(iter, k.k->p); 2524 - if (iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) 2397 + if (iter->flags & BTREE_ITER_filter_snapshots) 2525 2398 search_key.snapshot = U32_MAX; 2526 2399 continue; 2527 2400 } ··· 2545 2418 if (bkey_lt(k.k->p, iter->pos)) 2546 2419 iter->pos = k.k->p; 2547 2420 2548 - if (iter->flags & BTREE_ITER_FILTER_SNAPSHOTS) 2421 + if (iter->flags & BTREE_ITER_filter_snapshots) 2549 2422 iter->pos.snapshot = iter->snapshot; 2550 2423 out_no_locked: 2551 2424 if (saved_path) 2552 - bch2_path_put_nokeep(trans, saved_path, iter->flags & BTREE_ITER_INTENT); 2425 + bch2_path_put_nokeep(trans, saved_path, iter->flags & BTREE_ITER_intent); 2553 2426 2554 2427 bch2_btree_iter_verify_entry_exit(iter); 2555 2428 bch2_btree_iter_verify(iter); ··· 2579 2452 struct bkey_s_c k; 2580 2453 int ret; 2581 2454 2455 + bch2_trans_verify_not_unlocked(trans); 2582 2456 bch2_btree_iter_verify(iter); 2583 2457 bch2_btree_iter_verify_entry_exit(iter); 2584 - EBUG_ON(btree_iter_path(trans, iter)->level && (iter->flags & BTREE_ITER_WITH_KEY_CACHE)); 2458 + EBUG_ON(btree_iter_path(trans, iter)->level && (iter->flags & BTREE_ITER_with_key_cache)); 2585 2459 2586 2460 /* extents can't span inode numbers: */ 2587 - if ((iter->flags & BTREE_ITER_IS_EXTENTS) && 2461 + if ((iter->flags & BTREE_ITER_is_extents) && 2588 2462 unlikely(iter->pos.offset == KEY_OFFSET_MAX)) { 2589 2463 if (iter->pos.inode == KEY_INODE_MAX) 2590 2464 return bkey_s_c_null; ··· 2595 2467 2596 2468 search_key = btree_iter_search_key(iter); 2597 2469 iter->path = bch2_btree_path_set_pos(trans, iter->path, search_key, 2598 - iter->flags & BTREE_ITER_INTENT, 2470 + iter->flags & BTREE_ITER_intent, 2599 2471 btree_iter_ip_allocated(iter)); 2600 2472 2601 2473 ret = bch2_btree_path_traverse(trans, iter->path, iter->flags); ··· 2604 2476 goto out_no_locked; 2605 2477 } 2606 2478 2607 - if ((iter->flags & BTREE_ITER_CACHED) || 2608 - !(iter->flags & 
(BTREE_ITER_IS_EXTENTS|BTREE_ITER_FILTER_SNAPSHOTS))) { 2479 + if ((iter->flags & BTREE_ITER_cached) || 2480 + !(iter->flags & (BTREE_ITER_is_extents|BTREE_ITER_filter_snapshots))) { 2609 2481 k = bkey_s_c_null; 2610 2482 2611 - if (unlikely((iter->flags & BTREE_ITER_WITH_UPDATES) && 2483 + if (unlikely((iter->flags & BTREE_ITER_with_updates) && 2612 2484 trans->nr_updates)) { 2613 2485 bch2_btree_trans_peek_slot_updates(trans, iter, &k); 2614 2486 if (k.k) 2615 2487 goto out; 2616 2488 } 2617 2489 2618 - if (unlikely(iter->flags & BTREE_ITER_WITH_JOURNAL) && 2490 + if (unlikely(iter->flags & BTREE_ITER_with_journal) && 2619 2491 (k = btree_trans_peek_slot_journal(trans, iter)).k) 2620 2492 goto out; 2621 2493 2622 - if (unlikely(iter->flags & BTREE_ITER_WITH_KEY_CACHE) && 2494 + if (unlikely(iter->flags & BTREE_ITER_with_key_cache) && 2623 2495 (k = btree_trans_peek_key_cache(iter, iter->pos)).k) { 2624 2496 if (!bkey_err(k)) 2625 2497 iter->k = *k.k; ··· 2634 2506 struct bpos next; 2635 2507 struct bpos end = iter->pos; 2636 2508 2637 - if (iter->flags & BTREE_ITER_IS_EXTENTS) 2509 + if (iter->flags & BTREE_ITER_is_extents) 2638 2510 end.offset = U64_MAX; 2639 2511 2640 2512 EBUG_ON(btree_iter_path(trans, iter)->level); 2641 2513 2642 - if (iter->flags & BTREE_ITER_INTENT) { 2514 + if (iter->flags & BTREE_ITER_intent) { 2643 2515 struct btree_iter iter2; 2644 2516 2645 2517 bch2_trans_copy_iter(&iter2, iter); ··· 2670 2542 bkey_init(&iter->k); 2671 2543 iter->k.p = iter->pos; 2672 2544 2673 - if (iter->flags & BTREE_ITER_IS_EXTENTS) { 2545 + if (iter->flags & BTREE_ITER_is_extents) { 2674 2546 bch2_key_resize(&iter->k, 2675 2547 min_t(u64, KEY_SIZE_MAX, 2676 2548 (next.inode == iter->pos.inode ··· 2854 2726 { 2855 2727 if (iter->update_path) 2856 2728 bch2_path_put_nokeep(trans, iter->update_path, 2857 - iter->flags & BTREE_ITER_INTENT); 2729 + iter->flags & BTREE_ITER_intent); 2858 2730 if (iter->path) 2859 2731 bch2_path_put(trans, iter->path, 2860 - iter->flags & BTREE_ITER_INTENT); 2732 + iter->flags & BTREE_ITER_intent); 2861 2733 if (iter->key_cache_path) 2862 2734 bch2_path_put(trans, iter->key_cache_path, 2863 - iter->flags & BTREE_ITER_INTENT); 2735 + iter->flags & BTREE_ITER_intent); 2864 2736 iter->path = 0; 2865 2737 iter->update_path = 0; 2866 2738 iter->key_cache_path = 0; ··· 2885 2757 unsigned depth, 2886 2758 unsigned flags) 2887 2759 { 2888 - flags |= BTREE_ITER_NOT_EXTENTS; 2889 - flags |= __BTREE_ITER_ALL_SNAPSHOTS; 2890 - flags |= BTREE_ITER_ALL_SNAPSHOTS; 2760 + flags |= BTREE_ITER_not_extents; 2761 + flags |= BTREE_ITER_snapshot_field; 2762 + flags |= BTREE_ITER_all_snapshots; 2891 2763 2892 2764 bch2_trans_iter_init_common(trans, iter, btree_id, pos, locks_want, depth, 2893 2765 __bch2_btree_iter_flags(trans, btree_id, flags), ··· 2910 2782 dst->ip_allocated = _RET_IP_; 2911 2783 #endif 2912 2784 if (src->path) 2913 - __btree_path_get(trans->paths + src->path, src->flags & BTREE_ITER_INTENT); 2785 + __btree_path_get(trans->paths + src->path, src->flags & BTREE_ITER_intent); 2914 2786 if (src->update_path) 2915 - __btree_path_get(trans->paths + src->update_path, src->flags & BTREE_ITER_INTENT); 2787 + __btree_path_get(trans->paths + src->update_path, src->flags & BTREE_ITER_intent); 2916 2788 dst->key_cache_path = 0; 2917 2789 } 2918 2790 ··· 3081 2953 if (!trans->restarted && 3082 2954 (need_resched() || 3083 2955 time_after64(now, trans->last_begin_time + BTREE_TRANS_MAX_LOCK_HOLD_TIME_NS))) { 3084 - drop_locks_do(trans, (cond_resched(), 0)); 2956 + 
bch2_trans_unlock(trans); 2957 + cond_resched(); 3085 2958 now = local_clock(); 3086 2959 } 3087 2960 trans->last_begin_time = now; ··· 3092 2963 bch2_trans_srcu_unlock(trans); 3093 2964 3094 2965 trans->last_begin_ip = _RET_IP_; 2966 + trans->locked = true; 2967 + 3095 2968 if (trans->restarted) { 3096 2969 bch2_btree_path_traverse_all(trans); 3097 2970 trans->notrace_relock_fail = false; 3098 2971 } 3099 2972 2973 + bch2_trans_verify_not_unlocked(trans); 3100 2974 return trans->restart_count; 3101 2975 } 3102 2976 ··· 3152 3020 */ 3153 3021 BUG_ON(pos_task && 3154 3022 pid == pos_task->pid && 3155 - bch2_trans_locked(pos)); 3023 + pos->locked); 3156 3024 3157 3025 if (pos_task && pid < pos_task->pid) { 3158 3026 list_add_tail(&trans->list, &pos->list); ··· 3168 3036 trans->last_begin_time = local_clock(); 3169 3037 trans->fn_idx = fn_idx; 3170 3038 trans->locking_wait.task = current; 3039 + trans->locked = true; 3171 3040 trans->journal_replay_not_finished = 3172 - unlikely(!test_bit(JOURNAL_REPLAY_DONE, &c->journal.flags)) && 3041 + unlikely(!test_bit(JOURNAL_replay_done, &c->journal.flags)) && 3173 3042 atomic_inc_not_zero(&c->journal_keys.ref); 3174 3043 trans->nr_paths = ARRAY_SIZE(trans->_paths); 3175 3044 trans->paths_allocated = trans->_paths_allocated; ··· 3299 3166 pid = owner ? owner->pid : 0; 3300 3167 rcu_read_unlock(); 3301 3168 3302 - prt_tab(out); 3303 - prt_printf(out, "%px %c l=%u %s:", b, b->cached ? 'c' : 'b', 3169 + prt_printf(out, "\t%px %c l=%u %s:", b, b->cached ? 'c' : 'b', 3304 3170 b->level, bch2_btree_id_str(b->btree_id)); 3305 3171 bch2_bpos_to_text(out, btree_node_pos(b)); 3306 3172 3307 - prt_tab(out); 3308 - prt_printf(out, " locks %u:%u:%u held by pid %u", 3173 + prt_printf(out, "\t locks %u:%u:%u held by pid %u", 3309 3174 c.n[0], c.n[1], c.n[2], pid); 3310 3175 } 3311 3176 ··· 3360 3229 3361 3230 b = READ_ONCE(trans->locking); 3362 3231 if (b) { 3363 - prt_printf(out, " blocked for %lluus on", 3364 - div_u64(local_clock() - trans->locking_wait.start_time, 3365 - 1000)); 3366 - prt_newline(out); 3232 + prt_printf(out, " blocked for %lluus on\n", 3233 + div_u64(local_clock() - trans->locking_wait.start_time, 1000)); 3367 3234 prt_printf(out, " %c", lock_types[trans->locking_wait.lock_want]); 3368 3235 bch2_btree_bkey_cached_common_to_text(out, b); 3369 3236 prt_newline(out);
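The thread running through this file's hunks is the new trans->locked flag: bch2_trans_begin(), __bch2_trans_get() and the traverse paths set it, bch2_trans_unlocked_error() panics with the recorded last_unlock_ip when something runs on an unlocked transaction, and the check against one task holding two locked transactions now reads pos->locked directly rather than calling the removed bch2_trans_locked(). A compilable userspace model of the flag-plus-blame-pointer idiom — simplified types, and a caller string standing in for the return address the kernel records:

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  struct btree_trans {
          bool            locked;
          const char      *last_unlock_fn;  /* models last_unlock_ip */
  };

  static void trans_unlock(struct btree_trans *trans, const char *caller)
  {
          trans->locked = false;
          trans->last_unlock_fn = caller;   /* remember who to blame */
  }

  static void trans_relock(struct btree_trans *trans)
  {
          trans->locked = true;
  }

  static void verify_not_unlocked(struct btree_trans *trans)
  {
          if (!trans->locked) {
                  fprintf(stderr, "trans should be locked, unlocked by %s\n",
                          trans->last_unlock_fn);
                  abort();
          }
  }

  int main(void)
  {
          struct btree_trans trans = { .locked = true };

          verify_not_unlocked(&trans);      /* fine */

          trans_unlock(&trans, "main");
          trans_relock(&trans);             /* e.g. after cond_resched() */
          verify_not_unlocked(&trans);      /* fine again */

          puts("ok");
          return 0;
  }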
+51 -44
fs/bcachefs/btree_iter.h
··· 216 216 btree_path_idx_t, 217 217 unsigned, unsigned long); 218 218 219 + static inline void bch2_trans_verify_not_unlocked(struct btree_trans *); 220 + 219 221 static inline int __must_check bch2_btree_path_traverse(struct btree_trans *trans, 220 222 btree_path_idx_t path, unsigned flags) 221 223 { 224 + bch2_trans_verify_not_unlocked(trans); 225 + 222 226 if (trans->paths[path].uptodate < BTREE_ITER_NEED_RELOCK) 223 227 return 0; 224 228 ··· 231 227 232 228 btree_path_idx_t bch2_path_get(struct btree_trans *, enum btree_id, struct bpos, 233 229 unsigned, unsigned, unsigned, unsigned long); 230 + btree_path_idx_t bch2_path_get_unlocked_mut(struct btree_trans *, enum btree_id, 231 + unsigned, struct bpos); 232 + 234 233 struct bkey_s_c bch2_btree_path_peek_slot(struct btree_path *, struct bkey *); 235 234 236 235 /* ··· 290 283 int bch2_trans_relock_notrace(struct btree_trans *); 291 284 void bch2_trans_unlock(struct btree_trans *); 292 285 void bch2_trans_unlock_long(struct btree_trans *); 293 - bool bch2_trans_locked(struct btree_trans *); 294 286 295 287 static inline int trans_was_restarted(struct btree_trans *trans, u32 restart_count) 296 288 { ··· 313 307 { 314 308 if (trans->restarted) 315 309 bch2_trans_in_restart_error(trans); 310 + } 311 + 312 + void __noreturn bch2_trans_unlocked_error(struct btree_trans *); 313 + 314 + static inline void bch2_trans_verify_not_unlocked(struct btree_trans *trans) 315 + { 316 + if (!trans->locked) 317 + bch2_trans_unlocked_error(trans); 316 318 } 317 319 318 320 __always_inline ··· 400 386 401 387 if (unlikely(iter->update_path)) 402 388 bch2_path_put(trans, iter->update_path, 403 - iter->flags & BTREE_ITER_INTENT); 389 + iter->flags & BTREE_ITER_intent); 404 390 iter->update_path = 0; 405 391 406 - if (!(iter->flags & BTREE_ITER_ALL_SNAPSHOTS)) 392 + if (!(iter->flags & BTREE_ITER_all_snapshots)) 407 393 new_pos.snapshot = iter->snapshot; 408 394 409 395 __bch2_btree_iter_set_pos(iter, new_pos); ··· 411 397 412 398 static inline void bch2_btree_iter_set_pos_to_extent_start(struct btree_iter *iter) 413 399 { 414 - BUG_ON(!(iter->flags & BTREE_ITER_IS_EXTENTS)); 400 + BUG_ON(!(iter->flags & BTREE_ITER_is_extents)); 415 401 iter->pos = bkey_start_pos(&iter->k); 416 402 } 417 403 ··· 430 416 unsigned btree_id, 431 417 unsigned flags) 432 418 { 433 - if (!(flags & (BTREE_ITER_ALL_SNAPSHOTS|BTREE_ITER_NOT_EXTENTS)) && 419 + if (!(flags & (BTREE_ITER_all_snapshots|BTREE_ITER_not_extents)) && 434 420 btree_id_is_extents(btree_id)) 435 - flags |= BTREE_ITER_IS_EXTENTS; 421 + flags |= BTREE_ITER_is_extents; 436 422 437 - if (!(flags & __BTREE_ITER_ALL_SNAPSHOTS) && 423 + if (!(flags & BTREE_ITER_snapshot_field) && 438 424 !btree_type_has_snapshot_field(btree_id)) 439 - flags &= ~BTREE_ITER_ALL_SNAPSHOTS; 425 + flags &= ~BTREE_ITER_all_snapshots; 440 426 441 - if (!(flags & BTREE_ITER_ALL_SNAPSHOTS) && 427 + if (!(flags & BTREE_ITER_all_snapshots) && 442 428 btree_type_has_snapshots(btree_id)) 443 - flags |= BTREE_ITER_FILTER_SNAPSHOTS; 429 + flags |= BTREE_ITER_filter_snapshots; 444 430 445 431 if (trans->journal_replay_not_finished) 446 - flags |= BTREE_ITER_WITH_JOURNAL; 432 + flags |= BTREE_ITER_with_journal; 447 433 448 434 return flags; 449 435 } ··· 453 439 unsigned flags) 454 440 { 455 441 if (!btree_id_cached(trans->c, btree_id)) { 456 - flags &= ~BTREE_ITER_CACHED; 457 - flags &= ~BTREE_ITER_WITH_KEY_CACHE; 458 - } else if (!(flags & BTREE_ITER_CACHED)) 459 - flags |= BTREE_ITER_WITH_KEY_CACHE; 442 + flags &= ~BTREE_ITER_cached; 443 + flags &= 
~BTREE_ITER_with_key_cache; 444 + } else if (!(flags & BTREE_ITER_cached)) 445 + flags |= BTREE_ITER_with_key_cache; 460 446 461 447 return __bch2_btree_iter_flags(trans, btree_id, flags); 462 448 } ··· 508 494 unsigned, unsigned, unsigned); 509 495 void bch2_trans_copy_iter(struct btree_iter *, struct btree_iter *); 510 496 511 - static inline void set_btree_iter_dontneed(struct btree_iter *iter) 512 - { 513 - struct btree_trans *trans = iter->trans; 514 - 515 - if (!iter->path || trans->restarted) 516 - return; 517 - 518 - struct btree_path *path = btree_iter_path(trans, iter); 519 - path->preserve = false; 520 - if (path->ref == 1) 521 - path->should_be_locked = false; 522 - } 497 + void bch2_set_btree_iter_dontneed(struct btree_iter *); 523 498 524 499 void *__bch2_trans_kmalloc(struct btree_trans *, size_t); 525 500 ··· 622 619 static inline struct bkey_s_c bch2_btree_iter_peek_prev_type(struct btree_iter *iter, 623 620 unsigned flags) 624 621 { 625 - return flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) : 622 + return flags & BTREE_ITER_slots ? bch2_btree_iter_peek_slot(iter) : 626 623 bch2_btree_iter_peek_prev(iter); 627 624 } 628 625 629 626 static inline struct bkey_s_c bch2_btree_iter_peek_type(struct btree_iter *iter, 630 627 unsigned flags) 631 628 { 632 - return flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) : 629 + return flags & BTREE_ITER_slots ? bch2_btree_iter_peek_slot(iter) : 633 630 bch2_btree_iter_peek(iter); 634 631 } 635 632 ··· 637 634 struct bpos end, 638 635 unsigned flags) 639 636 { 640 - if (!(flags & BTREE_ITER_SLOTS)) 637 + if (!(flags & BTREE_ITER_slots)) 641 638 return bch2_btree_iter_peek_upto(iter, end); 642 639 643 640 if (bkey_gt(iter->pos, end)) ··· 702 699 _ret2 ?: trans_was_restarted(_trans, _restart_count); \ 703 700 }) 704 701 705 - #define for_each_btree_key_upto(_trans, _iter, _btree_id, \ 706 - _start, _end, _flags, _k, _do) \ 702 + #define for_each_btree_key_upto_continue(_trans, _iter, \ 703 + _end, _flags, _k, _do) \ 707 704 ({ \ 708 - struct btree_iter _iter; \ 709 705 struct bkey_s_c _k; \ 710 706 int _ret3 = 0; \ 711 - \ 712 - bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ 713 - (_start), (_flags)); \ 714 707 \ 715 708 do { \ 716 709 _ret3 = lockrestart_do(_trans, ({ \ ··· 721 722 \ 722 723 bch2_trans_iter_exit((_trans), &(_iter)); \ 723 724 _ret3; \ 725 + }) 726 + 727 + #define for_each_btree_key_continue(_trans, _iter, _flags, _k, _do) \ 728 + for_each_btree_key_upto_continue(_trans, _iter, SPOS_MAX, _flags, _k, _do) 729 + 730 + #define for_each_btree_key_upto(_trans, _iter, _btree_id, \ 731 + _start, _end, _flags, _k, _do) \ 732 + ({ \ 733 + bch2_trans_begin(trans); \ 734 + \ 735 + struct btree_iter _iter; \ 736 + bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ 737 + (_start), (_flags)); \ 738 + \ 739 + for_each_btree_key_upto_continue(_trans, _iter, _end, _flags, _k, _do);\ 724 740 }) 725 741 726 742 #define for_each_btree_key(_trans, _iter, _btree_id, \ ··· 808 794 return k; 809 795 } 810 796 811 - #define for_each_btree_key_old(_trans, _iter, _btree_id, \ 812 - _start, _flags, _k, _ret) \ 813 - for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ 814 - (_start), (_flags)); \ 815 - (_k) = __bch2_btree_iter_peek_and_restart((_trans), &(_iter), _flags),\ 816 - !((_ret) = bkey_err(_k)) && (_k).k; \ 817 - bch2_btree_iter_advance(&(_iter))) 818 - 819 797 #define for_each_btree_key_upto_norestart(_trans, _iter, _btree_id, \ 820 798 _start, _end, _flags, _k, _ret) \ 821 799 for 
(bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \ ··· 867 861 }) 868 862 869 863 void bch2_trans_updates_to_text(struct printbuf *, struct btree_trans *); 864 + void bch2_btree_path_to_text(struct printbuf *, struct btree_trans *, btree_path_idx_t); 870 865 void bch2_trans_paths_to_text(struct printbuf *, struct btree_trans *); 871 866 void bch2_dump_trans_updates(struct btree_trans *); 872 867 void bch2_dump_trans_paths_updates(struct btree_trans *);
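The iteration macros in this header get re-layered: for_each_btree_key_upto() is now a thin wrapper that begins the transaction, initializes the iterator, and delegates to the new for_each_btree_key_upto_continue(), which picks up from an already-positioned iterator (with for_each_btree_key_continue() as the SPOS_MAX shorthand); the unused for_each_btree_key_old() is dropped. The shape of that layering, reduced to a toy array "btree" — statement expressions as in the originals, all names invented:

  #include <stdio.h>

  struct iter { const int *pos, *end; };

  /* inner form: continue from a live iterator; _do returns 0 to keep going */
  #define for_each_key_continue(_iter, _do)                               \
  ({                                                                      \
          int _ret = 0;                                                   \
          for (; !_ret && (_iter).pos < (_iter).end; (_iter).pos++)       \
                  _ret = (_do);                                           \
          _ret;                                                           \
  })

  /* outer form: set up the iterator, then delegate */
  #define for_each_key(_arr, _n, _iter, _do)                              \
  ({                                                                      \
          struct iter _iter = { (_arr), (_arr) + (_n) };                  \
          for_each_key_continue(_iter, _do);                              \
  })

  int main(void)
  {
          int keys[] = { 1, 2, 3 };

          int ret = for_each_key(keys, 3, it,
                                 (printf("%d\n", *it.pos), 0));
          return ret;
  }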
+17
fs/bcachefs/btree_journal_iter.c
··· 623 623 keys->data[dst++] = *i; 624 624 keys->nr = keys->gap = dst; 625 625 } 626 + 627 + void bch2_journal_keys_dump(struct bch_fs *c) 628 + { 629 + struct journal_keys *keys = &c->journal_keys; 630 + struct printbuf buf = PRINTBUF; 631 + 632 + pr_info("%zu keys:", keys->nr); 633 + 634 + move_gap(keys, keys->nr); 635 + 636 + darray_for_each(*keys, i) { 637 + printbuf_reset(&buf); 638 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(i->k)); 639 + pr_err("%s l=%u %s", bch2_btree_id_str(i->btree_id), i->level, buf.buf); 640 + } 641 + printbuf_exit(&buf); 642 + }
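bch2_journal_keys_dump() is a new debugging helper; the interesting line is move_gap(keys, keys->nr). Journal keys live in a gap buffer (note the context line above: keys->nr = keys->gap = dst), so the hole must be pushed to the end before darray_for_each() can walk the entries contiguously. A self-contained model of that compaction step — a toy fixed-size buffer, where the kernel version operates on the journal_keys darray:

  #include <stdio.h>
  #include <string.h>

  struct gap_buf {
          int     data[8];
          size_t  size;   /* capacity */
          size_t  nr;     /* live elements */
          size_t  gap;    /* start of the hole */
  };

  /* live elements occupy [0, gap) and [gap + (size - nr), size) */
  static void move_gap(struct gap_buf *b, size_t new_gap)
  {
          size_t hole = b->size - b->nr;

          if (new_gap > b->gap)           /* shift middle elements left */
                  memmove(b->data + b->gap,
                          b->data + b->gap + hole,
                          (new_gap - b->gap) * sizeof(b->data[0]));
          else if (new_gap < b->gap)      /* shift them right */
                  memmove(b->data + new_gap + hole,
                          b->data + new_gap,
                          (b->gap - new_gap) * sizeof(b->data[0]));
          b->gap = new_gap;
  }

  int main(void)
  {
          /* 5 live keys, hole of 3 starting at index 2 */
          struct gap_buf b = {
                  .data = { 10, 20, 0, 0, 0, 30, 40, 50 },
                  .size = 8, .nr = 5, .gap = 2,
          };

          move_gap(&b, b.nr);     /* same call shape as in the dump above */

          for (size_t i = 0; i < b.nr; i++)       /* now contiguous */
                  printf("%d\n", b.data[i]);
          return 0;
  }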
+2
fs/bcachefs/btree_journal_iter.h
··· 70 70 unsigned, unsigned, 71 71 struct bpos, struct bpos); 72 72 73 + void bch2_journal_keys_dump(struct bch_fs *); 74 + 73 75 #endif /* _BCACHEFS_BTREE_JOURNAL_ITER_H */
+68 -39
fs/bcachefs/btree_key_cache.c
··· 383 383 int ret; 384 384 385 385 bch2_trans_iter_init(trans, &iter, ck->key.btree_id, ck->key.pos, 386 - BTREE_ITER_KEY_CACHE_FILL| 387 - BTREE_ITER_CACHED_NOFILL); 388 - iter.flags &= ~BTREE_ITER_WITH_JOURNAL; 386 + BTREE_ITER_key_cache_fill| 387 + BTREE_ITER_cached_nofill); 388 + iter.flags &= ~BTREE_ITER_with_journal; 389 389 k = bch2_btree_iter_peek_slot(&iter); 390 390 ret = bkey_err(k); 391 391 if (ret) ··· 456 456 bch2_btree_node_unlock_write(trans, ck_path, ck_path->l[0].b); 457 457 458 458 /* We're not likely to need this iterator again: */ 459 - set_btree_iter_dontneed(&iter); 459 + bch2_set_btree_iter_dontneed(&iter); 460 460 err: 461 461 bch2_trans_iter_exit(trans, &iter); 462 462 return ret; ··· 515 515 fill: 516 516 path->uptodate = BTREE_ITER_UPTODATE; 517 517 518 - if (!ck->valid && !(flags & BTREE_ITER_CACHED_NOFILL)) { 519 - /* 520 - * Using the underscore version because we haven't set 521 - * path->uptodate yet: 522 - */ 523 - if (!path->locks_want && 524 - !__bch2_btree_path_upgrade(trans, path, 1, NULL)) { 525 - trace_and_count(trans->c, trans_restart_key_cache_upgrade, trans, _THIS_IP_); 526 - ret = btree_trans_restart(trans, BCH_ERR_transaction_restart_key_cache_upgrade); 527 - goto err; 528 - } 529 - 530 - ret = btree_key_cache_fill(trans, path, ck); 531 - if (ret) 532 - goto err; 533 - 534 - ret = bch2_btree_path_relock(trans, path, _THIS_IP_); 518 + if (!ck->valid && !(flags & BTREE_ITER_cached_nofill)) { 519 + ret = bch2_btree_path_upgrade(trans, path, 1) ?: 520 + btree_key_cache_fill(trans, path, ck) ?: 521 + bch2_btree_path_relock(trans, path, _THIS_IP_); 535 522 if (ret) 536 523 goto err; 537 524 ··· 609 622 int ret; 610 623 611 624 bch2_trans_iter_init(trans, &b_iter, key.btree_id, key.pos, 612 - BTREE_ITER_SLOTS| 613 - BTREE_ITER_INTENT| 614 - BTREE_ITER_ALL_SNAPSHOTS); 625 + BTREE_ITER_slots| 626 + BTREE_ITER_intent| 627 + BTREE_ITER_all_snapshots); 615 628 bch2_trans_iter_init(trans, &c_iter, key.btree_id, key.pos, 616 - BTREE_ITER_CACHED| 617 - BTREE_ITER_INTENT); 618 - b_iter.flags &= ~BTREE_ITER_WITH_KEY_CACHE; 629 + BTREE_ITER_cached| 630 + BTREE_ITER_intent); 631 + b_iter.flags &= ~BTREE_ITER_with_key_cache; 619 632 620 633 ret = bch2_btree_iter_traverse(&c_iter); 621 634 if (ret) ··· 648 661 commit_flags |= BCH_WATERMARK_reclaim; 649 662 650 663 if (ck->journal.seq != journal_last_seq(j) || 651 - !test_bit(JOURNAL_SPACE_LOW, &c->journal.flags)) 664 + !test_bit(JOURNAL_space_low, &c->journal.flags)) 652 665 commit_flags |= BCH_TRANS_COMMIT_no_journal_res; 653 666 654 667 ret = bch2_btree_iter_traverse(&b_iter) ?: 655 668 bch2_trans_update(trans, &b_iter, ck->k, 656 - BTREE_UPDATE_KEY_CACHE_RECLAIM| 657 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE| 658 - BTREE_TRIGGER_NORUN) ?: 669 + BTREE_UPDATE_key_cache_reclaim| 670 + BTREE_UPDATE_internal_snapshot_node| 671 + BTREE_TRIGGER_norun) ?: 659 672 bch2_trans_commit(trans, NULL, NULL, 660 673 BCH_TRANS_COMMIT_no_check_rw| 661 674 BCH_TRANS_COMMIT_no_enospc| ··· 777 790 * flushing. The flush callback will not proceed unless ->seq matches 778 791 * the latest pin, so make sure it starts with a consistent value. 
779 792 */ 780 - if (!(insert_entry->flags & BTREE_UPDATE_NOJOURNAL) || 793 + if (!(insert_entry->flags & BTREE_UPDATE_nojournal) || 781 794 !journal_pin_active(&ck->journal)) { 782 795 ck->seq = trans->journal_res.seq; 783 796 } ··· 822 835 int srcu_idx; 823 836 824 837 mutex_lock(&bc->lock); 838 + bc->requested_to_free += sc->nr_to_scan; 839 + 825 840 srcu_idx = srcu_read_lock(&c->btree_trans_barrier); 826 841 flags = memalloc_nofs_save(); 827 842 ··· 842 853 atomic_long_dec(&bc->nr_freed); 843 854 freed++; 844 855 bc->nr_freed_nonpcpu--; 856 + bc->freed++; 845 857 } 846 858 847 859 list_for_each_entry_safe(ck, t, &bc->freed_pcpu, list) { ··· 856 866 atomic_long_dec(&bc->nr_freed); 857 867 freed++; 858 868 bc->nr_freed_pcpu--; 869 + bc->freed++; 859 870 } 860 871 861 872 rcu_read_lock(); ··· 875 884 ck = container_of(pos, struct bkey_cached, hash); 876 885 877 886 if (test_bit(BKEY_CACHED_DIRTY, &ck->flags)) { 887 + bc->skipped_dirty++; 878 888 goto next; 879 889 } else if (test_bit(BKEY_CACHED_ACCESSED, &ck->flags)) { 880 890 clear_bit(BKEY_CACHED_ACCESSED, &ck->flags); 891 + bc->skipped_accessed++; 881 892 goto next; 882 893 } else if (bkey_cached_lock_for_evict(ck)) { 883 894 bkey_cached_evict(bc, ck); 884 895 bkey_cached_free(bc, ck); 896 + bc->moved_to_freelist++; 897 + } else { 898 + bc->skipped_lock_fail++; 885 899 } 886 900 887 901 scanned++; ··· 1033 1037 return 0; 1034 1038 } 1035 1039 1036 - void bch2_btree_key_cache_to_text(struct printbuf *out, struct btree_key_cache *c) 1040 + void bch2_btree_key_cache_to_text(struct printbuf *out, struct btree_key_cache *bc) 1037 1041 { 1038 - prt_printf(out, "nr_freed:\t%lu", atomic_long_read(&c->nr_freed)); 1039 - prt_newline(out); 1040 - prt_printf(out, "nr_keys:\t%lu", atomic_long_read(&c->nr_keys)); 1041 - prt_newline(out); 1042 - prt_printf(out, "nr_dirty:\t%lu", atomic_long_read(&c->nr_dirty)); 1043 - prt_newline(out); 1042 + struct bch_fs *c = container_of(bc, struct bch_fs, btree_key_cache); 1043 + 1044 + printbuf_tabstop_push(out, 24); 1045 + printbuf_tabstop_push(out, 12); 1046 + 1047 + unsigned flags = memalloc_nofs_save(); 1048 + mutex_lock(&bc->lock); 1049 + prt_printf(out, "keys:\t%lu\r\n", atomic_long_read(&bc->nr_keys)); 1050 + prt_printf(out, "dirty:\t%lu\r\n", atomic_long_read(&bc->nr_dirty)); 1051 + prt_printf(out, "freelist:\t%lu\r\n", atomic_long_read(&bc->nr_freed)); 1052 + prt_printf(out, "nonpcpu freelist:\t%zu\r\n", bc->nr_freed_nonpcpu); 1053 + prt_printf(out, "pcpu freelist:\t%zu\r\n", bc->nr_freed_pcpu); 1054 + 1055 + prt_printf(out, "\nshrinker:\n"); 1056 + prt_printf(out, "requested_to_free:\t%lu\r\n", bc->requested_to_free); 1057 + prt_printf(out, "freed:\t%lu\r\n", bc->freed); 1058 + prt_printf(out, "moved_to_freelist:\t%lu\r\n", bc->moved_to_freelist); 1059 + prt_printf(out, "skipped_dirty:\t%lu\r\n", bc->skipped_dirty); 1060 + prt_printf(out, "skipped_accessed:\t%lu\r\n", bc->skipped_accessed); 1061 + prt_printf(out, "skipped_lock_fail:\t%lu\r\n", bc->skipped_lock_fail); 1062 + 1063 + prt_printf(out, "srcu seq:\t%lu\r\n", get_state_synchronize_srcu(&c->btree_trans_barrier)); 1064 + 1065 + struct bkey_cached *ck; 1066 + unsigned iter = 0; 1067 + list_for_each_entry(ck, &bc->freed_nonpcpu, list) { 1068 + prt_printf(out, "freed_nonpcpu:\t%lu\r\n", ck->btree_trans_barrier_seq); 1069 + if (++iter > 10) 1070 + break; 1071 + } 1072 + 1073 + iter = 0; 1074 + list_for_each_entry(ck, &bc->freed_pcpu, list) { 1075 + prt_printf(out, "freed_pcpu:\t%lu\r\n", ck->btree_trans_barrier_seq); 1076 + if (++iter > 10) 1077 + 
break; 1078 + } 1079 + mutex_unlock(&bc->lock); 1080 + memalloc_flags_restore(flags); 1044 1081 } 1045 1082 1046 1083 void bch2_btree_key_cache_exit(void)
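Aside: the reworked bch2_btree_key_cache_to_text() above relies on printbuf's tabular output. After printbuf_tabstop_push(), a '\t' in the format string advances to the next tabstop and '\r' right-justifies the preceding text against it, which is why a single prt_printf() per row replaces the old prt_printf()/prt_newline() pairs. Below is a stand-alone sketch of the resulting two-column layout, emulated with plain printf() field widths since printbuf is internal to bcachefs:

/*
 * Illustration only: emulates the 24-column label / 12-column
 * right-justified value layout that the two printbuf_tabstop_push()
 * calls above set up.  This is not the printbuf API itself.
 */
#include <stdio.h>

static void stat_line(const char *name, unsigned long v)
{
        printf("%-24s%12lu\n", name, v);
}

int main(void)
{
        stat_line("keys:", 123456);
        stat_line("dirty:", 42);
        stat_line("freelist:", 7);
        return 0;
}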
+8 -0
fs/bcachefs/btree_key_cache_types.h
··· 24 24 atomic_long_t nr_freed; 25 25 atomic_long_t nr_keys; 26 26 atomic_long_t nr_dirty; 27 + 28 + /* shrinker stats */ 29 + unsigned long requested_to_free; 30 + unsigned long freed; 31 + unsigned long moved_to_freelist; 32 + unsigned long skipped_dirty; 33 + unsigned long skipped_accessed; 34 + unsigned long skipped_lock_fail; 27 35 }; 28 36 29 37 struct bkey_cached_key {
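The counters added here are bumped from the shrinker scan loop in the btree_key_cache.c hunk above, one per possible outcome, so sysfs can show why reclaim is or isn't making progress. A reduced, self-contained model of that counting pattern follows; the field names mirror the patch, but the item list and predicates are invented for illustration:

#include <stdbool.h>
#include <stdio.h>

struct toy_cache {
        unsigned long requested_to_free;
        unsigned long freed;
        unsigned long skipped_dirty;
        unsigned long skipped_accessed;
        unsigned long skipped_lock_fail;
};

struct toy_item { bool dirty, accessed, lock_held; };

static void toy_scan(struct toy_cache *bc, struct toy_item *items,
                     unsigned nr, unsigned long nr_to_scan)
{
        bc->requested_to_free += nr_to_scan;    /* what the VM asked for */

        for (unsigned i = 0; i < nr && nr_to_scan; i++) {
                struct toy_item *it = &items[i];

                if (it->dirty) {                /* dirty entries can't be freed yet */
                        bc->skipped_dirty++;
                } else if (it->accessed) {      /* second chance: clear the bit, skip */
                        it->accessed = false;
                        bc->skipped_accessed++;
                } else if (it->lock_held) {     /* couldn't take the lock to evict */
                        bc->skipped_lock_fail++;
                } else {
                        bc->freed++;
                        nr_to_scan--;
                }
        }
}

int main(void)
{
        struct toy_item items[] = { { .dirty = true }, { .accessed = true }, { 0 } };
        struct toy_cache bc = { 0 };

        toy_scan(&bc, items, 3, 8);
        printf("freed %lu, skipped: dirty %lu accessed %lu lock %lu\n",
               bc.freed, bc.skipped_dirty, bc.skipped_accessed, bc.skipped_lock_fail);
        return 0;
}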
+106 -77
fs/bcachefs/btree_locking.c
··· 83 83 { 84 84 struct trans_waiting_for_lock *i; 85 85 86 - prt_printf(out, "Found lock cycle (%u entries):", g->nr); 87 - prt_newline(out); 86 + prt_printf(out, "Found lock cycle (%u entries):\n", g->nr); 88 87 89 88 for (i = g->g; i < g->g + g->nr; i++) { 90 89 struct task_struct *task = READ_ONCE(i->trans->locking_wait.task); ··· 223 224 224 225 bch2_btree_trans_to_text(&buf, trans); 225 226 226 - prt_printf(&buf, "backtrace:"); 227 - prt_newline(&buf); 227 + prt_printf(&buf, "backtrace:\n"); 228 228 printbuf_indent_add(&buf, 2); 229 229 bch2_prt_task_backtrace(&buf, trans->locking_wait.task, 2, GFP_NOWAIT); 230 230 printbuf_indent_sub(&buf, 2); ··· 490 492 if (path->uptodate == BTREE_ITER_NEED_RELOCK) 491 493 path->uptodate = BTREE_ITER_UPTODATE; 492 494 493 - bch2_trans_verify_locks(trans); 494 - 495 495 return path->uptodate < BTREE_ITER_NEED_RELOCK; 496 496 } 497 497 ··· 605 609 { 606 610 struct get_locks_fail f; 607 611 608 - return btree_path_get_locks(trans, path, false, &f); 612 + bool ret = btree_path_get_locks(trans, path, false, &f); 613 + bch2_trans_verify_locks(trans); 614 + return ret; 609 615 } 610 616 611 617 int __bch2_btree_path_relock(struct btree_trans *trans, ··· 630 632 631 633 path->locks_want = new_locks_want; 632 634 633 - return btree_path_get_locks(trans, path, true, f); 635 + bool ret = btree_path_get_locks(trans, path, true, f); 636 + bch2_trans_verify_locks(trans); 637 + return ret; 634 638 } 635 639 636 640 bool __bch2_btree_path_upgrade(struct btree_trans *trans, ··· 640 640 unsigned new_locks_want, 641 641 struct get_locks_fail *f) 642 642 { 643 - if (bch2_btree_path_upgrade_noupgrade_sibs(trans, path, new_locks_want, f)) 644 - return true; 643 + bool ret = bch2_btree_path_upgrade_noupgrade_sibs(trans, path, new_locks_want, f); 644 + if (ret) 645 + goto out; 645 646 646 647 /* 647 648 * XXX: this is ugly - we'd prefer to not be mucking with other ··· 676 675 btree_path_get_locks(trans, linked, true, NULL); 677 676 } 678 677 } 679 - 680 - return false; 678 + out: 679 + bch2_trans_verify_locks(trans); 680 + return ret; 681 681 } 682 682 683 683 void __bch2_btree_path_downgrade(struct btree_trans *trans, ··· 727 725 bch2_btree_path_downgrade(trans, path); 728 726 } 729 727 730 - int bch2_trans_relock(struct btree_trans *trans) 728 + static inline void __bch2_trans_unlock(struct btree_trans *trans) 731 729 { 732 730 struct btree_path *path; 733 731 unsigned i; 734 732 733 + trans_for_each_path(trans, path, i) 734 + __bch2_btree_path_unlock(trans, path); 735 + } 736 + 737 + static noinline __cold int bch2_trans_relock_fail(struct btree_trans *trans, struct btree_path *path, 738 + struct get_locks_fail *f, bool trace) 739 + { 740 + if (!trace) 741 + goto out; 742 + 743 + if (trace_trans_restart_relock_enabled()) { 744 + struct printbuf buf = PRINTBUF; 745 + 746 + bch2_bpos_to_text(&buf, path->pos); 747 + prt_printf(&buf, " l=%u seq=%u node seq=", f->l, path->l[f->l].lock_seq); 748 + if (IS_ERR_OR_NULL(f->b)) { 749 + prt_str(&buf, bch2_err_str(PTR_ERR(f->b))); 750 + } else { 751 + prt_printf(&buf, "%u", f->b->c.lock.seq); 752 + 753 + struct six_lock_count c = 754 + bch2_btree_node_lock_counts(trans, NULL, &f->b->c, f->l); 755 + prt_printf(&buf, " self locked %u.%u.%u", c.n[0], c.n[1], c.n[2]); 756 + 757 + c = six_lock_counts(&f->b->c.lock); 758 + prt_printf(&buf, " total locked %u.%u.%u", c.n[0], c.n[1], c.n[2]); 759 + } 760 + 761 + trace_trans_restart_relock(trans, _RET_IP_, buf.buf); 762 + printbuf_exit(&buf); 763 + } 764 + 765 + count_event(trans->c, 
trans_restart_relock); 766 + out: 767 + __bch2_trans_unlock(trans); 768 + bch2_trans_verify_locks(trans); 769 + return btree_trans_restart(trans, BCH_ERR_transaction_restart_relock); 770 + } 771 + 772 + static inline int __bch2_trans_relock(struct btree_trans *trans, bool trace) 773 + { 774 + bch2_trans_verify_locks(trans); 775 + 735 776 if (unlikely(trans->restarted)) 736 777 return -((int) trans->restarted); 778 + if (unlikely(trans->locked)) 779 + goto out; 780 + 781 + struct btree_path *path; 782 + unsigned i; 737 783 738 784 trans_for_each_path(trans, path, i) { 739 785 struct get_locks_fail f; 740 786 741 787 if (path->should_be_locked && 742 - !btree_path_get_locks(trans, path, false, &f)) { 743 - if (trace_trans_restart_relock_enabled()) { 744 - struct printbuf buf = PRINTBUF; 745 - 746 - bch2_bpos_to_text(&buf, path->pos); 747 - prt_printf(&buf, " l=%u seq=%u node seq=", 748 - f.l, path->l[f.l].lock_seq); 749 - if (IS_ERR_OR_NULL(f.b)) { 750 - prt_str(&buf, bch2_err_str(PTR_ERR(f.b))); 751 - } else { 752 - prt_printf(&buf, "%u", f.b->c.lock.seq); 753 - 754 - struct six_lock_count c = 755 - bch2_btree_node_lock_counts(trans, NULL, &f.b->c, f.l); 756 - prt_printf(&buf, " self locked %u.%u.%u", c.n[0], c.n[1], c.n[2]); 757 - 758 - c = six_lock_counts(&f.b->c.lock); 759 - prt_printf(&buf, " total locked %u.%u.%u", c.n[0], c.n[1], c.n[2]); 760 - } 761 - 762 - trace_trans_restart_relock(trans, _RET_IP_, buf.buf); 763 - printbuf_exit(&buf); 764 - } 765 - 766 - count_event(trans->c, trans_restart_relock); 767 - return btree_trans_restart(trans, BCH_ERR_transaction_restart_relock); 768 - } 788 + !btree_path_get_locks(trans, path, false, &f)) 789 + return bch2_trans_relock_fail(trans, path, &f, trace); 769 790 } 770 791 792 + trans->locked = true; 793 + out: 794 + bch2_trans_verify_locks(trans); 771 795 return 0; 796 + } 797 + 798 + int bch2_trans_relock(struct btree_trans *trans) 799 + { 800 + return __bch2_trans_relock(trans, true); 772 801 } 773 802 774 803 int bch2_trans_relock_notrace(struct btree_trans *trans) 775 804 { 776 - struct btree_path *path; 777 - unsigned i; 778 - 779 - if (unlikely(trans->restarted)) 780 - return -((int) trans->restarted); 781 - 782 - trans_for_each_path(trans, path, i) 783 - if (path->should_be_locked && 784 - !bch2_btree_path_relock_norestart(trans, path)) { 785 - return btree_trans_restart(trans, BCH_ERR_transaction_restart_relock); 786 - } 787 - return 0; 805 + return __bch2_trans_relock(trans, false); 788 806 } 789 807 790 808 void bch2_trans_unlock_noassert(struct btree_trans *trans) 791 809 { 792 - struct btree_path *path; 793 - unsigned i; 810 + __bch2_trans_unlock(trans); 794 811 795 - trans_for_each_path(trans, path, i) 796 - __bch2_btree_path_unlock(trans, path); 812 + trans->locked = false; 813 + trans->last_unlock_ip = _RET_IP_; 797 814 } 798 815 799 816 void bch2_trans_unlock(struct btree_trans *trans) 800 817 { 801 - struct btree_path *path; 802 - unsigned i; 818 + __bch2_trans_unlock(trans); 803 819 804 - trans_for_each_path(trans, path, i) 805 - __bch2_btree_path_unlock(trans, path); 820 + trans->locked = false; 821 + trans->last_unlock_ip = _RET_IP_; 806 822 } 807 823 808 824 void bch2_trans_unlock_long(struct btree_trans *trans) 809 825 { 810 826 bch2_trans_unlock(trans); 811 827 bch2_trans_srcu_unlock(trans); 812 - } 813 - 814 - bool bch2_trans_locked(struct btree_trans *trans) 815 - { 816 - struct btree_path *path; 817 - unsigned i; 818 - 819 - trans_for_each_path(trans, path, i) 820 - if (path->nodes_locked) 821 - return true; 822 - 
return false; 823 828 } 824 829 825 830 int __bch2_trans_mutex_lock(struct btree_trans *trans, ··· 845 836 846 837 void bch2_btree_path_verify_locks(struct btree_path *path) 847 838 { 848 - unsigned l; 839 + /* 840 + * A path may be uptodate and yet have nothing locked if and only if 841 + * there is no node at path->level, which generally means we were 842 + * iterating over all nodes and got to the end of the btree 843 + */ 844 + BUG_ON(path->uptodate == BTREE_ITER_UPTODATE && 845 + btree_path_node(path, path->level) && 846 + !path->nodes_locked); 849 847 850 - if (!path->nodes_locked) { 851 - BUG_ON(path->uptodate == BTREE_ITER_UPTODATE && 852 - btree_path_node(path, path->level)); 848 + if (!path->nodes_locked) 853 849 return; 854 - } 855 850 856 - for (l = 0; l < BTREE_MAX_DEPTH; l++) { 851 + for (unsigned l = 0; l < BTREE_MAX_DEPTH; l++) { 857 852 int want = btree_lock_want(path, l); 858 853 int have = btree_node_locked_type(path, l); 859 854 ··· 870 857 } 871 858 } 872 859 860 + static bool bch2_trans_locked(struct btree_trans *trans) 861 + { 862 + struct btree_path *path; 863 + unsigned i; 864 + 865 + trans_for_each_path(trans, path, i) 866 + if (path->nodes_locked) 867 + return true; 868 + return false; 869 + } 870 + 873 871 void bch2_trans_verify_locks(struct btree_trans *trans) 874 872 { 873 + if (!trans->locked) { 874 + BUG_ON(bch2_trans_locked(trans)); 875 + return; 876 + } 877 + 875 878 struct btree_path *path; 876 879 unsigned i; 877 880
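With trans->locked and last_unlock_ip in place, bch2_trans_verify_locks() can check the claimed state against reality: a transaction that says it is unlocked must not have any path holding node locks. A toy model of that invariant, cut down to the minimum, with assert() standing in for BUG_ON():

#include <assert.h>
#include <stdbool.h>

struct toy_path { unsigned nodes_locked; };

struct toy_trans {
        struct toy_path paths[4];
        unsigned        nr_paths;
        bool            locked;
        unsigned long   last_unlock_ip;
};

static bool toy_any_nodes_locked(struct toy_trans *trans)
{
        for (unsigned i = 0; i < trans->nr_paths; i++)
                if (trans->paths[i].nodes_locked)
                        return true;
        return false;
}

static void toy_verify_locks(struct toy_trans *trans)
{
        /* "unlocked" must mean exactly that: no path holds node locks */
        if (!trans->locked)
                assert(!toy_any_nodes_locked(trans));
}

static void toy_unlock(struct toy_trans *trans)
{
        for (unsigned i = 0; i < trans->nr_paths; i++)
                trans->paths[i].nodes_locked = 0;
        trans->locked = false;
        /* record who unlocked us, like _RET_IP_ in the patch: */
        trans->last_unlock_ip = (unsigned long) __builtin_return_address(0);
}

int main(void)
{
        struct toy_trans t = { .nr_paths = 2, .locked = true };

        t.paths[0].nodes_locked = 1;
        toy_unlock(&t);
        toy_verify_locks(&t);   /* passes: flag and lock state agree */
        return 0;
}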
+2 -2
fs/bcachefs/btree_locking.h
··· 364 364 struct btree_path *path, 365 365 unsigned new_locks_want) 366 366 { 367 - struct get_locks_fail f; 367 + struct get_locks_fail f = {}; 368 368 unsigned old_locks_want = path->locks_want; 369 369 370 370 new_locks_want = min(new_locks_want, BTREE_MAX_DEPTH); 371 371 372 372 if (path->locks_want < new_locks_want 373 373 ? __bch2_btree_path_upgrade(trans, path, new_locks_want, &f) 374 - : path->uptodate == BTREE_ITER_UPTODATE) 374 + : path->nodes_locked) 375 375 return 0; 376 376 377 377 trace_and_count(trans->c, trans_restart_upgrade, trans, _THIS_IP_, path,
+48 -22
fs/bcachefs/btree_trans_commit.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 3 3 #include "bcachefs.h" 4 + #include "alloc_foreground.h" 4 5 #include "btree_gc.h" 5 6 #include "btree_io.h" 6 7 #include "btree_iter.h" ··· 19 18 #include "snapshot.h" 20 19 21 20 #include <linux/prefetch.h> 21 + 22 + static const char * const trans_commit_flags_strs[] = { 23 + #define x(n, ...) #n, 24 + BCH_TRANS_COMMIT_FLAGS() 25 + #undef x 26 + NULL 27 + }; 28 + 29 + void bch2_trans_commit_flags_to_text(struct printbuf *out, enum bch_trans_commit_flags flags) 30 + { 31 + enum bch_watermark watermark = flags & BCH_WATERMARK_MASK; 32 + 33 + prt_printf(out, "watermark=%s", bch2_watermarks[watermark]); 34 + 35 + flags >>= BCH_WATERMARK_BITS; 36 + if (flags) { 37 + prt_char(out, ' '); 38 + bch2_prt_bitflags(out, trans_commit_flags_strs, flags); 39 + } 40 + } 22 41 23 42 static void verify_update_old_key(struct btree_trans *trans, struct btree_insert_entry *i) 24 43 { ··· 336 315 BUG_ON(i->btree_id != path->btree_id); 337 316 EBUG_ON(!i->level && 338 317 btree_type_has_snapshots(i->btree_id) && 339 - !(i->flags & BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) && 340 - test_bit(JOURNAL_REPLAY_DONE, &trans->c->journal.flags) && 318 + !(i->flags & BTREE_UPDATE_internal_snapshot_node) && 319 + test_bit(JOURNAL_replay_done, &trans->c->journal.flags) && 341 320 i->k->k.p.snapshot && 342 321 bch2_snapshot_is_internal_node(trans->c, i->k->k.p.snapshot) > 0); 343 322 } ··· 464 443 465 444 verify_update_old_key(trans, i); 466 445 467 - if (unlikely(flags & BTREE_TRIGGER_NORUN)) 446 + if (unlikely(flags & BTREE_TRIGGER_norun)) 468 447 return 0; 469 448 470 449 if (old_ops->trigger == new_ops->trigger) { 471 450 ret = bch2_key_trigger(trans, i->btree_id, i->level, 472 451 old, bkey_i_to_s(new), 473 - BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE|flags); 452 + BTREE_TRIGGER_insert|BTREE_TRIGGER_overwrite|flags); 474 453 } else { 475 454 ret = bch2_key_trigger_new(trans, i->btree_id, i->level, 476 455 bkey_i_to_s(new), flags) ?: ··· 493 472 struct bkey_s_c old = { &old_k, i->old_v }; 494 473 const struct bkey_ops *old_ops = bch2_bkey_type_ops(old.k->type); 495 474 const struct bkey_ops *new_ops = bch2_bkey_type_ops(i->k->k.type); 496 - unsigned flags = i->flags|BTREE_TRIGGER_TRANSACTIONAL; 475 + unsigned flags = i->flags|BTREE_TRIGGER_transactional; 497 476 498 477 verify_update_old_key(trans, i); 499 478 500 - if ((i->flags & BTREE_TRIGGER_NORUN) || 479 + if ((i->flags & BTREE_TRIGGER_norun) || 501 480 !(BTREE_NODE_TYPE_HAS_TRANS_TRIGGERS & (1U << i->bkey_type))) 502 481 return 0; 503 482 ··· 507 486 i->overwrite_trigger_run = true; 508 487 i->insert_trigger_run = true; 509 488 return bch2_key_trigger(trans, i->btree_id, i->level, old, bkey_i_to_s(i->k), 510 - BTREE_TRIGGER_INSERT| 511 - BTREE_TRIGGER_OVERWRITE|flags) ?: 1; 489 + BTREE_TRIGGER_insert| 490 + BTREE_TRIGGER_overwrite|flags) ?: 1; 512 491 } else if (overwrite && !i->overwrite_trigger_run) { 513 492 i->overwrite_trigger_run = true; 514 493 return bch2_key_trigger_old(trans, i->btree_id, i->level, old, flags) ?: 1; ··· 593 572 594 573 #ifdef CONFIG_BCACHEFS_DEBUG 595 574 trans_for_each_update(trans, i) 596 - BUG_ON(!(i->flags & BTREE_TRIGGER_NORUN) && 575 + BUG_ON(!(i->flags & BTREE_TRIGGER_norun) && 597 576 (BTREE_NODE_TYPE_HAS_TRANS_TRIGGERS & (1U << i->bkey_type)) && 598 577 (!i->insert_trigger_run || !i->overwrite_trigger_run)); 599 578 #endif ··· 611 590 612 591 if (btree_node_type_needs_gc(__btree_node_type(i->level, i->btree_id)) && 613 592 gc_visited(trans->c, gc_pos_btree_node(insert_l(trans, 
i)->b))) { 614 - int ret = run_one_mem_trigger(trans, i, i->flags|BTREE_TRIGGER_GC); 593 + int ret = run_one_mem_trigger(trans, i, i->flags|BTREE_TRIGGER_gc); 615 594 if (ret) 616 595 return ret; 617 596 } ··· 629 608 struct btree_trans_commit_hook *h; 630 609 unsigned u64s = 0; 631 610 int ret; 611 + 612 + bch2_trans_verify_not_unlocked(trans); 613 + bch2_trans_verify_not_in_restart(trans); 632 614 633 615 if (race_fault()) { 634 616 trace_and_count(c, trans_restart_fault_inject, trans, trace_ip); ··· 710 686 711 687 trans_for_each_update(trans, i) 712 688 if (BTREE_NODE_TYPE_HAS_ATOMIC_TRIGGERS & (1U << i->bkey_type)) { 713 - ret = run_one_mem_trigger(trans, i, BTREE_TRIGGER_ATOMIC|i->flags); 689 + ret = run_one_mem_trigger(trans, i, BTREE_TRIGGER_atomic|i->flags); 714 690 if (ret) 715 691 goto fatal_err; 716 692 } ··· 729 705 if (i->key_cache_already_flushed) 730 706 continue; 731 707 732 - if (i->flags & BTREE_UPDATE_NOJOURNAL) 708 + if (i->flags & BTREE_UPDATE_nojournal) 733 709 continue; 734 710 735 711 verify_update_old_key(trans, i); ··· 790 766 } 791 767 792 768 static noinline int bch2_trans_commit_bkey_invalid(struct btree_trans *trans, 793 - enum bkey_invalid_flags flags, 769 + enum bch_validate_flags flags, 794 770 struct btree_insert_entry *i, 795 771 struct printbuf *err) 796 772 { 797 773 struct bch_fs *c = trans->c; 798 774 799 775 printbuf_reset(err); 800 - prt_printf(err, "invalid bkey on insert from %s -> %ps", 776 + prt_printf(err, "invalid bkey on insert from %s -> %ps\n", 801 777 trans->fn, (void *) i->ip_allocated); 802 - prt_newline(err); 803 778 printbuf_indent_add(err, 2); 804 779 805 780 bch2_bkey_val_to_text(err, c, bkey_i_to_s_c(i->k)); ··· 819 796 struct bch_fs *c = trans->c; 820 797 struct printbuf buf = PRINTBUF; 821 798 822 - prt_printf(&buf, "invalid bkey on insert from %s", trans->fn); 823 - prt_newline(&buf); 799 + prt_printf(&buf, "invalid bkey on insert from %s\n", trans->fn); 824 800 printbuf_indent_add(&buf, 2); 825 801 826 802 bch2_journal_entry_to_text(&buf, c, i); ··· 1010 988 struct bch_fs *c = trans->c; 1011 989 int ret = 0; 1012 990 991 + bch2_trans_verify_not_unlocked(trans); 992 + bch2_trans_verify_not_in_restart(trans); 993 + 1013 994 if (!trans->nr_updates && 1014 995 !trans->journal_entries_u64s) 1015 996 goto out_reset; ··· 1025 1000 1026 1001 trans_for_each_update(trans, i) { 1027 1002 struct printbuf buf = PRINTBUF; 1028 - enum bkey_invalid_flags invalid_flags = 0; 1003 + enum bch_validate_flags invalid_flags = 0; 1029 1004 1030 1005 if (!(flags & BCH_TRANS_COMMIT_no_journal_res)) 1031 - invalid_flags |= BKEY_INVALID_WRITE|BKEY_INVALID_COMMIT; 1006 + invalid_flags |= BCH_VALIDATE_write|BCH_VALIDATE_commit; 1032 1007 1033 1008 if (unlikely(bch2_bkey_invalid(c, bkey_i_to_s_c(i->k), 1034 1009 i->bkey_type, invalid_flags, &buf))) ··· 1043 1018 for (struct jset_entry *i = trans->journal_entries; 1044 1019 i != (void *) ((u64 *) trans->journal_entries + trans->journal_entries_u64s); 1045 1020 i = vstruct_next(i)) { 1046 - enum bkey_invalid_flags invalid_flags = 0; 1021 + enum bch_validate_flags invalid_flags = 0; 1047 1022 1048 1023 if (!(flags & BCH_TRANS_COMMIT_no_journal_res)) 1049 - invalid_flags |= BKEY_INVALID_WRITE|BKEY_INVALID_COMMIT; 1024 + invalid_flags |= BCH_VALIDATE_write|BCH_VALIDATE_commit; 1050 1025 1051 1026 if (unlikely(bch2_journal_entry_validate(c, NULL, i, 1052 1027 bcachefs_metadata_version_current, ··· 1090 1065 if (i->key_cache_already_flushed) 1091 1066 continue; 1092 1067 1093 - if (i->flags & 
BTREE_UPDATE_NOJOURNAL) 1068 + if (i->flags & BTREE_UPDATE_nojournal) 1094 1069 continue; 1095 1070 1096 1071 /* we're going to journal the key being updated: */ ··· 1111 1086 } 1112 1087 retry: 1113 1088 errored_at = NULL; 1089 + bch2_trans_verify_not_unlocked(trans); 1114 1090 bch2_trans_verify_not_in_restart(trans); 1115 1091 if (likely(!(flags & BCH_TRANS_COMMIT_no_journal_res))) 1116 1092 memset(&trans->journal_res, 0, sizeof(trans->journal_res));
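The new bch2_trans_commit_flags_to_text() decodes a flags word whose low bits hold an enum (the allocation watermark) with independent flag bits above it, and builds its name table with the same '#define x(n, ...) #n' stringification trick used for trans_commit_flags_strs. A compilable miniature of both patterns; the bit width and the names here are invented stand-ins, not the real watermark or commit-flag values:

#include <stdio.h>

#define TOY_COMMIT_FLAGS()      \
        x(no_enospc)            \
        x(no_check_rw)          \
        x(lazy_rw)

/* same stringification trick as trans_commit_flags_strs above: */
static const char * const toy_flag_strs[] = {
#define x(n) #n,
        TOY_COMMIT_FLAGS()
#undef x
        NULL
};

#define TOY_WATERMARK_BITS      3
#define TOY_WATERMARK_MASK      ((1U << TOY_WATERMARK_BITS) - 1)

static const char * const toy_watermarks[] = {
        "stripe", "normal", "copygc", "btree", "reclaim",
};

static void toy_flags_to_text(unsigned flags)
{
        printf("watermark=%s", toy_watermarks[flags & TOY_WATERMARK_MASK]);

        flags >>= TOY_WATERMARK_BITS;   /* drop the enum, keep the bitflags */
        for (unsigned i = 0; flags; i++, flags >>= 1)
                if (flags & 1)
                        printf(" %s", toy_flag_strs[i]);
        printf("\n");
}

int main(void)
{
        toy_flags_to_text(1 | (1U << TOY_WATERMARK_BITS));      /* "watermark=normal no_enospc" */
        return 0;
}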
+97 -30
fs/bcachefs/btree_types.h
··· 163 163 /* Number of elements in live + freeable lists */ 164 164 unsigned used; 165 165 unsigned reserve; 166 + unsigned freed; 167 + unsigned not_freed_lock_intent; 168 + unsigned not_freed_lock_write; 169 + unsigned not_freed_dirty; 170 + unsigned not_freed_read_in_flight; 171 + unsigned not_freed_write_in_flight; 172 + unsigned not_freed_noevict; 173 + unsigned not_freed_write_blocked; 174 + unsigned not_freed_will_make_reachable; 175 + unsigned not_freed_access_bit; 166 176 atomic_t dirty; 167 177 struct shrinker *shrink; 178 + 179 + unsigned used_by_btree[BTREE_ID_NR]; 168 180 169 181 /* 170 182 * If we need to allocate memory for a new btree node and that ··· 199 187 } data[MAX_BSETS]; 200 188 }; 201 189 190 + #define BTREE_ITER_FLAGS() \ 191 + x(slots) \ 192 + x(intent) \ 193 + x(prefetch) \ 194 + x(is_extents) \ 195 + x(not_extents) \ 196 + x(cached) \ 197 + x(with_key_cache) \ 198 + x(with_updates) \ 199 + x(with_journal) \ 200 + x(snapshot_field) \ 201 + x(all_snapshots) \ 202 + x(filter_snapshots) \ 203 + x(nopreserve) \ 204 + x(cached_nofill) \ 205 + x(key_cache_fill) \ 206 + 207 + #define STR_HASH_FLAGS() \ 208 + x(must_create) \ 209 + x(must_replace) 210 + 211 + #define BTREE_UPDATE_FLAGS() \ 212 + x(internal_snapshot_node) \ 213 + x(nojournal) \ 214 + x(key_cache_reclaim) 215 + 216 + 202 217 /* 203 - * Iterate over all possible positions, synthesizing deleted keys for holes: 218 + * BTREE_TRIGGER_norun - don't run triggers at all 219 + * 220 + * BTREE_TRIGGER_transactional - we're running transactional triggers as part of 221 + * a transaction commit: triggers may generate new updates 222 + * 223 + * BTREE_TRIGGER_atomic - we're running atomic triggers during a transaction 224 + * commit: we have our journal reservation, we're holding btree node write 225 + * locks, and we know the transaction is going to commit (returning an error 226 + * here is a fatal error, causing us to go emergency read-only) 227 + * 228 + * BTREE_TRIGGER_gc - we're in gc/fsck: running triggers to recalculate e.g. 
disk usage 229 + * 230 + * BTREE_TRIGGER_insert - @new is entering the btree 231 + * BTREE_TRIGGER_overwrite - @old is leaving the btree 232 + * 233 + * BTREE_TRIGGER_bucket_invalidate - signal from bucket invalidate path to alloc 234 + * trigger 204 235 */ 205 - static const __maybe_unused u16 BTREE_ITER_SLOTS = 1 << 0; 206 - /* 207 - * Indicates that intent locks should be taken on leaf nodes, because we expect 208 - * to be doing updates: 209 - */ 210 - static const __maybe_unused u16 BTREE_ITER_INTENT = 1 << 1; 211 - /* 212 - * Causes the btree iterator code to prefetch additional btree nodes from disk: 213 - */ 214 - static const __maybe_unused u16 BTREE_ITER_PREFETCH = 1 << 2; 215 - /* 216 - * Used in bch2_btree_iter_traverse(), to indicate whether we're searching for 217 - * @pos or the first key strictly greater than @pos 218 - */ 219 - static const __maybe_unused u16 BTREE_ITER_IS_EXTENTS = 1 << 3; 220 - static const __maybe_unused u16 BTREE_ITER_NOT_EXTENTS = 1 << 4; 221 - static const __maybe_unused u16 BTREE_ITER_CACHED = 1 << 5; 222 - static const __maybe_unused u16 BTREE_ITER_WITH_KEY_CACHE = 1 << 6; 223 - static const __maybe_unused u16 BTREE_ITER_WITH_UPDATES = 1 << 7; 224 - static const __maybe_unused u16 BTREE_ITER_WITH_JOURNAL = 1 << 8; 225 - static const __maybe_unused u16 __BTREE_ITER_ALL_SNAPSHOTS = 1 << 9; 226 - static const __maybe_unused u16 BTREE_ITER_ALL_SNAPSHOTS = 1 << 10; 227 - static const __maybe_unused u16 BTREE_ITER_FILTER_SNAPSHOTS = 1 << 11; 228 - static const __maybe_unused u16 BTREE_ITER_NOPRESERVE = 1 << 12; 229 - static const __maybe_unused u16 BTREE_ITER_CACHED_NOFILL = 1 << 13; 230 - static const __maybe_unused u16 BTREE_ITER_KEY_CACHE_FILL = 1 << 14; 231 - #define __BTREE_ITER_FLAGS_END 15 236 + #define BTREE_TRIGGER_FLAGS() \ 237 + x(norun) \ 238 + x(transactional) \ 239 + x(atomic) \ 240 + x(check_repair) \ 241 + x(gc) \ 242 + x(insert) \ 243 + x(overwrite) \ 244 + x(is_root) \ 245 + x(bucket_invalidate) 246 + 247 + enum { 248 + #define x(n) BTREE_ITER_FLAG_BIT_##n, 249 + BTREE_ITER_FLAGS() 250 + STR_HASH_FLAGS() 251 + BTREE_UPDATE_FLAGS() 252 + BTREE_TRIGGER_FLAGS() 253 + #undef x 254 + }; 255 + 256 + /* iter flags must fit in a u16: */ 257 + //BUILD_BUG_ON(BTREE_ITER_FLAG_BIT_key_cache_fill > 15); 258 + 259 + enum btree_iter_update_trigger_flags { 260 + #define x(n) BTREE_ITER_##n = 1U << BTREE_ITER_FLAG_BIT_##n, 261 + BTREE_ITER_FLAGS() 262 + #undef x 263 + #define x(n) STR_HASH_##n = 1U << BTREE_ITER_FLAG_BIT_##n, 264 + STR_HASH_FLAGS() 265 + #undef x 266 + #define x(n) BTREE_UPDATE_##n = 1U << BTREE_ITER_FLAG_BIT_##n, 267 + BTREE_UPDATE_FLAGS() 268 + #undef x 269 + #define x(n) BTREE_TRIGGER_##n = 1U << BTREE_ITER_FLAG_BIT_##n, 270 + BTREE_TRIGGER_FLAGS() 271 + #undef x 272 + }; 232 273 233 274 enum btree_path_uptodate { 234 275 BTREE_ITER_UPTODATE = 0, ··· 372 307 */ 373 308 struct bkey k; 374 309 375 - /* BTREE_ITER_WITH_JOURNAL: */ 310 + /* BTREE_ITER_with_journal: */ 376 311 size_t journal_idx; 377 312 #ifdef TRACK_PATH_ALLOCATED 378 313 unsigned long ip_allocated; ··· 483 418 u8 lock_must_abort; 484 419 bool lock_may_not_fail:1; 485 420 bool srcu_held:1; 421 + bool locked:1; 422 + bool write_locked:1; 486 423 bool used_mempool:1; 487 424 bool in_traverse_all:1; 488 425 bool paths_sorted:1; ··· 492 425 bool journal_transaction_names:1; 493 426 bool journal_replay_not_finished:1; 494 427 bool notrace_relock_fail:1; 495 - bool write_locked:1; 496 428 enum bch_errcode restarted:16; 497 429 u32 restart_count; 498 430 499 431 u64 
last_begin_time; 500 432 unsigned long last_begin_ip; 501 433 unsigned long last_restarted_ip; 434 + unsigned long last_unlock_ip; 502 435 unsigned long srcu_lock_time; 503 436 504 437 const char *fn;
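The flags rework above expands each x-macro list twice: once into an anonymous enum that hands every flag (iter, str-hash, update, trigger) a unique bit position, and once into the mask values, so all the classes share one bit namespace and can be OR'd into a single word. A reduced model with shortened lists:

#include <stdio.h>

#define TOY_ITER_FLAGS()        x(slots) x(intent) x(cached)
#define TOY_UPDATE_FLAGS()      x(nojournal) x(key_cache_reclaim)
#define TOY_TRIGGER_FLAGS()     x(norun) x(transactional) x(gc)

enum {
#define x(n) TOY_FLAG_BIT_##n,
        TOY_ITER_FLAGS()
        TOY_UPDATE_FLAGS()
        TOY_TRIGGER_FLAGS()
#undef x
        TOY_FLAG_BIT_NR
};

enum toy_flags {
#define x(n) TOY_ITER_##n = 1U << TOY_FLAG_BIT_##n,
        TOY_ITER_FLAGS()
#undef x
#define x(n) TOY_UPDATE_##n = 1U << TOY_FLAG_BIT_##n,
        TOY_UPDATE_FLAGS()
#undef x
#define x(n) TOY_TRIGGER_##n = 1U << TOY_FLAG_BIT_##n,
        TOY_TRIGGER_FLAGS()
#undef x
};

int main(void)
{
        /* the classes share one bit namespace, so they can be OR'd
         * into a single flags word without colliding: */
        unsigned flags = TOY_ITER_intent | TOY_UPDATE_nojournal | TOY_TRIGGER_gc;

        printf("%u bits used, flags=%#x\n", (unsigned) TOY_FLAG_BIT_NR, flags);
        return 0;
}

The commented-out BUILD_BUG_ON in the hunk records the remaining constraint: the iterator flags proper still have to fit in a u16 field.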
+46 -49
fs/bcachefs/btree_update.c
··· 25 25 26 26 static int __must_check 27 27 bch2_trans_update_by_path(struct btree_trans *, btree_path_idx_t, 28 - struct bkey_i *, enum btree_update_flags, 28 + struct bkey_i *, enum btree_iter_update_trigger_flags, 29 29 unsigned long ip); 30 30 31 31 static noinline int extent_front_merge(struct btree_trans *trans, 32 32 struct btree_iter *iter, 33 33 struct bkey_s_c k, 34 34 struct bkey_i **insert, 35 - enum btree_update_flags flags) 35 + enum btree_iter_update_trigger_flags flags) 36 36 { 37 37 struct bch_fs *c = trans->c; 38 38 struct bkey_i *update; ··· 104 104 pos.snapshot++; 105 105 106 106 for_each_btree_key_norestart(trans, iter, btree_id, pos, 107 - BTREE_ITER_ALL_SNAPSHOTS| 108 - BTREE_ITER_NOPRESERVE, k, ret) { 107 + BTREE_ITER_all_snapshots| 108 + BTREE_ITER_nopreserve, k, ret) { 109 109 if (!bkey_eq(k.k->p, pos)) 110 110 break; 111 111 ··· 138 138 darray_init(&s); 139 139 140 140 bch2_trans_iter_init(trans, &old_iter, id, old_pos, 141 - BTREE_ITER_NOT_EXTENTS| 142 - BTREE_ITER_ALL_SNAPSHOTS); 141 + BTREE_ITER_not_extents| 142 + BTREE_ITER_all_snapshots); 143 143 while ((old_k = bch2_btree_iter_prev(&old_iter)).k && 144 144 !(ret = bkey_err(old_k)) && 145 145 bkey_eq(old_pos, old_k.k->p)) { ··· 151 151 continue; 152 152 153 153 new_k = bch2_bkey_get_iter(trans, &new_iter, id, whiteout_pos, 154 - BTREE_ITER_NOT_EXTENTS| 155 - BTREE_ITER_INTENT); 154 + BTREE_ITER_not_extents| 155 + BTREE_ITER_intent); 156 156 ret = bkey_err(new_k); 157 157 if (ret) 158 158 break; ··· 168 168 update->k.type = KEY_TYPE_whiteout; 169 169 170 170 ret = bch2_trans_update(trans, &new_iter, update, 171 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 171 + BTREE_UPDATE_internal_snapshot_node); 172 172 } 173 173 bch2_trans_iter_exit(trans, &new_iter); 174 174 ··· 185 185 186 186 int bch2_trans_update_extent_overwrite(struct btree_trans *trans, 187 187 struct btree_iter *iter, 188 - enum btree_update_flags flags, 188 + enum btree_iter_update_trigger_flags flags, 189 189 struct bkey_s_c old, 190 190 struct bkey_s_c new) 191 191 { ··· 218 218 ret = bch2_insert_snapshot_whiteouts(trans, btree_id, 219 219 old.k->p, update->k.p) ?: 220 220 bch2_btree_insert_nonextent(trans, btree_id, update, 221 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE|flags); 221 + BTREE_UPDATE_internal_snapshot_node|flags); 222 222 if (ret) 223 223 return ret; 224 224 } ··· 235 235 ret = bch2_insert_snapshot_whiteouts(trans, btree_id, 236 236 old.k->p, update->k.p) ?: 237 237 bch2_btree_insert_nonextent(trans, btree_id, update, 238 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE|flags); 238 + BTREE_UPDATE_internal_snapshot_node|flags); 239 239 if (ret) 240 240 return ret; 241 241 } ··· 260 260 } 261 261 262 262 ret = bch2_btree_insert_nonextent(trans, btree_id, update, 263 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE|flags); 263 + BTREE_UPDATE_internal_snapshot_node|flags); 264 264 if (ret) 265 265 return ret; 266 266 } ··· 273 273 bch2_cut_front(new.k->p, update); 274 274 275 275 ret = bch2_trans_update_by_path(trans, iter->path, update, 276 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE| 276 + BTREE_UPDATE_internal_snapshot_node| 277 277 flags, _RET_IP_); 278 278 if (ret) 279 279 return ret; ··· 285 285 static int bch2_trans_update_extent(struct btree_trans *trans, 286 286 struct btree_iter *orig_iter, 287 287 struct bkey_i *insert, 288 - enum btree_update_flags flags) 288 + enum btree_iter_update_trigger_flags flags) 289 289 { 290 290 struct btree_iter iter; 291 291 struct bkey_s_c k; ··· 293 293 int ret = 0; 294 294 295 295 bch2_trans_iter_init(trans, &iter, btree_id, 
bkey_start_pos(&insert->k), 296 - BTREE_ITER_INTENT| 297 - BTREE_ITER_WITH_UPDATES| 298 - BTREE_ITER_NOT_EXTENTS); 296 + BTREE_ITER_intent| 297 + BTREE_ITER_with_updates| 298 + BTREE_ITER_not_extents); 299 299 k = bch2_btree_iter_peek_upto(&iter, POS(insert->k.p.inode, U64_MAX)); 300 300 if ((ret = bkey_err(k))) 301 301 goto err; ··· 346 346 347 347 static noinline int flush_new_cached_update(struct btree_trans *trans, 348 348 struct btree_insert_entry *i, 349 - enum btree_update_flags flags, 349 + enum btree_iter_update_trigger_flags flags, 350 350 unsigned long ip) 351 351 { 352 352 struct bkey k; ··· 354 354 355 355 btree_path_idx_t path_idx = 356 356 bch2_path_get(trans, i->btree_id, i->old_k.p, 1, 0, 357 - BTREE_ITER_INTENT, _THIS_IP_); 357 + BTREE_ITER_intent, _THIS_IP_); 358 358 ret = bch2_btree_path_traverse(trans, path_idx, 0); 359 359 if (ret) 360 360 goto out; ··· 372 372 goto out; 373 373 374 374 i->key_cache_already_flushed = true; 375 - i->flags |= BTREE_TRIGGER_NORUN; 375 + i->flags |= BTREE_TRIGGER_norun; 376 376 377 377 btree_path_set_should_be_locked(btree_path); 378 378 ret = bch2_trans_update_by_path(trans, path_idx, i->k, flags, ip); ··· 383 383 384 384 static int __must_check 385 385 bch2_trans_update_by_path(struct btree_trans *trans, btree_path_idx_t path_idx, 386 - struct bkey_i *k, enum btree_update_flags flags, 386 + struct bkey_i *k, enum btree_iter_update_trigger_flags flags, 387 387 unsigned long ip) 388 388 { 389 389 struct bch_fs *c = trans->c; ··· 479 479 if (!iter->key_cache_path) 480 480 iter->key_cache_path = 481 481 bch2_path_get(trans, path->btree_id, path->pos, 1, 0, 482 - BTREE_ITER_INTENT| 483 - BTREE_ITER_CACHED, _THIS_IP_); 482 + BTREE_ITER_intent| 483 + BTREE_ITER_cached, _THIS_IP_); 484 484 485 485 iter->key_cache_path = 486 486 bch2_btree_path_set_pos(trans, iter->key_cache_path, path->pos, 487 - iter->flags & BTREE_ITER_INTENT, 487 + iter->flags & BTREE_ITER_intent, 488 488 _THIS_IP_); 489 489 490 - ret = bch2_btree_path_traverse(trans, iter->key_cache_path, BTREE_ITER_CACHED); 490 + ret = bch2_btree_path_traverse(trans, iter->key_cache_path, BTREE_ITER_cached); 491 491 if (unlikely(ret)) 492 492 return ret; 493 493 ··· 505 505 } 506 506 507 507 int __must_check bch2_trans_update(struct btree_trans *trans, struct btree_iter *iter, 508 - struct bkey_i *k, enum btree_update_flags flags) 508 + struct bkey_i *k, enum btree_iter_update_trigger_flags flags) 509 509 { 510 510 btree_path_idx_t path_idx = iter->update_path ?: iter->path; 511 511 int ret; 512 512 513 - if (iter->flags & BTREE_ITER_IS_EXTENTS) 513 + if (iter->flags & BTREE_ITER_is_extents) 514 514 return bch2_trans_update_extent(trans, iter, k, flags); 515 515 516 516 if (bkey_deleted(&k->k) && 517 - !(flags & BTREE_UPDATE_KEY_CACHE_RECLAIM) && 518 - (iter->flags & BTREE_ITER_FILTER_SNAPSHOTS)) { 517 + !(flags & BTREE_UPDATE_key_cache_reclaim) && 518 + (iter->flags & BTREE_ITER_filter_snapshots)) { 519 519 ret = need_whiteout_for_snapshot(trans, iter->btree_id, k->k.p); 520 520 if (unlikely(ret < 0)) 521 521 return ret; ··· 528 528 * Ensure that updates to cached btrees go to the key cache: 529 529 */ 530 530 struct btree_path *path = trans->paths + path_idx; 531 - if (!(flags & BTREE_UPDATE_KEY_CACHE_RECLAIM) && 531 + if (!(flags & BTREE_UPDATE_key_cache_reclaim) && 532 532 !path->cached && 533 533 !path->level && 534 534 btree_id_cached(trans->c, path->btree_id)) { ··· 587 587 struct bkey_s_c k; 588 588 int ret = 0; 589 589 590 - bch2_trans_iter_init(trans, iter, btree, POS_MAX, 
BTREE_ITER_INTENT); 590 + bch2_trans_iter_init(trans, iter, btree, POS_MAX, BTREE_ITER_intent); 591 591 k = bch2_btree_iter_prev(iter); 592 592 ret = bkey_err(k); 593 593 if (ret) ··· 621 621 622 622 int bch2_btree_insert_nonextent(struct btree_trans *trans, 623 623 enum btree_id btree, struct bkey_i *k, 624 - enum btree_update_flags flags) 624 + enum btree_iter_update_trigger_flags flags) 625 625 { 626 626 struct btree_iter iter; 627 627 int ret; 628 628 629 629 bch2_trans_iter_init(trans, &iter, btree, k->k.p, 630 - BTREE_ITER_CACHED| 631 - BTREE_ITER_NOT_EXTENTS| 632 - BTREE_ITER_INTENT); 630 + BTREE_ITER_cached| 631 + BTREE_ITER_not_extents| 632 + BTREE_ITER_intent); 633 633 ret = bch2_btree_iter_traverse(&iter) ?: 634 634 bch2_trans_update(trans, &iter, k, flags); 635 635 bch2_trans_iter_exit(trans, &iter); ··· 637 637 } 638 638 639 639 int bch2_btree_insert_trans(struct btree_trans *trans, enum btree_id id, 640 - struct bkey_i *k, enum btree_update_flags flags) 640 + struct bkey_i *k, enum btree_iter_update_trigger_flags flags) 641 641 { 642 642 struct btree_iter iter; 643 - int ret; 644 - 645 643 bch2_trans_iter_init(trans, &iter, id, bkey_start_pos(&k->k), 646 - BTREE_ITER_CACHED| 647 - BTREE_ITER_INTENT); 648 - ret = bch2_btree_iter_traverse(&iter) ?: 649 - bch2_trans_update(trans, &iter, k, flags); 644 + BTREE_ITER_intent|flags); 645 + int ret = bch2_btree_iter_traverse(&iter) ?: 646 + bch2_trans_update(trans, &iter, k, flags); 650 647 bch2_trans_iter_exit(trans, &iter); 651 648 return ret; 652 649 } ··· 695 698 int ret; 696 699 697 700 bch2_trans_iter_init(trans, &iter, btree, pos, 698 - BTREE_ITER_CACHED| 699 - BTREE_ITER_INTENT); 701 + BTREE_ITER_cached| 702 + BTREE_ITER_intent); 700 703 ret = bch2_btree_iter_traverse(&iter) ?: 701 704 bch2_btree_delete_at(trans, &iter, update_flags); 702 705 bch2_trans_iter_exit(trans, &iter); ··· 714 717 struct bkey_s_c k; 715 718 int ret = 0; 716 719 717 - bch2_trans_iter_init(trans, &iter, id, start, BTREE_ITER_INTENT); 720 + bch2_trans_iter_init(trans, &iter, id, start, BTREE_ITER_intent); 718 721 while ((k = bch2_btree_iter_peek_upto(&iter, end)).k) { 719 722 struct disk_reservation disk_res = 720 723 bch2_disk_reservation_init(trans->c, 0); ··· 742 745 */ 743 746 delete.k.p = iter.pos; 744 747 745 - if (iter.flags & BTREE_ITER_IS_EXTENTS) 748 + if (iter.flags & BTREE_ITER_is_extents) 746 749 bch2_key_resize(&delete.k, 747 750 bpos_min(end, k.k->p).offset - 748 751 iter.pos.offset); ··· 801 804 k->k.p = pos; 802 805 803 806 struct btree_iter iter; 804 - bch2_trans_iter_init(trans, &iter, btree, pos, BTREE_ITER_INTENT); 807 + bch2_trans_iter_init(trans, &iter, btree, pos, BTREE_ITER_intent); 805 808 806 809 ret = bch2_btree_iter_traverse(&iter) ?: 807 810 bch2_trans_update(trans, &iter, k, 0); ··· 849 852 if (ret) 850 853 goto err; 851 854 852 - if (!test_bit(JOURNAL_STARTED, &c->journal.flags)) { 855 + if (!test_bit(JOURNAL_running, &c->journal.flags)) { 853 856 ret = darray_make_room(&c->journal.early_journal_entries, jset_u64s(u64s)); 854 857 if (ret) 855 858 goto err;
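Several of the rewritten helpers above chain their steps with the GNU 'a ?: b' extension, e.g. 'ret = bch2_btree_iter_traverse(&iter) ?: bch2_trans_update(trans, &iter, k, flags);'. The idiom works because every step returns 0 on success or a negative errno, so the first failure short-circuits the rest. A minimal stand-alone demonstration with toy step functions:

#include <errno.h>
#include <stdio.h>

static int step_one(void)   { return 0; }
static int step_two(void)   { return -EINTR; }
static int step_three(void) { printf("never reached\n"); return 0; }

int main(void)
{
        /* a ?: b yields a unless a == 0, so the first failing step
         * (nonzero return) short-circuits everything after it: */
        int ret = step_one() ?:
                  step_two() ?:
                  step_three();

        printf("ret = %d\n", ret);      /* -EINTR; step_three() never ran */
        return 0;
}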
+8 -6
fs/bcachefs/btree_update.h
··· 44 44 #undef x 45 45 }; 46 46 47 + void bch2_trans_commit_flags_to_text(struct printbuf *, enum bch_trans_commit_flags); 48 + 47 49 int bch2_btree_delete_extent_at(struct btree_trans *, struct btree_iter *, 48 50 unsigned, unsigned); 49 51 int bch2_btree_delete_at(struct btree_trans *, struct btree_iter *, unsigned); 50 52 int bch2_btree_delete(struct btree_trans *, enum btree_id, struct bpos, unsigned); 51 53 52 54 int bch2_btree_insert_nonextent(struct btree_trans *, enum btree_id, 53 - struct bkey_i *, enum btree_update_flags); 55 + struct bkey_i *, enum btree_iter_update_trigger_flags); 54 56 55 57 int bch2_btree_insert_trans(struct btree_trans *, enum btree_id, struct bkey_i *, 56 - enum btree_update_flags); 58 + enum btree_iter_update_trigger_flags); 57 59 int bch2_btree_insert(struct bch_fs *, enum btree_id, struct bkey_i *, 58 60 struct disk_reservation *, int flags); 59 61 ··· 96 94 } 97 95 98 96 int bch2_trans_update_extent_overwrite(struct btree_trans *, struct btree_iter *, 99 - enum btree_update_flags, 97 + enum btree_iter_update_trigger_flags, 100 98 struct bkey_s_c, struct bkey_s_c); 101 99 102 100 int bch2_bkey_get_empty_slot(struct btree_trans *, struct btree_iter *, 103 101 enum btree_id, struct bpos); 104 102 105 103 int __must_check bch2_trans_update(struct btree_trans *, struct btree_iter *, 106 - struct bkey_i *, enum btree_update_flags); 104 + struct bkey_i *, enum btree_iter_update_trigger_flags); 107 105 108 106 struct jset_entry *__bch2_trans_jset_entry_alloc(struct btree_trans *, unsigned); 109 107 ··· 278 276 unsigned flags, unsigned type, unsigned min_bytes) 279 277 { 280 278 struct bkey_s_c k = __bch2_bkey_get_iter(trans, iter, 281 - btree_id, pos, flags|BTREE_ITER_INTENT, type); 279 + btree_id, pos, flags|BTREE_ITER_intent, type); 282 280 struct bkey_i *ret = IS_ERR(k.k) 283 281 ? ERR_CAST(k.k) 284 282 : __bch2_bkey_make_mut_noupdate(trans, k, 0, min_bytes); ··· 301 299 unsigned flags, unsigned type, unsigned min_bytes) 302 300 { 303 301 struct bkey_i *mut = __bch2_bkey_get_mut_noupdate(trans, iter, 304 - btree_id, pos, flags|BTREE_ITER_INTENT, type, min_bytes); 302 + btree_id, pos, flags|BTREE_ITER_intent, type, min_bytes); 305 303 int ret; 306 304 307 305 if (IS_ERR(mut))
+49 -46
fs/bcachefs/btree_update_interior.c
··· 38 38 btree_path_idx_t, struct btree *, struct keylist *); 39 39 static void bch2_btree_update_add_new_node(struct btree_update *, struct btree *); 40 40 41 - static btree_path_idx_t get_unlocked_mut_path(struct btree_trans *trans, 42 - enum btree_id btree_id, 43 - unsigned level, 44 - struct bpos pos) 45 - { 46 - btree_path_idx_t path_idx = bch2_path_get(trans, btree_id, pos, level + 1, level, 47 - BTREE_ITER_NOPRESERVE| 48 - BTREE_ITER_INTENT, _RET_IP_); 49 - path_idx = bch2_btree_path_make_mut(trans, path_idx, true, _RET_IP_); 50 - 51 - struct btree_path *path = trans->paths + path_idx; 52 - bch2_btree_path_downgrade(trans, path); 53 - __bch2_btree_path_unlock(trans, path); 54 - return path_idx; 55 - } 56 - 57 41 /* 58 42 * Verify that child nodes correctly span parent node's range: 59 43 */ ··· 56 72 BUG_ON(b->key.k.type == KEY_TYPE_btree_ptr_v2 && 57 73 !bpos_eq(bkey_i_to_btree_ptr_v2(&b->key)->v.min_key, 58 74 b->data->min_key)); 75 + 76 + if (b == btree_node_root(c, b)) { 77 + if (!bpos_eq(b->data->min_key, POS_MIN)) { 78 + printbuf_reset(&buf); 79 + bch2_bpos_to_text(&buf, b->data->min_key); 80 + need_fsck_err(c, btree_root_bad_min_key, 81 + "btree root with incorrect min_key: %s", buf.buf); 82 + goto topology_repair; 83 + } 84 + 85 + if (!bpos_eq(b->data->max_key, SPOS_MAX)) { 86 + printbuf_reset(&buf); 87 + bch2_bpos_to_text(&buf, b->data->max_key); 88 + need_fsck_err(c, btree_root_bad_max_key, 89 + "btree root with incorrect max_key: %s", buf.buf); 90 + goto topology_repair; 91 + } 92 + } 59 93 60 94 if (!b->c.level) 61 95 return 0; ··· 160 158 static void __bch2_btree_calc_format(struct bkey_format_state *s, struct btree *b) 161 159 { 162 160 struct bkey_packed *k; 163 - struct bset_tree *t; 164 161 struct bkey uk; 165 162 166 163 for_each_bset(b, t) ··· 647 646 unsigned level = bkey_i_to_btree_ptr_v2(k)->v.mem_ptr; 648 647 649 648 ret = bch2_key_trigger_old(trans, as->btree_id, level, bkey_i_to_s_c(k), 650 - BTREE_TRIGGER_TRANSACTIONAL); 649 + BTREE_TRIGGER_transactional); 651 650 if (ret) 652 651 return ret; 653 652 } ··· 656 655 unsigned level = bkey_i_to_btree_ptr_v2(k)->v.mem_ptr; 657 656 658 657 ret = bch2_key_trigger_new(trans, as->btree_id, level, bkey_i_to_s(k), 659 - BTREE_TRIGGER_TRANSACTIONAL); 658 + BTREE_TRIGGER_transactional); 660 659 if (ret) 661 660 return ret; 662 661 } ··· 736 735 */ 737 736 b = READ_ONCE(as->b); 738 737 if (b) { 739 - btree_path_idx_t path_idx = get_unlocked_mut_path(trans, 740 - as->btree_id, b->c.level, b->key.k.p); 741 - struct btree_path *path = trans->paths + path_idx; 742 738 /* 743 739 * @b is the node we did the final insert into: 744 740 * ··· 753 755 * btree_node_lock_nopath() (the use of which is always suspect, 754 756 * we need to work on removing this in the future) 755 757 * 756 - * It should be, but get_unlocked_mut_path() -> bch2_path_get() 758 + * It should be, but bch2_path_get_unlocked_mut() -> bch2_path_get() 757 759 * calls bch2_path_upgrade(), before we call path_make_mut(), so 758 760 * we may rarely end up with a locked path besides the one we 759 761 * have here: 760 762 */ 761 763 bch2_trans_unlock(trans); 764 + bch2_trans_begin(trans); 765 + btree_path_idx_t path_idx = bch2_path_get_unlocked_mut(trans, 766 + as->btree_id, b->c.level, b->key.k.p); 767 + struct btree_path *path = trans->paths + path_idx; 762 768 btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_intent); 763 769 mark_btree_node_locked(trans, path, b->c.level, BTREE_NODE_INTENT_LOCKED); 764 770 path->l[b->c.level].lock_seq = 
six_lock_seq(&b->c.lock); ··· 1156 1154 flags |= watermark; 1157 1155 1158 1156 if (watermark < BCH_WATERMARK_reclaim && 1159 - test_bit(JOURNAL_SPACE_LOW, &c->journal.flags)) { 1157 + test_bit(JOURNAL_space_low, &c->journal.flags)) { 1160 1158 if (flags & BCH_TRANS_COMMIT_journal_reclaim) 1161 1159 return ERR_PTR(-BCH_ERR_journal_reclaim_would_deadlock); 1162 1160 1163 - bch2_trans_unlock(trans); 1164 - wait_event(c->journal.wait, !test_bit(JOURNAL_SPACE_LOW, &c->journal.flags)); 1165 - ret = bch2_trans_relock(trans); 1161 + ret = drop_locks_do(trans, 1162 + ({ wait_event(c->journal.wait, !test_bit(JOURNAL_space_low, &c->journal.flags)); 0; })); 1166 1163 if (ret) 1167 1164 return ERR_PTR(ret); 1168 1165 } ··· 1207 1206 as->start_time = start_time; 1208 1207 as->ip_started = _RET_IP_; 1209 1208 as->mode = BTREE_UPDATE_none; 1210 - as->watermark = watermark; 1209 + as->flags = flags; 1211 1210 as->took_gc_lock = true; 1212 1211 as->btree_id = path->btree_id; 1213 1212 as->update_level_start = level_start; ··· 1361 1360 BUG_ON(insert->k.type == KEY_TYPE_btree_ptr_v2 && 1362 1361 !btree_ptr_sectors_written(insert)); 1363 1362 1364 - if (unlikely(!test_bit(JOURNAL_REPLAY_DONE, &c->journal.flags))) 1363 + if (unlikely(!test_bit(JOURNAL_replay_done, &c->journal.flags))) 1365 1364 bch2_journal_key_overwritten(c, b->c.btree_id, b->c.level, insert->k.p); 1366 1365 1367 1366 if (bch2_bkey_invalid(c, bkey_i_to_s_c(insert), ··· 1620 1619 six_unlock_write(&n2->c.lock); 1621 1620 six_unlock_write(&n1->c.lock); 1622 1621 1623 - path1 = get_unlocked_mut_path(trans, as->btree_id, n1->c.level, n1->key.k.p); 1622 + path1 = bch2_path_get_unlocked_mut(trans, as->btree_id, n1->c.level, n1->key.k.p); 1624 1623 six_lock_increment(&n1->c.lock, SIX_LOCK_intent); 1625 1624 mark_btree_node_locked(trans, trans->paths + path1, n1->c.level, BTREE_NODE_INTENT_LOCKED); 1626 1625 bch2_btree_path_level_init(trans, trans->paths + path1, n1); 1627 1626 1628 - path2 = get_unlocked_mut_path(trans, as->btree_id, n2->c.level, n2->key.k.p); 1627 + path2 = bch2_path_get_unlocked_mut(trans, as->btree_id, n2->c.level, n2->key.k.p); 1629 1628 six_lock_increment(&n2->c.lock, SIX_LOCK_intent); 1630 1629 mark_btree_node_locked(trans, trans->paths + path2, n2->c.level, BTREE_NODE_INTENT_LOCKED); 1631 1630 bch2_btree_path_level_init(trans, trans->paths + path2, n2); ··· 1670 1669 bch2_btree_update_add_new_node(as, n1); 1671 1670 six_unlock_write(&n1->c.lock); 1672 1671 1673 - path1 = get_unlocked_mut_path(trans, as->btree_id, n1->c.level, n1->key.k.p); 1672 + path1 = bch2_path_get_unlocked_mut(trans, as->btree_id, n1->c.level, n1->key.k.p); 1674 1673 six_lock_increment(&n1->c.lock, SIX_LOCK_intent); 1675 1674 mark_btree_node_locked(trans, trans->paths + path1, n1->c.level, BTREE_NODE_INTENT_LOCKED); 1676 1675 bch2_btree_path_level_init(trans, trans->paths + path1, n1); ··· 1948 1947 u64 start_time = local_clock(); 1949 1948 int ret = 0; 1950 1949 1950 + bch2_trans_verify_not_in_restart(trans); 1951 + bch2_trans_verify_not_unlocked(trans); 1951 1952 BUG_ON(!trans->paths[path].should_be_locked); 1952 1953 BUG_ON(!btree_node_locked(&trans->paths[path], level)); 1953 1954 ··· 1982 1979 : bpos_successor(b->data->max_key); 1983 1980 1984 1981 sib_path = bch2_path_get(trans, btree, sib_pos, 1985 - U8_MAX, level, BTREE_ITER_INTENT, _THIS_IP_); 1982 + U8_MAX, level, BTREE_ITER_intent, _THIS_IP_); 1986 1983 ret = bch2_btree_path_traverse(trans, sib_path, false); 1987 1984 if (ret) 1988 1985 goto err; ··· 2075 2072 bch2_btree_update_add_new_node(as, 
n); 2076 2073 six_unlock_write(&n->c.lock); 2077 2074 2078 - new_path = get_unlocked_mut_path(trans, btree, n->c.level, n->key.k.p); 2075 + new_path = bch2_path_get_unlocked_mut(trans, btree, n->c.level, n->key.k.p); 2079 2076 six_lock_increment(&n->c.lock, SIX_LOCK_intent); 2080 2077 mark_btree_node_locked(trans, trans->paths + new_path, n->c.level, BTREE_NODE_INTENT_LOCKED); 2081 2078 bch2_btree_path_level_init(trans, trans->paths + new_path, n); ··· 2153 2150 bch2_btree_update_add_new_node(as, n); 2154 2151 six_unlock_write(&n->c.lock); 2155 2152 2156 - new_path = get_unlocked_mut_path(trans, iter->btree_id, n->c.level, n->key.k.p); 2153 + new_path = bch2_path_get_unlocked_mut(trans, iter->btree_id, n->c.level, n->key.k.p); 2157 2154 six_lock_increment(&n->c.lock, SIX_LOCK_intent); 2158 2155 mark_btree_node_locked(trans, trans->paths + new_path, n->c.level, BTREE_NODE_INTENT_LOCKED); 2159 2156 bch2_btree_path_level_init(trans, trans->paths + new_path, n); ··· 2336 2333 if (!skip_triggers) { 2337 2334 ret = bch2_key_trigger_old(trans, b->c.btree_id, b->c.level + 1, 2338 2335 bkey_i_to_s_c(&b->key), 2339 - BTREE_TRIGGER_TRANSACTIONAL) ?: 2336 + BTREE_TRIGGER_transactional) ?: 2340 2337 bch2_key_trigger_new(trans, b->c.btree_id, b->c.level + 1, 2341 2338 bkey_i_to_s(new_key), 2342 - BTREE_TRIGGER_TRANSACTIONAL); 2339 + BTREE_TRIGGER_transactional); 2343 2340 if (ret) 2344 2341 return ret; 2345 2342 } ··· 2356 2353 bch2_trans_copy_iter(&iter2, iter); 2357 2354 2358 2355 iter2.path = bch2_btree_path_make_mut(trans, iter2.path, 2359 - iter2.flags & BTREE_ITER_INTENT, 2356 + iter2.flags & BTREE_ITER_intent, 2360 2357 _THIS_IP_); 2361 2358 2362 2359 struct btree_path *path2 = btree_iter_path(trans, &iter2); ··· 2368 2365 trans->paths_sorted = false; 2369 2366 2370 2367 ret = bch2_btree_iter_traverse(&iter2) ?: 2371 - bch2_trans_update(trans, &iter2, new_key, BTREE_TRIGGER_NORUN); 2368 + bch2_trans_update(trans, &iter2, new_key, BTREE_TRIGGER_norun); 2372 2369 if (ret) 2373 2370 goto err; 2374 2371 } else { ··· 2476 2473 2477 2474 bch2_trans_node_iter_init(trans, &iter, b->c.btree_id, b->key.k.p, 2478 2475 BTREE_MAX_DEPTH, b->c.level, 2479 - BTREE_ITER_INTENT); 2476 + BTREE_ITER_intent); 2480 2477 ret = bch2_btree_iter_traverse(&iter); 2481 2478 if (ret) 2482 2479 goto out; ··· 2490 2487 2491 2488 BUG_ON(!btree_node_hashed(b)); 2492 2489 2493 - struct bch_extent_ptr *ptr; 2494 2490 bch2_bkey_drop_ptrs(bkey_i_to_s(new_key), ptr, 2495 2491 !bch2_bkey_has_device(bkey_i_to_s(&b->key), ptr->dev)); 2496 2492 ··· 2513 2511 bch2_btree_set_root_inmem(c, b); 2514 2512 } 2515 2513 2516 - static int __bch2_btree_root_alloc_fake(struct btree_trans *trans, enum btree_id id, unsigned level) 2514 + int bch2_btree_root_alloc_fake_trans(struct btree_trans *trans, enum btree_id id, unsigned level) 2517 2515 { 2518 2516 struct bch_fs *c = trans->c; 2519 2517 struct closure cl; ··· 2561 2559 2562 2560 void bch2_btree_root_alloc_fake(struct bch_fs *c, enum btree_id id, unsigned level) 2563 2561 { 2564 - bch2_trans_run(c, __bch2_btree_root_alloc_fake(trans, id, level)); 2562 + bch2_trans_run(c, bch2_btree_root_alloc_fake_trans(trans, id, level)); 2565 2563 } 2566 2564 2567 2565 static void bch2_btree_update_to_text(struct printbuf *out, struct btree_update *as) 2568 2566 { 2569 - prt_printf(out, "%ps: btree=%s l=%u-%u watermark=%s mode=%s nodes_written=%u cl.remaining=%u journal_seq=%llu\n", 2570 - (void *) as->ip_started, 2567 + prt_printf(out, "%ps: ", (void *) as->ip_started); 2568 + 
bch2_trans_commit_flags_to_text(out, as->flags); 2569 + 2570 + prt_printf(out, " btree=%s l=%u-%u mode=%s nodes_written=%u cl.remaining=%u journal_seq=%llu\n", 2571 2571 bch2_btree_id_str(as->btree_id), 2572 2572 as->update_level_start, 2573 2573 as->update_level_end, 2574 - bch2_watermarks[as->watermark], 2575 2574 bch2_btree_update_modes[as->mode], 2576 2575 as->nodes_written, 2577 2576 closure_nr_remaining(&as->cl),
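The open-coded unlock/wait_event/relock sequence in bch2_btree_update_start() collapses into drop_locks_do(), which (roughly) drops the transaction's locks, evaluates the expression, and then returns that expression's error or, failing that, the relock result. A stand-alone sketch of the shape, with a plain mutex standing in for transaction locks:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static int toy_relock(void)
{
        return pthread_mutex_lock(&lock) ? -1 : 0;
}

/* shape of the helper: unlock, run the expression, then return its
 * error if any, otherwise the relock result */
#define toy_drop_locks_do(_do)                  \
({                                              \
        pthread_mutex_unlock(&lock);            \
        (_do) ?: toy_relock();                  \
})

int main(void)
{
        pthread_mutex_lock(&lock);

        int ret = toy_drop_locks_do(({ printf("waiting with locks dropped\n"); 0; }));

        printf("relock ret = %d\n", ret);
        pthread_mutex_unlock(&lock);
        return 0;
}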
+6 -1
fs/bcachefs/btree_update_interior.h
··· 52 52 struct list_head unwritten_list; 53 53 54 54 enum btree_update_mode mode; 55 - enum bch_watermark watermark; 55 + enum bch_trans_commit_flags flags; 56 56 unsigned nodes_written:1; 57 57 unsigned took_gc_lock:1; 58 58 ··· 144 144 145 145 EBUG_ON(!btree_node_locked(path, level)); 146 146 147 + if (bch2_btree_node_merging_disabled) 148 + return 0; 149 + 147 150 b = path->l[level].b; 148 151 if (b->sib_u64s[sib] > trans->c->btree_foreground_merge_threshold) 149 152 return 0; ··· 175 172 struct bkey_i *, unsigned, bool); 176 173 177 174 void bch2_btree_set_root_for_read(struct bch_fs *, struct btree *); 175 + 176 + int bch2_btree_root_alloc_fake_trans(struct btree_trans *, enum btree_id, unsigned); 178 177 void bch2_btree_root_alloc_fake(struct bch_fs *, enum btree_id, unsigned); 179 178 180 179 static inline unsigned btree_update_reserve_required(struct bch_fs *c,
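The early return on bch2_btree_node_merging_disabled above is a global kill switch checked before the merge heuristics run, useful for debugging since foreground merging is an optional optimization. A minimal illustration of the pattern (names invented):

#include <stdbool.h>
#include <stdio.h>

static bool toy_merging_disabled;       /* imagine a sysfs/module knob */

static int toy_maybe_merge(unsigned sib_u64s, unsigned threshold)
{
        if (toy_merging_disabled)
                return 0;               /* knob wins: never merge */

        if (sib_u64s > threshold)
                return 0;               /* sibling is big enough, nothing to do */

        printf("merging...\n");
        return 1;
}

int main(void)
{
        toy_merging_disabled = true;
        printf("%d\n", toy_maybe_merge(1, 100));        /* 0: disabled */
        return 0;
}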
+4 -4
fs/bcachefs/btree_write_buffer.c
··· 122 122 trans->journal_res.seq = wb->journal_seq; 123 123 124 124 return bch2_trans_update(trans, iter, &wb->k, 125 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?: 125 + BTREE_UPDATE_internal_snapshot_node) ?: 126 126 bch2_trans_commit(trans, NULL, NULL, 127 127 BCH_TRANS_COMMIT_no_enospc| 128 128 BCH_TRANS_COMMIT_no_check_rw| ··· 191 191 int ret; 192 192 193 193 bch2_trans_iter_init(trans, &iter, wb->btree, bkey_start_pos(&wb->k.k), 194 - BTREE_ITER_CACHED|BTREE_ITER_INTENT); 194 + BTREE_ITER_cached|BTREE_ITER_intent); 195 195 196 196 trans->journal_res.seq = wb->journal_seq; 197 197 198 198 ret = bch2_btree_iter_traverse(&iter) ?: 199 199 bch2_trans_update(trans, &iter, &wb->k, 200 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 200 + BTREE_UPDATE_internal_snapshot_node); 201 201 bch2_trans_iter_exit(trans, &iter); 202 202 return ret; 203 203 } ··· 332 332 if (!iter.path || iter.btree_id != k->btree) { 333 333 bch2_trans_iter_exit(trans, &iter); 334 334 bch2_trans_iter_init(trans, &iter, k->btree, k->k.k.p, 335 - BTREE_ITER_INTENT|BTREE_ITER_ALL_SNAPSHOTS); 335 + BTREE_ITER_intent|BTREE_ITER_all_snapshots); 336 336 } 337 337 338 338 bch2_btree_iter_set_pos(&iter, k->k.k.p);
+475 -250
fs/bcachefs/buckets.c
··· 274 274 275 275 void bch2_dev_usage_to_text(struct printbuf *out, struct bch_dev_usage *usage) 276 276 { 277 - prt_tab(out); 278 - prt_str(out, "buckets"); 279 - prt_tab_rjust(out); 280 - prt_str(out, "sectors"); 281 - prt_tab_rjust(out); 282 - prt_str(out, "fragmented"); 283 - prt_tab_rjust(out); 284 - prt_newline(out); 277 + prt_printf(out, "\tbuckets\rsectors\rfragmented\r\n"); 285 278 286 279 for (unsigned i = 0; i < BCH_DATA_NR; i++) { 287 280 bch2_prt_data_type(out, i); 288 - prt_tab(out); 289 - prt_u64(out, usage->d[i].buckets); 290 - prt_tab_rjust(out); 291 - prt_u64(out, usage->d[i].sectors); 292 - prt_tab_rjust(out); 293 - prt_u64(out, usage->d[i].fragmented); 294 - prt_tab_rjust(out); 295 - prt_newline(out); 281 + prt_printf(out, "\t%llu\r%llu\r%llu\r\n", 282 + usage->d[i].buckets, 283 + usage->d[i].sectors, 284 + usage->d[i].fragmented); 296 285 } 297 286 } 298 287 ··· 316 327 u->d[new->data_type].fragmented += bch2_bucket_sectors_fragmented(ca, *new); 317 328 318 329 preempt_enable(); 319 - } 320 - 321 - static inline struct bch_alloc_v4 bucket_m_to_alloc(struct bucket b) 322 - { 323 - return (struct bch_alloc_v4) { 324 - .gen = b.gen, 325 - .data_type = b.data_type, 326 - .dirty_sectors = b.dirty_sectors, 327 - .cached_sectors = b.cached_sectors, 328 - .stripe = b.stripe, 329 - }; 330 - } 331 - 332 - void bch2_dev_usage_update_m(struct bch_fs *c, struct bch_dev *ca, 333 - struct bucket *old, struct bucket *new) 334 - { 335 - struct bch_alloc_v4 old_a = bucket_m_to_alloc(*old); 336 - struct bch_alloc_v4 new_a = bucket_m_to_alloc(*new); 337 - 338 - bch2_dev_usage_update(c, ca, &old_a, &new_a, 0, true); 339 330 } 340 331 341 332 static inline int __update_replicas(struct bch_fs *c, ··· 465 496 return bch2_update_replicas_list(trans, &r.e, sectors); 466 497 } 467 498 468 - int bch2_mark_metadata_bucket(struct bch_fs *c, struct bch_dev *ca, 469 - size_t b, enum bch_data_type data_type, 470 - unsigned sectors, struct gc_pos pos, 471 - unsigned flags) 472 - { 473 - struct bucket old, new, *g; 474 - int ret = 0; 475 - 476 - BUG_ON(!(flags & BTREE_TRIGGER_GC)); 477 - BUG_ON(data_type != BCH_DATA_sb && 478 - data_type != BCH_DATA_journal); 479 - 480 - /* 481 - * Backup superblock might be past the end of our normal usable space: 482 - */ 483 - if (b >= ca->mi.nbuckets) 484 - return 0; 485 - 486 - percpu_down_read(&c->mark_lock); 487 - g = gc_bucket(ca, b); 488 - 489 - bucket_lock(g); 490 - old = *g; 491 - 492 - if (bch2_fs_inconsistent_on(g->data_type && 493 - g->data_type != data_type, c, 494 - "different types of data in same bucket: %s, %s", 495 - bch2_data_type_str(g->data_type), 496 - bch2_data_type_str(data_type))) { 497 - ret = -EIO; 498 - goto err; 499 - } 500 - 501 - if (bch2_fs_inconsistent_on((u64) g->dirty_sectors + sectors > ca->mi.bucket_size, c, 502 - "bucket %u:%zu gen %u data type %s sector count overflow: %u + %u > bucket size", 503 - ca->dev_idx, b, g->gen, 504 - bch2_data_type_str(g->data_type ?: data_type), 505 - g->dirty_sectors, sectors)) { 506 - ret = -EIO; 507 - goto err; 508 - } 509 - 510 - g->data_type = data_type; 511 - g->dirty_sectors += sectors; 512 - new = *g; 513 - err: 514 - bucket_unlock(g); 515 - if (!ret) 516 - bch2_dev_usage_update_m(c, ca, &old, &new); 517 - percpu_up_read(&c->mark_lock); 518 - return ret; 519 - } 520 - 521 - int bch2_check_bucket_ref(struct btree_trans *trans, 522 - struct bkey_s_c k, 523 - const struct bch_extent_ptr *ptr, 524 - s64 sectors, enum bch_data_type ptr_data_type, 525 - u8 b_gen, u8 bucket_data_type, 526 - u32 
bucket_sectors) 499 + int bch2_check_fix_ptrs(struct btree_trans *trans, 500 + enum btree_id btree, unsigned level, struct bkey_s_c k, 501 + enum btree_iter_update_trigger_flags flags) 527 502 { 528 503 struct bch_fs *c = trans->c; 529 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 530 - size_t bucket_nr = PTR_BUCKET_NR(ca, ptr); 504 + struct bkey_ptrs_c ptrs_c = bch2_bkey_ptrs_c(k); 505 + const union bch_extent_entry *entry_c; 506 + struct extent_ptr_decoded p = { 0 }; 507 + bool do_update = false; 531 508 struct printbuf buf = PRINTBUF; 532 509 int ret = 0; 533 510 534 - if (bucket_data_type == BCH_DATA_cached) 535 - bucket_data_type = BCH_DATA_user; 511 + percpu_down_read(&c->mark_lock); 536 512 537 - if ((bucket_data_type == BCH_DATA_stripe && ptr_data_type == BCH_DATA_user) || 538 - (bucket_data_type == BCH_DATA_user && ptr_data_type == BCH_DATA_stripe)) 539 - bucket_data_type = ptr_data_type = BCH_DATA_stripe; 513 + rcu_read_lock(); 514 + bkey_for_each_ptr_decode(k.k, ptrs_c, p, entry_c) { 515 + struct bch_dev *ca = bch2_dev_rcu(c, p.ptr.dev); 516 + if (!ca) { 517 + if (fsck_err(c, ptr_to_invalid_device, 518 + "pointer to missing device %u\n" 519 + "while marking %s", 520 + p.ptr.dev, 521 + (printbuf_reset(&buf), 522 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 523 + do_update = true; 524 + continue; 525 + } 526 + 527 + struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); 528 + enum bch_data_type data_type = bch2_bkey_ptr_data_type(k, p, entry_c); 529 + 530 + if (fsck_err_on(!g->gen_valid, 531 + c, ptr_to_missing_alloc_key, 532 + "bucket %u:%zu data type %s ptr gen %u missing in alloc btree\n" 533 + "while marking %s", 534 + p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), 535 + bch2_data_type_str(ptr_data_type(k.k, &p.ptr)), 536 + p.ptr.gen, 537 + (printbuf_reset(&buf), 538 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 539 + if (!p.ptr.cached) { 540 + g->gen_valid = true; 541 + g->gen = p.ptr.gen; 542 + } else { 543 + do_update = true; 544 + } 545 + } 546 + 547 + if (fsck_err_on(gen_cmp(p.ptr.gen, g->gen) > 0, 548 + c, ptr_gen_newer_than_bucket_gen, 549 + "bucket %u:%zu data type %s ptr gen in the future: %u > %u\n" 550 + "while marking %s", 551 + p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), 552 + bch2_data_type_str(ptr_data_type(k.k, &p.ptr)), 553 + p.ptr.gen, g->gen, 554 + (printbuf_reset(&buf), 555 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 556 + if (!p.ptr.cached && 557 + (g->data_type != BCH_DATA_btree || 558 + data_type == BCH_DATA_btree)) { 559 + g->gen_valid = true; 560 + g->gen = p.ptr.gen; 561 + g->data_type = 0; 562 + g->dirty_sectors = 0; 563 + g->cached_sectors = 0; 564 + } else { 565 + do_update = true; 566 + } 567 + } 568 + 569 + if (fsck_err_on(gen_cmp(g->gen, p.ptr.gen) > BUCKET_GC_GEN_MAX, 570 + c, ptr_gen_newer_than_bucket_gen, 571 + "bucket %u:%zu gen %u data type %s: ptr gen %u too stale\n" 572 + "while marking %s", 573 + p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), g->gen, 574 + bch2_data_type_str(ptr_data_type(k.k, &p.ptr)), 575 + p.ptr.gen, 576 + (printbuf_reset(&buf), 577 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 578 + do_update = true; 579 + 580 + if (fsck_err_on(!p.ptr.cached && gen_cmp(p.ptr.gen, g->gen) < 0, 581 + c, stale_dirty_ptr, 582 + "bucket %u:%zu data type %s stale dirty ptr: %u < %u\n" 583 + "while marking %s", 584 + p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), 585 + bch2_data_type_str(ptr_data_type(k.k, &p.ptr)), 586 + p.ptr.gen, g->gen, 587 + (printbuf_reset(&buf), 588 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 589 + do_update = true; 590 + 591 + 
if (data_type != BCH_DATA_btree && p.ptr.gen != g->gen) 592 + continue; 593 + 594 + if (fsck_err_on(bucket_data_type_mismatch(g->data_type, data_type), 595 + c, ptr_bucket_data_type_mismatch, 596 + "bucket %u:%zu gen %u different types of data in same bucket: %s, %s\n" 597 + "while marking %s", 598 + p.ptr.dev, PTR_BUCKET_NR(ca, &p.ptr), g->gen, 599 + bch2_data_type_str(g->data_type), 600 + bch2_data_type_str(data_type), 601 + (printbuf_reset(&buf), 602 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 603 + if (data_type == BCH_DATA_btree) { 604 + g->gen_valid = true; 605 + g->gen = p.ptr.gen; 606 + g->data_type = data_type; 607 + g->dirty_sectors = 0; 608 + g->cached_sectors = 0; 609 + } else { 610 + do_update = true; 611 + } 612 + } 613 + 614 + if (p.has_ec) { 615 + struct gc_stripe *m = genradix_ptr(&c->gc_stripes, p.ec.idx); 616 + 617 + if (fsck_err_on(!m || !m->alive, c, 618 + ptr_to_missing_stripe, 619 + "pointer to nonexistent stripe %llu\n" 620 + "while marking %s", 621 + (u64) p.ec.idx, 622 + (printbuf_reset(&buf), 623 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 624 + do_update = true; 625 + 626 + if (fsck_err_on(m && m->alive && !bch2_ptr_matches_stripe_m(m, p), c, 627 + ptr_to_incorrect_stripe, 628 + "pointer does not match stripe %llu\n" 629 + "while marking %s", 630 + (u64) p.ec.idx, 631 + (printbuf_reset(&buf), 632 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 633 + do_update = true; 634 + } 635 + } 636 + rcu_read_unlock(); 637 + 638 + if (do_update) { 639 + if (flags & BTREE_TRIGGER_is_root) { 640 + bch_err(c, "cannot update btree roots yet"); 641 + ret = -EINVAL; 642 + goto err; 643 + } 644 + 645 + struct bkey_i *new = bch2_bkey_make_mut_noupdate(trans, k); 646 + ret = PTR_ERR_OR_ZERO(new); 647 + if (ret) 648 + goto err; 649 + 650 + rcu_read_lock(); 651 + bch2_bkey_drop_ptrs(bkey_i_to_s(new), ptr, !bch2_dev_rcu(c, ptr->dev)); 652 + rcu_read_unlock(); 653 + 654 + if (level) { 655 + /* 656 + * We don't want to drop btree node pointers - if the 657 + * btree node isn't there anymore, the read path will 658 + * sort it out: 659 + */ 660 + struct bkey_ptrs ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 661 + rcu_read_lock(); 662 + bkey_for_each_ptr(ptrs, ptr) { 663 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 664 + struct bucket *g = PTR_GC_BUCKET(ca, ptr); 665 + 666 + ptr->gen = g->gen; 667 + } 668 + rcu_read_unlock(); 669 + } else { 670 + struct bkey_ptrs ptrs; 671 + union bch_extent_entry *entry; 672 + restart_drop_ptrs: 673 + ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 674 + rcu_read_lock(); 675 + bkey_for_each_ptr_decode(bkey_i_to_s(new).k, ptrs, p, entry) { 676 + struct bch_dev *ca = bch2_dev_rcu(c, p.ptr.dev); 677 + struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); 678 + enum bch_data_type data_type = bch2_bkey_ptr_data_type(bkey_i_to_s_c(new), p, entry); 679 + 680 + if ((p.ptr.cached && 681 + (!g->gen_valid || gen_cmp(p.ptr.gen, g->gen) > 0)) || 682 + (!p.ptr.cached && 683 + gen_cmp(p.ptr.gen, g->gen) < 0) || 684 + gen_cmp(g->gen, p.ptr.gen) > BUCKET_GC_GEN_MAX || 685 + (g->data_type && 686 + g->data_type != data_type)) { 687 + bch2_bkey_drop_ptr(bkey_i_to_s(new), &entry->ptr); 688 + goto restart_drop_ptrs; 689 + } 690 + } 691 + rcu_read_unlock(); 692 + again: 693 + ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 694 + bkey_extent_entry_for_each(ptrs, entry) { 695 + if (extent_entry_type(entry) == BCH_EXTENT_ENTRY_stripe_ptr) { 696 + struct gc_stripe *m = genradix_ptr(&c->gc_stripes, 697 + entry->stripe_ptr.idx); 698 + union bch_extent_entry *next_ptr; 699 + 700 + 
bkey_extent_entry_for_each_from(ptrs, next_ptr, entry) 701 + if (extent_entry_type(next_ptr) == BCH_EXTENT_ENTRY_ptr) 702 + goto found; 703 + next_ptr = NULL; 704 + found: 705 + if (!next_ptr) { 706 + bch_err(c, "aieee, found stripe ptr with no data ptr"); 707 + continue; 708 + } 709 + 710 + if (!m || !m->alive || 711 + !__bch2_ptr_matches_stripe(&m->ptrs[entry->stripe_ptr.block], 712 + &next_ptr->ptr, 713 + m->sectors)) { 714 + bch2_bkey_extent_entry_drop(new, entry); 715 + goto again; 716 + } 717 + } 718 + } 719 + } 720 + 721 + if (0) { 722 + printbuf_reset(&buf); 723 + bch2_bkey_val_to_text(&buf, c, k); 724 + bch_info(c, "updated %s", buf.buf); 725 + 726 + printbuf_reset(&buf); 727 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(new)); 728 + bch_info(c, "new key %s", buf.buf); 729 + } 730 + 731 + percpu_up_read(&c->mark_lock); 732 + struct btree_iter iter; 733 + bch2_trans_node_iter_init(trans, &iter, btree, new->k.p, 0, level, 734 + BTREE_ITER_intent|BTREE_ITER_all_snapshots); 735 + ret = bch2_btree_iter_traverse(&iter) ?: 736 + bch2_trans_update(trans, &iter, new, 737 + BTREE_UPDATE_internal_snapshot_node| 738 + BTREE_TRIGGER_norun); 739 + bch2_trans_iter_exit(trans, &iter); 740 + percpu_down_read(&c->mark_lock); 741 + 742 + if (ret) 743 + goto err; 744 + 745 + if (level) 746 + bch2_btree_node_update_key_early(trans, btree, level - 1, k, new); 747 + } 748 + err: 749 + fsck_err: 750 + percpu_up_read(&c->mark_lock); 751 + printbuf_exit(&buf); 752 + return ret; 753 + } 754 + 755 + int bch2_bucket_ref_update(struct btree_trans *trans, struct bch_dev *ca, 756 + struct bkey_s_c k, 757 + const struct bch_extent_ptr *ptr, 758 + s64 sectors, enum bch_data_type ptr_data_type, 759 + u8 b_gen, u8 bucket_data_type, 760 + u32 *bucket_sectors) 761 + { 762 + struct bch_fs *c = trans->c; 763 + size_t bucket_nr = PTR_BUCKET_NR(ca, ptr); 764 + struct printbuf buf = PRINTBUF; 765 + bool inserting = sectors > 0; 766 + int ret = 0; 767 + 768 + BUG_ON(!sectors); 540 769 541 770 if (gen_after(ptr->gen, b_gen)) { 542 771 bch2_fsck_err(c, FSCK_CAN_IGNORE|FSCK_NEED_FSCK, ··· 745 578 bch2_data_type_str(bucket_data_type ?: ptr_data_type), 746 579 ptr->gen, 747 580 (bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 748 - ret = -EIO; 749 - goto err; 581 + if (inserting) 582 + goto err; 583 + goto out; 750 584 } 751 585 752 586 if (gen_cmp(b_gen, ptr->gen) > BUCKET_GC_GEN_MAX) { ··· 760 592 ptr->gen, 761 593 (printbuf_reset(&buf), 762 594 bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 763 - ret = -EIO; 764 - goto err; 595 + if (inserting) 596 + goto err; 597 + goto out; 765 598 } 766 599 767 - if (b_gen != ptr->gen && !ptr->cached) { 600 + if (b_gen != ptr->gen && ptr->cached) { 601 + ret = 1; 602 + goto out; 603 + } 604 + 605 + if (b_gen != ptr->gen) { 768 606 bch2_fsck_err(c, FSCK_CAN_IGNORE|FSCK_NEED_FSCK, 769 607 BCH_FSCK_ERR_stale_dirty_ptr, 770 608 "bucket %u:%zu gen %u (mem gen %u) data type %s: stale dirty ptr (gen %u)\n" ··· 781 607 ptr->gen, 782 608 (printbuf_reset(&buf), 783 609 bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 784 - ret = -EIO; 785 - goto err; 786 - } 787 - 788 - if (b_gen != ptr->gen) { 789 - ret = 1; 610 + if (inserting) 611 + goto err; 790 612 goto out; 791 613 } 792 614 793 - if (!data_type_is_empty(bucket_data_type) && 794 - ptr_data_type && 795 - bucket_data_type != ptr_data_type) { 615 + if (bucket_data_type_mismatch(bucket_data_type, ptr_data_type)) { 796 616 bch2_fsck_err(c, FSCK_CAN_IGNORE|FSCK_NEED_FSCK, 797 617 BCH_FSCK_ERR_ptr_bucket_data_type_mismatch, 798 618 "bucket %u:%zu gen %u 
different types of data in same bucket: %s, %s\n" ··· 796 628 bch2_data_type_str(ptr_data_type), 797 629 (printbuf_reset(&buf), 798 630 bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 799 - ret = -EIO; 800 - goto err; 631 + if (inserting) 632 + goto err; 633 + goto out; 801 634 } 802 635 803 - if ((u64) bucket_sectors + sectors > U32_MAX) { 636 + if ((u64) *bucket_sectors + sectors > U32_MAX) { 804 637 bch2_fsck_err(c, FSCK_CAN_IGNORE|FSCK_NEED_FSCK, 805 638 BCH_FSCK_ERR_bucket_sector_count_overflow, 806 639 "bucket %u:%zu gen %u data type %s sector count overflow: %u + %lli > U32_MAX\n" 807 640 "while marking %s", 808 641 ptr->dev, bucket_nr, b_gen, 809 642 bch2_data_type_str(bucket_data_type ?: ptr_data_type), 810 - bucket_sectors, sectors, 643 + *bucket_sectors, sectors, 811 644 (printbuf_reset(&buf), 812 645 bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 813 - ret = -EIO; 814 - goto err; 646 + if (inserting) 647 + goto err; 648 + sectors = -*bucket_sectors; 815 649 } 650 + 651 + *bucket_sectors += sectors; 816 652 out: 817 653 printbuf_exit(&buf); 818 654 return ret; 819 655 err: 820 656 bch2_dump_trans_updates(trans); 657 + ret = -EIO; 821 658 goto out; 822 659 } 823 660 ··· 959 786 960 787 /* KEY_TYPE_extent: */ 961 788 962 - static int __mark_pointer(struct btree_trans *trans, 789 + static int __mark_pointer(struct btree_trans *trans, struct bch_dev *ca, 963 790 struct bkey_s_c k, 964 791 const struct bch_extent_ptr *ptr, 965 792 s64 sectors, enum bch_data_type ptr_data_type, 966 - u8 bucket_gen, u8 *bucket_data_type, 967 - u32 *dirty_sectors, u32 *cached_sectors) 793 + struct bch_alloc_v4 *a) 968 794 { 969 795 u32 *dst_sectors = !ptr->cached 970 - ? dirty_sectors 971 - : cached_sectors; 972 - int ret = bch2_check_bucket_ref(trans, k, ptr, sectors, ptr_data_type, 973 - bucket_gen, *bucket_data_type, *dst_sectors); 796 + ? &a->dirty_sectors 797 + : &a->cached_sectors; 798 + int ret = bch2_bucket_ref_update(trans, ca, k, ptr, sectors, ptr_data_type, 799 + a->gen, a->data_type, dst_sectors); 974 800 975 801 if (ret) 976 802 return ret; 977 803 978 - *dst_sectors += sectors; 979 - 980 - if (!*dirty_sectors && !*cached_sectors) 981 - *bucket_data_type = 0; 982 - else if (*bucket_data_type != BCH_DATA_stripe) 983 - *bucket_data_type = ptr_data_type; 984 - 804 + alloc_data_type_set(a, ptr_data_type); 985 805 return 0; 986 806 } 987 807 ··· 982 816 enum btree_id btree_id, unsigned level, 983 817 struct bkey_s_c k, struct extent_ptr_decoded p, 984 818 const union bch_extent_entry *entry, 985 - s64 *sectors, unsigned flags) 819 + s64 *sectors, 820 + enum btree_iter_update_trigger_flags flags) 986 821 { 987 - bool insert = !(flags & BTREE_TRIGGER_OVERWRITE); 822 + bool insert = !(flags & BTREE_TRIGGER_overwrite); 823 + int ret = 0; 824 + 825 + struct bch_fs *c = trans->c; 826 + struct bch_dev *ca = bch2_dev_tryget(c, p.ptr.dev); 827 + if (unlikely(!ca)) { 828 + if (insert) 829 + ret = -EIO; 830 + goto err; 831 + } 832 + 988 833 struct bpos bucket; 989 834 struct bch_backpointer bp; 990 - 991 - bch2_extent_ptr_to_bp(trans->c, btree_id, level, k, p, entry, &bucket, &bp); 835 + bch2_extent_ptr_to_bp(trans->c, ca, btree_id, level, k, p, entry, &bucket, &bp); 992 836 *sectors = insert ? 
bp.bucket_len : -((s64) bp.bucket_len); 993 837 994 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 995 - struct btree_iter iter; 996 - struct bkey_i_alloc_v4 *a = bch2_trans_start_alloc_update(trans, &iter, bucket); 997 - int ret = PTR_ERR_OR_ZERO(a); 838 + if (flags & BTREE_TRIGGER_transactional) { 839 + struct bkey_i_alloc_v4 *a = bch2_trans_start_alloc_update(trans, bucket); 840 + ret = PTR_ERR_OR_ZERO(a) ?: 841 + __mark_pointer(trans, ca, k, &p.ptr, *sectors, bp.data_type, &a->v); 998 842 if (ret) 999 - return ret; 1000 - 1001 - ret = __mark_pointer(trans, k, &p.ptr, *sectors, bp.data_type, 1002 - a->v.gen, &a->v.data_type, 1003 - &a->v.dirty_sectors, &a->v.cached_sectors) ?: 1004 - bch2_trans_update(trans, &iter, &a->k_i, 0); 1005 - bch2_trans_iter_exit(trans, &iter); 1006 - 1007 - if (ret) 1008 - return ret; 843 + goto err; 1009 844 1010 845 if (!p.ptr.cached) { 1011 - ret = bch2_bucket_backpointer_mod(trans, bucket, bp, k, insert); 846 + ret = bch2_bucket_backpointer_mod(trans, ca, bucket, bp, k, insert); 1012 847 if (ret) 1013 - return ret; 848 + goto err; 1014 849 } 1015 850 } 1016 851 1017 - if (flags & BTREE_TRIGGER_GC) { 1018 - struct bch_fs *c = trans->c; 1019 - struct bch_dev *ca = bch_dev_bkey_exists(c, p.ptr.dev); 1020 - enum bch_data_type data_type = bch2_bkey_ptr_data_type(k, p, entry); 1021 - 852 + if (flags & BTREE_TRIGGER_gc) { 1022 853 percpu_down_read(&c->mark_lock); 1023 - struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); 854 + struct bucket *g = gc_bucket(ca, bucket.offset); 1024 855 bucket_lock(g); 1025 - struct bucket old = *g; 1026 - 1027 - u8 bucket_data_type = g->data_type; 1028 - int ret = __mark_pointer(trans, k, &p.ptr, *sectors, 1029 - data_type, g->gen, 1030 - &bucket_data_type, 1031 - &g->dirty_sectors, 1032 - &g->cached_sectors); 1033 - if (ret) { 1034 - bucket_unlock(g); 1035 - percpu_up_read(&c->mark_lock); 1036 - return ret; 856 + struct bch_alloc_v4 old = bucket_m_to_alloc(*g), new = old; 857 + ret = __mark_pointer(trans, ca, k, &p.ptr, *sectors, bp.data_type, &new); 858 + if (!ret) { 859 + alloc_to_bucket(g, new); 860 + bch2_dev_usage_update(c, ca, &old, &new, 0, true); 1037 861 } 1038 - 1039 - g->data_type = bucket_data_type; 1040 - struct bucket new = *g; 1041 862 bucket_unlock(g); 1042 - bch2_dev_usage_update_m(c, ca, &old, &new); 1043 863 percpu_up_read(&c->mark_lock); 1044 864 } 1045 - 1046 - return 0; 865 + err: 866 + bch2_dev_put(ca); 867 + return ret; 1047 868 } 1048 869 1049 870 static int bch2_trigger_stripe_ptr(struct btree_trans *trans, 1050 871 struct bkey_s_c k, 1051 872 struct extent_ptr_decoded p, 1052 873 enum bch_data_type data_type, 1053 - s64 sectors, unsigned flags) 874 + s64 sectors, 875 + enum btree_iter_update_trigger_flags flags) 1054 876 { 1055 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 877 + if (flags & BTREE_TRIGGER_transactional) { 1056 878 struct btree_iter iter; 1057 879 struct bkey_i_stripe *s = bch2_bkey_get_mut_typed(trans, &iter, 1058 880 BTREE_ID_stripes, POS(0, p.ec.idx), 1059 - BTREE_ITER_WITH_UPDATES, stripe); 881 + BTREE_ITER_with_updates, stripe); 1060 882 int ret = PTR_ERR_OR_ZERO(s); 1061 883 if (unlikely(ret)) { 1062 884 bch2_trans_inconsistent_on(bch2_err_matches(ret, ENOENT), trans, ··· 1074 920 return ret; 1075 921 } 1076 922 1077 - if (flags & BTREE_TRIGGER_GC) { 923 + if (flags & BTREE_TRIGGER_gc) { 1078 924 struct bch_fs *c = trans->c; 1079 925 1080 - BUG_ON(!(flags & BTREE_TRIGGER_GC)); 926 + BUG_ON(!(flags & BTREE_TRIGGER_gc)); 1081 927 1082 928 struct gc_stripe *m = genradix_ptr_alloc(&c->gc_stripes, 
p.ec.idx, GFP_KERNEL); 1083 929 if (!m) { ··· 1113 959 1114 960 static int __trigger_extent(struct btree_trans *trans, 1115 961 enum btree_id btree_id, unsigned level, 1116 - struct bkey_s_c k, unsigned flags) 962 + struct bkey_s_c k, 963 + enum btree_iter_update_trigger_flags flags) 1117 964 { 1118 - bool gc = flags & BTREE_TRIGGER_GC; 965 + bool gc = flags & BTREE_TRIGGER_gc; 1119 966 struct bch_fs *c = trans->c; 1120 967 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1121 968 const union bch_extent_entry *entry; ··· 1125 970 enum bch_data_type data_type = bkey_is_btree_ptr(k.k) 1126 971 ? BCH_DATA_btree 1127 972 : BCH_DATA_user; 1128 - s64 dirty_sectors = 0; 973 + s64 replicas_sectors = 0; 1129 974 int ret = 0; 1130 975 1131 976 r.e.data_type = data_type; ··· 1151 996 return ret; 1152 997 } 1153 998 } else if (!p.has_ec) { 1154 - dirty_sectors += disk_sectors; 999 + replicas_sectors += disk_sectors; 1155 1000 r.e.devs[r.e.nr_devs++] = p.ptr.dev; 1156 1001 } else { 1157 1002 ret = bch2_trigger_stripe_ptr(trans, k, p, data_type, disk_sectors, flags); ··· 1169 1014 1170 1015 if (r.e.nr_devs) { 1171 1016 ret = !gc 1172 - ? bch2_update_replicas_list(trans, &r.e, dirty_sectors) 1173 - : bch2_update_replicas(c, k, &r.e, dirty_sectors, 0, true); 1017 + ? bch2_update_replicas_list(trans, &r.e, replicas_sectors) 1018 + : bch2_update_replicas(c, k, &r.e, replicas_sectors, 0, true); 1174 1019 if (unlikely(ret && gc)) { 1175 1020 struct printbuf buf = PRINTBUF; 1176 1021 ··· 1186 1031 } 1187 1032 1188 1033 int bch2_trigger_extent(struct btree_trans *trans, 1189 - enum btree_id btree_id, unsigned level, 1034 + enum btree_id btree, unsigned level, 1190 1035 struct bkey_s_c old, struct bkey_s new, 1191 - unsigned flags) 1036 + enum btree_iter_update_trigger_flags flags) 1192 1037 { 1193 1038 struct bkey_ptrs_c new_ptrs = bch2_bkey_ptrs_c(new.s_c); 1194 1039 struct bkey_ptrs_c old_ptrs = bch2_bkey_ptrs_c(old); 1195 1040 unsigned new_ptrs_bytes = (void *) new_ptrs.end - (void *) new_ptrs.start; 1196 1041 unsigned old_ptrs_bytes = (void *) old_ptrs.end - (void *) old_ptrs.start; 1042 + 1043 + if (unlikely(flags & BTREE_TRIGGER_check_repair)) 1044 + return bch2_check_fix_ptrs(trans, btree, level, new.s_c, flags); 1197 1045 1198 1046 /* if pointers aren't changing - nothing to do: */ 1199 1047 if (new_ptrs_bytes == old_ptrs_bytes && ··· 1205 1047 new_ptrs_bytes)) 1206 1048 return 0; 1207 1049 1208 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 1050 + if (flags & BTREE_TRIGGER_transactional) { 1209 1051 struct bch_fs *c = trans->c; 1210 1052 int mod = (int) bch2_bkey_needs_rebalance(c, new.s_c) - 1211 1053 (int) bch2_bkey_needs_rebalance(c, old); ··· 1218 1060 } 1219 1061 } 1220 1062 1221 - if (flags & (BTREE_TRIGGER_TRANSACTIONAL|BTREE_TRIGGER_GC)) 1222 - return trigger_run_overwrite_then_insert(__trigger_extent, trans, btree_id, level, old, new, flags); 1063 + if (flags & (BTREE_TRIGGER_transactional|BTREE_TRIGGER_gc)) 1064 + return trigger_run_overwrite_then_insert(__trigger_extent, trans, btree, level, old, new, flags); 1223 1065 1224 1066 return 0; 1225 1067 } ··· 1227 1069 /* KEY_TYPE_reservation */ 1228 1070 1229 1071 static int __trigger_reservation(struct btree_trans *trans, 1230 - enum btree_id btree_id, unsigned level, 1231 - struct bkey_s_c k, unsigned flags) 1072 + enum btree_id btree_id, unsigned level, struct bkey_s_c k, 1073 + enum btree_iter_update_trigger_flags flags) 1232 1074 { 1233 1075 struct bch_fs *c = trans->c; 1234 1076 unsigned replicas = bkey_s_c_to_reservation(k).v->nr_replicas; 
1235 1077 s64 sectors = (s64) k.k->size * replicas; 1236 1078 1237 - if (flags & BTREE_TRIGGER_OVERWRITE) 1079 + if (flags & BTREE_TRIGGER_overwrite) 1238 1080 sectors = -sectors; 1239 1081 1240 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 1082 + if (flags & BTREE_TRIGGER_transactional) { 1241 1083 int ret = bch2_replicas_deltas_realloc(trans, 0); 1242 1084 if (ret) 1243 1085 return ret; ··· 1248 1090 d->persistent_reserved[replicas - 1] += sectors; 1249 1091 } 1250 1092 1251 - if (flags & BTREE_TRIGGER_GC) { 1093 + if (flags & BTREE_TRIGGER_gc) { 1252 1094 percpu_down_read(&c->mark_lock); 1253 1095 preempt_disable(); 1254 1096 ··· 1268 1110 int bch2_trigger_reservation(struct btree_trans *trans, 1269 1111 enum btree_id btree_id, unsigned level, 1270 1112 struct bkey_s_c old, struct bkey_s new, 1271 - unsigned flags) 1113 + enum btree_iter_update_trigger_flags flags) 1272 1114 { 1273 1115 return trigger_run_overwrite_then_insert(__trigger_reservation, trans, btree_id, level, old, new, flags); 1274 1116 } ··· 1276 1118 /* Mark superblocks: */ 1277 1119 1278 1120 static int __bch2_trans_mark_metadata_bucket(struct btree_trans *trans, 1279 - struct bch_dev *ca, size_t b, 1121 + struct bch_dev *ca, u64 b, 1280 1122 enum bch_data_type type, 1281 1123 unsigned sectors) 1282 1124 { 1283 1125 struct bch_fs *c = trans->c; 1284 1126 struct btree_iter iter; 1285 - struct bkey_i_alloc_v4 *a; 1286 1127 int ret = 0; 1287 1128 1288 - /* 1289 - * Backup superblock might be past the end of our normal usable space: 1290 - */ 1291 - if (b >= ca->mi.nbuckets) 1292 - return 0; 1293 - 1294 - a = bch2_trans_start_alloc_update(trans, &iter, POS(ca->dev_idx, b)); 1129 + struct bkey_i_alloc_v4 *a = 1130 + bch2_trans_start_alloc_update_noupdate(trans, &iter, POS(ca->dev_idx, b)); 1295 1131 if (IS_ERR(a)) 1296 1132 return PTR_ERR(a); 1297 1133 ··· 1313 1161 return ret; 1314 1162 } 1315 1163 1316 - int bch2_trans_mark_metadata_bucket(struct btree_trans *trans, 1317 - struct bch_dev *ca, size_t b, 1318 - enum bch_data_type type, 1319 - unsigned sectors) 1164 + static int bch2_mark_metadata_bucket(struct bch_fs *c, struct bch_dev *ca, 1165 + u64 b, enum bch_data_type data_type, unsigned sectors, 1166 + enum btree_iter_update_trigger_flags flags) 1320 1167 { 1321 - return commit_do(trans, NULL, NULL, 0, 1322 - __bch2_trans_mark_metadata_bucket(trans, ca, b, type, sectors)); 1168 + int ret = 0; 1169 + 1170 + percpu_down_read(&c->mark_lock); 1171 + struct bucket *g = gc_bucket(ca, b); 1172 + 1173 + bucket_lock(g); 1174 + struct bch_alloc_v4 old = bucket_m_to_alloc(*g); 1175 + 1176 + if (bch2_fs_inconsistent_on(g->data_type && 1177 + g->data_type != data_type, c, 1178 + "different types of data in same bucket: %s, %s", 1179 + bch2_data_type_str(g->data_type), 1180 + bch2_data_type_str(data_type))) { 1181 + ret = -EIO; 1182 + goto err; 1183 + } 1184 + 1185 + if (bch2_fs_inconsistent_on((u64) g->dirty_sectors + sectors > ca->mi.bucket_size, c, 1186 + "bucket %u:%llu gen %u data type %s sector count overflow: %u + %u > bucket size", 1187 + ca->dev_idx, b, g->gen, 1188 + bch2_data_type_str(g->data_type ?: data_type), 1189 + g->dirty_sectors, sectors)) { 1190 + ret = -EIO; 1191 + goto err; 1192 + } 1193 + 1194 + g->data_type = data_type; 1195 + g->dirty_sectors += sectors; 1196 + struct bch_alloc_v4 new = bucket_m_to_alloc(*g); 1197 + err: 1198 + bucket_unlock(g); 1199 + if (!ret) 1200 + bch2_dev_usage_update(c, ca, &old, &new, 0, true); 1201 + percpu_up_read(&c->mark_lock); 1202 + return ret; 1203 + } 1204 + 1205 + int 
bch2_trans_mark_metadata_bucket(struct btree_trans *trans, 1206 + struct bch_dev *ca, u64 b, 1207 + enum bch_data_type type, unsigned sectors, 1208 + enum btree_iter_update_trigger_flags flags) 1209 + { 1210 + BUG_ON(type != BCH_DATA_free && 1211 + type != BCH_DATA_sb && 1212 + type != BCH_DATA_journal); 1213 + 1214 + /* 1215 + * Backup superblock might be past the end of our normal usable space: 1216 + */ 1217 + if (b >= ca->mi.nbuckets) 1218 + return 0; 1219 + 1220 + if (flags & BTREE_TRIGGER_gc) 1221 + return bch2_mark_metadata_bucket(trans->c, ca, b, type, sectors, flags); 1222 + else if (flags & BTREE_TRIGGER_transactional) 1223 + return commit_do(trans, NULL, NULL, 0, 1224 + __bch2_trans_mark_metadata_bucket(trans, ca, b, type, sectors)); 1225 + else 1226 + BUG(); 1323 1227 } 1324 1228 1325 1229 static int bch2_trans_mark_metadata_sectors(struct btree_trans *trans, 1326 - struct bch_dev *ca, 1327 - u64 start, u64 end, 1328 - enum bch_data_type type, 1329 - u64 *bucket, unsigned *bucket_sectors) 1230 + struct bch_dev *ca, u64 start, u64 end, 1231 + enum bch_data_type type, u64 *bucket, unsigned *bucket_sectors, 1232 + enum btree_iter_update_trigger_flags flags) 1330 1233 { 1331 1234 do { 1332 1235 u64 b = sector_to_bucket(ca, start); ··· 1390 1183 1391 1184 if (b != *bucket && *bucket_sectors) { 1392 1185 int ret = bch2_trans_mark_metadata_bucket(trans, ca, *bucket, 1393 - type, *bucket_sectors); 1186 + type, *bucket_sectors, flags); 1394 1187 if (ret) 1395 1188 return ret; 1396 1189 ··· 1405 1198 return 0; 1406 1199 } 1407 1200 1408 - static int __bch2_trans_mark_dev_sb(struct btree_trans *trans, 1409 - struct bch_dev *ca) 1201 + static int __bch2_trans_mark_dev_sb(struct btree_trans *trans, struct bch_dev *ca, 1202 + enum btree_iter_update_trigger_flags flags) 1410 1203 { 1411 1204 struct bch_sb_layout *layout = &ca->disk_sb.sb->layout; 1412 1205 u64 bucket = 0; ··· 1419 1212 if (offset == BCH_SB_SECTOR) { 1420 1213 ret = bch2_trans_mark_metadata_sectors(trans, ca, 1421 1214 0, BCH_SB_SECTOR, 1422 - BCH_DATA_sb, &bucket, &bucket_sectors); 1215 + BCH_DATA_sb, &bucket, &bucket_sectors, flags); 1423 1216 if (ret) 1424 1217 return ret; 1425 1218 } 1426 1219 1427 1220 ret = bch2_trans_mark_metadata_sectors(trans, ca, offset, 1428 1221 offset + (1 << layout->sb_max_size_bits), 1429 - BCH_DATA_sb, &bucket, &bucket_sectors); 1222 + BCH_DATA_sb, &bucket, &bucket_sectors, flags); 1430 1223 if (ret) 1431 1224 return ret; 1432 1225 } 1433 1226 1434 1227 if (bucket_sectors) { 1435 1228 ret = bch2_trans_mark_metadata_bucket(trans, ca, 1436 - bucket, BCH_DATA_sb, bucket_sectors); 1229 + bucket, BCH_DATA_sb, bucket_sectors, flags); 1437 1230 if (ret) 1438 1231 return ret; 1439 1232 } ··· 1441 1234 for (i = 0; i < ca->journal.nr; i++) { 1442 1235 ret = bch2_trans_mark_metadata_bucket(trans, ca, 1443 1236 ca->journal.buckets[i], 1444 - BCH_DATA_journal, ca->mi.bucket_size); 1237 + BCH_DATA_journal, ca->mi.bucket_size, flags); 1445 1238 if (ret) 1446 1239 return ret; 1447 1240 } ··· 1449 1242 return 0; 1450 1243 } 1451 1244 1452 - int bch2_trans_mark_dev_sb(struct bch_fs *c, struct bch_dev *ca) 1245 + int bch2_trans_mark_dev_sb(struct bch_fs *c, struct bch_dev *ca, 1246 + enum btree_iter_update_trigger_flags flags) 1453 1247 { 1454 - int ret = bch2_trans_run(c, __bch2_trans_mark_dev_sb(trans, ca)); 1455 - 1248 + int ret = bch2_trans_run(c, 1249 + __bch2_trans_mark_dev_sb(trans, ca, flags)); 1456 1250 bch_err_fn(c, ret); 1457 1251 return ret; 1458 1252 } 1459 1253 1460 - int 
bch2_trans_mark_dev_sbs(struct bch_fs *c) 1254 + int bch2_trans_mark_dev_sbs_flags(struct bch_fs *c, 1255 + enum btree_iter_update_trigger_flags flags) 1461 1256 { 1462 1257 for_each_online_member(c, ca) { 1463 - int ret = bch2_trans_mark_dev_sb(c, ca); 1258 + int ret = bch2_trans_mark_dev_sb(c, ca, flags); 1464 1259 if (ret) { 1465 - percpu_ref_put(&ca->ref); 1260 + bch2_dev_put(ca); 1466 1261 return ret; 1467 1262 } 1468 1263 } 1469 1264 1470 1265 return 0; 1266 + } 1267 + 1268 + int bch2_trans_mark_dev_sbs(struct bch_fs *c) 1269 + { 1270 + return bch2_trans_mark_dev_sbs_flags(c, BTREE_TRIGGER_transactional); 1471 1271 } 1472 1272 1473 1273 /* Disk reservations: */ ··· 1545 1331 1546 1332 /* Startup/shutdown: */ 1547 1333 1334 + void bch2_buckets_nouse_free(struct bch_fs *c) 1335 + { 1336 + for_each_member_device(c, ca) { 1337 + kvfree_rcu_mightsleep(ca->buckets_nouse); 1338 + ca->buckets_nouse = NULL; 1339 + } 1340 + } 1341 + 1342 + int bch2_buckets_nouse_alloc(struct bch_fs *c) 1343 + { 1344 + for_each_member_device(c, ca) { 1345 + BUG_ON(ca->buckets_nouse); 1346 + 1347 + ca->buckets_nouse = kvmalloc(BITS_TO_LONGS(ca->mi.nbuckets) * 1348 + sizeof(unsigned long), 1349 + GFP_KERNEL|__GFP_ZERO); 1350 + if (!ca->buckets_nouse) { 1351 + bch2_dev_put(ca); 1352 + return -BCH_ERR_ENOMEM_buckets_nouse; 1353 + } 1354 + } 1355 + 1356 + return 0; 1357 + } 1358 + 1548 1359 static void bucket_gens_free_rcu(struct rcu_head *rcu) 1549 1360 { 1550 1361 struct bucket_gens *buckets = ··· 1581 1342 int bch2_dev_buckets_resize(struct bch_fs *c, struct bch_dev *ca, u64 nbuckets) 1582 1343 { 1583 1344 struct bucket_gens *bucket_gens = NULL, *old_bucket_gens = NULL; 1584 - unsigned long *buckets_nouse = NULL; 1585 1345 bool resize = ca->bucket_gens != NULL; 1586 1346 int ret; 1347 + 1348 + BUG_ON(resize && ca->buckets_nouse); 1587 1349 1588 1350 if (!(bucket_gens = kvmalloc(sizeof(struct bucket_gens) + nbuckets, 1589 1351 GFP_KERNEL|__GFP_ZERO))) { 1590 1352 ret = -BCH_ERR_ENOMEM_bucket_gens; 1591 - goto err; 1592 - } 1593 - 1594 - if ((c->opts.buckets_nouse && 1595 - !(buckets_nouse = kvmalloc(BITS_TO_LONGS(nbuckets) * 1596 - sizeof(unsigned long), 1597 - GFP_KERNEL|__GFP_ZERO)))) { 1598 - ret = -BCH_ERR_ENOMEM_buckets_nouse; 1599 1353 goto err; 1600 1354 } 1601 1355 ··· 1609 1377 memcpy(bucket_gens->b, 1610 1378 old_bucket_gens->b, 1611 1379 n); 1612 - if (buckets_nouse) 1613 - memcpy(buckets_nouse, 1614 - ca->buckets_nouse, 1615 - BITS_TO_LONGS(n) * sizeof(unsigned long)); 1616 1380 } 1617 1381 1618 1382 rcu_assign_pointer(ca->bucket_gens, bucket_gens); 1619 1383 bucket_gens = old_bucket_gens; 1620 - 1621 - swap(ca->buckets_nouse, buckets_nouse); 1622 1384 1623 1385 nbuckets = ca->mi.nbuckets; 1624 1386 ··· 1624 1398 1625 1399 ret = 0; 1626 1400 err: 1627 - kvfree(buckets_nouse); 1628 1401 if (bucket_gens) 1629 1402 call_rcu(&bucket_gens->rcu, bucket_gens_free_rcu); 1630 1403
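The fsck checks and ref updates in buckets.c above all hinge on wrapping 8-bit bucket generation numbers. A minimal userspace sketch of the comparison helpers they rely on, gen_cmp() and gen_after() (uint8_t/int8_t standing in for the kernel's u8/s8):

#include <stdint.h>
#include <stdio.h>

/* Signed distance between two wrapping 8-bit generation numbers. */
static int gen_cmp(uint8_t a, uint8_t b)
{
	return (int8_t) (a - b);
}

/* "a is newer than b", correct across wraparound. */
static int gen_after(uint8_t a, uint8_t b)
{
	return gen_cmp(a, b) > 0;
}

int main(void)
{
	printf("%d\n", gen_after(2, 255));	/* 1: the counter wrapped */
	printf("%d\n", gen_cmp(5, 7) < 0);	/* 1: ptr gen 5 is stale vs bucket gen 7 */
	return 0;
}

Because the distance is a signed 8-bit value, comparisons between generations more than half the range apart are inherently ambiguous — presumably why the checks above additionally cap the gap at BUCKET_GC_GEN_MAX instead of trusting the wrapped comparison alone.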
+37 -33
fs/bcachefs/buckets.h
··· 12 12 #include "extents.h" 13 13 #include "sb-members.h" 14 14 15 - static inline size_t sector_to_bucket(const struct bch_dev *ca, sector_t s) 15 + static inline u64 sector_to_bucket(const struct bch_dev *ca, sector_t s) 16 16 { 17 17 return div_u64(s, ca->mi.bucket_size); 18 18 } ··· 30 30 return remainder; 31 31 } 32 32 33 - static inline size_t sector_to_bucket_and_offset(const struct bch_dev *ca, sector_t s, 34 - u32 *offset) 33 + static inline u64 sector_to_bucket_and_offset(const struct bch_dev *ca, sector_t s, u32 *offset) 35 34 { 36 35 return div_u64_rem(s, ca->mi.bucket_size, offset); 37 36 } ··· 93 94 { 94 95 struct bucket_array *buckets = gc_bucket_array(ca); 95 96 96 - BUG_ON(b < buckets->first_bucket || b >= buckets->nbuckets); 97 + BUG_ON(!bucket_valid(ca, b)); 97 98 return buckets->b + b; 98 99 } 99 100 ··· 110 111 { 111 112 struct bucket_gens *gens = bucket_gens(ca); 112 113 113 - BUG_ON(b < gens->first_bucket || b >= gens->nbuckets); 114 + BUG_ON(!bucket_valid(ca, b)); 114 115 return gens->b + b; 115 116 } 116 117 ··· 120 121 return sector_to_bucket(ca, ptr->offset); 121 122 } 122 123 123 - static inline struct bpos PTR_BUCKET_POS(const struct bch_fs *c, 124 - const struct bch_extent_ptr *ptr) 124 + static inline struct bpos PTR_BUCKET_POS(const struct bch_dev *ca, 125 + const struct bch_extent_ptr *ptr) 125 126 { 126 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 127 - 128 127 return POS(ptr->dev, PTR_BUCKET_NR(ca, ptr)); 129 128 } 130 129 131 - static inline struct bpos PTR_BUCKET_POS_OFFSET(const struct bch_fs *c, 130 + static inline struct bpos PTR_BUCKET_POS_OFFSET(const struct bch_dev *ca, 132 131 const struct bch_extent_ptr *ptr, 133 132 u32 *bucket_offset) 134 133 { 135 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 136 - 137 134 return POS(ptr->dev, sector_to_bucket_and_offset(ca, ptr->offset, bucket_offset)); 138 135 } 139 136 ··· 170 175 return r > 0 ? r : 0; 171 176 } 172 177 178 + static inline u8 dev_ptr_stale_rcu(struct bch_dev *ca, const struct bch_extent_ptr *ptr) 179 + { 180 + return gen_after(*bucket_gen(ca, PTR_BUCKET_NR(ca, ptr)), ptr->gen); 181 + } 182 + 173 183 /** 174 - * ptr_stale() - check if a pointer points into a bucket that has been 184 + * dev_ptr_stale() - check if a pointer points into a bucket that has been 175 185 * invalidated. 
176 186 */ 177 - static inline u8 ptr_stale(struct bch_dev *ca, 178 - const struct bch_extent_ptr *ptr) 187 + static inline u8 dev_ptr_stale(struct bch_dev *ca, const struct bch_extent_ptr *ptr) 179 188 { 180 - u8 ret; 181 - 182 189 rcu_read_lock(); 183 - ret = gen_after(*bucket_gen(ca, PTR_BUCKET_NR(ca, ptr)), ptr->gen); 190 + u8 ret = dev_ptr_stale_rcu(ca, ptr); 184 191 rcu_read_unlock(); 185 192 186 193 return ret; ··· 303 306 void bch2_dev_usage_update(struct bch_fs *, struct bch_dev *, 304 307 const struct bch_alloc_v4 *, 305 308 const struct bch_alloc_v4 *, u64, bool); 306 - void bch2_dev_usage_update_m(struct bch_fs *, struct bch_dev *, 307 - struct bucket *, struct bucket *); 308 309 309 310 /* key/bucket marking: */ 310 311 ··· 328 333 329 334 void bch2_fs_usage_initialize(struct bch_fs *); 330 335 331 - int bch2_check_bucket_ref(struct btree_trans *, struct bkey_s_c, 332 - const struct bch_extent_ptr *, 333 - s64, enum bch_data_type, u8, u8, u32); 336 + int bch2_bucket_ref_update(struct btree_trans *, struct bch_dev *, 337 + struct bkey_s_c, const struct bch_extent_ptr *, 338 + s64, enum bch_data_type, u8, u8, u32 *); 334 339 335 - int bch2_mark_metadata_bucket(struct bch_fs *, struct bch_dev *, 336 - size_t, enum bch_data_type, unsigned, 337 - struct gc_pos, unsigned); 340 + int bch2_check_fix_ptrs(struct btree_trans *, 341 + enum btree_id, unsigned, struct bkey_s_c, 342 + enum btree_iter_update_trigger_flags); 338 343 339 344 int bch2_trigger_extent(struct btree_trans *, enum btree_id, unsigned, 340 - struct bkey_s_c, struct bkey_s, unsigned); 345 + struct bkey_s_c, struct bkey_s, 346 + enum btree_iter_update_trigger_flags); 341 347 int bch2_trigger_reservation(struct btree_trans *, enum btree_id, unsigned, 342 - struct bkey_s_c, struct bkey_s, unsigned); 348 + struct bkey_s_c, struct bkey_s, 349 + enum btree_iter_update_trigger_flags); 343 350 344 351 #define trigger_run_overwrite_then_insert(_fn, _trans, _btree_id, _level, _old, _new, _flags)\ 345 352 ({ \ 346 353 int ret = 0; \ 347 354 \ 348 355 if (_old.k->type) \ 349 - ret = _fn(_trans, _btree_id, _level, _old, _flags & ~BTREE_TRIGGER_INSERT); \ 356 + ret = _fn(_trans, _btree_id, _level, _old, _flags & ~BTREE_TRIGGER_insert); \ 350 357 if (!ret && _new.k->type) \ 351 - ret = _fn(_trans, _btree_id, _level, _new.s_c, _flags & ~BTREE_TRIGGER_OVERWRITE);\ 358 + ret = _fn(_trans, _btree_id, _level, _new.s_c, _flags & ~BTREE_TRIGGER_overwrite);\ 352 359 ret; \ 353 360 }) 354 361 ··· 359 362 void bch2_trans_fs_usage_revert(struct btree_trans *, struct replicas_delta_list *); 360 363 int bch2_trans_fs_usage_apply(struct btree_trans *, struct replicas_delta_list *); 361 364 362 - int bch2_trans_mark_metadata_bucket(struct btree_trans *, struct bch_dev *, 363 - size_t, enum bch_data_type, unsigned); 364 - int bch2_trans_mark_dev_sb(struct bch_fs *, struct bch_dev *); 365 + int bch2_trans_mark_metadata_bucket(struct btree_trans *, struct bch_dev *, u64, 366 + enum bch_data_type, unsigned, 367 + enum btree_iter_update_trigger_flags); 368 + int bch2_trans_mark_dev_sb(struct bch_fs *, struct bch_dev *, 369 + enum btree_iter_update_trigger_flags); 370 + int bch2_trans_mark_dev_sbs_flags(struct bch_fs *, 371 + enum btree_iter_update_trigger_flags); 365 372 int bch2_trans_mark_dev_sbs(struct bch_fs *); 366 373 367 374 static inline bool is_superblock_bucket(struct bch_dev *ca, u64 b) ··· 464 463 { 465 464 return div_u64(r << RESERVE_FACTOR, (1 << RESERVE_FACTOR) + 1); 466 465 } 466 + 467 + void bch2_buckets_nouse_free(struct bch_fs *); 468 
+ int bch2_buckets_nouse_alloc(struct bch_fs *); 467 469 468 470 int bch2_dev_buckets_resize(struct bch_fs *, struct bch_dev *, u64); 469 471 void bch2_dev_buckets_free(struct bch_dev *);
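Higher up in this header, sector_to_bucket() and sector_to_bucket_and_offset() switch their return type from size_t to u64 — presumably to avoid truncation on 32-bit hosts with very large devices. A standalone sketch of the arithmetic; bucket_size is in 512-byte sectors and need not be a power of two, hence division rather than shifts:

#include <stdint.h>
#include <stdio.h>

static uint64_t sector_to_bucket(uint64_t s, uint32_t bucket_size)
{
	return s / bucket_size;
}

static uint64_t sector_to_bucket_and_offset(uint64_t s, uint32_t bucket_size,
					    uint32_t *offset)
{
	*offset = (uint32_t) (s % bucket_size);
	return s / bucket_size;
}

int main(void)
{
	uint32_t off;
	uint64_t b = sector_to_bucket_and_offset(1000003, 1024, &off);

	printf("bucket %llu, offset %u\n", (unsigned long long) b, off);
	/* prints: bucket 976, offset 579 */
	return 0;
}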
+34 -32
fs/bcachefs/chardev.c
··· 32 32 if (dev >= c->sb.nr_devices) 33 33 return ERR_PTR(-EINVAL); 34 34 35 - rcu_read_lock(); 36 - ca = rcu_dereference(c->devs[dev]); 37 - if (ca) 38 - percpu_ref_get(&ca->ref); 39 - rcu_read_unlock(); 40 - 35 + ca = bch2_dev_tryget_noerror(c, dev); 41 36 if (!ca) 42 37 return ERR_PTR(-EINVAL); 43 38 } else { ··· 386 391 return PTR_ERR(ca); 387 392 388 393 ret = bch2_dev_offline(c, ca, arg.flags); 389 - percpu_ref_put(&ca->ref); 394 + bch2_dev_put(ca); 390 395 return ret; 391 396 } 392 397 ··· 415 420 if (ret) 416 421 bch_err(c, "Error setting device state: %s", bch2_err_str(ret)); 417 422 418 - percpu_ref_put(&ca->ref); 423 + bch2_dev_put(ca); 419 424 return ret; 420 425 } 421 426 ··· 610 615 arg.d[i].fragmented = src.d[i].fragmented; 611 616 } 612 617 613 - percpu_ref_put(&ca->ref); 618 + bch2_dev_put(ca); 614 619 615 620 return copy_to_user_errcode(user_arg, &arg, sizeof(arg)); 616 621 } ··· 662 667 goto err; 663 668 } 664 669 err: 665 - percpu_ref_put(&ca->ref); 670 + bch2_dev_put(ca); 666 671 return ret; 667 672 } 668 673 ··· 684 689 685 690 if (arg.flags & BCH_READ_DEV) { 686 691 ca = bch2_device_lookup(c, arg.dev, arg.flags); 687 - 688 - if (IS_ERR(ca)) { 689 - ret = PTR_ERR(ca); 690 - goto err; 691 - } 692 + ret = PTR_ERR_OR_ZERO(ca); 693 + if (ret) 694 + goto err_unlock; 692 695 693 696 sb = ca->disk_sb.sb; 694 697 } else { ··· 701 708 ret = copy_to_user_errcode((void __user *)(unsigned long)arg.sb, sb, 702 709 vstruct_bytes(sb)); 703 710 err: 704 - if (!IS_ERR_OR_NULL(ca)) 705 - percpu_ref_put(&ca->ref); 711 + bch2_dev_put(ca); 712 + err_unlock: 706 713 mutex_unlock(&c->sb_lock); 707 714 return ret; 708 715 } ··· 746 753 747 754 ret = bch2_dev_resize(c, ca, arg.nbuckets); 748 755 749 - percpu_ref_put(&ca->ref); 756 + bch2_dev_put(ca); 750 757 return ret; 751 758 } 752 759 ··· 772 779 773 780 ret = bch2_set_nr_journal_buckets(c, ca, arg.nbuckets); 774 781 775 - percpu_ref_put(&ca->ref); 782 + bch2_dev_put(ca); 776 783 return ret; 777 784 } 778 785 ··· 954 961 }; 955 962 956 963 static int bch_chardev_major; 957 - static struct class *bch_chardev_class; 964 + static const struct class bch_chardev_class = { 965 + .name = "bcachefs", 966 + }; 958 967 static struct device *bch_chardev; 959 968 960 969 void bch2_fs_chardev_exit(struct bch_fs *c) ··· 973 978 if (c->minor < 0) 974 979 return c->minor; 975 980 976 - c->chardev = device_create(bch_chardev_class, NULL, 981 + c->chardev = device_create(&bch_chardev_class, NULL, 977 982 MKDEV(bch_chardev_major, c->minor), c, 978 983 "bcachefs%u-ctl", c->minor); 979 984 if (IS_ERR(c->chardev)) ··· 984 989 985 990 void bch2_chardev_exit(void) 986 991 { 987 - if (!IS_ERR_OR_NULL(bch_chardev_class)) 988 - device_destroy(bch_chardev_class, 989 - MKDEV(bch_chardev_major, U8_MAX)); 990 - if (!IS_ERR_OR_NULL(bch_chardev_class)) 991 - class_destroy(bch_chardev_class); 992 + device_destroy(&bch_chardev_class, MKDEV(bch_chardev_major, U8_MAX)); 993 + class_unregister(&bch_chardev_class); 992 994 if (bch_chardev_major > 0) 993 995 unregister_chrdev(bch_chardev_major, "bcachefs"); 994 996 } 995 997 996 998 int __init bch2_chardev_init(void) 997 999 { 1000 + int ret; 1001 + 998 1002 bch_chardev_major = register_chrdev(0, "bcachefs-ctl", &bch_chardev_fops); 999 1003 if (bch_chardev_major < 0) 1000 1004 return bch_chardev_major; 1001 1005 1002 - bch_chardev_class = class_create("bcachefs"); 1003 - if (IS_ERR(bch_chardev_class)) 1004 - return PTR_ERR(bch_chardev_class); 1006 + ret = class_register(&bch_chardev_class); 1007 + if (ret) 1008 + goto 
major_out; 1005 1009 1006 - bch_chardev = device_create(bch_chardev_class, NULL, 1010 + bch_chardev = device_create(&bch_chardev_class, NULL, 1007 1011 MKDEV(bch_chardev_major, U8_MAX), 1008 1012 NULL, "bcachefs-ctl"); 1009 - if (IS_ERR(bch_chardev)) 1010 - return PTR_ERR(bch_chardev); 1013 + if (IS_ERR(bch_chardev)) { 1014 + ret = PTR_ERR(bch_chardev); 1015 + goto class_out; 1016 + } 1011 1017 1012 1018 return 0; 1019 + 1020 + class_out: 1021 + class_unregister(&bch_chardev_class); 1022 + major_out: 1023 + unregister_chrdev(bch_chardev_major, "bcachefs-ctl"); 1024 + return ret; 1013 1025 } 1014 1026 1015 1027 #endif /* NO_BCACHEFS_CHARDEV */
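chardev.c is one of many call sites converted from raw percpu_ref_get()/percpu_ref_put() to bch2_dev_tryget()/bch2_dev_put(), where a lookup may now return NULL for a missing device. The real helpers wrap percpu refcounts; this userspace sketch (hypothetical names, plain C11 atomics) only shows the shape of the tryget idiom:

#include <stdatomic.h>
#include <stddef.h>

struct dev {
	atomic_int ref;
};

/* Take a reference only if the object is still live (ref > 0);
 * return NULL once the count has already dropped to zero. */
static struct dev *dev_tryget(struct dev *d)
{
	int old = atomic_load(&d->ref);

	do {
		if (old == 0)
			return NULL;
	} while (!atomic_compare_exchange_weak(&d->ref, &old, old + 1));

	return d;
}

static void dev_put(struct dev *d)
{
	if (d)
		atomic_fetch_sub(&d->ref, 1);
}

int main(void)
{
	struct dev d = { .ref = 1 };
	struct dev *ref = dev_tryget(&d);

	if (ref) {
		/* ... use the device ... */
		dev_put(ref);
	}
	return 0;
}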
+6 -11
fs/bcachefs/checksum.c
··· 469 469 470 470 /* BCH_SB_FIELD_crypt: */ 471 471 472 - static int bch2_sb_crypt_validate(struct bch_sb *sb, 473 - struct bch_sb_field *f, 474 - struct printbuf *err) 472 + static int bch2_sb_crypt_validate(struct bch_sb *sb, struct bch_sb_field *f, 473 + enum bch_validate_flags flags, struct printbuf *err) 475 474 { 476 475 struct bch_sb_field_crypt *crypt = field_to_type(f, crypt); 477 476 ··· 493 494 { 494 495 struct bch_sb_field_crypt *crypt = field_to_type(f, crypt); 495 496 496 - prt_printf(out, "KFD: %llu", BCH_CRYPT_KDF_TYPE(crypt)); 497 - prt_newline(out); 498 - prt_printf(out, "scrypt n: %llu", BCH_KDF_SCRYPT_N(crypt)); 499 - prt_newline(out); 500 - prt_printf(out, "scrypt r: %llu", BCH_KDF_SCRYPT_R(crypt)); 501 - prt_newline(out); 502 - prt_printf(out, "scrypt p: %llu", BCH_KDF_SCRYPT_P(crypt)); 503 - prt_newline(out); 497 + prt_printf(out, "KDF: %llu\n", BCH_CRYPT_KDF_TYPE(crypt)); 498 + prt_printf(out, "scrypt n: %llu\n", BCH_KDF_SCRYPT_N(crypt)); 499 + prt_printf(out, "scrypt r: %llu\n", BCH_KDF_SCRYPT_R(crypt)); 500 + prt_printf(out, "scrypt p: %llu\n", BCH_KDF_SCRYPT_P(crypt)); 504 501 } 505 502 506 503 const struct bch_sb_field_ops bch_sb_field_ops_crypt = {
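The enum bch_validate_flags argument threaded through .validate() here (and through the other superblock-field and bkey ops in this series) lets a hook distinguish data being newly committed from old on-disk data merely being read back. A toy illustration of that distinction, with invented names:

#include <stddef.h>
#include <stdio.h>

enum validate_flags {
	VALIDATE_commit = 1 << 0,	/* validating a new write, not existing data */
};

struct field {
	unsigned name_len;
};

/* Strict limits only apply on commit: older on-disk data may predate
 * the limit and must still be readable. */
static int field_validate(const struct field *f, enum validate_flags flags,
			  char *err, size_t errlen)
{
	if ((flags & VALIDATE_commit) && f->name_len > 255) {
		snprintf(err, errlen, "name too long (%u > 255)", f->name_len);
		return -1;
	}
	return 0;
}

int main(void)
{
	struct field f = { .name_len = 300 };
	char err[64];

	printf("%d\n", field_validate(&f, 0, err, sizeof(err)));		/* 0: read accepted */
	printf("%d\n", field_validate(&f, VALIDATE_commit, err, sizeof(err)));	/* -1: write rejected */
	return 0;
}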
+35 -19
fs/bcachefs/data_update.c
··· 106 106 107 107 bch2_trans_iter_init(trans, &iter, m->btree_id, 108 108 bkey_start_pos(&bch2_keylist_front(keys)->k), 109 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT); 109 + BTREE_ITER_slots|BTREE_ITER_intent); 110 110 111 111 while (1) { 112 112 struct bkey_s_c k; ··· 203 203 204 204 /* Now, drop excess replicas: */ 205 205 restart_drop_extra_replicas: 206 + 207 + rcu_read_lock(); 206 208 bkey_for_each_ptr_decode(old.k, bch2_bkey_ptrs(bkey_i_to_s(insert)), p, entry) { 207 209 unsigned ptr_durability = bch2_extent_ptr_durability(c, &p); 208 210 ··· 216 214 goto restart_drop_extra_replicas; 217 215 } 218 216 } 217 + rcu_read_unlock(); 219 218 220 219 /* Finally, add the pointers we just wrote: */ 221 220 extent_for_each_ptr_decode(extent_i_to_s(new), p, entry) ··· 291 288 k.k->p, insert->k.p) ?: 292 289 bch2_bkey_set_needs_rebalance(c, insert, &op->opts) ?: 293 290 bch2_trans_update(trans, &iter, insert, 294 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?: 291 + BTREE_UPDATE_internal_snapshot_node) ?: 295 292 bch2_trans_commit(trans, &op->res, 296 293 NULL, 297 294 BCH_TRANS_COMMIT_no_check_rw| ··· 360 357 bch2_bkey_ptrs_c(bkey_i_to_s_c(update->k.k)); 361 358 362 359 bkey_for_each_ptr(ptrs, ptr) { 360 + struct bch_dev *ca = bch2_dev_have_ref(c, ptr->dev); 363 361 if (c->opts.nocow_enabled) 364 362 bch2_bucket_nocow_unlock(&c->nocow_locks, 365 - PTR_BUCKET_POS(c, ptr), 0); 366 - percpu_ref_put(&bch_dev_bkey_exists(c, ptr->dev)->ref); 363 + PTR_BUCKET_POS(ca, ptr), 0); 364 + bch2_dev_put(ca); 367 365 } 368 366 369 367 bch2_bkey_buf_exit(&update->k, c); ··· 390 386 while (bio_sectors(bio)) { 391 387 unsigned sectors = bio_sectors(bio); 392 388 389 + bch2_trans_begin(trans); 390 + 393 391 bch2_trans_iter_init(trans, &iter, update->btree_id, update->op.pos, 394 - BTREE_ITER_SLOTS); 392 + BTREE_ITER_slots); 395 393 ret = lockrestart_do(trans, ({ 396 394 k = bch2_btree_iter_peek_slot(&iter); 397 395 bkey_err(k); ··· 471 465 472 466 while (data_opts.kill_ptrs) { 473 467 unsigned i = 0, drop = __fls(data_opts.kill_ptrs); 474 - struct bch_extent_ptr *ptr; 475 468 476 469 bch2_bkey_drop_ptrs(bkey_i_to_s(n), ptr, i++ == drop); 477 470 data_opts.kill_ptrs ^= 1U << drop; ··· 485 480 486 481 /* 487 482 * Since we're not inserting through an extent iterator 488 - * (BTREE_ITER_ALL_SNAPSHOTS iterators aren't extent iterators), 483 + * (BTREE_ITER_all_snapshots iterators aren't extent iterators), 489 484 * we aren't using the extent overwrite path to delete, we're 490 485 * just using the normal key deletion path: 491 486 */ 492 - if (bkey_deleted(&n->k) && !(iter->flags & BTREE_ITER_IS_EXTENTS)) 487 + if (bkey_deleted(&n->k) && !(iter->flags & BTREE_ITER_is_extents)) 493 488 n->k.size = 0; 494 489 495 490 return bch2_trans_relock(trans) ?: 496 - bch2_trans_update(trans, iter, n, BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?: 491 + bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?: 497 492 bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 498 493 } 499 494 ··· 544 539 m->op.compression_opt = background_compression(io_opts); 545 540 m->op.watermark = m->data_opts.btree_insert_flags & BCH_WATERMARK_MASK; 546 541 547 - bkey_for_each_ptr(ptrs, ptr) 548 - percpu_ref_get(&bch_dev_bkey_exists(c, ptr->dev)->ref); 542 + bkey_for_each_ptr(ptrs, ptr) { 543 + if (!bch2_dev_tryget(c, ptr->dev)) { 544 + bkey_for_each_ptr(ptrs, ptr2) { 545 + if (ptr2 == ptr) 546 + break; 547 + bch2_dev_put(bch2_dev_have_ref(c, ptr2->dev)); 548 + } 549 + return -BCH_ERR_data_update_done; 550 + } 551 + } 549 552 550 553 
unsigned durability_have = 0, durability_removing = 0; 551 554 552 555 i = 0; 553 556 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { 557 + struct bch_dev *ca = bch2_dev_have_ref(c, p.ptr.dev); 558 + struct bpos bucket = PTR_BUCKET_POS(ca, &p.ptr); 554 559 bool locked; 555 560 561 + rcu_read_lock(); 556 562 if (((1U << i) & m->data_opts.rewrite_ptrs)) { 557 563 BUG_ON(p.ptr.cached); 558 564 ··· 577 561 bch2_dev_list_add_dev(&m->op.devs_have, p.ptr.dev); 578 562 durability_have += bch2_extent_ptr_durability(c, &p); 579 563 } 564 + rcu_read_unlock(); 580 565 581 566 /* 582 567 * op->csum_type is normally initialized from the fs/file's ··· 596 579 if (ctxt) { 597 580 move_ctxt_wait_event(ctxt, 598 581 (locked = bch2_bucket_nocow_trylock(&c->nocow_locks, 599 - PTR_BUCKET_POS(c, &p.ptr), 0)) || 582 + bucket, 0)) || 600 583 list_empty(&ctxt->ios)); 601 584 602 585 if (!locked) 603 - bch2_bucket_nocow_lock(&c->nocow_locks, 604 - PTR_BUCKET_POS(c, &p.ptr), 0); 586 + bch2_bucket_nocow_lock(&c->nocow_locks, bucket, 0); 605 587 } else { 606 - if (!bch2_bucket_nocow_trylock(&c->nocow_locks, 607 - PTR_BUCKET_POS(c, &p.ptr), 0)) { 588 + if (!bch2_bucket_nocow_trylock(&c->nocow_locks, bucket, 0)) { 608 589 ret = -BCH_ERR_nocow_lock_blocked; 609 590 goto err; 610 591 } ··· 664 649 err: 665 650 i = 0; 666 651 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { 652 + struct bch_dev *ca = bch2_dev_have_ref(c, p.ptr.dev); 653 + struct bpos bucket = PTR_BUCKET_POS(ca, &p.ptr); 667 654 if ((1U << i) & ptrs_locked) 668 - bch2_bucket_nocow_unlock(&c->nocow_locks, 669 - PTR_BUCKET_POS(c, &p.ptr), 0); 670 - percpu_ref_put(&bch_dev_bkey_exists(c, p.ptr.dev)->ref); 655 + bch2_bucket_nocow_unlock(&c->nocow_locks, bucket, 0); 656 + bch2_dev_put(ca); 671 657 i++; 672 658 } 673 659
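The bkey_for_each_ptr() loop above that takes a device reference per pointer has to be all-or-nothing: if one tryget fails, the references already taken are walked back before returning -BCH_ERR_data_update_done. A standalone sketch of that unwind pattern, with stub tryget()/put():

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool tryget(int dev)
{
	return dev >= 0;	/* stand-in: pretend negative devices are gone */
}

static void put(int dev)
{
	printf("put %d\n", dev);
}

/* Acquire references on every device or on none: on the first failed
 * tryget, drop exactly the references already taken and bail out. */
static int get_all(const int *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tryget(devs[i])) {
			while (i--)
				put(devs[i]);
			return -1;
		}
	return 0;
}

int main(void)
{
	int devs[] = { 0, 1, -1, 3 };

	/* fails at index 2; puts 1 then 0, leaving no references held */
	return get_all(devs, 4) ? 0 : 1;
}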
+24 -56
fs/bcachefs/debug.c
··· 37 37 struct btree_node *n_ondisk = c->verify_ondisk; 38 38 struct btree_node *n_sorted = c->verify_data->data; 39 39 struct bset *sorted, *inmemory = &b->data->keys; 40 - struct bch_dev *ca = bch_dev_bkey_exists(c, pick.ptr.dev); 41 40 struct bio *bio; 42 41 bool failed = false, saw_error = false; 43 42 44 - if (!bch2_dev_get_ioref(ca, READ)) 43 + struct bch_dev *ca = bch2_dev_get_ioref(c, pick.ptr.dev, READ); 44 + if (!ca) 45 45 return false; 46 46 47 47 bio = bio_alloc_bioset(ca->disk_sb.bdev, ··· 194 194 return; 195 195 } 196 196 197 - ca = bch_dev_bkey_exists(c, pick.ptr.dev); 198 - if (!bch2_dev_get_ioref(ca, READ)) { 197 + ca = bch2_dev_get_ioref(c, pick.ptr.dev, READ); 198 + if (!ca) { 199 199 prt_printf(out, "error getting device to read from: not online\n"); 200 200 return; 201 201 } ··· 375 375 return flush_buf(i) ?: 376 376 bch2_trans_run(i->c, 377 377 for_each_btree_key(trans, iter, i->id, i->from, 378 - BTREE_ITER_PREFETCH| 379 - BTREE_ITER_ALL_SNAPSHOTS, k, ({ 378 + BTREE_ITER_prefetch| 379 + BTREE_ITER_all_snapshots, k, ({ 380 380 bch2_bkey_val_to_text(&i->buf, i->c, k); 381 381 prt_newline(&i->buf); 382 382 bch2_trans_unlock(trans); ··· 459 459 return flush_buf(i) ?: 460 460 bch2_trans_run(i->c, 461 461 for_each_btree_key(trans, iter, i->id, i->from, 462 - BTREE_ITER_PREFETCH| 463 - BTREE_ITER_ALL_SNAPSHOTS, k, ({ 462 + BTREE_ITER_prefetch| 463 + BTREE_ITER_all_snapshots, k, ({ 464 464 struct btree_path_level *l = 465 465 &btree_iter_path(trans, &iter)->l[0]; 466 466 struct bkey_packed *_k = ··· 492 492 if (!out->nr_tabstops) 493 493 printbuf_tabstop_push(out, 32); 494 494 495 - prt_printf(out, "%px btree=%s l=%u ", 496 - b, 497 - bch2_btree_id_str(b->c.btree_id), 498 - b->c.level); 499 - prt_newline(out); 495 + prt_printf(out, "%px btree=%s l=%u\n", b, bch2_btree_id_str(b->c.btree_id), b->c.level); 500 496 501 497 printbuf_indent_add(out, 2); 502 498 503 499 bch2_bkey_val_to_text(out, c, bkey_i_to_s_c(&b->key)); 504 500 prt_newline(out); 505 501 506 - prt_printf(out, "flags: "); 507 - prt_tab(out); 502 + prt_printf(out, "flags:\t"); 508 503 prt_bitflags(out, bch2_btree_node_flags, b->flags); 509 504 prt_newline(out); 510 505 511 - prt_printf(out, "pcpu read locks: "); 512 - prt_tab(out); 513 - prt_printf(out, "%u", b->c.lock.readers != NULL); 514 - prt_newline(out); 506 + prt_printf(out, "pcpu read locks:\t%u\n", b->c.lock.readers != NULL); 507 + prt_printf(out, "written:\t%u\n", b->written); 508 + prt_printf(out, "writes blocked:\t%u\n", !list_empty_careful(&b->write_blocked)); 509 + prt_printf(out, "will make reachable:\t%lx\n", b->will_make_reachable); 515 510 516 - prt_printf(out, "written:"); 517 - prt_tab(out); 518 - prt_printf(out, "%u", b->written); 519 - prt_newline(out); 520 - 521 - prt_printf(out, "writes blocked:"); 522 - prt_tab(out); 523 - prt_printf(out, "%u", !list_empty_careful(&b->write_blocked)); 524 - prt_newline(out); 525 - 526 - prt_printf(out, "will make reachable:"); 527 - prt_tab(out); 528 - prt_printf(out, "%lx", b->will_make_reachable); 529 - prt_newline(out); 530 - 531 - prt_printf(out, "journal pin %px:", &b->writes[0].journal); 532 - prt_tab(out); 533 - prt_printf(out, "%llu", b->writes[0].journal.seq); 534 - prt_newline(out); 535 - 536 - prt_printf(out, "journal pin %px:", &b->writes[1].journal); 537 - prt_tab(out); 538 - prt_printf(out, "%llu", b->writes[1].journal.seq); 539 - prt_newline(out); 511 + prt_printf(out, "journal pin %px:\t%llu\n", 512 + &b->writes[0].journal, b->writes[0].journal.seq); 513 + prt_printf(out, "journal pin 
%px:\t%llu\n", 514 + &b->writes[1].journal, b->writes[1].journal.seq); 540 515 541 516 printbuf_indent_sub(out, 2); 542 517 } ··· 600 625 601 626 bch2_btree_trans_to_text(&i->buf, trans); 602 627 603 - prt_printf(&i->buf, "backtrace:"); 604 - prt_newline(&i->buf); 628 + prt_printf(&i->buf, "backtrace:\n"); 605 629 printbuf_indent_add(&i->buf, 2); 606 630 bch2_prt_task_backtrace(&i->buf, task, 0, GFP_KERNEL); 607 631 printbuf_indent_sub(&i->buf, 2); ··· 756 782 !bch2_btree_transaction_fns[i->iter]) 757 783 break; 758 784 759 - prt_printf(&i->buf, "%s: ", bch2_btree_transaction_fns[i->iter]); 760 - prt_newline(&i->buf); 785 + prt_printf(&i->buf, "%s:\n", bch2_btree_transaction_fns[i->iter]); 761 786 printbuf_indent_add(&i->buf, 2); 762 787 763 788 mutex_lock(&s->lock); 764 789 765 - prt_printf(&i->buf, "Max mem used: %u", s->max_mem); 766 - prt_newline(&i->buf); 767 - 768 - prt_printf(&i->buf, "Transaction duration:"); 769 - prt_newline(&i->buf); 790 + prt_printf(&i->buf, "Max mem used: %u\n", s->max_mem); 791 + prt_printf(&i->buf, "Transaction duration:\n"); 770 792 771 793 printbuf_indent_add(&i->buf, 2); 772 794 bch2_time_stats_to_text(&i->buf, &s->duration); 773 795 printbuf_indent_sub(&i->buf, 2); 774 796 775 797 if (IS_ENABLED(CONFIG_BCACHEFS_LOCK_TIME_STATS)) { 776 - prt_printf(&i->buf, "Lock hold times:"); 777 - prt_newline(&i->buf); 798 + prt_printf(&i->buf, "Lock hold times:\n"); 778 799 779 800 printbuf_indent_add(&i->buf, 2); 780 801 bch2_time_stats_to_text(&i->buf, &s->lock_hold_times); ··· 777 808 } 778 809 779 810 if (s->max_paths_text) { 780 - prt_printf(&i->buf, "Maximum allocated btree paths (%u):", s->nr_max_paths); 781 - prt_newline(&i->buf); 811 + prt_printf(&i->buf, "Maximum allocated btree paths (%u):\n", s->nr_max_paths); 782 812 783 813 printbuf_indent_add(&i->buf, 2); 784 814 prt_str_indented(&i->buf, s->max_paths_text);
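Most of the churn in debug.c collapses prt_printf() + prt_tab() + prt_newline() triples into single format strings with embedded '\t' and '\n'. A userspace analogue of the resulting label/value output, where a fixed-width field plays the role of the printbuf tabstop:

#include <stdio.h>

/* One call per field instead of three; the %-24s field aligns values
 * the way the printbuf tabstop does in the kernel code above. */
static void prt_field_u64(const char *label, unsigned long long v)
{
	printf("%-24s%llu\n", label, v);
}

int main(void)
{
	prt_field_u64("written:", 12);
	prt_field_u64("writes blocked:", 0);
	prt_field_u64("journal pin seq:", 123456);
	return 0;
}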
+45 -52
fs/bcachefs/dirent.c
··· 98 98 }; 99 99 100 100 int bch2_dirent_invalid(struct bch_fs *c, struct bkey_s_c k, 101 - enum bkey_invalid_flags flags, 101 + enum bch_validate_flags flags, 102 102 struct printbuf *err) 103 103 { 104 104 struct bkey_s_c_dirent d = bkey_s_c_to_dirent(k); ··· 118 118 * Check new keys don't exceed the max length 119 119 * (older keys may be larger.) 120 120 */ 121 - bkey_fsck_err_on((flags & BKEY_INVALID_COMMIT) && d_name.len > BCH_NAME_MAX, c, err, 121 + bkey_fsck_err_on((flags & BCH_VALIDATE_commit) && d_name.len > BCH_NAME_MAX, c, err, 122 122 dirent_name_too_long, 123 123 "dirent name too big (%u > %u)", 124 124 d_name.len, BCH_NAME_MAX); ··· 205 205 const struct bch_hash_info *hash_info, 206 206 u8 type, const struct qstr *name, u64 dst_inum, 207 207 u64 *dir_offset, 208 - bch_str_hash_flags_t str_hash_flags) 208 + enum btree_iter_update_trigger_flags flags) 209 209 { 210 210 subvol_inum dir_inum = { .subvol = dir_subvol, .inum = dir }; 211 211 struct bkey_i_dirent *dirent; ··· 220 220 dirent->k.p.snapshot = snapshot; 221 221 222 222 ret = bch2_hash_set_in_snapshot(trans, bch2_dirent_hash_desc, hash_info, 223 - dir_inum, snapshot, 224 - &dirent->k_i, str_hash_flags, 225 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 223 + dir_inum, snapshot, &dirent->k_i, 224 + flags|BTREE_UPDATE_internal_snapshot_node); 226 225 *dir_offset = dirent->k.p.offset; 227 226 228 227 return ret; ··· 231 232 const struct bch_hash_info *hash_info, 232 233 u8 type, const struct qstr *name, u64 dst_inum, 233 234 u64 *dir_offset, 234 - bch_str_hash_flags_t str_hash_flags) 235 + enum btree_iter_update_trigger_flags flags) 235 236 { 236 237 struct bkey_i_dirent *dirent; 237 238 int ret; ··· 242 243 return ret; 243 244 244 245 ret = bch2_hash_set(trans, bch2_dirent_hash_desc, hash_info, 245 - dir, &dirent->k_i, str_hash_flags); 246 + dir, &dirent->k_i, flags); 246 247 *dir_offset = dirent->k.p.offset; 247 248 248 249 return ret; ··· 271 272 } else { 272 273 target->subvol = le32_to_cpu(d.v->d_child_subvol); 273 274 274 - ret = bch2_subvolume_get(trans, target->subvol, true, BTREE_ITER_CACHED, &s); 275 + ret = bch2_subvolume_get(trans, target->subvol, true, BTREE_ITER_cached, &s); 275 276 276 277 target->inum = le64_to_cpu(s.inode); 277 278 } ··· 300 301 memset(dst_inum, 0, sizeof(*dst_inum)); 301 302 302 303 /* Lookup src: */ 303 - ret = bch2_hash_lookup(trans, &src_iter, bch2_dirent_hash_desc, 304 - src_hash, src_dir, src_name, 305 - BTREE_ITER_INTENT); 306 - if (ret) 307 - goto out; 308 - 309 - old_src = bch2_btree_iter_peek_slot(&src_iter); 304 + old_src = bch2_hash_lookup(trans, &src_iter, bch2_dirent_hash_desc, 305 + src_hash, src_dir, src_name, 306 + BTREE_ITER_intent); 310 307 ret = bkey_err(old_src); 311 308 if (ret) 312 309 goto out; ··· 324 329 if (ret) 325 330 goto out; 326 331 } else { 327 - ret = bch2_hash_lookup(trans, &dst_iter, bch2_dirent_hash_desc, 328 - dst_hash, dst_dir, dst_name, 329 - BTREE_ITER_INTENT); 330 - if (ret) 331 - goto out; 332 - 333 - old_dst = bch2_btree_iter_peek_slot(&dst_iter); 332 + old_dst = bch2_hash_lookup(trans, &dst_iter, bch2_dirent_hash_desc, 333 + dst_hash, dst_dir, dst_name, 334 + BTREE_ITER_intent); 334 335 ret = bkey_err(old_dst); 335 336 if (ret) 336 337 goto out; ··· 441 450 if (delete_src) { 442 451 bch2_btree_iter_set_snapshot(&src_iter, old_src.k->p.snapshot); 443 452 ret = bch2_btree_iter_traverse(&src_iter) ?: 444 - bch2_btree_delete_at(trans, &src_iter, BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 453 + bch2_btree_delete_at(trans, &src_iter, 
BTREE_UPDATE_internal_snapshot_node); 445 454 if (ret) 446 455 goto out; 447 456 } ··· 449 458 if (delete_dst) { 450 459 bch2_btree_iter_set_snapshot(&dst_iter, old_dst.k->p.snapshot); 451 460 ret = bch2_btree_iter_traverse(&dst_iter) ?: 452 - bch2_btree_delete_at(trans, &dst_iter, BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 461 + bch2_btree_delete_at(trans, &dst_iter, BTREE_UPDATE_internal_snapshot_node); 453 462 if (ret) 454 463 goto out; 455 464 } ··· 470 479 const struct qstr *name, subvol_inum *inum, 471 480 unsigned flags) 472 481 { 473 - int ret = bch2_hash_lookup(trans, iter, bch2_dirent_hash_desc, 474 - hash_info, dir, name, flags); 475 - if (ret) 476 - return ret; 477 - 478 - struct bkey_s_c k = bch2_btree_iter_peek_slot(iter); 479 - ret = bkey_err(k); 482 + struct bkey_s_c k = bch2_hash_lookup(trans, iter, bch2_dirent_hash_desc, 483 + hash_info, dir, name, flags); 484 + int ret = bkey_err(k); 480 485 if (ret) 481 486 goto err; 482 487 ··· 528 541 bch2_empty_dir_snapshot(trans, dir.inum, dir.subvol, snapshot); 529 542 } 530 543 544 + static int bch2_dir_emit(struct dir_context *ctx, struct bkey_s_c_dirent d, subvol_inum target) 545 + { 546 + struct qstr name = bch2_dirent_get_name(d); 547 + bool ret = dir_emit(ctx, name.name, 548 + name.len, 549 + target.inum, 550 + vfs_d_type(d.v->d_type)); 551 + if (ret) 552 + ctx->pos = d.k->p.offset + 1; 553 + return ret; 554 + } 555 + 531 556 int bch2_readdir(struct bch_fs *c, subvol_inum inum, struct dir_context *ctx) 532 557 { 533 558 struct btree_trans *trans = bch2_trans_get(c); 534 559 struct btree_iter iter; 535 560 struct bkey_s_c k; 536 - struct bkey_s_c_dirent dirent; 537 561 subvol_inum target; 538 562 u32 snapshot; 539 563 struct bkey_buf sk; 540 - struct qstr name; 541 564 int ret; 542 565 543 566 bch2_bkey_buf_init(&sk); ··· 564 567 if (k.k->type != KEY_TYPE_dirent) 565 568 continue; 566 569 567 - dirent = bkey_s_c_to_dirent(k); 570 + /* dir_emit() can fault and block: */ 571 + bch2_bkey_buf_reassemble(&sk, c, k); 572 + struct bkey_s_c_dirent dirent = bkey_i_to_s_c_dirent(sk.k); 568 573 569 574 ret = bch2_dirent_read_target(trans, inum, dirent, &target); 570 575 if (ret < 0) ··· 574 575 if (ret) 575 576 continue; 576 577 577 - /* dir_emit() can fault and block: */ 578 - bch2_bkey_buf_reassemble(&sk, c, k); 579 - dirent = bkey_i_to_s_c_dirent(sk.k); 580 - bch2_trans_unlock(trans); 581 - 582 - name = bch2_dirent_get_name(dirent); 583 - 584 - ctx->pos = dirent.k->p.offset; 585 - if (!dir_emit(ctx, name.name, 586 - name.len, 587 - target.inum, 588 - vfs_d_type(dirent.v->d_type))) 589 - break; 590 - ctx->pos = dirent.k->p.offset + 1; 591 - 592 578 /* 593 579 * read_target looks up subvolumes, we can overflow paths if the 594 580 * directory has many subvolumes in it 581 + * 582 + * XXX: btree_trans_too_many_iters() is something we'd like to 583 + * get rid of, and there's no good reason to be using it here 584 + * except that we don't yet have a for_each_btree_key() helper 585 + * that does subvolume_get_snapshot(). 595 586 */ 596 - ret = btree_trans_too_many_iters(trans); 597 - if (ret) 587 + ret = drop_locks_do(trans, 588 + bch2_dir_emit(ctx, dirent, target)) ?: 589 + btree_trans_too_many_iters(trans); 590 + if (ret) { 591 + ret = ret < 0 ? ret : 0; 598 592 break; 593 + } 599 594 } 600 595 bch2_trans_iter_exit(trans, &iter); 601 596 err:
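The readdir rework above buffers the dirent and then emits it via drop_locks_do(), since dir_emit() copies to a userspace buffer and may fault — something that must not happen while btree node locks are held. Roughly the shape of that idiom (the kernel version is a macro over the transaction; these are stubs and the error plumbing is simplified):

#include <stdio.h>

static void trans_unlock(void)
{
	/* release btree node locks */
}

static int trans_relock(void)
{
	return 0;	/* 0 on success, or a restart error */
}

/* Run a callback that may fault or block with no transaction locks
 * held, then retake the locks before returning to the btree code. */
static int drop_locks_do(int (*fn)(void *), void *arg)
{
	trans_unlock();
	int ret = fn(arg);
	int relock_err = trans_relock();

	return relock_err ? relock_err : ret;
}

static int emit(void *arg)
{
	return puts((const char *) arg) >= 0 ? 1 : 0;
}

int main(void)
{
	return drop_locks_do(emit, "example dirent") == 1 ? 0 : 1;
}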
+4 -4
fs/bcachefs/dirent.h
··· 4 4 5 5 #include "str_hash.h" 6 6 7 - enum bkey_invalid_flags; 7 + enum bch_validate_flags; 8 8 extern const struct bch_hash_desc bch2_dirent_hash_desc; 9 9 10 10 int bch2_dirent_invalid(struct bch_fs *, struct bkey_s_c, 11 - enum bkey_invalid_flags, struct printbuf *); 11 + enum bch_validate_flags, struct printbuf *); 12 12 void bch2_dirent_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 13 13 14 14 #define bch2_bkey_ops_dirent ((struct bkey_ops) { \ ··· 38 38 int bch2_dirent_create_snapshot(struct btree_trans *, u32, u64, u32, 39 39 const struct bch_hash_info *, u8, 40 40 const struct qstr *, u64, u64 *, 41 - bch_str_hash_flags_t); 41 + enum btree_iter_update_trigger_flags); 42 42 int bch2_dirent_create(struct btree_trans *, subvol_inum, 43 43 const struct bch_hash_info *, u8, 44 44 const struct qstr *, u64, u64 *, 45 - bch_str_hash_flags_t); 45 + enum btree_iter_update_trigger_flags); 46 46 47 47 static inline unsigned vfs_d_type(unsigned type) 48 48 {
+5 -6
fs/bcachefs/disk_groups.c
··· 18 18 strncmp(l->label, r->label, sizeof(l->label)); 19 19 } 20 20 21 - static int bch2_sb_disk_groups_validate(struct bch_sb *sb, 22 - struct bch_sb_field *f, 23 - struct printbuf *err) 21 + static int bch2_sb_disk_groups_validate(struct bch_sb *sb, struct bch_sb_field *f, 22 + enum bch_validate_flags flags, struct printbuf *err) 24 23 { 25 24 struct bch_sb_field_disk_groups *groups = 26 25 field_to_type(f, disk_groups); ··· 176 177 struct bch_member m = bch2_sb_member_get(c->disk_sb.sb, i); 177 178 struct bch_disk_group_cpu *dst; 178 179 179 - if (!bch2_member_exists(&m)) 180 + if (!bch2_member_alive(&m)) 180 181 continue; 181 182 182 183 g = BCH_MEMBER_GROUP(&m); ··· 522 523 ca = bch2_dev_lookup(c, val); 523 524 if (!IS_ERR(ca)) { 524 525 *res = dev_to_target(ca->dev_idx); 525 - percpu_ref_put(&ca->ref); 526 + bch2_dev_put(ca); 526 527 return 0; 527 528 } 528 529 ··· 587 588 case TARGET_DEV: { 588 589 struct bch_member m = bch2_sb_member_get(sb, t.dev); 589 590 590 - if (bch2_dev_exists(sb, t.dev)) { 591 + if (bch2_member_exists(sb, t.dev)) { 591 592 prt_printf(out, "Device "); 592 593 pr_uuid(out, m.uuid.b); 593 594 prt_printf(out, " (%u)", t.dev);
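dev_to_target() above packs a device index into the same small-integer target namespace used for labels and groups. A toy tagged encoding in that spirit — low bits for the target type, remaining bits for the payload; not necessarily bcachefs's exact bit layout:

#include <stdint.h>
#include <stdio.h>

enum target_type { TARGET_NULL, TARGET_DEV, TARGET_GROUP };

static uint16_t dev_to_target(unsigned dev)
{
	return (uint16_t) (TARGET_DEV | (dev << 2));
}

static enum target_type target_type(uint16_t t)
{
	return (enum target_type) (t & 3);
}

static unsigned target_dev(uint16_t t)
{
	return t >> 2;
}

int main(void)
{
	uint16_t t = dev_to_target(5);

	printf("type=%d dev=%u\n", target_type(t), target_dev(t));	/* type=1 dev=5 */
	return 0;
}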
+222 -193
fs/bcachefs/ec.c
··· 107 107 /* Stripes btree keys: */ 108 108 109 109 int bch2_stripe_invalid(struct bch_fs *c, struct bkey_s_c k, 110 - enum bkey_invalid_flags flags, 110 + enum bch_validate_flags flags, 111 111 struct printbuf *err) 112 112 { 113 113 const struct bch_stripe *s = bkey_s_c_to_stripe(k).v; ··· 163 163 164 164 /* Triggers: */ 165 165 166 - static int bch2_trans_mark_stripe_bucket(struct btree_trans *trans, 167 - struct bkey_s_c_stripe s, 168 - unsigned idx, bool deleting) 166 + static int __mark_stripe_bucket(struct btree_trans *trans, 167 + struct bch_dev *ca, 168 + struct bkey_s_c_stripe s, 169 + unsigned ptr_idx, bool deleting, 170 + struct bpos bucket, 171 + struct bch_alloc_v4 *a, 172 + enum btree_iter_update_trigger_flags flags) 169 173 { 170 - struct bch_fs *c = trans->c; 171 - const struct bch_extent_ptr *ptr = &s.v->ptrs[idx]; 172 - struct btree_iter iter; 173 - struct bkey_i_alloc_v4 *a; 174 - enum bch_data_type data_type = idx >= s.v->nr_blocks - s.v->nr_redundant 175 - ? BCH_DATA_parity : 0; 176 - s64 sectors = data_type ? le16_to_cpu(s.v->sectors) : 0; 177 - int ret = 0; 178 - 179 - if (deleting) 180 - sectors = -sectors; 181 - 182 - a = bch2_trans_start_alloc_update(trans, &iter, PTR_BUCKET_POS(c, ptr)); 183 - if (IS_ERR(a)) 184 - return PTR_ERR(a); 185 - 186 - ret = bch2_check_bucket_ref(trans, s.s_c, ptr, sectors, data_type, 187 - a->v.gen, a->v.data_type, 188 - a->v.dirty_sectors); 189 - if (ret) 190 - goto err; 191 - 192 - if (!deleting) { 193 - if (bch2_trans_inconsistent_on(a->v.stripe || 194 - a->v.stripe_redundancy, trans, 195 - "bucket %llu:%llu gen %u data type %s dirty_sectors %u: multiple stripes using same bucket (%u, %llu)", 196 - iter.pos.inode, iter.pos.offset, a->v.gen, 197 - bch2_data_type_str(a->v.data_type), 198 - a->v.dirty_sectors, 199 - a->v.stripe, s.k->p.offset)) { 200 - ret = -EIO; 201 - goto err; 202 - } 203 - 204 - if (bch2_trans_inconsistent_on(data_type && a->v.dirty_sectors, trans, 205 - "bucket %llu:%llu gen %u data type %s dirty_sectors %u: data already in stripe bucket %llu", 206 - iter.pos.inode, iter.pos.offset, a->v.gen, 207 - bch2_data_type_str(a->v.data_type), 208 - a->v.dirty_sectors, 209 - s.k->p.offset)) { 210 - ret = -EIO; 211 - goto err; 212 - } 213 - 214 - a->v.stripe = s.k->p.offset; 215 - a->v.stripe_redundancy = s.v->nr_redundant; 216 - a->v.data_type = BCH_DATA_stripe; 217 - } else { 218 - if (bch2_trans_inconsistent_on(a->v.stripe != s.k->p.offset || 219 - a->v.stripe_redundancy != s.v->nr_redundant, trans, 220 - "bucket %llu:%llu gen %u: not marked as stripe when deleting stripe %llu (got %u)", 221 - iter.pos.inode, iter.pos.offset, a->v.gen, 222 - s.k->p.offset, a->v.stripe)) { 223 - ret = -EIO; 224 - goto err; 225 - } 226 - 227 - a->v.stripe = 0; 228 - a->v.stripe_redundancy = 0; 229 - a->v.data_type = alloc_data_type(a->v, BCH_DATA_user); 230 - } 231 - 232 - a->v.dirty_sectors += sectors; 233 - if (data_type) 234 - a->v.data_type = !deleting ? 
data_type : 0; 235 - 236 - ret = bch2_trans_update(trans, &iter, &a->k_i, 0); 237 - if (ret) 238 - goto err; 239 - err: 240 - bch2_trans_iter_exit(trans, &iter); 241 - return ret; 242 - } 243 - 244 - static int mark_stripe_bucket(struct btree_trans *trans, 245 - struct bkey_s_c k, 246 - unsigned ptr_idx, 247 - unsigned flags) 248 - { 249 - struct bch_fs *c = trans->c; 250 - const struct bch_stripe *s = bkey_s_c_to_stripe(k).v; 251 - unsigned nr_data = s->nr_blocks - s->nr_redundant; 174 + const struct bch_extent_ptr *ptr = s.v->ptrs + ptr_idx; 175 + unsigned nr_data = s.v->nr_blocks - s.v->nr_redundant; 252 176 bool parity = ptr_idx >= nr_data; 253 177 enum bch_data_type data_type = parity ? BCH_DATA_parity : BCH_DATA_stripe; 254 - s64 sectors = parity ? le16_to_cpu(s->sectors) : 0; 255 - const struct bch_extent_ptr *ptr = s->ptrs + ptr_idx; 256 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 257 - struct bucket old, new, *g; 178 + s64 sectors = parity ? le16_to_cpu(s.v->sectors) : 0; 258 179 struct printbuf buf = PRINTBUF; 259 180 int ret = 0; 260 181 261 - BUG_ON(!(flags & BTREE_TRIGGER_GC)); 182 + struct bch_fs *c = trans->c; 183 + if (deleting) 184 + sectors = -sectors; 262 185 263 - /* * XXX doesn't handle deletion */ 186 + if (!deleting) { 187 + if (bch2_trans_inconsistent_on(a->stripe || 188 + a->stripe_redundancy, trans, 189 + "bucket %llu:%llu gen %u data type %s dirty_sectors %u: multiple stripes using same bucket (%u, %llu)\n%s", 190 + bucket.inode, bucket.offset, a->gen, 191 + bch2_data_type_str(a->data_type), 192 + a->dirty_sectors, 193 + a->stripe, s.k->p.offset, 194 + (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 195 + ret = -EIO; 196 + goto err; 197 + } 264 198 265 - percpu_down_read(&c->mark_lock); 266 - g = PTR_GC_BUCKET(ca, ptr); 199 + if (bch2_trans_inconsistent_on(parity && bch2_bucket_sectors_total(*a), trans, 200 + "bucket %llu:%llu gen %u data type %s dirty_sectors %u cached_sectors %u: data already in parity bucket\n%s", 201 + bucket.inode, bucket.offset, a->gen, 202 + bch2_data_type_str(a->data_type), 203 + a->dirty_sectors, 204 + a->cached_sectors, 205 + (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 206 + ret = -EIO; 207 + goto err; 208 + } 209 + } else { 210 + if (bch2_trans_inconsistent_on(a->stripe != s.k->p.offset || 211 + a->stripe_redundancy != s.v->nr_redundant, trans, 212 + "bucket %llu:%llu gen %u: not marked as stripe when deleting stripe (got %u)\n%s", 213 + bucket.inode, bucket.offset, a->gen, 214 + a->stripe, 215 + (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 216 + ret = -EIO; 217 + goto err; 218 + } 267 219 268 - if (g->dirty_sectors || 269 - (g->stripe && g->stripe != k.k->p.offset)) { 270 - bch2_fs_inconsistent(c, 271 - "bucket %u:%zu gen %u: multiple stripes using same bucket\n%s", 272 - ptr->dev, PTR_BUCKET_NR(ca, ptr), g->gen, 273 - (bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 274 - ret = -EINVAL; 275 - goto err; 220 + if (bch2_trans_inconsistent_on(a->data_type != data_type, trans, 221 + "bucket %llu:%llu gen %u data type %s: wrong data type when stripe, should be %s\n%s", 222 + bucket.inode, bucket.offset, a->gen, 223 + bch2_data_type_str(a->data_type), 224 + bch2_data_type_str(data_type), 225 + (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 226 + ret = -EIO; 227 + goto err; 228 + } 229 + 230 + if (bch2_trans_inconsistent_on(parity && 231 + (a->dirty_sectors != -sectors || 232 + a->cached_sectors), trans, 233 + "bucket %llu:%llu gen %u dirty_sectors %u cached_sectors %u: wrong sectors when deleting 
parity block of stripe\n%s", 234 + bucket.inode, bucket.offset, a->gen, 235 + a->dirty_sectors, 236 + a->cached_sectors, 237 + (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 238 + ret = -EIO; 239 + goto err; 240 + } 276 241 } 277 242 278 - bucket_lock(g); 279 - old = *g; 243 + if (sectors) { 244 + ret = bch2_bucket_ref_update(trans, ca, s.s_c, ptr, sectors, data_type, 245 + a->gen, a->data_type, &a->dirty_sectors); 246 + if (ret) 247 + goto err; 248 + } 280 249 281 - ret = bch2_check_bucket_ref(trans, k, ptr, sectors, data_type, 282 - g->gen, g->data_type, 283 - g->dirty_sectors); 284 - if (ret) 285 - goto err; 250 + if (!deleting) { 251 + a->stripe = s.k->p.offset; 252 + a->stripe_redundancy = s.v->nr_redundant; 253 + } else { 254 + a->stripe = 0; 255 + a->stripe_redundancy = 0; 256 + } 286 257 287 - g->data_type = data_type; 288 - g->dirty_sectors += sectors; 289 - 290 - g->stripe = k.k->p.offset; 291 - g->stripe_redundancy = s->nr_redundant; 292 - new = *g; 258 + alloc_data_type_set(a, data_type); 293 259 err: 294 - bucket_unlock(g); 295 - if (!ret) 296 - bch2_dev_usage_update_m(c, ca, &old, &new); 297 - percpu_up_read(&c->mark_lock); 298 260 printbuf_exit(&buf); 299 261 return ret; 300 262 } 301 263 264 + static int mark_stripe_bucket(struct btree_trans *trans, 265 + struct bkey_s_c_stripe s, 266 + unsigned ptr_idx, bool deleting, 267 + enum btree_iter_update_trigger_flags flags) 268 + { 269 + struct bch_fs *c = trans->c; 270 + const struct bch_extent_ptr *ptr = s.v->ptrs + ptr_idx; 271 + int ret = 0; 272 + 273 + struct bch_dev *ca = bch2_dev_tryget(c, ptr->dev); 274 + if (unlikely(!ca)) { 275 + if (!(flags & BTREE_TRIGGER_overwrite)) 276 + ret = -EIO; 277 + goto err; 278 + } 279 + 280 + struct bpos bucket = PTR_BUCKET_POS(ca, ptr); 281 + 282 + if (flags & BTREE_TRIGGER_transactional) { 283 + struct bkey_i_alloc_v4 *a = 284 + bch2_trans_start_alloc_update(trans, bucket); 285 + ret = PTR_ERR_OR_ZERO(a) ?: 286 + __mark_stripe_bucket(trans, ca, s, ptr_idx, deleting, bucket, &a->v, flags); 287 + } 288 + 289 + if (flags & BTREE_TRIGGER_gc) { 290 + percpu_down_read(&c->mark_lock); 291 + struct bucket *g = gc_bucket(ca, bucket.offset); 292 + bucket_lock(g); 293 + struct bch_alloc_v4 old = bucket_m_to_alloc(*g), new = old; 294 + ret = __mark_stripe_bucket(trans, ca, s, ptr_idx, deleting, bucket, &new, flags); 295 + if (!ret) { 296 + alloc_to_bucket(g, new); 297 + bch2_dev_usage_update(c, ca, &old, &new, 0, true); 298 + } 299 + bucket_unlock(g); 300 + percpu_up_read(&c->mark_lock); 301 + } 302 + err: 303 + bch2_dev_put(ca); 304 + return ret; 305 + } 306 + 307 + static int mark_stripe_buckets(struct btree_trans *trans, 308 + struct bkey_s_c old, struct bkey_s_c new, 309 + enum btree_iter_update_trigger_flags flags) 310 + { 311 + const struct bch_stripe *old_s = old.k->type == KEY_TYPE_stripe 312 + ? bkey_s_c_to_stripe(old).v : NULL; 313 + const struct bch_stripe *new_s = new.k->type == KEY_TYPE_stripe 314 + ? bkey_s_c_to_stripe(new).v : NULL; 315 + 316 + BUG_ON(old_s && new_s && old_s->nr_blocks != new_s->nr_blocks); 317 + 318 + unsigned nr_blocks = new_s ? 
new_s->nr_blocks : old_s->nr_blocks; 319 + 320 + for (unsigned i = 0; i < nr_blocks; i++) { 321 + if (new_s && old_s && 322 + !memcmp(&new_s->ptrs[i], 323 + &old_s->ptrs[i], 324 + sizeof(new_s->ptrs[i]))) 325 + continue; 326 + 327 + if (new_s) { 328 + int ret = mark_stripe_bucket(trans, 329 + bkey_s_c_to_stripe(new), i, false, flags); 330 + if (ret) 331 + return ret; 332 + } 333 + 334 + if (old_s) { 335 + int ret = mark_stripe_bucket(trans, 336 + bkey_s_c_to_stripe(old), i, true, flags); 337 + if (ret) 338 + return ret; 339 + } 340 + } 341 + 342 + return 0; 343 + } 344 + 302 345 int bch2_trigger_stripe(struct btree_trans *trans, 303 - enum btree_id btree_id, unsigned level, 346 + enum btree_id btree, unsigned level, 304 347 struct bkey_s_c old, struct bkey_s _new, 305 - unsigned flags) 348 + enum btree_iter_update_trigger_flags flags) 306 349 { 307 350 struct bkey_s_c new = _new.s_c; 308 351 struct bch_fs *c = trans->c; ··· 355 312 const struct bch_stripe *new_s = new.k->type == KEY_TYPE_stripe 356 313 ? bkey_s_c_to_stripe(new).v : NULL; 357 314 358 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 315 + if (unlikely(flags & BTREE_TRIGGER_check_repair)) 316 + return bch2_check_fix_ptrs(trans, btree, level, _new.s_c, flags); 317 + 318 + if (flags & BTREE_TRIGGER_transactional) { 359 319 /* 360 320 * If the pointers aren't changing, we don't need to do anything: 361 321 */ ··· 393 347 return ret; 394 348 } 395 349 396 - unsigned nr_blocks = new_s ? new_s->nr_blocks : old_s->nr_blocks; 397 - for (unsigned i = 0; i < nr_blocks; i++) { 398 - if (new_s && old_s && 399 - !memcmp(&new_s->ptrs[i], 400 - &old_s->ptrs[i], 401 - sizeof(new_s->ptrs[i]))) 402 - continue; 403 - 404 - if (new_s) { 405 - int ret = bch2_trans_mark_stripe_bucket(trans, 406 - bkey_s_c_to_stripe(new), i, false); 407 - if (ret) 408 - return ret; 409 - } 410 - 411 - if (old_s) { 412 - int ret = bch2_trans_mark_stripe_bucket(trans, 413 - bkey_s_c_to_stripe(old), i, true); 414 - if (ret) 415 - return ret; 416 - } 417 - } 350 + int ret = mark_stripe_buckets(trans, old, new, flags); 351 + if (ret) 352 + return ret; 418 353 } 419 354 420 - if (flags & BTREE_TRIGGER_ATOMIC) { 355 + if (flags & BTREE_TRIGGER_atomic) { 421 356 struct stripe *m = genradix_ptr(&c->stripes, idx); 422 357 423 358 if (!m) { ··· 437 410 } 438 411 } 439 412 440 - if (flags & BTREE_TRIGGER_GC) { 413 + if (flags & BTREE_TRIGGER_gc) { 441 414 struct gc_stripe *m = 442 415 genradix_ptr_alloc(&c->gc_stripes, idx, GFP_KERNEL); 443 416 ··· 466 439 */ 467 440 memset(m->block_sectors, 0, sizeof(m->block_sectors)); 468 441 469 - for (unsigned i = 0; i < new_s->nr_blocks; i++) { 470 - int ret = mark_stripe_bucket(trans, new, i, flags); 471 - if (ret) 472 - return ret; 473 - } 442 + int ret = mark_stripe_buckets(trans, old, new, flags); 443 + if (ret) 444 + return ret; 474 445 475 - int ret = bch2_update_replicas(c, new, &m->r.e, 446 + ret = bch2_update_replicas(c, new, &m->r.e, 476 447 ((s64) m->sectors * m->nr_redundant), 477 448 0, true); 478 449 if (ret) { ··· 633 608 struct bch_csum got = ec_block_checksum(buf, i, offset); 634 609 635 610 if (bch2_crc_cmp(want, got)) { 636 - struct printbuf err = PRINTBUF; 637 - struct bch_dev *ca = bch_dev_bkey_exists(c, v->ptrs[i].dev); 611 + struct bch_dev *ca = bch2_dev_tryget(c, v->ptrs[i].dev); 612 + if (ca) { 613 + struct printbuf err = PRINTBUF; 638 614 639 - prt_str(&err, "stripe "); 640 - bch2_csum_err_msg(&err, v->csum_type, want, got); 641 - prt_printf(&err, " for %ps at %u of\n ", (void *) _RET_IP_, i); 642 - 
bch2_bkey_val_to_text(&err, c, bkey_i_to_s_c(&buf->key)); 643 - bch_err_ratelimited(ca, "%s", err.buf); 644 - printbuf_exit(&err); 615 + prt_str(&err, "stripe "); 616 + bch2_csum_err_msg(&err, v->csum_type, want, got); 617 + prt_printf(&err, " for %ps at %u of\n ", (void *) _RET_IP_, i); 618 + bch2_bkey_val_to_text(&err, c, bkey_i_to_s_c(&buf->key)); 619 + bch_err_ratelimited(ca, "%s", err.buf); 620 + printbuf_exit(&err); 621 + 622 + bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); 623 + } 645 624 646 625 clear_bit(i, buf->valid); 647 - 648 - bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); 649 626 break; 650 627 } 651 628 ··· 714 687 bch2_blk_status_to_str(bio->bi_status))) 715 688 clear_bit(ec_bio->idx, ec_bio->buf->valid); 716 689 717 - if (ptr_stale(ca, ptr)) { 690 + if (dev_ptr_stale(ca, ptr)) { 718 691 bch_err_ratelimited(ca->fs, 719 692 "error %s stripe: stale pointer after io", 720 693 bio_data_dir(bio) == READ ? "reading from" : "writing to"); ··· 732 705 struct bch_stripe *v = &bkey_i_to_stripe(&buf->key)->v; 733 706 unsigned offset = 0, bytes = buf->size << 9; 734 707 struct bch_extent_ptr *ptr = &v->ptrs[idx]; 735 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 736 708 enum bch_data_type data_type = idx < v->nr_blocks - v->nr_redundant 737 709 ? BCH_DATA_user 738 710 : BCH_DATA_parity; 739 711 int rw = op_is_write(opf); 740 712 741 - if (ptr_stale(ca, ptr)) { 713 + struct bch_dev *ca = bch2_dev_get_ioref(c, ptr->dev, rw); 714 + if (!ca) { 715 + clear_bit(idx, buf->valid); 716 + return; 717 + } 718 + 719 + if (dev_ptr_stale(ca, ptr)) { 742 720 bch_err_ratelimited(c, 743 721 "error %s stripe: stale pointer", 744 722 rw == READ ? "reading from" : "writing to"); ··· 751 719 return; 752 720 } 753 721 754 - if (!bch2_dev_get_ioref(ca, rw)) { 755 - clear_bit(idx, buf->valid); 756 - return; 757 - } 758 722 759 723 this_cpu_add(ca->io_done->sectors[rw][data_type], buf->size); 760 724 ··· 797 769 int ret; 798 770 799 771 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_stripes, 800 - POS(0, idx), BTREE_ITER_SLOTS); 772 + POS(0, idx), BTREE_ITER_slots); 801 773 ret = bkey_err(k); 802 774 if (ret) 803 775 goto err; ··· 1088 1060 int ret; 1089 1061 1090 1062 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_stripes, POS(0, idx), 1091 - BTREE_ITER_INTENT); 1063 + BTREE_ITER_intent); 1092 1064 ret = bkey_err(k); 1093 1065 if (ret) 1094 1066 goto err; ··· 1159 1131 int ret; 1160 1132 1161 1133 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_stripes, 1162 - new->k.p, BTREE_ITER_INTENT); 1134 + new->k.p, BTREE_ITER_intent); 1163 1135 ret = bkey_err(k); 1164 1136 if (ret) 1165 1137 goto err; ··· 1201 1173 } 1202 1174 1203 1175 static int ec_stripe_update_extent(struct btree_trans *trans, 1176 + struct bch_dev *ca, 1204 1177 struct bpos bucket, u8 gen, 1205 1178 struct ec_stripe_buf *s, 1206 1179 struct bpos *bp_pos) ··· 1212 1183 struct btree_iter iter; 1213 1184 struct bkey_s_c k; 1214 1185 const struct bch_extent_ptr *ptr_c; 1215 - struct bch_extent_ptr *ptr, *ec_ptr = NULL; 1186 + struct bch_extent_ptr *ec_ptr = NULL; 1216 1187 struct bch_extent_stripe_ptr stripe_ptr; 1217 1188 struct bkey_i *n; 1218 1189 int ret, dev, block; 1219 1190 1220 - ret = bch2_get_next_backpointer(trans, bucket, gen, 1221 - bp_pos, &bp, BTREE_ITER_CACHED); 1191 + ret = bch2_get_next_backpointer(trans, ca, bucket, gen, 1192 + bp_pos, &bp, BTREE_ITER_cached); 1222 1193 if (ret) 1223 1194 return ret; 1224 1195 if (bpos_eq(*bp_pos, SPOS_MAX)) ··· 1243 1214 return -EIO; 1244 1215 } 1245 1216 1246 - k = 
bch2_backpointer_get_key(trans, &iter, *bp_pos, bp, BTREE_ITER_INTENT); 1217 + k = bch2_backpointer_get_key(trans, &iter, *bp_pos, bp, BTREE_ITER_intent); 1247 1218 ret = bkey_err(k); 1248 1219 if (ret) 1249 1220 return ret; ··· 1301 1272 { 1302 1273 struct bch_fs *c = trans->c; 1303 1274 struct bch_stripe *v = &bkey_i_to_stripe(&s->key)->v; 1304 - struct bch_extent_ptr bucket = v->ptrs[block]; 1305 - struct bpos bucket_pos = PTR_BUCKET_POS(c, &bucket); 1275 + struct bch_extent_ptr ptr = v->ptrs[block]; 1306 1276 struct bpos bp_pos = POS_MIN; 1307 1277 int ret = 0; 1278 + 1279 + struct bch_dev *ca = bch2_dev_tryget(c, ptr.dev); 1280 + if (!ca) 1281 + return -EIO; 1282 + 1283 + struct bpos bucket_pos = PTR_BUCKET_POS(ca, &ptr); 1308 1284 1309 1285 while (1) { 1310 1286 ret = commit_do(trans, NULL, NULL, 1311 1287 BCH_TRANS_COMMIT_no_check_rw| 1312 1288 BCH_TRANS_COMMIT_no_enospc, 1313 - ec_stripe_update_extent(trans, bucket_pos, bucket.gen, 1314 - s, &bp_pos)); 1289 + ec_stripe_update_extent(trans, ca, bucket_pos, ptr.gen, s, &bp_pos)); 1315 1290 if (ret) 1316 1291 break; 1317 1292 if (bkey_eq(bp_pos, POS_MAX)) ··· 1324 1291 bp_pos = bpos_nosnap_successor(bp_pos); 1325 1292 } 1326 1293 1294 + bch2_dev_put(ca); 1327 1295 return ret; 1328 1296 } 1329 1297 ··· 1355 1321 unsigned block, 1356 1322 struct open_bucket *ob) 1357 1323 { 1358 - struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev); 1359 - unsigned offset = ca->mi.bucket_size - ob->sectors_free; 1360 - int ret; 1361 - 1362 - if (!bch2_dev_get_ioref(ca, WRITE)) { 1324 + struct bch_dev *ca = bch2_dev_get_ioref(c, ob->dev, WRITE); 1325 + if (!ca) { 1363 1326 s->err = -BCH_ERR_erofs_no_writes; 1364 1327 return; 1365 1328 } 1366 1329 1330 + unsigned offset = ca->mi.bucket_size - ob->sectors_free; 1367 1331 memset(s->new_stripe.data[block] + (offset << 9), 1368 1332 0, 1369 1333 ob->sectors_free << 9); 1370 1334 1371 - ret = blkdev_issue_zeroout(ca->disk_sb.bdev, 1335 + int ret = blkdev_issue_zeroout(ca->disk_sb.bdev, 1372 1336 ob->bucket * ca->mi.bucket_size + offset, 1373 1337 ob->sectors_free, 1374 1338 GFP_KERNEL, 0); ··· 1551 1519 void *bch2_writepoint_ec_buf(struct bch_fs *c, struct write_point *wp) 1552 1520 { 1553 1521 struct open_bucket *ob = ec_open_bucket(c, &wp->ptrs); 1554 - struct bch_dev *ca; 1555 - unsigned offset; 1556 - 1557 1522 if (!ob) 1558 1523 return NULL; 1559 1524 1560 1525 BUG_ON(!ob->ec->new_stripe.data[ob->ec_idx]); 1561 1526 1562 - ca = bch_dev_bkey_exists(c, ob->dev); 1563 - offset = ca->mi.bucket_size - ob->sectors_free; 1527 + struct bch_dev *ca = ob_dev(c, ob); 1528 + unsigned offset = ca->mi.bucket_size - ob->sectors_free; 1564 1529 1565 1530 return ob->ec->new_stripe.data[ob->ec_idx] + (offset << 9); 1566 1531 } ··· 1966 1937 } 1967 1938 1968 1939 for_each_btree_key_norestart(trans, iter, BTREE_ID_stripes, start_pos, 1969 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT, k, ret) { 1940 + BTREE_ITER_slots|BTREE_ITER_intent, k, ret) { 1970 1941 if (bkey_gt(k.k->p, POS(0, U32_MAX))) { 1971 1942 if (start_pos.offset) { 1972 1943 start_pos = min_pos; ··· 2156 2127 { 2157 2128 int ret = bch2_trans_run(c, 2158 2129 for_each_btree_key(trans, iter, BTREE_ID_stripes, POS_MIN, 2159 - BTREE_ITER_PREFETCH, k, ({ 2130 + BTREE_ITER_prefetch, k, ({ 2160 2131 if (k.k->type != KEY_TYPE_stripe) 2161 2132 continue; 2162 2133
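Note: the shape of the ec.c rewrite is easiest to see in mark_stripe_bucket(). Where the old code carried two divergent copies of the stripe-bucket accounting (a transactional one, and a GC-only one that didn't handle deletion), the new code funnels both through __mark_stripe_bucket(), which only ever sees a struct bch_alloc_v4. A condensed restatement of the dispatch, with locking and error paths trimmed; every name here appears in the hunk above:

	if (flags & BTREE_TRIGGER_transactional) {
		/* btree world: alloc key staged in the transaction */
		struct bkey_i_alloc_v4 *a =
			bch2_trans_start_alloc_update(trans, bucket);
		ret = PTR_ERR_OR_ZERO(a) ?:
			__mark_stripe_bucket(trans, ca, s, ptr_idx, deleting,
					     bucket, &a->v, flags);
	}

	if (flags & BTREE_TRIGGER_gc) {
		/* gc world: the in-memory bucket, converted to the same
		 * struct so the checks are written only once */
		struct bucket *g = gc_bucket(ca, bucket.offset);
		struct bch_alloc_v4 old = bucket_m_to_alloc(*g), new = old;
		ret = __mark_stripe_bucket(trans, ca, s, ptr_idx, deleting,
					   bucket, &new, flags);
		if (!ret)
			alloc_to_bucket(g, new);	/* write the result back */
	}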
+4 -3
fs/bcachefs/ec.h
··· 6 6 #include "buckets_types.h" 7 7 #include "extents_types.h" 8 8 9 - enum bkey_invalid_flags; 9 + enum bch_validate_flags; 10 10 11 11 int bch2_stripe_invalid(struct bch_fs *, struct bkey_s_c, 12 - enum bkey_invalid_flags, struct printbuf *); 12 + enum bch_validate_flags, struct printbuf *); 13 13 void bch2_stripe_to_text(struct printbuf *, struct bch_fs *, 14 14 struct bkey_s_c); 15 15 int bch2_trigger_stripe(struct btree_trans *, enum btree_id, unsigned, 16 - struct bkey_s_c, struct bkey_s, unsigned); 16 + struct bkey_s_c, struct bkey_s, 17 + enum btree_iter_update_trigger_flags); 17 18 18 19 #define bch2_bkey_ops_stripe ((struct bkey_ops) { \ 19 20 .key_invalid = bch2_stripe_invalid, \
+48 -11
fs/bcachefs/error.c
··· 176 176 return s; 177 177 } 178 178 179 + /* s/fix?/fixing/ s/recreate?/recreating/ */ 180 + static void prt_actioning(struct printbuf *out, const char *action) 181 + { 182 + unsigned len = strlen(action); 183 + 184 + BUG_ON(action[len - 1] != '?'); 185 + --len; 186 + 187 + if (action[len - 1] == 'e') 188 + --len; 189 + 190 + prt_bytes(out, action, len); 191 + prt_str(out, "ing"); 192 + } 193 + 179 194 int bch2_fsck_err(struct bch_fs *c, 180 195 enum bch_fsck_flags flags, 181 196 enum bch_sb_error_id err, ··· 201 186 bool print = true, suppressing = false, inconsistent = false; 202 187 struct printbuf buf = PRINTBUF, *out = &buf; 203 188 int ret = -BCH_ERR_fsck_ignore; 189 + const char *action_orig = "fix?", *action = action_orig; 204 190 205 191 if ((flags & FSCK_CAN_FIX) && 206 192 test_bit(err, c->sb.errors_silent)) ··· 212 196 va_start(args, fmt); 213 197 prt_vprintf(out, fmt, args); 214 198 va_end(args); 199 + 200 + /* Custom fix/continue/recreate/etc.? */ 201 + if (out->buf[out->pos - 1] == '?') { 202 + const char *p = strrchr(out->buf, ','); 203 + if (p) { 204 + out->pos = p - out->buf; 205 + action = kstrdup(p + 2, GFP_KERNEL); 206 + if (!action) { 207 + ret = -ENOMEM; 208 + goto err; 209 + } 210 + } 211 + } 215 212 216 213 mutex_lock(&c->fsck_error_msgs_lock); 217 214 s = fsck_err_get(c, fmt); ··· 237 208 if (s->last_msg && !strcmp(buf.buf, s->last_msg)) { 238 209 ret = s->ret; 239 210 mutex_unlock(&c->fsck_error_msgs_lock); 240 - printbuf_exit(&buf); 241 - return ret; 211 + goto err; 242 212 } 243 213 244 214 kfree(s->last_msg); 245 215 s->last_msg = kstrdup(buf.buf, GFP_KERNEL); 216 + if (!s->last_msg) { 217 + mutex_unlock(&c->fsck_error_msgs_lock); 218 + ret = -ENOMEM; 219 + goto err; 220 + } 246 221 247 222 if (c->opts.ratelimit_errors && 248 223 !(flags & FSCK_NO_RATELIMIT) && ··· 272 239 inconsistent = true; 273 240 ret = -BCH_ERR_fsck_errors_not_fixed; 274 241 } else if (flags & FSCK_CAN_FIX) { 275 - prt_str(out, ", fixing"); 242 + prt_str(out, ", "); 243 + prt_actioning(out, action); 276 244 ret = -BCH_ERR_fsck_fix; 277 245 } else { 278 246 prt_str(out, ", continuing"); ··· 288 254 : c->opts.fix_errors; 289 255 290 256 if (fix == FSCK_FIX_ask) { 291 - int ask; 257 + prt_str(out, ", "); 258 + prt_str(out, action); 292 259 293 - prt_str(out, ": fix?"); 294 260 if (bch2_fs_stdio_redirect(c)) 295 261 bch2_print(c, "%s", out->buf); 296 262 else 297 263 bch2_print_string_as_lines(KERN_ERR, out->buf); 298 264 print = false; 299 265 300 - ask = bch2_fsck_ask_yn(c); 266 + int ask = bch2_fsck_ask_yn(c); 301 267 302 268 if (ask >= YN_ALLNO && s) 303 269 s->fix = ask == YN_ALLNO ··· 310 276 } else if (fix == FSCK_FIX_yes || 311 277 (c->opts.nochanges && 312 278 !(flags & FSCK_CAN_IGNORE))) { 313 - prt_str(out, ", fixing"); 279 + prt_str(out, ", "); 280 + prt_actioning(out, action); 314 281 ret = -BCH_ERR_fsck_fix; 315 282 } else { 316 - prt_str(out, ", not fixing"); 283 + prt_str(out, ", not "); 284 + prt_actioning(out, action); 317 285 } 318 286 } else if (flags & FSCK_NEED_FSCK) { 319 287 prt_str(out, " (run fsck to correct)"); ··· 347 311 348 312 mutex_unlock(&c->fsck_error_msgs_lock); 349 313 350 - printbuf_exit(&buf); 351 - 352 314 if (inconsistent) 353 315 bch2_inconsistent_error(c); 354 316 ··· 356 322 set_bit(BCH_FS_errors_not_fixed, &c->flags); 357 323 set_bit(BCH_FS_error, &c->flags); 358 324 } 359 - 325 + err: 326 + if (action != action_orig) 327 + kfree(action); 328 + printbuf_exit(&buf); 360 329 return ret; 361 330 } 362 331
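Note: the error.c change lets an fsck error message carry its own action verb. A format string ending in '?' has everything after its last ", " peeled off as the action (defaulting to "fix?"), so "directory missing, recreate?" becomes the literal prompt in FSCK_FIX_ask mode, ", recreating" when the repair runs, and ", not recreating" when it doesn't. prt_actioning() derives the progressive form by dropping the '?', dropping a trailing 'e' if present, and appending "ing". A standalone restatement of that transform, for illustration only:

	#include <stdio.h>
	#include <string.h>

	/* "fix?" -> "fixing", "recreate?" -> "recreating", "delete?" -> "deleting" */
	static void actioning(const char *action, char *out, size_t outlen)
	{
		size_t len = strlen(action) - 1;	/* drop the '?' */

		if (len && action[len - 1] == 'e')
			--len;				/* drop a trailing 'e' */

		snprintf(out, outlen, "%.*sing", (int) len, action);
	}

	int main(void)
	{
		char buf[32];

		actioning("recreate?", buf, sizeof(buf));
		puts(buf);				/* prints "recreating" */
		return 0;
	}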
+1 -1
fs/bcachefs/extent_update.c
··· 72 72 73 73 for_each_btree_key_norestart(trans, iter, 74 74 BTREE_ID_reflink, POS(0, idx + offset), 75 - BTREE_ITER_SLOTS, r_k, ret2) { 75 + BTREE_ITER_slots, r_k, ret2) { 76 76 if (bkey_ge(bkey_start_pos(r_k.k), POS(0, idx + sectors))) 77 77 break; 78 78
+86 -65
fs/bcachefs/extents.c
··· 71 71 } 72 72 } 73 73 74 + static inline u64 dev_latency(struct bch_fs *c, unsigned dev) 75 + { 76 + struct bch_dev *ca = bch2_dev_rcu(c, dev); 77 + return ca ? atomic64_read(&ca->cur_latency[READ]) : S64_MAX; 78 + } 79 + 74 80 /* 75 81 * returns true if p1 is better than p2: 76 82 */ ··· 85 79 const struct extent_ptr_decoded p2) 86 80 { 87 81 if (likely(!p1.idx && !p2.idx)) { 88 - struct bch_dev *dev1 = bch_dev_bkey_exists(c, p1.ptr.dev); 89 - struct bch_dev *dev2 = bch_dev_bkey_exists(c, p2.ptr.dev); 90 - 91 - u64 l1 = atomic64_read(&dev1->cur_latency[READ]); 92 - u64 l2 = atomic64_read(&dev2->cur_latency[READ]); 82 + u64 l1 = dev_latency(c, p1.ptr.dev); 83 + u64 l2 = dev_latency(c, p2.ptr.dev); 93 84 94 85 /* Pick at random, biased in favor of the faster device: */ 95 86 ··· 112 109 const union bch_extent_entry *entry; 113 110 struct extent_ptr_decoded p; 114 111 struct bch_dev_io_failures *f; 115 - struct bch_dev *ca; 116 112 int ret = 0; 117 113 118 114 if (k.k->type == KEY_TYPE_error) 119 115 return -EIO; 120 116 117 + rcu_read_lock(); 121 118 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { 122 119 /* 123 120 * Unwritten extent: no need to actually read, treat it as a 124 121 * hole and return 0s: 125 122 */ 126 - if (p.ptr.unwritten) 127 - return 0; 128 - 129 - ca = bch_dev_bkey_exists(c, p.ptr.dev); 123 + if (p.ptr.unwritten) { 124 + ret = 0; 125 + break; 126 + } 130 127 131 128 /* 132 129 * If there are any dirty pointers it's an error if we can't ··· 135 132 if (!ret && !p.ptr.cached) 136 133 ret = -EIO; 137 134 138 - if (p.ptr.cached && ptr_stale(ca, &p.ptr)) 135 + struct bch_dev *ca = bch2_dev_rcu(c, p.ptr.dev); 136 + 137 + if (p.ptr.cached && (!ca || dev_ptr_stale(ca, &p.ptr))) 139 138 continue; 140 139 141 140 f = failed ? dev_io_failures(failed, p.ptr.dev) : NULL; ··· 146 141 ? 
f->idx 147 142 : f->idx + 1; 148 143 149 - if (!p.idx && 150 - !bch2_dev_is_readable(ca)) 144 + if (!p.idx && !ca) 151 145 p.idx++; 152 146 153 - if (bch2_force_reconstruct_read && 154 - !p.idx && p.has_ec) 147 + if (!p.idx && p.has_ec && bch2_force_reconstruct_read) 148 + p.idx++; 149 + 150 + if (!p.idx && !bch2_dev_is_readable(ca)) 155 151 p.idx++; 156 152 157 153 if (p.idx >= (unsigned) p.has_ec + 1) ··· 164 158 *pick = p; 165 159 ret = 1; 166 160 } 161 + rcu_read_unlock(); 167 162 168 163 return ret; 169 164 } ··· 172 165 /* KEY_TYPE_btree_ptr: */ 173 166 174 167 int bch2_btree_ptr_invalid(struct bch_fs *c, struct bkey_s_c k, 175 - enum bkey_invalid_flags flags, 168 + enum bch_validate_flags flags, 176 169 struct printbuf *err) 177 170 { 178 171 int ret = 0; ··· 193 186 } 194 187 195 188 int bch2_btree_ptr_v2_invalid(struct bch_fs *c, struct bkey_s_c k, 196 - enum bkey_invalid_flags flags, 189 + enum bch_validate_flags flags, 197 190 struct printbuf *err) 198 191 { 199 192 struct bkey_s_c_btree_ptr_v2 bp = bkey_s_c_to_btree_ptr_v2(k); ··· 207 200 bkey_fsck_err_on(bpos_ge(bp.v->min_key, bp.k->p), 208 201 c, err, btree_ptr_v2_min_key_bad, 209 202 "min_key > key"); 203 + 204 + if (flags & BCH_VALIDATE_write) 205 + bkey_fsck_err_on(!bp.v->sectors_written, 206 + c, err, btree_ptr_v2_written_0, 207 + "sectors_written == 0"); 210 208 211 209 ret = bch2_bkey_ptrs_invalid(c, k, flags, err); 212 210 fsck_err: ··· 259 247 const union bch_extent_entry *en_r; 260 248 struct extent_ptr_decoded lp, rp; 261 249 bool use_right_ptr; 262 - struct bch_dev *ca; 263 250 264 251 en_l = l_ptrs.start; 265 252 en_r = r_ptrs.start; ··· 289 278 return false; 290 279 291 280 /* Extents may not straddle buckets: */ 292 - ca = bch_dev_bkey_exists(c, lp.ptr.dev); 293 - if (PTR_BUCKET_NR(ca, &lp.ptr) != PTR_BUCKET_NR(ca, &rp.ptr)) 281 + rcu_read_lock(); 282 + struct bch_dev *ca = bch2_dev_rcu(c, lp.ptr.dev); 283 + bool same_bucket = ca && PTR_BUCKET_NR(ca, &lp.ptr) == PTR_BUCKET_NR(ca, &rp.ptr); 284 + rcu_read_unlock(); 285 + 286 + if (!same_bucket) 294 287 return false; 295 288 296 289 if (lp.has_ec != rp.has_ec || ··· 400 385 /* KEY_TYPE_reservation: */ 401 386 402 387 int bch2_reservation_invalid(struct bch_fs *c, struct bkey_s_c k, 403 - enum bkey_invalid_flags flags, 388 + enum bch_validate_flags flags, 404 389 struct printbuf *err) 405 390 { 406 391 struct bkey_s_c_reservation r = bkey_s_c_to_reservation(k); ··· 682 667 683 668 unsigned bch2_extent_ptr_desired_durability(struct bch_fs *c, struct extent_ptr_decoded *p) 684 669 { 685 - struct bch_dev *ca = bch_dev_bkey_exists(c, p->ptr.dev); 670 + struct bch_dev *ca = bch2_dev_rcu(c, p->ptr.dev); 686 671 687 - return __extent_ptr_durability(ca, p); 672 + return ca ? 
__extent_ptr_durability(ca, p) : 0; 688 673 } 689 674 690 675 unsigned bch2_extent_ptr_durability(struct bch_fs *c, struct extent_ptr_decoded *p) 691 676 { 692 - struct bch_dev *ca = bch_dev_bkey_exists(c, p->ptr.dev); 677 + struct bch_dev *ca = bch2_dev_rcu(c, p->ptr.dev); 693 678 694 - if (ca->mi.state == BCH_MEMBER_STATE_failed) 679 + if (!ca || ca->mi.state == BCH_MEMBER_STATE_failed) 695 680 return 0; 696 681 697 682 return __extent_ptr_durability(ca, p); ··· 704 689 struct extent_ptr_decoded p; 705 690 unsigned durability = 0; 706 691 692 + rcu_read_lock(); 707 693 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 708 694 durability += bch2_extent_ptr_durability(c, &p); 695 + rcu_read_unlock(); 709 696 710 697 return durability; 711 698 } ··· 719 702 struct extent_ptr_decoded p; 720 703 unsigned durability = 0; 721 704 705 + rcu_read_lock(); 722 706 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 723 707 if (p.ptr.dev < c->sb.nr_devices && c->devs[p.ptr.dev]) 724 708 durability += bch2_extent_ptr_durability(c, &p); 709 + rcu_read_unlock(); 725 710 726 711 return durability; 727 712 } ··· 852 833 853 834 void bch2_bkey_drop_device(struct bkey_s k, unsigned dev) 854 835 { 855 - struct bch_extent_ptr *ptr; 856 - 857 836 bch2_bkey_drop_ptrs(k, ptr, ptr->dev == dev); 858 837 } 859 838 ··· 877 860 bool bch2_bkey_has_target(struct bch_fs *c, struct bkey_s_c k, unsigned target) 878 861 { 879 862 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 863 + struct bch_dev *ca; 864 + bool ret = false; 880 865 866 + rcu_read_lock(); 881 867 bkey_for_each_ptr(ptrs, ptr) 882 868 if (bch2_dev_in_target(c, ptr->dev, target) && 869 + (ca = bch2_dev_rcu(c, ptr->dev)) && 883 870 (!ptr->cached || 884 - !ptr_stale(bch_dev_bkey_exists(c, ptr->dev), ptr))) 885 - return true; 871 + !dev_ptr_stale_rcu(ca, ptr))) { 872 + ret = true; 873 + break; 874 + } 875 + rcu_read_unlock(); 886 876 887 - return false; 877 + return ret; 888 878 } 889 879 890 880 bool bch2_bkey_matches_ptr(struct bch_fs *c, struct bkey_s_c k, ··· 993 969 */ 994 970 bool bch2_extent_normalize(struct bch_fs *c, struct bkey_s k) 995 971 { 996 - struct bch_extent_ptr *ptr; 972 + struct bch_dev *ca; 997 973 974 + rcu_read_lock(); 998 975 bch2_bkey_drop_ptrs(k, ptr, 999 976 ptr->cached && 1000 - ptr_stale(bch_dev_bkey_exists(c, ptr->dev), ptr)); 977 + (ca = bch2_dev_rcu(c, ptr->dev)) && 978 + dev_ptr_stale_rcu(ca, ptr)); 979 + rcu_read_unlock(); 1001 980 1002 981 return bkey_deleted(k.k); 1003 982 } 1004 983 1005 984 void bch2_extent_ptr_to_text(struct printbuf *out, struct bch_fs *c, const struct bch_extent_ptr *ptr) 1006 985 { 1007 - struct bch_dev *ca = c && ptr->dev < c->sb.nr_devices && c->devs[ptr->dev] 1008 - ? 
bch_dev_bkey_exists(c, ptr->dev) 1009 - : NULL; 1010 - 986 + out->atomic++; 987 + rcu_read_lock(); 988 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1011 989 if (!ca) { 1012 990 prt_printf(out, "ptr: %u:%llu gen %u%s", ptr->dev, 1013 991 (u64) ptr->offset, ptr->gen, ··· 1024 998 prt_str(out, " cached"); 1025 999 if (ptr->unwritten) 1026 1000 prt_str(out, " unwritten"); 1027 - if (b >= ca->mi.first_bucket && 1028 - b < ca->mi.nbuckets && 1029 - ptr_stale(ca, ptr)) 1001 + if (bucket_valid(ca, b) && dev_ptr_stale_rcu(ca, ptr)) 1030 1002 prt_printf(out, " stale"); 1031 1003 } 1004 + rcu_read_unlock(); 1005 + --out->atomic; 1032 1006 } 1033 1007 1034 1008 void bch2_bkey_ptrs_to_text(struct printbuf *out, struct bch_fs *c, ··· 1095 1069 1096 1070 static int extent_ptr_invalid(struct bch_fs *c, 1097 1071 struct bkey_s_c k, 1098 - enum bkey_invalid_flags flags, 1072 + enum bch_validate_flags flags, 1099 1073 const struct bch_extent_ptr *ptr, 1100 1074 unsigned size_ondisk, 1101 1075 bool metadata, 1102 1076 struct printbuf *err) 1103 1077 { 1104 - struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1105 - u64 bucket; 1106 - u32 bucket_offset; 1107 - struct bch_dev *ca; 1108 1078 int ret = 0; 1109 1079 1110 - if (!bch2_dev_exists2(c, ptr->dev)) { 1111 - /* 1112 - * If we're in the write path this key might have already been 1113 - * overwritten, and we could be seeing a device that doesn't 1114 - * exist anymore due to racing with device removal: 1115 - */ 1116 - if (flags & BKEY_INVALID_WRITE) 1117 - return 0; 1118 - 1119 - bkey_fsck_err(c, err, ptr_to_invalid_device, 1120 - "pointer to invalid device (%u)", ptr->dev); 1080 + rcu_read_lock(); 1081 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1082 + if (!ca) { 1083 + rcu_read_unlock(); 1084 + return 0; 1121 1085 } 1086 + u32 bucket_offset; 1087 + u64 bucket = sector_to_bucket_and_offset(ca, ptr->offset, &bucket_offset); 1088 + unsigned first_bucket = ca->mi.first_bucket; 1089 + u64 nbuckets = ca->mi.nbuckets; 1090 + unsigned bucket_size = ca->mi.bucket_size; 1091 + rcu_read_unlock(); 1122 1092 1123 - ca = bch_dev_bkey_exists(c, ptr->dev); 1093 + struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1124 1094 bkey_for_each_ptr(ptrs, ptr2) 1125 1095 bkey_fsck_err_on(ptr != ptr2 && ptr->dev == ptr2->dev, c, err, 1126 1096 ptr_to_duplicate_device, 1127 1097 "multiple pointers to same device (%u)", ptr->dev); 1128 1098 1129 - bucket = sector_to_bucket_and_offset(ca, ptr->offset, &bucket_offset); 1130 1099 1131 - bkey_fsck_err_on(bucket >= ca->mi.nbuckets, c, err, 1100 + bkey_fsck_err_on(bucket >= nbuckets, c, err, 1132 1101 ptr_after_last_bucket, 1133 - "pointer past last bucket (%llu > %llu)", bucket, ca->mi.nbuckets); 1134 - bkey_fsck_err_on(ptr->offset < bucket_to_sector(ca, ca->mi.first_bucket), c, err, 1102 + "pointer past last bucket (%llu > %llu)", bucket, nbuckets); 1103 + bkey_fsck_err_on(bucket < first_bucket, c, err, 1135 1104 ptr_before_first_bucket, 1136 - "pointer before first bucket (%llu < %u)", bucket, ca->mi.first_bucket); 1137 - bkey_fsck_err_on(bucket_offset + size_ondisk > ca->mi.bucket_size, c, err, 1105 + "pointer before first bucket (%llu < %u)", bucket, first_bucket); 1106 + bkey_fsck_err_on(bucket_offset + size_ondisk > bucket_size, c, err, 1138 1107 ptr_spans_multiple_buckets, 1139 1108 "pointer spans multiple buckets (%u + %u > %u)", 1140 - bucket_offset, size_ondisk, ca->mi.bucket_size); 1109 + bucket_offset, size_ondisk, bucket_size); 1141 1110 fsck_err: 1142 1111 return ret; 1143 1112 } 1144 1113 1145 1114 int 
bch2_bkey_ptrs_invalid(struct bch_fs *c, struct bkey_s_c k, 1146 - enum bkey_invalid_flags flags, 1115 + enum bch_validate_flags flags, 1147 1116 struct printbuf *err) 1148 1117 { 1149 1118 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); ··· 1214 1193 1215 1194 bkey_fsck_err_on(crc_is_encoded(crc) && 1216 1195 (crc.uncompressed_size > c->opts.encoded_extent_max >> 9) && 1217 - (flags & (BKEY_INVALID_WRITE|BKEY_INVALID_COMMIT)), c, err, 1196 + (flags & (BCH_VALIDATE_write|BCH_VALIDATE_commit)), c, err, 1218 1197 ptr_crc_uncompressed_size_too_big, 1219 1198 "too large encoded extent"); 1220 1199
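Note: the extents.c conversions all follow one pattern: bch_dev_bkey_exists(), which assumed the device had to exist, becomes bch2_dev_rcu() under rcu_read_lock(), with NULL handled at each call site by a conservative default (latency S64_MAX so a missing device is never picked for a read, durability 0, a cached pointer to a missing device treated as droppable). The new dev_latency() helper at the top of the hunk is the cleanest instance; restated with comments:

	static inline u64 dev_latency(struct bch_fs *c, unsigned dev)
	{
		/* caller holds rcu_read_lock(); NULL means the member slot
		 * is empty, so report the worst possible latency */
		struct bch_dev *ca = bch2_dev_rcu(c, dev);
		return ca ? atomic64_read(&ca->cur_latency[READ]) : S64_MAX;
	}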
+6 -6
fs/bcachefs/extents.h
··· 8 8 9 9 struct bch_fs; 10 10 struct btree_trans; 11 - enum bkey_invalid_flags; 11 + enum bch_validate_flags; 12 12 13 13 /* extent entries: */ 14 14 ··· 406 406 /* KEY_TYPE_btree_ptr: */ 407 407 408 408 int bch2_btree_ptr_invalid(struct bch_fs *, struct bkey_s_c, 409 - enum bkey_invalid_flags, struct printbuf *); 409 + enum bch_validate_flags, struct printbuf *); 410 410 void bch2_btree_ptr_to_text(struct printbuf *, struct bch_fs *, 411 411 struct bkey_s_c); 412 412 413 413 int bch2_btree_ptr_v2_invalid(struct bch_fs *, struct bkey_s_c, 414 - enum bkey_invalid_flags, struct printbuf *); 414 + enum bch_validate_flags, struct printbuf *); 415 415 void bch2_btree_ptr_v2_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 416 416 void bch2_btree_ptr_v2_compat(enum btree_id, unsigned, unsigned, 417 417 int, struct bkey_s); ··· 448 448 /* KEY_TYPE_reservation: */ 449 449 450 450 int bch2_reservation_invalid(struct bch_fs *, struct bkey_s_c, 451 - enum bkey_invalid_flags, struct printbuf *); 451 + enum bch_validate_flags, struct printbuf *); 452 452 void bch2_reservation_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 453 453 bool bch2_reservation_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c); 454 454 ··· 654 654 do { \ 655 655 struct bkey_ptrs _ptrs = bch2_bkey_ptrs(_k); \ 656 656 \ 657 - _ptr = &_ptrs.start->ptr; \ 657 + struct bch_extent_ptr *_ptr = &_ptrs.start->ptr; \ 658 658 \ 659 659 while ((_ptr = bkey_ptr_next(_ptrs, _ptr))) { \ 660 660 if (_cond) { \ ··· 680 680 void bch2_bkey_ptrs_to_text(struct printbuf *, struct bch_fs *, 681 681 struct bkey_s_c); 682 682 int bch2_bkey_ptrs_invalid(struct bch_fs *, struct bkey_s_c, 683 - enum bkey_invalid_flags, struct printbuf *); 683 + enum bch_validate_flags, struct printbuf *); 684 684 685 685 void bch2_ptr_swab(struct bkey_s); 686 686
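Note: the last extents.h hunk explains several deletions back in extents.c: bch2_bkey_drop_ptrs() now declares its cursor itself (the struct bch_extent_ptr *_ptr inside the macro body), so the struct bch_extent_ptr *ptr locals that callers used to supply were dead and have been removed. Usage after the change, as in bch2_bkey_drop_device() above:

	void bch2_bkey_drop_device(struct bkey_s k, unsigned dev)
	{
		/* `ptr` is bound by the macro itself; no local declaration */
		bch2_bkey_drop_ptrs(k, ptr, ptr->dev == dev);
	}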
+88 -17
fs/bcachefs/eytzinger.c
··· 171 171 swap_r_func_t swap_func, 172 172 const void *priv) 173 173 { 174 - int i, c, r; 174 + int i, j, k; 175 175 176 176 /* called from 'sort' without swap function, let's pick the default */ 177 177 if (swap_func == SWAP_WRAPPER && !((struct wrapper *)priv)->swap_func) ··· 188 188 189 189 /* heapify */ 190 190 for (i = n / 2 - 1; i >= 0; --i) { 191 - for (r = i; r * 2 + 1 < n; r = c) { 192 - c = r * 2 + 1; 191 + /* Find the sift-down path all the way to the leaves. */ 192 + for (j = i; k = j * 2 + 1, k + 1 < n;) 193 + j = eytzinger0_do_cmp(base, n, size, cmp_func, priv, k, k + 1) > 0 ? k : k + 1; 193 194 194 - if (c + 1 < n && 195 - eytzinger0_do_cmp(base, n, size, cmp_func, priv, c, c + 1) < 0) 196 - c++; 195 + /* Special case for the last leaf with no sibling. */ 196 + if (j * 2 + 2 == n) 197 + j = j * 2 + 1; 197 198 198 - if (eytzinger0_do_cmp(base, n, size, cmp_func, priv, r, c) >= 0) 199 - break; 199 + /* Backtrack to the correct location. */ 200 + while (j != i && eytzinger0_do_cmp(base, n, size, cmp_func, priv, i, j) >= 0) 201 + j = (j - 1) / 2; 200 202 201 - eytzinger0_do_swap(base, n, size, swap_func, priv, r, c); 203 + /* Shift the element into its correct place. */ 204 + for (k = j; j != i;) { 205 + j = (j - 1) / 2; 206 + eytzinger0_do_swap(base, n, size, swap_func, priv, j, k); 202 207 } 203 208 } 204 209 ··· 211 206 for (i = n - 1; i > 0; --i) { 212 207 eytzinger0_do_swap(base, n, size, swap_func, priv, 0, i); 213 208 214 - for (r = 0; r * 2 + 1 < i; r = c) { 215 - c = r * 2 + 1; 209 + /* Find the sift-down path all the way to the leaves. */ 210 + for (j = 0; k = j * 2 + 1, k + 1 < i;) 211 + j = eytzinger0_do_cmp(base, n, size, cmp_func, priv, k, k + 1) > 0 ? k : k + 1; 216 212 217 - if (c + 1 < i && 218 - eytzinger0_do_cmp(base, n, size, cmp_func, priv, c, c + 1) < 0) 219 - c++; 213 + /* Special case for the last leaf with no sibling. */ 214 + if (j * 2 + 2 == i) 215 + j = j * 2 + 1; 220 216 221 - if (eytzinger0_do_cmp(base, n, size, cmp_func, priv, r, c) >= 0) 222 - break; 217 + /* Backtrack to the correct location. */ 218 + while (j && eytzinger0_do_cmp(base, n, size, cmp_func, priv, 0, j) >= 0) 219 + j = (j - 1) / 2; 223 220 224 - eytzinger0_do_swap(base, n, size, swap_func, priv, r, c); 221 + /* Shift the element into its correct place. 
*/ 222 + for (k = j; j;) { 223 + j = (j - 1) / 2; 224 + eytzinger0_do_swap(base, n, size, swap_func, priv, j, k); 225 225 } 226 226 } 227 227 } ··· 242 232 243 233 return eytzinger0_sort_r(base, n, size, _CMP_WRAPPER, SWAP_WRAPPER, &w); 244 234 } 235 + 236 + #if 0 237 + #include <linux/slab.h> 238 + #include <linux/random.h> 239 + #include <linux/ktime.h> 240 + 241 + static u64 cmp_count; 242 + 243 + static int mycmp(const void *a, const void *b) 244 + { 245 + u32 _a = *(u32 *)a; 246 + u32 _b = *(u32 *)b; 247 + 248 + cmp_count++; 249 + if (_a < _b) 250 + return -1; 251 + else if (_a > _b) 252 + return 1; 253 + else 254 + return 0; 255 + } 256 + 257 + static int test(void) 258 + { 259 + size_t N, i; 260 + ktime_t start, end; 261 + s64 delta; 262 + u32 *arr; 263 + 264 + for (N = 10000; N <= 100000; N += 10000) { 265 + arr = kmalloc_array(N, sizeof(u32), GFP_KERNEL); 266 + cmp_count = 0; 267 + 268 + for (i = 0; i < N; i++) 269 + arr[i] = get_random_u32(); 270 + 271 + start = ktime_get(); 272 + eytzinger0_sort(arr, N, sizeof(u32), mycmp, NULL); 273 + end = ktime_get(); 274 + 275 + delta = ktime_us_delta(end, start); 276 + printk(KERN_INFO "time: %lld\n", delta); 277 + printk(KERN_INFO "comparisons: %lld\n", cmp_count); 278 + 279 + u32 prev = 0; 280 + 281 + eytzinger0_for_each(i, N) { 282 + if (prev > arr[i]) 283 + goto err; 284 + prev = arr[i]; 285 + } 286 + 287 + kfree(arr); 288 + } 289 + return 0; 290 + 291 + err: 292 + kfree(arr); 293 + return -1; 294 + } 295 + #endif
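Note: eytzinger0_sort_r() trades the textbook sift-down (two comparisons per level: pick the larger child, then test it against the parent) for the bottom-up variant: descend to a leaf along the larger children at one comparison per level, backtrack upward to where the displaced element actually belongs, then rotate that path segment into place. The displaced element comes from the end of the heap and is usually small, so the backtracking is short and the comparison count drops from roughly 2n log n toward n log n; the #if 0 block at the bottom is a compiled-out harness for measuring exactly that. A standalone sketch of the same idea on an ordinary 0-indexed max-heap; the real code works on the eytzinger layout through eytzinger0_do_cmp()/eytzinger0_do_swap(), and the names here are hypothetical:

	#include <stddef.h>

	static void swap_u32(unsigned *a, unsigned *b)
	{
		unsigned t = *a; *a = *b; *b = t;
	}

	static void siftdown_bottom_up(unsigned *v, size_t i, size_t n)
	{
		size_t j, k;

		/* descend to a leaf along the larger child: one cmp per level */
		for (j = i; (k = 2 * j + 1) + 1 < n;)
			j = v[k] > v[k + 1] ? k : k + 1;

		/* the last level may offer only a single child */
		if (2 * j + 2 == n)
			j = 2 * j + 1;

		/* backtrack: stop at the deepest spot on the path whose
		 * value still exceeds the displaced element v[i] */
		while (j != i && v[i] >= v[j])
			j = (j - 1) / 2;

		/* rotate: v[i] drops to j, the path above shifts up one */
		for (k = j; j != i;) {
			j = (j - 1) / 2;
			swap_u32(&v[j], &v[k]);
		}
	}

	static void heapsort_bottom_up(unsigned *v, size_t n)
	{
		for (size_t i = n / 2; i-- > 0;)
			siftdown_bottom_up(v, i, n);	/* heapify */

		for (size_t i = n; i-- > 1;) {
			swap_u32(&v[0], &v[i]);		/* extract max */
			siftdown_bottom_up(v, 0, i);
		}
	}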
+19 -19
fs/bcachefs/fs-common.c
··· 42 42 if (ret) 43 43 goto err; 44 44 45 - ret = bch2_inode_peek(trans, &dir_iter, dir_u, dir, BTREE_ITER_INTENT); 45 + ret = bch2_inode_peek(trans, &dir_iter, dir_u, dir, BTREE_ITER_intent); 46 46 if (ret) 47 47 goto err; 48 48 ··· 70 70 struct bch_subvolume s; 71 71 72 72 ret = bch2_subvolume_get(trans, snapshot_src.subvol, true, 73 - BTREE_ITER_CACHED, &s); 73 + BTREE_ITER_cached, &s); 74 74 if (ret) 75 75 goto err; 76 76 ··· 78 78 } 79 79 80 80 ret = bch2_inode_peek(trans, &inode_iter, new_inode, snapshot_src, 81 - BTREE_ITER_INTENT); 81 + BTREE_ITER_intent); 82 82 if (ret) 83 83 goto err; 84 84 ··· 163 163 name, 164 164 dir_target, 165 165 &dir_offset, 166 - BCH_HASH_SET_MUST_CREATE); 166 + STR_HASH_must_create); 167 167 if (ret) 168 168 goto err; 169 169 ··· 171 171 new_inode->bi_dir_offset = dir_offset; 172 172 } 173 173 174 - inode_iter.flags &= ~BTREE_ITER_ALL_SNAPSHOTS; 174 + inode_iter.flags &= ~BTREE_ITER_all_snapshots; 175 175 bch2_btree_iter_set_snapshot(&inode_iter, snapshot); 176 176 177 177 ret = bch2_btree_iter_traverse(&inode_iter) ?: ··· 198 198 if (dir.subvol != inum.subvol) 199 199 return -EXDEV; 200 200 201 - ret = bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_INTENT); 201 + ret = bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_intent); 202 202 if (ret) 203 - goto err; 203 + return ret; 204 204 205 205 inode_u->bi_ctime = now; 206 206 ret = bch2_inode_nlink_inc(inode_u); 207 207 if (ret) 208 - return ret; 208 + goto err; 209 209 210 - ret = bch2_inode_peek(trans, &dir_iter, dir_u, dir, BTREE_ITER_INTENT); 210 + ret = bch2_inode_peek(trans, &dir_iter, dir_u, dir, BTREE_ITER_intent); 211 211 if (ret) 212 212 goto err; 213 213 ··· 223 223 ret = bch2_dirent_create(trans, dir, &dir_hash, 224 224 mode_to_type(inode_u->bi_mode), 225 225 name, inum.inum, &dir_offset, 226 - BCH_HASH_SET_MUST_CREATE); 226 + STR_HASH_must_create); 227 227 if (ret) 228 228 goto err; 229 229 ··· 255 255 struct bkey_s_c k; 256 256 int ret; 257 257 258 - ret = bch2_inode_peek(trans, &dir_iter, dir_u, dir, BTREE_ITER_INTENT); 258 + ret = bch2_inode_peek(trans, &dir_iter, dir_u, dir, BTREE_ITER_intent); 259 259 if (ret) 260 260 goto err; 261 261 262 262 dir_hash = bch2_hash_info_init(c, dir_u); 263 263 264 264 ret = bch2_dirent_lookup_trans(trans, &dirent_iter, dir, &dir_hash, 265 - name, &inum, BTREE_ITER_INTENT); 265 + name, &inum, BTREE_ITER_intent); 266 266 if (ret) 267 267 goto err; 268 268 269 269 ret = bch2_inode_peek(trans, &inode_iter, inode_u, inum, 270 - BTREE_ITER_INTENT); 270 + BTREE_ITER_intent); 271 271 if (ret) 272 272 goto err; 273 273 ··· 322 322 323 323 ret = bch2_hash_delete_at(trans, bch2_dirent_hash_desc, 324 324 &dir_hash, &dirent_iter, 325 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?: 325 + BTREE_UPDATE_internal_snapshot_node) ?: 326 326 bch2_inode_write(trans, &dir_iter, dir_u) ?: 327 327 bch2_inode_write(trans, &inode_iter, inode_u); 328 328 err: ··· 363 363 struct bkey_i_subvolume *s = 364 364 bch2_bkey_get_mut_typed(trans, &iter, 365 365 BTREE_ID_subvolumes, POS(0, subvol), 366 - BTREE_ITER_CACHED, subvolume); 366 + BTREE_ITER_cached, subvolume); 367 367 int ret = PTR_ERR_OR_ZERO(s); 368 368 if (ret) 369 369 return ret; ··· 394 394 int ret; 395 395 396 396 ret = bch2_inode_peek(trans, &src_dir_iter, src_dir_u, src_dir, 397 - BTREE_ITER_INTENT); 397 + BTREE_ITER_intent); 398 398 if (ret) 399 399 goto err; 400 400 ··· 403 403 if (dst_dir.inum != src_dir.inum || 404 404 dst_dir.subvol != src_dir.subvol) { 405 405 ret = bch2_inode_peek(trans, 
&dst_dir_iter, dst_dir_u, dst_dir, 406 - BTREE_ITER_INTENT); 406 + BTREE_ITER_intent); 407 407 if (ret) 408 408 goto err; 409 409 ··· 423 423 goto err; 424 424 425 425 ret = bch2_inode_peek(trans, &src_inode_iter, src_inode_u, src_inum, 426 - BTREE_ITER_INTENT); 426 + BTREE_ITER_intent); 427 427 if (ret) 428 428 goto err; 429 429 430 430 if (dst_inum.inum) { 431 431 ret = bch2_inode_peek(trans, &dst_inode_iter, dst_inode_u, dst_inum, 432 - BTREE_ITER_INTENT); 432 + BTREE_ITER_intent); 433 433 if (ret) 434 434 goto err; 435 435 }
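Note: besides the flag renames (BTREE_ITER_intent, STR_HASH_must_create and friends), fs-common.c picks up one real fix in bch2_link_trans(): the two early error exits were crossed. A failure from bch2_inode_nlink_inc() returned directly, skipping cleanup of the inode_iter taken just above it, while the bch2_inode_peek() failure before it (with nothing of ours yet to unwind) jumped to the cleanup label. The corrected ordering, restated with comments:

	ret = bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_intent);
	if (ret)
		return ret;	/* no iterator of ours is live yet */

	inode_u->bi_ctime = now;
	ret = bch2_inode_nlink_inc(inode_u);
	if (ret)
		goto err;	/* inode_iter must be exited */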
+3 -11
fs/bcachefs/fs-io-buffered.c
··· 30 30 { 31 31 struct folio_iter fi; 32 32 33 - bio_for_each_folio_all(fi, bio) { 34 - if (!bio->bi_status) { 35 - folio_mark_uptodate(fi.folio); 36 - } else { 37 - folio_clear_uptodate(fi.folio); 38 - folio_set_error(fi.folio); 39 - } 40 - folio_unlock(fi.folio); 41 - } 33 + bio_for_each_folio_all(fi, bio) 34 + folio_end_read(fi.folio, bio->bi_status == BLK_STS_OK); 42 35 43 36 bio_put(bio); 44 37 } ··· 169 176 170 177 bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, 171 178 SPOS(inum.inum, rbio->bio.bi_iter.bi_sector, snapshot), 172 - BTREE_ITER_SLOTS); 179 + BTREE_ITER_slots); 173 180 while (1) { 174 181 struct bkey_s_c k; 175 182 unsigned bytes, sectors, offset_into_extent; ··· 401 408 bio_for_each_folio_all(fi, bio) { 402 409 struct bch_folio *s; 403 410 404 - folio_set_error(fi.folio); 405 411 mapping_set_error(fi.folio->mapping, -EIO); 406 412 407 413 s = __bch2_folio(fi.folio);
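Note: read completion in fs-io-buffered.c collapses the per-folio mark-uptodate/unlock pair into folio_end_read(), which performs the flag update and the unlock as a single atomic operation, and it drops folio_set_error() (the folio error flag is being phased out; writeback still records failures via mapping_set_error()). Roughly what the new call does, as an illustrative sketch rather than the actual implementation:

	/* illustrative equivalent of folio_end_read(folio, success): */
	static void example_end_read(struct folio *folio, bool success)
	{
		if (success)
			folio_mark_uptodate(folio);
		folio_unlock(folio);	/* also wakes PG_locked waiters */
	}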
+1 -1
fs/bcachefs/fs-io-direct.c
··· 254 254 255 255 for_each_btree_key_norestart(trans, iter, BTREE_ID_extents, 256 256 SPOS(inum.inum, offset, snapshot), 257 - BTREE_ITER_SLOTS, k, err) { 257 + BTREE_ITER_slots, k, err) { 258 258 if (bkey_ge(bkey_start_pos(k.k), POS(inum.inum, end))) 259 259 break; 260 260
+1 -1
fs/bcachefs/fs-io-pagecache.c
··· 214 214 215 215 for_each_btree_key_norestart(trans, iter, BTREE_ID_extents, 216 216 SPOS(inum.inum, offset, snapshot), 217 - BTREE_ITER_SLOTS, k, ret) { 217 + BTREE_ITER_slots, k, ret) { 218 218 unsigned nr_ptrs = bch2_bkey_nr_ptrs_fully_allocated(k); 219 219 unsigned state = bkey_to_sector_state(k); 220 220
+6 -3
fs/bcachefs/fs-io.c
··· 202 202 goto out; 203 203 ret = bch2_flush_inode(c, inode); 204 204 out: 205 - return bch2_err_class(ret); 205 + ret = bch2_err_class(ret); 206 + if (ret == -EROFS) 207 + ret = -EIO; 208 + return ret; 206 209 } 207 210 208 211 /* truncate: */ ··· 597 594 598 595 bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, 599 596 POS(inode->v.i_ino, start_sector), 600 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT); 597 + BTREE_ITER_slots|BTREE_ITER_intent); 601 598 602 599 while (!ret && bkey_lt(iter.pos, end_pos)) { 603 600 s64 i_sectors_delta = 0; ··· 1012 1009 1013 1010 for_each_btree_key_norestart(trans, iter, BTREE_ID_extents, 1014 1011 SPOS(inode->v.i_ino, offset >> 9, snapshot), 1015 - BTREE_ITER_SLOTS, k, ret) { 1012 + BTREE_ITER_slots, k, ret) { 1016 1013 if (k.k->p.inode != inode->v.i_ino) { 1017 1014 next_hole = bch2_seek_pagecache_hole(&inode->v, 1018 1015 offset, MAX_LFS_FILESIZE, 0, false);
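Note: two small behavioral changes in fs-io.c. The fallocate loop now calls bch2_trans_relock() after advancing the iterator, so the next iteration never runs with the transaction unlocked (a failed relock surfaces the restart instead). And bch2_fsync() stops returning -EROFS: after an emergency read-only transition the flush may never have reached disk, and -EIO is what fsync() callers actually check for, so the error class is remapped at the end:

	ret = bch2_err_class(ret);
	if (ret == -EROFS)	/* shutdown: data may not be on disk */
		ret = -EIO;
	return ret;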
+1 -1
fs/bcachefs/fs-ioctl.c
··· 548 548 { 549 549 /* These are just misnamed, they actually get/put from/to user an int */ 550 550 switch (cmd) { 551 - case FS_IOC_GETFLAGS: 551 + case FS_IOC32_GETFLAGS: 552 552 cmd = FS_IOC_GETFLAGS; 553 553 break; 554 554 case FS_IOC32_SETFLAGS:
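Note: the fs-ioctl.c fix is small but real: the compat handler's switch matched on the native FS_IOC_GETFLAGS instead of FS_IOC32_GETFLAGS, making that arm a no-op for native callers while 32-bit callers never had their GETFLAGS translated at all. The corrected translation, with the SETFLAGS arm assumed from the surrounding context:

	/* compat path: map 32-bit ioctl numbers onto the native ones */
	switch (cmd) {
	case FS_IOC32_GETFLAGS:
		cmd = FS_IOC_GETFLAGS;
		break;
	case FS_IOC32_SETFLAGS:
		cmd = FS_IOC_SETFLAGS;
		break;
	}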
+60 -49
fs/bcachefs/fs.c
··· 90 90 bch2_trans_begin(trans); 91 91 92 92 ret = bch2_inode_peek(trans, &iter, &inode_u, inode_inum(inode), 93 - BTREE_ITER_INTENT) ?: 93 + BTREE_ITER_intent) ?: 94 94 (set ? set(trans, inode, &inode_u, p) : 0) ?: 95 95 bch2_inode_write(trans, &iter, &inode_u) ?: 96 96 bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); ··· 213 213 _ret; \ 214 214 }) 215 215 216 + static struct inode *bch2_alloc_inode(struct super_block *sb) 217 + { 218 + BUG(); 219 + } 220 + 221 + static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c) 222 + { 223 + struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS); 224 + if (!inode) 225 + return NULL; 226 + 227 + inode_init_once(&inode->v); 228 + mutex_init(&inode->ei_update_lock); 229 + two_state_lock_init(&inode->ei_pagecache_lock); 230 + INIT_LIST_HEAD(&inode->ei_vfs_inode_list); 231 + mutex_init(&inode->ei_quota_lock); 232 + inode->v.i_state = 0; 233 + 234 + if (unlikely(inode_init_always(c->vfs_sb, &inode->v))) { 235 + kmem_cache_free(bch2_inode_cache, inode); 236 + return NULL; 237 + } 238 + 239 + return inode; 240 + } 241 + 216 242 /* 217 243 * Allocate a new inode, dropping/retaking btree locks if necessary: 218 244 */ 219 245 static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans) 220 246 { 221 - struct bch_fs *c = trans->c; 222 - 223 247 struct bch_inode_info *inode = 224 248 memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN, 225 - to_bch_ei(new_inode(c->vfs_sb))); 249 + __bch2_new_inode(trans->c)); 226 250 227 251 if (unlikely(!inode)) { 228 - int ret = drop_locks_do(trans, (inode = to_bch_ei(new_inode(c->vfs_sb))) ? 0 : -ENOMEM); 252 + int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c)) ? 0 : -ENOMEM); 229 253 if (ret && inode) { 230 254 __destroy_inode(&inode->v); 231 255 kmem_cache_free(bch2_inode_cache, inode); ··· 314 290 if (ret) 315 291 return ERR_PTR(ret); 316 292 #endif 317 - inode = to_bch_ei(new_inode(c->vfs_sb)); 293 + inode = __bch2_new_inode(c); 318 294 if (unlikely(!inode)) { 319 295 inode = ERR_PTR(-ENOMEM); 320 296 goto err; ··· 347 323 inum.inum = inode_u.bi_inum; 348 324 349 325 ret = bch2_subvolume_get(trans, inum.subvol, true, 350 - BTREE_ITER_WITH_UPDATES, &subvol) ?: 326 + BTREE_ITER_with_updates, &subvol) ?: 351 327 bch2_trans_commit(trans, NULL, &journal_seq, 0); 352 328 if (unlikely(ret)) { 353 329 bch2_quota_acct(c, bch_qid(&inode_u), Q_INO, -1, ··· 400 376 struct bch_fs *c = trans->c; 401 377 struct btree_iter dirent_iter = {}; 402 378 subvol_inum inum = {}; 379 + struct printbuf buf = PRINTBUF; 403 380 404 - int ret = bch2_hash_lookup(trans, &dirent_iter, bch2_dirent_hash_desc, 405 - dir_hash_info, dir, name, 0); 381 + struct bkey_s_c k = bch2_hash_lookup(trans, &dirent_iter, bch2_dirent_hash_desc, 382 + dir_hash_info, dir, name, 0); 383 + int ret = bkey_err(k); 406 384 if (ret) 407 385 return ERR_PTR(ret); 408 - 409 - struct bkey_s_c k = bch2_btree_iter_peek_slot(&dirent_iter); 410 - ret = bkey_err(k); 411 - if (ret) 412 - goto err; 413 386 414 387 ret = bch2_dirent_read_target(trans, dir, bkey_s_c_to_dirent(k), &inum); 415 388 if (ret > 0) ··· 427 406 ret = bch2_subvolume_get(trans, inum.subvol, true, 0, &subvol) ?: 428 407 bch2_inode_find_by_inum_nowarn_trans(trans, inum, &inode_u) ?: 429 408 PTR_ERR_OR_ZERO(inode = bch2_new_inode(trans)); 430 - if (bch2_err_matches(ret, ENOENT)) { 431 - struct printbuf buf = PRINTBUF; 432 409 433 - bch2_bkey_val_to_text(&buf, c, k); 434 - bch_err(c, "%s points to missing inode", buf.buf); 435 - 
printbuf_exit(&buf); 436 - } 410 + bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), 411 + c, "dirent to missing inode:\n %s", 412 + (bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 437 413 if (ret) 438 414 goto err; 415 + 416 + /* regular files may have hardlinks: */ 417 + if (bch2_fs_inconsistent_on(bch2_inode_should_have_bp(&inode_u) && 418 + !bkey_eq(k.k->p, POS(inode_u.bi_dir, inode_u.bi_dir_offset)), 419 + c, 420 + "dirent points to inode that does not point back:\n %s", 421 + (bch2_bkey_val_to_text(&buf, c, k), 422 + prt_printf(&buf, "\n "), 423 + bch2_inode_unpacked_to_text(&buf, &inode_u), 424 + buf.buf))) { 425 + ret = -ENOENT; 426 + goto err; 427 + } 439 428 440 429 bch2_vfs_inode_init(trans, inum, inode, &inode_u, &subvol); 441 430 inode = bch2_inode_insert(c, inode); 442 431 out: 443 432 bch2_trans_iter_exit(trans, &dirent_iter); 433 + printbuf_exit(&buf); 444 434 return inode; 445 435 err: 446 436 inode = ERR_PTR(ret); ··· 819 787 acl = NULL; 820 788 821 789 ret = bch2_inode_peek(trans, &inode_iter, &inode_u, inode_inum(inode), 822 - BTREE_ITER_INTENT); 790 + BTREE_ITER_intent); 823 791 if (ret) 824 792 goto btree_err; 825 793 ··· 1075 1043 1076 1044 bch2_btree_iter_set_pos(&iter, 1077 1045 POS(iter.pos.inode, iter.pos.offset + sectors)); 1046 + 1047 + ret = bch2_trans_relock(trans); 1048 + if (ret) 1049 + break; 1078 1050 } 1079 1051 start = iter.pos.offset; 1080 1052 bch2_trans_iter_exit(trans, &iter); ··· 1526 1490 mapping_set_large_folios(inode->v.i_mapping); 1527 1491 } 1528 1492 1529 - static struct inode *bch2_alloc_inode(struct super_block *sb) 1493 + static void bch2_free_inode(struct inode *vinode) 1530 1494 { 1531 - struct bch_inode_info *inode; 1532 - 1533 - inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS); 1534 - if (!inode) 1535 - return NULL; 1536 - 1537 - inode_init_once(&inode->v); 1538 - mutex_init(&inode->ei_update_lock); 1539 - two_state_lock_init(&inode->ei_pagecache_lock); 1540 - INIT_LIST_HEAD(&inode->ei_vfs_inode_list); 1541 - mutex_init(&inode->ei_quota_lock); 1542 - 1543 - return &inode->v; 1544 - } 1545 - 1546 - static void bch2_i_callback(struct rcu_head *head) 1547 - { 1548 - struct inode *vinode = container_of(head, struct inode, i_rcu); 1549 - struct bch_inode_info *inode = to_bch_ei(vinode); 1550 - 1551 - kmem_cache_free(bch2_inode_cache, inode); 1552 - } 1553 - 1554 - static void bch2_destroy_inode(struct inode *vinode) 1555 - { 1556 - call_rcu(&vinode->i_rcu, bch2_i_callback); 1495 + kmem_cache_free(bch2_inode_cache, to_bch_ei(vinode)); 1557 1496 } 1558 1497 1559 1498 static int inode_update_times_fn(struct btree_trans *trans, ··· 1836 1825 1837 1826 static const struct super_operations bch_super_operations = { 1838 1827 .alloc_inode = bch2_alloc_inode, 1839 - .destroy_inode = bch2_destroy_inode, 1828 + .free_inode = bch2_free_inode, 1840 1829 .write_inode = bch2_vfs_write_inode, 1841 1830 .evict_inode = bch2_evict_inode, 1842 1831 .sync_fs = bch2_sync_fs,
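Note: the fs.c inode lifecycle changes hang together. __bch2_new_inode() now allocates and initializes the VFS inode by hand (kmem_cache_alloc() plus inode_init_always()), which lets bch2_new_inode() retry the allocation with btree locks dropped; the generic .alloc_inode path is stubbed out with a BUG() since nothing should reach it anymore. On the teardown side, .destroy_inode plus an open-coded call_rcu() bounce becomes .free_inode, which the VFS already invokes after an RCU grace period (via i_callback()), leaving just the cache free:

	/* freeing moves to .free_inode: the VFS defers this call past the
	 * RCU grace period itself, so no call_rcu() is needed here */
	static void bch2_free_inode(struct inode *vinode)
	{
		kmem_cache_free(bch2_inode_cache, to_bch_ei(vinode));
	}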
+81 -131
fs/bcachefs/fsck.c
··· 79 79 80 80 bch2_trans_iter_init(trans, &iter, BTREE_ID_inodes, 81 81 POS(0, inode_nr), 82 - BTREE_ITER_ALL_SNAPSHOTS); 82 + BTREE_ITER_all_snapshots); 83 83 k = bch2_btree_iter_peek(&iter); 84 84 ret = bkey_err(k); 85 85 if (ret) ··· 127 127 u64 *target, unsigned *type, u32 snapshot) 128 128 { 129 129 struct btree_iter iter; 130 - struct bkey_s_c_dirent d; 131 - int ret = bch2_hash_lookup_in_snapshot(trans, &iter, bch2_dirent_hash_desc, 132 - &hash_info, dir, name, 0, snapshot); 130 + struct bkey_s_c k = bch2_hash_lookup_in_snapshot(trans, &iter, bch2_dirent_hash_desc, 131 + &hash_info, dir, name, 0, snapshot); 132 + int ret = bkey_err(k); 133 133 if (ret) 134 134 return ret; 135 135 136 - d = bkey_s_c_to_dirent(bch2_btree_iter_peek_slot(&iter)); 136 + struct bkey_s_c_dirent d = bkey_s_c_to_dirent(bch2_btree_iter_peek_slot(&iter)); 137 137 *target = le64_to_cpu(d.v->d_inum); 138 138 *type = d.v->d_type; 139 139 bch2_trans_iter_exit(trans, &iter); ··· 154 154 155 155 dir_hash_info = bch2_hash_info_init(c, &dir_inode); 156 156 157 - bch2_trans_iter_init(trans, &iter, BTREE_ID_dirents, pos, BTREE_ITER_INTENT); 157 + bch2_trans_iter_init(trans, &iter, BTREE_ID_dirents, pos, BTREE_ITER_intent); 158 158 159 159 ret = bch2_btree_iter_traverse(&iter) ?: 160 160 bch2_hash_delete_at(trans, bch2_dirent_hash_desc, 161 161 &dir_hash_info, &iter, 162 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 162 + BTREE_UPDATE_internal_snapshot_node); 163 163 bch2_trans_iter_exit(trans, &iter); 164 164 err: 165 165 bch_err_fn(c, ret); ··· 274 274 &lostfound_str, 275 275 lostfound->bi_inum, 276 276 &lostfound->bi_dir_offset, 277 - BCH_HASH_SET_MUST_CREATE) ?: 277 + STR_HASH_must_create) ?: 278 278 bch2_inode_write_flags(trans, &lostfound_iter, lostfound, 279 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 279 + BTREE_UPDATE_internal_snapshot_node); 280 280 err: 281 281 bch_err_msg(c, ret, "creating lost+found"); 282 282 bch2_trans_iter_exit(trans, &lostfound_iter); ··· 333 333 &name, 334 334 inode->bi_subvol ?: inode->bi_inum, 335 335 &dir_offset, 336 - BCH_HASH_SET_MUST_CREATE); 336 + STR_HASH_must_create); 337 337 if (ret) 338 338 return ret; 339 339 ··· 486 486 return reconstruct_inode(trans, snapshot, inum, k.k->p.offset << 9, S_IFREG); 487 487 } 488 488 489 - struct snapshots_seen_entry { 490 - u32 id; 491 - u32 equiv; 492 - }; 493 - 494 489 struct snapshots_seen { 495 490 struct bpos pos; 496 - DARRAY(struct snapshots_seen_entry) ids; 491 + snapshot_id_list ids; 497 492 }; 498 493 499 494 static inline void snapshots_seen_exit(struct snapshots_seen *s) ··· 503 508 504 509 static int snapshots_seen_add_inorder(struct bch_fs *c, struct snapshots_seen *s, u32 id) 505 510 { 506 - struct snapshots_seen_entry *i, n = { 507 - .id = id, 508 - .equiv = bch2_snapshot_equiv(c, id), 509 - }; 510 - int ret = 0; 511 - 511 + u32 *i; 512 512 __darray_for_each(s->ids, i) { 513 - if (i->id == id) 513 + if (*i == id) 514 514 return 0; 515 - if (i->id > id) 515 + if (*i > id) 516 516 break; 517 517 } 518 518 519 - ret = darray_insert_item(&s->ids, i - s->ids.data, n); 519 + int ret = darray_insert_item(&s->ids, i - s->ids.data, id); 520 520 if (ret) 521 521 bch_err(c, "error reallocating snapshots_seen table (size %zu)", 522 522 s->ids.size); ··· 521 531 static int snapshots_seen_update(struct bch_fs *c, struct snapshots_seen *s, 522 532 enum btree_id btree_id, struct bpos pos) 523 533 { 524 - struct snapshots_seen_entry n = { 525 - .id = pos.snapshot, 526 - .equiv = bch2_snapshot_equiv(c, pos.snapshot), 527 - }; 528 - int ret = 0; 529 - 
530 534 if (!bkey_eq(s->pos, pos)) 531 535 s->ids.nr = 0; 532 - 533 536 s->pos = pos; 534 - s->pos.snapshot = n.equiv; 535 537 536 - darray_for_each(s->ids, i) { 537 - if (i->id == n.id) 538 - return 0; 539 - 540 - /* 541 - * We currently don't rigorously track for snapshot cleanup 542 - * needing to be run, so it shouldn't be a fsck error yet: 543 - */ 544 - if (i->equiv == n.equiv) { 545 - bch_err(c, "snapshot deletion did not finish:\n" 546 - " duplicate keys in btree %s at %llu:%llu snapshots %u, %u (equiv %u)\n", 547 - bch2_btree_id_str(btree_id), 548 - pos.inode, pos.offset, 549 - i->id, n.id, n.equiv); 550 - set_bit(BCH_FS_need_delete_dead_snapshots, &c->flags); 551 - return bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_delete_dead_snapshots); 552 - } 553 - } 554 - 555 - ret = darray_push(&s->ids, n); 556 - if (ret) 557 - bch_err(c, "error reallocating snapshots_seen table (size %zu)", 558 - s->ids.size); 559 - return ret; 538 + return snapshot_list_add_nodup(c, &s->ids, pos.snapshot); 560 539 } 561 540 562 541 /** ··· 545 586 ssize_t i; 546 587 547 588 EBUG_ON(id > ancestor); 548 - EBUG_ON(!bch2_snapshot_is_equiv(c, id)); 549 - EBUG_ON(!bch2_snapshot_is_equiv(c, ancestor)); 550 589 551 590 /* @ancestor should be the snapshot most recently added to @seen */ 552 591 EBUG_ON(ancestor != seen->pos.snapshot); 553 - EBUG_ON(ancestor != seen->ids.data[seen->ids.nr - 1].equiv); 592 + EBUG_ON(ancestor != darray_last(seen->ids)); 554 593 555 594 if (id == ancestor) 556 595 return true; ··· 567 610 */ 568 611 569 612 for (i = seen->ids.nr - 2; 570 - i >= 0 && seen->ids.data[i].equiv >= id; 613 + i >= 0 && seen->ids.data[i] >= id; 571 614 --i) 572 - if (bch2_snapshot_is_ancestor(c, id, seen->ids.data[i].equiv)) 615 + if (bch2_snapshot_is_ancestor(c, id, seen->ids.data[i])) 573 616 return false; 574 617 575 618 return true; ··· 600 643 u32 src, struct snapshots_seen *src_seen, 601 644 u32 dst, struct snapshots_seen *dst_seen) 602 645 { 603 - src = bch2_snapshot_equiv(c, src); 604 - dst = bch2_snapshot_equiv(c, dst); 605 - 606 646 if (dst > src) { 607 647 swap(dst, src); 608 648 swap(dst_seen, src_seen); ··· 646 692 647 693 return darray_push(&w->inodes, ((struct inode_walker_entry) { 648 694 .inode = u, 649 - .snapshot = bch2_snapshot_equiv(c, inode.k->p.snapshot), 695 + .snapshot = inode.k->p.snapshot, 650 696 })); 651 697 } 652 698 ··· 662 708 w->inodes.nr = 0; 663 709 664 710 for_each_btree_key_norestart(trans, iter, BTREE_ID_inodes, POS(0, inum), 665 - BTREE_ITER_ALL_SNAPSHOTS, k, ret) { 711 + BTREE_ITER_all_snapshots, k, ret) { 666 712 if (k.k->p.offset != inum) 667 713 break; 668 714 ··· 682 728 lookup_inode_for_snapshot(struct bch_fs *c, struct inode_walker *w, struct bkey_s_c k) 683 729 { 684 730 bool is_whiteout = k.k->type == KEY_TYPE_whiteout; 685 - u32 snapshot = bch2_snapshot_equiv(c, k.k->p.snapshot); 686 731 687 732 struct inode_walker_entry *i; 688 733 __darray_for_each(w->inodes, i) 689 - if (bch2_snapshot_is_ancestor(c, snapshot, i->snapshot)) 734 + if (bch2_snapshot_is_ancestor(c, k.k->p.snapshot, i->snapshot)) 690 735 goto found; 691 736 692 737 return NULL; 693 738 found: 694 - BUG_ON(snapshot > i->snapshot); 739 + BUG_ON(k.k->p.snapshot > i->snapshot); 695 740 696 - if (snapshot != i->snapshot && !is_whiteout) { 741 + if (k.k->p.snapshot != i->snapshot && !is_whiteout) { 697 742 struct inode_walker_entry new = *i; 698 743 699 - new.snapshot = snapshot; 744 + new.snapshot = k.k->p.snapshot; 700 745 new.count = 0; 701 746 702 747 struct printbuf buf = PRINTBUF; ··· 
704 751 bch_info(c, "have key for inode %llu:%u but have inode in ancestor snapshot %u\n" 705 752 "unexpected because we should always update the inode when we update a key in that inode\n" 706 753 "%s", 707 - w->last_pos.inode, snapshot, i->snapshot, buf.buf); 754 + w->last_pos.inode, k.k->p.snapshot, i->snapshot, buf.buf); 708 755 printbuf_exit(&buf); 709 756 710 - while (i > w->inodes.data && i[-1].snapshot > snapshot) 757 + while (i > w->inodes.data && i[-1].snapshot > k.k->p.snapshot) 711 758 --i; 712 759 713 760 size_t pos = i - w->inodes.data; ··· 739 786 return lookup_inode_for_snapshot(trans->c, w, k); 740 787 } 741 788 742 - static int __get_visible_inodes(struct btree_trans *trans, 743 - struct inode_walker *w, 744 - struct snapshots_seen *s, 745 - u64 inum) 789 + static int get_visible_inodes(struct btree_trans *trans, 790 + struct inode_walker *w, 791 + struct snapshots_seen *s, 792 + u64 inum) 746 793 { 747 794 struct bch_fs *c = trans->c; 748 795 struct btree_iter iter; ··· 752 799 w->inodes.nr = 0; 753 800 754 801 for_each_btree_key_norestart(trans, iter, BTREE_ID_inodes, POS(0, inum), 755 - BTREE_ITER_ALL_SNAPSHOTS, k, ret) { 756 - u32 equiv = bch2_snapshot_equiv(c, k.k->p.snapshot); 757 - 802 + BTREE_ITER_all_snapshots, k, ret) { 758 803 if (k.k->p.offset != inum) 759 804 break; 760 805 761 - if (!ref_visible(c, s, s->pos.snapshot, equiv)) 806 + if (!ref_visible(c, s, s->pos.snapshot, k.k->p.snapshot)) 762 807 continue; 763 808 764 809 if (bkey_is_inode(k.k)) 765 810 add_inode(c, w, k); 766 811 767 - if (equiv >= s->pos.snapshot) 812 + if (k.k->p.snapshot >= s->pos.snapshot) 768 813 break; 769 814 } 770 815 bch2_trans_iter_exit(trans, &iter); ··· 783 832 "key in missing snapshot: %s", 784 833 (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 785 834 ret = bch2_btree_delete_at(trans, iter, 786 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?: 1; 835 + BTREE_UPDATE_internal_snapshot_node) ?: 1; 787 836 fsck_err: 788 837 printbuf_exit(&buf); 789 838 return ret; ··· 812 861 bch2_hash_set_in_snapshot(trans, desc, hash_info, 813 862 (subvol_inum) { 0, k.k->p.inode }, 814 863 k.k->p.snapshot, tmp, 815 - BCH_HASH_SET_MUST_CREATE, 816 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?: 864 + STR_HASH_must_create| 865 + BTREE_UPDATE_internal_snapshot_node) ?: 817 866 bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 818 867 } 819 868 ··· 842 891 843 892 for_each_btree_key_norestart(trans, iter, desc.btree_id, 844 893 SPOS(hash_k.k->p.inode, hash, hash_k.k->p.snapshot), 845 - BTREE_ITER_SLOTS, k, ret) { 894 + BTREE_ITER_slots, k, ret) { 846 895 if (bkey_eq(k.k->p, hash_k.k->p)) 847 896 break; 848 897 ··· 1184 1233 int ret = bch2_trans_run(c, 1185 1234 for_each_btree_key_commit(trans, iter, BTREE_ID_inodes, 1186 1235 POS_MIN, 1187 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 1236 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 1188 1237 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 1189 1238 check_inode(trans, &iter, k, &prev, &s, full))); 1190 1239 ··· 1313 1362 BUG_ON(bkey_le(pos1, bkey_start_pos(&pos2))); 1314 1363 1315 1364 bch2_trans_iter_init(trans, &iter1, btree, pos1, 1316 - BTREE_ITER_ALL_SNAPSHOTS| 1317 - BTREE_ITER_NOT_EXTENTS); 1365 + BTREE_ITER_all_snapshots| 1366 + BTREE_ITER_not_extents); 1318 1367 k1 = bch2_btree_iter_peek_upto(&iter1, POS(pos1.inode, U64_MAX)); 1319 1368 ret = bkey_err(k1); 1320 1369 if (ret) ··· 1376 1425 trans->extra_disk_res += bch2_bkey_sectors_compressed(k2); 1377 1426 1378 1427 ret = bch2_trans_update_extent_overwrite(trans, old_iter, 1379 - 
BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE, 1428 + BTREE_UPDATE_internal_snapshot_node, 1380 1429 k1, k2) ?: 1381 1430 bch2_trans_commit(trans, &res, NULL, BCH_TRANS_COMMIT_no_enospc); 1382 1431 bch2_disk_reservation_put(c, &res); ··· 1417 1466 struct snapshots_seen *seen, 1418 1467 struct extent_ends *extent_ends, 1419 1468 struct bkey_s_c k, 1420 - u32 equiv, 1421 1469 struct btree_iter *iter, 1422 1470 bool *fixed) 1423 1471 { ··· 1485 1535 struct bch_fs *c = trans->c; 1486 1536 struct inode_walker_entry *i; 1487 1537 struct printbuf buf = PRINTBUF; 1488 - struct bpos equiv = k.k->p; 1489 1538 int ret = 0; 1490 - 1491 - equiv.snapshot = bch2_snapshot_equiv(c, k.k->p.snapshot); 1492 1539 1493 1540 ret = check_key_has_snapshot(trans, iter, k); 1494 1541 if (ret) { ··· 1536 1589 bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 1537 1590 goto delete; 1538 1591 1539 - ret = check_overlapping_extents(trans, s, extent_ends, k, 1540 - equiv.snapshot, iter, 1592 + ret = check_overlapping_extents(trans, s, extent_ends, k, iter, 1541 1593 &inode->recalculate_sums); 1542 1594 if (ret) 1543 1595 goto err; ··· 1553 1607 for (; 1554 1608 inode->inodes.data && i >= inode->inodes.data; 1555 1609 --i) { 1556 - if (i->snapshot > equiv.snapshot || 1557 - !key_visible_in_snapshot(c, s, i->snapshot, equiv.snapshot)) 1610 + if (i->snapshot > k.k->p.snapshot || 1611 + !key_visible_in_snapshot(c, s, i->snapshot, k.k->p.snapshot)) 1558 1612 continue; 1559 1613 1560 1614 if (k.k->type != KEY_TYPE_whiteout) { ··· 1571 1625 bch2_btree_iter_set_snapshot(&iter2, i->snapshot); 1572 1626 ret = bch2_btree_iter_traverse(&iter2) ?: 1573 1627 bch2_btree_delete_at(trans, &iter2, 1574 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 1628 + BTREE_UPDATE_internal_snapshot_node); 1575 1629 bch2_trans_iter_exit(trans, &iter2); 1576 1630 if (ret) 1577 1631 goto err; ··· 1598 1652 bch_err_fn(c, ret); 1599 1653 return ret; 1600 1654 delete: 1601 - ret = bch2_btree_delete_at(trans, iter, BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 1655 + ret = bch2_btree_delete_at(trans, iter, BTREE_UPDATE_internal_snapshot_node); 1602 1656 goto out; 1603 1657 } 1604 1658 ··· 1619 1673 int ret = bch2_trans_run(c, 1620 1674 for_each_btree_key_commit(trans, iter, BTREE_ID_extents, 1621 1675 POS(BCACHEFS_ROOT_INO, 0), 1622 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 1676 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 1623 1677 &res, NULL, 1624 1678 BCH_TRANS_COMMIT_no_enospc, ({ 1625 1679 bch2_disk_reservation_put(c, &res); ··· 1644 1698 int ret = bch2_trans_run(c, 1645 1699 for_each_btree_key_commit(trans, iter, BTREE_ID_reflink, 1646 1700 POS_MIN, 1647 - BTREE_ITER_PREFETCH, k, 1701 + BTREE_ITER_prefetch, k, 1648 1702 &res, NULL, 1649 1703 BCH_TRANS_COMMIT_no_enospc, ({ 1650 1704 bch2_disk_reservation_put(c, &res); ··· 1712 1766 1713 1767 if (inode_points_to_dirent(target, d)) 1714 1768 return 0; 1769 + 1770 + if (bch2_inode_should_have_bp(target) && 1771 + !fsck_err(c, inode_wrong_backpointer, 1772 + "dirent points to inode that does not point back:\n %s", 1773 + (bch2_bkey_val_to_text(&buf, c, d.s_c), 1774 + prt_printf(&buf, "\n "), 1775 + bch2_inode_unpacked_to_text(&buf, target), 1776 + buf.buf))) 1777 + goto out_noiter; 1715 1778 1716 1779 if (!target->bi_dir && 1717 1780 !target->bi_dir_offset) { ··· 1790 1835 err: 1791 1836 fsck_err: 1792 1837 bch2_trans_iter_exit(trans, &bp_iter); 1838 + out_noiter: 1793 1839 printbuf_exit(&buf); 1794 1840 bch_err_fn(c, ret); 1795 1841 return ret; ··· 2008 2052 struct bch_fs *c = trans->c; 2009 2053 struct 
inode_walker_entry *i; 2010 2054 struct printbuf buf = PRINTBUF; 2011 - struct bpos equiv; 2012 2055 int ret = 0; 2013 2056 2014 2057 ret = check_key_has_snapshot(trans, iter, k); ··· 2015 2060 ret = ret < 0 ? ret : 0; 2016 2061 goto out; 2017 2062 } 2018 - 2019 - equiv = k.k->p; 2020 - equiv.snapshot = bch2_snapshot_equiv(c, k.k->p.snapshot); 2021 2063 2022 2064 ret = snapshots_seen_update(c, s, iter->btree_id, k.k->p); 2023 2065 if (ret) ··· 2056 2104 (printbuf_reset(&buf), 2057 2105 bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 2058 2106 ret = bch2_btree_delete_at(trans, iter, 2059 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 2107 + BTREE_UPDATE_internal_snapshot_node); 2060 2108 goto out; 2061 2109 } 2062 2110 ··· 2092 2140 if (ret) 2093 2141 goto err; 2094 2142 } else { 2095 - ret = __get_visible_inodes(trans, target, s, le64_to_cpu(d.v->d_inum)); 2143 + ret = get_visible_inodes(trans, target, s, le64_to_cpu(d.v->d_inum)); 2096 2144 if (ret) 2097 2145 goto err; 2098 2146 2099 2147 if (fsck_err_on(!target->inodes.nr, 2100 2148 c, dirent_to_missing_inode, 2101 - "dirent points to missing inode: (equiv %u)\n%s", 2102 - equiv.snapshot, 2149 + "dirent points to missing inode:\n%s", 2103 2150 (printbuf_reset(&buf), 2104 2151 bch2_bkey_val_to_text(&buf, c, k), 2105 2152 buf.buf))) { ··· 2115 2164 } 2116 2165 2117 2166 if (d.v->d_type == DT_DIR) 2118 - for_each_visible_inode(c, s, dir, equiv.snapshot, i) 2167 + for_each_visible_inode(c, s, dir, d.k->p.snapshot, i) 2119 2168 i->count++; 2120 2169 } 2121 2170 out: ··· 2142 2191 int ret = bch2_trans_run(c, 2143 2192 for_each_btree_key_commit(trans, iter, BTREE_ID_dirents, 2144 2193 POS(BCACHEFS_ROOT_INO, 0), 2145 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, 2194 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, 2146 2195 k, 2147 2196 NULL, NULL, 2148 2197 BCH_TRANS_COMMIT_no_enospc, ··· 2206 2255 ret = bch2_trans_run(c, 2207 2256 for_each_btree_key_commit(trans, iter, BTREE_ID_xattrs, 2208 2257 POS(BCACHEFS_ROOT_INO, 0), 2209 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, 2258 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, 2210 2259 k, 2211 2260 NULL, NULL, 2212 2261 BCH_TRANS_COMMIT_no_enospc, ··· 2373 2422 { 2374 2423 int ret = bch2_trans_run(c, 2375 2424 for_each_btree_key_commit(trans, iter, 2376 - BTREE_ID_subvolumes, POS_MIN, BTREE_ITER_PREFETCH, k, 2425 + BTREE_ID_subvolumes, POS_MIN, BTREE_ITER_prefetch, k, 2377 2426 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 2378 2427 check_subvol_path(trans, &iter, k))); 2379 2428 bch_err_fn(c, ret); ··· 2408 2457 struct btree_iter inode_iter = {}; 2409 2458 struct bch_inode_unpacked inode; 2410 2459 struct printbuf buf = PRINTBUF; 2411 - u32 snapshot = bch2_snapshot_equiv(c, inode_k.k->p.snapshot); 2460 + u32 snapshot = inode_k.k->p.snapshot; 2412 2461 int ret = 0; 2413 2462 2414 2463 p->nr = 0; ··· 2510 2559 2511 2560 ret = bch2_trans_run(c, 2512 2561 for_each_btree_key_commit(trans, iter, BTREE_ID_inodes, POS_MIN, 2513 - BTREE_ITER_INTENT| 2514 - BTREE_ITER_PREFETCH| 2515 - BTREE_ITER_ALL_SNAPSHOTS, k, 2562 + BTREE_ITER_intent| 2563 + BTREE_ITER_prefetch| 2564 + BTREE_ITER_all_snapshots, k, 2516 2565 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({ 2517 2566 if (!bkey_is_inode(k.k)) 2518 2567 continue; ··· 2612 2661 int ret = bch2_trans_run(c, 2613 2662 for_each_btree_key(trans, iter, BTREE_ID_inodes, 2614 2663 POS(0, start), 2615 - BTREE_ITER_INTENT| 2616 - BTREE_ITER_PREFETCH| 2617 - BTREE_ITER_ALL_SNAPSHOTS, k, ({ 2664 + BTREE_ITER_intent| 2665 + BTREE_ITER_prefetch| 2666 + BTREE_ITER_all_snapshots, k, ({ 
2618 2667 if (!bkey_is_inode(k.k)) 2619 2668 continue; 2620 2669 ··· 2655 2704 2656 2705 int ret = bch2_trans_run(c, 2657 2706 for_each_btree_key(trans, iter, BTREE_ID_dirents, POS_MIN, 2658 - BTREE_ITER_INTENT| 2659 - BTREE_ITER_PREFETCH| 2660 - BTREE_ITER_ALL_SNAPSHOTS, k, ({ 2707 + BTREE_ITER_intent| 2708 + BTREE_ITER_prefetch| 2709 + BTREE_ITER_all_snapshots, k, ({ 2661 2710 ret = snapshots_seen_update(c, &s, iter.btree_id, k.k->p); 2662 2711 if (ret) 2663 2712 break; ··· 2668 2717 if (d.v->d_type != DT_DIR && 2669 2718 d.v->d_type != DT_SUBVOL) 2670 2719 inc_link(c, &s, links, range_start, range_end, 2671 - le64_to_cpu(d.v->d_inum), 2672 - bch2_snapshot_equiv(c, d.k->p.snapshot)); 2720 + le64_to_cpu(d.v->d_inum), d.k->p.snapshot); 2673 2721 } 2674 2722 0; 2675 2723 }))); ··· 2731 2781 int ret = bch2_trans_run(c, 2732 2782 for_each_btree_key_commit(trans, iter, BTREE_ID_inodes, 2733 2783 POS(0, range_start), 2734 - BTREE_ITER_INTENT|BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 2784 + BTREE_ITER_intent|BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 2735 2785 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 2736 2786 check_nlinks_update_inode(trans, &iter, k, links, &idx, range_end))); 2737 2787 if (ret < 0) { ··· 2799 2849 u->v.front_pad = 0; 2800 2850 u->v.back_pad = 0; 2801 2851 2802 - return bch2_trans_update(trans, iter, &u->k_i, BTREE_TRIGGER_NORUN); 2852 + return bch2_trans_update(trans, iter, &u->k_i, BTREE_TRIGGER_norun); 2803 2853 } 2804 2854 2805 2855 int bch2_fix_reflink_p(struct bch_fs *c) ··· 2810 2860 int ret = bch2_trans_run(c, 2811 2861 for_each_btree_key_commit(trans, iter, 2812 2862 BTREE_ID_extents, POS_MIN, 2813 - BTREE_ITER_INTENT|BTREE_ITER_PREFETCH| 2814 - BTREE_ITER_ALL_SNAPSHOTS, k, 2863 + BTREE_ITER_intent|BTREE_ITER_prefetch| 2864 + BTREE_ITER_all_snapshots, k, 2815 2865 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 2816 2866 fix_reflink_p_key(trans, &iter, k))); 2817 2867 bch_err_fn(c, ret);
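Note on the fsck.c hunks above: the (id, equiv) pair tracking in snapshots_seen is replaced by a plain snapshot_id_list of raw snapshot IDs, snapshots_seen_add_inorder() becomes a sorted deduplicated insert, and snapshots_seen_update() now just delegates to snapshot_list_add_nodup(). A minimal userspace sketch of that insert invariant, with a hand-rolled array standing in for the kernel's darray helpers (stand-in types, not the real implementation):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct id_list {
	uint32_t	*data;
	size_t		nr, size;
};

static int id_list_add_inorder(struct id_list *s, uint32_t id)
{
	size_t i;

	for (i = 0; i < s->nr; i++) {
		if (s->data[i] == id)
			return 0;	/* already present: nothing to do */
		if (s->data[i] > id)
			break;		/* insert position found */
	}

	if (s->nr == s->size) {
		size_t new_size = s->size ? s->size * 2 : 8;
		uint32_t *d = realloc(s->data, new_size * sizeof(*d));

		if (!d)
			return -1;
		s->data = d;
		s->size = new_size;
	}

	/* Shift the tail up one slot, keeping the list sorted: */
	memmove(&s->data[i + 1], &s->data[i],
		(s->nr - i) * sizeof(s->data[0]));
	s->data[i] = id;
	s->nr++;
	return 0;
}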
+27 -37
fs/bcachefs/inode.c
··· 339 339 340 340 k = bch2_bkey_get_iter(trans, iter, BTREE_ID_inodes, 341 341 SPOS(0, inum.inum, snapshot), 342 - flags|BTREE_ITER_CACHED); 342 + flags|BTREE_ITER_cached); 343 343 ret = bkey_err(k); 344 344 if (ret) 345 345 return ret; ··· 371 371 int bch2_inode_write_flags(struct btree_trans *trans, 372 372 struct btree_iter *iter, 373 373 struct bch_inode_unpacked *inode, 374 - enum btree_update_flags flags) 374 + enum btree_iter_update_trigger_flags flags) 375 375 { 376 376 struct bkey_inode_buf *inode_p; 377 377 ··· 399 399 400 400 return bch2_btree_insert_nonextent(trans, BTREE_ID_inodes, 401 401 &inode_p->inode.k_i, 402 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 402 + BTREE_UPDATE_internal_snapshot_node); 403 403 } 404 404 405 405 int bch2_fsck_write_inode(struct btree_trans *trans, ··· 473 473 } 474 474 475 475 int bch2_inode_invalid(struct bch_fs *c, struct bkey_s_c k, 476 - enum bkey_invalid_flags flags, 476 + enum bch_validate_flags flags, 477 477 struct printbuf *err) 478 478 { 479 479 struct bkey_s_c_inode inode = bkey_s_c_to_inode(k); ··· 490 490 } 491 491 492 492 int bch2_inode_v2_invalid(struct bch_fs *c, struct bkey_s_c k, 493 - enum bkey_invalid_flags flags, 493 + enum bch_validate_flags flags, 494 494 struct printbuf *err) 495 495 { 496 496 struct bkey_s_c_inode_v2 inode = bkey_s_c_to_inode_v2(k); ··· 507 507 } 508 508 509 509 int bch2_inode_v3_invalid(struct bch_fs *c, struct bkey_s_c k, 510 - enum bkey_invalid_flags flags, 510 + enum bch_validate_flags flags, 511 511 struct printbuf *err) 512 512 { 513 513 struct bkey_s_c_inode_v3 inode = bkey_s_c_to_inode_v3(k); ··· 535 535 struct bch_inode_unpacked *inode) 536 536 { 537 537 printbuf_indent_add(out, 2); 538 - prt_printf(out, "mode=%o", inode->bi_mode); 539 - prt_newline(out); 538 + prt_printf(out, "mode=%o\n", inode->bi_mode); 540 539 541 540 prt_str(out, "flags="); 542 541 prt_bitflags(out, bch2_inode_flag_strs, inode->bi_flags & ((1U << 20) - 1)); 543 - prt_printf(out, " (%x)", inode->bi_flags); 544 - prt_newline(out); 542 + prt_printf(out, " (%x)\n", inode->bi_flags); 545 543 546 - prt_printf(out, "journal_seq=%llu", inode->bi_journal_seq); 547 - prt_newline(out); 548 - 549 - prt_printf(out, "bi_size=%llu", inode->bi_size); 550 - prt_newline(out); 551 - 552 - prt_printf(out, "bi_sectors=%llu", inode->bi_sectors); 553 - prt_newline(out); 554 - 555 - prt_printf(out, "bi_version=%llu", inode->bi_version); 556 - prt_newline(out); 544 + prt_printf(out, "journal_seq=%llu\n", inode->bi_journal_seq); 545 + prt_printf(out, "bi_size=%llu\n", inode->bi_size); 546 + prt_printf(out, "bi_sectors=%llu\n", inode->bi_sectors); 547 + prt_printf(out, "bi_version=%llu\n", inode->bi_version); 557 548 558 549 #define x(_name, _bits) \ 559 - prt_printf(out, #_name "=%llu", (u64) inode->_name); \ 560 - prt_newline(out); 550 + prt_printf(out, #_name "=%llu\n", (u64) inode->_name); 561 551 BCH_INODE_FIELDS_v3() 562 552 #undef x 563 553 printbuf_indent_sub(out, 2); ··· 594 604 enum btree_id btree_id, unsigned level, 595 605 struct bkey_s_c old, 596 606 struct bkey_s new, 597 - unsigned flags) 607 + enum btree_iter_update_trigger_flags flags) 598 608 { 599 609 s64 nr = (s64) bkey_is_inode(new.k) - (s64) bkey_is_inode(old.k); 600 610 601 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 611 + if (flags & BTREE_TRIGGER_transactional) { 602 612 if (nr) { 603 613 int ret = bch2_replicas_deltas_realloc(trans, 0); 604 614 if (ret) ··· 617 627 } 618 628 } 619 629 620 - if ((flags & BTREE_TRIGGER_ATOMIC) && (flags & BTREE_TRIGGER_INSERT)) { 630 + if 
((flags & BTREE_TRIGGER_atomic) && (flags & BTREE_TRIGGER_insert)) { 621 631 BUG_ON(!trans->journal_res.seq); 622 632 623 633 bkey_s_to_inode_v3(new).v->bi_journal_seq = cpu_to_le64(trans->journal_res.seq); 624 634 } 625 635 626 - if (flags & BTREE_TRIGGER_GC) { 636 + if (flags & BTREE_TRIGGER_gc) { 627 637 struct bch_fs *c = trans->c; 628 638 629 639 percpu_down_read(&c->mark_lock); ··· 635 645 } 636 646 637 647 int bch2_inode_generation_invalid(struct bch_fs *c, struct bkey_s_c k, 638 - enum bkey_invalid_flags flags, 648 + enum bch_validate_flags flags, 639 649 struct printbuf *err) 640 650 { 641 651 int ret = 0; ··· 752 762 753 763 pos = start; 754 764 bch2_trans_iter_init(trans, iter, BTREE_ID_inodes, POS(0, pos), 755 - BTREE_ITER_ALL_SNAPSHOTS| 756 - BTREE_ITER_INTENT); 765 + BTREE_ITER_all_snapshots| 766 + BTREE_ITER_intent); 757 767 again: 758 768 while ((k = bch2_btree_iter_peek(iter)).k && 759 769 !(ret = bkey_err(k)) && ··· 814 824 * extent iterator: 815 825 */ 816 826 bch2_trans_iter_init(trans, &iter, id, POS(inum.inum, 0), 817 - BTREE_ITER_INTENT); 827 + BTREE_ITER_intent); 818 828 819 829 while (1) { 820 830 bch2_trans_begin(trans); ··· 836 846 bkey_init(&delete.k); 837 847 delete.k.p = iter.pos; 838 848 839 - if (iter.flags & BTREE_ITER_IS_EXTENTS) 849 + if (iter.flags & BTREE_ITER_is_extents) 840 850 bch2_key_resize(&delete.k, 841 851 bpos_min(end, k.k->p).offset - 842 852 iter.pos.offset); ··· 885 895 886 896 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_inodes, 887 897 SPOS(0, inum.inum, snapshot), 888 - BTREE_ITER_INTENT|BTREE_ITER_CACHED); 898 + BTREE_ITER_intent|BTREE_ITER_cached); 889 899 ret = bkey_err(k); 890 900 if (ret) 891 901 goto err; ··· 1045 1055 bch2_trans_begin(trans); 1046 1056 1047 1057 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_inodes, 1048 - SPOS(0, inum, snapshot), BTREE_ITER_INTENT); 1058 + SPOS(0, inum, snapshot), BTREE_ITER_intent); 1049 1059 ret = bkey_err(k); 1050 1060 if (ret) 1051 1061 goto err; ··· 1090 1100 struct bch_inode_unpacked inode; 1091 1101 int ret; 1092 1102 1093 - k = bch2_bkey_get_iter(trans, &inode_iter, BTREE_ID_inodes, pos, BTREE_ITER_CACHED); 1103 + k = bch2_bkey_get_iter(trans, &inode_iter, BTREE_ID_inodes, pos, BTREE_ITER_cached); 1094 1104 ret = bkey_err(k); 1095 1105 if (ret) 1096 1106 return ret; ··· 1142 1152 inode.bi_flags &= ~BCH_INODE_unlinked; 1143 1153 1144 1154 ret = bch2_inode_write_flags(trans, &inode_iter, &inode, 1145 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 1155 + BTREE_UPDATE_internal_snapshot_node); 1146 1156 bch_err_msg(c, ret, "clearing inode unlinked flag"); 1147 1157 if (ret) 1148 1158 goto out; ··· 1189 1199 * flushed and we'd spin: 1190 1200 */ 1191 1201 ret = for_each_btree_key_commit(trans, iter, BTREE_ID_deleted_inodes, POS_MIN, 1192 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 1202 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 1193 1203 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({ 1194 1204 ret = may_delete_deleted_inode(trans, &iter, k.k->p, &need_another_pass); 1195 1205 if (ret > 0) {
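Besides the flag-enum renames, bch2_inode_unpacked_to_text() in inode.c shows the printbuf cleanup that runs through this whole pull: a prt_printf() followed by prt_newline() collapses into a single call with '\n' in the format string. The pattern as a fragment (kernel context assumed, field names taken from the hunk above); the same consolidation appears again in io_write.c and journal.c below:

/* Before (two calls per field): */
static void inode_to_text_old(struct printbuf *out,
			      struct bch_inode_unpacked *inode)
{
	prt_printf(out, "bi_size=%llu", inode->bi_size);
	prt_newline(out);
	prt_printf(out, "bi_sectors=%llu", inode->bi_sectors);
	prt_newline(out);
}

/* After (newline folded into the format string): */
static void inode_to_text_new(struct printbuf *out,
			      struct bch_inode_unpacked *inode)
{
	prt_printf(out, "bi_size=%llu\n", inode->bi_size);
	prt_printf(out, "bi_sectors=%llu\n", inode->bi_sectors);
}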
+16 -7
fs/bcachefs/inode.h
··· 6 6 #include "bkey_methods.h" 7 7 #include "opts.h" 8 8 9 - enum bkey_invalid_flags; 9 + enum bch_validate_flags; 10 10 extern const char * const bch2_inode_opts[]; 11 11 12 12 int bch2_inode_invalid(struct bch_fs *, struct bkey_s_c, 13 - enum bkey_invalid_flags, struct printbuf *); 13 + enum bch_validate_flags, struct printbuf *); 14 14 int bch2_inode_v2_invalid(struct bch_fs *, struct bkey_s_c, 15 - enum bkey_invalid_flags, struct printbuf *); 15 + enum bch_validate_flags, struct printbuf *); 16 16 int bch2_inode_v3_invalid(struct bch_fs *, struct bkey_s_c, 17 - enum bkey_invalid_flags, struct printbuf *); 17 + enum bch_validate_flags, struct printbuf *); 18 18 void bch2_inode_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 19 19 20 20 int bch2_trigger_inode(struct btree_trans *, enum btree_id, unsigned, 21 - struct bkey_s_c, struct bkey_s, unsigned); 21 + struct bkey_s_c, struct bkey_s, 22 + enum btree_iter_update_trigger_flags); 22 23 23 24 #define bch2_bkey_ops_inode ((struct bkey_ops) { \ 24 25 .key_invalid = bch2_inode_invalid, \ ··· 50 49 } 51 50 52 51 int bch2_inode_generation_invalid(struct bch_fs *, struct bkey_s_c, 53 - enum bkey_invalid_flags, struct printbuf *); 52 + enum bch_validate_flags, struct printbuf *); 54 53 void bch2_inode_generation_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 55 54 56 55 #define bch2_bkey_ops_inode_generation ((struct bkey_ops) { \ ··· 102 101 struct bch_inode_unpacked *, subvol_inum, unsigned); 103 102 104 103 int bch2_inode_write_flags(struct btree_trans *, struct btree_iter *, 105 - struct bch_inode_unpacked *, enum btree_update_flags); 104 + struct bch_inode_unpacked *, enum btree_iter_update_trigger_flags); 106 105 107 106 static inline int bch2_inode_write(struct btree_trans *trans, 108 107 struct btree_iter *iter, ··· 220 219 221 220 int bch2_inode_nlink_inc(struct bch_inode_unpacked *); 222 221 void bch2_inode_nlink_dec(struct btree_trans *, struct bch_inode_unpacked *); 222 + 223 + static inline bool bch2_inode_should_have_bp(struct bch_inode_unpacked *inode) 224 + { 225 + bool inode_has_bp = inode->bi_dir || inode->bi_dir_offset; 226 + 227 + return S_ISDIR(inode->bi_mode) || 228 + (!inode->bi_nlink && inode_has_bp); 229 + } 223 230 224 231 struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *); 225 232 void bch2_inode_opts_get(struct bch_io_opts *, struct bch_fs *,
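The new bch2_inode_should_have_bp() helper in inode.h encodes when an inode is expected to carry a dirent backpointer: always for directories, and for other inodes once the link count is zero and a backpointer was already recorded. A condensed sketch of its consumer in the fsck.c hunk above (error message shortened, printbuf plumbing and the repair path elided):

	/* Fast path: dirent and inode already agree. */
	if (inode_points_to_dirent(target, d))
		return 0;

	/* Only complain when a backpointer is actually expected: */
	if (bch2_inode_should_have_bp(target) &&
	    !fsck_err(c, inode_wrong_backpointer,
		      "dirent points to inode that does not point back"))
		goto out_noiter;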
+5 -5
fs/bcachefs/io_misc.c
··· 198 198 199 199 bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, 200 200 POS(inum.inum, start), 201 - BTREE_ITER_INTENT); 201 + BTREE_ITER_intent); 202 202 203 203 ret = bch2_fpunch_at(trans, &iter, inum, end, i_sectors_delta); 204 204 ··· 230 230 struct bch_inode_unpacked inode_u; 231 231 int ret; 232 232 233 - ret = bch2_inode_peek(trans, &iter, &inode_u, inum, BTREE_ITER_INTENT) ?: 233 + ret = bch2_inode_peek(trans, &iter, &inode_u, inum, BTREE_ITER_intent) ?: 234 234 (inode_u.bi_size = new_i_size, 0) ?: 235 235 bch2_inode_write(trans, &iter, &inode_u); 236 236 ··· 256 256 257 257 bch2_trans_iter_init(trans, &fpunch_iter, BTREE_ID_extents, 258 258 POS(inum.inum, round_up(new_i_size, block_bytes(c)) >> 9), 259 - BTREE_ITER_INTENT); 259 + BTREE_ITER_intent); 260 260 ret = bch2_fpunch_at(trans, &fpunch_iter, inum, U64_MAX, i_sectors_delta); 261 261 bch2_trans_iter_exit(trans, &fpunch_iter); 262 262 ··· 317 317 offset <<= 9; 318 318 len <<= 9; 319 319 320 - ret = bch2_inode_peek(trans, &iter, &inode_u, inum, BTREE_ITER_INTENT); 320 + ret = bch2_inode_peek(trans, &iter, &inode_u, inum, BTREE_ITER_intent); 321 321 if (ret) 322 322 return ret; 323 323 ··· 365 365 366 366 bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, 367 367 POS(inum.inum, 0), 368 - BTREE_ITER_INTENT); 368 + BTREE_ITER_intent); 369 369 370 370 switch (op->v.state) { 371 371 case LOGGED_OP_FINSERT_start:
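io_misc.c sees only the mechanical rename of BTREE_ITER_INTENT to BTREE_ITER_intent, but it is the cleanest view of the convention change across the pull: the shouty iterator/update/trigger flag macros become lower-case enumerators of one named type, so prototypes such as bch2_inode_write_flags() can state what they accept. An illustrative sketch only; the bit values here are invented and the real definitions live in btree headers outside this diff:

enum btree_iter_update_trigger_flags {
	BTREE_ITER_slots			= 1 << 0,
	BTREE_ITER_intent			= 1 << 1,
	BTREE_ITER_prefetch			= 1 << 2,
	BTREE_ITER_all_snapshots		= 1 << 3,
	BTREE_UPDATE_internal_snapshot_node	= 1 << 4,
	BTREE_TRIGGER_transactional		= 1 << 5,
};

/* Prototypes can now name the type instead of taking 'unsigned': */
int bch2_inode_write_flags(struct btree_trans *, struct btree_iter *,
			   struct bch_inode_unpacked *,
			   enum btree_iter_update_trigger_flags);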
+38 -30
fs/bcachefs/io_read.c
··· 378 378 bch2_bkey_buf_init(&sk); 379 379 380 380 bch2_trans_iter_init(trans, &iter, rbio->data_btree, 381 - rbio->read_pos, BTREE_ITER_SLOTS); 381 + rbio->read_pos, BTREE_ITER_slots); 382 382 retry: 383 383 rbio->bio.bi_status = 0; 384 384 ··· 487 487 return 0; 488 488 489 489 k = bch2_bkey_get_iter(trans, &iter, rbio->data_btree, rbio->data_pos, 490 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT); 490 + BTREE_ITER_slots|BTREE_ITER_intent); 491 491 if ((ret = bkey_err(k))) 492 492 goto out; 493 493 ··· 523 523 goto out; 524 524 525 525 ret = bch2_trans_update(trans, &iter, new, 526 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 526 + BTREE_UPDATE_internal_snapshot_node); 527 527 out: 528 528 bch2_trans_iter_exit(trans, &iter); 529 529 return ret; ··· 541 541 struct bch_read_bio *rbio = 542 542 container_of(work, struct bch_read_bio, work); 543 543 struct bch_fs *c = rbio->c; 544 - struct bch_dev *ca = bch_dev_bkey_exists(c, rbio->pick.ptr.dev); 545 544 struct bio *src = &rbio->bio; 546 545 struct bio *dst = &bch2_rbio_parent(rbio)->bio; 547 546 struct bvec_iter dst_iter = rbio->bvec_iter; ··· 646 647 prt_str(&buf, "data "); 647 648 bch2_csum_err_msg(&buf, crc.csum_type, rbio->pick.crc.csum, csum); 648 649 649 - bch_err_inum_offset_ratelimited(ca, 650 - rbio->read_pos.inode, 651 - rbio->read_pos.offset << 9, 652 - "data %s", buf.buf); 650 + struct bch_dev *ca = rbio->have_ioref ? bch2_dev_have_ref(c, rbio->pick.ptr.dev) : NULL; 651 + if (ca) { 652 + bch_err_inum_offset_ratelimited(ca, 653 + rbio->read_pos.inode, 654 + rbio->read_pos.offset << 9, 655 + "data %s", buf.buf); 656 + bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); 657 + } 653 658 printbuf_exit(&buf); 654 - 655 - bch2_io_error(ca, BCH_MEMBER_ERROR_checksum); 656 659 bch2_rbio_error(rbio, READ_RETRY_AVOID, BLK_STS_IOERR); 657 660 goto out; 658 661 decompression_err: ··· 676 675 struct bch_read_bio *rbio = 677 676 container_of(bio, struct bch_read_bio, bio); 678 677 struct bch_fs *c = rbio->c; 679 - struct bch_dev *ca = bch_dev_bkey_exists(c, rbio->pick.ptr.dev); 678 + struct bch_dev *ca = rbio->have_ioref ? 
bch2_dev_have_ref(c, rbio->pick.ptr.dev) : NULL; 680 679 struct workqueue_struct *wq = NULL; 681 680 enum rbio_context context = RBIO_CONTEXT_NULL; 682 681 ··· 688 687 if (!rbio->split) 689 688 rbio->bio.bi_end_io = rbio->end_io; 690 689 691 - if (bch2_dev_inum_io_err_on(bio->bi_status, ca, BCH_MEMBER_ERROR_read, 692 - rbio->read_pos.inode, 693 - rbio->read_pos.offset, 694 - "data read error: %s", 695 - bch2_blk_status_to_str(bio->bi_status))) { 690 + if (bio->bi_status) { 691 + if (ca) { 692 + bch_err_inum_offset_ratelimited(ca, 693 + rbio->read_pos.inode, 694 + rbio->read_pos.offset, 695 + "data read error: %s", 696 + bch2_blk_status_to_str(bio->bi_status)); 697 + bch2_io_error(ca, BCH_MEMBER_ERROR_read); 698 + } 696 699 bch2_rbio_error(rbio, READ_RETRY_AVOID, bio->bi_status); 697 700 return; 698 701 } 699 702 700 703 if (((rbio->flags & BCH_READ_RETRY_IF_STALE) && race_fault()) || 701 - ptr_stale(ca, &rbio->pick.ptr)) { 704 + (ca && dev_ptr_stale(ca, &rbio->pick.ptr))) { 702 705 trace_and_count(c, read_reuse_race, &rbio->bio); 703 706 704 707 if (rbio->flags & BCH_READ_RETRY_IF_STALE) ··· 763 758 } 764 759 765 760 static noinline void read_from_stale_dirty_pointer(struct btree_trans *trans, 761 + struct bch_dev *ca, 766 762 struct bkey_s_c k, 767 763 struct bch_extent_ptr ptr) 768 764 { 769 765 struct bch_fs *c = trans->c; 770 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr.dev); 771 766 struct btree_iter iter; 772 767 struct printbuf buf = PRINTBUF; 773 768 int ret; 774 769 775 770 bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc, 776 - PTR_BUCKET_POS(c, &ptr), 777 - BTREE_ITER_CACHED); 771 + PTR_BUCKET_POS(ca, &ptr), 772 + BTREE_ITER_cached); 778 773 779 - prt_printf(&buf, "Attempting to read from stale dirty pointer:"); 774 + prt_printf(&buf, "Attempting to read from stale dirty pointer:\n"); 780 775 printbuf_indent_add(&buf, 2); 781 - prt_newline(&buf); 782 776 783 777 bch2_bkey_val_to_text(&buf, c, k); 784 778 prt_newline(&buf); ··· 805 801 struct bch_fs *c = trans->c; 806 802 struct extent_ptr_decoded pick; 807 803 struct bch_read_bio *rbio = NULL; 808 - struct bch_dev *ca = NULL; 809 804 struct promote_op *promote = NULL; 810 805 bool bounce = false, read_full = false, narrow_crcs = false; 811 806 struct bpos data_pos = bkey_start_pos(k.k); ··· 835 832 goto err; 836 833 } 837 834 838 - ca = bch_dev_bkey_exists(c, pick.ptr.dev); 835 + struct bch_dev *ca = bch2_dev_get_ioref(c, pick.ptr.dev, READ); 839 836 840 837 /* 841 838 * Stale dirty pointers are treated as IO errors, but @failed isn't ··· 845 842 */ 846 843 if ((flags & BCH_READ_IN_RETRY) && 847 844 !pick.ptr.cached && 848 - unlikely(ptr_stale(ca, &pick.ptr))) { 849 - read_from_stale_dirty_pointer(trans, k, pick.ptr); 845 + ca && 846 + unlikely(dev_ptr_stale(ca, &pick.ptr))) { 847 + read_from_stale_dirty_pointer(trans, ca, k, pick.ptr); 850 848 bch2_mark_io_failure(failed, &pick); 849 + percpu_ref_put(&ca->io_ref); 851 850 goto retry_pick; 852 851 } 853 852 ··· 864 859 * can happen if we retry, and the extent we were going to read 865 860 * has been merged in the meantime: 866 861 */ 867 - if (pick.crc.compressed_size > orig->bio.bi_vcnt * PAGE_SECTORS) 862 + if (pick.crc.compressed_size > orig->bio.bi_vcnt * PAGE_SECTORS) { 863 + if (ca) 864 + percpu_ref_put(&ca->io_ref); 868 865 goto hole; 866 + } 869 867 870 868 iter.bi_size = pick.crc.compressed_size << 9; 871 869 goto get_bio; ··· 973 965 rbio->bvec_iter = iter; 974 966 rbio->offset_into_extent= offset_into_extent; 975 967 rbio->flags = flags; 976 - rbio->have_ioref = 
pick_ret > 0 && bch2_dev_get_ioref(ca, READ); 968 + rbio->have_ioref = ca != NULL; 977 969 rbio->narrow_crcs = narrow_crcs; 978 970 rbio->hole = 0; 979 971 rbio->retry = 0; ··· 1003 995 * If it's being moved internally, we don't want to flag it as a cache 1004 996 * hit: 1005 997 */ 1006 - if (pick.ptr.cached && !(flags & BCH_READ_NODECODE)) 998 + if (ca && pick.ptr.cached && !(flags & BCH_READ_NODECODE)) 1007 999 bch2_bucket_io_time_reset(trans, pick.ptr.dev, 1008 1000 PTR_BUCKET_NR(ca, &pick.ptr), READ); 1009 1001 ··· 1121 1113 1122 1114 bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, 1123 1115 SPOS(inum.inum, bvec_iter.bi_sector, snapshot), 1124 - BTREE_ITER_SLOTS); 1116 + BTREE_ITER_slots); 1125 1117 while (1) { 1126 1118 unsigned bytes, sectors, offset_into_extent; 1127 1119 enum btree_id data_btree = BTREE_ID_extents;
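The io_read.c hunks retire bch_dev_bkey_exists(), which assumed the device behind a pointer must exist: lookups now go through bch2_dev_get_ioref(), a NULL return is handled on every path (error message, stale-pointer retry, cache-hit accounting), rbio->have_ioref simply records whether the ref was obtained, and the ref is dropped with percpu_ref_put(). A kernel-style sketch of the discipline; do_read_from_dev() is a hypothetical stand-in, and the real code marks the failed device and retries other replicas rather than failing outright:

static int read_one_ptr(struct bch_fs *c, struct bch_extent_ptr ptr)
{
	/* Lookup and io ref in one step; NULL means the device is gone: */
	struct bch_dev *ca = bch2_dev_get_ioref(c, ptr.dev, READ);

	if (!ca)
		return -EIO;	/* simplified: don't dereference, don't assume */

	int ret = do_read_from_dev(ca, ptr);

	percpu_ref_put(&ca->io_ref);
	return ret;
}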
+49 -46
fs/bcachefs/io_write.c
··· 166 166 bch2_trans_copy_iter(&iter, extent_iter); 167 167 168 168 for_each_btree_key_upto_continue_norestart(iter, 169 - new->k.p, BTREE_ITER_SLOTS, old, ret) { 169 + new->k.p, BTREE_ITER_slots, old, ret) { 170 170 s64 sectors = min(new->k.p.offset, old.k->p.offset) - 171 171 max(bkey_start_offset(&new->k), 172 172 bkey_start_offset(old.k)); ··· 210 210 * to be journalled - if we crash, the bi_journal_seq update will be 211 211 * lost, but that's fine. 212 212 */ 213 - unsigned inode_update_flags = BTREE_UPDATE_NOJOURNAL; 213 + unsigned inode_update_flags = BTREE_UPDATE_nojournal; 214 214 215 215 struct btree_iter iter; 216 216 struct bkey_s_c k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_inodes, 217 217 SPOS(0, 218 218 extent_iter->pos.inode, 219 219 extent_iter->snapshot), 220 - BTREE_ITER_CACHED); 220 + BTREE_ITER_cached); 221 221 int ret = bkey_err(k); 222 222 if (unlikely(ret)) 223 223 return ret; ··· 259 259 } 260 260 261 261 ret = bch2_trans_update(trans, &iter, &inode->k_i, 262 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE| 262 + BTREE_UPDATE_internal_snapshot_node| 263 263 inode_update_flags); 264 264 err: 265 265 bch2_trans_iter_exit(trans, &iter); ··· 368 368 369 369 bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, 370 370 bkey_start_pos(&sk.k->k), 371 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT); 371 + BTREE_ITER_slots|BTREE_ITER_intent); 372 372 373 373 ret = bch2_bkey_set_needs_rebalance(c, sk.k, &op->opts) ?: 374 374 bch2_extent_update(trans, inum, &iter, sk.k, ··· 407 407 BUG_ON(c->opts.nochanges); 408 408 409 409 bkey_for_each_ptr(ptrs, ptr) { 410 - BUG_ON(!bch2_dev_exists2(c, ptr->dev)); 411 - 412 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 410 + struct bch_dev *ca = nocow 411 + ? bch2_dev_have_ref(c, ptr->dev) 412 + : bch2_dev_get_ioref(c, ptr->dev, type == BCH_DATA_btree ? READ : WRITE); 413 413 414 414 if (to_entry(ptr + 1) < ptrs.end) { 415 - n = to_wbio(bio_alloc_clone(NULL, &wbio->bio, 416 - GFP_NOFS, &ca->replica_set)); 415 + n = to_wbio(bio_alloc_clone(NULL, &wbio->bio, GFP_NOFS, &c->replica_set)); 417 416 418 417 n->bio.bi_end_io = wbio->bio.bi_end_io; 419 418 n->bio.bi_private = wbio->bio.bi_private; ··· 429 430 430 431 n->c = c; 431 432 n->dev = ptr->dev; 432 - n->have_ioref = nocow || bch2_dev_get_ioref(ca, 433 - type == BCH_DATA_btree ? READ : WRITE); 433 + n->have_ioref = ca != NULL; 434 434 n->nocow = nocow; 435 435 n->submit_time = local_clock(); 436 436 n->inode_offset = bkey_start_offset(&k->k); 437 + if (nocow) 438 + n->nocow_bucket = PTR_BUCKET_NR(ca, ptr); 437 439 n->bio.bi_iter.bi_sector = ptr->offset; 438 440 439 441 if (likely(n->have_ioref)) { ··· 481 481 static noinline int bch2_write_drop_io_error_ptrs(struct bch_write_op *op) 482 482 { 483 483 struct keylist *keys = &op->insert_keys; 484 - struct bch_extent_ptr *ptr; 485 484 struct bkey_i *src, *dst = keys->keys, *n; 486 485 487 486 for (src = keys->keys; src != keys->top; src = n) { ··· 649 650 struct bch_write_bio *wbio = to_wbio(bio); 650 651 struct bch_write_bio *parent = wbio->split ? wbio->parent : NULL; 651 652 struct bch_fs *c = wbio->c; 652 - struct bch_dev *ca = bch_dev_bkey_exists(c, wbio->dev); 653 + struct bch_dev *ca = wbio->have_ioref 654 + ? 
bch2_dev_have_ref(c, wbio->dev) 655 + : NULL; 653 656 654 657 if (bch2_dev_inum_io_err_on(bio->bi_status, ca, BCH_MEMBER_ERROR_write, 655 658 op->pos.inode, ··· 662 661 op->flags |= BCH_WRITE_IO_ERROR; 663 662 } 664 663 665 - if (wbio->nocow) 664 + if (wbio->nocow) { 665 + bch2_bucket_nocow_unlock(&c->nocow_locks, 666 + POS(ca->dev_idx, wbio->nocow_bucket), 667 + BUCKET_NOCOW_LOCK_UPDATE); 666 668 set_bit(wbio->dev, op->devs_need_flush->d); 669 + } 667 670 668 671 if (wbio->have_ioref) { 669 672 bch2_latency_acct(ca, wbio->submit_time, WRITE); ··· 1106 1101 return false; 1107 1102 1108 1103 e = bkey_s_c_to_extent(k); 1104 + 1105 + rcu_read_lock(); 1109 1106 extent_for_each_ptr_decode(e, p, entry) { 1110 - if (crc_is_encoded(p.crc) || p.has_ec) 1107 + if (crc_is_encoded(p.crc) || p.has_ec) { 1108 + rcu_read_unlock(); 1111 1109 return false; 1110 + } 1112 1111 1113 1112 replicas += bch2_extent_ptr_durability(c, &p); 1114 1113 } 1114 + rcu_read_unlock(); 1115 1115 1116 1116 return replicas >= op->opts.data_replicas; 1117 - } 1118 - 1119 - static inline void bch2_nocow_write_unlock(struct bch_write_op *op) 1120 - { 1121 - struct bch_fs *c = op->c; 1122 - 1123 - for_each_keylist_key(&op->insert_keys, k) { 1124 - struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(k)); 1125 - 1126 - bkey_for_each_ptr(ptrs, ptr) 1127 - bch2_bucket_nocow_unlock(&c->nocow_locks, 1128 - PTR_BUCKET_POS(c, ptr), 1129 - BUCKET_NOCOW_LOCK_UPDATE); 1130 - } 1131 1117 } 1132 1118 1133 1119 static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans, ··· 1154 1158 return bch2_extent_update_i_size_sectors(trans, iter, 1155 1159 min(new->k.p.offset << 9, new_i_size), 0) ?: 1156 1160 bch2_trans_update(trans, iter, new, 1157 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 1161 + BTREE_UPDATE_internal_snapshot_node); 1158 1162 } 1159 1163 1160 1164 static void bch2_nocow_write_convert_unwritten(struct bch_write_op *op) ··· 1165 1169 for_each_keylist_key(&op->insert_keys, orig) { 1166 1170 int ret = for_each_btree_key_upto_commit(trans, iter, BTREE_ID_extents, 1167 1171 bkey_start_pos(&orig->k), orig->k.p, 1168 - BTREE_ITER_INTENT, k, 1172 + BTREE_ITER_intent, k, 1169 1173 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({ 1170 1174 bch2_nocow_write_convert_one_unwritten(trans, &iter, orig, k, op->new_i_size); 1171 1175 })); ··· 1191 1195 1192 1196 static void __bch2_nocow_write_done(struct bch_write_op *op) 1193 1197 { 1194 - bch2_nocow_write_unlock(op); 1195 - 1196 1198 if (unlikely(op->flags & BCH_WRITE_IO_ERROR)) { 1197 1199 op->error = -EIO; 1198 1200 } else if (unlikely(op->flags & BCH_WRITE_CONVERT_UNWRITTEN)) ··· 1236 1242 1237 1243 bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, 1238 1244 SPOS(op->pos.inode, op->pos.offset, snapshot), 1239 - BTREE_ITER_SLOTS); 1245 + BTREE_ITER_slots); 1240 1246 while (1) { 1241 1247 struct bio *bio = &op->wbio.bio; 1242 1248 1243 1249 buckets.nr = 0; 1250 + 1251 + ret = bch2_trans_relock(trans); 1252 + if (ret) 1253 + break; 1244 1254 1245 1255 k = bch2_btree_iter_peek_slot(&iter); 1246 1256 ret = bkey_err(k); ··· 1265 1267 /* Get iorefs before dropping btree locks: */ 1266 1268 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1267 1269 bkey_for_each_ptr(ptrs, ptr) { 1268 - struct bpos b = PTR_BUCKET_POS(c, ptr); 1270 + struct bch_dev *ca = bch2_dev_get_ioref(c, ptr->dev, WRITE); 1271 + if (unlikely(!ca)) 1272 + goto err_get_ioref; 1273 + 1274 + struct bpos b = PTR_BUCKET_POS(ca, ptr); 1269 1275 struct nocow_lock_bucket *l = 1270 1276 bucket_nocow_lock(&c->nocow_locks, 
bucket_to_u64(b)); 1271 1277 prefetch(l); 1272 - 1273 - if (unlikely(!bch2_dev_get_ioref(bch_dev_bkey_exists(c, ptr->dev), WRITE))) 1274 - goto err_get_ioref; 1275 1278 1276 1279 /* XXX allocating memory with btree locks held - rare */ 1277 1280 darray_push_gfp(&buckets, ((struct bucket_to_lock) { ··· 1292 1293 bch2_cut_back(POS(op->pos.inode, op->pos.offset + bio_sectors(bio)), op->insert_keys.top); 1293 1294 1294 1295 darray_for_each(buckets, i) { 1295 - struct bch_dev *ca = bch_dev_bkey_exists(c, i->b.inode); 1296 + struct bch_dev *ca = bch2_dev_have_ref(c, i->b.inode); 1296 1297 1297 1298 __bch2_bucket_nocow_lock(&c->nocow_locks, i->l, 1298 1299 bucket_to_u64(i->b), ··· 1369 1370 return; 1370 1371 err_get_ioref: 1371 1372 darray_for_each(buckets, i) 1372 - percpu_ref_put(&bch_dev_bkey_exists(c, i->b.inode)->io_ref); 1373 + percpu_ref_put(&bch2_dev_have_ref(c, i->b.inode)->io_ref); 1373 1374 1374 1375 /* Fall back to COW path: */ 1375 1376 goto out; ··· 1490 1491 if ((op->flags & BCH_WRITE_SYNC) || 1491 1492 (!(op->flags & BCH_WRITE_DONE) && 1492 1493 !(op->flags & BCH_WRITE_IN_WORKER))) { 1493 - closure_sync(&op->cl); 1494 + if (closure_sync_timeout(&op->cl, HZ * 10)) { 1495 + bch2_print_allocator_stuck(c); 1496 + closure_sync(&op->cl); 1497 + } 1498 + 1494 1499 __bch2_write_index(op); 1495 1500 1496 1501 if (!(op->flags & BCH_WRITE_DONE)) ··· 1652 1649 prt_bitflags(out, bch2_write_flags, op->flags); 1653 1650 prt_newline(out); 1654 1651 1655 - prt_printf(out, "ref: %u", closure_nr_remaining(&op->cl)); 1656 - prt_newline(out); 1652 + prt_printf(out, "ref: %u\n", closure_nr_remaining(&op->cl)); 1657 1653 1658 1654 printbuf_indent_sub(out, 2); 1659 1655 } ··· 1660 1658 void bch2_fs_io_write_exit(struct bch_fs *c) 1661 1659 { 1662 1660 mempool_exit(&c->bio_bounce_pages); 1661 + bioset_exit(&c->replica_set); 1663 1662 bioset_exit(&c->bio_write); 1664 1663 } 1665 1664 1666 1665 int bch2_fs_io_write_init(struct bch_fs *c) 1667 1666 { 1668 - if (bioset_init(&c->bio_write, 1, offsetof(struct bch_write_bio, bio), 1669 - BIOSET_NEED_BVECS)) 1667 + if (bioset_init(&c->bio_write, 1, offsetof(struct bch_write_bio, bio), BIOSET_NEED_BVECS) || 1668 + bioset_init(&c->replica_set, 4, offsetof(struct bch_write_bio, bio), 0)) 1670 1669 return -BCH_ERR_ENOMEM_bio_write_init; 1671 1670 1672 1671 if (mempool_init_page_pool(&c->bio_bounce_pages,
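The io_write.c hunk above also adds the write-path watchdog: instead of an unbounded closure_sync(), the submit path first waits with a 10 second timeout, dumps allocator diagnostics if it fires, and only then falls back to the plain wait. The shape, as it appears in the diff:

	if (closure_sync_timeout(&op->cl, HZ * 10)) {
		bch2_print_allocator_stuck(c);
		closure_sync(&op->cl);
	}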
+1
fs/bcachefs/io_write_types.h
··· 20 20 21 21 u64 submit_time; 22 22 u64 inode_offset; 23 + u64 nocow_bucket; 23 24 24 25 struct bch_devs_list failed; 25 26 u8 dev;
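The single field added to io_write_types.h, nocow_bucket, is what lets io_write.c delete bch2_nocow_write_unlock(): the target bucket is recorded per-bio at submit time, and the nocow lock is released in the endio path without re-deriving bucket positions from the insert keys. Both halves, condensed from the io_write.c hunks above:

	/* Submit: remember which bucket this bio writes to: */
	if (nocow)
		n->nocow_bucket = PTR_BUCKET_NR(ca, ptr);

	/* Completion: unlock per-bio, no key walk needed: */
	if (wbio->nocow) {
		bch2_bucket_nocow_unlock(&c->nocow_locks,
					 POS(ca->dev_idx, wbio->nocow_bucket),
					 BUCKET_NOCOW_LOCK_UPDATE);
		set_bit(wbio->dev, op->devs_need_flush->d);
	}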
+63 -68
fs/bcachefs/journal.c
··· 53 53 unsigned i = seq & JOURNAL_BUF_MASK; 54 54 struct journal_buf *buf = j->buf + i; 55 55 56 - prt_str(out, "seq:"); 57 - prt_tab(out); 58 - prt_printf(out, "%llu", seq); 59 - prt_newline(out); 56 + prt_printf(out, "seq:\t%llu\n", seq); 60 57 printbuf_indent_add(out, 2); 61 58 62 - prt_str(out, "refcount:"); 63 - prt_tab(out); 64 - prt_printf(out, "%u", journal_state_count(s, i)); 65 - prt_newline(out); 59 + prt_printf(out, "refcount:\t%u\n", journal_state_count(s, i)); 66 60 67 - prt_str(out, "size:"); 68 - prt_tab(out); 61 + prt_printf(out, "size:\t"); 69 62 prt_human_readable_u64(out, vstruct_bytes(buf->data)); 70 63 prt_newline(out); 71 64 72 - prt_str(out, "expires:"); 73 - prt_tab(out); 74 - prt_printf(out, "%li jiffies", buf->expires - jiffies); 75 - prt_newline(out); 65 + prt_printf(out, "expires:\t"); 66 + prt_printf(out, "%li jiffies\n", buf->expires - jiffies); 76 67 77 - prt_str(out, "flags:"); 78 - prt_tab(out); 68 + prt_printf(out, "flags:\t"); 79 69 if (buf->noflush) 80 70 prt_str(out, "noflush "); 81 71 if (buf->must_flush) ··· 77 87 if (buf->write_started) 78 88 prt_str(out, "write_started "); 79 89 if (buf->write_allocated) 80 - prt_str(out, "write allocated "); 90 + prt_str(out, "write_allocated "); 81 91 if (buf->write_done) 82 - prt_str(out, "write done"); 92 + prt_str(out, "write_done"); 83 93 prt_newline(out); 84 94 85 95 printbuf_indent_sub(out, 2); ··· 938 948 break; 939 949 } 940 950 } else { 941 - ob[nr_got] = bch2_bucket_alloc(c, ca, BCH_WATERMARK_normal, cl); 951 + ob[nr_got] = bch2_bucket_alloc(c, ca, BCH_WATERMARK_normal, 952 + BCH_DATA_journal, cl); 942 953 ret = PTR_ERR_OR_ZERO(ob[nr_got]); 943 954 if (ret) 944 955 break; ··· 947 956 ret = bch2_trans_run(c, 948 957 bch2_trans_mark_metadata_bucket(trans, ca, 949 958 ob[nr_got]->bucket, BCH_DATA_journal, 950 - ca->mi.bucket_size)); 959 + ca->mi.bucket_size, BTREE_TRIGGER_transactional)); 951 960 if (ret) { 952 961 bch2_open_bucket_put(c, ob[nr_got]); 953 962 bch_err_msg(c, ret, "marking new journal buckets"); ··· 1027 1036 for (i = 0; i < nr_got; i++) 1028 1037 bch2_trans_run(c, 1029 1038 bch2_trans_mark_metadata_bucket(trans, ca, 1030 - bu[i], BCH_DATA_free, 0)); 1039 + bu[i], BCH_DATA_free, 0, 1040 + BTREE_TRIGGER_transactional)); 1031 1041 err_free: 1032 1042 if (!new_fs) 1033 1043 for (i = 0; i < nr_got; i++) ··· 1179 1187 bch2_journal_meta(j); 1180 1188 1181 1189 journal_quiesce(j); 1190 + cancel_delayed_work_sync(&j->write_work); 1182 1191 1183 1192 BUG_ON(!bch2_journal_error(j) && 1184 - test_bit(JOURNAL_REPLAY_DONE, &j->flags) && 1193 + test_bit(JOURNAL_replay_done, &j->flags) && 1185 1194 j->last_empty_seq != journal_cur_seq(j)); 1186 1195 1187 - cancel_delayed_work_sync(&j->write_work); 1196 + if (!bch2_journal_error(j)) 1197 + clear_bit(JOURNAL_running, &j->flags); 1188 1198 } 1189 1199 1190 1200 int bch2_fs_journal_start(struct journal *j, u64 cur_seq) ··· 1260 1266 1261 1267 spin_lock(&j->lock); 1262 1268 1263 - set_bit(JOURNAL_STARTED, &j->flags); 1269 + set_bit(JOURNAL_running, &j->flags); 1264 1270 j->last_flush_write = jiffies; 1265 1271 1266 1272 j->reservations.idx = j->reservations.unwritten_idx = journal_cur_seq(j); ··· 1401 1407 1402 1408 /* debug: */ 1403 1409 1410 + static const char * const bch2_journal_flags_strs[] = { 1411 + #define x(n) #n, 1412 + JOURNAL_FLAGS() 1413 + #undef x 1414 + NULL 1415 + }; 1416 + 1404 1417 void __bch2_journal_debug_to_text(struct printbuf *out, struct journal *j) 1405 1418 { 1406 1419 struct bch_fs *c = container_of(j, struct bch_fs, journal); ··· 
1416 1415 u64 nr_writes = j->nr_flush_writes + j->nr_noflush_writes; 1417 1416 1418 1417 if (!out->nr_tabstops) 1419 - printbuf_tabstop_push(out, 24); 1418 + printbuf_tabstop_push(out, 28); 1420 1419 out->atomic++; 1421 1420 1422 1421 rcu_read_lock(); 1423 1422 s = READ_ONCE(j->reservations); 1424 1423 1424 + prt_printf(out, "flags:\t"); 1425 + prt_bitflags(out, bch2_journal_flags_strs, j->flags); 1426 + prt_newline(out); 1425 1427 prt_printf(out, "dirty journal entries:\t%llu/%llu\n", fifo_used(&j->pin), j->pin.size); 1426 - prt_printf(out, "seq:\t\t\t%llu\n", journal_cur_seq(j)); 1427 - prt_printf(out, "seq_ondisk:\t\t%llu\n", j->seq_ondisk); 1428 - prt_printf(out, "last_seq:\t\t%llu\n", journal_last_seq(j)); 1428 + prt_printf(out, "seq:\t%llu\n", journal_cur_seq(j)); 1429 + prt_printf(out, "seq_ondisk:\t%llu\n", j->seq_ondisk); 1430 + prt_printf(out, "last_seq:\t%llu\n", journal_last_seq(j)); 1429 1431 prt_printf(out, "last_seq_ondisk:\t%llu\n", j->last_seq_ondisk); 1430 1432 prt_printf(out, "flushed_seq_ondisk:\t%llu\n", j->flushed_seq_ondisk); 1431 - prt_printf(out, "watermark:\t\t%s\n", bch2_watermarks[j->watermark]); 1433 + prt_printf(out, "watermark:\t%s\n", bch2_watermarks[j->watermark]); 1432 1434 prt_printf(out, "each entry reserved:\t%u\n", j->entry_u64s_reserved); 1433 1435 prt_printf(out, "nr flush writes:\t%llu\n", j->nr_flush_writes); 1434 1436 prt_printf(out, "nr noflush writes:\t%llu\n", j->nr_noflush_writes); ··· 1440 1436 prt_newline(out); 1441 1437 prt_printf(out, "nr direct reclaim:\t%llu\n", j->nr_direct_reclaim); 1442 1438 prt_printf(out, "nr background reclaim:\t%llu\n", j->nr_background_reclaim); 1443 - prt_printf(out, "reclaim kicked:\t\t%u\n", j->reclaim_kicked); 1439 + prt_printf(out, "reclaim kicked:\t%u\n", j->reclaim_kicked); 1444 1440 prt_printf(out, "reclaim runs in:\t%u ms\n", time_after(j->next_reclaim, now) 1445 1441 ? 
jiffies_to_msecs(j->next_reclaim - jiffies) : 0); 1446 - prt_printf(out, "blocked:\t\t%u\n", j->blocked); 1442 + prt_printf(out, "blocked:\t%u\n", j->blocked); 1447 1443 prt_printf(out, "current entry sectors:\t%u\n", j->cur_entry_sectors); 1448 1444 prt_printf(out, "current entry error:\t%s\n", bch2_journal_errors[j->cur_entry_error]); 1449 - prt_printf(out, "current entry:\t\t"); 1445 + prt_printf(out, "current entry:\t"); 1450 1446 1451 1447 switch (s.cur_entry_offset) { 1452 1448 case JOURNAL_ENTRY_ERROR_VAL: 1453 - prt_printf(out, "error"); 1449 + prt_printf(out, "error\n"); 1454 1450 break; 1455 1451 case JOURNAL_ENTRY_CLOSED_VAL: 1456 - prt_printf(out, "closed"); 1452 + prt_printf(out, "closed\n"); 1457 1453 break; 1458 1454 default: 1459 - prt_printf(out, "%u/%u", s.cur_entry_offset, j->cur_entry_u64s); 1455 + prt_printf(out, "%u/%u\n", s.cur_entry_offset, j->cur_entry_u64s); 1460 1456 break; 1461 1457 } 1462 1458 1463 - prt_newline(out); 1464 - prt_printf(out, "unwritten entries:"); 1465 - prt_newline(out); 1459 + prt_printf(out, "unwritten entries:\n"); 1466 1460 bch2_journal_bufs_to_text(out, j); 1467 1461 1468 - prt_printf(out, 1469 - "replay done:\t\t%i\n", 1470 - test_bit(JOURNAL_REPLAY_DONE, &j->flags)); 1471 - 1472 1462 prt_printf(out, "space:\n"); 1473 - prt_printf(out, "\tdiscarded\t%u:%u\n", 1463 + printbuf_indent_add(out, 2); 1464 + prt_printf(out, "discarded\t%u:%u\n", 1474 1465 j->space[journal_space_discarded].next_entry, 1475 1466 j->space[journal_space_discarded].total); 1476 - prt_printf(out, "\tclean ondisk\t%u:%u\n", 1467 + prt_printf(out, "clean ondisk\t%u:%u\n", 1477 1468 j->space[journal_space_clean_ondisk].next_entry, 1478 1469 j->space[journal_space_clean_ondisk].total); 1479 - prt_printf(out, "\tclean\t\t%u:%u\n", 1470 + prt_printf(out, "clean\t%u:%u\n", 1480 1471 j->space[journal_space_clean].next_entry, 1481 1472 j->space[journal_space_clean].total); 1482 - prt_printf(out, "\ttotal\t\t%u:%u\n", 1473 + prt_printf(out, "total\t%u:%u\n", 1483 1474 j->space[journal_space_total].next_entry, 1484 1475 j->space[journal_space_total].total); 1476 + printbuf_indent_sub(out, 2); 1485 1477 1486 1478 for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) { 1487 1479 struct journal_device *ja = &ca->journal; ··· 1488 1488 if (!ja->nr) 1489 1489 continue; 1490 1490 1491 - prt_printf(out, "dev %u:\n", ca->dev_idx); 1492 - prt_printf(out, "\tnr\t\t%u\n", ja->nr); 1493 - prt_printf(out, "\tbucket size\t%u\n", ca->mi.bucket_size); 1494 - prt_printf(out, "\tavailable\t%u:%u\n", bch2_journal_dev_buckets_available(j, ja, journal_space_discarded), ja->sectors_free); 1495 - prt_printf(out, "\tdiscard_idx\t%u\n", ja->discard_idx); 1496 - prt_printf(out, "\tdirty_ondisk\t%u (seq %llu)\n", ja->dirty_idx_ondisk, ja->bucket_seq[ja->dirty_idx_ondisk]); 1497 - prt_printf(out, "\tdirty_idx\t%u (seq %llu)\n", ja->dirty_idx, ja->bucket_seq[ja->dirty_idx]); 1498 - prt_printf(out, "\tcur_idx\t\t%u (seq %llu)\n", ja->cur_idx, ja->bucket_seq[ja->cur_idx]); 1491 + prt_printf(out, "dev %u:\n", ca->dev_idx); 1492 + printbuf_indent_add(out, 2); 1493 + prt_printf(out, "nr\t%u\n", ja->nr); 1494 + prt_printf(out, "bucket size\t%u\n", ca->mi.bucket_size); 1495 + prt_printf(out, "available\t%u:%u\n", bch2_journal_dev_buckets_available(j, ja, journal_space_discarded), ja->sectors_free); 1496 + prt_printf(out, "discard_idx\t%u\n", ja->discard_idx); 1497 + prt_printf(out, "dirty_ondisk\t%u (seq %llu)\n",ja->dirty_idx_ondisk, ja->bucket_seq[ja->dirty_idx_ondisk]); 1498 + prt_printf(out, 
"dirty_idx\t%u (seq %llu)\n", ja->dirty_idx, ja->bucket_seq[ja->dirty_idx]); 1499 + prt_printf(out, "cur_idx\t%u (seq %llu)\n", ja->cur_idx, ja->bucket_seq[ja->cur_idx]); 1500 + printbuf_indent_sub(out, 2); 1499 1501 } 1500 1502 1501 1503 rcu_read_unlock(); ··· 1529 1527 1530 1528 pin_list = journal_seq_pin(j, *seq); 1531 1529 1532 - prt_printf(out, "%llu: count %u", *seq, atomic_read(&pin_list->count)); 1533 - prt_newline(out); 1530 + prt_printf(out, "%llu: count %u\n", *seq, atomic_read(&pin_list->count)); 1534 1531 printbuf_indent_add(out, 2); 1535 1532 1536 1533 for (unsigned i = 0; i < ARRAY_SIZE(pin_list->list); i++) 1537 - list_for_each_entry(pin, &pin_list->list[i], list) { 1538 - prt_printf(out, "\t%px %ps", pin, pin->flush); 1539 - prt_newline(out); 1540 - } 1534 + list_for_each_entry(pin, &pin_list->list[i], list) 1535 + prt_printf(out, "\t%px %ps\n", pin, pin->flush); 1541 1536 1542 - if (!list_empty(&pin_list->flushed)) { 1543 - prt_printf(out, "flushed:"); 1544 - prt_newline(out); 1545 - } 1537 + if (!list_empty(&pin_list->flushed)) 1538 + prt_printf(out, "flushed:\n"); 1546 1539 1547 - list_for_each_entry(pin, &pin_list->flushed, list) { 1548 - prt_printf(out, "\t%px %ps", pin, pin->flush); 1549 - prt_newline(out); 1550 - } 1540 + list_for_each_entry(pin, &pin_list->flushed, list) 1541 + prt_printf(out, "\t%px %ps\n", pin, pin->flush); 1551 1542 1552 1543 printbuf_indent_sub(out, 2); 1553 1544
+3 -3
fs/bcachefs/journal.h
··· 372 372 int ret; 373 373 374 374 EBUG_ON(res->ref); 375 - EBUG_ON(!test_bit(JOURNAL_STARTED, &j->flags)); 375 + EBUG_ON(!test_bit(JOURNAL_running, &j->flags)); 376 376 377 377 res->u64s = u64s; 378 378 ··· 418 418 419 419 static inline void bch2_journal_set_replay_done(struct journal *j) 420 420 { 421 - BUG_ON(!test_bit(JOURNAL_STARTED, &j->flags)); 422 - set_bit(JOURNAL_REPLAY_DONE, &j->flags); 421 + BUG_ON(!test_bit(JOURNAL_running, &j->flags)); 422 + set_bit(JOURNAL_replay_done, &j->flags); 423 423 } 424 424 425 425 void bch2_journal_unblock(struct journal *);
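In journal.h, JOURNAL_STARTED and JOURNAL_REPLAY_DONE become JOURNAL_running and JOURNAL_replay_done, matching the JOURNAL_FLAGS() x-macro that journal.c now expands into a string table for prt_bitflags(). The enum side of that list is an assumption here (journal_types.h is not part of this diff), built only from the four names visible in these hunks:

#define JOURNAL_FLAGS()		\
	x(replay_done)		\
	x(running)		\
	x(may_skip_flush)	\
	x(need_flush_write)

enum journal_flags {
#define x(n)	JOURNAL_##n,
	JOURNAL_FLAGS()
#undef x
};

/* String table, as actually built in journal.c above: */
static const char * const bch2_journal_flags_strs[] = {
#define x(n)	#n,
	JOURNAL_FLAGS()
#undef x
	NULL
};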
+71 -92
fs/bcachefs/journal_io.c
··· 17 17 #include "sb-clean.h" 18 18 #include "trace.h" 19 19 20 + void bch2_journal_pos_from_member_info_set(struct bch_fs *c) 21 + { 22 + lockdep_assert_held(&c->sb_lock); 23 + 24 + for_each_member_device(c, ca) { 25 + struct bch_member *m = bch2_members_v2_get_mut(c->disk_sb.sb, ca->dev_idx); 26 + 27 + m->last_journal_bucket = cpu_to_le32(ca->journal.cur_idx); 28 + m->last_journal_bucket_offset = cpu_to_le32(ca->mi.bucket_size - ca->journal.sectors_free); 29 + } 30 + } 31 + 32 + void bch2_journal_pos_from_member_info_resume(struct bch_fs *c) 33 + { 34 + mutex_lock(&c->sb_lock); 35 + for_each_member_device(c, ca) { 36 + struct bch_member m = bch2_sb_member_get(c->disk_sb.sb, ca->dev_idx); 37 + 38 + unsigned idx = le32_to_cpu(m.last_journal_bucket); 39 + if (idx < ca->journal.nr) 40 + ca->journal.cur_idx = idx; 41 + unsigned offset = le32_to_cpu(m.last_journal_bucket_offset); 42 + if (offset <= ca->mi.bucket_size) 43 + ca->journal.sectors_free = ca->mi.bucket_size - offset; 44 + } 45 + mutex_unlock(&c->sb_lock); 46 + } 47 + 20 48 void bch2_journal_ptrs_to_text(struct printbuf *out, struct bch_fs *c, 21 49 struct journal_replay *j) 22 50 { 23 51 darray_for_each(j->ptrs, i) { 24 - struct bch_dev *ca = bch_dev_bkey_exists(c, i->dev); 25 - u64 offset; 26 - 27 - div64_u64_rem(i->sector, ca->mi.bucket_size, &offset); 28 - 29 52 if (i != j->ptrs.data) 30 53 prt_printf(out, " "); 31 54 prt_printf(out, "%u:%u:%u (sector %llu)", ··· 144 121 u64 last_seq = !JSET_NO_FLUSH(j) ? le64_to_cpu(j->last_seq) : 0; 145 122 struct printbuf buf = PRINTBUF; 146 123 int ret = JOURNAL_ENTRY_ADD_OK; 124 + 125 + if (!c->journal.oldest_seq_found_ondisk || 126 + le64_to_cpu(j->seq) < c->journal.oldest_seq_found_ondisk) 127 + c->journal.oldest_seq_found_ondisk = le64_to_cpu(j->seq); 147 128 148 129 /* Is this entry older than the range we need? 
*/ 149 130 if (!c->opts.read_entire_journal && ··· 299 272 journal_entry_err_msg(&_buf, version, jset, entry); \ 300 273 prt_printf(&_buf, msg, ##__VA_ARGS__); \ 301 274 \ 302 - switch (flags & BKEY_INVALID_WRITE) { \ 275 + switch (flags & BCH_VALIDATE_write) { \ 303 276 case READ: \ 304 277 mustfix_fsck_err(c, _err, "%s", _buf.buf); \ 305 278 break; \ ··· 328 301 unsigned level, enum btree_id btree_id, 329 302 struct bkey_i *k, 330 303 unsigned version, int big_endian, 331 - enum bkey_invalid_flags flags) 304 + enum bch_validate_flags flags) 332 305 { 333 - int write = flags & BKEY_INVALID_WRITE; 306 + int write = flags & BCH_VALIDATE_write; 334 307 void *next = vstruct_next(entry); 335 308 struct printbuf buf = PRINTBUF; 336 309 int ret = 0; ··· 403 376 struct jset *jset, 404 377 struct jset_entry *entry, 405 378 unsigned version, int big_endian, 406 - enum bkey_invalid_flags flags) 379 + enum bch_validate_flags flags) 407 380 { 408 381 struct bkey_i *k = entry->start; 409 382 ··· 412 385 entry->level, 413 386 entry->btree_id, 414 387 k, version, big_endian, 415 - flags|BKEY_INVALID_JOURNAL); 388 + flags|BCH_VALIDATE_journal); 416 389 if (ret == FSCK_DELETED_KEY) 417 390 continue; 418 391 ··· 443 416 struct jset *jset, 444 417 struct jset_entry *entry, 445 418 unsigned version, int big_endian, 446 - enum bkey_invalid_flags flags) 419 + enum bch_validate_flags flags) 447 420 { 448 421 struct bkey_i *k = entry->start; 449 422 int ret = 0; ··· 482 455 struct jset *jset, 483 456 struct jset_entry *entry, 484 457 unsigned version, int big_endian, 485 - enum bkey_invalid_flags flags) 458 + enum bch_validate_flags flags) 486 459 { 487 460 /* obsolete, don't care: */ 488 461 return 0; ··· 497 470 struct jset *jset, 498 471 struct jset_entry *entry, 499 472 unsigned version, int big_endian, 500 - enum bkey_invalid_flags flags) 473 + enum bch_validate_flags flags) 501 474 { 502 475 int ret = 0; 503 476 ··· 524 497 struct jset *jset, 525 498 struct jset_entry *entry, 526 499 unsigned version, int big_endian, 527 - enum bkey_invalid_flags flags) 500 + enum bch_validate_flags flags) 528 501 { 529 502 struct jset_entry_blacklist_v2 *bl_entry; 530 503 int ret = 0; ··· 566 539 struct jset *jset, 567 540 struct jset_entry *entry, 568 541 unsigned version, int big_endian, 569 - enum bkey_invalid_flags flags) 542 + enum bch_validate_flags flags) 570 543 { 571 544 struct jset_entry_usage *u = 572 545 container_of(entry, struct jset_entry_usage, entry); ··· 600 573 struct jset *jset, 601 574 struct jset_entry *entry, 602 575 unsigned version, int big_endian, 603 - enum bkey_invalid_flags flags) 576 + enum bch_validate_flags flags) 604 577 { 605 578 struct jset_entry_data_usage *u = 606 579 container_of(entry, struct jset_entry_data_usage, entry); ··· 644 617 struct jset *jset, 645 618 struct jset_entry *entry, 646 619 unsigned version, int big_endian, 647 - enum bkey_invalid_flags flags) 620 + enum bch_validate_flags flags) 648 621 { 649 622 struct jset_entry_clock *clock = 650 623 container_of(entry, struct jset_entry_clock, entry); ··· 684 657 struct jset *jset, 685 658 struct jset_entry *entry, 686 659 unsigned version, int big_endian, 687 - enum bkey_invalid_flags flags) 660 + enum bch_validate_flags flags) 688 661 { 689 662 struct jset_entry_dev_usage *u = 690 663 container_of(entry, struct jset_entry_dev_usage, entry); 691 664 unsigned bytes = jset_u64s(le16_to_cpu(entry->u64s)) * sizeof(u64); 692 665 unsigned expected = sizeof(*u); 693 - unsigned dev; 694 666 int ret = 0; 695 667 696 668 if 
(journal_entry_err_on(bytes < expected, ··· 697 671 journal_entry_dev_usage_bad_size, 698 672 "bad size (%u < %u)", 699 673 bytes, expected)) { 700 - journal_entry_null_range(entry, vstruct_next(entry)); 701 - return ret; 702 - } 703 - 704 - dev = le32_to_cpu(u->dev); 705 - 706 - if (journal_entry_err_on(!bch2_dev_exists2(c, dev), 707 - c, version, jset, entry, 708 - journal_entry_dev_usage_bad_dev, 709 - "bad dev")) { 710 674 journal_entry_null_range(entry, vstruct_next(entry)); 711 675 return ret; 712 676 } ··· 735 719 struct jset *jset, 736 720 struct jset_entry *entry, 737 721 unsigned version, int big_endian, 738 - enum bkey_invalid_flags flags) 722 + enum bch_validate_flags flags) 739 723 { 740 724 return 0; 741 725 } ··· 753 737 struct jset *jset, 754 738 struct jset_entry *entry, 755 739 unsigned version, int big_endian, 756 - enum bkey_invalid_flags flags) 740 + enum bch_validate_flags flags) 757 741 { 758 742 return journal_entry_btree_keys_validate(c, jset, entry, 759 743 version, big_endian, READ); ··· 769 753 struct jset *jset, 770 754 struct jset_entry *entry, 771 755 unsigned version, int big_endian, 772 - enum bkey_invalid_flags flags) 756 + enum bch_validate_flags flags) 773 757 { 774 758 return journal_entry_btree_keys_validate(c, jset, entry, 775 759 version, big_endian, READ); ··· 785 769 struct jset *jset, 786 770 struct jset_entry *entry, 787 771 unsigned version, int big_endian, 788 - enum bkey_invalid_flags flags) 772 + enum bch_validate_flags flags) 789 773 { 790 774 unsigned bytes = vstruct_bytes(entry); 791 775 unsigned expected = 16; ··· 815 799 struct jset_entry_ops { 816 800 int (*validate)(struct bch_fs *, struct jset *, 817 801 struct jset_entry *, unsigned, int, 818 - enum bkey_invalid_flags); 802 + enum bch_validate_flags); 819 803 void (*to_text)(struct printbuf *, struct bch_fs *, struct jset_entry *); 820 804 }; 821 805 ··· 833 817 struct jset *jset, 834 818 struct jset_entry *entry, 835 819 unsigned version, int big_endian, 836 - enum bkey_invalid_flags flags) 820 + enum bch_validate_flags flags) 837 821 { 838 822 return entry->type < BCH_JSET_ENTRY_NR 839 823 ? 
bch2_jset_entry_ops[entry->type].validate(c, jset, entry, ··· 853 837 } 854 838 855 839 static int jset_validate_entries(struct bch_fs *c, struct jset *jset, 856 - enum bkey_invalid_flags flags) 840 + enum bch_validate_flags flags) 857 841 { 858 842 unsigned version = le32_to_cpu(jset->version); 859 843 int ret = 0; ··· 879 863 static int jset_validate(struct bch_fs *c, 880 864 struct bch_dev *ca, 881 865 struct jset *jset, u64 sector, 882 - enum bkey_invalid_flags flags) 866 + enum bch_validate_flags flags) 883 867 { 884 868 unsigned version; 885 869 int ret = 0; ··· 934 918 { 935 919 size_t bytes = vstruct_bytes(jset); 936 920 unsigned version; 937 - enum bkey_invalid_flags flags = BKEY_INVALID_JOURNAL; 921 + enum bch_validate_flags flags = BCH_VALIDATE_journal; 938 922 int ret = 0; 939 923 940 924 if (le64_to_cpu(jset->magic) != jset_magic(c)) ··· 1073 1057 goto err; 1074 1058 } 1075 1059 1060 + if (le64_to_cpu(j->seq) > ja->highest_seq_found) { 1061 + ja->highest_seq_found = le64_to_cpu(j->seq); 1062 + ja->cur_idx = bucket; 1063 + ja->sectors_free = ca->mi.bucket_size - 1064 + bucket_remainder(ca, offset) - sectors; 1065 + } 1066 + 1076 1067 /* 1077 1068 * This happens sometimes if we don't have discards on - 1078 1069 * when we've partially overwritten a bucket with new ··· 1148 1125 struct bch_fs *c = ca->fs; 1149 1126 struct journal_list *jlist = 1150 1127 container_of(cl->parent, struct journal_list, cl); 1151 - struct journal_replay *r, **_r; 1152 - struct genradix_iter iter; 1153 1128 struct journal_read_buf buf = { NULL, 0 }; 1154 1129 unsigned i; 1155 1130 int ret = 0; ··· 1165 1144 ret = journal_read_bucket(ca, &buf, jlist, i); 1166 1145 if (ret) 1167 1146 goto err; 1168 - } 1169 - 1170 - ja->sectors_free = ca->mi.bucket_size; 1171 - 1172 - mutex_lock(&jlist->lock); 1173 - genradix_for_each_reverse(&c->journal_entries, iter, _r) { 1174 - r = *_r; 1175 - 1176 - if (!r) 1177 - continue; 1178 - 1179 - darray_for_each(r->ptrs, i) 1180 - if (i->dev == ca->dev_idx) { 1181 - unsigned wrote = bucket_remainder(ca, i->sector) + 1182 - vstruct_sectors(&r->j, c->block_bits); 1183 - 1184 - ja->cur_idx = i->bucket; 1185 - ja->sectors_free = ca->mi.bucket_size - wrote; 1186 - goto found; 1187 - } 1188 - } 1189 - found: 1190 - mutex_unlock(&jlist->lock); 1191 - 1192 - if (ja->bucket_seq[ja->cur_idx] && 1193 - ja->sectors_free == ca->mi.bucket_size) { 1194 - #if 0 1195 - /* 1196 - * Debug code for ZNS support, where we (probably) want to be 1197 - * correlated where we stopped in the journal to the zone write 1198 - * points: 1199 - */ 1200 - bch_err(c, "ja->sectors_free == ca->mi.bucket_size"); 1201 - bch_err(c, "cur_idx %u/%u", ja->cur_idx, ja->nr); 1202 - for (i = 0; i < 3; i++) { 1203 - unsigned idx = (ja->cur_idx + ja->nr - 1 + i) % ja->nr; 1204 - 1205 - bch_err(c, "bucket_seq[%u] = %llu", idx, ja->bucket_seq[idx]); 1206 - } 1207 - #endif 1208 - ja->sectors_free = 0; 1209 1147 } 1210 1148 1211 1149 /* ··· 1235 1255 * those entries will be blacklisted: 1236 1256 */ 1237 1257 genradix_for_each_reverse(&c->journal_entries, radix_iter, _i) { 1238 - enum bkey_invalid_flags flags = BKEY_INVALID_JOURNAL; 1258 + enum bch_validate_flags flags = BCH_VALIDATE_journal; 1239 1259 1240 1260 i = *_i; 1241 1261 ··· 1346 1366 fsck_err(c, journal_entries_missing, 1347 1367 "journal entries %llu-%llu missing! 
(replaying %llu-%llu)\n" 1348 1368 " prev at %s\n" 1349 - " next at %s", 1369 + " next at %s, continue?", 1350 1370 missing_start, missing_end, 1351 1371 *last_seq, *blacklist_seq - 1, 1352 1372 buf1.buf, buf2.buf); ··· 1370 1390 continue; 1371 1391 1372 1392 darray_for_each(i->ptrs, ptr) { 1373 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 1393 + struct bch_dev *ca = bch2_dev_have_ref(c, ptr->dev); 1374 1394 1375 1395 if (!ptr->csum_good) 1376 1396 bch_err_dev_offset(ca, ptr->sector, ··· 1380 1400 } 1381 1401 1382 1402 ret = jset_validate(c, 1383 - bch_dev_bkey_exists(c, i->ptrs.data[0].dev), 1403 + bch2_dev_have_ref(c, i->ptrs.data[0].dev), 1384 1404 &i->j, 1385 1405 i->ptrs.data[0].sector, 1386 1406 READ); ··· 1711 1731 unsigned sectors = vstruct_sectors(w->data, c->block_bits); 1712 1732 1713 1733 extent_for_each_ptr(bkey_i_to_s_extent(&w->key), ptr) { 1714 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 1715 - struct journal_device *ja = &ca->journal; 1716 - 1717 - if (!percpu_ref_tryget(&ca->io_ref)) { 1734 + struct bch_dev *ca = bch2_dev_get_ioref(c, ptr->dev, WRITE); 1735 + if (!ca) { 1718 1736 /* XXX: fix this */ 1719 1737 bch_err(c, "missing device for journal write\n"); 1720 1738 continue; ··· 1721 1743 this_cpu_add(ca->io_done->sectors[WRITE][BCH_DATA_journal], 1722 1744 sectors); 1723 1745 1746 + struct journal_device *ja = &ca->journal; 1724 1747 struct bio *bio = &ja->bio[w->idx]->bio; 1725 1748 bio_reset(bio, ca->disk_sb.bdev, REQ_OP_WRITE|REQ_SYNC|REQ_META); 1726 1749 bio->bi_iter.bi_sector = ptr->offset; ··· 1937 1958 * So if we're in an error state, and we're still starting up, we don't 1938 1959 * write anything at all. 1939 1960 */ 1940 - if (error && test_bit(JOURNAL_NEED_FLUSH_WRITE, &j->flags)) 1961 + if (error && test_bit(JOURNAL_need_flush_write, &j->flags)) 1941 1962 return -EIO; 1942 1963 1943 1964 if (error || 1944 1965 w->noflush || 1945 1966 (!w->must_flush && 1946 1967 (jiffies - j->last_flush_write) < msecs_to_jiffies(c->opts.journal_flush_delay) && 1947 - test_bit(JOURNAL_MAY_SKIP_FLUSH, &j->flags))) { 1968 + test_bit(JOURNAL_may_skip_flush, &j->flags))) { 1948 1969 w->noflush = true; 1949 1970 SET_JSET_NO_FLUSH(w->data, true); 1950 1971 w->data->last_seq = 0; ··· 1955 1976 w->must_flush = true; 1956 1977 j->last_flush_write = jiffies; 1957 1978 j->nr_flush_writes++; 1958 - clear_bit(JOURNAL_NEED_FLUSH_WRITE, &j->flags); 1979 + clear_bit(JOURNAL_need_flush_write, &j->flags); 1959 1980 } 1960 1981 1961 1982 return 0;
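The journal-read rework above drops the post-hoc reverse scan over c->journal_entries: each device now simply remembers, while its buckets are being read, the highest sequence number it has seen and where writing should resume after it (ja->highest_seq_found, ja->cur_idx, ja->sectors_free). A minimal userspace sketch of that bookkeeping, with hypothetical stand-in types rather than the kernel's struct journal_device:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical stand-ins, not the kernel's types: */
    struct jdev_pos {
            uint64_t highest_seq_found;     /* newest journal seq seen on this device */
            unsigned cur_idx;               /* bucket containing that entry */
            unsigned sectors_free;          /* room left in that bucket after it */
    };

    struct jentry {
            uint64_t seq;
            unsigned bucket;
            unsigned offset;                /* sector offset of the entry on the device */
            unsigned sectors;               /* entry size in sectors */
    };

    /* Mirrors the new hunk: as entries are read, remember only the newest. */
    static void note_entry(struct jdev_pos *ja, const struct jentry *e,
                           unsigned bucket_size)
    {
            if (e->seq > ja->highest_seq_found) {
                    ja->highest_seq_found = e->seq;
                    ja->cur_idx = e->bucket;
                    ja->sectors_free = bucket_size -
                            e->offset % bucket_size - e->sectors;
            }
    }

    int main(void)
    {
            struct jdev_pos ja = {0};
            struct jentry read_order[] = {
                    { .seq = 10, .bucket = 3, .offset = 3072, .sectors = 8  },
                    { .seq = 12, .bucket = 4, .offset = 4096, .sectors = 16 },
                    { .seq = 11, .bucket = 0, .offset = 0,    .sectors = 8  },
            };

            for (unsigned i = 0; i < 3; i++)
                    note_entry(&ja, &read_order[i], 1024);

            /* Writing resumes right after seq 12: bucket 4, 1008 sectors free. */
            printf("resume at bucket %u, %u sectors free\n",
                   ja.cur_idx, ja.sectors_free);
            return 0;
    }

Because the newest entry wins regardless of the order in which buckets are read, the final resume position is deterministic even when per-bucket reads complete out of order.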
+4 -1
fs/bcachefs/journal_io.h
··· 4 4 5 5 #include "darray.h" 6 6 7 + void bch2_journal_pos_from_member_info_set(struct bch_fs *); 8 + void bch2_journal_pos_from_member_info_resume(struct bch_fs *); 9 + 7 10 struct journal_ptr { 8 11 bool csum_good; 9 12 u8 dev; ··· 63 60 64 61 int bch2_journal_entry_validate(struct bch_fs *, struct jset *, 65 62 struct jset_entry *, unsigned, int, 66 - enum bkey_invalid_flags); 63 + enum bch_validate_flags); 67 64 void bch2_journal_entry_to_text(struct printbuf *, struct bch_fs *, 68 65 struct jset_entry *); 69 66
+5 -5
fs/bcachefs/journal_reclaim.c
··· 67 67 track_event_change(&c->times[BCH_TIME_blocked_write_buffer_full], low_on_wb)) 68 68 trace_and_count(c, journal_full, c); 69 69 70 - mod_bit(JOURNAL_SPACE_LOW, &j->flags, low_on_space || low_on_pin); 70 + mod_bit(JOURNAL_space_low, &j->flags, low_on_space || low_on_pin); 71 71 72 72 swap(watermark, j->watermark); 73 73 if (watermark > j->watermark) ··· 225 225 j->space[journal_space_clean_ondisk].total) && 226 226 (clean - clean_ondisk <= total / 8) && 227 227 (clean_ondisk * 2 > clean)) 228 - set_bit(JOURNAL_MAY_SKIP_FLUSH, &j->flags); 228 + set_bit(JOURNAL_may_skip_flush, &j->flags); 229 229 else 230 - clear_bit(JOURNAL_MAY_SKIP_FLUSH, &j->flags); 230 + clear_bit(JOURNAL_may_skip_flush, &j->flags); 231 231 232 232 bch2_journal_set_watermark(j); 233 233 out: ··· 818 818 * If journal replay hasn't completed, the unreplayed journal entries 819 819 * hold refs on their corresponding sequence numbers 820 820 */ 821 - ret = !test_bit(JOURNAL_REPLAY_DONE, &j->flags) || 821 + ret = !test_bit(JOURNAL_replay_done, &j->flags) || 822 822 journal_last_seq(j) > seq_to_flush || 823 823 !fifo_used(&j->pin); 824 824 ··· 833 833 /* time_stats this */ 834 834 bool did_work = false; 835 835 836 - if (!test_bit(JOURNAL_STARTED, &j->flags)) 836 + if (!test_bit(JOURNAL_running, &j->flags)) 837 837 return false; 838 838 839 839 closure_wait_event(&j->async_wait,
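journal_reclaim.c is otherwise a mechanical rename to the new lowercase flag names, but the JOURNAL_may_skip_flush condition visible in the hunk is worth spelling out. A sketch of that heuristic as a pure function (toy types; in the kernel, clean/clean_ondisk/total come from j->space[], and there are additional preconditions not shown in this hunk):

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Flushes may only be skipped while little clean space is waiting to
     * be flushed and most clean space is already on disk.
     */
    static bool may_skip_flush(unsigned long long clean,
                               unsigned long long clean_ondisk,
                               unsigned long long total)
    {
            return clean - clean_ondisk <= total / 8 &&
                   clean_ondisk * 2 > clean;
    }

    int main(void)
    {
            printf("%d\n", may_skip_flush(1000, 960, 1024));  /* 1: skip OK */
            printf("%d\n", may_skip_flush(1000, 400, 1024));  /* 0: must flush */
            return 0;
    }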
+4 -6
fs/bcachefs/journal_sb.c
··· 16 16 return cmp_int(*l, *r); 17 17 } 18 18 19 - static int bch2_sb_journal_validate(struct bch_sb *sb, 20 - struct bch_sb_field *f, 21 - struct printbuf *err) 19 + static int bch2_sb_journal_validate(struct bch_sb *sb, struct bch_sb_field *f, 20 + enum bch_validate_flags flags, struct printbuf *err) 22 21 { 23 22 struct bch_sb_field_journal *journal = field_to_type(f, journal); 24 23 struct bch_member m = bch2_sb_member_get(sb, sb->dev_idx); ··· 98 99 return cmp_int(l->start, r->start); 99 100 } 100 101 101 - static int bch2_sb_journal_v2_validate(struct bch_sb *sb, 102 - struct bch_sb_field *f, 103 - struct printbuf *err) 102 + static int bch2_sb_journal_v2_validate(struct bch_sb *sb, struct bch_sb_field *f, 103 + enum bch_validate_flags flags, struct printbuf *err) 104 104 { 105 105 struct bch_sb_field_journal_v2 *journal = field_to_type(f, journal_v2); 106 106 struct bch_member m = bch2_sb_member_get(sb, sb->dev_idx);
+19 -58
fs/bcachefs/journal_seq_blacklist.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 3 3 #include "bcachefs.h" 4 - #include "btree_iter.h" 5 4 #include "eytzinger.h" 5 + #include "journal.h" 6 6 #include "journal_seq_blacklist.h" 7 7 #include "super-io.h" 8 8 ··· 162 162 return 0; 163 163 } 164 164 165 - static int bch2_sb_journal_seq_blacklist_validate(struct bch_sb *sb, 166 - struct bch_sb_field *f, 167 - struct printbuf *err) 165 + static int bch2_sb_journal_seq_blacklist_validate(struct bch_sb *sb, struct bch_sb_field *f, 166 + enum bch_validate_flags flags, struct printbuf *err) 168 167 { 169 168 struct bch_sb_field_journal_seq_blacklist *bl = 170 169 field_to_type(f, journal_seq_blacklist); ··· 216 217 .to_text = bch2_sb_journal_seq_blacklist_to_text 217 218 }; 218 219 219 - void bch2_blacklist_entries_gc(struct work_struct *work) 220 + bool bch2_blacklist_entries_gc(struct bch_fs *c) 220 221 { 221 - struct bch_fs *c = container_of(work, struct bch_fs, 222 - journal_seq_blacklist_gc_work); 223 - struct journal_seq_blacklist_table *t; 224 - struct bch_sb_field_journal_seq_blacklist *bl; 225 222 struct journal_seq_blacklist_entry *src, *dst; 226 - struct btree_trans *trans = bch2_trans_get(c); 227 - unsigned i, nr, new_nr; 228 - int ret; 229 223 230 - for (i = 0; i < BTREE_ID_NR; i++) { 231 - struct btree_iter iter; 232 - struct btree *b; 233 - 234 - bch2_trans_node_iter_init(trans, &iter, i, POS_MIN, 235 - 0, 0, BTREE_ITER_PREFETCH); 236 - retry: 237 - bch2_trans_begin(trans); 238 - 239 - b = bch2_btree_iter_peek_node(&iter); 240 - 241 - while (!(ret = PTR_ERR_OR_ZERO(b)) && 242 - b && 243 - !test_bit(BCH_FS_stopping, &c->flags)) 244 - b = bch2_btree_iter_next_node(&iter); 245 - 246 - if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) 247 - goto retry; 248 - 249 - bch2_trans_iter_exit(trans, &iter); 250 - } 251 - 252 - bch2_trans_put(trans); 253 - if (ret) 254 - return; 255 - 256 - mutex_lock(&c->sb_lock); 257 - bl = bch2_sb_field_get(c->disk_sb.sb, journal_seq_blacklist); 224 + struct bch_sb_field_journal_seq_blacklist *bl = 225 + bch2_sb_field_get(c->disk_sb.sb, journal_seq_blacklist); 258 226 if (!bl) 259 - goto out; 227 + return false; 260 228 261 - nr = blacklist_nr_entries(bl); 229 + unsigned nr = blacklist_nr_entries(bl); 262 230 dst = bl->start; 263 231 264 - t = c->journal_seq_blacklist_table; 232 + struct journal_seq_blacklist_table *t = c->journal_seq_blacklist_table; 265 233 BUG_ON(nr != t->nr); 266 234 235 + unsigned i; 267 236 for (src = bl->start, i = eytzinger0_first(t->nr); 268 237 src < bl->start + nr; 269 238 src++, i = eytzinger0_next(i, nr)) { 270 239 BUG_ON(t->entries[i].start != le64_to_cpu(src->start)); 271 240 BUG_ON(t->entries[i].end != le64_to_cpu(src->end)); 272 241 273 - if (t->entries[i].dirty) 242 + if (t->entries[i].dirty || t->entries[i].end >= c->journal.oldest_seq_found_ondisk) 274 243 *dst++ = *src; 275 244 } 276 245 277 - new_nr = dst - bl->start; 246 + unsigned new_nr = dst - bl->start; 247 + if (new_nr == nr) 248 + return false; 278 249 279 - bch_info(c, "nr blacklist entries was %u, now %u", nr, new_nr); 250 + bch_verbose(c, "nr blacklist entries was %u, now %u", nr, new_nr); 280 251 281 - if (new_nr != nr) { 282 - bl = bch2_sb_field_resize(&c->disk_sb, journal_seq_blacklist, 283 - new_nr ? 
sb_blacklist_u64s(new_nr) : 0); 284 - BUG_ON(new_nr && !bl); 285 - 286 - if (!new_nr) 287 - c->disk_sb.sb->features[0] &= cpu_to_le64(~(1ULL << BCH_FEATURE_journal_seq_blacklist_v3)); 288 - 289 - bch2_write_super(c); 290 - } 291 - out: 292 - mutex_unlock(&c->sb_lock); 252 + bl = bch2_sb_field_resize(&c->disk_sb, journal_seq_blacklist, 253 + new_nr ? sb_blacklist_u64s(new_nr) : 0); 254 + BUG_ON(new_nr && !bl); 255 + return true; 293 256 }
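bch2_blacklist_entries_gc() is no longer a workqueue item that walks every btree; it is now a pure in-memory compaction of the superblock field that reports whether anything changed, keeping entries that are dirty or still cover sequence numbers at or past oldest_seq_found_ondisk. The resize and superblock write move to the caller in recovery. A sketch of the new shape, with illustrative types:

    #include <stdbool.h>
    #include <stdio.h>

    struct bl_entry {
            unsigned long long start, end;
            bool dirty;
    };

    /*
     * Drop entries that are clean and entirely older than the oldest
     * sequence number still on disk; report whether the array changed so
     * the caller can decide to write the superblock.
     */
    static bool blacklist_gc(struct bl_entry *e, unsigned *nr,
                             unsigned long long oldest_seq_ondisk)
    {
            struct bl_entry *dst = e;

            for (unsigned i = 0; i < *nr; i++)
                    if (e[i].dirty || e[i].end >= oldest_seq_ondisk)
                            *dst++ = e[i];

            unsigned new_nr = dst - e;

            if (new_nr == *nr)
                    return false;
            *nr = new_nr;
            return true;
    }

    int main(void)
    {
            struct bl_entry bl[] = {
                    { 10, 20, false }, { 30, 40, true }, { 50, 60, false },
            };
            unsigned nr = 3;

            /* Caller pattern from recovery: write the superblock only on change. */
            if (blacklist_gc(bl, &nr, 45))
                    printf("superblock dirty, %u entries remain\n", nr);
            return 0;
    }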
+1 -1
fs/bcachefs/journal_seq_blacklist.h
··· 17 17 18 18 extern const struct bch_sb_field_ops bch_sb_field_ops_journal_seq_blacklist; 19 19 20 - void bch2_blacklist_entries_gc(struct work_struct *); 20 + bool bch2_blacklist_entries_gc(struct bch_fs *); 21 21 22 22 #endif /* _BCACHEFS_JOURNAL_SEQ_BLACKLIST_H */
+12 -5
fs/bcachefs/journal_types.h
··· 129 129 journal_space_nr, 130 130 }; 131 131 132 + #define JOURNAL_FLAGS() \ 133 + x(replay_done) \ 134 + x(running) \ 135 + x(may_skip_flush) \ 136 + x(need_flush_write) \ 137 + x(space_low) 138 + 132 139 enum journal_flags { 133 - JOURNAL_REPLAY_DONE, 134 - JOURNAL_STARTED, 135 - JOURNAL_MAY_SKIP_FLUSH, 136 - JOURNAL_NEED_FLUSH_WRITE, 137 - JOURNAL_SPACE_LOW, 140 + #define x(n) JOURNAL_##n, 141 + JOURNAL_FLAGS() 142 + #undef x 138 143 }; 139 144 140 145 /* Reasons we may fail to get a journal reservation: */ ··· 234 229 u64 last_seq_ondisk; 235 230 u64 err_seq; 236 231 u64 last_empty_seq; 232 + u64 oldest_seq_found_ondisk; 237 233 238 234 /* 239 235 * FIFO of journal entries whose btree updates have not yet been ··· 332 326 333 327 /* for bch_journal_read_device */ 334 328 struct closure read; 329 + u64 highest_seq_found; 335 330 }; 336 331 337 332 /*
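Converting the journal flags enum to an x-macro list means other tables can be generated from the same source and can never drift out of sync with the enum. A self-contained sketch of the pattern (the string table here is illustrative, not a kernel symbol):

    #include <stdio.h>

    #define JOURNAL_FLAGS()         \
            x(replay_done)          \
            x(running)              \
            x(may_skip_flush)       \
            x(need_flush_write)     \
            x(space_low)

    enum journal_flags {
    #define x(n)    JOURNAL_##n,
            JOURNAL_FLAGS()
    #undef x
    };

    /* Generated from the same list, so it stays in sync automatically: */
    static const char * const journal_flag_strs[] = {
    #define x(n)    #n,
            JOURNAL_FLAGS()
    #undef x
            NULL
    };

    int main(void)
    {
            for (unsigned i = 0; journal_flag_strs[i]; i++)
                    printf("%u: %s\n", i, journal_flag_strs[i]);
            return 0;
    }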
+1 -1
fs/bcachefs/logged_ops.c
··· 56 56 int ret = bch2_trans_run(c, 57 57 for_each_btree_key(trans, iter, 58 58 BTREE_ID_logged_ops, POS_MIN, 59 - BTREE_ITER_PREFETCH, k, 59 + BTREE_ITER_prefetch, k, 60 60 resume_logged_op(trans, &iter, k))); 61 61 bch_err_fn(c, ret); 62 62 return ret;
+2 -2
fs/bcachefs/lru.c
··· 11 11 12 12 /* KEY_TYPE_lru is obsolete: */ 13 13 int bch2_lru_invalid(struct bch_fs *c, struct bkey_s_c k, 14 - enum bkey_invalid_flags flags, 14 + enum bch_validate_flags flags, 15 15 struct printbuf *err) 16 16 { 17 17 int ret = 0; ··· 149 149 struct bpos last_flushed_pos = POS_MIN; 150 150 int ret = bch2_trans_run(c, 151 151 for_each_btree_key_commit(trans, iter, 152 - BTREE_ID_lru, POS_MIN, BTREE_ITER_PREFETCH, k, 152 + BTREE_ID_lru, POS_MIN, BTREE_ITER_prefetch, k, 153 153 NULL, NULL, BCH_TRANS_COMMIT_no_enospc|BCH_TRANS_COMMIT_lazy_rw, 154 154 bch2_check_lru_key(trans, &iter, k, &last_flushed_pos))); 155 155 bch_err_fn(c, ret);
+1 -1
fs/bcachefs/lru.h
··· 49 49 } 50 50 51 51 int bch2_lru_invalid(struct bch_fs *, struct bkey_s_c, 52 - enum bkey_invalid_flags, struct printbuf *); 52 + enum bch_validate_flags, struct printbuf *); 53 53 void bch2_lru_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 54 54 55 55 void bch2_lru_pos_to_text(struct printbuf *, struct bpos);
+4 -4
fs/bcachefs/migrate.c
··· 49 49 if (!bch2_bkey_has_device_c(k, dev_idx)) 50 50 return 0; 51 51 52 - n = bch2_bkey_make_mut(trans, iter, &k, BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 52 + n = bch2_bkey_make_mut(trans, iter, &k, BTREE_UPDATE_internal_snapshot_node); 53 53 ret = PTR_ERR_OR_ZERO(n); 54 54 if (ret) 55 55 return ret; ··· 67 67 68 68 /* 69 69 * Since we're not inserting through an extent iterator 70 - * (BTREE_ITER_ALL_SNAPSHOTS iterators aren't extent iterators), 70 + * (BTREE_ITER_all_snapshots iterators aren't extent iterators), 71 71 * we aren't using the extent overwrite path to delete, we're 72 72 * just using the normal key deletion path: 73 73 */ ··· 87 87 continue; 88 88 89 89 ret = for_each_btree_key_commit(trans, iter, id, POS_MIN, 90 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 90 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 91 91 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 92 92 bch2_dev_usrdata_drop_key(trans, &iter, k, dev_idx, flags)); 93 93 if (ret) ··· 119 119 120 120 for (id = 0; id < BTREE_ID_NR; id++) { 121 121 bch2_trans_node_iter_init(trans, &iter, id, POS_MIN, 0, 0, 122 - BTREE_ITER_PREFETCH); 122 + BTREE_ITER_prefetch); 123 123 retry: 124 124 ret = 0; 125 125 while (bch2_trans_begin(trans),
+30 -52
fs/bcachefs/move.c
··· 41 41 struct data_update_opts *data_opts) 42 42 { 43 43 printbuf_tabstop_push(out, 20); 44 - prt_str(out, "rewrite ptrs:"); 45 - prt_tab(out); 44 + prt_str(out, "rewrite ptrs:\t"); 46 45 bch2_prt_u64_base2(out, data_opts->rewrite_ptrs); 47 46 prt_newline(out); 48 47 49 - prt_str(out, "kill ptrs: "); 50 - prt_tab(out); 48 + prt_str(out, "kill ptrs:\t"); 51 49 bch2_prt_u64_base2(out, data_opts->kill_ptrs); 52 50 prt_newline(out); 53 51 54 - prt_str(out, "target: "); 55 - prt_tab(out); 52 + prt_str(out, "target:\t"); 56 53 bch2_target_to_text(out, c, data_opts->target); 57 54 prt_newline(out); 58 55 59 - prt_str(out, "compression: "); 60 - prt_tab(out); 56 + prt_str(out, "compression:\t"); 61 57 bch2_compression_opt_to_text(out, background_compression(*io_opts)); 62 58 prt_newline(out); 63 59 64 - prt_str(out, "extra replicas: "); 65 - prt_tab(out); 60 + prt_str(out, "extra replicas:\t"); 66 61 prt_u64(out, data_opts->extra_replicas); 67 62 } 68 63 ··· 416 421 io_opts->d.nr = 0; 417 422 418 423 ret = for_each_btree_key(trans, iter, BTREE_ID_inodes, POS(0, extent_k.k->p.inode), 419 - BTREE_ITER_ALL_SNAPSHOTS, k, ({ 424 + BTREE_ITER_all_snapshots, k, ({ 420 425 if (k.k->p.offset != extent_k.k->p.inode) 421 426 break; 422 427 ··· 462 467 463 468 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_inodes, 464 469 SPOS(0, extent_k.k->p.inode, extent_k.k->p.snapshot), 465 - BTREE_ITER_CACHED); 470 + BTREE_ITER_cached); 466 471 ret = bkey_err(k); 467 472 if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) 468 473 return ret; ··· 548 553 } 549 554 550 555 bch2_trans_iter_init(trans, &iter, btree_id, start, 551 - BTREE_ITER_PREFETCH| 552 - BTREE_ITER_ALL_SNAPSHOTS); 556 + BTREE_ITER_prefetch| 557 + BTREE_ITER_all_snapshots); 553 558 554 559 if (ctxt->rate) 555 560 bch2_ratelimit_reset(ctxt->rate); ··· 690 695 struct bpos bp_pos = POS_MIN; 691 696 int ret = 0; 692 697 698 + struct bch_dev *ca = bch2_dev_tryget(c, bucket.inode); 699 + if (!ca) 700 + return 0; 701 + 693 702 trace_bucket_evacuate(c, &bucket); 694 703 695 704 bch2_bkey_buf_init(&sk); ··· 704 705 bch2_trans_begin(trans); 705 706 706 707 bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc, 707 - bucket, BTREE_ITER_CACHED); 708 + bucket, BTREE_ITER_cached); 708 709 ret = lockrestart_do(trans, 709 710 bkey_err(k = bch2_btree_iter_peek_slot(&iter))); 710 711 bch2_trans_iter_exit(trans, &iter); ··· 715 716 716 717 a = bch2_alloc_to_v4(k, &a_convert); 717 718 dirty_sectors = bch2_bucket_sectors_dirty(*a); 718 - bucket_size = bch_dev_bkey_exists(c, bucket.inode)->mi.bucket_size; 719 + bucket_size = ca->mi.bucket_size; 719 720 fragmentation = a->fragmentation_lru; 720 721 721 722 ret = bch2_btree_write_buffer_tryflush(trans); ··· 729 730 730 731 bch2_trans_begin(trans); 731 732 732 - ret = bch2_get_next_backpointer(trans, bucket, gen, 733 + ret = bch2_get_next_backpointer(trans, ca, bucket, gen, 733 734 &bp_pos, &bp, 734 - BTREE_ITER_CACHED); 735 + BTREE_ITER_cached); 735 736 if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) 736 737 continue; 737 738 if (ret) ··· 827 828 828 829 trace_evacuate_bucket(c, &bucket, dirty_sectors, bucket_size, fragmentation, ret); 829 830 err: 831 + bch2_dev_put(ca); 830 832 bch2_bkey_buf_exit(&sk, c); 831 833 return ret; 832 834 } ··· 868 868 continue; 869 869 870 870 bch2_trans_node_iter_init(trans, &iter, btree, POS_MIN, 0, 0, 871 - BTREE_ITER_PREFETCH); 871 + BTREE_ITER_prefetch); 872 872 retry: 873 873 ret = 0; 874 874 while (bch2_trans_begin(trans), ··· 975 975 */ 976 976 static bool 
bformat_needs_redo(struct bkey_format *f) 977 977 { 978 - for (unsigned i = 0; i < f->nr_fields; i++) { 979 - unsigned f_bits = f->bits_per_field[i]; 980 - unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i]; 981 - u64 unpacked_mask = ~((~0ULL << 1) << (unpacked_bits - 1)); 982 - u64 field_offset = le64_to_cpu(f->field_offset[i]); 983 - 984 - if (f_bits > unpacked_bits) 978 + for (unsigned i = 0; i < f->nr_fields; i++) 979 + if (bch2_bkey_format_field_overflows(f, i)) 985 980 return true; 986 - 987 - if ((f_bits == unpacked_bits) && field_offset) 988 - return true; 989 - 990 - u64 f_mask = f_bits 991 - ? ~((~0ULL << (f_bits - 1)) << 1) 992 - : 0; 993 - 994 - if (((field_offset + f_mask) & unpacked_mask) < field_offset) 995 - return true; 996 - } 997 981 998 982 return false; 999 983 } ··· 1033 1049 struct extent_ptr_decoded p; 1034 1050 unsigned i = 0; 1035 1051 1052 + rcu_read_lock(); 1036 1053 bkey_for_each_ptr_decode(k.k, bch2_bkey_ptrs_c(k), p, entry) { 1037 1054 unsigned d = bch2_extent_ptr_durability(c, &p); 1038 1055 ··· 1044 1059 1045 1060 i++; 1046 1061 } 1062 + rcu_read_unlock(); 1047 1063 1048 1064 return data_opts->kill_ptrs != 0; 1049 1065 } ··· 1129 1143 prt_newline(out); 1130 1144 printbuf_indent_add(out, 2); 1131 1145 1132 - prt_str(out, "keys moved: "); 1133 - prt_u64(out, atomic64_read(&stats->keys_moved)); 1134 - prt_newline(out); 1135 - 1136 - prt_str(out, "keys raced: "); 1137 - prt_u64(out, atomic64_read(&stats->keys_raced)); 1138 - prt_newline(out); 1139 - 1140 - prt_str(out, "bytes seen: "); 1146 + prt_printf(out, "keys moved: %llu\n", atomic64_read(&stats->keys_moved)); 1147 + prt_printf(out, "keys raced: %llu\n", atomic64_read(&stats->keys_raced)); 1148 + prt_printf(out, "bytes seen: "); 1141 1149 prt_human_readable_u64(out, atomic64_read(&stats->sectors_seen) << 9); 1142 1150 prt_newline(out); 1143 1151 1144 - prt_str(out, "bytes moved: "); 1152 + prt_printf(out, "bytes moved: "); 1145 1153 prt_human_readable_u64(out, atomic64_read(&stats->sectors_moved) << 9); 1146 1154 prt_newline(out); 1147 1155 1148 - prt_str(out, "bytes raced: "); 1156 + prt_printf(out, "bytes raced: "); 1149 1157 prt_human_readable_u64(out, atomic64_read(&stats->sectors_raced) << 9); 1150 1158 prt_newline(out); 1151 1159 ··· 1153 1173 bch2_move_stats_to_text(out, ctxt->stats); 1154 1174 printbuf_indent_add(out, 2); 1155 1175 1156 - prt_printf(out, "reads: ios %u/%u sectors %u/%u", 1176 + prt_printf(out, "reads: ios %u/%u sectors %u/%u\n", 1157 1177 atomic_read(&ctxt->read_ios), 1158 1178 c->opts.move_ios_in_flight, 1159 1179 atomic_read(&ctxt->read_sectors), 1160 1180 c->opts.move_bytes_in_flight >> 9); 1161 - prt_newline(out); 1162 1181 1163 - prt_printf(out, "writes: ios %u/%u sectors %u/%u", 1182 + prt_printf(out, "writes: ios %u/%u sectors %u/%u\n", 1164 1183 atomic_read(&ctxt->write_ios), 1165 1184 c->opts.move_ios_in_flight, 1166 1185 atomic_read(&ctxt->write_sectors), 1167 1186 c->opts.move_bytes_in_flight >> 9); 1168 - prt_newline(out); 1169 1187 1170 1188 printbuf_indent_add(out, 2); 1171 1189
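The simplified bformat_needs_redo() delegates the per-field check to bch2_bkey_format_field_overflows(); the deleted open-coded arithmetic asked whether a packed field can represent values the current unpacked format cannot. A standalone reconstruction of that arithmetic (a sketch of the removed logic, not the kernel helper):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /*
     * A field overflows if it packs more bits than the in-memory format,
     * or if field_offset pushes its maximum representable value past the
     * unpacked range.
     */
    static bool field_overflows(unsigned f_bits, unsigned unpacked_bits,
                                uint64_t field_offset)
    {
            uint64_t unpacked_mask = ~((~0ULL << 1) << (unpacked_bits - 1));

            if (f_bits > unpacked_bits)
                    return true;
            if (f_bits == unpacked_bits && field_offset)
                    return true;

            uint64_t f_mask = f_bits ? ~((~0ULL << (f_bits - 1)) << 1) : 0;

            return ((field_offset + f_mask) & unpacked_mask) < field_offset;
    }

    int main(void)
    {
            /* 20-bit field whose offset pushes it past a 32-bit unpacked range: */
            printf("%d\n", field_overflows(20, 32, 0xfffff000));    /* 1 */
            printf("%d\n", field_overflows(20, 32, 0));             /* 0 */
            return 0;
    }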
+3 -1
fs/bcachefs/movinggc.c
··· 84 84 return 0; 85 85 86 86 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_alloc, 87 - b->k.bucket, BTREE_ITER_CACHED); 87 + b->k.bucket, BTREE_ITER_cached); 88 88 ret = bkey_err(k); 89 89 if (ret) 90 90 return ret; ··· 157 157 158 158 if (bch2_fs_fatal_err_on(ret, c, "%s: from bch2_btree_write_buffer_tryflush()", bch2_err_str(ret))) 159 159 return ret; 160 + 161 + bch2_trans_begin(trans); 160 162 161 163 ret = for_each_btree_key_upto(trans, iter, BTREE_ID_lru, 162 164 lru_pos(BCH_LRU_FRAGMENTATION_START, 0, 0),
+1 -6
fs/bcachefs/opts.h
··· 426 426 BCH_SB_VERSION_UPGRADE, BCH_VERSION_UPGRADE_compatible, \ 427 427 NULL, "Set superblock to latest version,\n" \ 428 428 "allowing any new features to be used") \ 429 - x(buckets_nouse, u8, \ 430 - 0, \ 431 - OPT_BOOL(), \ 432 - BCH2_NO_SB_OPT, false, \ 433 - NULL, "Allocate the buckets_nouse bitmap") \ 434 429 x(stdio, u64, \ 435 430 0, \ 436 431 OPT_UINT(0, S64_MAX), \ ··· 475 480 OPT_FS|OPT_MOUNT|OPT_RUNTIME, \ 476 481 OPT_BOOL(), \ 477 482 BCH2_NO_SB_OPT, true, \ 478 - NULL, "BTREE_ITER_PREFETCH casuse btree nodes to be\n"\ 483 + NULL, "BTREE_ITER_prefetch casuse btree nodes to be\n"\ 479 484 " prefetched sequentially") 480 485 481 486 struct bch_opts {
··· 426 426 BCH_SB_VERSION_UPGRADE, BCH_VERSION_UPGRADE_compatible, \ 427 427 NULL, "Set superblock to latest version,\n" \ 428 428 "allowing any new features to be used") \ 429 - x(buckets_nouse, u8, \ 430 - 0, \ 431 - OPT_BOOL(), \ 432 - BCH2_NO_SB_OPT, false, \ 433 - NULL, "Allocate the buckets_nouse bitmap") \ 434 429 x(stdio, u64, \ 435 430 0, \ 436 431 OPT_UINT(0, S64_MAX), \ ··· 475 480 OPT_FS|OPT_MOUNT|OPT_RUNTIME, \ 476 481 OPT_BOOL(), \ 477 482 BCH2_NO_SB_OPT, true, \ 478 - NULL, "BTREE_ITER_PREFETCH casuse btree nodes to be\n"\ 483 + NULL, "BTREE_ITER_prefetch causes btree nodes to be\n"\ 479 484 " prefetched sequentially") 480 485 481 486 struct bch_opts {
+136 -96
fs/bcachefs/printbuf.c
··· 10 10 11 11 #include "printbuf.h" 12 12 13 + static inline unsigned __printbuf_linelen(struct printbuf *buf, unsigned pos) 14 + { 15 + return pos - buf->last_newline; 16 + } 17 + 13 18 static inline unsigned printbuf_linelen(struct printbuf *buf) 14 19 { 15 - return buf->pos - buf->last_newline; 20 + return __printbuf_linelen(buf, buf->pos); 21 + } 22 + 23 + /* 24 + * Returns spaces from start of line, if set, or 0 if unset: 25 + */ 26 + static inline unsigned cur_tabstop(struct printbuf *buf) 27 + { 28 + return buf->cur_tabstop < buf->nr_tabstops 29 + ? buf->_tabstops[buf->cur_tabstop] 30 + : 0; 16 31 } 17 32 18 33 int bch2_printbuf_make_room(struct printbuf *out, unsigned extra) 19 34 { 20 - unsigned new_size; 21 - char *buf; 22 - 23 - if (!out->heap_allocated) 24 - return 0; 25 - 26 35 /* Reserved space for terminating nul: */ 27 36 extra += 1; 28 37 29 - if (out->pos + extra < out->size) 38 + if (out->pos + extra <= out->size) 30 39 return 0; 31 40 32 - new_size = roundup_pow_of_two(out->size + extra); 41 + if (!out->heap_allocated) { 42 + out->overflow = true; 43 + return 0; 44 + } 45 + 46 + unsigned new_size = roundup_pow_of_two(out->size + extra); 33 47 34 48 /* 35 49 * Note: output buffer must be freeable with kfree(), it's not required 36 50 * that the user use printbuf_exit(). 37 51 */ 38 - buf = krealloc(out->buf, new_size, !out->atomic ? GFP_KERNEL : GFP_NOWAIT); 52 + char *buf = krealloc(out->buf, new_size, !out->atomic ? GFP_KERNEL : GFP_NOWAIT); 39 53 40 54 if (!buf) { 41 55 out->allocation_failure = true; 56 + out->overflow = true; 42 57 return -ENOMEM; 43 58 } 44 59 45 60 out->buf = buf; 46 61 out->size = new_size; 47 62 return 0; 63 + } 64 + 65 + static void printbuf_advance_pos(struct printbuf *out, unsigned len) 66 + { 67 + out->pos += min(len, printbuf_remaining(out)); 68 + } 69 + 70 + static void printbuf_insert_spaces(struct printbuf *out, unsigned pos, unsigned nr) 71 + { 72 + unsigned move = out->pos - pos; 73 + 74 + bch2_printbuf_make_room(out, nr); 75 + 76 + if (pos + nr < out->size) 77 + memmove(out->buf + pos + nr, 78 + out->buf + pos, 79 + min(move, out->size - 1 - pos - nr)); 80 + 81 + if (pos < out->size) 82 + memset(out->buf + pos, ' ', min(nr, out->size - pos)); 83 + 84 + printbuf_advance_pos(out, nr); 85 + printbuf_nul_terminate_reserved(out); 86 + } 87 + 88 + static void __printbuf_do_indent(struct printbuf *out, unsigned pos) 89 + { 90 + while (true) { 91 + int pad; 92 + unsigned len = out->pos - pos; 93 + char *p = out->buf + pos; 94 + char *n = memscan(p, '\n', len); 95 + if (cur_tabstop(out)) { 96 + n = min(n, (char *) memscan(p, '\r', len)); 97 + n = min(n, (char *) memscan(p, '\t', len)); 98 + } 99 + 100 + pos = n - out->buf; 101 + if (pos == out->pos) 102 + break; 103 + 104 + switch (*n) { 105 + case '\n': 106 + pos++; 107 + out->last_newline = pos; 108 + 109 + printbuf_insert_spaces(out, pos, out->indent); 110 + 111 + pos = min(pos + out->indent, out->pos); 112 + out->last_field = pos; 113 + out->cur_tabstop = 0; 114 + break; 115 + case '\r': 116 + memmove(n, n + 1, out->pos - pos); 117 + --out->pos; 118 + pad = (int) cur_tabstop(out) - (int) __printbuf_linelen(out, pos); 119 + if (pad > 0) { 120 + printbuf_insert_spaces(out, out->last_field, pad); 121 + pos += pad; 122 + } 123 + 124 + out->last_field = pos; 125 + out->cur_tabstop++; 126 + break; 127 + case '\t': 128 + pad = (int) cur_tabstop(out) - (int) __printbuf_linelen(out, pos) - 1; 129 + if (pad > 0) { 130 + *n = ' '; 131 + printbuf_insert_spaces(out, pos, pad - 1); 132 + pos += pad; 
133 + } else { 134 + memmove(n, n + 1, out->pos - pos); 135 + --out->pos; 136 + } 137 + 138 + out->last_field = pos; 139 + out->cur_tabstop++; 140 + break; 141 + } 142 + } 143 + } 144 + 145 + static inline void printbuf_do_indent(struct printbuf *out, unsigned pos) 146 + { 147 + if (out->has_indent_or_tabstops && !out->suppress_indent_tabstop_handling) 148 + __printbuf_do_indent(out, pos); 48 149 } 49 150 50 151 void bch2_prt_vprintf(struct printbuf *out, const char *fmt, va_list args) ··· 156 55 va_list args2; 157 56 158 57 va_copy(args2, args); 159 - len = vsnprintf(out->buf + out->pos, printbuf_remaining(out), fmt, args2); 58 + len = vsnprintf(out->buf + out->pos, printbuf_remaining_size(out), fmt, args2); 160 59 va_end(args2); 161 - } while (len + 1 >= printbuf_remaining(out) && 162 - !bch2_printbuf_make_room(out, len + 1)); 60 + } while (len > printbuf_remaining(out) && 61 + !bch2_printbuf_make_room(out, len)); 163 62 164 - len = min_t(size_t, len, 165 - printbuf_remaining(out) ? printbuf_remaining(out) - 1 : 0); 166 - out->pos += len; 63 + unsigned indent_pos = out->pos; 64 + printbuf_advance_pos(out, len); 65 + printbuf_do_indent(out, indent_pos); 167 66 } 168 67 169 68 void bch2_prt_printf(struct printbuf *out, const char *fmt, ...) ··· 173 72 174 73 do { 175 74 va_start(args, fmt); 176 - len = vsnprintf(out->buf + out->pos, printbuf_remaining(out), fmt, args); 75 + len = vsnprintf(out->buf + out->pos, printbuf_remaining_size(out), fmt, args); 177 76 va_end(args); 178 - } while (len + 1 >= printbuf_remaining(out) && 179 - !bch2_printbuf_make_room(out, len + 1)); 77 + } while (len > printbuf_remaining(out) && 78 + !bch2_printbuf_make_room(out, len)); 180 79 181 - len = min_t(size_t, len, 182 - printbuf_remaining(out) ? printbuf_remaining(out) - 1 : 0); 183 - out->pos += len; 80 + unsigned indent_pos = out->pos; 81 + printbuf_advance_pos(out, len); 82 + printbuf_do_indent(out, indent_pos); 184 83 } 185 84 186 85 /** ··· 295 194 296 195 void bch2_prt_newline(struct printbuf *buf) 297 196 { 298 - unsigned i; 299 - 300 197 bch2_printbuf_make_room(buf, 1 + buf->indent); 301 198 302 - __prt_char(buf, '\n'); 199 + __prt_char_reserved(buf, '\n'); 303 200 304 201 buf->last_newline = buf->pos; 305 202 306 - for (i = 0; i < buf->indent; i++) 307 - __prt_char(buf, ' '); 203 + __prt_chars_reserved(buf, ' ', buf->indent); 308 204 309 - printbuf_nul_terminate(buf); 205 + printbuf_nul_terminate_reserved(buf); 310 206 311 207 buf->last_field = buf->pos; 312 208 buf->cur_tabstop = 0; 313 - } 314 - 315 - /* 316 - * Returns spaces from start of line, if set, or 0 if unset: 317 - */ 318 - static inline unsigned cur_tabstop(struct printbuf *buf) 319 - { 320 - return buf->cur_tabstop < buf->nr_tabstops 321 - ? 
buf->_tabstops[buf->cur_tabstop] 322 - : 0; 323 209 } 324 210 325 211 static void __prt_tab(struct printbuf *out) ··· 335 247 336 248 static void __prt_tab_rjust(struct printbuf *buf) 337 249 { 338 - unsigned move = buf->pos - buf->last_field; 339 250 int pad = (int) cur_tabstop(buf) - (int) printbuf_linelen(buf); 340 - 341 - if (pad > 0) { 342 - bch2_printbuf_make_room(buf, pad); 343 - 344 - if (buf->last_field + pad < buf->size) 345 - memmove(buf->buf + buf->last_field + pad, 346 - buf->buf + buf->last_field, 347 - min(move, buf->size - 1 - buf->last_field - pad)); 348 - 349 - if (buf->last_field < buf->size) 350 - memset(buf->buf + buf->last_field, ' ', 351 - min((unsigned) pad, buf->size - buf->last_field)); 352 - 353 - buf->pos += pad; 354 - printbuf_nul_terminate(buf); 355 - } 251 + if (pad > 0) 252 + printbuf_insert_spaces(buf, buf->last_field, pad); 356 253 357 254 buf->last_field = buf->pos; 358 255 buf->cur_tabstop++; ··· 374 301 */ 375 302 void bch2_prt_bytes_indented(struct printbuf *out, const char *str, unsigned count) 376 303 { 377 - const char *unprinted_start = str; 378 - const char *end = str + count; 379 - 380 - if (!out->has_indent_or_tabstops || out->suppress_indent_tabstop_handling) { 381 - prt_bytes(out, str, count); 382 - return; 383 - } 384 - 385 - while (str != end) { 386 - switch (*str) { 387 - case '\n': 388 - prt_bytes(out, unprinted_start, str - unprinted_start); 389 - unprinted_start = str + 1; 390 - bch2_prt_newline(out); 391 - break; 392 - case '\t': 393 - if (likely(cur_tabstop(out))) { 394 - prt_bytes(out, unprinted_start, str - unprinted_start); 395 - unprinted_start = str + 1; 396 - __prt_tab(out); 397 - } 398 - break; 399 - case '\r': 400 - if (likely(cur_tabstop(out))) { 401 - prt_bytes(out, unprinted_start, str - unprinted_start); 402 - unprinted_start = str + 1; 403 - __prt_tab_rjust(out); 404 - } 405 - break; 406 - } 407 - 408 - str++; 409 - } 410 - 411 - prt_bytes(out, unprinted_start, str - unprinted_start); 304 + unsigned indent_pos = out->pos; 305 + prt_bytes(out, str, count); 306 + printbuf_do_indent(out, indent_pos); 412 307 } 413 308 414 309 /** ··· 389 348 void bch2_prt_human_readable_u64(struct printbuf *out, u64 v) 390 349 { 391 350 bch2_printbuf_make_room(out, 10); 392 - out->pos += string_get_size(v, 1, !out->si_units, 393 - out->buf + out->pos, 394 - printbuf_remaining_size(out)); 351 + unsigned len = string_get_size(v, 1, !out->si_units, 352 + out->buf + out->pos, 353 + printbuf_remaining_size(out)); 354 + printbuf_advance_pos(out, len); 395 355 } 396 356 397 357 /** ··· 444 402 const char * const list[], 445 403 size_t selected) 446 404 { 447 - size_t i; 448 - 449 - for (i = 0; list[i]; i++) 405 + for (size_t i = 0; list[i]; i++) 450 406 bch2_prt_printf(out, i == selected ? "[%s] " : "%s ", list[i]); 451 407 } 452 408
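The big printbuf change moves '\t' (left-justify to the next tabstop), '\r' (right-justify) and '\n' (indent) handling out of bch2_prt_bytes_indented() into a post-pass over freshly printed bytes, printbuf_do_indent(), so control characters embedded in printf format strings now work too; that is what lets the move.c and quota.c hunks collapse prt_str()/prt_tab()/prt_newline() triplets into single prt_printf() calls. A toy emulation of the '\t' expansion (illustrative only; the kernel edits the buffer in place via printbuf_insert_spaces()):

    #include <stdio.h>

    /* One tabstop at column 20, as set by printbuf_tabstop_push(out, 20). */
    static const unsigned tabstops[] = { 20 };

    /* Expand each tab with spaces so "label:\tvalue" lines up in columns. */
    static void render(const char *in)
    {
            unsigned col = 0, tab = 0;

            for (; *in; in++) {
                    if (*in == '\t' &&
                        tab < sizeof(tabstops) / sizeof(tabstops[0])) {
                            while (col < tabstops[tab]) {
                                    putchar(' ');
                                    col++;
                            }
                            tab++;
                    } else if (*in == '\n') {
                            putchar('\n');
                            col = 0;
                            tab = 0;
                    } else {
                            putchar(*in);
                            col++;
                    }
            }
    }

    int main(void)
    {
            render("keys moved:\t123\nkeys raced:\t4\n");
            return 0;
    }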
+24 -29
fs/bcachefs/printbuf.h
··· 86 86 u8 atomic; 87 87 bool allocation_failure:1; 88 88 bool heap_allocated:1; 89 + bool overflow:1; 89 90 enum printbuf_si si_units:1; 90 91 bool human_readable_units:1; 91 92 bool has_indent_or_tabstops:1; ··· 143 142 */ 144 143 static inline unsigned printbuf_remaining_size(struct printbuf *out) 145 144 { 146 - return out->pos < out->size ? out->size - out->pos : 0; 145 + if (WARN_ON(out->size && out->pos >= out->size)) 146 + out->pos = out->size - 1; 147 + return out->size - out->pos; 147 148 } 148 149 149 150 /* ··· 154 151 */ 155 152 static inline unsigned printbuf_remaining(struct printbuf *out) 156 153 { 157 - return out->pos < out->size ? out->size - out->pos - 1 : 0; 154 + return out->size ? printbuf_remaining_size(out) - 1 : 0; 158 155 } 159 156 160 157 static inline unsigned printbuf_written(struct printbuf *out) ··· 162 159 return out->size ? min(out->pos, out->size - 1) : 0; 163 160 } 164 161 165 - /* 166 - * Returns true if output was truncated: 167 - */ 168 - static inline bool printbuf_overflowed(struct printbuf *out) 162 + static inline void printbuf_nul_terminate_reserved(struct printbuf *out) 169 163 { 170 - return out->pos >= out->size; 164 + if (WARN_ON(out->size && out->pos >= out->size)) 165 + out->pos = out->size - 1; 166 + if (out->size) 167 + out->buf[out->pos] = 0; 171 168 } 172 169 173 170 static inline void printbuf_nul_terminate(struct printbuf *out) 174 171 { 175 172 bch2_printbuf_make_room(out, 1); 176 - 177 - if (out->pos < out->size) 178 - out->buf[out->pos] = 0; 179 - else if (out->size) 180 - out->buf[out->size - 1] = 0; 173 + printbuf_nul_terminate_reserved(out); 181 174 } 182 175 183 176 /* Doesn't call bch2_printbuf_make_room(), doesn't nul terminate: */ 184 177 static inline void __prt_char_reserved(struct printbuf *out, char c) 185 178 { 186 179 if (printbuf_remaining(out)) 187 - out->buf[out->pos] = c; 188 - out->pos++; 180 + out->buf[out->pos++] = c; 189 181 } 190 182 191 183 /* Doesn't nul terminate: */ ··· 192 194 193 195 static inline void prt_char(struct printbuf *out, char c) 194 196 { 195 - __prt_char(out, c); 196 - printbuf_nul_terminate(out); 197 + bch2_printbuf_make_room(out, 2); 198 + __prt_char_reserved(out, c); 199 + printbuf_nul_terminate_reserved(out); 197 200 } 198 201 199 202 static inline void __prt_chars_reserved(struct printbuf *out, char c, unsigned n) 200 203 { 201 - unsigned i, can_print = min(n, printbuf_remaining(out)); 204 + unsigned can_print = min(n, printbuf_remaining(out)); 202 205 203 - for (i = 0; i < can_print; i++) 206 + for (unsigned i = 0; i < can_print; i++) 204 207 out->buf[out->pos++] = c; 205 - out->pos += n - can_print; 206 208 } 207 209 208 210 static inline void prt_chars(struct printbuf *out, char c, unsigned n) 209 211 { 210 212 bch2_printbuf_make_room(out, n); 211 213 __prt_chars_reserved(out, c, n); 212 - printbuf_nul_terminate(out); 214 + printbuf_nul_terminate_reserved(out); 213 215 } 214 216 215 217 static inline void prt_bytes(struct printbuf *out, const void *b, unsigned n) 216 218 { 217 - unsigned i, can_print; 218 - 219 219 bch2_printbuf_make_room(out, n); 220 220 221 - can_print = min(n, printbuf_remaining(out)); 221 + unsigned can_print = min(n, printbuf_remaining(out)); 222 222 223 - for (i = 0; i < can_print; i++) 223 + for (unsigned i = 0; i < can_print; i++) 224 224 out->buf[out->pos++] = ((char *) b)[i]; 225 - out->pos += n - can_print; 226 225 227 226 printbuf_nul_terminate(out); 228 227 } ··· 236 241 237 242 static inline void prt_hex_byte(struct printbuf *out, u8 byte) 238 243 { 
239 - bch2_printbuf_make_room(out, 2); 244 + bch2_printbuf_make_room(out, 3); 240 245 __prt_char_reserved(out, hex_asc_hi(byte)); 241 246 __prt_char_reserved(out, hex_asc_lo(byte)); 242 - printbuf_nul_terminate(out); 247 + printbuf_nul_terminate_reserved(out); 243 248 } 244 249 245 250 static inline void prt_hex_byte_upper(struct printbuf *out, u8 byte) 246 251 { 247 - bch2_printbuf_make_room(out, 2); 252 + bch2_printbuf_make_room(out, 3); 248 253 __prt_char_reserved(out, hex_asc_upper_hi(byte)); 249 254 __prt_char_reserved(out, hex_asc_upper_lo(byte)); 250 - printbuf_nul_terminate(out); 255 + printbuf_nul_terminate_reserved(out); 251 256 } 252 257 253 258 /**
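Several of the printbuf.h fixes are off-by-one reservations for the terminating nul: make_room now grows whenever pos + extra would exceed size rather than merely reach it, and prt_hex_byte() reserves three bytes, not two, since two hex digits still need a nul after them. A sketch of the corrected reservation (toy buffer and helper name, not the kernel API):

    #include <stdio.h>

    static const char hex_asc[] = "0123456789abcdef";

    /*
     * One byte printed as hex needs two characters plus the terminating
     * nul, hence make_room(out, 3) in the hunk above.
     */
    static void put_hex_byte(char *buf, unsigned size, unsigned *pos,
                             unsigned char b)
    {
            if (*pos + 3 > size)
                    return;                 /* no room for "xx" plus nul */
            buf[(*pos)++] = hex_asc[b >> 4];
            buf[(*pos)++] = hex_asc[b & 0xf];
            buf[*pos] = '\0';
    }

    int main(void)
    {
            char buf[8] = "";
            unsigned pos = 0;

            put_hex_byte(buf, sizeof(buf), &pos, 0xde);
            put_hex_byte(buf, sizeof(buf), &pos, 0xad);
            printf("%s\n", buf);            /* dead */
            return 0;
    }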
+24 -99
fs/bcachefs/quota.c
··· 20 20 }; 21 21 22 22 static int bch2_sb_quota_validate(struct bch_sb *sb, struct bch_sb_field *f, 23 - struct printbuf *err) 23 + enum bch_validate_flags flags, struct printbuf *err) 24 24 { 25 25 struct bch_sb_field_quota *q = field_to_type(f, quota); 26 26 ··· 60 60 }; 61 61 62 62 int bch2_quota_invalid(struct bch_fs *c, struct bkey_s_c k, 63 - enum bkey_invalid_flags flags, 64 - struct printbuf *err) 63 + enum bch_validate_flags flags, struct printbuf *err) 65 64 { 66 65 int ret = 0; 67 66 ··· 96 97 printbuf_tabstops_reset(out); 97 98 printbuf_tabstop_push(out, 20); 98 99 99 - prt_str(out, "i_fieldmask"); 100 - prt_tab(out); 101 - prt_printf(out, "%x", i->i_fieldmask); 102 - prt_newline(out); 103 - 104 - prt_str(out, "i_flags"); 105 - prt_tab(out); 106 - prt_printf(out, "%u", i->i_flags); 107 - prt_newline(out); 108 - 109 - prt_str(out, "i_spc_timelimit"); 110 - prt_tab(out); 111 - prt_printf(out, "%u", i->i_spc_timelimit); 112 - prt_newline(out); 113 - 114 - prt_str(out, "i_ino_timelimit"); 115 - prt_tab(out); 116 - prt_printf(out, "%u", i->i_ino_timelimit); 117 - prt_newline(out); 118 - 119 - prt_str(out, "i_rt_spc_timelimit"); 120 - prt_tab(out); 121 - prt_printf(out, "%u", i->i_rt_spc_timelimit); 122 - prt_newline(out); 123 - 124 - prt_str(out, "i_spc_warnlimit"); 125 - prt_tab(out); 126 - prt_printf(out, "%u", i->i_spc_warnlimit); 127 - prt_newline(out); 128 - 129 - prt_str(out, "i_ino_warnlimit"); 130 - prt_tab(out); 131 - prt_printf(out, "%u", i->i_ino_warnlimit); 132 - prt_newline(out); 133 - 134 - prt_str(out, "i_rt_spc_warnlimit"); 135 - prt_tab(out); 136 - prt_printf(out, "%u", i->i_rt_spc_warnlimit); 137 - prt_newline(out); 100 + prt_printf(out, "i_fieldmask\t%x\n", i->i_fieldmask); 101 + prt_printf(out, "i_flags\t%u\n", i->i_flags); 102 + prt_printf(out, "i_spc_timelimit\t%u\n", i->i_spc_timelimit); 103 + prt_printf(out, "i_ino_timelimit\t%u\n", i->i_ino_timelimit); 104 + prt_printf(out, "i_rt_spc_timelimit\t%u\n", i->i_rt_spc_timelimit); 105 + prt_printf(out, "i_spc_warnlimit\t%u\n", i->i_spc_warnlimit); 106 + prt_printf(out, "i_ino_warnlimit\t%u\n", i->i_ino_warnlimit); 107 + prt_printf(out, "i_rt_spc_warnlimit\t%u\n", i->i_rt_spc_warnlimit); 138 108 } 139 109 140 110 static void qc_dqblk_to_text(struct printbuf *out, struct qc_dqblk *q) ··· 111 143 printbuf_tabstops_reset(out); 112 144 printbuf_tabstop_push(out, 20); 113 145 114 - prt_str(out, "d_fieldmask"); 115 - prt_tab(out); 116 - prt_printf(out, "%x", q->d_fieldmask); 117 - prt_newline(out); 118 - 119 - prt_str(out, "d_spc_hardlimit"); 120 - prt_tab(out); 121 - prt_printf(out, "%llu", q->d_spc_hardlimit); 122 - prt_newline(out); 123 - 124 - prt_str(out, "d_spc_softlimit"); 125 - prt_tab(out); 126 - prt_printf(out, "%llu", q->d_spc_softlimit); 127 - prt_newline(out); 128 - 129 - prt_str(out, "d_ino_hardlimit"); 130 - prt_tab(out); 131 - prt_printf(out, "%llu", q->d_ino_hardlimit); 132 - prt_newline(out); 133 - 134 - prt_str(out, "d_ino_softlimit"); 135 - prt_tab(out); 136 - prt_printf(out, "%llu", q->d_ino_softlimit); 137 - prt_newline(out); 138 - 139 - prt_str(out, "d_space"); 140 - prt_tab(out); 141 - prt_printf(out, "%llu", q->d_space); 142 - prt_newline(out); 143 - 144 - prt_str(out, "d_ino_count"); 145 - prt_tab(out); 146 - prt_printf(out, "%llu", q->d_ino_count); 147 - prt_newline(out); 148 - 149 - prt_str(out, "d_ino_timer"); 150 - prt_tab(out); 151 - prt_printf(out, "%llu", q->d_ino_timer); 152 - prt_newline(out); 153 - 154 - prt_str(out, "d_spc_timer"); 155 - prt_tab(out); 156 - prt_printf(out, "%llu", 
q->d_spc_timer); 157 - prt_newline(out); 158 - 159 - prt_str(out, "d_ino_warns"); 160 - prt_tab(out); 161 - prt_printf(out, "%i", q->d_ino_warns); 162 - prt_newline(out); 163 - 164 - prt_str(out, "d_spc_warns"); 165 - prt_tab(out); 166 - prt_printf(out, "%i", q->d_spc_warns); 167 - prt_newline(out); 146 + prt_printf(out, "d_fieldmask\t%x\n", q->d_fieldmask); 147 + prt_printf(out, "d_spc_hardlimit\t%llu\n", q->d_spc_hardlimit); 148 + prt_printf(out, "d_spc_softlimit\t%llu\n", q->d_spc_softlimit); 149 + prt_printf(out, "d_ino_hardlimit\t%llu\n", q->d_ino_hardlimit); 150 + prt_printf(out, "d_ino_softlimit\t%llu\n", q->d_ino_softlimit); 151 + prt_printf(out, "d_space\t%llu\n", q->d_space); 152 + prt_printf(out, "d_ino_count\t%llu\n", q->d_ino_count); 153 + prt_printf(out, "d_ino_timer\t%llu\n", q->d_ino_timer); 154 + prt_printf(out, "d_spc_timer\t%llu\n", q->d_spc_timer); 155 + prt_printf(out, "d_ino_warns\t%i\n", q->d_ino_warns); 156 + prt_printf(out, "d_spc_warns\t%i\n", q->d_spc_warns); 168 157 } 169 158 170 159 static inline unsigned __next_qtype(unsigned i, unsigned qtypes) ··· 535 610 536 611 int ret = bch2_trans_run(c, 537 612 for_each_btree_key(trans, iter, BTREE_ID_quotas, POS_MIN, 538 - BTREE_ITER_PREFETCH, k, 613 + BTREE_ITER_prefetch, k, 539 614 __bch2_quota_set(c, k, NULL)) ?: 540 615 for_each_btree_key(trans, iter, BTREE_ID_inodes, POS_MIN, 541 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 616 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 542 617 bch2_fs_quota_read_inode(trans, &iter, k))); 543 618 bch_err_fn(c, ret); 544 619 return ret; ··· 825 900 int ret; 826 901 827 902 k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_quotas, new_quota->k.p, 828 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT); 903 + BTREE_ITER_slots|BTREE_ITER_intent); 829 904 ret = bkey_err(k); 830 905 if (unlikely(ret)) 831 906 return ret;
+2 -2
fs/bcachefs/quota.h
··· 5 5 #include "inode.h" 6 6 #include "quota_types.h" 7 7 8 - enum bkey_invalid_flags; 8 + enum bch_validate_flags; 9 9 extern const struct bch_sb_field_ops bch_sb_field_ops_quota; 10 10 11 11 int bch2_quota_invalid(struct bch_fs *, struct bkey_s_c, 12 - enum bkey_invalid_flags, struct printbuf *); 12 + enum bch_validate_flags, struct printbuf *); 13 13 void bch2_quota_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 14 14 15 15 #define bch2_bkey_ops_quota ((struct bkey_ops) { \
+6 -4
fs/bcachefs/rebalance.c
··· 42 42 43 43 bch2_trans_iter_init(trans, &iter, BTREE_ID_rebalance_work, 44 44 SPOS(inum, REBALANCE_WORK_SCAN_OFFSET, U32_MAX), 45 - BTREE_ITER_INTENT); 45 + BTREE_ITER_intent); 46 46 k = bch2_btree_iter_peek_slot(&iter); 47 47 ret = bkey_err(k); 48 48 if (ret) ··· 89 89 90 90 bch2_trans_iter_init(trans, &iter, BTREE_ID_rebalance_work, 91 91 SPOS(inum, REBALANCE_WORK_SCAN_OFFSET, U32_MAX), 92 - BTREE_ITER_INTENT); 92 + BTREE_ITER_intent); 93 93 k = bch2_btree_iter_peek_slot(&iter); 94 94 ret = bkey_err(k); 95 95 if (ret) ··· 140 140 bch2_trans_iter_init(trans, extent_iter, 141 141 work_pos.inode ? BTREE_ID_extents : BTREE_ID_reflink, 142 142 work_pos, 143 - BTREE_ITER_ALL_SNAPSHOTS); 143 + BTREE_ITER_all_snapshots); 144 144 k = bch2_btree_iter_peek_slot(extent_iter); 145 145 if (bkey_err(k)) 146 146 return k; ··· 323 323 struct bkey_s_c k; 324 324 int ret = 0; 325 325 326 + bch2_trans_begin(trans); 327 + 326 328 bch2_move_stats_init(&r->work_stats, "rebalance_work"); 327 329 bch2_move_stats_init(&r->scan_stats, "rebalance_scan"); 328 330 329 331 bch2_trans_iter_init(trans, &rebalance_work_iter, 330 332 BTREE_ID_rebalance_work, POS_MIN, 331 - BTREE_ITER_ALL_SNAPSHOTS); 333 + BTREE_ITER_all_snapshots); 332 334 333 335 while (!bch2_move_ratelimit(ctxt)) { 334 336 if (!r->enabled) {
+84 -70
fs/bcachefs/recovery.c
··· 65 65 __set_bit_le64(BCH_FSCK_ERR_ptr_to_missing_alloc_key, ext->errors_silent); 66 66 __set_bit_le64(BCH_FSCK_ERR_ptr_gen_newer_than_bucket_gen, ext->errors_silent); 67 67 __set_bit_le64(BCH_FSCK_ERR_stale_dirty_ptr, ext->errors_silent); 68 + 69 + __set_bit_le64(BCH_FSCK_ERR_dev_usage_buckets_wrong, ext->errors_silent); 70 + __set_bit_le64(BCH_FSCK_ERR_dev_usage_sectors_wrong, ext->errors_silent); 71 + __set_bit_le64(BCH_FSCK_ERR_dev_usage_fragmented_wrong, ext->errors_silent); 72 + 73 + __set_bit_le64(BCH_FSCK_ERR_fs_usage_btree_wrong, ext->errors_silent); 74 + __set_bit_le64(BCH_FSCK_ERR_fs_usage_cached_wrong, ext->errors_silent); 75 + __set_bit_le64(BCH_FSCK_ERR_fs_usage_persistent_reserved_wrong, ext->errors_silent); 76 + __set_bit_le64(BCH_FSCK_ERR_fs_usage_replicas_wrong, ext->errors_silent); 77 + 68 78 __set_bit_le64(BCH_FSCK_ERR_alloc_key_data_type_wrong, ext->errors_silent); 69 79 __set_bit_le64(BCH_FSCK_ERR_alloc_key_gen_wrong, ext->errors_silent); 70 80 __set_bit_le64(BCH_FSCK_ERR_alloc_key_dirty_sectors_wrong, ext->errors_silent); 81 + __set_bit_le64(BCH_FSCK_ERR_alloc_key_cached_sectors_wrong, ext->errors_silent); 71 82 __set_bit_le64(BCH_FSCK_ERR_alloc_key_stripe_wrong, ext->errors_silent); 72 83 __set_bit_le64(BCH_FSCK_ERR_alloc_key_stripe_redundancy_wrong, ext->errors_silent); 73 84 __set_bit_le64(BCH_FSCK_ERR_need_discard_key_wrong, ext->errors_silent); ··· 136 125 { 137 126 struct btree_iter iter; 138 127 unsigned iter_flags = 139 - BTREE_ITER_INTENT| 140 - BTREE_ITER_NOT_EXTENTS; 141 - unsigned update_flags = BTREE_TRIGGER_NORUN; 128 + BTREE_ITER_intent| 129 + BTREE_ITER_not_extents; 130 + unsigned update_flags = BTREE_TRIGGER_norun; 142 131 int ret; 143 132 144 133 if (k->overwritten) ··· 147 136 trans->journal_res.seq = k->journal_seq; 148 137 149 138 /* 150 - * BTREE_UPDATE_KEY_CACHE_RECLAIM disables key cache lookup/update to 139 + * BTREE_UPDATE_key_cache_reclaim disables key cache lookup/update to 151 140 * keep the key cache coherent with the underlying btree. Nothing 152 141 * besides the allocator is doing updates yet so we don't need key cache 153 142 * coherency for non-alloc btrees, and key cache fills for snapshots 154 - * btrees use BTREE_ITER_FILTER_SNAPSHOTS, which isn't available until 143 + * btrees use BTREE_ITER_filter_snapshots, which isn't available until 155 144 * the snapshots recovery pass runs. 156 145 */ 157 146 if (!k->level && k->btree_id == BTREE_ID_alloc) 158 - iter_flags |= BTREE_ITER_CACHED; 147 + iter_flags |= BTREE_ITER_cached; 159 148 else 160 - update_flags |= BTREE_UPDATE_KEY_CACHE_RECLAIM; 149 + update_flags |= BTREE_UPDATE_key_cache_reclaim; 161 150 162 151 bch2_trans_node_iter_init(trans, &iter, k->btree_id, k->k->k.p, 163 152 BTREE_MAX_DEPTH, k->level, ··· 202 191 struct journal *j = &c->journal; 203 192 u64 start_seq = c->journal_replay_seq_start; 204 193 u64 end_seq = c->journal_replay_seq_start; 205 - struct btree_trans *trans = bch2_trans_get(c); 194 + struct btree_trans *trans = NULL; 206 195 bool immediate_flush = false; 207 196 int ret = 0; 208 197 ··· 216 205 BUG_ON(!atomic_read(&keys->ref)); 217 206 218 207 move_gap(keys, keys->nr); 208 + trans = bch2_trans_get(c); 219 209 220 210 /* 221 211 * First, attempt to replay keys in sorted order. 
This is more ··· 373 361 case BCH_JSET_ENTRY_dev_usage: { 374 362 struct jset_entry_dev_usage *u = 375 363 container_of(entry, struct jset_entry_dev_usage, entry); 376 - struct bch_dev *ca = bch_dev_bkey_exists(c, le32_to_cpu(u->dev)); 377 - unsigned i, nr_types = jset_entry_dev_usage_nr_types(u); 364 + unsigned nr_types = jset_entry_dev_usage_nr_types(u); 378 365 379 - for (i = 0; i < min_t(unsigned, nr_types, BCH_DATA_NR); i++) { 380 - ca->usage_base->d[i].buckets = le64_to_cpu(u->d[i].buckets); 381 - ca->usage_base->d[i].sectors = le64_to_cpu(u->d[i].sectors); 382 - ca->usage_base->d[i].fragmented = le64_to_cpu(u->d[i].fragmented); 383 - } 366 + rcu_read_lock(); 367 + struct bch_dev *ca = bch2_dev_rcu(c, le32_to_cpu(u->dev)); 368 + if (ca) 369 + for (unsigned i = 0; i < min_t(unsigned, nr_types, BCH_DATA_NR); i++) { 370 + ca->usage_base->d[i].buckets = le64_to_cpu(u->d[i].buckets); 371 + ca->usage_base->d[i].sectors = le64_to_cpu(u->d[i].sectors); 372 + ca->usage_base->d[i].fragmented = le64_to_cpu(u->d[i].fragmented); 373 + } 374 + rcu_read_unlock(); 384 375 385 376 break; 386 377 } ··· 612 597 if (c->opts.norecovery) 613 598 c->opts.recovery_pass_last = BCH_RECOVERY_PASS_journal_replay - 1; 614 599 615 - if (!c->opts.nochanges) { 616 - mutex_lock(&c->sb_lock); 617 - struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 618 - bool write_sb = false; 600 + mutex_lock(&c->sb_lock); 601 + struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 602 + bool write_sb = false; 619 603 620 - if (BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb)) { 621 - ext->recovery_passes_required[0] |= 622 - cpu_to_le64(bch2_recovery_passes_to_stable(BIT_ULL(BCH_RECOVERY_PASS_check_topology))); 623 - write_sb = true; 624 - } 625 - 626 - u64 sb_passes = bch2_recovery_passes_from_stable(le64_to_cpu(ext->recovery_passes_required[0])); 627 - if (sb_passes) { 628 - struct printbuf buf = PRINTBUF; 629 - prt_str(&buf, "superblock requires following recovery passes to be run:\n "); 630 - prt_bitflags(&buf, bch2_recovery_passes, sb_passes); 631 - bch_info(c, "%s", buf.buf); 632 - printbuf_exit(&buf); 633 - } 634 - 635 - if (bch2_check_version_downgrade(c)) { 636 - struct printbuf buf = PRINTBUF; 637 - 638 - prt_str(&buf, "Version downgrade required:"); 639 - 640 - __le64 passes = ext->recovery_passes_required[0]; 641 - bch2_sb_set_downgrade(c, 642 - BCH_VERSION_MINOR(bcachefs_metadata_version_current), 643 - BCH_VERSION_MINOR(c->sb.version)); 644 - passes = ext->recovery_passes_required[0] & ~passes; 645 - if (passes) { 646 - prt_str(&buf, "\n running recovery passes: "); 647 - prt_bitflags(&buf, bch2_recovery_passes, 648 - bch2_recovery_passes_from_stable(le64_to_cpu(passes))); 649 - } 650 - 651 - bch_info(c, "%s", buf.buf); 652 - printbuf_exit(&buf); 653 - write_sb = true; 654 - } 655 - 656 - if (check_version_upgrade(c)) 657 - write_sb = true; 658 - 659 - if (write_sb) 660 - bch2_write_super(c); 661 - 662 - c->recovery_passes_explicit |= bch2_recovery_passes_from_stable(le64_to_cpu(ext->recovery_passes_required[0])); 663 - mutex_unlock(&c->sb_lock); 604 + if (BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb)) { 605 + ext->recovery_passes_required[0] |= 606 + cpu_to_le64(bch2_recovery_passes_to_stable(BIT_ULL(BCH_RECOVERY_PASS_check_topology))); 607 + write_sb = true; 664 608 } 609 + 610 + u64 sb_passes = bch2_recovery_passes_from_stable(le64_to_cpu(ext->recovery_passes_required[0])); 611 + if (sb_passes) { 612 + struct printbuf buf = PRINTBUF; 613 + prt_str(&buf, "superblock requires following 
recovery passes to be run:\n "); 614 + prt_bitflags(&buf, bch2_recovery_passes, sb_passes); 615 + bch_info(c, "%s", buf.buf); 616 + printbuf_exit(&buf); 617 + } 618 + 619 + if (bch2_check_version_downgrade(c)) { 620 + struct printbuf buf = PRINTBUF; 621 + 622 + prt_str(&buf, "Version downgrade required:"); 623 + 624 + __le64 passes = ext->recovery_passes_required[0]; 625 + bch2_sb_set_downgrade(c, 626 + BCH_VERSION_MINOR(bcachefs_metadata_version_current), 627 + BCH_VERSION_MINOR(c->sb.version)); 628 + passes = ext->recovery_passes_required[0] & ~passes; 629 + if (passes) { 630 + prt_str(&buf, "\n running recovery passes: "); 631 + prt_bitflags(&buf, bch2_recovery_passes, 632 + bch2_recovery_passes_from_stable(le64_to_cpu(passes))); 633 + } 634 + 635 + bch_info(c, "%s", buf.buf); 636 + printbuf_exit(&buf); 637 + write_sb = true; 638 + } 639 + 640 + if (check_version_upgrade(c)) 641 + write_sb = true; 642 + 643 + if (write_sb) 644 + bch2_write_super(c); 645 + 646 + c->recovery_passes_explicit |= bch2_recovery_passes_from_stable(le64_to_cpu(ext->recovery_passes_required[0])); 647 + mutex_unlock(&c->sb_lock); 665 648 666 649 if (c->opts.fsck && IS_ENABLED(CONFIG_BCACHEFS_DEBUG)) 667 650 c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_topology); ··· 673 660 goto err; 674 661 } 675 662 676 - if (!c->sb.clean || c->opts.fsck || c->opts.retain_recovery_info) { 663 + bch2_journal_pos_from_member_info_resume(c); 664 + 665 + if (!c->sb.clean || c->opts.retain_recovery_info) { 677 666 struct genradix_iter iter; 678 667 struct journal_replay **i; 679 668 ··· 847 832 } 848 833 849 834 mutex_lock(&c->sb_lock); 850 - struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 851 - bool write_sb = false; 835 + ext = bch2_sb_field_get(c->disk_sb.sb, ext); 836 + write_sb = false; 852 837 853 838 if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) != le16_to_cpu(c->disk_sb.sb->version)) { 854 839 SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, le16_to_cpu(c->disk_sb.sb->version)); ··· 883 868 write_sb = true; 884 869 } 885 870 871 + if (bch2_blacklist_entries_gc(c)) 872 + write_sb = true; 873 + 886 874 if (write_sb) 887 875 bch2_write_super(c); 888 876 mutex_unlock(&c->sb_lock); ··· 907 889 goto err; 908 890 bch_info(c, "scanning for old btree nodes done"); 909 891 } 910 - 911 - if (c->journal_seq_blacklist_table && 912 - c->journal_seq_blacklist_table->nr > 128) 913 - queue_work(system_long_wq, &c->journal_seq_blacklist_gc_work); 914 892 915 893 ret = 0; 916 894 out:
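The recovery.c hunks show the device-safety series from the merge description in action: bch_dev_bkey_exists(), which assumed any device index found in metadata was valid, gives way to lookups like bch2_dev_rcu() and bch2_dev_have_ref() that can fail, so a journal entry naming a removed device is skipped instead of dereferenced. A sketch of the lookup pattern (a plain array stands in for the RCU-protected member array; in the kernel this runs under rcu_read_lock()):

    #include <stdio.h>

    #define MAX_DEVS 8

    struct dev { int present; unsigned long long buckets; };

    static struct dev devs[MAX_DEVS];

    /* Return NULL for a missing device instead of trusting the index. */
    static struct dev *dev_lookup(unsigned idx)
    {
            return idx < MAX_DEVS && devs[idx].present ? &devs[idx] : NULL;
    }

    int main(void)
    {
            devs[2].present = 1;

            /* Journal entries may name removed devices; skip, don't crash. */
            for (unsigned idx = 1; idx <= 3; idx++) {
                    struct dev *ca = dev_lookup(idx);

                    if (!ca) {
                            printf("dev %u missing, ignoring usage entry\n", idx);
                            continue;
                    }
                    ca->buckets = 100;
            }
            return 0;
    }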
+2 -6
fs/bcachefs/recovery_passes.c
··· 26 26 NULL 27 27 }; 28 28 29 - static int bch2_check_allocations(struct bch_fs *c) 30 - { 31 - return bch2_gc(c, true, false); 32 - } 33 - 34 29 static int bch2_set_may_go_rw(struct bch_fs *c) 35 30 { 36 31 struct journal_keys *keys = &c->journal_keys; ··· 222 227 if (should_run_recovery_pass(c, c->curr_recovery_pass)) { 223 228 unsigned pass = c->curr_recovery_pass; 224 229 225 - ret = bch2_run_recovery_pass(c, c->curr_recovery_pass); 230 + ret = bch2_run_recovery_pass(c, c->curr_recovery_pass) ?: 231 + bch2_journal_flush(&c->journal); 226 232 if (bch2_err_matches(ret, BCH_ERR_restart_recovery) || 227 233 (ret && c->curr_recovery_pass < pass)) 228 234 continue;
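The pass runner now chains each recovery pass with a journal flush using the GNU C binary ?: extension, used throughout bcachefs for errno-style chaining: the left operand is evaluated once and kept if nonzero, otherwise the right operand runs. A small demonstration (function names are placeholders; requires GCC or Clang):

    #include <stdio.h>

    /* Two errno-style steps: 0 on success, negative error on failure. */
    static int run_pass(void) { return 0; }
    static int flush(void)    { return -5; }

    int main(void)
    {
            /* flush() runs only after run_pass() succeeded. */
            int ret = run_pass() ?: flush();

            printf("ret = %d\n", ret);      /* -5 */
            return 0;
    }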
+39 -33
fs/bcachefs/reflink.c
··· 30 30 /* reflink pointers */ 31 31 32 32 int bch2_reflink_p_invalid(struct bch_fs *c, struct bkey_s_c k, 33 - enum bkey_invalid_flags flags, 33 + enum bch_validate_flags flags, 34 34 struct printbuf *err) 35 35 { 36 36 struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k); ··· 74 74 } 75 75 76 76 static int trans_trigger_reflink_p_segment(struct btree_trans *trans, 77 - struct bkey_s_c_reflink_p p, 78 - u64 *idx, unsigned flags) 77 + struct bkey_s_c_reflink_p p, u64 *idx, 78 + enum btree_iter_update_trigger_flags flags) 79 79 { 80 80 struct bch_fs *c = trans->c; 81 81 struct btree_iter iter; 82 82 struct bkey_i *k; 83 83 __le64 *refcount; 84 - int add = !(flags & BTREE_TRIGGER_OVERWRITE) ? 1 : -1; 84 + int add = !(flags & BTREE_TRIGGER_overwrite) ? 1 : -1; 85 85 struct printbuf buf = PRINTBUF; 86 86 int ret; 87 87 88 88 k = bch2_bkey_get_mut_noupdate(trans, &iter, 89 89 BTREE_ID_reflink, POS(0, *idx), 90 - BTREE_ITER_WITH_UPDATES); 90 + BTREE_ITER_with_updates); 91 91 ret = PTR_ERR_OR_ZERO(k); 92 92 if (ret) 93 93 goto err; ··· 102 102 goto err; 103 103 } 104 104 105 - if (!*refcount && (flags & BTREE_TRIGGER_OVERWRITE)) { 105 + if (!*refcount && (flags & BTREE_TRIGGER_overwrite)) { 106 106 bch2_bkey_val_to_text(&buf, c, p.s_c); 107 107 bch2_trans_inconsistent(trans, 108 108 "indirect extent refcount underflow at %llu while marking\n %s", ··· 111 111 goto err; 112 112 } 113 113 114 - if (flags & BTREE_TRIGGER_INSERT) { 114 + if (flags & BTREE_TRIGGER_insert) { 115 115 struct bch_reflink_p *v = (struct bch_reflink_p *) p.v; 116 116 u64 pad; 117 117 ··· 141 141 } 142 142 143 143 static s64 gc_trigger_reflink_p_segment(struct btree_trans *trans, 144 - struct bkey_s_c_reflink_p p, 145 - u64 *idx, unsigned flags, size_t r_idx) 144 + struct bkey_s_c_reflink_p p, u64 *idx, 145 + enum btree_iter_update_trigger_flags flags, 146 + size_t r_idx) 146 147 { 147 148 struct bch_fs *c = trans->c; 148 149 struct reflink_gc *r; 149 - int add = !(flags & BTREE_TRIGGER_OVERWRITE) ? 1 : -1; 150 + int add = !(flags & BTREE_TRIGGER_overwrite) ? 
1 : -1; 150 151 u64 start = le64_to_cpu(p.v->idx); 151 152 u64 end = le64_to_cpu(p.v->idx) + p.k->size; 152 153 u64 next_idx = end + le32_to_cpu(p.v->back_pad); ··· 164 163 165 164 BUG_ON((s64) r->refcount + add < 0); 166 165 167 - r->refcount += add; 166 + if (flags & BTREE_TRIGGER_gc) 167 + r->refcount += add; 168 168 *idx = r->offset; 169 169 return 0; 170 170 not_found: 171 + BUG_ON(!(flags & BTREE_TRIGGER_check_repair)); 172 + 171 173 if (fsck_err(c, reflink_p_to_missing_reflink_v, 172 174 "pointer to missing indirect extent\n" 173 175 " %s\n" ··· 193 189 set_bkey_val_u64s(&update->k, 0); 194 190 } 195 191 196 - ret = bch2_btree_insert_trans(trans, BTREE_ID_extents, update, BTREE_TRIGGER_NORUN); 192 + ret = bch2_btree_insert_trans(trans, BTREE_ID_extents, update, BTREE_TRIGGER_norun); 197 193 } 198 194 199 195 *idx = next_idx; ··· 204 200 } 205 201 206 202 static int __trigger_reflink_p(struct btree_trans *trans, 207 - enum btree_id btree_id, unsigned level, 208 - struct bkey_s_c k, unsigned flags) 203 + enum btree_id btree_id, unsigned level, struct bkey_s_c k, 204 + enum btree_iter_update_trigger_flags flags) 209 205 { 210 206 struct bch_fs *c = trans->c; 211 207 struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k); ··· 214 210 u64 idx = le64_to_cpu(p.v->idx) - le32_to_cpu(p.v->front_pad); 215 211 u64 end = le64_to_cpu(p.v->idx) + p.k->size + le32_to_cpu(p.v->back_pad); 216 212 217 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 213 + if (flags & BTREE_TRIGGER_transactional) { 218 214 while (idx < end && !ret) 219 215 ret = trans_trigger_reflink_p_segment(trans, p, &idx, flags); 220 216 } 221 217 222 - if (flags & BTREE_TRIGGER_GC) { 218 + if (flags & (BTREE_TRIGGER_check_repair|BTREE_TRIGGER_gc)) { 223 219 size_t l = 0, r = c->reflink_gc_nr; 224 220 225 221 while (l < r) { ··· 242 238 enum btree_id btree_id, unsigned level, 243 239 struct bkey_s_c old, 244 240 struct bkey_s new, 245 - unsigned flags) 241 + enum btree_iter_update_trigger_flags flags) 246 242 { 247 - if ((flags & BTREE_TRIGGER_TRANSACTIONAL) && 248 - (flags & BTREE_TRIGGER_INSERT)) { 243 + if ((flags & BTREE_TRIGGER_transactional) && 244 + (flags & BTREE_TRIGGER_insert)) { 249 245 struct bch_reflink_p *v = bkey_s_to_reflink_p(new).v; 250 246 251 247 v->front_pad = v->back_pad = 0; ··· 257 253 /* indirect extents */ 258 254 259 255 int bch2_reflink_v_invalid(struct bch_fs *c, struct bkey_s_c k, 260 - enum bkey_invalid_flags flags, 256 + enum bch_validate_flags flags, 261 257 struct printbuf *err) 262 258 { 263 259 return bch2_bkey_ptrs_invalid(c, k, flags, err); ··· 285 281 } 286 282 #endif 287 283 288 - static inline void check_indirect_extent_deleting(struct bkey_s new, unsigned *flags) 284 + static inline void 285 + check_indirect_extent_deleting(struct bkey_s new, 286 + enum btree_iter_update_trigger_flags *flags) 289 287 { 290 - if ((*flags & BTREE_TRIGGER_INSERT) && !*bkey_refcount(new)) { 288 + if ((*flags & BTREE_TRIGGER_insert) && !*bkey_refcount(new)) { 291 289 new.k->type = KEY_TYPE_deleted; 292 290 new.k->size = 0; 293 291 set_bkey_val_u64s(new.k, 0); 294 - *flags &= ~BTREE_TRIGGER_INSERT; 292 + *flags &= ~BTREE_TRIGGER_insert; 295 293 } 296 294 } 297 295 298 296 int bch2_trigger_reflink_v(struct btree_trans *trans, 299 297 enum btree_id btree_id, unsigned level, 300 298 struct bkey_s_c old, struct bkey_s new, 301 - unsigned flags) 299 + enum btree_iter_update_trigger_flags flags) 302 300 { 303 - if ((flags & BTREE_TRIGGER_TRANSACTIONAL) && 304 - (flags & BTREE_TRIGGER_INSERT)) 301 + if ((flags & 
BTREE_TRIGGER_transactional) && 302 + (flags & BTREE_TRIGGER_insert)) 305 303 check_indirect_extent_deleting(new, &flags); 306 304 307 305 return bch2_trigger_extent(trans, btree_id, level, old, new, flags); ··· 312 306 /* indirect inline data */ 313 307 314 308 int bch2_indirect_inline_data_invalid(struct bch_fs *c, struct bkey_s_c k, 315 - enum bkey_invalid_flags flags, 309 + enum bch_validate_flags flags, 316 310 struct printbuf *err) 317 311 { 318 312 return 0; ··· 332 326 int bch2_trigger_indirect_inline_data(struct btree_trans *trans, 333 327 enum btree_id btree_id, unsigned level, 334 328 struct bkey_s_c old, struct bkey_s new, 335 - unsigned flags) 329 + enum btree_iter_update_trigger_flags flags) 336 330 { 337 331 check_indirect_extent_deleting(new, &flags); 338 332 ··· 355 349 bch2_check_set_feature(c, BCH_FEATURE_reflink_inline_data); 356 350 357 351 bch2_trans_iter_init(trans, &reflink_iter, BTREE_ID_reflink, POS_MAX, 358 - BTREE_ITER_INTENT); 352 + BTREE_ITER_intent); 359 353 k = bch2_btree_iter_peek_prev(&reflink_iter); 360 354 ret = bkey_err(k); 361 355 if (ret) ··· 400 394 r_p->v.idx = cpu_to_le64(bkey_start_offset(&r_v->k)); 401 395 402 396 ret = bch2_trans_update(trans, extent_iter, &r_p->k_i, 403 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 397 + BTREE_UPDATE_internal_snapshot_node); 404 398 err: 405 399 bch2_trans_iter_exit(trans, &reflink_iter); 406 400 ··· 461 455 goto err; 462 456 463 457 bch2_trans_iter_init(trans, &src_iter, BTREE_ID_extents, src_start, 464 - BTREE_ITER_INTENT); 458 + BTREE_ITER_intent); 465 459 bch2_trans_iter_init(trans, &dst_iter, BTREE_ID_extents, dst_start, 466 - BTREE_ITER_INTENT); 460 + BTREE_ITER_intent); 467 461 468 462 while ((ret == 0 || 469 463 bch2_err_matches(ret, BCH_ERR_transaction_restart)) && ··· 573 567 bch2_trans_begin(trans); 574 568 575 569 ret2 = bch2_inode_peek(trans, &inode_iter, &inode_u, 576 - dst_inum, BTREE_ITER_INTENT); 570 + dst_inum, BTREE_ITER_intent); 577 571 578 572 if (!ret2 && 579 573 inode_u.bi_size < new_i_size) {
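Note: the reflink hunks are part of the tree-wide switch from uppercase trigger-flag macros to the typed enum btree_iter_update_trigger_flags, plus the new BTREE_TRIGGER_check_repair phase that lets fsck share the GC-side segment walk. A sketch of how a trigger dispatches on the phase bits; do_transactional() and do_gc_or_repair() are hypothetical stand-ins:

        static int trigger_sketch(struct btree_trans *trans, struct bkey_s_c k,
                                  enum btree_iter_update_trigger_flags flags)
        {
                int ret = 0;

                if (flags & BTREE_TRIGGER_transactional)
                        ret = do_transactional(trans, k);       /* commit path */

                if (!ret && (flags & (BTREE_TRIGGER_check_repair|BTREE_TRIGGER_gc)))
                        ret = do_gc_or_repair(trans, k);        /* fsck and GC share this walk */

                return ret;
        }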
+9 -7
fs/bcachefs/reflink.h
··· 2 2 #ifndef _BCACHEFS_REFLINK_H 3 3 #define _BCACHEFS_REFLINK_H 4 4 5 - enum bkey_invalid_flags; 5 + enum bch_validate_flags; 6 6 7 7 int bch2_reflink_p_invalid(struct bch_fs *, struct bkey_s_c, 8 - enum bkey_invalid_flags, struct printbuf *); 8 + enum bch_validate_flags, struct printbuf *); 9 9 void bch2_reflink_p_to_text(struct printbuf *, struct bch_fs *, 10 10 struct bkey_s_c); 11 11 bool bch2_reflink_p_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c); 12 12 int bch2_trigger_reflink_p(struct btree_trans *, enum btree_id, unsigned, 13 - struct bkey_s_c, struct bkey_s, unsigned); 13 + struct bkey_s_c, struct bkey_s, 14 + enum btree_iter_update_trigger_flags); 14 15 15 16 #define bch2_bkey_ops_reflink_p ((struct bkey_ops) { \ 16 17 .key_invalid = bch2_reflink_p_invalid, \ ··· 22 21 }) 23 22 24 23 int bch2_reflink_v_invalid(struct bch_fs *, struct bkey_s_c, 25 - enum bkey_invalid_flags, struct printbuf *); 24 + enum bch_validate_flags, struct printbuf *); 26 25 void bch2_reflink_v_to_text(struct printbuf *, struct bch_fs *, 27 26 struct bkey_s_c); 28 27 int bch2_trigger_reflink_v(struct btree_trans *, enum btree_id, unsigned, 29 - struct bkey_s_c, struct bkey_s, unsigned); 28 + struct bkey_s_c, struct bkey_s, 29 + enum btree_iter_update_trigger_flags); 30 30 31 31 #define bch2_bkey_ops_reflink_v ((struct bkey_ops) { \ 32 32 .key_invalid = bch2_reflink_v_invalid, \ ··· 38 36 }) 39 37 40 38 int bch2_indirect_inline_data_invalid(struct bch_fs *, struct bkey_s_c, 41 - enum bkey_invalid_flags, struct printbuf *); 39 + enum bch_validate_flags, struct printbuf *); 42 40 void bch2_indirect_inline_data_to_text(struct printbuf *, 43 41 struct bch_fs *, struct bkey_s_c); 44 42 int bch2_trigger_indirect_inline_data(struct btree_trans *, 45 43 enum btree_id, unsigned, 46 44 struct bkey_s_c, struct bkey_s, 47 - unsigned); 45 + enum btree_iter_update_trigger_flags); 48 46 49 47 #define bch2_bkey_ops_indirect_inline_data ((struct bkey_ops) { \ 50 48 .key_invalid = bch2_indirect_inline_data_invalid, \
+11 -9
fs/bcachefs/replicas.c
··· 84 84 } 85 85 86 86 for (unsigned i = 0; i < r->nr_devs; i++) 87 - if (!bch2_dev_exists(sb, r->devs[i])) { 87 + if (!bch2_member_exists(sb, r->devs[i])) { 88 88 prt_printf(err, "invalid device %u in entry ", r->devs[i]); 89 89 goto bad; 90 90 } ··· 200 200 }; 201 201 202 202 for (i = 0; i < new_entry->nr_devs; i++) 203 - BUG_ON(!bch2_dev_exists2(c, new_entry->devs[i])); 203 + BUG_ON(!bch2_dev_exists(c, new_entry->devs[i])); 204 204 205 205 BUG_ON(!new_entry->data_type); 206 206 verify_replicas_entry(new_entry); ··· 860 860 } 861 861 862 862 static int bch2_sb_replicas_validate(struct bch_sb *sb, struct bch_sb_field *f, 863 - struct printbuf *err) 863 + enum bch_validate_flags flags, struct printbuf *err) 864 864 { 865 865 struct bch_sb_field_replicas *sb_r = field_to_type(f, replicas); 866 866 struct bch_replicas_cpu cpu_r; ··· 899 899 }; 900 900 901 901 static int bch2_sb_replicas_v0_validate(struct bch_sb *sb, struct bch_sb_field *f, 902 - struct printbuf *err) 902 + enum bch_validate_flags flags, struct printbuf *err) 903 903 { 904 904 struct bch_sb_field_replicas_v0 *sb_r = field_to_type(f, replicas_v0); 905 905 struct bch_replicas_cpu cpu_r; ··· 947 947 948 948 percpu_down_read(&c->mark_lock); 949 949 for_each_cpu_replicas_entry(&c->replicas, e) { 950 - unsigned i, nr_online = 0, nr_failed = 0, dflags = 0; 950 + unsigned nr_online = 0, nr_failed = 0, dflags = 0; 951 951 bool metadata = e->data_type < BCH_DATA_user; 952 952 953 953 if (e->data_type == BCH_DATA_cached) 954 954 continue; 955 955 956 - for (i = 0; i < e->nr_devs; i++) { 957 - struct bch_dev *ca = bch_dev_bkey_exists(c, e->devs[i]); 958 - 956 + rcu_read_lock(); 957 + for (unsigned i = 0; i < e->nr_devs; i++) { 959 958 nr_online += test_bit(e->devs[i], devs.d); 960 - nr_failed += ca->mi.state == BCH_MEMBER_STATE_failed; 959 + 960 + struct bch_dev *ca = bch2_dev_rcu(c, e->devs[i]); 961 + nr_failed += ca && ca->mi.state == BCH_MEMBER_STATE_failed; 961 962 } 963 + rcu_read_unlock(); 962 964 963 965 if (nr_failed == e->nr_devs) 964 966 continue;
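Note: this is one instance of the device-reference hardening: bch_dev_bkey_exists(), which assumed a referenced device must exist, becomes bch2_dev_rcu(), which can return NULL and is valid only under rcu_read_lock(). The caller shape, sketched:

        rcu_read_lock();
        struct bch_dev *ca = bch2_dev_rcu(c, dev_idx);
        if (ca) {
                /* use ca: the pointer is valid only inside this RCU section */
        }
        rcu_read_unlock();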
+7 -8
fs/bcachefs/sb-clean.c
··· 266 266 } 267 267 } 268 268 269 - static int bch2_sb_clean_validate(struct bch_sb *sb, 270 - struct bch_sb_field *f, 271 - struct printbuf *err) 269 + static int bch2_sb_clean_validate(struct bch_sb *sb, struct bch_sb_field *f, 270 + enum bch_validate_flags flags, struct printbuf *err) 272 271 { 273 272 struct bch_sb_field_clean *clean = field_to_type(f, clean); 274 273 ··· 282 283 entry = vstruct_next(entry)) { 283 284 if ((void *) vstruct_next(entry) > vstruct_end(&clean->field)) { 284 285 prt_str(err, "entry type "); 285 - bch2_prt_jset_entry_type(err, le16_to_cpu(entry->type)); 286 + bch2_prt_jset_entry_type(err, entry->type); 286 287 prt_str(err, " overruns end of section"); 287 288 return -BCH_ERR_invalid_sb_clean; 288 289 } ··· 297 298 struct bch_sb_field_clean *clean = field_to_type(f, clean); 298 299 struct jset_entry *entry; 299 300 300 - prt_printf(out, "flags: %x", le32_to_cpu(clean->flags)); 301 - prt_newline(out); 302 - prt_printf(out, "journal_seq: %llu", le64_to_cpu(clean->journal_seq)); 303 - prt_newline(out); 301 + prt_printf(out, "flags: %x\n", le32_to_cpu(clean->flags)); 302 + prt_printf(out, "journal_seq: %llu\n", le64_to_cpu(clean->journal_seq)); 304 303 305 304 for (entry = clean->start; 306 305 entry != vstruct_end(&clean->field); ··· 388 391 bch_err(c, "error writing marking filesystem clean: validate error"); 389 392 goto out; 390 393 } 394 + 395 + bch2_journal_pos_from_member_info_set(c); 391 396 392 397 bch2_write_super(c); 393 398 out:
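Note: many to_text hunks in this series, like the one above, collapse prt_printf()/prt_tab()/prt_newline() triples into single calls: printbuf format strings interpret \t as advance-to-next-tabstop and \n as newline. Before and after, sketched (field name and value illustrative):

        /* before: separate calls for label, tab, value, newline */
        prt_printf(out, "Clean:");
        prt_tab(out);
        prt_printf(out, "%llu", v);
        prt_newline(out);

        /* after: one call; \t aligns to the tabstops pushed earlier */
        prt_printf(out, "Clean:\t%llu\n", v);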
+6 -14
fs/bcachefs/sb-counters.c
··· 20 20 return (__le64 *) vstruct_end(&ctrs->field) - &ctrs->d[0]; 21 21 }; 22 22 23 - static int bch2_sb_counters_validate(struct bch_sb *sb, 24 - struct bch_sb_field *f, 25 - struct printbuf *err) 23 + static int bch2_sb_counters_validate(struct bch_sb *sb, struct bch_sb_field *f, 24 + enum bch_validate_flags flags, struct printbuf *err) 26 25 { 27 26 return 0; 28 27 }; ··· 30 31 struct bch_sb_field *f) 31 32 { 32 33 struct bch_sb_field_counters *ctrs = field_to_type(f, counters); 33 - unsigned int i; 34 34 unsigned int nr = bch2_sb_counter_nr_entries(ctrs); 35 35 36 - for (i = 0; i < nr; i++) { 37 - if (i < BCH_COUNTER_NR) 38 - prt_printf(out, "%s ", bch2_counter_names[i]); 39 - else 40 - prt_printf(out, "(unknown)"); 41 - 42 - prt_tab(out); 43 - prt_printf(out, "%llu", le64_to_cpu(ctrs->d[i])); 44 - prt_newline(out); 45 - } 36 + for (unsigned i = 0; i < nr; i++) 37 + prt_printf(out, "%s \t%llu\n", 38 + i < BCH_COUNTER_NR ? bch2_counter_names[i] : "(unknown)", 39 + le64_to_cpu(ctrs->d[i])); 46 40 }; 47 41 48 42 int bch2_sb_counters_to_cpu(struct bch_fs *c)
+16 -9
fs/bcachefs/sb-downgrade.c
··· 134 134 #define for_each_downgrade_entry(_d, _i) \ 135 135 for (const struct bch_sb_field_downgrade_entry *_i = (_d)->entries; \ 136 136 (void *) _i < vstruct_end(&(_d)->field) && \ 137 - (void *) &_i->errors[0] < vstruct_end(&(_d)->field); \ 137 + (void *) &_i->errors[0] <= vstruct_end(&(_d)->field) && \ 138 + (void *) downgrade_entry_next_c(_i) <= vstruct_end(&(_d)->field); \ 138 139 _i = downgrade_entry_next_c(_i)) 139 140 140 141 static int bch2_sb_downgrade_validate(struct bch_sb *sb, struct bch_sb_field *f, 141 - struct printbuf *err) 142 + enum bch_validate_flags flags, struct printbuf *err) 142 143 { 143 144 struct bch_sb_field_downgrade *e = field_to_type(f, downgrade); 144 145 145 - for_each_downgrade_entry(e, i) { 146 + for (const struct bch_sb_field_downgrade_entry *i = e->entries; 147 + (void *) i < vstruct_end(&e->field); 148 + i = downgrade_entry_next_c(i)) { 149 + if (flags & BCH_VALIDATE_write && 150 + ((void *) &i->errors[0] > vstruct_end(&e->field) || 151 + (void *) downgrade_entry_next_c(i) > vstruct_end(&e->field))) { 152 + prt_printf(err, "downgrade entry overruns end of superblock section)"); 153 + return -BCH_ERR_invalid_sb_downgrade; 154 + } 155 + 146 156 if (BCH_VERSION_MAJOR(le16_to_cpu(i->version)) != 147 157 BCH_VERSION_MAJOR(le16_to_cpu(sb->version))) { 148 158 prt_printf(err, "downgrade entry with mismatched major version (%u != %u)", ··· 174 164 printbuf_tabstop_push(out, 16); 175 165 176 166 for_each_downgrade_entry(e, i) { 177 - prt_str(out, "version:"); 178 - prt_tab(out); 167 + prt_str(out, "version:\t"); 179 168 bch2_version_to_text(out, le16_to_cpu(i->version)); 180 169 prt_newline(out); 181 170 182 - prt_str(out, "recovery passes:"); 183 - prt_tab(out); 171 + prt_str(out, "recovery passes:\t"); 184 172 prt_bitflags(out, bch2_recovery_passes, 185 173 bch2_recovery_passes_from_stable(le64_to_cpu(i->recovery_passes[0]))); 186 174 prt_newline(out); 187 175 188 - prt_str(out, "errors:"); 189 - prt_tab(out); 176 + prt_str(out, "errors:\t"); 190 177 bool first = true; 191 178 for (unsigned j = 0; j < le16_to_cpu(i->nr_errors); j++) { 192 179 if (!first)
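Note: the downgrade hunks harden iteration over a variable-length on-disk list: both the entry's fixed header and the computed next-entry pointer are bounds-checked before dereferencing, and BCH_VALIDATE_write turns what the read path silently truncates into a hard error. The defensive loop shape, with a hypothetical entry type and entry_next() helper:

        for (const struct entry *i = first;
             (void *) i < end &&                    /* entry start in bounds */
             (void *) &i->payload[0] <= end &&      /* fixed header in bounds */
             (void *) entry_next(i) <= end;         /* variable tail in bounds */
             i = entry_next(i)) {
                /* i is fully contained in the section here */
        }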
+1 -1
fs/bcachefs/sb-errors.c
··· 30 30 } 31 31 32 32 static int bch2_sb_errors_validate(struct bch_sb *sb, struct bch_sb_field *f, 33 - struct printbuf *err) 33 + enum bch_validate_flags flags, struct printbuf *err) 34 34 { 35 35 struct bch_sb_field_errors *e = field_to_type(f, errors); 36 36 unsigned i, nr = bch2_sb_field_errors_nr_entries(e);
+2 -1
fs/bcachefs/sb-errors_types.h
··· 272 272 x(snapshot_node_missing, 264) \ 273 273 x(dup_backpointer_to_bad_csum_extent, 265) \ 274 274 x(btree_bitmap_not_marked, 266) \ 275 - x(sb_clean_entry_overrun, 267) 275 + x(sb_clean_entry_overrun, 267) \ 276 + x(btree_ptr_v2_written_0, 268) 276 277 277 278 enum bch_sb_error_id { 278 279 #define x(t, n) BCH_FSCK_ERR_##t = n,
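Note: new fsck error codes are appended to the x() list with the next free ID, and the enum beneath is generated by expanding that same list. A sketch of the x-macro pattern; the list-macro name here is abbreviated, not the real one:

        #define SB_ERRS()                               \
                x(sb_clean_entry_overrun,       267)    \
                x(btree_ptr_v2_written_0,       268)

        #define x(t, n) BCH_FSCK_ERR_##t = n,
        enum bch_sb_error_id {
                SB_ERRS()
        };
        #undef x
        /* expands to: ..., BCH_FSCK_ERR_btree_ptr_v2_written_0 = 268, */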
+67 -82
fs/bcachefs/sb-members.c
··· 3 3 #include "bcachefs.h" 4 4 #include "btree_cache.h" 5 5 #include "disk_groups.h" 6 + #include "error.h" 6 7 #include "opts.h" 7 8 #include "replicas.h" 8 9 #include "sb-members.h" 9 10 #include "super-io.h" 11 + 12 + void bch2_dev_missing(struct bch_fs *c, unsigned dev) 13 + { 14 + bch2_fs_inconsistent(c, "pointer to nonexistent device %u", dev); 15 + } 16 + 17 + void bch2_dev_bucket_missing(struct bch_fs *c, struct bpos bucket) 18 + { 19 + bch2_fs_inconsistent(c, "pointer to nonexistent bucket %llu:%llu", bucket.inode, bucket.offset); 20 + } 10 21 11 22 #define x(t, n, ...) [n] = #t, 12 23 static const char * const bch2_iops_measurements[] = { ··· 175 164 u64 bucket_size = le16_to_cpu(m.bucket_size); 176 165 u64 device_size = le64_to_cpu(m.nbuckets) * bucket_size; 177 166 178 - if (!bch2_member_exists(&m)) 167 + if (!bch2_member_alive(&m)) 179 168 return; 180 169 181 - prt_printf(out, "Device:"); 182 - prt_tab(out); 183 - prt_printf(out, "%u", i); 184 - prt_newline(out); 170 + prt_printf(out, "Device:\t%u\n", i); 185 171 186 172 printbuf_indent_add(out, 2); 187 173 188 - prt_printf(out, "Label:"); 189 - prt_tab(out); 174 + prt_printf(out, "Label:\t"); 190 175 if (BCH_MEMBER_GROUP(&m)) { 191 176 unsigned idx = BCH_MEMBER_GROUP(&m) - 1; 192 177 ··· 196 189 } 197 190 prt_newline(out); 198 191 199 - prt_printf(out, "UUID:"); 200 - prt_tab(out); 192 + prt_printf(out, "UUID:\t"); 201 193 pr_uuid(out, m.uuid.b); 202 194 prt_newline(out); 203 195 204 - prt_printf(out, "Size:"); 205 - prt_tab(out); 196 + prt_printf(out, "Size:\t"); 206 197 prt_units_u64(out, device_size << 9); 207 198 prt_newline(out); 208 199 209 - for (unsigned i = 0; i < BCH_MEMBER_ERROR_NR; i++) { 210 - prt_printf(out, "%s errors:", bch2_member_error_strs[i]); 211 - prt_tab(out); 212 - prt_u64(out, le64_to_cpu(m.errors[i])); 213 - prt_newline(out); 214 - } 200 + for (unsigned i = 0; i < BCH_MEMBER_ERROR_NR; i++) 201 + prt_printf(out, "%s errors:\t%llu\n", bch2_member_error_strs[i], le64_to_cpu(m.errors[i])); 215 202 216 - for (unsigned i = 0; i < BCH_IOPS_NR; i++) { 217 - prt_printf(out, "%s iops:", bch2_iops_measurements[i]); 218 - prt_tab(out); 219 - prt_printf(out, "%u", le32_to_cpu(m.iops[i])); 220 - prt_newline(out); 221 - } 203 + for (unsigned i = 0; i < BCH_IOPS_NR; i++) 204 + prt_printf(out, "%s iops:\t%u\n", bch2_iops_measurements[i], le32_to_cpu(m.iops[i])); 222 205 223 - prt_printf(out, "Bucket size:"); 224 - prt_tab(out); 206 + prt_printf(out, "Bucket size:\t"); 225 207 prt_units_u64(out, bucket_size << 9); 226 208 prt_newline(out); 227 209 228 - prt_printf(out, "First bucket:"); 229 - prt_tab(out); 230 - prt_printf(out, "%u", le16_to_cpu(m.first_bucket)); 231 - prt_newline(out); 210 + prt_printf(out, "First bucket:\t%u\n", le16_to_cpu(m.first_bucket)); 211 + prt_printf(out, "Buckets:\t%llu\n", le64_to_cpu(m.nbuckets)); 232 212 233 - prt_printf(out, "Buckets:"); 234 - prt_tab(out); 235 - prt_printf(out, "%llu", le64_to_cpu(m.nbuckets)); 236 - prt_newline(out); 237 - 238 - prt_printf(out, "Last mount:"); 239 - prt_tab(out); 213 + prt_printf(out, "Last mount:\t"); 240 214 if (m.last_mount) 241 215 bch2_prt_datetime(out, le64_to_cpu(m.last_mount)); 242 216 else 243 217 prt_printf(out, "(never)"); 244 218 prt_newline(out); 245 219 246 - prt_printf(out, "Last superblock write:"); 247 - prt_tab(out); 248 - prt_u64(out, le64_to_cpu(m.seq)); 249 - prt_newline(out); 220 + prt_printf(out, "Last superblock write:\t%llu\n", le64_to_cpu(m.seq)); 250 221 251 - prt_printf(out, "State:"); 252 - prt_tab(out); 253 - 
prt_printf(out, "%s", 222 + prt_printf(out, "State:\t%s\n", 254 223 BCH_MEMBER_STATE(&m) < BCH_MEMBER_STATE_NR 255 224 ? bch2_member_states[BCH_MEMBER_STATE(&m)] 256 225 : "unknown"); 257 - prt_newline(out); 258 226 259 - prt_printf(out, "Data allowed:"); 260 - prt_tab(out); 227 + prt_printf(out, "Data allowed:\t"); 261 228 if (BCH_MEMBER_DATA_ALLOWED(&m)) 262 229 prt_bitflags(out, __bch2_data_types, BCH_MEMBER_DATA_ALLOWED(&m)); 263 230 else 264 231 prt_printf(out, "(none)"); 265 232 prt_newline(out); 266 233 267 - prt_printf(out, "Has data:"); 268 - prt_tab(out); 234 + prt_printf(out, "Has data:\t"); 269 235 if (data_have) 270 236 prt_bitflags(out, __bch2_data_types, data_have); 271 237 else 272 238 prt_printf(out, "(none)"); 273 239 prt_newline(out); 274 240 275 - prt_str(out, "Durability:"); 276 - prt_tab(out); 277 - prt_printf(out, "%llu", BCH_MEMBER_DURABILITY(&m) ? BCH_MEMBER_DURABILITY(&m) - 1 : 1); 241 + prt_printf(out, "Btree allocated bitmap blocksize:\t"); 242 + prt_units_u64(out, 1ULL << m.btree_bitmap_shift); 278 243 prt_newline(out); 279 244 280 - prt_printf(out, "Discard:"); 281 - prt_tab(out); 282 - prt_printf(out, "%llu", BCH_MEMBER_DISCARD(&m)); 245 + prt_printf(out, "Btree allocated bitmap:\t"); 246 + bch2_prt_u64_base2_nbits(out, le64_to_cpu(m.btree_allocated_bitmap), 64); 283 247 prt_newline(out); 284 248 285 - prt_printf(out, "Freespace initialized:"); 286 - prt_tab(out); 287 - prt_printf(out, "%llu", BCH_MEMBER_FREESPACE_INITIALIZED(&m)); 288 - prt_newline(out); 249 + prt_printf(out, "Durability:\t%llu\n", BCH_MEMBER_DURABILITY(&m) ? BCH_MEMBER_DURABILITY(&m) - 1 : 1); 250 + 251 + prt_printf(out, "Discard:\t%llu\n", BCH_MEMBER_DISCARD(&m)); 252 + prt_printf(out, "Freespace initialized:\t%llu\n", BCH_MEMBER_FREESPACE_INITIALIZED(&m)); 289 253 290 254 printbuf_indent_sub(out, 2); 291 255 } 292 256 293 - static int bch2_sb_members_v1_validate(struct bch_sb *sb, 294 - struct bch_sb_field *f, 295 - struct printbuf *err) 257 + static int bch2_sb_members_v1_validate(struct bch_sb *sb, struct bch_sb_field *f, 258 + enum bch_validate_flags flags, struct printbuf *err) 296 259 { 297 260 struct bch_sb_field_members_v1 *mi = field_to_type(f, members_v1); 298 261 unsigned i; ··· 310 333 member_to_text(out, members_v2_get(mi, i), gi, sb, i); 311 334 } 312 335 313 - static int bch2_sb_members_v2_validate(struct bch_sb *sb, 314 - struct bch_sb_field *f, 315 - struct printbuf *err) 336 + static int bch2_sb_members_v2_validate(struct bch_sb *sb, struct bch_sb_field *f, 337 + enum bch_validate_flags flags, struct printbuf *err) 316 338 { 317 339 struct bch_sb_field_members_v2 *mi = field_to_type(f, members_v2); 318 340 size_t mi_bytes = (void *) __bch2_members_v2_get_mut(mi, sb->nr_devices) - ··· 366 390 prt_newline(out); 367 391 368 392 printbuf_indent_add(out, 2); 369 - for (unsigned i = 0; i < BCH_MEMBER_ERROR_NR; i++) { 370 - prt_printf(out, "%s:", bch2_member_error_strs[i]); 371 - prt_tab(out); 372 - prt_u64(out, atomic64_read(&ca->errors[i])); 373 - prt_newline(out); 374 - } 393 + for (unsigned i = 0; i < BCH_MEMBER_ERROR_NR; i++) 394 + prt_printf(out, "%s:\t%llu\n", bch2_member_error_strs[i], atomic64_read(&ca->errors[i])); 375 395 printbuf_indent_sub(out, 2); 376 396 377 397 prt_str(out, "IO errors since "); ··· 376 404 prt_newline(out); 377 405 378 406 printbuf_indent_add(out, 2); 379 - for (unsigned i = 0; i < BCH_MEMBER_ERROR_NR; i++) { 380 - prt_printf(out, "%s:", bch2_member_error_strs[i]); 381 - prt_tab(out); 382 - prt_u64(out, atomic64_read(&ca->errors[i]) - 
le64_to_cpu(m.errors_at_reset[i])); 383 - prt_newline(out); 384 - } 407 + for (unsigned i = 0; i < BCH_MEMBER_ERROR_NR; i++) 408 + prt_printf(out, "%s:\t%llu\n", bch2_member_error_strs[i], 409 + atomic64_read(&ca->errors[i]) - le64_to_cpu(m.errors_at_reset[i])); 385 410 printbuf_indent_sub(out, 2); 386 411 } 387 412 ··· 406 437 407 438 bool bch2_dev_btree_bitmap_marked(struct bch_fs *c, struct bkey_s_c k) 408 439 { 409 - bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) 410 - if (!bch2_dev_btree_bitmap_marked_sectors(bch_dev_bkey_exists(c, ptr->dev), 411 - ptr->offset, btree_sectors(c))) 412 - return false; 413 - return true; 440 + bool ret = true; 441 + rcu_read_lock(); 442 + bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) { 443 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 444 + if (!ca) 445 + continue; 446 + 447 + if (!bch2_dev_btree_bitmap_marked_sectors(ca, ptr->offset, btree_sectors(c))) { 448 + ret = false; 449 + break; 450 + } 451 + } 452 + rcu_read_unlock(); 453 + return ret; 414 454 } 415 455 416 456 static void __bch2_dev_btree_bitmap_mark(struct bch_sb_field_members_v2 *mi, unsigned dev, ··· 441 463 m->btree_bitmap_shift += resize; 442 464 } 443 465 466 + BUG_ON(m->btree_bitmap_shift > 57); 467 + BUG_ON(end > 64ULL << m->btree_bitmap_shift); 468 + 444 469 for (unsigned bit = start >> m->btree_bitmap_shift; 445 470 (u64) bit << m->btree_bitmap_shift < end; 446 471 bit++) ··· 457 476 lockdep_assert_held(&c->sb_lock); 458 477 459 478 struct bch_sb_field_members_v2 *mi = bch2_sb_field_get(c->disk_sb.sb, members_v2); 460 - bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) 479 + bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) { 480 + if (!bch2_member_exists(c->disk_sb.sb, ptr->dev)) 481 + continue; 482 + 461 483 __bch2_dev_btree_bitmap_mark(mi, ptr->dev, ptr->offset, btree_sectors(c)); 484 + } 462 485 }
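Note: the new assertions in the bitmap-mark path bound the bitmap's reach: 64 bits at 2^btree_bitmap_shift sectors per bit cover 64ULL << shift sectors, and capping the shift at 57 keeps that product at 2^63, safely inside a u64. As arithmetic:

        u64 covered = 64ULL << m->btree_bitmap_shift;   /* shift <= 57, so covered <= 1ULL << 63 */
        BUG_ON(end > covered);                          /* every marked range must fit */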
+135 -32
fs/bcachefs/sb-members.h
··· 29 29 ca->mi.state != BCH_MEMBER_STATE_failed; 30 30 } 31 31 32 - static inline bool bch2_dev_get_ioref(struct bch_dev *ca, int rw) 33 - { 34 - if (!percpu_ref_tryget(&ca->io_ref)) 35 - return false; 36 - 37 - if (ca->mi.state == BCH_MEMBER_STATE_rw || 38 - (ca->mi.state == BCH_MEMBER_STATE_ro && rw == READ)) 39 - return true; 40 - 41 - percpu_ref_put(&ca->io_ref); 42 - return false; 43 - } 44 - 45 32 static inline unsigned dev_mask_nr(const struct bch_devs_mask *devs) 46 33 { 47 34 return bitmap_weight(devs->d, BCH_SB_MEMBERS_MAX); ··· 92 105 for (struct bch_dev *_ca = NULL; \ 93 106 (_ca = __bch2_next_dev((_c), _ca, (_mask)));) 94 107 108 + static inline void bch2_dev_get(struct bch_dev *ca) 109 + { 110 + #ifdef CONFIG_BCACHEFS_DEBUG 111 + BUG_ON(atomic_long_inc_return(&ca->ref) <= 1L); 112 + #else 113 + percpu_ref_get(&ca->ref); 114 + #endif 115 + } 116 + 117 + static inline void __bch2_dev_put(struct bch_dev *ca) 118 + { 119 + #ifdef CONFIG_BCACHEFS_DEBUG 120 + long r = atomic_long_dec_return(&ca->ref); 121 + if (r < (long) !ca->dying) 122 + panic("bch_dev->ref underflow, last put: %pS\n", (void *) ca->last_put); 123 + ca->last_put = _THIS_IP_; 124 + if (!r) 125 + complete(&ca->ref_completion); 126 + #else 127 + percpu_ref_put(&ca->ref); 128 + #endif 129 + } 130 + 131 + static inline void bch2_dev_put(struct bch_dev *ca) 132 + { 133 + if (ca) 134 + __bch2_dev_put(ca); 135 + } 136 + 95 137 static inline struct bch_dev *bch2_get_next_dev(struct bch_fs *c, struct bch_dev *ca) 96 138 { 97 139 rcu_read_lock(); 98 - if (ca) 99 - percpu_ref_put(&ca->ref); 100 - 140 + bch2_dev_put(ca); 101 141 if ((ca = __bch2_next_dev(c, ca, NULL))) 102 - percpu_ref_get(&ca->ref); 142 + bch2_dev_get(ca); 103 143 rcu_read_unlock(); 104 144 105 145 return ca; ··· 172 158 #define for_each_readable_member(c, ca) \ 173 159 __for_each_online_member(c, ca, BIT( BCH_MEMBER_STATE_rw)|BIT(BCH_MEMBER_STATE_ro)) 174 160 175 - /* 176 - * If a key exists that references a device, the device won't be going away and 177 - * we can omit rcu_read_lock(): 178 - */ 179 - static inline struct bch_dev *bch_dev_bkey_exists(const struct bch_fs *c, unsigned idx) 161 + static inline bool bch2_dev_exists(const struct bch_fs *c, unsigned dev) 180 162 { 181 - EBUG_ON(idx >= c->sb.nr_devices || !c->devs[idx]); 182 - 183 - return rcu_dereference_check(c->devs[idx], 1); 163 + return dev < c->sb.nr_devices && c->devs[dev]; 184 164 } 185 165 186 - static inline struct bch_dev *bch_dev_locked(struct bch_fs *c, unsigned idx) 166 + static inline bool bucket_valid(const struct bch_dev *ca, u64 b) 187 167 { 188 - EBUG_ON(idx >= c->sb.nr_devices || !c->devs[idx]); 168 + return b - ca->mi.first_bucket < ca->mi.nbuckets_minus_first; 169 + } 189 170 190 - return rcu_dereference_protected(c->devs[idx], 171 + static inline struct bch_dev *bch2_dev_have_ref(const struct bch_fs *c, unsigned dev) 172 + { 173 + EBUG_ON(!bch2_dev_exists(c, dev)); 174 + 175 + return rcu_dereference_check(c->devs[dev], 1); 176 + } 177 + 178 + static inline struct bch_dev *bch2_dev_locked(struct bch_fs *c, unsigned dev) 179 + { 180 + EBUG_ON(!bch2_dev_exists(c, dev)); 181 + 182 + return rcu_dereference_protected(c->devs[dev], 191 183 lockdep_is_held(&c->sb_lock) || 192 184 lockdep_is_held(&c->state_lock)); 185 + } 186 + 187 + static inline struct bch_dev *bch2_dev_rcu(struct bch_fs *c, unsigned dev) 188 + { 189 + return c && dev < c->sb.nr_devices 190 + ? 
rcu_dereference(c->devs[dev]) 191 + : NULL; 192 + } 193 + 194 + static inline struct bch_dev *bch2_dev_tryget_noerror(struct bch_fs *c, unsigned dev) 195 + { 196 + rcu_read_lock(); 197 + struct bch_dev *ca = bch2_dev_rcu(c, dev); 198 + if (ca) 199 + bch2_dev_get(ca); 200 + rcu_read_unlock(); 201 + return ca; 202 + } 203 + 204 + void bch2_dev_missing(struct bch_fs *, unsigned); 205 + 206 + static inline struct bch_dev *bch2_dev_tryget(struct bch_fs *c, unsigned dev) 207 + { 208 + struct bch_dev *ca = bch2_dev_tryget_noerror(c, dev); 209 + if (!ca) 210 + bch2_dev_missing(c, dev); 211 + return ca; 212 + } 213 + 214 + static inline struct bch_dev *bch2_dev_bucket_tryget_noerror(struct bch_fs *c, struct bpos bucket) 215 + { 216 + struct bch_dev *ca = bch2_dev_tryget_noerror(c, bucket.inode); 217 + if (ca && !bucket_valid(ca, bucket.offset)) { 218 + bch2_dev_put(ca); 219 + ca = NULL; 220 + } 221 + return ca; 222 + } 223 + 224 + void bch2_dev_bucket_missing(struct bch_fs *, struct bpos); 225 + 226 + static inline struct bch_dev *bch2_dev_bucket_tryget(struct bch_fs *c, struct bpos bucket) 227 + { 228 + struct bch_dev *ca = bch2_dev_bucket_tryget_noerror(c, bucket); 229 + if (!ca) 230 + bch2_dev_bucket_missing(c, bucket); 231 + return ca; 232 + } 233 + 234 + static inline struct bch_dev *bch2_dev_iterate_noerror(struct bch_fs *c, struct bch_dev *ca, unsigned dev_idx) 235 + { 236 + if (ca && ca->dev_idx == dev_idx) 237 + return ca; 238 + bch2_dev_put(ca); 239 + return bch2_dev_tryget_noerror(c, dev_idx); 240 + } 241 + 242 + static inline struct bch_dev *bch2_dev_iterate(struct bch_fs *c, struct bch_dev *ca, unsigned dev_idx) 243 + { 244 + if (ca && ca->dev_idx == dev_idx) 245 + return ca; 246 + bch2_dev_put(ca); 247 + return bch2_dev_tryget(c, dev_idx); 248 + } 249 + 250 + static inline struct bch_dev *bch2_dev_get_ioref(struct bch_fs *c, unsigned dev, int rw) 251 + { 252 + rcu_read_lock(); 253 + struct bch_dev *ca = bch2_dev_rcu(c, dev); 254 + if (ca && !percpu_ref_tryget(&ca->io_ref)) 255 + ca = NULL; 256 + rcu_read_unlock(); 257 + 258 + if (ca && 259 + (ca->mi.state == BCH_MEMBER_STATE_rw || 260 + (ca->mi.state == BCH_MEMBER_STATE_ro && rw == READ))) 261 + return ca; 262 + 263 + if (ca) 264 + percpu_ref_put(&ca->io_ref); 265 + return NULL; 193 266 } 194 267 195 268 /* XXX kill, move to struct bch_fs */ ··· 293 192 extern const struct bch_sb_field_ops bch_sb_field_ops_members_v1; 294 193 extern const struct bch_sb_field_ops bch_sb_field_ops_members_v2; 295 194 296 - static inline bool bch2_member_exists(struct bch_member *m) 195 + static inline bool bch2_member_alive(struct bch_member *m) 297 196 { 298 197 return !bch2_is_zero(&m->uuid, sizeof(m->uuid)); 299 198 } 300 199 301 - static inline bool bch2_dev_exists(struct bch_sb *sb, unsigned dev) 200 + static inline bool bch2_member_exists(struct bch_sb *sb, unsigned dev) 302 201 { 303 202 if (dev < sb->nr_devices) { 304 203 struct bch_member m = bch2_sb_member_get(sb, dev); 305 - return bch2_member_exists(&m); 204 + return bch2_member_alive(&m); 306 205 } 307 206 return false; 308 207 } ··· 311 210 { 312 211 return (struct bch_member_cpu) { 313 212 .nbuckets = le64_to_cpu(mi->nbuckets), 213 + .nbuckets_minus_first = le64_to_cpu(mi->nbuckets) - 214 + le16_to_cpu(mi->first_bucket), 314 215 .first_bucket = le16_to_cpu(mi->first_bucket), 315 216 .bucket_size = le16_to_cpu(mi->bucket_size), 316 217 .group = BCH_MEMBER_GROUP(mi), ··· 323 220 ? 
BCH_MEMBER_DURABILITY(mi) - 1 324 221 : 1, 325 222 .freespace_initialized = BCH_MEMBER_FREESPACE_INITIALIZED(mi), 326 - .valid = bch2_member_exists(mi), 223 + .valid = bch2_member_alive(mi), 327 224 .btree_bitmap_shift = mi->btree_bitmap_shift, 328 225 .btree_allocated_bitmap = le64_to_cpu(mi->btree_allocated_bitmap), 329 226 };
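Note: sb-members.h now carries the device reference-counting API: lookups return NULL for nonexistent devices instead of asserting, and the _tryget variants take a reference the caller must drop. A typical caller, sketched (the error code is illustrative):

        struct bch_dev *ca = bch2_dev_tryget(c, dev_idx);
        if (!ca)                /* bch2_dev_missing() already logged the inconsistency */
                return -ENODEV;

        /* ... use ca; the reference keeps it alive across sleeps ... */

        bch2_dev_put(ca);
        return 0;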
+21
fs/bcachefs/sb-members_types.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _BCACHEFS_SB_MEMBERS_TYPES_H 3 + #define _BCACHEFS_SB_MEMBERS_TYPES_H 4 + 5 + struct bch_member_cpu { 6 + u64 nbuckets; /* device size */ 7 + u64 nbuckets_minus_first; 8 + u16 first_bucket; /* index of first bucket used */ 9 + u16 bucket_size; /* sectors */ 10 + u16 group; 11 + u8 state; 12 + u8 discard; 13 + u8 data_allowed; 14 + u8 durability; 15 + u8 freespace_initialized; 16 + u8 valid; 17 + u8 btree_bitmap_shift; 18 + u64 btree_allocated_bitmap; 19 + }; 20 + 21 + #endif /* _BCACHEFS_SB_MEMBERS_H */
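Note: nbuckets_minus_first is precomputed so that bucket_valid() in sb-members.h above is a single unsigned comparison: when b < first_bucket the u64 subtraction wraps to a huge value, so one compare rejects both ends of the range. Spelled out as a sketch:

        /* first_bucket <= b && b < nbuckets, in one unsigned compare: */
        static inline bool bucket_valid_sketch(u64 b, u64 first, u64 nbuckets)
        {
                return b - first < nbuckets - first;    /* b < first wraps past nbuckets */
        }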
+27 -26
fs/bcachefs/snapshot.c
··· 32 32 } 33 33 34 34 int bch2_snapshot_tree_invalid(struct bch_fs *c, struct bkey_s_c k, 35 - enum bkey_invalid_flags flags, 35 + enum bch_validate_flags flags, 36 36 struct printbuf *err) 37 37 { 38 38 int ret = 0; ··· 49 49 struct bch_snapshot_tree *s) 50 50 { 51 51 int ret = bch2_bkey_get_val_typed(trans, BTREE_ID_snapshot_trees, POS(0, id), 52 - BTREE_ITER_WITH_UPDATES, snapshot_tree, s); 52 + BTREE_ITER_with_updates, snapshot_tree, s); 53 53 54 54 if (bch2_err_matches(ret, ENOENT)) 55 55 ret = -BCH_ERR_ENOENT_snapshot_tree; ··· 223 223 } 224 224 225 225 int bch2_snapshot_invalid(struct bch_fs *c, struct bkey_s_c k, 226 - enum bkey_invalid_flags flags, 226 + enum bch_validate_flags flags, 227 227 struct printbuf *err) 228 228 { 229 229 struct bkey_s_c_snapshot s; ··· 298 298 static int __bch2_mark_snapshot(struct btree_trans *trans, 299 299 enum btree_id btree, unsigned level, 300 300 struct bkey_s_c old, struct bkey_s_c new, 301 - unsigned flags) 301 + enum btree_iter_update_trigger_flags flags) 302 302 { 303 303 struct bch_fs *c = trans->c; 304 304 struct snapshot_t *t; ··· 352 352 int bch2_mark_snapshot(struct btree_trans *trans, 353 353 enum btree_id btree, unsigned level, 354 354 struct bkey_s_c old, struct bkey_s new, 355 - unsigned flags) 355 + enum btree_iter_update_trigger_flags flags) 356 356 { 357 357 return __bch2_mark_snapshot(trans, btree, level, old, new.s_c, flags); 358 358 } ··· 361 361 struct bch_snapshot *s) 362 362 { 363 363 return bch2_bkey_get_val_typed(trans, BTREE_ID_snapshots, POS(0, id), 364 - BTREE_ITER_WITH_UPDATES, snapshot, s); 364 + BTREE_ITER_with_updates, snapshot, s); 365 365 } 366 366 367 367 static int bch2_snapshot_live(struct btree_trans *trans, u32 id) ··· 618 618 int ret = bch2_trans_run(c, 619 619 for_each_btree_key_commit(trans, iter, 620 620 BTREE_ID_snapshot_trees, POS_MIN, 621 - BTREE_ITER_PREFETCH, k, 621 + BTREE_ITER_prefetch, k, 622 622 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 623 623 check_snapshot_tree(trans, &iter, k))); 624 624 bch_err_fn(c, ret); ··· 695 695 696 696 root = bch2_bkey_get_iter_typed(trans, &root_iter, 697 697 BTREE_ID_snapshots, POS(0, root_id), 698 - BTREE_ITER_WITH_UPDATES, snapshot); 698 + BTREE_ITER_with_updates, snapshot); 699 699 ret = bkey_err(root); 700 700 if (ret) 701 701 goto err; ··· 886 886 int ret = bch2_trans_run(c, 887 887 for_each_btree_key_reverse_commit(trans, iter, 888 888 BTREE_ID_snapshots, POS_MAX, 889 - BTREE_ITER_PREFETCH, k, 889 + BTREE_ITER_prefetch, k, 890 890 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 891 891 check_snapshot(trans, &iter, k))); 892 892 bch_err_fn(c, ret); ··· 900 900 if (bch2_snapshot_equiv(c, id)) 901 901 return 0; 902 902 903 - u32 tree_id; 903 + /* 0 is an invalid tree ID */ 904 + u32 tree_id = 0; 904 905 int ret = bch2_snapshot_tree_create(trans, id, 0, &tree_id); 905 906 if (ret) 906 907 return ret; ··· 1002 1001 r.btree = btree; 1003 1002 1004 1003 ret = for_each_btree_key(trans, iter, btree, POS_MIN, 1005 - BTREE_ITER_ALL_SNAPSHOTS|BTREE_ITER_PREFETCH, k, ({ 1004 + BTREE_ITER_all_snapshots|BTREE_ITER_prefetch, k, ({ 1006 1005 get_snapshot_trees(c, &r, k.k->p); 1007 1006 })); 1008 1007 if (ret) ··· 1019 1018 darray_for_each(*t, id) { 1020 1019 if (fsck_err_on(!bch2_snapshot_equiv(c, *id), 1021 1020 c, snapshot_node_missing, 1022 - "snapshot node %u from tree %s missing", *id, buf.buf)) { 1021 + "snapshot node %u from tree %s missing, recreate?", *id, buf.buf)) { 1023 1022 if (t->nr > 1) { 1024 1023 bch_err(c, "cannot reconstruct snapshot trees with multiple nodes"); 
1025 1024 ret = -BCH_ERR_fsck_repair_unimplemented; ··· 1091 1090 int ret = 0; 1092 1091 1093 1092 s = bch2_bkey_get_iter_typed(trans, &iter, BTREE_ID_snapshots, POS(0, id), 1094 - BTREE_ITER_INTENT, snapshot); 1093 + BTREE_ITER_intent, snapshot); 1095 1094 ret = bkey_err(s); 1096 1095 bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), c, 1097 1096 "missing snapshot %u", id); ··· 1200 1199 int ret; 1201 1200 1202 1201 bch2_trans_iter_init(trans, &iter, BTREE_ID_snapshots, 1203 - POS_MIN, BTREE_ITER_INTENT); 1202 + POS_MIN, BTREE_ITER_intent); 1204 1203 k = bch2_btree_iter_peek(&iter); 1205 1204 ret = bkey_err(k); 1206 1205 if (ret) ··· 1368 1367 if (snapshot_list_has_id(deleted, k.k->p.snapshot) || 1369 1368 snapshot_list_has_id(equiv_seen, equiv)) { 1370 1369 return bch2_btree_delete_at(trans, iter, 1371 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 1370 + BTREE_UPDATE_internal_snapshot_node); 1372 1371 } else { 1373 1372 return snapshot_list_add(c, equiv_seen, equiv); 1374 1373 } ··· 1405 1404 new->k.p.snapshot = equiv; 1406 1405 1407 1406 bch2_trans_iter_init(trans, &new_iter, iter->btree_id, new->k.p, 1408 - BTREE_ITER_ALL_SNAPSHOTS| 1409 - BTREE_ITER_CACHED| 1410 - BTREE_ITER_INTENT); 1407 + BTREE_ITER_all_snapshots| 1408 + BTREE_ITER_cached| 1409 + BTREE_ITER_intent); 1411 1410 1412 1411 ret = bch2_btree_iter_traverse(&new_iter) ?: 1413 1412 bch2_trans_update(trans, &new_iter, new, 1414 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) ?: 1413 + BTREE_UPDATE_internal_snapshot_node) ?: 1415 1414 bch2_btree_delete_at(trans, iter, 1416 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 1415 + BTREE_UPDATE_internal_snapshot_node); 1417 1416 bch2_trans_iter_exit(trans, &new_iter); 1418 1417 if (ret) 1419 1418 return ret; ··· 1604 1603 1605 1604 ret = for_each_btree_key_commit(trans, iter, 1606 1605 id, POS_MIN, 1607 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 1606 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 1608 1607 &res, NULL, BCH_TRANS_COMMIT_no_enospc, 1609 1608 snapshot_delete_key(trans, &iter, k, &deleted, &equiv_seen, &last_pos)) ?: 1610 1609 for_each_btree_key_commit(trans, iter, 1611 1610 id, POS_MIN, 1612 - BTREE_ITER_PREFETCH|BTREE_ITER_ALL_SNAPSHOTS, k, 1611 + BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 1613 1612 &res, NULL, BCH_TRANS_COMMIT_no_enospc, 1614 1613 move_key_to_correct_snapshot(trans, &iter, k)); 1615 1614 ··· 1644 1643 * nodes some depth fields will be off: 1645 1644 */ 1646 1645 ret = for_each_btree_key_commit(trans, iter, BTREE_ID_snapshots, POS_MIN, 1647 - BTREE_ITER_INTENT, k, 1646 + BTREE_ITER_intent, k, 1648 1647 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 1649 1648 bch2_fix_child_of_deleted_snapshot(trans, &iter, k, &deleted_interior)); 1650 1649 if (ret) ··· 1700 1699 int ret; 1701 1700 1702 1701 bch2_trans_iter_init(trans, &iter, id, pos, 1703 - BTREE_ITER_NOT_EXTENTS| 1704 - BTREE_ITER_ALL_SNAPSHOTS); 1702 + BTREE_ITER_not_extents| 1703 + BTREE_ITER_all_snapshots); 1705 1704 while (1) { 1706 1705 k = bch2_btree_iter_prev(&iter); 1707 1706 ret = bkey_err(k); ··· 1753 1752 1754 1753 pos.snapshot = leaf_id; 1755 1754 1756 - bch2_trans_iter_init(trans, &iter, btree, pos, BTREE_ITER_INTENT); 1755 + bch2_trans_iter_init(trans, &iter, btree, pos, BTREE_ITER_intent); 1757 1756 k = bch2_btree_iter_peek_slot(&iter); 1758 1757 ret = bkey_err(k); 1759 1758 if (ret)
+6 -10
fs/bcachefs/snapshot.h
··· 2 2 #ifndef _BCACHEFS_SNAPSHOT_H 3 3 #define _BCACHEFS_SNAPSHOT_H 4 4 5 - enum bkey_invalid_flags; 5 + enum bch_validate_flags; 6 6 7 7 void bch2_snapshot_tree_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 8 8 int bch2_snapshot_tree_invalid(struct bch_fs *, struct bkey_s_c, 9 - enum bkey_invalid_flags, struct printbuf *); 9 + enum bch_validate_flags, struct printbuf *); 10 10 11 11 #define bch2_bkey_ops_snapshot_tree ((struct bkey_ops) { \ 12 12 .key_invalid = bch2_snapshot_tree_invalid, \ ··· 20 20 21 21 void bch2_snapshot_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 22 22 int bch2_snapshot_invalid(struct bch_fs *, struct bkey_s_c, 23 - enum bkey_invalid_flags, struct printbuf *); 23 + enum bch_validate_flags, struct printbuf *); 24 24 int bch2_mark_snapshot(struct btree_trans *, enum btree_id, unsigned, 25 - struct bkey_s_c, struct bkey_s, unsigned); 25 + struct bkey_s_c, struct bkey_s, 26 + enum btree_iter_update_trigger_flags); 26 27 27 28 #define bch2_bkey_ops_snapshot ((struct bkey_ops) { \ 28 29 .key_invalid = bch2_snapshot_invalid, \ ··· 78 77 return 0; 79 78 80 79 u32 parent = s->parent; 81 - if (IS_ENABLED(CONFIG_BCACHEFS_DEBU) && 80 + if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) && 82 81 parent && 83 82 s->depth != snapshot_t(c, parent)->depth + 1) 84 83 panic("id %u depth=%u parent %u depth=%u\n", ··· 134 133 rcu_read_unlock(); 135 134 136 135 return id; 137 - } 138 - 139 - static inline bool bch2_snapshot_is_equiv(struct bch_fs *c, u32 id) 140 - { 141 - return id == bch2_snapshot_equiv(c, id); 142 136 } 143 137 144 138 static inline int bch2_snapshot_is_internal_node(struct bch_fs *c, u32 id)
+29 -41
fs/bcachefs/str_hash.h
··· 15 15 #include <crypto/hash.h> 16 16 #include <crypto/sha2.h> 17 17 18 - typedef unsigned __bitwise bch_str_hash_flags_t; 19 - 20 - enum bch_str_hash_flags { 21 - __BCH_HASH_SET_MUST_CREATE, 22 - __BCH_HASH_SET_MUST_REPLACE, 23 - }; 24 - 25 - #define BCH_HASH_SET_MUST_CREATE (__force bch_str_hash_flags_t) BIT(__BCH_HASH_SET_MUST_CREATE) 26 - #define BCH_HASH_SET_MUST_REPLACE (__force bch_str_hash_flags_t) BIT(__BCH_HASH_SET_MUST_REPLACE) 27 - 28 18 static inline enum bch_str_hash_type 29 19 bch2_str_hash_opt_to_type(struct bch_fs *c, enum bch_str_hash_opts opt) 30 20 { ··· 149 159 desc.is_visible(inum, k)); 150 160 } 151 161 152 - static __always_inline int 162 + static __always_inline struct bkey_s_c 153 163 bch2_hash_lookup_in_snapshot(struct btree_trans *trans, 154 164 struct btree_iter *iter, 155 165 const struct bch_hash_desc desc, 156 166 const struct bch_hash_info *info, 157 167 subvol_inum inum, const void *key, 158 - unsigned flags, u32 snapshot) 168 + enum btree_iter_update_trigger_flags flags, 169 + u32 snapshot) 159 170 { 160 171 struct bkey_s_c k; 161 172 int ret; ··· 164 173 for_each_btree_key_upto_norestart(trans, *iter, desc.btree_id, 165 174 SPOS(inum.inum, desc.hash_key(info, key), snapshot), 166 175 POS(inum.inum, U64_MAX), 167 - BTREE_ITER_SLOTS|flags, k, ret) { 176 + BTREE_ITER_slots|flags, k, ret) { 168 177 if (is_visible_key(desc, inum, k)) { 169 178 if (!desc.cmp_key(k, key)) 170 - return 0; 179 + return k; 171 180 } else if (k.k->type == KEY_TYPE_hash_whiteout) { 172 181 ; 173 182 } else { ··· 177 186 } 178 187 bch2_trans_iter_exit(trans, iter); 179 188 180 - return ret ?: -BCH_ERR_ENOENT_str_hash_lookup; 189 + return bkey_s_c_err(ret ?: -BCH_ERR_ENOENT_str_hash_lookup); 181 190 } 182 191 183 - static __always_inline int 192 + static __always_inline struct bkey_s_c 184 193 bch2_hash_lookup(struct btree_trans *trans, 185 194 struct btree_iter *iter, 186 195 const struct bch_hash_desc desc, 187 196 const struct bch_hash_info *info, 188 197 subvol_inum inum, const void *key, 189 - unsigned flags) 198 + enum btree_iter_update_trigger_flags flags) 190 199 { 191 200 u32 snapshot; 192 - return bch2_subvolume_get_snapshot(trans, inum.subvol, &snapshot) ?: 193 - bch2_hash_lookup_in_snapshot(trans, iter, desc, info, inum, key, flags, snapshot); 201 + int ret = bch2_subvolume_get_snapshot(trans, inum.subvol, &snapshot); 202 + if (ret) 203 + return bkey_s_c_err(ret); 204 + 205 + return bch2_hash_lookup_in_snapshot(trans, iter, desc, info, inum, key, flags, snapshot); 194 206 } 195 207 196 208 static __always_inline int ··· 214 220 for_each_btree_key_upto_norestart(trans, *iter, desc.btree_id, 215 221 SPOS(inum.inum, desc.hash_key(info, key), snapshot), 216 222 POS(inum.inum, U64_MAX), 217 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT, k, ret) 223 + BTREE_ITER_slots|BTREE_ITER_intent, k, ret) 218 224 if (!is_visible_key(desc, inum, k)) 219 225 return 0; 220 226 bch2_trans_iter_exit(trans, iter); ··· 236 242 237 243 bch2_btree_iter_advance(&iter); 238 244 239 - for_each_btree_key_continue_norestart(iter, BTREE_ITER_SLOTS, k, ret) { 245 + for_each_btree_key_continue_norestart(iter, BTREE_ITER_slots, k, ret) { 240 246 if (k.k->type != desc.key_type && 241 247 k.k->type != KEY_TYPE_hash_whiteout) 242 248 break; ··· 258 264 const struct bch_hash_info *info, 259 265 subvol_inum inum, u32 snapshot, 260 266 struct bkey_i *insert, 261 - bch_str_hash_flags_t str_hash_flags, 262 - int update_flags) 267 + enum btree_iter_update_trigger_flags flags) 263 268 { 264 269 struct btree_iter iter, slot 
= { NULL }; 265 270 struct bkey_s_c k; ··· 270 277 desc.hash_bkey(info, bkey_i_to_s_c(insert)), 271 278 snapshot), 272 279 POS(insert->k.p.inode, U64_MAX), 273 - BTREE_ITER_SLOTS|BTREE_ITER_INTENT, k, ret) { 280 + BTREE_ITER_slots|BTREE_ITER_intent, k, ret) { 274 281 if (is_visible_key(desc, inum, k)) { 275 282 if (!desc.cmp_bkey(k, bkey_i_to_s_c(insert))) 276 283 goto found; ··· 279 286 continue; 280 287 } 281 288 282 - if (!slot.path && 283 - !(str_hash_flags & BCH_HASH_SET_MUST_REPLACE)) 289 + if (!slot.path && !(flags & STR_HASH_must_replace)) 284 290 bch2_trans_copy_iter(&slot, &iter); 285 291 286 292 if (k.k->type != KEY_TYPE_hash_whiteout) ··· 297 305 found = true; 298 306 not_found: 299 307 300 - if (!found && (str_hash_flags & BCH_HASH_SET_MUST_REPLACE)) { 308 + if (!found && (flags & STR_HASH_must_replace)) { 301 309 ret = -BCH_ERR_ENOENT_str_hash_set_must_replace; 302 - } else if (found && (str_hash_flags & BCH_HASH_SET_MUST_CREATE)) { 310 + } else if (found && (flags & STR_HASH_must_create)) { 303 311 ret = -EEXIST; 304 312 } else { 305 313 if (!found && slot.path) 306 314 swap(iter, slot); 307 315 308 316 insert->k.p = iter.pos; 309 - ret = bch2_trans_update(trans, &iter, insert, update_flags); 317 + ret = bch2_trans_update(trans, &iter, insert, flags); 310 318 } 311 319 312 320 goto out; ··· 318 326 const struct bch_hash_info *info, 319 327 subvol_inum inum, 320 328 struct bkey_i *insert, 321 - bch_str_hash_flags_t str_hash_flags) 329 + enum btree_iter_update_trigger_flags flags) 322 330 { 323 331 insert->k.p.inode = inum.inum; 324 332 325 333 u32 snapshot; 326 334 return bch2_subvolume_get_snapshot(trans, inum.subvol, &snapshot) ?: 327 335 bch2_hash_set_in_snapshot(trans, desc, info, inum, 328 - snapshot, insert, str_hash_flags, 0); 336 + snapshot, insert, flags); 329 337 } 330 338 331 339 static __always_inline ··· 333 341 const struct bch_hash_desc desc, 334 342 const struct bch_hash_info *info, 335 343 struct btree_iter *iter, 336 - unsigned update_flags) 344 + enum btree_iter_update_trigger_flags flags) 337 345 { 338 346 struct bkey_i *delete; 339 347 int ret; ··· 351 359 delete->k.p = iter->pos; 352 360 delete->k.type = ret ? KEY_TYPE_hash_whiteout : KEY_TYPE_deleted; 353 361 354 - return bch2_trans_update(trans, iter, delete, update_flags); 362 + return bch2_trans_update(trans, iter, delete, flags); 355 363 } 356 364 357 365 static __always_inline ··· 361 369 subvol_inum inum, const void *key) 362 370 { 363 371 struct btree_iter iter; 364 - int ret; 365 - 366 - ret = bch2_hash_lookup(trans, &iter, desc, info, inum, key, 367 - BTREE_ITER_INTENT); 368 - if (ret) 369 - return ret; 370 - 371 - ret = bch2_hash_delete_at(trans, desc, info, &iter, 0); 372 + struct bkey_s_c k = bch2_hash_lookup(trans, &iter, desc, info, inum, key, 373 + BTREE_ITER_intent); 374 + int ret = bkey_err(k) ?: 375 + bch2_hash_delete_at(trans, desc, info, &iter, 0); 372 376 bch2_trans_iter_exit(trans, &iter); 373 377 return ret; 374 378 }
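Note: bch2_hash_lookup() now returns struct bkey_s_c with the error folded in via bkey_s_c_err(), rather than an int; callers unpack it with bkey_err(), exactly as the reworked bch2_hash_delete() above does. The caller shape, sketched:

        struct btree_iter iter;
        struct bkey_s_c k = bch2_hash_lookup(trans, &iter, desc, info,
                                             inum, key, 0);
        int ret = bkey_err(k);
        if (!ret) {
                /* k and iter are both valid here */
        }
        bch2_trans_iter_exit(trans, &iter);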
+11 -11
fs/bcachefs/subvolume.c
··· 162 162 { 163 163 int ret = bch2_trans_run(c, 164 164 for_each_btree_key_commit(trans, iter, 165 - BTREE_ID_subvolumes, POS_MIN, BTREE_ITER_PREFETCH, k, 165 + BTREE_ID_subvolumes, POS_MIN, BTREE_ITER_prefetch, k, 166 166 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 167 167 check_subvol(trans, &iter, k))); 168 168 bch_err_fn(c, ret); ··· 198 198 { 199 199 int ret = bch2_trans_run(c, 200 200 for_each_btree_key_commit(trans, iter, 201 - BTREE_ID_subvolume_children, POS_MIN, BTREE_ITER_PREFETCH, k, 201 + BTREE_ID_subvolume_children, POS_MIN, BTREE_ITER_prefetch, k, 202 202 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 203 203 check_subvol_child(trans, &iter, k))); 204 204 bch_err_fn(c, ret); ··· 208 208 /* Subvolumes: */ 209 209 210 210 int bch2_subvolume_invalid(struct bch_fs *c, struct bkey_s_c k, 211 - enum bkey_invalid_flags flags, struct printbuf *err) 211 + enum bch_validate_flags flags, struct printbuf *err) 212 212 { 213 213 int ret = 0; 214 214 ··· 245 245 int bch2_subvolume_trigger(struct btree_trans *trans, 246 246 enum btree_id btree_id, unsigned level, 247 247 struct bkey_s_c old, struct bkey_s new, 248 - unsigned flags) 248 + enum btree_iter_update_trigger_flags flags) 249 249 { 250 - if (flags & BTREE_TRIGGER_TRANSACTIONAL) { 250 + if (flags & BTREE_TRIGGER_transactional) { 251 251 struct bpos children_pos_old = subvolume_children_pos(old); 252 252 struct bpos children_pos_new = subvolume_children_pos(new.s_c); 253 253 ··· 333 333 334 334 subvol = bch2_bkey_get_iter_typed(trans, &iter, 335 335 BTREE_ID_subvolumes, POS(0, subvolid), 336 - BTREE_ITER_CACHED|BTREE_ITER_WITH_UPDATES, 336 + BTREE_ITER_cached|BTREE_ITER_with_updates, 337 337 subvolume); 338 338 ret = bkey_err(subvol); 339 339 bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), trans->c, ··· 383 383 384 384 return lockrestart_do(trans, 385 385 bch2_subvolume_get(trans, subvolid_to_delete, true, 386 - BTREE_ITER_CACHED, &s)) ?: 386 + BTREE_ITER_cached, &s)) ?: 387 387 for_each_btree_key_commit(trans, iter, 388 - BTREE_ID_subvolumes, POS_MIN, BTREE_ITER_PREFETCH, k, 388 + BTREE_ID_subvolumes, POS_MIN, BTREE_ITER_prefetch, k, 389 389 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 390 390 bch2_subvolume_reparent(trans, &iter, k, 391 391 subvolid_to_delete, le32_to_cpu(s.creation_parent))); ··· 404 404 405 405 subvol = bch2_bkey_get_iter_typed(trans, &iter, 406 406 BTREE_ID_subvolumes, POS(0, subvolid), 407 - BTREE_ITER_CACHED|BTREE_ITER_INTENT, 407 + BTREE_ITER_cached|BTREE_ITER_intent, 408 408 subvolume); 409 409 ret = bkey_err(subvol); 410 410 bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), trans->c, ··· 505 505 506 506 n = bch2_bkey_get_mut_typed(trans, &iter, 507 507 BTREE_ID_subvolumes, POS(0, subvolid), 508 - BTREE_ITER_CACHED, subvolume); 508 + BTREE_ITER_cached, subvolume); 509 509 ret = PTR_ERR_OR_ZERO(n); 510 510 if (unlikely(ret)) { 511 511 bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), trans->c, ··· 547 547 548 548 src_subvol = bch2_bkey_get_mut_typed(trans, &src_iter, 549 549 BTREE_ID_subvolumes, POS(0, src_subvolid), 550 - BTREE_ITER_CACHED, subvolume); 550 + BTREE_ITER_cached, subvolume); 551 551 ret = PTR_ERR_OR_ZERO(src_subvol); 552 552 if (unlikely(ret)) { 553 553 bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), c,
+4 -3
fs/bcachefs/subvolume.h
··· 5 5 #include "darray.h" 6 6 #include "subvolume_types.h" 7 7 8 - enum bkey_invalid_flags; 8 + enum bch_validate_flags; 9 9 10 10 int bch2_check_subvols(struct bch_fs *); 11 11 int bch2_check_subvol_children(struct bch_fs *); 12 12 13 13 int bch2_subvolume_invalid(struct bch_fs *, struct bkey_s_c, 14 - enum bkey_invalid_flags, struct printbuf *); 14 + enum bch_validate_flags, struct printbuf *); 15 15 void bch2_subvolume_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); 16 16 int bch2_subvolume_trigger(struct btree_trans *, enum btree_id, unsigned, 17 - struct bkey_s_c, struct bkey_s, unsigned); 17 + struct bkey_s_c, struct bkey_s, 18 + enum btree_iter_update_trigger_flags); 18 19 19 20 #define bch2_bkey_ops_subvolume ((struct bkey_ops) { \ 20 21 .key_invalid = bch2_subvolume_invalid, \
+49 -68
fs/bcachefs/super-io.c
··· 76 76 }; 77 77 78 78 static int bch2_sb_field_validate(struct bch_sb *, struct bch_sb_field *, 79 - struct printbuf *); 79 + enum bch_validate_flags, struct printbuf *); 80 80 81 81 struct bch_sb_field *bch2_sb_field_get_id(struct bch_sb *sb, 82 82 enum bch_sb_field_type type) ··· 344 344 return 0; 345 345 } 346 346 347 - static int bch2_sb_validate(struct bch_sb_handle *disk_sb, struct printbuf *out, 348 - int rw) 347 + static int bch2_sb_validate(struct bch_sb_handle *disk_sb, 348 + enum bch_validate_flags flags, struct printbuf *out) 349 349 { 350 350 struct bch_sb *sb = disk_sb->sb; 351 351 struct bch_sb_field_members_v1 *mi; ··· 401 401 return -BCH_ERR_invalid_sb_time_precision; 402 402 } 403 403 404 - if (rw == READ) { 404 + if (!flags) { 405 405 /* 406 406 * Been seeing a bug where these are getting inexplicably 407 407 * zeroed, so we're now validating them, but we have to be ··· 457 457 return -BCH_ERR_invalid_sb_members_missing; 458 458 } 459 459 460 - ret = bch2_sb_field_validate(sb, &mi->field, out); 460 + ret = bch2_sb_field_validate(sb, &mi->field, flags, out); 461 461 if (ret) 462 462 return ret; 463 463 ··· 465 465 if (le32_to_cpu(f->type) == BCH_SB_FIELD_members_v1) 466 466 continue; 467 467 468 - ret = bch2_sb_field_validate(sb, f, out); 468 + ret = bch2_sb_field_validate(sb, f, flags, out); 469 469 if (ret) 470 470 return ret; 471 471 } 472 472 473 - if (rw == WRITE && 473 + if ((flags & BCH_VALIDATE_write) && 474 474 bch2_sb_member_get(sb, sb->dev_idx).seq != sb->seq) { 475 475 prt_printf(out, "Invalid superblock: member seq %llu != sb seq %llu", 476 476 le64_to_cpu(bch2_sb_member_get(sb, sb->dev_idx).seq), ··· 819 819 820 820 sb->have_layout = true; 821 821 822 - ret = bch2_sb_validate(sb, &err, READ); 822 + ret = bch2_sb_validate(sb, 0, &err); 823 823 if (ret) { 824 824 bch2_print_opts(opts, KERN_ERR "bcachefs (%s): error validating superblock: %s\n", 825 825 path, err.buf); ··· 975 975 darray_for_each(online_devices, ca) { 976 976 printbuf_reset(&err); 977 977 978 - ret = bch2_sb_validate(&(*ca)->disk_sb, &err, WRITE); 978 + ret = bch2_sb_validate(&(*ca)->disk_sb, BCH_VALIDATE_write, &err); 979 979 if (ret) { 980 980 bch2_fs_inconsistent(c, "sb invalid before write: %s", err.buf); 981 981 goto out; ··· 1020 1020 continue; 1021 1021 1022 1022 if (le64_to_cpu(ca->sb_read_scratch->seq) < ca->disk_sb.seq) { 1023 - bch2_fs_fatal_error(c, 1023 + struct printbuf buf = PRINTBUF; 1024 + prt_char(&buf, ' '); 1025 + prt_bdevname(&buf, ca->disk_sb.bdev); 1026 + prt_printf(&buf, 1024 1027 ": Superblock write was silently dropped! 
(seq %llu expected %llu)", 1025 1028 le64_to_cpu(ca->sb_read_scratch->seq), 1026 1029 ca->disk_sb.seq); 1027 - percpu_ref_put(&ca->io_ref); 1030 + bch2_fs_fatal_error(c, "%s", buf.buf); 1031 + printbuf_exit(&buf); 1028 1032 ret = -BCH_ERR_erofs_sb_err; 1029 - goto out; 1030 1033 } 1031 1034 1032 1035 if (le64_to_cpu(ca->sb_read_scratch->seq) > ca->disk_sb.seq) { 1033 - bch2_fs_fatal_error(c, 1036 + struct printbuf buf = PRINTBUF; 1037 + prt_char(&buf, ' '); 1038 + prt_bdevname(&buf, ca->disk_sb.bdev); 1039 + prt_printf(&buf, 1034 1040 ": Superblock modified by another process (seq %llu expected %llu)", 1035 1041 le64_to_cpu(ca->sb_read_scratch->seq), 1036 1042 ca->disk_sb.seq); 1037 - percpu_ref_put(&ca->io_ref); 1043 + bch2_fs_fatal_error(c, "%s", buf.buf); 1044 + printbuf_exit(&buf); 1038 1045 ret = -BCH_ERR_erofs_sb_err; 1039 - goto out; 1040 1046 } 1041 1047 } 1048 + 1049 + if (ret) 1050 + goto out; 1042 1051 1043 1052 do { 1044 1053 wrote = false; ··· 1161 1152 } 1162 1153 1163 1154 static int bch2_sb_ext_validate(struct bch_sb *sb, struct bch_sb_field *f, 1164 - struct printbuf *err) 1155 + enum bch_validate_flags flags, struct printbuf *err) 1165 1156 { 1166 1157 if (vstruct_bytes(f) < 88) { 1167 1158 prt_printf(err, "field too small (%zu < %u)", vstruct_bytes(f), 88); ··· 1176 1167 { 1177 1168 struct bch_sb_field_ext *e = field_to_type(f, ext); 1178 1169 1179 - prt_printf(out, "Recovery passes required:"); 1180 - prt_tab(out); 1170 + prt_printf(out, "Recovery passes required:\t"); 1181 1171 prt_bitflags(out, bch2_recovery_passes, 1182 1172 bch2_recovery_passes_from_stable(le64_to_cpu(e->recovery_passes_required[0]))); 1183 1173 prt_newline(out); ··· 1185 1177 if (errors_silent) { 1186 1178 le_bitvector_to_cpu(errors_silent, (void *) e->errors_silent, sizeof(e->errors_silent) * 8); 1187 1179 1188 - prt_printf(out, "Errors to silently fix:"); 1189 - prt_tab(out); 1180 + prt_printf(out, "Errors to silently fix:\t"); 1190 1181 prt_bitflags_vector(out, bch2_sb_error_strs, errors_silent, sizeof(e->errors_silent) * 8); 1191 1182 prt_newline(out); 1192 1183 1193 1184 kfree(errors_silent); 1194 1185 } 1195 1186 1196 - prt_printf(out, "Btrees with missing data:"); 1197 - prt_tab(out); 1187 + prt_printf(out, "Btrees with missing data:\t"); 1198 1188 prt_bitflags(out, __bch2_btree_ids, le64_to_cpu(e->btrees_lost_data)); 1199 1189 prt_newline(out); 1200 1190 } ··· 1219 1213 } 1220 1214 1221 1215 static int bch2_sb_field_validate(struct bch_sb *sb, struct bch_sb_field *f, 1222 - struct printbuf *err) 1216 + enum bch_validate_flags flags, struct printbuf *err) 1223 1217 { 1224 1218 unsigned type = le32_to_cpu(f->type); 1225 1219 struct printbuf field_err = PRINTBUF; 1226 1220 const struct bch_sb_field_ops *ops = bch2_sb_field_type_ops(type); 1227 1221 int ret; 1228 1222 1229 - ret = ops->validate ? ops->validate(sb, f, &field_err) : 0; 1223 + ret = ops->validate ? 
 		ops->validate(sb, f, flags, &field_err) : 0;
 	if (ret) {
 		prt_printf(err, "Invalid superblock section %s: %s",
 			   bch2_sb_fields[type], field_err.buf);
···
 	printbuf_tabstop_push(out, 44);
 
 	for (int i = 0; i < sb->nr_devices; i++)
-		nr_devices += bch2_dev_exists(sb, i);
+		nr_devices += bch2_member_exists(sb, i);
 
-	prt_printf(out, "External UUID:");
-	prt_tab(out);
+	prt_printf(out, "External UUID:\t");
 	pr_uuid(out, sb->user_uuid.b);
 	prt_newline(out);
 
-	prt_printf(out, "Internal UUID:");
-	prt_tab(out);
+	prt_printf(out, "Internal UUID:\t");
 	pr_uuid(out, sb->uuid.b);
 	prt_newline(out);
 
-	prt_printf(out, "Magic number:");
-	prt_tab(out);
+	prt_printf(out, "Magic number:\t");
 	pr_uuid(out, sb->magic.b);
 	prt_newline(out);
 
-	prt_str(out, "Device index:");
-	prt_tab(out);
-	prt_printf(out, "%u", sb->dev_idx);
-	prt_newline(out);
+	prt_printf(out, "Device index:\t%u\n", sb->dev_idx);
 
-	prt_str(out, "Label:");
-	prt_tab(out);
+	prt_str(out, "Label:\t");
 	prt_printf(out, "%.*s", (int) sizeof(sb->label), sb->label);
 	prt_newline(out);
 
-	prt_str(out, "Version:");
-	prt_tab(out);
+	prt_str(out, "Version:\t");
 	bch2_version_to_text(out, le16_to_cpu(sb->version));
 	prt_newline(out);
 
-	prt_str(out, "Version upgrade complete:");
-	prt_tab(out);
+	prt_str(out, "Version upgrade complete:\t");
 	bch2_version_to_text(out, BCH_SB_VERSION_UPGRADE_COMPLETE(sb));
 	prt_newline(out);
 
-	prt_printf(out, "Oldest version on disk:");
-	prt_tab(out);
+	prt_printf(out, "Oldest version on disk:\t");
 	bch2_version_to_text(out, le16_to_cpu(sb->version_min));
 	prt_newline(out);
 
-	prt_printf(out, "Created:");
-	prt_tab(out);
+	prt_printf(out, "Created:\t");
 	if (sb->time_base_lo)
 		bch2_prt_datetime(out, div_u64(le64_to_cpu(sb->time_base_lo), NSEC_PER_SEC));
 	else
 		prt_printf(out, "(not set)");
 	prt_newline(out);
 
-	prt_printf(out, "Sequence number:");
-	prt_tab(out);
+	prt_printf(out, "Sequence number:\t");
 	prt_printf(out, "%llu", le64_to_cpu(sb->seq));
 	prt_newline(out);
 
-	prt_printf(out, "Time of last write:");
-	prt_tab(out);
+	prt_printf(out, "Time of last write:\t");
 	bch2_prt_datetime(out, le64_to_cpu(sb->write_time));
 	prt_newline(out);
 
-	prt_printf(out, "Superblock size:");
-	prt_tab(out);
+	prt_printf(out, "Superblock size:\t");
 	prt_units_u64(out, vstruct_bytes(sb));
 	prt_str(out, "/");
 	prt_units_u64(out, 512ULL << sb->layout.sb_max_size_bits);
 	prt_newline(out);
 
-	prt_printf(out, "Clean:");
-	prt_tab(out);
-	prt_printf(out, "%llu", BCH_SB_CLEAN(sb));
-	prt_newline(out);
+	prt_printf(out, "Clean:\t%llu\n", BCH_SB_CLEAN(sb));
+	prt_printf(out, "Devices:\t%u\n", nr_devices);
 
-	prt_printf(out, "Devices:");
-	prt_tab(out);
-	prt_printf(out, "%u", nr_devices);
-	prt_newline(out);
-
-	prt_printf(out, "Sections:");
+	prt_printf(out, "Sections:\t");
 	vstruct_for_each(sb, f)
 		fields_have |= 1 << le32_to_cpu(f->type);
-	prt_tab(out);
 	prt_bitflags(out, bch2_sb_fields, fields_have);
 	prt_newline(out);
 
-	prt_printf(out, "Features:");
-	prt_tab(out);
+	prt_printf(out, "Features:\t");
 	prt_bitflags(out, bch2_sb_features, le64_to_cpu(sb->features[0]));
 	prt_newline(out);
 
-	prt_printf(out, "Compat features:");
-	prt_tab(out);
+	prt_printf(out, "Compat features:\t");
 	prt_bitflags(out, bch2_sb_compat, le64_to_cpu(sb->compat[0]));
 	prt_newline(out);
···
 		if (opt->get_sb != BCH2_NO_SB_OPT) {
 			u64 v = bch2_opt_from_sb(sb, id);
 
-			prt_printf(out, "%s:", opt->attr.name);
-			prt_tab(out);
+			prt_printf(out, "%s:\t", opt->attr.name);
 			bch2_opt_to_text(out, NULL, sb, opt, v,
 					 OPT_HUMAN_READABLE|OPT_SHOW_FULL_LIST);
 			prt_newline(out);
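
Most of the printbuf churn above is mechanical: prt_tab()/prt_tab_rjust()/prt_newline() sequences collapse into single prt_printf() calls, now that the format string itself understands '\t' (advance to the next tabstop), '\r' (right-justify the preceding text against the next tabstop) and '\n'. A minimal sketch of the pattern, with a hypothetical label and value:

/* Sketch of the tabstop convention the conversions above rely on;
 * the label and values are hypothetical: */
static void example_to_text(struct printbuf *out)
{
	printbuf_tabstop_push(out, 24);		/* column where values line up */

	/* old style: several calls per output line */
	prt_printf(out, "Label:");
	prt_tab(out);
	prt_printf(out, "%u", 42);
	prt_newline(out);

	/* new style: one call; '\t' jumps to the tabstop */
	prt_printf(out, "Label:\t%u\n", 42);

	/* '\r' right-justifies the number against the tabstop */
	prt_printf(out, "count\t%llu\r\n", 123ULL);
}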
+2 -1
fs/bcachefs/super-io.h
···
 extern const char * const bch2_sb_fields[];
 
 struct bch_sb_field_ops {
-	int	(*validate)(struct bch_sb *, struct bch_sb_field *, struct printbuf *);
+	int	(*validate)(struct bch_sb *, struct bch_sb_field *,
+			    enum bch_validate_flags, struct printbuf *);
 	void	(*to_text)(struct printbuf *, struct bch_sb *, struct bch_sb_field *);
 };
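
For reference, a hedged sketch of what an implementation of the new hook looks like with the validate flags threaded through; the size check and error code here are hypothetical, not taken from the patch:

/* Hypothetical example of the new validate() signature: */
static int bch2_sb_example_validate(struct bch_sb *sb, struct bch_sb_field *f,
				    enum bch_validate_flags flags,
				    struct printbuf *err)
{
	/* illustrative layout check; real fields validate their own contents */
	if (vstruct_bytes(f) < sizeof(*f)) {
		prt_printf(err, "wrong size (got %lu)",
			   (unsigned long) vstruct_bytes(f));
		return -EINVAL;
	}
	return 0;
}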
+65 -47
fs/bcachefs/super.c
···
 	bch2_open_buckets_stop(c, NULL, true);
 	bch2_rebalance_stop(c);
 	bch2_copygc_stop(c);
-	bch2_gc_thread_stop(c);
 	bch2_fs_ec_flush(c);
 
 	bch_verbose(c, "flushing journal and stopping allocators, journal seq %llu",
···
 	bch_verbose(c, "flushing journal and stopping allocators complete, journal seq %llu",
 		    journal_cur_seq(&c->journal));
 
-	if (test_bit(JOURNAL_REPLAY_DONE, &c->journal.flags) &&
+	if (test_bit(JOURNAL_replay_done, &c->journal.flags) &&
 	    !test_bit(BCH_FS_emergency_ro, &c->flags))
 		set_bit(BCH_FS_clean_shutdown, &c->flags);
···
 	 * overwriting whatever was there previously, and there must always be
 	 * at least one non-flush write in the journal or recovery will fail:
 	 */
-	set_bit(JOURNAL_NEED_FLUSH_WRITE, &c->journal.flags);
+	set_bit(JOURNAL_need_flush_write, &c->journal.flags);
+	set_bit(JOURNAL_running, &c->journal.flags);
 
 	for_each_rw_member(c, ca)
 		bch2_dev_allocator_add(c, ca);
···
 		atomic_long_inc(&c->writes[i]);
 	}
 #endif
-
-	ret = bch2_gc_thread_start(c);
-	if (ret) {
-		bch_err(c, "error starting gc thread");
-		return ret;
-	}
 
 	ret = bch2_journal_reclaim_start(&c->journal);
 	if (ret)
···
 
 static void __bch2_fs_free(struct bch_fs *c)
 {
-	unsigned i;
-
-	for (i = 0; i < BCH_TIME_STAT_NR; i++)
+	for (unsigned i = 0; i < BCH_TIME_STAT_NR; i++)
 		bch2_time_stats_exit(&c->times[i]);
 
 	bch2_find_btree_nodes_exit(&c->found_btree_nodes);
···
 	BUG_ON(atomic_read(&c->journal_keys.ref));
 	bch2_fs_btree_write_buffer_exit(c);
 	percpu_free_rwsem(&c->mark_lock);
+	EBUG_ON(percpu_u64_get(c->online_reserved));
 	free_percpu(c->online_reserved);
 
 	darray_exit(&c->btree_roots_extra);
···
 	bch_verbose(c, "shutting down");
 
 	set_bit(BCH_FS_stopping, &c->flags);
-
-	cancel_work_sync(&c->journal_seq_blacklist_gc_work);
 
 	down_write(&c->state_lock);
 	bch2_fs_read_only(c);
···
 		struct bch_dev *ca = rcu_dereference_protected(c->devs[i], true);
 
 		if (ca) {
+			EBUG_ON(atomic_long_read(&ca->ref) != 1);
 			bch2_free_super(&ca->disk_sb);
 			bch2_dev_free(ca);
 		}
···
 		ret = bch2_dev_sysfs_online(c, ca);
 		if (ret) {
 			bch_err(c, "error creating sysfs objects");
-			percpu_ref_put(&ca->ref);
+			bch2_dev_put(ca);
 			goto err;
 		}
 	}
···
 	for (i = 0; i < BCH_TIME_STAT_NR; i++)
 		bch2_time_stats_init(&c->times[i]);
 
+	bch2_fs_gc_init(c);
 	bch2_fs_copygc_init(c);
 	bch2_fs_btree_key_cache_init_early(&c->btree_key_cache);
 	bch2_fs_btree_iter_init_early(c);
···
 
 	spin_lock_init(&c->btree_write_error_lock);
 
-	INIT_WORK(&c->journal_seq_blacklist_gc_work,
-		  bch2_blacklist_entries_gc);
-
 	INIT_LIST_HEAD(&c->journal_iters);
 
 	INIT_LIST_HEAD(&c->fsck_error_msgs);
 	mutex_init(&c->fsck_error_msgs_lock);
-
-	seqcount_init(&c->gc_pos_lock);
 
 	seqcount_init(&c->usage_lock);
···
 		goto err;
 
 	for (i = 0; i < c->sb.nr_devices; i++)
-		if (bch2_dev_exists(c->disk_sb.sb, i) &&
+		if (bch2_member_exists(c->disk_sb.sb, i) &&
 		    bch2_dev_alloc(c, i)) {
 			ret = -EEXIST;
 			goto err;
···
 	if (!uuid_equal(&fs->sb->uuid, &sb->sb->uuid))
 		return -BCH_ERR_device_not_a_member_of_filesystem;
 
-	if (!bch2_dev_exists(fs->sb, sb->sb->dev_idx))
+	if (!bch2_member_exists(fs->sb, sb->sb->dev_idx))
 		return -BCH_ERR_device_has_been_removed;
 
 	if (fs->sb->block_size != sb->sb->block_size)
···
 	if (ca->kobj.state_in_sysfs)
 		kobject_del(&ca->kobj);
 
+	kfree(ca->buckets_nouse);
 	bch2_free_super(&ca->disk_sb);
 	bch2_dev_journal_exit(ca);
 
 	free_percpu(ca->io_done);
-	bioset_exit(&ca->replica_set);
 	bch2_dev_buckets_free(ca);
 	free_page((unsigned long) ca->sb_read_scratch);
···
 	bch2_time_stats_quantiles_exit(&ca->io_latency[READ]);
 
 	percpu_ref_exit(&ca->io_ref);
+#ifndef CONFIG_BCACHEFS_DEBUG
 	percpu_ref_exit(&ca->ref);
+#endif
 	kobject_put(&ca->kobj);
 }
···
 		bch2_dev_journal_exit(ca);
 }
 
+#ifndef CONFIG_BCACHEFS_DEBUG
 static void bch2_dev_ref_complete(struct percpu_ref *ref)
 {
 	struct bch_dev *ca = container_of(ref, struct bch_dev, ref);
 
 	complete(&ca->ref_completion);
 }
+#endif
 
 static void bch2_dev_io_ref_complete(struct percpu_ref *ref)
 {
···
 	ca->nr_btree_reserve = DIV_ROUND_UP(BTREE_NODE_RESERVE,
 			     ca->mi.bucket_size / btree_sectors(c));
 
-	if (percpu_ref_init(&ca->ref, bch2_dev_ref_complete,
-			    0, GFP_KERNEL) ||
-	    percpu_ref_init(&ca->io_ref, bch2_dev_io_ref_complete,
+#ifndef CONFIG_BCACHEFS_DEBUG
+	if (percpu_ref_init(&ca->ref, bch2_dev_ref_complete, 0, GFP_KERNEL))
+		goto err;
+#else
+	atomic_long_set(&ca->ref, 1);
+#endif
+
+	if (percpu_ref_init(&ca->io_ref, bch2_dev_io_ref_complete,
 			    PERCPU_REF_INIT_DEAD, GFP_KERNEL) ||
 	    !(ca->sb_read_scratch = (void *) __get_free_page(GFP_KERNEL)) ||
 	    bch2_dev_buckets_alloc(c, ca) ||
-	    bioset_init(&ca->replica_set, 4,
-			offsetof(struct bch_write_bio, bio), 0) ||
 	    !(ca->io_done = alloc_percpu(*ca->io_done)))
 		goto err;
···
 	    le64_to_cpu(c->disk_sb.sb->seq))
 		bch2_sb_to_fs(c, sb->sb);
 
-	BUG_ON(sb->sb->dev_idx >= c->sb.nr_devices ||
-	       !c->devs[sb->sb->dev_idx]);
+	BUG_ON(!bch2_dev_exists(c, sb->sb->dev_idx));
 
-	ca = bch_dev_locked(c, sb->sb->dev_idx);
+	ca = bch2_dev_locked(c, sb->sb->dev_idx);
 
 	ret = __bch2_dev_attach_bdev(ca, sb);
 	if (ret)
···
 	mutex_lock(&c->sb_lock);
 
 	for (i = 0; i < c->disk_sb.sb->nr_devices; i++) {
-		if (!bch2_dev_exists(c->disk_sb.sb, i))
+		if (!bch2_member_exists(c->disk_sb.sb, i))
 			continue;
 
-		ca = bch_dev_locked(c, i);
+		ca = bch2_dev_locked(c, i);
 
 		if (!bch2_dev_is_online(ca) &&
 		    (ca->mi.state == BCH_MEMBER_STATE_rw ||
···
 	 * with bch2_do_invalidates() and bch2_do_discards()
 	 */
 	ret =   bch2_btree_delete_range(c, BTREE_ID_lru, start, end,
-					BTREE_TRIGGER_NORUN, NULL) ?:
+					BTREE_TRIGGER_norun, NULL) ?:
 		bch2_btree_delete_range(c, BTREE_ID_need_discard, start, end,
-					BTREE_TRIGGER_NORUN, NULL) ?:
+					BTREE_TRIGGER_norun, NULL) ?:
 		bch2_btree_delete_range(c, BTREE_ID_freespace, start, end,
-					BTREE_TRIGGER_NORUN, NULL) ?:
+					BTREE_TRIGGER_norun, NULL) ?:
 		bch2_btree_delete_range(c, BTREE_ID_backpointers, start, end,
-					BTREE_TRIGGER_NORUN, NULL) ?:
+					BTREE_TRIGGER_norun, NULL) ?:
 		bch2_btree_delete_range(c, BTREE_ID_alloc, start, end,
-					BTREE_TRIGGER_NORUN, NULL) ?:
+					BTREE_TRIGGER_norun, NULL) ?:
 		bch2_btree_delete_range(c, BTREE_ID_bucket_gens, start, end,
-					BTREE_TRIGGER_NORUN, NULL);
+					BTREE_TRIGGER_norun, NULL);
 	bch_err_msg(c, ret, "removing dev alloc info");
 	return ret;
 }
···
 	 * We consume a reference to ca->ref, regardless of whether we succeed
 	 * or fail:
 	 */
-	percpu_ref_put(&ca->ref);
+	bch2_dev_put(ca);
 
 	if (!bch2_dev_state_allowed(c, ca, BCH_MEMBER_STATE_failed, flags)) {
 		bch_err(ca, "Cannot remove without losing data");
···
 	rcu_assign_pointer(c->devs[ca->dev_idx], NULL);
 	mutex_unlock(&c->sb_lock);
 
+#ifndef CONFIG_BCACHEFS_DEBUG
 	percpu_ref_kill(&ca->ref);
+#else
+	ca->dying = true;
+	bch2_dev_put(ca);
+#endif
 	wait_for_completion(&ca->ref_completion);
 
 	bch2_dev_free(ca);
···
 	if (dynamic_fault("bcachefs:add:no_slot"))
 		goto no_slot;
 
-	for (dev_idx = 0; dev_idx < BCH_SB_MEMBERS_MAX; dev_idx++)
-		if (!bch2_dev_exists(c->disk_sb.sb, dev_idx))
-			goto have_slot;
+	if (c->sb.nr_devices < BCH_SB_MEMBERS_MAX) {
+		dev_idx = c->sb.nr_devices;
+		goto have_slot;
+	}
+
+	int best = -1;
+	u64 best_last_mount = 0;
+	for (dev_idx = 0; dev_idx < BCH_SB_MEMBERS_MAX; dev_idx++) {
+		struct bch_member m = bch2_sb_member_get(c->disk_sb.sb, dev_idx);
+		if (bch2_member_alive(&m))
+			continue;
+
+		u64 last_mount = le64_to_cpu(m.last_mount);
+		if (best < 0 ||
+		    last_mount < best_last_mount) {
+			best = dev_idx;
+			best_last_mount = last_mount;
+		}
+	}
+	if (best >= 0) {
+		dev_idx = best;
+		goto have_slot;
+	}
 no_slot:
 	ret = -BCH_ERR_ENOSPC_sb_members;
 	bch_err_msg(c, ret, "setting up new superblock");
···
 
 	bch2_dev_usage_journal_reserve(c);
 
-	ret = bch2_trans_mark_dev_sb(c, ca);
+	ret = bch2_trans_mark_dev_sb(c, ca, BTREE_TRIGGER_transactional);
 	bch_err_msg(ca, ret, "marking new superblock");
 	if (ret)
 		goto err_late;
···
 	if (ret)
 		goto err;
 
-	ca = bch_dev_locked(c, dev_idx);
+	ca = bch2_dev_locked(c, dev_idx);
 
-	ret = bch2_trans_mark_dev_sb(c, ca);
+	ret = bch2_trans_mark_dev_sb(c, ca, BTREE_TRIGGER_transactional);
 	bch_err_msg(c, ret, "bringing %s online: error from bch2_trans_mark_dev_sb", path);
 	if (ret)
 		goto err;
···
 	if (ret)
 		goto err;
 
-	ret = bch2_trans_mark_dev_sb(c, ca);
+	ret = bch2_trans_mark_dev_sb(c, ca, BTREE_TRIGGER_transactional);
 	if (ret)
 		goto err;
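
The CONFIG_BCACHEFS_DEBUG split above makes ca->ref a plain atomic_long_t in debug builds, so a leaked or double-dropped device reference trips immediately (note the EBUG_ON() in teardown) instead of hiding inside a percpu_ref. The bch2_dev_get()/bch2_dev_put() helpers the callers are switched to are not part of this hunk; a rough sketch of their likely shape, under that assumption:

/* Rough sketch only: the real helpers live outside this hunk. Debug builds
 * trade percpu_ref performance for an atomic counter that catches misuse
 * deterministically. */
static inline void bch2_dev_get(struct bch_dev *ca)
{
#ifdef CONFIG_BCACHEFS_DEBUG
	BUG_ON(atomic_long_inc_return(&ca->ref) <= 1L);	/* get after free */
#else
	percpu_ref_get(&ca->ref);
#endif
}

static inline void bch2_dev_put(struct bch_dev *ca)
{
#ifdef CONFIG_BCACHEFS_DEBUG
	long r = atomic_long_dec_return(&ca->ref);

	BUG_ON(r < 0);			/* underflow */
	if (!r && ca->dying)		/* last put during device removal */
		complete(&ca->ref_completion);
#else
	percpu_ref_put(&ca->ref);
#endif
}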
-15
fs/bcachefs/super_types.h
···
 	u8		data[BCH_BKEY_PTRS_MAX];
 };
 
-struct bch_member_cpu {
-	u64		nbuckets;	/* device size */
-	u16		first_bucket;	/* index of first bucket used */
-	u16		bucket_size;	/* sectors */
-	u16		group;
-	u8		state;
-	u8		discard;
-	u8		data_allowed;
-	u8		durability;
-	u8		freespace_initialized;
-	u8		valid;
-	u8		btree_bitmap_shift;
-	u64		btree_allocated_bitmap;
-};
-
 #endif /* _BCACHEFS_SUPER_TYPES_H */
+25 -151
fs/bcachefs/sysfs.c
···
 write_attribute(trigger_discards);
 write_attribute(trigger_invalidates);
 write_attribute(trigger_journal_flush);
-write_attribute(prune_cache);
-write_attribute(btree_wakeup);
-rw_attribute(btree_gc_periodic);
+write_attribute(trigger_btree_cache_shrink);
+write_attribute(trigger_btree_key_cache_shrink);
 rw_attribute(gc_gens_pos);
 
 read_attribute(uuid);
···
 {
 	bch2_printbuf_tabstop_push(out, 24);
 
-	for (unsigned i = 0; i < ARRAY_SIZE(c->writes); i++) {
-		prt_str(out, bch2_write_refs[i]);
-		prt_tab(out);
-		prt_printf(out, "%li", atomic_long_read(&c->writes[i]));
-		prt_newline(out);
-	}
+	for (unsigned i = 0; i < ARRAY_SIZE(c->writes); i++)
+		prt_printf(out, "%s\t%li\n", bch2_write_refs[i], atomic_long_read(&c->writes[i]));
 }
 #endif
···
 			continue;
 
 		ret = for_each_btree_key(trans, iter, id, POS_MIN,
-					 BTREE_ITER_ALL_SNAPSHOTS, k, ({
+					 BTREE_ITER_all_snapshots, k, ({
 			struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
 			struct bch_extent_crc_unpacked crc;
 			const union bch_extent_entry *entry;
···
 	if (ret)
 		return ret;
 
-	prt_str(out, "type");
 	printbuf_tabstop_push(out, 12);
-	prt_tab(out);
-
-	prt_str(out, "compressed");
 	printbuf_tabstop_push(out, 16);
-	prt_tab_rjust(out);
-
-	prt_str(out, "uncompressed");
 	printbuf_tabstop_push(out, 16);
-	prt_tab_rjust(out);
-
-	prt_str(out, "average extent size");
 	printbuf_tabstop_push(out, 24);
-	prt_tab_rjust(out);
-	prt_newline(out);
+	prt_printf(out, "type\tcompressed\runcompressed\raverage extent size\r\n");
 
 	for (unsigned i = 0; i < ARRAY_SIZE(s); i++) {
 		bch2_prt_compression_type(out, i);
···
 	prt_printf(out, "\n");
 }
 
-static void bch2_btree_wakeup_all(struct bch_fs *c)
-{
-	struct btree_trans *trans;
-
-	seqmutex_lock(&c->btree_trans_lock);
-	list_for_each_entry(trans, &c->btree_trans_list, list) {
-		struct btree_bkey_cached_common *b = READ_ONCE(trans->locking);
-
-		if (b)
-			six_lock_wakeup_all(&b->lock);
-	}
-	seqmutex_unlock(&c->btree_trans_lock);
-}
-
 SHOW(bch2_fs)
 {
 	struct bch_fs *c = container_of(kobj, struct bch_fs, kobj);
···
 
 	if (attr == &sysfs_btree_write_stats)
 		bch2_btree_write_stats_to_text(out, c);
-
-	sysfs_printf(btree_gc_periodic, "%u", (int) c->btree_gc_periodic);
 
 	if (attr == &sysfs_gc_gens_pos)
 		bch2_gc_gens_pos_to_text(out, c);
···
 		bch2_journal_debug_to_text(out, &c->journal);
 
 	if (attr == &sysfs_btree_cache)
-		bch2_btree_cache_to_text(out, c);
+		bch2_btree_cache_to_text(out, &c->btree_cache);
 
 	if (attr == &sysfs_btree_key_cache)
 		bch2_btree_key_cache_to_text(out, &c->btree_key_cache);
···
 	if (attr == &sysfs_disk_groups)
 		bch2_disk_groups_to_text(out, c);
 
+	if (attr == &sysfs_alloc_debug)
+		bch2_fs_alloc_debug_to_text(out, c);
+
 	return 0;
 }
 
 STORE(bch2_fs)
 {
 	struct bch_fs *c = container_of(kobj, struct bch_fs, kobj);
-
-	if (attr == &sysfs_btree_gc_periodic) {
-		ssize_t ret = strtoul_safe(buf, c->btree_gc_periodic)
-			?: (ssize_t) size;
-
-		wake_up_process(c->gc_thread);
-		return ret;
-	}
 
 	if (attr == &sysfs_copy_gc_enabled) {
 		ssize_t ret = strtoul_safe(buf, c->copy_gc_enabled)
···
 	if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_sysfs))
 		return -EROFS;
 
-	if (attr == &sysfs_prune_cache) {
+	if (attr == &sysfs_trigger_btree_cache_shrink) {
 		struct shrink_control sc;
 
 		sc.gfp_mask = GFP_KERNEL;
···
 		c->btree_cache.shrink->scan_objects(c->btree_cache.shrink, &sc);
 	}
 
-	if (attr == &sysfs_btree_wakeup)
-		bch2_btree_wakeup_all(c);
+	if (attr == &sysfs_trigger_btree_key_cache_shrink) {
+		struct shrink_control sc;
 
-	if (attr == &sysfs_trigger_gc) {
-		/*
-		 * Full gc is currently incompatible with btree key cache:
-		 */
-#if 0
-		down_read(&c->state_lock);
-		bch2_gc(c, false, false);
-		up_read(&c->state_lock);
-#else
-		bch2_gc_gens(c);
-#endif
+		sc.gfp_mask = GFP_KERNEL;
+		sc.nr_to_scan = strtoul_or_return(buf);
+		c->btree_key_cache.shrink->scan_objects(c->btree_key_cache.shrink, &sc);
 	}
+
+	if (attr == &sysfs_trigger_gc)
+		bch2_gc_gens(c);
 
 	if (attr == &sysfs_trigger_discards)
 		bch2_do_discards(c);
···
 	if (attr == &sysfs_##t) {					\
 		counter             = percpu_u64_get(&c->counters[BCH_COUNTER_##t]);\
 		counter_since_mount = counter - c->counters_on_mount[BCH_COUNTER_##t];\
-		prt_printf(out, "since mount:");			\
-		prt_tab(out);						\
+		prt_printf(out, "since mount:\t");			\
 		prt_human_readable_u64(out, counter_since_mount);	\
 		prt_newline(out);					\
 									\
-		prt_printf(out, "since filesystem creation:");		\
-		prt_tab(out);						\
+		prt_printf(out, "since filesystem creation:\t");	\
 		prt_human_readable_u64(out, counter);			\
 		prt_newline(out);					\
 	}
···
 	&sysfs_trigger_discards,
 	&sysfs_trigger_invalidates,
 	&sysfs_trigger_journal_flush,
-	&sysfs_prune_cache,
-	&sysfs_btree_wakeup,
+	&sysfs_trigger_btree_cache_shrink,
+	&sysfs_trigger_btree_key_cache_shrink,
 
 	&sysfs_gc_gens_pos,
···
 	&sysfs_internal_uuid,
 
 	&sysfs_disk_groups,
+	&sysfs_alloc_debug,
 	NULL
 };
···
 	NULL
 };
 
-static void dev_alloc_debug_to_text(struct printbuf *out, struct bch_dev *ca)
-{
-	struct bch_fs *c = ca->fs;
-	struct bch_dev_usage stats = bch2_dev_usage_read(ca);
-	unsigned i, nr[BCH_DATA_NR];
-
-	memset(nr, 0, sizeof(nr));
-
-	for (i = 0; i < ARRAY_SIZE(c->open_buckets); i++)
-		nr[c->open_buckets[i].data_type]++;
-
-	printbuf_tabstop_push(out, 8);
-	printbuf_tabstop_push(out, 16);
-	printbuf_tabstop_push(out, 16);
-	printbuf_tabstop_push(out, 16);
-	printbuf_tabstop_push(out, 16);
-
-	bch2_dev_usage_to_text(out, &stats);
-
-	prt_newline(out);
-
-	prt_printf(out, "reserves:");
-	prt_newline(out);
-	for (i = 0; i < BCH_WATERMARK_NR; i++) {
-		prt_str(out, bch2_watermarks[i]);
-		prt_tab(out);
-		prt_u64(out, bch2_dev_buckets_reserved(ca, i));
-		prt_tab_rjust(out);
-		prt_newline(out);
-	}
-
-	prt_newline(out);
-
-	printbuf_tabstops_reset(out);
-	printbuf_tabstop_push(out, 24);
-
-	prt_str(out, "freelist_wait");
-	prt_tab(out);
-	prt_str(out, c->freelist_wait.list.first ? "waiting" : "empty");
-	prt_newline(out);
-
-	prt_str(out, "open buckets allocated");
-	prt_tab(out);
-	prt_u64(out, OPEN_BUCKETS_COUNT - c->open_buckets_nr_free);
-	prt_newline(out);
-
-	prt_str(out, "open buckets this dev");
-	prt_tab(out);
-	prt_u64(out, ca->nr_open_buckets);
-	prt_newline(out);
-
-	prt_str(out, "open buckets total");
-	prt_tab(out);
-	prt_u64(out, OPEN_BUCKETS_COUNT);
-	prt_newline(out);
-
-	prt_str(out, "open_buckets_wait");
-	prt_tab(out);
-	prt_str(out, c->open_buckets_wait.list.first ? "waiting" : "empty");
-	prt_newline(out);
-
-	prt_str(out, "open_buckets_btree");
-	prt_tab(out);
-	prt_u64(out, nr[BCH_DATA_btree]);
-	prt_newline(out);
-
-	prt_str(out, "open_buckets_user");
-	prt_tab(out);
-	prt_u64(out, nr[BCH_DATA_user]);
-	prt_newline(out);
-
-	prt_str(out, "buckets_to_invalidate");
-	prt_tab(out);
-	prt_u64(out, should_invalidate_buckets(ca, stats));
-	prt_newline(out);
-
-	prt_str(out, "btree reserve cache");
-	prt_tab(out);
-	prt_u64(out, c->btree_reserve_cache_nr);
-	prt_newline(out);
-}
-
 static const char * const bch2_rw[] = {
 	"read",
 	"write",
···
 			* 100 / CONGESTED_MAX);
 
 	if (attr == &sysfs_alloc_debug)
-		dev_alloc_debug_to_text(out, ca);
+		bch2_dev_alloc_debug_to_text(out, ca);
 
 	return 0;
 }
+8 -8
fs/bcachefs/tests.c
···
 	k.k.p.snapshot = U32_MAX;
 
 	bch2_trans_iter_init(trans, &iter, BTREE_ID_xattrs, k.k.p,
-			     BTREE_ITER_INTENT);
+			     BTREE_ITER_intent);
 
 	ret = commit_do(trans, NULL, NULL, 0,
 		bch2_btree_iter_traverse(&iter) ?:
···
 	k.k.p.snapshot = U32_MAX;
 
 	bch2_trans_iter_init(trans, &iter, BTREE_ID_xattrs, k.k.p,
-			     BTREE_ITER_INTENT);
+			     BTREE_ITER_intent);
 
 	ret = commit_do(trans, NULL, NULL, 0,
 		bch2_btree_iter_traverse(&iter) ?:
···
 	ret = bch2_trans_run(c,
 		for_each_btree_key_upto(trans, iter, BTREE_ID_xattrs,
 					SPOS(0, 0, U32_MAX), POS(0, U64_MAX),
-					BTREE_ITER_SLOTS, k, ({
+					BTREE_ITER_slots, k, ({
 			if (i >= nr * 2)
 				break;
···
 	ret = bch2_trans_run(c,
 		for_each_btree_key_upto(trans, iter, BTREE_ID_extents,
 					SPOS(0, 0, U32_MAX), POS(0, U64_MAX),
-					BTREE_ITER_SLOTS, k, ({
+					BTREE_ITER_slots, k, ({
 			if (i == nr)
 				break;
 			BUG_ON(bkey_deleted(k.k) != !(i % 16));
···
 
 	ret = bch2_trans_do(c, NULL, NULL, 0,
 		bch2_btree_insert_nonextent(trans, BTREE_ID_extents, &k.k_i,
-					    BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE));
+					    BTREE_UPDATE_internal_snapshot_node));
 	bch_err_fn(c, ret);
 	return ret;
 }
···
 	int ret = 0;
 
 	bch2_trans_iter_init(trans, &iter, BTREE_ID_xattrs, pos,
-			     BTREE_ITER_INTENT);
+			     BTREE_ITER_intent);
 	k = bch2_btree_iter_peek_upto(&iter, POS(0, U64_MAX));
 	ret = bkey_err(k);
 	if (ret)
···
 	return bch2_trans_run(c,
 		for_each_btree_key_commit(trans, iter, BTREE_ID_xattrs,
 				SPOS(0, 0, U32_MAX),
-				BTREE_ITER_SLOTS|BTREE_ITER_INTENT, k,
+				BTREE_ITER_slots|BTREE_ITER_intent, k,
 				NULL, NULL, 0, ({
 			if (iter.pos.offset >= nr)
 				break;
···
 	return bch2_trans_run(c,
 		for_each_btree_key_commit(trans, iter, BTREE_ID_xattrs,
 				SPOS(0, 0, U32_MAX),
-				BTREE_ITER_INTENT, k,
+				BTREE_ITER_intent, k,
 				NULL, NULL, 0, ({
 			struct bkey_i_cookie u;
+6 -91
fs/bcachefs/trace.h
···
 
 /* Allocator */
 
-DECLARE_EVENT_CLASS(bucket_alloc,
-	TP_PROTO(struct bch_dev *ca, const char *alloc_reserve,
-		 u64 bucket,
-		 u64 free,
-		 u64 avail,
-		 u64 copygc_wait_amount,
-		 s64 copygc_waiting_for,
-		 struct bucket_alloc_state *s,
-		 bool nonblocking,
-		 const char *err),
-	TP_ARGS(ca, alloc_reserve, bucket, free, avail,
-		copygc_wait_amount, copygc_waiting_for,
-		s, nonblocking, err),
-
-	TP_STRUCT__entry(
-		__field(u8,	dev			)
-		__array(char,	reserve,	16	)
-		__field(u64,	bucket			)
-		__field(u64,	free			)
-		__field(u64,	avail			)
-		__field(u64,	copygc_wait_amount	)
-		__field(s64,	copygc_waiting_for	)
-		__field(u64,	seen			)
-		__field(u64,	open			)
-		__field(u64,	need_journal_commit	)
-		__field(u64,	nouse			)
-		__field(bool,	nonblocking		)
-		__field(u64,	nocow			)
-		__array(char,	err,		32	)
-	),
-
-	TP_fast_assign(
-		__entry->dev			= ca->dev_idx;
-		strscpy(__entry->reserve, alloc_reserve, sizeof(__entry->reserve));
-		__entry->bucket			= bucket;
-		__entry->free			= free;
-		__entry->avail			= avail;
-		__entry->copygc_wait_amount	= copygc_wait_amount;
-		__entry->copygc_waiting_for	= copygc_waiting_for;
-		__entry->seen			= s->buckets_seen;
-		__entry->open			= s->skipped_open;
-		__entry->need_journal_commit	= s->skipped_need_journal_commit;
-		__entry->nouse			= s->skipped_nouse;
-		__entry->nonblocking		= nonblocking;
-		__entry->nocow			= s->skipped_nocow;
-		strscpy(__entry->err, err, sizeof(__entry->err));
-	),
-
-	TP_printk("reserve %s bucket %u:%llu free %llu avail %llu copygc_wait %llu/%lli seen %llu open %llu need_journal_commit %llu nouse %llu nocow %llu nonblocking %u err %s",
-		  __entry->reserve,
-		  __entry->dev,
-		  __entry->bucket,
-		  __entry->free,
-		  __entry->avail,
-		  __entry->copygc_wait_amount,
-		  __entry->copygc_waiting_for,
-		  __entry->seen,
-		  __entry->open,
-		  __entry->need_journal_commit,
-		  __entry->nouse,
-		  __entry->nocow,
-		  __entry->nonblocking,
-		  __entry->err)
+DEFINE_EVENT(fs_str, bucket_alloc,
+	TP_PROTO(struct bch_fs *c, const char *str),
+	TP_ARGS(c, str)
 );
 
-DEFINE_EVENT(bucket_alloc, bucket_alloc,
-	TP_PROTO(struct bch_dev *ca, const char *alloc_reserve,
-		 u64 bucket,
-		 u64 free,
-		 u64 avail,
-		 u64 copygc_wait_amount,
-		 s64 copygc_waiting_for,
-		 struct bucket_alloc_state *s,
-		 bool nonblocking,
-		 const char *err),
-	TP_ARGS(ca, alloc_reserve, bucket, free, avail,
-		copygc_wait_amount, copygc_waiting_for,
-		s, nonblocking, err)
-);
-
-DEFINE_EVENT(bucket_alloc, bucket_alloc_fail,
-	TP_PROTO(struct bch_dev *ca, const char *alloc_reserve,
-		 u64 bucket,
-		 u64 free,
-		 u64 avail,
-		 u64 copygc_wait_amount,
-		 s64 copygc_waiting_for,
-		 struct bucket_alloc_state *s,
-		 bool nonblocking,
-		 const char *err),
-	TP_ARGS(ca, alloc_reserve, bucket, free, avail,
-		copygc_wait_amount, copygc_waiting_for,
-		s, nonblocking, err)
+DEFINE_EVENT(fs_str, bucket_alloc_fail,
+	TP_PROTO(struct bch_fs *c, const char *str),
+	TP_ARGS(c, str)
 );
 
 TRACE_EVENT(discard_buckets,
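
Both events are now instances of the existing fs_str event class: the caller formats the whole message into a printbuf and the tracepoint records a single string, so adding a field no longer means touching the TP_STRUCT/TP_fast_assign/TP_printk triple. A hedged sketch of the emitting side (the real call site in the allocator prints considerably more state):

/* Hedged sketch of a caller after the conversion; the fields printed here
 * are illustrative only: */
static void bucket_alloc_fail_trace(struct bch_fs *c, struct bch_dev *ca,
				    const char *err)
{
	if (trace_bucket_alloc_fail_enabled()) {
		struct printbuf buf = PRINTBUF;

		prt_printf(&buf, "dev %u err %s", ca->dev_idx, err);
		trace_bucket_alloc_fail(c, buf.buf);
		printbuf_exit(&buf);
	}
}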
+17 -44
fs/bcachefs/util.c
···
 {
 	const struct time_unit *u = bch2_pick_time_units(ns);
 
-	prt_printf(out, "%llu ", div64_u64(ns, u->nsecs));
-	prt_tab_rjust(out);
-	prt_printf(out, "%s", u->name);
+	prt_printf(out, "%llu \r%s", div64_u64(ns, u->nsecs), u->name);
 }
 
 static inline void pr_name_and_units(struct printbuf *out, const char *name, u64 ns)
 {
-	prt_str(out, name);
-	prt_tab(out);
+	prt_printf(out, "%s\t", name);
 	bch2_pr_time_units_aligned(out, ns);
 	prt_newline(out);
 }
···
 	}
 
 	printbuf_tabstop_push(out, out->indent + TABSTOP_SIZE);
-	prt_printf(out, "count:");
-	prt_tab(out);
-	prt_printf(out, "%llu ",
-		   stats->duration_stats.n);
+	prt_printf(out, "count:\t%llu\n", stats->duration_stats.n);
 	printbuf_tabstop_pop(out);
-	prt_newline(out);
 
 	printbuf_tabstops_reset(out);
···
 	printbuf_tabstop_push(out, 0);
 	printbuf_tabstop_push(out, TABSTOP_SIZE + 2);
 
-	prt_tab(out);
-	prt_printf(out, "since mount");
-	prt_tab_rjust(out);
-	prt_tab(out);
-	prt_printf(out, "recent");
-	prt_tab_rjust(out);
-	prt_newline(out);
+	prt_printf(out, "\tsince mount\r\trecent\r\n");
 
 	printbuf_tabstops_reset(out);
 	printbuf_tabstop_push(out, out->indent + 20);
···
 	printbuf_tabstop_push(out, 2);
 	printbuf_tabstop_push(out, TABSTOP_SIZE);
 
-	prt_printf(out, "duration of events");
-	prt_newline(out);
+	prt_printf(out, "duration of events\n");
 	printbuf_indent_add(out, 2);
 
 	pr_name_and_units(out, "min:", stats->min_duration);
 	pr_name_and_units(out, "max:", stats->max_duration);
 	pr_name_and_units(out, "total:", stats->total_duration);
 
-	prt_printf(out, "mean:");
-	prt_tab(out);
+	prt_printf(out, "mean:\t");
 	bch2_pr_time_units_aligned(out, d_mean);
 	prt_tab(out);
 	bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT));
 	prt_newline(out);
 
-	prt_printf(out, "stddev:");
-	prt_tab(out);
+	prt_printf(out, "stddev:\t");
 	bch2_pr_time_units_aligned(out, d_stddev);
 	prt_tab(out);
 	bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->duration_stats_weighted, TIME_STATS_MV_WEIGHT));
···
 	printbuf_indent_sub(out, 2);
 	prt_newline(out);
 
-	prt_printf(out, "time between events");
-	prt_newline(out);
+	prt_printf(out, "time between events\n");
 	printbuf_indent_add(out, 2);
 
 	pr_name_and_units(out, "min:", stats->min_freq);
 	pr_name_and_units(out, "max:", stats->max_freq);
 
-	prt_printf(out, "mean:");
-	prt_tab(out);
+	prt_printf(out, "mean:\t");
 	bch2_pr_time_units_aligned(out, f_mean);
 	prt_tab(out);
 	bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_mean(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT));
 	prt_newline(out);
 
-	prt_printf(out, "stddev:");
-	prt_tab(out);
+	prt_printf(out, "stddev:\t");
 	bch2_pr_time_units_aligned(out, f_stddev);
 	prt_tab(out);
 	bch2_pr_time_units_aligned(out, mean_and_variance_weighted_get_stddev(stats->freq_stats_weighted, TIME_STATS_MV_WEIGHT));
···
 	if (!out->nr_tabstops)
 		printbuf_tabstop_push(out, 20);
 
-	prt_printf(out, "rate:");
-	prt_tab(out);
+	prt_printf(out, "rate:\t");
 	prt_human_readable_s64(out, pd->rate.rate);
 	prt_newline(out);
 
-	prt_printf(out, "target:");
-	prt_tab(out);
+	prt_printf(out, "target:\t");
 	prt_human_readable_u64(out, pd->last_target);
 	prt_newline(out);
 
-	prt_printf(out, "actual:");
-	prt_tab(out);
+	prt_printf(out, "actual:\t");
 	prt_human_readable_u64(out, pd->last_actual);
 	prt_newline(out);
 
-	prt_printf(out, "proportional:");
-	prt_tab(out);
+	prt_printf(out, "proportional:\t");
 	prt_human_readable_s64(out, pd->last_proportional);
 	prt_newline(out);
 
-	prt_printf(out, "derivative:");
-	prt_tab(out);
+	prt_printf(out, "derivative:\t");
 	prt_human_readable_s64(out, pd->last_derivative);
 	prt_newline(out);
 
-	prt_printf(out, "change:");
-	prt_tab(out);
+	prt_printf(out, "change:\t");
 	prt_human_readable_s64(out, pd->last_change);
 	prt_newline(out);
 
-	prt_printf(out, "next io:");
-	prt_tab(out);
-	prt_printf(out, "%llims", div64_s64(pd->rate.next - local_clock(), NSEC_PER_MSEC));
-	prt_newline(out);
+	prt_printf(out, "next io:\t%llims\n", div64_s64(pd->rate.next - local_clock(), NSEC_PER_MSEC));
 }
 
 /* misc: */
+23 -24
fs/bcachefs/xattr.c
···
 };
 
 int bch2_xattr_invalid(struct bch_fs *c, struct bkey_s_c k,
-		       enum bkey_invalid_flags flags,
+		       enum bch_validate_flags flags,
 		       struct printbuf *err)
 {
 	struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k);
···
 	else
 		prt_printf(out, "(unknown type %u)", xattr.v->x_type);
 
+	unsigned name_len = xattr.v->x_name_len;
+	unsigned val_len  = le16_to_cpu(xattr.v->x_val_len);
+	unsigned max_name_val_bytes = bkey_val_bytes(xattr.k) -
+		offsetof(struct bch_xattr, x_name);
+
+	val_len  = min_t(int, val_len, max_name_val_bytes - name_len);
+	name_len = min(name_len, max_name_val_bytes);
+
 	prt_printf(out, "%.*s:%.*s",
-		   xattr.v->x_name_len,
-		   xattr.v->x_name,
-		   le16_to_cpu(xattr.v->x_val_len),
-		   (char *) xattr_val(xattr.v));
+		   name_len, xattr.v->x_name,
+		   val_len, (char *) xattr_val(xattr.v));
 
 	if (xattr.v->x_type == KEY_TYPE_XATTR_INDEX_POSIX_ACL_ACCESS ||
 	    xattr.v->x_type == KEY_TYPE_XATTR_INDEX_POSIX_ACL_DEFAULT) {
···
 	struct bch_hash_info hash = bch2_hash_info_init(trans->c, &inode->ei_inode);
 	struct xattr_search_key search = X_SEARCH(type, name, strlen(name));
 	struct btree_iter iter;
-	struct bkey_s_c_xattr xattr;
-	struct bkey_s_c k;
-	int ret;
-
-	ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc, &hash,
-			       inode_inum(inode), &search, 0);
+	struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc, &hash,
+					     inode_inum(inode), &search, 0);
+	int ret = bkey_err(k);
 	if (ret)
-		goto err1;
+		return ret;
 
-	k = bch2_btree_iter_peek_slot(&iter);
-	ret = bkey_err(k);
-	if (ret)
-		goto err2;
-
-	xattr = bkey_s_c_to_xattr(k);
+	struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k);
 	ret = le16_to_cpu(xattr.v->x_val_len);
 	if (buffer) {
 		if (ret > size)
···
 		else
 			memcpy(buffer, xattr_val(xattr.v), ret);
 	}
-err2:
 	bch2_trans_iter_exit(trans, &iter);
-err1:
-	return ret < 0 && bch2_err_matches(ret, ENOENT) ? -ENODATA : ret;
+	return ret;
 }
···
 	int ret;
 
 	ret   = bch2_subvol_is_ro_trans(trans, inum.subvol) ?:
-		bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_INTENT);
+		bch2_inode_peek(trans, &inode_iter, inode_u, inum, BTREE_ITER_intent);
 	if (ret)
 		return ret;
···
 
 		ret = bch2_hash_set(trans, bch2_xattr_hash_desc, hash_info,
 				    inum, &xattr->k_i,
-				    (flags & XATTR_CREATE ? BCH_HASH_SET_MUST_CREATE : 0)|
-				    (flags & XATTR_REPLACE ? BCH_HASH_SET_MUST_REPLACE : 0));
+				    (flags & XATTR_CREATE ? STR_HASH_must_create : 0)|
+				    (flags & XATTR_REPLACE ? STR_HASH_must_replace : 0));
 	} else {
 		struct xattr_search_key search =
 			X_SEARCH(type, name, strlen(name));
···
 	struct bch_fs *c = inode->v.i_sb->s_fs_info;
 	int ret = bch2_trans_do(c, NULL, NULL, 0,
 		bch2_xattr_get_trans(trans, inode, name, buffer, size, handler->flags));
+
+	if (ret < 0 && bch2_err_matches(ret, ENOENT))
+		ret = -ENODATA;
 
 	return bch2_err_class(ret);
 }
+1 -1
fs/bcachefs/xattr.h
···
 extern const struct bch_hash_desc bch2_xattr_hash_desc;
 
 int bch2_xattr_invalid(struct bch_fs *, struct bkey_s_c,
-		       enum bkey_invalid_flags, struct printbuf *);
+		       enum bch_validate_flags, struct printbuf *);
 void bch2_xattr_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_xattr ((struct bkey_ops) {	\
+12
include/linux/closure.h
···
 	__closure_sync(cl);
 }
 
+int __closure_sync_timeout(struct closure *cl, unsigned long timeout);
+
+static inline int closure_sync_timeout(struct closure *cl, unsigned long timeout)
+{
+#ifdef CONFIG_DEBUG_CLOSURES
+	BUG_ON(closure_nr_remaining(cl) != 1 && !cl->closure_get_happened);
+#endif
+	return cl->closure_get_happened
+		? __closure_sync_timeout(cl, timeout)
+		: 0;
+}
+
 #ifdef CONFIG_DEBUG_CLOSURES
 
 void closure_debug_create(struct closure *cl);
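
closure_sync_timeout() is closure_sync() with a bound: it returns 0 once all outstanding references have been dropped, or -ETIME if the timeout (in jiffies) expires first, in which case the closure is handed back to the caller still pending. An illustrative caller, with hypothetical names:

/* Illustrative only: wait up to a second for in-flight work tracked by a
 * closure, and report a timeout instead of blocking forever. */
static int wait_for_pending_io(struct closure *cl)
{
	int ret = closure_sync_timeout(cl, HZ);

	if (ret == -ETIME)
		/* still pending; caller keeps ownership and may sync later */
		return ret;

	return 0;
}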
+1
include/uapi/linux/magic.h
···
 #define HOSTFS_SUPER_MAGIC	0x00c0ffee
 #define OVERLAYFS_SUPER_MAGIC	0x794c7630
 #define FUSE_SUPER_MAGIC	0x65735546
+#define BCACHEFS_SUPER_MAGIC	0xca451a4e
 
 #define MINIX_SUPER_MAGIC	0x137F	/* minix v1 fs, 14 char names */
 #define MINIX_SUPER_MAGIC2	0x138F	/* minix v1 fs, 30 char names */
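
With the magic number in the shared UAPI header, userspace can identify a bcachefs mount from statfs(2). A minimal sketch:

#include <sys/vfs.h>
#include <linux/magic.h>

/* Minimal sketch: 1 if path is on bcachefs, 0 if not, -1 on error */
int is_bcachefs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st))
		return -1;

	return st.f_type == BCACHEFS_SUPER_MAGIC;
}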
+37
lib/closure.c
···
 }
 EXPORT_SYMBOL(__closure_sync);
 
+int __sched __closure_sync_timeout(struct closure *cl, unsigned long timeout)
+{
+	struct closure_syncer s = { .task = current };
+	int ret = 0;
+
+	cl->s = &s;
+	continue_at(cl, closure_sync_fn, NULL);
+
+	while (1) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (s.done)
+			break;
+		if (!timeout) {
+			/*
+			 * Carefully undo the continue_at() - but only if it
+			 * hasn't completed, i.e. the final closure_put() hasn't
+			 * happened yet:
+			 */
+			unsigned old, new, v = atomic_read(&cl->remaining);
+			do {
+				old = v;
+				if (!old || (old & CLOSURE_RUNNING))
+					goto success;
+
+				new = old + CLOSURE_REMAINING_INITIALIZER;
+			} while ((v = atomic_cmpxchg(&cl->remaining, old, new)) != old);
+			ret = -ETIME;
+		}
+
+		timeout = schedule_timeout(timeout);
+	}
+success:
+	__set_current_state(TASK_RUNNING);
+	return ret;
+}
+EXPORT_SYMBOL(__closure_sync_timeout);
+
 #ifdef CONFIG_DEBUG_CLOSURES
 
 static LIST_HEAD(closure_list);