Merge tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs

Pull more bcachefs updates from Kent Overstreet:
"More bcachefs updates:

- More stack usage improvements (~600 bytes)

- Define CLASS()es for some commonly used types, and convert most
rcu_read_lock() uses to the new lock guards (a short sketch of this
conversion follows the list)

- New introspection:
  - Superblock error counters are now available in sysfs:
    previously, they were only visible with 'show-super', which
    doesn't provide a live view
  - New tracepoint, error_throw(), which is called any time we
    return an error and start to unwind (also sketched after the list)

- Repair
  - check_fix_ptrs() can now repair btree node roots
  - We can now repair when we've somehow ended up with the journal
    using a superblock bucket

- Revert some leftovers from the aborted directory i_size feature,
and add repair code: some userspace programs (e.g. sshfs) were
getting confused
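
To make the lock-guard item above concrete, here is a minimal
before/after sketch of the rcu_read_lock()/rcu_read_unlock() to
guard(rcu)() conversion. It assumes ordinary kernel context
(<linux/cleanup.h>, <linux/rcupdate.h>); 'struct thing' and
'thing_table' are illustrative stand-ins, not bcachefs code, but the
shape matches conversions in the diffs below such as
bch2_dev_bucket_exists():

	#include <linux/cleanup.h>	/* guard(), scoped_guard() */
	#include <linux/rcupdate.h>

	struct thing { bool valid; };
	extern struct thing __rcu *thing_table[16];

	/* Before: explicit lock/unlock pair, easy to unbalance on early returns */
	static bool thing_valid_old(unsigned idx)
	{
		rcu_read_lock();
		struct thing *t = rcu_dereference(thing_table[idx]);
		bool ret = t && t->valid;
		rcu_read_unlock();
		return ret;
	}

	/* After: the guard drops the RCU read lock automatically at scope exit */
	static bool thing_valid_new(unsigned idx)
	{
		guard(rcu)();
		struct thing *t = rcu_dereference(thing_table[idx]);
		return t && t->valid;
	}

Where only part of a function needs the read-side critical section, the
diffs use scoped_guard(rcu) { ... } instead.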
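
Likewise for the error_throw() tracepoint item: the bcachefs.h hunk
further down adds a small helper plus a bch_err_throw() macro, and call
sites that used to return -BCH_ERR_* literals now go through it, so the
tracepoint fires with the location that threw the error. The helper
below mirrors that hunk; the example caller is hypothetical and assumes
the usual bcachefs types and error codes:

	/* Mirrors the new helper in bcachefs.h: record the error and the
	 * throwing instruction pointer via the error_throw tracepoint,
	 * then hand the error code back */
	static inline int __bch2_err_trace(struct bch_fs *c, int err)
	{
		trace_error_throw(c, err, _THIS_IP_);
		return err;
	}

	#define bch_err_throw(_c, _err)	__bch2_err_trace(_c, -BCH_ERR_##_err)

	/* Hypothetical call site: previously 'return -BCH_ERR_freelist_empty;' */
	static int example_alloc_path(struct bch_fs *c)
	{
		return bch_err_throw(c, freelist_empty);
	}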

It seems in 6.15 there's a bug where i_nlink on the vfs inode has been
getting incorrectly set to 0, with some unfortunate results;
list_journal analysis showed bch2_inode_rm() being called (by
bch2_evict_inode()) when it clearly should not have been.

- bch2_inode_rm() now runs "should we be deleting this inode?" checks
that were previously only run when deleting unlinked inodes in
recovery

- check_subvol() was treating a dangling subvol (pointing to a
missing root inode) like a dangling dirent, and deleting it. This
was the really unfortunate one: check_subvol() will now recreate
the root inode if necessary

This took longer to debug than it should have, and we lost several
filesystems unnecessarily, because users have been ignoring the
release notes and blindly running 'fsck -y'. Debugging required
reconstructing what happened through analyzing the journal, when
ideally someone would have noticed 'hey, fsck is asking me if I want
to repair this: it usually doesn't, maybe I should run this in dry run
mode and check what's going on?'

As a reminder, fsck errors are being marked as autofix once we've
verified, in real world usage, that they're working correctly; blindly
running 'fsck -y' on an experimental filesystem is playing with fire

Up to this incident we've had an excellent track record of not losing
data, so let's try to learn from this one

This is a community effort; I wouldn't be able to get this done
without the help of all the people QAing and providing excellent bug
reports and feedback based on real world usage. But please don't
ignore advice and expect me to pick up the pieces

If an error isn't marked as autofix, and it /is/ happening in the
wild, that's also something I need to know about so we can check it
out and add it to the autofix list if repair looks good. I haven't
been getting those reports, and I should be; since we don't have any
sort of telemetry yet I am absolutely dependent on user reports

Now I'll be spending the weekend working on new repair code to see if
I can get a filesystem back for a user who didn't have backups"

* tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs: (69 commits)
bcachefs: add cond_resched() to handle_overwrites()
bcachefs: Make journal read log message a bit quieter
bcachefs: Fix subvol to missing root repair
bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm()
bcachefs: delete dead code from may_delete_deleted_inode()
bcachefs: Add flags to subvolume_to_text()
bcachefs: Fix oops in btree_node_seq_matches()
bcachefs: Fix dirent_casefold_mismatch repair
bcachefs: Fix bch2_fsck_rename_dirent() for casefold
bcachefs: Redo bch2_dirent_init_name()
bcachefs: Fix -Wc23-extensions in bch2_check_dirents()
bcachefs: Run check_dirents second time if required
bcachefs: Run snapshot deletion out of system_long_wq
bcachefs: Make check_key_has_snapshot safer
bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT
bcachefs: bch2_require_recovery_pass()
bcachefs: bch_err_throw()
bcachefs: Repair code for directory i_size
bcachefs: Kill un-reverted directory i_size code
bcachefs: Delete redundant fsck_err()
...

+2334 -1768
+40 -39
fs/bcachefs/alloc_background.c
··· 21 #include "error.h" 22 #include "lru.h" 23 #include "recovery.h" 24 - #include "trace.h" 25 #include "varint.h" 26 27 #include <linux/kthread.h> ··· 336 a->stripe_sectors = swab32(a->stripe_sectors); 337 } 338 339 - void bch2_alloc_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) 340 { 341 - struct bch_alloc_v4 _a; 342 - const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &_a); 343 - struct bch_dev *ca = c ? bch2_dev_bucket_tryget_noerror(c, k.k->p) : NULL; 344 345 prt_newline(out); 346 printbuf_indent_add(out, 2); ··· 365 printbuf_indent_sub(out, 2); 366 367 bch2_dev_put(ca); 368 } 369 370 void __bch2_alloc_to_v4(struct bkey_s_c k, struct bch_alloc_v4 *out) ··· 708 set ? "" : "un", 709 bch2_btree_id_str(btree), 710 buf.buf); 711 - if (ret == -BCH_ERR_fsck_ignore || 712 - ret == -BCH_ERR_fsck_errors_not_fixed) 713 ret = 0; 714 715 printbuf_exit(&buf); ··· 865 866 struct bch_dev *ca = bch2_dev_bucket_tryget(c, new.k->p); 867 if (!ca) 868 - return -BCH_ERR_trigger_alloc; 869 870 struct bch_alloc_v4 old_a_convert; 871 const struct bch_alloc_v4 *old_a = bch2_alloc_to_v4(old, &old_a_convert); ··· 999 } 1000 1001 if (new_a->gen != old_a->gen) { 1002 - rcu_read_lock(); 1003 u8 *gen = bucket_gen(ca, new.k->p.offset); 1004 - if (unlikely(!gen)) { 1005 - rcu_read_unlock(); 1006 goto invalid_bucket; 1007 - } 1008 *gen = new_a->gen; 1009 - rcu_read_unlock(); 1010 } 1011 1012 #define eval_state(_a, expr) ({ const struct bch_alloc_v4 *a = _a; expr; }) ··· 1029 } 1030 1031 if ((flags & BTREE_TRIGGER_gc) && (flags & BTREE_TRIGGER_insert)) { 1032 - rcu_read_lock(); 1033 struct bucket *g = gc_bucket(ca, new.k->p.offset); 1034 - if (unlikely(!g)) { 1035 - rcu_read_unlock(); 1036 goto invalid_bucket; 1037 - } 1038 g->gen_valid = 1; 1039 g->gen = new_a->gen; 1040 - rcu_read_unlock(); 1041 } 1042 err: 1043 fsck_err: ··· 1044 invalid_bucket: 1045 bch2_fs_inconsistent(c, "reference to invalid bucket\n%s", 1046 (bch2_bkey_val_to_text(&buf, c, new.s_c), buf.buf)); 1047 - ret = -BCH_ERR_trigger_alloc; 1048 goto err; 1049 } 1050 ··· 1110 bucket->offset = 0; 1111 } 1112 1113 - rcu_read_lock(); 1114 *ca = __bch2_next_dev_idx(c, bucket->inode, NULL); 1115 if (*ca) { 1116 *bucket = POS((*ca)->dev_idx, (*ca)->mi.first_bucket); 1117 bch2_dev_get(*ca); 1118 } 1119 - rcu_read_unlock(); 1120 1121 return *ca != NULL; 1122 } ··· 1458 ret = bch2_btree_bit_mod_iter(trans, iter, false) ?: 1459 bch2_trans_commit(trans, NULL, NULL, 1460 BCH_TRANS_COMMIT_no_enospc) ?: 1461 - -BCH_ERR_transaction_restart_commit; 1462 goto out; 1463 } else { 1464 /* ··· 1781 1782 static int discard_in_flight_add(struct bch_dev *ca, u64 bucket, bool in_progress) 1783 { 1784 int ret; 1785 1786 mutex_lock(&ca->discard_buckets_in_flight_lock); 1787 - darray_for_each(ca->discard_buckets_in_flight, i) 1788 - if (i->bucket == bucket) { 1789 - ret = -BCH_ERR_EEXIST_discard_in_flight_add; 1790 - goto out; 1791 - } 1792 1793 ret = darray_push(&ca->discard_buckets_in_flight, ((struct discard_in_flight) { 1794 .in_progress = in_progress, ··· 1804 static void discard_in_flight_remove(struct bch_dev *ca, u64 bucket) 1805 { 1806 mutex_lock(&ca->discard_buckets_in_flight_lock); 1807 - darray_for_each(ca->discard_buckets_in_flight, i) 1808 - if (i->bucket == bucket) { 1809 - BUG_ON(!i->in_progress); 1810 - darray_remove_item(&ca->discard_buckets_in_flight, i); 1811 - goto found; 1812 - } 1813 - BUG(); 1814 - found: 1815 mutex_unlock(&ca->discard_buckets_in_flight_lock); 1816 } 1817 ··· 2507 2508 lockdep_assert_held(&c->state_lock); 2509 2510 - 
rcu_read_lock(); 2511 for_each_member_device_rcu(c, ca, NULL) { 2512 struct block_device *bdev = READ_ONCE(ca->disk_sb.bdev); 2513 if (bdev) ··· 2552 bucket_size_max = max_t(unsigned, bucket_size_max, 2553 ca->mi.bucket_size); 2554 } 2555 - rcu_read_unlock(); 2556 2557 bch2_set_ra_pages(c, ra_pages); 2558 ··· 2576 { 2577 u64 ret = U64_MAX; 2578 2579 - rcu_read_lock(); 2580 for_each_rw_member_rcu(c, ca) 2581 ret = min(ret, ca->mi.nbuckets * ca->mi.bucket_size); 2582 - rcu_read_unlock(); 2583 return ret; 2584 } 2585
··· 21 #include "error.h" 22 #include "lru.h" 23 #include "recovery.h" 24 #include "varint.h" 25 26 #include <linux/kthread.h> ··· 337 a->stripe_sectors = swab32(a->stripe_sectors); 338 } 339 340 + static inline void __bch2_alloc_v4_to_text(struct printbuf *out, struct bch_fs *c, 341 + unsigned dev, const struct bch_alloc_v4 *a) 342 { 343 + struct bch_dev *ca = c ? bch2_dev_tryget_noerror(c, dev) : NULL; 344 345 prt_newline(out); 346 printbuf_indent_add(out, 2); ··· 367 printbuf_indent_sub(out, 2); 368 369 bch2_dev_put(ca); 370 + } 371 + 372 + void bch2_alloc_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) 373 + { 374 + struct bch_alloc_v4 _a; 375 + const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &_a); 376 + 377 + __bch2_alloc_v4_to_text(out, c, k.k->p.inode, a); 378 + } 379 + 380 + void bch2_alloc_v4_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) 381 + { 382 + __bch2_alloc_v4_to_text(out, c, k.k->p.inode, bkey_s_c_to_alloc_v4(k).v); 383 } 384 385 void __bch2_alloc_to_v4(struct bkey_s_c k, struct bch_alloc_v4 *out) ··· 697 set ? "" : "un", 698 bch2_btree_id_str(btree), 699 buf.buf); 700 + if (bch2_err_matches(ret, BCH_ERR_fsck_ignore) || 701 + bch2_err_matches(ret, BCH_ERR_fsck_errors_not_fixed)) 702 ret = 0; 703 704 printbuf_exit(&buf); ··· 854 855 struct bch_dev *ca = bch2_dev_bucket_tryget(c, new.k->p); 856 if (!ca) 857 + return bch_err_throw(c, trigger_alloc); 858 859 struct bch_alloc_v4 old_a_convert; 860 const struct bch_alloc_v4 *old_a = bch2_alloc_to_v4(old, &old_a_convert); ··· 988 } 989 990 if (new_a->gen != old_a->gen) { 991 + guard(rcu)(); 992 u8 *gen = bucket_gen(ca, new.k->p.offset); 993 + if (unlikely(!gen)) 994 goto invalid_bucket; 995 *gen = new_a->gen; 996 } 997 998 #define eval_state(_a, expr) ({ const struct bch_alloc_v4 *a = _a; expr; }) ··· 1021 } 1022 1023 if ((flags & BTREE_TRIGGER_gc) && (flags & BTREE_TRIGGER_insert)) { 1024 + guard(rcu)(); 1025 struct bucket *g = gc_bucket(ca, new.k->p.offset); 1026 + if (unlikely(!g)) 1027 goto invalid_bucket; 1028 g->gen_valid = 1; 1029 g->gen = new_a->gen; 1030 } 1031 err: 1032 fsck_err: ··· 1039 invalid_bucket: 1040 bch2_fs_inconsistent(c, "reference to invalid bucket\n%s", 1041 (bch2_bkey_val_to_text(&buf, c, new.s_c), buf.buf)); 1042 + ret = bch_err_throw(c, trigger_alloc); 1043 goto err; 1044 } 1045 ··· 1105 bucket->offset = 0; 1106 } 1107 1108 + guard(rcu)(); 1109 *ca = __bch2_next_dev_idx(c, bucket->inode, NULL); 1110 if (*ca) { 1111 *bucket = POS((*ca)->dev_idx, (*ca)->mi.first_bucket); 1112 bch2_dev_get(*ca); 1113 } 1114 1115 return *ca != NULL; 1116 } ··· 1454 ret = bch2_btree_bit_mod_iter(trans, iter, false) ?: 1455 bch2_trans_commit(trans, NULL, NULL, 1456 BCH_TRANS_COMMIT_no_enospc) ?: 1457 + bch_err_throw(c, transaction_restart_commit); 1458 goto out; 1459 } else { 1460 /* ··· 1777 1778 static int discard_in_flight_add(struct bch_dev *ca, u64 bucket, bool in_progress) 1779 { 1780 + struct bch_fs *c = ca->fs; 1781 int ret; 1782 1783 mutex_lock(&ca->discard_buckets_in_flight_lock); 1784 + struct discard_in_flight *i = 1785 + darray_find_p(ca->discard_buckets_in_flight, i, i->bucket == bucket); 1786 + if (i) { 1787 + ret = bch_err_throw(c, EEXIST_discard_in_flight_add); 1788 + goto out; 1789 + } 1790 1791 ret = darray_push(&ca->discard_buckets_in_flight, ((struct discard_in_flight) { 1792 .in_progress = in_progress, ··· 1798 static void discard_in_flight_remove(struct bch_dev *ca, u64 bucket) 1799 { 1800 mutex_lock(&ca->discard_buckets_in_flight_lock); 1801 + struct 
discard_in_flight *i = 1802 + darray_find_p(ca->discard_buckets_in_flight, i, i->bucket == bucket); 1803 + BUG_ON(!i || !i->in_progress); 1804 + 1805 + darray_remove_item(&ca->discard_buckets_in_flight, i); 1806 mutex_unlock(&ca->discard_buckets_in_flight_lock); 1807 } 1808 ··· 2504 2505 lockdep_assert_held(&c->state_lock); 2506 2507 + guard(rcu)(); 2508 for_each_member_device_rcu(c, ca, NULL) { 2509 struct block_device *bdev = READ_ONCE(ca->disk_sb.bdev); 2510 if (bdev) ··· 2549 bucket_size_max = max_t(unsigned, bucket_size_max, 2550 ca->mi.bucket_size); 2551 } 2552 2553 bch2_set_ra_pages(c, ra_pages); 2554 ··· 2574 { 2575 u64 ret = U64_MAX; 2576 2577 + guard(rcu)(); 2578 for_each_rw_member_rcu(c, ca) 2579 ret = min(ret, ca->mi.nbuckets * ca->mi.bucket_size); 2580 return ret; 2581 } 2582
+4 -5
fs/bcachefs/alloc_background.h
···
 static inline bool bch2_dev_bucket_exists(struct bch_fs *c, struct bpos pos)
 {
-	rcu_read_lock();
+	guard(rcu)();
 	struct bch_dev *ca = bch2_dev_rcu_noerror(c, pos.inode);
-	bool ret = ca && bucket_valid(ca, pos.offset);
-	rcu_read_unlock();
-	return ret;
+	return ca && bucket_valid(ca, pos.offset);
 }
 
 static inline u64 bucket_to_u64(struct bpos bucket)
···
 			struct bkey_validate_context);
 void bch2_alloc_v4_swab(struct bkey_s);
 void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
+void bch2_alloc_v4_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
 
 #define bch2_bkey_ops_alloc ((struct bkey_ops) {	\
 	.key_validate = bch2_alloc_v1_validate,		\
···
 
 #define bch2_bkey_ops_alloc_v4 ((struct bkey_ops) {	\
 	.key_validate = bch2_alloc_v4_validate,		\
-	.val_to_text = bch2_alloc_to_text,		\
+	.val_to_text = bch2_alloc_v4_to_text,		\
 	.swab = bch2_alloc_v4_swab,			\
 	.trigger = bch2_trigger_alloc,			\
 	.min_val_size = 48,				\
+51 -57
fs/bcachefs/alloc_foreground.c
··· 69 70 void bch2_reset_alloc_cursors(struct bch_fs *c) 71 { 72 - rcu_read_lock(); 73 for_each_member_device_rcu(c, ca, NULL) 74 memset(ca->alloc_cursor, 0, sizeof(ca->alloc_cursor)); 75 - rcu_read_unlock(); 76 } 77 78 static void bch2_open_bucket_hash_add(struct bch_fs *c, struct open_bucket *ob) ··· 165 ARRAY_SIZE(c->open_buckets_partial)); 166 167 spin_lock(&c->freelist_lock); 168 - rcu_read_lock(); 169 - bch2_dev_rcu(c, ob->dev)->nr_partial_buckets++; 170 - rcu_read_unlock(); 171 172 ob->on_partial_list = true; 173 c->open_buckets_partial[c->open_buckets_partial_nr++] = ··· 227 228 track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket], true); 229 spin_unlock(&c->freelist_lock); 230 - return ERR_PTR(-BCH_ERR_open_buckets_empty); 231 } 232 233 /* Recheck under lock: */ ··· 533 534 track_event_change(&c->times[BCH_TIME_blocked_allocate], true); 535 536 - ob = ERR_PTR(-BCH_ERR_freelist_empty); 537 goto err; 538 } 539 ··· 558 } 559 err: 560 if (!ob) 561 - ob = ERR_PTR(-BCH_ERR_no_buckets_found); 562 563 if (!IS_ERR(ob)) 564 ob->data_type = req->data_type; ··· 601 602 #define dev_stripe_cmp(l, r) __dev_stripe_cmp(stripe, l, r) 603 604 - struct dev_alloc_list bch2_dev_alloc_list(struct bch_fs *c, 605 - struct dev_stripe_state *stripe, 606 - struct bch_devs_mask *devs) 607 { 608 - struct dev_alloc_list ret = { .nr = 0 }; 609 unsigned i; 610 - 611 for_each_set_bit(i, devs->d, BCH_SB_MEMBERS_MAX) 612 - ret.data[ret.nr++] = i; 613 614 - bubble_sort(ret.data, ret.nr, dev_stripe_cmp); 615 - return ret; 616 } 617 618 static const u64 stripe_clock_hand_rescale = 1ULL << 62; /* trigger rescale at */ ··· 703 return 0; 704 } 705 706 - int bch2_bucket_alloc_set_trans(struct btree_trans *trans, 707 - struct alloc_request *req, 708 - struct dev_stripe_state *stripe, 709 - struct closure *cl) 710 { 711 struct bch_fs *c = trans->c; 712 - int ret = -BCH_ERR_insufficient_devices; 713 714 BUG_ON(req->nr_effective >= req->nr_replicas); 715 716 - struct dev_alloc_list devs_sorted = bch2_dev_alloc_list(c, stripe, &req->devs_may_alloc); 717 - darray_for_each(devs_sorted, i) { 718 req->ca = bch2_dev_tryget_noerror(c, *i); 719 if (!req->ca) 720 continue; ··· 738 continue; 739 } 740 741 - if (add_new_bucket(c, req, ob)) { 742 - ret = 0; 743 break; 744 - } 745 } 746 747 - return ret; 748 } 749 750 /* Allocate from stripes: */ ··· 778 if (!h) 779 return 0; 780 781 - struct dev_alloc_list devs_sorted = 782 - bch2_dev_alloc_list(c, &req->wp->stripe, &req->devs_may_alloc); 783 - darray_for_each(devs_sorted, i) 784 for (unsigned ec_idx = 0; ec_idx < h->s->nr_data; ec_idx++) { 785 if (!h->s->blocks[ec_idx]) 786 continue; ··· 874 i); 875 ob->on_partial_list = false; 876 877 - rcu_read_lock(); 878 - bch2_dev_rcu(c, ob->dev)->nr_partial_buckets--; 879 - rcu_read_unlock(); 880 881 ret = add_new_bucket(c, req, ob); 882 if (ret) ··· 1057 1058 ob->on_partial_list = false; 1059 1060 - rcu_read_lock(); 1061 - bch2_dev_rcu(c, ob->dev)->nr_partial_buckets--; 1062 - rcu_read_unlock(); 1063 1064 spin_unlock(&c->freelist_lock); 1065 bch2_open_bucket_put(c, ob); ··· 1086 { 1087 struct write_point *wp; 1088 1089 - rcu_read_lock(); 1090 hlist_for_each_entry_rcu(wp, head, node) 1091 if (wp->write_point == write_point) 1092 - goto out; 1093 - wp = NULL; 1094 - out: 1095 - rcu_read_unlock(); 1096 - return wp; 1097 } 1098 1099 static inline bool too_many_writepoints(struct bch_fs *c, unsigned factor) ··· 1101 return stranded * factor > free; 1102 } 1103 1104 - static bool try_increase_writepoints(struct bch_fs *c) 1105 { 1106 
struct write_point *wp; 1107 ··· 1114 return true; 1115 } 1116 1117 - static bool try_decrease_writepoints(struct btree_trans *trans, unsigned old_nr) 1118 { 1119 struct bch_fs *c = trans->c; 1120 struct write_point *wp; ··· 1376 goto retry; 1377 1378 if (cl && bch2_err_matches(ret, BCH_ERR_open_buckets_empty)) 1379 - ret = -BCH_ERR_bucket_alloc_blocked; 1380 1381 if (cl && !(flags & BCH_WRITE_alloc_nowait) && 1382 bch2_err_matches(ret, BCH_ERR_freelist_empty)) 1383 - ret = -BCH_ERR_bucket_alloc_blocked; 1384 1385 return ret; 1386 } ··· 1634 1635 bch2_printbuf_make_room(&buf, 4096); 1636 1637 - rcu_read_lock(); 1638 buf.atomic++; 1639 - 1640 - for_each_online_member_rcu(c, ca) { 1641 - prt_printf(&buf, "Dev %u:\n", ca->dev_idx); 1642 - printbuf_indent_add(&buf, 2); 1643 - bch2_dev_alloc_debug_to_text(&buf, ca); 1644 - printbuf_indent_sub(&buf, 2); 1645 - prt_newline(&buf); 1646 - } 1647 - 1648 --buf.atomic; 1649 - rcu_read_unlock(); 1650 1651 prt_printf(&buf, "Copygc debug:\n"); 1652 printbuf_indent_add(&buf, 2);
··· 69 70 void bch2_reset_alloc_cursors(struct bch_fs *c) 71 { 72 + guard(rcu)(); 73 for_each_member_device_rcu(c, ca, NULL) 74 memset(ca->alloc_cursor, 0, sizeof(ca->alloc_cursor)); 75 } 76 77 static void bch2_open_bucket_hash_add(struct bch_fs *c, struct open_bucket *ob) ··· 166 ARRAY_SIZE(c->open_buckets_partial)); 167 168 spin_lock(&c->freelist_lock); 169 + scoped_guard(rcu) 170 + bch2_dev_rcu(c, ob->dev)->nr_partial_buckets++; 171 172 ob->on_partial_list = true; 173 c->open_buckets_partial[c->open_buckets_partial_nr++] = ··· 229 230 track_event_change(&c->times[BCH_TIME_blocked_allocate_open_bucket], true); 231 spin_unlock(&c->freelist_lock); 232 + return ERR_PTR(bch_err_throw(c, open_buckets_empty)); 233 } 234 235 /* Recheck under lock: */ ··· 535 536 track_event_change(&c->times[BCH_TIME_blocked_allocate], true); 537 538 + ob = ERR_PTR(bch_err_throw(c, freelist_empty)); 539 goto err; 540 } 541 ··· 560 } 561 err: 562 if (!ob) 563 + ob = ERR_PTR(bch_err_throw(c, no_buckets_found)); 564 565 if (!IS_ERR(ob)) 566 ob->data_type = req->data_type; ··· 603 604 #define dev_stripe_cmp(l, r) __dev_stripe_cmp(stripe, l, r) 605 606 + void bch2_dev_alloc_list(struct bch_fs *c, 607 + struct dev_stripe_state *stripe, 608 + struct bch_devs_mask *devs, 609 + struct dev_alloc_list *ret) 610 { 611 + ret->nr = 0; 612 + 613 unsigned i; 614 for_each_set_bit(i, devs->d, BCH_SB_MEMBERS_MAX) 615 + ret->data[ret->nr++] = i; 616 617 + bubble_sort(ret->data, ret->nr, dev_stripe_cmp); 618 } 619 620 static const u64 stripe_clock_hand_rescale = 1ULL << 62; /* trigger rescale at */ ··· 705 return 0; 706 } 707 708 + inline int bch2_bucket_alloc_set_trans(struct btree_trans *trans, 709 + struct alloc_request *req, 710 + struct dev_stripe_state *stripe, 711 + struct closure *cl) 712 { 713 struct bch_fs *c = trans->c; 714 + int ret = 0; 715 716 BUG_ON(req->nr_effective >= req->nr_replicas); 717 718 + bch2_dev_alloc_list(c, stripe, &req->devs_may_alloc, &req->devs_sorted); 719 + 720 + darray_for_each(req->devs_sorted, i) { 721 req->ca = bch2_dev_tryget_noerror(c, *i); 722 if (!req->ca) 723 continue; ··· 739 continue; 740 } 741 742 + ret = add_new_bucket(c, req, ob); 743 + if (ret) 744 break; 745 } 746 747 + if (ret == 1) 748 + return 0; 749 + if (ret) 750 + return ret; 751 + return bch_err_throw(c, insufficient_devices); 752 } 753 754 /* Allocate from stripes: */ ··· 776 if (!h) 777 return 0; 778 779 + bch2_dev_alloc_list(c, &req->wp->stripe, &req->devs_may_alloc, &req->devs_sorted); 780 + 781 + darray_for_each(req->devs_sorted, i) 782 for (unsigned ec_idx = 0; ec_idx < h->s->nr_data; ec_idx++) { 783 if (!h->s->blocks[ec_idx]) 784 continue; ··· 872 i); 873 ob->on_partial_list = false; 874 875 + scoped_guard(rcu) 876 + bch2_dev_rcu(c, ob->dev)->nr_partial_buckets--; 877 878 ret = add_new_bucket(c, req, ob); 879 if (ret) ··· 1056 1057 ob->on_partial_list = false; 1058 1059 + scoped_guard(rcu) 1060 + bch2_dev_rcu(c, ob->dev)->nr_partial_buckets--; 1061 1062 spin_unlock(&c->freelist_lock); 1063 bch2_open_bucket_put(c, ob); ··· 1086 { 1087 struct write_point *wp; 1088 1089 + guard(rcu)(); 1090 hlist_for_each_entry_rcu(wp, head, node) 1091 if (wp->write_point == write_point) 1092 + return wp; 1093 + return NULL; 1094 } 1095 1096 static inline bool too_many_writepoints(struct bch_fs *c, unsigned factor) ··· 1104 return stranded * factor > free; 1105 } 1106 1107 + static noinline bool try_increase_writepoints(struct bch_fs *c) 1108 { 1109 struct write_point *wp; 1110 ··· 1117 return true; 1118 } 1119 1120 + static noinline bool 
try_decrease_writepoints(struct btree_trans *trans, unsigned old_nr) 1121 { 1122 struct bch_fs *c = trans->c; 1123 struct write_point *wp; ··· 1379 goto retry; 1380 1381 if (cl && bch2_err_matches(ret, BCH_ERR_open_buckets_empty)) 1382 + ret = bch_err_throw(c, bucket_alloc_blocked); 1383 1384 if (cl && !(flags & BCH_WRITE_alloc_nowait) && 1385 bch2_err_matches(ret, BCH_ERR_freelist_empty)) 1386 + ret = bch_err_throw(c, bucket_alloc_blocked); 1387 1388 return ret; 1389 } ··· 1637 1638 bch2_printbuf_make_room(&buf, 4096); 1639 1640 buf.atomic++; 1641 + scoped_guard(rcu) 1642 + for_each_online_member_rcu(c, ca) { 1643 + prt_printf(&buf, "Dev %u:\n", ca->dev_idx); 1644 + printbuf_indent_add(&buf, 2); 1645 + bch2_dev_alloc_debug_to_text(&buf, ca); 1646 + printbuf_indent_sub(&buf, 2); 1647 + prt_newline(&buf); 1648 + } 1649 --buf.atomic; 1650 1651 prt_printf(&buf, "Copygc debug:\n"); 1652 printbuf_indent_add(&buf, 2);
+5 -3
fs/bcachefs/alloc_foreground.h
···
 	struct bch_devs_mask devs_may_alloc;
 
 	/* bch2_bucket_alloc_set_trans(): */
+	struct dev_alloc_list devs_sorted;
 	struct bch_dev_usage usage;
 
 	/* bch2_bucket_alloc_trans(): */
···
 	struct bch_devs_mask scratch_devs_may_alloc;
 };
 
-struct dev_alloc_list bch2_dev_alloc_list(struct bch_fs *,
-					  struct dev_stripe_state *,
-					  struct bch_devs_mask *);
+void bch2_dev_alloc_list(struct bch_fs *,
+			 struct dev_stripe_state *,
+			 struct bch_devs_mask *,
+			 struct dev_alloc_list *);
 void bch2_dev_stripe_increment(struct bch_dev *, struct dev_stripe_state *);
 
 static inline struct bch_dev *ob_dev(struct bch_fs *c, struct open_bucket *ob)
+37 -37
fs/bcachefs/backpointers.c
··· 48 { 49 struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k); 50 51 - rcu_read_lock(); 52 - struct bch_dev *ca = bch2_dev_rcu_noerror(c, bp.k->p.inode); 53 - if (ca) { 54 - u32 bucket_offset; 55 - struct bpos bucket = bp_pos_to_bucket_and_offset(ca, bp.k->p, &bucket_offset); 56 - rcu_read_unlock(); 57 - prt_printf(out, "bucket=%llu:%llu:%u ", bucket.inode, bucket.offset, bucket_offset); 58 - } else { 59 - rcu_read_unlock(); 60 - prt_printf(out, "sector=%llu:%llu ", bp.k->p.inode, bp.k->p.offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT); 61 } 62 63 bch2_btree_id_level_to_text(out, bp.v->btree_id, bp.v->level); 64 prt_str(out, " data_type="); ··· 142 } 143 144 if (!will_check && __bch2_inconsistent_error(c, &buf)) 145 - ret = -BCH_ERR_erofs_unfixed_errors; 146 147 bch_err(c, "%s", buf.buf); 148 printbuf_exit(&buf); ··· 295 return b; 296 297 if (btree_node_will_make_reachable(b)) { 298 - b = ERR_PTR(-BCH_ERR_backpointer_to_overwritten_btree_node); 299 } else { 300 int ret = backpointer_target_not_found(trans, bp, bkey_i_to_s_c(&b->key), 301 last_flushed, commit); ··· 353 return ret ? bkey_s_c_err(ret) : bkey_s_c_null; 354 } else { 355 struct btree *b = __bch2_backpointer_get_node(trans, bp, iter, last_flushed, commit); 356 - if (b == ERR_PTR(-BCH_ERR_backpointer_to_overwritten_btree_node)) 357 return bkey_s_c_null; 358 if (IS_ERR_OR_NULL(b)) 359 return ((struct bkey_s_c) { .k = ERR_CAST(b) }); ··· 593 bkey_for_each_ptr(other_extent_ptrs, ptr) 594 if (ptr->dev == bp->k.p.inode && 595 dev_ptr_stale_rcu(ca, ptr)) { 596 ret = drop_dev_and_update(trans, other_bp.v->btree_id, 597 other_extent, bp->k.p.inode); 598 if (ret) ··· 651 prt_newline(&buf); 652 bch2_bkey_val_to_text(&buf, c, other_extent); 653 bch_err(c, "%s", buf.buf); 654 - ret = -BCH_ERR_fsck_repair_unimplemented; 655 goto err; 656 missing: 657 printbuf_reset(&buf); ··· 682 if (p.ptr.dev == BCH_SB_MEMBER_INVALID) 683 continue; 684 685 - rcu_read_lock(); 686 - struct bch_dev *ca = bch2_dev_rcu_noerror(c, p.ptr.dev); 687 - if (!ca) { 688 - rcu_read_unlock(); 689 - continue; 690 - } 691 692 - if (p.ptr.cached && dev_ptr_stale_rcu(ca, &p.ptr)) { 693 - rcu_read_unlock(); 694 - continue; 695 - } 696 697 - u64 b = PTR_BUCKET_NR(ca, &p.ptr); 698 - if (!bch2_bucket_bitmap_test(&ca->bucket_backpointer_mismatch, b)) { 699 - rcu_read_unlock(); 700 - continue; 701 - } 702 703 - bool empty = bch2_bucket_bitmap_test(&ca->bucket_backpointer_empty, b); 704 - rcu_read_unlock(); 705 706 struct bkey_i_backpointer bp; 707 bch2_extent_ptr_to_bp(c, btree, level, k, p, entry, &bp); ··· 953 sectors[ALLOC_cached] > a->cached_sectors || 954 sectors[ALLOC_stripe] > a->stripe_sectors) { 955 ret = check_bucket_backpointers_to_extents(trans, ca, alloc_k.k->p) ?: 956 - -BCH_ERR_transaction_restart_nested; 957 goto err; 958 } 959 ··· 981 case KEY_TYPE_btree_ptr_v2: { 982 bool ret = false; 983 984 - rcu_read_lock(); 985 struct bpos pos = bkey_s_c_to_btree_ptr_v2(k).v->min_key; 986 while (pos.inode <= k.k->p.inode) { 987 if (pos.inode >= c->sb.nr_devices) ··· 1009 next: 1010 pos = SPOS(pos.inode + 1, 0, 0); 1011 } 1012 - rcu_read_unlock(); 1013 1014 return ret; 1015 } ··· 1351 b->buckets = kvcalloc(BITS_TO_LONGS(ca->mi.nbuckets), 1352 sizeof(unsigned long), GFP_KERNEL); 1353 if (!b->buckets) 1354 - return -BCH_ERR_ENOMEM_backpointer_mismatches_bitmap; 1355 } 1356 1357 b->nr += !__test_and_set_bit(bit, b->buckets); ··· 1360 return 0; 1361 } 1362 1363 - int bch2_bucket_bitmap_resize(struct bucket_bitmap *b, u64 old_size, u64 new_size) 1364 { 1365 scoped_guard(mutex, 
&b->lock) { 1366 if (!b->buckets) ··· 1370 unsigned long *n = kvcalloc(BITS_TO_LONGS(new_size), 1371 sizeof(unsigned long), GFP_KERNEL); 1372 if (!n) 1373 - return -BCH_ERR_ENOMEM_backpointer_mismatches_bitmap; 1374 1375 memcpy(n, b->buckets, 1376 BITS_TO_LONGS(min(old_size, new_size)) * sizeof(unsigned long));
··· 48 { 49 struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k); 50 51 + struct bch_dev *ca; 52 + u32 bucket_offset; 53 + struct bpos bucket; 54 + scoped_guard(rcu) { 55 + ca = bch2_dev_rcu_noerror(c, bp.k->p.inode); 56 + if (ca) 57 + bucket = bp_pos_to_bucket_and_offset(ca, bp.k->p, &bucket_offset); 58 } 59 + 60 + if (ca) 61 + prt_printf(out, "bucket=%llu:%llu:%u ", bucket.inode, bucket.offset, bucket_offset); 62 + else 63 + prt_printf(out, "sector=%llu:%llu ", bp.k->p.inode, bp.k->p.offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT); 64 65 bch2_btree_id_level_to_text(out, bp.v->btree_id, bp.v->level); 66 prt_str(out, " data_type="); ··· 140 } 141 142 if (!will_check && __bch2_inconsistent_error(c, &buf)) 143 + ret = bch_err_throw(c, erofs_unfixed_errors); 144 145 bch_err(c, "%s", buf.buf); 146 printbuf_exit(&buf); ··· 293 return b; 294 295 if (btree_node_will_make_reachable(b)) { 296 + b = ERR_PTR(bch_err_throw(c, backpointer_to_overwritten_btree_node)); 297 } else { 298 int ret = backpointer_target_not_found(trans, bp, bkey_i_to_s_c(&b->key), 299 last_flushed, commit); ··· 351 return ret ? bkey_s_c_err(ret) : bkey_s_c_null; 352 } else { 353 struct btree *b = __bch2_backpointer_get_node(trans, bp, iter, last_flushed, commit); 354 + if (b == ERR_PTR(bch_err_throw(c, backpointer_to_overwritten_btree_node))) 355 return bkey_s_c_null; 356 if (IS_ERR_OR_NULL(b)) 357 return ((struct bkey_s_c) { .k = ERR_CAST(b) }); ··· 591 bkey_for_each_ptr(other_extent_ptrs, ptr) 592 if (ptr->dev == bp->k.p.inode && 593 dev_ptr_stale_rcu(ca, ptr)) { 594 + rcu_read_unlock(); 595 ret = drop_dev_and_update(trans, other_bp.v->btree_id, 596 other_extent, bp->k.p.inode); 597 if (ret) ··· 648 prt_newline(&buf); 649 bch2_bkey_val_to_text(&buf, c, other_extent); 650 bch_err(c, "%s", buf.buf); 651 + ret = bch_err_throw(c, fsck_repair_unimplemented); 652 goto err; 653 missing: 654 printbuf_reset(&buf); ··· 679 if (p.ptr.dev == BCH_SB_MEMBER_INVALID) 680 continue; 681 682 + bool empty; 683 + { 684 + /* scoped_guard() is a loop, so it breaks continue */ 685 + guard(rcu)(); 686 + struct bch_dev *ca = bch2_dev_rcu_noerror(c, p.ptr.dev); 687 + if (!ca) 688 + continue; 689 690 + if (p.ptr.cached && dev_ptr_stale_rcu(ca, &p.ptr)) 691 + continue; 692 693 + u64 b = PTR_BUCKET_NR(ca, &p.ptr); 694 + if (!bch2_bucket_bitmap_test(&ca->bucket_backpointer_mismatch, b)) 695 + continue; 696 697 + empty = bch2_bucket_bitmap_test(&ca->bucket_backpointer_empty, b); 698 + } 699 700 struct bkey_i_backpointer bp; 701 bch2_extent_ptr_to_bp(c, btree, level, k, p, entry, &bp); ··· 953 sectors[ALLOC_cached] > a->cached_sectors || 954 sectors[ALLOC_stripe] > a->stripe_sectors) { 955 ret = check_bucket_backpointers_to_extents(trans, ca, alloc_k.k->p) ?: 956 + bch_err_throw(c, transaction_restart_nested); 957 goto err; 958 } 959 ··· 981 case KEY_TYPE_btree_ptr_v2: { 982 bool ret = false; 983 984 + guard(rcu)(); 985 struct bpos pos = bkey_s_c_to_btree_ptr_v2(k).v->min_key; 986 while (pos.inode <= k.k->p.inode) { 987 if (pos.inode >= c->sb.nr_devices) ··· 1009 next: 1010 pos = SPOS(pos.inode + 1, 0, 0); 1011 } 1012 1013 return ret; 1014 } ··· 1352 b->buckets = kvcalloc(BITS_TO_LONGS(ca->mi.nbuckets), 1353 sizeof(unsigned long), GFP_KERNEL); 1354 if (!b->buckets) 1355 + return bch_err_throw(ca->fs, ENOMEM_backpointer_mismatches_bitmap); 1356 } 1357 1358 b->nr += !__test_and_set_bit(bit, b->buckets); ··· 1361 return 0; 1362 } 1363 1364 + int bch2_bucket_bitmap_resize(struct bch_dev *ca, struct bucket_bitmap *b, 1365 + u64 old_size, u64 new_size) 1366 { 
1367 scoped_guard(mutex, &b->lock) { 1368 if (!b->buckets) ··· 1370 unsigned long *n = kvcalloc(BITS_TO_LONGS(new_size), 1371 sizeof(unsigned long), GFP_KERNEL); 1372 if (!n) 1373 + return bch_err_throw(ca->fs, ENOMEM_backpointer_mismatches_bitmap); 1374 1375 memcpy(n, b->buckets, 1376 BITS_TO_LONGS(min(old_size, new_size)) * sizeof(unsigned long));
+2 -3
fs/bcachefs/backpointers.h
···
 static inline bool bp_pos_to_bucket_nodev_noerror(struct bch_fs *c, struct bpos bp_pos, struct bpos *bucket)
 {
-	rcu_read_lock();
+	guard(rcu)();
 	struct bch_dev *ca = bch2_dev_rcu_noerror(c, bp_pos.inode);
 	if (ca)
 		*bucket = bp_pos_to_bucket(ca, bp_pos);
-	rcu_read_unlock();
 	return ca != NULL;
 }
 
···
 	return bitmap && test_bit(i, bitmap);
 }
 
-int bch2_bucket_bitmap_resize(struct bucket_bitmap *, u64, u64);
+int bch2_bucket_bitmap_resize(struct bch_dev *, struct bucket_bitmap *, u64, u64);
 void bch2_bucket_bitmap_free(struct bucket_bitmap *);
 
 #endif /* _BCACHEFS_BACKPOINTERS_BACKGROUND_H */
+41 -31
fs/bcachefs/bcachefs.h
··· 183 #define pr_fmt(fmt) "%s() " fmt "\n", __func__ 184 #endif 185 186 #include <linux/backing-dev-defs.h> 187 #include <linux/bug.h> 188 #include <linux/bio.h> ··· 229 #include "time_stats.h" 230 #include "util.h" 231 232 - #ifdef CONFIG_BCACHEFS_DEBUG 233 - #define ENUMERATED_REF_DEBUG 234 - #endif 235 236 - #ifndef dynamic_fault 237 - #define dynamic_fault(...) 0 238 - #endif 239 - 240 - #define race_fault(...) dynamic_fault("bcachefs:race") 241 242 #define count_event(_c, _name) this_cpu_inc((_c)->counters[BCH_COUNTER_##_name]) 243 ··· 405 pr_info(fmt, ##__VA_ARGS__); \ 406 } while (0) 407 408 /* Parameters that are useful for debugging, but should always be compiled in: */ 409 #define BCH_DEBUG_PARAMS_ALWAYS() \ 410 BCH_DEBUG_PARAM(key_merging_disabled, \ ··· 518 #undef x 519 BCH_TIME_STAT_NR 520 }; 521 - 522 - #include "alloc_types.h" 523 - #include "async_objs_types.h" 524 - #include "btree_gc_types.h" 525 - #include "btree_types.h" 526 - #include "btree_node_scan_types.h" 527 - #include "btree_write_buffer_types.h" 528 - #include "buckets_types.h" 529 - #include "buckets_waiting_for_journal_types.h" 530 - #include "clock_types.h" 531 - #include "disk_groups_types.h" 532 - #include "ec_types.h" 533 - #include "enumerated_ref_types.h" 534 - #include "journal_types.h" 535 - #include "keylist_types.h" 536 - #include "quota_types.h" 537 - #include "rebalance_types.h" 538 - #include "recovery_passes_types.h" 539 - #include "replicas_types.h" 540 - #include "sb-members_types.h" 541 - #include "subvolume_types.h" 542 - #include "super_types.h" 543 - #include "thread_with_file_types.h" 544 545 /* Number of nodes btree coalesce will try to coalesce at once */ 546 #define GC_MERGE_NODES 4U
··· 183 #define pr_fmt(fmt) "%s() " fmt "\n", __func__ 184 #endif 185 186 + #ifdef CONFIG_BCACHEFS_DEBUG 187 + #define ENUMERATED_REF_DEBUG 188 + #endif 189 + 190 + #ifndef dynamic_fault 191 + #define dynamic_fault(...) 0 192 + #endif 193 + 194 + #define race_fault(...) dynamic_fault("bcachefs:race") 195 + 196 #include <linux/backing-dev-defs.h> 197 #include <linux/bug.h> 198 #include <linux/bio.h> ··· 219 #include "time_stats.h" 220 #include "util.h" 221 222 + #include "alloc_types.h" 223 + #include "async_objs_types.h" 224 + #include "btree_gc_types.h" 225 + #include "btree_types.h" 226 + #include "btree_node_scan_types.h" 227 + #include "btree_write_buffer_types.h" 228 + #include "buckets_types.h" 229 + #include "buckets_waiting_for_journal_types.h" 230 + #include "clock_types.h" 231 + #include "disk_groups_types.h" 232 + #include "ec_types.h" 233 + #include "enumerated_ref_types.h" 234 + #include "journal_types.h" 235 + #include "keylist_types.h" 236 + #include "quota_types.h" 237 + #include "rebalance_types.h" 238 + #include "recovery_passes_types.h" 239 + #include "replicas_types.h" 240 + #include "sb-members_types.h" 241 + #include "subvolume_types.h" 242 + #include "super_types.h" 243 + #include "thread_with_file_types.h" 244 245 + #include "trace.h" 246 247 #define count_event(_c, _name) this_cpu_inc((_c)->counters[BCH_COUNTER_##_name]) 248 ··· 380 pr_info(fmt, ##__VA_ARGS__); \ 381 } while (0) 382 383 + static inline int __bch2_err_trace(struct bch_fs *c, int err) 384 + { 385 + trace_error_throw(c, err, _THIS_IP_); 386 + return err; 387 + } 388 + 389 + #define bch_err_throw(_c, _err) __bch2_err_trace(_c, -BCH_ERR_##_err) 390 + 391 /* Parameters that are useful for debugging, but should always be compiled in: */ 392 #define BCH_DEBUG_PARAMS_ALWAYS() \ 393 BCH_DEBUG_PARAM(key_merging_disabled, \ ··· 485 #undef x 486 BCH_TIME_STAT_NR 487 }; 488 489 /* Number of nodes btree coalesce will try to coalesce at once */ 490 #define GC_MERGE_NODES 4U
+12 -12
fs/bcachefs/btree_cache.c
··· 149 150 b->data = kvmalloc(btree_buf_bytes(b), gfp); 151 if (!b->data) 152 - return -BCH_ERR_ENOMEM_btree_node_mem_alloc; 153 #ifdef __KERNEL__ 154 b->aux_data = kvmalloc(btree_aux_data_bytes(b), gfp); 155 #else ··· 162 if (!b->aux_data) { 163 kvfree(b->data); 164 b->data = NULL; 165 - return -BCH_ERR_ENOMEM_btree_node_mem_alloc; 166 } 167 168 return 0; ··· 353 354 if (btree_node_noevict(b)) { 355 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_noevict]++; 356 - return -BCH_ERR_ENOMEM_btree_node_reclaim; 357 } 358 if (btree_node_write_blocked(b)) { 359 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_write_blocked]++; 360 - return -BCH_ERR_ENOMEM_btree_node_reclaim; 361 } 362 if (btree_node_will_make_reachable(b)) { 363 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_will_make_reachable]++; 364 - return -BCH_ERR_ENOMEM_btree_node_reclaim; 365 } 366 367 if (btree_node_dirty(b)) { 368 if (!flush) { 369 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_dirty]++; 370 - return -BCH_ERR_ENOMEM_btree_node_reclaim; 371 } 372 373 if (locked) { ··· 393 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_read_in_flight]++; 394 else if (btree_node_write_in_flight(b)) 395 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_write_in_flight]++; 396 - return -BCH_ERR_ENOMEM_btree_node_reclaim; 397 } 398 399 if (locked) ··· 424 425 if (!six_trylock_intent(&b->c.lock)) { 426 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_lock_intent]++; 427 - return -BCH_ERR_ENOMEM_btree_node_reclaim; 428 } 429 430 if (!six_trylock_write(&b->c.lock)) { 431 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_lock_write]++; 432 six_unlock_intent(&b->c.lock); 433 - return -BCH_ERR_ENOMEM_btree_node_reclaim; 434 } 435 436 /* recheck under lock */ ··· 682 683 return 0; 684 err: 685 - return -BCH_ERR_ENOMEM_fs_btree_cache_init; 686 } 687 688 void bch2_fs_btree_cache_init_early(struct btree_cache *bc) ··· 727 728 if (!cl) { 729 trace_and_count(c, btree_cache_cannibalize_lock_fail, trans); 730 - return -BCH_ERR_ENOMEM_btree_cache_cannibalize_lock; 731 } 732 733 closure_wait(&bc->alloc_wait, cl); ··· 741 } 742 743 trace_and_count(c, btree_cache_cannibalize_lock_fail, trans); 744 - return -BCH_ERR_btree_cache_cannibalize_lock_blocked; 745 746 success: 747 trace_and_count(c, btree_cache_cannibalize_lock, trans);
··· 149 150 b->data = kvmalloc(btree_buf_bytes(b), gfp); 151 if (!b->data) 152 + return bch_err_throw(c, ENOMEM_btree_node_mem_alloc); 153 #ifdef __KERNEL__ 154 b->aux_data = kvmalloc(btree_aux_data_bytes(b), gfp); 155 #else ··· 162 if (!b->aux_data) { 163 kvfree(b->data); 164 b->data = NULL; 165 + return bch_err_throw(c, ENOMEM_btree_node_mem_alloc); 166 } 167 168 return 0; ··· 353 354 if (btree_node_noevict(b)) { 355 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_noevict]++; 356 + return bch_err_throw(c, ENOMEM_btree_node_reclaim); 357 } 358 if (btree_node_write_blocked(b)) { 359 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_write_blocked]++; 360 + return bch_err_throw(c, ENOMEM_btree_node_reclaim); 361 } 362 if (btree_node_will_make_reachable(b)) { 363 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_will_make_reachable]++; 364 + return bch_err_throw(c, ENOMEM_btree_node_reclaim); 365 } 366 367 if (btree_node_dirty(b)) { 368 if (!flush) { 369 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_dirty]++; 370 + return bch_err_throw(c, ENOMEM_btree_node_reclaim); 371 } 372 373 if (locked) { ··· 393 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_read_in_flight]++; 394 else if (btree_node_write_in_flight(b)) 395 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_write_in_flight]++; 396 + return bch_err_throw(c, ENOMEM_btree_node_reclaim); 397 } 398 399 if (locked) ··· 424 425 if (!six_trylock_intent(&b->c.lock)) { 426 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_lock_intent]++; 427 + return bch_err_throw(c, ENOMEM_btree_node_reclaim); 428 } 429 430 if (!six_trylock_write(&b->c.lock)) { 431 bc->not_freed[BCH_BTREE_CACHE_NOT_FREED_lock_write]++; 432 six_unlock_intent(&b->c.lock); 433 + return bch_err_throw(c, ENOMEM_btree_node_reclaim); 434 } 435 436 /* recheck under lock */ ··· 682 683 return 0; 684 err: 685 + return bch_err_throw(c, ENOMEM_fs_btree_cache_init); 686 } 687 688 void bch2_fs_btree_cache_init_early(struct btree_cache *bc) ··· 727 728 if (!cl) { 729 trace_and_count(c, btree_cache_cannibalize_lock_fail, trans); 730 + return bch_err_throw(c, ENOMEM_btree_cache_cannibalize_lock); 731 } 732 733 closure_wait(&bc->alloc_wait, cl); ··· 741 } 742 743 trace_and_count(c, btree_cache_cannibalize_lock_fail, trans); 744 + return bch_err_throw(c, btree_cache_cannibalize_lock_blocked); 745 746 success: 747 trace_and_count(c, btree_cache_cannibalize_lock, trans);
+28 -29
fs/bcachefs/btree_gc.c
··· 150 151 new = kmalloc_array(BKEY_BTREE_PTR_U64s_MAX, sizeof(u64), GFP_KERNEL); 152 if (!new) 153 - return -BCH_ERR_ENOMEM_gc_repair_key; 154 155 btree_ptr_to_v2(b, new); 156 b->data->min_key = new_min; ··· 190 191 new = kmalloc_array(BKEY_BTREE_PTR_U64s_MAX, sizeof(u64), GFP_KERNEL); 192 if (!new) 193 - return -BCH_ERR_ENOMEM_gc_repair_key; 194 195 btree_ptr_to_v2(b, new); 196 b->data->max_key = new_max; ··· 935 ret = genradix_prealloc(&ca->buckets_gc, ca->mi.nbuckets, GFP_KERNEL); 936 if (ret) { 937 bch2_dev_put(ca); 938 - ret = -BCH_ERR_ENOMEM_gc_alloc_start; 939 break; 940 } 941 } ··· 1093 { 1094 struct bch_fs *c = trans->c; 1095 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1096 - struct bkey_i *u; 1097 - int ret; 1098 1099 if (unlikely(test_bit(BCH_FS_going_ro, &c->flags))) 1100 return -EROFS; 1101 1102 - rcu_read_lock(); 1103 - bkey_for_each_ptr(ptrs, ptr) { 1104 - struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1105 - if (!ca) 1106 - continue; 1107 1108 - if (dev_ptr_stale(ca, ptr) > 16) { 1109 - rcu_read_unlock(); 1110 - goto update; 1111 } 1112 } 1113 1114 - bkey_for_each_ptr(ptrs, ptr) { 1115 - struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1116 - if (!ca) 1117 - continue; 1118 1119 - u8 *gen = &ca->oldest_gen[PTR_BUCKET_NR(ca, ptr)]; 1120 - if (gen_after(*gen, ptr->gen)) 1121 - *gen = ptr->gen; 1122 } 1123 - rcu_read_unlock(); 1124 - return 0; 1125 - update: 1126 - u = bch2_bkey_make_mut(trans, iter, &k, 0); 1127 - ret = PTR_ERR_OR_ZERO(u); 1128 - if (ret) 1129 - return ret; 1130 1131 - bch2_extent_normalize(c, bkey_i_to_s(u)); 1132 return 0; 1133 } 1134 ··· 1180 ca->oldest_gen = kvmalloc(gens->nbuckets, GFP_KERNEL); 1181 if (!ca->oldest_gen) { 1182 bch2_dev_put(ca); 1183 - ret = -BCH_ERR_ENOMEM_gc_gens; 1184 goto err; 1185 } 1186
··· 150 151 new = kmalloc_array(BKEY_BTREE_PTR_U64s_MAX, sizeof(u64), GFP_KERNEL); 152 if (!new) 153 + return bch_err_throw(c, ENOMEM_gc_repair_key); 154 155 btree_ptr_to_v2(b, new); 156 b->data->min_key = new_min; ··· 190 191 new = kmalloc_array(BKEY_BTREE_PTR_U64s_MAX, sizeof(u64), GFP_KERNEL); 192 if (!new) 193 + return bch_err_throw(c, ENOMEM_gc_repair_key); 194 195 btree_ptr_to_v2(b, new); 196 b->data->max_key = new_max; ··· 935 ret = genradix_prealloc(&ca->buckets_gc, ca->mi.nbuckets, GFP_KERNEL); 936 if (ret) { 937 bch2_dev_put(ca); 938 + ret = bch_err_throw(c, ENOMEM_gc_alloc_start); 939 break; 940 } 941 } ··· 1093 { 1094 struct bch_fs *c = trans->c; 1095 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1096 1097 if (unlikely(test_bit(BCH_FS_going_ro, &c->flags))) 1098 return -EROFS; 1099 1100 + bool too_stale = false; 1101 + scoped_guard(rcu) { 1102 + bkey_for_each_ptr(ptrs, ptr) { 1103 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1104 + if (!ca) 1105 + continue; 1106 1107 + too_stale |= dev_ptr_stale(ca, ptr) > 16; 1108 } 1109 + 1110 + if (!too_stale) 1111 + bkey_for_each_ptr(ptrs, ptr) { 1112 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 1113 + if (!ca) 1114 + continue; 1115 + 1116 + u8 *gen = &ca->oldest_gen[PTR_BUCKET_NR(ca, ptr)]; 1117 + if (gen_after(*gen, ptr->gen)) 1118 + *gen = ptr->gen; 1119 + } 1120 } 1121 1122 + if (too_stale) { 1123 + struct bkey_i *u = bch2_bkey_make_mut(trans, iter, &k, 0); 1124 + int ret = PTR_ERR_OR_ZERO(u); 1125 + if (ret) 1126 + return ret; 1127 1128 + bch2_extent_normalize(c, bkey_i_to_s(u)); 1129 } 1130 1131 return 0; 1132 } 1133 ··· 1181 ca->oldest_gen = kvmalloc(gens->nbuckets, GFP_KERNEL); 1182 if (!ca->oldest_gen) { 1183 bch2_dev_put(ca); 1184 + ret = bch_err_throw(c, ENOMEM_gc_gens); 1185 goto err; 1186 } 1187
+21 -22
fs/bcachefs/btree_io.c
··· 557 const char *fmt, ...) 558 { 559 if (c->recovery.curr_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes) 560 - return -BCH_ERR_fsck_fix; 561 562 bool have_retry = false; 563 int ret2; ··· 572 } 573 574 if (!have_retry && ret == -BCH_ERR_btree_node_read_err_want_retry) 575 - ret = -BCH_ERR_btree_node_read_err_fixable; 576 if (!have_retry && ret == -BCH_ERR_btree_node_read_err_must_retry) 577 - ret = -BCH_ERR_btree_node_read_err_bad_node; 578 579 bch2_sb_error_count(c, err_type); 580 ··· 602 switch (ret) { 603 case -BCH_ERR_btree_node_read_err_fixable: 604 ret2 = bch2_fsck_err_opt(c, FSCK_CAN_FIX, err_type); 605 - if (ret2 != -BCH_ERR_fsck_fix && 606 - ret2 != -BCH_ERR_fsck_ignore) { 607 ret = ret2; 608 goto fsck_err; 609 } 610 611 if (!have_retry) 612 - ret = -BCH_ERR_fsck_fix; 613 goto out; 614 case -BCH_ERR_btree_node_read_err_bad_node: 615 prt_str(&out, ", "); ··· 631 switch (ret) { 632 case -BCH_ERR_btree_node_read_err_fixable: 633 ret2 = __bch2_fsck_err(c, NULL, FSCK_CAN_FIX, err_type, "%s", out.buf); 634 - if (ret2 != -BCH_ERR_fsck_fix && 635 - ret2 != -BCH_ERR_fsck_ignore) { 636 ret = ret2; 637 goto fsck_err; 638 } 639 640 if (!have_retry) 641 - ret = -BCH_ERR_fsck_fix; 642 goto out; 643 case -BCH_ERR_btree_node_read_err_bad_node: 644 prt_str(&out, ", "); ··· 660 failed, err_msg, \ 661 msg, ##__VA_ARGS__); \ 662 \ 663 - if (_ret != -BCH_ERR_fsck_fix) { \ 664 ret = _ret; \ 665 goto fsck_err; \ 666 } \ ··· 1325 1326 btree_node_reset_sib_u64s(b); 1327 1328 - rcu_read_lock(); 1329 - bkey_for_each_ptr(bch2_bkey_ptrs(bkey_i_to_s(&b->key)), ptr) { 1330 - struct bch_dev *ca2 = bch2_dev_rcu(c, ptr->dev); 1331 1332 - if (!ca2 || ca2->mi.state != BCH_MEMBER_STATE_rw) 1333 - set_btree_node_need_rewrite(b); 1334 - } 1335 - rcu_read_unlock(); 1336 1337 if (!ptr_written) 1338 set_btree_node_need_rewrite(b); ··· 1687 1688 ra = kzalloc(sizeof(*ra), GFP_NOFS); 1689 if (!ra) 1690 - return -BCH_ERR_ENOMEM_btree_node_read_all_replicas; 1691 1692 closure_init(&ra->cl, NULL); 1693 ra->c = c; ··· 1869 bch2_btree_node_hash_remove(&c->btree_cache, b); 1870 mutex_unlock(&c->btree_cache.lock); 1871 1872 - ret = -BCH_ERR_btree_node_read_error; 1873 goto err; 1874 } 1875 ··· 2019 struct bch_fs *c = trans->c; 2020 2021 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_btree_node_scrub)) 2022 - return -BCH_ERR_erofs_no_writes; 2023 2024 struct extent_ptr_decoded pick; 2025 int ret = bch2_bkey_pick_read_device(c, k, NULL, &pick, dev); ··· 2029 struct bch_dev *ca = bch2_dev_get_ioref(c, pick.ptr.dev, READ, 2030 BCH_DEV_READ_REF_btree_node_scrub); 2031 if (!ca) { 2032 - ret = -BCH_ERR_device_offline; 2033 goto err; 2034 } 2035 ··· 2166 bch2_dev_list_has_dev(wbio->wbio.failed, ptr->dev)); 2167 2168 if (!bch2_bkey_nr_ptrs(bkey_i_to_s_c(&wbio->key))) { 2169 - ret = -BCH_ERR_btree_node_write_all_failed; 2170 goto err; 2171 } 2172
··· 557 const char *fmt, ...) 558 { 559 if (c->recovery.curr_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes) 560 + return bch_err_throw(c, fsck_fix); 561 562 bool have_retry = false; 563 int ret2; ··· 572 } 573 574 if (!have_retry && ret == -BCH_ERR_btree_node_read_err_want_retry) 575 + ret = bch_err_throw(c, btree_node_read_err_fixable); 576 if (!have_retry && ret == -BCH_ERR_btree_node_read_err_must_retry) 577 + ret = bch_err_throw(c, btree_node_read_err_bad_node); 578 579 bch2_sb_error_count(c, err_type); 580 ··· 602 switch (ret) { 603 case -BCH_ERR_btree_node_read_err_fixable: 604 ret2 = bch2_fsck_err_opt(c, FSCK_CAN_FIX, err_type); 605 + if (!bch2_err_matches(ret2, BCH_ERR_fsck_fix) && 606 + !bch2_err_matches(ret2, BCH_ERR_fsck_ignore)) { 607 ret = ret2; 608 goto fsck_err; 609 } 610 611 if (!have_retry) 612 + ret = bch_err_throw(c, fsck_fix); 613 goto out; 614 case -BCH_ERR_btree_node_read_err_bad_node: 615 prt_str(&out, ", "); ··· 631 switch (ret) { 632 case -BCH_ERR_btree_node_read_err_fixable: 633 ret2 = __bch2_fsck_err(c, NULL, FSCK_CAN_FIX, err_type, "%s", out.buf); 634 + if (!bch2_err_matches(ret2, BCH_ERR_fsck_fix) && 635 + !bch2_err_matches(ret2, BCH_ERR_fsck_ignore)) { 636 ret = ret2; 637 goto fsck_err; 638 } 639 640 if (!have_retry) 641 + ret = bch_err_throw(c, fsck_fix); 642 goto out; 643 case -BCH_ERR_btree_node_read_err_bad_node: 644 prt_str(&out, ", "); ··· 660 failed, err_msg, \ 661 msg, ##__VA_ARGS__); \ 662 \ 663 + if (!bch2_err_matches(_ret, BCH_ERR_fsck_fix)) { \ 664 ret = _ret; \ 665 goto fsck_err; \ 666 } \ ··· 1325 1326 btree_node_reset_sib_u64s(b); 1327 1328 + scoped_guard(rcu) 1329 + bkey_for_each_ptr(bch2_bkey_ptrs(bkey_i_to_s(&b->key)), ptr) { 1330 + struct bch_dev *ca2 = bch2_dev_rcu(c, ptr->dev); 1331 1332 + if (!ca2 || ca2->mi.state != BCH_MEMBER_STATE_rw) 1333 + set_btree_node_need_rewrite(b); 1334 + } 1335 1336 if (!ptr_written) 1337 set_btree_node_need_rewrite(b); ··· 1688 1689 ra = kzalloc(sizeof(*ra), GFP_NOFS); 1690 if (!ra) 1691 + return bch_err_throw(c, ENOMEM_btree_node_read_all_replicas); 1692 1693 closure_init(&ra->cl, NULL); 1694 ra->c = c; ··· 1870 bch2_btree_node_hash_remove(&c->btree_cache, b); 1871 mutex_unlock(&c->btree_cache.lock); 1872 1873 + ret = bch_err_throw(c, btree_node_read_error); 1874 goto err; 1875 } 1876 ··· 2020 struct bch_fs *c = trans->c; 2021 2022 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_btree_node_scrub)) 2023 + return bch_err_throw(c, erofs_no_writes); 2024 2025 struct extent_ptr_decoded pick; 2026 int ret = bch2_bkey_pick_read_device(c, k, NULL, &pick, dev); ··· 2030 struct bch_dev *ca = bch2_dev_get_ioref(c, pick.ptr.dev, READ, 2031 BCH_DEV_READ_REF_btree_node_scrub); 2032 if (!ca) { 2033 + ret = bch_err_throw(c, device_offline); 2034 goto err; 2035 } 2036 ··· 2167 bch2_dev_list_has_dev(wbio->wbio.failed, ptr->dev)); 2168 2169 if (!bch2_bkey_nr_ptrs(bkey_i_to_s_c(&wbio->key))) { 2170 + ret = bch_err_throw(c, btree_node_write_all_failed); 2171 goto err; 2172 } 2173
+38 -40
fs/bcachefs/btree_iter.c
··· 890 891 static noinline int btree_node_iter_and_journal_peek(struct btree_trans *trans, 892 struct btree_path *path, 893 - unsigned flags, 894 - struct bkey_buf *out) 895 { 896 struct bch_fs *c = trans->c; 897 struct btree_path_level *l = path_l(path); ··· 914 goto err; 915 } 916 917 - bch2_bkey_buf_reassemble(out, c, k); 918 919 if ((flags & BTREE_ITER_prefetch) && 920 c->opts.btree_node_prefetch) ··· 923 err: 924 bch2_btree_and_journal_iter_exit(&jiter); 925 return ret; 926 } 927 928 static __always_inline int btree_path_down(struct btree_trans *trans, ··· 951 struct btree *b; 952 unsigned level = path->level - 1; 953 enum six_lock_type lock_type = __btree_lock_want(path, level); 954 - struct bkey_buf tmp; 955 int ret; 956 957 EBUG_ON(!btree_node_locked(path, path->level)); 958 959 - bch2_bkey_buf_init(&tmp); 960 - 961 if (unlikely(trans->journal_replay_not_finished)) { 962 - ret = btree_node_iter_and_journal_peek(trans, path, flags, &tmp); 963 if (ret) 964 - goto err; 965 } else { 966 struct bkey_packed *k = bch2_btree_node_iter_peek(&l->iter, l->b); 967 - if (!k) { 968 - struct printbuf buf = PRINTBUF; 969 970 - prt_str(&buf, "node not found at pos "); 971 - bch2_bpos_to_text(&buf, path->pos); 972 - prt_str(&buf, " within parent node "); 973 - bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&l->b->key)); 974 975 - bch2_fs_fatal_error(c, "%s", buf.buf); 976 - printbuf_exit(&buf); 977 - ret = -BCH_ERR_btree_need_topology_repair; 978 - goto err; 979 - } 980 - 981 - bch2_bkey_buf_unpack(&tmp, c, l->b, k); 982 - 983 - if ((flags & BTREE_ITER_prefetch) && 984 c->opts.btree_node_prefetch) { 985 ret = btree_path_prefetch(trans, path); 986 if (ret) 987 - goto err; 988 } 989 } 990 991 - b = bch2_btree_node_get(trans, path, tmp.k, level, lock_type, trace_ip); 992 ret = PTR_ERR_OR_ZERO(b); 993 if (unlikely(ret)) 994 - goto err; 995 996 - if (likely(!trans->journal_replay_not_finished && 997 - tmp.k->k.type == KEY_TYPE_btree_ptr_v2) && 998 - unlikely(b != btree_node_mem_ptr(tmp.k))) 999 btree_node_mem_ptr_set(trans, path, level + 1, b); 1000 1001 if (btree_node_read_locked(path, level + 1)) ··· 994 bch2_btree_path_level_init(trans, path, b); 995 996 bch2_btree_path_verify_locks(trans, path); 997 - err: 998 - bch2_bkey_buf_exit(&tmp, c); 999 - return ret; 1000 } 1001 1002 static int bch2_btree_path_traverse_all(struct btree_trans *trans) ··· 1006 int ret = 0; 1007 1008 if (trans->in_traverse_all) 1009 - return -BCH_ERR_transaction_restart_in_traverse_all; 1010 1011 trans->in_traverse_all = true; 1012 retry_all: ··· 3568 struct btree_bkey_cached_common *b) 3569 { 3570 struct six_lock_count c = six_lock_counts(&b->lock); 3571 - struct task_struct *owner; 3572 pid_t pid; 3573 3574 - rcu_read_lock(); 3575 - owner = READ_ONCE(b->lock.owner); 3576 - pid = owner ? owner->pid : 0; 3577 - rcu_read_unlock(); 3578 3579 prt_printf(out, "\t%px %c ", b, b->cached ? 'c' : 'b'); 3580 bch2_btree_id_to_text(out, b->btree_id); ··· 3602 prt_printf(out, "%i %s\n", task ? task->pid : 0, trans->fn); 3603 3604 /* trans->paths is rcu protected vs. freeing */ 3605 - rcu_read_lock(); 3606 out->atomic++; 3607 3608 struct btree_path *paths = rcu_dereference(trans->paths); ··· 3645 } 3646 out: 3647 --out->atomic; 3648 - rcu_read_unlock(); 3649 } 3650 3651 void bch2_fs_btree_iter_exit(struct bch_fs *c)
··· 890 891 static noinline int btree_node_iter_and_journal_peek(struct btree_trans *trans, 892 struct btree_path *path, 893 + unsigned flags) 894 { 895 struct bch_fs *c = trans->c; 896 struct btree_path_level *l = path_l(path); ··· 915 goto err; 916 } 917 918 + bkey_reassemble(&trans->btree_path_down, k); 919 920 if ((flags & BTREE_ITER_prefetch) && 921 c->opts.btree_node_prefetch) ··· 924 err: 925 bch2_btree_and_journal_iter_exit(&jiter); 926 return ret; 927 + } 928 + 929 + static noinline_for_stack int btree_node_missing_err(struct btree_trans *trans, 930 + struct btree_path *path) 931 + { 932 + struct bch_fs *c = trans->c; 933 + struct printbuf buf = PRINTBUF; 934 + 935 + prt_str(&buf, "node not found at pos "); 936 + bch2_bpos_to_text(&buf, path->pos); 937 + prt_str(&buf, " within parent node "); 938 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&path_l(path)->b->key)); 939 + 940 + bch2_fs_fatal_error(c, "%s", buf.buf); 941 + printbuf_exit(&buf); 942 + return bch_err_throw(c, btree_need_topology_repair); 943 } 944 945 static __always_inline int btree_path_down(struct btree_trans *trans, ··· 936 struct btree *b; 937 unsigned level = path->level - 1; 938 enum six_lock_type lock_type = __btree_lock_want(path, level); 939 int ret; 940 941 EBUG_ON(!btree_node_locked(path, path->level)); 942 943 if (unlikely(trans->journal_replay_not_finished)) { 944 + ret = btree_node_iter_and_journal_peek(trans, path, flags); 945 if (ret) 946 + return ret; 947 } else { 948 struct bkey_packed *k = bch2_btree_node_iter_peek(&l->iter, l->b); 949 + if (unlikely(!k)) 950 + return btree_node_missing_err(trans, path); 951 952 + bch2_bkey_unpack(l->b, &trans->btree_path_down, k); 953 954 + if (unlikely((flags & BTREE_ITER_prefetch)) && 955 c->opts.btree_node_prefetch) { 956 ret = btree_path_prefetch(trans, path); 957 if (ret) 958 + return ret; 959 } 960 } 961 962 + b = bch2_btree_node_get(trans, path, &trans->btree_path_down, 963 + level, lock_type, trace_ip); 964 ret = PTR_ERR_OR_ZERO(b); 965 if (unlikely(ret)) 966 + return ret; 967 968 + if (unlikely(b != btree_node_mem_ptr(&trans->btree_path_down)) && 969 + likely(!trans->journal_replay_not_finished && 970 + trans->btree_path_down.k.type == KEY_TYPE_btree_ptr_v2)) 971 btree_node_mem_ptr_set(trans, path, level + 1, b); 972 973 if (btree_node_read_locked(path, level + 1)) ··· 992 bch2_btree_path_level_init(trans, path, b); 993 994 bch2_btree_path_verify_locks(trans, path); 995 + return 0; 996 } 997 998 static int bch2_btree_path_traverse_all(struct btree_trans *trans) ··· 1006 int ret = 0; 1007 1008 if (trans->in_traverse_all) 1009 + return bch_err_throw(c, transaction_restart_in_traverse_all); 1010 1011 trans->in_traverse_all = true; 1012 retry_all: ··· 3568 struct btree_bkey_cached_common *b) 3569 { 3570 struct six_lock_count c = six_lock_counts(&b->lock); 3571 pid_t pid; 3572 3573 + scoped_guard(rcu) { 3574 + struct task_struct *owner = READ_ONCE(b->lock.owner); 3575 + pid = owner ? owner->pid : 0; 3576 + } 3577 3578 prt_printf(out, "\t%px %c ", b, b->cached ? 'c' : 'b'); 3579 bch2_btree_id_to_text(out, b->btree_id); ··· 3603 prt_printf(out, "%i %s\n", task ? task->pid : 0, trans->fn); 3604 3605 /* trans->paths is rcu protected vs. freeing */ 3606 + guard(rcu)(); 3607 out->atomic++; 3608 3609 struct btree_path *paths = rcu_dereference(trans->paths); ··· 3646 } 3647 out: 3648 --out->atomic; 3649 } 3650 3651 void bch2_fs_btree_iter_exit(struct bch_fs *c)
+21 -10
fs/bcachefs/btree_iter.h
··· 963 _p; \ 964 }) 965 966 - #define bch2_trans_run(_c, _do) \ 967 - ({ \ 968 - struct btree_trans *trans = bch2_trans_get(_c); \ 969 - int _ret = (_do); \ 970 - bch2_trans_put(trans); \ 971 - _ret; \ 972 - }) 973 - 974 - #define bch2_trans_do(_c, _do) bch2_trans_run(_c, lockrestart_do(trans, _do)) 975 - 976 struct btree_trans *__bch2_trans_get(struct bch_fs *, unsigned); 977 void bch2_trans_put(struct btree_trans *); 978 ··· 979 trans_fn_idx = bch2_trans_get_fn_idx(__func__); \ 980 __bch2_trans_get(_c, trans_fn_idx); \ 981 }) 982 983 void bch2_btree_trans_to_text(struct printbuf *, struct btree_trans *); 984
··· 963 _p; \ 964 }) 965 966 struct btree_trans *__bch2_trans_get(struct bch_fs *, unsigned); 967 void bch2_trans_put(struct btree_trans *); 968 ··· 989 trans_fn_idx = bch2_trans_get_fn_idx(__func__); \ 990 __bch2_trans_get(_c, trans_fn_idx); \ 991 }) 992 + 993 + /* 994 + * We don't use DEFINE_CLASS() because using a function for the constructor 995 + * breaks bch2_trans_get()'s use of __func__ 996 + */ 997 + typedef struct btree_trans * class_btree_trans_t; 998 + static inline void class_btree_trans_destructor(struct btree_trans **p) 999 + { 1000 + struct btree_trans *trans = *p; 1001 + bch2_trans_put(trans); 1002 + } 1003 + 1004 + #define class_btree_trans_constructor(_c) bch2_trans_get(_c) 1005 + 1006 + #define bch2_trans_run(_c, _do) \ 1007 + ({ \ 1008 + CLASS(btree_trans, trans)(_c); \ 1009 + (_do); \ 1010 + }) 1011 + 1012 + #define bch2_trans_do(_c, _do) bch2_trans_run(_c, lockrestart_do(trans, _do)) 1013 1014 void bch2_btree_trans_to_text(struct printbuf *, struct btree_trans *); 1015
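The CLASS()-based bch2_trans_run() leans on the kernel's scope-based cleanup helpers: the class_btree_trans typedef/destructor pair above makes CLASS(btree_trans, trans)(_c) declare a transaction whose bch2_trans_put() runs automatically when the statement expression's scope ends, the same mechanism behind the guard(rcu)()/scoped_guard(rcu) conversions elsewhere in this series. As a rough standalone illustration (not bcachefs code), the compiler feature underneath is __attribute__((cleanup)); trans_get()/trans_put() below are stand-ins, not the real API:

/*
 * Minimal sketch of scope-based cleanup: a variable annotated with
 * __attribute__((cleanup(fn))) has fn() called on its address
 * automatically at scope exit, on every return path.
 */
#include <stdio.h>
#include <stdlib.h>

struct trans { int id; };

static struct trans *trans_get(int id)
{
	struct trans *t = malloc(sizeof(*t));
	if (!t)
		exit(1);
	t->id = id;
	printf("get trans %d\n", id);
	return t;
}

/* cleanup handlers receive a pointer to the annotated variable */
static void trans_put(struct trans **tp)
{
	printf("put trans %d\n", (*tp)->id);
	free(*tp);
}

#define SCOPED_TRANS(name, id) \
	struct trans *name __attribute__((cleanup(trans_put))) = trans_get(id)

int main(void)
{
	SCOPED_TRANS(trans, 1);
	printf("using trans %d\n", trans->id);
	return 0;	/* trans_put() runs here automatically */
}

The value of the bch2_trans_run() statement expression is still (_do); only the explicit bch2_trans_put() call moves into the destructor.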
+7 -12
fs/bcachefs/btree_journal_iter.c
··· 292 if (!new_keys.data) { 293 bch_err(c, "%s: error allocating new key array (size %zu)", 294 __func__, new_keys.size); 295 - return -BCH_ERR_ENOMEM_journal_key_insert; 296 } 297 298 /* Since @keys was full, there was no gap: */ ··· 331 332 n = kmalloc(bkey_bytes(&k->k), GFP_KERNEL); 333 if (!n) 334 - return -BCH_ERR_ENOMEM_journal_key_insert; 335 336 bkey_copy(n, k); 337 ret = bch2_journal_key_insert_take(c, id, level, n); ··· 457 458 static struct bkey_s_c bch2_journal_iter_peek(struct journal_iter *iter) 459 { 460 - struct bkey_s_c ret = bkey_s_c_null; 461 - 462 journal_iter_verify(iter); 463 464 - rcu_read_lock(); 465 while (iter->idx < iter->keys->size) { 466 struct journal_key *k = iter->keys->data + iter->idx; 467 ··· 468 break; 469 BUG_ON(cmp); 470 471 - if (!k->overwritten) { 472 - ret = bkey_i_to_s_c(k->k); 473 - break; 474 - } 475 476 if (k->overwritten_range) 477 iter->idx = idx_to_pos(iter->keys, rcu_dereference(k->overwritten_range)->end); 478 else 479 bch2_journal_iter_advance(iter); 480 } 481 - rcu_read_unlock(); 482 483 - return ret; 484 } 485 486 static void bch2_journal_iter_exit(struct journal_iter *iter) ··· 736 if (keys->nr * 8 > keys->size * 7) { 737 bch_err(c, "Too many journal keys for slowpath; have %zu compacted, buf size %zu, processed %zu keys at seq %llu", 738 keys->nr, keys->size, nr_read, le64_to_cpu(i->j.seq)); 739 - return -BCH_ERR_ENOMEM_journal_keys_sort; 740 } 741 742 BUG_ON(darray_push(keys, n));
··· 292 if (!new_keys.data) { 293 bch_err(c, "%s: error allocating new key array (size %zu)", 294 __func__, new_keys.size); 295 + return bch_err_throw(c, ENOMEM_journal_key_insert); 296 } 297 298 /* Since @keys was full, there was no gap: */ ··· 331 332 n = kmalloc(bkey_bytes(&k->k), GFP_KERNEL); 333 if (!n) 334 + return bch_err_throw(c, ENOMEM_journal_key_insert); 335 336 bkey_copy(n, k); 337 ret = bch2_journal_key_insert_take(c, id, level, n); ··· 457 458 static struct bkey_s_c bch2_journal_iter_peek(struct journal_iter *iter) 459 { 460 journal_iter_verify(iter); 461 462 + guard(rcu)(); 463 while (iter->idx < iter->keys->size) { 464 struct journal_key *k = iter->keys->data + iter->idx; 465 ··· 470 break; 471 BUG_ON(cmp); 472 473 + if (!k->overwritten) 474 + return bkey_i_to_s_c(k->k); 475 476 if (k->overwritten_range) 477 iter->idx = idx_to_pos(iter->keys, rcu_dereference(k->overwritten_range)->end); 478 else 479 bch2_journal_iter_advance(iter); 480 } 481 482 + return bkey_s_c_null; 483 } 484 485 static void bch2_journal_iter_exit(struct journal_iter *iter) ··· 741 if (keys->nr * 8 > keys->size * 7) { 742 bch_err(c, "Too many journal keys for slowpath; have %zu compacted, buf size %zu, processed %zu keys at seq %llu", 743 keys->nr, keys->size, nr_read, le64_to_cpu(i->j.seq)); 744 + return bch_err_throw(c, ENOMEM_journal_keys_sort); 745 } 746 747 BUG_ON(darray_push(keys, n));
+12 -16
fs/bcachefs/btree_key_cache.c
··· 187 static struct bkey_cached * 188 bkey_cached_reuse(struct btree_key_cache *c) 189 { 190 - struct bucket_table *tbl; 191 struct rhash_head *pos; 192 struct bkey_cached *ck; 193 - unsigned i; 194 195 - rcu_read_lock(); 196 - tbl = rht_dereference_rcu(c->table.tbl, &c->table); 197 - for (i = 0; i < tbl->size; i++) 198 rht_for_each_entry_rcu(ck, pos, tbl, i, hash) { 199 if (!test_bit(BKEY_CACHED_DIRTY, &ck->flags) && 200 bkey_cached_lock_for_evict(ck)) { 201 if (bkey_cached_evict(c, ck)) 202 - goto out; 203 six_unlock_write(&ck->c.lock); 204 six_unlock_intent(&ck->c.lock); 205 } 206 } 207 - ck = NULL; 208 - out: 209 - rcu_read_unlock(); 210 - return ck; 211 } 212 213 static int btree_key_cache_create(struct btree_trans *trans, ··· 238 if (unlikely(!ck)) { 239 bch_err(c, "error allocating memory for key cache item, btree %s", 240 bch2_btree_id_str(ck_path->btree_id)); 241 - return -BCH_ERR_ENOMEM_btree_key_cache_create; 242 } 243 } 244 ··· 256 if (unlikely(!new_k)) { 257 bch_err(trans->c, "error allocating memory for key cache key, btree %s u64s %u", 258 bch2_btree_id_str(ck->key.btree_id), key_u64s); 259 - ret = -BCH_ERR_ENOMEM_btree_key_cache_fill; 260 } else if (ret) { 261 kfree(new_k); 262 goto err; ··· 822 823 bc->nr_pending = alloc_percpu(size_t); 824 if (!bc->nr_pending) 825 - return -BCH_ERR_ENOMEM_fs_btree_cache_init; 826 827 if (rcu_pending_init(&bc->pending[0], &c->btree_trans_barrier, __bkey_cached_free) || 828 rcu_pending_init(&bc->pending[1], &c->btree_trans_barrier, __bkey_cached_free)) 829 - return -BCH_ERR_ENOMEM_fs_btree_cache_init; 830 831 if (rhashtable_init(&bc->table, &bch2_btree_key_cache_params)) 832 - return -BCH_ERR_ENOMEM_fs_btree_cache_init; 833 834 bc->table_init_done = true; 835 836 shrink = shrinker_alloc(0, "%s-btree_key_cache", c->name); 837 if (!shrink) 838 - return -BCH_ERR_ENOMEM_fs_btree_cache_init; 839 bc->shrink = shrink; 840 shrink->count_objects = bch2_btree_key_cache_count; 841 shrink->scan_objects = bch2_btree_key_cache_scan;
··· 187 static struct bkey_cached * 188 bkey_cached_reuse(struct btree_key_cache *c) 189 { 190 + 191 + guard(rcu)(); 192 + struct bucket_table *tbl = rht_dereference_rcu(c->table.tbl, &c->table); 193 struct rhash_head *pos; 194 struct bkey_cached *ck; 195 196 + for (unsigned i = 0; i < tbl->size; i++) 197 rht_for_each_entry_rcu(ck, pos, tbl, i, hash) { 198 if (!test_bit(BKEY_CACHED_DIRTY, &ck->flags) && 199 bkey_cached_lock_for_evict(ck)) { 200 if (bkey_cached_evict(c, ck)) 201 + return ck; 202 six_unlock_write(&ck->c.lock); 203 six_unlock_intent(&ck->c.lock); 204 } 205 } 206 + return NULL; 207 } 208 209 static int btree_key_cache_create(struct btree_trans *trans, ··· 242 if (unlikely(!ck)) { 243 bch_err(c, "error allocating memory for key cache item, btree %s", 244 bch2_btree_id_str(ck_path->btree_id)); 245 + return bch_err_throw(c, ENOMEM_btree_key_cache_create); 246 } 247 } 248 ··· 260 if (unlikely(!new_k)) { 261 bch_err(trans->c, "error allocating memory for key cache key, btree %s u64s %u", 262 bch2_btree_id_str(ck->key.btree_id), key_u64s); 263 + ret = bch_err_throw(c, ENOMEM_btree_key_cache_fill); 264 } else if (ret) { 265 kfree(new_k); 266 goto err; ··· 826 827 bc->nr_pending = alloc_percpu(size_t); 828 if (!bc->nr_pending) 829 + return bch_err_throw(c, ENOMEM_fs_btree_cache_init); 830 831 if (rcu_pending_init(&bc->pending[0], &c->btree_trans_barrier, __bkey_cached_free) || 832 rcu_pending_init(&bc->pending[1], &c->btree_trans_barrier, __bkey_cached_free)) 833 + return bch_err_throw(c, ENOMEM_fs_btree_cache_init); 834 835 if (rhashtable_init(&bc->table, &bch2_btree_key_cache_params)) 836 + return bch_err_throw(c, ENOMEM_fs_btree_cache_init); 837 838 bc->table_init_done = true; 839 840 shrink = shrinker_alloc(0, "%s-btree_key_cache", c->name); 841 if (!shrink) 842 + return bch_err_throw(c, ENOMEM_fs_btree_cache_init); 843 bc->shrink = shrink; 844 shrink->count_objects = bch2_btree_key_cache_count; 845 shrink->scan_objects = bch2_btree_key_cache_scan;
+29 -27
fs/bcachefs/btree_locking.c
··· 194 return 3; 195 } 196 197 static noinline int break_cycle(struct lock_graph *g, struct printbuf *cycle, 198 struct trans_waiting_for_lock *from) 199 { ··· 243 } 244 } 245 246 - if (unlikely(!best)) { 247 - struct printbuf buf = PRINTBUF; 248 - buf.atomic++; 249 - 250 - prt_printf(&buf, bch2_fmt(g->g->trans->c, "cycle of nofail locks")); 251 - 252 - for (i = g->g; i < g->g + g->nr; i++) { 253 - struct btree_trans *trans = i->trans; 254 - 255 - bch2_btree_trans_to_text(&buf, trans); 256 - 257 - prt_printf(&buf, "backtrace:\n"); 258 - printbuf_indent_add(&buf, 2); 259 - bch2_prt_task_backtrace(&buf, trans->locking_wait.task, 2, GFP_NOWAIT); 260 - printbuf_indent_sub(&buf, 2); 261 - prt_newline(&buf); 262 - } 263 - 264 - bch2_print_str_nonblocking(g->g->trans->c, KERN_ERR, buf.buf); 265 - printbuf_exit(&buf); 266 - BUG(); 267 - } 268 269 ret = abort_lock(g, abort); 270 out: ··· 259 struct printbuf *cycle) 260 { 261 struct btree_trans *orig_trans = g->g->trans; 262 - struct trans_waiting_for_lock *i; 263 264 - for (i = g->g; i < g->g + g->nr; i++) 265 if (i->trans == trans) { 266 closure_put(&trans->ref); 267 return break_cycle(g, cycle, i); 268 } 269 270 - if (g->nr == ARRAY_SIZE(g->g)) { 271 closure_put(&trans->ref); 272 273 if (orig_trans->lock_may_not_fail) ··· 311 lock_graph_down(&g, trans); 312 313 /* trans->paths is rcu protected vs. freeing */ 314 - rcu_read_lock(); 315 if (cycle) 316 cycle->atomic++; 317 next: ··· 409 out: 410 if (cycle) 411 --cycle->atomic; 412 - rcu_read_unlock(); 413 return ret; 414 } 415
··· 194 return 3; 195 } 196 197 + static noinline __noreturn void break_cycle_fail(struct lock_graph *g) 198 + { 199 + struct printbuf buf = PRINTBUF; 200 + buf.atomic++; 201 + 202 + prt_printf(&buf, bch2_fmt(g->g->trans->c, "cycle of nofail locks")); 203 + 204 + for (struct trans_waiting_for_lock *i = g->g; i < g->g + g->nr; i++) { 205 + struct btree_trans *trans = i->trans; 206 + 207 + bch2_btree_trans_to_text(&buf, trans); 208 + 209 + prt_printf(&buf, "backtrace:\n"); 210 + printbuf_indent_add(&buf, 2); 211 + bch2_prt_task_backtrace(&buf, trans->locking_wait.task, 2, GFP_NOWAIT); 212 + printbuf_indent_sub(&buf, 2); 213 + prt_newline(&buf); 214 + } 215 + 216 + bch2_print_str_nonblocking(g->g->trans->c, KERN_ERR, buf.buf); 217 + printbuf_exit(&buf); 218 + BUG(); 219 + } 220 + 221 static noinline int break_cycle(struct lock_graph *g, struct printbuf *cycle, 222 struct trans_waiting_for_lock *from) 223 { ··· 219 } 220 } 221 222 + if (unlikely(!best)) 223 + break_cycle_fail(g); 224 225 ret = abort_lock(g, abort); 226 out: ··· 255 struct printbuf *cycle) 256 { 257 struct btree_trans *orig_trans = g->g->trans; 258 259 + for (struct trans_waiting_for_lock *i = g->g; i < g->g + g->nr; i++) 260 if (i->trans == trans) { 261 closure_put(&trans->ref); 262 return break_cycle(g, cycle, i); 263 } 264 265 + if (unlikely(g->nr == ARRAY_SIZE(g->g))) { 266 closure_put(&trans->ref); 267 268 if (orig_trans->lock_may_not_fail) ··· 308 lock_graph_down(&g, trans); 309 310 /* trans->paths is rcu protected vs. freeing */ 311 + guard(rcu)(); 312 if (cycle) 313 cycle->atomic++; 314 next: ··· 406 out: 407 if (cycle) 408 --cycle->atomic; 409 return ret; 410 } 411
+2
fs/bcachefs/btree_node_scan.c
··· 363 min_heap_sift_down(nodes_heap, 0, &found_btree_node_heap_cbs, NULL); 364 } 365 } 366 } 367 368 return 0;
··· 363 min_heap_sift_down(nodes_heap, 0, &found_btree_node_heap_cbs, NULL); 364 } 365 } 366 + 367 + cond_resched(); 368 } 369 370 return 0;
+25 -11
fs/bcachefs/btree_trans_commit.c
··· 376 struct btree *b, unsigned u64s) 377 { 378 if (!bch2_btree_node_insert_fits(b, u64s)) 379 - return -BCH_ERR_btree_insert_btree_node_full; 380 381 return 0; 382 } ··· 394 395 new_k = kmalloc(new_u64s * sizeof(u64), GFP_KERNEL); 396 if (!new_k) { 397 - bch_err(trans->c, "error allocating memory for key cache key, btree %s u64s %u", 398 bch2_btree_id_str(path->btree_id), new_u64s); 399 - return -BCH_ERR_ENOMEM_btree_key_cache_insert; 400 } 401 402 ret = bch2_trans_relock(trans) ?: ··· 433 if (watermark < BCH_WATERMARK_reclaim && 434 !test_bit(BKEY_CACHED_DIRTY, &ck->flags) && 435 bch2_btree_key_cache_must_wait(c)) 436 - return -BCH_ERR_btree_insert_need_journal_reclaim; 437 438 /* 439 * bch2_varint_decode can read past the end of the buffer by at most 7 ··· 895 */ 896 if ((flags & BCH_TRANS_COMMIT_journal_reclaim) && 897 watermark < BCH_WATERMARK_reclaim) { 898 - ret = -BCH_ERR_journal_reclaim_would_deadlock; 899 goto out; 900 } 901 ··· 967 968 for (struct jset_entry *i = btree_trans_journal_entries_start(trans); 969 i != btree_trans_journal_entries_top(trans); 970 - i = vstruct_next(i)) 971 if (i->type == BCH_JSET_ENTRY_btree_keys || 972 i->type == BCH_JSET_ENTRY_write_buffer_keys) { 973 - int ret = bch2_journal_key_insert(c, i->btree_id, i->level, i->start); 974 - if (ret) 975 - return ret; 976 } 977 978 for (struct bkey_i *i = btree_trans_subbuf_base(trans, &trans->accounting); 979 i != btree_trans_subbuf_top(trans, &trans->accounting); ··· 1025 if (unlikely(!test_bit(BCH_FS_may_go_rw, &c->flags))) 1026 ret = do_bch2_trans_commit_to_journal_replay(trans); 1027 else 1028 - ret = -BCH_ERR_erofs_trans_commit; 1029 goto out_reset; 1030 } 1031 ··· 1107 * restart: 1108 */ 1109 if (flags & BCH_TRANS_COMMIT_no_journal_res) { 1110 - ret = -BCH_ERR_transaction_restart_nested; 1111 goto out; 1112 } 1113
··· 376 struct btree *b, unsigned u64s) 377 { 378 if (!bch2_btree_node_insert_fits(b, u64s)) 379 + return bch_err_throw(trans->c, btree_insert_btree_node_full); 380 381 return 0; 382 } ··· 394 395 new_k = kmalloc(new_u64s * sizeof(u64), GFP_KERNEL); 396 if (!new_k) { 397 + struct bch_fs *c = trans->c; 398 + bch_err(c, "error allocating memory for key cache key, btree %s u64s %u", 399 bch2_btree_id_str(path->btree_id), new_u64s); 400 + return bch_err_throw(c, ENOMEM_btree_key_cache_insert); 401 } 402 403 ret = bch2_trans_relock(trans) ?: ··· 432 if (watermark < BCH_WATERMARK_reclaim && 433 !test_bit(BKEY_CACHED_DIRTY, &ck->flags) && 434 bch2_btree_key_cache_must_wait(c)) 435 + return bch_err_throw(c, btree_insert_need_journal_reclaim); 436 437 /* 438 * bch2_varint_decode can read past the end of the buffer by at most 7 ··· 894 */ 895 if ((flags & BCH_TRANS_COMMIT_journal_reclaim) && 896 watermark < BCH_WATERMARK_reclaim) { 897 + ret = bch_err_throw(c, journal_reclaim_would_deadlock); 898 goto out; 899 } 900 ··· 966 967 for (struct jset_entry *i = btree_trans_journal_entries_start(trans); 968 i != btree_trans_journal_entries_top(trans); 969 + i = vstruct_next(i)) { 970 if (i->type == BCH_JSET_ENTRY_btree_keys || 971 i->type == BCH_JSET_ENTRY_write_buffer_keys) { 972 + jset_entry_for_each_key(i, k) { 973 + int ret = bch2_journal_key_insert(c, i->btree_id, i->level, k); 974 + if (ret) 975 + return ret; 976 + } 977 } 978 + 979 + if (i->type == BCH_JSET_ENTRY_btree_root) { 980 + guard(mutex)(&c->btree_root_lock); 981 + 982 + struct btree_root *r = bch2_btree_id_root(c, i->btree_id); 983 + 984 + bkey_copy(&r->key, i->start); 985 + r->level = i->level; 986 + r->alive = true; 987 + } 988 + } 989 990 for (struct bkey_i *i = btree_trans_subbuf_base(trans, &trans->accounting); 991 i != btree_trans_subbuf_top(trans, &trans->accounting); ··· 1011 if (unlikely(!test_bit(BCH_FS_may_go_rw, &c->flags))) 1012 ret = do_bch2_trans_commit_to_journal_replay(trans); 1013 else 1014 + ret = bch_err_throw(c, erofs_trans_commit); 1015 goto out_reset; 1016 } 1017 ··· 1093 * restart: 1094 */ 1095 if (flags & BCH_TRANS_COMMIT_no_journal_res) { 1096 + ret = bch_err_throw(c, transaction_restart_nested); 1097 goto out; 1098 } 1099
+2
fs/bcachefs/btree_types.h
··· 555 unsigned journal_u64s; 556 unsigned extra_disk_res; /* XXX kill */ 557 558 #ifdef CONFIG_DEBUG_LOCK_ALLOC 559 struct lockdep_map dep_map; 560 #endif
··· 555 unsigned journal_u64s; 556 unsigned extra_disk_res; /* XXX kill */ 557 558 + __BKEY_PADDED(btree_path_down, BKEY_BTREE_PTR_VAL_U64s_MAX); 559 + 560 #ifdef CONFIG_DEBUG_LOCK_ALLOC 561 struct lockdep_map dep_map; 562 #endif
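The padded key added to struct btree_trans is the scratch space btree_path_down() now uses in place of the bkey_buf it previously kept on its own stack frame (see the btree_path_down() hunk earlier in this diff), trading a small per-transaction allocation for less stack in a hot, deeply nested path. A rough sketch of the general pattern, with hypothetical names rather than the real structures:

/*
 * Sketch: move a per-call scratch buffer into a long-lived context
 * object so the hot function doesn't carry it on its stack.
 */
#include <stdio.h>

struct ctx {
	/* preallocated once, sized for the worst case */
	char scratch[256];
};

static void hot_path(struct ctx *c, int i)
{
	/* before: char scratch[256]; lived here, on every call's stack */
	snprintf(c->scratch, sizeof(c->scratch), "work item %d", i);
}

int main(void)
{
	static struct ctx c;	/* long-lived, like struct btree_trans */

	for (int i = 0; i < 3; i++) {
		hot_path(&c, i);
		puts(c.scratch);
	}
	return 0;
}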
+19 -40
fs/bcachefs/btree_update.c
··· 123 } 124 125 int __bch2_insert_snapshot_whiteouts(struct btree_trans *trans, 126 - enum btree_id id, 127 - struct bpos old_pos, 128 - struct bpos new_pos) 129 { 130 - struct bch_fs *c = trans->c; 131 - struct btree_iter old_iter, new_iter = {}; 132 - struct bkey_s_c old_k, new_k; 133 - snapshot_id_list s; 134 - struct bkey_i *update; 135 int ret = 0; 136 137 - if (!bch2_snapshot_has_children(c, old_pos.snapshot)) 138 - return 0; 139 140 - darray_init(&s); 141 - 142 - bch2_trans_iter_init(trans, &old_iter, id, old_pos, 143 - BTREE_ITER_not_extents| 144 - BTREE_ITER_all_snapshots); 145 - while ((old_k = bch2_btree_iter_prev(trans, &old_iter)).k && 146 - !(ret = bkey_err(old_k)) && 147 - bkey_eq(old_pos, old_k.k->p)) { 148 - struct bpos whiteout_pos = 149 - SPOS(new_pos.inode, new_pos.offset, old_k.k->p.snapshot); 150 - 151 - if (!bch2_snapshot_is_ancestor(c, old_k.k->p.snapshot, old_pos.snapshot) || 152 - snapshot_list_has_ancestor(c, &s, old_k.k->p.snapshot)) 153 - continue; 154 - 155 - new_k = bch2_bkey_get_iter(trans, &new_iter, id, whiteout_pos, 156 - BTREE_ITER_not_extents| 157 - BTREE_ITER_intent); 158 - ret = bkey_err(new_k); 159 if (ret) 160 break; 161 162 - if (new_k.k->type == KEY_TYPE_deleted) { 163 - update = bch2_trans_kmalloc(trans, sizeof(struct bkey_i)); 164 ret = PTR_ERR_OR_ZERO(update); 165 - if (ret) 166 break; 167 168 bkey_init(&update->k); 169 - update->k.p = whiteout_pos; 170 update->k.type = KEY_TYPE_whiteout; 171 172 - ret = bch2_trans_update(trans, &new_iter, update, 173 BTREE_UPDATE_internal_snapshot_node); 174 } 175 - bch2_trans_iter_exit(trans, &new_iter); 176 177 - ret = snapshot_list_add(c, &s, old_k.k->p.snapshot); 178 if (ret) 179 break; 180 } 181 - bch2_trans_iter_exit(trans, &new_iter); 182 - bch2_trans_iter_exit(trans, &old_iter); 183 - darray_exit(&s); 184 185 return ret; 186 } 187 ··· 587 BUG_ON(k.k->type != KEY_TYPE_deleted); 588 589 if (bkey_gt(k.k->p, end)) { 590 - ret = -BCH_ERR_ENOSPC_btree_slot; 591 goto err; 592 } 593
··· 123 } 124 125 int __bch2_insert_snapshot_whiteouts(struct btree_trans *trans, 126 + enum btree_id btree, struct bpos pos, 127 + snapshot_id_list *s) 128 { 129 int ret = 0; 130 131 + darray_for_each(*s, id) { 132 + pos.snapshot = *id; 133 134 + struct btree_iter iter; 135 + struct bkey_s_c k = bch2_bkey_get_iter(trans, &iter, btree, pos, 136 + BTREE_ITER_not_extents| 137 + BTREE_ITER_intent); 138 + ret = bkey_err(k); 139 if (ret) 140 break; 141 142 + if (k.k->type == KEY_TYPE_deleted) { 143 + struct bkey_i *update = bch2_trans_kmalloc(trans, sizeof(struct bkey_i)); 144 ret = PTR_ERR_OR_ZERO(update); 145 + if (ret) { 146 + bch2_trans_iter_exit(trans, &iter); 147 break; 148 + } 149 150 bkey_init(&update->k); 151 + update->k.p = pos; 152 update->k.type = KEY_TYPE_whiteout; 153 154 + ret = bch2_trans_update(trans, &iter, update, 155 BTREE_UPDATE_internal_snapshot_node); 156 } 157 + bch2_trans_iter_exit(trans, &iter); 158 159 if (ret) 160 break; 161 } 162 163 + darray_exit(s); 164 return ret; 165 } 166 ··· 608 BUG_ON(k.k->type != KEY_TYPE_deleted); 609 610 if (bkey_gt(k.k->p, end)) { 611 + ret = bch_err_throw(trans->c, ENOSPC_btree_slot); 612 goto err; 613 } 614
+12 -2
fs/bcachefs/btree_update.h
··· 4 5 #include "btree_iter.h" 6 #include "journal.h" 7 8 struct bch_fs; 9 struct btree; ··· 75 } 76 77 int __bch2_insert_snapshot_whiteouts(struct btree_trans *, enum btree_id, 78 - struct bpos, struct bpos); 79 80 /* 81 * For use when splitting extents in existing snapshots: ··· 89 struct bpos old_pos, 90 struct bpos new_pos) 91 { 92 if (!btree_type_has_snapshots(btree) || 93 bkey_eq(old_pos, new_pos)) 94 return 0; 95 96 - return __bch2_insert_snapshot_whiteouts(trans, btree, old_pos, new_pos); 97 } 98 99 int bch2_trans_update_extent_overwrite(struct btree_trans *, struct btree_iter *,
··· 4 5 #include "btree_iter.h" 6 #include "journal.h" 7 + #include "snapshot.h" 8 9 struct bch_fs; 10 struct btree; ··· 74 } 75 76 int __bch2_insert_snapshot_whiteouts(struct btree_trans *, enum btree_id, 77 + struct bpos, snapshot_id_list *); 78 79 /* 80 * For use when splitting extents in existing snapshots: ··· 88 struct bpos old_pos, 89 struct bpos new_pos) 90 { 91 + BUG_ON(old_pos.snapshot != new_pos.snapshot); 92 + 93 if (!btree_type_has_snapshots(btree) || 94 bkey_eq(old_pos, new_pos)) 95 return 0; 96 97 + snapshot_id_list s; 98 + int ret = bch2_get_snapshot_overwrites(trans, btree, old_pos, &s); 99 + if (ret) 100 + return ret; 101 + 102 + return s.nr 103 + ? __bch2_insert_snapshot_whiteouts(trans, btree, new_pos, &s) 104 + : 0; 105 } 106 107 int bch2_trans_update_extent_overwrite(struct btree_trans *, struct btree_iter *,
+61 -43
fs/bcachefs/btree_update_interior.c
··· 57 struct bkey_buf prev; 58 int ret = 0; 59 60 - printbuf_indent_add_nextline(&buf, 2); 61 - 62 BUG_ON(b->key.k.type == KEY_TYPE_btree_ptr_v2 && 63 !bpos_eq(bkey_i_to_btree_ptr_v2(&b->key)->v.min_key, 64 b->data->min_key)); ··· 67 68 if (b == btree_node_root(c, b)) { 69 if (!bpos_eq(b->data->min_key, POS_MIN)) { 70 - ret = __bch2_topology_error(c, &buf); 71 - 72 bch2_bpos_to_text(&buf, b->data->min_key); 73 - log_fsck_err(trans, btree_root_bad_min_key, 74 - "btree root with incorrect min_key: %s", buf.buf); 75 - goto out; 76 } 77 78 if (!bpos_eq(b->data->max_key, SPOS_MAX)) { 79 - ret = __bch2_topology_error(c, &buf); 80 bch2_bpos_to_text(&buf, b->data->max_key); 81 - log_fsck_err(trans, btree_root_bad_max_key, 82 - "btree root with incorrect max_key: %s", buf.buf); 83 - goto out; 84 } 85 } 86 ··· 101 : bpos_successor(prev.k->k.p); 102 103 if (!bpos_eq(expected_min, bp.v->min_key)) { 104 - ret = __bch2_topology_error(c, &buf); 105 - 106 - prt_str(&buf, "end of prev node doesn't match start of next node\nin "); 107 - bch2_btree_id_level_to_text(&buf, b->c.btree_id, b->c.level); 108 - prt_str(&buf, " node "); 109 - bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 110 prt_str(&buf, "\nprev "); 111 bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(prev.k)); 112 prt_str(&buf, "\nnext "); 113 bch2_bkey_val_to_text(&buf, c, k); 114 115 - log_fsck_err(trans, btree_node_topology_bad_min_key, "%s", buf.buf); 116 - goto out; 117 } 118 119 bch2_bkey_buf_reassemble(&prev, c, k); ··· 117 } 118 119 if (bkey_deleted(&prev.k->k)) { 120 - ret = __bch2_topology_error(c, &buf); 121 122 - prt_str(&buf, "empty interior node\nin "); 123 - bch2_btree_id_level_to_text(&buf, b->c.btree_id, b->c.level); 124 - prt_str(&buf, " node "); 125 - bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 126 - 127 - log_fsck_err(trans, btree_node_topology_empty_interior_node, "%s", buf.buf); 128 - } else if (!bpos_eq(prev.k->k.p, b->key.k.p)) { 129 - ret = __bch2_topology_error(c, &buf); 130 - 131 - prt_str(&buf, "last child node doesn't end at end of parent node\nin "); 132 - bch2_btree_id_level_to_text(&buf, b->c.btree_id, b->c.level); 133 - prt_str(&buf, " node "); 134 - bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 135 - prt_str(&buf, "\nlast key "); 136 bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(prev.k)); 137 138 - log_fsck_err(trans, btree_node_topology_bad_max_key, "%s", buf.buf); 139 } 140 out: 141 - fsck_err: 142 bch2_btree_and_journal_iter_exit(&iter); 143 bch2_bkey_buf_exit(&prev, c); 144 printbuf_exit(&buf); 145 return ret; 146 } 147 148 /* Calculate ideal packed bkey format for new btree nodes: */ ··· 684 685 /* 686 * Wait for any in flight writes to finish before we free the old nodes 687 - * on disk: 688 */ 689 for (i = 0; i < as->nr_old_nodes; i++) { 690 b = as->old_nodes[i]; 691 692 - if (btree_node_seq_matches(b, as->old_nodes_seq[i])) 693 wait_on_bit_io(&b->flags, BTREE_NODE_write_in_flight_inner, 694 TASK_UNINTERRUPTIBLE); 695 } ··· 1263 if (bch2_err_matches(ret, ENOSPC) && 1264 (flags & BCH_TRANS_COMMIT_journal_reclaim) && 1265 watermark < BCH_WATERMARK_reclaim) { 1266 - ret = -BCH_ERR_journal_reclaim_would_deadlock; 1267 goto err; 1268 } 1269 ··· 2196 if (btree_iter_path(trans, iter)->l[b->c.level].b != b) { 2197 /* node has been freed: */ 2198 BUG_ON(!btree_node_dying(b)); 2199 - ret = -BCH_ERR_btree_node_dying; 2200 goto err; 2201 } 2202 ··· 2810 c->btree_interior_update_worker = 2811 alloc_workqueue("btree_update", WQ_UNBOUND|WQ_MEM_RECLAIM, 8); 2812 if 
(!c->btree_interior_update_worker) 2813 - return -BCH_ERR_ENOMEM_btree_interior_update_worker_init; 2814 2815 c->btree_node_rewrite_worker = 2816 alloc_ordered_workqueue("btree_node_rewrite", WQ_UNBOUND); 2817 if (!c->btree_node_rewrite_worker) 2818 - return -BCH_ERR_ENOMEM_btree_interior_update_worker_init; 2819 2820 if (mempool_init_kmalloc_pool(&c->btree_interior_update_pool, 1, 2821 sizeof(struct btree_update))) 2822 - return -BCH_ERR_ENOMEM_btree_interior_update_pool_init; 2823 2824 return 0; 2825 }
··· 57 struct bkey_buf prev; 58 int ret = 0; 59 60 BUG_ON(b->key.k.type == KEY_TYPE_btree_ptr_v2 && 61 !bpos_eq(bkey_i_to_btree_ptr_v2(&b->key)->v.min_key, 62 b->data->min_key)); ··· 69 70 if (b == btree_node_root(c, b)) { 71 if (!bpos_eq(b->data->min_key, POS_MIN)) { 72 + bch2_log_msg_start(c, &buf); 73 + prt_printf(&buf, "btree root with incorrect min_key: "); 74 bch2_bpos_to_text(&buf, b->data->min_key); 75 + prt_newline(&buf); 76 + 77 + bch2_count_fsck_err(c, btree_root_bad_min_key, &buf); 78 + goto err; 79 } 80 81 if (!bpos_eq(b->data->max_key, SPOS_MAX)) { 82 + bch2_log_msg_start(c, &buf); 83 + prt_printf(&buf, "btree root with incorrect max_key: "); 84 bch2_bpos_to_text(&buf, b->data->max_key); 85 + prt_newline(&buf); 86 + 87 + bch2_count_fsck_err(c, btree_root_bad_max_key, &buf); 88 + goto err; 89 } 90 } 91 ··· 100 : bpos_successor(prev.k->k.p); 101 102 if (!bpos_eq(expected_min, bp.v->min_key)) { 103 + prt_str(&buf, "end of prev node doesn't match start of next node"); 104 prt_str(&buf, "\nprev "); 105 bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(prev.k)); 106 prt_str(&buf, "\nnext "); 107 bch2_bkey_val_to_text(&buf, c, k); 108 + prt_newline(&buf); 109 110 + bch2_count_fsck_err(c, btree_node_topology_bad_min_key, &buf); 111 + goto err; 112 } 113 114 bch2_bkey_buf_reassemble(&prev, c, k); ··· 120 } 121 122 if (bkey_deleted(&prev.k->k)) { 123 + prt_printf(&buf, "empty interior node\n"); 124 + bch2_count_fsck_err(c, btree_node_topology_empty_interior_node, &buf); 125 + goto err; 126 + } 127 128 + if (!bpos_eq(prev.k->k.p, b->key.k.p)) { 129 + prt_str(&buf, "last child node doesn't end at end of parent node\nchild: "); 130 bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(prev.k)); 131 + prt_newline(&buf); 132 133 + bch2_count_fsck_err(c, btree_node_topology_bad_max_key, &buf); 134 + goto err; 135 } 136 out: 137 bch2_btree_and_journal_iter_exit(&iter); 138 bch2_bkey_buf_exit(&prev, c); 139 printbuf_exit(&buf); 140 return ret; 141 + err: 142 + bch2_btree_id_level_to_text(&buf, b->c.btree_id, b->c.level); 143 + prt_char(&buf, ' '); 144 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 145 + prt_newline(&buf); 146 + 147 + ret = __bch2_topology_error(c, &buf); 148 + bch2_print_str(c, KERN_ERR, buf.buf); 149 + BUG_ON(!ret); 150 + goto out; 151 } 152 153 /* Calculate ideal packed bkey format for new btree nodes: */ ··· 685 686 /* 687 * Wait for any in flight writes to finish before we free the old nodes 688 + * on disk. But we haven't pinned those old nodes in the btree cache, 689 + * they might have already been evicted. 690 + * 691 + * The update we're completing deleted references to those nodes from the 692 + * btree, so we know if they've been evicted they can't be pulled back in. 693 + * We just have to check if the nodes we have pointers to are still those 694 + * old nodes, and haven't been reused. 695 + * 696 + * This can't be done locklessly because the data buffer might have been 697 + * vmalloc allocated, and they're not RCU freed. We also need the 698 + * __no_kmsan_checks annotation because even with the btree node read 699 + * lock, nothing tells us that the data buffer has been initialized (if 700 + * the btree node has been reused for a different node, and the data 701 + * buffer swapped for a new data buffer). 
702 */ 703 for (i = 0; i < as->nr_old_nodes; i++) { 704 b = as->old_nodes[i]; 705 706 + bch2_trans_begin(trans); 707 + btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_read); 708 + bool seq_matches = btree_node_seq_matches(b, as->old_nodes_seq[i]); 709 + six_unlock_read(&b->c.lock); 710 + bch2_trans_unlock_long(trans); 711 + 712 + if (seq_matches) 713 wait_on_bit_io(&b->flags, BTREE_NODE_write_in_flight_inner, 714 TASK_UNINTERRUPTIBLE); 715 } ··· 1245 if (bch2_err_matches(ret, ENOSPC) && 1246 (flags & BCH_TRANS_COMMIT_journal_reclaim) && 1247 watermark < BCH_WATERMARK_reclaim) { 1248 + ret = bch_err_throw(c, journal_reclaim_would_deadlock); 1249 goto err; 1250 } 1251 ··· 2178 if (btree_iter_path(trans, iter)->l[b->c.level].b != b) { 2179 /* node has been freed: */ 2180 BUG_ON(!btree_node_dying(b)); 2181 + ret = bch_err_throw(trans->c, btree_node_dying); 2182 goto err; 2183 } 2184 ··· 2792 c->btree_interior_update_worker = 2793 alloc_workqueue("btree_update", WQ_UNBOUND|WQ_MEM_RECLAIM, 8); 2794 if (!c->btree_interior_update_worker) 2795 + return bch_err_throw(c, ENOMEM_btree_interior_update_worker_init); 2796 2797 c->btree_node_rewrite_worker = 2798 alloc_ordered_workqueue("btree_node_rewrite", WQ_UNBOUND); 2799 if (!c->btree_node_rewrite_worker) 2800 + return bch_err_throw(c, ENOMEM_btree_interior_update_worker_init); 2801 2802 if (mempool_init_kmalloc_pool(&c->btree_interior_update_pool, 1, 2803 sizeof(struct btree_update))) 2804 + return bch_err_throw(c, ENOMEM_btree_interior_update_pool_init); 2805 2806 return 0; 2807 }
+3 -3
fs/bcachefs/btree_write_buffer.c
··· 394 bool accounting_accumulated = false; 395 do { 396 if (race_fault()) { 397 - ret = -BCH_ERR_journal_reclaim_would_deadlock; 398 break; 399 } 400 ··· 633 struct bch_fs *c = trans->c; 634 635 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_btree_write_buffer)) 636 - return -BCH_ERR_erofs_no_writes; 637 638 int ret = bch2_btree_write_buffer_flush_nocheck_rw(trans); 639 enumerated_ref_put(&c->writes, BCH_WRITE_REF_btree_write_buffer); ··· 676 goto err; 677 678 bch2_bkey_buf_copy(last_flushed, c, tmp.k); 679 - ret = -BCH_ERR_transaction_restart_write_buffer_flush; 680 } 681 err: 682 bch2_bkey_buf_exit(&tmp, c);
··· 394 bool accounting_accumulated = false; 395 do { 396 if (race_fault()) { 397 + ret = bch_err_throw(c, journal_reclaim_would_deadlock); 398 break; 399 } 400 ··· 633 struct bch_fs *c = trans->c; 634 635 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_btree_write_buffer)) 636 + return bch_err_throw(c, erofs_no_writes); 637 638 int ret = bch2_btree_write_buffer_flush_nocheck_rw(trans); 639 enumerated_ref_put(&c->writes, BCH_WRITE_REF_btree_write_buffer); ··· 676 goto err; 677 678 bch2_bkey_buf_copy(last_flushed, c, tmp.k); 679 + ret = bch_err_throw(c, transaction_restart_write_buffer_flush); 680 } 681 err: 682 bch2_bkey_buf_exit(&tmp, c);
+99 -64
fs/bcachefs/buckets.c
··· 221 bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 222 if (!p.ptr.cached && 223 data_type == BCH_DATA_btree) { 224 g->data_type = data_type; 225 g->stripe_sectors = 0; 226 g->dirty_sectors = 0; ··· 284 struct printbuf buf = PRINTBUF; 285 int ret = 0; 286 287 bkey_for_each_ptr_decode(k.k, ptrs_c, p, entry_c) { 288 ret = bch2_check_fix_ptr(trans, k, p, entry_c, &do_update); 289 if (ret) ··· 294 } 295 296 if (do_update) { 297 - if (flags & BTREE_TRIGGER_is_root) { 298 - bch_err(c, "cannot update btree roots yet"); 299 - ret = -EINVAL; 300 - goto err; 301 - } 302 - 303 struct bkey_i *new = bch2_bkey_make_mut_noupdate(trans, k); 304 ret = PTR_ERR_OR_ZERO(new); 305 if (ret) 306 goto err; 307 308 - rcu_read_lock(); 309 - bch2_bkey_drop_ptrs(bkey_i_to_s(new), ptr, !bch2_dev_exists(c, ptr->dev)); 310 - rcu_read_unlock(); 311 312 if (level) { 313 /* ··· 309 * sort it out: 310 */ 311 struct bkey_ptrs ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 312 - rcu_read_lock(); 313 - bkey_for_each_ptr(ptrs, ptr) { 314 - struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 315 - struct bucket *g = PTR_GC_BUCKET(ca, ptr); 316 - 317 - ptr->gen = g->gen; 318 - } 319 - rcu_read_unlock(); 320 } else { 321 struct bkey_ptrs ptrs; 322 union bch_extent_entry *entry; ··· 377 bch_info(c, "new key %s", buf.buf); 378 } 379 380 - struct btree_iter iter; 381 - bch2_trans_node_iter_init(trans, &iter, btree, new->k.p, 0, level, 382 - BTREE_ITER_intent|BTREE_ITER_all_snapshots); 383 - ret = bch2_btree_iter_traverse(trans, &iter) ?: 384 - bch2_trans_update(trans, &iter, new, 385 - BTREE_UPDATE_internal_snapshot_node| 386 - BTREE_TRIGGER_norun); 387 - bch2_trans_iter_exit(trans, &iter); 388 - if (ret) 389 - goto err; 390 391 - if (level) 392 - bch2_btree_node_update_key_early(trans, btree, level - 1, k, new); 393 } 394 err: 395 printbuf_exit(&buf); ··· 435 if (insert) { 436 bch2_trans_updates_to_text(buf, trans); 437 __bch2_inconsistent_error(c, buf); 438 - ret = -BCH_ERR_bucket_ref_update; 439 } 440 441 if (print || insert) ··· 632 struct bch_dev *ca = bch2_dev_tryget(c, p.ptr.dev); 633 if (unlikely(!ca)) { 634 if (insert && p.ptr.dev != BCH_SB_MEMBER_INVALID) 635 - ret = -BCH_ERR_trigger_pointer; 636 goto err; 637 } 638 ··· 640 if (!bucket_valid(ca, bucket.offset)) { 641 if (insert) { 642 bch2_dev_bucket_missing(ca, bucket.offset); 643 - ret = -BCH_ERR_trigger_pointer; 644 } 645 goto err; 646 } ··· 662 if (bch2_fs_inconsistent_on(!g, c, "reference to invalid bucket on device %u\n %s", 663 p.ptr.dev, 664 (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 665 - ret = -BCH_ERR_trigger_pointer; 666 goto err; 667 } 668 ··· 688 s64 sectors, 689 enum btree_iter_update_trigger_flags flags) 690 { 691 if (flags & BTREE_TRIGGER_transactional) { 692 struct btree_iter iter; 693 struct bkey_i_stripe *s = bch2_bkey_get_mut_typed(trans, &iter, ··· 707 bch2_trans_inconsistent(trans, 708 "stripe pointer doesn't match stripe %llu", 709 (u64) p.ec.idx); 710 - ret = -BCH_ERR_trigger_stripe_pointer; 711 goto err; 712 } 713 ··· 727 } 728 729 if (flags & BTREE_TRIGGER_gc) { 730 - struct bch_fs *c = trans->c; 731 - 732 struct gc_stripe *m = genradix_ptr_alloc(&c->gc_stripes, p.ec.idx, GFP_KERNEL); 733 if (!m) { 734 bch_err(c, "error allocating memory for gc_stripes, idx %llu", 735 (u64) p.ec.idx); 736 - return -BCH_ERR_ENOMEM_mark_stripe_ptr; 737 } 738 739 gc_stripe_lock(m); ··· 746 __bch2_inconsistent_error(c, &buf); 747 bch2_print_str(c, KERN_ERR, buf.buf); 748 printbuf_exit(&buf); 749 - return -BCH_ERR_trigger_stripe_pointer; 750 } 751 752 
m->block_sectors[p.ec.block] += sectors; ··· 769 static int __trigger_extent(struct btree_trans *trans, 770 enum btree_id btree_id, unsigned level, 771 struct bkey_s_c k, 772 - enum btree_iter_update_trigger_flags flags, 773 - s64 *replicas_sectors) 774 { 775 bool gc = flags & BTREE_TRIGGER_gc; 776 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); ··· 779 ? BCH_DATA_btree 780 : BCH_DATA_user; 781 int ret = 0; 782 783 struct disk_accounting_pos acc_replicas_key; 784 memset(&acc_replicas_key, 0, sizeof(acc_replicas_key)); ··· 808 if (ret) 809 return ret; 810 } else if (!p.has_ec) { 811 - *replicas_sectors += disk_sectors; 812 replicas_entry_add_dev(&acc_replicas_key.replicas, p.ptr.dev); 813 } else { 814 ret = bch2_trigger_stripe_ptr(trans, k, p, data_type, disk_sectors, flags); ··· 846 } 847 848 if (acc_replicas_key.replicas.nr_devs) { 849 - ret = bch2_disk_accounting_mod(trans, &acc_replicas_key, replicas_sectors, 1, gc); 850 if (ret) 851 return ret; 852 } 853 854 if (acc_replicas_key.replicas.nr_devs && !level && k.k->p.snapshot) { 855 - ret = bch2_disk_accounting_mod2_nr(trans, gc, replicas_sectors, 1, snapshot, k.k->p.snapshot); 856 if (ret) 857 return ret; 858 } ··· 868 } 869 870 if (level) { 871 - ret = bch2_disk_accounting_mod2_nr(trans, gc, replicas_sectors, 1, btree, btree_id); 872 if (ret) 873 return ret; 874 } else { ··· 877 s64 v[3] = { 878 insert ? 1 : -1, 879 insert ? k.k->size : -((s64) k.k->size), 880 - *replicas_sectors, 881 }; 882 ret = bch2_disk_accounting_mod2(trans, gc, v, inum, k.k->p.inode); 883 if (ret) ··· 909 return 0; 910 911 if (flags & (BTREE_TRIGGER_transactional|BTREE_TRIGGER_gc)) { 912 - s64 old_replicas_sectors = 0, new_replicas_sectors = 0; 913 - 914 if (old.k->type) { 915 int ret = __trigger_extent(trans, btree, level, old, 916 - flags & ~BTREE_TRIGGER_insert, 917 - &old_replicas_sectors); 918 if (ret) 919 return ret; 920 } 921 922 if (new.k->type) { 923 int ret = __trigger_extent(trans, btree, level, new.s_c, 924 - flags & ~BTREE_TRIGGER_overwrite, 925 - &new_replicas_sectors); 926 if (ret) 927 return ret; 928 } ··· 1005 bch2_data_type_str(type), 1006 bch2_data_type_str(type)); 1007 1008 - bool print = bch2_count_fsck_err(c, bucket_metadata_type_mismatch, &buf); 1009 1010 - bch2_run_explicit_recovery_pass(c, &buf, 1011 BCH_RECOVERY_PASS_check_allocations, 0); 1012 1013 - if (print) 1014 - bch2_print_str(c, KERN_ERR, buf.buf); 1015 printbuf_exit(&buf); 1016 - ret = -BCH_ERR_metadata_bucket_inconsistency; 1017 goto err; 1018 } 1019 ··· 1067 err_unlock: 1068 bucket_unlock(g); 1069 err: 1070 - return -BCH_ERR_metadata_bucket_inconsistency; 1071 } 1072 1073 int bch2_trans_mark_metadata_bucket(struct btree_trans *trans, ··· 1282 ret = 0; 1283 } else { 1284 atomic64_set(&c->sectors_available, sectors_available); 1285 - ret = -BCH_ERR_ENOSPC_disk_reservation; 1286 } 1287 1288 mutex_unlock(&c->sectors_available_lock); ··· 1311 GFP_KERNEL|__GFP_ZERO); 1312 if (!ca->buckets_nouse) { 1313 bch2_dev_put(ca); 1314 - return -BCH_ERR_ENOMEM_buckets_nouse; 1315 } 1316 } 1317 ··· 1336 lockdep_assert_held(&c->state_lock); 1337 1338 if (resize && ca->buckets_nouse) 1339 - return -BCH_ERR_no_resize_with_buckets_nouse; 1340 1341 bucket_gens = bch2_kvmalloc(struct_size(bucket_gens, b, nbuckets), 1342 GFP_KERNEL|__GFP_ZERO); 1343 if (!bucket_gens) { 1344 - ret = -BCH_ERR_ENOMEM_bucket_gens; 1345 goto err; 1346 } 1347 ··· 1360 sizeof(bucket_gens->b[0]) * copy); 1361 } 1362 1363 - ret = bch2_bucket_bitmap_resize(&ca->bucket_backpointer_mismatch, 1364 ca->mi.nbuckets, nbuckets) ?: 1365 
- bch2_bucket_bitmap_resize(&ca->bucket_backpointer_empty, 1366 ca->mi.nbuckets, nbuckets); 1367 1368 rcu_assign_pointer(ca->bucket_gens, bucket_gens); ··· 1389 { 1390 ca->usage = alloc_percpu(struct bch_dev_usage_full); 1391 if (!ca->usage) 1392 - return -BCH_ERR_ENOMEM_usage_init; 1393 1394 return bch2_dev_buckets_resize(c, ca, ca->mi.nbuckets); 1395 }
··· 221 bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 222 if (!p.ptr.cached && 223 data_type == BCH_DATA_btree) { 224 + switch (g->data_type) { 225 + case BCH_DATA_sb: 226 + bch_err(c, "btree and superblock in the same bucket - cannot repair"); 227 + ret = bch_err_throw(c, fsck_repair_unimplemented); 228 + goto out; 229 + case BCH_DATA_journal: 230 + ret = bch2_dev_journal_bucket_delete(ca, PTR_BUCKET_NR(ca, &p.ptr)); 231 + bch_err_msg(c, ret, "error deleting journal bucket %zu", 232 + PTR_BUCKET_NR(ca, &p.ptr)); 233 + if (ret) 234 + goto out; 235 + break; 236 + } 237 + 238 g->data_type = data_type; 239 g->stripe_sectors = 0; 240 g->dirty_sectors = 0; ··· 270 struct printbuf buf = PRINTBUF; 271 int ret = 0; 272 273 + /* We don't yet do btree key updates correctly for when we're RW */ 274 + BUG_ON(test_bit(BCH_FS_rw, &c->flags)); 275 + 276 bkey_for_each_ptr_decode(k.k, ptrs_c, p, entry_c) { 277 ret = bch2_check_fix_ptr(trans, k, p, entry_c, &do_update); 278 if (ret) ··· 277 } 278 279 if (do_update) { 280 struct bkey_i *new = bch2_bkey_make_mut_noupdate(trans, k); 281 ret = PTR_ERR_OR_ZERO(new); 282 if (ret) 283 goto err; 284 285 + scoped_guard(rcu) 286 + bch2_bkey_drop_ptrs(bkey_i_to_s(new), ptr, !bch2_dev_exists(c, ptr->dev)); 287 288 if (level) { 289 /* ··· 299 * sort it out: 300 */ 301 struct bkey_ptrs ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 302 + scoped_guard(rcu) 303 + bkey_for_each_ptr(ptrs, ptr) { 304 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 305 + ptr->gen = PTR_GC_BUCKET(ca, ptr)->gen; 306 + } 307 } else { 308 struct bkey_ptrs ptrs; 309 union bch_extent_entry *entry; ··· 370 bch_info(c, "new key %s", buf.buf); 371 } 372 373 + if (!(flags & BTREE_TRIGGER_is_root)) { 374 + struct btree_iter iter; 375 + bch2_trans_node_iter_init(trans, &iter, btree, new->k.p, 0, level, 376 + BTREE_ITER_intent|BTREE_ITER_all_snapshots); 377 + ret = bch2_btree_iter_traverse(trans, &iter) ?: 378 + bch2_trans_update(trans, &iter, new, 379 + BTREE_UPDATE_internal_snapshot_node| 380 + BTREE_TRIGGER_norun); 381 + bch2_trans_iter_exit(trans, &iter); 382 + if (ret) 383 + goto err; 384 385 + if (level) 386 + bch2_btree_node_update_key_early(trans, btree, level - 1, k, new); 387 + } else { 388 + struct jset_entry *e = bch2_trans_jset_entry_alloc(trans, 389 + jset_u64s(new->k.u64s)); 390 + ret = PTR_ERR_OR_ZERO(e); 391 + if (ret) 392 + goto err; 393 + 394 + journal_entry_set(e, 395 + BCH_JSET_ENTRY_btree_root, 396 + btree, level - 1, 397 + new, new->k.u64s); 398 + 399 + /* 400 + * no locking, we're single threaded and not rw yet, see 401 + * the big assertino above that we repeat here: 402 + */ 403 + BUG_ON(test_bit(BCH_FS_rw, &c->flags)); 404 + 405 + struct btree *b = bch2_btree_id_root(c, btree)->b; 406 + bkey_copy(&b->key, new); 407 + } 408 } 409 err: 410 printbuf_exit(&buf); ··· 406 if (insert) { 407 bch2_trans_updates_to_text(buf, trans); 408 __bch2_inconsistent_error(c, buf); 409 + /* 410 + * If we're in recovery, run_explicit_recovery_pass might give 411 + * us an error code for rewinding recovery 412 + */ 413 + if (!ret) 414 + ret = bch_err_throw(c, bucket_ref_update); 415 + } else { 416 + /* Always ignore overwrite errors, so that deletion works */ 417 + ret = 0; 418 } 419 420 if (print || insert) ··· 595 struct bch_dev *ca = bch2_dev_tryget(c, p.ptr.dev); 596 if (unlikely(!ca)) { 597 if (insert && p.ptr.dev != BCH_SB_MEMBER_INVALID) 598 + ret = bch_err_throw(c, trigger_pointer); 599 goto err; 600 } 601 ··· 603 if (!bucket_valid(ca, bucket.offset)) { 604 if (insert) { 605 
bch2_dev_bucket_missing(ca, bucket.offset); 606 + ret = bch_err_throw(c, trigger_pointer); 607 } 608 goto err; 609 } ··· 625 if (bch2_fs_inconsistent_on(!g, c, "reference to invalid bucket on device %u\n %s", 626 p.ptr.dev, 627 (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 628 + ret = bch_err_throw(c, trigger_pointer); 629 goto err; 630 } 631 ··· 651 s64 sectors, 652 enum btree_iter_update_trigger_flags flags) 653 { 654 + struct bch_fs *c = trans->c; 655 + 656 if (flags & BTREE_TRIGGER_transactional) { 657 struct btree_iter iter; 658 struct bkey_i_stripe *s = bch2_bkey_get_mut_typed(trans, &iter, ··· 668 bch2_trans_inconsistent(trans, 669 "stripe pointer doesn't match stripe %llu", 670 (u64) p.ec.idx); 671 + ret = bch_err_throw(c, trigger_stripe_pointer); 672 goto err; 673 } 674 ··· 688 } 689 690 if (flags & BTREE_TRIGGER_gc) { 691 struct gc_stripe *m = genradix_ptr_alloc(&c->gc_stripes, p.ec.idx, GFP_KERNEL); 692 if (!m) { 693 bch_err(c, "error allocating memory for gc_stripes, idx %llu", 694 (u64) p.ec.idx); 695 + return bch_err_throw(c, ENOMEM_mark_stripe_ptr); 696 } 697 698 gc_stripe_lock(m); ··· 709 __bch2_inconsistent_error(c, &buf); 710 bch2_print_str(c, KERN_ERR, buf.buf); 711 printbuf_exit(&buf); 712 + return bch_err_throw(c, trigger_stripe_pointer); 713 } 714 715 m->block_sectors[p.ec.block] += sectors; ··· 732 static int __trigger_extent(struct btree_trans *trans, 733 enum btree_id btree_id, unsigned level, 734 struct bkey_s_c k, 735 + enum btree_iter_update_trigger_flags flags) 736 { 737 bool gc = flags & BTREE_TRIGGER_gc; 738 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); ··· 743 ? BCH_DATA_btree 744 : BCH_DATA_user; 745 int ret = 0; 746 + 747 + s64 replicas_sectors = 0; 748 749 struct disk_accounting_pos acc_replicas_key; 750 memset(&acc_replicas_key, 0, sizeof(acc_replicas_key)); ··· 770 if (ret) 771 return ret; 772 } else if (!p.has_ec) { 773 + replicas_sectors += disk_sectors; 774 replicas_entry_add_dev(&acc_replicas_key.replicas, p.ptr.dev); 775 } else { 776 ret = bch2_trigger_stripe_ptr(trans, k, p, data_type, disk_sectors, flags); ··· 808 } 809 810 if (acc_replicas_key.replicas.nr_devs) { 811 + ret = bch2_disk_accounting_mod(trans, &acc_replicas_key, &replicas_sectors, 1, gc); 812 if (ret) 813 return ret; 814 } 815 816 if (acc_replicas_key.replicas.nr_devs && !level && k.k->p.snapshot) { 817 + ret = bch2_disk_accounting_mod2_nr(trans, gc, &replicas_sectors, 1, snapshot, k.k->p.snapshot); 818 if (ret) 819 return ret; 820 } ··· 830 } 831 832 if (level) { 833 + ret = bch2_disk_accounting_mod2_nr(trans, gc, &replicas_sectors, 1, btree, btree_id); 834 if (ret) 835 return ret; 836 } else { ··· 839 s64 v[3] = { 840 insert ? 1 : -1, 841 insert ? 
k.k->size : -((s64) k.k->size), 842 + replicas_sectors, 843 }; 844 ret = bch2_disk_accounting_mod2(trans, gc, v, inum, k.k->p.inode); 845 if (ret) ··· 871 return 0; 872 873 if (flags & (BTREE_TRIGGER_transactional|BTREE_TRIGGER_gc)) { 874 if (old.k->type) { 875 int ret = __trigger_extent(trans, btree, level, old, 876 + flags & ~BTREE_TRIGGER_insert); 877 if (ret) 878 return ret; 879 } 880 881 if (new.k->type) { 882 int ret = __trigger_extent(trans, btree, level, new.s_c, 883 + flags & ~BTREE_TRIGGER_overwrite); 884 if (ret) 885 return ret; 886 } ··· 971 bch2_data_type_str(type), 972 bch2_data_type_str(type)); 973 974 + bch2_count_fsck_err(c, bucket_metadata_type_mismatch, &buf); 975 976 + ret = bch2_run_explicit_recovery_pass(c, &buf, 977 BCH_RECOVERY_PASS_check_allocations, 0); 978 979 + /* Always print, this is always fatal */ 980 + bch2_print_str(c, KERN_ERR, buf.buf); 981 printbuf_exit(&buf); 982 + if (!ret) 983 + ret = bch_err_throw(c, metadata_bucket_inconsistency); 984 goto err; 985 } 986 ··· 1032 err_unlock: 1033 bucket_unlock(g); 1034 err: 1035 + return bch_err_throw(c, metadata_bucket_inconsistency); 1036 } 1037 1038 int bch2_trans_mark_metadata_bucket(struct btree_trans *trans, ··· 1247 ret = 0; 1248 } else { 1249 atomic64_set(&c->sectors_available, sectors_available); 1250 + ret = bch_err_throw(c, ENOSPC_disk_reservation); 1251 } 1252 1253 mutex_unlock(&c->sectors_available_lock); ··· 1276 GFP_KERNEL|__GFP_ZERO); 1277 if (!ca->buckets_nouse) { 1278 bch2_dev_put(ca); 1279 + return bch_err_throw(c, ENOMEM_buckets_nouse); 1280 } 1281 } 1282 ··· 1301 lockdep_assert_held(&c->state_lock); 1302 1303 if (resize && ca->buckets_nouse) 1304 + return bch_err_throw(c, no_resize_with_buckets_nouse); 1305 1306 bucket_gens = bch2_kvmalloc(struct_size(bucket_gens, b, nbuckets), 1307 GFP_KERNEL|__GFP_ZERO); 1308 if (!bucket_gens) { 1309 + ret = bch_err_throw(c, ENOMEM_bucket_gens); 1310 goto err; 1311 } 1312 ··· 1325 sizeof(bucket_gens->b[0]) * copy); 1326 } 1327 1328 + ret = bch2_bucket_bitmap_resize(ca, &ca->bucket_backpointer_mismatch, 1329 ca->mi.nbuckets, nbuckets) ?: 1330 + bch2_bucket_bitmap_resize(ca, &ca->bucket_backpointer_empty, 1331 ca->mi.nbuckets, nbuckets); 1332 1333 rcu_assign_pointer(ca->bucket_gens, bucket_gens); ··· 1354 { 1355 ca->usage = alloc_percpu(struct bch_dev_usage_full); 1356 if (!ca->usage) 1357 + return bch_err_throw(c, ENOMEM_usage_init); 1358 1359 return bch2_dev_buckets_resize(c, ca, ca->mi.nbuckets); 1360 }
+4 -8
fs/bcachefs/buckets.h
··· 84 85 static inline int bucket_gen_get(struct bch_dev *ca, size_t b) 86 { 87 - rcu_read_lock(); 88 - int ret = bucket_gen_get_rcu(ca, b); 89 - rcu_read_unlock(); 90 - return ret; 91 } 92 93 static inline size_t PTR_BUCKET_NR(const struct bch_dev *ca, ··· 154 */ 155 static inline int dev_ptr_stale(struct bch_dev *ca, const struct bch_extent_ptr *ptr) 156 { 157 - rcu_read_lock(); 158 - int ret = dev_ptr_stale_rcu(ca, ptr); 159 - rcu_read_unlock(); 160 - return ret; 161 } 162 163 /* Device usage: */
··· 84 85 static inline int bucket_gen_get(struct bch_dev *ca, size_t b) 86 { 87 + guard(rcu)(); 88 + return bucket_gen_get_rcu(ca, b); 89 } 90 91 static inline size_t PTR_BUCKET_NR(const struct bch_dev *ca, ··· 156 */ 157 static inline int dev_ptr_stale(struct bch_dev *ca, const struct bch_extent_ptr *ptr) 158 { 159 + guard(rcu)(); 160 + return dev_ptr_stale_rcu(ca, ptr); 161 } 162 163 /* Device usage: */
+2 -1
fs/bcachefs/buckets_waiting_for_journal.c
··· 108 realloc: 109 n = kvmalloc(sizeof(*n) + (sizeof(n->d[0]) << new_bits), GFP_KERNEL); 110 if (!n) { 111 - ret = -BCH_ERR_ENOMEM_buckets_waiting_for_journal_set; 112 goto out; 113 } 114
··· 108 realloc: 109 n = kvmalloc(sizeof(*n) + (sizeof(n->d[0]) << new_bits), GFP_KERNEL); 110 if (!n) { 111 + struct bch_fs *c = container_of(b, struct bch_fs, buckets_waiting_for_journal); 112 + ret = bch_err_throw(c, ENOMEM_buckets_waiting_for_journal_set); 113 goto out; 114 } 115
+3 -6
fs/bcachefs/chardev.c
··· 613 if (!dev) 614 return -EINVAL; 615 616 - rcu_read_lock(); 617 for_each_online_member_rcu(c, ca) 618 - if (ca->dev == dev) { 619 - rcu_read_unlock(); 620 return ca->dev_idx; 621 - } 622 - rcu_read_unlock(); 623 624 - return -BCH_ERR_ENOENT_dev_idx_not_found; 625 } 626 627 static long bch2_ioctl_disk_resize(struct bch_fs *c,
··· 613 if (!dev) 614 return -EINVAL; 615 616 + guard(rcu)(); 617 for_each_online_member_rcu(c, ca) 618 + if (ca->dev == dev) 619 return ca->dev_idx; 620 621 + return bch_err_throw(c, ENOENT_dev_idx_not_found); 622 } 623 624 static long bch2_ioctl_disk_resize(struct bch_fs *c,
+4 -4
fs/bcachefs/checksum.c
··· 173 174 if (bch2_fs_inconsistent_on(!c->chacha20_key_set, 175 c, "attempting to encrypt without encryption key")) 176 - return -BCH_ERR_no_encryption_key; 177 178 bch2_chacha20(&c->chacha20_key, nonce, data, len); 179 return 0; ··· 262 263 if (bch2_fs_inconsistent_on(!c->chacha20_key_set, 264 c, "attempting to encrypt without encryption key")) 265 - return -BCH_ERR_no_encryption_key; 266 267 bch2_chacha20_init(&chacha_state, &c->chacha20_key, nonce); 268 ··· 375 prt_str(&buf, ")"); 376 WARN_RATELIMIT(1, "%s", buf.buf); 377 printbuf_exit(&buf); 378 - return -BCH_ERR_recompute_checksum; 379 } 380 381 for (i = splits; i < splits + ARRAY_SIZE(splits); i++) { ··· 659 crypt = bch2_sb_field_resize(&c->disk_sb, crypt, 660 sizeof(*crypt) / sizeof(u64)); 661 if (!crypt) { 662 - ret = -BCH_ERR_ENOSPC_sb_crypt; 663 goto err; 664 } 665
··· 173 174 if (bch2_fs_inconsistent_on(!c->chacha20_key_set, 175 c, "attempting to encrypt without encryption key")) 176 + return bch_err_throw(c, no_encryption_key); 177 178 bch2_chacha20(&c->chacha20_key, nonce, data, len); 179 return 0; ··· 262 263 if (bch2_fs_inconsistent_on(!c->chacha20_key_set, 264 c, "attempting to encrypt without encryption key")) 265 + return bch_err_throw(c, no_encryption_key); 266 267 bch2_chacha20_init(&chacha_state, &c->chacha20_key, nonce); 268 ··· 375 prt_str(&buf, ")"); 376 WARN_RATELIMIT(1, "%s", buf.buf); 377 printbuf_exit(&buf); 378 + return bch_err_throw(c, recompute_checksum); 379 } 380 381 for (i = splits; i < splits + ARRAY_SIZE(splits); i++) { ··· 659 crypt = bch2_sb_field_resize(&c->disk_sb, crypt, 660 sizeof(*crypt) / sizeof(u64)); 661 if (!crypt) { 662 + ret = bch_err_throw(c, ENOSPC_sb_crypt); 663 goto err; 664 } 665
+18 -29
fs/bcachefs/clock.c
··· 53 54 struct io_clock_wait { 55 struct io_timer io_timer; 56 - struct timer_list cpu_timer; 57 struct task_struct *task; 58 int expired; 59 }; ··· 61 { 62 struct io_clock_wait *wait = container_of(timer, 63 struct io_clock_wait, io_timer); 64 - 65 - wait->expired = 1; 66 - wake_up_process(wait->task); 67 - } 68 - 69 - static void io_clock_cpu_timeout(struct timer_list *timer) 70 - { 71 - struct io_clock_wait *wait = container_of(timer, 72 - struct io_clock_wait, cpu_timer); 73 74 wait->expired = 1; 75 wake_up_process(wait->task); ··· 80 bch2_io_timer_del(clock, &wait.io_timer); 81 } 82 83 - void bch2_kthread_io_clock_wait(struct io_clock *clock, 84 - u64 io_until, unsigned long cpu_timeout) 85 { 86 bool kthread = (current->flags & PF_KTHREAD) != 0; 87 struct io_clock_wait wait = { ··· 93 94 bch2_io_timer_add(clock, &wait.io_timer); 95 96 - timer_setup_on_stack(&wait.cpu_timer, io_clock_cpu_timeout, 0); 97 - 98 - if (cpu_timeout != MAX_SCHEDULE_TIMEOUT) 99 - mod_timer(&wait.cpu_timer, cpu_timeout + jiffies); 100 - 101 - do { 102 - set_current_state(TASK_INTERRUPTIBLE); 103 - if (kthread && kthread_should_stop()) 104 - break; 105 - 106 - if (wait.expired) 107 - break; 108 - 109 - schedule(); 110 try_to_freeze(); 111 - } while (0); 112 113 __set_current_state(TASK_RUNNING); 114 - timer_delete_sync(&wait.cpu_timer); 115 - timer_destroy_on_stack(&wait.cpu_timer); 116 bch2_io_timer_del(clock, &wait.io_timer); 117 } 118 119 static struct io_timer *get_expired_timer(struct io_clock *clock, u64 now)
··· 53 54 struct io_clock_wait { 55 struct io_timer io_timer; 56 struct task_struct *task; 57 int expired; 58 }; ··· 62 { 63 struct io_clock_wait *wait = container_of(timer, 64 struct io_clock_wait, io_timer); 65 66 wait->expired = 1; 67 wake_up_process(wait->task); ··· 90 bch2_io_timer_del(clock, &wait.io_timer); 91 } 92 93 + unsigned long bch2_kthread_io_clock_wait_once(struct io_clock *clock, 94 + u64 io_until, unsigned long cpu_timeout) 95 { 96 bool kthread = (current->flags & PF_KTHREAD) != 0; 97 struct io_clock_wait wait = { ··· 103 104 bch2_io_timer_add(clock, &wait.io_timer); 105 106 + set_current_state(TASK_INTERRUPTIBLE); 107 + if (!(kthread && kthread_should_stop())) { 108 + cpu_timeout = schedule_timeout(cpu_timeout); 109 try_to_freeze(); 110 + } 111 112 __set_current_state(TASK_RUNNING); 113 bch2_io_timer_del(clock, &wait.io_timer); 114 + return cpu_timeout; 115 + } 116 + 117 + void bch2_kthread_io_clock_wait(struct io_clock *clock, 118 + u64 io_until, unsigned long cpu_timeout) 119 + { 120 + bool kthread = (current->flags & PF_KTHREAD) != 0; 121 + 122 + while (!(kthread && kthread_should_stop()) && 123 + cpu_timeout && 124 + atomic64_read(&clock->now) < io_until) 125 + cpu_timeout = bch2_kthread_io_clock_wait_once(clock, io_until, cpu_timeout); 126 } 127 128 static struct io_timer *get_expired_timer(struct io_clock *clock, u64 now)
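bch2_kthread_io_clock_wait() is now a loop around a single-sleep helper, bch2_kthread_io_clock_wait_once(), which returns the unconsumed cpu timeout; the loop exits once the io clock reaches io_until, the timeout budget is spent, or the kthread is asked to stop, and the separate on-stack cpu_timer is gone. A standalone sketch of that loop shape, with stand-in names that are not kernel interfaces:

/*
 * Sketch: a "wait once" helper returns the remaining timeout budget,
 * and the caller loops until the condition holds or the budget is gone.
 */
#include <stdio.h>

static unsigned long now;

static unsigned long wait_once(unsigned long budget)
{
	/* pretend each wakeup advances the clock and costs one tick */
	now++;
	return budget ? budget - 1 : 0;
}

static void wait_until(unsigned long io_until, unsigned long budget)
{
	while (budget && now < io_until)
		budget = wait_once(budget);

	printf("woke at %lu with %lu ticks of timeout left\n", now, budget);
}

int main(void)
{
	wait_until(5, 10);	/* io clock reaches the target first */

	now = 0;
	wait_until(100, 3);	/* timeout budget runs out first */
	return 0;
}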
+1
fs/bcachefs/clock.h
··· 4 5 void bch2_io_timer_add(struct io_clock *, struct io_timer *); 6 void bch2_io_timer_del(struct io_clock *, struct io_timer *); 7 void bch2_kthread_io_clock_wait(struct io_clock *, u64, unsigned long); 8 9 void __bch2_increment_clock(struct io_clock *, u64);
··· 4 5 void bch2_io_timer_add(struct io_clock *, struct io_timer *); 6 void bch2_io_timer_del(struct io_clock *, struct io_timer *); 7 + unsigned long bch2_kthread_io_clock_wait_once(struct io_clock *, u64, unsigned long); 8 void bch2_kthread_io_clock_wait(struct io_clock *, u64, unsigned long); 9 10 void __bch2_increment_clock(struct io_clock *, u64);
+10 -10
fs/bcachefs/compress.c
··· 187 __bch2_compression_types[crc.compression_type])) 188 ret = bch2_check_set_has_compressed_data(c, opt); 189 else 190 - ret = -BCH_ERR_compression_workspace_not_initialized; 191 if (ret) 192 goto err; 193 } ··· 200 ret2 = LZ4_decompress_safe_partial(src_data.b, dst_data, 201 src_len, dst_len, dst_len); 202 if (ret2 != dst_len) 203 - ret = -BCH_ERR_decompress_lz4; 204 break; 205 case BCH_COMPRESSION_TYPE_gzip: { 206 z_stream strm = { ··· 219 mempool_free(workspace, workspace_pool); 220 221 if (ret2 != Z_STREAM_END) 222 - ret = -BCH_ERR_decompress_gzip; 223 break; 224 } 225 case BCH_COMPRESSION_TYPE_zstd: { ··· 227 size_t real_src_len = le32_to_cpup(src_data.b); 228 229 if (real_src_len > src_len - 4) { 230 - ret = -BCH_ERR_decompress_zstd_src_len_bad; 231 goto err; 232 } 233 ··· 241 mempool_free(workspace, workspace_pool); 242 243 if (ret2 != dst_len) 244 - ret = -BCH_ERR_decompress_zstd; 245 break; 246 } 247 default: ··· 270 bch2_write_op_error(op, op->pos.offset, 271 "extent too big to decompress (%u > %u)", 272 crc->uncompressed_size << 9, c->opts.encoded_extent_max); 273 - return -BCH_ERR_decompress_exceeded_max_encoded_extent; 274 } 275 276 data = __bounce_alloc(c, dst_len, WRITE); ··· 314 315 if (crc.uncompressed_size << 9 > c->opts.encoded_extent_max || 316 crc.compressed_size << 9 > c->opts.encoded_extent_max) 317 - return -BCH_ERR_decompress_exceeded_max_encoded_extent; 318 319 dst_data = dst_len == dst_iter.bi_size 320 ? __bio_map_or_bounce(c, dst, dst_iter, WRITE) ··· 656 if (!mempool_initialized(&c->compression_bounce[READ]) && 657 mempool_init_kvmalloc_pool(&c->compression_bounce[READ], 658 1, c->opts.encoded_extent_max)) 659 - return -BCH_ERR_ENOMEM_compression_bounce_read_init; 660 661 if (!mempool_initialized(&c->compression_bounce[WRITE]) && 662 mempool_init_kvmalloc_pool(&c->compression_bounce[WRITE], 663 1, c->opts.encoded_extent_max)) 664 - return -BCH_ERR_ENOMEM_compression_bounce_write_init; 665 666 for (i = compression_types; 667 i < compression_types + ARRAY_SIZE(compression_types); ··· 675 if (mempool_init_kvmalloc_pool( 676 &c->compress_workspace[i->type], 677 1, i->compress_workspace)) 678 - return -BCH_ERR_ENOMEM_compression_workspace_init; 679 } 680 681 return 0;
··· 187 __bch2_compression_types[crc.compression_type])) 188 ret = bch2_check_set_has_compressed_data(c, opt); 189 else 190 + ret = bch_err_throw(c, compression_workspace_not_initialized); 191 if (ret) 192 goto err; 193 } ··· 200 ret2 = LZ4_decompress_safe_partial(src_data.b, dst_data, 201 src_len, dst_len, dst_len); 202 if (ret2 != dst_len) 203 + ret = bch_err_throw(c, decompress_lz4); 204 break; 205 case BCH_COMPRESSION_TYPE_gzip: { 206 z_stream strm = { ··· 219 mempool_free(workspace, workspace_pool); 220 221 if (ret2 != Z_STREAM_END) 222 + ret = bch_err_throw(c, decompress_gzip); 223 break; 224 } 225 case BCH_COMPRESSION_TYPE_zstd: { ··· 227 size_t real_src_len = le32_to_cpup(src_data.b); 228 229 if (real_src_len > src_len - 4) { 230 + ret = bch_err_throw(c, decompress_zstd_src_len_bad); 231 goto err; 232 } 233 ··· 241 mempool_free(workspace, workspace_pool); 242 243 if (ret2 != dst_len) 244 + ret = bch_err_throw(c, decompress_zstd); 245 break; 246 } 247 default: ··· 270 bch2_write_op_error(op, op->pos.offset, 271 "extent too big to decompress (%u > %u)", 272 crc->uncompressed_size << 9, c->opts.encoded_extent_max); 273 + return bch_err_throw(c, decompress_exceeded_max_encoded_extent); 274 } 275 276 data = __bounce_alloc(c, dst_len, WRITE); ··· 314 315 if (crc.uncompressed_size << 9 > c->opts.encoded_extent_max || 316 crc.compressed_size << 9 > c->opts.encoded_extent_max) 317 + return bch_err_throw(c, decompress_exceeded_max_encoded_extent); 318 319 dst_data = dst_len == dst_iter.bi_size 320 ? __bio_map_or_bounce(c, dst, dst_iter, WRITE) ··· 656 if (!mempool_initialized(&c->compression_bounce[READ]) && 657 mempool_init_kvmalloc_pool(&c->compression_bounce[READ], 658 1, c->opts.encoded_extent_max)) 659 + return bch_err_throw(c, ENOMEM_compression_bounce_read_init); 660 661 if (!mempool_initialized(&c->compression_bounce[WRITE]) && 662 mempool_init_kvmalloc_pool(&c->compression_bounce[WRITE], 663 1, c->opts.encoded_extent_max)) 664 + return bch_err_throw(c, ENOMEM_compression_bounce_write_init); 665 666 for (i = compression_types; 667 i < compression_types + ARRAY_SIZE(compression_types); ··· 675 if (mempool_init_kvmalloc_pool( 676 &c->compress_workspace[i->type], 677 1, i->compress_workspace)) 678 + return bch_err_throw(c, ENOMEM_compression_workspace_init); 679 } 680 681 return 0;
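Throughout these files, bare -BCH_ERR_* returns are replaced with bch_err_throw(c, ...), which takes the filesystem pointer so the throw site can be counted/traced as the error starts to unwind (the macro's expansion is not shown in these hunks); the value returned to the caller is presumably still the same private error code. The change in calling convention, taken from the hunk above:

    /* before: error returned silently */
    return -BCH_ERR_ENOMEM_compression_bounce_read_init;

    /* after: same error code, but the throw site is visible to the fs */
    return bch_err_throw(c, ENOMEM_compression_bounce_read_init);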
fs/bcachefs/darray.h (+45 -1)
··· 8 * Inspired by CCAN's darray 9 */ 10 11 #include <linux/slab.h> 12 13 #define DARRAY_PREALLOCATED(_type, _nr) \ ··· 88 #define darray_remove_item(_d, _pos) \ 89 array_remove_item((_d)->data, (_d)->nr, (_pos) - (_d)->data) 90 91 - #define __darray_for_each(_d, _i) \ 92 for ((_i) = (_d).data; _i < (_d).data + (_d).nr; _i++) 93 94 #define darray_for_each(_d, _i) \ ··· 112 113 #define darray_for_each_reverse(_d, _i) \ 114 for (typeof(&(_d).data[0]) _i = (_d).data + (_d).nr - 1; _i >= (_d).data && (_d).nr; --_i) 115 116 #define darray_init(_d) \ 117 do { \ ··· 129 kvfree((_d)->data); \ 130 darray_init(_d); \ 131 } while (0) 132 133 #endif /* _BCACHEFS_DARRAY_H */
··· 8 * Inspired by CCAN's darray 9 */ 10 11 + #include <linux/cleanup.h> 12 #include <linux/slab.h> 13 14 #define DARRAY_PREALLOCATED(_type, _nr) \ ··· 87 #define darray_remove_item(_d, _pos) \ 88 array_remove_item((_d)->data, (_d)->nr, (_pos) - (_d)->data) 89 90 + #define darray_find_p(_d, _i, cond) \ 91 + ({ \ 92 + typeof((_d).data) _ret = NULL; \ 93 + \ 94 + darray_for_each(_d, _i) \ 95 + if (cond) { \ 96 + _ret = _i; \ 97 + break; \ 98 + } \ 99 + _ret; \ 100 + }) 101 + 102 + #define darray_find(_d, _item) darray_find_p(_d, _i, *_i == _item) 103 + 104 + /* Iteration: */ 105 + 106 + #define __darray_for_each(_d, _i) \ 107 for ((_i) = (_d).data; _i < (_d).data + (_d).nr; _i++) 108 109 #define darray_for_each(_d, _i) \ ··· 95 96 #define darray_for_each_reverse(_d, _i) \ 97 for (typeof(&(_d).data[0]) _i = (_d).data + (_d).nr - 1; _i >= (_d).data && (_d).nr; --_i) 98 + 99 + /* Init/exit */ 100 101 #define darray_init(_d) \ 102 do { \ ··· 110 kvfree((_d)->data); \ 111 darray_init(_d); \ 112 } while (0) 113 + 114 + #define DEFINE_DARRAY_CLASS(_type) \ 115 + DEFINE_CLASS(_type, _type, darray_exit(&(_T)), (_type) {}, void) 116 + 117 + #define DEFINE_DARRAY(_type) \ 118 + typedef DARRAY(_type) darray_##_type; \ 119 + DEFINE_DARRAY_CLASS(darray_##_type) 120 + 121 + #define DEFINE_DARRAY_NAMED(_name, _type) \ 122 + typedef DARRAY(_type) _name; \ 123 + DEFINE_DARRAY_CLASS(_name) 124 + 125 + DEFINE_DARRAY_CLASS(darray_char); 126 + DEFINE_DARRAY_CLASS(darray_str) 127 + DEFINE_DARRAY_CLASS(darray_const_str) 128 + 129 + DEFINE_DARRAY_CLASS(darray_u8) 130 + DEFINE_DARRAY_CLASS(darray_u16) 131 + DEFINE_DARRAY_CLASS(darray_u32) 132 + DEFINE_DARRAY_CLASS(darray_u64) 133 + 134 + DEFINE_DARRAY_CLASS(darray_s8) 135 + DEFINE_DARRAY_CLASS(darray_s16) 136 + DEFINE_DARRAY_CLASS(darray_s32) 137 + DEFINE_DARRAY_CLASS(darray_s64) 138 139 #endif /* _BCACHEFS_DARRAY_H */
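darray.h now provides darray_find()/darray_find_p() plus CLASS() wrappers (DEFINE_DARRAY_CLASS and friends), so a darray can be declared with automatic darray_exit() when it goes out of scope. A rough usage sketch, assuming <linux/cleanup.h> CLASS() semantics and the existing darray_push() helper:

    CLASS(darray_u64, ids)();        /* darray_exit(&ids) runs at end of scope */

    int ret = darray_push(&ids, 42) ?:
              darray_push(&ids, 7);
    if (ret)
            return ret;

    u64 *p = darray_find_p(ids, i, *i == 7);
    if (p)
            pr_info("found %llu\n", *p);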
fs/bcachefs/data_update.c (+103 -73)
··· 66 } 67 } 68 69 - static bool bkey_nocow_lock(struct bch_fs *c, struct moving_context *ctxt, struct bkey_s_c k) 70 { 71 - struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 72 73 bkey_for_each_ptr(ptrs, ptr) { 74 struct bch_dev *ca = bch2_dev_have_ref(c, ptr->dev); 75 struct bpos bucket = PTR_BUCKET_POS(ca, ptr); 76 77 - if (ctxt) { 78 - bool locked; 79 - 80 - move_ctxt_wait_event(ctxt, 81 - (locked = bch2_bucket_nocow_trylock(&c->nocow_locks, bucket, 0)) || 82 - list_empty(&ctxt->ios)); 83 - 84 - if (!locked) 85 - bch2_bucket_nocow_lock(&c->nocow_locks, bucket, 0); 86 - } else { 87 - if (!bch2_bucket_nocow_trylock(&c->nocow_locks, bucket, 0)) { 88 - bkey_for_each_ptr(ptrs, ptr2) { 89 - if (ptr2 == ptr) 90 - break; 91 - 92 - ca = bch2_dev_have_ref(c, ptr2->dev); 93 - bucket = PTR_BUCKET_POS(ca, ptr2); 94 - bch2_bucket_nocow_unlock(&c->nocow_locks, bucket, 0); 95 - } 96 - return false; 97 - } 98 - } 99 } 100 return true; 101 } 102 ··· 255 bch2_print_str(c, KERN_ERR, buf.buf); 256 printbuf_exit(&buf); 257 258 - return -BCH_ERR_invalid_bkey; 259 } 260 261 static int __bch2_data_update_index_update(struct btree_trans *trans, ··· 376 bch2_bkey_durability(c, bkey_i_to_s_c(&new->k_i)); 377 378 /* Now, drop excess replicas: */ 379 - rcu_read_lock(); 380 restart_drop_extra_replicas: 381 - bkey_for_each_ptr_decode(old.k, bch2_bkey_ptrs(bkey_i_to_s(insert)), p, entry) { 382 - unsigned ptr_durability = bch2_extent_ptr_durability(c, &p); 383 384 - if (!p.ptr.cached && 385 - durability - ptr_durability >= m->op.opts.data_replicas) { 386 - durability -= ptr_durability; 387 388 - bch2_extent_ptr_set_cached(c, &m->op.opts, 389 - bkey_i_to_s(insert), &entry->ptr); 390 - goto restart_drop_extra_replicas; 391 } 392 } 393 - rcu_read_unlock(); 394 395 /* Finally, add the pointers we just wrote: */ 396 extent_for_each_ptr_decode(extent_i_to_s(new), p, entry) ··· 532 bch2_bkey_buf_exit(&update->k, c); 533 } 534 535 - static int bch2_update_unwritten_extent(struct btree_trans *trans, 536 - struct data_update *update) 537 { 538 struct bch_fs *c = update->op.c; 539 struct bkey_i_extent *e; ··· 726 bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 727 } 728 729 - int bch2_data_update_bios_init(struct data_update *m, struct bch_fs *c, 730 - struct bch_io_opts *io_opts) 731 { 732 - struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(m->k.k)); 733 - const union bch_extent_entry *entry; 734 - struct extent_ptr_decoded p; 735 - 736 - /* write path might have to decompress data: */ 737 - unsigned buf_bytes = 0; 738 - bkey_for_each_ptr_decode(&m->k.k->k, ptrs, p, entry) 739 - buf_bytes = max_t(unsigned, buf_bytes, p.crc.uncompressed_size << 9); 740 - 741 unsigned nr_vecs = DIV_ROUND_UP(buf_bytes, PAGE_SIZE); 742 743 m->bvecs = kmalloc_array(nr_vecs, sizeof*(m->bvecs), GFP_KERNEL); ··· 753 return 0; 754 } 755 756 static int can_write_extent(struct bch_fs *c, struct data_update *m) 757 { 758 if ((m->op.flags & BCH_WRITE_alloc_nowait) && 759 unlikely(c->open_buckets_nr_free <= bch2_open_buckets_reserved(m->op.watermark))) 760 - return -BCH_ERR_data_update_done_would_block; 761 762 unsigned target = m->op.flags & BCH_WRITE_only_specified_devs 763 ? 
m->op.target ··· 782 darray_for_each(m->op.devs_have, i) 783 __clear_bit(*i, devs.d); 784 785 - rcu_read_lock(); 786 unsigned nr_replicas = 0, i; 787 for_each_set_bit(i, devs.d, BCH_SB_MEMBERS_MAX) { 788 struct bch_dev *ca = bch2_dev_rcu_noerror(c, i); ··· 800 if (nr_replicas >= m->op.nr_replicas) 801 break; 802 } 803 - rcu_read_unlock(); 804 805 if (!nr_replicas) 806 - return -BCH_ERR_data_update_done_no_rw_devs; 807 if (nr_replicas < m->op.nr_replicas) 808 - return -BCH_ERR_insufficient_devices; 809 return 0; 810 } 811 ··· 819 struct bkey_s_c k) 820 { 821 struct bch_fs *c = trans->c; 822 - struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 823 - const union bch_extent_entry *entry; 824 - struct extent_ptr_decoded p; 825 - unsigned reserve_sectors = k.k->size * data_opts.extra_replicas; 826 int ret = 0; 827 828 - /* 829 - * fs is corrupt we have a key for a snapshot node that doesn't exist, 830 - * and we have to check for this because we go rw before repairing the 831 - * snapshots table - just skip it, we can move it later. 832 - */ 833 - if (unlikely(k.k->p.snapshot && !bch2_snapshot_exists(c, k.k->p.snapshot))) 834 - return -BCH_ERR_data_update_done_no_snapshot; 835 836 bch2_bkey_buf_init(&m->k); 837 bch2_bkey_buf_reassemble(&m->k, c, k); ··· 861 862 unsigned durability_have = 0, durability_removing = 0; 863 864 unsigned ptr_bit = 1; 865 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { 866 if (!p.ptr.cached) { 867 - rcu_read_lock(); 868 if (ptr_bit & m->data_opts.rewrite_ptrs) { 869 if (crc_is_compressed(p.crc)) 870 reserve_sectors += k.k->size; ··· 882 bch2_dev_list_add_dev(&m->op.devs_have, p.ptr.dev); 883 durability_have += bch2_extent_ptr_durability(c, &p); 884 } 885 - rcu_read_unlock(); 886 } 887 888 /* ··· 896 897 if (p.crc.compression_type == BCH_COMPRESSION_TYPE_incompressible) 898 m->op.incompressible = true; 899 900 ptr_bit <<= 1; 901 } ··· 938 if (iter) 939 ret = bch2_extent_drop_ptrs(trans, iter, k, io_opts, &m->data_opts); 940 if (!ret) 941 - ret = -BCH_ERR_data_update_done_no_writes_needed; 942 goto out_bkey_buf_exit; 943 } 944 ··· 969 } 970 971 if (!bkey_get_dev_refs(c, k)) { 972 - ret = -BCH_ERR_data_update_done_no_dev_refs; 973 goto out_put_disk_res; 974 } 975 976 if (c->opts.nocow_enabled && 977 - !bkey_nocow_lock(c, ctxt, k)) { 978 - ret = -BCH_ERR_nocow_lock_blocked; 979 goto out_put_dev_refs; 980 } 981 982 - if (bkey_extent_is_unwritten(k)) { 983 ret = bch2_update_unwritten_extent(trans, m) ?: 984 - -BCH_ERR_data_update_done_unwritten; 985 goto out_nocow_unlock; 986 } 987 988 - ret = bch2_data_update_bios_init(m, c, io_opts); 989 if (ret) 990 goto out_nocow_unlock; 991
··· 66 } 67 } 68 69 + static noinline_for_stack 70 + bool __bkey_nocow_lock(struct bch_fs *c, struct moving_context *ctxt, struct bkey_ptrs_c ptrs, 71 + const struct bch_extent_ptr *start) 72 { 73 + if (!ctxt) { 74 + bkey_for_each_ptr(ptrs, ptr) { 75 + if (ptr == start) 76 + break; 77 78 + struct bch_dev *ca = bch2_dev_have_ref(c, ptr->dev); 79 + struct bpos bucket = PTR_BUCKET_POS(ca, ptr); 80 + bch2_bucket_nocow_unlock(&c->nocow_locks, bucket, 0); 81 + } 82 + return false; 83 + } 84 + 85 + __bkey_for_each_ptr(start, ptrs.end, ptr) { 86 + struct bch_dev *ca = bch2_dev_have_ref(c, ptr->dev); 87 + struct bpos bucket = PTR_BUCKET_POS(ca, ptr); 88 + 89 + bool locked; 90 + move_ctxt_wait_event(ctxt, 91 + (locked = bch2_bucket_nocow_trylock(&c->nocow_locks, bucket, 0)) || 92 + list_empty(&ctxt->ios)); 93 + if (!locked) 94 + bch2_bucket_nocow_lock(&c->nocow_locks, bucket, 0); 95 + } 96 + return true; 97 + } 98 + 99 + static bool bkey_nocow_lock(struct bch_fs *c, struct moving_context *ctxt, struct bkey_ptrs_c ptrs) 100 + { 101 bkey_for_each_ptr(ptrs, ptr) { 102 struct bch_dev *ca = bch2_dev_have_ref(c, ptr->dev); 103 struct bpos bucket = PTR_BUCKET_POS(ca, ptr); 104 105 + if (!bch2_bucket_nocow_trylock(&c->nocow_locks, bucket, 0)) 106 + return __bkey_nocow_lock(c, ctxt, ptrs, ptr); 107 } 108 + 109 return true; 110 } 111 ··· 246 bch2_print_str(c, KERN_ERR, buf.buf); 247 printbuf_exit(&buf); 248 249 + return bch_err_throw(c, invalid_bkey); 250 } 251 252 static int __bch2_data_update_index_update(struct btree_trans *trans, ··· 367 bch2_bkey_durability(c, bkey_i_to_s_c(&new->k_i)); 368 369 /* Now, drop excess replicas: */ 370 + scoped_guard(rcu) { 371 restart_drop_extra_replicas: 372 + bkey_for_each_ptr_decode(old.k, bch2_bkey_ptrs(bkey_i_to_s(insert)), p, entry) { 373 + unsigned ptr_durability = bch2_extent_ptr_durability(c, &p); 374 375 + if (!p.ptr.cached && 376 + durability - ptr_durability >= m->op.opts.data_replicas) { 377 + durability -= ptr_durability; 378 379 + bch2_extent_ptr_set_cached(c, &m->op.opts, 380 + bkey_i_to_s(insert), &entry->ptr); 381 + goto restart_drop_extra_replicas; 382 + } 383 } 384 } 385 386 /* Finally, add the pointers we just wrote: */ 387 extent_for_each_ptr_decode(extent_i_to_s(new), p, entry) ··· 523 bch2_bkey_buf_exit(&update->k, c); 524 } 525 526 + static noinline_for_stack 527 + int bch2_update_unwritten_extent(struct btree_trans *trans, 528 + struct data_update *update) 529 { 530 struct bch_fs *c = update->op.c; 531 struct bkey_i_extent *e; ··· 716 bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 717 } 718 719 + static int __bch2_data_update_bios_init(struct data_update *m, struct bch_fs *c, 720 + struct bch_io_opts *io_opts, 721 + unsigned buf_bytes) 722 { 723 unsigned nr_vecs = DIV_ROUND_UP(buf_bytes, PAGE_SIZE); 724 725 m->bvecs = kmalloc_array(nr_vecs, sizeof*(m->bvecs), GFP_KERNEL); ··· 751 return 0; 752 } 753 754 + int bch2_data_update_bios_init(struct data_update *m, struct bch_fs *c, 755 + struct bch_io_opts *io_opts) 756 + { 757 + struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(m->k.k)); 758 + const union bch_extent_entry *entry; 759 + struct extent_ptr_decoded p; 760 + 761 + /* write path might have to decompress data: */ 762 + unsigned buf_bytes = 0; 763 + bkey_for_each_ptr_decode(&m->k.k->k, ptrs, p, entry) 764 + buf_bytes = max_t(unsigned, buf_bytes, p.crc.uncompressed_size << 9); 765 + 766 + return __bch2_data_update_bios_init(m, c, io_opts, buf_bytes); 767 + } 768 + 769 static int can_write_extent(struct bch_fs *c, struct 
data_update *m) 770 { 771 if ((m->op.flags & BCH_WRITE_alloc_nowait) && 772 unlikely(c->open_buckets_nr_free <= bch2_open_buckets_reserved(m->op.watermark))) 773 + return bch_err_throw(c, data_update_done_would_block); 774 775 unsigned target = m->op.flags & BCH_WRITE_only_specified_devs 776 ? m->op.target ··· 765 darray_for_each(m->op.devs_have, i) 766 __clear_bit(*i, devs.d); 767 768 + guard(rcu)(); 769 + 770 unsigned nr_replicas = 0, i; 771 for_each_set_bit(i, devs.d, BCH_SB_MEMBERS_MAX) { 772 struct bch_dev *ca = bch2_dev_rcu_noerror(c, i); ··· 782 if (nr_replicas >= m->op.nr_replicas) 783 break; 784 } 785 786 if (!nr_replicas) 787 + return bch_err_throw(c, data_update_done_no_rw_devs); 788 if (nr_replicas < m->op.nr_replicas) 789 + return bch_err_throw(c, insufficient_devices); 790 return 0; 791 } 792 ··· 802 struct bkey_s_c k) 803 { 804 struct bch_fs *c = trans->c; 805 int ret = 0; 806 807 + if (k.k->p.snapshot) { 808 + ret = bch2_check_key_has_snapshot(trans, iter, k); 809 + if (bch2_err_matches(ret, BCH_ERR_recovery_will_run)) { 810 + /* Can't repair yet, waiting on other recovery passes */ 811 + return bch_err_throw(c, data_update_done_no_snapshot); 812 + } 813 + if (ret < 0) 814 + return ret; 815 + if (ret) /* key was deleted */ 816 + return bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc) ?: 817 + bch_err_throw(c, data_update_done_no_snapshot); 818 + ret = 0; 819 + } 820 821 bch2_bkey_buf_init(&m->k); 822 bch2_bkey_buf_reassemble(&m->k, c, k); ··· 842 843 unsigned durability_have = 0, durability_removing = 0; 844 845 + struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(m->k.k)); 846 + const union bch_extent_entry *entry; 847 + struct extent_ptr_decoded p; 848 + unsigned reserve_sectors = k.k->size * data_opts.extra_replicas; 849 + unsigned buf_bytes = 0; 850 + bool unwritten = false; 851 + 852 unsigned ptr_bit = 1; 853 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { 854 if (!p.ptr.cached) { 855 + guard(rcu)(); 856 if (ptr_bit & m->data_opts.rewrite_ptrs) { 857 if (crc_is_compressed(p.crc)) 858 reserve_sectors += k.k->size; ··· 856 bch2_dev_list_add_dev(&m->op.devs_have, p.ptr.dev); 857 durability_have += bch2_extent_ptr_durability(c, &p); 858 } 859 } 860 861 /* ··· 871 872 if (p.crc.compression_type == BCH_COMPRESSION_TYPE_incompressible) 873 m->op.incompressible = true; 874 + 875 + buf_bytes = max_t(unsigned, buf_bytes, p.crc.uncompressed_size << 9); 876 + unwritten |= p.ptr.unwritten; 877 878 ptr_bit <<= 1; 879 } ··· 910 if (iter) 911 ret = bch2_extent_drop_ptrs(trans, iter, k, io_opts, &m->data_opts); 912 if (!ret) 913 + ret = bch_err_throw(c, data_update_done_no_writes_needed); 914 goto out_bkey_buf_exit; 915 } 916 ··· 941 } 942 943 if (!bkey_get_dev_refs(c, k)) { 944 + ret = bch_err_throw(c, data_update_done_no_dev_refs); 945 goto out_put_disk_res; 946 } 947 948 if (c->opts.nocow_enabled && 949 + !bkey_nocow_lock(c, ctxt, ptrs)) { 950 + ret = bch_err_throw(c, nocow_lock_blocked); 951 goto out_put_dev_refs; 952 } 953 954 + if (unwritten) { 955 ret = bch2_update_unwritten_extent(trans, m) ?: 956 + bch_err_throw(c, data_update_done_unwritten); 957 goto out_nocow_unlock; 958 } 959 960 + bch2_trans_unlock(trans); 961 + 962 + ret = __bch2_data_update_bios_init(m, c, io_opts, buf_bytes); 963 if (ret) 964 goto out_nocow_unlock; 965
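The rcu_read_lock()/rcu_read_unlock() pairs here become cleanup.h lock guards: guard(rcu)() holds the RCU read lock until the end of the enclosing scope, while scoped_guard(rcu) { ... } limits it to a block. A minimal sketch of the two forms (dev_idx and the surrounding code are illustrative, not from the patch):

    /* block-scoped RCU section */
    unsigned durability = 0;
    scoped_guard(rcu) {
            struct bch_dev *ca = bch2_dev_rcu_noerror(c, dev_idx);
            if (ca)
                    durability = ca->mi.durability;
    }

    /* rest-of-scope RCU section, no explicit unlock needed */
    guard(rcu)();
    struct bch_dev *ca = bch2_dev_rcu_noerror(c, dev_idx);
    bool failed = ca && ca->mi.state == BCH_MEMBER_STATE_failed;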
fs/bcachefs/debug.c (+16 -14)
··· 492 prt_printf(out, "journal pin %px:\t%llu\n", 493 &b->writes[1].journal, b->writes[1].journal.seq); 494 495 printbuf_indent_sub(out, 2); 496 } 497 ··· 510 i->ret = 0; 511 512 do { 513 - struct bucket_table *tbl; 514 - struct rhash_head *pos; 515 - struct btree *b; 516 - 517 ret = bch2_debugfs_flush_buf(i); 518 if (ret) 519 return ret; 520 521 - rcu_read_lock(); 522 i->buf.atomic++; 523 - tbl = rht_dereference_rcu(c->btree_cache.table.tbl, 524 - &c->btree_cache.table); 525 - if (i->iter < tbl->size) { 526 - rht_for_each_entry_rcu(b, pos, tbl, i->iter, hash) 527 - bch2_cached_btree_node_to_text(&i->buf, c, b); 528 - i->iter++; 529 - } else { 530 - done = true; 531 } 532 --i->buf.atomic; 533 - rcu_read_unlock(); 534 } while (!done); 535 536 if (i->buf.allocation_failure)
··· 492 prt_printf(out, "journal pin %px:\t%llu\n", 493 &b->writes[1].journal, b->writes[1].journal.seq); 494 495 + prt_printf(out, "ob:\t%u\n", b->ob.nr); 496 + 497 printbuf_indent_sub(out, 2); 498 } 499 ··· 508 i->ret = 0; 509 510 do { 511 ret = bch2_debugfs_flush_buf(i); 512 if (ret) 513 return ret; 514 515 i->buf.atomic++; 516 + scoped_guard(rcu) { 517 + struct bucket_table *tbl = 518 + rht_dereference_rcu(c->btree_cache.table.tbl, 519 + &c->btree_cache.table); 520 + if (i->iter < tbl->size) { 521 + struct rhash_head *pos; 522 + struct btree *b; 523 + 524 + rht_for_each_entry_rcu(b, pos, tbl, i->iter, hash) 525 + bch2_cached_btree_node_to_text(&i->buf, c, b); 526 + i->iter++; 527 + } else { 528 + done = true; 529 + } 530 } 531 --i->buf.atomic; 532 } while (!done); 533 534 if (i->buf.allocation_failure)
fs/bcachefs/dirent.c (+84 -95)
··· 231 prt_printf(out, " type %s", bch2_d_type_str(d.v->d_type)); 232 } 233 234 - static struct bkey_i_dirent *dirent_alloc_key(struct btree_trans *trans, 235 subvol_inum dir, 236 u8 type, 237 - int name_len, int cf_name_len, 238 u64 dst) 239 { 240 - struct bkey_i_dirent *dirent; 241 - unsigned u64s = BKEY_U64s + dirent_val_u64s(name_len, cf_name_len); 242 - 243 - BUG_ON(u64s > U8_MAX); 244 - 245 - dirent = bch2_trans_kmalloc(trans, u64s * sizeof(u64)); 246 if (IS_ERR(dirent)) 247 return dirent; 248 249 bkey_dirent_init(&dirent->k_i); 250 - dirent->k.u64s = u64s; 251 252 if (type != DT_SUBVOL) { 253 dirent->v.d_inum = cpu_to_le64(dst); ··· 312 313 dirent->v.d_type = type; 314 dirent->v.d_unused = 0; 315 - dirent->v.d_casefold = cf_name_len ? 1 : 0; 316 317 - return dirent; 318 - } 319 - 320 - static void dirent_init_regular_name(struct bkey_i_dirent *dirent, 321 - const struct qstr *name) 322 - { 323 - EBUG_ON(dirent->v.d_casefold); 324 - 325 - memcpy(&dirent->v.d_name[0], name->name, name->len); 326 - memset(&dirent->v.d_name[name->len], 0, 327 - bkey_val_bytes(&dirent->k) - 328 - offsetof(struct bch_dirent, d_name) - 329 - name->len); 330 - } 331 - 332 - static void dirent_init_casefolded_name(struct bkey_i_dirent *dirent, 333 - const struct qstr *name, 334 - const struct qstr *cf_name) 335 - { 336 - EBUG_ON(!dirent->v.d_casefold); 337 - EBUG_ON(!cf_name->len); 338 - 339 - dirent->v.d_cf_name_block.d_name_len = cpu_to_le16(name->len); 340 - dirent->v.d_cf_name_block.d_cf_name_len = cpu_to_le16(cf_name->len); 341 - memcpy(&dirent->v.d_cf_name_block.d_names[0], name->name, name->len); 342 - memcpy(&dirent->v.d_cf_name_block.d_names[name->len], cf_name->name, cf_name->len); 343 - memset(&dirent->v.d_cf_name_block.d_names[name->len + cf_name->len], 0, 344 - bkey_val_bytes(&dirent->k) - 345 - offsetof(struct bch_dirent, d_cf_name_block.d_names) - 346 - name->len + cf_name->len); 347 - 348 - EBUG_ON(bch2_dirent_get_casefold_name(dirent_i_to_s_c(dirent)).len != cf_name->len); 349 - } 350 - 351 - static struct bkey_i_dirent *dirent_create_key(struct btree_trans *trans, 352 - const struct bch_hash_info *hash_info, 353 - subvol_inum dir, 354 - u8 type, 355 - const struct qstr *name, 356 - const struct qstr *cf_name, 357 - u64 dst) 358 - { 359 - struct bkey_i_dirent *dirent; 360 - struct qstr _cf_name; 361 - 362 - if (name->len > BCH_NAME_MAX) 363 - return ERR_PTR(-ENAMETOOLONG); 364 - 365 - if (hash_info->cf_encoding && !cf_name) { 366 - int ret = bch2_casefold(trans, hash_info, name, &_cf_name); 367 - if (ret) 368 - return ERR_PTR(ret); 369 - 370 - cf_name = &_cf_name; 371 - } 372 - 373 - dirent = dirent_alloc_key(trans, dir, type, name->len, cf_name ? 
cf_name->len : 0, dst); 374 - if (IS_ERR(dirent)) 375 - return dirent; 376 - 377 - if (cf_name) 378 - dirent_init_casefolded_name(dirent, name, cf_name); 379 - else 380 - dirent_init_regular_name(dirent, name); 381 382 EBUG_ON(bch2_dirent_get_name(dirent_i_to_s_c(dirent)).len != name->len); 383 - 384 return dirent; 385 } 386 ··· 332 struct bkey_i_dirent *dirent; 333 int ret; 334 335 - dirent = dirent_create_key(trans, hash_info, dir_inum, type, name, NULL, dst_inum); 336 ret = PTR_ERR_OR_ZERO(dirent); 337 if (ret) 338 return ret; ··· 356 struct bkey_i_dirent *dirent; 357 int ret; 358 359 - dirent = dirent_create_key(trans, hash_info, dir, type, name, NULL, dst_inum); 360 ret = PTR_ERR_OR_ZERO(dirent); 361 if (ret) 362 return ret; ··· 393 } 394 395 int bch2_dirent_rename(struct btree_trans *trans, 396 - subvol_inum src_dir, struct bch_hash_info *src_hash, u64 *src_dir_i_size, 397 - subvol_inum dst_dir, struct bch_hash_info *dst_hash, u64 *dst_dir_i_size, 398 const struct qstr *src_name, subvol_inum *src_inum, u64 *src_offset, 399 const struct qstr *dst_name, subvol_inum *dst_inum, u64 *dst_offset, 400 enum bch_rename_mode mode) ··· 461 *src_offset = dst_iter.pos.offset; 462 463 /* Create new dst key: */ 464 - new_dst = dirent_create_key(trans, dst_hash, dst_dir, 0, dst_name, 465 - dst_hash->cf_encoding ? &dst_name_lookup : NULL, 0); 466 ret = PTR_ERR_OR_ZERO(new_dst); 467 if (ret) 468 goto out; ··· 472 473 /* Create new src key: */ 474 if (mode == BCH_RENAME_EXCHANGE) { 475 - new_src = dirent_create_key(trans, src_hash, src_dir, 0, src_name, 476 - src_hash->cf_encoding ? &src_name_lookup : NULL, 0); 477 ret = PTR_ERR_OR_ZERO(new_src); 478 if (ret) 479 goto out; ··· 532 if ((mode == BCH_RENAME_EXCHANGE) && 533 new_src->v.d_type == DT_SUBVOL) 534 new_src->v.d_parent_subvol = cpu_to_le32(src_dir.subvol); 535 - 536 - if (old_dst.k) 537 - *dst_dir_i_size -= bkey_bytes(old_dst.k); 538 - *src_dir_i_size -= bkey_bytes(old_src.k); 539 - 540 - if (mode == BCH_RENAME_EXCHANGE) 541 - *src_dir_i_size += bkey_bytes(&new_src->k); 542 - *dst_dir_i_size += bkey_bytes(&new_dst->k); 543 544 ret = bch2_trans_update(trans, &dst_iter, &new_dst->k_i, 0); 545 if (ret) ··· 639 struct bkey_s_c_dirent d = bkey_s_c_to_dirent(k); 640 if (d.v->d_type == DT_SUBVOL && le32_to_cpu(d.v->d_parent_subvol) != subvol) 641 continue; 642 - ret = -BCH_ERR_ENOTEMPTY_dir_not_empty; 643 break; 644 } 645 bch2_trans_iter_exit(trans, &iter); ··· 675 return !ret; 676 } 677 678 - int bch2_readdir(struct bch_fs *c, subvol_inum inum, struct dir_context *ctx) 679 { 680 struct bkey_buf sk; 681 bch2_bkey_buf_init(&sk); ··· 695 struct bkey_s_c_dirent dirent = bkey_i_to_s_c_dirent(sk.k); 696 697 subvol_inum target; 698 - int ret2 = bch2_dirent_read_target(trans, inum, dirent, &target); 699 if (ret2 > 0) 700 continue; 701 ··· 729 ret = bch2_inode_unpack(k, inode); 730 goto found; 731 } 732 - ret = -BCH_ERR_ENOENT_inode; 733 found: 734 bch_err_msg(trans->c, ret, "fetching inode %llu", inode_nr); 735 bch2_trans_iter_exit(trans, &iter);
··· 231 prt_printf(out, " type %s", bch2_d_type_str(d.v->d_type)); 232 } 233 234 + int bch2_dirent_init_name(struct bkey_i_dirent *dirent, 235 + const struct bch_hash_info *hash_info, 236 + const struct qstr *name, 237 + const struct qstr *cf_name) 238 + { 239 + EBUG_ON(hash_info->cf_encoding == NULL && cf_name); 240 + int cf_len = 0; 241 + 242 + if (name->len > BCH_NAME_MAX) 243 + return -ENAMETOOLONG; 244 + 245 + dirent->v.d_casefold = hash_info->cf_encoding != NULL; 246 + 247 + if (!dirent->v.d_casefold) { 248 + memcpy(&dirent->v.d_name[0], name->name, name->len); 249 + memset(&dirent->v.d_name[name->len], 0, 250 + bkey_val_bytes(&dirent->k) - 251 + offsetof(struct bch_dirent, d_name) - 252 + name->len); 253 + } else { 254 + #ifdef CONFIG_UNICODE 255 + memcpy(&dirent->v.d_cf_name_block.d_names[0], name->name, name->len); 256 + 257 + char *cf_out = &dirent->v.d_cf_name_block.d_names[name->len]; 258 + 259 + if (cf_name) { 260 + cf_len = cf_name->len; 261 + 262 + memcpy(cf_out, cf_name->name, cf_name->len); 263 + } else { 264 + cf_len = utf8_casefold(hash_info->cf_encoding, name, 265 + cf_out, 266 + bkey_val_end(bkey_i_to_s(&dirent->k_i)) - (void *) cf_out); 267 + if (cf_len <= 0) 268 + return cf_len; 269 + } 270 + 271 + memset(&dirent->v.d_cf_name_block.d_names[name->len + cf_len], 0, 272 + bkey_val_bytes(&dirent->k) - 273 + offsetof(struct bch_dirent, d_cf_name_block.d_names) - 274 + name->len + cf_len); 275 + 276 + dirent->v.d_cf_name_block.d_name_len = cpu_to_le16(name->len); 277 + dirent->v.d_cf_name_block.d_cf_name_len = cpu_to_le16(cf_len); 278 + 279 + EBUG_ON(bch2_dirent_get_casefold_name(dirent_i_to_s_c(dirent)).len != cf_len); 280 + #else 281 + return -EOPNOTSUPP; 282 + #endif 283 + } 284 + 285 + unsigned u64s = dirent_val_u64s(name->len, cf_len); 286 + BUG_ON(u64s > bkey_val_u64s(&dirent->k)); 287 + set_bkey_val_u64s(&dirent->k, u64s); 288 + return 0; 289 + } 290 + 291 + struct bkey_i_dirent *bch2_dirent_create_key(struct btree_trans *trans, 292 + const struct bch_hash_info *hash_info, 293 subvol_inum dir, 294 u8 type, 295 + const struct qstr *name, 296 + const struct qstr *cf_name, 297 u64 dst) 298 { 299 + struct bkey_i_dirent *dirent = bch2_trans_kmalloc(trans, BKEY_U64s_MAX * sizeof(u64)); 300 if (IS_ERR(dirent)) 301 return dirent; 302 303 bkey_dirent_init(&dirent->k_i); 304 + dirent->k.u64s = BKEY_U64s_MAX; 305 306 if (type != DT_SUBVOL) { 307 dirent->v.d_inum = cpu_to_le64(dst); ··· 258 259 dirent->v.d_type = type; 260 dirent->v.d_unused = 0; 261 262 + int ret = bch2_dirent_init_name(dirent, hash_info, name, cf_name); 263 + if (ret) 264 + return ERR_PTR(ret); 265 266 EBUG_ON(bch2_dirent_get_name(dirent_i_to_s_c(dirent)).len != name->len); 267 return dirent; 268 } 269 ··· 341 struct bkey_i_dirent *dirent; 342 int ret; 343 344 + dirent = bch2_dirent_create_key(trans, hash_info, dir_inum, type, name, NULL, dst_inum); 345 ret = PTR_ERR_OR_ZERO(dirent); 346 if (ret) 347 return ret; ··· 365 struct bkey_i_dirent *dirent; 366 int ret; 367 368 + dirent = bch2_dirent_create_key(trans, hash_info, dir, type, name, NULL, dst_inum); 369 ret = PTR_ERR_OR_ZERO(dirent); 370 if (ret) 371 return ret; ··· 402 } 403 404 int bch2_dirent_rename(struct btree_trans *trans, 405 + subvol_inum src_dir, struct bch_hash_info *src_hash, 406 + subvol_inum dst_dir, struct bch_hash_info *dst_hash, 407 const struct qstr *src_name, subvol_inum *src_inum, u64 *src_offset, 408 const struct qstr *dst_name, subvol_inum *dst_inum, u64 *dst_offset, 409 enum bch_rename_mode mode) ··· 470 *src_offset = 
dst_iter.pos.offset; 471 472 /* Create new dst key: */ 473 + new_dst = bch2_dirent_create_key(trans, dst_hash, dst_dir, 0, dst_name, 474 + dst_hash->cf_encoding ? &dst_name_lookup : NULL, 0); 475 ret = PTR_ERR_OR_ZERO(new_dst); 476 if (ret) 477 goto out; ··· 481 482 /* Create new src key: */ 483 if (mode == BCH_RENAME_EXCHANGE) { 484 + new_src = bch2_dirent_create_key(trans, src_hash, src_dir, 0, src_name, 485 + src_hash->cf_encoding ? &src_name_lookup : NULL, 0); 486 ret = PTR_ERR_OR_ZERO(new_src); 487 if (ret) 488 goto out; ··· 541 if ((mode == BCH_RENAME_EXCHANGE) && 542 new_src->v.d_type == DT_SUBVOL) 543 new_src->v.d_parent_subvol = cpu_to_le32(src_dir.subvol); 544 545 ret = bch2_trans_update(trans, &dst_iter, &new_dst->k_i, 0); 546 if (ret) ··· 656 struct bkey_s_c_dirent d = bkey_s_c_to_dirent(k); 657 if (d.v->d_type == DT_SUBVOL && le32_to_cpu(d.v->d_parent_subvol) != subvol) 658 continue; 659 + ret = bch_err_throw(trans->c, ENOTEMPTY_dir_not_empty); 660 break; 661 } 662 bch2_trans_iter_exit(trans, &iter); ··· 692 return !ret; 693 } 694 695 + int bch2_readdir(struct bch_fs *c, subvol_inum inum, 696 + struct bch_hash_info *hash_info, 697 + struct dir_context *ctx) 698 { 699 struct bkey_buf sk; 700 bch2_bkey_buf_init(&sk); ··· 710 struct bkey_s_c_dirent dirent = bkey_i_to_s_c_dirent(sk.k); 711 712 subvol_inum target; 713 + 714 + bool need_second_pass = false; 715 + int ret2 = bch2_str_hash_check_key(trans, NULL, &bch2_dirent_hash_desc, 716 + hash_info, &iter, k, &need_second_pass) ?: 717 + bch2_dirent_read_target(trans, inum, dirent, &target); 718 if (ret2 > 0) 719 continue; 720 ··· 740 ret = bch2_inode_unpack(k, inode); 741 goto found; 742 } 743 + ret = bch_err_throw(trans->c, ENOENT_inode); 744 found: 745 bch_err_msg(trans->c, ret, "fetching inode %llu", inode_nr); 746 bch2_trans_iter_exit(trans, &iter);
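Dirent key creation is consolidated: bch2_dirent_create_key() allocates a maximum-size key, and bch2_dirent_init_name() fills in either the plain or the casefolded name (doing the utf8_casefold() itself when no precomputed casefolded name is supplied) before shrinking the key's value to fit. A caller sketch mirroring the in-tree users above (DT_REG is just an example type):

    struct bkey_i_dirent *d =
            bch2_dirent_create_key(trans, hash_info, dir_inum,
                                   DT_REG, name, NULL, dst_inum);
    int ret = PTR_ERR_OR_ZERO(d);
    if (ret)
            return ret;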
fs/bcachefs/dirent.h (+12 -4)
··· 38 } 39 } 40 41 - struct qstr bch2_dirent_get_name(struct bkey_s_c_dirent d); 42 43 static inline unsigned dirent_val_u64s(unsigned len, unsigned cf_len) 44 { ··· 58 dst->v.d_inum = src.v->d_inum; 59 dst->v.d_type = src.v->d_type; 60 } 61 62 int bch2_dirent_create_snapshot(struct btree_trans *, u32, u64, u32, 63 const struct bch_hash_info *, u8, ··· 88 }; 89 90 int bch2_dirent_rename(struct btree_trans *, 91 - subvol_inum, struct bch_hash_info *, u64 *, 92 - subvol_inum, struct bch_hash_info *, u64 *, 93 const struct qstr *, subvol_inum *, u64 *, 94 const struct qstr *, subvol_inum *, u64 *, 95 enum bch_rename_mode); ··· 103 104 int bch2_empty_dir_snapshot(struct btree_trans *, u64, u32, u32); 105 int bch2_empty_dir_trans(struct btree_trans *, subvol_inum); 106 - int bch2_readdir(struct bch_fs *, subvol_inum, struct dir_context *); 107 108 int bch2_fsck_remove_dirent(struct btree_trans *, struct bpos); 109
··· 38 } 39 } 40 41 + struct qstr bch2_dirent_get_name(struct bkey_s_c_dirent); 42 43 static inline unsigned dirent_val_u64s(unsigned len, unsigned cf_len) 44 { ··· 58 dst->v.d_inum = src.v->d_inum; 59 dst->v.d_type = src.v->d_type; 60 } 61 + 62 + int bch2_dirent_init_name(struct bkey_i_dirent *, 63 + const struct bch_hash_info *, 64 + const struct qstr *, 65 + const struct qstr *); 66 + struct bkey_i_dirent *bch2_dirent_create_key(struct btree_trans *, 67 + const struct bch_hash_info *, subvol_inum, u8, 68 + const struct qstr *, const struct qstr *, u64); 69 70 int bch2_dirent_create_snapshot(struct btree_trans *, u32, u64, u32, 71 const struct bch_hash_info *, u8, ··· 80 }; 81 82 int bch2_dirent_rename(struct btree_trans *, 83 + subvol_inum, struct bch_hash_info *, 84 + subvol_inum, struct bch_hash_info *, 85 const struct qstr *, subvol_inum *, u64 *, 86 const struct qstr *, subvol_inum *, u64 *, 87 enum bch_rename_mode); ··· 95 96 int bch2_empty_dir_snapshot(struct btree_trans *, u64, u32, u32); 97 int bch2_empty_dir_trans(struct btree_trans *, subvol_inum); 98 + int bch2_readdir(struct bch_fs *, subvol_inum, struct bch_hash_info *, struct dir_context *); 99 100 int bch2_fsck_remove_dirent(struct btree_trans *, struct bpos); 101
fs/bcachefs/disk_accounting.c (+18 -20)
··· 390 err: 391 free_percpu(n.v[1]); 392 free_percpu(n.v[0]); 393 - return -BCH_ERR_ENOMEM_disk_accounting; 394 } 395 396 int bch2_accounting_mem_insert(struct bch_fs *c, struct bkey_s_c_accounting a, ··· 401 if (mode != BCH_ACCOUNTING_read && 402 accounting_to_replicas(&r.e, a.k->p) && 403 !bch2_replicas_marked_locked(c, &r.e)) 404 - return -BCH_ERR_btree_insert_need_mark_replicas; 405 406 percpu_up_read(&c->mark_lock); 407 percpu_down_write(&c->mark_lock); ··· 419 if (mode != BCH_ACCOUNTING_read && 420 accounting_to_replicas(&r.e, a.k->p) && 421 !bch2_replicas_marked_locked(c, &r.e)) 422 - return -BCH_ERR_btree_insert_need_mark_replicas; 423 424 return __bch2_accounting_mem_insert(c, a); 425 } ··· 559 sizeof(u64), GFP_KERNEL); 560 if (!e->v[1]) { 561 bch2_accounting_free_counters(acc, true); 562 - ret = -BCH_ERR_ENOMEM_disk_accounting; 563 break; 564 } 565 } ··· 737 bch2_disk_accounting_mod(trans, acc, v, nr, false)) ?: 738 -BCH_ERR_remove_disk_accounting_entry; 739 } else { 740 - ret = -BCH_ERR_remove_disk_accounting_entry; 741 } 742 goto fsck_err; 743 } ··· 897 case BCH_DISK_ACCOUNTING_replicas: 898 fs_usage_data_type_to_base(usage, k.replicas.data_type, v[0]); 899 break; 900 - case BCH_DISK_ACCOUNTING_dev_data_type: 901 - rcu_read_lock(); 902 struct bch_dev *ca = bch2_dev_rcu_noerror(c, k.dev_data_type.dev); 903 if (ca) { 904 struct bch_dev_usage_type __percpu *d = &ca->usage->d[k.dev_data_type.data_type]; ··· 910 k.dev_data_type.data_type == BCH_DATA_journal) 911 usage->hidden += v[0] * ca->mi.bucket_size; 912 } 913 - rcu_read_unlock(); 914 break; 915 } 916 } 917 preempt_enable(); ··· 1006 case BCH_DISK_ACCOUNTING_replicas: 1007 fs_usage_data_type_to_base(&base, acc_k.replicas.data_type, a.v->d[0]); 1008 break; 1009 - case BCH_DISK_ACCOUNTING_dev_data_type: { 1010 - rcu_read_lock(); 1011 - struct bch_dev *ca = bch2_dev_rcu_noerror(c, acc_k.dev_data_type.dev); 1012 - if (!ca) { 1013 - rcu_read_unlock(); 1014 - continue; 1015 - } 1016 1017 - v[0] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].buckets); 1018 - v[1] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].sectors); 1019 - v[2] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].fragmented); 1020 - rcu_read_unlock(); 1021 1022 if (memcmp(a.v->d, v, 3 * sizeof(u64))) { 1023 struct printbuf buf = PRINTBUF; ··· 1030 printbuf_exit(&buf); 1031 mismatch = true; 1032 } 1033 - } 1034 } 1035 1036 0;
··· 390 err: 391 free_percpu(n.v[1]); 392 free_percpu(n.v[0]); 393 + return bch_err_throw(c, ENOMEM_disk_accounting); 394 } 395 396 int bch2_accounting_mem_insert(struct bch_fs *c, struct bkey_s_c_accounting a, ··· 401 if (mode != BCH_ACCOUNTING_read && 402 accounting_to_replicas(&r.e, a.k->p) && 403 !bch2_replicas_marked_locked(c, &r.e)) 404 + return bch_err_throw(c, btree_insert_need_mark_replicas); 405 406 percpu_up_read(&c->mark_lock); 407 percpu_down_write(&c->mark_lock); ··· 419 if (mode != BCH_ACCOUNTING_read && 420 accounting_to_replicas(&r.e, a.k->p) && 421 !bch2_replicas_marked_locked(c, &r.e)) 422 + return bch_err_throw(c, btree_insert_need_mark_replicas); 423 424 return __bch2_accounting_mem_insert(c, a); 425 } ··· 559 sizeof(u64), GFP_KERNEL); 560 if (!e->v[1]) { 561 bch2_accounting_free_counters(acc, true); 562 + ret = bch_err_throw(c, ENOMEM_disk_accounting); 563 break; 564 } 565 } ··· 737 bch2_disk_accounting_mod(trans, acc, v, nr, false)) ?: 738 -BCH_ERR_remove_disk_accounting_entry; 739 } else { 740 + ret = bch_err_throw(c, remove_disk_accounting_entry); 741 } 742 goto fsck_err; 743 } ··· 897 case BCH_DISK_ACCOUNTING_replicas: 898 fs_usage_data_type_to_base(usage, k.replicas.data_type, v[0]); 899 break; 900 + case BCH_DISK_ACCOUNTING_dev_data_type: { 901 + guard(rcu)(); 902 struct bch_dev *ca = bch2_dev_rcu_noerror(c, k.dev_data_type.dev); 903 if (ca) { 904 struct bch_dev_usage_type __percpu *d = &ca->usage->d[k.dev_data_type.data_type]; ··· 910 k.dev_data_type.data_type == BCH_DATA_journal) 911 usage->hidden += v[0] * ca->mi.bucket_size; 912 } 913 break; 914 + } 915 } 916 } 917 preempt_enable(); ··· 1006 case BCH_DISK_ACCOUNTING_replicas: 1007 fs_usage_data_type_to_base(&base, acc_k.replicas.data_type, a.v->d[0]); 1008 break; 1009 + case BCH_DISK_ACCOUNTING_dev_data_type: 1010 + { 1011 + guard(rcu)(); /* scoped guard is a loop, and doesn't play nicely with continue */ 1012 + struct bch_dev *ca = bch2_dev_rcu_noerror(c, acc_k.dev_data_type.dev); 1013 + if (!ca) 1014 + continue; 1015 1016 + v[0] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].buckets); 1017 + v[1] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].sectors); 1018 + v[2] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].fragmented); 1019 + } 1020 1021 if (memcmp(a.v->d, v, 3 * sizeof(u64))) { 1022 struct printbuf buf = PRINTBUF; ··· 1031 printbuf_exit(&buf); 1032 mismatch = true; 1033 } 1034 } 1035 1036 0;
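The new-side comment above explains why this hunk opens a block with guard(rcu)() instead of using scoped_guard(rcu): scoped_guard() expands to a one-iteration loop, so a continue inside it only exits the guard, not the enclosing loop, while a bare guard's cleanup still runs on every scope exit, including via continue. Illustrative loop (not taken from the patch):

    bkey_for_each_ptr(ptrs, ptr) {
            guard(rcu)();           /* dropped automatically, even via continue */
            struct bch_dev *ca = bch2_dev_rcu_noerror(c, ptr->dev);
            if (!ca)
                    continue;       /* still advances bkey_for_each_ptr() */
            /* ... per-device work under RCU ... */
    }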
fs/bcachefs/disk_accounting.h (+3 -3)
··· 174 case BCH_DISK_ACCOUNTING_replicas: 175 fs_usage_data_type_to_base(&trans->fs_usage_delta, acc_k.replicas.data_type, a.v->d[0]); 176 break; 177 - case BCH_DISK_ACCOUNTING_dev_data_type: 178 - rcu_read_lock(); 179 struct bch_dev *ca = bch2_dev_rcu_noerror(c, acc_k.dev_data_type.dev); 180 if (ca) { 181 this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].buckets, a.v->d[0]); 182 this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].sectors, a.v->d[1]); 183 this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].fragmented, a.v->d[2]); 184 } 185 - rcu_read_unlock(); 186 break; 187 } 188 } 189
··· 174 case BCH_DISK_ACCOUNTING_replicas: 175 fs_usage_data_type_to_base(&trans->fs_usage_delta, acc_k.replicas.data_type, a.v->d[0]); 176 break; 177 + case BCH_DISK_ACCOUNTING_dev_data_type: { 178 + guard(rcu)(); 179 struct bch_dev *ca = bch2_dev_rcu_noerror(c, acc_k.dev_data_type.dev); 180 if (ca) { 181 this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].buckets, a.v->d[0]); 182 this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].sectors, a.v->d[1]); 183 this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].fragmented, a.v->d[2]); 184 } 185 break; 186 + } 187 } 188 } 189
fs/bcachefs/disk_groups.c (+12 -25)
··· 130 131 cpu_g = kzalloc(struct_size(cpu_g, entries, nr_groups), GFP_KERNEL); 132 if (!cpu_g) 133 - return -BCH_ERR_ENOMEM_disk_groups_to_cpu; 134 135 cpu_g->nr = nr_groups; 136 ··· 170 const struct bch_devs_mask *bch2_target_to_mask(struct bch_fs *c, unsigned target) 171 { 172 struct target t = target_decode(target); 173 - struct bch_devs_mask *devs; 174 175 - rcu_read_lock(); 176 177 switch (t.type) { 178 case TARGET_NULL: 179 - devs = NULL; 180 - break; 181 case TARGET_DEV: { 182 struct bch_dev *ca = t.dev < c->sb.nr_devices 183 ? rcu_dereference(c->devs[t.dev]) 184 : NULL; 185 - devs = ca ? &ca->self : NULL; 186 - break; 187 } 188 case TARGET_GROUP: { 189 struct bch_disk_groups_cpu *g = rcu_dereference(c->disk_groups); 190 191 - devs = g && t.group < g->nr && !g->entries[t.group].deleted 192 ? &g->entries[t.group].devs 193 : NULL; 194 - break; 195 } 196 default: 197 BUG(); 198 } 199 - 200 - rcu_read_unlock(); 201 - 202 - return devs; 203 } 204 205 bool bch2_dev_in_target(struct bch_fs *c, unsigned dev, unsigned target) ··· 376 bch2_printbuf_make_room(out, 4096); 377 378 out->atomic++; 379 - rcu_read_lock(); 380 struct bch_disk_groups_cpu *g = rcu_dereference(c->disk_groups); 381 382 for (unsigned i = 0; i < (g ? g->nr : 0); i++) { ··· 397 prt_newline(out); 398 } 399 400 - rcu_read_unlock(); 401 out->atomic--; 402 } 403 404 void bch2_disk_path_to_text(struct printbuf *out, struct bch_fs *c, unsigned v) 405 { 406 out->atomic++; 407 - rcu_read_lock(); 408 __bch2_disk_path_to_text(out, rcu_dereference(c->disk_groups), v), 409 - rcu_read_unlock(); 410 --out->atomic; 411 } 412 ··· 525 switch (t.type) { 526 case TARGET_NULL: 527 prt_printf(out, "none"); 528 - break; 529 case TARGET_DEV: { 530 - struct bch_dev *ca; 531 - 532 out->atomic++; 533 - rcu_read_lock(); 534 - ca = t.dev < c->sb.nr_devices 535 ? rcu_dereference(c->devs[t.dev]) 536 : NULL; 537 ··· 540 else 541 prt_printf(out, "invalid device %u", t.dev); 542 543 - rcu_read_unlock(); 544 out->atomic--; 545 - break; 546 } 547 case TARGET_GROUP: 548 bch2_disk_path_to_text(out, c, t.group); 549 - break; 550 default: 551 BUG(); 552 }
··· 130 131 cpu_g = kzalloc(struct_size(cpu_g, entries, nr_groups), GFP_KERNEL); 132 if (!cpu_g) 133 + return bch_err_throw(c, ENOMEM_disk_groups_to_cpu); 134 135 cpu_g->nr = nr_groups; 136 ··· 170 const struct bch_devs_mask *bch2_target_to_mask(struct bch_fs *c, unsigned target) 171 { 172 struct target t = target_decode(target); 173 174 + guard(rcu)(); 175 176 switch (t.type) { 177 case TARGET_NULL: 178 + return NULL; 179 case TARGET_DEV: { 180 struct bch_dev *ca = t.dev < c->sb.nr_devices 181 ? rcu_dereference(c->devs[t.dev]) 182 : NULL; 183 + return ca ? &ca->self : NULL; 184 } 185 case TARGET_GROUP: { 186 struct bch_disk_groups_cpu *g = rcu_dereference(c->disk_groups); 187 188 + return g && t.group < g->nr && !g->entries[t.group].deleted 189 ? &g->entries[t.group].devs 190 : NULL; 191 } 192 default: 193 BUG(); 194 } 195 } 196 197 bool bch2_dev_in_target(struct bch_fs *c, unsigned dev, unsigned target) ··· 384 bch2_printbuf_make_room(out, 4096); 385 386 out->atomic++; 387 + guard(rcu)(); 388 struct bch_disk_groups_cpu *g = rcu_dereference(c->disk_groups); 389 390 for (unsigned i = 0; i < (g ? g->nr : 0); i++) { ··· 405 prt_newline(out); 406 } 407 408 out->atomic--; 409 } 410 411 void bch2_disk_path_to_text(struct printbuf *out, struct bch_fs *c, unsigned v) 412 { 413 out->atomic++; 414 + guard(rcu)(); 415 __bch2_disk_path_to_text(out, rcu_dereference(c->disk_groups), v), 416 --out->atomic; 417 } 418 ··· 535 switch (t.type) { 536 case TARGET_NULL: 537 prt_printf(out, "none"); 538 + return; 539 case TARGET_DEV: { 540 out->atomic++; 541 + guard(rcu)(); 542 + struct bch_dev *ca = t.dev < c->sb.nr_devices 543 ? rcu_dereference(c->devs[t.dev]) 544 : NULL; 545 ··· 552 else 553 prt_printf(out, "invalid device %u", t.dev); 554 555 out->atomic--; 556 + return; 557 } 558 case TARGET_GROUP: 559 bch2_disk_path_to_text(out, c, t.group); 560 + return; 561 default: 562 BUG(); 563 }
fs/bcachefs/ec.c (+55 -53)
··· 213 a->dirty_sectors, 214 a->stripe, s.k->p.offset, 215 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 216 - ret = -BCH_ERR_mark_stripe; 217 goto err; 218 } 219 ··· 224 a->dirty_sectors, 225 a->cached_sectors, 226 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 227 - ret = -BCH_ERR_mark_stripe; 228 goto err; 229 } 230 } else { ··· 234 bucket.inode, bucket.offset, a->gen, 235 a->stripe, 236 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 237 - ret = -BCH_ERR_mark_stripe; 238 goto err; 239 } 240 ··· 244 bch2_data_type_str(a->data_type), 245 bch2_data_type_str(data_type), 246 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 247 - ret = -BCH_ERR_mark_stripe; 248 goto err; 249 } 250 ··· 256 a->dirty_sectors, 257 a->cached_sectors, 258 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 259 - ret = -BCH_ERR_mark_stripe; 260 goto err; 261 } 262 } ··· 295 struct bch_dev *ca = bch2_dev_tryget(c, ptr->dev); 296 if (unlikely(!ca)) { 297 if (ptr->dev != BCH_SB_MEMBER_INVALID && !(flags & BTREE_TRIGGER_overwrite)) 298 - ret = -BCH_ERR_mark_stripe; 299 goto err; 300 } 301 ··· 325 if (bch2_fs_inconsistent_on(!g, c, "reference to invalid bucket on device %u\n%s", 326 ptr->dev, 327 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 328 - ret = -BCH_ERR_mark_stripe; 329 goto err; 330 } 331 ··· 428 gc = genradix_ptr_alloc(&c->gc_stripes, idx, GFP_KERNEL); 429 if (!gc) { 430 bch_err(c, "error allocating memory for gc_stripes, idx %llu", idx); 431 - return -BCH_ERR_ENOMEM_mark_stripe; 432 } 433 434 /* ··· 536 } 537 538 /* XXX: this is a non-mempoolified memory allocation: */ 539 - static int ec_stripe_buf_init(struct ec_stripe_buf *buf, 540 unsigned offset, unsigned size) 541 { 542 struct bch_stripe *v = &bkey_i_to_stripe(&buf->key)->v; ··· 565 return 0; 566 err: 567 ec_stripe_buf_exit(buf); 568 - return -BCH_ERR_ENOMEM_stripe_buf; 569 } 570 571 /* Checksumming: */ ··· 841 842 buf = kzalloc(sizeof(*buf), GFP_NOFS); 843 if (!buf) 844 - return -BCH_ERR_ENOMEM_ec_read_extent; 845 846 ret = lockrestart_do(trans, get_stripe_key_trans(trans, rbio->pick.ec.idx, buf)); 847 if (ret) { ··· 862 goto err; 863 } 864 865 - ret = ec_stripe_buf_init(buf, offset, bio_sectors(&rbio->bio)); 866 if (ret) { 867 msg = "-ENOMEM"; 868 goto err; ··· 895 bch_err_ratelimited(c, 896 "error doing reconstruct read: %s\n %s", msg, msgbuf.buf); 897 printbuf_exit(&msgbuf); 898 - ret = -BCH_ERR_stripe_reconstruct; 899 goto out; 900 } 901 ··· 905 { 906 if (c->gc_pos.phase != GC_PHASE_not_running && 907 !genradix_ptr_alloc(&c->gc_stripes, idx, gfp)) 908 - return -BCH_ERR_ENOMEM_ec_stripe_mem_alloc; 909 910 return 0; 911 } ··· 1130 1131 bch2_fs_inconsistent(c, "%s", buf.buf); 1132 printbuf_exit(&buf); 1133 - return -BCH_ERR_erasure_coding_found_btree_node; 1134 } 1135 1136 k = bch2_backpointer_get_key(trans, bp, &iter, BTREE_ITER_intent, last_flushed); ··· 1196 1197 struct bch_dev *ca = bch2_dev_tryget(c, ptr.dev); 1198 if (!ca) 1199 - return -BCH_ERR_ENOENT_dev_not_found; 1200 1201 struct bpos bucket_pos = PTR_BUCKET_POS(ca, &ptr); 1202 ··· 1257 struct bch_dev *ca = bch2_dev_get_ioref(c, ob->dev, WRITE, 1258 BCH_DEV_WRITE_REF_ec_bucket_zero); 1259 if (!ca) { 1260 - s->err = -BCH_ERR_erofs_no_writes; 1261 return; 1262 } 1263 ··· 1321 1322 if (ec_do_recov(c, &s->existing_stripe)) { 1323 bch_err(c, "error creating stripe: error reading existing stripe"); 1324 - ret = -BCH_ERR_ec_block_read; 1325 goto err; 1326 } 1327 ··· 1347 1348 if (ec_nr_failed(&s->new_stripe)) { 1349 bch_err(c, "error creating stripe: error writing 
redundancy buckets"); 1350 - ret = -BCH_ERR_ec_block_write; 1351 goto err; 1352 } 1353 ··· 1579 static void ec_stripe_head_devs_update(struct bch_fs *c, struct ec_stripe_head *h) 1580 { 1581 struct bch_devs_mask devs = h->devs; 1582 1583 - rcu_read_lock(); 1584 - h->devs = target_rw_devs(c, BCH_DATA_user, h->disk_label 1585 - ? group_to_target(h->disk_label - 1) 1586 - : 0); 1587 - unsigned nr_devs = dev_mask_nr(&h->devs); 1588 1589 - for_each_member_device_rcu(c, ca, &h->devs) 1590 - if (!ca->mi.durability) 1591 - __clear_bit(ca->dev_idx, h->devs.d); 1592 - unsigned nr_devs_with_durability = dev_mask_nr(&h->devs); 1593 1594 - h->blocksize = pick_blocksize(c, &h->devs); 1595 1596 - h->nr_active_devs = 0; 1597 - for_each_member_device_rcu(c, ca, &h->devs) 1598 - if (ca->mi.bucket_size == h->blocksize) 1599 - h->nr_active_devs++; 1600 - 1601 - rcu_read_unlock(); 1602 1603 /* 1604 * If we only have redundancy + 1 devices, we're better off with just ··· 1866 s->nr_data = existing_v->nr_blocks - 1867 existing_v->nr_redundant; 1868 1869 - int ret = ec_stripe_buf_init(&s->existing_stripe, 0, le16_to_cpu(existing_v->sectors)); 1870 if (ret) { 1871 bch2_stripe_close(c, s); 1872 return ret; ··· 1926 } 1927 bch2_trans_iter_exit(trans, &lru_iter); 1928 if (!ret) 1929 - ret = -BCH_ERR_stripe_alloc_blocked; 1930 if (ret == 1) 1931 ret = 0; 1932 if (ret) ··· 1967 continue; 1968 } 1969 1970 - ret = -BCH_ERR_ENOSPC_stripe_create; 1971 break; 1972 } 1973 ··· 2025 if (!h->s) { 2026 h->s = ec_new_stripe_alloc(c, h); 2027 if (!h->s) { 2028 - ret = -BCH_ERR_ENOMEM_ec_new_stripe_alloc; 2029 bch_err(c, "failed to allocate new stripe"); 2030 goto err; 2031 } ··· 2090 goto err; 2091 2092 allocate_buf: 2093 - ret = ec_stripe_buf_init(&s->new_stripe, 0, h->blocksize); 2094 if (ret) 2095 goto err; 2096 ··· 2116 if (k.k->type != KEY_TYPE_stripe) 2117 return 0; 2118 2119 struct bkey_i_stripe *s = 2120 bch2_bkey_make_mut_typed(trans, iter, &k, 0, stripe); 2121 int ret = PTR_ERR_OR_ZERO(s); ··· 2143 2144 unsigned nr_good = 0; 2145 2146 - rcu_read_lock(); 2147 - bkey_for_each_ptr(ptrs, ptr) { 2148 - if (ptr->dev == dev_idx) 2149 - ptr->dev = BCH_SB_MEMBER_INVALID; 2150 2151 - struct bch_dev *ca = bch2_dev_rcu(trans->c, ptr->dev); 2152 - nr_good += ca && ca->mi.state != BCH_MEMBER_STATE_failed; 2153 - } 2154 - rcu_read_unlock(); 2155 2156 if (nr_good < s->v.nr_blocks && !(flags & BCH_FORCE_IF_DATA_DEGRADED)) 2157 - return -BCH_ERR_remove_would_lose_data; 2158 2159 unsigned nr_data = s->v.nr_blocks - s->v.nr_redundant; 2160 2161 if (nr_good < nr_data && !(flags & BCH_FORCE_IF_DATA_LOST)) 2162 - return -BCH_ERR_remove_would_lose_data; 2163 2164 sectors = -sectors; 2165 ··· 2179 return 0; 2180 2181 if (a->stripe_sectors) { 2182 - bch_err(trans->c, "trying to invalidate device in stripe when bucket has stripe data"); 2183 - return -BCH_ERR_invalidate_stripe_to_dev; 2184 } 2185 2186 struct btree_iter iter; 2187 struct bkey_s_c_stripe s = 2188 bch2_bkey_get_iter_typed(trans, &iter, BTREE_ID_stripes, POS(0, a->stripe), 2189 - BTREE_ITER_slots, stripe); 2190 int ret = bkey_err(s); 2191 if (ret) 2192 return ret;
··· 213 a->dirty_sectors, 214 a->stripe, s.k->p.offset, 215 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 216 + ret = bch_err_throw(c, mark_stripe); 217 goto err; 218 } 219 ··· 224 a->dirty_sectors, 225 a->cached_sectors, 226 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 227 + ret = bch_err_throw(c, mark_stripe); 228 goto err; 229 } 230 } else { ··· 234 bucket.inode, bucket.offset, a->gen, 235 a->stripe, 236 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 237 + ret = bch_err_throw(c, mark_stripe); 238 goto err; 239 } 240 ··· 244 bch2_data_type_str(a->data_type), 245 bch2_data_type_str(data_type), 246 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 247 + ret = bch_err_throw(c, mark_stripe); 248 goto err; 249 } 250 ··· 256 a->dirty_sectors, 257 a->cached_sectors, 258 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 259 + ret = bch_err_throw(c, mark_stripe); 260 goto err; 261 } 262 } ··· 295 struct bch_dev *ca = bch2_dev_tryget(c, ptr->dev); 296 if (unlikely(!ca)) { 297 if (ptr->dev != BCH_SB_MEMBER_INVALID && !(flags & BTREE_TRIGGER_overwrite)) 298 + ret = bch_err_throw(c, mark_stripe); 299 goto err; 300 } 301 ··· 325 if (bch2_fs_inconsistent_on(!g, c, "reference to invalid bucket on device %u\n%s", 326 ptr->dev, 327 (bch2_bkey_val_to_text(&buf, c, s.s_c), buf.buf))) { 328 + ret = bch_err_throw(c, mark_stripe); 329 goto err; 330 } 331 ··· 428 gc = genradix_ptr_alloc(&c->gc_stripes, idx, GFP_KERNEL); 429 if (!gc) { 430 bch_err(c, "error allocating memory for gc_stripes, idx %llu", idx); 431 + return bch_err_throw(c, ENOMEM_mark_stripe); 432 } 433 434 /* ··· 536 } 537 538 /* XXX: this is a non-mempoolified memory allocation: */ 539 + static int ec_stripe_buf_init(struct bch_fs *c, 540 + struct ec_stripe_buf *buf, 541 unsigned offset, unsigned size) 542 { 543 struct bch_stripe *v = &bkey_i_to_stripe(&buf->key)->v; ··· 564 return 0; 565 err: 566 ec_stripe_buf_exit(buf); 567 + return bch_err_throw(c, ENOMEM_stripe_buf); 568 } 569 570 /* Checksumming: */ ··· 840 841 buf = kzalloc(sizeof(*buf), GFP_NOFS); 842 if (!buf) 843 + return bch_err_throw(c, ENOMEM_ec_read_extent); 844 845 ret = lockrestart_do(trans, get_stripe_key_trans(trans, rbio->pick.ec.idx, buf)); 846 if (ret) { ··· 861 goto err; 862 } 863 864 + ret = ec_stripe_buf_init(c, buf, offset, bio_sectors(&rbio->bio)); 865 if (ret) { 866 msg = "-ENOMEM"; 867 goto err; ··· 894 bch_err_ratelimited(c, 895 "error doing reconstruct read: %s\n %s", msg, msgbuf.buf); 896 printbuf_exit(&msgbuf); 897 + ret = bch_err_throw(c, stripe_reconstruct); 898 goto out; 899 } 900 ··· 904 { 905 if (c->gc_pos.phase != GC_PHASE_not_running && 906 !genradix_ptr_alloc(&c->gc_stripes, idx, gfp)) 907 + return bch_err_throw(c, ENOMEM_ec_stripe_mem_alloc); 908 909 return 0; 910 } ··· 1129 1130 bch2_fs_inconsistent(c, "%s", buf.buf); 1131 printbuf_exit(&buf); 1132 + return bch_err_throw(c, erasure_coding_found_btree_node); 1133 } 1134 1135 k = bch2_backpointer_get_key(trans, bp, &iter, BTREE_ITER_intent, last_flushed); ··· 1195 1196 struct bch_dev *ca = bch2_dev_tryget(c, ptr.dev); 1197 if (!ca) 1198 + return bch_err_throw(c, ENOENT_dev_not_found); 1199 1200 struct bpos bucket_pos = PTR_BUCKET_POS(ca, &ptr); 1201 ··· 1256 struct bch_dev *ca = bch2_dev_get_ioref(c, ob->dev, WRITE, 1257 BCH_DEV_WRITE_REF_ec_bucket_zero); 1258 if (!ca) { 1259 + s->err = bch_err_throw(c, erofs_no_writes); 1260 return; 1261 } 1262 ··· 1320 1321 if (ec_do_recov(c, &s->existing_stripe)) { 1322 bch_err(c, "error creating stripe: error reading existing stripe"); 1323 + 
ret = bch_err_throw(c, ec_block_read); 1324 goto err; 1325 } 1326 ··· 1346 1347 if (ec_nr_failed(&s->new_stripe)) { 1348 bch_err(c, "error creating stripe: error writing redundancy buckets"); 1349 + ret = bch_err_throw(c, ec_block_write); 1350 goto err; 1351 } 1352 ··· 1578 static void ec_stripe_head_devs_update(struct bch_fs *c, struct ec_stripe_head *h) 1579 { 1580 struct bch_devs_mask devs = h->devs; 1581 + unsigned nr_devs, nr_devs_with_durability; 1582 1583 + scoped_guard(rcu) { 1584 + h->devs = target_rw_devs(c, BCH_DATA_user, h->disk_label 1585 + ? group_to_target(h->disk_label - 1) 1586 + : 0); 1587 + nr_devs = dev_mask_nr(&h->devs); 1588 1589 + for_each_member_device_rcu(c, ca, &h->devs) 1590 + if (!ca->mi.durability) 1591 + __clear_bit(ca->dev_idx, h->devs.d); 1592 + nr_devs_with_durability = dev_mask_nr(&h->devs); 1593 1594 + h->blocksize = pick_blocksize(c, &h->devs); 1595 1596 + h->nr_active_devs = 0; 1597 + for_each_member_device_rcu(c, ca, &h->devs) 1598 + if (ca->mi.bucket_size == h->blocksize) 1599 + h->nr_active_devs++; 1600 + } 1601 1602 /* 1603 * If we only have redundancy + 1 devices, we're better off with just ··· 1865 s->nr_data = existing_v->nr_blocks - 1866 existing_v->nr_redundant; 1867 1868 + int ret = ec_stripe_buf_init(c, &s->existing_stripe, 0, le16_to_cpu(existing_v->sectors)); 1869 if (ret) { 1870 bch2_stripe_close(c, s); 1871 return ret; ··· 1925 } 1926 bch2_trans_iter_exit(trans, &lru_iter); 1927 if (!ret) 1928 + ret = bch_err_throw(c, stripe_alloc_blocked); 1929 if (ret == 1) 1930 ret = 0; 1931 if (ret) ··· 1966 continue; 1967 } 1968 1969 + ret = bch_err_throw(c, ENOSPC_stripe_create); 1970 break; 1971 } 1972 ··· 2024 if (!h->s) { 2025 h->s = ec_new_stripe_alloc(c, h); 2026 if (!h->s) { 2027 + ret = bch_err_throw(c, ENOMEM_ec_new_stripe_alloc); 2028 bch_err(c, "failed to allocate new stripe"); 2029 goto err; 2030 } ··· 2089 goto err; 2090 2091 allocate_buf: 2092 + ret = ec_stripe_buf_init(c, &s->new_stripe, 0, h->blocksize); 2093 if (ret) 2094 goto err; 2095 ··· 2115 if (k.k->type != KEY_TYPE_stripe) 2116 return 0; 2117 2118 + struct bch_fs *c = trans->c; 2119 struct bkey_i_stripe *s = 2120 bch2_bkey_make_mut_typed(trans, iter, &k, 0, stripe); 2121 int ret = PTR_ERR_OR_ZERO(s); ··· 2141 2142 unsigned nr_good = 0; 2143 2144 + scoped_guard(rcu) 2145 + bkey_for_each_ptr(ptrs, ptr) { 2146 + if (ptr->dev == dev_idx) 2147 + ptr->dev = BCH_SB_MEMBER_INVALID; 2148 2149 + struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 2150 + nr_good += ca && ca->mi.state != BCH_MEMBER_STATE_failed; 2151 + } 2152 2153 if (nr_good < s->v.nr_blocks && !(flags & BCH_FORCE_IF_DATA_DEGRADED)) 2154 + return bch_err_throw(c, remove_would_lose_data); 2155 2156 unsigned nr_data = s->v.nr_blocks - s->v.nr_redundant; 2157 2158 if (nr_good < nr_data && !(flags & BCH_FORCE_IF_DATA_LOST)) 2159 + return bch_err_throw(c, remove_would_lose_data); 2160 2161 sectors = -sectors; 2162 ··· 2178 return 0; 2179 2180 if (a->stripe_sectors) { 2181 + struct bch_fs *c = trans->c; 2182 + bch_err(c, "trying to invalidate device in stripe when bucket has stripe data"); 2183 + return bch_err_throw(c, invalidate_stripe_to_dev); 2184 } 2185 2186 struct btree_iter iter; 2187 struct bkey_s_c_stripe s = 2188 bch2_bkey_get_iter_typed(trans, &iter, BTREE_ID_stripes, POS(0, a->stripe), 2189 + BTREE_ITER_slots, stripe); 2190 int ret = bkey_err(s); 2191 if (ret) 2192 return ret;
fs/bcachefs/errcode.c (+3 -1)
··· 13 NULL 14 }; 15 16 - static unsigned bch2_errcode_parents[] = { 17 #define x(class, err) [BCH_ERR_##err - BCH_ERR_START] = class, 18 BCH_ERRCODES() 19 #undef x 20 }; 21 22 const char *bch2_err_str(int err) 23 { 24 const char *errstr; ··· 37 return errstr ?: "(Invalid error)"; 38 } 39 40 bool __bch2_err_matches(int err, int class) 41 { 42 err = abs(err);
··· 13 NULL 14 }; 15 16 + static const unsigned bch2_errcode_parents[] = { 17 #define x(class, err) [BCH_ERR_##err - BCH_ERR_START] = class, 18 BCH_ERRCODES() 19 #undef x 20 }; 21 22 + __attribute__((const)) 23 const char *bch2_err_str(int err) 24 { 25 const char *errstr; ··· 36 return errstr ?: "(Invalid error)"; 37 } 38 39 + __attribute__((const)) 40 bool __bch2_err_matches(int err, int class) 41 { 42 err = abs(err);
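Making bch2_errcode_parents[] const moves the class table into read-only data, and the __attribute__((const)) annotations tell the compiler that bch2_err_str() and __bch2_err_matches() have no side effects and depend only on their arguments, so repeated matches against the same code can be folded. The body of __bch2_err_matches() is truncated in this hunk; the sketch below shows how a parent-table walk of this shape typically works, and is an assumption rather than a quote of the real function:

bool __bch2_err_matches(int err, int class)
{
	err = abs(err);

	/* walk up the parent table until we hit the class being asked
	 * about, or drop out of the private BCH_ERR_* range entirely */
	while (err >= BCH_ERR_START && err != class)
		err = bch2_errcode_parents[err - BCH_ERR_START];

	return err == class;
}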
+11 -4
fs/bcachefs/errcode.h
··· 182 x(BCH_ERR_fsck, fsck_errors_not_fixed) \ 183 x(BCH_ERR_fsck, fsck_repair_unimplemented) \ 184 x(BCH_ERR_fsck, fsck_repair_impossible) \ 185 - x(EINVAL, restart_recovery) \ 186 - x(EINVAL, cannot_rewind_recovery) \ 187 x(0, data_update_done) \ 188 x(BCH_ERR_data_update_done, data_update_done_would_block) \ 189 x(BCH_ERR_data_update_done, data_update_done_unwritten) \ 190 x(BCH_ERR_data_update_done, data_update_done_no_writes_needed) \ ··· 214 x(EINVAL, remove_would_lose_data) \ 215 x(EINVAL, no_resize_with_buckets_nouse) \ 216 x(EINVAL, inode_unpack_error) \ 217 x(EINVAL, varint_decode_error) \ 218 x(EINVAL, erasure_coding_found_btree_node) \ 219 x(EINVAL, option_negative) \ ··· 362 BCH_ERR_MAX 363 }; 364 365 - const char *bch2_err_str(int); 366 - bool __bch2_err_matches(int, int); 367 368 static inline bool _bch2_err_matches(int err, int class) 369 { 370 return err < 0 && __bch2_err_matches(err, class);
··· 182 x(BCH_ERR_fsck, fsck_errors_not_fixed) \ 183 x(BCH_ERR_fsck, fsck_repair_unimplemented) \ 184 x(BCH_ERR_fsck, fsck_repair_impossible) \ 185 + x(EINVAL, recovery_will_run) \ 186 + x(BCH_ERR_recovery_will_run, restart_recovery) \ 187 + x(BCH_ERR_recovery_will_run, cannot_rewind_recovery) \ 188 + x(BCH_ERR_recovery_will_run, recovery_pass_will_run) \ 189 x(0, data_update_done) \ 190 + x(0, bkey_was_deleted) \ 191 x(BCH_ERR_data_update_done, data_update_done_would_block) \ 192 x(BCH_ERR_data_update_done, data_update_done_unwritten) \ 193 x(BCH_ERR_data_update_done, data_update_done_no_writes_needed) \ ··· 211 x(EINVAL, remove_would_lose_data) \ 212 x(EINVAL, no_resize_with_buckets_nouse) \ 213 x(EINVAL, inode_unpack_error) \ 214 + x(EINVAL, inode_not_unlinked) \ 215 + x(EINVAL, inode_has_child_snapshot) \ 216 x(EINVAL, varint_decode_error) \ 217 x(EINVAL, erasure_coding_found_btree_node) \ 218 x(EINVAL, option_negative) \ ··· 357 BCH_ERR_MAX 358 }; 359 360 + __attribute__((const)) const char *bch2_err_str(int); 361 362 + __attribute__((const)) bool __bch2_err_matches(int, int); 363 + 364 + __attribute__((const)) 365 static inline bool _bch2_err_matches(int err, int class) 366 { 367 return err < 0 && __bch2_err_matches(err, class);
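The new recovery_will_run class groups restart_recovery, cannot_rewind_recovery and the added recovery_pass_will_run under one parent, so callers can match on the class instead of individual codes. The table itself is an X-macro list: each x(class, err) entry is expanded once to build the enum here and again in errcode.c to build bch2_errcode_parents[], so the two can never drift apart. A minimal standalone illustration of the pattern, with invented names that are not part of bcachefs:

#include <errno.h>

#define MY_ERRCODES()				\
	x(EIO,			dev_offline)	\
	x(MY_ERR_dev_offline,	dev_offline_readonly)

enum my_errcode {
	MY_ERR_START = 2048,
#define x(class, err) MY_ERR_##err,
	MY_ERRCODES()
#undef x
	MY_ERR_MAX
};

/* same list, expanded a second time: maps each code to its parent class */
static const unsigned my_errcode_parents[] = {
#define x(class, err) [MY_ERR_##err - MY_ERR_START] = class,
	MY_ERRCODES()
#undef x
};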
+50 -43
fs/bcachefs/error.c
··· 100 set_bit(BCH_FS_topology_error, &c->flags); 101 if (!test_bit(BCH_FS_in_recovery, &c->flags)) { 102 __bch2_inconsistent_error(c, out); 103 - return -BCH_ERR_btree_need_topology_repair; 104 } else { 105 return bch2_run_explicit_recovery_pass(c, out, BCH_RECOVERY_PASS_check_topology, 0) ?: 106 - -BCH_ERR_btree_node_read_validate_error; 107 } 108 } 109 ··· 403 404 if (test_bit(BCH_FS_in_fsck, &c->flags)) { 405 if (!(flags & (FSCK_CAN_FIX|FSCK_CAN_IGNORE))) 406 - return -BCH_ERR_fsck_repair_unimplemented; 407 408 switch (c->opts.fix_errors) { 409 case FSCK_FIX_exit: 410 - return -BCH_ERR_fsck_errors_not_fixed; 411 case FSCK_FIX_yes: 412 if (flags & FSCK_CAN_FIX) 413 - return -BCH_ERR_fsck_fix; 414 fallthrough; 415 case FSCK_FIX_no: 416 if (flags & FSCK_CAN_IGNORE) 417 - return -BCH_ERR_fsck_ignore; 418 - return -BCH_ERR_fsck_errors_not_fixed; 419 case FSCK_FIX_ask: 420 if (flags & FSCK_AUTOFIX) 421 - return -BCH_ERR_fsck_fix; 422 - return -BCH_ERR_fsck_ask; 423 default: 424 BUG(); 425 } ··· 427 if ((flags & FSCK_AUTOFIX) && 428 (c->opts.errors == BCH_ON_ERROR_continue || 429 c->opts.errors == BCH_ON_ERROR_fix_safe)) 430 - return -BCH_ERR_fsck_fix; 431 432 if (c->opts.errors == BCH_ON_ERROR_continue && 433 (flags & FSCK_CAN_IGNORE)) 434 - return -BCH_ERR_fsck_ignore; 435 - return -BCH_ERR_fsck_errors_not_fixed; 436 } 437 } 438 ··· 444 { 445 va_list args; 446 struct printbuf buf = PRINTBUF, *out = &buf; 447 - int ret = -BCH_ERR_fsck_ignore; 448 const char *action_orig = "fix?", *action = action_orig; 449 450 might_sleep(); ··· 474 475 if (test_bit(err, c->sb.errors_silent)) 476 return flags & FSCK_CAN_FIX 477 - ? -BCH_ERR_fsck_fix 478 - : -BCH_ERR_fsck_ignore; 479 480 printbuf_indent_add_nextline(out, 2); 481 ··· 517 prt_str(out, ", "); 518 if (flags & FSCK_CAN_FIX) { 519 prt_actioning(out, action); 520 - ret = -BCH_ERR_fsck_fix; 521 } else { 522 prt_str(out, ", continuing"); 523 - ret = -BCH_ERR_fsck_ignore; 524 } 525 526 goto print; ··· 532 "run fsck, and forward to devs so error can be marked for self-healing"); 533 inconsistent = true; 534 print = true; 535 - ret = -BCH_ERR_fsck_errors_not_fixed; 536 } else if (flags & FSCK_CAN_FIX) { 537 prt_str(out, ", "); 538 prt_actioning(out, action); 539 - ret = -BCH_ERR_fsck_fix; 540 } else { 541 prt_str(out, ", continuing"); 542 - ret = -BCH_ERR_fsck_ignore; 543 } 544 } else if (c->opts.fix_errors == FSCK_FIX_exit) { 545 prt_str(out, ", exiting"); 546 - ret = -BCH_ERR_fsck_errors_not_fixed; 547 } else if (flags & FSCK_CAN_FIX) { 548 int fix = s && s->fix 549 ? s->fix ··· 562 : FSCK_FIX_yes; 563 564 ret = ret & 1 565 - ? 
-BCH_ERR_fsck_fix 566 - : -BCH_ERR_fsck_ignore; 567 } else if (fix == FSCK_FIX_yes || 568 (c->opts.nochanges && 569 !(flags & FSCK_CAN_IGNORE))) { 570 prt_str(out, ", "); 571 prt_actioning(out, action); 572 - ret = -BCH_ERR_fsck_fix; 573 } else { 574 prt_str(out, ", not "); 575 prt_actioning(out, action); 576 } 577 - } else if (!(flags & FSCK_CAN_IGNORE)) { 578 - prt_str(out, " (repair unimplemented)"); 579 } 580 581 - if (ret == -BCH_ERR_fsck_ignore && 582 (c->opts.fix_errors == FSCK_FIX_exit || 583 !(flags & FSCK_CAN_IGNORE))) 584 - ret = -BCH_ERR_fsck_errors_not_fixed; 585 586 if (test_bit(BCH_FS_in_fsck, &c->flags) && 587 - (ret != -BCH_ERR_fsck_fix && 588 - ret != -BCH_ERR_fsck_ignore)) { 589 exiting = true; 590 print = true; 591 } ··· 620 621 if (s) 622 s->ret = ret; 623 - 624 /* 625 * We don't yet track whether the filesystem currently has errors, for 626 * log_fsck_err()s: that would require us to track for every error type 627 * which recovery pass corrects it, to get the fsck exit status correct: 628 */ 629 - if (flags & FSCK_CAN_FIX) { 630 - if (ret == -BCH_ERR_fsck_fix) { 631 - set_bit(BCH_FS_errors_fixed, &c->flags); 632 - } else { 633 - set_bit(BCH_FS_errors_not_fixed, &c->flags); 634 - set_bit(BCH_FS_error, &c->flags); 635 - } 636 } 637 - err_unlock: 638 - mutex_unlock(&c->fsck_error_msgs_lock); 639 - err: 640 if (action != action_orig) 641 kfree(action); 642 printbuf_exit(&buf); 643 return ret; 644 } 645 ··· 657 const char *fmt, ...) 658 { 659 if (from.flags & BCH_VALIDATE_silent) 660 - return -BCH_ERR_fsck_delete_bkey; 661 662 unsigned fsck_flags = 0; 663 if (!(from.flags & (BCH_VALIDATE_write|BCH_VALIDATE_commit))) { 664 if (test_bit(err, c->sb.errors_silent)) 665 - return -BCH_ERR_fsck_delete_bkey; 666 667 fsck_flags |= FSCK_AUTOFIX|FSCK_CAN_FIX; 668 }
··· 100 set_bit(BCH_FS_topology_error, &c->flags); 101 if (!test_bit(BCH_FS_in_recovery, &c->flags)) { 102 __bch2_inconsistent_error(c, out); 103 + return bch_err_throw(c, btree_need_topology_repair); 104 } else { 105 return bch2_run_explicit_recovery_pass(c, out, BCH_RECOVERY_PASS_check_topology, 0) ?: 106 + bch_err_throw(c, btree_node_read_validate_error); 107 } 108 } 109 ··· 403 404 if (test_bit(BCH_FS_in_fsck, &c->flags)) { 405 if (!(flags & (FSCK_CAN_FIX|FSCK_CAN_IGNORE))) 406 + return bch_err_throw(c, fsck_repair_unimplemented); 407 408 switch (c->opts.fix_errors) { 409 case FSCK_FIX_exit: 410 + return bch_err_throw(c, fsck_errors_not_fixed); 411 case FSCK_FIX_yes: 412 if (flags & FSCK_CAN_FIX) 413 + return bch_err_throw(c, fsck_fix); 414 fallthrough; 415 case FSCK_FIX_no: 416 if (flags & FSCK_CAN_IGNORE) 417 + return bch_err_throw(c, fsck_ignore); 418 + return bch_err_throw(c, fsck_errors_not_fixed); 419 case FSCK_FIX_ask: 420 if (flags & FSCK_AUTOFIX) 421 + return bch_err_throw(c, fsck_fix); 422 + return bch_err_throw(c, fsck_ask); 423 default: 424 BUG(); 425 } ··· 427 if ((flags & FSCK_AUTOFIX) && 428 (c->opts.errors == BCH_ON_ERROR_continue || 429 c->opts.errors == BCH_ON_ERROR_fix_safe)) 430 + return bch_err_throw(c, fsck_fix); 431 432 if (c->opts.errors == BCH_ON_ERROR_continue && 433 (flags & FSCK_CAN_IGNORE)) 434 + return bch_err_throw(c, fsck_ignore); 435 + return bch_err_throw(c, fsck_errors_not_fixed); 436 } 437 } 438 ··· 444 { 445 va_list args; 446 struct printbuf buf = PRINTBUF, *out = &buf; 447 + int ret = 0; 448 const char *action_orig = "fix?", *action = action_orig; 449 450 might_sleep(); ··· 474 475 if (test_bit(err, c->sb.errors_silent)) 476 return flags & FSCK_CAN_FIX 477 + ? bch_err_throw(c, fsck_fix) 478 + : bch_err_throw(c, fsck_ignore); 479 480 printbuf_indent_add_nextline(out, 2); 481 ··· 517 prt_str(out, ", "); 518 if (flags & FSCK_CAN_FIX) { 519 prt_actioning(out, action); 520 + ret = bch_err_throw(c, fsck_fix); 521 } else { 522 prt_str(out, ", continuing"); 523 + ret = bch_err_throw(c, fsck_ignore); 524 } 525 526 goto print; ··· 532 "run fsck, and forward to devs so error can be marked for self-healing"); 533 inconsistent = true; 534 print = true; 535 + ret = bch_err_throw(c, fsck_errors_not_fixed); 536 } else if (flags & FSCK_CAN_FIX) { 537 prt_str(out, ", "); 538 prt_actioning(out, action); 539 + ret = bch_err_throw(c, fsck_fix); 540 } else { 541 prt_str(out, ", continuing"); 542 + ret = bch_err_throw(c, fsck_ignore); 543 } 544 } else if (c->opts.fix_errors == FSCK_FIX_exit) { 545 prt_str(out, ", exiting"); 546 + ret = bch_err_throw(c, fsck_errors_not_fixed); 547 } else if (flags & FSCK_CAN_FIX) { 548 int fix = s && s->fix 549 ? s->fix ··· 562 : FSCK_FIX_yes; 563 564 ret = ret & 1 565 + ? 
bch_err_throw(c, fsck_fix) 566 + : bch_err_throw(c, fsck_ignore); 567 } else if (fix == FSCK_FIX_yes || 568 (c->opts.nochanges && 569 !(flags & FSCK_CAN_IGNORE))) { 570 prt_str(out, ", "); 571 prt_actioning(out, action); 572 + ret = bch_err_throw(c, fsck_fix); 573 } else { 574 prt_str(out, ", not "); 575 prt_actioning(out, action); 576 + ret = bch_err_throw(c, fsck_ignore); 577 } 578 + } else { 579 + if (flags & FSCK_CAN_IGNORE) { 580 + prt_str(out, ", continuing"); 581 + ret = bch_err_throw(c, fsck_ignore); 582 + } else { 583 + prt_str(out, " (repair unimplemented)"); 584 + ret = bch_err_throw(c, fsck_repair_unimplemented); 585 + } 586 } 587 588 + if (bch2_err_matches(ret, BCH_ERR_fsck_ignore) && 589 (c->opts.fix_errors == FSCK_FIX_exit || 590 !(flags & FSCK_CAN_IGNORE))) 591 + ret = bch_err_throw(c, fsck_errors_not_fixed); 592 593 if (test_bit(BCH_FS_in_fsck, &c->flags) && 594 + (!bch2_err_matches(ret, BCH_ERR_fsck_fix) && 595 + !bch2_err_matches(ret, BCH_ERR_fsck_ignore))) { 596 exiting = true; 597 print = true; 598 } ··· 613 614 if (s) 615 s->ret = ret; 616 + err_unlock: 617 + mutex_unlock(&c->fsck_error_msgs_lock); 618 + err: 619 /* 620 * We don't yet track whether the filesystem currently has errors, for 621 * log_fsck_err()s: that would require us to track for every error type 622 * which recovery pass corrects it, to get the fsck exit status correct: 623 */ 624 + if (bch2_err_matches(ret, BCH_ERR_fsck_fix)) { 625 + set_bit(BCH_FS_errors_fixed, &c->flags); 626 + } else { 627 + set_bit(BCH_FS_errors_not_fixed, &c->flags); 628 + set_bit(BCH_FS_error, &c->flags); 629 } 630 + 631 if (action != action_orig) 632 kfree(action); 633 printbuf_exit(&buf); 634 + 635 + BUG_ON(!ret); 636 return ret; 637 } 638 ··· 650 const char *fmt, ...) 651 { 652 if (from.flags & BCH_VALIDATE_silent) 653 + return bch_err_throw(c, fsck_delete_bkey); 654 655 unsigned fsck_flags = 0; 656 if (!(from.flags & (BCH_VALIDATE_write|BCH_VALIDATE_commit))) { 657 if (test_bit(err, c->sb.errors_silent)) 658 + return bch_err_throw(c, fsck_delete_bkey); 659 660 fsck_flags |= FSCK_AUTOFIX|FSCK_CAN_FIX; 661 }
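Throughout this file, bare -BCH_ERR_* returns become bch_err_throw(c, ...), which also takes the bch_fs pointer, presumably so the error can be counted or traced where it is first generated. The macro's definition is not part of this hunk; judging by its use (it takes the unsuffixed code name and yields the negative error), it behaves roughly like the sketch below, in which the per-filesystem hook name is an assumption:

/* sketch only -- the real macro lives in errcode.h and is not shown here */
#define bch_err_throw(_c, _err) ({					\
	int _ret = -BCH_ERR_##_err;					\
	bch2_account_error_throw(_c, _ret);	/* assumed hook name */	\
	_ret;								\
})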
+6 -6
fs/bcachefs/error.h
··· 105 #define fsck_err_wrap(_do) \ 106 ({ \ 107 int _ret = _do; \ 108 - if (_ret != -BCH_ERR_fsck_fix && \ 109 - _ret != -BCH_ERR_fsck_ignore) { \ 110 ret = _ret; \ 111 goto fsck_err; \ 112 } \ 113 \ 114 - _ret == -BCH_ERR_fsck_fix; \ 115 }) 116 117 #define __fsck_err(...) fsck_err_wrap(bch2_fsck_err(__VA_ARGS__)) ··· 170 int _ret = __bch2_bkey_fsck_err(c, k, from, \ 171 BCH_FSCK_ERR_##_err_type, \ 172 _err_msg, ##__VA_ARGS__); \ 173 - if (_ret != -BCH_ERR_fsck_fix && \ 174 - _ret != -BCH_ERR_fsck_ignore) \ 175 ret = _ret; \ 176 - ret = -BCH_ERR_fsck_delete_bkey; \ 177 goto fsck_err; \ 178 } while (0) 179
··· 105 #define fsck_err_wrap(_do) \ 106 ({ \ 107 int _ret = _do; \ 108 + if (!bch2_err_matches(_ret, BCH_ERR_fsck_fix) && \ 109 + !bch2_err_matches(_ret, BCH_ERR_fsck_ignore)) { \ 110 ret = _ret; \ 111 goto fsck_err; \ 112 } \ 113 \ 114 + bch2_err_matches(_ret, BCH_ERR_fsck_fix); \ 115 }) 116 117 #define __fsck_err(...) fsck_err_wrap(bch2_fsck_err(__VA_ARGS__)) ··· 170 int _ret = __bch2_bkey_fsck_err(c, k, from, \ 171 BCH_FSCK_ERR_##_err_type, \ 172 _err_msg, ##__VA_ARGS__); \ 173 + if (!bch2_err_matches(_ret, BCH_ERR_fsck_fix) && \ 174 + !bch2_err_matches(_ret, BCH_ERR_fsck_ignore)) \ 175 ret = _ret; \ 176 + ret = bch_err_throw(c, fsck_delete_bkey); \ 177 goto fsck_err; \ 178 } while (0) 179
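fsck_err_wrap() now classifies its result with bch2_err_matches() instead of comparing against exact codes, so matching is by error class rather than by identity. The ({ ... }) wrapper is a GCC statement expression: the block's value is its final expression, which is how the macro can update ret and jump to fsck_err on failure yet still hand back a plain boolean. A standalone illustration of the construct, with nothing bcachefs-specific in it:

/* the value of the whole ({ ... }) is its last expression, _v */
#define clamp_u8(x) ({				\
	int _v = (x);				\
	if (_v < 0)				\
		_v = 0;				\
	if (_v > 255)				\
		_v = 255;			\
	_v;					\
})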
+25 -38
fs/bcachefs/extents.c
··· 65 continue; 66 67 bch2_printbuf_make_room(out, 1024); 68 - rcu_read_lock(); 69 out->atomic++; 70 - struct bch_dev *ca = bch2_dev_rcu_noerror(c, f->dev); 71 - if (ca) 72 - prt_str(out, ca->name); 73 - else 74 - prt_printf(out, "(invalid device %u)", f->dev); 75 --out->atomic; 76 - rcu_read_unlock(); 77 78 prt_char(out, ' '); 79 ··· 193 bool have_dirty_ptrs = false, have_pick = false; 194 195 if (k.k->type == KEY_TYPE_error) 196 - return -BCH_ERR_key_type_error; 197 198 rcu_read_lock(); 199 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); ··· 286 if (!have_dirty_ptrs) 287 return 0; 288 if (have_missing_devs) 289 - return -BCH_ERR_no_device_to_read_from; 290 if (have_csum_errors) 291 - return -BCH_ERR_data_read_csum_err; 292 if (have_io_errors) 293 - return -BCH_ERR_data_read_io_err; 294 295 /* 296 * If we get here, we have pointers (bkey_ptrs_validate() ensures that), 297 * but they don't point to valid devices: 298 */ 299 - return -BCH_ERR_no_devices_valid; 300 } 301 302 /* KEY_TYPE_btree_ptr: */ ··· 407 lp.crc = bch2_extent_crc_unpack(l.k, NULL); 408 rp.crc = bch2_extent_crc_unpack(r.k, NULL); 409 410 while (__bkey_ptr_next_decode(l.k, l_ptrs.end, lp, en_l) && 411 __bkey_ptr_next_decode(r.k, r_ptrs.end, rp, en_r)) { 412 if (lp.ptr.offset + lp.crc.offset + lp.crc.live_size != ··· 420 return false; 421 422 /* Extents may not straddle buckets: */ 423 - rcu_read_lock(); 424 struct bch_dev *ca = bch2_dev_rcu(c, lp.ptr.dev); 425 bool same_bucket = ca && PTR_BUCKET_NR(ca, &lp.ptr) == PTR_BUCKET_NR(ca, &rp.ptr); 426 - rcu_read_unlock(); 427 428 if (!same_bucket) 429 return false; ··· 838 struct extent_ptr_decoded p; 839 unsigned durability = 0; 840 841 - rcu_read_lock(); 842 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 843 durability += bch2_extent_ptr_durability(c, &p); 844 - rcu_read_unlock(); 845 - 846 return durability; 847 } 848 ··· 851 struct extent_ptr_decoded p; 852 unsigned durability = 0; 853 854 - rcu_read_lock(); 855 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 856 if (p.ptr.dev < c->sb.nr_devices && c->devs[p.ptr.dev]) 857 durability += bch2_extent_ptr_durability(c, &p); 858 - rcu_read_unlock(); 859 - 860 return durability; 861 } 862 ··· 1011 { 1012 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1013 struct bch_dev *ca; 1014 - bool ret = false; 1015 1016 - rcu_read_lock(); 1017 bkey_for_each_ptr(ptrs, ptr) 1018 if (bch2_dev_in_target(c, ptr->dev, target) && 1019 (ca = bch2_dev_rcu(c, ptr->dev)) && 1020 (!ptr->cached || 1021 - !dev_ptr_stale_rcu(ca, ptr))) { 1022 - ret = true; 1023 - break; 1024 - } 1025 - rcu_read_unlock(); 1026 1027 - return ret; 1028 } 1029 1030 bool bch2_bkey_matches_ptr(struct bch_fs *c, struct bkey_s_c k, ··· 1134 bool have_cached_ptr; 1135 unsigned drop_dev = ptr->dev; 1136 1137 - rcu_read_lock(); 1138 restart_drop_ptrs: 1139 ptrs = bch2_bkey_ptrs(k); 1140 have_cached_ptr = false; ··· 1167 goto drop; 1168 1169 ptr->cached = true; 1170 - rcu_read_unlock(); 1171 return; 1172 drop: 1173 - rcu_read_unlock(); 1174 bch2_bkey_drop_ptr_noerror(k, ptr); 1175 } 1176 ··· 1184 { 1185 struct bch_dev *ca; 1186 1187 - rcu_read_lock(); 1188 bch2_bkey_drop_ptrs(k, ptr, 1189 ptr->cached && 1190 (!(ca = bch2_dev_rcu(c, ptr->dev)) || 1191 dev_ptr_stale_rcu(ca, ptr) > 0)); 1192 - rcu_read_unlock(); 1193 1194 return bkey_deleted(k.k); 1195 } ··· 1206 struct bkey_ptrs ptrs; 1207 bool have_cached_ptr; 1208 1209 - rcu_read_lock(); 1210 restart_drop_ptrs: 1211 ptrs = bch2_bkey_ptrs(k); 1212 have_cached_ptr = false; ··· 1219 } 1220 have_cached_ptr = true; 1221 } 1222 - 
rcu_read_unlock(); 1223 1224 return bkey_deleted(k.k); 1225 } ··· 1226 void bch2_extent_ptr_to_text(struct printbuf *out, struct bch_fs *c, const struct bch_extent_ptr *ptr) 1227 { 1228 out->atomic++; 1229 - rcu_read_lock(); 1230 struct bch_dev *ca = bch2_dev_rcu_noerror(c, ptr->dev); 1231 if (!ca) { 1232 prt_printf(out, "ptr: %u:%llu gen %u%s", ptr->dev, ··· 1250 else if (stale) 1251 prt_printf(out, " invalid"); 1252 } 1253 - rcu_read_unlock(); 1254 --out->atomic; 1255 } 1256 ··· 1515 struct bch_compression_opt opt = __bch2_compression_decode(r->compression); 1516 prt_printf(err, "invalid compression opt %u:%u", 1517 opt.type, opt.level); 1518 - return -BCH_ERR_invalid_bkey; 1519 } 1520 #endif 1521 break;
··· 65 continue; 66 67 bch2_printbuf_make_room(out, 1024); 68 out->atomic++; 69 + scoped_guard(rcu) { 70 + struct bch_dev *ca = bch2_dev_rcu_noerror(c, f->dev); 71 + if (ca) 72 + prt_str(out, ca->name); 73 + else 74 + prt_printf(out, "(invalid device %u)", f->dev); 75 + } 76 --out->atomic; 77 78 prt_char(out, ' '); 79 ··· 193 bool have_dirty_ptrs = false, have_pick = false; 194 195 if (k.k->type == KEY_TYPE_error) 196 + return bch_err_throw(c, key_type_error); 197 198 rcu_read_lock(); 199 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); ··· 286 if (!have_dirty_ptrs) 287 return 0; 288 if (have_missing_devs) 289 + return bch_err_throw(c, no_device_to_read_from); 290 if (have_csum_errors) 291 + return bch_err_throw(c, data_read_csum_err); 292 if (have_io_errors) 293 + return bch_err_throw(c, data_read_io_err); 294 295 /* 296 * If we get here, we have pointers (bkey_ptrs_validate() ensures that), 297 * but they don't point to valid devices: 298 */ 299 + return bch_err_throw(c, no_devices_valid); 300 } 301 302 /* KEY_TYPE_btree_ptr: */ ··· 407 lp.crc = bch2_extent_crc_unpack(l.k, NULL); 408 rp.crc = bch2_extent_crc_unpack(r.k, NULL); 409 410 + guard(rcu)(); 411 + 412 while (__bkey_ptr_next_decode(l.k, l_ptrs.end, lp, en_l) && 413 __bkey_ptr_next_decode(r.k, r_ptrs.end, rp, en_r)) { 414 if (lp.ptr.offset + lp.crc.offset + lp.crc.live_size != ··· 418 return false; 419 420 /* Extents may not straddle buckets: */ 421 struct bch_dev *ca = bch2_dev_rcu(c, lp.ptr.dev); 422 bool same_bucket = ca && PTR_BUCKET_NR(ca, &lp.ptr) == PTR_BUCKET_NR(ca, &rp.ptr); 423 424 if (!same_bucket) 425 return false; ··· 838 struct extent_ptr_decoded p; 839 unsigned durability = 0; 840 841 + guard(rcu)(); 842 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 843 durability += bch2_extent_ptr_durability(c, &p); 844 return durability; 845 } 846 ··· 853 struct extent_ptr_decoded p; 854 unsigned durability = 0; 855 856 + guard(rcu)(); 857 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 858 if (p.ptr.dev < c->sb.nr_devices && c->devs[p.ptr.dev]) 859 durability += bch2_extent_ptr_durability(c, &p); 860 return durability; 861 } 862 ··· 1015 { 1016 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1017 struct bch_dev *ca; 1018 1019 + guard(rcu)(); 1020 bkey_for_each_ptr(ptrs, ptr) 1021 if (bch2_dev_in_target(c, ptr->dev, target) && 1022 (ca = bch2_dev_rcu(c, ptr->dev)) && 1023 (!ptr->cached || 1024 + !dev_ptr_stale_rcu(ca, ptr))) 1025 + return true; 1026 1027 + return false; 1028 } 1029 1030 bool bch2_bkey_matches_ptr(struct bch_fs *c, struct bkey_s_c k, ··· 1142 bool have_cached_ptr; 1143 unsigned drop_dev = ptr->dev; 1144 1145 + guard(rcu)(); 1146 restart_drop_ptrs: 1147 ptrs = bch2_bkey_ptrs(k); 1148 have_cached_ptr = false; ··· 1175 goto drop; 1176 1177 ptr->cached = true; 1178 return; 1179 drop: 1180 bch2_bkey_drop_ptr_noerror(k, ptr); 1181 } 1182 ··· 1194 { 1195 struct bch_dev *ca; 1196 1197 + guard(rcu)(); 1198 bch2_bkey_drop_ptrs(k, ptr, 1199 ptr->cached && 1200 (!(ca = bch2_dev_rcu(c, ptr->dev)) || 1201 dev_ptr_stale_rcu(ca, ptr) > 0)); 1202 1203 return bkey_deleted(k.k); 1204 } ··· 1217 struct bkey_ptrs ptrs; 1218 bool have_cached_ptr; 1219 1220 + guard(rcu)(); 1221 restart_drop_ptrs: 1222 ptrs = bch2_bkey_ptrs(k); 1223 have_cached_ptr = false; ··· 1230 } 1231 have_cached_ptr = true; 1232 } 1233 1234 return bkey_deleted(k.k); 1235 } ··· 1238 void bch2_extent_ptr_to_text(struct printbuf *out, struct bch_fs *c, const struct bch_extent_ptr *ptr) 1239 { 1240 out->atomic++; 1241 + guard(rcu)(); 1242 struct bch_dev *ca = 
bch2_dev_rcu_noerror(c, ptr->dev); 1243 if (!ca) { 1244 prt_printf(out, "ptr: %u:%llu gen %u%s", ptr->dev, ··· 1262 else if (stale) 1263 prt_printf(out, " invalid"); 1264 } 1265 --out->atomic; 1266 } 1267 ··· 1528 struct bch_compression_opt opt = __bch2_compression_decode(r->compression); 1529 prt_printf(err, "invalid compression opt %u:%u", 1530 opt.type, opt.level); 1531 + return bch_err_throw(c, invalid_bkey); 1532 } 1533 #endif 1534 break;
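The conversions in this file replace open-coded rcu_read_lock()/rcu_read_unlock() pairs with the scope-based guards from <linux/cleanup.h>: guard(rcu)() holds the RCU read lock until the enclosing scope ends, so functions that used to collect a result and break out of a loop can simply return under the guard, while scoped_guard(rcu) { ... } confines the critical section to the braced block. A minimal illustration of the shape of the conversion, using an invented type and function name:

#include <linux/rcupdate.h>

struct my_dev { bool online; };		/* invented, for illustration only */

static bool dev_is_online(struct my_dev __rcu **devs, unsigned i)
{
	guard(rcu)();	/* rcu_read_unlock() runs automatically on return */

	struct my_dev *d = rcu_dereference(devs[i]);
	return d && d->online;
}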
+11 -19
fs/bcachefs/fs-io-buffered.c
··· 394 struct bch_io_opts opts; 395 struct bch_folio_sector *tmp; 396 unsigned tmp_sectors; 397 }; 398 - 399 - static inline struct bch_writepage_state bch_writepage_state_init(struct bch_fs *c, 400 - struct bch_inode_info *inode) 401 - { 402 - struct bch_writepage_state ret = { 0 }; 403 - 404 - bch2_inode_opts_get(&ret.opts, c, &inode->ei_inode); 405 - return ret; 406 - } 407 408 /* 409 * Determine when a writepage io is full. We have to limit writepage bios to a ··· 658 int bch2_writepages(struct address_space *mapping, struct writeback_control *wbc) 659 { 660 struct bch_fs *c = mapping->host->i_sb->s_fs_info; 661 - struct bch_writepage_state w = 662 - bch_writepage_state_init(c, to_bch_ei(mapping->host)); 663 - struct blk_plug plug; 664 - int ret; 665 666 - blk_start_plug(&plug); 667 - ret = write_cache_pages(mapping, wbc, __bch2_writepage, &w); 668 - if (w.io) 669 - bch2_writepage_do_io(&w); 670 - blk_finish_plug(&plug); 671 - kfree(w.tmp); 672 return bch2_err_class(ret); 673 } 674
··· 394 struct bch_io_opts opts; 395 struct bch_folio_sector *tmp; 396 unsigned tmp_sectors; 397 + struct blk_plug plug; 398 }; 399 400 /* 401 * Determine when a writepage io is full. We have to limit writepage bios to a ··· 666 int bch2_writepages(struct address_space *mapping, struct writeback_control *wbc) 667 { 668 struct bch_fs *c = mapping->host->i_sb->s_fs_info; 669 + struct bch_writepage_state *w = kzalloc(sizeof(*w), GFP_NOFS|__GFP_NOFAIL); 670 671 + bch2_inode_opts_get(&w->opts, c, &to_bch_ei(mapping->host)->ei_inode); 672 + 673 + blk_start_plug(&w->plug); 674 + int ret = write_cache_pages(mapping, wbc, __bch2_writepage, w); 675 + if (w->io) 676 + bch2_writepage_do_io(w); 677 + blk_finish_plug(&w->plug); 678 + kfree(w->tmp); 679 + kfree(w); 680 return bch2_err_class(ret); 681 } 682
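bch2_writepages() now allocates its writeback state (with the blk_plug embedded in it) rather than keeping it on the stack, and the GFP_NOFS|__GFP_NOFAIL allocation keeps the path non-failing so no new error handling is needed. The general shape of that conversion, reduced to a sketch with invented names:

#include <linux/writeback.h>
#include <linux/pagemap.h>
#include <linux/blkdev.h>
#include <linux/slab.h>

struct big_wb_state {
	/* ...large per-writeback scratch state... */
	struct blk_plug	plug;	/* the plug lives alongside the state it covers */
};

/* hypothetical per-folio callback, only to make the sketch complete */
static int my_writepage(struct folio *folio, struct writeback_control *wbc,
			void *data)
{
	folio_unlock(folio);
	return 0;
}

static int my_writepages(struct address_space *mapping,
			 struct writeback_control *wbc)
{
	/* __GFP_NOFAIL: this path has no good way to report -ENOMEM */
	struct big_wb_state *w = kzalloc(sizeof(*w), GFP_NOFS|__GFP_NOFAIL);

	blk_start_plug(&w->plug);
	int ret = write_cache_pages(mapping, wbc, my_writepage, w);
	blk_finish_plug(&w->plug);
	kfree(w);
	return ret;
}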
+1 -1
fs/bcachefs/fs-io-pagecache.c
··· 447 448 if (!reserved) { 449 bch2_disk_reservation_put(c, &disk_res); 450 - return -BCH_ERR_ENOSPC_disk_reservation; 451 } 452 break; 453 }
··· 447 448 if (!reserved) { 449 bch2_disk_reservation_put(c, &disk_res); 450 + return bch_err_throw(c, ENOSPC_disk_reservation); 451 } 452 break; 453 }
+6 -6
fs/bcachefs/fs-io.c
··· 71 memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush)); 72 73 for_each_set_bit(dev, devs.d, BCH_SB_MEMBERS_MAX) { 74 - rcu_read_lock(); 75 - ca = rcu_dereference(c->devs[dev]); 76 - if (ca && !enumerated_ref_tryget(&ca->io_ref[WRITE], 77 - BCH_DEV_WRITE_REF_nocow_flush)) 78 - ca = NULL; 79 - rcu_read_unlock(); 80 81 if (!ca) 82 continue;
··· 71 memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush)); 72 73 for_each_set_bit(dev, devs.d, BCH_SB_MEMBERS_MAX) { 74 + scoped_guard(rcu) { 75 + ca = rcu_dereference(c->devs[dev]); 76 + if (ca && !enumerated_ref_tryget(&ca->io_ref[WRITE], 77 + BCH_DEV_WRITE_REF_nocow_flush)) 78 + ca = NULL; 79 + } 80 81 if (!ca) 82 continue;
+2 -2
fs/bcachefs/fs-ioctl.c
··· 268 } 269 270 if (dst_dentry->d_inode) { 271 - error = -BCH_ERR_EEXIST_subvolume_create; 272 goto err3; 273 } 274 275 dir = dst_path.dentry->d_inode; 276 if (IS_DEADDIR(dir)) { 277 - error = -BCH_ERR_ENOENT_directory_dead; 278 goto err3; 279 } 280
··· 268 } 269 270 if (dst_dentry->d_inode) { 271 + error = bch_err_throw(c, EEXIST_subvolume_create); 272 goto err3; 273 } 274 275 dir = dst_path.dentry->d_inode; 276 if (IS_DEADDIR(dir)) { 277 + error = bch_err_throw(c, ENOENT_directory_dead); 278 goto err3; 279 } 280
+25 -15
fs/bcachefs/fs.c
··· 124 goto err; 125 126 struct bch_extent_rebalance new_r = bch2_inode_rebalance_opts_get(c, &inode_u); 127 128 - if (memcmp(&old_r, &new_r, sizeof(new_r))) { 129 ret = bch2_set_rebalance_needs_scan_trans(trans, inode_u.bi_inum); 130 if (ret) 131 goto err; ··· 146 147 if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) 148 goto retry; 149 150 bch2_fs_fatal_err_on(bch2_err_matches(ret, ENOENT), c, 151 "%s: inode %llu:%llu not found when updating", ··· 1573 { 1574 struct bch_inode_info *inode = file_bch_inode(file); 1575 struct bch_fs *c = inode->v.i_sb->s_fs_info; 1576 1577 if (!dir_emit_dots(file, ctx)) 1578 return 0; 1579 1580 - int ret = bch2_readdir(c, inode_inum(inode), ctx); 1581 1582 bch_err_fn(c, ret); 1583 return bch2_err_class(ret); ··· 2007 goto err; 2008 2009 if (k.k->type != KEY_TYPE_dirent) { 2010 - ret = -BCH_ERR_ENOENT_dirent_doesnt_match_inode; 2011 goto err; 2012 } 2013 2014 d = bkey_s_c_to_dirent(k); 2015 ret = bch2_dirent_read_target(trans, inode_inum(dir), d, &target); 2016 if (ret > 0) 2017 - ret = -BCH_ERR_ENOENT_dirent_doesnt_match_inode; 2018 if (ret) 2019 goto err; 2020 ··· 2180 KEY_TYPE_QUOTA_WARN); 2181 bch2_quota_acct(c, inode->ei_qid, Q_INO, -1, 2182 KEY_TYPE_QUOTA_WARN); 2183 - bch2_inode_rm(c, inode_inum(inode)); 2184 2185 /* 2186 * If we are deleting, we need it present in the vfs hash table ··· 2333 struct bch_fs *c = root->d_sb->s_fs_info; 2334 bool first = true; 2335 2336 - rcu_read_lock(); 2337 for_each_online_member_rcu(c, ca) { 2338 if (!first) 2339 seq_putc(seq, ':'); 2340 first = false; 2341 seq_puts(seq, ca->disk_sb.sb_name); 2342 } 2343 - rcu_read_unlock(); 2344 2345 return 0; 2346 } ··· 2536 2537 sb->s_bdi->ra_pages = VM_READAHEAD_PAGES; 2538 2539 - rcu_read_lock(); 2540 - for_each_online_member_rcu(c, ca) { 2541 - struct block_device *bdev = ca->disk_sb.bdev; 2542 2543 - /* XXX: create an anonymous device for multi device filesystems */ 2544 - sb->s_bdev = bdev; 2545 - sb->s_dev = bdev->bd_dev; 2546 - break; 2547 } 2548 - rcu_read_unlock(); 2549 2550 c->dev = sb->s_dev; 2551
··· 124 goto err; 125 126 struct bch_extent_rebalance new_r = bch2_inode_rebalance_opts_get(c, &inode_u); 127 + bool rebalance_changed = memcmp(&old_r, &new_r, sizeof(new_r)); 128 129 + if (rebalance_changed) { 130 ret = bch2_set_rebalance_needs_scan_trans(trans, inode_u.bi_inum); 131 if (ret) 132 goto err; ··· 145 146 if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) 147 goto retry; 148 + 149 + if (rebalance_changed) 150 + bch2_rebalance_wakeup(c); 151 152 bch2_fs_fatal_err_on(bch2_err_matches(ret, ENOENT), c, 153 "%s: inode %llu:%llu not found when updating", ··· 1569 { 1570 struct bch_inode_info *inode = file_bch_inode(file); 1571 struct bch_fs *c = inode->v.i_sb->s_fs_info; 1572 + struct bch_hash_info hash = bch2_hash_info_init(c, &inode->ei_inode); 1573 1574 if (!dir_emit_dots(file, ctx)) 1575 return 0; 1576 1577 + int ret = bch2_readdir(c, inode_inum(inode), &hash, ctx); 1578 1579 bch_err_fn(c, ret); 1580 return bch2_err_class(ret); ··· 2002 goto err; 2003 2004 if (k.k->type != KEY_TYPE_dirent) { 2005 + ret = bch_err_throw(c, ENOENT_dirent_doesnt_match_inode); 2006 goto err; 2007 } 2008 2009 d = bkey_s_c_to_dirent(k); 2010 ret = bch2_dirent_read_target(trans, inode_inum(dir), d, &target); 2011 if (ret > 0) 2012 + ret = bch_err_throw(c, ENOENT_dirent_doesnt_match_inode); 2013 if (ret) 2014 goto err; 2015 ··· 2175 KEY_TYPE_QUOTA_WARN); 2176 bch2_quota_acct(c, inode->ei_qid, Q_INO, -1, 2177 KEY_TYPE_QUOTA_WARN); 2178 + int ret = bch2_inode_rm(c, inode_inum(inode)); 2179 + if (ret && !bch2_err_matches(ret, EROFS)) { 2180 + bch_err_msg(c, ret, "VFS incorrectly tried to delete inode %llu:%llu", 2181 + inode->ei_inum.subvol, 2182 + inode->ei_inum.inum); 2183 + bch2_sb_error_count(c, BCH_FSCK_ERR_vfs_bad_inode_rm); 2184 + } 2185 2186 /* 2187 * If we are deleting, we need it present in the vfs hash table ··· 2322 struct bch_fs *c = root->d_sb->s_fs_info; 2323 bool first = true; 2324 2325 + guard(rcu)(); 2326 for_each_online_member_rcu(c, ca) { 2327 if (!first) 2328 seq_putc(seq, ':'); 2329 first = false; 2330 seq_puts(seq, ca->disk_sb.sb_name); 2331 } 2332 2333 return 0; 2334 } ··· 2526 2527 sb->s_bdi->ra_pages = VM_READAHEAD_PAGES; 2528 2529 + scoped_guard(rcu) { 2530 + for_each_online_member_rcu(c, ca) { 2531 + struct block_device *bdev = ca->disk_sb.bdev; 2532 2533 + /* XXX: create an anonymous device for multi device filesystems */ 2534 + sb->s_bdev = bdev; 2535 + sb->s_dev = bdev->bd_dev; 2536 + break; 2537 + } 2538 } 2539 2540 c->dev = sb->s_dev; 2541
+86 -63
fs/bcachefs/fsck.c
··· 23 #include <linux/bsearch.h> 24 #include <linux/dcache.h> /* struct qstr */ 25 26 - static int dirent_points_to_inode_nowarn(struct bkey_s_c_dirent d, 27 struct bch_inode_unpacked *inode) 28 { 29 if (d.v->d_type == DT_SUBVOL 30 ? le32_to_cpu(d.v->d_child_subvol) == inode->bi_subvol 31 : le64_to_cpu(d.v->d_inum) == inode->bi_inum) 32 return 0; 33 - return -BCH_ERR_ENOENT_dirent_doesnt_match_inode; 34 } 35 36 static void dirent_inode_mismatch_msg(struct printbuf *out, ··· 50 struct bkey_s_c_dirent dirent, 51 struct bch_inode_unpacked *inode) 52 { 53 - int ret = dirent_points_to_inode_nowarn(dirent, inode); 54 if (ret) { 55 struct printbuf buf = PRINTBUF; 56 dirent_inode_mismatch_msg(&buf, c, dirent, inode); ··· 153 goto found; 154 } 155 } 156 - ret = -BCH_ERR_ENOENT_no_snapshot_tree_subvol; 157 found: 158 bch2_trans_iter_exit(trans, &iter); 159 return ret; ··· 230 231 if (d_type != DT_DIR) { 232 bch_err(c, "error looking up lost+found: not a directory"); 233 - return -BCH_ERR_ENOENT_not_directory; 234 } 235 236 /* ··· 532 533 if (!bch2_snapshot_is_leaf(c, snapshotid)) { 534 bch_err(c, "need to reconstruct subvol, but have interior node snapshot"); 535 - return -BCH_ERR_fsck_repair_unimplemented; 536 } 537 538 /* ··· 643 644 return __bch2_fsck_write_inode(trans, &new_inode); 645 } 646 - 647 - struct snapshots_seen { 648 - struct bpos pos; 649 - snapshot_id_list ids; 650 - }; 651 652 static inline void snapshots_seen_exit(struct snapshots_seen *s) 653 { ··· 886 { 887 struct bch_fs *c = trans->c; 888 889 - struct inode_walker_entry *i; 890 - __darray_for_each(w->inodes, i) 891 - if (bch2_snapshot_is_ancestor(c, k.k->p.snapshot, i->inode.bi_snapshot)) 892 - goto found; 893 894 - return NULL; 895 - found: 896 - BUG_ON(k.k->p.snapshot > i->inode.bi_snapshot); 897 898 struct printbuf buf = PRINTBUF; 899 int ret = 0; ··· 940 if (ret) 941 goto fsck_err; 942 943 - ret = -BCH_ERR_transaction_restart_nested; 944 goto fsck_err; 945 } 946 ··· 985 int ret = 0; 986 987 if (d->v.d_type == DT_SUBVOL) { 988 - BUG(); 989 } else { 990 ret = get_visible_inodes(trans, &target, s, le64_to_cpu(d->v.d_inum)); 991 if (ret) ··· 1042 if (ret && !bch2_err_matches(ret, ENOENT)) 1043 return ret; 1044 1045 - if ((ret || dirent_points_to_inode_nowarn(d, inode)) && 1046 inode->bi_subvol && 1047 (inode->bi_flags & BCH_INODE_has_child_snapshot)) { 1048 /* Older version of a renamed subvolume root: we won't have a ··· 1063 trans, inode_points_to_missing_dirent, 1064 "inode points to missing dirent\n%s", 1065 (bch2_inode_unpacked_to_text(&buf, inode), buf.buf)) || 1066 - fsck_err_on(!ret && dirent_points_to_inode_nowarn(d, inode), 1067 trans, inode_points_to_wrong_dirent, 1068 "%s", 1069 (printbuf_reset(&buf), ··· 1166 u.bi_flags &= ~BCH_INODE_unlinked; 1167 do_update = true; 1168 ret = 0; 1169 } 1170 1171 ret = bch2_inode_has_child_snapshots(trans, k.k->p); ··· 1454 goto err; 1455 1456 inode->last_pos.inode--; 1457 - ret = -BCH_ERR_transaction_restart_nested; 1458 goto err; 1459 } 1460 ··· 1571 sizeof(seen->ids.data[0]) * seen->ids.size, 1572 GFP_KERNEL); 1573 if (!n.seen.ids.data) 1574 - return -BCH_ERR_ENOMEM_fsck_extent_ends_at; 1575 1576 __darray_for_each(extent_ends->e, i) { 1577 if (i->snapshot == k.k->p.snapshot) { ··· 1621 1622 bch_err(c, "%s: error finding first overlapping extent when repairing, got%s", 1623 __func__, buf.buf); 1624 - ret = -BCH_ERR_internal_fsck_err; 1625 goto err; 1626 } 1627 ··· 1646 pos2.size != k2.k->size) { 1647 bch_err(c, "%s: error finding seconding overlapping extent when repairing%s", 
1648 __func__, buf.buf); 1649 - ret = -BCH_ERR_internal_fsck_err; 1650 goto err; 1651 } 1652 ··· 1694 * We overwrote the second extent - restart 1695 * check_extent() from the top: 1696 */ 1697 - ret = -BCH_ERR_transaction_restart_nested; 1698 } 1699 } 1700 fsck_err: ··· 2047 (bch2_bkey_val_to_text(&buf, c, d.s_c), buf.buf))) { 2048 if (!new_parent_subvol) { 2049 bch_err(c, "could not find a subvol for snapshot %u", d.k->p.snapshot); 2050 - return -BCH_ERR_fsck_repair_unimplemented; 2051 } 2052 2053 struct bkey_i_dirent *new_dirent = bch2_bkey_make_mut_typed(trans, iter, &d.s_c, 0, dirent); ··· 2109 2110 if (ret) { 2111 bch_err(c, "subvol %u points to missing inode root %llu", target_subvol, target_inum); 2112 - ret = -BCH_ERR_fsck_repair_unimplemented; 2113 goto err; 2114 } 2115 ··· 2141 struct bch_hash_info *hash_info, 2142 struct inode_walker *dir, 2143 struct inode_walker *target, 2144 - struct snapshots_seen *s) 2145 { 2146 struct bch_fs *c = trans->c; 2147 struct inode_walker_entry *i; ··· 2184 *hash_info = bch2_hash_info_init(c, &i->inode); 2185 dir->first_this_inode = false; 2186 2187 - ret = bch2_str_hash_check_key(trans, s, &bch2_dirent_hash_desc, hash_info, iter, k); 2188 if (ret < 0) 2189 goto err; 2190 if (ret) { ··· 2210 (printbuf_reset(&buf), 2211 bch2_bkey_val_to_text(&buf, c, k), 2212 buf.buf))) { 2213 - struct qstr name = bch2_dirent_get_name(d); 2214 - u32 subvol = d.v->d_type == DT_SUBVOL 2215 - ? le32_to_cpu(d.v->d_parent_subvol) 2216 - : 0; 2217 u64 target = d.v->d_type == DT_SUBVOL 2218 ? le32_to_cpu(d.v->d_child_subvol) 2219 : le64_to_cpu(d.v->d_inum); 2220 - u64 dir_offset; 2221 2222 - ret = bch2_hash_delete_at(trans, 2223 bch2_dirent_hash_desc, hash_info, iter, 2224 BTREE_UPDATE_internal_snapshot_node) ?: 2225 - bch2_dirent_create_snapshot(trans, subvol, 2226 - d.k->p.inode, d.k->p.snapshot, 2227 - hash_info, 2228 - d.v->d_type, 2229 - &name, 2230 - target, 2231 - &dir_offset, 2232 - BTREE_ITER_with_updates| 2233 - BTREE_UPDATE_internal_snapshot_node| 2234 - STR_HASH_must_create) ?: 2235 - bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 2236 - 2237 - /* might need another check_dirents pass */ 2238 goto out; 2239 } 2240 ··· 2305 err: 2306 fsck_err: 2307 printbuf_exit(&buf); 2308 - bch_err_fn(c, ret); 2309 return ret; 2310 } 2311 ··· 2318 struct inode_walker target = inode_walker_init(); 2319 struct snapshots_seen s; 2320 struct bch_hash_info hash_info; 2321 2322 snapshots_seen_init(&s); 2323 - 2324 - int ret = bch2_trans_run(c, 2325 - for_each_btree_key(trans, iter, BTREE_ID_dirents, 2326 POS(BCACHEFS_ROOT_INO, 0), 2327 BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 2328 - check_dirent(trans, &iter, k, &hash_info, &dir, &target, &s)) ?: 2329 check_subdir_count_notnested(trans, &dir)); 2330 2331 snapshots_seen_exit(&s); 2332 inode_walker_exit(&dir); ··· 2356 struct inode_walker *inode) 2357 { 2358 struct bch_fs *c = trans->c; 2359 - struct inode_walker_entry *i; 2360 - int ret; 2361 2362 - ret = bch2_check_key_has_snapshot(trans, iter, k); 2363 if (ret < 0) 2364 return ret; 2365 if (ret) 2366 return 0; 2367 2368 - i = walk_inode(trans, inode, k); 2369 ret = PTR_ERR_OR_ZERO(i); 2370 if (ret) 2371 return ret; ··· 2379 *hash_info = bch2_hash_info_init(c, &i->inode); 2380 inode->first_this_inode = false; 2381 2382 - ret = bch2_str_hash_check_key(trans, NULL, &bch2_xattr_hash_desc, hash_info, iter, k); 2383 - bch_err_fn(c, ret); 2384 - return ret; 2385 } 2386 2387 /* ··· 2770 if (!d) { 2771 bch_err(c, "fsck: error allocating memory for nlink_table, 
size %zu", 2772 new_size); 2773 - return -BCH_ERR_ENOMEM_fsck_add_nlink; 2774 } 2775 2776 if (t->d)
··· 23 #include <linux/bsearch.h> 24 #include <linux/dcache.h> /* struct qstr */ 25 26 + static int dirent_points_to_inode_nowarn(struct bch_fs *c, 27 + struct bkey_s_c_dirent d, 28 struct bch_inode_unpacked *inode) 29 { 30 if (d.v->d_type == DT_SUBVOL 31 ? le32_to_cpu(d.v->d_child_subvol) == inode->bi_subvol 32 : le64_to_cpu(d.v->d_inum) == inode->bi_inum) 33 return 0; 34 + return bch_err_throw(c, ENOENT_dirent_doesnt_match_inode); 35 } 36 37 static void dirent_inode_mismatch_msg(struct printbuf *out, ··· 49 struct bkey_s_c_dirent dirent, 50 struct bch_inode_unpacked *inode) 51 { 52 + int ret = dirent_points_to_inode_nowarn(c, dirent, inode); 53 if (ret) { 54 struct printbuf buf = PRINTBUF; 55 dirent_inode_mismatch_msg(&buf, c, dirent, inode); ··· 152 goto found; 153 } 154 } 155 + ret = bch_err_throw(trans->c, ENOENT_no_snapshot_tree_subvol); 156 found: 157 bch2_trans_iter_exit(trans, &iter); 158 return ret; ··· 229 230 if (d_type != DT_DIR) { 231 bch_err(c, "error looking up lost+found: not a directory"); 232 + return bch_err_throw(c, ENOENT_not_directory); 233 } 234 235 /* ··· 531 532 if (!bch2_snapshot_is_leaf(c, snapshotid)) { 533 bch_err(c, "need to reconstruct subvol, but have interior node snapshot"); 534 + return bch_err_throw(c, fsck_repair_unimplemented); 535 } 536 537 /* ··· 642 643 return __bch2_fsck_write_inode(trans, &new_inode); 644 } 645 646 static inline void snapshots_seen_exit(struct snapshots_seen *s) 647 { ··· 890 { 891 struct bch_fs *c = trans->c; 892 893 + struct inode_walker_entry *i = darray_find_p(w->inodes, i, 894 + bch2_snapshot_is_ancestor(c, k.k->p.snapshot, i->inode.bi_snapshot)); 895 896 + if (!i) 897 + return NULL; 898 899 struct printbuf buf = PRINTBUF; 900 int ret = 0; ··· 947 if (ret) 948 goto fsck_err; 949 950 + ret = bch_err_throw(c, transaction_restart_nested); 951 goto fsck_err; 952 } 953 ··· 992 int ret = 0; 993 994 if (d->v.d_type == DT_SUBVOL) { 995 + bch_err(trans->c, "%s does not support DT_SUBVOL", __func__); 996 + ret = -BCH_ERR_fsck_repair_unimplemented; 997 } else { 998 ret = get_visible_inodes(trans, &target, s, le64_to_cpu(d->v.d_inum)); 999 if (ret) ··· 1048 if (ret && !bch2_err_matches(ret, ENOENT)) 1049 return ret; 1050 1051 + if ((ret || dirent_points_to_inode_nowarn(c, d, inode)) && 1052 inode->bi_subvol && 1053 (inode->bi_flags & BCH_INODE_has_child_snapshot)) { 1054 /* Older version of a renamed subvolume root: we won't have a ··· 1069 trans, inode_points_to_missing_dirent, 1070 "inode points to missing dirent\n%s", 1071 (bch2_inode_unpacked_to_text(&buf, inode), buf.buf)) || 1072 + fsck_err_on(!ret && dirent_points_to_inode_nowarn(c, d, inode), 1073 trans, inode_points_to_wrong_dirent, 1074 "%s", 1075 (printbuf_reset(&buf), ··· 1172 u.bi_flags &= ~BCH_INODE_unlinked; 1173 do_update = true; 1174 ret = 0; 1175 + } 1176 + 1177 + if (fsck_err_on(S_ISDIR(u.bi_mode) && u.bi_size, 1178 + trans, inode_dir_has_nonzero_i_size, 1179 + "directory %llu:%u with nonzero i_size %lli", 1180 + u.bi_inum, u.bi_snapshot, u.bi_size)) { 1181 + u.bi_size = 0; 1182 + do_update = true; 1183 } 1184 1185 ret = bch2_inode_has_child_snapshots(trans, k.k->p); ··· 1452 goto err; 1453 1454 inode->last_pos.inode--; 1455 + ret = bch_err_throw(c, transaction_restart_nested); 1456 goto err; 1457 } 1458 ··· 1569 sizeof(seen->ids.data[0]) * seen->ids.size, 1570 GFP_KERNEL); 1571 if (!n.seen.ids.data) 1572 + return bch_err_throw(c, ENOMEM_fsck_extent_ends_at); 1573 1574 __darray_for_each(extent_ends->e, i) { 1575 if (i->snapshot == k.k->p.snapshot) { ··· 1619 1620 
bch_err(c, "%s: error finding first overlapping extent when repairing, got%s", 1621 __func__, buf.buf); 1622 + ret = bch_err_throw(c, internal_fsck_err); 1623 goto err; 1624 } 1625 ··· 1644 pos2.size != k2.k->size) { 1645 bch_err(c, "%s: error finding seconding overlapping extent when repairing%s", 1646 __func__, buf.buf); 1647 + ret = bch_err_throw(c, internal_fsck_err); 1648 goto err; 1649 } 1650 ··· 1692 * We overwrote the second extent - restart 1693 * check_extent() from the top: 1694 */ 1695 + ret = bch_err_throw(c, transaction_restart_nested); 1696 } 1697 } 1698 fsck_err: ··· 2045 (bch2_bkey_val_to_text(&buf, c, d.s_c), buf.buf))) { 2046 if (!new_parent_subvol) { 2047 bch_err(c, "could not find a subvol for snapshot %u", d.k->p.snapshot); 2048 + return bch_err_throw(c, fsck_repair_unimplemented); 2049 } 2050 2051 struct bkey_i_dirent *new_dirent = bch2_bkey_make_mut_typed(trans, iter, &d.s_c, 0, dirent); ··· 2107 2108 if (ret) { 2109 bch_err(c, "subvol %u points to missing inode root %llu", target_subvol, target_inum); 2110 + ret = bch_err_throw(c, fsck_repair_unimplemented); 2111 goto err; 2112 } 2113 ··· 2139 struct bch_hash_info *hash_info, 2140 struct inode_walker *dir, 2141 struct inode_walker *target, 2142 + struct snapshots_seen *s, 2143 + bool *need_second_pass) 2144 { 2145 struct bch_fs *c = trans->c; 2146 struct inode_walker_entry *i; ··· 2181 *hash_info = bch2_hash_info_init(c, &i->inode); 2182 dir->first_this_inode = false; 2183 2184 + #ifdef CONFIG_UNICODE 2185 + hash_info->cf_encoding = bch2_inode_casefold(c, &i->inode) ? c->cf_encoding : NULL; 2186 + #endif 2187 + 2188 + ret = bch2_str_hash_check_key(trans, s, &bch2_dirent_hash_desc, hash_info, 2189 + iter, k, need_second_pass); 2190 if (ret < 0) 2191 goto err; 2192 if (ret) { ··· 2202 (printbuf_reset(&buf), 2203 bch2_bkey_val_to_text(&buf, c, k), 2204 buf.buf))) { 2205 + subvol_inum dir_inum = { .subvol = d.v->d_type == DT_SUBVOL 2206 + ? le32_to_cpu(d.v->d_parent_subvol) 2207 + : 0, 2208 + }; 2209 u64 target = d.v->d_type == DT_SUBVOL 2210 ? 
le32_to_cpu(d.v->d_child_subvol) 2211 : le64_to_cpu(d.v->d_inum); 2212 + struct qstr name = bch2_dirent_get_name(d); 2213 2214 + struct bkey_i_dirent *new_d = 2215 + bch2_dirent_create_key(trans, hash_info, dir_inum, 2216 + d.v->d_type, &name, NULL, target); 2217 + ret = PTR_ERR_OR_ZERO(new_d); 2218 + if (ret) 2219 + goto out; 2220 + 2221 + new_d->k.p.inode = d.k->p.inode; 2222 + new_d->k.p.snapshot = d.k->p.snapshot; 2223 + 2224 + struct btree_iter dup_iter = {}; 2225 + ret = bch2_hash_delete_at(trans, 2226 bch2_dirent_hash_desc, hash_info, iter, 2227 BTREE_UPDATE_internal_snapshot_node) ?: 2228 + bch2_str_hash_repair_key(trans, s, 2229 + &bch2_dirent_hash_desc, hash_info, 2230 + iter, bkey_i_to_s_c(&new_d->k_i), 2231 + &dup_iter, bkey_s_c_null, 2232 + need_second_pass); 2233 goto out; 2234 } 2235 ··· 2294 err: 2295 fsck_err: 2296 printbuf_exit(&buf); 2297 return ret; 2298 } 2299 ··· 2308 struct inode_walker target = inode_walker_init(); 2309 struct snapshots_seen s; 2310 struct bch_hash_info hash_info; 2311 + bool need_second_pass = false, did_second_pass = false; 2312 + int ret; 2313 2314 snapshots_seen_init(&s); 2315 + again: 2316 + ret = bch2_trans_run(c, 2317 + for_each_btree_key_commit(trans, iter, BTREE_ID_dirents, 2318 POS(BCACHEFS_ROOT_INO, 0), 2319 BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 2320 + NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 2321 + check_dirent(trans, &iter, k, &hash_info, &dir, &target, &s, 2322 + &need_second_pass)) ?: 2323 check_subdir_count_notnested(trans, &dir)); 2324 + 2325 + if (!ret && need_second_pass && !did_second_pass) { 2326 + bch_info(c, "check_dirents requires second pass"); 2327 + swap(did_second_pass, need_second_pass); 2328 + goto again; 2329 + } 2330 + 2331 + if (!ret && need_second_pass) { 2332 + bch_err(c, "dirents not repairing"); 2333 + ret = -EINVAL; 2334 + } 2335 2336 snapshots_seen_exit(&s); 2337 inode_walker_exit(&dir); ··· 2331 struct inode_walker *inode) 2332 { 2333 struct bch_fs *c = trans->c; 2334 2335 + int ret = bch2_check_key_has_snapshot(trans, iter, k); 2336 if (ret < 0) 2337 return ret; 2338 if (ret) 2339 return 0; 2340 2341 + struct inode_walker_entry *i = walk_inode(trans, inode, k); 2342 ret = PTR_ERR_OR_ZERO(i); 2343 if (ret) 2344 return ret; ··· 2356 *hash_info = bch2_hash_info_init(c, &i->inode); 2357 inode->first_this_inode = false; 2358 2359 + bool need_second_pass = false; 2360 + return bch2_str_hash_check_key(trans, NULL, &bch2_xattr_hash_desc, hash_info, 2361 + iter, k, &need_second_pass); 2362 } 2363 2364 /* ··· 2747 if (!d) { 2748 bch_err(c, "fsck: error allocating memory for nlink_table, size %zu", 2749 new_size); 2750 + return bch_err_throw(c, ENOMEM_fsck_add_nlink); 2751 } 2752 2753 if (t->d)
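check_dirents() is now a loop rather than a single scan: repairing a dirent through the shared str-hash helpers may leave work at positions the scan has already covered, so the per-key hook raises *need_second_pass and the top level re-runs the walk once before declaring the repair stuck. The control flow, with the btree iteration reduced to a placeholder:

	bool need_second_pass = false, did_second_pass = false;
	int ret;
again:
	ret = scan_all_dirents(&need_second_pass);	/* placeholder for the btree walk */

	if (!ret && need_second_pass && !did_second_pass) {
		/* sets did_second_pass, clears need_second_pass */
		swap(did_second_pass, need_second_pass);
		goto again;
	}

	if (!ret && need_second_pass)
		ret = -EINVAL;		/* still asking for another pass: give up */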
+6
fs/bcachefs/fsck.h
··· 4 5 #include "str_hash.h" 6 7 int bch2_fsck_update_backpointers(struct btree_trans *, 8 struct snapshots_seen *, 9 const struct bch_hash_desc,
··· 4 5 #include "str_hash.h" 6 7 + /* records snapshot IDs of overwrites at @pos */ 8 + struct snapshots_seen { 9 + struct bpos pos; 10 + snapshot_id_list ids; 11 + }; 12 + 13 int bch2_fsck_update_backpointers(struct btree_trans *, 14 struct snapshots_seen *, 15 const struct bch_hash_desc,
+56 -30
fs/bcachefs/inode.c
··· 38 #undef x 39 40 static int delete_ancestor_snapshot_inodes(struct btree_trans *, struct bpos); 41 42 static const u8 byte_table[8] = { 1, 2, 3, 4, 6, 8, 10, 13 }; 43 ··· 1042 goto found_slot; 1043 1044 if (!ret && start == min) 1045 - ret = -BCH_ERR_ENOSPC_inode_create; 1046 1047 if (ret) { 1048 bch2_trans_iter_exit(trans, iter); ··· 1131 u32 snapshot; 1132 int ret; 1133 1134 /* 1135 * If this was a directory, there shouldn't be any real dirents left - 1136 * but there could be whiteouts (from hash collisions) that we should 1137 * delete: 1138 * 1139 - * XXX: the dirent could ideally would delete whiteouts when they're no 1140 * longer needed 1141 */ 1142 ret = bch2_inode_delete_keys(trans, inum, BTREE_ID_extents) ?: 1143 bch2_inode_delete_keys(trans, inum, BTREE_ID_xattrs) ?: 1144 bch2_inode_delete_keys(trans, inum, BTREE_ID_dirents); 1145 if (ret) 1146 - goto err; 1147 retry: 1148 bch2_trans_begin(trans); 1149 ··· 1166 bch2_fs_inconsistent(c, 1167 "inode %llu:%u not found when deleting", 1168 inum.inum, snapshot); 1169 - ret = -BCH_ERR_ENOENT_inode; 1170 goto err; 1171 } 1172 ··· 1333 bch2_fs_inconsistent(c, 1334 "inode %llu:%u not found when deleting", 1335 inum, snapshot); 1336 - ret = -BCH_ERR_ENOENT_inode; 1337 goto err; 1338 } 1339 ··· 1397 delete_ancestor_snapshot_inodes(trans, SPOS(0, inum, snapshot)); 1398 } 1399 1400 - static int may_delete_deleted_inode(struct btree_trans *trans, 1401 - struct btree_iter *iter, 1402 - struct bpos pos, 1403 - bool *need_another_pass) 1404 { 1405 struct bch_fs *c = trans->c; 1406 struct btree_iter inode_iter; ··· 1412 if (ret) 1413 return ret; 1414 1415 - ret = bkey_is_inode(k.k) ? 0 : -BCH_ERR_ENOENT_inode; 1416 - if (fsck_err_on(!bkey_is_inode(k.k), 1417 trans, deleted_inode_missing, 1418 "nonexistent inode %llu:%u in deleted_inodes btree", 1419 pos.offset, pos.snapshot)) 1420 goto delete; 1421 1422 ret = bch2_inode_unpack(k, &inode); 1423 if (ret) ··· 1427 1428 if (S_ISDIR(inode.bi_mode)) { 1429 ret = bch2_empty_dir_snapshot(trans, pos.offset, 0, pos.snapshot); 1430 - if (fsck_err_on(bch2_err_matches(ret, ENOTEMPTY), 1431 trans, deleted_inode_is_dir, 1432 "non empty directory %llu:%u in deleted_inodes btree", 1433 pos.offset, pos.snapshot)) ··· 1437 goto out; 1438 } 1439 1440 - if (fsck_err_on(!(inode.bi_flags & BCH_INODE_unlinked), 1441 trans, deleted_inode_not_unlinked, 1442 "non-deleted inode %llu:%u in deleted_inodes btree", 1443 pos.offset, pos.snapshot)) 1444 goto delete; 1445 1446 - if (fsck_err_on(inode.bi_flags & BCH_INODE_has_child_snapshot, 1447 trans, deleted_inode_has_child_snapshots, 1448 "inode with child snapshots %llu:%u in deleted_inodes btree", 1449 pos.offset, pos.snapshot)) 1450 goto delete; 1451 1452 ret = bch2_inode_has_child_snapshots(trans, k.k->p); 1453 if (ret < 0) ··· 1472 if (ret) 1473 goto out; 1474 } 1475 goto delete; 1476 1477 } 1478 1479 - if (test_bit(BCH_FS_clean_recovery, &c->flags) && 1480 - !fsck_err(trans, deleted_inode_but_clean, 1481 - "filesystem marked as clean but have deleted inode %llu:%u", 1482 - pos.offset, pos.snapshot)) { 1483 - ret = 0; 1484 - goto out; 1485 - } 1486 1487 - ret = 1; 1488 out: 1489 fsck_err: 1490 bch2_trans_iter_exit(trans, &inode_iter); ··· 1504 goto out; 1505 } 1506 1507 int bch2_delete_dead_inodes(struct bch_fs *c) 1508 { 1509 struct btree_trans *trans = bch2_trans_get(c); 1510 - bool need_another_pass; 1511 int ret; 1512 - again: 1513 /* 1514 * if we ran check_inodes() unlinked inodes will have already been 1515 * cleaned up but the write buffer will be out of sync; 
therefore we ··· 1525 ret = bch2_btree_write_buffer_flush_sync(trans); 1526 if (ret) 1527 goto err; 1528 - 1529 - need_another_pass = false; 1530 1531 /* 1532 * Weird transaction restart handling here because on successful delete, ··· 1535 ret = for_each_btree_key_commit(trans, iter, BTREE_ID_deleted_inodes, POS_MIN, 1536 BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 1537 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({ 1538 - ret = may_delete_deleted_inode(trans, &iter, k.k->p, &need_another_pass); 1539 if (ret > 0) { 1540 bch_verbose_ratelimited(c, "deleting unlinked inode %llu:%u", 1541 k.k->p.offset, k.k->p.snapshot); ··· 1556 1557 ret; 1558 })); 1559 - 1560 - if (!ret && need_another_pass) 1561 - goto again; 1562 err: 1563 bch2_trans_put(trans); 1564 return ret; 1565 }
··· 38 #undef x 39 40 static int delete_ancestor_snapshot_inodes(struct btree_trans *, struct bpos); 41 + static int may_delete_deleted_inum(struct btree_trans *, subvol_inum); 42 43 static const u8 byte_table[8] = { 1, 2, 3, 4, 6, 8, 10, 13 }; 44 ··· 1041 goto found_slot; 1042 1043 if (!ret && start == min) 1044 + ret = bch_err_throw(trans->c, ENOSPC_inode_create); 1045 1046 if (ret) { 1047 bch2_trans_iter_exit(trans, iter); ··· 1130 u32 snapshot; 1131 int ret; 1132 1133 + ret = lockrestart_do(trans, may_delete_deleted_inum(trans, inum)); 1134 + if (ret) 1135 + goto err2; 1136 + 1137 /* 1138 * If this was a directory, there shouldn't be any real dirents left - 1139 * but there could be whiteouts (from hash collisions) that we should 1140 * delete: 1141 * 1142 + * XXX: the dirent code ideally would delete whiteouts when they're no 1143 * longer needed 1144 */ 1145 ret = bch2_inode_delete_keys(trans, inum, BTREE_ID_extents) ?: 1146 bch2_inode_delete_keys(trans, inum, BTREE_ID_xattrs) ?: 1147 bch2_inode_delete_keys(trans, inum, BTREE_ID_dirents); 1148 if (ret) 1149 + goto err2; 1150 retry: 1151 bch2_trans_begin(trans); 1152 ··· 1161 bch2_fs_inconsistent(c, 1162 "inode %llu:%u not found when deleting", 1163 inum.inum, snapshot); 1164 + ret = bch_err_throw(c, ENOENT_inode); 1165 goto err; 1166 } 1167 ··· 1328 bch2_fs_inconsistent(c, 1329 "inode %llu:%u not found when deleting", 1330 inum, snapshot); 1331 + ret = bch_err_throw(c, ENOENT_inode); 1332 goto err; 1333 } 1334 ··· 1392 delete_ancestor_snapshot_inodes(trans, SPOS(0, inum, snapshot)); 1393 } 1394 1395 + static int may_delete_deleted_inode(struct btree_trans *trans, struct bpos pos, 1396 + bool from_deleted_inodes) 1397 { 1398 struct bch_fs *c = trans->c; 1399 struct btree_iter inode_iter; ··· 1409 if (ret) 1410 return ret; 1411 1412 + ret = bkey_is_inode(k.k) ? 0 : bch_err_throw(c, ENOENT_inode); 1413 + if (fsck_err_on(from_deleted_inodes && ret, 1414 trans, deleted_inode_missing, 1415 "nonexistent inode %llu:%u in deleted_inodes btree", 1416 pos.offset, pos.snapshot)) 1417 goto delete; 1418 + if (ret) 1419 + goto out; 1420 1421 ret = bch2_inode_unpack(k, &inode); 1422 if (ret) ··· 1422 1423 if (S_ISDIR(inode.bi_mode)) { 1424 ret = bch2_empty_dir_snapshot(trans, pos.offset, 0, pos.snapshot); 1425 + if (fsck_err_on(from_deleted_inodes && 1426 + bch2_err_matches(ret, ENOTEMPTY), 1427 trans, deleted_inode_is_dir, 1428 "non empty directory %llu:%u in deleted_inodes btree", 1429 pos.offset, pos.snapshot)) ··· 1431 goto out; 1432 } 1433 1434 + ret = inode.bi_flags & BCH_INODE_unlinked ? 0 : bch_err_throw(c, inode_not_unlinked); 1435 + if (fsck_err_on(from_deleted_inodes && ret, 1436 trans, deleted_inode_not_unlinked, 1437 "non-deleted inode %llu:%u in deleted_inodes btree", 1438 pos.offset, pos.snapshot)) 1439 goto delete; 1440 + if (ret) 1441 + goto out; 1442 1443 + ret = !(inode.bi_flags & BCH_INODE_has_child_snapshot) 1444 + ? 
0 : bch_err_throw(c, inode_has_child_snapshot); 1445 + 1446 + if (fsck_err_on(from_deleted_inodes && ret, 1447 trans, deleted_inode_has_child_snapshots, 1448 "inode with child snapshots %llu:%u in deleted_inodes btree", 1449 pos.offset, pos.snapshot)) 1450 goto delete; 1451 + if (ret) 1452 + goto out; 1453 1454 ret = bch2_inode_has_child_snapshots(trans, k.k->p); 1455 if (ret < 0) ··· 1458 if (ret) 1459 goto out; 1460 } 1461 + 1462 + if (!from_deleted_inodes) { 1463 + ret = bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc) ?: 1464 + bch_err_throw(c, inode_has_child_snapshot); 1465 + goto out; 1466 + } 1467 + 1468 goto delete; 1469 1470 } 1471 1472 + if (from_deleted_inodes) { 1473 + if (test_bit(BCH_FS_clean_recovery, &c->flags) && 1474 + !fsck_err(trans, deleted_inode_but_clean, 1475 + "filesystem marked as clean but have deleted inode %llu:%u", 1476 + pos.offset, pos.snapshot)) { 1477 + ret = 0; 1478 + goto out; 1479 + } 1480 1481 + ret = 1; 1482 + } 1483 out: 1484 fsck_err: 1485 bch2_trans_iter_exit(trans, &inode_iter); ··· 1481 goto out; 1482 } 1483 1484 + static int may_delete_deleted_inum(struct btree_trans *trans, subvol_inum inum) 1485 + { 1486 + u32 snapshot; 1487 + 1488 + return bch2_subvolume_get_snapshot(trans, inum.subvol, &snapshot) ?: 1489 + may_delete_deleted_inode(trans, SPOS(0, inum.inum, snapshot), false); 1490 + } 1491 + 1492 int bch2_delete_dead_inodes(struct bch_fs *c) 1493 { 1494 struct btree_trans *trans = bch2_trans_get(c); 1495 int ret; 1496 + 1497 /* 1498 * if we ran check_inodes() unlinked inodes will have already been 1499 * cleaned up but the write buffer will be out of sync; therefore we ··· 1495 ret = bch2_btree_write_buffer_flush_sync(trans); 1496 if (ret) 1497 goto err; 1498 1499 /* 1500 * Weird transaction restart handling here because on successful delete, ··· 1507 ret = for_each_btree_key_commit(trans, iter, BTREE_ID_deleted_inodes, POS_MIN, 1508 BTREE_ITER_prefetch|BTREE_ITER_all_snapshots, k, 1509 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, ({ 1510 + ret = may_delete_deleted_inode(trans, k.k->p, true); 1511 if (ret > 0) { 1512 bch_verbose_ratelimited(c, "deleting unlinked inode %llu:%u", 1513 k.k->p.offset, k.k->p.snapshot); ··· 1528 1529 ret; 1530 })); 1531 err: 1532 bch2_trans_put(trans); 1533 + bch_err_fn(c, ret); 1534 return ret; 1535 }
-9
fs/bcachefs/inode.h
··· 283 int bch2_inode_nlink_inc(struct bch_inode_unpacked *); 284 void bch2_inode_nlink_dec(struct btree_trans *, struct bch_inode_unpacked *); 285 286 - static inline bool bch2_inode_should_have_single_bp(struct bch_inode_unpacked *inode) 287 - { 288 - bool inode_has_bp = inode->bi_dir || inode->bi_dir_offset; 289 - 290 - return S_ISDIR(inode->bi_mode) || 291 - inode->bi_subvol || 292 - (!inode->bi_nlink && inode_has_bp); 293 - } 294 - 295 struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *); 296 void bch2_inode_opts_get(struct bch_io_opts *, struct bch_fs *, 297 struct bch_inode_unpacked *);
··· 283 int bch2_inode_nlink_inc(struct bch_inode_unpacked *); 284 void bch2_inode_nlink_dec(struct btree_trans *, struct bch_inode_unpacked *); 285 286 struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *); 287 void bch2_inode_opts_get(struct bch_io_opts *, struct bch_fs *, 288 struct bch_inode_unpacked *);
+1 -1
fs/bcachefs/io_misc.c
··· 91 opts.data_replicas, 92 BCH_WATERMARK_normal, 0, &cl, &wp); 93 if (bch2_err_matches(ret, BCH_ERR_operation_blocked)) 94 - ret = -BCH_ERR_transaction_restart_nested; 95 if (ret) 96 goto err; 97
··· 91 opts.data_replicas, 92 BCH_WATERMARK_normal, 0, &cl, &wp); 93 if (bch2_err_matches(ret, BCH_ERR_operation_blocked)) 94 + ret = bch_err_throw(c, transaction_restart_nested); 95 if (ret) 96 goto err; 97
+17 -18
fs/bcachefs/io_read.c
··· 56 if (!target) 57 return false; 58 59 - rcu_read_lock(); 60 devs = bch2_target_to_mask(c, target) ?: 61 &c->rw_devs[BCH_DATA_user]; 62 ··· 73 total += max(congested, 0LL); 74 nr++; 75 } 76 - rcu_read_unlock(); 77 78 return get_random_u32_below(nr * CONGESTED_MAX) < total; 79 } ··· 137 BUG_ON(!opts.promote_target); 138 139 if (!(flags & BCH_READ_may_promote)) 140 - return -BCH_ERR_nopromote_may_not; 141 142 if (bch2_bkey_has_target(c, k, opts.promote_target)) 143 - return -BCH_ERR_nopromote_already_promoted; 144 145 if (bkey_extent_is_unwritten(k)) 146 - return -BCH_ERR_nopromote_unwritten; 147 148 if (bch2_target_congested(c, opts.promote_target)) 149 - return -BCH_ERR_nopromote_congested; 150 } 151 152 if (rhashtable_lookup_fast(&c->promote_table, &pos, 153 bch_promote_params)) 154 - return -BCH_ERR_nopromote_in_flight; 155 156 return 0; 157 } ··· 239 240 struct promote_op *op = kzalloc(sizeof(*op), GFP_KERNEL); 241 if (!op) { 242 - ret = -BCH_ERR_nopromote_enomem; 243 goto err_put; 244 } 245 ··· 248 249 if (rhashtable_lookup_insert_fast(&c->promote_table, &op->hash, 250 bch_promote_params)) { 251 - ret = -BCH_ERR_nopromote_in_flight; 252 goto err; 253 } 254 ··· 544 545 if (!bkey_and_val_eq(k, bkey_i_to_s_c(u->k.k))) { 546 /* extent we wanted to read no longer exists: */ 547 - rbio->ret = -BCH_ERR_data_read_key_overwritten; 548 goto err; 549 } 550 ··· 1035 1036 if ((bch2_bkey_extent_flags(k) & BIT_ULL(BCH_EXTENT_FLAG_poisoned)) && 1037 !orig->data_update) 1038 - return -BCH_ERR_extent_poisoned; 1039 retry_pick: 1040 ret = bch2_bkey_pick_read_device(c, k, failed, &pick, dev); 1041 ··· 1073 1074 bch_err_ratelimited(c, "%s", buf.buf); 1075 printbuf_exit(&buf); 1076 - ret = -BCH_ERR_data_read_no_encryption_key; 1077 goto err; 1078 } 1079 ··· 1127 if (ca) 1128 enumerated_ref_put(&ca->io_ref[READ], 1129 BCH_DEV_READ_REF_io_read); 1130 - rbio->ret = -BCH_ERR_data_read_buffer_too_small; 1131 goto out_read_done; 1132 } 1133 ··· 1332 * have to signal that: 1333 */ 1334 if (u) 1335 - orig->ret = -BCH_ERR_data_read_key_overwritten; 1336 1337 zero_fill_bio_iter(&orig->bio, iter); 1338 out_read_done: ··· 1509 c->opts.btree_node_size, 1510 c->opts.encoded_extent_max) / 1511 PAGE_SIZE, 0)) 1512 - return -BCH_ERR_ENOMEM_bio_bounce_pages_init; 1513 1514 if (bioset_init(&c->bio_read, 1, offsetof(struct bch_read_bio, bio), 1515 BIOSET_NEED_BVECS)) 1516 - return -BCH_ERR_ENOMEM_bio_read_init; 1517 1518 if (bioset_init(&c->bio_read_split, 1, offsetof(struct bch_read_bio, bio), 1519 BIOSET_NEED_BVECS)) 1520 - return -BCH_ERR_ENOMEM_bio_read_split_init; 1521 1522 if (rhashtable_init(&c->promote_table, &bch_promote_params)) 1523 - return -BCH_ERR_ENOMEM_promote_table_init; 1524 1525 return 0; 1526 }
··· 56 if (!target) 57 return false; 58 59 + guard(rcu)(); 60 devs = bch2_target_to_mask(c, target) ?: 61 &c->rw_devs[BCH_DATA_user]; 62 ··· 73 total += max(congested, 0LL); 74 nr++; 75 } 76 77 return get_random_u32_below(nr * CONGESTED_MAX) < total; 78 } ··· 138 BUG_ON(!opts.promote_target); 139 140 if (!(flags & BCH_READ_may_promote)) 141 + return bch_err_throw(c, nopromote_may_not); 142 143 if (bch2_bkey_has_target(c, k, opts.promote_target)) 144 + return bch_err_throw(c, nopromote_already_promoted); 145 146 if (bkey_extent_is_unwritten(k)) 147 + return bch_err_throw(c, nopromote_unwritten); 148 149 if (bch2_target_congested(c, opts.promote_target)) 150 + return bch_err_throw(c, nopromote_congested); 151 } 152 153 if (rhashtable_lookup_fast(&c->promote_table, &pos, 154 bch_promote_params)) 155 + return bch_err_throw(c, nopromote_in_flight); 156 157 return 0; 158 } ··· 240 241 struct promote_op *op = kzalloc(sizeof(*op), GFP_KERNEL); 242 if (!op) { 243 + ret = bch_err_throw(c, nopromote_enomem); 244 goto err_put; 245 } 246 ··· 249 250 if (rhashtable_lookup_insert_fast(&c->promote_table, &op->hash, 251 bch_promote_params)) { 252 + ret = bch_err_throw(c, nopromote_in_flight); 253 goto err; 254 } 255 ··· 545 546 if (!bkey_and_val_eq(k, bkey_i_to_s_c(u->k.k))) { 547 /* extent we wanted to read no longer exists: */ 548 + rbio->ret = bch_err_throw(trans->c, data_read_key_overwritten); 549 goto err; 550 } 551 ··· 1036 1037 if ((bch2_bkey_extent_flags(k) & BIT_ULL(BCH_EXTENT_FLAG_poisoned)) && 1038 !orig->data_update) 1039 + return bch_err_throw(c, extent_poisoned); 1040 retry_pick: 1041 ret = bch2_bkey_pick_read_device(c, k, failed, &pick, dev); 1042 ··· 1074 1075 bch_err_ratelimited(c, "%s", buf.buf); 1076 printbuf_exit(&buf); 1077 + ret = bch_err_throw(c, data_read_no_encryption_key); 1078 goto err; 1079 } 1080 ··· 1128 if (ca) 1129 enumerated_ref_put(&ca->io_ref[READ], 1130 BCH_DEV_READ_REF_io_read); 1131 + rbio->ret = bch_err_throw(c, data_read_buffer_too_small); 1132 goto out_read_done; 1133 } 1134 ··· 1333 * have to signal that: 1334 */ 1335 if (u) 1336 + orig->ret = bch_err_throw(c, data_read_key_overwritten); 1337 1338 zero_fill_bio_iter(&orig->bio, iter); 1339 out_read_done: ··· 1510 c->opts.btree_node_size, 1511 c->opts.encoded_extent_max) / 1512 PAGE_SIZE, 0)) 1513 + return bch_err_throw(c, ENOMEM_bio_bounce_pages_init); 1514 1515 if (bioset_init(&c->bio_read, 1, offsetof(struct bch_read_bio, bio), 1516 BIOSET_NEED_BVECS)) 1517 + return bch_err_throw(c, ENOMEM_bio_read_init); 1518 1519 if (bioset_init(&c->bio_read_split, 1, offsetof(struct bch_read_bio, bio), 1520 BIOSET_NEED_BVECS)) 1521 + return bch_err_throw(c, ENOMEM_bio_read_split_init); 1522 1523 if (rhashtable_init(&c->promote_table, &bch_promote_params)) 1524 + return bch_err_throw(c, ENOMEM_promote_table_init); 1525 1526 return 0; 1527 }
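io_read.c is also where the rcu_read_lock()/rcu_read_unlock() pairs start turning into guard(rcu)() from linux/cleanup.h and linux/rcupdate.h, which releases the read-side lock automatically at every exit from the scope. A minimal standalone sketch of the idiom follows; the demo_* types are hypothetical, and only the guard usage reflects the hunks above.

    #include <linux/cleanup.h>
    #include <linux/rculist.h>
    #include <linux/rcupdate.h>

    /* Hypothetical RCU-protected list element, for illustration only */
    struct demo_entry {
            struct list_head list;
            unsigned         dev;
            bool             congested;
    };

    /*
     * Every return path drops the RCU read-side critical section because
     * guard(rcu)() ties rcu_read_unlock() to scope exit -- which is why
     * the explicit unlock calls disappear in the hunks above.
     */
    static bool demo_dev_congested(struct list_head *head, unsigned dev)
    {
            struct demo_entry *e;

            guard(rcu)();
            list_for_each_entry_rcu(e, head, list)
                    if (e->dev == dev)
                            return e->congested; /* no explicit unlock needed */

            return false;
    }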
+4 -2
fs/bcachefs/io_read.h
··· 91 return 0; 92 93 *data_btree = BTREE_ID_reflink; 94 struct btree_iter iter; 95 struct bkey_s_c k = bch2_lookup_indirect_extent(trans, &iter, 96 offset_into_extent, ··· 104 105 if (bkey_deleted(k.k)) { 106 bch2_trans_iter_exit(trans, &iter); 107 - return -BCH_ERR_missing_indirect_extent; 108 } 109 110 - bch2_bkey_buf_reassemble(extent, trans->c, k); 111 bch2_trans_iter_exit(trans, &iter); 112 return 0; 113 }
··· 91 return 0; 92 93 *data_btree = BTREE_ID_reflink; 94 + 95 + struct bch_fs *c = trans->c; 96 struct btree_iter iter; 97 struct bkey_s_c k = bch2_lookup_indirect_extent(trans, &iter, 98 offset_into_extent, ··· 102 103 if (bkey_deleted(k.k)) { 104 bch2_trans_iter_exit(trans, &iter); 105 + return bch_err_throw(c, missing_indirect_extent); 106 } 107 108 + bch2_bkey_buf_reassemble(extent, c, k); 109 bch2_trans_iter_exit(trans, &iter); 110 return 0; 111 }
+12 -14
fs/bcachefs/io_write.c
··· 558 559 static noinline int bch2_write_drop_io_error_ptrs(struct bch_write_op *op) 560 { 561 struct keylist *keys = &op->insert_keys; 562 struct bkey_i *src, *dst = keys->keys, *n; 563 ··· 570 test_bit(ptr->dev, op->failed.d)); 571 572 if (!bch2_bkey_nr_ptrs(bkey_i_to_s_c(src))) 573 - return -BCH_ERR_data_write_io; 574 } 575 576 if (dst != src) ··· 977 op->crc.csum_type < BCH_CSUM_NR 978 ? __bch2_csum_types[op->crc.csum_type] 979 : "(unknown)"); 980 - return -BCH_ERR_data_write_csum; 981 } 982 983 static int bch2_write_extent(struct bch_write_op *op, struct write_point *wp, ··· 1209 1210 e = bkey_s_c_to_extent(k); 1211 1212 - rcu_read_lock(); 1213 extent_for_each_ptr_decode(e, p, entry) { 1214 - if (crc_is_encoded(p.crc) || p.has_ec) { 1215 - rcu_read_unlock(); 1216 return false; 1217 - } 1218 1219 replicas += bch2_extent_ptr_durability(c, &p); 1220 } 1221 - rcu_read_unlock(); 1222 1223 return replicas >= op->opts.data_replicas; 1224 } ··· 1288 static void __bch2_nocow_write_done(struct bch_write_op *op) 1289 { 1290 if (unlikely(op->flags & BCH_WRITE_io_error)) { 1291 - op->error = -BCH_ERR_data_write_io; 1292 } else if (unlikely(op->flags & BCH_WRITE_convert_unwritten)) 1293 bch2_nocow_write_convert_unwritten(op); 1294 } ··· 1481 "pointer to invalid bucket in nocow path on device %llu\n %s", 1482 stale_at->b.inode, 1483 (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 1484 - ret = -BCH_ERR_data_write_invalid_ptr; 1485 } else { 1486 /* We can retry this: */ 1487 - ret = -BCH_ERR_transaction_restart; 1488 } 1489 printbuf_exit(&buf); 1490 ··· 1691 1692 if (unlikely(bio->bi_iter.bi_size & (c->opts.block_size - 1))) { 1693 bch2_write_op_error(op, op->pos.offset, "misaligned write"); 1694 - op->error = -BCH_ERR_data_write_misaligned; 1695 goto err; 1696 } 1697 1698 if (c->opts.nochanges) { 1699 - op->error = -BCH_ERR_erofs_no_writes; 1700 goto err; 1701 } 1702 1703 if (!(op->flags & BCH_WRITE_move) && 1704 !enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_write)) { 1705 - op->error = -BCH_ERR_erofs_no_writes; 1706 goto err; 1707 } 1708 ··· 1774 { 1775 if (bioset_init(&c->bio_write, 1, offsetof(struct bch_write_bio, bio), BIOSET_NEED_BVECS) || 1776 bioset_init(&c->replica_set, 4, offsetof(struct bch_write_bio, bio), 0)) 1777 - return -BCH_ERR_ENOMEM_bio_write_init; 1778 1779 return 0; 1780 }
··· 558 559 static noinline int bch2_write_drop_io_error_ptrs(struct bch_write_op *op) 560 { 561 + struct bch_fs *c = op->c; 562 struct keylist *keys = &op->insert_keys; 563 struct bkey_i *src, *dst = keys->keys, *n; 564 ··· 569 test_bit(ptr->dev, op->failed.d)); 570 571 if (!bch2_bkey_nr_ptrs(bkey_i_to_s_c(src))) 572 + return bch_err_throw(c, data_write_io); 573 } 574 575 if (dst != src) ··· 976 op->crc.csum_type < BCH_CSUM_NR 977 ? __bch2_csum_types[op->crc.csum_type] 978 : "(unknown)"); 979 + return bch_err_throw(c, data_write_csum); 980 } 981 982 static int bch2_write_extent(struct bch_write_op *op, struct write_point *wp, ··· 1208 1209 e = bkey_s_c_to_extent(k); 1210 1211 + guard(rcu)(); 1212 extent_for_each_ptr_decode(e, p, entry) { 1213 + if (crc_is_encoded(p.crc) || p.has_ec) 1214 return false; 1215 1216 replicas += bch2_extent_ptr_durability(c, &p); 1217 } 1218 1219 return replicas >= op->opts.data_replicas; 1220 } ··· 1290 static void __bch2_nocow_write_done(struct bch_write_op *op) 1291 { 1292 if (unlikely(op->flags & BCH_WRITE_io_error)) { 1293 + op->error = bch_err_throw(op->c, data_write_io); 1294 } else if (unlikely(op->flags & BCH_WRITE_convert_unwritten)) 1295 bch2_nocow_write_convert_unwritten(op); 1296 } ··· 1483 "pointer to invalid bucket in nocow path on device %llu\n %s", 1484 stale_at->b.inode, 1485 (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 1486 + ret = bch_err_throw(c, data_write_invalid_ptr); 1487 } else { 1488 /* We can retry this: */ 1489 + ret = bch_err_throw(c, transaction_restart); 1490 } 1491 printbuf_exit(&buf); 1492 ··· 1693 1694 if (unlikely(bio->bi_iter.bi_size & (c->opts.block_size - 1))) { 1695 bch2_write_op_error(op, op->pos.offset, "misaligned write"); 1696 + op->error = bch_err_throw(c, data_write_misaligned); 1697 goto err; 1698 } 1699 1700 if (c->opts.nochanges) { 1701 + op->error = bch_err_throw(c, erofs_no_writes); 1702 goto err; 1703 } 1704 1705 if (!(op->flags & BCH_WRITE_move) && 1706 !enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_write)) { 1707 + op->error = bch_err_throw(c, erofs_no_writes); 1708 goto err; 1709 } 1710 ··· 1776 { 1777 if (bioset_init(&c->bio_write, 1, offsetof(struct bch_write_bio, bio), BIOSET_NEED_BVECS) || 1778 bioset_init(&c->replica_set, 4, offsetof(struct bch_write_bio, bio), 0)) 1779 + return bch_err_throw(c, ENOMEM_bio_write_init); 1780 1781 return 0; 1782 }
+89 -28
fs/bcachefs/journal.c
··· 397 BUG_ON(BCH_SB_CLEAN(c->disk_sb.sb)); 398 399 if (j->blocked) 400 - return -BCH_ERR_journal_blocked; 401 402 if (j->cur_entry_error) 403 return j->cur_entry_error; ··· 407 return ret; 408 409 if (!fifo_free(&j->pin)) 410 - return -BCH_ERR_journal_pin_full; 411 412 if (nr_unwritten_journal_entries(j) == ARRAY_SIZE(j->buf)) 413 - return -BCH_ERR_journal_max_in_flight; 414 415 if (atomic64_read(&j->seq) - j->seq_write_started == JOURNAL_STATE_BUF_NR) 416 - return -BCH_ERR_journal_max_open; 417 418 if (unlikely(journal_cur_seq(j) >= JOURNAL_SEQ_MAX)) { 419 bch_err(c, "cannot start: journal seq overflow"); 420 if (bch2_fs_emergency_read_only_locked(c)) 421 bch_err(c, "fatal error - emergency read only"); 422 - return -BCH_ERR_journal_shutdown; 423 } 424 425 if (!j->free_buf && !buf->data) 426 - return -BCH_ERR_journal_buf_enomem; /* will retry after write completion frees up a buf */ 427 428 BUG_ON(!j->cur_entry_sectors); 429 ··· 447 u64s = clamp_t(int, u64s, 0, JOURNAL_ENTRY_CLOSED_VAL - 1); 448 449 if (u64s <= (ssize_t) j->early_journal_entries.nr) 450 - return -BCH_ERR_journal_full; 451 452 if (fifo_empty(&j->pin) && j->reclaim_thread) 453 wake_up_process(j->reclaim_thread); ··· 464 journal_cur_seq(j)); 465 if (bch2_fs_emergency_read_only_locked(c)) 466 bch_err(c, "fatal error - emergency read only"); 467 - return -BCH_ERR_journal_shutdown; 468 } 469 470 BUG_ON(j->pin.back - 1 != atomic64_read(&j->seq)); ··· 597 return ret; 598 599 if (j->blocked) 600 - return -BCH_ERR_journal_blocked; 601 602 if ((flags & BCH_WATERMARK_MASK) < j->watermark) { 603 - ret = -BCH_ERR_journal_full; 604 can_discard = j->can_discard; 605 goto out; 606 } 607 608 if (nr_unwritten_journal_entries(j) == ARRAY_SIZE(j->buf) && !journal_entry_is_open(j)) { 609 - ret = -BCH_ERR_journal_max_in_flight; 610 goto out; 611 } 612 ··· 647 goto retry; 648 649 if (journal_error_check_stuck(j, ret, flags)) 650 - ret = -BCH_ERR_journal_stuck; 651 652 if (ret == -BCH_ERR_journal_max_in_flight && 653 track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight], true) && ··· 708 { 709 u64 nsecs = 0; 710 711 - rcu_read_lock(); 712 for_each_rw_member_rcu(c, ca) 713 nsecs = max(nsecs, ca->io_latency[WRITE].stats.max_duration); 714 - rcu_read_unlock(); 715 716 return nsecs_to_jiffies(nsecs); 717 } ··· 812 int bch2_journal_flush_seq_async(struct journal *j, u64 seq, 813 struct closure *parent) 814 { 815 struct journal_buf *buf; 816 int ret = 0; 817 ··· 828 829 /* Recheck under lock: */ 830 if (j->err_seq && seq >= j->err_seq) { 831 - ret = -BCH_ERR_journal_flush_err; 832 goto out; 833 } 834 ··· 999 struct bch_fs *c = container_of(j, struct bch_fs, journal); 1000 1001 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_journal)) 1002 - return -BCH_ERR_erofs_no_writes; 1003 1004 int ret = __bch2_journal_meta(j); 1005 enumerated_ref_put(&c->writes, BCH_WRITE_REF_journal); ··· 1132 new_buckets = kcalloc(nr, sizeof(u64), GFP_KERNEL); 1133 new_bucket_seq = kcalloc(nr, sizeof(u64), GFP_KERNEL); 1134 if (!bu || !ob || !new_buckets || !new_bucket_seq) { 1135 - ret = -BCH_ERR_ENOMEM_set_nr_journal_buckets; 1136 goto err_free; 1137 } 1138 ··· 1304 return ret; 1305 } 1306 1307 int bch2_dev_journal_alloc(struct bch_dev *ca, bool new_fs) 1308 { 1309 struct bch_fs *c = ca->fs; ··· 1373 1374 if (c->sb.features & BIT_ULL(BCH_FEATURE_small_image)) { 1375 bch_err(c, "cannot allocate journal, filesystem is an unresized image file"); 1376 - return -BCH_ERR_erofs_filesystem_full; 1377 } 1378 1379 unsigned nr; 1380 int ret; 1381 1382 if 
(dynamic_fault("bcachefs:add:journal_alloc")) { 1383 - ret = -BCH_ERR_ENOMEM_set_nr_journal_buckets; 1384 goto err; 1385 } 1386 ··· 1519 init_fifo(&j->pin, roundup_pow_of_two(nr), GFP_KERNEL); 1520 if (!j->pin.data) { 1521 bch_err(c, "error reallocating journal fifo (%llu open entries)", nr); 1522 - return -BCH_ERR_ENOMEM_journal_pin_fifo; 1523 } 1524 1525 j->replay_journal_seq = last_seq; ··· 1607 1608 int bch2_dev_journal_init(struct bch_dev *ca, struct bch_sb *sb) 1609 { 1610 struct journal_device *ja = &ca->journal; 1611 struct bch_sb_field_journal *journal_buckets = 1612 bch2_sb_field_get(sb, journal); ··· 1627 1628 ja->bucket_seq = kcalloc(ja->nr, sizeof(u64), GFP_KERNEL); 1629 if (!ja->bucket_seq) 1630 - return -BCH_ERR_ENOMEM_dev_journal_init; 1631 1632 unsigned nr_bvecs = DIV_ROUND_UP(JOURNAL_ENTRY_SIZE_MAX, PAGE_SIZE); 1633 ··· 1635 ja->bio[i] = kzalloc(struct_size(ja->bio[i], bio.bi_inline_vecs, 1636 nr_bvecs), GFP_KERNEL); 1637 if (!ja->bio[i]) 1638 - return -BCH_ERR_ENOMEM_dev_journal_init; 1639 1640 ja->bio[i]->ca = ca; 1641 ja->bio[i]->buf_idx = i; ··· 1644 1645 ja->buckets = kcalloc(ja->nr, sizeof(u64), GFP_KERNEL); 1646 if (!ja->buckets) 1647 - return -BCH_ERR_ENOMEM_dev_journal_init; 1648 1649 if (journal_buckets_v2) { 1650 unsigned nr = bch2_sb_field_journal_v2_nr_entries(journal_buckets_v2); ··· 1698 1699 int bch2_fs_journal_init(struct journal *j) 1700 { 1701 j->free_buf_size = j->buf_size_want = JOURNAL_ENTRY_SIZE_MIN; 1702 j->free_buf = kvmalloc(j->free_buf_size, GFP_KERNEL); 1703 if (!j->free_buf) 1704 - return -BCH_ERR_ENOMEM_journal_buf; 1705 1706 for (unsigned i = 0; i < ARRAY_SIZE(j->buf); i++) 1707 j->buf[i].idx = i; ··· 1711 j->wq = alloc_workqueue("bcachefs_journal", 1712 WQ_HIGHPRI|WQ_FREEZABLE|WQ_UNBOUND|WQ_MEM_RECLAIM, 512); 1713 if (!j->wq) 1714 - return -BCH_ERR_ENOMEM_fs_other_alloc; 1715 return 0; 1716 } 1717 ··· 1735 printbuf_tabstop_push(out, 28); 1736 out->atomic++; 1737 1738 - rcu_read_lock(); 1739 s = READ_ONCE(j->reservations); 1740 1741 prt_printf(out, "flags:\t"); ··· 1825 } 1826 1827 prt_printf(out, "replicas want %u need %u\n", c->opts.metadata_replicas, c->opts.metadata_replicas_required); 1828 - 1829 - rcu_read_unlock(); 1830 1831 --out->atomic; 1832 }
··· 397 BUG_ON(BCH_SB_CLEAN(c->disk_sb.sb)); 398 399 if (j->blocked) 400 + return bch_err_throw(c, journal_blocked); 401 402 if (j->cur_entry_error) 403 return j->cur_entry_error; ··· 407 return ret; 408 409 if (!fifo_free(&j->pin)) 410 + return bch_err_throw(c, journal_pin_full); 411 412 if (nr_unwritten_journal_entries(j) == ARRAY_SIZE(j->buf)) 413 + return bch_err_throw(c, journal_max_in_flight); 414 415 if (atomic64_read(&j->seq) - j->seq_write_started == JOURNAL_STATE_BUF_NR) 416 + return bch_err_throw(c, journal_max_open); 417 418 if (unlikely(journal_cur_seq(j) >= JOURNAL_SEQ_MAX)) { 419 bch_err(c, "cannot start: journal seq overflow"); 420 if (bch2_fs_emergency_read_only_locked(c)) 421 bch_err(c, "fatal error - emergency read only"); 422 + return bch_err_throw(c, journal_shutdown); 423 } 424 425 if (!j->free_buf && !buf->data) 426 + return bch_err_throw(c, journal_buf_enomem); /* will retry after write completion frees up a buf */ 427 428 BUG_ON(!j->cur_entry_sectors); 429 ··· 447 u64s = clamp_t(int, u64s, 0, JOURNAL_ENTRY_CLOSED_VAL - 1); 448 449 if (u64s <= (ssize_t) j->early_journal_entries.nr) 450 + return bch_err_throw(c, journal_full); 451 452 if (fifo_empty(&j->pin) && j->reclaim_thread) 453 wake_up_process(j->reclaim_thread); ··· 464 journal_cur_seq(j)); 465 if (bch2_fs_emergency_read_only_locked(c)) 466 bch_err(c, "fatal error - emergency read only"); 467 + return bch_err_throw(c, journal_shutdown); 468 } 469 470 BUG_ON(j->pin.back - 1 != atomic64_read(&j->seq)); ··· 597 return ret; 598 599 if (j->blocked) 600 + return bch_err_throw(c, journal_blocked); 601 602 if ((flags & BCH_WATERMARK_MASK) < j->watermark) { 603 + ret = bch_err_throw(c, journal_full); 604 can_discard = j->can_discard; 605 goto out; 606 } 607 608 if (nr_unwritten_journal_entries(j) == ARRAY_SIZE(j->buf) && !journal_entry_is_open(j)) { 609 + ret = bch_err_throw(c, journal_max_in_flight); 610 goto out; 611 } 612 ··· 647 goto retry; 648 649 if (journal_error_check_stuck(j, ret, flags)) 650 + ret = bch_err_throw(c, journal_stuck); 651 652 if (ret == -BCH_ERR_journal_max_in_flight && 653 track_event_change(&c->times[BCH_TIME_blocked_journal_max_in_flight], true) && ··· 708 { 709 u64 nsecs = 0; 710 711 + guard(rcu)(); 712 for_each_rw_member_rcu(c, ca) 713 nsecs = max(nsecs, ca->io_latency[WRITE].stats.max_duration); 714 715 return nsecs_to_jiffies(nsecs); 716 } ··· 813 int bch2_journal_flush_seq_async(struct journal *j, u64 seq, 814 struct closure *parent) 815 { 816 + struct bch_fs *c = container_of(j, struct bch_fs, journal); 817 struct journal_buf *buf; 818 int ret = 0; 819 ··· 828 829 /* Recheck under lock: */ 830 if (j->err_seq && seq >= j->err_seq) { 831 + ret = bch_err_throw(c, journal_flush_err); 832 goto out; 833 } 834 ··· 999 struct bch_fs *c = container_of(j, struct bch_fs, journal); 1000 1001 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_journal)) 1002 + return bch_err_throw(c, erofs_no_writes); 1003 1004 int ret = __bch2_journal_meta(j); 1005 enumerated_ref_put(&c->writes, BCH_WRITE_REF_journal); ··· 1132 new_buckets = kcalloc(nr, sizeof(u64), GFP_KERNEL); 1133 new_bucket_seq = kcalloc(nr, sizeof(u64), GFP_KERNEL); 1134 if (!bu || !ob || !new_buckets || !new_bucket_seq) { 1135 + ret = bch_err_throw(c, ENOMEM_set_nr_journal_buckets); 1136 goto err_free; 1137 } 1138 ··· 1304 return ret; 1305 } 1306 1307 + int bch2_dev_journal_bucket_delete(struct bch_dev *ca, u64 b) 1308 + { 1309 + struct bch_fs *c = ca->fs; 1310 + struct journal *j = &c->journal; 1311 + struct journal_device *ja = 
&ca->journal; 1312 + 1313 + guard(mutex)(&c->sb_lock); 1314 + unsigned pos; 1315 + for (pos = 0; pos < ja->nr; pos++) 1316 + if (ja->buckets[pos] == b) 1317 + break; 1318 + 1319 + if (pos == ja->nr) { 1320 + bch_err(ca, "journal bucket %llu not found when deleting", b); 1321 + return -EINVAL; 1322 + } 1323 + 1324 + u64 *new_buckets = kcalloc(ja->nr, sizeof(u64), GFP_KERNEL);; 1325 + if (!new_buckets) 1326 + return bch_err_throw(c, ENOMEM_set_nr_journal_buckets); 1327 + 1328 + memcpy(new_buckets, ja->buckets, ja->nr * sizeof(u64)); 1329 + memmove(&new_buckets[pos], 1330 + &new_buckets[pos + 1], 1331 + (ja->nr - 1 - pos) * sizeof(new_buckets[0])); 1332 + 1333 + int ret = bch2_journal_buckets_to_sb(c, ca, ja->buckets, ja->nr - 1) ?: 1334 + bch2_write_super(c); 1335 + if (ret) { 1336 + kfree(new_buckets); 1337 + return ret; 1338 + } 1339 + 1340 + scoped_guard(spinlock, &j->lock) { 1341 + if (pos < ja->discard_idx) 1342 + --ja->discard_idx; 1343 + if (pos < ja->dirty_idx_ondisk) 1344 + --ja->dirty_idx_ondisk; 1345 + if (pos < ja->dirty_idx) 1346 + --ja->dirty_idx; 1347 + if (pos < ja->cur_idx) 1348 + --ja->cur_idx; 1349 + 1350 + ja->nr--; 1351 + 1352 + memmove(&ja->buckets[pos], 1353 + &ja->buckets[pos + 1], 1354 + (ja->nr - pos) * sizeof(ja->buckets[0])); 1355 + 1356 + memmove(&ja->bucket_seq[pos], 1357 + &ja->bucket_seq[pos + 1], 1358 + (ja->nr - pos) * sizeof(ja->bucket_seq[0])); 1359 + 1360 + bch2_journal_space_available(j); 1361 + } 1362 + 1363 + kfree(new_buckets); 1364 + return 0; 1365 + } 1366 + 1367 int bch2_dev_journal_alloc(struct bch_dev *ca, bool new_fs) 1368 { 1369 struct bch_fs *c = ca->fs; ··· 1313 1314 if (c->sb.features & BIT_ULL(BCH_FEATURE_small_image)) { 1315 bch_err(c, "cannot allocate journal, filesystem is an unresized image file"); 1316 + return bch_err_throw(c, erofs_filesystem_full); 1317 } 1318 1319 unsigned nr; 1320 int ret; 1321 1322 if (dynamic_fault("bcachefs:add:journal_alloc")) { 1323 + ret = bch_err_throw(c, ENOMEM_set_nr_journal_buckets); 1324 goto err; 1325 } 1326 ··· 1459 init_fifo(&j->pin, roundup_pow_of_two(nr), GFP_KERNEL); 1460 if (!j->pin.data) { 1461 bch_err(c, "error reallocating journal fifo (%llu open entries)", nr); 1462 + return bch_err_throw(c, ENOMEM_journal_pin_fifo); 1463 } 1464 1465 j->replay_journal_seq = last_seq; ··· 1547 1548 int bch2_dev_journal_init(struct bch_dev *ca, struct bch_sb *sb) 1549 { 1550 + struct bch_fs *c = ca->fs; 1551 struct journal_device *ja = &ca->journal; 1552 struct bch_sb_field_journal *journal_buckets = 1553 bch2_sb_field_get(sb, journal); ··· 1566 1567 ja->bucket_seq = kcalloc(ja->nr, sizeof(u64), GFP_KERNEL); 1568 if (!ja->bucket_seq) 1569 + return bch_err_throw(c, ENOMEM_dev_journal_init); 1570 1571 unsigned nr_bvecs = DIV_ROUND_UP(JOURNAL_ENTRY_SIZE_MAX, PAGE_SIZE); 1572 ··· 1574 ja->bio[i] = kzalloc(struct_size(ja->bio[i], bio.bi_inline_vecs, 1575 nr_bvecs), GFP_KERNEL); 1576 if (!ja->bio[i]) 1577 + return bch_err_throw(c, ENOMEM_dev_journal_init); 1578 1579 ja->bio[i]->ca = ca; 1580 ja->bio[i]->buf_idx = i; ··· 1583 1584 ja->buckets = kcalloc(ja->nr, sizeof(u64), GFP_KERNEL); 1585 if (!ja->buckets) 1586 + return bch_err_throw(c, ENOMEM_dev_journal_init); 1587 1588 if (journal_buckets_v2) { 1589 unsigned nr = bch2_sb_field_journal_v2_nr_entries(journal_buckets_v2); ··· 1637 1638 int bch2_fs_journal_init(struct journal *j) 1639 { 1640 + struct bch_fs *c = container_of(j, struct bch_fs, journal); 1641 + 1642 j->free_buf_size = j->buf_size_want = JOURNAL_ENTRY_SIZE_MIN; 1643 j->free_buf = 
kvmalloc(j->free_buf_size, GFP_KERNEL); 1644 if (!j->free_buf) 1645 + return bch_err_throw(c, ENOMEM_journal_buf); 1646 1647 for (unsigned i = 0; i < ARRAY_SIZE(j->buf); i++) 1648 j->buf[i].idx = i; ··· 1648 j->wq = alloc_workqueue("bcachefs_journal", 1649 WQ_HIGHPRI|WQ_FREEZABLE|WQ_UNBOUND|WQ_MEM_RECLAIM, 512); 1650 if (!j->wq) 1651 + return bch_err_throw(c, ENOMEM_fs_other_alloc); 1652 return 0; 1653 } 1654 ··· 1672 printbuf_tabstop_push(out, 28); 1673 out->atomic++; 1674 1675 + guard(rcu)(); 1676 s = READ_ONCE(j->reservations); 1677 1678 prt_printf(out, "flags:\t"); ··· 1762 } 1763 1764 prt_printf(out, "replicas want %u need %u\n", c->opts.metadata_replicas, c->opts.metadata_replicas_required); 1765 1766 --out->atomic; 1767 }
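The new bch2_dev_journal_bucket_delete() above leans on the same guard helpers: guard(mutex)(&c->sb_lock) holds the lock for the whole function, including the early error returns, while scoped_guard(spinlock, &j->lock) confines the spinlock to one block. A small sketch of that shape, with made-up demo_* types:

    #include <linux/cleanup.h>
    #include <linux/errno.h>
    #include <linux/mutex.h>
    #include <linux/spinlock.h>

    /* Hypothetical per-device state, for illustration only */
    struct demo_dev {
            struct mutex sb_lock;    /* serializes superblock-style updates */
            spinlock_t   state_lock; /* protects nr */
            unsigned     nr;
    };

    static int demo_bucket_delete(struct demo_dev *d)
    {
            guard(mutex)(&d->sb_lock);              /* held until return */

            if (!d->nr)
                    return -EINVAL;                 /* mutex dropped automatically */

            scoped_guard(spinlock, &d->state_lock)  /* held only for this block */
                    d->nr--;

            return 0;
    }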
+3 -2
fs/bcachefs/journal.h
··· 444 void __bch2_journal_debug_to_text(struct printbuf *, struct journal *); 445 void bch2_journal_debug_to_text(struct printbuf *, struct journal *); 446 447 - int bch2_set_nr_journal_buckets(struct bch_fs *, struct bch_dev *, 448 - unsigned nr); 449 int bch2_dev_journal_alloc(struct bch_dev *, bool); 450 int bch2_fs_journal_alloc(struct bch_fs *); 451
··· 444 void __bch2_journal_debug_to_text(struct printbuf *, struct journal *); 445 void bch2_journal_debug_to_text(struct printbuf *, struct journal *); 446 447 + int bch2_set_nr_journal_buckets(struct bch_fs *, struct bch_dev *, unsigned); 448 + int bch2_dev_journal_bucket_delete(struct bch_dev *, u64); 449 + 450 int bch2_dev_journal_alloc(struct bch_dev *, bool); 451 int bch2_fs_journal_alloc(struct bch_fs *); 452
+173 -110
fs/bcachefs/journal_io.c
··· 49 mutex_unlock(&c->sb_lock); 50 } 51 52 - void bch2_journal_ptrs_to_text(struct printbuf *out, struct bch_fs *c, 53 - struct journal_replay *j) 54 { 55 darray_for_each(j->ptrs, i) { 56 if (i != j->ptrs.data) 57 prt_printf(out, " "); 58 - prt_printf(out, "%u:%u:%u (sector %llu)", 59 - i->dev, i->bucket, i->bucket_offset, i->sector); 60 } 61 } 62 ··· 81 struct journal_replay *j) 82 { 83 prt_printf(out, "seq %llu ", le64_to_cpu(j->j.seq)); 84 - 85 bch2_journal_ptrs_to_text(out, c, j); 86 - 87 - for_each_jset_entry_type(entry, &j->j, BCH_JSET_ENTRY_datetime) { 88 - struct jset_entry_datetime *datetime = 89 - container_of(entry, struct jset_entry_datetime, entry); 90 - bch2_prt_datetime(out, le64_to_cpu(datetime->seconds)); 91 - break; 92 - } 93 } 94 95 static struct nonce journal_nonce(const struct jset *jset) ··· 199 journal_entry_radix_idx(c, le64_to_cpu(j->seq)), 200 GFP_KERNEL); 201 if (!_i) 202 - return -BCH_ERR_ENOMEM_journal_entry_add; 203 204 /* 205 * Duplicate journal entries? If so we want the one that didn't have a ··· 242 replace: 243 i = kvmalloc(offsetof(struct journal_replay, j) + bytes, GFP_KERNEL); 244 if (!i) 245 - return -BCH_ERR_ENOMEM_journal_entry_add; 246 247 darray_init(&i->ptrs); 248 i->csum_good = entry_ptr.csum_good; ··· 322 bch2_sb_error_count(c, BCH_FSCK_ERR_##_err); \ 323 if (bch2_fs_inconsistent(c, \ 324 "corrupt metadata before write: %s\n", _buf.buf)) {\ 325 - ret = -BCH_ERR_fsck_errors_not_fixed; \ 326 goto fsck_err; \ 327 } \ 328 break; \ ··· 429 bool first = true; 430 431 jset_entry_for_each_key(entry, k) { 432 if (!first) { 433 prt_newline(out); 434 bch2_prt_jset_entry_type(out, entry->type); ··· 1020 size_t size; 1021 }; 1022 1023 - static int journal_read_buf_realloc(struct journal_read_buf *b, 1024 size_t new_size) 1025 { 1026 void *n; 1027 1028 /* the bios are sized for this many pages, max: */ 1029 if (new_size > JOURNAL_ENTRY_SIZE_MAX) 1030 - return -BCH_ERR_ENOMEM_journal_read_buf_realloc; 1031 1032 new_size = roundup_pow_of_two(new_size); 1033 n = kvmalloc(new_size, GFP_KERNEL); 1034 if (!n) 1035 - return -BCH_ERR_ENOMEM_journal_read_buf_realloc; 1036 1037 kvfree(b->data); 1038 b->data = n; ··· 1052 u64 offset = bucket_to_sector(ca, ja->buckets[bucket]), 1053 end = offset + ca->mi.bucket_size; 1054 bool saw_bad = false, csum_good; 1055 - struct printbuf err = PRINTBUF; 1056 int ret = 0; 1057 1058 pr_debug("reading %u", bucket); ··· 1067 1068 bio = bio_kmalloc(nr_bvecs, GFP_KERNEL); 1069 if (!bio) 1070 - return -BCH_ERR_ENOMEM_journal_read_bucket; 1071 bio_init(bio, ca->disk_sb.bdev, bio->bi_inline_vecs, nr_bvecs, REQ_OP_READ); 1072 1073 bio->bi_iter.bi_sector = offset; ··· 1078 kfree(bio); 1079 1080 if (!ret && bch2_meta_read_fault("journal")) 1081 - ret = -BCH_ERR_EIO_fault_injected; 1082 1083 bch2_account_io_completion(ca, BCH_MEMBER_ERROR_read, 1084 submit_time, !ret); ··· 1092 * found on a different device, and missing or 1093 * no journal entries will be handled later 1094 */ 1095 - goto out; 1096 } 1097 1098 j = buf->data; ··· 1106 break; 1107 case JOURNAL_ENTRY_REREAD: 1108 if (vstruct_bytes(j) > buf->size) { 1109 - ret = journal_read_buf_realloc(buf, 1110 vstruct_bytes(j)); 1111 if (ret) 1112 - goto err; 1113 } 1114 goto reread; 1115 case JOURNAL_ENTRY_NONE: 1116 if (!saw_bad) 1117 - goto out; 1118 /* 1119 * On checksum error we don't really trust the size 1120 * field of the journal entry we read, so try reading ··· 1123 sectors = block_sectors(c); 1124 goto next_block; 1125 default: 1126 - goto err; 1127 } 1128 1129 if 
(le64_to_cpu(j->seq) > ja->highest_seq_found) { ··· 1140 * bucket: 1141 */ 1142 if (le64_to_cpu(j->seq) < ja->bucket_seq[bucket]) 1143 - goto out; 1144 1145 ja->bucket_seq[bucket] = le64_to_cpu(j->seq); 1146 1147 - enum bch_csum_type csum_type = JSET_CSUM_TYPE(j); 1148 struct bch_csum csum; 1149 csum_good = jset_csum_good(c, j, &csum); 1150 1151 bch2_account_io_completion(ca, BCH_MEMBER_ERROR_checksum, 0, csum_good); 1152 1153 if (!csum_good) { 1154 - bch_err_dev_ratelimited(ca, "%s", 1155 - (printbuf_reset(&err), 1156 - prt_str(&err, "journal "), 1157 - bch2_csum_err_msg(&err, csum_type, j->csum, csum), 1158 - err.buf)); 1159 saw_bad = true; 1160 } 1161 ··· 1165 mutex_lock(&jlist->lock); 1166 ret = journal_entry_add(c, ca, (struct journal_ptr) { 1167 .csum_good = csum_good, 1168 .dev = ca->dev_idx, 1169 .bucket = bucket, 1170 .bucket_offset = offset - ··· 1180 case JOURNAL_ENTRY_ADD_OUT_OF_RANGE: 1181 break; 1182 default: 1183 - goto err; 1184 } 1185 next_block: 1186 pr_debug("next"); ··· 1189 j = ((void *) j) + (sectors << 9); 1190 } 1191 1192 - out: 1193 - ret = 0; 1194 - err: 1195 - printbuf_exit(&err); 1196 - return ret; 1197 } 1198 1199 static CLOSURE_CALLBACK(bch2_journal_read_device) ··· 1206 if (!ja->nr) 1207 goto out; 1208 1209 - ret = journal_read_buf_realloc(&buf, PAGE_SIZE); 1210 if (ret) 1211 goto err; 1212 ··· 1238 goto out; 1239 } 1240 1241 int bch2_journal_read(struct bch_fs *c, 1242 u64 *last_seq, 1243 u64 *blacklist_seq, 1244 u64 *start_seq) 1245 { 1246 struct journal_list jlist; 1247 - struct journal_replay *i, **_i, *prev = NULL; 1248 struct genradix_iter radix_iter; 1249 struct printbuf buf = PRINTBUF; 1250 bool degraded = false, last_write_torn = false; ··· 1427 return 0; 1428 } 1429 1430 - bch_info(c, "journal read done, replaying entries %llu-%llu", 1431 - *last_seq, *blacklist_seq - 1); 1432 - 1433 if (*start_seq != *blacklist_seq) 1434 - bch_info(c, "dropped unflushed entries %llu-%llu", 1435 - *blacklist_seq, *start_seq - 1); 1436 1437 /* Drop blacklisted entries and entries older than last_seq: */ 1438 genradix_for_each(&c->journal_entries, radix_iter, _i) { ··· 1455 } 1456 } 1457 1458 - /* Check for missing entries: */ 1459 - seq = *last_seq; 1460 - genradix_for_each(&c->journal_entries, radix_iter, _i) { 1461 - i = *_i; 1462 - 1463 - if (journal_replay_ignore(i)) 1464 - continue; 1465 - 1466 - BUG_ON(seq > le64_to_cpu(i->j.seq)); 1467 - 1468 - while (seq < le64_to_cpu(i->j.seq)) { 1469 - u64 missing_start, missing_end; 1470 - struct printbuf buf1 = PRINTBUF, buf2 = PRINTBUF; 1471 - 1472 - while (seq < le64_to_cpu(i->j.seq) && 1473 - bch2_journal_seq_is_blacklisted(c, seq, false)) 1474 - seq++; 1475 - 1476 - if (seq == le64_to_cpu(i->j.seq)) 1477 - break; 1478 - 1479 - missing_start = seq; 1480 - 1481 - while (seq < le64_to_cpu(i->j.seq) && 1482 - !bch2_journal_seq_is_blacklisted(c, seq, false)) 1483 - seq++; 1484 - 1485 - if (prev) { 1486 - bch2_journal_ptrs_to_text(&buf1, c, prev); 1487 - prt_printf(&buf1, " size %zu", vstruct_sectors(&prev->j, c->block_bits)); 1488 - } else 1489 - prt_printf(&buf1, "(none)"); 1490 - bch2_journal_ptrs_to_text(&buf2, c, i); 1491 - 1492 - missing_end = seq - 1; 1493 - fsck_err(c, journal_entries_missing, 1494 - "journal entries %llu-%llu missing! 
(replaying %llu-%llu)\n" 1495 - "prev at %s\n" 1496 - "next at %s, continue?", 1497 - missing_start, missing_end, 1498 - *last_seq, *blacklist_seq - 1, 1499 - buf1.buf, buf2.buf); 1500 - 1501 - printbuf_exit(&buf1); 1502 - printbuf_exit(&buf2); 1503 - } 1504 - 1505 - prev = i; 1506 - seq++; 1507 - } 1508 1509 genradix_for_each(&c->journal_entries, radix_iter, _i) { 1510 union bch_replicas_padded replicas = { ··· 1470 if (journal_replay_ignore(i)) 1471 continue; 1472 1473 - darray_for_each(i->ptrs, ptr) { 1474 - struct bch_dev *ca = bch2_dev_have_ref(c, ptr->dev); 1475 - 1476 - if (!ptr->csum_good) 1477 - bch_err_dev_offset(ca, ptr->sector, 1478 - "invalid journal checksum, seq %llu%s", 1479 - le64_to_cpu(i->j.seq), 1480 - i->csum_good ? " (had good copy on another device)" : ""); 1481 - } 1482 1483 ret = jset_validate(c, 1484 bch2_dev_have_ref(c, i->ptrs.data[0].dev), ··· 1521 { 1522 struct bch_fs *c = container_of(j, struct bch_fs, journal); 1523 1524 - rcu_read_lock(); 1525 darray_for_each(*devs, i) { 1526 struct bch_dev *ca = rcu_dereference(c->devs[*i]); 1527 if (!ca) ··· 1543 ja->bucket_seq[ja->cur_idx] = le64_to_cpu(seq); 1544 } 1545 } 1546 - rcu_read_unlock(); 1547 } 1548 1549 static void __journal_write_alloc(struct journal *j, ··· 1612 1613 retry_target: 1614 devs = target_rw_devs(c, BCH_DATA_journal, target); 1615 - devs_sorted = bch2_dev_alloc_list(c, &j->wp.stripe, &devs); 1616 retry_alloc: 1617 __journal_write_alloc(j, w, &devs_sorted, sectors, replicas, replicas_want); 1618 ··· 1633 } 1634 done: 1635 BUG_ON(bkey_val_u64s(&w->key.k) > BCH_REPLICAS_MAX); 1636 1637 return *replicas >= replicas_need ? 0 : -BCH_ERR_insufficient_journal_devices; 1638 } ··· 1691 : j->noflush_write_time, j->write_start_time); 1692 1693 if (!w->devs_written.nr) { 1694 - err = -BCH_ERR_journal_write_err; 1695 } else { 1696 bch2_devlist_to_replicas(&replicas.e, BCH_DATA_journal, 1697 w->devs_written); ··· 2121 struct journal *j = container_of(w, struct journal, buf[w->idx]); 2122 struct bch_fs *c = container_of(j, struct bch_fs, journal); 2123 union bch_replicas_padded replicas; 2124 - unsigned nr_rw_members = dev_mask_nr(&c->rw_devs[BCH_DATA_journal]); 2125 int ret; 2126 2127 BUG_ON(BCH_SB_CLEAN(c->disk_sb.sb));
··· 49 mutex_unlock(&c->sb_lock); 50 } 51 52 + static void bch2_journal_ptr_to_text(struct printbuf *out, struct bch_fs *c, struct journal_ptr *p) 53 + { 54 + struct bch_dev *ca = bch2_dev_tryget_noerror(c, p->dev); 55 + prt_printf(out, "%s %u:%u:%u (sector %llu)", 56 + ca ? ca->name : "(invalid dev)", 57 + p->dev, p->bucket, p->bucket_offset, p->sector); 58 + bch2_dev_put(ca); 59 + } 60 + 61 + void bch2_journal_ptrs_to_text(struct printbuf *out, struct bch_fs *c, struct journal_replay *j) 62 { 63 darray_for_each(j->ptrs, i) { 64 if (i != j->ptrs.data) 65 prt_printf(out, " "); 66 + bch2_journal_ptr_to_text(out, c, i); 67 + } 68 + } 69 + 70 + static void bch2_journal_datetime_to_text(struct printbuf *out, struct jset *j) 71 + { 72 + for_each_jset_entry_type(entry, j, BCH_JSET_ENTRY_datetime) { 73 + struct jset_entry_datetime *datetime = 74 + container_of(entry, struct jset_entry_datetime, entry); 75 + bch2_prt_datetime(out, le64_to_cpu(datetime->seconds)); 76 + break; 77 } 78 } 79 ··· 64 struct journal_replay *j) 65 { 66 prt_printf(out, "seq %llu ", le64_to_cpu(j->j.seq)); 67 + bch2_journal_datetime_to_text(out, &j->j); 68 + prt_char(out, ' '); 69 bch2_journal_ptrs_to_text(out, c, j); 70 } 71 72 static struct nonce journal_nonce(const struct jset *jset) ··· 188 journal_entry_radix_idx(c, le64_to_cpu(j->seq)), 189 GFP_KERNEL); 190 if (!_i) 191 + return bch_err_throw(c, ENOMEM_journal_entry_add); 192 193 /* 194 * Duplicate journal entries? If so we want the one that didn't have a ··· 231 replace: 232 i = kvmalloc(offsetof(struct journal_replay, j) + bytes, GFP_KERNEL); 233 if (!i) 234 + return bch_err_throw(c, ENOMEM_journal_entry_add); 235 236 darray_init(&i->ptrs); 237 i->csum_good = entry_ptr.csum_good; ··· 311 bch2_sb_error_count(c, BCH_FSCK_ERR_##_err); \ 312 if (bch2_fs_inconsistent(c, \ 313 "corrupt metadata before write: %s\n", _buf.buf)) {\ 314 + ret = bch_err_throw(c, fsck_errors_not_fixed); \ 315 goto fsck_err; \ 316 } \ 317 break; \ ··· 418 bool first = true; 419 420 jset_entry_for_each_key(entry, k) { 421 + /* We may be called on entries that haven't been validated: */ 422 + if (!k->k.u64s) 423 + break; 424 + 425 if (!first) { 426 prt_newline(out); 427 bch2_prt_jset_entry_type(out, entry->type); ··· 1005 size_t size; 1006 }; 1007 1008 + static int journal_read_buf_realloc(struct bch_fs *c, struct journal_read_buf *b, 1009 size_t new_size) 1010 { 1011 void *n; 1012 1013 /* the bios are sized for this many pages, max: */ 1014 if (new_size > JOURNAL_ENTRY_SIZE_MAX) 1015 + return bch_err_throw(c, ENOMEM_journal_read_buf_realloc); 1016 1017 new_size = roundup_pow_of_two(new_size); 1018 n = kvmalloc(new_size, GFP_KERNEL); 1019 if (!n) 1020 + return bch_err_throw(c, ENOMEM_journal_read_buf_realloc); 1021 1022 kvfree(b->data); 1023 b->data = n; ··· 1037 u64 offset = bucket_to_sector(ca, ja->buckets[bucket]), 1038 end = offset + ca->mi.bucket_size; 1039 bool saw_bad = false, csum_good; 1040 int ret = 0; 1041 1042 pr_debug("reading %u", bucket); ··· 1053 1054 bio = bio_kmalloc(nr_bvecs, GFP_KERNEL); 1055 if (!bio) 1056 + return bch_err_throw(c, ENOMEM_journal_read_bucket); 1057 bio_init(bio, ca->disk_sb.bdev, bio->bi_inline_vecs, nr_bvecs, REQ_OP_READ); 1058 1059 bio->bi_iter.bi_sector = offset; ··· 1064 kfree(bio); 1065 1066 if (!ret && bch2_meta_read_fault("journal")) 1067 + ret = bch_err_throw(c, EIO_fault_injected); 1068 1069 bch2_account_io_completion(ca, BCH_MEMBER_ERROR_read, 1070 submit_time, !ret); ··· 1078 * found on a different device, and missing or 1079 * no journal entries 
will be handled later 1080 */ 1081 + return 0; 1082 } 1083 1084 j = buf->data; ··· 1092 break; 1093 case JOURNAL_ENTRY_REREAD: 1094 if (vstruct_bytes(j) > buf->size) { 1095 + ret = journal_read_buf_realloc(c, buf, 1096 vstruct_bytes(j)); 1097 if (ret) 1098 + return ret; 1099 } 1100 goto reread; 1101 case JOURNAL_ENTRY_NONE: 1102 if (!saw_bad) 1103 + return 0; 1104 /* 1105 * On checksum error we don't really trust the size 1106 * field of the journal entry we read, so try reading ··· 1109 sectors = block_sectors(c); 1110 goto next_block; 1111 default: 1112 + return ret; 1113 } 1114 1115 if (le64_to_cpu(j->seq) > ja->highest_seq_found) { ··· 1126 * bucket: 1127 */ 1128 if (le64_to_cpu(j->seq) < ja->bucket_seq[bucket]) 1129 + return 0; 1130 1131 ja->bucket_seq[bucket] = le64_to_cpu(j->seq); 1132 1133 struct bch_csum csum; 1134 csum_good = jset_csum_good(c, j, &csum); 1135 1136 bch2_account_io_completion(ca, BCH_MEMBER_ERROR_checksum, 0, csum_good); 1137 1138 if (!csum_good) { 1139 + /* 1140 + * Don't print an error here, we'll print the error 1141 + * later if we need this journal entry 1142 + */ 1143 saw_bad = true; 1144 } 1145 ··· 1153 mutex_lock(&jlist->lock); 1154 ret = journal_entry_add(c, ca, (struct journal_ptr) { 1155 .csum_good = csum_good, 1156 + .csum = csum, 1157 .dev = ca->dev_idx, 1158 .bucket = bucket, 1159 .bucket_offset = offset - ··· 1167 case JOURNAL_ENTRY_ADD_OUT_OF_RANGE: 1168 break; 1169 default: 1170 + return ret; 1171 } 1172 next_block: 1173 pr_debug("next"); ··· 1176 j = ((void *) j) + (sectors << 9); 1177 } 1178 1179 + return 0; 1180 } 1181 1182 static CLOSURE_CALLBACK(bch2_journal_read_device) ··· 1197 if (!ja->nr) 1198 goto out; 1199 1200 + ret = journal_read_buf_realloc(c, &buf, PAGE_SIZE); 1201 if (ret) 1202 goto err; 1203 ··· 1229 goto out; 1230 } 1231 1232 + noinline_for_stack 1233 + static void bch2_journal_print_checksum_error(struct bch_fs *c, struct journal_replay *j) 1234 + { 1235 + struct printbuf buf = PRINTBUF; 1236 + enum bch_csum_type csum_type = JSET_CSUM_TYPE(&j->j); 1237 + bool have_good = false; 1238 + 1239 + prt_printf(&buf, "invalid journal checksum(s) at seq %llu ", le64_to_cpu(j->j.seq)); 1240 + bch2_journal_datetime_to_text(&buf, &j->j); 1241 + prt_newline(&buf); 1242 + 1243 + darray_for_each(j->ptrs, ptr) 1244 + if (!ptr->csum_good) { 1245 + bch2_journal_ptr_to_text(&buf, c, ptr); 1246 + prt_char(&buf, ' '); 1247 + bch2_csum_to_text(&buf, csum_type, ptr->csum); 1248 + prt_newline(&buf); 1249 + } else { 1250 + have_good = true; 1251 + } 1252 + 1253 + prt_printf(&buf, "should be "); 1254 + bch2_csum_to_text(&buf, csum_type, j->j.csum); 1255 + 1256 + if (have_good) 1257 + prt_printf(&buf, "\n(had good copy on another device)"); 1258 + 1259 + bch2_print_str(c, KERN_ERR, buf.buf); 1260 + printbuf_exit(&buf); 1261 + } 1262 + 1263 + noinline_for_stack 1264 + static int bch2_journal_check_for_missing(struct bch_fs *c, u64 start_seq, u64 end_seq) 1265 + { 1266 + struct printbuf buf = PRINTBUF; 1267 + int ret = 0; 1268 + 1269 + struct genradix_iter radix_iter; 1270 + struct journal_replay *i, **_i, *prev = NULL; 1271 + u64 seq = start_seq; 1272 + 1273 + genradix_for_each(&c->journal_entries, radix_iter, _i) { 1274 + i = *_i; 1275 + 1276 + if (journal_replay_ignore(i)) 1277 + continue; 1278 + 1279 + BUG_ON(seq > le64_to_cpu(i->j.seq)); 1280 + 1281 + while (seq < le64_to_cpu(i->j.seq)) { 1282 + while (seq < le64_to_cpu(i->j.seq) && 1283 + bch2_journal_seq_is_blacklisted(c, seq, false)) 1284 + seq++; 1285 + 1286 + if (seq == le64_to_cpu(i->j.seq)) 1287 + 
break; 1288 + 1289 + u64 missing_start = seq; 1290 + 1291 + while (seq < le64_to_cpu(i->j.seq) && 1292 + !bch2_journal_seq_is_blacklisted(c, seq, false)) 1293 + seq++; 1294 + 1295 + u64 missing_end = seq - 1; 1296 + 1297 + printbuf_reset(&buf); 1298 + prt_printf(&buf, "journal entries %llu-%llu missing! (replaying %llu-%llu)", 1299 + missing_start, missing_end, 1300 + start_seq, end_seq); 1301 + 1302 + prt_printf(&buf, "\nprev at "); 1303 + if (prev) { 1304 + bch2_journal_ptrs_to_text(&buf, c, prev); 1305 + prt_printf(&buf, " size %zu", vstruct_sectors(&prev->j, c->block_bits)); 1306 + } else 1307 + prt_printf(&buf, "(none)"); 1308 + 1309 + prt_printf(&buf, "\nnext at "); 1310 + bch2_journal_ptrs_to_text(&buf, c, i); 1311 + prt_printf(&buf, ", continue?"); 1312 + 1313 + fsck_err(c, journal_entries_missing, "%s", buf.buf); 1314 + } 1315 + 1316 + prev = i; 1317 + seq++; 1318 + } 1319 + fsck_err: 1320 + printbuf_exit(&buf); 1321 + return ret; 1322 + } 1323 + 1324 int bch2_journal_read(struct bch_fs *c, 1325 u64 *last_seq, 1326 u64 *blacklist_seq, 1327 u64 *start_seq) 1328 { 1329 struct journal_list jlist; 1330 + struct journal_replay *i, **_i; 1331 struct genradix_iter radix_iter; 1332 struct printbuf buf = PRINTBUF; 1333 bool degraded = false, last_write_torn = false; ··· 1326 return 0; 1327 } 1328 1329 + printbuf_reset(&buf); 1330 + prt_printf(&buf, "journal read done, replaying entries %llu-%llu", 1331 + *last_seq, *blacklist_seq - 1); 1332 if (*start_seq != *blacklist_seq) 1333 + prt_printf(&buf, " (unflushed %llu-%llu)", *blacklist_seq, *start_seq - 1); 1334 + bch_info(c, "%s", buf.buf); 1335 1336 /* Drop blacklisted entries and entries older than last_seq: */ 1337 genradix_for_each(&c->journal_entries, radix_iter, _i) { ··· 1354 } 1355 } 1356 1357 + ret = bch2_journal_check_for_missing(c, *last_seq, *blacklist_seq - 1); 1358 + if (ret) 1359 + goto err; 1360 1361 genradix_for_each(&c->journal_entries, radix_iter, _i) { 1362 union bch_replicas_padded replicas = { ··· 1416 if (journal_replay_ignore(i)) 1417 continue; 1418 1419 + /* 1420 + * Don't print checksum errors until we know we're going to use 1421 + * a given journal entry: 1422 + */ 1423 + darray_for_each(i->ptrs, ptr) 1424 + if (!ptr->csum_good) { 1425 + bch2_journal_print_checksum_error(c, i); 1426 + break; 1427 + } 1428 1429 ret = jset_validate(c, 1430 bch2_dev_have_ref(c, i->ptrs.data[0].dev), ··· 1467 { 1468 struct bch_fs *c = container_of(j, struct bch_fs, journal); 1469 1470 + guard(rcu)(); 1471 darray_for_each(*devs, i) { 1472 struct bch_dev *ca = rcu_dereference(c->devs[*i]); 1473 if (!ca) ··· 1489 ja->bucket_seq[ja->cur_idx] = le64_to_cpu(seq); 1490 } 1491 } 1492 } 1493 1494 static void __journal_write_alloc(struct journal *j, ··· 1559 1560 retry_target: 1561 devs = target_rw_devs(c, BCH_DATA_journal, target); 1562 + bch2_dev_alloc_list(c, &j->wp.stripe, &devs, &devs_sorted); 1563 retry_alloc: 1564 __journal_write_alloc(j, w, &devs_sorted, sectors, replicas, replicas_want); 1565 ··· 1580 } 1581 done: 1582 BUG_ON(bkey_val_u64s(&w->key.k) > BCH_REPLICAS_MAX); 1583 + 1584 + #if 0 1585 + /* 1586 + * XXX: we need a way to alert the user when we go degraded for any 1587 + * reason 1588 + */ 1589 + if (*replicas < min(replicas_want, 1590 + dev_mask_nr(&c->rw_devs[BCH_DATA_free]))) { 1591 + } 1592 + #endif 1593 1594 return *replicas >= replicas_need ? 
0 : -BCH_ERR_insufficient_journal_devices; 1595 } ··· 1628 : j->noflush_write_time, j->write_start_time); 1629 1630 if (!w->devs_written.nr) { 1631 + err = bch_err_throw(c, journal_write_err); 1632 } else { 1633 bch2_devlist_to_replicas(&replicas.e, BCH_DATA_journal, 1634 w->devs_written); ··· 2058 struct journal *j = container_of(w, struct journal, buf[w->idx]); 2059 struct bch_fs *c = container_of(j, struct bch_fs, journal); 2060 union bch_replicas_padded replicas; 2061 + unsigned nr_rw_members = dev_mask_nr(&c->rw_devs[BCH_DATA_free]); 2062 int ret; 2063 2064 BUG_ON(BCH_SB_CLEAN(c->disk_sb.sb));
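journal_io.c pulls the checksum-error and missing-entry reporting out into noinline_for_stack helpers, so the printbuf-heavy locals only occupy stack space on the rare error path rather than in every caller. A tiny illustration of the attribute, with made-up demo names and a fixed buffer standing in for the printbuf:

    #include <linux/compiler.h>
    #include <linux/kernel.h>
    #include <linux/printk.h>
    #include <linux/types.h>

    /*
     * noinline_for_stack prevents this helper from being inlined into (and
     * therefore inflating) the caller's stack frame; the large buffer only
     * exists while the error is actually being reported.
     */
    noinline_for_stack
    static void demo_report_bad_entry(u64 seq)
    {
            char msg[256];

            snprintf(msg, sizeof(msg), "bad journal entry at seq %llu", seq);
            pr_err("%s\n", msg);
    }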
+1
fs/bcachefs/journal_io.h
··· 9 10 struct journal_ptr { 11 bool csum_good; 12 u8 dev; 13 u32 bucket; 14 u32 bucket_offset;
··· 9 10 struct journal_ptr { 11 bool csum_good; 12 + struct bch_csum csum; 13 u8 dev; 14 u32 bucket; 15 u32 bucket_offset;
+18 -26
fs/bcachefs/journal_reclaim.c
··· 83 journal_dev_space_available(struct journal *j, struct bch_dev *ca, 84 enum journal_space_from from) 85 { 86 struct journal_device *ja = &ca->journal; 87 unsigned sectors, buckets, unwritten; 88 u64 seq; 89 90 if (from == journal_space_total) 91 return (struct journal_space) { 92 - .next_entry = ca->mi.bucket_size, 93 - .total = ca->mi.bucket_size * ja->nr, 94 }; 95 96 buckets = bch2_journal_dev_buckets_available(j, ja, from); 97 - sectors = ja->sectors_free; 98 99 /* 100 * We that we don't allocate the space for a journal entry ··· 111 continue; 112 113 /* entry won't fit on this device, skip: */ 114 - if (unwritten > ca->mi.bucket_size) 115 continue; 116 117 if (unwritten >= sectors) { ··· 121 } 122 123 buckets--; 124 - sectors = ca->mi.bucket_size; 125 } 126 127 sectors -= unwritten; ··· 129 130 if (sectors < ca->mi.bucket_size && buckets) { 131 buckets--; 132 - sectors = ca->mi.bucket_size; 133 } 134 135 return (struct journal_space) { 136 .next_entry = sectors, 137 - .total = sectors + buckets * ca->mi.bucket_size, 138 }; 139 } 140 ··· 148 149 BUG_ON(nr_devs_want > ARRAY_SIZE(dev_space)); 150 151 - rcu_read_lock(); 152 for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) { 153 if (!ca->journal.nr || 154 !ca->mi.durability) ··· 165 166 array_insert_item(dev_space, nr_devs, pos, space); 167 } 168 - rcu_read_unlock(); 169 170 if (nr_devs < nr_devs_want) 171 return (struct journal_space) { 0, 0 }; ··· 189 int ret = 0; 190 191 lockdep_assert_held(&j->lock); 192 193 - rcu_read_lock(); 194 for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) { 195 struct journal_device *ja = &ca->journal; 196 ··· 210 max_entry_size = min_t(unsigned, max_entry_size, ca->mi.bucket_size); 211 nr_online++; 212 } 213 - rcu_read_unlock(); 214 215 j->can_discard = can_discard; 216 ··· 220 prt_printf(&buf, "insufficient writeable journal devices available: have %u, need %u\n" 221 "rw journal devs:", nr_online, metadata_replicas_required(c)); 222 223 - rcu_read_lock(); 224 for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) 225 prt_printf(&buf, " %s", ca->name); 226 - rcu_read_unlock(); 227 228 bch_err(c, "%s", buf.buf); 229 printbuf_exit(&buf); 230 } 231 - ret = -BCH_ERR_insufficient_journal_devices; 232 goto out; 233 } 234 ··· 240 total = j->space[journal_space_total].total; 241 242 if (!j->space[journal_space_discarded].next_entry) 243 - ret = -BCH_ERR_journal_full; 244 245 if ((j->space[journal_space_clean_ondisk].next_entry < 246 j->space[journal_space_clean_ondisk].total) && ··· 253 bch2_journal_set_watermark(j); 254 out: 255 j->cur_entry_sectors = !ret 256 - ? round_down(j->space[journal_space_discarded].next_entry, 257 - block_sectors(c)) 258 : 0; 259 j->cur_entry_error = ret; 260 ··· 621 struct bch_fs *c = container_of(j, struct bch_fs, journal); 622 u64 seq_to_flush = 0; 623 624 - spin_lock(&j->lock); 625 626 - rcu_read_lock(); 627 for_each_rw_member_rcu(c, ca) { 628 struct journal_device *ja = &ca->journal; 629 unsigned nr_buckets, bucket_to_flush; ··· 638 seq_to_flush = max(seq_to_flush, 639 ja->bucket_seq[bucket_to_flush]); 640 } 641 - rcu_read_unlock(); 642 643 /* Also flush if the pin fifo is more than half full */ 644 - seq_to_flush = max_t(s64, seq_to_flush, 645 - (s64) journal_cur_seq(j) - 646 - (j->pin.size >> 1)); 647 - spin_unlock(&j->lock); 648 - 649 - return seq_to_flush; 650 } 651 652 /**
··· 83 journal_dev_space_available(struct journal *j, struct bch_dev *ca, 84 enum journal_space_from from) 85 { 86 + struct bch_fs *c = container_of(j, struct bch_fs, journal); 87 struct journal_device *ja = &ca->journal; 88 unsigned sectors, buckets, unwritten; 89 + unsigned bucket_size_aligned = round_down(ca->mi.bucket_size, block_sectors(c)); 90 u64 seq; 91 92 if (from == journal_space_total) 93 return (struct journal_space) { 94 + .next_entry = bucket_size_aligned, 95 + .total = bucket_size_aligned * ja->nr, 96 }; 97 98 buckets = bch2_journal_dev_buckets_available(j, ja, from); 99 + sectors = round_down(ja->sectors_free, block_sectors(c)); 100 101 /* 102 * We that we don't allocate the space for a journal entry ··· 109 continue; 110 111 /* entry won't fit on this device, skip: */ 112 + if (unwritten > bucket_size_aligned) 113 continue; 114 115 if (unwritten >= sectors) { ··· 119 } 120 121 buckets--; 122 + sectors = bucket_size_aligned; 123 } 124 125 sectors -= unwritten; ··· 127 128 if (sectors < ca->mi.bucket_size && buckets) { 129 buckets--; 130 + sectors = bucket_size_aligned; 131 } 132 133 return (struct journal_space) { 134 .next_entry = sectors, 135 + .total = sectors + buckets * bucket_size_aligned, 136 }; 137 } 138 ··· 146 147 BUG_ON(nr_devs_want > ARRAY_SIZE(dev_space)); 148 149 for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) { 150 if (!ca->journal.nr || 151 !ca->mi.durability) ··· 164 165 array_insert_item(dev_space, nr_devs, pos, space); 166 } 167 168 if (nr_devs < nr_devs_want) 169 return (struct journal_space) { 0, 0 }; ··· 189 int ret = 0; 190 191 lockdep_assert_held(&j->lock); 192 + guard(rcu)(); 193 194 for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) { 195 struct journal_device *ja = &ca->journal; 196 ··· 210 max_entry_size = min_t(unsigned, max_entry_size, ca->mi.bucket_size); 211 nr_online++; 212 } 213 214 j->can_discard = can_discard; 215 ··· 221 prt_printf(&buf, "insufficient writeable journal devices available: have %u, need %u\n" 222 "rw journal devs:", nr_online, metadata_replicas_required(c)); 223 224 for_each_member_device_rcu(c, ca, &c->rw_devs[BCH_DATA_journal]) 225 prt_printf(&buf, " %s", ca->name); 226 227 bch_err(c, "%s", buf.buf); 228 printbuf_exit(&buf); 229 } 230 + ret = bch_err_throw(c, insufficient_journal_devices); 231 goto out; 232 } 233 ··· 243 total = j->space[journal_space_total].total; 244 245 if (!j->space[journal_space_discarded].next_entry) 246 + ret = bch_err_throw(c, journal_full); 247 248 if ((j->space[journal_space_clean_ondisk].next_entry < 249 j->space[journal_space_clean_ondisk].total) && ··· 256 bch2_journal_set_watermark(j); 257 out: 258 j->cur_entry_sectors = !ret 259 + ? j->space[journal_space_discarded].next_entry 260 : 0; 261 j->cur_entry_error = ret; 262 ··· 625 struct bch_fs *c = container_of(j, struct bch_fs, journal); 626 u64 seq_to_flush = 0; 627 628 + guard(spinlock)(&j->lock); 629 + guard(rcu)(); 630 631 for_each_rw_member_rcu(c, ca) { 632 struct journal_device *ja = &ca->journal; 633 unsigned nr_buckets, bucket_to_flush; ··· 642 seq_to_flush = max(seq_to_flush, 643 ja->bucket_seq[bucket_to_flush]); 644 } 645 646 /* Also flush if the pin fifo is more than half full */ 647 + return max_t(s64, seq_to_flush, 648 + (s64) journal_cur_seq(j) - 649 + (j->pin.size >> 1)); 650 } 651 652 /**
+1 -1
fs/bcachefs/journal_sb.c
··· 210 j = bch2_sb_field_resize(&ca->disk_sb, journal_v2, 211 (sizeof(*j) + sizeof(j->d[0]) * nr_compacted) / sizeof(u64)); 212 if (!j) 213 - return -BCH_ERR_ENOSPC_sb_journal; 214 215 bch2_sb_field_delete(&ca->disk_sb, BCH_SB_FIELD_journal); 216
··· 210 j = bch2_sb_field_resize(&ca->disk_sb, journal_v2, 211 (sizeof(*j) + sizeof(j->d[0]) * nr_compacted) / sizeof(u64)); 212 if (!j) 213 + return bch_err_throw(c, ENOSPC_sb_journal); 214 215 bch2_sb_field_delete(&ca->disk_sb, BCH_SB_FIELD_journal); 216
+2 -2
fs/bcachefs/journal_seq_blacklist.c
··· 78 bl = bch2_sb_field_resize(&c->disk_sb, journal_seq_blacklist, 79 sb_blacklist_u64s(nr + 1)); 80 if (!bl) { 81 - ret = -BCH_ERR_ENOSPC_sb_journal_seq_blacklist; 82 goto out; 83 } 84 ··· 152 153 t = kzalloc(struct_size(t, entries, nr), GFP_KERNEL); 154 if (!t) 155 - return -BCH_ERR_ENOMEM_blacklist_table_init; 156 157 t->nr = nr; 158
··· 78 bl = bch2_sb_field_resize(&c->disk_sb, journal_seq_blacklist, 79 sb_blacklist_u64s(nr + 1)); 80 if (!bl) { 81 + ret = bch_err_throw(c, ENOSPC_sb_journal_seq_blacklist); 82 goto out; 83 } 84 ··· 152 153 t = kzalloc(struct_size(t, entries, nr), GFP_KERNEL); 154 if (!t) 155 + return bch_err_throw(c, ENOMEM_blacklist_table_init); 156 157 t->nr = nr; 158
+2 -4
fs/bcachefs/lru.c
··· 145 case BCH_LRU_fragmentation: { 146 a = bch2_alloc_to_v4(k, &a_convert); 147 148 - rcu_read_lock(); 149 struct bch_dev *ca = bch2_dev_rcu_noerror(c, k.k->p.inode); 150 - u64 idx = ca 151 ? alloc_lru_idx_fragmentation(*a, ca) 152 : 0; 153 - rcu_read_unlock(); 154 - return idx; 155 } 156 case BCH_LRU_stripes: 157 return k.k->type == KEY_TYPE_stripe
··· 145 case BCH_LRU_fragmentation: { 146 a = bch2_alloc_to_v4(k, &a_convert); 147 148 + guard(rcu)(); 149 struct bch_dev *ca = bch2_dev_rcu_noerror(c, k.k->p.inode); 150 + return ca 151 ? alloc_lru_idx_fragmentation(*a, ca) 152 : 0; 153 } 154 case BCH_LRU_stripes: 155 return k.k->type == KEY_TYPE_stripe
+2 -2
fs/bcachefs/migrate.c
··· 35 nr_good = bch2_bkey_durability(c, k.s_c); 36 if ((!nr_good && !(flags & lost)) || 37 (nr_good < replicas && !(flags & degraded))) 38 - return -BCH_ERR_remove_would_lose_data; 39 40 return 0; 41 } ··· 156 157 /* don't handle this yet: */ 158 if (flags & BCH_FORCE_IF_METADATA_LOST) 159 - return -BCH_ERR_remove_with_metadata_missing_unimplemented; 160 161 trans = bch2_trans_get(c); 162 bch2_bkey_buf_init(&k);
··· 35 nr_good = bch2_bkey_durability(c, k.s_c); 36 if ((!nr_good && !(flags & lost)) || 37 (nr_good < replicas && !(flags & degraded))) 38 + return bch_err_throw(c, remove_would_lose_data); 39 40 return 0; 41 } ··· 156 157 /* don't handle this yet: */ 158 if (flags & BCH_FORCE_IF_METADATA_LOST) 159 + return bch_err_throw(c, remove_with_metadata_missing_unimplemented); 160 161 trans = bch2_trans_get(c); 162 bch2_bkey_buf_init(&k);
+92 -42
fs/bcachefs/move.c
··· 38 NULL 39 }; 40 41 - static void trace_io_move2(struct bch_fs *c, struct bkey_s_c k, 42 - struct bch_io_opts *io_opts, 43 - struct data_update_opts *data_opts) 44 - { 45 - if (trace_io_move_enabled()) { 46 - struct printbuf buf = PRINTBUF; 47 48 - bch2_bkey_val_to_text(&buf, c, k); 49 - prt_newline(&buf); 50 - bch2_data_update_opts_to_text(&buf, c, io_opts, data_opts); 51 - trace_io_move(c, buf.buf); 52 - printbuf_exit(&buf); 53 - } 54 } 55 56 - static void trace_io_move_read2(struct bch_fs *c, struct bkey_s_c k) 57 { 58 - if (trace_io_move_read_enabled()) { 59 - struct printbuf buf = PRINTBUF; 60 61 - bch2_bkey_val_to_text(&buf, c, k); 62 - trace_io_move_read(c, buf.buf); 63 - printbuf_exit(&buf); 64 } 65 } 66 67 struct moving_io { ··· 342 struct bch_fs *c = trans->c; 343 int ret = -ENOMEM; 344 345 - trace_io_move2(c, k, &io_opts, &data_opts); 346 this_cpu_add(c->counters[BCH_COUNTER_io_move], k.k->size); 347 348 if (ctxt->stats) ··· 359 return 0; 360 } 361 362 - /* 363 - * Before memory allocations & taking nocow locks in 364 - * bch2_data_update_init(): 365 - */ 366 - bch2_trans_unlock(trans); 367 - 368 - struct moving_io *io = kzalloc(sizeof(struct moving_io), GFP_KERNEL); 369 if (!io) 370 goto err; 371 372 INIT_LIST_HEAD(&io->io_list); 373 io->write.ctxt = ctxt; ··· 385 386 io->write.op.c = c; 387 io->write.data_opts = data_opts; 388 389 ret = bch2_data_update_bios_init(&io->write, c, &io_opts); 390 if (ret) ··· 409 atomic_inc(&io->b->count); 410 } 411 412 - trace_io_move_read2(c, k); 413 414 mutex_lock(&ctxt->lock); 415 atomic_add(io->read_sectors, &ctxt->read_sectors); ··· 436 err_free: 437 kfree(io); 438 err: 439 - if (bch2_err_matches(ret, BCH_ERR_data_update_done)) 440 - return 0; 441 - 442 if (bch2_err_matches(ret, EROFS) || 443 bch2_err_matches(ret, BCH_ERR_transaction_restart)) 444 return ret; ··· 451 trace_io_move_start_fail(c, buf.buf); 452 printbuf_exit(&buf); 453 } 454 return ret; 455 } 456 ··· 542 bch2_inode_opts_get(io_opts, c, &inode); 543 } 544 bch2_trans_iter_exit(trans, &inode_iter); 545 out: 546 return bch2_get_update_rebalance_opts(trans, io_opts, extent_iter, extent_k); 547 } ··· 957 } 958 959 struct data_update_opts data_opts = {}; 960 - if (!pred(c, arg, bp.v->btree_id, k, &io_opts, &data_opts)) { 961 bch2_trans_iter_exit(trans, &iter); 962 goto next; 963 } ··· 971 if (data_opts.scrub && 972 !bch2_dev_idx_is_online(c, data_opts.read_dev)) { 973 bch2_trans_iter_exit(trans, &iter); 974 - ret = -BCH_ERR_device_offline; 975 break; 976 } 977 ··· 1046 return ret; 1047 } 1048 1049 - struct evacuate_bucket_arg { 1050 - struct bpos bucket; 1051 - int gen; 1052 - struct data_update_opts data_opts; 1053 - }; 1054 - 1055 static bool evacuate_bucket_pred(struct bch_fs *c, void *_arg, 1056 enum btree_id btree, struct bkey_s_c k, 1057 struct bch_io_opts *io_opts, ··· 1072 struct bpos bucket, int gen, 1073 struct data_update_opts data_opts) 1074 { 1075 struct evacuate_bucket_arg arg = { bucket, gen, data_opts, }; 1076 1077 return __bch2_move_data_phys(ctxt, bucket_in_flight, 1078 bucket.inode, ··· 1176 ? 
c->opts.metadata_replicas 1177 : io_opts->data_replicas; 1178 1179 - rcu_read_lock(); 1180 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1181 unsigned i = 0; 1182 bkey_for_each_ptr(ptrs, ptr) { ··· 1186 data_opts->kill_ptrs |= BIT(i); 1187 i++; 1188 } 1189 - rcu_read_unlock(); 1190 1191 if (!data_opts->kill_ptrs && 1192 (!nr_good || nr_good >= replicas)) ··· 1293 struct extent_ptr_decoded p; 1294 unsigned i = 0; 1295 1296 - rcu_read_lock(); 1297 bkey_for_each_ptr_decode(k.k, bch2_bkey_ptrs_c(k), p, entry) { 1298 unsigned d = bch2_extent_ptr_durability(c, &p); 1299 ··· 1304 1305 i++; 1306 } 1307 - rcu_read_unlock(); 1308 1309 return data_opts->kill_ptrs != 0; 1310 }
··· 38 NULL 39 }; 40 41 + struct evacuate_bucket_arg { 42 + struct bpos bucket; 43 + int gen; 44 + struct data_update_opts data_opts; 45 + }; 46 47 + static bool evacuate_bucket_pred(struct bch_fs *, void *, 48 + enum btree_id, struct bkey_s_c, 49 + struct bch_io_opts *, 50 + struct data_update_opts *); 51 + 52 + static noinline void 53 + trace_io_move2(struct bch_fs *c, struct bkey_s_c k, 54 + struct bch_io_opts *io_opts, 55 + struct data_update_opts *data_opts) 56 + { 57 + struct printbuf buf = PRINTBUF; 58 + 59 + bch2_bkey_val_to_text(&buf, c, k); 60 + prt_newline(&buf); 61 + bch2_data_update_opts_to_text(&buf, c, io_opts, data_opts); 62 + trace_io_move(c, buf.buf); 63 + printbuf_exit(&buf); 64 } 65 66 + static noinline void trace_io_move_read2(struct bch_fs *c, struct bkey_s_c k) 67 { 68 + struct printbuf buf = PRINTBUF; 69 70 + bch2_bkey_val_to_text(&buf, c, k); 71 + trace_io_move_read(c, buf.buf); 72 + printbuf_exit(&buf); 73 + } 74 + 75 + static noinline void 76 + trace_io_move_pred2(struct bch_fs *c, struct bkey_s_c k, 77 + struct bch_io_opts *io_opts, 78 + struct data_update_opts *data_opts, 79 + move_pred_fn pred, void *_arg, bool p) 80 + { 81 + struct printbuf buf = PRINTBUF; 82 + 83 + prt_printf(&buf, "%ps: %u", pred, p); 84 + 85 + if (pred == evacuate_bucket_pred) { 86 + struct evacuate_bucket_arg *arg = _arg; 87 + prt_printf(&buf, " gen=%u", arg->gen); 88 } 89 + 90 + prt_newline(&buf); 91 + bch2_bkey_val_to_text(&buf, c, k); 92 + prt_newline(&buf); 93 + bch2_data_update_opts_to_text(&buf, c, io_opts, data_opts); 94 + trace_io_move_pred(c, buf.buf); 95 + printbuf_exit(&buf); 96 + } 97 + 98 + static noinline void 99 + trace_io_move_evacuate_bucket2(struct bch_fs *c, struct bpos bucket, int gen) 100 + { 101 + struct printbuf buf = PRINTBUF; 102 + 103 + prt_printf(&buf, "bucket: "); 104 + bch2_bpos_to_text(&buf, bucket); 105 + prt_printf(&buf, " gen: %i\n", gen); 106 + 107 + trace_io_move_evacuate_bucket(c, buf.buf); 108 + printbuf_exit(&buf); 109 } 110 111 struct moving_io { ··· 298 struct bch_fs *c = trans->c; 299 int ret = -ENOMEM; 300 301 + if (trace_io_move_enabled()) 302 + trace_io_move2(c, k, &io_opts, &data_opts); 303 this_cpu_add(c->counters[BCH_COUNTER_io_move], k.k->size); 304 305 if (ctxt->stats) ··· 314 return 0; 315 } 316 317 + struct moving_io *io = allocate_dropping_locks(trans, ret, 318 + kzalloc(sizeof(struct moving_io), _gfp)); 319 if (!io) 320 goto err; 321 + 322 + if (ret) 323 + goto err_free; 324 325 INIT_LIST_HEAD(&io->io_list); 326 io->write.ctxt = ctxt; ··· 342 343 io->write.op.c = c; 344 io->write.data_opts = data_opts; 345 + 346 + bch2_trans_unlock(trans); 347 348 ret = bch2_data_update_bios_init(&io->write, c, &io_opts); 349 if (ret) ··· 364 atomic_inc(&io->b->count); 365 } 366 367 + if (trace_io_move_read_enabled()) 368 + trace_io_move_read2(c, k); 369 370 mutex_lock(&ctxt->lock); 371 atomic_add(io->read_sectors, &ctxt->read_sectors); ··· 390 err_free: 391 kfree(io); 392 err: 393 if (bch2_err_matches(ret, EROFS) || 394 bch2_err_matches(ret, BCH_ERR_transaction_restart)) 395 return ret; ··· 408 trace_io_move_start_fail(c, buf.buf); 409 printbuf_exit(&buf); 410 } 411 + 412 + if (bch2_err_matches(ret, BCH_ERR_data_update_done)) 413 + return 0; 414 return ret; 415 } 416 ··· 496 bch2_inode_opts_get(io_opts, c, &inode); 497 } 498 bch2_trans_iter_exit(trans, &inode_iter); 499 + /* seem to be spinning here? 
*/ 500 out: 501 return bch2_get_update_rebalance_opts(trans, io_opts, extent_iter, extent_k); 502 } ··· 910 } 911 912 struct data_update_opts data_opts = {}; 913 + bool p = pred(c, arg, bp.v->btree_id, k, &io_opts, &data_opts); 914 + 915 + if (trace_io_move_pred_enabled()) 916 + trace_io_move_pred2(c, k, &io_opts, &data_opts, 917 + pred, arg, p); 918 + 919 + if (!p) { 920 bch2_trans_iter_exit(trans, &iter); 921 goto next; 922 } ··· 918 if (data_opts.scrub && 919 !bch2_dev_idx_is_online(c, data_opts.read_dev)) { 920 bch2_trans_iter_exit(trans, &iter); 921 + ret = bch_err_throw(c, device_offline); 922 break; 923 } 924 ··· 993 return ret; 994 } 995 996 static bool evacuate_bucket_pred(struct bch_fs *c, void *_arg, 997 enum btree_id btree, struct bkey_s_c k, 998 struct bch_io_opts *io_opts, ··· 1025 struct bpos bucket, int gen, 1026 struct data_update_opts data_opts) 1027 { 1028 + struct bch_fs *c = ctxt->trans->c; 1029 struct evacuate_bucket_arg arg = { bucket, gen, data_opts, }; 1030 + 1031 + count_event(c, io_move_evacuate_bucket); 1032 + if (trace_io_move_evacuate_bucket_enabled()) 1033 + trace_io_move_evacuate_bucket2(c, bucket, gen); 1034 1035 return __bch2_move_data_phys(ctxt, bucket_in_flight, 1036 bucket.inode, ··· 1124 ? c->opts.metadata_replicas 1125 : io_opts->data_replicas; 1126 1127 + guard(rcu)(); 1128 struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k); 1129 unsigned i = 0; 1130 bkey_for_each_ptr(ptrs, ptr) { ··· 1134 data_opts->kill_ptrs |= BIT(i); 1135 i++; 1136 } 1137 1138 if (!data_opts->kill_ptrs && 1139 (!nr_good || nr_good >= replicas)) ··· 1242 struct extent_ptr_decoded p; 1243 unsigned i = 0; 1244 1245 + guard(rcu)(); 1246 bkey_for_each_ptr_decode(k.k, bch2_bkey_ptrs_c(k), p, entry) { 1247 unsigned d = bch2_extent_ptr_durability(c, &p); 1248 ··· 1253 1254 i++; 1255 } 1256 1257 return data_opts->kill_ptrs != 0; 1258 }
+12 -14
fs/bcachefs/movinggc.c
··· 293 { 294 u64 wait = U64_MAX; 295 296 - rcu_read_lock(); 297 for_each_rw_member_rcu(c, ca) 298 wait = min(wait, bch2_copygc_dev_wait_amount(ca)); 299 - rcu_read_unlock(); 300 - 301 return wait; 302 } 303 ··· 319 320 bch2_printbuf_make_room(out, 4096); 321 322 - rcu_read_lock(); 323 out->atomic++; 324 325 - prt_printf(out, "Currently calculated wait:\n"); 326 - for_each_rw_member_rcu(c, ca) { 327 - prt_printf(out, " %s:\t", ca->name); 328 - prt_human_readable_u64(out, bch2_copygc_dev_wait_amount(ca)); 329 - prt_newline(out); 330 } 331 - 332 - struct task_struct *t = rcu_dereference(c->copygc_thread); 333 - if (t) 334 - get_task_struct(t); 335 --out->atomic; 336 - rcu_read_unlock(); 337 338 if (t) { 339 bch2_prt_task_backtrace(out, t, 0, GFP_KERNEL);
··· 293 { 294 u64 wait = U64_MAX; 295 296 + guard(rcu)(); 297 for_each_rw_member_rcu(c, ca) 298 wait = min(wait, bch2_copygc_dev_wait_amount(ca)); 299 return wait; 300 } 301 ··· 321 322 bch2_printbuf_make_room(out, 4096); 323 324 + struct task_struct *t; 325 out->atomic++; 326 + scoped_guard(rcu) { 327 + prt_printf(out, "Currently calculated wait:\n"); 328 + for_each_rw_member_rcu(c, ca) { 329 + prt_printf(out, " %s:\t", ca->name); 330 + prt_human_readable_u64(out, bch2_copygc_dev_wait_amount(ca)); 331 + prt_newline(out); 332 + } 333 334 + t = rcu_dereference(c->copygc_thread); 335 + if (t) 336 + get_task_struct(t); 337 } 338 --out->atomic; 339 340 if (t) { 341 bch2_prt_task_backtrace(out, t, 0, GFP_KERNEL);
+1 -2
fs/bcachefs/movinggc.h
··· 7 8 static inline void bch2_copygc_wakeup(struct bch_fs *c) 9 { 10 - rcu_read_lock(); 11 struct task_struct *p = rcu_dereference(c->copygc_thread); 12 if (p) 13 wake_up_process(p); 14 - rcu_read_unlock(); 15 } 16 17 void bch2_copygc_stop(struct bch_fs *);
··· 7 8 static inline void bch2_copygc_wakeup(struct bch_fs *c) 9 { 10 + guard(rcu)(); 11 struct task_struct *p = rcu_dereference(c->copygc_thread); 12 if (p) 13 wake_up_process(p); 14 } 15 16 void bch2_copygc_stop(struct bch_fs *);
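Most hunks in this series replace open-coded rcu_read_lock()/rcu_read_unlock() pairs with the scope-based guard(rcu)() and scoped_guard(rcu) helpers built on the kernel's cleanup attribute support, so the lock is dropped automatically on every exit path. The snippet below is a minimal userspace sketch of that idea, relying only on the GCC/Clang cleanup attribute; all names here (demo_guard, demo_lock_depth) are invented for illustration and are not the kernel implementation.

#include <stdio.h>

/* A stand-in "lock": just a depth counter we can inspect afterwards to
 * prove the release actually ran. */
static int demo_lock_depth;

typedef struct { int *lock; } demo_guard_t;

static inline demo_guard_t demo_guard_acquire(int *lock)
{
	(*lock)++;
	return (demo_guard_t) { .lock = lock };
}

static inline void demo_guard_release(demo_guard_t *g)
{
	(*g->lock)--;
}

/* The cleanup attribute runs demo_guard_release() when the guard variable
 * leaves scope, so every return path drops the lock -- the same property
 * guard(rcu)() provides in the hunks above. */
#define demo_guard(lockp) \
	demo_guard_t scope_guard_ \
		__attribute__((cleanup(demo_guard_release), unused)) = \
		demo_guard_acquire(lockp)

static int lookup(int key)
{
	demo_guard(&demo_lock_depth);

	if (key < 0)
		return -1;		/* early return: still releases */
	return key * 2;
}

int main(void)
{
	printf("lookup(21) = %d, lock depth after = %d\n",
	       lookup(21), demo_lock_depth);
	return 0;
}

The payoff, visible in bch2_copygc_wakeup() above, is that early returns and error paths no longer need a matching unlock call.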
+7 -14
fs/bcachefs/namei.c
··· 287 } 288 289 if (deleting_subvol && !inode_u->bi_subvol) { 290 - ret = -BCH_ERR_ENOENT_not_subvol; 291 goto err; 292 } 293 ··· 425 } 426 427 ret = bch2_dirent_rename(trans, 428 - src_dir, &src_hash, &src_dir_u->bi_size, 429 - dst_dir, &dst_hash, &dst_dir_u->bi_size, 430 src_name, &src_inum, &src_offset, 431 dst_name, &dst_inum, &dst_offset, 432 mode); ··· 633 break; 634 635 if (!inode.bi_dir && !inode.bi_dir_offset) { 636 - ret = -BCH_ERR_ENOENT_inode_no_backpointer; 637 goto disconnected; 638 } 639 ··· 733 return __bch2_fsck_write_inode(trans, target); 734 } 735 736 - if (bch2_inode_should_have_single_bp(target) && 737 - !fsck_err(trans, inode_wrong_backpointer, 738 - "dirent points to inode that does not point back:\n%s", 739 - (bch2_bkey_val_to_text(&buf, c, d.s_c), 740 - prt_newline(&buf), 741 - bch2_inode_unpacked_to_text(&buf, target), 742 - buf.buf))) 743 - goto err; 744 - 745 struct bkey_s_c_dirent bp_dirent = 746 bch2_bkey_get_iter_typed(trans, &bp_iter, BTREE_ID_dirents, 747 SPOS(target->bi_dir, target->bi_dir_offset, target->bi_snapshot), ··· 759 ret = __bch2_fsck_write_inode(trans, target); 760 } 761 } else { 762 bch2_bkey_val_to_text(&buf, c, d.s_c); 763 prt_newline(&buf); 764 bch2_bkey_val_to_text(&buf, c, bp_dirent.s_c); ··· 849 n->v.d_inum = cpu_to_le64(target->bi_inum); 850 } 851 852 - ret = bch2_trans_update(trans, dirent_iter, &n->k_i, 0); 853 if (ret) 854 goto err; 855 }
··· 287 } 288 289 if (deleting_subvol && !inode_u->bi_subvol) { 290 + ret = bch_err_throw(c, ENOENT_not_subvol); 291 goto err; 292 } 293 ··· 425 } 426 427 ret = bch2_dirent_rename(trans, 428 + src_dir, &src_hash, 429 + dst_dir, &dst_hash, 430 src_name, &src_inum, &src_offset, 431 dst_name, &dst_inum, &dst_offset, 432 mode); ··· 633 break; 634 635 if (!inode.bi_dir && !inode.bi_dir_offset) { 636 + ret = bch_err_throw(trans->c, ENOENT_inode_no_backpointer); 637 goto disconnected; 638 } 639 ··· 733 return __bch2_fsck_write_inode(trans, target); 734 } 735 736 struct bkey_s_c_dirent bp_dirent = 737 bch2_bkey_get_iter_typed(trans, &bp_iter, BTREE_ID_dirents, 738 SPOS(target->bi_dir, target->bi_dir_offset, target->bi_snapshot), ··· 768 ret = __bch2_fsck_write_inode(trans, target); 769 } 770 } else { 771 + printbuf_reset(&buf); 772 bch2_bkey_val_to_text(&buf, c, d.s_c); 773 prt_newline(&buf); 774 bch2_bkey_val_to_text(&buf, c, bp_dirent.s_c); ··· 857 n->v.d_inum = cpu_to_le64(target->bi_inum); 858 } 859 860 + ret = bch2_trans_update(trans, dirent_iter, &n->k_i, 861 + BTREE_UPDATE_internal_snapshot_node); 862 if (ret) 863 goto err; 864 }
+8
fs/bcachefs/printbuf.h
··· 140 .size = _size, \ 141 }) 142 143 /* 144 * Returns size remaining of output buffer: 145 */
··· 140 .size = _size, \ 141 }) 142 143 + static inline struct printbuf bch2_printbuf_init(void) 144 + { 145 + return PRINTBUF; 146 + } 147 + 148 + DEFINE_CLASS(printbuf, struct printbuf, 149 + bch2_printbuf_exit(&_T), bch2_printbuf_init(), void) 150 + 151 /* 152 * Returns size remaining of output buffer: 153 */
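The printbuf.h hunk wires struct printbuf into the kernel's DEFINE_CLASS() machinery: the class constructor must return the object by value, which is why the patch first adds bch2_printbuf_init() as a thin wrapper around the PRINTBUF initializer. Below is a compilable userspace sketch of that constructor/destructor pairing; the names (sketch_printbuf, SKETCH_CLASS_printbuf) are invented and make no claim to match the cleanup.h internals.

#include <stdio.h>
#include <stdlib.h>

struct sketch_printbuf {
	char	*buf;
	size_t	 size;
};

/* By-value constructor, playing the role of bch2_printbuf_init()/PRINTBUF. */
static struct sketch_printbuf sketch_printbuf_init(void)
{
	return (struct sketch_printbuf) { .buf = calloc(1, 128), .size = 128 };
}

/* Destructor, playing the role of bch2_printbuf_exit(). */
static void sketch_printbuf_exit(struct sketch_printbuf *b)
{
	free(b->buf);
}

/* Pairs the two so the destructor runs at scope exit, in the same spirit
 * as DEFINE_CLASS(printbuf, ...) above (GCC/Clang cleanup attribute). */
#define SKETCH_CLASS_printbuf(var)					\
	struct sketch_printbuf var					\
		__attribute__((cleanup(sketch_printbuf_exit), unused)) =\
		sketch_printbuf_init()

int main(void)
{
	SKETCH_CLASS_printbuf(buf);

	snprintf(buf.buf, buf.size, "freed automatically at scope exit");
	puts(buf.buf);
	return 0;
}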
+3 -3
fs/bcachefs/quota.c
··· 527 struct bch_sb_field_quota *sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb); 528 if (!sb_quota) { 529 mutex_unlock(&c->sb_lock); 530 - return -BCH_ERR_ENOSPC_sb_quota; 531 } 532 533 bch2_sb_quota_read(c); ··· 572 mutex_lock(&c->sb_lock); 573 sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb); 574 if (!sb_quota) { 575 - ret = -BCH_ERR_ENOSPC_sb_quota; 576 goto unlock; 577 } 578 ··· 726 mutex_lock(&c->sb_lock); 727 sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb); 728 if (!sb_quota) { 729 - ret = -BCH_ERR_ENOSPC_sb_quota; 730 goto unlock; 731 } 732
··· 527 struct bch_sb_field_quota *sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb); 528 if (!sb_quota) { 529 mutex_unlock(&c->sb_lock); 530 + return bch_err_throw(c, ENOSPC_sb_quota); 531 } 532 533 bch2_sb_quota_read(c); ··· 572 mutex_lock(&c->sb_lock); 573 sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb); 574 if (!sb_quota) { 575 + ret = bch_err_throw(c, ENOSPC_sb_quota); 576 goto unlock; 577 } 578 ··· 726 mutex_lock(&c->sb_lock); 727 sb_quota = bch2_sb_get_or_create_quota(&c->disk_sb); 728 if (!sb_quota) { 729 + ret = bch_err_throw(c, ENOSPC_sb_quota); 730 goto unlock; 731 } 732
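The quota.c hunk is typical of the bch_err_throw() conversions throughout this pull: instead of returning a bare -BCH_ERR_* constant, the error is "thrown" through a helper that takes the filesystem, so it can be accounted for (and traced) at the point where it is first generated. A rough userspace sketch of that shape, with invented names (struct ctx, err_throw, ERR_NOSPC) and a plain counter standing in for the real bookkeeping:

#include <stdio.h>

enum { ERR_NOSPC = 1, ERR_NR };

struct ctx {
	unsigned long err_counts[ERR_NR];
};

/* Count the error where it is generated, then hand back the negative code. */
#define err_throw(c, _err) \
	({ (c)->err_counts[_err]++; -(_err); })

static int reserve(struct ctx *c, int have, int want)
{
	if (have < want)
		return err_throw(c, ERR_NOSPC);
	return 0;
}

int main(void)
{
	struct ctx c = {};
	int ret = reserve(&c, 1, 2);

	printf("ret=%d nospc errors=%lu\n", ret, c.err_counts[ERR_NOSPC]);
	return 0;
}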
+14 -13
fs/bcachefs/rebalance.c
··· 80 unsigned ptr_bit = 1; 81 unsigned rewrite_ptrs = 0; 82 83 - rcu_read_lock(); 84 bkey_for_each_ptr(ptrs, ptr) { 85 if (!ptr->cached && !bch2_dev_in_target(c, ptr->dev, opts->background_target)) 86 rewrite_ptrs |= ptr_bit; 87 ptr_bit <<= 1; 88 } 89 - rcu_read_unlock(); 90 91 return rewrite_ptrs; 92 } ··· 134 } 135 incompressible: 136 if (opts->background_target) { 137 - rcu_read_lock(); 138 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 139 if (!p.ptr.cached && 140 !bch2_dev_in_target(c, p.ptr.dev, opts->background_target)) 141 sectors += p.crc.compressed_size; 142 - rcu_read_unlock(); 143 } 144 145 return sectors; ··· 443 if (bch2_err_matches(ret, ENOMEM)) { 444 /* memory allocation failure, wait for some IO to finish */ 445 bch2_move_ctxt_wait_for_io(ctxt); 446 - ret = -BCH_ERR_transaction_restart_nested; 447 } 448 449 if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) ··· 525 r->state = BCH_REBALANCE_waiting; 526 } 527 528 - bch2_kthread_io_clock_wait(clock, r->wait_iotime_end, MAX_SCHEDULE_TIMEOUT); 529 } 530 531 static bool bch2_rebalance_enabled(struct bch_fs *c) ··· 542 struct bch_fs_rebalance *r = &c->rebalance; 543 struct btree_iter rebalance_work_iter, extent_iter = {}; 544 struct bkey_s_c k; 545 int ret = 0; 546 547 bch2_trans_begin(trans); ··· 592 if (!ret && 593 !kthread_should_stop() && 594 !atomic64_read(&r->work_stats.sectors_seen) && 595 - !atomic64_read(&r->scan_stats.sectors_seen)) { 596 bch2_moving_ctxt_flush_all(ctxt); 597 bch2_trans_unlock_long(trans); 598 rebalance_wait(c); ··· 677 } 678 prt_newline(out); 679 680 - rcu_read_lock(); 681 - struct task_struct *t = rcu_dereference(c->rebalance.thread); 682 - if (t) 683 - get_task_struct(t); 684 - rcu_read_unlock(); 685 686 if (t) { 687 bch2_prt_task_backtrace(out, t, 0, GFP_KERNEL); ··· 795 BTREE_ID_extents, POS_MIN, 796 BTREE_ITER_prefetch| 797 BTREE_ITER_all_snapshots); 798 - return -BCH_ERR_transaction_restart_nested; 799 } 800 801 if (!extent_k.k && !rebalance_k.k)
··· 80 unsigned ptr_bit = 1; 81 unsigned rewrite_ptrs = 0; 82 83 + guard(rcu)(); 84 bkey_for_each_ptr(ptrs, ptr) { 85 if (!ptr->cached && !bch2_dev_in_target(c, ptr->dev, opts->background_target)) 86 rewrite_ptrs |= ptr_bit; 87 ptr_bit <<= 1; 88 } 89 90 return rewrite_ptrs; 91 } ··· 135 } 136 incompressible: 137 if (opts->background_target) { 138 + guard(rcu)(); 139 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) 140 if (!p.ptr.cached && 141 !bch2_dev_in_target(c, p.ptr.dev, opts->background_target)) 142 sectors += p.crc.compressed_size; 143 } 144 145 return sectors; ··· 445 if (bch2_err_matches(ret, ENOMEM)) { 446 /* memory allocation failure, wait for some IO to finish */ 447 bch2_move_ctxt_wait_for_io(ctxt); 448 + ret = bch_err_throw(c, transaction_restart_nested); 449 } 450 451 if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) ··· 527 r->state = BCH_REBALANCE_waiting; 528 } 529 530 + bch2_kthread_io_clock_wait_once(clock, r->wait_iotime_end, MAX_SCHEDULE_TIMEOUT); 531 } 532 533 static bool bch2_rebalance_enabled(struct bch_fs *c) ··· 544 struct bch_fs_rebalance *r = &c->rebalance; 545 struct btree_iter rebalance_work_iter, extent_iter = {}; 546 struct bkey_s_c k; 547 + u32 kick = r->kick; 548 int ret = 0; 549 550 bch2_trans_begin(trans); ··· 593 if (!ret && 594 !kthread_should_stop() && 595 !atomic64_read(&r->work_stats.sectors_seen) && 596 + !atomic64_read(&r->scan_stats.sectors_seen) && 597 + kick == r->kick) { 598 bch2_moving_ctxt_flush_all(ctxt); 599 bch2_trans_unlock_long(trans); 600 rebalance_wait(c); ··· 677 } 678 prt_newline(out); 679 680 + struct task_struct *t; 681 + scoped_guard(rcu) { 682 + t = rcu_dereference(c->rebalance.thread); 683 + if (t) 684 + get_task_struct(t); 685 + } 686 687 if (t) { 688 bch2_prt_task_backtrace(out, t, 0, GFP_KERNEL); ··· 794 BTREE_ID_extents, POS_MIN, 795 BTREE_ITER_prefetch| 796 BTREE_ITER_all_snapshots); 797 + return bch_err_throw(c, transaction_restart_nested); 798 } 799 800 if (!extent_k.k && !rebalance_k.k)
+3 -5
fs/bcachefs/rebalance.h
··· 39 40 static inline void bch2_rebalance_wakeup(struct bch_fs *c) 41 { 42 - struct task_struct *p; 43 - 44 - rcu_read_lock(); 45 - p = rcu_dereference(c->rebalance.thread); 46 if (p) 47 wake_up_process(p); 48 - rcu_read_unlock(); 49 } 50 51 void bch2_rebalance_status_to_text(struct printbuf *, struct bch_fs *);
··· 39 40 static inline void bch2_rebalance_wakeup(struct bch_fs *c) 41 { 42 + c->rebalance.kick++; 43 + guard(rcu)(); 44 + struct task_struct *p = rcu_dereference(c->rebalance.thread); 45 if (p) 46 wake_up_process(p); 47 } 48 49 void bch2_rebalance_status_to_text(struct printbuf *, struct bch_fs *);
+1
fs/bcachefs/rebalance_types.h
··· 18 19 struct bch_fs_rebalance { 20 struct task_struct __rcu *thread; 21 struct bch_pd_controller pd; 22 23 enum bch_rebalance_states state;
··· 18 19 struct bch_fs_rebalance { 20 struct task_struct __rcu *thread; 21 + u32 kick; 22 struct bch_pd_controller pd; 23 24 enum bch_rebalance_states state;
+1 -5
fs/bcachefs/recovery.c
··· 879 use_clean: 880 if (!clean) { 881 bch_err(c, "no superblock clean section found"); 882 - ret = -BCH_ERR_fsck_repair_impossible; 883 goto err; 884 885 } ··· 1093 out: 1094 bch2_flush_fsck_errs(c); 1095 1096 - if (!c->opts.retain_recovery_info) { 1097 - bch2_journal_keys_put_initial(c); 1098 - bch2_find_btree_nodes_exit(&c->found_btree_nodes); 1099 - } 1100 if (!IS_ERR(clean)) 1101 kfree(clean); 1102
··· 879 use_clean: 880 if (!clean) { 881 bch_err(c, "no superblock clean section found"); 882 + ret = bch_err_throw(c, fsck_repair_impossible); 883 goto err; 884 885 } ··· 1093 out: 1094 bch2_flush_fsck_errs(c); 1095 1096 if (!IS_ERR(clean)) 1097 kfree(clean); 1098
+77 -15
fs/bcachefs/recovery_passes.c
··· 103 prt_tab(out); 104 105 bch2_pr_time_units(out, le32_to_cpu(i->last_runtime) * NSEC_PER_SEC); 106 prt_newline(out); 107 } 108 } 109 110 - static void bch2_sb_recovery_pass_complete(struct bch_fs *c, 111 - enum bch_recovery_pass pass, 112 - s64 start_time) 113 { 114 enum bch_recovery_pass_stable stable = bch2_recovery_pass_to_stable(pass); 115 - s64 end_time = ktime_get_real_seconds(); 116 117 - mutex_lock(&c->sb_lock); 118 - struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 119 - __clear_bit_le64(stable, ext->recovery_passes_required); 120 121 struct bch_sb_field_recovery_passes *r = 122 bch2_sb_field_get(c->disk_sb.sb, recovery_passes); ··· 127 r = bch2_sb_field_resize(&c->disk_sb, recovery_passes, u64s); 128 if (!r) { 129 bch_err(c, "error creating recovery_passes sb section"); 130 - goto out; 131 } 132 } 133 134 - r->start[stable].last_run = cpu_to_le64(end_time); 135 - r->start[stable].last_runtime = cpu_to_le32(max(0, end_time - start_time)); 136 - out: 137 bch2_write_super(c); 138 - mutex_unlock(&c->sb_lock); 139 } 140 141 static bool bch2_recovery_pass_want_ratelimit(struct bch_fs *c, enum bch_recovery_pass pass) ··· 185 */ 186 ret = (u64) le32_to_cpu(i->last_runtime) * 100 > 187 ktime_get_real_seconds() - le64_to_cpu(i->last_run); 188 } 189 190 return ret; ··· 346 goto out; 347 348 bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags); 349 - bool rewind = in_recovery && r->curr_pass > pass; 350 bool ratelimit = flags & RUN_RECOVERY_PASS_ratelimit; 351 352 if (!(in_recovery && (flags & RUN_RECOVERY_PASS_nopersistent))) { ··· 360 (!in_recovery || r->curr_pass >= BCH_RECOVERY_PASS_set_may_go_rw)) { 361 prt_printf(out, "need recovery pass %s (%u), but already rw\n", 362 bch2_recovery_passes[pass], pass); 363 - ret = -BCH_ERR_cannot_rewind_recovery; 364 goto out; 365 } 366 ··· 380 if (rewind) { 381 r->next_pass = pass; 382 r->passes_complete &= (1ULL << pass) >> 1; 383 - ret = -BCH_ERR_restart_recovery; 384 } 385 } else { 386 prt_printf(out, "scheduling recovery pass %s (%u)%s\n", ··· 413 } 414 415 return ret; 416 } 417 418 int bch2_run_print_explicit_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
··· 103 prt_tab(out); 104 105 bch2_pr_time_units(out, le32_to_cpu(i->last_runtime) * NSEC_PER_SEC); 106 + 107 + if (BCH_RECOVERY_PASS_NO_RATELIMIT(i)) 108 + prt_str(out, " (no ratelimit)"); 109 + 110 prt_newline(out); 111 } 112 } 113 114 + static struct recovery_pass_entry *bch2_sb_recovery_pass_entry(struct bch_fs *c, 115 + enum bch_recovery_pass pass) 116 { 117 enum bch_recovery_pass_stable stable = bch2_recovery_pass_to_stable(pass); 118 119 + lockdep_assert_held(&c->sb_lock); 120 121 struct bch_sb_field_recovery_passes *r = 122 bch2_sb_field_get(c->disk_sb.sb, recovery_passes); ··· 127 r = bch2_sb_field_resize(&c->disk_sb, recovery_passes, u64s); 128 if (!r) { 129 bch_err(c, "error creating recovery_passes sb section"); 130 + return NULL; 131 } 132 } 133 134 + return r->start + stable; 135 + } 136 + 137 + static void bch2_sb_recovery_pass_complete(struct bch_fs *c, 138 + enum bch_recovery_pass pass, 139 + s64 start_time) 140 + { 141 + guard(mutex)(&c->sb_lock); 142 + struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 143 + __clear_bit_le64(bch2_recovery_pass_to_stable(pass), 144 + ext->recovery_passes_required); 145 + 146 + struct recovery_pass_entry *e = bch2_sb_recovery_pass_entry(c, pass); 147 + if (e) { 148 + s64 end_time = ktime_get_real_seconds(); 149 + e->last_run = cpu_to_le64(end_time); 150 + e->last_runtime = cpu_to_le32(max(0, end_time - start_time)); 151 + SET_BCH_RECOVERY_PASS_NO_RATELIMIT(e, false); 152 + } 153 + 154 bch2_write_super(c); 155 + } 156 + 157 + void bch2_recovery_pass_set_no_ratelimit(struct bch_fs *c, 158 + enum bch_recovery_pass pass) 159 + { 160 + guard(mutex)(&c->sb_lock); 161 + 162 + struct recovery_pass_entry *e = bch2_sb_recovery_pass_entry(c, pass); 163 + if (e && !BCH_RECOVERY_PASS_NO_RATELIMIT(e)) { 164 + SET_BCH_RECOVERY_PASS_NO_RATELIMIT(e, false); 165 + bch2_write_super(c); 166 + } 167 } 168 169 static bool bch2_recovery_pass_want_ratelimit(struct bch_fs *c, enum bch_recovery_pass pass) ··· 157 */ 158 ret = (u64) le32_to_cpu(i->last_runtime) * 100 > 159 ktime_get_real_seconds() - le64_to_cpu(i->last_run); 160 + 161 + if (BCH_RECOVERY_PASS_NO_RATELIMIT(i)) 162 + ret = false; 163 } 164 165 return ret; ··· 315 goto out; 316 317 bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags); 318 + bool rewind = in_recovery && 319 + r->curr_pass > pass && 320 + !(r->passes_complete & BIT_ULL(pass)); 321 bool ratelimit = flags & RUN_RECOVERY_PASS_ratelimit; 322 323 if (!(in_recovery && (flags & RUN_RECOVERY_PASS_nopersistent))) { ··· 327 (!in_recovery || r->curr_pass >= BCH_RECOVERY_PASS_set_may_go_rw)) { 328 prt_printf(out, "need recovery pass %s (%u), but already rw\n", 329 bch2_recovery_passes[pass], pass); 330 + ret = bch_err_throw(c, cannot_rewind_recovery); 331 goto out; 332 } 333 ··· 347 if (rewind) { 348 r->next_pass = pass; 349 r->passes_complete &= (1ULL << pass) >> 1; 350 + ret = bch_err_throw(c, restart_recovery); 351 } 352 } else { 353 prt_printf(out, "scheduling recovery pass %s (%u)%s\n", ··· 380 } 381 382 return ret; 383 + } 384 + 385 + /* 386 + * Returns 0 if @pass has run recently, otherwise one of 387 + * -BCH_ERR_restart_recovery 388 + * -BCH_ERR_recovery_pass_will_run 389 + */ 390 + int bch2_require_recovery_pass(struct bch_fs *c, 391 + struct printbuf *out, 392 + enum bch_recovery_pass pass) 393 + { 394 + if (test_bit(BCH_FS_in_recovery, &c->flags) && 395 + c->recovery.passes_complete & BIT_ULL(pass)) 396 + return 0; 397 + 398 + guard(mutex)(&c->sb_lock); 399 + 400 + if (bch2_recovery_pass_want_ratelimit(c, pass)) 
401 + return 0; 402 + 403 + enum bch_run_recovery_pass_flags flags = 0; 404 + int ret = 0; 405 + 406 + if (recovery_pass_needs_set(c, pass, &flags)) { 407 + ret = __bch2_run_explicit_recovery_pass(c, out, pass, flags); 408 + bch2_write_super(c); 409 + } 410 + 411 + return ret ?: bch_err_throw(c, recovery_pass_will_run); 412 } 413 414 int bch2_run_print_explicit_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
+5
fs/bcachefs/recovery_passes.h
··· 10 11 u64 bch2_fsck_recovery_passes(void); 12 13 enum bch_run_recovery_pass_flags { 14 RUN_RECOVERY_PASS_nopersistent = BIT(0), 15 RUN_RECOVERY_PASS_ratelimit = BIT(1), ··· 25 int bch2_run_explicit_recovery_pass(struct bch_fs *, struct printbuf *, 26 enum bch_recovery_pass, 27 enum bch_run_recovery_pass_flags); 28 29 int bch2_run_online_recovery_passes(struct bch_fs *, u64); 30 int bch2_run_recovery_passes(struct bch_fs *, enum bch_recovery_pass);
··· 10 11 u64 bch2_fsck_recovery_passes(void); 12 13 + void bch2_recovery_pass_set_no_ratelimit(struct bch_fs *, enum bch_recovery_pass); 14 + 15 enum bch_run_recovery_pass_flags { 16 RUN_RECOVERY_PASS_nopersistent = BIT(0), 17 RUN_RECOVERY_PASS_ratelimit = BIT(1), ··· 23 int bch2_run_explicit_recovery_pass(struct bch_fs *, struct printbuf *, 24 enum bch_recovery_pass, 25 enum bch_run_recovery_pass_flags); 26 + 27 + int bch2_require_recovery_pass(struct bch_fs *, struct printbuf *, 28 + enum bch_recovery_pass); 29 30 int bch2_run_online_recovery_passes(struct bch_fs *, u64); 31 int bch2_run_recovery_passes(struct bch_fs *, enum bch_recovery_pass);
+2
fs/bcachefs/recovery_passes_format.h
··· 87 __le32 flags; 88 }; 89 90 struct bch_sb_field_recovery_passes { 91 struct bch_sb_field field; 92 struct recovery_pass_entry start[];
··· 87 __le32 flags; 88 }; 89 90 + LE32_BITMASK(BCH_RECOVERY_PASS_NO_RATELIMIT, struct recovery_pass_entry, flags, 0, 1) 91 + 92 struct bch_sb_field_recovery_passes { 93 struct bch_sb_field field; 94 struct recovery_pass_entry start[];
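The one-line addition above uses bcachefs' LE32_BITMASK() to generate BCH_RECOVERY_PASS_NO_RATELIMIT() and SET_BCH_RECOVERY_PASS_NO_RATELIMIT() accessors over bit 0 of recovery_pass_entry.flags. Below is a self-contained sketch of that getter/setter pattern; SKETCH_BITMASK and struct entry are invented for illustration, and the real macro additionally handles the little-endian on-disk representation.

#include <stdio.h>
#include <stdint.h>

struct entry {
	uint32_t flags;
};

#define SKETCH_BITMASK(name, type, field, offset, end)			\
static inline uint32_t name(const type *k)				\
{									\
	return (k->field >> offset) & ~(~0U << (end - offset));	\
}									\
									\
static inline void SET_##name(type *k, uint32_t v)			\
{									\
	k->field &= ~(~(~0U << (end - offset)) << offset);		\
	k->field |= (v & ~(~0U << (end - offset))) << offset;		\
}

SKETCH_BITMASK(ENTRY_NO_RATELIMIT, struct entry, flags, 0, 1)

int main(void)
{
	struct entry e = { .flags = 0 };

	SET_ENTRY_NO_RATELIMIT(&e, 1);
	printf("no_ratelimit=%u flags=0x%x\n",
	       ENTRY_NO_RATELIMIT(&e), e.flags);
	return 0;
}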
+5 -4
fs/bcachefs/reflink.c
··· 312 313 if (!bkey_refcount_c(k)) { 314 if (!(flags & BTREE_TRIGGER_overwrite)) 315 - ret = -BCH_ERR_missing_indirect_extent; 316 goto next; 317 } 318 ··· 612 int ret = 0, ret2 = 0; 613 614 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_reflink)) 615 - return -BCH_ERR_erofs_no_writes; 616 617 bch2_check_set_feature(c, BCH_FEATURE_reflink); 618 ··· 711 SET_REFLINK_P_IDX(&dst_p->v, offset); 712 713 if (reflink_p_may_update_opts_field && 714 - may_change_src_io_path_opts) 715 SET_REFLINK_P_MAY_UPDATE_OPTIONS(&dst_p->v, true); 716 } else { 717 BUG(); ··· 848 struct reflink_gc *r = genradix_ptr_alloc(&c->reflink_gc_table, 849 c->reflink_gc_nr++, GFP_KERNEL); 850 if (!r) { 851 - ret = -BCH_ERR_ENOMEM_gc_reflink_start; 852 break; 853 } 854
··· 312 313 if (!bkey_refcount_c(k)) { 314 if (!(flags & BTREE_TRIGGER_overwrite)) 315 + ret = bch_err_throw(c, missing_indirect_extent); 316 goto next; 317 } 318 ··· 612 int ret = 0, ret2 = 0; 613 614 if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_reflink)) 615 + return bch_err_throw(c, erofs_no_writes); 616 617 bch2_check_set_feature(c, BCH_FEATURE_reflink); 618 ··· 711 SET_REFLINK_P_IDX(&dst_p->v, offset); 712 713 if (reflink_p_may_update_opts_field && 714 + may_change_src_io_path_opts && 715 + REFLINK_P_MAY_UPDATE_OPTIONS(src_p.v)) 716 SET_REFLINK_P_MAY_UPDATE_OPTIONS(&dst_p->v, true); 717 } else { 718 BUG(); ··· 847 struct reflink_gc *r = genradix_ptr_alloc(&c->reflink_gc_table, 848 c->reflink_gc_nr++, GFP_KERNEL); 849 if (!r) { 850 + ret = bch_err_throw(c, ENOMEM_gc_reflink_start); 851 break; 852 } 853
+18 -19
fs/bcachefs/replicas.c
··· 119 return 0; 120 bad: 121 bch2_replicas_entry_to_text(err, r); 122 - return -BCH_ERR_invalid_replicas_entry; 123 } 124 125 void bch2_cpu_replicas_to_text(struct printbuf *out, ··· 311 !__replicas_has_entry(&c->replicas_gc, new_entry)) { 312 new_gc = cpu_replicas_add_entry(c, &c->replicas_gc, new_entry); 313 if (!new_gc.entries) { 314 - ret = -BCH_ERR_ENOMEM_cpu_replicas; 315 goto err; 316 } 317 } ··· 319 if (!__replicas_has_entry(&c->replicas, new_entry)) { 320 new_r = cpu_replicas_add_entry(c, &c->replicas, new_entry); 321 if (!new_r.entries) { 322 - ret = -BCH_ERR_ENOMEM_cpu_replicas; 323 goto err; 324 } 325 ··· 422 if (!c->replicas_gc.entries) { 423 mutex_unlock(&c->sb_lock); 424 bch_err(c, "error allocating c->replicas_gc"); 425 - return -BCH_ERR_ENOMEM_replicas_gc; 426 } 427 428 for_each_cpu_replicas_entry(&c->replicas, e) ··· 458 new.entries = kcalloc(nr, new.entry_size, GFP_KERNEL); 459 if (!new.entries) { 460 bch_err(c, "error allocating c->replicas_gc"); 461 - return -BCH_ERR_ENOMEM_replicas_gc; 462 } 463 464 mutex_lock(&c->sb_lock); ··· 622 sb_r = bch2_sb_field_resize(&c->disk_sb, replicas_v0, 623 DIV_ROUND_UP(bytes, sizeof(u64))); 624 if (!sb_r) 625 - return -BCH_ERR_ENOSPC_sb_replicas; 626 627 bch2_sb_field_delete(&c->disk_sb, BCH_SB_FIELD_replicas); 628 sb_r = bch2_sb_field_get(c->disk_sb.sb, replicas_v0); ··· 667 sb_r = bch2_sb_field_resize(&c->disk_sb, replicas, 668 DIV_ROUND_UP(bytes, sizeof(u64))); 669 if (!sb_r) 670 - return -BCH_ERR_ENOSPC_sb_replicas; 671 672 bch2_sb_field_delete(&c->disk_sb, BCH_SB_FIELD_replicas_v0); 673 sb_r = bch2_sb_field_get(c->disk_sb.sb, replicas); ··· 819 if (e->data_type == BCH_DATA_cached) 820 continue; 821 822 - rcu_read_lock(); 823 - for (unsigned i = 0; i < e->nr_devs; i++) { 824 - if (e->devs[i] == BCH_SB_MEMBER_INVALID) { 825 - nr_failed++; 826 - continue; 827 } 828 - 829 - nr_online += test_bit(e->devs[i], devs.d); 830 - 831 - struct bch_dev *ca = bch2_dev_rcu_noerror(c, e->devs[i]); 832 - nr_failed += !ca || ca->mi.state == BCH_MEMBER_STATE_failed; 833 - } 834 - rcu_read_unlock(); 835 836 if (nr_online + nr_failed == e->nr_devs) 837 continue;
··· 119 return 0; 120 bad: 121 bch2_replicas_entry_to_text(err, r); 122 + return bch_err_throw(c, invalid_replicas_entry); 123 } 124 125 void bch2_cpu_replicas_to_text(struct printbuf *out, ··· 311 !__replicas_has_entry(&c->replicas_gc, new_entry)) { 312 new_gc = cpu_replicas_add_entry(c, &c->replicas_gc, new_entry); 313 if (!new_gc.entries) { 314 + ret = bch_err_throw(c, ENOMEM_cpu_replicas); 315 goto err; 316 } 317 } ··· 319 if (!__replicas_has_entry(&c->replicas, new_entry)) { 320 new_r = cpu_replicas_add_entry(c, &c->replicas, new_entry); 321 if (!new_r.entries) { 322 + ret = bch_err_throw(c, ENOMEM_cpu_replicas); 323 goto err; 324 } 325 ··· 422 if (!c->replicas_gc.entries) { 423 mutex_unlock(&c->sb_lock); 424 bch_err(c, "error allocating c->replicas_gc"); 425 + return bch_err_throw(c, ENOMEM_replicas_gc); 426 } 427 428 for_each_cpu_replicas_entry(&c->replicas, e) ··· 458 new.entries = kcalloc(nr, new.entry_size, GFP_KERNEL); 459 if (!new.entries) { 460 bch_err(c, "error allocating c->replicas_gc"); 461 + return bch_err_throw(c, ENOMEM_replicas_gc); 462 } 463 464 mutex_lock(&c->sb_lock); ··· 622 sb_r = bch2_sb_field_resize(&c->disk_sb, replicas_v0, 623 DIV_ROUND_UP(bytes, sizeof(u64))); 624 if (!sb_r) 625 + return bch_err_throw(c, ENOSPC_sb_replicas); 626 627 bch2_sb_field_delete(&c->disk_sb, BCH_SB_FIELD_replicas); 628 sb_r = bch2_sb_field_get(c->disk_sb.sb, replicas_v0); ··· 667 sb_r = bch2_sb_field_resize(&c->disk_sb, replicas, 668 DIV_ROUND_UP(bytes, sizeof(u64))); 669 if (!sb_r) 670 + return bch_err_throw(c, ENOSPC_sb_replicas); 671 672 bch2_sb_field_delete(&c->disk_sb, BCH_SB_FIELD_replicas_v0); 673 sb_r = bch2_sb_field_get(c->disk_sb.sb, replicas); ··· 819 if (e->data_type == BCH_DATA_cached) 820 continue; 821 822 + scoped_guard(rcu) 823 + for (unsigned i = 0; i < e->nr_devs; i++) { 824 + if (e->devs[i] == BCH_SB_MEMBER_INVALID) { 825 + nr_failed++; 826 + continue; 827 + } 828 + 829 + nr_online += test_bit(e->devs[i], devs.d); 830 + 831 + struct bch_dev *ca = bch2_dev_rcu_noerror(c, e->devs[i]); 832 + nr_failed += !ca || ca->mi.state == BCH_MEMBER_STATE_failed; 833 } 834 835 if (nr_online + nr_failed == e->nr_devs) 836 continue;
+1
fs/bcachefs/sb-counters_format.h
··· 26 x(io_move_write_fail, 82, TYPE_COUNTER) \ 27 x(io_move_start_fail, 39, TYPE_COUNTER) \ 28 x(io_move_created_rebalance, 83, TYPE_COUNTER) \ 29 x(bucket_invalidate, 3, TYPE_COUNTER) \ 30 x(bucket_discard, 4, TYPE_COUNTER) \ 31 x(bucket_discard_fast, 79, TYPE_COUNTER) \
··· 26 x(io_move_write_fail, 82, TYPE_COUNTER) \ 27 x(io_move_start_fail, 39, TYPE_COUNTER) \ 28 x(io_move_created_rebalance, 83, TYPE_COUNTER) \ 29 + x(io_move_evacuate_bucket, 84, TYPE_COUNTER) \ 30 x(bucket_invalidate, 3, TYPE_COUNTER) \ 31 x(bucket_discard, 4, TYPE_COUNTER) \ 32 x(bucket_discard_fast, 79, TYPE_COUNTER) \
+1 -1
fs/bcachefs/sb-downgrade.c
··· 417 418 d = bch2_sb_field_resize(&c->disk_sb, downgrade, sb_u64s); 419 if (!d) { 420 - ret = -BCH_ERR_ENOSPC_sb_downgrade; 421 goto out; 422 } 423
··· 417 418 d = bch2_sb_field_resize(&c->disk_sb, downgrade, sb_u64s); 419 if (!d) { 420 + ret = bch_err_throw(c, ENOSPC_sb_downgrade); 421 goto out; 422 } 423
+22
fs/bcachefs/sb-errors.c
··· 78 .to_text = bch2_sb_errors_to_text, 79 }; 80 81 void bch2_sb_error_count(struct bch_fs *c, enum bch_sb_error_id err) 82 { 83 bch_sb_errors_cpu *e = &c->fsck_error_counts;
··· 78 .to_text = bch2_sb_errors_to_text, 79 }; 80 81 + void bch2_fs_errors_to_text(struct printbuf *out, struct bch_fs *c) 82 + { 83 + if (out->nr_tabstops < 1) 84 + printbuf_tabstop_push(out, 48); 85 + if (out->nr_tabstops < 2) 86 + printbuf_tabstop_push(out, 8); 87 + if (out->nr_tabstops < 3) 88 + printbuf_tabstop_push(out, 16); 89 + 90 + guard(mutex)(&c->fsck_error_counts_lock); 91 + 92 + bch_sb_errors_cpu *e = &c->fsck_error_counts; 93 + darray_for_each(*e, i) { 94 + bch2_sb_error_id_to_text(out, i->id); 95 + prt_tab(out); 96 + prt_u64(out, i->nr); 97 + prt_tab(out); 98 + bch2_prt_datetime(out, i->last_error_time); 99 + prt_newline(out); 100 + } 101 + } 102 + 103 void bch2_sb_error_count(struct bch_fs *c, enum bch_sb_error_id err) 104 { 105 bch_sb_errors_cpu *e = &c->fsck_error_counts;
+1
fs/bcachefs/sb-errors.h
··· 7 extern const char * const bch2_sb_error_strs[]; 8 9 void bch2_sb_error_id_to_text(struct printbuf *, enum bch_sb_error_id); 10 11 extern const struct bch_sb_field_ops bch_sb_field_ops_errors; 12
··· 7 extern const char * const bch2_sb_error_strs[]; 8 9 void bch2_sb_error_id_to_text(struct printbuf *, enum bch_sb_error_id); 10 + void bch2_fs_errors_to_text(struct printbuf *, struct bch_fs *); 11 12 extern const struct bch_sb_field_ops bch_sb_field_ops_errors; 13
+3 -1
fs/bcachefs/sb-errors_format.h
··· 232 x(inode_dir_multiple_links, 206, FSCK_AUTOFIX) \ 233 x(inode_dir_missing_backpointer, 284, FSCK_AUTOFIX) \ 234 x(inode_dir_unlinked_but_not_empty, 286, FSCK_AUTOFIX) \ 235 x(inode_multiple_links_but_nlink_0, 207, FSCK_AUTOFIX) \ 236 x(inode_wrong_backpointer, 208, FSCK_AUTOFIX) \ 237 x(inode_wrong_nlink, 209, FSCK_AUTOFIX) \ ··· 244 x(inode_parent_has_case_insensitive_not_set, 317, FSCK_AUTOFIX) \ 245 x(vfs_inode_i_blocks_underflow, 311, FSCK_AUTOFIX) \ 246 x(vfs_inode_i_blocks_not_zero_at_truncate, 313, FSCK_AUTOFIX) \ 247 x(deleted_inode_but_clean, 211, FSCK_AUTOFIX) \ 248 x(deleted_inode_missing, 212, FSCK_AUTOFIX) \ 249 x(deleted_inode_is_dir, 213, FSCK_AUTOFIX) \ ··· 330 x(dirent_stray_data_after_cf_name, 305, 0) \ 331 x(rebalance_work_incorrectly_set, 309, FSCK_AUTOFIX) \ 332 x(rebalance_work_incorrectly_unset, 310, FSCK_AUTOFIX) \ 333 - x(MAX, 319, 0) 334 335 enum bch_sb_error_id { 336 #define x(t, n, ...) BCH_FSCK_ERR_##t = n,
··· 232 x(inode_dir_multiple_links, 206, FSCK_AUTOFIX) \ 233 x(inode_dir_missing_backpointer, 284, FSCK_AUTOFIX) \ 234 x(inode_dir_unlinked_but_not_empty, 286, FSCK_AUTOFIX) \ 235 + x(inode_dir_has_nonzero_i_size, 319, FSCK_AUTOFIX) \ 236 x(inode_multiple_links_but_nlink_0, 207, FSCK_AUTOFIX) \ 237 x(inode_wrong_backpointer, 208, FSCK_AUTOFIX) \ 238 x(inode_wrong_nlink, 209, FSCK_AUTOFIX) \ ··· 243 x(inode_parent_has_case_insensitive_not_set, 317, FSCK_AUTOFIX) \ 244 x(vfs_inode_i_blocks_underflow, 311, FSCK_AUTOFIX) \ 245 x(vfs_inode_i_blocks_not_zero_at_truncate, 313, FSCK_AUTOFIX) \ 246 + x(vfs_bad_inode_rm, 320, 0) \ 247 x(deleted_inode_but_clean, 211, FSCK_AUTOFIX) \ 248 x(deleted_inode_missing, 212, FSCK_AUTOFIX) \ 249 x(deleted_inode_is_dir, 213, FSCK_AUTOFIX) \ ··· 328 x(dirent_stray_data_after_cf_name, 305, 0) \ 329 x(rebalance_work_incorrectly_set, 309, FSCK_AUTOFIX) \ 330 x(rebalance_work_incorrectly_unset, 310, FSCK_AUTOFIX) \ 331 + x(MAX, 321, 0) 332 333 enum bch_sb_error_id { 334 #define x(t, n, ...) BCH_FSCK_ERR_##t = n,
+7 -14
fs/bcachefs/sb-members.c
··· 101 102 mi = bch2_sb_field_resize(&c->disk_sb, members_v2, u64s); 103 if (!mi) 104 - return -BCH_ERR_ENOSPC_sb_members_v2; 105 106 for (int i = c->disk_sb.sb->nr_devices - 1; i >= 0; --i) { 107 void *dst = (void *) mi->_members + (i * sizeof(struct bch_member)); ··· 378 { 379 struct bch_sb_field_members_v2 *mi = bch2_sb_field_get(c->disk_sb.sb, members_v2); 380 381 - rcu_read_lock(); 382 for_each_member_device_rcu(c, ca, NULL) { 383 struct bch_member *m = __bch2_members_v2_get_mut(mi, ca->dev_idx); 384 385 for (unsigned e = 0; e < BCH_MEMBER_ERROR_NR; e++) 386 m->errors[e] = cpu_to_le64(atomic64_read(&ca->errors[e])); 387 } 388 - rcu_read_unlock(); 389 } 390 391 void bch2_dev_io_errors_to_text(struct printbuf *out, struct bch_dev *ca) ··· 442 443 bool bch2_dev_btree_bitmap_marked(struct bch_fs *c, struct bkey_s_c k) 444 { 445 - bool ret = true; 446 - rcu_read_lock(); 447 bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) { 448 struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 449 - if (!ca) 450 - continue; 451 - 452 - if (!bch2_dev_btree_bitmap_marked_sectors(ca, ptr->offset, btree_sectors(c))) { 453 - ret = false; 454 - break; 455 - } 456 } 457 - rcu_read_unlock(); 458 - return ret; 459 } 460 461 static void __bch2_dev_btree_bitmap_mark(struct bch_sb_field_members_v2 *mi, unsigned dev,
··· 101 102 mi = bch2_sb_field_resize(&c->disk_sb, members_v2, u64s); 103 if (!mi) 104 + return bch_err_throw(c, ENOSPC_sb_members_v2); 105 106 for (int i = c->disk_sb.sb->nr_devices - 1; i >= 0; --i) { 107 void *dst = (void *) mi->_members + (i * sizeof(struct bch_member)); ··· 378 { 379 struct bch_sb_field_members_v2 *mi = bch2_sb_field_get(c->disk_sb.sb, members_v2); 380 381 + guard(rcu)(); 382 for_each_member_device_rcu(c, ca, NULL) { 383 struct bch_member *m = __bch2_members_v2_get_mut(mi, ca->dev_idx); 384 385 for (unsigned e = 0; e < BCH_MEMBER_ERROR_NR; e++) 386 m->errors[e] = cpu_to_le64(atomic64_read(&ca->errors[e])); 387 } 388 } 389 390 void bch2_dev_io_errors_to_text(struct printbuf *out, struct bch_dev *ca) ··· 443 444 bool bch2_dev_btree_bitmap_marked(struct bch_fs *c, struct bkey_s_c k) 445 { 446 + guard(rcu)(); 447 bkey_for_each_ptr(bch2_bkey_ptrs_c(k), ptr) { 448 struct bch_dev *ca = bch2_dev_rcu(c, ptr->dev); 449 + if (ca && 450 + !bch2_dev_btree_bitmap_marked_sectors(ca, ptr->offset, btree_sectors(c))) 451 + return false; 452 } 453 + return true; 454 } 455 456 static void __bch2_dev_btree_bitmap_mark(struct bch_sb_field_members_v2 *mi, unsigned dev,
+11 -21
fs/bcachefs/sb-members.h
··· 28 29 static inline bool bch2_dev_idx_is_online(struct bch_fs *c, unsigned dev) 30 { 31 - rcu_read_lock(); 32 struct bch_dev *ca = bch2_dev_rcu(c, dev); 33 - bool ret = ca && bch2_dev_is_online(ca); 34 - rcu_read_unlock(); 35 - 36 - return ret; 37 } 38 39 static inline bool bch2_dev_is_healthy(struct bch_dev *ca) ··· 139 140 static inline struct bch_dev *bch2_get_next_dev(struct bch_fs *c, struct bch_dev *ca) 141 { 142 - rcu_read_lock(); 143 bch2_dev_put(ca); 144 if ((ca = __bch2_next_dev(c, ca, NULL))) 145 bch2_dev_get(ca); 146 - rcu_read_unlock(); 147 - 148 return ca; 149 } 150 ··· 161 unsigned state_mask, 162 int rw, unsigned ref_idx) 163 { 164 - rcu_read_lock(); 165 if (ca) 166 enumerated_ref_put(&ca->io_ref[rw], ref_idx); 167 ··· 169 (!((1 << ca->mi.state) & state_mask) || 170 !enumerated_ref_tryget(&ca->io_ref[rw], ref_idx))) 171 ; 172 - rcu_read_unlock(); 173 174 return ca; 175 } ··· 233 234 static inline struct bch_dev *bch2_dev_tryget_noerror(struct bch_fs *c, unsigned dev) 235 { 236 - rcu_read_lock(); 237 struct bch_dev *ca = bch2_dev_rcu_noerror(c, dev); 238 if (ca) 239 bch2_dev_get(ca); 240 - rcu_read_unlock(); 241 return ca; 242 } 243 ··· 292 { 293 might_sleep(); 294 295 - rcu_read_lock(); 296 struct bch_dev *ca = bch2_dev_rcu(c, dev); 297 - if (ca && !enumerated_ref_tryget(&ca->io_ref[rw], ref_idx)) 298 - ca = NULL; 299 - rcu_read_unlock(); 300 301 - if (ca && 302 - (ca->mi.state == BCH_MEMBER_STATE_rw || 303 - (ca->mi.state == BCH_MEMBER_STATE_ro && rw == READ))) 304 return ca; 305 306 - if (ca) 307 - enumerated_ref_put(&ca->io_ref[rw], ref_idx); 308 return NULL; 309 } 310
··· 28 29 static inline bool bch2_dev_idx_is_online(struct bch_fs *c, unsigned dev) 30 { 31 + guard(rcu)(); 32 struct bch_dev *ca = bch2_dev_rcu(c, dev); 33 + return ca && bch2_dev_is_online(ca); 34 } 35 36 static inline bool bch2_dev_is_healthy(struct bch_dev *ca) ··· 142 143 static inline struct bch_dev *bch2_get_next_dev(struct bch_fs *c, struct bch_dev *ca) 144 { 145 + guard(rcu)(); 146 bch2_dev_put(ca); 147 if ((ca = __bch2_next_dev(c, ca, NULL))) 148 bch2_dev_get(ca); 149 return ca; 150 } 151 ··· 166 unsigned state_mask, 167 int rw, unsigned ref_idx) 168 { 169 + guard(rcu)(); 170 if (ca) 171 enumerated_ref_put(&ca->io_ref[rw], ref_idx); 172 ··· 174 (!((1 << ca->mi.state) & state_mask) || 175 !enumerated_ref_tryget(&ca->io_ref[rw], ref_idx))) 176 ; 177 178 return ca; 179 } ··· 239 240 static inline struct bch_dev *bch2_dev_tryget_noerror(struct bch_fs *c, unsigned dev) 241 { 242 + guard(rcu)(); 243 struct bch_dev *ca = bch2_dev_rcu_noerror(c, dev); 244 if (ca) 245 bch2_dev_get(ca); 246 return ca; 247 } 248 ··· 299 { 300 might_sleep(); 301 302 + guard(rcu)(); 303 struct bch_dev *ca = bch2_dev_rcu(c, dev); 304 + if (!ca || !enumerated_ref_tryget(&ca->io_ref[rw], ref_idx)) 305 + return NULL; 306 307 + if (ca->mi.state == BCH_MEMBER_STATE_rw || 308 + (ca->mi.state == BCH_MEMBER_STATE_ro && rw == READ)) 309 return ca; 310 311 + enumerated_ref_put(&ca->io_ref[rw], ref_idx); 312 return NULL; 313 } 314
+2 -5
fs/bcachefs/six.c
··· 339 * acquiring the lock and setting the owner field. If we're an RT task 340 * that will live-lock because we won't let the owner complete. 341 */ 342 - rcu_read_lock(); 343 struct task_struct *owner = READ_ONCE(lock->owner); 344 - bool ret = owner ? owner_on_cpu(owner) : !rt_or_dl_task(current); 345 - rcu_read_unlock(); 346 - 347 - return ret; 348 } 349 350 static inline bool six_optimistic_spin(struct six_lock *lock,
··· 339 * acquiring the lock and setting the owner field. If we're an RT task 340 * that will live-lock because we won't let the owner complete. 341 */ 342 + guard(rcu)(); 343 struct task_struct *owner = READ_ONCE(lock->owner); 344 + return owner ? owner_on_cpu(owner) : !rt_or_dl_task(current); 345 } 346 347 static inline bool six_optimistic_spin(struct six_lock *lock,
+89 -59
fs/bcachefs/snapshot.c
··· 54 BTREE_ITER_with_updates, snapshot_tree, s); 55 56 if (bch2_err_matches(ret, ENOENT)) 57 - ret = -BCH_ERR_ENOENT_snapshot_tree; 58 return ret; 59 } 60 ··· 67 struct bkey_i_snapshot_tree *s_t; 68 69 if (ret == -BCH_ERR_ENOSPC_btree_slot) 70 - ret = -BCH_ERR_ENOSPC_snapshot_tree; 71 if (ret) 72 return ERR_PTR(ret); 73 ··· 105 106 static bool bch2_snapshot_is_ancestor_early(struct bch_fs *c, u32 id, u32 ancestor) 107 { 108 - rcu_read_lock(); 109 - bool ret = __bch2_snapshot_is_ancestor_early(rcu_dereference(c->snapshots), id, ancestor); 110 - rcu_read_unlock(); 111 - 112 - return ret; 113 } 114 115 static inline u32 get_ancestor_below(struct snapshot_table *t, u32 id, u32 ancestor) ··· 137 { 138 bool ret; 139 140 - rcu_read_lock(); 141 struct snapshot_table *t = rcu_dereference(c->snapshots); 142 143 - if (unlikely(c->recovery.pass_done < BCH_RECOVERY_PASS_check_snapshots)) { 144 - ret = __bch2_snapshot_is_ancestor_early(t, id, ancestor); 145 - goto out; 146 - } 147 148 if (likely(ancestor >= IS_ANCESTOR_BITMAP)) 149 while (id && id < ancestor - IS_ANCESTOR_BITMAP) ··· 152 : id == ancestor; 153 154 EBUG_ON(ret != __bch2_snapshot_is_ancestor_early(t, id, ancestor)); 155 - out: 156 - rcu_read_unlock(); 157 - 158 return ret; 159 } 160 ··· 285 mutex_lock(&c->snapshot_table_lock); 286 int ret = snapshot_t_mut(c, id) 287 ? 0 288 - : -BCH_ERR_ENOMEM_mark_snapshot; 289 mutex_unlock(&c->snapshot_table_lock); 290 return ret; 291 } ··· 304 305 t = snapshot_t_mut(c, id); 306 if (!t) { 307 - ret = -BCH_ERR_ENOMEM_mark_snapshot; 308 goto err; 309 } 310 ··· 404 u32 bch2_snapshot_oldest_subvol(struct bch_fs *c, u32 snapshot_root, 405 snapshot_id_list *skip) 406 { 407 u32 id, subvol = 0, s; 408 retry: 409 id = snapshot_root; 410 - rcu_read_lock(); 411 while (id && bch2_snapshot_exists(c, id)) { 412 if (!(skip && snapshot_list_has_id(skip, id))) { 413 s = snapshot_t(c, id)->subvol; ··· 419 if (id == snapshot_root) 420 break; 421 } 422 - rcu_read_unlock(); 423 424 if (!subvol && skip) { 425 skip = NULL; ··· 608 609 u32 bch2_snapshot_skiplist_get(struct bch_fs *c, u32 id) 610 { 611 - const struct snapshot_t *s; 612 - 613 if (!id) 614 return 0; 615 616 - rcu_read_lock(); 617 - s = snapshot_t(c, id); 618 - if (s->parent) 619 - id = bch2_snapshot_nth_parent(c, id, get_random_u32_below(s->depth)); 620 - rcu_read_unlock(); 621 - 622 - return id; 623 } 624 625 static int snapshot_skiplist_good(struct btree_trans *trans, u32 id, struct bch_snapshot s) ··· 934 935 static inline bool snapshot_id_lists_have_common(snapshot_id_list *l, snapshot_id_list *r) 936 { 937 - darray_for_each(*l, i) 938 - if (snapshot_list_has_id(r, *i)) 939 - return true; 940 - return false; 941 } 942 943 static void snapshot_id_list_to_text(struct printbuf *out, snapshot_id_list *s) ··· 1006 "snapshot node %u from tree %s missing, recreate?", *id, buf.buf)) { 1007 if (t->nr > 1) { 1008 bch_err(c, "cannot reconstruct snapshot trees with multiple nodes"); 1009 - ret = -BCH_ERR_fsck_repair_unimplemented; 1010 goto err; 1011 } 1012 ··· 1045 ret = bch2_btree_delete_at(trans, iter, 1046 BTREE_UPDATE_internal_snapshot_node) ?: 1; 1047 1048 - /* 1049 - * Snapshot missing: we should have caught this with btree_lost_data and 1050 - * kicked off reconstruct_snapshots, so if we end up here we have no 1051 - * idea what happened: 1052 - */ 1053 - if (fsck_err_on(state == SNAPSHOT_ID_empty, 1054 - trans, bkey_in_missing_snapshot, 1055 - "key in missing snapshot %s, delete?", 1056 - (bch2_btree_id_to_text(&buf, iter->btree_id), 1057 - prt_char(&buf, ' '), 
1058 - bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 1059 - ret = bch2_btree_delete_at(trans, iter, 1060 - BTREE_UPDATE_internal_snapshot_node) ?: 1; 1061 fsck_err: 1062 printbuf_exit(&buf); 1063 return ret; 1064 } 1065 ··· 1296 goto err; 1297 1298 if (!k.k || !k.k->p.offset) { 1299 - ret = -BCH_ERR_ENOSPC_snapshot_create; 1300 goto err; 1301 } 1302 ··· 1432 1433 static inline u32 interior_delete_has_id(interior_delete_list *l, u32 id) 1434 { 1435 - darray_for_each(*l, i) 1436 - if (i->id == id) 1437 - return i->live_child; 1438 - return 0; 1439 } 1440 1441 static unsigned __live_child(struct snapshot_table *t, u32 id, ··· 1465 { 1466 struct snapshot_delete *d = &c->snapshot_delete; 1467 1468 - rcu_read_lock(); 1469 - u32 ret = __live_child(rcu_dereference(c->snapshots), id, 1470 - &d->delete_leaves, &d->delete_interior); 1471 - rcu_read_unlock(); 1472 - return ret; 1473 } 1474 1475 static bool snapshot_id_dying(struct snapshot_delete *d, unsigned id) ··· 1724 static inline u32 bch2_snapshot_nth_parent_skip(struct bch_fs *c, u32 id, u32 n, 1725 interior_delete_list *skip) 1726 { 1727 - rcu_read_lock(); 1728 while (interior_delete_has_id(skip, id)) 1729 id = __bch2_snapshot_parent(c, id); 1730 ··· 1733 id = __bch2_snapshot_parent(c, id); 1734 } while (interior_delete_has_id(skip, id)); 1735 } 1736 - rcu_read_unlock(); 1737 1738 return id; 1739 } ··· 1898 d->running = false; 1899 mutex_unlock(&d->progress_lock); 1900 bch2_trans_put(trans); 1901 out_unlock: 1902 mutex_unlock(&d->lock); 1903 if (!bch2_err_matches(ret, EROFS)) ··· 1935 1936 BUG_ON(!test_bit(BCH_FS_may_go_rw, &c->flags)); 1937 1938 - if (!queue_work(c->write_ref_wq, &c->snapshot_delete.work)) 1939 enumerated_ref_put(&c->writes, BCH_WRITE_REF_delete_dead_snapshots); 1940 } 1941
··· 54 BTREE_ITER_with_updates, snapshot_tree, s); 55 56 if (bch2_err_matches(ret, ENOENT)) 57 + ret = bch_err_throw(trans->c, ENOENT_snapshot_tree); 58 return ret; 59 } 60 ··· 67 struct bkey_i_snapshot_tree *s_t; 68 69 if (ret == -BCH_ERR_ENOSPC_btree_slot) 70 + ret = bch_err_throw(trans->c, ENOSPC_snapshot_tree); 71 if (ret) 72 return ERR_PTR(ret); 73 ··· 105 106 static bool bch2_snapshot_is_ancestor_early(struct bch_fs *c, u32 id, u32 ancestor) 107 { 108 + guard(rcu)(); 109 + return __bch2_snapshot_is_ancestor_early(rcu_dereference(c->snapshots), id, ancestor); 110 } 111 112 static inline u32 get_ancestor_below(struct snapshot_table *t, u32 id, u32 ancestor) ··· 140 { 141 bool ret; 142 143 + guard(rcu)(); 144 struct snapshot_table *t = rcu_dereference(c->snapshots); 145 146 + if (unlikely(c->recovery.pass_done < BCH_RECOVERY_PASS_check_snapshots)) 147 + return __bch2_snapshot_is_ancestor_early(t, id, ancestor); 148 149 if (likely(ancestor >= IS_ANCESTOR_BITMAP)) 150 while (id && id < ancestor - IS_ANCESTOR_BITMAP) ··· 157 : id == ancestor; 158 159 EBUG_ON(ret != __bch2_snapshot_is_ancestor_early(t, id, ancestor)); 160 return ret; 161 } 162 ··· 293 mutex_lock(&c->snapshot_table_lock); 294 int ret = snapshot_t_mut(c, id) 295 ? 0 296 + : bch_err_throw(c, ENOMEM_mark_snapshot); 297 mutex_unlock(&c->snapshot_table_lock); 298 return ret; 299 } ··· 312 313 t = snapshot_t_mut(c, id); 314 if (!t) { 315 + ret = bch_err_throw(c, ENOMEM_mark_snapshot); 316 goto err; 317 } 318 ··· 412 u32 bch2_snapshot_oldest_subvol(struct bch_fs *c, u32 snapshot_root, 413 snapshot_id_list *skip) 414 { 415 + guard(rcu)(); 416 u32 id, subvol = 0, s; 417 retry: 418 id = snapshot_root; 419 while (id && bch2_snapshot_exists(c, id)) { 420 if (!(skip && snapshot_list_has_id(skip, id))) { 421 s = snapshot_t(c, id)->subvol; ··· 427 if (id == snapshot_root) 428 break; 429 } 430 431 if (!subvol && skip) { 432 skip = NULL; ··· 617 618 u32 bch2_snapshot_skiplist_get(struct bch_fs *c, u32 id) 619 { 620 if (!id) 621 return 0; 622 623 + guard(rcu)(); 624 + const struct snapshot_t *s = snapshot_t(c, id); 625 + return s->parent 626 + ? bch2_snapshot_nth_parent(c, id, get_random_u32_below(s->depth)) 627 + : id; 628 } 629 630 static int snapshot_skiplist_good(struct btree_trans *trans, u32 id, struct bch_snapshot s) ··· 947 948 static inline bool snapshot_id_lists_have_common(snapshot_id_list *l, snapshot_id_list *r) 949 { 950 + return darray_find_p(*l, i, snapshot_list_has_id(r, *i)) != NULL; 951 } 952 953 static void snapshot_id_list_to_text(struct printbuf *out, snapshot_id_list *s) ··· 1022 "snapshot node %u from tree %s missing, recreate?", *id, buf.buf)) { 1023 if (t->nr > 1) { 1024 bch_err(c, "cannot reconstruct snapshot trees with multiple nodes"); 1025 + ret = bch_err_throw(c, fsck_repair_unimplemented); 1026 goto err; 1027 } 1028 ··· 1061 ret = bch2_btree_delete_at(trans, iter, 1062 BTREE_UPDATE_internal_snapshot_node) ?: 1; 1063 1064 + if (state == SNAPSHOT_ID_empty) { 1065 + /* 1066 + * Snapshot missing: we should have caught this with btree_lost_data and 1067 + * kicked off reconstruct_snapshots, so if we end up here we have no 1068 + * idea what happened. 1069 + * 1070 + * Do not delete unless we know that subvolumes and snapshots 1071 + * are consistent: 1072 + * 1073 + * XXX: 1074 + * 1075 + * We could be smarter here, and instead of using the generic 1076 + * recovery pass ratelimiting, track if there have been any 1077 + * changes to the snapshots or inodes btrees since those passes 1078 + * last ran. 
1079 + */ 1080 + ret = bch2_require_recovery_pass(c, &buf, BCH_RECOVERY_PASS_check_snapshots) ?: ret; 1081 + ret = bch2_require_recovery_pass(c, &buf, BCH_RECOVERY_PASS_check_subvols) ?: ret; 1082 + 1083 + if (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_snapshots)) 1084 + ret = bch2_require_recovery_pass(c, &buf, BCH_RECOVERY_PASS_reconstruct_snapshots) ?: ret; 1085 + 1086 + unsigned repair_flags = FSCK_CAN_IGNORE | (!ret ? FSCK_CAN_FIX : 0); 1087 + 1088 + if (__fsck_err(trans, repair_flags, bkey_in_missing_snapshot, 1089 + "key in missing snapshot %s, delete?", 1090 + (bch2_btree_id_to_text(&buf, iter->btree_id), 1091 + prt_char(&buf, ' '), 1092 + bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 1093 + ret = bch2_btree_delete_at(trans, iter, 1094 + BTREE_UPDATE_internal_snapshot_node) ?: 1; 1095 + } 1096 + } 1097 fsck_err: 1098 printbuf_exit(&buf); 1099 + return ret; 1100 + } 1101 + 1102 + int __bch2_get_snapshot_overwrites(struct btree_trans *trans, 1103 + enum btree_id btree, struct bpos pos, 1104 + snapshot_id_list *s) 1105 + { 1106 + struct bch_fs *c = trans->c; 1107 + struct btree_iter iter; 1108 + struct bkey_s_c k; 1109 + int ret = 0; 1110 + 1111 + for_each_btree_key_reverse_norestart(trans, iter, btree, bpos_predecessor(pos), 1112 + BTREE_ITER_all_snapshots, k, ret) { 1113 + if (!bkey_eq(k.k->p, pos)) 1114 + break; 1115 + 1116 + if (!bch2_snapshot_is_ancestor(c, k.k->p.snapshot, pos.snapshot) || 1117 + snapshot_list_has_ancestor(c, s, k.k->p.snapshot)) 1118 + continue; 1119 + 1120 + ret = snapshot_list_add(c, s, k.k->p.snapshot); 1121 + if (ret) 1122 + break; 1123 + } 1124 + bch2_trans_iter_exit(trans, &iter); 1125 + if (ret) 1126 + darray_exit(s); 1127 + 1128 return ret; 1129 } 1130 ··· 1263 goto err; 1264 1265 if (!k.k || !k.k->p.offset) { 1266 + ret = bch_err_throw(c, ENOSPC_snapshot_create); 1267 goto err; 1268 } 1269 ··· 1399 1400 static inline u32 interior_delete_has_id(interior_delete_list *l, u32 id) 1401 { 1402 + struct snapshot_interior_delete *i = darray_find_p(*l, i, i->id == id); 1403 + return i ? i->live_child : 0; 1404 } 1405 1406 static unsigned __live_child(struct snapshot_table *t, u32 id, ··· 1434 { 1435 struct snapshot_delete *d = &c->snapshot_delete; 1436 1437 + guard(rcu)(); 1438 + return __live_child(rcu_dereference(c->snapshots), id, 1439 + &d->delete_leaves, &d->delete_interior); 1440 } 1441 1442 static bool snapshot_id_dying(struct snapshot_delete *d, unsigned id) ··· 1695 static inline u32 bch2_snapshot_nth_parent_skip(struct bch_fs *c, u32 id, u32 n, 1696 interior_delete_list *skip) 1697 { 1698 + guard(rcu)(); 1699 while (interior_delete_has_id(skip, id)) 1700 id = __bch2_snapshot_parent(c, id); 1701 ··· 1704 id = __bch2_snapshot_parent(c, id); 1705 } while (interior_delete_has_id(skip, id)); 1706 } 1707 1708 return id; 1709 } ··· 1870 d->running = false; 1871 mutex_unlock(&d->progress_lock); 1872 bch2_trans_put(trans); 1873 + 1874 + bch2_recovery_pass_set_no_ratelimit(c, BCH_RECOVERY_PASS_check_snapshots); 1875 out_unlock: 1876 mutex_unlock(&d->lock); 1877 if (!bch2_err_matches(ret, EROFS)) ··· 1905 1906 BUG_ON(!test_bit(BCH_FS_may_go_rw, &c->flags)); 1907 1908 + if (!queue_work(system_long_wq, &c->snapshot_delete.work)) 1909 enumerated_ref_put(&c->writes, BCH_WRITE_REF_delete_dead_snapshots); 1910 } 1911
+37 -48
fs/bcachefs/snapshot.h
··· 46 47 static inline u32 bch2_snapshot_tree(struct bch_fs *c, u32 id) 48 { 49 - rcu_read_lock(); 50 const struct snapshot_t *s = snapshot_t(c, id); 51 - id = s ? s->tree : 0; 52 - rcu_read_unlock(); 53 - 54 - return id; 55 } 56 57 static inline u32 __bch2_snapshot_parent_early(struct bch_fs *c, u32 id) ··· 59 60 static inline u32 bch2_snapshot_parent_early(struct bch_fs *c, u32 id) 61 { 62 - rcu_read_lock(); 63 - id = __bch2_snapshot_parent_early(c, id); 64 - rcu_read_unlock(); 65 - 66 - return id; 67 } 68 69 static inline u32 __bch2_snapshot_parent(struct bch_fs *c, u32 id) ··· 82 83 static inline u32 bch2_snapshot_parent(struct bch_fs *c, u32 id) 84 { 85 - rcu_read_lock(); 86 - id = __bch2_snapshot_parent(c, id); 87 - rcu_read_unlock(); 88 - 89 - return id; 90 } 91 92 static inline u32 bch2_snapshot_nth_parent(struct bch_fs *c, u32 id, u32 n) 93 { 94 - rcu_read_lock(); 95 while (n--) 96 id = __bch2_snapshot_parent(c, id); 97 - rcu_read_unlock(); 98 - 99 return id; 100 } 101 ··· 99 100 static inline u32 bch2_snapshot_root(struct bch_fs *c, u32 id) 101 { 102 - u32 parent; 103 104 - rcu_read_lock(); 105 while ((parent = __bch2_snapshot_parent(c, id))) 106 id = parent; 107 - rcu_read_unlock(); 108 - 109 return id; 110 } 111 ··· 115 116 static inline enum snapshot_id_state bch2_snapshot_id_state(struct bch_fs *c, u32 id) 117 { 118 - rcu_read_lock(); 119 - enum snapshot_id_state ret = __bch2_snapshot_id_state(c, id); 120 - rcu_read_unlock(); 121 - 122 - return ret; 123 } 124 125 static inline bool bch2_snapshot_exists(struct bch_fs *c, u32 id) ··· 126 127 static inline int bch2_snapshot_is_internal_node(struct bch_fs *c, u32 id) 128 { 129 - rcu_read_lock(); 130 const struct snapshot_t *s = snapshot_t(c, id); 131 - int ret = s ? s->children[0] : -BCH_ERR_invalid_snapshot_node; 132 - rcu_read_unlock(); 133 - 134 - return ret; 135 } 136 137 static inline int bch2_snapshot_is_leaf(struct bch_fs *c, u32 id) ··· 141 142 static inline u32 bch2_snapshot_depth(struct bch_fs *c, u32 parent) 143 { 144 - u32 depth; 145 - 146 - rcu_read_lock(); 147 - depth = parent ? snapshot_t(c, parent)->depth + 1 : 0; 148 - rcu_read_unlock(); 149 - 150 - return depth; 151 } 152 153 bool __bch2_snapshot_is_ancestor(struct bch_fs *, u32, u32); ··· 156 157 static inline bool bch2_snapshot_has_children(struct bch_fs *c, u32 id) 158 { 159 - rcu_read_lock(); 160 const struct snapshot_t *t = snapshot_t(c, id); 161 - bool ret = t && (t->children[0]|t->children[1]) != 0; 162 - rcu_read_unlock(); 163 - 164 - return ret; 165 } 166 167 static inline bool snapshot_list_has_id(snapshot_id_list *s, u32 id) 168 { 169 - darray_for_each(*s, i) 170 - if (*i == id) 171 - return true; 172 - return false; 173 } 174 175 static inline bool snapshot_list_has_ancestor(struct bch_fs *c, snapshot_id_list *s, u32 id) ··· 226 return likely(bch2_snapshot_exists(trans->c, k.k->p.snapshot)) 227 ? 0 228 : __bch2_check_key_has_snapshot(trans, iter, k); 229 } 230 231 int bch2_snapshot_node_set_deleted(struct btree_trans *, u32);
··· 46 47 static inline u32 bch2_snapshot_tree(struct bch_fs *c, u32 id) 48 { 49 + guard(rcu)(); 50 const struct snapshot_t *s = snapshot_t(c, id); 51 + return s ? s->tree : 0; 52 } 53 54 static inline u32 __bch2_snapshot_parent_early(struct bch_fs *c, u32 id) ··· 62 63 static inline u32 bch2_snapshot_parent_early(struct bch_fs *c, u32 id) 64 { 65 + guard(rcu)(); 66 + return __bch2_snapshot_parent_early(c, id); 67 } 68 69 static inline u32 __bch2_snapshot_parent(struct bch_fs *c, u32 id) ··· 88 89 static inline u32 bch2_snapshot_parent(struct bch_fs *c, u32 id) 90 { 91 + guard(rcu)(); 92 + return __bch2_snapshot_parent(c, id); 93 } 94 95 static inline u32 bch2_snapshot_nth_parent(struct bch_fs *c, u32 id, u32 n) 96 { 97 + guard(rcu)(); 98 while (n--) 99 id = __bch2_snapshot_parent(c, id); 100 return id; 101 } 102 ··· 110 111 static inline u32 bch2_snapshot_root(struct bch_fs *c, u32 id) 112 { 113 + guard(rcu)(); 114 115 + u32 parent; 116 while ((parent = __bch2_snapshot_parent(c, id))) 117 id = parent; 118 return id; 119 } 120 ··· 128 129 static inline enum snapshot_id_state bch2_snapshot_id_state(struct bch_fs *c, u32 id) 130 { 131 + guard(rcu)(); 132 + return __bch2_snapshot_id_state(c, id); 133 } 134 135 static inline bool bch2_snapshot_exists(struct bch_fs *c, u32 id) ··· 142 143 static inline int bch2_snapshot_is_internal_node(struct bch_fs *c, u32 id) 144 { 145 + guard(rcu)(); 146 const struct snapshot_t *s = snapshot_t(c, id); 147 + return s ? s->children[0] : -BCH_ERR_invalid_snapshot_node; 148 } 149 150 static inline int bch2_snapshot_is_leaf(struct bch_fs *c, u32 id) ··· 160 161 static inline u32 bch2_snapshot_depth(struct bch_fs *c, u32 parent) 162 { 163 + guard(rcu)(); 164 + return parent ? snapshot_t(c, parent)->depth + 1 : 0; 165 } 166 167 bool __bch2_snapshot_is_ancestor(struct bch_fs *, u32, u32); ··· 180 181 static inline bool bch2_snapshot_has_children(struct bch_fs *c, u32 id) 182 { 183 + guard(rcu)(); 184 const struct snapshot_t *t = snapshot_t(c, id); 185 + return t && (t->children[0]|t->children[1]) != 0; 186 } 187 188 static inline bool snapshot_list_has_id(snapshot_id_list *s, u32 id) 189 { 190 + return darray_find(*s, id) != NULL; 191 } 192 193 static inline bool snapshot_list_has_ancestor(struct bch_fs *c, snapshot_id_list *s, u32 id) ··· 256 return likely(bch2_snapshot_exists(trans->c, k.k->p.snapshot)) 257 ? 0 258 : __bch2_check_key_has_snapshot(trans, iter, k); 259 + } 260 + 261 + int __bch2_get_snapshot_overwrites(struct btree_trans *, 262 + enum btree_id, struct bpos, 263 + snapshot_id_list *); 264 + 265 + /* 266 + * Get a list of snapshot IDs that have overwritten a given key: 267 + */ 268 + static inline int bch2_get_snapshot_overwrites(struct btree_trans *trans, 269 + enum btree_id btree, struct bpos pos, 270 + snapshot_id_list *s) 271 + { 272 + darray_init(s); 273 + 274 + return bch2_snapshot_has_children(trans->c, pos.snapshot) 275 + ? __bch2_get_snapshot_overwrites(trans, btree, pos, s) 276 + : 0; 277 + 278 } 279 280 int bch2_snapshot_node_set_deleted(struct btree_trans *, u32);
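Several helpers in snapshot.h above drop open-coded "loop until match" bodies in favour of darray_find()/darray_find_p(). The sketch below shows the general shape of such a find macro over a small dynamic-array type; sketch_find and struct u32_list are invented for illustration and are not the bcachefs darray API.

#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>

struct u32_list {
	unsigned	*data;
	size_t		 nr;
};

/* Return a pointer to the first element equal to _v, or NULL if absent. */
#define sketch_find(_list, _v)						\
({									\
	unsigned *_ret = NULL;						\
	for (size_t _i = 0; _i < (_list).nr; _i++)			\
		if ((_list).data[_i] == (_v)) {				\
			_ret = &(_list).data[_i];			\
			break;						\
		}							\
	_ret;								\
})

/* Same shape as the new snapshot_list_has_id() above. */
static bool list_has_id(struct u32_list *s, unsigned id)
{
	return sketch_find(*s, id) != NULL;
}

int main(void)
{
	unsigned ids[] = { 3, 7, 42 };
	struct u32_list s = { .data = ids, .nr = 3 };

	printf("has 7: %d, has 9: %d\n",
	       list_has_id(&s, 7), list_has_id(&s, 9));
	return 0;
}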
+158 -87
fs/bcachefs/str_hash.c
··· 31 } 32 } 33 34 - static noinline int fsck_rename_dirent(struct btree_trans *trans, 35 - struct snapshots_seen *s, 36 - const struct bch_hash_desc desc, 37 - struct bch_hash_info *hash_info, 38 - struct bkey_s_c_dirent old) 39 { 40 struct qstr old_name = bch2_dirent_get_name(old); 41 - struct bkey_i_dirent *new = bch2_trans_kmalloc(trans, bkey_bytes(old.k) + 32); 42 int ret = PTR_ERR_OR_ZERO(new); 43 if (ret) 44 return ret; ··· 48 dirent_copy_target(new, old); 49 new->k.p = old.k->p; 50 51 for (unsigned i = 0; i < 1000; i++) { 52 - unsigned len = sprintf(new->v.d_name, "%.*s.fsck_renamed-%u", 53 - old_name.len, old_name.name, i); 54 - unsigned u64s = BKEY_U64s + dirent_val_u64s(len, 0); 55 56 - if (u64s > U8_MAX) 57 - return -EINVAL; 58 59 - new->k.u64s = u64s; 60 61 ret = bch2_hash_set_in_snapshot(trans, bch2_dirent_hash_desc, hash_info, 62 (subvol_inum) { 0, old.k->p.inode }, 63 old.k->p.snapshot, &new->k_i, 64 - BTREE_UPDATE_internal_snapshot_node); 65 - if (!bch2_err_matches(ret, EEXIST)) 66 break; 67 } 68 69 - if (ret) 70 - return ret; 71 - 72 - return bch2_fsck_update_backpointers(trans, s, desc, hash_info, &new->k_i); 73 } 74 75 static noinline int hash_pick_winner(struct btree_trans *trans, ··· 198 #endif 199 bch2_print_str(c, KERN_ERR, buf.buf); 200 printbuf_exit(&buf); 201 - ret = -BCH_ERR_fsck_repair_unimplemented; 202 goto err; 203 } 204 ··· 233 return ret; 234 } 235 236 int __bch2_str_hash_check_key(struct btree_trans *trans, 237 struct snapshots_seen *s, 238 const struct bch_hash_desc *desc, 239 struct bch_hash_info *hash_info, 240 - struct btree_iter *k_iter, struct bkey_s_c hash_k) 241 { 242 struct bch_fs *c = trans->c; 243 struct btree_iter iter = {}; ··· 355 356 for_each_btree_key_norestart(trans, iter, desc->btree_id, 357 SPOS(hash_k.k->p.inode, hash, hash_k.k->p.snapshot), 358 - BTREE_ITER_slots, k, ret) { 359 if (bkey_eq(k.k->p, hash_k.k->p)) 360 break; 361 362 if (k.k->type == desc->key_type && 363 - !desc->cmp_bkey(k, hash_k)) 364 - goto duplicate_entries; 365 - 366 - if (bkey_deleted(k.k)) { 367 - bch2_trans_iter_exit(trans, &iter); 368 - goto bad_hash; 369 } 370 } 371 - out: 372 bch2_trans_iter_exit(trans, &iter); 373 printbuf_exit(&buf); 374 return ret; 375 bad_hash: 376 /* 377 * Before doing any repair, check hash_info itself: 378 */ ··· 388 goto out; 389 390 if (fsck_err(trans, hash_table_key_wrong_offset, 391 - "hash table key at wrong offset: btree %s inode %llu offset %llu, hashed to %llu\n%s", 392 - bch2_btree_id_str(desc->btree_id), hash_k.k->p.inode, hash_k.k->p.offset, hash, 393 - (printbuf_reset(&buf), 394 - bch2_bkey_val_to_text(&buf, c, hash_k), buf.buf))) { 395 - struct bkey_i *new = bch2_bkey_make_mut_noupdate(trans, hash_k); 396 - if (IS_ERR(new)) 397 - return PTR_ERR(new); 398 - 399 - k = bch2_hash_set_or_get_in_snapshot(trans, &iter, *desc, hash_info, 400 - (subvol_inum) { 0, hash_k.k->p.inode }, 401 - hash_k.k->p.snapshot, new, 402 - STR_HASH_must_create| 403 - BTREE_ITER_with_updates| 404 - BTREE_UPDATE_internal_snapshot_node); 405 - ret = bkey_err(k); 406 - if (ret) 407 - goto out; 408 - if (k.k) 409 - goto duplicate_entries; 410 - 411 - ret = bch2_hash_delete_at(trans, *desc, hash_info, k_iter, 412 - BTREE_UPDATE_internal_snapshot_node) ?: 413 - bch2_fsck_update_backpointers(trans, s, *desc, hash_info, new) ?: 414 - bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc) ?: 415 - -BCH_ERR_transaction_restart_nested; 416 - goto out; 417 - } 418 - fsck_err: 419 - goto out; 420 - duplicate_entries: 421 - ret = hash_pick_winner(trans, 
*desc, hash_info, hash_k, k); 422 - if (ret < 0) 423 - goto out; 424 - 425 - if (!fsck_err(trans, hash_table_key_duplicate, 426 - "duplicate hash table keys%s:\n%s", 427 - ret != 2 ? "" : ", both point to valid inodes", 428 - (printbuf_reset(&buf), 429 - bch2_bkey_val_to_text(&buf, c, hash_k), 430 - prt_newline(&buf), 431 - bch2_bkey_val_to_text(&buf, c, k), 432 - buf.buf))) 433 - goto out; 434 - 435 - switch (ret) { 436 - case 0: 437 - ret = bch2_hash_delete_at(trans, *desc, hash_info, k_iter, 0); 438 - break; 439 - case 1: 440 - ret = bch2_hash_delete_at(trans, *desc, hash_info, &iter, 0); 441 - break; 442 - case 2: 443 - ret = fsck_rename_dirent(trans, s, *desc, hash_info, bkey_s_c_to_dirent(hash_k)) ?: 444 - bch2_hash_delete_at(trans, *desc, hash_info, k_iter, 0); 445 - goto out; 446 - } 447 - 448 - ret = bch2_trans_commit(trans, NULL, NULL, 0) ?: 449 - -BCH_ERR_transaction_restart_nested; 450 goto out; 451 }
··· 31 } 32 } 33 34 + static int bch2_fsck_rename_dirent(struct btree_trans *trans, 35 + struct snapshots_seen *s, 36 + const struct bch_hash_desc desc, 37 + struct bch_hash_info *hash_info, 38 + struct bkey_s_c_dirent old, 39 + bool *updated_before_k_pos) 40 { 41 struct qstr old_name = bch2_dirent_get_name(old); 42 + struct bkey_i_dirent *new = bch2_trans_kmalloc(trans, BKEY_U64s_MAX * sizeof(u64)); 43 int ret = PTR_ERR_OR_ZERO(new); 44 if (ret) 45 return ret; ··· 47 dirent_copy_target(new, old); 48 new->k.p = old.k->p; 49 50 + char *renamed_buf = bch2_trans_kmalloc(trans, old_name.len + 20); 51 + ret = PTR_ERR_OR_ZERO(renamed_buf); 52 + if (ret) 53 + return ret; 54 + 55 for (unsigned i = 0; i < 1000; i++) { 56 + new->k.u64s = BKEY_U64s_MAX; 57 58 + struct qstr renamed_name = (struct qstr) QSTR_INIT(renamed_buf, 59 + sprintf(renamed_buf, "%.*s.fsck_renamed-%u", 60 + old_name.len, old_name.name, i)); 61 62 + ret = bch2_dirent_init_name(new, hash_info, &renamed_name, NULL); 63 + if (ret) 64 + return ret; 65 66 ret = bch2_hash_set_in_snapshot(trans, bch2_dirent_hash_desc, hash_info, 67 (subvol_inum) { 0, old.k->p.inode }, 68 old.k->p.snapshot, &new->k_i, 69 + BTREE_UPDATE_internal_snapshot_node| 70 + STR_HASH_must_create); 71 + if (ret && !bch2_err_matches(ret, EEXIST)) 72 break; 73 + if (!ret) { 74 + if (bpos_lt(new->k.p, old.k->p)) 75 + *updated_before_k_pos = true; 76 + break; 77 + } 78 } 79 80 + ret = ret ?: bch2_fsck_update_backpointers(trans, s, desc, hash_info, &new->k_i); 81 + bch_err_fn(trans->c, ret); 82 + return ret; 83 } 84 85 static noinline int hash_pick_winner(struct btree_trans *trans, ··· 186 #endif 187 bch2_print_str(c, KERN_ERR, buf.buf); 188 printbuf_exit(&buf); 189 + ret = bch_err_throw(c, fsck_repair_unimplemented); 190 goto err; 191 } 192 ··· 221 return ret; 222 } 223 224 + /* Put a str_hash key in its proper location, checking for duplicates */ 225 + int bch2_str_hash_repair_key(struct btree_trans *trans, 226 + struct snapshots_seen *s, 227 + const struct bch_hash_desc *desc, 228 + struct bch_hash_info *hash_info, 229 + struct btree_iter *k_iter, struct bkey_s_c k, 230 + struct btree_iter *dup_iter, struct bkey_s_c dup_k, 231 + bool *updated_before_k_pos) 232 + { 233 + struct bch_fs *c = trans->c; 234 + struct printbuf buf = PRINTBUF; 235 + bool free_snapshots_seen = false; 236 + int ret = 0; 237 + 238 + if (!s) { 239 + s = bch2_trans_kmalloc(trans, sizeof(*s)); 240 + ret = PTR_ERR_OR_ZERO(s); 241 + if (ret) 242 + goto out; 243 + 244 + s->pos = k_iter->pos; 245 + darray_init(&s->ids); 246 + 247 + ret = bch2_get_snapshot_overwrites(trans, desc->btree_id, k_iter->pos, &s->ids); 248 + if (ret) 249 + goto out; 250 + 251 + free_snapshots_seen = true; 252 + } 253 + 254 + if (!dup_k.k) { 255 + struct bkey_i *new = bch2_bkey_make_mut_noupdate(trans, k); 256 + ret = PTR_ERR_OR_ZERO(new); 257 + if (ret) 258 + goto out; 259 + 260 + dup_k = bch2_hash_set_or_get_in_snapshot(trans, dup_iter, *desc, hash_info, 261 + (subvol_inum) { 0, new->k.p.inode }, 262 + new->k.p.snapshot, new, 263 + STR_HASH_must_create| 264 + BTREE_ITER_with_updates| 265 + BTREE_UPDATE_internal_snapshot_node); 266 + ret = bkey_err(dup_k); 267 + if (ret) 268 + goto out; 269 + if (dup_k.k) 270 + goto duplicate_entries; 271 + 272 + if (bpos_lt(new->k.p, k.k->p)) 273 + *updated_before_k_pos = true; 274 + 275 + ret = bch2_insert_snapshot_whiteouts(trans, desc->btree_id, 276 + k_iter->pos, new->k.p) ?: 277 + bch2_hash_delete_at(trans, *desc, hash_info, k_iter, 278 + BTREE_ITER_with_updates| 279 + 
BTREE_UPDATE_internal_snapshot_node) ?: 280 + bch2_fsck_update_backpointers(trans, s, *desc, hash_info, new) ?: 281 + bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc) ?: 282 + -BCH_ERR_transaction_restart_commit; 283 + } else { 284 + duplicate_entries: 285 + ret = hash_pick_winner(trans, *desc, hash_info, k, dup_k); 286 + if (ret < 0) 287 + goto out; 288 + 289 + if (!fsck_err(trans, hash_table_key_duplicate, 290 + "duplicate hash table keys%s:\n%s", 291 + ret != 2 ? "" : ", both point to valid inodes", 292 + (printbuf_reset(&buf), 293 + bch2_bkey_val_to_text(&buf, c, k), 294 + prt_newline(&buf), 295 + bch2_bkey_val_to_text(&buf, c, dup_k), 296 + buf.buf))) 297 + goto out; 298 + 299 + switch (ret) { 300 + case 0: 301 + ret = bch2_hash_delete_at(trans, *desc, hash_info, k_iter, 0); 302 + break; 303 + case 1: 304 + ret = bch2_hash_delete_at(trans, *desc, hash_info, dup_iter, 0); 305 + break; 306 + case 2: 307 + ret = bch2_fsck_rename_dirent(trans, s, *desc, hash_info, 308 + bkey_s_c_to_dirent(k), 309 + updated_before_k_pos) ?: 310 + bch2_hash_delete_at(trans, *desc, hash_info, k_iter, 311 + BTREE_ITER_with_updates); 312 + goto out; 313 + } 314 + 315 + ret = bch2_trans_commit(trans, NULL, NULL, 0) ?: 316 + -BCH_ERR_transaction_restart_commit; 317 + } 318 + out: 319 + fsck_err: 320 + bch2_trans_iter_exit(trans, dup_iter); 321 + printbuf_exit(&buf); 322 + if (free_snapshots_seen) 323 + darray_exit(&s->ids); 324 + return ret; 325 + } 326 + 327 int __bch2_str_hash_check_key(struct btree_trans *trans, 328 struct snapshots_seen *s, 329 const struct bch_hash_desc *desc, 330 struct bch_hash_info *hash_info, 331 + struct btree_iter *k_iter, struct bkey_s_c hash_k, 332 + bool *updated_before_k_pos) 333 { 334 struct bch_fs *c = trans->c; 335 struct btree_iter iter = {}; ··· 239 240 for_each_btree_key_norestart(trans, iter, desc->btree_id, 241 SPOS(hash_k.k->p.inode, hash, hash_k.k->p.snapshot), 242 + BTREE_ITER_slots| 243 + BTREE_ITER_with_updates, k, ret) { 244 if (bkey_eq(k.k->p, hash_k.k->p)) 245 break; 246 247 if (k.k->type == desc->key_type && 248 + !desc->cmp_bkey(k, hash_k)) { 249 + ret = check_inode_hash_info_matches_root(trans, hash_k.k->p.inode, 250 + hash_info) ?: 251 + bch2_str_hash_repair_key(trans, s, desc, hash_info, 252 + k_iter, hash_k, 253 + &iter, k, updated_before_k_pos); 254 + break; 255 } 256 + 257 + if (bkey_deleted(k.k)) 258 + goto bad_hash; 259 } 260 bch2_trans_iter_exit(trans, &iter); 261 + out: 262 + fsck_err: 263 printbuf_exit(&buf); 264 return ret; 265 bad_hash: 266 + bch2_trans_iter_exit(trans, &iter); 267 /* 268 * Before doing any repair, check hash_info itself: 269 */ ··· 265 goto out; 266 267 if (fsck_err(trans, hash_table_key_wrong_offset, 268 + "hash table key at wrong offset: should be at %llu\n%s", 269 + hash, 270 + (bch2_bkey_val_to_text(&buf, c, hash_k), buf.buf))) 271 + ret = bch2_str_hash_repair_key(trans, s, desc, hash_info, 272 + k_iter, hash_k, 273 + &iter, bkey_s_c_null, 274 + updated_before_k_pos); 275 goto out; 276 }
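Note on the duplicate-key path above: when hash_pick_winner() reports that both duplicates point at valid inodes (case 2), the surviving dirent is renamed rather than dropped, using a "<old name>.fsck_renamed-<counter>" pattern with up to 1000 attempts. A standalone illustration of the names that loop produces (ordinary userspace C, not bcachefs code):

	#include <stdio.h>

	int main(void)
	{
		const char *old_name = "foo.txt";
		int old_len = 7;

		for (unsigned i = 0; i < 3; i++) {
			char buf[64];

			/* same format string as bch2_fsck_rename_dirent() above */
			snprintf(buf, sizeof(buf), "%.*s.fsck_renamed-%u",
				 old_len, old_name, i);
			printf("%s\n", buf);	/* foo.txt.fsck_renamed-0, -1, -2 */
		}
		return 0;
	}

The renamed_buf allocation of old_name.len + 20 bytes fits the 14-byte ".fsck_renamed-" suffix, a counter of at most three digits (i < 1000) and the terminating NUL.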
+18 -6
fs/bcachefs/str_hash.h
··· 261 struct bkey_i *insert, 262 enum btree_iter_update_trigger_flags flags) 263 { 264 struct btree_iter slot = {}; 265 struct bkey_s_c k; 266 bool found = false; ··· 289 } 290 291 if (!ret) 292 - ret = -BCH_ERR_ENOSPC_str_hash_create; 293 out: 294 bch2_trans_iter_exit(trans, &slot); 295 bch2_trans_iter_exit(trans, iter); ··· 301 bch2_trans_iter_exit(trans, &slot); 302 return k; 303 } else if (!found && (flags & STR_HASH_must_replace)) { 304 - ret = -BCH_ERR_ENOENT_str_hash_set_must_replace; 305 } else { 306 if (!found && slot.path) 307 swap(*iter, slot); ··· 329 return ret; 330 if (k.k) { 331 bch2_trans_iter_exit(trans, &iter); 332 - return -BCH_ERR_EEXIST_str_hash_set; 333 } 334 335 return 0; ··· 398 int bch2_repair_inode_hash_info(struct btree_trans *, struct bch_inode_unpacked *); 399 400 struct snapshots_seen; 401 int __bch2_str_hash_check_key(struct btree_trans *, 402 struct snapshots_seen *, 403 const struct bch_hash_desc *, 404 struct bch_hash_info *, 405 - struct btree_iter *, struct bkey_s_c); 406 407 static inline int bch2_str_hash_check_key(struct btree_trans *trans, 408 struct snapshots_seen *s, 409 const struct bch_hash_desc *desc, 410 struct bch_hash_info *hash_info, 411 - struct btree_iter *k_iter, struct bkey_s_c hash_k) 412 { 413 if (hash_k.k->type != desc->key_type) 414 return 0; ··· 426 if (likely(desc->hash_bkey(hash_info, hash_k) == hash_k.k->p.offset)) 427 return 0; 428 429 - return __bch2_str_hash_check_key(trans, s, desc, hash_info, k_iter, hash_k); 430 } 431 432 #endif /* _BCACHEFS_STR_HASH_H */
··· 261 struct bkey_i *insert, 262 enum btree_iter_update_trigger_flags flags) 263 { 264 + struct bch_fs *c = trans->c; 265 struct btree_iter slot = {}; 266 struct bkey_s_c k; 267 bool found = false; ··· 288 } 289 290 if (!ret) 291 + ret = bch_err_throw(c, ENOSPC_str_hash_create); 292 out: 293 bch2_trans_iter_exit(trans, &slot); 294 bch2_trans_iter_exit(trans, iter); ··· 300 bch2_trans_iter_exit(trans, &slot); 301 return k; 302 } else if (!found && (flags & STR_HASH_must_replace)) { 303 + ret = bch_err_throw(c, ENOENT_str_hash_set_must_replace); 304 } else { 305 if (!found && slot.path) 306 swap(*iter, slot); ··· 328 return ret; 329 if (k.k) { 330 bch2_trans_iter_exit(trans, &iter); 331 + return bch_err_throw(trans->c, EEXIST_str_hash_set); 332 } 333 334 return 0; ··· 397 int bch2_repair_inode_hash_info(struct btree_trans *, struct bch_inode_unpacked *); 398 399 struct snapshots_seen; 400 + int bch2_str_hash_repair_key(struct btree_trans *, 401 + struct snapshots_seen *, 402 + const struct bch_hash_desc *, 403 + struct bch_hash_info *, 404 + struct btree_iter *, struct bkey_s_c, 405 + struct btree_iter *, struct bkey_s_c, 406 + bool *); 407 + 408 int __bch2_str_hash_check_key(struct btree_trans *, 409 struct snapshots_seen *, 410 const struct bch_hash_desc *, 411 struct bch_hash_info *, 412 + struct btree_iter *, struct bkey_s_c, 413 + bool *); 414 415 static inline int bch2_str_hash_check_key(struct btree_trans *trans, 416 struct snapshots_seen *s, 417 const struct bch_hash_desc *desc, 418 struct bch_hash_info *hash_info, 419 + struct btree_iter *k_iter, struct bkey_s_c hash_k, 420 + bool *updated_before_k_pos) 421 { 422 if (hash_k.k->type != desc->key_type) 423 return 0; ··· 415 if (likely(desc->hash_bkey(hash_info, hash_k) == hash_k.k->p.offset)) 416 return 0; 417 418 + return __bch2_str_hash_check_key(trans, s, desc, hash_info, k_iter, hash_k, 419 + updated_before_k_pos); 420 } 421 422 #endif /* _BCACHEFS_STR_HASH_H */
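Both __bch2_str_hash_check_key() and the new bch2_str_hash_repair_key() now take a bool *updated_before_k_pos, which the str_hash.c hunks above set (via the bpos_lt() checks) when repair inserts a key at a position earlier than the one being checked. The consumer of that flag lives in fs/bcachefs/fsck.c and is not part of this diff; a hypothetical caller would look roughly like the following sketch:

	bool updated_before_k_pos = false;

	ret = bch2_str_hash_check_key(trans, s, desc, hash_info,
				      k_iter, hash_k, &updated_before_k_pos);
	if (!ret && updated_before_k_pos) {
		/*
		 * Repair created a key behind the iterator's current
		 * position; presumably the caller re-checks that range so
		 * the new key does not get skipped.
		 */
	}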
+31 -14
fs/bcachefs/subvolume.c
··· 130 "subvolume %llu points to missing subvolume root %llu:%u", 131 k.k->p.offset, le64_to_cpu(subvol.v->inode), 132 le32_to_cpu(subvol.v->snapshot))) { 133 - ret = bch2_subvolume_delete(trans, iter->pos.offset); 134 - bch_err_msg(c, ret, "deleting subvolume %llu", iter->pos.offset); 135 - ret = ret ?: -BCH_ERR_transaction_restart_nested; 136 - goto err; 137 } 138 } else { 139 goto err; ··· 151 152 if (!BCH_SUBVOLUME_SNAP(subvol.v)) { 153 u32 snapshot_root = bch2_snapshot_root(c, le32_to_cpu(subvol.v->snapshot)); 154 - u32 snapshot_tree; 155 struct bch_snapshot_tree st; 156 - 157 - rcu_read_lock(); 158 - snapshot_tree = snapshot_t(c, snapshot_root)->tree; 159 - rcu_read_unlock(); 160 - 161 ret = bch2_snapshot_tree_lookup(trans, snapshot_tree, &st); 162 163 bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), c, ··· 265 prt_printf(out, " creation_parent %u", le32_to_cpu(s.v->creation_parent)); 266 prt_printf(out, " fs_parent %u", le32_to_cpu(s.v->fs_path_parent)); 267 } 268 } 269 270 static int subvolume_children_mod(struct btree_trans *trans, struct bpos pos, bool set) ··· 499 500 static int bch2_subvolume_delete(struct btree_trans *trans, u32 subvolid) 501 { 502 - return bch2_subvolumes_reparent(trans, subvolid) ?: 503 commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 504 __bch2_subvolume_delete(trans, subvolid)); 505 } 506 507 static void bch2_subvolume_wait_for_pagecache_and_delete(struct work_struct *work) ··· 613 ret = bch2_bkey_get_empty_slot(trans, &dst_iter, 614 BTREE_ID_subvolumes, POS(0, U32_MAX)); 615 if (ret == -BCH_ERR_ENOSPC_btree_slot) 616 - ret = -BCH_ERR_ENOSPC_subvolume_create; 617 if (ret) 618 return ret; 619 ··· 719 return ret; 720 721 if (!bkey_is_inode(k.k)) { 722 - bch_err(trans->c, "root inode not found"); 723 - ret = -BCH_ERR_ENOENT_inode; 724 goto err; 725 } 726
··· 130 "subvolume %llu points to missing subvolume root %llu:%u", 131 k.k->p.offset, le64_to_cpu(subvol.v->inode), 132 le32_to_cpu(subvol.v->snapshot))) { 133 + /* 134 + * Recreate - any contents that are still disconnected 135 + * will then get reattached under lost+found 136 + */ 137 + bch2_inode_init_early(c, &inode); 138 + bch2_inode_init_late(c, &inode, bch2_current_time(c), 139 + 0, 0, S_IFDIR|0700, 0, NULL); 140 + inode.bi_inum = le64_to_cpu(subvol.v->inode); 141 + inode.bi_snapshot = le32_to_cpu(subvol.v->snapshot); 142 + inode.bi_subvol = k.k->p.offset; 143 + inode.bi_parent_subvol = le32_to_cpu(subvol.v->fs_path_parent); 144 + ret = __bch2_fsck_write_inode(trans, &inode); 145 + if (ret) 146 + goto err; 147 } 148 } else { 149 goto err; ··· 141 142 if (!BCH_SUBVOLUME_SNAP(subvol.v)) { 143 u32 snapshot_root = bch2_snapshot_root(c, le32_to_cpu(subvol.v->snapshot)); 144 + u32 snapshot_tree = bch2_snapshot_tree(c, snapshot_root); 145 + 146 struct bch_snapshot_tree st; 147 ret = bch2_snapshot_tree_lookup(trans, snapshot_tree, &st); 148 149 bch2_fs_inconsistent_on(bch2_err_matches(ret, ENOENT), c, ··· 259 prt_printf(out, " creation_parent %u", le32_to_cpu(s.v->creation_parent)); 260 prt_printf(out, " fs_parent %u", le32_to_cpu(s.v->fs_path_parent)); 261 } 262 + 263 + if (BCH_SUBVOLUME_RO(s.v)) 264 + prt_printf(out, " ro"); 265 + if (BCH_SUBVOLUME_SNAP(s.v)) 266 + prt_printf(out, " snapshot"); 267 + if (BCH_SUBVOLUME_UNLINKED(s.v)) 268 + prt_printf(out, " unlinked"); 269 } 270 271 static int subvolume_children_mod(struct btree_trans *trans, struct bpos pos, bool set) ··· 486 487 static int bch2_subvolume_delete(struct btree_trans *trans, u32 subvolid) 488 { 489 + int ret = bch2_subvolumes_reparent(trans, subvolid) ?: 490 commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 491 __bch2_subvolume_delete(trans, subvolid)); 492 + 493 + bch2_recovery_pass_set_no_ratelimit(trans->c, BCH_RECOVERY_PASS_check_subvols); 494 + return ret; 495 } 496 497 static void bch2_subvolume_wait_for_pagecache_and_delete(struct work_struct *work) ··· 597 ret = bch2_bkey_get_empty_slot(trans, &dst_iter, 598 BTREE_ID_subvolumes, POS(0, U32_MAX)); 599 if (ret == -BCH_ERR_ENOSPC_btree_slot) 600 + ret = bch_err_throw(c, ENOSPC_subvolume_create); 601 if (ret) 602 return ret; 603 ··· 703 return ret; 704 705 if (!bkey_is_inode(k.k)) { 706 + struct bch_fs *c = trans->c; 707 + bch_err(c, "root inode not found"); 708 + ret = bch_err_throw(c, ENOENT_inode); 709 goto err; 710 } 711
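In check_subvol(), the open-coded rcu_read_lock() / snapshot_t(c, snapshot_root)->tree / rcu_read_unlock() sequence is replaced by a bch2_snapshot_tree() call. The helper itself is not part of this hunk; judging by the code it replaces, it is plausibly something along these lines (sketch only, the real definition lives in the bcachefs snapshot headers):

	static inline u32 bch2_snapshot_tree(struct bch_fs *c, u32 id)
	{
		guard(rcu)();	/* replaces the explicit lock/unlock pair */
		return snapshot_t(c, id)->tree;
	}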
+4 -4
fs/bcachefs/super-io.c
··· 1112 prt_str(&buf, ")"); 1113 bch2_fs_fatal_error(c, ": %s", buf.buf); 1114 printbuf_exit(&buf); 1115 - ret = -BCH_ERR_sb_not_downgraded; 1116 goto out; 1117 } 1118 ··· 1142 1143 if (c->opts.errors != BCH_ON_ERROR_continue && 1144 c->opts.errors != BCH_ON_ERROR_fix_safe) { 1145 - ret = -BCH_ERR_erofs_sb_err; 1146 bch2_fs_fatal_error(c, "%s", buf.buf); 1147 } else { 1148 bch_err(c, "%s", buf.buf); ··· 1161 ca->disk_sb.seq); 1162 bch2_fs_fatal_error(c, "%s", buf.buf); 1163 printbuf_exit(&buf); 1164 - ret = -BCH_ERR_erofs_sb_err; 1165 } 1166 } 1167 ··· 1215 !can_mount_with_written), c, 1216 ": Unable to write superblock to sufficient devices (from %ps)", 1217 (void *) _RET_IP_)) 1218 - ret = -BCH_ERR_erofs_sb_err; 1219 out: 1220 /* Make new options visible after they're persistent: */ 1221 bch2_sb_update(c);
··· 1112 prt_str(&buf, ")"); 1113 bch2_fs_fatal_error(c, ": %s", buf.buf); 1114 printbuf_exit(&buf); 1115 + ret = bch_err_throw(c, sb_not_downgraded); 1116 goto out; 1117 } 1118 ··· 1142 1143 if (c->opts.errors != BCH_ON_ERROR_continue && 1144 c->opts.errors != BCH_ON_ERROR_fix_safe) { 1145 + ret = bch_err_throw(c, erofs_sb_err); 1146 bch2_fs_fatal_error(c, "%s", buf.buf); 1147 } else { 1148 bch_err(c, "%s", buf.buf); ··· 1161 ca->disk_sb.seq); 1162 bch2_fs_fatal_error(c, "%s", buf.buf); 1163 printbuf_exit(&buf); 1164 + ret = bch_err_throw(c, erofs_sb_err); 1165 } 1166 } 1167 ··· 1215 !can_mount_with_written), c, 1216 ": Unable to write superblock to sufficient devices (from %ps)", 1217 (void *) _RET_IP_)) 1218 + ret = bch_err_throw(c, erofs_sb_err); 1219 out: 1220 /* Make new options visible after they're persistent: */ 1221 bch2_sb_update(c);
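These hunks, like the matching ones in str_hash.c, subvolume.c and super.c, replace bare -BCH_ERR_* returns with bch_err_throw(c, ...). The macro's definition is not part of this diff; given the error_throw tracepoint added in trace.h below, it plausibly emits the tracepoint before producing the same negative error code, roughly:

	/*
	 * Assumed shape only; the real macro lives in the bcachefs error
	 * headers and may differ in detail:
	 */
	#define bch_err_throw(_c, _err)					\
	({								\
		struct bch_fs *_fs = (_c);				\
		int _ret = -BCH_ERR_##_err;				\
		trace_error_throw(_fs, _ret, _THIS_IP_);		\
		_ret;							\
	})

Assuming that shape, every such return site now reports the filesystem, the error name and the return address instead of silently handing back an errcode.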
+50 -56
fs/bcachefs/super.c
··· 219 220 struct bch_fs *bch2_dev_to_fs(dev_t dev) 221 { 222 struct bch_fs *c; 223 - 224 - mutex_lock(&bch_fs_list_lock); 225 - rcu_read_lock(); 226 - 227 list_for_each_entry(c, &bch_fs_list, list) 228 for_each_member_device_rcu(c, ca, NULL) 229 if (ca->disk_sb.bdev && ca->disk_sb.bdev->bd_dev == dev) { 230 closure_get(&c->cl); 231 - goto found; 232 } 233 - c = NULL; 234 - found: 235 - rcu_read_unlock(); 236 - mutex_unlock(&bch_fs_list_lock); 237 - 238 - return c; 239 } 240 241 static struct bch_fs *__bch2_uuid_to_fs(__uuid_t uuid) ··· 474 BUG_ON(!test_bit(BCH_FS_may_go_rw, &c->flags)); 475 476 if (WARN_ON(c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info))) 477 - return -BCH_ERR_erofs_no_alloc_info; 478 479 if (test_bit(BCH_FS_initial_gc_unfixed, &c->flags)) { 480 bch_err(c, "cannot go rw, unfixed btree errors"); 481 - return -BCH_ERR_erofs_unfixed_errors; 482 } 483 484 if (c->sb.features & BIT_ULL(BCH_FEATURE_small_image)) { 485 bch_err(c, "cannot go rw, filesystem is an unresized image file"); 486 - return -BCH_ERR_erofs_filesystem_full; 487 } 488 489 if (test_bit(BCH_FS_rw, &c->flags)) ··· 501 502 clear_bit(BCH_FS_clean_shutdown, &c->flags); 503 504 - rcu_read_lock(); 505 - for_each_online_member_rcu(c, ca) 506 - if (ca->mi.state == BCH_MEMBER_STATE_rw) { 507 - bch2_dev_allocator_add(c, ca); 508 - enumerated_ref_start(&ca->io_ref[WRITE]); 509 - } 510 - rcu_read_unlock(); 511 512 bch2_recalc_capacity(c); 513 ··· 564 { 565 if (c->opts.recovery_pass_last && 566 c->opts.recovery_pass_last < BCH_RECOVERY_PASS_journal_replay) 567 - return -BCH_ERR_erofs_norecovery; 568 569 if (c->opts.nochanges) 570 - return -BCH_ERR_erofs_nochanges; 571 572 if (c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info)) 573 - return -BCH_ERR_erofs_no_alloc_info; 574 575 return __bch2_fs_read_write(c, false); 576 } ··· 755 if (c->sb.multi_device && 756 __bch2_uuid_to_fs(c->sb.uuid)) { 757 bch_err(c, "filesystem UUID already open"); 758 - return -BCH_ERR_filesystem_uuid_already_open; 759 } 760 761 ret = bch2_fs_chardev_init(c); ··· 814 WQ_HIGHPRI|WQ_FREEZABLE|WQ_MEM_RECLAIM, 1)) || 815 !(c->write_ref_wq = alloc_workqueue("bcachefs_write_ref", 816 WQ_FREEZABLE, 0))) 817 - return -BCH_ERR_ENOMEM_fs_other_alloc; 818 819 int ret = bch2_fs_btree_interior_update_init(c) ?: 820 bch2_fs_btree_write_buffer_init(c) ?: ··· 995 mempool_init_kvmalloc_pool(&c->btree_bounce_pool, 1, 996 c->opts.btree_node_size) || 997 mempool_init_kmalloc_pool(&c->large_bkey_pool, 1, 2048)) { 998 - ret = -BCH_ERR_ENOMEM_fs_other_alloc; 999 goto err; 1000 } 1001 ··· 1031 ret = -EINVAL; 1032 goto err; 1033 } 1034 - bch_info(c, "Using encoding defined by superblock: utf8-%u.%u.%u", 1035 - unicode_major(BCH_FS_DEFAULT_UTF8_ENCODING), 1036 - unicode_minor(BCH_FS_DEFAULT_UTF8_ENCODING), 1037 - unicode_rev(BCH_FS_DEFAULT_UTF8_ENCODING)); 1038 #else 1039 if (c->sb.features & BIT_ULL(BCH_FEATURE_casefolding)) { 1040 printk(KERN_ERR "Cannot mount a filesystem with casefolding on a kernel without CONFIG_UNICODE\n"); ··· 1148 1149 print_mount_opts(c); 1150 1151 if (!bch2_fs_may_start(c)) 1152 - return -BCH_ERR_insufficient_devices_to_start; 1153 1154 down_write(&c->state_lock); 1155 mutex_lock(&c->sb_lock); ··· 1167 sizeof(struct bch_sb_field_ext) / sizeof(u64))) { 1168 mutex_unlock(&c->sb_lock); 1169 up_write(&c->state_lock); 1170 - ret = -BCH_ERR_ENOSPC_sb; 1171 goto err; 1172 } 1173 ··· 1178 goto err; 1179 } 1180 1181 - rcu_read_lock(); 1182 - for_each_online_member_rcu(c, ca) 1183 - bch2_members_v2_get_mut(c->disk_sb.sb, ca->dev_idx)->last_mount = 1184 
- cpu_to_le64(now); 1185 - rcu_read_unlock(); 1186 1187 /* 1188 * Dno't write superblock yet: recovery might have to downgrade 1189 */ 1190 mutex_unlock(&c->sb_lock); 1191 1192 - rcu_read_lock(); 1193 - for_each_online_member_rcu(c, ca) 1194 - if (ca->mi.state == BCH_MEMBER_STATE_rw) 1195 - bch2_dev_allocator_add(c, ca); 1196 - rcu_read_unlock(); 1197 bch2_recalc_capacity(c); 1198 up_write(&c->state_lock); 1199 ··· 1209 goto err; 1210 1211 if (bch2_fs_init_fault("fs_start")) { 1212 - ret = -BCH_ERR_injected_fs_start; 1213 goto err; 1214 } 1215 ··· 1236 struct bch_member m = bch2_sb_member_get(sb, sb->dev_idx); 1237 1238 if (le16_to_cpu(sb->block_size) != block_sectors(c)) 1239 - return -BCH_ERR_mismatched_block_size; 1240 1241 if (le16_to_cpu(m.bucket_size) < 1242 BCH_SB_BTREE_NODE_SIZE(c->disk_sb.sb)) 1243 - return -BCH_ERR_bucket_size_too_small; 1244 1245 return 0; 1246 } ··· 1551 bch2_dev_attach(c, ca, dev_idx); 1552 return 0; 1553 err: 1554 - return -BCH_ERR_ENOMEM_dev_alloc; 1555 } 1556 1557 static int __bch2_dev_attach_bdev(struct bch_dev *ca, struct bch_sb_handle *sb) ··· 1561 if (bch2_dev_is_online(ca)) { 1562 bch_err(ca, "already have device online in slot %u", 1563 sb->sb->dev_idx); 1564 - return -BCH_ERR_device_already_online; 1565 } 1566 1567 if (get_capacity(sb->bdev->bd_disk) < 1568 ca->mi.bucket_size * ca->mi.nbuckets) { 1569 bch_err(ca, "cannot online: device too small"); 1570 - return -BCH_ERR_device_size_too_small; 1571 } 1572 1573 BUG_ON(!enumerated_ref_is_zero(&ca->io_ref[READ])); ··· 1719 return 0; 1720 1721 if (!bch2_dev_state_allowed(c, ca, new_state, flags)) 1722 - return -BCH_ERR_device_state_not_allowed; 1723 1724 if (new_state != BCH_MEMBER_STATE_rw) 1725 __bch2_dev_read_only(c, ca); ··· 1772 1773 if (!bch2_dev_state_allowed(c, ca, BCH_MEMBER_STATE_failed, flags)) { 1774 bch_err(ca, "Cannot remove without losing data"); 1775 - ret = -BCH_ERR_device_state_not_allowed; 1776 goto err; 1777 } 1778 ··· 1908 if (list_empty(&c->list)) { 1909 mutex_lock(&bch_fs_list_lock); 1910 if (__bch2_uuid_to_fs(c->sb.uuid)) 1911 - ret = -BCH_ERR_filesystem_uuid_already_open; 1912 else 1913 list_add(&c->list, &bch_fs_list); 1914 mutex_unlock(&bch_fs_list_lock); ··· 2095 if (!bch2_dev_state_allowed(c, ca, BCH_MEMBER_STATE_failed, flags)) { 2096 bch_err(ca, "Cannot offline required disk"); 2097 up_write(&c->state_lock); 2098 - return -BCH_ERR_device_state_not_allowed; 2099 } 2100 2101 __bch2_dev_offline(c, ca); ··· 2134 if (nbuckets > BCH_MEMBER_NBUCKETS_MAX) { 2135 bch_err(ca, "New device size too big (%llu greater than max %u)", 2136 nbuckets, BCH_MEMBER_NBUCKETS_MAX); 2137 - ret = -BCH_ERR_device_size_too_big; 2138 goto err; 2139 } 2140 ··· 2142 get_capacity(ca->disk_sb.bdev->bd_disk) < 2143 ca->mi.bucket_size * nbuckets) { 2144 bch_err(ca, "New size larger than device"); 2145 - ret = -BCH_ERR_device_size_too_small; 2146 goto err; 2147 } 2148 ··· 2377 } 2378 2379 if (opts->nochanges && !opts->read_only) { 2380 - ret = -BCH_ERR_erofs_nochanges; 2381 goto err_print; 2382 } 2383
··· 219 220 struct bch_fs *bch2_dev_to_fs(dev_t dev) 221 { 222 + guard(mutex)(&bch_fs_list_lock); 223 + guard(rcu)(); 224 + 225 struct bch_fs *c; 226 list_for_each_entry(c, &bch_fs_list, list) 227 for_each_member_device_rcu(c, ca, NULL) 228 if (ca->disk_sb.bdev && ca->disk_sb.bdev->bd_dev == dev) { 229 closure_get(&c->cl); 230 + return c; 231 } 232 + return NULL; 233 } 234 235 static struct bch_fs *__bch2_uuid_to_fs(__uuid_t uuid) ··· 480 BUG_ON(!test_bit(BCH_FS_may_go_rw, &c->flags)); 481 482 if (WARN_ON(c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info))) 483 + return bch_err_throw(c, erofs_no_alloc_info); 484 485 if (test_bit(BCH_FS_initial_gc_unfixed, &c->flags)) { 486 bch_err(c, "cannot go rw, unfixed btree errors"); 487 + return bch_err_throw(c, erofs_unfixed_errors); 488 } 489 490 if (c->sb.features & BIT_ULL(BCH_FEATURE_small_image)) { 491 bch_err(c, "cannot go rw, filesystem is an unresized image file"); 492 + return bch_err_throw(c, erofs_filesystem_full); 493 } 494 495 if (test_bit(BCH_FS_rw, &c->flags)) ··· 507 508 clear_bit(BCH_FS_clean_shutdown, &c->flags); 509 510 + scoped_guard(rcu) 511 + for_each_online_member_rcu(c, ca) 512 + if (ca->mi.state == BCH_MEMBER_STATE_rw) { 513 + bch2_dev_allocator_add(c, ca); 514 + enumerated_ref_start(&ca->io_ref[WRITE]); 515 + } 516 517 bch2_recalc_capacity(c); 518 ··· 571 { 572 if (c->opts.recovery_pass_last && 573 c->opts.recovery_pass_last < BCH_RECOVERY_PASS_journal_replay) 574 + return bch_err_throw(c, erofs_norecovery); 575 576 if (c->opts.nochanges) 577 + return bch_err_throw(c, erofs_nochanges); 578 579 if (c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info)) 580 + return bch_err_throw(c, erofs_no_alloc_info); 581 582 return __bch2_fs_read_write(c, false); 583 } ··· 762 if (c->sb.multi_device && 763 __bch2_uuid_to_fs(c->sb.uuid)) { 764 bch_err(c, "filesystem UUID already open"); 765 + return bch_err_throw(c, filesystem_uuid_already_open); 766 } 767 768 ret = bch2_fs_chardev_init(c); ··· 821 WQ_HIGHPRI|WQ_FREEZABLE|WQ_MEM_RECLAIM, 1)) || 822 !(c->write_ref_wq = alloc_workqueue("bcachefs_write_ref", 823 WQ_FREEZABLE, 0))) 824 + return bch_err_throw(c, ENOMEM_fs_other_alloc); 825 826 int ret = bch2_fs_btree_interior_update_init(c) ?: 827 bch2_fs_btree_write_buffer_init(c) ?: ··· 1002 mempool_init_kvmalloc_pool(&c->btree_bounce_pool, 1, 1003 c->opts.btree_node_size) || 1004 mempool_init_kmalloc_pool(&c->large_bkey_pool, 1, 2048)) { 1005 + ret = bch_err_throw(c, ENOMEM_fs_other_alloc); 1006 goto err; 1007 } 1008 ··· 1038 ret = -EINVAL; 1039 goto err; 1040 } 1041 #else 1042 if (c->sb.features & BIT_ULL(BCH_FEATURE_casefolding)) { 1043 printk(KERN_ERR "Cannot mount a filesystem with casefolding on a kernel without CONFIG_UNICODE\n"); ··· 1159 1160 print_mount_opts(c); 1161 1162 + #ifdef CONFIG_UNICODE 1163 + bch_info(c, "Using encoding defined by superblock: utf8-%u.%u.%u", 1164 + unicode_major(BCH_FS_DEFAULT_UTF8_ENCODING), 1165 + unicode_minor(BCH_FS_DEFAULT_UTF8_ENCODING), 1166 + unicode_rev(BCH_FS_DEFAULT_UTF8_ENCODING)); 1167 + #endif 1168 + 1169 if (!bch2_fs_may_start(c)) 1170 + return bch_err_throw(c, insufficient_devices_to_start); 1171 1172 down_write(&c->state_lock); 1173 mutex_lock(&c->sb_lock); ··· 1171 sizeof(struct bch_sb_field_ext) / sizeof(u64))) { 1172 mutex_unlock(&c->sb_lock); 1173 up_write(&c->state_lock); 1174 + ret = bch_err_throw(c, ENOSPC_sb); 1175 goto err; 1176 } 1177 ··· 1182 goto err; 1183 } 1184 1185 + scoped_guard(rcu) 1186 + for_each_online_member_rcu(c, ca) 1187 + bch2_members_v2_get_mut(c->disk_sb.sb, 
ca->dev_idx)->last_mount = 1188 + cpu_to_le64(now); 1189 1190 /* 1191 * Dno't write superblock yet: recovery might have to downgrade 1192 */ 1193 mutex_unlock(&c->sb_lock); 1194 1195 + scoped_guard(rcu) 1196 + for_each_online_member_rcu(c, ca) 1197 + if (ca->mi.state == BCH_MEMBER_STATE_rw) 1198 + bch2_dev_allocator_add(c, ca); 1199 bch2_recalc_capacity(c); 1200 up_write(&c->state_lock); 1201 ··· 1215 goto err; 1216 1217 if (bch2_fs_init_fault("fs_start")) { 1218 + ret = bch_err_throw(c, injected_fs_start); 1219 goto err; 1220 } 1221 ··· 1242 struct bch_member m = bch2_sb_member_get(sb, sb->dev_idx); 1243 1244 if (le16_to_cpu(sb->block_size) != block_sectors(c)) 1245 + return bch_err_throw(c, mismatched_block_size); 1246 1247 if (le16_to_cpu(m.bucket_size) < 1248 BCH_SB_BTREE_NODE_SIZE(c->disk_sb.sb)) 1249 + return bch_err_throw(c, bucket_size_too_small); 1250 1251 return 0; 1252 } ··· 1557 bch2_dev_attach(c, ca, dev_idx); 1558 return 0; 1559 err: 1560 + return bch_err_throw(c, ENOMEM_dev_alloc); 1561 } 1562 1563 static int __bch2_dev_attach_bdev(struct bch_dev *ca, struct bch_sb_handle *sb) ··· 1567 if (bch2_dev_is_online(ca)) { 1568 bch_err(ca, "already have device online in slot %u", 1569 sb->sb->dev_idx); 1570 + return bch_err_throw(ca->fs, device_already_online); 1571 } 1572 1573 if (get_capacity(sb->bdev->bd_disk) < 1574 ca->mi.bucket_size * ca->mi.nbuckets) { 1575 bch_err(ca, "cannot online: device too small"); 1576 + return bch_err_throw(ca->fs, device_size_too_small); 1577 } 1578 1579 BUG_ON(!enumerated_ref_is_zero(&ca->io_ref[READ])); ··· 1725 return 0; 1726 1727 if (!bch2_dev_state_allowed(c, ca, new_state, flags)) 1728 + return bch_err_throw(c, device_state_not_allowed); 1729 1730 if (new_state != BCH_MEMBER_STATE_rw) 1731 __bch2_dev_read_only(c, ca); ··· 1778 1779 if (!bch2_dev_state_allowed(c, ca, BCH_MEMBER_STATE_failed, flags)) { 1780 bch_err(ca, "Cannot remove without losing data"); 1781 + ret = bch_err_throw(c, device_state_not_allowed); 1782 goto err; 1783 } 1784 ··· 1914 if (list_empty(&c->list)) { 1915 mutex_lock(&bch_fs_list_lock); 1916 if (__bch2_uuid_to_fs(c->sb.uuid)) 1917 + ret = bch_err_throw(c, filesystem_uuid_already_open); 1918 else 1919 list_add(&c->list, &bch_fs_list); 1920 mutex_unlock(&bch_fs_list_lock); ··· 2101 if (!bch2_dev_state_allowed(c, ca, BCH_MEMBER_STATE_failed, flags)) { 2102 bch_err(ca, "Cannot offline required disk"); 2103 up_write(&c->state_lock); 2104 + return bch_err_throw(c, device_state_not_allowed); 2105 } 2106 2107 __bch2_dev_offline(c, ca); ··· 2140 if (nbuckets > BCH_MEMBER_NBUCKETS_MAX) { 2141 bch_err(ca, "New device size too big (%llu greater than max %u)", 2142 nbuckets, BCH_MEMBER_NBUCKETS_MAX); 2143 + ret = bch_err_throw(c, device_size_too_big); 2144 goto err; 2145 } 2146 ··· 2148 get_capacity(ca->disk_sb.bdev->bd_disk) < 2149 ca->mi.bucket_size * nbuckets) { 2150 bch_err(ca, "New size larger than device"); 2151 + ret = bch_err_throw(c, device_size_too_small); 2152 goto err; 2153 } 2154 ··· 2383 } 2384 2385 if (opts->nochanges && !opts->read_only) { 2386 + ret = bch_err_throw(c, erofs_nochanges); 2387 goto err_print; 2388 } 2389
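Alongside the bch_err_throw() conversions, super.c switches several rcu_read_lock()/rcu_read_unlock() pairs to the <linux/cleanup.h> lock guards: guard(rcu)() holds the RCU read lock until the end of the enclosing scope, scoped_guard(rcu) holds it for the statement that follows, and the unlock is emitted automatically on every exit path; that is what lets bch2_dev_to_fs() simply return from inside the loop. The same shape in a minimal, self-contained form (the helper name here is made up for illustration):

	/* Hypothetical helper, mirroring the converted code above: */
	static bool bch2_any_member_rw(struct bch_fs *c)
	{
		guard(rcu)();	/* rcu_read_unlock() runs on every return */

		for_each_member_device_rcu(c, ca, NULL)
			if (ca->mi.state == BCH_MEMBER_STATE_rw)
				return true;
		return false;
	}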
+24
fs/bcachefs/sysfs.c
··· 26 #include "disk_groups.h" 27 #include "ec.h" 28 #include "enumerated_ref.h" 29 #include "inode.h" 30 #include "journal.h" 31 #include "journal_reclaim.h" ··· 38 #include "rebalance.h" 39 #include "recovery_passes.h" 40 #include "replicas.h" 41 #include "super-io.h" 42 #include "tests.h" 43 ··· 145 write_attribute(trigger_gc); 146 write_attribute(trigger_discards); 147 write_attribute(trigger_invalidates); 148 write_attribute(trigger_journal_flush); 149 write_attribute(trigger_journal_writes); 150 write_attribute(trigger_btree_cache_shrink); ··· 154 write_attribute(trigger_freelist_wakeup); 155 write_attribute(trigger_recalc_capacity); 156 write_attribute(trigger_delete_dead_snapshots); 157 read_attribute(gc_gens_pos); 158 159 read_attribute(uuid); ··· 176 177 read_attribute(btree_cache_size); 178 read_attribute(compression_stats); 179 read_attribute(journal_debug); 180 read_attribute(btree_cache); 181 read_attribute(btree_key_cache); ··· 358 if (attr == &sysfs_compression_stats) 359 bch2_compression_stats_to_text(out, c); 360 361 if (attr == &sysfs_new_stripes) 362 bch2_new_stripes_to_text(out, c); 363 ··· 436 if (attr == &sysfs_trigger_invalidates) 437 bch2_do_invalidates(c); 438 439 if (attr == &sysfs_trigger_journal_flush) { 440 bch2_journal_flush_all_pins(&c->journal); 441 bch2_journal_meta(&c->journal); ··· 458 459 if (attr == &sysfs_trigger_delete_dead_snapshots) 460 __bch2_delete_dead_snapshots(c); 461 462 #ifdef CONFIG_BCACHEFS_TESTS 463 if (attr == &sysfs_perf_test) { ··· 504 &sysfs_recovery_status, 505 506 &sysfs_compression_stats, 507 508 #ifdef CONFIG_BCACHEFS_TESTS 509 &sysfs_perf_test, ··· 593 &sysfs_trigger_gc, 594 &sysfs_trigger_discards, 595 &sysfs_trigger_invalidates, 596 &sysfs_trigger_journal_flush, 597 &sysfs_trigger_journal_writes, 598 &sysfs_trigger_btree_cache_shrink, ··· 602 &sysfs_trigger_freelist_wakeup, 603 &sysfs_trigger_recalc_capacity, 604 &sysfs_trigger_delete_dead_snapshots, 605 606 &sysfs_gc_gens_pos, 607
··· 26 #include "disk_groups.h" 27 #include "ec.h" 28 #include "enumerated_ref.h" 29 + #include "error.h" 30 #include "inode.h" 31 #include "journal.h" 32 #include "journal_reclaim.h" ··· 37 #include "rebalance.h" 38 #include "recovery_passes.h" 39 #include "replicas.h" 40 + #include "sb-errors.h" 41 #include "super-io.h" 42 #include "tests.h" 43 ··· 143 write_attribute(trigger_gc); 144 write_attribute(trigger_discards); 145 write_attribute(trigger_invalidates); 146 + write_attribute(trigger_journal_commit); 147 write_attribute(trigger_journal_flush); 148 write_attribute(trigger_journal_writes); 149 write_attribute(trigger_btree_cache_shrink); ··· 151 write_attribute(trigger_freelist_wakeup); 152 write_attribute(trigger_recalc_capacity); 153 write_attribute(trigger_delete_dead_snapshots); 154 + write_attribute(trigger_emergency_read_only); 155 read_attribute(gc_gens_pos); 156 157 read_attribute(uuid); ··· 172 173 read_attribute(btree_cache_size); 174 read_attribute(compression_stats); 175 + read_attribute(errors); 176 read_attribute(journal_debug); 177 read_attribute(btree_cache); 178 read_attribute(btree_key_cache); ··· 353 if (attr == &sysfs_compression_stats) 354 bch2_compression_stats_to_text(out, c); 355 356 + if (attr == &sysfs_errors) 357 + bch2_fs_errors_to_text(out, c); 358 + 359 if (attr == &sysfs_new_stripes) 360 bch2_new_stripes_to_text(out, c); 361 ··· 428 if (attr == &sysfs_trigger_invalidates) 429 bch2_do_invalidates(c); 430 431 + if (attr == &sysfs_trigger_journal_commit) 432 + bch2_journal_flush(&c->journal); 433 + 434 if (attr == &sysfs_trigger_journal_flush) { 435 bch2_journal_flush_all_pins(&c->journal); 436 bch2_journal_meta(&c->journal); ··· 447 448 if (attr == &sysfs_trigger_delete_dead_snapshots) 449 __bch2_delete_dead_snapshots(c); 450 + 451 + if (attr == &sysfs_trigger_emergency_read_only) { 452 + struct printbuf buf = PRINTBUF; 453 + bch2_log_msg_start(c, &buf); 454 + 455 + prt_printf(&buf, "shutdown by sysfs\n"); 456 + bch2_fs_emergency_read_only2(c, &buf); 457 + bch2_print_str(c, KERN_ERR, buf.buf); 458 + printbuf_exit(&buf); 459 + } 460 461 #ifdef CONFIG_BCACHEFS_TESTS 462 if (attr == &sysfs_perf_test) { ··· 483 &sysfs_recovery_status, 484 485 &sysfs_compression_stats, 486 + &sysfs_errors, 487 488 #ifdef CONFIG_BCACHEFS_TESTS 489 &sysfs_perf_test, ··· 571 &sysfs_trigger_gc, 572 &sysfs_trigger_discards, 573 &sysfs_trigger_invalidates, 574 + &sysfs_trigger_journal_commit, 575 &sysfs_trigger_journal_flush, 576 &sysfs_trigger_journal_writes, 577 &sysfs_trigger_btree_cache_shrink, ··· 579 &sysfs_trigger_freelist_wakeup, 580 &sysfs_trigger_recalc_capacity, 581 &sysfs_trigger_delete_dead_snapshots, 582 + &sysfs_trigger_emergency_read_only, 583 584 &sysfs_gc_gens_pos, 585
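User-visible additions in this file: a new read-only errors attribute backed by bch2_fs_errors_to_text(), plus two write-only triggers. Writing anything to trigger_journal_commit calls bch2_journal_flush(), and writing to trigger_emergency_read_only logs a "shutdown by sysfs" message and forces the filesystem read-only via bch2_fs_emergency_read_only2(). These sit alongside the existing attributes in the per-filesystem sysfs directory, typically /sys/fs/bcachefs/<fs uuid>/.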
+52 -17
fs/bcachefs/trace.h
··· 199 (unsigned long long)__entry->sector, __entry->nr_sector) 200 ); 201 202 /* disk_accounting.c */ 203 204 TRACE_EVENT(accounting_mem_insert, ··· 1475 TP_ARGS(c, str) 1476 ); 1477 1478 DEFINE_EVENT(fs_str, io_move_created_rebalance, 1479 TP_PROTO(struct bch_fs *c, const char *str), 1480 TP_ARGS(c, str) 1481 ); 1482 1483 - TRACE_EVENT(error_downcast, 1484 - TP_PROTO(int bch_err, int std_err, unsigned long ip), 1485 - TP_ARGS(bch_err, std_err, ip), 1486 - 1487 - TP_STRUCT__entry( 1488 - __array(char, bch_err, 32 ) 1489 - __array(char, std_err, 32 ) 1490 - __array(char, ip, 32 ) 1491 - ), 1492 - 1493 - TP_fast_assign( 1494 - strscpy(__entry->bch_err, bch2_err_str(bch_err), sizeof(__entry->bch_err)); 1495 - strscpy(__entry->std_err, bch2_err_str(std_err), sizeof(__entry->std_err)); 1496 - snprintf(__entry->ip, sizeof(__entry->ip), "%ps", (void *) ip); 1497 - ), 1498 - 1499 - TP_printk("%s -> %s %s", __entry->bch_err, __entry->std_err, __entry->ip) 1500 ); 1501 1502 #ifdef CONFIG_BCACHEFS_PATH_TRACEPOINTS
··· 199 (unsigned long long)__entry->sector, __entry->nr_sector) 200 ); 201 202 + /* errors */ 203 + 204 + TRACE_EVENT(error_throw, 205 + TP_PROTO(struct bch_fs *c, int bch_err, unsigned long ip), 206 + TP_ARGS(c, bch_err, ip), 207 + 208 + TP_STRUCT__entry( 209 + __field(dev_t, dev ) 210 + __field(int, err ) 211 + __array(char, err_str, 32 ) 212 + __array(char, ip, 32 ) 213 + ), 214 + 215 + TP_fast_assign( 216 + __entry->dev = c->dev; 217 + __entry->err = bch_err; 218 + strscpy(__entry->err_str, bch2_err_str(bch_err), sizeof(__entry->err_str)); 219 + snprintf(__entry->ip, sizeof(__entry->ip), "%ps", (void *) ip); 220 + ), 221 + 222 + TP_printk("%d,%d %s ret %s", MAJOR(__entry->dev), MINOR(__entry->dev), 223 + __entry->ip, __entry->err_str) 224 + ); 225 + 226 + TRACE_EVENT(error_downcast, 227 + TP_PROTO(int bch_err, int std_err, unsigned long ip), 228 + TP_ARGS(bch_err, std_err, ip), 229 + 230 + TP_STRUCT__entry( 231 + __array(char, bch_err, 32 ) 232 + __array(char, std_err, 32 ) 233 + __array(char, ip, 32 ) 234 + ), 235 + 236 + TP_fast_assign( 237 + strscpy(__entry->bch_err, bch2_err_str(bch_err), sizeof(__entry->bch_err)); 238 + strscpy(__entry->std_err, bch2_err_str(std_err), sizeof(__entry->std_err)); 239 + snprintf(__entry->ip, sizeof(__entry->ip), "%ps", (void *) ip); 240 + ), 241 + 242 + TP_printk("%s ret %s -> %s %s", __entry->ip, 243 + __entry->bch_err, __entry->std_err, __entry->ip) 244 + ); 245 + 246 /* disk_accounting.c */ 247 248 TRACE_EVENT(accounting_mem_insert, ··· 1431 TP_ARGS(c, str) 1432 ); 1433 1434 + DEFINE_EVENT(fs_str, io_move_pred, 1435 + TP_PROTO(struct bch_fs *c, const char *str), 1436 + TP_ARGS(c, str) 1437 + ); 1438 + 1439 DEFINE_EVENT(fs_str, io_move_created_rebalance, 1440 TP_PROTO(struct bch_fs *c, const char *str), 1441 TP_ARGS(c, str) 1442 ); 1443 1444 + DEFINE_EVENT(fs_str, io_move_evacuate_bucket, 1445 + TP_PROTO(struct bch_fs *c, const char *str), 1446 + TP_ARGS(c, str) 1447 ); 1448 1449 #ifdef CONFIG_BCACHEFS_PATH_TRACEPOINTS
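The TRACE_EVENT(error_throw, ...) above generates trace_error_throw() and trace_error_throw_enabled() in the usual way; the bch_err_throw() conversions earlier in this series are presumably what wire it into the error return paths. Once built with tracing, the event shows up as bcachefs:error_throw and can be enabled like any other tracepoint (for example through /sys/kernel/tracing/events/bcachefs/error_throw/enable); per its TP_printk() format, each hit records the device numbers, the calling return address and the symbolic error name. The two new fs_str events, io_move_pred and io_move_evacuate_bucket, follow the existing pattern of logging a preformatted string per filesystem.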