Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (64 commits)
ocfs2/net: Add debug interface to o2net
ocfs2: Only build ocfs2/dlm with the o2cb stack module
ocfs2/cluster: Get rid of arguments to the timeout routines
ocfs2: Put tree in MAINTAINERS
ocfs2: Use BUG_ON
ocfs2: Convert ocfs2 over to unlocked_ioctl
ocfs2: Improve rename locking
fs/ocfs2/aops.c: test for IS_ERR rather than 0
ocfs2: Add inode stealing for ocfs2_reserve_new_inode
ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
ocfs2: Enable cross extent block merge.
ocfs2: Add support for cross extent block
ocfs2: Move /sys/o2cb to /sys/fs/o2cb
sysfs: Allow removal of symlinks in the sysfs root
ocfs2: Reconnect after idle time out.
ocfs2/dlm: Cleanup lockres print
ocfs2/dlm: Fix lockname in lockres print function
ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
ocfs2/dlm: Dumps the purgelist into a debugfs file
...

+5804 -1046
+11
Documentation/ABI/obsolete/o2cb
+What:		/sys/o2cb symlink
+Date:		Dec 2005
+KernelVersion:	2.6.16
+Contact:	ocfs2-devel@oss.oracle.com
+Description:	This is a symlink: /sys/o2cb to /sys/fs/o2cb. The symlink will
+		be removed when new versions of ocfs2-tools which know to look
+		in /sys/fs/o2cb are sufficiently prevalent. Don't code new
+		software to look here, it should try /sys/fs/o2cb instead.
+		See Documentation/ABI/stable/o2cb for more information on usage.
+Users:		ocfs2-tools. It's sufficient to mail proposed changes to
+		ocfs2-devel@oss.oracle.com.
+10
Documentation/ABI/stable/o2cb
+What:		/sys/fs/o2cb/ (was /sys/o2cb)
+Date:		Dec 2005
+KernelVersion:	2.6.16
+Contact:	ocfs2-devel@oss.oracle.com
+Description:	Ocfs2-tools looks at 'interface-revision' for versioning
+		information. Each logmask/ file controls a set of debug prints
+		and can be written into with the strings "allow", "deny", or
+		"off". Reading the file returns the current state.
+Users:		ocfs2-tools. It's sufficient to mail proposed changes to
+		ocfs2-devel@oss.oracle.com.
+89
Documentation/ABI/testing/sysfs-ocfs2
+What:		/sys/fs/ocfs2/
+Date:		April 2008
+Contact:	ocfs2-devel@oss.oracle.com
+Description:
+		The /sys/fs/ocfs2 directory contains knobs used by the
+		ocfs2-tools to interact with the filesystem.
+
+What:		/sys/fs/ocfs2/max_locking_protocol
+Date:		April 2008
+Contact:	ocfs2-devel@oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/max_locking_protocol file displays version
+		of ocfs2 locking supported by the filesystem. This version
+		covers how ocfs2 uses distributed locking between cluster
+		nodes.
+
+		The protocol version has a major and minor number. Two
+		cluster nodes can interoperate if they have an identical
+		major number and an overlapping minor number - thus,
+		a node with version 1.10 can interoperate with a node
+		sporting version 1.8, as long as both use the 1.8 protocol.
+
+		Reading from this file returns a single line, the major
+		number and minor number joined by a period, eg "1.10".
+
+		This file is read-only. The value is compiled into the
+		driver.
+
+What:		/sys/fs/ocfs2/loaded_cluster_plugins
+Date:		April 2008
+Contact:	ocfs2-devel@oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/loaded_cluster_plugins file describes
+		the available plugins to support ocfs2 cluster operation.
+		A cluster plugin is required to use ocfs2 in a cluster.
+		There are currently two available plugins:
+
+		* 'o2cb' - The classic o2cb cluster stack that ocfs2 has
+			used since its inception.
+		* 'user' - A plugin supporting userspace cluster software
+			in conjunction with fs/dlm.
+
+		Reading from this file returns the names of all loaded
+		plugins, one per line.
+
+		This file is read-only. Its contents may change as
+		plugins are loaded or removed.
+
+What:		/sys/fs/ocfs2/active_cluster_plugin
+Date:		April 2008
+Contact:	ocfs2-devel@oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/active_cluster_plugin displays which
+		cluster plugin is currently in use by the filesystem.
+		The active plugin will appear in the loaded_cluster_plugins
+		file as well. Only one plugin can be used at a time.
+
+		Reading from this file returns the name of the active plugin
+		on a single line.
+
+		This file is read-only. Which plugin is active depends on
+		the cluster stack in use. The contents may change
+		when all filesystems are unmounted and the cluster stack
+		is changed.
+
+What:		/sys/fs/ocfs2/cluster_stack
+Date:		April 2008
+Contact:	ocfs2-devel@oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/cluster_stack file contains the name
+		of current ocfs2 cluster stack. This value is set by
+		userspace tools when bringing the cluster stack online.
+
+		Cluster stack names are 4 characters in length.
+
+		When the 'o2cb' cluster stack is used, the 'o2cb' cluster
+		plugin is active. All other cluster stacks use the 'user'
+		cluster plugin.
+
+		Reading from this file returns the name of the current
+		cluster stack on a single line.
+
+		Writing a new stack name to this file changes the current
+		cluster stack unless there are mounted ocfs2 filesystems.
+		If there are mounted filesystems, attempts to change the
+		stack return an error.
+
+Users:
+	ocfs2-tools <ocfs2-tools-devel@oss.oracle.com>
+10
Documentation/feature-removal-schedule.txt
···
 		code / infrastructure should be in the kernel and not in some
 		out-of-tree driver.
 Who:	Thomas Gleixner <tglx@linutronix.de>
+
+---------------------------
+
+What:	/sys/o2cb symlink
+When:	January 2010
+Why:	/sys/fs/o2cb is the proper location for this information - /sys/o2cb
+	exists as a symlink for backwards compatibility for old versions of
+	ocfs2-tools. 2 years should be sufficient time to phase in new versions
+	which know to look in /sys/fs/o2cb.
+Who:	ocfs2-devel@oss.oracle.com
+1
MAINTAINERS
···
 M:	joel.becker@oracle.com
 L:	ocfs2-devel@oss.oracle.com
 W:	http://oss.oracle.com/projects/ocfs2/
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git
 S:	Supported

 OMNIKEY CARDMAN 4000 DRIVER
+26
fs/Kconfig
···
 	  For more information on OCFS2, see the file
 	  <file:Documentation/filesystems/ocfs2.txt>.

+config OCFS2_FS_O2CB
+	tristate "O2CB Kernelspace Clustering"
+	depends on OCFS2_FS
+	default y
+	help
+	  OCFS2 includes a simple kernelspace clustering package, the OCFS2
+	  Cluster Base. It only requires a very small userspace component
+	  to configure it. This comes with the standard ocfs2-tools package.
+	  O2CB is limited to maintaining a cluster for OCFS2 file systems.
+	  It cannot manage any other cluster applications.
+
+	  It is always safe to say Y here, as the clustering method is
+	  run-time selectable.
+
+config OCFS2_FS_USERSPACE_CLUSTER
+	tristate "OCFS2 Userspace Clustering"
+	depends on OCFS2_FS && DLM
+	default y
+	help
+	  This option will allow OCFS2 to use userspace clustering services
+	  in conjunction with the DLM in fs/dlm. If you are using a
+	  userspace cluster manager, say Y here.
+
+	  It is safe to say Y, as the clustering method is run-time
+	  selectable.
+
 config OCFS2_DEBUG_MASKLOG
 	bool "OCFS2 logging support"
 	depends on OCFS2_FS
+12 -2
fs/ocfs2/Makefile
···

 EXTRA_CFLAGS += -DCATCH_BH_JBD_RACES

-obj-$(CONFIG_OCFS2_FS) += ocfs2.o
+obj-$(CONFIG_OCFS2_FS) +=	\
+	ocfs2.o			\
+	ocfs2_stackglue.o
+
+obj-$(CONFIG_OCFS2_FS_O2CB) += ocfs2_stack_o2cb.o
+obj-$(CONFIG_OCFS2_FS_USERSPACE_CLUSTER) += ocfs2_stack_user.o

 ocfs2-objs := \
 	alloc.o \
···
 	uptodate.o \
 	ver.o

+ocfs2_stackglue-objs := stackglue.o
+ocfs2_stack_o2cb-objs := stack_o2cb.o
+ocfs2_stack_user-objs := stack_user.o
+
+# cluster/ is always needed when OCFS2_FS for masklog support
 obj-$(CONFIG_OCFS2_FS) += cluster/
-obj-$(CONFIG_OCFS2_FS) += dlm/
+obj-$(CONFIG_OCFS2_FS_O2CB) += dlm/
+416 -53
fs/ocfs2/alloc.c
···
 	BUG_ON(!next_free);

 	/* The tree code before us didn't allow enough room in the leaf. */
-	if (el->l_next_free_rec == el->l_count && !has_empty)
-		BUG();
+	BUG_ON(el->l_next_free_rec == el->l_count && !has_empty);

 	/*
 	 * The easiest way to approach this is to just remove the
···
  * - When our insert into the right path leaf is at the leftmost edge
  *   and requires an update of the path immediately to it's left. This
  *   can occur at the end of some types of rotation and appending inserts.
+ * - When we've adjusted the last extent record in the left path leaf and the
+ *   1st extent record in the right path leaf during cross extent block merge.
  */
 static void ocfs2_complete_edge_insert(struct inode *inode, handle_t *handle,
 				       struct ocfs2_path *left_path,
···
 	}
 }

-/*
- * Remove split_rec clusters from the record at index and merge them
- * onto the beginning of the record at index + 1.
- */
-static int ocfs2_merge_rec_right(struct inode *inode, struct buffer_head *bh,
-				 handle_t *handle,
-				 struct ocfs2_extent_rec *split_rec,
-				 struct ocfs2_extent_list *el, int index)
+static int ocfs2_get_right_path(struct inode *inode,
+				struct ocfs2_path *left_path,
+				struct ocfs2_path **ret_right_path)
 {
 	int ret;
+	u32 right_cpos;
+	struct ocfs2_path *right_path = NULL;
+	struct ocfs2_extent_list *left_el;
+
+	*ret_right_path = NULL;
+
+	/* This function shouldn't be called for non-trees. */
+	BUG_ON(left_path->p_tree_depth == 0);
+
+	left_el = path_leaf_el(left_path);
+	BUG_ON(left_el->l_next_free_rec != left_el->l_count);
+
+	ret = ocfs2_find_cpos_for_right_leaf(inode->i_sb, left_path,
+					     &right_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	/* This function shouldn't be called for the rightmost leaf. */
+	BUG_ON(right_cpos == 0);
+
+	right_path = ocfs2_new_path(path_root_bh(left_path),
+				    path_root_el(left_path));
+	if (!right_path) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
+		goto out;
+	}
+
+	ret = ocfs2_find_path(inode, right_path, right_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	*ret_right_path = right_path;
+out:
+	if (ret)
+		ocfs2_free_path(right_path);
+	return ret;
+}
+
+/*
+ * Remove split_rec clusters from the record at index and merge them
+ * onto the beginning of the record "next" to it.
+ * For index < l_count - 1, the next means the extent rec at index + 1.
+ * For index == l_count - 1, the "next" means the 1st extent rec of the
+ * next extent block.
+ */
+static int ocfs2_merge_rec_right(struct inode *inode,
+				 struct ocfs2_path *left_path,
+				 handle_t *handle,
+				 struct ocfs2_extent_rec *split_rec,
+				 int index)
+{
+	int ret, next_free, i;
 	unsigned int split_clusters = le16_to_cpu(split_rec->e_leaf_clusters);
 	struct ocfs2_extent_rec *left_rec;
 	struct ocfs2_extent_rec *right_rec;
+	struct ocfs2_extent_list *right_el;
+	struct ocfs2_path *right_path = NULL;
+	int subtree_index = 0;
+	struct ocfs2_extent_list *el = path_leaf_el(left_path);
+	struct buffer_head *bh = path_leaf_bh(left_path);
+	struct buffer_head *root_bh = NULL;

 	BUG_ON(index >= le16_to_cpu(el->l_next_free_rec));
-
 	left_rec = &el->l_recs[index];
-	right_rec = &el->l_recs[index + 1];
+
+	if (index == le16_to_cpu(el->l_next_free_rec - 1) &&
+	    le16_to_cpu(el->l_next_free_rec) == le16_to_cpu(el->l_count)) {
+		/* we meet with a cross extent block merge. */
+		ret = ocfs2_get_right_path(inode, left_path, &right_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		right_el = path_leaf_el(right_path);
+		next_free = le16_to_cpu(right_el->l_next_free_rec);
+		BUG_ON(next_free <= 0);
+		right_rec = &right_el->l_recs[0];
+		if (ocfs2_is_empty_extent(right_rec)) {
+			BUG_ON(le16_to_cpu(next_free) <= 1);
+			right_rec = &right_el->l_recs[1];
+		}
+
+		BUG_ON(le32_to_cpu(left_rec->e_cpos) +
+		       le16_to_cpu(left_rec->e_leaf_clusters) !=
+		       le32_to_cpu(right_rec->e_cpos));
+
+		subtree_index = ocfs2_find_subtree_root(inode,
+							left_path, right_path);
+
+		ret = ocfs2_extend_rotate_transaction(handle, subtree_index,
+						      handle->h_buffer_credits,
+						      right_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		root_bh = left_path->p_node[subtree_index].bh;
+		BUG_ON(root_bh != right_path->p_node[subtree_index].bh);
+
+		ret = ocfs2_journal_access(handle, inode, root_bh,
+					   OCFS2_JOURNAL_ACCESS_WRITE);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		for (i = subtree_index + 1;
+		     i < path_num_items(right_path); i++) {
+			ret = ocfs2_journal_access(handle, inode,
+						   right_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+
+			ret = ocfs2_journal_access(handle, inode,
+						   left_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+		}
+
+	} else {
+		BUG_ON(index == le16_to_cpu(el->l_next_free_rec) - 1);
+		right_rec = &el->l_recs[index + 1];
+	}

 	ret = ocfs2_journal_access(handle, inode, bh,
 				   OCFS2_JOURNAL_ACCESS_WRITE);
···
 	if (ret)
 		mlog_errno(ret);

+	if (right_path) {
+		ret = ocfs2_journal_dirty(handle, path_leaf_bh(right_path));
+		if (ret)
+			mlog_errno(ret);
+
+		ocfs2_complete_edge_insert(inode, handle, left_path,
+					   right_path, subtree_index);
+	}
 out:
+	if (right_path)
+		ocfs2_free_path(right_path);
+	return ret;
+}
+
+static int ocfs2_get_left_path(struct inode *inode,
+			       struct ocfs2_path *right_path,
+			       struct ocfs2_path **ret_left_path)
+{
+	int ret;
+	u32 left_cpos;
+	struct ocfs2_path *left_path = NULL;
+
+	*ret_left_path = NULL;
+
+	/* This function shouldn't be called for non-trees. */
+	BUG_ON(right_path->p_tree_depth == 0);
+
+	ret = ocfs2_find_cpos_for_left_leaf(inode->i_sb,
+					    right_path, &left_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	/* This function shouldn't be called for the leftmost leaf. */
+	BUG_ON(left_cpos == 0);
+
+	left_path = ocfs2_new_path(path_root_bh(right_path),
+				   path_root_el(right_path));
+	if (!left_path) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
+		goto out;
+	}
+
+	ret = ocfs2_find_path(inode, left_path, left_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	*ret_left_path = left_path;
+out:
+	if (ret)
+		ocfs2_free_path(left_path);
 	return ret;
 }

 /*
  * Remove split_rec clusters from the record at index and merge them
- * onto the tail of the record at index - 1.
+ * onto the tail of the record "before" it.
+ * For index > 0, the "before" means the extent rec at index - 1.
+ *
+ * For index == 0, the "before" means the last record of the previous
+ * extent block. And there is also a situation that we may need to
+ * remove the rightmost leaf extent block in the right_path and change
+ * the right path to indicate the new rightmost path.
  */
-static int ocfs2_merge_rec_left(struct inode *inode, struct buffer_head *bh,
+static int ocfs2_merge_rec_left(struct inode *inode,
+				struct ocfs2_path *right_path,
 				handle_t *handle,
 				struct ocfs2_extent_rec *split_rec,
-				struct ocfs2_extent_list *el, int index)
+				struct ocfs2_cached_dealloc_ctxt *dealloc,
+				int index)
 {
-	int ret, has_empty_extent = 0;
+	int ret, i, subtree_index = 0, has_empty_extent = 0;
 	unsigned int split_clusters = le16_to_cpu(split_rec->e_leaf_clusters);
 	struct ocfs2_extent_rec *left_rec;
 	struct ocfs2_extent_rec *right_rec;
+	struct ocfs2_extent_list *el = path_leaf_el(right_path);
+	struct buffer_head *bh = path_leaf_bh(right_path);
+	struct buffer_head *root_bh = NULL;
+	struct ocfs2_path *left_path = NULL;
+	struct ocfs2_extent_list *left_el;

-	BUG_ON(index <= 0);
+	BUG_ON(index < 0);

-	left_rec = &el->l_recs[index - 1];
 	right_rec = &el->l_recs[index];
-	if (ocfs2_is_empty_extent(&el->l_recs[0]))
-		has_empty_extent = 1;
+	if (index == 0) {
+		/* we meet with a cross extent block merge. */
+		ret = ocfs2_get_left_path(inode, right_path, &left_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		left_el = path_leaf_el(left_path);
+		BUG_ON(le16_to_cpu(left_el->l_next_free_rec) !=
+		       le16_to_cpu(left_el->l_count));
+
+		left_rec = &left_el->l_recs[
+				le16_to_cpu(left_el->l_next_free_rec) - 1];
+		BUG_ON(le32_to_cpu(left_rec->e_cpos) +
+		       le16_to_cpu(left_rec->e_leaf_clusters) !=
+		       le32_to_cpu(split_rec->e_cpos));
+
+		subtree_index = ocfs2_find_subtree_root(inode,
+							left_path, right_path);
+
+		ret = ocfs2_extend_rotate_transaction(handle, subtree_index,
+						      handle->h_buffer_credits,
+						      left_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		root_bh = left_path->p_node[subtree_index].bh;
+		BUG_ON(root_bh != right_path->p_node[subtree_index].bh);
+
+		ret = ocfs2_journal_access(handle, inode, root_bh,
+					   OCFS2_JOURNAL_ACCESS_WRITE);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		for (i = subtree_index + 1;
+		     i < path_num_items(right_path); i++) {
+			ret = ocfs2_journal_access(handle, inode,
+						   right_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+
+			ret = ocfs2_journal_access(handle, inode,
+						   left_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+		}
+	} else {
+		left_rec = &el->l_recs[index - 1];
+		if (ocfs2_is_empty_extent(&el->l_recs[0]))
+			has_empty_extent = 1;
+	}

 	ret = ocfs2_journal_access(handle, inode, bh,
 				   OCFS2_JOURNAL_ACCESS_WRITE);
···
 		*left_rec = *split_rec;

 		has_empty_extent = 0;
-	} else {
+	} else
 		le16_add_cpu(&left_rec->e_leaf_clusters, split_clusters);
-	}

 	le32_add_cpu(&right_rec->e_cpos, split_clusters);
 	le64_add_cpu(&right_rec->e_blkno,
···
 	if (ret)
 		mlog_errno(ret);

+	if (left_path) {
+		ret = ocfs2_journal_dirty(handle, path_leaf_bh(left_path));
+		if (ret)
+			mlog_errno(ret);
+
+		/*
+		 * In the situation that the right_rec is empty and the extent
+		 * block is empty also, ocfs2_complete_edge_insert can't handle
+		 * it and we need to delete the right extent block.
+		 */
+		if (le16_to_cpu(right_rec->e_leaf_clusters) == 0 &&
+		    le16_to_cpu(el->l_next_free_rec) == 1) {
+
+			ret = ocfs2_remove_rightmost_path(inode, handle,
+							  right_path, dealloc);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+
+			/* Now the rightmost extent block has been deleted.
+			 * So we use the new rightmost path.
+			 */
+			ocfs2_mv_path(right_path, left_path);
+			left_path = NULL;
+		} else
+			ocfs2_complete_edge_insert(inode, handle, left_path,
+						   right_path, subtree_index);
+	}
 out:
+	if (left_path)
+		ocfs2_free_path(left_path);
 	return ret;
 }

 static int ocfs2_try_to_merge_extent(struct inode *inode,
 				     handle_t *handle,
-				     struct ocfs2_path *left_path,
+				     struct ocfs2_path *path,
 				     int split_index,
 				     struct ocfs2_extent_rec *split_rec,
 				     struct ocfs2_cached_dealloc_ctxt *dealloc,
···

 {
 	int ret = 0;
-	struct ocfs2_extent_list *el = path_leaf_el(left_path);
+	struct ocfs2_extent_list *el = path_leaf_el(path);
 	struct ocfs2_extent_rec *rec = &el->l_recs[split_index];

 	BUG_ON(ctxt->c_contig_type == CONTIG_NONE);
···
 		 * extents - having more than one in a leaf is
 		 * illegal.
 		 */
-		ret = ocfs2_rotate_tree_left(inode, handle, left_path,
+		ret = ocfs2_rotate_tree_left(inode, handle, path,
 					     dealloc);
 		if (ret) {
 			mlog_errno(ret);
···
 		 * Left-right contig implies this.
 		 */
 		BUG_ON(!ctxt->c_split_covers_rec);
-		BUG_ON(split_index == 0);

 		/*
 		 * Since the leftright insert always covers the entire
···
 		 * Since the adding of an empty extent shifts
 		 * everything back to the right, there's no need to
 		 * update split_index here.
+		 *
+		 * When the split_index is zero, we need to merge it to the
+		 * previous extent block. It is more efficient and easier
+		 * if we do merge_right first and merge_left later.
 		 */
-		ret = ocfs2_merge_rec_left(inode, path_leaf_bh(left_path),
-					   handle, split_rec, el, split_index);
+		ret = ocfs2_merge_rec_right(inode, path,
+					    handle, split_rec,
+					    split_index);
 		if (ret) {
 			mlog_errno(ret);
 			goto out;
···
 		 */
 		BUG_ON(!ocfs2_is_empty_extent(&el->l_recs[0]));

-		/*
-		 * The left merge left us with an empty extent, remove
-		 * it.
-		 */
-		ret = ocfs2_rotate_tree_left(inode, handle, left_path, dealloc);
+		/* The merge left us with an empty extent, remove it. */
+		ret = ocfs2_rotate_tree_left(inode, handle, path, dealloc);
 		if (ret) {
 			mlog_errno(ret);
 			goto out;
 		}
-		split_index--;
+
 		rec = &el->l_recs[split_index];

 		/*
 		 * Note that we don't pass split_rec here on purpose -
-		 * we've merged it into the left side.
+		 * we've merged it into the rec already.
 		 */
-		ret = ocfs2_merge_rec_right(inode, path_leaf_bh(left_path),
-					    handle, rec, el, split_index);
+		ret = ocfs2_merge_rec_left(inode, path,
+					   handle, rec,
+					   dealloc,
+					   split_index);
+
 		if (ret) {
 			mlog_errno(ret);
 			goto out;
 		}

-		BUG_ON(!ocfs2_is_empty_extent(&el->l_recs[0]));
-
-		ret = ocfs2_rotate_tree_left(inode, handle, left_path,
+		ret = ocfs2_rotate_tree_left(inode, handle, path,
 					     dealloc);
 		/*
 		 * Error from this last rotate is not critical, so
···
 	 */
 	if (ctxt->c_contig_type == CONTIG_RIGHT) {
 		ret = ocfs2_merge_rec_left(inode,
-					   path_leaf_bh(left_path),
-					   handle, split_rec, el,
+					   path,
+					   handle, split_rec,
+					   dealloc,
 					   split_index);
 		if (ret) {
 			mlog_errno(ret);
···
 		}
 	} else {
 		ret = ocfs2_merge_rec_right(inode,
-					    path_leaf_bh(left_path),
-					    handle, split_rec, el,
+					    path,
+					    handle, split_rec,
 					    split_index);
 		if (ret) {
 			mlog_errno(ret);
···
 			 * The merge may have left an empty extent in
 			 * our leaf. Try to rotate it away.
 			 */
-			ret = ocfs2_rotate_tree_left(inode, handle, left_path,
+			ret = ocfs2_rotate_tree_left(inode, handle, path,
 						     dealloc);
 			if (ret)
 				mlog_errno(ret);
···
 }

 static enum ocfs2_contig_type
-ocfs2_figure_merge_contig_type(struct inode *inode,
+ocfs2_figure_merge_contig_type(struct inode *inode, struct ocfs2_path *path,
 			       struct ocfs2_extent_list *el, int index,
 			       struct ocfs2_extent_rec *split_rec)
 {
-	struct ocfs2_extent_rec *rec;
+	int status;
 	enum ocfs2_contig_type ret = CONTIG_NONE;
+	u32 left_cpos, right_cpos;
+	struct ocfs2_extent_rec *rec = NULL;
+	struct ocfs2_extent_list *new_el;
+	struct ocfs2_path *left_path = NULL, *right_path = NULL;
+	struct buffer_head *bh;
+	struct ocfs2_extent_block *eb;
+
+	if (index > 0) {
+		rec = &el->l_recs[index - 1];
+	} else if (path->p_tree_depth > 0) {
+		status = ocfs2_find_cpos_for_left_leaf(inode->i_sb,
+						       path, &left_cpos);
+		if (status)
+			goto out;
+
+		if (left_cpos != 0) {
+			left_path = ocfs2_new_path(path_root_bh(path),
+						   path_root_el(path));
+			if (!left_path)
+				goto out;
+
+			status = ocfs2_find_path(inode, left_path, left_cpos);
+			if (status)
+				goto out;
+
+			new_el = path_leaf_el(left_path);
+
+			if (le16_to_cpu(new_el->l_next_free_rec) !=
+			    le16_to_cpu(new_el->l_count)) {
+				bh = path_leaf_bh(left_path);
+				eb = (struct ocfs2_extent_block *)bh->b_data;
+				OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb,
+								 eb);
+				goto out;
+			}
+			rec = &new_el->l_recs[
+				le16_to_cpu(new_el->l_next_free_rec) - 1];
+		}
+	}

 	/*
 	 * We're careful to check for an empty extent record here -
 	 * the merge code will know what to do if it sees one.
 	 */
-
-	if (index > 0) {
-		rec = &el->l_recs[index - 1];
+	if (rec) {
 		if (index == 1 && ocfs2_is_empty_extent(rec)) {
 			if (split_rec->e_cpos == el->l_recs[index].e_cpos)
 				ret = CONTIG_RIGHT;
···
 		}
 	}

-	if (index < (le16_to_cpu(el->l_next_free_rec) - 1)) {
+	rec = NULL;
+	if (index < (le16_to_cpu(el->l_next_free_rec) - 1))
+		rec = &el->l_recs[index + 1];
+	else if (le16_to_cpu(el->l_next_free_rec) == le16_to_cpu(el->l_count) &&
+		 path->p_tree_depth > 0) {
+		status = ocfs2_find_cpos_for_right_leaf(inode->i_sb,
+							path, &right_cpos);
+		if (status)
+			goto out;
+
+		if (right_cpos == 0)
+			goto out;
+
+		right_path = ocfs2_new_path(path_root_bh(path),
+					    path_root_el(path));
+		if (!right_path)
+			goto out;
+
+		status = ocfs2_find_path(inode, right_path, right_cpos);
+		if (status)
+			goto out;
+
+		new_el = path_leaf_el(right_path);
+		rec = &new_el->l_recs[0];
+		if (ocfs2_is_empty_extent(rec)) {
+			if (le16_to_cpu(new_el->l_next_free_rec) <= 1) {
+				bh = path_leaf_bh(right_path);
+				eb = (struct ocfs2_extent_block *)bh->b_data;
+				OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb,
+								 eb);
+				goto out;
+			}
+			rec = &new_el->l_recs[1];
+		}
+	}
+
+	if (rec) {
 		enum ocfs2_contig_type contig_type;

-		rec = &el->l_recs[index + 1];
 		contig_type = ocfs2_extent_contig(inode, rec, split_rec);

 		if (contig_type == CONTIG_LEFT && ret == CONTIG_RIGHT)
···
 		else if (ret == CONTIG_NONE)
 			ret = contig_type;
 	}
+
+out:
+	if (left_path)
+		ocfs2_free_path(left_path);
+	if (right_path)
+		ocfs2_free_path(right_path);

 	return ret;
 }
···
 		goto out;
 	}

-	ctxt.c_contig_type = ocfs2_figure_merge_contig_type(inode, el,
+	ctxt.c_contig_type = ocfs2_figure_merge_contig_type(inode, path, el,
 							    split_index,
 							    split_rec);
···
 	status = ocfs2_flush_truncate_log(osb);
 	if (status < 0)
 		mlog_errno(status);
+	else
+		ocfs2_init_inode_steal_slot(osb);

 	mlog_exit(status);
 }
+3 -3
fs/ocfs2/aops.c
···
 			       unsigned to)
 {
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
-	handle_t *handle = NULL;
+	handle_t *handle;
 	int ret = 0;

 	handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
-	if (!handle) {
+	if (IS_ERR(handle)) {
 		ret = -ENOMEM;
 		mlog_errno(ret);
 		goto out;
···
 	}
 out:
 	if (ret) {
-		if (handle)
+		if (!IS_ERR(handle))
 			ocfs2_commit_trans(osb, handle);
 		handle = ERR_PTR(ret);
 	}
+1 -1
fs/ocfs2/cluster/Makefile
···
 obj-$(CONFIG_OCFS2_FS) += ocfs2_nodemanager.o

 ocfs2_nodemanager-objs := heartbeat.o masklog.o sys.o nodemanager.o \
-	quorum.o tcp.o ver.o
+	quorum.o tcp.o netdebug.o ver.o
+441
fs/ocfs2/cluster/netdebug.c
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * netdebug.c
+ *
+ * debug functionality for o2net
+ *
+ * Copyright (C) 2005, 2008 Oracle. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ *
+ */
+
+#ifdef CONFIG_DEBUG_FS
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+
+#include <linux/uaccess.h>
+
+#include "tcp.h"
+#include "nodemanager.h"
+#define MLOG_MASK_PREFIX ML_TCP
+#include "masklog.h"
+
+#include "tcp_internal.h"
+
+#define O2NET_DEBUG_DIR		"o2net"
+#define SC_DEBUG_NAME		"sock_containers"
+#define NST_DEBUG_NAME		"send_tracking"
+
+static struct dentry *o2net_dentry;
+static struct dentry *sc_dentry;
+static struct dentry *nst_dentry;
+
+static DEFINE_SPINLOCK(o2net_debug_lock);
+
+static LIST_HEAD(sock_containers);
+static LIST_HEAD(send_tracking);
+
+void o2net_debug_add_nst(struct o2net_send_tracking *nst)
+{
+	spin_lock(&o2net_debug_lock);
+	list_add(&nst->st_net_debug_item, &send_tracking);
+	spin_unlock(&o2net_debug_lock);
+}
+
+void o2net_debug_del_nst(struct o2net_send_tracking *nst)
+{
+	spin_lock(&o2net_debug_lock);
+	if (!list_empty(&nst->st_net_debug_item))
+		list_del_init(&nst->st_net_debug_item);
+	spin_unlock(&o2net_debug_lock);
+}
+
+static struct o2net_send_tracking
+			*next_nst(struct o2net_send_tracking *nst_start)
+{
+	struct o2net_send_tracking *nst, *ret = NULL;
+
+	assert_spin_locked(&o2net_debug_lock);
+
+	list_for_each_entry(nst, &nst_start->st_net_debug_item,
+			    st_net_debug_item) {
+		/* discover the head of the list */
+		if (&nst->st_net_debug_item == &send_tracking)
+			break;
+
+		/* use st_task to detect real nsts in the list */
+		if (nst->st_task != NULL) {
+			ret = nst;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void *nst_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct o2net_send_tracking *nst, *dummy_nst = seq->private;
+
+	spin_lock(&o2net_debug_lock);
+	nst = next_nst(dummy_nst);
+	spin_unlock(&o2net_debug_lock);
+
+	return nst;
+}
+
+static void *nst_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct o2net_send_tracking *nst, *dummy_nst = seq->private;
+
+	spin_lock(&o2net_debug_lock);
+	nst = next_nst(dummy_nst);
+	list_del_init(&dummy_nst->st_net_debug_item);
+	if (nst)
+		list_add(&dummy_nst->st_net_debug_item,
+			 &nst->st_net_debug_item);
+	spin_unlock(&o2net_debug_lock);
+
+	return nst; /* unused, just needs to be null when done */
+}
+
+static int nst_seq_show(struct seq_file *seq, void *v)
+{
+	struct o2net_send_tracking *nst, *dummy_nst = seq->private;
+
+	spin_lock(&o2net_debug_lock);
+	nst = next_nst(dummy_nst);
+
+	if (nst != NULL) {
+		/* get_task_comm isn't exported. oh well. */
+		seq_printf(seq, "%p:\n"
+			   "  pid:          %lu\n"
+			   "  tgid:         %lu\n"
+			   "  process name: %s\n"
+			   "  node:         %u\n"
+			   "  sc:           %p\n"
+			   "  message id:   %d\n"
+			   "  message type: %u\n"
+			   "  message key:  0x%08x\n"
+			   "  sock acquiry: %lu.%lu\n"
+			   "  send start:   %lu.%lu\n"
+			   "  wait start:   %lu.%lu\n",
+			   nst, (unsigned long)nst->st_task->pid,
+			   (unsigned long)nst->st_task->tgid,
+			   nst->st_task->comm, nst->st_node,
+			   nst->st_sc, nst->st_id, nst->st_msg_type,
+			   nst->st_msg_key,
+			   nst->st_sock_time.tv_sec, nst->st_sock_time.tv_usec,
+			   nst->st_send_time.tv_sec, nst->st_send_time.tv_usec,
+			   nst->st_status_time.tv_sec,
+			   nst->st_status_time.tv_usec);
+	}
+
+	spin_unlock(&o2net_debug_lock);
+
+	return 0;
+}
+
+static void nst_seq_stop(struct seq_file *seq, void *v)
+{
+}
+
+static struct seq_operations nst_seq_ops = {
+	.start = nst_seq_start,
+	.next = nst_seq_next,
+	.stop = nst_seq_stop,
+	.show = nst_seq_show,
+};
+
+static int nst_fop_open(struct inode *inode, struct file *file)
+{
+	struct o2net_send_tracking *dummy_nst;
+	struct seq_file *seq;
+	int ret;
+
+	dummy_nst = kmalloc(sizeof(struct o2net_send_tracking), GFP_KERNEL);
+	if (dummy_nst == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	dummy_nst->st_task = NULL;
+
+	ret = seq_open(file, &nst_seq_ops);
+	if (ret)
+		goto out;
+
+	seq = file->private_data;
+	seq->private = dummy_nst;
+	o2net_debug_add_nst(dummy_nst);
+
+	dummy_nst = NULL;
+
+out:
+	kfree(dummy_nst);
+	return ret;
+}
+
+static int nst_fop_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct o2net_send_tracking *dummy_nst = seq->private;
+
+	o2net_debug_del_nst(dummy_nst);
+	return seq_release_private(inode, file);
+}
208 + static struct file_operations nst_seq_fops = { 209 + .open = nst_fop_open, 210 + .read = seq_read, 211 + .llseek = seq_lseek, 212 + .release = nst_fop_release, 213 + }; 214 + 215 + void o2net_debug_add_sc(struct o2net_sock_container *sc) 216 + { 217 + spin_lock(&o2net_debug_lock); 218 + list_add(&sc->sc_net_debug_item, &sock_containers); 219 + spin_unlock(&o2net_debug_lock); 220 + } 221 + 222 + void o2net_debug_del_sc(struct o2net_sock_container *sc) 223 + { 224 + spin_lock(&o2net_debug_lock); 225 + list_del_init(&sc->sc_net_debug_item); 226 + spin_unlock(&o2net_debug_lock); 227 + } 228 + 229 + static struct o2net_sock_container 230 + *next_sc(struct o2net_sock_container *sc_start) 231 + { 232 + struct o2net_sock_container *sc, *ret = NULL; 233 + 234 + assert_spin_locked(&o2net_debug_lock); 235 + 236 + list_for_each_entry(sc, &sc_start->sc_net_debug_item, 237 + sc_net_debug_item) { 238 + /* discover the head of the list miscast as a sc */ 239 + if (&sc->sc_net_debug_item == &sock_containers) 240 + break; 241 + 242 + /* use sc_page to detect real scs in the list */ 243 + if (sc->sc_page != NULL) { 244 + ret = sc; 245 + break; 246 + } 247 + } 248 + 249 + return ret; 250 + } 251 + 252 + static void *sc_seq_start(struct seq_file *seq, loff_t *pos) 253 + { 254 + struct o2net_sock_container *sc, *dummy_sc = seq->private; 255 + 256 + spin_lock(&o2net_debug_lock); 257 + sc = next_sc(dummy_sc); 258 + spin_unlock(&o2net_debug_lock); 259 + 260 + return sc; 261 + } 262 + 263 + static void *sc_seq_next(struct seq_file *seq, void *v, loff_t *pos) 264 + { 265 + struct o2net_sock_container *sc, *dummy_sc = seq->private; 266 + 267 + spin_lock(&o2net_debug_lock); 268 + sc = next_sc(dummy_sc); 269 + list_del_init(&dummy_sc->sc_net_debug_item); 270 + if (sc) 271 + list_add(&dummy_sc->sc_net_debug_item, &sc->sc_net_debug_item); 272 + spin_unlock(&o2net_debug_lock); 273 + 274 + return sc; /* unused, just needs to be null when done */ 275 + } 276 + 277 + #define TV_SEC_USEC(TV) 
TV.tv_sec, TV.tv_usec 278 + 279 + static int sc_seq_show(struct seq_file *seq, void *v) 280 + { 281 + struct o2net_sock_container *sc, *dummy_sc = seq->private; 282 + 283 + spin_lock(&o2net_debug_lock); 284 + sc = next_sc(dummy_sc); 285 + 286 + if (sc != NULL) { 287 + struct inet_sock *inet = NULL; 288 + 289 + __be32 saddr = 0, daddr = 0; 290 + __be16 sport = 0, dport = 0; 291 + 292 + if (sc->sc_sock) { 293 + inet = inet_sk(sc->sc_sock->sk); 294 + /* the stack's structs aren't sparse endian clean */ 295 + saddr = (__force __be32)inet->saddr; 296 + daddr = (__force __be32)inet->daddr; 297 + sport = (__force __be16)inet->sport; 298 + dport = (__force __be16)inet->dport; 299 + } 300 + 301 + /* XXX sigh, inet-> doesn't have sparse annotation so any 302 + * use of it here generates a warning with -Wbitwise */ 303 + seq_printf(seq, "%p:\n" 304 + " krefs: %d\n" 305 + " sock: %u.%u.%u.%u:%u -> " 306 + "%u.%u.%u.%u:%u\n" 307 + " remote node: %s\n" 308 + " page off: %zu\n" 309 + " handshake ok: %u\n" 310 + " timer: %lu.%lu\n" 311 + " data ready: %lu.%lu\n" 312 + " advance start: %lu.%lu\n" 313 + " advance stop: %lu.%lu\n" 314 + " func start: %lu.%lu\n" 315 + " func stop: %lu.%lu\n" 316 + " func key: %u\n" 317 + " func type: %u\n", 318 + sc, 319 + atomic_read(&sc->sc_kref.refcount), 320 + NIPQUAD(saddr), inet ? ntohs(sport) : 0, 321 + NIPQUAD(daddr), inet ? 
ntohs(dport) : 0, 322 + sc->sc_node->nd_name, 323 + sc->sc_page_off, 324 + sc->sc_handshake_ok, 325 + TV_SEC_USEC(sc->sc_tv_timer), 326 + TV_SEC_USEC(sc->sc_tv_data_ready), 327 + TV_SEC_USEC(sc->sc_tv_advance_start), 328 + TV_SEC_USEC(sc->sc_tv_advance_stop), 329 + TV_SEC_USEC(sc->sc_tv_func_start), 330 + TV_SEC_USEC(sc->sc_tv_func_stop), 331 + sc->sc_msg_key, 332 + sc->sc_msg_type); 333 + } 334 + 335 + 336 + spin_unlock(&o2net_debug_lock); 337 + 338 + return 0; 339 + } 340 + 341 + static void sc_seq_stop(struct seq_file *seq, void *v) 342 + { 343 + } 344 + 345 + static struct seq_operations sc_seq_ops = { 346 + .start = sc_seq_start, 347 + .next = sc_seq_next, 348 + .stop = sc_seq_stop, 349 + .show = sc_seq_show, 350 + }; 351 + 352 + static int sc_fop_open(struct inode *inode, struct file *file) 353 + { 354 + struct o2net_sock_container *dummy_sc; 355 + struct seq_file *seq; 356 + int ret; 357 + 358 + dummy_sc = kmalloc(sizeof(struct o2net_sock_container), GFP_KERNEL); 359 + if (dummy_sc == NULL) { 360 + ret = -ENOMEM; 361 + goto out; 362 + } 363 + dummy_sc->sc_page = NULL; 364 + 365 + ret = seq_open(file, &sc_seq_ops); 366 + if (ret) 367 + goto out; 368 + 369 + seq = file->private_data; 370 + seq->private = dummy_sc; 371 + o2net_debug_add_sc(dummy_sc); 372 + 373 + dummy_sc = NULL; 374 + 375 + out: 376 + kfree(dummy_sc); 377 + return ret; 378 + } 379 + 380 + static int sc_fop_release(struct inode *inode, struct file *file) 381 + { 382 + struct seq_file *seq = file->private_data; 383 + struct o2net_sock_container *dummy_sc = seq->private; 384 + 385 + o2net_debug_del_sc(dummy_sc); 386 + return seq_release_private(inode, file); 387 + } 388 + 389 + static struct file_operations sc_seq_fops = { 390 + .open = sc_fop_open, 391 + .read = seq_read, 392 + .llseek = seq_lseek, 393 + .release = sc_fop_release, 394 + }; 395 + 396 + int o2net_debugfs_init(void) 397 + { 398 + o2net_dentry = debugfs_create_dir(O2NET_DEBUG_DIR, NULL); 399 + if (!o2net_dentry) { 400 + 
mlog_errno(-ENOMEM); 401 + goto bail; 402 + } 403 + 404 + nst_dentry = debugfs_create_file(NST_DEBUG_NAME, S_IFREG|S_IRUSR, 405 + o2net_dentry, NULL, 406 + &nst_seq_fops); 407 + if (!nst_dentry) { 408 + mlog_errno(-ENOMEM); 409 + goto bail; 410 + } 411 + 412 + sc_dentry = debugfs_create_file(SC_DEBUG_NAME, S_IFREG|S_IRUSR, 413 + o2net_dentry, NULL, 414 + &sc_seq_fops); 415 + if (!sc_dentry) { 416 + mlog_errno(-ENOMEM); 417 + goto bail; 418 + } 419 + 420 + return 0; 421 + bail: 422 + if (sc_dentry) 423 + debugfs_remove(sc_dentry); 424 + if (nst_dentry) 425 + debugfs_remove(nst_dentry); 426 + if (o2net_dentry) 427 + debugfs_remove(o2net_dentry); 428 + return -ENOMEM; 429 + } 430 + 431 + void o2net_debugfs_exit(void) 432 + { 433 + if (sc_dentry) 434 + debugfs_remove(sc_dentry); 435 + if (nst_dentry) 436 + debugfs_remove(nst_dentry); 437 + if (o2net_dentry) 438 + debugfs_remove(o2net_dentry); 439 + } 440 + 441 + #endif /* CONFIG_DEBUG_FS */
+4 -1
fs/ocfs2/cluster/nodemanager.c
··· 959 959 cluster_print_version(); 960 960 961 961 o2hb_init(); 962 - o2net_init(); 962 + 963 + ret = o2net_init(); 964 + if (ret) 965 + goto out; 963 966 964 967 ocfs2_table_header = register_sysctl_table(ocfs2_root_table); 965 968 if (!ocfs2_table_header) {
+9
fs/ocfs2/cluster/sys.c
··· 57 57 void o2cb_sys_shutdown(void) 58 58 { 59 59 mlog_sys_shutdown(); 60 + sysfs_remove_link(NULL, "o2cb"); 60 61 kset_unregister(o2cb_kset); 61 62 } 62 63 ··· 68 67 o2cb_kset = kset_create_and_add("o2cb", NULL, NULL); 69 68 if (!o2cb_kset) 70 69 return -ENOMEM; 70 + 71 + /* 72 + * Create this symlink for backwards compatibility with old 73 + * versions of ocfs2-tools which look for things in /sys/o2cb. 74 + */ 75 + ret = sysfs_create_link(NULL, &o2cb_kset->kobj, "o2cb"); 76 + if (ret) 77 + goto error; 71 78 72 79 ret = sysfs_create_group(&o2cb_kset->kobj, &o2cb_attr_group); 73 80 if (ret)
+123 -41
fs/ocfs2/cluster/tcp.c
··· 142 142 static void o2net_sc_postpone_idle(struct o2net_sock_container *sc); 143 143 static void o2net_sc_reset_idle_timer(struct o2net_sock_container *sc); 144 144 145 - /* 146 - * FIXME: These should use to_o2nm_cluster_from_node(), but we end up 147 - * losing our parent link to the cluster during shutdown. This can be 148 - * solved by adding a pre-removal callback to configfs, or passing 149 - * around the cluster with the node. -jeffm 150 - */ 151 - static inline int o2net_reconnect_delay(struct o2nm_node *node) 145 + static void o2net_init_nst(struct o2net_send_tracking *nst, u32 msgtype, 146 + u32 msgkey, struct task_struct *task, u8 node) 147 + { 148 + #ifdef CONFIG_DEBUG_FS 149 + INIT_LIST_HEAD(&nst->st_net_debug_item); 150 + nst->st_task = task; 151 + nst->st_msg_type = msgtype; 152 + nst->st_msg_key = msgkey; 153 + nst->st_node = node; 154 + #endif 155 + } 156 + 157 + static void o2net_set_nst_sock_time(struct o2net_send_tracking *nst) 158 + { 159 + #ifdef CONFIG_DEBUG_FS 160 + do_gettimeofday(&nst->st_sock_time); 161 + #endif 162 + } 163 + 164 + static void o2net_set_nst_send_time(struct o2net_send_tracking *nst) 165 + { 166 + #ifdef CONFIG_DEBUG_FS 167 + do_gettimeofday(&nst->st_send_time); 168 + #endif 169 + } 170 + 171 + static void o2net_set_nst_status_time(struct o2net_send_tracking *nst) 172 + { 173 + #ifdef CONFIG_DEBUG_FS 174 + do_gettimeofday(&nst->st_status_time); 175 + #endif 176 + } 177 + 178 + static void o2net_set_nst_sock_container(struct o2net_send_tracking *nst, 179 + struct o2net_sock_container *sc) 180 + { 181 + #ifdef CONFIG_DEBUG_FS 182 + nst->st_sc = sc; 183 + #endif 184 + } 185 + 186 + static void o2net_set_nst_msg_id(struct o2net_send_tracking *nst, u32 msg_id) 187 + { 188 + #ifdef CONFIG_DEBUG_FS 189 + nst->st_id = msg_id; 190 + #endif 191 + } 192 + 193 + static inline int o2net_reconnect_delay(void) 152 194 { 153 195 return o2nm_single_cluster->cl_reconnect_delay_ms; 154 196 } 155 197 156 - static inline int 
o2net_keepalive_delay(struct o2nm_node *node) 198 + static inline int o2net_keepalive_delay(void) 157 199 { 158 200 return o2nm_single_cluster->cl_keepalive_delay_ms; 159 201 } 160 202 161 - static inline int o2net_idle_timeout(struct o2nm_node *node) 203 + static inline int o2net_idle_timeout(void) 162 204 { 163 205 return o2nm_single_cluster->cl_idle_timeout_ms; 164 206 } ··· 338 296 o2nm_node_put(sc->sc_node); 339 297 sc->sc_node = NULL; 340 298 299 + o2net_debug_del_sc(sc); 341 300 kfree(sc); 342 301 } 343 302 ··· 379 336 380 337 ret = sc; 381 338 sc->sc_page = page; 339 + o2net_debug_add_sc(sc); 382 340 sc = NULL; 383 341 page = NULL; 384 342 ··· 443 399 mlog_bug_on_msg(err && valid, "err %d valid %u\n", err, valid); 444 400 mlog_bug_on_msg(valid && !sc, "valid %u sc %p\n", valid, sc); 445 401 446 - /* we won't reconnect after our valid conn goes away for 447 - * this hb iteration.. here so it shows up in the logs */ 448 402 if (was_valid && !valid && err == 0) 449 403 err = -ENOTCONN; 450 404 ··· 472 430 473 431 if (!was_valid && valid) { 474 432 o2quo_conn_up(o2net_num_from_nn(nn)); 475 - /* this is a bit of a hack. we only try reconnecting 476 - * when heartbeating starts until we get a connection. 477 - * if that connection then dies we don't try reconnecting. 478 - * the only way to start connecting again is to down 479 - * heartbeat and bring it back up. */ 480 433 cancel_delayed_work(&nn->nn_connect_expired); 481 434 printk(KERN_INFO "o2net: %s " SC_NODEF_FMT "\n", 482 435 o2nm_this_node() > sc->sc_node->nd_num ? 
··· 488 451 /* delay if we're withing a RECONNECT_DELAY of the 489 452 * last attempt */ 490 453 delay = (nn->nn_last_connect_attempt + 491 - msecs_to_jiffies(o2net_reconnect_delay(NULL))) 454 + msecs_to_jiffies(o2net_reconnect_delay())) 492 455 - jiffies; 493 - if (delay > msecs_to_jiffies(o2net_reconnect_delay(NULL))) 456 + if (delay > msecs_to_jiffies(o2net_reconnect_delay())) 494 457 delay = 0; 495 458 mlog(ML_CONN, "queueing conn attempt in %lu jiffies\n", delay); 496 459 queue_delayed_work(o2net_wq, &nn->nn_connect_work, delay); 460 + 461 + /* 462 + * Delay the expired work after idle timeout. 463 + * 464 + * We might have lots of failed connection attempts that run 465 + * through here but we only cancel the connect_expired work when 466 + * a connection attempt succeeds. So only the first enqueue of 467 + * the connect_expired work will do anything. The rest will see 468 + * that it's already queued and do nothing. 469 + */ 470 + delay += msecs_to_jiffies(o2net_idle_timeout()); 471 + queue_delayed_work(o2net_wq, &nn->nn_connect_expired, delay); 497 472 } 498 473 499 474 /* keep track of the nn's sc ref for the caller */ ··· 963 914 struct o2net_status_wait nsw = { 964 915 .ns_node_item = LIST_HEAD_INIT(nsw.ns_node_item), 965 916 }; 917 + struct o2net_send_tracking nst; 918 + 919 + o2net_init_nst(&nst, msg_type, key, current, target_node); 966 920 967 921 if (o2net_wq == NULL) { 968 922 mlog(0, "attempt to tx without o2netd running\n"); ··· 991 939 goto out; 992 940 } 993 941 942 + o2net_debug_add_nst(&nst); 943 + 944 + o2net_set_nst_sock_time(&nst); 945 + 994 946 ret = wait_event_interruptible(nn->nn_sc_wq, 995 947 o2net_tx_can_proceed(nn, &sc, &error)); 996 948 if (!ret && error) 997 949 ret = error; 998 950 if (ret) 999 951 goto out; 952 + 953 + o2net_set_nst_sock_container(&nst, sc); 1000 954 1001 955 veclen = caller_veclen + 1; 1002 956 vec = kmalloc(sizeof(struct kvec) * veclen, GFP_ATOMIC); ··· 1030 972 goto out; 1031 973 1032 974 msg->msg_num = 
cpu_to_be32(nsw.ns_id); 975 + o2net_set_nst_msg_id(&nst, nsw.ns_id); 976 + 977 + o2net_set_nst_send_time(&nst); 1033 978 1034 979 /* finally, convert the message header to network byte-order 1035 980 * and send */ ··· 1047 986 } 1048 987 1049 988 /* wait on other node's handler */ 989 + o2net_set_nst_status_time(&nst); 1050 990 wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw)); 1051 991 1052 992 /* Note that we avoid overwriting the callers status return ··· 1060 998 mlog(0, "woken, returning system status %d, user status %d\n", 1061 999 ret, nsw.ns_status); 1062 1000 out: 1001 + o2net_debug_del_nst(&nst); /* must be before dropping sc and node */ 1063 1002 if (sc) 1064 1003 sc_put(sc); 1065 1004 if (vec) ··· 1217 1154 * but isn't. This can ultimately cause corruption. 1218 1155 */ 1219 1156 if (be32_to_cpu(hand->o2net_idle_timeout_ms) != 1220 - o2net_idle_timeout(sc->sc_node)) { 1157 + o2net_idle_timeout()) { 1221 1158 mlog(ML_NOTICE, SC_NODEF_FMT " uses a network idle timeout of " 1222 1159 "%u ms, but we use %u ms locally. disconnecting\n", 1223 1160 SC_NODEF_ARGS(sc), 1224 1161 be32_to_cpu(hand->o2net_idle_timeout_ms), 1225 - o2net_idle_timeout(sc->sc_node)); 1162 + o2net_idle_timeout()); 1226 1163 o2net_ensure_shutdown(nn, sc, -ENOTCONN); 1227 1164 return -1; 1228 1165 } 1229 1166 1230 1167 if (be32_to_cpu(hand->o2net_keepalive_delay_ms) != 1231 - o2net_keepalive_delay(sc->sc_node)) { 1168 + o2net_keepalive_delay()) { 1232 1169 mlog(ML_NOTICE, SC_NODEF_FMT " uses a keepalive delay of " 1233 1170 "%u ms, but we use %u ms locally. 
disconnecting\n", 1234 1171 SC_NODEF_ARGS(sc), 1235 1172 be32_to_cpu(hand->o2net_keepalive_delay_ms), 1236 - o2net_keepalive_delay(sc->sc_node)); 1173 + o2net_keepalive_delay()); 1237 1174 o2net_ensure_shutdown(nn, sc, -ENOTCONN); 1238 1175 return -1; 1239 1176 } ··· 1256 1193 * shut down already */ 1257 1194 if (nn->nn_sc == sc) { 1258 1195 o2net_sc_reset_idle_timer(sc); 1196 + atomic_set(&nn->nn_timeout, 0); 1259 1197 o2net_set_nn_state(nn, sc, 1, 0); 1260 1198 } 1261 1199 spin_unlock(&nn->nn_lock); ··· 1411 1347 { 1412 1348 o2net_hand->o2hb_heartbeat_timeout_ms = cpu_to_be32( 1413 1349 O2HB_MAX_WRITE_TIMEOUT_MS); 1414 - o2net_hand->o2net_idle_timeout_ms = cpu_to_be32( 1415 - o2net_idle_timeout(NULL)); 1350 + o2net_hand->o2net_idle_timeout_ms = cpu_to_be32(o2net_idle_timeout()); 1416 1351 o2net_hand->o2net_keepalive_delay_ms = cpu_to_be32( 1417 - o2net_keepalive_delay(NULL)); 1352 + o2net_keepalive_delay()); 1418 1353 o2net_hand->o2net_reconnect_delay_ms = cpu_to_be32( 1419 - o2net_reconnect_delay(NULL)); 1354 + o2net_reconnect_delay()); 1420 1355 } 1421 1356 1422 1357 /* ------------------------------------------------------------ */ ··· 1454 1391 static void o2net_idle_timer(unsigned long data) 1455 1392 { 1456 1393 struct o2net_sock_container *sc = (struct o2net_sock_container *)data; 1394 + struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num); 1457 1395 struct timeval now; 1458 1396 1459 1397 do_gettimeofday(&now); 1460 1398 1461 1399 printk(KERN_INFO "o2net: connection to " SC_NODEF_FMT " has been idle for %u.%u " 1462 1400 "seconds, shutting it down.\n", SC_NODEF_ARGS(sc), 1463 - o2net_idle_timeout(sc->sc_node) / 1000, 1464 - o2net_idle_timeout(sc->sc_node) % 1000); 1401 + o2net_idle_timeout() / 1000, 1402 + o2net_idle_timeout() % 1000); 1465 1403 mlog(ML_NOTICE, "here are some times that might help debug the " 1466 1404 "situation: (tmr %ld.%ld now %ld.%ld dr %ld.%ld adv " 1467 1405 "%ld.%ld:%ld.%ld func (%08x:%u) %ld.%ld:%ld.%ld)\n", ··· 1477 
1413 sc->sc_tv_func_start.tv_sec, (long) sc->sc_tv_func_start.tv_usec, 1478 1414 sc->sc_tv_func_stop.tv_sec, (long) sc->sc_tv_func_stop.tv_usec); 1479 1415 1416 + /* 1417 + * Initialize the nn_timeout so that the next connection attempt 1418 + * will continue in o2net_start_connect. 1419 + */ 1420 + atomic_set(&nn->nn_timeout, 1); 1421 + 1480 1422 o2net_sc_queue_work(sc, &sc->sc_shutdown_work); 1481 1423 } 1482 1424 ··· 1490 1420 { 1491 1421 o2net_sc_cancel_delayed_work(sc, &sc->sc_keepalive_work); 1492 1422 o2net_sc_queue_delayed_work(sc, &sc->sc_keepalive_work, 1493 - msecs_to_jiffies(o2net_keepalive_delay(sc->sc_node))); 1423 + msecs_to_jiffies(o2net_keepalive_delay())); 1494 1424 do_gettimeofday(&sc->sc_tv_timer); 1495 1425 mod_timer(&sc->sc_idle_timeout, 1496 - jiffies + msecs_to_jiffies(o2net_idle_timeout(sc->sc_node))); 1426 + jiffies + msecs_to_jiffies(o2net_idle_timeout())); 1497 1427 } 1498 1428 1499 1429 static void o2net_sc_postpone_idle(struct o2net_sock_container *sc) ··· 1517 1447 struct socket *sock = NULL; 1518 1448 struct sockaddr_in myaddr = {0, }, remoteaddr = {0, }; 1519 1449 int ret = 0, stop; 1450 + unsigned int timeout; 1520 1451 1521 1452 /* if we're greater we initiate tx, otherwise we accept */ 1522 1453 if (o2nm_this_node() <= o2net_num_from_nn(nn)) ··· 1537 1466 } 1538 1467 1539 1468 spin_lock(&nn->nn_lock); 1540 - /* see if we already have one pending or have given up */ 1541 - stop = (nn->nn_sc || nn->nn_persistent_error); 1469 + /* 1470 + * see if we already have one pending or have given up. 1471 + * For nn_timeout, it is set when we close the connection 1472 + * because of the idle time out. So it means that we have 1473 + * at least connected to that node successfully once, 1474 + * now try to connect to it again. 
1475 + */ 1476 + timeout = atomic_read(&nn->nn_timeout); 1477 + stop = (nn->nn_sc || 1478 + (nn->nn_persistent_error && 1479 + (nn->nn_persistent_error != -ENOTCONN || timeout == 0))); 1542 1480 spin_unlock(&nn->nn_lock); 1543 1481 if (stop) 1544 1482 goto out; ··· 1635 1555 mlog(ML_ERROR, "no connection established with node %u after " 1636 1556 "%u.%u seconds, giving up and returning errors.\n", 1637 1557 o2net_num_from_nn(nn), 1638 - o2net_idle_timeout(NULL) / 1000, 1639 - o2net_idle_timeout(NULL) % 1000); 1558 + o2net_idle_timeout() / 1000, 1559 + o2net_idle_timeout() % 1000); 1640 1560 1641 1561 o2net_set_nn_state(nn, NULL, 0, -ENOTCONN); 1642 1562 } ··· 1659 1579 1660 1580 /* don't reconnect until it's heartbeating again */ 1661 1581 spin_lock(&nn->nn_lock); 1582 + atomic_set(&nn->nn_timeout, 0); 1662 1583 o2net_set_nn_state(nn, NULL, 0, -ENOTCONN); 1663 1584 spin_unlock(&nn->nn_lock); 1664 1585 ··· 1691 1610 1692 1611 /* ensure an immediate connect attempt */ 1693 1612 nn->nn_last_connect_attempt = jiffies - 1694 - (msecs_to_jiffies(o2net_reconnect_delay(node)) + 1); 1613 + (msecs_to_jiffies(o2net_reconnect_delay()) + 1); 1695 1614 1696 1615 if (node_num != o2nm_this_node()) { 1697 - /* heartbeat doesn't work unless a local node number is 1698 - * configured and doing so brings up the o2net_wq, so we can 1699 - * use it.. */ 1700 - queue_delayed_work(o2net_wq, &nn->nn_connect_expired, 1701 - msecs_to_jiffies(o2net_idle_timeout(node))); 1702 - 1703 1616 /* believe it or not, accept and node hearbeating testing 1704 1617 * can succeed for this node before we got here.. 
so 1705 1618 * only use set_nn_state to clear the persistent error 1706 1619 * if that hasn't already happened */ 1707 1620 spin_lock(&nn->nn_lock); 1621 + atomic_set(&nn->nn_timeout, 0); 1708 1622 if (nn->nn_persistent_error) 1709 1623 o2net_set_nn_state(nn, NULL, 0, 0); 1710 1624 spin_unlock(&nn->nn_lock); ··· 1823 1747 new_sock = NULL; 1824 1748 1825 1749 spin_lock(&nn->nn_lock); 1750 + atomic_set(&nn->nn_timeout, 0); 1826 1751 o2net_set_nn_state(nn, sc, 0, 0); 1827 1752 spin_unlock(&nn->nn_lock); 1828 1753 ··· 1999 1922 2000 1923 o2quo_init(); 2001 1924 1925 + if (o2net_debugfs_init()) 1926 + return -ENOMEM; 1927 + 2002 1928 o2net_hand = kzalloc(sizeof(struct o2net_handshake), GFP_KERNEL); 2003 1929 o2net_keep_req = kzalloc(sizeof(struct o2net_msg), GFP_KERNEL); 2004 1930 o2net_keep_resp = kzalloc(sizeof(struct o2net_msg), GFP_KERNEL); ··· 2021 1941 for (i = 0; i < ARRAY_SIZE(o2net_nodes); i++) { 2022 1942 struct o2net_node *nn = o2net_nn_from_num(i); 2023 1943 1944 + atomic_set(&nn->nn_timeout, 0); 2024 1945 spin_lock_init(&nn->nn_lock); 2025 1946 INIT_DELAYED_WORK(&nn->nn_connect_work, o2net_start_connect); 2026 1947 INIT_DELAYED_WORK(&nn->nn_connect_expired, ··· 2043 1962 kfree(o2net_hand); 2044 1963 kfree(o2net_keep_req); 2045 1964 kfree(o2net_keep_resp); 1965 + o2net_debugfs_exit(); 2046 1966 }
+32
fs/ocfs2/cluster/tcp.h
··· 117 117 int o2net_init(void); 118 118 void o2net_exit(void); 119 119 120 + struct o2net_send_tracking; 121 + struct o2net_sock_container; 122 + 123 + #ifdef CONFIG_DEBUG_FS 124 + int o2net_debugfs_init(void); 125 + void o2net_debugfs_exit(void); 126 + void o2net_debug_add_nst(struct o2net_send_tracking *nst); 127 + void o2net_debug_del_nst(struct o2net_send_tracking *nst); 128 + void o2net_debug_add_sc(struct o2net_sock_container *sc); 129 + void o2net_debug_del_sc(struct o2net_sock_container *sc); 130 + #else 131 + static inline int o2net_debugfs_init(void) 132 + { 133 + return 0; 134 + } 135 + static inline void o2net_debugfs_exit(void) 136 + { 137 + } 138 + static inline void o2net_debug_add_nst(struct o2net_send_tracking *nst) 139 + { 140 + } 141 + static inline void o2net_debug_del_nst(struct o2net_send_tracking *nst) 142 + { 143 + } 144 + static inline void o2net_debug_add_sc(struct o2net_sock_container *sc) 145 + { 146 + } 147 + static inline void o2net_debug_del_sc(struct o2net_sock_container *sc) 148 + { 149 + } 150 + #endif /* CONFIG_DEBUG_FS */ 151 + 120 152 #endif /* O2CLUSTER_TCP_H */
+25 -1
fs/ocfs2/cluster/tcp_internal.h
··· 95 95 unsigned nn_sc_valid:1; 96 96 /* if this is set tx just returns it */ 97 97 int nn_persistent_error; 98 + /* It is only set to 1 after the idle time out. */ 99 + atomic_t nn_timeout; 98 100 99 101 /* threads waiting for an sc to arrive wait on the wq for generation 100 102 * to increase. it is increased when a connecting socket succeeds ··· 166 164 /* original handlers for the sockets */ 167 165 void (*sc_state_change)(struct sock *sk); 168 166 void (*sc_data_ready)(struct sock *sk, int bytes); 169 - 167 + #ifdef CONFIG_DEBUG_FS 168 + struct list_head sc_net_debug_item; 169 + #endif 170 170 struct timeval sc_tv_timer; 171 171 struct timeval sc_tv_data_ready; 172 172 struct timeval sc_tv_advance_start; ··· 209 205 wait_queue_head_t ns_wq; 210 206 struct list_head ns_node_item; 211 207 }; 208 + 209 + #ifdef CONFIG_DEBUG_FS 210 + /* just for state dumps */ 211 + struct o2net_send_tracking { 212 + struct list_head st_net_debug_item; 213 + struct task_struct *st_task; 214 + struct o2net_sock_container *st_sc; 215 + u32 st_id; 216 + u32 st_msg_type; 217 + u32 st_msg_key; 218 + u8 st_node; 219 + struct timeval st_sock_time; 220 + struct timeval st_send_time; 221 + struct timeval st_status_time; 222 + }; 223 + #else 224 + struct o2net_send_tracking { 225 + u32 dummy; 226 + }; 227 + #endif /* CONFIG_DEBUG_FS */ 212 228 213 229 #endif /* O2CLUSTER_TCP_INTERNAL_H */
+1 -1
fs/ocfs2/dlm/Makefile
··· 1 1 EXTRA_CFLAGS += -Ifs/ocfs2 2 2 3 - obj-$(CONFIG_OCFS2_FS) += ocfs2_dlm.o ocfs2_dlmfs.o 3 + obj-$(CONFIG_OCFS2_FS_O2CB) += ocfs2_dlm.o ocfs2_dlmfs.o 4 4 5 5 ocfs2_dlm-objs := dlmdomain.o dlmdebug.o dlmthread.o dlmrecovery.o \ 6 6 dlmmaster.o dlmast.o dlmconvert.o dlmlock.o dlmunlock.o dlmver.o
+49
fs/ocfs2/dlm/dlmcommon.h
··· 49 49 /* Intended to make it easier for us to switch out hash functions */ 50 50 #define dlm_lockid_hash(_n, _l) full_name_hash(_n, _l) 51 51 52 + enum dlm_mle_type { 53 + DLM_MLE_BLOCK, 54 + DLM_MLE_MASTER, 55 + DLM_MLE_MIGRATION 56 + }; 57 + 58 + struct dlm_lock_name { 59 + u8 len; 60 + u8 name[DLM_LOCKID_NAME_MAX]; 61 + }; 62 + 63 + struct dlm_master_list_entry { 64 + struct list_head list; 65 + struct list_head hb_events; 66 + struct dlm_ctxt *dlm; 67 + spinlock_t spinlock; 68 + wait_queue_head_t wq; 69 + atomic_t woken; 70 + struct kref mle_refs; 71 + int inuse; 72 + unsigned long maybe_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 73 + unsigned long vote_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 74 + unsigned long response_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 75 + unsigned long node_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 76 + u8 master; 77 + u8 new_master; 78 + enum dlm_mle_type type; 79 + struct o2hb_callback_func mle_hb_up; 80 + struct o2hb_callback_func mle_hb_down; 81 + union { 82 + struct dlm_lock_resource *res; 83 + struct dlm_lock_name name; 84 + } u; 85 + }; 86 + 52 87 enum dlm_ast_type { 53 88 DLM_AST = 0, 54 89 DLM_BAST, ··· 136 101 struct list_head purge_list; 137 102 struct list_head pending_asts; 138 103 struct list_head pending_basts; 104 + struct list_head tracking_list; 139 105 unsigned int purge_count; 140 106 spinlock_t spinlock; 141 107 spinlock_t ast_lock; ··· 157 121 atomic_t local_resources; 158 122 atomic_t remote_resources; 159 123 atomic_t unknown_resources; 124 + 125 + struct dlm_debug_ctxt *dlm_debug_ctxt; 126 + struct dentry *dlm_debugfs_subroot; 160 127 161 128 /* NOTE: Next three are protected by dlm_domain_lock */ 162 129 struct kref dlm_refs; ··· 308 269 */ 309 270 struct list_head dirty; 310 271 struct list_head recovering; // dlm_recovery_ctxt.resources list 272 + 273 + /* Added during init and removed during release */ 274 + struct list_head tracking; /* dlm->tracking_list */ 311 275 312 276 /* unused lock resources have their last_used 
stamped and are 313 277 * put on a list for the dlm thread to run. */ ··· 1005 963 DLM_LOCK_RES_MIGRATING)); 1006 964 } 1007 965 966 + /* create/destroy slab caches */ 967 + int dlm_init_master_caches(void); 968 + void dlm_destroy_master_caches(void); 969 + 970 + int dlm_init_lock_cache(void); 971 + void dlm_destroy_lock_cache(void); 1008 972 1009 973 int dlm_init_mle_cache(void); 1010 974 void dlm_destroy_mle_cache(void); 975 + 1011 976 void dlm_hb_event_notify_attached(struct dlm_ctxt *dlm, int idx, int node_up); 1012 977 int dlm_drop_lockres_ref(struct dlm_ctxt *dlm, 1013 978 struct dlm_lock_resource *res);
+839 -72
fs/ocfs2/dlm/dlmdebug.c
··· 5 5 * 6 6 * debug functionality for the dlm 7 7 * 8 - * Copyright (C) 2004 Oracle. All rights reserved. 8 + * Copyright (C) 2004, 2008 Oracle. All rights reserved. 9 9 * 10 10 * This program is free software; you can redistribute it and/or 11 11 * modify it under the terms of the GNU General Public ··· 30 30 #include <linux/utsname.h> 31 31 #include <linux/sysctl.h> 32 32 #include <linux/spinlock.h> 33 + #include <linux/debugfs.h> 33 34 34 35 #include "cluster/heartbeat.h" 35 36 #include "cluster/nodemanager.h" ··· 38 37 39 38 #include "dlmapi.h" 40 39 #include "dlmcommon.h" 41 - 42 40 #include "dlmdomain.h" 41 + #include "dlmdebug.h" 43 42 44 43 #define MLOG_MASK_PREFIX ML_DLM 45 44 #include "cluster/masklog.h" 46 45 46 + int stringify_lockname(const char *lockname, int locklen, char *buf, int len); 47 + 47 48 void dlm_print_one_lock_resource(struct dlm_lock_resource *res) 48 49 { 49 - mlog(ML_NOTICE, "lockres: %.*s, owner=%u, state=%u\n", 50 - res->lockname.len, res->lockname.name, 51 - res->owner, res->state); 52 50 spin_lock(&res->spinlock); 53 51 __dlm_print_one_lock_resource(res); 54 52 spin_unlock(&res->spinlock); ··· 58 58 int bit; 59 59 assert_spin_locked(&res->spinlock); 60 60 61 - mlog(ML_NOTICE, " refmap nodes: [ "); 61 + printk(" refmap nodes: [ "); 62 62 bit = 0; 63 63 while (1) { 64 64 bit = find_next_bit(res->refmap, O2NM_MAX_NODES, bit); ··· 70 70 printk("], inflight=%u\n", res->inflight_locks); 71 71 } 72 72 73 + static void __dlm_print_lock(struct dlm_lock *lock) 74 + { 75 + spin_lock(&lock->spinlock); 76 + 77 + printk(" type=%d, conv=%d, node=%u, cookie=%u:%llu, " 78 + "ref=%u, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c), " 79 + "pending=(conv=%c,lock=%c,cancel=%c,unlock=%c)\n", 80 + lock->ml.type, lock->ml.convert_type, lock->ml.node, 81 + dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)), 82 + dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)), 83 + atomic_read(&lock->lock_refs.refcount), 84 + (list_empty(&lock->ast_list) ? 
'y' : 'n'), 85 + (lock->ast_pending ? 'y' : 'n'), 86 + (list_empty(&lock->bast_list) ? 'y' : 'n'), 87 + (lock->bast_pending ? 'y' : 'n'), 88 + (lock->convert_pending ? 'y' : 'n'), 89 + (lock->lock_pending ? 'y' : 'n'), 90 + (lock->cancel_pending ? 'y' : 'n'), 91 + (lock->unlock_pending ? 'y' : 'n')); 92 + 93 + spin_unlock(&lock->spinlock); 94 + } 95 + 73 96 void __dlm_print_one_lock_resource(struct dlm_lock_resource *res) 74 97 { 75 98 struct list_head *iter2; 76 99 struct dlm_lock *lock; 100 + char buf[DLM_LOCKID_NAME_MAX]; 77 101 78 102 assert_spin_locked(&res->spinlock); 79 103 80 - mlog(ML_NOTICE, "lockres: %.*s, owner=%u, state=%u\n", 81 - res->lockname.len, res->lockname.name, 82 - res->owner, res->state); 83 - mlog(ML_NOTICE, " last used: %lu, on purge list: %s\n", 84 - res->last_used, list_empty(&res->purge) ? "no" : "yes"); 104 + stringify_lockname(res->lockname.name, res->lockname.len, 105 + buf, sizeof(buf) - 1); 106 + printk("lockres: %s, owner=%u, state=%u\n", 107 + buf, res->owner, res->state); 108 + printk(" last used: %lu, refcnt: %u, on purge list: %s\n", 109 + res->last_used, atomic_read(&res->refs.refcount), 110 + list_empty(&res->purge) ? "no" : "yes"); 111 + printk(" on dirty list: %s, on reco list: %s, " 112 + "migrating pending: %s\n", 113 + list_empty(&res->dirty) ? "no" : "yes", 114 + list_empty(&res->recovering) ? "no" : "yes", 115 + res->migration_pending ? 
"yes" : "no"); 116 + printk(" inflight locks: %d, asts reserved: %d\n", 117 + res->inflight_locks, atomic_read(&res->asts_reserved)); 85 118 dlm_print_lockres_refmap(res); 86 - mlog(ML_NOTICE, " granted queue: \n"); 119 + printk(" granted queue:\n"); 87 120 list_for_each(iter2, &res->granted) { 88 121 lock = list_entry(iter2, struct dlm_lock, list); 89 - spin_lock(&lock->spinlock); 90 - mlog(ML_NOTICE, " type=%d, conv=%d, node=%u, " 91 - "cookie=%u:%llu, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c)\n", 92 - lock->ml.type, lock->ml.convert_type, lock->ml.node, 93 - dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)), 94 - dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)), 95 - list_empty(&lock->ast_list) ? 'y' : 'n', 96 - lock->ast_pending ? 'y' : 'n', 97 - list_empty(&lock->bast_list) ? 'y' : 'n', 98 - lock->bast_pending ? 'y' : 'n'); 99 - spin_unlock(&lock->spinlock); 122 + __dlm_print_lock(lock); 100 123 } 101 - mlog(ML_NOTICE, " converting queue: \n"); 124 + printk(" converting queue:\n"); 102 125 list_for_each(iter2, &res->converting) { 103 126 lock = list_entry(iter2, struct dlm_lock, list); 104 - spin_lock(&lock->spinlock); 105 - mlog(ML_NOTICE, " type=%d, conv=%d, node=%u, " 106 - "cookie=%u:%llu, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c)\n", 107 - lock->ml.type, lock->ml.convert_type, lock->ml.node, 108 - dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)), 109 - dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)), 110 - list_empty(&lock->ast_list) ? 'y' : 'n', 111 - lock->ast_pending ? 'y' : 'n', 112 - list_empty(&lock->bast_list) ? 'y' : 'n', 113 - lock->bast_pending ? 
'y' : 'n'); 114 - spin_unlock(&lock->spinlock); 127 + __dlm_print_lock(lock); 115 128 } 116 - mlog(ML_NOTICE, " blocked queue: \n"); 129 + printk(" blocked queue:\n"); 117 130 list_for_each(iter2, &res->blocked) { 118 131 lock = list_entry(iter2, struct dlm_lock, list); 119 - spin_lock(&lock->spinlock); 120 - mlog(ML_NOTICE, " type=%d, conv=%d, node=%u, " 121 - "cookie=%u:%llu, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c)\n", 122 - lock->ml.type, lock->ml.convert_type, lock->ml.node, 123 - dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)), 124 - dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)), 125 - list_empty(&lock->ast_list) ? 'y' : 'n', 126 - lock->ast_pending ? 'y' : 'n', 127 - list_empty(&lock->bast_list) ? 'y' : 'n', 128 - lock->bast_pending ? 'y' : 'n'); 129 - spin_unlock(&lock->spinlock); 132 + __dlm_print_lock(lock); 130 133 } 131 134 } 132 135 ··· 138 135 dlm_print_one_lock_resource(lockid->lockres); 139 136 } 140 137 EXPORT_SYMBOL_GPL(dlm_print_one_lock); 141 - 142 - #if 0 143 - void dlm_dump_lock_resources(struct dlm_ctxt *dlm) 144 - { 145 - struct dlm_lock_resource *res; 146 - struct hlist_node *iter; 147 - struct hlist_head *bucket; 148 - int i; 149 - 150 - mlog(ML_NOTICE, "struct dlm_ctxt: %s, node=%u, key=%u\n", 151 - dlm->name, dlm->node_num, dlm->key); 152 - if (!dlm || !dlm->name) { 153 - mlog(ML_ERROR, "dlm=%p\n", dlm); 154 - return; 155 - } 156 - 157 - spin_lock(&dlm->spinlock); 158 - for (i=0; i<DLM_HASH_BUCKETS; i++) { 159 - bucket = dlm_lockres_hash(dlm, i); 160 - hlist_for_each_entry(res, iter, bucket, hash_node) 161 - dlm_print_one_lock_resource(res); 162 - } 163 - spin_unlock(&dlm->spinlock); 164 - } 165 - #endif /* 0 */ 166 138 167 139 static const char *dlm_errnames[] = { 168 140 [DLM_NORMAL] = "DLM_NORMAL", ··· 244 266 return dlm_errnames[err]; 245 267 } 246 268 EXPORT_SYMBOL_GPL(dlm_errname); 269 + 270 + /* NOTE: This function converts a lockname into a string. 
It uses knowledge 271 + * of the format of the lockname that should be outside the purview of the dlm. 272 + * We are adding only to make dlm debugging slightly easier. 273 + * 274 + * For more on lockname formats, please refer to dlmglue.c and ocfs2_lockid.h. 275 + */ 276 + int stringify_lockname(const char *lockname, int locklen, char *buf, int len) 277 + { 278 + int out = 0; 279 + __be64 inode_blkno_be; 280 + 281 + #define OCFS2_DENTRY_LOCK_INO_START 18 282 + if (*lockname == 'N') { 283 + memcpy((__be64 *)&inode_blkno_be, 284 + (char *)&lockname[OCFS2_DENTRY_LOCK_INO_START], 285 + sizeof(__be64)); 286 + out += snprintf(buf + out, len - out, "%.*s%08x", 287 + OCFS2_DENTRY_LOCK_INO_START - 1, lockname, 288 + (unsigned int)be64_to_cpu(inode_blkno_be)); 289 + } else 290 + out += snprintf(buf + out, len - out, "%.*s", 291 + locklen, lockname); 292 + return out; 293 + } 294 + 295 + static int stringify_nodemap(unsigned long *nodemap, int maxnodes, 296 + char *buf, int len) 297 + { 298 + int out = 0; 299 + int i = -1; 300 + 301 + while ((i = find_next_bit(nodemap, maxnodes, i + 1)) < maxnodes) 302 + out += snprintf(buf + out, len - out, "%d ", i); 303 + 304 + return out; 305 + } 306 + 307 + static int dump_mle(struct dlm_master_list_entry *mle, char *buf, int len) 308 + { 309 + int out = 0; 310 + unsigned int namelen; 311 + const char *name; 312 + char *mle_type; 313 + 314 + if (mle->type != DLM_MLE_MASTER) { 315 + namelen = mle->u.name.len; 316 + name = mle->u.name.name; 317 + } else { 318 + namelen = mle->u.res->lockname.len; 319 + name = mle->u.res->lockname.name; 320 + } 321 + 322 + if (mle->type == DLM_MLE_BLOCK) 323 + mle_type = "BLK"; 324 + else if (mle->type == DLM_MLE_MASTER) 325 + mle_type = "MAS"; 326 + else 327 + mle_type = "MIG"; 328 + 329 + out += stringify_lockname(name, namelen, buf + out, len - out); 330 + out += snprintf(buf + out, len - out, 331 + "\t%3s\tmas=%3u\tnew=%3u\tevt=%1d\tuse=%1d\tref=%3d\n", 332 + mle_type, mle->master, mle->new_master, 
333 + !list_empty(&mle->hb_events), 334 + !!mle->inuse, 335 + atomic_read(&mle->mle_refs.refcount)); 336 + 337 + out += snprintf(buf + out, len - out, "Maybe="); 338 + out += stringify_nodemap(mle->maybe_map, O2NM_MAX_NODES, 339 + buf + out, len - out); 340 + out += snprintf(buf + out, len - out, "\n"); 341 + 342 + out += snprintf(buf + out, len - out, "Vote="); 343 + out += stringify_nodemap(mle->vote_map, O2NM_MAX_NODES, 344 + buf + out, len - out); 345 + out += snprintf(buf + out, len - out, "\n"); 346 + 347 + out += snprintf(buf + out, len - out, "Response="); 348 + out += stringify_nodemap(mle->response_map, O2NM_MAX_NODES, 349 + buf + out, len - out); 350 + out += snprintf(buf + out, len - out, "\n"); 351 + 352 + out += snprintf(buf + out, len - out, "Node="); 353 + out += stringify_nodemap(mle->node_map, O2NM_MAX_NODES, 354 + buf + out, len - out); 355 + out += snprintf(buf + out, len - out, "\n"); 356 + 357 + out += snprintf(buf + out, len - out, "\n"); 358 + 359 + return out; 360 + } 361 + 362 + void dlm_print_one_mle(struct dlm_master_list_entry *mle) 363 + { 364 + char *buf; 365 + 366 + buf = (char *) get_zeroed_page(GFP_NOFS); 367 + if (buf) { 368 + dump_mle(mle, buf, PAGE_SIZE - 1); 369 + free_page((unsigned long)buf); 370 + } 371 + } 372 + 373 + #ifdef CONFIG_DEBUG_FS 374 + 375 + static struct dentry *dlm_debugfs_root = NULL; 376 + 377 + #define DLM_DEBUGFS_DIR "o2dlm" 378 + #define DLM_DEBUGFS_DLM_STATE "dlm_state" 379 + #define DLM_DEBUGFS_LOCKING_STATE "locking_state" 380 + #define DLM_DEBUGFS_MLE_STATE "mle_state" 381 + #define DLM_DEBUGFS_PURGE_LIST "purge_list" 382 + 383 + /* begin - utils funcs */ 384 + static void dlm_debug_free(struct kref *kref) 385 + { 386 + struct dlm_debug_ctxt *dc; 387 + 388 + dc = container_of(kref, struct dlm_debug_ctxt, debug_refcnt); 389 + 390 + kfree(dc); 391 + } 392 + 393 + void dlm_debug_put(struct dlm_debug_ctxt *dc) 394 + { 395 + if (dc) 396 + kref_put(&dc->debug_refcnt, dlm_debug_free); 397 + } 398 + 399 + 
static void dlm_debug_get(struct dlm_debug_ctxt *dc) 400 + { 401 + kref_get(&dc->debug_refcnt); 402 + } 403 + 404 + static struct debug_buffer *debug_buffer_allocate(void) 405 + { 406 + struct debug_buffer *db = NULL; 407 + 408 + db = kzalloc(sizeof(struct debug_buffer), GFP_KERNEL); 409 + if (!db) 410 + goto bail; 411 + 412 + db->len = PAGE_SIZE; 413 + db->buf = kmalloc(db->len, GFP_KERNEL); 414 + if (!db->buf) 415 + goto bail; 416 + 417 + return db; 418 + bail: 419 + kfree(db); 420 + return NULL; 421 + } 422 + 423 + static ssize_t debug_buffer_read(struct file *file, char __user *buf, 424 + size_t nbytes, loff_t *ppos) 425 + { 426 + struct debug_buffer *db = file->private_data; 427 + 428 + return simple_read_from_buffer(buf, nbytes, ppos, db->buf, db->len); 429 + } 430 + 431 + static loff_t debug_buffer_llseek(struct file *file, loff_t off, int whence) 432 + { 433 + struct debug_buffer *db = file->private_data; 434 + loff_t new = -1; 435 + 436 + switch (whence) { 437 + case 0: 438 + new = off; 439 + break; 440 + case 1: 441 + new = file->f_pos + off; 442 + break; 443 + } 444 + 445 + if (new < 0 || new > db->len) 446 + return -EINVAL; 447 + 448 + return (file->f_pos = new); 449 + } 450 + 451 + static int debug_buffer_release(struct inode *inode, struct file *file) 452 + { 453 + struct debug_buffer *db = (struct debug_buffer *)file->private_data; 454 + 455 + if (db) 456 + kfree(db->buf); 457 + kfree(db); 458 + 459 + return 0; 460 + } 461 + /* end - util funcs */ 462 + 463 + /* begin - purge list funcs */ 464 + static int debug_purgelist_print(struct dlm_ctxt *dlm, struct debug_buffer *db) 465 + { 466 + struct dlm_lock_resource *res; 467 + int out = 0; 468 + unsigned long total = 0; 469 + 470 + out += snprintf(db->buf + out, db->len - out, 471 + "Dumping Purgelist for Domain: %s\n", dlm->name); 472 + 473 + spin_lock(&dlm->spinlock); 474 + list_for_each_entry(res, &dlm->purge_list, purge) { 475 + ++total; 476 + if (db->len - out < 100) 477 + continue; 478 + 
spin_lock(&res->spinlock); 479 + out += stringify_lockname(res->lockname.name, 480 + res->lockname.len, 481 + db->buf + out, db->len - out); 482 + out += snprintf(db->buf + out, db->len - out, "\t%ld\n", 483 + (jiffies - res->last_used)/HZ); 484 + spin_unlock(&res->spinlock); 485 + } 486 + spin_unlock(&dlm->spinlock); 487 + 488 + out += snprintf(db->buf + out, db->len - out, 489 + "Total on list: %ld\n", total); 490 + 491 + return out; 492 + } 493 + 494 + static int debug_purgelist_open(struct inode *inode, struct file *file) 495 + { 496 + struct dlm_ctxt *dlm = inode->i_private; 497 + struct debug_buffer *db; 498 + 499 + db = debug_buffer_allocate(); 500 + if (!db) 501 + goto bail; 502 + 503 + db->len = debug_purgelist_print(dlm, db); 504 + 505 + file->private_data = db; 506 + 507 + return 0; 508 + bail: 509 + return -ENOMEM; 510 + } 511 + 512 + static struct file_operations debug_purgelist_fops = { 513 + .open = debug_purgelist_open, 514 + .release = debug_buffer_release, 515 + .read = debug_buffer_read, 516 + .llseek = debug_buffer_llseek, 517 + }; 518 + /* end - purge list funcs */ 519 + 520 + /* begin - debug mle funcs */ 521 + static int debug_mle_print(struct dlm_ctxt *dlm, struct debug_buffer *db) 522 + { 523 + struct dlm_master_list_entry *mle; 524 + int out = 0; 525 + unsigned long total = 0; 526 + 527 + out += snprintf(db->buf + out, db->len - out, 528 + "Dumping MLEs for Domain: %s\n", dlm->name); 529 + 530 + spin_lock(&dlm->master_lock); 531 + list_for_each_entry(mle, &dlm->master_list, list) { 532 + ++total; 533 + if (db->len - out < 200) 534 + continue; 535 + out += dump_mle(mle, db->buf + out, db->len - out); 536 + } 537 + spin_unlock(&dlm->master_lock); 538 + 539 + out += snprintf(db->buf + out, db->len - out, 540 + "Total on list: %ld\n", total); 541 + return out; 542 + } 543 + 544 + static int debug_mle_open(struct inode *inode, struct file *file) 545 + { 546 + struct dlm_ctxt *dlm = inode->i_private; 547 + struct debug_buffer *db; 548 + 549 + db 
= debug_buffer_allocate(); 550 + if (!db) 551 + goto bail; 552 + 553 + db->len = debug_mle_print(dlm, db); 554 + 555 + file->private_data = db; 556 + 557 + return 0; 558 + bail: 559 + return -ENOMEM; 560 + } 561 + 562 + static struct file_operations debug_mle_fops = { 563 + .open = debug_mle_open, 564 + .release = debug_buffer_release, 565 + .read = debug_buffer_read, 566 + .llseek = debug_buffer_llseek, 567 + }; 568 + 569 + /* end - debug mle funcs */ 570 + 571 + /* begin - debug lockres funcs */ 572 + static int dump_lock(struct dlm_lock *lock, int list_type, char *buf, int len) 573 + { 574 + int out; 575 + 576 + #define DEBUG_LOCK_VERSION 1 577 + spin_lock(&lock->spinlock); 578 + out = snprintf(buf, len, "LOCK:%d,%d,%d,%d,%d,%d:%lld,%d,%d,%d,%d,%d," 579 + "%d,%d,%d,%d\n", 580 + DEBUG_LOCK_VERSION, 581 + list_type, lock->ml.type, lock->ml.convert_type, 582 + lock->ml.node, 583 + dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)), 584 + dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)), 585 + !list_empty(&lock->ast_list), 586 + !list_empty(&lock->bast_list), 587 + lock->ast_pending, lock->bast_pending, 588 + lock->convert_pending, lock->lock_pending, 589 + lock->cancel_pending, lock->unlock_pending, 590 + atomic_read(&lock->lock_refs.refcount)); 591 + spin_unlock(&lock->spinlock); 592 + 593 + return out; 594 + } 595 + 596 + static int dump_lockres(struct dlm_lock_resource *res, char *buf, int len) 597 + { 598 + struct dlm_lock *lock; 599 + int i; 600 + int out = 0; 601 + 602 + out += snprintf(buf + out, len - out, "NAME:"); 603 + out += stringify_lockname(res->lockname.name, res->lockname.len, 604 + buf + out, len - out); 605 + out += snprintf(buf + out, len - out, "\n"); 606 + 607 + #define DEBUG_LRES_VERSION 1 608 + out += snprintf(buf + out, len - out, 609 + "LRES:%d,%d,%d,%ld,%d,%d,%d,%d,%d,%d,%d\n", 610 + DEBUG_LRES_VERSION, 611 + res->owner, res->state, res->last_used, 612 + !list_empty(&res->purge), 613 + !list_empty(&res->dirty), 614 + 
!list_empty(&res->recovering), 615 + res->inflight_locks, res->migration_pending, 616 + atomic_read(&res->asts_reserved), 617 + atomic_read(&res->refs.refcount)); 618 + 619 + /* refmap */ 620 + out += snprintf(buf + out, len - out, "RMAP:"); 621 + out += stringify_nodemap(res->refmap, O2NM_MAX_NODES, 622 + buf + out, len - out); 623 + out += snprintf(buf + out, len - out, "\n"); 624 + 625 + /* lvb */ 626 + out += snprintf(buf + out, len - out, "LVBX:"); 627 + for (i = 0; i < DLM_LVB_LEN; i++) 628 + out += snprintf(buf + out, len - out, 629 + "%02x", (unsigned char)res->lvb[i]); 630 + out += snprintf(buf + out, len - out, "\n"); 631 + 632 + /* granted */ 633 + list_for_each_entry(lock, &res->granted, list) 634 + out += dump_lock(lock, 0, buf + out, len - out); 635 + 636 + /* converting */ 637 + list_for_each_entry(lock, &res->converting, list) 638 + out += dump_lock(lock, 1, buf + out, len - out); 639 + 640 + /* blocked */ 641 + list_for_each_entry(lock, &res->blocked, list) 642 + out += dump_lock(lock, 2, buf + out, len - out); 643 + 644 + out += snprintf(buf + out, len - out, "\n"); 645 + 646 + return out; 647 + } 648 + 649 + static void *lockres_seq_start(struct seq_file *m, loff_t *pos) 650 + { 651 + struct debug_lockres *dl = m->private; 652 + struct dlm_ctxt *dlm = dl->dl_ctxt; 653 + struct dlm_lock_resource *res = NULL; 654 + 655 + spin_lock(&dlm->spinlock); 656 + 657 + if (dl->dl_res) { 658 + list_for_each_entry(res, &dl->dl_res->tracking, tracking) { 659 + if (dl->dl_res) { 660 + dlm_lockres_put(dl->dl_res); 661 + dl->dl_res = NULL; 662 + } 663 + if (&res->tracking == &dlm->tracking_list) { 664 + mlog(0, "End of list found, %p\n", res); 665 + dl = NULL; 666 + break; 667 + } 668 + dlm_lockres_get(res); 669 + dl->dl_res = res; 670 + break; 671 + } 672 + } else { 673 + if (!list_empty(&dlm->tracking_list)) { 674 + list_for_each_entry(res, &dlm->tracking_list, tracking) 675 + break; 676 + dlm_lockres_get(res); 677 + dl->dl_res = res; 678 + } else 679 + dl = 
NULL; 680 + } 681 + 682 + if (dl) { 683 + spin_lock(&dl->dl_res->spinlock); 684 + dump_lockres(dl->dl_res, dl->dl_buf, dl->dl_len - 1); 685 + spin_unlock(&dl->dl_res->spinlock); 686 + } 687 + 688 + spin_unlock(&dlm->spinlock); 689 + 690 + return dl; 691 + } 692 + 693 + static void lockres_seq_stop(struct seq_file *m, void *v) 694 + { 695 + } 696 + 697 + static void *lockres_seq_next(struct seq_file *m, void *v, loff_t *pos) 698 + { 699 + return NULL; 700 + } 701 + 702 + static int lockres_seq_show(struct seq_file *s, void *v) 703 + { 704 + struct debug_lockres *dl = (struct debug_lockres *)v; 705 + 706 + seq_printf(s, "%s", dl->dl_buf); 707 + 708 + return 0; 709 + } 710 + 711 + static struct seq_operations debug_lockres_ops = { 712 + .start = lockres_seq_start, 713 + .stop = lockres_seq_stop, 714 + .next = lockres_seq_next, 715 + .show = lockres_seq_show, 716 + }; 717 + 718 + static int debug_lockres_open(struct inode *inode, struct file *file) 719 + { 720 + struct dlm_ctxt *dlm = inode->i_private; 721 + int ret = -ENOMEM; 722 + struct seq_file *seq; 723 + struct debug_lockres *dl = NULL; 724 + 725 + dl = kzalloc(sizeof(struct debug_lockres), GFP_KERNEL); 726 + if (!dl) { 727 + mlog_errno(ret); 728 + goto bail; 729 + } 730 + 731 + dl->dl_len = PAGE_SIZE; 732 + dl->dl_buf = kmalloc(dl->dl_len, GFP_KERNEL); 733 + if (!dl->dl_buf) { 734 + mlog_errno(ret); 735 + goto bail; 736 + } 737 + 738 + ret = seq_open(file, &debug_lockres_ops); 739 + if (ret) { 740 + mlog_errno(ret); 741 + goto bail; 742 + } 743 + 744 + seq = (struct seq_file *) file->private_data; 745 + seq->private = dl; 746 + 747 + dlm_grab(dlm); 748 + dl->dl_ctxt = dlm; 749 + 750 + return 0; 751 + bail: 752 + if (dl) 753 + kfree(dl->dl_buf); 754 + kfree(dl); 755 + return ret; 756 + } 757 + 758 + static int debug_lockres_release(struct inode *inode, struct file *file) 759 + { 760 + struct seq_file *seq = (struct seq_file *)file->private_data; 761 + struct debug_lockres *dl = (struct debug_lockres 
*)seq->private; 762 + 763 + if (dl->dl_res) 764 + dlm_lockres_put(dl->dl_res); 765 + dlm_put(dl->dl_ctxt); 766 + kfree(dl->dl_buf); 767 + return seq_release_private(inode, file); 768 + } 769 + 770 + static struct file_operations debug_lockres_fops = { 771 + .open = debug_lockres_open, 772 + .release = debug_lockres_release, 773 + .read = seq_read, 774 + .llseek = seq_lseek, 775 + }; 776 + /* end - debug lockres funcs */ 777 + 778 + /* begin - debug state funcs */ 779 + static int debug_state_print(struct dlm_ctxt *dlm, struct debug_buffer *db) 780 + { 781 + int out = 0; 782 + struct dlm_reco_node_data *node; 783 + char *state; 784 + int lres, rres, ures, tres; 785 + 786 + lres = atomic_read(&dlm->local_resources); 787 + rres = atomic_read(&dlm->remote_resources); 788 + ures = atomic_read(&dlm->unknown_resources); 789 + tres = lres + rres + ures; 790 + 791 + spin_lock(&dlm->spinlock); 792 + 793 + switch (dlm->dlm_state) { 794 + case DLM_CTXT_NEW: 795 + state = "NEW"; break; 796 + case DLM_CTXT_JOINED: 797 + state = "JOINED"; break; 798 + case DLM_CTXT_IN_SHUTDOWN: 799 + state = "SHUTDOWN"; break; 800 + case DLM_CTXT_LEAVING: 801 + state = "LEAVING"; break; 802 + default: 803 + state = "UNKNOWN"; break; 804 + } 805 + 806 + /* Domain: xxxxxxxxxx Key: 0xdfbac769 */ 807 + out += snprintf(db->buf + out, db->len - out, 808 + "Domain: %s Key: 0x%08x\n", dlm->name, dlm->key); 809 + 810 + /* Thread Pid: xxx Node: xxx State: xxxxx */ 811 + out += snprintf(db->buf + out, db->len - out, 812 + "Thread Pid: %d Node: %d State: %s\n", 813 + dlm->dlm_thread_task->pid, dlm->node_num, state); 814 + 815 + /* Number of Joins: xxx Joining Node: xxx */ 816 + out += snprintf(db->buf + out, db->len - out, 817 + "Number of Joins: %d Joining Node: %d\n", 818 + dlm->num_joins, dlm->joining_node); 819 + 820 + /* Domain Map: xx xx xx */ 821 + out += snprintf(db->buf + out, db->len - out, "Domain Map: "); 822 + out += stringify_nodemap(dlm->domain_map, O2NM_MAX_NODES, 823 + db->buf + out, db->len 
- out); 824 + out += snprintf(db->buf + out, db->len - out, "\n"); 825 + 826 + /* Live Map: xx xx xx */ 827 + out += snprintf(db->buf + out, db->len - out, "Live Map: "); 828 + out += stringify_nodemap(dlm->live_nodes_map, O2NM_MAX_NODES, 829 + db->buf + out, db->len - out); 830 + out += snprintf(db->buf + out, db->len - out, "\n"); 831 + 832 + /* Mastered Resources Total: xxx Locally: xxx Remotely: ... */ 833 + out += snprintf(db->buf + out, db->len - out, 834 + "Mastered Resources Total: %d Locally: %d " 835 + "Remotely: %d Unknown: %d\n", 836 + tres, lres, rres, ures); 837 + 838 + /* Lists: Dirty=Empty Purge=InUse PendingASTs=Empty ... */ 839 + out += snprintf(db->buf + out, db->len - out, 840 + "Lists: Dirty=%s Purge=%s PendingASTs=%s " 841 + "PendingBASTs=%s Master=%s\n", 842 + (list_empty(&dlm->dirty_list) ? "Empty" : "InUse"), 843 + (list_empty(&dlm->purge_list) ? "Empty" : "InUse"), 844 + (list_empty(&dlm->pending_asts) ? "Empty" : "InUse"), 845 + (list_empty(&dlm->pending_basts) ? "Empty" : "InUse"), 846 + (list_empty(&dlm->master_list) ? "Empty" : "InUse")); 847 + 848 + /* Purge Count: xxx Refs: xxx */ 849 + out += snprintf(db->buf + out, db->len - out, 850 + "Purge Count: %d Refs: %d\n", dlm->purge_count, 851 + atomic_read(&dlm->dlm_refs.refcount)); 852 + 853 + /* Dead Node: xxx */ 854 + out += snprintf(db->buf + out, db->len - out, 855 + "Dead Node: %d\n", dlm->reco.dead_node); 856 + 857 + /* What about DLM_RECO_STATE_FINALIZE? 
*/ 858 + if (dlm->reco.state == DLM_RECO_STATE_ACTIVE) 859 + state = "ACTIVE"; 860 + else 861 + state = "INACTIVE"; 862 + 863 + /* Recovery Pid: xxxx Master: xxx State: xxxx */ 864 + out += snprintf(db->buf + out, db->len - out, 865 + "Recovery Pid: %d Master: %d State: %s\n", 866 + dlm->dlm_reco_thread_task->pid, 867 + dlm->reco.new_master, state); 868 + 869 + /* Recovery Map: xx xx */ 870 + out += snprintf(db->buf + out, db->len - out, "Recovery Map: "); 871 + out += stringify_nodemap(dlm->recovery_map, O2NM_MAX_NODES, 872 + db->buf + out, db->len - out); 873 + out += snprintf(db->buf + out, db->len - out, "\n"); 874 + 875 + /* Recovery Node State: */ 876 + out += snprintf(db->buf + out, db->len - out, "Recovery Node State:\n"); 877 + list_for_each_entry(node, &dlm->reco.node_data, list) { 878 + switch (node->state) { 879 + case DLM_RECO_NODE_DATA_INIT: 880 + state = "INIT"; 881 + break; 882 + case DLM_RECO_NODE_DATA_REQUESTING: 883 + state = "REQUESTING"; 884 + break; 885 + case DLM_RECO_NODE_DATA_DEAD: 886 + state = "DEAD"; 887 + break; 888 + case DLM_RECO_NODE_DATA_RECEIVING: 889 + state = "RECEIVING"; 890 + break; 891 + case DLM_RECO_NODE_DATA_REQUESTED: 892 + state = "REQUESTED"; 893 + break; 894 + case DLM_RECO_NODE_DATA_DONE: 895 + state = "DONE"; 896 + break; 897 + case DLM_RECO_NODE_DATA_FINALIZE_SENT: 898 + state = "FINALIZE-SENT"; 899 + break; 900 + default: 901 + state = "BAD"; 902 + break; 903 + } 904 + out += snprintf(db->buf + out, db->len - out, "\t%u - %s\n", 905 + node->node_num, state); 906 + } 907 + 908 + spin_unlock(&dlm->spinlock); 909 + 910 + return out; 911 + } 912 + 913 + static int debug_state_open(struct inode *inode, struct file *file) 914 + { 915 + struct dlm_ctxt *dlm = inode->i_private; 916 + struct debug_buffer *db = NULL; 917 + 918 + db = debug_buffer_allocate(); 919 + if (!db) 920 + goto bail; 921 + 922 + db->len = debug_state_print(dlm, db); 923 + 924 + file->private_data = db; 925 + 926 + return 0; 927 + bail: 928 + return 
-ENOMEM; 929 + } 930 + 931 + static struct file_operations debug_state_fops = { 932 + .open = debug_state_open, 933 + .release = debug_buffer_release, 934 + .read = debug_buffer_read, 935 + .llseek = debug_buffer_llseek, 936 + }; 937 + /* end - debug state funcs */ 938 + 939 + /* files in subroot */ 940 + int dlm_debug_init(struct dlm_ctxt *dlm) 941 + { 942 + struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt; 943 + 944 + /* for dumping dlm_ctxt */ 945 + dc->debug_state_dentry = debugfs_create_file(DLM_DEBUGFS_DLM_STATE, 946 + S_IFREG|S_IRUSR, 947 + dlm->dlm_debugfs_subroot, 948 + dlm, &debug_state_fops); 949 + if (!dc->debug_state_dentry) { 950 + mlog_errno(-ENOMEM); 951 + goto bail; 952 + } 953 + 954 + /* for dumping lockres */ 955 + dc->debug_lockres_dentry = 956 + debugfs_create_file(DLM_DEBUGFS_LOCKING_STATE, 957 + S_IFREG|S_IRUSR, 958 + dlm->dlm_debugfs_subroot, 959 + dlm, &debug_lockres_fops); 960 + if (!dc->debug_lockres_dentry) { 961 + mlog_errno(-ENOMEM); 962 + goto bail; 963 + } 964 + 965 + /* for dumping mles */ 966 + dc->debug_mle_dentry = debugfs_create_file(DLM_DEBUGFS_MLE_STATE, 967 + S_IFREG|S_IRUSR, 968 + dlm->dlm_debugfs_subroot, 969 + dlm, &debug_mle_fops); 970 + if (!dc->debug_mle_dentry) { 971 + mlog_errno(-ENOMEM); 972 + goto bail; 973 + } 974 + 975 + /* for dumping lockres on the purge list */ 976 + dc->debug_purgelist_dentry = 977 + debugfs_create_file(DLM_DEBUGFS_PURGE_LIST, 978 + S_IFREG|S_IRUSR, 979 + dlm->dlm_debugfs_subroot, 980 + dlm, &debug_purgelist_fops); 981 + if (!dc->debug_purgelist_dentry) { 982 + mlog_errno(-ENOMEM); 983 + goto bail; 984 + } 985 + 986 + dlm_debug_get(dc); 987 + return 0; 988 + 989 + bail: 990 + dlm_debug_shutdown(dlm); 991 + return -ENOMEM; 992 + } 993 + 994 + void dlm_debug_shutdown(struct dlm_ctxt *dlm) 995 + { 996 + struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt; 997 + 998 + if (dc) { 999 + if (dc->debug_purgelist_dentry) 1000 + debugfs_remove(dc->debug_purgelist_dentry); 1001 + if (dc->debug_mle_dentry) 1002 
+ debugfs_remove(dc->debug_mle_dentry); 1003 + if (dc->debug_lockres_dentry) 1004 + debugfs_remove(dc->debug_lockres_dentry); 1005 + if (dc->debug_state_dentry) 1006 + debugfs_remove(dc->debug_state_dentry); 1007 + dlm_debug_put(dc); 1008 + } 1009 + } 1010 + 1011 + /* subroot - domain dir */ 1012 + int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm) 1013 + { 1014 + dlm->dlm_debugfs_subroot = debugfs_create_dir(dlm->name, 1015 + dlm_debugfs_root); 1016 + if (!dlm->dlm_debugfs_subroot) { 1017 + mlog_errno(-ENOMEM); 1018 + goto bail; 1019 + } 1020 + 1021 + dlm->dlm_debug_ctxt = kzalloc(sizeof(struct dlm_debug_ctxt), 1022 + GFP_KERNEL); 1023 + if (!dlm->dlm_debug_ctxt) { 1024 + mlog_errno(-ENOMEM); 1025 + goto bail; 1026 + } 1027 + kref_init(&dlm->dlm_debug_ctxt->debug_refcnt); 1028 + 1029 + return 0; 1030 + bail: 1031 + dlm_destroy_debugfs_subroot(dlm); 1032 + return -ENOMEM; 1033 + } 1034 + 1035 + void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm) 1036 + { 1037 + if (dlm->dlm_debugfs_subroot) 1038 + debugfs_remove(dlm->dlm_debugfs_subroot); 1039 + } 1040 + 1041 + /* debugfs root */ 1042 + int dlm_create_debugfs_root(void) 1043 + { 1044 + dlm_debugfs_root = debugfs_create_dir(DLM_DEBUGFS_DIR, NULL); 1045 + if (!dlm_debugfs_root) { 1046 + mlog_errno(-ENOMEM); 1047 + return -ENOMEM; 1048 + } 1049 + return 0; 1050 + } 1051 + 1052 + void dlm_destroy_debugfs_root(void) 1053 + { 1054 + if (dlm_debugfs_root) 1055 + debugfs_remove(dlm_debugfs_root); 1056 + } 1057 + #endif /* CONFIG_DEBUG_FS */
+86
fs/ocfs2/dlm/dlmdebug.h
··· 1 + /* -*- mode: c; c-basic-offset: 8; -*- 2 + * vim: noexpandtab sw=8 ts=8 sts=0: 3 + * 4 + * dlmdebug.h 5 + * 6 + * Copyright (C) 2008 Oracle. All rights reserved. 7 + * 8 + * This program is free software; you can redistribute it and/or 9 + * modify it under the terms of the GNU General Public 10 + * License as published by the Free Software Foundation; either 11 + * version 2 of the License, or (at your option) any later version. 12 + * 13 + * This program is distributed in the hope that it will be useful, 14 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 15 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 16 + * General Public License for more details. 17 + * 18 + * You should have received a copy of the GNU General Public 19 + * License along with this program; if not, write to the 20 + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, 21 + * Boston, MA 02111-1307, USA. 22 + * 23 + */ 24 + 25 + #ifndef DLMDEBUG_H 26 + #define DLMDEBUG_H 27 + 28 + void dlm_print_one_mle(struct dlm_master_list_entry *mle); 29 + 30 + #ifdef CONFIG_DEBUG_FS 31 + 32 + struct dlm_debug_ctxt { 33 + struct kref debug_refcnt; 34 + struct dentry *debug_state_dentry; 35 + struct dentry *debug_lockres_dentry; 36 + struct dentry *debug_mle_dentry; 37 + struct dentry *debug_purgelist_dentry; 38 + }; 39 + 40 + struct debug_buffer { 41 + int len; 42 + char *buf; 43 + }; 44 + 45 + struct debug_lockres { 46 + int dl_len; 47 + char *dl_buf; 48 + struct dlm_ctxt *dl_ctxt; 49 + struct dlm_lock_resource *dl_res; 50 + }; 51 + 52 + int dlm_debug_init(struct dlm_ctxt *dlm); 53 + void dlm_debug_shutdown(struct dlm_ctxt *dlm); 54 + 55 + int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm); 56 + void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm); 57 + 58 + int dlm_create_debugfs_root(void); 59 + void dlm_destroy_debugfs_root(void); 60 + 61 + #else 62 + 63 + static inline int dlm_debug_init(struct dlm_ctxt *dlm) 64 + { 65 + return 0; 66 + } 67 + 
static inline void dlm_debug_shutdown(struct dlm_ctxt *dlm) 68 + { 69 + } 70 + static inline int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm) 71 + { 72 + return 0; 73 + } 74 + static inline void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm) 75 + { 76 + } 77 + static inline int dlm_create_debugfs_root(void) 78 + { 79 + return 0; 80 + } 81 + static inline void dlm_destroy_debugfs_root(void) 82 + { 83 + } 84 + 85 + #endif /* CONFIG_DEBUG_FS */ 86 + #endif /* DLMDEBUG_H */
+65 -5
fs/ocfs2/dlm/dlmdomain.c
··· 33 33 #include <linux/spinlock.h> 34 34 #include <linux/delay.h> 35 35 #include <linux/err.h> 36 + #include <linux/debugfs.h> 36 37 37 38 #include "cluster/heartbeat.h" 38 39 #include "cluster/nodemanager.h" ··· 41 40 42 41 #include "dlmapi.h" 43 42 #include "dlmcommon.h" 44 - 45 43 #include "dlmdomain.h" 44 + #include "dlmdebug.h" 46 45 47 46 #include "dlmver.h" 48 47 ··· 299 298 300 299 static void dlm_free_ctxt_mem(struct dlm_ctxt *dlm) 301 300 { 301 + dlm_destroy_debugfs_subroot(dlm); 302 + 302 303 if (dlm->lockres_hash) 303 304 dlm_free_pagevec((void **)dlm->lockres_hash, DLM_HASH_PAGES); 304 305 ··· 398 395 static void dlm_complete_dlm_shutdown(struct dlm_ctxt *dlm) 399 396 { 400 397 dlm_unregister_domain_handlers(dlm); 398 + dlm_debug_shutdown(dlm); 401 399 dlm_complete_thread(dlm); 402 400 dlm_complete_recovery_thread(dlm); 403 401 dlm_destroy_dlm_worker(dlm); ··· 648 644 void dlm_unregister_domain(struct dlm_ctxt *dlm) 649 645 { 650 646 int leave = 0; 647 + struct dlm_lock_resource *res; 651 648 652 649 spin_lock(&dlm_domain_lock); 653 650 BUG_ON(dlm->dlm_state != DLM_CTXT_JOINED); ··· 678 673 msleep(500); 679 674 mlog(0, "%s: more migration to do\n", dlm->name); 680 675 } 676 + 677 + /* This list should be empty. 
If not, print remaining lockres */ 678 + if (!list_empty(&dlm->tracking_list)) { 679 + mlog(ML_ERROR, "Following lockres' are still on the " 680 + "tracking list:\n"); 681 + list_for_each_entry(res, &dlm->tracking_list, tracking) 682 + dlm_print_one_lock_resource(res); 683 + } 684 + 681 685 dlm_mark_domain_leaving(dlm); 682 686 dlm_leave_domain(dlm); 683 687 dlm_complete_dlm_shutdown(dlm); ··· 1419 1405 goto bail; 1420 1406 } 1421 1407 1408 + status = dlm_debug_init(dlm); 1409 + if (status < 0) { 1410 + mlog_errno(status); 1411 + goto bail; 1412 + } 1413 + 1422 1414 status = dlm_launch_thread(dlm); 1423 1415 if (status < 0) { 1424 1416 mlog_errno(status); ··· 1492 1472 1493 1473 if (status) { 1494 1474 dlm_unregister_domain_handlers(dlm); 1475 + dlm_debug_shutdown(dlm); 1495 1476 dlm_complete_thread(dlm); 1496 1477 dlm_complete_recovery_thread(dlm); 1497 1478 dlm_destroy_dlm_worker(dlm); ··· 1505 1484 u32 key) 1506 1485 { 1507 1486 int i; 1487 + int ret; 1508 1488 struct dlm_ctxt *dlm = NULL; 1509 1489 1510 1490 dlm = kzalloc(sizeof(*dlm), GFP_KERNEL); ··· 1538 1516 dlm->key = key; 1539 1517 dlm->node_num = o2nm_this_node(); 1540 1518 1519 + ret = dlm_create_debugfs_subroot(dlm); 1520 + if (ret < 0) { 1521 + dlm_free_pagevec((void **)dlm->lockres_hash, DLM_HASH_PAGES); 1522 + kfree(dlm->name); 1523 + kfree(dlm); 1524 + dlm = NULL; 1525 + goto leave; 1526 + } 1527 + 1541 1528 spin_lock_init(&dlm->spinlock); 1542 1529 spin_lock_init(&dlm->master_lock); 1543 1530 spin_lock_init(&dlm->ast_lock); ··· 1557 1526 INIT_LIST_HEAD(&dlm->reco.node_data); 1558 1527 INIT_LIST_HEAD(&dlm->purge_list); 1559 1528 INIT_LIST_HEAD(&dlm->dlm_domain_handlers); 1529 + INIT_LIST_HEAD(&dlm->tracking_list); 1560 1530 dlm->reco.state = 0; 1561 1531 1562 1532 INIT_LIST_HEAD(&dlm->pending_asts); ··· 1848 1816 dlm_print_version(); 1849 1817 1850 1818 status = dlm_init_mle_cache(); 1851 - if (status) 1852 - return -1; 1819 + if (status) { 1820 + mlog(ML_ERROR, "Could not create o2dlm_mle 
slabcache\n"); 1821 + goto error; 1822 + } 1823 + 1824 + status = dlm_init_master_caches(); 1825 + if (status) { 1826 + mlog(ML_ERROR, "Could not create o2dlm_lockres and " 1827 + "o2dlm_lockname slabcaches\n"); 1828 + goto error; 1829 + } 1830 + 1831 + status = dlm_init_lock_cache(); 1832 + if (status) { 1833 + mlog(ML_ERROR, "Count not create o2dlm_lock slabcache\n"); 1834 + goto error; 1835 + } 1853 1836 1854 1837 status = dlm_register_net_handlers(); 1855 1838 if (status) { 1856 - dlm_destroy_mle_cache(); 1857 - return -1; 1839 + mlog(ML_ERROR, "Unable to register network handlers\n"); 1840 + goto error; 1858 1841 } 1859 1842 1843 + status = dlm_create_debugfs_root(); 1844 + if (status) 1845 + goto error; 1846 + 1860 1847 return 0; 1848 + error: 1849 + dlm_unregister_net_handlers(); 1850 + dlm_destroy_lock_cache(); 1851 + dlm_destroy_master_caches(); 1852 + dlm_destroy_mle_cache(); 1853 + return -1; 1861 1854 } 1862 1855 1863 1856 static void __exit dlm_exit (void) 1864 1857 { 1858 + dlm_destroy_debugfs_root(); 1865 1859 dlm_unregister_net_handlers(); 1860 + dlm_destroy_lock_cache(); 1861 + dlm_destroy_master_caches(); 1866 1862 dlm_destroy_mle_cache(); 1867 1863 } 1868 1864
+20 -2
fs/ocfs2/dlm/dlmlock.c
··· 53 53 #define MLOG_MASK_PREFIX ML_DLM 54 54 #include "cluster/masklog.h" 55 55 56 + static struct kmem_cache *dlm_lock_cache = NULL; 57 + 56 58 static DEFINE_SPINLOCK(dlm_cookie_lock); 57 59 static u64 dlm_next_cookie = 1; 58 60 ··· 65 63 u8 node, u64 cookie); 66 64 static void dlm_lock_release(struct kref *kref); 67 65 static void dlm_lock_detach_lockres(struct dlm_lock *lock); 66 + 67 + int dlm_init_lock_cache(void) 68 + { 69 + dlm_lock_cache = kmem_cache_create("o2dlm_lock", 70 + sizeof(struct dlm_lock), 71 + 0, SLAB_HWCACHE_ALIGN, NULL); 72 + if (dlm_lock_cache == NULL) 73 + return -ENOMEM; 74 + return 0; 75 + } 76 + 77 + void dlm_destroy_lock_cache(void) 78 + { 79 + if (dlm_lock_cache) 80 + kmem_cache_destroy(dlm_lock_cache); 81 + } 68 82 69 83 /* Tell us whether we can grant a new lock request. 70 84 * locking: ··· 371 353 mlog(0, "freeing kernel-allocated lksb\n"); 372 354 kfree(lock->lksb); 373 355 } 374 - kfree(lock); 356 + kmem_cache_free(dlm_lock_cache, lock); 375 357 } 376 358 377 359 /* associate a lock with it's lockres, getting a ref on the lockres */ ··· 430 412 struct dlm_lock *lock; 431 413 int kernel_allocated = 0; 432 414 433 - lock = kzalloc(sizeof(*lock), GFP_NOFS); 415 + lock = (struct dlm_lock *) kmem_cache_zalloc(dlm_lock_cache, GFP_NOFS); 434 416 if (!lock) 435 417 return NULL; 436 418
+62 -138
fs/ocfs2/dlm/dlmmaster.c
··· 48 48 #include "dlmapi.h" 49 49 #include "dlmcommon.h" 50 50 #include "dlmdomain.h" 51 + #include "dlmdebug.h" 51 52 52 53 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_MASTER) 53 54 #include "cluster/masklog.h" 54 - 55 - enum dlm_mle_type { 56 - DLM_MLE_BLOCK, 57 - DLM_MLE_MASTER, 58 - DLM_MLE_MIGRATION 59 - }; 60 - 61 - struct dlm_lock_name 62 - { 63 - u8 len; 64 - u8 name[DLM_LOCKID_NAME_MAX]; 65 - }; 66 - 67 - struct dlm_master_list_entry 68 - { 69 - struct list_head list; 70 - struct list_head hb_events; 71 - struct dlm_ctxt *dlm; 72 - spinlock_t spinlock; 73 - wait_queue_head_t wq; 74 - atomic_t woken; 75 - struct kref mle_refs; 76 - int inuse; 77 - unsigned long maybe_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 78 - unsigned long vote_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 79 - unsigned long response_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 80 - unsigned long node_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; 81 - u8 master; 82 - u8 new_master; 83 - enum dlm_mle_type type; 84 - struct o2hb_callback_func mle_hb_up; 85 - struct o2hb_callback_func mle_hb_down; 86 - union { 87 - struct dlm_lock_resource *res; 88 - struct dlm_lock_name name; 89 - } u; 90 - }; 91 55 92 56 static void dlm_mle_node_down(struct dlm_ctxt *dlm, 93 57 struct dlm_master_list_entry *mle, ··· 92 128 return 1; 93 129 } 94 130 95 - #define dlm_print_nodemap(m) _dlm_print_nodemap(m,#m) 96 - static void _dlm_print_nodemap(unsigned long *map, const char *mapname) 97 - { 98 - int i; 99 - printk("%s=[ ", mapname); 100 - for (i=0; i<O2NM_MAX_NODES; i++) 101 - if (test_bit(i, map)) 102 - printk("%d ", i); 103 - printk("]"); 104 - } 105 - 106 - static void dlm_print_one_mle(struct dlm_master_list_entry *mle) 107 - { 108 - int refs; 109 - char *type; 110 - char attached; 111 - u8 master; 112 - unsigned int namelen; 113 - const char *name; 114 - struct kref *k; 115 - unsigned long *maybe = mle->maybe_map, 116 - *vote = mle->vote_map, 117 - *resp = mle->response_map, 118 - *node = mle->node_map; 119 - 120 - k = &mle->mle_refs; 
121 - if (mle->type == DLM_MLE_BLOCK) 122 - type = "BLK"; 123 - else if (mle->type == DLM_MLE_MASTER) 124 - type = "MAS"; 125 - else 126 - type = "MIG"; 127 - refs = atomic_read(&k->refcount); 128 - master = mle->master; 129 - attached = (list_empty(&mle->hb_events) ? 'N' : 'Y'); 130 - 131 - if (mle->type != DLM_MLE_MASTER) { 132 - namelen = mle->u.name.len; 133 - name = mle->u.name.name; 134 - } else { 135 - namelen = mle->u.res->lockname.len; 136 - name = mle->u.res->lockname.name; 137 - } 138 - 139 - mlog(ML_NOTICE, "%.*s: %3s refs=%3d mas=%3u new=%3u evt=%c inuse=%d ", 140 - namelen, name, type, refs, master, mle->new_master, attached, 141 - mle->inuse); 142 - dlm_print_nodemap(maybe); 143 - printk(", "); 144 - dlm_print_nodemap(vote); 145 - printk(", "); 146 - dlm_print_nodemap(resp); 147 - printk(", "); 148 - dlm_print_nodemap(node); 149 - printk(", "); 150 - printk("\n"); 151 - } 152 - 153 - #if 0 154 - /* Code here is included but defined out as it aids debugging */ 155 - 156 - static void dlm_dump_mles(struct dlm_ctxt *dlm) 157 - { 158 - struct dlm_master_list_entry *mle; 159 - 160 - mlog(ML_NOTICE, "dumping all mles for domain %s:\n", dlm->name); 161 - spin_lock(&dlm->master_lock); 162 - list_for_each_entry(mle, &dlm->master_list, list) 163 - dlm_print_one_mle(mle); 164 - spin_unlock(&dlm->master_lock); 165 - } 166 - 167 - int dlm_dump_all_mles(const char __user *data, unsigned int len) 168 - { 169 - struct dlm_ctxt *dlm; 170 - 171 - spin_lock(&dlm_domain_lock); 172 - list_for_each_entry(dlm, &dlm_domains, list) { 173 - mlog(ML_NOTICE, "found dlm: %p, name=%s\n", dlm, dlm->name); 174 - dlm_dump_mles(dlm); 175 - } 176 - spin_unlock(&dlm_domain_lock); 177 - return len; 178 - } 179 - EXPORT_SYMBOL_GPL(dlm_dump_all_mles); 180 - 181 - #endif /* 0 */ 182 - 183 - 131 + static struct kmem_cache *dlm_lockres_cache = NULL; 132 + static struct kmem_cache *dlm_lockname_cache = NULL; 184 133 static struct kmem_cache *dlm_mle_cache = NULL; 185 - 186 134 187 135 static 
void dlm_mle_release(struct kref *kref); 188 136 static void dlm_init_mle(struct dlm_master_list_entry *mle, ··· 383 507 384 508 int dlm_init_mle_cache(void) 385 509 { 386 - dlm_mle_cache = kmem_cache_create("dlm_mle_cache", 510 + dlm_mle_cache = kmem_cache_create("o2dlm_mle", 387 511 sizeof(struct dlm_master_list_entry), 388 512 0, SLAB_HWCACHE_ALIGN, 389 513 NULL); ··· 436 560 * LOCK RESOURCE FUNCTIONS 437 561 */ 438 562 563 + int dlm_init_master_caches(void) 564 + { 565 + dlm_lockres_cache = kmem_cache_create("o2dlm_lockres", 566 + sizeof(struct dlm_lock_resource), 567 + 0, SLAB_HWCACHE_ALIGN, NULL); 568 + if (!dlm_lockres_cache) 569 + goto bail; 570 + 571 + dlm_lockname_cache = kmem_cache_create("o2dlm_lockname", 572 + DLM_LOCKID_NAME_MAX, 0, 573 + SLAB_HWCACHE_ALIGN, NULL); 574 + if (!dlm_lockname_cache) 575 + goto bail; 576 + 577 + return 0; 578 + bail: 579 + dlm_destroy_master_caches(); 580 + return -ENOMEM; 581 + } 582 + 583 + void dlm_destroy_master_caches(void) 584 + { 585 + if (dlm_lockname_cache) 586 + kmem_cache_destroy(dlm_lockname_cache); 587 + 588 + if (dlm_lockres_cache) 589 + kmem_cache_destroy(dlm_lockres_cache); 590 + } 591 + 439 592 static void dlm_set_lockres_owner(struct dlm_ctxt *dlm, 440 593 struct dlm_lock_resource *res, 441 594 u8 owner) ··· 515 610 mlog(0, "destroying lockres %.*s\n", res->lockname.len, 516 611 res->lockname.name); 517 612 613 + if (!list_empty(&res->tracking)) 614 + list_del_init(&res->tracking); 615 + else { 616 + mlog(ML_ERROR, "Resource %.*s not on the Tracking list\n", 617 + res->lockname.len, res->lockname.name); 618 + dlm_print_one_lock_resource(res); 619 + } 620 + 518 621 if (!hlist_unhashed(&res->hash_node) || 519 622 !list_empty(&res->granted) || 520 623 !list_empty(&res->converting) || ··· 555 642 BUG_ON(!list_empty(&res->recovering)); 556 643 BUG_ON(!list_empty(&res->purge)); 557 644 558 - kfree(res->lockname.name); 645 + kmem_cache_free(dlm_lockname_cache, (void *)res->lockname.name); 559 646 560 - 
kfree(res); 647 + kmem_cache_free(dlm_lockres_cache, res); 561 648 } 562 649 563 650 void dlm_lockres_put(struct dlm_lock_resource *res) ··· 590 677 INIT_LIST_HEAD(&res->dirty); 591 678 INIT_LIST_HEAD(&res->recovering); 592 679 INIT_LIST_HEAD(&res->purge); 680 + INIT_LIST_HEAD(&res->tracking); 593 681 atomic_set(&res->asts_reserved, 0); 594 682 res->migration_pending = 0; 595 683 res->inflight_locks = 0; ··· 606 692 607 693 res->last_used = 0; 608 694 695 + list_add_tail(&res->tracking, &dlm->tracking_list); 696 + 609 697 memset(res->lvb, 0, DLM_LVB_LEN); 610 698 memset(res->refmap, 0, sizeof(res->refmap)); 611 699 } ··· 616 700 const char *name, 617 701 unsigned int namelen) 618 702 { 619 - struct dlm_lock_resource *res; 703 + struct dlm_lock_resource *res = NULL; 620 704 621 - res = kmalloc(sizeof(struct dlm_lock_resource), GFP_NOFS); 705 + res = (struct dlm_lock_resource *) 706 + kmem_cache_zalloc(dlm_lockres_cache, GFP_NOFS); 622 707 if (!res) 623 - return NULL; 708 + goto error; 624 709 625 - res->lockname.name = kmalloc(namelen, GFP_NOFS); 626 - if (!res->lockname.name) { 627 - kfree(res); 628 - return NULL; 629 - } 710 + res->lockname.name = (char *) 711 + kmem_cache_zalloc(dlm_lockname_cache, GFP_NOFS); 712 + if (!res->lockname.name) 713 + goto error; 630 714 631 715 dlm_init_lockres(dlm, res, name, namelen); 632 716 return res; 717 + 718 + error: 719 + if (res && res->lockname.name) 720 + kmem_cache_free(dlm_lockname_cache, (void *)res->lockname.name); 721 + 722 + if (res) 723 + kmem_cache_free(dlm_lockres_cache, res); 724 + return NULL; 633 725 } 634 726 635 727 void __dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
+399 -244
fs/ocfs2/dlmglue.c
··· 27 27 #include <linux/slab.h> 28 28 #include <linux/highmem.h> 29 29 #include <linux/mm.h> 30 - #include <linux/crc32.h> 31 30 #include <linux/kthread.h> 32 31 #include <linux/pagemap.h> 33 32 #include <linux/debugfs.h> 34 33 #include <linux/seq_file.h> 35 - 36 - #include <cluster/heartbeat.h> 37 - #include <cluster/nodemanager.h> 38 - #include <cluster/tcp.h> 39 - 40 - #include <dlm/dlmapi.h> 41 34 42 35 #define MLOG_MASK_PREFIX ML_DLM_GLUE 43 36 #include <cluster/masklog.h> ··· 46 53 #include "heartbeat.h" 47 54 #include "inode.h" 48 55 #include "journal.h" 56 + #include "stackglue.h" 49 57 #include "slot_map.h" 50 58 #include "super.h" 51 59 #include "uptodate.h" ··· 107 113 unsigned int line, 108 114 struct ocfs2_lock_res *lockres) 109 115 { 110 - struct ocfs2_meta_lvb *lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb; 116 + struct ocfs2_meta_lvb *lvb = 117 + (struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb); 111 118 112 119 mlog(level, "LVB information for %s (called from %s:%u):\n", 113 120 lockres->l_name, function, line); ··· 254 259 .flags = 0, 255 260 }; 256 261 257 - /* 258 - * This is the filesystem locking protocol version. 259 - * 260 - * Whenever the filesystem does new things with locks (adds or removes a 261 - * lock, orders them differently, does different things underneath a lock), 262 - * the version must be changed. The protocol is negotiated when joining 263 - * the dlm domain. A node may join the domain if its major version is 264 - * identical to all other nodes and its minor version is greater than 265 - * or equal to all other nodes. When its minor version is greater than 266 - * the other nodes, it will run at the minor version specified by the 267 - * other nodes. 268 - * 269 - * If a locking change is made that will not be compatible with older 270 - * versions, the major number must be increased and the minor version set 271 - * to zero. 
If a change merely adds a behavior that can be disabled when 272 - * speaking to older versions, the minor version must be increased. If a 273 - * change adds a fully backwards compatible change (eg, LVB changes that 274 - * are just ignored by older versions), the version does not need to be 275 - * updated. 276 - */ 277 - const struct dlm_protocol_version ocfs2_locking_protocol = { 278 - .pv_major = OCFS2_LOCKING_PROTOCOL_MAJOR, 279 - .pv_minor = OCFS2_LOCKING_PROTOCOL_MINOR, 280 - }; 281 - 282 262 static inline int ocfs2_is_inode_lock(struct ocfs2_lock_res *lockres) 283 263 { 284 264 return lockres->l_type == OCFS2_LOCK_TYPE_META || ··· 286 316 static int ocfs2_lock_create(struct ocfs2_super *osb, 287 317 struct ocfs2_lock_res *lockres, 288 318 int level, 289 - int dlm_flags); 319 + u32 dlm_flags); 290 320 static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres, 291 321 int wanted); 292 322 static void ocfs2_cluster_unlock(struct ocfs2_super *osb, ··· 300 330 struct ocfs2_lock_res *lockres); 301 331 static inline void ocfs2_recover_from_dlm_error(struct ocfs2_lock_res *lockres, 302 332 int convert); 303 - #define ocfs2_log_dlm_error(_func, _stat, _lockres) do { \ 304 - mlog(ML_ERROR, "Dlm error \"%s\" while calling %s on " \ 305 - "resource %s: %s\n", dlm_errname(_stat), _func, \ 306 - _lockres->l_name, dlm_errmsg(_stat)); \ 333 + #define ocfs2_log_dlm_error(_func, _err, _lockres) do { \ 334 + mlog(ML_ERROR, "DLM error %d while calling %s on resource %s\n", \ 335 + _err, _func, _lockres->l_name); \ 307 336 } while (0) 308 337 static int ocfs2_downconvert_thread(void *arg); 309 338 static void ocfs2_downconvert_on_unlock(struct ocfs2_super *osb, ··· 311 342 struct buffer_head **bh); 312 343 static void ocfs2_drop_osb_locks(struct ocfs2_super *osb); 313 344 static inline int ocfs2_highest_compat_lock_level(int level); 314 - static void ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres, 315 - int new_level); 345 + static unsigned 
int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres, 346 + int new_level); 316 347 static int ocfs2_downconvert_lock(struct ocfs2_super *osb, 317 348 struct ocfs2_lock_res *lockres, 318 349 int new_level, 319 - int lvb); 350 + int lvb, 351 + unsigned int generation); 320 352 static int ocfs2_prepare_cancel_convert(struct ocfs2_super *osb, 321 353 struct ocfs2_lock_res *lockres); 322 354 static int ocfs2_cancel_convert(struct ocfs2_super *osb, ··· 376 406 res->l_ops = ops; 377 407 res->l_priv = priv; 378 408 379 - res->l_level = LKM_IVMODE; 380 - res->l_requested = LKM_IVMODE; 381 - res->l_blocking = LKM_IVMODE; 409 + res->l_level = DLM_LOCK_IV; 410 + res->l_requested = DLM_LOCK_IV; 411 + res->l_blocking = DLM_LOCK_IV; 382 412 res->l_action = OCFS2_AST_INVALID; 383 413 res->l_unlock_action = OCFS2_UNLOCK_INVALID; 384 414 ··· 574 604 BUG_ON(!lockres); 575 605 576 606 switch(level) { 577 - case LKM_EXMODE: 607 + case DLM_LOCK_EX: 578 608 lockres->l_ex_holders++; 579 609 break; 580 - case LKM_PRMODE: 610 + case DLM_LOCK_PR: 581 611 lockres->l_ro_holders++; 582 612 break; 583 613 default: ··· 595 625 BUG_ON(!lockres); 596 626 597 627 switch(level) { 598 - case LKM_EXMODE: 628 + case DLM_LOCK_EX: 599 629 BUG_ON(!lockres->l_ex_holders); 600 630 lockres->l_ex_holders--; 601 631 break; 602 - case LKM_PRMODE: 632 + case DLM_LOCK_PR: 603 633 BUG_ON(!lockres->l_ro_holders); 604 634 lockres->l_ro_holders--; 605 635 break; ··· 614 644 * lock types are added. 
*/ 615 645 static inline int ocfs2_highest_compat_lock_level(int level) 616 646 { 617 - int new_level = LKM_EXMODE; 647 + int new_level = DLM_LOCK_EX; 618 648 619 - if (level == LKM_EXMODE) 620 - new_level = LKM_NLMODE; 621 - else if (level == LKM_PRMODE) 622 - new_level = LKM_PRMODE; 649 + if (level == DLM_LOCK_EX) 650 + new_level = DLM_LOCK_NL; 651 + else if (level == DLM_LOCK_PR) 652 + new_level = DLM_LOCK_PR; 623 653 return new_level; 624 654 } 625 655 ··· 658 688 BUG_ON(!(lockres->l_flags & OCFS2_LOCK_BUSY)); 659 689 BUG_ON(!(lockres->l_flags & OCFS2_LOCK_ATTACHED)); 660 690 BUG_ON(!(lockres->l_flags & OCFS2_LOCK_BLOCKED)); 661 - BUG_ON(lockres->l_blocking <= LKM_NLMODE); 691 + BUG_ON(lockres->l_blocking <= DLM_LOCK_NL); 662 692 663 693 lockres->l_level = lockres->l_requested; 664 694 if (lockres->l_level <= 665 695 ocfs2_highest_compat_lock_level(lockres->l_blocking)) { 666 - lockres->l_blocking = LKM_NLMODE; 696 + lockres->l_blocking = DLM_LOCK_NL; 667 697 lockres_clear_flags(lockres, OCFS2_LOCK_BLOCKED); 668 698 } 669 699 lockres_clear_flags(lockres, OCFS2_LOCK_BUSY); ··· 682 712 * information is already up to data. Convert from NL to 683 713 * *anything* however should mark ourselves as needing an 684 714 * update */ 685 - if (lockres->l_level == LKM_NLMODE && 715 + if (lockres->l_level == DLM_LOCK_NL && 686 716 lockres->l_ops->flags & LOCK_TYPE_REQUIRES_REFRESH) 687 717 lockres_or_flags(lockres, OCFS2_LOCK_NEEDS_REFRESH); 688 718 ··· 699 729 BUG_ON((!(lockres->l_flags & OCFS2_LOCK_BUSY))); 700 730 BUG_ON(lockres->l_flags & OCFS2_LOCK_ATTACHED); 701 731 702 - if (lockres->l_requested > LKM_NLMODE && 732 + if (lockres->l_requested > DLM_LOCK_NL && 703 733 !(lockres->l_flags & OCFS2_LOCK_LOCAL) && 704 734 lockres->l_ops->flags & LOCK_TYPE_REQUIRES_REFRESH) 705 735 lockres_or_flags(lockres, OCFS2_LOCK_NEEDS_REFRESH); ··· 737 767 return needs_downconvert; 738 768 } 739 769 770 + /* 771 + * OCFS2_LOCK_PENDING and l_pending_gen. 
772 + * 773 + * Why does OCFS2_LOCK_PENDING exist? To close a race between setting 774 + * OCFS2_LOCK_BUSY and calling ocfs2_dlm_lock(). See ocfs2_unblock_lock() 775 + * for more details on the race. 776 + * 777 + * OCFS2_LOCK_PENDING closes the race quite nicely. However, it introduces 778 + * a race on itself. In o2dlm, we can get the ast before ocfs2_dlm_lock() 779 + * returns. The ast clears OCFS2_LOCK_BUSY, and must therefore clear 780 + * OCFS2_LOCK_PENDING at the same time. When ocfs2_dlm_lock() returns, 781 + * the caller is going to try to clear PENDING again. If nothing else is 782 + * happening, __lockres_clear_pending() sees PENDING is unset and does 783 + * nothing. 784 + * 785 + * But what if another path (eg downconvert thread) has just started a 786 + * new locking action? The other path has re-set PENDING. Our path 787 + * cannot clear PENDING, because that will re-open the original race 788 + * window. 789 + * 790 + * [Example] 791 + * 792 + * ocfs2_meta_lock() 793 + * ocfs2_cluster_lock() 794 + * set BUSY 795 + * set PENDING 796 + * drop l_lock 797 + * ocfs2_dlm_lock() 798 + * ocfs2_locking_ast() ocfs2_downconvert_thread() 799 + * clear PENDING ocfs2_unblock_lock() 800 + * take_l_lock 801 + * !BUSY 802 + * ocfs2_prepare_downconvert() 803 + * set BUSY 804 + * set PENDING 805 + * drop l_lock 806 + * take l_lock 807 + * clear PENDING 808 + * drop l_lock 809 + * <window> 810 + * ocfs2_dlm_lock() 811 + * 812 + * So as you can see, we now have a window where l_lock is not held, 813 + * PENDING is not set, and ocfs2_dlm_lock() has not been called. 814 + * 815 + * The core problem is that ocfs2_cluster_lock() has cleared the PENDING 816 + * set by ocfs2_prepare_downconvert(). That wasn't nice. 817 + * 818 + * To solve this we introduce l_pending_gen. A call to 819 + * lockres_clear_pending() will only do so when it is passed a generation 820 + * number that matches the lockres. lockres_set_pending() will return the 821 + * current generation number. 
When ocfs2_cluster_lock() goes to clear 822 + * PENDING, it passes the generation it got from set_pending(). In our 823 + * example above, the generation numbers will *not* match. Thus, 824 + * ocfs2_cluster_lock() will not clear the PENDING set by 825 + * ocfs2_prepare_downconvert(). 826 + */ 827 + 828 + /* Unlocked version for ocfs2_locking_ast() */ 829 + static void __lockres_clear_pending(struct ocfs2_lock_res *lockres, 830 + unsigned int generation, 831 + struct ocfs2_super *osb) 832 + { 833 + assert_spin_locked(&lockres->l_lock); 834 + 835 + /* 836 + * The ast and locking functions can race us here. The winner 837 + * will clear pending, the loser will not. 838 + */ 839 + if (!(lockres->l_flags & OCFS2_LOCK_PENDING) || 840 + (lockres->l_pending_gen != generation)) 841 + return; 842 + 843 + lockres_clear_flags(lockres, OCFS2_LOCK_PENDING); 844 + lockres->l_pending_gen++; 845 + 846 + /* 847 + * The downconvert thread may have skipped us because we 848 + * were PENDING. Wake it up. 849 + */ 850 + if (lockres->l_flags & OCFS2_LOCK_BLOCKED) 851 + ocfs2_wake_downconvert_thread(osb); 852 + } 853 + 854 + /* Locked version for callers of ocfs2_dlm_lock() */ 855 + static void lockres_clear_pending(struct ocfs2_lock_res *lockres, 856 + unsigned int generation, 857 + struct ocfs2_super *osb) 858 + { 859 + unsigned long flags; 860 + 861 + spin_lock_irqsave(&lockres->l_lock, flags); 862 + __lockres_clear_pending(lockres, generation, osb); 863 + spin_unlock_irqrestore(&lockres->l_lock, flags); 864 + } 865 + 866 + static unsigned int lockres_set_pending(struct ocfs2_lock_res *lockres) 867 + { 868 + assert_spin_locked(&lockres->l_lock); 869 + BUG_ON(!(lockres->l_flags & OCFS2_LOCK_BUSY)); 870 + 871 + lockres_or_flags(lockres, OCFS2_LOCK_PENDING); 872 + 873 + return lockres->l_pending_gen; 874 + } 875 + 876 + 740 877 static void ocfs2_blocking_ast(void *opaque, int level) 741 878 { 742 879 struct ocfs2_lock_res *lockres = opaque; ··· 851 774 int needs_downconvert; 852 775 
unsigned long flags; 853 776 854 - BUG_ON(level <= LKM_NLMODE); 777 + BUG_ON(level <= DLM_LOCK_NL); 855 778 856 779 mlog(0, "BAST fired for lockres %s, blocking %d, level %d type %s\n", 857 780 lockres->l_name, level, lockres->l_level, ··· 878 801 static void ocfs2_locking_ast(void *opaque) 879 802 { 880 803 struct ocfs2_lock_res *lockres = opaque; 881 - struct dlm_lockstatus *lksb = &lockres->l_lksb; 804 + struct ocfs2_super *osb = ocfs2_get_lockres_osb(lockres); 882 805 unsigned long flags; 806 + int status; 883 807 884 808 spin_lock_irqsave(&lockres->l_lock, flags); 885 809 886 - if (lksb->status != DLM_NORMAL) { 887 - mlog(ML_ERROR, "lockres %s: lksb status value of %u!\n", 888 - lockres->l_name, lksb->status); 810 + status = ocfs2_dlm_lock_status(&lockres->l_lksb); 811 + 812 + if (status == -EAGAIN) { 813 + lockres_clear_flags(lockres, OCFS2_LOCK_BUSY); 814 + goto out; 815 + } 816 + 817 + if (status) { 818 + mlog(ML_ERROR, "lockres %s: lksb status value of %d!\n", 819 + lockres->l_name, status); 889 820 spin_unlock_irqrestore(&lockres->l_lock, flags); 890 821 return; 891 822 } ··· 916 831 lockres->l_unlock_action); 917 832 BUG(); 918 833 } 919 - 834 + out: 920 835 /* set it to something invalid so if we get called again we 921 836 * can catch it. */ 922 837 lockres->l_action = OCFS2_AST_INVALID; 838 + 839 + /* Did we try to cancel this lock? Clear that state */ 840 + if (lockres->l_unlock_action == OCFS2_UNLOCK_CANCEL_CONVERT) 841 + lockres->l_unlock_action = OCFS2_UNLOCK_INVALID; 842 + 843 + /* 844 + * We may have beaten the locking functions here. We certainly 845 + * know that dlm_lock() has been called :-) 846 + * Because we can't have two lock calls in flight at once, we 847 + * can use lockres->l_pending_gen. 
848 + */ 849 + __lockres_clear_pending(lockres, lockres->l_pending_gen, osb); 923 850 924 851 wake_up(&lockres->l_event); 925 852 spin_unlock_irqrestore(&lockres->l_lock, flags); ··· 962 865 static int ocfs2_lock_create(struct ocfs2_super *osb, 963 866 struct ocfs2_lock_res *lockres, 964 867 int level, 965 - int dlm_flags) 868 + u32 dlm_flags) 966 869 { 967 870 int ret = 0; 968 - enum dlm_status status = DLM_NORMAL; 969 871 unsigned long flags; 872 + unsigned int gen; 970 873 971 874 mlog_entry_void(); 972 875 973 - mlog(0, "lock %s, level = %d, flags = %d\n", lockres->l_name, level, 876 + mlog(0, "lock %s, level = %d, flags = %u\n", lockres->l_name, level, 974 877 dlm_flags); 975 878 976 879 spin_lock_irqsave(&lockres->l_lock, flags); ··· 983 886 lockres->l_action = OCFS2_AST_ATTACH; 984 887 lockres->l_requested = level; 985 888 lockres_or_flags(lockres, OCFS2_LOCK_BUSY); 889 + gen = lockres_set_pending(lockres); 986 890 spin_unlock_irqrestore(&lockres->l_lock, flags); 987 891 988 - status = dlmlock(osb->dlm, 989 - level, 990 - &lockres->l_lksb, 991 - dlm_flags, 992 - lockres->l_name, 993 - OCFS2_LOCK_ID_MAX_LEN - 1, 994 - ocfs2_locking_ast, 995 - lockres, 996 - ocfs2_blocking_ast); 997 - if (status != DLM_NORMAL) { 998 - ocfs2_log_dlm_error("dlmlock", status, lockres); 999 - ret = -EINVAL; 892 + ret = ocfs2_dlm_lock(osb->cconn, 893 + level, 894 + &lockres->l_lksb, 895 + dlm_flags, 896 + lockres->l_name, 897 + OCFS2_LOCK_ID_MAX_LEN - 1, 898 + lockres); 899 + lockres_clear_pending(lockres, gen, osb); 900 + if (ret) { 901 + ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres); 1000 902 ocfs2_recover_from_dlm_error(lockres, 1); 1001 903 } 1002 904 1003 - mlog(0, "lock %s, successfull return from dlmlock\n", lockres->l_name); 905 + mlog(0, "lock %s, return from ocfs2_dlm_lock\n", lockres->l_name); 1004 906 1005 907 bail: 1006 908 mlog_exit(ret); ··· 1112 1016 static int ocfs2_cluster_lock(struct ocfs2_super *osb, 1113 1017 struct ocfs2_lock_res *lockres, 1114 1018 int 
level, 1115 - int lkm_flags, 1019 + u32 lkm_flags, 1116 1020 int arg_flags) 1117 1021 { 1118 1022 struct ocfs2_mask_waiter mw; 1119 - enum dlm_status status; 1120 1023 int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR); 1121 1024 int ret = 0; /* gcc doesn't realize wait = 1 guarantees ret is set */ 1122 1025 unsigned long flags; 1026 + unsigned int gen; 1027 + int noqueue_attempted = 0; 1123 1028 1124 1029 mlog_entry_void(); 1125 1030 1126 1031 ocfs2_init_mask_waiter(&mw); 1127 1032 1128 1033 if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB) 1129 - lkm_flags |= LKM_VALBLK; 1034 + lkm_flags |= DLM_LKF_VALBLK; 1130 1035 1131 1036 again: 1132 1037 wait = 0; ··· 1165 1068 } 1166 1069 1167 1070 if (level > lockres->l_level) { 1071 + if (noqueue_attempted > 0) { 1072 + ret = -EAGAIN; 1073 + goto unlock; 1074 + } 1075 + if (lkm_flags & DLM_LKF_NOQUEUE) 1076 + noqueue_attempted = 1; 1077 + 1168 1078 if (lockres->l_action != OCFS2_AST_INVALID) 1169 1079 mlog(ML_ERROR, "lockres %s has action %u pending\n", 1170 1080 lockres->l_name, lockres->l_action); 1171 1081 1172 1082 if (!(lockres->l_flags & OCFS2_LOCK_ATTACHED)) { 1173 1083 lockres->l_action = OCFS2_AST_ATTACH; 1174 - lkm_flags &= ~LKM_CONVERT; 1084 + lkm_flags &= ~DLM_LKF_CONVERT; 1175 1085 } else { 1176 1086 lockres->l_action = OCFS2_AST_CONVERT; 1177 - lkm_flags |= LKM_CONVERT; 1087 + lkm_flags |= DLM_LKF_CONVERT; 1178 1088 } 1179 1089 1180 1090 lockres->l_requested = level; 1181 1091 lockres_or_flags(lockres, OCFS2_LOCK_BUSY); 1092 + gen = lockres_set_pending(lockres); 1182 1093 spin_unlock_irqrestore(&lockres->l_lock, flags); 1183 1094 1184 - BUG_ON(level == LKM_IVMODE); 1185 - BUG_ON(level == LKM_NLMODE); 1095 + BUG_ON(level == DLM_LOCK_IV); 1096 + BUG_ON(level == DLM_LOCK_NL); 1186 1097 1187 1098 mlog(0, "lock %s, convert from %d to level = %d\n", 1188 1099 lockres->l_name, lockres->l_level, level); 1189 1100 1190 1101 /* call dlm_lock to upgrade lock now */ 1191 - status = dlmlock(osb->dlm, 1192 - 
level, 1193 - &lockres->l_lksb, 1194 - lkm_flags, 1195 - lockres->l_name, 1196 - OCFS2_LOCK_ID_MAX_LEN - 1, 1197 - ocfs2_locking_ast, 1198 - lockres, 1199 - ocfs2_blocking_ast); 1200 - if (status != DLM_NORMAL) { 1201 - if ((lkm_flags & LKM_NOQUEUE) && 1202 - (status == DLM_NOTQUEUED)) 1203 - ret = -EAGAIN; 1204 - else { 1205 - ocfs2_log_dlm_error("dlmlock", status, 1206 - lockres); 1207 - ret = -EINVAL; 1102 + ret = ocfs2_dlm_lock(osb->cconn, 1103 + level, 1104 + &lockres->l_lksb, 1105 + lkm_flags, 1106 + lockres->l_name, 1107 + OCFS2_LOCK_ID_MAX_LEN - 1, 1108 + lockres); 1109 + lockres_clear_pending(lockres, gen, osb); 1110 + if (ret) { 1111 + if (!(lkm_flags & DLM_LKF_NOQUEUE) || 1112 + (ret != -EAGAIN)) { 1113 + ocfs2_log_dlm_error("ocfs2_dlm_lock", 1114 + ret, lockres); 1208 1115 } 1209 1116 ocfs2_recover_from_dlm_error(lockres, 1); 1210 1117 goto out; 1211 1118 } 1212 1119 1213 - mlog(0, "lock %s, successfull return from dlmlock\n", 1120 + mlog(0, "lock %s, successfull return from ocfs2_dlm_lock\n", 1214 1121 lockres->l_name); 1215 1122 1216 1123 /* At this point we've gone inside the dlm and need to ··· 1278 1177 int ex, 1279 1178 int local) 1280 1179 { 1281 - int level = ex ? LKM_EXMODE : LKM_PRMODE; 1180 + int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR; 1282 1181 unsigned long flags; 1283 - int lkm_flags = local ? LKM_LOCAL : 0; 1182 + u32 lkm_flags = local ? DLM_LKF_LOCAL : 0; 1284 1183 1285 1184 spin_lock_irqsave(&lockres->l_lock, flags); 1286 1185 BUG_ON(lockres->l_flags & OCFS2_LOCK_ATTACHED); ··· 1323 1222 } 1324 1223 1325 1224 /* 1326 - * We don't want to use LKM_LOCAL on a meta data lock as they 1225 + * We don't want to use DLM_LKF_LOCAL on a meta data lock as they 1327 1226 * don't use a generation in their lock names. 1328 1227 */ 1329 1228 ret = ocfs2_create_new_lock(osb, &OCFS2_I(inode)->ip_inode_lockres, 1, 0); ··· 1362 1261 1363 1262 lockres = &OCFS2_I(inode)->ip_rw_lockres; 1364 1263 1365 - level = write ? 
LKM_EXMODE : LKM_PRMODE; 1264 + level = write ? DLM_LOCK_EX : DLM_LOCK_PR; 1366 1265 1367 1266 status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, level, 0, 1368 1267 0); ··· 1375 1274 1376 1275 void ocfs2_rw_unlock(struct inode *inode, int write) 1377 1276 { 1378 - int level = write ? LKM_EXMODE : LKM_PRMODE; 1277 + int level = write ? DLM_LOCK_EX : DLM_LOCK_PR; 1379 1278 struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_rw_lockres; 1380 1279 struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); 1381 1280 ··· 1413 1312 lockres = &OCFS2_I(inode)->ip_open_lockres; 1414 1313 1415 1314 status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, 1416 - LKM_PRMODE, 0, 0); 1315 + DLM_LOCK_PR, 0, 0); 1417 1316 if (status < 0) 1418 1317 mlog_errno(status); 1419 1318 ··· 1441 1340 1442 1341 lockres = &OCFS2_I(inode)->ip_open_lockres; 1443 1342 1444 - level = write ? LKM_EXMODE : LKM_PRMODE; 1343 + level = write ? DLM_LOCK_EX : DLM_LOCK_PR; 1445 1344 1446 1345 /* 1447 1346 * The file system may already holding a PRMODE/EXMODE open lock. 1448 - * Since we pass LKM_NOQUEUE, the request won't block waiting on 1347 + * Since we pass DLM_LKF_NOQUEUE, the request won't block waiting on 1449 1348 * other nodes and the -EAGAIN will indicate to the caller that 1450 1349 * this inode is still in use. 
1451 1350 */ 1452 1351 status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, 1453 - level, LKM_NOQUEUE, 0); 1352 + level, DLM_LKF_NOQUEUE, 0); 1454 1353 1455 1354 out: 1456 1355 mlog_exit(status); ··· 1475 1374 1476 1375 if(lockres->l_ro_holders) 1477 1376 ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres, 1478 - LKM_PRMODE); 1377 + DLM_LOCK_PR); 1479 1378 if(lockres->l_ex_holders) 1480 1379 ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres, 1481 - LKM_EXMODE); 1380 + DLM_LOCK_EX); 1482 1381 1483 1382 out: 1484 1383 mlog_exit_void(); ··· 1565 1464 ocfs2_init_mask_waiter(&mw); 1566 1465 1567 1466 if ((lockres->l_flags & OCFS2_LOCK_BUSY) || 1568 - (lockres->l_level > LKM_NLMODE)) { 1467 + (lockres->l_level > DLM_LOCK_NL)) { 1569 1468 mlog(ML_ERROR, 1570 1469 "File lock \"%s\" has busy or locked state: flags: 0x%lx, " 1571 1470 "level: %u\n", lockres->l_name, lockres->l_flags, ··· 1604 1503 lockres_add_mask_waiter(lockres, &mw, OCFS2_LOCK_BUSY, 0); 1605 1504 spin_unlock_irqrestore(&lockres->l_lock, flags); 1606 1505 1607 - ret = dlmlock(osb->dlm, level, &lockres->l_lksb, lkm_flags, 1608 - lockres->l_name, OCFS2_LOCK_ID_MAX_LEN - 1, 1609 - ocfs2_locking_ast, lockres, ocfs2_blocking_ast); 1610 - if (ret != DLM_NORMAL) { 1611 - if (trylock && ret == DLM_NOTQUEUED) 1612 - ret = -EAGAIN; 1613 - else { 1614 - ocfs2_log_dlm_error("dlmlock", ret, lockres); 1506 + ret = ocfs2_dlm_lock(osb->cconn, level, &lockres->l_lksb, lkm_flags, 1507 + lockres->l_name, OCFS2_LOCK_ID_MAX_LEN - 1, 1508 + lockres); 1509 + if (ret) { 1510 + if (!trylock || (ret != -EAGAIN)) { 1511 + ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres); 1615 1512 ret = -EINVAL; 1616 1513 } 1617 1514 ··· 1636 1537 * to just bubble sucess back up to the user. 
1637 1538 */ 1638 1539 ret = ocfs2_flock_handle_signal(lockres, level); 1540 + } else if (!ret && (level > lockres->l_level)) { 1541 + /* Trylock failed asynchronously */ 1542 + BUG_ON(!trylock); 1543 + ret = -EAGAIN; 1639 1544 } 1640 1545 1641 1546 out: ··· 1652 1549 void ocfs2_file_unlock(struct file *file) 1653 1550 { 1654 1551 int ret; 1552 + unsigned int gen; 1655 1553 unsigned long flags; 1656 1554 struct ocfs2_file_private *fp = file->private_data; 1657 1555 struct ocfs2_lock_res *lockres = &fp->fp_flock; ··· 1676 1572 * Fake a blocking ast for the downconvert code. 1677 1573 */ 1678 1574 lockres_or_flags(lockres, OCFS2_LOCK_BLOCKED); 1679 - lockres->l_blocking = LKM_EXMODE; 1575 + lockres->l_blocking = DLM_LOCK_EX; 1680 1576 1681 - ocfs2_prepare_downconvert(lockres, LKM_NLMODE); 1577 + gen = ocfs2_prepare_downconvert(lockres, LKM_NLMODE); 1682 1578 lockres_add_mask_waiter(lockres, &mw, OCFS2_LOCK_BUSY, 0); 1683 1579 spin_unlock_irqrestore(&lockres->l_lock, flags); 1684 1580 1685 - ret = ocfs2_downconvert_lock(osb, lockres, LKM_NLMODE, 0); 1581 + ret = ocfs2_downconvert_lock(osb, lockres, LKM_NLMODE, 0, gen); 1686 1582 if (ret) { 1687 1583 mlog_errno(ret); 1688 1584 return; ··· 1705 1601 * condition. 
*/ 1706 1602 if (lockres->l_flags & OCFS2_LOCK_BLOCKED) { 1707 1603 switch(lockres->l_blocking) { 1708 - case LKM_EXMODE: 1604 + case DLM_LOCK_EX: 1709 1605 if (!lockres->l_ex_holders && !lockres->l_ro_holders) 1710 1606 kick = 1; 1711 1607 break; 1712 - case LKM_PRMODE: 1608 + case DLM_LOCK_PR: 1713 1609 if (!lockres->l_ex_holders) 1714 1610 kick = 1; 1715 1611 break; ··· 1752 1648 1753 1649 mlog_entry_void(); 1754 1650 1755 - lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb; 1651 + lvb = (struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb); 1756 1652 1757 1653 /* 1758 1654 * Invalidate the LVB of a deleted inode - this way other ··· 1804 1700 1805 1701 mlog_meta_lvb(0, lockres); 1806 1702 1807 - lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb; 1703 + lvb = (struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb); 1808 1704 1809 1705 /* We're safe here without the lockres lock... */ 1810 1706 spin_lock(&oi->ip_lock); ··· 1839 1735 static inline int ocfs2_meta_lvb_is_trustable(struct inode *inode, 1840 1736 struct ocfs2_lock_res *lockres) 1841 1737 { 1842 - struct ocfs2_meta_lvb *lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb; 1738 + struct ocfs2_meta_lvb *lvb = 1739 + (struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb); 1843 1740 1844 1741 if (lvb->lvb_version == OCFS2_LVB_VERSION 1845 1742 && be32_to_cpu(lvb->lvb_igeneration) == inode->i_generation) ··· 2028 1923 int ex, 2029 1924 int arg_flags) 2030 1925 { 2031 - int status, level, dlm_flags, acquired; 1926 + int status, level, acquired; 1927 + u32 dlm_flags; 2032 1928 struct ocfs2_lock_res *lockres = NULL; 2033 1929 struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); 2034 1930 struct buffer_head *local_bh = NULL; ··· 2056 1950 goto local; 2057 1951 2058 1952 if (!(arg_flags & OCFS2_META_LOCK_RECOVERY)) 2059 - wait_event(osb->recovery_event, 2060 - ocfs2_node_map_is_empty(osb, &osb->recovery_map)); 1953 + ocfs2_wait_for_recovery(osb); 2061 1954 2062 1955 lockres = 
&OCFS2_I(inode)->ip_inode_lockres; 2063 - level = ex ? LKM_EXMODE : LKM_PRMODE; 1956 + level = ex ? DLM_LOCK_EX : DLM_LOCK_PR; 2064 1957 dlm_flags = 0; 2065 1958 if (arg_flags & OCFS2_META_LOCK_NOQUEUE) 2066 - dlm_flags |= LKM_NOQUEUE; 1959 + dlm_flags |= DLM_LKF_NOQUEUE; 2067 1960 2068 1961 status = ocfs2_cluster_lock(osb, lockres, level, dlm_flags, arg_flags); 2069 1962 if (status < 0) { ··· 2079 1974 * committed to owning this lock so we don't allow signals to 2080 1975 * abort the operation. */ 2081 1976 if (!(arg_flags & OCFS2_META_LOCK_RECOVERY)) 2082 - wait_event(osb->recovery_event, 2083 - ocfs2_node_map_is_empty(osb, &osb->recovery_map)); 1977 + ocfs2_wait_for_recovery(osb); 2084 1978 2085 1979 local: 2086 1980 /* ··· 2213 2109 void ocfs2_inode_unlock(struct inode *inode, 2214 2110 int ex) 2215 2111 { 2216 - int level = ex ? LKM_EXMODE : LKM_PRMODE; 2112 + int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR; 2217 2113 struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres; 2218 2114 struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); 2219 2115 ··· 2234 2130 int ex) 2235 2131 { 2236 2132 int status = 0; 2237 - int level = ex ? LKM_EXMODE : LKM_PRMODE; 2133 + int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR; 2238 2134 struct ocfs2_lock_res *lockres = &osb->osb_super_lockres; 2239 - struct buffer_head *bh; 2240 - struct ocfs2_slot_info *si = osb->slot_info; 2241 2135 2242 2136 mlog_entry_void(); 2243 2137 ··· 2261 2159 goto bail; 2262 2160 } 2263 2161 if (status) { 2264 - bh = si->si_bh; 2265 - status = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0, 2266 - si->si_inode); 2267 - if (status == 0) 2268 - ocfs2_update_slot_info(si); 2162 + status = ocfs2_refresh_slot_info(osb); 2269 2163 2270 2164 ocfs2_complete_lock_res_refresh(lockres, status); 2271 2165 ··· 2276 2178 void ocfs2_super_unlock(struct ocfs2_super *osb, 2277 2179 int ex) 2278 2180 { 2279 - int level = ex ? LKM_EXMODE : LKM_PRMODE; 2181 + int level = ex ? 
DLM_LOCK_EX : DLM_LOCK_PR; 2280 2182 struct ocfs2_lock_res *lockres = &osb->osb_super_lockres; 2281 2183 2282 2184 if (!ocfs2_mount_local(osb)) ··· 2294 2196 if (ocfs2_mount_local(osb)) 2295 2197 return 0; 2296 2198 2297 - status = ocfs2_cluster_lock(osb, lockres, LKM_EXMODE, 0, 0); 2199 + status = ocfs2_cluster_lock(osb, lockres, DLM_LOCK_EX, 0, 0); 2298 2200 if (status < 0) 2299 2201 mlog_errno(status); 2300 2202 ··· 2306 2208 struct ocfs2_lock_res *lockres = &osb->osb_rename_lockres; 2307 2209 2308 2210 if (!ocfs2_mount_local(osb)) 2309 - ocfs2_cluster_unlock(osb, lockres, LKM_EXMODE); 2211 + ocfs2_cluster_unlock(osb, lockres, DLM_LOCK_EX); 2310 2212 } 2311 2213 2312 2214 int ocfs2_dentry_lock(struct dentry *dentry, int ex) 2313 2215 { 2314 2216 int ret; 2315 - int level = ex ? LKM_EXMODE : LKM_PRMODE; 2217 + int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR; 2316 2218 struct ocfs2_dentry_lock *dl = dentry->d_fsdata; 2317 2219 struct ocfs2_super *osb = OCFS2_SB(dentry->d_sb); 2318 2220 ··· 2333 2235 2334 2236 void ocfs2_dentry_unlock(struct dentry *dentry, int ex) 2335 2237 { 2336 - int level = ex ? LKM_EXMODE : LKM_PRMODE; 2238 + int level = ex ? 
DLM_LOCK_EX : DLM_LOCK_PR; 2337 2239 struct ocfs2_dentry_lock *dl = dentry->d_fsdata; 2338 2240 struct ocfs2_super *osb = OCFS2_SB(dentry->d_sb); 2339 2241 ··· 2498 2400 lockres->l_blocking); 2499 2401 2500 2402 /* Dump the raw LVB */ 2501 - lvb = lockres->l_lksb.lvb; 2403 + lvb = ocfs2_dlm_lvb(&lockres->l_lksb); 2502 2404 for(i = 0; i < DLM_LVB_LEN; i++) 2503 2405 seq_printf(m, "0x%x\t", lvb[i]); 2504 2406 ··· 2602 2504 int ocfs2_dlm_init(struct ocfs2_super *osb) 2603 2505 { 2604 2506 int status = 0; 2605 - u32 dlm_key; 2606 - struct dlm_ctxt *dlm = NULL; 2507 + struct ocfs2_cluster_connection *conn = NULL; 2607 2508 2608 2509 mlog_entry_void(); 2609 2510 2610 - if (ocfs2_mount_local(osb)) 2511 + if (ocfs2_mount_local(osb)) { 2512 + osb->node_num = 0; 2611 2513 goto local; 2514 + } 2612 2515 2613 2516 status = ocfs2_dlm_init_debug(osb); 2614 2517 if (status < 0) { ··· 2626 2527 goto bail; 2627 2528 } 2628 2529 2629 - /* used by the dlm code to make message headers unique, each 2630 - * node in this domain must agree on this. 
*/ 2631 - dlm_key = crc32_le(0, osb->uuid_str, strlen(osb->uuid_str)); 2632 - 2633 2530 /* for now, uuid == domain */ 2634 - dlm = dlm_register_domain(osb->uuid_str, dlm_key, 2635 - &osb->osb_locking_proto); 2636 - if (IS_ERR(dlm)) { 2637 - status = PTR_ERR(dlm); 2531 + status = ocfs2_cluster_connect(osb->osb_cluster_stack, 2532 + osb->uuid_str, 2533 + strlen(osb->uuid_str), 2534 + ocfs2_do_node_down, osb, 2535 + &conn); 2536 + if (status) { 2638 2537 mlog_errno(status); 2639 2538 goto bail; 2640 2539 } 2641 2540 2642 - dlm_register_eviction_cb(dlm, &osb->osb_eviction_cb); 2541 + status = ocfs2_cluster_this_node(&osb->node_num); 2542 + if (status < 0) { 2543 + mlog_errno(status); 2544 + mlog(ML_ERROR, 2545 + "could not find this host's node number\n"); 2546 + ocfs2_cluster_disconnect(conn, 0); 2547 + goto bail; 2548 + } 2643 2549 2644 2550 local: 2645 2551 ocfs2_super_lock_res_init(&osb->osb_super_lockres, osb); 2646 2552 ocfs2_rename_lock_res_init(&osb->osb_rename_lockres, osb); 2647 2553 2648 - osb->dlm = dlm; 2554 + osb->cconn = conn; 2649 2555 2650 2556 status = 0; 2651 2557 bail: ··· 2664 2560 return status; 2665 2561 } 2666 2562 2667 - void ocfs2_dlm_shutdown(struct ocfs2_super *osb) 2563 + void ocfs2_dlm_shutdown(struct ocfs2_super *osb, 2564 + int hangup_pending) 2668 2565 { 2669 2566 mlog_entry_void(); 2670 2567 2671 - dlm_unregister_eviction_cb(&osb->osb_eviction_cb); 2672 - 2673 2568 ocfs2_drop_osb_locks(osb); 2569 + 2570 + /* 2571 + * Now that we have dropped all locks and ocfs2_dismount_volume() 2572 + * has disabled recovery, the DLM won't be talking to us. It's 2573 + * safe to tear things down before disconnecting the cluster. 
2574 + */ 2674 2575 2675 2576 if (osb->dc_task) { 2676 2577 kthread_stop(osb->dc_task); ··· 2685 2576 ocfs2_lock_res_free(&osb->osb_super_lockres); 2686 2577 ocfs2_lock_res_free(&osb->osb_rename_lockres); 2687 2578 2688 - dlm_unregister_domain(osb->dlm); 2689 - osb->dlm = NULL; 2579 + ocfs2_cluster_disconnect(osb->cconn, hangup_pending); 2580 + osb->cconn = NULL; 2690 2581 2691 2582 ocfs2_dlm_shutdown_debug(osb); 2692 2583 2693 2584 mlog_exit_void(); 2694 2585 } 2695 2586 2696 - static void ocfs2_unlock_ast(void *opaque, enum dlm_status status) 2587 + static void ocfs2_unlock_ast(void *opaque, int error) 2697 2588 { 2698 2589 struct ocfs2_lock_res *lockres = opaque; 2699 2590 unsigned long flags; ··· 2704 2595 lockres->l_unlock_action); 2705 2596 2706 2597 spin_lock_irqsave(&lockres->l_lock, flags); 2707 - /* We tried to cancel a convert request, but it was already 2708 - * granted. All we want to do here is clear our unlock 2709 - * state. The wake_up call done at the bottom is redundant 2710 - * (ocfs2_prepare_cancel_convert doesn't sleep on this) but doesn't 2711 - * hurt anything anyway */ 2712 - if (status == DLM_CANCELGRANT && 2713 - lockres->l_unlock_action == OCFS2_UNLOCK_CANCEL_CONVERT) { 2714 - mlog(0, "Got cancelgrant for %s\n", lockres->l_name); 2715 - 2716 - /* We don't clear the busy flag in this case as it 2717 - * should have been cleared by the ast which the dlm 2718 - * has called. 
*/ 2719 - goto complete_unlock; 2720 - } 2721 - 2722 - if (status != DLM_NORMAL) { 2723 - mlog(ML_ERROR, "Dlm passes status %d for lock %s, " 2724 - "unlock_action %d\n", status, lockres->l_name, 2598 + if (error) { 2599 + mlog(ML_ERROR, "Dlm passes error %d for lock %s, " 2600 + "unlock_action %d\n", error, lockres->l_name, 2725 2601 lockres->l_unlock_action); 2726 2602 spin_unlock_irqrestore(&lockres->l_lock, flags); 2727 2603 return; ··· 2718 2624 lockres->l_action = OCFS2_AST_INVALID; 2719 2625 break; 2720 2626 case OCFS2_UNLOCK_DROP_LOCK: 2721 - lockres->l_level = LKM_IVMODE; 2627 + lockres->l_level = DLM_LOCK_IV; 2722 2628 break; 2723 2629 default: 2724 2630 BUG(); 2725 2631 } 2726 2632 2727 2633 lockres_clear_flags(lockres, OCFS2_LOCK_BUSY); 2728 - complete_unlock: 2729 2634 lockres->l_unlock_action = OCFS2_UNLOCK_INVALID; 2730 2635 spin_unlock_irqrestore(&lockres->l_lock, flags); 2731 2636 ··· 2736 2643 static int ocfs2_drop_lock(struct ocfs2_super *osb, 2737 2644 struct ocfs2_lock_res *lockres) 2738 2645 { 2739 - enum dlm_status status; 2646 + int ret; 2740 2647 unsigned long flags; 2741 - int lkm_flags = 0; 2648 + u32 lkm_flags = 0; 2742 2649 2743 2650 /* We didn't get anywhere near actually using this lockres. 
*/ 2744 2651 if (!(lockres->l_flags & OCFS2_LOCK_INITIALIZED)) 2745 2652 goto out; 2746 2653 2747 2654 if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB) 2748 - lkm_flags |= LKM_VALBLK; 2655 + lkm_flags |= DLM_LKF_VALBLK; 2749 2656 2750 2657 spin_lock_irqsave(&lockres->l_lock, flags); 2751 2658 ··· 2771 2678 2772 2679 if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB) { 2773 2680 if (lockres->l_flags & OCFS2_LOCK_ATTACHED && 2774 - lockres->l_level == LKM_EXMODE && 2681 + lockres->l_level == DLM_LOCK_EX && 2775 2682 !(lockres->l_flags & OCFS2_LOCK_NEEDS_REFRESH)) 2776 2683 lockres->l_ops->set_lvb(lockres); 2777 2684 } ··· 2800 2707 2801 2708 mlog(0, "lock %s\n", lockres->l_name); 2802 2709 2803 - status = dlmunlock(osb->dlm, &lockres->l_lksb, lkm_flags, 2804 - ocfs2_unlock_ast, lockres); 2805 - if (status != DLM_NORMAL) { 2806 - ocfs2_log_dlm_error("dlmunlock", status, lockres); 2710 + ret = ocfs2_dlm_unlock(osb->cconn, &lockres->l_lksb, lkm_flags, 2711 + lockres); 2712 + if (ret) { 2713 + ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres); 2807 2714 mlog(ML_ERROR, "lockres flags: %lu\n", lockres->l_flags); 2808 - dlm_print_one_lock(lockres->l_lksb.lockid); 2715 + ocfs2_dlm_dump_lksb(&lockres->l_lksb); 2809 2716 BUG(); 2810 2717 } 2811 - mlog(0, "lock %s, successfull return from dlmunlock\n", 2718 + mlog(0, "lock %s, successfull return from ocfs2_dlm_unlock\n", 2812 2719 lockres->l_name); 2813 2720 2814 2721 ocfs2_wait_on_busy_lock(lockres); ··· 2899 2806 return status; 2900 2807 } 2901 2808 2902 - static void ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres, 2903 - int new_level) 2809 + static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres, 2810 + int new_level) 2904 2811 { 2905 2812 assert_spin_locked(&lockres->l_lock); 2906 2813 2907 - BUG_ON(lockres->l_blocking <= LKM_NLMODE); 2814 + BUG_ON(lockres->l_blocking <= DLM_LOCK_NL); 2908 2815 2909 2816 if (lockres->l_level <= new_level) { 2910 - mlog(ML_ERROR, "lockres->l_level (%u) <= 
new_level (%u)\n", 2817 + mlog(ML_ERROR, "lockres->l_level (%d) <= new_level (%d)\n", 2911 2818 lockres->l_level, new_level); 2912 2819 BUG(); 2913 2820 } ··· 2918 2825 lockres->l_action = OCFS2_AST_DOWNCONVERT; 2919 2826 lockres->l_requested = new_level; 2920 2827 lockres_or_flags(lockres, OCFS2_LOCK_BUSY); 2828 + return lockres_set_pending(lockres); 2921 2829 } 2922 2830 2923 2831 static int ocfs2_downconvert_lock(struct ocfs2_super *osb, 2924 2832 struct ocfs2_lock_res *lockres, 2925 2833 int new_level, 2926 - int lvb) 2834 + int lvb, 2835 + unsigned int generation) 2927 2836 { 2928 - int ret, dlm_flags = LKM_CONVERT; 2929 - enum dlm_status status; 2837 + int ret; 2838 + u32 dlm_flags = DLM_LKF_CONVERT; 2930 2839 2931 2840 mlog_entry_void(); 2932 2841 2933 2842 if (lvb) 2934 - dlm_flags |= LKM_VALBLK; 2843 + dlm_flags |= DLM_LKF_VALBLK; 2935 2844 2936 - status = dlmlock(osb->dlm, 2937 - new_level, 2938 - &lockres->l_lksb, 2939 - dlm_flags, 2940 - lockres->l_name, 2941 - OCFS2_LOCK_ID_MAX_LEN - 1, 2942 - ocfs2_locking_ast, 2943 - lockres, 2944 - ocfs2_blocking_ast); 2945 - if (status != DLM_NORMAL) { 2946 - ocfs2_log_dlm_error("dlmlock", status, lockres); 2947 - ret = -EINVAL; 2845 + ret = ocfs2_dlm_lock(osb->cconn, 2846 + new_level, 2847 + &lockres->l_lksb, 2848 + dlm_flags, 2849 + lockres->l_name, 2850 + OCFS2_LOCK_ID_MAX_LEN - 1, 2851 + lockres); 2852 + lockres_clear_pending(lockres, generation, osb); 2853 + if (ret) { 2854 + ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres); 2948 2855 ocfs2_recover_from_dlm_error(lockres, 1); 2949 2856 goto bail; 2950 2857 } ··· 2955 2862 return ret; 2956 2863 } 2957 2864 2958 - /* returns 1 when the caller should unlock and call dlmunlock */ 2865 + /* returns 1 when the caller should unlock and call ocfs2_dlm_unlock */ 2959 2866 static int ocfs2_prepare_cancel_convert(struct ocfs2_super *osb, 2960 2867 struct ocfs2_lock_res *lockres) 2961 2868 { ··· 2991 2898 struct ocfs2_lock_res *lockres) 2992 2899 { 2993 2900 int ret; 
2994 - enum dlm_status status; 2995 2901 2996 2902 mlog_entry_void(); 2997 2903 mlog(0, "lock %s\n", lockres->l_name); 2998 2904 2999 - ret = 0; 3000 - status = dlmunlock(osb->dlm, 3001 - &lockres->l_lksb, 3002 - LKM_CANCEL, 3003 - ocfs2_unlock_ast, 3004 - lockres); 3005 - if (status != DLM_NORMAL) { 3006 - ocfs2_log_dlm_error("dlmunlock", status, lockres); 3007 - ret = -EINVAL; 2905 + ret = ocfs2_dlm_unlock(osb->cconn, &lockres->l_lksb, 2906 + DLM_LKF_CANCEL, lockres); 2907 + if (ret) { 2908 + ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres); 3008 2909 ocfs2_recover_from_dlm_error(lockres, 0); 3009 2910 } 3010 2911 3011 - mlog(0, "lock %s return from dlmunlock\n", lockres->l_name); 2912 + mlog(0, "lock %s return from ocfs2_dlm_unlock\n", lockres->l_name); 3012 2913 3013 2914 mlog_exit(ret); 3014 2915 return ret; ··· 3017 2930 int new_level; 3018 2931 int ret = 0; 3019 2932 int set_lvb = 0; 2933 + unsigned int gen; 3020 2934 3021 2935 mlog_entry_void(); 3022 2936 ··· 3027 2939 3028 2940 recheck: 3029 2941 if (lockres->l_flags & OCFS2_LOCK_BUSY) { 2942 + /* XXX 2943 + * This is a *big* race. The OCFS2_LOCK_PENDING flag 2944 + * exists entirely for one reason - another thread has set 2945 + * OCFS2_LOCK_BUSY, but has *NOT* yet called dlm_lock(). 2946 + * 2947 + * If we do ocfs2_cancel_convert() before the other thread 2948 + * calls dlm_lock(), our cancel will do nothing. We will 2949 + * get no ast, and we will have no way of knowing the 2950 + * cancel failed. Meanwhile, the other thread will call 2951 + * into dlm_lock() and wait...forever. 2952 + * 2953 + * Why forever? Because another node has asked for the 2954 + * lock first; that's why we're here in unblock_lock(). 2955 + * 2956 + * The solution is OCFS2_LOCK_PENDING. When PENDING is 2957 + * set, we just requeue the unblock. Only when the other 2958 + * thread has called dlm_lock() and cleared PENDING will 2959 + * we then cancel their request. 
2960 + * 2961 + * All callers of dlm_lock() must set OCFS2_DLM_PENDING 2962 + * at the same time they set OCFS2_DLM_BUSY. They must 2963 + * clear OCFS2_DLM_PENDING after dlm_lock() returns. 2964 + */ 2965 + if (lockres->l_flags & OCFS2_LOCK_PENDING) 2966 + goto leave_requeue; 2967 + 3030 2968 ctl->requeue = 1; 3031 2969 ret = ocfs2_prepare_cancel_convert(osb, lockres); 3032 2970 spin_unlock_irqrestore(&lockres->l_lock, flags); ··· 3066 2952 3067 2953 /* if we're blocking an exclusive and we have *any* holders, 3068 2954 * then requeue. */ 3069 - if ((lockres->l_blocking == LKM_EXMODE) 2955 + if ((lockres->l_blocking == DLM_LOCK_EX) 3070 2956 && (lockres->l_ex_holders || lockres->l_ro_holders)) 3071 2957 goto leave_requeue; 3072 2958 3073 2959 /* If it's a PR we're blocking, then only 3074 2960 * requeue if we've got any EX holders */ 3075 - if (lockres->l_blocking == LKM_PRMODE && 2961 + if (lockres->l_blocking == DLM_LOCK_PR && 3076 2962 lockres->l_ex_holders) 3077 2963 goto leave_requeue; 3078 2964 ··· 3119 3005 ctl->requeue = 0; 3120 3006 3121 3007 if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB) { 3122 - if (lockres->l_level == LKM_EXMODE) 3008 + if (lockres->l_level == DLM_LOCK_EX) 3123 3009 set_lvb = 1; 3124 3010 3125 3011 /* ··· 3132 3018 lockres->l_ops->set_lvb(lockres); 3133 3019 } 3134 3020 3135 - ocfs2_prepare_downconvert(lockres, new_level); 3021 + gen = ocfs2_prepare_downconvert(lockres, new_level); 3136 3022 spin_unlock_irqrestore(&lockres->l_lock, flags); 3137 - ret = ocfs2_downconvert_lock(osb, lockres, new_level, set_lvb); 3023 + ret = ocfs2_downconvert_lock(osb, lockres, new_level, set_lvb, 3024 + gen); 3025 + 3138 3026 leave: 3139 3027 mlog_exit(ret); 3140 3028 return ret; ··· 3175 3059 (unsigned long long)OCFS2_I(inode)->ip_blkno); 3176 3060 } 3177 3061 sync_mapping_buffers(mapping); 3178 - if (blocking == LKM_EXMODE) { 3062 + if (blocking == DLM_LOCK_EX) { 3179 3063 truncate_inode_pages(mapping, 0); 3180 3064 } else { 3181 3065 /* We only need 
to wait on the I/O if we're not also ··· 3196 3080 struct inode *inode = ocfs2_lock_res_inode(lockres); 3197 3081 int checkpointed = ocfs2_inode_fully_checkpointed(inode); 3198 3082 3199 - BUG_ON(new_level != LKM_NLMODE && new_level != LKM_PRMODE); 3200 - BUG_ON(lockres->l_level != LKM_EXMODE && !checkpointed); 3083 + BUG_ON(new_level != DLM_LOCK_NL && new_level != DLM_LOCK_PR); 3084 + BUG_ON(lockres->l_level != DLM_LOCK_EX && !checkpointed); 3201 3085 3202 3086 if (checkpointed) 3203 3087 return 1; ··· 3261 3145 * valid. The downconvert code will retain a PR for this node, 3262 3146 * so there's no further work to do. 3263 3147 */ 3264 - if (blocking == LKM_PRMODE) 3148 + if (blocking == DLM_LOCK_PR) 3265 3149 return UNBLOCK_CONTINUE; 3266 3150 3267 3151 /* ··· 3334 3218 3335 3219 return UNBLOCK_CONTINUE_POST; 3336 3220 } 3221 + 3222 + /* 3223 + * This is the filesystem locking protocol. It provides the lock handling 3224 + * hooks for the underlying DLM. It has a maximum version number. 3225 + * The version number allows interoperability with systems running at 3226 + * the same major number and an equal or smaller minor number. 3227 + * 3228 + * Whenever the filesystem does new things with locks (adds or removes a 3229 + * lock, orders them differently, does different things underneath a lock), 3230 + * the version must be changed. The protocol is negotiated when joining 3231 + * the dlm domain. A node may join the domain if its major version is 3232 + * identical to all other nodes and its minor version is greater than 3233 + * or equal to all other nodes. When its minor version is greater than 3234 + * the other nodes, it will run at the minor version specified by the 3235 + * other nodes. 3236 + * 3237 + * If a locking change is made that will not be compatible with older 3238 + * versions, the major number must be increased and the minor version set 3239 + * to zero. 
If a change merely adds a behavior that can be disabled when 3240 + * speaking to older versions, the minor version must be increased. If a 3241 + * change adds a fully backwards compatible change (eg, LVB changes that 3242 + * are just ignored by older versions), the version does not need to be 3243 + * updated. 3244 + */ 3245 + static struct ocfs2_locking_protocol lproto = { 3246 + .lp_max_version = { 3247 + .pv_major = OCFS2_LOCKING_PROTOCOL_MAJOR, 3248 + .pv_minor = OCFS2_LOCKING_PROTOCOL_MINOR, 3249 + }, 3250 + .lp_lock_ast = ocfs2_locking_ast, 3251 + .lp_blocking_ast = ocfs2_blocking_ast, 3252 + .lp_unlock_ast = ocfs2_unlock_ast, 3253 + }; 3254 + 3255 + void ocfs2_set_locking_protocol(void) 3256 + { 3257 + ocfs2_stack_glue_set_locking_protocol(&lproto); 3258 + } 3259 + 3337 3260 3338 3261 static void ocfs2_process_blocked_lock(struct ocfs2_super *osb, 3339 3262 struct ocfs2_lock_res *lockres)
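The version-negotiation rule described in the comment above (majors must match; a joiner's minor must be at least the domain's, and it then runs at the domain's minor) can be sketched in user-space C. This is an illustrative model only; `proto_version`, `proto_negotiate`, and the field names are hypothetical, not the kernel's `ocfs2_protocol_version` API.

```c
#include <assert.h>
#include <stdbool.h>

struct proto_version {
    unsigned char major;
    unsigned char minor;
};

/* Hypothetical helper mirroring the join rule in the comment: a node
 * may join only if its major matches the domain's and its maximum
 * minor is >= the domain's minor; it then runs at the domain's
 * (equal or smaller) minor version. */
static bool proto_negotiate(const struct proto_version *mine,
                            const struct proto_version *domain,
                            unsigned char *running_minor)
{
    if (mine->major != domain->major)
        return false;          /* incompatible major: cannot join */
    if (mine->minor < domain->minor)
        return false;          /* domain speaks newer minor than we do */
    *running_minor = domain->minor; /* run at the lower minor */
    return true;
}
```

A node advertising 1.3 joining a 1.1 domain would run at minor 1; a 1.0 node or a 2.x node would be refused, matching "the major number must be increased and the minor version set to zero" for incompatible changes.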
+3 -2
fs/ocfs2/dlmglue.h
··· 58 58 #define OCFS2_LOCK_NONBLOCK (0x04) 59 59 60 60 int ocfs2_dlm_init(struct ocfs2_super *osb); 61 - void ocfs2_dlm_shutdown(struct ocfs2_super *osb); 61 + void ocfs2_dlm_shutdown(struct ocfs2_super *osb, int hangup_pending); 62 62 void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res); 63 63 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, 64 64 enum ocfs2_lock_type type, ··· 114 114 struct ocfs2_dlm_debug *ocfs2_new_dlm_debug(void); 115 115 void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug *dlm_debug); 116 116 117 - extern const struct dlm_protocol_version ocfs2_locking_protocol; 117 + /* To set the locking protocol on module initialization */ 118 + void ocfs2_set_locking_protocol(void); 118 119 #endif /* DLMGLUE_H */
+2 -2
fs/ocfs2/file.c
··· 2242 2242 .open = ocfs2_file_open, 2243 2243 .aio_read = ocfs2_file_aio_read, 2244 2244 .aio_write = ocfs2_file_aio_write, 2245 - .ioctl = ocfs2_ioctl, 2245 + .unlocked_ioctl = ocfs2_ioctl, 2246 2246 #ifdef CONFIG_COMPAT 2247 2247 .compat_ioctl = ocfs2_compat_ioctl, 2248 2248 #endif ··· 2258 2258 .fsync = ocfs2_sync_file, 2259 2259 .release = ocfs2_dir_release, 2260 2260 .open = ocfs2_dir_open, 2261 - .ioctl = ocfs2_ioctl, 2261 + .unlocked_ioctl = ocfs2_ioctl, 2262 2262 #ifdef CONFIG_COMPAT 2263 2263 .compat_ioctl = ocfs2_compat_ioctl, 2264 2264 #endif
+8 -176
fs/ocfs2/heartbeat.c
··· 28 28 #include <linux/types.h> 29 29 #include <linux/slab.h> 30 30 #include <linux/highmem.h> 31 - #include <linux/kmod.h> 32 - 33 - #include <dlm/dlmapi.h> 34 31 35 32 #define MLOG_MASK_PREFIX ML_SUPER 36 33 #include <cluster/masklog.h> ··· 45 48 int bit); 46 49 static inline void __ocfs2_node_map_clear_bit(struct ocfs2_node_map *map, 47 50 int bit); 48 - static inline int __ocfs2_node_map_is_empty(struct ocfs2_node_map *map); 49 51 50 52 /* special case -1 for now 51 53 * TODO: should *really* make sure the calling func never passes -1!! */ ··· 58 62 void ocfs2_init_node_maps(struct ocfs2_super *osb) 59 63 { 60 64 spin_lock_init(&osb->node_map_lock); 61 - ocfs2_node_map_init(&osb->recovery_map); 62 65 ocfs2_node_map_init(&osb->osb_recovering_orphan_dirs); 63 66 } 64 67 65 - static void ocfs2_do_node_down(int node_num, 66 - struct ocfs2_super *osb) 68 + void ocfs2_do_node_down(int node_num, void *data) 67 69 { 70 + struct ocfs2_super *osb = data; 71 + 68 72 BUG_ON(osb->node_num == node_num); 69 73 70 74 mlog(0, "ocfs2: node down event for %d\n", node_num); 71 75 72 - if (!osb->dlm) { 76 + if (!osb->cconn) { 73 77 /* 74 - * No DLM means we're not even ready to participate yet. 75 - * We check the slots after the DLM comes up, so we will 76 - * notice the node death then. We can safely ignore it 77 - * here. 78 + * No cluster connection means we're not even ready to 79 + * participate yet. We check the slots after the cluster 80 + * comes up, so we will notice the node death then. We 81 + * can safely ignore it here. 78 82 */ 79 83 return; 80 84 } 81 85 82 86 ocfs2_recovery_thread(osb, node_num); 83 - } 84 - 85 - /* Called from the dlm when it's about to evict a node. We may also 86 - * get a heartbeat callback later. 
*/ 87 - static void ocfs2_dlm_eviction_cb(int node_num, 88 - void *data) 89 - { 90 - struct ocfs2_super *osb = (struct ocfs2_super *) data; 91 - struct super_block *sb = osb->sb; 92 - 93 - mlog(ML_NOTICE, "device (%u,%u): dlm has evicted node %d\n", 94 - MAJOR(sb->s_dev), MINOR(sb->s_dev), node_num); 95 - 96 - ocfs2_do_node_down(node_num, osb); 97 - } 98 - 99 - void ocfs2_setup_hb_callbacks(struct ocfs2_super *osb) 100 - { 101 - /* Not exactly a heartbeat callback, but leads to essentially 102 - * the same path so we set it up here. */ 103 - dlm_setup_eviction_cb(&osb->osb_eviction_cb, 104 - ocfs2_dlm_eviction_cb, 105 - osb); 106 - } 107 - 108 - void ocfs2_stop_heartbeat(struct ocfs2_super *osb) 109 - { 110 - int ret; 111 - char *argv[5], *envp[3]; 112 - 113 - if (ocfs2_mount_local(osb)) 114 - return; 115 - 116 - if (!osb->uuid_str) { 117 - /* This can happen if we don't get far enough in mount... */ 118 - mlog(0, "No UUID with which to stop heartbeat!\n\n"); 119 - return; 120 - } 121 - 122 - argv[0] = (char *)o2nm_get_hb_ctl_path(); 123 - argv[1] = "-K"; 124 - argv[2] = "-u"; 125 - argv[3] = osb->uuid_str; 126 - argv[4] = NULL; 127 - 128 - mlog(0, "Run: %s %s %s %s\n", argv[0], argv[1], argv[2], argv[3]); 129 - 130 - /* minimal command environment taken from cpu_run_sbin_hotplug */ 131 - envp[0] = "HOME=/"; 132 - envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin"; 133 - envp[2] = NULL; 134 - 135 - ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC); 136 - if (ret < 0) 137 - mlog_errno(ret); 138 87 } 139 88 140 89 static inline void __ocfs2_node_map_set_bit(struct ocfs2_node_map *map, ··· 133 192 return ret; 134 193 } 135 194 136 - static inline int __ocfs2_node_map_is_empty(struct ocfs2_node_map *map) 137 - { 138 - int bit; 139 - bit = find_next_bit(map->map, map->num_nodes, 0); 140 - if (bit < map->num_nodes) 141 - return 0; 142 - return 1; 143 - } 144 - 145 - int ocfs2_node_map_is_empty(struct ocfs2_super *osb, 146 - struct ocfs2_node_map *map) 147 - { 148 
- int ret; 149 - BUG_ON(map->num_nodes == 0); 150 - spin_lock(&osb->node_map_lock); 151 - ret = __ocfs2_node_map_is_empty(map); 152 - spin_unlock(&osb->node_map_lock); 153 - return ret; 154 - } 155 - 156 - #if 0 157 - 158 - static void __ocfs2_node_map_dup(struct ocfs2_node_map *target, 159 - struct ocfs2_node_map *from) 160 - { 161 - BUG_ON(from->num_nodes == 0); 162 - ocfs2_node_map_init(target); 163 - __ocfs2_node_map_set(target, from); 164 - } 165 - 166 - /* returns 1 if bit is the only bit set in target, 0 otherwise */ 167 - int ocfs2_node_map_is_only(struct ocfs2_super *osb, 168 - struct ocfs2_node_map *target, 169 - int bit) 170 - { 171 - struct ocfs2_node_map temp; 172 - int ret; 173 - 174 - spin_lock(&osb->node_map_lock); 175 - __ocfs2_node_map_dup(&temp, target); 176 - __ocfs2_node_map_clear_bit(&temp, bit); 177 - ret = __ocfs2_node_map_is_empty(&temp); 178 - spin_unlock(&osb->node_map_lock); 179 - 180 - return ret; 181 - } 182 - 183 - static void __ocfs2_node_map_set(struct ocfs2_node_map *target, 184 - struct ocfs2_node_map *from) 185 - { 186 - int num_longs, i; 187 - 188 - BUG_ON(target->num_nodes != from->num_nodes); 189 - BUG_ON(target->num_nodes == 0); 190 - 191 - num_longs = BITS_TO_LONGS(target->num_nodes); 192 - for (i = 0; i < num_longs; i++) 193 - target->map[i] = from->map[i]; 194 - } 195 - 196 - #endif /* 0 */ 197 - 198 - /* Returns whether the recovery bit was actually set - it may not be 199 - * if a node is still marked as needing recovery */ 200 - int ocfs2_recovery_map_set(struct ocfs2_super *osb, 201 - int num) 202 - { 203 - int set = 0; 204 - 205 - spin_lock(&osb->node_map_lock); 206 - 207 - if (!test_bit(num, osb->recovery_map.map)) { 208 - __ocfs2_node_map_set_bit(&osb->recovery_map, num); 209 - set = 1; 210 - } 211 - 212 - spin_unlock(&osb->node_map_lock); 213 - 214 - return set; 215 - } 216 - 217 - void ocfs2_recovery_map_clear(struct ocfs2_super *osb, 218 - int num) 219 - { 220 - ocfs2_node_map_clear_bit(osb, &osb->recovery_map, 
num); 221 - } 222 - 223 - int ocfs2_node_map_iterate(struct ocfs2_super *osb, 224 - struct ocfs2_node_map *map, 225 - int idx) 226 - { 227 - int i = idx; 228 - 229 - idx = O2NM_INVALID_NODE_NUM; 230 - spin_lock(&osb->node_map_lock); 231 - if ((i != O2NM_INVALID_NODE_NUM) && 232 - (i >= 0) && 233 - (i < map->num_nodes)) { 234 - while(i < map->num_nodes) { 235 - if (test_bit(i, map->map)) { 236 - idx = i; 237 - break; 238 - } 239 - i++; 240 - } 241 - } 242 - spin_unlock(&osb->node_map_lock); 243 - return idx; 244 - }
+1 -16
fs/ocfs2/heartbeat.h
··· 28 28 29 29 void ocfs2_init_node_maps(struct ocfs2_super *osb); 30 30 31 - void ocfs2_setup_hb_callbacks(struct ocfs2_super *osb); 32 - void ocfs2_stop_heartbeat(struct ocfs2_super *osb); 31 + void ocfs2_do_node_down(int node_num, void *data); 33 32 34 33 /* node map functions - used to keep track of mounted and in-recovery 35 34 * nodes. */ 36 - int ocfs2_node_map_is_empty(struct ocfs2_super *osb, 37 - struct ocfs2_node_map *map); 38 35 void ocfs2_node_map_set_bit(struct ocfs2_super *osb, 39 36 struct ocfs2_node_map *map, 40 37 int bit); ··· 41 44 int ocfs2_node_map_test_bit(struct ocfs2_super *osb, 42 45 struct ocfs2_node_map *map, 43 46 int bit); 44 - int ocfs2_node_map_iterate(struct ocfs2_super *osb, 45 - struct ocfs2_node_map *map, 46 - int idx); 47 - static inline int ocfs2_node_map_first_set_bit(struct ocfs2_super *osb, 48 - struct ocfs2_node_map *map) 49 - { 50 - return ocfs2_node_map_iterate(osb, map, 0); 51 - } 52 - int ocfs2_recovery_map_set(struct ocfs2_super *osb, 53 - int num); 54 - void ocfs2_recovery_map_clear(struct ocfs2_super *osb, 55 - int num); 56 47 57 48 #endif /* OCFS2_HEARTBEAT_H */
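The heartbeat change above replaces the o2dlm-specific eviction callback with `ocfs2_do_node_down(int node_num, void *data)`, registered through the generic cluster connection. The shape of that interface, an opaque `void *` context so one callback signature serves any cluster stack, can be sketched as below; `cluster_conn`, `conn_register`, and `conn_fire_node_down` are illustrative stand-ins, not the kernel's `ocfs2_cluster_connect()` API.

```c
#include <stddef.h>

/* Illustrative sketch (not kernel code): the stack glue stores a
 * node-down callback plus an opaque context pointer, so the
 * filesystem's state travels as void *data rather than as a
 * stack-specific type. */
typedef void (*node_down_fn)(int node_num, void *data);

struct cluster_conn {
    node_down_fn on_node_down;
    void *data;
};

static void conn_register(struct cluster_conn *conn, node_down_fn fn,
                          void *data)
{
    conn->on_node_down = fn;
    conn->data = data;
}

/* Called by the (hypothetical) stack when a peer dies. */
static void conn_fire_node_down(struct cluster_conn *conn, int node)
{
    if (conn->on_node_down)
        conn->on_node_down(node, conn->data);
}

/* Demo consumer: records the dead node into caller-owned storage,
 * the way ocfs2_do_node_down() recovers osb from its data pointer. */
static void record_node_down(int node_num, void *data)
{
    *(int *)data = node_num;
}
```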
+4 -9
fs/ocfs2/ioctl.c
··· 7 7 8 8 #include <linux/fs.h> 9 9 #include <linux/mount.h> 10 + #include <linux/smp_lock.h> 10 11 11 12 #define MLOG_MASK_PREFIX ML_INODE 12 13 #include <cluster/masklog.h> ··· 113 112 return status; 114 113 } 115 114 116 - int ocfs2_ioctl(struct inode * inode, struct file * filp, 117 - unsigned int cmd, unsigned long arg) 115 + long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) 118 116 { 117 + struct inode *inode = filp->f_path.dentry->d_inode; 119 118 unsigned int flags; 120 119 int new_clusters; 121 120 int status; ··· 169 168 #ifdef CONFIG_COMPAT 170 169 long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg) 171 170 { 172 - struct inode *inode = file->f_path.dentry->d_inode; 173 - int ret; 174 - 175 171 switch (cmd) { 176 172 case OCFS2_IOC32_GETFLAGS: 177 173 cmd = OCFS2_IOC_GETFLAGS; ··· 188 190 return -ENOIOCTLCMD; 189 191 } 190 192 191 - lock_kernel(); 192 - ret = ocfs2_ioctl(inode, file, cmd, arg); 193 - unlock_kernel(); 194 - return ret; 193 + return ocfs2_ioctl(file, cmd, arg); 195 194 } 196 195 #endif
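The ioctl.c hunk above converts the handler from the locked `.ioctl` signature to `.unlocked_ioctl`: the inode parameter disappears (it is derived from the file), the return type widens to `long`, and the `lock_kernel()`/`unlock_kernel()` pair around the compat path is dropped. A user-space sketch of that shape, with the kernel types stubbed and the command numbers purely illustrative:

```c
/* Stubbed stand-ins for kernel types; names are illustrative. */
struct inode { unsigned long i_flags; };
struct file  { struct inode *f_inode; };

/* unlocked_ioctl-style handler: file only, inode derived, long return.
 * The real handler must do its own locking since no BKL is held. */
static long demo_unlocked_ioctl(struct file *filp, unsigned int cmd,
                                unsigned long arg)
{
    struct inode *inode = filp->f_inode; /* previously a parameter */

    switch (cmd) {
    case 1: /* GETFLAGS-style: report the flag word */
        return (long)inode->i_flags;
    case 2: /* SETFLAGS-style: store the argument */
        inode->i_flags = arg;
        return 0;
    default:
        return -1; /* the kernel would return -ENOTTY here */
    }
}
```

A compat wrapper can then simply translate 32-bit command numbers and call this handler directly, which is exactly why the diff deletes the `lock_kernel()` bracket in `ocfs2_compat_ioctl()`.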
+1 -2
fs/ocfs2/ioctl.h
··· 10 10 #ifndef OCFS2_IOCTL_H 11 11 #define OCFS2_IOCTL_H 12 12 13 - int ocfs2_ioctl(struct inode * inode, struct file * filp, 14 - unsigned int cmd, unsigned long arg); 13 + long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); 15 14 long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg); 16 15 17 16 #endif /* OCFS2_IOCTL_H */
+180 -31
fs/ocfs2/journal.c
··· 64 64 int slot); 65 65 static int ocfs2_commit_thread(void *arg); 66 66 67 + 68 + /* 69 + * The recovery_list is a simple linked list of node numbers to recover. 70 + * It is protected by the recovery_lock. 71 + */ 72 + 73 + struct ocfs2_recovery_map { 74 + unsigned int rm_used; 75 + unsigned int *rm_entries; 76 + }; 77 + 78 + int ocfs2_recovery_init(struct ocfs2_super *osb) 79 + { 80 + struct ocfs2_recovery_map *rm; 81 + 82 + mutex_init(&osb->recovery_lock); 83 + osb->disable_recovery = 0; 84 + osb->recovery_thread_task = NULL; 85 + init_waitqueue_head(&osb->recovery_event); 86 + 87 + rm = kzalloc(sizeof(struct ocfs2_recovery_map) + 88 + osb->max_slots * sizeof(unsigned int), 89 + GFP_KERNEL); 90 + if (!rm) { 91 + mlog_errno(-ENOMEM); 92 + return -ENOMEM; 93 + } 94 + 95 + rm->rm_entries = (unsigned int *)((char *)rm + 96 + sizeof(struct ocfs2_recovery_map)); 97 + osb->recovery_map = rm; 98 + 99 + return 0; 100 + } 101 + 102 + /* we can't grab the goofy sem lock from inside wait_event, so we use 103 + * memory barriers to make sure that we'll see the null task before 104 + * being woken up */ 105 + static int ocfs2_recovery_thread_running(struct ocfs2_super *osb) 106 + { 107 + mb(); 108 + return osb->recovery_thread_task != NULL; 109 + } 110 + 111 + void ocfs2_recovery_exit(struct ocfs2_super *osb) 112 + { 113 + struct ocfs2_recovery_map *rm; 114 + 115 + /* disable any new recovery threads and wait for any currently 116 + * running ones to exit. Do this before setting the vol_state. */ 117 + mutex_lock(&osb->recovery_lock); 118 + osb->disable_recovery = 1; 119 + mutex_unlock(&osb->recovery_lock); 120 + wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb)); 121 + 122 + /* At this point, we know that no more recovery threads can be 123 + * launched, so wait for any recovery completion work to 124 + * complete. 
*/ 125 + flush_workqueue(ocfs2_wq); 126 + 127 + /* 128 + * Now that recovery is shut down, and the osb is about to be 129 + * freed, the osb_lock is not taken here. 130 + */ 131 + rm = osb->recovery_map; 132 + /* XXX: Should we bug if there are dirty entries? */ 133 + 134 + kfree(rm); 135 + } 136 + 137 + static int __ocfs2_recovery_map_test(struct ocfs2_super *osb, 138 + unsigned int node_num) 139 + { 140 + int i; 141 + struct ocfs2_recovery_map *rm = osb->recovery_map; 142 + 143 + assert_spin_locked(&osb->osb_lock); 144 + 145 + for (i = 0; i < rm->rm_used; i++) { 146 + if (rm->rm_entries[i] == node_num) 147 + return 1; 148 + } 149 + 150 + return 0; 151 + } 152 + 153 + /* Behaves like test-and-set. Returns the previous value */ 154 + static int ocfs2_recovery_map_set(struct ocfs2_super *osb, 155 + unsigned int node_num) 156 + { 157 + struct ocfs2_recovery_map *rm = osb->recovery_map; 158 + 159 + spin_lock(&osb->osb_lock); 160 + if (__ocfs2_recovery_map_test(osb, node_num)) { 161 + spin_unlock(&osb->osb_lock); 162 + return 1; 163 + } 164 + 165 + /* XXX: Can this be exploited? Not from o2dlm... 
*/ 166 + BUG_ON(rm->rm_used >= osb->max_slots); 167 + 168 + rm->rm_entries[rm->rm_used] = node_num; 169 + rm->rm_used++; 170 + spin_unlock(&osb->osb_lock); 171 + 172 + return 0; 173 + } 174 + 175 + static void ocfs2_recovery_map_clear(struct ocfs2_super *osb, 176 + unsigned int node_num) 177 + { 178 + int i; 179 + struct ocfs2_recovery_map *rm = osb->recovery_map; 180 + 181 + spin_lock(&osb->osb_lock); 182 + 183 + for (i = 0; i < rm->rm_used; i++) { 184 + if (rm->rm_entries[i] == node_num) 185 + break; 186 + } 187 + 188 + if (i < rm->rm_used) { 189 + /* XXX: be careful with the pointer math */ 190 + memmove(&(rm->rm_entries[i]), &(rm->rm_entries[i + 1]), 191 + (rm->rm_used - i - 1) * sizeof(unsigned int)); 192 + rm->rm_used--; 193 + } 194 + 195 + spin_unlock(&osb->osb_lock); 196 + } 197 + 67 198 static int ocfs2_commit_cache(struct ocfs2_super *osb) 68 199 { 69 200 int status = 0; ··· 717 586 718 587 mlog_entry_void(); 719 588 720 - if (!journal) 721 - BUG(); 589 + BUG_ON(!journal); 722 590 723 591 osb = journal->j_osb; 724 592 ··· 778 648 bail: 779 649 mlog_exit(status); 780 650 return status; 651 + } 652 + 653 + static int ocfs2_recovery_completed(struct ocfs2_super *osb) 654 + { 655 + int empty; 656 + struct ocfs2_recovery_map *rm = osb->recovery_map; 657 + 658 + spin_lock(&osb->osb_lock); 659 + empty = (rm->rm_used == 0); 660 + spin_unlock(&osb->osb_lock); 661 + 662 + return empty; 663 + } 664 + 665 + void ocfs2_wait_for_recovery(struct ocfs2_super *osb) 666 + { 667 + wait_event(osb->recovery_event, ocfs2_recovery_completed(osb)); 781 668 } 782 669 783 670 /* ··· 995 848 { 996 849 int status, node_num; 997 850 struct ocfs2_super *osb = arg; 851 + struct ocfs2_recovery_map *rm = osb->recovery_map; 998 852 999 853 mlog_entry_void(); 1000 854 ··· 1011 863 goto bail; 1012 864 } 1013 865 1014 - while(!ocfs2_node_map_is_empty(osb, &osb->recovery_map)) { 1015 - node_num = ocfs2_node_map_first_set_bit(osb, 1016 - &osb->recovery_map); 1017 - if (node_num == 
O2NM_INVALID_NODE_NUM) { 1018 - mlog(0, "Out of nodes to recover.\n"); 1019 - break; 1020 - } 866 + spin_lock(&osb->osb_lock); 867 + while (rm->rm_used) { 868 + /* It's always safe to remove entry zero, as we won't 869 + * clear it until ocfs2_recover_node() has succeeded. */ 870 + node_num = rm->rm_entries[0]; 871 + spin_unlock(&osb->osb_lock); 1021 872 1022 873 status = ocfs2_recover_node(osb, node_num); 1023 - if (status < 0) { 874 + if (!status) { 875 + ocfs2_recovery_map_clear(osb, node_num); 876 + } else { 1024 877 mlog(ML_ERROR, 1025 878 "Error %d recovering node %d on device (%u,%u)!\n", 1026 879 status, node_num, 1027 880 MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev)); 1028 881 mlog(ML_ERROR, "Volume requires unmount.\n"); 1029 - continue; 1030 882 } 1031 883 1032 - ocfs2_recovery_map_clear(osb, node_num); 884 + spin_lock(&osb->osb_lock); 1033 885 } 886 + spin_unlock(&osb->osb_lock); 887 + mlog(0, "All nodes recovered\n"); 888 + 1034 889 ocfs2_super_unlock(osb, 1); 1035 890 1036 891 /* We always run recovery on our own orphan dir - the dead ··· 1044 893 1045 894 bail: 1046 895 mutex_lock(&osb->recovery_lock); 1047 - if (!status && 1048 - !ocfs2_node_map_is_empty(osb, &osb->recovery_map)) { 896 + if (!status && !ocfs2_recovery_completed(osb)) { 1049 897 mutex_unlock(&osb->recovery_lock); 1050 898 goto restart; 1051 899 } ··· 1074 924 1075 925 /* People waiting on recovery will wait on 1076 926 * the recovery map to empty. 
*/ 1077 - if (!ocfs2_recovery_map_set(osb, node_num)) 1078 - mlog(0, "node %d already be in recovery.\n", node_num); 927 + if (ocfs2_recovery_map_set(osb, node_num)) 928 + mlog(0, "node %d already in recovery map.\n", node_num); 1079 929 1080 930 mlog(0, "starting recovery thread...\n"); 1081 931 ··· 1229 1079 { 1230 1080 int status = 0; 1231 1081 int slot_num; 1232 - struct ocfs2_slot_info *si = osb->slot_info; 1233 1082 struct ocfs2_dinode *la_copy = NULL; 1234 1083 struct ocfs2_dinode *tl_copy = NULL; 1235 1084 ··· 1241 1092 * case we should've called ocfs2_journal_load instead. */ 1242 1093 BUG_ON(osb->node_num == node_num); 1243 1094 1244 - slot_num = ocfs2_node_num_to_slot(si, node_num); 1245 - if (slot_num == OCFS2_INVALID_SLOT) { 1095 + slot_num = ocfs2_node_num_to_slot(osb, node_num); 1096 + if (slot_num == -ENOENT) { 1246 1097 status = 0; 1247 1098 mlog(0, "no slot for this node, so no recovery required.\n"); 1248 1099 goto done; ··· 1272 1123 1273 1124 /* Likewise, this would be a strange but ultimately not so 1274 1125 * harmful place to get an error... */ 1275 - ocfs2_clear_slot(si, slot_num); 1276 - status = ocfs2_update_disk_slots(osb, si); 1126 + status = ocfs2_clear_slot(osb, slot_num); 1277 1127 if (status < 0) 1278 1128 mlog_errno(status); 1279 1129 ··· 1332 1184 * slot info struct has been updated from disk. */ 1333 1185 int ocfs2_mark_dead_nodes(struct ocfs2_super *osb) 1334 1186 { 1335 - int status, i, node_num; 1336 - struct ocfs2_slot_info *si = osb->slot_info; 1187 + unsigned int node_num; 1188 + int status, i; 1337 1189 1338 1190 /* This is called with the super block cluster lock, so we 1339 1191 * know that the slot map can't change underneath us. 
*/ 1340 1192 1341 - spin_lock(&si->si_lock); 1342 - for(i = 0; i < si->si_num_slots; i++) { 1193 + spin_lock(&osb->osb_lock); 1194 + for (i = 0; i < osb->max_slots; i++) { 1343 1195 if (i == osb->slot_num) 1344 1196 continue; 1345 - if (ocfs2_is_empty_slot(si, i)) 1197 + 1198 + status = ocfs2_slot_to_node_num_locked(osb, i, &node_num); 1199 + if (status == -ENOENT) 1346 1200 continue; 1347 1201 1348 - node_num = si->si_global_node_nums[i]; 1349 - if (ocfs2_node_map_test_bit(osb, &osb->recovery_map, node_num)) 1202 + if (__ocfs2_recovery_map_test(osb, node_num)) 1350 1203 continue; 1351 - spin_unlock(&si->si_lock); 1204 + spin_unlock(&osb->osb_lock); 1352 1205 1353 1206 /* Ok, we have a slot occupied by another node which 1354 1207 * is not in the recovery map. We trylock his journal ··· 1365 1216 goto bail; 1366 1217 } 1367 1218 1368 - spin_lock(&si->si_lock); 1219 + spin_lock(&osb->osb_lock); 1369 1220 } 1370 - spin_unlock(&si->si_lock); 1221 + spin_unlock(&osb->osb_lock); 1371 1222 1372 1223 status = 0; 1373 1224 bail:
+4
fs/ocfs2/journal.h
··· 134 134 135 135 /* Exported only for the journal struct init code in super.c. Do not call. */ 136 136 void ocfs2_complete_recovery(struct work_struct *work); 137 + void ocfs2_wait_for_recovery(struct ocfs2_super *osb); 138 + 139 + int ocfs2_recovery_init(struct ocfs2_super *osb); 140 + void ocfs2_recovery_exit(struct ocfs2_super *osb); 137 141 138 142 /* 139 143 * Journal Control:
+4
fs/ocfs2/localalloc.c
··· 447 447 iput(main_bm_inode); 448 448 449 449 out: 450 + if (!status) 451 + ocfs2_init_inode_steal_slot(osb); 450 452 mlog_exit(status); 451 453 return status; 452 454 } ··· 525 523 } 526 524 527 525 ac->ac_inode = local_alloc_inode; 526 + /* We should never use localalloc from another slot */ 527 + ac->ac_alloc_slot = osb->slot_num; 528 528 ac->ac_which = OCFS2_AC_USE_LOCAL; 529 529 get_bh(osb->local_alloc_bh); 530 530 ac->ac_bh = osb->local_alloc_bh;
+2 -2
fs/ocfs2/namei.c
··· 424 424 fe->i_fs_generation = cpu_to_le32(osb->fs_generation); 425 425 fe->i_blkno = cpu_to_le64(fe_blkno); 426 426 fe->i_suballoc_bit = cpu_to_le16(suballoc_bit); 427 - fe->i_suballoc_slot = cpu_to_le16(osb->slot_num); 427 + fe->i_suballoc_slot = cpu_to_le16(inode_ac->ac_alloc_slot); 428 428 fe->i_uid = cpu_to_le32(current->fsuid); 429 429 if (dir->i_mode & S_ISGID) { 430 430 fe->i_gid = cpu_to_le32(dir->i_gid); ··· 997 997 * 998 998 * And that's why, just like the VFS, we need a file system 999 999 * rename lock. */ 1000 - if (old_dentry != new_dentry) { 1000 + if (old_dir != new_dir && S_ISDIR(old_inode->i_mode)) { 1001 1001 status = ocfs2_rename_lock(osb); 1002 1002 if (status < 0) { 1003 1003 mlog_errno(status);
+61 -16
fs/ocfs2/ocfs2.h
··· 36 36 #include <linux/mutex.h> 37 37 #include <linux/jbd.h> 38 38 39 - #include "cluster/nodemanager.h" 40 - #include "cluster/heartbeat.h" 41 - #include "cluster/tcp.h" 42 - 43 - #include "dlm/dlmapi.h" 39 + /* For union ocfs2_dlm_lksb */ 40 + #include "stackglue.h" 44 41 45 42 #include "ocfs2_fs.h" 46 43 #include "ocfs2_lockid.h" ··· 98 101 * dropped. */ 99 102 #define OCFS2_LOCK_QUEUED (0x00000100) /* queued for downconvert */ 100 103 #define OCFS2_LOCK_NOCACHE (0x00000200) /* don't use a holder count */ 104 + #define OCFS2_LOCK_PENDING (0x00000400) /* This lockres is pending a 105 + call to dlm_lock. Only 106 + exists with BUSY set. */ 101 107 102 108 struct ocfs2_lock_res_ops; 103 109 ··· 120 120 int l_level; 121 121 unsigned int l_ro_holders; 122 122 unsigned int l_ex_holders; 123 - struct dlm_lockstatus l_lksb; 123 + union ocfs2_dlm_lksb l_lksb; 124 124 125 125 /* used from AST/BAST funcs. */ 126 126 enum ocfs2_ast_action l_action; 127 127 enum ocfs2_unlock_action l_unlock_action; 128 128 int l_requested; 129 129 int l_blocking; 130 + unsigned int l_pending_gen; 130 131 131 132 wait_queue_head_t l_event; 132 133 ··· 180 179 #define OCFS2_DEFAULT_ATIME_QUANTUM 60 181 180 182 181 struct ocfs2_journal; 182 + struct ocfs2_slot_info; 183 + struct ocfs2_recovery_map; 183 184 struct ocfs2_super 184 185 { 185 186 struct task_struct *commit_task; ··· 193 190 struct ocfs2_slot_info *slot_info; 194 191 195 192 spinlock_t node_map_lock; 196 - struct ocfs2_node_map recovery_map; 197 193 198 194 u64 root_blkno; 199 195 u64 system_dir_blkno; ··· 208 206 u32 s_feature_incompat; 209 207 u32 s_feature_ro_compat; 210 208 211 - /* Protects s_next_generaion, osb_flags. Could protect more on 212 - * osb as it's very short lived. */ 209 + /* Protects s_next_generation, osb_flags and s_inode_steal_slot. 210 + * Could protect more on osb as it's very short lived. 
211 + */ 213 212 spinlock_t osb_lock; 214 213 u32 s_next_generation; 215 214 unsigned long osb_flags; 215 + s16 s_inode_steal_slot; 216 + atomic_t s_num_inodes_stolen; 216 217 217 218 unsigned long s_mount_opt; 218 219 unsigned int s_atime_quantum; 219 220 220 - u16 max_slots; 221 - s16 node_num; 222 - s16 slot_num; 223 - s16 preferred_slot; 221 + unsigned int max_slots; 222 + unsigned int node_num; 223 + int slot_num; 224 + int preferred_slot; 224 225 int s_sectsize_bits; 225 226 int s_clustersize; 226 227 int s_clustersize_bits; 227 228 228 229 atomic_t vol_state; 229 230 struct mutex recovery_lock; 231 + struct ocfs2_recovery_map *recovery_map; 230 232 struct task_struct *recovery_thread_task; 231 233 int disable_recovery; 232 234 wait_queue_head_t checkpoint_event; ··· 251 245 struct ocfs2_alloc_stats alloc_stats; 252 246 char dev_str[20]; /* "major,minor" of the device */ 253 247 254 - struct dlm_ctxt *dlm; 248 + char osb_cluster_stack[OCFS2_STACK_LABEL_LEN + 1]; 249 + struct ocfs2_cluster_connection *cconn; 255 250 struct ocfs2_lock_res osb_super_lockres; 256 251 struct ocfs2_lock_res osb_rename_lockres; 257 - struct dlm_eviction_cb osb_eviction_cb; 258 252 struct ocfs2_dlm_debug *osb_dlm_debug; 259 - struct dlm_protocol_version osb_locking_proto; 260 253 261 254 struct dentry *osb_debug_root; 262 255 ··· 372 367 return ret; 373 368 } 374 369 370 + static inline int ocfs2_userspace_stack(struct ocfs2_super *osb) 371 + { 372 + return (osb->s_feature_incompat & 373 + OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK); 374 + } 375 + 375 376 static inline int ocfs2_mount_local(struct ocfs2_super *osb) 376 377 { 377 378 return (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT); 378 379 } 380 + 381 + static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb) 382 + { 383 + return (osb->s_feature_incompat & 384 + OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP); 385 + } 386 + 379 387 380 388 #define OCFS2_IS_VALID_DINODE(ptr) \ 381 389 
(!strcmp((ptr)->i_signature, OCFS2_INODE_SIGNATURE)) ··· 538 520 pages_per_cluster = 1 << (cbits - PAGE_CACHE_SHIFT); 539 521 540 522 return pages_per_cluster; 523 + } 524 + 525 + static inline void ocfs2_init_inode_steal_slot(struct ocfs2_super *osb) 526 + { 527 + spin_lock(&osb->osb_lock); 528 + osb->s_inode_steal_slot = OCFS2_INVALID_SLOT; 529 + spin_unlock(&osb->osb_lock); 530 + atomic_set(&osb->s_num_inodes_stolen, 0); 531 + } 532 + 533 + static inline void ocfs2_set_inode_steal_slot(struct ocfs2_super *osb, 534 + s16 slot) 535 + { 536 + spin_lock(&osb->osb_lock); 537 + osb->s_inode_steal_slot = slot; 538 + spin_unlock(&osb->osb_lock); 539 + } 540 + 541 + static inline s16 ocfs2_get_inode_steal_slot(struct ocfs2_super *osb) 542 + { 543 + s16 slot; 544 + 545 + spin_lock(&osb->osb_lock); 546 + slot = osb->s_inode_steal_slot; 547 + spin_unlock(&osb->osb_lock); 548 + 549 + return slot; 541 550 } 542 551 543 552 #define ocfs2_set_bit ext2_set_bit
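The new `ocfs2_userspace_stack()` and `ocfs2_uses_extended_slot_map()` helpers above are plain mask tests against the superblock's incompat field. A userspace sketch of the same pattern, using the two bit values defined in the ocfs2_fs.h hunk of this merge (the mock struct is illustrative, not the kernel's `ocfs2_super`):

```c
#include <assert.h>

/* Bit values as defined in the ocfs2_fs.h hunk of this merge. */
#define OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK    0x0080
#define OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP  0x0100

struct mock_super { unsigned int s_feature_incompat; };

/* Each helper is a simple mask check, normalized to 0/1. */
static int uses_userspace_stack(const struct mock_super *osb)
{
    return !!(osb->s_feature_incompat &
              OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK);
}

static int uses_extended_slot_map(const struct mock_super *osb)
{
    return !!(osb->s_feature_incompat &
              OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP);
}
```

Keeping these as inline predicates lets callers like the slot map and stack glue code branch on on-disk features without touching the feature words directly.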
+77 -2
fs/ocfs2/ocfs2_fs.h
··· 88 88 #define OCFS2_FEATURE_COMPAT_SUPP OCFS2_FEATURE_COMPAT_BACKUP_SB 89 89 #define OCFS2_FEATURE_INCOMPAT_SUPP (OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT \ 90 90 | OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC \ 91 - | OCFS2_FEATURE_INCOMPAT_INLINE_DATA) 91 + | OCFS2_FEATURE_INCOMPAT_INLINE_DATA \ 92 + | OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP \ 93 + | OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK) 92 94 #define OCFS2_FEATURE_RO_COMPAT_SUPP OCFS2_FEATURE_RO_COMPAT_UNWRITTEN 93 95 94 96 /* ··· 126 124 127 125 /* Support for data packed into inode blocks */ 128 126 #define OCFS2_FEATURE_INCOMPAT_INLINE_DATA 0x0040 127 + 128 + /* Support for the extended slot map */ 129 + #define OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP 0x100 130 + 131 + 132 + /* 133 + * Support for alternate, userspace cluster stacks. If set, the superblock 134 + * field s_cluster_info contains a tag for the alternate stack in use as 135 + * well as the name of the cluster being joined. 136 + * mount.ocfs2 must pass in a matching stack name. 137 + * 138 + * If not set, the classic stack will be used. This is compatbile with 139 + * all older versions. 140 + */ 141 + #define OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK 0x0080 129 142 130 143 /* 131 144 * backup superblock flag is used to indicate that this volume ··· 283 266 284 267 #define OCFS2_VOL_UUID_LEN 16 285 268 #define OCFS2_MAX_VOL_LABEL_LEN 64 269 + 270 + /* The alternate, userspace stack fields */ 271 + #define OCFS2_STACK_LABEL_LEN 4 272 + #define OCFS2_CLUSTER_NAME_LEN 16 286 273 287 274 /* Journal limits (in bytes) */ 288 275 #define OCFS2_MIN_JOURNAL_SIZE (4 * 1024 * 1024) ··· 496 475 }; 497 476 498 477 /* 478 + * On disk slot map for OCFS2. This defines the contents of the "slot_map" 479 + * system file. A slot is valid if it contains a node number >= 0. The 480 + * value -1 (0xFFFF) is OCFS2_INVALID_SLOT. This marks a slot empty. 481 + */ 482 + struct ocfs2_slot_map { 483 + /*00*/ __le16 sm_slots[0]; 484 + /* 485 + * Actual on-disk size is one block. 
OCFS2_MAX_SLOTS is 255, 486 + * 255 * sizeof(__le16) == 510 bytes, within the 512-byte minimum blocksize. 487 + */ 488 + }; 489 + 490 + struct ocfs2_extended_slot { 491 + /*00*/ __u8 es_valid; 492 + __u8 es_reserved1[3]; 493 + __le32 es_node_num; 494 + /*08*/ 495 + }; 496 + 497 + /* 498 + * The extended slot map, used when OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP 499 + * is set. It separates out the valid marker from the node number, and 500 + * has room to grow. Unlike the old slot map, this format is defined by 501 + * i_size. 502 + */ 503 + struct ocfs2_slot_map_extended { 504 + /*00*/ struct ocfs2_extended_slot se_slots[0]; 505 + /* 506 + * Actual size is i_size of the slot_map system file. It should 507 + * match s_max_slots * sizeof(struct ocfs2_extended_slot) 508 + */ 509 + }; 510 + 511 + struct ocfs2_cluster_info { 512 + /*00*/ __u8 ci_stack[OCFS2_STACK_LABEL_LEN]; 513 + __le32 ci_reserved; 514 + /*08*/ __u8 ci_cluster[OCFS2_CLUSTER_NAME_LEN]; 515 + /*18*/ 516 + }; 517 + 518 + /* 499 519 * On disk superblock for OCFS2 500 520 * Note that it is contained inside an ocfs2_dinode, so all offsets 501 521 * are relative to the start of ocfs2_dinode.id2. ··· 568 506 * group header */ 569 507 /*50*/ __u8 s_label[OCFS2_MAX_VOL_LABEL_LEN]; /* Label for mounting, etc. */ 570 508 /*90*/ __u8 s_uuid[OCFS2_VOL_UUID_LEN]; /* 128-bit uuid */ 571 - /*A0*/ 509 + /*A0*/ struct ocfs2_cluster_info s_cluster_info; /* Selected userspace 510 + stack. Only valid 511 + with INCOMPAT flag. */ 512 + /*B8*/ __le64 s_reserved2[17]; /* Fill out superblock */ 513 + /*140*/ 514 + 515 + /* 516 + * NOTE: As stated above, all offsets are relative to 517 + * ocfs2_dinode.id2, which is at 0xC0 in the inode. 518 + * 0xC0 + 0x140 = 0x200 or 512 bytes. A superblock must fit within 519 + * our smallest blocksize, which is 512 bytes. To ensure this, 520 + * we reserve the space in s_reserved2. Anything past s_reserved2 521 + * will not be available on the smallest blocksize. 
522 + */ 572 523 }; 573 524 574 525 /*
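The two on-disk slot map layouts above differ in per-slot size: the classic map stores one `__le16` per slot (255 slots need 510 bytes, fitting the one-block, 512-byte-minimum format), while each `ocfs2_extended_slot` is 8 bytes and the extended map's total size is governed by the slot_map file's `i_size`. A userspace sketch of that size math, with `uint16_t`/`uint32_t` standing in for the little-endian on-disk types (endianness handling omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Field widths mirror the ocfs2_extended_slot layout in ocfs2_fs.h:
 * 1 valid byte + 3 reserved + 4-byte node number = 8 bytes, no padding. */
struct extended_slot {
    uint8_t  es_valid;
    uint8_t  es_reserved1[3];
    uint32_t es_node_num;
};

/* Bytes the old (__le16 per slot) map needs for max_slots slots. */
static unsigned long old_map_bytes(unsigned int max_slots)
{
    return max_slots * sizeof(uint16_t);
}

/* Bytes the extended map needs; this is what the slot_map file's
 * i_size is checked against in ocfs2_slot_map_physical_size(). */
static unsigned long extended_map_bytes(unsigned int max_slots)
{
    return max_slots * sizeof(struct extended_slot);
}
```

This is why slot_map.c grows a multi-buffer `si_bh` array in this merge: at the maximum slot count the extended format no longer fits in a single minimum-size block.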
+1 -1
fs/ocfs2/ocfs2_lockid.h
··· 100 100 static inline const char *ocfs2_lock_type_string(enum ocfs2_lock_type type) 101 101 { 102 102 #ifdef __KERNEL__ 103 - mlog_bug_on_msg(type >= OCFS2_NUM_LOCK_TYPES, "%d\n", type); 103 + BUG_ON(type >= OCFS2_NUM_LOCK_TYPES); 104 104 #endif 105 105 return ocfs2_lock_type_strings[type]; 106 106 }
+356 -104
fs/ocfs2/slot_map.c
··· 42 42 43 43 #include "buffer_head_io.h" 44 44 45 - static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si, 46 - s16 global); 47 - static void __ocfs2_fill_slot(struct ocfs2_slot_info *si, 48 - s16 slot_num, 49 - s16 node_num); 50 45 51 - /* post the slot information on disk into our slot_info struct. */ 52 - void ocfs2_update_slot_info(struct ocfs2_slot_info *si) 46 + struct ocfs2_slot { 47 + int sl_valid; 48 + unsigned int sl_node_num; 49 + }; 50 + 51 + struct ocfs2_slot_info { 52 + int si_extended; 53 + int si_slots_per_block; 54 + struct inode *si_inode; 55 + unsigned int si_blocks; 56 + struct buffer_head **si_bh; 57 + unsigned int si_num_slots; 58 + struct ocfs2_slot *si_slots; 59 + }; 60 + 61 + 62 + static int __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si, 63 + unsigned int node_num); 64 + 65 + static void ocfs2_invalidate_slot(struct ocfs2_slot_info *si, 66 + int slot_num) 67 + { 68 + BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots)); 69 + si->si_slots[slot_num].sl_valid = 0; 70 + } 71 + 72 + static void ocfs2_set_slot(struct ocfs2_slot_info *si, 73 + int slot_num, unsigned int node_num) 74 + { 75 + BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots)); 76 + 77 + si->si_slots[slot_num].sl_valid = 1; 78 + si->si_slots[slot_num].sl_node_num = node_num; 79 + } 80 + 81 + /* This version is for the extended slot map */ 82 + static void ocfs2_update_slot_info_extended(struct ocfs2_slot_info *si) 83 + { 84 + int b, i, slotno; 85 + struct ocfs2_slot_map_extended *se; 86 + 87 + slotno = 0; 88 + for (b = 0; b < si->si_blocks; b++) { 89 + se = (struct ocfs2_slot_map_extended *)si->si_bh[b]->b_data; 90 + for (i = 0; 91 + (i < si->si_slots_per_block) && 92 + (slotno < si->si_num_slots); 93 + i++, slotno++) { 94 + if (se->se_slots[i].es_valid) 95 + ocfs2_set_slot(si, slotno, 96 + le32_to_cpu(se->se_slots[i].es_node_num)); 97 + else 98 + ocfs2_invalidate_slot(si, slotno); 99 + } 100 + } 101 + } 102 + 103 + /* 104 + * Post the slot information on 
disk into our slot_info struct. 105 + * Must be protected by osb_lock. 106 + */ 107 + static void ocfs2_update_slot_info_old(struct ocfs2_slot_info *si) 53 108 { 54 109 int i; 55 - __le16 *disk_info; 110 + struct ocfs2_slot_map *sm; 56 111 57 - /* we don't read the slot block here as ocfs2_super_lock 58 - * should've made sure we have the most recent copy. */ 59 - spin_lock(&si->si_lock); 60 - disk_info = (__le16 *) si->si_bh->b_data; 112 + sm = (struct ocfs2_slot_map *)si->si_bh[0]->b_data; 61 113 62 - for (i = 0; i < si->si_size; i++) 63 - si->si_global_node_nums[i] = le16_to_cpu(disk_info[i]); 114 + for (i = 0; i < si->si_num_slots; i++) { 115 + if (le16_to_cpu(sm->sm_slots[i]) == (u16)OCFS2_INVALID_SLOT) 116 + ocfs2_invalidate_slot(si, i); 117 + else 118 + ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i])); 119 + } 120 + } 64 121 65 - spin_unlock(&si->si_lock); 122 + static void ocfs2_update_slot_info(struct ocfs2_slot_info *si) 123 + { 124 + /* 125 + * The slot data will have been refreshed when ocfs2_super_lock 126 + * was taken. 127 + */ 128 + if (si->si_extended) 129 + ocfs2_update_slot_info_extended(si); 130 + else 131 + ocfs2_update_slot_info_old(si); 132 + } 133 + 134 + int ocfs2_refresh_slot_info(struct ocfs2_super *osb) 135 + { 136 + int ret; 137 + struct ocfs2_slot_info *si = osb->slot_info; 138 + 139 + if (si == NULL) 140 + return 0; 141 + 142 + BUG_ON(si->si_blocks == 0); 143 + BUG_ON(si->si_bh == NULL); 144 + 145 + mlog(0, "Refreshing slot map, reading %u block(s)\n", 146 + si->si_blocks); 147 + 148 + /* 149 + * We pass -1 as blocknr because we expect all of si->si_bh to 150 + * be !NULL. Thus, ocfs2_read_blocks() will ignore blocknr. If 151 + * this is not true, the read of -1 (UINT64_MAX) will fail. 
152 + */ 153 + ret = ocfs2_read_blocks(osb, -1, si->si_blocks, si->si_bh, 0, 154 + si->si_inode); 155 + if (ret == 0) { 156 + spin_lock(&osb->osb_lock); 157 + ocfs2_update_slot_info(si); 158 + spin_unlock(&osb->osb_lock); 159 + } 160 + 161 + return ret; 66 162 } 67 163 68 164 /* post the our slot info stuff into it's destination bh and write it 69 165 * out. */ 70 - int ocfs2_update_disk_slots(struct ocfs2_super *osb, 71 - struct ocfs2_slot_info *si) 166 + static void ocfs2_update_disk_slot_extended(struct ocfs2_slot_info *si, 167 + int slot_num, 168 + struct buffer_head **bh) 72 169 { 73 - int status, i; 74 - __le16 *disk_info = (__le16 *) si->si_bh->b_data; 170 + int blkind = slot_num / si->si_slots_per_block; 171 + int slotno = slot_num % si->si_slots_per_block; 172 + struct ocfs2_slot_map_extended *se; 75 173 76 - spin_lock(&si->si_lock); 77 - for (i = 0; i < si->si_size; i++) 78 - disk_info[i] = cpu_to_le16(si->si_global_node_nums[i]); 79 - spin_unlock(&si->si_lock); 174 + BUG_ON(blkind >= si->si_blocks); 80 175 81 - status = ocfs2_write_block(osb, si->si_bh, si->si_inode); 176 + se = (struct ocfs2_slot_map_extended *)si->si_bh[blkind]->b_data; 177 + se->se_slots[slotno].es_valid = si->si_slots[slot_num].sl_valid; 178 + if (si->si_slots[slot_num].sl_valid) 179 + se->se_slots[slotno].es_node_num = 180 + cpu_to_le32(si->si_slots[slot_num].sl_node_num); 181 + *bh = si->si_bh[blkind]; 182 + } 183 + 184 + static void ocfs2_update_disk_slot_old(struct ocfs2_slot_info *si, 185 + int slot_num, 186 + struct buffer_head **bh) 187 + { 188 + int i; 189 + struct ocfs2_slot_map *sm; 190 + 191 + sm = (struct ocfs2_slot_map *)si->si_bh[0]->b_data; 192 + for (i = 0; i < si->si_num_slots; i++) { 193 + if (si->si_slots[i].sl_valid) 194 + sm->sm_slots[i] = 195 + cpu_to_le16(si->si_slots[i].sl_node_num); 196 + else 197 + sm->sm_slots[i] = cpu_to_le16(OCFS2_INVALID_SLOT); 198 + } 199 + *bh = si->si_bh[0]; 200 + } 201 + 202 + static int ocfs2_update_disk_slot(struct ocfs2_super 
*osb, 203 + struct ocfs2_slot_info *si, 204 + int slot_num) 205 + { 206 + int status; 207 + struct buffer_head *bh; 208 + 209 + spin_lock(&osb->osb_lock); 210 + if (si->si_extended) 211 + ocfs2_update_disk_slot_extended(si, slot_num, &bh); 212 + else 213 + ocfs2_update_disk_slot_old(si, slot_num, &bh); 214 + spin_unlock(&osb->osb_lock); 215 + 216 + status = ocfs2_write_block(osb, bh, si->si_inode); 82 217 if (status < 0) 83 218 mlog_errno(status); 84 219 85 220 return status; 86 221 } 87 222 88 - /* try to find global node in the slot info. Returns 89 - * OCFS2_INVALID_SLOT if nothing is found. */ 90 - static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si, 91 - s16 global) 223 + /* 224 + * Calculate how many bytes are needed by the slot map. Returns 225 + * an error if the slot map file is too small. 226 + */ 227 + static int ocfs2_slot_map_physical_size(struct ocfs2_super *osb, 228 + struct inode *inode, 229 + unsigned long long *bytes) 92 230 { 93 - int i; 94 - s16 ret = OCFS2_INVALID_SLOT; 231 + unsigned long long bytes_needed; 232 + 233 + if (ocfs2_uses_extended_slot_map(osb)) { 234 + bytes_needed = osb->max_slots * 235 + sizeof(struct ocfs2_extended_slot); 236 + } else { 237 + bytes_needed = osb->max_slots * sizeof(__le16); 238 + } 239 + if (bytes_needed > i_size_read(inode)) { 240 + mlog(ML_ERROR, 241 + "Slot map file is too small! (size %llu, needed %llu)\n", 242 + i_size_read(inode), bytes_needed); 243 + return -ENOSPC; 244 + } 245 + 246 + *bytes = bytes_needed; 247 + return 0; 248 + } 249 + 250 + /* try to find global node in the slot info. Returns -ENOENT 251 + * if nothing is found. 
*/ 252 + static int __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si, 253 + unsigned int node_num) 254 + { 255 + int i, ret = -ENOENT; 95 256 96 257 for(i = 0; i < si->si_num_slots; i++) { 97 - if (global == si->si_global_node_nums[i]) { 98 - ret = (s16) i; 258 + if (si->si_slots[i].sl_valid && 259 + (node_num == si->si_slots[i].sl_node_num)) { 260 + ret = i; 99 261 break; 100 262 } 101 263 } 264 + 102 265 return ret; 103 266 } 104 267 105 - static s16 __ocfs2_find_empty_slot(struct ocfs2_slot_info *si, s16 preferred) 268 + static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si, 269 + int preferred) 106 270 { 107 - int i; 108 - s16 ret = OCFS2_INVALID_SLOT; 271 + int i, ret = -ENOSPC; 109 272 110 - if (preferred >= 0 && preferred < si->si_num_slots) { 111 - if (OCFS2_INVALID_SLOT == si->si_global_node_nums[preferred]) { 273 + if ((preferred >= 0) && (preferred < si->si_num_slots)) { 274 + if (!si->si_slots[preferred].sl_valid) { 112 275 ret = preferred; 113 276 goto out; 114 277 } 115 278 } 116 279 117 280 for(i = 0; i < si->si_num_slots; i++) { 118 - if (OCFS2_INVALID_SLOT == si->si_global_node_nums[i]) { 119 - ret = (s16) i; 281 + if (!si->si_slots[i].sl_valid) { 282 + ret = i; 120 283 break; 121 284 } 122 285 } ··· 287 124 return ret; 288 125 } 289 126 290 - s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si, 291 - s16 global) 127 + int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num) 292 128 { 293 - s16 ret; 129 + int slot; 130 + struct ocfs2_slot_info *si = osb->slot_info; 294 131 295 - spin_lock(&si->si_lock); 296 - ret = __ocfs2_node_num_to_slot(si, global); 297 - spin_unlock(&si->si_lock); 298 - return ret; 132 + spin_lock(&osb->osb_lock); 133 + slot = __ocfs2_node_num_to_slot(si, node_num); 134 + spin_unlock(&osb->osb_lock); 135 + 136 + return slot; 299 137 } 300 138 301 - static void __ocfs2_fill_slot(struct ocfs2_slot_info *si, 302 - s16 slot_num, 303 - s16 node_num) 139 + int ocfs2_slot_to_node_num_locked(struct 
ocfs2_super *osb, int slot_num, 140 + unsigned int *node_num) 304 141 { 305 - BUG_ON(slot_num == OCFS2_INVALID_SLOT); 306 - BUG_ON(slot_num >= si->si_num_slots); 307 - BUG_ON((node_num != O2NM_INVALID_NODE_NUM) && 308 - (node_num >= O2NM_MAX_NODES)); 142 + struct ocfs2_slot_info *si = osb->slot_info; 309 143 310 - si->si_global_node_nums[slot_num] = node_num; 144 + assert_spin_locked(&osb->osb_lock); 145 + 146 + BUG_ON(slot_num < 0); 147 + BUG_ON(slot_num > osb->max_slots); 148 + 149 + if (!si->si_slots[slot_num].sl_valid) 150 + return -ENOENT; 151 + 152 + *node_num = si->si_slots[slot_num].sl_node_num; 153 + return 0; 311 154 } 312 155 313 - void ocfs2_clear_slot(struct ocfs2_slot_info *si, 314 - s16 slot_num) 156 + static void __ocfs2_free_slot_info(struct ocfs2_slot_info *si) 315 157 { 316 - spin_lock(&si->si_lock); 317 - __ocfs2_fill_slot(si, slot_num, OCFS2_INVALID_SLOT); 318 - spin_unlock(&si->si_lock); 158 + unsigned int i; 159 + 160 + if (si == NULL) 161 + return; 162 + 163 + if (si->si_inode) 164 + iput(si->si_inode); 165 + if (si->si_bh) { 166 + for (i = 0; i < si->si_blocks; i++) { 167 + if (si->si_bh[i]) { 168 + brelse(si->si_bh[i]); 169 + si->si_bh[i] = NULL; 170 + } 171 + } 172 + kfree(si->si_bh); 173 + } 174 + 175 + kfree(si); 176 + } 177 + 178 + int ocfs2_clear_slot(struct ocfs2_super *osb, int slot_num) 179 + { 180 + struct ocfs2_slot_info *si = osb->slot_info; 181 + 182 + if (si == NULL) 183 + return 0; 184 + 185 + spin_lock(&osb->osb_lock); 186 + ocfs2_invalidate_slot(si, slot_num); 187 + spin_unlock(&osb->osb_lock); 188 + 189 + return ocfs2_update_disk_slot(osb, osb->slot_info, slot_num); 190 + } 191 + 192 + static int ocfs2_map_slot_buffers(struct ocfs2_super *osb, 193 + struct ocfs2_slot_info *si) 194 + { 195 + int status = 0; 196 + u64 blkno; 197 + unsigned long long blocks, bytes; 198 + unsigned int i; 199 + struct buffer_head *bh; 200 + 201 + status = ocfs2_slot_map_physical_size(osb, si->si_inode, &bytes); 202 + if (status) 203 + goto 
bail; 204 + 205 + blocks = ocfs2_blocks_for_bytes(si->si_inode->i_sb, bytes); 206 + BUG_ON(blocks > UINT_MAX); 207 + si->si_blocks = blocks; 208 + if (!si->si_blocks) 209 + goto bail; 210 + 211 + if (si->si_extended) 212 + si->si_slots_per_block = 213 + (osb->sb->s_blocksize / 214 + sizeof(struct ocfs2_extended_slot)); 215 + else 216 + si->si_slots_per_block = osb->sb->s_blocksize / sizeof(__le16); 217 + 218 + /* The size checks above should ensure this */ 219 + BUG_ON((osb->max_slots / si->si_slots_per_block) > blocks); 220 + 221 + mlog(0, "Slot map needs %u buffers for %llu bytes\n", 222 + si->si_blocks, bytes); 223 + 224 + si->si_bh = kzalloc(sizeof(struct buffer_head *) * si->si_blocks, 225 + GFP_KERNEL); 226 + if (!si->si_bh) { 227 + status = -ENOMEM; 228 + mlog_errno(status); 229 + goto bail; 230 + } 231 + 232 + for (i = 0; i < si->si_blocks; i++) { 233 + status = ocfs2_extent_map_get_blocks(si->si_inode, i, 234 + &blkno, NULL, NULL); 235 + if (status < 0) { 236 + mlog_errno(status); 237 + goto bail; 238 + } 239 + 240 + mlog(0, "Reading slot map block %u at %llu\n", i, 241 + (unsigned long long)blkno); 242 + 243 + bh = NULL; /* Acquire a fresh bh */ 244 + status = ocfs2_read_block(osb, blkno, &bh, 0, si->si_inode); 245 + if (status < 0) { 246 + mlog_errno(status); 247 + goto bail; 248 + } 249 + 250 + si->si_bh[i] = bh; 251 + } 252 + 253 + bail: 254 + return status; 319 255 } 320 256 321 257 int ocfs2_init_slot_info(struct ocfs2_super *osb) 322 258 { 323 - int status, i; 324 - u64 blkno; 259 + int status; 325 260 struct inode *inode = NULL; 326 - struct buffer_head *bh = NULL; 327 261 struct ocfs2_slot_info *si; 328 262 329 - si = kzalloc(sizeof(struct ocfs2_slot_info), GFP_KERNEL); 263 + si = kzalloc(sizeof(struct ocfs2_slot_info) + 264 + (sizeof(struct ocfs2_slot) * osb->max_slots), 265 + GFP_KERNEL); 330 266 if (!si) { 331 267 status = -ENOMEM; 332 268 mlog_errno(status); 333 269 goto bail; 334 270 } 335 271 336 - spin_lock_init(&si->si_lock); 272 + 
si->si_extended = ocfs2_uses_extended_slot_map(osb); 337 273 si->si_num_slots = osb->max_slots; 338 - si->si_size = OCFS2_MAX_SLOTS; 339 - 340 - for(i = 0; i < si->si_num_slots; i++) 341 - si->si_global_node_nums[i] = OCFS2_INVALID_SLOT; 274 + si->si_slots = (struct ocfs2_slot *)((char *)si + 275 + sizeof(struct ocfs2_slot_info)); 342 276 343 277 inode = ocfs2_get_system_file_inode(osb, SLOT_MAP_SYSTEM_INODE, 344 278 OCFS2_INVALID_SLOT); ··· 445 185 goto bail; 446 186 } 447 187 448 - status = ocfs2_extent_map_get_blocks(inode, 0ULL, &blkno, NULL, NULL); 449 - if (status < 0) { 450 - mlog_errno(status); 451 - goto bail; 452 - } 453 - 454 - status = ocfs2_read_block(osb, blkno, &bh, 0, inode); 455 - if (status < 0) { 456 - mlog_errno(status); 457 - goto bail; 458 - } 459 - 460 188 si->si_inode = inode; 461 - si->si_bh = bh; 462 - osb->slot_info = si; 189 + status = ocfs2_map_slot_buffers(osb, si); 190 + if (status < 0) { 191 + mlog_errno(status); 192 + goto bail; 193 + } 194 + 195 + osb->slot_info = (struct ocfs2_slot_info *)si; 463 196 bail: 464 197 if (status < 0 && si) 465 - ocfs2_free_slot_info(si); 198 + __ocfs2_free_slot_info(si); 466 199 467 200 return status; 468 201 } 469 202 470 - void ocfs2_free_slot_info(struct ocfs2_slot_info *si) 203 + void ocfs2_free_slot_info(struct ocfs2_super *osb) 471 204 { 472 - if (si->si_inode) 473 - iput(si->si_inode); 474 - if (si->si_bh) 475 - brelse(si->si_bh); 476 - kfree(si); 205 + struct ocfs2_slot_info *si = osb->slot_info; 206 + 207 + osb->slot_info = NULL; 208 + __ocfs2_free_slot_info(si); 477 209 } 478 210 479 211 int ocfs2_find_slot(struct ocfs2_super *osb) 480 212 { 481 213 int status; 482 - s16 slot; 214 + int slot; 483 215 struct ocfs2_slot_info *si; 484 216 485 217 mlog_entry_void(); 486 218 487 219 si = osb->slot_info; 488 220 221 + spin_lock(&osb->osb_lock); 489 222 ocfs2_update_slot_info(si); 490 223 491 - spin_lock(&si->si_lock); 492 224 /* search for ourselves first and take the slot if it already 493 225 * 
exists. Perhaps we need to mark this in a variable for our 494 226 * own journal recovery? Possibly not, though we certainly 495 227 * need to warn to the user */ 496 228 slot = __ocfs2_node_num_to_slot(si, osb->node_num); 497 - if (slot == OCFS2_INVALID_SLOT) { 229 + if (slot < 0) { 498 230 /* if no slot yet, then just take 1st available 499 231 * one. */ 500 232 slot = __ocfs2_find_empty_slot(si, osb->preferred_slot); 501 - if (slot == OCFS2_INVALID_SLOT) { 502 - spin_unlock(&si->si_lock); 233 + if (slot < 0) { 234 + spin_unlock(&osb->osb_lock); 503 235 mlog(ML_ERROR, "no free slots available!\n"); 504 236 status = -EINVAL; 505 237 goto bail; ··· 500 248 mlog(ML_NOTICE, "slot %d is already allocated to this node!\n", 501 249 slot); 502 250 503 - __ocfs2_fill_slot(si, slot, osb->node_num); 251 + ocfs2_set_slot(si, slot, osb->node_num); 504 252 osb->slot_num = slot; 505 - spin_unlock(&si->si_lock); 253 + spin_unlock(&osb->osb_lock); 506 254 507 255 mlog(0, "taking node slot %d\n", osb->slot_num); 508 256 509 - status = ocfs2_update_disk_slots(osb, si); 257 + status = ocfs2_update_disk_slot(osb, si, osb->slot_num); 510 258 if (status < 0) 511 259 mlog_errno(status); 512 260 ··· 517 265 518 266 void ocfs2_put_slot(struct ocfs2_super *osb) 519 267 { 520 - int status; 268 + int status, slot_num; 521 269 struct ocfs2_slot_info *si = osb->slot_info; 522 270 523 271 if (!si) 524 272 return; 525 273 274 + spin_lock(&osb->osb_lock); 526 275 ocfs2_update_slot_info(si); 527 276 528 - spin_lock(&si->si_lock); 529 - __ocfs2_fill_slot(si, osb->slot_num, OCFS2_INVALID_SLOT); 277 + slot_num = osb->slot_num; 278 + ocfs2_invalidate_slot(si, osb->slot_num); 530 279 osb->slot_num = OCFS2_INVALID_SLOT; 531 - spin_unlock(&si->si_lock); 280 + spin_unlock(&osb->osb_lock); 532 281 533 - status = ocfs2_update_disk_slots(osb, si); 282 + status = ocfs2_update_disk_slot(osb, si, slot_num); 534 283 if (status < 0) { 535 284 mlog_errno(status); 536 285 goto bail; 537 286 } 538 287 539 288 bail: 
540 - osb->slot_info = NULL; 541 - ocfs2_free_slot_info(si); 289 + ocfs2_free_slot_info(osb); 542 290 } 543 291
+6 -26
fs/ocfs2/slot_map.h
··· 27 27 #ifndef SLOTMAP_H 28 28 #define SLOTMAP_H 29 29 30 - struct ocfs2_slot_info { 31 - spinlock_t si_lock; 32 - 33 - struct inode *si_inode; 34 - struct buffer_head *si_bh; 35 - unsigned int si_num_slots; 36 - unsigned int si_size; 37 - s16 si_global_node_nums[OCFS2_MAX_SLOTS]; 38 - }; 39 - 40 30 int ocfs2_init_slot_info(struct ocfs2_super *osb); 41 - void ocfs2_free_slot_info(struct ocfs2_slot_info *si); 31 + void ocfs2_free_slot_info(struct ocfs2_super *osb); 42 32 43 33 int ocfs2_find_slot(struct ocfs2_super *osb); 44 34 void ocfs2_put_slot(struct ocfs2_super *osb); 45 35 46 - void ocfs2_update_slot_info(struct ocfs2_slot_info *si); 47 - int ocfs2_update_disk_slots(struct ocfs2_super *osb, 48 - struct ocfs2_slot_info *si); 36 + int ocfs2_refresh_slot_info(struct ocfs2_super *osb); 49 37 50 - s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si, 51 - s16 global); 52 - void ocfs2_clear_slot(struct ocfs2_slot_info *si, 53 - s16 slot_num); 38 + int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num); 39 + int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num, 40 + unsigned int *node_num); 54 41 55 - static inline int ocfs2_is_empty_slot(struct ocfs2_slot_info *si, 56 - int slot_num) 57 - { 58 - BUG_ON(slot_num == OCFS2_INVALID_SLOT); 59 - assert_spin_locked(&si->si_lock); 60 - 61 - return si->si_global_node_nums[slot_num] == OCFS2_INVALID_SLOT; 62 - } 42 + int ocfs2_clear_slot(struct ocfs2_super *osb, int slot_num); 63 43 64 44 #endif
+420
fs/ocfs2/stack_o2cb.c
··· 1 + /* -*- mode: c; c-basic-offset: 8; -*- 2 + * vim: noexpandtab sw=8 ts=8 sts=0: 3 + * 4 + * stack_o2cb.c 5 + * 6 + * Code which interfaces ocfs2 with the o2cb stack. 7 + * 8 + * Copyright (C) 2007 Oracle. All rights reserved. 9 + * 10 + * This program is free software; you can redistribute it and/or 11 + * modify it under the terms of the GNU General Public 12 + * License as published by the Free Software Foundation, version 2. 13 + * 14 + * This program is distributed in the hope that it will be useful, 15 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 16 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 17 + * General Public License for more details. 18 + */ 19 + 20 + #include <linux/crc32.h> 21 + #include <linux/module.h> 22 + 23 + /* Needed for AOP_TRUNCATED_PAGE in mlog_errno() */ 24 + #include <linux/fs.h> 25 + 26 + #include "cluster/masklog.h" 27 + #include "cluster/nodemanager.h" 28 + #include "cluster/heartbeat.h" 29 + 30 + #include "stackglue.h" 31 + 32 + struct o2dlm_private { 33 + struct dlm_eviction_cb op_eviction_cb; 34 + }; 35 + 36 + static struct ocfs2_stack_plugin o2cb_stack; 37 + 38 + /* These should be identical */ 39 + #if (DLM_LOCK_IV != LKM_IVMODE) 40 + # error Lock modes do not match 41 + #endif 42 + #if (DLM_LOCK_NL != LKM_NLMODE) 43 + # error Lock modes do not match 44 + #endif 45 + #if (DLM_LOCK_CR != LKM_CRMODE) 46 + # error Lock modes do not match 47 + #endif 48 + #if (DLM_LOCK_CW != LKM_CWMODE) 49 + # error Lock modes do not match 50 + #endif 51 + #if (DLM_LOCK_PR != LKM_PRMODE) 52 + # error Lock modes do not match 53 + #endif 54 + #if (DLM_LOCK_PW != LKM_PWMODE) 55 + # error Lock modes do not match 56 + #endif 57 + #if (DLM_LOCK_EX != LKM_EXMODE) 58 + # error Lock modes do not match 59 + #endif 60 + static inline int mode_to_o2dlm(int mode) 61 + { 62 + BUG_ON(mode > LKM_MAXMODE); 63 + 64 + return mode; 65 + } 66 + 67 + #define map_flag(_generic, _o2dlm) \ 68 + if (flags & (_generic)) { \ 69 
+ flags &= ~(_generic); \ 70 + o2dlm_flags |= (_o2dlm); \ 71 + } 72 + static int flags_to_o2dlm(u32 flags) 73 + { 74 + int o2dlm_flags = 0; 75 + 76 + map_flag(DLM_LKF_NOQUEUE, LKM_NOQUEUE); 77 + map_flag(DLM_LKF_CANCEL, LKM_CANCEL); 78 + map_flag(DLM_LKF_CONVERT, LKM_CONVERT); 79 + map_flag(DLM_LKF_VALBLK, LKM_VALBLK); 80 + map_flag(DLM_LKF_IVVALBLK, LKM_INVVALBLK); 81 + map_flag(DLM_LKF_ORPHAN, LKM_ORPHAN); 82 + map_flag(DLM_LKF_FORCEUNLOCK, LKM_FORCE); 83 + map_flag(DLM_LKF_TIMEOUT, LKM_TIMEOUT); 84 + map_flag(DLM_LKF_LOCAL, LKM_LOCAL); 85 + 86 + /* map_flag() should have cleared every flag passed in */ 87 + BUG_ON(flags != 0); 88 + 89 + return o2dlm_flags; 90 + } 91 + #undef map_flag 92 + 93 + /* 94 + * Map an o2dlm status to standard errno values. 95 + * 96 + * o2dlm only uses a handful of these, and returns even fewer to the 97 + * caller. Still, we try to assign sane values to each error. 98 + * 99 + * The following value pairs have special meanings to dlmglue, thus 100 + * the right hand side needs to stay unique - never duplicate the 101 + * mapping elsewhere in the table! 
102 + * 103 + * DLM_NORMAL: 0 104 + * DLM_NOTQUEUED: -EAGAIN 105 + * DLM_CANCELGRANT: -EBUSY 106 + * DLM_CANCEL: -DLM_ECANCEL 107 + */ 108 + /* Keep in sync with dlmapi.h */ 109 + static int status_map[] = { 110 + [DLM_NORMAL] = 0, /* Success */ 111 + [DLM_GRANTED] = -EINVAL, 112 + [DLM_DENIED] = -EACCES, 113 + [DLM_DENIED_NOLOCKS] = -EACCES, 114 + [DLM_WORKING] = -EACCES, 115 + [DLM_BLOCKED] = -EINVAL, 116 + [DLM_BLOCKED_ORPHAN] = -EINVAL, 117 + [DLM_DENIED_GRACE_PERIOD] = -EACCES, 118 + [DLM_SYSERR] = -ENOMEM, /* It is what it is */ 119 + [DLM_NOSUPPORT] = -EPROTO, 120 + [DLM_CANCELGRANT] = -EBUSY, /* Cancel after grant */ 121 + [DLM_IVLOCKID] = -EINVAL, 122 + [DLM_SYNC] = -EINVAL, 123 + [DLM_BADTYPE] = -EINVAL, 124 + [DLM_BADRESOURCE] = -EINVAL, 125 + [DLM_MAXHANDLES] = -ENOMEM, 126 + [DLM_NOCLINFO] = -EINVAL, 127 + [DLM_NOLOCKMGR] = -EINVAL, 128 + [DLM_NOPURGED] = -EINVAL, 129 + [DLM_BADARGS] = -EINVAL, 130 + [DLM_VOID] = -EINVAL, 131 + [DLM_NOTQUEUED] = -EAGAIN, /* Trylock failed */ 132 + [DLM_IVBUFLEN] = -EINVAL, 133 + [DLM_CVTUNGRANT] = -EPERM, 134 + [DLM_BADPARAM] = -EINVAL, 135 + [DLM_VALNOTVALID] = -EINVAL, 136 + [DLM_REJECTED] = -EPERM, 137 + [DLM_ABORT] = -EINVAL, 138 + [DLM_CANCEL] = -DLM_ECANCEL, /* Successful cancel */ 139 + [DLM_IVRESHANDLE] = -EINVAL, 140 + [DLM_DEADLOCK] = -EDEADLK, 141 + [DLM_DENIED_NOASTS] = -EINVAL, 142 + [DLM_FORWARD] = -EINVAL, 143 + [DLM_TIMEOUT] = -ETIMEDOUT, 144 + [DLM_IVGROUPID] = -EINVAL, 145 + [DLM_VERS_CONFLICT] = -EOPNOTSUPP, 146 + [DLM_BAD_DEVICE_PATH] = -ENOENT, 147 + [DLM_NO_DEVICE_PERMISSION] = -EPERM, 148 + [DLM_NO_CONTROL_DEVICE] = -ENOENT, 149 + [DLM_RECOVERING] = -ENOTCONN, 150 + [DLM_MIGRATING] = -ERESTART, 151 + [DLM_MAXSTATS] = -EINVAL, 152 + }; 153 + 154 + static int dlm_status_to_errno(enum dlm_status status) 155 + { 156 + BUG_ON(status > (sizeof(status_map) / sizeof(status_map[0]))); 157 + 158 + return status_map[status]; 159 + } 160 + 161 + static void o2dlm_lock_ast_wrapper(void *astarg) 162 + { 163 + 
BUG_ON(o2cb_stack.sp_proto == NULL); 164 + 165 + o2cb_stack.sp_proto->lp_lock_ast(astarg); 166 + } 167 + 168 + static void o2dlm_blocking_ast_wrapper(void *astarg, int level) 169 + { 170 + BUG_ON(o2cb_stack.sp_proto == NULL); 171 + 172 + o2cb_stack.sp_proto->lp_blocking_ast(astarg, level); 173 + } 174 + 175 + static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status) 176 + { 177 + int error = dlm_status_to_errno(status); 178 + 179 + BUG_ON(o2cb_stack.sp_proto == NULL); 180 + 181 + /* 182 + * In o2dlm, you can get both the lock_ast() for the lock being 183 + * granted and the unlock_ast() for the CANCEL failing. A 184 + * successful cancel sends DLM_NORMAL here. If the 185 + * lock grant happened before the cancel arrived, you get 186 + * DLM_CANCELGRANT. 187 + * 188 + * There's no need for the double-ast. If we see DLM_CANCELGRANT, 189 + * we just ignore it. We expect the lock_ast() to handle the 190 + * granted lock. 191 + */ 192 + if (status == DLM_CANCELGRANT) 193 + return; 194 + 195 + o2cb_stack.sp_proto->lp_unlock_ast(astarg, error); 196 + } 197 + 198 + static int o2cb_dlm_lock(struct ocfs2_cluster_connection *conn, 199 + int mode, 200 + union ocfs2_dlm_lksb *lksb, 201 + u32 flags, 202 + void *name, 203 + unsigned int namelen, 204 + void *astarg) 205 + { 206 + enum dlm_status status; 207 + int o2dlm_mode = mode_to_o2dlm(mode); 208 + int o2dlm_flags = flags_to_o2dlm(flags); 209 + int ret; 210 + 211 + status = dlmlock(conn->cc_lockspace, o2dlm_mode, &lksb->lksb_o2dlm, 212 + o2dlm_flags, name, namelen, 213 + o2dlm_lock_ast_wrapper, astarg, 214 + o2dlm_blocking_ast_wrapper); 215 + ret = dlm_status_to_errno(status); 216 + return ret; 217 + } 218 + 219 + static int o2cb_dlm_unlock(struct ocfs2_cluster_connection *conn, 220 + union ocfs2_dlm_lksb *lksb, 221 + u32 flags, 222 + void *astarg) 223 + { 224 + enum dlm_status status; 225 + int o2dlm_flags = flags_to_o2dlm(flags); 226 + int ret; 227 + 228 + status = dlmunlock(conn->cc_lockspace, 
&lksb->lksb_o2dlm, 229 + o2dlm_flags, o2dlm_unlock_ast_wrapper, astarg); 230 + ret = dlm_status_to_errno(status); 231 + return ret; 232 + } 233 + 234 + static int o2cb_dlm_lock_status(union ocfs2_dlm_lksb *lksb) 235 + { 236 + return dlm_status_to_errno(lksb->lksb_o2dlm.status); 237 + } 238 + 239 + static void *o2cb_dlm_lvb(union ocfs2_dlm_lksb *lksb) 240 + { 241 + return (void *)(lksb->lksb_o2dlm.lvb); 242 + } 243 + 244 + static void o2cb_dump_lksb(union ocfs2_dlm_lksb *lksb) 245 + { 246 + dlm_print_one_lock(lksb->lksb_o2dlm.lockid); 247 + } 248 + 249 + /* 250 + * Called from the dlm when it's about to evict a node. This is how the 251 + * classic stack signals node death. 252 + */ 253 + static void o2dlm_eviction_cb(int node_num, void *data) 254 + { 255 + struct ocfs2_cluster_connection *conn = data; 256 + 257 + mlog(ML_NOTICE, "o2dlm has evicted node %d from group %.*s\n", 258 + node_num, conn->cc_namelen, conn->cc_name); 259 + 260 + conn->cc_recovery_handler(node_num, conn->cc_recovery_data); 261 + } 262 + 263 + static int o2cb_cluster_connect(struct ocfs2_cluster_connection *conn) 264 + { 265 + int rc = 0; 266 + u32 dlm_key; 267 + struct dlm_ctxt *dlm; 268 + struct o2dlm_private *priv; 269 + struct dlm_protocol_version dlm_version; 270 + 271 + BUG_ON(conn == NULL); 272 + BUG_ON(o2cb_stack.sp_proto == NULL); 273 + 274 + /* for now we only have one cluster/node, make sure we see it 275 + * in the heartbeat universe */ 276 + if (!o2hb_check_local_node_heartbeating()) { 277 + rc = -EINVAL; 278 + goto out; 279 + } 280 + 281 + priv = kzalloc(sizeof(struct o2dlm_private), GFP_KERNEL); 282 + if (!priv) { 283 + rc = -ENOMEM; 284 + goto out_free; 285 + } 286 + 287 + /* This just fills the structure in. It is safe to pass conn. */ 288 + dlm_setup_eviction_cb(&priv->op_eviction_cb, o2dlm_eviction_cb, 289 + conn); 290 + 291 + conn->cc_private = priv; 292 + 293 + /* used by the dlm code to make message headers unique, each 294 + * node in this domain must agree on this. 
*/ 295 + dlm_key = crc32_le(0, conn->cc_name, conn->cc_namelen); 296 + dlm_version.pv_major = conn->cc_version.pv_major; 297 + dlm_version.pv_minor = conn->cc_version.pv_minor; 298 + 299 + dlm = dlm_register_domain(conn->cc_name, dlm_key, &dlm_version); 300 + if (IS_ERR(dlm)) { 301 + rc = PTR_ERR(dlm); 302 + mlog_errno(rc); 303 + goto out_free; 304 + } 305 + 306 + conn->cc_version.pv_major = dlm_version.pv_major; 307 + conn->cc_version.pv_minor = dlm_version.pv_minor; 308 + conn->cc_lockspace = dlm; 309 + 310 + dlm_register_eviction_cb(dlm, &priv->op_eviction_cb); 311 + 312 + out_free: 313 + if (rc && conn->cc_private) 314 + kfree(conn->cc_private); 315 + 316 + out: 317 + return rc; 318 + } 319 + 320 + static int o2cb_cluster_disconnect(struct ocfs2_cluster_connection *conn, 321 + int hangup_pending) 322 + { 323 + struct dlm_ctxt *dlm = conn->cc_lockspace; 324 + struct o2dlm_private *priv = conn->cc_private; 325 + 326 + dlm_unregister_eviction_cb(&priv->op_eviction_cb); 327 + conn->cc_private = NULL; 328 + kfree(priv); 329 + 330 + dlm_unregister_domain(dlm); 331 + conn->cc_lockspace = NULL; 332 + 333 + return 0; 334 + } 335 + 336 + static void o2hb_stop(const char *group) 337 + { 338 + int ret; 339 + char *argv[5], *envp[3]; 340 + 341 + argv[0] = (char *)o2nm_get_hb_ctl_path(); 342 + argv[1] = "-K"; 343 + argv[2] = "-u"; 344 + argv[3] = (char *)group; 345 + argv[4] = NULL; 346 + 347 + mlog(0, "Run: %s %s %s %s\n", argv[0], argv[1], argv[2], argv[3]); 348 + 349 + /* minimal command environment taken from cpu_run_sbin_hotplug */ 350 + envp[0] = "HOME=/"; 351 + envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin"; 352 + envp[2] = NULL; 353 + 354 + ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC); 355 + if (ret < 0) 356 + mlog_errno(ret); 357 + } 358 + 359 + /* 360 + * Hangup is a hack for tools compatibility. Older ocfs2-tools software 361 + * expects the filesystem to call "ocfs2_hb_ctl" during unmount. 
This 362 + * happens regardless of whether the DLM got started, so we can't do it 363 + * in ocfs2_cluster_disconnect(). We bring the o2hb_stop() function into 364 + * the glue and provide a "hangup" API for super.c to call. 365 + * 366 + * Other stacks will eventually provide a NULL ->hangup() pointer. 367 + */ 368 + static void o2cb_cluster_hangup(const char *group, int grouplen) 369 + { 370 + o2hb_stop(group); 371 + } 372 + 373 + static int o2cb_cluster_this_node(unsigned int *node) 374 + { 375 + int node_num; 376 + 377 + node_num = o2nm_this_node(); 378 + if (node_num == O2NM_INVALID_NODE_NUM) 379 + return -ENOENT; 380 + 381 + if (node_num >= O2NM_MAX_NODES) 382 + return -EOVERFLOW; 383 + 384 + *node = node_num; 385 + return 0; 386 + } 387 + 388 + struct ocfs2_stack_operations o2cb_stack_ops = { 389 + .connect = o2cb_cluster_connect, 390 + .disconnect = o2cb_cluster_disconnect, 391 + .hangup = o2cb_cluster_hangup, 392 + .this_node = o2cb_cluster_this_node, 393 + .dlm_lock = o2cb_dlm_lock, 394 + .dlm_unlock = o2cb_dlm_unlock, 395 + .lock_status = o2cb_dlm_lock_status, 396 + .lock_lvb = o2cb_dlm_lvb, 397 + .dump_lksb = o2cb_dump_lksb, 398 + }; 399 + 400 + static struct ocfs2_stack_plugin o2cb_stack = { 401 + .sp_name = "o2cb", 402 + .sp_ops = &o2cb_stack_ops, 403 + .sp_owner = THIS_MODULE, 404 + }; 405 + 406 + static int __init o2cb_stack_init(void) 407 + { 408 + return ocfs2_stack_glue_register(&o2cb_stack); 409 + } 410 + 411 + static void __exit o2cb_stack_exit(void) 412 + { 413 + ocfs2_stack_glue_unregister(&o2cb_stack); 414 + } 415 + 416 + MODULE_AUTHOR("Oracle"); 417 + MODULE_DESCRIPTION("ocfs2 driver for the classic o2cb stack"); 418 + MODULE_LICENSE("GPL"); 419 + module_init(o2cb_stack_init); 420 + module_exit(o2cb_stack_exit);
+883
fs/ocfs2/stack_user.c
··· 1 + /* -*- mode: c; c-basic-offset: 8; -*- 2 + * vim: noexpandtab sw=8 ts=8 sts=0: 3 + * 4 + * stack_user.c 5 + * 6 + * Code which interfaces ocfs2 with fs/dlm and a userspace stack. 7 + * 8 + * Copyright (C) 2007 Oracle. All rights reserved. 9 + * 10 + * This program is free software; you can redistribute it and/or 11 + * modify it under the terms of the GNU General Public 12 + * License as published by the Free Software Foundation, version 2. 13 + * 14 + * This program is distributed in the hope that it will be useful, 15 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 16 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 17 + * General Public License for more details. 18 + */ 19 + 20 + #include <linux/module.h> 21 + #include <linux/fs.h> 22 + #include <linux/miscdevice.h> 23 + #include <linux/mutex.h> 24 + #include <linux/reboot.h> 25 + #include <asm/uaccess.h> 26 + 27 + #include "ocfs2.h" /* For struct ocfs2_lock_res */ 28 + #include "stackglue.h" 29 + 30 + 31 + /* 32 + * The control protocol starts with a handshake. Until the handshake 33 + * is complete, the control device will fail all write(2)s. 34 + * 35 + * The handshake is simple. First, the client reads until EOF. Each line 36 + * of output is a supported protocol tag. All protocol tags are a single 37 + * character followed by a two hex digit version number. Currently the 38 + * only things supported is T01, for "Text-base version 0x01". Next, the 39 + * client writes the version they would like to use, including the newline. 40 + * Thus, the protocol tag is 'T01\n'. If the version tag written is 41 + * unknown, -EINVAL is returned. Once the negotiation is complete, the 42 + * client can start sending messages. 43 + * 44 + * The T01 protocol has three messages. First is the "SETN" message. 45 + * It has the following syntax: 46 + * 47 + * SETN<space><8-char-hex-nodenum><newline> 48 + * 49 + * This is 14 characters. 
50 + * 51 + * The "SETN" message must be the first message following the protocol. 52 + * It tells ocfs2_control the local node number. 53 + * 54 + * Next comes the "SETV" message. It has the following syntax: 55 + * 56 + * SETV<space><2-char-hex-major><space><2-char-hex-minor><newline> 57 + * 58 + * This is 11 characters. 59 + * 60 + * The "SETV" message sets the filesystem locking protocol version as 61 + * negotiated by the client. The client negotiates based on the maximum 62 + * version advertised in /sys/fs/ocfs2/max_locking_protocol. The major 63 + * number from the "SETV" message must match 64 + * user_stack.sp_proto->lp_max_version.pv_major, and the minor number 65 + * must be less than or equal to ...->lp_max_version.pv_minor. 66 + * 67 + * Once this information has been set, mounts will be allowed. From this 68 + * point on, the "DOWN" message can be sent for node down notification. 69 + * It has the following syntax: 70 + * 71 + * DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline> 72 + * 73 + * eg: 74 + * 75 + * DOWN 632A924FDD844190BDA93C0DF6B94899 00000001\n 76 + * 77 + * This is 47 characters. 78 + */ 79 + 80 + /* 81 + * Whether or not the client has done the handshake. 82 + * For now, we have just one protocol version. 
83 + */ 84 + #define OCFS2_CONTROL_PROTO "T01\n" 85 + #define OCFS2_CONTROL_PROTO_LEN 4 86 + 87 + /* Handshake states */ 88 + #define OCFS2_CONTROL_HANDSHAKE_INVALID (0) 89 + #define OCFS2_CONTROL_HANDSHAKE_READ (1) 90 + #define OCFS2_CONTROL_HANDSHAKE_PROTOCOL (2) 91 + #define OCFS2_CONTROL_HANDSHAKE_VALID (3) 92 + 93 + /* Messages */ 94 + #define OCFS2_CONTROL_MESSAGE_OP_LEN 4 95 + #define OCFS2_CONTROL_MESSAGE_SETNODE_OP "SETN" 96 + #define OCFS2_CONTROL_MESSAGE_SETNODE_TOTAL_LEN 14 97 + #define OCFS2_CONTROL_MESSAGE_SETVERSION_OP "SETV" 98 + #define OCFS2_CONTROL_MESSAGE_SETVERSION_TOTAL_LEN 11 99 + #define OCFS2_CONTROL_MESSAGE_DOWN_OP "DOWN" 100 + #define OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN 47 101 + #define OCFS2_TEXT_UUID_LEN 32 102 + #define OCFS2_CONTROL_MESSAGE_VERNUM_LEN 2 103 + #define OCFS2_CONTROL_MESSAGE_NODENUM_LEN 8 104 + 105 + /* 106 + * ocfs2_live_connection is refcounted because the filesystem and 107 + * miscdevice sides can detach in different order. Let's just be safe. 
108 + */ 109 + struct ocfs2_live_connection { 110 + struct list_head oc_list; 111 + struct ocfs2_cluster_connection *oc_conn; 112 + }; 113 + 114 + struct ocfs2_control_private { 115 + struct list_head op_list; 116 + int op_state; 117 + int op_this_node; 118 + struct ocfs2_protocol_version op_proto; 119 + }; 120 + 121 + /* SETN<space><8-char-hex-nodenum><newline> */ 122 + struct ocfs2_control_message_setn { 123 + char tag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 124 + char space; 125 + char nodestr[OCFS2_CONTROL_MESSAGE_NODENUM_LEN]; 126 + char newline; 127 + }; 128 + 129 + /* SETV<space><2-char-hex-major><space><2-char-hex-minor><newline> */ 130 + struct ocfs2_control_message_setv { 131 + char tag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 132 + char space1; 133 + char major[OCFS2_CONTROL_MESSAGE_VERNUM_LEN]; 134 + char space2; 135 + char minor[OCFS2_CONTROL_MESSAGE_VERNUM_LEN]; 136 + char newline; 137 + }; 138 + 139 + /* DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline> */ 140 + struct ocfs2_control_message_down { 141 + char tag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 142 + char space1; 143 + char uuid[OCFS2_TEXT_UUID_LEN]; 144 + char space2; 145 + char nodestr[OCFS2_CONTROL_MESSAGE_NODENUM_LEN]; 146 + char newline; 147 + }; 148 + 149 + union ocfs2_control_message { 150 + char tag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 151 + struct ocfs2_control_message_setn u_setn; 152 + struct ocfs2_control_message_setv u_setv; 153 + struct ocfs2_control_message_down u_down; 154 + }; 155 + 156 + static struct ocfs2_stack_plugin user_stack; 157 + 158 + static atomic_t ocfs2_control_opened; 159 + static int ocfs2_control_this_node = -1; 160 + static struct ocfs2_protocol_version running_proto; 161 + 162 + static LIST_HEAD(ocfs2_live_connection_list); 163 + static LIST_HEAD(ocfs2_control_private_list); 164 + static DEFINE_MUTEX(ocfs2_control_lock); 165 + 166 + static inline void ocfs2_control_set_handshake_state(struct file *file, 167 + int state) 168 + { 169 + struct ocfs2_control_private *p = 
file->private_data; 170 + p->op_state = state; 171 + } 172 + 173 + static inline int ocfs2_control_get_handshake_state(struct file *file) 174 + { 175 + struct ocfs2_control_private *p = file->private_data; 176 + return p->op_state; 177 + } 178 + 179 + static struct ocfs2_live_connection *ocfs2_connection_find(const char *name) 180 + { 181 + size_t len = strlen(name); 182 + struct ocfs2_live_connection *c; 183 + 184 + BUG_ON(!mutex_is_locked(&ocfs2_control_lock)); 185 + 186 + list_for_each_entry(c, &ocfs2_live_connection_list, oc_list) { 187 + if ((c->oc_conn->cc_namelen == len) && 188 + !strncmp(c->oc_conn->cc_name, name, len)) 189 + return c; 190 + } 191 + 192 + return c; 193 + } 194 + 195 + /* 196 + * ocfs2_live_connection structures are created underneath the ocfs2 197 + * mount path. Since the VFS prevents multiple calls to 198 + * fill_super(), we can't get dupes here. 199 + */ 200 + static int ocfs2_live_connection_new(struct ocfs2_cluster_connection *conn, 201 + struct ocfs2_live_connection **c_ret) 202 + { 203 + int rc = 0; 204 + struct ocfs2_live_connection *c; 205 + 206 + c = kzalloc(sizeof(struct ocfs2_live_connection), GFP_KERNEL); 207 + if (!c) 208 + return -ENOMEM; 209 + 210 + mutex_lock(&ocfs2_control_lock); 211 + c->oc_conn = conn; 212 + 213 + if (atomic_read(&ocfs2_control_opened)) 214 + list_add(&c->oc_list, &ocfs2_live_connection_list); 215 + else { 216 + printk(KERN_ERR 217 + "ocfs2: Userspace control daemon is not present\n"); 218 + rc = -ESRCH; 219 + } 220 + 221 + mutex_unlock(&ocfs2_control_lock); 222 + 223 + if (!rc) 224 + *c_ret = c; 225 + else 226 + kfree(c); 227 + 228 + return rc; 229 + } 230 + 231 + /* 232 + * This function disconnects the cluster connection from ocfs2_control. 233 + * Afterwards, userspace can't affect the cluster connection. 
234 + */ 235 + static void ocfs2_live_connection_drop(struct ocfs2_live_connection *c) 236 + { 237 + mutex_lock(&ocfs2_control_lock); 238 + list_del_init(&c->oc_list); 239 + c->oc_conn = NULL; 240 + mutex_unlock(&ocfs2_control_lock); 241 + 242 + kfree(c); 243 + } 244 + 245 + static int ocfs2_control_cfu(void *target, size_t target_len, 246 + const char __user *buf, size_t count) 247 + { 248 + /* The T01 expects write(2) calls to have exactly one command */ 249 + if ((count != target_len) || 250 + (count > sizeof(union ocfs2_control_message))) 251 + return -EINVAL; 252 + 253 + if (copy_from_user(target, buf, target_len)) 254 + return -EFAULT; 255 + 256 + return 0; 257 + } 258 + 259 + static ssize_t ocfs2_control_validate_protocol(struct file *file, 260 + const char __user *buf, 261 + size_t count) 262 + { 263 + ssize_t ret; 264 + char kbuf[OCFS2_CONTROL_PROTO_LEN]; 265 + 266 + ret = ocfs2_control_cfu(kbuf, OCFS2_CONTROL_PROTO_LEN, 267 + buf, count); 268 + if (ret) 269 + return ret; 270 + 271 + if (strncmp(kbuf, OCFS2_CONTROL_PROTO, OCFS2_CONTROL_PROTO_LEN)) 272 + return -EINVAL; 273 + 274 + ocfs2_control_set_handshake_state(file, 275 + OCFS2_CONTROL_HANDSHAKE_PROTOCOL); 276 + 277 + return count; 278 + } 279 + 280 + static void ocfs2_control_send_down(const char *uuid, 281 + int nodenum) 282 + { 283 + struct ocfs2_live_connection *c; 284 + 285 + mutex_lock(&ocfs2_control_lock); 286 + 287 + c = ocfs2_connection_find(uuid); 288 + if (c) { 289 + BUG_ON(c->oc_conn == NULL); 290 + c->oc_conn->cc_recovery_handler(nodenum, 291 + c->oc_conn->cc_recovery_data); 292 + } 293 + 294 + mutex_unlock(&ocfs2_control_lock); 295 + } 296 + 297 + /* 298 + * Called whenever configuration elements are sent to /dev/ocfs2_control. 299 + * If all configuration elements are present, try to set the global 300 + * values. If there is a problem, return an error. Skip any missing 301 + * elements, and only bump ocfs2_control_opened when we have all elements 302 + * and are successful. 
303 + */ 304 + static int ocfs2_control_install_private(struct file *file) 305 + { 306 + int rc = 0; 307 + int set_p = 1; 308 + struct ocfs2_control_private *p = file->private_data; 309 + 310 + BUG_ON(p->op_state != OCFS2_CONTROL_HANDSHAKE_PROTOCOL); 311 + 312 + mutex_lock(&ocfs2_control_lock); 313 + 314 + if (p->op_this_node < 0) { 315 + set_p = 0; 316 + } else if ((ocfs2_control_this_node >= 0) && 317 + (ocfs2_control_this_node != p->op_this_node)) { 318 + rc = -EINVAL; 319 + goto out_unlock; 320 + } 321 + 322 + if (!p->op_proto.pv_major) { 323 + set_p = 0; 324 + } else if (!list_empty(&ocfs2_live_connection_list) && 325 + ((running_proto.pv_major != p->op_proto.pv_major) || 326 + (running_proto.pv_minor != p->op_proto.pv_minor))) { 327 + rc = -EINVAL; 328 + goto out_unlock; 329 + } 330 + 331 + if (set_p) { 332 + ocfs2_control_this_node = p->op_this_node; 333 + running_proto.pv_major = p->op_proto.pv_major; 334 + running_proto.pv_minor = p->op_proto.pv_minor; 335 + } 336 + 337 + out_unlock: 338 + mutex_unlock(&ocfs2_control_lock); 339 + 340 + if (!rc && set_p) { 341 + /* We set the global values successfully */ 342 + atomic_inc(&ocfs2_control_opened); 343 + ocfs2_control_set_handshake_state(file, 344 + OCFS2_CONTROL_HANDSHAKE_VALID); 345 + } 346 + 347 + return rc; 348 + } 349 + 350 + static int ocfs2_control_get_this_node(void) 351 + { 352 + int rc; 353 + 354 + mutex_lock(&ocfs2_control_lock); 355 + if (ocfs2_control_this_node < 0) 356 + rc = -EINVAL; 357 + else 358 + rc = ocfs2_control_this_node; 359 + mutex_unlock(&ocfs2_control_lock); 360 + 361 + return rc; 362 + } 363 + 364 + static int ocfs2_control_do_setnode_msg(struct file *file, 365 + struct ocfs2_control_message_setn *msg) 366 + { 367 + long nodenum; 368 + char *ptr = NULL; 369 + struct ocfs2_control_private *p = file->private_data; 370 + 371 + if (ocfs2_control_get_handshake_state(file) != 372 + OCFS2_CONTROL_HANDSHAKE_PROTOCOL) 373 + return -EINVAL; 374 + 375 + if (strncmp(msg->tag, 
OCFS2_CONTROL_MESSAGE_SETNODE_OP, 376 + OCFS2_CONTROL_MESSAGE_OP_LEN)) 377 + return -EINVAL; 378 + 379 + if ((msg->space != ' ') || (msg->newline != '\n')) 380 + return -EINVAL; 381 + msg->space = msg->newline = '\0'; 382 + 383 + nodenum = simple_strtol(msg->nodestr, &ptr, 16); 384 + if (!ptr || *ptr) 385 + return -EINVAL; 386 + 387 + if ((nodenum == LONG_MIN) || (nodenum == LONG_MAX) || 388 + (nodenum > INT_MAX) || (nodenum < 0)) 389 + return -ERANGE; 390 + p->op_this_node = nodenum; 391 + 392 + return ocfs2_control_install_private(file); 393 + } 394 + 395 + static int ocfs2_control_do_setversion_msg(struct file *file, 396 + struct ocfs2_control_message_setv *msg) 397 + { 398 + long major, minor; 399 + char *ptr = NULL; 400 + struct ocfs2_control_private *p = file->private_data; 401 + struct ocfs2_protocol_version *max = 402 + &user_stack.sp_proto->lp_max_version; 403 + 404 + if (ocfs2_control_get_handshake_state(file) != 405 + OCFS2_CONTROL_HANDSHAKE_PROTOCOL) 406 + return -EINVAL; 407 + 408 + if (strncmp(msg->tag, OCFS2_CONTROL_MESSAGE_SETVERSION_OP, 409 + OCFS2_CONTROL_MESSAGE_OP_LEN)) 410 + return -EINVAL; 411 + 412 + if ((msg->space1 != ' ') || (msg->space2 != ' ') || 413 + (msg->newline != '\n')) 414 + return -EINVAL; 415 + msg->space1 = msg->space2 = msg->newline = '\0'; 416 + 417 + major = simple_strtol(msg->major, &ptr, 16); 418 + if (!ptr || *ptr) 419 + return -EINVAL; 420 + minor = simple_strtol(msg->minor, &ptr, 16); 421 + if (!ptr || *ptr) 422 + return -EINVAL; 423 + 424 + /* 425 + * The major must be between 1 and 255, inclusive. The minor 426 + * must be between 0 and 255, inclusive. The version passed in 427 + * must be within the maximum version supported by the filesystem. 
428 + */ 429 + if ((major == LONG_MIN) || (major == LONG_MAX) || 430 + (major > (u8)-1) || (major < 1)) 431 + return -ERANGE; 432 + if ((minor == LONG_MIN) || (minor == LONG_MAX) || 433 + (minor > (u8)-1) || (minor < 0)) 434 + return -ERANGE; 435 + if ((major != max->pv_major) || 436 + (minor > max->pv_minor)) 437 + return -EINVAL; 438 + 439 + p->op_proto.pv_major = major; 440 + p->op_proto.pv_minor = minor; 441 + 442 + return ocfs2_control_install_private(file); 443 + } 444 + 445 + static int ocfs2_control_do_down_msg(struct file *file, 446 + struct ocfs2_control_message_down *msg) 447 + { 448 + long nodenum; 449 + char *p = NULL; 450 + 451 + if (ocfs2_control_get_handshake_state(file) != 452 + OCFS2_CONTROL_HANDSHAKE_VALID) 453 + return -EINVAL; 454 + 455 + if (strncmp(msg->tag, OCFS2_CONTROL_MESSAGE_DOWN_OP, 456 + OCFS2_CONTROL_MESSAGE_OP_LEN)) 457 + return -EINVAL; 458 + 459 + if ((msg->space1 != ' ') || (msg->space2 != ' ') || 460 + (msg->newline != '\n')) 461 + return -EINVAL; 462 + msg->space1 = msg->space2 = msg->newline = '\0'; 463 + 464 + nodenum = simple_strtol(msg->nodestr, &p, 16); 465 + if (!p || *p) 466 + return -EINVAL; 467 + 468 + if ((nodenum == LONG_MIN) || (nodenum == LONG_MAX) || 469 + (nodenum > INT_MAX) || (nodenum < 0)) 470 + return -ERANGE; 471 + 472 + ocfs2_control_send_down(msg->uuid, nodenum); 473 + 474 + return 0; 475 + } 476 + 477 + static ssize_t ocfs2_control_message(struct file *file, 478 + const char __user *buf, 479 + size_t count) 480 + { 481 + ssize_t ret; 482 + union ocfs2_control_message msg; 483 + 484 + /* Try to catch padding issues */ 485 + WARN_ON(offsetof(struct ocfs2_control_message_down, uuid) != 486 + (sizeof(msg.u_down.tag) + sizeof(msg.u_down.space1))); 487 + 488 + memset(&msg, 0, sizeof(union ocfs2_control_message)); 489 + ret = ocfs2_control_cfu(&msg, count, buf, count); 490 + if (ret) 491 + goto out; 492 + 493 + if ((count == OCFS2_CONTROL_MESSAGE_SETNODE_TOTAL_LEN) && 494 + !strncmp(msg.tag, 
OCFS2_CONTROL_MESSAGE_SETNODE_OP, 495 + OCFS2_CONTROL_MESSAGE_OP_LEN)) 496 + ret = ocfs2_control_do_setnode_msg(file, &msg.u_setn); 497 + else if ((count == OCFS2_CONTROL_MESSAGE_SETVERSION_TOTAL_LEN) && 498 + !strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_SETVERSION_OP, 499 + OCFS2_CONTROL_MESSAGE_OP_LEN)) 500 + ret = ocfs2_control_do_setversion_msg(file, &msg.u_setv); 501 + else if ((count == OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN) && 502 + !strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_DOWN_OP, 503 + OCFS2_CONTROL_MESSAGE_OP_LEN)) 504 + ret = ocfs2_control_do_down_msg(file, &msg.u_down); 505 + else 506 + ret = -EINVAL; 507 + 508 + out: 509 + return ret ? ret : count; 510 + } 511 + 512 + static ssize_t ocfs2_control_write(struct file *file, 513 + const char __user *buf, 514 + size_t count, 515 + loff_t *ppos) 516 + { 517 + ssize_t ret; 518 + 519 + switch (ocfs2_control_get_handshake_state(file)) { 520 + case OCFS2_CONTROL_HANDSHAKE_INVALID: 521 + ret = -EINVAL; 522 + break; 523 + 524 + case OCFS2_CONTROL_HANDSHAKE_READ: 525 + ret = ocfs2_control_validate_protocol(file, buf, 526 + count); 527 + break; 528 + 529 + case OCFS2_CONTROL_HANDSHAKE_PROTOCOL: 530 + case OCFS2_CONTROL_HANDSHAKE_VALID: 531 + ret = ocfs2_control_message(file, buf, count); 532 + break; 533 + 534 + default: 535 + BUG(); 536 + ret = -EIO; 537 + break; 538 + } 539 + 540 + return ret; 541 + } 542 + 543 + /* 544 + * This is a naive version. If we ever have a new protocol, we'll expand 545 + * it. Probably using seq_file. 
546 + */ 547 + static ssize_t ocfs2_control_read(struct file *file, 548 + char __user *buf, 549 + size_t count, 550 + loff_t *ppos) 551 + { 552 + char *proto_string = OCFS2_CONTROL_PROTO; 553 + size_t to_write = 0; 554 + 555 + if (*ppos >= OCFS2_CONTROL_PROTO_LEN) 556 + return 0; 557 + 558 + to_write = OCFS2_CONTROL_PROTO_LEN - *ppos; 559 + if (to_write > count) 560 + to_write = count; 561 + if (copy_to_user(buf, proto_string + *ppos, to_write)) 562 + return -EFAULT; 563 + 564 + *ppos += to_write; 565 + 566 + /* Have we read the whole protocol list? */ 567 + if (*ppos >= OCFS2_CONTROL_PROTO_LEN) 568 + ocfs2_control_set_handshake_state(file, 569 + OCFS2_CONTROL_HANDSHAKE_READ); 570 + 571 + return to_write; 572 + } 573 + 574 + static int ocfs2_control_release(struct inode *inode, struct file *file) 575 + { 576 + struct ocfs2_control_private *p = file->private_data; 577 + 578 + mutex_lock(&ocfs2_control_lock); 579 + 580 + if (ocfs2_control_get_handshake_state(file) != 581 + OCFS2_CONTROL_HANDSHAKE_VALID) 582 + goto out; 583 + 584 + if (atomic_dec_and_test(&ocfs2_control_opened)) { 585 + if (!list_empty(&ocfs2_live_connection_list)) { 586 + /* XXX: Do bad things! 
*/ 587 + printk(KERN_ERR 588 + "ocfs2: Unexpected release of ocfs2_control!\n" 589 + " Loss of cluster connection requires " 590 + "an emergency restart!\n"); 591 + emergency_restart(); 592 + } 593 + /* 594 + * Last valid close clears the node number and resets 595 + * the locking protocol version 596 + */ 597 + ocfs2_control_this_node = -1; 598 + running_proto.pv_major = 0; 599 + running_proto.pv_minor = 0; 600 + } 601 + 602 + out: 603 + list_del_init(&p->op_list); 604 + file->private_data = NULL; 605 + 606 + mutex_unlock(&ocfs2_control_lock); 607 + 608 + kfree(p); 609 + 610 + return 0; 611 + } 612 + 613 + static int ocfs2_control_open(struct inode *inode, struct file *file) 614 + { 615 + struct ocfs2_control_private *p; 616 + 617 + p = kzalloc(sizeof(struct ocfs2_control_private), GFP_KERNEL); 618 + if (!p) 619 + return -ENOMEM; 620 + p->op_this_node = -1; 621 + 622 + mutex_lock(&ocfs2_control_lock); 623 + file->private_data = p; 624 + list_add(&p->op_list, &ocfs2_control_private_list); 625 + mutex_unlock(&ocfs2_control_lock); 626 + 627 + return 0; 628 + } 629 + 630 + static const struct file_operations ocfs2_control_fops = { 631 + .open = ocfs2_control_open, 632 + .release = ocfs2_control_release, 633 + .read = ocfs2_control_read, 634 + .write = ocfs2_control_write, 635 + .owner = THIS_MODULE, 636 + }; 637 + 638 + struct miscdevice ocfs2_control_device = { 639 + .minor = MISC_DYNAMIC_MINOR, 640 + .name = "ocfs2_control", 641 + .fops = &ocfs2_control_fops, 642 + }; 643 + 644 + static int ocfs2_control_init(void) 645 + { 646 + int rc; 647 + 648 + atomic_set(&ocfs2_control_opened, 0); 649 + 650 + rc = misc_register(&ocfs2_control_device); 651 + if (rc) 652 + printk(KERN_ERR 653 + "ocfs2: Unable to register ocfs2_control device " 654 + "(errno %d)\n", 655 + -rc); 656 + 657 + return rc; 658 + } 659 + 660 + static void ocfs2_control_exit(void) 661 + { 662 + int rc; 663 + 664 + rc = misc_deregister(&ocfs2_control_device); 665 + if (rc) 666 + printk(KERN_ERR 667 +
"ocfs2: Unable to deregister ocfs2_control device " 668 + "(errno %d)\n", 669 + -rc); 670 + } 671 + 672 + static struct dlm_lksb *fsdlm_astarg_to_lksb(void *astarg) 673 + { 674 + struct ocfs2_lock_res *res = astarg; 675 + return &res->l_lksb.lksb_fsdlm; 676 + } 677 + 678 + static void fsdlm_lock_ast_wrapper(void *astarg) 679 + { 680 + struct dlm_lksb *lksb = fsdlm_astarg_to_lksb(astarg); 681 + int status = lksb->sb_status; 682 + 683 + BUG_ON(user_stack.sp_proto == NULL); 684 + 685 + /* 686 + * For now we're punting on the issue of other non-standard errors 687 + * where we can't tell if the unlock_ast or lock_ast should be called. 688 + * The main "other error" that's possible is EINVAL which means the 689 + * function was called with invalid args, which shouldn't be possible 690 + * since the caller here is under our control. Other non-standard 691 + * errors probably fall into the same category, or otherwise are fatal 692 + * which means we can't carry on anyway. 693 + */ 694 + 695 + if (status == -DLM_EUNLOCK || status == -DLM_ECANCEL) 696 + user_stack.sp_proto->lp_unlock_ast(astarg, 0); 697 + else 698 + user_stack.sp_proto->lp_lock_ast(astarg); 699 + } 700 + 701 + static void fsdlm_blocking_ast_wrapper(void *astarg, int level) 702 + { 703 + BUG_ON(user_stack.sp_proto == NULL); 704 + 705 + user_stack.sp_proto->lp_blocking_ast(astarg, level); 706 + } 707 + 708 + static int user_dlm_lock(struct ocfs2_cluster_connection *conn, 709 + int mode, 710 + union ocfs2_dlm_lksb *lksb, 711 + u32 flags, 712 + void *name, 713 + unsigned int namelen, 714 + void *astarg) 715 + { 716 + int ret; 717 + 718 + if (!lksb->lksb_fsdlm.sb_lvbptr) 719 + lksb->lksb_fsdlm.sb_lvbptr = (char *)lksb + 720 + sizeof(struct dlm_lksb); 721 + 722 + ret = dlm_lock(conn->cc_lockspace, mode, &lksb->lksb_fsdlm, 723 + flags|DLM_LKF_NODLCKWT, name, namelen, 0, 724 + fsdlm_lock_ast_wrapper, astarg, 725 + fsdlm_blocking_ast_wrapper); 726 + return ret; 727 + } 728 + 729 + static int user_dlm_unlock(struct 
ocfs2_cluster_connection *conn, 730 + union ocfs2_dlm_lksb *lksb, 731 + u32 flags, 732 + void *astarg) 733 + { 734 + int ret; 735 + 736 + ret = dlm_unlock(conn->cc_lockspace, lksb->lksb_fsdlm.sb_lkid, 737 + flags, &lksb->lksb_fsdlm, astarg); 738 + return ret; 739 + } 740 + 741 + static int user_dlm_lock_status(union ocfs2_dlm_lksb *lksb) 742 + { 743 + return lksb->lksb_fsdlm.sb_status; 744 + } 745 + 746 + static void *user_dlm_lvb(union ocfs2_dlm_lksb *lksb) 747 + { 748 + return (void *)(lksb->lksb_fsdlm.sb_lvbptr); 749 + } 750 + 751 + static void user_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb) 752 + { 753 + } 754 + 755 + /* 756 + * Compare a requested locking protocol version against the current one. 757 + * 758 + * If the major numbers are different, they are incompatible. 759 + * If the current minor is greater than the request, they are incompatible. 760 + * If the current minor is less than or equal to the request, they are 761 + * compatible, and the requester should run at the current minor version. 762 + */ 763 + static int fs_protocol_compare(struct ocfs2_protocol_version *existing, 764 + struct ocfs2_protocol_version *request) 765 + { 766 + if (existing->pv_major != request->pv_major) 767 + return 1; 768 + 769 + if (existing->pv_minor > request->pv_minor) 770 + return 1; 771 + 772 + if (existing->pv_minor < request->pv_minor) 773 + request->pv_minor = existing->pv_minor; 774 + 775 + return 0; 776 + } 777 + 778 + static int user_cluster_connect(struct ocfs2_cluster_connection *conn) 779 + { 780 + dlm_lockspace_t *fsdlm; 781 + struct ocfs2_live_connection *control; 782 + int rc = 0; 783 + 784 + BUG_ON(conn == NULL); 785 + 786 + rc = ocfs2_live_connection_new(conn, &control); 787 + if (rc) 788 + goto out; 789 + 790 + /* 791 + * running_proto must have been set before we allowed any mounts 792 + * to proceed. 
793 + */ 794 + if (fs_protocol_compare(&running_proto, &conn->cc_version)) { 795 + printk(KERN_ERR 796 + "Unable to mount with fs locking protocol version " 797 + "%u.%u because the userspace control daemon has " 798 + "negotiated %u.%u\n", 799 + conn->cc_version.pv_major, conn->cc_version.pv_minor, 800 + running_proto.pv_major, running_proto.pv_minor); 801 + rc = -EPROTO; 802 + ocfs2_live_connection_drop(control); 803 + goto out; 804 + } 805 + 806 + rc = dlm_new_lockspace(conn->cc_name, strlen(conn->cc_name), 807 + &fsdlm, DLM_LSFL_FS, DLM_LVB_LEN); 808 + if (rc) { 809 + ocfs2_live_connection_drop(control); 810 + goto out; 811 + } 812 + 813 + conn->cc_private = control; 814 + conn->cc_lockspace = fsdlm; 815 + out: 816 + return rc; 817 + } 818 + 819 + static int user_cluster_disconnect(struct ocfs2_cluster_connection *conn, 820 + int hangup_pending) 821 + { 822 + dlm_release_lockspace(conn->cc_lockspace, 2); 823 + conn->cc_lockspace = NULL; 824 + ocfs2_live_connection_drop(conn->cc_private); 825 + conn->cc_private = NULL; 826 + return 0; 827 + } 828 + 829 + static int user_cluster_this_node(unsigned int *this_node) 830 + { 831 + int rc; 832 + 833 + rc = ocfs2_control_get_this_node(); 834 + if (rc < 0) 835 + return rc; 836 + 837 + *this_node = rc; 838 + return 0; 839 + } 840 + 841 + static struct ocfs2_stack_operations user_stack_ops = { 842 + .connect = user_cluster_connect, 843 + .disconnect = user_cluster_disconnect, 844 + .this_node = user_cluster_this_node, 845 + .dlm_lock = user_dlm_lock, 846 + .dlm_unlock = user_dlm_unlock, 847 + .lock_status = user_dlm_lock_status, 848 + .lock_lvb = user_dlm_lvb, 849 + .dump_lksb = user_dlm_dump_lksb, 850 + }; 851 + 852 + static struct ocfs2_stack_plugin user_stack = { 853 + .sp_name = "user", 854 + .sp_ops = &user_stack_ops, 855 + .sp_owner = THIS_MODULE, 856 + }; 857 + 858 + 859 + static int __init user_stack_init(void) 860 + { 861 + int rc; 862 + 863 + rc = ocfs2_control_init(); 864 + if (!rc) { 865 + rc = 
ocfs2_stack_glue_register(&user_stack); 866 + if (rc) 867 + ocfs2_control_exit(); 868 + } 869 + 870 + return rc; 871 + } 872 + 873 + static void __exit user_stack_exit(void) 874 + { 875 + ocfs2_stack_glue_unregister(&user_stack); 876 + ocfs2_control_exit(); 877 + } 878 + 879 + MODULE_AUTHOR("Oracle"); 880 + MODULE_DESCRIPTION("ocfs2 driver for userspace cluster stacks"); 881 + MODULE_LICENSE("GPL"); 882 + module_init(user_stack_init); 883 + module_exit(user_stack_exit);
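The version negotiation in fs_protocol_compare() above is small enough to exercise in isolation. The sketch below mirrors its logic as standalone userspace C (the struct and function names here are local stand-ins, not the kernel's symbols): majors must match exactly, and a requester asking for a newer minor than the running one is silently negotiated down.

```c
#include <assert.h>

/* Userspace stand-in for struct ocfs2_protocol_version. */
struct proto_version {
	unsigned char pv_major;
	unsigned char pv_minor;
};

/*
 * Mirrors fs_protocol_compare() from stack_user.c: returns nonzero
 * when the versions are incompatible; on success the request may be
 * lowered to the existing minor so the caller runs at that version.
 */
static int proto_compare(struct proto_version *existing,
			 struct proto_version *request)
{
	if (existing->pv_major != request->pv_major)
		return 1;		/* different majors never mix */

	if (existing->pv_minor > request->pv_minor)
		return 1;		/* running newer than requested */

	if (existing->pv_minor < request->pv_minor)
		request->pv_minor = existing->pv_minor;	/* negotiate down */

	return 0;
}
```

For example, a control daemon running 1.0 against a filesystem requesting 1.2 succeeds with the mount running at 1.0, while a 2.x daemon against a 1.x request fails outright.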
+568
fs/ocfs2/stackglue.c
··· 1 + /* -*- mode: c; c-basic-offset: 8; -*- 2 + * vim: noexpandtab sw=8 ts=8 sts=0: 3 + * 4 + * stackglue.c 5 + * 6 + * Code which implements an OCFS2 specific interface to underlying 7 + * cluster stacks. 8 + * 9 + * Copyright (C) 2007 Oracle. All rights reserved. 10 + * 11 + * This program is free software; you can redistribute it and/or 12 + * modify it under the terms of the GNU General Public 13 + * License as published by the Free Software Foundation, version 2. 14 + * 15 + * This program is distributed in the hope that it will be useful, 16 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 17 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 18 + * General Public License for more details. 19 + */ 20 + 21 + #include <linux/list.h> 22 + #include <linux/spinlock.h> 23 + #include <linux/module.h> 24 + #include <linux/slab.h> 25 + #include <linux/kmod.h> 26 + #include <linux/fs.h> 27 + #include <linux/kobject.h> 28 + #include <linux/sysfs.h> 29 + 30 + #include "ocfs2_fs.h" 31 + 32 + #include "stackglue.h" 33 + 34 + #define OCFS2_STACK_PLUGIN_O2CB "o2cb" 35 + #define OCFS2_STACK_PLUGIN_USER "user" 36 + 37 + static struct ocfs2_locking_protocol *lproto; 38 + static DEFINE_SPINLOCK(ocfs2_stack_lock); 39 + static LIST_HEAD(ocfs2_stack_list); 40 + static char cluster_stack_name[OCFS2_STACK_LABEL_LEN + 1]; 41 + 42 + /* 43 + * The stack currently in use. If not null, active_stack->sp_count > 0, 44 + * the module is pinned, and the locking protocol cannot be changed. 
45 + */ 46 + static struct ocfs2_stack_plugin *active_stack; 47 + 48 + static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name) 49 + { 50 + struct ocfs2_stack_plugin *p; 51 + 52 + assert_spin_locked(&ocfs2_stack_lock); 53 + 54 + list_for_each_entry(p, &ocfs2_stack_list, sp_list) { 55 + if (!strcmp(p->sp_name, name)) 56 + return p; 57 + } 58 + 59 + return NULL; 60 + } 61 + 62 + static int ocfs2_stack_driver_request(const char *stack_name, 63 + const char *plugin_name) 64 + { 65 + int rc; 66 + struct ocfs2_stack_plugin *p; 67 + 68 + spin_lock(&ocfs2_stack_lock); 69 + 70 + /* 71 + * If the stack passed by the filesystem isn't the selected one, 72 + * we can't continue. 73 + */ 74 + if (strcmp(stack_name, cluster_stack_name)) { 75 + rc = -EBUSY; 76 + goto out; 77 + } 78 + 79 + if (active_stack) { 80 + /* 81 + * If the active stack isn't the one we want, it cannot 82 + * be selected right now. 83 + */ 84 + if (!strcmp(active_stack->sp_name, plugin_name)) 85 + rc = 0; 86 + else 87 + rc = -EBUSY; 88 + goto out; 89 + } 90 + 91 + p = ocfs2_stack_lookup(plugin_name); 92 + if (!p || !try_module_get(p->sp_owner)) { 93 + rc = -ENOENT; 94 + goto out; 95 + } 96 + 97 + /* Ok, the stack is pinned */ 98 + p->sp_count++; 99 + active_stack = p; 100 + 101 + rc = 0; 102 + 103 + out: 104 + spin_unlock(&ocfs2_stack_lock); 105 + return rc; 106 + } 107 + 108 + /* 109 + * This function looks up the appropriate stack and makes it active. If 110 + * there is no stack, it tries to load it. It will fail if the stack still 111 + * cannot be found. It will also fail if a different stack is in use. 112 + */ 113 + static int ocfs2_stack_driver_get(const char *stack_name) 114 + { 115 + int rc; 116 + char *plugin_name = OCFS2_STACK_PLUGIN_O2CB; 117 + 118 + /* 119 + * Classic stack does not pass in a stack name. This is 120 + * compatible with older tools as well. 
121 + */ 122 + if (!stack_name || !*stack_name) 123 + stack_name = OCFS2_STACK_PLUGIN_O2CB; 124 + 125 + if (strlen(stack_name) != OCFS2_STACK_LABEL_LEN) { 126 + printk(KERN_ERR 127 + "ocfs2 passed an invalid cluster stack label: \"%s\"\n", 128 + stack_name); 129 + return -EINVAL; 130 + } 131 + 132 + /* Anything that isn't the classic stack is a user stack */ 133 + if (strcmp(stack_name, OCFS2_STACK_PLUGIN_O2CB)) 134 + plugin_name = OCFS2_STACK_PLUGIN_USER; 135 + 136 + rc = ocfs2_stack_driver_request(stack_name, plugin_name); 137 + if (rc == -ENOENT) { 138 + request_module("ocfs2_stack_%s", plugin_name); 139 + rc = ocfs2_stack_driver_request(stack_name, plugin_name); 140 + } 141 + 142 + if (rc == -ENOENT) { 143 + printk(KERN_ERR 144 + "ocfs2: Cluster stack driver \"%s\" cannot be found\n", 145 + plugin_name); 146 + } else if (rc == -EBUSY) { 147 + printk(KERN_ERR 148 + "ocfs2: A different cluster stack is in use\n"); 149 + } 150 + 151 + return rc; 152 + } 153 + 154 + static void ocfs2_stack_driver_put(void) 155 + { 156 + spin_lock(&ocfs2_stack_lock); 157 + BUG_ON(active_stack == NULL); 158 + BUG_ON(active_stack->sp_count == 0); 159 + 160 + active_stack->sp_count--; 161 + if (!active_stack->sp_count) { 162 + module_put(active_stack->sp_owner); 163 + active_stack = NULL; 164 + } 165 + spin_unlock(&ocfs2_stack_lock); 166 + } 167 + 168 + int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin) 169 + { 170 + int rc; 171 + 172 + spin_lock(&ocfs2_stack_lock); 173 + if (!ocfs2_stack_lookup(plugin->sp_name)) { 174 + plugin->sp_count = 0; 175 + plugin->sp_proto = lproto; 176 + list_add(&plugin->sp_list, &ocfs2_stack_list); 177 + printk(KERN_INFO "ocfs2: Registered cluster interface %s\n", 178 + plugin->sp_name); 179 + rc = 0; 180 + } else { 181 + printk(KERN_ERR "ocfs2: Stack \"%s\" already registered\n", 182 + plugin->sp_name); 183 + rc = -EEXIST; 184 + } 185 + spin_unlock(&ocfs2_stack_lock); 186 + 187 + return rc; 188 + } 189 + 
EXPORT_SYMBOL_GPL(ocfs2_stack_glue_register); 190 + 191 + void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin) 192 + { 193 + struct ocfs2_stack_plugin *p; 194 + 195 + spin_lock(&ocfs2_stack_lock); 196 + p = ocfs2_stack_lookup(plugin->sp_name); 197 + if (p) { 198 + BUG_ON(p != plugin); 199 + BUG_ON(plugin == active_stack); 200 + BUG_ON(plugin->sp_count != 0); 201 + list_del_init(&plugin->sp_list); 202 + printk(KERN_INFO "ocfs2: Unregistered cluster interface %s\n", 203 + plugin->sp_name); 204 + } else { 205 + printk(KERN_ERR "Stack \"%s\" is not registered\n", 206 + plugin->sp_name); 207 + } 208 + spin_unlock(&ocfs2_stack_lock); 209 + } 210 + EXPORT_SYMBOL_GPL(ocfs2_stack_glue_unregister); 211 + 212 + void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto) 213 + { 214 + struct ocfs2_stack_plugin *p; 215 + 216 + BUG_ON(proto == NULL); 217 + 218 + spin_lock(&ocfs2_stack_lock); 219 + BUG_ON(active_stack != NULL); 220 + 221 + lproto = proto; 222 + list_for_each_entry(p, &ocfs2_stack_list, sp_list) { 223 + p->sp_proto = lproto; 224 + } 225 + 226 + spin_unlock(&ocfs2_stack_lock); 227 + } 228 + EXPORT_SYMBOL_GPL(ocfs2_stack_glue_set_locking_protocol); 229 + 230 + 231 + /* 232 + * The ocfs2_dlm_lock() and ocfs2_dlm_unlock() functions take 233 + * "struct ocfs2_lock_res *astarg" instead of "void *astarg" because the 234 + * underlying stack plugins need to pilfer the lksb off of the lock_res. 235 + * If some other structure needs to be passed as an astarg, the plugins 236 + * will need to be given a different avenue to the lksb. 
237 + */ 238 + int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn, 239 + int mode, 240 + union ocfs2_dlm_lksb *lksb, 241 + u32 flags, 242 + void *name, 243 + unsigned int namelen, 244 + struct ocfs2_lock_res *astarg) 245 + { 246 + BUG_ON(lproto == NULL); 247 + 248 + return active_stack->sp_ops->dlm_lock(conn, mode, lksb, flags, 249 + name, namelen, astarg); 250 + } 251 + EXPORT_SYMBOL_GPL(ocfs2_dlm_lock); 252 + 253 + int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn, 254 + union ocfs2_dlm_lksb *lksb, 255 + u32 flags, 256 + struct ocfs2_lock_res *astarg) 257 + { 258 + BUG_ON(lproto == NULL); 259 + 260 + return active_stack->sp_ops->dlm_unlock(conn, lksb, flags, astarg); 261 + } 262 + EXPORT_SYMBOL_GPL(ocfs2_dlm_unlock); 263 + 264 + int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb) 265 + { 266 + return active_stack->sp_ops->lock_status(lksb); 267 + } 268 + EXPORT_SYMBOL_GPL(ocfs2_dlm_lock_status); 269 + 270 + /* 271 + * Why don't we cast to ocfs2_meta_lvb? The "clean" answer is that we 272 + * don't cast at the glue level. The real answer is that the header 273 + * ordering is nigh impossible. 
274 + */ 275 + void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb) 276 + { 277 + return active_stack->sp_ops->lock_lvb(lksb); 278 + } 279 + EXPORT_SYMBOL_GPL(ocfs2_dlm_lvb); 280 + 281 + void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb) 282 + { 283 + active_stack->sp_ops->dump_lksb(lksb); 284 + } 285 + EXPORT_SYMBOL_GPL(ocfs2_dlm_dump_lksb); 286 + 287 + int ocfs2_cluster_connect(const char *stack_name, 288 + const char *group, 289 + int grouplen, 290 + void (*recovery_handler)(int node_num, 291 + void *recovery_data), 292 + void *recovery_data, 293 + struct ocfs2_cluster_connection **conn) 294 + { 295 + int rc = 0; 296 + struct ocfs2_cluster_connection *new_conn; 297 + 298 + BUG_ON(group == NULL); 299 + BUG_ON(conn == NULL); 300 + BUG_ON(recovery_handler == NULL); 301 + 302 + if (grouplen > GROUP_NAME_MAX) { 303 + rc = -EINVAL; 304 + goto out; 305 + } 306 + 307 + new_conn = kzalloc(sizeof(struct ocfs2_cluster_connection), 308 + GFP_KERNEL); 309 + if (!new_conn) { 310 + rc = -ENOMEM; 311 + goto out; 312 + } 313 + 314 + memcpy(new_conn->cc_name, group, grouplen); 315 + new_conn->cc_namelen = grouplen; 316 + new_conn->cc_recovery_handler = recovery_handler; 317 + new_conn->cc_recovery_data = recovery_data; 318 + 319 + /* Start the new connection at our maximum compatibility level */ 320 + new_conn->cc_version = lproto->lp_max_version; 321 + 322 + /* This will pin the stack driver if successful */ 323 + rc = ocfs2_stack_driver_get(stack_name); 324 + if (rc) 325 + goto out_free; 326 + 327 + rc = active_stack->sp_ops->connect(new_conn); 328 + if (rc) { 329 + ocfs2_stack_driver_put(); 330 + goto out_free; 331 + } 332 + 333 + *conn = new_conn; 334 + 335 + out_free: 336 + if (rc) 337 + kfree(new_conn); 338 + 339 + out: 340 + return rc; 341 + } 342 + EXPORT_SYMBOL_GPL(ocfs2_cluster_connect); 343 + 344 + /* If hangup_pending is 0, the stack driver will be dropped */ 345 + int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn, 346 + int hangup_pending) 347 + { 348 + 
int ret; 349 + 350 + BUG_ON(conn == NULL); 351 + 352 + ret = active_stack->sp_ops->disconnect(conn, hangup_pending); 353 + 354 + /* XXX Should we free it anyway? */ 355 + if (!ret) { 356 + kfree(conn); 357 + if (!hangup_pending) 358 + ocfs2_stack_driver_put(); 359 + } 360 + 361 + return ret; 362 + } 363 + EXPORT_SYMBOL_GPL(ocfs2_cluster_disconnect); 364 + 365 + void ocfs2_cluster_hangup(const char *group, int grouplen) 366 + { 367 + BUG_ON(group == NULL); 368 + BUG_ON(group[grouplen] != '\0'); 369 + 370 + if (active_stack->sp_ops->hangup) 371 + active_stack->sp_ops->hangup(group, grouplen); 372 + 373 + /* cluster_disconnect() was called with hangup_pending==1 */ 374 + ocfs2_stack_driver_put(); 375 + } 376 + EXPORT_SYMBOL_GPL(ocfs2_cluster_hangup); 377 + 378 + int ocfs2_cluster_this_node(unsigned int *node) 379 + { 380 + return active_stack->sp_ops->this_node(node); 381 + } 382 + EXPORT_SYMBOL_GPL(ocfs2_cluster_this_node); 383 + 384 + 385 + /* 386 + * Sysfs bits 387 + */ 388 + 389 + static ssize_t ocfs2_max_locking_protocol_show(struct kobject *kobj, 390 + struct kobj_attribute *attr, 391 + char *buf) 392 + { 393 + ssize_t ret = 0; 394 + 395 + spin_lock(&ocfs2_stack_lock); 396 + if (lproto) 397 + ret = snprintf(buf, PAGE_SIZE, "%u.%u\n", 398 + lproto->lp_max_version.pv_major, 399 + lproto->lp_max_version.pv_minor); 400 + spin_unlock(&ocfs2_stack_lock); 401 + 402 + return ret; 403 + } 404 + 405 + static struct kobj_attribute ocfs2_attr_max_locking_protocol = 406 + __ATTR(max_locking_protocol, S_IFREG | S_IRUGO, 407 + ocfs2_max_locking_protocol_show, NULL); 408 + 409 + static ssize_t ocfs2_loaded_cluster_plugins_show(struct kobject *kobj, 410 + struct kobj_attribute *attr, 411 + char *buf) 412 + { 413 + ssize_t ret = 0, total = 0, remain = PAGE_SIZE; 414 + struct ocfs2_stack_plugin *p; 415 + 416 + spin_lock(&ocfs2_stack_lock); 417 + list_for_each_entry(p, &ocfs2_stack_list, sp_list) { 418 + ret = snprintf(buf, remain, "%s\n", 419 + p->sp_name); 420 + if (ret < 0) { 
421 + total = ret; 422 + break; 423 + } 424 + if (ret == remain) { 425 + /* snprintf() didn't fit */ 426 + total = -E2BIG; 427 + break; 428 + } 429 + total += ret; 430 + remain -= ret; 431 + } 432 + spin_unlock(&ocfs2_stack_lock); 433 + 434 + return total; 435 + } 436 + 437 + static struct kobj_attribute ocfs2_attr_loaded_cluster_plugins = 438 + __ATTR(loaded_cluster_plugins, S_IFREG | S_IRUGO, 439 + ocfs2_loaded_cluster_plugins_show, NULL); 440 + 441 + static ssize_t ocfs2_active_cluster_plugin_show(struct kobject *kobj, 442 + struct kobj_attribute *attr, 443 + char *buf) 444 + { 445 + ssize_t ret = 0; 446 + 447 + spin_lock(&ocfs2_stack_lock); 448 + if (active_stack) { 449 + ret = snprintf(buf, PAGE_SIZE, "%s\n", 450 + active_stack->sp_name); 451 + if (ret == PAGE_SIZE) 452 + ret = -E2BIG; 453 + } 454 + spin_unlock(&ocfs2_stack_lock); 455 + 456 + return ret; 457 + } 458 + 459 + static struct kobj_attribute ocfs2_attr_active_cluster_plugin = 460 + __ATTR(active_cluster_plugin, S_IFREG | S_IRUGO, 461 + ocfs2_active_cluster_plugin_show, NULL); 462 + 463 + static ssize_t ocfs2_cluster_stack_show(struct kobject *kobj, 464 + struct kobj_attribute *attr, 465 + char *buf) 466 + { 467 + ssize_t ret; 468 + spin_lock(&ocfs2_stack_lock); 469 + ret = snprintf(buf, PAGE_SIZE, "%s\n", cluster_stack_name); 470 + spin_unlock(&ocfs2_stack_lock); 471 + 472 + return ret; 473 + } 474 + 475 + static ssize_t ocfs2_cluster_stack_store(struct kobject *kobj, 476 + struct kobj_attribute *attr, 477 + const char *buf, size_t count) 478 + { 479 + size_t len = count; 480 + ssize_t ret; 481 + 482 + if (len == 0) 483 + return len; 484 + 485 + if (buf[len - 1] == '\n') 486 + len--; 487 + 488 + if ((len != OCFS2_STACK_LABEL_LEN) || 489 + (strnlen(buf, len) != len)) 490 + return -EINVAL; 491 + 492 + spin_lock(&ocfs2_stack_lock); 493 + if (active_stack) { 494 + if (!strncmp(buf, cluster_stack_name, len)) 495 + ret = count; 496 + else 497 + ret = -EBUSY; 498 + } else { 499 + memcpy(cluster_stack_name, 
buf, len); 500 + ret = count; 501 + } 502 + spin_unlock(&ocfs2_stack_lock); 503 + 504 + return ret; 505 + } 506 + 507 + 508 + static struct kobj_attribute ocfs2_attr_cluster_stack = 509 + __ATTR(cluster_stack, S_IFREG | S_IRUGO | S_IWUSR, 510 + ocfs2_cluster_stack_show, 511 + ocfs2_cluster_stack_store); 512 + 513 + static struct attribute *ocfs2_attrs[] = { 514 + &ocfs2_attr_max_locking_protocol.attr, 515 + &ocfs2_attr_loaded_cluster_plugins.attr, 516 + &ocfs2_attr_active_cluster_plugin.attr, 517 + &ocfs2_attr_cluster_stack.attr, 518 + NULL, 519 + }; 520 + 521 + static struct attribute_group ocfs2_attr_group = { 522 + .attrs = ocfs2_attrs, 523 + }; 524 + 525 + static struct kset *ocfs2_kset; 526 + 527 + static void ocfs2_sysfs_exit(void) 528 + { 529 + kset_unregister(ocfs2_kset); 530 + } 531 + 532 + static int ocfs2_sysfs_init(void) 533 + { 534 + int ret; 535 + 536 + ocfs2_kset = kset_create_and_add("ocfs2", NULL, fs_kobj); 537 + if (!ocfs2_kset) 538 + return -ENOMEM; 539 + 540 + ret = sysfs_create_group(&ocfs2_kset->kobj, &ocfs2_attr_group); 541 + if (ret) 542 + goto error; 543 + 544 + return 0; 545 + 546 + error: 547 + kset_unregister(ocfs2_kset); 548 + return ret; 549 + } 550 + 551 + static int __init ocfs2_stack_glue_init(void) 552 + { 553 + strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB); 554 + 555 + return ocfs2_sysfs_init(); 556 + } 557 + 558 + static void __exit ocfs2_stack_glue_exit(void) 559 + { 560 + lproto = NULL; 561 + ocfs2_sysfs_exit(); 562 + } 563 + 564 + MODULE_AUTHOR("Oracle"); 565 + MODULE_DESCRIPTION("ocfs2 cluster stack glue layer"); 566 + MODULE_LICENSE("GPL"); 567 + module_init(ocfs2_stack_glue_init); 568 + module_exit(ocfs2_stack_glue_exit);
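The /sys/fs/o2cb/cluster_stack store handler above accepts exactly one label of OCFS2_STACK_LABEL_LEN characters, with an optional trailing newline. A userspace sketch of that validation follows; validate_stack_label() is a hypothetical helper, and the label length of 4 is inferred from "o2cb" being four characters (the handler requires strlen(stack_name) == OCFS2_STACK_LABEL_LEN). Unlike the kernel code, which returns the count unchanged for a zero-length write, this sketch rejects it.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* "o2cb" is four characters, so OCFS2_STACK_LABEL_LEN is presumably 4. */
#define STACK_LABEL_LEN 4

/*
 * Mirrors the checks in ocfs2_cluster_stack_store(): strip at most one
 * trailing newline, then require exactly STACK_LABEL_LEN bytes with no
 * embedded NUL characters (the strnlen() check).
 */
static int validate_stack_label(const char *buf, size_t count)
{
	size_t len = count;

	if (len == 0)
		return -EINVAL;

	if (buf[len - 1] == '\n')
		len--;

	if ((len != STACK_LABEL_LEN) ||
	    (strnlen(buf, len) != len))
		return -EINVAL;

	return 0;
}
```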
+261
fs/ocfs2/stackglue.h
··· 1 + /* -*- mode: c; c-basic-offset: 8; -*- 2 + * vim: noexpandtab sw=8 ts=8 sts=0: 3 + * 4 + * stackglue.h 5 + * 6 + * Glue to the underlying cluster stack. 7 + * 8 + * Copyright (C) 2007 Oracle. All rights reserved. 9 + * 10 + * This program is free software; you can redistribute it and/or 11 + * modify it under the terms of the GNU General Public 12 + * License as published by the Free Software Foundation, version 2. 13 + * 14 + * This program is distributed in the hope that it will be useful, 15 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 16 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 17 + * General Public License for more details. 18 + */ 19 + 20 + 21 + #ifndef STACKGLUE_H 22 + #define STACKGLUE_H 23 + 24 + #include <linux/types.h> 25 + #include <linux/list.h> 26 + #include <linux/dlmconstants.h> 27 + 28 + #include "dlm/dlmapi.h" 29 + #include <linux/dlm.h> 30 + 31 + /* 32 + * dlmconstants.h does not have a LOCAL flag. We hope to remove it 33 + * some day, but right now we need it. Let's fake it. This value is larger 34 + * than any flag in dlmconstants.h. 35 + */ 36 + #define DLM_LKF_LOCAL 0x00100000 37 + 38 + /* 39 + * This shadows DLM_LOCKSPACE_LEN in fs/dlm/dlm_internal.h. That probably 40 + * wants to be in a public header. 41 + */ 42 + #define GROUP_NAME_MAX 64 43 + 44 + 45 + /* 46 + * ocfs2_protocol_version changes when ocfs2 does something different in 47 + * its inter-node behavior. See dlmglue.c for more information. 48 + */ 49 + struct ocfs2_protocol_version { 50 + u8 pv_major; 51 + u8 pv_minor; 52 + }; 53 + 54 + /* 55 + * The ocfs2_locking_protocol defines the handlers called on ocfs2's behalf. 
56 + */ 57 + struct ocfs2_locking_protocol { 58 + struct ocfs2_protocol_version lp_max_version; 59 + void (*lp_lock_ast)(void *astarg); 60 + void (*lp_blocking_ast)(void *astarg, int level); 61 + void (*lp_unlock_ast)(void *astarg, int error); 62 + }; 63 + 64 + 65 + /* 66 + * The dlm_lockstatus struct includes lvb space, but the dlm_lksb struct only 67 + * has a pointer to separately allocated lvb space. This struct exists only to 68 + * include in the lksb union to make space for a combined dlm_lksb and lvb. 69 + */ 70 + struct fsdlm_lksb_plus_lvb { 71 + struct dlm_lksb lksb; 72 + char lvb[DLM_LVB_LEN]; 73 + }; 74 + 75 + /* 76 + * A union of all lock status structures. We define it here so that the 77 + * size of the union is known. Lock status structures are embedded in 78 + * ocfs2 inodes. 79 + */ 80 + union ocfs2_dlm_lksb { 81 + struct dlm_lockstatus lksb_o2dlm; 82 + struct dlm_lksb lksb_fsdlm; 83 + struct fsdlm_lksb_plus_lvb padding; 84 + }; 85 + 86 + /* 87 + * A cluster connection. Mostly opaque to ocfs2, the connection holds 88 + * state for the underlying stack. ocfs2 does use cc_version to determine 89 + * locking compatibility. 90 + */ 91 + struct ocfs2_cluster_connection { 92 + char cc_name[GROUP_NAME_MAX]; 93 + int cc_namelen; 94 + struct ocfs2_protocol_version cc_version; 95 + void (*cc_recovery_handler)(int node_num, void *recovery_data); 96 + void *cc_recovery_data; 97 + void *cc_lockspace; 98 + void *cc_private; 99 + }; 100 + 101 + /* 102 + * Each cluster stack implements the stack operations structure. Not used 103 + * in the ocfs2 code, the stackglue code translates generic cluster calls 104 + * into stack operations. 105 + */ 106 + struct ocfs2_stack_operations { 107 + /* 108 + * The fs code calls ocfs2_cluster_connect() to attach a new 109 + * filesystem to the cluster stack. The ->connect() op is passed 110 + * an ocfs2_cluster_connection with the name and recovery field 111 + * filled in. 
112 + * 113 + * The stack must set up any notification mechanisms and create 114 + * the filesystem lockspace in the DLM. The lockspace should be 115 + * stored on cc_lockspace. Any other information can be stored on 116 + * cc_private. 117 + * 118 + * ->connect() must not return until it is guaranteed that 119 + * 120 + * - Node down notifications for the filesystem will be received 121 + * and passed to conn->cc_recovery_handler(). 122 + * - Locking requests for the filesystem will be processed. 123 + */ 124 + int (*connect)(struct ocfs2_cluster_connection *conn); 125 + 126 + /* 127 + * The fs code calls ocfs2_cluster_disconnect() when a filesystem 128 + * no longer needs cluster services. All DLM locks have been 129 + * dropped, and recovery notification is being ignored by the 130 + * fs code. The stack must disengage from the DLM and discontinue 131 + * recovery notification. 132 + * 133 + * Once ->disconnect() has returned, the connection structure will 134 + * be freed. Thus, a stack must not return from ->disconnect() 135 + * until it will no longer reference the conn pointer. 136 + * 137 + * If hangup_pending is zero, ocfs2_cluster_disconnect() will also 138 + * be dropping the reference on the module. 139 + */ 140 + int (*disconnect)(struct ocfs2_cluster_connection *conn, 141 + int hangup_pending); 142 + 143 + /* 144 + * ocfs2_cluster_hangup() exists for compatibility with older 145 + * ocfs2 tools. Only the classic stack really needs it. As such 146 + * ->hangup() is not required of all stacks. See the comment by 147 + * ocfs2_cluster_hangup() for more details. 148 + * 149 + * Note that ocfs2_cluster_hangup() can only be called if 150 + * hangup_pending was passed to ocfs2_cluster_disconnect(). 151 + */ 152 + void (*hangup)(const char *group, int grouplen); 153 + 154 + /* 155 + * ->this_node() returns the cluster's unique identifier for the 156 + * local node. 
157 + */ 158 + int (*this_node)(unsigned int *node); 159 + 160 + /* 161 + * Call the underlying dlm lock function. The ->dlm_lock() 162 + * callback should convert the flags and mode as appropriate. 163 + * 164 + * ast and bast functions are not part of the call because the 165 + * stack will likely want to wrap ast and bast calls before passing 166 + * them to stack->sp_proto. 167 + */ 168 + int (*dlm_lock)(struct ocfs2_cluster_connection *conn, 169 + int mode, 170 + union ocfs2_dlm_lksb *lksb, 171 + u32 flags, 172 + void *name, 173 + unsigned int namelen, 174 + void *astarg); 175 + 176 + /* 177 + * Call the underlying dlm unlock function. The ->dlm_unlock() 178 + * function should convert the flags as appropriate. 179 + * 180 + * The unlock ast is not passed, as the stack will want to wrap 181 + * it before calling stack->sp_proto->lp_unlock_ast(). 182 + */ 183 + int (*dlm_unlock)(struct ocfs2_cluster_connection *conn, 184 + union ocfs2_dlm_lksb *lksb, 185 + u32 flags, 186 + void *astarg); 187 + 188 + /* 189 + * Return the status of the current lock status block. The fs 190 + * code should never dereference the union. The ->lock_status() 191 + * callback pulls out the stack-specific lksb, converts the status 192 + * to a proper errno, and returns it. 193 + */ 194 + int (*lock_status)(union ocfs2_dlm_lksb *lksb); 195 + 196 + /* 197 + * Pull the lvb pointer off of the stack-specific lksb. 198 + */ 199 + void *(*lock_lvb)(union ocfs2_dlm_lksb *lksb); 200 + 201 + /* 202 + * This is an optional debugging hook. If provided, the 203 + * stack can dump debugging information about this lock. 204 + */ 205 + void (*dump_lksb)(union ocfs2_dlm_lksb *lksb); 206 + }; 207 + 208 + /* 209 + * Each stack plugin must describe itself by registering a 210 + * ocfs2_stack_plugin structure. This is only seen by stackglue and the 211 + * stack driver. 
212 + */ 213 + struct ocfs2_stack_plugin { 214 + char *sp_name; 215 + struct ocfs2_stack_operations *sp_ops; 216 + struct module *sp_owner; 217 + 218 + /* These are managed by the stackglue code. */ 219 + struct list_head sp_list; 220 + unsigned int sp_count; 221 + struct ocfs2_locking_protocol *sp_proto; 222 + }; 223 + 224 + 225 + /* Used by the filesystem */ 226 + int ocfs2_cluster_connect(const char *stack_name, 227 + const char *group, 228 + int grouplen, 229 + void (*recovery_handler)(int node_num, 230 + void *recovery_data), 231 + void *recovery_data, 232 + struct ocfs2_cluster_connection **conn); 233 + int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn, 234 + int hangup_pending); 235 + void ocfs2_cluster_hangup(const char *group, int grouplen); 236 + int ocfs2_cluster_this_node(unsigned int *node); 237 + 238 + struct ocfs2_lock_res; 239 + int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn, 240 + int mode, 241 + union ocfs2_dlm_lksb *lksb, 242 + u32 flags, 243 + void *name, 244 + unsigned int namelen, 245 + struct ocfs2_lock_res *astarg); 246 + int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn, 247 + union ocfs2_dlm_lksb *lksb, 248 + u32 flags, 249 + struct ocfs2_lock_res *astarg); 250 + 251 + int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb); 252 + void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb); 253 + void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb); 254 + 255 + void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto); 256 + 257 + 258 + /* Used by stack plugins */ 259 + int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin); 260 + void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin); 261 + #endif /* STACKGLUE_H */
+97 -6
fs/ocfs2/suballoc.c
··· 46 46 47 47 #include "buffer_head_io.h" 48 48 49 + #define NOT_ALLOC_NEW_GROUP 0 50 + #define ALLOC_NEW_GROUP 1 51 + 52 + #define OCFS2_MAX_INODES_TO_STEAL 1024 53 + 49 54 static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg); 50 55 static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe); 51 56 static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl); ··· 111 106 u64 *bg_blkno, 112 107 u16 *bg_bit_off); 113 108 114 - void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac) 109 + static void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac) 115 110 { 116 111 struct inode *inode = ac->ac_inode; 117 112 ··· 122 117 mutex_unlock(&inode->i_mutex); 123 118 124 119 iput(inode); 120 + ac->ac_inode = NULL; 125 121 } 126 - if (ac->ac_bh) 122 + if (ac->ac_bh) { 127 123 brelse(ac->ac_bh); 124 + ac->ac_bh = NULL; 125 + } 126 + } 127 + 128 + void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac) 129 + { 130 + ocfs2_free_ac_resource(ac); 128 131 kfree(ac); 129 132 } 130 133 ··· 404 391 static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb, 405 392 struct ocfs2_alloc_context *ac, 406 393 int type, 407 - u32 slot) 394 + u32 slot, 395 + int alloc_new_group) 408 396 { 409 397 int status; 410 398 u32 bits_wanted = ac->ac_bits_wanted; ··· 434 420 } 435 421 436 422 ac->ac_inode = alloc_inode; 423 + ac->ac_alloc_slot = slot; 437 424 438 425 fe = (struct ocfs2_dinode *) bh->b_data; 439 426 if (!OCFS2_IS_VALID_DINODE(fe)) { ··· 457 442 if (ocfs2_is_cluster_bitmap(alloc_inode)) { 458 443 mlog(0, "Disk Full: wanted=%u, free_bits=%u\n", 459 444 bits_wanted, free_bits); 445 + status = -ENOSPC; 446 + goto bail; 447 + } 448 + 449 + if (alloc_new_group != ALLOC_NEW_GROUP) { 450 + mlog(0, "Alloc File %u Full: wanted=%u, free_bits=%u, " 451 + "and we don't alloc a new group for it.\n", 452 + slot, bits_wanted, free_bits); 460 453 status = -ENOSPC; 461 454 goto bail; 462 455 } ··· 513 490 (*ac)->ac_group_search = 
ocfs2_block_group_search; 514 491 515 492 status = ocfs2_reserve_suballoc_bits(osb, (*ac), 516 - EXTENT_ALLOC_SYSTEM_INODE, slot); 493 + EXTENT_ALLOC_SYSTEM_INODE, 494 + slot, ALLOC_NEW_GROUP); 517 495 if (status < 0) { 518 496 if (status != -ENOSPC) 519 497 mlog_errno(status); ··· 532 508 return status; 533 509 } 534 510 511 + static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb, 512 + struct ocfs2_alloc_context *ac) 513 + { 514 + int i, status = -ENOSPC; 515 + s16 slot = ocfs2_get_inode_steal_slot(osb); 516 + 517 + /* Start to steal inodes from the first slot after ours. */ 518 + if (slot == OCFS2_INVALID_SLOT) 519 + slot = osb->slot_num + 1; 520 + 521 + for (i = 0; i < osb->max_slots; i++, slot++) { 522 + if (slot == osb->max_slots) 523 + slot = 0; 524 + 525 + if (slot == osb->slot_num) 526 + continue; 527 + 528 + status = ocfs2_reserve_suballoc_bits(osb, ac, 529 + INODE_ALLOC_SYSTEM_INODE, 530 + slot, NOT_ALLOC_NEW_GROUP); 531 + if (status >= 0) { 532 + ocfs2_set_inode_steal_slot(osb, slot); 533 + break; 534 + } 535 + 536 + ocfs2_free_ac_resource(ac); 537 + } 538 + 539 + return status; 540 + } 541 + 535 542 int ocfs2_reserve_new_inode(struct ocfs2_super *osb, 536 543 struct ocfs2_alloc_context **ac) 537 544 { 538 545 int status; 546 + s16 slot = ocfs2_get_inode_steal_slot(osb); 539 547 540 548 *ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL); 541 549 if (!(*ac)) { ··· 581 525 582 526 (*ac)->ac_group_search = ocfs2_block_group_search; 583 527 528 + /* 529 + * slot is set when we successfully steal inode from other nodes. 530 + * It is reset in 3 places: 531 + * 1. when we flush the truncate log 532 + * 2. when we complete local alloc recovery. 533 + * 3. when we successfully allocate from our own slot. 534 + * After it is set, we will go on stealing inodes until we find the 535 + * need to check our slots to see whether there is some space for us. 
536 + */ 537 + if (slot != OCFS2_INVALID_SLOT && 538 + atomic_read(&osb->s_num_inodes_stolen) < OCFS2_MAX_INODES_TO_STEAL) 539 + goto inode_steal; 540 + 541 + atomic_set(&osb->s_num_inodes_stolen, 0); 584 542 status = ocfs2_reserve_suballoc_bits(osb, *ac, 585 543 INODE_ALLOC_SYSTEM_INODE, 586 - osb->slot_num); 544 + osb->slot_num, ALLOC_NEW_GROUP); 545 + if (status >= 0) { 546 + status = 0; 547 + 548 + /* 549 + * Some inodes must be freed by us, so try to allocate 550 + * from our own next time. 551 + */ 552 + if (slot != OCFS2_INVALID_SLOT) 553 + ocfs2_init_inode_steal_slot(osb); 554 + goto bail; 555 + } else if (status < 0 && status != -ENOSPC) { 556 + mlog_errno(status); 557 + goto bail; 558 + } 559 + 560 + ocfs2_free_ac_resource(*ac); 561 + 562 + inode_steal: 563 + status = ocfs2_steal_inode_from_other_nodes(osb, *ac); 564 + atomic_inc(&osb->s_num_inodes_stolen); 587 565 if (status < 0) { 588 566 if (status != -ENOSPC) 589 567 mlog_errno(status); ··· 647 557 648 558 status = ocfs2_reserve_suballoc_bits(osb, ac, 649 559 GLOBAL_BITMAP_SYSTEM_INODE, 650 - OCFS2_INVALID_SLOT); 560 + OCFS2_INVALID_SLOT, 561 + ALLOC_NEW_GROUP); 651 562 if (status < 0 && status != -ENOSPC) { 652 563 mlog_errno(status); 653 564 goto bail;
+1
fs/ocfs2/suballoc.h
··· 36 36 struct ocfs2_alloc_context { 37 37 struct inode *ac_inode; /* which bitmap are we allocating from? */ 38 38 struct buffer_head *ac_bh; /* file entry bh */ 39 + u32 ac_alloc_slot; /* which slot are we allocating from? */ 39 40 u32 ac_bits_wanted; 40 41 u32 ac_bits_given; 41 42 #define OCFS2_AC_USE_LOCAL 1
+122 -86
fs/ocfs2/super.c
··· 40 40 #include <linux/crc32.h> 41 41 #include <linux/debugfs.h> 42 42 #include <linux/mount.h> 43 - 44 - #include <cluster/nodemanager.h> 43 + #include <linux/seq_file.h> 45 44 46 45 #define MLOG_MASK_PREFIX ML_SUPER 47 46 #include <cluster/masklog.h> ··· 87 88 unsigned int atime_quantum; 88 89 signed short slot; 89 90 unsigned int localalloc_opt; 91 + char cluster_stack[OCFS2_STACK_LABEL_LEN + 1]; 90 92 }; 91 93 92 94 static int ocfs2_parse_options(struct super_block *sb, char *options, ··· 109 109 static int ocfs2_init_global_system_inodes(struct ocfs2_super *osb); 110 110 static int ocfs2_init_local_system_inodes(struct ocfs2_super *osb); 111 111 static void ocfs2_release_system_inodes(struct ocfs2_super *osb); 112 - static int ocfs2_fill_local_node_info(struct ocfs2_super *osb); 113 112 static int ocfs2_check_volume(struct ocfs2_super *osb); 114 113 static int ocfs2_verify_volume(struct ocfs2_dinode *di, 115 114 struct buffer_head *bh, ··· 153 154 Opt_commit, 154 155 Opt_localalloc, 155 156 Opt_localflocks, 157 + Opt_stack, 156 158 Opt_err, 157 159 }; 158 160 ··· 172 172 {Opt_commit, "commit=%u"}, 173 173 {Opt_localalloc, "localalloc=%d"}, 174 174 {Opt_localflocks, "localflocks"}, 175 + {Opt_stack, "cluster_stack=%s"}, 175 176 {Opt_err, NULL} 176 177 }; 177 178 ··· 552 551 } 553 552 } 554 553 554 + if (ocfs2_userspace_stack(osb)) { 555 + if (osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL) { 556 + mlog(ML_ERROR, "Userspace stack expected, but " 557 + "o2cb heartbeat arguments passed to mount\n"); 558 + return -EINVAL; 559 + } 560 + } 561 + 555 562 if (!(osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL)) { 556 - if (!ocfs2_mount_local(osb) && !ocfs2_is_hard_readonly(osb)) { 563 + if (!ocfs2_mount_local(osb) && !ocfs2_is_hard_readonly(osb) && 564 + !ocfs2_userspace_stack(osb)) { 557 565 mlog(ML_ERROR, "Heartbeat has to be started to mount " 558 566 "a read-write clustered device.\n"); 559 567 return -EINVAL; 560 568 } 569 + } 570 + 571 + return 0; 572 + } 573 + 574 + /* 575 + 
* If we're using a userspace stack, mount should have passed 576 + * a name that matches the disk. If not, mount should not 577 + * have passed a stack. 578 + */ 579 + static int ocfs2_verify_userspace_stack(struct ocfs2_super *osb, 580 + struct mount_options *mopt) 581 + { 582 + if (!ocfs2_userspace_stack(osb) && mopt->cluster_stack[0]) { 583 + mlog(ML_ERROR, 584 + "cluster stack passed to mount, but this filesystem " 585 + "does not support it\n"); 586 + return -EINVAL; 587 + } 588 + 589 + if (ocfs2_userspace_stack(osb) && 590 + strncmp(osb->osb_cluster_stack, mopt->cluster_stack, 591 + OCFS2_STACK_LABEL_LEN)) { 592 + mlog(ML_ERROR, 593 + "cluster stack passed to mount (\"%s\") does not " 594 + "match the filesystem (\"%s\")\n", 595 + mopt->cluster_stack, 596 + osb->osb_cluster_stack); 597 + return -EINVAL; 561 598 } 562 599 563 600 return 0; ··· 618 579 goto read_super_error; 619 580 } 620 581 621 - /* for now we only have one cluster/node, make sure we see it 622 - * in the heartbeat universe */ 623 - if (parsed_options.mount_opt & OCFS2_MOUNT_HB_LOCAL) { 624 - if (!o2hb_check_local_node_heartbeating()) { 625 - status = -EINVAL; 626 - goto read_super_error; 627 - } 628 - } 629 - 630 582 /* probe for superblock */ 631 583 status = ocfs2_sb_probe(sb, &bh, &sector_size); 632 584 if (status < 0) { ··· 638 608 osb->preferred_slot = parsed_options.slot; 639 609 osb->osb_commit_interval = parsed_options.commit_interval; 640 610 osb->local_alloc_size = parsed_options.localalloc_opt; 611 + 612 + status = ocfs2_verify_userspace_stack(osb, &parsed_options); 613 + if (status) 614 + goto read_super_error; 641 615 642 616 sb->s_magic = OCFS2_SUPER_MAGIC; 643 617 ··· 728 694 if (ocfs2_mount_local(osb)) 729 695 snprintf(nodestr, sizeof(nodestr), "local"); 730 696 else 731 - snprintf(nodestr, sizeof(nodestr), "%d", osb->node_num); 697 + snprintf(nodestr, sizeof(nodestr), "%u", osb->node_num); 732 698 733 699 printk(KERN_INFO "ocfs2: Mounting device (%s) on (node %s, slot %d) " 
734 700 "with %s data mode.\n", ··· 797 763 mopt->atime_quantum = OCFS2_DEFAULT_ATIME_QUANTUM; 798 764 mopt->slot = OCFS2_INVALID_SLOT; 799 765 mopt->localalloc_opt = OCFS2_DEFAULT_LOCAL_ALLOC_SIZE; 766 + mopt->cluster_stack[0] = '\0'; 800 767 801 768 if (!options) { 802 769 status = 1; ··· 899 864 if (!is_remount) 900 865 mopt->mount_opt |= OCFS2_MOUNT_LOCALFLOCKS; 901 866 break; 867 + case Opt_stack: 868 + /* Check both that the option we were passed 869 + * is of the right length and that it is a proper 870 + * string of the right length. 871 + */ 872 + if (((args[0].to - args[0].from) != 873 + OCFS2_STACK_LABEL_LEN) || 874 + (strnlen(args[0].from, 875 + OCFS2_STACK_LABEL_LEN) != 876 + OCFS2_STACK_LABEL_LEN)) { 877 + mlog(ML_ERROR, 878 + "Invalid cluster_stack option\n"); 879 + status = 0; 880 + goto bail; 881 + } 882 + memcpy(mopt->cluster_stack, args[0].from, 883 + OCFS2_STACK_LABEL_LEN); 884 + mopt->cluster_stack[OCFS2_STACK_LABEL_LEN] = '\0'; 885 + break; 902 886 default: 903 887 mlog(ML_ERROR, 904 888 "Unrecognized mount option \"%s\" " ··· 976 922 if (opts & OCFS2_MOUNT_LOCALFLOCKS) 977 923 seq_printf(s, ",localflocks,"); 978 924 925 + if (osb->osb_cluster_stack[0]) 926 + seq_printf(s, ",cluster_stack=%.*s", OCFS2_STACK_LABEL_LEN, 927 + osb->osb_cluster_stack); 928 + 979 929 return 0; 980 930 } 981 931 ··· 1014 956 status = -EFAULT; 1015 957 mlog(ML_ERROR, "Unable to create ocfs2 debugfs root.\n"); 1016 958 } 959 + 960 + ocfs2_set_locking_protocol(); 1017 961 1018 962 leave: 1019 963 if (status < 0) { ··· 1192 1132 return 0; 1193 1133 } 1194 1134 1195 - /* ocfs2 1.0 only allows one cluster and node identity per kernel image. */ 1196 - static int ocfs2_fill_local_node_info(struct ocfs2_super *osb) 1197 - { 1198 - int status; 1199 - 1200 - /* XXX hold a ref on the node while mounte? easy enough, if 1201 - * desirable. 
*/ 1202 - if (ocfs2_mount_local(osb)) 1203 - osb->node_num = 0; 1204 - else 1205 - osb->node_num = o2nm_this_node(); 1206 - 1207 - if (osb->node_num == O2NM_MAX_NODES) { 1208 - mlog(ML_ERROR, "could not find this host's node number\n"); 1209 - status = -ENOENT; 1210 - goto bail; 1211 - } 1212 - 1213 - mlog(0, "I am node %d\n", osb->node_num); 1214 - 1215 - status = 0; 1216 - bail: 1217 - return status; 1218 - } 1219 - 1220 1135 static int ocfs2_mount_volume(struct super_block *sb) 1221 1136 { 1222 1137 int status = 0; ··· 1202 1167 1203 1168 if (ocfs2_is_hard_readonly(osb)) 1204 1169 goto leave; 1205 - 1206 - status = ocfs2_fill_local_node_info(osb); 1207 - if (status < 0) { 1208 - mlog_errno(status); 1209 - goto leave; 1210 - } 1211 1170 1212 1171 status = ocfs2_dlm_init(osb); 1213 1172 if (status < 0) { ··· 1253 1224 return status; 1254 1225 } 1255 1226 1256 - /* we can't grab the goofy sem lock from inside wait_event, so we use 1257 - * memory barriers to make sure that we'll see the null task before 1258 - * being woken up */ 1259 - static int ocfs2_recovery_thread_running(struct ocfs2_super *osb) 1260 - { 1261 - mb(); 1262 - return osb->recovery_thread_task != NULL; 1263 - } 1264 - 1265 1227 static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err) 1266 1228 { 1267 - int tmp; 1229 + int tmp, hangup_needed = 0; 1268 1230 struct ocfs2_super *osb = NULL; 1269 1231 char nodestr[8]; 1270 1232 ··· 1269 1249 1270 1250 ocfs2_truncate_log_shutdown(osb); 1271 1251 1272 - /* disable any new recovery threads and wait for any currently 1273 - * running ones to exit. Do this before setting the vol_state. */ 1274 - mutex_lock(&osb->recovery_lock); 1275 - osb->disable_recovery = 1; 1276 - mutex_unlock(&osb->recovery_lock); 1277 - wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb)); 1278 - 1279 - /* At this point, we know that no more recovery threads can be 1280 - * launched, so wait for any recovery completion work to 1281 - * complete. 
*/ 1282 - flush_workqueue(ocfs2_wq); 1252 + /* This will disable recovery and flush any recovery work. */ 1253 + ocfs2_recovery_exit(osb); 1283 1254 1284 1255 ocfs2_journal_shutdown(osb); 1285 1256 1286 1257 ocfs2_sync_blockdev(sb); 1287 1258 1288 - /* No dlm means we've failed during mount, so skip all the 1289 - * steps which depended on that to complete. */ 1290 - if (osb->dlm) { 1259 + /* No cluster connection means we've failed during mount, so skip 1260 + * all the steps which depended on that to complete. */ 1261 + if (osb->cconn) { 1291 1262 tmp = ocfs2_super_lock(osb, 1); 1292 1263 if (tmp < 0) { 1293 1264 mlog_errno(tmp); ··· 1289 1278 if (osb->slot_num != OCFS2_INVALID_SLOT) 1290 1279 ocfs2_put_slot(osb); 1291 1280 1292 - if (osb->dlm) 1281 + if (osb->cconn) 1293 1282 ocfs2_super_unlock(osb, 1); 1294 1283 1295 1284 ocfs2_release_system_inodes(osb); 1296 1285 1297 - if (osb->dlm) 1298 - ocfs2_dlm_shutdown(osb); 1286 + /* 1287 + * If we're dismounting due to mount error, mount.ocfs2 will clean 1288 + * up heartbeat. If we're a local mount, there is no heartbeat. 1289 + * If we failed before we got a uuid_str yet, we can't stop 1290 + * heartbeat. Otherwise, do it. 
1291 + */ 1292 + if (!mnt_err && !ocfs2_mount_local(osb) && osb->uuid_str) 1293 + hangup_needed = 1; 1294 + 1295 + if (osb->cconn) 1296 + ocfs2_dlm_shutdown(osb, hangup_needed); 1299 1297 1300 1298 debugfs_remove(osb->osb_debug_root); 1301 1299 1302 - if (!mnt_err) 1303 - ocfs2_stop_heartbeat(osb); 1300 + if (hangup_needed) 1301 + ocfs2_cluster_hangup(osb->uuid_str, strlen(osb->uuid_str)); 1304 1302 1305 1303 atomic_set(&osb->vol_state, VOLUME_DISMOUNTED); 1306 1304 1307 1305 if (ocfs2_mount_local(osb)) 1308 1306 snprintf(nodestr, sizeof(nodestr), "local"); 1309 1307 else 1310 - snprintf(nodestr, sizeof(nodestr), "%d", osb->node_num); 1308 + snprintf(nodestr, sizeof(nodestr), "%u", osb->node_num); 1311 1309 1312 1310 printk(KERN_INFO "ocfs2: Unmounting device (%s) on (node %s)\n", 1313 1311 osb->dev_str, nodestr); ··· 1375 1355 sb->s_fs_info = osb; 1376 1356 sb->s_op = &ocfs2_sops; 1377 1357 sb->s_export_op = &ocfs2_export_ops; 1378 - osb->osb_locking_proto = ocfs2_locking_protocol; 1379 1358 sb->s_time_gran = 1; 1380 1359 sb->s_flags |= MS_NOATIME; 1381 1360 /* this is needed to support O_LARGEFILE */ ··· 1387 1368 osb->s_sectsize_bits = blksize_bits(sector_size); 1388 1369 BUG_ON(!osb->s_sectsize_bits); 1389 1370 1390 - init_waitqueue_head(&osb->recovery_event); 1391 1371 spin_lock_init(&osb->dc_task_lock); 1392 1372 init_waitqueue_head(&osb->dc_event); 1393 1373 osb->dc_work_sequence = 0; ··· 1394 1376 INIT_LIST_HEAD(&osb->blocked_lock_list); 1395 1377 osb->blocked_lock_count = 0; 1396 1378 spin_lock_init(&osb->osb_lock); 1379 + ocfs2_init_inode_steal_slot(osb); 1397 1380 1398 1381 atomic_set(&osb->alloc_stats.moves, 0); 1399 1382 atomic_set(&osb->alloc_stats.local_data, 0); ··· 1407 1388 snprintf(osb->dev_str, sizeof(osb->dev_str), "%u,%u", 1408 1389 MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev)); 1409 1390 1410 - mutex_init(&osb->recovery_lock); 1411 - 1412 - osb->disable_recovery = 0; 1413 - osb->recovery_thread_task = NULL; 1391 + status = 
ocfs2_recovery_init(osb); 1392 + if (status) { 1393 + mlog(ML_ERROR, "Unable to initialize recovery state\n"); 1394 + mlog_errno(status); 1395 + goto bail; 1396 + } 1414 1397 1415 1398 init_waitqueue_head(&osb->checkpoint_event); 1416 1399 atomic_set(&osb->needs_checkpoint, 0); 1417 1400 1418 1401 osb->s_atime_quantum = OCFS2_DEFAULT_ATIME_QUANTUM; 1419 1402 1420 - osb->node_num = O2NM_INVALID_NODE_NUM; 1421 1403 osb->slot_num = OCFS2_INVALID_SLOT; 1422 1404 1423 1405 osb->local_alloc_state = OCFS2_LA_UNUSED; 1424 1406 osb->local_alloc_bh = NULL; 1425 - 1426 - ocfs2_setup_hb_callbacks(osb); 1427 1407 1428 1408 init_waitqueue_head(&osb->osb_mount_event); 1429 1409 ··· 1471 1453 "unsupported optional features (%x).\n", i); 1472 1454 status = -EINVAL; 1473 1455 goto bail; 1456 + } 1457 + 1458 + if (ocfs2_userspace_stack(osb)) { 1459 + memcpy(osb->osb_cluster_stack, 1460 + OCFS2_RAW_SB(di)->s_cluster_info.ci_stack, 1461 + OCFS2_STACK_LABEL_LEN); 1462 + osb->osb_cluster_stack[OCFS2_STACK_LABEL_LEN] = '\0'; 1463 + if (strlen(osb->osb_cluster_stack) != OCFS2_STACK_LABEL_LEN) { 1464 + mlog(ML_ERROR, 1465 + "couldn't mount because of an invalid " 1466 + "cluster stack label (%s) \n", 1467 + osb->osb_cluster_stack); 1468 + status = -EINVAL; 1469 + goto bail; 1470 + } 1471 + } else { 1472 + /* The empty string is identical with classic tools that 1473 + * don't know about s_cluster_info. */ 1474 + osb->osb_cluster_stack[0] = '\0'; 1474 1475 } 1475 1476 1476 1477 get_random_bytes(&osb->s_next_generation, sizeof(u32)); ··· 1761 1724 1762 1725 /* This function assumes that the caller has the main osb resource */ 1763 1726 1764 - if (osb->slot_info) 1765 - ocfs2_free_slot_info(osb->slot_info); 1727 + ocfs2_free_slot_info(osb); 1766 1728 1767 1729 kfree(osb->osb_orphan_wipes); 1768 1730 /* FIXME
+8 -1
fs/sysfs/symlink.c
··· 87 87 88 88 void sysfs_remove_link(struct kobject * kobj, const char * name) 89 89 { 90 - sysfs_hash_and_remove(kobj->sd, name); 90 + struct sysfs_dirent *parent_sd = NULL; 91 + 92 + if (!kobj) 93 + parent_sd = &sysfs_root; 94 + else 95 + parent_sd = kobj->sd; 96 + 97 + sysfs_hash_and_remove(parent_sd, name); 91 98 } 92 99 93 100 static int sysfs_get_target_path(struct sysfs_dirent *parent_sd,