.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
   1) Overview
   2) Features
   3) Setting mount states
   4) Use cases
   5) Detailed semantics
   6) Quiz
   7) FAQ
   8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently. Shared subtree semantics provide the necessary
mechanism to accomplish this.

They also provide the necessary building blocks for features like
per-user namespaces and versioned filesystems.

2) Features
-----------

Shared subtrees provide four different flavors of mounts (struct vfsmount, to
be precise):


a) A **shared mount** can be replicated to as many mountpoints as desired, and
   all the replicas continue to be exactly the same.

   Here is an example:

   Let's say /mnt has a mount that is shared::

     # mount --make-shared /mnt

   .. note::
      The mount(8) command now supports the --make-shared flag,
      so the sample 'smount' program is no longer needed and has been
      removed.

   ::

     # mount --bind /mnt /tmp

   The above command replicates the mount at /mnt to the mountpoint /tmp,
   and the contents of both mounts remain identical.

   ::

     # ls /mnt
     a b c

     # ls /tmp
     a b c

   Now let's say we mount a device at /tmp/a::

     # mount /dev/sd0 /tmp/a

     # ls /tmp/a
     t1 t2 t3

     # ls /mnt/a
     t1 t2 t3

   Note that the mount has propagated to the mount at /mnt as well.

   The same is true when /dev/sd0 is mounted on /mnt/a instead: the
   contents will be visible under /tmp/a too.


b) A **slave mount** is like a shared mount except that mount and umount events
   only propagate towards it.

   All slave mounts have a master mount, which is a shared mount.

   Here is an example:

   Let's say /mnt has a mount which is shared::

     # mount --make-shared /mnt

   Let's bind mount /mnt to /tmp::

     # mount --bind /mnt /tmp

   The new mount at /tmp becomes a shared mount and it is a replica of
   the mount at /mnt.

   Now let's make the mount at /tmp a slave of /mnt::

     # mount --make-slave /tmp

   Let's mount /dev/sd0 on /mnt/a::

     # mount /dev/sd0 /mnt/a

     # ls /mnt/a
     t1 t2 t3

     # ls /tmp/a
     t1 t2 t3

   Note that the mount event has propagated to the mount at /tmp.

   However, let's see what happens if we mount something on the mount at
   /tmp::

     # mount /dev/sd1 /tmp/b

     # ls /tmp/b
     s1 s2 s3

     # ls /mnt/b

   Note how the mount event has not propagated to the mount at
   /mnt.


c) A **private mount** does not forward or receive propagation.

   This is the mount we are familiar with. It's the default type.


d) An **unbindable mount** is, as the name suggests, an unbindable private
   mount.

   Let's say we have a mount at /mnt and we make it unbindable::

     # mount --make-unbindable /mnt

   Let's try to bind mount this mount somewhere else::

     # mount --bind /mnt /tmp
     mount: wrong fs type, bad option, bad superblock on /mnt,
     or too many mounted file systems

   Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

The mount command (util-linux package) can be used to set mount
states::

    mount --make-shared mountpoint
    mount --make-slave mountpoint
    mount --make-private mountpoint
    mount --make-unbindable mountpoint


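Once a state has been set, it can be read back from the optional fields of
/proc/<pid>/mountinfo (``shared:N``, ``master:N``, ``unbindable``). The
following is a minimal sketch of such a reader, not part of any kernel or
util-linux tool; the function name ``propagation_state`` and the sample lines
are hypothetical and only illustrate the field layout.

```python
def propagation_state(mountinfo_line):
    """Classify one /proc/<pid>/mountinfo line by its propagation tags."""
    fields = mountinfo_line.split()
    # Optional fields sit between the mount options (index 5) and the
    # single "-" separator that precedes the filesystem type.
    separator = fields.index("-")
    tags = {}
    for field in fields[6:separator]:
        name, _, value = field.partition(":")
        tags[name] = value
    if "unbindable" in tags:
        return "unbindable"
    shared = "shared" in tags
    slave = "master" in tags
    if shared and slave:
        return "shared and slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"


# Hypothetical sample lines; real IDs and devices will differ.
SAMPLES = [
    "40 35 8:1 / /mnt rw shared:5 - ext4 /dev/sda1 rw",
    "41 35 8:1 / /tmp rw master:5 - ext4 /dev/sda1 rw",
    "42 35 8:2 / /opt rw - ext4 /dev/sda2 rw",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(line.split()[4], propagation_state(line))
```

On a live system the same function can be applied to each line of
/proc/self/mountinfo.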
4) Use cases
------------

A) A process wants to clone its own namespace, but still wants to
   access the CD that got mounted recently.

   Solution:

   The system administrator can make the mount at /cdrom shared::

     mount --bind /cdrom /cdrom
     mount --make-shared /cdrom

   Now any process that clones off a new namespace will have a
   mount at /cdrom which is a replica of the same mount in the
   parent namespace.

   So when a CD is inserted and mounted at /cdrom, that mount gets
   propagated to the other mount at /cdrom in all the other cloned
   namespaces.

B) A process wants its mounts to be invisible to any other process, but
   still wants to be able to see the other system mounts.

   Solution:

   To begin with, the administrator can mark the entire mount tree
   as shareable::

     mount --make-rshared /

   A new process can clone off a new namespace and mark some part
   of its namespace as slave::

     mount --make-rslave /myprivatetree

   Henceforth, any mounts within /myprivatetree done by the
   process will not show up in any other namespace. However, mounts
   done in the parent namespace under /myprivatetree still show
   up in the process's namespace.


Apart from the above semantics, this feature provides the
building blocks to solve the following problems:

C) Per-user namespace

   The above semantics allow a way to share mounts across
   namespaces. But namespaces are associated with processes. If
   namespaces are made first-class objects with a user API to
   associate/disassociate a namespace with a userid, then each user
   could have his/her own namespace and tailor it to his/her
   requirements. This needs to be supported in PAM.

D) Versioned files

   If the entire mount tree is visible at multiple locations, then
   an underlying versioning file system can return different
   versions of the file depending on the path used to access that
   file.

   An example is::

     mount --make-shared /
     mount --rbind / /view/v1
     mount --rbind / /view/v2
     mount --rbind / /view/v3
     mount --rbind / /view/v4

   and if /usr has a versioning filesystem mounted, then that
   mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
   /view/v4/usr too.

   A user can request the v3 version of the file /usr/fs/namespace.c
   by accessing /view/v3/usr/fs/namespace.c. The underlying
   versioning filesystem can then decipher that the v3 version of the
   filesystem is being requested and return the corresponding
   inode.

5) Detailed semantics
---------------------

The section below explains the detailed semantics of the
bind, rbind, move, mount, umount and clone-namespace operations.

.. Note::
   The word 'vfsmount' and the noun 'mount' are used
   to mean the same thing throughout this document.

a) Mount states

   A **propagation event** is defined as an event generated on a vfsmount
   that leads to mount or unmount actions in other vfsmounts.

   A **peer group** is defined as a group of vfsmounts that propagate
   events to each other.

   A given mount can be in one of the following states:

   (1) Shared mounts

       A **shared mount** is defined as a vfsmount that belongs to a
       peer group.

       For example::

         mount --make-shared /mnt
         mount --bind /mnt /tmp

       The mount at /mnt and that at /tmp are both shared and belong
       to the same peer group. Anything mounted or unmounted under
       /mnt or /tmp is reflected in all the other mounts of its peer
       group.


   (2) Slave mounts

       A **slave mount** is defined as a vfsmount that receives
       propagation events and does not forward propagation events.

       A slave mount, as the name implies, has a master mount from which
       mount/unmount events are received. Events do not propagate from
       the slave mount to the master. Only a shared mount can be made
       a slave by executing the following command::

         mount --make-slave mount

       A shared mount that is made a slave is no longer shared unless
       modified to become shared.

   (3) Shared and Slave

       A vfsmount can be both **shared** as well as **slave**. This state
       indicates that the mount is a slave of some vfsmount, and
       has its own peer group too. This vfsmount receives propagation
       events from its master vfsmount, and also forwards propagation
       events to its 'peer group' and to its slave vfsmounts.

       Strictly speaking, the vfsmount is shared, having its own
       peer group, and this peer group is a slave of some other
       peer group.

       Only a slave vfsmount can be made 'shared and slave', either by
       executing the following command::

         mount --make-shared mount

       or by moving the slave vfsmount under a shared vfsmount.

   (4) Private mount

       A **private mount** is defined as a vfsmount that does not
       receive or forward any propagation events.

   (5) Unbindable mount

       An **unbindable mount** is defined as a vfsmount that does not
       receive or forward any propagation events and cannot
       be bind mounted.


   State diagram:

   The state diagram below explains the state transition of a mount,
   in response to various commands::

     ------------------------------------------------------------------------
     |             |make-shared |  make-slave  | make-private |make-unbindab|
     |-------------|------------|--------------|--------------|-------------|
     |shared       |shared      |*slave/private|   private    | unbindable  |
     |             |            |              |              |             |
     |-------------|------------|--------------|--------------|-------------|
     |slave        |shared      |   **slave    |   private    | unbindable  |
     |             |and slave   |              |              |             |
     |-------------|------------|--------------|--------------|-------------|
     |shared       |shared      |    slave     |   private    | unbindable  |
     |and slave    |and slave   |              |              |             |
     |-------------|------------|--------------|--------------|-------------|
     |private      |shared      |  **private   |   private    | unbindable  |
     |-------------|------------|--------------|--------------|-------------|
     |unbindable   |shared      |**unbindable  |   private    | unbindable  |
     ------------------------------------------------------------------------

   \* If the shared mount is the only mount in its peer group, making it
   a slave makes it private automatically, since there is no master to
   which it can be slaved.

   \*\* Slaving a non-shared mount has no effect on the mount.

   Apart from the commands listed above, the 'move' operation also changes
   the state of a mount depending on the type of the destination mount. It is
   explained in section 5d.

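The state diagram can also be read as a lookup table. The sketch below encodes
it as a Python dictionary; the names ``TRANSITIONS`` and ``next_state`` and the
``alone_in_peer_group`` parameter are hypothetical helpers introduced only for
illustration, not kernel or util-linux interfaces.

```python
# Rows are the current state, columns are the mount(8) command applied.
TRANSITIONS = {
    "shared": {
        "make-shared": "shared",
        "make-slave": "slave",
        "make-private": "private",
        "make-unbindable": "unbindable",
    },
    "slave": {
        "make-shared": "shared and slave",
        "make-slave": "slave",
        "make-private": "private",
        "make-unbindable": "unbindable",
    },
    "shared and slave": {
        "make-shared": "shared and slave",
        "make-slave": "slave",
        "make-private": "private",
        "make-unbindable": "unbindable",
    },
    "private": {
        "make-shared": "shared",
        "make-slave": "private",
        "make-private": "private",
        "make-unbindable": "unbindable",
    },
    "unbindable": {
        "make-shared": "shared",
        "make-slave": "unbindable",
        "make-private": "private",
        "make-unbindable": "unbindable",
    },
}


def next_state(state, command, alone_in_peer_group=False):
    """Return the state a mount ends up in after the given command."""
    # Footnote (*): a shared mount with no peers has no possible master,
    # so slaving it leaves it private.
    if state == "shared" and command == "make-slave" and alone_in_peer_group:
        return "private"
    return TRANSITIONS[state][command]


if __name__ == "__main__":
    print(next_state("slave", "make-shared"))
    print(next_state("shared", "make-slave", alone_in_peer_group=True))
```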
b) Bind semantics

   Consider the following command::

     mount --bind A/a B/b

   where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
   is the destination mount and 'b' is the dentry in the destination mount.

   The outcome depends on the type of mount of 'A' and 'B'. The table
   below is a quick reference::

     --------------------------------------------------------------------------
     |                          BIND MOUNT OPERATION                          |
     |************************************************************************|
     |source(A)->|   shared    |    private    |     slave      | unbindable  |
     | dest(B)   |             |               |                |             |
     |    |      |             |               |                |             |
     |    v      |             |               |                |             |
     |************************************************************************|
     |  shared   |   shared    |    shared     | shared & slave |   invalid   |
     |           |             |               |                |             |
     |non-shared |   shared    |    private    |     slave      |   invalid   |
     **************************************************************************

   Details:

   1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C',
      which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
      mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2', 'C3' ...
      are created and mounted at the dentry 'b' on all mounts where 'B'
      propagates to. A new propagation tree containing 'C1',..,'Cn' is
      created. This propagation tree is identical to the propagation tree of
      'B'. And finally the peer group of 'C' is merged with the peer group
      of 'A'.

   2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C',
      which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
      mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2', 'C3' ...
      are created and mounted at the dentry 'b' on all mounts where 'B'
      propagates to. A new propagation tree is set up containing all new
      mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
      propagation tree for 'B'.

   3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
      mount 'C', which is a clone of 'A', is created. Its root dentry is 'a'.
      'C' is mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2',
      'C3' ... are created and mounted at the dentry 'b' on all mounts where
      'B' propagates to. A new propagation tree containing the new mounts
      'C', 'C1', .., 'Cn' is created. This propagation tree is identical to the
      propagation tree for 'B'. And finally the mount 'C' and its peer group
      are made the slave of mount 'Z'. In other words, mount 'C' is in the
      state 'slave and shared'.

   4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
      invalid operation.

   5. 'A' is a private mount and 'B' is a non-shared (private or slave or
      unbindable) mount. A new mount 'C', which is a clone of 'A', is created.
      Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

   6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C',
      which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
      mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
      peer group of 'A'.

   7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
      new mount 'C', which is a clone of 'A', is created. Its root dentry is
      'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also, 'C' is set as a
      slave mount of 'Z'. In other words, 'A' and 'C' are both slave mounts of
      'Z'. All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
      mount/unmount events on 'A' do not propagate anywhere else. Similarly,
      mount/unmount events on 'C' do not propagate anywhere else.

   8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
      invalid operation. An unbindable mount cannot be bind mounted.

c) Rbind semantics

   rbind is the same as bind, except that bind replicates only the specified
   mount, whereas rbind replicates all the mounts in the tree belonging to the
   specified mount. An rbind mount is a bind mount applied to all the mounts
   in the tree.

   If the source tree that is rbound has some unbindable mounts,
   then the subtree under the unbindable mount is pruned in the new
   location.

   For example, let's say we have the following mount tree::

                A
              /   \
              B   C
             / \ / \
             D E F G

   Let's say all the mounts except the mount C in the tree are
   of a type other than unbindable.

   If this tree is rbound to, say, Z,
   we will have the following tree at the new location::

                Z
                |
                A'
               /
              B'                Note how the tree under C is pruned
             / \                in the new location.
            D' E'



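The pruning rule can be illustrated with a small recursive copy. This is only
a sketch of the rule as described above, assuming a mount tree represented as
nested ``(name, children)`` tuples; ``rbind_copy`` is a hypothetical helper and
it does not handle the case where the root of the source tree is itself
unbindable (which is an invalid bind).

```python
def rbind_copy(tree, unbindable):
    """Copy a mount tree, dropping every unbindable mount and its subtree.

    tree is a (name, [children]) tuple and unbindable is a set of names.
    Copied mounts are marked with a trailing prime, as in the figure.
    """
    name, children = tree
    kept = [
        rbind_copy(child, unbindable)
        for child in children
        if child[0] not in unbindable
    ]
    return (name + "'", kept)


# The example tree from this section, with C marked unbindable.
SOURCE = (
    "A",
    [
        ("B", [("D", []), ("E", [])]),
        ("C", [("F", []), ("G", [])]),
    ],
)

if __name__ == "__main__":
    print(rbind_copy(SOURCE, {"C"}))
```

With ``C`` unbindable, the copy contains only ``A'``, ``B'``, ``D'`` and
``E'``, matching the tree shown at the new location.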
d) Move semantics

   Consider the following command::

     mount --move A B/b

   where 'A' is the source mount, 'B' is the destination mount and 'b' is
   the dentry in the destination mount.

   The outcome depends on the type of the mount of 'A' and 'B'. The table
   below is a quick reference::

     --------------------------------------------------------------------------
     |                          MOVE MOUNT OPERATION                          |
     |************************************************************************|
     |source(A)->|   shared    |    private    |     slave      | unbindable  |
     | dest(B)   |             |               |                |             |
     |    |      |             |               |                |             |
     |    v      |             |               |                |             |
     |************************************************************************|
     |  shared   |   shared    |    shared     |shared and slave|   invalid   |
     |           |             |               |                |             |
     |non-shared |   shared    |    private    |     slave      | unbindable  |
     **************************************************************************

   .. Note:: moving a mount residing under a shared mount is invalid.

   Details follow:

   1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
      mounted on mount 'B' at dentry 'b'. Also, new mounts 'A1', 'A2'...'An'
      are created and mounted at dentry 'b' on all mounts that receive
      propagation from mount 'B'. A new propagation tree is created in the
      exact same configuration as that of 'B'. This new propagation tree
      contains all the new mounts 'A1', 'A2'... 'An'. And this new
      propagation tree is appended to the already existing propagation tree
      of 'A'.

   2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
      mounted on mount 'B' at dentry 'b'. Also, new mounts 'A1', 'A2'... 'An'
      are created and mounted at dentry 'b' on all mounts that receive
      propagation from mount 'B'. The mount 'A' becomes a shared mount and a
      propagation tree is created which is identical to that of
      'B'. This new propagation tree contains all the new mounts 'A1',
      'A2'... 'An'.

   3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
      mount 'A' is mounted on mount 'B' at dentry 'b'. Also, new mounts 'A1',
      'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
      receive propagation from mount 'B'. A new propagation tree is created
      in the exact same configuration as that of 'B'. This new propagation
      tree contains all the new mounts 'A1', 'A2'... 'An'. And this new
      propagation tree is appended to the already existing propagation tree of
      'A'. Mount 'A' continues to be the slave mount of 'Z' but it also
      becomes 'shared'.

   4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
      is invalid, because mounting anything on the shared mount 'B' can
      create new mounts that get mounted on the mounts that receive
      propagation from 'B'. And since the mount 'A' is unbindable, cloning
      it to mount at other mountpoints is not possible.

   5. 'A' is a private mount and 'B' is a non-shared (private or slave or
      unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

   6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
      is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
      shared mount.

   7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
      The mount 'A' is mounted on mount 'B' at dentry 'b'. Mount 'A'
      continues to be a slave mount of mount 'Z'.

   8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
      'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
      unbindable mount.

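The bind table in section 5b and the move table above differ only in how an
unbindable source is treated. The sketch below captures both quick-reference
tables in one function; ``resulting_state`` is a hypothetical name, and the
function covers only the four source states that appear as table columns.

```python
SOURCE_STATES = ("shared", "private", "slave", "unbindable")


def resulting_state(source_state, dest_shared, operation="bind"):
    """State of the new (bind) or relocated (move) mount, per the tables.

    source_state is one of SOURCE_STATES, dest_shared tells whether the
    destination mount 'B' is shared, operation is "bind" or "move".
    Raises ValueError where the table says "invalid".
    """
    if source_state not in SOURCE_STATES:
        raise ValueError("state not covered by the tables: %r" % source_state)
    if operation not in ("bind", "move"):
        raise ValueError("unknown operation: %r" % operation)
    if source_state == "unbindable":
        # An unbindable mount can never be cloned, so bind always fails and
        # move fails whenever the destination would require propagation.
        if operation == "move" and not dest_shared:
            return "unbindable"
        raise ValueError("invalid operation on an unbindable mount")
    if not dest_shared:
        return source_state
    if source_state == "slave":
        return "shared and slave"
    return "shared"


if __name__ == "__main__":
    print(resulting_state("slave", dest_shared=True))
    print(resulting_state("unbindable", dest_shared=False, operation="move"))
```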
e) Mount semantics

   Consider the following command::

     mount device B/b

   'B' is the destination mount and 'b' is the dentry in the destination
   mount.

   The above operation is the same as the bind operation, with the exception
   that the source mount is always a private mount.


f) Unmount semantics

   Consider the following command::

     umount A

   where 'A' is a mount mounted on mount 'B' at dentry 'b'.

   If mount 'B' is shared, then all most-recently-mounted mounts at dentry
   'b' on mounts that receive propagation from mount 'B' and do not have
   sub-mounts within them are unmounted.

   Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
   each other.

   Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
   'B1', 'B2' and 'B3' respectively.

   Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
   mount 'B1', 'B2' and 'B3' respectively.

   If 'C1' is unmounted, all the mounts that are most-recently-mounted on
   'B1' and on the mounts that 'B1' propagates to are unmounted.

   'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
   on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

   So all of 'C1', 'C2' and 'C3' should be unmounted.

   If any of 'C2' or 'C3' has some child mounts, then that mount is not
   unmounted, but all other mounts are unmounted. However, if 'C1' is told
   to be unmounted and 'C1' has some sub-mounts, the umount operation
   fails entirely.

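The B1/B2/B3 example can be replayed with a toy model. This is a sketch under
simplifying assumptions: every parent mount in ``stacks`` is taken to be in
one peer group, each parent keeps a list of the mounts stacked at dentry 'b'
(most recent last), and ``umount`` is a hypothetical function, not the
umount(2) system call.

```python
def umount(stacks, origin):
    """Unmount the most recent mount on origin and propagate to its peers.

    stacks maps a parent mount name to a list of (name, has_submounts)
    tuples, most recently mounted last. Returns the names unmounted.
    """
    name, has_submounts = stacks[origin][-1]
    if has_submounts:
        # The mount that was explicitly requested is busy: nothing happens.
        raise RuntimeError("%s has sub-mounts, umount fails entirely" % name)
    removed = []
    for stack in stacks.values():
        # On every peer, drop the most recent mount unless it is busy.
        if stack and not stack[-1][1]:
            removed.append(stack.pop()[0])
    return removed


if __name__ == "__main__":
    stacks = {
        "B1": [("A1", False), ("C1", False)],
        "B2": [("A2", False), ("C2", False)],
        "B3": [("A3", False), ("C3", True)],  # C3 has a child mount
    }
    print(umount(stacks, "B1"))
    print([name for name, _ in stacks["B3"]])
```

In this run 'C1' and 'C2' are unmounted while 'C3', which has a child mount,
stays in place.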
g) Clone Namespace

   A cloned namespace contains all the mounts of the parent
   namespace.

   Let's say 'A' and 'B' are the corresponding mounts in the parent and the
   child namespace.

   If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
   each other.

   If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
   'Z'.

   If 'A' is a private mount, then 'B' is a private mount too.

   If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


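The peer and master/slave relationships used throughout this section decide
which mounts see an event. The sketch below models them with two optional
attributes per mount and computes the set of receivers; the ``Mount`` class,
its fields and ``receivers`` are hypothetical modeling choices that only
mirror the definitions in section 5a.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mount:
    name: str
    peer_group: Optional[int] = None  # set when the mount is shared
    master: Optional[int] = None      # peer group this mount is a slave of


def receivers(origin, mounts):
    """Names of all mounts that receive an event generated on origin."""
    result = set()
    seen = set()
    pending = [origin.peer_group] if origin.peer_group is not None else []
    while pending:
        group = pending.pop()
        if group in seen:
            continue
        seen.add(group)
        for mount in mounts:
            if mount is origin:
                continue
            # Peers of the group and slaves of the group both receive.
            if mount.peer_group == group or mount.master == group:
                result.add(mount.name)
                # A 'shared and slave' mount forwards to its own group.
                if mount.peer_group is not None:
                    pending.append(mount.peer_group)
    return result


if __name__ == "__main__":
    mnt = Mount("/mnt", peer_group=1)
    clone = Mount("/mnt (child namespace)", peer_group=1)
    slave = Mount("/tmp", master=1)
    private = Mount("/opt")
    everything = [mnt, clone, slave, private]
    print(sorted(receivers(mnt, everything)))
    print(sorted(receivers(slave, everything)))
```

An event on the shared mount reaches its peer in the cloned namespace and its
slave, while an event on the slave or on the private mount reaches nobody.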
6) Quiz
-------

A. What is the result of the following command sequence?

   ::

       mount --bind /mnt /mnt
       mount --make-shared /mnt
       mount --bind /mnt /tmp
       mount --move /tmp /mnt/1

   What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
   Should they all be identical, or should only /mnt and /mnt/1 be
   identical?


B. What is the result of the following command sequence?

   ::

       mount --make-rshared /
       mkdir -p /v/1
       mount --rbind / /v/1

   What should the content of /v/1/v/1 be?


C. What is the result of the following command sequence?

   ::

       mount --bind /mnt /mnt
       mount --make-shared /mnt
       mkdir -p /mnt/1/2/3 /mnt/1/test
       mount --bind /mnt/1 /tmp
       mount --make-slave /mnt
       mount --make-shared /mnt
       mount --bind /mnt/1/2 /tmp1
       mount --make-slave /mnt

   At this point we have the first mount at /tmp and
   its root dentry is 1. Let's call this mount 'A'.
   Then we have a second mount at /tmp1 with root
   dentry 2. Let's call this mount 'B'.
   Next we have a third mount at /mnt with root dentry
   mnt. Let's call this mount 'C'.

   'B' is the slave of 'A' and 'C' is a slave of 'B':

   A -> B -> C

   At this point, if we execute the following command::

       mount --bind /bin /tmp/test

   the mount is attempted on 'A'.

   Will the mount propagate to 'B' and 'C'?

   What would the contents of /mnt/1/test be?

7) FAQ
------

1. Why is bind mount needed? How is it different from symbolic links?

   Symbolic links can get stale if the destination mount gets
   unmounted or moved. Bind mounts continue to exist even if the
   other mount is unmounted or moved.

2. Why can't the shared subtree be implemented using exportfs?

   exportfs is a heavyweight way of accomplishing part of what
   shared subtree can do. I cannot imagine a way to implement the
   semantics of slave mount using exportfs.

3. Why is unbindable mount needed?

   Let's say we want to replicate the mount tree at multiple
   locations within the same subtree.

   If one rbind mounts a tree within the same subtree 'n' times,
   the number of mounts created grows at least exponentially with 'n'.
   Having unbindable mounts can help prune the unneeded bind
   mounts. Here is an example.

   step 1:
      let's say the root tree has just two directories with
      one vfsmount::

                root
               /    \
             tmp    usr

      And we want to replicate the tree at multiple
      mountpoints under /root/tmp

   step 2:
      ::

        mount --make-shared /root

        mkdir -p /tmp/m1

        mount --rbind /root /tmp/m1

      the new tree now looks like this::

                root
               /    \
             tmp    usr
            /
           m1
          /  \
         tmp  usr
         /
        m1

      it has two vfsmounts

   step 3:
      ::

        mkdir -p /tmp/m2
        mount --rbind /root /tmp/m2

      the new tree now looks like this::

                     root
                    /    \
                  tmp     usr
                 /    \
               m1       m2
              / \       /  \
            tmp  usr   tmp  usr
            / \          /
           m1  m2      m1
               / \     /  \
             tmp usr  tmp   usr
             /        / \
            m1       m1  m2
           /  \
         tmp   usr
         /  \
        m1   m2

      it has 6 vfsmounts

   step 4:
      ::

        mkdir -p /tmp/m3
        mount --rbind /root /tmp/m3

      I won't draw the tree, but it has 24 vfsmounts.


   At step i the number of vfsmounts is V[i] = i*V[i-1].
   This grows factorially, which is even faster than an exponential
   function. And this tree has way more mounts than what we really
   needed in the first place.

   One could use a series of umounts at each step to prune
   out the unneeded mounts. But there is a better solution.
   Unbindable mounts come in handy here.

   step 1:
      let's say the root tree has just two directories with
      one vfsmount::

                root
               /    \
             tmp    usr

      How do we set up the same tree at multiple locations under
      /root/tmp?

   step 2:
      ::

        mount --bind /root/tmp /root/tmp

        mount --make-rshared /root
        mount --make-unbindable /root/tmp

        mkdir -p /tmp/m1

        mount --rbind /root /tmp/m1

      the new tree now looks like this::

                root
               /    \
             tmp    usr
             /
            m1
           /  \
         tmp  usr

   step 3:
      ::

        mkdir -p /tmp/m2
        mount --rbind /root /tmp/m2

      the new tree now looks like this::

                root
               /    \
             tmp    usr
            /   \
          m1     m2
         /  \   /  \
       tmp usr tmp usr

   step 4:
      ::

        mkdir -p /tmp/m3
        mount --rbind /root /tmp/m3

      the new tree now looks like this::

                           root
                         /      \
                      tmp        usr
                 /     |     \
             m1       m2       m3
            /  \     /  \     /  \
          tmp  usr tmp  usr tmp  usr

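The growth V[i] = i*V[i-1] quoted in the answer to question 3 is easy to
tabulate. The short sketch below simply evaluates that recurrence, starting
from the single vfsmount of step 1; ``vfsmount_counts`` is a hypothetical
helper name.

```python
def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for V[1] = 1 and V[i] = i * V[i-1]."""
    counts = [1]
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])
    return counts


if __name__ == "__main__":
    for step, count in enumerate(vfsmount_counts(8), start=1):
        print("step %d: %d vfsmounts" % (step, count))
```

The first four values are 1, 2, 6 and 24, matching the counts given for
steps 1 to 4 of the walk-through without unbindable mounts.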
8) Implementation
-----------------

A) Datastructure

   Several new fields are introduced to struct vfsmount:

   ->mnt_share
           Links together all the mounts to/from which this vfsmount
           sends/receives propagation events.

   ->mnt_slave_list
           Links all the mounts to which this vfsmount propagates.

   ->mnt_slave
           Links together all the slaves that its master vfsmount
           propagates to.

   ->mnt_master
           Points to the master vfsmount from which this vfsmount
           receives propagation.

   ->mnt_flags
           Takes two more flags to indicate the propagation status of
           the vfsmount. MNT_SHARE indicates that the vfsmount is a shared
           vfsmount. MNT_UNCLONABLE indicates that the vfsmount cannot be
           replicated.

   All the shared vfsmounts in a peer group form a cyclic list through
   ->mnt_share.

   All vfsmounts with the same ->mnt_master form a cyclic list anchored
   in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

   ->mnt_master can point to arbitrary (and possibly different) members
   of the master peer group. To find all immediate slaves of a peer group
   you need to go through _all_ ->mnt_slave_list of its members.
   Conceptually it's just a single set - distribution among the
   individual lists does not affect propagation or the way the propagation
   tree is modified by operations.

   All vfsmounts in a peer group have the same ->mnt_master. If it is
   non-NULL, they form a contiguous (ordered) segment of the slave list.

   An example propagation tree looks as shown in the figure below.

   .. note::
      Though it looks like a forest, if we consider all the shared
      mounts as a conceptual entity called 'pnode', it becomes a tree.

   ::


             A <--> B <--> C <---> D
            /|\            /|      |\
           / F G          J K      H I
          /
         E<-->K
             /|\
            M L N

   In the above figure A, B, C and D are all shared and propagate to each
   other. 'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
   mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
   'E' is also shared with 'K' and they propagate to each other. And
   'K' has 3 slaves 'M', 'L' and 'N'.

   A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'.

   A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'.

   E's ->mnt_share links with ->mnt_share of 'K'.

   'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'.

   'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'.

   K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'.

   C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'.

   'J' and 'K' have their ->mnt_master point to struct vfsmount of 'C'.

   And finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'.

   'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.


   NOTE: The propagation tree is orthogonal to the mount tree.

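The rule that the immediate slaves of a peer group are spread over the
->mnt_slave_list of all its members can be sketched as follows. This is a
simplified Python model, not the kernel data structure: the attribute names
echo the fields described above, but plain Python lists stand in for the
intrusive cyclic lists, and ``add_peer``, ``add_slave``, ``peers`` and
``immediate_slaves`` are hypothetical helpers.

```python
class Mount:
    def __init__(self, name):
        self.name = name
        self.mnt_share = self        # next peer; a lone mount points to itself
        self.mnt_slave_list = []     # slaves anchored at this member only
        self.mnt_master = None


def add_peer(member, new):
    """Insert new into the cyclic peer list right after member."""
    new.mnt_share = member.mnt_share
    member.mnt_share = new


def add_slave(master, slave):
    """Make slave receive propagation from master's peer group."""
    slave.mnt_master = master
    master.mnt_slave_list.append(slave)


def peers(mount):
    """Walk the cyclic ->mnt_share list once, starting at mount."""
    found = [mount]
    current = mount.mnt_share
    while current is not mount:
        found.append(current)
        current = current.mnt_share
    return found


def immediate_slaves(mount):
    """Collect slaves from the slave list of every member of the group."""
    return [slave for peer in peers(mount) for slave in peer.mnt_slave_list]


if __name__ == "__main__":
    a, b, f, h = Mount("A"), Mount("B"), Mount("F"), Mount("H")
    add_peer(a, b)      # A and B form one peer group
    add_slave(a, f)     # F hangs off A's list
    add_slave(b, h)     # H hangs off B's list
    print([m.name for m in immediate_slaves(a)])
```

Asking either 'A' or 'B' yields both 'F' and 'H', even though each slave is
anchored at a different member of the peer group.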
B) Locking:

   ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
   by namespace_sem (exclusive for modifications, shared for reading).

   Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
   There are two exceptions: do_add_mount() and clone_mnt().
   The former modifies a vfsmount that has not been visible in any shared
   data structures yet.
   The latter holds namespace_sem and the only references to vfsmount
   are in lists that can't be traversed without namespace_sem.

C) Algorithm:

   The crux of the implementation resides in the rbind/move operation.

   The overall algorithm breaks the operation into 3 phases (look at
   attach_recursive_mnt() and propagate_mnt()):

   1. Prepare phase.

      For each mount in the source tree:

      a) Create the necessary number of mount trees to
         be attached to each of the mounts that receive
         propagation from the destination mount.
      b) Do not attach any of the trees to its destination.
         However, note down its ->mnt_parent and ->mnt_mountpoint.
      c) Link all the new mounts to form a propagation tree that
         is identical to the propagation tree of the destination
         mount.

      If this phase is successful, there should be 'n' new
      propagation trees, where 'n' is the number of mounts in the
      source tree. Go to the commit phase.

      Also there should be 'm' new mount trees, where 'm' is
      the number of mounts to which the destination mount
      propagates.

      If any memory allocations fail, go to the abort phase.

   2. Commit phase.

      Attach each of the mount trees to their corresponding
      destination mounts.

   3. Abort phase.

      Delete all the newly created trees.

   .. Note::
      All the propagation-related functionality resides in the file pnode.c.


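The prepare/commit/abort structure can be shown in miniature. The sketch below
is an illustration of the all-or-nothing pattern only; it is not the code of
attach_recursive_mnt() or propagate_mnt(), and ``AllocationError``,
``attach_all_or_nothing`` and the ``clone`` callback are hypothetical names.

```python
class AllocationError(Exception):
    """Stands in for a failed memory allocation while cloning a mount."""


def attach_all_or_nothing(source_mounts, receivers, clone):
    """Clone the source tree once per receiver, then attach every copy.

    clone(mount, receiver) returns a new mount object or raises
    AllocationError. Returns {receiver: [cloned mounts]} on success and
    None if the operation had to be aborted.
    """
    # 1. Prepare: build every tree, but attach none of them yet.
    prepared = []
    try:
        for receiver in receivers:
            tree = [clone(mount, receiver) for mount in source_mounts]
            prepared.append((receiver, tree))
    except AllocationError:
        # 3. Abort: drop all newly created trees, nothing was attached.
        prepared.clear()
        return None
    # 2. Commit: attach each prepared tree to its destination.
    attached = {}
    for receiver, tree in prepared:
        attached[receiver] = tree
    return attached


if __name__ == "__main__":
    def clone(mount, receiver):
        return "%s@%s" % (mount, receiver)

    print(attach_all_or_nothing(["A", "B"], ["m1", "m2"], clone))
```

Because nothing is attached during the prepare phase, a failure part-way
through leaves the existing mount tree untouched.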
------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)