Documentation/admin-guide/cgroup-v1/hugetlb.rst at v6.19-rc4

==================
HugeTLB Controller
==================

The HugeTLB controller can be created by first mounting the cgroup filesystem::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial (parent) HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
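
Whether the running kernel offers the controller at all can be checked before
mounting. The following is a minimal sketch rather than part of any kernel
tooling: the helper name ``hugetlb_enabled`` is hypothetical, and it assumes
the usual four-column layout of /proc/cgroups (subsys_name, hierarchy,
num_cgroups, enabled).

```shell
# Hypothetical helper: succeed if the hugetlb controller is listed as enabled.
# $1 is the file to parse; it defaults to /proc/cgroups.
# Assumes the columns are: subsys_name hierarchy num_cgroups enabled.
hugetlb_enabled() {
    awk '$1 == "hugetlb" && $4 == 1 { found = 1 } END { exit !found }' \
        "${1:-/proc/cgroups}"
}

# Example:
#   hugetlb_enabled && echo "hugetlb controller is available"
```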

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.
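
The same steps can be wrapped in a small function for use in scripts. This is
only a sketch: the name ``hugetlb_group_add`` and its argument order are
assumptions made for illustration, not an existing interface.

```shell
# Hypothetical helper: create group $2 under hierarchy root $1 and move
# task $3 (default: the current shell) into it via the v1 "tasks" file.
hugetlb_group_add() {
    grp_dir="$1/$2"
    mkdir -p "$grp_dir" || return 1
    echo "${3:-$$}" > "$grp_dir/tasks"
}

# Example (as root, with the hierarchy mounted at /sys/fs/cgroup):
#   hugetlb_group_add /sys/fs/cgroup g1
```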

Brief summary of control files::

 hugetlb.<hugepagesize>.rsvd.limit_in_bytes            # set/show limit of "hugepagesize" hugetlb reservations
 hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes        # show max "hugepagesize" hugetlb reservations and no-reserve faults
 hugetlb.<hugepagesize>.rsvd.usage_in_bytes            # show current reservations and no-reserve faults for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.rsvd.failcnt                   # show the number of allocation failures due to HugeTLB reservation limit
 hugetlb.<hugepagesize>.limit_in_bytes                 # set/show limit of "hugepagesize" hugetlb faults
 hugetlb.<hugepagesize>.max_usage_in_bytes             # show max "hugepagesize" hugetlb usage recorded
 hugetlb.<hugepagesize>.usage_in_bytes                 # show current usage for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.failcnt                        # show the number of allocation failures due to HugeTLB usage limit
 hugetlb.<hugepagesize>.numa_stat                      # show the NUMA information of the hugetlb memory charged to this cgroup

For a system supporting three hugepage sizes (64k, 32M and 1G), the control
files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt
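
Because the size label is part of each file name, a script can derive the
sizes a machine exposes from the files themselves instead of hard-coding them.
The sketch below assumes the naming pattern shown in the listing above; the
helper name ``hugetlb_sizes`` is hypothetical.

```shell
# Hypothetical helper: print the hugepage size labels (e.g. 1GB, 64KB)
# found in cgroup directory $1, one per line.
hugetlb_sizes() {
    for f in "$1"/hugetlb.*.usage_in_bytes; do
        [ -e "$f" ] || continue            # glob matched nothing
        name=${f##*/}                      # strip the directory part
        name=${name#hugetlb.}              # strip the "hugetlb." prefix
        name=${name%.usage_in_bytes}       # strip the file suffix
        case $name in
            *.rsvd) continue ;;            # skip the reservation variant
        esac
        echo "$name"
    done
}

# Example:
#   hugetlb_sizes /sys/fs/cgroup/g1
```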


1. Page fault accounting

::

  hugetlb.<hugepagesize>.limit_in_bytes
  hugetlb.<hugepagesize>.max_usage_in_bytes
  hugetlb.<hugepagesize>.usage_in_bytes
  hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per
control group and enforces the limit during page fault. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that the application will get a SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. Therefore the application needs to know exactly how many
HugeTLB pages it uses beforehand, and the sysadmin needs to make sure that
there are enough available on the machine for all the users to avoid processes
getting SIGBUS.
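
Since the limit file takes a byte count while applications usually think in
pages, a limit is typically computed as pages times page size. The following is
a minimal sketch under that assumption; the helper name
``hugetlb_set_fault_limit`` and the 2MB size label in the example are
illustrative and depend on the sizes the system actually supports.

```shell
# Hypothetical helper: cap the page-fault usage of cgroup directory $1 at
# $3 huge pages of size label $2; $4 is that page size in bytes.
hugetlb_set_fault_limit() {
    echo $(( $3 * $4 )) > "$1/hugetlb.$2.limit_in_bytes"
}

# Example (as root): let g1 fault in at most 16 pages of 2 MiB each:
#   hugetlb_set_fault_limit /sys/fs/cgroup/g1 2MB 16 $((2 * 1024 * 1024))
```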


2. Reservation accounting

::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows users to limit the HugeTLB reservations per
control group and enforces the limit at reservation time and at the fault of
HugeTLB memory for which no reservation exists. Since reservation limits are
enforced at reservation time (on mmap or shmget), reservation limits never cause
the application to get a SIGBUS signal if the memory was reserved beforehand. For
MAP_NORESERVE allocations, the reservation limit behaves the same as the fault
limit, enforcing memory usage at fault time and causing the application to
receive a SIGBUS if it crosses its limit.

Reservation limits are superior to the page fault limits described above, since
reservation limits are enforced at reservation time (on mmap or shmget), and
never cause the application to get a SIGBUS signal if the memory was reserved
beforehand. A reservation that would exceed the limit makes the mmap or shmget
call fail with an error the application can handle, which allows for easier
fallback to alternatives such as non-HugeTLB memory. In the case of page fault
accounting, it's very hard to avoid processes getting SIGBUS since the sysadmin
needs to know precisely the HugeTLB usage of all the tasks in the system and
make sure there are enough pages to satisfy all requests. Avoiding tasks getting
SIGBUS on overcommitted systems is practically impossible with page fault
accounting.
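
How much reservation room a group has left can be read off the rsvd files as
limit minus usage. The sketch below illustrates this; the helper name
``hugetlb_rsvd_headroom`` is hypothetical, and the result is only a snapshot,
so the outcome of the actual mmap or shmget call remains authoritative.

```shell
# Hypothetical helper: print how many bytes of reservation are still
# available to cgroup directory $1 for size label $2
# (rsvd limit minus current rsvd usage).
hugetlb_rsvd_headroom() {
    rsvd_limit=$(cat "$1/hugetlb.$2.rsvd.limit_in_bytes") || return 1
    rsvd_usage=$(cat "$1/hugetlb.$2.rsvd.usage_in_bytes") || return 1
    echo $(( rsvd_limit - rsvd_usage ))
}

# Example:
#   hugetlb_rsvd_headroom /sys/fs/cgroup/g1 2MB
```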


3. Caveats with shared memory

For shared HugeTLB memory, both HugeTLB reservation and page faults are charged
to the first task that causes the memory to be reserved or faulted, and all
subsequent uses of this reserved or faulted memory are done without charging.

Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This is usually when the HugeTLB file is deleted, and not when the task that
caused the reservation or fault has exited.


4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are moved to the parent HugeTLB cgroup (reparented).
- The reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup gets offlined while there are still HugeTLB
reservations charged to it, that cgroup persists as a zombie until all HugeTLB
reservations are uncharged. HugeTLB reservations behave in this manner to match
the memory controller, whose cgroups also persist as zombies until all charged
memory is uncharged. Also, the tracking of HugeTLB reservations is a bit more
complex compared to the tracking of HugeTLB faults, so it is significantly
harder to reparent reservations at offline time.
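
Because reservation charges stay with an offlined group, it can be useful to
check for them before removing a group. The following sketch lists the rsvd
usage files that still report a non-zero charge; the helper name
``hugetlb_rsvd_charged`` is hypothetical and the check is only a snapshot.

```shell
# Hypothetical helper: print every rsvd usage file in cgroup directory $1
# that still reports a non-zero charge. No output means no reservations
# are currently charged to the group.
hugetlb_rsvd_charged() {
    for f in "$1"/hugetlb.*.rsvd.usage_in_bytes; do
        [ -e "$f" ] || continue            # glob matched nothing
        [ "$(cat "$f")" -eq 0 ] || echo "$f"
    done
    return 0
}

# Example (as root): remove g1 only if no reservations are charged to it:
#   [ -z "$(hugetlb_rsvd_charged /sys/fs/cgroup/g1)" ] && rmdir /sys/fs/cgroup/g1
```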