Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs: ntsync: Add documentation for the ntsync uAPI.

Add an overall explanation of the driver architecture, and complete and precise
specification for its intended behaviour.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-30-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Authored by Elizabeth Figura and committed by Greg Kroah-Hartman
6b695a75 79d42d9d

+386
Documentation/userspace-api/index.rst (+1)
@@ -63,6 +63,7 @@
    vduse
    futex2
    perf_ring_buffer
+   ntsync
 
 .. only:: subproject and html
 
Documentation/userspace-api/ntsync.rst (+385)
@@ -0,0 +1,385 @@
+===================================
+NT synchronization primitive driver
+===================================
+
+This page documents the user-space API for the ntsync driver.
+
+ntsync is a support driver for emulation of NT synchronization
+primitives by user-space NT emulators. It exists because implementation
+in user-space, using existing tools, cannot match Windows performance
+while offering accurate semantics. It is implemented entirely in
+software, and does not drive any hardware device.
+
+This interface is meant as a compatibility tool only, and should not
+be used for general synchronization. Instead use generic, versatile
+interfaces such as futex(2) and poll(2).
+
+Synchronization primitives
+==========================
+
+The ntsync driver exposes three types of synchronization primitives:
+semaphores, mutexes, and events.
+
+A semaphore holds a single volatile 32-bit counter, and a static 32-bit
+integer denoting the maximum value. It is considered signaled (that is,
+can be acquired without contention, or will wake up a waiting thread)
+when the counter is nonzero. The counter is decremented by one when a
+wait is satisfied. Both the initial and maximum count are established
+when the semaphore is created.
+
+A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit
+identifier denoting its owner. A mutex is considered signaled when its
+owner is zero (indicating that it is not owned). The recursion count is
+incremented when a wait is satisfied, and ownership is set to the given
+identifier.
+
+A mutex also holds an internal flag denoting whether its previous owner
+has died; such a mutex is said to be abandoned. Owner death is not
+tracked automatically based on thread death, but rather must be
+communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is
+inherently considered unowned.
+
+Except for the "unowned" semantics of zero, the actual value of the
+owner identifier is not interpreted by the ntsync driver at all. The
+intended use is to store a thread identifier; however, the ntsync
+driver does not actually validate that a calling thread provides
+consistent or unique identifiers.
+
+An event is similar to a semaphore with a maximum count of one. It holds
+a volatile boolean state denoting whether it is signaled or not. There
+are two types of events, auto-reset and manual-reset. An auto-reset
+event is designaled when a wait is satisfied; a manual-reset event is
+not. The event type is specified when the event is created.
+
+Unless specified otherwise, all operations on an object are atomic and
+totally ordered with respect to other operations on the same object.
+
+Objects are represented by files. When all file descriptors to an
+object are closed, that object is deleted.
+
+Char device
+===========
+
+The ntsync driver creates a single char device /dev/ntsync. Each file
+description opened on the device represents a unique instance intended
+to back an individual NT virtual machine. Objects created by one ntsync
+instance may only be used with other objects created by the same
+instance.
+
+ioctl reference
+===============
+
+All operations on the device are done through ioctls. There are four
+structures used in ioctl calls::
+
+   struct ntsync_sem_args {
+       __u32 count;
+       __u32 max;
+   };
+
+   struct ntsync_mutex_args {
+       __u32 owner;
+       __u32 count;
+   };
+
+   struct ntsync_event_args {
+       __u32 signaled;
+       __u32 manual;
+   };
+
+   struct ntsync_wait_args {
+       __u64 timeout;
+       __u64 objs;
+       __u32 count;
+       __u32 owner;
+       __u32 index;
+       __u32 alert;
+       __u32 flags;
+       __u32 pad;
+   };
+
+Depending on the ioctl, members of the structure may be used as input,
+output, or not at all.
+
+The ioctls on the device file are as follows:
+
+.. c:macro:: NTSYNC_IOC_CREATE_SEM
+
+  Create a semaphore object. Takes a pointer to struct
+  :c:type:`ntsync_sem_args`, which is used as follows:
+
+  .. list-table::
+
+     * - ``count``
+       - Initial count of the semaphore.
+     * - ``max``
+       - Maximum count of the semaphore.
+
+  Fails with ``EINVAL`` if ``count`` is greater than ``max``.
+  On success, returns a file descriptor for the created semaphore.
+
+.. c:macro:: NTSYNC_IOC_CREATE_MUTEX
+
+  Create a mutex object. Takes a pointer to struct
+  :c:type:`ntsync_mutex_args`, which is used as follows:
+
+  .. list-table::
+
+     * - ``count``
+       - Initial recursion count of the mutex.
+     * - ``owner``
+       - Initial owner of the mutex.
+
+  If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is
+  zero and ``count`` is nonzero, the function fails with ``EINVAL``.
+  On success, returns a file descriptor for the created mutex.
+
+.. c:macro:: NTSYNC_IOC_CREATE_EVENT
+
+  Create an event object. Takes a pointer to struct
+  :c:type:`ntsync_event_args`, which is used as follows:
+
+  .. list-table::
+
+     * - ``signaled``
+       - If nonzero, the event is initially signaled, otherwise
+         nonsignaled.
+     * - ``manual``
+       - If nonzero, the event is a manual-reset event, otherwise
+         auto-reset.
+
+  On success, returns a file descriptor for the created event.
+
+The ioctls on the individual objects are as follows:
+
+.. c:macro:: NTSYNC_IOC_SEM_POST
+
+  Post to a semaphore object. Takes a pointer to a 32-bit integer,
+  which on input holds the count to be added to the semaphore, and on
+  output contains its previous count.
+
+  If adding to the semaphore's current count would raise the latter
+  past the semaphore's maximum count, the ioctl fails with
+  ``EOVERFLOW`` and the semaphore is not affected. If raising the
+  semaphore's count causes it to become signaled, eligible threads
+  waiting on this semaphore will be woken and the semaphore's count
+  decremented appropriately.
+
+.. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK
+
+  Release a mutex object. Takes a pointer to struct
+  :c:type:`ntsync_mutex_args`, which is used as follows:
+
+  .. list-table::
+
+     * - ``owner``
+       - Specifies the owner trying to release this mutex.
+     * - ``count``
+       - On output, contains the previous recursion count.
+
+  If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner``
+  is not the current owner of the mutex, the ioctl fails with
+  ``EPERM``.
+
+  The mutex's count will be decremented by one. If decrementing the
+  mutex's count causes it to become zero, the mutex is marked as
+  unowned and signaled, and eligible threads waiting on it will be
+  woken as appropriate.
+
+.. c:macro:: NTSYNC_IOC_SET_EVENT
+
+  Signal an event object. Takes a pointer to a 32-bit integer, which on
+  output contains the previous state of the event.
+
+  Eligible threads will be woken, and auto-reset events will be
+  designaled appropriately.
+
+.. c:macro:: NTSYNC_IOC_RESET_EVENT
+
+  Designal an event object. Takes a pointer to a 32-bit integer, which
+  on output contains the previous state of the event.
+
+.. c:macro:: NTSYNC_IOC_PULSE_EVENT
+
+  Wake threads waiting on an event object while leaving it in an
+  unsignaled state. Takes a pointer to a 32-bit integer, which on
+  output contains the previous state of the event.
+
+  A pulse operation can be thought of as a set followed by a reset,
+  performed as a single atomic operation. If two threads are waiting on
+  an auto-reset event which is pulsed, only one will be woken. If two
+  threads are waiting on a manual-reset event which is pulsed, both will
+  be woken. However, in both cases, the event will be unsignaled
+  afterwards, and a simultaneous read operation will always report the
+  event as unsignaled.
+
+.. c:macro:: NTSYNC_IOC_READ_SEM
+
+  Read the current state of a semaphore object. Takes a pointer to
+  struct :c:type:`ntsync_sem_args`, which is used as follows:
+
+  .. list-table::
+
+     * - ``count``
+       - On output, contains the current count of the semaphore.
+     * - ``max``
+       - On output, contains the maximum count of the semaphore.
+
+.. c:macro:: NTSYNC_IOC_READ_MUTEX
+
+  Read the current state of a mutex object. Takes a pointer to struct
+  :c:type:`ntsync_mutex_args`, which is used as follows:
+
+  .. list-table::
+
+     * - ``owner``
+       - On output, contains the current owner of the mutex, or zero
+         if the mutex is not currently owned.
+     * - ``count``
+       - On output, contains the current recursion count of the mutex.
+
+  If the mutex is marked as abandoned, the function fails with
+  ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to
+  zero.
+
+.. c:macro:: NTSYNC_IOC_READ_EVENT
+
+  Read the current state of an event object. Takes a pointer to struct
+  :c:type:`ntsync_event_args`, which is used as follows:
+
+  .. list-table::
+
+     * - ``signaled``
+       - On output, contains the current state of the event.
+     * - ``manual``
+       - On output, contains 1 if the event is a manual-reset event,
+         and 0 otherwise.
+
+.. c:macro:: NTSYNC_IOC_KILL_OWNER
+
+  Mark a mutex as unowned and abandoned if it is owned by the given
+  owner. Takes an input-only pointer to a 32-bit integer denoting the
+  owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the
+  owner does not own the mutex, the function fails with ``EPERM``.
+
+  Eligible threads waiting on the mutex will be woken as appropriate
+  (and such waits will fail with ``EOWNERDEAD``, as described below).
+
+.. c:macro:: NTSYNC_IOC_WAIT_ANY
+
+  Poll on any of a list of objects, atomically acquiring at most one.
+  Takes a pointer to struct :c:type:`ntsync_wait_args`, which is
+  used as follows:
+
+  .. list-table::
+
+     * - ``timeout``
+       - Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME``
+         is set, the timeout is measured against the REALTIME clock;
+         otherwise it is measured against the MONOTONIC clock. If the
+         timeout is equal to or earlier than the current time, the
+         function returns immediately without sleeping. If ``timeout``
+         is U64_MAX, the function will sleep until an object is
+         signaled, and will not fail with ``ETIMEDOUT``.
+     * - ``objs``
+       - Pointer to an array of ``count`` file descriptors
+         (specified as an integer so that the structure has the same
+         size regardless of architecture). If any object is
+         invalid, the function fails with ``EINVAL``.
+     * - ``count``
+       - Number of objects specified in the ``objs`` array.
+         If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails
+         with ``EINVAL``.
+     * - ``owner``
+       - Mutex owner identifier. If any object in ``objs`` is a mutex,
+         the ioctl will attempt to acquire that mutex on behalf of
+         ``owner``. If ``owner`` is zero, the ioctl fails with
+         ``EINVAL``.
+     * - ``index``
+       - On success, contains the index (into ``objs``) of the object
+         which was signaled. If ``alert`` was signaled instead,
+         this contains ``count``.
+     * - ``alert``
+       - Optional event object file descriptor. If nonzero, this
+         specifies an "alert" event object which, if signaled, will
+         terminate the wait. If nonzero, the identifier must point to a
+         valid event.
+     * - ``flags``
+       - Zero or more flags. Currently the only flag is
+         ``NTSYNC_WAIT_REALTIME``, which causes the timeout to be
+         measured against the REALTIME clock instead of MONOTONIC.
+     * - ``pad``
+       - Unused, must be set to zero.
+
+  This function attempts to acquire one of the given objects. If unable
+  to do so, it sleeps until an object becomes signaled, subsequently
+  acquiring it, or the timeout expires. In the latter case the ioctl
+  fails with ``ETIMEDOUT``. The function only acquires one object, even
+  if multiple objects are signaled.
+
+  A semaphore is considered to be signaled if its count is nonzero, and
+  is acquired by decrementing its count by one. A mutex is considered
+  to be signaled if it is unowned or if its owner matches the ``owner``
+  argument, and is acquired by incrementing its recursion count by one
+  and setting its owner to the ``owner`` argument. An auto-reset event
+  is acquired by designaling it; a manual-reset event is not affected
+  by acquisition.
+
+  Acquisition is atomic and totally ordered with respect to other
+  operations on the same object. If two wait operations (with different
+  ``owner`` identifiers) are queued on the same mutex, only one is
+  signaled. If two wait operations are queued on the same semaphore,
+  and a value of one is posted to it, only one is signaled.
+
+  If an abandoned mutex is acquired, the ioctl fails with
+  ``EOWNERDEAD``. Although this is a failure return, the function may
+  otherwise be considered successful. The mutex is marked as owned by
+  the given owner (with a recursion count of 1) and as no longer
+  abandoned, and ``index`` is still set to the index of the mutex.
+
+  The ``alert`` argument is an "extra" event which can terminate the
+  wait, independently of all other objects.
+
+  It is valid to pass the same object more than once, including by
+  passing the same event in the ``objs`` array and in ``alert``. If a
+  wakeup occurs due to that object being signaled, ``index`` is set to
+  the lowest index corresponding to that object.
+
+  The function may fail with ``EINTR`` if a signal is received.
+
+.. c:macro:: NTSYNC_IOC_WAIT_ALL
+
+  Poll on a list of objects, atomically acquiring all of them. Takes a
+  pointer to struct :c:type:`ntsync_wait_args`, which is used
+  identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is
+  always filled with zero on success if not woken via alert.
+
+  This function attempts to simultaneously acquire all of the given
+  objects. If unable to do so, it sleeps until all objects become
+  simultaneously signaled, subsequently acquiring them, or the timeout
+  expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no
+  objects are modified.
+
+  Objects may become signaled and subsequently designaled (through
+  acquisition by other threads) while this thread is sleeping. Only
+  once all objects are simultaneously signaled does the ioctl acquire
+  them and return. The entire acquisition is atomic and totally ordered
+  with respect to other operations on any of the given objects.
+
+  If an abandoned mutex is acquired, the ioctl fails with
+  ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are
+  nevertheless marked as acquired. Note that if multiple mutex objects
+  are specified, there is no way to know which were marked as
+  abandoned.
+
+  As with "any" waits, the ``alert`` argument is an "extra" event which
+  can terminate the wait. Critically, however, an "all" wait will
+  succeed if all members in ``objs`` are signaled, *or* if ``alert`` is
+  signaled. In the latter case ``index`` will be set to ``count``. As
+  with "any" waits, if both conditions are filled, the former takes
+  priority, and objects in ``objs`` will be acquired.
+
+  Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same
+  object more than once, nor is it valid to pass the same object in
+  ``objs`` and in ``alert``. If this is attempted, the function fails
+  with ``EINVAL``.
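
The semaphore and mutex rules specified in the new ntsync.rst can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the driver's implementation: the `model_*` types and functions are hypothetical names invented here, nothing below talks to /dev/ntsync, `-EAGAIN` is a stand-in chosen for this sketch to mean "the real wait would sleep", and waking of waiters, abandonment, and recursion-count overflow are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical models of the documented object state (not driver types). */
struct model_sem   { uint32_t count, max; };
struct model_mutex { uint32_t owner, count; };

/* Models the documented NTSYNC_IOC_SEM_POST rule: adding past max fails
 * with EOVERFLOW and leaves the semaphore untouched; otherwise the
 * previous count is reported and the count is raised. */
int model_sem_post(struct model_sem *sem, uint32_t add, uint32_t *prev)
{
    assert(sem->count <= sem->max);      /* invariant set at creation */
    if (add > sem->max - sem->count)     /* overflow-safe count+add > max */
        return -EOVERFLOW;
    *prev = sem->count;
    sem->count += add;
    return 0;
}

/* Models the documented acquisition of a mutex by a wait: signaled if
 * unowned or already owned by `owner`; acquiring increments the
 * recursion count and sets the owner. */
int model_mutex_try_acquire(struct model_mutex *m, uint32_t owner)
{
    if (owner == 0)
        return -EINVAL;                  /* zero is reserved for "unowned" */
    if (m->owner != 0 && m->owner != owner)
        return -EAGAIN;                  /* sketch-only: real wait sleeps */
    m->owner = owner;
    m->count++;
    return 0;
}

/* Models the documented NTSYNC_IOC_MUTEX_UNLOCK rule: EINVAL for a zero
 * owner, EPERM for a non-owner; otherwise report the previous recursion
 * count, decrement it, and mark the mutex unowned when it reaches zero. */
int model_mutex_unlock(struct model_mutex *m, uint32_t owner, uint32_t *prev)
{
    if (owner == 0)
        return -EINVAL;
    if (m->owner != owner)
        return -EPERM;
    *prev = m->count;
    if (--m->count == 0)
        m->owner = 0;
    return 0;
}
```

For example, with a model mutex acquired twice by owner 7, an unlock attempt by owner 8 returns `-EPERM`, the first unlock by owner 7 reports a previous count of 2 and leaves the mutex owned, and the second reports 1 and leaves it unowned.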