Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

liveupdate: luo_session: add ioctls for file preservation

Introduce the userspace interface and internal logic required to manage
the lifecycle of file descriptors within a session. Previously, a session
was merely a container; this change makes it a functional management unit.

The following capabilities are added:

A new set of ioctl commands is added. They operate on the file
descriptor returned by CREATE_SESSION:
- LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session
  to be preserved across the live update.
- LIVEUPDATE_SESSION_RETRIEVE_FD: Retrieve a preserved file in the
  new kernel using its unique token.
- LIVEUPDATE_SESSION_FINISH: Signal that restoration of the session is
  complete, so the kernel can release the preserved resources.

The session's .release handler is enhanced to be state-aware. When a
session's file descriptor is closed, a retrieved (incoming) session is
finished, while an outgoing session has its files unpreserved, before
the session is removed and its resources are freed.

Link: https://lkml.kernel.org/r/20251125165850.3389713-8-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: David Matlack <dmatlack@google.com>
Cc: Aleksander Lobakin <aleksander.lobakin@intel.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: anish kumar <yesanishhere@gmail.com>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guixin Liu <kanie@linux.alibaba.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Parav Pandit <parav@nvidia.com>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: William Tu <witu@nvidia.com>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Pasha Tatashin and committed by Andrew Morton
16cec0d2 7c722a7f

+288 -2
+103
include/uapi/linux/liveupdate.h
···
 	LIVEUPDATE_CMD_RETRIEVE_SESSION = 0x01,
 };
 
+/* ioctl commands for session file descriptors */
+enum {
+	LIVEUPDATE_CMD_SESSION_BASE = 0x40,
+	LIVEUPDATE_CMD_SESSION_PRESERVE_FD = LIVEUPDATE_CMD_SESSION_BASE,
+	LIVEUPDATE_CMD_SESSION_RETRIEVE_FD = 0x41,
+	LIVEUPDATE_CMD_SESSION_FINISH = 0x42,
+};
+
 /**
  * struct liveupdate_ioctl_create_session - ioctl(LIVEUPDATE_IOCTL_CREATE_SESSION)
  * @size: Input; sizeof(struct liveupdate_ioctl_create_session)
···
 
 #define LIVEUPDATE_IOCTL_RETRIEVE_SESSION \
 	_IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_RETRIEVE_SESSION)
+
+/* Session specific IOCTLs */
+
+/**
+ * struct liveupdate_session_preserve_fd - ioctl(LIVEUPDATE_SESSION_PRESERVE_FD)
+ * @size: Input; sizeof(struct liveupdate_session_preserve_fd)
+ * @fd: Input; The user-space file descriptor to be preserved.
+ * @token: Input; An opaque, unique token for preserved resource.
+ *
+ * Holds parameters for preserving a file descriptor.
+ *
+ * User sets the @fd field identifying the file descriptor to preserve
+ * (e.g., memfd, kvm, iommufd, VFIO). The kernel validates if this FD type
+ * and its dependencies are supported for preservation. If validation passes,
+ * the kernel marks the FD internally and *initiates the process* of preparing
+ * its state for saving. The actual snapshotting of the state typically occurs
+ * during the subsequent %LIVEUPDATE_IOCTL_PREPARE execution phase, though
+ * some finalization might occur during freeze.
+ * On successful validation and initiation, the kernel uses the @token
+ * field with an opaque identifier representing the resource being preserved.
+ * This token confirms the FD is targeted for preservation and is required for
+ * the subsequent %LIVEUPDATE_SESSION_RETRIEVE_FD call after the live update.
+ *
+ * Return: 0 on success (validation passed, preservation initiated), negative
+ * error code on failure (e.g., unsupported FD type, dependency issue,
+ * validation failed).
+ */
+struct liveupdate_session_preserve_fd {
+	__u32		size;
+	__s32		fd;
+	__aligned_u64	token;
+};
+
+#define LIVEUPDATE_SESSION_PRESERVE_FD \
+	_IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_PRESERVE_FD)
+
+/**
+ * struct liveupdate_session_retrieve_fd - ioctl(LIVEUPDATE_SESSION_RETRIEVE_FD)
+ * @size: Input; sizeof(struct liveupdate_session_retrieve_fd)
+ * @fd: Output; The new file descriptor representing the fully restored
+ * kernel resource.
+ * @token: Input; An opaque, token that was used to preserve the resource.
+ *
+ * Retrieve a previously preserved file descriptor.
+ *
+ * User sets the @token field to the value obtained from a successful
+ * %LIVEUPDATE_IOCTL_FD_PRESERVE call before the live update. On success,
+ * the kernel restores the state (saved during the PREPARE/FREEZE phases)
+ * associated with the token and populates the @fd field with a new file
+ * descriptor referencing the restored resource in the current (new) kernel.
+ * This operation must be performed *before* signaling completion via
+ * %LIVEUPDATE_IOCTL_FINISH.
+ *
+ * Return: 0 on success, negative error code on failure (e.g., invalid token).
+ */
+struct liveupdate_session_retrieve_fd {
+	__u32		size;
+	__s32		fd;
+	__aligned_u64	token;
+};
+
+#define LIVEUPDATE_SESSION_RETRIEVE_FD \
+	_IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_RETRIEVE_FD)
+
+/**
+ * struct liveupdate_session_finish - ioctl(LIVEUPDATE_SESSION_FINISH)
+ * @size: Input; sizeof(struct liveupdate_session_finish)
+ * @reserved: Input; Must be zero. Reserved for future use.
+ *
+ * Signals the completion of the restoration process for a retrieved session.
+ * This is the final operation that should be performed on a session file
+ * descriptor after a live update.
+ *
+ * This ioctl must be called once all required file descriptors for the session
+ * have been successfully retrieved (using %LIVEUPDATE_SESSION_RETRIEVE_FD) and
+ * are fully restored from the userspace and kernel perspective.
+ *
+ * Upon success, the kernel releases its ownership of the preserved resources
+ * associated with this session. This allows internal resources to be freed,
+ * typically by decrementing reference counts on the underlying preserved
+ * objects.
+ *
+ * If this operation fails, the resources remain preserved in memory. Userspace
+ * may attempt to call finish again. The resources will otherwise be reset
+ * during the next live update cycle.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+struct liveupdate_session_finish {
+	__u32		size;
+	__u32		reserved;
+};
+
+#define LIVEUPDATE_SESSION_FINISH \
+	_IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_FINISH)
 
 #endif /* _UAPI_LIVEUPDATE_H */
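Each session ioctl argument in this header starts with a @size field that the caller sets to sizeof() of the struct. As a rough illustration of how userspace might fill the preserve argument, the sketch below mirrors struct liveupdate_session_preserve_fd locally so that it compiles without the new UAPI header. The names preserve_fd_arg and make_preserve_arg are hypothetical and not part of the patch; real code would include <linux/liveupdate.h> and use the kernel's own struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of struct liveupdate_session_preserve_fd, for illustration
 * only. The aligned attribute stands in for the kernel's __aligned_u64.
 */
struct preserve_fd_arg {
	uint32_t size;
	int32_t  fd;
	uint64_t token __attribute__((aligned(8)));
};

/* Two 32-bit fields followed by an 8-byte aligned u64 leave no padding. */
static_assert(sizeof(struct preserve_fd_arg) == 16,
	      "mirror struct is expected to be 16 bytes");

/*
 * Hypothetical helper: build the argument for preserving one file
 * descriptor under a caller-chosen token. Zeroing the struct first keeps
 * any bytes the caller does not set at zero.
 */
static struct preserve_fd_arg make_preserve_arg(int fd, uint64_t token)
{
	struct preserve_fd_arg arg;

	memset(&arg, 0, sizeof(arg));
	arg.size = sizeof(arg);	/* kernel reads this first to learn the size */
	arg.fd = fd;
	arg.token = token;
	return arg;
}

/*
 * With the real header, the call on a session file descriptor would be:
 *	ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg);
 */
```

The same token would later be placed in the retrieve argument in the new kernel, where the @fd field becomes an output.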
+185 -2
kernel/liveupdate/luo_session.c
···
 		return ERR_PTR(-ENOMEM);
 
 	strscpy(session->name, name, sizeof(session->name));
+	INIT_LIST_HEAD(&session->file_set.files_list);
+	luo_file_set_init(&session->file_set);
 	INIT_LIST_HEAD(&session->list);
 	mutex_init(&session->mutex);
 
···
 
 static void luo_session_free(struct luo_session *session)
 {
+	luo_file_set_destroy(&session->file_set);
 	mutex_destroy(&session->mutex);
 	kfree(session);
 }
···
 	sh->count--;
 }
 
+static int luo_session_finish_one(struct luo_session *session)
+{
+	guard(mutex)(&session->mutex);
+	return luo_file_finish(&session->file_set);
+}
+
+static void luo_session_unfreeze_one(struct luo_session *session,
+				     struct luo_session_ser *ser)
+{
+	guard(mutex)(&session->mutex);
+	luo_file_unfreeze(&session->file_set, &ser->file_set_ser);
+}
+
+static int luo_session_freeze_one(struct luo_session *session,
+				  struct luo_session_ser *ser)
+{
+	guard(mutex)(&session->mutex);
+	return luo_file_freeze(&session->file_set, &ser->file_set_ser);
+}
+
 static int luo_session_release(struct inode *inodep, struct file *filep)
 {
 	struct luo_session *session = filep->private_data;
 	struct luo_session_header *sh;
 
 	/* If retrieved is set, it means this session is from incoming list */
-	if (session->retrieved)
+	if (session->retrieved) {
+		int err = luo_session_finish_one(session);
+
+		if (err) {
+			pr_warn("Unable to finish session [%s] on release\n",
+				session->name);
+			return err;
+		}
 		sh = &luo_session_global.incoming;
-	else
+	} else {
+		scoped_guard(mutex, &session->mutex)
+			luo_file_unpreserve_files(&session->file_set);
 		sh = &luo_session_global.outgoing;
+	}
 
 	luo_session_remove(sh, session);
 	luo_session_free(session);
···
 	return 0;
 }
 
+static int luo_session_preserve_fd(struct luo_session *session,
+				   struct luo_ucmd *ucmd)
+{
+	struct liveupdate_session_preserve_fd *argp = ucmd->cmd;
+	int err;
+
+	guard(mutex)(&session->mutex);
+	err = luo_preserve_file(&session->file_set, argp->token, argp->fd);
+	if (err)
+		return err;
+
+	err = luo_ucmd_respond(ucmd, sizeof(*argp));
+	if (err)
+		pr_warn("The file was successfully preserved, but response to user failed\n");
+
+	return err;
+}
+
+static int luo_session_retrieve_fd(struct luo_session *session,
+				   struct luo_ucmd *ucmd)
+{
+	struct liveupdate_session_retrieve_fd *argp = ucmd->cmd;
+	struct file *file;
+	int err;
+
+	argp->fd = get_unused_fd_flags(O_CLOEXEC);
+	if (argp->fd < 0)
+		return argp->fd;
+
+	guard(mutex)(&session->mutex);
+	err = luo_retrieve_file(&session->file_set, argp->token, &file);
+	if (err < 0)
+		goto err_put_fd;
+
+	err = luo_ucmd_respond(ucmd, sizeof(*argp));
+	if (err)
+		goto err_put_file;
+
+	fd_install(argp->fd, file);
+
+	return 0;
+
+err_put_file:
+	fput(file);
+err_put_fd:
+	put_unused_fd(argp->fd);
+
+	return err;
+}
+
+static int luo_session_finish(struct luo_session *session,
+			      struct luo_ucmd *ucmd)
+{
+	struct liveupdate_session_finish *argp = ucmd->cmd;
+	int err = luo_session_finish_one(session);
+
+	if (err)
+		return err;
+
+	return luo_ucmd_respond(ucmd, sizeof(*argp));
+}
+
+union ucmd_buffer {
+	struct liveupdate_session_finish finish;
+	struct liveupdate_session_preserve_fd preserve;
+	struct liveupdate_session_retrieve_fd retrieve;
+};
+
+struct luo_ioctl_op {
+	unsigned int size;
+	unsigned int min_size;
+	unsigned int ioctl_num;
+	int (*execute)(struct luo_session *session, struct luo_ucmd *ucmd);
+};
+
+#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
+	[_IOC_NR(_ioctl) - LIVEUPDATE_CMD_SESSION_BASE] = { \
+		.size = sizeof(_struct) + \
+			BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
+					  sizeof(_struct)), \
+		.min_size = offsetofend(_struct, _last), \
+		.ioctl_num = _ioctl, \
+		.execute = _fn, \
+	}
+
+static const struct luo_ioctl_op luo_session_ioctl_ops[] = {
+	IOCTL_OP(LIVEUPDATE_SESSION_FINISH, luo_session_finish,
+		 struct liveupdate_session_finish, reserved),
+	IOCTL_OP(LIVEUPDATE_SESSION_PRESERVE_FD, luo_session_preserve_fd,
+		 struct liveupdate_session_preserve_fd, token),
+	IOCTL_OP(LIVEUPDATE_SESSION_RETRIEVE_FD, luo_session_retrieve_fd,
+		 struct liveupdate_session_retrieve_fd, token),
+};
+
+static long luo_session_ioctl(struct file *filep, unsigned int cmd,
+			      unsigned long arg)
+{
+	struct luo_session *session = filep->private_data;
+	const struct luo_ioctl_op *op;
+	struct luo_ucmd ucmd = {};
+	union ucmd_buffer buf;
+	unsigned int nr;
+	int ret;
+
+	nr = _IOC_NR(cmd);
+	if (nr < LIVEUPDATE_CMD_SESSION_BASE || (nr - LIVEUPDATE_CMD_SESSION_BASE) >=
+	    ARRAY_SIZE(luo_session_ioctl_ops)) {
+		return -EINVAL;
+	}
+
+	ucmd.ubuffer = (void __user *)arg;
+	ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
+	if (ret)
+		return ret;
+
+	op = &luo_session_ioctl_ops[nr - LIVEUPDATE_CMD_SESSION_BASE];
+	if (op->ioctl_num != cmd)
+		return -ENOIOCTLCMD;
+	if (ucmd.user_size < op->min_size)
+		return -EINVAL;
+
+	ucmd.cmd = &buf;
+	ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
+				    ucmd.user_size);
+	if (ret)
+		return ret;
+
+	return op->execute(session, &ucmd);
+}
+
 static const struct file_operations luo_session_fops = {
 	.owner = THIS_MODULE,
 	.release = luo_session_release,
+	.unlocked_ioctl = luo_session_ioctl,
 };
 
 /* Create a "struct file" for session */
···
 			luo_session_free(session);
 			return err;
 		}
+
+		scoped_guard(mutex, &session->mutex) {
+			luo_file_deserialize(&session->file_set,
+					     &sh->ser[i].file_set_ser);
+		}
 	}
 
 	kho_restore_free(sh->header_ser);
···
 	struct luo_session_header *sh = &luo_session_global.outgoing;
 	struct luo_session *session;
 	int i = 0;
+	int err;
 
 	guard(rwsem_write)(&sh->rwsem);
 	list_for_each_entry(session, &sh->list, list) {
+		err = luo_session_freeze_one(session, &sh->ser[i]);
+		if (err)
+			goto err_undo;
+
 		strscpy(sh->ser[i].name, session->name,
 			sizeof(sh->ser[i].name));
 		i++;
···
 	sh->header_ser->count = sh->count;
 
 	return 0;
+
+err_undo:
+	list_for_each_entry_continue_reverse(session, &sh->list, list) {
+		i--;
+		luo_session_unfreeze_one(session, &sh->ser[i]);
+		memset(sh->ser[i].name, 0, sizeof(sh->ser[i].name));
+	}
+
+	return err;
 }
 
 /**
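The size handling in luo_session_ioctl() can be hard to follow from the diff alone: the caller's claimed size is read first, rejected if it is smaller than the command's min_size, and then passed to copy_struct_from_user(). The sketch below is a userspace model of those checks for the finish command only. It is an approximation written for illustration, not the kernel code: finish_arg, OFFSETOFEND, model_copy_struct and model_finish_check are hypothetical names, and the copy helper only imitates the documented behaviour of copy_struct_from_user() (zero-fill a shorter input, reject a longer input whose extra bytes are not zero).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct liveupdate_session_finish, for illustration. */
struct finish_arg {
	uint32_t size;
	uint32_t reserved;
};

/* Same idea as the kernel's offsetofend(): offset just past a member. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

static_assert(OFFSETOFEND(struct finish_arg, reserved) == 8,
	      "min_size of the finish command is expected to be 8 bytes");

/* Hypothetical model of copy_struct_from_user() on plain memory. */
static int model_copy_struct(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;
	const unsigned char *p = src;

	/* A larger struct is accepted only if the unknown tail is zero. */
	for (size_t i = ksize; i < usize; i++)
		if (p[i])
			return -E2BIG;

	memcpy(dst, src, n);
	/* A smaller struct is padded with zeroes up to the known size. */
	memset((unsigned char *)dst + n, 0, ksize - n);
	return 0;
}

/*
 * Hypothetical model of the per-command checks: ubuf stands in for the
 * user buffer, whose first 32-bit word is the size claimed by the caller.
 */
static int model_finish_check(const void *ubuf, struct finish_arg *out)
{
	uint32_t user_size;

	memcpy(&user_size, ubuf, sizeof(user_size));
	if (user_size < OFFSETOFEND(struct finish_arg, reserved))
		return -EINVAL;

	return model_copy_struct(out, sizeof(*out), ubuf, user_size);
}
```

In this model a caller built against a larger, future version of the struct still succeeds as long as the fields unknown to the running kernel are zero, which appears to be the purpose of pairing min_size with copy_struct_from_user() in the patch.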