Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfio: Extend the device migration protocol with PRE_COPY

The optional PRE_COPY states open the saving data transfer FD before
reaching STOP_COPY and allow the device to dirty track internal state
changes, with the general idea of reducing the volume of data transferred
in the STOP_COPY stage.

While in PRE_COPY the device remains RUNNING, but the saving FD is open.

Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P,
which halts P2P transfers while continuing the saving FD.

PRE_COPY, with P2P support, requires the driver to implement 7 new arcs
and exists as an optional FSM branch between RUNNING and STOP_COPY:
RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY

A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to
query the progress of the precopy operation in the driver. The idea is
that userspace moves to STOP_COPY no earlier than once the initial data
set has been transferred, and possibly only after the dirty size has
shrunk appropriately.

This ioctl is valid only in the PRE_COPY states; the kernel driver should
return -EINVAL in any other migration state.

Compared to the v1 clarification, STOP_COPY -> PRE_COPY is blocked, with
its behavior to be defined in the future.
We also split the pending_bytes report into initial and sustaining
values, i.e. initial_bytes and dirty_bytes:
initial_bytes: Amount of initial precopy data.
dirty_bytes: Device state changes relative to data previously retrieved.
These fields are not required to have any bearing on the STOP_COPY phase.

It is recommended to leave PRE_COPY for STOP_COPY only after the
initial_bytes field reaches zero. Leaving PRE_COPY earlier might make
things slower.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20221206083438.37807-3-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Authored by Jason Gunthorpe and committed by Alex Williamson
4db52602 c943a937

+192 -7
+72 -2
drivers/vfio/vfio_main.c
···
 			     enum vfio_device_mig_state new_fsm,
 			     enum vfio_device_mig_state *next_fsm)
 {
-	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
+	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_PRE_COPY_P2P + 1 };
 	/*
 	 * The coding in this table requires the driver to implement the
 	 * following FSM arcs:
···
 	 *         RUNNING_P2P -> RUNNING
 	 *         RUNNING_P2P -> STOP
 	 *         STOP -> RUNNING_P2P
-	 * Without P2P the driver must implement:
+	 *
+	 * If precopy is supported then the driver must support these additional
+	 * FSM arcs:
+	 *         RUNNING -> PRE_COPY
+	 *         PRE_COPY -> RUNNING
+	 *         PRE_COPY -> STOP_COPY
+	 * However, if precopy and P2P are supported together then the driver
+	 * must support these additional arcs beyond the P2P arcs above:
+	 *         PRE_COPY -> RUNNING
+	 *         PRE_COPY -> PRE_COPY_P2P
+	 *         PRE_COPY_P2P -> PRE_COPY
+	 *         PRE_COPY_P2P -> RUNNING_P2P
+	 *         PRE_COPY_P2P -> STOP_COPY
+	 *         RUNNING -> PRE_COPY
+	 *         RUNNING_P2P -> PRE_COPY_P2P
+	 *
+	 * Without P2P and precopy the driver must implement:
 	 *         RUNNING -> STOP
 	 *         STOP -> RUNNING
 	 *
 	 * The coding will step through multiple states for some combination
 	 * transitions; if all optional features are supported, this means the
 	 * following ones:
+	 *         PRE_COPY -> PRE_COPY_P2P -> STOP_COPY
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP -> RESUMING
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> RUNNING
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> STOP
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> STOP -> RESUMING
 	 *         RESUMING -> STOP -> RUNNING_P2P
+	 *         RESUMING -> STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
+	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         RESUMING -> STOP -> STOP_COPY
+	 *         RUNNING -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         RUNNING -> RUNNING_P2P -> STOP
 	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
 	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
+	 *         RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         RUNNING_P2P -> STOP -> RESUMING
 	 *         RUNNING_P2P -> STOP -> STOP_COPY
+	 *         STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         STOP -> RUNNING_P2P -> RUNNING
+	 *         STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         STOP_COPY -> STOP -> RESUMING
 	 *         STOP_COPY -> STOP -> RUNNING_P2P
 	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
+	 *
+	 * The following transitions are blocked:
+	 *         STOP_COPY -> PRE_COPY
+	 *         STOP_COPY -> PRE_COPY_P2P
 	 */
 	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
···
 		[VFIO_DEVICE_STATE_RUNNING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
+		[VFIO_DEVICE_STATE_PRE_COPY] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
···
 		[VFIO_DEVICE_STATE_STOP_COPY] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
···
 		[VFIO_DEVICE_STATE_RESUMING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
···
 		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
···
 		[VFIO_DEVICE_STATE_ERROR] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
···
 	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY] =
+			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_MIGRATION_STOP_COPY |
+						   VFIO_MIGRATION_P2P |
+						   VFIO_MIGRATION_PRE_COPY,
 		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING_P2P] =
+120 -5
include/uapi/linux/vfio.h
···
  * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
  * is supported in addition to the STOP_COPY states.
  *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that
+ * PRE_COPY is supported in addition to the STOP_COPY states.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY
+ * means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported
+ * in addition to the STOP_COPY states.
+ *
  * Other combinations of flags have behavior to be defined in the future.
  */
 struct vfio_device_feature_migration {
 	__aligned_u64 flags;
 #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
 #define VFIO_MIGRATION_P2P		(1 << 1)
+#define VFIO_MIGRATION_PRE_COPY		(1 << 2)
 };
 #define VFIO_DEVICE_FEATURE_MIGRATION 1
 
···
  *  RESUMING - The device is stopped and is loading a new internal state
  *  ERROR - The device has failed and must be reset
  *
- * And 1 optional state to support VFIO_MIGRATION_P2P:
+ * And optional states to support VFIO_MIGRATION_P2P:
  *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ * And VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY - The device is running normally but tracking internal state
+ *             changes
+ * And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA
  *
  * The FSM takes actions on the arcs between FSM states. The driver implements
  * the following behavior for the FSM arcs:
···
  *
  *   To abort a RESUMING session the device must be reset.
  *
+ * PRE_COPY -> RUNNING
  * RUNNING_P2P -> RUNNING
  *   While in RUNNING the device is fully operational, the device may generate
  *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
  *   and the device may advance its internal state.
  *
+ *   The PRE_COPY arc will terminate a data transfer session.
+ *
+ * PRE_COPY_P2P -> RUNNING_P2P
  * RUNNING -> RUNNING_P2P
  * STOP -> RUNNING_P2P
  *   While in RUNNING_P2P the device is partially running in the P2P quiescent
  *   state defined below.
  *
- * STOP -> STOP_COPY
- *   This arc begin the process of saving the device state and will return a
- *   new data_fd.
+ *   The PRE_COPY_P2P arc will terminate a data transfer session.
  *
+ * RUNNING -> PRE_COPY
+ * RUNNING_P2P -> PRE_COPY_P2P
+ * STOP -> STOP_COPY
+ *   PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
+ *   which share a data transfer session. Moving between these states alters
+ *   what is streamed in session, but does not terminate or otherwise affect
+ *   the associated fd.
+ *
+ *   These arcs begin the process of saving the device state and will return a
+ *   new data_fd. The migration driver may perform actions such as enabling
+ *   dirty logging of device state when entering PRE_COPY or PER_COPY_P2P.
+ *
+ *   Each arc does not change the device operation, the device remains
+ *   RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
+ *   in PRE_COPY_P2P -> STOP_COPY.
+ *
+ * PRE_COPY -> PRE_COPY_P2P
+ *   Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above.
+ *   However, while in the PRE_COPY_P2P state, the device is partially running
+ *   in the P2P quiescent state defined below, like RUNNING_P2P.
+ *
+ * PRE_COPY_P2P -> PRE_COPY
+ *   This arc allows returning the device to a full RUNNING behavior while
+ *   continuing all the behaviors of PRE_COPY.
+ *
+ * PRE_COPY_P2P -> STOP_COPY
  *   While in the STOP_COPY state the device has the same behavior as STOP
  *   with the addition that the data transfers session continues to stream the
  *   migration state. End of stream on the FD indicates the entire device
···
  *   device state for this arc if required to prepare the device to receive the
  *   migration data.
  *
+ * STOP_COPY -> PRE_COPY
+ * STOP_COPY -> PRE_COPY_P2P
+ *   These arcs are not permitted and return error if requested. Future
+ *   revisions of this API may define behaviors for these arcs, in this case
+ *   support will be discoverable by a new flag in
+ *   VFIO_DEVICE_FEATURE_MIGRATION.
+ *
  * any -> ERROR
  *   ERROR cannot be specified as a device state, however any transition request
  *   can be failed with an errno return and may then move the device_state into
···
  * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
  * state for the device for the purposes of managing multiple devices within a
  * user context where peer-to-peer DMA between devices may be active. The
- * RUNNING_P2P states must prevent the device from initiating
+ * RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating
  * any new P2P DMA transactions. If the device can identify P2P transactions
  * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
  * driver must complete any such outstanding operations prior to completing the
···
  * above FSM arcs. As there are multiple paths through the FSM arcs the path
  * should be selected based on the following rules:
  *   - Select the shortest path.
+ *   - The path cannot have saving group states as interior arcs, only
+ *     starting/end states.
  * Refer to vfio_mig_get_next_state() for the result of the algorithm.
  *
  * The automatic transit through the FSM arcs that make up the combination
···
  * support them. The user can discover if these states are supported by using
  * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
  * avoid knowing about these optional states if the kernel driver supports them.
+ *
+ * Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY
+ * is not present.
  */
 enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_ERROR = 0,
···
 	VFIO_DEVICE_STATE_STOP_COPY = 3,
 	VFIO_DEVICE_STATE_RESUMING = 4,
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+	VFIO_DEVICE_STATE_PRE_COPY = 6,
+	VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
 };
+
+/**
+ * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
+ *
+ * This ioctl is used on the migration data FD in the precopy phase of the
+ * migration data transfer. It returns an estimate of the current data sizes
+ * remaining to be transferred. It allows the user to judge when it is
+ * appropriate to leave PRE_COPY for STOP_COPY.
+ *
+ * This ioctl is valid only in PRE_COPY states and kernel driver should
+ * return -EINVAL from any other migration state.
+ *
+ * The vfio_precopy_info data structure returned by this ioctl provides
+ * estimates of data available from the device during the PRE_COPY states.
+ * This estimate is split into two categories, initial_bytes and
+ * dirty_bytes.
+ *
+ * The initial_bytes field indicates the amount of initial precopy
+ * data available from the device. This field should have a non-zero initial
+ * value and decrease as migration data is read from the device.
+ * It is recommended to leave PRE_COPY for STOP_COPY only after this field
+ * reaches zero. Leaving PRE_COPY earlier might make things slower.
+ *
+ * The dirty_bytes field tracks device state changes relative to data
+ * previously retrieved.  This field starts at zero and may increase as
+ * the internal device state is modified or decrease as that modified
+ * state is read from the device.
+ *
+ * Userspace may use the combination of these fields to estimate the
+ * potential data size available during the PRE_COPY phases, as well as
+ * trends relative to the rate the device is dirtying its internal
+ * state, but these fields are not required to have any bearing relative
+ * to the data size available during the STOP_COPY phase.
+ *
+ * Drivers have a lot of flexibility in when and what they transfer during the
+ * PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO.
+ *
+ * During pre-copy the migration data FD has a temporary "end of stream" that is
+ * reached when both initial_bytes and dirty_byte are zero. For instance, this
+ * may indicate that the device is idle and not currently dirtying any internal
+ * state. When read() is done on this temporary end of stream the kernel driver
+ * should return ENOMSG from read(). Userspace can wait for more data (which may
+ * never come) by using poll.
+ *
+ * Once in STOP_COPY the migration data FD has a permanent end of stream
+ * signaled in the usual way by read() always returning 0 and poll always
+ * returning readable. ENOMSG may not be returned in STOP_COPY.
+ * Support for this ioctl is mandatory if a driver claims to support
+ * VFIO_MIGRATION_PRE_COPY.
+ *
+ * Return: 0 on success, -1 and errno set on failure.
+ */
+struct vfio_precopy_info {
+	__u32 argsz;
+	__u32 flags;
+	__aligned_u64 initial_bytes;
+	__aligned_u64 dirty_bytes;
+};
+
+#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
 
 /*
  * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power