Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

objtool: Support stack layout changes in alternatives

The ORC unwinder showed a warning [1] which revealed the stack layout
didn't match what was expected. The problem was that paravirt patching
had replaced "CALL *pv_ops.irq.save_fl" with "PUSHF;POP". That changed
the stack layout between the PUSHF and the POP, so unwinding from an
interrupt which occurred between those two instructions would fail.
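The failure mode can be shown with a toy C model. It is a hypothetical sketch, not kernel or objtool code: toy_stack, toy_push(), toy_pop() and toy_unwind_ret() are invented names, and real ORC records a CFA offset rather than a count of words. A table-driven unwinder looks up how many 8-byte words sit above the return address at the interrupted instruction and trusts that number. The entry covering the original CALL site says zero, which stops being true once PUSHF has executed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy stack: word[sp] is the slot %rsp points at; pushes move sp down. */
#define STACK_WORDS 8

struct toy_stack {
	uint64_t word[STACK_WORDS];
	int sp;
};

static void toy_push(struct toy_stack *s, uint64_t val)
{
	assert(s->sp > 0);
	s->word[--s->sp] = val;
}

static uint64_t toy_pop(struct toy_stack *s)
{
	assert(s->sp < STACK_WORDS);
	return s->word[s->sp++];
}

/*
 * Toy table-driven unwinder: words_above_ret comes from a static table
 * entry for the interrupted address and is trusted blindly.
 */
static uint64_t toy_unwind_ret(const struct toy_stack *s, int words_above_ret)
{
	assert(s->sp + words_above_ret < STACK_WORDS);
	return s->word[s->sp + words_above_ret];
}
```

For example, after pushing a return address and then a flags value (0x246 is only an illustrative RFLAGS value), toy_unwind_ret(&s, 0), which is what the entry for the unpatched CALL site would say, returns the flags instead of the return address. Only an entry that accounts for the extra word, toy_unwind_ret(&s, 1), finds the right slot.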

Part of the agreed-upon solution was to rework the custom paravirt
patching code to use alternatives instead, since objtool already knows
how to read alternatives (and converging runtime patching infrastructure
is always a good thing anyway). But the main problem still remains,
which is that runtime patching can change the stack layout.

Making stack layout changes in alternatives was disallowed with commit
7117f16bf460 ("objtool: Fix ORC vs alternatives"), but now that paravirt
is going to be doing it, it needs to be supported.

One way to do so would be to modify the ORC table when the code gets
patched. But ORC is simple -- a good thing! -- and it's best to leave
it alone.

Instead, support stack layout changes by "flattening" all possible stack
states (CFI) from parallel alternative code streams into a single set of
linear states. The only necessary limitation is that CFI conflicts are
disallowed at all possible instruction boundaries.

For example, this scenario is allowed:

          Alt1                    Alt2                Alt3

   0x00   CALL *pv_ops.save_fl    CALL xen_save_fl    PUSHF
   0x01                                               POP %RAX
   0x02                                               NOP
   ...
   0x05                           NOP
   ...
   0x07   <insn>

The unwind information for offset-0x00 is identical for all 3
alternatives. Similarly, offset-0x05 and higher are also identical (and
the same as 0x00). Offset-0x01 has deviating CFI, but that is only
relevant for Alt3; neither of the other alternative instruction
streams will ever hit that offset.

This scenario is NOT allowed:

          Alt1                    Alt2

   0x00   CALL *pv_ops.save_fl    PUSHF
   0x01                           NOP6
   ...
   0x07   NOP                     POP %RAX

The problem here is that offset-0x07, which is an instruction boundary in
both possible instruction patch streams, has two conflicting stack
layouts.

[ The above examples were stolen from Peter Zijlstra. ]
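The flattening rule and both scenarios can be checked with a small C sketch. It is a hypothetical toy model, not objtool's code: CFI is reduced to a single number (bytes pushed relative to the start of the alternative), a CALL counts as a net change of zero because the callee returns before the next boundary, and the instruction lengths are assumptions read off the offsets in the examples (a 7-byte indirect CALL, a 5-byte filler NOP in Alt3, and so on). flatten_stream() records each stream's state at every instruction boundary in one shared byte-offset-addressed array, loosely mirroring propagate_alt_cfi(), and returns the first offset where two streams disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define MAX_ALT_LEN 16

/* One instruction in a hypothetical toy code stream. */
struct toy_insn {
	int len;       /* length in bytes */
	int sp_delta;  /* net stack change at the next boundary (CALL: 0) */
};

/* Shared, byte-offset-addressed toy "CFI" array for one alternative. */
struct flat_cfi {
	int len;                  /* size of the alternative in bytes */
	bool set[MAX_ALT_LEN];    /* some stream has a boundary at offset i */
	int sp[MAX_ALT_LEN];      /* bytes pushed at offset i */
};

/* Record one state; returns off on conflict, -1 otherwise. */
static int record_cfi(struct flat_cfi *flat, int off, int sp)
{
	assert(off >= 0 && off < flat->len && off < MAX_ALT_LEN);
	if (!flat->set[off]) {
		flat->set[off] = true;
		flat->sp[off] = sp;
		return -1;
	}
	return flat->sp[off] == sp ? -1 : off;
}

/*
 * Record the state at every instruction boundary of one stream. A stream
 * shorter than the alternative gets a final record at its end, standing in
 * for the padding NOP (compare the fake nop in handle_group_alt()).
 */
static int flatten_stream(struct flat_cfi *flat, const struct toy_insn *insns,
			  size_t n)
{
	int off = 0, sp = 0;

	for (size_t i = 0; i < n; i++) {
		int bad = record_cfi(flat, off, sp);

		if (bad >= 0)
			return bad;
		off += insns[i].len;
		sp += insns[i].sp_delta;
	}
	return off < flat->len ? record_cfi(flat, off, sp) : -1;
}

/* Scenario 1 (allowed): a 7-byte alternative. */
static const struct toy_insn s1_alt1[] = { {7, 0} };                   /* CALL *pv_ops.save_fl */
static const struct toy_insn s1_alt2[] = { {5, 0}, {2, 0} };           /* CALL xen_save_fl; NOP */
static const struct toy_insn s1_alt3[] = { {1, 8}, {1, -8}, {5, 0} };  /* PUSHF; POP %RAX; NOP */

/* Scenario 2 (not allowed): an 8-byte alternative. */
static const struct toy_insn s2_alt1[] = { {7, 0}, {1, 0} };           /* CALL *pv_ops.save_fl; NOP */
static const struct toy_insn s2_alt2[] = { {1, 8}, {6, 0}, {1, -8} };  /* PUSHF; NOP6; POP %RAX */
```

Feeding the three streams of the first scenario into one flat_cfi with len 7 reports no conflict, and offset 1 ends up holding 8, the PUSHF state that only Alt3 ever sees. Feeding the second scenario into a flat_cfi with len 8 returns 7 as soon as Alt2's POP %RAX is recorded, which is the conflict described above.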

The new flattened CFI array is used both for the detection of conflicts
(like the second example above) and the generation of linear ORC
entries.

BTW, another benefit of these changes is that, thanks to some related
cleanups (new fake nops and alt_group struct), objtool can finally be rid
of fake jumps, which were a constant source of headaches.

[1] https://lkml.kernel.org/r/20201111170536.arx2zbn4ngvjoov7@treble

Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

+161 -113
+8 -6
tools/objtool/Documentation/stack-validation.txt
···
     function tracing inserts additional calls, which is not obvious from the
     sources).
 
-10. file.o: warning: func()+0x5c: alternative modifies stack
+10. file.o: warning: func()+0x5c: stack layout conflict in alternatives
 
-    This means that an alternative includes instructions that modify the
-    stack. The problem is that there is only one ORC unwind table, this means
-    that the ORC unwind entries must be valid for each of the alternatives.
-    The easiest way to enforce this is to ensure alternatives do not contain
-    any ORC entries, which in turn implies the above constraint.
+    This means that in the use of the alternative() or ALTERNATIVE()
+    macro, the code paths have conflicting modifications to the stack.
+    The problem is that there is only one ORC unwind table, which means
+    that the ORC unwind entries must be consistent for all possible
+    instruction boundaries regardless of which code has been patched.
+    This limitation can be overcome by massaging the alternatives with
+    NOPs to shift the stack changes around so they no longer conflict.
 
 11. file.o: warning: unannotated intra-function call
 
+99 -99
tools/objtool/check.c
···
 #include <linux/kernel.h>
 #include <linux/static_call_types.h>
 
-#define FAKE_JUMP_OFFSET -1
-
 struct alternative {
 	struct list_head list;
 	struct instruction *insn;
···
 		if (!is_static_jump(insn))
 			continue;
 
-		if (insn->offset == FAKE_JUMP_OFFSET)
-			continue;
-
 		reloc = find_reloc_by_dest_range(file->elf, insn->sec,
 						 insn->offset, insn->len);
 		if (!reloc) {
···
 }
 
 /*
- * The .alternatives section requires some extra special care, over and above
- * what other special sections require:
- *
- * 1. Because alternatives are patched in-place, we need to insert a fake jump
- *    instruction at the end so that validate_branch() skips all the original
- *    replaced instructions when validating the new instruction path.
- *
- * 2. An added wrinkle is that the new instruction length might be zero. In
- *    that case the old instructions are replaced with noops. We simulate that
- *    by creating a fake jump as the only new instruction.
- *
- * 3. In some cases, the alternative section includes an instruction which
- *    conditionally jumps to the _end_ of the entry. We have to modify these
- *    jumps' destinations to point back to .text rather than the end of the
- *    entry in .altinstr_replacement.
+ * The .alternatives section requires some extra special care over and above
+ * other special sections because alternatives are patched in place.
  */
 static int handle_group_alt(struct objtool_file *file,
 			    struct special_alt *special_alt,
 			    struct instruction *orig_insn,
 			    struct instruction **new_insn)
 {
-	struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
+	struct instruction *last_orig_insn, *last_new_insn = NULL, *insn, *nop = NULL;
 	struct alt_group *orig_alt_group, *new_alt_group;
 	unsigned long dest_off;
 
···
 		WARN("malloc failed");
 		return -1;
 	}
+	orig_alt_group->cfi = calloc(special_alt->orig_len,
+				     sizeof(struct cfi_state *));
+	if (!orig_alt_group->cfi) {
+		WARN("calloc failed");
+		return -1;
+	}
+
 	last_orig_insn = NULL;
 	insn = orig_insn;
 	sec_for_each_insn_from(file, insn) {
···
 	orig_alt_group->first_insn = orig_insn;
 	orig_alt_group->last_insn = last_orig_insn;
 
-	if (next_insn_same_sec(file, last_orig_insn)) {
-		fake_jump = malloc(sizeof(*fake_jump));
-		if (!fake_jump) {
-			WARN("malloc failed");
-			return -1;
-		}
-		memset(fake_jump, 0, sizeof(*fake_jump));
-		INIT_LIST_HEAD(&fake_jump->alts);
-		INIT_LIST_HEAD(&fake_jump->stack_ops);
-		init_cfi_state(&fake_jump->cfi);
-
-		fake_jump->sec = special_alt->new_sec;
-		fake_jump->offset = FAKE_JUMP_OFFSET;
-		fake_jump->type = INSN_JUMP_UNCONDITIONAL;
-		fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
-		fake_jump->func = orig_insn->func;
-	}
-
-	if (!special_alt->new_len) {
-		if (!fake_jump) {
-			WARN("%s: empty alternative at end of section",
-			     special_alt->orig_sec->name);
-			return -1;
-		}
-
-		*new_insn = fake_jump;
-		return 0;
-	}
 
 	new_alt_group = malloc(sizeof(*new_alt_group));
 	if (!new_alt_group) {
···
 		return -1;
 	}
 
-	last_new_insn = NULL;
+	if (special_alt->new_len < special_alt->orig_len) {
+		/*
+		 * Insert a fake nop at the end to make the replacement
+		 * alt_group the same size as the original. This is needed to
+		 * allow propagate_alt_cfi() to do its magic. When the last
+		 * instruction affects the stack, the instruction after it (the
+		 * nop) will propagate the new state to the shared CFI array.
+		 */
+		nop = malloc(sizeof(*nop));
+		if (!nop) {
+			WARN("malloc failed");
+			return -1;
+		}
+		memset(nop, 0, sizeof(*nop));
+		INIT_LIST_HEAD(&nop->alts);
+		INIT_LIST_HEAD(&nop->stack_ops);
+		init_cfi_state(&nop->cfi);
+
+		nop->sec = special_alt->new_sec;
+		nop->offset = special_alt->new_off + special_alt->new_len;
+		nop->len = special_alt->orig_len - special_alt->new_len;
+		nop->type = INSN_NOP;
+		nop->func = orig_insn->func;
+		nop->alt_group = new_alt_group;
+		nop->ignore = orig_insn->ignore_alts;
+	}
+
+	if (!special_alt->new_len) {
+		*new_insn = nop;
+		goto end;
+	}
+
 	insn = *new_insn;
 	sec_for_each_insn_from(file, insn) {
 		struct reloc *alt_reloc;
···
 			continue;
 
 		dest_off = arch_jump_destination(insn);
-		if (dest_off == special_alt->new_off + special_alt->new_len) {
-			if (!fake_jump) {
-				WARN("%s: alternative jump to end of section",
-				     special_alt->orig_sec->name);
-				return -1;
-			}
-			insn->jump_dest = fake_jump;
-		}
+		if (dest_off == special_alt->new_off + special_alt->new_len)
+			insn->jump_dest = next_insn_same_sec(file, last_orig_insn);
 
 		if (!insn->jump_dest) {
 			WARN_FUNC("can't find alternative jump destination",
···
 		return -1;
 	}
 
+	if (nop)
+		list_add(&nop->list, &last_new_insn->list);
+end:
 	new_alt_group->orig_group = orig_alt_group;
 	new_alt_group->first_insn = *new_insn;
-	new_alt_group->last_insn = last_new_insn;
-
-	if (fake_jump)
-		list_add(&fake_jump->list, &last_new_insn->list);
-
+	new_alt_group->last_insn = nop ? : last_new_insn;
+	new_alt_group->cfi = orig_alt_group->cfi;
 	return 0;
 }
 
···
 	return 0;
 }
 
+/*
+ * The stack layouts of alternatives instructions can sometimes diverge when
+ * they have stack modifications. That's fine as long as the potential stack
+ * layouts don't conflict at any given potential instruction boundary.
+ *
+ * Flatten the CFIs of the different alternative code streams (both original
+ * and replacement) into a single shared CFI array which can be used to detect
+ * conflicts and nicely feed a linear array of ORC entries to the unwinder.
+ */
+static int propagate_alt_cfi(struct objtool_file *file, struct instruction *insn)
+{
+	struct cfi_state **alt_cfi;
+	int group_off;
+
+	if (!insn->alt_group)
+		return 0;
+
+	alt_cfi = insn->alt_group->cfi;
+	group_off = insn->offset - insn->alt_group->first_insn->offset;
+
+	if (!alt_cfi[group_off]) {
+		alt_cfi[group_off] = &insn->cfi;
+	} else {
+		if (memcmp(alt_cfi[group_off], &insn->cfi, sizeof(struct cfi_state))) {
+			WARN_FUNC("stack layout conflict in alternatives",
+				  insn->sec, insn->offset);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 static int handle_insn_ops(struct instruction *insn, struct insn_state *state)
 {
 	struct stack_op *op;
 
 	list_for_each_entry(op, &insn->stack_ops, list) {
-		struct cfi_state old_cfi = state->cfi;
-		int res;
 
-		res = update_cfi_state(insn, &state->cfi, op);
-		if (res)
-			return res;
-
-		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
-			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
-			return -1;
-		}
+		if (update_cfi_state(insn, &state->cfi, op))
+			return 1;
 
 		if (op->dest.type == OP_DEST_PUSHF) {
 			if (!state->uaccess_stack) {
···
 	return 0;
 }
 
-/*
- * Alternatives should not contain any ORC entries, this in turn means they
- * should not contain any CFI ops, which implies all instructions should have
- * the same same CFI state.
- *
- * It is possible to constuct alternatives that have unreachable holes that go
- * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
- * states which then results in ORC entries, which we just said we didn't want.
- *
- * Avoid them by copying the CFI entry of the first instruction into the whole
- * alternative.
- */
-static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
+static struct instruction *next_insn_to_validate(struct objtool_file *file,
+						 struct instruction *insn)
 {
-	struct instruction *first_insn = insn;
 	struct alt_group *alt_group = insn->alt_group;
 
-	sec_for_each_insn_continue(file, insn) {
-		if (insn->alt_group != alt_group)
-			break;
-		insn->cfi = first_insn->cfi;
-	}
+	/*
+	 * Simulate the fact that alternatives are patched in-place. When the
+	 * end of a replacement alt_group is reached, redirect objtool flow to
+	 * the end of the original alt_group.
+	 */
+	if (alt_group && insn == alt_group->last_insn && alt_group->orig_group)
+		return next_insn_same_sec(file, alt_group->orig_group->last_insn);
+
+	return next_insn_same_sec(file, insn);
 }
 
 /*
···
 	sec = insn->sec;
 
 	while (1) {
-		next_insn = next_insn_same_sec(file, insn);
+		next_insn = next_insn_to_validate(file, insn);
 
 		if (file->c_file && func && insn->func && func != insn->func->pfunc) {
 			WARN("%s() falls through to next function %s()",
···
 
 		insn->visited |= visited;
 
+		if (propagate_alt_cfi(file, insn))
+			return 1;
+
 		if (!insn->ignore_alts && !list_empty(&insn->alts)) {
 			bool skip_orig = false;
 
···
 					return ret;
 				}
 			}
-
-			if (insn->alt_group)
-				fill_alternative_cfi(file, insn);
 
 			if (skip_orig)
 				return 0;
···
 	if (!strcmp(insn->sec->name, ".fixup") ||
 	    !strcmp(insn->sec->name, ".altinstr_replacement") ||
 	    !strcmp(insn->sec->name, ".altinstr_aux"))
-		return true;
-
-	if (insn->type == INSN_JUMP_UNCONDITIONAL && insn->offset == FAKE_JUMP_OFFSET)
 		return true;
 
 	if (!insn->func)
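The control-flow redirection in next_insn_to_validate() can be modeled on its own. The sketch below uses hypothetical sim_insn and sim_group types as stand-ins for objtool's struct instruction and struct alt_group, keeping only the fields the decision needs. When validation reaches the last instruction of a replacement group (the fake nop, if one was added), the next instruction is the one following the original group in .text, not whatever happens to follow in .altinstr_replacement.

```c
#include <assert.h>
#include <stddef.h>

struct sim_group;

/* Hypothetical stand-in for objtool's struct instruction. */
struct sim_insn {
	const char *name;
	struct sim_insn *next_in_sec;  /* next instruction in the same section */
	struct sim_group *group;       /* alt_group this insn belongs to, or NULL */
};

/* Hypothetical stand-in for struct alt_group. */
struct sim_group {
	struct sim_insn *first, *last;
	struct sim_group *orig;        /* non-NULL only for replacement groups */
};

/*
 * Same decision as next_insn_to_validate(): at the end of a replacement
 * group, continue after the end of the original group, simulating that
 * the replacement was patched in place.
 */
static struct sim_insn *sim_next_to_validate(struct sim_insn *insn)
{
	struct sim_group *g;

	assert(insn != NULL);
	g = insn->group;
	if (g && insn == g->last && g->orig)
		return g->orig->last->next_in_sec;
	return insn->next_in_sec;
}
```

For example, with an original group holding a single CALL followed by an instruction named "after", and a replacement group of PUSHF plus a padding NOP whose orig points at the original group, stepping from PUSHF yields the NOP, while stepping from the NOP yields "after" rather than the next entry in the replacement section.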
+6
tools/objtool/include/objtool/check.h
···
 
 	/* First and last instructions in the group */
 	struct instruction *first_insn, *last_insn;
+
+	/*
+	 * Byte-offset-addressed len-sized array of pointers to CFI structs.
+	 * This is shared with the other alt_groups in the same alternative.
+	 */
+	struct cfi_state **cfi;
 };
 
 struct instruction {
+48 -8
tools/objtool/orc_gen.c
···
 	return 0;
 }
 
+static unsigned long alt_group_len(struct alt_group *alt_group)
+{
+	return alt_group->last_insn->offset +
+	       alt_group->last_insn->len -
+	       alt_group->first_insn->offset;
+}
+
 int orc_create(struct objtool_file *file)
 {
 	struct section *sec, *ip_rsec, *orc_sec;
···
 			continue;
 
 		sec_for_each_insn(file, sec, insn) {
-			if (init_orc_entry(&orc, &insn->cfi))
-				return -1;
-			if (!memcmp(&prev_orc, &orc, sizeof(orc)))
+			struct alt_group *alt_group = insn->alt_group;
+			int i;
+
+			if (!alt_group) {
+				if (init_orc_entry(&orc, &insn->cfi))
+					return -1;
+				if (!memcmp(&prev_orc, &orc, sizeof(orc)))
+					continue;
+				if (orc_list_add(&orc_list, &orc, sec,
+						 insn->offset))
+					return -1;
+				nr++;
+				prev_orc = orc;
+				empty = false;
 				continue;
-			if (orc_list_add(&orc_list, &orc, sec, insn->offset))
-				return -1;
-			nr++;
-			prev_orc = orc;
-			empty = false;
+			}
+
+			/*
+			 * Alternatives can have different stack layout
+			 * possibilities (but they shouldn't conflict).
+			 * Instead of traversing the instructions, use the
+			 * alt_group's flattened byte-offset-addressed CFI
+			 * array.
+			 */
+			for (i = 0; i < alt_group_len(alt_group); i++) {
+				struct cfi_state *cfi = alt_group->cfi[i];
+				if (!cfi)
+					continue;
+				if (init_orc_entry(&orc, cfi))
+					return -1;
+				if (!memcmp(&prev_orc, &orc, sizeof(orc)))
+					continue;
+				if (orc_list_add(&orc_list, &orc, insn->sec,
+						 insn->offset + i))
+					return -1;
+				nr++;
+				prev_orc = orc;
+				empty = false;
+			}
+
+			/* Skip to the end of the alt_group */
+			insn = alt_group->last_insn;
 		}
 
 		/* Add a section terminator */
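The alt_group branch of orc_create() can be sketched in isolation. In this hypothetical simplification an ORC entry is just an address and a stack offset (toy_orc), and the flattened array holds pointers to ints instead of struct cfi_state pointers; NULL marks a byte that is not an instruction boundary in any stream. As in the patch, an entry is emitted only when the state differs from the previous one, and the previous state is carried across calls the way prev_orc is shared with the surrounding non-alternative instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified ORC entry. */
struct toy_orc {
	unsigned long ip;   /* first address the entry applies to */
	int sp_offset;      /* bytes pushed at that address */
};

/*
 * Walk a flattened byte-offset-addressed CFI array of length len starting
 * at address base. *have_prev and *prev carry the last emitted state across
 * calls, like prev_orc in orc_create(). Returns the number of entries
 * written to out[] (at most max).
 */
static size_t emit_alt_orc(const int *const cfi[], size_t len,
			   unsigned long base, bool *have_prev, int *prev,
			   struct toy_orc out[], size_t max)
{
	size_t nr = 0;

	for (size_t i = 0; i < len; i++) {
		if (!cfi[i])
			continue;   /* not a boundary in any stream */
		if (*have_prev && *prev == *cfi[i])
			continue;   /* unchanged: the previous entry still covers it */
		assert(nr < max);
		out[nr].ip = base + i;
		out[nr].sp_offset = *cfi[i];
		nr++;
		*prev = *cfi[i];
		*have_prev = true;
	}
	return nr;
}
```

Using the flattened array from the allowed example in the commit message (offsets 0, 1, 2 and 5 holding 0, 8, 0 and 0), a hypothetical base address of 0x100, and a preceding state of 0, the walk yields exactly two entries: 0x101 with 8 and 0x102 with 0. Offsets 0 and 5 are skipped because they match the previous state.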