Merging r323155:
------------------------------------------------------------------------
r323155 | chandlerc | 2018-01-22 14:05:25 -0800 (Mon, 22 Jan 2018) | 133 lines

Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.

Summary:
First, we need to explain the core of the vulnerability. Note that this is a very incomplete description; please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processor's cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way, and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easy to do in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register, so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them.
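For readers new to the construct, the x86-64 thunk this patch emits (reproduced from the comment in X86RetpolineThunks.cpp further down in this diff) has the following shape; the call traps any speculation in the pause/lfence loop, while the architectural return goes to the real target held in `%r11`:
```
__llvm_retpoline_r11:
        callq .Lr11_call_target
.Lr11_capture_spec:
        pause                     # speculation that "returns" here is trapped
        lfence
        jmp .Lr11_capture_spec
        .align 16
.Lr11_call_target:
        movq %r11, (%rsp)         # smash the return address with the real target
        retq                      # architecturally branches to the target in %r11
```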
Such custom thunks are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
The target of the retpoline is passed in the named register, or, in the case of the `push` suffix, on the top of the stack via a `pushl` instruction.

There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now`, as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller.

When manually applying transformations similar to `-mretpoline` to the Linux kernel, we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance-sensitive paths of the kernel.

When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch-, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well-tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline.

We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. We're happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both the 6.0 and 5.0 release streams and get a 5.0 release with just this cherry-picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time-sensitive nature of landing this and the need to backport it.
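To make the register contract concrete, here is a sketch of how an indirect call and an indirect tail call end up lowered on x86-64 under `-mretpoline` (matching the test expectations later in this diff; using `%rax` as the register holding the target is an assumption for illustration):
```
        # before:  callq *%rax        # after:
        movq %rax, %r11
        callq __llvm_retpoline_r11

        # before:  jmpq *%rax         # after (tail call):
        movq %rax, %r11
        jmp __llvm_retpoline_r11
```
With `-mretpoline-external-thunk`, the same sequences are emitted, but the calls target `__llvm_external_retpoline_r11` instead.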
Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@324007 91177308-0d34-0410-b5e6-96231b3b80d8

Reid Kleckner, 1 year, 6 months ago
32 changed file(s) with 1364 addition(s) and 12 deletion(s).
419419 /// shuffles.
420420 FunctionPass *createExpandReductionsPass();
421421
422 // This pass expands indirectbr instructions.
423 FunctionPass *createIndirectBrExpandPass();
424
422425 } // End llvm namespace
423426
424427 #endif
405405 /// immediately before machine code is emitted.
406406 virtual void addPreEmitPass() { }
407407
408 /// Targets may add passes immediately before machine code is emitted in this
409 /// callback. This is called even later than `addPreEmitPass`.
410 // FIXME: Rename `addPreEmitPass` to something more sensible given its actual
411 // position and remove the `2` suffix here as this callback is what
412 // `addPreEmitPass` *should* be but in reality isn't.
413 virtual void addPreEmitPass2() {}
414
408415 /// Utilities for targets to add passes to the pass manager.
409416 ///
410417
156156 void initializeIfConverterPass(PassRegistry&);
157157 void initializeImplicitNullChecksPass(PassRegistry&);
158158 void initializeIndVarSimplifyLegacyPassPass(PassRegistry&);
159 void initializeIndirectBrExpandPassPass(PassRegistry&);
159160 void initializeInductiveRangeCheckEliminationPass(PassRegistry&);
160161 void initializeInferAddressSpacesPass(PassRegistry&);
161162 void initializeInferFunctionAttrsLegacyPassPass(PassRegistry&);
798798 }
799799
800800 /// Return true if lowering to a jump table is allowed.
801 bool areJTsAllowed(const Function *Fn) const {
801 virtual bool areJTsAllowed(const Function *Fn) const {
802802 if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
803803 return false;
804804
171171 /// \brief True if the subtarget should run the atomic expansion pass.
172172 virtual bool enableAtomicExpand() const;
173173
174 /// True if the subtarget should run the indirectbr expansion pass.
175 virtual bool enableIndirectBrExpand() const;
176
174177 /// \brief Override generic scheduling policy within a region.
175178 ///
176179 /// This is a convenient way for targets that don't provide any custom
3333 GlobalMerge.cpp
3434 IfConversion.cpp
3535 ImplicitNullChecks.cpp
36 IndirectBrExpandPass.cpp
3637 InlineSpiller.cpp
3738 InterferenceCache.cpp
3839 InterleavedAccessPass.cpp
3838 initializeGCModuleInfoPass(Registry);
3939 initializeIfConverterPass(Registry);
4040 initializeImplicitNullChecksPass(Registry);
41 initializeIndirectBrExpandPassPass(Registry);
4142 initializeInterleavedAccessPass(Registry);
4243 initializeLiveDebugValuesPass(Registry);
4344 initializeLiveDebugVariablesPass(Registry);
0 //===- IndirectBrExpandPass.cpp - Expand indirectbr to switch -------------===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// Implements an expansion pass to turn `indirectbr` instructions in the IR
11 /// into `switch` instructions. This works by enumerating the basic blocks in
12 /// a dense range of integers, replacing each `blockaddr` constant with the
13 /// corresponding integer constant, and then building a switch that maps from
14 /// the integers to the actual blocks. All of the indirectbr instructions in the
15 /// function are redirected to this common switch.
16 ///
17 /// While this is generically useful if a target is unable to codegen
18 /// `indirectbr` natively, it is primarily useful when there is some desire to
19 /// get the builtin non-jump-table lowering of a switch even when the input
20 /// source contained an explicit indirect branch construct.
21 ///
22 /// Note that it doesn't make any sense to enable this pass unless a target also
23 /// disables jump-table lowering of switches. Doing that is likely to pessimize
24 /// the code.
25 ///
26 //===----------------------------------------------------------------------===//
27
28 #include "llvm/ADT/STLExtras.h"
29 #include "llvm/ADT/Sequence.h"
30 #include "llvm/ADT/SmallVector.h"
31 #include "llvm/CodeGen/TargetPassConfig.h"
32 #include "llvm/Target/TargetSubtargetInfo.h"
33 #include "llvm/IR/BasicBlock.h"
34 #include "llvm/IR/Function.h"
35 #include "llvm/IR/IRBuilder.h"
36 #include "llvm/IR/InstIterator.h"
37 #include "llvm/IR/Instruction.h"
38 #include "llvm/IR/Instructions.h"
39 #include "llvm/Pass.h"
40 #include "llvm/Support/Debug.h"
41 #include "llvm/Support/ErrorHandling.h"
42 #include "llvm/Support/raw_ostream.h"
43 #include "llvm/Target/TargetMachine.h"
44
45 using namespace llvm;
46
47 #define DEBUG_TYPE "indirectbr-expand"
48
49 namespace {
50
51 class IndirectBrExpandPass : public FunctionPass {
52 const TargetLowering *TLI = nullptr;
53
54 public:
55 static char ID; // Pass identification, replacement for typeid
56
57 IndirectBrExpandPass() : FunctionPass(ID) {
58 initializeIndirectBrExpandPassPass(*PassRegistry::getPassRegistry());
59 }
60
61 bool runOnFunction(Function &F) override;
62 };
63
64 } // end anonymous namespace
65
66 char IndirectBrExpandPass::ID = 0;
67
68 INITIALIZE_PASS(IndirectBrExpandPass, DEBUG_TYPE,
69 "Expand indirectbr instructions", false, false)
70
71 FunctionPass *llvm::createIndirectBrExpandPass() {
72 return new IndirectBrExpandPass();
73 }
74
75 bool IndirectBrExpandPass::runOnFunction(Function &F) {
76 auto &DL = F.getParent()->getDataLayout();
77 auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
78 if (!TPC)
79 return false;
80
81 auto &TM = TPC->getTM<TargetMachine>();
82 auto &STI = *TM.getSubtargetImpl(F);
83 if (!STI.enableIndirectBrExpand())
84 return false;
85 TLI = STI.getTargetLowering();
86
87 SmallVector<IndirectBrInst *, 1> IndirectBrs;
88
89 // Set of all potential successors for indirectbr instructions.
90 SmallPtrSet<BasicBlock *, 4> IndirectBrSuccs;
91
92 // Build a list of indirectbrs that we want to rewrite.
93 for (BasicBlock &BB : F)
94 if (auto *IBr = dyn_cast<IndirectBrInst>(BB.getTerminator())) {
95 // Handle the degenerate case of no successors by replacing the indirectbr
96 // with unreachable as there is no successor available.
97 if (IBr->getNumSuccessors() == 0) {
98 (void)new UnreachableInst(F.getContext(), IBr);
99 IBr->eraseFromParent();
100 continue;
101 }
102
103 IndirectBrs.push_back(IBr);
104 for (BasicBlock *SuccBB : IBr->successors())
105 IndirectBrSuccs.insert(SuccBB);
106 }
107
108 if (IndirectBrs.empty())
109 return false;
110
111 // If we need to replace any indirectbrs we need to establish integer
112 // constants that will correspond to each of the basic blocks in the function
113 // whose address escapes. We do that here and rewrite all the blockaddress
114 // constants to just be those integer constants cast to a pointer type.
115 SmallVector<BasicBlock *, 4> BBs;
116
117 for (BasicBlock &BB : F) {
118 // Skip blocks that aren't successors to an indirectbr we're going to
119 // rewrite.
120 if (!IndirectBrSuccs.count(&BB))
121 continue;
122
123 auto IsBlockAddressUse = [&](const Use &U) {
124 return isa<BlockAddress>(U.getUser());
125 };
126 auto BlockAddressUseIt = llvm::find_if(BB.uses(), IsBlockAddressUse);
127 if (BlockAddressUseIt == BB.use_end())
128 continue;
129
130 assert(std::find_if(std::next(BlockAddressUseIt), BB.use_end(),
131 IsBlockAddressUse) == BB.use_end() &&
132 "There should only ever be a single blockaddress use because it is "
133 "a constant and should be uniqued.");
134
135 auto *BA = cast<BlockAddress>(BlockAddressUseIt->getUser());
136
137 // Skip if the constant was formed but ended up not being used (due to DCE
138 // or whatever).
139 if (!BA->isConstantUsed())
140 continue;
141
142 // Compute the index we want to use for this basic block. We can't use zero
143 // because null can be compared with block addresses.
144 int BBIndex = BBs.size() + 1;
145 BBs.push_back(&BB);
146
147 auto *ITy = cast<IntegerType>(DL.getIntPtrType(BA->getType()));
148 ConstantInt *BBIndexC = ConstantInt::get(ITy, BBIndex);
149
150 // Now rewrite the blockaddress to an integer constant based on the index.
151 // FIXME: We could potentially preserve the uses as arguments to inline asm.
152 // This would allow some uses such as diagnostic information in crashes to
153 // have higher quality even when this transform is enabled, but would break
154 // users that round-trip blockaddresses through inline assembly and then
155 // back into an indirectbr.
156 BA->replaceAllUsesWith(ConstantExpr::getIntToPtr(BBIndexC, BA->getType()));
157 }
158
159 if (BBs.empty()) {
160 // There are no blocks whose address is taken, so any indirectbr instruction
161 // cannot get a valid input and we can replace all of them with unreachable.
162 for (auto *IBr : IndirectBrs) {
163 (void)new UnreachableInst(F.getContext(), IBr);
164 IBr->eraseFromParent();
165 }
166 return true;
167 }
168
169 BasicBlock *SwitchBB;
170 Value *SwitchValue;
171
172 // Compute a common integer type across all the indirectbr instructions.
173 IntegerType *CommonITy = nullptr;
174 for (auto *IBr : IndirectBrs) {
175 auto *ITy =
176 cast<IntegerType>(DL.getIntPtrType(IBr->getAddress()->getType()));
177 if (!CommonITy || ITy->getBitWidth() > CommonITy->getBitWidth())
178 CommonITy = ITy;
179 }
180
181 auto GetSwitchValue = [DL, CommonITy](IndirectBrInst *IBr) {
182 return CastInst::CreatePointerCast(
183 IBr->getAddress(), CommonITy,
184 Twine(IBr->getAddress()->getName()) + ".switch_cast", IBr);
185 };
186
187 if (IndirectBrs.size() == 1) {
188 // If we only have one indirectbr, we can just directly replace it within
189 // its block.
190 SwitchBB = IndirectBrs[0]->getParent();
191 SwitchValue = GetSwitchValue(IndirectBrs[0]);
192 IndirectBrs[0]->eraseFromParent();
193 } else {
194 // Otherwise we need to create a new block to hold the switch across BBs,
195 // jump to that block instead of each indirectbr, and phi together the
196 // values for the switch.
197 SwitchBB = BasicBlock::Create(F.getContext(), "switch_bb", &F);
198 auto *SwitchPN = PHINode::Create(CommonITy, IndirectBrs.size(),
199 "switch_value_phi", SwitchBB);
200 SwitchValue = SwitchPN;
201
202 // Now replace the indirectbr instructions with direct branches to the
203 // switch block and fill out the PHI operands.
204 for (auto *IBr : IndirectBrs) {
205 SwitchPN->addIncoming(GetSwitchValue(IBr), IBr->getParent());
206 BranchInst::Create(SwitchBB, IBr);
207 IBr->eraseFromParent();
208 }
209 }
210
211 // Now build the switch in the block. The block will have no terminator
212 // already.
213 auto *SI = SwitchInst::Create(SwitchValue, BBs[0], BBs.size(), SwitchBB);
214
215 // Add a case for each block.
216 for (int i : llvm::seq<int>(1, BBs.size()))
217 SI->addCase(ConstantInt::get(CommonITy, i + 1), BBs[i]);
218
219 return true;
220 }
789789 if (EnableMachineOutliner)
790790 PM->add(createMachineOutlinerPass());
791791
792 // Add passes that directly emit MI after all other MI passes.
793 addPreEmitPass2();
794
792795 AddingMachinePasses = false;
793796 }
794797
3434
3535 bool TargetSubtargetInfo::enableAtomicExpand() const {
3636 return true;
37 }
38
39 bool TargetSubtargetInfo::enableIndirectBrExpand() const {
40 return false;
3741 }
3842
3943 bool TargetSubtargetInfo::enableMachineScheduler() const {
5656 X86OptimizeLEAs.cpp
5757 X86PadShortFunction.cpp
5858 X86RegisterInfo.cpp
59 X86RetpolineThunks.cpp
5960 X86SelectionDAGInfo.cpp
6061 X86ShuffleDecodeConstantPool.cpp
6162 X86Subtarget.cpp
2121 class FunctionPass;
2222 class ImmutablePass;
2323 class InstructionSelector;
24 class ModulePass;
2425 class PassRegistry;
2526 class X86RegisterBankInfo;
2627 class X86Subtarget;
9798 /// encoding when possible in order to reduce code size.
9899 FunctionPass *createX86EvexToVexInsts();
99100
101 /// This pass creates the thunks for the retpoline feature.
102 ModulePass *createX86RetpolineThunksPass();
103
100104 InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM,
101105 X86Subtarget &,
102106 X86RegisterBankInfo &);
289289 "ermsb", "HasERMSB", "true",
290290 "REP MOVS/STOS are fast">;
291291
292 // Enable mitigation of some aspects of speculative execution related
293 // vulnerabilities by removing speculatable indirect branches. This disables
294 // jump-table formation, rewrites explicit `indirectbr` instructions into
295 // `switch` instructions, and uses a special construct called a "retpoline" to
296 // prevent speculation of the remaining indirect branches (indirect calls and
297 // tail calls).
298 def FeatureRetpoline
299 : SubtargetFeature<"retpoline", "UseRetpoline", "true",
300 "Remove speculation of indirect branches from the "
301 "generated code, either by avoiding them entirely or "
302 "lowering them with a speculation blocking construct.">;
303
304 // Rely on external thunks for the emitted retpoline calls. This allows users
305 // to provide their own custom thunk definitions in highly specialized
306 // environments such as a kernel that does boot-time hot patching.
307 def FeatureRetpolineExternalThunk
308 : SubtargetFeature<
309 "retpoline-external-thunk", "UseRetpolineExternalThunk", "true",
310 "Enable retpoline, but with an externally provided thunk.",
311 [FeatureRetpoline]>;
312
292313 //===----------------------------------------------------------------------===//
293314 // X86 processors supported.
294315 //===----------------------------------------------------------------------===//
2929 StackMaps SM;
3030 FaultMaps FM;
3131 std::unique_ptr<MCCodeEmitter> CodeEmitter;
32 bool NeedsRetpoline = false;
3233
3334 // This utility class tracks the length of a stackmap instruction's 'shadow'.
3435 // It is used by the X86AsmPrinter to ensure that the stackmap shadow
31603160 (CalledFn && CalledFn->hasFnAttribute("no_caller_saved_registers")))
31613161 return false;
31623162
3163 // Functions using retpoline should use SDISel for calls.
3164 if (Subtarget->useRetpoline())
3165 return false;
3166
31633167 // Handle only C, fastcc, and webkit_js calling conventions for now.
31643168 switch (CC) {
31653169 default: return false;
741741 bool InProlog) const {
742742 bool IsLargeCodeModel = MF.getTarget().getCodeModel() == CodeModel::Large;
743743
744 // FIXME: Add retpoline support and remove this.
745 if (Is64Bit && IsLargeCodeModel && STI.useRetpoline())
746 report_fatal_error("Emitting stack probe calls on 64-bit with the large "
747 "code model and retpoline not yet implemented.");
748
744749 unsigned CallOp;
745750 if (Is64Bit)
746751 CallOp = IsLargeCodeModel ? X86::CALL64r : X86::CALL64pcrel32;
23362341 // This solution is not perfect, as it assumes that the .rodata section
23372342 // is laid out within 2^31 bytes of each function body, but this seems
23382343 // to be sufficient for JIT.
2344 // FIXME: Add retpoline support and remove the error here.
2345 if (STI.useRetpoline())
2346 report_fatal_error("Emitting morestack calls on 64-bit with the large "
2347 "code model and retpoline not yet implemented.");
23392348 BuildMI(allocMBB, DL, TII.get(X86::CALL64m))
23402349 .addReg(X86::RIP)
23412350 .addImm(0)
549549 SDNode *N = &*I++; // Preincrement iterator to avoid invalidation issues.
550550
551551 if (OptLevel != CodeGenOpt::None &&
552 // Only does this when target favors doesn't favor register indirect
553 // call.
552 // Only do this when the target can fold the load into the call or
553 // jmp.
554 !Subtarget->useRetpoline() &&
554555 ((N->getOpcode() == X86ISD::CALL && !Subtarget->callRegIndirect()) ||
555556 (N->getOpcode() == X86ISD::TC_RETURN &&
556 // Only does this if load can be folded into TC_RETURN.
557557 (Subtarget->is64Bit() ||
558558 !getTargetMachine().isPositionIndependent())))) {
559559 /// Also try moving call address load from outside callseq_start to just
2499324993 return isShuffleMaskLegal(Mask, VT);
2499424994 }
2499524995
24996 bool X86TargetLowering::areJTsAllowed(const Function *Fn) const {
24997 // If the subtarget is using retpolines, we need to not generate jump tables.
24998 if (Subtarget.useRetpoline())
24999 return false;
25000
25001 // Otherwise, fallback on the generic logic.
25002 return TargetLowering::areJTsAllowed(Fn);
25003 }
25004
2499625005 //===----------------------------------------------------------------------===//
2499725006 // X86 Scheduler Hooks
2499825007 //===----------------------------------------------------------------------===//
2622426233 return BB;
2622526234 }
2622626235
26236 static unsigned getOpcodeForRetpoline(unsigned RPOpc) {
26237 switch (RPOpc) {
26238 case X86::RETPOLINE_CALL32:
26239 return X86::CALLpcrel32;
26240 case X86::RETPOLINE_CALL64:
26241 return X86::CALL64pcrel32;
26242 case X86::RETPOLINE_TCRETURN32:
26243 return X86::TCRETURNdi;
26244 case X86::RETPOLINE_TCRETURN64:
26245 return X86::TCRETURNdi64;
26246 }
26247 llvm_unreachable("not retpoline opcode");
26248 }
26249
26250 static const char *getRetpolineSymbol(const X86Subtarget &Subtarget,
26251 unsigned Reg) {
26252 switch (Reg) {
26253 case 0:
26254 assert(!Subtarget.is64Bit() && "R11 should always be available on x64");
26255 return Subtarget.useRetpolineExternalThunk()
26256 ? "__llvm_external_retpoline_push"
26257 : "__llvm_retpoline_push";
26258 case X86::EAX:
26259 return Subtarget.useRetpolineExternalThunk()
26260 ? "__llvm_external_retpoline_eax"
26261 : "__llvm_retpoline_eax";
26262 case X86::ECX:
26263 return Subtarget.useRetpolineExternalThunk()
26264 ? "__llvm_external_retpoline_ecx"
26265 : "__llvm_retpoline_ecx";
26266 case X86::EDX:
26267 return Subtarget.useRetpolineExternalThunk()
26268 ? "__llvm_external_retpoline_edx"
26269 : "__llvm_retpoline_edx";
26270 case X86::R11:
26271 return Subtarget.useRetpolineExternalThunk()
26272 ? "__llvm_external_retpoline_r11"
26273 : "__llvm_retpoline_r11";
26274 }
26275 llvm_unreachable("unexpected reg for retpoline");
26276 }
26277
26278 MachineBasicBlock *
26279 X86TargetLowering::EmitLoweredRetpoline(MachineInstr &MI,
26280 MachineBasicBlock *BB) const {
26281 // Copy the virtual register into the R11 physical register and
26282 // call the retpoline thunk.
26283 DebugLoc DL = MI.getDebugLoc();
26284 const X86InstrInfo *TII = Subtarget.getInstrInfo();
26285 unsigned CalleeVReg = MI.getOperand(0).getReg();
26286 unsigned Opc = getOpcodeForRetpoline(MI.getOpcode());
26287
26288 // Find an available scratch register to hold the callee. On 64-bit, we can
26289 // just use R11, but we scan for uses anyway to ensure we don't generate
26290 // incorrect code. On 32-bit, we use one of EAX, ECX, or EDX that isn't
26291 // already a register use operand to the call to hold the callee. If none
26292 // are available, push the callee instead. This is less efficient, but is
26293 // necessary for functions using 3 regparms. Such function calls are
26294 // (currently) not eligible for tail call optimization, because there is no
26295 // scratch register available to hold the address of the callee.
26296 SmallVector<unsigned, 3> AvailableRegs;
26297 if (Subtarget.is64Bit())
26298 AvailableRegs.push_back(X86::R11);
26299 else
26300 AvailableRegs.append({X86::EAX, X86::ECX, X86::EDX});
26301
26302 // Zero out any registers that are already used.
26303 for (const auto &MO : MI.operands()) {
26304 if (MO.isReg() && MO.isUse())
26305 for (unsigned &Reg : AvailableRegs)
26306 if (Reg == MO.getReg())
26307 Reg = 0;
26308 }
26309
26310 // Choose the first remaining non-zero available register.
26311 unsigned AvailableReg = 0;
26312 for (unsigned MaybeReg : AvailableRegs) {
26313 if (MaybeReg) {
26314 AvailableReg = MaybeReg;
26315 break;
26316 }
26317 }
26318
26319 const char *Symbol = getRetpolineSymbol(Subtarget, AvailableReg);
26320
26321 if (AvailableReg == 0) {
26322 // No register available. Use PUSH. This must not be a tailcall, and this
26323 // must not be x64.
26324 if (Subtarget.is64Bit())
26325 report_fatal_error(
26326 "Cannot make an indirect call on x86-64 using both retpoline and a "
26327 "calling convention that preservers r11");
26328 if (Opc != X86::CALLpcrel32)
26329 report_fatal_error("Cannot make an indirect tail call on x86 using "
26330 "retpoline without a preserved register");
26331 BuildMI(*BB, MI, DL, TII->get(X86::PUSH32r)).addReg(CalleeVReg);
26332 MI.getOperand(0).ChangeToES(Symbol);
26333 MI.setDesc(TII->get(Opc));
26334 } else {
26335 BuildMI(*BB, MI, DL, TII->get(TargetOpcode::COPY), AvailableReg)
26336 .addReg(CalleeVReg);
26337 MI.getOperand(0).ChangeToES(Symbol);
26338 MI.setDesc(TII->get(Opc));
26339 MachineInstrBuilder(*BB->getParent(), &MI)
26340 .addReg(AvailableReg, RegState::Implicit | RegState::Kill);
26341 }
26342 return BB;
26343 }
26344
2622726345 MachineBasicBlock *
2622826346 X86TargetLowering::emitEHSjLjSetJmp(MachineInstr &MI,
2622926347 MachineBasicBlock *MBB) const {
2668826806 case X86::TLS_base_addr32:
2668926807 case X86::TLS_base_addr64:
2669026808 return EmitLoweredTLSAddr(MI, BB);
26809 case X86::RETPOLINE_CALL32:
26810 case X86::RETPOLINE_CALL64:
26811 case X86::RETPOLINE_TCRETURN32:
26812 case X86::RETPOLINE_TCRETURN64:
26813 return EmitLoweredRetpoline(MI, BB);
2669126814 case X86::CATCHRET:
2669226815 return EmitLoweredCatchRet(MI, BB);
2669326816 case X86::CATCHPAD:
985985 bool isVectorClearMaskLegal(const SmallVectorImpl<int> &Mask,
986986 EVT VT) const override;
987987
988 /// Returns true if lowering to a jump table is allowed.
989 bool areJTsAllowed(const Function *Fn) const override;
990
988991 /// If true, then instruction selection should
989992 /// seek to shrink the FP constant of the specified type to a smaller type
990993 /// in order to save space and / or reduce runtime.
12881291 MachineBasicBlock *EmitLoweredTLSCall(MachineInstr &MI,
12891292 MachineBasicBlock *BB) const;
12901293
1294 MachineBasicBlock *EmitLoweredRetpoline(MachineInstr &MI,
1295 MachineBasicBlock *BB) const;
1296
12911297 MachineBasicBlock *emitEHSjLjSetJmp(MachineInstr &MI,
12921298 MachineBasicBlock *MBB) const;
12931299
11051105
11061106 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
11071107 (TCRETURNri ptr_rc_tailcall:$dst, imm:$off)>,
1108 Requires<[Not64BitMode]>;
1108 Requires<[Not64BitMode, NotUseRetpoline]>;
11091109
11101110 // FIXME: This is disabled for 32-bit PIC mode because the global base
11111111 // register which is part of the address mode may be assigned a
11121112 // callee-saved register.
11131113 def : Pat<(X86tcret (load addr:$dst), imm:$off),
11141114 (TCRETURNmi addr:$dst, imm:$off)>,
1115 Requires<[Not64BitMode, IsNotPIC]>;
1115 Requires<[Not64BitMode, IsNotPIC, NotUseRetpoline]>;
11161116
11171117 def : Pat<(X86tcret (i32 tglobaladdr:$dst), imm:$off),
11181118 (TCRETURNdi tglobaladdr:$dst, imm:$off)>,
11241124
11251125 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
11261126 (TCRETURNri64 ptr_rc_tailcall:$dst, imm:$off)>,
1127 Requires<[In64BitMode]>;
1127 Requires<[In64BitMode, NotUseRetpoline]>;
11281128
11291129 // Don't fold loads into X86tcret requiring more than 6 regs.
11301130 // There wouldn't be enough scratch registers for base+index.
11311131 def : Pat<(X86tcret_6regs (load addr:$dst), imm:$off),
11321132 (TCRETURNmi64 addr:$dst, imm:$off)>,
1133 Requires<[In64BitMode]>;
1133 Requires<[In64BitMode, NotUseRetpoline]>;
1134
1135 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
1136 (RETPOLINE_TCRETURN64 ptr_rc_tailcall:$dst, imm:$off)>,
1137 Requires<[In64BitMode, UseRetpoline]>;
1138
1139 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
1140 (RETPOLINE_TCRETURN32 ptr_rc_tailcall:$dst, imm:$off)>,
1141 Requires<[Not64BitMode, UseRetpoline]>;
11341142
11351143 def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
11361144 (TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,
210210 Sched<[WriteJumpLd]>;
211211 def CALL32r : I<0xFF, MRM2r, (outs), (ins GR32:$dst),
212212 "call{l}\t{*}$dst", [(X86call GR32:$dst)], IIC_CALL_RI>,
213 OpSize32, Requires<[Not64BitMode]>, Sched<[WriteJump]>;
213 OpSize32, Requires<[Not64BitMode,NotUseRetpoline]>,
214 Sched<[WriteJump]>;
214215 def CALL32m : I<0xFF, MRM2m, (outs), (ins i32mem:$dst),
215216 "call{l}\t{*}$dst", [(X86call (loadi32 addr:$dst))],
216217 IIC_CALL_MEM>, OpSize32,
217 Requires<[Not64BitMode,FavorMemIndirectCall]>,
218 Requires<[Not64BitMode,FavorMemIndirectCall,NotUseRetpoline]>,
218219 Sched<[WriteJumpLd]>;
219220
220221 let Predicates = [Not64BitMode] in {
297298 def CALL64r : I<0xFF, MRM2r, (outs), (ins GR64:$dst),
298299 "call{q}\t{*}$dst", [(X86call GR64:$dst)],
299300 IIC_CALL_RI>,
300 Requires<[In64BitMode]>;
301 Requires<[In64BitMode,NotUseRetpoline]>;
301302 def CALL64m : I<0xFF, MRM2m, (outs), (ins i64mem:$dst),
302303 "call{q}\t{*}$dst", [(X86call (loadi64 addr:$dst))],
303304 IIC_CALL_MEM>,
304 Requires<[In64BitMode,FavorMemIndirectCall]>;
305 Requires<[In64BitMode,FavorMemIndirectCall,
306 NotUseRetpoline]>;
305307
306308 def FARCALL64 : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst),
307309 "lcall{q}\t{*}$dst", [], IIC_CALL_FAR_MEM>;
340342 }
341343 }
342344
345 let isPseudo = 1, isCall = 1, isCodeGenOnly = 1,
346 Uses = [RSP],
347 usesCustomInserter = 1,
348 SchedRW = [WriteJump] in {
349 def RETPOLINE_CALL32 :
350 PseudoI<(outs), (ins GR32:$dst), [(X86call GR32:$dst)]>,
351 Requires<[Not64BitMode,UseRetpoline]>;
352
353 def RETPOLINE_CALL64 :
354 PseudoI<(outs), (ins GR64:$dst), [(X86call GR64:$dst)]>,
355 Requires<[In64BitMode,UseRetpoline]>;
356
357 // Retpoline variant of indirect tail calls.
358 let isTerminator = 1, isReturn = 1, isBarrier = 1 in {
359 def RETPOLINE_TCRETURN64 :
360 PseudoI<(outs), (ins GR64:$dst, i32imm:$offset), []>;
361 def RETPOLINE_TCRETURN32 :
362 PseudoI<(outs), (ins GR32:$dst, i32imm:$offset), []>;
363 }
364 }
365
343366 // Conditional tail calls are similar to the above, but they are branches
344367 // rather than barriers, and they use EFLAGS.
345368 let isCall = 1, isTerminator = 1, isReturn = 1, isBranch = 1,
916916 def HasFastSHLDRotate : Predicate<"Subtarget->hasFastSHLDRotate()">;
917917 def HasERMSB : Predicate<"Subtarget->hasERMSB()">;
918918 def HasMFence : Predicate<"Subtarget->hasMFence()">;
919 def UseRetpoline : Predicate<"Subtarget->useRetpoline()">;
920 def NotUseRetpoline : Predicate<"!Subtarget->useRetpoline()">;
919921
920922 //===----------------------------------------------------------------------===//
921923 // X86 Instruction Format Definitions.
873873 // address is to far away. (TODO: support non-relative addressing)
874874 break;
875875 case MachineOperand::MO_Register:
876 // FIXME: Add retpoline support and remove this.
877 if (Subtarget->useRetpoline())
878 report_fatal_error("Lowering register statepoints with retpoline not "
879 "yet implemented.");
876880 CallTargetMCOp = MCOperand::createReg(CallTarget.getReg());
877881 CallOpcode = X86::CALL64r;
878882 break;
10271031
10281032 EmitAndCountInstruction(
10291033 MCInstBuilder(X86::MOV64ri).addReg(ScratchReg).addOperand(CalleeMCOp));
1034 // FIXME: Add retpoline support and remove this.
1035 if (Subtarget->useRetpoline())
1036 report_fatal_error(
1037 "Lowering patchpoint with retpoline not yet implemented.");
10301038 EmitAndCountInstruction(MCInstBuilder(X86::CALL64r).addReg(ScratchReg));
10311039 }
10321040
0 //======- X86RetpolineThunks.cpp - Construct retpoline thunks for x86 --=====//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// Pass that injects an MI thunk implementing a "retpoline". This is
11 /// a RET-implemented trampoline that is used to lower indirect calls in a way
12 /// that prevents speculation on some x86 processors and can be used to mitigate
13 /// security vulnerabilities due to targeted speculative execution and side
14 /// channels such as CVE-2017-5715.
15 ///
16 /// TODO(chandlerc): All of this code could use better comments and
17 /// documentation.
18 ///
19 //===----------------------------------------------------------------------===//
20
21 #include "X86.h"
22 #include "X86InstrBuilder.h"
23 #include "X86Subtarget.h"
24 #include "llvm/CodeGen/MachineFunction.h"
25 #include "llvm/CodeGen/MachineInstrBuilder.h"
26 #include "llvm/CodeGen/MachineModuleInfo.h"
27 #include "llvm/CodeGen/Passes.h"
28 #include "llvm/CodeGen/TargetPassConfig.h"
29 #include "llvm/IR/IRBuilder.h"
30 #include "llvm/IR/Instructions.h"
31 #include "llvm/IR/Module.h"
32 #include "llvm/Support/CommandLine.h"
33 #include "llvm/Support/Debug.h"
34 #include "llvm/Support/raw_ostream.h"
35
36 using namespace llvm;
37
38 #define DEBUG_TYPE "x86-retpoline-thunks"
39
40 namespace {
41 class X86RetpolineThunks : public ModulePass {
42 public:
43 static char ID;
44
45 X86RetpolineThunks() : ModulePass(ID) {}
46
47 StringRef getPassName() const override { return "X86 Retpoline Thunks"; }
48
49 bool runOnModule(Module &M) override;
50
51 void getAnalysisUsage(AnalysisUsage &AU) const override {
52 AU.addRequired<MachineModuleInfo>();
53 AU.addPreserved<MachineModuleInfo>();
54 }
55
56 private:
57 MachineModuleInfo *MMI;
58 const TargetMachine *TM;
59 bool Is64Bit;
60 const X86Subtarget *STI;
61 const X86InstrInfo *TII;
62
63 Function *createThunkFunction(Module &M, StringRef Name);
64 void insertRegReturnAddrClobber(MachineBasicBlock &MBB, unsigned Reg);
65 void insert32BitPushReturnAddrClobber(MachineBasicBlock &MBB);
66 void createThunk(Module &M, StringRef NameSuffix,
67 Optional<unsigned> Reg = None);
68 };
69
70 } // end anonymous namespace
71
72 ModulePass *llvm::createX86RetpolineThunksPass() {
73 return new X86RetpolineThunks();
74 }
75
76 char X86RetpolineThunks::ID = 0;
77
78 bool X86RetpolineThunks::runOnModule(Module &M) {
79 DEBUG(dbgs() << getPassName() << '\n');
80
81 auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
82 assert(TPC && "X86-specific target pass should not be run without a target "
83 "pass config!");
84
85 MMI = &getAnalysis<MachineModuleInfo>();
86 TM = &TPC->getTM<TargetMachine>();
87 Is64Bit = TM->getTargetTriple().getArch() == Triple::x86_64;
88
89 // Only add a thunk if we have at least one function that has the retpoline
90 // feature enabled in its subtarget.
91 // FIXME: Conditionalize on indirect calls so we don't emit a thunk when
92 // nothing will end up calling it.
93 // FIXME: It's a little silly to look at every function just to enumerate
94 // the subtargets, but eventually we'll want to look at them for indirect
95 // calls, so maybe this is OK.
96 if (!llvm::any_of(M, [&](const Function &F) {
97 // Save the subtarget we find for use in emitting the subsequent
98 // thunk.
99 STI = &TM->getSubtarget<X86Subtarget>(F);
100 return STI->useRetpoline() && !STI->useRetpolineExternalThunk();
101 }))
102 return false;
103
104 // If we have a relevant subtarget, get the instr info as well.
105 TII = STI->getInstrInfo();
106
107 if (Is64Bit) {
108 // __llvm_retpoline_r11:
109 // callq .Lr11_call_target
110 // .Lr11_capture_spec:
111 // pause
112 // lfence
113 // jmp .Lr11_capture_spec
114 // .align 16
115 // .Lr11_call_target:
116 // movq %r11, (%rsp)
117 // retq
118
119 createThunk(M, "r11", X86::R11);
120 } else {
121 // For 32-bit targets we need to emit a collection of thunks for various
122 // possible scratch registers as well as a fallback that is used when
123 // there are no scratch registers and assumes the retpoline target has
124 // been pushed.
125 // __llvm_retpoline_eax:
126 // calll .Leax_call_target
127 // .Leax_capture_spec:
128 // pause
129 // jmp .Leax_capture_spec
130 // .align 16
131 // .Leax_call_target:
132 // movl %eax, (%esp) # Clobber return addr
133 // retl
134 //
135 // __llvm_retpoline_ecx:
136 // ... # Same setup
137 // movl %ecx, (%esp)
138 // retl
139 //
140 // __llvm_retpoline_edx:
141 // ... # Same setup
142 // movl %edx, (%esp)
143 // retl
144 //
145 // This last one is a bit more special and so needs a little extra
146 // handling.
147 // __llvm_retpoline_push:
148 // calll .Lpush_call_target
149 // .Lpush_capture_spec:
150 // pause
151 // lfence
152 // jmp .Lpush_capture_spec
153 // .align 16
154 // .Lpush_call_target:
155 // # Clear pause_loop return address.
156 // addl $4, %esp
157 // # Top of stack words are: Callee, RA. Exchange Callee and RA.
158 // pushl 4(%esp) # Push callee
159 // pushl 4(%esp) # Push RA
160 // popl 8(%esp) # Pop RA to final RA
161 // popl (%esp) # Pop callee to next top of stack
162 // retl # Ret to callee
163 createThunk(M, "eax", X86::EAX);
164 createThunk(M, "ecx", X86::ECX);
165 createThunk(M, "edx", X86::EDX);
166 createThunk(M, "push");
167 }
168
169 return true;
170 }
171
172 Function *X86RetpolineThunks::createThunkFunction(Module &M, StringRef Name) {
173 LLVMContext &Ctx = M.getContext();
174 auto Type = FunctionType::get(Type::getVoidTy(Ctx), false);
175 Function *F =
176 Function::Create(Type, GlobalValue::LinkOnceODRLinkage, Name, &M);
177 F->setVisibility(GlobalValue::HiddenVisibility);
178 F->setComdat(M.getOrInsertComdat(Name));
179
180 // Add Attributes so that we don't create a frame, unwind information, or
181 // inline.
182 AttrBuilder B;
183 B.addAttribute(llvm::Attribute::NoUnwind);
184 B.addAttribute(llvm::Attribute::Naked);
185 F->addAttributes(llvm::AttributeList::FunctionIndex, B);
186
187 // Populate our function a bit so that we can verify.
188 BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);
189 IRBuilder<> Builder(Entry);
190
191 Builder.CreateRetVoid();
192 return F;
193 }
194
195 void X86RetpolineThunks::insertRegReturnAddrClobber(MachineBasicBlock &MBB,
196 unsigned Reg) {
197 const unsigned MovOpc = Is64Bit ? X86::MOV64mr : X86::MOV32mr;
198 const unsigned SPReg = Is64Bit ? X86::RSP : X86::ESP;
199 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(MovOpc)), SPReg, false, 0)
200 .addReg(Reg);
201 }
202 void X86RetpolineThunks::insert32BitPushReturnAddrClobber(
203 MachineBasicBlock &MBB) {
204 // The instruction sequence we use to replace the return address without
205 // a scratch register is somewhat complicated:
206 // # Clear capture_spec from return address.
207 // addl $4, %esp
208 // # Top of stack words are: Callee, RA. Exchange Callee and RA.
209 // pushl 4(%esp) # Push callee
210 // pushl 4(%esp) # Push RA
211 // popl 8(%esp) # Pop RA to final RA
212 // popl (%esp) # Pop callee to next top of stack
213 // retl # Ret to callee
214 BuildMI(&MBB, DebugLoc(), TII->get(X86::ADD32ri), X86::ESP)
215 .addReg(X86::ESP)
216 .addImm(4);
217 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP,
218 false, 4);
219 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP,
220 false, 4);
221 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP,
222 false, 8);
223 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP,
224 false, 0);
225 }
226
227 void X86RetpolineThunks::createThunk(Module &M, StringRef NameSuffix,
228 Optional<unsigned> Reg) {
229 Function &F =
230 *createThunkFunction(M, (Twine("__llvm_retpoline_") + NameSuffix).str());
231 MachineFunction &MF = MMI->getOrCreateMachineFunction(F);
232
233 // Set MF properties. We never use vregs...
234 MF.getProperties().set(MachineFunctionProperties::Property::NoVRegs);
235
236 BasicBlock &OrigEntryBB = F.getEntryBlock();
237 MachineBasicBlock *Entry = MF.CreateMachineBasicBlock(&OrigEntryBB);
238 MachineBasicBlock *CaptureSpec = MF.CreateMachineBasicBlock(&OrigEntryBB);
239 MachineBasicBlock *CallTarget = MF.CreateMachineBasicBlock(&OrigEntryBB);
240
241 MF.push_back(Entry);
242 MF.push_back(CaptureSpec);
243 MF.push_back(CallTarget);
244
245 const unsigned CallOpc = Is64Bit ? X86::CALL64pcrel32 : X86::CALLpcrel32;
246 const unsigned RetOpc = Is64Bit ? X86::RETQ : X86::RETL;
247
248 BuildMI(Entry, DebugLoc(), TII->get(CallOpc)).addMBB(CallTarget);
249 Entry->addSuccessor(CallTarget);
250 Entry->addSuccessor(CaptureSpec);
251 CallTarget->setHasAddressTaken();
252
253 // In the capture loop for speculation, we want to stop the processor from
254 // speculating as fast as possible. On Intel processors, the PAUSE instruction
255 // will block speculation without consuming any execution resources. On AMD
256 // processors, the PAUSE instruction is (essentially) a nop, so we also use an
257 // LFENCE instruction which they have advised will stop speculation as well
258 // with minimal resource utilization. We still end the capture with a jump to
259 // form an infinite loop to fully guarantee that no matter what implementation
260 // of the x86 ISA, speculating this code path never escapes.
261 BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::PAUSE));
262 BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::LFENCE));
263 BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::JMP_1)).addMBB(CaptureSpec);
264 CaptureSpec->setHasAddressTaken();
265 CaptureSpec->addSuccessor(CaptureSpec);
266
267 CallTarget->setAlignment(4);
268 if (Reg) {
269 insertRegReturnAddrClobber(*CallTarget, *Reg);
270 } else {
271 assert(!Is64Bit && "We only support non-reg thunks on 32-bit x86!");
272 insert32BitPushReturnAddrClobber(*CallTarget);
273 }
274 BuildMI(CallTarget, DebugLoc(), TII->get(RetOpc));
275 }
314314 HasCLFLUSHOPT = false;
315315 HasCLWB = false;
316316 IsBTMemSlow = false;
317 UseRetpoline = false;
318 UseRetpolineExternalThunk = false;
317319 IsPMULLDSlow = false;
318320 IsSHLDSlow = false;
319321 IsUAMem16Slow = false;
295295
296296 /// Processor supports Cache Line Write Back instruction
297297 bool HasCLWB;
298
299 /// Use a retpoline thunk rather than indirect calls to block speculative
300 /// execution.
301 bool UseRetpoline;
302
303 /// When using a retpoline thunk, call an externally provided thunk rather
304 /// than emitting one inside the compiler.
305 bool UseRetpolineExternalThunk;
298306
299307 /// Use software floating point for code generation.
300308 bool UseSoftFloat;
505513 bool hasPKU() const { return HasPKU; }
506514 bool hasMPX() const { return HasMPX; }
507515 bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; }
516 bool useRetpoline() const { return UseRetpoline; }
517 bool useRetpolineExternalThunk() const { return UseRetpolineExternalThunk; }
508518
509519 bool isXRaySupported() const override { return is64Bit(); }
510520
638648 /// compiler runtime or math libraries.
639649 bool hasSinCos() const;
640650
651 /// If we are using retpolines, we need to expand indirectbr to avoid it
652 /// lowering to an actual indirect jump.
653 bool enableIndirectBrExpand() const override { return useRetpoline(); }
654
641655 /// Enable the MachineScheduler pass for all X86 subtargets.
642656 bool enableMachineScheduler() const override { return true; }
643657
304304 void addPreRegAlloc() override;
305305 void addPostRegAlloc() override;
306306 void addPreEmitPass() override;
307 void addPreEmitPass2() override;
307308 void addPreSched2() override;
308309 };
309310
333334
334335 if (TM->getOptLevel() != CodeGenOpt::None)
335336 addPass(createInterleavedAccessPass());
337
338 // Add passes that handle indirect branch removal and insertion of a retpoline
339 // thunk. These will be a no-op unless a function subtarget has the retpoline
340 // feature enabled.
341 addPass(createIndirectBrExpandPass());
336342 }
337343
338344 bool X86PassConfig::addInstSelector() {
417423 addPass(createX86EvexToVexInsts());
418424 }
419425 }
426
427 void X86PassConfig::addPreEmitPass2() {
428 addPass(createX86RetpolineThunksPass());
429 }
2424 ; CHECK-NEXT: Inserts calls to mcount-like functions
2525 ; CHECK-NEXT: Scalarize Masked Memory Intrinsics
2626 ; CHECK-NEXT: Expand reduction intrinsics
27 ; CHECK-NEXT: Expand indirectbr instructions
2728 ; CHECK-NEXT: Rewrite Symbols
2829 ; CHECK-NEXT: FunctionPass Manager
2930 ; CHECK-NEXT: Dominator Tree Construction
5455 ; CHECK-NEXT: Machine Natural Loop Construction
5556 ; CHECK-NEXT: Insert XRay ops
5657 ; CHECK-NEXT: Implement the 'patchable-function' attribute
58 ; CHECK-NEXT: X86 Retpoline Thunks
59 ; CHECK-NEXT: FunctionPass Manager
5760 ; CHECK-NEXT: Lazy Machine Block Frequency Analysis
5861 ; CHECK-NEXT: Machine Optimization Remark Emitter
5962 ; CHECK-NEXT: MachineDominator Tree Construction
0 ; RUN: llc -mtriple=x86_64-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64
1 ; RUN: llc -mtriple=x86_64-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64FAST
2
3 ; RUN: llc -mtriple=i686-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86
4 ; RUN: llc -mtriple=i686-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86FAST
5
6 declare void @bar(i32)
7
8 ; Test a simple indirect call and tail call.
9 define void @icall_reg(void (i32)* %fp, i32 %x) #0 {
10 entry:
11 tail call void @bar(i32 %x)
12 tail call void %fp(i32 %x)
13 tail call void @bar(i32 %x)
14 tail call void %fp(i32 %x)
15 ret void
16 }
17
18 ; X64-LABEL: icall_reg:
19 ; X64-DAG: movq %rdi, %[[fp:[^ ]*]]
20 ; X64-DAG: movl %esi, %[[x:[^ ]*]]
21 ; X64: movl %[[x]], %edi
22 ; X64: callq bar
23 ; X64-DAG: movl %[[x]], %edi
24 ; X64-DAG: movq %[[fp]], %r11
25 ; X64: callq __llvm_external_retpoline_r11
26 ; X64: movl %[[x]], %edi
27 ; X64: callq bar
28 ; X64-DAG: movl %[[x]], %edi
29 ; X64-DAG: movq %[[fp]], %r11
30 ; X64: jmp __llvm_external_retpoline_r11 # TAILCALL
31
32 ; X64FAST-LABEL: icall_reg:
33 ; X64FAST: callq bar
34 ; X64FAST: callq __llvm_external_retpoline_r11
35 ; X64FAST: callq bar
36 ; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL
37
38 ; X86-LABEL: icall_reg:
39 ; X86-DAG: movl 12(%esp), %[[fp:[^ ]*]]
40 ; X86-DAG: movl 16(%esp), %[[x:[^ ]*]]
41 ; X86: pushl %[[x]]
42 ; X86: calll bar
43 ; X86: movl %[[fp]], %eax
44 ; X86: pushl %[[x]]
45 ; X86: calll __llvm_external_retpoline_eax
46 ; X86: pushl %[[x]]
47 ; X86: calll bar
48 ; X86: movl %[[fp]], %eax
49 ; X86: pushl %[[x]]
50 ; X86: calll __llvm_external_retpoline_eax
51 ; X86-NOT: # TAILCALL
52
53 ; X86FAST-LABEL: icall_reg:
54 ; X86FAST: calll bar
55 ; X86FAST: calll __llvm_external_retpoline_eax
56 ; X86FAST: calll bar
57 ; X86FAST: calll __llvm_external_retpoline_eax
58
59
60 @global_fp = external global void (i32)*
61
62 ; Test an indirect call through a global variable.
63 define void @icall_global_fp(i32 %x, void (i32)** %fpp) #0 {
64 %fp1 = load void (i32)*, void (i32)** @global_fp
65 call void %fp1(i32 %x)
66 %fp2 = load void (i32)*, void (i32)** @global_fp
67 tail call void %fp2(i32 %x)
68 ret void
69 }
70
71 ; X64-LABEL: icall_global_fp:
72 ; X64-DAG: movl %edi, %[[x:[^ ]*]]
73 ; X64-DAG: movq global_fp(%rip), %r11
74 ; X64: callq __llvm_external_retpoline_r11
75 ; X64-DAG: movl %[[x]], %edi
76 ; X64-DAG: movq global_fp(%rip), %r11
77 ; X64: jmp __llvm_external_retpoline_r11 # TAILCALL
78
79 ; X64FAST-LABEL: icall_global_fp:
80 ; X64FAST: movq global_fp(%rip), %r11
81 ; X64FAST: callq __llvm_external_retpoline_r11
82 ; X64FAST: movq global_fp(%rip), %r11
83 ; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL
84
85 ; X86-LABEL: icall_global_fp:
86 ; X86: movl global_fp, %eax
87 ; X86: pushl 4(%esp)
88 ; X86: calll __llvm_external_retpoline_eax
89 ; X86: addl $4, %esp
90 ; X86: movl global_fp, %eax
91 ; X86: jmp __llvm_external_retpoline_eax # TAILCALL
92
93 ; X86FAST-LABEL: icall_global_fp:
94 ; X86FAST: calll __llvm_external_retpoline_eax
95 ; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL
96
97
98 %struct.Foo = type { void (%struct.Foo*)** }
99
100 ; Test an indirect call through a vtable.
101 define void @vcall(%struct.Foo* %obj) #0 {
102 %vptr_field = getelementptr %struct.Foo, %struct.Foo* %obj, i32 0, i32 0
103 %vptr = load void (%struct.Foo*)**, void (%struct.Foo*)*** %vptr_field
104 %vslot = getelementptr void(%struct.Foo*)*, void(%struct.Foo*)** %vptr, i32 1
105 %fp = load void(%struct.Foo*)*, void(%struct.Foo*)** %vslot
106 tail call void %fp(%struct.Foo* %obj)
107 tail call void %fp(%struct.Foo* %obj)
108 ret void
109 }
110
111 ; X64-LABEL: vcall:
112 ; X64: movq %rdi, %[[obj:[^ ]*]]
113 ; X64: movq (%[[obj]]), %[[vptr:[^ ]*]]
114 ; X64: movq 8(%[[vptr]]), %[[fp:[^ ]*]]
115 ; X64: movq %[[fp]], %r11
116 ; X64: callq __llvm_external_retpoline_r11
117 ; X64-DAG: movq %[[obj]], %rdi
118 ; X64-DAG: movq %[[fp]], %r11
119 ; X64: jmp __llvm_external_retpoline_r11 # TAILCALL
120
121 ; X64FAST-LABEL: vcall:
122 ; X64FAST: callq __llvm_external_retpoline_r11
123 ; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL
124
125 ; X86-LABEL: vcall:
126 ; X86: movl 8(%esp), %[[obj:[^ ]*]]
127 ; X86: movl (%[[obj]]), %[[vptr:[^ ]*]]
128 ; X86: movl 4(%[[vptr]]), %[[fp:[^ ]*]]
129 ; X86: movl %[[fp]], %eax
130 ; X86: pushl %[[obj]]
131 ; X86: calll __llvm_external_retpoline_eax
132 ; X86: addl $4, %esp
133 ; X86: movl %[[fp]], %eax
134 ; X86: jmp __llvm_external_retpoline_eax # TAILCALL
135
136 ; X86FAST-LABEL: vcall:
137 ; X86FAST: calll __llvm_external_retpoline_eax
138 ; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL
139
140
141 declare void @direct_callee()
142
143 define void @direct_tail() #0 {
144 tail call void @direct_callee()
145 ret void
146 }
147
148 ; X64-LABEL: direct_tail:
149 ; X64: jmp direct_callee # TAILCALL
150 ; X64FAST-LABEL: direct_tail:
151 ; X64FAST: jmp direct_callee # TAILCALL
152 ; X86-LABEL: direct_tail:
153 ; X86: jmp direct_callee # TAILCALL
154 ; X86FAST-LABEL: direct_tail:
155 ; X86FAST: jmp direct_callee # TAILCALL
156
157
158 ; Lastly check that no thunks were emitted.
159 ; X64-NOT: __{{.*}}_retpoline_{{.*}}:
160 ; X64FAST-NOT: __{{.*}}_retpoline_{{.*}}:
161 ; X86-NOT: __{{.*}}_retpoline_{{.*}}:
162 ; X86FAST-NOT: __{{.*}}_retpoline_{{.*}}:
163
164
165 attributes #0 = { "target-features"="+retpoline-external-thunk" }
0 ; RUN: llc -mtriple=x86_64-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64
1 ; RUN: llc -mtriple=x86_64-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64FAST
2
3 ; RUN: llc -mtriple=i686-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86
4 ; RUN: llc -mtriple=i686-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86FAST
5
6 declare void @bar(i32)
7
8 ; Test a simple indirect call and tail call.
9 define void @icall_reg(void (i32)* %fp, i32 %x) #0 {
10 entry:
11 tail call void @bar(i32 %x)
12 tail call void %fp(i32 %x)
13 tail call void @bar(i32 %x)
14 tail call void %fp(i32 %x)
15 ret void
16 }

; X64-LABEL: icall_reg:
; X64-DAG: movq %rdi, %[[fp:[^ ]*]]
; X64-DAG: movl %esi, %[[x:[^ ]*]]
; X64: movl %[[x]], %edi
; X64: callq bar
; X64-DAG: movl %[[x]], %edi
; X64-DAG: movq %[[fp]], %r11
; X64: callq __llvm_retpoline_r11
; X64: movl %[[x]], %edi
; X64: callq bar
; X64-DAG: movl %[[x]], %edi
; X64-DAG: movq %[[fp]], %r11
; X64: jmp __llvm_retpoline_r11 # TAILCALL

; X64FAST-LABEL: icall_reg:
; X64FAST: callq bar
; X64FAST: callq __llvm_retpoline_r11
; X64FAST: callq bar
; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL

; X86-LABEL: icall_reg:
; X86-DAG: movl 12(%esp), %[[fp:[^ ]*]]
; X86-DAG: movl 16(%esp), %[[x:[^ ]*]]
; X86: pushl %[[x]]
; X86: calll bar
; X86: movl %[[fp]], %eax
; X86: pushl %[[x]]
; X86: calll __llvm_retpoline_eax
; X86: pushl %[[x]]
; X86: calll bar
; X86: movl %[[fp]], %eax
; X86: pushl %[[x]]
; X86: calll __llvm_retpoline_eax
; X86-NOT: # TAILCALL

; X86FAST-LABEL: icall_reg:
; X86FAST: calll bar
; X86FAST: calll __llvm_retpoline_eax
; X86FAST: calll bar
; X86FAST: calll __llvm_retpoline_eax


@global_fp = external global void (i32)*

; Test an indirect call through a global variable.
define void @icall_global_fp(i32 %x, void (i32)** %fpp) #0 {
  %fp1 = load void (i32)*, void (i32)** @global_fp
  call void %fp1(i32 %x)
  %fp2 = load void (i32)*, void (i32)** @global_fp
  tail call void %fp2(i32 %x)
  ret void
}

; X64-LABEL: icall_global_fp:
; X64-DAG: movl %edi, %[[x:[^ ]*]]
; X64-DAG: movq global_fp(%rip), %r11
; X64: callq __llvm_retpoline_r11
; X64-DAG: movl %[[x]], %edi
; X64-DAG: movq global_fp(%rip), %r11
; X64: jmp __llvm_retpoline_r11 # TAILCALL

; X64FAST-LABEL: icall_global_fp:
; X64FAST: movq global_fp(%rip), %r11
; X64FAST: callq __llvm_retpoline_r11
; X64FAST: movq global_fp(%rip), %r11
; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL

; X86-LABEL: icall_global_fp:
; X86: movl global_fp, %eax
; X86: pushl 4(%esp)
; X86: calll __llvm_retpoline_eax
; X86: addl $4, %esp
; X86: movl global_fp, %eax
; X86: jmp __llvm_retpoline_eax # TAILCALL

; X86FAST-LABEL: icall_global_fp:
; X86FAST: calll __llvm_retpoline_eax
; X86FAST: jmp __llvm_retpoline_eax # TAILCALL


%struct.Foo = type { void (%struct.Foo*)** }

; Test an indirect call through a vtable.
define void @vcall(%struct.Foo* %obj) #0 {
  %vptr_field = getelementptr %struct.Foo, %struct.Foo* %obj, i32 0, i32 0
  %vptr = load void (%struct.Foo*)**, void (%struct.Foo*)*** %vptr_field
  %vslot = getelementptr void(%struct.Foo*)*, void(%struct.Foo*)** %vptr, i32 1
  %fp = load void(%struct.Foo*)*, void(%struct.Foo*)** %vslot
  tail call void %fp(%struct.Foo* %obj)
  tail call void %fp(%struct.Foo* %obj)
  ret void
}
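; The load from vtable slot 1 shows up below as an 8-byte offset on x86-64 and
; a 4-byte offset on 32-bit; the loaded function pointer must still be routed
; through the retpoline thunk for both the call and the tail call.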

; X64-LABEL: vcall:
; X64: movq %rdi, %[[obj:[^ ]*]]
; X64: movq (%[[obj]]), %[[vptr:[^ ]*]]
; X64: movq 8(%[[vptr]]), %[[fp:[^ ]*]]
; X64: movq %[[fp]], %r11
; X64: callq __llvm_retpoline_r11
; X64-DAG: movq %[[obj]], %rdi
; X64-DAG: movq %[[fp]], %r11
; X64: jmp __llvm_retpoline_r11 # TAILCALL

; X64FAST-LABEL: vcall:
; X64FAST: callq __llvm_retpoline_r11
; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL

; X86-LABEL: vcall:
; X86: movl 8(%esp), %[[obj:[^ ]*]]
; X86: movl (%[[obj]]), %[[vptr:[^ ]*]]
; X86: movl 4(%[[vptr]]), %[[fp:[^ ]*]]
; X86: movl %[[fp]], %eax
; X86: pushl %[[obj]]
; X86: calll __llvm_retpoline_eax
; X86: addl $4, %esp
; X86: movl %[[fp]], %eax
; X86: jmp __llvm_retpoline_eax # TAILCALL

; X86FAST-LABEL: vcall:
; X86FAST: calll __llvm_retpoline_eax
; X86FAST: jmp __llvm_retpoline_eax # TAILCALL


declare void @direct_callee()

define void @direct_tail() #0 {
  tail call void @direct_callee()
  ret void
}

; X64-LABEL: direct_tail:
; X64: jmp direct_callee # TAILCALL
; X64FAST-LABEL: direct_tail:
; X64FAST: jmp direct_callee # TAILCALL
; X86-LABEL: direct_tail:
; X86: jmp direct_callee # TAILCALL
; X86FAST-LABEL: direct_tail:
; X86FAST: jmp direct_callee # TAILCALL


declare void @nonlazybind_callee() #1

define void @nonlazybind_caller() #0 {
  call void @nonlazybind_callee()
  tail call void @nonlazybind_callee()
  ret void
}

; nonlazybind wasn't implemented in LLVM 5.0, so this looks the same as a direct call.
; X64-LABEL: nonlazybind_caller:
; X64: callq nonlazybind_callee
; X64: jmp nonlazybind_callee # TAILCALL
; X64FAST-LABEL: nonlazybind_caller:
; X64FAST: callq nonlazybind_callee
; X64FAST: jmp nonlazybind_callee # TAILCALL
; X86-LABEL: nonlazybind_caller:
; X86: calll nonlazybind_callee
; X86: jmp nonlazybind_callee # TAILCALL
; X86FAST-LABEL: nonlazybind_caller:
; X86FAST: calll nonlazybind_callee
; X86FAST: jmp nonlazybind_callee # TAILCALL


@indirectbr_rewrite.targets = constant [10 x i8*] [i8* blockaddress(@indirectbr_rewrite, %bb0),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb1),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb2),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb3),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb4),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb5),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb6),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb7),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb8),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb9)]

; Check that, when retpolines are enabled, a function containing an indirectbr
; gets rewritten to use a switch, and that the resulting switch in turn is not
; lowered as a jump table.
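; The rewrite itself is done by the IndirectBrExpand pass (see the
; -indirectbr-expand test); lowering the resulting switch as a jump table
; would reintroduce exactly the kind of indirect jmp that retpoline is meant
; to eliminate, so the -NOT checks below require that no jmpq/jmpl appears in
; the lowered function.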
define void @indirectbr_rewrite(i64* readonly %p, i64* %sink) #0 {
; X64-LABEL: indirectbr_rewrite:
; X64-NOT: jmpq
; X86-LABEL: indirectbr_rewrite:
; X86-NOT: jmpl
entry:
  %i0 = load i64, i64* %p
  %target.i0 = getelementptr [10 x i8*], [10 x i8*]* @indirectbr_rewrite.targets, i64 0, i64 %i0
  %target0 = load i8*, i8** %target.i0
  indirectbr i8* %target0, [label %bb1, label %bb3]

bb0:
  store volatile i64 0, i64* %sink
  br label %latch

bb1:
  store volatile i64 1, i64* %sink
  br label %latch

bb2:
  store volatile i64 2, i64* %sink
  br label %latch

bb3:
  store volatile i64 3, i64* %sink
  br label %latch

bb4:
  store volatile i64 4, i64* %sink
  br label %latch

bb5:
  store volatile i64 5, i64* %sink
  br label %latch

bb6:
  store volatile i64 6, i64* %sink
  br label %latch

bb7:
  store volatile i64 7, i64* %sink
  br label %latch

bb8:
  store volatile i64 8, i64* %sink
  br label %latch

bb9:
  store volatile i64 9, i64* %sink
  br label %latch

latch:
  %i.next = load i64, i64* %p
  %target.i.next = getelementptr [10 x i8*], [10 x i8*]* @indirectbr_rewrite.targets, i64 0, i64 %i.next
  %target.next = load i8*, i8** %target.i.next
; Use the full set of 10 successors here so that, even after the indirectbr is
; rewritten as a switch, that switch would normally be lowered with a jump table.
  indirectbr i8* %target.next, [label %bb0,
                                label %bb1,
                                label %bb2,
                                label %bb3,
                                label %bb4,
                                label %bb5,
                                label %bb6,
                                label %bb7,
                                label %bb8,
                                label %bb9]
}

; Lastly check that the necessary thunks were emitted.
;
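; Each thunk follows the same pattern: the initial call pushes a return
; address and trains the return predictor on the pause/lfence loop that
; immediately follows it, so any speculative "return" spins harmlessly there;
; the code at the call target then overwrites that return address on the
; stack with the real branch target (held in r11/eax/ecx/edx, or already on
; the stack for the push variant) and issues a ret, which performs the actual
; transfer.
;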
; X64-LABEL: .section .text.__llvm_retpoline_r11,{{.*}},__llvm_retpoline_r11,comdat
; X64-NEXT: .hidden __llvm_retpoline_r11
; X64-NEXT: .weak __llvm_retpoline_r11
; X64: __llvm_retpoline_r11:
; X64-NEXT: # {{.*}} # %entry
; X64-NEXT: callq [[CALL_TARGET:.*]]
; X64-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X64-NEXT: # %entry
; X64-NEXT: # =>This Inner Loop Header: Depth=1
; X64-NEXT: pause
; X64-NEXT: lfence
; X64-NEXT: jmp [[CAPTURE_SPEC]]
; X64-NEXT: .p2align 4, 0x90
; X64-NEXT: [[CALL_TARGET]]: # Block address taken
; X64-NEXT: # %entry
; X64-NEXT: movq %r11, (%rsp)
; X64-NEXT: retq
;
; X86-LABEL: .section .text.__llvm_retpoline_eax,{{.*}},__llvm_retpoline_eax,comdat
; X86-NEXT: .hidden __llvm_retpoline_eax
; X86-NEXT: .weak __llvm_retpoline_eax
; X86: __llvm_retpoline_eax:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: movl %eax, (%esp)
; X86-NEXT: retl
;
; X86-LABEL: .section .text.__llvm_retpoline_ecx,{{.*}},__llvm_retpoline_ecx,comdat
; X86-NEXT: .hidden __llvm_retpoline_ecx
; X86-NEXT: .weak __llvm_retpoline_ecx
; X86: __llvm_retpoline_ecx:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: movl %ecx, (%esp)
; X86-NEXT: retl
;
; X86-LABEL: .section .text.__llvm_retpoline_edx,{{.*}},__llvm_retpoline_edx,comdat
; X86-NEXT: .hidden __llvm_retpoline_edx
; X86-NEXT: .weak __llvm_retpoline_edx
; X86: __llvm_retpoline_edx:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: movl %edx, (%esp)
; X86-NEXT: retl
;
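; The push variant below cannot rely on a scratch register, so it rearranges
; the stack instead: it drops the return address that points into the capture
; loop, then uses push/pop with memory operands to swap the incoming return
; address and the call target, leaving the target in the return-address slot
; (so retl branches to it) with the original return address just below it.
;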
; X86-LABEL: .section .text.__llvm_retpoline_push,{{.*}},__llvm_retpoline_push,comdat
; X86-NEXT: .hidden __llvm_retpoline_push
; X86-NEXT: .weak __llvm_retpoline_push
; X86: __llvm_retpoline_push:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: addl $4, %esp
; X86-NEXT: pushl 4(%esp)
; X86-NEXT: pushl 4(%esp)
; X86-NEXT: popl 8(%esp)
; X86-NEXT: popl (%esp)
; X86-NEXT: retl


attributes #0 = { "target-features"="+retpoline" }
attributes #1 = { nonlazybind }
; RUN: opt < %s -indirectbr-expand -S | FileCheck %s
;
; REQUIRES: x86-registered-target

target triple = "x86_64-unknown-linux-gnu"

@test1.targets = constant [4 x i8*] [i8* blockaddress(@test1, %bb0),
                                     i8* blockaddress(@test1, %bb1),
                                     i8* blockaddress(@test1, %bb2),
                                     i8* blockaddress(@test1, %bb3)]
; CHECK-LABEL: @test1.targets = constant [4 x i8*]
; CHECK: [i8* inttoptr (i64 1 to i8*),
; CHECK: i8* inttoptr (i64 2 to i8*),
; CHECK: i8* inttoptr (i64 3 to i8*),
; CHECK: i8* blockaddress(@test1, %bb3)]
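; The pass appears to give each block that is actually an indirectbr
; destination a small integer index and replaces its blockaddress with that
; index as an inttoptr constant (bb0 -> 1, bb1 -> 2, bb2 -> 3), so the value
; loaded from the table can feed the switch directly; bb3 is never an
; indirectbr destination, so its blockaddress is left untouched.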

define void @test1(i64* readonly %p, i64* %sink) #0 {
; CHECK-LABEL: define void @test1(
entry:
  %i0 = load i64, i64* %p
  %target.i0 = getelementptr [4 x i8*], [4 x i8*]* @test1.targets, i64 0, i64 %i0
  %target0 = load i8*, i8** %target.i0
; Only a subset of blocks are viable successors here.
  indirectbr i8* %target0, [label %bb0, label %bb1]
; CHECK-NOT: indirectbr
; CHECK: %[[ENTRY_V:.*]] = ptrtoint i8* %{{.*}} to i64
; CHECK-NEXT: br label %[[SWITCH_BB:.*]]

bb0:
  store volatile i64 0, i64* %sink
  br label %latch

bb1:
  store volatile i64 1, i64* %sink
  br label %latch

bb2:
  store volatile i64 2, i64* %sink
  br label %latch

bb3:
  store volatile i64 3, i64* %sink
  br label %latch

latch:
  %i.next = load i64, i64* %p
  %target.i.next = getelementptr [4 x i8*], [4 x i8*]* @test1.targets, i64 0, i64 %i.next
  %target.next = load i8*, i8** %target.i.next
; A different subset of blocks are viable successors here.
  indirectbr i8* %target.next, [label %bb1, label %bb2]
; CHECK-NOT: indirectbr
; CHECK: %[[LATCH_V:.*]] = ptrtoint i8* %{{.*}} to i64
; CHECK-NEXT: br label %[[SWITCH_BB]]
;
; CHECK: [[SWITCH_BB]]:
; CHECK-NEXT: %[[V:.*]] = phi i64 [ %[[ENTRY_V]], %entry ], [ %[[LATCH_V]], %latch ]
; CHECK-NEXT: switch i64 %[[V]], label %bb0 [
; CHECK-NEXT: i64 2, label %bb1
; CHECK-NEXT: i64 3, label %bb2
; CHECK-NEXT: ]
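;
; Both indirectbr instructions funnel into a single switch block: the
; ptrtoint values from %entry and %latch are merged by the phi, and the
; switch covers the union of their possible successors, with bb0 (index 1)
; folded into the default case rather than given an explicit case of its own.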
}

attributes #0 = { "target-features"="+retpoline" }
  initializeSjLjEHPreparePass(Registry);
  initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
  initializeGlobalMergePass(Registry);
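  // IndirectBrExpand rewrites indirectbr terminators into a switch so that
  // retpoline-enabled code does not need an indirect branch for them.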
  initializeIndirectBrExpandPassPass(Registry);
  initializeInterleavedAccessPass(Registry);
  initializeCountingFunctionInserterPass(Registry);
  initializeUnreachableBlockElimLegacyPassPass(Registry);