Merging r323155:
------------------------------------------------------------------------
r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines

Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection"; this is one of the two halves of Spectre.

Summary:
First, we need to explain the core of the vulnerability. Note that this is a very incomplete description; please see the Project Zero blog post for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processor's cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way, and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register, so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post:
https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them.
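For reference, the default thunk the compiler itself emits on x86-64 (this is the 64-bit sequence documented in X86RetpolineThunks.cpp later in this patch) looks like the following; the `callq` predicts its return into the capture loop, while the `movq` overwrites the pushed return address with the real target held in %r11:
```
__llvm_retpoline_r11:
  callq .Lr11_call_target
.Lr11_capture_spec:
  pause                     # Blocks speculation on Intel processors.
  lfence                    # Stops speculation on AMD processors as well.
  jmp .Lr11_capture_spec    # Trap any speculative execution in this loop.
  .align 16
.Lr11_call_target:
  movq %r11, (%rsp)         # Smash the return address with the real target.
  retq                      # Predicted to return above; actually jumps to %r11.
```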
Custom, externally provided thunks are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 the thunk name must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
The target of the retpoline is passed in the named register, or in the case of the `push` suffix, on the top of the stack via a `pushl` instruction.

There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now`, as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. (An illustrative invocation is sketched below.)

When manually applying transformations similar to `-mretpoline` to the Linux kernel, we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance-sensitive paths of the kernel.

When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch-, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well-tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline.

We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time-sensitive nature of landing this and the need to backport it.
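As a sketch of the dynamic-executable guidance above (the source and library names are hypothetical, and `-fuse-ld=lld` is simply one way of selecting LLD; only `-mretpoline`, `-z retpolineplt`, and `-z now` come from this patch's documentation):
```
# Compile every object and shared library with retpolines enabled.
clang -O2 -mretpoline -c app.c -o app.o
clang -O2 -mretpoline -shared -fPIC libfoo.c -o libfoo.so \
    -fuse-ld=lld -Wl,-z,retpolineplt -Wl,-z,now

# Link the executable with LLD, retpoline PLT entries, and non-lazy binding.
clang -mretpoline app.o ./libfoo.so -o app \
    -fuse-ld=lld -Wl,-z,retpolineplt -Wl,-z,now
```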
Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324067 91177308-0d34-0410-b5e6-96231b3b80d8
32 changed file(s) with 1368 addition(s) and 12 deletion(s).
416416 // This pass expands memcmp() to load/stores.
417417 FunctionPass *createExpandMemCmpPass();
418418
419 // This pass expands indirectbr instructions.
420 FunctionPass *createIndirectBrExpandPass();
421
419422 } // End llvm namespace
420423
421424 #endif
799799 }
800800
801801 /// Return true if lowering to a jump table is allowed.
802 bool areJTsAllowed(const Function *Fn) const {
802 virtual bool areJTsAllowed(const Function *Fn) const {
803803 if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
804804 return false;
805805
415415 /// immediately before machine code is emitted.
416416 virtual void addPreEmitPass() { }
417417
418 /// Targets may add passes immediately before machine code is emitted in this
419 /// callback. This is called even later than `addPreEmitPass`.
420 // FIXME: Rename `addPreEmitPass` to something more sensible given its actual
421 // position and remove the `2` suffix here as this callback is what
422 // `addPreEmitPass` *should* be but in reality isn't.
423 virtual void addPreEmitPass2() {}
424
418425 /// Utilities for targets to add passes to the pass manager.
419426 ///
420427
173173 /// \brief True if the subtarget should run the atomic expansion pass.
174174 virtual bool enableAtomicExpand() const;
175175
176 /// True if the subtarget should run the indirectbr expansion pass.
177 virtual bool enableIndirectBrExpand() const;
178
176179 /// \brief Override generic scheduling policy within a region.
177180 ///
178181 /// This is a convenient way for targets that don't provide any custom
160160 void initializeIfConverterPass(PassRegistry&);
161161 void initializeImplicitNullChecksPass(PassRegistry&);
162162 void initializeIndVarSimplifyLegacyPassPass(PassRegistry&);
163 void initializeIndirectBrExpandPassPass(PassRegistry&);
163164 void initializeInductiveRangeCheckEliminationPass(PassRegistry&);
164165 void initializeInferAddressSpacesPass(PassRegistry&);
165166 void initializeInferFunctionAttrsLegacyPassPass(PassRegistry&);
3232 GlobalMerge.cpp
3333 IfConversion.cpp
3434 ImplicitNullChecks.cpp
35 IndirectBrExpandPass.cpp
3536 InlineSpiller.cpp
3637 InterferenceCache.cpp
3738 InterleavedAccessPass.cpp
3737 initializeGCModuleInfoPass(Registry);
3838 initializeIfConverterPass(Registry);
3939 initializeImplicitNullChecksPass(Registry);
40 initializeIndirectBrExpandPassPass(Registry);
4041 initializeInterleavedAccessPass(Registry);
4142 initializeLiveDebugValuesPass(Registry);
4243 initializeLiveDebugVariablesPass(Registry);
0 //===- IndirectBrExpandPass.cpp - Expand indirectbr to switch -------------===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// Implements an expansion pass to turn `indirectbr` instructions in the IR
11 /// into `switch` instructions. This works by enumerating the basic blocks in
12 /// a dense range of integers, replacing each `blockaddr` constant with the
13 /// corresponding integer constant, and then building a switch that maps from
14 /// the integers to the actual blocks. All of the indirectbr instructions in the
15 /// function are redirected to this common switch.
16 ///
17 /// While this is generically useful if a target is unable to codegen
18 /// `indirectbr` natively, it is primarily useful when there is some desire to
19 /// get the builtin non-jump-table lowering of a switch even when the input
20 /// source contained an explicit indirect branch construct.
21 ///
22 /// Note that it doesn't make any sense to enable this pass unless a target also
23 /// disables jump-table lowering of switches. Doing that is likely to pessimize
24 /// the code.
25 ///
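/// As a rough, illustrative sketch (hypothetical block names; assuming a
/// 64-bit target so the common integer type is i64), an input such as:
///
///   indirectbr i8* %target, [label %bb1, label %bb2]
///
/// is rewritten, once the blockaddress constants for %bb1 and %bb2 have been
/// replaced by the integers 1 and 2, into:
///
///   %target.switch_cast = ptrtoint i8* %target to i64
///   switch i64 %target.switch_cast, label %bb1 [
///     i64 2, label %bb2
///   ]
///
/// where %bb1, the first enumerated block, serves as the switch's default
/// destination.
///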
26 //===----------------------------------------------------------------------===//
27
28 #include "llvm/ADT/STLExtras.h"
29 #include "llvm/ADT/Sequence.h"
30 #include "llvm/ADT/SmallVector.h"
31 #include "llvm/CodeGen/TargetPassConfig.h"
32 #include "llvm/CodeGen/TargetSubtargetInfo.h"
33 #include "llvm/IR/BasicBlock.h"
34 #include "llvm/IR/Function.h"
35 #include "llvm/IR/IRBuilder.h"
36 #include "llvm/IR/InstIterator.h"
37 #include "llvm/IR/Instruction.h"
38 #include "llvm/IR/Instructions.h"
39 #include "llvm/Pass.h"
40 #include "llvm/Support/Debug.h"
41 #include "llvm/Support/ErrorHandling.h"
42 #include "llvm/Support/raw_ostream.h"
43 #include "llvm/Target/TargetMachine.h"
44
45 using namespace llvm;
46
47 #define DEBUG_TYPE "indirectbr-expand"
48
49 namespace {
50
51 class IndirectBrExpandPass : public FunctionPass {
52 const TargetLowering *TLI = nullptr;
53
54 public:
55 static char ID; // Pass identification, replacement for typeid
56
57 IndirectBrExpandPass() : FunctionPass(ID) {
58 initializeIndirectBrExpandPassPass(*PassRegistry::getPassRegistry());
59 }
60
61 bool runOnFunction(Function &F) override;
62 };
63
64 } // end anonymous namespace
65
66 char IndirectBrExpandPass::ID = 0;
67
68 INITIALIZE_PASS(IndirectBrExpandPass, DEBUG_TYPE,
69 "Expand indirectbr instructions", false, false)
70
71 FunctionPass *llvm::createIndirectBrExpandPass() {
72 return new IndirectBrExpandPass();
73 }
74
75 bool IndirectBrExpandPass::runOnFunction(Function &F) {
76 auto &DL = F.getParent()->getDataLayout();
77 auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
78 if (!TPC)
79 return false;
80
81 auto &TM = TPC->getTM<TargetMachine>();
82 auto &STI = *TM.getSubtargetImpl(F);
83 if (!STI.enableIndirectBrExpand())
84 return false;
85 TLI = STI.getTargetLowering();
86
87 SmallVector<IndirectBrInst *, 1> IndirectBrs;
88
89 // Set of all potential successors for indirectbr instructions.
90 SmallPtrSet<BasicBlock *, 4> IndirectBrSuccs;
91
92 // Build a list of indirectbrs that we want to rewrite.
93 for (BasicBlock &BB : F)
94 if (auto *IBr = dyn_cast<IndirectBrInst>(BB.getTerminator())) {
95 // Handle the degenerate case of no successors by replacing the indirectbr
96 // with unreachable as there is no successor available.
97 if (IBr->getNumSuccessors() == 0) {
98 (void)new UnreachableInst(F.getContext(), IBr);
99 IBr->eraseFromParent();
100 continue;
101 }
102
103 IndirectBrs.push_back(IBr);
104 for (BasicBlock *SuccBB : IBr->successors())
105 IndirectBrSuccs.insert(SuccBB);
106 }
107
108 if (IndirectBrs.empty())
109 return false;
110
111 // If we need to replace any indirectbrs we need to establish integer
112 // constants that will correspond to each of the basic blocks in the function
113 // whose address escapes. We do that here and rewrite all the blockaddress
114 // constants to just be those integer constants cast to a pointer type.
115 SmallVector<BasicBlock *, 4> BBs;
116
117 for (BasicBlock &BB : F) {
118 // Skip blocks that aren't successors to an indirectbr we're going to
119 // rewrite.
120 if (!IndirectBrSuccs.count(&BB))
121 continue;
122
123 auto IsBlockAddressUse = [&](const Use &U) {
124 return isa<BlockAddress>(U.getUser());
125 };
126 auto BlockAddressUseIt = llvm::find_if(BB.uses(), IsBlockAddressUse);
127 if (BlockAddressUseIt == BB.use_end())
128 continue;
129
130 assert(std::find_if(std::next(BlockAddressUseIt), BB.use_end(),
131 IsBlockAddressUse) == BB.use_end() &&
132 "There should only ever be a single blockaddress use because it is "
133 "a constant and should be uniqued.");
134
135 auto *BA = cast<BlockAddress>(BlockAddressUseIt->getUser());
136
137 // Skip if the constant was formed but ended up not being used (due to DCE
138 // or whatever).
139 if (!BA->isConstantUsed())
140 continue;
141
142 // Compute the index we want to use for this basic block. We can't use zero
143 // because null can be compared with block addresses.
144 int BBIndex = BBs.size() + 1;
145 BBs.push_back(&BB);
146
147 auto *ITy = cast<IntegerType>(DL.getIntPtrType(BA->getType()));
148 ConstantInt *BBIndexC = ConstantInt::get(ITy, BBIndex);
149
150 // Now rewrite the blockaddress to an integer constant based on the index.
151 // FIXME: We could potentially preserve the uses as arguments to inline asm.
152 // This would allow some uses such as diagnostic information in crashes to
153 // have higher quality even when this transform is enabled, but would break
154 // users that round-trip blockaddresses through inline assembly and then
155 // back into an indirectbr.
156 BA->replaceAllUsesWith(ConstantExpr::getIntToPtr(BBIndexC, BA->getType()));
157 }
158
159 if (BBs.empty()) {
160 // There are no blocks whose address is taken, so any indirectbr instruction
161 // cannot get a valid input and we can replace all of them with unreachable.
162 for (auto *IBr : IndirectBrs) {
163 (void)new UnreachableInst(F.getContext(), IBr);
164 IBr->eraseFromParent();
165 }
166 return true;
167 }
168
169 BasicBlock *SwitchBB;
170 Value *SwitchValue;
171
172 // Compute a common integer type across all the indirectbr instructions.
173 IntegerType *CommonITy = nullptr;
174 for (auto *IBr : IndirectBrs) {
175 auto *ITy =
176 cast<IntegerType>(DL.getIntPtrType(IBr->getAddress()->getType()));
177 if (!CommonITy || ITy->getBitWidth() > CommonITy->getBitWidth())
178 CommonITy = ITy;
179 }
180
181 auto GetSwitchValue = [DL, CommonITy](IndirectBrInst *IBr) {
182 return CastInst::CreatePointerCast(
183 IBr->getAddress(), CommonITy,
184 Twine(IBr->getAddress()->getName()) + ".switch_cast", IBr);
185 };
186
187 if (IndirectBrs.size() == 1) {
188 // If we only have one indirectbr, we can just directly replace it within
189 // its block.
190 SwitchBB = IndirectBrs[0]->getParent();
191 SwitchValue = GetSwitchValue(IndirectBrs[0]);
192 IndirectBrs[0]->eraseFromParent();
193 } else {
194 // Otherwise we need to create a new block to hold the switch across BBs,
195 // jump to that block instead of each indirectbr, and phi together the
196 // values for the switch.
197 SwitchBB = BasicBlock::Create(F.getContext(), "switch_bb", &F);
198 auto *SwitchPN = PHINode::Create(CommonITy, IndirectBrs.size(),
199 "switch_value_phi", SwitchBB);
200 SwitchValue = SwitchPN;
201
202 // Now replace the indirectbr instructions with direct branches to the
203 // switch block and fill out the PHI operands.
204 for (auto *IBr : IndirectBrs) {
205 SwitchPN->addIncoming(GetSwitchValue(IBr), IBr->getParent());
206 BranchInst::Create(SwitchBB, IBr);
207 IBr->eraseFromParent();
208 }
209 }
210
211 // Now build the switch in the block. The block will have no terminator
212 // already.
213 auto *SI = SwitchInst::Create(SwitchValue, BBs[0], BBs.size(), SwitchBB);
214
215 // Add a case for each block.
216 for (int i : llvm::seq<int>(1, BBs.size()))
217 SI->addCase(ConstantInt::get(CommonITy, i + 1), BBs[i]);
218
219 return true;
220 }
906906 if (EnableMachineOutliner)
907907 PM->add(createMachineOutlinerPass(EnableLinkOnceODROutlining));
908908
909 // Add passes that directly emit MI after all other MI passes.
910 addPreEmitPass2();
911
909912 AddingMachinePasses = false;
910913 }
911914
3535
3636 bool TargetSubtargetInfo::enableAtomicExpand() const {
3737 return true;
38 }
39
40 bool TargetSubtargetInfo::enableIndirectBrExpand() const {
41 return false;
3842 }
3943
4044 bool TargetSubtargetInfo::enableMachineScheduler() const {
4747 X86PadShortFunction.cpp
4848 X86RegisterBankInfo.cpp
4949 X86RegisterInfo.cpp
50 X86RetpolineThunks.cpp
5051 X86SelectionDAGInfo.cpp
5152 X86ShuffleDecodeConstantPool.cpp
5253 X86Subtarget.cpp
2121 class FunctionPass;
2222 class ImmutablePass;
2323 class InstructionSelector;
24 class ModulePass;
2425 class PassRegistry;
2526 class X86RegisterBankInfo;
2627 class X86Subtarget;
101102 /// encoding when possible in order to reduce code size.
102103 FunctionPass *createX86EvexToVexInsts();
103104
105 /// This pass creates the thunks for the retpoline feature.
106 ModulePass *createX86RetpolineThunksPass();
107
104108 InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM,
105109 X86Subtarget &,
106110 X86RegisterBankInfo &);
328328 : SubtargetFeature<"fast-gather", "HasFastGather", "true",
329329 "Indicates if gather is reasonably fast.">;
330330
331 // Enable mitigation of some aspects of speculative execution related
332 // vulnerabilities by removing speculatable indirect branches. This disables
333 // jump-table formation, rewrites explicit `indirectbr` instructions into
334 // `switch` instructions, and uses a special construct called a "retpoline" to
335 // prevent speculation of the remaining indirect branches (indirect calls and
336 // tail calls).
337 def FeatureRetpoline
338 : SubtargetFeature<"retpoline", "UseRetpoline", "true",
339 "Remove speculation of indirect branches from the "
340 "generated code, either by avoiding them entirely or "
341 "lowering them with a speculation blocking construct.">;
342
343 // Rely on external thunks for the emitted retpoline calls. This allows users
344 // to provide their own custom thunk definitions in highly specialized
345 // environments such as a kernel that does boot-time hot patching.
346 def FeatureRetpolineExternalThunk
347 : SubtargetFeature<
348 "retpoline-external-thunk", "UseRetpolineExternalThunk", "true",
349 "Enable retpoline, but with an externally provided thunk.",
350 [FeatureRetpoline]>;
351
331352 //===----------------------------------------------------------------------===//
332353 // Register File Description
333354 //===----------------------------------------------------------------------===//
3131 FaultMaps FM;
3232 std::unique_ptr<MCCodeEmitter> CodeEmitter;
3333 bool EmitFPOData = false;
34 bool NeedsRetpoline = false;
3435
3536 // This utility class tracks the length of a stackmap instruction's 'shadow'.
3637 // It is used by the X86AsmPrinter to ensure that the stackmap shadow
31713171 (CalledFn && CalledFn->hasFnAttribute("no_caller_saved_registers")))
31723172 return false;
31733173
3174 // Functions using retpoline should use SDISel for calls.
3175 if (Subtarget->useRetpoline())
3176 return false;
3177
31743178 // Handle only C, fastcc, and webkit_js calling conventions for now.
31753179 switch (CC) {
31763180 default: return false;
740740 bool InProlog) const {
741741 bool IsLargeCodeModel = MF.getTarget().getCodeModel() == CodeModel::Large;
742742
743 // FIXME: Add retpoline support and remove this.
744 if (Is64Bit && IsLargeCodeModel && STI.useRetpoline())
745 report_fatal_error("Emitting stack probe calls on 64-bit with the large "
746 "code model and retpoline not yet implemented.");
747
743748 unsigned CallOp;
744749 if (Is64Bit)
745750 CallOp = IsLargeCodeModel ? X86::CALL64r : X86::CALL64pcrel32;
23442349 // This solution is not perfect, as it assumes that the .rodata section
23452350 // is laid out within 2^31 bytes of each function body, but this seems
23462351 // to be sufficient for JIT.
2352 // FIXME: Add retpoline support and remove the error here..
2353 if (STI.useRetpoline())
2354 report_fatal_error("Emitting morestack calls on 64-bit with the large "
2355 "code model and retpoline not yet implemented.");
23472356 BuildMI(allocMBB, DL, TII.get(X86::CALL64m))
23482357 .addReg(X86::RIP)
23492358 .addImm(0)
628628 SDNode *N = &*I++; // Preincrement iterator to avoid invalidation issues.
629629
630630 if (OptLevel != CodeGenOpt::None &&
631 // Only does this when target favors doesn't favor register indirect
632 // call.
631 // Only do this when the target can fold the load into the call or
632 // jmp.
633 !Subtarget->useRetpoline() &&
633634 ((N->getOpcode() == X86ISD::CALL && !Subtarget->slowTwoMemOps()) ||
634635 (N->getOpcode() == X86ISD::TC_RETURN &&
635 // Only does this if load can be folded into TC_RETURN.
636636 (Subtarget->is64Bit() ||
637637 !getTargetMachine().isPositionIndependent())))) {
638638 /// Also try moving call address load from outside callseq_start to just
2576625766 return isShuffleMaskLegal(Mask, VT);
2576725767 }
2576825768
25769 bool X86TargetLowering::areJTsAllowed(const Function *Fn) const {
25770 // If the subtarget is using retpolines, we need to not generate jump tables.
25771 if (Subtarget.useRetpoline())
25772 return false;
25773
25774 // Otherwise, fallback on the generic logic.
25775 return TargetLowering::areJTsAllowed(Fn);
25776 }
25777
2576925778 //===----------------------------------------------------------------------===//
2577025779 // X86 Scheduler Hooks
2577125780 //===----------------------------------------------------------------------===//
2706827077 return BB;
2706927078 }
2707027079
27080 static unsigned getOpcodeForRetpoline(unsigned RPOpc) {
27081 switch (RPOpc) {
27082 case X86::RETPOLINE_CALL32:
27083 return X86::CALLpcrel32;
27084 case X86::RETPOLINE_CALL64:
27085 return X86::CALL64pcrel32;
27086 case X86::RETPOLINE_TCRETURN32:
27087 return X86::TCRETURNdi;
27088 case X86::RETPOLINE_TCRETURN64:
27089 return X86::TCRETURNdi64;
27090 }
27091 llvm_unreachable("not retpoline opcode");
27092 }
27093
27094 static const char *getRetpolineSymbol(const X86Subtarget &Subtarget,
27095 unsigned Reg) {
27096 switch (Reg) {
27097 case 0:
27098 assert(!Subtarget.is64Bit() && "R11 should always be available on x64");
27099 return Subtarget.useRetpolineExternalThunk()
27100 ? "__llvm_external_retpoline_push"
27101 : "__llvm_retpoline_push";
27102 case X86::EAX:
27103 return Subtarget.useRetpolineExternalThunk()
27104 ? "__llvm_external_retpoline_eax"
27105 : "__llvm_retpoline_eax";
27106 case X86::ECX:
27107 return Subtarget.useRetpolineExternalThunk()
27108 ? "__llvm_external_retpoline_ecx"
27109 : "__llvm_retpoline_ecx";
27110 case X86::EDX:
27111 return Subtarget.useRetpolineExternalThunk()
27112 ? "__llvm_external_retpoline_edx"
27113 : "__llvm_retpoline_edx";
27114 case X86::R11:
27115 return Subtarget.useRetpolineExternalThunk()
27116 ? "__llvm_external_retpoline_r11"
27117 : "__llvm_retpoline_r11";
27118 }
27119 llvm_unreachable("unexpected reg for retpoline");
27120 }
27121
27122 MachineBasicBlock *
27123 X86TargetLowering::EmitLoweredRetpoline(MachineInstr &MI,
27124 MachineBasicBlock *BB) const {
27125 // Copy the virtual register into the R11 physical register and
27126 // call the retpoline thunk.
27127 DebugLoc DL = MI.getDebugLoc();
27128 const X86InstrInfo *TII = Subtarget.getInstrInfo();
27129 unsigned CalleeVReg = MI.getOperand(0).getReg();
27130 unsigned Opc = getOpcodeForRetpoline(MI.getOpcode());
27131
27132 // Find an available scratch register to hold the callee. On 64-bit, we can
27133 // just use R11, but we scan for uses anyway to ensure we don't generate
27134 // incorrect code. On 32-bit, we use one of EAX, ECX, or EDX that isn't
27135 // already a register use operand to the call to hold the callee. If none
27136 // are available, push the callee instead. This is less efficient, but is
27137 // necessary for functions using 3 regparms. Such function calls are
27138 // (currently) not eligible for tail call optimization, because there is no
27139 // scratch register available to hold the address of the callee.
27140 SmallVector<unsigned, 3> AvailableRegs;
27141 if (Subtarget.is64Bit())
27142 AvailableRegs.push_back(X86::R11);
27143 else
27144 AvailableRegs.append({X86::EAX, X86::ECX, X86::EDX});
27145
27146 // Zero out any registers that are already used.
27147 for (const auto &MO : MI.operands()) {
27148 if (MO.isReg() && MO.isUse())
27149 for (unsigned &Reg : AvailableRegs)
27150 if (Reg == MO.getReg())
27151 Reg = 0;
27152 }
27153
27154 // Choose the first remaining non-zero available register.
27155 unsigned AvailableReg = 0;
27156 for (unsigned MaybeReg : AvailableRegs) {
27157 if (MaybeReg) {
27158 AvailableReg = MaybeReg;
27159 break;
27160 }
27161 }
27162
27163 const char *Symbol = getRetpolineSymbol(Subtarget, AvailableReg);
27164
27165 if (AvailableReg == 0) {
27166 // No register available. Use PUSH. This must not be a tailcall, and this
27167 // must not be x64.
27168 if (Subtarget.is64Bit())
27169 report_fatal_error(
27170 "Cannot make an indirect call on x86-64 using both retpoline and a "
27171 "calling convention that preservers r11");
27172 if (Opc != X86::CALLpcrel32)
27173 report_fatal_error("Cannot make an indirect tail call on x86 using "
27174 "retpoline without a preserved register");
27175 BuildMI(*BB, MI, DL, TII->get(X86::PUSH32r)).addReg(CalleeVReg);
27176 MI.getOperand(0).ChangeToES(Symbol);
27177 MI.setDesc(TII->get(Opc));
27178 } else {
27179 BuildMI(*BB, MI, DL, TII->get(TargetOpcode::COPY), AvailableReg)
27180 .addReg(CalleeVReg);
27181 MI.getOperand(0).ChangeToES(Symbol);
27182 MI.setDesc(TII->get(Opc));
27183 MachineInstrBuilder(*BB->getParent(), &MI)
27184 .addReg(AvailableReg, RegState::Implicit | RegState::Kill);
27185 }
27186 return BB;
27187 }
27188
2707127189 MachineBasicBlock *
2707227190 X86TargetLowering::emitEHSjLjSetJmp(MachineInstr &MI,
2707327191 MachineBasicBlock *MBB) const {
2758327701 case X86::TLS_base_addr32:
2758427702 case X86::TLS_base_addr64:
2758527703 return EmitLoweredTLSAddr(MI, BB);
27704 case X86::RETPOLINE_CALL32:
27705 case X86::RETPOLINE_CALL64:
27706 case X86::RETPOLINE_TCRETURN32:
27707 case X86::RETPOLINE_TCRETURN64:
27708 return EmitLoweredRetpoline(MI, BB);
2758627709 case X86::CATCHRET:
2758727710 return EmitLoweredCatchRet(MI, BB);
2758827711 case X86::CATCHPAD:
981981 bool isVectorClearMaskLegal(const SmallVectorImpl<int> &Mask,
982982 EVT VT) const override;
983983
984 /// Returns true if lowering to a jump table is allowed.
985 bool areJTsAllowed(const Function *Fn) const override;
986
984987 /// If true, then instruction selection should
985988 /// seek to shrink the FP constant of the specified type to a smaller type
986989 /// in order to save space and / or reduce runtime.
12931296 MachineBasicBlock *EmitLoweredTLSCall(MachineInstr &MI,
12941297 MachineBasicBlock *BB) const;
12951298
1299 MachineBasicBlock *EmitLoweredRetpoline(MachineInstr &MI,
1300 MachineBasicBlock *BB) const;
1301
12961302 MachineBasicBlock *emitEHSjLjSetJmp(MachineInstr &MI,
12971303 MachineBasicBlock *MBB) const;
12981304
11451145
11461146 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
11471147 (TCRETURNri ptr_rc_tailcall:$dst, imm:$off)>,
1148 Requires<[Not64BitMode]>;
1148 Requires<[Not64BitMode, NotUseRetpoline]>;
11491149
11501150 // FIXME: This is disabled for 32-bit PIC mode because the global base
11511151 // register which is part of the address mode may be assigned a
11521152 // callee-saved register.
11531153 def : Pat<(X86tcret (load addr:$dst), imm:$off),
11541154 (TCRETURNmi addr:$dst, imm:$off)>,
1155 Requires<[Not64BitMode, IsNotPIC]>;
1155 Requires<[Not64BitMode, IsNotPIC, NotUseRetpoline]>;
11561156
11571157 def : Pat<(X86tcret (i32 tglobaladdr:$dst), imm:$off),
11581158 (TCRETURNdi tglobaladdr:$dst, imm:$off)>,
11641164
11651165 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
11661166 (TCRETURNri64 ptr_rc_tailcall:$dst, imm:$off)>,
1167 Requires<[In64BitMode]>;
1167 Requires<[In64BitMode, NotUseRetpoline]>;
11681168
11691169 // Don't fold loads into X86tcret requiring more than 6 regs.
11701170 // There wouldn't be enough scratch registers for base+index.
11711171 def : Pat<(X86tcret_6regs (load addr:$dst), imm:$off),
11721172 (TCRETURNmi64 addr:$dst, imm:$off)>,
1173 Requires<[In64BitMode]>;
1173 Requires<[In64BitMode, NotUseRetpoline]>;
1174
1175 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
1176 (RETPOLINE_TCRETURN64 ptr_rc_tailcall:$dst, imm:$off)>,
1177 Requires<[In64BitMode, UseRetpoline]>;
1178
1179 def : Pat<(X86tcret ptr_rc_tailcall:$dst, imm:$off),
1180 (RETPOLINE_TCRETURN32 ptr_rc_tailcall:$dst, imm:$off)>,
1181 Requires<[Not64BitMode, UseRetpoline]>;
11741182
11751183 def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
11761184 (TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,
210210 Sched<[WriteJumpLd]>;
211211 def CALL32r : I<0xFF, MRM2r, (outs), (ins GR32:$dst),
212212 "call{l}\t{*}$dst", [(X86call GR32:$dst)], IIC_CALL_RI>,
213 OpSize32, Requires<[Not64BitMode]>, Sched<[WriteJump]>;
213 OpSize32, Requires<[Not64BitMode,NotUseRetpoline]>,
214 Sched<[WriteJump]>;
214215 def CALL32m : I<0xFF, MRM2m, (outs), (ins i32mem:$dst),
215216 "call{l}\t{*}$dst", [(X86call (loadi32 addr:$dst))],
216217 IIC_CALL_MEM>, OpSize32,
217 Requires<[Not64BitMode,FavorMemIndirectCall]>,
218 Requires<[Not64BitMode,FavorMemIndirectCall,NotUseRetpoline]>,
218219 Sched<[WriteJumpLd]>;
219220
220221 let Predicates = [Not64BitMode] in {
297298 def CALL64r : I<0xFF, MRM2r, (outs), (ins GR64:$dst),
298299 "call{q}\t{*}$dst", [(X86call GR64:$dst)],
299300 IIC_CALL_RI>,
300 Requires<[In64BitMode]>;
301 Requires<[In64BitMode,NotUseRetpoline]>;
301302 def CALL64m : I<0xFF, MRM2m, (outs), (ins i64mem:$dst),
302303 "call{q}\t{*}$dst", [(X86call (loadi64 addr:$dst))],
303304 IIC_CALL_MEM>,
304 Requires<[In64BitMode,FavorMemIndirectCall]>;
305 Requires<[In64BitMode,FavorMemIndirectCall,
306 NotUseRetpoline]>;
305307
306308 def FARCALL64 : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst),
307309 "lcall{q}\t{*}$dst", [], IIC_CALL_FAR_MEM>;
340342 }
341343 }
342344
345 let isPseudo = 1, isCall = 1, isCodeGenOnly = 1,
346 Uses = [RSP, SSP],
347 usesCustomInserter = 1,
348 SchedRW = [WriteJump] in {
349 def RETPOLINE_CALL32 :
350 PseudoI<(outs), (ins GR32:$dst), [(X86call GR32:$dst)]>,
351 Requires<[Not64BitMode,UseRetpoline]>;
352
353 def RETPOLINE_CALL64 :
354 PseudoI<(outs), (ins GR64:$dst), [(X86call GR64:$dst)]>,
355 Requires<[In64BitMode,UseRetpoline]>;
356
357 // Retpoline variant of indirect tail calls.
358 let isTerminator = 1, isReturn = 1, isBarrier = 1 in {
359 def RETPOLINE_TCRETURN64 :
360 PseudoI<(outs), (ins GR64:$dst, i32imm:$offset), []>;
361 def RETPOLINE_TCRETURN32 :
362 PseudoI<(outs), (ins GR32:$dst, i32imm:$offset), []>;
363 }
364 }
365
343366 // Conditional tail calls are similar to the above, but they are branches
344367 // rather than barriers, and they use EFLAGS.
345368 let isCall = 1, isTerminator = 1, isReturn = 1, isBranch = 1,
937937 def HasFastSHLDRotate : Predicate<"Subtarget->hasFastSHLDRotate()">;
938938 def HasERMSB : Predicate<"Subtarget->hasERMSB()">;
939939 def HasMFence : Predicate<"Subtarget->hasMFence()">;
940 def UseRetpoline : Predicate<"Subtarget->useRetpoline()">;
941 def NotUseRetpoline : Predicate<"!Subtarget->useRetpoline()">;
940942
941943 //===----------------------------------------------------------------------===//
942944 // X86 Instruction Format Definitions.
873873 // address is to far away. (TODO: support non-relative addressing)
874874 break;
875875 case MachineOperand::MO_Register:
876 // FIXME: Add retpoline support and remove this.
877 if (Subtarget->useRetpoline())
878 report_fatal_error("Lowering register statepoints with retpoline not "
879 "yet implemented.");
876880 CallTargetMCOp = MCOperand::createReg(CallTarget.getReg());
877881 CallOpcode = X86::CALL64r;
878882 break;
10271031
10281032 EmitAndCountInstruction(
10291033 MCInstBuilder(X86::MOV64ri).addReg(ScratchReg).addOperand(CalleeMCOp));
1034 // FIXME: Add retpoline support and remove this.
1035 if (Subtarget->useRetpoline())
1036 report_fatal_error(
1037 "Lowering patchpoint with retpoline not yet implemented.");
10301038 EmitAndCountInstruction(MCInstBuilder(X86::CALL64r).addReg(ScratchReg));
10311039 }
10321040
0 //======- X86RetpolineThunks.cpp - Construct retpoline thunks for x86 --=====//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// Pass that injects an MI thunk implementing a "retpoline". This is
11 /// a RET-implemented trampoline that is used to lower indirect calls in a way
12 /// that prevents speculation on some x86 processors and can be used to mitigate
13 /// security vulnerabilities due to targeted speculative execution and side
14 /// channels such as CVE-2017-5715.
15 ///
16 /// TODO(chandlerc): All of this code could use better comments and
17 /// documentation.
18 ///
19 //===----------------------------------------------------------------------===//
20
21 #include "X86.h"
22 #include "X86InstrBuilder.h"
23 #include "X86Subtarget.h"
24 #include "llvm/CodeGen/MachineFunction.h"
25 #include "llvm/CodeGen/MachineInstrBuilder.h"
26 #include "llvm/CodeGen/MachineModuleInfo.h"
27 #include "llvm/CodeGen/Passes.h"
28 #include "llvm/CodeGen/TargetPassConfig.h"
29 #include "llvm/IR/IRBuilder.h"
30 #include "llvm/IR/Instructions.h"
31 #include "llvm/IR/Module.h"
32 #include "llvm/Support/CommandLine.h"
33 #include "llvm/Support/Debug.h"
34 #include "llvm/Support/raw_ostream.h"
35
36 using namespace llvm;
37
38 #define DEBUG_TYPE "x86-retpoline-thunks"
39
40 namespace {
41 class X86RetpolineThunks : public ModulePass {
42 public:
43 static char ID;
44
45 X86RetpolineThunks() : ModulePass(ID) {}
46
47 StringRef getPassName() const override { return "X86 Retpoline Thunks"; }
48
49 bool runOnModule(Module &M) override;
50
51 void getAnalysisUsage(AnalysisUsage &AU) const override {
52 AU.addRequired<MachineModuleInfo>();
53 AU.addPreserved<MachineModuleInfo>();
54 }
55
56 private:
57 MachineModuleInfo *MMI;
58 const TargetMachine *TM;
59 bool Is64Bit;
60 const X86Subtarget *STI;
61 const X86InstrInfo *TII;
62
63 Function *createThunkFunction(Module &M, StringRef Name);
64 void insertRegReturnAddrClobber(MachineBasicBlock &MBB, unsigned Reg);
65 void insert32BitPushReturnAddrClobber(MachineBasicBlock &MBB);
66 void createThunk(Module &M, StringRef NameSuffix,
67 Optional<unsigned> Reg = None);
68 };
69
70 } // end anonymous namespace
71
72 ModulePass *llvm::createX86RetpolineThunksPass() {
73 return new X86RetpolineThunks();
74 }
75
76 char X86RetpolineThunks::ID = 0;
77
78 bool X86RetpolineThunks::runOnModule(Module &M) {
79 DEBUG(dbgs() << getPassName() << '\n');
80
81 auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
82 assert(TPC && "X86-specific target pass should not be run without a target "
83 "pass config!");
84
85 MMI = &getAnalysis<MachineModuleInfo>();
86 TM = &TPC->getTM<TargetMachine>();
87 Is64Bit = TM->getTargetTriple().getArch() == Triple::x86_64;
88
89 // Only add a thunk if we have at least one function that has the retpoline
90 // feature enabled in its subtarget.
91 // FIXME: Conditionalize on indirect calls so we don't emit a thunk when
92 // nothing will end up calling it.
93 // FIXME: It's a little silly to look at every function just to enumerate
94 // the subtargets, but eventually we'll want to look at them for indirect
95 // calls, so maybe this is OK.
96 if (!llvm::any_of(M, [&](const Function &F) {
97 // Save the subtarget we find for use in emitting the subsequent
98 // thunk.
99 STI = &TM->getSubtarget<X86Subtarget>(F);
100 return STI->useRetpoline() && !STI->useRetpolineExternalThunk();
101 }))
102 return false;
103
104 // If we have a relevant subtarget, get the instr info as well.
105 TII = STI->getInstrInfo();
106
107 if (Is64Bit) {
108 // __llvm_retpoline_r11:
109 // callq .Lr11_call_target
110 // .Lr11_capture_spec:
111 // pause
112 // lfence
113 // jmp .Lr11_capture_spec
114 // .align 16
115 // .Lr11_call_target:
116 // movq %r11, (%rsp)
117 // retq
118
119 createThunk(M, "r11", X86::R11);
120 } else {
121 // For 32-bit targets we need to emit a collection of thunks for various
122 // possible scratch registers as well as a fallback that is used when
123 // there are no scratch registers and assumes the retpoline target has
124 // been pushed.
125 // __llvm_retpoline_eax:
126 // calll .Leax_call_target
127 // .Leax_capture_spec:
128 // pause
129 // jmp .Leax_capture_spec
130 // .align 16
131 // .Leax_call_target:
132 // movl %eax, (%esp) # Clobber return addr
133 // retl
134 //
135 // __llvm_retpoline_ecx:
136 // ... # Same setup
137 // movl %ecx, (%esp)
138 // retl
139 //
140 // __llvm_retpoline_edx:
141 // ... # Same setup
142 // movl %edx, (%esp)
143 // retl
144 //
145 // This last one is a bit more special and so needs a little extra
146 // handling.
147 // __llvm_retpoline_push:
148 // calll .Lpush_call_target
149 // .Lpush_capture_spec:
150 // pause
151 // lfence
152 // jmp .Lpush_capture_spec
153 // .align 16
154 // .Lpush_call_target:
155 // # Clear pause_loop return address.
156 // addl $4, %esp
157 // # Top of stack words are: Callee, RA. Exchange Callee and RA.
158 // pushl 4(%esp) # Push callee
159 // pushl 4(%esp) # Push RA
160 // popl 8(%esp) # Pop RA to final RA
161 // popl (%esp) # Pop callee to next top of stack
162 // retl # Ret to callee
163 createThunk(M, "eax", X86::EAX);
164 createThunk(M, "ecx", X86::ECX);
165 createThunk(M, "edx", X86::EDX);
166 createThunk(M, "push");
167 }
168
169 return true;
170 }
171
172 Function *X86RetpolineThunks::createThunkFunction(Module &M, StringRef Name) {
173 LLVMContext &Ctx = M.getContext();
174 auto Type = FunctionType::get(Type::getVoidTy(Ctx), false);
175 Function *F =
176 Function::Create(Type, GlobalValue::LinkOnceODRLinkage, Name, &M);
177 F->setVisibility(GlobalValue::HiddenVisibility);
178 F->setComdat(M.getOrInsertComdat(Name));
179
180 // Add Attributes so that we don't create a frame, unwind information, or
181 // inline.
182 AttrBuilder B;
183 B.addAttribute(llvm::Attribute::NoUnwind);
184 B.addAttribute(llvm::Attribute::Naked);
185 F->addAttributes(llvm::AttributeList::FunctionIndex, B);
186
187 // Populate our function a bit so that we can verify.
188 BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);
189 IRBuilder<> Builder(Entry);
190
191 Builder.CreateRetVoid();
192 return F;
193 }
194
195 void X86RetpolineThunks::insertRegReturnAddrClobber(MachineBasicBlock &MBB,
196 unsigned Reg) {
197 const unsigned MovOpc = Is64Bit ? X86::MOV64mr : X86::MOV32mr;
198 const unsigned SPReg = Is64Bit ? X86::RSP : X86::ESP;
199 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(MovOpc)), SPReg, false, 0)
200 .addReg(Reg);
201 }
202 void X86RetpolineThunks::insert32BitPushReturnAddrClobber(
203 MachineBasicBlock &MBB) {
204 // The instruction sequence we use to replace the return address without
205 // a scratch register is somewhat complicated:
206 // # Clear capture_spec from return address.
207 // addl $4, %esp
208 // # Top of stack words are: Callee, RA. Exchange Callee and RA.
209 // pushl 4(%esp) # Push callee
210 // pushl 4(%esp) # Push RA
211 // popl 8(%esp) # Pop RA to final RA
212 // popl (%esp) # Pop callee to next top of stack
213 // retl # Ret to callee
214 BuildMI(&MBB, DebugLoc(), TII->get(X86::ADD32ri), X86::ESP)
215 .addReg(X86::ESP)
216 .addImm(4);
217 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP,
218 false, 4);
219 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::PUSH32rmm)), X86::ESP,
220 false, 4);
221 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP,
222 false, 8);
223 addRegOffset(BuildMI(&MBB, DebugLoc(), TII->get(X86::POP32rmm)), X86::ESP,
224 false, 0);
225 }
226
227 void X86RetpolineThunks::createThunk(Module &M, StringRef NameSuffix,
228 Optional<unsigned> Reg) {
229 Function &F =
230 *createThunkFunction(M, (Twine("__llvm_retpoline_") + NameSuffix).str());
231 MachineFunction &MF = MMI->getOrCreateMachineFunction(F);
232
233 // Set MF properties. We never use vregs...
234 MF.getProperties().set(MachineFunctionProperties::Property::NoVRegs);
235
236 BasicBlock &OrigEntryBB = F.getEntryBlock();
237 MachineBasicBlock *Entry = MF.CreateMachineBasicBlock(&OrigEntryBB);
238 MachineBasicBlock *CaptureSpec = MF.CreateMachineBasicBlock(&OrigEntryBB);
239 MachineBasicBlock *CallTarget = MF.CreateMachineBasicBlock(&OrigEntryBB);
240
241 MF.push_back(Entry);
242 MF.push_back(CaptureSpec);
243 MF.push_back(CallTarget);
244
245 const unsigned CallOpc = Is64Bit ? X86::CALL64pcrel32 : X86::CALLpcrel32;
246 const unsigned RetOpc = Is64Bit ? X86::RETQ : X86::RETL;
247
248 BuildMI(Entry, DebugLoc(), TII->get(CallOpc)).addMBB(CallTarget);
249 Entry->addSuccessor(CallTarget);
250 Entry->addSuccessor(CaptureSpec);
251 CallTarget->setHasAddressTaken();
252
253 // In the capture loop for speculation, we want to stop the processor from
254 // speculating as fast as possible. On Intel processors, the PAUSE instruction
255 // will block speculation without consuming any execution resources. On AMD
256 // processors, the PAUSE instruction is (essentially) a nop, so we also use an
257 // LFENCE instruction which they have advised will stop speculation as well
258 // with minimal resource utilization. We still end the capture with a jump to
259 // form an infinite loop to fully guarantee that no matter what implementation
260 // of the x86 ISA, speculating this code path never escapes.
261 BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::PAUSE));
262 BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::LFENCE));
263 BuildMI(CaptureSpec, DebugLoc(), TII->get(X86::JMP_1)).addMBB(CaptureSpec);
264 CaptureSpec->setHasAddressTaken();
265 CaptureSpec->addSuccessor(CaptureSpec);
266
267 CallTarget->setAlignment(4);
268 if (Reg) {
269 insertRegReturnAddrClobber(*CallTarget, *Reg);
270 } else {
271 assert(!Is64Bit && "We only support non-reg thunks on 32-bit x86!");
272 insert32BitPushReturnAddrClobber(*CallTarget);
273 }
274 BuildMI(CallTarget, DebugLoc(), TII->get(RetOpc));
275 }
313313 HasSGX = false;
314314 HasCLFLUSHOPT = false;
315315 HasCLWB = false;
316 UseRetpoline = false;
317 UseRetpolineExternalThunk = false;
316318 IsPMULLDSlow = false;
317319 IsSHLDSlow = false;
318320 IsUAMem16Slow = false;
339339
340340 /// Processor supports Cache Line Write Back instruction
341341 bool HasCLWB;
342
343 /// Use a retpoline thunk rather than indirect calls to block speculative
344 /// execution.
345 bool UseRetpoline;
346
347 /// When using a retpoline thunk, call an externally provided thunk rather
348 /// than emitting one inside the compiler.
349 bool UseRetpolineExternalThunk;
342350
343351 /// Use software floating point for code generation.
344352 bool UseSoftFloat;
573581 bool hasIBT() const { return HasIBT; }
574582 bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; }
575583 bool hasCLWB() const { return HasCLWB; }
584 bool useRetpoline() const { return UseRetpoline; }
585 bool useRetpolineExternalThunk() const { return UseRetpolineExternalThunk; }
576586
577587 bool isXRaySupported() const override { return is64Bit(); }
578588
695705 /// Return true if the subtarget allows calls to immediate address.
696706 bool isLegalToCallImmediateAddr() const;
697707
708 /// If we are using retpolines, we need to expand indirectbr to avoid it
709 /// lowering to an actual indirect jump.
710 bool enableIndirectBrExpand() const override { return useRetpoline(); }
711
698712 /// Enable the MachineScheduler pass for all X86 subtargets.
699713 bool enableMachineScheduler() const override { return true; }
700714
320320 void addPreRegAlloc() override;
321321 void addPostRegAlloc() override;
322322 void addPreEmitPass() override;
323 void addPreEmitPass2() override;
323324 void addPreSched2() override;
324325 };
325326
349350
350351 if (TM->getOptLevel() != CodeGenOpt::None)
351352 addPass(createInterleavedAccessPass());
353
354 // Add passes that handle indirect branch removal and insertion of a retpoline
355 // thunk. These will be a no-op unless a function subtarget has the retpoline
356 // feature enabled.
357 addPass(createIndirectBrExpandPass());
352358 }
353359
354360 bool X86PassConfig::addInstSelector() {
435441 addPass(createX86EvexToVexInsts());
436442 }
437443 }
444
445 void X86PassConfig::addPreEmitPass2() {
446 addPass(createX86RetpolineThunksPass());
447 }
2424 ; CHECK-NEXT: Instrument function entry/exit with calls to e.g. mcount() (post inlining)
2525 ; CHECK-NEXT: Scalarize Masked Memory Intrinsics
2626 ; CHECK-NEXT: Expand reduction intrinsics
27 ; CHECK-NEXT: Expand indirectbr instructions
2728 ; CHECK-NEXT: Rewrite Symbols
2829 ; CHECK-NEXT: FunctionPass Manager
2930 ; CHECK-NEXT: Dominator Tree Construction
5657 ; CHECK-NEXT: Machine Natural Loop Construction
5758 ; CHECK-NEXT: Insert XRay ops
5859 ; CHECK-NEXT: Implement the 'patchable-function' attribute
60 ; CHECK-NEXT: X86 Retpoline Thunks
61 ; CHECK-NEXT: FunctionPass Manager
5962 ; CHECK-NEXT: Lazy Machine Block Frequency Analysis
6063 ; CHECK-NEXT: Machine Optimization Remark Emitter
6164 ; CHECK-NEXT: MachineDominator Tree Construction
0 ; RUN: llc -mtriple=x86_64-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64
1 ; RUN: llc -mtriple=x86_64-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64FAST
2
3 ; RUN: llc -mtriple=i686-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86
4 ; RUN: llc -mtriple=i686-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86FAST
5
6 declare void @bar(i32)
7
8 ; Test a simple indirect call and tail call.
9 define void @icall_reg(void (i32)* %fp, i32 %x) #0 {
10 entry:
11 tail call void @bar(i32 %x)
12 tail call void %fp(i32 %x)
13 tail call void @bar(i32 %x)
14 tail call void %fp(i32 %x)
15 ret void
16 }
17
18 ; X64-LABEL: icall_reg:
19 ; X64-DAG: movq %rdi, %[[fp:[^ ]*]]
20 ; X64-DAG: movl %esi, %[[x:[^ ]*]]
21 ; X64: movl %[[x]], %edi
22 ; X64: callq bar
23 ; X64-DAG: movl %[[x]], %edi
24 ; X64-DAG: movq %[[fp]], %r11
25 ; X64: callq __llvm_external_retpoline_r11
26 ; X64: movl %[[x]], %edi
27 ; X64: callq bar
28 ; X64-DAG: movl %[[x]], %edi
29 ; X64-DAG: movq %[[fp]], %r11
30 ; X64: jmp __llvm_external_retpoline_r11 # TAILCALL
31
32 ; X64FAST-LABEL: icall_reg:
33 ; X64FAST: callq bar
34 ; X64FAST: callq __llvm_external_retpoline_r11
35 ; X64FAST: callq bar
36 ; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL
37
38 ; X86-LABEL: icall_reg:
39 ; X86-DAG: movl 12(%esp), %[[fp:[^ ]*]]
40 ; X86-DAG: movl 16(%esp), %[[x:[^ ]*]]
41 ; X86: pushl %[[x]]
42 ; X86: calll bar
43 ; X86: movl %[[fp]], %eax
44 ; X86: pushl %[[x]]
45 ; X86: calll __llvm_external_retpoline_eax
46 ; X86: pushl %[[x]]
47 ; X86: calll bar
48 ; X86: movl %[[fp]], %eax
49 ; X86: pushl %[[x]]
50 ; X86: calll __llvm_external_retpoline_eax
51 ; X86-NOT: # TAILCALL
52
53 ; X86FAST-LABEL: icall_reg:
54 ; X86FAST: calll bar
55 ; X86FAST: calll __llvm_external_retpoline_eax
56 ; X86FAST: calll bar
57 ; X86FAST: calll __llvm_external_retpoline_eax
58
59
60 @global_fp = external global void (i32)*
61
62 ; Test an indirect call through a global variable.
63 define void @icall_global_fp(i32 %x, void (i32)** %fpp) #0 {
64 %fp1 = load void (i32)*, void (i32)** @global_fp
65 call void %fp1(i32 %x)
66 %fp2 = load void (i32)*, void (i32)** @global_fp
67 tail call void %fp2(i32 %x)
68 ret void
69 }
70
71 ; X64-LABEL: icall_global_fp:
72 ; X64-DAG: movl %edi, %[[x:[^ ]*]]
73 ; X64-DAG: movq global_fp(%rip), %r11
74 ; X64: callq __llvm_external_retpoline_r11
75 ; X64-DAG: movl %[[x]], %edi
76 ; X64-DAG: movq global_fp(%rip), %r11
77 ; X64: jmp __llvm_external_retpoline_r11 # TAILCALL
78
79 ; X64FAST-LABEL: icall_global_fp:
80 ; X64FAST: movq global_fp(%rip), %r11
81 ; X64FAST: callq __llvm_external_retpoline_r11
82 ; X64FAST: movq global_fp(%rip), %r11
83 ; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL
84
85 ; X86-LABEL: icall_global_fp:
86 ; X86: movl global_fp, %eax
87 ; X86: pushl 4(%esp)
88 ; X86: calll __llvm_external_retpoline_eax
89 ; X86: addl $4, %esp
90 ; X86: movl global_fp, %eax
91 ; X86: jmp __llvm_external_retpoline_eax # TAILCALL
92
93 ; X86FAST-LABEL: icall_global_fp:
94 ; X86FAST: calll __llvm_external_retpoline_eax
95 ; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL
96
97
98 %struct.Foo = type { void (%struct.Foo*)** }
99
100 ; Test an indirect call through a vtable.
101 define void @vcall(%struct.Foo* %obj) #0 {
102 %vptr_field = getelementptr %struct.Foo, %struct.Foo* %obj, i32 0, i32 0
103 %vptr = load void (%struct.Foo*)**, void (%struct.Foo*)*** %vptr_field
104 %vslot = getelementptr void(%struct.Foo*)*, void(%struct.Foo*)** %vptr, i32 1
105 %fp = load void(%struct.Foo*)*, void(%struct.Foo*)** %vslot
106 tail call void %fp(%struct.Foo* %obj)
107 tail call void %fp(%struct.Foo* %obj)
108 ret void
109 }
110
111 ; X64-LABEL: vcall:
112 ; X64: movq %rdi, %[[obj:[^ ]*]]
113 ; X64: movq (%[[obj]]), %[[vptr:[^ ]*]]
114 ; X64: movq 8(%[[vptr]]), %[[fp:[^ ]*]]
115 ; X64: movq %[[fp]], %r11
116 ; X64: callq __llvm_external_retpoline_r11
117 ; X64-DAG: movq %[[obj]], %rdi
118 ; X64-DAG: movq %[[fp]], %r11
119 ; X64: jmp __llvm_external_retpoline_r11 # TAILCALL
120
121 ; X64FAST-LABEL: vcall:
122 ; X64FAST: callq __llvm_external_retpoline_r11
123 ; X64FAST: jmp __llvm_external_retpoline_r11 # TAILCALL
124
125 ; X86-LABEL: vcall:
126 ; X86: movl 8(%esp), %[[obj:[^ ]*]]
127 ; X86: movl (%[[obj]]), %[[vptr:[^ ]*]]
128 ; X86: movl 4(%[[vptr]]), %[[fp:[^ ]*]]
129 ; X86: movl %[[fp]], %eax
130 ; X86: pushl %[[obj]]
131 ; X86: calll __llvm_external_retpoline_eax
132 ; X86: addl $4, %esp
133 ; X86: movl %[[fp]], %eax
134 ; X86: jmp __llvm_external_retpoline_eax # TAILCALL
135
136 ; X86FAST-LABEL: vcall:
137 ; X86FAST: calll __llvm_external_retpoline_eax
138 ; X86FAST: jmp __llvm_external_retpoline_eax # TAILCALL
139
140
141 declare void @direct_callee()
142
143 define void @direct_tail() #0 {
144 tail call void @direct_callee()
145 ret void
146 }
147
148 ; X64-LABEL: direct_tail:
149 ; X64: jmp direct_callee # TAILCALL
150 ; X64FAST-LABEL: direct_tail:
151 ; X64FAST: jmp direct_callee # TAILCALL
152 ; X86-LABEL: direct_tail:
153 ; X86: jmp direct_callee # TAILCALL
154 ; X86FAST-LABEL: direct_tail:
155 ; X86FAST: jmp direct_callee # TAILCALL
156
157
158 ; Lastly check that no thunks were emitted.
159 ; X64-NOT: __{{.*}}_retpoline_{{.*}}:
160 ; X64FAST-NOT: __{{.*}}_retpoline_{{.*}}:
161 ; X86-NOT: __{{.*}}_retpoline_{{.*}}:
162 ; X86FAST-NOT: __{{.*}}_retpoline_{{.*}}:
163
164
165 attributes #0 = { "target-features"="+retpoline-external-thunk" }
0 ; RUN: llc -mtriple=x86_64-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64
1 ; RUN: llc -mtriple=x86_64-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X64FAST
2
3 ; RUN: llc -mtriple=i686-unknown < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86
4 ; RUN: llc -mtriple=i686-unknown -O0 < %s | FileCheck %s --implicit-check-not="jmp.*\*" --implicit-check-not="call.*\*" --check-prefix=X86FAST
5
6 declare void @bar(i32)
7
8 ; Test a simple indirect call and tail call.
9 define void @icall_reg(void (i32)* %fp, i32 %x) #0 {
10 entry:
11 tail call void @bar(i32 %x)
12 tail call void %fp(i32 %x)
13 tail call void @bar(i32 %x)
14 tail call void %fp(i32 %x)
15 ret void
16 }

; X64-LABEL: icall_reg:
; X64-DAG: movq %rdi, %[[fp:[^ ]*]]
; X64-DAG: movl %esi, %[[x:[^ ]*]]
; X64: movl %[[x]], %edi
; X64: callq bar
; X64-DAG: movl %[[x]], %edi
; X64-DAG: movq %[[fp]], %r11
; X64: callq __llvm_retpoline_r11
; X64: movl %[[x]], %edi
; X64: callq bar
; X64-DAG: movl %[[x]], %edi
; X64-DAG: movq %[[fp]], %r11
; X64: jmp __llvm_retpoline_r11 # TAILCALL

; X64FAST-LABEL: icall_reg:
; X64FAST: callq bar
; X64FAST: callq __llvm_retpoline_r11
; X64FAST: callq bar
; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL

; X86-LABEL: icall_reg:
; X86-DAG: movl 12(%esp), %[[fp:[^ ]*]]
; X86-DAG: movl 16(%esp), %[[x:[^ ]*]]
; X86: pushl %[[x]]
; X86: calll bar
; X86: movl %[[fp]], %eax
; X86: pushl %[[x]]
; X86: calll __llvm_retpoline_eax
; X86: pushl %[[x]]
; X86: calll bar
; X86: movl %[[fp]], %eax
; X86: pushl %[[x]]
; X86: calll __llvm_retpoline_eax
; X86-NOT: # TAILCALL

; X86FAST-LABEL: icall_reg:
; X86FAST: calll bar
; X86FAST: calll __llvm_retpoline_eax
; X86FAST: calll bar
; X86FAST: calll __llvm_retpoline_eax


@global_fp = external global void (i32)*

; Test an indirect call through a global variable.
define void @icall_global_fp(i32 %x, void (i32)** %fpp) #0 {
  %fp1 = load void (i32)*, void (i32)** @global_fp
  call void %fp1(i32 %x)
  %fp2 = load void (i32)*, void (i32)** @global_fp
  tail call void %fp2(i32 %x)
  ret void
}

; X64-LABEL: icall_global_fp:
; X64-DAG: movl %edi, %[[x:[^ ]*]]
; X64-DAG: movq global_fp(%rip), %r11
; X64: callq __llvm_retpoline_r11
; X64-DAG: movl %[[x]], %edi
; X64-DAG: movq global_fp(%rip), %r11
; X64: jmp __llvm_retpoline_r11 # TAILCALL

; X64FAST-LABEL: icall_global_fp:
; X64FAST: movq global_fp(%rip), %r11
; X64FAST: callq __llvm_retpoline_r11
; X64FAST: movq global_fp(%rip), %r11
; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL

; X86-LABEL: icall_global_fp:
; X86: movl global_fp, %eax
; X86: pushl 4(%esp)
; X86: calll __llvm_retpoline_eax
; X86: addl $4, %esp
; X86: movl global_fp, %eax
; X86: jmp __llvm_retpoline_eax # TAILCALL

; X86FAST-LABEL: icall_global_fp:
; X86FAST: calll __llvm_retpoline_eax
; X86FAST: jmp __llvm_retpoline_eax # TAILCALL


%struct.Foo = type { void (%struct.Foo*)** }

; Test an indirect call through a vtable.
define void @vcall(%struct.Foo* %obj) #0 {
  %vptr_field = getelementptr %struct.Foo, %struct.Foo* %obj, i32 0, i32 0
  %vptr = load void (%struct.Foo*)**, void (%struct.Foo*)*** %vptr_field
  %vslot = getelementptr void(%struct.Foo*)*, void(%struct.Foo*)** %vptr, i32 1
  %fp = load void(%struct.Foo*)*, void(%struct.Foo*)** %vslot
  tail call void %fp(%struct.Foo* %obj)
  tail call void %fp(%struct.Foo* %obj)
  ret void
}
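
; A rough C-level analogue of the function above (again a hypothetical
; illustration only): an object carrying a table of function pointers, with
; slot 1 loaded once and called twice.
;
;   struct Foo { void (**vtable)(struct Foo *); };
;   void vcall(struct Foo *obj) {
;     void (*fp)(struct Foo *) = obj->vtable[1];
;     fp(obj);
;     fp(obj);
;   }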

; X64-LABEL: vcall:
; X64: movq %rdi, %[[obj:[^ ]*]]
; X64: movq (%[[obj]]), %[[vptr:[^ ]*]]
; X64: movq 8(%[[vptr]]), %[[fp:[^ ]*]]
; X64: movq %[[fp]], %r11
; X64: callq __llvm_retpoline_r11
; X64-DAG: movq %[[obj]], %rdi
; X64-DAG: movq %[[fp]], %r11
; X64: jmp __llvm_retpoline_r11 # TAILCALL

; X64FAST-LABEL: vcall:
; X64FAST: callq __llvm_retpoline_r11
; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL

; X86-LABEL: vcall:
; X86: movl 8(%esp), %[[obj:[^ ]*]]
; X86: movl (%[[obj]]), %[[vptr:[^ ]*]]
; X86: movl 4(%[[vptr]]), %[[fp:[^ ]*]]
; X86: movl %[[fp]], %eax
; X86: pushl %[[obj]]
; X86: calll __llvm_retpoline_eax
; X86: addl $4, %esp
; X86: movl %[[fp]], %eax
; X86: jmp __llvm_retpoline_eax # TAILCALL

; X86FAST-LABEL: vcall:
; X86FAST: calll __llvm_retpoline_eax
; X86FAST: jmp __llvm_retpoline_eax # TAILCALL


declare void @direct_callee()

define void @direct_tail() #0 {
  tail call void @direct_callee()
  ret void
}

; X64-LABEL: direct_tail:
; X64: jmp direct_callee # TAILCALL
; X64FAST-LABEL: direct_tail:
; X64FAST: jmp direct_callee # TAILCALL
; X86-LABEL: direct_tail:
; X86: jmp direct_callee # TAILCALL
; X86FAST-LABEL: direct_tail:
; X86FAST: jmp direct_callee # TAILCALL


declare void @nonlazybind_callee() #1

define void @nonlazybind_caller() #0 {
  call void @nonlazybind_callee()
  tail call void @nonlazybind_callee()
  ret void
}

; X64-LABEL: nonlazybind_caller:
; X64: movq nonlazybind_callee@GOTPCREL(%rip), %[[REG:.*]]
; X64: movq %[[REG]], %r11
; X64: callq __llvm_retpoline_r11
; X64: movq %[[REG]], %r11
; X64: jmp __llvm_retpoline_r11 # TAILCALL
; X64FAST-LABEL: nonlazybind_caller:
; X64FAST: movq nonlazybind_callee@GOTPCREL(%rip), %r11
; X64FAST: callq __llvm_retpoline_r11
; X64FAST: movq nonlazybind_callee@GOTPCREL(%rip), %r11
; X64FAST: jmp __llvm_retpoline_r11 # TAILCALL
; X86-LABEL: nonlazybind_caller:
; X86: calll nonlazybind_callee@PLT
; X86: jmp nonlazybind_callee@PLT # TAILCALL
; X86FAST-LABEL: nonlazybind_caller:
; X86FAST: calll nonlazybind_callee@PLT
; X86FAST: jmp nonlazybind_callee@PLT # TAILCALL
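;
; (Note the contrast visible above: for a nonlazybind callee, x86-64 loads the
; target from the GOT and routes both the call and the tail call through the
; retpoline thunk, while the 32-bit code still calls and jumps directly
; through the PLT.)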


@indirectbr_rewrite.targets = constant [10 x i8*] [i8* blockaddress(@indirectbr_rewrite, %bb0),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb1),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb2),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb3),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb4),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb5),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb6),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb7),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb8),
                                                   i8* blockaddress(@indirectbr_rewrite, %bb9)]

; Check that when retpolines are enabled, a function containing an indirectbr
; is rewritten to use a switch, and that the switch in turn is not lowered
; with a jump table.
define void @indirectbr_rewrite(i64* readonly %p, i64* %sink) #0 {
; X64-LABEL: indirectbr_rewrite:
; X64-NOT: jmpq
; X86-LABEL: indirectbr_rewrite:
; X86-NOT: jmpl
entry:
  %i0 = load i64, i64* %p
  %target.i0 = getelementptr [10 x i8*], [10 x i8*]* @indirectbr_rewrite.targets, i64 0, i64 %i0
  %target0 = load i8*, i8** %target.i0
  indirectbr i8* %target0, [label %bb1, label %bb3]

bb0:
  store volatile i64 0, i64* %sink
  br label %latch

bb1:
  store volatile i64 1, i64* %sink
  br label %latch

bb2:
  store volatile i64 2, i64* %sink
  br label %latch

bb3:
  store volatile i64 3, i64* %sink
  br label %latch

bb4:
  store volatile i64 4, i64* %sink
  br label %latch

bb5:
  store volatile i64 5, i64* %sink
  br label %latch

bb6:
  store volatile i64 6, i64* %sink
  br label %latch

bb7:
  store volatile i64 7, i64* %sink
  br label %latch

bb8:
  store volatile i64 8, i64* %sink
  br label %latch

bb9:
  store volatile i64 9, i64* %sink
  br label %latch

latch:
  %i.next = load i64, i64* %p
  %target.i.next = getelementptr [10 x i8*], [10 x i8*]* @indirectbr_rewrite.targets, i64 0, i64 %i.next
  %target.next = load i8*, i8** %target.i.next
  ; Deliberately list all 10 successors here so that, even after this is
  ; rewritten into a switch, lowering would normally want to use a jump table.
  indirectbr i8* %target.next, [label %bb0,
                                label %bb1,
                                label %bb2,
                                label %bb3,
                                label %bb4,
                                label %bb5,
                                label %bb6,
                                label %bb7,
                                label %bb8,
                                label %bb9]
}
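
; A rough GNU C analogue of the function above, written with computed goto
; (the usual source of indirectbr); a hypothetical illustration only, not
; part of the test:
;
;   void indirectbr_rewrite(const long *p, volatile long *sink) {
;     static void *targets[10] = { &&bb0, &&bb1, &&bb2, &&bb3, &&bb4,
;                                  &&bb5, &&bb6, &&bb7, &&bb8, &&bb9 };
;     goto *targets[*p];
;   bb0: *sink = 0; goto latch;
;   bb1: *sink = 1; goto latch;
;   bb2: *sink = 2; goto latch;
;   bb3: *sink = 3; goto latch;
;   bb4: *sink = 4; goto latch;
;   bb5: *sink = 5; goto latch;
;   bb6: *sink = 6; goto latch;
;   bb7: *sink = 7; goto latch;
;   bb8: *sink = 8; goto latch;
;   bb9: *sink = 9; goto latch;
;   latch: goto *targets[*p];
;   }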

; Lastly check that the necessary thunks were emitted.
;
; X64-LABEL: .section .text.__llvm_retpoline_r11,{{.*}},__llvm_retpoline_r11,comdat
; X64-NEXT: .hidden __llvm_retpoline_r11
; X64-NEXT: .weak __llvm_retpoline_r11
; X64: __llvm_retpoline_r11:
; X64-NEXT: # {{.*}} # %entry
; X64-NEXT: callq [[CALL_TARGET:.*]]
; X64-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X64-NEXT: # %entry
; X64-NEXT: # =>This Inner Loop Header: Depth=1
; X64-NEXT: pause
; X64-NEXT: lfence
; X64-NEXT: jmp [[CAPTURE_SPEC]]
; X64-NEXT: .p2align 4, 0x90
; X64-NEXT: [[CALL_TARGET]]: # Block address taken
; X64-NEXT: # %entry
; X64-NEXT: movq %r11, (%rsp)
; X64-NEXT: retq
;
; X86-LABEL: .section .text.__llvm_retpoline_eax,{{.*}},__llvm_retpoline_eax,comdat
; X86-NEXT: .hidden __llvm_retpoline_eax
; X86-NEXT: .weak __llvm_retpoline_eax
; X86: __llvm_retpoline_eax:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: movl %eax, (%esp)
; X86-NEXT: retl
;
; X86-LABEL: .section .text.__llvm_retpoline_ecx,{{.*}},__llvm_retpoline_ecx,comdat
; X86-NEXT: .hidden __llvm_retpoline_ecx
; X86-NEXT: .weak __llvm_retpoline_ecx
; X86: __llvm_retpoline_ecx:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: movl %ecx, (%esp)
; X86-NEXT: retl
;
; X86-LABEL: .section .text.__llvm_retpoline_edx,{{.*}},__llvm_retpoline_edx,comdat
; X86-NEXT: .hidden __llvm_retpoline_edx
; X86-NEXT: .weak __llvm_retpoline_edx
; X86: __llvm_retpoline_edx:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: movl %edx, (%esp)
; X86-NEXT: retl
;
; X86-LABEL: .section .text.__llvm_retpoline_push,{{.*}},__llvm_retpoline_push,comdat
; X86-NEXT: .hidden __llvm_retpoline_push
; X86-NEXT: .weak __llvm_retpoline_push
; X86: __llvm_retpoline_push:
; X86-NEXT: # {{.*}} # %entry
; X86-NEXT: calll [[CALL_TARGET:.*]]
; X86-NEXT: [[CAPTURE_SPEC:.*]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: # =>This Inner Loop Header: Depth=1
; X86-NEXT: pause
; X86-NEXT: lfence
; X86-NEXT: jmp [[CAPTURE_SPEC]]
; X86-NEXT: .p2align 4, 0x90
; X86-NEXT: [[CALL_TARGET]]: # Block address taken
; X86-NEXT: # %entry
; X86-NEXT: addl $4, %esp
; X86-NEXT: pushl 4(%esp)
; X86-NEXT: pushl 4(%esp)
; X86-NEXT: popl 8(%esp)
; X86-NEXT: popl (%esp)
; X86-NEXT: retl
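;
; A note on the 'push' thunk above: at [[CALL_TARGET]] the stack holds, from
; the top, the address of the capture loop (pushed by the thunk's own calll),
; the return address of the thunk's caller, and the actual call target
; (pushed by the caller immediately before calling the thunk). The addl
; discards the capture-loop address, and the pushl/popl sequence then swaps
; the remaining two slots without clobbering any register, so the final retl
; branches to the real target and leaves the caller's return address on top
; of the stack for the callee.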


attributes #0 = { "target-features"="+retpoline" }
attributes #1 = { nonlazybind }
; RUN: opt < %s -indirectbr-expand -S | FileCheck %s
;
; REQUIRES: x86-registered-target

target triple = "x86_64-unknown-linux-gnu"

@test1.targets = constant [4 x i8*] [i8* blockaddress(@test1, %bb0),
                                     i8* blockaddress(@test1, %bb1),
                                     i8* blockaddress(@test1, %bb2),
                                     i8* blockaddress(@test1, %bb3)]
; CHECK-LABEL: @test1.targets = constant [4 x i8*]
; CHECK: [i8* inttoptr (i64 1 to i8*),
; CHECK: i8* inttoptr (i64 2 to i8*),
; CHECK: i8* inttoptr (i64 3 to i8*),
; CHECK: i8* blockaddress(@test1, %bb3)]
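;
; As the checks above show, the blocks that can actually be reached via an
; indirectbr (bb0, bb1, and bb2) have their blockaddress constants rewritten
; to small integer indices, while bb3 is never an indirectbr successor and so
; keeps a real blockaddress.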

define void @test1(i64* readonly %p, i64* %sink) #0 {
; CHECK-LABEL: define void @test1(
entry:
  %i0 = load i64, i64* %p
  %target.i0 = getelementptr [4 x i8*], [4 x i8*]* @test1.targets, i64 0, i64 %i0
  %target0 = load i8*, i8** %target.i0
  ; Only a subset of blocks are viable successors here.
  indirectbr i8* %target0, [label %bb0, label %bb1]
; CHECK-NOT: indirectbr
; CHECK: %[[ENTRY_V:.*]] = ptrtoint i8* %{{.*}} to i64
; CHECK-NEXT: br label %[[SWITCH_BB:.*]]

bb0:
  store volatile i64 0, i64* %sink
  br label %latch

bb1:
  store volatile i64 1, i64* %sink
  br label %latch

bb2:
  store volatile i64 2, i64* %sink
  br label %latch

bb3:
  store volatile i64 3, i64* %sink
  br label %latch

latch:
  %i.next = load i64, i64* %p
  %target.i.next = getelementptr [4 x i8*], [4 x i8*]* @test1.targets, i64 0, i64 %i.next
  %target.next = load i8*, i8** %target.i.next
  ; A different subset of blocks are viable successors here.
  indirectbr i8* %target.next, [label %bb1, label %bb2]
; CHECK-NOT: indirectbr
; CHECK: %[[LATCH_V:.*]] = ptrtoint i8* %{{.*}} to i64
; CHECK-NEXT: br label %[[SWITCH_BB]]
;
; CHECK: [[SWITCH_BB]]:
; CHECK-NEXT: %[[V:.*]] = phi i64 [ %[[ENTRY_V]], %entry ], [ %[[LATCH_V]], %latch ]
; CHECK-NEXT: switch i64 %[[V]], label %bb0 [
; CHECK-NEXT: i64 2, label %bb1
; CHECK-NEXT: i64 3, label %bb2
; CHECK-NEXT: ]
}

attributes #0 = { "target-features"="+retpoline" }
  initializeSjLjEHPreparePass(Registry);
  initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
  initializeGlobalMergePass(Registry);
  initializeIndirectBrExpandPassPass(Registry);
  initializeInterleavedAccessPass(Registry);
  initializeEntryExitInstrumenterPass(Registry);
  initializePostInlineEntryExitInstrumenterPass(Registry);