llvm.org GIT mirror llvm / f2286b0
Initial commit for the rewrite of the inline cost analysis to operate on a per-callsite walk of the called function's instructions, in breadth-first order over the potentially reachable set of basic blocks. This is a major shift in how inline cost analysis works, intended to improve the accuracy and rationality of inlining decisions.

A brief outline of the algorithm this moves to (a simplified sketch of the walk follows the change summary below):

- Build a simplification mapping based on the callsite arguments to the function arguments.
- Push the entry block onto a worklist of potentially-live basic blocks.
- Pop the first block off of the *front* of the worklist (for breadth-first ordering) and walk its instructions using a custom InstVisitor.
- For each instruction's operands, re-map them based on the simplification mappings available for the given callsite.
- Compute any simplification of the instruction that is possible after re-mapping, and store that back into the simplification mapping.
- Compute any bonuses, costs, or other impacts of the instruction on the cost metric.
- When the terminator is reached, replace any conditional value in the terminator with any simplifications from the mapping we have, and add any successors which are not proven to be dead from these simplifications to the worklist.
- Pop the next block off of the front of the worklist, and repeat.
- As soon as the cost of inlining exceeds the threshold for the callsite, stop analyzing the function in order to bound cost.

The primary goal of this algorithm is to handle dead code paths perfectly. We do not want any code in trivially dead code paths to impact inlining decisions. The previous metric was *extremely* flawed here: it would always subtract the average cost of the two successors of a conditional branch when that branch was proven to become an unconditional branch at the callsite. There was no handling of wildly different costs between the two successors, which would cause inlining when the path actually taken was too large, and no inlining when the path actually taken was trivially simple. There was also no handling of the code *path*, only the immediate successors. These problems vanish completely now. See the added regression tests for the shiny new features -- we skip recursive function calls, SROA-killing instructions, and high-cost complex CFG structures when they are dead at the callsite being analyzed.

Switching to this algorithm required refactoring the inline cost interface to accept the actual threshold rather than simply returning a single cost. The resulting interface is pretty bad, and I'm planning to do lots of interface cleanup after this patch. Several other refactorings fell out of this, but I've tried to minimize them for this patch. =/ There is still more cleanup that can be done here. Please point out anything that you see in review.

I've worked really hard to mirror at least the spirit of all of the previous heuristics in the new model. It's not clear that they are all correct any more, but I wanted to minimize the change in this single patch; it's already a bit ridiculous. One heuristic that is *not* yet mirrored is to allow inlining of functions with a dynamic alloca *if* the caller has a dynamic alloca. I will add this back, but I think the most reasonable way requires changes to the inliner itself rather than just the cost metric, so I've deferred it to a subsequent patch. The test case is XFAIL-ed until then.

As mentioned in the review mail, this seems to make Clang run about 1% to 2% faster in -O0, but makes its binary size grow by just under 4%.
I've looked into the 4% growth, and it can be fixed, but doing so requires changes to other parts of the inliner.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153812 91177308-0d34-0410-b5e6-96231b3b80d8
Author: Chandler Carruth
13 changed files with 1252 additions and 861 deletions.
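To make the breadth-first, threshold-bounded walk described in the commit message concrete, here is a minimal, self-contained C++ sketch. It uses hypothetical stand-in types (Block, Inst) and a simple "folds away" flag in place of the real SimplifiedValues machinery, so it illustrates the shape of the algorithm rather than the actual CallAnalyzer code in the diff below.

```cpp
#include <deque>
#include <set>
#include <vector>

// Hypothetical stand-ins for LLVM IR; the real analysis walks llvm::BasicBlock
// and llvm::Instruction via an InstVisitor and a SimplifiedValues map.
struct Inst { bool FoldsAwayAtThisCallSite; };
struct Block { std::vector<Inst> Insts; std::vector<Block *> Succs; };

// Returns true if the simulated inline cost stays within Threshold.
bool analyzeCall(Block *Entry, int Threshold) {
  const int InstrCost = 5;            // mirrors InlineConstants::InstrCost
  int Cost = 0;

  std::deque<Block *> Worklist;       // FIFO => breadth-first block order
  std::set<Block *> Visited;
  Worklist.push_back(Entry);
  Visited.insert(Entry);

  while (!Worklist.empty()) {
    Block *BB = Worklist.front();     // pop from the *front* of the worklist
    Worklist.pop_front();

    for (const Inst &I : BB->Insts) {
      // Instructions that simplify away for this callsite are free; everything
      // else is charged, and we bail the moment the threshold is crossed.
      if (I.FoldsAwayAtThisCallSite)
        continue;
      Cost += InstrCost;
      if (Cost > Threshold)
        return false;
    }

    // Enqueue successors that were not proven dead. The real analysis checks
    // the terminator's condition against its simplification mapping here; this
    // sketch conservatively treats every successor as live.
    for (Block *Succ : BB->Succs)
      if (Visited.insert(Succ).second)
        Worklist.push_back(Succ);
  }
  return true;
}
```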
1919 namespace llvm {
2020 class BasicBlock;
2121 class Function;
22 class Instruction;
2223 class TargetData;
2324 class Value;
25
26 /// \brief Check whether an instruction is likely to be "free" when lowered.
27 bool isInstructionFree(const Instruction *I, const TargetData *TD = 0);
2428
2529 /// \brief Check whether a call will lower to something small.
2630 ///
1515
1616 #include "llvm/Function.h"
1717 #include "llvm/ADT/DenseMap.h"
18 #include "llvm/ADT/SmallPtrSet.h"
1819 #include "llvm/ADT/ValueMap.h"
1920 #include "llvm/Analysis/CodeMetrics.h"
2021 #include
2425 namespace llvm {
2526
2627 class CallSite;
27 template
28 class SmallPtrSet;
2928 class TargetData;
3029
3130 namespace InlineConstants {
3231 // Various magic constants used to adjust heuristics.
3332 const int InstrCost = 5;
34 const int IndirectCallBonus = -100;
33 const int IndirectCallThreshold = 100;
3534 const int CallPenalty = 25;
3635 const int LastCallToStaticBonus = -15000;
3736 const int ColdccPenalty = 2000;
3837 const int NoreturnPenalty = 10000;
3938 }
4039
41 /// InlineCost - Represent the cost of inlining a function. This
42 /// supports special values for functions which should "always" or
43 /// "never" be inlined. Otherwise, the cost represents a unitless
44 /// amount; smaller values increase the likelihood of the function
45 /// being inlined.
40 /// \brief Represents the cost of inlining a function.
41 ///
42 /// This supports special values for functions which should "always" or
43 /// "never" be inlined. Otherwise, the cost represents a unitless amount;
44 /// smaller values increase the likelihood of the function being inlined.
45 ///
46 /// Objects of this type also provide the adjusted threshold for inlining
47 /// based on the information available for a particular callsite. They can be
48 /// directly tested to determine if inlining should occur given the cost and
49 /// threshold for this cost metric.
4650 class InlineCost {
47 enum Kind {
48 Value,
49 Always,
50 Never
51 enum CostKind {
52 CK_Variable,
53 CK_Always,
54 CK_Never
5155 };
5256
53 // This is a do-it-yourself implementation of
54 // int Cost : 30;
55 // unsigned Type : 2;
56 // We used to use bitfields, but they were sometimes miscompiled (PR3822).
57 enum { TYPE_BITS = 2 };
58 enum { COST_BITS = unsigned(sizeof(unsigned)) * CHAR_BIT - TYPE_BITS };
59 unsigned TypedCost; // int Cost : COST_BITS; unsigned Type : TYPE_BITS;
57 const int Cost : 30; // The inlining cost if neither always nor never.
58 const unsigned Kind : 2; // The type of cost, one of CostKind above.
6059
61 Kind getType() const {
62 return Kind(TypedCost >> COST_BITS);
60 /// \brief The adjusted threshold against which this cost should be tested.
61 const int Threshold;
62
63 // Trivial constructor, interesting logic in the factory functions below.
64 InlineCost(int Cost, CostKind Kind, int Threshold)
65 : Cost(Cost), Kind(Kind), Threshold(Threshold) {}
66
67 public:
68 static InlineCost get(int Cost, int Threshold) {
69 InlineCost Result(Cost, CK_Variable, Threshold);
70 assert(Result.Cost == Cost && "Cost exceeds InlineCost precision");
71 return Result;
72 }
73 static InlineCost getAlways() {
74 return InlineCost(0, CK_Always, 0);
75 }
76 static InlineCost getNever() {
77 return InlineCost(0, CK_Never, 0);
6378 }
6479
65 int getCost() const {
66 // Sign-extend the bottom COST_BITS bits.
67 return (int(TypedCost << TYPE_BITS)) >> TYPE_BITS;
80 /// \brief Test whether the inline cost is low enough for inlining.
81 operator bool() const {
82 if (isAlways()) return true;
83 if (isNever()) return false;
84 return Cost < Threshold;
6885 }
6986
70 InlineCost(int C, int T) {
71 TypedCost = (unsigned(C << TYPE_BITS) >> TYPE_BITS) | (T << COST_BITS);
72 assert(getCost() == C && "Cost exceeds InlineCost precision");
87 bool isVariable() const { return Kind == CK_Variable; }
88 bool isAlways() const { return Kind == CK_Always; }
89 bool isNever() const { return Kind == CK_Never; }
90
91 /// getCost() - Return a "variable" inline cost's amount. It is
92 /// an error to call this on an "always" or "never" InlineCost.
93 int getCost() const {
94 assert(Kind == CK_Variable && "Invalid access of InlineCost");
95 return Cost;
7396 }
74 public:
75 static InlineCost get(int Cost) { return InlineCost(Cost, Value); }
76 static InlineCost getAlways() { return InlineCost(0, Always); }
77 static InlineCost getNever() { return InlineCost(0, Never); }
7897
79 bool isVariable() const { return getType() == Value; }
80 bool isAlways() const { return getType() == Always; }
81 bool isNever() const { return getType() == Never; }
82
83 /// getValue() - Return a "variable" inline cost's amount. It is
84 /// an error to call this on an "always" or "never" InlineCost.
85 int getValue() const {
86 assert(getType() == Value && "Invalid access of InlineCost");
87 return getCost();
98 /// \brief Get the cost delta from the threshold for inlining.
99 /// Only valid if the cost is of the variable kind. Returns a negative
100 /// value if the cost is too high to inline.
101 int getCostDelta() const {
102 return Threshold - getCost();
88103 }
89104 };
90105
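As an illustration of the threshold-carrying interface above, the consuming side might look roughly like the fragment below. It is a sketch only: InlineCost and its members come from this header, while CA, CS, and DefaultThreshold are assumed to exist in the caller's context.

```cpp
// Hypothetical caller-side use of the InlineCost interface declared above.
InlineCost IC = CA.getInlineCost(CS, DefaultThreshold);
if (IC.isAlways()) {
  // Inline regardless of cost (e.g. always-inline functions).
} else if (IC.isNever()) {
  // Never inline, regardless of threshold.
} else if (IC) {
  // operator bool(): the variable cost came in under the callsite threshold.
  // getCostDelta() reports the remaining headroom (negative when too costly).
  int Headroom = IC.getCostDelta();
  (void)Headroom;
}
```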
91106 /// InlineCostAnalyzer - Cost analyzer used by inliner.
92107 class InlineCostAnalyzer {
93 struct ArgInfo {
94 public:
95 unsigned ConstantWeight;
96 unsigned AllocaWeight;
97
98 ArgInfo(unsigned CWeight, unsigned AWeight)
99 : ConstantWeight(CWeight), AllocaWeight(AWeight)
100 {}
101 };
102
103 struct FunctionInfo {
104 CodeMetrics Metrics;
105
106 /// ArgumentWeights - Each formal argument of the function is inspected to
107 /// see if it is used in any contexts where making it a constant or alloca
108 /// would reduce the code size. If so, we add some value to the argument
109 /// entry here.
110 std::vector<ArgInfo> ArgumentWeights;
111
112 /// PointerArgPairWeights - Weights to use when giving an inline bonus to
113 /// a call site due to correlated pairs of pointers.
114 DenseMap<std::pair<unsigned, unsigned>, unsigned> PointerArgPairWeights;
115
116 /// countCodeReductionForConstant - Figure out an approximation for how
117 /// many instructions will be constant folded if the specified value is
118 /// constant.
119 unsigned countCodeReductionForConstant(const CodeMetrics &Metrics,
120 Value *V);
121
122 /// countCodeReductionForAlloca - Figure out an approximation of how much
123 /// smaller the function will be if it is inlined into a context where an
124 /// argument becomes an alloca.
125 unsigned countCodeReductionForAlloca(const CodeMetrics &Metrics,
126 Value *V);
127
128 /// countCodeReductionForPointerPair - Count the bonus to apply to an
129 /// inline call site where a pair of arguments are pointers and one
130 /// argument is a constant offset from the other. The idea is to
131 /// recognize a common C++ idiom where a begin and end iterator are
132 /// actually pointers, and many operations on the pair of them will be
133 /// constants if the function is called with arguments that have
134 /// a constant offset.
135 void countCodeReductionForPointerPair(
136 const CodeMetrics &Metrics,
137 DenseMap<Value *, unsigned> &PointerArgs,
138 Value *V, unsigned ArgIdx);
139
140 /// analyzeFunction - Add information about the specified function
141 /// to the current structure.
142 void analyzeFunction(Function *F, const TargetData *TD);
143
144 /// NeverInline - Returns true if the function should never be
145 /// inlined into any caller.
146 bool NeverInline();
147 };
148
149 // The Function* for a function can be changed (by ArgumentPromotion);
150 // the ValueMap will update itself when this happens.
151 ValueMap<const Function *, FunctionInfo> CachedFunctionInfo;
152
153108 // TargetData if available, or null.
154109 const TargetData *TD;
155110
156 int CountBonusForConstant(Value *V, Constant *C = NULL);
157 int ConstantFunctionBonus(CallSite CS, Constant *C);
158 int getInlineSize(CallSite CS, Function *Callee);
159 int getInlineBonuses(CallSite CS, Function *Callee);
160111 public:
161112 InlineCostAnalyzer(): TD(0) {}
162113
163114 void setTargetData(const TargetData *TData) { TD = TData; }
164115
165 /// getInlineCost - The heuristic used to determine if we should inline the
166 /// function call or not.
116 /// \brief Get an InlineCost object representing the cost of inlining this
117 /// callsite.
167118 ///
168 InlineCost getInlineCost(CallSite CS);
169 /// getCalledFunction - The heuristic used to determine if we should inline
170 /// the function call or not. The callee is explicitly specified, to allow
171 /// you to calculate the cost of inlining a function via a pointer. The
172 /// result assumes that the inlined version will always be used. You should
173 /// weight it yourself in cases where this callee will not always be called.
174 InlineCost getInlineCost(CallSite CS, Function *Callee);
175
176 /// getInlineFudgeFactor - Return a > 1.0 factor if the inliner should use a
177 /// higher threshold to determine if the function call should be inlined.
178 float getInlineFudgeFactor(CallSite CS);
119 /// Note that threshold is passed into this function. Only costs below the
120 /// threshold are computed with any accuracy. The threshold can be used to
121 /// bound the computation necessary to determine whether the cost is
122 /// sufficiently low to warrant inlining.
123 InlineCost getInlineCost(CallSite CS, int Threshold);
179124
180125 /// resetCachedFunctionInfo - erase any cached cost info for this function.
181126 void resetCachedCostInfo(Function* Caller) {
182 CachedFunctionInfo[Caller] = FunctionInfo();
183127 }
184128
185129 /// growCachedCostInfo - update the cached cost info for Caller after Callee
6464 ///
6565 virtual InlineCost getInlineCost(CallSite CS) = 0;
6666
67 // getInlineFudgeFactor - Return a > 1.0 factor if the inliner should use a
68 // higher threshold to determine if the function call should be inlined.
69 ///
70 virtual float getInlineFudgeFactor(CallSite CS) = 0;
71
7267 /// resetCachedCostInfo - erase any cached cost data from the derived class.
7368 /// If the derived class has no such data this can be empty.
7469 ///
4949 return false;
5050 }
5151
52 bool llvm::isInstructionFree(const Instruction *I, const TargetData *TD) {
53 if (isa<PHINode>(I))
54 return true;
55
56 // If a GEP has all constant indices, it will probably be folded with
57 // a load/store.
58 if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I))
59 return GEP->hasAllConstantIndices();
60
61 if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
62 switch (II->getIntrinsicID()) {
63 default:
64 return false;
65 case Intrinsic::dbg_declare:
66 case Intrinsic::dbg_value:
67 case Intrinsic::invariant_start:
68 case Intrinsic::invariant_end:
69 case Intrinsic::lifetime_start:
70 case Intrinsic::lifetime_end:
71 case Intrinsic::objectsize:
72 case Intrinsic::ptr_annotation:
73 case Intrinsic::var_annotation:
74 // These intrinsics don't count as size.
75 return true;
76 }
77 }
78
79 if (const CastInst *CI = dyn_cast<CastInst>(I)) {
80 // Noop casts, including ptr <-> int, don't count.
81 if (CI->isLosslessCast() || isa<IntToPtrInst>(CI) || isa<PtrToIntInst>(CI))
82 return true;
83 // trunc to a native type is free (assuming the target has compare and
84 // shift-right of the same width).
85 if (TD && isa<TruncInst>(CI) &&
86 TD->isLegalInteger(TD->getTypeSizeInBits(CI->getType())))
87 return true;
88 // Result of a cmp instruction is often extended (to be used by other
89 // cmp instructions, logical or return instructions). These are usually
90 // nop on most sane targets.
91 if (isa<CmpInst>(CI->getOperand(0)))
92 return true;
93 }
94
95 return false;
96 }
97
5298 /// analyzeBasicBlock - Fill in the current structure with information gleaned
5399 /// from the specified block.
54100 void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,
57103 unsigned NumInstsBeforeThisBB = NumInsts;
58104 for (BasicBlock::const_iterator II = BB->begin(), E = BB->end();
59105 II != E; ++II) {
60 if (isa<PHINode>(II)) continue; // PHI nodes don't count.
106 if (isInstructionFree(II, TD))
107 continue;
61108
62109 // Special handling for calls.
63 if (isa<CallInst>(II) || isa<InvokeInst>(II)) {
64 if (const IntrinsicInst *IntrinsicI = dyn_cast<IntrinsicInst>(II)) {
65 switch (IntrinsicI->getIntrinsicID()) {
66 default: break;
67 case Intrinsic::dbg_declare:
68 case Intrinsic::dbg_value:
69 case Intrinsic::invariant_start:
70 case Intrinsic::invariant_end:
71 case Intrinsic::lifetime_start:
72 case Intrinsic::lifetime_end:
73 case Intrinsic::objectsize:
74 case Intrinsic::ptr_annotation:
75 case Intrinsic::var_annotation:
76 // These intrinsics don't count as size.
77 continue;
78 }
79 }
80
81111 ImmutableCallSite CS(cast<Instruction>(II));
82112
83113 if (const Function *F = CS.getCalledFunction()) {
113143
114144 if (isa<ExtractElementInst>(II) || II->getType()->isVectorTy())
115145 ++NumVectorInsts;
116
117 if (const CastInst *CI = dyn_cast<CastInst>(II)) {
118 // Noop casts, including ptr <-> int, don't count.
119 if (CI->isLosslessCast() || isa<IntToPtrInst>(CI) ||
120 isa<PtrToIntInst>(CI))
121 continue;
122 // trunc to a native type is free (assuming the target has compare and
123 // shift-right of the same width).
124 if (isa<TruncInst>(CI) && TD &&
125 TD->isLegalInteger(TD->getTypeSizeInBits(CI->getType())))
126 continue;
127 // Result of a cmp instruction is often extended (to be used by other
128 // cmp instructions, logical or return instructions). These are usually
129 // nop on most sane targets.
130 if (isa<CmpInst>(CI->getOperand(0)))
131 continue;
132 } else if (const GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(II)){
133 // If a GEP has all constant indices, it will probably be folded with
134 // a load/store.
135 if (GEPI->hasAllConstantIndices())
136 continue;
137 }
138146
139147 ++NumInsts;
140148 }
1010 //
1111 //===----------------------------------------------------------------------===//
1212
13 #define DEBUG_TYPE "inline-cost"
1314 #include "llvm/Analysis/InlineCost.h"
15 #include "llvm/Analysis/ConstantFolding.h"
16 #include "llvm/Analysis/InstructionSimplify.h"
1417 #include "llvm/Support/CallSite.h"
18 #include "llvm/Support/Debug.h"
19 #include "llvm/Support/InstVisitor.h"
20 #include "llvm/Support/GetElementPtrTypeIterator.h"
21 #include "llvm/Support/raw_ostream.h"
1522 #include "llvm/CallingConv.h"
1623 #include "llvm/IntrinsicInst.h"
24 #include "llvm/Operator.h"
25 #include "llvm/GlobalAlias.h"
1726 #include "llvm/Target/TargetData.h"
27 #include "llvm/ADT/STLExtras.h"
28 #include "llvm/ADT/SetVector.h"
29 #include "llvm/ADT/SmallVector.h"
1830 #include "llvm/ADT/SmallPtrSet.h"
1931
2032 using namespace llvm;
2133
22 unsigned InlineCostAnalyzer::FunctionInfo::countCodeReductionForConstant(
23 const CodeMetrics &Metrics, Value *V) {
24 unsigned Reduction = 0;
25 SmallVector Worklist;
26 Worklist.push_back(V);
27 do {
28 Value *V = Worklist.pop_back_val();
29 for (Value::use_iterator UI = V->use_begin(), E = V->use_end(); UI != E;++UI){
30 User *U = *UI;
31 if (isa<BranchInst>(U) || isa<SwitchInst>(U)) {
32 // We will be able to eliminate all but one of the successors.
33 const TerminatorInst &TI = cast<TerminatorInst>(*U);
34 const unsigned NumSucc = TI.getNumSuccessors();
35 unsigned Instrs = 0;
36 for (unsigned I = 0; I != NumSucc; ++I)
37 Instrs += Metrics.NumBBInsts.lookup(TI.getSuccessor(I));
38 // We don't know which blocks will be eliminated, so use the average size.
39 Reduction += InlineConstants::InstrCost*Instrs*(NumSucc-1)/NumSucc;
40 continue;
34 namespace {
35
36 class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
37 typedef InstVisitor<CallAnalyzer, bool> Base;
38 friend class InstVisitor<CallAnalyzer, bool>;
39
40 // TargetData if available, or null.
41 const TargetData *const TD;
42
43 // The called function.
44 Function &F;
45
46 int Threshold;
47 int Cost;
48 const bool AlwaysInline;
49
50 bool IsRecursive;
51 bool ExposesReturnsTwice;
52 bool HasDynamicAlloca;
53 unsigned NumInstructions, NumVectorInstructions;
54 int FiftyPercentVectorBonus, TenPercentVectorBonus;
55 int VectorBonus;
56
57 // While we walk the potentially-inlined instructions, we build up and
58 // maintain a mapping of simplified values specific to this callsite. The
59 // idea is to propagate any special information we have about arguments to
60 // this call through the inlinable section of the function, and account for
61 // likely simplifications post-inlining. The most important aspect we track
62 // is CFG altering simplifications -- when we prove a basic block dead, that
63 // can cause dramatic shifts in the cost of inlining a function.
64 DenseMap<Value *, Constant *> SimplifiedValues;
65
66 // Keep track of the values which map back (through function arguments) to
67 // allocas on the caller stack which could be simplified through SROA.
68 DenseMap<Value *, Value *> SROAArgValues;
69
70 // The mapping of caller Alloca values to their accumulated cost savings. If
71 // we have to disable SROA for one of the allocas, this tells us how much
72 // cost must be added.
73 DenseMap<Value *, int> SROAArgCosts;
74
75 // Keep track of values which map to a pointer base and constant offset.
76 DenseMap<Value *, std::pair<Value *, APInt> > ConstantOffsetPtrs;
77
78 // Custom simplification helper routines.
79 bool isAllocaDerivedArg(Value *V);
80 bool lookupSROAArgAndCost(Value *V, Value *&Arg,
81 DenseMap<Value *, int>::iterator &CostIt);
82 void disableSROA(DenseMap<Value *, int>::iterator CostIt);
83 void disableSROA(Value *V);
84 void accumulateSROACost(DenseMap<Value *, int>::iterator CostIt,
85 int InstructionCost);
86 bool handleSROACandidate(bool IsSROAValid,
87 DenseMap<Value *, int>::iterator CostIt,
88 int InstructionCost);
89 bool isGEPOffsetConstant(GetElementPtrInst &GEP);
90 bool accumulateGEPOffset(GEPOperator &GEP, APInt &Offset);
91 ConstantInt *stripAndComputeInBoundsConstantOffsets(Value *&V);
92
93 // Custom analysis routines.
94 bool analyzeBlock(BasicBlock *BB);
95
96 // Disable several entry points to the visitor so we don't accidentally use
97 // them by declaring but not defining them here.
98 void visit(Module *); void visit(Module &);
99 void visit(Function *); void visit(Function &);
100 void visit(BasicBlock *); void visit(BasicBlock &);
101
102 // Provide base case for our instruction visit.
103 bool visitInstruction(Instruction &I);
104
105 // Our visit overrides.
106 bool visitAlloca(AllocaInst &I);
107 bool visitPHI(PHINode &I);
108 bool visitGetElementPtr(GetElementPtrInst &I);
109 bool visitBitCast(BitCastInst &I);
110 bool visitPtrToInt(PtrToIntInst &I);
111 bool visitIntToPtr(IntToPtrInst &I);
112 bool visitCastInst(CastInst &I);
113 bool visitUnaryInstruction(UnaryInstruction &I);
114 bool visitICmp(ICmpInst &I);
115 bool visitSub(BinaryOperator &I);
116 bool visitBinaryOperator(BinaryOperator &I);
117 bool visitLoad(LoadInst &I);
118 bool visitStore(StoreInst &I);
119 bool visitCallSite(CallSite CS);
120
121 public:
122 CallAnalyzer(const TargetData *TD, Function &Callee, int Threshold)
123 : TD(TD), F(Callee), Threshold(Threshold), Cost(0),
124 AlwaysInline(F.hasFnAttr(Attribute::AlwaysInline)),
125 IsRecursive(false), ExposesReturnsTwice(false), HasDynamicAlloca(false),
126 NumInstructions(0), NumVectorInstructions(0),
127 FiftyPercentVectorBonus(0), TenPercentVectorBonus(0), VectorBonus(0),
128 NumConstantArgs(0), NumConstantOffsetPtrArgs(0), NumAllocaArgs(0),
129 NumConstantPtrCmps(0), NumConstantPtrDiffs(0),
130 NumInstructionsSimplified(0), SROACostSavings(0), SROACostSavingsLost(0) {
131 }
132
133 bool analyzeCall(CallSite CS);
134
135 int getThreshold() { return Threshold; }
136 int getCost() { return Cost; }
137
138 // Keep a bunch of stats about the cost savings found so we can print them
139 // out when debugging.
140 unsigned NumConstantArgs;
141 unsigned NumConstantOffsetPtrArgs;
142 unsigned NumAllocaArgs;
143 unsigned NumConstantPtrCmps;
144 unsigned NumConstantPtrDiffs;
145 unsigned NumInstructionsSimplified;
146 unsigned SROACostSavings;
147 unsigned SROACostSavingsLost;
148
149 void dump();
150 };
151
152 } // namespace
153
154 /// \brief Test whether the given value is an Alloca-derived function argument.
155 bool CallAnalyzer::isAllocaDerivedArg(Value *V) {
156 return SROAArgValues.count(V);
157 }
158
159 /// \brief Lookup the SROA-candidate argument and cost iterator which V maps to.
160 /// Returns false if V does not map to a SROA-candidate.
161 bool CallAnalyzer::lookupSROAArgAndCost(
162 Value *V, Value *&Arg, DenseMap<Value *, int>::iterator &CostIt) {
163 if (SROAArgValues.empty() || SROAArgCosts.empty())
164 return false;
165
166 DenseMap<Value *, Value *>::iterator ArgIt = SROAArgValues.find(V);
167 if (ArgIt == SROAArgValues.end())
168 return false;
169
170 Arg = ArgIt->second;
171 CostIt = SROAArgCosts.find(Arg);
172 return CostIt != SROAArgCosts.end();
173 }
174
175 /// \brief Disable SROA for the candidate marked by this cost iterator.
176 ///
177 /// This marks the candidate as no longer viable for SROA, and adds the cost
178 /// savings associated with it back into the inline cost measurement.
179 void CallAnalyzer::disableSROA(DenseMap<Value *, int>::iterator CostIt) {
180 // If we're no longer able to perform SROA we need to undo its cost savings
181 // and prevent subsequent analysis.
182 Cost += CostIt->second;
183 SROACostSavings -= CostIt->second;
184 SROACostSavingsLost += CostIt->second;
185 SROAArgCosts.erase(CostIt);
186 }
187
188 /// \brief If 'V' maps to a SROA candidate, disable SROA for it.
189 void CallAnalyzer::disableSROA(Value *V) {
190 Value *SROAArg;
191 DenseMap<Value *, int>::iterator CostIt;
192 if (lookupSROAArgAndCost(V, SROAArg, CostIt))
193 disableSROA(CostIt);
194 }
195
196 /// \brief Accumulate the given cost for a particular SROA candidate.
197 void CallAnalyzer::accumulateSROACost(DenseMap<Value *, int>::iterator CostIt,
198 int InstructionCost) {
199 CostIt->second += InstructionCost;
200 SROACostSavings += InstructionCost;
201 }
202
203 /// \brief Helper for the common pattern of handling a SROA candidate.
204 /// Either accumulates the cost savings if the SROA remains valid, or disables
205 /// SROA for the candidate.
206 bool CallAnalyzer::handleSROACandidate(bool IsSROAValid,
207 DenseMap<Value *, int>::iterator CostIt,
208 int InstructionCost) {
209 if (IsSROAValid) {
210 accumulateSROACost(CostIt, InstructionCost);
211 return true;
212 }
213
214 disableSROA(CostIt);
215 return false;
216 }
217
218 /// \brief Check whether a GEP's indices are all constant.
219 ///
220 /// Respects any simplified values known during the analysis of this callsite.
221 bool CallAnalyzer::isGEPOffsetConstant(GetElementPtrInst &GEP) {
222 for (User::op_iterator I = GEP.idx_begin(), E = GEP.idx_end(); I != E; ++I)
223 if (!isa<Constant>(*I) && !SimplifiedValues.lookup(*I))
224 return false;
225
226 return true;
227 }
228
229 /// \brief Accumulate a constant GEP offset into an APInt if possible.
230 ///
231 /// Returns false if unable to compute the offset for any reason. Respects any
232 /// simplified values known during the analysis of this callsite.
233 bool CallAnalyzer::accumulateGEPOffset(GEPOperator &GEP, APInt &Offset) {
234 if (!TD)
235 return false;
236
237 unsigned IntPtrWidth = TD->getPointerSizeInBits();
238 assert(IntPtrWidth == Offset.getBitWidth());
239
240 for (gep_type_iterator GTI = gep_type_begin(GEP), GTE = gep_type_end(GEP);
241 GTI != GTE; ++GTI) {
242 ConstantInt *OpC = dyn_cast<ConstantInt>(GTI.getOperand());
243 if (!OpC)
244 if (Constant *SimpleOp = SimplifiedValues.lookup(GTI.getOperand()))
245 OpC = dyn_cast<ConstantInt>(SimpleOp);
246 if (!OpC)
247 return false;
248 if (OpC->isZero()) continue;
249
250 // Handle a struct index, which adds its field offset to the pointer.
251 if (StructType *STy = dyn_cast<StructType>(*GTI)) {
252 unsigned ElementIdx = OpC->getZExtValue();
253 const StructLayout *SL = TD->getStructLayout(STy);
254 Offset += APInt(IntPtrWidth, SL->getElementOffset(ElementIdx));
255 continue;
256 }
257
258 APInt TypeSize(IntPtrWidth, TD->getTypeAllocSize(GTI.getIndexedType()));
259 Offset += OpC->getValue().sextOrTrunc(IntPtrWidth) * TypeSize;
260 }
261 return true;
262 }
263
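For intuition about the offset arithmetic in accumulateGEPOffset above, here is a small standalone C++ analogue. The struct and indices are hypothetical and padding is ignored: a struct index contributes its field offset (via StructLayout in the real code), while a sequential index contributes the index times the element's allocation size.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical aggregate standing in for an LLVM type %T = { i32, [4 x i32] }.
struct T { int A; int B[4]; };

int main() {
  // Offset computed for "getelementptr inbounds %T* %p, i32 0, i32 1, i32 2":
  // the leading 0 scales by the whole object size, the struct index adds the
  // field offset of B, and the array index adds 2 * sizeof(int).
  std::size_t Offset = 0 * sizeof(T) + offsetof(T, B) + 2 * sizeof(int);
  std::printf("constant GEP offset = %zu bytes\n", Offset); // 12 with 4-byte int
  return 0;
}
```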
264 bool CallAnalyzer::visitAlloca(AllocaInst &I) {
265 // FIXME: Check whether inlining will turn a dynamic alloca into a static
266 // alloca, and handle that case.
267
268 // We will happily inline static alloca instructions or dynamic alloca
269 // instructions in always-inline situations.
270 if (AlwaysInline || I.isStaticAlloca())
271 return Base::visitAlloca(I);
272
273 // FIXME: This is overly conservative. Dynamic allocas are inefficient for
274 // a variety of reasons, and so we would like to not inline them into
275 // functions which don't currently have a dynamic alloca. This simply
276 // disables inlining altogether in the presence of a dynamic alloca.
277 HasDynamicAlloca = true;
278 return false;
279 }
280
281 bool CallAnalyzer::visitPHI(PHINode &I) {
282 // FIXME: We should potentially be tracking values through phi nodes,
283 // especially when they collapse to a single value due to deleted CFG edges
284 // during inlining.
285
286 // FIXME: We need to propagate SROA *disabling* through phi nodes, even
287 // though we don't want to propagate its bonuses. The idea is to disable
288 // SROA if it *might* be used in an inappropriate manner.
289
290 // Phi nodes are always zero-cost.
291 return true;
292 }
293
294 bool CallAnalyzer::visitGetElementPtr(GetElementPtrInst &I) {
295 Value *SROAArg;
296 DenseMap<Value *, int>::iterator CostIt;
297 bool SROACandidate = lookupSROAArgAndCost(I.getPointerOperand(),
298 SROAArg, CostIt);
299
300 // Try to fold GEPs of constant-offset call site argument pointers. This
301 // requires target data and inbounds GEPs.
302 if (TD && I.isInBounds()) {
303 // Check if we have a base + offset for the pointer.
304 Value *Ptr = I.getPointerOperand();
305 std::pair<Value *, APInt> BaseAndOffset = ConstantOffsetPtrs.lookup(Ptr);
306 if (BaseAndOffset.first) {
307 // Check if the offset of this GEP is constant, and if so accumulate it
308 // into Offset.
309 if (!accumulateGEPOffset(cast<GEPOperator>(I), BaseAndOffset.second)) {
310 // Non-constant GEPs aren't folded, and disable SROA.
311 if (SROACandidate)
312 disableSROA(CostIt);
313 return false;
41314 }
42315
43 // Figure out if this instruction will be removed due to simple constant
44 // propagation.
45 Instruction &Inst = cast<Instruction>(*U);
46
47 // We can't constant propagate instructions which have effects or
48 // read memory.
49 //
50 // FIXME: It would be nice to capture the fact that a load from a
51 // pointer-to-constant-global is actually a *really* good thing to zap.
52 // Unfortunately, we don't know the pointer that may get propagated here,
53 // so we can't make this decision.
54 if (Inst.mayReadFromMemory() || Inst.mayHaveSideEffects() ||
55 isa<AllocaInst>(Inst))
56 continue;
57
58 bool AllOperandsConstant = true;
59 for (unsigned i = 0, e = Inst.getNumOperands(); i != e; ++i)
60 if (!isa<Constant>(Inst.getOperand(i)) && Inst.getOperand(i) != V) {
61 AllOperandsConstant = false;
62 break;
63 }
64 if (!AllOperandsConstant)
65 continue;
66
67 // We will get to remove this instruction...
68 Reduction += InlineConstants::InstrCost;
69
70 // And any other instructions that use it which become constants
71 // themselves.
72 Worklist.push_back(&Inst);
73 }
74 } while (!Worklist.empty());
75 return Reduction;
76 }
77
78 static unsigned countCodeReductionForAllocaICmp(const CodeMetrics &Metrics,
79 ICmpInst *ICI) {
80 unsigned Reduction = 0;
81
82 // Bail if this is comparing against a non-constant; there is nothing we can
83 // do there.
84 if (!isa<Constant>(ICI->getOperand(1)))
85 return Reduction;
86
87 // An icmp pred (alloca, C) becomes true if the predicate is true when
88 // equal and false otherwise.
89 bool Result = ICI->isTrueWhenEqual();
90
91 SmallVector Worklist;
92 Worklist.push_back(ICI);
93 do {
94 Instruction *U = Worklist.pop_back_val();
95 Reduction += InlineConstants::InstrCost;
96 for (Value::use_iterator UI = U->use_begin(), UE = U->use_end();
97 UI != UE; ++UI) {
98 Instruction *I = dyn_cast<Instruction>(*UI);
99 if (!I || I->mayHaveSideEffects()) continue;
100 if (I->getNumOperands() == 1)
101 Worklist.push_back(I);
102 if (BinaryOperator *BO = dyn_cast<BinaryOperator>(I)) {
103 // If BO produces the same value as U, then the other operand is
104 // irrelevant and we can put it into the Worklist to continue
105 // deleting dead instructions. If BO produces the same value as the
106 // other operand, we can delete BO but that's it.
107 if (Result == true) {
108 if (BO->getOpcode() == Instruction::Or)
109 Worklist.push_back(I);
110 if (BO->getOpcode() == Instruction::And)
111 Reduction += InlineConstants::InstrCost;
112 } else {
113 if (BO->getOpcode() == Instruction::Or ||
114 BO->getOpcode() == Instruction::Xor)
115 Reduction += InlineConstants::InstrCost;
116 if (BO->getOpcode() == Instruction::And)
117 Worklist.push_back(I);
118 }
316 // Add the result as a new mapping to Base + Offset.
317 ConstantOffsetPtrs[&I] = BaseAndOffset;
318
319 // Also handle SROA candidates here, we already know that the GEP is
320 // all-constant indexed.
321 if (SROACandidate)
322 SROAArgValues[&I] = SROAArg;
323
324 return true;
325 }
326 }
327
328 if (isGEPOffsetConstant(I)) {
329 if (SROACandidate)
330 SROAArgValues[&I] = SROAArg;
331
332 // Constant GEPs are modeled as free.
333 return true;
334 }
335
336 // Variable GEPs will require math and will disable SROA.
337 if (SROACandidate)
338 disableSROA(CostIt);
339 return false;
340 }
341
342 bool CallAnalyzer::visitBitCast(BitCastInst &I) {
343 // Propagate constants through bitcasts.
344 if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
345 if (Constant *C = ConstantExpr::getBitCast(COp, I.getType())) {
346 SimplifiedValues[&I] = C;
347 return true;
348 }
349
350 // Track base/offsets through casts
351 std::pair<Value *, APInt> BaseAndOffset
352 = ConstantOffsetPtrs.lookup(I.getOperand(0));
353 // Casts don't change the offset, just wrap it up.
354 if (BaseAndOffset.first)
355 ConstantOffsetPtrs[&I] = BaseAndOffset;
356
357 // Also look for SROA candidates here.
358 Value *SROAArg;
359 DenseMap<Value *, int>::iterator CostIt;
360 if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt))
361 SROAArgValues[&I] = SROAArg;
362
363 // Bitcasts are always zero cost.
364 return true;
365 }
366
367 bool CallAnalyzer::visitPtrToInt(PtrToIntInst &I) {
368 // Propagate constants through ptrtoint.
369 if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
370 if (Constant *C = ConstantExpr::getPtrToInt(COp, I.getType())) {
371 SimplifiedValues[&I] = C;
372 return true;
373 }
374
375 // Track base/offset pairs when converted to a plain integer provided the
376 // integer is large enough to represent the pointer.
377 unsigned IntegerSize = I.getType()->getScalarSizeInBits();
378 if (TD && IntegerSize >= TD->getPointerSizeInBits()) {
379 std::pair<Value *, APInt> BaseAndOffset
380 = ConstantOffsetPtrs.lookup(I.getOperand(0));
381 if (BaseAndOffset.first)
382 ConstantOffsetPtrs[&I] = BaseAndOffset;
383 }
384
385 // This is really weird. Technically, ptrtoint will disable SROA. However,
386 // unless that ptrtoint is *used* somewhere in the live basic blocks after
387 // inlining, it will be nuked, and SROA should proceed. All of the uses which
388 // would block SROA would also block SROA if applied directly to a pointer,
389 // and so we can just add the integer in here. The only places where SROA is
390 // preserved either cannot fire on an integer, or won't in-and-of themselves
391 // disable SROA (ext) w/o some later use that we would see and disable.
392 Value *SROAArg;
393 DenseMap<Value *, int>::iterator CostIt;
394 if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt))
395 SROAArgValues[&I] = SROAArg;
396
397 // A ptrtoint cast is free so long as the result is large enough to store the
398 // pointer, and a legal integer type.
399 return TD && TD->isLegalInteger(IntegerSize) &&
400 IntegerSize >= TD->getPointerSizeInBits();
401 }
402
403 bool CallAnalyzer::visitIntToPtr(IntToPtrInst &I) {
404 // Propagate constants through inttoptr.
405 if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
406 if (Constant *C = ConstantExpr::getIntToPtr(COp, I.getType())) {
407 SimplifiedValues[&I] = C;
408 return true;
409 }
410
411 // Track base/offset pairs when round-tripped through a pointer without
412 // modifications provided the integer is not too large.
413 Value *Op = I.getOperand(0);
414 unsigned IntegerSize = Op->getType()->getScalarSizeInBits();
415 if (TD && IntegerSize <= TD->getPointerSizeInBits()) {
416 std::pair<Value *, APInt> BaseAndOffset = ConstantOffsetPtrs.lookup(Op);
417 if (BaseAndOffset.first)
418 ConstantOffsetPtrs[&I] = BaseAndOffset;
419 }
420
421 // "Propagate" SROA here in the same manner as we do for ptrtoint above.
422 Value *SROAArg;
423 DenseMap<Value *, int>::iterator CostIt;
424 if (lookupSROAArgAndCost(Op, SROAArg, CostIt))
425 SROAArgValues[&I] = SROAArg;
426
427 // An inttoptr cast is free so long as the input is a legal integer type
428 // which doesn't contain values outside the range of a pointer.
429 return TD && TD->isLegalInteger(IntegerSize) &&
430 IntegerSize <= TD->getPointerSizeInBits();
431 }
432
433 bool CallAnalyzer::visitCastInst(CastInst &I) {
434 // Propagate constants through casts.
435 if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
436 if (Constant *C = ConstantExpr::getCast(I.getOpcode(), COp, I.getType())) {
437 SimplifiedValues[&I] = C;
438 return true;
439 }
440
441 // Disable SROA in the face of arbitrary casts we don't whitelist elsewhere.
442 disableSROA(I.getOperand(0));
443
444 // No-op casts don't have any cost.
445 if (I.isLosslessCast())
446 return true;
447
448 // trunc to a native type is free (assuming the target has compare and
449 // shift-right of the same width).
450 if (TD && isa<TruncInst>(I) &&
451 TD->isLegalInteger(TD->getTypeSizeInBits(I.getType())))
452 return true;
453
454 // Result of a cmp instruction is often extended (to be used by other
455 // cmp instructions, logical or return instructions). These are usually
456 // no-ops on most sane targets.
457 if (isa<CmpInst>(I.getOperand(0)))
458 return true;
459
460 // Assume the rest of the casts require work.
461 return false;
462 }
463
464 bool CallAnalyzer::visitUnaryInstruction(UnaryInstruction &I) {
465 Value *Operand = I.getOperand(0);
466 Constant *Ops[1] = { dyn_cast<Constant>(Operand) };
467 if (Ops[0] || (Ops[0] = SimplifiedValues.lookup(Operand)))
468 if (Constant *C = ConstantFoldInstOperands(I.getOpcode(), I.getType(),
469 Ops, TD)) {
470 SimplifiedValues[&I] = C;
471 return true;
472 }
473
474 // Disable any SROA on the argument to arbitrary unary operators.
475 disableSROA(Operand);
476
477 return false;
478 }
479
480 bool CallAnalyzer::visitICmp(ICmpInst &I) {
481 Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
482 // First try to handle simplified comparisons.
483 if (!isa<Constant>(LHS))
484 if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
485 LHS = SimpleLHS;
486 if (!isa<Constant>(RHS))
487 if (Constant *SimpleRHS = SimplifiedValues.lookup(RHS))
488 RHS = SimpleRHS;
489 if (Constant *CLHS = dyn_cast<Constant>(LHS))
490 if (Constant *CRHS = dyn_cast<Constant>(RHS))
491 if (Constant *C = ConstantExpr::getICmp(I.getPredicate(), CLHS, CRHS)) {
492 SimplifiedValues[&I] = C;
493 return true;
119494 }
120 if (BranchInst *BI = dyn_cast<BranchInst>(I)) {
121 BasicBlock *BB = BI->getSuccessor(Result ? 0 : 1);
122 if (BB->getSinglePredecessor())
123 Reduction
124 += InlineConstants::InstrCost * Metrics.NumBBInsts.lookup(BB);
495
496 // Otherwise look for a comparison between constant offset pointers with
497 // a common base.
498 Value *LHSBase, *RHSBase;
499 APInt LHSOffset, RHSOffset;
500 llvm::tie(LHSBase, LHSOffset) = ConstantOffsetPtrs.lookup(LHS);
501 if (LHSBase) {
502 llvm::tie(RHSBase, RHSOffset) = ConstantOffsetPtrs.lookup(RHS);
503 if (RHSBase && LHSBase == RHSBase) {
504 // We have common bases, fold the icmp to a constant based on the
505 // offsets.
506 Constant *CLHS = ConstantInt::get(LHS->getContext(), LHSOffset);
507 Constant *CRHS = ConstantInt::get(RHS->getContext(), RHSOffset);
508 if (Constant *C = ConstantExpr::getICmp(I.getPredicate(), CLHS, CRHS)) {
509 SimplifiedValues[&I] = C;
510 ++NumConstantPtrCmps;
511 return true;
125512 }
126513 }
127 } while (!Worklist.empty());
128
129 return Reduction;
130 }
131
132 /// \brief Compute the reduction possible for a given instruction if we are able
133 /// to SROA an alloca.
134 ///
135 /// The reduction for this instruction is added to the SROAReduction output
136 /// parameter. Returns false if this instruction is expected to defeat SROA in
137 /// general.
138 static bool countCodeReductionForSROAInst(Instruction *I,
139 SmallVectorImpl &Worklist,
140 unsigned &SROAReduction) {
141 if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
142 if (!LI->isSimple())
143 return false;
144 SROAReduction += InlineConstants::InstrCost;
514 }
515
516 // If the comparison is an equality comparison with null, we can simplify it
517 // for any alloca-derived argument.
518 if (I.isEquality() && isa<ConstantPointerNull>(I.getOperand(1)))
519 if (isAllocaDerivedArg(I.getOperand(0))) {
520 // We can actually predict the result of comparisons between an
521 // alloca-derived value and null. Note that this fires regardless of
522 // SROA firing.
523 bool IsNotEqual = I.getPredicate() == CmpInst::ICMP_NE;
524 SimplifiedValues[&I] = IsNotEqual ? ConstantInt::getTrue(I.getType())
525 : ConstantInt::getFalse(I.getType());
526 return true;
527 }
528
529 // Finally check for SROA candidates in comparisons.
530 Value *SROAArg;
531 DenseMap<Value *, int>::iterator CostIt;
532 if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt)) {
533 if (isa<ConstantPointerNull>(I.getOperand(1))) {
534 accumulateSROACost(CostIt, InlineConstants::InstrCost);
535 return true;
536 }
537
538 disableSROA(CostIt);
539 }
540
541 return false;
542 }
543
544 bool CallAnalyzer::visitSub(BinaryOperator &I) {
545 // Try to handle a special case: we can fold computing the difference of two
546 // constant-related pointers.
547 Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
548 Value *LHSBase, *RHSBase;
549 APInt LHSOffset, RHSOffset;
550 llvm::tie(LHSBase, LHSOffset) = ConstantOffsetPtrs.lookup(LHS);
551 if (LHSBase) {
552 llvm::tie(RHSBase, RHSOffset) = ConstantOffsetPtrs.lookup(RHS);
553 if (RHSBase && LHSBase == RHSBase) {
554 // We have common bases, fold the subtract to a constant based on the
555 // offsets.
556 Constant *CLHS = ConstantInt::get(LHS->getContext(), LHSOffset);
557 Constant *CRHS = ConstantInt::get(RHS->getContext(), RHSOffset);
558 if (Constant *C = ConstantExpr::getSub(CLHS, CRHS)) {
559 SimplifiedValues[&I] = C;
560 ++NumConstantPtrDiffs;
561 return true;
562 }
563 }
564 }
565
566 // Otherwise, fall back to the generic logic for simplifying and handling
567 // instructions.
568 return Base::visitSub(I);
569 }
570
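The pointer-difference folding above (together with the matching case in visitICmp) is what makes the classic begin/end iterator idiom cheap at a concrete callsite. A hypothetical caller showing the shape of that situation, not taken from the patch or its tests:

```cpp
// Hypothetical example only. 'Buf' lowers to an alloca in the caller, so both
// arguments share a ConstantOffsetPtrs base; 'E - B' (visitSub) and a direct
// comparison like 'B != E' (visitICmp) then fold to constants during this
// callsite's analysis. The loop's induction variable goes through a PHI node,
// which this patch does not yet track (see the FIXME in visitPHI).
static int sum(const int *B, const int *E) {
  int S = 0;
  for (const int *I = B; I != E; ++I)
    S += *I;
  return S;
}

int caller() {
  int Buf[4] = {1, 2, 3, 4};  // caller-local array, an alloca after lowering
  return sum(Buf, Buf + 4);   // constant-offset pointer pair into Buf
}
```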
571 bool CallAnalyzer::visitBinaryOperator(BinaryOperator &I) {
572 Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
573 if (!isa<Constant>(LHS))
574 if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
575 LHS = SimpleLHS;
576 if (!isa<Constant>(RHS))
577 if (Constant *SimpleRHS = SimplifiedValues.lookup(RHS))
578 RHS = SimpleRHS;
579 Value *SimpleV = SimplifyBinOp(I.getOpcode(), LHS, RHS, TD);
580 if (Constant *C = dyn_cast_or_null<Constant>(SimpleV)) {
581 SimplifiedValues[&I] = C;
145582 return true;
146583 }
147584
148 if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
149 if (!SI->isSimple())
150 return false;
151 SROAReduction += InlineConstants::InstrCost;
152 return true;
153 }
154
155 if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I)) {
156 // If the GEP has variable indices, we won't be able to do much with it.
157 if (!GEP->hasAllConstantIndices())
158 return false;
159 // A non-zero GEP will likely become a mask operation after SROA.
160 if (GEP->hasAllZeroIndices())
161 SROAReduction += InlineConstants::InstrCost;
162 Worklist.push_back(GEP);
163 return true;
164 }
165
166 if (BitCastInst *BCI = dyn_cast<BitCastInst>(I)) {
167 // Track pointer through bitcasts.
168 Worklist.push_back(BCI);
169 SROAReduction += InlineConstants::InstrCost;
170 return true;
171 }
172
173 // We just look for non-constant operands to ICmp instructions as those will
174 // defeat SROA. The actual reduction for these happens even without SROA.
175 if (ICmpInst *ICI = dyn_cast<ICmpInst>(I))
176 return isa<Constant>(ICI->getOperand(1));
177
178 if (SelectInst *SI = dyn_cast<SelectInst>(I)) {
179 // SROA can handle a select of alloca iff all uses of the alloca are
180 // loads, and dereferenceable. We assume it's dereferenceable since
181 // we're told the input is an alloca.
182 for (Value::use_iterator UI = SI->use_begin(), UE = SI->use_end();
183 UI != UE; ++UI) {
184 LoadInst *LI = dyn_cast<LoadInst>(*UI);
185 if (LI == 0 || !LI->isSimple())
186 return false;
187 }
188 // We don't know whether we'll be deleting the rest of the chain of
189 // instructions from the SelectInst on, because we don't know whether
190 // the other side of the select is also an alloca or not.
191 return true;
192 }
193
194 if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
585 // Disable any SROA on arguments to arbitrary, unsimplified binary operators.
586 disableSROA(LHS);
587 disableSROA(RHS);
588
589 return false;
590 }
591
592 bool CallAnalyzer::visitLoad(LoadInst &I) {
593 Value *SROAArg;
594 DenseMap<Value *, int>::iterator CostIt;
595 if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt)) {
596 if (I.isSimple()) {
597 accumulateSROACost(CostIt, InlineConstants::InstrCost);
598 return true;
599 }
600
601 disableSROA(CostIt);
602 }
603
604 return false;
605 }
606
607 bool CallAnalyzer::visitStore(StoreInst &I) {
608 Value *SROAArg;
609 DenseMap<Value *, int>::iterator CostIt;
610 if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt)) {
611 if (I.isSimple()) {
612 accumulateSROACost(CostIt, InlineConstants::InstrCost);
613 return true;
614 }
615
616 disableSROA(CostIt);
617 }
618
619 return false;
620 }
621
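To illustrate how the load/store handling above feeds SROACostSavings: in a hypothetical callee that only touches a pointer argument with simple loads and stores, those accesses are credited as savings rather than charged as cost once the caller passes an alloca-derived pointer.

```cpp
// Hypothetical example only. Because 'P' maps back to the caller's alloca,
// each simple load/store below is accumulated into SROACostSavings through
// accumulateSROACost() rather than added to Cost; any use the analysis cannot
// handle would instead call disableSROA() and charge those savings back.
static void initPair(int *P) {
  P[0] = 1;         // simple store to an SROA candidate
  P[1] = P[0] + 1;  // simple load + store, still SROA-friendly
}

void caller() {
  int Pair[2];      // becomes an alloca; its uses inside initPair can be SROA'd
  initPair(Pair);
}
```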
622 bool CallAnalyzer::visitCallSite(CallSite CS) {
623 if (CS.isCall() && cast<CallInst>(CS.getInstruction())->canReturnTwice() &&
624 !F.hasFnAttr(Attribute::ReturnsTwice)) {
625 // This aborts the entire analysis.
626 ExposesReturnsTwice = true;
627 return false;
628 }
629
630 if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CS.getInstruction())) {
195631 switch (II->getIntrinsicID()) {
196632 default:
197 return false;
633 return Base::visitCallSite(CS);
634
635 case Intrinsic::dbg_declare:
636 case Intrinsic::dbg_value:
637 case Intrinsic::invariant_start:
638 case Intrinsic::invariant_end:
639 case Intrinsic::lifetime_start:
640 case Intrinsic::lifetime_end:
198641 case Intrinsic::memset:
199642 case Intrinsic::memcpy:
200643 case Intrinsic::memmove:
201 case Intrinsic::lifetime_start:
202 case Intrinsic::lifetime_end:
203 // SROA can usually chew through these intrinsics.
204 SROAReduction += InlineConstants::InstrCost;
644 case Intrinsic::objectsize:
645 case Intrinsic::ptr_annotation:
646 case Intrinsic::var_annotation:
647 // SROA can usually chew through these intrinsics and they have no cost
648 // so don't pay the price of analyzing them in detail.
205649 return true;
206650 }
207651 }
208652
209 // If there is some other strange instruction, we're not going to be
210 // able to do much if we inline this.
653 if (Function *F = CS.getCalledFunction()) {
654 if (F == CS.getInstruction()->getParent()->getParent()) {
655 // This flag will fully abort the analysis, so don't bother with anything
656 // else.
657 IsRecursive = true;
658 return false;
659 }
660
661 if (!callIsSmall(F)) {
662 // We account for the average 1 instruction per call argument setup
663 // here.
664 Cost += CS.arg_size() * InlineConstants::InstrCost;
665
666 // Everything other than inline ASM will also have a significant cost
667 // merely from making the call.
668 if (!isa<InlineAsm>(CS.getCalledValue()))
669 Cost += InlineConstants::CallPenalty;
670 }
671
672 return Base::visitCallSite(CS);
673 }
674
675 // Otherwise we're in a very special case -- an indirect function call. See
676 // if we can be particularly clever about this.
677 Value *Callee = CS.getCalledValue();
678
679 // First, pay the price of the argument setup. We account for the average
680 // 1 instruction per call argument setup here.
681 Cost += CS.arg_size() * InlineConstants::InstrCost;
682
683 // Next, check if this happens to be an indirect function call to a known
684 // function in this inline context. If not, we've done all we can.
685 Function *F = dyn_cast_or_null<Function>(SimplifiedValues.lookup(Callee));
686 if (!F)
687 return Base::visitCallSite(CS);
688
689 // If we have a constant that we are calling as a function, we can peer
690 // through it and see the function target. This happens not infrequently
691 // during devirtualization and so we want to give it a hefty bonus for
692 // inlining, but cap that bonus in the event that inlining wouldn't pan
693 // out. Pretend to inline the function, with a custom threshold.
694 CallAnalyzer CA(TD, *F, InlineConstants::IndirectCallThreshold);
695 if (CA.analyzeCall(CS)) {
696 // We were able to inline the indirect call! Subtract the cost from the
697 // bonus we want to apply, but don't go below zero.
698 Cost -= std::max(0, InlineConstants::IndirectCallThreshold - CA.getCost());
699 }
700
701 return Base::visitCallSite(CS);
702 }
703
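The indirect-call handling above rewards callsites where argument simplification turns an indirect call into a direct one. A hypothetical C++ shape of that situation (not from the patch or its tests):

```cpp
// Hypothetical example only. When analyzing the call to 'apply' from 'caller',
// the argument 'Fn' is simplified to the constant 'square', so the indirect
// call inside 'apply' resolves to a known function and the nested CallAnalyzer
// above can award part of the IndirectCallThreshold bonus.
static int square(int X) { return X * X; }

static int apply(int (*Fn)(int), int V) {
  return Fn(V);               // indirect call; direct for this particular callsite
}

int caller() {
  return apply(square, 3);    // 'Fn' maps to the constant 'square'
}
```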
704 bool CallAnalyzer::visitInstruction(Instruction &I) {
705 // We found something we don't understand or can't handle. Mark any SROA-able
706 // values in the operand list as no longer viable.
707 for (User::op_iterator OI = I.op_begin(), OE = I.op_end(); OI != OE; ++OI)
708 disableSROA(*OI);
709
211710 return false;
212711 }
213712
214 unsigned InlineCostAnalyzer::FunctionInfo::countCodeReductionForAlloca(
215 const CodeMetrics &Metrics, Value *V) {
216 if (!V->getType()->isPointerTy()) return 0; // Not a pointer
217 unsigned Reduction = 0;
218 unsigned SROAReduction = 0;
219 bool CanSROAAlloca = true;
220
221 SmallVector Worklist;
222 Worklist.push_back(V);
713
714 /// \brief Analyze a basic block for its contribution to the inline cost.
715 ///
716 /// This method walks the analyzer over every instruction in the given basic
717 /// block and accounts for their cost during inlining at this callsite. It
718 /// aborts early if the threshold has been exceeded or an impossible to inline
719 /// construct has been detected. It returns false if inlining is no longer
720 /// viable, and true if inlining remains viable.
721 bool CallAnalyzer::analyzeBlock(BasicBlock *BB) {
722 for (BasicBlock::iterator I = BB->begin(), E = llvm::prior(BB->end());
723 I != E; ++I) {
724 ++NumInstructions;
725 if (isa<ExtractElementInst>(I) || I->getType()->isVectorTy())
726 ++NumVectorInstructions;
727
728 // If the instruction simplified to a constant, there is no cost to this
729 // instruction. Visit the instructions using our InstVisitor to account for
730 // all of the per-instruction logic. The visit tree returns true if we
731 // consumed the instruction in any way, and false if the instruction's base
732 // cost should count against inlining.
733 if (Base::visit(I))
734 ++NumInstructionsSimplified;
735 else
736 Cost += InlineConstants::InstrCost;
737
738 // If visiting this instruction detected an uninlinable pattern, abort.
739 if (IsRecursive || ExposesReturnsTwice || HasDynamicAlloca)
740 return false;
741
742 if (NumVectorInstructions > NumInstructions/2)
743 VectorBonus = FiftyPercentVectorBonus;
744 else if (NumVectorInstructions > NumInstructions/10)
745 VectorBonus = TenPercentVectorBonus;
746 else
747 VectorBonus = 0;
748
749 // Check if we've passed the threshold so we don't spin in huge basic
750 // blocks that will never inline.
751 if (!AlwaysInline && Cost > (Threshold + VectorBonus))
752 return false;
753 }
754
755 return true;
756 }
757
758 /// \brief Compute the base pointer and cumulative constant offsets for V.
759 ///
760 /// This strips all constant offsets off of V, leaving it the base pointer, and
761 /// accumulates the total constant offset applied in the returned constant. It
762 /// returns 0 if V is not a pointer, and returns the constant '0' if there are
763 /// no constant offsets applied.
764 ConstantInt *CallAnalyzer::stripAndComputeInBoundsConstantOffsets(Value *&V) {
765 if (!TD || !V->getType()->isPointerTy())
766 return 0;
767
768 unsigned IntPtrWidth = TD->getPointerSizeInBits();
769 APInt Offset = APInt::getNullValue(IntPtrWidth);
770
771 // Even though we don't look through PHI nodes, we could be called on an
772 // instruction in an unreachable block, which may be on a cycle.
773 SmallPtrSet<Value *, 4> Visited;
774 Visited.insert(V);
223775 do {
224 Value *V = Worklist.pop_back_val();
225 for (Value::use_iterator UI = V->use_begin(), E = V->use_end();
226 UI != E; ++UI){
227 Instruction *I = cast<Instruction>(*UI);
228
229 if (ICmpInst *ICI = dyn_cast<ICmpInst>(I))
230 Reduction += countCodeReductionForAllocaICmp(Metrics, ICI);
231
232 if (CanSROAAlloca)
233 CanSROAAlloca = countCodeReductionForSROAInst(I, Worklist,
234 SROAReduction);
235 }
236 } while (!Worklist.empty());
237
238 return Reduction + (CanSROAAlloca ? SROAReduction : 0);
239 }
240
241 void InlineCostAnalyzer::FunctionInfo::countCodeReductionForPointerPair(
242 const CodeMetrics &Metrics, DenseMap<Value *, unsigned> &PointerArgs,
243 Value *V, unsigned ArgIdx) {
244 SmallVector Worklist;
245 Worklist.push_back(V);
246 do {
247 Value *V = Worklist.pop_back_val();
248 for (Value::use_iterator UI = V->use_begin(), E = V->use_end();
249 UI != E; ++UI){
250 Instruction *I = cast(*UI);
251
252 if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I)) {
253 // If the GEP has variable indices, we won't be able to do much with it.
254 if (!GEP->hasAllConstantIndices())
776 if (GEPOperator *GEP = dyn_cast<GEPOperator>(V)) {
777 if (!GEP->isInBounds() || !accumulateGEPOffset(*GEP, Offset))
778 return 0;
779 V = GEP->getPointerOperand();
780 } else if (Operator::getOpcode(V) == Instruction::BitCast) {
781 V = cast<Operator>(V)->getOperand(0);
782 } else if (GlobalAlias *GA = dyn_cast<GlobalAlias>(V)) {
783 if (GA->mayBeOverridden())
784 break;
785 V = GA->getAliasee();
786 } else {
787 break;
788 }
789 assert(V->getType()->isPointerTy() && "Unexpected operand type!");
790 } while (Visited.insert(V));
791
792 Type *IntPtrTy = TD->getIntPtrType(V->getContext());
793 return cast<ConstantInt>(ConstantInt::get(IntPtrTy, Offset));
794 }
795
796 /// \brief Analyze a call site for potential inlining.
797 ///
798 /// Returns true if inlining this call is viable, and false if it is not
799 /// viable. It computes the cost and adjusts the threshold based on numerous
800 /// factors and heuristics. If this method returns false but the computed cost
801 /// is below the computed threshold, then inlining was forcibly disabled by
802 /// some artifact of the routine.
803 bool CallAnalyzer::analyzeCall(CallSite CS) {
804 // Track whether the post-inlining function would have more than one basic
805 // block. A single basic block is often intended for inlining. Balloon the
806 // threshold by 50% until we pass the single-BB phase.
807 bool SingleBB = true;
808 int SingleBBBonus = Threshold / 2;
809 Threshold += SingleBBBonus;
810
811 // Unless we are always-inlining, perform some tweaks to the cost and
812 // threshold based on the direct callsite information.
813 if (!AlwaysInline) {
814 // We want to more aggressively inline vector-dense kernels, so up the
815 // threshold, and we'll lower it if the % of vector instructions gets too
816 // low.
817 assert(NumInstructions == 0);
818 assert(NumVectorInstructions == 0);
819 FiftyPercentVectorBonus = Threshold;
820 TenPercentVectorBonus = Threshold / 2;
821
822 // Subtract off one instruction per call argument as those will be free after
823 // inlining.
824 Cost -= CS.arg_size() * InlineConstants::InstrCost;
825
826 // If there is only one call of the function, and it has internal linkage,
827 // the cost of inlining it drops dramatically.
828 if (F.hasLocalLinkage() && F.hasOneUse() && &F == CS.getCalledFunction())
829 Cost += InlineConstants::LastCallToStaticBonus;
830
831 // If the instruction after the call, or if the normal destination of the
832 // invoke is an unreachable instruction, the function is noreturn. As such,
833 // there is little point in inlining this unless there is literally zero cost.
834 if (InvokeInst *II = dyn_cast<InvokeInst>(CS.getInstruction())) {
835 if (isa<UnreachableInst>(II->getNormalDest()->begin()))
836 Threshold = 1;
837 } else if (isa<UnreachableInst>(++BasicBlock::iterator(CS.getInstruction())))
838 Threshold = 1;
839
840 // If this function uses the coldcc calling convention, prefer not to inline
841 // it.
842 if (F.getCallingConv() == CallingConv::Cold)
843 Cost += InlineConstants::ColdccPenalty;
844
845 // Check if we're done. This can happen due to bonuses and penalties.
846 if (Cost > Threshold)
847 return false;
848 }
849
850 if (F.empty())
851 return true;
852
853 // Track whether we've seen a return instruction. The first return
854 // instruction is free, as at least one will usually disappear in inlining.
855 bool HasReturn = false;
856
857 // Populate our simplified values by mapping from function arguments to call
858 // arguments with known important simplifications.
859 CallSite::arg_iterator CAI = CS.arg_begin();
860 for (Function::arg_iterator FAI = F.arg_begin(), FAE = F.arg_end();
861 FAI != FAE; ++FAI, ++CAI) {
862 assert(CAI != CS.arg_end());
863 if (Constant *C = dyn_cast<Constant>(CAI))
864 SimplifiedValues[FAI] = C;
865
866 Value *PtrArg = *CAI;
867 if (ConstantInt *C = stripAndComputeInBoundsConstantOffsets(PtrArg)) {
868 ConstantOffsetPtrs[FAI] = std::make_pair(PtrArg, C->getValue());
869
870 // We can SROA any pointer arguments derived from alloca instructions.
871 if (isa<AllocaInst>(PtrArg)) {
872 SROAArgValues[FAI] = PtrArg;
873 SROAArgCosts[PtrArg] = 0;
874 }
875 }
876 }
877 NumConstantArgs = SimplifiedValues.size();
878 NumConstantOffsetPtrArgs = ConstantOffsetPtrs.size();
879 NumAllocaArgs = SROAArgValues.size();
880
881 // The worklist of live basic blocks in the callee *after* inlining. We avoid
882 // adding basic blocks of the callee which can be proven to be dead for this
883 // particular call site in order to get more accurate cost estimates. This
884 // requires a somewhat heavyweight iteration pattern: we need to walk the
885 // basic blocks in a breadth-first order as we insert live successors. To
886 // accomplish this we use a small-size optimized SetVector, optimizing for
887 // the small case because we exit early once our threshold is crossed.
888 typedef SetVector<BasicBlock *, SmallVector<BasicBlock *, 16>,
889 SmallPtrSet<BasicBlock *, 16> > BBSetVector;
890 BBSetVector BBWorklist;
891 BBWorklist.insert(&F.getEntryBlock());
892 // Note that we *must not* cache the size; this loop grows the worklist.
893 for (unsigned Idx = 0; Idx != BBWorklist.size(); ++Idx) {
894 // Bail out the moment we cross the threshold. This means we'll under-count
895 // the cost, but only when undercounting doesn't matter.
896 if (!AlwaysInline && Cost > (Threshold + VectorBonus))
897 break;
898
899 BasicBlock *BB = BBWorklist[Idx];
900 if (BB->empty())
901 continue;
902
903 // Handle the terminator cost here where we can track returns and other
904 // function-wide constructs.
905 TerminatorInst *TI = BB->getTerminator();
906
907 // We never want to inline functions that contain an indirectbr. Doing so
908 // would be incorrect because any blockaddress constants (in static global
909 // initializers, for example) would still refer to the original function, so
910 // the indirect jump would branch from the inlined copy of the function into
911 // the original function, which is undefined behavior.
912 // FIXME: This logic isn't really right; we can safely inline functions
913 // with indirectbr's as long as no other function or global references the
914 // blockaddress of a block within the current function. And as a QOI issue,
915 // if someone is using a blockaddress without an indirectbr, and that
916 // reference somehow ends up in another function or global, we probably
917 // don't want to inline this function.
918 if (isa<IndirectBrInst>(TI))
919 return false;
920
921 if (!HasReturn && isa<ReturnInst>(TI))
922 HasReturn = true;
923 else
924 Cost += InlineConstants::InstrCost;
925
926 // Analyze the cost of this block. If we blow through the threshold, this
927 // returns false, and we can bail out.
928 if (!analyzeBlock(BB)) {
929 if (IsRecursive || ExposesReturnsTwice || HasDynamicAlloca)
930 return false;
931 break;
932 }
933
934 // Add in the live successors by first checking whether we have a terminator
935 // that may be simplified based on the values simplified by this call.
936 if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
937 if (BI->isConditional()) {
938 Value *Cond = BI->getCondition();
939 if (ConstantInt *SimpleCond
940 = dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
941 BBWorklist.insert(BI->getSuccessor(SimpleCond->isZero() ? 1 : 0));
255942 continue;
256 // Unless the GEP is in-bounds, some comparisons will be non-constant.
257 // Fortunately, the real-world cases where this occurs use in-bounds
258 // GEPs, and so we restrict the optimization to them here.
259 if (!GEP->isInBounds())
260 continue;
261
262 // Constant indices just change the constant offset. Add the resulting
263 // value both to our worklist for this argument, and to the set of
264 // viable paired values with future arguments.
265 PointerArgs[GEP] = ArgIdx;
266 Worklist.push_back(GEP);
943 }
944 }
945 } else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
946 Value *Cond = SI->getCondition();
947 if (ConstantInt *SimpleCond
948 = dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
949 BBWorklist.insert(SI->findCaseValue(SimpleCond).getCaseSuccessor());
267950 continue;
268951 }
269
270 // Track pointer through casts. Even when the result is not a pointer, it
271 // remains a constant relative to constants derived from other constant
272 // pointers.
273 if (CastInst *CI = dyn_cast<CastInst>(I)) {
274 PointerArgs[CI] = ArgIdx;
275 Worklist.push_back(CI);
276 continue;
277 }
278
279 // There are two instructions which produce a strict constant value when
280 // applied to two related pointer values. Ignore everything else.
281 if (!isa<ICmpInst>(I) && I->getOpcode() != Instruction::Sub)
282 continue;
283 assert(I->getNumOperands() == 2);
284
285 // Ensure that the two operands are in our set of potentially paired
286 // pointers (or are derived from them).
287 Value *OtherArg = I->getOperand(0);
288 if (OtherArg == V)
289 OtherArg = I->getOperand(1);
290 DenseMap<Value *, unsigned>::const_iterator ArgIt
291 = PointerArgs.find(OtherArg);
292 if (ArgIt == PointerArgs.end())
293 continue;
294 std::pair<unsigned, unsigned> ArgPair(ArgIt->second, ArgIdx);
295 if (ArgPair.first > ArgPair.second)
296 std::swap(ArgPair.first, ArgPair.second);
297
298 PointerArgPairWeights[ArgPair]
299 += countCodeReductionForConstant(Metrics, I);
300 }
301 } while (!Worklist.empty());
302 }
303
304 /// analyzeFunction - Fill in the current structure with information gleaned
305 /// from the specified function.
306 void InlineCostAnalyzer::FunctionInfo::analyzeFunction(Function *F,
307 const TargetData *TD) {
308 Metrics.analyzeFunction(F, TD);
309
310 // A function with exactly one return has it removed during the inlining
311 // process (see InlineFunction), so don't count it.
312 // FIXME: This knowledge should really be encoded outside of FunctionInfo.
313 if (Metrics.NumRets==1)
314 --Metrics.NumInsts;
315
316 ArgumentWeights.reserve(F->arg_size());
317 DenseMap<Value *, unsigned> PointerArgs;
318 unsigned ArgIdx = 0;
319 for (Function::arg_iterator I = F->arg_begin(), E = F->arg_end(); I != E;
320 ++I, ++ArgIdx) {
321 // Count how much code can be eliminated if one of the arguments is
322 // a constant or an alloca.
323 ArgumentWeights.push_back(ArgInfo(countCodeReductionForConstant(Metrics, I),
324 countCodeReductionForAlloca(Metrics, I)));
325
326 // If the argument is a pointer, also check for pairs of pointers where
327 // knowing a fixed offset between them allows simplification. This pattern
328 // arises mostly due to STL algorithm patterns where pointers are used as
329 // random access iterators.
330 if (!I->getType()->isPointerTy())
331 continue;
332 PointerArgs[I] = ArgIdx;
333 countCodeReductionForPointerPair(Metrics, PointerArgs, I, ArgIdx);
334 }
335 }
336
337 /// NeverInline - returns true if the function should never be inlined into
338 /// any caller
339 bool InlineCostAnalyzer::FunctionInfo::NeverInline() {
340 return (Metrics.exposesReturnsTwice || Metrics.isRecursive ||
341 Metrics.containsIndirectBr);
342 }
343
344 // ConstantFunctionBonus - Figure out how much of a bonus we can get for
345 // possibly devirtualizing a function. We'll subtract the size of the function
346 // we may wish to inline from the indirect call bonus providing a limit on
347 // growth. Cap the bonus at an upper limit of 0 - we don't want to penalize
348 // inlining just because we decided not to give a bonus for
349 // devirtualizing.
350 int InlineCostAnalyzer::ConstantFunctionBonus(CallSite CS, Constant *C) {
351
352 // This could just be NULL.
353 if (!C) return 0;
354
355 Function *F = dyn_cast<Function>(C);
356 if (!F) return 0;
357
358 int Bonus = InlineConstants::IndirectCallBonus + getInlineSize(CS, F);
359 return (Bonus > 0) ? 0 : Bonus;
360 }
361
362 // CountBonusForConstant - Figure out an approximation for how much per-call
363 // performance boost we can expect if the specified value is constant.
364 int InlineCostAnalyzer::CountBonusForConstant(Value *V, Constant *C) {
365 unsigned Bonus = 0;
366 for (Value::use_iterator UI = V->use_begin(), E = V->use_end(); UI != E;++UI){
367 User *U = *UI;
368 if (CallInst *CI = dyn_cast<CallInst>(U)) {
369 // Turning an indirect call into a direct call is a BIG win
370 if (CI->getCalledValue() == V)
371 Bonus += ConstantFunctionBonus(CallSite(CI), C);
372 } else if (InvokeInst *II = dyn_cast<InvokeInst>(U)) {
373 // Turning an indirect call into a direct call is a BIG win
374 if (II->getCalledValue() == V)
375 Bonus += ConstantFunctionBonus(CallSite(II), C);
376 }
377 // FIXME: Eliminating conditional branches and switches should
378 // also yield a per-call performance boost.
379 else {
380 // Figure out the bonuses that will accrue due to simple constant
381 // propagation.
382 Instruction &Inst = cast<Instruction>(*U);
383
384 // We can't constant propagate instructions which have effects or
385 // read memory.
386 //
387 // FIXME: It would be nice to capture the fact that a load from a
388 // pointer-to-constant-global is actually a *really* good thing to zap.
389 // Unfortunately, we don't know the pointer that may get propagated here,
390 // so we can't make this decision.
391 if (Inst.mayReadFromMemory() || Inst.mayHaveSideEffects() ||
392 isa<AllocaInst>(Inst))
393 continue;
394
395 bool AllOperandsConstant = true;
396 for (unsigned i = 0, e = Inst.getNumOperands(); i != e; ++i)
397 if (!isa(Inst.getOperand(i)) && Inst.getOperand(i) != V) {
398 AllOperandsConstant = false;
399 break;
400 }
401
402 if (AllOperandsConstant)
403 Bonus += CountBonusForConstant(&Inst);
404 }
405 }
406
407 return Bonus;
408 }
409
410 int InlineCostAnalyzer::getInlineSize(CallSite CS, Function *Callee) {
411 // Get information about the callee.
412 FunctionInfo *CalleeFI = &CachedFunctionInfo[Callee];
413
414 // If we haven't calculated this information yet, do so now.
415 if (CalleeFI->Metrics.NumBlocks == 0)
416 CalleeFI->analyzeFunction(Callee, TD);
417
418 // InlineCost - This value measures how good of an inline candidate this call
419 // site is to inline. A lower inline cost makes it more likely for the call to
420 // be inlined. This value may go negative.
421 //
422 int InlineCost = 0;
423
424 // Compute any size reductions we can expect due to arguments being passed into
425 // the function.
426 //
427 unsigned ArgNo = 0;
428 CallSite::arg_iterator I = CS.arg_begin();
429 for (Function::arg_iterator FI = Callee->arg_begin(), FE = Callee->arg_end();
430 FI != FE; ++I, ++FI, ++ArgNo) {
431
432 // If an alloca is passed in, inlining this function is likely to allow
433 // significant future optimization possibilities (like scalar promotion, and
434 // scalarization), so encourage the inlining of the function.
435 //
436 if (isa<AllocaInst>(I))
437 InlineCost -= CalleeFI->ArgumentWeights[ArgNo].AllocaWeight;
438
439 // If this is a constant being passed into the function, use the argument
440 // weights calculated for the callee to determine how much will be folded
441 // away with this information.
442 else if (isa<Constant>(I))
443 InlineCost -= CalleeFI->ArgumentWeights[ArgNo].ConstantWeight;
444 }
445
446 const DenseMap<std::pair<unsigned, unsigned>, unsigned> &ArgPairWeights
447 = CalleeFI->PointerArgPairWeights;
448 for (DenseMap<std::pair<unsigned, unsigned>, unsigned>::const_iterator I
449 = ArgPairWeights.begin(), E = ArgPairWeights.end();
450 I != E; ++I)
451 if (CS.getArgument(I->first.first)->stripInBoundsConstantOffsets() ==
452 CS.getArgument(I->first.second)->stripInBoundsConstantOffsets())
453 InlineCost -= I->second;
454
455 // Each argument passed in has a cost at both the caller and the callee
456 // sides. Measurements show that each argument costs about the same as an
457 // instruction.
458 InlineCost -= (CS.arg_size() * InlineConstants::InstrCost);
459
460 // Now that we have considered all of the factors that make the call site more
461 // likely to be inlined, look at factors that make us not want to inline it.
462
463 // Calls usually take a long time, so they make the inlining gain smaller.
464 InlineCost += CalleeFI->Metrics.NumCalls * InlineConstants::CallPenalty;
465
466 // Look at the size of the callee. Each instruction counts as 5.
467 InlineCost += CalleeFI->Metrics.NumInsts * InlineConstants::InstrCost;
468
469 return InlineCost;
470 }
471
472 int InlineCostAnalyzer::getInlineBonuses(CallSite CS, Function *Callee) {
473 // Get information about the callee.
474 FunctionInfo *CalleeFI = &CachedFunctionInfo[Callee];
475
476 // If we haven't calculated this information yet, do so now.
477 if (CalleeFI->Metrics.NumBlocks == 0)
478 CalleeFI->analyzeFunction(Callee, TD);
479
480 bool isDirectCall = CS.getCalledFunction() == Callee;
481 Instruction *TheCall = CS.getInstruction();
482 int Bonus = 0;
483
484 // If there is only one call of the function, and it has internal linkage,
485 // make it almost guaranteed to be inlined.
486 //
487 if (Callee->hasLocalLinkage() && Callee->hasOneUse() && isDirectCall)
488 Bonus += InlineConstants::LastCallToStaticBonus;
489
490 // If the instruction after the call, or the normal destination of the
491 // invoke, is an unreachable instruction, the function is noreturn. As such,
492 // there is little point in inlining this.
493 if (InvokeInst *II = dyn_cast<InvokeInst>(TheCall)) {
494 if (isa<UnreachableInst>(II->getNormalDest()->begin()))
495 Bonus += InlineConstants::NoreturnPenalty;
496 } else if (isa<UnreachableInst>(++BasicBlock::iterator(TheCall)))
497 Bonus += InlineConstants::NoreturnPenalty;
498
499 // If this function uses the coldcc calling convention, prefer not to inline
500 // it.
501 if (Callee->getCallingConv() == CallingConv::Cold)
502 Bonus += InlineConstants::ColdccPenalty;
503
504 // Add to the inline quality for properties that make the call valuable to
505 // inline. This includes factors that indicate that the result of inlining
506 // the function will be optimizable. Currently this just looks at arguments
507 // passed into the function.
508 //
509 CallSite::arg_iterator I = CS.arg_begin();
510 for (Function::arg_iterator FI = Callee->arg_begin(), FE = Callee->arg_end();
511 FI != FE; ++I, ++FI)
512 // Compute any constant bonus due to inlining we want to give here.
513 if (isa<Constant>(I))
514 Bonus += CountBonusForConstant(FI, cast<Constant>(I));
515
516 return Bonus;
517 }
518
519 // getInlineCost - The heuristic used to determine if we should inline the
520 // function call or not.
521 //
522 InlineCost InlineCostAnalyzer::getInlineCost(CallSite CS) {
523 return getInlineCost(CS, CS.getCalledFunction());
524 }
525
526 InlineCost InlineCostAnalyzer::getInlineCost(CallSite CS, Function *Callee) {
527 Instruction *TheCall = CS.getInstruction();
528 Function *Caller = TheCall->getParent()->getParent();
952 }
953
954 // If we're unable to select a particular successor, just count all of
955 // them.
956 for (unsigned TIdx = 0, TSize = TI->getNumSuccessors(); TIdx != TSize; ++TIdx)
957 BBWorklist.insert(TI->getSuccessor(TIdx));
958
959 // If we had any successors at this point, then post-inlining is likely to
960 // have them as well. Note that we assume any basic blocks which existed
961 // due to branches or switches which folded above will also fold after
962 // inlining.
963 if (SingleBB && TI->getNumSuccessors() > 1) {
964 // Take off the bonus we applied to the threshold.
965 Threshold -= SingleBBBonus;
966 SingleBB = false;
967 }
968 }
969
970 Threshold += VectorBonus;
971
972 return AlwaysInline || Cost < Threshold;
973 }
974
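The breadth-first worklist walk above is the core of the new analysis: live blocks are appended as their predecessors' terminators are resolved, and the walk stops as soon as the accumulated cost crosses the threshold. A minimal, self-contained sketch of the same iteration pattern, using plain standard containers and hypothetical successors()/costOf() callbacks rather than the types in this patch:

#include <cstddef>
#include <set>
#include <vector>

// Sketch only: breadth-first walk over the reachable nodes of a CFG-like
// graph, accumulating a cost and bailing out once it crosses the threshold.
// successors() and costOf() are hypothetical stand-ins for the terminator
// simplification and per-block analysis performed by CallAnalyzer.
template <typename Node, typename SuccFn, typename CostFn>
bool walkUnderThreshold(Node *Entry, int Threshold, SuccFn successors,
                        CostFn costOf) {
  std::vector<Node *> Worklist;
  std::set<Node *> Visited;
  Worklist.push_back(Entry);
  Visited.insert(Entry);
  int Cost = 0;
  // Index-based loop: the worklist grows as live successors are appended,
  // which yields the breadth-first order described above. Don't cache size().
  for (std::size_t Idx = 0; Idx != Worklist.size(); ++Idx) {
    if (Cost > Threshold)
      return false; // Bail out early; further analysis cannot change the answer.
    Node *N = Worklist[Idx];
    Cost += costOf(N);
    for (Node *Succ : successors(N))
      if (Visited.insert(Succ).second) // Only enqueue blocks not yet seen.
        Worklist.push_back(Succ);
  }
  return Cost <= Threshold;
}

The real analysis layers the simplification mapping on top of this pattern, so a branch or switch whose condition folds to a constant for this call site enqueues only the taken successor.
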
975 /// \brief Dump stats about this call's analysis.
976 void CallAnalyzer::dump() {
977 #define DEBUG_PRINT_STAT(x) llvm::dbgs() << " " #x ": " << x << "\n"
978 DEBUG_PRINT_STAT(NumConstantArgs);
979 DEBUG_PRINT_STAT(NumConstantOffsetPtrArgs);
980 DEBUG_PRINT_STAT(NumAllocaArgs);
981 DEBUG_PRINT_STAT(NumConstantPtrCmps);
982 DEBUG_PRINT_STAT(NumConstantPtrDiffs);
983 DEBUG_PRINT_STAT(NumInstructionsSimplified);
984 DEBUG_PRINT_STAT(SROACostSavings);
985 DEBUG_PRINT_STAT(SROACostSavingsLost);
986 #undef DEBUG_PRINT_STAT
987 }
988
989 InlineCost InlineCostAnalyzer::getInlineCost(CallSite CS, int Threshold) {
990 Function *Callee = CS.getCalledFunction();
529991
530992 // Don't inline functions which can be redefined at link-time to mean
531993 // something else. Don't inline functions marked noinline or call sites
532994 // marked noinline.
533 if (Callee->mayBeOverridden() || Callee->hasFnAttr(Attribute::NoInline) ||
534 CS.isNoInline())
995 if (!Callee || Callee->mayBeOverridden() ||
996 Callee->hasFnAttr(Attribute::NoInline) || CS.isNoInline())
535997 return llvm::InlineCost::getNever();
536998
537 // Get information about the callee.
538 FunctionInfo *CalleeFI = &CachedFunctionInfo[Callee];
539
540 // If we haven't calculated this information yet, do so now.
541 if (CalleeFI->Metrics.NumBlocks == 0)
542 CalleeFI->analyzeFunction(Callee, TD);
543
544 // If we should never inline this, return a huge cost.
545 if (CalleeFI->NeverInline())
999 DEBUG(llvm::dbgs() << " Analyzing call of " << Callee->getName() << "...\n");
1000
1001 CallAnalyzer CA(TD, *Callee, Threshold);
1002 bool ShouldInline = CA.analyzeCall(CS);
1003
1004 DEBUG(CA.dump());
1005
1006 // Check if there was a reason to force inlining or no inlining.
1007 if (!ShouldInline && CA.getCost() < CA.getThreshold())
5461008 return InlineCost::getNever();
547
548 // FIXME: It would be nice to kill off CalleeFI->NeverInline. Then we
549 // could move this up and avoid computing the FunctionInfo for
550 // things we are going to just return always inline for. This
551 // requires handling setjmp somewhere else, however.
552 if (!Callee->isDeclaration() && Callee->hasFnAttr(Attribute::AlwaysInline))
1009 if (ShouldInline && CA.getCost() >= CA.getThreshold())
5531010 return InlineCost::getAlways();
5541011
555 if (CalleeFI->Metrics.usesDynamicAlloca) {
556 // Get information about the caller.
557 FunctionInfo &CallerFI = CachedFunctionInfo[Caller];
558
559 // If we haven't calculated this information yet, do so now.
560 if (CallerFI.Metrics.NumBlocks == 0) {
561 CallerFI.analyzeFunction(Caller, TD);
562
563 // Recompute the CalleeFI pointer, getting Caller could have invalidated
564 // it.
565 CalleeFI = &CachedFunctionInfo[Callee];
566 }
567
568 // Don't inline a callee with dynamic alloca into a caller without them.
569 // Functions containing dynamic alloca's are inefficient in various ways;
570 // don't create more inefficiency.
571 if (!CallerFI.Metrics.usesDynamicAlloca)
572 return InlineCost::getNever();
573 }
574
575 // InlineCost - This value measures how good of an inline candidate this call
576 // site is to inline. A lower inline cost makes it more likely for the call to
577 // be inlined. This value may go negative due to the fact that bonuses
578 // are negative numbers.
579 //
580 int InlineCost = getInlineSize(CS, Callee) + getInlineBonuses(CS, Callee);
581 return llvm::InlineCost::get(InlineCost);
582 }
583
584 // getInlineFudgeFactor - Return a > 1.0 factor if the inliner should use a
585 // higher threshold to determine if the function call should be inlined.
586 float InlineCostAnalyzer::getInlineFudgeFactor(CallSite CS) {
587 Function *Callee = CS.getCalledFunction();
588
589 // Get information about the callee.
590 FunctionInfo &CalleeFI = CachedFunctionInfo[Callee];
591
592 // If we haven't calculated this information yet, do so now.
593 if (CalleeFI.Metrics.NumBlocks == 0)
594 CalleeFI.analyzeFunction(Callee, TD);
595
596 float Factor = 1.0f;
597 // Single BB functions are often written to be inlined.
598 if (CalleeFI.Metrics.NumBlocks == 1)
599 Factor += 0.5f;
600
601 // Be more aggressive if the function contains a good chunk (if it makes up
602 // at least 10% of the instructions) of vector instructions.
603 if (CalleeFI.Metrics.NumVectorInsts > CalleeFI.Metrics.NumInsts/2)
604 Factor += 2.0f;
605 else if (CalleeFI.Metrics.NumVectorInsts > CalleeFI.Metrics.NumInsts/10)
606 Factor += 1.5f;
607 return Factor;
1012 return llvm::InlineCost::get(CA.getCost(), CA.getThreshold());
6081013 }
6091014
6101015 /// growCachedCostInfo - update the cached cost info for Caller after Callee has
6111016 /// been inlined.
6121017 void
6131018 InlineCostAnalyzer::growCachedCostInfo(Function *Caller, Function *Callee) {
614 CodeMetrics &CallerMetrics = CachedFunctionInfo[Caller].Metrics;
615
616 // For small functions we prefer to recalculate the cost for better accuracy.
617 if (CallerMetrics.NumBlocks < 10 && CallerMetrics.NumInsts < 1000) {
618 resetCachedCostInfo(Caller);
619 return;
620 }
621
622 // For large functions, we can save a lot of computation time by skipping
623 // recalculations.
624 if (CallerMetrics.NumCalls > 0)
625 --CallerMetrics.NumCalls;
626
627 if (Callee == 0) return;
628
629 CodeMetrics &CalleeMetrics = CachedFunctionInfo[Callee].Metrics;
630
631 // If we don't have metrics for the callee, don't recalculate them just to
632 // update an approximation in the caller. Instead, just recalculate the
633 // caller info from scratch.
634 if (CalleeMetrics.NumBlocks == 0) {
635 resetCachedCostInfo(Caller);
636 return;
637 }
638
639 // Since CalleeMetrics were already calculated, we know that the CallerMetrics
640 // reference isn't invalidated: both were in the DenseMap.
641 CallerMetrics.usesDynamicAlloca |= CalleeMetrics.usesDynamicAlloca;
642
643 // FIXME: If any of these three are true for the callee, the callee was
644 // not inlined into the caller, so I think they're redundant here.
645 CallerMetrics.exposesReturnsTwice |= CalleeMetrics.exposesReturnsTwice;
646 CallerMetrics.isRecursive |= CalleeMetrics.isRecursive;
647 CallerMetrics.containsIndirectBr |= CalleeMetrics.containsIndirectBr;
648
649 CallerMetrics.NumInsts += CalleeMetrics.NumInsts;
650 CallerMetrics.NumBlocks += CalleeMetrics.NumBlocks;
651 CallerMetrics.NumCalls += CalleeMetrics.NumCalls;
652 CallerMetrics.NumVectorInsts += CalleeMetrics.NumVectorInsts;
653 CallerMetrics.NumRets += CalleeMetrics.NumRets;
654
655 // analyzeBasicBlock counts each function argument as an inst.
656 if (CallerMetrics.NumInsts >= Callee->arg_size())
657 CallerMetrics.NumInsts -= Callee->arg_size();
658 else
659 CallerMetrics.NumInsts = 0;
660
661 // We are not updating the argument weights. We have already determined that
662 // Caller is a fairly large function, so we accept the loss of precision.
6631019 }
6641020
6651021 /// clear - empty the cache of inline costs
6661022 void InlineCostAnalyzer::clear() {
667 CachedFunctionInfo.clear();
668 }
1023 }
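With the CallAnalyzer in place, getInlineCost now packages both the computed cost and the computed threshold into the returned InlineCost. A minimal sketch of how a client might interpret that result, using only operations this patch itself exercises (isAlways, isNever, operator!, getCost, getCostDelta); the semantics assumed here are drawn from the surrounding diff, not a definitive description of the final interface:

// Sketch only. Assumes getCostDelta() reports the remaining headroom below
// the threshold, as suggested by the debug output below, which prints
// getCostDelta() + getCost() as the effective threshold.
static bool wouldInline(llvm::InlineCost IC) {
  if (IC.isAlways())
    return true;   // Inlining was forced on (e.g. always-inline).
  if (IC.isNever())
    return false;  // Inlining was forced off (e.g. noinline, indirectbr).
  if (!IC)
    return false;  // The computed cost crossed the computed threshold.
  return true;     // Cost stayed below the threshold for this call site.
}
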
5858 // We still have to check the inline cost in case there are reasons to
5959 // not inline which trump the always-inline attribute such as setjmp and
6060 // indirectbr.
61 return CA.getInlineCost(CS);
62 }
63 float getInlineFudgeFactor(CallSite CS) {
64 return CA.getInlineFudgeFactor(CS);
61 return CA.getInlineCost(CS, getInlineThreshold(CS));
6562 }
6663 void resetCachedCostInfo(Function *Caller) {
6764 CA.resetCachedCostInfo(Caller);
3939 }
4040 static char ID; // Pass identification, replacement for typeid
4141 InlineCost getInlineCost(CallSite CS) {
42 return CA.getInlineCost(CS);
43 }
44 float getInlineFudgeFactor(CallSite CS) {
45 return CA.getInlineFudgeFactor(CS);
42 return CA.getInlineCost(CS, getInlineThreshold(CS));
4643 }
4744 void resetCachedCostInfo(Function *Caller) {
4845 CA.resetCachedCostInfo(Caller);
230230 return false;
231231 }
232232
233 int Cost = IC.getValue();
234233 Function *Caller = CS.getCaller();
235 int CurrentThreshold = getInlineThreshold(CS);
236 float FudgeFactor = getInlineFudgeFactor(CS);
237 int AdjThreshold = (int)(CurrentThreshold * FudgeFactor);
238 if (Cost >= AdjThreshold) {
239 DEBUG(dbgs() << " NOT Inlining: cost=" << Cost
240 << ", thres=" << AdjThreshold
234 if (!IC) {
235 DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()
236 << ", thres=" << (IC.getCostDelta() + IC.getCost())
241237 << ", Call: " << *CS.getInstruction() << "\n");
242238 return false;
243239 }
254250 // are used. Thus we will always have the opportunity to make local inlining
255251 // decisions. Importantly the linkonce-ODR linkage covers inline functions
256252 // and templates in C++.
253 //
254 // FIXME: All of this logic should be sunk into getInlineCost. It relies on
255 // the internal implementation of the inline cost metrics rather than
256 // treating them as truly abstract units etc.
257257 if (Caller->hasLocalLinkage() ||
258258 Caller->getLinkage() == GlobalValue::LinkOnceODRLinkage) {
259259 int TotalSecondaryCost = 0;
260 bool outerCallsFound = false;
260 // The candidate cost to be imposed upon the current function.
261 int CandidateCost = IC.getCost() - (InlineConstants::CallPenalty + 1);
261262 // This bool tracks what happens if we do NOT inline C into B.
262263 bool callerWillBeRemoved = Caller->hasLocalLinkage();
263264 // This bool tracks what happens if we DO inline C into B.
275276 }
276277
277278 InlineCost IC2 = getInlineCost(CS2);
278 if (IC2.isNever())
279 if (!IC2) {
279280 callerWillBeRemoved = false;
280 if (IC2.isAlways() || IC2.isNever())
281281 continue;
282
283 outerCallsFound = true;
284 int Cost2 = IC2.getValue();
285 int CurrentThreshold2 = getInlineThreshold(CS2);
286 float FudgeFactor2 = getInlineFudgeFactor(CS2);
287
288 if (Cost2 >= (int)(CurrentThreshold2 * FudgeFactor2))
289 callerWillBeRemoved = false;
290
291 // See if we have this case. We subtract off the penalty
292 // for the call instruction, which we would be deleting.
293 if (Cost2 < (int)(CurrentThreshold2 * FudgeFactor2) &&
294 Cost2 + Cost - (InlineConstants::CallPenalty + 1) >=
295 (int)(CurrentThreshold2 * FudgeFactor2)) {
282 }
283 if (IC2.isAlways())
284 continue;
285
286 // See if inlining of the original callsite would erase the cost delta of
287 // this callsite. We subtract off the penalty for the call instruction,
288 // which we would be deleting.
289 if (IC2.getCostDelta() <= CandidateCost) {
296290 inliningPreventsSomeOuterInline = true;
297 TotalSecondaryCost += Cost2;
291 TotalSecondaryCost += IC2.getCost();
298292 }
299293 }
300294 // If all outer calls to Caller would get inlined, the cost for the last
304298 if (callerWillBeRemoved && Caller->use_begin() != Caller->use_end())
305299 TotalSecondaryCost += InlineConstants::LastCallToStaticBonus;
306300
307 if (outerCallsFound && inliningPreventsSomeOuterInline &&
308 TotalSecondaryCost < Cost) {
309 DEBUG(dbgs() << " NOT Inlining: " << *CS.getInstruction() <<
310 " Cost = " << Cost <<
301 if (inliningPreventsSomeOuterInline && TotalSecondaryCost < IC.getCost()) {
302 DEBUG(dbgs() << " NOT Inlining: " << *CS.getInstruction() <<
303 " Cost = " << IC.getCost() <<
311304 ", outer Cost = " << TotalSecondaryCost << '\n');
312305 return false;
313306 }
314307 }
315308
316 DEBUG(dbgs() << " Inlining: cost=" << Cost
317 << ", thres=" << AdjThreshold
309 DEBUG(dbgs() << " Inlining: cost=" << IC.getCost()
310 << ", thres=" << (IC.getCostDelta() + IC.getCost())
318311 << ", Call: " << *CS.getInstruction() << '\n');
319312 return true;
320313 }
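To make the secondary-cost bookkeeping above concrete with purely hypothetical numbers: suppose inlining the current call site costs 60, and CandidateCost works out to 34 after subtracting the call-instruction penalty. An outer call site CS2 whose getCostDelta() is 20 (assuming the delta is the headroom between CS2's cost and its threshold, as the debug output above suggests) satisfies getCostDelta() <= CandidateCost, so inlining here would likely push CS2 over its own threshold once the caller grows; CS2's cost is therefore added to TotalSecondaryCost. If that accumulated total ends up below the original cost of 60, the inner call is not inlined, preserving the outer inlining opportunities instead.
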
0 ; RUN: opt -inline < %s -S -o - -inline-threshold=8 | FileCheck %s
1
2 target datalayout = "p:32:32"
13
24 declare void @llvm.lifetime.start(i64 %size, i8* nocapture %ptr)
35
1416 define void @inner1(i32 *%ptr) {
1517 %A = load i32* %ptr
1618 store i32 0, i32* %ptr
17 %C = getelementptr i32* %ptr, i32 0
18 %D = getelementptr i32* %ptr, i32 1
19 %C = getelementptr inbounds i32* %ptr, i32 0
20 %D = getelementptr inbounds i32* %ptr, i32 1
1921 %E = bitcast i32* %ptr to i8*
2022 %F = select i1 false, i32* %ptr, i32* @glbl
2123 call void @llvm.lifetime.start(i64 0, i8* %E)
3436 define void @inner2(i32 *%ptr) {
3537 %A = load i32* %ptr
3638 store i32 0, i32* %ptr
37 %C = getelementptr i32* %ptr, i32 0
38 %D = getelementptr i32* %ptr, i32 %A
39 %C = getelementptr inbounds i32* %ptr, i32 0
40 %D = getelementptr inbounds i32* %ptr, i32 %A
3941 %E = bitcast i32* %ptr to i8*
4042 %F = select i1 false, i32* %ptr, i32* @glbl
4143 call void @llvm.lifetime.start(i64 0, i8* %E)
9294 ; %B poisons this call; scalar-repl can't handle that instruction. However, we
9395 ; still want to detect that the icmp and branch *can* be handled.
9496 define void @inner4(i32 *%ptr, i32 %A) {
95 %B = getelementptr i32* %ptr, i32 %A
97 %B = getelementptr inbounds i32* %ptr, i32 %A
9698 %C = icmp eq i32* %ptr, null
9799 br i1 %C, label %bb.true, label %bb.false
98100 bb.true:
121123 bb.false:
122124 ret void
123125 }
126
127 define void @outer5() {
128 ; CHECK: @outer5
129 ; CHECK-NOT: call void @inner5
130 %ptr = alloca i32
131 call void @inner5(i1 false, i32* %ptr)
132 ret void
133 }
134
135 ; %D poisons this call; scalar-repl can't handle that instruction. However, if
136 ; the flag is set appropriately, the poisoning instruction is inside of dead
137 ; code, and so shouldn't be counted.
138 define void @inner5(i1 %flag, i32 *%ptr) {
139 %A = load i32* %ptr
140 store i32 0, i32* %ptr
141 %C = getelementptr inbounds i32* %ptr, i32 0
142 br i1 %flag, label %if.then, label %exit
143
144 if.then:
145 %D = getelementptr inbounds i32* %ptr, i32 %A
146 %E = bitcast i32* %ptr to i8*
147 %F = select i1 false, i32* %ptr, i32* @glbl
148 call void @llvm.lifetime.start(i64 0, i8* %E)
149 ret void
150
151 exit:
152 ret void
153 }
154
33 ; already have dynamic allocas.
44
55 ; RUN: opt < %s -inline -S | FileCheck %s
6 ;
7 ; FIXME: This test is xfailed because the inline cost rewrite disabled *all*
8 ; inlining of functions which contain a dynamic alloca. It should be re-enabled
9 ; once that functionality is restored.
10 ; XFAIL: *
611
712 declare void @ext(i32*)
813
None ; RUN: opt < %s -inline -S | FileCheck %s
0 ; RUN: opt < %s -inline -inline-threshold=20 -S | FileCheck %s
11
22 define internal i32 @callee1(i32 %A, i32 %B) {
33 %C = sdiv i32 %A, %B
1313 }
1414
1515 define i32 @caller2() {
16 ; Check that we can constant-prop through instructions after inlining callee21
17 ; to get constants in the inlined callsite to callee22.
18 ; FIXME: Currently, the threshold is fixed at 20 because we don't perform
19 ; *recursive* cost analysis to realize that the nested call site will definitely
20 ; inline and be cheap. We should eventually do that and lower the threshold here
21 ; to 1.
22 ;
1623 ; CHECK: @caller2
1724 ; CHECK-NOT: call void @callee2
1825 ; CHECK: ret
1926
20 ; We contrive to make this hard for *just* the inline pass to do in order to
21 ; simulate what can actually happen with large, complex functions getting
22 ; inlined.
23 %a = add i32 42, 0
24 %b = add i32 48, 0
25
26 %x = call i32 @callee21(i32 %a, i32 %b)
27 %x = call i32 @callee21(i32 42, i32 48)
2728 ret i32 %x
2829 }
2930
4041 br i1 %icmp, label %bb.true, label %bb.false
4142 bb.true:
4243 ; This block mustn't be counted in the inline cost.
43 %ptr = call i8* @getptr()
44 load volatile i8* %ptr
45 load volatile i8* %ptr
46 load volatile i8* %ptr
47 load volatile i8* %ptr
48 load volatile i8* %ptr
49 load volatile i8* %ptr
50 load volatile i8* %ptr
51 load volatile i8* %ptr
52 load volatile i8* %ptr
53 load volatile i8* %ptr
54 load volatile i8* %ptr
55 load volatile i8* %ptr
56 load volatile i8* %ptr
57 load volatile i8* %ptr
58 load volatile i8* %ptr
59 load volatile i8* %ptr
60 load volatile i8* %ptr
61 load volatile i8* %ptr
62 load volatile i8* %ptr
63 load volatile i8* %ptr
64 load volatile i8* %ptr
65 load volatile i8* %ptr
66 load volatile i8* %ptr
67 load volatile i8* %ptr
68 load volatile i8* %ptr
69 load volatile i8* %ptr
70 load volatile i8* %ptr
71 load volatile i8* %ptr
72 load volatile i8* %ptr
73 load volatile i8* %ptr
74 load volatile i8* %ptr
75 load volatile i8* %ptr
76 load volatile i8* %ptr
77 load volatile i8* %ptr
78 load volatile i8* %ptr
79 load volatile i8* %ptr
80 load volatile i8* %ptr
81 load volatile i8* %ptr
82 load volatile i8* %ptr
83 load volatile i8* %ptr
44 %x1 = add i32 %x, 1
45 %x2 = add i32 %x1, 1
46 %x3 = add i32 %x2, 1
47 %x4 = add i32 %x3, 1
48 %x5 = add i32 %x4, 1
49 %x6 = add i32 %x5, 1
50 %x7 = add i32 %x6, 1
51 %x8 = add i32 %x7, 1
8452
85 ret i32 %x
53 ret i32 %x8
8654 bb.false:
8755 ret i32 %x
8856 }
57
58 define i32 @caller3() {
59 ; Check that even if the expensive path is hidden behind several basic blocks,
60 ; it doesn't count toward the inline cost when constant-prop proves those paths
61 ; dead.
62 ;
63 ; CHECK: @caller3
64 ; CHECK-NOT: call
65 ; CHECK: ret i32 6
66
67 entry:
68 %x = call i32 @callee3(i32 42, i32 48)
69 ret i32 %x
70 }
71
72 define i32 @callee3(i32 %x, i32 %y) {
73 %sub = sub i32 %y, %x
74 %icmp = icmp ugt i32 %sub, 42
75 br i1 %icmp, label %bb.true, label %bb.false
76
77 bb.true:
78 %icmp2 = icmp ult i32 %sub, 64
79 br i1 %icmp2, label %bb.true.true, label %bb.true.false
80
81 bb.true.true:
82 ; This block mustn't be counted in the inline cost.
83 %x1 = add i32 %x, 1
84 %x2 = add i32 %x1, 1
85 %x3 = add i32 %x2, 1
86 %x4 = add i32 %x3, 1
87 %x5 = add i32 %x4, 1
88 %x6 = add i32 %x5, 1
89 %x7 = add i32 %x6, 1
90 %x8 = add i32 %x7, 1
91 br label %bb.merge
92
93 bb.true.false:
94 ; This block mustn't be counted in the inline cost.
95 %y1 = add i32 %y, 1
96 %y2 = add i32 %y1, 1
97 %y3 = add i32 %y2, 1
98 %y4 = add i32 %y3, 1
99 %y5 = add i32 %y4, 1
100 %y6 = add i32 %y5, 1
101 %y7 = add i32 %y6, 1
102 %y8 = add i32 %y7, 1
103 br label %bb.merge
104
105 bb.merge:
106 %result = phi i32 [ %x8, %bb.true.true ], [ %y8, %bb.true.false ]
107 ret i32 %result
108
109 bb.false:
110 ret i32 %sub
111 }
7070 call void @f2(i32 123, i8* bitcast (void (i32, i8*, i8*)* @f1 to i8*), i8* bitcast (void (i32, i8*, i8*)* @f2 to i8*)) nounwind ssp
7171 ret void
7272 }
73
74
75 ; Check that a recursive function, when called with a constant that makes the
76 ; recursive path dead code, can actually be inlined.
77 define i32 @fib(i32 %i) {
78 entry:
79 %is.zero = icmp eq i32 %i, 0
80 br i1 %is.zero, label %zero.then, label %zero.else
81
82 zero.then:
83 ret i32 0
84
85 zero.else:
86 %is.one = icmp eq i32 %i, 1
87 br i1 %is.one, label %one.then, label %one.else
88
89 one.then:
90 ret i32 1
91
92 one.else:
93 %i1 = sub i32 %i, 1
94 %f1 = call i32 @fib(i32 %i1)
95 %i2 = sub i32 %i, 2
96 %f2 = call i32 @fib(i32 %i2)
97 %f = add i32 %f1, %f2
98 ret i32 %f
99 }
100
101 define i32 @fib_caller() {
102 ; CHECK: @fib_caller
103 ; CHECK-NOT: call
104 ; CHECK: ret
105 %f1 = call i32 @fib(i32 0)
106 %f2 = call i32 @fib(i32 1)
107 %result = add i32 %f1, %f2
108 ret i32 %result
109 }
0 ; RUN: opt -inline < %s -S -o - -inline-threshold=10 | FileCheck %s
1
2 target datalayout = "p:32:32"
13
24 define i32 @outer1() {
35 ; CHECK: @outer1