llvm.org GIT mirror llvm / a36486e
[LV] Model masking in VPlan, introducing VPInstructions

This patch adds a new abstraction layer to VPlan and leverages it to model the planned instructions that manipulate masks (AND, OR, NOT), introduced during predication.

The new VPValue and VPUser classes model how data flows into, through and out of a VPlan, forming the vertices of a planned Def-Use graph. The new VPInstruction class is a generic single-instruction Recipe that models a planned instruction along with its opcode, operands and users. See VectorizationPlan.rst for more details.

Differential Revision: https://reviews.llvm.org/D38676

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318645 91177308-0d34-0410-b5e6-96231b3b80d8

Gil Rapaport 1 year, 11 months ago
7 changed file(s) with 712 addition(s) and 132 deletion(s).
8181 replicated VF*UF times to handle scalarized and predicated instructions.
8282 Innerloops are also modelled as SESE regions.
8383
84 Low-level Design
85 ================
84 7. Support instruction-level analysis and transformation, as part of Planning
85 Step 2.b: During vectorization instructions may need to be traversed, moved,
86 replaced by other instructions or be created. For example, vector idiom
87 detection and formation involves searching for and optimizing instruction
88 patterns.
89
90 Definitions
91 ===========
8692 The low-level design of VPlan comprises the following classes.
8793
8894 :LoopVectorizationPlanner:
138144 instructions; e.g., cloned once, replicated multiple times or widened
139145 according to selected VF.
140146
147 :VPValue:
148 The base of VPlan's def-use relations class hierarchy. When instantiated, it
149 models a constant or a live-in Value in VPlan. It has users, which are of type
150 VPUser, but no operands.
151
152 :VPUser:
153 A VPValue representing a general vertex in the def-use graph of VPlan. It has
154 operands which are of type VPValue. When instantiated, it represents a
155 live-out Instruction that exists outside VPlan. VPUser is similar in some
156 aspects to LLVM's User class.
157
158 :VPInstruction:
159 A VPInstruction is both a VPRecipe and a VPUser. It models a single
160 VPlan-level instruction to be generated if the VPlan is executed, including
161 its opcode and possibly additional characteristics. It is the basis for
162 writing instruction-level analyses and optimizations in VPlan as creating,
163 replacing or moving VPInstructions record both def-use and scheduling
164 decisions. VPInstructions also extend LLVM IR's opcodes with idiomatic
165 operations that enrich the Vectorizer's semantics.
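
The def-use relation described above can be illustrated with a minimal, self-contained sketch. This is not the code added by the patch: `SketchValue` and `SketchUser` are hypothetical stand-ins for VPValue and VPUser, kept only to show the shape of the relation. A value records its users and has no operands, while a user is itself a value that also holds operands and registers itself with each of them.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

class SketchUser;

// Hypothetical stand-in for VPValue: it knows its users but has no operands.
class SketchValue {
  std::vector<SketchUser *> Users;

public:
  SketchValue() = default;
  SketchValue(const SketchValue &) = delete;
  SketchValue &operator=(const SketchValue &) = delete;
  virtual ~SketchValue() = default;

  void addUser(SketchUser &U) { Users.push_back(&U); }
  std::size_t getNumUsers() const { return Users.size(); }
};

// Hypothetical stand-in for VPUser: a value that additionally has operands.
// Constructing a user records it in the user list of every operand.
class SketchUser : public SketchValue {
  std::vector<SketchValue *> Operands;

public:
  SketchUser(std::initializer_list<SketchValue *> Ops) {
    for (SketchValue *Op : Ops) {
      Operands.push_back(Op);
      Op->addUser(*this);
    }
  }

  std::size_t getNumOperands() const { return Operands.size(); }
  SketchValue *getOperand(std::size_t I) const { return Operands[I]; }
};
```

Under these assumptions, building `SketchUser And{&A, &B}` from two live-in values `A` and `B` gives `And` two operands and gives each of `A` and `B` one user, which is the kind of bookkeeping an instruction-level analysis would walk.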
166
141167 :VPTransformState:
142168 Stores information used for generating output IR, passed from
143169 LoopVectorizationPlanner to its selected VPlan for execution, and used to pass
144170 additional information down to VPBlocks and VPRecipes.
145171
172 The Planning Process and VPlan Roadmap
173 ======================================
174
175 Transforming the Loop Vectorizer to use VPlan follows a staged approach. First,
176 VPlan is used to record the final vectorization decisions, and to execute them:
177 the Hierarchical CFG models the planned control-flow, and Recipes capture
178 decisions taken inside basic-blocks. Next, VPlan will be used also as the basis
179 for taking these decisions, effectively turning them into a series of
180 VPlan-to-VPlan algorithms. Finally, VPlan will support the planning process
181 itself including cost-based analyses for making these decisions, to fully
182 support compositional and iterative decision making.
183
184 Some decisions are local to an instruction in the loop, such as whether to widen
185 it into a vector instruction or replicate it, keeping the generated instructions
186 in place. Other decisions, however, involve moving instructions, replacing them
187 with other instructions, and/or introducing new instructions. For example, a
188 cast may sink past a later instruction and be widened to handle first-order
189 recurrence; an interleave group of strided gathers or scatters may effectively
190 move to one place where they are replaced with shuffles and a common wide vector
191 load or store; new instructions may be introduced to compute masks, shuffle the
192 elements of vectors, and pack scalar values into vectors or vice-versa.
193
194 In order for VPlan to support making instruction-level decisions and analyses,
195 it needs to model the relevant instructions along with their def/use relations.
196 This too follows a staged approach: first, the new instructions that compute
197 masks are modeled as VPInstructions, along with their induced def/use subgraph.
198 This effectively models masks in VPlan, facilitating VPlan-based predication.
199 Next, the logic embedded within each Recipe for generating its instructions at
200 VPlan execution time, will instead take part in the planning process by modeling
201 them as VPInstructions. Finally, only logic that applies to instructions as a
202 group will remain in Recipes, such as interleave groups and potentially other
203 idiom groups having synergistic cost.
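
As a rough illustration of how masks might be modeled as planned instructions, the sketch below builds edge and block-entry masks out of NOT, AND and OR nodes, using a null pointer for the all-one mask. `MaskNode`, `MaskBuilder`, `edgeMask` and `blockInMask` are hypothetical names chosen for this example, and the caching of previously computed masks is omitted; it follows the same convention as the patch's mask helpers but is not their implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical planned mask node. A null pointer denotes the all-one mask.
struct MaskNode {
  enum Kind { Leaf, Not, And, Or };
  Kind K;
  const MaskNode *LHS;
  const MaskNode *RHS;
  MaskNode(Kind K, const MaskNode *LHS = nullptr, const MaskNode *RHS = nullptr)
      : K(K), LHS(LHS), RHS(RHS) {}
};

// Hypothetical builder that owns every node it creates.
class MaskBuilder {
  std::vector<std::unique_ptr<MaskNode>> Nodes;

  const MaskNode *make(MaskNode::Kind K, const MaskNode *L = nullptr,
                       const MaskNode *R = nullptr) {
    Nodes.push_back(std::make_unique<MaskNode>(K, L, R));
    return Nodes.back().get();
  }

public:
  const MaskNode *createLeaf() { return make(MaskNode::Leaf); }
  const MaskNode *createNot(const MaskNode *M) { return make(MaskNode::Not, M); }
  const MaskNode *createAnd(const MaskNode *A, const MaskNode *B) {
    return make(MaskNode::And, A, B);
  }
  const MaskNode *createOr(const MaskNode *A, const MaskNode *B) {
    return make(MaskNode::Or, A, B);
  }
  std::size_t size() const { return Nodes.size(); }
};

// Mask of an edge leaving a block whose entry mask is SrcMask. Cond is the
// branch condition (null for an unconditional branch) and TakenWhenTrue says
// whether the edge is the "true" successor.
const MaskNode *edgeMask(MaskBuilder &B, const MaskNode *SrcMask,
                         const MaskNode *Cond, bool TakenWhenTrue) {
  if (!Cond)
    return SrcMask; // Unconditional: the edge inherits the block mask.
  const MaskNode *Edge = TakenWhenTrue ? Cond : B.createNot(Cond);
  if (SrcMask) // Otherwise the block mask is all-one, no need to AND.
    Edge = B.createAnd(Edge, SrcMask);
  return Edge;
}

// Entry mask of a block: the OR of all incoming edge masks. If any incoming
// edge is all-one, the block mask is all-one as well.
const MaskNode *blockInMask(MaskBuilder &B,
                            const std::vector<const MaskNode *> &EdgeMasks) {
  const MaskNode *Block = nullptr;
  for (const MaskNode *E : EdgeMasks) {
    if (!E)
      return nullptr;
    Block = Block ? B.createOr(Block, E) : E;
  }
  return Block;
}
```

Note that this sketch does no simplification: joining the two edges of a diamond yields an OR of a condition and its negation instead of an all-one mask, so any cleanup of such redundancy is left to later optimization.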
204
146205 Related LLVM components
147206 -----------------------
148207 1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
150209
151210 2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
152211 Polly [7]_.
212
213 3. Loop Vectorizer: the Vectorization Plan aims to upgrade the infrastructure of
214 the Loop Vectorizer and extend it to handle outer loops [8,9]_.
153215
154216 References
155217 ----------
179241
180242 .. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
181243 European LLVM Developers' Meeting 2017.
244
245 .. [9] "Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop
246 Auto-Vectorization", Intel Vectorizer Team, LLVM Developers' Meeting 2016.
4747
4848 #include "llvm/Transforms/Vectorize/LoopVectorize.h"
4949 #include "VPlan.h"
50 #include "VPlanBuilder.h"
5051 #include "llvm/ADT/APInt.h"
5152 #include "llvm/ADT/ArrayRef.h"
5253 #include "llvm/ADT/DenseMap.h"
449450 /// new unrolled loop, where UF is the unroll factor.
450451 using VectorParts = SmallVector;
451452
452 /// A helper function that computes the predicate of the block BB, assuming
453 /// that the header block of the loop is set to True. It returns the *entry*
454 /// mask for the block BB.
455 VectorParts createBlockInMask(BasicBlock *BB);
456
457 /// A helper function that computes the predicate of the edge between SRC
458 /// and DST.
459 VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst);
460
461453 /// Vectorize a single PHINode in a block. This method handles the induction
462454 /// variable canonicalization. It supports both VF = 1 for unrolled loops and
463455 /// arbitrary length vectors.
510502 /// Try to vectorize the interleaved access group that \p Instr belongs to.
511503 void vectorizeInterleaveGroup(Instruction *Instr);
512504
513 /// Vectorize Load and Store instructions,
514 virtual void vectorizeMemoryInstruction(Instruction *Instr);
505 /// Vectorize Load and Store instructions, optionally masking the vector
506 /// operations if \p BlockInMask is non-null.
507 void vectorizeMemoryInstruction(Instruction *Instr,
508 VectorParts *BlockInMask = nullptr);
515509
516510 /// \brief Set the debug location in the builder using the debug location in
517511 /// the instruction.
528522 /// in the new unrolled loop, where UF is the unroll factor and VF is the
529523 /// vectorization factor.
530524 using ScalarParts = SmallVector, 2>;
531
532 // When we if-convert we need to create edge masks. We have to cache values
533 // so that we don't end up with exponential recursion/IR.
534 using EdgeMaskCacheTy =
535 DenseMap<std::pair<BasicBlock *, BasicBlock *>, VectorParts>;
536 using BlockMaskCacheTy = DenseMap<BasicBlock *, VectorParts>;
537525
538526 /// Set up the values of the IVs correctly when exiting the vector loop.
539527 void fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II,
737725
738726 /// Store instructions that were predicated.
739727 SmallVector PredicatedInstructions;
740
741 EdgeMaskCacheTy EdgeMaskCache;
742 BlockMaskCacheTy BlockMaskCache;
743728
744729 /// Trip count of the original loop.
745730 Value *TripCount = nullptr;
22302215 /// The profitability analysis.
22312216 LoopVectorizationCostModel &CM;
22322217
2233 SmallVector<std::unique_ptr<VPlan>, 4> VPlans;
2218 using VPlanPtr = std::unique_ptr<VPlan>;
2219
2220 SmallVector<VPlanPtr, 4> VPlans;
2221
2222 /// This class is used to enable the VPlan to invoke a method of ILV. This is
2223 /// needed until the method is refactored out of ILV and becomes reusable.
2224 struct VPCallbackILV : public VPCallback {
2225 InnerLoopVectorizer &ILV;
2226
2227 VPCallbackILV(InnerLoopVectorizer &ILV) : ILV(ILV) {}
2228
2229 Value *getOrCreateVectorValues(Value *V, unsigned Part) override {
2230 return ILV.getOrCreateVectorValue(V, Part);
2231 }
2232 };
2233
2234 /// A builder used to construct the current plan.
2235 VPBuilder Builder;
2236
2237 /// When we if-convert we need to create edge masks. We have to cache values
2238 /// so that we don't end up with exponential recursion/IR. Note that
2239 /// if-conversion currently takes place during VPlan-construction, so these
2240 /// caches are only used at that stage.
2241 using EdgeMaskCacheTy =
2242 DenseMap<std::pair<BasicBlock *, BasicBlock *>, VPValue *>;
2243 using BlockMaskCacheTy = DenseMap<BasicBlock *, VPValue *>;
2244 EdgeMaskCacheTy EdgeMaskCache;
2245 BlockMaskCacheTy BlockMaskCache;
22342246
22352247 unsigned BestVF = 0;
22362248 unsigned BestUF = 0;
22872299 void buildVPlans(unsigned MinVF, unsigned MaxVF);
22882300
22892301 private:
2302 /// A helper function that computes the predicate of the block BB, assuming
2303 /// that the header block of the loop is set to True. It returns the *entry*
2304 /// mask for the block BB.
2305 VPValue *createBlockInMask(BasicBlock *BB, VPlanPtr &Plan);
2306
2307 /// A helper function that computes the predicate of the edge between SRC
2308 /// and DST.
2309 VPValue *createEdgeMask(BasicBlock *Src, BasicBlock *Dst, VPlanPtr &Plan);
2310
22902311 /// Check if \I belongs to an Interleave Group within the given VF \p Range,
22912312 /// \return true in the first returned value if so and false otherwise.
22922313 /// Build a new VPInterleaveGroup Recipe if \I is the primary member of an IG
22982319 VPInterleaveRecipe *tryToInterleaveMemory(Instruction *I, VFRange &Range);
22992320
23002321 // Check if \I is a memory instruction to be widened for \p Range.Start and
2301 // potentially masked.
2322 // potentially masked. Such instructions are handled by a recipe that takes an
2323 // additional VPInstruction for the mask.
23022324 VPWidenMemoryInstructionRecipe *tryToWidenMemory(Instruction *I,
2303 VFRange &Range);
2325 VFRange &Range,
2326 VPlanPtr &Plan);
23042327
23052328 /// Check if an induction recipe should be constructed for \I within the given
23062329 /// VF \p Range. If so build and return it. If not, return null. \p Range.End
23122335 /// Handle non-loop phi nodes. Currently all such phi nodes are turned into
23132336 /// a sequence of select instructions as the vectorizer currently performs
23142337 /// full if-conversion.
2315 VPBlendRecipe *tryToBlend(Instruction *I);
2338 VPBlendRecipe *tryToBlend(Instruction *I, VPlanPtr &Plan);
23162339
23172340 /// Check if \p I can be widened within the given VF \p Range. If \p I can be
23182341 /// widened for \p Range.Start, check if the last recipe of \p VPBB can be
23302353 /// \p Range.Start to \p Range.End.
23312354 VPBasicBlock *handleReplication(
23322355 Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
2333 DenseMap &PredInst2Recipe);
2356 DenseMap &PredInst2Recipe,
2357 VPlanPtr &Plan);
23342358
23352359 /// Create a replicating region for instruction \p I that requires
23362360 /// predication. \p PredRecipe is a VPReplicateRecipe holding \p I.
2337 VPRegionBlock *createReplicateRegion(Instruction *I,
2338 VPRecipeBase *PredRecipe);
2361 VPRegionBlock *createReplicateRegion(Instruction *I, VPRecipeBase *PredRecipe,
2362 VPlanPtr &Plan);
23392363
23402364 /// Build a VPlan according to the information gathered by Legal. \return a
23412365 /// VPlan for vectorization factors \p Range.Start and up to \p Range.End
23422366 /// exclusive, possibly decreasing \p Range.End.
2343 std::unique_ptr<VPlan> buildVPlan(VFRange &Range);
2367 VPlanPtr buildVPlan(VFRange &Range,
2368 const SmallPtrSetImpl<Value *> &NeedDef);
23442369 };
23452370
23462371 } // end namespace llvm
30903115 }
30913116 }
30923117
3093 void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
3118 void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr,
3119 VectorParts *BlockInMask) {
30943120 // Attempt to issue a wide load.
30953121 LoadInst *LI = dyn_cast<LoadInst>(Instr);
30963122 StoreInst *SI = dyn_cast<StoreInst>(Instr);
31313157 if (ConsecutiveStride)
31323158 Ptr = getOrCreateScalarValue(Ptr, {0, 0});
31333159
3134 VectorParts Mask = createBlockInMask(Instr->getParent());
3160 VectorParts Mask;
3161 bool isMaskRequired = BlockInMask;
3162 if (isMaskRequired)
3163 Mask = *BlockInMask;
3164
31353165 // Handle Stores:
31363166 if (SI) {
31373167 assert(!Legal->isUniform(SI->getPointerOperand()) &&
31423172 Instruction *NewSI = nullptr;
31433173 Value *StoredVal = getOrCreateVectorValue(SI->getValueOperand(), Part);
31443174 if (CreateGatherScatter) {
3145 Value *MaskPart = Legal->isMaskRequired(SI) ? Mask[Part] : nullptr;
3175 Value *MaskPart = isMaskRequired ? Mask[Part] : nullptr;
31463176 Value *VectorGep = getOrCreateVectorValue(Ptr, Part);
31473177 NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
31483178 MaskPart);
31643194 Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));
31653195 PartPtr =
31663196 Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));
3167 if (Mask[Part]) // The reverse of a null all-one mask is a null mask.
3197 if (isMaskRequired) // Reverse of a null all-one mask is a null mask.
31683198 Mask[Part] = reverseVector(Mask[Part]);
31693199 }
31703200
31713201 Value *VecPtr =
31723202 Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));
31733203
3174 if (Legal->isMaskRequired(SI) && Mask[Part])
3204 if (isMaskRequired)
31753205 NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,
31763206 Mask[Part]);
31773207 else
31883218 for (unsigned Part = 0; Part < UF; ++Part) {
31893219 Value *NewLI;
31903220 if (CreateGatherScatter) {
3191 Value *MaskPart = Legal->isMaskRequired(LI) ? Mask[Part] : nullptr;
3221 Value *MaskPart = isMaskRequired ? Mask[Part] : nullptr;
31923222 Value *VectorGep = getOrCreateVectorValue(Ptr, Part);
31933223 NewLI = Builder.CreateMaskedGather(VectorGep, Alignment, MaskPart,
31943224 nullptr, "wide.masked.gather");
32033233 // wide load needs to start at the last vector element.
32043234 PartPtr = Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));
32053235 PartPtr = Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));
3206 if (Mask[Part]) // The reverse of a null all-one mask is a null mask.
3236 if (isMaskRequired) // Reverse of a null all-one mask is a null mask.
32073237 Mask[Part] = reverseVector(Mask[Part]);
32083238 }
32093239
32103240 Value *VecPtr =
32113241 Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));
3212 if (Legal->isMaskRequired(LI) && Mask[Part])
3242 if (isMaskRequired)
32133243 NewLI = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask[Part],
32143244 UndefValue::get(DataTy),
32153245 "wide.masked.load");
45114541
45124542 void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
45134543 unsigned VF) {
4544 assert(PN->getParent() == OrigLoop->getHeader() &&
4545 "Non-header phis should have been handled elsewhere");
4546
45144547 PHINode *P = cast<PHINode>(PN);
45154548 // In order to support recurrences we need to be able to vectorize Phi nodes.
45164549 // Phi nodes have cycles, so we need to vectorize them in two stages. This is
75087541 BestVF = VF;
75097542 BestUF = UF;
75107543
7511 erase_if(VPlans, [VF](const std::unique_ptr<VPlan> &Plan) {
7544 erase_if(VPlans, [VF](const VPlanPtr &Plan) {
75127545 return !Plan->hasVF(VF);
75137546 });
75147547 assert(VPlans.size() == 1 && "Best VF has not a single VPlan.");
75197552 // Perform the actual loop transformation.
75207553
75217554 // 1. Create a new empty loop. Unlink the old loop and connect the new one.
7522 VPTransformState State{
7523 BestVF, BestUF, LI, DT, ILV.Builder, ILV.VectorLoopValueMap, &ILV};
7555 VPCallbackILV CallbackILV(ILV);
7556
7557 VPTransformState State{BestVF, BestUF, LI,
7558 DT, ILV.Builder, ILV.VectorLoopValueMap,
7559 &ILV, CallbackILV};
75247560 State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();
75257561
75267562 //===------------------------------------------------===//
77337769 private:
77347770 PHINode *Phi;
77357771
7772 /// The blend operation is a User of a mask, if not null.
7773 std::unique_ptr<VPUser> User;
7774
77367775 public:
7737 VPBlendRecipe(PHINode *Phi) : VPRecipeBase(VPBlendSC), Phi(Phi) {}
7776 VPBlendRecipe(PHINode *Phi, ArrayRef<VPValue *> Masks)
7777 : VPRecipeBase(VPBlendSC), Phi(Phi) {
7778 assert((Phi->getNumIncomingValues() == 1 ||
7779 Phi->getNumIncomingValues() == Masks.size()) &&
7780 "Expected the same number of incoming values and masks");
7781 if (!Masks.empty())
7782 User.reset(new VPUser(Masks));
7783 }
77387784
77397785 /// Method to support type inquiry through isa, cast, and dyn_cast.
77407786 static inline bool classof(const VPRecipeBase *V) {
77537799
77547800 unsigned NumIncoming = Phi->getNumIncomingValues();
77557801
7802 assert((User || NumIncoming == 1) &&
7803 "Multiple predecessors with predecessors having a full mask");
77567804 // Generate a sequence of selects of the form:
77577805 // SELECT(Mask3, In3,
77587806 // SELECT(Mask2, In2,
77597807 // ( ...)))
77607808 InnerLoopVectorizer::VectorParts Entry(State.UF);
7761 for (unsigned In = 0; In < NumIncoming; In++) {
7762 InnerLoopVectorizer::VectorParts Cond =
7763 State.ILV->createEdgeMask(Phi->getIncomingBlock(In), Phi->getParent());
7764
7809 for (unsigned In = 0; In < NumIncoming; ++In) {
77657810 for (unsigned Part = 0; Part < State.UF; ++Part) {
7811 // We might have single edge PHIs (blocks) - use an identity
7812 // 'select' for the first PHI operand.
77667813 Value *In0 =
7767 State.ILV->getOrCreateVectorValue(Phi->getIncomingValue(In), Part);
7768 assert((Cond[Part] || NumIncoming == 1) &&
7769 "Multiple predecessors with one predecessor having a full mask");
7814 State.ILV->getOrCreateVectorValue(Phi->getIncomingValue(In), Part);
77707815 if (In == 0)
77717816 Entry[Part] = In0; // Initialize with the first incoming value.
7772 else
7817 else {
77737818 // Select between the current value and the previous incoming edge
77747819 // based on the incoming mask.
7775 Entry[Part] = State.Builder.CreateSelect(Cond[Part], In0, Entry[Part],
7776 "predphi");
7820 Value *Cond = State.get(User->getOperand(In), Part);
7821 Entry[Part] =
7822 State.Builder.CreateSelect(Cond, In0, Entry[Part], "predphi");
7823 }
77777824 }
77787825 }
77797826 for (unsigned Part = 0; Part < State.UF; ++Part)
77857832 O << " +\n" << Indent << "\"BLEND ";
77867833 Phi->printAsOperand(O, false);
77877834 O << " =";
7788 if (Phi->getNumIncomingValues() == 1) {
7835 if (!User) {
77897836 // Not a User of any mask: not really blending, this is a
77907837 // single-predecessor phi.
77917838 O << " ";
77927839 Phi->getIncomingValue(0)->printAsOperand(O, false);
77937840 } else {
7794 for (unsigned I = 0, E = Phi->getNumIncomingValues(); I < E; ++I) {
7841 for (unsigned I = 0, E = User->getNumOperands(); I < E; ++I) {
77957842 O << " ";
77967843 Phi->getIncomingValue(I)->printAsOperand(O, false);
77977844 O << "/";
7798 Phi->getIncomingBlock(I)->printAsOperand(O, false);
7845 User->getOperand(I)->printAsOperand(O);
77997846 }
78007847 }
78017848 O << "\\l\"";
7802
78037849 }
78047850 };
78057851
78897935 /// A recipe for generating conditional branches on the bits of a mask.
78907936 class VPBranchOnMaskRecipe : public VPRecipeBase {
78917937 private:
7892 /// The input IR basic block used to obtain the mask providing the condition
7893 /// bits for the branch.
7894 BasicBlock *MaskedBasicBlock;
7938 std::unique_ptr<VPUser> User;
78957939
78967940 public:
7897 VPBranchOnMaskRecipe(BasicBlock *BB)
7898 : VPRecipeBase(VPBranchOnMaskSC), MaskedBasicBlock(BB) {}
7941 VPBranchOnMaskRecipe(VPValue *BlockInMask) : VPRecipeBase(VPBranchOnMaskSC) {
7942 if (BlockInMask) // nullptr means all-one mask.
7943 User.reset(new VPUser({BlockInMask}));
7944 }
78997945
79007946 /// Method to support type inquiry through isa, cast, and dyn_cast.
79017947 static inline bool classof(const VPRecipeBase *V) {
79087954
79097955 /// Print the recipe.
79107956 void print(raw_ostream &O, const Twine &Indent) const override {
7911 O << " +\n"
7912 << Indent << "\"BRANCH-ON-MASK-OF " << MaskedBasicBlock->getName()
7913 << "\\l\"";
7957 O << " +\n" << Indent << "\"BRANCH-ON-MASK ";
7958 if (User)
7959 O << *User->getOperand(0);
7960 else
7961 O << " All-One";
7962 O << "\\l\"";
79147963 }
79157964 };
79167965
79477996 };
79487997
79497998 /// A Recipe for widening load/store operations.
7999 /// TODO: We currently execute only per-part unless a specific instance is
8000 /// provided.
79508001 class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
79518002 private:
79528003 Instruction &Instr;
8004 std::unique_ptr<VPUser> User;
79538005
79548006 public:
7955 VPWidenMemoryInstructionRecipe(Instruction &Instr)
7956 : VPRecipeBase(VPWidenMemoryInstructionSC), Instr(Instr) {}
8007 VPWidenMemoryInstructionRecipe(Instruction &Instr, VPValue *Mask)
8008 : VPRecipeBase(VPWidenMemoryInstructionSC), Instr(Instr) {
8009 if (Mask) // Create a VPUser to register as a user of the mask.
8010 User.reset(new VPUser({Mask}));
8011 }
79578012
79588013 /// Method to support type inquiry through isa, cast, and dyn_cast.
79598014 static inline bool classof(const VPRecipeBase *V) {
79628017
79638018 /// Generate the wide load/store.
79648019 void execute(VPTransformState &State) override {
7965 State.ILV->vectorizeMemoryInstruction(&Instr);
8020 if (!User)
8021 return State.ILV->vectorizeMemoryInstruction(&Instr);
8022
8023 // Last (and currently only) operand is a mask.
8024 InnerLoopVectorizer::VectorParts MaskValues(State.UF);
8025 VPValue *Mask = User->getOperand(User->getNumOperands() - 1);
8026 for (unsigned Part = 0; Part < State.UF; ++Part)
8027 MaskValues[Part] = State.get(Mask, Part);
8028 State.ILV->vectorizeMemoryInstruction(&Instr, &MaskValues);
79668029 }
79678030
79688031 /// Print the recipe.
79698032 void print(raw_ostream &O, const Twine &Indent) const override {
79708033 O << " +\n" << Indent << "\"WIDEN " << VPlanIngredient(&Instr);
8034 if (User) {
8035 O << ", ";
8036 User->getOperand(0)->printAsOperand(O);
8037 }
79718038 O << "\\l\"";
79728039 }
79738040 };
79938060 /// vectorization decision can potentially shorten this sub-range during
79948061 /// buildVPlan().
79958062 void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) {
8063
8064 // Collect conditions feeding internal conditional branches; they need to be
8065 // represented in VPlan for it to model masking.
8066 SmallPtrSet NeedDef;
8067
8068 auto *Latch = OrigLoop->getLoopLatch();
8069 for (BasicBlock *BB : OrigLoop->blocks()) {
8070 if (BB == Latch)
8071 continue;
8072 BranchInst *Branch = dyn_cast<BranchInst>(BB->getTerminator());
8073 if (Branch && Branch->isConditional())
8074 NeedDef.insert(Branch->getCondition());
8075 }
8076
79968077 for (unsigned VF = MinVF; VF < MaxVF + 1;) {
79978078 VFRange SubRange = {VF, MaxVF + 1};
7998 VPlans.push_back(buildVPlan(SubRange));
8079 VPlans.push_back(buildVPlan(SubRange, NeedDef));
79998080 VF = SubRange.End;
80008081 }
80018082 }
80028083
8003 InnerLoopVectorizer::VectorParts
8004 InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
8084 VPValue *LoopVectorizationPlanner::createEdgeMask(BasicBlock *Src,
8085 BasicBlock *Dst,
8086 VPlanPtr &Plan) {
80058087 assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
80068088
80078089 // Look for cached value.
80108092 if (ECEntryIt != EdgeMaskCache.end())
80118093 return ECEntryIt->second;
80128094
8013 VectorParts SrcMask = createBlockInMask(Src);
8095 VPValue *SrcMask = createBlockInMask(Src, Plan);
80148096
80158097 // The terminator has to be a branch inst!
80168098 BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
80198101 if (!BI->isConditional())
80208102 return EdgeMaskCache[Edge] = SrcMask;
80218103
8022 VectorParts EdgeMask(UF);
8023 for (unsigned Part = 0; Part < UF; ++Part) {
8024 auto *EdgeMaskPart = getOrCreateVectorValue(BI->getCondition(), Part);
8025 if (BI->getSuccessor(0) != Dst)
8026 EdgeMaskPart = Builder.CreateNot(EdgeMaskPart);
8027
8028 if (SrcMask[Part]) // Otherwise block in-mask is all-one, no need to AND.
8029 EdgeMaskPart = Builder.CreateAnd(EdgeMaskPart, SrcMask[Part]);
8030
8031 EdgeMask[Part] = EdgeMaskPart;
8032 }
8104 VPValue *EdgeMask = Plan->getVPValue(BI->getCondition());
8105 assert(EdgeMask && "No Edge Mask found for condition");
8106
8107 if (BI->getSuccessor(0) != Dst)
8108 EdgeMask = Builder.createNot(EdgeMask);
8109
8110 if (SrcMask) // Otherwise block in-mask is all-one, no need to AND.
8111 EdgeMask = Builder.createAnd(EdgeMask, SrcMask);
80338112
80348113 return EdgeMaskCache[Edge] = EdgeMask;
80358114 }
80368115
8037 InnerLoopVectorizer::VectorParts
8038 InnerLoopVectorizer::createBlockInMask(BasicBlock *BB) {
8116 VPValue *LoopVectorizationPlanner::createBlockInMask(BasicBlock *BB,
8117 VPlanPtr &Plan) {
80398118 assert(OrigLoop->contains(BB) && "Block is not a part of a loop");
80408119
80418120 // Look for cached value.
80458124
80468125 // All-one mask is modelled as no-mask following the convention for masked
80478126 // load/store/gather/scatter. Initialize BlockMask to no-mask.
8048 VectorParts BlockMask(UF);
8049 for (unsigned Part = 0; Part < UF; ++Part)
8050 BlockMask[Part] = nullptr;
8127 VPValue *BlockMask = nullptr;
80518128
80528129 // Loop incoming mask is all-one.
80538130 if (OrigLoop->getHeader() == BB)
80558132
80568133 // This is the block mask. We OR all incoming edges.
80578134 for (auto *Predecessor : predecessors(BB)) {
8058 VectorParts EdgeMask = createEdgeMask(Predecessor, BB);
8059 if (!EdgeMask[0]) // Mask of predecessor is all-one so mask of block is too.
8135 VPValue *EdgeMask = createEdgeMask(Predecessor, BB, Plan);
8136 if (!EdgeMask) // Mask of predecessor is all-one so mask of block is too.
80608137 return BlockMaskCache[BB] = EdgeMask;
80618138
8062 if (!BlockMask[0]) { // BlockMask has its initialized nullptr value.
8139 if (!BlockMask) { // BlockMask has its initialized nullptr value.
80638140 BlockMask = EdgeMask;
80648141 continue;
80658142 }
80668143
8067 for (unsigned Part = 0; Part < UF; ++Part)
8068 BlockMask[Part] = Builder.CreateOr(BlockMask[Part], EdgeMask[Part]);
8144 BlockMask = Builder.createOr(BlockMask, EdgeMask);
80698145 }
80708146
80718147 return BlockMaskCache[BB] = BlockMask;
80998175 }
81008176
81018177 VPWidenMemoryInstructionRecipe *
8102 LoopVectorizationPlanner::tryToWidenMemory(Instruction *I, VFRange &Range) {
8178 LoopVectorizationPlanner::tryToWidenMemory(Instruction *I, VFRange &Range,
8179 VPlanPtr &Plan) {
81038180 if (!isa<LoadInst>(I) && !isa<StoreInst>(I))
81048181 return nullptr;
81058182
81218198 if (!getDecisionAndClampRange(willWiden, Range))
81228199 return nullptr;
81238200
8124 return new VPWidenMemoryInstructionRecipe(*I);
8201 VPValue *Mask = nullptr;
8202 if (Legal->isMaskRequired(I))
8203 Mask = createBlockInMask(I->getParent(), Plan);
8204
8205 return new VPWidenMemoryInstructionRecipe(*I, Mask);
81258206 }
81268207
81278208 VPWidenIntOrFpInductionRecipe *
81588239 return nullptr;
81598240 }
81608241
8161 VPBlendRecipe *LoopVectorizationPlanner::tryToBlend(Instruction *I) {
8242 VPBlendRecipe *
8243 LoopVectorizationPlanner::tryToBlend(Instruction *I, VPlanPtr &Plan) {
81628244 PHINode *Phi = dyn_cast<PHINode>(I);
81638245 if (!Phi || Phi->getParent() == OrigLoop->getHeader())
81648246 return nullptr;
81658247
8166 return new VPBlendRecipe(Phi);
8248 // We know that all PHIs in non-header blocks are converted into selects, so
8249 // we don't have to worry about the insertion order and we can just use the
8250 // builder. At this point we generate the predication tree. There may be
8251 // duplications since this is a simple recursive scan, but future
8252 // optimizations will clean it up.
8253
8254 SmallVector Masks;
8255 unsigned NumIncoming = Phi->getNumIncomingValues();
8256 for (unsigned In = 0; In < NumIncoming; In++) {
8257 VPValue *EdgeMask =
8258 createEdgeMask(Phi->getIncomingBlock(In), Phi->getParent(), Plan);
8259 assert((EdgeMask || NumIncoming == 1) &&
8260 "Multiple predecessors with one having a full mask");
8261 if (EdgeMask)
8262 Masks.push_back(EdgeMask);
8263 }
8264 return new VPBlendRecipe(Phi, Masks);
81678265 }
81688266
81698267 bool LoopVectorizationPlanner::tryToWiden(Instruction *I, VPBasicBlock *VPBB,
82448342 return UseVectorIntrinsic || !NeedToScalarize;
82458343 }
82468344 if (isa<LoadInst>(I) || isa<StoreInst>(I)) {
8247 LoopVectorizationCostModel::InstWidening Decision =
8248 CM.getWideningDecision(I, VF);
8249 assert(Decision != LoopVectorizationCostModel::CM_Unknown &&
8250 "CM decision should be taken at this point.");
8251 assert(Decision != LoopVectorizationCostModel::CM_Interleave &&
8252 "Interleave memory opportunity should be caught earlier.");
8253 return Decision != LoopVectorizationCostModel::CM_Scalarize;
8345 assert(CM.getWideningDecision(I, VF) ==
8346 LoopVectorizationCostModel::CM_Scalarize &&
8347 "Memory widening decisions should have been taken care by now");
8348 return false;
82548349 }
82558350 return true;
82568351 };
82728367
82738368 VPBasicBlock *LoopVectorizationPlanner::handleReplication(
82748369 Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
8275 DenseMap &PredInst2Recipe) {
8370 DenseMap &PredInst2Recipe,
8371 VPlanPtr &Plan) {
82768372 bool IsUniform = getDecisionAndClampRange(
82778373 [&](unsigned VF) { return CM.isUniformAfterVectorization(I, VF); },
82788374 Range);
82998395 "VPBB has successors when handling predicated replication.");
83008396 // Record predicated instructions for above packing optimizations.
83018397 PredInst2Recipe[I] = Recipe;
8302 VPBlockBase *Region = VPBB->setOneSuccessor(createReplicateRegion(I, Recipe));
8398 VPBlockBase *Region =
8399 VPBB->setOneSuccessor(createReplicateRegion(I, Recipe, Plan));
83038400 return cast<VPBasicBlock>(Region->setOneSuccessor(new VPBasicBlock()));
83048401 }
83058402
83068403 VPRegionBlock *
83078404 LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,
8308 VPRecipeBase *PredRecipe) {
8405 VPRecipeBase *PredRecipe,
8406 VPlanPtr &Plan) {
83098407 // Instructions marked for predication are replicated and placed under an
83108408 // if-then construct to prevent side-effects.
8409
8410 // Generate recipes to compute the block mask for this region.
8411 VPValue *BlockInMask = createBlockInMask(Instr->getParent(), Plan);
83118412
83128413 // Build the triangular if-then region.
83138414 std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str();
83148415 assert(Instr->getParent() && "Predicated instruction not in any basic block");
8315 auto *BOMRecipe = new VPBranchOnMaskRecipe(Instr->getParent());
8416 auto *BOMRecipe = new VPBranchOnMaskRecipe(BlockInMask);
83168417 auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe);
83178418 auto *PHIRecipe =
83188419 Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);
83288429 return Region;
83298430 }
83308431
8331 std::unique_ptr<VPlan> LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
8432 LoopVectorizationPlanner::VPlanPtr
8433 LoopVectorizationPlanner::buildVPlan(VFRange &Range,
8434 const SmallPtrSetImpl<Value *> &NeedDef) {
8435 EdgeMaskCache.clear();
8436 BlockMaskCache.clear();
83328437 DenseMap &SinkAfter = Legal->getSinkAfter();
83338438 DenseMap SinkAfterInverse;
83348439
83508455 VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry");
83518456 auto Plan = llvm::make_unique<VPlan>(VPBB);
83528457
8458 // Represent values that will have defs inside VPlan.
8459 for (Value *V : NeedDef)
8460 Plan->addVPValue(V);
8461
83538462 // Scan the body of the loop in a topological order to visit each basic block
83548463 // after having visited its predecessor basic blocks.
83558464 LoopBlocksDFS DFS(OrigLoop);
83628471 auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());
83638472 VPBB->setOneSuccessor(FirstVPBBForBB);
83648473 VPBB = FirstVPBBForBB;
8474 Builder.setInsertPoint(VPBB);
83658475
83668476 std::vector Ingredients;
83678477
84208530 }
84218531
84228532 // Check if Instr is a memory operation that should be widened.
8423 if ((Recipe = tryToWidenMemory(Instr, Range))) {
8533 if ((Recipe = tryToWidenMemory(Instr, Range, Plan))) {
84248534 VPBB->appendRecipe(Recipe);
84258535 continue;
84268536 }
84308540 VPBB->appendRecipe(Recipe);
84318541 continue;
84328542 }
8433 if ((Recipe = tryToBlend(Instr))) {
8543 if ((Recipe = tryToBlend(Instr, Plan))) {
84348544 VPBB->appendRecipe(Recipe);
84358545 continue;
84368546 }
84488558 // Otherwise, if all widening options failed, Instruction is to be
84498559 // replicated. This may create a successor for VPBB.
84508560 VPBasicBlock *NextVPBB =
8451 handleReplication(Instr, Range, VPBB, PredInst2Recipe);
8561 handleReplication(Instr, Range, VPBB, PredInst2Recipe, Plan);
84528562 if (NextVPBB != VPBB) {
84538563 VPBB = NextVPBB;
84548564 VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)
85248634 unsigned Part = State.Instance->Part;
85258635 unsigned Lane = State.Instance->Lane;
85268636
8527 auto Cond = State.ILV->createBlockInMask(MaskedBasicBlock);
8528
8529 Value *ConditionBit = Cond[Part];
8530 if (!ConditionBit) // Block in mask is all-one.
8637 Value *ConditionBit = nullptr;
8638 if (!User) // Block in mask is all-one.
85318639 ConditionBit = State.Builder.getTrue();
8532 else if (ConditionBit->getType()->isVectorTy())
8533 ConditionBit = State.Builder.CreateExtractElement(
8534 ConditionBit, State.Builder.getInt32(Lane));
8640 else {
8641 VPValue *BlockInMask = User->getOperand(0);
8642 ConditionBit = State.get(BlockInMask, Part);
8643 if (ConditionBit->getType()->isVectorTy())
8644 ConditionBit = State.Builder.CreateExtractElement(
8645 ConditionBit, State.Builder.getInt32(Lane));
8646 }
85358647
85368648 // Replace the temporary unreachable terminator with a new conditional branch,
85378649 // whose two destinations will be set later when they are created.
85418653 auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit);
85428654 CondBr->setSuccessor(0, nullptr);
85438655 ReplaceInstWithInst(CurrentTerminator, CondBr);
8544
8545 DEBUG(dbgs() << "\nLV: vectorizing BranchOnMask recipe "
8546 << MaskedBasicBlock->getName());
85478656 }
85488657
85498658 void VPPredInstPHIRecipe::execute(VPTransformState &State) {
4545
4646 #define DEBUG_TYPE "vplan"
4747
48 raw_ostream &llvm::operator<<(raw_ostream &OS, const VPValue &V) {
49 if (const VPInstruction *Instr = dyn_cast<VPInstruction>(&V))
50 Instr->print(OS);
51 else
52 V.printAsOperand(OS);
53 return OS;
54 }
55
4856 /// \return the VPBasicBlock that is the entry of Block, possibly indirectly.
4957 const VPBasicBlock *VPBlockBase::getEntryBasicBlock() const {
5058 const VPBlockBase *Block = this;
211219 State->Instance.reset();
212220 }
213221
222 void VPInstruction::generateInstruction(VPTransformState &State,
223 unsigned Part) {
224 IRBuilder<> &Builder = State.Builder;
225
226 if (Instruction::isBinaryOp(getOpcode())) {
227 Value *A = State.get(getOperand(0), Part);
228 Value *B = State.get(getOperand(1), Part);
229 Value *V = Builder.CreateBinOp((Instruction::BinaryOps)getOpcode(), A, B);
230 State.set(this, V, Part);
231 return;
232 }
233
234 switch (getOpcode()) {
235 case VPInstruction::Not: {
236 Value *A = State.get(getOperand(0), Part);
237 Value *V = Builder.CreateNot(A);
238 State.set(this, V, Part);
239 break;
240 }
241 default:
242 llvm_unreachable("Unsupported opcode for instruction");
243 }
244 }
245
246 void VPInstruction::execute(VPTransformState &State) {
247 assert(!State.Instance && "VPInstruction executing an Instance");
248 for (unsigned Part = 0; Part < State.UF; ++Part)
249 generateInstruction(State, Part);
250 }
251
252 void VPInstruction::print(raw_ostream &O, const Twine &Indent) const {
253 O << " +\n" << Indent << "\"EMIT ";
254 print(O);
255 O << "\\l\"";
256 }
257
258 void VPInstruction::print(raw_ostream &O) const {
259 printAsOperand(O);
260 O << " = ";
261
262 switch (getOpcode()) {
263 case VPInstruction::Not:
264 O << "not";
265 break;
266 default:
267 O << Instruction::getOpcodeName(getOpcode());
268 }
269
270 for (const VPValue *Operand : operands()) {
271 O << " ";
272 Operand->printAsOperand(O);
273 }
274 }
275
214276 /// Generate the code inside the body of the vectorized loop. Assumes a single
215277 /// LoopVectorBody basic-block was created for this. Introduce additional
216278 /// basic-blocks as needed, and fill them all.
217279 void VPlan::execute(VPTransformState *State) {
280 // 0. Set the reverse mapping from VPValues to Values for code generation.
281 for (auto &Entry : Value2VPValue)
282 State->VPValue2Value[Entry.second] = Entry.first;
283
218284 BasicBlock *VectorPreHeaderBB = State->CFG.PrevBB;
219285 BasicBlock *VectorHeaderBB = VectorPreHeaderBB->getSingleSuccessor();
220286 assert(VectorHeaderBB && "Loop preheader does not have a single successor.");
315381 OS << "graph [labelloc=t, fontsize=30; label=\"Vectorization Plan";
316382 if (!Plan.getName().empty())
317383 OS << "\\n" << DOT::EscapeString(Plan.getName());
384 if (!Plan.Value2VPValue.empty()) {
385 OS << ", where:";
386 for (auto Entry : Plan.Value2VPValue) {
387 OS << "\\n" << *Entry.second;
388 OS << DOT::EscapeString(" := ");
389 Entry.first->printAsOperand(OS, false);
390 }
391 }
318392 OS << "\"]\n";
319393 OS << "node [shape=rect, fontname=Courier, fontsize=30]\n";
320394 OS << "edge [fontname=Courier, fontsize=30]\n";
1414 /// treated as proper graphs for generic algorithms;
1515 /// 3. Pure virtual VPRecipeBase serving as the base class for recipes contained
1616 /// within VPBasicBlocks;
17 /// 4. The VPlan class holding a candidate for vectorization;
18 /// 5. The VPlanPrinter class providing a way to print a plan in dot format.
17 /// 4. VPInstruction, a concrete Recipe and VPUser modeling a single planned
18 /// instruction;
19 /// 5. The VPlan class holding a candidate for vectorization;
20 /// 6. The VPlanPrinter class providing a way to print a plan in dot format;
1921 /// These are documented in docs/VectorizationPlan.rst.
2022 //
2123 //===----------------------------------------------------------------------===//
2325 #ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
2426 #define LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
2527
28 #include "VPlanValue.h"
2629 #include "llvm/ADT/DenseMap.h"
2730 #include "llvm/ADT/GraphTraits.h"
2831 #include "llvm/ADT/Optional.h"
3740 #include
3841 #include
3942 #include
43
44 // The (re)use of existing LoopVectorize classes is subject to future VPlan
45 // refactoring.
46 namespace {
47 class LoopVectorizationLegality;
48 class LoopVectorizationCostModel;
49 } // namespace
4050
4151 namespace llvm {
4252
8191 /// Entries from either map can be retrieved using the getVectorValue and
8292 /// getScalarValue functions, which assert that the desired value exists.
8393 struct VectorizerValueMap {
94 friend struct VPTransformState;
95
8496 private:
8597 /// The unroll factor. Each entry in the vector map contains UF vector values.
8698 unsigned UF;
194206 }
195207 };
196208
209 /// This class is used to enable the VPlan to invoke a method of ILV. This is
210 /// needed until the method is refactored out of ILV and becomes reusable.
211 struct VPCallback {
212 virtual ~VPCallback() {}
213 virtual Value *getOrCreateVectorValues(Value *V, unsigned Part) = 0;
214 };
215
197216 /// VPTransformState holds information passed down when "executing" a VPlan,
198217 /// needed for generating the output IR.
199218 struct VPTransformState {
200219 VPTransformState(unsigned VF, unsigned UF, LoopInfo *LI, DominatorTree *DT,
201220 IRBuilder<> &Builder, VectorizerValueMap &ValueMap,
202 InnerLoopVectorizer *ILV)
203 : VF(VF), UF(UF), LI(LI), DT(DT), Builder(Builder), ValueMap(ValueMap),
204 ILV(ILV) {}
221 InnerLoopVectorizer *ILV, VPCallback &Callback)
222 : VF(VF), UF(UF), Instance(), LI(LI), DT(DT), Builder(Builder),
223 ValueMap(ValueMap), ILV(ILV), Callback(Callback) {}
205224
206225 /// The chosen Vectorization and Unroll Factors of the loop being vectorized.
207226 unsigned VF;
211230 /// that all instances are to be generated, using either scalar or vector
212231 /// instructions.
213232 Optional Instance;
233
234 struct DataState {
235 /// A type for vectorized values in the new loop. Each value from the
236 /// original loop, when vectorized, is represented by UF vector values in
237 /// the new unrolled loop, where UF is the unroll factor.
238 typedef SmallVector PerPartValuesTy;
239
240 DenseMap<VPValue *, PerPartValuesTy> PerPartOutput;
241 } Data;
242
243 /// Get the generated Value for a given VPValue and a given Part. Note that
244 /// as some Defs are still created by ILV and managed in its ValueMap, this
245 /// method will delegate the call to ILV in such cases in order to provide
246 /// callers a consistent API.
247 /// \see set.
248 Value *get(VPValue *Def, unsigned Part) {
249 // If Values have been set for this Def return the one relevant for \p Part.
250 if (Data.PerPartOutput.count(Def))
251 return Data.PerPartOutput[Def][Part];
252 // Def is managed by ILV: bring the Values from ValueMap.
253 return Callback.getOrCreateVectorValues(VPValue2Value[Def], Part);
254 }
255
256 /// Set the generated Value for a given VPValue and a given Part.
257 void set(VPValue *Def, Value *V, unsigned Part) {
258 if (!Data.PerPartOutput.count(Def)) {
259 DataState::PerPartValuesTy Entry(UF);
260 Data.PerPartOutput[Def] = Entry;
261 }
262 Data.PerPartOutput[Def][Part] = V;
263 }
214264
215265 /// Hold state information used when constructing the CFG of the output IR,
216266 /// traversing the VPBasicBlocks and generating corresponding IR BasicBlocks.
246296 /// Values of the output IR.
247297 VectorizerValueMap &ValueMap;
248298
299 /// Hold a reference to a mapping between VPValues in VPlan and original
300 /// Values they correspond to.
301 VPValue2ValueTy VPValue2Value;
302
249303 /// Hold a pointer to InnerLoopVectorizer to reuse its IR generation methods.
250304 InnerLoopVectorizer *ILV;
305
306 VPCallback &Callback;
251307 };
252308
253309 /// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.
453509 using VPRecipeTy = enum {
454510 VPBlendSC,
455511 VPBranchOnMaskSC,
512 VPInstructionSC,
456513 VPInterleaveSC,
457514 VPPredInstPHISC,
458515 VPReplicateSC,
482539 virtual void print(raw_ostream &O, const Twine &Indent) const = 0;
483540 };
484541
542 /// This is a concrete Recipe that models a single VPlan-level instruction.
543 /// Like any Recipe, it may generate a sequence of IR instructions when
544 /// executed, but these instructions always form a single-def expression,
545 /// because the VPInstruction is also a single def-use vertex.
546 class VPInstruction : public VPUser, public VPRecipeBase {
547 public:
548 /// VPlan opcodes, extending LLVM IR with idiomatic instructions.
549 enum { Not = Instruction::OtherOpsEnd + 1 };
550
551 private:
552 typedef unsigned char OpcodeTy;
553 OpcodeTy Opcode;
554
555 /// Utility method serving execute(): generates a single instance of the
556 /// modeled instruction.
557 void generateInstruction(VPTransformState &State, unsigned Part);
558
559 public:
560 VPInstruction(unsigned Opcode, std::initializer_list<VPValue *> Operands)
561 : VPUser(VPValue::VPInstructionSC, Operands),
562 VPRecipeBase(VPRecipeBase::VPInstructionSC), Opcode(Opcode) {}
563
564 /// Method to support type inquiry through isa, cast, and dyn_cast.
565 static inline bool classof(const VPValue *V) {
566 return V->getVPValueID() == VPValue::VPInstructionSC;
567 }
568
569 /// Method to support type inquiry through isa, cast, and dyn_cast.
570 static inline bool classof(const VPRecipeBase *R) {
571 return R->getVPRecipeID() == VPRecipeBase::VPInstructionSC;
572 }
573
574 unsigned getOpcode() const { return Opcode; }
575
576 /// Generate the instruction.
577 /// TODO: We currently execute only per-part unless a specific instance is
578 /// provided.
579 void execute(VPTransformState &State) override;
580
581 /// Print the Recipe.
582 void print(raw_ostream &O, const Twine &Indent) const override;
583
584 /// Print the VPInstruction.
585 void print(raw_ostream &O) const;
586 };
587
485588 /// VPBasicBlock serves as the leaf of the Hierarchical Control-Flow Graph. It
486589 /// holds a sequence of zero or more VPRecipe's each representing a sequence of
487590 /// output IR instructions.
538641 return V->getVPBlockID() == VPBlockBase::VPBasicBlockSC;
539642 }
540643
541 /// Augment the existing recipes of a VPBasicBlock with an additional
542 /// \p Recipe as the last recipe.
543 void appendRecipe(VPRecipeBase *Recipe) {
644 void insert(VPRecipeBase *Recipe, iterator InsertPt) {
544645 assert(Recipe && "No recipe to append.");
545646 assert(!Recipe->Parent && "Recipe already in VPlan");
546647 Recipe->Parent = this;
547 return Recipes.push_back(Recipe);
548 }
648 Recipes.insert(InsertPt, Recipe);
649 }
650
651 /// Augment the existing recipes of a VPBasicBlock with an additional
652 /// \p Recipe as the last recipe.
653 void appendRecipe(VPRecipeBase *Recipe) { insert(Recipe, end()); }
549654
550655 /// The method which generates the output IR instructions that correspond to
551656 /// this VPBasicBlock, thereby "executing" the VPlan.
619724 /// Hierarchical-CFG of VPBasicBlocks and VPRegionBlocks rooted at an Entry
620725 /// VPBlock.
621726 class VPlan {
727 friend class VPlanPrinter;
728
622729 private:
623730 /// Hold the single entry to the Hierarchical CFG of the VPlan.
624731 VPBlockBase *Entry;
629736 /// Holds the name of the VPlan, for printing.
630737 std::string Name;
631738
739 /// Holds a mapping between Values and their corresponding VPValue inside
740 /// VPlan.
741 Value2VPValueTy Value2VPValue;
742
632743 public:
633744 VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {}
634745
635746 ~VPlan() {
636747 if (Entry)
637748 VPBlockBase::deleteCFG(Entry);
749 for (auto &MapEntry : Value2VPValue)
750 delete MapEntry.second;
638751 }
639752
640753 /// Generate the IR code for this VPlan.
652765 const std::string &getName() const { return Name; }
653766
654767 void setName(const Twine &newName) { Name = newName.str(); }
768
769 void addVPValue(Value *V) {
770 assert(V && "Trying to add a null Value to VPlan");
771 assert(!Value2VPValue.count(V) && "Value already exists in VPlan");
772 Value2VPValue[V] = new VPValue();
773 }
774
775 VPValue *getVPValue(Value *V) {
776 assert(V && "Trying to get the VPValue of a null Value");
777 assert(Value2VPValue.count(V) && "Value does not exist in VPlan");
778 return Value2VPValue[V];
779 }
655780
656781 private:
657782 /// Add to the given dominator tree the header block and every new basic block
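Review note: the `VPTransformState::get`/`set` pair added in VPlan.h keeps one generated value per unroll part for each VPValue, and falls back to the ILV callback when VPlan itself has not produced the value. The lookup order can be sketched in isolation as below. This is a simplified standalone model, not the LLVM code: `PerPartState`, the integer `Def` keys, the string "values" and the `Fallback` functor are all hypothetical stand-ins for `VPTransformState`, `VPValue *`, `Value *` and `VPCallback`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-part bookkeeping in VPTransformState.
// A "Def" is an int key and a generated value is a string, purely for
// illustration.
class PerPartState {
  unsigned UF; // unroll factor: number of parts per definition
  std::unordered_map<int, std::vector<std::string>> PerPartOutput;
  // Stand-in for VPCallback: asked when no value was set for a Def.
  std::function<std::string(int, unsigned)> Fallback;

public:
  PerPartState(unsigned UF, std::function<std::string(int, unsigned)> Fallback)
      : UF(UF), Fallback(std::move(Fallback)) {}

  // Return the value recorded for (Def, Part), or delegate to the fallback
  // when this Def has never been set.
  std::string get(int Def, unsigned Part) const {
    assert(Part < UF && "Part out of range");
    auto It = PerPartOutput.find(Def);
    if (It != PerPartOutput.end())
      return It->second[Part];
    return Fallback(Def, Part);
  }

  // Record the value generated for (Def, Part), allocating UF slots on the
  // first call for this Def.
  void set(int Def, std::string V, unsigned Part) {
    assert(Part < UF && "Part out of range");
    std::vector<std::string> &Entry = PerPartOutput[Def];
    if (Entry.empty())
      Entry.resize(UF);
    Entry[Part] = std::move(V);
  }
};
```

In this model a caller sees a single `get` API regardless of who produced the value, which appears to be the intent stated in the doc comment on `VPTransformState::get`.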
0 //===- VPlanBuilder.h - A VPlan utility for constructing VPInstructions ---===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 ///
9 /// \file
10 /// This file provides a VPlan-based builder utility analogous to IRBuilder.
11 /// It provides an instruction-level API for generating VPInstructions while
12 /// abstracting away the Recipe manipulation details.
13 //===----------------------------------------------------------------------===//
14
15 #ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H
16 #define LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H
17
18 #include "VPlan.h"
19
20 namespace llvm {
21
22 class VPBuilder {
23 private:
24 VPBasicBlock *BB = nullptr;
25 VPBasicBlock::iterator InsertPt = VPBasicBlock::iterator();
26
27 VPInstruction *createInstruction(unsigned Opcode,
28 std::initializer_list<VPValue *> Operands) {
29 VPInstruction *Instr = new VPInstruction(Opcode, Operands);
30 BB->insert(Instr, InsertPt);
31 return Instr;
32 }
33
34 public:
35 VPBuilder() {}
36
37 /// \brief This specifies that created VPInstructions should be appended to
38 /// the end of the specified block.
39 void setInsertPoint(VPBasicBlock *TheBB) {
40 assert(TheBB && "Attempting to set a null insert point");
41 BB = TheBB;
42 InsertPt = BB->end();
43 }
44
45 VPValue *createNot(VPValue *Operand) {
46 return createInstruction(VPInstruction::Not, {Operand});
47 }
48
49 VPValue *createAnd(VPValue *LHS, VPValue *RHS) {
50 return createInstruction(Instruction::BinaryOps::And, {LHS, RHS});
51 }
52
53 VPValue *createOr(VPValue *LHS, VPValue *RHS) {
54 return createInstruction(Instruction::BinaryOps::Or, {LHS, RHS});
55 }
56 };
57
58 } // namespace llvm
59
60 #endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H
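Review note: `VPBuilder` exposes only `createNot`, `createAnd` and `createOr`, which is enough to express the masks introduced by predication. The sketch below shows how such a builder might compose edge masks from a block mask and a branch condition. It is a standalone model under stated assumptions: `MaskNode`, `MaskBuilder`, `createLiveIn` and `evaluate` are hypothetical names that do not exist in the patch, masks are single booleans rather than vectors, and the specific edge-mask formula is assumed for illustration rather than taken from the hunks shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins; not the LLVM VPInstruction/VPBuilder classes.
enum class MaskOp { LiveIn, Not, And, Or };

struct MaskNode {
  MaskOp Op;
  const MaskNode *LHS;
  const MaskNode *RHS;
  bool LiveInValue; // meaningful only when Op == MaskOp::LiveIn

  MaskNode(MaskOp Op, const MaskNode *LHS, const MaskNode *RHS, bool V)
      : Op(Op), LHS(LHS), RHS(RHS), LiveInValue(V) {}
};

class MaskBuilder {
  // Owns every created node, in creation order (like appending recipes).
  std::vector<std::unique_ptr<MaskNode>> Nodes;

  const MaskNode *create(MaskOp Op, const MaskNode *L, const MaskNode *R,
                         bool V = false) {
    Nodes.push_back(std::make_unique<MaskNode>(Op, L, R, V));
    return Nodes.back().get();
  }

public:
  const MaskNode *createLiveIn(bool V) {
    return create(MaskOp::LiveIn, nullptr, nullptr, V);
  }
  const MaskNode *createNot(const MaskNode *Operand) {
    return create(MaskOp::Not, Operand, nullptr);
  }
  const MaskNode *createAnd(const MaskNode *L, const MaskNode *R) {
    return create(MaskOp::And, L, R);
  }
  const MaskNode *createOr(const MaskNode *L, const MaskNode *R) {
    return create(MaskOp::Or, L, R);
  }
  std::size_t size() const { return Nodes.size(); }
};

// Evaluate a mask expression by walking its operands.
inline bool evaluate(const MaskNode *N) {
  switch (N->Op) {
  case MaskOp::LiveIn:
    return N->LiveInValue;
  case MaskOp::Not:
    return !evaluate(N->LHS);
  case MaskOp::And:
    return evaluate(N->LHS) && evaluate(N->RHS);
  case MaskOp::Or:
    return evaluate(N->LHS) || evaluate(N->RHS);
  }
  return false; // unreachable for valid nodes
}
```

With a block mask `M` and condition `C`, the taken edge is `createAnd(M, C)`, the fall-through edge is `createAnd(M, createNot(C))`, and OR-ing the two edges at the join recovers `M`.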
0 //===- VPlanValue.h - Represent Values in Vectorizer Plan -----------------===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 ///
9 /// \file
10 /// This file contains the declarations of the entities induced by Vectorization
11 /// Plans, e.g. the instructions the VPlan intends to generate if executed.
12 /// VPlan models the following entities:
13 /// VPValue
14 /// |-- VPUser
15 /// | |-- VPInstruction
16 /// These are documented in docs/VectorizationPlan.rst.
17 ///
18 //===----------------------------------------------------------------------===//
19
20 #ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H
21 #define LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H
22
23 #include "llvm/ADT/DenseMap.h"
24 #include "llvm/ADT/SmallVector.h"
25 #include "llvm/IR/Value.h"
26 #include "llvm/Support/Debug.h"
27 #include "llvm/Support/raw_ostream.h"
28
29 namespace llvm {
30
31 // Forward declarations.
32 class VPUser;
33
34 // This is the base class of the VPlan Def/Use graph, used for modeling the data
35 // flow into, within and out of the VPlan. VPValues can stand for live-ins
36 // coming from the input IR, instructions which VPlan will generate if executed
37 // and live-outs which the VPlan will need to fix accordingly.
38 class VPValue {
39 private:
40 const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).
41
42 SmallVector Users;
43
44 protected:
45 VPValue(const unsigned char SC) : SubclassID(SC) {}
46
47 public:
48 /// An enumeration for keeping track of the concrete subclasses of VPValue
49 /// that are actually instantiated. Values of this enumeration are kept in the
50 /// SubclassID field of the VPValue objects. They are used for concrete
51 /// type identification.
52 enum { VPValueSC, VPUserSC, VPInstructionSC };
53
54 VPValue() : SubclassID(VPValueSC) {}
55 VPValue(const VPValue &) = delete;
56 VPValue &operator=(const VPValue &) = delete;
57
58 /// \return an ID for the concrete type of this object.
59 /// This is used to implement the classof checks. This should not be used
60 /// for any other purpose, as the values may change as LLVM evolves.
61 unsigned getVPValueID() const { return SubclassID; }
62
63 void printAsOperand(raw_ostream &OS) const {
64 OS << "%vp" << (unsigned short)(unsigned long long)this;
65 }
66
67 unsigned getNumUsers() const { return Users.size(); }
68 void addUser(VPUser &User) { Users.push_back(&User); }
69
70 typedef SmallVectorImpl<VPUser *>::iterator user_iterator;
71 typedef SmallVectorImpl<VPUser *>::const_iterator const_user_iterator;
72 typedef iterator_range<user_iterator> user_range;
73 typedef iterator_range<const_user_iterator> const_user_range;
74
75 user_iterator user_begin() { return Users.begin(); }
76 const_user_iterator user_begin() const { return Users.begin(); }
77 user_iterator user_end() { return Users.end(); }
78 const_user_iterator user_end() const { return Users.end(); }
79 user_range users() { return user_range(user_begin(), user_end()); }
80 const_user_range users() const {
81 return const_user_range(user_begin(), user_end());
82 }
83 };
84
85 typedef DenseMap<Value *, VPValue *> Value2VPValueTy;
86 typedef DenseMap<VPValue *, Value *> VPValue2ValueTy;
87
88 raw_ostream &operator<<(raw_ostream &OS, const VPValue &V);
89
90 /// This class augments VPValue with operands which provide the inverse def-use
91 /// edges from VPValue's users to their defs.
92 class VPUser : public VPValue {
93 private:
94 SmallVector Operands;
95
96 void addOperand(VPValue *Operand) {
97 Operands.push_back(Operand);
98 Operand->addUser(*this);
99 }
100
101 protected:
102 VPUser(const unsigned char SC) : VPValue(SC) {}
103 VPUser(const unsigned char SC, ArrayRef<VPValue *> Operands) : VPValue(SC) {
104 for (VPValue *Operand : Operands)
105 addOperand(Operand);
106 }
107
108 public:
109 VPUser() : VPValue(VPValue::VPUserSC) {}
110 VPUser(ArrayRef<VPValue *> Operands) : VPUser(VPValue::VPUserSC, Operands) {}
111 VPUser(std::initializer_list<VPValue *> Operands)
112 : VPUser(ArrayRef<VPValue *>(Operands)) {}
113 VPUser(const VPUser &) = delete;
114 VPUser &operator=(const VPUser &) = delete;
115
116 /// Method to support type inquiry through isa, cast, and dyn_cast.
117 static inline bool classof(const VPValue *V) {
118 return V->getVPValueID() >= VPUserSC &&
119 V->getVPValueID() <= VPInstructionSC;
120 }
121
122 unsigned getNumOperands() const { return Operands.size(); }
123 inline VPValue *getOperand(unsigned N) const {
124 assert(N < Operands.size() && "Operand index out of bounds");
125 return Operands[N];
126 }
127
128 typedef SmallVectorImpl<VPValue *>::iterator operand_iterator;
129 typedef SmallVectorImpl<VPValue *>::const_iterator const_operand_iterator;
130 typedef iterator_range<operand_iterator> operand_range;
131 typedef iterator_range<const_operand_iterator> const_operand_range;
132
133 operand_iterator op_begin() { return Operands.begin(); }
134 const_operand_iterator op_begin() const { return Operands.begin(); }
135 operand_iterator op_end() { return Operands.end(); }
136 const_operand_iterator op_end() const { return Operands.end(); }
137 operand_range operands() { return operand_range(op_begin(), op_end()); }
138 const_operand_range operands() const {
139 return const_operand_range(op_begin(), op_end());
140 }
141 };
142
143 } // namespace llvm
144
145 #endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H
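Review note: the key invariant in VPlanValue.h is that `VPUser::addOperand` records both directions of each edge, so a def always knows its users and a user always knows its defs. A minimal standalone sketch of that invariant follows. `SketchValue` and `SketchUser` are hypothetical simplifications of `VPValue` and `VPUser`: they use `std::vector` instead of `SmallVector` and omit the subclass IDs used for `isa`/`dyn_cast`.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

class SketchUser;

// Hypothetical simplification of VPValue: a def that tracks its users.
class SketchValue {
  std::vector<SketchUser *> Users; // def -> use edges

public:
  SketchValue() = default;
  SketchValue(const SketchValue &) = delete;
  SketchValue &operator=(const SketchValue &) = delete;

  void addUser(SketchUser &U) { Users.push_back(&U); }
  std::size_t getNumUsers() const { return Users.size(); }
  const std::vector<SketchUser *> &users() const { return Users; }
};

// Hypothetical simplification of VPUser: a def that also has operands.
class SketchUser : public SketchValue {
  std::vector<SketchValue *> Operands; // use -> def edges

public:
  SketchUser(std::initializer_list<SketchValue *> Ops) {
    for (SketchValue *Op : Ops) {
      Operands.push_back(Op);
      Op->addUser(*this); // register the reverse edge at the same time
    }
  }

  std::size_t getNumOperands() const { return Operands.size(); }
  SketchValue *getOperand(std::size_t N) const {
    assert(N < Operands.size() && "Operand index out of bounds");
    return Operands[N];
  }
};
```

Because operands are only added through the constructor in this sketch, the two edge lists cannot drift apart. The patch appears to follow the same approach by keeping `addOperand` private.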
4141 ; CHECK-NEXT: [[TMP13:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD6]],
4242 ; CHECK-NEXT: [[TMP14:%.*]] = select <4 x i1> [[TMP13]], <4 x i32> , <4 x i32>
4343 ; CHECK-NEXT: [[TMP15:%.*]] = and <4 x i1> [[TMP12]], [[TMP11]]
44 ; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP15]], <4 x i32> , <4 x i32>
4544 ; CHECK-NEXT: [[TMP16:%.*]] = xor <4 x i1> [[TMP12]],
4645 ; CHECK-NEXT: [[TMP17:%.*]] = and <4 x i1> [[TMP11]], [[TMP16]]
46 ; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP15]], <4 x i32> , <4 x i32>
4747 ; CHECK-NEXT: [[PREDPHI7:%.*]] = select <4 x i1> [[TMP17]], <4 x i32> [[TMP14]], <4 x i32> [[PREDPHI]]
4848 ; CHECK-NEXT: [[TMP18:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*
4949 ; CHECK-NEXT: store <4 x i32> [[PREDPHI7]], <4 x i32>* [[TMP18]], align 4, !alias.scope !0, !noalias !3