llvm.org GIT mirror llvm / 98be03e
[Docs] Add VectorizationPlan to docs/Proposals.

Following the request made in https://reviews.llvm.org/D32871, the general documentation of the Vectorization Plan is hereby placed under docs/Proposals.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304161 91177308-0d34-0410-b5e6-96231b3b80d8

Ayal Zaks, 2 years ago
3 changed file(s) with 196 addition(s) and 0 deletion(s).
0 ==================
1 Vectorization Plan
2 ==================
4 .. contents::
5 :local:
7 Abstract
8 ========
9 The vectorization transformation can be rather complicated, involving several
10 potential alternatives, especially for outer-loops [1]_ but also possibly for
11 innermost loops. These alternatives may have significant performance impact,
12 both positive and negative. A cost model is therefore employed to identify the
13 best alternative, including the alternative of avoiding any transformation
14 altogether.
16 The Vectorization Plan is an explicit model for describing vectorization
17 candidates. It serves both for optimizing candidates, including estimating their
18 cost reliably, and for performing their final translation into IR. This
19 facilitates dealing with multiple vectorization candidates.
21 High-level Design
22 =================
24 Vectorization Workflow
25 ----------------------
26 VPlan-based vectorization involves three major steps, taking a "scenario-based
27 approach" to vectorization planning:
29 1. Legal Step: check if a loop can be legally vectorized; encode constraints and
30 artifacts if so.
31 2. Plan Step:
33 a. Build initial VPlans following the constraints and decisions taken by
34 Legal Step 1, and compute their cost.
35 b. Apply optimizations to the VPlans, possibly forking additional VPlans.
36 Prune sub-optimal VPlans having relatively high cost.
37 3. Execute Step: materialize the best VPlan. Note that this is the only step
38 that modifies the IR.
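The three steps above can be sketched as a small, self-contained Python model. This is only an illustration of the workflow, not LLVM's implementation (which is written in C++). The names ``Loop``, ``Plan``, ``is_legal``, ``build_plans``, ``prune`` and ``vectorize``, the candidate VF/UF lists, and the cost formula are all assumptions invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Loop:
    """Toy stand-in for an input IR loop (hypothetical, not an LLVM class)."""
    trip_count: int
    has_unsafe_dependence: bool = False


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a VPlan: one (VF, UF) candidate and its estimated cost."""
    vf: int
    uf: int
    cost: float  # estimated cost per scalar iteration; lower is better


def is_legal(loop: Loop) -> bool:
    """Step 1 (Legal): decide whether the loop may be vectorized at all."""
    return not loop.has_unsafe_dependence


def build_plans(loop: Loop, vfs=(2, 4, 8), ufs=(1, 2)) -> List[Plan]:
    """Step 2.a (Plan): build initial candidates and compute their cost.

    The formula is invented for illustration: wider vectors amortize the
    work, while a fixed overhead penalizes short loops.
    """
    plans = []
    for vf in vfs:
        for uf in ufs:
            if vf * uf > loop.trip_count:
                continue  # would never run a full vector iteration
            overhead = 4.0 * vf * uf / loop.trip_count
            plans.append(Plan(vf, uf, cost=1.0 / vf + overhead))
    return plans


def prune(plans: List[Plan], scalar_cost: float = 1.0) -> Optional[Plan]:
    """Step 2.b (Plan): drop candidates no cheaper than scalar, keep the best."""
    profitable = [p for p in plans if p.cost < scalar_cost]
    return min(profitable, key=lambda p: p.cost) if profitable else None


def vectorize(loop: Loop) -> str:
    """Run all three steps; only the last one would modify the IR."""
    if not is_legal(loop):
        return "unchanged: not legal"
    best = prune(build_plans(loop))
    if best is None:
        return "unchanged: not profitable"
    # Step 3 (Execute): materialize the chosen plan.
    return f"vectorized with VF={best.vf}, UF={best.uf}"
```

In this model, ``vectorize(Loop(trip_count=4))`` returns without reaching Step 3 because every candidate costs more than the scalar loop, which mirrors the case where the best alternative is to avoid the transformation altogether.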
40 Design Guidelines
41 -----------------
42 In what follows, the term "input IR" refers to code that is fed into the
43 vectorizer whereas the term "output IR" refers to code that is generated by the
44 vectorizer. The output IR contains code that has been vectorized or "widened"
45 according to a loop Vectorization Factor (VF), and/or loop unroll-and-jammed
46 according to an Unroll Factor (UF).
47 The design of VPlan follows several high-level guidelines:
49 1. Analysis-like: building and manipulating VPlans must not modify the input IR.
50 In particular, if the best option is not to vectorize at all, the
51 vectorization process terminates before reaching Step 3, and compilation
52 should proceed as if VPlans had not been built.
54 2. Align Cost & Execute: each VPlan must support both estimating the cost and
55 generating the output IR code, such that the cost estimation evaluates the
56 to-be-generated code reliably.
58 3. Support vectorizing additional constructs:
60 a. Outer-loop vectorization. In particular, VPlan must be able to model the
61 control-flow of the output IR which may include multiple basic-blocks and
62 nested loops.
63 b. SLP vectorization.
64 c. Combinations of the above, including nested vectorization: vectorizing
65 both an inner loop and an outer-loop at the same time (each with its own
66 VF and UF), mixed vectorization: vectorizing a loop with SLP patterns
67 inside [4]_, (re)vectorizing input IR containing vector code.
68 d. Function vectorization [2]_.
70 4. Support multiple candidates efficiently. In particular, similar candidates
71 related to a range of possible VFs and UFs must be represented efficiently.
72 Potential versioning needs to be supported efficiently.
74 5. Support vectorizing idioms, such as interleaved groups of strided loads or
75 stores. This is achieved by modeling a sequence of output instructions using
76 a "Recipe", which is responsible for computing its cost and generating its
77 code.
79 6. Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization
80 such regions may need to be, for example, predicated and linearized, or
81 replicated VF*UF times to handle scalarized and predicated instructions.
82 Inner loops are also modelled as SESE regions.
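Guidelines 2 and 5 can be illustrated with a minimal Python sketch of a Recipe that is responsible for both its cost and its code. The class names ``Recipe``, ``WidenRecipe`` and ``ReplicateRecipe``, the string encoding of instructions, and the unit cost per instruction are assumptions of this sketch. Deriving the cost by counting generated instructions is a deliberate simplification to show the two staying in sync; a real cost model would not need to generate code to estimate it.

```python
from abc import ABC, abstractmethod
from typing import List


class Recipe(ABC):
    """Hypothetical recipe: models the output instructions for one ingredient."""

    def __init__(self, ingredient: str):
        self.ingredient = ingredient  # name of the input IR instruction

    @abstractmethod
    def generate(self, vf: int) -> List[str]:
        """Return the output instructions (as strings) for the given VF."""

    def cost(self, vf: int) -> int:
        """Estimate the cost from the code that would be generated.

        Each instruction costs one unit (an assumption of this sketch), so
        the estimate and the generated code cannot drift apart.
        """
        return len(self.generate(vf))


class WidenRecipe(Recipe):
    """One vector instruction replaces VF scalar instances."""

    def generate(self, vf: int) -> List[str]:
        return [f"{self.ingredient}.v{vf}"]


class ReplicateRecipe(Recipe):
    """Scalarized: one scalar copy of the ingredient per vector lane."""

    def generate(self, vf: int) -> List[str]:
        return [f"{self.ingredient}.lane{lane}" for lane in range(vf)]
```

For example, ``WidenRecipe("add").cost(4)`` is 1 while ``ReplicateRecipe("udiv").cost(4)`` is 4, so a planner comparing candidates sees the scalarization penalty grow with VF.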
84 Low-level Design
85 ================
86 The low-level design of VPlan comprises the following classes.
88 :LoopVectorizationPlanner:
89 A LoopVectorizationPlanner is designed to handle the vectorization of a loop
90 or a loop nest. It can construct, optimize and discard one or more VPlans,
91 each VPlan modelling a distinct way to vectorize the loop or the loop nest.
92 Once the best VPlan is determined, including the best VF and UF, this VPlan
93 drives the generation of output IR.
95 :VPlan:
96 A model of a vectorized candidate for a given input IR loop or loop nest. This
97 candidate is represented using a Hierarchical CFG. VPlan supports estimating
98 the cost and driving the generation of the output IR code it represents.
100 :Hierarchical CFG:
101 A control-flow graph whose nodes are basic-blocks or Hierarchical CFGs. The
102 Hierarchical CFG data structure is similar to the Tile Tree [5]_, where
103 cross-Tile edges are lifted to connect Tiles instead of the original
104 basic-blocks as in Sharir [6]_, promoting the Tile encapsulation. The terms
105 Region and Block are used rather than Tile [5]_ to avoid confusion with loop
106 tiling.
108 :VPBlockBase:
109 The building block of the Hierarchical CFG. A pure-virtual base-class of
110 VPBasicBlock and VPRegionBlock, see below. VPBlockBase models the hierarchical
111 control-flow relations with other VPBlocks. Note that in contrast to the IR
112 BasicBlock, a VPBlockBase models its control-flow successors and predecessors
113 directly, rather than through a Terminator branch or through predecessor
114 branches that "use" the VPBlockBase.
116 :VPBasicBlock:
117 VPBasicBlock is a subclass of VPBlockBase, and serves as the leaves of the
118 Hierarchical CFG. It represents a sequence of output IR instructions that will
119 appear consecutively in an output IR basic-block. The instructions of this
120 basic-block originate from one or more VPBasicBlocks. VPBasicBlock holds a
121 sequence of zero or more VPRecipes that model the cost and generation of the
122 output IR instructions.
124 :VPRegionBlock:
125 VPRegionBlock is a subclass of VPBlockBase. It models a collection of
126 VPBasicBlocks and VPRegionBlocks which form a SESE subgraph of the output IR
127 CFG. A VPRegionBlock may indicate that its contents are to be replicated a
128 constant number of times when output IR is generated, effectively representing
129 a loop with constant trip-count that will be completely unrolled. This is used
130 to support scalarized and predicated instructions with a single model for
131 multiple candidate VFs and UFs.
133 :VPRecipeBase:
134 A pure-virtual base class modeling a sequence of one or more output IR
135 instructions, possibly based on one or more input IR instructions. These
136 input IR instructions are referred to as "Ingredients" of the Recipe. A Recipe
137 may specify how its ingredients are to be transformed to produce the output IR
138 instructions; e.g., cloned once, replicated multiple times or widened
139 according to selected VF.
141 :VPTransformState:
142 Stores information used for generating output IR, passed from
143 LoopVectorizationPlanner to its selected VPlan for execution, and used to pass
144 additional information down to VPBlocks and VPRecipes.
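The relationship between the block classes above can be sketched in a few lines of Python. This is a conceptual model only: the class names are borrowed from the text, but the constructors, the ``connect`` helper, the ``execute`` signature, the use of plain strings in place of VPRecipes, and the ``#i`` suffix marking replicas are assumptions of this sketch rather than LLVM's C++ interfaces.

```python
from typing import List


class VPBlockBase:
    """Node of the hierarchical CFG; stores its CFG edges directly."""

    def __init__(self, name: str):
        self.name = name
        self.successors: List["VPBlockBase"] = []
        self.predecessors: List["VPBlockBase"] = []

    def execute(self, out: List[str]) -> None:
        raise NotImplementedError  # provided by the two subclasses


def connect(src: VPBlockBase, dst: VPBlockBase) -> None:
    """Record an edge on both endpoints; no terminator instruction is used."""
    src.successors.append(dst)
    dst.predecessors.append(src)


class VPBasicBlock(VPBlockBase):
    """Leaf block: a sequence of recipes, modeled here as plain strings."""

    def __init__(self, name: str, recipes: List[str]):
        super().__init__(name)
        self.recipes = recipes

    def execute(self, out: List[str]) -> None:
        out.extend(self.recipes)


class VPRegionBlock(VPBlockBase):
    """SESE subgraph, optionally replicated a constant number of times."""

    def __init__(self, name: str, entry: VPBlockBase, replicas: int = 1):
        super().__init__(name)
        self.entry = entry
        self.replicas = replicas

    def execute(self, out: List[str]) -> None:
        for i in range(self.replicas):
            inner: List[str] = []
            block = self.entry
            while block is not None:  # this sketch walks straight-line chains only
                block.execute(inner)
                block = block.successors[0] if block.successors else None
            # Tag each emitted instruction with its replica index.
            out.extend(f"{inst}#{i}" for inst in inner)
```

Executing a region built from two chained blocks holding ``udiv`` and ``phi`` with ``replicas=2`` emits ``udiv#0, phi#0, udiv#1, phi#1``, which corresponds to the fully unrolled constant-trip-count loop described under VPRegionBlock.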
146 Related LLVM components
147 -----------------------
148 1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
149 tree, where TSLP [3]_ adds Plan Step 2.b.
151 2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
152 Polly [7]_.
154 References
155 ----------
156 .. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit
157 Nuzman and Ayal Zaks, PACT 2008.
159 .. [2] "Proposal for function vectorization and loop vectorization with function
160 calls", Xinmin Tian, cfe-dev mailing list,
162 March 2, 2016.
163 See also the associated review.
165 .. [3] "Throttling Automatic Vectorization: When Less is More", Vasileios
166 Porpodas and Tim Jones, PACT 2015 and LLVM Developers' Meeting 2015.
168 .. [4] "Exploiting mixed SIMD parallelism by reducing data reorganization
169 overhead", Hao Zhou and Jingling Xue, CGO 2016.
171 .. [5] "Register Allocation via Hierarchical Graph Coloring", David Callahan and
172 Brian Koblenz, PLDI 1991.
174 .. [6] "Structural analysis: A new approach to flow analysis in optimizing
175 compilers", M. Sharir, Journal of Computer Languages, Jan. 1980.
177 .. [7] "Enabling Polyhedral Optimizations in LLVM", Tobias Grosser, Diploma
178 thesis, 2011.
180 .. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
181 European LLVM Developers' Meeting 2017.
382 382 .. image:: linpack-pc.png
384 Ongoing Development Directions
385 ------------------------------
387 .. toctree::
388 :hidden:
390 Proposals/VectorizationPlan
392 :doc:`Proposals/VectorizationPlan`
393 Modeling the process and upgrading the infrastructure of LLVM's Loop Vectorizer.
384 395 .. _slp-vectorizer:
386 397 The SLP Vectorizer
528 528 CodeOfConduct
529 529 Proposals/GitHubMove
530 Proposals/VectorizationPlan
531 532 :doc:`CodeOfConduct`
532 533 Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,
535 536 :doc:`Proposals/GitHubMove`
536 537 Proposal to move from SVN/Git to GitHub.
539 :doc:`Proposals/VectorizationPlan`
540 Proposal to model the process and upgrade the infrastructure of LLVM's Loop Vectorizer.
539 542 Indices and tables
540 543 ==================