llvm.org GIT mirror llvm / c122af5
[llvm-mca][docs] Improve the "How LLVM-MCA works" section.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338410 91177308-0d34-0410-b5e6-96231b3b80d8
Andrea Di Biagio, 2 years ago
1 changed file with 24 additions and 4 deletions.
The report is structured in three main sections. The first section collects a
few performance numbers; the goal of this section is to give a very quick
overview of the performance throughput. In this example, the two important
performance indicators are **IPC** and **Block RThroughput** (Block Reciprocal
Throughput).

IPC is computed by dividing the total number of simulated instructions by the
total number of cycles. A delta between Dispatch Width and IPC is an indicator
of a performance issue. In the absence of loop-carried data dependencies, the
observed IPC approaches a theoretical maximum, which can be computed by dividing
the number of instructions of a single iteration by the *Block RThroughput*.
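
The two formulas above can be sketched as follows. This is a minimal illustration, not code from llvm-mca. The helper names ``observed_ipc`` and ``theoretical_max_ipc`` are hypothetical, and the inputs (100 iterations of a 3-instruction block in 210 cycles, Block RThroughput 2.0) are made-up numbers rather than output from a real run.

```python
# Hypothetical helpers (not part of llvm-mca) that apply the two
# formulas described above to numbers read off a report.


def observed_ipc(total_instructions: int, total_cycles: int) -> float:
    """IPC = total simulated instructions / total simulated cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return total_instructions / total_cycles


def theoretical_max_ipc(instructions_per_iteration: int,
                        block_rthroughput: float) -> float:
    """Value the observed IPC approaches when there are no loop-carried
    data dependencies: instructions in one iteration / Block RThroughput."""
    if block_rthroughput <= 0:
        raise ValueError("block_rthroughput must be positive")
    return instructions_per_iteration / block_rthroughput


if __name__ == "__main__":
    # Made-up inputs: 100 iterations of a 3-instruction block simulated
    # in 210 cycles, with a Block RThroughput of 2.0.
    print(f"observed IPC:        {observed_ipc(3 * 100, 210):.2f}")
    print(f"theoretical max IPC: {theoretical_max_ipc(3, 2.0):.2f}")
```

With these assumed inputs the script prints 1.43 and 1.50. The observed value sits below the theoretical maximum because a finite run includes start-up cycles.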

IPC is bounded from above by the dispatch width, because the dispatch width
limits the maximum size of a dispatch group. IPC is also limited by the amount
of hardware parallelism: the availability of hardware resources affects the
resource pressure distribution, and it limits the number of instructions that
can be executed in parallel on every cycle. A delta between Dispatch Width and
the theoretical maximum IPC is an indicator of a performance bottleneck caused
by a lack of hardware resources. In general, the lower the Block RThroughput,
the better.
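
The two limits described above can be combined in a small sketch. It assumes that, without loop-carried dependencies, the effective ceiling on IPC is the smaller of the dispatch width and the resource-limited maximum. The function ``ipc_upper_bound`` and its labels are hypothetical and do not reproduce llvm-mca's internal computation.

```python
from typing import Tuple


def ipc_upper_bound(dispatch_width: float,
                    instructions_per_iteration: int,
                    block_rthroughput: float) -> Tuple[float, str]:
    """Return (bound, limiting_factor) for a block without loop-carried deps.

    Hypothetical helper: the bound is the smaller of the dispatch width and
    instructions_per_iteration / block_rthroughput.
    """
    if dispatch_width <= 0 or block_rthroughput <= 0:
        raise ValueError("dispatch_width and block_rthroughput must be positive")
    resource_max = instructions_per_iteration / block_rthroughput
    if resource_max < dispatch_width:
        # A gap between Dispatch Width and the theoretical maximum IPC
        # points at a lack of hardware resources.
        return resource_max, "hardware resources"
    # Otherwise the dispatch group size is the binding limit.
    return dispatch_width, "dispatch width"


if __name__ == "__main__":
    # Assumed 3-instruction block on a 2-wide machine: lowering Block
    # RThroughput raises the bound until it reaches the dispatch width.
    for rthroughput in (3.0, 2.0, 1.5):
        print(rthroughput, ipc_upper_bound(2.0, 3, rthroughput))
```

This illustrates why a lower Block RThroughput is better: it raises the ceiling, up to the point where the dispatch width becomes the limit instead.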

In this example, ``Instructions per iteration/Block RThroughput`` is 1.50. Since
there are no loop-carried dependencies, the observed IPC is expected to approach
1.50 as the number of iterations tends to infinity. The delta between the
Dispatch Width (2.00) and the theoretical maximum IPC (1.50) is an indicator of
a performance bottleneck caused by a lack of hardware resources, and the
*Resource pressure view* can help identify the problematic resource usage.
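
The claim that the observed IPC approaches 1.50 as the iteration count grows can be checked with a toy cycle model. The model is an assumption, not how llvm-mca simulates the pipeline. It charges Block RThroughput cycles per iteration plus a fixed, hypothetical start-up overhead (``OVERHEAD_CYCLES``). The values 3 instructions per iteration and Block RThroughput 2.0 are assumed only so that their ratio matches the 1.50 quoted above. The Dispatch Width of 2.00 comes from the example.

```python
# Toy cycle model (an assumption, not llvm-mca's simulation): each
# iteration costs BLOCK_RTHROUGHPUT cycles in steady state, plus a fixed
# one-off pipeline fill/drain cost of OVERHEAD_CYCLES.
INSTRUCTIONS_PER_ITERATION = 3   # assumed
BLOCK_RTHROUGHPUT = 2.0          # assumed, so that 3 / 2.0 == 1.50
DISPATCH_WIDTH = 2.0             # Dispatch Width quoted in the example
OVERHEAD_CYCLES = 10.0           # assumed


def modeled_ipc(iterations: int) -> float:
    """Observed IPC predicted by the toy model for a given iteration count."""
    total_instructions = iterations * INSTRUCTIONS_PER_ITERATION
    total_cycles = iterations * BLOCK_RTHROUGHPUT + OVERHEAD_CYCLES
    return total_instructions / total_cycles


theoretical_max = INSTRUCTIONS_PER_ITERATION / BLOCK_RTHROUGHPUT

if __name__ == "__main__":
    print(f"theoretical max IPC:   {theoretical_max:.2f}")
    print(f"gap to dispatch width: {DISPATCH_WIDTH - theoretical_max:.2f}")
    for n in (10, 100, 1000, 100000):
        # IPC climbs toward 1.50 and stays below the Dispatch Width of 2.00.
        print(f"{n:>6} iterations -> IPC {modeled_ipc(n):.3f}")
```

In this model the start-up cost is amortized over more iterations, so IPC rises monotonically toward the 1.50 limit. The remaining 0.50 gap to the dispatch width is the resource-induced bottleneck the text describes.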

The second section of the report shows the latency and reciprocal
throughput of every instruction in the sequence. That section also reports