llvm.org GIT mirror llvm / 29b29cc
[llvm-mca] LLVM Machine Code Analyzer.

llvm-mca is an LLVM-based performance analysis tool that can be used to statically measure the performance of code, and to help triage potential problems with target scheduling models. llvm-mca uses information which is already available in LLVM (e.g. scheduling models) to statically measure the performance of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource consumption. The tool currently works for processors with an out-of-order backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code when run on the target, but also to help with diagnosing potential performance issues. Given an assembly code sequence, llvm-mca estimates the IPC (instructions per cycle), as well as hardware resource pressure. The analysis and reporting style were mostly inspired by the IACA tool from Intel.

This patch is related to the RFC on llvm-dev visible at this link: http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html

Differential Revision: https://reviews.llvm.org/D43951

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326998 91177308-0d34-0410-b5e6-96231b3b80d8

Andrea Di Biagio, 1 year, 6 months ago
46 changed file(s) with 6334 addition(s) and 0 deletion(s).
0 llvm-mca - LLVM Machine Code Analyzer
1 =====================================
2
3 SYNOPSIS
4 --------
5
6 :program:`llvm-mca` [*options*] [input]
7
8 DESCRIPTION
9 -----------
10
11 :program:`llvm-mca` is a performance analysis tool that uses information
12 available in LLVM (e.g. scheduling models) to statically measure the performance
13 of machine code in a specific CPU.
14
15 Performance is measured in terms of throughput as well as processor resource
16 consumption. The tool currently works for processors with an out-of-order
17 backend, for which there is a scheduling model available in LLVM.
18
19 The main goal of this tool is not just to predict the performance of the code
20 when run on the target, but also to help with diagnosing potential performance
21 issues.
22
23 Given an assembly code sequence, llvm-mca estimates the IPC (Instructions Per
24 Cycle), as well as hardware resource pressure. The analysis and reporting style
25 were inspired by the IACA tool from Intel.
26
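The IPC estimate in the summary view is plain arithmetic over the simulation totals. A minimal sketch (illustrative only, not llvm-mca source; the function name is hypothetical):

```cpp
// Illustrative sketch, not llvm-mca code: the reported IPC is simply the
// total number of simulated instructions divided by the total cycle count.
double computeIPC(unsigned Instructions, unsigned Cycles) {
  return static_cast<double>(Instructions) / Cycles;
}
```

For example, one of the bundled btver2 tests runs 900 instructions in 610 cycles, which the report rounds to an IPC of 1.48.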
27 OPTIONS
28 -------
29
30 If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
31 input. Otherwise, it will read from the specified filename.
32
33 If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
34 to standard output if the input is from standard input. If the :option:`-o`
35 option specifies "``-``", then the output will also be sent to standard output.
36
37
38 .. option:: -help
39
40 Print a summary of command line options.
41
42 .. option:: -mtriple=
43
44 Specify a target triple string.
45
46 .. option:: -march=
47
48 Specify the architecture for which to analyze the code. It defaults to the
49 host default target.
50
51 .. option:: -mcpu=
52
53  Specify the processor for which to run the analysis.
54  This defaults to a "generic" processor. The processor is not autodetected
55  from the host architecture.
56
57 .. option:: -output-asm-variant=
58
59 Specify the output assembly variant for the report generated by the tool.
60  On x86, possible values are [0, 1]. A value of 0 (resp. 1) for this flag enables
61  the AT&T (resp. Intel) assembly format for the code printed out by the tool in
62  the analysis report.
63
64 .. option:: -dispatch=
65
66 Specify a different dispatch width for the processor. The dispatch width
67 defaults to the 'IssueWidth' specified by the processor scheduling model.
68 If width is zero, then the default dispatch width is used.
69
70 .. option:: -max-retire-per-cycle=
71
72 Specify the retire throughput (i.e. how many instructions can be retired by the
73 retire control unit every cycle).
74
75 .. option:: -register-file-size=
76
77 Specify the size of the register file. When specified, this flag limits
78 how many temporary registers are available for register renaming purposes. By
79 default, the number of temporary registers is unlimited. A value of zero for
80 this flag means "unlimited number of temporary registers".
81
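The flag above models the pool of temporaries used for register renaming: once the pool is exhausted, dispatch must stall until a mapping is freed. A hedged sketch of that behavior (illustrative only; `RenamePool` and its members are hypothetical names, not llvm-mca source):

```cpp
// Illustrative sketch, not llvm-mca code: a rename stage stalls once the
// pool of temporary (physical) registers is exhausted. A size of zero
// models "unlimited", matching the flag's documented semantics.
struct RenamePool {
  unsigned Size;      // 0 => unlimited temporaries
  unsigned Used = 0;  // temporaries currently allocated

  bool tryAllocate() {
    if (Size != 0 && Used == Size)
      return false;   // no free temporary: dispatch must stall
    ++Used;
    return true;
  }
  void release() { --Used; }  // called when a mapping is invalidated
};
```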
82 .. option:: -iterations=
83
84 Specify the number of iterations to run. If this flag is set to 0, then the
85 tool sets the number of iterations to a default value (i.e. 70).
86
87 .. option:: -noalias=
88
89 If set, the tool assumes that loads and stores don't alias. This is the
90 default behavior.
91
92 .. option:: -lqueue=
93
94 Specify the size of the load queue in the load/store unit emulated by the tool.
95  By default, the tool assumes an unbounded number of entries in the load queue.
96 A value of zero for this flag is ignored, and the default load queue size is
97 used instead.
98
99 .. option:: -squeue=
100
101 Specify the size of the store queue in the load/store unit emulated by the
102  tool. By default, the tool assumes an unbounded number of entries in the store
103 queue. A value of zero for this flag is ignored, and the default store queue
104 size is used instead.
105
106 .. option:: -verbose
107
108 Enable verbose output. In particular, this flag enables a number of extra
109 statistics and performance counters for the dispatch logic, the reorder
110 buffer, the retire control unit and the register file.
111
112 .. option:: -timeline
113
114 Enable the timeline view.
115
116 .. option:: -timeline-max-iterations=
117
118 Limit the number of iterations to print in the timeline view. By default, the
119 timeline view prints information for up to 10 iterations.
120
121 .. option:: -timeline-max-cycles=
122
123 Limit the number of cycles in the timeline view. By default, the number of
124 cycles is set to 80.
125
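A hedged sketch of how these two limits bound the printed view, using the defaults documented above (illustrative only; the helper functions are hypothetical, not llvm-mca source):

```cpp
#include <algorithm>

// Illustrative sketch, not llvm-mca code: the timeline view prints at most
// -timeline-max-iterations iterations (default 10) and at most
// -timeline-max-cycles cycles (default 80).
unsigned printedIterations(unsigned RequestedIterations,
                           unsigned MaxIterations = 10) {
  return std::min(RequestedIterations, MaxIterations);
}

unsigned printedCycles(unsigned TotalCycles, unsigned MaxCycles = 80) {
  return std::min(TotalCycles, MaxCycles);
}
```

For instance, a run with `-iterations=300` and the default limits prints timeline rows for only the first 10 iterations.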
126
127 EXIT STATUS
128 -----------
129
130 :program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
131 to standard error, and the tool returns 1.
132
0 if not 'ARM' in config.root.targets:
1 config.unsupported = True
0 # RUN: llvm-mca -mtriple=arm-eabi -mcpu=cortex-a9 -iterations=100 < %s | FileCheck %s
1
2 vadd.f32 s0, s2, s2
3
4 # CHECK: Iterations: 100
5 # CHECK-NEXT: Instructions: 100
6 # CHECK-NEXT: Total Cycles: 105
7 # CHECK-NEXT: Dispatch Width: 2
8 # CHECK-NEXT: IPC: 0.95
9
10 # CHECK: Resources:
11 # CHECK-NEXT: [0] - A9UnitAGU
12 # CHECK-NEXT: [1.0] - A9UnitALU
13 # CHECK-NEXT: [1.1] - A9UnitALU
14 # CHECK-NEXT: [2] - A9UnitB
15 # CHECK-NEXT: [3] - A9UnitFP
16 # CHECK-NEXT: [4] - A9UnitLS
17 # CHECK-NEXT: [5] - A9UnitMul
18
19 # CHECK: Resource pressure per iteration:
20 # CHECK-NEXT: [0] [1.0] [1.1] [2] [3] [4] [5]
21 # CHECK-NEXT: 1.00 - - - 1.00 - -
22
23 # CHECK: Resource pressure by instruction:
24 # CHECK-NEXT: [0] [1.0] [1.1] [2] [3] [4] [5] Instructions:
25 # CHECK-NEXT: 1.00 - - - 1.00 - - vadd.f32 s0, s2, s2
26
27 # CHECK: Instruction Info:
28 # CHECK-NEXT: [1]: #uOps
29 # CHECK-NEXT: [2]: Latency
30 # CHECK-NEXT: [3]: RThroughput
31 # CHECK-NEXT: [4]: MayLoad
32 # CHECK-NEXT: [5]: MayStore
33 # CHECK-NEXT: [6]: HasSideEffects
34
35 # CHECK: [1] [2] [3] [4] [5] [6] Instructions:
36 # CHECK-NEXT: 1 4 1.00 vadd.f32 s0, s2, s2
0 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=300 -timeline -timeline-max-iterations=3 < %s | FileCheck %s
1
2 vmulps %xmm0, %xmm1, %xmm2
3 vhaddps %xmm2, %xmm2, %xmm3
4 vhaddps %xmm3, %xmm3, %xmm4
5
6 # CHECK: Iterations: 300
7 # CHECK-NEXT: Instructions: 900
8 # CHECK-NEXT: Total Cycles: 610
9 # CHECK-NEXT: Dispatch Width: 2
10 # CHECK-NEXT: IPC: 1.48
11
12 # CHECK: Resources:
13 # CHECK-NEXT: [0] - JALU0
14 # CHECK-NEXT: [1] - JALU1
15 # CHECK-NEXT: [2] - JDiv
16 # CHECK-NEXT: [3] - JFPA
17 # CHECK-NEXT: [4] - JFPM
18 # CHECK-NEXT: [5] - JFPU0
19 # CHECK-NEXT: [6] - JFPU1
20 # CHECK-NEXT: [7] - JLAGU
21 # CHECK-NEXT: [8] - JMul
22 # CHECK-NEXT: [9] - JSAGU
23 # CHECK-NEXT: [10] - JSTC
24 # CHECK-NEXT: [11] - JVALU0
25 # CHECK-NEXT: [12] - JVALU1
26 # CHECK-NEXT: [13] - JVIMUL
27
28 # CHECK: Resource pressure per iteration:
29 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
30 # CHECK-NEXT: - - - - - 2.00 1.00 - - - - - - -
31
32 # CHECK: Resource pressure by instruction:
33 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
34 # CHECK-NEXT: - - - - - - 1.00 - - - - - - - vmulps %xmm0, %xmm1, %xmm2
35 # CHECK-NEXT: - - - - - 1.00 - - - - - - - - vhaddps %xmm2, %xmm2, %xmm3
36 # CHECK-NEXT: - - - - - 1.00 - - - - - - - - vhaddps %xmm3, %xmm3, %xmm4
37
38 # CHECK: Instruction Info:
39 # CHECK-NEXT: [1]: #uOps
40 # CHECK-NEXT: [2]: Latency
41 # CHECK-NEXT: [3]: RThroughput
42 # CHECK-NEXT: [4]: MayLoad
43 # CHECK-NEXT: [5]: MayStore
44 # CHECK-NEXT: [6]: HasSideEffects
45
46 # CHECK: [1] [2] [3] [4] [5] [6] Instructions:
47 # CHECK-NEXT: 1 2 1.00 vmulps %xmm0, %xmm1, %xmm2
48 # CHECK-NEXT: 1 3 1.00 vhaddps %xmm2, %xmm2, %xmm3
49 # CHECK-NEXT: 1 3 1.00 vhaddps %xmm3, %xmm3, %xmm4
50
51
52 # CHECK: Timeline view:
53 # CHECK-NEXT: 012345
54 # CHECK-NEXT: Index 0123456789
55
56 # CHECK: [0,0] DeeER. . . vmulps %xmm0, %xmm1, %xmm2
57 # CHECK-NEXT: [0,1] D==eeeER . . vhaddps %xmm2, %xmm2, %xmm3
58 # CHECK-NEXT: [0,2] .D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
59
60 # CHECK: [1,0] .DeeE-----R . vmulps %xmm0, %xmm1, %xmm2
61 # CHECK-NEXT: [1,1] . D=eeeE--R . vhaddps %xmm2, %xmm2, %xmm3
62 # CHECK-NEXT: [1,2] . D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
63
64 # CHECK: [2,0] . DeeE----R . vmulps %xmm0, %xmm1, %xmm2
65 # CHECK-NEXT: [2,1] . D====eeeER . vhaddps %xmm2, %xmm2, %xmm3
66 # CHECK-NEXT: [2,2] . D======eeeER vhaddps %xmm3, %xmm3, %xmm4
67
68 # CHECK: Average Wait times (based on the timeline view):
69 # CHECK-NEXT: [0]: Executions
70 # CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
71 # CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
72 # CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
73
74 # CHECK: [0] [1] [2] [3]
75 # CHECK-NEXT: 0. 3 1.0 1.0 3.0 vmulps %xmm0, %xmm1, %xmm2
76 # CHECK-NEXT: 1. 3 3.3 0.7 0.7 vhaddps %xmm2, %xmm2, %xmm3
77 # CHECK-NEXT: 2. 3 5.7 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4
0 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 -timeline -timeline-max-iterations=1 -noalias=false < %s | FileCheck %s
1
2 vmovaps (%rsi), %xmm0
3 vmovaps %xmm0, (%rdi)
4 vmovaps 16(%rsi), %xmm0
5 vmovaps %xmm0, 16(%rdi)
6 vmovaps 32(%rsi), %xmm0
7 vmovaps %xmm0, 32(%rdi)
8 vmovaps 48(%rsi), %xmm0
9 vmovaps %xmm0, 48(%rdi)
10
11 # CHECK: Iterations: 100
12 # CHECK-NEXT: Instructions: 800
13 # CHECK-NEXT: Total Cycles: 2403
14 # CHECK-NEXT: Dispatch Width: 2
15 # CHECK-NEXT: IPC: 0.33
16
17
18 # CHECK: Resources:
19 # CHECK-NEXT: [0] - JALU0
20 # CHECK-NEXT: [1] - JALU1
21 # CHECK-NEXT: [2] - JDiv
22 # CHECK-NEXT: [3] - JFPA
23 # CHECK-NEXT: [4] - JFPM
24 # CHECK-NEXT: [5] - JFPU0
25 # CHECK-NEXT: [6] - JFPU1
26 # CHECK-NEXT: [7] - JLAGU
27 # CHECK-NEXT: [8] - JMul
28 # CHECK-NEXT: [9] - JSAGU
29 # CHECK-NEXT: [10] - JSTC
30 # CHECK-NEXT: [11] - JVALU0
31 # CHECK-NEXT: [12] - JVALU1
32 # CHECK-NEXT: [13] - JVIMUL
33
34
35 # CHECK: Resource pressure per iteration:
36 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
37 # CHECK-NEXT: - - - - - - - 4.00 - 4.00 - - - -
38
39 # CHECK: Resource pressure by instruction:
40 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
41 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps (%rsi), %xmm0
42 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, (%rdi)
43 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps 16(%rsi), %xmm0
44 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, 16(%rdi)
45 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps 32(%rsi), %xmm0
46 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, 32(%rdi)
47 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps 48(%rsi), %xmm0
48 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, 48(%rdi)
49
50
51 # CHECK: Instruction Info:
52 # CHECK-NEXT: [1]: #uOps
53 # CHECK-NEXT: [2]: Latency
54 # CHECK-NEXT: [3]: RThroughput
55 # CHECK-NEXT: [4]: MayLoad
56 # CHECK-NEXT: [5]: MayStore
57 # CHECK-NEXT: [6]: HasSideEffects
58
59 # CHECK: [1] [2] [3] [4] [5] [6] Instructions:
60 # CHECK-NEXT: 1 5 1.00 * vmovaps (%rsi), %xmm0
61 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, (%rdi)
62 # CHECK-NEXT: 1 5 1.00 * vmovaps 16(%rsi), %xmm0
63 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 16(%rdi)
64 # CHECK-NEXT: 1 5 1.00 * vmovaps 32(%rsi), %xmm0
65 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 32(%rdi)
66 # CHECK-NEXT: 1 5 1.00 * vmovaps 48(%rsi), %xmm0
67 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 48(%rdi)
68
69 # CHECK: Timeline view:
70 # CHECK-NEXT: 0123456789
71 # CHECK-NEXT: Index 0123456789 0123456
72
73 # CHECK: [0,0] DeeeeeER . . . .. vmovaps (%rsi), %xmm0
74 # CHECK-NEXT: [0,1] D=====eER . . . .. vmovaps %xmm0, (%rdi)
75 # CHECK-NEXT: [0,2] .D=====eeeeeER . . .. vmovaps 16(%rsi), %xmm0
76 # CHECK-NEXT: [0,3] .D==========eER. . .. vmovaps %xmm0, 16(%rdi)
77 # CHECK-NEXT: [0,4] . D==========eeeeeER. .. vmovaps 32(%rsi), %xmm0
78 # CHECK-NEXT: [0,5] . D===============eER .. vmovaps %xmm0, 32(%rdi)
79 # CHECK-NEXT: [0,6] . D===============eeeeeER. vmovaps 48(%rsi), %xmm0
80 # CHECK-NEXT: [0,7] . D====================eER vmovaps %xmm0, 48(%rdi)
81
82 # CHECK: Average Wait times (based on the timeline view):
83 # CHECK-NEXT: [0]: Executions
84 # CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
85 # CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
86 # CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
87
88 # CHECK: [0] [1] [2] [3]
89 # CHECK-NEXT: 0. 1 1.0 1.0 0.0 vmovaps (%rsi), %xmm0
90 # CHECK-NEXT: 1. 1 6.0 0.0 0.0 vmovaps %xmm0, (%rdi)
91 # CHECK-NEXT: 2. 1 6.0 0.0 0.0 vmovaps 16(%rsi), %xmm0
92 # CHECK-NEXT: 3. 1 11.0 0.0 0.0 vmovaps %xmm0, 16(%rdi)
93 # CHECK-NEXT: 4. 1 11.0 0.0 0.0 vmovaps 32(%rsi), %xmm0
94 # CHECK-NEXT: 5. 1 16.0 0.0 0.0 vmovaps %xmm0, 32(%rdi)
95 # CHECK-NEXT: 6. 1 16.0 0.0 0.0 vmovaps 48(%rsi), %xmm0
96 # CHECK-NEXT: 7. 1 21.0 0.0 0.0 vmovaps %xmm0, 48(%rdi)
0 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 -timeline -timeline-max-iterations=1 < %s | FileCheck %s
1
2 vmovaps (%rsi), %xmm0
3 vmovaps %xmm0, (%rdi)
4 vmovaps 16(%rsi), %xmm0
5 vmovaps %xmm0, 16(%rdi)
6 vmovaps 32(%rsi), %xmm0
7 vmovaps %xmm0, 32(%rdi)
8 vmovaps 48(%rsi), %xmm0
9 vmovaps %xmm0, 48(%rdi)
10
11
12 # CHECK: Iterations: 100
13 # CHECK-NEXT: Instructions: 800
14 # CHECK-NEXT: Total Cycles: 408
15 # CHECK-NEXT: Dispatch Width: 2
16 # CHECK-NEXT: IPC: 1.96
17
18
19 # CHECK: Resources:
20 # CHECK-NEXT: [0] - JALU0
21 # CHECK-NEXT: [1] - JALU1
22 # CHECK-NEXT: [2] - JDiv
23 # CHECK-NEXT: [3] - JFPA
24 # CHECK-NEXT: [4] - JFPM
25 # CHECK-NEXT: [5] - JFPU0
26 # CHECK-NEXT: [6] - JFPU1
27 # CHECK-NEXT: [7] - JLAGU
28 # CHECK-NEXT: [8] - JMul
29 # CHECK-NEXT: [9] - JSAGU
30 # CHECK-NEXT: [10] - JSTC
31 # CHECK-NEXT: [11] - JVALU0
32 # CHECK-NEXT: [12] - JVALU1
33 # CHECK-NEXT: [13] - JVIMUL
34
35
36 # CHECK: Resource pressure per iteration:
37 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
38 # CHECK-NEXT: - - - - - - - 4.00 - 4.00 - - - -
39
40 # CHECK: Resource pressure by instruction:
41 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
42 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps (%rsi), %xmm0
43 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, (%rdi)
44 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps 16(%rsi), %xmm0
45 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, 16(%rdi)
46 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps 32(%rsi), %xmm0
47 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, 32(%rdi)
48 # CHECK-NEXT: - - - - - - - 1.00 - - - - - - vmovaps 48(%rsi), %xmm0
49 # CHECK-NEXT: - - - - - - - - - 1.00 - - - - vmovaps %xmm0, 48(%rdi)
50
51
52 # CHECK: Instruction Info:
53 # CHECK-NEXT: [1]: #uOps
54 # CHECK-NEXT: [2]: Latency
55 # CHECK-NEXT: [3]: RThroughput
56 # CHECK-NEXT: [4]: MayLoad
57 # CHECK-NEXT: [5]: MayStore
58 # CHECK-NEXT: [6]: HasSideEffects
59
60 # CHECK: [1] [2] [3] [4] [5] [6] Instructions:
61 # CHECK-NEXT: 1 5 1.00 * vmovaps (%rsi), %xmm0
62 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, (%rdi)
63 # CHECK-NEXT: 1 5 1.00 * vmovaps 16(%rsi), %xmm0
64 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 16(%rdi)
65 # CHECK-NEXT: 1 5 1.00 * vmovaps 32(%rsi), %xmm0
66 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 32(%rdi)
67 # CHECK-NEXT: 1 5 1.00 * vmovaps 48(%rsi), %xmm0
68 # CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 48(%rdi)
69
70
71 # CHECK: Timeline view:
72 # CHECK-NEXT: 01
73 # CHECK-NEXT: Index 0123456789
74
75 # CHECK: [0,0] DeeeeeER .. vmovaps (%rsi), %xmm0
76 # CHECK-NEXT: [0,1] D=====eER .. vmovaps %xmm0, (%rdi)
77 # CHECK-NEXT: [0,2] .DeeeeeER .. vmovaps 16(%rsi), %xmm0
78 # CHECK-NEXT: [0,3] .D=====eER.. vmovaps %xmm0, 16(%rdi)
79 # CHECK-NEXT: [0,4] . DeeeeeER.. vmovaps 32(%rsi), %xmm0
80 # CHECK-NEXT: [0,5] . D=====eER. vmovaps %xmm0, 32(%rdi)
81 # CHECK-NEXT: [0,6] . DeeeeeER. vmovaps 48(%rsi), %xmm0
82 # CHECK-NEXT: [0,7] . D=====eER vmovaps %xmm0, 48(%rdi)
83
84
85 # CHECK: Average Wait times (based on the timeline view):
86 # CHECK-NEXT: [0]: Executions
87 # CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
88 # CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
89 # CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
90
91 # CHECK: [0] [1] [2] [3]
92 # CHECK-NEXT: 0. 1 1.0 1.0 0.0 vmovaps (%rsi), %xmm0
93 # CHECK-NEXT: 1. 1 6.0 0.0 0.0 vmovaps %xmm0, (%rdi)
94 # CHECK-NEXT: 2. 1 1.0 1.0 0.0 vmovaps 16(%rsi), %xmm0
95 # CHECK-NEXT: 3. 1 6.0 0.0 0.0 vmovaps %xmm0, 16(%rdi)
96 # CHECK-NEXT: 4. 1 1.0 1.0 0.0 vmovaps 32(%rsi), %xmm0
97 # CHECK-NEXT: 5. 1 6.0 0.0 0.0 vmovaps %xmm0, 32(%rdi)
98 # CHECK-NEXT: 6. 1 1.0 1.0 0.0 vmovaps 48(%rsi), %xmm0
99 # CHECK-NEXT: 7. 1 6.0 0.0 0.0 vmovaps %xmm0, 48(%rdi)
0 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 < %s | FileCheck %s
1
2 add %edi, %eax
3
4 # CHECK: Iterations: 100
5 # CHECK-NEXT: Instructions: 100
6 # CHECK-NEXT: Total Cycles: 103
7 # CHECK-NEXT: Dispatch Width: 2
8 # CHECK-NEXT: IPC: 0.97
9
10 # CHECK-LABEL: Resources:
11 # CHECK-NEXT: [0] - JALU0
12 # CHECK-NEXT: [1] - JALU1
13 # CHECK-NEXT: [2] - JDiv
14 # CHECK-NEXT: [3] - JFPA
15 # CHECK-NEXT: [4] - JFPM
16 # CHECK-NEXT: [5] - JFPU0
17 # CHECK-NEXT: [6] - JFPU1
18 # CHECK-NEXT: [7] - JLAGU
19 # CHECK-NEXT: [8] - JMul
20 # CHECK-NEXT: [9] - JSAGU
21 # CHECK-NEXT: [10] - JSTC
22 # CHECK-NEXT: [11] - JVALU0
23 # CHECK-NEXT: [12] - JVALU1
24 # CHECK-NEXT: [13] - JVIMUL
25
26
27 # CHECK: Resource pressure per iteration:
28 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
29 # CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - -
30
31 # CHECK: Resource pressure by instruction:
32 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
33 # CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - addl %edi, %eax
34
35 # CHECK: Instruction Info:
36 # CHECK-NEXT: [1]: #uOps
37 # CHECK-NEXT: [2]: Latency
38 # CHECK-NEXT: [3]: RThroughput
39 # CHECK-NEXT: [4]: MayLoad
40 # CHECK-NEXT: [5]: MayStore
41 # CHECK-NEXT: [6]: HasSideEffects
42
43 # CHECK: [1] [2] [3] [4] [5] [6] Instructions:
44 # CHECK-NEXT: 1 1 0.50 addl %edi, %eax
0 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=btver2 < %s | FileCheck --check-prefix=ALL --check-prefix=BTVER2 %s
1 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=znver1 < %s | FileCheck --check-prefix=ALL --check-prefix=ZNVER1 %s
2 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=sandybridge < %s | FileCheck --check-prefix=ALL --check-prefix=SANDYBRIDGE %s
3 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=ivybridge < %s | FileCheck --check-prefix=ALL --check-prefix=IVYBRIDGE %s
4 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=haswell < %s | FileCheck --check-prefix=ALL --check-prefix=HASWELL %s
5 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=broadwell < %s | FileCheck --check-prefix=ALL --check-prefix=BROADWELL %s
6 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=knl < %s | FileCheck --check-prefix=ALL --check-prefix=KNL %s
7 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=skylake < %s | FileCheck --check-prefix=ALL --check-prefix=SKX %s
8 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=skylake-avx512 < %s | FileCheck --check-prefix=ALL --check-prefix=SKX-AVX512 %s
9 # RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=slm < %s | FileCheck --check-prefix=ALL --check-prefix=SLM %s
10
11 add %edi, %eax
12
13 # ALL: Iterations: 70
14 # ALL-NEXT: Instructions: 70
15
16 # BTVER2: Dispatch Width: 2
17 # ZNVER1: Dispatch Width: 4
18 # SANDYBRIDGE: Dispatch Width: 4
19 # IVYBRIDGE: Dispatch Width: 4
20 # HASWELL: Dispatch Width: 4
21 # BROADWELL: Dispatch Width: 4
22 # KNL: Dispatch Width: 4
23 # SKX: Dispatch Width: 6
24 # SKX-AVX512: Dispatch Width: 6
25 # SLM: Dispatch Width: 2
26
0 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 < %s 2>&1 | FileCheck --check-prefix=DEFAULT %s
1 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=0 < %s 2>&1 | FileCheck --check-prefix=DEFAULT %s
2 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1 < %s 2>&1 | FileCheck --check-prefix=CUSTOM %s
3
4 add %eax, %eax
5
6 # DEFAULT: Iterations: 70
7 # DEFAULT-NEXT: Instructions: 70
8
9 # CUSTOM: Iterations: 1
10 # CUSTOM-NEXT: Instructions: 1
0 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 < %s 2>&1 | FileCheck --check-prefix=DEFAULT %s
1 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -dispatch=0 < %s 2>&1 | FileCheck --check-prefix=DEFAULT %s
2 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -dispatch=1 < %s 2>&1 | FileCheck --check-prefix=CUSTOM %s
3
4 add %eax, %eax
5
6 # DEFAULT: Dispatch Width: 2
7 # CUSTOM: Dispatch Width: 1
0 # RUN: not llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=atom -o %t1 2>&1 | FileCheck %s
1
2 # CHECK: error: please specify an out-of-order cpu. 'atom' is an in-order cpu.
0 # RUN: not llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 %s
1
2 invalid_instruction_mnemonic
0 # RUN: not llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=foo -o %t1 2>&1 | FileCheck %s
1
2 # CHECK: 'foo' is not a recognized processor for this target (ignoring processor)
0 # RUN: not llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=btver2 -o %t1 2>&1 | FileCheck %s
1
2 # CHECK: error: no assembly instructions found.
0 if not 'X86' in config.root.targets:
1 config.unsupported = True
2
0 # RUN: not llvm-mca -mtriple=x86_64-unknown-unknown < %s 2>&1 | FileCheck %s
1 # RUN: not llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=generic < %s 2>&1 | FileCheck %s
2
3 # CHECK: error: unable to find instruction-level scheduling information for target triple 'x86_64-unknown-unknown' and cpu 'generic'.
0 # RUN: not llvm-mca %t.blah -o %t2 2>&1 | FileCheck --check-prefix=ENOENT %s
1
2 # ENOENT: {{.*}}.blah: {{[Nn]}}o such file or directory
0 # Requires a non-empty default triple for these tests
1 if 'default_triple' not in config.available_features:
2 config.unsupported = True
3
36 36  llvm-link
37 37  llvm-lto
38 38  llvm-mc
   39  llvm-mca
39 40  llvm-mcmarkup
40 41  llvm-modextract
41 42  llvm-mt
0 //===--------------------- Backend.cpp --------------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// Implementation of class Backend, which emulates a hardware OoO backend.
11 ///
12 //===----------------------------------------------------------------------===//
13
14 #include "Backend.h"
15 #include "HWEventListener.h"
16 #include "llvm/CodeGen/TargetSchedule.h"
17 #include "llvm/Support/Debug.h"
18
19 namespace mca {
20
21 #define DEBUG_TYPE "llvm-mca"
22
23 using namespace llvm;
24
25 void Backend::addEventListener(HWEventListener *Listener) {
26 if (Listener)
27 Listeners.insert(Listener);
28 }
29
30 void Backend::runCycle(unsigned Cycle) {
31 notifyCycleBegin(Cycle);
32
33 if (!SM->hasNext()) {
34 notifyCycleEnd(Cycle);
35 return;
36 }
37
38 InstRef IR = SM->peekNext();
39 const InstrDesc *Desc = &IB->getOrCreateInstrDesc(STI, *IR.second);
40 while (DU->isAvailable(Desc->NumMicroOps) && DU->canDispatch(*Desc)) {
41 Instruction *NewIS = IB->createInstruction(STI, *DU, IR.first, *IR.second);
42     Instructions[IR.first] = std::unique_ptr<Instruction>(NewIS);
43 NewIS->setRCUTokenID(DU->dispatch(IR.first, NewIS));
44
45 // If this is a zero latency instruction, then we don't need to dispatch
46 // it. Instead, we can mark it as executed.
47 if (NewIS->isZeroLatency())
48 notifyInstructionExecuted(IR.first);
49
50 // Check if we have dispatched all the instructions.
51 SM->updateNext();
52 if (!SM->hasNext())
53 break;
54
55 // Prepare for the next round.
56 IR = SM->peekNext();
57 Desc = &IB->getOrCreateInstrDesc(STI, *IR.second);
58 }
59
60 notifyCycleEnd(Cycle);
61 }
62
63 void Backend::notifyCycleBegin(unsigned Cycle) {
64 DEBUG(dbgs() << "[E] Cycle begin: " << Cycle << '\n');
65 for (HWEventListener *Listener : Listeners)
66 Listener->onCycleBegin(Cycle);
67
68 DU->cycleEvent(Cycle);
69 HWS->cycleEvent(Cycle);
70 }
71
72 void Backend::notifyInstructionDispatched(unsigned Index) {
73 DEBUG(dbgs() << "[E] Instruction Dispatched: " << Index << '\n');
74 for (HWEventListener *Listener : Listeners)
75 Listener->onInstructionDispatched(Index);
76 }
77
78 void Backend::notifyInstructionReady(unsigned Index) {
79 DEBUG(dbgs() << "[E] Instruction Ready: " << Index << '\n');
80 for (HWEventListener *Listener : Listeners)
81 Listener->onInstructionReady(Index);
82 }
83
84 void Backend::notifyInstructionIssued(
85     unsigned Index, const ArrayRef<std::pair<ResourceRef, unsigned>> &Used) {
86 DEBUG(
87 dbgs() << "[E] Instruction Issued: " << Index << '\n';
88     for (const std::pair<ResourceRef, unsigned> &Resource : Used) {
89 dbgs() << "[E] Resource Used: [" << Resource.first.first << '.'
90 << Resource.first.second << "]\n";
91 dbgs() << " cycles: " << Resource.second << '\n';
92 }
93 );
94
95 for (HWEventListener *Listener : Listeners)
96 Listener->onInstructionIssued(Index, Used);
97 }
98
99 void Backend::notifyInstructionExecuted(unsigned Index) {
100 DEBUG(dbgs() << "[E] Instruction Executed: " << Index << '\n');
101 for (HWEventListener *Listener : Listeners)
102 Listener->onInstructionExecuted(Index);
103
104 const Instruction &IS = *Instructions[Index];
105 DU->onInstructionExecuted(IS.getRCUTokenID());
106 }
107
108 void Backend::notifyInstructionRetired(unsigned Index) {
109 DEBUG(dbgs() << "[E] Instruction Retired: " << Index << '\n');
110 for (HWEventListener *Listener : Listeners)
111 Listener->onInstructionRetired(Index);
112
113 const Instruction &IS = *Instructions[Index];
114 DU->invalidateRegisterMappings(IS);
115 Instructions.erase(Index);
116 }
117
118 void Backend::notifyResourceAvailable(const ResourceRef &RR) {
119 DEBUG(dbgs() << "[E] Resource Available: [" << RR.first << '.' << RR.second
120 << "]\n");
121 for (HWEventListener *Listener : Listeners)
122 Listener->onResourceAvailable(RR);
123 }
124
125 void Backend::notifyCycleEnd(unsigned Cycle) {
126 DEBUG(dbgs() << "[E] Cycle end: " << Cycle << "\n\n");
127 for (HWEventListener *Listener : Listeners)
128 Listener->onCycleEnd(Cycle);
129 }
130
131 } // namespace mca.
0 //===--------------------- Backend.h ----------------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file implements an OoO backend for the llvm-mca tool.
11 ///
12 //===----------------------------------------------------------------------===//
13
14 #ifndef LLVM_TOOLS_LLVM_MCA_BACKEND_H
15 #define LLVM_TOOLS_LLVM_MCA_BACKEND_H
16
17 #include "Dispatch.h"
18 #include "InstrBuilder.h"
19 #include "Scheduler.h"
20 #include "SourceMgr.h"
21
22 namespace mca {
23
24 struct HWEventListener;
25
26 /// \brief An out of order backend for a specific subtarget.
27 ///
28 /// It emulates an out-of-order execution of instructions. Instructions are
29 /// fetched from a MCInst sequence managed by an object of class SourceMgr.
30 /// Instructions are first dispatched to the schedulers and then executed.
31 /// This class tracks the lifetime of an instruction from the moment where
32 /// it gets dispatched to the schedulers, to the moment where it finishes
33 /// executing and register writes are architecturally committed.
34 /// In particular, it monitors changes in the state of every instruction
35 /// in flight.
36 /// Instructions are executed in a loop of iterations. The number of iterations
37 /// is defined by the SourceMgr object.
38 /// The Backend entry point is method 'run()', which executes cycles in a loop
39 /// until there are no new instructions to dispatch and every dispatched
40 /// instruction has been retired.
41 /// Internally, the Backend collects statistical information in the form of
42 /// histograms. For example, it tracks how the dispatch group size changes
43 /// over time.
44 class Backend {
45 const llvm::MCSubtargetInfo &STI;
46
47   std::unique_ptr<InstrBuilder> IB;
48   std::unique_ptr<Scheduler> HWS;
49   std::unique_ptr<DispatchUnit> DU;
50   std::unique_ptr<SourceMgr> SM;
51 unsigned Cycles;
52
53   llvm::DenseMap<unsigned, std::unique_ptr<Instruction>> Instructions;
54   std::set<HWEventListener *> Listeners;
55
56 void runCycle(unsigned Cycle);
57
58 public:
59 Backend(const llvm::MCSubtargetInfo &Subtarget, const llvm::MCInstrInfo &MCII,
60           const llvm::MCRegisterInfo &MRI, std::unique_ptr<SourceMgr> Source,
61 unsigned DispatchWidth = 0, unsigned RegisterFileSize = 0,
62 unsigned MaxRetirePerCycle = 0, unsigned LoadQueueSize = 0,
63 unsigned StoreQueueSize = 0, bool AssumeNoAlias = false)
64 : STI(Subtarget),
65         HWS(llvm::make_unique<Scheduler>(this, Subtarget.getSchedModel(),
66 LoadQueueSize, StoreQueueSize,
67 AssumeNoAlias)),
68         DU(llvm::make_unique<DispatchUnit>(
69 this, MRI, Subtarget.getSchedModel().MicroOpBufferSize,
70 RegisterFileSize, MaxRetirePerCycle, DispatchWidth, HWS.get())),
71 SM(std::move(Source)), Cycles(0) {
72     IB = llvm::make_unique<InstrBuilder>(MCII, getProcResourceMasks());
73 }
74
75 void run() {
76 while (SM->hasNext() || !DU->isRCUEmpty())
77 runCycle(Cycles++);
78 }
79
80 unsigned getNumIterations() const { return SM->getNumIterations(); }
81 unsigned getNumInstructions() const { return SM->size(); }
82 unsigned getNumCycles() const { return Cycles; }
83 unsigned getTotalRegisterMappingsCreated() const {
84 return DU->getTotalRegisterMappingsCreated();
85 }
86 unsigned getMaxUsedRegisterMappings() const {
87 return DU->getMaxUsedRegisterMappings();
88 }
89 unsigned getDispatchWidth() const { return DU->getDispatchWidth(); }
90
91 const llvm::MCSubtargetInfo &getSTI() const { return STI; }
92 const llvm::MCSchedModel &getSchedModel() const {
93 return STI.getSchedModel();
94 }
95   const llvm::ArrayRef<uint64_t> getProcResourceMasks() const {
96 return HWS->getProcResourceMasks();
97 }
98
99 double getRThroughput(const InstrDesc &ID) const {
100 return HWS->getRThroughput(ID);
101 }
102   void getBuffersUsage(std::vector<BufferUsageEntry> &Usage) const {
103 return HWS->getBuffersUsage(Usage);
104 }
105
106 unsigned getNumRATStalls() const { return DU->getNumRATStalls(); }
107 unsigned getNumRCUStalls() const { return DU->getNumRCUStalls(); }
108 unsigned getNumSQStalls() const { return DU->getNumSQStalls(); }
109 unsigned getNumLDQStalls() const { return DU->getNumLDQStalls(); }
110 unsigned getNumSTQStalls() const { return DU->getNumSTQStalls(); }
111 unsigned getNumDispatchGroupStalls() const {
112 return DU->getNumDispatchGroupStalls();
113 }
114
115 const llvm::MCInst &getMCInstFromIndex(unsigned Index) const {
116 return SM->getMCInstFromIndex(Index);
117 }
118
119 const InstrDesc &getInstrDesc(const llvm::MCInst &Inst) const {
120 return IB->getOrCreateInstrDesc(STI, Inst);
121 }
122
123 const SourceMgr &getSourceMgr() const { return *SM; }
124
125 void addEventListener(HWEventListener *Listener);
126 void notifyCycleBegin(unsigned Cycle);
127 void notifyInstructionDispatched(unsigned Index);
128 void notifyInstructionReady(unsigned Index);
129 void notifyInstructionIssued(
130 unsigned Index,
131 const llvm::ArrayRef<std::pair<ResourceRef, unsigned>> &Used);
132 void notifyInstructionExecuted(unsigned Index);
133 void notifyResourceAvailable(const ResourceRef &RR);
134 void notifyInstructionRetired(unsigned Index);
135 void notifyCycleEnd(unsigned Cycle);
136 };
137
138 } // namespace mca
139
140 #endif
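The `run()` method above is the whole simulation driver: keep executing cycles while the instruction source still has entries or the retire control unit holds in-flight instructions. A minimal stand-alone sketch of that loop, where `MockSource`, `MockDispatch`, and `runSimulation` are hypothetical stand-ins (not llvm-mca classes) under a simplified model of one dispatch and one retirement per cycle:

```cpp
// Hypothetical stand-ins for the SourceMgr and DispatchUnit roles in run():
// the loop keeps cycling while the source still has instructions or the
// retire control unit holds in-flight instructions.
struct MockSource {
  unsigned Remaining;
  bool hasNext() const { return Remaining != 0; }
};

struct MockDispatch {
  unsigned InFlight = 0;
  bool isRCUEmpty() const { return InFlight == 0; }
};

// Simplified model: one instruction dispatched per cycle, and one retired
// per cycle (but never in the same cycle it was dispatched).
unsigned runSimulation(unsigned NumInstructions) {
  MockSource SM{NumInstructions};
  MockDispatch DU;
  unsigned Cycles = 0;
  while (SM.hasNext() || !DU.isRCUEmpty()) {
    if (DU.InFlight)      // retire one instruction dispatched earlier
      --DU.InFlight;
    if (SM.hasNext()) {   // dispatch one new instruction
      --SM.Remaining;
      ++DU.InFlight;
    }
    ++Cycles;
  }
  return Cycles;
}
```

Under this toy model, N instructions take N+1 cycles: the last instruction dispatched still needs one extra cycle to retire, which is exactly why the loop condition checks the RCU and not just the source.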
0 //===--------------------- BackendPrinter.cpp -------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file implements the BackendPrinter interface.
11 ///
12 //===----------------------------------------------------------------------===//
13
14 #include "BackendPrinter.h"
15 #include "llvm/CodeGen/TargetSchedule.h"
16
17 namespace mca {
18
19 using namespace llvm;
20
21 std::unique_ptr<llvm::ToolOutputFile>
22 BackendPrinter::getOutputStream(std::string OutputFile) {
23 if (OutputFile == "")
24 OutputFile = "-";
25 std::error_code EC;
26 auto Out = llvm::make_unique<ToolOutputFile>(OutputFile, EC, sys::fs::F_None);
27 if (!EC)
28 return Out;
29 errs() << EC.message() << '\n';
30 return nullptr;
31 }
32
33 void BackendPrinter::printGeneralStatistics(unsigned Iterations,
34 unsigned Cycles,
35 unsigned Instructions,
36 unsigned DispatchWidth) const {
37 unsigned TotalInstructions = Instructions * Iterations;
38 double IPC = (double)TotalInstructions / Cycles;
39
40 std::string Buffer;
41 raw_string_ostream TempStream(Buffer);
42 TempStream << "Iterations: " << Iterations;
43 TempStream << "\nInstructions: " << TotalInstructions;
44 TempStream << "\nTotal Cycles: " << Cycles;
45 TempStream << "\nDispatch Width: " << DispatchWidth;
46 TempStream << "\nIPC: " << format("%.2f", IPC) << '\n';
47 TempStream.flush();
48 File->os() << Buffer;
49 }
50
51 void BackendPrinter::printRATStatistics(unsigned TotalMappings,
52 unsigned MaxUsedMappings) const {
53 std::string Buffer;
54 raw_string_ostream TempStream(Buffer);
55 TempStream << "\n\nRegister Alias Table:";
56 TempStream << "\nTotal number of mappings created: " << TotalMappings;
57 TempStream << "\nMax number of mappings used: " << MaxUsedMappings
58 << '\n';
59 TempStream.flush();
60 File->os() << Buffer;
61 }
62
63 void BackendPrinter::printDispatchStalls(unsigned RATStalls, unsigned RCUStalls,
64 unsigned SCHEDQStalls,
65 unsigned LDQStalls, unsigned STQStalls,
66 unsigned DGStalls) const {
67 std::string Buffer;
68 raw_string_ostream TempStream(Buffer);
69 TempStream << "\n\nDynamic Dispatch Stall Cycles:\n";
70 TempStream << "RAT - Register unavailable: "
71 << RATStalls;
72 TempStream << "\nRCU - Retire tokens unavailable: "
73 << RCUStalls;
74 TempStream << "\nSCHEDQ - Scheduler full: "
75 << SCHEDQStalls;
76 TempStream << "\nLQ - Load queue full: "
77 << LDQStalls;
78 TempStream << "\nSQ - Store queue full: "
79 << STQStalls;
80 TempStream << "\nGROUP - Static restrictions on the dispatch group: "
81 << DGStalls;
82 TempStream << '\n';
83 TempStream.flush();
84 File->os() << Buffer;
85 }
86
87 void BackendPrinter::printSchedulerUsage(
88 const MCSchedModel &SM, const ArrayRef<BufferUsageEntry> &Usage) const {
89 std::string Buffer;
90 raw_string_ostream TempStream(Buffer);
91 TempStream << "\n\nScheduler's queue usage:\n";
92 const ArrayRef<uint64_t> ResourceMasks = B.getProcResourceMasks();
93 for (unsigned I = 0, E = SM.getNumProcResourceKinds(); I < E; ++I) {
94 const MCProcResourceDesc &ProcResource = *SM.getProcResource(I);
95 if (!ProcResource.BufferSize)
96 continue;
97
98 for (const BufferUsageEntry &Entry : Usage)
99 if (ResourceMasks[I] == Entry.first)
100 TempStream << ProcResource.Name << ", " << Entry.second << '/'
101 << ProcResource.BufferSize << '\n';
102 }
103
104 TempStream.flush();
105 File->os() << Buffer;
106 }
107
108 void BackendPrinter::printInstructionInfo() const {
109 std::string Buffer;
110 raw_string_ostream TempStream(Buffer);
111
112 TempStream << "\n\nInstruction Info:\n";
113 TempStream << "[1]: #uOps\n[2]: Latency\n[3]: RThroughput\n"
114 << "[4]: MayLoad\n[5]: MayStore\n[6]: HasSideEffects\n\n";
115
116 TempStream << "[1] [2] [3] [4] [5] [6]\tInstructions:\n";
117 for (unsigned I = 0, E = B.getNumInstructions(); I < E; ++I) {
118 const MCInst &Inst = B.getMCInstFromIndex(I);
119 const InstrDesc &ID = B.getInstrDesc(Inst);
120 unsigned NumMicroOpcodes = ID.NumMicroOps;
121 unsigned Latency = ID.MaxLatency;
122 double RThroughput = B.getRThroughput(ID);
123 TempStream << ' ' << NumMicroOpcodes << " ";
124 if (NumMicroOpcodes < 10)
125 TempStream << " ";
126 else if (NumMicroOpcodes < 100)
127 TempStream << ' ';
128 TempStream << Latency << " ";
129 if (Latency < 10.0)
130 TempStream << " ";
131 else if (Latency < 100.0)
132 TempStream << ' ';
133 if (RThroughput) {
134 TempStream << format("%.2f", RThroughput) << ' ';
135 if (RThroughput < 10.0)
136 TempStream << " ";
137 else if (RThroughput < 100.0)
138 TempStream << ' ';
139 } else {
140 TempStream << " - ";
141 }
142 TempStream << (ID.MayLoad ? " * " : " ");
143 TempStream << (ID.MayStore ? " * " : " ");
144 TempStream << (ID.HasSideEffects ? " * " : " ");
145 MCIP->printInst(&Inst, TempStream, "", B.getSTI());
146 TempStream << '\n';
147 }
148
149 TempStream.flush();
150 File->os() << Buffer;
151 }
152
153 void BackendPrinter::printReport() const {
154 assert(isFileValid());
155 unsigned Cycles = B.getNumCycles();
156 printGeneralStatistics(B.getNumIterations(), Cycles, B.getNumInstructions(),
157 B.getDispatchWidth());
158 if (EnableVerboseOutput) {
159 printDispatchStalls(B.getNumRATStalls(), B.getNumRCUStalls(),
160 B.getNumSQStalls(), B.getNumLDQStalls(),
161 B.getNumSTQStalls(), B.getNumDispatchGroupStalls());
162 printRATStatistics(B.getTotalRegisterMappingsCreated(),
163 B.getMaxUsedRegisterMappings());
164 BS->printHistograms(File->os());
165
166 std::vector<BufferUsageEntry> Usage;
167 B.getBuffersUsage(Usage);
168 printSchedulerUsage(B.getSchedModel(), Usage);
169 }
170
171 if (RPV) {
172 RPV->printResourcePressure(getOStream(), Cycles);
173 printInstructionInfo();
174 }
175
176 if (TV) {
177 TV->printTimeline(getOStream());
178 TV->printAverageWaitTimes(getOStream());
179 }
180 }
181
182 void BackendPrinter::addResourcePressureView() {
183 if (!RPV) {
184 RPV = llvm::make_unique<ResourcePressureView>(
185 B.getSTI(), *MCIP, B.getSourceMgr(), B.getProcResourceMasks());
186 B.addEventListener(RPV.get());
187 }
188 }
189
190 void BackendPrinter::addTimelineView(unsigned MaxIterations,
191 unsigned MaxCycles) {
192 if (!TV) {
193 TV = llvm::make_unique<TimelineView>(B.getSTI(), *MCIP, B.getSourceMgr(),
194 MaxIterations, MaxCycles);
195 B.addEventListener(TV.get());
196 }
197 }
198
199 void BackendPrinter::initialize(std::string OutputFileName) {
200 File = getOutputStream(OutputFileName);
201 MCIP->setPrintImmHex(false);
202 if (EnableVerboseOutput) {
203 BS = llvm::make_unique<BackendStatistics>();
204 B.addEventListener(BS.get());
205 }
206 }
207
208 } // namespace mca
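Every printing routine in this file follows the same buffered pattern: format into a temporary string stream, then emit the whole buffer to the output file in one write. A stand-alone sketch of that pattern using `std::ostringstream` in place of `llvm::raw_string_ostream` (`formatGeneralStatistics` is a hypothetical name for illustration, not part of llvm-mca):

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for printGeneralStatistics: builds the summary in a
// buffer first, so the final file write can happen in a single operation.
std::string formatGeneralStatistics(unsigned Iterations, unsigned Cycles,
                                    unsigned Instructions,
                                    unsigned DispatchWidth) {
  unsigned TotalInstructions = Instructions * Iterations;
  double IPC = (double)TotalInstructions / Cycles;
  std::ostringstream TempStream;
  TempStream << "Iterations: " << Iterations
             << "\nInstructions: " << TotalInstructions
             << "\nTotal Cycles: " << Cycles
             << "\nDispatch Width: " << DispatchWidth;
  // Mirror the "%.2f" formatting used for the IPC value.
  TempStream.setf(std::ios::fixed);
  TempStream.precision(2);
  TempStream << "\nIPC: " << IPC << '\n';
  return TempStream.str();
}
```

Buffering the report this way keeps partially formatted output from reaching the file if formatting is interrupted, and mirrors how the real code flushes `TempStream` before writing `Buffer` to `File->os()`.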
0 //===--------------------- BackendPrinter.h ---------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file implements class BackendPrinter.
11 /// BackendPrinter is able to collect statistics related to the code executed
12 /// by the Backend class. Information is then printed out with the help of
13 /// a MCInstPrinter (to pretty print MCInst objects) and other helper classes.
14 ///
15 //===----------------------------------------------------------------------===//
16
17 #ifndef LLVM_TOOLS_LLVM_MCA_BACKENDPRINTER_H
18 #define LLVM_TOOLS_LLVM_MCA_BACKENDPRINTER_H
19
20 #include "Backend.h"
21 #include "BackendStatistics.h"
22 #include "ResourcePressureView.h"
23 #include "TimelineView.h"
24 #include "llvm/MC/MCInstPrinter.h"
25 #include "llvm/Support/Debug.h"
26 #include "llvm/Support/FileUtilities.h"
27 #include "llvm/Support/ToolOutputFile.h"
28
29 #define DEBUG_TYPE "llvm-mca"
30
31 namespace mca {
32
33 class ResourcePressureView;
34 class TimelineView;
35
36 /// \brief A printer class that knows how to collect statistics on the
37 /// code analyzed by the llvm-mca tool.
38 ///
39 /// This class knows how to print out the analysis information collected
40 /// during the execution of the code. Internally, it delegates to other
41 /// classes the task of printing out timeline information as well as
42 /// resource pressure.
43 class BackendPrinter {
44 Backend &B;
45 bool EnableVerboseOutput;
46
47 std::unique_ptr<llvm::MCInstPrinter> MCIP;
48 std::unique_ptr<llvm::ToolOutputFile> File;
49
50 std::unique_ptr<ResourcePressureView> RPV;
51 std::unique_ptr<TimelineView> TV;
52 std::unique_ptr<BackendStatistics> BS;
53
54 using Histogram = std::map<unsigned, unsigned>;
55 void printDUStatistics(const Histogram &Stats, unsigned Cycles) const;
56 void printDispatchStalls(unsigned RATStalls, unsigned RCUStalls,
57 unsigned SQStalls, unsigned LDQStalls,
58 unsigned STQStalls, unsigned DGStalls) const;
59 void printRATStatistics(unsigned Mappings, unsigned MaxUsedMappings) const;
60 void printRCUStatistics(const Histogram &Histogram, unsigned Cycles) const;
61 void printIssuePerCycle(const Histogram &IssuePerCycle,
62 unsigned TotalCycles) const;
63 void printSchedulerUsage(const llvm::MCSchedModel &SM,
64 const llvm::ArrayRef<BufferUsageEntry> &Usage) const;
65 void printGeneralStatistics(unsigned Iterations, unsigned Cycles,
66 unsigned Instructions,
67 unsigned DispatchWidth) const;
68 void printInstructionInfo() const;
69
70 std::unique_ptr<llvm::ToolOutputFile> getOutputStream(std::string OutputFile);
71 void initialize(std::string OutputFileName);
72
73 public:
74 BackendPrinter(Backend &backend, std::string OutputFileName,
75 std::unique_ptr<llvm::MCInstPrinter> IP, bool EnableVerbose)
76 : B(backend), EnableVerboseOutput(EnableVerbose), MCIP(std::move(IP)) {
77 initialize(OutputFileName);
78 }
79
80 ~BackendPrinter() {
81 if (File)
82 File->keep();
83 }
84
85 bool isFileValid() const { return File.get(); }
86 llvm::raw_ostream &getOStream() const {
87 assert(isFileValid());
88 return File->os();
89 }
90
91 llvm::MCInstPrinter &getMCInstPrinter() const { return *MCIP; }
92
93 void addResourcePressureView();
94 void addTimelineView(unsigned MaxIterations = 3, unsigned MaxCycles = 80);
95
96 void printReport() const;
97 };
98
99 } // namespace mca
100
101 #endif
0 //===--------------------- BackendStatistics.cpp ---------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// Helpers used by the BackendPrinter to print out histograms of the
11 /// number of instructions {dispatched/issued/retired} per cycle.
12 ///
13 //===----------------------------------------------------------------------===//
14
15 #include "BackendStatistics.h"
16 #include "llvm/Support/Format.h"
17
18 using namespace llvm;
19
20 namespace mca {
21
22 void BackendStatistics::printRetireUnitStatistics(llvm::raw_ostream &OS) const {
23 std::string Buffer;
24 raw_string_ostream TempStream(Buffer);
25 TempStream << "\n\nRetire Control Unit - "
26 << "number of cycles where we saw N instructions retired:\n";
27 TempStream << "[# retired], [# cycles]\n";
28
29 for (const std::pair<unsigned, unsigned> &Entry : RetiredPerCycle) {
30 TempStream << " " << Entry.first;
31 if (Entry.first < 10)
32 TempStream << ",           ";
33 else
34 TempStream << ",          ";
35 TempStream << Entry.second << " ("
36 << format("%.1f", ((double)Entry.second / NumCycles) * 100.0)
37 << "%)\n";
38 }
39
40 TempStream.flush();
41 OS << Buffer;
42 }
43
44 void BackendStatistics::printDispatchUnitStatistics(llvm::raw_ostream &OS) const {
45 std::string Buffer;
46 raw_string_ostream TempStream(Buffer);
47 TempStream << "\n\nDispatch Logic - "
48 << "number of cycles where we saw N instructions dispatched:\n";
49 TempStream << "[# dispatched], [# cycles]\n";
50 for (const std::pair<unsigned, unsigned> &Entry : DispatchGroupSizePerCycle) {
51 TempStream << " " << Entry.first << ", " << Entry.second
52 << " ("
53 << format("%.1f", ((double)Entry.second / NumCycles) * 100.0)
54 << "%)\n";
55 }
56
57 TempStream.flush();
58 OS << Buffer;
59 }
60
61 void BackendStatistics::printSchedulerStatistics(llvm::raw_ostream &OS) const {
62 std::string Buffer;
63 raw_string_ostream TempStream(Buffer);
64 TempStream << "\n\nSchedulers - number of cycles where we saw N instructions "
65 "issued:\n";
66 TempStream << "[# issued], [# cycles]\n";
67 for (const std::pair<unsigned, unsigned> &Entry : IssuedPerCycle) {
68 TempStream << " " << Entry.first << ", " << Entry.second << " ("
69 << format("%.1f", ((double)Entry.second / NumCycles) * 100)
70 << "%)\n";
71 }
72
73 TempStream.flush();
74 OS << Buffer;
75 }
76
77 } // namespace mca
78
0 //===--------------------- BackendStatistics.h ------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file implements a printer class for printing generic Backend
11 /// statistics related to the dispatch logic, scheduler and retire unit.
12 ///
13 /// Example:
14 /// ========
15 ///
16 /// Dispatch Logic - number of cycles where we saw N instructions dispatched:
17 /// [# dispatched], [# cycles]
18 /// 0, 15 (11.5%)
19 /// 5, 4 (3.1%)
20 ///
21 /// Schedulers - number of cycles where we saw N instructions issued:
22 /// [# issued], [# cycles]
23 /// 0, 7 (5.4%)
24 /// 1, 4 (3.1%)
25 /// 2, 8 (6.2%)
26 ///
27 /// Retire Control Unit - number of cycles where we saw N instructions retired:
28 /// [# retired], [# cycles]
29 /// 0, 9 (6.9%)
30 /// 1, 6 (4.6%)
31 /// 2, 1 (0.8%)
32 /// 4, 3 (2.3%)
33 ///
34 //===----------------------------------------------------------------------===//
35
36 #ifndef LLVM_TOOLS_LLVM_MCA_BACKENDSTATISTICS_H
37 #define LLVM_TOOLS_LLVM_MCA_BACKENDSTATISTICS_H
38
39 #include "HWEventListener.h"
40 #include "llvm/Support/raw_ostream.h"
41 #include <map>
42
43 namespace mca {
44
45 class BackendStatistics : public HWEventListener {
46 using Histogram = std::map<unsigned, unsigned>;
47 Histogram DispatchGroupSizePerCycle;
48 Histogram RetiredPerCycle;
49 Histogram IssuedPerCycle;
50
51 unsigned NumDispatched;
52 unsigned NumIssued;
53 unsigned NumRetired;
54 unsigned NumCycles;
55
56 void updateHistograms() {
57 DispatchGroupSizePerCycle[NumDispatched]++;
58 IssuedPerCycle[NumIssued]++;
59 RetiredPerCycle[NumRetired]++;
60 NumDispatched = 0;
61 NumIssued = 0;
62 NumRetired = 0;
63 }
64
65 void printRetireUnitStatistics(llvm::raw_ostream &OS) const;
66 void printDispatchUnitStatistics(llvm::raw_ostream &OS) const;
67 void printSchedulerStatistics(llvm::raw_ostream &OS) const;
68
69 public:
70 BackendStatistics() : NumDispatched(0), NumIssued(0), NumRetired(0), NumCycles(0) {}
71
72 void onInstructionDispatched(unsigned Index) override { NumDispatched++; }
73 void
74 onInstructionIssued(unsigned Index,
75 const llvm::ArrayRef<std::pair<ResourceRef, unsigned>>
76 & /* unused */) override {
77 NumIssued++;
78 }
79 void onInstructionRetired(unsigned Index) override { NumRetired++; }
80
81 void onCycleBegin(unsigned Cycle) override { NumCycles++; }
82
83 void onCycleEnd(unsigned Cycle) override { updateHistograms(); }
84
85 void printHistograms(llvm::raw_ostream &OS) {
86 printDispatchUnitStatistics(OS);
87 printSchedulerStatistics(OS);
88 printRetireUnitStatistics(OS);
89 }
90 };
91
92 } // namespace mca
93
94 #endif
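The `updateHistograms()` pattern above — count events as they arrive during a cycle, then bucket the count when the cycle ends — can be sketched in isolation. `EventHistogram` below is a hypothetical simplification for one event kind, not the llvm-mca class:

```cpp
#include <map>

// Simplified per-cycle event histogram in the spirit of BackendStatistics:
// for each N, count how many cycles saw exactly N events.
class EventHistogram {
  std::map<unsigned, unsigned> PerCycle; // N events -> number of cycles
  unsigned EventsThisCycle = 0;

public:
  // Called once per event (dispatch/issue/retire in the real tool).
  void onEvent() { ++EventsThisCycle; }

  // Called at the end of every cycle: bucket the count and reset it.
  void onCycleEnd() {
    PerCycle[EventsThisCycle]++;
    EventsThisCycle = 0;
  }

  unsigned cyclesWith(unsigned N) const {
    auto It = PerCycle.find(N);
    return It == PerCycle.end() ? 0 : It->second;
  }
};
```

Note that `std::map::operator[]` value-initializes missing entries to zero, which is why the single `PerCycle[EventsThisCycle]++` works for both new and existing buckets — the same idiom the real `updateHistograms()` relies on.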
0 set(LLVM_LINK_COMPONENTS
1 AllTargetsAsmPrinters
2 AllTargetsAsmParsers
3 AllTargetsDescs
4 AllTargetsDisassemblers
5 AllTargetsInfos
6 MC
7 MCParser
8 Support
9 )
10
11 add_llvm_tool(llvm-mca
12 Backend.cpp
13 BackendPrinter.cpp
14 BackendStatistics.cpp
15 Dispatch.cpp
16 HWEventListener.cpp
17 InstrBuilder.cpp
18 Instruction.cpp
19 LSUnit.cpp
20 llvm-mca.cpp
21 ResourcePressureView.cpp
22 Scheduler.cpp
23 TimelineView.cpp
24 )
0 //===--------------------- Dispatch.cpp -------------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file implements methods declared by class RegisterFile, DispatchUnit
11 /// and RetireControlUnit.
12 ///
13 //===----------------------------------------------------------------------===//
14
15 #include "Dispatch.h"
16 #include "Backend.h"
17 #include "Scheduler.h"
18 #include "llvm/Support/Debug.h"
19
20 using namespace llvm;
21
22 #define DEBUG_TYPE "llvm-mca"
23
24 namespace mca {
25
26 void RegisterFile::addRegisterMapping(WriteState &WS) {
27 unsigned RegID = WS.getRegisterID();
28 assert(RegID && "Adding an invalid register definition?");
29
30 RegisterMappings[RegID] = &WS;
31 for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I)
32 RegisterMappings[*I] = &WS;
33 if (MaxUsedMappings == NumUsedMappings)
34 MaxUsedMappings++;
35 NumUsedMappings++;
36 TotalMappingsCreated++;
37 // If this is a partial update, then we are done.
38 if (!WS.fullyUpdatesSuperRegs())
39 return;
40
41 for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I)
42 RegisterMappings[*I] = &WS;
43 }
44
45 void RegisterFile::invalidateRegisterMapping(const WriteState &WS) {
46 unsigned RegID = WS.getRegisterID();
47 bool ShouldInvalidateSuperRegs = WS.fullyUpdatesSuperRegs();
48
49 assert(RegID != 0 && "Invalidating an already invalid register?");
50 assert(WS.getCyclesLeft() != -512 &&
51 "Invalidating a write of unknown cycles!");
52 assert(WS.getCyclesLeft() <= 0 && "Invalid cycles left for this write!");
53 if (!RegisterMappings[RegID])
54 return;
55
56 assert(NumUsedMappings);
57 NumUsedMappings--;
58
59 if (RegisterMappings[RegID] == &WS)
60 RegisterMappings[RegID] = nullptr;
61
62 for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I)
63 if (RegisterMappings[*I] == &WS)
64 RegisterMappings[*I] = nullptr;
65
66 if (!ShouldInvalidateSuperRegs)
67 return;
68
69 for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I)
70 if (RegisterMappings[*I] == &WS)
71 RegisterMappings[*I] = nullptr;
72 }
73
74 // Updates the number of used mappings when an instruction is retired.
75 // This method delegates to the register file the task of invalidating
76 // register mappings that were created for instruction IS.
77 void DispatchUnit::invalidateRegisterMappings(const Instruction &IS) {
78 for (const std::unique_ptr<WriteState> &WS : IS.getDefs()) {
79 DEBUG(dbgs() << "[RAT] Invalidating mapping for: ");
80 DEBUG(WS->dump());
81 RAT->invalidateRegisterMapping(*WS.get());
82 }
83 }
84
85 void RegisterFile::collectWrites(SmallVectorImpl &Writes,
86 unsigned RegID) const {
87 assert(RegID && RegID < RegisterMappings.size());
88 WriteState *WS = RegisterMappings[RegID];
89 if (WS) {
90 DEBUG(dbgs() << "Found a dependent use of RegID=" << RegID << '\n');
91 Writes.push_back(WS);
92 }
93
94 // Handle potential partial register updates.
95 for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I) {
96 WS = RegisterMappings[*I];
97 if (WS && std::find(Writes.begin(), Writes.end(), WS) == Writes.end()) {
98 DEBUG(dbgs() << "Found a dependent use of subReg " << *I << " (part of "
99 << RegID << ")\n");
100 Writes.push_back(WS);
101 }
102 }
103 }
104
105 bool RegisterFile::isAvailable(unsigned NumRegWrites) {
106 if (!TotalMappings)
107 return true;
108 if (NumRegWrites > TotalMappings) {
109 // The user specified too small a number of registers.
110 // Artificially set the number of temporaries to NumRegWrites.
111 errs() << "warning: not enough temporaries in the register file. "
112 << "The register file size has been automatically increased to "
113 << NumRegWrites << '\n';
114 TotalMappings = NumRegWrites;
115 }
116
117 return NumRegWrites + NumUsedMappings <= TotalMappings;
118 }
119
120 #ifndef NDEBUG
121 void RegisterFile::dump() const {
122 for (unsigned I = 0, E = MRI.getNumRegs(); I < E; ++I)
123 if (RegisterMappings[I]) {
124 dbgs() << MRI.getName(I) << ", " << I << ", ";
125 RegisterMappings[I]->dump();
126 }
127
128 dbgs() << "TotalMappingsCreated: " << TotalMappingsCreated
129 << ", MaxUsedMappings: " << MaxUsedMappings
130 << ", NumUsedMappings: " << NumUsedMappings << '\n';
131 }
132 #endif
133
134 // Reserves a number of slots, and returns a new token.
135 unsigned RetireControlUnit::reserveSlot(unsigned Index, unsigned NumMicroOps) {
136 assert(isAvailable(NumMicroOps));
137 unsigned NormalizedQuantity =
138 std::min(NumMicroOps, static_cast<unsigned>(Queue.size()));
139 // Zero latency instructions may have zero mOps. Artificially bump this
140 // value to 1. Although zero latency instructions don't consume scheduler
141 // resources, they still consume one slot in the retire queue.
142 NormalizedQuantity = std::max(NormalizedQuantity, 1U);
143 unsigned TokenID = NextAvailableSlotIdx;
144 Queue[NextAvailableSlotIdx] = {Index, NormalizedQuantity, false};
145 NextAvailableSlotIdx += NormalizedQuantity;
146 NextAvailableSlotIdx %= Queue.size();
147 AvailableSlots -= NormalizedQuantity;
148 return TokenID;
149 }
150
151 void DispatchUnit::notifyInstructionDispatched(unsigned Index) {
152 Owner->notifyInstructionDispatched(Index);
153 }
154
155 void DispatchUnit::notifyInstructionRetired(unsigned Index) {
156 Owner->notifyInstructionRetired(Index);
157 }
158
159 void RetireControlUnit::cycleEvent() {
160 if (isEmpty())
161 return;
162
163 unsigned NumRetired = 0;
164 while (!isEmpty()) {
165 if (MaxRetirePerCycle != 0 && NumRetired == MaxRetirePerCycle)
166 break;
167 RUToken &Current = Queue[CurrentInstructionSlotIdx];
168 assert(Current.NumSlots && "Reserved zero slots?");
169 if (!Current.Executed)
170 break;
171 Owner->notifyInstructionRetired(Current.Index);
172 CurrentInstructionSlotIdx += Current.NumSlots;
173 CurrentInstructionSlotIdx %= Queue.size();
174 AvailableSlots += Current.NumSlots;
175 NumRetired++;
176 }
177 }
178
179 void RetireControlUnit::onInstructionExecuted(unsigned TokenID) {
180 assert(Queue.size() > TokenID);
181 assert(Queue[TokenID].Executed == false && Queue[TokenID].Index != ~0U);
182 Queue[TokenID].Executed = true;
183 }
184
185 #ifndef NDEBUG
186 void RetireControlUnit::dump() const {
187 dbgs() << "Retire Unit: { Total Slots=" << Queue.size()
188 << ", Available Slots=" << AvailableSlots << " }\n";
189 }
190 #endif
191
192 bool DispatchUnit::checkRAT(const InstrDesc &Desc) {
193 unsigned NumWrites = Desc.Writes.size();
194 if (RAT->isAvailable(NumWrites))
195 return true;
196 DispatchStalls[DS_RAT_REG_UNAVAILABLE]++;
197 return false;
198 }
199
200 bool DispatchUnit::checkRCU(const InstrDesc &Desc) {
201 unsigned NumMicroOps = Desc.NumMicroOps;
202 if (RCU->isAvailable(NumMicroOps))
203 return true;
204 DispatchStalls[DS_RCU_TOKEN_UNAVAILABLE]++;
205 return false;
206 }
207
208 bool DispatchUnit::checkScheduler(const InstrDesc &Desc) {
209 // If this is a zero-latency instruction, then it bypasses
210 // the scheduler.
211 switch (SC->canBeDispatched(Desc)) {
212 case Scheduler::HWS_AVAILABLE:
213 return true;
214 case Scheduler::HWS_QUEUE_UNAVAILABLE:
215 DispatchStalls[DS_SQ_TOKEN_UNAVAILABLE]++;
216 break;
217 case Scheduler::HWS_LD_QUEUE_UNAVAILABLE:
218 DispatchStalls[DS_LDQ_TOKEN_UNAVAILABLE]++;
219 break;
220 case Scheduler::HWS_ST_QUEUE_UNAVAILABLE:
221 DispatchStalls[DS_STQ_TOKEN_UNAVAILABLE]++;
222 break;
223 case Scheduler::HWS_DISPATCH_GROUP_RESTRICTION:
224 DispatchStalls[DS_DISPATCH_GROUP_RESTRICTION]++;
225 }
226
227 return false;
228 }
229
230 unsigned DispatchUnit::dispatch(unsigned IID, Instruction *NewInst) {
231 assert(!CarryOver && "Cannot dispatch another instruction!");
232 unsigned NumMicroOps = NewInst->getDesc().NumMicroOps;
233 if (NumMicroOps > DispatchWidth) {
234 assert(AvailableEntries == DispatchWidth);
235 AvailableEntries = 0;
236 CarryOver = NumMicroOps - DispatchWidth;
237 } else {
238 assert(AvailableEntries >= NumMicroOps);
239 AvailableEntries -= NumMicroOps;
240 }
241
242 // Reserve slots in the RCU.
243 unsigned RCUTokenID = RCU->reserveSlot(IID, NumMicroOps);
244 Owner->notifyInstructionDispatched(IID);
245
246 SC->scheduleInstruction(IID, NewInst);
247 return RCUTokenID;
248 }
249
250 #ifndef NDEBUG
251 void DispatchUnit::dump() const {
252 RAT->dump();
253 RCU->dump();
254
255 unsigned DSRAT = DispatchStalls[DS_RAT_REG_UNAVAILABLE];
256 unsigned DSRCU = DispatchStalls[DS_RCU_TOKEN_UNAVAILABLE];
257 unsigned DSSCHEDQ = DispatchStalls[DS_SQ_TOKEN_UNAVAILABLE];
258 unsigned DSLQ = DispatchStalls[DS_LDQ_TOKEN_UNAVAILABLE];
259 unsigned DSSQ = DispatchStalls[DS_STQ_TOKEN_UNAVAILABLE];
260
261 dbgs() << "STALLS --- RAT: " << DSRAT << ", RCU: " << DSRCU
262 << ", SCHED_QUEUE: " << DSSCHEDQ << ", LOAD_QUEUE: " << DSLQ
263 << ", STORE_QUEUE: " << DSSQ << '\n';
264 }
265 #endif
266
267 } // namespace mca
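The `reserveSlot`/`cycleEvent` pair above implements a circular slot queue over the reorder buffer: a reservation consumes a contiguous run of slots (capped at the queue size, bumped to at least one for zero-uOp instructions) and returns the starting index as a token; retirement frees runs in program order. `SlotQueue` below is a hypothetical simplification of that bookkeeping, dropping the `Executed` flag and the owner notifications:

```cpp
#include <vector>

// Minimal sketch of the RetireControlUnit's circular slot queue.
class SlotQueue {
  std::vector<unsigned> Slots;       // run length recorded at each token index
  unsigned NextIdx = 0;              // next free slot (reservation cursor)
  unsigned RetireIdx = 0;            // oldest in-flight token (retire cursor)
  unsigned Available;

public:
  explicit SlotQueue(unsigned NumSlots)
      : Slots(NumSlots, 0), Available(NumSlots) {}

  // Reserve NumMicroOps slots; returns the token (starting index).
  unsigned reserveSlot(unsigned NumMicroOps) {
    unsigned Quantity = NumMicroOps;
    if (Quantity > static_cast<unsigned>(Slots.size()))
      Quantity = static_cast<unsigned>(Slots.size()); // cap at buffer size
    if (Quantity == 0)
      Quantity = 1; // zero-uOp instructions still take one retire slot
    unsigned Token = NextIdx;
    Slots[Token] = Quantity;
    NextIdx = (NextIdx + Quantity) % Slots.size();
    Available -= Quantity;
    return Token;
  }

  // Free the oldest reservation, advancing the retire cursor in order.
  void retireOldest() {
    unsigned Quantity = Slots[RetireIdx];
    Slots[RetireIdx] = 0;
    RetireIdx = (RetireIdx + Quantity) % Slots.size();
    Available += Quantity;
  }

  unsigned available() const { return Available; }
};
```

Because both cursors advance by the recorded run length modulo the queue size, tokens wrap around naturally, which is the same trick `NextAvailableSlotIdx` and `CurrentInstructionSlotIdx` use in the real class.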
0 //===----------------------- Dispatch.h -------------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file implements classes that are used to model register files,
11 /// reorder buffers and the hardware dispatch logic.
12 ///
13 //===----------------------------------------------------------------------===//
14
15 #ifndef LLVM_TOOLS_LLVM_MCA_DISPATCH_H
16 #define LLVM_TOOLS_LLVM_MCA_DISPATCH_H
17
18 #include "Instruction.h"
19 #include "llvm/MC/MCRegisterInfo.h"
20 #include <map>
21
22 namespace mca {
23
24 class WriteState;
25 class DispatchUnit;
26 class Scheduler;
27 class Backend;
28
29 /// \brief Keeps track of register definitions.
30 ///
31 /// This class tracks register definitions, and performs register renaming
32 /// to break anti dependencies.
33 /// By default, there is no limit on the number of register aliases which
34 /// can be created for the purpose of register renaming. However, users can
35 /// specify at object construction time a limit on the number of temporary
36 /// registers which can be used by the register renaming logic.
37 class RegisterFile {
38 const llvm::MCRegisterInfo &MRI;
39 // Currently used mappings and maximum used mappings.
40 // These are to generate statistics only.
41 unsigned NumUsedMappings;
42 unsigned MaxUsedMappings;
43 // Total number of mappings created over time.
44 unsigned TotalMappingsCreated;
45
46 // The maximum number of register aliases which can be used by the
47 // register renamer. The default value for this field is zero.
48 // A value of zero for this field means that there is no limit on the
49 // number of register mappings which can be created. That is equivalent
50 // to having a theoretically infinite number of temporary registers.
51 unsigned TotalMappings;
52
53 // This map contains an entry for every physical register.
54 // A register index is used as a key value to access a WriteState.
55 // This is how we track RAW dependencies for dispatched
56 // instructions. For every register, we track the last seen write only.
57 // This assumes that all writes fully update both super and sub registers.
58 // We need a flag in MCInstrDesc to check if a write also updates super
59 // registers. We can then have an extra tablegen flag to set for instructions.
60 // This is a separate patch on its own.
61 std::vector<WriteState *> RegisterMappings;
62 // Assumptions are:
63 // a) false dependencies are always removed by the register renamer.
64 // b) the register renamer can create an "infinite" number of mappings.
65 // Since we track the number of mappings created, in future we may
66 // introduce constraints on the number of mappings that can be created.
67 // For example, the maximum number of registers that are available for
68 // register renaming purposes may default to the size of the register file.
69
70 // In future, we can extend this design to allow multiple register files, and
71 // apply different restrictions on the register mappings and the number of
72 // temporary registers used by mappings.
73
74 public:
75 RegisterFile(const llvm::MCRegisterInfo &mri, unsigned Mappings = 0)
76 : MRI(mri), NumUsedMappings(0), MaxUsedMappings(0),
77 TotalMappingsCreated(0), TotalMappings(Mappings),
78 RegisterMappings(MRI.getNumRegs(), nullptr) {}
79
80 // Creates a new register mapping for RegID.
81 // This reserves a temporary register in the register file.
82 void addRegisterMapping(WriteState &WS);
83
84 // Invalidates register mappings associated to the input WriteState object.
85 // This releases temporary registers in the register file.
86 void invalidateRegisterMapping(const WriteState &WS);
87
88 bool isAvailable(unsigned NumRegWrites);
89 void collectWrites(llvm::SmallVectorImpl &Writes,
90 unsigned RegID) const;
91 void updateOnRead(ReadState &RS, unsigned RegID);
92 unsigned getMaxUsedRegisterMappings() const { return MaxUsedMappings; }
93 unsigned getTotalRegisterMappingsCreated() const {
94 return TotalMappingsCreated;
95 }
96
97 #ifndef NDEBUG
98 void dump() const;
99 #endif
100 };
101
102 /// \brief Tracks which instructions are in-flight (i.e. dispatched but not
103 /// retired) in the OoO backend.
104 ///
105 /// This class checks on every cycle if/which instructions can be retired.
106 /// Instructions are retired in program order.
107 /// When an instruction is retired, the DispatchUnit object that owns
108 /// this RetireControlUnit gets notified.
109 /// On retirement, register updates are architecturally
110 /// committed, and any temporary registers originally allocated for the
111 /// retired instruction are freed.
112 struct RetireControlUnit {
113 // A "token" (object of class RUToken) is created by the retire unit for every
114 // instruction dispatched to the schedulers. Flag 'Executed' is used to
115 // quickly check if an instruction has reached the write-back stage. A token
116 // also carries information related to the number of entries consumed by the
117 // instruction in the reorder buffer. The idea is that those entries will
118 // become available again once the instruction is retired. On every cycle,
119 // the RCU (Retire Control Unit) scans every token, searching for
120 // instructions that are ready to retire. Instructions are retired
121 // in program order. Only 'Executed' instructions are eligible for retire.
122 // Note that the size of the reorder buffer is defined by the scheduling model
123 // via field 'MicroOpBufferSize'.
124 struct RUToken {
125 unsigned Index; // Instruction index.
126 unsigned NumSlots; // Slots reserved to this instruction.
127 bool Executed; // True if the instruction is past the WB stage.
128 };
129
130 private:
131 unsigned NextAvailableSlotIdx;
132 unsigned CurrentInstructionSlotIdx;
133 unsigned AvailableSlots;
134 unsigned MaxRetirePerCycle; // 0 means no limit.
135 std::vector<RUToken> Queue;
136 DispatchUnit *Owner;
137
138 public:
139 RetireControlUnit(unsigned NumSlots, unsigned RPC, DispatchUnit *DU)
140 : NextAvailableSlotIdx(0), CurrentInstructionSlotIdx(0),
141 AvailableSlots(NumSlots), MaxRetirePerCycle(RPC), Owner(DU) {
142 assert(NumSlots && "Expected at least one slot!");
143 Queue.resize(NumSlots);
144 }
145
146 bool isFull() const { return !AvailableSlots; }
147 bool isEmpty() const { return AvailableSlots == Queue.size(); }
148 bool isAvailable(unsigned Quantity = 1) const {
149 // Some instructions may declare a number of uOps which exceeds the size
150 // of the reorder buffer. To avoid problems, cap the number of slots to
151 // the size of the reorder buffer.
152 Quantity = std::min(Quantity, static_cast<unsigned>(Queue.size()));
153 return AvailableSlots >= Quantity;
154 }
155
156 // Reserves a number of slots, and returns a new token.
157 unsigned reserveSlot(unsigned Index, unsigned NumMicroOps);
158
159 /// Retires instructions in program order.
160 void cycleEvent();
161
162 void onInstructionExecuted(unsigned TokenID);
163
164 #ifndef NDEBUG
165 void dump() const;
166 #endif
167 };
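The circular bookkeeping that the comments above describe can be sketched outside LLVM. Below is a minimal stand-alone model of the slot reservation and in-order retirement logic (the class and method names are illustrative, not llvm-mca's API; the `MaxRetirePerCycle` limit and the owner notification are omitted):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One token per dispatched instruction, as in RetireControlUnit::RUToken.
struct Token {
  unsigned NumSlots; // Reorder-buffer entries held by the instruction.
  bool Executed;     // True once the instruction reaches write-back.
};

class MiniRCU {
  std::vector<Token> Queue;
  unsigned NextFree = 0; // Slot where the next token is written.
  unsigned Oldest = 0;   // Oldest in-flight instruction (program order).
  unsigned Available;    // Free reorder-buffer entries.

public:
  explicit MiniRCU(unsigned Size) : Queue(Size), Available(Size) {}

  // Reserve entries for a dispatched instruction; returns its token index.
  unsigned reserveSlot(unsigned NumMicroOps) {
    unsigned Slots = std::min<unsigned>(NumMicroOps, Queue.size());
    assert(Available >= Slots && "Reorder buffer is full!");
    unsigned TokenID = NextFree;
    Queue[TokenID] = {Slots, false};
    NextFree = (NextFree + Slots) % Queue.size();
    Available -= Slots;
    return TokenID;
  }

  void markExecuted(unsigned TokenID) { Queue[TokenID].Executed = true; }

  // Retire in program order: stop at the first non-executed instruction.
  unsigned cycleEvent() {
    unsigned Retired = 0;
    while (Available < Queue.size() && Queue[Oldest].Executed) {
      unsigned Slots = Queue[Oldest].NumSlots;
      Available += Slots;
      Oldest = (Oldest + Slots) % Queue.size();
      ++Retired;
    }
    return Retired;
  }
};
```

Note how a younger executed instruction cannot retire while an older one is still in flight, which is exactly the in-order constraint stated in the comment block.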
168
169 // \brief Implements the hardware dispatch logic.
170 //
171 // This class is responsible for the dispatch stage, in which instructions are
172 // dispatched in groups to the Scheduler. An instruction can be dispatched if
173 // functional units are available.
174 // To be more specific, an instruction can be dispatched to the Scheduler if:
175 // 1) There are enough entries in the reorder buffer (implemented by class
176 // RetireControlUnit) to accommodate all opcodes.
177 // 2) There are enough temporaries to rename output register operands.
178 // 3) There are enough entries available in the used buffered resource(s).
179 //
180 // The number of micro opcodes that can be dispatched in one cycle is limited by
181 // the value of field 'DispatchWidth'. A "dynamic dispatch stall" occurs when
182 // processor resources are not available (i.e. at least one of the
183 // above-mentioned checks fails). Dispatch stall events are counted during the
184 // entire execution of the code, and displayed by the performance report when
185 // flag '-verbose' is specified.
186 //
187 // If the number of micro opcodes of an instruction is bigger than
188 // DispatchWidth, then it can only be dispatched at the beginning of one cycle.
189 // The DispatchUnit will still have to wait for a number of cycles (depending on
190 // the DispatchWidth and the number of micro opcodes) before it can serve other
191 // instructions.
192 class DispatchUnit {
193 unsigned DispatchWidth;
194 unsigned AvailableEntries;
195 unsigned CarryOver;
196 Scheduler *SC;
197
198 std::unique_ptr<RegisterFile> RAT;
199 std::unique_ptr<RetireControlUnit> RCU;
200 Backend *Owner;
201
202 /// Dispatch stall event identifiers.
203 ///
204 /// The naming convention is:
205 /// * Event names starts with the "DS_" prefix
206 /// * For dynamic dispatch stalls, the "DS_" prefix is followed by the
207 /// unavailable resource/functional unit acronym (example: RAT)
208 /// * The last substring is the event reason (example: REG_UNAVAILABLE means
209 /// that register renaming couldn't find enough spare registers in the
210 /// register file).
211 ///
212 /// List of acronyms used for processor resources:
213 /// RAT - Register Alias Table (used by the register renaming logic)
214 /// RCU - Retire Control Unit
215 /// SQ - Scheduler's Queue
216 /// LDQ - Load Queue
217 /// STQ - Store Queue
218 enum {
219 DS_RAT_REG_UNAVAILABLE,
220 DS_RCU_TOKEN_UNAVAILABLE,
221 DS_SQ_TOKEN_UNAVAILABLE,
222 DS_LDQ_TOKEN_UNAVAILABLE,
223 DS_STQ_TOKEN_UNAVAILABLE,
224 DS_DISPATCH_GROUP_RESTRICTION,
225 DS_LAST
226 };
227
228 // The DispatchUnit tracks dispatch stall events caused by the unavailability
229 // of hardware resources. Events are classified based on the stall kind,
230 // so there is a counter for every source of dispatch stall. Counters are
231 // stored in vector `DispatchStalls`, which always has size DS_LAST.
232 std::vector<unsigned> DispatchStalls;
233
234 bool checkRAT(const InstrDesc &Desc);
235 bool checkRCU(const InstrDesc &Desc);
236 bool checkScheduler(const InstrDesc &Desc);
237
238 void notifyInstructionDispatched(unsigned IID);
239
240 public:
241 DispatchUnit(Backend *B, const llvm::MCRegisterInfo &MRI,
242 unsigned MicroOpBufferSize, unsigned RegisterFileSize,
243 unsigned MaxRetirePerCycle, unsigned MaxDispatchWidth,
244 Scheduler *Sched)
245 : DispatchWidth(MaxDispatchWidth), AvailableEntries(MaxDispatchWidth),
246 CarryOver(0U), SC(Sched),
247 RAT(llvm::make_unique<RegisterFile>(MRI, RegisterFileSize)),
248 RCU(llvm::make_unique<RetireControlUnit>(MicroOpBufferSize,
249 MaxRetirePerCycle, this)),
250 Owner(B), DispatchStalls(DS_LAST, 0) {}
251
252 unsigned getDispatchWidth() const { return DispatchWidth; }
253
254 bool isAvailable(unsigned NumEntries) const {
255 return NumEntries <= AvailableEntries || AvailableEntries == DispatchWidth;
256 }
257
258 bool isRCUEmpty() const { return RCU->isEmpty(); }
259
260 bool canDispatch(const InstrDesc &Desc) {
261 assert(isAvailable(Desc.NumMicroOps));
262 return checkRCU(Desc) && checkRAT(Desc) && checkScheduler(Desc);
263 }
264
265 unsigned dispatch(unsigned IID, Instruction *NewInst);
266
267 void collectWrites(llvm::SmallVectorImpl<WriteState *> &Vec,
268 unsigned RegID) const {
269 return RAT->collectWrites(Vec, RegID);
270 }
271 unsigned getNumRATStalls() const {
272 return DispatchStalls[DS_RAT_REG_UNAVAILABLE];
273 }
274 unsigned getNumRCUStalls() const {
275 return DispatchStalls[DS_RCU_TOKEN_UNAVAILABLE];
276 }
277 unsigned getNumSQStalls() const {
278 return DispatchStalls[DS_SQ_TOKEN_UNAVAILABLE];
279 }
280 unsigned getNumLDQStalls() const {
281 return DispatchStalls[DS_LDQ_TOKEN_UNAVAILABLE];
282 }
283 unsigned getNumSTQStalls() const {
284 return DispatchStalls[DS_STQ_TOKEN_UNAVAILABLE];
285 }
286 unsigned getNumDispatchGroupStalls() const {
287 return DispatchStalls[DS_DISPATCH_GROUP_RESTRICTION];
288 }
289 unsigned getMaxUsedRegisterMappings() const {
290 return RAT->getMaxUsedRegisterMappings();
291 }
292 unsigned getTotalRegisterMappingsCreated() const {
293 return RAT->getTotalRegisterMappingsCreated();
294 }
295 void addNewRegisterMapping(WriteState &WS) { RAT->addRegisterMapping(WS); }
296
297 void cycleEvent(unsigned Cycle) {
298 RCU->cycleEvent();
299 AvailableEntries =
300 CarryOver >= DispatchWidth ? 0 : DispatchWidth - CarryOver;
301 CarryOver = CarryOver >= DispatchWidth ? CarryOver - DispatchWidth : 0U;
302 }
303
304 void notifyInstructionRetired(unsigned Index);
305
306 void onInstructionExecuted(unsigned TokenID) {
307 RCU->onInstructionExecuted(TokenID);
308 }
309
310 void invalidateRegisterMappings(const Instruction &Inst);
311 #ifndef NDEBUG
312 void dump() const;
313 #endif
314 };
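The carry-over arithmetic in `cycleEvent` above implements the rule from the class comment: an instruction wider than `DispatchWidth` consumes whole dispatch cycles before other instructions can be served. A stand-alone sketch of that arithmetic (the free function is illustrative; llvm-mca keeps this state inside the class):

```cpp
#include <cassert>
#include <utility>

// Given the dispatch width and the micro-ops still owed by a previously
// dispatched wide instruction, compute (entries available this cycle,
// carry-over remaining for the next cycle). Mirrors the two assignments in
// DispatchUnit::cycleEvent.
std::pair<unsigned, unsigned> beginCycle(unsigned DispatchWidth,
                                         unsigned CarryOver) {
  unsigned Available =
      CarryOver >= DispatchWidth ? 0 : DispatchWidth - CarryOver;
  unsigned NextCarryOver =
      CarryOver >= DispatchWidth ? CarryOver - DispatchWidth : 0;
  return {Available, NextCarryOver};
}
```

For example, a 6-uop instruction on a 4-wide dispatcher leaves a carry-over of 6: the next cycle has 0 free entries and still owes 2 uops, so the cycle after that starts with only 2 entries available.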
315
316 } // namespace mca
317
318 #endif
0 //===----------------------- HWEventListener.cpp ----------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file defines a vtable anchor for struct HWEventListener.
11 ///
12 //===----------------------------------------------------------------------===//
13
14 #include "HWEventListener.h"
15
16 namespace mca {
17
18 // Anchor the vtable here.
19 void HWEventListener::anchor() {}
20
21 } // namespace mca
0
1 //===----------------------- HWEventListener.h ------------------*- C++ -*-===//
2 //
3 // The LLVM Compiler Infrastructure
4 //
5 // This file is distributed under the University of Illinois Open Source
6 // License. See LICENSE.TXT for details.
7 //
8 //===----------------------------------------------------------------------===//
9 /// \file
10 ///
11 /// This file defines the main interface for hardware event listeners.
12 ///
13 //===----------------------------------------------------------------------===//
14
15 #ifndef LLVM_TOOLS_LLVM_MCA_HWEVENTLISTENER_H
16 #define LLVM_TOOLS_LLVM_MCA_HWEVENTLISTENER_H
17
18 #include "llvm/ADT/ArrayRef.h"
19 #include <utility>
20
21 namespace mca {
22
23 struct HWEventListener {
24 // Events generated by the Retire Control Unit.
25 virtual void onInstructionRetired(unsigned Index) {};
26
27 // Events generated by the Scheduler.
28 using ResourceRef = std::pair<uint64_t, uint64_t>;
29 virtual void
30 onInstructionIssued(unsigned Index,
31 const llvm::ArrayRef<std::pair<ResourceRef, unsigned>> &Used) {}
32 virtual void onInstructionExecuted(unsigned Index) {}
33 virtual void onInstructionReady(unsigned Index) {}
34 virtual void onResourceAvailable(const ResourceRef &RRef) {};
35
36 // Events generated by the Dispatch logic.
37 virtual void onInstructionDispatched(unsigned Index) {}
38
39 // Generic events generated by the Backend.
40 virtual void onCycleBegin(unsigned Cycle) {}
41 virtual void onCycleEnd(unsigned Cycle) {}
42
43 virtual ~HWEventListener() = default;
44 virtual void anchor();
45 };
46
47 } // namespace mca
48
49 #endif
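Consumers of the interface above subclass `HWEventListener` and override only the callbacks they care about, relying on the no-op defaults for the rest. A stand-alone mimic of that pattern (these are not the llvm-mca types, just the same shape):

```cpp
#include <cassert>

// Base class with no-op virtual callbacks, as in HWEventListener.
struct Listener {
  virtual void onInstructionRetired(unsigned Index) {}
  virtual void onCycleEnd(unsigned Cycle) {}
  virtual ~Listener() = default;
};

// A listener that only counts retirements; every other event falls through
// to the empty default implementation.
struct RetireCounter : Listener {
  unsigned Retired = 0;
  void onInstructionRetired(unsigned) override { ++Retired; }
};
```

The backend holds listeners by the base type and broadcasts events to them, so adding a new statistics view never requires touching the simulation logic.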
0 //===--------------------- InstrBuilder.cpp ---------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file implements the InstrBuilder interface.
11 ///
12 //===----------------------------------------------------------------------===//
13
14 #include "InstrBuilder.h"
15 #include "llvm/MC/MCInst.h"
16 #include "llvm/Support/Debug.h"
17 #include "llvm/Support/raw_ostream.h"
18
19 #define DEBUG_TYPE "llvm-mca"
20
21 namespace mca {
22
23 using namespace llvm;
24
25 static void
26 initializeUsedResources(InstrDesc &ID, const MCSchedClassDesc &SCDesc,
27 const MCSubtargetInfo &STI,
28 const ArrayRef<uint64_t> ProcResourceMasks) {
29 const MCSchedModel &SM = STI.getSchedModel();
30
31 // Populate resources consumed.
32 using ResourcePlusCycles = std::pair<uint64_t, ResourceUsage>;
33 std::vector Worklist;
34 for (unsigned I = 0, E = SCDesc.NumWriteProcResEntries; I < E; ++I) {
35 const MCWriteProcResEntry *PRE = STI.getWriteProcResBegin(&SCDesc) + I;
36 const MCProcResourceDesc &PR = *SM.getProcResource(PRE->ProcResourceIdx);
37 uint64_t Mask = ProcResourceMasks[PRE->ProcResourceIdx];
38 if (PR.BufferSize != -1)
39 ID.Buffers.push_back(Mask);
40 CycleSegment RCy(0, PRE->Cycles, false);
41 Worklist.emplace_back(ResourcePlusCycles(Mask, ResourceUsage(RCy)));
42 }
43
44 // Sort elements by mask popcount, so that we prioritize resource units over
45 // resource groups, and smaller groups over larger groups.
46 std::sort(Worklist.begin(), Worklist.end(),
47 [](const ResourcePlusCycles &A, const ResourcePlusCycles &B) {
48 unsigned popcntA = countPopulation(A.first);
49 unsigned popcntB = countPopulation(B.first);
50 if (popcntA < popcntB)
51 return true;
52 if (popcntA > popcntB)
53 return false;
54 return A.first < B.first;
55 });
56
57 uint64_t UsedResourceUnits = 0;
58
59 // Remove cycles contributed by smaller resources.
60 for (unsigned I = 0, E = Worklist.size(); I < E; ++I) {
61 ResourcePlusCycles &A = Worklist[I];
62 if (!A.second.size()) {
63 A.second.NumUnits = 0;
64 A.second.setReserved();
65 ID.Resources.emplace_back(A);
66 continue;
67 }
68
69 ID.Resources.emplace_back(A);
70 uint64_t NormalizedMask = A.first;
71 if (countPopulation(A.first) == 1) {
72 UsedResourceUnits |= A.first;
73 } else {
74 // Remove the leading 1 from the resource group mask.
75 NormalizedMask ^= PowerOf2Floor(NormalizedMask);
76 }
77
78 for (unsigned J = I + 1; J < E; ++J) {
79 ResourcePlusCycles &B = Worklist[J];
80 if ((NormalizedMask & B.first) == NormalizedMask) {
81 B.second.CS.Subtract(A.second.size());
82 if (countPopulation(B.first) > 1)
83 B.second.NumUnits++;
84 }
85 }
86 }
87
88 // A SchedWrite may specify a number of cycles in which a resource group
89 // is reserved. For example (on target x86; cpu Haswell):
90 //
91 // SchedWriteRes<[HWPort0, HWPort1, HWPort01]> {
92 // let ResourceCycles = [2, 2, 3];
93 // }
94 //
95 // This means:
96 // Resource units HWPort0 and HWPort1 are both used for 2cy.
97 // Resource group HWPort01 is the union of HWPort0 and HWPort1.
98 // Since this write touches both HWPort0 and HWPort1 for 2cy, HWPort01
99 // will not be usable for 2 entire cycles from instruction issue.
100 //
101 // On top of those 2cy, SchedWriteRes explicitly specifies an extra latency
102 // of 3 cycles for HWPort01. This tool assumes that the 3cy latency is an
103 // extra delay on top of the 2 cycles latency.
104 // During those extra cycles, HWPort01 is not usable by other instructions.
105 for (ResourcePlusCycles &RPC : ID.Resources) {
106 if (countPopulation(RPC.first) > 1 && !RPC.second.isReserved()) {
107 // Remove the leading 1 from the resource group mask.
108 uint64_t Mask = RPC.first ^ PowerOf2Floor(RPC.first);
109 if ((Mask & UsedResourceUnits) == Mask)
110 RPC.second.setReserved();
111 }
112 }
113
114 DEBUG(
115 for (const std::pair<uint64_t, ResourceUsage> &R : ID.Resources)
116 dbgs() << "\t\tMask=" << R.first << ", cy=" << R.second.size() << '\n';
117 for (const uint64_t R : ID.Buffers)
118 dbgs() << "\t\tBuffer Mask=" << R << '\n';
119 );
120 }
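The sorting and cycle-subtraction logic above relies on two mask properties: a resource unit has a single-bit mask while a group has a multi-bit mask, and removing a group's leading bit exposes the masks of the units it contains. A sketch of those helpers with portable stand-ins for `llvm::countPopulation` and `llvm::PowerOf2Floor` (helper names are illustrative):

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Count set bits (stand-in for llvm::countPopulation).
static unsigned popcnt(uint64_t V) {
  return static_cast<unsigned>(std::bitset<64>(V).count());
}

// Largest power of two <= V (stand-in for llvm::PowerOf2Floor): clears the
// lowest set bit until only the highest remains.
static uint64_t pow2Floor(uint64_t V) {
  while (V & (V - 1))
    V &= V - 1;
  return V;
}

// A group mask with its leading bit removed exposes the units it contains,
// e.g. mask(HWPort01) = {group bit} | mask(HWPort0) | mask(HWPort1).
static uint64_t groupUnits(uint64_t GroupMask) {
  return GroupMask ^ pow2Floor(GroupMask);
}
```

With these helpers, sorting by popcount puts units (popcount 1) before groups, and `(groupUnits(G) & UsedResourceUnits) == groupUnits(G)` detects a group fully covered by already-consumed units, as in the reservation check above.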
121
122 static void computeMaxLatency(InstrDesc &ID, const MCInstrDesc &MCDesc,
123 const MCSchedClassDesc &SCDesc,
124 const MCSubtargetInfo &STI) {
125 unsigned MaxLatency = 0;
126 unsigned NumWriteLatencyEntries = SCDesc.NumWriteLatencyEntries;
127 for (unsigned I = 0, E = NumWriteLatencyEntries; I < E; ++I) {
128 int Cycles = STI.getWriteLatencyEntry(&SCDesc, I)->Cycles;
129 // Check if this is an unknown latency. Conservatively (pessimistically)
130 // assume a latency of 100cy in that case.
131 if (Cycles == -1)
132 Cycles = 100;
133 MaxLatency = std::max(MaxLatency, static_cast<unsigned>(Cycles));
134 }
135
136 if (MCDesc.isCall()) {
137 // We cannot estimate how long this call will take.
138 // Artificially set an arbitrarily high latency (100cy).
139 MaxLatency = std::max(100U, MaxLatency);
140 }
141
142 // If the latency is still unknown, then conservatively set it equal to the
143 // maximum number of cycles for a resource consumed by this instruction.
145 if (!MaxLatency && ID.Resources.size()) {
146 // Compute the max of the cycles spent for each consumed resource, and
147 // use it as the MaxLatency.
149 for (const std::pair<uint64_t, ResourceUsage> &Resource : ID.Resources)
150 MaxLatency = std::max(MaxLatency, Resource.second.size());
151 }
152
153 if (SCDesc.isVariant() && MaxLatency == 0) {
154 errs() << "note: unknown latency for a variant opcode. Conservatively"
155 << " assume a default latency of 1cy.\n";
156 MaxLatency = 1;
157 }
158
159 ID.MaxLatency = MaxLatency;
160 }
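The fallback rules implemented by `computeMaxLatency` can be condensed into a few lines. A simplified sketch (the free function and its flattened `WriteCycles` input are illustrative; the real code reads `MCWriteLatencyEntry` records through `MCSubtargetInfo`):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mirrors the rules above: an unknown per-write latency (-1) is
// pessimistically treated as 100 cycles, and a call instruction gets an
// artificial latency of at least 100 cycles.
unsigned maxLatency(const std::vector<int> &WriteCycles, bool IsCall) {
  unsigned Max = 0;
  for (int Cycles : WriteCycles)
    Max = std::max(Max, static_cast<unsigned>(Cycles == -1 ? 100 : Cycles));
  if (IsCall)
    Max = std::max(100u, Max);
  return Max;
}
```

The remaining fallbacks (deriving latency from resource cycles, and defaulting variant opcodes to 1cy) kick in only when this computation yields zero.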
161
162 static void populateWrites(InstrDesc &ID, const MCInst &MCI,
163 const MCInstrDesc &MCDesc,
164 const MCSchedClassDesc &SCDesc,
165 const MCSubtargetInfo &STI) {
166 computeMaxLatency(ID, MCDesc, SCDesc, STI);
167
168 // Set if writes through this opcode may update super registers.
169 // TODO: on x86-64, a 4 byte write of a general purpose register always
170 // fully updates the super-register.
171 // More generally, (at least on x86) not all register writes perform
172 // a partial (super-)register update.
173 // For example, an AVX instruction that writes on a XMM register implicitly
174 // zeroes the upper half of every aliasing super-register.
175 //
176 // For now, we pessimistically assume that writes are all potentially
177 // partial register updates. This is a good default for most targets, except
178 // for those like x86 which implement a special semantic for certain opcodes.
179 // At least on x86, this may lead to an inaccurate prediction of the
180 // instruction level parallelism.
181 bool FullyUpdatesSuperRegisters = false;
182
183 // Now Populate Writes.
184
185 // This algorithm currently works under the strong (and potentially incorrect)
186 // assumption that information related to register def/uses can be obtained
187 // from MCInstrDesc.
188 //
189 // However class MCInstrDesc is used to describe MachineInstr objects and not
190 // MCInst objects. To be more specific, MCInstrDesc objects are opcode
191 // descriptors that are automatically generated via tablegen based on the
192 // instruction set information available from the target .td files. That
193 // means, the number of (explicit) definitions according to MCInstrDesc always
194 // matches the cardinality of the `(outs)` set in tablegen.
195 //
196 // By construction, definitions must appear first in the operand sequence of
197 // a MachineInstr. Also, the (outs) sequence is preserved (example: the first
198 // element in the outs set is the first operand in the corresponding
199 // MachineInstr). That's the reason why MCInstrDesc only needs to declare the
200 // total number of register definitions, and not where those definitions are
201 // in the machine operand sequence.
202 //
203 // Unfortunately, it is not safe to use the information from MCInstrDesc to
204 // also describe MCInst objects. An MCInst object can be obtained from a
205 // MachineInstr through a lowering step which may restructure the operand
206 // sequence (and even remove or introduce new operands). So, there is a high
207 // risk that the lowering step breaks the assumptions that register
208 // definitions are always at the beginning of the machine operand sequence.
209 //
210 // This is a fundamental problem, and it is still an open problem. Essentially
211 // we have to find a way to correlate def/use operands of a MachineInstr to
212 // operands of an MCInst. Otherwise, we cannot correctly reconstruct data
213 // dependencies, nor can we correctly interpret the scheduling model, which
214 // heavily uses machine operand indices to define processor read-advance
215 // information, and to identify processor write resources. Essentially, we
216 // either need something like a MCInstrDesc, but for MCInst, or a way
217 // to map MCInst operands back to MachineInstr operands.
218 //
219 // Unfortunately, we don't have that information now. So, this prototype
220 // currently works under the strong assumption that we can always safely trust
221 // the content of an MCInstrDesc. For example, we can query a MCInstrDesc to
222 // obtain the number of explicit and implicit register definitions. We also
223 // assume that register definitions always come first in the operand sequence.
224 // This last assumption usually makes sense for MachineInstr, where register
225 // definitions always appear at the beginning of the operands sequence. In
226 // reality, these assumptions could be broken by the lowering step, which can
227 // decide to lay out operands in a different order than the original order of
228 // operand as specified by the MachineInstr.
229 //
230 // Things get even more complicated in the presence of "optional" register
231 // definitions. For MachineInstr, optional register definitions are always at
232 // the end of the operand sequence. Some ARM instructions that may update the
233 // status flags specify that register as an optional operand. Since we don't
234 // have operand descriptors for MCInst, we assume for now that the optional
235 // definition is always the last operand of a MCInst. Again, this assumption
236 // may be okay for most targets. However, there is no guarantee that targets
237 // would respect that.
238 //
239 // In conclusion: these are for now the strong assumptions made by the tool:
240 // * The number of explicit and implicit register definitions in a MCInst
241 // matches the number of explicit and implicit definitions according to
242 // the opcode descriptor (MCInstrDesc).
243 // * Register definitions take precedence over register uses in the operands
244 // list.
245 // * If an opcode specifies an optional definition, then the optional
246 // definition is always the last operand in the sequence, and it can be
247 // set to zero (i.e. "no register").
248 //
249 // These assumptions work quite well for most out-of-order in-tree targets
250 // like x86. This is mainly because the vast majority of instructions are
251 // expanded to MCInst using a straightforward lowering logic that preserves
252 // the ordering of the operands.
253 //
254 // In the longer term, we need to find a proper solution for this issue.
255 unsigned NumExplicitDefs = MCDesc.getNumDefs();
256 unsigned NumImplicitDefs = MCDesc.getNumImplicitDefs();
257 unsigned NumWriteLatencyEntries = SCDesc.NumWriteLatencyEntries;
258 unsigned TotalDefs = NumExplicitDefs + NumImplicitDefs;
259 if (MCDesc.hasOptionalDef())
260 TotalDefs++;
261 ID.Writes.resize(TotalDefs);
262 // Iterate over the operands list, and skip non-register operands.
263 // The first NumExplicitDefs register operands are expected to be register
264 // definitions.
265 unsigned CurrentDef = 0;
266 unsigned i = 0;
267 for (; i < MCI.getNumOperands() && CurrentDef < NumExplicitDefs; ++i) {
268 const MCOperand &Op = MCI.getOperand(i);
269 if (!Op.isReg())
270 continue;
271
272 WriteDescriptor &Write = ID.Writes[CurrentDef];
273 Write.OpIndex = i;
274 if (CurrentDef < NumWriteLatencyEntries) {
275 const MCWriteLatencyEntry &WLE =
276 *STI.getWriteLatencyEntry(&SCDesc, CurrentDef);
277 // Conservatively default to MaxLatency.
278 Write.Latency = WLE.Cycles == -1 ? ID.MaxLatency : WLE.Cycles;
279 Write.SClassOrWriteResourceID = WLE.WriteResourceID;
280 } else {
281 // Assign a default latency for this write.
282 Write.Latency = ID.MaxLatency;
283 Write.SClassOrWriteResourceID = 0;
284 }
285 Write.FullyUpdatesSuperRegs = FullyUpdatesSuperRegisters;
286 Write.IsOptionalDef = false;
287 DEBUG(
288 dbgs() << "\t\tOpIdx=" << Write.OpIndex
289 << ", Latency=" << Write.Latency << ", WriteResourceID="
290 << Write.SClassOrWriteResourceID << '\n';
291 );
292 CurrentDef++;
293 }
294
295 if (CurrentDef != NumExplicitDefs)
296 llvm::report_fatal_error(
297 "error: Expected more register operand definitions. ");
298
299 for (CurrentDef = 0; CurrentDef < NumImplicitDefs; ++CurrentDef) {
301 unsigned Index = NumExplicitDefs + CurrentDef;
302 WriteDescriptor &Write = ID.Writes[Index];
303 Write.OpIndex = -1;
304 Write.RegisterID = MCDesc.getImplicitDefs()[CurrentDef];
305 Write.Latency = ID.MaxLatency;
306 Write.SClassOrWriteResourceID = 0;
307 Write.IsOptionalDef = false;
308 assert(Write.RegisterID != 0 && "Expected a valid phys register!");
309 DEBUG(dbgs() << "\t\tOpIdx=" << Write.OpIndex << ", PhysReg="
310 << Write.RegisterID << ", Latency=" << Write.Latency
311 << ", WriteResourceID=" << Write.SClassOrWriteResourceID
312 << '\n');
313 }
314
315 if (MCDesc.hasOptionalDef()) {
316 // Always assume that the optional definition is the last operand of the
317 // MCInst sequence.
318 const MCOperand &Op = MCI.getOperand(MCI.getNumOperands() - 1);
319 if (i == MCI.getNumOperands() || !Op.isReg())
320 llvm::report_fatal_error(
321 "error: expected a register operand for an optional "
322 "definition. Instruction has not been correctly analyzed.\n",
323 false);
324
325 WriteDescriptor &Write = ID.Writes[TotalDefs - 1];
326 Write.OpIndex = MCI.getNumOperands() - 1;
327 // Assign a default latency for this write.
328 Write.Latency = ID.MaxLatency;
329 Write.SClassOrWriteResourceID = 0;
330 Write.IsOptionalDef = true;
331 }
332 }
333
334 static void populateReads(InstrDesc &ID, const MCInst &MCI,
335 const MCInstrDesc &MCDesc,
336 const MCSchedClassDesc &SCDesc,
337 const MCSubtargetInfo &STI) {
338 unsigned SchedClassID = MCDesc.getSchedClass();
339 bool HasReadAdvanceEntries = SCDesc.NumReadAdvanceEntries > 0;
340
341 unsigned i = 0;
342 unsigned NumExplicitDefs = MCDesc.getNumDefs();
343 // Skip explicit definitions.
344 for (; i < MCI.getNumOperands() && NumExplicitDefs; ++i) {
345 const MCOperand &Op = MCI.getOperand(i);
346 if (Op.isReg())
347 NumExplicitDefs--;
348 }
349
350 if (NumExplicitDefs)
351 llvm::report_fatal_error(
352 "error: Expected more register operand definitions. ", false);
353
354 unsigned NumExplicitUses = MCI.getNumOperands() - i;
355 unsigned NumImplicitUses = MCDesc.getNumImplicitUses();
356 if (MCDesc.hasOptionalDef()) {
357 assert(NumExplicitUses);
358 NumExplicitUses--;
359 }
360 unsigned TotalUses = NumExplicitUses + NumImplicitUses;
361 if (!TotalUses)
362 return;
363
364 ID.Reads.resize(TotalUses);
365 for (unsigned CurrentUse = 0; CurrentUse < NumExplicitUses; ++CurrentUse) {
366 ReadDescriptor &Read = ID.Reads[CurrentUse];
367 Read.OpIndex = i + CurrentUse;
368 Read.HasReadAdvanceEntries = HasReadAdvanceEntries;
369 Read.SchedClassID = SchedClassID;
370 DEBUG(dbgs() << "\t\tOpIdx=" << Read.OpIndex);
371 }
372
373 for (unsigned CurrentUse = 0; CurrentUse < NumImplicitUses; ++CurrentUse) {
374 ReadDescriptor &Read = ID.Reads[NumExplicitUses + CurrentUse];
375 Read.OpIndex = -1;
376 Read.RegisterID = MCDesc.getImplicitUses()[CurrentUse];
377 Read.HasReadAdvanceEntries = false;
378 Read.SchedClassID = SchedClassID;
379 DEBUG(dbgs() << "\t\tOpIdx=" << Read.OpIndex
380 << ", RegisterID=" << Read.RegisterID << '\n');
381 }
382 }
383
384 void InstrBuilder::createInstrDescImpl(const MCSubtargetInfo &STI,
385 const MCInst &MCI) {
386 assert(STI.getSchedModel().hasInstrSchedModel() &&
387 "Itineraries are not yet supported!");
388
389 unsigned short Opcode = MCI.getOpcode();
390 // Obtain the instruction descriptor from the opcode.
391 const MCInstrDesc &MCDesc = MCII.get(Opcode);
392 const MCSchedModel &SM = STI.getSchedModel();
393
394 // Then obtain the scheduling class information from the instruction.
395 const MCSchedClassDesc &SCDesc =
396 *SM.getSchedClassDesc(MCDesc.getSchedClass());
397
398 // Create a new empty descriptor.
399 InstrDesc *ID = new InstrDesc();
400
401 if (SCDesc.isVariant()) {
402 errs() << "warning: don't know how to model variant opcodes.\n"
403 << "note: assume 1 micro opcode.\n";
404 ID->NumMicroOps = 1U;
405 } else {
406 ID->NumMicroOps = SCDesc.NumMicroOps;
407 }
408
409 if (MCDesc.isCall()) {
410 // We don't correctly model calls.
411 errs() << "warning: found a call in the input assembly sequence.\n"
412 << "note: call instructions are not correctly modeled. Assume a "
413 "latency of 100cy.\n";
414 }
415
416 if (MCDesc.isReturn()) {
417 errs() << "warning: found a return instruction in the input assembly "
418 "sequence.\n"
419 << "note: program counter updates are ignored.\n";
420 }
421
422 ID->MayLoad = MCDesc.mayLoad();
423 ID->MayStore = MCDesc.mayStore();
424 ID->HasSideEffects = MCDesc.hasUnmodeledSideEffects();
425
426 initializeUsedResources(*ID, SCDesc, STI, ProcResourceMasks);
427 populateWrites(*ID, MCI, MCDesc, SCDesc, STI);
428 populateReads(*ID, MCI, MCDesc, SCDesc, STI);
429
430 DEBUG(dbgs() << "\t\tMaxLatency=" << ID->MaxLatency << '\n');
431 DEBUG(dbgs() << "\t\tNumMicroOps=" << ID->NumMicroOps << '\n');
432
433 // Now add the new descriptor.
434 Descriptors[Opcode] = std::unique_ptr<const InstrDesc>(ID);
435 }
436
437 const InstrDesc &InstrBuilder::getOrCreateInstrDesc(const MCSubtargetInfo &STI,
438 const MCInst &MCI) {
439 auto it = Descriptors.find(MCI.getOpcode());
440 if (it == Descriptors.end())
441 createInstrDescImpl(STI, MCI);
442 return *Descriptors[MCI.getOpcode()].get();
443 }
444
445 Instruction *InstrBuilder::createInstruction(const MCSubtargetInfo &STI,
446 DispatchUnit &DU, unsigned Idx,
447 const MCInst &MCI) {
448 const InstrDesc &D = getOrCreateInstrDesc(STI, MCI);
449 Instruction *NewIS = new Instruction(D);
450
451 // Populate Reads first.
452 const MCSchedModel &SM = STI.getSchedModel();
453 SmallVector<WriteState *, 4> DependentWrites;
454 for (const ReadDescriptor &RD : D.Reads) {
455 int RegID = -1;
456 if (RD.OpIndex != -1) {
457 // Explicit read.
458 const MCOperand &Op = MCI.getOperand(RD.OpIndex);
459 // Skip non-register operands.
460 if (!Op.isReg())
461 continue;
462 RegID = Op.getReg();
463 } else {
464 // Implicit read.
465 RegID = RD.RegisterID;
466 }
467
468 // Skip invalid register operands.
469 if (!RegID)
470 continue;
471
472 // Okay, this is a register operand. Create a ReadState for it.
473 assert(RegID > 0 && "Invalid register ID found!");
474 ReadState *NewRDS = new ReadState(RD);
475 NewIS->getUses().emplace_back(std::unique_ptr(NewRDS));
476 DU.collectWrites(DependentWrites, RegID);
477 NewRDS->setDependentWrites(DependentWrites.size());
478 DEBUG(dbgs() << "Found " << DependentWrites.size()
479 << " dependent writes\n");
480
481 // We know that this read depends on all the writes in DependentWrites.
482 // For each write, check if we have ReadAdvance information, and use it
483 // to figure out after how many cycles this read becomes available.
484 if (!RD.HasReadAdvanceEntries) {
485 for (WriteState *WS : DependentWrites)
486 WS->addUser(NewRDS, /* ReadAdvance */ 0);
487 // Prepare the set for another round.
488 DependentWrites.clear();
489 continue;
490 }
491
492 const MCSchedClassDesc *SC = SM.getSchedClassDesc(RD.SchedClassID);
493 for (WriteState *WS : DependentWrites) {
494 unsigned WriteResID = WS->getWriteResourceID();
495 int ReadAdvance = STI.getReadAdvanceCycles(SC, RD.OpIndex, WriteResID);
496 WS->addUser(NewRDS, ReadAdvance);
497 }
498
499 // Prepare the set for another round.
500 DependentWrites.clear();
501 }
502
503 // Now populate writes.
504 for (const WriteDescriptor &WD : D.Writes) {
505 unsigned RegID =
506 WD.OpIndex == -1 ? WD.RegisterID : MCI.getOperand(WD.OpIndex).getReg();
507 assert((RegID || WD.IsOptionalDef) && "Expected a valid register ID!");
508 // Special case where this is an optional definition, and the actual register
509 // is 0.
510 if (WD.IsOptionalDef && !RegID)
511 continue;
512
513 WriteState *NewWS = new WriteState(WD);
514 NewIS->getDefs().emplace_back(std::unique_ptr(NewWS));
515 NewWS->setRegisterID(RegID);
516 DU.addNewRegisterMapping(*NewWS);
517 }
518
519 // Update Latency.
520 NewIS->setCyclesLeft(D.MaxLatency);
521 return NewIS;
522 }
523
524 } // namespace mca
0 //===--------------------- InstrBuilder.h -----------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// A builder class for instructions that are statically analyzed by llvm-mca.
11 //
12 //===----------------------------------------------------------------------===//
13
14 #ifndef LLVM_TOOLS_LLVM_MCA_INSTRBUILDER_H
15 #define LLVM_TOOLS_LLVM_MCA_INSTRBUILDER_H
16
17 #include "Dispatch.h"
18 #include "Instruction.h"
19 #include "llvm/MC/MCInstrInfo.h"
20 #include "llvm/MC/MCSubtargetInfo.h"
21
22 namespace mca {
23
24 class DispatchUnit;
25
26 /// \brief A builder class that knows how to construct Instruction objects.
27 ///
28 /// Every llvm-mca Instruction is described by an object of class InstrDesc.
29 /// An InstrDesc describes which registers are read/written by the instruction,
30 /// as well as the instruction latency and hardware resources consumed.
31 ///
32 /// This class is used by the tool to construct Instructions and instruction
33 /// descriptors (i.e. InstrDesc objects).
34 /// Information from the machine scheduling model is used to identify processor
35 /// resources that are consumed by an instruction.
36 class InstrBuilder {
37 const llvm::MCInstrInfo &MCII;
38 const llvm::ArrayRef<uint64_t> ProcResourceMasks;
39
40 llvm::DenseMap<unsigned short, std::unique_ptr<const InstrDesc>> Descriptors;
41 llvm::DenseMap<unsigned, std::unique_ptr<Instruction>> Instructions;
42
43 void createInstrDescImpl(const llvm::MCSubtargetInfo &STI,
44 const llvm::MCInst &MCI);
45
46 public:
47 InstrBuilder(const llvm::MCInstrInfo &mcii,
48 const llvm::ArrayRef<uint64_t> Masks)
49 : MCII(mcii), ProcResourceMasks(Masks) {}
50
51 const InstrDesc &getOrCreateInstrDesc(const llvm::MCSubtargetInfo &STI,
52 const llvm::MCInst &MCI);
53
54 Instruction *createInstruction(const llvm::MCSubtargetInfo &STI,
55 DispatchUnit &DU, unsigned Idx,
56 const llvm::MCInst &MCI);
57 };
58
59 } // namespace mca
60
61 #endif
0 //===--------------------- Instruction.cpp ----------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 //
9 // This file defines abstractions used by the Backend to model register reads,
10 // register writes and instructions.
11 //
12 //===----------------------------------------------------------------------===//
13
14 #include "Instruction.h"
15 #include "llvm/Support/Debug.h"
16 #include "llvm/Support/raw_ostream.h"
17
18 namespace mca {
19
20 using namespace llvm;
21
22 void ReadState::writeStartEvent(unsigned Cycles) {
23 assert(DependentWrites);
24 assert(CyclesLeft == UNKNOWN_CYCLES);
25
26 // This read may be dependent on more than one write. This typically occurs
27 // when a definition is the result of multiple writes where at least one
28 // write does a partial register update.
29 // The HW is forced to do some extra bookkeeping to keep track of all the
30 // dependent writes, and implement a merging scheme for the partial writes.
31 --DependentWrites;
32 TotalCycles = std::max(TotalCycles, Cycles);
33
34 if (!DependentWrites)
35 CyclesLeft = TotalCycles;
36 }
37
38 void WriteState::onInstructionIssued() {
39 assert(CyclesLeft == UNKNOWN_CYCLES);
40 // Update the number of cycles left based on the WriteDescriptor info.
41 CyclesLeft = WD.Latency;
42
43 // Now that the time left before write-back is known, notify
44 // all the users.
45 for (const std::pair<ReadState *, int> &User : Users) {
46 ReadState *RS = User.first;
47 unsigned ReadCycles = std::max(0, CyclesLeft - User.second);
48 RS->writeStartEvent(ReadCycles);
49 }
50 }
51
52 void WriteState::addUser(ReadState *User, int ReadAdvance) {
53 // If CyclesLeft is different than -1, then we don't need to
54 // update the list of users. We can just notify the user with
55 // the actual number of cycles left (which may be zero).
56 if (CyclesLeft != UNKNOWN_CYCLES) {
57 unsigned ReadCycles = std::max(0, CyclesLeft - ReadAdvance);
58 User->writeStartEvent(ReadCycles);
59 return;
60 }
61
62 std::pair<ReadState *, int> NewPair(User, ReadAdvance);
63 Users.insert(NewPair);
64 }
65
66 void WriteState::cycleEvent() {
67 // Note: CyclesLeft can be a negative number. It is an error to
68 // make it an unsigned quantity because users of this write may
69 // specify a negative ReadAdvance.
70 if (CyclesLeft != UNKNOWN_CYCLES)
71 CyclesLeft--;
72 }
73
74 void ReadState::cycleEvent() {
75 // If CyclesLeft is unknown, then bail out immediately.
76 if (CyclesLeft == UNKNOWN_CYCLES)
77 return;
78
79 // If there are still dependent writes, or we reached cycle zero,
80 // then just exit.
81 if (DependentWrites || CyclesLeft == 0)
82 return;
83
84 CyclesLeft--;
85 }
86
87 #ifndef NDEBUG
88 void WriteState::dump() const {
89 dbgs() << "{ OpIdx=" << WD.OpIndex << ", Lat=" << WD.Latency << ", RegID "
90 << getRegisterID() << ", Cycles Left=" << getCyclesLeft() << " }\n";
91 }
92 #endif
93
94 bool Instruction::isReady() {
95 if (Stage == IS_READY)
96 return true;
97
98 assert(Stage == IS_AVAILABLE);
99 for (const UniqueUse &Use : Uses)
100 if (!Use.get()->isReady())
101 return false;
102
103 setReady();
104 return true;
105 }
106
107 void Instruction::execute() {
108 assert(Stage == IS_READY);
109 Stage = IS_EXECUTING;
110 for (UniqueDef &Def : Defs)
111 Def->onInstructionIssued();
112 }
113
114 bool Instruction::isZeroLatency() const {
115 return Desc.MaxLatency == 0 && Defs.size() == 0 && Uses.size() == 0;
116 }
117
118 void Instruction::cycleEvent() {
119 if (isDispatched()) {
120 for (UniqueUse &Use : Uses)
121 Use->cycleEvent();
122 return;
123 }
124 if (isExecuting()) {
125 for (UniqueDef &Def : Defs)
126 Def->cycleEvent();
127 CyclesLeft--;
128 }
129 if (!CyclesLeft)
130 Stage = IS_EXECUTED;
131 }
132
133 } // namespace mca
0 //===--------------------- Instruction.h ------------------------*- C++ -*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// This file defines abstractions used by the Backend to model register reads,
11 /// register writes and instructions.
12 ///
13 //===----------------------------------------------------------------------===//
14
15 #ifndef LLVM_TOOLS_LLVM_MCA_INSTRUCTION_H
16 #define LLVM_TOOLS_LLVM_MCA_INSTRUCTION_H
17
18 #include "llvm/Support/MathExtras.h"
19 #include <memory>
20 #include <set>
21 #include <vector>
22
23 namespace mca {
24
25 struct WriteDescriptor;
26 struct ReadDescriptor;
27 class WriteState;
28 class ReadState;
29
30 constexpr int UNKNOWN_CYCLES = -512;
31
32 /// \brief A register write descriptor.
33 struct WriteDescriptor {
34 int OpIndex; // Operand index. -1 if this is an implicit write.
35 // Write latency. Number of cycles before write-back stage.
36 int Latency;
37 // This field is set to a value different than zero only if this
38 // is an implicit definition.
39 unsigned RegisterID;
40 // True if this write generates a partial update of a super-register.
41 // On X86, this flag is set by byte/word writes on GPR registers. Also,
42 // a write of an XMM register only partially updates the corresponding
43 // YMM super-register if the write is associated to a legacy SSE instruction.
44 bool FullyUpdatesSuperRegs;
45 // Instruction itineraries would set this field to the SchedClass ID.
46 // Otherwise, it defaults to the WriteResourceID from the MCWriteLatencyEntry
47 // element associated with this write.
48 // When computing read latencies, this value is matched against the
49 // "ReadAdvance" information. The hardware backend may implement
50 // dedicated forwarding paths to quickly propagate write results to dependent
51 // instructions waiting in the reservation station (effectively bypassing the
52 // write-back stage).
53 unsigned SClassOrWriteResourceID;
54 // True only if this is a write obtained from an optional definition.
55 // Optional definitions are allowed to reference regID zero (i.e. "no
56 // register").
57 bool IsOptionalDef;
58 };
59
60 /// \brief A register read descriptor.
61 struct ReadDescriptor {
62 // This field defaults to -1 if this is an implicit read.
63 int OpIndex;
64 // This field is only set if this is an implicit read.
65 unsigned RegisterID;
66 // Scheduling Class Index. It is used to query the scheduling model for the
67 // MCSchedClassDesc object.
68 unsigned SchedClassID;
69 // True if there may be a local forwarding logic in hardware to serve a
70 // write used by this read. This information, along with SchedClassID, is
71 // used to dynamically check at Instruction creation time, if the input
72 // operands can benefit from a ReadAdvance bonus.
73 bool HasReadAdvanceEntries;
74 };
75
76 /// \brief Tracks uses of a register definition (e.g. register write).
77 ///
78 /// Each implicit/explicit register write is associated with an instance of
79 /// this class. A WriteState object tracks the dependent users of a
80 /// register write. It also tracks how many cycles are left before the write
81 /// back stage.
82 class WriteState {
83 const WriteDescriptor &WD;
84 // On instruction issue, this field is set equal to the write latency.
85 // Before instruction issue, this field defaults to -512, a special
86 // value that represents an "unknown" number of cycles.
87 int CyclesLeft;
88
89 // Actual register defined by this write. This field is only used
90 // to speedup queries on the register file.
91 // For implicit writes, this field always matches the value of
92 // field RegisterID from WD.
93 unsigned RegisterID;
94
95 // A set of dependent reads. A dependent read is added to the set only
96 // if CyclesLeft is "unknown". As soon as CyclesLeft is known, each user
97 // in the set gets notified with the actual CyclesLeft.
98 //
99
100 // The 'second' element of a pair is a "ReadAdvance" number of cycles.
101 std::set<std::pair<ReadState *, int>> Users;
102
103 public:
104 WriteState(const WriteDescriptor &Desc)
105 : WD(Desc), CyclesLeft(UNKNOWN_CYCLES), RegisterID(Desc.RegisterID) {}
106 WriteState(const WriteState &Other) = delete;
107 WriteState &operator=(const WriteState &Other) = delete;
108
109 int getCyclesLeft() const { return CyclesLeft; }
110 unsigned getWriteResourceID() const { return WD.SClassOrWriteResourceID; }
111 unsigned getRegisterID() const { return RegisterID; }
112 void setRegisterID(unsigned ID) { RegisterID = ID; }
113
114 void addUser(ReadState *Use, int ReadAdvance);
115 bool fullyUpdatesSuperRegs() const { return WD.FullyUpdatesSuperRegs; }
116 bool isWrittenBack() const { return CyclesLeft == 0; }
117
118 // On every cycle, update CyclesLeft and notify dependent users.
119 void cycleEvent();
120 void onInstructionIssued();
121
122 #ifndef NDEBUG
123 void dump() const;
124 #endif
125 };
126
127 /// \brief Tracks register operand latency in cycles.
128 ///
129 /// A read may be dependent on more than one write. This occurs when some
130 /// writes only partially update the register associated to this read.
131 class ReadState {
132 const ReadDescriptor &RD;
133 unsigned DependentWrites;
134 int CyclesLeft;
135 unsigned TotalCycles;
136
137 public:
138 bool isReady() const {
139 if (DependentWrites)
140 return false;
141 return (CyclesLeft == UNKNOWN_CYCLES || CyclesLeft == 0);
142 }
143
144 ReadState(const ReadDescriptor &Desc)
145 : RD(Desc), DependentWrites(0), CyclesLeft(UNKNOWN_CYCLES),
146 TotalCycles(0) {}
147 ReadState(const ReadState &Other) = delete;
148 ReadState &operator=(const ReadState &Other) = delete;
149
150 const ReadDescriptor &getDescriptor() const { return RD; }
151 unsigned getSchedClass() const { return RD.SchedClassID; }
152 void cycleEvent();
153 void writeStartEvent(unsigned Cycles);
154 void setDependentWrites(unsigned Writes) { DependentWrites = Writes; }
155 };
156
157 /// \brief A sequence of cycles.
158 ///
159 /// This class can be used as a building block to construct ranges of cycles.
160 class CycleSegment {
161 unsigned Begin; // Inclusive.
162 unsigned End; // Exclusive.
163 bool Reserved; // Resources associated to this segment must be reserved.
164
165 public:
166 CycleSegment(unsigned StartCycle, unsigned EndCycle, bool IsReserved = false)
167 : Begin(StartCycle), End(EndCycle), Reserved(IsReserved) {}
168
169 bool contains(unsigned Cycle) const { return Cycle >= Begin && Cycle < End; }
170 bool startsAfter(const CycleSegment &CS) const { return End <= CS.Begin; }
171 bool endsBefore(const CycleSegment &CS) const { return Begin >= CS.End; }
172 bool overlaps(const CycleSegment &CS) const {
173 return !startsAfter(CS) && !endsBefore(CS);
174 }
175 bool isExecuting() const { return Begin == 0 && End != 0; }
176 bool isExecuted() const { return End == 0; }
177 bool operator<(const CycleSegment &Other) const {
178 return Begin < Other.Begin;
179 }
180 CycleSegment &operator--(void) {
181 if (Begin)
182 Begin--;
183 if (End)
184 End--;
185 return *this;
186 }
187
188 bool isValid() const { return Begin <= End; }
189 unsigned size() const { return End - Begin; };
190 void Subtract(unsigned Cycles) {
191 assert(End >= Cycles);
192 End -= Cycles;
193 }
194
195 unsigned begin() const { return Begin; }
196 unsigned end() const { return End; }
197 void setEnd(unsigned NewEnd) { End = NewEnd; }
198 bool isReserved() const { return Reserved; }
199 void setReserved() { Reserved = true; }
200 };
201
202 /// \brief Helper used by class InstrDesc to describe how hardware resources
203 /// are used.
204 ///
205 /// This class describes how many resource units of a specific resource kind
206 /// (and how many cycles) are "used" by an instruction.
207 struct ResourceUsage {
208 CycleSegment CS;
209 unsigned NumUnits;
210 ResourceUsage(CycleSegment Cycles, unsigned Units = 1)
211 : CS(Cycles), NumUnits(Units) {}
212 unsigned size() const { return CS.size(); }
213 bool isReserved() const { return CS.isReserved(); }
214 void setReserved() { CS.setReserved(); }
215 };
216
217 /// \brief An instruction descriptor
218 struct InstrDesc {
219 std::vector<WriteDescriptor> Writes; // Implicit writes are at the end.
220 std::vector<ReadDescriptor> Reads;   // Implicit reads are at the end.
221
222 // For every resource used by an instruction of this kind, this vector
223 // reports the number of "consumed cycles".
224 std::vector<std::pair<uint64_t, ResourceUsage>> Resources;
225
226 // A list of buffered resources consumed by this instruction.
227 std::vector<uint64_t> Buffers;
228 unsigned MaxLatency;
229 // Number of MicroOps for this instruction.
230 unsigned NumMicroOps;
231
232 bool MayLoad;
233 bool MayStore;
234 bool HasSideEffects;
235 };
236
237 /// An instruction dispatched to the out-of-order backend.
238 ///
239 /// This class is used to monitor changes in the internal state of instructions
240 /// that are dispatched by the DispatchUnit to the hardware schedulers.
241 class Instruction {
242 const InstrDesc &Desc;
243
244 enum InstrStage {
245 IS_INVALID, // Instruction in an invalid state.
246 IS_AVAILABLE, // Instruction dispatched but operands are not ready.
247 IS_READY, // Instruction dispatched and operands ready.
248 IS_EXECUTING, // Instruction issued.
249 IS_EXECUTED, // Instruction executed. Values are written back.
250 IS_RETIRED // Instruction retired.
251 };
252
253 // The current instruction stage.
254 enum InstrStage Stage;
255
256 // This value defaults to the instruction latency. This instruction is
257 // considered executed when field CyclesLeft goes to zero.
258 int CyclesLeft;
259
260 // Retire Unit token ID for this instruction.
261 unsigned RCUTokenID;
262
263 using UniqueDef = std::unique_ptr<WriteState>;
264 using UniqueUse = std::unique_ptr<ReadState>;
265 using VecDefs = std::vector<UniqueDef>;
266 using VecUses = std::vector<UniqueUse>;
267
268 // Output dependencies.
269 // One entry per each implicit and explicit register definition.
270 VecDefs Defs;
271
272 // Input dependencies.
273 // One entry per each implicit and explicit register use.
274 VecUses Uses;
275
276 // This instruction has already been dispatched, and all operands are ready.
277 void setReady() {
278 assert(Stage == IS_AVAILABLE);
279 Stage = IS_READY;
280 }
281
282 public:
283 Instruction(const InstrDesc &D)
284 : Desc(D), Stage(IS_INVALID), CyclesLeft(-1) {}
285 Instruction(const Instruction &Other) = delete;
286 Instruction &operator=(const Instruction &Other) = delete;
287
288 VecDefs &getDefs() { return Defs; }
289 const VecDefs &getDefs() const { return Defs; }
290 VecUses &getUses() { return Uses; }
291 const VecUses &getUses() const { return Uses; }
292 const InstrDesc &getDesc() const { return Desc; }
293
294 unsigned getRCUTokenID() const { return RCUTokenID; }
295 int getCyclesLeft() const { return CyclesLeft; }
296 void setCyclesLeft(int Cycles) { CyclesLeft = Cycles; }
297 void setRCUTokenID(unsigned TokenID) { RCUTokenID = TokenID; }
298
299 // Transition to the dispatch stage.
300 // No definition is updated because the instruction is not "executing".
301 void dispatch() {
302 assert(Stage == IS_INVALID);
303 Stage = IS_AVAILABLE;
304 }
305
306 // Instruction issued. Transition to the IS_EXECUTING state, and update
307 // all the definitions.
308 void execute();
309
310 void forceExecuted() {
311 assert((Stage == IS_INVALID && isZeroLatency()) ||
312 (Stage == IS_READY && Desc.MaxLatency == 0));
313 Stage = IS_EXECUTED;
314 }
315
316 // Checks if operands are available. If all operands are ready,
317 // then this forces a transition from IS_AVAILABLE to IS_READY.
318 bool isReady();
319
320 bool isDispatched() const { return Stage == IS_AVAILABLE; }
321 bool isExecuting() const { return Stage == IS_EXECUTING; }
322 bool isExecuted() const { return Stage == IS_EXECUTED; }
323 bool isZeroLatency() const;
324
325 void retire() {
326 assert(Stage == IS_EXECUTED);
327 Stage = IS_RETIRED;
328 }
329
330 void cycleEvent();
331 };
332
333 } // namespace mca
334
335 #endif
0 ;===- ./tools/llvm-mca/LLVMBuild.txt ---------------------------*- Conf -*--===;
1 ;
2 ; The LLVM Compiler Infrastructure
3 ;
4 ; This file is distributed under the University of Illinois Open Source
5 ; License. See LICENSE.TXT for details.
6 ;
7 ;===------------------------------------------------------------------------===;
8 ;
9 ; This is an LLVMBuild description file for the components in this subdirectory.
10 ;
11 ; For more information on the LLVMBuild system, please see:
12 ;
13 ; http://llvm.org/docs/LLVMBuild.html
14 ;
15 ;===------------------------------------------------------------------------===;
16
17 [component_0]
18 type = Tool
19 name = llvm-mca
20 parent = Tools
21 required_libraries = MC MCParser Support all-targets
0 //===----------------------- LSUnit.cpp --------------------------*- C++-*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// A Load-Store Unit for the llvm-mca tool.
11 ///
12 //===----------------------------------------------------------------------===//
13
14 #include "LSUnit.h"
15 #include "llvm/Support/Debug.h"
16 #include "llvm/Support/raw_ostream.h"
17
18 using namespace llvm;
19
20 #define DEBUG_TYPE "llvm-mca"
21
22 namespace mca {
23
24 #ifndef NDEBUG
25 void LSUnit::dump() const {
26 dbgs() << "[LSUnit] LQ_Size = " << LQ_Size << '\n';
27 dbgs() << "[LSUnit] SQ_Size = " << SQ_Size << '\n';
28 dbgs() << "[LSUnit] NextLQSlotIdx = " << LoadQueue.size() << '\n';
29 dbgs() << "[LSUnit] NextSQSlotIdx = " << StoreQueue.size() << '\n';
30 }
31 #endif
32
33 void LSUnit::assignLQSlot(unsigned Index) {
34 assert(!isLQFull());
35 assert(LoadQueue.count(Index) == 0);
36
37 DEBUG(dbgs() << "[LSUnit] - AssignLQSlot <Idx=" << Index
38 << ",slot=" << LoadQueue.size() << ">\n");
39 LoadQueue.insert(Index);
40 }
41
42 void LSUnit::assignSQSlot(unsigned Index) {
43 assert(!isSQFull());
44 assert(StoreQueue.count(Index) == 0);
45
46 DEBUG(dbgs() << "[LSUnit] - AssignSQSlot <Idx=" << Index
47 << ",slot=" << StoreQueue.size() << ">\n");
48 StoreQueue.insert(Index);
49 }
50
51 bool LSUnit::isReady(unsigned Index) const {
52 bool IsALoad = LoadQueue.count(Index) != 0;
53 bool IsAStore = StoreQueue.count(Index) != 0;
54 unsigned LoadBarrierIndex = LoadBarriers.empty() ? 0 : *LoadBarriers.begin();
55 unsigned StoreBarrierIndex = StoreBarriers.empty() ? 0 : *StoreBarriers.begin();
56
57 if (IsALoad && LoadBarrierIndex) {
58 if (Index > LoadBarrierIndex)
59 return false;
60 if (Index == LoadBarrierIndex && Index != *LoadQueue.begin())
61 return false;
62 }
63
64 if (IsAStore && StoreBarrierIndex) {
65 if (Index > StoreBarrierIndex)
66 return false;
67 if (Index == StoreBarrierIndex && Index != *StoreQueue.begin())
68 return false;
69 }
70
71 if (NoAlias && IsALoad)
72 return true;
73
74 if (StoreQueue.size()) {
75 // Check if this memory operation is younger than the oldest store.
76 if (Index > *StoreQueue.begin())
77 return false;
78 }
79
80 // Okay, we are older than the oldest store in the queue.
81 // If there are no pending loads, then we can say for sure that this
82 // instruction is ready.
83 if (isLQEmpty())
84 return true;
85
86 // Check if there are no older loads.
87 if (Index <= *LoadQueue.begin())
88 return true;
89
90 // There is at least one younger load.
91 return !IsAStore;
92 }
93
94 void LSUnit::onInstructionExecuted(unsigned Index) {
95 std::set<unsigned>::iterator it = LoadQueue.find(Index);
96 if (it != LoadQueue.end()) {
97 DEBUG(dbgs() << "[LSUnit]: Instruction idx=" << Index
98 << " has been removed from the load queue.\n");
99 LoadQueue.erase(it);
100 }
101
102 it = StoreQueue.find(Index);
103 if (it != StoreQueue.end()) {
104 DEBUG(dbgs() << "[LSUnit]: Instruction idx=" << Index
105 << " has been removed from the store queue.\n");
106 StoreQueue.erase(it);
107 }
108
109 if (!StoreBarriers.empty() && Index == *StoreBarriers.begin())
110 StoreBarriers.erase(StoreBarriers.begin());
111 if (!LoadBarriers.empty() && Index == *LoadBarriers.begin())
112 LoadBarriers.erase(LoadBarriers.begin());
113 }
114 } // namespace mca
0 //===------------------------- LSUnit.h --------------------------*- C++-*-===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 ///
10 /// A Load/Store unit class that models load/store queues and that implements
11 /// a simple weak memory consistency model.
12 ///
13 //===----------------------------------------------------------------------===//
14
15 #ifndef LLVM_TOOLS_LLVM_MCA_LSUNIT_H
16 #define LLVM_TOOLS_LLVM_MCA_LSUNIT_H
17
18 #include "llvm/Support/Debug.h"
19 #include "llvm/Support/raw_ostream.h"
20 #include <set>
21
22 #define DEBUG_TYPE "llvm-mca"
23
24 namespace mca {
25
26 /// \brief A Load/Store Unit implementing a load and store queues.
27 ///
28 /// This class implements a load queue and a store queue to emulate the
29 /// out-of-order execution of memory operations.
30 /// Each load (or store) consumes an entry in the load (or store) queue.
31 ///
32 /// Rules are:
33 /// 1) A younger load is allowed to pass an older load only if there are no
34 /// stores nor barriers in between the two loads.
35 /// 2) A younger store is not allowed to pass an older store.
36 /// 3) A younger store is not allowed to pass an older load.
37 /// 4) A younger load is allowed to pass an older store only if the load does
38 /// not alias with the store.
39 ///
40 /// This class optimistically assumes that loads don't alias store operations.
41 /// Under this assumption, younger loads are always allowed to pass older
42 /// stores (this only affects rule 4).
43 /// Essentially, this LSUnit doesn't attempt to run any sort of alias analysis
44 /// to predict when loads and stores don't alias with each other.
45 ///
46 /// To enforce aliasing between loads and stores, flag `AssumeNoAlias` must be
47 /// set to `false` by the constructor of LSUnit.
48 ///
49 /// In the case of write-combining memory, rule 2. could be relaxed to allow
50 /// reordering of non-aliasing store operations. At the moment, this is not
51 /// allowed.
52 /// To put it in another way, there is no option to specify a different memory
53 /// type for memory operations (example: write-through, write-combining, etc.).
54 /// Also, there is no way to weaken the memory model, and this unit currently
55 /// doesn't support write-combining behavior.
56 ///
57 /// No assumptions are made on the size of the store buffer.
58 /// As mentioned before, this class doesn't perform alias analysis.
59 /// Consequently, LSUnit doesn't know how to identify cases where
60 /// store-to-load forwarding may occur.
61 ///
62 /// LSUnit doesn't attempt to predict whether a load or store hits or misses
63 /// the L1 cache. To be more specific, LSUnit doesn't know anything about
64 /// the cache hierarchy and memory types.
65 /// It only knows if an instruction "mayLoad" and/or "mayStore". For loads, the
66 /// scheduling model provides an "optimistic" load-to-use latency (which usually
67 /// matches the load-to-use latency for when there is a hit in the L1D).
68 ///
69 /// Class MCInstrDesc in LLVM doesn't know about serializing operations, nor
70 /// memory-barrier like instructions.
71 /// LSUnit conservatively assumes that an instruction which `mayLoad` and has
72 /// `unmodeled side effects` behaves like a "soft" load barrier. That means, it
73 /// serializes loads without forcing a flush of the load queue.
74 /// Similarly, instructions that both `mayStore` and have `unmodeled side
75 /// effects` are treated like store barriers. A full memory
76 /// barrier is a 'mayLoad' and 'mayStore' instruction with unmodeled side
77 /// effects. This is obviously inaccurate, but this is the best that we can do
78 /// at the moment.
79 ///
80 /// Each load/store barrier consumes one entry in the load/store queue. A
81 /// load/store barrier enforces ordering of loads/stores:
82 /// - A younger load cannot pass a load barrier.
83 /// - A younger store cannot pass a store barrier.
84 ///
85 /// A younger load has to wait for the memory load barrier to execute.
86 /// A load/store barrier is "executed" when it becomes the oldest entry in
87 /// the load/store queue(s). That also means, all the older loads/stores have
88 /// already been executed.
89 class LSUnit {
90 // Load queue size.
91 // LQ_Size == 0 means that there are infinite slots in the load queue.
92 unsigned LQ_Size;
93
94 // Store queue size.
95 // SQ_Size == 0 means that there are infinite slots in the store queue.
96 unsigned SQ_Size;
97
98 // If true, loads will never alias with stores. This is the default.
99 bool NoAlias;
100
101 std::set<unsigned> LoadQueue;
102 std::set<unsigned> StoreQueue;
103
104 void assignLQSlot(unsigned Index);
105 void assignSQSlot(unsigned Index);
106 bool isReadyNoAlias(unsigned Index) const;
107
108 // An instruction with both 'MayStore' and 'HasUnmodeledSideEffects' is
109 // conservatively treated as a store barrier. It forces older stores to be
110 // executed before newer stores are issued.
111 std::set<unsigned> StoreBarriers;
112
113 // An instruction with both 'MayLoad' and 'HasUnmodeledSideEffects' is
114 // conservatively treated as a load barrier. It forces older loads to execute
115 // before newer loads are issued.
116 std::set<unsigned> LoadBarriers;
117
118 public:
119 LSUnit(unsigned LQ = 0, unsigned SQ = 0, bool AssumeNoAlias = false)
120 : LQ_Size(LQ), SQ_Size(SQ), NoAlias(AssumeNoAlias) {}
121
122 #ifndef NDEBUG
123 void dump() const;
124 #endif
125
126 bool isSQEmpty() const { return StoreQueue.empty(); }
127 bool isLQEmpty() const { return LoadQueue.empty(); }
128 bool isSQFull() const { return SQ_Size != 0 && StoreQueue.size() == SQ_Size; }
129 bool isLQFull() const { return LQ_Size != 0 && LoadQueue.size() == LQ_Size; }
130
131 void reserve(unsigned Index, bool MayLoad, bool MayStore, bool IsMemBarrier) {
132 if (!MayLoad && !MayStore)
133 return;
134 if (MayLoad) {
135 if (IsMemBarrier)
136 LoadBarriers.insert(Index);
137 assignLQSlot(Index);
138 }
139 if (MayStore) {
140 if (IsMemBarrier)
141 StoreBarriers.insert(Index);
142 assignSQSlot(Index);
143 }
144 }
145
146 // The rules are:
147 // 1. A store may not pass a previous store.
148 // 2. A load may not pass a previous store unless flag 'NoAlias' is set.
149 // 3. A load may pass a previous load.
150 // 4. A store may not pass a previous load (regardless of flag 'NoAlias').
151 // 5. A load has to wait until an older load barrier is fully executed.
152 // 6. A store has to wait until an older store barrier is fully executed.
153 bool isReady(unsigned Index) const;
154 void onInstructionExecuted(unsigned Index);
155 };
156
157 } // namespace mca
158
159 #endif
0 llvm-mca - LLVM Machine Code Analyzer
1 -------------------------------------
2
3 llvm-mca is a performance analysis tool that uses information already
4 available in LLVM (e.g. scheduling models) to statically measure the
5 performance of machine code on a specific CPU.
6
7 Performance is measured in terms of throughput as well as processor resource
8 consumption. The tool currently works for processors with an out-of-order
9 backend, for which there is a scheduling model available in LLVM.
10
11 The main goal of this tool is not just to predict the performance of the code
12 when run on the target, but also help with diagnosing potential performance
13 issues.
14
15 Given an assembly code sequence, llvm-mca estimates the IPC (instructions per
16 cycle), as well as hardware resource pressure. The analysis and reporting style
17 were inspired by the IACA tool from Intel.