llvm.org GIT mirror llvm / 98ccb48
[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)

Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`. Let's suppose we are using `-analysis-clustering-epsilon=0.5`. By default now we will start processing the `0.5` point, find that `1.0` is its neighbor, and add them to a new cluster. Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster. Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster. So all these points ended up in the same cluster. This may or may not be a correct implementation of the dbscan clustering algorithm.

But this is rather horribly broken for the purpose of comparing the clusters with the LLVM sched data. Let's suppose all those opcodes are currently in the same sched cluster. If I specify `-analysis-inconsistency-epsilon=0.5`, then no matter the LLVM values this cluster will **never** match the LLVM values, and thus this cluster will **always** be displayed as inconsistent.

The solution is obviously to split off some of these opcodes into a different sched cluster. But how do I do that? Out of the 4 opcodes displayed in the inconsistency report, which ones are the "bad ones"? Which ones are the most different from the checked-in data? I'd need to go into the `.yaml` and look it up manually.

The trivial solution is, when creating clusters, to not use the full dbscan algorithm, but instead "pick some unclustered point, pick all unclustered points that are its neighbors, put them all into a new cluster, repeat". And as it happens, we can arrive at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in non-1D mode (i.e. on more than a single measurement type at a time), because the clustering would depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster. And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster, and if they are not, the cluster (==opcode) is unstable.

This is //yet another// step to bring me closer to being able to continue cleanup of the bdver2 sched model.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59820
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357152 91177308-0d34-0410-b5e6-96231b3b80d8

Roman Lebedev 1 year, 8 months ago
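To make the chaining effect described in the commit message concrete, here is a minimal one-dimensional sketch. It is not the llvm-exegesis implementation: `Sample`, `NaiveCluster`, `clusterByChaining`, and `clusterPerOpcode` are hypothetical names, `numPoints` is assumed to be 1, the neighbour test is assumed to be inclusive, and the stability check compares all pairs directly instead of going through a centroid as the patch does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a benchmark point: an opcode name
// and a single measured value.
struct Sample {
  std::string Opcode;
  double Value;
};

// Chaining as in the commit message (assuming numPoints = 1): a point joins
// a cluster if it is within Epsilon of ANY point already in that cluster.
// Returns one cluster id per sample.
std::vector<int> clusterByChaining(const std::vector<Sample> &Samples,
                                   double Epsilon) {
  assert(Epsilon >= 0.0 && "epsilon must be non-negative");
  std::vector<int> Ids(Samples.size(), -1); // -1 means "not clustered yet".
  int NextId = 0;
  for (std::size_t Seed = 0; Seed < Samples.size(); ++Seed) {
    if (Ids[Seed] != -1)
      continue;
    Ids[Seed] = NextId;
    std::vector<std::size_t> Worklist{Seed};
    while (!Worklist.empty()) {
      const std::size_t P = Worklist.back();
      Worklist.pop_back();
      // Pull in neighbours of P, then later neighbours of those neighbours.
      for (std::size_t Q = 0; Q < Samples.size(); ++Q) {
        if (Ids[Q] == -1 &&
            std::fabs(Samples[P].Value - Samples[Q].Value) <= Epsilon) {
          Ids[Q] = NextId;
          Worklist.push_back(Q);
        }
      }
    }
    ++NextId;
  }
  return Ids;
}

struct NaiveCluster {
  std::vector<double> Values;
  bool IsUnstable = false;
};

// "Naive" clustering: exactly one cluster per opcode. The cluster is flagged
// unstable if any two of its values are further apart than Epsilon.
std::map<std::string, NaiveCluster>
clusterPerOpcode(const std::vector<Sample> &Samples, double Epsilon) {
  std::map<std::string, NaiveCluster> Clusters;
  for (const Sample &S : Samples) {
    NaiveCluster &C = Clusters[S.Opcode];
    for (double V : C.Values)
      if (std::fabs(V - S.Value) > Epsilon)
        C.IsUnstable = true;
    C.Values.push_back(S.Value);
  }
  return Clusters;
}
```

With the four values from the commit message and an epsilon of 0.5, `clusterByChaining` places every sample in cluster 0, while `clusterPerOpcode` produces four single-point clusters, none of them unstable.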
11 changed file(s) with 630 addition(s) and 38 deletion(s).
213213 If non-empty, write inconsistencies found during analysis to this file. `-`
214214 prints to stdout. By default, this analysis is not run.
215215
216 .. option:: -analysis-clustering=[dbscan,naive]
217
218 Specify the clustering algorithm to use. By default DBSCAN will be used.
219 The naive clustering algorithm is better suited for doing further work on the
220 `-analysis-inconsistencies-output-file=` output: it creates one cluster
221 per opcode and checks that the cluster is stable (all points are neighbours).
222
216223 .. option:: -analysis-numpoints=
217224
218225 Specify the numPoints parameters to be used for DBSCAN clustering
219 (`analysis` mode).
226 (`analysis` mode, DBSCAN only).
220227
221228 .. option:: -analysis-clustering-epsilon=
222229
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.5 -analysis-numpoints=1 -analysis-clustering=dbscan | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-DBSCAN-05 %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.49 -analysis-numpoints=1 -analysis-clustering=dbscan | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-DBSCAN-049 %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.5 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-NAIVE %s
3 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.49 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-NAIVE %s
4
5 # CHECK-CLUSTERS-ALL: {{^}}cluster_id,opcode_name,config,sched_class,inverse_throughput{{$}}
6
7 # By default with -analysis-clustering-epsilon=0.5 everything ends up in the
8 # same cluster, because each next point is a neighbour of the previous point.
9
10 # CHECK-CLUSTERS-DBSCAN-05-NEXT: {{^}}0,
11 # CHECK-CLUSTERS-DBSCAN-05-SAME: ,1.00{{$}}
12 # CHECK-CLUSTERS-DBSCAN-05-NEXT: {{^}}0,
13 # CHECK-CLUSTERS-DBSCAN-05-SAME: ,1.50{{$}}
14 # CHECK-CLUSTERS-DBSCAN-05-NEXT: {{^}}0,
15 # CHECK-CLUSTERS-DBSCAN-05-SAME: ,2.00{{$}}
16 # CHECK-CLUSTERS-DBSCAN-05-NEXT: {{^}}0,
17 # CHECK-CLUSTERS-DBSCAN-05-SAME: ,2.50{{$}}
18
19 # With -analysis-clustering-epsilon=0.49 every point goes into a separate cluster.
20
21 # CHECK-CLUSTERS-DBSCAN-049-NEXT: {{^}}0,
22 # CHECK-CLUSTERS-DBSCAN-049-SAME: ,1.00{{$}}
23 # CHECK-CLUSTERS-DBSCAN-049: {{^}}1,
24 # CHECK-CLUSTERS-DBSCAN-049-SAME: ,1.50{{$}}
25 # CHECK-CLUSTERS-DBSCAN-049: {{^}}2,
26 # CHECK-CLUSTERS-DBSCAN-049-SAME: ,2.00{{$}}
27 # CHECK-CLUSTERS-DBSCAN-049: {{^}}3,
28 # CHECK-CLUSTERS-DBSCAN-049-SAME: ,2.50{{$}}
29
30 # And with -analysis-clustering=naive every opcode goes into a separate cluster.
31
32 # CHECK-CLUSTERS-NAIVE-NEXT: {{^}}0,
33 # CHECK-CLUSTERS-NAIVE-SAME: ,1.50{{$}}
34 # CHECK-CLUSTERS-NAIVE: {{^}}1,
35 # CHECK-CLUSTERS-NAIVE-SAME: ,2.00{{$}}
36 # CHECK-CLUSTERS-NAIVE: {{^}}2,
37 # CHECK-CLUSTERS-NAIVE-SAME: ,2.50{{$}}
38 # CHECK-CLUSTERS-NAIVE: {{^}}3,
39 # CHECK-CLUSTERS-NAIVE-SAME: ,1.00{{$}}
40
41 # The "value" is manually specified, not measured.
42
43 ---
44 mode: inverse_throughput
45 key:
46 instructions:
47 - 'ROL8ri AH AH i_0x1'
48 - 'ROL8ri AL AL i_0x1'
49 - 'ROL8ri BH BH i_0x1'
50 - 'ROL8ri BL BL i_0x1'
51 - 'ROL8ri BPL BPL i_0x1'
52 - 'ROL8ri CH CH i_0x1'
53 - 'ROL8ri CL CL i_0x1'
54 - 'ROL8ri DH DH i_0x1'
55 - 'ROL8ri DIL DIL i_0x1'
56 - 'ROL8ri DL DL i_0x1'
57 - 'ROL8ri SIL SIL i_0x1'
58 - 'ROL8ri R8B R8B i_0x1'
59 - 'ROL8ri R9B R9B i_0x1'
60 - 'ROL8ri R10B R10B i_0x1'
61 - 'ROL8ri R11B R11B i_0x1'
62 - 'ROL8ri R12B R12B i_0x1'
63 - 'ROL8ri R13B R13B i_0x1'
64 - 'ROL8ri R14B R14B i_0x1'
65 - 'ROL8ri R15B R15B i_0x1'
66 config: ''
67 register_initial_values:
68 - 'AH=0x0'
69 - 'AL=0x0'
70 - 'BH=0x0'
71 - 'BL=0x0'
72 - 'BPL=0x0'
73 - 'CH=0x0'
74 - 'CL=0x0'
75 - 'DH=0x0'
76 - 'DIL=0x0'
77 - 'DL=0x0'
78 - 'SIL=0x0'
79 - 'R8B=0x0'
80 - 'R9B=0x0'
81 - 'R10B=0x0'
82 - 'R11B=0x0'
83 - 'R12B=0x0'
84 - 'R13B=0x0'
85 - 'R14B=0x0'
86 - 'R15B=0x0'
87 cpu_name: bdver2
88 llvm_triple: x86_64-unknown-linux-gnu
89 num_repetitions: 1000000
90 measurements:
91 - { key: inverse_throughput, value: 1.0000, per_snippet_value: 30.4026 }
92 error: ''
93 info: instruction has tied variables, using static renaming.
94 assembled_snippet: 55415741564155415453B400B000B700B30040B500B500B100B60040B700B20040B60041B00041B10041B20041B30041B40041B50041B60041B700C0C401C0C001C0C701C0C30140C0C501C0C501C0C101C0C60140C0C701C0C20140C0C60141C0C00141C0C10141C0C20141C0C30141C0C40141C0C50141C0C60141C0C7015B415C415D415E415F5DC3
95 ...
96 ---
97 mode: inverse_throughput
98 key:
99 instructions:
100 - 'ROL16ri AX AX i_0x1'
101 - 'ROL16ri BP BP i_0x1'
102 - 'ROL16ri BX BX i_0x1'
103 - 'ROL16ri CX CX i_0x1'
104 - 'ROL16ri DI DI i_0x1'
105 - 'ROL16ri DX DX i_0x1'
106 - 'ROL16ri SI SI i_0x1'
107 - 'ROL16ri R8W R8W i_0x1'
108 - 'ROL16ri R9W R9W i_0x1'
109 - 'ROL16ri R10W R10W i_0x1'
110 - 'ROL16ri R11W R11W i_0x1'
111 - 'ROL16ri R12W R12W i_0x1'
112 - 'ROL16ri R13W R13W i_0x1'
113 - 'ROL16ri R14W R14W i_0x1'
114 - 'ROL16ri R15W R15W i_0x1'
115 config: ''
116 register_initial_values:
117 - 'AX=0x0'
118 - 'BP=0x0'
119 - 'BX=0x0'
120 - 'CX=0x0'
121 - 'DI=0x0'
122 - 'DX=0x0'
123 - 'SI=0x0'
124 - 'R8W=0x0'
125 - 'R9W=0x0'
126 - 'R10W=0x0'
127 - 'R11W=0x0'
128 - 'R12W=0x0'
129 - 'R13W=0x0'
130 - 'R14W=0x0'
131 - 'R15W=0x0'
132 cpu_name: bdver2
133 llvm_triple: x86_64-unknown-linux-gnu
134 num_repetitions: 1000000
135 measurements:
136 - { key: inverse_throughput, value: 1.5000, per_snippet_value: 30.154 }
137 error: ''
138 info: instruction has tied variables, using static renaming.
139 assembled_snippet: 5541574156415541545366B8000066BD000066BB000066B9000066BF000066BA000066BE00006641B800006641B900006641BA00006641BB00006641BC00006641BD00006641BE00006641BF000066C1C00166C1C50166C1C30166C1C10166C1C70166C1C20166C1C6016641C1C0016641C1C1016641C1C2016641C1C3016641C1C4016641C1C5016641C1C6016641C1C70166C1C0015B415C415D415E415F5DC3
140 ...
141 ---
142 mode: inverse_throughput
143 key:
144 instructions:
145 - 'ROL32ri EAX EAX i_0x1'
146 - 'ROL32ri EBP EBP i_0x1'
147 - 'ROL32ri EBX EBX i_0x1'
148 - 'ROL32ri ECX ECX i_0x1'
149 - 'ROL32ri EDI EDI i_0x1'
150 - 'ROL32ri EDX EDX i_0x1'
151 - 'ROL32ri ESI ESI i_0x1'
152 - 'ROL32ri R8D R8D i_0x1'
153 - 'ROL32ri R9D R9D i_0x1'
154 - 'ROL32ri R10D R10D i_0x1'
155 - 'ROL32ri R11D R11D i_0x1'
156 - 'ROL32ri R12D R12D i_0x1'
157 - 'ROL32ri R13D R13D i_0x1'
158 - 'ROL32ri R14D R14D i_0x1'
159 - 'ROL32ri R15D R15D i_0x1'
160 config: ''
161 register_initial_values:
162 - 'EAX=0x0'
163 - 'EBP=0x0'
164 - 'EBX=0x0'
165 - 'ECX=0x0'
166 - 'EDI=0x0'
167 - 'EDX=0x0'
168 - 'ESI=0x0'
169 - 'R8D=0x0'
170 - 'R9D=0x0'
171 - 'R10D=0x0'
172 - 'R11D=0x0'
173 - 'R12D=0x0'
174 - 'R13D=0x0'
175 - 'R14D=0x0'
176 - 'R15D=0x0'
177 cpu_name: bdver2
178 llvm_triple: x86_64-unknown-linux-gnu
179 num_repetitions: 1000000
180 measurements:
181 - { key: inverse_throughput, value: 2.0000, per_snippet_value: 23.2762 }
182 error: ''
183 info: instruction has tied variables, using static renaming.
184 assembled_snippet: 55415741564155415453B800000000BD00000000BB00000000B900000000BF00000000BA00000000BE0000000041B80000000041B90000000041BA0000000041BB0000000041BC0000000041BD0000000041BE0000000041BF00000000C1C001C1C501C1C301C1C101C1C701C1C201C1C60141C1C00141C1C10141C1C20141C1C30141C1C40141C1C50141C1C60141C1C701C1C0015B415C415D415E415F5DC3
185 ...
186 ---
187 mode: inverse_throughput
188 key:
189 instructions:
190 - 'ROL64ri RAX RAX i_0x1'
191 - 'ROL64ri RBP RBP i_0x1'
192 - 'ROL64ri RBX RBX i_0x1'
193 - 'ROL64ri RCX RCX i_0x1'
194 - 'ROL64ri RDI RDI i_0x1'
195 - 'ROL64ri RDX RDX i_0x1'
196 - 'ROL64ri RSI RSI i_0x1'
197 - 'ROL64ri R8 R8 i_0x1'
198 - 'ROL64ri R9 R9 i_0x1'
199 - 'ROL64ri R10 R10 i_0x1'
200 - 'ROL64ri R11 R11 i_0x1'
201 - 'ROL64ri R12 R12 i_0x1'
202 - 'ROL64ri R13 R13 i_0x1'
203 - 'ROL64ri R14 R14 i_0x1'
204 - 'ROL64ri R15 R15 i_0x1'
205 config: ''
206 register_initial_values:
207 - 'RAX=0x0'
208 - 'RBP=0x0'
209 - 'RBX=0x0'
210 - 'RCX=0x0'
211 - 'RDI=0x0'
212 - 'RDX=0x0'
213 - 'RSI=0x0'
214 - 'R8=0x0'
215 - 'R9=0x0'
216 - 'R10=0x0'
217 - 'R11=0x0'
218 - 'R12=0x0'
219 - 'R13=0x0'
220 - 'R14=0x0'
221 - 'R15=0x0'
222 cpu_name: bdver2
223 llvm_triple: x86_64-unknown-linux-gnu
224 num_repetitions: 1000000
225 measurements:
226 - { key: inverse_throughput, value: 2.5000, per_snippet_value: 26.2268 }
227 error: ''
228 info: instruction has tied variables, using static renaming.
229 assembled_snippet: 5541574156415541545348B8000000000000000048BD000000000000000048BB000000000000000048B9000000000000000048BF000000000000000048BA000000000000000048BE000000000000000049B8000000000000000049B9000000000000000049BA000000000000000049BB000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF000000000000000048C1C00148C1C50148C1C30148C1C10148C1C70148C1C20148C1C60149C1C00149C1C10149C1C20149C1C30149C1C40149C1C50149C1C60149C1C70148C1C0015B415C415D415E415F5DC3
230 ...
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.5 -analysis-inconsistency-epsilon=0.5 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-05 %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.5 -analysis-inconsistency-epsilon=0.5 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-STABLE-05 %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.5 -analysis-inconsistency-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-UNSTABLE-05 %s
3
4 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.49 -analysis-inconsistency-epsilon=0.5 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-049 %s
5 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.49 -analysis-inconsistency-epsilon=0.5 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-STABLE-049 %s
6 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.49 -analysis-inconsistency-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-UNSTABLE-049 %s
7
8 # CHECK-CLUSTERS-ALL: {{^}}cluster_id,opcode_name,config,sched_class,latency{{$}}
9
10 # CHECK-CLUSTERS-05-NEXT: {{^}}0,
11 # CHECK-CLUSTERS-05-SAME: ,90.00{{$}}
12 # CHECK-CLUSTERS-05: {{^}}0,
13 # CHECK-CLUSTERS-05-SAME: ,90.50{{$}}
14
15 # CHECK-INCONSISTENCIES-STABLE-05: ADD32rr
16 # CHECK-INCONSISTENCIES-STABLE-05: ADD32rr
17 # CHECK-INCONSISTENCIES-STABLE-05-NOT: ADD32rr
18
19 # CHECK-INCONSISTENCIES-UNSTABLE-05-NOT: ADD32rr
20
21 # CHECK-INCONSISTENCIES-STABLE-049-NOT: ADD32rr
22
23 # CHECK-INCONSISTENCIES-UNSTABLE-049: ADD32rr
24 # CHECK-INCONSISTENCIES-UNSTABLE-049: ADD32rr
25 # CHECK-INCONSISTENCIES-UNSTABLE-049-NOT: ADD32rr
26
27 ---
28 mode: latency
29 key:
30 instructions:
31 - 'ADD32rr EDX EDX EAX'
32 config: ''
33 register_initial_values:
34 - 'EDX=0x0'
35 - 'EAX=0x0'
36 cpu_name: bdver2
37 llvm_triple: x86_64-unknown-linux-gnu
38 num_repetitions: 10000
39 measurements:
40 - { key: latency, value: 90.0000, per_snippet_value: 90.0000 }
41 error: ''
42 info: Repeating a single implicitly serial instruction
43 assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
44 ---
45 mode: latency
46 key:
47 instructions:
48 - 'ADD32rr EDX EDX EAX'
49 config: ''
50 register_initial_values:
51 - 'EDX=0x0'
52 - 'EAX=0x0'
53 cpu_name: bdver2
54 llvm_triple: x86_64-unknown-linux-gnu
55 num_repetitions: 10000
56 measurements:
57 - { key: latency, value: 90.5000, per_snippet_value: 90.5000 }
58 error: ''
59 info: Repeating a single implicitly serial instruction
60 assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
61 ---
62 ...
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.1 -analysis-inconsistency-epsilon=0.1 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-CLUSTERS %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.5 -analysis-inconsistency-epsilon=0.5 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-STABLE %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.5 -analysis-inconsistency-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-UNSTABLE %s
3
4 # We have two ADD32rr measurements, and two measurements for SQRTSSr.
5 # ADD32rr measurements are neighbours.
6 # But the measurements of SQRTSSr are not neighbours,
7 # so therefore that cluster is marked as unstable.
8
9 # By default, we do not show such unstable clusters.
10 # If told to show, we *only* show such unstable clusters.
11
12 # CHECK-CLUSTERS: {{^}}cluster_id,opcode_name,config,sched_class,latency{{$}}
13 # CHECK-CLUSTERS-NEXT: {{^}}0,
14 # CHECK-CLUSTERS-SAME: ,90.00{{$}}
15 # CHECK-CLUSTERS-NEXT: {{^}}0,
16 # CHECK-CLUSTERS-SAME: ,90.11{{$}}
17 # CHECK-CLUSTERS: {{^}}1,
18 # CHECK-CLUSTERS-SAME: ,90.11{{$}}
19 # CHECK-CLUSTERS-NEXT: {{^}}1,
20 # CHECK-CLUSTERS-SAME: ,100.00{{$}}
21
22 # CHECK-INCONSISTENCIES-STABLE: ADD32rr
23 # CHECK-INCONSISTENCIES-STABLE: ADD32rr
24 # CHECK-INCONSISTENCIES-STABLE-NOT: ADD32rr
25 # CHECK-INCONSISTENCIES-STABLE-NOT: SQRTSSr
26
27 # CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
28 # CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
29 # CHECK-INCONSISTENCIES-UNSTABLE-NOT: SQRTSSr
30 # CHECK-INCONSISTENCIES-UNSTABLE-NOT: ADD32rr
31
32 ---
33 mode: latency
34 key:
35 instructions:
36 - 'ADD32rr EDX EDX EAX'
37 config: ''
38 register_initial_values:
39 - 'EDX=0x0'
40 - 'EAX=0x0'
41 cpu_name: bdver2
42 llvm_triple: x86_64-unknown-linux-gnu
43 num_repetitions: 10000
44 measurements:
45 - { key: latency, value: 90.0000, per_snippet_value: 90.0000 }
46 error: ''
47 info: Repeating a single implicitly serial instruction
48 assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
49 ---
50 mode: latency
51 key:
52 instructions:
53 - 'ADD32rr EDX EDX EAX'
54 config: ''
55 register_initial_values:
56 - 'EDX=0x0'
57 - 'EAX=0x0'
58 cpu_name: bdver2
59 llvm_triple: x86_64-unknown-linux-gnu
60 num_repetitions: 10000
61 measurements:
62 - { key: latency, value: 90.1100, per_snippet_value: 90.1100 }
63 error: ''
64 info: Repeating a single implicitly serial instruction
65 assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
66 ---
67 mode: latency
68 key:
69 instructions:
70 - 'SQRTSSr XMM11 XMM11'
71 config: ''
72 register_initial_values:
73 - 'XMM11=0x0'
74 cpu_name: bdver2
75 llvm_triple: x86_64-unknown-linux-gnu
76 num_repetitions: 10000
77 measurements:
78 - { key: latency, value: 90.1111, per_snippet_value: 90.1111 }
79 error: ''
80 info: Repeating a single explicitly serial instruction
81 assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
82 ...
83 ---
84 mode: latency
85 key:
86 instructions:
87 - 'SQRTSSr XMM11 XMM11'
88 config: ''
89 register_initial_values:
90 - 'XMM11=0x0'
91 cpu_name: bdver2
92 llvm_triple: x86_64-unknown-linux-gnu
93 num_repetitions: 10000
94 measurements:
95 - { key: latency, value: 100, per_snippet_value: 100 }
96 error: ''
97 info: Repeating a single explicitly serial instruction
98 assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
99 ...
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=10 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-DBSCAN %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=10 -analysis-numpoints=1 -analysis-clustering=dbscan | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-DBSCAN %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=10 -analysis-numpoints=1 -analysis-clustering=naive | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-NAIVE %s
3
4 # Normally BSR32rr is in WriteBSR and BSF32rr is in WriteBSF sched classes.
5 # Here we check that if we have dbscan-clustered these two measurements into the
6 # same cluster, we don't split it per the sched classes into two.
7
8 # CHECK-CLUSTERS-ALL: {{^}}cluster_id,opcode_name,config,sched_class,inverse_throughput{{$}}
9
10 # CHECK-CLUSTERS-DBSCAN-NEXT: {{^}}0,
11 # CHECK-CLUSTERS-DBSCAN-SAME: ,4.03{{$}}
12 # CHECK-CLUSTERS-DBSCAN-NEXT: {{^}}0,
13 # CHECK-CLUSTERS-DBSCAN-SAME: ,3.02{{$}}
14
15 # CHECK-CLUSTERS-NAIVE-NEXT: {{^}}0,
16 # CHECK-CLUSTERS-NAIVE-SAME: ,3.02{{$}}
17 # CHECK-CLUSTERS-NAIVE: {{^}}1,
18 # CHECK-CLUSTERS-NAIVE-SAME: ,4.03{{$}}
19
20 ---
21 mode: inverse_throughput
22 key:
23 instructions:
24 - 'BSR32rr R11D EDI'
25 config: ''
26 register_initial_values:
27 - 'EDI=0x0'
28 cpu_name: bdver2
29 llvm_triple: x86_64-unknown-linux-gnu
30 num_repetitions: 1000000
31 measurements:
32 - { key: inverse_throughput, value: 4.03048, per_snippet_value: 4.03048 }
33 error: ''
34 info: instruction has no tied variables picking Uses different from defs
35 assembled_snippet: BF00000000440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDF440FBDDFC3
36 ...
37 ---
38 mode: inverse_throughput
39 key:
40 instructions:
41 - 'BSF32rr EAX R14D'
42 config: ''
43 register_initial_values:
44 - 'R14D=0x0'
45 cpu_name: bdver2
46 llvm_triple: x86_64-unknown-linux-gnu
47 num_repetitions: 1000000
48 measurements:
49 - { key: inverse_throughput, value: 3.02186, per_snippet_value: 3.02186 }
50 error: ''
51 info: instruction has no tied variables picking Uses different from defs
52 assembled_snippet: 415641BE00000000410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6410FBCC6415EC3
53 ...
";
332332 OS << "";
333333 }
334334 OS << "
335 for (const auto &Stats : Cluster.getRepresentative()) {
335 for (const auto &Stats : Cluster.getCentroid().getStats()) {
336336 OS << "";
337337 writeMeasurementValue(OS, Stats.avg());
338338 OS << "
[";
436436 size_t PointId, const InstructionBenchmarkClustering &Clustering) {
437437 PointIds.push_back(PointId);
438438 const auto &Point = Clustering.getPoints()[PointId];
439 if (ClusterId.isUndef()) {
439 if (ClusterId.isUndef())
440440 ClusterId = Clustering.getClusterIdForPoint(PointId);
441 Representative.resize(Point.Measurements.size());
442 }
443 for (size_t I = 0, E = Point.Measurements.size(); I < E; ++I) {
444 Representative[I].push(Point.Measurements[I]);
445 }
446441 assert(ClusterId == Clustering.getClusterIdForPoint(PointId));
442
443 Centroid.addPoint(Point.Measurements);
447444 }
448445
449446 // Returns a ProxResIdx by id or name.
466463 const llvm::MCSubtargetInfo &STI, const ResolvedSchedClass &RSC,
467464 const InstructionBenchmarkClustering &Clustering,
468465 const double AnalysisInconsistencyEpsilonSquared_) const {
466 ArrayRef Representative = Centroid.getStats();
469467 const size_t NumMeasurements = Representative.size();
470468 std::vector ClusterCenterPoint(NumMeasurements);
471469 std::vector SchedClassPoint(NumMeasurements);
7272
7373 const std::vector &getPointIds() const { return PointIds; }
7474
75 void addPoint(size_t PointId,
76 const InstructionBenchmarkClustering &Clustering);
77
7578 // Return the cluster centroid.
76 const std::vector &getRepresentative() const {
77 return Representative;
78 }
79 const SchedClassClusterCentroid &getCentroid() const { return Centroid; }
7980
8081 // Returns true if the cluster representative measurements match that of SC.
8182 bool
8485 const InstructionBenchmarkClustering &Clustering,
8586 const double AnalysisInconsistencyEpsilonSquared_) const;
8687
87 void addPoint(size_t PointId,
88 const InstructionBenchmarkClustering &Clustering);
89
9088 private:
9189 InstructionBenchmarkClustering::ClusterId ClusterId;
9290 std::vector PointIds;
9391 // Measurement stats for the points in the SchedClassCluster.
94 std::vector Representative;
92 SchedClassClusterCentroid Centroid;
9593 };
9694
9795 void printInstructionRowCsv(size_t PointId, llvm::raw_ostream &OS) const;
5252 }
5353 }
5454
55 // Given a set of points, checks that all the points are neighbours
56 // up to AnalysisClusteringEpsilon. This is O(2*N).
57 bool InstructionBenchmarkClustering::areAllNeighbours(
58 ArrayRef Pts) const {
59 // First, get the centroid of this group of points. This is O(N).
60 SchedClassClusterCentroid G;
61 llvm::for_each(Pts, [this, &G](size_t P) {
62 assert(P < Points_.size());
63 ArrayRef Measurements = Points_[P].Measurements;
64 if (Measurements.empty()) // Error point.
65 return;
66 G.addPoint(Measurements);
67 });
68 const std::vector Centroid = G.getAsPoint();
69
70 // Since we will be comparing with the centroid, we need to halve the epsilon.
71 double AnalysisClusteringEpsilonHalvedSquared =
72 AnalysisClusteringEpsilonSquared_ / 4.0;
73
74 // And now check that every point is a neighbour of the centroid. Also O(N).
75 return llvm::all_of(
76 Pts, [this, &Centroid, AnalysisClusteringEpsilonHalvedSquared](size_t P) {
77 assert(P < Points_.size());
78 const auto &PMeasurements = Points_[P].Measurements;
79 if (PMeasurements.empty()) // Error point.
80 return true; // Pretend that error point is a neighbour.
81 return isNeighbour(PMeasurements, Centroid,
82 AnalysisClusteringEpsilonHalvedSquared);
83 });
84 }
85
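The centroid-based check in `areAllNeighbours` can be illustrated in one dimension. The sketch below is a simplified analogue, not the patch itself: `allNearCentroid` and `allPairsNear` are hypothetical names, plain distances are used instead of squared ones, and the comparison is assumed to be inclusive (the diff does not show how `isNeighbour` compares). By the triangle inequality, if every point is within epsilon / 2 of the centroid, then any two points are within epsilon of each other. The converse does not hold, so the linear-time check can report a group as unstable even though all pairs are neighbours.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical 1-D analogue of the centroid check: true if every value lies
// within Epsilon / 2 of the mean of all values. Two passes, so linear time.
bool allNearCentroid(const std::vector<double> &Values, double Epsilon) {
  assert(Epsilon >= 0.0 && "epsilon must be non-negative");
  if (Values.empty())
    return true; // Nothing to compare.
  const double Centroid =
      std::accumulate(Values.begin(), Values.end(), 0.0) / Values.size();
  for (double V : Values)
    if (std::fabs(V - Centroid) > Epsilon / 2.0)
      return false;
  return true;
}

// Reference quadratic check: true if every pair of values is within Epsilon.
bool allPairsNear(const std::vector<double> &Values, double Epsilon) {
  for (std::size_t I = 0; I < Values.size(); ++I)
    for (std::size_t J = I + 1; J < Values.size(); ++J)
      if (std::fabs(Values[I] - Values[J]) > Epsilon)
        return false;
  return true;
}
```

For the values `90.0` and `90.5`, `allNearCentroid` returns true with an epsilon of 0.5 and false with 0.49. For `{0, 0, 0, 1}` with an epsilon of 1, `allPairsNear` returns true but `allNearCentroid` returns false, which shows the conservative side of the trade-off.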
5586 InstructionBenchmarkClustering::InstructionBenchmarkClustering(
5687 const std::vector &Points,
5788 const double AnalysisClusteringEpsilonSquared)
94125 return llvm::Error::success();
95126 }
96127
97 void InstructionBenchmarkClustering::dbScan(const size_t MinPts) {
128 void InstructionBenchmarkClustering::clusterizeDbScan(const size_t MinPts) {
98129 std::vector Neighbors; // Persistent buffer to avoid allocs.
99130 for (size_t P = 0, NumPoints = Points_.size(); P < NumPoints; ++P) {
100131 if (!ClusterIdForPoint_[P].isUndef())
149180 NoiseCluster_.PointIndices.push_back(P);
150181 }
151182 }
183 }
184
185 void InstructionBenchmarkClustering::clusterizeNaive(unsigned NumOpcodes) {
186 // Given an instruction Opcode, which are the benchmarks of this instruction?
187 std::vector> OpcodeToPoints;
188 OpcodeToPoints.resize(NumOpcodes);
189 size_t NumOpcodesSeen = 0;
190 for (size_t P = 0, NumPoints = Points_.size(); P < NumPoints; ++P) {
191 const InstructionBenchmark &Point = Points_[P];
192 const unsigned Opcode = Point.keyInstruction().getOpcode();
193 assert(Opcode < NumOpcodes && "NumOpcodes is incorrect (too small)");
194 llvm::SmallVectorImpl &PointsOfOpcode = OpcodeToPoints[Opcode];
195 if (PointsOfOpcode.empty()) // If we previously have not seen any points of
196 ++NumOpcodesSeen; // this opcode, then naturally this is the new opcode.
197 PointsOfOpcode.emplace_back(P);
198 }
199 assert(OpcodeToPoints.size() == NumOpcodes && "sanity check");
200 assert(NumOpcodesSeen <= NumOpcodes &&
201 "can't see more opcodes than there are total opcodes");
202 assert(NumOpcodesSeen <= Points_.size() &&
203 "can't see more opcodes than there are total points");
204
205 Clusters_.reserve(NumOpcodesSeen); // One cluster per opcode.
206 for (ArrayRef PointsOfOpcode : llvm::make_filter_range(
207 OpcodeToPoints, [](ArrayRef PointsOfOpcode) {
208 return !PointsOfOpcode.empty(); // Ignore opcodes with no points.
209 })) {
210 // Create a new cluster.
211 Clusters_.emplace_back(ClusterId::makeValid(
212 Clusters_.size(), /*IsUnstable=*/!areAllNeighbours(PointsOfOpcode)));
213 Cluster &CurrentCluster = Clusters_.back();
214 // Mark points as belonging to the new cluster.
215 llvm::for_each(PointsOfOpcode, [this, &CurrentCluster](size_t P) {
216 ClusterIdForPoint_[P] = CurrentCluster.Id;
217 });
218 // And add all the points of this opcode to the new cluster.
219 CurrentCluster.PointIndices.reserve(PointsOfOpcode.size());
220 CurrentCluster.PointIndices.assign(PointsOfOpcode.begin(),
221 PointsOfOpcode.end());
222 assert(CurrentCluster.PointIndices.size() == PointsOfOpcode.size());
223 }
224 assert(Clusters_.size() == NumOpcodesSeen);
152225 }
153226
154227 // Given an instruction Opcode, we can make benchmarks (measurements) of the
245318
246319 llvm::Expected
247320 InstructionBenchmarkClustering::create(
248 const std::vector &Points, const size_t MinPts,
249 const double AnalysisClusteringEpsilon,
321 const std::vector &Points, const ModeE Mode,
322 const size_t DbscanMinPts, const double AnalysisClusteringEpsilon,
250323 llvm::Optional NumOpcodes) {
251324 InstructionBenchmarkClustering Clustering(
252325 Points, AnalysisClusteringEpsilon * AnalysisClusteringEpsilon);
257330 return Clustering; // Nothing to cluster.
258331 }
259332
260 Clustering.dbScan(MinPts);
261
262 if (NumOpcodes.hasValue())
263 Clustering.stabilize(NumOpcodes.getValue());
333 if (Mode == ModeE::Dbscan) {
334 Clustering.clusterizeDbScan(DbscanMinPts);
335
336 if (NumOpcodes.hasValue())
337 Clustering.stabilize(NumOpcodes.getValue());
338 } else /*if(Mode == ModeE::Naive)*/ {
339 if (!NumOpcodes.hasValue())
340 llvm::report_fatal_error(
341 "'naive' clustering mode requires opcode count to be specified");
342 Clustering.clusterizeNaive(NumOpcodes.getValue());
343 }
264344
265345 return Clustering;
346 }
347
348 void SchedClassClusterCentroid::addPoint(ArrayRef Point) {
349 if (Representative.empty())
350 Representative.resize(Point.size());
351 assert(Representative.size() == Point.size() &&
352 "All points should have identical dimensions.");
353
354 for (const auto &I : llvm::zip(Representative, Point))
355 std::get<0>(I).push(std::get<1>(I));
356 }
357
358 std::vector SchedClassClusterCentroid::getAsPoint() const {
359 std::vector ClusterCenterPoint(Representative.size());
360 for (const auto &I : llvm::zip(ClusterCenterPoint, Representative))
361 std::get<0>(I).PerInstructionValue = std::get<1>(I).avg();
362 return ClusterCenterPoint;
266363 }
267364
268365 } // namespace exegesis
2424
2525 class InstructionBenchmarkClustering {
2626 public:
27 enum ModeE { Dbscan, Naive };
28
2729 // Clusters `Points` using DBSCAN with the given parameters. See the cc file
2830 // for more explanations on the algorithm.
2931 static llvm::Expected
30 create(const std::vector &Points, size_t MinPts,
31 double AnalysisClusteringEpsilon,
32 create(const std::vector &Points, ModeE Mode,
33 size_t DbscanMinPts, double AnalysisClusteringEpsilon,
3234 llvm::Optional NumOpcodes = llvm::None);
3335
3436 class ClusterId {
3537 public:
3638 static ClusterId noise() { return ClusterId(kNoise); }
3739 static ClusterId error() { return ClusterId(kError); }
38 static ClusterId makeValid(size_t Id) { return ClusterId(Id); }
40 static ClusterId makeValid(size_t Id, bool IsUnstable = false) {
41 return ClusterId(Id, IsUnstable);
42 }
3943 static ClusterId makeValidUnstable(size_t Id) {
40 return ClusterId(Id, /*IsUnstable=*/true);
44 return makeValid(Id, /*IsUnstable=*/true);
4145 }
4246
4347 ClusterId() : Id_(kUndef), IsUnstable_(false) {}
@@ -119,12 +123,20 @@
       double AnalysisClusteringEpsilonSquared);
 
   llvm::Error validateAndSetup();
-  void dbScan(size_t MinPts);
+
+  void clusterizeDbScan(size_t MinPts);
+  void clusterizeNaive(unsigned NumOpcodes);
+
+  // Stabilization is only needed if dbscan was used to clusterize.
   void stabilize(unsigned NumOpcodes);
+
   void rangeQuery(size_t Q, std::vector<size_t> &Scratchpad) const;
+
+  bool areAllNeighbours(ArrayRef<size_t> Pts) const;
 
   const std::vector<InstructionBenchmark> &Points_;
   const double AnalysisClusteringEpsilonSquared_;
+
   int NumDimensions_ = 0;
   // ClusterForPoint_[P] is the cluster id for Points[P].
   std::vector<ClusterId> ClusterIdForPoint_;
@@ -133,6 +145,21 @@
   Cluster ErrorCluster_;
 };
 
+class SchedClassClusterCentroid {
+public:
+  const std::vector<PerInstructionStats> &getStats() const {
+    return Representative;
+  }
+
+  std::vector<BenchmarkMeasure> getAsPoint() const;
+
+  void addPoint(ArrayRef<BenchmarkMeasure> Point);
+
+private:
+  // Measurement stats for the points in the SchedClassCluster.
+  std::vector<PerInstructionStats> Representative;
+};
+
 } // namespace exegesis
 } // namespace llvm
 
@@ -65,7 +65,7 @@
     cl::cat(Options), cl::init(""));
 
 static cl::opt<exegesis::InstructionBenchmark::ModeE> BenchmarkMode(
-    "mode", cl::desc("the mode to run"), cl::cat(BenchmarkOptions),
+    "mode", cl::desc("the mode to run"), cl::cat(Options),
     cl::values(clEnumValN(exegesis::InstructionBenchmark::Latency, "latency",
                           "Instruction Latency"),
                clEnumValN(exegesis::InstructionBenchmark::InverseThroughput,
@@ -88,14 +88,24 @@
     cl::desc("ignore instructions that do not define a sched class"),
     cl::cat(BenchmarkOptions), cl::init(false));
 
-static cl::opt<unsigned> AnalysisNumPoints(
+static cl::opt<exegesis::InstructionBenchmarkClustering::ModeE>
+    AnalysisClusteringAlgorithm(
+        "analysis-clustering", cl::desc("the clustering algorithm to use"),
+        cl::cat(AnalysisOptions),
+        cl::values(clEnumValN(exegesis::InstructionBenchmarkClustering::Dbscan,
+                              "dbscan", "use DBSCAN/OPTICS algorithm"),
+                   clEnumValN(exegesis::InstructionBenchmarkClustering::Naive,
+                              "naive", "one cluster per opcode")),
+        cl::init(exegesis::InstructionBenchmarkClustering::Dbscan));
+
+static cl::opt<unsigned> AnalysisDbscanNumPoints(
     "analysis-numpoints",
-    cl::desc("minimum number of points in an analysis cluster"),
+    cl::desc("minimum number of points in an analysis cluster (dbscan only)"),
     cl::cat(AnalysisOptions), cl::init(3));
 
 static cl::opt<float> AnalysisClusteringEpsilon(
     "analysis-clustering-epsilon",
-    cl::desc("dbscan epsilon for benchmark point clustering"),
+    cl::desc("epsilon for benchmark point clustering"),
     cl::cat(AnalysisOptions), cl::init(0.1));
 
 static cl::opt<float> AnalysisInconsistencyEpsilon(
@@ -459,8 +469,8 @@
   std::unique_ptr<llvm::MCInstrInfo> InstrInfo(TheTarget->createMCInstrInfo());
 
   const auto Clustering = ExitOnErr(InstructionBenchmarkClustering::create(
-      Points, AnalysisNumPoints, AnalysisClusteringEpsilon,
-      InstrInfo->getNumOpcodes()));
+      Points, AnalysisClusteringAlgorithm, AnalysisDbscanNumPoints,
+      AnalysisClusteringEpsilon, InstrInfo->getNumOpcodes()));
 
   const Analysis Analyzer(*TheTarget, std::move(InstrInfo), Clustering,
                           AnalysisInconsistencyEpsilon,
@@ -45,7 +45,8 @@
   // Error cluster: points {2}
   Points[2].Error = "oops";
 
-  auto Clustering = InstructionBenchmarkClustering::create(Points, 2, 0.25);
+  auto Clustering = InstructionBenchmarkClustering::create(
+      Points, InstructionBenchmarkClustering::ModeE::Dbscan, 2, 0.25);
   ASSERT_TRUE((bool)Clustering);
   EXPECT_THAT(Clustering.get().getValidClusters(),
               UnorderedElementsAre(HasPoints({0, 3}), HasPoints({1, 4})));
@@ -72,7 +73,9 @@
       {"x", 0.01, 0.0}, {"y", 1.02, 0.0}, {"z", 1.98, 0.0}};
   Points[1].Measurements = {{"y", 1.02, 0.0}, {"z", 1.98, 0.0}};
   auto Error =
-      InstructionBenchmarkClustering::create(Points, 2, 0.25).takeError();
+      InstructionBenchmarkClustering::create(
+          Points, InstructionBenchmarkClustering::ModeE::Dbscan, 2, 0.25)
+          .takeError();
   ASSERT_TRUE((bool)Error);
   consumeError(std::move(Error));
 }
@@ -82,7 +85,9 @@
   Points[0].Measurements = {{"x", 0.01, 0.0}, {"y", 1.02, 0.0}};
   Points[1].Measurements = {{"y", 1.02, 0.0}, {"x", 1.98, 0.0}};
   auto Error =
-      InstructionBenchmarkClustering::create(Points, 2, 0.25).takeError();
+      InstructionBenchmarkClustering::create(
+          Points, InstructionBenchmarkClustering::ModeE::Dbscan, 2, 0.25)
+          .takeError();
   ASSERT_TRUE((bool)Error);
   consumeError(std::move(Error));
 }
@@ -111,7 +116,8 @@
   Points[2].Measurements = {
       {"x", 2.0, 0.0}};
 
-  auto Clustering = InstructionBenchmarkClustering::create(Points, 2, 1.1);
+  auto Clustering = InstructionBenchmarkClustering::create(
+      Points, InstructionBenchmarkClustering::ModeE::Dbscan, 2, 1.1);
   ASSERT_TRUE((bool)Clustering);
   EXPECT_THAT(Clustering.get().getValidClusters(),
               UnorderedElementsAre(HasPoints({0, 1, 2})));
@@ -127,7 +133,8 @@
   Points[2].Measurements = {
       {"x", 1.0, 0.0}};
 
-  auto Clustering = InstructionBenchmarkClustering::create(Points, 2, 1.1);
+  auto Clustering = InstructionBenchmarkClustering::create(
+      Points, InstructionBenchmarkClustering::ModeE::Dbscan, 2, 1.1);
   ASSERT_TRUE((bool)Clustering);
   EXPECT_THAT(Clustering.get().getValidClusters(),
               UnorderedElementsAre(HasPoints({0, 1, 2})));