llvm.org GIT mirror llvm / a41ebf8
[llvm-exegesis] Opcode stabilization / reclusterization (PR40715)

Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with *similar* characteristics into clusters.

Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with *different* performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into *different* clusters. This is not great for further analysis.

We shall find every `Opcode` with benchmarks not in just one cluster, and move *all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.

I have solved this by making `ClusterId` a bit field, adding an `IsUnstable` bit, and introducing the `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters.

The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what I'm seeing, maybe it isn't)

Timings/comparisons:

old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% )
172 context-switches # 25.965 M/sec ( +- 29.89% )
0 cpu-migrations # 0.042 M/sec ( +- 56.54% )
31073 page-faults # 4690.754 M/sec ( +- 0.08% )
26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%)
2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%)
13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%)
19770706799 instructions # 0.74 insn per cycle
                         # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%)
4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%)
121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%)

6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% )
```

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% )
213 context-switches # 32.952 M/sec ( +- 23.81% )
1 cpu-migrations # 0.130 M/sec ( +- 43.84% )
31287 page-faults # 4832.057 M/sec ( +- 0.08% )
25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%)
1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%)
13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%)
19752995402 instructions # 0.76 insn per cycle
                         # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%)
4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%)
121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%)

6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% )
133 context-switches # 20.792 M/sec ( +- 23.39% )
0 cpu-migrations # 0.063 M/sec ( +- 61.24% )
31318 page-faults # 4903.256 M/sec ( +- 0.08% )
25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%)
1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%)
13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%)
19767554347 instructions # 0.77 insn per cycle
                         # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%)
4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%)
118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%)

6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% )
```
Performance improved even further?! Makes sense I guess, fewer clusters to print.

patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% )
194 context-switches # 31.709 M/sec ( +- 20.46% )
0 cpu-migrations # 0.039 M/sec ( +- 49.77% )
31413 page-faults # 5129.261 M/sec ( +- 0.06% )
24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%)
1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%)
13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%)
18260877653 instructions # 0.74 insn per cycle
                         # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%)
4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%)
114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%)

6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps).
(Also, wow this is fast, it used to take several minutes originally)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].

Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58355

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354441 91177308-0d34-0410-b5e6-96231b3b80d8

Roman Lebedev 1 year, 9 months ago
8 changed file(s) with 242 addition(s) and 17 deletion(s).
223223 Specify the numPoints parameters to be used for DBSCAN clustering
224224 (`analysis` mode).
225225
226 .. option:: -analysis-display-unstable-clusters
227
228 If there is more than one benchmark for an opcode, said benchmarks may end up
229 not being clustered into the same cluster if the measured performance
230 characteristics are different. By default all such opcodes are filtered out.
231 This flag will instead show only such unstable opcodes.
232
226233 .. option:: -ignore-invalid-sched-class=false
227234
228235 If set, ignore instructions that do not have a sched class (class idx = 0).
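The display rule described for `-analysis-display-unstable-clusters` can be illustrated with a small standalone sketch. `ClusteredPoint` and `selectOpcodes` are hypothetical names invented for this example and are not part of llvm-exegesis; the sketch only assumes that each point carries an "unstable" bit and that exactly one of the two groups is shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a clustered benchmark point (not an llvm-exegesis type).
struct ClusteredPoint {
  std::string Opcode;
  bool IsUnstable;
};

// By default keep only stable points; with DisplayUnstable set, keep only
// unstable ones.
std::vector<std::string>
selectOpcodes(const std::vector<ClusteredPoint> &Points, bool DisplayUnstable) {
  std::vector<std::string> Shown;
  for (const ClusteredPoint &P : Points) {
    if (P.IsUnstable != DisplayUnstable)
      continue; // For two bools this is the same test as `a ^ b`.
    Shown.push_back(P.Opcode);
  }
  return Shown;
}
```

With one stable `ADD32rr` point and two unstable `SQRTSSr` points, the default call returns only `ADD32rr`, and passing `true` returns only the two `SQRTSSr` entries.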
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-STABLE %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-UNSTABLE %s
3
4 # We have one ADD32rr measurement, and two measurements for SQRTSSr.
5 # The ADD32rr measurement and one of the SQRTSSr measurements are nearly identical,
6 # and thus will be in the same cluster. But the second SQRTSSr measurement
7 # is different from the first SQRTSSr measurement, and thus it will be in its
8 # own cluster. We do reclusterization, and thus since there is more than one
9 # measurement from SQRTSSr, and they are not in the same cluster, we move
10 # both SQRTSSr measurements into their own cluster, and mark it as unstable.
11 # By default, we do not show such unstable clusters.
12 # If told to show, we *only* show such unstable clusters.
13
14 # CHECK-CLUSTERS: {{^}}cluster_id,opcode_name,config,sched_class,latency{{$}}
15 # CHECK-CLUSTERS-NEXT: {{^}}0,
16 # CHECK-CLUSTERS-SAME: ,90.00{{$}}
17 # CHECK-CLUSTERS: {{^}}3,
18 # CHECK-CLUSTERS-SAME: ,90.11{{$}}
19 # CHECK-CLUSTERS-NEXT: {{^}}3,
20 # CHECK-CLUSTERS-SAME: ,100.00{{$}}
21
22 # CHECK-INCONSISTENCIES-STABLE: ADD32rr
23 # CHECK-INCONSISTENCIES-STABLE-NOT: ADD32rr
24 # CHECK-INCONSISTENCIES-STABLE-NOT: SQRTSSr
25
26 # CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
27 # CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
28 # CHECK-INCONSISTENCIES-UNSTABLE-NOT: SQRTSSr
29 # CHECK-INCONSISTENCIES-UNSTABLE-NOT: ADD32rr
30
31 ---
32 mode: latency
33 key:
34 instructions:
35 - 'ADD32rr EDX EDX EAX'
36 config: ''
37 register_initial_values:
38 - 'EDX=0x0'
39 - 'EAX=0x0'
40 cpu_name: bdver2
41 llvm_triple: x86_64-unknown-linux-gnu
42 num_repetitions: 10000
43 measurements:
44 - { key: latency, value: 90.0000, per_snippet_value: 90.0000 }
45 error: ''
46 info: Repeating a single implicitly serial instruction
47 assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
48 ---
49 mode: latency
50 key:
51 instructions:
52 - 'SQRTSSr XMM11 XMM11'
53 config: ''
54 register_initial_values:
55 - 'XMM11=0x0'
56 cpu_name: bdver2
57 llvm_triple: x86_64-unknown-linux-gnu
58 num_repetitions: 10000
59 measurements:
60 - { key: latency, value: 90.1111, per_snippet_value: 90.1111 }
61 error: ''
62 info: Repeating a single explicitly serial instruction
63 assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
64 ...
65 ---
66 mode: latency
67 key:
68 instructions:
69 - 'SQRTSSr XMM11 XMM11'
70 config: ''
71 register_initial_values:
72 - 'XMM11=0x0'
73 cpu_name: bdver2
74 llvm_triple: x86_64-unknown-linux-gnu
75 num_repetitions: 10000
76 measurements:
77 - { key: latency, value: 100, per_snippet_value: 100 }
78 error: ''
79 info: Repeating a single explicitly serial instruction
80 assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
81 ...
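The scenario this test describes (one `ADD32rr` point sharing a cluster with one `SQRTSSr` point, and a second `SQRTSSr` point elsewhere) can be sketched as a detection step over already-clustered points. This is a simplified illustration using standard containers; `findUnstableOpcodes` and the `(opcode name, cluster id)` pair representation are assumptions made for the example, not the data structures used by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Returns the opcodes whose benchmarks landed in more than one cluster.
// Each input pair is (opcode name, cluster id); both are illustrative.
std::set<std::string> findUnstableOpcodes(
    const std::vector<std::pair<std::string, std::size_t>> &Points) {
  std::map<std::string, std::set<std::size_t>> ClustersOfOpcode;
  std::set<std::string> Unstable;
  for (const auto &P : Points) {
    std::set<std::size_t> &Ids = ClustersOfOpcode[P.first];
    Ids.insert(P.second);
    // A second distinct cluster id means this opcode is unstable.
    if (Ids.size() > 1)
      Unstable.insert(P.first);
  }
  return Unstable;
}
```

For the input `{ADD32rr, 0}, {SQRTSSr, 0}, {SQRTSSr, 1}` only `SQRTSSr` is reported, which matches the expectation that `ADD32rr` stays in the stable output.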
148148 writeEscaped(OS, Point.Key.Config);
149149 OS << kCsvSep;
150150 assert(!Point.Key.Instructions.empty());
151 const llvm::MCInst &MCI = Point.Key.Instructions[0];
151 const llvm::MCInst &MCI = Point.keyInstruction();
152152 const unsigned SchedClassId = resolveSchedClassId(
153153 *SubtargetInfo_, InstrInfo_->get(MCI.getOpcode()).getSchedClass(), MCI);
154154
167167 }
168168
169169 Analysis::Analysis(const llvm::Target &Target,
170 const InstructionBenchmarkClustering &Clustering)
171 : Clustering_(Clustering) {
170 std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
171 const InstructionBenchmarkClustering &Clustering,
172 bool AnalysisDisplayUnstableOpcodes)
173 : Clustering_(Clustering), InstrInfo_(std::move(InstrInfo)),
174 AnalysisDisplayUnstableOpcodes_(AnalysisDisplayUnstableOpcodes) {
172175 if (Clustering.getPoints().empty())
173176 return;
174177
175178 const InstructionBenchmark &FirstPoint = Clustering.getPoints().front();
176 InstrInfo_.reset(Target.createMCInstrInfo());
177179 RegInfo_.reset(Target.createMCRegInfo(FirstPoint.LLVMTriple));
178180 AsmInfo_.reset(Target.createMCAsmInfo(*RegInfo_, FirstPoint.LLVMTriple));
179181 SubtargetInfo_.reset(Target.createMCSubtargetInfo(FirstPoint.LLVMTriple,
232234 assert(!Point.Key.Instructions.empty());
233235 // FIXME: we should be using the tuple of classes for instructions in the
234236 // snippet as key.
235 const llvm::MCInst &MCI = Point.Key.Instructions[0];
237 const llvm::MCInst &MCI = Point.keyInstruction();
236238 unsigned SchedClassId = InstrInfo_->get(MCI.getOpcode()).getSchedClass();
237239 const bool WasVariant = SchedClassId && SubtargetInfo_->getSchedModel()
238240 .getSchedClassDesc(SchedClassId)
667669 const auto &ClusterId = Clustering_.getClusterIdForPoint(PointId);
668670 if (!ClusterId.isValid())
669671 continue; // Ignore noise and errors. FIXME: take noise into account ?
672 if (ClusterId.isUnstable() ^ AnalysisDisplayUnstableOpcodes_)
673 continue; // Either display stable or unstable clusters only.
670674 auto SchedClassClusterIt =
671675 std::find_if(SchedClassClusters.begin(), SchedClassClusters.end(),
672676 [ClusterId](const SchedClassCluster &C) {
3535 class Analysis {
3636 public:
3737 Analysis(const llvm::Target &Target,
38 const InstructionBenchmarkClustering &Clustering);
38 std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
39 const InstructionBenchmarkClustering &Clustering,
40 bool AnalysisDisplayUnstableOpcodes);
3941
4042 // Prints a csv of instructions for each cluster.
4143 struct PrintClusters {};
124126 std::unique_ptr<llvm::MCAsmInfo> AsmInfo_;
125127 std::unique_ptr<llvm::MCInstPrinter> InstPrinter_;
126128 std::unique_ptr<llvm::MCDisassembler> Disasm_;
129 const bool AnalysisDisplayUnstableOpcodes_;
127130 };
128131
129132 // Computes the idealized ProcRes Unit pressure. This is the expected
6060 ModeE Mode;
6161 std::string CpuName;
6262 std::string LLVMTriple;
63 // Which instruction is being benchmarked here?
64 const llvm::MCInst &keyInstruction() const { return Key.Instructions[0]; }
6365 // The number of instructions inside the repeated snippet. For example, if a
6466 // snippet of 3 instructions is repeated 4 times, this is 12.
6567 int NumRepetitions = 0;
77
88 #include "Clustering.h"
99 #include "llvm/ADT/SetVector.h"
10 #include "llvm/ADT/SmallSet.h"
1011 #include "llvm/ADT/SmallVector.h"
12 #include <algorithm>
1113 #include <string>
14 #include <vector>
1215
1316 namespace llvm {
1417 namespace exegesis {
146149 }
147150 }
148151
152 // Given an instruction Opcode, we can make benchmarks (measurements) of the
153 // instruction characteristics/performance. Then, to facilitate further analysis
154 // we group the benchmarks with *similar* characteristics into clusters.
155 // Now, this is all not entirely deterministic. Some instructions have variable
156 // characteristics, depending on their arguments. And thus, if we do several
157 // benchmarks of the same instruction Opcode, we may end up with *different*
158 // performance characteristics measurements. And when we then do clustering,
159 // these several benchmarks of the same instruction Opcode may end up being
160 // clustered into *different* clusters. This is not great for further analysis.
161 // We shall find every opcode with benchmarks not in just one cluster, and move
162 // *all* the benchmarks of said Opcode into one new unstable cluster per Opcode.
163 void InstructionBenchmarkClustering::stabilize(unsigned NumOpcodes) {
164 // Given an instruction Opcode, in which clusters do benchmarks of this
165 // instruction lie? Normally, they all should be in the same cluster.
166 std::vector> OpcodeToClusterIDs;
167 OpcodeToClusterIDs.resize(NumOpcodes);
168 // The list of opcodes that have more than one cluster.
169 llvm::SetVector UnstableOpcodes;
170 // Populate OpcodeToClusterIDs and UnstableOpcodes data structures.
171 assert(ClusterIdForPoint_.size() == Points_.size() && "size mismatch");
172 for (const auto &Point : zip(Points_, ClusterIdForPoint_)) {
173 const ClusterId &ClusterIdOfPoint = std::get<1>(Point);
174 if (!ClusterIdOfPoint.isValid())
175 continue; // Only process fully valid clusters.
176 const unsigned Opcode = std::get<0>(Point).keyInstruction().getOpcode();
177 assert(Opcode < NumOpcodes && "NumOpcodes is incorrect (too small)");
178 llvm::SmallSet &ClusterIDsOfOpcode =
179 OpcodeToClusterIDs[Opcode];
180 ClusterIDsOfOpcode.insert(ClusterIdOfPoint);
181 // Is there more than one ClusterID for this opcode?
182 if (ClusterIDsOfOpcode.size() < 2)
183 continue; // If not, then at this moment this Opcode is stable.
184 // Else let's record this unstable opcode for future use.
185 UnstableOpcodes.insert(Opcode);
186 }
187 assert(OpcodeToClusterIDs.size() == NumOpcodes && "sanity check");
188
189 // We know how many [new] clusters we will end up with.
190 const auto NewTotalClusterCount = Clusters_.size() + UnstableOpcodes.size();
191 Clusters_.reserve(NewTotalClusterCount);
192 for (const size_t UnstableOpcode : UnstableOpcodes.getArrayRef()) {
193 const llvm::SmallSet &ClusterIDs =
194 OpcodeToClusterIDs[UnstableOpcode];
195 assert(ClusterIDs.size() > 1 &&
196 "Should only have Opcodes with more than one cluster.");
197
198 // Create a new unstable cluster, one per Opcode.
199 Clusters_.emplace_back(ClusterId::makeValidUnstable(Clusters_.size()));
200 Cluster &UnstableCluster = Clusters_.back();
201 // We will find *at least* one point in each of these clusters.
202 UnstableCluster.PointIndices.reserve(ClusterIDs.size());
203
204 // Go through every cluster which we recorded as containing benchmarks
205 // of this UnstableOpcode. NOTE: we only recorded valid clusters.
206 for (const ClusterId &CID : ClusterIDs) {
207 assert(CID.isValid() &&
208 "We only recorded valid clusters, not noise/error clusters.");
209 Cluster &OldCluster = Clusters_[CID.getId()]; // Valid clusters storage.
210 // Within each cluster, go through each point, and either move it to the
211 // new unstable cluster, or 'keep' it.
212 // In this case, we'll reshuffle OldCluster.PointIndices vector
213 // so that all the points that are *not* for UnstableOpcode are first,
214 // and the rest of the points are for the UnstableOpcode.
215 const auto it = std::stable_partition(
216 OldCluster.PointIndices.begin(), OldCluster.PointIndices.end(),
217 [this, UnstableOpcode](size_t P) {
218 return Points_[P].keyInstruction().getOpcode() != UnstableOpcode;
219 });
220 assert(std::distance(it, OldCluster.PointIndices.end()) > 0 &&
221 "Should have found at least one bad point");
222 // Mark to-be-moved points as belonging to the new cluster.
223 std::for_each(it, OldCluster.PointIndices.end(),
224 [this, &UnstableCluster](size_t P) {
225 ClusterIdForPoint_[P] = UnstableCluster.Id;
226 });
227 // Actually append to-be-moved points to the new cluster.
228 UnstableCluster.PointIndices.insert(UnstableCluster.PointIndices.cend(),
229 it, OldCluster.PointIndices.end());
230 // And finally, remove "to-be-moved" points from the old cluster.
231 OldCluster.PointIndices.erase(it, OldCluster.PointIndices.cend());
232 // Now, the old cluster may end up being empty, but let's just keep it
233 // in whatever state it ended up. Purging empty clusters isn't worth it.
234 };
235 assert(UnstableCluster.PointIndices.size() > 1 &&
236 "New unstable cluster should end up with more than one point.");
237 assert(UnstableCluster.PointIndices.size() >= ClusterIDs.size() &&
238 "New unstable cluster should end up with no fewer points than there "
239 "were clusters");
240 }
241 assert(Clusters_.size() == NewTotalClusterCount && "sanity check");
242 }
243
149244 llvm::Expected<InstructionBenchmarkClustering>
150245 InstructionBenchmarkClustering::create(
151246 const std::vector<InstructionBenchmark> &Points, const size_t MinPts,
152 const double Epsilon) {
247 const double Epsilon, llvm::Optional<unsigned> NumOpcodes) {
153248 InstructionBenchmarkClustering Clustering(Points, Epsilon * Epsilon);
154249 if (auto Error = Clustering.validateAndSetup()) {
155250 return std::move(Error);
159254 }
160255
161256 Clustering.dbScan(MinPts);
257
258 if (NumOpcodes.hasValue())
259 Clustering.stabilize(NumOpcodes.getValue());
260
162261 return Clustering;
163262 }
164263
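The move step inside `stabilize()` relies on the partition, append, erase idiom. A minimal standalone sketch of that idiom follows; `movePointsOfOpcode` and the `OpcodeOfPoint` lookup table are hypothetical helpers for illustration, assuming points are identified by index as in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Moves every point index in Old whose opcode equals Target to the end of New,
// preserving relative order. OpcodeOfPoint[i] is the opcode of point i.
void movePointsOfOpcode(std::vector<std::size_t> &Old,
                        std::vector<std::size_t> &New,
                        const std::vector<unsigned> &OpcodeOfPoint,
                        unsigned Target) {
  // Points that stay come first; points to be moved end up in [It, end).
  const auto It = std::stable_partition(
      Old.begin(), Old.end(),
      [&](std::size_t P) { return OpcodeOfPoint[P] != Target; });
  New.insert(New.end(), It, Old.end());
  Old.erase(It, Old.end());
}
```

Using `std::stable_partition` rather than `std::partition` keeps the order of the remaining and the moved points unchanged, which is one plausible reason the resulting reports stay identical between runs.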
1414 #define LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H
1515
1616 #include "BenchmarkResult.h"
17 #include "llvm/ADT/Optional.h"
1718 #include "llvm/Support/Error.h"
19 #include <limits>
1820 #include <vector>
1921
2022 namespace llvm {
2628 // for more explanations on the algorithm.
2729 static llvm::Expected<InstructionBenchmarkClustering>
2830 create(const std::vector<InstructionBenchmark> &Points, size_t MinPts,
29 double Epsilon);
31 double Epsilon, llvm::Optional<unsigned> NumOpcodes = llvm::None);
3032
3133 class ClusterId {
3234 public:
3335 static ClusterId noise() { return ClusterId(kNoise); }
3436 static ClusterId error() { return ClusterId(kError); }
3537 static ClusterId makeValid(size_t Id) { return ClusterId(Id); }
36 ClusterId() : Id_(kUndef) {}
38 static ClusterId makeValidUnstable(size_t Id) {
39 return ClusterId(Id, /*IsUnstable=*/true);
40 }
41
42 ClusterId() : Id_(kUndef), IsUnstable_(false) {}
43
44 // Compare id's, ignoring the 'unstability' bit.
3745 bool operator==(const ClusterId &O) const { return Id_ == O.Id_; }
3846 bool operator<(const ClusterId &O) const { return Id_ < O.Id_; }
3947
4048 bool isValid() const { return Id_ <= kMaxValid; }
41 bool isUndef() const { return Id_ == kUndef; }
49 bool isUnstable() const { return IsUnstable_; }
4250 bool isNoise() const { return Id_ == kNoise; }
4351 bool isError() const { return Id_ == kError; }
52 bool isUndef() const { return Id_ == kUndef; }
4453
4554 // Precondition: isValid().
4655 size_t getId() const {
4958 }
5059
5160 private:
52 explicit ClusterId(size_t Id) : Id_(Id) {}
61 ClusterId(size_t Id, bool IsUnstable = false)
62 : Id_(Id), IsUnstable_(IsUnstable) {}
63
5364 static constexpr const size_t kMaxValid =
54 std::numeric_limits<size_t>::max() - 4;
65 (std::numeric_limits<size_t>::max() >> 1) - 4;
5566 static constexpr const size_t kNoise = kMaxValid + 1;
5667 static constexpr const size_t kError = kMaxValid + 2;
5768 static constexpr const size_t kUndef = kMaxValid + 3;
58 size_t Id_;
69
70 size_t Id_ : (std::numeric_limits<size_t>::digits - 1);
71 size_t IsUnstable_ : 1;
5972 };
73 static_assert(sizeof(ClusterId) == sizeof(size_t), "should be a bit field.");
6074
6175 struct Cluster {
6276 Cluster() = delete;
100114 private:
101115 InstructionBenchmarkClustering(
102116 const std::vector<InstructionBenchmark> &Points, double EpsilonSquared);
117
103118 llvm::Error validateAndSetup();
104119 void dbScan(size_t MinPts);
120 void stabilize(unsigned NumOpcodes);
105121 void rangeQuery(size_t Q, std::vector &Scratchpad) const;
106122
107123 const std::vector<InstructionBenchmark> &Points_;
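The `ClusterId` change above packs the id and the instability flag into a single `size_t`-sized object by narrowing the id field by one bit. A standalone sketch of the same layout trick follows; `TaggedId` is a hypothetical class written for this example, and the `static_assert` holds on common ABIs where adjacent bit fields of the same type share one storage unit (bit-field layout is implementation-defined).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Illustrative tagged id: all but one bit of a size_t hold the id, the
// remaining bit holds a flag. This is not the llvm-exegesis class.
class TaggedId {
public:
  TaggedId(std::size_t Id, bool Flag) : Id_(Id), Flag_(Flag) {}
  std::size_t id() const { return Id_; }
  bool flag() const { return Flag_; }
  // Largest id that still fits in the narrowed field.
  static constexpr std::size_t maxId() {
    return std::numeric_limits<std::size_t>::max() >> 1;
  }

private:
  std::size_t Id_ : (std::numeric_limits<std::size_t>::digits - 1);
  std::size_t Flag_ : 1;
};
static_assert(sizeof(TaggedId) == sizeof(std::size_t),
              "the flag should not grow the type");
```

Because the id field loses one bit, the largest representable id is halved, which is why the patch shifts `kMaxValid` right by one before subtracting the reserved sentinel values.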
9595 AnalysisInconsistenciesOutputFile("analysis-inconsistencies-output-file",
9696 cl::desc(""), cl::init(""));
9797
98 static cl::opt<bool> AnalysisDisplayUnstableOpcodes(
99 "analysis-display-unstable-clusters",
100 cl::desc("if there is more than one benchmark for an opcode, said "
101 "benchmarks may end up not being clustered into the same cluster "
102 "if the measured performance characteristics are different. by "
103 "default all such opcodes are filtered out. this flag will "
104 "instead show only such unstable opcodes"),
105 cl::init(false));
106
98107 static cl::opt<std::string>
99108 CpuName("mcpu",
100109 cl::desc(
101110 "cpu name to use for pfm counters, leave empty to autodetect"),
102111 cl::init(""));
103
104112
105113 static ExitOnError ExitOnErr;
106114
431439 llvm::errs() << "unknown target '" << Points[0].LLVMTriple << "'\n";
432440 return;
433441 }
442
443 std::unique_ptr<llvm::MCInstrInfo> InstrInfo(TheTarget->createMCInstrInfo());
444
434445 const auto Clustering = ExitOnErr(InstructionBenchmarkClustering::create(
435 Points, AnalysisNumPoints, AnalysisEpsilon));
436
437 const Analysis Analyzer(*TheTarget, Clustering);
446 Points, AnalysisNumPoints, AnalysisEpsilon, InstrInfo->getNumOpcodes()));
447
448 const Analysis Analyzer(*TheTarget, std::move(InstrInfo), Clustering,
449 AnalysisDisplayUnstableOpcodes);
438450
439451 maybeRunAnalysis(Analyzer, "analysis clusters",
440452 AnalysisClustersOutputFile);