llvm.org GIT mirror llvm / 5e1169b
[llvm-exegesis] Split Epsilon param into two (PR40787)

Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values

What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters)

By splitting it into two params it is now possible.

This is nearly-free performance-wise:

Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

            390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                 0      cpu-migrations            #    0.000 K/sec
              4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
        1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
         185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
         392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
        1839236666      instructions              #    1.18  insn per cycle
                                                  #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
         407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
          10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)

          0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
               262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                 0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
             13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
       27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
        1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
       16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
       17611143370      instructions              #    0.65  insn per cycle
                                                  #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
        3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
         116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)

            6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
```

New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                 0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
              4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
        1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
         199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
         402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
        1847783963      instructions              #    1.15  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
         407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
          10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)

           0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )

lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):

           6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
               217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                 1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
             13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
       27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
        1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
       16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
       17755545931      instructions              #    0.64  insn per cycle
                                                  #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
        3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
         117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)

            6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
```

I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say.

Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58476

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354767 91177308-0d34-0410-b5e6-96231b3b80d8

Roman Lebedev 1 year, 9 months ago
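The two roles of epsilon described in the summary can be illustrated with a small standalone sketch. This is not the llvm-exegesis code: `Point`, `withinEpsilon`, `sameNeighbourhood` and `matchesSchedModel` are hypothetical names invented for illustration. Only the idea comes from the change itself: one distance check, driven by a clustering epsilon in one place and by a separate inconsistency epsilon in the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a benchmark point: one value per measurement.
using Point = std::vector<double>;

// Returns true if P and Q are at most Epsilon apart (Euclidean distance).
// Squared values are compared, so no sqrt is needed.
bool withinEpsilon(const Point &P, const Point &Q, double Epsilon) {
  assert(P.size() == Q.size() && "points must have the same dimension");
  double DistanceSquared = 0.0;
  for (std::size_t I = 0, E = P.size(); I < E; ++I) {
    const double Diff = P[I] - Q[I];
    DistanceSquared += Diff * Diff;
  }
  return DistanceSquared <= Epsilon * Epsilon;
}

// Use 1: are two measurements direct neighbours while clustering?
bool sameNeighbourhood(const Point &A, const Point &B,
                       double ClusteringEpsilon) {
  return withinEpsilon(A, B, ClusteringEpsilon);
}

// Use 2: does a cluster centre agree with the scheduling-model value?
bool matchesSchedModel(const Point &ClusterCenter,
                       const Point &SchedClassPoint,
                       double InconsistencyEpsilon) {
  return withinEpsilon(ClusterCenter, SchedClassPoint, InconsistencyEpsilon);
}
```

With made-up values, a cluster centred at 1.25 against a model value of 1.0 is flagged under an inconsistency epsilon of 0.1 but accepted under 0.5, while a clustering epsilon of 0.1 still keeps measurements at 1.0 and 1.25 apart. That is the "only look at highly different clusters without changing the clustering" case from the summary.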
8 changed file(s) with 117 addition(s) and 25 deletion(s).
218218 Specify the numPoints parameters to be used for DBSCAN clustering
219219 (`analysis` mode).
220220
221 .. option:: -analysis-epsilon=
222
223 Specify the numPoints parameters to be used for DBSCAN clustering
221 .. option:: -analysis-clustering-epsilon=
222
223 Specify the epsilon parameter used for clustering of benchmark points
224224 (`analysis` mode).
225
226 .. option:: -analysis-inconsistency-epsilon=
227
228 Specify the epsilon parameter used for detection of when the cluster
229 is different from the LLVM schedule profile values (`analysis` mode).
225230
226231 .. option:: -analysis-display-unstable-clusters
227232
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-STABLE %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-UNSTABLE %s
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=0.1 -analysis-inconsistency-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.5 -analysis-inconsistency-epsilon=0.5 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-STABLE %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=0.5 -analysis-inconsistency-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-UNSTABLE %s
33
44 # We have one ADD32rr measurement, and two measurements for SQRTSSr.
55 # The ADD32rr measurement and one of the SQRTSSr measurements are identical,
0 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=9 -analysis-inconsistency-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-TWO %s
1 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=9 -analysis-inconsistency-epsilon=100 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-TWO %s
2 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=10 -analysis-inconsistency-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-ONE %s
3 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-clustering-epsilon=10 -analysis-inconsistency-epsilon=100 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS-ALL,CHECK-CLUSTERS-ONE %s
4
5 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=9 -analysis-inconsistency-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-FAIL %s
6 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=10 -analysis-inconsistency-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-FAIL %s
7 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=9 -analysis-inconsistency-epsilon=100 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-PASS %s
8 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clustering-epsilon=10 -analysis-inconsistency-epsilon=100 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-PASS %s
9
10 # CHECK-CLUSTERS-ALL: {{^}}cluster_id,opcode_name,config,sched_class,latency{{$}}
11
12 # CHECK-CLUSTERS-TWO: {{^}}0,
13 # CHECK-CLUSTERS-TWO-SAME: ,90.00{{$}}
14 # CHECK-CLUSTERS-TWO: {{^}}1,
15 # CHECK-CLUSTERS-TWO-SAME: ,100.00{{$}}
16
17 # CHECK-CLUSTERS-ONE: {{^}}0,
18 # CHECK-CLUSTERS-ONE-SAME: ,90.00{{$}}
19 # CHECK-CLUSTERS-ONE-NEXT: {{^}}0,
20 # CHECK-CLUSTERS-ONE-SAME: ,100.00{{$}}
21
22 # CHECK-INCONSISTENCIES-FAIL: contains instructions whose performance characteristics do not match that of LLVM
23 # CHECK-INCONSISTENCIES-FAIL: contains instructions whose performance characteristics do not match that of LLVM
24 # CHECK-INCONSISTENCIES-FAIL-NOT: contains instructions whose performance characteristics do not match that of LLVM
25
26 # CHECK-INCONSISTENCIES-PASS-NOT: contains instructions whose performance characteristics do not match that of LLVM
27
28 ---
29 mode: latency
30 key:
31 instructions:
32 - 'ADD32rr EDX EDX EAX'
33 config: ''
34 register_initial_values:
35 - 'EDX=0x0'
36 - 'EAX=0x0'
37 cpu_name: bdver2
38 llvm_triple: x86_64-unknown-linux-gnu
39 num_repetitions: 10000
40 measurements:
41 - { key: latency, value: 90, per_snippet_value: 90 }
42 error: ''
43 info: Repeating a single implicitly serial instruction
44 assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
45 ...
46 ---
47 mode: latency
48 key:
49 instructions:
50 - 'SQRTSSr XMM11 XMM11'
51 config: ''
52 register_initial_values:
53 - 'XMM11=0x0'
54 cpu_name: bdver2
55 llvm_triple: x86_64-unknown-linux-gnu
56 num_repetitions: 10000
57 measurements:
58 - { key: latency, value: 100, per_snippet_value: 100 }
59 error: ''
60 info: Repeating a single explicitly serial instruction
61 assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
62 ...
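The expectations in the test above (two clusters at `-analysis-clustering-epsilon=9`, one at `10`, for latencies 90 and 100) can be reproduced with a simplified sketch. It assumes `-analysis-numpoints=1`, in which case every point is a core point and DBSCAN clusters reduce to connected components of the "within epsilon" relation. `countClusters` is a hypothetical helper, not part of llvm-exegesis, and it ignores the noise and error clusters and the unstable-opcode handling the real tool has. The inclusive `<=` comparison is inferred from the test expecting a single cluster when the distance equals epsilon.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts clusters of 1-D latency values, assuming MinPts == 1 so that
// clusters are the connected components of the neighbour relation.
std::size_t countClusters(const std::vector<double> &Values, double Epsilon) {
  assert(Epsilon >= 0.0 && "epsilon must be non-negative");
  const double EpsilonSquared = Epsilon * Epsilon;
  std::vector<bool> Visited(Values.size(), false);
  std::size_t NumClusters = 0;
  for (std::size_t Seed = 0; Seed < Values.size(); ++Seed) {
    if (Visited[Seed])
      continue;
    ++NumClusters;
    // Flood-fill everything reachable from Seed through neighbour links.
    std::vector<std::size_t> Worklist{Seed};
    Visited[Seed] = true;
    while (!Worklist.empty()) {
      const std::size_t P = Worklist.back();
      Worklist.pop_back();
      for (std::size_t Q = 0; Q < Values.size(); ++Q) {
        const double Diff = Values[P] - Values[Q];
        if (!Visited[Q] && Diff * Diff <= EpsilonSquared) {
          Visited[Q] = true;
          Worklist.push_back(Q);
        }
      }
    }
  }
  return NumClusters;
}
```

Calling `countClusters({90.0, 100.0}, 9.0)` and `countClusters({90.0, 100.0}, 10.0)` mirrors the `CHECK-CLUSTERS-TWO` and `CHECK-CLUSTERS-ONE` runs respectively. The inconsistency epsilon plays no part here, which matches the test using the same cluster checks for both `0.1` and `100`.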
";
169169 Analysis::Analysis(const llvm::Target &Target,
170170 std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
171171 const InstructionBenchmarkClustering &Clustering,
172 double AnalysisInconsistencyEpsilon,
172173 bool AnalysisDisplayUnstableOpcodes)
173174 : Clustering_(Clustering), InstrInfo_(std::move(InstrInfo)),
175 AnalysisInconsistencyEpsilonSquared_(AnalysisInconsistencyEpsilon *
176 AnalysisInconsistencyEpsilon),
174177 AnalysisDisplayUnstableOpcodes_(AnalysisDisplayUnstableOpcodes) {
175178 if (Clustering.getPoints().empty())
176179 return;
300303 OS << "
301304 for (const SchedClassCluster &Cluster : Clusters) {
302305 OS << "
303 << (Cluster.measurementsMatch(*SubtargetInfo_, RSC, Clustering_)
306 << (Cluster.measurementsMatch(*SubtargetInfo_, RSC, Clustering_,
307 AnalysisInconsistencyEpsilonSquared_)
304308 ? "good-cluster"
305309 : "bad-cluster")
306310 << "\">";
460464
461465 bool Analysis::SchedClassCluster::measurementsMatch(
462466 const llvm::MCSubtargetInfo &STI, const ResolvedSchedClass &RSC,
463 const InstructionBenchmarkClustering &Clustering) const {
467 const InstructionBenchmarkClustering &Clustering,
468 const double AnalysisInconsistencyEpsilonSquared_) const {
464469 const size_t NumMeasurements = Representative.size();
465470 std::vector ClusterCenterPoint(NumMeasurements);
466471 std::vector SchedClassPoint(NumMeasurements);
519524 llvm_unreachable("unimplemented measurement matching mode");
520525 return false;
521526 }
522 return Clustering.isNeighbour(ClusterCenterPoint, SchedClassPoint);
527 return Clustering.isNeighbour(ClusterCenterPoint, SchedClassPoint,
528 AnalysisInconsistencyEpsilonSquared_);
523529 }
524530
525531 void Analysis::printSchedClassDescHtml(const ResolvedSchedClass &RSC,
688694 if (llvm::all_of(SchedClassClusters,
689695 [this, &RSCAndPoints](const SchedClassCluster &C) {
690696 return C.measurementsMatch(
691 *SubtargetInfo_, RSCAndPoints.RSC, Clustering_);
697 *SubtargetInfo_, RSCAndPoints.RSC, Clustering_,
698 AnalysisInconsistencyEpsilonSquared_);
692699 }))
693700 continue; // Nothing weird.
694701
3737 Analysis(const llvm::Target &Target,
3838 std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
3939 const InstructionBenchmarkClustering &Clustering,
40 double AnalysisInconsistencyEpsilon,
4041 bool AnalysisDisplayUnstableOpcodes);
4142
4243 // Prints a csv of instructions for each cluster.
8081 bool
8182 measurementsMatch(const llvm::MCSubtargetInfo &STI,
8283 const ResolvedSchedClass &SC,
83 const InstructionBenchmarkClustering &Clustering) const;
84 const InstructionBenchmarkClustering &Clustering,
85 const double AnalysisInconsistencyEpsilonSquared_) const;
8486
8587 void addPoint(size_t PointId,
8688 const InstructionBenchmarkClustering &Clustering);
126128 std::unique_ptr AsmInfo_;
127129 std::unique_ptr InstPrinter_;
128130 std::unique_ptr Disasm_;
131 const double AnalysisInconsistencyEpsilonSquared_;
129132 const bool AnalysisDisplayUnstableOpcodes_;
130133 };
131134
4545 const auto &PMeasurements = Points_[P].Measurements;
4646 if (PMeasurements.empty()) // Error point.
4747 continue;
48 if (isNeighbour(PMeasurements, QMeasurements)) {
48 if (isNeighbour(PMeasurements, QMeasurements,
49 AnalysisClusteringEpsilonSquared_)) {
4950 Neighbors.push_back(P);
5051 }
5152 }
5354
5455 InstructionBenchmarkClustering::InstructionBenchmarkClustering(
5556 const std::vector &Points,
56 const double EpsilonSquared)
57 : Points_(Points), EpsilonSquared_(EpsilonSquared),
57 const double AnalysisClusteringEpsilonSquared)
58 : Points_(Points),
59 AnalysisClusteringEpsilonSquared_(AnalysisClusteringEpsilonSquared),
5860 NoiseCluster_(ClusterId::noise()), ErrorCluster_(ClusterId::error()) {}
5961
6062 llvm::Error InstructionBenchmarkClustering::validateAndSetup() {
244246 llvm::Expected<InstructionBenchmarkClustering>
245247 InstructionBenchmarkClustering::create(
246248 const std::vector &Points, const size_t MinPts,
247 const double Epsilon, llvm::Optional NumOpcodes) {
248 InstructionBenchmarkClustering Clustering(Points, Epsilon * Epsilon);
249 const double AnalysisClusteringEpsilon,
250 llvm::Optional NumOpcodes) {
251 InstructionBenchmarkClustering Clustering(
252 Points, AnalysisClusteringEpsilon * AnalysisClusteringEpsilon);
249253 if (auto Error = Clustering.validateAndSetup()) {
250254 return std::move(Error);
251255 }
2828 // for more explanations on the algorithm.
2929 static llvm::Expected<InstructionBenchmarkClustering>
3030 create(const std::vector &Points, size_t MinPts,
31 double Epsilon, llvm::Optional NumOpcodes = llvm::None);
31 double AnalysisClusteringEpsilon,
32 llvm::Optional NumOpcodes = llvm::None);
3233
3334 class ClusterId {
3435 public:
102103
103104 // Returns true if the given points are within a distance Epsilon of each other.
104105 bool isNeighbour(const std::vector &P,
105 const std::vector &Q) const {
106 const std::vector &Q,
107 const double EpsilonSquared_) const {
106108 double DistanceSquared = 0.0;
107109 for (size_t I = 0, E = P.size(); I < E; ++I) {
108110 const auto Diff = P[I].PerInstructionValue - Q[I].PerInstructionValue;
113115
114116 private:
115117 InstructionBenchmarkClustering(
116 const std::vector &Points, double EpsilonSquared);
118 const std::vector &Points,
119 double AnalysisClusteringEpsilonSquared);
117120
118121 llvm::Error validateAndSetup();
119122 void dbScan(size_t MinPts);
121124 void rangeQuery(size_t Q, std::vector &Scratchpad) const;
122125
123126 const std::vector &Points_;
124 const double EpsilonSquared_;
127 const double AnalysisClusteringEpsilonSquared_;
125128 int NumDimensions_ = 0;
126129 // ClusterForPoint_[P] is the cluster id for Points[P].
127130 std::vector ClusterIdForPoint_;
8383 "analysis-numpoints",
8484 cl::desc("minimum number of points in an analysis cluster"), cl::init(3));
8585
86 static cl::opt
87 AnalysisEpsilon("analysis-epsilon",
88 cl::desc("dbscan epsilon for analysis clustering"),
89 cl::init(0.1));
86 static cl::opt AnalysisClusteringEpsilon(
87 "analysis-clustering-epsilon",
88 cl::desc("dbscan epsilon for benchmark point clustering"), cl::init(0.1));
89
90 static cl::opt AnalysisInconsistencyEpsilon(
91 "analysis-inconsistency-epsilon",
92 cl::desc("epsilon for detection of when the cluster is different from the "
93 "LLVM schedule profile values"),
94 cl::init(0.1));
9095
9196 static cl::opt
9297 AnalysisClustersOutputFile("analysis-clusters-output-file", cl::desc(""),
443448 std::unique_ptr<llvm::MCInstrInfo> InstrInfo(TheTarget->createMCInstrInfo());
444449
445450 const auto Clustering = ExitOnErr(InstructionBenchmarkClustering::create(
446 Points, AnalysisNumPoints, AnalysisEpsilon, InstrInfo->getNumOpcodes()));
451 Points, AnalysisNumPoints, AnalysisClusteringEpsilon,
452 InstrInfo->getNumOpcodes()));
447453
448454 const Analysis Analyzer(*TheTarget, std::move(InstrInfo), Clustering,
455 AnalysisInconsistencyEpsilon,
449456 AnalysisDisplayUnstableOpcodes);
450457
451458 maybeRunAnalysis(Analyzer, "analysis clusters",