llvm.org GIT mirror llvm / b01acda
[XRay][tools] Function call stack based analysis tooling for XRay traces This change introduces a subcommand to the llvm-xray tool called "stacks" which allows for analysing XRay traces provided as inputs and accounting time to stacks instead of just individual functions. This gives us a more precise view of where in a program the latency is actually attributed. The tool uses a trie data structure to keep track of the caller-callee relationships as we process the XRay traces. In particular, we keep track of the function call stack as we enter functions. While we're doing this we're adding nodes in a trie and indicating a "calls" relatinship between the caller (current top of the stack) and the callee (the new top of the stack). When we push function ids onto the stack, we keep track of the timestamp (TSC) for the enter event. When exiting functions, we are able to account the duration by getting the difference between the timestamp of the exit event and the corresponding entry event in the stack. This works even if we somehow miss the exit events for intermediary functions (i.e. if the exit event is not cleanly associated with the enter event at the top of the stack). The output of the tool currently provides just the top N leaf functions that contribute the most latency, and the top N stacks that have the most frequency. In the future we can provide more sophisticated query mechanisms and potentially an export to database feature to make offline analysis of the stack traces possible with existing tools. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312426 91177308-0d34-0410-b5e6-96231b3b80d8 Keith Wyss 2 years ago
7 changed file(s) with 773 addition(s) and 1 deletion(s). Raw diff Collapse all Expand all
0 #RUN: (llvm-xray stack %s 2>&1 || echo "Checking Command Failed") | FileCheck %s
1 ---
2 header:
3 version: 1
4 type: 0
5 constant-tsc: true
6 nonstop-tsc: true
7 cycle-frequency: 2601000000
8 records:
9 ...
10 # CHECK: llvm-xray: No instrumented calls were accounted in the input file.
11 # CHECK: Checking Command Failed
12 # CHECK-NOT: {{[0-9A-Z]+}}
0 #RUN: (llvm-xray stack %s 2>&1 1>&- || echo "Check Command Failed") | FileCheck --check-prefix HALT %s
1 #RUN: (llvm-xray stack -k %s 2>&1 && echo "Check Command Succeeded") | FileCheck --check-prefix KEEP-GOING-SUCCEEDS %s
2 #RUN: llvm-xray stack -k %s | FileCheck --check-prefix KEEP-GOING %s
3 ---
4 header:
5 version: 1
6 type: 0
7 constant-tsc: true
8 nonstop-tsc: true
9 cycle-frequency: 2601000000
10 records:
11 - { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10001 }
12 - { type: 1, func-id: 4, cpu: 1, thread: 111, kind: function-exit, tsc: 10301 }
13 - { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10401 }
14 - { type: 0, func-id: 2, cpu: 1, thread: 111, kind: function-enter, tsc: 10501 }
15 - { type: 0, func-id: 3, cpu: 1, thread: 111, kind: function-enter, tsc: 10601 }
16 - { type: 1, func-id: 3, cpu: 1, thread: 111, kind: function-exit, tsc: 10701 }
17 - { type: 1, func-id: 2, cpu: 1, thread: 111, kind: function-exit, tsc: 10751 }
18 - { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10775 }
19 ...
20
21 #HALT: llvm-xray: Found record {FuncId: "#4", ThreadId: "111", RecordType: "Fn Exit"} with no matching function entry
22 #HALT: Check Command Failed
23 #KEEP-GOING-SUCCEEDS: Found record {FuncId: "#4", ThreadId: "111", RecordType: "Fn Exit"} with no matching function entry
24 #KEEP-GOING-SUCCEEDS: Check Command Succeeded
25 #KEEP-GOING: Unique Stacks: 2
26 # Note the interesting case here that the stack { fn-1 } is a prefix of { fn-1, fn-2, fn-3 } but they
27 # are still counted as unique stacks.
0 #RUN: llvm-xray stack -per-thread-stacks %s | FileCheck %s --check-prefix PER-THREAD
1 #RUN: llvm-xray stack -aggregate-threads %s | FileCheck %s --check-prefix AGGREGATE
2
3 ---
4 header:
5 version: 1
6 type: 0
7 constant-tsc: true
8 nonstop-tsc: true
9 cycle-frequency: 2601000000
10 records:
11 - { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10001 }
12 - { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10100 }
13 - { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10101 }
14 - { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10301 }
15 - { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10401 }
16 - { type: 0, func-id: 2, cpu: 1, thread: 111, kind: function-enter, tsc: 10501 }
17 - { type: 0, func-id: 3, cpu: 1, thread: 111, kind: function-enter, tsc: 10601 }
18 - { type: 1, func-id: 3, cpu: 1, thread: 111, kind: function-exit, tsc: 10701 }
19 - { type: 1, func-id: 2, cpu: 1, thread: 111, kind: function-exit, tsc: 10751 }
20 - { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10775 }
21 - { type: 0, func-id: 1, cpu: 1, thread: 123, kind: function-enter, tsc: 10401 }
22 - { type: 0, func-id: 2, cpu: 1, thread: 123, kind: function-enter, tsc: 10501 }
23 - { type: 0, func-id: 3, cpu: 1, thread: 123, kind: function-enter, tsc: 10701 }
24 - { type: 1, func-id: 3, cpu: 1, thread: 123, kind: function-exit, tsc: 10801 }
25 - { type: 1, func-id: 2, cpu: 1, thread: 123, kind: function-exit, tsc: 10951 }
26 - { type: 1, func-id: 1, cpu: 1, thread: 123, kind: function-exit, tsc: 11075 }
27 - { type: 0, func-id: 2, cpu: 1, thread: 200, kind: function-enter, tsc: 0 }
28 - { type: 0, func-id: 3, cpu: 1, thread: 200, kind: function-enter, tsc: 10 }
29 - { type: 1, func-id: 3, cpu: 1, thread: 200, kind: function-exit, tsc: 20 }
30 - { type: 1, func-id: 2, cpu: 1, thread: 200, kind: function-exit, tsc: 30 }
31 ...
32 # PER-THREAD: Thread 123
33 # PER-THREAD: Unique Stacks: 1
34 # PER-THREAD: Top 10 Stacks by leaf sum:
35 # PER-THREAD: Sum: 100
36 # PER-THREAD: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
37 # PER-THREAD: #0 #1{{[[:space:]]+}}1{{[[:space:]]+}}674
38 # PER-THREAD: #1 #2{{[[:space:]]+}}1{{[[:space:]]+}}450
39 # PER-THREAD: #2 #3{{[[:space:]]+}}1{{[[:space:]]+}}100
40 # PER-THREAD: Top 10 Stacks by leaf count:
41 # PER-THREAD: #0 #1{{[[:space:]]+}}1{{[[:space:]]+}}674
42 # PER-THREAD: #1 #2{{[[:space:]]+}}1{{[[:space:]]+}}450
43 # PER-THREAD: #2 #3{{[[:space:]]+}}1{{[[:space:]]+}}100
44 # PER-THREAD: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
45
46 # AGGREGATE: Unique Stacks: 3
47 # AGGREGATE: Top 10 Stacks by leaf sum:
48 # AGGREGATE: Sum: 200
49
50 # AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
51 # AGGREGATE: #0 #1{{[[:space:]]+}}3{{[[:space:]]+}}1348
52 # AGGREGATE: #1 #2{{[[:space:]]+}}2{{[[:space:]]+}}700
53 # AGGREGATE: #2 #3{{[[:space:]]+}}2{{[[:space:]]+}}200
54
55 # AGGREGATE: Sum: 10
56 # AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
57 # AGGREGATE: #0 #2{{[[:space:]]+}}1{{[[:space:]]+}}30
58 # AGGREGATE: #1 #3{{[[:space:]]+}}1{{[[:space:]]+}}10
59
60 # AGGREGATE: Sum: 1
61 # AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
62 # AGGREGATE: #0 #1{{[[:space:]]+}}2{{[[:space:]]+}}674
63 # AGGREGATE: #1 #1{{[[:space:]]+}}1{{[[:space:]]+}}1
64
65
66 # AGGREGATE: Top 10 Stacks by leaf count:
67
68 # AGGREGATE: Count: 2
69 # AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
70 # AGGREGATE: #0 #1{{[[:space:]]+}}3{{[[:space:]]+}}1348
71 # AGGREGATE: #1 #2{{[[:space:]]+}}2{{[[:space:]]+}}700
72 # AGGREGATE: #2 #3{{[[:space:]]+}}2{{[[:space:]]+}}200
73
74 # AGGREGATE: Count: 1
75 # AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
76 # AGGREGATE: #0 #2{{[[:space:]]+}}1{{[[:space:]]+}}30
77 # AGGREGATE: #1 #3{{[[:space:]]+}}1{{[[:space:]]+}}10
78
79 # AGGREGATE: Count: 1
80 # AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
81 # AGGREGATE: #0 #1{{[[:space:]]+}}2{{[[:space:]]+}}674
82 # AGGREGATE: #1 #1{{[[:space:]]+}}1{{[[:space:]]+}}1
0 #RUN: llvm-xray stack %s | FileCheck %s
1 ---
2 header:
3 version: 1
4 type: 0
5 constant-tsc: true
6 nonstop-tsc: true
7 cycle-frequency: 2601000000
8 records:
9 - { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10001 }
10 - { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10100 }
11 ...
12 #CHECK: Unique Stacks: 1
1414 xray-extract.cc
1515 xray-graph.cc
1616 xray-graph-diff.cc
17 xray-stacks.cc
1718 xray-registry.cc)
1819
1920 add_llvm_tool(llvm-xray llvm-xray.cc ${LLVM_XRAY_TOOLS})
9696 } // namespace yaml
9797 } // namespace llvm
9898
99 LLVM_YAML_IS_SEQUENCE_VECTOR(xray::YAMLXRayRecord)
99 LLVM_YAML_IS_SEQUENCE_VECTOR(xray::YAMLXRayRecord)
100100
101101 #endif // LLVM_TOOLS_LLVM_XRAY_XRAY_RECORD_YAML_H
0 //===- xray-stacks.cc - XRay Function Call Stack Accounting ---------------===//
1 //
2 // The LLVM Compiler Infrastructure
3 //
4 // This file is distributed under the University of Illinois Open Source
5 // License. See LICENSE.TXT for details.
6 //
7 //===----------------------------------------------------------------------===//
8 //
9 // This file implements stack-based accounting. It takes XRay traces, and
10 // collates statistics across these traces to show a breakdown of time spent
11 // at various points of the stack to provide insight into which functions
12 // spend the most time in terms of a call stack. We provide a few
13 // sorting/filtering options for zero'ing in on the useful stacks.
14 //
15 //===----------------------------------------------------------------------===//
16
17 #include
18 #include
19
20 #include "func-id-helper.h"
21 #include "xray-registry.h"
22 #include "llvm/ADT/StringExtras.h"
23 #include "llvm/Support/CommandLine.h"
24 #include "llvm/Support/Errc.h"
25 #include "llvm/Support/ErrorHandling.h"
26 #include "llvm/Support/FormatAdapters.h"
27 #include "llvm/Support/FormatVariadic.h"
28 #include "llvm/XRay/Graph.h"
29 #include "llvm/XRay/InstrumentationMap.h"
30 #include "llvm/XRay/Trace.h"
31
32 using namespace llvm;
33 using namespace llvm::xray;
34
35 static cl::SubCommand Stack("stack", "Call stack accounting");
36 static cl::list StackInputs(cl::Positional,
37 cl::desc(""), cl::Required,
38 cl::sub(Stack), cl::OneOrMore);
39
40 static cl::opt
41 StackKeepGoing("keep-going", cl::desc("Keep going on errors encountered"),
42 cl::sub(Stack), cl::init(false));
43 static cl::alias StackKeepGoing2("k", cl::aliasopt(StackKeepGoing),
44 cl::desc("Alias for -keep-going"),
45 cl::sub(Stack));
46
47 // TODO: Does there need to be an option to deduce tail or sibling calls?
48
49 static cl::opt StacksInstrMap(
50 "instr_map",
51 cl::desc("instrumentation map used to identify function ids. "
52 "Currently supports elf file instrumentation maps."),
53 cl::sub(Stack), cl::init(""));
54 static cl::alias StacksInstrMap2("m", cl::aliasopt(StacksInstrMap),
55 cl::desc("Alias for -instr_map"),
56 cl::sub(Stack));
57
58 static cl::opt
59 SeparateThreadStacks("per-thread-stacks",
60 cl::desc("Report top stacks within each thread id"),
61 cl::sub(Stack), cl::init(false));
62
63 static cl::opt
64 AggregateThreads("aggregate-threads",
65 cl::desc("Aggregate stack times across threads"),
66 cl::sub(Stack), cl::init(false));
67
68 /// A helper struct to work with formatv and XRayRecords. Makes it easier to use
69 /// instrumentation map names or addresses in formatted output.
70 struct format_xray_record : public FormatAdapter {
71 explicit format_xray_record(XRayRecord record,
72 const FuncIdConversionHelper &conv)
73 : FormatAdapter(std::move(record)), Converter(&conv) {}
74 void format(raw_ostream &Stream, StringRef Style) override {
75 Stream << formatv(
76 "{FuncId: \"{0}\", ThreadId: \"{1}\", RecordType: \"{2}\"}",
77 Converter->SymbolOrNumber(Item.FuncId), Item.TId,
78 DecodeRecordType(Item.RecordType));
79 }
80
81 private:
82 Twine DecodeRecordType(uint16_t recordType) {
83 switch (recordType) {
84 case 0:
85 return Twine("Fn Entry");
86 case 1:
87 return Twine("Fn Exit");
88 default:
89 // TODO: Add Tail exit when it is added to llvm/XRay/XRayRecord.h
90 return Twine("Unknown");
91 }
92 }
93
94 const FuncIdConversionHelper *Converter;
95 };
96
97 /// The stack command will take a set of XRay traces as arguments, and collects
98 /// information about the stacks of instrumented functions that appear in the
99 /// traces. We track the following pieces of information:
100 ///
101 /// - Total time: amount of time/cycles accounted for in the traces.
102 /// - Stack count: number of times a specific stack appears in the
103 /// traces. Only instrumented functions show up in stacks.
104 /// - Cumulative stack time: amount of time spent in a stack accumulated
105 /// across the invocations in the traces.
106 /// - Cumulative local time: amount of time spent in each instrumented
107 /// function showing up in a specific stack, accumulated across the traces.
108 ///
109 /// Example output for the kind of data we'd like to provide looks like the
110 /// following:
111 ///
112 /// Total time: 3.33234 s
113 /// Stack ID: ...
114 /// Stack Count: 2093
115 /// # Function Local Time (%) Stack Time (%)
116 /// 0 main 2.34 ms 0.07% 3.33234 s 100%
117 /// 1 foo() 3.30000 s 99.02% 3.33 s 99.92%
118 /// 2 bar() 30 ms 0.90% 30 ms 0.90%
119 ///
120 /// We can also show distributions of the function call durations with
121 /// statistics at each level of the stack. This works by doing the following
122 /// algorithm:
123 ///
124 /// 1. When unwinding, record the duration of each unwound function associated
125 /// with the path up to which the unwinding stops. For example:
126 ///
127 /// Step Duration (? means has start time)
128 ///
129 /// push a a = ?
130 /// push b a = ?, a->b = ?
131 /// push c a = ?, a->b = ?, a->b->c = ?
132 /// pop c a = ?, a->b = ?, emit duration(a->b->c)
133 /// pop b a = ?, emit duration(a->b)
134 /// push c a = ?, a->c = ?
135 /// pop c a = ?, emit duration(a->c)
136 /// pop a emit duration(a)
137 ///
138 /// 2. We then account for the various stacks we've collected, and for each of
139 /// them will have measurements that look like the following (continuing
140 /// with the above simple example):
141 ///
142 /// c : [b->c"), [durations]>, c"), [durations]>]
143 /// b : [b"), [durations]>]
144 /// a : []
145 ///
146 /// This allows us to compute, for each stack id, and each function that
147 /// shows up in the stack, some important statistics like:
148 ///
149 /// - median
150 /// - 99th percentile
151 /// - mean + stddev
152 /// - count
153 ///
154 /// 3. For cases where we don't have durations for some of the higher levels
155 /// of the stack (perhaps instrumentation wasn't activated when the stack was
156 /// entered), we can mark them appropriately.
157 ///
158 /// Computing this data also allows us to implement lookup by call stack nodes,
159 /// so that we can find functions that show up in multiple stack traces and
160 /// show the statistical properties of that function in various contexts. We
161 /// can compute information similar to the following:
162 ///
163 /// Function: 'c'
164 /// Stacks: 2 / 2
165 /// Stack ID: ...
166 /// Stack Count: ...
167 /// # Function ...
168 /// 0 a ...
169 /// 1 b ...
170 /// 2 c ...
171 ///
172 /// Stack ID: ...
173 /// Stack Count: ...
174 /// # Function ...
175 /// 0 a ...
176 /// 1 c ...
177 /// ----------------...
178 ///
179 /// Function: 'b'
180 /// Stacks: 1 / 2
181 /// Stack ID: ...
182 /// Stack Count: ...
183 /// # Function ...
184 /// 0 a ...
185 /// 1 b ...
186 /// 2 c ...
187 ///
188 ///
189 /// To do this we require a Trie data structure that will allow us to represent
190 /// all the call stacks of instrumented functions in an easily traversible
191 /// manner when we do the aggregations and lookups. For instrumented call
192 /// sequences like the following:
193 ///
194 /// a()
195 /// b()
196 /// c()
197 /// d()
198 /// c()
199 ///
200 /// We will have a representation like so:
201 ///
202 /// a -> b -> c
203 /// | |
204 /// | +--> d
205 /// |
206 /// +--> c
207 ///
208 /// We maintain a sequence of durations on the leaves and in the internal nodes
209 /// as we go through and process every record from the XRay trace. We also
210 /// maintain an index of unique functions, and provide a means of iterating
211 /// through all the instrumented call stacks which we know about.
212
213 struct TrieNode {
214 int32_t FuncId;
215 TrieNode *Parent;
216 SmallVector Callees;
217 // Separate durations depending on whether the node is the deepest node in the
218 // stack.
219 SmallVector TerminalDurations;
220 SmallVector IntermediateDurations;
221 };
222
223 /// Merges together two TrieNodes with like function ids, aggregating their
224 /// callee lists and durations. The caller must provide storage where new merged
225 /// nodes can be allocated in the form of a linked list.
226 TrieNode *mergeTrieNodes(const TrieNode &Left, const TrieNode &Right,
227 TrieNode *NewParent,
228 std::forward_list &NodeStore) {
229 assert(Left.FuncId == Right.FuncId);
230 NodeStore.push_front(TrieNode{Left.FuncId, NewParent, {}, {}, {}});
231 auto I = NodeStore.begin();
232 auto *Node = &*I;
233
234 // Build a map of callees from the left side.
235 DenseMap LeftCalleesByFuncId;
236 for (auto *Callee : Left.Callees) {
237 LeftCalleesByFuncId[Callee->FuncId] = Callee;
238 }
239
240 // Iterate through the right side, either merging with the map values or
241 // directly adding to the Callees vector. The iteration also removes any
242 // merged values from the left side map.
243 for (auto *Callee : Right.Callees) {
244 auto iter = LeftCalleesByFuncId.find(Callee->FuncId);
245 if (iter != LeftCalleesByFuncId.end()) {
246 Node->Callees.push_back(
247 mergeTrieNodes(*(iter->second), *Callee, Node, NodeStore));
248 LeftCalleesByFuncId.erase(iter);
249 } else {
250 Node->Callees.push_back(Callee);
251 }
252 }
253
254 // Add any callees that weren't found in the right side.
255 for (auto MapPairIter : LeftCalleesByFuncId) {
256 Node->Callees.push_back(MapPairIter.second);
257 }
258
259 // Aggregate the durations.
260 for (auto duration : Left.TerminalDurations) {
261 Node->TerminalDurations.push_back(duration);
262 }
263 for (auto duration : Right.TerminalDurations) {
264 Node->TerminalDurations.push_back(duration);
265 }
266 for (auto duration : Left.IntermediateDurations) {
267 Node->IntermediateDurations.push_back(duration);
268 }
269 for (auto duration : Right.IntermediateDurations) {
270 Node->IntermediateDurations.push_back(duration);
271 }
272
273 return Node;
274 }
275
276 class StackTrie {
277
278 // We maintain pointers to the roots of the tries we see.
279 DenseMap> Roots;
280
281 // We make sure all the nodes are accounted for in this list.
282 std::forward_list NodeStore;
283
284 // A map of thread ids to pairs call stack trie nodes and their start times.
285 DenseMap, 8>>
286 ThreadStackMap;
287
288 TrieNode *createTrieNode(uint32_t ThreadId, int32_t FuncId,
289 TrieNode *Parent) {
290 NodeStore.push_front(TrieNode{FuncId, Parent, {}, {}, {}});
291 auto I = NodeStore.begin();
292 auto *Node = &*I;
293 if (!Parent)
294 Roots[ThreadId].push_back(Node);
295 return Node;
296 }
297
298 TrieNode *findRootNode(uint32_t ThreadId, int32_t FuncId) {
299 const auto &RootsByThread = Roots[ThreadId];
300 auto I = find_if(RootsByThread,
301 [&](TrieNode *N) { return N->FuncId == FuncId; });
302 return (I == RootsByThread.end()) ? nullptr : *I;
303 }
304
305 public:
306 enum class AccountRecordStatus {
307 OK, // Successfully processed
308 ENTRY_NOT_FOUND, // An exit record had no matching call stack entry
309 UNKNOWN_RECORD_TYPE
310 };
311
312 struct AccountRecordState {
313 // We keep track of whether the call stack is currently unwinding.
314 bool wasLastRecordExit;
315
316 static AccountRecordState CreateInitialState() { return {false}; }
317 };
318
319 AccountRecordStatus accountRecord(const XRayRecord &R,
320 AccountRecordState *state) {
321 auto &TS = ThreadStackMap[R.TId];
322 switch (R.Type) {
323 case RecordTypes::ENTER: {
324 state->wasLastRecordExit = false;
325 // When we encounter a new function entry, we want to record the TSC for
326 // that entry, and the function id. Before doing so we check the top of
327 // the stack to see if there are callees that already represent this
328 // function.
329 if (TS.empty()) {
330 auto *Root = findRootNode(R.TId, R.FuncId);
331 TS.emplace_back(Root ? Root : createTrieNode(R.TId, R.FuncId, nullptr),
332 R.TSC);
333 return AccountRecordStatus::OK;
334 }
335
336 auto &Top = TS.back();
337 auto I = find_if(Top.first->Callees,
338 [&](TrieNode *N) { return N->FuncId == R.FuncId; });
339 if (I == Top.first->Callees.end()) {
340 // We didn't find the callee in the stack trie, so we're going to
341 // add to the stack then set up the pointers properly.
342 auto N = createTrieNode(R.TId, R.FuncId, Top.first);
343 Top.first->Callees.emplace_back(N);
344
345 // Top may be invalidated after this statement.
346 TS.emplace_back(N, R.TSC);
347 } else {
348 // We found the callee in the stack trie, so we'll use that pointer
349 // instead, add it to the stack associated with the TSC.
350 TS.emplace_back(*I, R.TSC);
351 }
352 return AccountRecordStatus::OK;
353 }
354 case RecordTypes::EXIT: {
355 bool wasLastRecordExit = state->wasLastRecordExit;
356 state->wasLastRecordExit = true;
357 // The exit case is more interesting, since we want to be able to deduce
358 // missing exit records. To do that properly, we need to look up the stack
359 // and see whether the exit record matches any of the entry records. If it
360 // does match, we attempt to record the durations as we pop the stack to
361 // where we see the parent.
362 if (TS.empty()) {
363 // Short circuit, and say we can't find it.
364
365 return AccountRecordStatus::ENTRY_NOT_FOUND;
366 }
367
368 auto FunctionEntryMatch =
369 find_if(reverse(TS), [&](const std::pair &E) {
370 return E.first->FuncId == R.FuncId;
371 });
372 auto status = AccountRecordStatus::OK;
373 if (FunctionEntryMatch == TS.rend()) {
374 status = AccountRecordStatus::ENTRY_NOT_FOUND;
375 } else {
376 // Account for offset of 1 between reverse and forward iterators. We
377 // want the forward iterator to include the function that is exited.
378 ++FunctionEntryMatch;
379 }
380 auto I = FunctionEntryMatch.base();
381 for (auto &E : make_range(I, TS.end() - 1))
382 E.first->IntermediateDurations.push_back(std::max(E.second, R.TSC) -
383 std::min(E.second, R.TSC));
384 auto &Deepest = TS.back();
385 if (wasLastRecordExit)
386 Deepest.first->IntermediateDurations.push_back(
387 std::max(Deepest.second, R.TSC) - std::min(Deepest.second, R.TSC));
388 else
389 Deepest.first->TerminalDurations.push_back(
390 std::max(Deepest.second, R.TSC) - std::min(Deepest.second, R.TSC));
391 TS.erase(I, TS.end());
392 return status;
393 }
394 }
395 return AccountRecordStatus::UNKNOWN_RECORD_TYPE;
396 }
397
398 bool isEmpty() const { return Roots.empty(); }
399
400 void printStack(raw_ostream &OS, const TrieNode *Top,
401 FuncIdConversionHelper &FN) {
402 // Traverse the pointers up to the parent, noting the sums, then print
403 // in reverse order (callers at top, callees down bottom).
404 SmallVector CurrentStack;
405 for (auto *F = Top; F != nullptr; F = F->Parent)
406 CurrentStack.push_back(F);
407 int Level = 0;
408 OS << formatv("{0,-5} {1,-60} {2,+12} {3,+16}\n", "lvl", "function",
409 "count", "sum");
410 for (auto *F :
411 reverse(make_range(CurrentStack.begin() + 1, CurrentStack.end()))) {
412 auto Sum = std::accumulate(F->IntermediateDurations.begin(),
413 F->IntermediateDurations.end(), 0LL);
414 auto FuncId = FN.SymbolOrNumber(F->FuncId);
415 OS << formatv("#{0,-4} {1,-60} {2,+12} {3,+16}\n", Level++,
416 FuncId.size() > 60 ? FuncId.substr(0, 57) + "..." : FuncId,
417 F->IntermediateDurations.size(), Sum);
418 }
419 auto *Leaf = *CurrentStack.begin();
420 auto LeafSum = std::accumulate(Leaf->TerminalDurations.begin(),
421 Leaf->TerminalDurations.end(), 0LL);
422 auto LeafFuncId = FN.SymbolOrNumber(Leaf->FuncId);
423 OS << formatv("#{0,-4} {1,-60} {2,+12} {3,+16}\n", Level++,
424 LeafFuncId.size() > 60 ? LeafFuncId.substr(0, 57) + "..."
425 : LeafFuncId,
426 Leaf->TerminalDurations.size(), LeafSum);
427 OS << "\n";
428 }
429
430 /// Prints top stacks for each thread.
431 void printPerThread(raw_ostream &OS, FuncIdConversionHelper &FN) {
432 for (auto iter : Roots) {
433 OS << "Thread " << iter.first << ":\n";
434 print(OS, FN, iter.second);
435 OS << "\n";
436 }
437 }
438
439 /// Prints top stacks from looking at all the leaves and ignoring thread IDs.
440 /// Stacks that consist of the same function IDs but were called in different
441 /// thread IDs are not considered unique in this printout.
442 void printIgnoringThreads(raw_ostream &OS, FuncIdConversionHelper &FN) {
443 SmallVector RootValues;
444 auto RootBackInserter = std::back_inserter(RootValues);
445 using RootsType = decltype(*Roots.begin());
446 auto SecondFn = [](const RootsType &Value) { return Value.second; };
447
448 std::for_each(
449 map_iterator(Roots.begin(), SecondFn),
450 map_iterator(Roots.end(), SecondFn),
451 [&RootBackInserter](decltype(Roots.begin()->second) RootNodeVector) {
452 for (auto *RootNode : RootNodeVector) {
453 RootBackInserter = RootNode;
454 }
455 });
456 print(OS, FN, RootValues);
457 }
458
459 /// Merges the trie by thread id before printing top stacks.
460 void printAggregatingThreads(raw_ostream &OS, FuncIdConversionHelper &FN) {
461 std::forward_list AggregatedNodeStore;
462 SmallVector RootValues;
463 for (auto MapIter : Roots) {
464 const auto &RootNodeVector = MapIter.second;
465 for (auto *Node : RootNodeVector) {
466 auto MaybeFoundIter = find_if(RootValues, [Node](TrieNode *elem) {
467 return Node->FuncId == elem->FuncId;
468 });
469 if (MaybeFoundIter == RootValues.end()) {
470 RootValues.push_back(Node);
471 } else {
472 RootValues.push_back(mergeTrieNodes(**MaybeFoundIter, *Node, nullptr,
473 AggregatedNodeStore));
474 RootValues.erase(MaybeFoundIter);
475 }
476 }
477 }
478 print(OS, FN, RootValues);
479 }
480
481 void print(raw_ostream &OS, FuncIdConversionHelper &FN,
482 SmallVector RootValues) {
483 // Go through each of the roots, and traverse the call stack, producing the
484 // aggregates as you go along. Remember these aggregates and stacks, and
485 // show summary statistics about:
486 //
487 // - Total number of unique stacks
488 // - Top 10 stacks by count
489 // - Top 10 stacks by aggregate duration
490 SmallVector, 11> TopStacksByCount;
491 SmallVector, 11> TopStacksBySum;
492 auto greater_second = [](const std::pair &A,
493 const std::pair &B) {
494 return A.second > B.second;
495 };
496 uint64_t UniqueStacks = 0;
497 for (const auto *N : RootValues) {
498 SmallVector S;
499 S.emplace_back(N);
500
501 while (!S.empty()) {
502 auto Top = S.pop_back_val();
503
504 // We only start printing the stack (by walking up the parent pointers)
505 // when we get to a leaf function.
506 if (!Top->TerminalDurations.empty()) {
507 ++UniqueStacks;
508 auto TopSum = std::accumulate(Top->TerminalDurations.begin(),
509 Top->TerminalDurations.end(), 0uLL);
510 {
511 auto E = std::make_pair(Top, TopSum);
512 TopStacksBySum.insert(std::lower_bound(TopStacksBySum.begin(),
513 TopStacksBySum.end(), E,
514 greater_second),
515 E);
516 if (TopStacksBySum.size() == 11)
517 TopStacksBySum.pop_back();
518 }
519 {
520 auto E = std::make_pair(Top, Top->TerminalDurations.size());
521 TopStacksByCount.insert(std::lower_bound(TopStacksByCount.begin(),
522 TopStacksByCount.end(), E,
523 greater_second),
524 E);
525 if (TopStacksByCount.size() == 11)
526 TopStacksByCount.pop_back();
527 }
528 }
529 for (const auto *C : Top->Callees)
530 S.push_back(C);
531 }
532 }
533
534 // Now print the statistics in the end.
535 OS << "\n";
536 OS << "Unique Stacks: " << UniqueStacks << "\n";
537 OS << "Top 10 Stacks by leaf sum:\n\n";
538 for (const auto &P : TopStacksBySum) {
539 OS << "Sum: " << P.second << "\n";
540 printStack(OS, P.first, FN);
541 }
542 OS << "\n";
543 OS << "Top 10 Stacks by leaf count:\n\n";
544 for (const auto &P : TopStacksByCount) {
545 OS << "Count: " << P.second << "\n";
546 printStack(OS, P.first, FN);
547 }
548 OS << "\n";
549 }
550 };
551
552 std::string CreateErrorMessage(StackTrie::AccountRecordStatus Error,
553 const XRayRecord &Record,
554 const FuncIdConversionHelper &Converter) {
555 switch (Error) {
556 case StackTrie::AccountRecordStatus::ENTRY_NOT_FOUND:
557 return formatv("Found record {0} with no matching function entry\n",
558 format_xray_record(Record, Converter));
559 default:
560 return formatv("Unknown error type for record {0}\n",
561 format_xray_record(Record, Converter));
562 }
563 }
564
565 static CommandRegistration Unused(&Stack, []() -> Error {
566 // Load each file provided as a command-line argument. For each one of them
567 // account to a single StackTrie, and just print the whole trie for now.
568 StackTrie ST;
569 InstrumentationMap Map;
570 if (!StacksInstrMap.empty()) {
571 auto InstrumentationMapOrError = loadInstrumentationMap(StacksInstrMap);
572 if (!InstrumentationMapOrError)
573 return joinErrors(
574 make_error(
575 Twine("Cannot open instrumentation map: ") + StacksInstrMap,
576 std::make_error_code(std::errc::invalid_argument)),
577 InstrumentationMapOrError.takeError());
578 Map = std::move(*InstrumentationMapOrError);
579 }
580
581 if (SeparateThreadStacks && AggregateThreads)
582 return make_error(
583 Twine("Can't specify options for per thread reporting and reporting "
584 "that aggregates threads."),
585 std::make_error_code(std::errc::invalid_argument));
586
587 symbolize::LLVMSymbolizer::Options Opts(
588 symbolize::FunctionNameKind::LinkageName, true, true, false, "");
589 symbolize::LLVMSymbolizer Symbolizer(Opts);
590 FuncIdConversionHelper FuncIdHelper(StacksInstrMap, Symbolizer,
591 Map.getFunctionAddresses());
592 // TODO: Someday, support output to files instead of just directly to
593 // standard output.
594 for (const auto &Filename : StackInputs) {
595 auto TraceOrErr = loadTraceFile(Filename);
596 if (!TraceOrErr) {
597 if (!StackKeepGoing)
598 return joinErrors(
599 make_error(
600 Twine("Failed loading input file '") + Filename + "'",
601 std::make_error_code(std::errc::invalid_argument)),
602 TraceOrErr.takeError());
603 logAllUnhandledErrors(TraceOrErr.takeError(), errs(), "");
604 continue;
605 }
606 auto &T = *TraceOrErr;
607 StackTrie::AccountRecordState AccountRecordState =
608 StackTrie::AccountRecordState::CreateInitialState();
609 for (const auto &Record : T) {
610 auto error = ST.accountRecord(Record, &AccountRecordState);
611 if (error != StackTrie::AccountRecordStatus::OK) {
612 if (!StackKeepGoing)
613 return make_error(
614 CreateErrorMessage(error, Record, FuncIdHelper),
615 make_error_code(errc::illegal_byte_sequence));
616 errs() << CreateErrorMessage(error, Record, FuncIdHelper);
617 }
618 }
619 }
620 if (ST.isEmpty()) {
621 return make_error(
622 "No instrumented calls were accounted in the input file.",
623 make_error_code(errc::result_out_of_range));
624 }
625 if (AggregateThreads) {
626 ST.printAggregatingThreads(outs(), FuncIdHelper);
627 } else if (SeparateThreadStacks) {
628 ST.printPerThread(outs(), FuncIdHelper);
629 } else {
630 ST.printIgnoringThreads(outs(), FuncIdHelper);
631 }
632 return Error::success();
633 });