llvm.org GIT mirror llvm / 719641e
Merging r354764:

------------------------------------------------------------------------
r354764 | lebedevri | 2019-02-25 08:39:07 +0100 (Mon, 25 Feb 2019) | 123 lines

[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"

Summary:
This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert

Abstractions are great. Readable code is great. A JSON support library is a *good* idea.

Unfortunately, however, there is an internal detail of `llvm::json::Object` that one needs to be aware of: it uses `llvm::DenseMap`. So for **every** `llvm::json::Object`, even if you only store a single `int` entry in it, you pay the whole price of an `llvm::DenseMap`.

That matters for `llvm-xray`. I was trying to analyse the performance of the `llvm-exegesis` analysis mode, and for that I wanted to view the LLVM XRay log visualization in the Chrome trace viewer. But `llvm-xray convert` is sluggish, and sometimes even ends up being killed by the OOM killer. `xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis` (compiled with `-fxray-instruction-threshold=128`) analysis mode over a `-benchmarks-file` with 10099 points (one full latency measurement set), with a normal runtime of 0.387s.

Timings:

Old: (copied from D58580)
```
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT

 Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):

       21346.24 msec task-clock                #    1.000 CPUs utilized            ( +-  0.28% )
            314      context-switches          #   14.701 M/sec                    ( +- 59.13% )
              1      cpu-migrations            #    0.037 M/sec                    ( +-100.00% )
        2181354      page-faults               # 102191.251 M/sec                  ( +-  0.02% )
    85477442102      cycles                    # 4004415.019 GHz                   ( +-  0.28% )  (83.33%)
    14526427066      stalled-cycles-frontend   #   16.99% frontend cycles idle     ( +-  0.70% )  (83.33%)
    32371533721      stalled-cycles-backend    #   37.87% backend cycles idle      ( +-  0.27% )  (33.34%)
    67896890228      instructions              #    0.79  insn per cycle
                                                #    0.48  stalled cycles per insn  ( +-  0.03% )  (50.00%)
    14592654840      branches                  # 683631198.653 M/sec               ( +-  0.02% )  (66.67%)
      212207534      branch-misses             #    1.45% of all branches          ( +-  0.94% )  (83.34%)

        21.3502 +- 0.0585 seconds time elapsed  ( +-  0.27% )
```
New:
```
$ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT

 Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):

        7178.38 msec task-clock                #    1.000 CPUs utilized            ( +-  0.26% )
            182      context-switches          #   25.402 M/sec                    ( +- 28.84% )
              0      cpu-migrations            #    0.046 M/sec                    ( +- 70.71% )
          33701      page-faults               #  4694.994 M/sec                   ( +-  0.88% )
    28761053971      cycles                    # 4006833.933 GHz                   ( +-  0.26% )  (83.32%)
     2028297997      stalled-cycles-frontend   #    7.05% frontend cycles idle     ( +-  1.61% )  (83.32%)
    10773154901      stalled-cycles-backend    #   37.46% backend cycles idle      ( +-  0.38% )  (33.36%)
    36199132874      instructions              #    1.26  insn per cycle
                                                #    0.30  stalled cycles per insn  ( +-  0.03% )  (50.02%)
     6434504227      branches                  # 896420204.421 M/sec               ( +-  0.03% )  (66.68%)
       73355176      branch-misses             #    1.14% of all branches          ( +-  1.46% )  (83.33%)

         7.1807 +- 0.0190 seconds time elapsed  ( +-  0.26% )
```
So using `llvm::json` nearly triples the run-time on that test case. (+3x is times, not percent.)

Memory:

Old:
```
total runtime: 39.88s.
bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
calls to allocation functions: 33267816 (834135/s)
temporary memory allocations: 5832298 (146235/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 1.09MB
```
New:
```
total runtime: 17.42s.
bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
calls to allocation functions: 21382982 (1227284/s)
temporary memory allocations: 232858 (13364/s)
peak heap memory consumption: 350.69MB
peak RSS (including heaptrack overhead): 2.55GB
total memory leaked: 79.95KB
```
Diff:
```
total runtime: -22.46s.
bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
calls to allocation functions: -11884834 (529155/s)
temporary memory allocations: -5599440 (249307/s)
peak heap memory consumption: -8.86GB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -1.01MB
```
So using `llvm::json` increases *peak* memory consumption on *this* test case by roughly 27x, and the total number of bytes allocated by roughly 15x. Both of these numbers are times, *not* percent.

Note also that memory usage is clearly unbounded with `llvm::json`: it depends directly on the length of the log, so peak memory consumption keeps growing. This is not the case with the dumb streaming code; nothing accumulates, and peak memory consumption stays fixed. Naturally, that means it will handle *much* larger logs without OOM'ing.

Readability is good, but the price is simply unacceptable here. Too bad none of this analysis was done as part of the development/review of D50129 itself.

Reviewers: dberris, kpw, sammccall

Reviewed By: dberris

Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58584
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_80@354856 91177308-0d34-0410-b5e6-96231b3b80d8

Hans Wennborg
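To make the cost contrast above concrete, here is a minimal sketch (not taken from the llvm-xray sources) of the two approaches the commit compares: building one `llvm::json::Object` per trace event, which internally allocates an `llvm::DenseMap` even for a handful of keys and accumulates everything before printing, versus streaming each record straight to the output with `llvm::formatv`. The `Event` struct and the two `emit*` helpers are hypothetical names used only for illustration; `json::Object`, `json::Array`, and `formatv` are the real LLVM Support APIs involved.

```
// Illustrative sketch only; Event and the emit* helpers are made up.
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/JSON.h"
#include "llvm/Support/ScopedPrinter.h" // llvm::to_string
#include "llvm/Support/raw_ostream.h"
#include <cstdint>

struct Event { // hypothetical stand-in for an XRay record
  llvm::StringRef Name;
  uint32_t TId;
  double TimestampUs;
};

// Reverted approach: every event becomes a json::Object, and every
// json::Object owns a llvm::DenseMap internally, so each event pays for a
// hash table, and the whole array lives in memory until it is printed.
void emitEventAsJsonObject(llvm::json::Array &Events, const Event &E) {
  Events.push_back(llvm::json::Object({
      {"name", E.Name},
      {"ph", "B"},
      {"tid", llvm::to_string(E.TId)},
      {"ts", llvm::formatv("{0:f4}", E.TimestampUs).str()},
  }));
}

// Restored approach: format the record straight into the output stream.
// Nothing accumulates, so peak memory stays flat regardless of log length.
void emitEventStreaming(llvm::raw_ostream &OS, const Event &E) {
  OS << llvm::formatv(
      R"({ "name" : "{0}", "ph" : "B", "tid" : "{1}", "ts" : "{2:f4}" })",
      E.Name, E.TId, E.TimestampUs);
}
```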
1 changed file with 62 additions and 49 deletions.
@@ -17,7 +17,6 @@
 #include "llvm/Support/EndianStream.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/FormatVariadic.h"
-#include "llvm/Support/JSON.h"
 #include "llvm/Support/ScopedPrinter.h"
 #include "llvm/Support/YAMLTraits.h"
 #include "llvm/Support/raw_ostream.h"
@@ -241,6 +240,31 @@
   return CurrentStack;
 }
 
+void writeTraceViewerRecord(uint16_t Version, raw_ostream &OS, int32_t FuncId,
+                            uint32_t TId, uint32_t PId, bool Symbolize,
+                            const FuncIdConversionHelper &FuncIdHelper,
+                            double EventTimestampUs,
+                            const StackTrieNode &StackCursor,
+                            StringRef FunctionPhenotype) {
+  OS << " ";
+  if (Version >= 3) {
+    OS << llvm::formatv(
+        R"({ "name" : "{0}", "ph" : "{1}", "tid" : "{2}", "pid" : "{3}", )"
+        R"("ts" : "{4:f4}", "sf" : "{5}" })",
+        (Symbolize ? FuncIdHelper.SymbolOrNumber(FuncId)
+                   : llvm::to_string(FuncId)),
+        FunctionPhenotype, TId, PId, EventTimestampUs,
+        StackCursor.ExtraData.id);
+  } else {
+    OS << llvm::formatv(
+        R"({ "name" : "{0}", "ph" : "{1}", "tid" : "{2}", "pid" : "1", )"
+        R"("ts" : "{3:f3}", "sf" : "{4}" })",
+        (Symbolize ? FuncIdHelper.SymbolOrNumber(FuncId)
+                   : llvm::to_string(FuncId)),
+        FunctionPhenotype, TId, EventTimestampUs, StackCursor.ExtraData.id);
+  }
+}
+
 } // namespace
 
 void TraceConverter::exportAsChromeTraceEventFormat(const Trace &Records,
@@ -251,14 +275,18 @@
 
   unsigned id_counter = 0;
 
+  OS << "{\n \"traceEvents\": [";
   DenseMap<uint32_t, StackTrieNode *> StackCursorByThreadId{};
   DenseMap<uint32_t, SmallVector<StackTrieNode *, 4>> StackRootsByThreadId{};
   DenseMap<unsigned, StackTrieNode *> StacksByStackId{};
   std::forward_list<StackTrieNode> NodeStore{};
-
-  // Create a JSON Array which will hold all trace events.
-  json::Array TraceEvents;
+  int loop_count = 0;
   for (const auto &R : Records) {
+    if (loop_count++ == 0)
+      OS << "\n";
+    else
+      OS << ",\n";
+
     // Chrome trace event format always wants data in micros.
     // CyclesPerMicro = CycleHertz / 10^6
     // TSC / CyclesPerMicro == TSC * 10^6 / CycleHertz == MicroTimestamp
@@ -283,15 +311,8 @@
       // type of B for begin or E for end, thread id, process id,
       // timestamp in microseconds, and a stack frame id. The ids are logged
       // in an id dictionary after the events.
-      TraceEvents.push_back(json::Object({
-          {"name", Symbolize ? FuncIdHelper.SymbolOrNumber(R.FuncId)
-                             : llvm::to_string(R.FuncId)},
-          {"ph", "B"},
-          {"tid", llvm::to_string(R.TId)},
-          {"pid", llvm::to_string(Version >= 3 ? R.PId : 1)},
-          {"ts", llvm::formatv("{0:f4}", EventTimestampUs)},
-          {"sf", llvm::to_string(StackCursor->ExtraData.id)},
-      }));
+      writeTraceViewerRecord(Version, OS, R.FuncId, R.TId, R.PId, Symbolize,
+                             FuncIdHelper, EventTimestampUs, *StackCursor, "B");
       break;
     case RecordTypes::EXIT:
     case RecordTypes::TAIL_EXIT:
@@ -302,51 +323,43 @@
       // (And/Or in loop termination below)
       StackTrieNode *PreviousCursor = nullptr;
       do {
-        TraceEvents.push_back(json::Object({
-            {"name", Symbolize
-                         ? FuncIdHelper.SymbolOrNumber(StackCursor->FuncId)
-                         : llvm::to_string(StackCursor->FuncId)},
-            {"ph", "E"},
-            {"tid", llvm::to_string(R.TId)},
-            {"pid", llvm::to_string(Version >= 3 ? R.PId : 1)},
-            {"ts", llvm::formatv("{0:f4}", EventTimestampUs)},
-            {"sf", llvm::to_string(StackCursor->ExtraData.id)},
-        }));
+        if (PreviousCursor != nullptr) {
+          OS << ",\n";
+        }
+        writeTraceViewerRecord(Version, OS, StackCursor->FuncId, R.TId, R.PId,
+                               Symbolize, FuncIdHelper, EventTimestampUs,
+                               *StackCursor, "E");
         PreviousCursor = StackCursor;
         StackCursor = StackCursor->Parent;
       } while (PreviousCursor->FuncId != R.FuncId && StackCursor != nullptr);
       break;
     }
   }
+  OS << "\n ],\n"; // Close the Trace Events array.
+  OS << " "
+     << "\"displayTimeUnit\": \"ns\",\n";
 
   // The stackFrames dictionary substantially reduces size of the output file by
   // avoiding repeating the entire call stack of function names for each entry.
-  json::Object StackFrames;
-  for (const auto &Stack : StacksByStackId) {
-    const auto &StackId = Stack.first;
-    const auto &StackFunctionNode = Stack.second;
-    json::Object::iterator It;
-    std::tie(It, std::ignore) = StackFrames.insert({
-        llvm::to_string(StackId),
-        json::Object{
-            {"name",
-             Symbolize ? FuncIdHelper.SymbolOrNumber(StackFunctionNode->FuncId)
-                       : llvm::to_string(StackFunctionNode->FuncId)}},
-    });
-
-    if (StackFunctionNode->Parent != nullptr)
-      It->second.getAsObject()->insert(
-          {"parent", llvm::to_string(StackFunctionNode->Parent->ExtraData.id)});
-  }
-
-  json::Object TraceJSON{
-      {"displayTimeUnit", "ns"},
-      {"traceEvents", std::move(TraceEvents)},
-      {"stackFrames", std::move(StackFrames)},
-  };
-
-  // Pretty-print the JSON using two spaces for indentations.
-  OS << formatv("{0:2}", json::Value(std::move(TraceJSON)));
+  OS << R"( "stackFrames": {)";
+  int stack_frame_count = 0;
+  for (auto map_iter : StacksByStackId) {
+    if (stack_frame_count++ == 0)
+      OS << "\n";
+    else
+      OS << ",\n";
+    OS << " ";
+    OS << llvm::formatv(
+        R"("{0}" : { "name" : "{1}")", map_iter.first,
+        (Symbolize ? FuncIdHelper.SymbolOrNumber(map_iter.second->FuncId)
+                   : llvm::to_string(map_iter.second->FuncId)));
+    if (map_iter.second->Parent != nullptr)
+      OS << llvm::formatv(R"(, "parent": "{0}")",
+                          map_iter.second->Parent->ExtraData.id);
+    OS << " }";
+  }
+  OS << "\n }\n"; // Close the stack frames map.
+  OS << "}\n";    // Close the JSON entry.
 }
 
 namespace llvm {
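For orientation, the document the restored streaming writer assembles has roughly the shape below: a `traceEvents` array of begin/end records, a `displayTimeUnit` field, and a `stackFrames` dictionary whose entries may point at a `parent` frame. This is an illustrative skeleton with made-up function names and ids, not output captured from a real run.

```
{
 "traceEvents": [
  { "name" : "foo()", "ph" : "B", "tid" : "1234", "pid" : "1", "ts" : "0.0000", "sf" : "1" },
  { "name" : "bar()", "ph" : "B", "tid" : "1234", "pid" : "1", "ts" : "1.5000", "sf" : "2" },
  { "name" : "bar()", "ph" : "E", "tid" : "1234", "pid" : "1", "ts" : "3.0000", "sf" : "2" },
  { "name" : "foo()", "ph" : "E", "tid" : "1234", "pid" : "1", "ts" : "4.0000", "sf" : "1" }
 ],
 "displayTimeUnit": "ns",
 "stackFrames": {
  "1" : { "name" : "foo()" },
  "2" : { "name" : "bar()", "parent": "1" }
 }
}
```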