llvm.org GIT mirror llvm / 630a4ac
Xray docs with description of Flight Data Recorder binary format. Summary: Adding a new restructuredText file to document the trace format produced with an FDR mode handler and read by llvm-xray toolset. Fixed two problems in the documentation from differential review. One bad table and a missing link in the toc. Original commit was e97c5836a77db803fe53319c53f3bf8e8b26d2b7. Reviewers: dberris, pelikan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36041 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309891 91177308-0d34-0410-b5e6-96231b3b80d8 Keith Wyss 2 years ago
4 changed file(s) with 411 addition(s) and 5 deletion(s). Raw diff Collapse all Expand all
195195 ``XRAY_OPTIONS`` environment variable (while also optionally setting the
196196 ``xray_naive_log`` to ``false``).
197197
198 When the buffers are flushed to disk, the result is a binary trace format
199 described by `XRay FDR format `_
200
198201 When FDR mode is on, it will keep writing and recycling memory buffers until
199202 the logging implementation is finalized -- at which point it can be flushed and
200203 re-initialised later. To do this programmatically, we follow the workflow
0 ======================================
1 XRay Flight Data Recorder Trace Format
2 ======================================
3
4 :Version: 1 as of 2017-07-20
5
6 .. contents::
7 :local:
8
9
10 Introduction
11 ============
12
13 When gathering XRay traces in Flight Data Recorder mode, each thread of an
14 application will claim buffers to fill with trace data, which at some point
15 is finalized and flushed.
16
17 A goal of the profiler is to minimize overhead, so the flushed data directly
18 corresponds to the buffer.
19
20 This document describes the format of a trace file.
21
22
23 General
24 =======
25
26 Each trace file corresponds to a sequence of events in a particular thread.
27
28 The file has a header followed by a sequence of discriminated record types.
29
30 The endianess of byte fields matches the endianess of the platform which
31 produced the trace file.
32
33
34 Header Section
35 ==============
36
37 A trace file begins with a 32 byte header.
38
39 +-------------------+-----------------+----------------------------------------+
40 | Field | Size (bytes) | Description |
41 +===================+=================+========================================+
42 | version | ``2`` | Anticipates versioned readers. This |
43 | | | document describes the format when |
44 | | | version == 1 |
45 +-------------------+-----------------+----------------------------------------+
46 | type | ``2`` | An enumeration encoding the type of |
47 | | | trace. Flight Data Recorder mode |
48 | | | traces have type == 1 |
49 +-------------------+-----------------+----------------------------------------+
50 | bitfield | ``4`` | Holds parameters that are not aligned |
51 | | | to bytes. Further described below. |
52 +-------------------+-----------------+----------------------------------------+
53 | cycle_frequency | ``8`` | The frequency in hertz of the CPU |
54 | | | oscillator used to measure duration of |
55 | | | events in ticks. |
56 +-------------------+-----------------+----------------------------------------+
57 | buffer_size | ``8`` | The size in bytes of the data portion |
58 | | | of the trace following the header. |
59 +-------------------+-----------------+----------------------------------------+
60 | reserved | ``8`` | Reserved for future use. |
61 +-------------------+-----------------+----------------------------------------+
62
63 The bitfield parameter of the file header is composed of the following fields.
64
65 +-------------------+----------------+-----------------------------------------+
66 | Field | Size (bits) | Description |
67 +===================+================+=========================================+
68 | constant_tsc | ``1`` | Whether the platform's timestamp |
69 | | | counter used to record ticks between |
70 | | | events ticks at a constant frequency |
71 | | | despite CPU frequency changes. |
72 | | | 0 == non-constant. 1 == constant. |
73 +-------------------+----------------+-----------------------------------------+
74 | nonstop_tsc | ``1`` | Whether the tsc continues to count |
75 | | | despite whether the CPU is in a low |
76 | | | power state. 0 == stop. 1 == non-stop. |
77 +-------------------+----------------+-----------------------------------------+
78 | reserved | ``30`` | Not meaningful. |
79 +-------------------+----------------+-----------------------------------------+
80
81
82 Data Section
83 ============
84
85 Following the header in a trace is a data section with size matching the
86 buffer_size field in the header.
87
88 The data section is a stream of elements of different types.
89
90 There are a few categories of data in the sequence.
91
92 - ``Function Records``: Function Records contain the timing of entry into and
93 exit from function execution. Function Records have 8 bytes each.
94
95 - ``Metadata Records``: Metadata records serve many purposes. Mostly, they
96 capture information that may be too costly to record for each function, but
97 that is required to contextualize the fine-grained timings. They also are used
98 as markers for user-defined Event Data payloads. Metadata records have 16
99 bytes each.
100
101 - ``Event Data``: Free form data may be associated with events that are traced
102 by the binary and encode data defined by a handler function. Event data is
103 always preceded with a marker record which indicates how large it is.
104
105 - ``Function Arguments``: The arguments to some functions are included in the
106 trace. These are either pointer addresses or primitives that are read and
107 logged independently of their types in a high level language. To the tracer,
108 they are all simply numbers. Function Records that have attached arguments
109 will indicate their presence on the function entry record. We only support
110 logging contiguous function argument sequences starting with argument zero,
111 which will be the "this" pointer for member function invocations. For example,
112 we don't support logging the first and third argument.
113
114 A reader of the memory format must maintain a state machine. The format makes no
115 attempt to pad for alignment, and it is not seekable.
116
117
118 Function Records
119 ----------------
120
121 Function Records have an 8 byte layout. This layout encodes information to
122 reconstruct a call stack of instrumented function and their durations.
123
124 +---------------+--------------+-----------------------------------------------+
125 | Field | Size (bits) | Description |
126 +===============+==============+===============================================+
127 | discriminant | ``1`` | Indicates whether a reader should read a |
128 | | | Function or Metadata record. Set to ``0`` for |
129 | | | Function records. |
130 +---------------+--------------+-----------------------------------------------+
131 | action | ``3`` | Specifies whether the function is being |
132 | | | entered, exited, or is a non-standard entry |
133 | | | or exit produced by optimizations. |
134 +---------------+--------------+-----------------------------------------------+
135 | function_id | ``28`` | A numeric ID for the function. Resolved to a |
136 | | | name via the xray instrumentation map. The |
137 | | | instrumentation map is built by xray at |
138 | | | compile time into an object file and pairs |
139 | | | the function ids to addresses. It is used for |
140 | | | patching and as a lookup into the binary's |
141 | | | symbols to obtain names. |
142 +---------------+--------------+-----------------------------------------------+
143 | tsc_delta | ``32`` | The number of ticks of the timestamp counter |
144 | | | since a previous record recorded a delta or |
145 | | | other TSC resetting event. |
146 +---------------+--------------+-----------------------------------------------+
147
148 On little-endian machines, the bitfields are ordered from least significant bit
149 bit to most significant bit. A reader can read an 8 bit value and apply the mask
150 ``0x01`` for the discriminant. Similarly, they can read 32 bits and unsigned
151 shift right by ``0x04`` to obtain the function_id field.
152
153 On big-endian machine, the bitfields are written in order from most significant
154 bit to least significant bit. A reader would read an 8 bit value and unsigned
155 shift right by 7 bits for the discriminant. The function_id field could be
156 obtained by reading a 32 bit value and applying the mask ``0x0FFFFFFF``.
157
158 Function action types are as follows.
159
160 +---------------+--------------+-----------------------------------------------+
161 | Type | Number | Description |
162 +===============+==============+===============================================+
163 | Entry | ``0`` | Typical function entry. |
164 +---------------+--------------+-----------------------------------------------+
165 | Exit | ``1`` | Typical function exit. |
166 +---------------+--------------+-----------------------------------------------+
167 | Tail_Exit | ``2`` | An exit from a function due to Tail call |
168 | | | optimization. |
169 +---------------+--------------+-----------------------------------------------+
170 | Entry_Args | ``3`` | A function entry that records arguments. |
171 +---------------+--------------+-----------------------------------------------+
172
173 Entry_Args records do not contain the arguments themselves. Instead, metadata
174 records for each of the logged args follow the function record in the stream.
175
176
177 Metadata Records
178 ----------------
179
180 Interspersed throughout the buffer are 16 byte Metadata records. For typically
181 instrumented binaries, they will be sparser than Function records, and they
182 provide a fuller picture of the binary execution state.
183
184 Metadata record layout is partially record dependent, but they share a common
185 structure.
186
187 The same bit field rules described for function records apply to the first byte
188 of MetadataRecords. Within this byte, little endian machines use lsb to msb
189 ordering and big endian machines use msb to lsb ordering.
190
191 +---------------+--------------+-----------------------------------------------+
192 | Field | Size | Description |
193 +===============+==============+===============================================+
194 | discriminant | ``1 bit`` | Indicates whether a reader should read a |
195 | | | Function or Metadata record. Set to ``1`` for |
196 | | | Metadata records. |
197 +---------------+--------------+-----------------------------------------------+
198 | record_kind | ``7 bits`` | The type of Metadata record. |
199 +---------------+--------------+-----------------------------------------------+
200 | data | ``15 bytes`` | A data field used differently for each record |
201 | | | type. |
202 +---------------+--------------+-----------------------------------------------+
203
204 Here is a table of the enumerated record kinds.
205
206 +--------+---------------------------+
207 | Number | Type |
208 +========+===========================+
209 | 0 | NewBuffer |
210 +--------+---------------------------+
211 | 1 | EndOfBuffer |
212 +--------+---------------------------+
213 | 2 | NewCPUId |
214 +--------+---------------------------+
215 | 3 | TSCWrap |
216 +--------+---------------------------+
217 | 4 | WallTimeMarker |
218 +--------+---------------------------+
219 | 5 | CustomEventMarker |
220 +--------+---------------------------+
221 | 6 | CallArgument |
222 +--------+---------------------------+
223
224
225 NewBuffer Records
226 -----------------
227
228 Each buffer begins with a NewBuffer record immediately after the header.
229 It records the thread ID of the thread that the trace belongs to.
230
231 Its data segment is as follows.
232
233 +---------------+--------------+-----------------------------------------------+
234 | Field | Size (bytes) | Description |
235 +===============+==============+===============================================+
236 | thread_Id | ``2`` | Thread ID for buffer. |
237 +---------------+--------------+-----------------------------------------------+
238 | reserved | ``13`` | Unused. |
239 +---------------+--------------+-----------------------------------------------+
240
241
242 WallClockTime Records
243 ---------------------
244
245 Following the NewBuffer record, each buffer records an absolute time as a frame
246 of reference for the durations recorded by timestamp counter deltas.
247
248 Its data segment is as follows.
249
250 +---------------+--------------+-----------------------------------------------+
251 | Field | Size (bytes) | Description |
252 +===============+==============+===============================================+
253 | seconds | ``8`` | Seconds on absolute timescale. The starting |
254 | | | point is unspecified and depends on the |
255 | | | implementation and platform configured by the |
256 | | | tracer. |
257 +---------------+--------------+-----------------------------------------------+
258 | microseconds | ``4`` | The microsecond component of the time. |
259 +---------------+--------------+-----------------------------------------------+
260 | reserved | ``3`` | Unused. |
261 +---------------+--------------+-----------------------------------------------+
262
263
264 NewCpuId Records
265 ----------------
266
267 Each function entry invokes a routine to determine what CPU is executing.
268 Typically, this is done with readtscp, which reads the timestamp counter at the
269 same time.
270
271 If the tracing detects that the execution has switched CPUs or if this is the
272 first instrumented entry point, the tracer will output a NewCpuId record.
273
274 Its data segment is as follows.
275
276 +---------------+--------------+-----------------------------------------------+
277 | Field | Size (bytes) | Description |
278 +===============+==============+===============================================+
279 | cpu_id | ``2`` | CPU Id. |
280 +---------------+--------------+-----------------------------------------------+
281 | absolute_tsc | ``8`` | The absolute value of the timestamp counter. |
282 +---------------+--------------+-----------------------------------------------+
283 | reserved | ``5`` | Unused. |
284 +---------------+--------------+-----------------------------------------------+
285
286
287 TSCWrap Records
288 ---------------
289
290 Since each function record uses a 32 bit value to represent the number of ticks
291 of the timestamp counter since the last reference, it is possible for this value
292 to overflow, particularly for sparsely instrumented binaries.
293
294 When this delta would not fit into a 32 bit representation, a reference absolute
295 timestamp counter record is written in the form of a TSCWrap record.
296
297 Its data segment is as follows.
298
299 +---------------+--------------+-----------------------------------------------+
300 | Field | Size (bytes) | Description |
301 +===============+==============+===============================================+
302 | absolute_tsc | ``8`` | Timestamp counter value. |
303 +---------------+--------------+-----------------------------------------------+
304 | reserved | ``7`` | Unused. |
305 +---------------+--------------+-----------------------------------------------+
306
307
308 CallArgument Records
309 --------------------
310
311 Immediately following an Entry_Args type function record, there may be one or
312 more CallArgument records that contain the traced function's parameter values.
313
314 The order of the CallArgument Record sequency corresponds one to one with the
315 order of the function parameters.
316
317 CallArgument data segment:
318
319 +---------------+--------------+-----------------------------------------------+
320 | Field | Size (bytes) | Description |
321 +===============+==============+===============================================+
322 | argument | ``8`` | Numeric argument (may be pointer address). |
323 +---------------+--------------+-----------------------------------------------+
324 | reserved | ``7`` | Unused. |
325 +---------------+--------------+-----------------------------------------------+
326
327
328 CustomEventMarker Records
329 -------------------------
330
331 XRay provides the feature of logging custom events. This may be leveraged to
332 record tracing info for RPCs or similarly trace data that is application
333 specific.
334
335 Custom Events themselves are an unstructured (application defined) segment of
336 memory with arbitrary size within the buffer. They are preceded by
337 CustomEventMarkers to indicate their presence and size.
338
339 CustomEventMarker data segment:
340
341 +---------------+--------------+-----------------------------------------------+
342 | Field | Size (bytes) | Description |
343 +===============+==============+===============================================+
344 | event_size | ``4`` | Size of preceded event. |
345 +---------------+--------------+-----------------------------------------------+
346 | absolute_tsc | ``8`` | A timestamp counter of the event. |
347 +---------------+--------------+-----------------------------------------------+
348 | reserved | ``3`` | Unused. |
349 +---------------+--------------+-----------------------------------------------+
350
351
352 EndOfBuffer Records
353 -------------------
354
355 An EndOfBuffer record type indicates that there is no more trace data in this
356 buffer. The reader is expected to seek past the remaining buffer_size expressed
357 before the start of buffer and look for either another header or EOF.
358
359
360 Format Grammar and Invariants
361 =============================
362
363 Not all sequences of Metadata records and Function records are valid data. A
364 sequence should be parsed as a state machine. The expectations for a valid
365 format can be expressed as a context free grammar.
366
367 This is an attempt to explain the format with statements in EBNF format.
368
369 - Format := Header ThreadBuffer* EOF
370
371 - ThreadBuffer := NewBuffer WallClockTime NewCPUId BodySequence* End
372
373 - BodySequence := NewCPUId | TSCWrap | Function | CustomEvent
374
375 - Function := (Function_Entry_Args CallArgument*) | Function_Other_Type
376
377 - CustomEvent := CustomEventMarker CustomEventUnstructuredMemory
378
379 - End := EndOfBuffer RemainingBufferSizeToSkip
380
381
382 Function Record Order
383 ---------------------
384
385 There are a few clarifications that may help understand what is expected of
386 Function records.
387
388 - Functions with an Exit are expected to have a corresponding Entry or
389 Entry_Args function record precede them in the trace.
390
391 - Tail_Exit Function records record the Function ID of the function whose return
392 address the program counter will take. In other words, the final function that
393 would be popped off of the call stack if tail call optimization was not used.
394
395 - Not all functions marked for instrumentation are necessarily in the trace. The
396 tracer uses heuristics to preserve the trace for non-trivial functions.
397
398 - Not every entry must have a traced Exit or Tail Exit. The buffer may run out
399 of space or the program may request for the tracer to finalize toreturn the
400 buffer before an instrumented function exits.
279279 GlobalISel
280280 XRay
281281 XRayExample
282 XRayFDRFormat
282283 PDB/index
283284
284285 :doc:`WritingAnLLVMPass`
221221 DataExtractor &RecordExtractor,
222222 size_t &RecordSize) {
223223 // We can encounter a CustomEventMarker anywhere in the log, so we can handle
224 // it regardless of the expectation. However, we do se the expectation to read
225 // a set number of fixed bytes, as described in the metadata.
224 // it regardless of the expectation. However, we do set the expectation to
225 // read a set number of fixed bytes, as described in the metadata.
226226 uint32_t OffsetPtr = 1; // Read after the first byte.
227227 uint32_t DataSize = RecordExtractor.getU32(&OffsetPtr);
228228 uint64_t TSC = RecordExtractor.getU64(&OffsetPtr);
332332 }
333333 Record.CPU = State.CPUId;
334334 Record.TId = State.ThreadId;
335 // Back up to read first 32 bits, including the 8 we pulled RecordType
335 // Back up to read first 32 bits, including the 4 we pulled RecordType
336336 // and RecordKind out of. The remaining 28 are FunctionId.
337337 uint32_t OffsetPtr = 0;
338338 // Despite function Id being a signed int on XRayRecord,
368368 /// We expect a format complying with the grammar in the following pseudo-EBNF.
369369 ///
370370 /// FDRLog: XRayFileHeader ThreadBuffer*
371 /// XRayFileHeader: 32 bits to identify the log as FDR with machine metadata.
372 /// ThreadBuffer: BufSize NewBuffer WallClockTime NewCPUId FunctionSequence EOB
371 /// XRayFileHeader: 32 bytes to identify the log as FDR with machine metadata.
372 /// Includes BufferSize
373 /// ThreadBuffer: NewBuffer WallClockTime NewCPUId FunctionSequence EOB
373374 /// BufSize: 8 byte unsigned integer indicating how large the buffer is.
374375 /// NewBuffer: 16 byte metadata record with Thread Id.
375376 /// WallClockTime: 16 byte metadata record with human readable time.