llvm.org GIT mirror llvm / 3e9e87e
Revert "Xray docs with description of Flight Data Recorder binary format." This reverts commit 3462b8ad41a840fd54dbbd0d3f2a514c5ad6f656. The docs-llvm-html target failed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309842 91177308-0d34-0410-b5e6-96231b3b80d8 Keith Wyss 2 years ago
3 changed file(s) with 5 addition(s) and 403 deletion(s). Raw diff Collapse all Expand all
195195 ``XRAY_OPTIONS`` environment variable (while also optionally setting the
196196 ``xray_naive_log`` to ``false``).
197197
198 When the buffers are flushed to disk, the result is a binary trace format
199 described by `XRay FDR format `_
200
201198 When FDR mode is on, it will keep writing and recycling memory buffers until
202199 the logging implementation is finalized -- at which point it can be flushed and
203200 re-initialised later. To do this programmatically, we follow the workflow
+0
-394
docs/XRayFDRFormat.rst less more
None ======================================
1 XRay Flight Data Recorder Trace Format
2 ======================================
3
4 :Version: 1 as of 2017-07-20
5
6 .. contents::
7 :local:
8
9
10 Introduction
11 ============
12
13 When gathering XRay traces in Flight Data Recorder mode, each thread of an
14 application will claim buffers to fill with trace data, which at some point
15 is finalized and flushed.
16
17 A goal of the profiler is to minimize overhead, so the flushed data directly
18 corresponds to the buffer.
19
20 This document describes the format of a trace file.
21
22
23 General
24 =======
25
26 Each trace file corresponds to a sequence of events in a particular thread.
27
28 The file has a header followed by a sequence of discriminated record types.
29
30 The endianess of byte fields matches the endianess of the platform which
31 produced the trace file.
32
33
34 Header Section
35 ==============
36
37 A trace file begins with a 32 byte header.
38
39 +-------------------+-----------------+----------------------------------------+
40 | Field | Size (bytes) | Description |
41 +===================+=================+========================================+
42 | version | ``2`` | Anticipates versioned readers. This |
43 | | | document describes the format when |
44 | | | version == 1 |
45 +-------------------+-----------------+----------------------------------------+
46 | type | ``2`` | An enumeration encoding the type of |
47 | | | trace. Flight Data Recorder mode |
48 | | | traces have type == 1 |
49 +-------------------+-----------------+----------------------------------------+
50 | bitfield | ``4`` | Holds parameters that are not aligned |
51 | | | to bytes. Further described below. |
52 +-------------------+-----------------+----------------------------------------+
53 | cycle_frequency | ``8`` | The frequency in hertz of the CPU |
54 | | | oscillator used to measure duration of |
55 | | | events in ticks. |
56 +-------------------+-----------------+----------------------------------------+
57 | buffer_size | ``8`` | The size in bytes of the data portion |
58 | | | of the trace following the header. |
59 +-------------------+-----------------+----------------------------------------+
60 | reserved | ``8`` | Reserved for future use. |
61 +-------------------+-----------------+----------------------------------------+
62
63 The bitfield parameter of the file header is composed of the following fields.
64
65 +-------------------+----------------+-----------------------------------------+
66 | Field | Size (bits) | Description |
67 +===================+================+=========================================+
68 | constant_tsc | ``1`` | Whether the platform's timestamp |
69 | | | counter used to record ticks between |
70 | | | events ticks at a constant frequency |
71 | | | despite CPU frequency changes. |
72 | | | 0 == non-constant. 1 == constant. |
73 +-------------------+----------------+-----------------------------------------+
74 | nonstop_tsc | ``1`` | Whether the tsc continues to count |
75 | | | despite whether the CPU is in a low |
76 | | | power state. 0 == stop. 1 == non-stop. |
77 +-------------------+----------------+-----------------------------------------+
78 | reserved | ``30`` | Not meaningful. |
79 +-------------------+----------------+-----------------------------------------+
80
81
82 Data Section
83 ============
84
85 Following the header in a trace is a data section with size matching the
86 buffer_size field in the header.
87
88 The data section is a stream of elements of different types.
89
90 There are a few categories of data in the sequence.
91
92 - ``Function Records``: Function Records contain the timing of entry into and
93 exit from function execution. Function Records have 8 bytes each.
94
95 - ``Metadata Records``: Metadata records serve many purposes. Mostly, they
96 capture information that may be too costly to record for each function, but
97 that is required to contextualize the fine-grained timings. They also are used
98 as markers for user-defined Event Data payloads. Metadata records have 16
99 bytes each.
100
101 - ``Event Data``: Free form data may be associated with events that are traced
102 by the binary and encode data defined by a handler function. Event data is
103 always preceded with a marker record which indicates how large it is.
104
105 - ``Function Arguments``: The arguments to some functions are included in the
106 trace. These are either pointer addresses or primitives that are read and
107 logged independently of their types in a high level language. To the tracer,
108 they are all simply numbers. Function Records that have attached arguments
109 will indicate their presence on the function entry record. We only support
110 logging contiguous function argument sequences starting with argument zero,
111 which will be the "this" pointer for member function invocations. For example,
112 we don't support logging the first and third argument.
113
114 A reader of the memory format must maintain a state machine. The format makes no
115 attempt to pad for alignment, and it is not seekable.
116
117
118 Function Records
119 ----------------
120
121 Function Records have an 8 byte layout. This layout encodes information to
122 reconstruct a call stack of instrumented function and their durations.
123
124 +---------------+--------------+-----------------------------------------------+
125 | Field | Size (bits) | Description |
126 +===============+==============+===============================================+
127 | discriminant | ``1`` | Indicates whether a reader should read a |
128 | | | Function or Metadata record. Set to ``0`` for |
129 | | | Function records. |
130 +---------------+--------------+-----------------------------------------------+
131 | action | ``3`` | Specifies whether the function is being |
132 | | | entered, exited, or is a non-standard entry |
133 | | | or exit produced by optimizations. |
134 +---------------+--------------+-----------------------------------------------+
135 | function_id | ``28`` | A numeric ID for the function. Resolved to a |
136 | | | name via the xray instrumentation map. The |
137 | | | instrumentation map is built by xray at |
138 | | | compile time into an object file and pairs |
139 | | | the function ids to addresses. It is used for |
140 | | | patching and as a lookup into the binary's |
141 | | | symbols to obtain names. |
142 +---------------+--------------+-----------------------------------------------+
143 | tsc_delta | ``32`` | The number of ticks of the timestamp counter |
144 | | | since a previous record recorded a delta or |
145 | | | other TSC resetting event. |
146 +---------------+--------------+-----------------------------------------------+
147
148 On little-endian machines, the bitfields are ordered from least significant bit
149 bit to most significant bit. A reader can read an 8 bit value and apply the mask
150 ``0x01`` for the discriminant. Similarly, they can read 32 bits and unsigned
151 shift right by ``0x04`` to obtain the function_id field.
152
153 On big-endian machine, the bitfields are written in order from most significant
154 bit to least significant bit. A reader would read an 8 bit value and unsigned
155 shift right by 7 bits for the discriminant. The function_id field could be
156 obtained by reading a 32 bit value and applying the mask ``0x0FFFFFFF``.
157
158 Function action types are as follows.
159
160 +---------------+--------------+-----------------------------------------------+
161 | Type | Number | Description |
162 +===============+==============+===============================================+
163 | Entry | ``0`` | Typical function entry. |
164 +---------------+--------------+-----------------------------------------------+
165 | Exit | ``1`` | Typical function exit. |
166 +---------------+--------------+-----------------------------------------------+
167 | Tail_Exit | ``2`` | An exit from a function due to Tail call |
168 | | | optimization. |
169 +---------------+--------------+-----------------------------------------------+
170 | Entry_Args | ``3`` | A function entry that records arguments. |
171 +---------------+--------------+-----------------------------------------------+
172
173 Entry_Args records do not contain the arguments themselves. Instead, metadata
174 records for each of the logged args follow the function record in the stream.
175
176
177 Metadata Records
178 ----------------
179
180 Interspersed throughout the buffer are 16 byte Metadata records. For typically
181 instrumented binaries, they will be sparser than Function records, and they
182 provide a fuller picture of the binary execution state.
183
184 Metadata record layout is partially record dependent, but they share a common
185 structure.
186
187 The same bit field rules described for function records apply to the first byte
188 of MetadataRecords. Within this byte, little endian machines use lsb to msb
189 ordering and big endian machines use msb to lsb ordering.
190
191 +---------------+--------------+-----------------------------------------------+
192 | Field | Size | Description |
193 +===============+==============+===============================================+
194 | discriminant | ``1 bit`` | Indicates whether a reader should read a |
195 | | | Function or Metadata record. Set to ``1`` for |
196 | | | Metadata records. |
197 +---------------+--------------+-----------------------------------------------+
198 | record_kind | ``7 bits`` | The type of Metadata record. |
199 +---------------+--------------+-----------------------------------------------+
200 | data | ``15 bytes`` | A data field used differently for each record |
201 | | | type. |
202 +---------------+--------------+-----------------------------------------------+
203
204 Here is a table of the enumerated record kinds.
205
206 ====== ===========================
207 Number Type
208 ------ ---------------------------
209 0 NewBuffer
210 1 EndOfBuffer
211 2 NewCPUId
212 3 TSCWrap
213 4 WallTimeMarker
214 5 CustomEventMarker
215 6 CallArgument
216
217
218 NewBuffer Records
219 -----------------
220
221 Each buffer begins with a NewBuffer record immediately after the header.
222 It records the thread ID of the thread that the trace belongs to.
223
224 Its data segment is as follows.
225
226 +---------------+--------------+-----------------------------------------------+
227 | Field | Size (bytes) | Description |
228 +===============+==============+===============================================+
229 | thread_Id | ``2`` | Thread ID for buffer. |
230 +---------------+--------------+-----------------------------------------------+
231 | reserved | ``13`` | Unused. |
232 +---------------+--------------+-----------------------------------------------+
233
234
235 WallClockTime Records
236 ---------------------
237
238 Following the NewBuffer record, each buffer records an absolute time as a frame
239 of reference for the durations recorded by timestamp counter deltas.
240
241 Its data segment is as follows.
242
243 +---------------+--------------+-----------------------------------------------+
244 | Field | Size (bytes) | Description |
245 +===============+==============+===============================================+
246 | seconds | ``8`` | Seconds on absolute timescale. The starting |
247 | | | point is unspecified and depends on the |
248 | | | implementation and platform configured by the |
249 | | | tracer. |
250 +---------------+--------------+-----------------------------------------------+
251 | microseconds | ``4`` | The microsecond component of the time. |
252 +---------------+--------------+-----------------------------------------------+
253 | reserved | ``3`` | Unused. |
254 +---------------+--------------+-----------------------------------------------+
255
256
257 NewCpuId Records
258 ----------------
259
260 Each function entry invokes a routine to determine what CPU is executing.
261 Typically, this is done with readtscp, which reads the timestamp counter at the
262 same time.
263
264 If the tracing detects that the execution has switched CPUs or if this is the
265 first instrumented entry point, the tracer will output a NewCpuId record.
266
267 Its data segment is as follows.
268
269 +---------------+--------------+-----------------------------------------------+
270 | Field | Size (bytes) | Description |
271 +===============+==============+===============================================+
272 | cpu_id | ``2`` | CPU Id. |
273 +---------------+--------------+-----------------------------------------------+
274 | absolute_tsc | ``8`` | The absolute value of the timestamp counter. |
275 +---------------+--------------+-----------------------------------------------+
276 | reserved | ``5`` | Unused. |
277 +---------------+--------------+-----------------------------------------------+
278
279
280 TSCWrap Records
281 ---------------
282
283 Since each function record uses a 32 bit value to represent the number of ticks
284 of the timestamp counter since the last reference, it is possible for this value
285 to overflow, particularly for sparsely instrumented binaries.
286
287 When this delta would not fit into a 32 bit representation, a reference absolute
288 timestamp counter record is written in the form of a TSCWrap record.
289
290 Its data segment is as follows.
291
292 +---------------+--------------+-----------------------------------------------+
293 | Field | Size (bytes) | Description |
294 +===============+==============+===============================================+
295 | absolute_tsc | ``8`` | Timestamp counter value. |
296 +---------------+--------------+-----------------------------------------------+
297 | reserved | ``7`` | Unused. |
298 +---------------+--------------+-----------------------------------------------+
299
300
301 CallArgument Records
302 --------------------
303
304 Immediately following an Entry_Args type function record, there may be one or
305 more CallArgument records that contain the traced function's parameter values.
306
307 The order of the CallArgument Record sequency corresponds one to one with the
308 order of the function parameters.
309
310 CallArgument data segment:
311
312 +---------------+--------------+-----------------------------------------------+
313 | Field | Size (bytes) | Description |
314 +===============+==============+===============================================+
315 | argument | ``8`` | Numeric argument (may be pointer address). |
316 +---------------+--------------+-----------------------------------------------+
317 | reserved | ``7`` | Unused. |
318 +---------------+--------------+-----------------------------------------------+
319
320
321 CustomEventMarker Records
322 -------------------------
323
324 XRay provides the feature of logging custom events. This may be leveraged to
325 record tracing info for RPCs or similarly trace data that is application
326 specific.
327
328 Custom Events themselves are an unstructured (application defined) segment of
329 memory with arbitrary size within the buffer. They are preceded by
330 CustomEventMarkers to indicate their presence and size.
331
332 CustomEventMarker data segment:
333
334 +---------------+--------------+-----------------------------------------------+
335 | Field | Size (bytes) | Description |
336 +===============+==============+===============================================+
337 | event_size | ``4`` | Size of preceded event. |
338 +---------------+--------------+-----------------------------------------------+
339 | absolute_tsc | ``8`` | A timestamp counter of the event. |
340 +---------------+--------------+-----------------------------------------------+
341 | reserved | ``3`` | Unused. |
342 +---------------+--------------+-----------------------------------------------+
343
344
345 EndOfBuffer Records
346 -------------------
347
348 An EndOfBuffer record type indicates that there is no more trace data in this
349 buffer. The reader is expected to seek past the remaining buffer_size expressed
350 before the start of buffer and look for either another header or EOF.
351
352
353 Format Grammar and Invariants
354 =============================
355
356 Not all sequences of Metadata records and Function records are valid data. A
357 sequence should be parsed as a state machine. The expectations for a valid
358 format can be expressed as a context free grammar.
359
360 This is an attempt to explain the format with statements in EBNF format.
361
362 - Format := Header ThreadBuffer* EOF
363
364 - ThreadBuffer := NewBuffer WallClockTime NewCPUId BodySequence* End
365
366 - BodySequence := NewCPUId | TSCWrap | Function | CustomEvent
367
368 - Function := (Function_Entry_Args CallArgument*) | Function_Other_Type
369
370 - CustomEvent := CustomEventMarker CustomEventUnstructuredMemory
371
372 - End := EndOfBuffer RemainingBufferSizeToSkip
373
374
375 Function Record Order
376 ---------------------
377
378 There are a few clarifications that may help understand what is expected of
379 Function records.
380
381 - Functions with an Exit are expected to have a corresponding Entry or
382 Entry_Args function record precede them in the trace.
383
384 - Tail_Exit Function records record the Function ID of the function whose return
385 address the program counter will take. In other words, the final function that
386 would be popped off of the call stack if tail call optimization was not used.
387
388 - Not all functions marked for instrumentation are necessarily in the trace. The
389 tracer uses heuristics to preserve the trace for non-trivial functions.
390
391 - Not every entry must have a traced Exit or Tail Exit. The buffer may run out
392 of space or the program may request for the tracer to finalize toreturn the
393 buffer before an instrumented function exits.
221221 DataExtractor &RecordExtractor,
222222 size_t &RecordSize) {
223223 // We can encounter a CustomEventMarker anywhere in the log, so we can handle
224 // it regardless of the expectation. However, we do set the expectation to
225 // read a set number of fixed bytes, as described in the metadata.
224 // it regardless of the expectation. However, we do se the expectation to read
225 // a set number of fixed bytes, as described in the metadata.
226226 uint32_t OffsetPtr = 1; // Read after the first byte.
227227 uint32_t DataSize = RecordExtractor.getU32(&OffsetPtr);
228228 uint64_t TSC = RecordExtractor.getU64(&OffsetPtr);
332332 }
333333 Record.CPU = State.CPUId;
334334 Record.TId = State.ThreadId;
335 // Back up to read first 32 bits, including the 4 we pulled RecordType
335 // Back up to read first 32 bits, including the 8 we pulled RecordType
336336 // and RecordKind out of. The remaining 28 are FunctionId.
337337 uint32_t OffsetPtr = 0;
338338 // Despite function Id being a signed int on XRayRecord,
368368 /// We expect a format complying with the grammar in the following pseudo-EBNF.
369369 ///
370370 /// FDRLog: XRayFileHeader ThreadBuffer*
371 /// XRayFileHeader: 32 bytes to identify the log as FDR with machine metadata.
372 /// Includes BufferSize
373 /// ThreadBuffer: NewBuffer WallClockTime NewCPUId FunctionSequence EOB
371 /// XRayFileHeader: 32 bits to identify the log as FDR with machine metadata.
372 /// ThreadBuffer: BufSize NewBuffer WallClockTime NewCPUId FunctionSequence EOB
374373 /// BufSize: 8 byte unsigned integer indicating how large the buffer is.
375374 /// NewBuffer: 16 byte metadata record with Thread Id.
376375 /// WallClockTime: 16 byte metadata record with human readable time.