llvm.org GIT mirror llvm / e3cea5f
Docs: add documentation for the coverage mapping format. Differential Revision: http://reviews.llvm.org/D4729 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215990 91177308-0d34-0410-b5e6-96231b3b80d8 Alex Lorenz 5 years ago
2 changed file(s) with 579 addition(s) and 0 deletion(s).
0 .. role:: raw-html(raw)
1 :format: html
2
3 =================================
4 LLVM Code Coverage Mapping Format
5 =================================
6
7 .. contents::
8 :local:
9
10 Introduction
11 ============
12
13 LLVM's code coverage mapping format is used to provide code coverage
14 analysis using LLVM's and Clang's instrumentation-based profiling
15 (Clang's ``-fprofile-instr-generate`` option).
16
17 This document is aimed at those who use LLVM's code coverage mapping to provide
18 code coverage analysis for their own programs, and for those who would like
19 to know how it works under the hood. A prior knowledge of how Clang's profile
20 guided optimization works is useful, but not required.
21
22 We start by showing how to use LLVM and Clang for code coverage analysis,
23 then we briefly describe LLVM's code coverage mapping format and the
24 way that Clang and LLVM's code coverage tool work with this format. After
25 the basics are down, more advanced features of the coverage mapping format
26 are discussed - such as the data structures, LLVM IR representation and
27 the binary encoding.
28
29 Quick Start
30 ===========
31
32 Here's a short story that describes how to generate a code coverage overview
33 for a sample source file called *test.c*.
34
35 * First, compile an instrumented version of your program using Clang's
36 ``-fprofile-instr-generate`` option with the additional ``-fcoverage-mapping``
37 option:
38
39 ``clang -o test -fprofile-instr-generate -fcoverage-mapping test.c``
40 * Then, run the instrumented binary. The runtime will produce a file called
41 *default.profraw* containing the raw profile instrumentation data:
42
43 ``./test``
44 * After that, merge the profile data using the *llvm-profdata* tool:
45
46 ``llvm-profdata merge -o test.profdata default.profraw``
47 * Finally, run LLVM's code coverage tool (*llvm-cov*) to produce the code
48 coverage overview for the sample source file:
49
50 ``llvm-cov show ./test -instr-profile=test.profdata test.c``
51
52 High Level Overview
53 ===================
54
55 LLVM's code coverage mapping format is designed to be a self-contained
56 data format that can be embedded into the LLVM IR and object files.
57 It's described in this document as a **mapping** format because its goal is
58 to store the data that is required for a code coverage tool to map between
59 the specific source ranges in a file and the execution counts obtained
60 after running the instrumented version of the program.
61
62 The mapping data is used in two places in the code coverage process:
63
64 1. When clang compiles a source file with ``-fcoverage-mapping``, it
65 generates the mapping information that describes the mapping between the
66 source ranges and the profiling instrumentation counters.
67 This information gets embedded into the LLVM IR and conveniently
68 ends up in the final executable file when the program is linked.
69
70 2. It is also used by *llvm-cov* - the mapping information is extracted from an
71 object file and is used to associate the execution counts (the values of the
72 profile instrumentation counters) with the source ranges in a file.
73 After that, the tool is able to generate various code coverage reports
74 for the program.
75
76 The coverage mapping format aims to be a "universal format" that would be
77 suitable for usage by any frontend, and not just by Clang. It also aims to
78 provide the frontend the possibility of generating the minimal coverage mapping
79 data in order to reduce the size of the IR and object files - for example,
80 instead of emitting mapping information for each statement in a function, the
81 frontend is allowed to group the statements with the same execution count into
82 regions of code, and emit the mapping information only for those regions.
83
84 Advanced Concepts
85 =================
86
87 The remainder of this guide is meant to give you insight into the way the
88 coverage mapping format works.
89
90 The coverage mapping format operates on a per-function level as the
91 profile instrumentation counters are associated with a specific function.
92 For each function that requires code coverage, the frontend has to create
93 coverage mapping data that can map between the source code ranges and
94 the profile instrumentation counters for that function.
95
96 Mapping Region
97 --------------
98
99 The function's coverage mapping data contains an array of mapping regions.
100 A mapping region stores the `source code range`_ that is covered by this region,
101 the `file id <coverage file id>`_, the `coverage mapping counter`_ and
102 the region's kind.
103 There are several kinds of mapping regions:
104
105 * Code regions associate portions of source code and `coverage mapping
106 counters`_. They make up the majority of the mapping regions. They are used
107 by the code coverage tool to compute the execution counts for lines,
108 highlight the regions of code that were never executed, and to obtain
109 the various code coverage statistics for a function.
110 For example:
111
112 :raw-html:`int main(int argc, const char *argv[]) {     // Code Region from 1:40 to 9:2
113
114 if (argc > 1) { // Code Region from 3:17 to 5:4
115 printf("%s\n", argv[1]);
116 } else { // Code Region from 5:10 to 7:4
117 printf("\n");
118 }
119 return 0;
120 }
121 `
122 * Skipped regions are used to represent source ranges that were skipped
123 by Clang's preprocessor. They don't associate with
124 `coverage mapping counters`_, as the frontend knows that they are never
125 executed. They are used by the code coverage tool to mark the skipped lines
126 inside a function as non-code lines that don't have execution counts.
127 For example:
128
129 :raw-html:`int main() {                // Code Region from 1:12 to 6:2
130 #ifdef DEBUG // Skipped Region from 2:1 to 4:2
131 printf("Hello world");
132 #endif
133 return 0;
134 }
135 `
136 * Expansion regions are used to represent Clang's macro expansions. They
137 have an additional property - *expanded file id*. This property can be
138 used by the code coverage tool to find the mapping regions that are created
139 as a result of this macro expansion, by checking if their file id matches the
140 expanded file id. They don't associate with `coverage mapping counters`_,
141 as the code coverage tool can determine the execution count for this region
142 by looking up the execution count of the first region with a corresponding
143 file id.
144 For example:
145
146 :raw-html:`int func(int x) {
147 #define MAX(x,y) ((x) > (y)? (x) : (y))
148 return MAX(x, 42); // Expansion Region from 3:10 to 3:13
149 }
150 `
151
152 .. _source code range:
153
154 Source Range:
155 ^^^^^^^^^^^^^
156
157 The source range record contains the starting and ending location of a certain
158 mapping region. Both locations include the line and the column numbers.
159
160 .. _coverage file id:
161
162 File ID:
163 ^^^^^^^^
164
165 The file id is an integer value that tells us
166 in which source file or macro expansion this region is located.
167 It enables Clang to produce mapping information for the code
168 defined inside macros, as this example demonstrates:
169
170 :raw-html:`void func(const char *str) {         // Code Region from 1:28 to 6:2 with file id 0
171 #define PUT printf("%s\n", str) // 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2
172 if(*str)
173 PUT; // Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1
174 PUT; // Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2
175 }
176 `
177
178 .. _coverage mapping counter:
179 .. _coverage mapping counters:
180
181 Counter:
182 ^^^^^^^^
183
184 A coverage mapping counter can represent a reference to the profile
185 instrumentation counter. The execution count for a region with such a counter
186 is determined by looking up the value of the corresponding profile
187 instrumentation counter.
188
189 It can also represent a binary arithmetical expression that operates on
190 coverage mapping counters or other expressions.
191 The execution count for a region with an expression counter is determined by
192 evaluating the expression's arguments and then adding them together or
193 subtracting them from one another.
194 In the example below, a subtraction expression is used to compute the execution
195 count for the compound statement that follows the *else* keyword:
196
197 :raw-html:`int main(int argc, const char *argv[]) {    // Region's counter is a reference to the profile counter #0
198
199 if (argc > 1) { // Region's counter is a reference to the profile counter #1
200 printf("%s\n", argv[1]);
201 } else { // Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)
202 printf("\n");
203 }
204 return 0;
205 }
206 `
207
208 Finally, a coverage mapping counter can also represent an execution count
209 of zero. The zero counter is used to provide coverage mapping for
210 unreachable statements and expressions, like in the example below:
211
212 :raw-html:`int main() {
213 return 0;
214 printf("Hello world!\n"); // Unreachable region's counter is zero
215 }
216 `
217
218 The zero counters allow the code coverage tool to display proper line execution
219 counts for the unreachable lines and highlight the unreachable code.
220 Without them, the tool would think that those lines and regions were still
221 executed, as it doesn't possess the frontend's knowledge.
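The three counter forms described in this section can be modelled in a few lines. The following is a minimal Python sketch, not LLVM's implementation; the class names ``Zero``, ``Ref`` and ``Expr`` and the function ``evaluate`` are hypothetical names chosen for illustration, and the profile counts are made-up values.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Zero:
    """A counter whose execution count is always zero (unreachable code)."""


@dataclass(frozen=True)
class Ref:
    """A reference to a profile instrumentation counter."""
    index: int


@dataclass(frozen=True)
class Expr:
    """A binary expression over two counters; op is "+" or "-"."""
    op: str
    lhs: "Counter"
    rhs: "Counter"


Counter = Union[Zero, Ref, Expr]


def evaluate(counter, profile_counts):
    """Return the execution count that a coverage mapping counter denotes."""
    if isinstance(counter, Zero):
        return 0
    if isinstance(counter, Ref):
        return profile_counts[counter.index]
    lhs = evaluate(counter.lhs, profile_counts)
    rhs = evaluate(counter.rhs, profile_counts)
    return lhs + rhs if counter.op == "+" else lhs - rhs


# Made-up profile: the function body ran 5 times, the `if` branch 3 times.
profile_counts = [5, 3]

# The `else` region from the example above: counter #0 - counter #1.
else_region = Expr("-", Ref(0), Ref(1))
else_count = evaluate(else_region, profile_counts)
```

With these assumed counts, the ``else`` region evaluates to 2 executions, and a ``Zero`` counter evaluates to 0 regardless of the profile.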
222
223 LLVM IR Representation
224 ======================
225
226 The coverage mapping data is stored in the LLVM IR using a single global
227 constant structure variable called *__llvm_coverage_mapping*
228 with the *__llvm_covmap* section specifier.
229
230 For example, let’s consider a C file and how it gets compiled to LLVM:
231
232 .. _coverage mapping sample:
233
234 .. code-block:: c
235
236 int foo() {
237 return 42;
238 }
239 int bar() {
240 return 13;
241 }
242
243 The coverage mapping variable generated by Clang is:
244
245 .. code-block:: llvm
246
247 @__llvm_coverage_mapping = internal constant { i32, i32, i32, i32, [2 x { i8*, i32, i32 }], [40 x i8] }
248 { i32 2, ; The number of function records
249 i32 20, ; The length of the string that contains the encoded translation unit filenames
250 i32 20, ; The length of the string that contains the encoded coverage mapping data
251 i32 0, ; Coverage mapping format version
252 [2 x { i8*, i32, i32 }] [ ; Function records
253 { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_foo, i32 0, i32 0), ; Function's name
254 i32 3, ; Function's name length
255 i32 9 ; Function's encoded coverage mapping data string length
256 },
257 { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_bar, i32 0, i32 0), ; Function's name
258 i32 3, ; Function's name length
259 i32 9 ; Function's encoded coverage mapping data string length
260 }],
261 [40 x i8] c"..." ; Encoded data (dissected later)
262 }, section "__llvm_covmap", align 8
263
264 Version:
265 --------
266
267 The coverage mapping version number can have the following values:
268
269 * 0 — The first (current) version of the coverage mapping format.
270
271 .. _function records:
272
273 Function record:
274 ----------------
275
276 A function record is a structure of the following type:
277
278 .. code-block:: llvm
279
280 { i8*, i32, i32 }
281
282 It contains the pointer to the function's name, function's name length,
283 and the length of the encoded mapping data for that function.
284
285 Encoded data:
286 -------------
287
288 The encoded data is stored in a single string that contains
289 the encoded filenames used by this translation unit and the encoded coverage
290 mapping data for each function in this translation unit.
291
292 The encoded data has the following structure:
293
294 ``[filenames, coverageMappingDataForFunctionRecord0, coverageMappingDataForFunctionRecord1, ..., padding]``
295
296 If necessary, the encoded data is padded with zeroes so that the size
297 of the data string is rounded up to the nearest multiple of 8 bytes.
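The padding rule can be checked with a short calculation. This is a sketch in Python; ``padded_size`` and ``pad`` are hypothetical helper names, and the sizes come from the sample IR shown earlier (20 bytes of filenames plus two 9-byte function records).

```python
def padded_size(size, alignment=8):
    """Round size up to the nearest multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def pad(data, alignment=8):
    """Append zero bytes until len(data) is a multiple of alignment."""
    return data + b"\x00" * (padded_size(len(data), alignment) - len(data))


# Sizes taken from the sample: filenames, then one entry per function record.
unpadded = 20 + 9 + 9
total = padded_size(unpadded)
```

For the sample this gives 38 unpadded bytes and a padded total of 40, which matches the ``[40 x i8]`` type of the encoded data field.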
298
299 Dissecting the sample:
300 ^^^^^^^^^^^^^^^^^^^^^^
301
302 Here's an overview of the encoded data that was stored in the
303 IR for the `coverage mapping sample`_ that was shown earlier:
304
305 * The IR contains the following string constant that represents the encoded
306 coverage mapping data for the sample translation unit:
307
308 .. code-block:: llvm
309
310 c"\01\12/Users/alex/test.c\01\00\00\01\01\01\0C\02\02\01\00\00\01\01\04\0C\02\02\00\00"
311
312 * The string contains values that are encoded in the LEB128 format, which is
313 used throughout for storing integers. It also contains a string value.
314
315 * The length of the substring that contains the encoded translation unit
316 filenames is the value of the second field in the *__llvm_coverage_mapping*
317 structure, which is 20, thus the filenames are encoded in this string:
318
319 .. code-block:: llvm
320
321 c"\01\12/Users/alex/test.c"
322
323 This string contains the following data:
324
325 * Its first byte has a value of ``0x01``. It stores the number of filenames
326 contained in this string.
327 * Its second byte stores the length of the first filename in this string.
328 * The remaining 18 bytes are used to store the first filename.
329
330 * The length of the substring that contains the encoded coverage mapping data
331 for the first function is the value of the third field in the first
332 structure in an array of `function records`_ stored in the
333 fifth field of the *__llvm_coverage_mapping* structure, which is 9.
334 Therefore, the coverage mapping for the first function record is encoded
335 in this string:
336
337 .. code-block:: llvm
338
339 c"\01\00\00\01\01\01\0C\02\02"
340
341 This string consists of the following bytes:
342
343 +----------+-------------------------------------------------------------------------------------------------------------------------+
344 | ``0x01`` | The number of file ids used by this function. There is only one file id used by the mapping data in this function. |
345 +----------+-------------------------------------------------------------------------------------------------------------------------+
346 | ``0x00`` | An index into the filenames array which corresponds to the file "/Users/alex/test.c". |
347 +----------+-------------------------------------------------------------------------------------------------------------------------+
348 | ``0x00`` | The number of counter expressions used by this function. This function doesn't use any expressions. |
349 +----------+-------------------------------------------------------------------------------------------------------------------------+
350 | ``0x01`` | The number of mapping regions that are stored in an array for the function's file id #0. |
351 +----------+-------------------------------------------------------------------------------------------------------------------------+
352 | ``0x01`` | The coverage mapping counter for the first region in this function. The value of 1 tells us that it's a coverage |
353 | | mapping counter that is a reference to the profile instrumentation counter with an index of 0. |
354 +----------+-------------------------------------------------------------------------------------------------------------------------+
355 | ``0x01`` | The starting line of the first mapping region in this function. |
356 +----------+-------------------------------------------------------------------------------------------------------------------------+
357 | ``0x0C`` | The starting column of the first mapping region in this function. |
358 +----------+-------------------------------------------------------------------------------------------------------------------------+
359 | ``0x02`` | The ending line of the first mapping region in this function. |
360 +----------+-------------------------------------------------------------------------------------------------------------------------+
361 | ``0x02`` | The ending column of the first mapping region in this function. |
362 +----------+-------------------------------------------------------------------------------------------------------------------------+
363
364 * The length of the substring that contains the encoded coverage mapping data
365 for the second function record is also 9. It's structured like the mapping data
366 for the first function record.
367
368 * The two trailing bytes are zeroes and are used to pad the coverage mapping
369 data to give it 8-byte alignment.
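The dissection above can be reproduced with a small script. This is only a sketch in Python, not LLVM's reader: ``read_uleb128`` and ``read_filenames`` are hypothetical helpers, and the section sizes (20, 9 and 9) are copied by hand from the sample ``__llvm_coverage_mapping`` variable rather than parsed from IR.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this value
            return result, pos
        shift += 7


def read_filenames(data, pos):
    """Decode [numFilenames, filename0, ...]; return (names, next position)."""
    count, pos = read_uleb128(data, pos)
    names = []
    for _ in range(count):
        length, pos = read_uleb128(data, pos)
        names.append(data[pos:pos + length].decode())
        pos += length
    return names, pos


# The encoded data string from the sample translation unit.
SAMPLE = (b"\x01\x12/Users/alex/test.c"
          b"\x01\x00\x00\x01\x01\x01\x0c\x02\x02"
          b"\x01\x00\x00\x01\x01\x04\x0c\x02\x02"
          b"\x00\x00")
FILENAMES_SIZE = 20      # second field of the sample structure
FUNCTION_SIZES = [9, 9]  # third field of each function record

filenames, filenames_end = read_filenames(SAMPLE, 0)

# Slice out the per-function mapping data that follows the filenames.
records = []
pos = FILENAMES_SIZE
for size in FUNCTION_SIZES:
    records.append(SAMPLE[pos:pos + size])
    pos += size
padding = SAMPLE[pos:]
```

Running this over the sample yields the single filename ``/Users/alex/test.c``, two 9-byte function records, and two trailing zero bytes of padding.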
370
371 Encoding
372 ========
373
374 The per-function coverage mapping data is encoded as a stream of bytes,
375 with a simple structure. The structure consists of the encoding
376 `types <cvmtypes>`_ like variable-length unsigned integers, that
377 are used to encode `File ID Mapping`_, `Counter Expressions`_ and
378 the `Mapping Regions`_.
379
380 The format of the structure follows:
381
382 ``[file id mapping, counter expressions, mapping regions]``
383
384 The translation unit filenames are encoded using the same encoding
385 `types <cvmtypes>`_ as the per-function coverage mapping data, with the
386 following structure:
387
388 ``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]``
389
390 .. _cvmtypes:
391
392 Types
393 -----
394
395 This section describes the basic types that are used by the encoding format
396 and can appear after ``:`` in the ``[foo : type]`` description.
397
398 .. _LEB128:
399
400 LEB128
401 ^^^^^^
402
403 LEB128 is an unsigned integer value that is encoded using DWARF's LEB128
404 encoding, optimizing for the case where values are small
405 (1 byte for values less than 128).
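As an illustration of the unsigned LEB128 scheme, here is a minimal Python sketch. The function names ``encode_uleb128`` and ``decode_uleb128`` are hypothetical and this is not LLVM's own code: each byte carries 7 bits of the value, least significant group first, and the high bit marks that more bytes follow.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def decode_uleb128(data, pos=0):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


small = encode_uleb128(12)    # a single byte, as for any value below 128
large = encode_uleb128(128)   # the first value that needs two bytes
```

Values below 128, such as the column number ``0x0C`` in the sample, occupy a single byte; 128 is the first value that spills into a second byte.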
406
407 .. _strings:
408
409 Strings
410 ^^^^^^^
411
412 ``[length : LEB128, characters...]``
413
414 String values are encoded with a `LEB value <LEB128>`_ for the length
415 of the string and a sequence of bytes for its characters.
416
417 .. _file id mapping:
418
419 File ID Mapping
420 ---------------
421
422 ``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]``
423
424 File id mapping in a function's coverage mapping stream
425 contains the indices into the translation unit's filenames array.
426
427 Counter
428 -------
429
430 ``[value : LEB128]``
431
432 A `coverage mapping counter`_ is stored in a single `LEB value <LEB128>`_.
433 It is composed of two things --- the `tag <counter-tag>`_
434 which is stored in the lowest 2 bits, and the `counter data`_ which is stored
435 in the remaining bits.
436
437 .. _counter-tag:
438
439 Tag:
440 ^^^^
441
442 The counter's tag encodes the counter's kind
443 and, if the counter is an expression, the expression's kind.
444 The possible tag values are:
445
446 * 0 - The counter is zero.
447
448 * 1 - The counter is a reference to the profile instrumentation counter.
449
450 * 2 - The counter is a subtraction expression.
451
452 * 3 - The counter is an addition expression.
453
454 .. _counter data:
455
456 Data:
457 ^^^^^
458
459 The counter's data is interpreted in the following manner:
460
461 * When the counter is a reference to the profile instrumentation counter,
462 then the counter's data is the id of the profile counter.
463 * When the counter is an expression, then the counter's data
464 is the index into the array of counter expressions.
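The tag and data split can be sketched as simple bit operations. The Python below is illustrative only; ``decode_counter``, ``encode_counter`` and the ``KINDS`` table are hypothetical names, and the input is assumed to be an already decoded LEB128 value.

```python
# Tag values as listed above.
KINDS = {
    0: "zero",
    1: "counter reference",
    2: "subtraction expression",
    3: "addition expression",
}


def decode_counter(value):
    """Split an encoded counter into (kind, data)."""
    tag = value & 0x3   # lowest 2 bits
    data = value >> 2   # remaining bits
    return KINDS[tag], data


def encode_counter(tag, data):
    """Pack a tag (0..3) and its data into a single integer."""
    return (data << 2) | tag


# The value 0x01 from the dissected sample: a reference to profile counter #0.
sample_kind, sample_data = decode_counter(0x01)
```

For a counter reference the data is the profile counter id; for either expression kind it is an index into the function's array of counter expressions.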
465
466 .. _Counter Expressions:
467
468 Counter Expressions
469 -------------------
470
471 ``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]``
472
473 Counter expressions consist of two counters as they
474 represent binary arithmetic operations.
475 The expression's kind is determined from the `tag <counter-tag>`_ of the
476 counter that references this expression.
477
478 .. _Mapping Regions:
479
480 Mapping Regions
481 ---------------
482
483 ``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]``
484
485 The mapping regions are stored in an array of sub-arrays where every
486 region in a particular sub-array has the same file id.
487
488 The file id for a sub-array of regions is the index of that
489 sub-array in the main array, e.g. the first sub-array will have the file id
490 of 0.
491
492 Sub-Array of Regions
493 ^^^^^^^^^^^^^^^^^^^^
494
495 ``[numRegions : LEB128, region0, region1, ...]``
496
497 The mapping regions for a specific file id are stored in an array that is
498 sorted in an ascending order by the region's starting location.
499
500 Mapping Region
501 ^^^^^^^^^^^^^^
502
503 ``[header, source range]``
504
505 The mapping region record contains two sub-records ---
506 the `header`_, which stores the counter and/or the region's kind,
507 and the `source range`_ that contains the starting and ending
508 location of this region.
509
510 .. _header:
511
512 Header
513 ^^^^^^
514
515 ``[counter]``
516
517 or
518
519 ``[pseudo-counter]``
520
521 The header encodes the region's counter and the region's kind.
522
523 The value of the counter's tag distinguishes between the counters and
524 pseudo-counters --- if the tag is zero, then this header contains a
525 pseudo-counter, otherwise this header contains an ordinary counter.
526
527 Counter:
528 """"""""
529
530 A mapping region whose header has a counter with a non-zero tag is
531 a code region.
532
533 Pseudo-Counter:
534 """""""""""""""
535
536 ``[value : LEB128]``
537
538 A pseudo-counter is stored in a single `LEB value <LEB128>`_, just like
539 the ordinary counter. It has the following interpretation:
540
541 * bits 0-1: tag, which is always 0.
542
543 * bit 2: expansionRegionTag. If this bit is set, then this mapping region
544 is an expansion region.
545
546 * remaining bits: data. If this region is an expansion region, then the data
547 contains the expanded file id of that region.
548
549 Otherwise, the data contains the region's kind. The possible region
550 kind values are:
551
552 * 0 - This mapping region is a code region with a counter of zero.
553 * 2 - This mapping region is a skipped region.
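The header rules can be summarised as a small decision procedure. This Python sketch is not LLVM's reader: ``decode_header`` is a hypothetical name, the input is an already decoded LEB128 value, and it assumes that "remaining bits" means bit 3 and above.

```python
def decode_header(value):
    """Classify a mapping region from its decoded header value."""
    tag = value & 0x3
    if tag != 0:
        # Ordinary counter: the region is a code region and the whole
        # value is its encoded counter.
        return {"kind": "code", "counter": value}
    if value & 0x4:
        # Bit 2 (expansionRegionTag) is set: data is the expanded file id.
        return {"kind": "expansion", "expanded_file_id": value >> 3}
    region_kind = value >> 3
    if region_kind == 0:
        return {"kind": "code", "counter": 0}  # code region, zero counter
    if region_kind == 2:
        return {"kind": "skipped"}
    raise ValueError(f"unknown region kind {region_kind}")


skipped = decode_header(2 << 3)          # region kind 2 in the data bits
expansion = decode_header(0x4 | 1 << 3)  # expansion of the file with id 1
```

Under these assumptions a skipped region has a header value of 16, and an expansion region for file id 1 has a header value of 12.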
554
555 .. _source range:
556
557 Source Range
558 ^^^^^^^^^^^^
559
560 ``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]``
561
562 The source range record contains the following fields:
563
564 * *deltaLineStart*: The difference between the starting line of the
565 current mapping region and the starting line of the previous mapping region.
566
567 If the current mapping region is the first region in the current
568 sub-array, then it stores the starting line of that region.
569
570 * *columnStart*: The starting column of the mapping region.
571
572 * *numLines*: The difference between the ending line and the starting line
573 of the current mapping region.
574
575 * *columnEnd*: The ending column of the mapping region.
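To show how the delta encoding resolves to absolute locations, here is a Python sketch. ``decode_source_ranges`` is a hypothetical helper that takes the four fields of each record as already decoded integers, and the input records are derived by hand from the three code regions in the first ``main`` example (1:40 to 9:2, 3:17 to 5:4 and 5:10 to 7:4).

```python
def decode_source_ranges(records):
    """Turn (deltaLineStart, columnStart, numLines, columnEnd) records of one
    sub-array into absolute (lineStart, columnStart, lineEnd, columnEnd)."""
    ranges = []
    line_start = 0  # so the first delta is the first region's starting line
    for delta_line_start, column_start, num_lines, column_end in records:
        line_start += delta_line_start
        line_end = line_start + num_lines
        ranges.append((line_start, column_start, line_end, column_end))
    return ranges


# Hand-derived records for the regions 1:40-9:2, 3:17-5:4 and 5:10-7:4.
records = [
    (1, 40, 8, 2),
    (2, 17, 2, 4),
    (2, 10, 2, 4),
]
ranges = decode_source_ranges(records)
```

Because regions in a sub-array are sorted by starting location, every ``deltaLineStart`` is non-negative and fits the unsigned LEB128 type.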
237 237 StackMaps
238 238 InAlloca
239 239 BigEndianNEON
240 CoverageMappingFormat
240 241
241 242 :doc:`WritingAnLLVMPass`
242 243 Information on how to write LLVM transformations and analyses.
323 324 LLVM's support for generating NEON instructions on big endian ARM targets is
324 325 somewhat nonintuitive. This document explains the implementation and rationale.
325 326
327 :doc:`CoverageMappingFormat`
328 This describes the format and encoding used for LLVM’s code coverage mapping.
326 329
327 330 Development Process Documentation
328 331 =================================