llvm.org GIT mirror llvm / 48839d9
Formatting. Some updating of data structures. More work needs to be done to update the examples. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@71974 91177308-0d34-0410-b5e6-96231b3b80d8 Bill Wendling 10 years ago
1 changed file(s) with 669 addition(s) and 555 deletion(s).
Source Level Debugging with LLVM

This document is the central repository for all information pertaining to

79 debug information in LLVM. It describes the actual format
80 that the LLVM debug information takes, which is useful for those interested
81 in creating front-ends or dealing directly with the information. Further, this
82 document provides specific examples of what debug information for C/C++ looks like.

80 debug information in LLVM. It describes the actual format
81 that the LLVM debug information takes, which is useful for those
82 interested in creating front-ends or dealing directly with the information.
83 Further, this document provides specific examples of what debug information
84 for C/C++ looks like.


The idea of the LLVM debugging information is to capture how the important

94 pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
95 Several design aspects have shaped the solution that appears here. The
96 important ones are:

96 pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
97 Several design aspects have shaped the solution that appears here. The
98 important ones are:

  • Debugging information should have very little impact on the rest of the
    100 compiler. No transformations, analyses, or code generators should need to be
    101 modified because of debugging information.
  • LLVM optimizations should interact in well-defined and
    104 easily described ways with the debugging information.
  • Because LLVM is designed to support arbitrary programming languages,
    107 LLVM-to-LLVM tools should not need to know anything about the semantics of the
    108 source-level-language.
  • Source-level languages are often widely different from one another.
    111 LLVM should not put any restrictions on the flavor of the source-language, and
    112 the debugging information should work with any language.
  • With code generator support, it should be possible to use an LLVM compiler
    115 to compile a program to native machine code and standard debugging formats.
    116 This allows compatibility with traditional machine-code level debuggers, like
    117 GDB or DBX.
  • Debugging information should have very little impact on the rest of the
    102 compiler. No transformations, analyses, or code generators should need to
    103 be modified because of debugging information.
  • LLVM optimizations should interact in well-defined and
    106 easily described ways with the debugging information.
  • Because LLVM is designed to support arbitrary programming languages,
    109 LLVM-to-LLVM tools should not need to know anything about the semantics of
    110 the source-level-language.
  • Source-level languages are often widely different from one another.
    113 LLVM should not put any restrictions on the flavor of the source-language,
    114 and the debugging information should work with any language.
  • With code generator support, it should be possible to use an LLVM compiler
    117 to compile a program to native machine code and standard debugging
    118 formats. This allows compatibility with traditional machine-code level
    119 debuggers, like GDB or DBX.

    The approach used by the LLVM implementation is to use a small set of

    122 intrinsic functions to define a mapping
    123 between LLVM program objects and the source-level objects. The description of
    124 the source-level program is maintained in LLVM global variables in an
    125 implementation-defined format (the C/C++ front-end
    126 currently uses working draft 7 of the
    127 Dwarf 3 standard, http://www.eagercon.com/dwarf/dwarf3std.htm).


    The approach used by the LLVM implementation is to use a small set

    123 of intrinsic functions to define a
    124 mapping between LLVM program objects and the source-level objects. The
    125 description of the source-level program is maintained in LLVM global
    126 variables in an implementation-defined format
    127 (the C/C++ front-end currently uses working draft 7 of
    128 the DWARF 3
    129 standard).


    When a program is being debugged, a debugger interacts with the user and

    130 turns the stored debug information into source-language specific information.
    131 As such, a debugger must be aware of the source-language, and is thus tied to
    132 a specific language or family of languages.

    132 turns the stored debug information into source-language specific information.
    133 As such, a debugger must be aware of the source-language, and is thus tied to
    134 a specific language or family of languages.


    The role of debug information is to provide meta information normally

    143 stripped away during the compilation process. This meta information provides an
    144 LLVM user a relationship between generated code and the original program source
    145 code.

    146 stripped away during the compilation process. This meta information provides
    147 an LLVM user a relationship between generated code and the original program
    148 source code.


    Currently, debug information is consumed by the DwarfWriter to produce dwarf

    148 information used by the gdb debugger. Other targets could use the same
    149 information to produce stabs or other debug forms.

    151 information used by the gdb debugger. Other targets could use the same
    152 information to produce stabs or other debug forms.


    It would also be reasonable to use debug information to feed profiling tools

    152 for analysis of generated code, or, tools for reconstructing the original source
    153 from generated code.

    155 for analysis of generated code, or, tools for reconstructing the original
    156 source from generated code.


    TODO - expound a bit more.


    An extremely high priority of LLVM debugging information is to make it

    167 interact well with optimizations and analysis. In particular, the LLVM debug
    168 information provides the following guarantees:

    170 interact well with optimizations and analysis. In particular, the LLVM debug
    171 information provides the following guarantees:

  • LLVM debug information always provides information to accurately read the
    173 source-level state of the program, regardless of which LLVM optimizations
    174 have been run, and without any modification to the optimizations themselves.
    175 However, some optimizations may impact the ability to modify the current state
    176 of the program with a debugger, such as setting program variables, or calling
    177 functions that have been deleted.
  • LLVM optimizations gracefully interact with debugging information. If they
    180 are not aware of debug information, they are automatically disabled as necessary
    181 in the cases that would invalidate the debug info. This retains the LLVM
    182 features, making it easy to write new transformations.
  • As desired, LLVM optimizations can be upgraded to be aware of the LLVM
    185 debugging information, allowing them to update the debugging information as they
    186 perform aggressive optimizations. This means that, with effort, the LLVM
    187 optimizers could optimize debug code just as well as non-debug code.
  • LLVM debug information does not prevent many important optimizations from
    190 happening (for example inlining, basic block reordering/merging/cleanup, tail
    191 duplication, etc), further reducing the amount of the compiler that eventually
    192 is "aware" of debugging information.
  • LLVM debug information is automatically optimized along with the rest of the
    195 program, using existing facilities. For example, duplicate information is
    196 automatically merged by the linker, and unused information is automatically
    197 removed.
  • LLVM debug information always provides information to accurately read
    175 the source-level state of the program, regardless of which LLVM
    176 optimizations have been run, and without any modification to the
    177 optimizations themselves. However, some optimizations may impact the
    178 ability to modify the current state of the program with a debugger, such
    179 as setting program variables, or calling functions that have been
    180 deleted.
  • LLVM optimizations gracefully interact with debugging information. If
    183 they are not aware of debug information, they are automatically disabled
    184 as necessary in the cases that would invalidate the debug info. This
    185 retains the LLVM features, making it easy to write new
    186 transformations.
  • As desired, LLVM optimizations can be upgraded to be aware of the LLVM
    189 debugging information, allowing them to update the debugging information
    190 as they perform aggressive optimizations. This means that, with effort,
    191 the LLVM optimizers could optimize debug code just as well as non-debug
    192 code.
  • LLVM debug information does not prevent many important optimizations from
    195 happening (for example inlining, basic block reordering/merging/cleanup,
    196 tail duplication, etc), further reducing the amount of the compiler that
    197 eventually is "aware" of debugging information.
  • LLVM debug information is automatically optimized along with the rest of
    200 the program, using existing facilities. For example, duplicate
    201 information is automatically merged by the linker, and unused information
    202 is automatically removed.

    Basically, the debug information allows you to compile a program with

    202 "-O0 -g" and get full debug information, allowing you to arbitrarily
    203 modify the program as it executes from a debugger. Compiling a program with
    204 "-O3 -g" gives you full debug information that is always available and
    205 accurate for reading (e.g., you get accurate stack traces despite tail call
    206 elimination and inlining), but you might lose the ability to modify the program
    207 and call functions which were optimized out of the program, or inlined away
    208 completely.

    206 "-O0 -g" and get full debug information, allowing you to arbitrarily
    207 modify the program as it executes from a debugger. Compiling a program with
    208 "-O3 -g" gives you full debug information that is always available
    209 and accurate for reading (e.g., you get accurate stack traces despite tail
    210 call elimination and inlining), but you might lose the ability to modify the
    211 program and call functions which were optimized out of the program, or
    212 inlined away completely.


    The LLVM test suite provides a

    211 framework to test the optimizer's handling of debugging information. It can be run
    212 like this:

    215 framework to test the optimizer's handling of debugging information. It can be
    216 run like this:


    222 This will test the impact of debugging information on optimization passes. If
    223 debugging information influences optimization passes then it will be reported
    224 as a failure. See TestingGuide
    225 for more information on LLVM test infrastructure and how to run various tests.
    226 </p>
    225 <p>This will test the impact of debugging information on optimization passes. If
    226 debugging information influences optimization passes then it will be reported
    227 as a failure. See TestingGuide for more
    228 information on LLVM test infrastructure and how to run various tests.


    LLVM debugging information has been carefully designed to make it possible

    239 for the optimizer to optimize the program and debugging information without
    240 necessarily having to know anything about debugging information. In particular,
    241 the global constant merging pass automatically eliminates duplicated debugging
    242 information (often caused by header files), the global dead code elimination
    243 pass automatically deletes debugging information for a function if it decides to
    244 delete the function, and the linker eliminates debug information when it merges
    245 linkonce functions.

    241 for the optimizer to optimize the program and debugging information without
    242 necessarily having to know anything about debugging information. In
    243 particular, the global constant merging pass automatically eliminates
    244 duplicated debugging information (often caused by header files), the global
    245 dead code elimination pass automatically deletes debugging information for a
    246 function if it decides to delete the function, and the linker eliminates
    247 debug information when it merges linkonce functions.


    To do this, most of the debugging information (descriptors for types,

    248 variables, functions, source files, etc) is inserted by the language front-end
    249 in the form of LLVM global variables. These LLVM global variables are no
    250 different from any other global variables, except that they have a web of LLVM
    251 intrinsic functions that point to them. If the last references to a particular
    252 piece of debugging information are deleted (for example, by the
    253 -globaldce pass), the extraneous debug information will automatically
    254 become dead and be removed by the optimizer.

    250 variables, functions, source files, etc) is inserted by the language
    251 front-end in the form of LLVM global variables. These LLVM global variables
    252 are no different from any other global variables, except that they have a web
    253 of LLVM intrinsic functions that point to them. If the last references to a
    254 particular piece of debugging information are deleted (for example, by the
    255 -globaldce pass), the extraneous debug information will
    256 automatically become dead and be removed by the optimizer.


    Debug information is designed to be agnostic about the target debugger and

    257 debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic
    258 machine debug information pass to decode the information that represents
    259 variables, types, functions, namespaces, etc: this allows for arbitrary
    260 source-language semantics and type-systems to be used, as long as there is a
    261 module written for the target debugger to interpret the information. In
    262 addition, debug global variables are declared in the "llvm.metadata"
    263 section. All values declared in this section are stripped away after target
    264 debug information is constructed and before the program object is emitted.

    259 debugging information representation (e.g. DWARF/Stabs/etc). It uses a
    260 generic machine debug information pass to decode the information that
    261 represents variables, types, functions, namespaces, etc: this allows for
    262 arbitrary source-language semantics and type-systems to be used, as long as
    263 there is a module written for the target debugger to interpret the
    264 information. In addition, debug global variables are declared in
    265 the "llvm.metadata" section. All values declared in this section
    266 are stripped away after target debug information is constructed and before
    267 the program object is emitted.


    To provide basic functionality, the LLVM debugger does have to make some

    267 assumptions about the source-level language being debugged, though it keeps
    268 these to a minimum. The only common features that the LLVM debugger assumes
    269 exist are source files, and
    270 program objects. These abstract objects are
    271 used by a debugger to form stack traces, show information about local
    272 variables, etc.

    270 assumptions about the source-level language being debugged, though it keeps
    271 these to a minimum. The only common features that the LLVM debugger assumes
    272 exist are source files,
    273 and program objects. These abstract
    274 objects are used by a debugger to form stack traces, show information about
    275 local variables, etc.


    This section of the documentation first describes the representation aspects

    275 common to any source-language. The next section
    276 describes the data layout conventions used by the C and C++ front-ends.
    278 common to any source-language. The next section
    279 describes the data layout conventions used by the C and C++ front-ends.


    In consideration of the complexity and volume of debug information, LLVM

    287 provides a specification for well formed debug global variables. The constant
    288 value of each of these globals is one of a limited set of structures, known as
    289 debug descriptors.

    291 provides a specification for well formed debug global variables. The
    292 constant value of each of these globals is one of a limited set of
    293 structures, known as debug descriptors.


    Consumers of LLVM debug information expect the descriptors for program

    292 objects to start in a canonical format, but the descriptors can include
    293 additional information appended at the end that is source-language specific. All
    294 LLVM debugging information is versioned, allowing backwards compatibility in the
    295 case that the core structures need to change in some way. Also, all debugging
    296 information objects start with a tag to indicate what type of object it is. The
    297 source-language is allowed to define its own objects, by using unreserved tag
    298 numbers. We recommend using tags in the range 0x1000 through 0x2000 (there is
    299 a defined enum DW_TAG_user_base = 0x1000).

    296 objects to start in a canonical format, but the descriptors can include
    297 additional information appended at the end that is source-language
    298 specific. All LLVM debugging information is versioned, allowing backwards
    299 compatibility in the case that the core structures need to change in some
    300 way. Also, all debugging information objects start with a tag to indicate
    301 what type of object it is. The source-language is allowed to define its own
    302 objects, by using unreserved tag numbers. We recommend using tags in
    303 the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base =
    304 0x1000).
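    The recommended range for source-language-defined tags can be checked with a one-line predicate. The sketch below is only an illustration: the name USER_TAG_LIMIT and the helper function are hypothetical, and the upper bound is treated as inclusive on the assumption that "0x1000 thru 0x2000" includes both ends.

```python
DW_TAG_USER_BASE = 0x1000  # value of DW_TAG_user_base given in the text
USER_TAG_LIMIT = 0x2000    # hypothetical name for the upper end of the range


def is_recommended_user_tag(tag):
    """Return True if tag lies in the range recommended for language-defined objects."""
    # Assumption: both ends of the range are inclusive.
    return DW_TAG_USER_BASE <= tag <= USER_TAG_LIMIT


print(is_recommended_user_tag(0x1001))  # a tag a front-end might pick
print(is_recommended_user_tag(17))      # DW_TAG_compile_unit, a reserved tag
```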


    The fields of debug descriptors used internally by LLVM (MachineModuleInfo)

    302 are restricted to only the simple data types int, uint,
    303 bool, float, double, i8* and { }*.
    304 References to arbitrary values are handled using a { }* and a
    305 cast to { }* expression; typically references to other field
    306 descriptors, arrays of descriptors or global variables.

    309 %llvm.dbg.object.type = type {
    310 uint, ;; A tag
    311 ...
    312 }
    313
    307 are restricted to only the simple data types int, uint,
    308 bool, float, double, i8* and
    309 { }*. References to arbitrary values are handled using a
    310 { }* and a cast to { }* expression; typically
    311 references to other field descriptors, arrays of descriptors or global
    312 variables.

    316 %llvm.dbg.object.type = type {
    317 uint, ;; A tag
    318 ...
    319 }

    The first field of a descriptor is always an

    316 uint containing a tag value identifying the content of the descriptor.
    317 The remaining fields are specific to the descriptor. The values of tags are
    318 loosely bound to the tag values of Dwarf information entries. However, that
    319 does not restrict the use of the information supplied to Dwarf targets. To
    320 facilitate versioning of debug information, the tag is augmented with the
    321 current debug version (LLVMDebugVersion = 4 << 16 or 0x40000 or 262144.)

    324 uint containing a tag value identifying the content of the
    325 descriptor. The remaining fields are specific to the descriptor. The values
    326 of tags are loosely bound to the tag values of DWARF information entries.
    327 However, that does not restrict the use of the information supplied to DWARF
    328 targets. To facilitate versioning of debug information, the tag is augmented
    329 with the current debug version (LLVMDebugVersion = 4 << 16 or 0x40000 or
    330 262144.)
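    The arithmetic behind a versioned tag can be worked through in a few lines. This is a sketch rather than LLVM's own code: the constant 4 << 16 and the tag number 17 come from the text, while the helper names and the 16-bit mask used to split the value again are assumptions made for illustration.

```python
# Value stated in the text: LLVMDebugVersion = 4 << 16 (0x40000, 262144).
LLVM_DEBUG_VERSION = 4 << 16

# Assumption: the low 16 bits carry the tag and the bits above carry the version.
TAG_MASK = 0xFFFF

DW_TAG_COMPILE_UNIT = 17  # tag number used by the compile unit descriptor


def make_versioned_tag(tag):
    """Combine a tag with the debug version, as in 'Tag = 17 + LLVMDebugVersion'."""
    return tag + LLVM_DEBUG_VERSION


def split_versioned_tag(value):
    """Recover (tag, version) from a versioned tag value."""
    return value & TAG_MASK, value & ~TAG_MASK


versioned = make_versioned_tag(DW_TAG_COMPILE_UNIT)
print(hex(versioned))                  # 0x40011
print(split_versioned_tag(versioned))  # (17, 262144)
```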


    The details of the various descriptors follow.

    335 %llvm.dbg.anchor.type = type {
    336 uint, ;; Tag = 0 + LLVMDebugVersion
    337 uint ;; Tag of descriptors grouped by the anchor
    338 }
    339 </pre>
    343 <div class="doc_code">
    345 %llvm.dbg.anchor.type = type {
    346 i32, ;; Tag = 0 + LLVMDebugVersion
    347 i32 ;; Tag of descriptors grouped by the anchor
    348 }

    One important aspect of the LLVM debug representation is that it allows the

    342 LLVM debugger to efficiently index all of the global objects without having to
    343 scan the program. To do this, all of the global objects use "anchor"
    344 descriptors with designated names. All of the global objects of a particular
    345 type (e.g., compile units) contain a pointer to the anchor. This pointer allows
    346 a debugger to use def-use chains to find all global objects of that type.

    353 LLVM debugger to efficiently index all of the global objects without having
    354 to scan the program. To do this, all of the global objects use "anchor"
    355 descriptors with designated names. All of the global objects of a particular
    356 type (e.g., compile units) contain a pointer to the anchor. This pointer
    357 allows a debugger to use def-use chains to find all global objects of that
    358 type.
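    The idea of finding descriptors through an anchor's def-use chain can be modelled with a toy data structure. The sketch below is not LLVM's API: the Global class and the descriptor names (cu.a, cu.b, subprogram.main) are hypothetical, and only the anchor names come from the text. Each global records its users when a reference to it is created, so listing the descriptors of one kind needs no scan of the program.

```python
class Global:
    """Toy stand-in for an LLVM global that tracks who refers to it."""

    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)
        self.users = []  # def-use chain: globals that reference this one
        for operand in self.operands:
            operand.users.append(self)


# Anchors with the designated names from the text.
compile_units = Global("llvm.dbg.compile_units")
subprograms = Global("llvm.dbg.subprograms")

# Hypothetical descriptors; each one holds a pointer to its anchor.
cu_a = Global("cu.a", [compile_units])
cu_b = Global("cu.b", [compile_units])
main_fn = Global("subprogram.main", [subprograms, cu_a])


def descriptors_of(anchor):
    """List every descriptor of one kind by walking the anchor's users."""
    return [user.name for user in anchor.users]


print(descriptors_of(compile_units))  # ['cu.a', 'cu.b']
print(descriptors_of(subprograms))    # ['subprogram.main']
```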


    The following names are recognized as anchors by LLVM:

    351 %llvm.dbg.compile_units = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 17 } ;; DW_TAG_compile_unit
    352 %llvm.dbg.global_variables = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 52 } ;; DW_TAG_variable
    353 %llvm.dbg.subprograms = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 46 } ;; DW_TAG_subprogram
    354 </pre>
    362 <div class="doc_code">
    364 %llvm.dbg.compile_units = linkonce constant %llvm.dbg.anchor.type {
    365 i32 0,
    366 i32 17
    367 } ;; DW_TAG_compile_unit
    368 %llvm.dbg.global_variables = linkonce constant %llvm.dbg.anchor.type {
    369 i32 0,
    370 i32 52
    371 } ;; DW_TAG_variable
    372 %llvm.dbg.subprograms = linkonce constant %llvm.dbg.anchor.type {
    373 i32 0,
    374 i32 46
    375 } ;; DW_TAG_subprogram

    Using anchors in this way (where the compile unit descriptor points to the

    357 anchors, as opposed to having a list of compile unit descriptors) allows for the
    358 standard dead global elimination and merging passes to automatically remove
    359 unused debugging information. If the globals were kept track of through lists,
    360 there would always be an object pointing to the descriptors, which would thus never be
    361 deleted.

    380 anchors, as opposed to having a list of compile unit descriptors) allows for
    381 the standard dead global elimination and merging passes to automatically
    382 remove unused debugging information. If the globals were kept track of
    383 through lists, there would always be an object pointing to the descriptors,
    384 which would thus never be deleted.
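    Why anchors let dead global elimination do its job, while a list would not, can be seen in a small reachability model. This is a simplified sketch, not the actual -globaldce pass: the function live_globals and the names in both reference tables are hypothetical, and it assumes a list of descriptors would itself have to be kept alive for a debugger to find them.

```python
def live_globals(references, roots):
    """Return the globals reachable from roots (a toy dead global elimination)."""
    live, worklist = set(), list(roots)
    while worklist:
        current = worklist.pop()
        if current in live:
            continue
        live.add(current)
        worklist.extend(references.get(current, ()))
    return live


# Anchor scheme: descriptors point at the anchor, code points at its descriptor.
anchored = {
    "main": ["sp.main"],
    "sp.main": ["anchor.subprograms"],
    "sp.unused": ["anchor.subprograms"],  # nothing refers to this descriptor
}

# List scheme: a list global points at every descriptor and must stay alive.
listed = {
    "main": ["sp.main"],
    "sp.list": ["sp.main", "sp.unused"],
}

print(sorted(live_globals(anchored, ["main"])))
# ['anchor.subprograms', 'main', 'sp.main']
print(sorted(live_globals(listed, ["main", "sp.list"])))
# ['main', 'sp.list', 'sp.main', 'sp.unused']
```

    In the anchored table the unused descriptor is never reached and can be dropped; in the listed table the list keeps it reachable.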

    373 %llvm.dbg.compile_unit.type = type {
    374 uint, ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
    375 { }*, ;; Compile unit anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*)
    376 uint, ;; Dwarf language identifier (ex. DW_LANG_C89)
    377 i8*, ;; Source file name
    378 i8*, ;; Source file directory (includes trailing slash)
    379 i8*, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    380 bool ;; True if this is a main compile unit.
    381 }

    These descriptors contain a source language ID for the file (we use the Dwarf

    385 3.0 ID numbers, such as DW_LANG_C89, DW_LANG_C_plus_plus,
    386 DW_LANG_Cobol74, etc), three strings describing the filename, working
    387 directory of the compiler, and an identifier string for the compiler that
    388 produced it.


    Compile unit descriptors provide the root context for objects declared in a

    391 specific source file. Global variables and top level functions would be defined
    392 using this context. Compile unit descriptors also provide context for source
    393 line correspondence.


    Each input file is encoded as a separate compile unit in LLVM debugging

    396 information output. However, many target specific tool chains prefer to encode
    397 only one compile unit in an object file. In this situation, the LLVM code
    398 generator will include debugging information entities in the compile unit
    399 that is marked as main compile unit. The code generator accepts at most one main
    400 compile unit per module. If a module does not contain any main compile unit
    401 then the code generator will emit multiple compile units in the output object
    402 file.
    397 %llvm.dbg.compile_unit.type = type {
    398 i32, ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
    399 { }*, ;; Compile unit anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*)
    400 i32, ;; DWARF language identifier (ex. DW_LANG_C89)
    401 i8*, ;; Source file name
    402 i8*, ;; Source file directory (includes trailing slash)
    403 i8*, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    404 i1, ;; True if this is a main compile unit.
    405 i1, ;; True if this is optimized.
    406 i8*, ;; Flags
    407 i32 ;; Runtime version
    408 }

    These descriptors contain a source language ID for the file (we use the DWARF

    413 3.0 ID numbers, such as DW_LANG_C89, DW_LANG_C_plus_plus,
    414 DW_LANG_Cobol74, etc), three strings describing the filename,
    415 working directory of the compiler, and an identifier string for the compiler
    416 that produced it.


    Compile unit descriptors provide the root context for objects declared in a

    419 specific source file. Global variables and top level functions would be
    420 defined using this context. Compile unit descriptors also provide context
    421 for source line correspondence.


    Each input file is encoded as a separate compile unit in LLVM debugging

    424 information output. However, many target specific tool chains prefer to
    425 encode only one compile unit in an object file. In this situation, the LLVM
    426 code generator will include debugging information entities in the compile
    427 unit that is marked as main compile unit. The code generator accepts at most
    428 one main compile unit per module. If a module does not contain any main
    429 compile unit then the code generator will emit multiple compile units in the
    430 output object file.
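    The rule for main compile units can be paraphrased as a small decision function. This is only a sketch of the behaviour described above, not the code generator's implementation: the function name and the (name, is_main) pair layout are hypothetical, and raising an error for more than one main unit is an assumption, since the text only says that at most one is accepted.

```python
def compile_units_to_emit(units):
    """Pick the compile units to emit from a list of (name, is_main) pairs."""
    mains = [name for name, is_main in units if is_main]
    if len(mains) > 1:
        # Assumption: more than one main compile unit is treated as an error.
        raise ValueError("at most one main compile unit is accepted per module")
    if mains:
        # Debug information entities are folded into the single main unit.
        return mains
    # No main unit: every compile unit is emitted separately.
    return [name for name, _ in units]


print(compile_units_to_emit([("a.c", True), ("a.h", False)]))   # ['a.c']
print(compile_units_to_emit([("a.c", False), ("b.c", False)]))  # ['a.c', 'b.c']
```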

    413 %llvm.dbg.global_variable.type = type {
    414 uint, ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
    415 { }*, ;; Global variable anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.global_variables to { }*),
    416 { }*, ;; Reference to context descriptor
    417 i8*, ;; Name
    418 i8*, ;; Display name (fully qualified C++ name)
    419 i8*, ;; MIPS linkage name (for C++)
    420 { }*, ;; Reference to compile unit where defined
    421 uint, ;; Line number where defined
    422 { }*, ;; Reference to type descriptor
    423 bool, ;; True if the global is local to compile unit (static)
    424 bool, ;; True if the global is defined in the compile unit (not extern)
    425 { }* ;; Reference to the global variable
    426 }
    427 </pre>
    441 <div class="doc_code">
    443 %llvm.dbg.global_variable.type = type {
    444 i32, ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
    445 { }*, ;; Global variable anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.global_variables to { }*),
    446 { }*, ;; Reference to context descriptor
    447 i8*, ;; Name
    448 i8*, ;; Display name (fully qualified C++ name)
    449 i8*, ;; MIPS linkage name (for C++)
    450 { }*, ;; Reference to compile unit where defined
    451 i32, ;; Line number where defined
    452 { }*, ;; Reference to type descriptor
    453 i1, ;; True if the global is local to compile unit (static)
    454 i1, ;; True if the global is defined in the compile unit (not extern)
    455 { }* ;; Reference to the global variable
    456 }

    These descriptors provide debug information about global variables. They provide details such as name, type and where the variable is defined.

    442 %llvm.dbg.subprogram.type = type {
    443 uint, ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram)
    444 { }*, ;; Subprogram anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.subprograms to { }*),
    445 { }*, ;; Reference to context descriptor
    446 i8*, ;; Name
    447 i8*, ;; Display name (fully qualified C++ name)
    448 i8*, ;; MIPS linkage name (for C++)
    449 { }*, ;; Reference to compile unit where defined
    450 uint, ;; Line number where defined
    451 { }*, ;; Reference to type descriptor
    452 bool, ;; True if the global is local to compile unit (static)
    453 bool ;; True if the global is defined in the compile unit (not extern)
    454 }
    455 </pre>
    472 <div class="doc_code">
    474 %llvm.dbg.subprogram.type = type {
    475 i32, ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram)
    476 { }*, ;; Subprogram anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.subprograms to { }*),
    477 { }*, ;; Reference to context descriptor
    478 i8*, ;; Name
    479 i8*, ;; Display name (fully qualified C++ name)
    480 i8*, ;; MIPS linkage name (for C++)
    481 { }*, ;; Reference to compile unit where defined
    482 i32, ;; Line number where defined
    483 { }*, ;; Reference to type descriptor
    484 i1, ;; True if the global is local to compile unit (static)
    485 i1 ;; True if the global is defined in the compile unit (not extern)
    486 }
    487
    488
    456489
    457490

    These descriptors provide debug information about functions, methods and

    458 subprograms. They provide details such as name, return types and the source
    459 location where the subprogram is defined.

    460
    461
    491 subprograms. They provide details such as name, return types and the source
    492 location where the subprogram is defined.

    493
    494
    495
    462496
    463497
    464498 Block descriptors
    466500
    467501
    468502
    469
    
                      
                    
    470 %llvm.dbg.block = type {
    471 i32, ;; Tag = 13 + LLVMDebugVersion (DW_TAG_lexical_block)
    472 { }* ;; Reference to context descriptor
    473 }
    474 </pre>
    503 <div class="doc_code">
    504
    
                      
                    
    505 %llvm.dbg.block = type {
    506 i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    507 { }* ;; Reference to context descriptor
    508 }
    509
    510
    475511
    476512

    These descriptors provide debug information about nested blocks within a

    477 subprogram. The array of member descriptors is used to define local variables
    478 and deeper nested blocks.

    513 subprogram. The array of member descriptors is used to define local
    514 variables and deeper nested blocks.

    479515
    480516
    481517
    486522
    487523
    488524
    489
    
                      
                    
    490 %llvm.dbg.basictype.type = type {
    491 uint, ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type)
    492 { }*, ;; Reference to context (typically a compile unit)
    493 i8*, ;; Name (may be "" for anonymous types)
    494 { }*, ;; Reference to compile unit where defined (may be NULL)
    495 uint, ;; Line number where defined (may be 0)
    496 i64, ;; Size in bits
    497 i64, ;; Alignment in bits
    498 uint, ;; Offset in bits
    499 uint ;; Dwarf type encoding
    500 }
    501 </pre>
    525 <div class="doc_code">
    526
    
                      
                    
    527 %llvm.dbg.basictype.type = type {
    528 i32, ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type)
    529 { }*, ;; Reference to context (typically a compile unit)
    530 i8*, ;; Name (may be "" for anonymous types)
    531 { }*, ;; Reference to compile unit where defined (may be NULL)
    532 i32, ;; Line number where defined (may be 0)
    533 i64, ;; Size in bits
    534 i64, ;; Alignment in bits
    535 i64, ;; Offset in bits
    536 i32, ;; Flags
    537 i32 ;; DWARF type encoding
    538 }
    539
    540
    502541
    503542

    These descriptors define primitive types used in the code, for example int, bool

    504 and float. The context provides the scope of the type, which is usually the top
    505 level. Since basic types are not usually user defined the compile unit and line
    506 number can be left as NULL and 0. The size, alignment and offset are expressed
    507 in bits and can be 64 bit values. The alignment is used to round the offset
    508 when embedded in a composite type
    509 (example to keep float doubles on 64 bit boundaries.) The offset is the bit
    510 offset if embedded in a composite
    511 type.

    543 and float. The context provides the scope of the type, which is usually the
    544 top level. Since basic types are not usually user defined, the compile unit
    545 and line number can be left as NULL and 0. The size, alignment and offset
    546 are expressed in bits and can be 64 bit values. The alignment is used to
    547 round the offset when embedded in a
    548 composite type (for example, to keep
    549 doubles on 64 bit boundaries). The offset is the bit offset if embedded in
    550 a composite type.
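The rounding described above can be sketched in C as follows. The function name is hypothetical, and treating an alignment of 0 as "no constraint" is an assumption made only for this sketch.

```c
#include <stdint.h>

/* Round a bit offset up to the next multiple of align_bits.
   An alignment of 0 is treated as "no constraint" (an assumption). */
static uint64_t round_up_bits(uint64_t offset_bits, uint64_t align_bits) {
    if (align_bits == 0)
        return offset_bits;
    uint64_t rem = offset_bits % align_bits;
    return rem == 0 ? offset_bits : offset_bits + (align_bits - rem);
}
```

For a struct holding a 32 bit int followed by a double with 64 bit alignment, `round_up_bits(32, 64)` places the double at bit offset 64.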

    512551
    513552

    The type encoding provides the details of the type. The values are typically

    514 one of the following:

    515
    516
    
                      
                    
    517 DW_ATE_address = 1
    518 DW_ATE_boolean = 2
    519 DW_ATE_float = 4
    520 DW_ATE_signed = 5
    521 DW_ATE_signed_char = 6
    522 DW_ATE_unsigned = 7
    523 DW_ATE_unsigned_char = 8
    524 >
    553 one of the following:
    554
    555
    556
    
                      
                    
    557 DW_ATE_address = 1
    558 DW_ATE_boolean = 2
    559 DW_ATE_float = 4
    560 DW_ATE_signed = 5
    561 DW_ATE_signed_char = 6
    562 DW_ATE_unsigned = 7
    563 DW_ATE_unsigned_char = 8
    564
    565
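As an illustration, the encoding values listed above can be written as a C enumeration. The lookup helper is hypothetical and exists only to show how a tool might print an encoding.

```c
#include <stddef.h>

/* Encoding values as listed above. */
enum dw_ate {
    DW_ATE_address       = 1,
    DW_ATE_boolean       = 2,
    DW_ATE_float         = 4,
    DW_ATE_signed        = 5,
    DW_ATE_signed_char   = 6,
    DW_ATE_unsigned      = 7,
    DW_ATE_unsigned_char = 8
};

/* Hypothetical helper: printable name for an encoding, or NULL if unlisted. */
static const char *dw_ate_name(unsigned encoding) {
    switch (encoding) {
    case DW_ATE_address:       return "DW_ATE_address";
    case DW_ATE_boolean:       return "DW_ATE_boolean";
    case DW_ATE_float:         return "DW_ATE_float";
    case DW_ATE_signed:        return "DW_ATE_signed";
    case DW_ATE_signed_char:   return "DW_ATE_signed_char";
    case DW_ATE_unsigned:      return "DW_ATE_unsigned";
    case DW_ATE_unsigned_char: return "DW_ATE_unsigned_char";
    default:                   return NULL;
    }
}
```

A C int would typically be described with DW_ATE_signed, and float or double with DW_ATE_float.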
    525566
    526567
    527568
    532573
    533574
    534575
    535
    
                      
                    
    536 %llvm.dbg.derivedtype.type = type {
    537 uint, ;; Tag (see below)
    538 { }*, ;; Reference to context
    539 i8*, ;; Name (may be "" for anonymous types)
    540 { }*, ;; Reference to compile unit where defined (may be NULL)
    541 uint, ;; Line number where defined (may be 0)
    542 uint, ;; Size in bits
    543 uint, ;; Alignment in bits
    544 uint, ;; Offset in bits
    545 { }* ;; Reference to type derived from
    546 }
    547 </pre>
    576 <div class="doc_code">
    577
    
                      
                    
    578 %llvm.dbg.derivedtype.type = type {
    579 i32, ;; Tag (see below)
    580 { }*, ;; Reference to context
    581 i8*, ;; Name (may be "" for anonymous types)
    582 { }*, ;; Reference to compile unit where defined (may be NULL)
    583 i32, ;; Line number where defined (may be 0)
    584 i32, ;; Size in bits
    585 i32, ;; Alignment in bits
    586 i32, ;; Offset in bits
    587 { }* ;; Reference to type derived from
    588 }
    589
    590
    548591
    549592

    These descriptors are used to define types derived from other types. The

    550593 value of the tag varies depending on the meaning. The following are possible
    551594 tag values:

    552595
    553
    
                      
                    
    554 DW_TAG_formal_parameter = 5
    555 DW_TAG_member = 13
    556 DW_TAG_pointer_type = 15
    557 DW_TAG_reference_type = 16
    558 DW_TAG_typedef = 22
    559 DW_TAG_const_type = 38
    560 DW_TAG_volatile_type = 53
    561 DW_TAG_restrict_type = 55
    562
    563
    564

    DW_TAG_member is used to define a member of a

    565 composite type or
    566 subprogram. The type of the member is the
    567 derived type. DW_TAG_formal_parameter
    568 is used to define a member which is a formal argument of a subprogram.

    569
    570

    DW_TAG_typedef is used to

    571 provide a name for the derived type.

    572
    573

    DW_TAG_pointer_type,

    574 DW_TAG_reference_type, DW_TAG_const_type,
    575 DW_TAG_volatile_type and DW_TAG_restrict_type are used to
    576 qualify the derived type.
    596
    >
    597
    
                      
                    
    598 DW_TAG_formal_parameter = 5
    599 DW_TAG_member = 13
    600 DW_TAG_pointer_type = 15
    601 DW_TAG_reference_type = 16
    602 DW_TAG_typedef = 22
    603 DW_TAG_const_type = 38
    604 DW_TAG_volatile_type = 53
    605 DW_TAG_restrict_type = 55
    606
    607
    608
    609

    DW_TAG_member is used to define a member of

    610 a composite type
    611 or subprogram. The type of the member is
    612 the derived
    613 type. DW_TAG_formal_parameter is used to define a member which
    614 is a formal argument of a subprogram.

    615
    616

    DW_TAG_typedef is used to provide a name for the derived type.

    617
    618

    DW_TAG_pointer_type, DW_TAG_reference_type,

    619 DW_TAG_const_type, DW_TAG_volatile_type
    620 and DW_TAG_restrict_type are used to qualify
    621 the derived type.

    577622
    578623

    Derived type location can be determined

    579 from the compile unit and line number. The size, alignment and offset are
    580 expressed in bits and can be 64 bit values. The alignment is used to round the
    581 offset when embedded in a composite type
    582 (example to keep float doubles on 64 bit boundaries.) The offset is the bit
    583 offset if embedded in a composite
    584 type.

    624 from the compile unit and line number. The size, alignment and offset are
    625 expressed in bits and can be 64 bit values. The alignment is used to round
    626 the offset when embedded in a composite
    627 type (for example, to keep doubles on 64 bit boundaries). The offset is
    628 the bit offset if embedded in a composite
    629 type.

    585630
    586631

    Note that the void * type is expressed as a

    587 llvm.dbg.derivedtype.type with tag of DW_TAG_pointer_type and
    588 NULL derived type.
    632 llvm.dbg.derivedtype.type with tag of DW_TAG_pointer_type
    633 and NULL derived type.

    589634
    590635
    591636
    596641
    597642
    598643
    599
    
                      
                    
    600 %llvm.dbg.compositetype.type = type {
    601 uint, ;; Tag (see below)
    602 { }*, ;; Reference to context
    603 i8*, ;; Name (may be "" for anonymous types)
    604 { }*, ;; Reference to compile unit where defined (may be NULL)
    605 uint, ;; Line number where defined (may be 0)
    606 uint, ;; Size in bits
    607 uint, ;; Alignment in bits
    608 uint, ;; Offset in bits
    609 { }* ;; Reference to array of member descriptors
    610 }
    611 </pre>
    644 <div class="doc_code">
    645
    
                      
                    
    646 %llvm.dbg.compositetype.type = type {
    647 i32, ;; Tag (see below)
    648 { }*, ;; Reference to context
    649 i8*, ;; Name (may be "" for anonymous types)
    650 { }*, ;; Reference to compile unit where defined (may be NULL)
    651 i32, ;; Line number where defined (may be 0)
    652 i64, ;; Size in bits
    653 i64, ;; Alignment in bits
    654 i64, ;; Offset in bits
    655 i32, ;; Flags
    656 { }*, ;; Reference to type derived from
    657 { }*, ;; Reference to array of member descriptors
    658 i32 ;; Runtime languages
    659 }
    660
    661
    612662
    613663

    These descriptors are used to define types that are composed of 0 or more

    614664 elements. The value of the tag varies depending on the meaning. The following
    615665 are possible tag values:

    616666
    617
    
                      
                    
    618 DW_TAG_array_type = 1
    619 DW_TAG_enumeration_type = 4
    620 DW_TAG_structure_type = 19
    621 DW_TAG_union_type = 23
    622 DW_TAG_vector_type = 259
    623 DW_TAG_subroutine_type = 46
    624 DW_TAG_inheritance = 26
    625 </pre>
    667 <div class="doc_code">
    668
    
                      
                    
    669 DW_TAG_array_type = 1
    670 DW_TAG_enumeration_type = 4
    671 DW_TAG_structure_type = 19
    672 DW_TAG_union_type = 23
    673 DW_TAG_vector_type = 259
    674 DW_TAG_subroutine_type = 21
    675 DW_TAG_inheritance = 28
    676
    677
    626678
    627679

    The vector flag indicates that an array type is a native packed vector.

    628680
    629681

    The members of array types (tag = DW_TAG_array_type) or vector types

    630 (tag = DW_TAG_vector_type) are subrange
    631 descriptors, each representing the range of subscripts at that level of
    632 indexing.

    682 (tag = DW_TAG_vector_type) are subrange
    683 descriptors, each representing the range of subscripts at that level of
    684 indexing.

    633685
    634686

    The members of enumeration types (tag = DW_TAG_enumeration_type) are

    635 enumerator descriptors, each representing the
    636 definition of enumeration value
    637 for the set.

    687 enumerator descriptors, each representing
    688 the definition of an enumeration value for the set.

    638689
    639690

    The members of structure (tag = DW_TAG_structure_type) or union (tag

    640 = DW_TAG_union_type) types are any one of the
    641 basic, derived
    642 or composite type descriptors, each
    643 representing a field member of the structure or union.

    691 = DW_TAG_union_type) types are any one of
    692 the basic,
    693 derived
    694 or composite type descriptors, each
    695 representing a field member of the structure or union.

    644696
    645697

    For C++ classes (tag = DW_TAG_structure_type), member descriptors

    646 provide information about base classes, static members and member functions. If
    647 a member is a derived type descriptor and has
    648 a tag of DW_TAG_inheritance, then the type represents a base class. If
    649 the member of is a global variable
    650 descriptor then it represents a static member. And, if the member is a
    651 subprogram descriptor then it represents a member
    652 function. For static members and member functions, getName() returns
    653 the members link or the C++ mangled name. getDisplayName() the
    654 simplied version of the name.

    655
    656

    The first member of subroutine (tag = DW_TAG_subroutine_type)

    657 type elements is the return type for the subroutine. The remaining
    658 elements are the formal arguments to the subroutine.

    698 provide information about base classes, static members and member
    699 functions. If a member is a derived type
    700 descriptor and has a tag of DW_TAG_inheritance, then the type
    701 represents a base class. If the member is
    702 a global variable descriptor then it
    703 represents a static member. And, if the member is
    704 a subprogram descriptor then it represents
    705 a member function. For static members and member
    706 functions, getName() returns the member's linkage name or the C++ mangled
    707 name. getDisplayName() returns the simplified version of the name.

    708
    709

    The first member of subroutine (tag = DW_TAG_subroutine_type) type

    710 elements is the return type for the subroutine. The remaining elements are
    711 the formal arguments to the subroutine.

    659712
    660713

    Composite type location can be

    661 determined from the compile unit and line number. The size, alignment and
    662 offset are expressed in bits and can be 64 bit values. The alignment is used to
    663 round the offset when embedded in a composite
    664 type (as an example, to keep float doubles on 64 bit boundaries.) The offset
    665 is the bit offset if embedded in a composite
    666 type.

    714 determined from the compile unit and line number. The size, alignment and
    715 offset are expressed in bits and can be 64 bit values. The alignment is used
    716 to round the offset when embedded in
    717 a composite type (as an example, to keep
    718 doubles on 64 bit boundaries). The offset is the bit offset if embedded
    719 in a composite type.

    667720
    668721
    669722
    674727
    675728
    676729
    677
    
                      
                    
    678 %llvm.dbg.subrange.type = type {
    679 uint, ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
    680 uint, ;; Low value
    681 uint ;; High value
    682 }
    683 </pre>
    730 <div class="doc_code">
    731
    
                      
                    
    732 %llvm.dbg.subrange.type = type {
    733 i32, ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
    734 i64, ;; Low value
    735 i64 ;; High value
    736 }
    737
    738
    684739
    685740

    These descriptors are used to define ranges of array subscripts for an array

    686 composite type. The low value defines the
    687 lower bounds typically zero for C/C++. The high value is the upper bounds.
    688 Values are 64 bit. High - low + 1 is the size of the array. If
    689 low == high the array will be unbounded.

    741 composite type. The low value defines
    742 the lower bounds typically zero for C/C++. The high value is the upper
    743 bounds. Values are 64 bit. High - low + 1 is the size of the array. If low
    744 == high the array will be unbounded.
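The size rule above can be sketched in C. The function name and the choice of -1 as the "unbounded" marker are assumptions for illustration; the low == high case follows the text above.

```c
#include <stdint.h>

/* Element count for a subrange with inclusive bounds.
   Returns -1 when low == high, which the text above treats as unbounded. */
static int64_t subrange_count(int64_t low, int64_t high) {
    if (low == high)
        return -1;
    return high - low + 1;
}
```

For a C declaration such as `int a[10]`, the subrange would have low 0 and high 9, and `subrange_count(0, 9)` evaluates to 10.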

    690745
    691746
    692747
    697752
    698753
    699754
    700
    
                      
                    
    701 %llvm.dbg.enumerator.type = type {
    702 uint, ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
    703 i8*, ;; Name
    704 uint ;; Value
    705 }
    706
    707
    708

    These descriptors are used to define members of an enumeration

    709 composite type, it associates the name to the
    710 value.
    755
    >
    756
    
                      
                    
    757 %llvm.dbg.enumerator.type = type {
    758 i32, ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
    759 i8*, ;; Name
    760 i64 ;; Value
    761 }
    762
    763
    764
    765

    These descriptors are used to define members of an

    766 enumeration composite type; each
    767 associates a name with its value.

    711768
    712769
    713770
    717774
    718775
    719776
    720
    
                      
                    
    721 %llvm.dbg.variable.type = type {
    722 uint, ;; Tag (see below)
    723 { }*, ;; Context
    724 i8*, ;; Name
    725 { }*, ;; Reference to compile unit where defined
    726 uint, ;; Line number where defined
    727 { }* ;; Type descriptor
    728 }
    729
    777
    778
    779
    
                      
                    
    780 %llvm.dbg.variable.type = type {
    781 i32, ;; Tag (see below)
    782 { }*, ;; Context
    783 i8*, ;; Name
    784 { }*, ;; Reference to compile unit where defined
    785 i32, ;; Line number where defined
    786 { }* ;; Type descriptor
    787 }
    788
    789
    730790
    731791

    These descriptors are used to define variables local to a subprogram. The

    732 value of the tag depends on the usage of the variable:

    733
    734
    
                      
                    
    735 DW_TAG_auto_variable = 256
    736 DW_TAG_arg_variable = 257
    737 DW_TAG_return_variable = 258
    738 >
    792 value of the tag depends on the usage of the variable:
    793
    794
    795
    
                      
                    
    796 DW_TAG_auto_variable = 256
    797 DW_TAG_arg_variable = 257
    798 DW_TAG_return_variable = 258
    799
    800
    739801
    740802

    An auto variable is any variable declared in the body of the function. An

    741 argument variable is any variable that appears as a formal argument to the
    742 function. A return variable is used to track the result of a function and has
    743 no source correspondent.

    803 argument variable is any variable that appears as a formal argument to the
    804 function. A return variable is used to track the result of a function and
    805 has no source correspondent.
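A small C function shows which source variables fall into each category. The function itself is a made-up example; the comments restate the tag values listed above.

```c
/* value and factor are formal arguments: DW_TAG_arg_variable (257). */
static int scale(int value, int factor) {
    /* product is declared in the body: DW_TAG_auto_variable (256). */
    int product = value * factor;
    /* The returned result has no source name; a return variable
       (DW_TAG_return_variable, 258) would track it. */
    return product;
}
```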

    744806
    745807

    The context is either the subprogram or block where the variable is defined.

    746 Name the source variable name. Compile unit and line indicate where the
    747 variable was defined. Type descriptor defines the declared type of the
    748 variable.

    808 Name is the source variable name. Compile unit and line indicate where the
    809 variable was defined. Type descriptor defines the declared type of the
    810 variable.

    749811
    750812
    751813
    757819
    758820
    759821

    LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to

    760 provide debug information at various points in generated code.

    822 provide debug information at various points in generated code.

    761823
    762824
    763825
    772834
    773835
    774836

    This intrinsic is used to provide correspondence between the source file and

    775 the generated code. The first argument is the line number (base 1), second
    776 argument is the column number (0 if unknown) and the third argument the source
    777 %llvm.dbg.compile_unit* cast to a
    778 { }*. Code following a call to this intrinsic will have been defined
    779 in close proximity of the line, column and file. This information holds until
    780 the next call to
    781 %llvm.dbg.stoppoint.

    837 the generated code. The first argument is the line number (base 1), second
    838 argument is the column number (0 if unknown) and the third argument the
    839 source %llvm.dbg.compile_unit*
    840 cast to a { }*. Code following a call to this intrinsic will
    841 have been defined in close proximity of the line, column and file. This
    842 information holds until the next call
    843 to %llvm.dbg.stoppoint.

    782844
    783845
    784846
    792854 void %llvm.dbg.func.start( { }* )
    793855
    794856
    795

    This intrinsic is used to link the debug information in

    796 %llvm.dbg.subprogram to the function. It
    797 defines the beginning of the function's declarative region (scope). It also
    798 implies a call to
    799 %llvm.dbg.stoppoint which defines a
    800 source line "stop point". The intrinsic should be called early in the function
    801 after all the alloca instructions. It should be paired off with a closing
    802 %llvm.dbg.region.end.
    803 The function's
    804 single argument is the
    805 %llvm.dbg.subprogram.type.

    857

    This intrinsic is used to link the debug information

    858 in %llvm.dbg.subprogram to the
    859 function. It defines the beginning of the function's declarative region
    860 (scope). It also implies a call to
    861 %llvm.dbg.stoppoint which
    862 defines a source line "stop point". The intrinsic should be called early in
    863 the function after all the alloca instructions. It should be paired off
    864 with a closing
    865 %llvm.dbg.region.end.
    866 The function's single argument is
    867 the %llvm.dbg.subprogram.type.

    806868
    807869
    808870
    817879
    818880
    819881

    This intrinsic is used to define the beginning of a declarative scope (ex.

    820 block) for local language elements. It should be paired off with a closing
    821 %llvm.dbg.region.end. The
    822 function's single argument is the
    823 %llvm.dbg.block which is starting.

    882 block) for local language elements. It should be paired off with a closing
    883 %llvm.dbg.region.end. The
    884 function's single argument is
    885 the %llvm.dbg.block which is
    886 starting.

    824887
    825888
    826889
    836899
    837900
    838901

    This intrinsic is used to define the end of a declarative scope (ex. block)

    839 for local language elements. It should be paired off with an opening
    840 %llvm.dbg.region.start or
    841 %llvm.dbg.func.start. The function's
    842 single argument is either the
    843 %llvm.dbg.block or the
    844 %llvm.dbg.subprogram.type which is
    845 ending.

    902 for local language elements. It should be paired off with an
    903 opening %llvm.dbg.region.start
    904 or %llvm.dbg.func.start.
    905 The function's single argument is either
    906 the %llvm.dbg.block or
    907 the %llvm.dbg.subprogram.type
    908 which is ending.

    846909
    847910
    848911
    857920
    858921
    859922

    This intrinsic provides information about a local element (ex. variable.) The

    860 first argument is the alloca for the variable, cast to a { }*. The
    861 second argument is the
    862 %llvm.dbg.variable containing the description
    863 of the variable, also cast to a { }*.

    923 first argument is the alloca for the variable, cast to a { }*. The
    924 second argument is
    925 the %llvm.dbg.variable containing
    926 the description of the variable, also cast to a { }*.

    864927
    865928
    866929
    874937
    875938
    876939

    LLVM debugger "stop points" are a key part of the debugging representation

    877 that allows LLVM to maintain simple semantics for
    878 debugging optimized code. The basic idea is that the
    879 front-end inserts calls to the
    880 %llvm.dbg.stoppoint intrinsic
    881 function at every point in the program where a debugger should be able to
    882 inspect the program (these correspond to places a debugger stops when you
    883 "step" through it). The front-end can choose to place these as
    884 fine-grained as it would like (for example, before every subexpression
    885 evaluated), but it is recommended to only put them after every source statement
    886 that includes executable code.

    940 that allows LLVM to maintain simple semantics
    941 for debugging optimized code. The basic idea is that
    942 the front-end inserts calls to
    943 the %llvm.dbg.stoppoint
    944 intrinsic function at every point in the program where a debugger should be
    945 able to inspect the program (these correspond to places a debugger stops when
    946 you "step" through it). The front-end can choose to place these as
    947 fine-grained as it would like (for example, before every subexpression
    948 evaluated), but it is recommended to only put them after every source
    949 statement that includes executable code.

    887950
    888951

    Using calls to this intrinsic function to demarcate legal points for the

    889 debugger to inspect the program automatically disables any optimizations that
    890 could potentially confuse debugging information. To non-debug-information-aware
    891 transformations, these calls simply look like calls to an external function,
    892 which they must assume to do anything (including reading or writing to any part
    893 of reachable memory). On the other hand, it does not impact many optimizations,
    894 such as code motion of non-trapping instructions, nor does it impact
    895 optimization of subexpressions, code duplication transformations, or basic-block
    896 reordering transformations.

    897
    898
    899
    952 debugger to inspect the program automatically disables any optimizations that
    953 could potentially confuse debugging information. To
    954 non-debug-information-aware transformations, these calls simply look like
    955 calls to an external function, which they must assume to do anything
    956 (including reading or writing to any part of reachable memory). On the other
    957 hand, it does not impact many optimizations, such as code motion of
    958 non-trapping instructions, nor does it impact optimization of subexpressions,
    959 code duplication transformations, or basic-block reordering
    960 transformations.

    961
    962
    900963
    901964
    902965
    905968
    906969
    907970

    In many languages, the local variables in functions can have their lifetime

    908 or scope limited to a subset of a function. In the C family of languages, for
    909 example, variables are only live (readable and writable) within the source block
    910 that they are defined in. In functional languages, values are only readable
    911 after they have been defined. Though this is a very obvious concept, it is also
    912 non-trivial to model in LLVM, because it has no notion of scoping in this sense,
    913 and does not want to be tied to a language's scoping rules.

    971 or scope limited to a subset of a function. In the C family of languages,
    972 for example, variables are only live (readable and writable) within the
    973 source block that they are defined in. In functional languages, values are
    974 only readable after they have been defined. Though this is a very obvious
    975 concept, it is also non-trivial to model in LLVM, because it has no notion of
    976 scoping in this sense, and does not want to be tied to a language's scoping
    977 rules.

    914978
    915979

    In order to handle this, the LLVM debug format uses the notion of "regions"

    916 of a function, delineated by calls to intrinsic functions. These intrinsic
    917 functions define new regions of the program and indicate when the region
    918 lifetime expires. Consider the following C fragment, for example:

    919
    980 of a function, delineated by calls to intrinsic functions. These intrinsic
    981 functions define new regions of the program and indicate when the region
    982 lifetime expires. Consider the following C fragment, for example:

    983
    984
    920985
    
                      
                    
    921986 1. void foo() {
    922987 2. int X = ...;
    928993 8. ...
    929994 9. }
    930995
    996
    931997
    932998

    Compiled to LLVM, this function would be represented like this:

    933999
    1000
    9341001
    
                      
                    
    9351002 void %foo() {
    9361003 entry:
    9401007
    9411008 ...
    9421009
    943 call void %llvm.dbg.func.start( %llvm.dbg.subprogram.type* %llvm.dbg.subprogram )
    1010 call void @llvm.dbg.func.start( %llvm.dbg.subprogram.type* @llvm.dbg.subprogram )
    9441011
    945 call void %llvm.dbg.stoppoint( uint 2, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit )
    1012 call void @llvm.dbg.stoppoint( i32 2, i32 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit )
    9461013
    947 call void %llvm.dbg.declare({}* %X, ...)
    948 call void %llvm.dbg.declare({}* %Y, ...)
    1014 call void @llvm.dbg.declare({}* %X, ...)
    1015 call void @llvm.dbg.declare({}* %Y, ...)
    9491016
    9501017 ;; Evaluate expression on line 2, assigning to X.
    9511018
    952 call void %llvm.dbg.stoppoint( uint 3, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit )
    1019 call void @llvm.dbg.stoppoint( i32 3, i32 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit )
    9531020
    9541021 ;; Evaluate expression on line 3, assigning to Y.
    9551022
    956 call void %llvm.region.start()
    957 call void %llvm.dbg.stoppoint( uint 5, uint 4, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit )
    958 call void %llvm.dbg.declare({}* %X, ...)
    1023 call void @llvm.region.start()
    1024 call void @llvm.dbg.stoppoint( i32 5, i32 4, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit )
    1025 call void @llvm.dbg.declare({}* %Z, ...)
    9591026
    9601027 ;; Evaluate expression on line 5, assigning to Z.
    9611028
    962 call void %llvm.dbg.stoppoint( uint 7, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit )
    963 call void %llvm.region.end()
    1029 call void @llvm.dbg.stoppoint( i32 7, i32 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit )
    1030 call void @llvm.region.end()
    9641031
    965 call void %llvm.dbg.stoppoint( uint 9, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit )
    1032 call void @llvm.dbg.stoppoint( i32 9, i32 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit )
    9661033
    967 call void %llvm.region.end()
    1034 call void @llvm.region.end()
    9681035
    9691036 ret void
    9701037 }
    9711038
    1039
    9721040
    9731041

    This example illustrates a few important details about the LLVM debugging

    974 information. In particular, it shows how the various intrinsics are applied
    975 together to allow a debugger to analyze the relationship between statements,
    976 variable definitions, and the code used to implement the function.

    977
    978

    The first intrinsic

    979 %llvm.dbg.func.start provides
    980 a link with the subprogram descriptor
    981 containing the details of this function. This call also defines the beginning
    982 of the function region, bounded by the
    983 %llvm.region.end at the end of
    984 the function. This region is used to bracket the lifetime of variables declared
    985 within. For a function, this outer region defines a new stack frame whose
    986 lifetime ends when the region is ended.

    1042 information. In particular, it shows how the various intrinsics are applied
    1043 together to allow a debugger to analyze the relationship between statements,
    1044 variable definitions, and the code used to implement the function.

    1045
    1046

    The first

    1047 intrinsic %llvm.dbg.func.start
    1048 provides a link with the subprogram
    1049 descriptor containing the details of this function. This call also
    1050 defines the beginning of the function region, bounded by
    1051 the %llvm.region.end at the
    1052 end of the function. This region is used to bracket the lifetime of
    1053 variables declared within. For a function, this outer region defines a new
    1054 stack frame whose lifetime ends when the region is ended.

    9871055
    9881056

    It is possible to define inner regions for short-term variables by using the

    989 %llvm.region.start and
    990 %llvm.region.end to bound a
    991 region. The inner region in this example would be for the block containing the
    992 declaration of Z.
    1057 %llvm.region.start
    1058 and %llvm.region.end to
    1059 bound a region. The inner region in this example would be for the block
    1060 containing the declaration of Z.

    9931061
    9941062

    Using regions to represent the boundaries of source-level functions allows

    995 LLVM interprocedural optimizations to arbitrarily modify LLVM functions without
    996 having to worry about breaking mapping information between the LLVM code and the
    997 and source-level program. In particular, the inliner requires no modification
    998 to support inlining with debugging information: there is no explicit correlation
    999 drawn between LLVM functions and their source-level counterparts (note however,
    1000 that if the inliner inlines all instances of a non-strong-linkage function into
    1001 its caller that it will not be possible for the user to manually invoke the
    1002 inlined function from a debugger).

    1003
    1004

    Once the function has been defined, the

    1005 stopping point corresponding to
    1006 line #2 (column #2) of the function is encountered. At this point in the
    1007 function, no local variables are live. As lines 2 and 3 of the example
    1008 are executed, their variable definitions are introduced into the program using
    1009 %llvm.dbg.declare, without the
    1010 need to specify a new region. These variables do not require new regions to be
    1011 introduced because they go out of scope at the same point in the program: line
    1012 9.

    1063 LLVM interprocedural optimizations to arbitrarily modify LLVM functions
    1064 without having to worry about breaking mapping information between the LLVM
    1065 code and the source-level program. In particular, the inliner requires
    1066 no modification to support inlining with debugging information: there is no
    1067 explicit correlation drawn between LLVM functions and their source-level
    1068 counterparts (note, however, that if the inliner inlines all instances of a
    1069 non-strong-linkage function into its callers, it will not be possible for
    1070 the user to manually invoke the inlined function from a debugger).

    1071
    1072

    Once the function has been defined,

    1073 the stopping point
    1074 corresponding to line #2 (column #2) of the function is encountered. At this
    1075 point in the function, no local variables are live. As lines 2 and 3
    1076 of the example are executed, their variable definitions are introduced into
    1077 the program using
    1078 %llvm.dbg.declare, without the
    1079 need to specify a new region. These variables do not require new regions to
    1080 be introduced because they go out of scope at the same point in the program:
    1081 line 9.

    10131082
    10141083

    In contrast, the Z variable goes out of scope at a different time,

    1015 on line 7. For this reason, it is defined within the inner region, which kills
    1016 the availability of Z before the code for line 8 is executed. In this
    1017 way, regions can support arbitrary source-language scoping rules, as long as
    1018 they can only be nested (ie, one scope cannot partially overlap with a part of
    1019 another scope).

    1084 on line 7. For this reason, it is defined within the inner region, which
    1085 kills the availability of Z before the code for line 8 is executed.
    1086 In this way, regions can support arbitrary source-language scoping rules, as
    1087 long as scopes are strictly nested (i.e., one scope cannot partially overlap
    1088 with a part of another scope).
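
    The nesting rule can be pictured as a stack of open regions. The following is a minimal C++ sketch of how a consumer of this information might track which variables are live; the ScopeTracker class and its method names are hypothetical and are not part of LLVM. It assumes that each region start pushes a region, each declare adds a name to the innermost open region, and each region end pops that region.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of region tracking; this is not an LLVM API.
class ScopeTracker {
public:
  // Models llvm.dbg.func.start / llvm.region.start: open a new region.
  void regionStart() { regions_.emplace_back(); }

  // Models llvm.region.end: close the innermost region and drop its variables.
  void regionEnd() {
    assert(!regions_.empty() && "region end without a matching start");
    regions_.pop_back();
  }

  // Models llvm.dbg.declare: the variable joins the innermost open region.
  void declare(const std::string &name) {
    assert(!regions_.empty() && "declare outside of any region");
    regions_.back().push_back(name);
  }

  // A variable is live if any currently open region declared it.
  bool isLive(const std::string &name) const {
    for (const auto &region : regions_)
      for (const auto &var : region)
        if (var == name)
          return true;
    return false;
  }

private:
  // Innermost region is at the back; strict nesting makes a stack sufficient.
  std::vector<std::vector<std::string>> regions_;
};
```

    Replaying the example with this sketch, X and Y are declared in the function region and Z in the inner region; after the inner region ends, Z is no longer live while X and Y remain live until the function region ends. Because a stack can only pop its innermost entry, partially overlapping scopes cannot be expressed, which matches the restriction described above.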

    10201089
    10211090

    It is worth noting that this scoping mechanism is used to control scoping of

    1022 all declarations, not just variable declarations. For example, the scope of a
    1023 C++ using declaration is controlled with this and could change how name lookup is
    1024 performed.

    1025
    1026
    1027
    1028
    1091 all declarations, not just variable declarations. For example, the scope of
    1092 a C++ using declaration is controlled by this mechanism and can change how
    1093 name lookup is performed.
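
    As a source-level illustration of why this matters, the following C++ sketch shows a block-scope using declaration changing which function an unqualified name resolves to. The namespace, function, and struct names are invented for this example.

```cpp
// Illustrative names only; none of these appear in LLVM.
namespace units {
inline int scale() { return 12; }
}  // namespace units

inline int scale() { return 100; }

struct LookupResult {
  int before;
  int inside;
  int after;
};

inline LookupResult lookupDemo() {
  LookupResult r;
  r.before = scale();    // resolves to the global ::scale
  {
    using units::scale;  // in scope only until the closing brace
    r.inside = scale();  // resolves to units::scale, which hides ::scale
  }
  r.after = scale();     // the using declaration is out of scope again
  return r;
}
```

    A debugger evaluating the expression scale() while stopped inside the inner block would need to know that the using declaration is in scope there in order to pick the same function the compiler did.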

    1094
    1095
    10291096
    10301097
    10311098
    10361103
    10371104
    10381105

    The C and C++ front-ends represent information about the program in a format

    1039 that is effectively identical to
    1040 Dwarf 3.0 (http://www.eagercon.com/dwarf/dwarf3std.htm) in terms of
    1041 information content. This allows code generators to trivially support native
    1042 debuggers by generating standard dwarf information, and contains enough
    1043 information for non-dwarf targets to translate it as needed.

    1106 that is effectively identical
    1107 to DWARF 3.0 in
    1108 terms of information content. This allows code generators to trivially
    1109 support native debuggers by generating standard DWARF information, and
    1110 contains enough information for non-DWARF targets to translate it as
    1111 needed.

    10441112
    10451113

    This section describes the forms used to represent C and C++ programs. Other

    1046 languages could pattern themselves after this (which itself is tuned to
    1047 representing programs in the same way that Dwarf 3 does), or they could choose
    1048 to provide completely different forms if they don't fit into the Dwarf model.
    1049 As support for debugging information gets added to the various LLVM
    1050 source-language front-ends, the information used should be documented here.

    1114 languages could pattern themselves after this (which itself is tuned to
    1115 representing programs in the same way that DWARF 3 does), or they could
    1116 choose to provide completely different forms if they don't fit into the DWARF
    1117 model. As support for debugging information gets added to the various LLVM
    1118 source-language front-ends, the information used should be documented
    1119 here.

    10511120
    10521121

    The following sections provide examples of various C/C++ constructs and the

    1053 debug information that would best describe those constructs.

    1122 debug information that would best describe those constructs.

    10541123
    10551124
    10561125
    10611130
    10621131
    10631132
    1064

    Given the source files "MySource.cpp" and "MyHeader.h" located in the

    1065 directory "/Users/mine/sources", the following code:

    1066
    1133

    Given the source files MySource.cpp and MyHeader.h located

    1134 in the directory /Users/mine/sources, the following code:

    1135
    1136
    10671137
    
                      
                    
    10681138 #include "MyHeader.h"
    10691139
    10711141 return 0;
    10721142 }
    10731143
    1144
    10741145
    10751146

    a C/C++ front-end would generate the following descriptors:

    10761147
    1148
    10771149
    
                      
                    
    10781150 ...
    10791151 ;;
    11231195 %str4 = internal constant [11 x i8] c"MyHeader.h\00", section "llvm.metadata";
    11241196 ...
    11251197
    1198
    11261199
    11271200
    11281201
    11351208
    11361209

    Given an integer global variable declared as follows:

    11371210
    1211
    11381212
    
                      
                    
    11391213 int MyGlobal = 100;
    11401214
    1215
    11411216
    11421217

    a C/C++ front-end would generate the following descriptors:

    11431218
    1219
    11441220
    
                      
                    
    11451221 ;;
    11461222 ;; Define types used. One for global variable anchors, one for the global
    12031279 %str2 = internal constant [1 x i8] c"\00", section "llvm.metadata"
    12041280 %str3 = internal constant [4 x i8] c"int\00", section "llvm.metadata"
    12051281
    1282
    12061283
    12071284
    12081285
    12151292
    12161293

    Given a function declared as follows:

    12171294
    1295
    12181296
    
                      
                    
    12191297 int main(int argc, char *argv[]) {
    12201298 return 0;
    12211299 }
    12221300
    1301
    12231302
    12241303

    a C/C++ front-end would generate the following descriptors:

    12251304
    1305
    12261306
    
                      
                    
    12271307 ;;
    12281308 ;; Define types used. One for subprogram anchors, one for the subprogram
    12681348 ...
    12691349 }
    12701350
    1351
    12711352
    12721353
    12731354
    12891370
    12901371
    12911372
    1373
    12921374
    
                      
                    
    12931375 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    12941376 uint add(uint 36, uint 262144),
    13021384 uint 2 }, section "llvm.metadata"
    13031385 %str1 = internal constant [5 x i8] c"bool\00", section "llvm.metadata"
    13041386
    1387
    13051388
    13061389
    13071390
    13121395
    13131396
    13141397
    1398
    13151399
    
                      
                    
    13161400 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    13171401 uint add(uint 36, uint 262144),
    13251409 uint 6 }, section "llvm.metadata"
    13261410 %str1 = internal constant [5 x i8] c"char\00", section "llvm.metadata"
    13271411
    1412
    13281413
    13291414
    13301415
    13351420
    13361421
    13371422
    1423
    13381424
    
                      
                    
    13391425 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    13401426 uint add(uint 36, uint 262144),
    13481434 uint 8 }, section "llvm.metadata"
    13491435 %str1 = internal constant [14 x i8] c"unsigned char\00", section "llvm.metadata"
    13501436
    1437
    13511438
    13521439
    13531440
    13581445
    13591446
    13601447
    1448
    13611449
    
                      
                    
    13621450 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    13631451 uint add(uint 36, uint 262144),
    13711459 uint 5 }, section "llvm.metadata"
    13721460 %str1 = internal constant [10 x i8] c"short int\00", section "llvm.metadata"
    13731461
    1462
    13741463
    13751464
    13761465
    13811470
    13821471
    13831472
    1473
    13841474
    
                      
                    
    13851475 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    13861476 uint add(uint 36, uint 262144),
    13941484 uint 7 }, section "llvm.metadata"
    13951485 %str1 = internal constant [19 x i8] c"short unsigned int\00", section "llvm.metadata"
    13961486
    1487
    13971488
    13981489
    13991490
    14041495
    14051496
    14061497
    1498
    14071499
    
                      
                    
    14081500 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    14091501 uint add(uint 36, uint 262144),
    14161508 uint 0,
    14171509 uint 5 }, section "llvm.metadata"
    14181510 %str1 = internal constant [4 x i8] c"int\00", section "llvm.metadata"
    1419
    1511
    14201512
    14211513
    14221514
    14271519
    14281520
    14291521
    1522
    14301523
    
                      
                    
    14311524 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    14321525 uint add(uint 36, uint 262144),
    14401533 uint 7 }, section "llvm.metadata"
    14411534 %str1 = internal constant [13 x i8] c"unsigned int\00", section "llvm.metadata"
    14421535
    1536
    14431537
    14441538
    14451539
    14501544
    14511545
    14521546
    1547
    14531548
    
                      
                    
    14541549 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    14551550 uint add(uint 36, uint 262144),
    14631558 uint 5 }, section "llvm.metadata"
    14641559 %str1 = internal constant [14 x i8] c"long long int\00", section "llvm.metadata"
    14651560
    1561
    14661562
    14671563
    14681564
    14731569
    14741570
    14751571
    1572
    14761573
    
                      
                    
    14771574 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    14781575 uint add(uint 36, uint 262144),
    14861583 uint 7 }, section "llvm.metadata"
    14871584 %str1 = internal constant [23 x i8] c"long long unsigned int\00", section "llvm.metadata"
    14881585
    1586
    14891587
    14901588
    14911589
    14961594
    14971595
    14981596
    1597
    14991598
    
                      
                    
    15001599 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    15011600 uint add(uint 36, uint 262144),
    15091608 uint 4 }, section "llvm.metadata"
    15101609 %str1 = internal constant [6 x i8] c"float\00", section "llvm.metadata"
    15111610
    1611
    15121612
    15131613
    15141614
    15191619
    15201620
    15211621
    1622
    15221623
    
                      
                    
    15231624 %llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type {
    15241625 uint add(uint 36, uint 262144),
    15321633 uint 4 }, section "llvm.metadata"
    15331634 %str1 = internal constant [7 x i8] c"double\00", section "llvm.metadata"
    15341635
    1636
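
    The constants that recur in the basic type descriptors above can be cross-checked with a short C++ sketch. The DW_TAG and DW_ATE values are taken from the DWARF standard. Reading 262144 as a debug-format version number of 4 shifted left by 16 bits is an assumption of this sketch, and the helper names descriptorTag and stringArrayLength are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// DWARF tag for a base type (0x24 == 36), as used in "uint add(uint 36, ...)".
constexpr std::uint32_t DW_TAG_base_type = 0x24;

// DWARF base type encodings that appear as the last descriptor field.
enum DwarfEncoding : std::uint32_t {
  DW_ATE_boolean = 0x02,        // bool
  DW_ATE_float = 0x04,          // float, double
  DW_ATE_signed = 0x05,         // short int, int, long long int
  DW_ATE_signed_char = 0x06,    // char
  DW_ATE_unsigned = 0x07,       // unsigned variants of the integer types
  DW_ATE_unsigned_char = 0x08   // unsigned char
};

// Assumption: 262144 is a version number (4) stored in the upper 16 bits.
constexpr std::uint32_t kDebugVersionBits = 4u << 16;

// Hypothetical helper: combine a DWARF tag with the version bits.
constexpr std::uint32_t descriptorTag(std::uint32_t dwarfTag) {
  return dwarfTag + kDebugVersionBits;
}

// Hypothetical helper: the N in "[N x i8]" for a NUL-terminated name.
inline std::size_t stringArrayLength(const char *name) {
  return std::strlen(name) + 1;  // one extra byte for the trailing \00
}
```

    For instance, stringArrayLength("long long unsigned int") gives 23 and stringArrayLength("double") gives 7, matching the array types of the string constants in the examples above.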
    15351637
    15361638
    15371639
    15441646
    15451647

    Given the following as an example of C/C++ derived type:

    15461648
    1649
    15471650
    
                      
                    
    15481651 typedef const int *IntPtr;
    15491652
    1653
    15501654
    15511655

    a C/C++ front-end would generate the following descriptors:

    15521656
    1657
    15531658
    
                      
                    
    15541659 ;;
    15551660 ;; Define the typedef "IntPtr".
    16091714 uint 5 }, section "llvm.metadata"
    16101715 %str2 = internal constant [4 x i8] c"int\00", section "llvm.metadata"
    16111716
    1717
    16121718
    16131719
    16141720
    16211727
    16221728

    Given the following as an example of C/C++ struct type:

    16231729
    1730
    16241731
    
                      
                    
    16251732 struct Color {
    16261733 unsigned Red;
    16281735 unsigned Blue;
    16291736 };
    16301737
    1738
    16311739
    16321740

    a C/C++ front-end would generate the following descriptors:

    16331741
    1742
    16341743
    
                      
                    
    16351744 ;;
    16361745 ;; Define basic type for unsigned int.
    17161825 { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype2 to { }*),
    17171826 { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype3 to { }*) ], section "llvm.metadata"
    17181827
    1828
    17191829
    17201830
    17211831
    17281838
    17291839

    Given the following as an example of C/C++ enumeration type:

    17301840
    1841
    17311842
    
                      
                    
    17321843 enum Trees {
    17331844 Spruce = 100,
    17351846 Maple = 300
    17361847 };
    17371848
    1849
    17381850
    17391851

    a C/C++ front-end would generate the following descriptors:

    17401852
    1853
    17411854
    
                      
                    
    17421855 ;;
    17431856 ;; Define composite type for enum Trees
    17901903 { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator2 to { }*),
    17911904 { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator3 to { }*) ], section "llvm.metadata"
    17921905
    1906
    17931907
    17941908
    17951909