llvm.org GIT mirror llvm / bbef5ea
Documentation: convert SourceLevelDebugging.html to reST git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168493 91177308-0d34-0410-b5e6-96231b3b80d8 Dmitri Gribenko 6 years ago
3 changed file(s) with 2287 addition(s) and 2859 deletion(s).
docs/SourceLevelDebugging.html (+0, -2858)

Source Level Debugging with LLVM

  • Introduction
      • Philosophy behind LLVM debugging information
      • Debug information consumers
      • Debugging optimized code
  • Debugging information format
      • Debug information descriptors
          • Compile unit descriptors
          • File descriptors
          • Global variable descriptors
          • Subprogram descriptors
          • Block descriptors
          • Basic type descriptors
          • Derived type descriptors
          • Composite type descriptors
          • Subrange descriptors
          • Enumerator descriptors
          • Local variables
      • Debugger intrinsic functions
          • llvm.dbg.declare
          • llvm.dbg.value
  • Object lifetimes and scoping
  • C/C++ front-end specific debug information
      • C/C++ source file information
      • C/C++ global variable information
      • C/C++ function information
      • C/C++ basic types
      • C/C++ derived types
      • C/C++ struct/union types
      • C/C++ enumeration types
  • LLVM Dwarf Extensions
      • Debugging Information Extension for Objective C Properties
          • Introduction
          • Proposal
          • New DWARF Attributes
          • New DWARF Constants
      • Name Accelerator Tables
          • Introduction
          • Hash Tables
          • Details
          • Contents
          • Language Extensions and File Format Changes

    Written by Chris Lattner and Jim Laskey

    Introduction

    This document is the central repository for all information pertaining to
    debug information in LLVM. It describes the actual format that the LLVM
    debug information takes, which is useful for those interested in creating
    front-ends or dealing directly with the information. Further, this
    document provides specific examples of what debug information for C/C++
    looks like.

    Philosophy behind LLVM debugging information

    The idea of the LLVM debugging information is to capture how the important
    pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
    Several design aspects have shaped the solution that appears here. The
    important ones are:

  • Debugging information should have very little impact on the rest of the
    compiler. No transformations, analyses, or code generators should need to
    be modified because of debugging information.
  • LLVM optimizations should interact in well-defined and easily described
    ways with the debugging information.
  • Because LLVM is designed to support arbitrary programming languages,
    LLVM-to-LLVM tools should not need to know anything about the semantics of
    the source-level language.
  • Source-level languages are often widely different from one another.
    LLVM should not put any restrictions on the flavor of the source language,
    and the debugging information should work with any language.
  • With code generator support, it should be possible to use an LLVM compiler
    to compile a program to native machine code and standard debugging
    formats. This allows compatibility with traditional machine-code level
    debuggers, like GDB or DBX.

    The approach used by the LLVM implementation is to use a small set of
    intrinsic functions to define a mapping between LLVM program objects and
    the source-level objects. The description of the source-level program is
    maintained in LLVM metadata in an implementation-defined format (the C/C++
    front-end currently uses working draft 7 of the DWARF 3 standard).

    When a program is being debugged, a debugger interacts with the user and
    turns the stored debug information into source-language specific
    information. As such, a debugger must be aware of the source language, and
    is thus tied to a specific language or family of languages.

    Debug information consumers

    The role of debug information is to provide meta information normally
    stripped away during the compilation process. This meta information gives
    an LLVM user a mapping between the generated code and the original program
    source code.

    Currently, debug information is consumed by DwarfDebug to produce DWARF
    information used by the gdb debugger. Other targets could use the same
    information to produce stabs or other debug forms.

    It would also be reasonable to use debug information to feed profiling
    tools for analysis of generated code, or tools for reconstructing the
    original source from generated code.

    TODO - expound a bit more.

    Debugging optimized code

    An extremely high priority of LLVM debugging information is to make it
    interact well with optimizations and analysis. In particular, the LLVM
    debug information provides the following guarantees:

  • LLVM debug information always provides information to accurately read
    the source-level state of the program, regardless of which LLVM
    optimizations have been run, and without any modification to the
    optimizations themselves. However, some optimizations may impact the
    ability to modify the current state of the program with a debugger, such
    as setting program variables, or calling functions that have been
    deleted.
  • As desired, LLVM optimizations can be upgraded to be aware of the LLVM
    debugging information, allowing them to update the debugging information
    as they perform aggressive optimizations. This means that, with effort,
    the LLVM optimizers could optimize debug code just as well as non-debug
    code.
  • LLVM debug information does not prevent optimizations from happening
    (for example inlining, basic block reordering/merging/cleanup, tail
    duplication, etc).
  • LLVM debug information is automatically optimized along with the rest of
    the program, using existing facilities. For example, duplicate
    information is automatically merged by the linker, and unused information
    is automatically removed.

    Basically, the debug information allows you to compile a program with
    "-O0 -g" and get full debug information, allowing you to arbitrarily
    modify the program as it executes from a debugger. Compiling a program
    with "-O3 -g" gives you full debug information that is always available
    and accurate for reading (e.g., you get accurate stack traces despite tail
    call elimination and inlining), but you might lose the ability to modify
    the program and call functions that were optimized out of the program, or
    inlined away completely.

    The LLVM test suite provides a framework to test the optimizer's handling
    of debugging information. It can be run like this:

    % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
    % make TEST=dbgopt

    This will test the impact of debugging information on optimization passes.
    If debugging information influences optimization passes then it will be
    reported as a failure. See TestingGuide for more information on the LLVM
    test infrastructure and how to run various tests.

    Debugging information format

    LLVM debugging information has been carefully designed to make it possible
    for the optimizer to optimize the program and debugging information
    without necessarily having to know anything about debugging information.
    In particular, the use of metadata avoids duplicated debugging information
    from the beginning, and the global dead code elimination pass
    automatically deletes debugging information for a function if it decides
    to delete the function.

    To do this, most of the debugging information (descriptors for types,
    variables, functions, source files, etc) is inserted by the language
    front-end in the form of LLVM metadata.

    Debug information is designed to be agnostic about the target debugger and
    debugging information representation (e.g. DWARF/Stabs/etc). It uses a
    generic pass to decode the information that represents variables, types,
    functions, namespaces, etc: this allows for arbitrary source-language
    semantics and type-systems to be used, as long as there is a module
    written for the target debugger to interpret the information.

    To provide basic functionality, the LLVM debugger does have to make some
    assumptions about the source-level language being debugged, though it
    keeps these to a minimum. The only common features that the LLVM debugger
    assumes exist are source files and program objects. These abstract objects
    are used by a debugger to form stack traces, show information about local
    variables, etc.

    This section of the documentation first describes the representation
    aspects common to any source-language. The next section describes the data
    layout conventions used by the C and C++ front-ends.

    Debug information descriptors

    In consideration of the complexity and volume of debug information, LLVM
    provides a specification for well-formed debug descriptors.

    Consumers of LLVM debug information expect the descriptors for program
    objects to start in a canonical format, but the descriptors can include
    additional information appended at the end that is source-language
    specific. All LLVM debugging information is versioned, allowing backwards
    compatibility in the case that the core structures need to change in some
    way. Also, all debugging information objects start with a tag to indicate
    what type of object it is. The source language is allowed to define its
    own objects by using unreserved tag numbers. We recommend using tags in
    the range 0x1000 through 0x2000 (there is a defined enum
    DW_TAG_user_base = 0x1000).

    The fields of debug descriptors used internally by LLVM are restricted to
    only the simple data types i32, i1, float, double, mdstring and mdnode.

    !1 = metadata !{
      i32,   ;; A tag
      ...
    }

    The first field of a descriptor is always an i32 containing a tag value
    identifying the content of the descriptor. The remaining fields are
    specific to the descriptor. The values of tags are loosely bound to the
    tag values of DWARF information entries. However, that does not restrict
    the use of the information supplied to DWARF targets. To facilitate
    versioning of debug information, the tag is augmented with the current
    debug version (LLVMDebugVersion = 8 << 16 or 0x80000 or 524288).
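    As a rough illustration of the versioning scheme, the sketch below builds
    a tagged first field and splits one back into its version and DWARF tag.
    The helper names (make_tag, split_tag) and the 16-bit mask are assumptions
    made for this example, inferred from the "8 << 16" shift; they are not an
    LLVM API.

```python
# Illustrative only: the helper names and the mask are assumptions, not LLVM's API.
LLVM_DEBUG_VERSION = 8 << 16   # 0x80000 == 524288, as stated in the text
TAG_MASK = 0xFFFF              # assumed: the low 16 bits hold the DWARF tag

DW_TAG_compile_unit = 17
DW_TAG_subprogram = 46


def make_tag(dwarf_tag, version=LLVM_DEBUG_VERSION):
    """Combine a DWARF tag with a debug version ("Tag = 17 + LLVMDebugVersion")."""
    return dwarf_tag + version


def split_tag(field):
    """Return (version_number, dwarf_tag) for the first field of a descriptor."""
    return field >> 16, field & TAG_MASK


print(make_tag(DW_TAG_compile_unit))  # 524305
print(split_tag(524305))              # (8, 17)
print(split_tag(458798))              # (7, 46)
```

    The value 458798 is the subprogram tag from the foo example in the section
    on object lifetimes and scoping; under the assumed mask it decodes to
    DW_TAG_subprogram with debug version 7.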

    The details of the various descriptors follow.

    Compile unit descriptors

    !0 = metadata !{
      i32,       ;; Tag = 17 + LLVMDebugVersion
                 ;; (DW_TAG_compile_unit)
      i32,       ;; Unused field.
      i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
      metadata,  ;; Source file name
      metadata,  ;; Source file directory (includes trailing slash)
      metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
      i1,        ;; True if this is a main compile unit.
      i1,        ;; True if this is optimized.
      metadata,  ;; Flags
      i32,       ;; Runtime version
      metadata,  ;; List of enum types
      metadata,  ;; List of retained types
      metadata,  ;; List of subprograms
      metadata   ;; List of global variables
    }

    These descriptors contain a source language ID for the file (we use the
    DWARF 3.0 ID numbers, such as DW_LANG_C89, DW_LANG_C_plus_plus,
    DW_LANG_Cobol74, etc) and three strings: the file name, the working
    directory of the compiler, and an identifier string for the compiler that
    produced it.

    Compile unit descriptors provide the root context for objects declared in
    a specific compilation unit. File descriptors are defined using this
    context. These descriptors are collected by the named metadata
    !llvm.dbg.cu. A compile unit descriptor keeps track of subprograms, global
    variables and type information.
    File descriptors

    !0 = metadata !{
      i32,       ;; Tag = 41 + LLVMDebugVersion
                 ;; (DW_TAG_file_type)
      metadata,  ;; Source file name
      metadata,  ;; Source file directory (includes trailing slash)
      metadata   ;; Unused
    }

    These descriptors contain information for a file. Global variables and top
    level functions would be defined using this context. File descriptors also
    provide context for source line correspondence.

    Each input file is encoded as a separate file descriptor in LLVM debugging
    information output.

    Global variable descriptors

    !1 = metadata !{
      i32,       ;; Tag = 52 + LLVMDebugVersion
                 ;; (DW_TAG_variable)
      i32,       ;; Unused field.
      metadata,  ;; Reference to context descriptor
      metadata,  ;; Name
      metadata,  ;; Display name (fully qualified C++ name)
      metadata,  ;; MIPS linkage name (for C++)
      metadata,  ;; Reference to file where defined
      i32,       ;; Line number where defined
      metadata,  ;; Reference to type descriptor
      i1,        ;; True if the global is local to compile unit (static)
      i1,        ;; True if the global is defined in the compile unit (not extern)
      {}*        ;; Reference to the global variable
    }

    These descriptors provide debug information about global variables. They
    provide details such as name, type and where the variable is defined. All
    global variables are collected inside the named metadata !llvm.dbg.cu.

    Subprogram descriptors

    !2 = metadata !{
      i32,        ;; Tag = 46 + LLVMDebugVersion
                  ;; (DW_TAG_subprogram)
      i32,        ;; Unused field.
      metadata,   ;; Reference to context descriptor
      metadata,   ;; Name
      metadata,   ;; Display name (fully qualified C++ name)
      metadata,   ;; MIPS linkage name (for C++)
      metadata,   ;; Reference to file where defined
      i32,        ;; Line number where defined
      metadata,   ;; Reference to type descriptor
      i1,         ;; True if the global is local to compile unit (static)
      i1,         ;; True if the global is defined in the compile unit (not extern)
      i32,        ;; Line number where the scope of the subprogram begins
      i32,        ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
      i32,        ;; Index into a virtual function
      metadata,   ;; indicates which base type contains the vtable pointer for the
                  ;; derived class
      i32,        ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
      i1,         ;; isOptimized
      Function *, ;; Pointer to LLVM function
      metadata,   ;; Lists function template parameters
      metadata,   ;; Function declaration descriptor
      metadata    ;; List of function variables
    }

    These descriptors provide debug information about functions, methods and
    subprograms. They provide details such as name, return types and the
    source location where the subprogram is defined.

    Block descriptors

    !3 = metadata !{
      i32,       ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
      metadata,  ;; Reference to context descriptor
      i32,       ;; Line number
      i32,       ;; Column number
      metadata,  ;; Reference to source file
      i32        ;; Unique ID to identify blocks from a template function
    }

    This descriptor provides debug information about nested blocks within a
    subprogram. The line number and column number are used to distinguish
    two lexical blocks at the same depth.
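    To make the role of the context reference concrete, here is a toy model
    that walks from a nested block outward to its enclosing subprogram. Plain
    dictionaries stand in for metadata nodes; the field names ("tag",
    "context", "line", "column") and the function enclosing_subprogram are
    hypothetical, and the line and column values are made up for the example.

```python
# Toy model: dictionaries stand in for metadata nodes (not an LLVM API).
DW_TAG_lexical_block = 11
DW_TAG_subprogram = 46


def enclosing_subprogram(descriptor):
    """Follow "context" references outward until a subprogram is reached."""
    node = descriptor
    while node is not None:
        if node["tag"] == DW_TAG_subprogram:
            return node
        node = node.get("context")
    return None


foo = {"tag": DW_TAG_subprogram, "name": "foo", "context": None}
outer = {"tag": DW_TAG_lexical_block, "context": foo, "line": 1, "column": 12}
inner = {"tag": DW_TAG_lexical_block, "context": outer, "line": 4, "column": 3}

print(enclosing_subprogram(inner)["name"])  # foo
```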

    !3 = metadata !{
      i32,       ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
      metadata,  ;; Reference to the scope we're annotating with a file change
      metadata   ;; Reference to the file the scope is enclosed in.
    }

    This descriptor provides a wrapper around a lexical scope to handle file
    changes in the middle of a lexical block.

    Basic type descriptors

    !4 = metadata !{
      i32,       ;; Tag = 36 + LLVMDebugVersion
                 ;; (DW_TAG_base_type)
      metadata,  ;; Reference to context
      metadata,  ;; Name (may be "" for anonymous types)
      metadata,  ;; Reference to file where defined (may be NULL)
      i32,       ;; Line number where defined (may be 0)
      i64,       ;; Size in bits
      i64,       ;; Alignment in bits
      i64,       ;; Offset in bits
      i32,       ;; Flags
      i32        ;; DWARF type encoding
    }

    These descriptors define primitive types used in the code, for example
    int, bool and float. The context provides the scope of the type, which is
    usually the top level. Since basic types are not usually user defined, the
    context and line number can be left as NULL and 0. The size, alignment and
    offset are expressed in bits and can be 64 bit values. The alignment is
    used to round the offset when embedded in a composite type (for example,
    to keep doubles on 64 bit boundaries). The offset is the bit offset if
    embedded in a composite type.
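    The rounding rule can be sketched as follows. The helper align_offset is
    hypothetical, and it assumes that rounding means rounding up to the next
    multiple of a positive alignment, both expressed in bits.

```python
def align_offset(offset_bits, align_bits):
    """Round offset_bits up to the next multiple of align_bits (both in bits)."""
    if align_bits <= 0:
        raise ValueError("alignment must be a positive number of bits")
    return (offset_bits + align_bits - 1) // align_bits * align_bits


# A 64-bit-aligned double placed after a 32-bit int starts at bit 64.
print(align_offset(32, 64))  # 64
# An offset that is already aligned is left unchanged.
print(align_offset(64, 64))  # 64
```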

    The type encoding provides the details of the type. The values are
    typically one of the following:

    DW_ATE_address       = 1
    DW_ATE_boolean       = 2
    DW_ATE_float         = 4
    DW_ATE_signed        = 5
    DW_ATE_signed_char   = 6
    DW_ATE_unsigned      = 7
    DW_ATE_unsigned_char = 8

    Derived type descriptors

    !5 = metadata !{
      i32,       ;; Tag (see below)
      metadata,  ;; Reference to context
      metadata,  ;; Name (may be "" for anonymous types)
      metadata,  ;; Reference to file where defined (may be NULL)
      i32,       ;; Line number where defined (may be 0)
      i64,       ;; Size in bits
      i64,       ;; Alignment in bits
      i64,       ;; Offset in bits
      i32,       ;; Flags to encode attributes, e.g. private
      metadata,  ;; Reference to type derived from
      metadata,  ;; (optional) Name of the Objective C property associated with
                 ;; an Objective-C ivar
      metadata,  ;; (optional) Name of the Objective C property getter selector.
      metadata,  ;; (optional) Name of the Objective C property setter selector.
      i32        ;; (optional) Objective C property attributes.
    }

    These descriptors are used to define types derived from other types. The
    value of the tag varies depending on the meaning. The following are
    possible tag values:

    DW_TAG_formal_parameter = 5
    DW_TAG_member           = 13
    DW_TAG_pointer_type     = 15
    DW_TAG_reference_type   = 16
    DW_TAG_typedef          = 22
    DW_TAG_const_type       = 38
    DW_TAG_volatile_type    = 53
    DW_TAG_restrict_type    = 55

    DW_TAG_member is used to define a member of a composite type or
    subprogram. The type of the member is the derived type.
    DW_TAG_formal_parameter is used to define a member which is a formal
    argument of a subprogram.

    DW_TAG_typedef is used to provide a name for the derived type.

    DW_TAG_pointer_type, DW_TAG_reference_type, DW_TAG_const_type,
    DW_TAG_volatile_type and DW_TAG_restrict_type are used to qualify the
    derived type.

    Derived type location can be determined from the context and line number.
    The size, alignment and offset are expressed in bits and can be 64 bit
    values. The alignment is used to round the offset when embedded in a
    composite type (for example, to keep doubles on 64 bit boundaries). The
    offset is the bit offset if embedded in a composite type.

    Note that the void * type is expressed as a type derived from NULL.
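    The following toy sketch shows how a consumer might follow the "type
    derived from" reference to print a C-style type name, including the
    void * case. Dictionaries stand in for descriptors; the field names
    ("tag", "name", "base") and the function describe are hypothetical and not
    part of LLVM.

```python
# Toy model of derived type descriptors (field names are assumptions).
DW_TAG_pointer_type = 15
DW_TAG_base_type = 36
DW_TAG_const_type = 38


def describe(type_desc):
    """Render a descriptor chain as a C-style type name."""
    if type_desc is None:
        # A type derived from NULL: this is how void * reaches "void".
        return "void"
    tag = type_desc["tag"]
    if tag == DW_TAG_base_type:
        return type_desc["name"]
    if tag == DW_TAG_pointer_type:
        return describe(type_desc["base"]) + " *"
    if tag == DW_TAG_const_type:
        return "const " + describe(type_desc["base"])
    raise ValueError("tag not handled in this sketch: %d" % tag)


char_type = {"tag": DW_TAG_base_type, "name": "char"}
const_char = {"tag": DW_TAG_const_type, "base": char_type}

print(describe({"tag": DW_TAG_pointer_type, "base": None}))        # void *
print(describe({"tag": DW_TAG_pointer_type, "base": const_char}))  # const char *
```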

    Composite type descriptors

    !6 = metadata !{
      i32,       ;; Tag (see below)
      metadata,  ;; Reference to context
      metadata,  ;; Name (may be "" for anonymous types)
      metadata,  ;; Reference to file where defined (may be NULL)
      i32,       ;; Line number where defined (may be 0)
      i64,       ;; Size in bits
      i64,       ;; Alignment in bits
      i64,       ;; Offset in bits
      i32,       ;; Flags
      metadata,  ;; Reference to type derived from
      metadata,  ;; Reference to array of member descriptors
      i32        ;; Runtime languages
    }

    These descriptors are used to define types that are composed of 0 or more
    elements. The value of the tag varies depending on the meaning. The
    following are possible tag values:

    DW_TAG_array_type       = 1
    DW_TAG_enumeration_type = 4
    DW_TAG_structure_type   = 19
    DW_TAG_union_type       = 23
    DW_TAG_vector_type      = 259
    DW_TAG_subroutine_type  = 21
    DW_TAG_inheritance      = 28

    The vector flag indicates that an array type is a native packed vector.

    The members of array types (tag = DW_TAG_array_type) or vector types
    (tag = DW_TAG_vector_type) are subrange descriptors, each representing the
    range of subscripts at that level of indexing.

    The members of enumeration types (tag = DW_TAG_enumeration_type) are
    enumerator descriptors, each representing the definition of an enumeration
    value for the set. All enumeration type descriptors are collected inside
    the named metadata !llvm.dbg.cu.

    The members of structure (tag = DW_TAG_structure_type) or union
    (tag = DW_TAG_union_type) types are any one of the basic, derived or
    composite type descriptors, each representing a field member of the
    structure or union.

    For C++ classes (tag = DW_TAG_structure_type), member descriptors provide
    information about base classes, static members and member functions. If a
    member is a derived type descriptor and has a tag of DW_TAG_inheritance,
    then the type represents a base class. If the member is a global variable
    descriptor then it represents a static member. And, if the member is a
    subprogram descriptor then it represents a member function. For static
    members and member functions, getName() returns the member's link name or
    the C++ mangled name, and getDisplayName() returns the simplified version
    of the name.

    The first member of subroutine (tag = DW_TAG_subroutine_type) type
    elements is the return type for the subroutine. The remaining elements are
    the formal arguments to the subroutine.
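    A minimal sketch of that layout, with strings standing in for type
    descriptors. The function split_subroutine_members is a hypothetical
    helper written for this example.

```python
def split_subroutine_members(members):
    """Split the member list of a DW_TAG_subroutine_type descriptor into
    (return_type, formal_argument_types)."""
    if not members:
        raise ValueError("a subroutine type needs at least a return type entry")
    return members[0], list(members[1:])


# Models the C prototype: int f(char, float)
ret, args = split_subroutine_members(["int", "char", "float"])
print(ret)   # int
print(args)  # ['char', 'float']
```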

    Composite type location can be determined from the context and line
    number. The size, alignment and offset are expressed in bits and can be
    64 bit values. The alignment is used to round the offset when embedded in
    a composite type (as an example, to keep doubles on 64 bit boundaries).
    The offset is the bit offset if embedded in a composite type.

    Subrange descriptors

    !42 = metadata !{
      i32,   ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
      i64,   ;; Low value
      i64    ;; High value
    }

    These descriptors are used to define ranges of array subscripts for an
    array composite type. The low value defines the lower bound, typically
    zero for C/C++. The high value is the upper bound. Values are 64 bit.
    High - low + 1 is the size of the array. If low > high, the array bounds
    are not included in generated debugging information.
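    The size rule is simple enough to state as code. In this sketch the
    helper name subrange_size is hypothetical, and returning None is just one
    way to model "bounds not included".

```python
def subrange_size(low, high):
    """Number of elements described by a subrange descriptor, or None when
    low > high (the case where bounds are omitted from the debug info)."""
    if low > high:
        return None
    return high - low + 1


print(subrange_size(0, 9))  # 10, e.g. "int a[10];" in C
print(subrange_size(1, 0))  # None
```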

    Enumerator descriptors

    !6 = metadata !{
      i32,       ;; Tag = 40 + LLVMDebugVersion
                 ;; (DW_TAG_enumerator)
      metadata,  ;; Name
      i64        ;; Value
    }

    These descriptors are used to define members of an enumeration composite
    type; each one associates a name with a value.

    Local variables

    !7 = metadata !{
      i32,       ;; Tag (see below)
      metadata,  ;; Context
      metadata,  ;; Name
      metadata,  ;; Reference to file where defined
      i32,       ;; 24 bit - Line number where defined
                 ;; 8 bit - Argument number. 1 indicates 1st argument.
      metadata,  ;; Type descriptor
      i32,       ;; flags
      metadata   ;; (optional) Reference to inline location
    }

    These descriptors are used to define variables local to a subprogram. The
    value of the tag depends on the usage of the variable:

    DW_TAG_auto_variable   = 256
    DW_TAG_arg_variable    = 257
    DW_TAG_return_variable = 258

    An auto variable is any variable declared in the body of the function. An
    argument variable is any variable that appears as a formal argument to the
    function. A return variable is used to track the result of a function and
    has no source correspondent.

    The context is either the subprogram or block where the variable is
    defined. Name is the source variable name. Context and line indicate where
    the variable was defined. Type descriptor defines the declared type of the
    variable.
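    The descriptor shares one i32 between the line number (24 bits) and the
    argument number (8 bits). The text does not say which bits hold which
    value; the sketch below assumes the line number occupies the low 24 bits
    and the argument number the high 8 bits. The helper names are hypothetical.

```python
LINE_BITS = 24
LINE_MASK = (1 << LINE_BITS) - 1   # 0xFFFFFF
ARG_MASK = 0xFF


def pack_line_and_arg(line, arg_no=0):
    """Pack a line number and an argument number into one 32-bit field.
    Assumed layout: line in bits 0-23, argument number in bits 24-31."""
    if not 0 <= line <= LINE_MASK:
        raise ValueError("line number does not fit in 24 bits")
    if not 0 <= arg_no <= ARG_MASK:
        raise ValueError("argument number does not fit in 8 bits")
    return line | (arg_no << LINE_BITS)


def unpack_line_and_arg(field):
    """Inverse of pack_line_and_arg: return (line, arg_no)."""
    return field & LINE_MASK, (field >> LINE_BITS) & ARG_MASK


print(pack_line_and_arg(2))           # 2 (an auto variable on line 2)
print(pack_line_and_arg(10, 1))       # 16777226 (first argument, line 10)
print(unpack_line_and_arg(16777226))  # (10, 1)
```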

    Debugger intrinsic functions

    LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to
    provide debug information at various points in generated code.

    llvm.dbg.declare

    void %llvm.dbg.declare(metadata, metadata)
    842

    This intrinsic provides information about a local element (e.g., a
    variable). The first argument is metadata holding the alloca for the
    variable. The second argument is metadata containing a description of the
    variable.

    llvm.dbg.value

    void %llvm.dbg.value(metadata, i64, metadata)

    This intrinsic provides information when a user source variable is set to
    a new value. The first argument is the new value (wrapped as metadata).
    The second argument is the offset in the user source variable where the
    new value is written. The third argument is metadata containing a
    description of the user source variable.

    Object lifetimes and scoping

    In many languages, the local variables in functions can have their
    lifetimes or scopes limited to a subset of a function. In the C family of
    languages, for example, variables are only live (readable and writable)
    within the source block that they are defined in. In functional languages,
    values are only readable after they have been defined. Though this is a
    very obvious concept, it is non-trivial to model in LLVM, because LLVM has
    no notion of scoping in this sense, and does not want to be tied to a
    language's scoping rules.

    In order to handle this, the LLVM debug format uses the metadata attached
    to LLVM instructions to encode line number and scoping information.
    Consider the following C fragment, for example:

    1.  void foo() {
    2.    int X = 21;
    3.    int Y = 22;
    4.    {
    5.      int Z = 23;
    6.      Z = X;
    7.    }
    8.    X = Y;
    9.  }

    Compiled to LLVM, this function would be represented like this:

    900
    901
    902
    
                      
                    
    903 define void @foo() nounwind ssp {
    904 entry:
    905 %X = alloca i32, align 4 ; <i32*> [#uses=4]
    906 %Y = alloca i32, align 4 ; <i32*> [#uses=4]
    907 %Z = alloca i32, align 4 ; <i32*> [#uses=3]
    908 %0 = bitcast i32* %X to {}* ; <{}*> [#uses=1]
    909 call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
    910 store i32 21, i32* %X, !dbg !8
    911 %1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1]
    912 call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
    913 store i32 22, i32* %Y, !dbg !11
    914 %2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1]
    915 call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
    916 store i32 23, i32* %Z, !dbg !15
    917 %tmp = load i32* %X, !dbg !16 ; <i32> [#uses=1]
    918 %tmp1 = load i32* %Y, !dbg !16 ; <i32> [#uses=1]
    919 %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; <i32> [#uses=1]
    920 store i32 %add, i32* %Z, !dbg !16
    921 %tmp2 = load i32* %Y, !dbg !17 ; <i32> [#uses=1]
    922 store i32 %tmp2, i32* %X, !dbg !17
    923 ret void, !dbg !18
    924 }
    925
    926 declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone
    927
    928 !0 = metadata !{i32 459008, metadata !1, metadata !"X",
    929 metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
    930 !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    931 !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
    932 metadata !"foo", metadata !3, i32 1, metadata !4,
    933 i1 false, i1 true}; [DW_TAG_subprogram ]
    934 !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
    935 metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
    936 i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
    937 !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
    938 i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
    939 !5 = metadata !{null}
    940 !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
    941 i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
    942 !7 = metadata !{i32 2, i32 7, metadata !1, null}
    943 !8 = metadata !{i32 2, i32 3, metadata !1, null}
    944 !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
    945 metadata !6}; [ DW_TAG_auto_variable ]
    946 !10 = metadata !{i32 3, i32 7, metadata !1, null}
    947 !11 = metadata !{i32 3, i32 3, metadata !1, null}
    948 !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
    949 metadata !6}; [ DW_TAG_auto_variable ]
    950 !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
    951 !14 = metadata !{i32 5, i32 9, metadata !13, null}
    952 !15 = metadata !{i32 5, i32 5, metadata !13, null}
    953 !16 = metadata !{i32 6, i32 5, metadata !13, null}
    954 !17 = metadata !{i32 8, i32 3, metadata !1, null}
    955 !18 = metadata !{i32 9, i32 1, metadata !2, null}
    956
    957
    958
    959

    This example illustrates a few important details about LLVM debugging

    960 information. In particular, it shows how the llvm.dbg.declare
    961 intrinsic and location information, which are attached to an instruction,
    962 are applied together to allow a debugger to analyze the relationship between
    963 statements, variable definitions, and the code used to implement the
    964 function.

    965
    966
    967
    
                      
                    
    968 call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
    969
    970
    971
    972

    The first intrinsic

    973 @llvm.dbg.declare
    974 encodes debugging information for the variable X. The metadata
    975 !dbg !7 attached to the intrinsic provides scope information for the
    976 variable X.

    977
    978
    979
    
                      
                    
    980 !7 = metadata !{i32 2, i32 7, metadata !1, null}
    981 !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    982 !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
    983 metadata !"foo", metadata !"foo", metadata !3, i32 1,
    984 metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
    985
    986
    987
    988

    Here !7 is metadata providing location information. It has four

    989 fields: line number, column number, scope, and original scope. The original
    990 scope represents inline location if this instruction is inlined inside a
    991 caller, and is null otherwise. In this example, scope is encoded by
    992 !1. !1 represents a lexical block inside the scope
    993 !2, where !2 is a
    994 subprogram descriptor. This way the
    995 location information attached to the intrinsics indicates that the
    996 variable X is declared at line number 2 at a function level scope in
    997 function foo.

    998
    999

    Now let's take another example.

    1000
    1001
    1002
    
                      
                    
    1003 call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
    1004
    1005
    1006
    1007

    The second intrinsic

    1008 @llvm.dbg.declare
    1009 encodes debugging information for variable Z. The metadata
    1010 !dbg !14 attached to the intrinsic provides scope information for
    1011 the variable Z.

    1012
    1013
    1014
    
                      
                    
    1015 !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
    1016 !14 = metadata !{i32 5, i32 9, metadata !13, null}
    1017
    1018
    1019
    1020

    Here !14 indicates that Z is declared at line number 5 and

    1021 column number 9 inside of lexical scope !13. The lexical scope
    1022 itself resides inside of lexical scope !1 described above.

    1023
    1024

    The scope information attached to each instruction provides a

    1025 straightforward way to find instructions covered by a scope.
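To make the scope lookup concrete, the following is a minimal sketch of how a consumer might decide whether an instruction's location is covered by a given scope. It does not use the LLVM API: the `Scope` and `Location` structs and the helper names `isCoveredBy` and `countCovered` are hypothetical stand-ins for the metadata nodes shown above (`!2`, `!1`, `!13` and the location tuples).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the metadata nodes in the example above;
// these are not LLVM classes.
struct Scope {
  const Scope *Parent; // enclosing scope, or nullptr for the subprogram
};

struct Location {
  unsigned Line;
  unsigned Column;
  const Scope *S; // third field of the location tuple
};

// Returns true if Inner is Outer itself or is nested anywhere inside it.
inline bool isCoveredBy(const Scope *Inner, const Scope *Outer) {
  for (const Scope *Cur = Inner; Cur != nullptr; Cur = Cur->Parent)
    if (Cur == Outer)
      return true;
  return false;
}

// Counts the locations whose scope chain passes through Outer.
inline std::size_t countCovered(const std::vector<Location> &Locs,
                                const Scope *Outer) {
  std::size_t N = 0;
  for (const Location &L : Locs)
    if (isCoveredBy(L.S, Outer))
      ++N;
  return N;
}
```

With scopes chained as in the example (`!13` inside `!1` inside `!2`), a location attached to `!13` is covered by all three scopes, while a location attached to `!1` is not covered by `!13`.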

    1026
    1027
    1028
    1029
    1030
    1031
    1032

    1033 C/C++ front-end specific debug information
    1034
    1035
    1036
    1037
    1038
    1039

    The C and C++ front-ends represent information about the program in a format

    1040 that is effectively identical
    1041 to DWARF 3.0 in
    1042 terms of information content. This allows code generators to trivially
    1043 support native debuggers by generating standard DWARF information, and
    1044 contains enough information for non-DWARF targets to translate it as
    1045 needed.

    1046
    1047

    This section describes the forms used to represent C and C++ programs. Other

    1048 languages could pattern themselves after this (which itself is tuned to
    1049 representing programs in the same way that DWARF 3 does), or they could
    1050 choose to provide completely different forms if they don't fit into the DWARF
    1051 model. As support for debugging information gets added to the various LLVM
    1052 source-language front-ends, the information used should be documented
    1053 here.

    1054
    1055

    The following sections provide examples of various C/C++ constructs and the

    1056 debug information that would best describe those constructs.

    1057
    1058
    1059

    1060 C/C++ source file information
    1061
    1062
    1063
    1064
    1065

    Given the source files MySource.cpp and MyHeader.h located

    1066 in the directory /Users/mine/sources, the following code:

    1067
    1068
    1069
    
                      
                    
    1070 #include "MyHeader.h"
    1071
    1072 int main(int argc, char *argv[]) {
    1073 return 0;
    1074 }
    1075
    1076
    1077
    1078

    a C/C++ front-end would generate the following descriptors:

    1079
    1080
    1081
    
                      
                    
    1082 ...
    1083 ;;
    1084 ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
    1085 ;;
    1086 !2 = metadata !{
    1087 i32 524305, ;; Tag
    1088 i32 0, ;; Unused
    1089 i32 4, ;; Language Id
    1090 metadata !"MySource.cpp",
    1091 metadata !"/Users/mine/sources",
    1092 metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
    1093 i1 true, ;; Main Compile Unit
    1094 i1 false, ;; Optimized compile unit
    1095 metadata !"", ;; Compiler flags
    1096 i32 0} ;; Runtime version
    1097
    1098 ;;
    1099 ;; Define the file for the file "/Users/mine/sources/MySource.cpp".
    1100 ;;
    1101 !1 = metadata !{
    1102 i32 524329, ;; Tag
    1103 metadata !"MySource.cpp",
    1104 metadata !"/Users/mine/sources",
    1105 metadata !2 ;; Compile unit
    1106 }
    1107
    1108 ;;
    1109 ;; Define the file for the file "/Users/mine/sources/MyHeader.h".
    1110 ;;
    1111 !3 = metadata !{
    1112 i32 524329, ;; Tag
    1113 metadata !"MyHeader.h",
    1114 metadata !"/Users/mine/sources",
    1115 metadata !2 ;; Compile unit
    1116 }
    1117
    1118 ...
    1119
    1120
    1121
    1122

    llvm::Instruction provides easy access to metadata attached to an

    1123 instruction. One can extract line number information encoded in LLVM IR
    1124 using Instruction::getMetadata() and
    1125 DILocation::getLineNumber().
    1126
    
                      
                    
    1127 if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction
    1128 DILocation Loc(N); // DILocation is in DebugInfo.h
    1129 unsigned Line = Loc.getLineNumber();
    1130 StringRef File = Loc.getFilename();
    1131 StringRef Dir = Loc.getDirectory();
    1132 }
    1133
    1134
    1135
    1136
    1137

    1138 C/C++ global variable information
    1139
    1140
    1141
    1142
    1143

    Given an integer global variable declared as follows:

    1144
    1145
    1146
    
                      
                    
    1147 int MyGlobal = 100;
    1148
    1149
    1150
    1151

    a C/C++ front-end would generate the following descriptors:

    1152
    1153
    1154
    
                      
                    
    1155 ;;
    1156 ;; Define the global itself.
    1157 ;;
    1158 @MyGlobal = global i32 100
    1159 ...
    1160 ;;
    1161 ;; List of debug info of globals
    1162 ;;
    1163 !llvm.dbg.cu = !{!0}
    1164
    1165 ;; Define the compile unit.
    1166 !0 = metadata !{
    1167 i32 786449, ;; Tag
    1168 i32 0, ;; Context
    1169 i32 4, ;; Language
    1170 metadata !"foo.cpp", ;; File
    1171 metadata !"/Volumes/Data/tmp", ;; Directory
    1172 metadata !"clang version 3.1 ", ;; Producer
    1173 i1 true, ;; Deprecated field
    1174 i1 false, ;; "isOptimized"?
    1175 metadata !"", ;; Flags
    1176 i32 0, ;; Runtime Version
    1177 metadata !1, ;; Enum Types
    1178 metadata !1, ;; Retained Types
    1179 metadata !1, ;; Subprograms
    1180 metadata !3 ;; Global Variables
    1181 } ; [ DW_TAG_compile_unit ]
    1182
    1183 ;; The Array of Global Variables
    1184 !3 = metadata !{
    1185 metadata !4
    1186 }
    1187
    1188 !4 = metadata !{
    1189 metadata !5
    1190 }
    1191
    1192 ;;
    1193 ;; Define the global variable itself.
    1194 ;;
    1195 !5 = metadata !{
    1196 i32 786484, ;; Tag
    1197 i32 0, ;; Unused
    1198 null, ;; Unused
    1199 metadata !"MyGlobal", ;; Name
    1200 metadata !"MyGlobal", ;; Display Name
    1201 metadata !"", ;; Linkage Name
    1202 metadata !6, ;; File
    1203 i32 1, ;; Line
    1204 metadata !7, ;; Type
    1205 i32 0, ;; IsLocalToUnit
    1206 i32 1, ;; IsDefinition
    1207 i32* @MyGlobal ;; LLVM-IR Value
    1208 } ; [ DW_TAG_variable ]
    1209
    1210 ;;
    1211 ;; Define the file
    1212 ;;
    1213 !6 = metadata !{
    1214 i32 786473, ;; Tag
    1215 metadata !"foo.cpp", ;; File
    1216 metadata !"/Volumes/Data/tmp", ;; Directory
    1217 null ;; Unused
    1218 } ; [ DW_TAG_file_type ]
    1219
    1220 ;;
    1221 ;; Define the type
    1222 ;;
    1223 !7 = metadata !{
    1224 i32 786468, ;; Tag
    1225 null, ;; Unused
    1226 metadata !"int", ;; Name
    1227 null, ;; Unused
    1228 i32 0, ;; Line
    1229 i64 32, ;; Size in Bits
    1230 i64 32, ;; Align in Bits
    1231 i64 0, ;; Offset
    1232 i32 0, ;; Flags
    1233 i32 5 ;; Encoding
    1234 } ; [ DW_TAG_base_type ]
    1235
    1236
    1237
    1238
    1239
    1240
    1241
    1242

    1243 C/C++ function information
    1244
    1245
    1246
    1247
    1248

    Given a function declared as follows:

    1249
    1250
    1251
    
                      
                    
    1252 int main(int argc, char *argv[]) {
    1253 return 0;
    1254 }
    1255
    1256
    1257
    1258

    a C/C++ front-end would generate the following descriptors:

    1259
    1260
    1261
    
                      
                    
    1262 ;;
    1263 ;; Define the descriptor for the subprogram. Note that the low 16 bits of
    1264 ;; the tag field hold 46, which is the tag for subprograms
    1265 ;; (46 = DW_TAG_subprogram.)
    1266 ;;
    1267 !6 = metadata !{
    1268 i32 524334, ;; Tag
    1269 i32 0, ;; Unused
    1270 metadata !1, ;; Context
    1271 metadata !"main", ;; Name
    1272 metadata !"main", ;; Display name
    1273 metadata !"main", ;; Linkage name
    1274 metadata !1, ;; File
    1275 i32 1, ;; Line number
    1276 metadata !4, ;; Type
    1277 i1 false, ;; Is local
    1278 i1 true, ;; Is definition
    1279 i32 0, ;; Virtuality attribute, e.g. pure virtual function
    1280 i32 0, ;; Index into virtual table for C++ methods
    1281 i32 0, ;; Type that holds virtual table.
    1282 i32 0, ;; Flags
    1283 i1 false, ;; True if this function is optimized
    1284 i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
    1285 null ;; Function template parameters
    1286 }
    1287 ;;
    1288 ;; Define the subprogram itself.
    1289 ;;
    1290 define i32 @main(i32 %argc, i8** %argv) {
    1291 ...
    1292 }
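The tag fields in these descriptors combine a debug-info version number in the high 16 bits with a DWARF tag in the low 16 bits. The following sketch splits a tag field into those two parts. The helper names `dwarfTag` and `debugVersion` are hypothetical and not part of the LLVM API; the numeric inputs are taken from the descriptors in this document.

```cpp
#include <cstdint>

// Mask selecting the version bits of a descriptor's tag field.
constexpr std::uint32_t VersionMask = 0xffff0000u;

// Low 16 bits: the DWARF tag (for example 46 = DW_TAG_subprogram).
constexpr std::uint32_t dwarfTag(std::uint32_t Field) {
  return Field & ~VersionMask;
}

// High 16 bits: the debug-info version the descriptor was written with.
constexpr std::uint32_t debugVersion(std::uint32_t Field) {
  return (Field & VersionMask) >> 16;
}

// The subprogram descriptor above uses 524334 = (8 << 16) + 46.
static_assert(dwarfTag(524334) == 46, "low bits hold DW_TAG_subprogram");
static_assert(debugVersion(524334) == 8, "high bits hold the version");
```

The same split explains why the examples in this document use different numbers for the same kind of descriptor: 458769, 524305 and 786449 all decode to tag 17 (`DW_TAG_compile_unit`) under versions 7, 8 and 12.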
    1293
    1294
    1295
    1296
    1297
    1298
    1299

    1300 C/C++ basic types
    1301
    1302
    1303
    1304
    1305

    The following are the basic type descriptors for C/C++ core types:

    1306
    1307
    1308

    1309 bool
    1310
    1311
    1312
    1313
    1314
    1315
    
                      
                    
    1316 !2 = metadata !{
    1317 i32 524324, ;; Tag
    1318 metadata !1, ;; Context
    1319 metadata !"bool", ;; Name
    1320 metadata !1, ;; File
    1321 i32 0, ;; Line number
    1322 i64 8, ;; Size in Bits
    1323 i64 8, ;; Align in Bits
    1324 i64 0, ;; Offset in Bits
    1325 i32 0, ;; Flags
    1326 i32 2 ;; Encoding
    1327 }
    1328
    1329
    1330
    1331
    1332
    1333
    1334

    1335 char
    1336
    1337
    1338
    1339
    1340
    1341
    
                      
                    
    1342 !2 = metadata !{
    1343 i32 524324, ;; Tag
    1344 metadata !1, ;; Context
    1345 metadata !"char", ;; Name
    1346 metadata !1, ;; File
    1347 i32 0, ;; Line number
    1348 i64 8, ;; Size in Bits
    1349 i64 8, ;; Align in Bits
    1350 i64 0, ;; Offset in Bits
    1351 i32 0, ;; Flags
    1352 i32 6 ;; Encoding
    1353 }
    1354
    1355
    1356
    1357
    1358
    1359
    1360

    1361 unsigned char
    1362
    1363
    1364
    1365
    1366
    1367
    
                      
                    
    1368 !2 = metadata !{
    1369 i32 524324, ;; Tag
    1370 metadata !1, ;; Context
    1371 metadata !"unsigned char",
    1372 metadata !1, ;; File
    1373 i32 0, ;; Line number
    1374 i64 8, ;; Size in Bits
    1375 i64 8, ;; Align in Bits
    1376 i64 0, ;; Offset in Bits
    1377 i32 0, ;; Flags
    1378 i32 8 ;; Encoding
    1379 }
    1380
    1381
    1382
    1383
    1384
    1385
    1386

    1387 short
    1388
    1389
    1390
    1391
    1392
    1393
    
                      
                    
    1394 !2 = metadata !{
    1395 i32 524324, ;; Tag
    1396 metadata !1, ;; Context
    1397 metadata !"short int",
    1398 metadata !1, ;; File
    1399 i32 0, ;; Line number
    1400 i64 16, ;; Size in Bits
    1401 i64 16, ;; Align in Bits
    1402 i64 0, ;; Offset in Bits
    1403 i32 0, ;; Flags
    1404 i32 5 ;; Encoding
    1405 }
    1406
    1407
    1408
    1409
    1410
    1411
    1412

    1413 unsigned short
    1414
    1415
    1416
    1417
    1418
    1419
    
                      
                    
    1420 !2 = metadata !{
    1421 i32 524324, ;; Tag
    1422 metadata !1, ;; Context
    1423 metadata !"short unsigned int",
    1424 metadata !1, ;; File
    1425 i32 0, ;; Line number
    1426 i64 16, ;; Size in Bits
    1427 i64 16, ;; Align in Bits
    1428 i64 0, ;; Offset in Bits
    1429 i32 0, ;; Flags
    1430 i32 7 ;; Encoding
    1431 }
    1432
    1433
    1434
    1435
    1436
    1437
    1438

    1439 int
    1440
    1441
    1442
    1443
    1444
    1445
    
                      
                    
    1446 !2 = metadata !{
    1447 i32 524324, ;; Tag
    1448 metadata !1, ;; Context
    1449 metadata !"int", ;; Name
    1450 metadata !1, ;; File
    1451 i32 0, ;; Line number
    1452 i64 32, ;; Size in Bits
    1453 i64 32, ;; Align in Bits
    1454 i64 0, ;; Offset in Bits
    1455 i32 0, ;; Flags
    1456 i32 5 ;; Encoding
    1457 }
    1458
    1459
    1460
    1461
    1462
    1463

    1464 unsigned int
    1465
    1466
    1467
    1468
    1469
    1470
    
                      
                    
    1471 !2 = metadata !{
    1472 i32 524324, ;; Tag
    1473 metadata !1, ;; Context
    1474 metadata !"unsigned int",
    1475 metadata !1, ;; File
    1476 i32 0, ;; Line number
    1477 i64 32, ;; Size in Bits
    1478 i64 32, ;; Align in Bits
    1479 i64 0, ;; Offset in Bits
    1480 i32 0, ;; Flags
    1481 i32 7 ;; Encoding
    1482 }
    1483
    1484
    1485
    1486
    1487
    1488
    1489

    1490 long long
    1491
    1492
    1493
    1494
    1495
    1496
    
                      
                    
    1497 !2 = metadata !{
    1498 i32 524324, ;; Tag
    1499 metadata !1, ;; Context
    1500 metadata !"long long int",
    1501 metadata !1, ;; File
    1502 i32 0, ;; Line number
    1503 i64 64, ;; Size in Bits
    1504 i64 64, ;; Align in Bits
    1505 i64 0, ;; Offset in Bits
    1506 i32 0, ;; Flags
    1507 i32 5 ;; Encoding
    1508 }
    1509
    1510
    1511
    1512
    1513
    1514
    1515

    1516 unsigned long long
    1517
    1518
    1519
    1520
    1521
    1522
    
                      
                    
    1523 !2 = metadata !{
    1524 i32 524324, ;; Tag
    1525 metadata !1, ;; Context
    1526 metadata !"long long unsigned int",
    1527 metadata !1, ;; File
    1528 i32 0, ;; Line number
    1529 i64 64, ;; Size in Bits
    1530 i64 64, ;; Align in Bits
    1531 i64 0, ;; Offset in Bits
    1532 i32 0, ;; Flags
    1533 i32 7 ;; Encoding
    1534 }
    1535
    1536
    1537
    1538
    1539
    1540
    1541

    1542 float
    1543
    1544
    1545
    1546
    1547
    1548
    
                      
                    
    1549 !2 = metadata !{
    1550 i32 524324, ;; Tag
    1551 metadata !1, ;; Context
    1552 metadata !"float",
    1553 metadata !1, ;; File
    1554 i32 0, ;; Line number
    1555 i64 32, ;; Size in Bits
    1556 i64 32, ;; Align in Bits
    1557 i64 0, ;; Offset in Bits
    1558 i32 0, ;; Flags
    1559 i32 4 ;; Encoding
    1560 }
    1561
    1562
    1563
    1564
    1565
    1566
    1567

    1568 double
    1569
    1570
    1571
    1572
    1573
    1574
    
                      
                    
    1575 !2 = metadata !{
    1576 i32 524324, ;; Tag
    1577 metadata !1, ;; Context
    1578 metadata !"double",;; Name
    1579 metadata !1, ;; File
    1580 i32 0, ;; Line number
    1581 i64 64, ;; Size in Bits
    1582 i64 64, ;; Align in Bits
    1583 i64 0, ;; Offset in Bits
    1584 i32 0, ;; Flags
    1585 i32 4 ;; Encoding
    1586 }
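The last field of each basic type descriptor is a DWARF base type encoding (`DW_ATE_*`). As a reading aid, the sketch below maps the encoding values used in the descriptors above to their DWARF names. The function name `encodingName` is hypothetical; only the encodings that appear in this section are covered.

```cpp
#include <string>

// Maps the encoding field of a basic type descriptor to its DWARF name.
// Only the values used by the descriptors in this section are listed.
inline std::string encodingName(unsigned Encoding) {
  switch (Encoding) {
  case 0x02: return "DW_ATE_boolean";       // bool
  case 0x04: return "DW_ATE_float";         // float, double
  case 0x05: return "DW_ATE_signed";        // short, int, long long
  case 0x06: return "DW_ATE_signed_char";   // char
  case 0x07: return "DW_ATE_unsigned";      // unsigned short/int/long long
  case 0x08: return "DW_ATE_unsigned_char"; // unsigned char
  default:   return "unknown";
  }
}
```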
    1587
    1588
    1589
    1590
    1591
    1592
    1593
    1594
    1595

    1596 C/C++ derived types
    1597
    1598
    1599
    1600
    1601

    Given the following as an example of C/C++ derived type:

    1602
    1603
    1604
    
                      
                    
    1605 typedef const int *IntPtr;
    1606
    1607
    1608
    1609

    a C/C++ front-end would generate the following descriptors:

    1610
    1611
    1612
    
                      
                    
    1613 ;;
    1614 ;; Define the typedef "IntPtr".
    1615 ;;
    1616 !2 = metadata !{
    1617 i32 524310, ;; Tag
    1618 metadata !1, ;; Context
    1619 metadata !"IntPtr", ;; Name
    1620 metadata !3, ;; File
    1621 i32 0, ;; Line number
    1622 i64 0, ;; Size in bits
    1623 i64 0, ;; Align in bits
    1624 i64 0, ;; Offset in bits
    1625 i32 0, ;; Flags
    1626 metadata !4 ;; Derived From type
    1627 }
    1628
    1629 ;;
    1630 ;; Define the pointer type.
    1631 ;;
    1632 !4 = metadata !{
    1633 i32 524303, ;; Tag
    1634 metadata !1, ;; Context
    1635 metadata !"", ;; Name
    1636 metadata !1, ;; File
    1637 i32 0, ;; Line number
    1638 i64 64, ;; Size in bits
    1639 i64 64, ;; Align in bits
    1640 i64 0, ;; Offset in bits
    1641 i32 0, ;; Flags
    1642 metadata !5 ;; Derived From type
    1643 }
    1644 ;;
    1645 ;; Define the const type.
    1646 ;;
    1647 !5 = metadata !{
    1648 i32 524326, ;; Tag
    1649 metadata !1, ;; Context
    1650 metadata !"", ;; Name
    1651 metadata !1, ;; File
    1652 i32 0, ;; Line number
    1653 i64 32, ;; Size in bits
    1654 i64 32, ;; Align in bits
    1655 i64 0, ;; Offset in bits
    1656 i32 0, ;; Flags
    1657 metadata !6 ;; Derived From type
    1658 }
    1659 ;;
    1660 ;; Define the int type.
    1661 ;;
    1662 !6 = metadata !{
    1663 i32 524324, ;; Tag
    1664 metadata !1, ;; Context
    1665 metadata !"int", ;; Name
    1666 metadata !1, ;; File
    1667 i32 0, ;; Line number
    1668 i64 32, ;; Size in bits
    1669 i64 32, ;; Align in bits
    1670 i64 0, ;; Offset in bits
    1671 i32 0, ;; Flags
    1672 i32 5 ;; Encoding
    1673 }
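The descriptor chain above (typedef, then pointer, then const, then int) mirrors how the C++ type itself is built up. The sketch below peels the same layers off `IntPtr` with standard type traits. The alias names `Pointee` and `Base` and the helper `sizeInBits` are introduced here for illustration only, and the sizes it reports depend on the target the code is compiled for.

```cpp
#include <climits>
#include <type_traits>

typedef const int *IntPtr;

// Peel the layers in the same order as the descriptors !2 -> !4 -> !5 -> !6.
using Pointee = std::remove_pointer<IntPtr>::type; // const int
using Base = std::remove_const<Pointee>::type;     // int

static_assert(std::is_pointer<IntPtr>::value, "typedef names a pointer type");
static_assert(std::is_const<Pointee>::value, "pointee is const-qualified");
static_assert(std::is_same<Base, int>::value, "underlying type is int");

// Size of a type in bits, as recorded in the "Size in bits" fields.
template <typename T> constexpr unsigned sizeInBits() {
  return sizeof(T) * CHAR_BIT;
}
```

On a typical 64-bit target, `sizeInBits<IntPtr>()` is 64 and `sizeInBits<Pointee>()` is 32, matching the pointer and const descriptors above.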
    1674
    1675
    1676
    1677
    1678
    1679
    1680

    1681 C/C++ struct/union types
    1682
    1683
    1684
    1685
    1686

    Given the following as an example of C/C++ struct type:

    1687
    1688
    1689
    
                      
                    
    1690 struct Color {
    1691 unsigned Red;
    1692 unsigned Green;
    1693 unsigned Blue;
    1694 };
    1695
    1696
    1697
    1698

    a C/C++ front-end would generate the following descriptors:

    1699
    1700
    1701
    
                      
                    
    1702 ;;
    1703 ;; Define basic type for unsigned int.
    1704 ;;
    1705 !5 = metadata !{
    1706 i32 524324, ;; Tag
    1707 metadata !1, ;; Context
    1708 metadata !"unsigned int",
    1709 metadata !1, ;; File
    1710 i32 0, ;; Line number
    1711 i64 32, ;; Size in Bits
    1712 i64 32, ;; Align in Bits
    1713 i64 0, ;; Offset in Bits
    1714 i32 0, ;; Flags
    1715 i32 7 ;; Encoding
    1716 }
    1717 ;;
    1718 ;; Define composite type for struct Color.
    1719 ;;
    1720 !2 = metadata !{
    1721 i32 524307, ;; Tag
    1722 metadata !1, ;; Context
    1723 metadata !"Color", ;; Name
    1724 metadata !1, ;; Compile unit
    1725 i32 1, ;; Line number
    1726 i64 96, ;; Size in bits
    1727 i64 32, ;; Align in bits
    1728 i64 0, ;; Offset in bits
    1729 i32 0, ;; Flags
    1730 null, ;; Derived From
    1731 metadata !3, ;; Elements
    1732 i32 0 ;; Runtime Language
    1733 }
    1734
    1735 ;;
    1736 ;; Define the Red field.
    1737 ;;
    1738 !4 = metadata !{
    1739 i32 524301, ;; Tag
    1740 metadata !1, ;; Context
    1741 metadata !"Red", ;; Name
    1742 metadata !1, ;; File
    1743 i32 2, ;; Line number
    1744 i64 32, ;; Size in bits
    1745 i64 32, ;; Align in bits
    1746 i64 0, ;; Offset in bits
    1747 i32 0, ;; Flags
    1748 metadata !5 ;; Derived From type
    1749 }
    1750
    1751 ;;
    1752 ;; Define the Green field.
    1753 ;;
    1754 !6 = metadata !{
    1755 i32 524301, ;; Tag
    1756 metadata !1, ;; Context
    1757 metadata !"Green", ;; Name
    1758 metadata !1, ;; File
    1759 i32 3, ;; Line number
    1760 i64 32, ;; Size in bits
    1761 i64 32, ;; Align in bits
    1762 i64 32, ;; Offset in bits
    1763 i32 0, ;; Flags
    1764 metadata !5 ;; Derived From type
    1765 }
    1766
    1767 ;;
    1768 ;; Define the Blue field.
    1769 ;;
    1770 !7 = metadata !{
    1771 i32 524301, ;; Tag
    1772 metadata !1, ;; Context
    1773 metadata !"Blue", ;; Name
    1774 metadata !1, ;; File
    1775 i32 4, ;; Line number
    1776 i64 32, ;; Size in bits
    1777 i64 32, ;; Align in bits
    1778 i64 64, ;; Offset in bits
    1779 i32 0, ;; Flags
    1780 metadata !5 ;; Derived From type
    1781 }
    1782
    1783 ;;
    1784 ;; Define the array of fields used by the composite type Color.
    1785 ;;
    1786 !3 = metadata !{metadata !4, metadata !6, metadata !7}
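The size, alignment, and offset fields in these descriptors can be cross-checked against the layout the compiler actually chooses. The following sketch does this for `struct Color`, assuming a target where `unsigned` is 32 bits wide; the helper name `toBits` is hypothetical.

```cpp
#include <climits>
#include <cstddef>

struct Color {
  unsigned Red;
  unsigned Green;
  unsigned Blue;
};

// Converts a byte count to the bit count used in the descriptors.
inline std::size_t toBits(std::size_t Bytes) { return Bytes * CHAR_BIT; }

// Offsets of the members in bits, matching the "Offset in bits" fields.
inline std::size_t redOffsetInBits() { return toBits(offsetof(Color, Red)); }
inline std::size_t greenOffsetInBits() { return toBits(offsetof(Color, Green)); }
inline std::size_t blueOffsetInBits() { return toBits(offsetof(Color, Blue)); }
```

Under the stated assumption, the three offsets are 0, 32, and 64 bits, and `toBits(sizeof(Color))` is 96, which agrees with the composite type descriptor.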
    1787
    1788
    1789
    1790
    1791
    1792
    1793

    1794 C/C++ enumeration types
    1795
    1796
    1797
    1798
    1799

    Given the following as an example of C/C++ enumeration type:

    1800
    1801
    1802
    
                      
                    
    1803 enum Trees {
    1804 Spruce = 100,
    1805 Oak = 200,
    1806 Maple = 300
    1807 };
    1808
    1809
    1810
    1811

    a C/C++ front-end would generate the following descriptors:

    1812
    1813
    1814
    
                      
                    
    1815 ;;
    1816 ;; Define composite type for enum Trees
    1817 ;;
    1818 !2 = metadata !{
    1819 i32 524292, ;; Tag
    1820 metadata !1, ;; Context
    1821 metadata !"Trees", ;; Name
    1822 metadata !1, ;; File
    1823 i32 1, ;; Line number
    1824 i64 32, ;; Size in bits
    1825 i64 32, ;; Align in bits
    1826 i64 0, ;; Offset in bits
    1827 i32 0, ;; Flags
    1828 null, ;; Derived From type
    1829 metadata !3, ;; Elements
    1830 i32 0 ;; Runtime language
    1831 }
    1832
    1833 ;;
    1834 ;; Define the array of enumerators used by composite type Trees.
    1835 ;;
    1836 !3 = metadata !{metadata !4, metadata !5, metadata !6}
    1837
    1838 ;;
    1839 ;; Define Spruce enumerator.
    1840 ;;
    1841 !4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
    1842
    1843 ;;
    1844 ;; Define Oak enumerator.
    1845 ;;
    1846 !5 = metadata !{i32 524328, metadata !"Oak", i64 200}
    1847
    1848 ;;
    1849 ;; Define Maple enumerator.
    1850 ;;
    1851 !6 = metadata !{i32 524328, metadata !"Maple", i64 300}
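Each enumerator descriptor is a pair of a name and a 64-bit value. The sketch below builds the same list of pairs from the enum in C++. The function name `treeEnumerators` is hypothetical and only illustrates the shape of the data a front-end records.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

enum Trees {
  Spruce = 100,
  Oak = 200,
  Maple = 300
};

// Returns (name, value) pairs in declaration order, mirroring the
// enumerator descriptors !4, !5 and !6.
inline std::vector<std::pair<std::string, std::int64_t>> treeEnumerators() {
  return {{"Spruce", Spruce}, {"Oak", Oak}, {"Maple", Maple}};
}
```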
    1852
    1853
    1854
    1855
    1856
    1857
    1858
    1859
    1860
    1861
    1862

    1863 LLVM Dwarf Extensions
    1864
    1865
    1866
    1867
    1868

    1869 Debugging Information Extension for Objective C Properties
    1870
    1871
    1872
    1873

    1874 Introduction
    1875
    1876
    1877
    1878
    1879

    Objective C provides a simpler way to declare and define accessor methods

    1880 using declared properties. The language provides features to declare a
    1881 property and to let the compiler synthesize accessor methods.
    1882

    1883
    1884

    The debugger lets developers inspect Objective C interfaces and their

    1885 instance variables and class variables. However, the debugger does not know
    1886 anything about the properties defined in Objective C interfaces. The debugger
    1887 consumes information generated by the compiler in DWARF format. The format does
    1888 not support encoding of Objective C properties. This proposal describes DWARF
    1889 extensions to encode Objective C properties, which the debugger can use to let
    1890 developers inspect Objective C properties.
    1891

    1892
    1893
    1894
    1895
    1896
    1897

    1898 Proposal
    1899
    1900
    1901
    1902
    1903

    Objective C properties exist separately from class members. A property

    1904 can be defined only by "setter" and "getter" selectors, and
    1905 be calculated anew on each access. Or a property can just be a direct access
    1906 to some declared ivar. Finally, it can have an ivar "automatically
    1907 synthesized" for it by the compiler, in which case the property can be
    1908 referred to in user code directly using the standard C dereference syntax as
    1909 well as through the property "dot" syntax, but there is no entry in
    1910 the @interface declaration corresponding to this ivar.
    1911

    1912

    1913 To facilitate debugging these properties, we will add a new DWARF TAG into the
    1914 DW_TAG_structure_type definition for the class to hold the description of a
    1915 given property, and a set of DWARF attributes that provide said description.
    1916 The property tag will also contain the name and declared type of the property.
    1917

    1918

    1919 If there is a related ivar, there will also be a DWARF property attribute placed
    1920 in the DW_TAG_member DIE for that ivar referring back to the property TAG for
    1921 that property. And in the case where the compiler synthesizes the ivar directly,
    1922 the compiler is expected to generate a DW_TAG_member for that ivar (with the
    1923 DW_AT_artificial set to 1), whose name will be the name used to access this
    1924 ivar directly in code, and with the property attribute pointing back to the
    1925 property it is backing.
    1926

    1927

    1928 The following examples will serve as illustration for our discussion:
    1929

    1930
    1931
    1932
    
                      
                    
    1933 @interface I1 {
    1934 int n2;
    1935 }
    1936
    1937 @property int p1;
    1938 @property int p2;
    1939 @end
    1940
    1941 @implementation I1
    1942 @synthesize p1;
    1943 @synthesize p2 = n2;
    1944 @end
    1945
    1946
    1947
    1948

    1949 This produces the following DWARF (this is a "pseudo dwarfdump" output):
    1950

    1951
    1952
    
                      
                    
    1953 0x00000100: TAG_structure_type [7] *
    1954 AT_APPLE_runtime_class( 0x10 )
    1955 AT_name( "I1" )
    1956 AT_decl_file( "Objc_Property.m" )
    1957 AT_decl_line( 3 )
    1958
    1959 0x00000110: TAG_APPLE_property
    1960 AT_name ( "p1" )
    1961 AT_type ( {0x00000150} ( int ) )
    1962
    1963 0x00000120: TAG_APPLE_property
    1964 AT_name ( "p2" )
    1965 AT_type ( {0x00000150} ( int ) )
    1966
    1967 0x00000130: TAG_member [8]
    1968 AT_name( "_p1" )
    1969 AT_APPLE_property ( {0x00000110} "p1" )
    1970 AT_type( {0x00000150} ( int ) )
    1971 AT_artificial ( 0x1 )
    1972
    1973 0x00000140: TAG_member [8]
    1974 AT_name( "n2" )
    1975 AT_APPLE_property ( {0x00000120} "p2" )
    1976 AT_type( {0x00000150} ( int ) )
    1977
    1978 0x00000150: AT_type( ( int ) )
    1979
    1980
    1981
    1982

    Note, the current convention is that the name of the ivar for an

    1983 auto-synthesized property is the name of the property from which it derives with
    1984 an underscore prepended, as is shown in the example.
    1985 But we actually don't need to know this convention, since we are given the name
    1986 of the ivar directly.
    1987

    1988
    1989

    1990 Also, it is common practice in ObjC to have different property declarations in
    1991 the @interface and @implementation - e.g. to provide a read-only property in
    1992 the interface, and a read-write property in the implementation. In that case,
    1993 the compiler should emit whichever property declaration will be in force in the
    1994 current translation unit.
    1995

    1996
    1997

    Developers can decorate a property with attributes which are encoded using
    DW_AT_APPLE_property_attribute.

    @property (readonly, nonatomic) int pr;

    Which produces a property tag:

    TAG_APPLE_property [8]
      AT_name( "pr" )
      AT_type ( {0x00000147} (int) )
      AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)

    The setter and getter method names are attached to the property using
    DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes.

    @interface I1
    @property (setter=myOwnP3Setter:) int p3;
    -(void)myOwnP3Setter:(int)a;
    @end

    @implementation I1
    @synthesize p3;
    -(void)myOwnP3Setter:(int)a{ }
    @end

    The DWARF for this would be:

    0x000003bd:  TAG_structure_type [7] *
                   AT_APPLE_runtime_class( 0x10 )
                   AT_name( "I1" )
                   AT_decl_file( "Objc_Property.m" )
                   AT_decl_line( 3 )

    0x000003cd:    TAG_APPLE_property
                     AT_name ( "p3" )
                     AT_APPLE_property_setter ( "myOwnP3Setter:" )
                     AT_type( {0x00000147} ( int ) )

    0x000003f3:    TAG_member [8]
                     AT_name( "_p3" )
                     AT_type ( {0x00000147} ( int ) )
                     AT_APPLE_property ( {0x000003cd} )
                     AT_artificial ( 0x1 )

    New DWARF Tags

    TAG                      Value
    DW_TAG_APPLE_property    0x4200

    New DWARF Attributes

    Attribute                         Value     Classes
    DW_AT_APPLE_property              0x3fed    Reference
    DW_AT_APPLE_property_getter       0x3fe9    String
    DW_AT_APPLE_property_setter       0x3fea    String
    DW_AT_APPLE_property_attribute    0x3feb    Constant

    New DWARF Constants

    Name                              Value
    DW_AT_APPLE_PROPERTY_readonly     0x1
    DW_AT_APPLE_PROPERTY_readwrite    0x2
    DW_AT_APPLE_PROPERTY_assign       0x4
    DW_AT_APPLE_PROPERTY_retain       0x8
    DW_AT_APPLE_PROPERTY_copy         0x10
    DW_AT_APPLE_PROPERTY_nonatomic    0x20
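
    The constant values listed above are distinct bits, so one plausible reading
    is that DW_AT_APPLE_property_attribute carries their bitwise OR. The sketch
    below assumes that interpretation. The enumerator and function names are
    hypothetical and local to the example; they are not taken from an LLVM
    header.

```cpp
#include <cstdint>
#include <string>

// Hypothetical local names; the values are copied from the constants table.
enum ApplePropertyAttr : uint32_t {
  kPropReadOnly = 0x1,
  kPropReadWrite = 0x2,
  kPropAssign = 0x4,
  kPropRetain = 0x8,
  kPropCopy = 0x10,
  kPropNonAtomic = 0x20,
};

// Renders an attribute mask as a comma-separated list of attribute names.
// Assumes the attribute value is a bitwise OR of the constants above.
std::string describePropertyAttrs(uint32_t mask) {
  static const struct {
    uint32_t bit;
    const char *name;
  } kNames[] = {
      {kPropReadOnly, "readonly"}, {kPropReadWrite, "readwrite"},
      {kPropAssign, "assign"},     {kPropRetain, "retain"},
      {kPropCopy, "copy"},         {kPropNonAtomic, "nonatomic"},
  };
  std::string out;
  for (const auto &entry : kNames) {
    if ((mask & entry.bit) == 0)
      continue;
    if (!out.empty())
      out += ", ";
    out += entry.name;
  }
  return out;
}
```

    Under this assumption, the "pr" property from the earlier example would
    carry the value 0x21, which the helper renders as "readonly, nonatomic".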

    Name Accelerator Tables

    Introduction

    The .debug_pubnames and .debug_pubtypes formats are not what a debugger
    needs. The "pub" in the section name indicates that the entries in the
    table are publicly visible names only. This means no static or hidden
    functions show up in the .debug_pubnames. No static variables or private
    class variables are in the .debug_pubtypes. Many compilers add different
    things to these tables, so we can't rely upon the contents between gcc, icc,
    or clang.

    The typical query given by users tends not to match up with the contents of
    these tables. For example, the DWARF spec states that "In the case of the
    name of a function member or static data member of a C++ structure, class or
    union, the name presented in the .debug_pubnames section is not the simple
    name given by the DW_AT_name attribute of the referenced debugging
    information entry, but rather the fully qualified name of the data or
    function member." So the only names in these tables for complex C++ entries
    are fully qualified names. Debugger users tend not to enter their search
    strings as "a::b::c(int,const Foo&) const", but rather as "c", "b::c", or
    "a::b::c". So the name entered in the name table must be demangled in order
    to chop it up appropriately, and additional names must be manually entered
    into the table to make it effective as a name lookup table for debuggers to
    use.

    All debuggers currently ignore the .debug_pubnames table as a result of
    its inconsistent and useless public-only name content, which makes it a
    waste of space in the object file. These tables, when they are written to
    disk, are not sorted in any way, leaving every debugger to do its own
    parsing and sorting. These tables also include an inlined copy of the string
    values in the table itself, making the tables much larger than they need to
    be on disk, especially for large C++ programs.

    Can't we just fix the sections by adding all of the names we need to this
    table? No, because that is not what the tables are defined to contain and we
    won't know the difference between the old bad tables and the new good
    tables. At best we could make our own renamed sections that contain all of
    the data we need.

    These tables are also insufficient for what a debugger like LLDB needs.
    LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is
    then often asked to look for type "foo" or namespace "bar", or list items in
    namespace "baz". Namespaces are not included in the pubnames or pubtypes
    tables. Since clang asks a lot of questions when it is parsing an
    expression, we need to be very fast when looking up names, as it happens a
    lot. Having new accelerator tables that are optimized for very quick lookups
    will benefit this type of debugging experience greatly.

    We would like to generate name lookup tables that can be mapped into
    memory from disk, and used as is, with little or no up-front parsing. We
    would also like to control the exact content of these different tables so
    they contain exactly what we need. The Name Accelerator Tables were designed
    to fix these issues. In order to solve these issues we need to:

  • Have a format that can be mapped into memory from disk and used as is
  • Lookups should be very fast
  • Extensible table format so these tables can be made by many producers
  • Contain all of the names needed for typical lookups out of the box
  • Strict rules for the contents of tables

    Table size is important and the accelerator table format should allow the
    reuse of strings from common string tables so the strings for the names are
    not duplicated. We also want to make sure the table is ready to be used
    as-is by simply mapping the table into memory with minimal header parsing.

    The name lookups need to be fast and optimized for the kinds of lookups
    that debuggers tend to do. Optimally we would like to touch as few parts of
    the mapped table as possible when doing a name lookup and be able to quickly
    find the name entry we are looking for, or discover there are no matches.
    Because most debugger lookups fail, we optimized for the failed lookup case.

    Each table that is defined should have strict rules on exactly what is in
    the accelerator tables, and those rules should be documented so clients can
    rely on the content.

    Hash Tables

    Standard Hash Tables

    Typical hash tables have a header, buckets, and each bucket points to the
    bucket contents:

    .------------.
    |  HEADER    |
    |------------|
    |  BUCKETS   |
    |------------|
    |  DATA      |
    `------------'

    The BUCKETS are an array of offsets to DATA for each hash:

    .------------.
    | 0x00001000 | BUCKETS[0]
    | 0x00002000 | BUCKETS[1]
    | 0x00002200 | BUCKETS[2]
    | 0x000034f0 | BUCKETS[3]
    |            | ...
    | 0xXXXXXXXX | BUCKETS[n_buckets]
    '------------'

    So for bucket[3] in the example above, we have an offset into the table,
    0x000034f0, which points to a chain of entries for the bucket. Each entry in
    the chain must contain a next pointer, the full 32 bit hash value, the
    string itself, and the data for the current string value.

                .------------.
    0x000034f0: | 0x00003500 | next pointer
                | 0x12345678 | 32 bit hash
                | "erase"    | string value
                | data[n]    | HashData for this bucket
                |------------|
    0x00003500: | 0x00003550 | next pointer
                | 0x29273623 | 32 bit hash
                | "dump"     | string value
                | data[n]    | HashData for this bucket
                |------------|
    0x00003550: | 0x00000000 | next pointer
                | 0x82638293 | 32 bit hash
                | "main"     | string value
                | data[n]    | HashData for this bucket
                `------------'

    The problem with this layout for debuggers is that we need to optimize for
    the negative lookup case where the symbol we're searching for is not
    present. So if we were to look up "printf" in the table above, we would
    compute a 32 bit hash for "printf", and it might match bucket[3]. We would
    need to go to the offset 0x000034f0 and start looking to see if our 32 bit
    hash matches. To do so, we need to read the next pointer, then read the
    hash, compare it, and skip to the next entry. Each time we are skipping many
    bytes in memory and touching new cache pages just to do the compare on the
    full 32 bit hash. All of these accesses then tell us that we didn't have a
    match.
    Name Hash Tables

    To solve the issues mentioned above we have structured the hash tables
    a bit differently: a header, buckets, an array of all unique 32 bit hash
    values, followed by an array of hash value data offsets, one for each hash
    value, then the data for all hash values:

    .-------------.
    |  HEADER     |
    |-------------|
    |  BUCKETS    |
    |-------------|
    |  HASHES     |
    |-------------|
    |  OFFSETS    |
    |-------------|
    |  DATA       |
    `-------------'

    The BUCKETS in the name tables are an index into the HASHES array. By
    making all of the full 32 bit hash values contiguous in memory, we allow
    ourselves to efficiently check for a match while touching as little
    memory as possible. Most often checking the 32 bit hash values is as far as
    the lookup goes. If it does match, it usually is a match with no collisions.
    So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash
    values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:

    .-------------------------.
    |  HEADER.magic           | uint32_t
    |  HEADER.version         | uint16_t
    |  HEADER.hash_function   | uint16_t
    |  HEADER.bucket_count    | uint32_t
    |  HEADER.hashes_count    | uint32_t
    |  HEADER.header_data_len | uint32_t
    |  HEADER_DATA            | HeaderData
    |-------------------------|
    |  BUCKETS                | uint32_t[bucket_count] // 32 bit hash indexes
    |-------------------------|
    |  HASHES                 | uint32_t[hashes_count] // 32 bit hash values
    |-------------------------|
    |  OFFSETS                | uint32_t[hashes_count] // 32 bit offsets to hash value data
    |-------------------------|
    |  ALL HASH DATA          |
    `-------------------------'

    So taking the exact same data from the standard hash example above we end up
    with:

                .------------.
                | HEADER     |
                |------------|
                |          0 | BUCKETS[0]
                |          2 | BUCKETS[1]
                |          5 | BUCKETS[2]
                |          6 | BUCKETS[3]
                |            | ...
                |        ... | BUCKETS[n_buckets]
                |------------|
                | 0x........ | HASHES[0]
                | 0x........ | HASHES[1]
                | 0x........ | HASHES[2]
                | 0x........ | HASHES[3]
                | 0x........ | HASHES[4]
                | 0x........ | HASHES[5]
                | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
                | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
                | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
                | 0x........ | HASHES[9]
                | 0x........ | HASHES[10]
                | 0x........ | HASHES[11]
                | 0x........ | HASHES[12]
                | 0x........ | HASHES[13]
                | 0x........ | HASHES[n_hashes]
                |------------|
                | 0x........ | OFFSETS[0]
                | 0x........ | OFFSETS[1]
                | 0x........ | OFFSETS[2]
                | 0x........ | OFFSETS[3]
                | 0x........ | OFFSETS[4]
                | 0x........ | OFFSETS[5]
                | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
                | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
                | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
                | 0x........ | OFFSETS[9]
                | 0x........ | OFFSETS[10]
                | 0x........ | OFFSETS[11]
                | 0x........ | OFFSETS[12]
                | 0x........ | OFFSETS[13]
                | 0x........ | OFFSETS[n_hashes]
                |------------|
                |            |
                |            |
                |            |
                |            |
                |            |
                |------------|
    0x000034f0: | 0x00001203 | .debug_str ("erase")
                | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
                | 0x........ | HashData[0]
                | 0x........ | HashData[1]
                | 0x........ | HashData[2]
                | 0x........ | HashData[3]
                | 0x00000000 | String offset into .debug_str (terminate data for hash)
                |------------|
    0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
                | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
                | 0x........ | HashData[0]
                | 0x........ | HashData[1]
                | 0x00001203 | String offset into .debug_str ("dump")
                | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
                | 0x........ | HashData[0]
                | 0x........ | HashData[1]
                | 0x........ | HashData[2]
                | 0x00000000 | String offset into .debug_str (terminate data for hash)
                |------------|
    0x00003550: | 0x00001203 | String offset into .debug_str ("main")
                | 0x00000009 | A 32 bit array count - number of HashData with name "main"
                | 0x........ | HashData[0]
                | 0x........ | HashData[1]
                | 0x........ | HashData[2]
                | 0x........ | HashData[3]
                | 0x........ | HashData[4]
                | 0x........ | HashData[5]
                | 0x........ | HashData[6]
                | 0x........ | HashData[7]
                | 0x........ | HashData[8]
                | 0x00000000 | String offset into .debug_str (terminate data for hash)
                `------------'

    So we still have all of the same data, we just organize it more efficiently
    for debugger lookup. If we repeat the same "printf" lookup from above, we
    would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash
    value modulo n_buckets. BUCKETS[3] contains "6" which is the index into the
    HASHES table. We would then compare any consecutive 32 bit hash values in
    the HASHES array as long as the hashes would be in BUCKETS[3]. We do this by
    verifying that each subsequent hash value modulo n_buckets is still 3. In
    the case of a failed lookup we would access the memory for BUCKETS[3], and
    then compare a few consecutive 32 bit hashes before we know that we have no
    match. We don't end up marching through multiple words of memory and we
    really keep the number of processor data cache lines being accessed as small
    as possible.
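
    The lookup just described can be sketched as follows. This is a minimal
    illustration rather than LLDB's actual code: NameTableView and
    findHashDataOffset are hypothetical names, the arrays are assumed to be
    already decoded into host byte order, and UINT32_MAX is treated as the
    empty-bucket marker this section describes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of the BUCKETS, HASHES and OFFSETS arrays.
struct NameTableView {
  std::vector<uint32_t> buckets; // index into hashes[], or kEmptyBucket
  std::vector<uint32_t> hashes;  // unique 32 bit hashes, grouped by bucket
  std::vector<uint32_t> offsets; // offsets[i] is the hash data for hashes[i]
};

const uint32_t kEmptyBucket = UINT32_MAX;
const uint32_t kNotFound = UINT32_MAX;

// Returns the hash data offset recorded for `hash`, or kNotFound.
uint32_t findHashDataOffset(const NameTableView &t, uint32_t hash) {
  assert(t.hashes.size() == t.offsets.size());
  const uint32_t nBuckets = static_cast<uint32_t>(t.buckets.size());
  if (nBuckets == 0)
    return kNotFound;
  const uint32_t bucket = hash % nBuckets;
  uint32_t i = t.buckets[bucket];
  if (i == kEmptyBucket)
    return kNotFound;
  // Scan consecutive hashes while they still belong to this bucket.
  for (; i < t.hashes.size(); ++i) {
    if (t.hashes[i] == hash)
      return t.offsets[i];
    if (t.hashes[i] % nBuckets != bucket)
      break; // reached the next bucket's hashes: no match
  }
  return kNotFound;
}
```

    A failed lookup reads one bucket word and a short run of adjacent hash
    words, which is the cache behavior the layout is designed for. A hit still
    requires comparing the name against the strings stored in the hash data,
    since two different names can share a 32 bit hash.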

    The string hash that is used for these lookup tables is the Daniel J.
    Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very
    good hash for all kinds of names in programs with very few hash collisions.
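
    For reference, the commonly published form of the Bernstein string hash is
    sketched below. The seed 5381 and the multiplier 33 are the values used by
    that common variant; that the tables use exactly this variant is an
    assumption here, and djbHash is a name chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Bernstein (DJB) string hash: start at 5381, then for each byte compute
// h = h * 33 + byte. Arithmetic wraps modulo 2^32.
uint32_t djbHash(const char *s) {
  assert(s != nullptr);
  uint32_t h = 5381;
  for (; *s != '\0'; ++s)
    h = h * 33 + static_cast<unsigned char>(*s);
  return h;
}
```

    The bucket for a name is then djbHash(name) modulo the bucket count, as in
    the "printf" walk-through in this section.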

    Empty buckets are designated by using an invalid hash index of UINT32_MAX.

    Details

    These name hash tables are designed to be generic where specializations of
    the table get to define additional data that goes into the header
    ("HeaderData"), how the string value is stored ("KeyType") and the content
    of the data for each hash value.
    Header Layout

    The header has a fixed part, and the specialized part. The exact format of
    the header is:

    struct Header
    {
      uint32_t   magic;           // 'HASH' magic value to allow endian detection
      uint16_t   version;         // Version number
      uint16_t   hash_function;   // The hash function enumeration that was used
      uint32_t   bucket_count;    // The number of buckets in this hash table
      uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
      uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                                  // Specifically the length of the following HeaderData field - this does not
                                  // include the size of the preceding fields
      HeaderData header_data;     // Implementation specific header data
    };

    The header starts with a 32 bit "magic" value which must be 'HASH' encoded
    as an ASCII integer. This allows the detection of the start of the hash
    table and also allows the table's byte order to be determined so the table
    can be correctly extracted. The "magic" value is followed by a 16 bit
    version number which allows the table to be revised and modified in the
    future. The current version number is 1. "hash_function" is a uint16_t
    enumeration that specifies which hash function was used to produce this
    table. The current values for the hash function enumerations include:

    enum HashFunctionType
    {
      eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
    };

    "bucket_count" is a 32 bit unsigned integer that represents how many buckets
    are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
    values that are in the HASHES array, and is also the number of offsets
    contained in the OFFSETS array. "header_data_len" specifies the size in
    bytes of the HeaderData that is filled in by specialized versions of this
    table.
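
    The byte-order detection that the magic value enables could look like the
    sketch below. It assumes 'HASH' is packed as the integer 0x48415348 ('H' is
    0x48, 'A' is 0x41, 'S' is 0x53), which is one reading of "encoded as an
    ASCII integer". TableOrder, byteSwap32 and detectByteOrder are hypothetical
    names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: 'H','A','S','H' packed most significant byte first.
const uint32_t kHashMagic = 0x48415348u;

enum class TableOrder { Native, Swapped, Invalid };

// Reverses the four bytes of a 32 bit value.
uint32_t byteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) |
         (v << 24);
}

// Inspects the first four bytes of a mapped table and reports whether the
// table was written in the host's byte order, the opposite order, or is not
// a name table at all.
TableOrder detectByteOrder(const unsigned char *table) {
  assert(table != nullptr);
  uint32_t magic;
  std::memcpy(&magic, table, sizeof(magic)); // avoids unaligned access
  if (magic == kHashMagic)
    return TableOrder::Native;
  if (byteSwap32(magic) == kHashMagic)
    return TableOrder::Swapped;
  return TableOrder::Invalid;
}
```

    A reader that gets Swapped back would byte-swap every multi-byte field it
    extracts from the table afterwards.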
    Fixed Lookup

    The header is followed by the buckets, hashes, offsets, and hash value
    data.

    struct FixedTable
    {
      uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
      uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
      uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
    };

    "buckets" is an array of 32 bit indexes into the "hashes" array. The
    "hashes" array contains all of the 32 bit hash values for all names in the
    hash table. Each hash in the "hashes" table has an offset in the "offsets"
    array that points to the data for the hash value.
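
    Because every array has a size given by the header, the position of each
    array in a mapped table can be computed directly. The sketch below assumes
    the arrays are packed back to back right after the HeaderData with no extra
    padding; FixedHeader, TableLayout and computeLayout are hypothetical names.

```cpp
#include <cstdint>

// The fixed part of the header, without the trailing HeaderData.
struct FixedHeader {
  uint32_t magic;
  uint16_t version;
  uint16_t hash_function;
  uint32_t bucket_count;
  uint32_t hashes_count;
  uint32_t header_data_len;
};
static_assert(sizeof(FixedHeader) == 20, "fixed header must be 20 bytes");

// Byte offsets of each array, measured from the start of the table.
struct TableLayout {
  uint64_t buckets;
  uint64_t hashes;
  uint64_t offsets;
  uint64_t data;
};

// Assumes no padding between the arrays.
TableLayout computeLayout(const FixedHeader &h) {
  TableLayout l;
  l.buckets = sizeof(FixedHeader) + h.header_data_len;
  l.hashes = l.buckets + 4ull * h.bucket_count;
  l.offsets = l.hashes + 4ull * h.hashes_count;
  l.data = l.offsets + 4ull * h.hashes_count;
  return l;
}
```

    Nothing beyond the 20 byte fixed header has to be parsed before lookups can
    start, which is what makes the table usable directly from mapped memory.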

    This table setup makes it very easy to repurpose these tables to contain
    different data, while keeping the lookup mechanism the same for all tables.
    This layout also makes it possible to save the table to disk and map it in
    later and do very efficient name lookups with little or no parsing.

    DWARF lookup tables can be implemented in a variety of ways and can store
    a lot of information for each name. We want to make the DWARF tables
    extensible and able to store the data efficiently so we have used some of
    the DWARF features that enable efficient data storage to define exactly what
    kind of data we store for each name.

    The "HeaderData" contains a definition of the contents of each HashData
    chunk. We might want to store an offset to all of the debug information
    entries (DIEs) for each name. To keep things extensible, we create a list of
    items, or Atoms, that are contained in the data for each name. First comes
    the type of the data in each atom:

    enum AtomType
    {
      eAtomTypeNULL       = 0u,
      eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
      eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
      eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
      eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
      eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
    };

    The enumeration values and their meanings are:

    eAtomTypeNULL       - a termination atom that specifies the end of the atom list
    eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
    eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
    eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
    eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
    eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)

    Then we allow each atom to specify its type and how the data for that atom
    type is encoded:

    struct Atom
    {
      uint16_t type;  // AtomType enum value
      uint16_t form;  // DWARF DW_FORM_XXX defines
    };

    The "form" type above is from the DWARF specification and defines the
    exact encoding of the data for the Atom type. See the DWARF specification
    for the DW_FORM_ definitions.

    struct HeaderData
    {
      uint32_t die_offset_base;
      uint32_t atom_count;
      Atom     atoms[atom_count];
    };

    "HeaderData" defines the base DIE offset that should be added to any atoms
    that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
    DW_FORM_ref8 or DW_FORM_ref_udata forms. It also defines what is contained
    in each "HashData" object: Atom.form tells us how large each field will be
    in the HashData and Atom.type tells us how this data should be interpreted.
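
    To illustrate how Atom.form determines the size of each HashData, the
    sketch below adds up the widths of the fixed-size forms. The form codes are
    the ones assigned by the DWARF specification; formSize and hashDataSize are
    hypothetical helpers, and variable-length forms such as DW_FORM_ref_udata
    are deliberately left unsupported.

```cpp
#include <cstdint>
#include <vector>

struct Atom {
  uint16_t type; // AtomType enum value
  uint16_t form; // DW_FORM_* code
};

// Fixed-size form codes from the DWARF specification (subset).
constexpr uint16_t DW_FORM_data2 = 0x05;
constexpr uint16_t DW_FORM_data4 = 0x06;
constexpr uint16_t DW_FORM_data8 = 0x07;
constexpr uint16_t DW_FORM_data1 = 0x0b;
constexpr uint16_t DW_FORM_ref1 = 0x11;
constexpr uint16_t DW_FORM_ref2 = 0x12;
constexpr uint16_t DW_FORM_ref4 = 0x13;
constexpr uint16_t DW_FORM_ref8 = 0x14;

// Size in bytes of a fixed-size form, or 0 if this sketch does not handle it.
uint32_t formSize(uint16_t form) {
  switch (form) {
  case DW_FORM_data1:
  case DW_FORM_ref1:
    return 1;
  case DW_FORM_data2:
  case DW_FORM_ref2:
    return 2;
  case DW_FORM_data4:
  case DW_FORM_ref4:
    return 4;
  case DW_FORM_data8:
  case DW_FORM_ref8:
    return 8;
  default:
    return 0;
  }
}

// Size in bytes of one HashData described by `atoms`, or 0 if any atom uses
// a form that formSize() does not handle.
uint32_t hashDataSize(const std::vector<Atom> &atoms) {
  uint32_t total = 0;
  for (const Atom &a : atoms) {
    const uint32_t size = formSize(a.form);
    if (size == 0)
      return 0;
    total += size;
  }
  return total;
}
```

    With a single DIE offset atom encoded as DW_FORM_data4, each HashData is
    four bytes wide, so a reader can step through an array of HashData without
    decoding each element.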

    For the current implementations of the ".apple_names" (all functions +
    globals), the ".apple_types" (names of all types that are defined), and the
    ".apple_namespaces" (all namespaces), we currently set the Atom array to be:

    HeaderData.atom_count = 1;
    HeaderData.atoms[0].type = eAtomTypeDIEOffset;
    HeaderData.atoms[0].form = DW_FORM_data4;

    This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
    encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
    multiple matching DIEs in a single file, which could come up with an inlined
    function for instance. Future tables could include more information about
    the DIE such as flags indicating if the DIE is a function, method, block,
    or inlined.

    The KeyType for the DWARF table is a 32 bit string table offset into the
    ".debug_str" table. The ".debug_str" is the string table for the DWARF which
    may already contain copies of all of the strings. This helps make sure, with
    help from the compiler, that we reuse the strings between all of the DWARF
    sections and keeps the hash table size down. Another benefit to having the
    compiler generate all strings as DW_FORM_strp in the debug info, is that
    DWARF parsing can be made much faster.

    After a lookup is made, we get an offset into the hash data. The hash data
    needs to be able to deal with 32 bit hash collisions, so the chunk of data
    at the offset in the hash data consists of a triple:

    uint32_t str_offset
    uint32_t hash_data_count
    HashData[hash_data_count]

    If "str_offset" is zero, then the bucket contents are done. 99.9% of the
    hash data chunks contain a single item (no 32 bit hash collision):

    .------------.
    | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
    | 0x00000004 | uint32_t HashData count
    | 0x........ | uint32_t HashData[0] DIE offset
    | 0x........ | uint32_t HashData[1] DIE offset
    | 0x........ | uint32_t HashData[2] DIE offset
    | 0x........ | uint32_t HashData[3] DIE offset
    | 0x00000000 | uint32_t KeyType (end of hash chain)
    `------------'

    If there are collisions, you will have multiple valid string offsets:

    .------------.
    | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
    | 0x00000004 | uint32_t HashData count
    | 0x........ | uint32_t HashData[0] DIE offset
    | 0x........ | uint32_t HashData[1] DIE offset
    | 0x........ | uint32_t HashData[2] DIE offset
    | 0x........ | uint32_t HashData[3] DIE offset
    | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
    | 0x00000002 | uint32_t HashData count
    | 0x........ | uint32_t HashData[0] DIE offset
    | 0x........ | uint32_t HashData[1] DIE offset
    | 0x00000000 | uint32_t KeyType (end of hash chain)
    `------------'

    Current testing with real world C++ binaries has shown that there is around
    one 32 bit hash collision per 100,000 name entries.
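
    Resolving a collision means walking the triples until the string at
    "str_offset" matches the requested name or the zero terminator is reached.
    The sketch below assumes each HashData is a single 32 bit DIE offset (the
    atom layout described for the current tables) and that the chunk has already
    been decoded into host-order 32 bit words. findDIEOffsets and its parameters
    are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Walks the hash data that starts at words[start] and returns the DIE offsets
// stored for `name`, or an empty vector if the name is not in the chain.
// `debugStr` holds the raw bytes of the .debug_str section.
std::vector<uint32_t> findDIEOffsets(const std::vector<uint32_t> &words,
                                     size_t start,
                                     const std::string &debugStr,
                                     const std::string &name) {
  assert(start <= words.size());
  std::vector<uint32_t> result;
  size_t i = start;
  // A str_offset of zero terminates the chain.
  while (i + 1 < words.size() && words[i] != 0) {
    const uint32_t strOffset = words[i];
    const uint32_t count = words[i + 1];
    const size_t first = i + 2;
    if (count > words.size() - first)
      break; // truncated data, stop rather than read out of bounds
    if (strOffset < debugStr.size() &&
        name == debugStr.c_str() + strOffset) {
      result.assign(words.begin() + first, words.begin() + first + count);
      return result;
    }
    i = first + count; // skip this name's HashData array
  }
  return result;
}
```

    In the common case with no collision the loop body runs once, so the string
    comparison is the only extra work after the hash match.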

    Contents

    As we said, we want to strictly define exactly what is included in the
    different tables. For DWARF, we have 3 tables: ".apple_names",
    ".apple_types", and ".apple_namespaces".

    ".apple_names" sections should contain an entry for each DWARF DIE whose
    DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram
    that has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
    DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
    in the location (global and static variables). All global and static
    variables should be included, including those scoped within functions and
    classes. For example using the following code:

    static int var = 0;

    void f ()
    {
      static int var = 0;
    }

    Both of the static "var" variables would be included in the table. All
    functions should emit both their full names and their basenames. For C or
    C++, the full name is the mangled name (if available) which is usually in
    the DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the
    function basename. If global or static variables have a mangled name in a
    DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
    simple name found in the DW_AT_name attribute.

    ".apple_types" sections should contain an entry for each DWARF DIE whose
    tag is one of:

  • DW_TAG_array_type
  • DW_TAG_class_type
  • DW_TAG_enumeration_type
  • DW_TAG_pointer_type
  • DW_TAG_reference_type
  • DW_TAG_string_type
  • DW_TAG_structure_type
  • DW_TAG_subroutine_type
  • DW_TAG_typedef
  • DW_TAG_union_type
  • DW_TAG_ptr_to_member_type
  • DW_TAG_set_type
  • DW_TAG_subrange_type
  • DW_TAG_base_type
  • DW_TAG_const_type
  • DW_TAG_constant
  • DW_TAG_file_type
  • DW_TAG_namelist
  • DW_TAG_packed_type
  • DW_TAG_volatile_type
  • DW_TAG_restrict_type
  • DW_TAG_interface_type
  • DW_TAG_unspecified_type
  • DW_TAG_shared_type

    Only entries with a DW_AT_name attribute are included, and the entry must
    not be a forward declaration (DW_AT_declaration attribute with a non-zero
    value). For example, using the following code:

    int main ()
    {
      int *b = 0;
      return *b;
    }

    We get a few type DIEs:

        0x00000067: TAG_base_type [5]
                        AT_encoding( DW_ATE_signed )
                        AT_name( "int" )
                        AT_byte_size( 0x04 )

        0x0000006e: TAG_pointer_type [6]
                        AT_type( {0x00000067} ( int ) )
                        AT_byte_size( 0x08 )

    The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.
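
    The inclusion rule above can be sketched as a small predicate. This is
    only an illustration: the dictionary layout used for a DIE and the
    function name are hypothetical and are not part of any LLVM or DWARF API.

```python
# Tags listed above as eligible for the ".apple_types" table.
APPLE_TYPE_TAGS = frozenset({
    "DW_TAG_array_type", "DW_TAG_class_type", "DW_TAG_enumeration_type",
    "DW_TAG_pointer_type", "DW_TAG_reference_type", "DW_TAG_string_type",
    "DW_TAG_structure_type", "DW_TAG_subroutine_type", "DW_TAG_typedef",
    "DW_TAG_union_type", "DW_TAG_ptr_to_member_type", "DW_TAG_set_type",
    "DW_TAG_subrange_type", "DW_TAG_base_type", "DW_TAG_const_type",
    "DW_TAG_constant", "DW_TAG_file_type", "DW_TAG_namelist",
    "DW_TAG_packed_type", "DW_TAG_volatile_type", "DW_TAG_restrict_type",
    "DW_TAG_interface_type", "DW_TAG_unspecified_type", "DW_TAG_shared_type",
})


def belongs_in_apple_types(die):
    """Return True if a DIE should get an ".apple_types" entry.

    `die` is a hypothetical dict: {"tag": str, "attrs": {attr_name: value}}.
    """
    attrs = die.get("attrs", {})
    if die["tag"] not in APPLE_TYPE_TAGS:
        return False
    if "DW_AT_name" not in attrs:
        return False
    # A non-zero DW_AT_declaration marks a forward declaration.
    return not attrs.get("DW_AT_declaration", 0)


# The two DIEs from the example above.
int_die = {"tag": "DW_TAG_base_type",
           "attrs": {"DW_AT_name": "int", "DW_AT_byte_size": 4}}
ptr_die = {"tag": "DW_TAG_pointer_type",
           "attrs": {"DW_AT_byte_size": 8}}
```

    With these inputs the predicate accepts the "int" base type and rejects
    the unnamed pointer type, matching the result described above.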


    The ".apple_namespaces" section should contain all DW_TAG_namespace DIEs. If
    we run into a namespace that has no name, this is an anonymous namespace,
    and the name should be output as "(anonymous namespace)" (without the quotes).
    Why? This matches the output of abi::__cxa_demangle(), the function in the
    standard C++ library that demangles mangled names.

    Language Extensions and File Format Changes

    Objective-C Extensions

    The ".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
    Objective-C class. The name used in the hash table is the name of the
    Objective-C class itself. If the Objective-C class has a category, then an
    entry is made for both the class name without the category, and for the class
    name with the category. So if we have a DIE at offset 0x1234 with a method
    name of "-[NSString(my_additions) stringWithSpecialString:]", we would add
    an entry for "NSString" that points to DIE 0x1234, and an entry for
    "NSString(my_additions)" that points to 0x1234. This allows us to quickly
    track down all Objective-C methods for an Objective-C class when doing
    expressions. It is needed because of the dynamic nature of Objective-C, where
    anyone can add methods to a class. The DWARF for Objective-C methods is also
    emitted differently from C++ classes: the methods are not usually
    contained in the class definition but are scattered across one or more
    compile units. Categories can also be defined in different shared libraries.
    So we need to be able to quickly find all of the methods and class functions
    given the Objective-C class name, or quickly find all methods and class
    functions for a class + category name. This table does not contain any selector
    names; it just maps Objective-C class names (or class names + category) to all
    of the methods and class functions. The selectors are added as function
    basenames in the .debug_names section.


    In the ".apple_names" section for Objective-C functions, the full name is the
    entire function name with the brackets ("-[NSString stringWithCString:]") and the
    basename is the selector only ("stringWithCString:").
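
    As a rough sketch of the naming rules above, the helper below splits an
    Objective-C method name into the keys that would go into ".apple_objc"
    and ".apple_names". The function name, the returned dictionary shape, and
    the regular expression are assumptions made for illustration; they are
    not taken from LLVM or LLDB.

```python
import re

# Matches names such as "-[NSString(my_additions) stringWithSpecialString:]".
# Group 1: class name, group 2: optional category, group 3: selector.
_OBJC_METHOD = re.compile(r"^[-+]\[(\w+)(?:\((\w+)\))?\s+(.+)\]$")


def objc_accelerator_names(full_name):
    """Return the hash-table keys for one Objective-C method name."""
    match = _OBJC_METHOD.match(full_name)
    if match is None:
        raise ValueError("not an Objective-C method name: %r" % full_name)
    class_name, category, selector = match.groups()

    # ".apple_objc": class name, plus "Class(category)" when a category exists.
    objc_keys = [class_name]
    if category is not None:
        objc_keys.append("%s(%s)" % (class_name, category))

    # ".apple_names": the bracketed full name and the selector as basename.
    return {"apple_objc": objc_keys, "apple_names": [full_name, selector]}
```

    For "-[NSString(my_additions) stringWithSpecialString:]" this yields the
    ".apple_objc" keys "NSString" and "NSString(my_additions)", as in the
    example with DIE 0x1234 above.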


    Mach-O Changes

    The section names for the Apple hash tables are for non-Mach-O files. For
    Mach-O files, the sections should be contained in the "__DWARF" segment with
    names as follows:

  • ".apple_names" -> "__apple_names"
  • ".apple_types" -> "__apple_types"
  • ".apple_namespaces" -> "__apple_namespac" (16 character limit)
  • ".apple_objc" -> "__apple_objc"
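
    The mapping above follows a simple pattern: replace the leading "." with
    "__" and truncate to the 16-character limit. A minimal sketch, with a
    hypothetical helper name:

```python
MACHO_SECTION_NAME_LIMIT = 16  # the 16 character limit noted above


def macho_section_name(name):
    """Map a non-Mach-O section name such as ".apple_names" to its Mach-O form."""
    if not name.startswith("."):
        raise ValueError("expected a name starting with '.': %r" % name)
    # Swap the leading dot for "__", then truncate to the limit.
    return ("__" + name[1:])[:MACHO_SECTION_NAME_LIMIT]
```

    Only ".apple_namespaces" is long enough to be truncated; the other three
    names fit within the limit unchanged.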
    2851 Chris Lattner
    2852 LLVM Compiler Infrastructure
    0 ================================
    1 Source Level Debugging with LLVM
    2 ================================
    3
    4 .. sectionauthor:: Chris Lattner and Jim Laskey
    5
    6 .. contents::
    7 :local:
    8
    9 Introduction
    10 ============
    11
    12 This document is the central repository for all information pertaining to debug
    13 information in LLVM. It describes the :ref:`actual format that the LLVM debug
    14 information takes <format>`, which is useful for those interested in creating
    15 front-ends or dealing directly with the information. Further, this document
    16 provides specific examples of what debug information for C/C++ looks like.
    17
    18 Philosophy behind LLVM debugging information
    19 --------------------------------------------
    20
    21 The idea of the LLVM debugging information is to capture how the important
    22 pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
    23 Several design aspects have shaped the solution that appears here. The
    24 important ones are:
    25
    26 * Debugging information should have very little impact on the rest of the
    27 compiler. No transformations, analyses, or code generators should need to
    28 be modified because of debugging information.
    29
    30 * LLVM optimizations should interact in :ref:`well-defined and easily described
    31 ways <intro_debugopt>` with the debugging information.
    32
    33 * Because LLVM is designed to support arbitrary programming languages,
    34 LLVM-to-LLVM tools should not need to know anything about the semantics of
    35 the source-level-language.
    36
    37 * Source-level languages are often **widely** different from one another.
    38 LLVM should not put any restrictions on the flavor of the source-language,
    39 and the debugging information should work with any language.
    40
    41 * With code generator support, it should be possible to use an LLVM compiler
    42 to compile a program to native machine code and standard debugging
    43 formats. This allows compatibility with traditional machine-code level
    44 debuggers, like GDB or DBX.
    45
    46 The approach used by the LLVM implementation is to use a small set of
    47 :ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
    48 between LLVM program objects and the source-level objects. The description of
    49 the source-level program is maintained in LLVM metadata in an
    50 :ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
    51 currently uses working draft 7 of the `DWARF 3 standard
    52 `_).
    53
    54 When a program is being debugged, a debugger interacts with the user and turns
    55 the stored debug information into source-language specific information. As
    56 such, a debugger must be aware of the source-language, and is thus tied to a
    57 specific language or family of languages.
    58
    59 Debug information consumers
    60 ---------------------------
    61
    62 The role of debug information is to provide meta information normally stripped
    63 away during the compilation process. This meta information provides an LLVM
    64 user a relationship between generated code and the original program source
    65 code.
    66
    67 Currently, debug information is consumed by DwarfDebug to produce dwarf
    68 information used by the gdb debugger. Other targets could use the same
    69 information to produce stabs or other debug forms.
    70
    71 It would also be reasonable to use debug information to feed profiling tools
    72 for analysis of generated code, or tools for reconstructing the original
    73 source from generated code.
    74
    75 TODO - expound a bit more.
    76
    77 .. _intro_debugopt:
    78
    79 Debugging optimized code
    80 ------------------------
    81
    82 An extremely high priority of LLVM debugging information is to make it interact
    83 well with optimizations and analysis. In particular, the LLVM debug
    84 information provides the following guarantees:
    85
    86 * LLVM debug information **always provides information to accurately read
    87 the source-level state of the program**, regardless of which LLVM
    88 optimizations have been run, and without any modification to the
    89 optimizations themselves. However, some optimizations may impact the
    90 ability to modify the current state of the program with a debugger, such
    91 as setting program variables, or calling functions that have been
    92 deleted.
    93
    94 * As desired, LLVM optimizations can be upgraded to be aware of the LLVM
    95 debugging information, allowing them to update the debugging information
    96 as they perform aggressive optimizations. This means that, with effort,
    97 the LLVM optimizers could optimize debug code just as well as non-debug
    98 code.
    99
    100 * LLVM debug information does not prevent optimizations from
    101 happening (for example inlining, basic block reordering/merging/cleanup,
    102 tail duplication, etc).
    103
    104 * LLVM debug information is automatically optimized along with the rest of
    105 the program, using existing facilities. For example, duplicate
    106 information is automatically merged by the linker, and unused information
    107 is automatically removed.
    108
    109 Basically, the debug information allows you to compile a program with
    110 "``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
    111 the program as it executes from a debugger. Compiling a program with
    112 "``-O3 -g``" gives you full debug information that is always available and
    113 accurate for reading (e.g., you get accurate stack traces despite tail call
    114 elimination and inlining), but you might lose the ability to modify the program
    115 and call functions that were optimized out of the program, or inlined away
    116 completely.
    117
    118 The :ref:`LLVM test suite ` provides a framework to test the
    119 optimizer's handling of debugging information. It can be run like this:
    120
    121 .. code-block:: bash
    122
    123 % cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
    124 % make TEST=dbgopt
    125
    126 This will test the impact of debugging information on optimization passes. If
    127 debugging information influences optimization passes then it will be reported
    128 as a failure. See :doc:`TestingGuide` for more information on the LLVM test
    129 infrastructure and how to run various tests.
    130
    131 .. _format:
    132
    133 Debugging information format
    134 ============================
    135
    136 LLVM debugging information has been carefully designed to make it possible for
    137 the optimizer to optimize the program and debugging information without
    138 necessarily having to know anything about debugging information. In
    139 particular, the use of metadata avoids duplicated debugging information from
    140 the beginning, and the global dead code elimination pass automatically deletes
    141 debugging information for a function if it decides to delete the function.
    142
    143 To do this, most of the debugging information (descriptors for types,
    144 variables, functions, source files, etc) is inserted by the language front-end
    145 in the form of LLVM metadata.
    146
    147 Debug information is designed to be agnostic about the target debugger and
    148 debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic
    149 pass to decode the information that represents variables, types, functions,
    150 namespaces, etc: this allows for arbitrary source-language semantics and
    151 type-systems to be used, as long as there is a module written for the target
    152 debugger to interpret the information.
    153
    154 To provide basic functionality, the LLVM debugger does have to make some
    155 assumptions about the source-level language being debugged, though it keeps
    156 these to a minimum. The only common features that the LLVM debugger assumes
    157 exist are :ref:`source files <format_files>`, and :ref:`program objects
    158 <format_global_variables>`. These abstract objects are used by a debugger to
    159 form stack traces, show information about local variables, etc.
    160
    161 This section of the documentation first describes the representation aspects
    162 common to any source-language. :ref:`ccxx_frontend` describes the data layout
    163 conventions used by the C and C++ front-ends.
    164
    165 Debug information descriptors
    166 -----------------------------
    167
    168 In consideration of the complexity and volume of debug information, LLVM
    169 provides a specification for well formed debug descriptors.
    170
    171 Consumers of LLVM debug information expect the descriptors for program objects
    172 to start in a canonical format, but the descriptors can include additional
    173 information appended at the end that is source-language specific. All LLVM
    174 debugging information is versioned, allowing backwards compatibility in the
    175 case that the core structures need to change in some way. Also, all debugging
    176 information objects start with a tag to indicate what type of object it is.
    177 The source-language is allowed to define its own objects, by using unreserved
    178 tag numbers. We recommend using tags in the range 0x1000 through 0x2000
    179 (there is a defined ``enum DW_TAG_user_base = 0x1000``).
    180
    181 The fields of debug descriptors used internally by LLVM are restricted to only
    182 the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
    183 ``mdnode``.
    184
    185 .. code-block:: llvm
    186
    187 !1 = metadata !{
    188 i32, ;; A tag
    189 ...
    190 }
    191
    192 The first field of a descriptor is always an
    193 ``i32`` containing a tag value identifying the content of the descriptor.
    194 The remaining fields are specific to the descriptor. The values of tags are
    195 loosely bound to the tag values of DWARF information entries. However, that
    196 does not restrict the use of the information supplied to DWARF targets. To
    197 facilitate versioning of debug information, the tag is augmented with the
    198 current debug version (``LLVMDebugVersion = 8 << 16`` or 0x80000 or
    199 524288.)
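
The tag arithmetic described above can be checked with a short sketch. The
helper names are hypothetical, and the split into a low 16-bit DWARF tag and a
high 16-bit version is an assumption inferred from ``LLVMDebugVersion = 8 <<
16``; it is not quoted from the LLVM headers.

```python
LLVM_DEBUG_VERSION = 8 << 16  # 0x80000, as stated above


def versioned_tag(dwarf_tag, version=LLVM_DEBUG_VERSION):
    """Combine a DWARF tag value with the debug version, e.g. 17 + LLVMDebugVersion."""
    return dwarf_tag + version


def split_tag(value):
    """Split a descriptor's first field into (dwarf_tag, version_number).

    Assumes the tag occupies the low 16 bits and the version the bits above.
    """
    return value & 0xFFFF, value >> 16


DW_TAG_compile_unit = 17
compile_unit_tag = versioned_tag(DW_TAG_compile_unit)  # 17 + 524288
```

Applied to the value ``459008`` that appears in an example later in this
document, ``split_tag`` gives tag 256 (``DW_TAG_auto_variable``) with version
number 7, so that example appears to predate version 8.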
    200
    201 The details of the various descriptors follow.
    202
    203 Compile unit descriptors
    204 ^^^^^^^^^^^^^^^^^^^^^^^^
    205
    206 .. code-block:: llvm
    207
    208 !0 = metadata !{
    209 i32, ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
    210 i32, ;; Unused field.
    211 i32, ;; DWARF language identifier (ex. DW_LANG_C89)
    212 metadata, ;; Source file name
    213 metadata, ;; Source file directory (includes trailing slash)
    214 metadata, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    215 i1, ;; True if this is a main compile unit.
    216 i1, ;; True if this is optimized.
    217 metadata, ;; Flags
    218 i32, ;; Runtime version
    219 metadata, ;; List of enum types
    220 metadata, ;; List of retained types
    221 metadata, ;; List of subprograms
    222 metadata ;; List of global variables
    223 }
    224
    225 These descriptors contain a source language ID for the file (we use the DWARF
    226 3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
    227 ``DW_LANG_Cobol74``, etc), three strings describing the filename, working
    228 directory of the compiler, and an identifier string for the compiler that
    229 produced it.
    230
    231 Compile unit descriptors provide the root context for objects declared in a
    232 specific compilation unit. File descriptors are defined using this context.
    233 These descriptors are collected by a named metadata ``!llvm.dbg.cu``. A compile
    234 unit descriptor keeps track of subprograms, global variables and type
    235 information.
    236
    237 .. _format_files:
    238
    239 File descriptors
    240 ^^^^^^^^^^^^^^^^
    241
    242 .. code-block:: llvm
    243
    244 !0 = metadata !{
    245 i32, ;; Tag = 41 + LLVMDebugVersion (DW_TAG_file_type)
    246 metadata, ;; Source file name
    247 metadata, ;; Source file directory (includes trailing slash)
    248 metadata ;; Unused
    249 }
    250
    251 These descriptors contain information for a file. Global variables and top
    252 level functions would be defined using this context. File descriptors also
    253 provide context for source line correspondence.
    254
    255 Each input file is encoded as a separate file descriptor in LLVM debugging
    256 information output.
    257
    258 .. _format_global_variables:
    259
    260 Global variable descriptors
    261 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    262
    263 .. code-block:: llvm
    264
    265 !1 = metadata !{
    266 i32, ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
    267 i32, ;; Unused field.
    268 metadata, ;; Reference to context descriptor
    269 metadata, ;; Name
    270 metadata, ;; Display name (fully qualified C++ name)
    271 metadata, ;; MIPS linkage name (for C++)
    272 metadata, ;; Reference to file where defined
    273 i32, ;; Line number where defined
    274 metadata, ;; Reference to type descriptor
    275 i1, ;; True if the global is local to compile unit (static)
    276 i1, ;; True if the global is defined in the compile unit (not extern)
    277 {}* ;; Reference to the global variable
    278 }
    279
    280 These descriptors provide debug information about global variables. They
    281 provide details such as name, type and where the variable is defined. All
    282 global variables are collected inside the named metadata ``!llvm.dbg.cu``.
    283
    284 .. _format_subprograms:
    285
    286 Subprogram descriptors
    287 ^^^^^^^^^^^^^^^^^^^^^^
    288
    289 .. code-block:: llvm
    290
    291 !2 = metadata !{
    292 i32, ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram)
    293 i32, ;; Unused field.
    294 metadata, ;; Reference to context descriptor
    295 metadata, ;; Name
    296 metadata, ;; Display name (fully qualified C++ name)
    297 metadata, ;; MIPS linkage name (for C++)
    298 metadata, ;; Reference to file where defined
    299 i32, ;; Line number where defined
    300 metadata, ;; Reference to type descriptor
    301 i1, ;; True if the global is local to compile unit (static)
    302 i1, ;; True if the global is defined in the compile unit (not extern)
    303 i32, ;; Line number where the scope of the subprogram begins
    304 i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
    305 i32, ;; Index into a virtual function
    306 metadata, ;; indicates which base type contains the vtable pointer for the
    307 ;; derived class
    308 i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
    309 i1, ;; isOptimized
    310 Function * , ;; Pointer to LLVM function
    311 metadata, ;; Lists function template parameters
    312 metadata, ;; Function declaration descriptor
    313 metadata ;; List of function variables
    314 }
    315
    316 These descriptors provide debug information about functions, methods and
    317 subprograms. They provide details such as name, return types and the source
    318 location where the subprogram is defined.
    319
    320 Block descriptors
    321 ^^^^^^^^^^^^^^^^^
    322
    323 .. code-block:: llvm
    324
    325 !3 = metadata !{
    326 i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    327 metadata,;; Reference to context descriptor
    328 i32, ;; Line number
    329 i32, ;; Column number
    330 metadata,;; Reference to source file
    331 i32 ;; Unique ID to identify blocks from a template function
    332 }
    333
    334 This descriptor provides debug information about nested blocks within a
    335 subprogram. The line number and column numbers are used to distinguish two
    336 lexical blocks at the same depth.
    337
    338 .. code-block:: llvm
    339
    340 !3 = metadata !{
    341 i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    342 metadata,;; Reference to the scope we're annotating with a file change
    343 metadata ;; Reference to the file the scope is enclosed in.
    344 }
    345
    346 This descriptor provides a wrapper around a lexical scope to handle file
    347 changes in the middle of a lexical block.
    348
    349 .. _format_basic_type:
    350
    351 Basic type descriptors
    352 ^^^^^^^^^^^^^^^^^^^^^^
    353
    354 .. code-block:: llvm
    355
    356 !4 = metadata !{
    357 i32, ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type)
    358 metadata, ;; Reference to context
    359 metadata, ;; Name (may be "" for anonymous types)
    360 metadata, ;; Reference to file where defined (may be NULL)
    361 i32, ;; Line number where defined (may be 0)
    362 i64, ;; Size in bits
    363 i64, ;; Alignment in bits
    364 i64, ;; Offset in bits
    365 i32, ;; Flags
    366 i32 ;; DWARF type encoding
    367 }
    368
    369 These descriptors define primitive types used in the code, for example ``int``,
    370 ``bool`` and ``float``. The context provides the scope of the type, which is
    371 usually the top level. Since basic types are not usually user defined the
    372 context and line number can be left as NULL and 0. The size, alignment and
    373 offset are expressed in bits and can be 64 bit values. The alignment is used
    374 to round the offset when embedded in a :ref:`composite type
    375 <format_composite_type>` (for example, to keep float doubles on 64 bit boundaries).
    376 The offset is the bit offset if embedded in a :ref:`composite type
    377 <format_composite_type>`.
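
To make the role of the alignment field concrete, here is a minimal sketch of
rounding a bit offset up to an alignment while laying out members in order.
The helper names and the simple sequential layout are assumptions for
illustration; real front-ends compute offsets according to the target ABI.

```python
def round_up_to_alignment(offset_bits, align_bits):
    """Round a bit offset up to the next multiple of align_bits."""
    if align_bits <= 0:
        return offset_bits
    return (offset_bits + align_bits - 1) // align_bits * align_bits


def layout(members):
    """Assign bit offsets to (name, size_bits, align_bits) tuples in order."""
    offsets = {}
    cursor = 0
    for name, size_bits, align_bits in members:
        cursor = round_up_to_alignment(cursor, align_bits)
        offsets[name] = cursor
        cursor += size_bits
    return offsets


# A char, then a double kept on a 64 bit boundary, then a 32 bit int.
example = layout([("c", 8, 8), ("d", 64, 64), ("i", 32, 32)])
```

Here the double lands at bit offset 64 rather than 8, which is the "keep
float doubles on 64 bit boundaries" case mentioned above.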
    378
    379 The type encoding provides the details of the type. The values are typically
    380 one of the following:
    381
    382 .. code-block:: llvm
    383
    384 DW_ATE_address = 1
    385 DW_ATE_boolean = 2
    386 DW_ATE_float = 4
    387 DW_ATE_signed = 5
    388 DW_ATE_signed_char = 6
    389 DW_ATE_unsigned = 7
    390 DW_ATE_unsigned_char = 8
    391
    392 .. _format_derived_type:
    393
    394 Derived type descriptors
    395 ^^^^^^^^^^^^^^^^^^^^^^^^
    396
    397 .. code-block:: llvm
    398
    399 !5 = metadata !{
    400 i32, ;; Tag (see below)
    401 metadata, ;; Reference to context
    402 metadata, ;; Name (may be "" for anonymous types)
    403 metadata, ;; Reference to file where defined (may be NULL)
    404 i32, ;; Line number where defined (may be 0)
    405 i64, ;; Size in bits
    406 i64, ;; Alignment in bits
    407 i64, ;; Offset in bits
    408 i32, ;; Flags to encode attributes, e.g. private
    409 metadata, ;; Reference to type derived from
    410 metadata, ;; (optional) Name of the Objective-C property associated with
    411 ;; an Objective-C ivar
    412 metadata, ;; (optional) Name of the Objective C property getter selector.
    413 metadata, ;; (optional) Name of the Objective C property setter selector.
    414 i32 ;; (optional) Objective C property attributes.
    415 }
    416
    417 These descriptors are used to define types derived from other types. The value
    418 of the tag varies depending on the meaning. The following are possible tag
    419 values:
    420
    421 .. code-block:: llvm
    422
    423 DW_TAG_formal_parameter = 5
    424 DW_TAG_member = 13
    425 DW_TAG_pointer_type = 15
    426 DW_TAG_reference_type = 16
    427 DW_TAG_typedef = 22
    428 DW_TAG_const_type = 38
    429 DW_TAG_volatile_type = 53
    430 DW_TAG_restrict_type = 55
    431
    432 ``DW_TAG_member`` is used to define a member of a :ref:`composite type
    433 <format_composite_type>` or :ref:`subprogram <format_subprograms>`. The type
    434 of the member is the :ref:`derived type <format_derived_type>`.
    435 ``DW_TAG_formal_parameter`` is used to define a member which is a formal
    436 argument of a subprogram.
    437
    438 ``DW_TAG_typedef`` is used to provide a name for the derived type.
    439
    440 ``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
    441 ``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the
    442 :ref:`derived type <format_derived_type>`.
    443
    444 :ref:`Derived type <format_derived_type>` location can be determined from the
    445 context and line number. The size, alignment and offset are expressed in bits
    446 and can be 64 bit values. The alignment is used to round the offset when
    447 embedded in a :ref:`composite type <format_composite_type>` (for example, to keep
    448 float doubles on 64 bit boundaries). The offset is the bit offset if embedded
    449 in a :ref:`composite type <format_composite_type>`.
    450
    451 Note that the ``void *`` type is expressed as a type derived from NULL.
    452
    453 .. _format_composite_type:
    454
    455 Composite type descriptors
    456 ^^^^^^^^^^^^^^^^^^^^^^^^^^
    457
    458 .. code-block:: llvm
    459
    460 !6 = metadata !{
    461 i32, ;; Tag (see below)
    462 metadata, ;; Reference to context
    463 metadata, ;; Name (may be "" for anonymous types)
    464 metadata, ;; Reference to file where defined (may be NULL)
    465 i32, ;; Line number where defined (may be 0)
    466 i64, ;; Size in bits
    467 i64, ;; Alignment in bits
    468 i64, ;; Offset in bits
    469 i32, ;; Flags
    470 metadata, ;; Reference to type derived from
    471 metadata, ;; Reference to array of member descriptors
    472 i32 ;; Runtime languages
    473 }
    474
    475 These descriptors are used to define types that are composed of 0 or more
    476 elements. The value of the tag varies depending on the meaning. The following
    477 are possible tag values:
    478
    479 .. code-block:: llvm
    480
    481 DW_TAG_array_type = 1
    482 DW_TAG_enumeration_type = 4
    483 DW_TAG_structure_type = 19
    484 DW_TAG_union_type = 23
    485 DW_TAG_vector_type = 259
    486 DW_TAG_subroutine_type = 21
    487 DW_TAG_inheritance = 28
    488
    489 The vector flag indicates that an array type is a native packed vector.
    490
    491 The members of array types (tag = ``DW_TAG_array_type``) or vector types (tag =
    492 ``DW_TAG_vector_type``) are :ref:`subrange descriptors <format_subrange>`, each
    493 representing the range of subscripts at that level of indexing.
    494
    495 The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are
    496 :ref:`enumerator descriptors <format_enumerator>`, each representing the
    497 definition of an enumeration value for the set. All enumeration type descriptors
    498 are collected inside the named metadata ``!llvm.dbg.cu``.
    499
    500 The members of structure (tag = ``DW_TAG_structure_type``) or union (tag =
    501 ``DW_TAG_union_type``) types are any one of the :ref:`basic
    502 <format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite
    503 <format_composite_type>` type descriptors, each representing a field member of
    504 the structure or union.
    505
    506 For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide
    507 information about base classes, static members and member functions. If a
    508 member is a :ref:`derived type descriptor <format_derived_type>` and has a tag
    509 of ``DW_TAG_inheritance``, then the type represents a base class. If the member
    510 is a :ref:`global variable descriptor <format_global_variables>` then it
    511 represents a static member. And, if the member is a :ref:`subprogram
    512 descriptor <format_subprograms>` then it represents a member function. For
    513 static members and member functions, ``getName()`` returns the member's linkage or
    514 C++ mangled name, and ``getDisplayName()`` returns the simplified version of the name.
    515
    516 The first member of subroutine (tag = ``DW_TAG_subroutine_type``) type elements
    517 is the return type for the subroutine. The remaining elements are the formal
    518 arguments to the subroutine.
    519
    520 :ref:`Composite type <format_composite_type>` location can be determined from
    521 the context and line number. The size, alignment and offset are expressed in
    522 bits and can be 64 bit values. The alignment is used to round the offset when
    523 embedded in a :ref:`composite type <format_composite_type>` (as an example, to
    524 keep float doubles on 64 bit boundaries). The offset is the bit offset if
    525 embedded in a :ref:`composite type <format_composite_type>`.
    526
    527 .. _format_subrange:
    528
    529 Subrange descriptors
    530 ^^^^^^^^^^^^^^^^^^^^
    531
    532 .. code-block:: llvm
    533
    534 !42 = metadata !{
    535 i32, ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
    536 i64, ;; Low value
    537 i64 ;; High value
    538 }
    539
    540 These descriptors are used to define ranges of array subscripts for an array
    541 :ref:`composite type <format_composite_type>`. The low value defines the lower
    542 bound, typically zero for C/C++. The high value is the upper bound. Values
    543 are 64 bit. ``High - Low + 1`` is the size of the array. If ``Low > High``
    544 the array bounds are not included in generated debugging information.
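
The size rule above can be written as a one-line computation. A minimal
sketch, with a hypothetical function name, that returns ``None`` for the
``Low > High`` case in which no bounds are emitted:

```python
def subrange_element_count(low, high):
    """Return High - Low + 1, or None when Low > High (bounds omitted)."""
    if low > high:
        return None
    return high - low + 1


# A C array declared as "int a[10]" has subscripts 0 through 9.
ten_elements = subrange_element_count(0, 9)
```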
    545
    546 .. _format_enumerator:
    547
    548 Enumerator descriptors
    549 ^^^^^^^^^^^^^^^^^^^^^^
    550
    551 .. code-block:: llvm
    552
    553 !6 = metadata !{
    554 i32, ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
    555 metadata, ;; Name
    556 i64 ;; Value
    557 }
    558
    559 These descriptors are used to define members of an enumeration :ref:`composite
    560 type <format_composite_type>`; each associates a name with a value.
    561
    562 Local variables
    563 ^^^^^^^^^^^^^^^
    564
    565 .. code-block:: llvm
    566
    567 !7 = metadata !{
    568 i32, ;; Tag (see below)
    569 metadata, ;; Context
    570 metadata, ;; Name
    571 metadata, ;; Reference to file where defined
    572 i32, ;; 24 bit - Line number where defined
    573 ;; 8 bit - Argument number. 1 indicates 1st argument.
    574 metadata, ;; Type descriptor
    575 i32, ;; flags
    576 metadata ;; (optional) Reference to inline location
    577 }
    578
    579 These descriptors are used to define variables local to a subprogram. The
    580 value of the tag depends on the usage of the variable:
    581
    582 .. code-block:: llvm
    583
    584 DW_TAG_auto_variable = 256
    585 DW_TAG_arg_variable = 257
    586 DW_TAG_return_variable = 258
    587
    588 An auto variable is any variable declared in the body of the function. An
    589 argument variable is any variable that appears as a formal argument to the
    590 function. A return variable is used to track the result of a function and has
    591 no source correspondent.
    592
    593 The context is either the subprogram or block where the variable is defined.
    594 Name is the source variable name. Context and line indicate where the variable
    595 was defined. Type descriptor defines the declared type of the variable.
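
The local variable descriptor stores a 24 bit line number and an 8 bit
argument number in one ``i32`` field. The sketch below shows one way to pack
and unpack such a field. The bit placement (line number in the low 24 bits,
argument number in the high 8 bits) is an assumption, since the text above
gives only the field widths, and the helper names are hypothetical.

```python
LINE_BITS = 24
LINE_MASK = (1 << LINE_BITS) - 1  # low 24 bits
ARG_MASK = 0xFF                   # 8 bits


def pack_line_and_arg(line, arg_no=0):
    """Pack a line number and an argument number into one 32 bit value.

    Assumed layout: line in bits 0-23, argument number in bits 24-31.
    """
    if not 0 <= line <= LINE_MASK:
        raise ValueError("line number does not fit in 24 bits")
    if not 0 <= arg_no <= ARG_MASK:
        raise ValueError("argument number does not fit in 8 bits")
    return line | (arg_no << LINE_BITS)


def unpack_line_and_arg(field):
    """Return (line, arg_no) from a packed 32 bit field."""
    return field & LINE_MASK, (field >> LINE_BITS) & ARG_MASK
```

Under this layout an auto variable (argument number 0) stores just its line
number, which is consistent with the plain ``i32 2`` line field of the
``DW_TAG_auto_variable`` example later in this document.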
    596
    597 .. _format_common_intrinsics:
    598
    599 Debugger intrinsic functions
    600 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    601
    602 LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
    603 provide debug information at various points in generated code.
    604
    605 ``llvm.dbg.declare``
    606 ^^^^^^^^^^^^^^^^^^^^
    607
    608 .. code-block:: llvm
    609
    610 void %llvm.dbg.declare(metadata, metadata)
    611
    612 This intrinsic provides information about a local element (e.g., variable).
    613 The first argument is metadata holding the alloca for the variable. The second
    614 argument is metadata containing a description of the variable.
    615
    616 ``llvm.dbg.value``
    617 ^^^^^^^^^^^^^^^^^^
    618
    619 .. code-block:: llvm
    620
    621 void %llvm.dbg.value(metadata, i64, metadata)
    622
    623 This intrinsic provides information when a user source variable is set to a new
    624 value. The first argument is the new value (wrapped as metadata). The second
    625 argument is the offset in the user source variable where the new value is
    626 written. The third argument is metadata containing a description of the user
    627 source variable.
    628
    629 Object lifetimes and scoping
    630 ============================
    631
    632 In many languages, the local variables in functions can have their lifetimes or
    633 scopes limited to a subset of a function. In the C family of languages, for
    634 example, variables are only live (readable and writable) within the source
    635 block that they are defined in. In functional languages, values are only
    636 readable after they have been defined. Though this is an obvious concept,
    637 it is non-trivial to model in LLVM, because LLVM has no notion of scoping in
    638 this sense and should not be tied to any one language's scoping rules.
    639
    640 In order to handle this, the LLVM debug format uses the metadata attached to
    641 LLVM instructions to encode line number and scoping information. Consider the
    642 following C fragment, for example:
    643
    644 .. code-block:: c
    645
    646 1. void foo() {
    647 2. int X = 21;
    648 3. int Y = 22;
    649 4. {
    650 5. int Z = 23;
    651 6. Z = X + Y;
    652 7. }
    653 8. X = Y;
    654 9. }
    655
    656 Compiled to LLVM, this function would be represented like this:
    657
    658 .. code-block:: llvm
    659
    660 define void @foo() nounwind ssp {
    661 entry:
    662 %X = alloca i32, align 4 ; [#uses=4]
    663 %Y = alloca i32, align 4 ; [#uses=4]
    664 %Z = alloca i32, align 4 ; [#uses=3]
    665 %0 = bitcast i32* %X to {}* ; <{}*> [#uses=1]
    666 call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
    667 store i32 21, i32* %X, !dbg !8
    668 %1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1]
    669 call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
    670 store i32 22, i32* %Y, !dbg !11
    671 %2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1]
    672 call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
    673 store i32 23, i32* %Z, !dbg !15
    674 %tmp = load i32* %X, !dbg !16 ; [#uses=1]
    675 %tmp1 = load i32* %Y, !dbg !16 ; [#uses=1]
    676 %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; [#uses=1]
    677 store i32 %add, i32* %Z, !dbg !16
    678 %tmp2 = load i32* %Y, !dbg !17 ; [#uses=1]
    679 store i32 %tmp2, i32* %X, !dbg !17
    680 ret void, !dbg !18
    681 }
    682
    683 declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone
    684
    685 !0 = metadata !{i32 459008, metadata !1, metadata !"X",
    686 metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
    687 !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    688 !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
    689 metadata !"foo", metadata !3, i32 1, metadata !4,
    690 i1 false, i1 true}; [DW_TAG_subprogram ]
    691 !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
    692 metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
    693 i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
    694 !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
    695 i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
    696 !5 = metadata !{null}
    697 !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
    698 i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
    699 !7 = metadata !{i32 2, i32 7, metadata !1, null}
    700 !8 = metadata !{i32 2, i32 3, metadata !1, null}
    701 !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
    702 metadata !6}; [ DW_TAG_auto_variable ]
    703 !10 = metadata !{i32 3, i32 7, metadata !1, null}
    704 !11 = metadata !{i32 3, i32 3, metadata !1, null}
    705 !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
    706 metadata !6}; [ DW_TAG_auto_variable ]
    707 !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
    708 !14 = metadata !{i32 5, i32 9, metadata !13, null}
    709 !15 = metadata !{i32 5, i32 5, metadata !13, null}
    710 !16 = metadata !{i32 6, i32 5, metadata !13, null}
    711 !17 = metadata !{i32 8, i32 3, metadata !1, null}
    712 !18 = metadata !{i32 9, i32 1, metadata !2, null}
    713
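The first field of each descriptor in the listing is an ``i32`` such as ``458763`` or ``459008``. The following is a minimal sketch of how such a value could be split, assuming the high 16 bits carry a debug-info version and the low 16 bits carry a DWARF-style tag. The helper names ``descriptor_tag`` and ``descriptor_version`` are hypothetical and are not part of any LLVM API.

```c
#include <stdint.h>

/* DWARF tag values that appear in the listing. DW_TAG_auto_variable (0x100)
   is an LLVM-specific tag, not a standard DWARF one. */
#define DW_TAG_LEXICAL_BLOCK 0x0bu
#define DW_TAG_COMPILE_UNIT  0x11u
#define DW_TAG_SUBPROGRAM    0x2eu
#define DW_TAG_AUTO_VARIABLE 0x100u

/* Assumption: the low 16 bits of the first descriptor field hold the tag. */
uint32_t descriptor_tag(uint32_t field)
{
    return field & 0xFFFFu;
}

/* Assumption: the high 16 bits hold the debug-info version number. */
uint32_t descriptor_version(uint32_t field)
{
    return field >> 16;
}
```

Under these assumptions, ``458763`` splits into version 7 and tag ``0x0b`` (lexical block), and ``459008`` into version 7 and tag ``0x100`` (auto variable), which matches the bracketed comments in the listing.
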
    714 This example illustrates a few important details about LLVM debugging
    715 information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
    716 location information, which are attached to an instruction, are applied
    717 together to allow a debugger to analyze the relationship between statements,
    718 variable definitions, and the code used to implement the function.
    719
    720 .. code-block:: llvm
    721
    722 call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
    723
    724 The first ``llvm.dbg.declare`` call encodes debugging information for the
    725 variable ``X``. The metadata ``!dbg !7`` attached to the call provides
    726 scope information for the variable ``X``.
    727
    728 .. code-block:: llvm
    729
    730 !7 = metadata !{i32 2, i32 7, metadata !1, null}
    731 !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    732 !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
    733 metadata !"foo", metadata !"foo", metadata !3, i32 1,
    734 metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
    735
    736 Here ``!7`` is metadata providing location information. It has four fields:
    737 line number, column number, scope, and original scope. The original scope
    738 represents inline location if this instruction is inlined inside a caller, and
    739 is null otherwise. In this example, scope is encoded by ``!1``. ``!1``
    740 represents a lexical block inside the scope ``!2``, where ``!2`` is a
    741 :ref:`subprogram descriptor `. This way the location
    742 information attached to the intrinsic indicates that the variable ``X`` is
    743 declared at line number 2 at function-level scope in function ``foo``.
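
The walk from a location to its enclosing function can be sketched in C. This is an illustrative model only: the ``scope`` and ``location`` structs and both helper functions are hypothetical stand-ins for the metadata nodes, not LLVM data structures.

```c
#include <stddef.h>

/* Hypothetical mirror of the two scope descriptors used in the example. */
enum scope_kind { SCOPE_LEXICAL_BLOCK, SCOPE_SUBPROGRAM };

struct scope {
    enum scope_kind kind;
    const char *name;           /* function name; NULL for lexical blocks */
    const struct scope *parent; /* enclosing scope; NULL at the outermost level */
};

/* The four location fields: line, column, scope, original scope. */
struct location {
    unsigned line;
    unsigned column;
    const struct scope *scope;
    const struct location *inlined_at; /* NULL when the code was not inlined */
};

/* Follow parent links until a subprogram descriptor is reached. */
const struct scope *enclosing_subprogram(const struct location *loc)
{
    const struct scope *s = loc->scope;
    while (s != NULL && s->kind != SCOPE_SUBPROGRAM)
        s = s->parent;
    return s;
}

/* Count the lexical blocks between the location and its function. */
unsigned lexical_depth(const struct location *loc)
{
    unsigned depth = 0;
    for (const struct scope *s = loc->scope;
         s != NULL && s->kind == SCOPE_LEXICAL_BLOCK; s = s->parent)
        ++depth;
    return depth;
}
```

Modelling ``!2`` as a subprogram named ``foo``, ``!1`` as a lexical block whose parent is ``!2``, and ``!7`` as a location at line 2, column 7 with scope ``!1``, the walk reaches ``foo`` after crossing one lexical block.
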
    744
    745 Now let's take another example.
    746
    747 .. code-block:: llvm
    748
    749 call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
    750
    751 The second ``llvm.dbg.declare`` call encodes debugging information for
    752 variable ``Z``. The metadata ``!dbg !14`` attached to the call provides
    753 scope information for the variable ``Z``.
    754
    755 .. code-block:: llvm
    756
    757 !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
    758 !14 = metadata !{i32 5, i32 9, metadata !13, null}
    759
    760 Here ``!14`` indicates that ``Z`` is declared at line number 5 and
    761 column number 9 inside of lexical scope ``!13``. The lexical scope itself
    762 resides inside of lexical scope ``!1`` described above.
    763
    764 The scope information attached to each instruction provides a straightforward
    765 way to find instructions covered by a scope.
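
As a rough illustration of that lookup, the sketch below treats an instruction as covered by a scope when the scope of its location is that scope or is nested inside it. The ``scope`` and ``instruction`` structs and the two functions are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical scope node: only the link to the enclosing scope matters here. */
struct scope {
    const struct scope *parent; /* NULL for the outermost scope */
};

/* Hypothetical instruction record carrying the scope of its !dbg location. */
struct instruction {
    const char *text;
    const struct scope *scope;
};

/* Returns 1 if `inner` is `outer` itself or is nested inside it, else 0. */
int scope_covers(const struct scope *outer, const struct scope *inner)
{
    for (; inner != NULL; inner = inner->parent)
        if (inner == outer)
            return 1;
    return 0;
}

/* Counts the instructions whose location scope is covered by `scope`. */
size_t count_covered(const struct instruction *insts, size_t n,
                     const struct scope *scope)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (scope_covers(scope, insts[i].scope))
            ++count;
    return count;
}
```

Applied to the listing, an instruction tagged with ``!16`` (scope ``!13``) is covered by ``!13``, ``!1`` and ``!2``, while the ``ret`` tagged with ``!18`` (scope ``!2``) is covered only by ``!2``.
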
    766
    767 .. _ccxx_frontend:
    768
    769 C/C++ front-end specific debug information
    770 ==========================================
    771
    772 The C and C++ front-ends represent information about the program in a format
    773 that is effectively identical to `DWARF 3.0
    774 `_ in terms of information
    775 content. This allows code generators to trivially support native debuggers by
    776 generating standard DWARF information, and contains enough information for
    777 non-DWARF targets to translate it as needed.
    778
    779 This section describes the forms used to represent C and C++ programs. Other
    780 languages could pattern themselves after this (which itself is tuned to
    781 representing programs in the same way that DWARF 3 does), or they could choose
    782 to provide completely different forms if they don't fit into the DWARF model.
    783 As support for debugging information gets added to the various LLVM
    784 source-language front-ends, the information used should be documented here.
    785
    786 The following sections provide examples of various C/C++ constructs and the
    787 debug information that would best describe those constructs.
    788
    789 C/C++ source file information
    790 -----------------------------
    791
    792 Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the