llvm.org GIT mirror llvm / 24f9633
[DebugInfo][Docs] Document variable location metadata transformations

This patch adds documentation explaining how variable location information is compiled from the IR representation down to the end of the codegen pipeline, but avoiding discussion of file formats and encoding. This should make it clearer how the dbg.value / dbg.declare etc intrinsics are transformed and arranged into DBG_VALUE instructions, and their meaning.

Differential Revision: https://reviews.llvm.org/D59790

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358385 91177308-0d34-0410-b5e6-96231b3b80d8

Jeremy Morse 1 year, 6 months ago
1 changed file(s) with 310 addition(s) and 0 deletion(s).
514514 recovered, then an undef dbg.value is necessary to terminate earlier variable
515515 locations. Additional undef dbg.values may be necessary when the debugger can
516516 observe re-ordering of assignments.
517
518 How variable location metadata is transformed during CodeGen
519 ============================================================
520
521 LLVM preserves debug information throughout mid-level and backend passes,
522 ultimately producing a mapping between source-level information and
523 instruction ranges. This
524 is relatively straightforward for line number information, as mapping
525 instructions to line numbers is a simple association. For variable locations,
526 however, the story is more complex. As each ``llvm.dbg.value`` intrinsic
527 represents a source-level assignment of a value to a source variable, the
528 variable location intrinsics effectively embed a small imperative program
529 within the LLVM IR. By the end of CodeGen, this becomes a mapping from each
530 variable to its machine locations over ranges of instructions.
531 From IR to object emission, the major transformations which affect variable
532 location fidelity are:
533 1. Instruction Selection
534 2. Register allocation
535 3. Block layout
536
537 each of which is discussed below. In addition, instruction scheduling can
538 significantly change the ordering of the program, and occurs in a number of
539 different passes.
540
541 Variable locations in Instruction Selection and MIR
542 ---------------------------------------------------
543
544 Instruction selection creates a MIR function from an IR function, and just as
545 it transforms ``intermediate`` instructions into machine instructions, so must
546 ``intermediate`` variable locations become machine variable locations.
547 Within IR, variable locations are always identified by a Value, but in MIR
548 there can be different types of variable locations. In addition, some IR
549 locations become unavailable, for example if the operations of multiple IR
550 instructions are combined into one machine instruction (such as
551 multiply-and-accumulate) then intermediate Values are lost. To track variable
552 locations through instruction selection, they are first separated into
553 locations that do not depend on code generation (constants, stack locations,
554 allocated virtual registers) and those that do. For those that do, debug
555 metadata is attached to SDNodes in SelectionDAGs. After instruction selection
556 has occurred and a MIR function is created, if the SDNode associated with debug
557 metadata is allocated a virtual register, that virtual register is used as the
558 variable location. If the SDNode is folded into a machine instruction or
559 otherwise transformed into a non-register, the variable location becomes
560 unavailable.
561
562 Locations that are unavailable are treated as if they have been optimized out:
563 in IR the location would be assigned ``undef`` by a debug intrinsic, and in MIR
564 the equivalent location is used.
565
566 After MIR locations are assigned to each variable, machine pseudo-instructions
567 corresponding to each ``llvm.dbg.value`` and ``llvm.dbg.addr`` intrinsic are
568 inserted. These ``DBG_VALUE`` instructions appear thus:
569
570 .. code-block:: text
571
572 DBG_VALUE %1, $noreg, !123, !DIExpression()
573
574 And have the following operands:
575 * The first operand can record the variable location as a register, an
576 immediate, or the base address register if the original debug intrinsic
577 referred to memory. ``$noreg`` indicates the variable location is undefined,
578 equivalent to an ``undef`` dbg.value operand.
579 * The type of the second operand indicates whether the variable location is
580 directly referred to by the DBG_VALUE, or whether it is indirect. The
581 ``$noreg`` register signifies the former, an immediate operand (0) the
582 latter.
583 * Operand 3 is the Variable field of the original debug intrinsic.
584 * Operand 4 is the Expression field of the original debug intrinsic.
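
As a reading aid for the operand list above, the following is a minimal Python
sketch of the four operands. The ``DbgValue`` class and ``describe`` function
are hypothetical names invented for this illustration and are not part of any
LLVM API. The sketch ignores the contents of the expression operand and
assumes an indirect location is always based on a register.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Optional, Union


    @dataclass(frozen=True)
    class DbgValue:
        """Toy model of a DBG_VALUE's four operands (hypothetical, not LLVM API)."""
        location: Optional[Union[str, int]]  # register name, immediate, or None for $noreg
        indirect: bool    # False: second operand is $noreg; True: it is the immediate 0
        variable: str     # e.g. "!123"
        expression: str   # e.g. "!DIExpression()"; not interpreted by this sketch


    def describe(dv: DbgValue) -> str:
        """Summarize where the variable lives according to operands 1 and 2."""
        if dv.location is None:
            return f"{dv.variable}: undefined"
        if isinstance(dv.location, int):
            return f"{dv.variable}: constant {dv.location}"
        kind = "memory at address in" if dv.indirect else "value in"
        return f"{dv.variable}: {kind} {dv.location}"

Under these assumptions, the instruction ``DBG_VALUE %1, $noreg, !123,
!DIExpression()`` shown earlier would be modelled as
``DbgValue("%1", False, "!123", "!DIExpression()")``.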
585
586 The position at which the DBG_VALUEs are inserted should correspond to the
587 positions of their matching ``llvm.dbg.value`` intrinsics in the IR block. As
588 with optimization, LLVM aims to preserve the order in which variable
589 assignments occurred in the source program. However, SelectionDAG performs some
590 instruction scheduling, which can reorder assignments (discussed below).
591 Function parameter locations are moved to the beginning of the function if
592 they're not already, to ensure they're immediately available on function entry.
593
594 To demonstrate variable locations during instruction selection, consider
595 the following example:
596
597 .. code-block:: llvm
598
599 define i32 @foo(i32* %addr) {
600 entry:
601 call void @llvm.dbg.value(metadata i32 0, metadata !3, metadata !DIExpression()), !dbg !5
602 br label %bb1, !dbg !5
603
604 bb1: ; preds = %bb1, %entry
605 %bar.0 = phi i32 [ 0, %entry ], [ %add, %bb1 ]
606 call void @llvm.dbg.value(metadata i32 %bar.0, metadata !3, metadata !DIExpression()), !dbg !5
607 %addr1 = getelementptr i32, i32 *%addr, i32 1, !dbg !5
608 call void @llvm.dbg.value(metadata i32 *%addr1, metadata !3, metadata !DIExpression()), !dbg !5
609 %loaded1 = load i32, i32* %addr1, !dbg !5
610 %addr2 = getelementptr i32, i32 *%addr, i32 %bar.0, !dbg !5
611 call void @llvm.dbg.value(metadata i32 *%addr2, metadata !3, metadata !DIExpression()), !dbg !5
612 %loaded2 = load i32, i32* %addr2, !dbg !5
613 %add = add i32 %bar.0, 1, !dbg !5
614 call void @llvm.dbg.value(metadata i32 %add, metadata !3, metadata !DIExpression()), !dbg !5
615 %added = add i32 %loaded1, %loaded2
616 %cond = icmp ult i32 %added, %bar.0, !dbg !5
617 br i1 %cond, label %bb1, label %bb2, !dbg !5
618
619 bb2: ; preds = %bb1
620 ret i32 0, !dbg !5
621 }
622
623 If one compiles this IR with ``llc -o - -start-after=codegen-prepare -stop-after=expand-isel-pseudos -mtriple=x86_64--``, the following MIR is produced:
624
625 .. code-block:: text
626
627 bb.0.entry:
628 successors: %bb.1(0x80000000)
629 liveins: $rdi
630
631 %2:gr64 = COPY $rdi
632 %3:gr32 = MOV32r0 implicit-def dead $eflags
633 DBG_VALUE 0, $noreg, !3, !DIExpression(), debug-location !5
634
635 bb.1.bb1:
636 successors: %bb.1(0x7c000000), %bb.2(0x04000000)
637
638 %0:gr32 = PHI %3, %bb.0, %1, %bb.1
639 DBG_VALUE %0, $noreg, !3, !DIExpression(), debug-location !5
640 DBG_VALUE %2, $noreg, !3, !DIExpression(DW_OP_plus_uconst, 4, DW_OP_stack_value), debug-location !5
641 %4:gr32 = MOV32rm %2, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
642 %5:gr64_nosp = MOVSX64rr32 %0, debug-location !5
643 DBG_VALUE $noreg, $noreg, !3, !DIExpression(), debug-location !5
644 %1:gr32 = INC32r %0, implicit-def dead $eflags, debug-location !5
645 DBG_VALUE %1, $noreg, !3, !DIExpression(), debug-location !5
646 %6:gr32 = ADD32rm %4, %2, 4, killed %5, 0, $noreg, implicit-def dead $eflags :: (load 4 from %ir.addr2)
647 %7:gr32 = SUB32rr %6, %0, implicit-def $eflags, debug-location !5
648 JB_1 %bb.1, implicit $eflags, debug-location !5
649 JMP_1 %bb.2, debug-location !5
650
651 bb.2.bb2:
652 %8:gr32 = MOV32r0 implicit-def dead $eflags
653 $eax = COPY %8, debug-location !5
654 RET 0, $eax, debug-location !5
655
656 Observe first that there is a DBG_VALUE instruction for every ``llvm.dbg.value``
657 intrinsic in the source IR, ensuring no source level assignments go missing.
658 Then consider the different ways in which variable locations have been recorded:
659
660 * For the first dbg.value an immediate operand is used to record a zero value.
661 * The dbg.value of the PHI instruction leads to a DBG_VALUE of virtual register
662 ``%0``.
663 * The first GEP has its effect folded into the first load instruction
664 (as a 4-byte offset), but the variable location is salvaged by folding
665 the GEP's effect into the DIExpression.
666 * The second GEP is also folded into the corresponding load. However, it is
667 insufficiently simple to be salvaged, and is emitted as a ``$noreg``
668 DBG_VALUE, indicating that the variable takes on an undefined location.
669 * The final dbg.value has its Value placed in virtual register ``%1``.
670
671 Instruction Scheduling
672 ----------------------
673
674 A number of passes can reschedule instructions, notably instruction selection
675 and the pre- and post-RA machine schedulers. Instruction scheduling can
676 significantly change the nature of the program -- in the (very unlikely) worst
677 case the instruction sequence could be completely reversed. In such
678 circumstances LLVM follows the principle applied to optimizations, that it is
679 better for the debugger not to display any state than a misleading state.
680 Thus, whenever instructions are advanced in order of execution, any
681 corresponding DBG_VALUE is kept in its original position, and if an instruction
682 is delayed then the variable is given an undefined location for the duration
683 of the delay. To illustrate, consider this pseudo-MIR:
684
685 .. code-block:: text
686
687 %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
688 DBG_VALUE %1, $noreg, !1, !2
689 %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
690 DBG_VALUE %4, $noreg, !3, !4
691 %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
692 DBG_VALUE %7, $noreg, !5, !6
693
694 Imagine that the SUB32rr were moved forward to give us the following MIR:
695
696 .. code-block:: text
697
698 %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
699 %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
700 DBG_VALUE %1, $noreg, !1, !2
701 %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
702 DBG_VALUE %4, $noreg, !3, !4
703 DBG_VALUE %7, $noreg, !5, !6
704
705 In this circumstance LLVM would leave the MIR as shown above. Were we to move
706 the DBG_VALUE of virtual register %7 upwards with the SUB32rr, we would re-order
707 assignments and introduce a new state of the program. Whereas with the solution
708 above, the debugger will see one fewer combination of variable values, because
709 ``!3`` and ``!5`` will change value at the same time. This is preferred over
710 misrepresenting the original program.
711
712 In comparison, if one sunk the MOV32rm, LLVM would produce the following:
713
714 .. code-block:: text
715
716 DBG_VALUE $noreg, $noreg, !1, !2
717 %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
718 DBG_VALUE %4, $noreg, !3, !4
719 %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
720 DBG_VALUE %7, $noreg, !5, !6
721 %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
722 DBG_VALUE %1, $noreg, !1, !2
723
724 Here, to avoid presenting a state in which the first assignment to ``!1``
725 disappears, the DBG_VALUE at the top of the block assigns the variable the
726 undefined location, until its value is available at the end of the block where
727 an additional DBG_VALUE is added. Were any other DBG_VALUE for ``!1`` to occur
728 in the instructions that the MOV32rm was sunk past, the DBG_VALUE for ``%1``
729 would be dropped and the debugger would never observe it in the variable. This
730 accurately reflects that the value is not available during the corresponding
731 portion of the original program.
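
The sinking rule described above can be sketched in a few lines of Python.
This is a toy model under stated assumptions, not LLVM's implementation: a
block is a list of tuples, ``("inst", vreg)`` for an instruction defining
``vreg`` and ``("dbg", location, variable)`` for a DBG_VALUE, with ``None``
standing in for ``$noreg``. The function name ``sink_to_end`` is hypothetical.

.. code-block:: python

    def sink_to_end(block, vreg):
        """Sink the instruction defining vreg to the end of the block.

        DBG_VALUEs that referred to vreg become undefined at their original
        position. The real location is re-added after the sunk instruction,
        unless a later DBG_VALUE for the same variable superseded it.
        """
        idx = block.index(("inst", vreg))
        out = []
        pending = []  # variables waiting for vreg to be defined again
        for i, item in enumerate(block):
            if i == idx:
                continue  # the instruction itself is re-emitted at the end
            if i > idx and item[0] == "dbg":
                _, loc, var = item
                if loc == vreg:
                    # The value is not available yet: terminate the old location.
                    out.append(("dbg", None, var))
                    if var not in pending:
                        pending.append(var)
                    continue
                if var in pending:
                    # A later assignment replaces the delayed one entirely.
                    pending.remove(var)
            out.append(item)
        out.append(("inst", vreg))
        out.extend(("dbg", vreg, var) for var in pending)
        return out

Applied to a model of the pseudo-MIR above, sinking the definition of ``%1``
yields an undefined location for ``!1`` at the top of the block and a
re-inserted DBG_VALUE after the sunk instruction, matching the listing.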
732
733 Variable locations during Register Allocation
734 ---------------------------------------------
735
736 To avoid debug instructions interfering with the register allocator, the
737 LiveDebugVariables pass extracts variable locations from a MIR function and
738 deletes the corresponding DBG_VALUE instructions. Some localized copy
739 propagation is performed within blocks. After register allocation, the
740 VirtRegRewriter pass re-inserts DBG_VALUE instructions in their original
741 positions, translating virtual register references into their physical
742 machine locations. To avoid encoding incorrect variable locations, in this
743 pass any DBG_VALUE of a virtual register that is not live is replaced by
744 the undefined location.
745
746 LiveDebugValues expansion of variable locations
747 -----------------------------------------------
748
749 After all optimizations have run and shortly before emission, the
750 LiveDebugValues pass runs to achieve two aims:
751
752 * To propagate the location of variables through copies and register spills,
753 * For every block, to record every valid variable location in that block.
754
755 After this pass the DBG_VALUE instruction changes meaning: rather than
756 corresponding to a source-level assignment where the variable may change value,
757 it asserts the location of a variable in a block, and loses effect outside the
758 block. Propagating variable locations through copies and spills is
759 straightforward, but determining the variable location in every basic block
760 requires the consideration of control flow. Consider the following IR, which
761 presents several difficulties:
762
763 .. code-block:: llvm
764
765 define dso_local i32 @foo(i1 %cond, i32 %input) !dbg !12 {
766 entry:
767 br i1 %cond, label %truebr, label %falsebr
768
769 bb1:
770 %value = phi i32 [ %value1, %truebr ], [ %value2, %falsebr ]
771 br label %exit, !dbg !26
772
773 truebr:
774 call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
775 call void @llvm.dbg.value(metadata i32 1, metadata !23, metadata !DIExpression()), !dbg !24
776 %value1 = add i32 %input, 1
777 br label %bb1
778
779 falsebr:
780 call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
781 call void @llvm.dbg.value(metadata i32 2, metadata !23, metadata !DIExpression()), !dbg !24
782 %value2 = add i32 %input, 2
783 br label %bb1
784
785 exit:
786 ret i32 %value, !dbg !30
787 }
788
789 Here the difficulties are:
790
791 * The control flow is roughly the opposite of basic block order
792 * The value of the ``!23`` variable merges into ``%bb1``, but there is no PHI
793 node
794
795 As mentioned above, the ``llvm.dbg.value`` intrinsics essentially form an
796 imperative program embedded in the IR, with each intrinsic defining a variable
797 location. This *could* be converted to an SSA form by mem2reg, in the same way
798 that it uses use-def chains to identify control flow merges and insert phi
799 nodes for IR Values. However, because debug variable locations are defined for
800 every machine instruction, in effect every IR instruction uses every variable
801 location, which would lead to a large number of debugging intrinsics being
802 generated.
803
804 Examining the example above, variable ``!30`` is assigned ``%input`` on both
805 conditional paths through the function, while ``!23`` is assigned differing
806 constant values on either path. Where control flow merges in ``%bb1`` we would
807 want ``!30`` to keep its location (``%input``), but ``!23`` to become undefined
808 as we cannot determine at runtime what value it should have in ``%bb1`` without
809 inserting a PHI node. mem2reg does not insert the PHI node to avoid changing
810 codegen when debugging is enabled, and does not insert the other dbg.values
811 to avoid adding very large numbers of intrinsics.
812
813 Instead, LiveDebugValues determines variable locations when control
814 flow merges. A dataflow analysis is used to propagate locations between blocks:
815 when control flow merges, if a variable has the same location in all
816 predecessors then that location is propagated into the successor. If the
817 predecessor locations disagree, the location becomes undefined.
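
The merge rule just described can be illustrated with a small Python sketch.
It shows only a single join step, not the full iterative dataflow analysis,
and it works on the IR-level names from the example rather than on real
machine locations. The function name ``join_locations`` and the use of
``None`` for the undefined location are assumptions made for this
illustration.

.. code-block:: python

    UNDEF = None  # stands in for the undefined location


    def join_locations(predecessors):
        """Merge variable locations where control flow joins.

        predecessors: one dict per predecessor block, mapping a variable to
        its location at the end of that block. A variable keeps its location
        only if every predecessor agrees on it.
        """
        if not predecessors:
            return {}
        merged = {}
        variables = set().union(*predecessors)
        for var in variables:
            # A variable missing from a predecessor counts as undefined there.
            locations = {pred.get(var, UNDEF) for pred in predecessors}
            merged[var] = locations.pop() if len(locations) == 1 else UNDEF
        return merged


    # Locations at the end of %truebr and %falsebr in the example above.
    truebr = {"!30": "%input", "!23": 1}
    falsebr = {"!30": "%input", "!23": 2}
    bb1_entry = join_locations([truebr, falsebr])

Here ``bb1_entry`` keeps ``%input`` for ``!30``, because both predecessors
agree, while ``!23`` becomes undefined.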
818
819 Once LiveDebugValues has run, every block should have all valid variable
820 locations described by DBG_VALUE instructions within the block. Very little
821 effort is then required by supporting classes (such as
822 DbgEntityHistoryCalculator) to build a map of each instruction to every
823 valid variable location, without the need to consider control flow. From
824 the example above, it is otherwise difficult to determine that the location
825 of variable ``!30`` should flow "up" into block ``%bb1``, but that the location
826 of variable ``!23`` should not flow "down" into the ``%exit`` block.
517827
518828 .. _ccxx_frontend:
519829