llvm.org GIT mirror llvm / 327ae58
[statepoints][experimental] Add support for live-in semantics of values in deopt bundles

This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics, so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues, and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics.
Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280250 91177308-0d34-0410-b5e6-96231b3b80d8

Philip Reames 3 years ago
6 changed file(s) with 233 addition(s) and 9 deletion(s).
3333 None = 0,
3434 GCTransition = 1, ///< Indicates that this statepoint is a transition from
3535 ///< GC-aware code to code that is not GC-aware.
36
37 MaskAll = GCTransition ///< A bitmask that includes all valid flags.
36 /// Mark the deopt arguments associated with the statepoint as only being
37 /// "live-in". By default, deopt arguments are "live-through". "live-through"
38 /// requires that the value be live on entry, on exit, and at any point
39 /// during the call. "live-in" only requires the value be available at the
40 /// start of the call. In particular, "live-in" values can be placed in
41 /// unused argument registers or other non-callee saved registers.
42 DeoptLiveIn = 2,
43
44 MaskAll = 3 ///< A bitmask that includes all valid flags.
3845 };
3946
4047 class GCRelocateInst;
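
As a rough illustration of how the new flag bit composes with the existing one, the following standalone sketch mirrors the enum values from the hunk above. The helpers `hasFlag` and `isValidFlagWord` are hypothetical names introduced only for this example; they are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the patched enum; the values mirror the hunk above.
enum class StatepointFlags : uint64_t {
  None = 0,
  GCTransition = 1,
  DeoptLiveIn = 2,
  MaskAll = 3
};

// Hypothetical helper (not an LLVM API): is flag F set in Flags?
constexpr bool hasFlag(uint64_t Flags, StatepointFlags F) {
  return (Flags & static_cast<uint64_t>(F)) != 0;
}

// Hypothetical helper: does Flags contain only bits covered by MaskAll?
constexpr bool isValidFlagWord(uint64_t Flags) {
  return (Flags & ~static_cast<uint64_t>(StatepointFlags::MaskAll)) == 0;
}

// MaskAll should be the union of every individual flag.
static_assert(static_cast<uint64_t>(StatepointFlags::MaskAll) ==
                  (static_cast<uint64_t>(StatepointFlags::GCTransition) |
                   static_cast<uint64_t>(StatepointFlags::DeoptLiveIn)),
              "MaskAll out of sync with the individual flags");

// Small self-check of the helpers.
inline void flagExamples() {
  assert(hasFlag(2, StatepointFlags::DeoptLiveIn));
  assert(!hasFlag(1, StatepointFlags::DeoptLiveIn));
  assert(isValidFlagWord(3));
  assert(!isValidFlagWord(4));
}
```

In the tests further down, a flags operand of `i32 2` on a statepoint call corresponds to `DeoptLiveIn` alone, and `i32 0` to the default live-through behaviour.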
369369 /// Lower a single value incoming to a statepoint node. This value can be
370370 /// either a deopt value or a gc value, the handling is the same. We special
371371 /// case constants and allocas, then fall back to spilling if required.
372 static void lowerIncomingStatepointValue(SDValue Incoming,
372 static void lowerIncomingStatepointValue(SDValue Incoming, bool LiveInOnly,
373373 SmallVectorImpl<SDValue> &Ops,
374374 SelectionDAGBuilder &Builder) {
375375 SDValue Chain = Builder.getRoot();
388388 // relocate the address of the alloca itself?)
389389 Ops.push_back(Builder.DAG.getTargetFrameIndex(FI->getIndex(),
390390 Incoming.getValueType()));
391 } else if (LiveInOnly) {
392 // If this value is live in (not live-on-return, or live-through), we can
393 // treat it the same way patchpoint treats its "live in" values. We'll
394 // end up folding some of these into stack references, but they'll be
395 // handled by the register allocator. Note that we do not have the notion
396 // of a late use so these values might be placed in registers which are
397 // clobbered by the call. This is fine for live-in.
398 Ops.push_back(Incoming);
391399 } else {
392400 // Otherwise, locate a spill slot and explicitly spill it so it
393401 // can be found by the runtime later. We currently do not support
438446 "non gc managed derived pointer found in statepoint");
439447 }
440448 }
449 assert(SI.Bases.size() == SI.Ptrs.size() && "Pointer without base!");
441450 } else {
442451 assert(SI.Bases.empty() && "No gc specified, so cannot relocate pointers!");
443452 assert(SI.Ptrs.empty() && "No gc specified, so cannot relocate pointers!");
444453 }
445454 #endif
446455
456 // Figure out what lowering strategy we're going to use for each part
457 // Note: It is conservatively correct to lower both "live-in" and "live-out"
458 // as "live-through". A "live-through" variable is one which is "live-in",
459 // "live-out", and live throughout the lifetime of the call (i.e. we can find
460 // it from any PC within the transitive callee of the statepoint). In
461 // particular, if the callee spills callee preserved registers we may not
462 // be able to find a value placed in that register during the call. This is
463 // fine for live-out, but not for live-through. If we were willing to make
464 // assumptions about the code generator producing the callee, we could
465 // potentially allow live-through values in callee saved registers.
466 const bool LiveInDeopt =
467 SI.StatepointFlags & (uint64_t)StatepointFlags::DeoptLiveIn;
468
469 auto isGCValue =[&](const Value *V) {
470 return is_contained(SI.Ptrs, V) || is_contained(SI.Bases, V);
471 };
472
447473 // Before we actually start lowering (and allocating spill slots for values),
448474 // reserve any stack slots which we judge to be profitable to reuse for a
449475 // particular value. This is purely an optimization over the code below and
450476 // doesn't change semantics at all. It is important for performance that we
451477 // reserve slots for both deopt and gc values before lowering either.
452478 for (const Value *V : SI.DeoptState) {
453 reservePreviousStackSlotForValue(V, Builder);
479 if (!LiveInDeopt || isGCValue(V))
480 reservePreviousStackSlotForValue(V, Builder);
454481 }
455482 for (unsigned i = 0; i < SI.Bases.size(); ++i) {
456483 reservePreviousStackSlotForValue(SI.Bases[i], Builder);
467494 // what type of values are contained within.
468495 for (const Value *V : SI.DeoptState) {
469496 SDValue Incoming = Builder.getValue(V);
470 lowerIncomingStatepointValue(Incoming, Ops, Builder);
497 const bool LiveInValue = LiveInDeopt && !isGCValue(V);
498 lowerIncomingStatepointValue(Incoming, LiveInValue, Ops, Builder);
471499 }
472500
473501 // Finally, go ahead and lower all the gc arguments. There's no prefixed
477505 // (base[0], ptr[0], base[1], ptr[1], ...)
478506 for (unsigned i = 0; i < SI.Bases.size(); ++i) {
479507 const Value *Base = SI.Bases[i];
480 lowerIncomingStatepointValue(Builder.getValue(Base), Ops, Builder);
508 lowerIncomingStatepointValue(Builder.getValue(Base), /*LiveInOnly*/ false,
509 Ops, Builder);
481510
482511 const Value *Ptr = SI.Ptrs[i];
483 lowerIncomingStatepointValue(Builder.getValue(Ptr), Ops, Builder);
512 lowerIncomingStatepointValue(Builder.getValue(Ptr), /*LiveInOnly*/ false,
513 Ops, Builder);
484514 }
485515
486516 // If there are any explicit spill slots passed to the statepoint, record
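
The per-value decision made in this hunk can be sketched in isolation. The model below is a simplification under stated assumptions: values are represented by plain strings, `StatepointModel`, `Lowering`, and `classifyDeoptValue` are hypothetical names, and the special cases for constants and allocas handled by the real `lowerIncomingStatepointValue` are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical outcome of lowering a single deopt value.
enum class Lowering { LiveIn, SpillToStack };

// Hypothetical, simplified stand-in for the statepoint lowering info.
struct StatepointModel {
  bool LiveInDeopt;                     // DeoptLiveIn flag set on the statepoint
  std::vector<std::string> DeoptState;  // deopt values
  std::vector<std::string> Bases;       // gc base pointers
  std::vector<std::string> Ptrs;        // gc derived pointers
};

// Mirrors the isGCValue lambda: a value is a gc value if it appears as
// either a base or a derived pointer.
bool isGCValue(const StatepointModel &SI, const std::string &V) {
  auto Contains = [&V](const std::vector<std::string> &Vec) {
    return std::find(Vec.begin(), Vec.end(), V) != Vec.end();
  };
  return Contains(SI.Ptrs) || Contains(SI.Bases);
}

// A deopt value is lowered as live-in only if the flag is set and the
// value is not also a gc value (gc values must remain live-through).
Lowering classifyDeoptValue(const StatepointModel &SI, const std::string &V) {
  const bool LiveInValue = SI.LiveInDeopt && !isGCValue(SI, V);
  return LiveInValue ? Lowering::LiveIn : Lowering::SpillToStack;
}

// Models the shape of test5 below: %a is deopt-only, %p is also a gc pointer.
inline void classifyExample() {
  StatepointModel SI{true, {"a", "p"}, {"p"}, {"p"}};
  assert(classifyDeoptValue(SI, "a") == Lowering::LiveIn);
  assert(classifyDeoptValue(SI, "p") == Lowering::SpillToStack);
}
```

The same predicate also gates `reservePreviousStackSlotForValue` in the hunk: a stack slot is reserved only for values that will actually be spilled.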
447447 StartIdx = PatchPointOpers(&MI).getVarIdx();
448448 break;
449449 }
450 case TargetOpcode::STATEPOINT: {
451 // For statepoints, fold deopt and gc arguments, but not call arguments.
452 StartIdx = StatepointOpers(&MI).getVarIdx();
453 break;
454 }
450455 default:
451456 llvm_unreachable("unexpected stackmap opcode");
452457 }
512517 MachineInstr *NewMI = nullptr;
513518
514519 if (MI.getOpcode() == TargetOpcode::STACKMAP ||
515 MI.getOpcode() == TargetOpcode::PATCHPOINT) {
520 MI.getOpcode() == TargetOpcode::PATCHPOINT ||
521 MI.getOpcode() == TargetOpcode::STATEPOINT) {
516522 // Fold stackmap/patchpoint.
517523 NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
518524 if (NewMI)
793799 int FrameIndex = 0;
794800
795801 if ((MI.getOpcode() == TargetOpcode::STACKMAP ||
796 MI.getOpcode() == TargetOpcode::PATCHPOINT) &&
802 MI.getOpcode() == TargetOpcode::PATCHPOINT ||
803 MI.getOpcode() == TargetOpcode::STATEPOINT) &&
797804 isLoadFromStackSlot(LoadMI, FrameIndex)) {
798805 // Fold stackmap/patchpoint.
799806 NewMI = foldPatchpoint(MF, MI, Ops, FrameIndex, *this);
12721272 };
12731273 }
12741274
1275 static StringRef getDeoptLowering(CallSite CS) {
1276 const char *DeoptLowering = "deopt-lowering";
1277 if (CS.hasFnAttr(DeoptLowering)) {
1278 // FIXME: CallSite has a *really* confusing interface around attributes
1279 // with values.
1280 const AttributeSet &CSAS = CS.getAttributes();
1281 if (CSAS.hasAttribute(AttributeSet::FunctionIndex,
1282 DeoptLowering))
1283 return CSAS.getAttribute(AttributeSet::FunctionIndex,
1284 DeoptLowering).getValueAsString();
1285 Function *F = CS.getCalledFunction();
1286 assert(F && F->hasFnAttribute(DeoptLowering));
1287 return F->getFnAttribute(DeoptLowering).getValueAsString();
1288 }
1289 return "live-through";
1290 }
1291
1292
12751293 static void
12761294 makeStatepointExplicitImpl(const CallSite CS, /* to replace */
12771295 const SmallVectorImpl<Value *> &BasePtrs,
13121330 NumPatchBytes = *SD.NumPatchBytes;
13131331 if (SD.StatepointID)
13141332 StatepointID = *SD.StatepointID;
1333
1334 // Pass through the requested lowering if any. The default is live-through.
1335 StringRef DeoptLowering = getDeoptLowering(CS);
1336 if (DeoptLowering.equals("live-in"))
1337 Flags |= uint32_t(StatepointFlags::DeoptLiveIn);
1338 else {
1339 assert(DeoptLowering.equals("live-through") && "Unsupported value!");
1340 }
13151341
13161342 Value *CallTarget = CS.getCalledValue();
13171343 if (Function *F = dyn_cast<Function>(CallTarget)) {
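
The attribute lookup and flag update in this hunk follow a simple precedence: a "deopt-lowering" attribute on the call site wins, then one on the callee, and otherwise the default is "live-through". A minimal standalone sketch of that precedence follows. Attribute sets are modelled as string maps, `AttrMap` and `applyDeoptLowering` are hypothetical names, and throwing on an unsupported value is a choice made for this example (the patch asserts instead).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of an attribute set: attribute name -> string value.
using AttrMap = std::map<std::string, std::string>;

// Mirrors StatepointFlags::DeoptLiveIn from the patch.
constexpr uint32_t DeoptLiveInFlag = 2;

// Call-site attribute first, then callee attribute, then the default.
std::string getDeoptLowering(const AttrMap &CallSiteAttrs,
                             const AttrMap &CalleeAttrs) {
  static const char *const Key = "deopt-lowering";
  auto It = CallSiteAttrs.find(Key);
  if (It != CallSiteAttrs.end())
    return It->second;
  It = CalleeAttrs.find(Key);
  if (It != CalleeAttrs.end())
    return It->second;
  return "live-through";
}

// Translate the requested lowering into the statepoint flag word.
uint32_t applyDeoptLowering(uint32_t Flags, const std::string &Lowering) {
  if (Lowering == "live-in")
    return Flags | DeoptLiveInFlag;
  if (Lowering != "live-through")
    throw std::invalid_argument("unsupported deopt-lowering: " + Lowering);
  return Flags;
}

// Small self-check: a callee marked live-in yields a flag word of 2.
inline void deoptLoweringExample() {
  AttrMap None;
  AttrMap LiveIn{{"deopt-lowering", "live-in"}};
  assert(applyDeoptLowering(0, getDeoptLowering(None, LiveIn)) == 2);
  assert(applyDeoptLowering(0, getDeoptLowering(None, None)) == 0);
}
```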
0 ; RUN: llc -O3 < %s | FileCheck %s
1 target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
2 target triple = "x86_64-apple-macosx10.11.0"
3
4 declare void @bar() #0
5 declare void @baz()
6
7 define void @test1(i32 %a) gc "statepoint-example" {
8 entry:
9 ; We expect the argument to be passed in an extra register to bar
10 ; CHECK-LABEL: test1
11 ; CHECK: pushq %rax
12 ; CHECK-NEXT: Ltmp0:
13 ; CHECK-NEXT: .cfi_def_cfa_offset 16
14 ; CHECK-NEXT: callq _bar
15 %statepoint_token1 = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 %a)
16 ret void
17 }
18
19 define void @test2(i32 %a, i32 %b) gc "statepoint-example" {
20 entry:
21 ; Because the first call clobbers esi, we have to move the values into
22 ; new registers. Note that they stay in the registers for both calls.
23 ; CHECK-LABEL: @test2
24 ; CHECK: movl %esi, %ebx
25 ; CHECK-NEXT: movl %edi, %ebp
26 ; CHECK-NEXT: callq _bar
27 call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 2, i32 %a, i32 %b)
28 call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 2, i32 %b, i32 %a)
29 ret void
30 }
31
32 define void @test3(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i) gc "statepoint-example" {
33 entry:
34 ; TODO: We should have folded the reload into the statepoint.
35 ; CHECK-LABEL: @test3
36 ; CHECK: movl 32(%rsp), %r10d
37 ; CHECK-NEXT: movl 24(%rsp), %r11d
38 ; CHECK-NEXT: movl 16(%rsp), %eax
39 ; CHECK-NEXT: callq _bar
40 %statepoint_token1 = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 9, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i)
41 ret void
42 }
43
44 ; This case just confirms that we don't crash when given more live values
45 ; than registers. This is a case where we *have* to use a stack slot.
46 define void @test4(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n, i32 %o, i32 %p, i32 %q, i32 %r, i32 %s, i32 %t, i32 %u, i32 %v, i32 %w, i32 %x, i32 %y, i32 %z) gc "statepoint-example" {
47 entry:
48 ; TODO: We should have folded the reload into the statepoint.
49 ; CHECK-LABEL: test4
50 ; CHECK: pushq %r15
51 ; CHECK: pushq %r14
52 ; CHECK: pushq %r13
53 ; CHECK: pushq %r12
54 ; CHECK: pushq %rbx
55 ; CHECK: pushq %rax
56 ; CHECK: movl 128(%rsp), %r13d
57 ; CHECK-NEXT: movl 120(%rsp), %r12d
58 ; CHECK-NEXT: movl 112(%rsp), %r15d
59 ; CHECK-NEXT: movl 104(%rsp), %r14d
60 ; CHECK-NEXT: movl 96(%rsp), %ebp
61 ; CHECK-NEXT: movl 88(%rsp), %ebx
62 ; CHECK-NEXT: movl 80(%rsp), %r11d
63 ; CHECK-NEXT: movl 72(%rsp), %r10d
64 ; CHECK-NEXT: movl 64(%rsp), %eax
65 ; CHECK-NEXT: callq _bar
66 %statepoint_token1 = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 26, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n, i32 %o, i32 %p, i32 %q, i32 %r, i32 %s, i32 %t, i32 %u, i32 %v, i32 %w, i32 %x, i32 %y, i32 %z)
67 ret void
68 }
69
70 ; A live-through gc-value must be spilled even if it is also a live-in deopt
71 ; value. For live-in, we could technically report the register copy, but from
72 ; a code quality perspective it's better to reuse the required stack slot so
73 ; as to put less stress on the register allocator for no benefit.
74 define i32 addrspace(1)* @test5(i32 %a, i32 addrspace(1)* %p) gc "statepoint-example" {
75 entry:
76 ; CHECK-LABEL: test5
77 ; CHECK: movq %rsi, (%rsp)
78 ; CHECK-NEXT: callq _bar
79 %token = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 %a, i32 addrspace(1)* %p, i32 addrspace(1)* %p)
80 %p2 = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %token, i32 9, i32 9)
81 ret i32 addrspace(1)* %p2
82 }
83
84 ; Show the interaction of live-through spilling followed by live-in.
85 define void @test6(i32 %a) gc "statepoint-example" {
86 entry:
87 ; TODO: We could have reused the previous spill slot at zero additional cost.
88 ; CHECK-LABEL: test6
89 ; CHECK: movl %edi, %ebx
90 ; CHECK: movl %ebx, 12(%rsp)
91 ; CHECK-NEXT: callq _baz
92 ; CHECK-NEXT: Ltmp30:
93 ; CHECK-NEXT: callq _bar
94 call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @baz, i32 0, i32 0, i32 0, i32 1, i32 %a)
95 call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 %a)
96 ret void
97 }
98
99
100 ; CHECK: Ltmp1-_test1
101 ; CHECK: .byte 1
102 ; CHECK-NEXT: .byte 4
103 ; CHECK-NEXT: .short 5
104 ; CHECK-NEXT: .long 0
105
106 ; CHECK: Ltmp7-_test2
107 ; CHECK: .byte 1
108 ; CHECK-NEXT: .byte 4
109 ; CHECK-NEXT: .short 6
110 ; CHECK-NEXT: .long 0
111 ; CHECK: .byte 1
112 ; CHECK-NEXT: .byte 4
113 ; CHECK-NEXT: .short 3
114 ; CHECK-NEXT: .long 0
115 ; CHECK: Ltmp8-_test2
116 ; CHECK: .byte 1
117 ; CHECK-NEXT: .byte 4
118 ; CHECK-NEXT: .short 3
119 ; CHECK-NEXT: .long 0
120 ; CHECK: .byte 1
121 ; CHECK-NEXT: .byte 4
122 ; CHECK-NEXT: .short 6
123 ; CHECK-NEXT: .long 0
124
125 declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void ()*, i32, i32, ...)
126 declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token, i32, i32)
127
128
129 attributes #0 = { "deopt-lowering"="live-in" }
130 attributes #1 = { "deopt-lowering"="live-through" }
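
The stack map checks near the end of this test (`.byte 1`, `.byte 4`, `.short N`, `.long 0`) can be read as location records. The decoder below is a sketch under two assumptions that are not stated in the patch itself: that each location is emitted as a kind byte, a size byte, a 16-bit DWARF register number, and a 32-bit offset, with kind 1 meaning "register"; and that register numbers follow the System V x86-64 DWARF numbering. `Location` and `describe` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed layout of one stack map location record (see lead-in).
struct Location {
  uint8_t Kind;       // 1 is assumed to mean "value lives in a register"
  uint8_t Size;       // size of the value in bytes
  uint16_t DwarfReg;  // DWARF register number
  int32_t Offset;     // unused for plain register locations
};

// Render a register location using x86-64 DWARF numbers 0 through 7.
std::string describe(const Location &L) {
  static const char *const Regs[] = {"rax", "rdx", "rcx", "rbx",
                                     "rsi", "rdi", "rbp", "rsp"};
  if (L.Kind != 1)
    return "not a register location";
  std::string Reg = L.DwarfReg < 8
                        ? std::string(Regs[L.DwarfReg])
                        : "dwarf#" + std::to_string(L.DwarfReg);
  return Reg + " (" + std::to_string(L.Size) + " bytes)";
}

// The record checked for test1: kind 1, size 4, register 5, offset 0.
inline void stackMapExample() {
  assert(describe({1, 4, 5, 0}) == "rdi (4 bytes)");
}
```

Read this way, test1 reports `%a` in `rdi` (its incoming argument register), and test2 reports registers 6 and 3 (`rbp` and `rbx`), which lines up with the `movl %edi, %ebp` and `movl %esi, %ebx` copies checked in that function.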
0 ; RUN: opt -rewrite-statepoints-for-gc -S < %s | FileCheck %s
1 ; Check that the "deopt-lowering" function attribute gets transcoded into
2 ; flags on the resulting statepoint
3
4 target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
5 target triple = "x86_64-apple-macosx10.11.0"
6
7 declare void @foo()
8 declare void @bar() "deopt-lowering"="live-in"
9 declare void @baz() "deopt-lowering"="live-through"
10
11 define void @test1() gc "statepoint-example" {
12 ; CHECK-LABEL: @test1(
13 ; CHECK: @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 1, i32 57)
14 ; CHECK: @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 42)
15 ; CHECK: @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @baz, i32 0, i32 0, i32 0, i32 1, i32 13)
16
17 entry:
18 call void @foo() [ "deopt"(i32 57) ]
19 call void @bar() [ "deopt"(i32 42) ]
20 call void @baz() [ "deopt"(i32 13) ]
21 ret void
22 }