Allow value forwarding past release fences in GVN

A release fence acts as a publication barrier for stores within the current
thread to become visible to other threads which might observe the release
fence. It does not require the current thread to observe stores performed on
other threads. As a result, we can allow store-load and load-load forwarding
across a release fence.

We choose to be much more conservative about stores. In theory, nothing
prevents us from shifting a store from after a release fence to before it,
and then eliminating the preceding (previously fenced) store. Doing this
without actually moving the second store is likely also legal, but we chose
to be conservative at this time.

The LangRef indicates that only atomic loads and stores are affected by
fences. This patch chooses to be far more conservative than that.

This is the GVN companion to http://reviews.llvm.org/D11434, which applied
the same logic in EarlyCSE and has been baking in tree for a while now.

Differential Revision: http://reviews.llvm.org/D11436

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264472 91177308-0d34-0410-b5e6-96231b3b80d8

Philip Reames
3 changed files with 126 additions and 0 deletions.
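To make the semantics concrete before the diff: a minimal C++11 sketch of the publication pattern a release fence exists for (the names and the std::atomic_thread_fence mapping are illustrative, not part of the patch). The fence only keeps the producer's earlier stores below its later flag store; it says nothing about loads the producer itself performs afterwards, which is why store-load forwarding across it is sound.

#include <atomic>
#include <cassert>

int payload;                // ordinary, non-atomic data
std::atomic<int> ready{0};  // publication flag

// Producer: the release fence publishes `payload` before the flag store.
// The producer's own read of `payload` after the fence may still be served
// from the store above it -- exactly the forwarding GVN now performs.
void produce() {
  payload = 42;
  std::atomic_thread_fence(std::memory_order_release);
  ready.store(1, std::memory_order_relaxed);
  assert(payload == 42);  // forwardable: any conflicting write is a race
}

// Consumer: once the flag is seen, the acquire fence makes the producer's
// pre-fence stores visible.
int consume() {
  while (!ready.load(std::memory_order_relaxed)) {
  }
  std::atomic_thread_fence(std::memory_order_acquire);
  return payload;
}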
      if (isInvariantLoad)
        continue;

+     // A release fence requires that all stores complete before it, but does
+     // not prevent the reordering of following loads or stores 'before' the
+     // fence. As a result, we look past it when finding a dependency for
+     // loads. DSE uses this to find preceding stores to delete and thus we
+     // can't bypass the fence if the query instruction is a store.
+     if (FenceInst *FI = dyn_cast<FenceInst>(Inst))
+       if (isLoad && FI->getOrdering() == Release)
+         continue;
+
      // See if this instruction (e.g. a call or vaarg) mod/ref's the pointer.
      ModRefInfo MR = AA.getModRefInfo(Inst, MemLoc);
      // If necessary, perform additional analysis.
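Read on its own, the added check boils down to the predicate below; a hedged, standalone restatement (the helper name is mine, not the patch's), assuming the LLVM C++ API of this vintage, where AtomicOrdering enumerators such as Release were still unscoped.

#include "llvm/IR/Instructions.h"

using namespace llvm;

// True if the dependency walk may step past Inst when answering a *load*
// query. A release fence only pins earlier stores below itself; store
// queries (as issued by DSE) must still stop at the fence.
static bool canBypassFence(const Instruction *Inst, bool isLoad) {
  if (const auto *FI = dyn_cast<FenceInst>(Inst))
    return isLoad && FI->getOrdering() == Release;
  return false;
}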
; RUN: opt -S -basicaa -dse < %s | FileCheck %s

; We conservatively choose to prevent dead store elimination
; across release or stronger fences. It's not required
; (since there must still be a race on %addr.i), but
; it is conservatively correct. A legal optimization
; could hoist the second store above the fence, and then
; DSE one of them.
define void @test1(i32* %addr.i) {
; CHECK-LABEL: @test1
; CHECK: store i32 5
; CHECK: fence
; CHECK: store i32 5
; CHECK: ret
  store i32 5, i32* %addr.i, align 4
  fence release
  store i32 5, i32* %addr.i, align 4
  ret void
}

; Same as previous, but with different values. If we ever optimize
; this more aggressively, this allows us to check that the correct
; store is retained (the 'i32 1' store in this case).
define void @test1b(i32* %addr.i) {
; CHECK-LABEL: @test1b
; CHECK: store i32 42
; CHECK: fence release
; CHECK: store i32 1
; CHECK: ret
  store i32 42, i32* %addr.i, align 4
  fence release
  store i32 1, i32* %addr.i, align 4
  ret void
}

; We *could* DSE across this fence, but don't. No other thread can
; observe the order of the acquire fence and the store.
define void @test2(i32* %addr.i) {
; CHECK-LABEL: @test2
; CHECK: store
; CHECK: fence
; CHECK: store
; CHECK: ret
  store i32 5, i32* %addr.i, align 4
  fence acquire
  store i32 5, i32* %addr.i, align 4
  ret void
}
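The "hoist then DSE" remark on @test1 can be made concrete; a hedged C++ sketch (function names hypothetical) of the rewrite the comment calls legal but unimplemented:

#include <atomic>

// Shape of @test1: two stores of the same value around a release fence.
void test1_shape(int *addr) {
  *addr = 5;
  std::atomic_thread_fence(std::memory_order_release);
  *addr = 5;
}

// Legal-but-unimplemented rewrite: hoist the second store above the fence
// (a release fence does not pin *later* stores), after which the first
// store is dead. Any observer of the difference already races on *addr.
void test1_rewritten(int *addr) {
  *addr = 5;
  std::atomic_thread_fence(std::memory_order_release);
}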
; RUN: opt -S -basicaa -gvn < %s | FileCheck %s

; We can value forward across the fence since we can (semantically)
; reorder the following load before the fence.
define i32 @test(i32* %addr.i) {
; CHECK-LABEL: @test
; CHECK: store
; CHECK: fence
; CHECK-NOT: load
; CHECK: ret
  store i32 5, i32* %addr.i, align 4
  fence release
  %a = load i32, i32* %addr.i, align 4
  ret i32 %a
}

; Same as above, but for load-load forwarding.
define i32 @test2(i32* %addr.i) {
; CHECK-LABEL: @test2
; CHECK-NEXT: fence
; CHECK-NOT: load
; CHECK: ret
  %a = load i32, i32* %addr.i, align 4
  fence release
  %a2 = load i32, i32* %addr.i, align 4
  %res = sub i32 %a, %a2
  ret i32 %res
}

; We cannot value forward across an acquire barrier since we might
; be synchronizing with another thread storing to the same variable
; followed by a release fence. This is not so much enforcing an
; ordering property (though it is that too), but a liveness
; property. We expect to eventually see the value stored by
; another thread when spinning on that location.
define i32 @test3(i32* noalias %addr.i, i32* noalias %otheraddr) {
; CHECK-LABEL: @test3
; CHECK: load
; CHECK: fence
; CHECK: load
; CHECK: ret i32 %res
  ; The following code is intended to model the unrolling of
  ; two iterations in a spin loop of the form:
  ;   do { fence acquire; tmp = *%addr.i; } while (!tmp);
  ; It's hopefully clear that allowing PRE to turn this into:
  ;   if (!*%addr.i) while(true) {}
  ; would be unfortunate.
  fence acquire
  %a = load i32, i32* %addr.i, align 4
  fence acquire
  %a2 = load i32, i32* %addr.i, align 4
  %res = sub i32 %a, %a2
  ret i32 %res
}
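A hedged C++11 analogue of the spin loop @test3 models (names hypothetical): the relaxed load must be re-executed on every iteration, so forwarding %a into %a2, or letting PRE hoist the load out of the loop, destroys the liveness property described above.

#include <atomic>

// Spin until another thread publishes a nonzero value. Each iteration
// must actually reload the flag; folding the loads into a single read
// before the loop would turn "eventually proceeds" into "spins forever".
int spin_wait(const std::atomic<int> &flag) {
  int tmp;
  do {
    std::atomic_thread_fence(std::memory_order_acquire);
    tmp = flag.load(std::memory_order_relaxed);
  } while (!tmp);
  return tmp;
}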

; Another example of why forwarding across an acquire fence is problematic
; can be seen in a normal locking operation. Say we had:
;   *p = 5; unlock(l); lock(l); use(p);
; Forwarding the store to p would be invalid. A reasonable implementation
; of unlock and lock might be:
;   unlock() { atomicrmw sub %l, 1 unordered; fence release }
;   lock() {
;     do {
;       %res = cmpxchg %l, 0, 1, monotonic monotonic
;     } while (!%res.success)
;     fence acquire;
;   }
; Given we chose to forward across the release fence, we clearly can't forward
; across the acquire fence as well.
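Finally, a hedged C++11 rendering of the unlock/lock pseudocode above (names hypothetical; C++ has no 'unordered', so relaxed stands in for it). With forwarding permitted across the release fence in unlock(), correctness depends on GVN refusing to forward the earlier `*p = 5` store past the acquire fence in lock():

#include <atomic>

std::atomic<int> lck{0};  // hypothetical lock word: 0 = free, 1 = held

void unlock() {
  lck.fetch_sub(1, std::memory_order_relaxed);  // atomicrmw sub, 'unordered'
  std::atomic_thread_fence(std::memory_order_release);
}

void lock() {
  int expected = 0;
  // cmpxchg %l, 0, 1, monotonic monotonic
  while (!lck.compare_exchange_weak(expected, 1, std::memory_order_relaxed)) {
    expected = 0;
  }
  std::atomic_thread_fence(std::memory_order_acquire);
}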