llvm.org GIT mirror llvm / f0e0135
[PM/Unswitch] Fix a collection of closely related issues with trivial switch unswitching. The core problem was that the way we handled unswitching trivial exit edges through the default successor of a switch. For some reason I thought the right way to do this was to add a block containing unreachable and point the default successor at this block. In retrospect, this has an amazing number of problems. The first issue is the one that this pass has always worked around -- we have to *detect* such edges and avoid unswitching them again. This seemed pretty easy really. You juts look for an edge to a block containing unreachable. However, this pattern is woefully unsound. So many things can break it. The amazing thing is that I found a test case where *simple-loop-unswitch itself* breaks this! When we do a *non-trivial* unswitch of a switch we will end up splitting this exit edge. The result will be a default successor that is an exit and terminates in ... a perfectly normal branch. So the first test case that I started trying to fix is added to the nontrivial test cases. This is a ridiculous example that did just amazing things previously. With just unswitch, it would create 10+ copies of this stuff stamped out. But if you combine it *just right* with a bunch of other passes (like simplify-cfg, loop rotate, and some LICM) you can get it to do this infinitely. Or at least, I never got it to finish. =[ This, in turn, uncovered another related issue. When we are manipulating these switches after doing a trivial unswitch we never correctly updated PHI nodes to reflect our edits. As soon as I started changing how these edges were managed, it became obvious there were more issues that I couldn't realistically leave unaddressed, so I wrote more test cases around PHI updates here and ensured all of that works now. And this, in turn, required some adjustment to how we collect and manage the exit successor when it is the default successor. That showed a clear bug where we failed to include it in our search for the outer-most loop reached by an unswitched exit edge. This was actually already tested and the test case didn't work. I (wrongly) thought that was due to SCEV failing to analyze the switch. In fact, it was just a simple bug in the code that skipped the default successor. While changing this, I handled it correctly and have updated the test to reflect that we now get precise SCEV analysis of trip counts for the outer loop in one of these cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336646 91177308-0d34-0410-b5e6-96231b3b80d8 Chandler Carruth 1 year, 2 months ago
4 changed file(s) with 318 addition(s) and 25 deletion(s). Raw diff Collapse all Expand all
542542 // the exits.
543543 Loop *OuterL = &L;
544544
545 if (DefaultExitBB) {
546 // Clear out the default destination temporarily to allow accurate
547 // predecessor lists to be examined below.
548 SI.setDefaultDest(nullptr);
549 // Check the loop containing this exit.
550 Loop *ExitL = LI.getLoopFor(DefaultExitBB);
551 if (!ExitL || ExitL->contains(OuterL))
552 OuterL = ExitL;
553 }
554
555 // Store the exit cases into a separate data structure and remove them from
556 // the switch.
545557 SmallVector, 4> ExitCases;
546558 ExitCases.reserve(ExitCaseIndices.size());
547559 // We walk the case indices backwards so that we remove the last case first
575587 SI.case_begin()->getCaseSuccessor();
576588 }))
577589 CommonSuccBB = SI.case_begin()->getCaseSuccessor();
578
579 if (DefaultExitBB) {
580 // We can't remove the default edge so replace it with an edge to either
581 // the single common remaining successor (if we have one) or an unreachable
582 // block.
583 if (CommonSuccBB) {
584 SI.setDefaultDest(CommonSuccBB);
585 } else {
586 BasicBlock *UnreachableBB = BasicBlock::Create(
587 ParentBB->getContext(),
588 Twine(ParentBB->getName()) + ".unreachable_default",
589 ParentBB->getParent());
590 new UnreachableInst(ParentBB->getContext(), UnreachableBB);
591 SI.setDefaultDest(UnreachableBB);
592 DT.addNewBlock(UnreachableBB, ParentBB);
593 }
594 } else {
590 if (!DefaultExitBB) {
595591 // If we're not unswitching the default, we need it to match any cases to
596592 // have a common successor or if we have no cases it is the common
597593 // successor.
687683 // pointing at unreachable and other complexity.
688684 if (CommonSuccBB) {
689685 BasicBlock *BB = SI.getParent();
686 // We may have had multiple edges to this common successor block, so remove
687 // them as predecessors. We skip the first one, either the default or the
688 // actual first case.
689 bool SkippedFirst = DefaultExitBB == nullptr;
690 for (auto Case : SI.cases()) {
691 assert(Case.getCaseSuccessor() == CommonSuccBB &&
692 "Non-common successor!");
693 if (!SkippedFirst) {
694 SkippedFirst = true;
695 continue;
696 }
697 CommonSuccBB->removePredecessor(BB,
698 /*DontDeleteUselessPHIs*/ true);
699 }
700 // Now nuke the switch and replace it with a direct branch.
690701 SI.eraseFromParent();
691702 BranchInst::Create(CommonSuccBB, BB);
703 } else if (DefaultExitBB) {
704 assert(SI.getNumCases() > 0 &&
705 "If we had no cases we'd have a common successor!");
706 // Move the last case to the default successor. This is valid as if the
707 // default got unswitched it cannot be reached. This has the advantage of
708 // being simple and keeping the number of edges from this switch to
709 // successors the same, and avoiding any PHI update complexity.
710 auto LastCaseI = std::prev(SI.case_end());
711 SI.setDefaultDest(LastCaseI->getCaseSuccessor());
712 SI.removeCase(LastCaseI);
692713 }
693714
694715 // Walk the unswitched exit blocks and the unswitched split blocks and update
39333933 ; CHECK: exit:
39343934 ; CHECK-NEXT: ret void
39353935 }
3936
3937 ; A devilish pattern. This is a crafty, crafty test case designed to risk
3938 ; creating indirect cycles with trivial and non-trivial unswitching. The inner
3939 ; loop has a switch with a trivial exit edge that can be unswitched, but the
3940 ; rest of the switch cannot be unswitched because its cost is too high.
3941 ; However, the unswitching of the trivial edge creates a new switch in the
3942 ; outer loop. *This* switch isn't trivial, but has a low cost to unswitch. When
3943 ; we unswitch this switch from the outer loop, we will remove it completely and
3944 ; create a clone of the inner loop on one side. This clone will then again be
3945 ; viable for unswitching the inner-most loop. This lets us check that the
3946 ; unswitching doesn't end up cycling infinitely even when the cycle is
3947 ; indirect and due to revisiting a loop after cloning.
3948 define void @test30(i32 %arg) {
3949 ; CHECK-LABEL: define void @test30(
3950 entry:
3951 br label %outer.header
3952 ; CHECK-NEXT: entry:
3953 ; CHECK-NEXT: switch i32 %arg, label %[[ENTRY_SPLIT:.*]] [
3954 ; CHECK-NEXT: i32 1, label %[[ENTRY_SPLIT_US:.*]]
3955 ; CHECK-NEXT: i32 2, label %[[ENTRY_SPLIT_US]]
3956 ; CHECK-NEXT: ]
3957 ;
3958 ; CHECK: [[ENTRY_SPLIT_US]]:
3959 ; CHECK-NEXT: switch i32 %arg, label %[[ENTRY_SPLIT_US_SPLIT:.*]] [
3960 ; CHECK-NEXT: i32 1, label %[[ENTRY_SPLIT_US_SPLIT_US:.*]]
3961 ; CHECK-NEXT: ]
3962
3963 outer.header:
3964 br label %inner.header
3965
3966 inner.header:
3967 switch i32 %arg, label %inner.loopexit1 [
3968 i32 1, label %inner.body1
3969 i32 2, label %inner.body2
3970 ]
3971
3972 inner.body1:
3973 %a = call i32 @a()
3974 br label %inner.latch
3975 ; The (super convoluted) fully unswitched loop around `@a`.
3976 ;
3977 ; CHECK: [[ENTRY_SPLIT_US_SPLIT_US]]:
3978 ; CHECK-NEXT: br label %[[OUTER_HEADER_US_US:.*]]
3979 ;
3980 ; CHECK: [[OUTER_HEADER_US_US]]:
3981 ; CHECK-NEXT: br label %[[OUTER_HEADER_SPLIT_US_US:.*]]
3982 ;
3983 ; CHECK: [[OUTER_LATCH_US_US:.*]]:
3984 ; CHECK-NEXT: %[[OUTER_COND_US_US:.*]] = call i1 @cond()
3985 ; CHECK-NEXT: br i1 %[[OUTER_COND_US_US]], label %[[OUTER_HEADER_US_US]], label %[[EXIT_SPLIT_US_SPLIT_US:.*]]
3986 ;
3987 ; CHECK: [[OUTER_HEADER_SPLIT_US_US]]:
3988 ; CHECK-NEXT: br label %[[OUTER_HEADER_SPLIT_SPLIT_US_US_US:.*]]
3989 ;
3990 ; CHECK: [[INNER_LOOPEXIT2_US_US:.*]]:
3991 ; CHECK-NEXT: br label %[[OUTER_LATCH_US_US]]
3992 ;
3993 ; CHECK: [[OUTER_HEADER_SPLIT_SPLIT_US_US_US]]:
3994 ; CHECK-NEXT: br label %[[INNER_HEADER_US_US_US:.*]]
3995 ;
3996 ; CHECK: [[INNER_HEADER_US_US_US]]:
3997 ; CHECK-NEXT: br label %[[INNER_BODY1_US_US_US:.*]]
3998 ;
3999 ; CHECK: [[INNER_BODY1_US_US_US]]:
4000 ; CHECK-NEXT: %[[A:.*]] = call i32 @a()
4001 ; CHECK-NEXT: br label %[[INNER_LATCH_US_US_US:.*]]
4002 ;
4003 ; CHECK: [[INNER_LATCH_US_US_US]]:
4004 ; CHECK-NEXT: %[[PHI_A:.*]] = phi i32 [ %[[A]], %[[INNER_BODY1_US_US_US]] ]
4005 ; CHECK-NEXT: call void @sink1(i32 0)
4006 ; CHECK-NEXT: call void @sink1(i32 0)
4007 ; CHECK-NEXT: call void @sink1(i32 0)
4008 ; CHECK-NEXT: call void @sink1(i32 0)
4009 ; CHECK-NEXT: call void @sink1(i32 0)
4010 ; CHECK-NEXT: call void @sink1(i32 0)
4011 ; CHECK-NEXT: call void @sink1(i32 0)
4012 ; CHECK-NEXT: call void @sink1(i32 0)
4013 ; CHECK-NEXT: call void @sink1(i32 0)
4014 ; CHECK-NEXT: call void @sink1(i32 0)
4015 ; CHECK-NEXT: call void @sink1(i32 %[[PHI_A]])
4016 ; CHECK-NEXT: %[[INNER_COND_US_US_US:.*]] = call i1 @cond()
4017 ; CHECK-NEXT: br i1 %[[INNER_COND_US_US_US]], label %[[INNER_HEADER_US_US_US]], label %[[INNER_LOOPEXIT2_SPLIT_US_US_US:.*]]
4018 ;
4019 ; CHECK: [[INNER_LOOPEXIT2_SPLIT_US_US_US]]:
4020 ; CHECK-NEXT: br label %[[INNER_LOOPEXIT2_US_US]]
4021 ;
4022 ; CHECK: [[EXIT_SPLIT_US_SPLIT_US]]:
4023 ; CHECK-NEXT: br label %[[EXIT_SPLIT_US:.*]]
4024
4025
4026 inner.body2:
4027 %b = call i32 @b()
4028 br label %inner.latch
4029 ; The fully unswitched loop around `@b`.
4030 ;
4031 ; CHECK: [[ENTRY_SPLIT_US_SPLIT]]:
4032 ; CHECK-NEXT: br label %[[OUTER_HEADER_US:.*]]
4033 ;
4034 ; CHECK: [[OUTER_HEADER_US]]:
4035 ; CHECK-NEXT: br label %[[OUTER_HEADER_SPLIT_US:.*]]
4036 ;
4037 ; CHECK: [[INNER_HEADER_US:.*]]:
4038 ; CHECK-NEXT: br label %[[INNER_BODY2_US:.*]]
4039 ;
4040 ; CHECK: [[INNER_BODY2_US]]:
4041 ; CHECK-NEXT: %[[B:.*]] = call i32 @b()
4042 ; CHECK-NEXT: br label %[[INNER_LATCH_US:.*]]
4043 ;
4044 ; CHECK: [[INNER_LATCH_US]]:
4045 ; CHECK-NEXT: call void @sink1(i32 0)
4046 ; CHECK-NEXT: call void @sink1(i32 0)
4047 ; CHECK-NEXT: call void @sink1(i32 0)
4048 ; CHECK-NEXT: call void @sink1(i32 0)
4049 ; CHECK-NEXT: call void @sink1(i32 0)
4050 ; CHECK-NEXT: call void @sink1(i32 0)
4051 ; CHECK-NEXT: call void @sink1(i32 0)
4052 ; CHECK-NEXT: call void @sink1(i32 0)
4053 ; CHECK-NEXT: call void @sink1(i32 0)
4054 ; CHECK-NEXT: call void @sink1(i32 0)
4055 ; CHECK-NEXT: call void @sink1(i32 %[[B]])
4056 ; CHECK-NEXT: %[[INNER_COND_US:.*]] = call i1 @cond()
4057 ; CHECK-NEXT: br i1 %[[INNER_COND_US]], label %[[INNER_HEADER_US]], label %[[INNER_LOOPEXIT2_SPLIT_US:.*]]
4058 ;
4059 ; CHECK: [[INNER_LOOPEXIT2_SPLIT_US]]:
4060 ; CHECK-NEXT: br label %[[INNER_LOOPEXIT2_US:.*]]
4061 ;
4062 ; CHECK: [[OUTER_LATCH_US:.*]]:
4063 ; CHECK-NEXT: %[[OUTER_COND_US:.*]] = call i1 @cond()
4064 ; CHECK-NEXT: br i1 %[[OUTER_COND_US]], label %[[OUTER_HEADER_US]], label %[[EXIT_SPLIT_US_SPLIT:.*]]
4065 ;
4066 ; CHECK: [[OUTER_HEADER_SPLIT_US]]:
4067 ; CHECK-NEXT: br label %[[OUTER_HEADER_SPLIT_SPLIT_US:.*]]
4068 ;
4069 ; CHECK: [[OUTER_HEADER_SPLIT_SPLIT_US]]:
4070 ; CHECK-NEXT: br label %[[INNER_HEADER_US]]
4071 ;
4072 ; CHECK: [[INNER_LOOPEXIT2_US]]:
4073 ; CHECK-NEXT: br label %[[OUTER_LATCH_US]]
4074 ;
4075 ; CHECK: [[EXIT_SPLIT_US]]:
4076 ; CHECK-NEXT: br label %exit
4077
4078 inner.latch:
4079 %phi = phi i32 [ %a, %inner.body1 ], [ %b, %inner.body2 ]
4080 ; Make 10 junk calls here to ensure we're over the "50" cost threshold of
4081 ; non-trivial unswitching for this inner switch.
4082 call void @sink1(i32 0)
4083 call void @sink1(i32 0)
4084 call void @sink1(i32 0)
4085 call void @sink1(i32 0)
4086 call void @sink1(i32 0)
4087 call void @sink1(i32 0)
4088 call void @sink1(i32 0)
4089 call void @sink1(i32 0)
4090 call void @sink1(i32 0)
4091 call void @sink1(i32 0)
4092 call void @sink1(i32 %phi)
4093 %inner.cond = call i1 @cond()
4094 br i1 %inner.cond, label %inner.header, label %inner.loopexit2
4095
4096 inner.loopexit1:
4097 br label %outer.latch
4098 ; The unswitched `loopexit1` path.
4099 ;
4100 ; CHECK: [[ENTRY_SPLIT]]:
4101 ; CHECK-NEXT: br label %[[OUTER_HEADER:.*]]
4102 ;
4103 ; CHECK: outer.header:
4104 ; CHECK-NEXT: br label %inner.loopexit1
4105 ;
4106 ; CHECK: inner.loopexit1:
4107 ; CHECK-NEXT: br label %outer.latch
4108 ;
4109 ; CHECK: outer.latch:
4110 ; CHECK-NEXT: %outer.cond = call i1 @cond()
4111 ; CHECK-NEXT: br i1 %outer.cond, label %outer.header, label %[[EXIT_SPLIT:.*]]
4112 ;
4113 ; CHECK: [[EXIT_SPLIT]]:
4114 ; CHECK-NEXT: br label %exit
4115
4116 inner.loopexit2:
4117 br label %outer.latch
4118
4119 outer.latch:
4120 %outer.cond = call i1 @cond()
4121 br i1 %outer.cond, label %outer.header, label %exit
4122
4123 exit:
4124 ret void
4125 ; CHECK: exit:
4126 ; CHECK-NEXT: ret void
4127 }
0 ; RUN: opt -passes='loop(unswitch),verify' -S < %s | FileCheck %s
11
22 declare void @some_func() noreturn
3 declare void @sink(i32)
34
45 declare i1 @cond()
56 declare i32 @cond.i32()
135136 ]
136137 ; CHECK: loop_begin:
137138 ; CHECK-NEXT: load
138 ; CHECK-NEXT: switch i32 %cond2, label %[[UNREACHABLE:.*]] [
139 ; CHECK-NEXT: switch i32 %cond2, label %loop2 [
139140 ; CHECK-NEXT: i32 0, label %loop0
140141 ; CHECK-NEXT: i32 1, label %loop1
141 ; CHECK-NEXT: i32 2, label %loop2
142142 ; CHECK-NEXT: ]
143143
144144 loop0:
181181 ret i32 0
182182 ; CHECK: loop_exit3:
183183 ; CHECK-NEXT: ret
184 ;
185 ; CHECK: [[UNREACHABLE]]:
186 ; CHECK-NEXT: unreachable
187184 }
188185
189186 ; This test contains a trivially unswitchable branch with an LCSSA phi node in
11591156 ; CHECK: exit:
11601157 ; CHECK-NEXT: ret void
11611158 }
1159
1160 define void @test_unswitch_to_common_succ_with_phis(i32* %var, i32 %cond) {
1161 ; CHECK-LABEL: @test_unswitch_to_common_succ_with_phis(
1162 entry:
1163 br label %header
1164 ; CHECK-NEXT: entry:
1165 ; CHECK-NEXT: switch i32 %cond, label %loopexit1 [
1166 ; CHECK-NEXT: i32 13, label %loopexit2
1167 ; CHECK-NEXT: i32 0, label %entry.split
1168 ; CHECK-NEXT: i32 1, label %entry.split
1169 ; CHECK-NEXT: ]
1170 ;
1171 ; CHECK: entry.split:
1172 ; CHECK-NEXT: br label %header
1173
1174 header:
1175 %var_val = load i32, i32* %var
1176 switch i32 %cond, label %loopexit1 [
1177 i32 0, label %latch
1178 i32 1, label %latch
1179 i32 13, label %loopexit2
1180 ]
1181 ; CHECK: header:
1182 ; CHECK-NEXT: load
1183 ; CHECK-NEXT: br label %latch
1184
1185 latch:
1186 ; No-op PHI node to exercise weird PHI update scenarios.
1187 %phi = phi i32 [ %var_val, %header ], [ %var_val, %header ]
1188 call void @sink(i32 %phi)
1189 br label %header
1190 ; CHECK: latch:
1191 ; CHECK-NEXT: %[[PHI:.*]] = phi i32 [ %var_val, %header ]
1192 ; CHECK-NEXT: call void @sink(i32 %[[PHI]])
1193 ; CHECK-NEXT: br label %header
1194
1195 loopexit1:
1196 ret void
1197 ; CHECK: loopexit1:
1198 ; CHECK-NEXT: ret
1199
1200 loopexit2:
1201 ret void
1202 ; CHECK: loopexit2:
1203 ; CHECK-NEXT: ret
1204 }
1205
1206 define void @test_unswitch_to_default_common_succ_with_phis(i32* %var, i32 %cond) {
1207 ; CHECK-LABEL: @test_unswitch_to_default_common_succ_with_phis(
1208 entry:
1209 br label %header
1210 ; CHECK-NEXT: entry:
1211 ; CHECK-NEXT: switch i32 %cond, label %entry.split [
1212 ; CHECK-NEXT: i32 13, label %loopexit
1213 ; CHECK-NEXT: ]
1214 ;
1215 ; CHECK: entry.split:
1216 ; CHECK-NEXT: br label %header
1217
1218 header:
1219 %var_val = load i32, i32* %var
1220 switch i32 %cond, label %latch [
1221 i32 0, label %latch
1222 i32 1, label %latch
1223 i32 13, label %loopexit
1224 ]
1225 ; CHECK: header:
1226 ; CHECK-NEXT: load
1227 ; CHECK-NEXT: br label %latch
1228
1229 latch:
1230 ; No-op PHI node to exercise weird PHI update scenarios.
1231 %phi = phi i32 [ %var_val, %header ], [ %var_val, %header ], [ %var_val, %header ]
1232 call void @sink(i32 %phi)
1233 br label %header
1234 ; CHECK: latch:
1235 ; CHECK-NEXT: %[[PHI:.*]] = phi i32 [ %var_val, %header ]
1236 ; CHECK-NEXT: call void @sink(i32 %[[PHI]])
1237 ; CHECK-NEXT: br label %header
1238
1239 loopexit:
1240 ret void
1241 ; CHECK: loopexit:
1242 ; CHECK-NEXT: ret
1243 }
7575 ; backedge-taken counts.
7676 ; SCEV-LABEL: Determining loop execution counts for: @test2
7777 ; SCEV: Loop %inner_loop_begin: backedge-taken count is (-1 + (1 smax %m))
78 ; FIXME: The following backedge taken count should be known but isn't apparently
79 ; just because of a switch in the outer loop.
80 ; SCEV: Loop %outer_loop_begin: Unpredictable backedge-taken count.
78 ; SCEV: Loop %outer_loop_begin: backedge-taken count is (-1 + (1 smax %n))
8179 ;
8280 ; CHECK-LABEL: define void @test2(
8381 entry: