llvm.org GIT mirror llvm / 61c6f9e
Merging r243519:
------------------------------------------------------------------------
r243519 | wschmidt | 2015-07-29 07:31:57 -0700 (Wed, 29 Jul 2015) | 14 lines

[PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask

Given certain shuffle-vector masks, LLVM emits splat instructions which
splat the wrong bytes from the source register.  The issue is that the
function PPC::isSplatShuffleMask() in PPCISelLowering.cpp does not
ensure that the splat pattern found is requesting bytes that are
aligned on an EltSize boundary.  This patch detects this situation as
not a valid splat mask, resulting in a permute being generated instead
of a splat.

Patch and test case by Tyler Kenney, cleaned up a bit by me.

This is a simple bug fix that would be good to incorporate into 3.7.
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243528 91177308-0d34-0410-b5e6-96231b3b80d8

Hans Wennborg 5 years ago
2 changed file(s) with 19 addition(s) and 0 deletion(s).
@@ -1429,6 +1429,11 @@
   assert(N->getValueType(0) == MVT::v16i8 &&
          (EltSize == 1 || EltSize == 2 || EltSize == 4));
 
+  // The consecutive indices need to specify an element, not part of two
+  // different elements.  So abandon ship early if this isn't the case.
+  if (N->getMaskElt(0) % EltSize != 0)
+    return false;
+
   // This is a splat operation if each element of the permute is the same, and
   // if the value doesn't reference the second vector.
   unsigned ElementBase = N->getMaskElt(0);
@@ -0,0 +1,14 @@
+; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s
+
+; Test case adapted from PR24216.
+
+define void @foo(<16 x i8>* nocapture readonly %in, <16 x i8>* nocapture %out) {
+entry:
+  %0 = load <16 x i8>, <16 x i8>* %in, align 16
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32>
+  store <16 x i8> %1, <16 x i8>* %out, align 16
+  ret void
+}
+
+; CHECK: vperm
+; CHECK-NOT: vspltw
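
The alignment check added to PPC::isSplatShuffleMask() can be illustrated outside of LLVM with a small standalone model. The sketch below is not the LLVM implementation: `ByteMask`, `makeRepeatedMask`, and `isSplatMaskSketch` are hypothetical names, the mask is a plain array of byte indices rather than a ShuffleVectorSDNode, and undef mask entries are simply treated as a mismatch. The example masks are illustrative and are not taken from the PR24216 test case.

```cpp
#include <array>

// Hypothetical stand-in for a v16i8 shuffle mask: 16 byte indices, where
// 0-15 select from the first vector and 16-31 from the second.
using ByteMask = std::array<int, 16>;

// Build a mask that repeats the EltSize consecutive bytes starting at First,
// e.g. First=1, EltSize=4 gives 1,2,3,4,1,2,3,4,...
ByteMask makeRepeatedMask(int First, unsigned EltSize) {
  ByteMask Mask{};
  for (unsigned i = 0; i != Mask.size(); ++i)
    Mask[i] = First + static_cast<int>(i % EltSize);
  return Mask;
}

// Simplified model of the splat test with the alignment check included.
bool isSplatMaskSketch(const ByteMask &Mask, unsigned EltSize) {
  if (EltSize != 1 && EltSize != 2 && EltSize != 4)
    return false;

  const int Base = Mask[0];

  // Reject undef (negative) and references to the second vector.
  if (Base < 0 || Base >= 16)
    return false;

  // The check modelled on the patch: the first byte must sit on an EltSize
  // boundary, otherwise the "element" straddles two real elements.
  if (static_cast<unsigned>(Base) % EltSize != 0)
    return false;

  // Every group of EltSize bytes must repeat the same consecutive bytes.
  for (unsigned i = 0; i != Mask.size(); ++i)
    if (Mask[i] != Base + static_cast<int>(i % EltSize))
      return false;

  return true;
}
```

With EltSize = 4, a mask that repeats bytes 4,5,6,7 is accepted because it names whole word 1, while a mask that repeats bytes 1,2,3,4 is rejected because it spans words 0 and 1. That second shape is the kind of mask that would need a permute rather than a word splat.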