llvm.org GIT mirror llvm / 8f1f7ce
Author: Amara Emerson

Introduce experimental generic intrinsics for horizontal vector reductions.

- This change allows targets to opt in to using them instead of the log2
  shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
  factored out into LoopUtils, and now have a unified interface for generating
  reductions regardless of the preference of the target. LoopUtils now uses TTI
  to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.

Differential Revision: https://reviews.llvm.org/D30086

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302514 91177308-0d34-0410-b5e6-96231b3b80d8

19 changed file(s) with 960 addition(s) and 66 deletion(s).
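For orientation, a minimal sketch of how a client would use the unified interface this patch introduces (the wrapper function below is hypothetical; `createSimpleTargetReduction` and the TTI hook are the ones added in this commit):

    #include "llvm/Analysis/TargetTransformInfo.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/Transforms/Utils/LoopUtils.h"

    using namespace llvm;

    // Emit a horizontal integer add reduction of the vector value Src.
    // Depending on TTI->useReductionIntrinsic(...), this produces either a
    // single @llvm.experimental.vector.reduce.add.* call or the classic
    // log2(VF) shufflevector sequence.
    static Value *emitAddReduction(IRBuilder<> &Builder,
                                   const TargetTransformInfo *TTI, Value *Src) {
      return createSimpleTargetReduction(Builder, TTI, Instruction::Add, Src);
    }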
1168611686
1168711687 %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c
1168811688
11689
11690 Experimental Vector Reduction Intrinsics
11691 ----------------------------------------
11692
11693 Horizontal reductions of vectors can be expressed using the following
11694 intrinsics. Each one takes a vector operand as an input and applies its
11695 respective operation across all elements of the vector, returning a single
11696 scalar result of the same element type.
11697
11698
11699 '``llvm.experimental.vector.reduce.add.*``' Intrinsic
11700 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11701
11702 Syntax:
11703 """""""
11704
11705 ::
11706
11707 declare i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> %a)
11708 declare i64 @llvm.experimental.vector.reduce.add.i64.v2i64(<2 x i64> %a)
11709
11710 Overview:
11711 """""""""
11712
11713 The '``llvm.experimental.vector.reduce.add.*``' intrinsics do an integer ``ADD``
11714 reduction of a vector, returning the result as a scalar. The return type matches
11715 the element-type of the vector input.
11716
11717 Arguments:
11718 """"""""""
11719 The argument to this intrinsic must be a vector of integer values.
11720
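For example (a minimal illustration; ``%input`` is a placeholder ``<4 x i32>`` value):

.. code-block:: llvm

      %sum = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> %input)
      ; %sum is the integer sum of the four elements of %input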
11721 '``llvm.experimental.vector.reduce.fadd.*``' Intrinsic
11722 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11723
11724 Syntax:
11725 """""""
11726
11727 ::
11728
11729 declare float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float %acc, <4 x float> %a)
11730 declare double @llvm.experimental.vector.reduce.fadd.f64.v2f64(double %acc, <2 x double> %a)
11731
11732 Overview:
11733 """""""""
11734
11735 The '``llvm.experimental.vector.reduce.fadd.*``' intrinsics do a floating point
11736 ``ADD`` reduction of a vector, returning the result as a scalar. The return type
11737 matches the element-type of the vector input.
11738
11739 If the intrinsic call has fast-math flags, then the reduction may be freely
11740 reassociated and need not match the result of an equivalent scalarized loop.
11741 If it does not have fast-math flags, then the reduction is *ordered*: elements
11742 are accumulated strictly in sequence, respecting the evaluation order of a scalarized reduction.
11743
11744
11745 Arguments:
11746 """"""""""
11747 The first argument to this intrinsic is a scalar accumulator value, which is
11748 only used when there are no fast-math flags attached. This argument may be undef
11749 when fast-math flags are used.
11750
11751 The second argument must be a vector of floating point values.
11752
11753 Examples:
11754 """""""""
11755
11756 .. code-block:: llvm
11757
11758 %fast = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %input) ; fast reduction
11759 %ord = call float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float %acc, <4 x float> %input) ; ordered reduction
11760
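The ordered form ``%ord`` above is equivalent to the following sequential scalar expansion (illustrative; ``%e0``..``%e3`` stand for the extracted elements of ``%input``):

.. code-block:: llvm

      %e0 = extractelement <4 x float> %input, i32 0
      %t0 = fadd float %acc, %e0
      %e1 = extractelement <4 x float> %input, i32 1
      %t1 = fadd float %t0, %e1
      %e2 = extractelement <4 x float> %input, i32 2
      %t2 = fadd float %t1, %e2
      %e3 = extractelement <4 x float> %input, i32 3
      %ord = fadd float %t2, %e3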
11761
11762 '``llvm.experimental.vector.reduce.mul.*``' Intrinsic
11763 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11764
11765 Syntax:
11766 """""""
11767
11768 ::
11769
11770 declare i32 @llvm.experimental.vector.reduce.mul.i32.v4i32(<4 x i32> %a)
11771 declare i64 @llvm.experimental.vector.reduce.mul.i64.v2i64(<2 x i64> %a)
11772
11773 Overview:
11774 """""""""
11775
11776 The '``llvm.experimental.vector.reduce.mul.*``' intrinsics do an integer ``MUL``
11777 reduction of a vector, returning the result as a scalar. The return type matches
11778 the element-type of the vector input.
11779
11780 Arguments:
11781 """"""""""
11782 The argument to this intrinsic must be a vector of integer values.
11783
11784 '``llvm.experimental.vector.reduce.fmul.*``' Intrinsic
11785 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11786
11787 Syntax:
11788 """""""
11789
11790 ::
11791
11792 declare float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float %acc, <4 x float> %a)
11793 declare double @llvm.experimental.vector.reduce.fmul.f64.v2f64(double %acc, <2 x double> %a)
11794
11795 Overview:
11796 """""""""
11797
11798 The '``llvm.experimental.vector.reduce.fmul.*``' intrinsics do a floating point
11799 ``MUL`` reduction of a vector, returning the result as a scalar. The return type
11800 matches the element-type of the vector input.
11801
11802 If the intrinsic call has fast-math flags, then the reduction may be freely
11803 reassociated and need not match the result of an equivalent scalarized loop.
11804 If it does not have fast-math flags, then the reduction is *ordered*: elements
11805 are accumulated strictly in sequence, respecting the evaluation order of a scalarized reduction.
11806
11807
11808 Arguments:
11809 """"""""""
11810 The first argument to this intrinsic is a scalar accumulator value, which is
11811 only used when there are no fast-math flags attached. This argument may be undef
11812 when fast-math flags are used.
11813
11814 The second argument must be a vector of floating point values.
11815
11816 Examples:
11817 """""""""
11818
11819 .. code-block:: llvm
11820
11821 %fast = call fast float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float undef, <4 x float> %input) ; fast reduction
11822 %ord = call float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float %acc, <4 x float> %input) ; ordered reduction
11823
11824 '``llvm.experimental.vector.reduce.and.*``' Intrinsic
11825 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11826
11827 Syntax:
11828 """""""
11829
11830 ::
11831
11832 declare i32 @llvm.experimental.vector.reduce.and.i32.v4i32(<4 x i32> %a)
11833
11834 Overview:
11835 """""""""
11836
11837 The '``llvm.experimental.vector.reduce.and.*``' intrinsics do a bitwise ``AND``
11838 reduction of a vector, returning the result as a scalar. The return type matches
11839 the element-type of the vector input.
11840
11841 Arguments:
11842 """"""""""
11843 The argument to this intrinsic must be a vector of integer values.
11844
11845 '``llvm.experimental.vector.reduce.or.*``' Intrinsic
11846 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11847
11848 Syntax:
11849 """""""
11850
11851 ::
11852
11853 declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)
11854
11855 Overview:
11856 """""""""
11857
11858 The '``llvm.experimental.vector.reduce.or.*``' intrinsics do a bitwise ``OR`` reduction
11859 of a vector, returning the result as a scalar. The return type matches the
11860 element-type of the vector input.
11861
11862 Arguments:
11863 """"""""""
11864 The argument to this intrinsic must be a vector of integer values.
11865
11866 '``llvm.experimental.vector.reduce.xor.*``' Intrinsic
11867 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11868
11869 Syntax:
11870 """""""
11871
11872 ::
11873
11874 declare i32 @llvm.experimental.vector.reduce.xor.i32.v4i32(<4 x i32> %a)
11875
11876 Overview:
11877 """""""""
11878
11879 The '``llvm.experimental.vector.reduce.xor.*``' intrinsics do a bitwise ``XOR``
11880 reduction of a vector, returning the result as a scalar. The return type matches
11881 the element-type of the vector input.
11882
11883 Arguments:
11884 """"""""""
11885 The argument to this intrinsic must be a vector of integer values.
11886
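For example (illustrative; ``%input`` is a placeholder):

.. code-block:: llvm

      %x = call i32 @llvm.experimental.vector.reduce.xor.i32.v4i32(<4 x i32> %input)
      ; %x is the bitwise XOR of all four elements of %input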
11887 '``llvm.experimental.vector.reduce.smax.*``' Intrinsic
11888 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11889
11890 Syntax:
11891 """""""
11892
11893 ::
11894
11895 declare i32 @llvm.experimental.vector.reduce.smax.i32.v4i32(<4 x i32> %a)
11896
11897 Overview:
11898 """""""""
11899
11900 The '``llvm.experimental.vector.reduce.smax.*``' intrinsics do a signed integer
11901 ``MAX`` reduction of a vector, returning the result as a scalar. The return type
11902 matches the element-type of the vector input.
11903
11904 Arguments:
11905 """"""""""
11906 The argument to this intrinsic must be a vector of integer values.
11907
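For example, the signed and unsigned variants differ on negative elements (illustrative constants):

.. code-block:: llvm

      %smax = call i32 @llvm.experimental.vector.reduce.smax.i32.v4i32(<4 x i32> <i32 -4, i32 2, i32 -1, i32 7>)
      ; yields i32 7; a umax reduction of the same vector would yield -1 (0xFFFFFFFF)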
11908 '``llvm.experimental.vector.reduce.smin.*``' Intrinsic
11909 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11910
11911 Syntax:
11912 """""""
11913
11914 ::
11915
11916 declare i32 @llvm.experimental.vector.reduce.smin.i32.v4i32(<4 x i32> %a)
11917
11918 Overview:
11919 """""""""
11920
11921 The '``llvm.experimental.vector.reduce.smin.*``' intrinsics do a signed integer
11922 ``MIN`` reduction of a vector, returning the result as a scalar. The return type
11923 matches the element-type of the vector input.
11924
11925 Arguments:
11926 """"""""""
11927 The argument to this intrinsic must be a vector of integer values.
11928
11929 '``llvm.experimental.vector.reduce.umax.*``' Intrinsic
11930 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11931
11932 Syntax:
11933 """""""
11934
11935 ::
11936
11937 declare i32 @llvm.experimental.vector.reduce.umax.i32.v4i32(<4 x i32> %a)
11938
11939 Overview:
11940 """""""""
11941
11942 The '``llvm.experimental.vector.reduce.umax.*``' intrinsics do an unsigned
11943 integer ``MAX`` reduction of a vector, returning the result as a scalar. The
11944 return type matches the element-type of the vector input.
11945
11946 Arguments:
11947 """"""""""
11948 The argument to this intrinsic must be a vector of integer values.
11949
11950 '``llvm.experimental.vector.reduce.umin.*``' Intrinsic
11951 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11952
11953 Syntax:
11954 """""""
11955
11956 ::
11957
11958 declare i32 @llvm.experimental.vector.reduce.umin.i32.v4i32(<4 x i32> %a)
11959
11960 Overview:
11961 """""""""
11962
11963 The '``llvm.experimental.vector.reduce.umin.*``' intrinsics do an unsigned
11964 integer ``MIN`` reduction of a vector, returning the result as a scalar. The
11965 return type matches the element-type of the vector input.
11966
11967 Arguments:
11968 """"""""""
11969 The argument to this intrinsic must be a vector of integer values.
11970
11971 '``llvm.experimental.vector.reduce.fmax.*``' Intrinsic
11972 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11973
11974 Syntax:
11975 """""""
11976
11977 ::
11978
11979 declare float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float> %a)
11980 declare double @llvm.experimental.vector.reduce.fmax.f64.v2f64(<2 x double> %a)
11981
11982 Overview:
11983 """""""""
11984
11985 The '``llvm.experimental.vector.reduce.fmax.*``' intrinsics do a floating point
11986 ``MAX`` reduction of a vector, returning the result as a scalar. The return type
11987 matches the element-type of the vector input.
11988
11989 If the intrinsic call has the ``nnan`` fast-math flag, then the operation may
11990 assume that NaNs are not present in the input vector.
11991
11992 Arguments:
11993 """"""""""
11994 The argument to this intrinsic must be a vector of floating point values.
11995
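For example (illustrative; ``%input`` is a placeholder):

.. code-block:: llvm

      %max      = call float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float> %input)      ; NaN-aware reduction
      %max.nnan = call nnan float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float> %input) ; assumes %input has no NaNs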
11996 '``llvm.experimental.vector.reduce.fmin.*``' Intrinsic
11997 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11998
11999 Syntax:
12000 """""""
12001
12002 ::
12003
12004 declare float @llvm.experimental.vector.reduce.fmin.f32.v4f32(<4 x float> %a)
12005 declare double @llvm.experimental.vector.reduce.fmin.f64.v2f64(<2 x double> %a)
12006
12007 Overview:
12008 """""""""
12009
12010 The '``llvm.experimental.vector.reduce.fmin.*``' intrinsics do a floating point
12011 ``MIN`` reduction of a vector, returning the result as a scalar. The return type
12012 matches the element-type of the vector input.
12013
12014 If the intrinsic call has the ``nnan`` fast-math flag, then the operation may
12015 assume that NaNs are not present in the input vector.
12016
12017 Arguments:
12018 """"""""""
12019 The argument to this intrinsic must be a vector of floating point values.
12020
1168912021 Half Precision Floating Point Intrinsics
1169012022 ----------------------------------------
1169112023
738738 unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
739739 unsigned ChainSizeInBytes,
740740 VectorType *VecTy) const;
741
742 /// Flags describing the kind of vector reduction.
743 struct ReductionFlags {
744 ReductionFlags() : IsMaxOp(false), IsSigned(false), NoNaN(false) {}
745 bool IsMaxOp; ///< If the op is a min/max kind, true if it's a max operation.
746 bool IsSigned; ///< Whether the operation is a signed int reduction.
747 bool NoNaN; ///< If op is an fp min/max, whether NaNs may be present.
748 };
749
750 /// \returns True if the target wants to handle the given reduction idiom in
751 /// the intrinsics form instead of the shuffle form.
752 bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
753 ReductionFlags Flags) const;
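A target opts in by overriding this hook in its TTI implementation. A minimal sketch for a hypothetical target (the class name and heuristic are illustrative only, not part of this patch):

    // Prefer the intrinsic form for integer add and integer min/max
    // reductions; keep the log2 shufflevector expansion for everything else.
    bool MyTargetTTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty,
                                                TTI::ReductionFlags Flags) const {
      if (!Ty->isVectorTy())
        return false;
      switch (Opcode) {
      case Instruction::Add:
      case Instruction::ICmp: // integer min/max; Flags gives signedness/max-ness
        return true;
      default:
        return false;
      }
    }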
741754
742755 /// @}
743756
894907 virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
895908 unsigned ChainSizeInBytes,
896909 VectorType *VecTy) const = 0;
910 virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
911 ReductionFlags) const = 0;
897912 };
898913
899914 template <typename T>
11981213 unsigned ChainSizeInBytes,
11991214 VectorType *VecTy) const override {
12001215 return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
1216 }
1217 bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
1218 ReductionFlags Flags) const override {
1219 return Impl.useReductionIntrinsic(Opcode, Ty, Flags);
12011220 }
12021221 };
12031222
455455 VectorType *VecTy) const {
456456 return VF;
457457 }
458
459 bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
460 TTI::ReductionFlags Flags) const {
461 return false;
462 }
463
458464 protected:
459465 // Obtain the minimum required size to hold the value (without the sign)
460466 // In case of a vector it returns the min required size for one element.
781781 /// for some others (e.g. PowerPC, PowerPC64) that would be compile-time
782782 /// known nonzero constant. The only operand here is the chain.
783783 GET_DYNAMIC_AREA_OFFSET,
784
785 /// Generic reduction nodes. These nodes represent horizontal vector
786 /// reduction operations, producing a scalar result.
787 /// The STRICT variants perform reductions in sequential order. The first
788 /// operand is an initial scalar accumulator value, and the second operand
789 /// is the vector to reduce.
790 VECREDUCE_STRICT_FADD, VECREDUCE_STRICT_FMUL,
791 /// These reductions are non-strict, and have a single vector operand.
792 VECREDUCE_FADD, VECREDUCE_FMUL,
793 VECREDUCE_ADD, VECREDUCE_MUL,
794 VECREDUCE_AND, VECREDUCE_OR, VECREDUCE_XOR,
795 VECREDUCE_SMAX, VECREDUCE_SMIN, VECREDUCE_UMAX, VECREDUCE_UMIN,
796 /// FMIN/FMAX nodes can carry flags, e.g. NoNaNs, to select NaN/NoNaN variants.
797 VECREDUCE_FMAX, VECREDUCE_FMIN,
784798
785799 /// BUILTIN_OP_END - This must be the last enum value in this list.
786800 /// The target-specific pre-isel opcode values start here.
453453 MDNode *ScopeTag = nullptr,
454454 MDNode *NoAliasTag = nullptr);
455455
456 /// \brief Create a vector fadd reduction intrinsic of the source vector.
457 /// The first parameter is a scalar accumulator value for ordered reductions.
458 CallInst *CreateFAddReduce(Value *Acc, Value *Src);
459
460 /// \brief Create a vector fmul reduction intrinsic of the source vector.
461 /// The first parameter is a scalar accumulator value for ordered reductions.
462 CallInst *CreateFMulReduce(Value *Acc, Value *Src);
463
464 /// \brief Create a vector int add reduction intrinsic of the source vector.
465 CallInst *CreateAddReduce(Value *Src);
466
467 /// \brief Create a vector int mul reduction intrinsic of the source vector.
468 CallInst *CreateMulReduce(Value *Src);
469
470 /// \brief Create a vector int AND reduction intrinsic of the source vector.
471 CallInst *CreateAndReduce(Value *Src);
472
473 /// \brief Create a vector int OR reduction intrinsic of the source vector.
474 CallInst *CreateOrReduce(Value *Src);
475
476 /// \brief Create a vector int XOR reduction intrinsic of the source vector.
477 CallInst *CreateXorReduce(Value *Src);
478
479 /// \brief Create a vector integer max reduction intrinsic of the source
480 /// vector.
481 CallInst *CreateIntMaxReduce(Value *Src, bool IsSigned = false);
482
483 /// \brief Create a vector integer min reduction intrinsic of the source
484 /// vector.
485 CallInst *CreateIntMinReduce(Value *Src, bool IsSigned = false);
486
487 /// \brief Create a vector float max reduction intrinsic of the source
488 /// vector.
489 CallInst *CreateFPMaxReduce(Value *Src, bool NoNaN = false);
490
491 /// \brief Create a vector float min reduction intrinsic of the source
492 /// vector.
493 CallInst *CreateFPMinReduce(Value *Src, bool NoNaN = false);
494
456495 /// \brief Create a lifetime.start intrinsic.
457496 ///
458497 /// If the pointer isn't i8* it will be converted.
811811 [IntrArgMemOnly, NoCapture<0>, NoCapture<1>,
812812 WriteOnly<0>, ReadOnly<1>]>;
813813
814 //===------------------------ Reduction Intrinsics ------------------------===//
815 //
816 def int_experimental_vector_reduce_fadd : Intrinsic<[llvm_anyfloat_ty],
817 [llvm_anyfloat_ty,
818 llvm_anyvector_ty],
819 [IntrNoMem]>;
820 def int_experimental_vector_reduce_fmul : Intrinsic<[llvm_anyfloat_ty],
821 [llvm_anyfloat_ty,
822 llvm_anyvector_ty],
823 [IntrNoMem]>;
824 def int_experimental_vector_reduce_add : Intrinsic<[llvm_anyint_ty],
825 [llvm_anyvector_ty],
826 [IntrNoMem]>;
827 def int_experimental_vector_reduce_mul : Intrinsic<[llvm_anyint_ty],
828 [llvm_anyvector_ty],
829 [IntrNoMem]>;
830 def int_experimental_vector_reduce_and : Intrinsic<[llvm_anyint_ty],
831 [llvm_anyvector_ty],
832 [IntrNoMem]>;
833 def int_experimental_vector_reduce_or : Intrinsic<[llvm_anyint_ty],
834 [llvm_anyvector_ty],
835 [IntrNoMem]>;
836 def int_experimental_vector_reduce_xor : Intrinsic<[llvm_anyint_ty],
837 [llvm_anyvector_ty],
838 [IntrNoMem]>;
839 def int_experimental_vector_reduce_smax : Intrinsic<[llvm_anyint_ty],
840 [llvm_anyvector_ty],
841 [IntrNoMem]>;
842 def int_experimental_vector_reduce_smin : Intrinsic<[llvm_anyint_ty],
843 [llvm_anyvector_ty],
844 [IntrNoMem]>;
845 def int_experimental_vector_reduce_umax : Intrinsic<[llvm_anyint_ty],
846 [llvm_anyvector_ty],
847 [IntrNoMem]>;
848 def int_experimental_vector_reduce_umin : Intrinsic<[llvm_anyint_ty],
849 [llvm_anyvector_ty],
850 [IntrNoMem]>;
851 def int_experimental_vector_reduce_fmax : Intrinsic<[llvm_anyfloat_ty],
852 [llvm_anyvector_ty],
853 [IntrNoMem]>;
854 def int_experimental_vector_reduce_fmin : Intrinsic<[llvm_anyfloat_ty],
855 [llvm_anyvector_ty],
856 [IntrNoMem]>;
857
814858 //===----- Intrinsics that are used to provide predicate information -----===//
815859
816860 def int_ssa_copy : Intrinsic<[llvm_any_ty], [LLVMMatchType<0>],
2020 #include "llvm/ADT/StringRef.h"
2121 #include "llvm/Analysis/AliasAnalysis.h"
2222 #include "llvm/Analysis/EHPersonalities.h"
23 #include "llvm/Analysis/TargetTransformInfo.h"
2324 #include "llvm/IR/Dominators.h"
2425 #include "llvm/IR/IRBuilder.h"
2526 #include "llvm/IR/InstrTypes.h"
4142 class ScalarEvolution;
4243 class SCEV;
4344 class TargetLibraryInfo;
45 class TargetTransformInfo;
4446
4547 /// \brief Captures loop safety information.
4648 /// It keeps information on whether the loop and its header may throw an exception.
488490 LoopSafetyInfo *SafetyInfo,
489491 OptimizationRemarkEmitter *ORE = nullptr);
490492
493 /// Create a target reduction of the given vector. The reduction operation
494 /// is described by the \p Opcode parameter. min/max reductions require
495 /// additional information supplied in \p Flags.
496 /// The target is queried to determine if intrinsics or shuffle sequences are
497 /// required to implement the reduction.
498 Value *
499 createSimpleTargetReduction(IRBuilder<> &B, const TargetTransformInfo *TTI,
500 unsigned Opcode, Value *Src,
501 TargetTransformInfo::ReductionFlags Flags =
502 TargetTransformInfo::ReductionFlags(),
503 ArrayRef<Value *> RedOps = ArrayRef<Value *>());
504
505 /// Create a generic target reduction using a recurrence descriptor \p Desc.
506 /// The target is queried to determine if intrinsics or shuffle sequences are
507 /// required to implement the reduction.
508 Value *createTargetReduction(IRBuilder<> &B, const TargetTransformInfo *TTI,
509 RecurrenceDescriptor &Desc, Value *Src,
510 bool NoNaN = false);
511
512 /// Get the intersection (logical and) of all of the potential IR flags
513 /// of each scalar operation (VL) that will be converted into a vector (I).
514 /// Flag set: NSW, NUW, exact, and all of fast-math.
515 void propagateIRFlags(Value *I, ArrayRef<Value *> VL);
516
491517 } // end namespace llvm
492518
493519 #endif // LLVM_TRANSFORMS_UTILS_LOOPUTILS_H
499499 return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
500500 }
501501
502 bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode,
503 Type *Ty, ReductionFlags Flags) const {
504 return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);
505 }
506
507
502508 TargetTransformInfo::Concept::~Concept() {}
503509
504510 TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}
2222 #include "llvm/IR/PatternMatch.h"
2323 #include "llvm/IR/Value.h"
2424 #include "llvm/IR/Constants.h"
25 #include "llvm/IR/IRBuilder.h"
2526
2627 using namespace llvm;
2728 using namespace llvm::PatternMatch;
674674 // Vector Operand Splitting: <128 x ty> -> 2 x <64 x ty>.
675675 bool SplitVectorOperand(SDNode *N, unsigned OpNo);
676676 SDValue SplitVecOp_VSELECT(SDNode *N, unsigned OpNo);
677 SDValue SplitVecOp_VECREDUCE(SDNode *N, unsigned OpNo);
677678 SDValue SplitVecOp_UnaryOp(SDNode *N);
678679 SDValue SplitVecOp_TruncateHelper(SDNode *N);
679680
15121512 case ISD::ZERO_EXTEND_VECTOR_INREG:
15131513 Res = SplitVecOp_ExtVecInRegOp(N);
15141514 break;
1515
1516 case ISD::VECREDUCE_FADD:
1517 case ISD::VECREDUCE_FMUL:
1518 case ISD::VECREDUCE_ADD:
1519 case ISD::VECREDUCE_MUL:
1520 case ISD::VECREDUCE_AND:
1521 case ISD::VECREDUCE_OR:
1522 case ISD::VECREDUCE_XOR:
1523 case ISD::VECREDUCE_SMAX:
1524 case ISD::VECREDUCE_SMIN:
1525 case ISD::VECREDUCE_UMAX:
1526 case ISD::VECREDUCE_UMIN:
1527 case ISD::VECREDUCE_FMAX:
1528 case ISD::VECREDUCE_FMIN:
1529 Res = SplitVecOp_VECREDUCE(N, OpNo);
1530 break;
15151531 }
15161532 }
15171533
15621578 DAG.getNode(ISD::VSELECT, DL, HiOpVT, HiMask, HiOp0, HiOp1);
15631579
15641580 return DAG.getNode(ISD::CONCAT_VECTORS, DL, Src0VT, LoSelect, HiSelect);
1581 }
1582
1583 SDValue DAGTypeLegalizer::SplitVecOp_VECREDUCE(SDNode *N, unsigned OpNo) {
1584 EVT ResVT = N->getValueType(0);
1585 SDValue Lo, Hi;
1586 SDLoc dl(N);
1587
1588 SDValue VecOp = N->getOperand(OpNo);
1589 EVT VecVT = VecOp.getValueType();
1590 assert(VecVT.isVector() && "Can only split reduce vector operand");
1591 GetSplitVector(VecOp, Lo, Hi);
1592 EVT LoOpVT, HiOpVT;
1593 std::tie(LoOpVT, HiOpVT) = DAG.GetSplitDestVTs(VecVT);
1594
1595 bool NoNaN = N->getFlags().hasNoNaNs();
1596 unsigned CombineOpc = 0;
1597 switch (N->getOpcode()) {
1598 case ISD::VECREDUCE_FADD: CombineOpc = ISD::FADD; break;
1599 case ISD::VECREDUCE_FMUL: CombineOpc = ISD::FMUL; break;
1600 case ISD::VECREDUCE_ADD: CombineOpc = ISD::ADD; break;
1601 case ISD::VECREDUCE_MUL: CombineOpc = ISD::MUL; break;
1602 case ISD::VECREDUCE_AND: CombineOpc = ISD::AND; break;
1603 case ISD::VECREDUCE_OR: CombineOpc = ISD::OR; break;
1604 case ISD::VECREDUCE_XOR: CombineOpc = ISD::XOR; break;
1605 case ISD::VECREDUCE_SMAX: CombineOpc = ISD::SMAX; break;
1606 case ISD::VECREDUCE_SMIN: CombineOpc = ISD::SMIN; break;
1607 case ISD::VECREDUCE_UMAX: CombineOpc = ISD::UMAX; break;
1608 case ISD::VECREDUCE_UMIN: CombineOpc = ISD::UMIN; break;
1609 case ISD::VECREDUCE_FMAX:
1610 CombineOpc = NoNaN ? ISD::FMAXNUM : ISD::FMAXNAN;
1611 break;
1612 case ISD::VECREDUCE_FMIN:
1613 CombineOpc = NoNaN ? ISD::FMINNUM : ISD::FMINNAN;
1614 break;
1615 default:
1616 llvm_unreachable("Unexpected reduce ISD node");
1617 }
1618
1619 // Combine the split halves with the appropriate vector operation, then
1620 // reduce the resulting half-width vector with a reduction of the same kind.
1621 SDValue Partial = DAG.getNode(CombineOpc, dl, LoOpVT, Lo, Hi);
1622 return DAG.getNode(N->getOpcode(), dl, ResVT, Partial);
15651623 }
15661624
15671625 SDValue DAGTypeLegalizer::SplitVecOp_UnaryOp(SDNode *N) {
59695969 unsigned NumOps = Ops.size();
59705970 switch (NumOps) {
59715971 case 0: return getNode(Opcode, DL, VT);
5972 case 1: return getNode(Opcode, DL, VT, Ops[0]);
5972 case 1: return getNode(Opcode, DL, VT, Ops[0], Flags);
59735973 case 2: return getNode(Opcode, DL, VT, Ops[0], Ops[1], Flags);
59745974 case 3: return getNode(Opcode, DL, VT, Ops[0], Ops[1], Ops[2]);
59755975 default: break;
57365736 case Intrinsic::experimental_deoptimize:
57375737 LowerDeoptimizeCall(&I);
57385738 return nullptr;
5739
5740 case Intrinsic::experimental_vector_reduce_fadd:
5741 case Intrinsic::experimental_vector_reduce_fmul:
5742 case Intrinsic::experimental_vector_reduce_add:
5743 case Intrinsic::experimental_vector_reduce_mul:
5744 case Intrinsic::experimental_vector_reduce_and:
5745 case Intrinsic::experimental_vector_reduce_or:
5746 case Intrinsic::experimental_vector_reduce_xor:
5747 case Intrinsic::experimental_vector_reduce_smax:
5748 case Intrinsic::experimental_vector_reduce_smin:
5749 case Intrinsic::experimental_vector_reduce_umax:
5750 case Intrinsic::experimental_vector_reduce_umin:
5751 case Intrinsic::experimental_vector_reduce_fmax:
5752 case Intrinsic::experimental_vector_reduce_fmin: {
5753 visitVectorReduce(I, Intrinsic);
5754 return nullptr;
5755 }
5756
57395757 }
57405758 }
57415759
76157633 FuncInfo.MF->getFrameInfo().setHasPatchPoint();
76167634 }
76177635
7636 void SelectionDAGBuilder::visitVectorReduce(const CallInst &I,
7637 unsigned Intrinsic) {
7638 const TargetLowering &TLI = DAG.getTargetLoweringInfo();
7639 SDValue Op1 = getValue(I.getArgOperand(0));
7640 SDValue Op2;
7641 if (I.getNumArgOperands() > 1)
7642 Op2 = getValue(I.getArgOperand(1));
7643 SDLoc dl = getCurSDLoc();
7644 EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
7645 SDValue Res;
7646 FastMathFlags FMF;
7647 if (isa<FPMathOperator>(I))
7648 FMF = I.getFastMathFlags();
7649 SDNodeFlags SDFlags;
7650 SDFlags.setNoNaNs(FMF.noNaNs());
7651
7652 switch (Intrinsic) {
7653 case Intrinsic::experimental_vector_reduce_fadd:
7654 if (FMF.unsafeAlgebra())
7655 Res = DAG.getNode(ISD::VECREDUCE_FADD, dl, VT, Op2);
7656 else
7657 Res = DAG.getNode(ISD::VECREDUCE_STRICT_FADD, dl, VT, Op1, Op2);
7658 break;
7659 case Intrinsic::experimental_vector_reduce_fmul:
7660 if (FMF.unsafeAlgebra())
7661 Res = DAG.getNode(ISD::VECREDUCE_FMUL, dl, VT, Op2);
7662 else
7663 Res = DAG.getNode(ISD::VECREDUCE_STRICT_FMUL, dl, VT, Op1, Op2);
7664 break;
7665 case Intrinsic::experimental_vector_reduce_add:
7666 Res = DAG.getNode(ISD::VECREDUCE_ADD, dl, VT, Op1);
7667 break;
7668 case Intrinsic::experimental_vector_reduce_mul:
7669 Res = DAG.getNode(ISD::VECREDUCE_MUL, dl, VT, Op1);
7670 break;
7671 case Intrinsic::experimental_vector_reduce_and:
7672 Res = DAG.getNode(ISD::VECREDUCE_AND, dl, VT, Op1);
7673 break;
7674 case Intrinsic::experimental_vector_reduce_or:
7675 Res = DAG.getNode(ISD::VECREDUCE_OR, dl, VT, Op1);
7676 break;
7677 case Intrinsic::experimental_vector_reduce_xor:
7678 Res = DAG.getNode(ISD::VECREDUCE_XOR, dl, VT, Op1);
7679 break;
7680 case Intrinsic::experimental_vector_reduce_smax:
7681 Res = DAG.getNode(ISD::VECREDUCE_SMAX, dl, VT, Op1);
7682 break;
7683 case Intrinsic::experimental_vector_reduce_smin:
7684 Res = DAG.getNode(ISD::VECREDUCE_SMIN, dl, VT, Op1);
7685 break;
7686 case Intrinsic::experimental_vector_reduce_umax:
7687 Res = DAG.getNode(ISD::VECREDUCE_UMAX, dl, VT, Op1);
7688 break;
7689 case Intrinsic::experimental_vector_reduce_umin:
7690 Res = DAG.getNode(ISD::VECREDUCE_UMIN, dl, VT, Op1);
7691 break;
7692 case Intrinsic::experimental_vector_reduce_fmax: {
7693 Res = DAG.getNode(ISD::VECREDUCE_FMAX, dl, VT, Op1, SDFlags);
7694 break;
7695 }
7696 case Intrinsic::experimental_vector_reduce_fmin: {
7697 Res = DAG.getNode(ISD::VECREDUCE_FMIN, dl, VT, Op1, SDFlags);
7698 break;
7699 }
7700 default:
7701 llvm_unreachable("Unhandled vector reduce intrinsic");
7702 }
7703 setValue(&I, Res);
7704 }
7705
76187706 /// Returns an AttributeList representing the attributes applied to the return
76197707 /// value of the given call.
76207708 static AttributeList getReturnAttrs(TargetLowering::CallLoweringInfo &CLI) {
908908 void visitGCRelocate(const GCRelocateInst &I);
909909 void visitGCResult(const GCResultInst &I);
910910
911 void visitVectorReduce(const CallInst &I, unsigned Intrinsic);
912
911913 void visitUserOp1(const Instruction &I) {
912914 llvm_unreachable("UserOp1 should not exist at instruction selection time!");
913915 }
345345 case ISD::SETFALSE: return "setfalse";
346346 case ISD::SETFALSE2: return "setfalse2";
347347 }
348 case ISD::VECREDUCE_FADD: return "vecreduce_fadd";
349 case ISD::VECREDUCE_FMUL: return "vecreduce_fmul";
350 case ISD::VECREDUCE_ADD: return "vecreduce_add";
351 case ISD::VECREDUCE_MUL: return "vecreduce_mul";
352 case ISD::VECREDUCE_AND: return "vecreduce_and";
353 case ISD::VECREDUCE_OR: return "vecreduce_or";
354 case ISD::VECREDUCE_XOR: return "vecreduce_xor";
355 case ISD::VECREDUCE_SMAX: return "vecreduce_smax";
356 case ISD::VECREDUCE_SMIN: return "vecreduce_smin";
357 case ISD::VECREDUCE_UMAX: return "vecreduce_umax";
358 case ISD::VECREDUCE_UMIN: return "vecreduce_umin";
359 case ISD::VECREDUCE_FMAX: return "vecreduce_fmax";
360 case ISD::VECREDUCE_FMIN: return "vecreduce_fmin";
348361 }
349362 }
350363
158158 CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
159159
160160 return CI;
161 }
162
163 static CallInst *getReductionIntrinsic(IRBuilderBase *Builder, Intrinsic::ID ID,
164 Value *Src) {
165 Module *M = Builder->GetInsertBlock()->getParent()->getParent();
166 Value *Ops[] = {Src};
167 Type *Tys[] = { Src->getType()->getVectorElementType(), Src->getType() };
168 auto Decl = Intrinsic::getDeclaration(M, ID, Tys);
169 return createCallHelper(Decl, Ops, Builder);
170 }
171
172 CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src) {
173 Module *M = GetInsertBlock()->getParent()->getParent();
174 Value *Ops[] = {Acc, Src};
175 Type *Tys[] = {Src->getType()->getVectorElementType(), Acc->getType(),
176 Src->getType()};
177 auto Decl = Intrinsic::getDeclaration(
178 M, Intrinsic::experimental_vector_reduce_fadd, Tys);
179 return createCallHelper(Decl, Ops, this);
180 }
181
182 CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src) {
183 Module *M = GetInsertBlock()->getParent()->getParent();
184 Value *Ops[] = {Acc, Src};
185 Type *Tys[] = {Src->getType()->getVectorElementType(), Acc->getType(),
186 Src->getType()};
187 auto Decl = Intrinsic::getDeclaration(
188 M, Intrinsic::experimental_vector_reduce_fmul, Tys);
189 return createCallHelper(Decl, Ops, this);
190 }
191
192 CallInst *IRBuilderBase::CreateAddReduce(Value *Src) {
193 return getReductionIntrinsic(this, Intrinsic::experimental_vector_reduce_add,
194 Src);
195 }
196
197 CallInst *IRBuilderBase::CreateMulReduce(Value *Src) {
198 return getReductionIntrinsic(this, Intrinsic::experimental_vector_reduce_mul,
199 Src);
200 }
201
202 CallInst *IRBuilderBase::CreateAndReduce(Value *Src) {
203 return getReductionIntrinsic(this, Intrinsic::experimental_vector_reduce_and,
204 Src);
205 }
206
207 CallInst *IRBuilderBase::CreateOrReduce(Value *Src) {
208 return getReductionIntrinsic(this, Intrinsic::experimental_vector_reduce_or,
209 Src);
210 }
211
212 CallInst *IRBuilderBase::CreateXorReduce(Value *Src) {
213 return getReductionIntrinsic(this, Intrinsic::experimental_vector_reduce_xor,
214 Src);
215 }
216
217 CallInst *IRBuilderBase::CreateIntMaxReduce(Value *Src, bool IsSigned) {
218 auto ID = IsSigned ? Intrinsic::experimental_vector_reduce_smax
219 : Intrinsic::experimental_vector_reduce_umax;
220 return getReductionIntrinsic(this, ID, Src);
221 }
222
223 CallInst *IRBuilderBase::CreateIntMinReduce(Value *Src, bool IsSigned) {
224 auto ID = IsSigned ? Intrinsic::experimental_vector_reduce_smin
225 : Intrinsic::experimental_vector_reduce_umin;
226 return getReductionIntrinsic(this, ID, Src);
227 }
228
229 CallInst *IRBuilderBase::CreateFPMaxReduce(Value *Src, bool NoNaN) {
230 auto Rdx = getReductionIntrinsic(
231 this, Intrinsic::experimental_vector_reduce_fmax, Src);
232 if (NoNaN) {
233 FastMathFlags FMF;
234 FMF.setNoNaNs();
235 Rdx->setFastMathFlags(FMF);
236 }
237 return Rdx;
238 }
239
240 CallInst *IRBuilderBase::CreateFPMinReduce(Value *Src, bool NoNaN) {
241 auto Rdx = getReductionIntrinsic(
242 this, Intrinsic::experimental_vector_reduce_fmin, Src);
243 if (NoNaN) {
244 FastMathFlags FMF;
245 FMF.setNoNaNs();
246 Rdx->setFastMathFlags(FMF);
247 }
248 return Rdx;
161249 }
162250
163251 CallInst *IRBuilderBase::CreateLifetimeStart(Value *Ptr, ConstantInt *Size) {
1717 #include "llvm/Analysis/GlobalsModRef.h"
1818 #include "llvm/Analysis/LoopInfo.h"
1919 #include "llvm/Analysis/LoopPass.h"
20 #include "llvm/Analysis/TargetTransformInfo.h"
2021 #include "llvm/Analysis/ScalarEvolution.h"
2122 #include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
2223 #include "llvm/Analysis/ScalarEvolutionExpander.h"
11111112 else
11121113 return (FalseVal + (TrueVal / 2)) / TrueVal;
11131114 }
1115
1116 /// \brief Adds a 'fast' flag to floating point operations.
1117 static Value *addFastMathFlag(Value *V) {
1118 if (isa<FPMathOperator>(V)) {
1119 FastMathFlags Flags;
1120 Flags.setUnsafeAlgebra();
1121 cast<Instruction>(V)->setFastMathFlags(Flags);
1122 }
1123 return V;
1124 }
1125
1126 // Helper to generate a log2 shuffle reduction.
1127 static Value *
1128 getShuffleReduction(IRBuilder<> &Builder, Value *Src, unsigned Op,
1129 RecurrenceDescriptor::MinMaxRecurrenceKind MinMaxKind =
1130 RecurrenceDescriptor::MRK_Invalid,
1131 ArrayRef<Value *> RedOps = ArrayRef<Value *>()) {
1132 unsigned VF = Src->getType()->getVectorNumElements();
1133 // VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
1134 // and vector ops, reducing the set of values being computed by half each
1135 // round.
1136 assert(isPowerOf2_32(VF) &&
1137 "Reduction emission only supported for pow2 vectors!");
1138 Value *TmpVec = Src;
1139 SmallVector<Constant *, 32> ShuffleMask(VF, nullptr);
1140 for (unsigned i = VF; i != 1; i >>= 1) {
1141 // Move the upper half of the vector to the lower half.
1142 for (unsigned j = 0; j != i / 2; ++j)
1143 ShuffleMask[j] = Builder.getInt32(i / 2 + j);
1144
1145 // Fill the rest of the mask with undef.
1146 std::fill(&ShuffleMask[i / 2], ShuffleMask.end(),
1147 UndefValue::get(Builder.getInt32Ty()));
1148
1149 Value *Shuf = Builder.CreateShuffleVector(
1150 TmpVec, UndefValue::get(TmpVec->getType()),
1151 ConstantVector::get(ShuffleMask), "rdx.shuf");
1152
1153 if (Op != Instruction::ICmp && Op != Instruction::FCmp) {
1154 // Floating point operations had to be 'fast' to enable the reduction.
1155 TmpVec = addFastMathFlag(Builder.CreateBinOp((Instruction::BinaryOps)Op,
1156 TmpVec, Shuf, "bin.rdx"));
1157 } else {
1158 assert(MinMaxKind != RecurrenceDescriptor::MRK_Invalid &&
1159 "Invalid min/max");
1160 TmpVec = RecurrenceDescriptor::createMinMaxOp(Builder, MinMaxKind, TmpVec,
1161 Shuf);
1162 }
1163 if (!RedOps.empty())
1164 propagateIRFlags(TmpVec, RedOps);
1165 }
1166 // The result is in the first element of the vector.
1167 return Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));
1168 }
1169
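For reference, for a <4 x i32> add the expansion produced by this helper looks roughly like the following IR (value names follow the "rdx.shuf"/"bin.rdx" strings above; sketch only):

    %rdx.shuf = shufflevector <4 x i32> %src, <4 x i32> undef,
                              <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %bin.rdx = add <4 x i32> %src, %rdx.shuf
    %rdx.shuf1 = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef,
                               <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %bin.rdx2 = add <4 x i32> %bin.rdx, %rdx.shuf1
    %r = extractelement <4 x i32> %bin.rdx2, i32 0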
1170 /// Create a simple vector reduction specified by an opcode and some
1171 /// flags (if generating min/max reductions).
1172 Value *llvm::createSimpleTargetReduction(
1173 IRBuilder<> &Builder, const TargetTransformInfo *TTI, unsigned Opcode,
1174 Value *Src, TargetTransformInfo::ReductionFlags Flags,
1175 ArrayRef<Value *> RedOps) {
1176 assert(isa<VectorType>(Src->getType()) && "Type must be a vector");
1177
1178 Value *ScalarUdf = UndefValue::get(Src->getType()->getVectorElementType());
1179 std::function<Value *()> BuildFunc;
1180 using RD = RecurrenceDescriptor;
1181 RD::MinMaxRecurrenceKind MinMaxKind = RD::MRK_Invalid;
1182 // TODO: Support creating ordered reductions.
1183 FastMathFlags FMFUnsafe;
1184 FMFUnsafe.setUnsafeAlgebra();
1185
1186 switch (Opcode) {
1187 case Instruction::Add:
1188 BuildFunc = [&]() { return Builder.CreateAddReduce(Src); };
1189 break;
1190 case Instruction::Mul:
1191 BuildFunc = [&]() { return Builder.CreateMulReduce(Src); };
1192 break;
1193 case Instruction::And:
1194 BuildFunc = [&]() { return Builder.CreateAndReduce(Src); };
1195 break;
1196 case Instruction::Or:
1197 BuildFunc = [&]() { return Builder.CreateOrReduce(Src); };
1198 break;
1199 case Instruction::Xor:
1200 BuildFunc = [&]() { return Builder.CreateXorReduce(Src); };
1201 break;
1202 case Instruction::FAdd:
1203 BuildFunc = [&]() {
1204 auto Rdx = Builder.CreateFAddReduce(ScalarUdf, Src);
1205 cast(Rdx)->setFastMathFlags(FMFUnsafe);
1206 return Rdx;
1207 };
1208 break;
1209 case Instruction::FMul:
1210 BuildFunc = [&]() {
1211 auto Rdx = Builder.CreateFMulReduce(ScalarUdf, Src);
1212 cast(Rdx)->setFastMathFlags(FMFUnsafe);
1213 return Rdx;
1214 };
1215 break;
1216 case Instruction::ICmp:
1217 if (Flags.IsMaxOp) {
1218 MinMaxKind = Flags.IsSigned ? RD::MRK_SIntMax : RD::MRK_UIntMax;
1219 BuildFunc = [&]() {
1220 return Builder.CreateIntMaxReduce(Src, Flags.IsSigned);
1221 };
1222 } else {
1223 MinMaxKind = Flags.IsSigned ? RD::MRK_SIntMin : RD::MRK_UIntMin;
1224 BuildFunc = [&]() {
1225 return Builder.CreateIntMinReduce(Src, Flags.IsSigned);
1226 };
1227 }
1228 break;
1229 case Instruction::FCmp:
1230 if (Flags.IsMaxOp) {
1231 MinMaxKind = RD::MRK_FloatMax;
1232 BuildFunc = [&]() { return Builder.CreateFPMaxReduce(Src, Flags.NoNaN); };
1233 } else {
1234 MinMaxKind = RD::MRK_FloatMin;
1235 BuildFunc = [&]() { return Builder.CreateFPMinReduce(Src, Flags.NoNaN); };
1236 }
1237 break;
1238 default:
1239 llvm_unreachable("Unhandled opcode");
1240 break;
1241 }
1242 if (TTI->useReductionIntrinsic(Opcode, Src->getType(), Flags))
1243 return BuildFunc();
1244 return getShuffleReduction(Builder, Src, Opcode, MinMaxKind, RedOps);
1245 }
1246
1247 /// Create a vector reduction using a given recurrence descriptor.
1248 Value *llvm::createTargetReduction(IRBuilder<> &Builder,
1249 const TargetTransformInfo *TTI,
1250 RecurrenceDescriptor &Desc, Value *Src,
1251 bool NoNaN) {
1252 // TODO: Support in-order reductions based on the recurrence descriptor.
1253 RecurrenceDescriptor::RecurrenceKind RecKind = Desc.getRecurrenceKind();
1254 TargetTransformInfo::ReductionFlags Flags;
1255 Flags.NoNaN = NoNaN;
1256 auto getSimpleRdx = [&](unsigned Opc) {
1257 return createSimpleTargetReduction(Builder, TTI, Opc, Src, Flags);
1258 };
1259 switch (RecKind) {
1260 case RecurrenceDescriptor::RK_FloatAdd:
1261 return getSimpleRdx(Instruction::FAdd);
1262 case RecurrenceDescriptor::RK_FloatMult:
1263 return getSimpleRdx(Instruction::FMul);
1264 case RecurrenceDescriptor::RK_IntegerAdd:
1265 return getSimpleRdx(Instruction::Add);
1266 case RecurrenceDescriptor::RK_IntegerMult:
1267 return getSimpleRdx(Instruction::Mul);
1268 case RecurrenceDescriptor::RK_IntegerAnd:
1269 return getSimpleRdx(Instruction::And);
1270 case RecurrenceDescriptor::RK_IntegerOr:
1271 return getSimpleRdx(Instruction::Or);
1272 case RecurrenceDescriptor::RK_IntegerXor:
1273 return getSimpleRdx(Instruction::Xor);
1274 case RecurrenceDescriptor::RK_IntegerMinMax: {
1275 switch (Desc.getMinMaxRecurrenceKind()) {
1276 case RecurrenceDescriptor::MRK_SIntMax:
1277 Flags.IsSigned = true;
1278 Flags.IsMaxOp = true;
1279 break;
1280 case RecurrenceDescriptor::MRK_UIntMax:
1281 Flags.IsMaxOp = true;
1282 break;
1283 case RecurrenceDescriptor::MRK_SIntMin:
1284 Flags.IsSigned = true;
1285 break;
1286 case RecurrenceDescriptor::MRK_UIntMin:
1287 break;
1288 default:
1289 llvm_unreachable("Unhandled MRK");
1290 }
1291 return getSimpleRdx(Instruction::ICmp);
1292 }
1293 case RecurrenceDescriptor::RK_FloatMinMax: {
1294 Flags.IsMaxOp =
1295 Desc.getMinMaxRecurrenceKind() == RecurrenceDescriptor::MRK_FloatMax;
1296 return getSimpleRdx(Instruction::FCmp);
1297 }
1298 default:
1299 llvm_unreachable("Unhandled RecKind");
1300 }
1301 }
1302
1303 void llvm::propagateIRFlags(Value *I, ArrayRef VL) {
1304 if (auto *VecOp = dyn_cast<Instruction>(I)) {
1305 if (auto *I0 = dyn_cast<Instruction>(VL[0])) {
1306 // VecOp is initialized to the 0th scalar, so start counting from index
1307 // '1'.
1308 VecOp->copyIRFlags(I0);
1309 for (int i = 1, e = VL.size(); i < e; ++i) {
1310 if (auto *Scalar = dyn_cast<Instruction>(VL[i]))
1311 VecOp->andIRFlags(Scalar);
1312 }
1313 }
1314 }
1315 }
16991699 /// access that can be widened.
17001700 bool memoryInstructionCanBeWidened(Instruction *I, unsigned VF = 1);
17011701
1702 // Returns true if the NoNaN attribute is set on the function.
1703 bool hasFunNoNaNAttr() const { return HasFunNoNaNAttr; }
1704
17021705 private:
17031706 /// Check if a single basic block loop is vectorizable.
17041707 /// At this point we know that this is a loop with a constant trip count
42574260 }
42584261
42594262 if (VF > 1) {
4260 // VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
4261 // and vector ops, reducing the set of values being computed by half each
4262 // round.
4263 assert(isPowerOf2_32(VF) &&
4264 "Reduction emission only supported for pow2 vectors!");
4265 Value *TmpVec = ReducedPartRdx;
4266 SmallVector<Constant *, 32> ShuffleMask(VF, nullptr);
4267 for (unsigned i = VF; i != 1; i >>= 1) {
4268 // Move the upper half of the vector to the lower half.
4269 for (unsigned j = 0; j != i / 2; ++j)
4270 ShuffleMask[j] = Builder.getInt32(i / 2 + j);
4271
4272 // Fill the rest of the mask with undef.
4273 std::fill(&ShuffleMask[i / 2], ShuffleMask.end(),
4274 UndefValue::get(Builder.getInt32Ty()));
4275
4276 Value *Shuf = Builder.CreateShuffleVector(
4277 TmpVec, UndefValue::get(TmpVec->getType()),
4278 ConstantVector::get(ShuffleMask), "rdx.shuf");
4279
4280 if (Op != Instruction::ICmp && Op != Instruction::FCmp)
4281 // Floating point operations had to be 'fast' to enable the reduction.
4282 TmpVec = addFastMathFlag(Builder.CreateBinOp(
4283 (Instruction::BinaryOps)Op, TmpVec, Shuf, "bin.rdx"));
4284 else
4285 TmpVec = RecurrenceDescriptor::createMinMaxOp(Builder, MinMaxKind,
4286 TmpVec, Shuf);
4287 }
4288
4289 // The result is in the first element of the vector.
4263 bool NoNaN = Legal->hasFunNoNaNAttr();
42904264 ReducedPartRdx =
4291 Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));
4292
4265 createTargetReduction(Builder, TTI, RdxDesc, ReducedPartRdx, NoNaN);
42934266 // If the reduction can be performed in a smaller type, we need to extend
42944267 // the reduction to the wider type before we branch to the original loop.
42954268 if (Phi->getType() != RdxDesc.getRecurrenceType())
4040 #include "llvm/Support/Debug.h"
4141 #include "llvm/Support/GraphWriter.h"
4242 #include "llvm/Support/raw_ostream.h"
43 #include "llvm/Transforms/Utils/LoopUtils.h"
4344 #include "llvm/Transforms/Vectorize.h"
4445 #include <algorithm>
4546 #include <memory>
209210 }
210211 }
211212 return Opcode;
212 }
213
214 /// Get the intersection (logical and) of all of the potential IR flags
215 /// of each scalar operation (VL) that will be converted into a vector (I).
216 /// Flag set: NSW, NUW, exact, and all of fast-math.
217 static void propagateIRFlags(Value *I, ArrayRef<Value *> VL) {
218 if (auto *VecOp = dyn_cast<Instruction>(I)) {
219 if (auto *I0 = dyn_cast<Instruction>(VL[0])) {
220 // VecOp is initialized to the 0th scalar, so start counting from index
221 // '1'.
222 VecOp->copyIRFlags(I0);
223 for (int i = 1, e = VL.size(); i < e; ++i) {
224 if (auto *Scalar = dyn_cast<Instruction>(VL[i]))
225 VecOp->andIRFlags(Scalar);
226 }
227 }
228 }
229213 }
230214
231215 /// \returns true if all of the values in \p VL have the same type or false
45124496
45134497 // Emit a reduction.
45144498 Value *ReducedSubTree =
4515 emitReduction(VectorizedRoot, Builder, ReduxWidth, ReductionOps);
4499 emitReduction(VectorizedRoot, Builder, ReduxWidth, ReductionOps, TTI);
45164500 if (VectorizedTree) {
45174501 Builder.SetCurrentDebugLocation(Loc);
45184502 VectorizedTree = Builder.CreateBinOp(ReductionOpcode, VectorizedTree,
45824566
45834567 /// \brief Emit a horizontal reduction of the vectorized value.
45844568 Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder,
4585 unsigned ReduxWidth, ArrayRef<Value *> RedOps) {
4569 unsigned ReduxWidth, ArrayRef<Value *> RedOps,
4570 const TargetTransformInfo *TTI) {
45864571 assert(VectorizedValue && "Need to have a vectorized tree node");
45874572 assert(isPowerOf2_32(ReduxWidth) &&
45884573 "We only handle power-of-two reductions for now");
45894574
4575 if (!IsPairwiseReduction)
4576 return createSimpleTargetReduction(
4577 Builder, TTI, ReductionOpcode, VectorizedValue,
4578 TargetTransformInfo::ReductionFlags(), RedOps);
4579
45904580 Value *TmpVec = VectorizedValue;
45914581 for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
4592 if (IsPairwiseReduction) {
4593 Value *LeftMask =
4582 Value *LeftMask =
45944583 createRdxShuffleMask(ReduxWidth, i, true, true, Builder);
4595 Value *RightMask =
4584 Value *RightMask =
45964585 createRdxShuffleMask(ReduxWidth, i, true, false, Builder);
45974586
4598 Value *LeftShuf = Builder.CreateShuffleVector(
4587 Value *LeftShuf = Builder.CreateShuffleVector(
45994588 TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
4600 Value *RightShuf = Builder.CreateShuffleVector(
4589 Value *RightShuf = Builder.CreateShuffleVector(
46014590 TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
46024591 "rdx.shuf.r");
4603 TmpVec = Builder.CreateBinOp(ReductionOpcode, LeftShuf, RightShuf,
4604 "bin.rdx");
4605 } else {
4606 Value *UpperHalf =
4607 createRdxShuffleMask(ReduxWidth, i, false, false, Builder);
4608 Value *Shuf = Builder.CreateShuffleVector(
4609 TmpVec, UndefValue::get(TmpVec->getType()), UpperHalf, "rdx.shuf");
4610 TmpVec = Builder.CreateBinOp(ReductionOpcode, TmpVec, Shuf, "bin.rdx");
4611 }
4592 TmpVec =
4593 Builder.CreateBinOp(ReductionOpcode, LeftShuf, RightShuf, "bin.rdx");
46124594 propagateIRFlags(TmpVec, RedOps);
46134595 }
46144596