llvm.org GIT mirror llvm / 42a382c
Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.

The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. A LoopID is unfortunately not a loop identifier: it is neither unique (there is even a regression test assigning the same LoopID to multiple loops; it can otherwise happen when passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is added to or removed from a LoopID's MDNode, the loop also receives a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if only to mark that a loop should not be processed again, as llvm.loop.isvectorized does for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses. The llvm.mem.parallel_loop_access metadata on instructions is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem of ever needing to add or remove operands), called an "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses listing all the access groups that are parallel (no dependencies carried by this loop). This intentionally avoids any kind of "ID". Loops that are cloned or have their attributes modified retain the llvm.loop.parallel_accesses attribute. Access instructions that are cloned point to the same access group. It is not necessary for each access to have its own "ID" MDNode; memory access instructions with the same behavior can be grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but it should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349725 91177308-0d34-0410-b5e6-96231b3b80d8

Michael Kruse, 1 year, 9 months ago
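As a minimal sketch of the new scheme (the value and label names here are illustrative; see the LangRef examples updated below):

    ; Before: the access points at the LoopID itself, which is neither unique
    ; nor stable under attribute changes.
    ;   %v = load i32, i32* %p, !llvm.mem.parallel_loop_access !0
    ;   !0 = distinct !{!0}

    ; After: the access points at a stable, always-empty access group, and the
    ; LoopID lists the access groups that are parallel for this loop.
    %v = load i32, i32* %p, !llvm.access.group !1
    br i1 %c, label %exit, label %body, !llvm.loop !0

    !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
    !1 = distinct !{}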
40 changed file(s) with 872 addition(s) and 248 deletion(s).
51395139 conjunction with ``llvm.loop`` loop identification metadata. The
51405140 ``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
51415141 optimization hints and the optimizer will only interleave and vectorize loops if
5142 it believes it is safe to do so. The ``llvm.mem.parallel_loop_access`` metadata
5142 it believes it is safe to do so. The ``llvm.loop.parallel_accesses`` metadata
51435143 which contains information about loop-carried memory dependencies can be helpful
51445144 in determining the safety of these transformations.
51455145
54425442 loop distribution pass. See
54435443 :ref:`Transformation Metadata ` for details.
54445444
5445 '``llvm.mem``'
5446 ^^^^^^^^^^^^^^^
5447
5448 Metadata types used to annotate memory accesses with information helpful
5449 for optimizations are prefixed with ``llvm.mem``.
5450
5451 '``llvm.mem.parallel_loop_access``' Metadata
5452 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
5453
5454 The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,
5455 or metadata containing a list of loop identifiers for nested loops.
5456 The metadata is attached to memory accessing instructions and denotes that
5457 no loop carried memory dependence exist between it and other instructions denoted
5458 with the same loop identifier. The metadata on memory reads also implies that
5459 if conversion (i.e. speculative execution within a loop iteration) is safe.
5460
5461 Precisely, given two instructions ``m1`` and ``m2`` that both have the
5462 ``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the
5463 set of loops associated with that metadata, respectively, then there is no loop
5464 carried dependence between ``m1`` and ``m2`` for loops in both ``L1`` and
5465 ``L2``.
5466
5467 As a special case, if all memory accessing instructions in a loop have
5468 ``llvm.mem.parallel_loop_access`` metadata that refers to that loop, then the
5469 loop has no loop carried memory dependences and is considered to be a parallel
5470 loop.
5471
5472 Note that if not all memory access instructions have such metadata referring to
5473 the loop, then the loop is considered not being trivially parallel. Additional
5445 '``llvm.access.group``' Metadata
5446 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
5447
5448 ``llvm.access.group`` metadata can be attached to any instruction that
5449 potentially accesses memory. It can point to a single distinct metadata
5450 node, which we call an access group. This node represents all memory access
5451 instructions referring to it via ``llvm.access.group``. When an
5452 instruction belongs to multiple access groups, it can also point to a
5453 list of access groups, as illustrated by the following example.
5454
5455 .. code-block:: llvm
5456
5457 %val = load i32, i32* %arrayidx, !llvm.access.group !0
5458 ...
5459 !0 = !{!1, !2}
5460 !1 = distinct !{}
5461 !2 = distinct !{}
5462
5463 It is illegal for the list node to be empty since it might be confused
5464 with an access group.
5465
5466 The access group metadata node must be 'distinct' to avoid collapsing
5467 multiple access groups by content. An access group metadata node must
5468 always be empty, which can be used to distinguish an access group
5469 metadata node from a list of access groups. Being empty avoids the
5470 need to ever update the node's content, which, because metadata is
5471 immutable by design, would require finding and updating all references
5472 to the access group node.
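For illustration (node numbers are arbitrary), the distinct-and-empty requirement is what distinguishes an access group from a list of access groups:

.. code-block:: llvm

    !3 = distinct !{}  ; a valid access group: distinct and empty
    !4 = distinct !{}  ; another access group
    !5 = !{!3, !4}     ; not an access group, but a list of access groups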
5473
5474 The access group can be used to refer to a memory access instruction
5475 without pointing to it directly (which is not possible in global
5476 metadata). Currently, the only metadata making use of it is
5477 ``llvm.loop.parallel_accesses``.
5478
5479 '``llvm.loop.parallel_accesses``' Metadata
5480 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
5481
5482 The ``llvm.loop.parallel_accesses`` metadata refers to one or more
5483 access group metadata nodes (see ``llvm.access.group``). It denotes that
5484 no loop-carried memory dependences exist between the memory accesses in
5485 the listed access groups for the loop associated with this metadata.
5486
5487 Let ``m1`` and ``m2`` be two instructions that carry ``llvm.access.group``
5488 metadata referring to the access groups ``g1`` and ``g2``, respectively
5489 (which might be identical). If a loop contains both access groups
5490 in its ``llvm.loop.parallel_accesses`` metadata, then the compiler can
5491 assume that there is no dependency between ``m1`` and ``m2`` carried by
5492 this loop. Instructions that belong to multiple access groups are
5493 considered to have this property if at least one of their access groups
5494 matches the ``llvm.loop.parallel_accesses`` list.
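For example (a sketch with arbitrary node numbers), an access belonging to groups ``!2`` and ``!3`` is treated as parallel with respect to a loop whose ``llvm.loop.parallel_accesses`` lists only ``!2``:

.. code-block:: llvm

    %val = load i32, i32* %arrayidx, !llvm.access.group !1
    ...
    br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

    !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !2}}
    !1 = !{!2, !3}
    !2 = distinct !{}
    !3 = distinct !{}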
5495
5496 If all memory-accessing instructions in a loop belong to an access group
5497 listed in that loop's ``llvm.loop.parallel_accesses`` metadata, then the
5498 loop has no loop-carried memory dependences and is considered to be a
5499 parallel loop.
5500
5501 Note that if not all memory access instructions belong to an access
5502 group referred to by ``llvm.loop.parallel_accesses``, then the loop must
5503 not be considered trivially parallel. Additional
54745504 memory dependence analysis is required to make that determination. As a fail
54755505 safe mechanism, this causes loops that were originally parallel to be considered
54765506 sequential (if optimization passes that are unaware of the parallel semantics
54775507 insert new memory instructions into the loop body).
54785508
54795509 Example of a loop that is considered parallel due to its correct use of
5480 both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
5481 metadata types that refer to the same loop identifier metadata.
5510 both ``llvm.access.group`` and ``llvm.loop.parallel_accesses``
5511 metadata types.
54825512
54835513 .. code-block:: llvm
54845514
54855515 for.body:
54865516 ...
5487 %val0 = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !0
5517 %val0 = load i32, i32* %arrayidx, !llvm.access.group !1
54885518 ...
5489 store i32 %val0, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
5519 store i32 %val0, i32* %arrayidx1, !llvm.access.group !1
54905520 ...
54915521 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
54925522
54935523 for.end:
54945524 ...
5495 !0 = !{!0}
5496
5497 It is also possible to have nested parallel loops. In that case the
5498 memory accesses refer to a list of loop identifier metadata nodes instead of
5499 the loop identifier metadata node directly:
5525 !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
5526 !1 = distinct !{}
5527
5528 It is also possible to have nested parallel loops:
55005529
55015530 .. code-block:: llvm
55025531
55035532 outer.for.body:
55045533 ...
5505 %val1 = load i32, i32* %arrayidx3, !llvm.mem.parallel_loop_access !2
5534 %val1 = load i32, i32* %arrayidx3, !llvm.access.group !4
55065535 ...
55075536 br label %inner.for.body
55085537
55095538 inner.for.body:
55105539 ...
5511 %val0 = load i32, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
5540 %val0 = load i32, i32* %arrayidx1, !llvm.access.group !3
55125541 ...
5513 store i32 %val0, i32* %arrayidx2, !llvm.mem.parallel_loop_access !0
5542 store i32 %val0, i32* %arrayidx2, !llvm.access.group !3
55145543 ...
55155544 br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
55165545
55175546 inner.for.end:
55185547 ...
5519 store i32 %val1, i32* %arrayidx4, !llvm.mem.parallel_loop_access !2
5548 store i32 %val1, i32* %arrayidx4, !llvm.access.group !4
55205549 ...
55215550 br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
55225551
55235552 outer.for.end: ; preds = %for.body
55245553 ...
5525 !0 = !{!1, !2} ; a list of loop identifiers
5526 !1 = !{!1} ; an identifier for the inner loop
5527 !2 = !{!2} ; an identifier for the outer loop
5554 !1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !3}} ; metadata for the inner loop
5555 !2 = distinct !{!2, !{!"llvm.loop.parallel_accesses", !3, !4}} ; metadata for the outer loop
5556 !3 = distinct !{} ; access group for instructions in the inner loop (which are implicitly contained in outer loop as well)
5557 !4 = distinct !{} ; access group for instructions in the outer, but not the inner loop
55285558
55295559 '``irr_loop``' Metadata
55305560 ^^^^^^^^^^^^^^^^^^^^^^^
69406970
69416971 The argument to the '``fneg``' instruction must be a
69426972 :ref:`floating-point ` or :ref:`vector ` of
6943 floating-point values.
6973 floating-point values.
69446974
69456975 Semantics:
69466976 """"""""""
1495614986 Overview:
1495714987 """""""""
1495814988
14959 The '``llvm.experimental.constrained.maxnum``' intrinsic returns the maximum
14989 The '``llvm.experimental.constrained.maxnum``' intrinsic returns the maximum
1496014990 of the two arguments.
1496114991
1496214992 Arguments:
1496314993 """"""""""
1496414994
14965 The first two arguments and the return value are floating-point numbers
14995 The first two arguments and the return value are floating-point numbers
1496614996 of the same type.
1496714997
1496814998 The third and forth arguments specify the rounding mode and exception
1503015060 Overview:
1503115061 """""""""
1503215062
15033 The '``llvm.experimental.constrained.ceil``' intrinsic returns the ceiling of the
15063 The '``llvm.experimental.constrained.ceil``' intrinsic returns the ceiling of the
1503415064 first operand.
1503515065
1503615066 Arguments:
1506615096 Overview:
1506715097 """""""""
1506815098
15069 The '``llvm.experimental.constrained.floor``' intrinsic returns the floor of the
15099 The '``llvm.experimental.constrained.floor``' intrinsic returns the floor of the
1507015100 first operand.
1507115101
1507215102 Arguments:
1508315113 """"""""""
1508415114
1508515115 This function returns the same values as the libm ``floor`` functions
15086 would and handles error conditions in the same way.
15116 would and handles error conditions in the same way.
1508715117
1508815118
1508915119 '``llvm.experimental.constrained.round``' Intrinsic
1510215132 Overview:
1510315133 """""""""
1510415134
15105 The '``llvm.experimental.constrained.round``' intrinsic returns the first
15135 The '``llvm.experimental.constrained.round``' intrinsic returns the first
1510615136 operand rounded to the nearest integer.
1510715137
1510815138 Arguments:
1513815168 Overview:
1513915169 """""""""
1514015170
15141 The '``llvm.experimental.constrained.trunc``' intrinsic returns the first
15142 operand rounded to the nearest integer not larger in magnitude than the
15171 The '``llvm.experimental.constrained.trunc``' intrinsic returns the first
15172 operand rounded to the nearest integer not larger in magnitude than the
1514315173 operand.
1514415174
1514515175 Arguments:
407407 /// Verify loop structure of this loop and all nested loops.
408408 void verifyLoopNest(DenseSet *Loops) const;
409409
410 /// Returns true if the loop is annotated parallel.
411 ///
412 /// Derived classes can override this method using static template
413 /// polymorphism.
414 bool isAnnotatedParallel() const { return false; }
415
410416 /// Print loop with all the BBs inside it.
411417 void print(raw_ostream &OS, unsigned Depth = 0, bool Verbose = false) const;
412418
988994 /// Function to print a loop's contents as LLVM's text IR assembly.
989995 void printLoop(Loop &L, raw_ostream &OS, const std::string &Banner = "");
990996
997 /// Find and return the loop attribute node for the attribute @p Name in
998 /// @p LoopID. Return nullptr if there is no such attribute.
999 MDNode *findOptionMDForLoopID(MDNode *LoopID, StringRef Name);
1000
1001 /// Find string metadata for a loop.
1002 ///
1003 /// Returns the MDNode where the first operand is the metadata's name. The
1004 /// following operands are the metadata's values. If no metadata with @p Name is
1005 /// found, return nullptr.
1006 MDNode *findOptionMDForLoop(const Loop *TheLoop, StringRef Name);
1007
1008 /// Return whether an MDNode might represent an access group.
1009 ///
1010 /// Access group metadata nodes have to be distinct and empty. Being
1011 /// always-empty ensures that it never needs to be changed (which -- because
1012 /// MDNodes are designed immutable -- would require creating a new MDNode). Note
1013 /// that this is not a sufficient condition: not every distinct and empty MDNode
1014 /// represents an access group.
1015 bool isValidAsAccessGroup(MDNode *AccGroup);
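As a sketch of how these helpers compose (the function name collectParallelAccessGroups is hypothetical and not part of this patch; the logic mirrors what this patch adds to Loop::isAnnotatedParallel()):

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Metadata.h"
#include <cassert>

using namespace llvm;

// Hypothetical helper: collect the access groups that are parallel with
// respect to loop L, i.e. the groups listed in its
// "llvm.loop.parallel_accesses" loop attribute.
static void collectParallelAccessGroups(const Loop *L,
                                        SmallPtrSetImpl<MDNode *> &Groups) {
  MDNode *MD = findOptionMDForLoop(L, "llvm.loop.parallel_accesses");
  if (!MD)
    return;
  // Operand 0 holds the attribute name; the remaining operands are the
  // access groups themselves.
  for (const MDOperand &Op : drop_begin(MD->operands(), 1)) {
    auto *AccGroup = cast<MDNode>(Op.get());
    assert(isValidAsAccessGroup(AccGroup) && "expected an access group");
    Groups.insert(AccGroup);
  }
}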
1016
9911017 } // End llvm namespace
9921018
9931019 #endif
391391 template
392392 void LoopBase::print(raw_ostream &OS, unsigned Depth,
393393 bool Verbose) const {
394 OS.indent(Depth * 2) << "Loop at depth " << getLoopDepth() << " containing: ";
394 OS.indent(Depth * 2);
395 if (static_cast<const LoopT *>(this)->isAnnotatedParallel())
396 OS << "Parallel ";
397 OS << "Loop at depth " << getLoopDepth() << " containing: ";
395398
396399 BlockT *H = getHeader();
397400 for (unsigned i = 0; i < getBlocks().size(); ++i) {
116116 DemandedBits &DB,
117117 const TargetTransformInfo *TTI=nullptr);
118118
119 /// Compute the union of two access-group lists.
120 ///
121 /// If the list contains just one access group, it is returned directly. If the
122 /// list is empty, returns nullptr.
123 MDNode *uniteAccessGroups(MDNode *AccGroups1, MDNode *AccGroups2);
124
125 /// Compute the access-group list of access groups that @p Inst1 and @p Inst2
126 /// are both in. If either instruction does not access memory at all, it is
127 /// considered to be in every list.
128 ///
129 /// If the list contains just one access group, it is returned directly. If the
130 /// list is empty, returns nullptr.
131 MDNode *intersectAccessGroups(const Instruction *Inst1,
132 const Instruction *Inst2);
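For illustration, typical call sites look roughly like the following (the wrapper names are invented for this example; the underlying calls match how this patch uses these functions in Local.cpp and InlineFunction.cpp):

#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

// Hypothetical wrapper: when one instruction replaces another, keep only the
// access groups both belong to (a conservative intersection).
static void keepCommonAccessGroups(Instruction *Repl, const Instruction *Other) {
  MDNode *Common = intersectAccessGroups(Repl, Other);
  // A null result simply clears the !llvm.access.group attachment.
  Repl->setMetadata(LLVMContext::MD_access_group, Common);
}

// Hypothetical wrapper: when cloning an instruction into a context that has
// its own access groups (e.g. inlining a call site that carries
// !llvm.access.group), the cloned access belongs to the union of both.
static void addContextAccessGroups(Instruction *Cloned, MDNode *ContextGroups) {
  MDNode *United = uniteAccessGroups(
      Cloned->getMetadata(LLVMContext::MD_access_group), ContextGroups);
  Cloned->setMetadata(LLVMContext::MD_access_group, United);
}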
133
119134 /// Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath,
120 /// MD_nontemporal]. For K in Kinds, we get the MDNode for K from each of the
135 /// MD_nontemporal, MD_access_group].
136 /// For K in Kinds, we get the MDNode for K from each of the
121137 /// elements of VL, compute their "intersection" (i.e., the most generic
122138 /// metadata value that covers all of the individual values), and set I's
123139 /// metadata for M equal to the intersection value.
141157
142158 /// Create a mask with replicated elements.
143159 ///
144 /// This function creates a shuffle mask for replicating each of the \p VF
160 /// This function creates a shuffle mask for replicating each of the \p VF
145161 /// elements in a vector \p ReplicationFactor times. It can be used to
146162 /// transform a mask of \p VF elements into a mask of
147163 /// \p VF * \p ReplicationFactor elements used by a predicated
101101 MD_associated = 22, // "associated"
102102 MD_callees = 23, // "callees"
103103 MD_irr_loop = 24, // "irr_loop"
104 MD_access_group = 25, // "llvm.access.group"
104105 };
105106
106107 /// Known operand bundle tag IDs, which always have the same value. All
167167 /// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
168168 /// operand or null otherwise. If the string metadata is not found return
169169 /// Optional's not-a-value.
170 Optional findStringMetadataForLoop(Loop *TheLoop,
170 Optional<const MDOperand *> findStringMetadataForLoop(const Loop *TheLoop,
171171 StringRef Name);
172172
173173 /// Find named metadata for a loop with an integer value.
292292 if (!DesiredLoopIdMetadata)
293293 return false;
294294
295 MDNode *ParallelAccesses =
296 findOptionMDForLoop(this, "llvm.loop.parallel_accesses");
297 SmallPtrSet<MDNode *, 4>
298 ParallelAccessGroups; // For scalable 'contains' check.
299 if (ParallelAccesses) {
300 for (const MDOperand &MD : drop_begin(ParallelAccesses->operands(), 1)) {
301 MDNode *AccGroup = cast<MDNode>(MD.get());
302 assert(isValidAsAccessGroup(AccGroup) &&
303 "List item must be an access group");
304 ParallelAccessGroups.insert(AccGroup);
305 }
306 }
307
295308 // The loop branch contains the parallel loop metadata. In order to ensure
296309 // that any parallel-loop-unaware optimization pass hasn't added loop-carried
297310 // dependencies (thus converted the loop back to a sequential loop), check
298 // that all the memory instructions in the loop contain parallelism metadata
299 // that point to the same unique "loop id metadata" the loop branch does.
311 // that all the memory instructions in the loop belong to an access group that
312 // is parallel to this loop.
300313 for (BasicBlock *BB : this->blocks()) {
301314 for (Instruction &I : *BB) {
302315 if (!I.mayReadOrWriteMemory())
303316 continue;
317
318 if (MDNode *AccessGroup = I.getMetadata(LLVMContext::MD_access_group)) {
319 auto ContainsAccessGroup = [&ParallelAccessGroups](MDNode *AG) -> bool {
320 if (AG->getNumOperands() == 0) {
321 assert(isValidAsAccessGroup(AG) && "Item must be an access group");
322 return ParallelAccessGroups.count(AG);
323 }
324
325 for (const MDOperand &AccessListItem : AG->operands()) {
326 MDNode *AccGroup = cast<MDNode>(AccessListItem.get());
327 assert(isValidAsAccessGroup(AccGroup) &&
328 "List item must be an access group");
329 if (ParallelAccessGroups.count(AccGroup))
330 return true;
331 }
332 return false;
333 };
334
335 if (ContainsAccessGroup(AccessGroup))
336 continue;
337 }
304338
305339 // The memory instruction can refer to the loop identifier metadata
306340 // directly or indirectly through another list metadata (in case of
692726 }
693727 }
694728
729 MDNode *llvm::findOptionMDForLoopID(MDNode *LoopID, StringRef Name) {
730 // No loop metadata node, no loop properties.
731 if (!LoopID)
732 return nullptr;
733
734 // First operand should refer to the metadata node itself, for legacy reasons.
735 assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
736 assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
737
738 // Iterate over the metadata node operands and look for MDString metadata.
739 for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
740 MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
741 if (!MD || MD->getNumOperands() < 1)
742 continue;
743 MDString *S = dyn_cast<MDString>(MD->getOperand(0));
744 if (!S)
745 continue;
746 // Return the operand node if MDString holds expected metadata.
747 if (Name.equals(S->getString()))
748 return MD;
749 }
750
751 // Loop property not found.
752 return nullptr;
753 }
754
755 MDNode *llvm::findOptionMDForLoop(const Loop *TheLoop, StringRef Name) {
756 return findOptionMDForLoopID(TheLoop->getLoopID(), Name);
757 }
758
759 bool llvm::isValidAsAccessGroup(MDNode *Node) {
760 return Node->getNumOperands() == 0 && Node->isDistinct();
761 }
762
695763 //===----------------------------------------------------------------------===//
696764 // LoopInfo implementation
697765 //
463463 return MinBWs;
464464 }
465465
466 /// Add all access groups in @p AccGroups to @p List.
467 template <typename ListT>
468 static void addToAccessGroupList(ListT &List, MDNode *AccGroups) {
469 // Interpret an access group as a list containing itself.
470 if (AccGroups->getNumOperands() == 0) {
471 assert(isValidAsAccessGroup(AccGroups) && "Node must be an access group");
472 List.insert(AccGroups);
473 return;
474 }
475
476 for (auto &AccGroupListOp : AccGroups->operands()) {
477 auto *Item = cast<MDNode>(AccGroupListOp.get());
478 assert(isValidAsAccessGroup(Item) && "List item must be an access group");
479 List.insert(Item);
480 }
481 };
482
483 MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1, MDNode *AccGroups2) {
484 if (!AccGroups1)
485 return AccGroups2;
486 if (!AccGroups2)
487 return AccGroups1;
488 if (AccGroups1 == AccGroups2)
489 return AccGroups1;
490
491 SmallSetVector<Metadata *, 4> Union;
492 addToAccessGroupList(Union, AccGroups1);
493 addToAccessGroupList(Union, AccGroups2);
494
495 if (Union.size() == 0)
496 return nullptr;
497 if (Union.size() == 1)
498 return cast<MDNode>(Union.front());
499
500 LLVMContext &Ctx = AccGroups1->getContext();
501 return MDNode::get(Ctx, Union.getArrayRef());
502 }
503
504 MDNode *llvm::intersectAccessGroups(const Instruction *Inst1,
505 const Instruction *Inst2) {
506 bool MayAccessMem1 = Inst1->mayReadOrWriteMemory();
507 bool MayAccessMem2 = Inst2->mayReadOrWriteMemory();
508
509 if (!MayAccessMem1 && !MayAccessMem2)
510 return nullptr;
511 if (!MayAccessMem1)
512 return Inst2->getMetadata(LLVMContext::MD_access_group);
513 if (!MayAccessMem2)
514 return Inst1->getMetadata(LLVMContext::MD_access_group);
515
516 MDNode *MD1 = Inst1->getMetadata(LLVMContext::MD_access_group);
517 MDNode *MD2 = Inst2->getMetadata(LLVMContext::MD_access_group);
518 if (!MD1 || !MD2)
519 return nullptr;
520 if (MD1 == MD2)
521 return MD1;
522
523 // Use set for scalable 'contains' check.
524 SmallPtrSet<Metadata *, 4> AccGroupSet2;
525 addToAccessGroupList(AccGroupSet2, MD2);
526
527 SmallVector<Metadata *, 4> Intersection;
528 if (MD1->getNumOperands() == 0) {
529 assert(isValidAsAccessGroup(MD1) && "Node must be an access group");
530 if (AccGroupSet2.count(MD1))
531 Intersection.push_back(MD1);
532 } else {
533 for (const MDOperand &Node : MD1->operands()) {
534 auto *Item = cast<MDNode>(Node.get());
535 assert(isValidAsAccessGroup(Item) && "List item must be an access group");
536 if (AccGroupSet2.count(Item))
537 Intersection.push_back(Item);
538 }
539 }
540
541 if (Intersection.size() == 0)
542 return nullptr;
543 if (Intersection.size() == 1)
544 return cast<MDNode>(Intersection.front());
545
546 LLVMContext &Ctx = Inst1->getContext();
547 return MDNode::get(Ctx, Intersection);
548 }
549
466550 /// \returns \p I after propagating metadata from \p VL.
467551 Instruction *llvm::propagateMetadata(Instruction *Inst, ArrayRef VL) {
468552 Instruction *I0 = cast(VL[0]);
469553 SmallVector, 4> Metadata;
470554 I0->getAllMetadataOtherThanDebugLoc(Metadata);
471555
472 for (auto Kind :
473 {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
474 LLVMContext::MD_noalias, LLVMContext::MD_fpmath,
475 LLVMContext::MD_nontemporal, LLVMContext::MD_invariant_load}) {
556 for (auto Kind : {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
557 LLVMContext::MD_noalias, LLVMContext::MD_fpmath,
558 LLVMContext::MD_nontemporal, LLVMContext::MD_invariant_load,
559 LLVMContext::MD_access_group}) {
476560 MDNode *MD = I0->getMetadata(Kind);
477561
478562 for (int J = 1, E = VL.size(); MD && J != E; ++J) {
492576 case LLVMContext::MD_nontemporal:
493577 case LLVMContext::MD_invariant_load:
494578 MD = MDNode::intersect(MD, IMD);
579 break;
580 case LLVMContext::MD_access_group:
581 MD = intersectAccessGroups(Inst, IJ);
495582 break;
496583 default:
497584 llvm_unreachable("unhandled metadata");
6060 {MD_associated, "associated"},
6161 {MD_callees, "callees"},
6262 {MD_irr_loop, "irr_loop"},
63 {MD_access_group, "llvm.access.group"},
6364 };
6465
6566 for (auto &MDKind : MDKinds) {
173173 MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
174174 if (LoopMemParallelMD)
175175 L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
176 MDNode *AccessGroupMD = MI->getMetadata(LLVMContext::MD_access_group);
177 if (AccessGroupMD)
178 L->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);
176179
177180 StoreInst *S = Builder.CreateStore(L, Dest);
178181 // Alignment from the mem intrinsic will be better, so use it.
181184 S->setMetadata(LLVMContext::MD_tbaa, CopyMD);
182185 if (LoopMemParallelMD)
183186 S->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
187 if (AccessGroupMD)
188 S->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);
184189
185190 if (auto *MT = dyn_cast(MI)) {
186191 // non-atomics can be volatile
492492 case LLVMContext::MD_noalias:
493493 case LLVMContext::MD_nontemporal:
494494 case LLVMContext::MD_mem_parallel_loop_access:
495 case LLVMContext::MD_access_group:
495496 // All of these directly apply.
496497 NewLoad->setMetadata(ID, N);
497498 break;
607607 LLVMContext::MD_align,
608608 LLVMContext::MD_dereferenceable,
609609 LLVMContext::MD_dereferenceable_or_null,
610 LLVMContext::MD_access_group,
610611 };
611612
612613 for (unsigned ID : KnownIDs)
245245 LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
246246 LLVMContext::MD_noalias, LLVMContext::MD_range,
247247 LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,
248 LLVMContext::MD_invariant_group};
248 LLVMContext::MD_invariant_group, LLVMContext::MD_access_group};
249249 combineMetadata(ReplInst, I, KnownIDs, true);
250250 }
251251
632632 // Set Loop Versioning metaData for version loop.
633633 addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);
634634 // Set "llvm.mem.parallel_loop_access" metaData to versioned loop.
635 // FIXME: "llvm.mem.parallel_loop_access" annotates memory access
636 // instructions, not loops.
635637 addStringMetadataToLoop(LVer.getVersionedLoop(),
636638 "llvm.mem.parallel_loop_access");
637639 // Update version loop with aggressive aliasing assumption.
995995 // handled here, but combineMetadata doesn't support them yet
996996 unsigned KnownIDs[] = {LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
997997 LLVMContext::MD_noalias,
998 LLVMContext::MD_invariant_group};
998 LLVMContext::MD_invariant_group,
999 LLVMContext::MD_access_group};
9991000 combineMetadata(C, cpy, KnownIDs, true);
10001001
10011002 // Remove the memcpy.
25922592 }
25932593 V = convertValue(DL, IRB, V, NewAllocaTy);
25942594 StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment());
2595 Store->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);
2595 Store->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access,
2596 LLVMContext::MD_access_group});
25962597 if (AATags)
25972598 Store->setAAMetadata(AATags);
25982599 Pass.DeadInsts.insert(&SI);
26612662 NewSI = IRB.CreateAlignedStore(V, NewPtr, getSliceAlign(V->getType()),
26622663 SI.isVolatile());
26632664 }
2664 NewSI->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);
2665 NewSI->copyMetadata(SI, {LLVMContext::MD_mem_parallel_loop_access,
2666 LLVMContext::MD_access_group});
26652667 if (AATags)
26662668 NewSI->setAAMetadata(AATags);
26672669 if (SI.isVolatile())
37983800 PartPtrTy, BasePtr->getName() + "."),
37993801 getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false,
38003802 LI->getName());
3801 PLoad->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);
3803 PLoad->copyMetadata(*LI, {LLVMContext::MD_mem_parallel_loop_access,
3804 LLVMContext::MD_access_group});
38023805
38033806 // Append this load onto the list of split loads so we can find it later
38043807 // to rewrite the stores.
38543857 APInt(DL.getIndexSizeInBits(AS), PartOffset),
38553858 PartPtrTy, StoreBasePtr->getName() + "."),
38563859 getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false);
3857 PStore->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);
3860 PStore->copyMetadata(*LI, {LLVMContext::MD_mem_parallel_loop_access,
3861 LLVMContext::MD_access_group});
38583862 LLVM_DEBUG(dbgs() << " +" << PartOffset << ":" << *PStore << "\n");
38593863 }
38603864
378378 || Tag == LLVMContext::MD_invariant_load
379379 || Tag == LLVMContext::MD_alias_scope
380380 || Tag == LLVMContext::MD_noalias
381 || Tag == ParallelLoopAccessMDKind);
381 || Tag == ParallelLoopAccessMDKind
382 || Tag == LLVMContext::MD_access_group);
382383 }
383384
384385 // Transfer metadata from Op to the instructions in CV if it is known
3030 #include "llvm/Analysis/ProfileSummaryInfo.h"
3131 #include "llvm/Transforms/Utils/Local.h"
3232 #include "llvm/Analysis/ValueTracking.h"
33 #include "llvm/Analysis/VectorUtils.h"
3334 #include "llvm/IR/Argument.h"
3435 #include "llvm/IR/BasicBlock.h"
3536 #include "llvm/IR/CFG.h"
769770 UnwindDest->removePredecessor(InvokeBB);
770771 }
771772
772 /// When inlining a call site that has !llvm.mem.parallel_loop_access metadata,
773 /// that metadata should be propagated to all memory-accessing cloned
774 /// instructions.
773 /// When inlining a call site that has !llvm.mem.parallel_loop_access or
774 /// llvm.access.group metadata, that metadata should be propagated to all
775 /// memory-accessing cloned instructions.
775776 static void PropagateParallelLoopAccessMetadata(CallSite CS,
776777 ValueToValueMapTy &VMap) {
777778 MDNode *M =
778779 CS.getInstruction()->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
779 if (!M)
780 MDNode *CallAccessGroup =
781 CS.getInstruction()->getMetadata(LLVMContext::MD_access_group);
782 if (!M && !CallAccessGroup)
780783 return;
781784
782785 for (ValueToValueMapTy::iterator VMI = VMap.begin(), VMIE = VMap.end();
788791 if (!NI)
789792 continue;
790793
791 if (MDNode *PM = NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {
794 if (M) {
795 if (MDNode *PM =
796 NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {
792797 M = MDNode::concatenate(PM, M);
793798 NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
794 } else if (NI->mayReadOrWriteMemory()) {
795 NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
799 } else if (NI->mayReadOrWriteMemory()) {
800 NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
801 }
802 }
803
804 if (NI->mayReadOrWriteMemory()) {
805 MDNode *UnitedAccGroups = uniteAccessGroups(
806 NI->getMetadata(LLVMContext::MD_access_group), CallAccessGroup);
807 NI->setMetadata(LLVMContext::MD_access_group, UnitedAccGroups);
796808 }
797809 }
798810 }
3333 #include "llvm/Analysis/MemorySSAUpdater.h"
3434 #include "llvm/Analysis/TargetLibraryInfo.h"
3535 #include "llvm/Analysis/ValueTracking.h"
36 #include "llvm/Analysis/VectorUtils.h"
3637 #include "llvm/BinaryFormat/Dwarf.h"
3738 #include "llvm/IR/Argument.h"
3839 #include "llvm/IR/Attributes.h"
22962297 case LLVMContext::MD_mem_parallel_loop_access:
22972298 K->setMetadata(Kind, MDNode::intersect(JMD, KMD));
22982299 break;
2300 case LLVMContext::MD_access_group:
2301 K->setMetadata(LLVMContext::MD_access_group,
2302 intersectAccessGroups(K, J));
2303 break;
22992304 case LLVMContext::MD_range:
23002305
23012306 // If K does move, use most generic range. Otherwise keep the range of
23522357 LLVMContext::MD_invariant_load, LLVMContext::MD_nonnull,
23532358 LLVMContext::MD_invariant_group, LLVMContext::MD_align,
23542359 LLVMContext::MD_dereferenceable,
2355 LLVMContext::MD_dereferenceable_or_null};
2360 LLVMContext::MD_dereferenceable_or_null,
2361 LLVMContext::MD_access_group};
23562362 combineMetadata(K, J, KnownIDs, KDominatesJ);
23572363 }
23582364
23832389 LLVMContext::MD_tbaa, LLVMContext::MD_alias_scope,
23842390 LLVMContext::MD_noalias, LLVMContext::MD_range,
23852391 LLVMContext::MD_fpmath, LLVMContext::MD_invariant_load,
2386 LLVMContext::MD_invariant_group, LLVMContext::MD_nonnull};
2392 LLVMContext::MD_invariant_group, LLVMContext::MD_nonnull,
2393 LLVMContext::MD_access_group};
23872394 combineMetadata(ReplInst, I, KnownIDs, false);
23882395 }
23892396
186186 INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
187187 }
188188
189 static Optional findOptionMDForLoopID(MDNode *LoopID,
190 StringRef Name) {
191 // Return none if LoopID is false.
192 if (!LoopID)
193 return None;
194
195 // First operand should refer to the loop id itself.
196 assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
197 assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
198
199 // Iterate over LoopID operands and look for MDString Metadata
200 for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
201 MDNode *MD = dyn_cast(LoopID->getOperand(i));
202 if (!MD)
203 continue;
204 MDString *S = dyn_cast(MD->getOperand(0));
205 if (!S)
206 continue;
207 // Return true if MDString holds expected MetaData.
208 if (Name.equals(S->getString()))
209 return MD;
210 }
211 return None;
212 }
213
214 static Optional findOptionMDForLoop(const Loop *TheLoop,
215 StringRef Name) {
216 return findOptionMDForLoopID(TheLoop->getLoopID(), Name);
217 }
218
219189 /// Find string metadata for loop
220190 ///
221191 /// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
222192 /// operand or null otherwise. If the string metadata is not found return
223193 /// Optional's not-a-value.
224 Optional llvm::findStringMetadataForLoop(Loop *TheLoop,
194 Optional<const MDOperand *> llvm::findStringMetadataForLoop(const Loop *TheLoop,
225195 StringRef Name) {
226 auto MD = findOptionMDForLoop(TheLoop, Name).getValueOr(nullptr);
196 MDNode *MD = findOptionMDForLoop(TheLoop, Name);
227197 if (!MD)
228198 return None;
229199 switch (MD->getNumOperands()) {
238208
239209 static Optional getOptionalBoolLoopAttribute(const Loop *TheLoop,
240210 StringRef Name) {
241 Optional MD = findOptionMDForLoop(TheLoop, Name);
242 if (!MD.hasValue())
211 MDNode *MD = findOptionMDForLoop(TheLoop, Name);
212 if (!MD)
243213 return None;
244 MDNode *OptionNode = MD.getValue();
245 if (OptionNode == nullptr)
246 return None;
247 switch (OptionNode->getNumOperands()) {
214 switch (MD->getNumOperands()) {
248215 case 1:
249216 // When the value is absent it is interpreted as 'attribute set'.
250217 return true;
251218 case 2:
252 return mdconst::extract_or_null(
253 OptionNode->getOperand(1).get());
219 return mdconst::extract_or_null<ConstantInt>(MD->getOperand(1).get());
254220 }
255221 llvm_unreachable("unexpected number of options");
256222 }
324290
325291 bool HasAnyFollowup = false;
326292 for (StringRef OptionName : FollowupOptions) {
327 MDNode *FollowupNode =
328 findOptionMDForLoopID(OrigLoopID, OptionName).getValueOr(nullptr);
293 MDNode *FollowupNode = findOptionMDForLoopID(OrigLoopID, OptionName);
329294 if (!FollowupNode)
330295 continue;
331296
13201320 LLVMContext::MD_align,
13211321 LLVMContext::MD_dereferenceable,
13221322 LLVMContext::MD_dereferenceable_or_null,
1323 LLVMContext::MD_mem_parallel_loop_access};
1323 LLVMContext::MD_mem_parallel_loop_access,
1324 LLVMContext::MD_access_group};
13241325 combineMetadata(I1, I2, KnownIDs, true);
13251326
13261327 // I1 and I2 are being combined into a single instruction. Its debug
0 ; RUN: opt -loops -analyze < %s | FileCheck %s
1 ;
2 ; void func(long n, double A[static const restrict 4*n], double B[static const restrict 4*n]) {
3 ; for (long i = 0; i < n; i += 1)
4 ; for (long j = 0; j < n; j += 1)
5 ; for (long k = 0; k < n; k += 1)
6 ; for (long l = 0; l < n; l += 1) {
7 ; A[i + j + k + l] = 21;
8 ; B[i + j + k + l] = 42;
9 ; }
10 ; }
11 ;
12 ; Check that isAnnotatedParallel is working as expected.
13 ;
14 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
15
16 define void @func(i64 %n, double* noalias nonnull %A, double* noalias nonnull %B) {
17 entry:
18 br label %for.cond
19
20 for.cond:
21 %i.0 = phi i64 [ 0, %entry ], [ %add28, %for.inc27 ]
22 %cmp = icmp slt i64 %i.0, %n
23 br i1 %cmp, label %for.cond2, label %for.end29
24
25 for.cond2:
26 %j.0 = phi i64 [ %add25, %for.inc24 ], [ 0, %for.cond ]
27 %cmp3 = icmp slt i64 %j.0, %n
28 br i1 %cmp3, label %for.cond6, label %for.inc27
29
30 for.cond6:
31 %k.0 = phi i64 [ %add22, %for.inc21 ], [ 0, %for.cond2 ]
32 %cmp7 = icmp slt i64 %k.0, %n
33 br i1 %cmp7, label %for.cond10, label %for.inc24
34
35 for.cond10:
36 %l.0 = phi i64 [ %add20, %for.body13 ], [ 0, %for.cond6 ]
37 %cmp11 = icmp slt i64 %l.0, %n
38 br i1 %cmp11, label %for.body13, label %for.inc21
39
40 for.body13:
41 %add = add nuw nsw i64 %i.0, %j.0
42 %add14 = add nuw nsw i64 %add, %k.0
43 %add15 = add nuw nsw i64 %add14, %l.0
44 %arrayidx = getelementptr inbounds double, double* %A, i64 %add15
45 store double 2.100000e+01, double* %arrayidx, align 8, !llvm.access.group !5
46 %add16 = add nuw nsw i64 %i.0, %j.0
47 %add17 = add nuw nsw i64 %add16, %k.0
48 %add18 = add nuw nsw i64 %add17, %l.0
49 %arrayidx19 = getelementptr inbounds double, double* %B, i64 %add18
50 store double 4.200000e+01, double* %arrayidx19, align 8, !llvm.access.group !6
51 %add20 = add nuw nsw i64 %l.0, 1
52 br label %for.cond10, !llvm.loop !11
53
54 for.inc21:
55 %add22 = add nuw nsw i64 %k.0, 1
56 br label %for.cond6, !llvm.loop !14
57
58 for.inc24:
59 %add25 = add nuw nsw i64 %j.0, 1
60 br label %for.cond2, !llvm.loop !16
61
62 for.inc27:
63 %add28 = add nuw nsw i64 %i.0, 1
64 br label %for.cond, !llvm.loop !18
65
66 for.end29:
67 ret void
68 }
69
70 ; access groups
71 !7 = distinct !{}
72 !8 = distinct !{}
73 !10 = distinct !{}
74
75 ; access group lists
76 !5 = !{!7, !10}
77 !6 = !{!7, !8, !10}
78
79 ; LoopIDs
80 !11 = distinct !{!11, !{!"llvm.loop.parallel_accesses", !10}}
81 !14 = distinct !{!14, !{!"llvm.loop.parallel_accesses", !8, !10}}
82 !16 = distinct !{!16, !{!"llvm.loop.parallel_accesses", !8}}
83 !18 = distinct !{!18, !{!"llvm.loop.parallel_accesses", !7}}
84
85
86 ; CHECK: Parallel Loop at depth 1
87 ; CHECK-NOT: Parallel
88 ; CHECK: Loop at depth 2
89 ; CHECK: Parallel Loop
90 ; CHECK: Parallel Loop
0 ; RUN: opt -loops -analyze < %s | FileCheck %s
1 ;
2 ; void func(long n, double A[static const restrict n]) {
3 ; for (long i = 0; i < n; i += 1)
4 ; A[i] = 21;
5 ; }
6 ;
7 ; Check that isAnnotatedParallel is working as expected.
8 ;
9 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
10
11 define void @func(i64 %n, double* noalias nonnull %A) {
12 entry:
13 br label %for.cond
14
15 for.cond:
16 %i.0 = phi i64 [ 0, %entry ], [ %add, %for.body ]
17 %cmp = icmp slt i64 %i.0, %n
18 br i1 %cmp, label %for.body, label %for.end
19
20 for.body:
21 %arrayidx = getelementptr inbounds double, double* %A, i64 %i.0
22 store double 2.100000e+01, double* %arrayidx, align 8, !llvm.access.group !6
23 %add = add nuw nsw i64 %i.0, 1
24 br label %for.cond, !llvm.loop !7
25
26 for.end:
27 ret void
28 }
29
30 !6 = distinct !{} ; access group
31
32 !7 = distinct !{!7, !9} ; LoopID
33 !9 = !{!"llvm.loop.parallel_accesses", !6}
34
35
36 ; CHECK: Parallel Loop
99 ; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
1010 ; RUN: -o /dev/null -stats \
1111 ; RUN: 2>&1 | FileCheck %s -check-prefix=LAZY
12 ; LAZY: 55 bitcode-reader - Number of Metadata records loaded
12 ; LAZY: 57 bitcode-reader - Number of Metadata records loaded
1313 ; LAZY: 2 bitcode-reader - Number of MDStrings loaded
1414
1515 ; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
1616 ; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \
1717 ; RUN: 2>&1 | FileCheck %s -check-prefix=NOTLAZY
18 ; NOTLAZY: 64 bitcode-reader - Number of Metadata records loaded
18 ; NOTLAZY: 66 bitcode-reader - Number of Metadata records loaded
1919 ; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded
2020
2121
0 ; RUN: opt -S -inline < %s | FileCheck %s
1 ;
2 ; Check that the !llvm.access.group is still present after inlining.
3 ;
4 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
5
6 define void @Body(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p, i32 %i) {
7 entry:
8 %idxprom = sext i32 %i to i64
9 %arrayidx = getelementptr inbounds i32, i32* %p, i64 %idxprom
10 %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !0
11 %cmp = icmp eq i32 %0, 0
12 %arrayidx2 = getelementptr inbounds i32, i32* %res, i64 %idxprom
13 %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !0
14 br i1 %cmp, label %cond.end, label %cond.false
15
16 cond.false:
17 %arrayidx6 = getelementptr inbounds i32, i32* %d, i64 %idxprom
18 %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !0
19 %add = add nsw i32 %2, %1
20 br label %cond.end
21
22 cond.end:
23 %cond = phi i32 [ %add, %cond.false ], [ %1, %entry ]
24 store i32 %cond, i32* %arrayidx2, align 4
25 ret void
26 }
27
28 define void @Test(i32* %res, i32* %c, i32* %d, i32* %p, i32 %n) {
29 entry:
30 br label %for.cond
31
32 for.cond:
33 %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
34 %cmp = icmp slt i32 %i.0, 1600
35 br i1 %cmp, label %for.body, label %for.end
36
37 for.body:
38 call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.access.group !0
39 %inc = add nsw i32 %i.0, 1
40 br label %for.cond, !llvm.loop !1
41
42 for.end:
43 ret void
44 }
45
46 !0 = distinct !{} ; access group
47 !1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !0}} ; LoopID
48
49
50 ; CHECK-LABEL: @Test
51 ; CHECK: load i32,{{.*}}, !llvm.access.group !0
52 ; CHECK: load i32,{{.*}}, !llvm.access.group !0
53 ; CHECK: load i32,{{.*}}, !llvm.access.group !0
54 ; CHECK: store i32 {{.*}}, !llvm.access.group !0
55 ; CHECK: br label %for.cond, !llvm.loop !1
0 ; RUN: opt -always-inline -globalopt -S < %s | FileCheck %s
1 ;
2 ; static void __attribute__((always_inline)) callee(long n, double A[static const restrict n], long i) {
3 ; for (long j = 0; j < n; j += 1)
4 ; A[i * n + j] = 42;
5 ; }
6 ;
7 ; void caller(long n, double A[static const restrict n]) {
8 ; for (long i = 0; i < n; i += 1)
9 ; callee(n, A, i);
10 ; }
11 ;
12 ; Check that the access groups (llvm.access.group) are correctly merged.
13 ;
14 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
15
16 define internal void @callee(i64 %n, double* noalias nonnull %A, i64 %i) #0 {
17 entry:
18 br label %for.cond
19
20 for.cond:
21 %j.0 = phi i64 [ 0, %entry ], [ %add1, %for.body ]
22 %cmp = icmp slt i64 %j.0, %n
23 br i1 %cmp, label %for.body, label %for.end
24
25 for.body:
26 %mul = mul nsw i64 %i, %n
27 %add = add nsw i64 %mul, %j.0
28 %arrayidx = getelementptr inbounds double, double* %A, i64 %add
29 store double 4.200000e+01, double* %arrayidx, align 8, !llvm.access.group !6
30 %add1 = add nuw nsw i64 %j.0, 1
31 br label %for.cond, !llvm.loop !7
32
33 for.end:
34 ret void
35 }
36
37 attributes #0 = { alwaysinline }
38
39 !6 = distinct !{} ; access group
40 !7 = distinct !{!7, !9} ; LoopID
41 !9 = !{!"llvm.loop.parallel_accesses", !6}
42
43
44 define void @caller(i64 %n, double* noalias nonnull %A) {
45 entry:
46 br label %for.cond
47
48 for.cond:
49 %i.0 = phi i64 [ 0, %entry ], [ %add, %for.body ]
50 %cmp = icmp slt i64 %i.0, %n
51 br i1 %cmp, label %for.body, label %for.end
52
53 for.body:
54 call void @callee(i64 %n, double* %A, i64 %i.0), !llvm.access.group !10
55 %add = add nuw nsw i64 %i.0, 1
56 br label %for.cond, !llvm.loop !11
57
58 for.end:
59 ret void
60 }
61
62 !10 = distinct !{} ; access group
63 !11 = distinct !{!11, !12} ; LoopID
64 !12 = !{!"llvm.loop.parallel_accesses", !10}
65
66
67 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group ![[ACCESS_GROUP_LIST_3:[0-9]+]]
68 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
69 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
70
71 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
72 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
73 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
74 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], ![[ACCESSES_INNER:[0-9]+]]}
75 ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
76 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], ![[ACCESSES_OUTER:[0-9]+]]}
77 ; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
3636 br i1 %cmp, label %for.body, label %for.end
3737
3838 for.body: ; preds = %for.cond
39 call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.mem.parallel_loop_access !0
39 call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.access.group !0
4040 %inc = add nsw i32 %i.0, 1
41 br label %for.cond, !llvm.loop !0
41 br label %for.cond, !llvm.loop !1
4242
4343 for.end: ; preds = %for.cond
4444 ret void
4545 }
4646
4747 ; CHECK-LABEL: @Test
48 ; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
49 ; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
50 ; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
51 ; CHECK: store i32{{.*}}, !llvm.mem.parallel_loop_access !0
52 ; CHECK: br label %for.cond, !llvm.loop !0
48 ; CHECK: load i32,{{.*}}, !llvm.access.group !0
49 ; CHECK: load i32,{{.*}}, !llvm.access.group !0
50 ; CHECK: load i32,{{.*}}, !llvm.access.group !0
51 ; CHECK: store i32{{.*}}, !llvm.access.group !0
52 ; CHECK: br label %for.cond, !llvm.loop !1
5353
5454 attributes #0 = { norecurse nounwind uwtable }
5555
56 !0 = distinct !{!0}
57
56 !0 = distinct !{}
57 !1 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !0}}
0 ; RUN: opt -instcombine -S < %s | FileCheck %s
1 ;
2 ; void func(long n, double A[static const restrict n]) {
3 ; for (int i = 0; i < n; i+=1)
4 ; for (int j = 0; j < n;j+=1)
5 ; for (int k = 0; k < n; k += 1)
6 ; for (int l = 0; l < n; l += 1) {
7 ; double *p = &A[i + j + k + l];
8 ; double x = *p;
9 ; double y = *p;
10 ; arg(x + y);
11 ; }
12 ; }
13 ;
14 ; Check for correctly merging access group metadata for instcombine
15 ; (only common loops are parallel == intersection)
16 ; Note that combined load would be parallel to loop !16 since both
17 ; origin loads are parallel to it, but it references two access groups
18 ; (!8 and !9), neither of which contain both loads. As such, the
19 ; information that the combined load is parallel to !16 is lost.
20 ;
21 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
22
23 declare void @arg(double)
24
25 define void @func(i64 %n, double* noalias nonnull %A) {
26 entry:
27 br label %for.cond
28
29 for.cond:
30 %i.0 = phi i32 [ 0, %entry ], [ %add31, %for.inc30 ]
31 %conv = sext i32 %i.0 to i64
32 %cmp = icmp slt i64 %conv, %n
33 br i1 %cmp, label %for.cond2, label %for.end32
34
35 for.cond2:
36 %j.0 = phi i32 [ %add28, %for.inc27 ], [ 0, %for.cond ]
37 %conv3 = sext i32 %j.0 to i64
38 %cmp4 = icmp slt i64 %conv3, %n
39 br i1 %cmp4, label %for.cond8, label %for.inc30
40
41 for.cond8:
42 %k.0 = phi i32 [ %add25, %for.inc24 ], [ 0, %for.cond2 ]
43 %conv9 = sext i32 %k.0 to i64
44 %cmp10 = icmp slt i64 %conv9, %n
45 br i1 %cmp10, label %for.cond14, label %for.inc27
46
47 for.cond14:
48 %l.0 = phi i32 [ %add23, %for.body19 ], [ 0, %for.cond8 ]
49 %conv15 = sext i32 %l.0 to i64
50 %cmp16 = icmp slt i64 %conv15, %n
51 br i1 %cmp16, label %for.body19, label %for.inc24
52
53 for.body19:
54 %add = add nsw i32 %i.0, %j.0
55 %add20 = add nsw i32 %add, %k.0
56 %add21 = add nsw i32 %add20, %l.0
57 %idxprom = sext i32 %add21 to i64
58 %arrayidx = getelementptr inbounds double, double* %A, i64 %idxprom
59 %0 = load double, double* %arrayidx, align 8, !llvm.access.group !1
60 %1 = load double, double* %arrayidx, align 8, !llvm.access.group !2
61 %add22 = fadd double %0, %1
62 call void @arg(double %add22), !llvm.access.group !3
63 %add23 = add nsw i32 %l.0, 1
64 br label %for.cond14, !llvm.loop !11
65
66 for.inc24:
67 %add25 = add nsw i32 %k.0, 1
68 br label %for.cond8, !llvm.loop !14
69
70 for.inc27:
71 %add28 = add nsw i32 %j.0, 1
72 br label %for.cond2, !llvm.loop !16
73
74 for.inc30:
75 %add31 = add nsw i32 %i.0, 1
76 br label %for.cond, !llvm.loop !18
77
78 for.end32:
79 ret void
80 }
81
82
83 ; access groups
84 !7 = distinct !{}
85 !8 = distinct !{}
86 !9 = distinct !{}
87
88 ; access group lists
89 !1 = !{!7, !9}
90 !2 = !{!7, !8}
91 !3 = !{!7, !8, !9}
92
93 !11 = distinct !{!11, !13}
94 !13 = !{!"llvm.loop.parallel_accesses", !7}
95
96 !14 = distinct !{!14, !15}
97 !15 = !{!"llvm.loop.parallel_accesses", !8}
98
99 !16 = distinct !{!16, !17}
100 !17 = !{!"llvm.loop.parallel_accesses", !8, !9}
101
102 !18 = distinct !{!18, !19}
103 !19 = !{!"llvm.loop.parallel_accesses", !9}
104
105
106 ; CHECK: load double, {{.*}} !llvm.access.group ![[ACCESSGROUP_0:[0-9]+]]
107 ; CHECK: br label %for.cond14, !llvm.loop ![[LOOP_4:[0-9]+]]
108
109 ; CHECK: ![[ACCESSGROUP_0]] = distinct !{}
110
111 ; CHECK: ![[LOOP_4]] = distinct !{![[LOOP_4]], ![[PARALLEL_ACCESSES_5:[0-9]+]]}
112 ; CHECK: ![[PARALLEL_ACCESSES_5]] = !{!"llvm.loop.parallel_accesses", ![[ACCESSGROUP_0]]}
3838 define i32 @test_load_cast_combine_invariant(float* %ptr) {
3939 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves invariant metadata.
4040 ; CHECK-LABEL: @test_load_cast_combine_invariant(
41 ; CHECK: load i32, i32* %{{.*}}, !invariant.load !5
41 ; CHECK: load i32, i32* %{{.*}}, !invariant.load !7
4242 entry:
4343 %l = load float, float* %ptr, !invariant.load !6
4444 %c = bitcast float %l to i32
4949 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves nontemporal
5050 ; metadata.
5151 ; CHECK-LABEL: @test_load_cast_combine_nontemporal(
52 ; CHECK: load i32, i32* %{{.*}}, !nontemporal !6
52 ; CHECK: load i32, i32* %{{.*}}, !nontemporal !8
5353 entry:
5454 %l = load float, float* %ptr, !nontemporal !7
5555 %c = bitcast float %l to i32
6060 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves align
6161 ; metadata.
6262 ; CHECK-LABEL: @test_load_cast_combine_align(
63 ; CHECK: load i8*, i8** %{{.*}}, !align !7
63 ; CHECK: load i8*, i8** %{{.*}}, !align !9
6464 entry:
6565 %l = load i32*, i32** %ptr, !align !8
6666 %c = bitcast i32* %l to i8*
7171 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves dereferenceable
7272 ; metadata.
7373 ; CHECK-LABEL: @test_load_cast_combine_deref(
74 ; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !7
74 ; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !9
7575 entry:
7676 %l = load i32*, i32** %ptr, !dereferenceable !8
7777 %c = bitcast i32* %l to i8*
8282 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves
8383 ; dereferenceable_or_null metadata.
8484 ; CHECK-LABEL: @test_load_cast_combine_deref_or_null(
85 ; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !7
85 ; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !9
8686 entry:
8787 %l = load i32*, i32** %ptr, !dereferenceable_or_null !8
8888 %c = bitcast i32* %l to i8*
9393 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves loop access
9494 ; metadata.
9595 ; CHECK-LABEL: @test_load_cast_combine_loop(
96 ; CHECK: load i32, i32* %{{.*}}, !llvm.mem.parallel_loop_access !4
96 ; CHECK: load i32, i32* %{{.*}}, !llvm.access.group !6
9797 entry:
9898 br label %loop
9999
101101 %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
102102 %src.gep = getelementptr inbounds float, float* %src, i32 %i
103103 %dst.gep = getelementptr inbounds i32, i32* %dst, i32 %i
104 %l = load float, float* %src.gep, !llvm.mem.parallel_loop_access !4
104 %l = load float, float* %src.gep, !llvm.access.group !9
105105 %c = bitcast float %l to i32
106106 store i32 %c, i32* %dst.gep
107107 %i.next = add i32 %i, 1
141141 !1 = !{!"scalar type", !2}
142142 !2 = !{!"root"}
143143 !3 = distinct !{!3, !4}
144 !4 = distinct !{!4}
144 !4 = distinct !{!4, !{!"llvm.loop.parallel_accesses", !9}}
145145 !5 = !{i32 0, i32 42}
146146 !6 = !{}
147147 !7 = !{i32 1}
148148 !8 = !{i64 8}
149 !9 = distinct !{}
0 ; RUN: opt < %s -instcombine -S | FileCheck %s
11 ;
2 ; Make sure the llvm.mem.parallel_loop_access meta-data is preserved
2 ; Make sure the llvm.access.group meta-data is preserved
33 ; when a memcpy is replaced with a load+store by instcombine
44 ;
55 ; #include
1212 ; }
1313
1414 ; CHECK: for.body:
15 ; CHECK: %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1
16 ; CHECK: store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1
15 ; CHECK: %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.access.group !1
16 ; CHECK: store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.access.group !1
1717
1818
1919 ; ModuleID = ''
3535 %arrayidx = getelementptr inbounds i8, i8* %out, i64 %i.0
3636 %add = add nsw i64 %i.0, %size
3737 %arrayidx1 = getelementptr inbounds i8, i8* %out, i64 %add
38 call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.mem.parallel_loop_access !1
38 call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.access.group !4
3939 br label %for.inc
4040
4141 for.inc: ; preds = %for.body
5555 !llvm.ident = !{!0}
5656
5757 !0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
58 !1 = distinct !{!1, !2, !3}
58 !1 = distinct !{!1, !2, !3, !{!"llvm.loop.parallel_accesses", !4}}
5959 !2 = distinct !{!2, !3}
6060 !3 = !{!"llvm.loop.vectorize.enable", i1 true}
61 !4 = distinct !{} ; access group
1212 for.body: ; preds = %cond.end, %entry
1313 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
1414 %arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
15 %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
15 %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !1
1616 %cmp1 = icmp eq i32 %0, 0
1717 %arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
18 %1 = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
18 %1 = load i32, i32* %arrayidx3, align 4, !llvm.access.group !1
1919 br i1 %cmp1, label %cond.end, label %cond.false
2020
2121 cond.false: ; preds = %for.body
2222 %arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
23 %2 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0
23 %2 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !1
2424 %add = add nsw i32 %2, %1
2525 br label %cond.end
2626
2727 cond.end: ; preds = %for.body, %cond.false
2828 %cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]
29 store i32 %cond, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
29 store i32 %cond, i32* %arrayidx3, align 4, !llvm.access.group !1
3030 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
3131 %exitcond = icmp eq i64 %indvars.iv.next, 16
3232 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
3737
3838 attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }
3939
40 !0 = distinct !{!0}
40 !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
41 !1 = distinct !{}
1818 for.body: ; preds = %for.body.for.body_crit_edge, %entry
1919 %indvars.iv.reload = load i64, i64* %indvars.iv.reg2mem
2020 %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.reload
21 %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
21 %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !4
2222 %arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.reload
23 %1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
23 %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !4
2424 %idxprom3 = sext i32 %1 to i64
2525 %arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
26 store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !3
26 store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !4
2727 %indvars.iv.next = add i64 %indvars.iv.reload, 1
2828 ; A new store without the parallel metadata here:
2929 store i64 %indvars.iv.next, i64* %indvars.iv.next.reg2mem
3030 %indvars.iv.next.reload1 = load i64, i64* %indvars.iv.next.reg2mem
3131 %arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next.reload1
32 %2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3
33 store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
32 %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !4
33 store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !4
3434 %indvars.iv.next.reload = load i64, i64* %indvars.iv.next.reg2mem
3535 %lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
3636 %exitcond = icmp eq i32 %lftr.wideiv, 512
4545 ret void
4646 }
4747
48 !3 = !{!3}
48 !3 = !{!3, !{!"llvm.loop.parallel_accesses", !4}}
49 !4 = distinct !{}
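
The hunk above keeps its original purpose: the reg2mem spill of the induction variable carries no parallel annotation, so the loop must not be treated as parallel even though its LoopID now lists group !4. In general, the guarantee only applies when every memory access in the loop body belongs to a group named by llvm.loop.parallel_accesses. A small sketch of the blocking case (names are illustrative, not part of the patch):

  define void @not_parallel(i32* %a, i32* %q) {
  entry:
    br label %for.body

  for.body:
    %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
    %p = getelementptr inbounds i32, i32* %a, i64 %i
    %v = load i32, i32* %p, !llvm.access.group !1   ; covered: group !1 is declared parallel
    store i32 %v, i32* %q                           ; not covered: no access group attached,
                                                    ; so the loop may not be assumed parallel
    %i.next = add nuw nsw i64 %i, 1
    %cmp = icmp eq i64 %i.next, 512
    br i1 %cmp, label %exit, label %for.body, !llvm.loop !0

  exit:
    ret void
  }

  !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
  !1 = distinct !{}
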
5050 for.body: ; preds = %for.body, %entry
5151 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
5252 %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
53 %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
53 %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !13
5454 %arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
55 %1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
55 %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !13
5656 %idxprom3 = sext i32 %1 to i64
5757 %arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
5858 ; This store might have originated from inlining a function with a parallel
5959 ; loop. It refers to a list that also includes the "original loop reference" (!4).
60 store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !5
60 store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !15
6161 %indvars.iv.next = add i64 %indvars.iv, 1
6262 %arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
63 %2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3
64 store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
63 %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !13
64 store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !13
6565 %lftr.wideiv = trunc i64 %indvars.iv.next to i32
6666 %exitcond = icmp eq i32 %lftr.wideiv, 512
6767 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3
8383 for.body: ; preds = %for.body, %entry
8484 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
8585 %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
86 %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !6
86 %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !16
8787 %arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
88 %1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
88 %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !16
8989 %idxprom3 = sext i32 %1 to i64
9090 %arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
9191 ; This refers to the loop marked with !7, which is not the loop we are currently in.
9292 ; It should prevent the loop from being detected as parallel.
93 store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !7
93 store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !17
9494 %indvars.iv.next = add i64 %indvars.iv, 1
9595 %arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
96 %2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !6
97 store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
96 %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !16
97 store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !16
9898 %lftr.wideiv = trunc i64 %indvars.iv.next to i32
9999 %exitcond = icmp eq i32 %lftr.wideiv, 512
100100 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6
103103 ret void
104104 }
105105
106 !3 = !{!3}
107 !4 = !{!4}
108 !5 = !{!3, !4}
109 !6 = !{!6}
110 !7 = !{!7}
106 !3 = !{!3, !{!"llvm.loop.parallel_accesses", !13, !15}}
107 !4 = !{!4, !{!"llvm.loop.parallel_accesses", !14, !15}}
108 !6 = !{!6, !{!"llvm.loop.parallel_accesses", !16}}
109 !7 = !{!7, !{!"llvm.loop.parallel_accesses", !17}}
110 !13 = distinct !{}
111 !14 = distinct !{}
112 !15 = distinct !{}
113 !16 = distinct !{}
114 !17 = distinct !{}
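
The metadata block above replaces the deleted list-of-LoopIDs node (!5 = !{!3, !4}) with plain access groups: !15 is declared parallel by both loop !3 and the "original" loop !4, while !17 is only declared parallel by loop !7, so loop !6 still cannot be treated as parallel with respect to that store. When one instruction has to be a member of several groups at once, the attachment can point to a list of groups rather than a single one; roughly (an illustrative fragment, numbering not taken from the patch):

  store i32 %v, i32* %p, !llvm.access.group !3   ; member of both groups

  !1 = distinct !{}   ; access group of the inlined (inner) loop
  !2 = distinct !{}   ; access group of the enclosing loop
  !3 = !{!1, !2}      ; a list of access groups: the store is in !1 and !2
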
1717 for.body:
1818 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
1919 %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
20 %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
20 %0 = load float, float* %arrayidx, align 4, !llvm.access.group !5
2121 %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
22 %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
22 %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !5
2323 %add = fadd fast float %0, %1
24 store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
24 store float %add, float* %arrayidx2, align 4, !llvm.access.group !5
2525 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
2626 %exitcond = icmp eq i64 %indvars.iv.next, 8
2727 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4
3030 ret void
3131 }
3232
33 !3 = !{!3}
33 !3 = !{!3, !{!"llvm.loop.parallel_accesses", !5}}
3434 !4 = !{!4}
35 !5 = distinct !{}
3131 for.body:
3232 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
3333 %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
34 %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1
34 %0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
3535 %call = tail call float @llvm.sin.f32(float %0)
3636 %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
37 store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1
37 store float %call, float* %arrayidx2, align 4, !llvm.access.group !11
3838 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
3939 %lftr.wideiv = trunc i64 %indvars.iv.next to i32
4040 %exitcond = icmp eq i32 %lftr.wideiv, 1000
4747 ret void
4848 }
4949
50 !1 = !{!1, !2}
50 !1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
5151 !2 = !{!"llvm.loop.vectorize.enable", i1 true}
52 !11 = distinct !{}
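
Note how llvm.loop.parallel_accesses in !1 above simply sits next to llvm.loop.vectorize.enable. Because the accesses point at group !11 rather than back at the LoopID, a later pass can extend or rebuild the LoopID, for example to record that the loop has already been vectorized, without losing the parallelism information. A plausible post-vectorization shape (illustrative, not output produced by this test):

  store float %call, float* %arrayidx2, align 4, !llvm.access.group !11   ; unchanged
  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !20        ; rebuilt LoopID
  !20 = distinct !{!20, !{!"llvm.loop.isvectorized", i32 1}, !{!"llvm.loop.parallel_accesses", !11}}
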
5253
5354 ;
5455 ; This method will not be vectorized, as the scalar cost is lower than any of the vector costs.
6162 for.body:
6263 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
6364 %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
64 %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
65 %0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
6566 %call = tail call float @llvm.sin.f32(float %0)
6667 %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
67 store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
68 store float %call, float* %arrayidx2, align 4, !llvm.access.group !13
6869 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
6970 %lftr.wideiv = trunc i64 %indvars.iv.next to i32
7071 %exitcond = icmp eq i32 %lftr.wideiv, 1000
8081 declare float @llvm.sin.f32(float) nounwind readnone
8182
8283 ; Dummy metadata
83 !3 = !{!3}
84 !3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
85 !13 = distinct !{}
8486
4040 ; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4
4141 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
4242 ; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
43 ; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
43 ; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !1
4444 ; CHECK: middle.block:
4545 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 20, 16
4646 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
5050 ; CHECK: for.body:
5151 ; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
5252 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]
53 ; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.mem.parallel_loop_access !3
53 ; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.access.group !0
5454 ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
55 ; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !3
55 ; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.access.group !0
5656 ; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]
57 ; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !3
57 ; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.access.group !0
5858 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
5959 ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20
60 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !4
60 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !5
6161 ; CHECK: for.end:
6262 ; CHECK-NEXT: ret void
6363 ;
6767 for.body:
6868 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
6969 %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
70 %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1
70 %0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
7171 %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
72 %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1
72 %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !11
7373 %add = fadd fast float %0, %1
74 store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1
74 store float %add, float* %arrayidx2, align 4, !llvm.access.group !11
7575 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
7676 %exitcond = icmp eq i64 %indvars.iv.next, 20
7777 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1
8080 ret void
8181 }
8282
83 !1 = !{!1, !2}
83 !1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
8484 !2 = !{!"llvm.loop.vectorize.enable", i1 true}
85 !11 = distinct !{}
8586
8687 ;
8788 ; This loop will be vectorized as the trip count is below the threshold but no
113114 ; CHECK-NEXT: call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> [[TMP7]], <8 x float>* [[TMP9]], i32 4, <8 x i1> [[TMP8]])
114115 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
115116 ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24
116 ; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6
117 ; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !8
117118 ; CHECK: middle.block:
118119 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
119120 ; CHECK: scalar.ph:
126127 for.body:
127128 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
128129 %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
129 %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
130 %0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
130131 %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
131 %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
132 %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
132133 %add = fadd fast float %0, %1
133 store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
134 store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
134135 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
135136 %exitcond = icmp eq i64 %indvars.iv.next, 20
136137 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3
139140 ret void
140141 }
141142
142 !3 = !{!3}
143 !3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
144 !13 = distinct !{}
143145
144146 ;
145147 ; This loop will be vectorized as the trip count is below the threshold but no
170172 ; CHECK-NEXT: store <8 x float> [[TMP7]], <8 x float>* [[TMP8]], align 4
171173 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
172174 ; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
173 ; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !9
175 ; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !11
174176 ; CHECK: middle.block:
175177 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 16, 16
176178 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
180182 ; CHECK: for.body:
181183 ; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
182184 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[INDVARS_IV]]
183 ; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.mem.parallel_loop_access !7
185 ; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX]], align 4, !llvm.access.group !7
184186 ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
185 ; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !7
187 ; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[ARRAYIDX2]], align 4, !llvm.access.group !7
186188 ; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP10]], [[TMP11]]
187 ; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.mem.parallel_loop_access !7
189 ; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4, !llvm.access.group !7
188190 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
189191 ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 16
190 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !10
192 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !12
191193 ; CHECK: for.end:
192194 ; CHECK-NEXT: ret void
193195 ;
197199 for.body:
198200 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
199201 %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
200 %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
202 %0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
201203 %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
202 %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
204 %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
203205 %add = fadd fast float %0, %1
204 store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
206 store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
205207 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
206208 %exitcond = icmp eq i64 %indvars.iv.next, 16
207209 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4
5757 for.body:
5858 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
5959 %arrayidx = getelementptr inbounds i8, i8* %B, i64 %indvars.iv
60 %l1 = load i8, i8* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
60 %l1 = load i8, i8* %arrayidx, align 4, !llvm.access.group !13
6161 %arrayidx2 = getelementptr inbounds i8, i8* %A, i64 %indvars.iv
62 %l2 = load i8, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
62 %l2 = load i8, i8* %arrayidx2, align 4, !llvm.access.group !13
6363 %add = add i8 %l1, %l2
64 store i8 %add, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
64 store i8 %add, i8* %arrayidx2, align 4, !llvm.access.group !13
6565 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
6666 %exitcond = icmp eq i64 %indvars.iv.next, 16
6767 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4
6969 for.end:
7070 ret void
7171 }
72 !3 = !{!3}
72 !3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
7373 !4 = !{!4}
74 !13 = distinct !{}
0 ; RUN: opt < %s -sroa -S | FileCheck %s
11 ;
2 ; Make sure the llvm.mem.parallel_loop_access meta-data is preserved
2 ; Make sure the llvm.access.group metadata is preserved
33 ; when a load/store is replaced with another load/store by sroa
44 ;
55 ; class Complex {
3232
3333 ; CHECK: for.body:
3434 ; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
35 ; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1
35 ; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
3636 ; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
37 ; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1
37 ; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
3838 ; CHECK-NOT: store i32 %{{.*}}, i32* %{{.*}}, align 4
3939 ; CHECK: br label
4040
6262 %arrayidx = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
6363 %real_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
6464 %real_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 0
65 %0 = load float, float* %real_.i.i, align 4, !llvm.mem.parallel_loop_access !1
66 store float %0, float* %real_.i, align 4, !llvm.mem.parallel_loop_access !1
65 %0 = load float, float* %real_.i.i, align 4, !llvm.access.group !11
66 store float %0, float* %real_.i, align 4, !llvm.access.group !11
6767 %imaginary_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
6868 %imaginary_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 1
69 %1 = load float, float* %imaginary_.i.i, align 4, !llvm.mem.parallel_loop_access !1
70 store float %1, float* %imaginary_.i, align 4, !llvm.mem.parallel_loop_access !1
69 %1 = load float, float* %imaginary_.i.i, align 4, !llvm.access.group !11
70 store float %1, float* %imaginary_.i, align 4, !llvm.access.group !11
7171 %arrayidx1 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
7272 %real_.i1 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
73 %2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
73 %2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.access.group !11
7474 %real_2.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
75 %3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
75 %3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.access.group !11
7676 %add.i = fadd float %2, %3
7777 %imaginary_.i2 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
78 %4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
78 %4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.access.group !11
7979 %imaginary_3.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
80 %5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
80 %5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.access.group !11
8181 %add4.i = fadd float %4, %5
8282 %real_.i.i3 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 0
83 store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1
83 store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.access.group !11
8484 %imaginary_.i.i4 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 1
85 store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1
85 store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.access.group !11
8686 %6 = bitcast %class.Complex* %arrayidx1 to i64*
87 %7 = load i64, i64* %ref.tmp, align 8, !llvm.mem.parallel_loop_access !1
88 store i64 %7, i64* %6, align 4, !llvm.mem.parallel_loop_access !1
87 %7 = load i64, i64* %ref.tmp, align 8, !llvm.access.group !11
88 store i64 %7, i64* %6, align 4, !llvm.access.group !11
8989 %inc = add nsw i64 %offset.0, 1
9090 br label %for.cond, !llvm.loop !1
9191
102102 !llvm.ident = !{!0}
103103
104104 !0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
105 !1 = distinct !{!1, !2}
105 !1 = distinct !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
106106 !2 = !{!"llvm.loop.vectorize.enable", i1 true}
107107 !3 = !{!4}
108108 !4 = distinct !{!4, !5, !"_ZNK7ComplexplERKS_: %agg.result"}
109109 !5 = distinct !{!5, !"_ZNK7ComplexplERKS_"}
110 !11 = distinct !{}
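
The SROA test above changes only which attachment it expects to survive: when the pass splits the Complex temporaries and replaces one load or store with another, the replacement instruction has to keep the original !llvm.access.group (now !11) so that the unchanged llvm.loop.parallel_accesses entry in !1 still covers it. Schematically (an illustrative before/after, not the test's actual IR):

  ; before SROA: the temporary lives in memory, all accesses are in group !11
  %re = load float, float* %tmp.re, !llvm.access.group !11
  store float %re, float* %out.re, !llvm.access.group !11

  ; after SROA: the temporary is rewritten, but every surviving access
  ; must still carry the same group
  store float %re.val, float* %out.re, !llvm.access.group !11
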
205205 ret void
206206 }
207207
208 ; Check that llvm.mem.parallel_loop_access information is preserved.
208 ; Check that llvm.access.group information is preserved.
209209 define void @f5(i32 %count, <4 x i32> *%src, <4 x i32> *%dst) {
210210 ; CHECK-LABEL: @f5(
211 ; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG:[0-9]*]]
212 ; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
213 ; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]
214 ; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
215 ; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG]]
216 ; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
217 ; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]
218 ; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
211 ; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.access.group ![[TAG:[0-9]*]]
212 ; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.access.group ![[TAG]]
213 ; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.access.group ![[TAG]]
214 ; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.access.group ![[TAG]]
215 ; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.access.group ![[TAG]]
216 ; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.access.group ![[TAG]]
217 ; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.access.group ![[TAG]]
218 ; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.access.group ![[TAG]]
219219 ; CHECK: ret void
220220 entry:
221221 br label %loop
224224 %index = phi i32 [ 0, %entry ], [ %next_index, %loop ]
225225 %this_src = getelementptr <4 x i32>, <4 x i32> *%src, i32 %index
226226 %this_dst = getelementptr <4 x i32>, <4 x i32> *%dst, i32 %index
227 %val = load <4 x i32> , <4 x i32> *%this_src, !llvm.mem.parallel_loop_access !3
227 %val = load <4 x i32> , <4 x i32> *%this_src, !llvm.access.group !13
228228 %add = add <4 x i32> %val, %val
229 store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.mem.parallel_loop_access !3
229 store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.access.group !13
230230 %next_index = add i32 %index, -1
231231 %continue = icmp ne i32 %next_index, %count
232232 br i1 %continue, label %loop, label %end, !llvm.loop !3
446446 !0 = !{ !"root" }
447447 !1 = !{ !"set1", !0 }
448448 !2 = !{ !"set2", !0 }
449 !3 = !{ !3 }
449 !3 = !{ !3, !{!"llvm.loop.parallel_accesses", !13} }
450450 !4 = !{ float 4.0 }
451451 !5 = !{ i64 0, i64 8, null }
452 !13 = distinct !{}
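
For the scalarizer the interesting property is that one access group can cover any number of instructions: the single <4 x i32> load and store in group !13 become eight scalar accesses, and the CHECK lines above require each of them to carry the same group (captured as ![[TAG]]). No per-access IDs are needed. Schematically (illustrative, mirroring the CHECK lines):

  ; one vector access in group !13 ...
  %val = load <4 x i32>, <4 x i32>* %this_src, !llvm.access.group !13
  ; ... becomes four scalar loads (and likewise four stores), all still in group !13
  %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.access.group !13
  %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.access.group !13
  %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.access.group !13
  %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.access.group !13
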
77 br label %for.body
88
99 ; CHECK-LABEL: @Test
10 ; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0
11 ; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0
12 ; CHECK: store i32 {{.*}}, align 4, !llvm.mem.parallel_loop_access !0
10 ; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
11 ; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
12 ; CHECK: store i32 {{.*}}, align 4, !llvm.access.group !0
1313 ; CHECK-NOT: load
1414 ; CHECK-NOT: store
1515
1616 for.body: ; preds = %cond.end, %entry
1717 %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
1818 %arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
19 %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
19 %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !0
2020 %cmp1 = icmp eq i32 %0, 0
2121 br i1 %cmp1, label %cond.true, label %cond.false
2222
2323 cond.false: ; preds = %for.body
2424 %arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
25 %v = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
25 %v = load i32, i32* %arrayidx3, align 4, !llvm.access.group !0
2626 %arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
27 %1 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0
27 %1 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !0
2828 %add = add nsw i32 %1, %v
2929 br label %cond.end
3030
3131 cond.true: ; preds = %for.body
3232 %arrayidx4 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
33 %w = load i32, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
33 %w = load i32, i32* %arrayidx4, align 4, !llvm.access.group !0
3434 %arrayidx8 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
35 %2 = load i32, i32* %arrayidx8, align 4, !llvm.mem.parallel_loop_access !0
35 %2 = load i32, i32* %arrayidx8, align 4, !llvm.access.group !0
3636 %add2 = add nsw i32 %2, %w
3737 br label %cond.end
3838
3939 cond.end: ; preds = %for.body, %cond.false
4040 %cond = phi i32 [ %add, %cond.false ], [ %add2, %cond.true ]
4141 %arrayidx9 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
42 store i32 %cond, i32* %arrayidx9, align 4, !llvm.mem.parallel_loop_access !0
42 store i32 %cond, i32* %arrayidx9, align 4, !llvm.access.group !0
4343 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
4444 %exitcond = icmp eq i64 %indvars.iv.next, 16
4545 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
5050
5151 attributes #0 = { norecurse nounwind uwtable }
5252
53 !0 = distinct !{!0, !1}
53 !0 = distinct !{!0, !1, !{!"llvm.loop.parallel_accesses", !10}}
5454 !1 = !{!"llvm.loop.vectorize.enable", i1 true}
55 !10 = distinct !{}