llvm.org GIT mirror llvm / fb88888
[FileCheck] Re-implement the logic to find each check prefix in the check file to not be unreasonably slow in the face of multiple check prefixes.

The previous logic would repeatedly scan potentially large portions of the check file looking for alternative prefixes. In the worst case this would scan most of the file looking for a rare prefix between every single occurrence of a common prefix. Even if we bounded the scan, this would do bad things if the order of the prefixes was "unlucky" and the distant prefix was scanned for first.

None of this is necessary. It is straightforward to build a state machine that recognizes the first, longest of the set of alternative prefixes. That is in fact exactly what a regular expression does.

This patch builds a regular expression once for the set of prefixes and then uses it to search incrementally for the next prefix. This requires some threading of state but actually makes the code dramatically simpler. I've also added a big comment describing the algorithm as it was not at all obvious to me when I started.

With this patch, several previously pathological test cases in test/CodeGen/X86 are 5x and more faster. Overall, running all tests under test/CodeGen/X86 uses 10% less CPU after this, and because all the slowest tests were hitting this, finishes in 40% less wall time on my system (going from just over 5.38s to just over 3.23s) on a release build! This patch substantially improves the time of all 7 X86 tests that were in the top 20 reported by --time-tests: 5 of them are completely off the list and the remaining 2 are much lower. (Sadly, the new tests on the list include 2 new X86 ones that are slow for unrelated reasons, so the count stays at 4 of the top 20.)

It isn't clear how much this helps debug builds in aggregate, in part because of the noise, but it again makes many of the slowest x86 tests significantly faster (10% or more improvement).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289382 91177308-0d34-0410-b5e6-96231b3b80d8
Chandler Carruth 3 years ago
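
The idea in the commit message (build one alternation for all prefixes once, then reuse it to find each next prefix) can be illustrated outside LLVM. The following is only a sketch using `std::regex`, not the code from this patch. The names `buildPrefixRegex`, `findAllPrefixes` and `PrefixHit` are hypothetical. One deliberate difference: LLVM's regex library picks the longest match, while `std::regex` in its default grammar tries alternatives in order, so the sketch sorts the prefixes longest first to get the same result. It also assumes the prefixes contain only letters, digits, hyphens and underscores, so they can be concatenated without escaping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: join all prefixes into one alternation, built once.
// Longest prefixes go first so ordered alternation prefers "AA" over "A".
std::regex buildPrefixRegex(std::vector<std::string> Prefixes) {
  if (Prefixes.empty())
    Prefixes.push_back("CHECK"); // same default the patch falls back to
  std::stable_sort(Prefixes.begin(), Prefixes.end(),
                   [](const std::string &A, const std::string &B) {
                     return A.size() > B.size();
                   });
  std::string Pattern;
  for (const std::string &P : Prefixes) {
    if (!Pattern.empty())
      Pattern += '|';
    Pattern += P; // assumed to contain no regex metacharacters
  }
  return std::regex(Pattern);
}

// Hypothetical record of one prefix occurrence.
struct PrefixHit {
  std::size_t Pos;
  std::string Prefix;
};

// Walk the text once, resuming each search where the previous match ended,
// instead of rescanning the text separately for every prefix.
std::vector<PrefixHit> findAllPrefixes(const std::string &Text,
                                       const std::regex &RE) {
  std::vector<PrefixHit> Hits;
  std::smatch M;
  auto It = Text.cbegin();
  while (std::regex_search(It, Text.cend(), M, RE)) {
    assert(M.length(0) > 0 && "prefix alternation must not match empty text");
    Hits.push_back({static_cast<std::size_t>(M[0].first - Text.cbegin()),
                    M.str(0)});
    It = M[0].second;
  }
  return Hits;
}
```

Each call to `regex_search` starts where the previous match ended, so the text is traversed once overall regardless of how many prefixes there are or in what order they were given.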
1 changed file(s) with 98 addition(s) and 97 deletion(s).
@@ -743,93 +743,68 @@
   return Loc;
 }
 
-// Try to find the first match in buffer for any prefix. If a valid match is
-// found, return that prefix and set its type and location. If there are almost
-// matches (e.g. the actual prefix string is found, but is not an actual check
-// string), but no valid match, return an empty string and set the position to
-// resume searching from. If no partial matches are found, return an empty
-// string and the location will be StringRef::npos. If one prefix is a substring
-// of another, the maximal match should be found. e.g. if "A" and "AA" are
-// prefixes then AA-CHECK: should match the second one.
-static StringRef FindFirstCandidateMatch(StringRef &Buffer,
-                                         Check::CheckType &CheckTy,
-                                         size_t &CheckLoc) {
-  StringRef FirstPrefix;
-  size_t FirstLoc = StringRef::npos;
-  size_t SearchLoc = StringRef::npos;
-  Check::CheckType FirstTy = Check::CheckNone;
-
-  CheckTy = Check::CheckNone;
-  CheckLoc = StringRef::npos;
-
-  for (StringRef Prefix : CheckPrefixes) {
-    size_t PrefixLoc = Buffer.find(Prefix);
-
-    if (PrefixLoc == StringRef::npos)
-      continue;
-
-    // Track where we are searching for invalid prefixes that look almost right.
-    // We need to only advance to the first partial match on the next attempt
-    // since a partial match could be a substring of a later, valid prefix.
-    // Need to skip to the end of the word, otherwise we could end up
-    // matching a prefix in a substring later.
-    if (PrefixLoc < SearchLoc)
-      SearchLoc = SkipWord(Buffer, PrefixLoc);
-
-    // We only want to find the first match to avoid skipping some.
-    if (PrefixLoc > FirstLoc)
-      continue;
-    // If one matching check-prefix is a prefix of another, choose the
-    // longer one.
-    if (PrefixLoc == FirstLoc && Prefix.size() < FirstPrefix.size())
-      continue;
-
-    StringRef Rest = Buffer.drop_front(PrefixLoc);
-    // Make sure we have actually found the prefix, and not a word containing
-    // it. This should also prevent matching the wrong prefix when one is a
-    // substring of another.
-    if (PrefixLoc != 0 && IsPartOfWord(Buffer[PrefixLoc - 1]))
-      FirstTy = Check::CheckNone;
-    else
-      FirstTy = FindCheckType(Rest, Prefix);
-
-    FirstLoc = PrefixLoc;
-    FirstPrefix = Prefix;
-  }
-
-  // If the first prefix is invalid, we should continue the search after it.
-  if (FirstTy == Check::CheckNone) {
-    CheckLoc = SearchLoc;
-    return "";
-  }
-
-  CheckTy = FirstTy;
-  CheckLoc = FirstLoc;
-  return FirstPrefix;
-}
-
-static StringRef FindFirstMatchingPrefix(StringRef &Buffer,
+/// Search the buffer for the first prefix in the prefix regular expression.
+///
+/// This searches the buffer using the provided regular expression, however it
+/// enforces constraints beyond that:
+/// 1) The found prefix must not be a suffix of something that looks like
+///    a valid prefix.
+/// 2) The found prefix must be followed by a valid check type suffix using \c
+///    FindCheckType above.
+///
+/// The first match of the regular expression to satisfy these two is returned,
+/// otherwise an empty StringRef is returned to indicate failure.
+///
+/// If this routine returns a valid prefix, it will also shrink \p Buffer to
+/// start at the beginning of the returned prefix, increment \p LineNumber for
+/// each new line consumed from \p Buffer, and set \p CheckTy to the type of
+/// check found by examining the suffix.
+///
+/// If no valid prefix is found, the state of Buffer, LineNumber, and CheckTy
+/// is unspecified.
+static StringRef FindFirstMatchingPrefix(Regex &PrefixRE, StringRef &Buffer,
                                          unsigned &LineNumber,
-                                         Check::CheckType &CheckTy,
-                                         size_t &CheckLoc) {
+                                         Check::CheckType &CheckTy) {
+  SmallVector<StringRef, 2> Matches;
+
   while (!Buffer.empty()) {
-    StringRef Prefix = FindFirstCandidateMatch(Buffer, CheckTy, CheckLoc);
-    // If we found a real match, we are done.
-    if (!Prefix.empty()) {
-      LineNumber += Buffer.substr(0, CheckLoc).count('\n');
-      return Prefix;
-    }
-
-    // We didn't find any almost matches either, we are also done.
-    if (CheckLoc == StringRef::npos)
+    // Find the first (longest) match using the RE.
+    if (!PrefixRE.match(Buffer, &Matches))
+      // No match at all, bail.
       return StringRef();
 
-    LineNumber += Buffer.substr(0, CheckLoc + 1).count('\n');
-
-    // Advance to the last possible match we found and try again.
-    Buffer = Buffer.drop_front(CheckLoc + 1);
-  }
-
+    StringRef Prefix = Matches[0];
+    Matches.clear();
+
+    assert(Prefix.data() >= Buffer.data() &&
+           Prefix.data() < Buffer.data() + Buffer.size() &&
+           "Prefix doesn't start inside of buffer!");
+    size_t Loc = Prefix.data() - Buffer.data();
+    StringRef Skipped = Buffer.substr(0, Loc);
+    Buffer = Buffer.drop_front(Loc);
+    LineNumber += Skipped.count('\n');
+
+    // Check that the matched prefix isn't a suffix of some other check-like
+    // word.
+    // FIXME: This is a very ad-hoc check. it would be better handled in some
+    // other way. Among other things it seems hard to distinguish between
+    // intentional and unintentional uses of this feature.
+    if (Skipped.empty() || !IsPartOfWord(Skipped.back())) {
+      // Now extract the type.
+      CheckTy = FindCheckType(Buffer, Prefix);
+
+      // If we've found a valid check type for this prefix, we're done.
+      if (CheckTy != Check::CheckNone)
+        return Prefix;
+    }
+
+    // If we didn't successfully find a prefix, we need to skip this invalid
+    // prefix and continue scanning. We directly skip the prefix that was
+    // matched and any additional parts of that check-like word.
+    Buffer = Buffer.drop_front(SkipWord(Buffer, Prefix.size()));
+  }
+
+  // We ran out of buffer while skipping partial matches so give up.
   return StringRef();
 }
 
@@ -837,7 +812,7 @@
 ///
 /// The strings are added to the CheckStrings vector. Returns true in case of
 /// an error, false otherwise.
-static bool ReadCheckFile(SourceMgr &SM, StringRef Buffer,
+static bool ReadCheckFile(SourceMgr &SM, StringRef Buffer, Regex &PrefixRE,
                           std::vector<CheckString> &CheckStrings) {
   std::vector<Pattern> ImplicitNegativeChecks;
   for (const auto &PatternString : ImplicitCheckNot) {
@@ -865,20 +840,20 @@
 
   while (1) {
     Check::CheckType CheckTy;
-    size_t PrefixLoc;
 
     // See if a prefix occurs in the memory buffer.
-    StringRef UsedPrefix =
-        FindFirstMatchingPrefix(Buffer, LineNumber, CheckTy, PrefixLoc);
+    StringRef UsedPrefix = FindFirstMatchingPrefix(PrefixRE, Buffer, LineNumber,
+                                                   CheckTy);
     if (UsedPrefix.empty())
       break;
-
-    Buffer = Buffer.drop_front(PrefixLoc);
+    assert(UsedPrefix.data() == Buffer.data() &&
+           "Failed to move Buffer's start forward, or pointed prefix outside "
+           "of the buffer!");
 
     // Location to use for error messages.
-    const char *UsedPrefixStart = Buffer.data() + (PrefixLoc == 0 ? 0 : 1);
-
-    // PrefixLoc is to the start of the prefix. Skip to the end.
+    const char *UsedPrefixStart = UsedPrefix.data();
+
+    // Skip the buffer to the end.
     Buffer = Buffer.drop_front(UsedPrefix.size() + CheckTypeSize(CheckTy));
 
     // Complain about useful-looking but unsupported suffixes.
@@ -1253,11 +1228,28 @@
   return true;
 }
 
-// I don't think there's a way to specify an initial value for cl::list,
-// so if nothing was specified, add the default
-static void AddCheckPrefixIfNeeded() {
+// Combines the check prefixes into a single regex so that we can efficiently
+// scan for any of the set.
+//
+// The semantics are that the longest-match wins which matches our regex
+// library.
+static Regex buildCheckPrefixRegex() {
+  // I don't think there's a way to specify an initial value for cl::list,
+  // so if nothing was specified, add the default
   if (CheckPrefixes.empty())
     CheckPrefixes.push_back("CHECK");
+
+  // We already validated the contents of CheckPrefixes so just concatenate
+  // them as alternatives.
+  SmallString<32> PrefixRegexStr;
+  for (StringRef Prefix : CheckPrefixes) {
+    if (Prefix != CheckPrefixes.front())
+      PrefixRegexStr.push_back('|');
+
+    PrefixRegexStr.append(Prefix);
+  }
+
+  return Regex(PrefixRegexStr);
 }
 
 static void DumpCommandLine(int argc, char **argv) {
@@ -1341,7 +1333,16 @@
     return 2;
   }
 
-  AddCheckPrefixIfNeeded();
+  Regex PrefixRE = buildCheckPrefixRegex();
+  std::string REError;
+  if (!PrefixRE.isValid(REError)) {
+    errs() << "Unable to combine check-prefix strings into a prefix regular "
+              "expression! This is likely a bug in FileCheck's verification of "
+              "the check-prefix strings. Regular expression parsing failed "
+              "with the following error: "
+           << REError << "\n";
+    return 2;
+  }
 
   SourceMgr SM;
 
@@ -1363,7 +1364,7 @@
       SMLoc());
 
   std::vector<CheckString> CheckStrings;
-  if (ReadCheckFile(SM, CheckFileText, CheckStrings))
+  if (ReadCheckFile(SM, CheckFileText, PrefixRE, CheckStrings))
     return 2;
 
   // Open the file to check and add it to SourceMgr.
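
The scanning loop in the new FindFirstMatchingPrefix applies two checks to every regex match: the match must not be the tail of a longer check-like word, and it must be followed by a valid check suffix. A standalone approximation of that loop might look like the sketch below. It is not the LLVM code: `isWordChar` is a hypothetical stand-in for IsPartOfWord, only a plain `:` suffix is accepted where FindCheckType recognizes more forms, and `Found` and `findFirstValidPrefix` are names invented for the example. It assumes the regex never matches the empty string and that prefixes consist of word characters only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for FileCheck's IsPartOfWord.
static bool isWordChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '-' || C == '_';
}

// Hypothetical result type: where the valid prefix starts and on which line.
struct Found {
  bool Valid;
  std::size_t Offset;
  unsigned Line; // 1-based
  std::string Prefix;
};

Found findFirstValidPrefix(const std::string &Text, const std::regex &RE) {
  std::size_t Pos = 0;
  unsigned Line = 1;
  std::smatch M;
  while (Pos < Text.size() &&
         std::regex_search(Text.cbegin() + Pos, Text.cend(), M, RE)) {
    assert(M.length(0) > 0 && "prefix regex must not match the empty string");
    std::size_t Start = Pos + static_cast<std::size_t>(M.position(0));
    std::size_t End = Start + static_cast<std::size_t>(M.length(0));

    // Count the newlines in the text skipped to reach this match.
    Line += static_cast<unsigned>(
        std::count(Text.begin() + Pos, Text.begin() + Start, '\n'));

    // Constraint 1: the match must not be the tail of a longer word.
    bool AtWordStart = Start == 0 || !isWordChar(Text[Start - 1]);
    // Constraint 2 (simplified): only a plain ':' suffix is accepted here.
    bool HasSuffix = End < Text.size() && Text[End] == ':';
    if (AtWordStart && HasSuffix)
      return {true, Start, Line, M.str(0)};

    // Invalid candidate: skip the prefix and the rest of the word it sits in.
    Pos = End;
    while (Pos < Text.size() && isWordChar(Text[Pos]))
      ++Pos;
  }
  return {false, std::string::npos, Line, ""};
}
```

For the input `"FOO-CHECK: a\nXCHECK: b\n; CHECK: c\n"` and the regex `CHECK`, the first two candidates are rejected because they are preceded by a word character, and the third is accepted on line 3.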