(Problem solved. See my answer below.)
I just profiled my project (WinForms / C#) because it felt much slower than before. Strangely, List.AddRange() accounts for 92% of the total CPU time in the profile.
Code1: With the following code, a scan job takes 2m30s to finish (not in profiling mode):
var allMatches = new List<Match>();
foreach (var typedRegex in Regexes)
{
    var ms = typedRegex.Matches(text); // typedRegex is just a Regex.
    allMatches.AddRange(ms);
}
Profile result:
Function Name: [External Call] System.Collections.Generic.List.InsertRange(int, System.Collections.Generic.IEnumerable<!0>)
Total CPU [unit, %]: 146579 (92.45%)
Self CPU [unit, %]: 146579 (92.45%)
Module: Multiple modules
Category: IO | Kernel
Code2: So I removed the AddRange call, and the job takes only 1.6s:
var allMatches = new List<Match>();
foreach (var typedRegex in Regexes)
{
    var ms = typedRegex.Matches(text);
    // allMatches.AddRange(ms);
}
Code3: Thinking that there might be some kind of "lazy load" mechanism, I added a counter to force Regex.Matches() to actually run. The value of the counter is displayed in the UI. Now it takes 9s:
public static int Count = 0;
var allMatches = new List<Match>();
foreach (var typedRegex in Regexes)
{
    var ms = typedRegex.Matches(text);
    // allMatches.AddRange(ms);
    Count += ms.Count;
}
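The "lazy load" guess can be checked more directly by timing the two phases separately. The following is only a minimal sketch: MatchTiming and Measure are hypothetical names that are not part of the project, and it relies on the documented behavior that Regex.Matches() populates the MatchCollection lazily while accessing Count populates it completely. If the matching stopwatch absorbs nearly all of the time, the cost is in producing the matches, not in filling the list.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original project): separates the time spent
// producing matches from the time spent copying them into the list.
public static class MatchTiming
{
    public static (TimeSpan Matching, TimeSpan Copying, int Count) Measure(
        IEnumerable<Regex> regexes, string text)
    {
        var matching = new Stopwatch();
        var copying = new Stopwatch();
        var allMatches = new List<Match>();

        foreach (var regex in regexes)
        {
            matching.Start();
            var ms = regex.Matches(text);
            // Reading Count forces the lazily populated collection to be filled.
            var buffer = new Match[ms.Count];
            matching.Stop();

            copying.Start();
            // The collection is already populated, so this phase is pure copying.
            ms.CopyTo(buffer, 0);
            allMatches.AddRange(buffer);
            copying.Stop();
        }

        return (matching.Elapsed, copying.Elapsed, allMatches.Count);
    }
}
```

Calling MatchTiming.Measure(Regexes, text) and logging the two elapsed values should show which phase dominates, independent of how the profiler attributes the lazily executed work.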
Code4: Noticing that the value of Count is 32676, I pre-allocated capacity for the list. It still takes 9s:
public static int Count = 0;
var allMatches = new List<Match>(33000);
foreach (var typedRegex in Regexes)
{
    var ms = typedRegex.Matches(text);
    // allMatches.AddRange(ms);
    Count += ms.Count;
}
Code5: Thinking that List.AddRange(MatchCollection) might be unusual, I changed the code to foreach(...) {List.Add(match)}, but nothing changed: still 2m30s. The profile says:
Function Name: [External Call] System.Text.RegularExpressions.MatchCollection.MatchCollection+Enumerator.MoveNext()
Total CPU [unit, %]: 183804 (92.14%)
Self CPU [unit, %]: 183804 (92.14%)
Module: Multiple modules
Category: IO | Kernel
Code6: SelectMany takes 2m30s as well. This was my original solution.
var allMatches = Regexes.SelectMany(i => i.Matches(text));
So, maybe creating a list of 32676 items is expensive, but taking 10 times longer than creating those Match objects is beyond what I would expect. The same job took 27s just one day ago. I made a lot of changes today and thought the profiler would tell me why it got slower, but it didn't. That AddRange() call has been there for a month, and I can barely remember seeing its name in any earlier profile.
I will try to retrace what I changed during the day. But could anybody explain the profile results above? Thanks for any help.