0

I have a program which color codes a returned results set a certain way depending on what the results are. Due to the length of time it takes to color-code the results (currently being done with Regex and RichTextBox.Select + .SelectionColor), I cut off color-coding at 400 results. At around that number it takes about 20 seconds, which is just about max time of what I'd consider reasonable.

To try an improve performance I re-wrote the Regex part to use a Parallel.ForEach loop to iterate through the MatchCollection, but the time was about the same (18-19 seconds vs 20)! Is just not a job that lends itself to Parallel programming very well? Should I try something different? Any advice is welcome. Thanks!

PS: Thought it was a bit strange that my CPU utilization never went about 14%, with or without Parallel.ForEach.

Code

MatchCollection startMatches = Regex.Matches(tempRTB.Text, startPattern);

object locker = new object();
System.Threading.Tasks.Parallel.ForEach(startMatches.Cast<Match>(), m =>
{
    int i = 0;
    foreach (Group g in m.Groups)
    {
        if (i > 0 && i < 5 && g.Length > 0)
        {
            tempRTB.Invoke(new Func<bool>(
                delegate
                {
                    lock (locker)
                    {
                        tempRTB.Select(g.Index, g.Length);
                        if ((i & 1) == 0) // Even number
                            tempRTB.SelectionColor = Namespace.Properties.Settings.Default.ValueColor;
                        else              // Odd number
                            tempRTB.SelectionColor = Namespace.Properties.Settings.Default.AttributeColor;
                        return true;
                    }
                }));
        }
        else if (i == 5 && g.Length > 0)
        {
            var result = tempRTB.Invoke(new Func<string>(
                delegate
                {
                    lock (locker)
                    {
                        return tempRTB.Text.Substring(g.Index, g.Length);
                    }
                }));

            MatchCollection subMatches = Regex.Matches((string)result, pattern);

            foreach (Match subMatch in subMatches)
            {
                int j = 0;
                foreach (Group subGroup in subMatch.Groups)
                {
                    if (j > 0 && subGroup.Length > 0)
                    {
                        tempRTB.Invoke(new Func<bool>(
                            delegate
                            {
                                lock (locker)
                                {
                                    tempRTB.Select(g.Index + subGroup.Index, subGroup.Length);
                                    if ((j & 1) == 0) // Even number
                                        tempRTB.SelectionColor = Namespace.Properties.Settings.Default.ValueColor;
                                    else              // Odd number
                                        tempRTB.SelectionColor = Namespace.Properties.Settings.Default.AttributeColor;
                                    return true;
                                }
                            }));
                    }
                    j++;
                }
            }
        }
        i++;
    }
});
Hershizer33
  • 1,206
  • 2
  • 23
  • 46
  • 14% utilization sounds like 100% utilization of one core on a quad core with Hyperthreading. – Daniel Hilgarth May 14 '13 at 14:33
  • How many cores do you have on your machine? – LukeHennerley May 14 '13 at 14:33
  • @LukeHennerley Task manager shows 8 (Its a Intel i7-3770) – Hershizer33 May 14 '13 at 14:35
  • Why do you have lock blocks inside of your `Invoke` statements? By definition, since you're using `Invoke`, the code can only run on one thread, so there can be no synchronization issues. – Servy May 14 '13 at 14:38
  • @Hershizer33: That confirms my suspicion: 100% overall CPU utilization / 8 cores = 12,5% utilization if one core has 100%. --> Your UI thread is utilizing its core completely. – Daniel Hilgarth May 14 '13 at 14:38
  • @Servy Sometimes it doesn't finish, so I assumed it was a race condition and thus tried out the lock blocks, the first version didn't have them. – Hershizer33 May 14 '13 at 14:39
  • @Hershizer33 Then debug it and figure out where it's stopped. Find the actual problem rather than just throwing lock blocks around when you're not even sure if they're needed. – Servy May 14 '13 at 14:41

2 Answers2

5

Virtually no aspect of your program is actually able to run in parallel.

The generation of the matches needs to be done sequentially. It can't find the second match until it has already found the first. Parallel.ForEach will, at best, allow you to process the results of the sequence in parallel, but they are still generated sequentially. This is where the majority of your time consuming work seems to be, and there are no gains there.

On top of that, you aren't really processing the results in parallel either. The majority of code run in the body of your loop is all inside an invoke to the UI thread, which means it's all being run by a single thread.

In short, only a tiny, tiny bit of your program is actually run in parallel, and using parallelization in general adds some overhead; it sounds like you're just barely getting more than that overhead. There isn't really much that you did wrong, the operation just inherently doesn't lend itself to parallelization, unless there is an effective way of breaking up the initial string into several smaller chucks that the regex can parse individually (in parallel).

Servy
  • 202,030
  • 26
  • 332
  • 449
  • I guess I misunderstood how Regex works in .Net.. I though the second this line was finished `MatchCollection startMatches = Regex.Matches(tempRTB.Text, startPattern);` it has all of the matches, but you're saying it still must wait for each one to be found before it finds the next in the ForEach loop? – Hershizer33 May 14 '13 at 14:38
  • 1
    @Hershizer33 Yes. It relies on deferred execution. It won't bother finding all 20 matches if you're only going to ask for the first two. When you ask for the first one it goes and finds the first one. When you ask for the one after that it then goes and finds the next one, and so on, until there are no more. – Servy May 14 '13 at 14:40
  • Understood, thanks didn't know that... So only option is to maybe find a faster Regex pattern then? – Hershizer33 May 14 '13 at 14:55
  • @Hershizer33 Or, as I said, break up the input string into several smaller strings so that you can parse them each individually in parallel; this assumes that there's an appropriate way of breaking up the strings that won't take more time then the parsing you're currently doing. – Servy May 14 '13 at 14:56
  • So the output actually comes into that function as a list of strings, I just put it in a temporary RTB to use the nice auto-rtf stuff it has. I could do the Parallel.ForEach on that initial list, but would each thread having its own RTB to return RTF add too much overhead? – Hershizer33 May 14 '13 at 15:05
  • @Hershizer33 I don't know; I haven't worked with them enough to know. It will mostly depend on how long it actually takes to do the parsing to determine if it's enough to minimize the overhead. – Servy May 14 '13 at 15:09
4

The most time in your code is most likely spent in the part that actually selects the text in the richtext box and sets the color.

This code is impossible to execute in parallel, because it has to be marshalled to the UI thread - which you do via tempRTB.Invoke.

Furthermore, you explicitly make sure that the highlighting is not executed in parallel but sequentially by using the lock statement. This is unnecessary, because all of that code is run on the single UI thread anyway.


You could try to improve your performance by suspending the layouting of your UI while you select and color the text in the RTB:

tempRTB.SuspendLayout();

// your loop

tempRTB.ResumeLayout();
Daniel Hilgarth
  • 171,043
  • 40
  • 335
  • 443
  • I see, I that makes sense. So it is what it is then? Keep the hard limit of 400 results and live with 20 seconds? – Hershizer33 May 14 '13 at 14:37
  • @Hershizer33: You could try to reduce the time the selection takes by disabling UI updates while you perform the coloring. – Daniel Hilgarth May 14 '13 at 14:40
  • I see. Would there be a way to generate the rtf i want without hitting the RTB? Only reason I used RTB in the first place was because it had those handy functions to do it for you... If I could do it without Invoking out of the Parallel thread it would be faster right? – Hershizer33 May 14 '13 at 14:43
  • 1
    @Hershizer33: Sure, you can use a library for that. See [this question](http://stackoverflow.com/questions/12347587/how-to-create-rtf-from-plain-text-or-string-in-c) for some possibilities. – Daniel Hilgarth May 14 '13 at 14:46