16

I have a fairly simple string extension method that gets called very frequently in a system I have that is doing a lot of string manipulations. I read this post (String.Substring() seems to bottleneck this code) and thought I would try the same method to see if I could find some performance by changing how I'm reading the string. My results are not quite what I was expecting (I was expecting ReadOnlySpan to provide a significant perf boost), and I'm wondering why that is. In my production code on a real run I found a very slight loss of performance.

I generated a file with ~1.15 million rows of strings with the character I care about, called the method on each, and dumped the results to console.

My results (runtime in milliseconds) are:

ReadOnlySpan.IndexOf Framework 4.7.1: 68538
ReadOnlySpan.IndexOf Core 2.1: 64486

ReadOnlySpan.SequenceEqual Framework 4.7.1: 63650
ReadOnlySpan.SequenceEqual Core 2.1: 65071

substring Framework 4.7.1: 63508
substring Core 2.1: 64125


The code (all identical from Full Framework to Core 2.1):

The calling code:

static void Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    var f = File.ReadAllLines("periods.CSV");

    foreach (string s in f)
    { Console.WriteLine(s.CountOccurrences(".")); }

    sw.Stop();
    Console.WriteLine("Done in " + sw.ElapsedMilliseconds + " ms");
    Console.ReadKey();
}


The original substring form of my method:

public static int CountOccurrencesSub(this string val, string searchFor)
{
    if (string.IsNullOrEmpty(val) || string.IsNullOrEmpty(searchFor))
    { return 0; }

    int count = 0;

    for (int x = 0; x <= val.Length - searchFor.Length; x++)
    {
        if (val.Substring(x, searchFor.Length) == searchFor)
        { count++; }
    }

    return count;
}


The ReadOnlySpan version (which I've tested with both IndexOf and SequenceEqual for equality checks):

public static int CountOccurrences(this string val, string searchFor)
{
    if (string.IsNullOrEmpty(val) || string.IsNullOrEmpty(searchFor))
    { return 0; }

    int count = 0;

    ReadOnlySpan<char> vSpan = val.AsSpan();
    ReadOnlySpan<char> searchSpan = searchFor.AsSpan();

    for (int x = 0; x <= vSpan.Length - searchSpan.Length; x++)
    {
        if (vSpan.Slice(x, searchSpan.Length).SequenceEqual(searchSpan))
        { count++; }
    }

    return count;
}


Does the equality comparison do an allocation in the methods I'm calling, and therefor no boost? Is this just not a good application for ReadOnlySpan? Am I just plain old missing something?

Mardoch
  • 195
  • 1
  • 3
  • 9
  • Your time measurements may have been affected by the `Console.WriteLine()` between the `Start()` and `Stop()` of the timer. The `WriteLine()` is costly and slows things down a lot. You may want to eliminate that and do a measurement first. – Sach Aug 15 '18 at 18:56
  • Not the answer, but... Change ReadAllLines with ReadLines and change Substring with IndexOf. Redo your tests – Steve Aug 15 '18 at 18:56
  • @SelmanGenç, the OP already pointed to that question. – Sach Aug 15 '18 at 19:00
  • 6
    Console.WriteLine and File.ReadAllLines are orders of magnitude slower than anything related to the actual test. When measuring performance, also make sure you're compiling in Release mode and 'warm up' your code (make sure everyting is JIT compiled by calling each function once) – jeroenh Aug 15 '18 at 19:01
  • @Sach oh didn't realize it – Selman Genç Aug 15 '18 at 19:01
  • 2
    @jeroenh is right, you can also use libraries that are specifically dedicated to benchmarking like Benchmark.NET – Selman Genç Aug 15 '18 at 19:06
  • I'm not worried about the cost of the Console.WriteLine or the File.ReadAllLines from the code above. I'm only running this test code to generate some comparative numbers that aren't from the multi-minute runtime of the real application the code came from – Mardoch Aug 15 '18 at 19:06
  • 1
    _only running this test code to generate some comparative numbers_. Those numbers are affected by the read and write operations. As @jeroenh pointed out, if you do a proper test the results are different. I did a similar test with one line which contains one `.` repeated 2 million times, and I got roughly 2400ms vs 2000ms in favor of `CountOccurrences()`. – Sach Aug 15 '18 at 19:15
  • @Mardoch the point is not whether you should worry about Console.WriteLine or File.ReadAllLines. The point is that both of those methods make the test pointless if you include them in the measurements. – jeroenh Aug 15 '18 at 19:25
  • @jeroenh ah, well, I do suppose that's fair. Just noise, as it were. – Mardoch Aug 15 '18 at 19:26
  • 3
    I'm the author of that SO question you mentioned in the link! Please note that when doing benchmarks and profiling there is no one solution that works equally for all problems. In my case the issue was generation of hundreds of milions possibly billions of substrings and the GC couldn't simply keep up collecting. Avoid IO operations as pointed out and compute averages on at least 100 runs. In my case the performance increase was 3000% in that particular special use case but in some general scenario the speed increase on .NET is between 2x and 4x from my personal experience. – Ilhan Aug 18 '18 at 22:15
  • 1
    Also note that I was optimizing that routine for months up to the point of decompiling it down to IL and comparing different versions of the same code untill I've "hit the jackpot". – Ilhan Aug 18 '18 at 22:16
  • 1
    Hey @Ilhan! I hadn't even considered span in this way before I saw one of the .Net team link your SO question on Twitter. This is really my first venture into something really perf intensive in this way so it was a pure learning experience. Thanks for posting your solution to your own question. – Mardoch Aug 19 '18 at 05:35
  • May I have a link to he twitter post? You got me curious now =). But yes the speed increase is mainly due to the memory copying avoided with Span. – Ilhan Aug 19 '18 at 12:55
  • 1
    Sorry, I misremembered the actual source of the post. It's this one https://twitter.com/vcsjones/status/1028484794676600833 and it appeared in my stream through a retweet from David Fowler. – Mardoch Aug 19 '18 at 22:46

3 Answers3

25

Although I'm a bit late to the party but I think I can still add relevant information to this topic.

First of all, some words about the other posters' measurements.

OP's results are clearly incorrect. As it was pointed out in the comments, the I/O operations completely distorts the stats.

The poster of the accepted answer is on the right track. His method eliminates the slow I/O operations and focuses clearly on the subject of the benchmark. However, he doesn't mention the environment (especially the .NET runtime) used and his "warm-up method" is rather debatable.

Performance measurement is a really tricky business, it's very hard to get it right. I wouldn't even try to code it myself if I wanted to get valid results. So I decided to check out this issue using the widely-adopted Benchmark.NET library. To make this all more interesting I added a third candidate to the mix. This implementation uses String.CompareOrdinal for occurence counting and I expected pretty good results from it.

The benchmark

Before the measurement starts (at the global setup stage), I generate 1,000,000 lines of lorem ipsum text. This data is used throughout the measurement.

Each method is exercised with 1,000 and 1,000,000 lines and with a shorter (5 characters long) and a longer (39 characters long) search text.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace MyBenchmarks
{
#if NETCOREAPP2_1
    [CoreJob]
#else
    [ClrJob]
#endif
    [RankColumn, MarkdownExporterAttribute.StackOverflow]
    public class Benchmark
    {
        static readonly string[] words = new[]
        {
            "lorem", "ipsum", "dolor", "sit", "amet", "consectetuer",
            "adipiscing", "elit", "sed", "diam", "nonummy", "nibh", "euismod",
            "tincidunt", "ut", "laoreet", "dolore", "magna", "aliquam", "erat"
        };

        // borrowed from greg (https://stackoverflow.com/questions/4286487/is-there-any-lorem-ipsum-generator-in-c)
        static IEnumerable<string> LoremIpsum(Random random, int minWords, int maxWords, int minSentences, int maxSentences, int numLines)
        {
            var line = new StringBuilder();
            for (int l = 0; l < numLines; l++)
            {
                line.Clear();
                var numSentences = random.Next(maxSentences - minSentences) + minSentences + 1;
                for (int s = 0; s < numSentences; s++)
                {
                    var numWords = random.Next(maxWords - minWords) + minWords + 1;
                    line.Append(words[random.Next(words.Length)]);
                    for (int w = 1; w < numWords; w++)
                    {
                        line.Append(" ");
                        line.Append(words[random.Next(words.Length)]);
                    }
                    line.Append(". ");
                }
                yield return line.ToString();
            }
        }

        string[] lines;

        [Params(1000, 1_000_000)]
        public int N;

        [Params("lorem", "lorem ipsum dolor sit amet consectetuer")]
        public string SearchValue;

        [GlobalSetup]
        public void GlobalSetup()
        {
            lines = LoremIpsum(new Random(), 6, 8, 2, 3, 1_000_000).ToArray();
        }

        public static int CountOccurrencesSub(string val, string searchFor)
        {
            if (string.IsNullOrEmpty(val) || string.IsNullOrEmpty(searchFor))
            { return 0; }

            int count = 0;

            for (int x = 0; x <= val.Length - searchFor.Length; x++)
            {
                if (val.Substring(x, searchFor.Length) == searchFor)
                { count++; }
            }

            return count;
        }

        public static int CountOccurrences(string val, string searchFor)
        {
            if (string.IsNullOrEmpty(val) || string.IsNullOrEmpty(searchFor))
            { return 0; }

            int count = 0;

            ReadOnlySpan<char> vSpan = val.AsSpan();
            ReadOnlySpan<char> searchSpan = searchFor.AsSpan();

            for (int x = 0; x <= vSpan.Length - searchSpan.Length; x++)
            {
                if (vSpan.Slice(x, searchSpan.Length).SequenceEqual(searchSpan))
                { count++; }
            }

            return count;
        }

        public static int CountOccurrencesCmp(string val, string searchFor)
        {
            if (string.IsNullOrEmpty(val) || string.IsNullOrEmpty(searchFor))
            { return 0; }

            int count = 0;

            for (int x = 0; x <= val.Length - searchFor.Length; x++)
            {
                if (string.CompareOrdinal(val, x, searchFor, 0, searchFor.Length) == 0)
                { count++; }
            }

            return count;
        }


        [Benchmark(Baseline = true)]
        public int Substring()
        {
            int occurences = 0;
            for (var i = 0; i < N; i++)
                occurences += CountOccurrencesSub(lines[i], SearchValue);
            return occurences;
        }

        [Benchmark]
        public int Span()
        {
            int occurences = 0;
            for (var i = 0; i < N; i++)
                occurences += CountOccurrences(lines[i], SearchValue);
            return occurences;
        }

        [Benchmark]
        public int Compare()
        {
            int occurences = 0;
            for (var i = 0; i < N; i++)
                occurences += CountOccurrencesCmp(lines[i], SearchValue);
            return occurences;
        }
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            BenchmarkRunner.Run<Benchmark>();
        }
    }
}

The results

NET Core 2.1

BenchmarkDotNet=v0.11.0, OS=Windows 7 SP1 (6.1.7601.0)
Intel Core i3-4360 CPU 3.70GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
Frequency=3604970 Hz, Resolution=277.3948 ns, Timer=TSC
.NET Core SDK=2.1.400
  [Host] : .NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT
  Core   : .NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT

Job=Core  Runtime=Core  

    Method |       N |          SearchValue |           Mean |           Error |          StdDev |         Median | Scaled | ScaledSD | Rank |
---------- |-------- |--------------------- |---------------:|----------------:|----------------:|---------------:|-------:|---------:|-----:|
 Substring |    1000 |                lorem |     2,149.4 us |       2.2763 us |       2.1293 us |     2,149.4 us |   1.00 |     0.00 |    3 |
      Span |    1000 |                lorem |       555.5 us |       0.2786 us |       0.2470 us |       555.5 us |   0.26 |     0.00 |    1 |
   Compare |    1000 |                lorem |     1,471.8 us |       0.2133 us |       0.1891 us |     1,471.8 us |   0.68 |     0.00 |    2 |
           |         |                      |                |                 |                 |                |        |          |      |
 Substring |    1000 | lorem(...)etuer [39] |     2,128.7 us |       1.0414 us |       0.9741 us |     2,128.6 us |   1.00 |     0.00 |    3 |
      Span |    1000 | lorem(...)etuer [39] |       388.9 us |       0.0440 us |       0.0412 us |       388.9 us |   0.18 |     0.00 |    1 |
   Compare |    1000 | lorem(...)etuer [39] |     1,215.6 us |       0.7016 us |       0.6220 us |     1,215.5 us |   0.57 |     0.00 |    2 |
           |         |                      |                |                 |                 |                |        |          |      |
 Substring | 1000000 |                lorem | 2,239,510.8 us | 241,887.0796 us | 214,426.5747 us | 2,176,083.7 us |   1.00 |     0.00 |    3 |
      Span | 1000000 |                lorem |   558,317.4 us |     447.3105 us |     418.4144 us |   558,338.9 us |   0.25 |     0.02 |    1 |
   Compare | 1000000 |                lorem | 1,471,941.2 us |     190.7533 us |     148.9276 us | 1,471,955.8 us |   0.66 |     0.05 |    2 |
           |         |                      |                |                 |                 |                |        |          |      |
 Substring | 1000000 | lorem(...)etuer [39] | 2,350,820.3 us |  46,974.4500 us | 115,229.1264 us | 2,327,187.2 us |   1.00 |     0.00 |    3 |
      Span | 1000000 | lorem(...)etuer [39] |   433,567.7 us |  14,445.7191 us |  42,593.5286 us |   417,333.4 us |   0.18 |     0.02 |    1 |
   Compare | 1000000 | lorem(...)etuer [39] | 1,299,065.2 us |  25,474.8504 us |  46,582.2045 us | 1,296,892.8 us |   0.55 |     0.03 |    2 |  

NET Framework 4.7.2

BenchmarkDotNet=v0.11.0, OS=Windows 7 SP1 (6.1.7601.0)
Intel Core i3-4360 CPU 3.70GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
Frequency=3604960 Hz, Resolution=277.3956 ns, Timer=TSC
  [Host] : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3062.0
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3062.0

Job=Clr  Runtime=Clr  

    Method |       N |          SearchValue |           Mean |          Error |          StdDev |         Median | Scaled | ScaledSD | Rank |
---------- |-------- |--------------------- |---------------:|---------------:|----------------:|---------------:|-------:|---------:|-----:|
 Substring |    1000 |                lorem |     2,025.8 us |      2.4639 us |       1.9237 us |     2,025.4 us |   1.00 |     0.00 |    3 |
      Span |    1000 |                lorem |     1,216.6 us |      4.2994 us |       4.0217 us |     1,217.8 us |   0.60 |     0.00 |    1 |
   Compare |    1000 |                lorem |     1,295.5 us |      5.2427 us |       4.6475 us |     1,293.1 us |   0.64 |     0.00 |    2 |
           |         |                      |                |                |                 |                |        |          |      |
 Substring |    1000 | lorem(...)etuer [39] |     1,939.5 us |      0.4428 us |       0.4142 us |     1,939.3 us |   1.00 |     0.00 |    3 |
      Span |    1000 | lorem(...)etuer [39] |       944.9 us |      2.6648 us |       2.3622 us |       944.7 us |   0.49 |     0.00 |    1 |
   Compare |    1000 | lorem(...)etuer [39] |     1,002.0 us |      0.2475 us |       0.2067 us |     1,002.1 us |   0.52 |     0.00 |    2 |
           |         |                      |                |                |                 |                |        |          |      |
 Substring | 1000000 |                lorem | 2,065,805.7 us |  2,009.2139 us |   1,568.6619 us | 2,065,555.1 us |   1.00 |     0.00 |    3 |
      Span | 1000000 |                lorem | 1,209,976.4 us |  6,238.6091 us |   5,835.5982 us | 1,206,554.3 us |   0.59 |     0.00 |    1 |
   Compare | 1000000 |                lorem | 1,303,321.8 us |  1,257.7418 us |   1,114.9552 us | 1,303,330.1 us |   0.63 |     0.00 |    2 |
           |         |                      |                |                |                 |                |        |          |      |
 Substring | 1000000 | lorem(...)etuer [39] | 2,085,652.9 us | 62,651.7471 us | 168,309.8501 us | 1,973,522.2 us |   1.00 |     0.00 |    3 |
      Span | 1000000 | lorem(...)etuer [39] |   958,421.2 us |  3,703.5508 us |   3,464.3034 us |   958,324.9 us |   0.46 |     0.03 |    1 |
   Compare | 1000000 | lorem(...)etuer [39] | 1,007,936.8 us |    802.1730 us |     750.3531 us | 1,007,680.3 us |   0.49 |     0.04 |    2 |

Conclusion

It's clear that there is a solid performance gain using Span<T>. What's somewhat surprising that it's 4-5x on .NET Core and only 2x on .NET Framework. What reasons could be behind that? Does anyone have a clue?

String.CompareOrdinal performs pretty good as well. I expected somewhat better results because theoretically it's just the same byte by byte compare but it's not bad at all. On .NET Framework it's a viable option by all means.

The length of the search string (except for extremities, of course) doesn't seem to have too much impact on the results.

Adam Simon
  • 2,762
  • 16
  • 22
  • I noted the difference in Core vs Framework performance myself and I'm guessing it's due to peculiarities in the underlying implementations? I definitely still find value in your answer and thanks for the time you put into it! I've learned much more than I expected by asking this question. – Mardoch Aug 16 '18 at 13:45
  • I'm glad to read you've found my efforts helpful. It's likely that there are some implementational differences explaining the results of the Span approach but I'm afraid only someone insider would be able to shed light on this... – Adam Simon Aug 16 '18 at 14:32
  • I'm the author of the referenced SO question (and answer). My initial post was about that specific code segment which was bottlenecked due to an insane amount of substrings generated and the GC not being able to handle it very well. Quite literally reading it from the memory using Span made a 3000% difference. Note that I am optimizing a small function for months up to the point of decompiling it to IL and comparing different styles of implementation. Some personal tests (although undocumented and not as thorough as yours) yielded roughly the same results (cca 2-4x performance increase) – Ilhan Aug 18 '18 at 22:25
  • 4
    The implementation and optimization of `Span` is different on full framework than it it on core, and on full framework code using span will run more slowly than on core (sometimes referred to as ["fast span" vs "slow span"](http://adamsitnik.com/Span/)). – Andy Ayers Sep 02 '18 at 16:24
5

I was intrigued and tried to repeat your test. Depending on how large the data set is, the code using ReadOnlySpan performs almost twice as fast:

CountOccurences Done in 1080 ms
CountOccurencesSub Done in 1789 ms

With larger data sets, the difference seems to increase (which seems logical as Substring allocates a string, which increases GC pressure).

I used this code to test:

static void Main(string[] args)
{
    var r = new Random();

    // generate 100000 lines of 1000 random characters
    var text = Enumerable.Range(0, 100000).Select(x => new string(Enumerable.Range(0, 1000).Select(i => (char)r.Next(255)).ToArray())).ToArray();

    // warm up
    "".CountOccurrencesSub("");
    "".CountOccurrences("");

    Measure(text, "CountOccurencesSub", s => s.CountOccurrencesSub("."));

    Measure(text, "CountOccurences", s => s.CountOccurrences("."));

    Console.ReadKey();
}

private static void Measure(string[] text, string test, Action<string> action)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    foreach (string s in text)
    {
        action(s);
    }

    sw.Stop();

    Console.WriteLine($"{test} Done in {sw.ElapsedMilliseconds} ms");
}
jeroenh
  • 26,362
  • 10
  • 73
  • 104
  • To add to this answer... Because of the string creation and subsequent GC activity, the relative performance will probably vary significantly depending on available resources, the lengths of the substrings, etc. – Chris Shain Aug 15 '18 at 19:12
  • That's interesting. I took your code, and ran it on my machine and added an additional method to check. I also ran my file in the same fashion immediately after ward and my results look like this: Text CountOccurencesIndexOf Done in 1342 ms Text CountOccurencesSequenceEqual Done in 1072 ms Text CountOccurencesSub Done in 1729 ms File CountOccurencesIndexOf Done in 1274 ms File CountOccurencesSequenceEqual Done in 1028 ms File CountOccurencesSub Done in 1619 ms So A) the shape of the strings matters a lot and B) SequenceEqual is by far the best choice. – Mardoch Aug 15 '18 at 19:36
-2

The performance benchmark that hurt me the most with ReadOnlySpan is that it put me in a world of hurt when I tried to use it in an app that gets put into the Windows app store. The ReadOnlySpan<> stuff isn't compatible (yet, it seems) with compiling with the .NETNative tool chain. That, in turn may (or may not) trigger a cascade of subsequent WACK test errors that will block submission to the store.

So far, I've spent two days just trying to get an app that uses ReadOnlySpan<> into the store. That far, far outweighs any possible performance improvements I might have potentially gotten.

PESMITH_MSFT
  • 350
  • 1
  • 9