24

If you only want part of a string, the Substring method is the usual choice. Its drawback is that you must first check the length of the string to avoid errors. For example, suppose you want to save data into a database and need to cut a value down to its first 20 characters.

If you call temp.Substring(0, 20) but temp only holds 10 chars, an ArgumentOutOfRangeException is thrown.
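To make the failure concrete, here is a minimal sketch. The helper name `Truncate` is made up for illustration; clamping the requested length with `Math.Min` is just one way to fold the length test into a single expression.

```csharp
using System;

public static class TruncateDemo
{
    // Hypothetical helper: returns at most maxLength leading characters.
    public static string Truncate(string value, int maxLength)
    {
        // Clamp the requested length so Substring never reads past the end.
        return value.Substring(0, Math.Min(value.Length, maxLength));
    }

    public static void Main()
    {
        string temp = "1234567890"; // only 10 characters

        try
        {
            temp.Substring(0, 20); // asks for more characters than exist
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Substring(0, 20) threw ArgumentOutOfRangeException");
        }

        Console.WriteLine(Truncate(temp, 20)); // the whole string comes back
        Console.WriteLine(Truncate(temp, 4));  // only the first four characters
    }
}
```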

I see two solutions:

  1. test on the length, and do the substring if needed
  2. use the extension method Take

        string temp = "1234567890";
        var data = new string(temp.Take(20).ToArray());
        // data now holds "1234567890"
    

Is there any disadvantage in terms of speed or memory use when one uses the Take method? The benefit is that you do not have to write all those if statements.

Williams

6 Answers

25

If you find yourself doing this a lot, why not write an extension method?

For example:

using System;

namespace Demo
{
    public static class Program
    {
        public static void Main(string[] args)
        {
            Console.WriteLine("123456789".Left(5));
            Console.WriteLine("123456789".Left(15));
        }
    }

    public static class StringExt
    {
        public static string Left(this string @this, int count)
        {
            if (@this.Length <= count)
            {
                return @this;
            }
            else
            {
                return @this.Substring(0, count);
            }
        }
    }
}
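If you also want null input and negative counts to be rejected explicitly, a more defensive variant could look like the sketch below. The names `StringExtSketch` and `LeftChecked` are hypothetical, and throwing on bad arguments is an assumption about the desired behaviour, not part of the code above.

```csharp
using System;

public static class StringExtSketch
{
    // Hypothetical variant of Left with explicit argument checks.
    public static string LeftChecked(this string text, int count)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (count < 0) throw new ArgumentOutOfRangeException("count");

        // Return the string unchanged when it is already short enough.
        return text.Length <= count ? text : text.Substring(0, count);
    }
}

public static class LeftCheckedDemo
{
    public static void Main()
    {
        Console.WriteLine("123456789".LeftChecked(5));  // first five characters
        Console.WriteLine("123456789".LeftChecked(15)); // whole string
    }
}
```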
Matthew Watson
  • This is indeed my preferred solution: it is more readable than Take and uses the power of the Substring method. Thanks all for the information. – Williams Mar 14 '13 at 11:14
14

As Henk Holtermand said, Take() creates an IEnumerator and then you need the ToArray() call.

So if performance matters in your application, or you take substrings many times in your process, this overhead could become a problem.

I wrote an example program to benchmark exactly how much slower the Take() method is. Here are the results:

Tested with ten million iterations:

  • Time performing substring: 266 ms
  • Time performing take operation: 1437 ms

And here is the code:

    internal const int RETRIES = 10000000;

    static void Main(string[] args)
    {
        string testString = Guid.NewGuid().ToString();

        long timeSubstring = MeasureSubstring(testString);
        long timeTake = MeasureTake(testString);

        Console.WriteLine("Time substring: {0} ms, Time take: {1} ms",
            timeSubstring, timeTake);
    }

    private static long MeasureSubstring(string test)
    {
        long ini = Environment.TickCount;

        for (int i = 0; i < RETRIES; i++)
        {
            if (test.Length > 4)
            {
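                // Note: Substring(4) returns everything from index 4 onward,
                // whereas Take(4) below returns the first 4 characters.
                string tmp = test.Substring(4);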
            }
        }

        return Environment.TickCount - ini;
    }

    private static long MeasureTake(string test)
    {
        long ini = Environment.TickCount;

        for (int i = 0; i < RETRIES; i++)
        {
            var data = new string(test.Take(4).ToArray());
        }

        return Environment.TickCount - ini;
    }
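For a like-for-like comparison, the same kind of measurement can be sketched with `Stopwatch` and with both variants returning the same leading characters. This is only a rough sketch, not a rigorous benchmark: the class and method names are made up, the iteration count is arbitrary, and the timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TruncateBenchmark
{
    private const int Iterations = 1000000; // arbitrary; raise for steadier numbers

    // First `count` characters via Substring, guarded by a length test.
    public static string WithSubstring(string s, int count)
    {
        return s.Length > count ? s.Substring(0, count) : s;
    }

    // First `count` characters via LINQ Take.
    public static string WithTake(string s, int count)
    {
        return new string(s.Take(count).ToArray());
    }

    public static void Main()
    {
        string test = Guid.NewGuid().ToString(); // 36 characters
        int sink = 0; // consume the results so the calls are not dead code

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
            sink += WithSubstring(test, 4).Length;
        sw.Stop();
        long substringMs = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < Iterations; i++)
            sink += WithTake(test, 4).Length;
        sw.Stop();
        long takeMs = sw.ElapsedMilliseconds;

        Console.WriteLine("Substring: {0} ms, Take: {1} ms (checksum {2})",
            substringMs, takeMs, sink);
    }
}
```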
Daniel Peñalba
  • 1
    Your code doesn't execute the SubString call, since a GUID is always longer than 4 characters. This invalidates your measurement ;) – John Willemse Mar 14 '13 at 10:16
  • 1
    Wow .. 4 years later, but hey ... why not ... You're testing the same string over and over, getting the same results on all of them ... I've added an [answer](https://stackoverflow.com/a/44916564/1698987) that will create a list of input strings of varied lengths, and then do the `substring` / `take` with some more entropy. Results suggest `Take` is 6-10 times slower, but still pretty fast (less than 0.0008 ms per `take`). – Noctis Jul 05 '17 at 03:29
8

At first I didn't want to answer (as there are already valid answers), but I would like to add something that doesn't fit in a comment:

You're talking about performance / memory issues. Right. As others said, string.Substring is far more efficient, both because it is optimized internally and because of how LINQ's Take() works on a string (it enumerates the chars one by one, etc.).

What no one said is that the main disadvantage of Take() in your case is that it totally destroys the simplicity of a substring. As Tim said, to get the actual string you want, you'll have to write:

string myString = new string(temp.Take(20).ToArray());

Damn... this is so much harder to understand than (see Matthew's extension method):

string myString = temp.Left(20);

LINQ is great for lots of use cases, but shouldn't be used if not necessary. Even a simple loop is sometimes better (i.e. faster, more readable/understandable) than LINQ, so imagine for a simple substring...

To summarize about LINQ in your case:

  • worse performance
  • less readable
  • less understandable
  • requires LINQ (so won't work with .NET 2.0 for instance)
ken2k
  • You can use an extension method that encapsulates the string constructor: `public static string StringJoin(this IEnumerable<char> chars) { return new string(chars.ToArray()); }` And then use it as follows: `string myString = temp.Take(20).StringJoin();` If readability is the concern, I think this solution is quite elegant; otherwise LINQ is far too slow in comparison with `Substring` – Can Bud Apr 27 '17 at 13:22
3

A variation of @Daniel's answer that seems more accurate to me.
A Guid's string length is 36. We create a list of strings of varying length (0 to 35 characters with the Substring call used in BuildInput below) and aim to take 18 characters with the substring / take methods, so around half of the strings will go through.

The results I'm getting suggest that Take will be 6-10 times slower than Substring.

Example results:

Build time: 3812 ms
Time substring: 391 ms, Time take: 1828 ms

Build time: 4172 ms
Time substring: 406 ms, Time take: 2141 ms

So, for 5 million strings and roughly 2.5 million operations, the total time for Take is about 2.1 seconds, or around 0.0008564 milliseconds (roughly 1 microsecond) per operation. If you feel you need to cut that by a factor of 5 by using Substring, go for it, but I doubt that in real-life situations, outside of tight loops, you'll ever feel the difference.

void Main()
{
    Console.WriteLine("Build time: {0} ms", BuildInput());
    Console.WriteLine("Time substring: {0} ms, Time take: {1} ms", MeasureSubstring(), MeasureTake());
}

internal const int RETRIES = 5000000;
static internal List<string> input;

// Measure substring time
private static long MeasureSubstring()
{
    var v = new List<string>();
    long ini = Environment.TickCount;

    foreach (string test in input)
        if (test.Length > 18)
        {
            v.Add(test.Substring(18));
        }
    //v.Count().Dump("entries with substring");
    //v.Take(5).Dump("entries with Sub");

    return Environment.TickCount - ini;
}

// Measure take time
private static long MeasureTake()
{
    var v = new List<string>();
    long ini = Environment.TickCount;

    foreach (string test in input)
        if (test.Length > 18) v.Add(new string(test.Take(18).ToArray()));

    //v.Count().Dump("entries with Take");
    //v.Take(5).Dump("entries with Take");

    return Environment.TickCount - ini;
}

// Create a list with random strings with random lengths
private static long BuildInput()
{
    long ini = Environment.TickCount;
    Random r = new Random();
    input = new List<string>();

    for (int i = 0; i < RETRIES; i++)
        input.Add(Guid.NewGuid().ToString().Substring(1,r.Next(0,36)));

    return Environment.TickCount - ini;
}
Noctis
2

Is there any disadvantage in terms of speed or memory use when one uses the Take method

Yes. Take() involves creating an IEnumerator<char> first and, for each char, going through the hoops of MoveNext() and yield return; etc. Also note the ToArray and the string constructor.

Not an issue for small numbers of strings but in a large loop the specialized string functions are a lot better.
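To show where the per-character overhead comes from, here is a simplified stand-in for what Take does on a string. It is not the framework's actual source: `TakeChars` is a made-up name, and the `List<char>` only approximates the buffer that ToArray has to build when the length is not known up front.

```csharp
using System;
using System.Collections.Generic;

public static class TakeSketch
{
    // Simplified stand-in for Enumerable.Take (not the real implementation).
    public static IEnumerable<char> TakeChars(IEnumerable<char> source, int count)
    {
        if (count <= 0) yield break;

        // foreach drives the source enumerator: one MoveNext() and one
        // Current read per character.
        foreach (char c in source)
        {
            yield return c;
            if (--count == 0) yield break;
        }
    }

    public static void Main()
    {
        // Collect the characters one at a time, then copy them into a string.
        var buffer = new List<char>();
        foreach (char c in TakeChars("1234567890", 4))
            buffer.Add(c);

        Console.WriteLine(new string(buffer.ToArray())); // prints "1234"
    }
}
```

Substring, by contrast, knows the start and length up front and can copy the characters in one step.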

H H
1

The Take extension method does not create a substring; it returns a query which can be used to create a Char[] (ToArray) or a List<Char> (ToList). But what you actually want is the substring.

Then you need other methods as well:

string data = new string(temp.Take(20).ToArray());

This implicitly uses a foreach to enumerate the chars, creates a new char[] (which might allocate too much size due to the doubling algorithm). Finally a new string is created from the char[].

The Substring on the other hand uses optimized methods.

So you pay for this little convenience with memory, which is often negligible but not always.
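The intermediate steps can be spelled out as in the sketch below, which only splits the one-liner into its parts. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeStepsDemo
{
    // Hypothetical helper that performs the one-liner step by step.
    public static string FirstChars(string temp, int count)
    {
        IEnumerable<char> query = temp.Take(count); // deferred query, nothing copied yet
        char[] chars = query.ToArray();             // enumerates and fills a new array
        return new string(chars);                   // copies the array into a new string
    }

    public static void Main()
    {
        Console.WriteLine(FirstChars("1234567890", 20)); // prints "1234567890"
        Console.WriteLine(FirstChars("1234567890", 3));  // prints "123"
    }
}
```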

Jim G.
Tim Schmelter