17

What is the best (shortest and fastest) way to check whether a StringBuilder ends with a specific string?

If I want to check just one char, that's not a problem: sb[sb.Length-1] == 'c'. But how do I check whether it ends with a longer string?

I can think of something like looping from "some string".Length and reading the characters one by one, but maybe something simpler exists? :)

In the end I want to have an extension method like this:

StringBuilder sb = new StringBuilder("Hello world");
bool hasString = sb.EndsWith("world");
Alex Dn

5 Answers

30

To avoid the performance overhead of generating the full string, you can use the ToString(int,int) overload that takes the index range.

public static bool EndsWith(this StringBuilder sb, string test)
{
    if (sb.Length < test.Length)
        return false;

    string end = sb.ToString(sb.Length - test.Length, test.Length);
    return end.Equals(test);
}

Edit: It would probably be desirable to define an overload that takes a StringComparison argument:

public static bool EndsWith(this StringBuilder sb, string test)
{
    return EndsWith(sb, test, StringComparison.CurrentCulture);
}

public static bool EndsWith(this StringBuilder sb, string test, 
    StringComparison comparison)
{
    if (sb.Length < test.Length)
        return false;

    string end = sb.ToString(sb.Length - test.Length, test.Length);
    return end.Equals(test, comparison);
}

Edit2: As pointed out by Tim S in the comments, there is a flaw in my answer (and all other answers that assume character-based equality) that affects certain Unicode comparisons. Unicode does not require two (sub)strings to have the same sequence of characters to be considered equal. For example, the precomposed character é should be treated as equal to the character e followed by the combining mark U+0301.

Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");

string s = "We met at the cafe\u0301";
Console.WriteLine(s.EndsWith("café"));    // True 

StringBuilder sb = new StringBuilder(s);
Console.WriteLine(sb.EndsWith("café"));   // False

If you want to handle these cases correctly, it might be easiest to just call StringBuilder.ToString(), and then use the built-in String.EndsWith.
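A minimal sketch of that fallback, assuming the cost of materializing one full string is acceptable. The class name StringBuilderCultureExtensions and the method name EndsWithCultureAware are hypothetical, chosen only so they do not clash with the EndsWith extensions above.

```csharp
using System;
using System.Text;

public static class StringBuilderCultureExtensions
{
    // Hypothetical helper: builds the complete string once, then defers to
    // String.EndsWith so that the chosen StringComparison decides equality.
    public static bool EndsWithCultureAware(this StringBuilder sb, string test,
        StringComparison comparison = StringComparison.CurrentCulture)
    {
        if (sb == null)
            throw new ArgumentNullException("sb");
        if (test == null)
            throw new ArgumentNullException("test");

        return sb.ToString().EndsWith(test, comparison);
    }
}
```

This trades one allocation proportional to the builder's length for the Framework's own comparison rules, so it is probably best kept for cases where culture-sensitive matching actually matters.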

Douglas
  • Whoops, my foul...I missed that ToString has overload with start and end indexes! Thanks, this is the answer! :) – Alex Dn Jul 10 '13 at 20:27
  • Nice option to have StringComparer, but compiler won't let go with line end.Equals(test, comparison); he wants StringComparison instead of comparison :) – Alex Dn Jul 10 '13 at 20:58
  • In certain cultural comparisons, this will not work right. E.g. the example from http://msdn.microsoft.com/en-us/library/t9h2fbth.aspx, using StringBuilder all return `False`. see http://ideone.com/mVHhWR – Tim S. Jul 10 '13 at 20:58
  • @AlexDn: Thanks; my fault for not trying to compile. Fixed. – Douglas Jul 10 '13 at 21:08
  • Hm, I think return comparison.Compare(end, test) == 0; will be better option :) – Alex Dn Jul 10 '13 at 21:09
  • @AlexDn: Why? It would also work, but I think that `Equals` fits better since we're testing for equality, not ordering. – Douglas Jul 10 '13 at 21:13
  • @TimS.: I assume that you posted your comment before I introduced the `StringComparison` overload? The current implementation should be consistent with the Framework classes. – Douglas Jul 10 '13 at 21:14
  • I just think with comparison.Compare it is possible to use any culture and not to fit just Current or Invariant...but, maybe I am wrong. – Alex Dn Jul 10 '13 at 21:15
  • @AlexDn: You're right (in [`String.Compare`](http://msdn.microsoft.com/en-us/library/e6883c06.aspx)), and you make a fair point. Actually, the Framework seems to be inconsistent on whether such an overload should be provided. [`EndsWith`](http://msdn.microsoft.com/en-us/library/system.string.endswith.aspx) has it, but [`Equals`](http://msdn.microsoft.com/en-us/library/system.string.equals.aspx), for some reason, doesn't. – Douglas Jul 10 '13 at 21:21
7

MSDN has a topic on how to search the text in a StringBuilder object. The two options available to you are:

  1. Call ToString and search the returned String object.
  2. Use the Chars property to sequentially search a range of characters.

Since the first option is out of the question, you'll have to go with the Chars property.

public static class StringBuilderExtensions
{
    public static bool EndsWith(this StringBuilder sb, string text)
    {
        if (sb.Length < text.Length)
            return false;

        var sbLength = sb.Length;
        var textLength = text.Length;
        for (int i = 1; i <= textLength; i++)
        {
            if (text[textLength - i] != sb[sbLength - i])
                return false;
        }
        return true;
    }
}
Hemario
  • Yes, this is nice solution to...when I will have some time, I will check the performance difference with ToString and For loop. Anyway, I think ToString(int, int) uses the same loop inside, so performance will be about the same...but it is just assumption :) – Alex Dn Jul 10 '13 at 20:44
  • It's a bit verbose but I like that this method does not create any garbage. –  Jul 10 '13 at 20:52
  • Actually, when I think about this, your method may be faster on large strings, because at first not equal char, you finish the loop...with ToString, it is need to get the all number of chars before compare – Alex Dn Jul 10 '13 at 21:13
  • I also want to avoid `ToString()` but: for loop has best case and worst case scenarios. In **worst case** (last tested char different) for loop was 7 times slower, in **best case** (first char tested different) it was 77 backward or even 140 times forward faster (depends on the loop direction). I tested the release version, 1m calls to each method version, 1000 length of the test string. – Bitterblue Jun 27 '14 at 08:58
2

TL;DR

If your goal is to get a piece or the whole of the StringBuilder's contents in a String object, you should use its ToString function. But if you aren't yet done creating your string, it's better to treat the StringBuilder as a character array and operate on it that way than to create a bunch of strings you don't need.

String operations on a character array can become complicated by localization or encoding, since a string can be encoded in many ways (UTF8 or Unicode, for example), but its characters (System.Char) are meant to be 16-bit UTF16 values.

I've written the following method which returns the index of a string if it exists within the StringBuilder and -1 otherwise. You can use this to create the other common String methods like Contains, StartsWith, and EndsWith. This method is preferable to others because it should handle localization and casing properly, and does not force you to call ToString on the StringBuilder. It creates one garbage value if you specify that case should be ignored, and you can fix this to maximize memory savings by using Char.ToLower instead of precomputing the lower case of the string like I do in the function below. EDIT: Also, if you're working with a string encoded in UTF32, you'll have to compare two characters at a time instead of just one.

You're probably better off using ToString unless you're going to be looping, working with large strings, and doing manipulation or formatting.

// Requires: using System; using System.Globalization; using System.Text;
// As an extension method, this must be declared inside a static class.
public static int IndexOf(this StringBuilder stringBuilder, string str, int startIndex = 0, int? count = null, CultureInfo culture = null, bool ignoreCase = false)
{
    if (stringBuilder == null)
        throw new ArgumentNullException("stringBuilder");

    // No string to find.
    if (str == null)
        throw new ArgumentNullException("str");
    if (str.Length == 0)
        return -1;

    // Make sure the start index is valid.
    if (startIndex < 0 || startIndex > stringBuilder.Length)
        throw new ArgumentOutOfRangeException("startIndex", startIndex, "The index must lie within the string.");

    // Now that we've validated the parameters, let's figure out how many starting
    // positions there are at which str could still fit.
    var maxPositions = stringBuilder.Length - str.Length - startIndex + 1;
    if (maxPositions <= 0) return -1;

    // If a count argument was supplied, make sure it's within range.
    if (count.HasValue && (count <= 0 || count > maxPositions))
        throw new ArgumentOutOfRangeException("count");

    // Ensure that "count" has a value.
    maxPositions = count ?? maxPositions;

    // If no culture is specified, use the current culture. This is how the string functions behave but
    // in the case that we're working with a StringBuilder, we probably should default to Ordinal.
    culture = culture ?? CultureInfo.CurrentCulture;

    // If we're ignoring case, we need all the characters to be in culture-specific 
    // lower case for when we compare to the StringBuilder.
    if (ignoreCase) str = str.ToLower(culture);

    // Where the actual work gets done. Try each candidate starting position in turn.
    for (int x = startIndex, endIndex = startIndex + maxPositions; x < endIndex; x++)
    {
        // y counts how many characters of str match the StringBuilder starting at position x.
        int y = 0;
        while (y < str.Length && str[y] == (ignoreCase ? Char.ToLower(stringBuilder[x + y], culture) : stringBuilder[x + y]))
            y++;

        // The while loop will stop early if the characters don't match. If it didn't stop
        // early, that means we found a match, so we return the index of where we found the
        // match.
        if (y == str.Length)
            return x;
    }

    // No matches.
    return -1;
}

The primary reason one generally uses a StringBuilder object rather than concatenating strings is because of the memory overhead you incur since strings are immutable. The performance hit you see when you do excessive string manipulation without using a StringBuilder is often the result of collecting all the garbage strings you created along the way.

Take this for example:

string firstString = "1st", 
       secondString = "2nd", 
       thirdString = "3rd", 
       fourthString = "4th";
string all = firstString;
all += " & " + secondString;
all += " &" + thirdString;
all += "& " + fourthString + ".";

If you were to run this and open it up in a memory profiler, you'd find a set of strings that look something like this:

"1st", "2nd", "3rd", "4th",
" & ", " & 2nd", "1st & 2nd",
" &", " &3rd", "1st & 2nd &3rd",
"& ", "& 4th", "& 4th.",
"1st & 2nd &3rd& 4th."

That's fourteen total objects we created in that scope, but if you don't realize that every single addition operator creates a whole new string every time you might think there's only five. So what happens to the nine other strings? They languish away in memory until the garbage collector decides to pick them up.

So now to my point: if you're trying to find something out about a StringBuilder object and you don't want to call ToString(), it probably means you aren't done building that string yet. And if you're trying to find out whether the builder ends with "Foo", it's wasteful to call sb.ToString(sb.Length - 3, 3) == "Foo" because you're creating another string object that becomes orphaned and obsolete the minute you make the call.

My guess is that you're running a loop aggregating text into your StringBuilder and you want to end the loop or just do something different if the last few characters are some sentinel value you're expecting.
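That scenario might look roughly like the sketch below. It assumes a plain ordinal (character-by-character) comparison is good enough for the sentinel; the class SentinelDemo and the methods EndsWithOrdinal and ReadUntil are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SentinelDemo
{
    // Ordinal suffix check through the StringBuilder indexer; allocates no strings.
    public static bool EndsWithOrdinal(StringBuilder sb, string suffix)
    {
        if (sb.Length < suffix.Length)
            return false;

        int offset = sb.Length - suffix.Length;
        for (int i = 0; i < suffix.Length; i++)
        {
            if (sb[offset + i] != suffix[i])
                return false;
        }
        return true;
    }

    // Appends pieces until the accumulated text ends with the sentinel,
    // then returns everything gathered so far (sentinel included).
    public static string ReadUntil(IEnumerable<string> pieces, string sentinel)
    {
        var sb = new StringBuilder();
        foreach (string piece in pieces)
        {
            sb.Append(piece);
            if (EndsWithOrdinal(sb, sentinel))
                break;
        }
        return sb.ToString();
    }
}
```

Because the check runs against the builder rather than against each piece, a sentinel that happens to be split across two appended pieces is still detected, and ToString is called only once at the end.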

David Schwartz
1
    private static bool EndsWith(this StringBuilder builder, string value) {
        return builder.GetLast( value.Length ).SequenceEqual( value );
    }
    private static IEnumerable<char> GetLast(this StringBuilder builder, int count) {
        count = Math.Min( count, builder.Length );
        return Enumerable.Range( builder.Length - count, count ).Select( i => builder[ i ] );
    }
Denis535
  • According to the source code, it seems that the builder[i] indexer linearly searches for a specific chunk, starting with the current chunk. But that could be ok if you have a small number of chunks. – Alexey F Jan 18 '22 at 12:41
0

I'm giving you what you asked for (with the limitations you state) but not the best way to do it. Something like:

StringBuilder sb = new StringBuilder("Hello world"); bool hasString = sb.Remove(1,sb.Length - "world".Length) == "world";

M Miller
  • There two problems with your solution...1) I don't want to remove anything from initial StringBuilder. 2) Remove returns StringBuilder so it cannot be used as sb == "string"; – Alex Dn Jul 10 '13 at 20:40