116

Is there a better way of doing this...

MyString.Trim().Replace("&", "and").Replace(",", "").Replace("  ", " ")
         .Replace(" ", "-").Replace("'", "").Replace("/", "").ToLower();

I've extended the string class to keep it down to one job but is there a quicker way?

public static class StringExtension
{
    public static string clean(this string s)
    {
        return s.Replace("&", "and").Replace(",", "").Replace("  ", " ")
                .Replace(" ", "-").Replace("'", "").Replace(".", "")
                .Replace("eacute;", "é").ToLower();
    }
}

Just for fun (and to stop the arguments in the comments) I've shoved a gist up benchmarking the various examples below.

https://gist.github.com/ChrisMcKee/5937656

The regex option scores terribly; the dictionary option comes up the fastest; the long winded version of the stringbuilder replace is slightly faster than the short hand.

Chris McKee
  • 4,298
  • 10
  • 48
  • 83
  • 1
    Based on what you have in your benchmarks it looks like the dictionary version isn't doing all of the replacements which I suspect is what is making it faster than the StringBuilder solutions. – toad Sep 12 '14 at 21:23
  • 1
    @toad Hi from 2009; I added a comment below in April about that glaring mistake. The gist is updated though I skipped over D. The dictionary version is still faster. – Chris McKee Sep 15 '14 at 00:24
  • Possible duplicate of [Alternative to String.Replace multiple times?](http://stackoverflow.com/questions/12007358/alternative-to-string-replace-multiple-times) – Tot Zam Mar 17 '16 at 20:19
  • 1
    @TotZam at least check the dates before flagging things; this is from 2009 thats from 2012 – Chris McKee Jun 22 '16 at 13:27
  • Since many answers here seem concerned with performance, I believe it should be pointed out [Andrej Adamanko's answer](https://stackoverflow.com/a/23784456/7197632) is likely to be the fastest for many replacements; certainly faster than chaining .Replace() especially on a large input string as stated in his answer. – person27 Dec 09 '19 at 01:05
  • Note that this and many answers may be inappropriate for arbitrary replacement strings. `"abc".Replace("a", "c").Replace("c", "d")` will produce "dbd" as opposed to the expected "cbd". – Rei Miyasaka Feb 07 '22 at 00:51

10 Answers10

160

Quicker - no. More effective - yes, if you will use the StringBuilder class. With your implementation each operation generates a copy of a string which under circumstances may impair performance. Strings are immutable objects so each operation just returns a modified copy.

If you expect this method to be actively called on multiple Strings of significant length, it might be better to "migrate" its implementation onto the StringBuilder class. With it any modification is performed directly on that instance, so you spare unnecessary copy operations.

public static class StringExtention
{
    public static string clean(this string s)
    {
        StringBuilder sb = new StringBuilder (s);

        sb.Replace("&", "and");
        sb.Replace(",", "");
        sb.Replace("  ", " ");
        sb.Replace(" ", "-");
        sb.Replace("'", "");
        sb.Replace(".", "");
        sb.Replace("eacute;", "é");

        return sb.ToString().ToLower();
    }
}
BC2
  • 892
  • 1
  • 7
  • 23
  • 2
    For clarity the dictionary answer is the fastest http://stackoverflow.com/a/1321366/52912 – Chris McKee Jul 06 '13 at 20:28
  • 6
    In your benchmark on https://gist.github.com/ChrisMcKee/5937656 the dictionary test is not complete: it does not do all replacements and " " replaces " ", not " ". Not doing all replacements could be the reason, why it's fastest in the benchmark. The regex replacement is not complete, either. But most importantly your string TestData is _very_ short. Like the accepted answer states, the string has to be of significant length for the StringBuilder to be of advantage. Could you please repeat the benchmark with strings of 10kB, 100kB and 1MB? – Leif Feb 14 '14 at 08:42
  • Its a good point; as it stands it was being used for url cleansing so testings at 100kb - 1mb would have been unrealistic. I will update the benchmark so its using the whole thing though, that was a mistake. – Chris McKee Apr 25 '14 at 09:30
  • For best performance, loop over the characters and replace them yourself. However that can be tedious if you have more than single characters strings (find them enforces you to compare multiple characters at once, while replacing them requires allocating more memory and moving the rest of the string). – Chayim Friedman Jun 10 '20 at 21:56
  • When none of the characters or strings to be replaced occur in the input string, this will be a very bad solution. In that case String.Replace would just return the original reference and be dirt cheap compared to the StringBuilder solution. – Michel van Engelen Oct 18 '21 at 14:47
  • The isolated `ToLower` call at the end will force another allocation of the whole string. Any suggestions to prevent that memory waste as well that still looks readable? – julealgon Sep 01 '22 at 15:40
31

If you are simply after a pretty solution and don't need to save a few nanoseconds, how about some LINQ sugar?

var input = "test1test2test3";
var replacements = new Dictionary<string, string> { { "1", "*" }, { "2", "_" }, { "3", "&" } };

var output = replacements.Aggregate(input, (current, replacement) => current.Replace(replacement.Key, replacement.Value));
TimS
  • 2,085
  • 20
  • 31
  • Similar to example C in the Gist (if you look above it the uglier linq statement is in the comment) – Chris McKee Sep 15 '14 at 00:26
  • 4
    Interesting that you define a functional statment as "Uglier" than a procedural one. – TimS Sep 15 '14 at 00:36
  • not going to argue about it; its merely preference. As you say, linq is simply syntactic sugar; and as I said I'd already put the equivalent above the code :) – Chris McKee Sep 15 '14 at 08:48
17

this will be more efficient:

public static class StringExtension
{
    public static string clean(this string s)
    {
        return new StringBuilder(s)
              .Replace("&", "and")
              .Replace(",", "")
              .Replace("  ", " ")
              .Replace(" ", "-")
              .Replace("'", "")
              .Replace(".", "")
              .Replace("eacute;", "é")
              .ToString()
              .ToLower();
    }
}
Magnus
  • 45,362
  • 8
  • 80
  • 118
TheVillageIdiot
  • 40,053
  • 20
  • 133
  • 188
  • Really hard to read. I am sure you know what it does but a Junior Dev will scratch his head at what actually goes on. I agree- I also always look for the shortes hand of writting something- But it was only for my own satisfaction. Other people were freaking out at the pile of mess. – Piotr Kula Feb 12 '13 at 10:55
  • 4
    This is actually slower. BenchmarkOverhead... 13ms StringClean-user151323... 2843ms StringClean-TheVillageIdiot... 2921ms Varies on reruns but the answer wins https://gist.github.com/anonymous/5937596 – Chris McKee Jul 05 '13 at 22:16
11

Maybe a little more readable?

    public static class StringExtension {

        private static Dictionary<string, string> _replacements = new Dictionary<string, string>();

        static StringExtension() {
            _replacements["&"] = "and";
            _replacements[","] = "";
            _replacements["  "] = " ";
            // etc...
        }

        public static string clean(this string s) {
            foreach (string to_replace in _replacements.Keys) {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }

Also add New In Town's suggestion about StringBuilder...

Paolo Tedesco
  • 55,237
  • 33
  • 144
  • 193
  • 6
    It would be more readable like this: `private static Dictionary _replacements = new Dictionary() { {"&", "and"}, {",", ""}, {" ", " "} /* etc */ };` – ANeves Aug 04 '11 at 10:04
  • 2
    or of course... private static readonly Dictionary Replacements = new Dictionary() { { "&", "and" }, { ",", "" }, { " ", " " } /* etc */ }; public static string Clean(this string s) { return Replacements.Keys.Aggregate(s, (current, toReplace) => current.Replace(toReplace, Replacements[toReplace])); } – Chris McKee Jul 05 '13 at 22:22
  • 3
    -1 : Using a Dictionary doesen't make any sence here. Just use a `List>`. This also changes the order of the replacings is taken AND is not as fast as e.g. `s.Replace("a").Replace("b").Replace("c")`. Don't use this! – Thomas Jul 01 '19 at 14:29
10

There is one thing that may be optimized in the suggested solutions. Having many calls to Replace() makes the code to do multiple passes over the same string. With very long strings the solutions may be slow because of CPU cache capacity misses. May be one should consider replacing multiple strings in a single pass.

The essential content from that link:

static string MultipleReplace(string text, Dictionary<string, string> replacements) {
    return Regex.Replace(text,
                 "(" + String.Join("|", replacements.Keys) + ")",
                 delegate(Match m) { return replacements[m.Value]; });
}
    // somewhere else in code
    string temp = "Jonathan Smith is a developer";

    var adict = new Dictionary<string, string>();
    adict.Add("Jonathan", "David");
    adict.Add("Smith", "Seruyange");

    string rep = MultipleReplace(temp, adict);
mskfisher
  • 3,291
  • 4
  • 35
  • 48
Andrej Adamenko
  • 1,650
  • 15
  • 31
  • 2
    A lot of answers seem concerned about performance, in which case this is the best. And it's simple because it's just [a documented overload](https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.replace?view=netframework-4.8#System_Text_RegularExpressions_Regex_Replace_System_String_System_String_System_Text_RegularExpressions_MatchEvaluator_) of String.Replace where you return an expected value based on the match, in this example, using a dictionary to match them up. Should be simple to understand. – person27 Dec 09 '19 at 01:01
  • 2
    Added code from linked page to prevent this answer becoming useless if linked page dies – Caius Jard Feb 14 '22 at 07:23
  • Please correct your answer so the code actually works. In your MultipleReplace method there is no variable called 'adict', change it to 'replacements and remove the .ToArray(). Indent code correctly to make it more readable. Thank you for the solution. – Bijan Negari Sep 23 '22 at 08:42
4

Another option using linq is

[TestMethod]
public void Test()
{
  var input = "it's worth a lot of money, if you can find a buyer.";
  var expected = "its worth a lot of money if you can find a buyer";
  var removeList = new string[] { ".", ",", "'" };
  var result = input;

  removeList.ToList().ForEach(o => result = result.Replace(o, string.Empty));

  Assert.AreEqual(expected, result);
}
Chris McKee
  • 4,298
  • 10
  • 48
  • 83
Luiz Felipe
  • 49
  • 1
  • 1
  • 1
    You may declare `var removeList = new List { /*...*/ };` then just call `removeList.ForEach( /*...*/ );` and simplify your code. Note also that it doesn't fully answer the question because *all* found strings are replaced with `String.Empty`. – Tok' Nov 17 '17 at 22:17
  • 2
    Where precisely is Linq used? This wastefully converts `removeList` to a `List`, for the unnecessary goal of making it a single line. But Lamdas and Linq aren't synonymous. – Devin Burke May 31 '21 at 03:49
  • Note, List.ForEach is not a LINQ thing, it's a List thing – Caius Jard Feb 14 '22 at 07:24
0

Regular Expression with MatchEvaluator could also be used:

    var pattern = new Regex(@"These|words|are|placed|in|parentheses");
    var input = "The matching words in this text are being placed inside parentheses.";
    var result = pattern.Replace(input , match=> $"({match.Value})");

Note:

  • Obviously different expression (like: \b(\w*test\w*)\b) could be used for words matching.
  • I was hoping it to be more optimized to find the pattern in expression and do the replacements
  • The advantage is the ability to process the matching elements while doing the replacements
babakansari
  • 89
  • 1
  • 4
  • This answer would be improved by showing a better use of the match delegate than simply supplying the same value that was matched; it's a non op – Caius Jard Feb 14 '22 at 07:28
0

This is essentially Paolo Tedesco's answer, but I wanted to make it re-usable.

    public class StringMultipleReplaceHelper
    {
        private readonly Dictionary<string, string> _replacements;

        public StringMultipleReplaceHelper(Dictionary<string, string> replacements)
        {
            _replacements = replacements;
        }

        public string clean(string s)
        {
            foreach (string to_replace in _replacements.Keys)
            {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }

One thing to note that I had to stop it being an extension, remove the static modifiers, and remove this from clean(this string s). I'm open to suggestions as to how to implement this better.

red_dorian
  • 327
  • 1
  • 6
  • 18
0

I'm doing something similar, but in my case I'm doing serialization/De-serialization so I need to be able to go both directions. I find using a string[][] works nearly identically to the dictionary, including initialization, but you can go the other direction too, returning the substitutes to their original values, something that the dictionary really isn't set up to do.

Edit: You can use Dictionary<Key,List<Values>> in order to obtain same result as string[][]

radu florescu
  • 4,315
  • 10
  • 60
  • 92
sidDemure
  • 17
  • 1
-1
string input = "it's worth a lot of money, if you can find a buyer.";
for (dynamic i = 0, repl = new string[,] { { "'", "''" }, { "money", "$" }, { "find", "locate" } }; i < repl.Length / 2; i++) {
    input = input.Replace(repl[i, 0], repl[i, 1]);
}
  • 3
    You should consider adding context to your answers. Like a brief explanation of what's it doing And, if relevant, why you wrote it the way you did. – Neil Mar 16 '17 at 00:31