I'm looking for a way, in .NET, to split a string while ignoring split characters that are within quotes (or another delimiter). (This functionality would match what a typical CSV parser does if the split delimiter is a comma.) I'm not sure why this ability isn't built into `String.Split()`.
2

Daniel Brückner

Pat
-
possible duplicate of [Input array is longer than the number of columns in this table. Exception](http://stackoverflow.com/questions/3177511/input-array-is-longer-than-the-number-of-columns-in-this-table-exception) – Rex M Jul 05 '10 at 22:40
-
@Pat: What about if you have escaped delimiters? `'Here\'s, an example'` Getting this right is difficult. It's probably best to use a dedicated CSV parser instead of trying to roll your own. – Mark Byers Jul 05 '10 at 22:41
-
So funny as I thought of the same one. Guess I should have just mentioned it as a duplicate possibly. Though I think his is different because he doesn't want to deal with just csv. – spinon Jul 05 '10 at 22:41
-
@spinon - the reader mentioned in the other question allows most standard delimiters / patterns. – Marc Gravell Jul 05 '10 at 22:45
-
oh well then there you go Pat. Sounds like you should check that out. Thanks @Marc. I think I am going to keep that in mind as well. – spinon Jul 05 '10 at 22:47
-
Thanks for the links to the other post. That CSV parser is nice, but it fails in one instance. When the input is `"\"a, a\" , \"a, a\" "`, I expect the output to be two identical strings of `"\"a, a\" "`. The CSV parser instead throws an exception - I guess that the mix of quoted and non-quoted strings isn't allowed in CSV, even though it is used for parsing email addresses (which is what I am doing). – Pat Jul 06 '10 at 15:58
4 Answers
5
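You can use a regular expression for that. Each match is either a complete quoted run or a single non-comma character, repeated as many times as possible. Example:

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    string test = @"this,i""s,a"",test";
    // Pattern: ("[^"]*"|[^,])+  (a quoted run, or any character that is not a comma)
    string[] parts =
        Regex.Matches(test, @"(""[^""]*""|[^,])+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    foreach (string s in parts) Console.WriteLine(s);

Output:

    this
    i"s,a"
    test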

Guffa
-
Nice! This method works well for everything I need. I added a `.Select(m => m.Value.Trim())` to clean things up. – Pat Jul 06 '10 at 15:55
-
This didn't work for my test case, `"java.exe -cp \"a stupid jar with a space.jar\" my.MainClass"`, which should have returned 4 parts, but I ended up with one. – spy Apr 07 '19 at 00:34
-
@spy: The example uses comma as a separator. You would need to use a space instead of the comma in the regular expression. – Guffa Apr 17 '19 at 23:58
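A minimal sketch of the space-separated variant described in the comment above, applied to the command line from the earlier comment. The class and method names (`SpaceSplitDemo`, `SplitOnSpaces`) are hypothetical and only the separator in the character class has changed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpaceSplitDemo
{
    // Hypothetical helper: splits on spaces while keeping double-quoted runs together.
    public static string[] SplitOnSpaces(string input)
    {
        // Same pattern as the answer, with the comma replaced by a space.
        return Regex.Matches(input, @"(""[^""]*""|[^ ])+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        string commandLine = "java.exe -cp \"a stupid jar with a space.jar\" my.MainClass";
        foreach (string part in SplitOnSpaces(commandLine))
            Console.WriteLine(part);
    }
}
```

The quotes stay in the third part; strip them afterwards if they are not wanted.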
1
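Check out Marc's answer in this post: [Input array is longer than the number of columns in this table. Exception](http://stackoverflow.com/questions/3177511/input-array-is-longer-than-the-number-of-columns-in-this-table-exception). He mentions a library you can use for this.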
0
If you also want to allow single quotes (') and split on whitespace, then change the expression to `@"(""[^""]*""|'[^']*'|[^\s])+"`.
If you want to remove the quotes from the string, then change your Select to `.Select(m => m.Value.Trim(new char[] {'\'', '"'}))`. Note that `Trim` only strips quotes at the start and end of each value.
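As a rough illustration, the two changes can be combined as below. The names `QuoteAwareSplit` and `SplitOnWhitespace` are made up for this sketch, and the sample input is invented:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // Hypothetical helper: splits on whitespace, keeps single- or double-quoted
    // runs together, then strips quote characters from both ends of each value.
    public static string[] SplitOnWhitespace(string input)
    {
        return Regex.Matches(input, @"(""[^""]*""|'[^']*'|[^\s])+")
            .Cast<Match>()
            .Select(m => m.Value.Trim('\'', '"'))
            .ToArray();
    }

    public static void Main()
    {
        // Invented sample input for demonstration only.
        string input = "one 'two three' \"four five\" six";
        foreach (string part in SplitOnWhitespace(input))
            Console.WriteLine(part);
    }
}
```

One caveat: an apostrophe inside a word can be read as the start of a quoted run if another apostrophe appears later in the input, so this pattern is probably a poor fit for free-form text.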

Brad
0
Using @Guffa's method, here is my full solution. It is an extension method, so it must be declared in a static class, and it needs `System.Collections.Generic`, `System.Linq`, and `System.Text.RegularExpressions`:

    /// <summary>
    /// Splits the string while preserving quoted values (i.e. instances of the delimiter character inside of quotes will not be split apart).
    /// Trims leading and trailing whitespace from the individual string values.
    /// Does not include empty values.
    /// </summary>
    /// <param name="value">The string to be split.</param>
    /// <param name="delimiter">The delimiter to use to split the string, e.g. ',' for CSV.</param>
    /// <returns>A collection of individual strings parsed from the original value.</returns>
    public static IEnumerable<string> SplitWhilePreservingQuotedValues(this string value, char delimiter)
    {
        // Write the delimiter as a \uXXXX escape so that characters with a special
        // meaning inside a character class (such as '\') are treated literally.
        string escapedDelimiter = string.Format("\\u{0:X4}", (int)delimiter);
        Regex csvPreservingQuotedStrings = new Regex(string.Format("(\"[^\"]*\"|[^{0}])+", escapedDelimiter));
        var values =
            csvPreservingQuotedStrings.Matches(value)
                .Cast<Match>()
                .Select(m => m.Value.Trim())
                .Where(v => !string.IsNullOrWhiteSpace(v));
        return values;
    }
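A possible usage sketch, run against the input from my comment on the question. The method is reproduced in compact form so the example compiles on its own. The `\uXXXX` escaping of the delimiter is an assumption of this sketch, and the class name `StringSplitExtensions` is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringSplitExtensions
{
    // Compact variant of the method above. Escaping the delimiter as \uXXXX is an
    // assumption of this sketch so that characters such as '\' stay literal.
    public static IEnumerable<string> SplitWhilePreservingQuotedValues(this string value, char delimiter)
    {
        string escaped = string.Format("\\u{0:X4}", (int)delimiter);
        var pattern = new Regex("(\"[^\"]*\"|[^" + escaped + "])+");
        return pattern.Matches(value)
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .Where(v => !string.IsNullOrWhiteSpace(v));
    }
}

public static class Program
{
    public static void Main()
    {
        // Mixed quoted and unquoted content with stray whitespace around the values.
        string input = "\"a, a\" , \"a, a\" ";
        foreach (string part in input.SplitWhilePreservingQuotedValues(','))
            Console.WriteLine("[" + part + "]");
    }
}
```

Both values come back as `"a, a"` with the surrounding whitespace trimmed and the quotes kept.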

Pat