In Java there is a method `splitByCharacterType` that takes a string, for example `0015j8*(`, and splits it into `"0015","j","8","*","("`. Is there a built-in function like this in C#? If not, how would I go about building a function to do this?
1
This uses Apache's wondrous `StringUtils` library, which is not really native Java. I'd argue that you're looking for a third-party library more than a native implementation. – Makoto Nov 25 '12 at 04:04
3 Answers
3
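    // Requires: using System; using System.Collections.Generic; using System.Text;
    public static IEnumerable<string> SplitByCharacterType(string input)
    {
        // Because this is an iterator, these checks run on first enumeration.
        if (input == null)
            throw new ArgumentNullException(nameof(input));
        if (input.Length == 0)
            yield break; // nothing to split

        StringBuilder segment = new StringBuilder();
        segment.Append(input[0]);
        var current = Char.GetUnicodeCategory(input[0]);
        for (int i = 1; i < input.Length; i++)
        {
            var next = Char.GetUnicodeCategory(input[i]);
            if (next == current)
            {
                // Same category: extend the current segment.
                segment.Append(input[i]);
            }
            else
            {
                // Category changed: emit the finished segment and start a new one.
                yield return segment.ToString();
                segment.Clear();
                segment.Append(input[i]);
                current = next;
            }
        }
        yield return segment.ToString(); // emit the last segment
    }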
Usage is as follows (`ToArray` needs `using System.Linq;`):

    string[] split = SplitByCharacterType("0015j8*(").ToArray();

The result is `"0015","j","8","*","("`.
I recommend implementing this as an extension method.
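As a rough sketch of what the extension-method version might look like: the class name `StringSplitExtensions` and the private helper `SplitIterator` are made up for illustration. Splitting the method in two so the null check runs immediately (rather than on first enumeration) is an assumption on my part, as is returning an empty sequence for an empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical container class; any static class works for extension methods.
public static class StringSplitExtensions
{
    // Splits a string into runs of characters sharing the same UnicodeCategory.
    public static IEnumerable<string> SplitByCharacterType(this string input)
    {
        // Checked eagerly, because this wrapper is not itself an iterator.
        if (input == null)
            throw new ArgumentNullException(nameof(input));
        return SplitIterator(input);
    }

    // Hypothetical helper that does the lazy work.
    private static IEnumerable<string> SplitIterator(string input)
    {
        if (input.Length == 0)
            yield break; // assumption: empty input yields no segments

        var segment = new StringBuilder();
        segment.Append(input[0]);
        var current = char.GetUnicodeCategory(input[0]);

        for (int i = 1; i < input.Length; i++)
        {
            var next = char.GetUnicodeCategory(input[i]);
            if (next != current)
            {
                // Category changed: emit the finished run and start a new one.
                yield return segment.ToString();
                segment.Clear();
                current = next;
            }
            segment.Append(input[i]);
        }
        yield return segment.ToString(); // emit the final run
    }
}
```

With that class in scope, the call reads `"0015j8*(".SplitByCharacterType()` and should produce the same five segments as above.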

caesay
+1 Nicely done. I had a go at it myself but ended up with waaayyyy too much code. – Sid Holland Nov 25 '12 at 04:20
@Mehrdad: The performance cost is negligible, especially with today's hardware. If I were nit-picking, I'd use a StringBuilder instead of a segment string, too. However, I actually think that in this case it could be faster in a lot of situations (as opposed to just building a list and returning it). In this case `yield return` is actually the correct and best way to do this. – caesay Nov 25 '12 at 04:33
@caesay: Yeah sorry, I meant in general, not this example in particular. Here the bottleneck is in the strings, not in the collection, so yes you're right, your code is fine. – user541686 Nov 25 '12 at 04:45
Wow, I totally would have overcomplicated the implementation of this. How exactly does this `yield return` work? Is it just returning a piece of the result without returning completely? I'm assuming it's an IEnumerable feature. – thed0ctor Nov 25 '12 at 05:21
2
I don't think such a method exists. You can follow the steps below to create your own utility method:
- Create a list to hold the split strings.
- Define strings with all your character types, e.g.

      string numberString = "0123456789";
      string specialChars = "~!@#$%^&*(){}|\\/?";
      string alphaChars = "abcde....XYZ";

- Define a variable to hold the temporary string.
- Define a variable to note the type of the chars.
- Traverse your string one char at a time, and check the type of each char by looking for it in the predefined type strings.
- If the type differs from the previous type (check the type variable), add the temporary string (if not empty) to the list, assign the new type to the type variable, and assign the current char to the temporary string. Otherwise, append the char to the temporary string.
- At the end of the traversal, add the temporary string (if not empty) to the list.
- Now your list contains the split strings.
- Convert the list to a string array and you are done.
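The steps above could be sketched roughly like this. The class name `CharClassSplitter`, the helper `ClassOf`, and the exact contents of the three class strings are assumptions for illustration; characters outside all three strings are lumped into one "unknown" class here, which may not be what you want.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class following the list-based steps.
public static class CharClassSplitter
{
    // Illustrative character classes (assumed); extend them for your own input.
    private const string Digits = "0123456789";
    private const string Letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const string Specials = "~!@#$%^&*(){}|\\/?";

    // Returns a small integer identifying the class of c, or -1 if it is in none.
    private static int ClassOf(char c)
    {
        if (Digits.IndexOf(c) >= 0) return 0;
        if (Letters.IndexOf(c) >= 0) return 1;
        if (Specials.IndexOf(c) >= 0) return 2;
        return -1;
    }

    public static string[] Split(string input)
    {
        if (input == null)
            throw new ArgumentNullException(nameof(input));

        var parts = new List<string>(); // holds the split strings
        string temp = "";               // the segment being built
        int currentType = -2;           // sentinel: no type seen yet

        foreach (char c in input)
        {
            int type = ClassOf(c);
            if (type != currentType)
            {
                // Type changed: store the finished segment and start a new one.
                if (temp.Length > 0) parts.Add(temp);
                currentType = type;
                temp = c.ToString();
            }
            else
            {
                temp += c;
            }
        }
        if (temp.Length > 0) parts.Add(temp); // store the last segment
        return parts.ToArray();
    }
}
```

Note that with only these three classes, `CharClassSplitter.Split("0015j8*(")` keeps `*(` together as one segment, because both characters fall in the same "special" class. Getting `"*"` and `"("` separately would need finer-grained classes.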

Yogendra Singh
The method itself doesn't exist; however, a function to compare character types DOES exist. Implementing all the character type categories and checking them one by one is not a very good idea. – caesay Nov 25 '12 at 04:16
0
You could maybe use the `Regex` class, something like below, but you will need to add support for characters other than numbers and letters.

    var chars = Regex.Matches("0015j8*(", @"((?:""[^""\\]*(?:\\.[^""\\]*)*"")|[a-z]|\d+)").Cast<Match>().Select(match => match.Value).ToArray();

Result: 0015,j,8
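One way the pattern might be extended, as a sketch only: .NET regexes accept Unicode category escapes such as `\p{Nd}` (decimal digits), `\p{Ll}` (lowercase letters) and `\p{Lu}` (uppercase letters). The class name `RegexSplitter` is hypothetical, and the choice to list just three categories with a single-character fallback is an assumption, not a full equivalent of `splitByCharacterType`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class for the regex-based approach.
public static class RegexSplitter
{
    // Runs of digits, lowercase letters, or uppercase letters; the final '.'
    // alternative matches any other single character on its own.
    private static readonly Regex Runs =
        new Regex(@"\p{Nd}+|\p{Ll}+|\p{Lu}+|.", RegexOptions.Singleline);

    public static List<string> Split(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Runs.Matches(input))
            parts.Add(m.Value);
        return parts;
    }
}
```

For `"0015j8*("` this gives `0015`, `j`, `8`, `*`, `(`. Unlike a category-based loop, though, the fallback never groups repeated characters from unlisted categories, so `"**"` would come back as two separate `*` segments unless you add an alternative for that category.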

sa_ddam213
I think this requires way too much manual labor. It would be next to impossible to implement every different character type supported by splitByCharacterType in Java. – caesay Nov 25 '12 at 04:15
There is always *someone* who tries to answer any text-based question with regexes ... sigh – Stephen C Nov 25 '12 at 04:58