14

I need a function that will take a string and "pascal case" it. The only indicator that a new word starts is an underscore. Here are some example strings that need to be cleaned up:

  1. price_old => Should be PriceOld
  2. rank_old => Should be RankOld

I started working on a function that makes the first character upper case:

public string FirstCharacterUpper(string value)
{
 if (value == null || value.Length == 0)
  return string.Empty;
 if (value.Length == 1)
  return value.ToUpper();
 var firstChar = value.Substring(0, 1).ToUpper();
 return firstChar + value.Substring(1, value.Length - 1);
}

The thing the above function doesn't do is remove the underscore and "ToUpper" the character to the right of the underscore.

Also, any ideas about how to pascal case a string that doesn't have any word-boundary indicators (like the underscore)? For example:

  1. companysource
  2. financialtrend
  3. accountingchangetype

The major challenge here is determining where one word ends and another starts. I guess I would need some sort of lookup dictionary to determine where new words start? Are there libraries out there that do this sort of thing already?

Thanks,

Paul

Paul Fryer
  • One quick comment - that's Pascal case. Camel case starts with a lower case, e.g. `rankOld`. – Jon Skeet Aug 02 '10 at 09:50
  • @Jon O, good to know..updating... – Paul Fryer Aug 02 '10 at 09:52
  • Another quick comment - it's not necessary to specify a length when you want the entire substring from a certain starting point. So instead of value.Substring(1, value.Length - 1) you can simply do value.Substring(1). – Anton Aug 02 '10 at 09:54
  • @Anton Thanks for the tip, I'll start using the shorter overload. – Paul Fryer Aug 02 '10 at 09:57
  • Word of warning: if you go down the dictionary route to identify components of a string with no indicator of where each word begins and ends, then be very careful otherwise you can end up with unexpected results -- the folks at expertsexchange can probably tell you more about this problem. – Paul Ruane Aug 02 '10 at 10:03

6 Answers

23

You can use the TextInfo.ToTitleCase method then remove the '_' characters.

So, using the extension methods I've got:

http://theburningmonk.com/2010/08/dotnet-tips-string-totitlecase-extension-methods

you can do something like this:

var s = "price_old";
// Strings are immutable, so keep the returned value.
var pascal = s.ToTitleCase().Replace("_", string.Empty);
theburningmonk
  • @theburningmonk I like what I'm seeing so far... might just end up using this approach. – Paul Fryer Aug 02 '10 at 10:06
  • @Paul - no probs ;-) glad I could help! – theburningmonk Aug 02 '10 at 10:31
  • MSDN states that this function is implemented incorrectly and may therefore be subject to change; so be careful with new releases of .NET. – Jan Jongboom Aug 02 '10 at 12:03
  • @Jan - but then anything in the framework is and can be subject to change even if the relevant MSDN article doesn't specifically state it as such.. I don't think that's reason enough to refrain from using them. – theburningmonk Aug 02 '10 at 12:14
  • This doesn't work for things like 'iTunes'. It yields 'Itunes' instead of 'ITunes' which is what I think is correct. – Nick Strupat Feb 03 '14 at 04:18
11

Well the first thing is easy:

string.Join("", "price_old".Split(new [] { '_' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Substring(0, 1).ToUpper() + s.Substring(1)).ToArray());

returns PriceOld
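
The same split-and-capitalize idea can be written out as a small helper, which may be easier to read and guards against null or empty input. This is only a sketch: the class and method names are made up for illustration, and `string.Concat` over a sequence assumes .NET 4 or later.

```csharp
using System;
using System.Linq;

public static class PascalCaseSplit
{
    // Hypothetical helper: splits on '_' and upper-cases the first
    // character of every non-empty part, leaving the rest untouched.
    public static string ToPascalCase(string value)
    {
        if (string.IsNullOrEmpty(value))
            return string.Empty;

        var parts = value
            .Split(new[] { '_' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part => char.ToUpperInvariant(part[0]) + part.Substring(1));

        // Join the capitalized parts back together without a separator.
        return string.Concat(parts);
    }
}
```

Because of `RemoveEmptyEntries`, leading, trailing, or doubled underscores are simply dropped, so `PascalCaseSplit.ToPascalCase("price_old")` gives `PriceOld`.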

The second thing is much more difficult. Since companysource could be split as CompanySource or as CompanysOurce, it can be automated but will be quite error-prone. You will need an English dictionary and a lot of guessing about which combination of words is correct.
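
To give a feel for the dictionary approach and its ambiguity, here is a rough sketch that lists every way a string can be split into known words. The class name, method names, and the tiny word list in the usage note are all hypothetical. A real solution would need a full dictionary plus some ranking of the candidates, and this naive backtracking can get slow on long inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSegmenter
{
    // Returns every way to split `text` into words found in `dictionary`.
    public static List<List<string>> Segment(string text, ISet<string> dictionary)
    {
        var results = new List<List<string>>();
        Collect(text, 0, dictionary, new List<string>(), results);
        return results;
    }

    // Simple backtracking: try each dictionary word that starts at `start`,
    // then recurse on the remainder of the string.
    private static void Collect(string text, int start, ISet<string> dictionary,
                                List<string> current, List<List<string>> results)
    {
        if (start == text.Length)
        {
            results.Add(new List<string>(current));
            return;
        }

        for (int end = start + 1; end <= text.Length; end++)
        {
            var candidate = text.Substring(start, end - start);
            if (!dictionary.Contains(candidate))
                continue;

            current.Add(candidate);
            Collect(text, end, dictionary, current, results);
            current.RemoveAt(current.Count - 1);
        }
    }

    // Capitalizes each word and concatenates them.
    public static string ToPascalCase(IEnumerable<string> words)
    {
        return string.Concat(words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
    }
}
```

With a word list of `account`, `accounting`, `ing`, `change` and `type`, the input `accountingchangetype` yields two candidates, `AccountIngChangeType` and `AccountingChangeType`, so something (word frequency, fewest words, a human) still has to pick the right one.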

Jan Jongboom
  • As you so effectively pointed out, dealing with words is hard. I guess there is no way around it, I'll have to do some sort of dictionary lookup. I guess I was hoping someone already developed something I could use. – Paul Fryer Aug 02 '10 at 10:00
  • +1: for pointing out the dictionary solution for *second thing* – KMån Aug 02 '10 at 10:46
4

Try this:

public static string GetPascalCase(string name)
{
    return Regex.Replace(name, @"^\w|_\w", 
        (match) => match.Value.Replace("_", "").ToUpper());
}

Console.WriteLine(GetPascalCase("price_old")); // => Should be PriceOld
Console.WriteLine(GetPascalCase("rank_old" )); // => Should be RankOld
Rubens Farias
  • Only this is four times as slow as just splitting and substringing, and twice as slow when compiling the regex (doing this 100.000 times). – Jan Jongboom Aug 02 '10 at 09:58
2

With underscores:

s = Regex.Replace(s, @"(?:^|_)([a-z])",
      m => m.Groups[1].Value.ToUpper());
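
For anyone wondering how the pattern works, here is a self-contained sketch with the pieces commented. The class, field, and method names are invented for illustration; the pattern is the one shown above, held in a static `Regex` so it is only constructed once.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PascalCaseRegex
{
    // (?:^|_)  non-capturing group: start of the string, or an underscore
    // ([a-z])  capture group 1: the lowercase letter that follows
    private static readonly Regex WordStart =
        new Regex(@"(?:^|_)([a-z])", RegexOptions.Compiled);

    public static string ToPascalCase(string s)
    {
        // The whole match (underscore included) is replaced by the
        // upper-cased captured letter, which is what drops the underscore.
        return WordStart.Replace(s, m => m.Groups[1].Value.ToUpperInvariant());
    }
}
```

One thing to be aware of: an underscore is only removed when a lowercase a-z letter follows it, so `price_old` becomes `PriceOld` but `price_Old` comes back as `Price_Old`.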

Without underscores:

You're on your own there. But go ahead and search; I'd be surprised if nobody has done this before.

Alan Moore
  • Exactly what I was looking for, so +1. Adding some bits of information about how this regex works would make the answer even more valuable. And for those who want this to run faster, you can use this with options and make the regex pre-compiled: `s = Regex.Replace(s, @"(?:^|_)([a-z])", m => m.Groups[1].Value.ToUpper(), RegexOptions.Compiled);` – Manfred Jul 24 '18 at 04:32
0

For your second problem of splitting concatenated words, you could use our best friends Google & Co. If your concatenated input is made up of common English words, the search engines have a good hit rate at suggesting the separate words as an alternative search query.

If you enter your sample input, Google and Bing suggest the following:

original             | Google                | Bing
=====================================================================
companysource        | company source        | company source 
financialtrend       | financial trend       | financial trend
accountingchangetype | accounting changetype | accounting change type

See this example.

Writing a small screen scraper for that should be fairly easy.

Frank Bollack
  • http://stackoverflow.com/questions/3856630/how-to-separate-words-in-a-sentence-with-spaces - 8 lines for a shell script. – Dave Jarvis Dec 07 '10 at 02:38
0

For those who need a non-regex solution:

public static string RemoveAllSpaceAndConcertToPascalCase(string status)
{
    var textInfo = new System.Globalization.CultureInfo("en-US").TextInfo;
    var titleCaseStr = textInfo.ToTitleCase(status);
    string result = titleCaseStr.Replace("_", "").Replace(" ", "");

    return result;
}
Iman